Conference papers (NTIS)
Recent Submissions
Behind the curtain of clinical data science (2025)
Wolf, Kateřina; Fatka, Jiří; Jani, Filip; Dekojová, Tereza; Holubová, Monika; Houdová, Lucie
Analytical pipelines and software for processing biological data have become standard. The development of software tools does not end with a release, because continuous maintenance and further improvement of the underlying knowledge and data science are always necessary. Data plays a large part in knowledge integration. Data is an essential part of any analytical pipeline and may significantly affect the correctness, precision, and interpretation of the results. Equally important are the data representation itself and its comprehensibility. Moreover, the rapidly changing world of science, as well as technological progress (e.g., artificial intelligence), brings further challenges.

FungiTastic: A Multi-Modal Dataset and Benchmark for Image Categorization (IEEE Computer Society, 2025)
Picek, Lukáš; Janouskova, Klara; Čermák, Vojtěch; Matas, Jiří
We introduce a new, challenging benchmark and a dataset, FungiTastic, based on fungal records continuously collected over a twenty-year span. The dataset is labelled and curated by experts and consists of about 350k multimodal observations of 6k fine-grained categories (species). The fungi observations include photographs and additional data, e.g., meteorological and climatic data, satellite images, and body part segmentation masks. FungiTastic is one of the few benchmarks that include a test set with DNA-sequenced ground truth of unprecedented label reliability. The benchmark is designed to support (i) standard closed-set classification, (ii) open-set classification, (iii) multi-modal classification, (iv) few-shot learning, (v) domain shift, and many more. We provide tailored baselines for many use cases, a multitude of ready-to-use pre-trained models on HuggingFace, and a framework for model training.
The documentation and the baselines are available on GitHub and Kaggle.

2COOOL: 2nd Workshop on the Challenge of Out-of-Label Hazards in Autonomous Driving (Institute of Electrical and Electronics Engineers Inc., 2025)
Alshami, Ali K.; Rabinowitz, Ryan; Shoman, Maged; Fang, Jianwu; Picek, Lukáš; Lo, Shao-Yuan; Cruz, Steve; Lam, Khang Nhut; Kamod, Nachiket; Li, Lei-Lei; Kalita, Jugal; Boult, Terrance E.
As the Computer Vision community advances autonomous driving algorithms, integrating vision-based insights with sensor data remains essential for improving perception, decision-making, planning, prediction, simulation, and control. Yet we must ask: it's 2025, so why don't we have entirely safe self-driving cars yet? A key part of the answer lies in addressing novel scenarios, one of the most critical barriers to real-world deployment. Our 2COOOL workshop provides a dedicated forum for researchers and industry experts to push the state of the art in novelty handling, including out-of-distribution hazard detection, vision-language models for hazard understanding, new benchmarking and methodologies, and safe autonomous driving practices. The '2nd Workshop on the Challenge of Out-of-Label Hazards in Autonomous Driving' (2COOOL) will be held at the International Conference on Computer Vision (ICCV) 2025 in Honolulu, Hawaii, on October 19, 2025. We aim to inspire the development of new algorithms and systems for hazard avoidance, drawing on ideas from anomaly detection, open-set recognition, open-vocabulary modeling, domain adaptation, and related fields.
Building on the success of its inaugural edition at the Winter Conference on Applications of Computer Vision (WACV) 2025, the workshop will feature a dynamic mix of academic and industry participation.

The SignEval 2025 Challenge at the ICCV Multimodal Sign Language Recognition Workshop: Results and Discussion (Institute of Electrical and Electronics Engineers Inc., 2025)
Luqman, Hamzah; Mineo, Raffaele; Aljubran, Murtadha; Hasanaath, Ahmed Abul; Sorrenti, Amelia; Alyami, Sarah; Al-Azani, Sadam; Alowaifeer, Maad; Moon, JiHwan; Javorek, Václav; Železný, Tomáš; Hrúz, Marek; Caligiore, Gaia; Giancola, Silvio; Polikovsky, Senya; Alfarraj, Motaz; Fontana, Sabina; Mahmud, Mufti; Khan, Muhammad Haris; Islam, Kamrul; Gurbuz, Sevgi; Ragonese, Egidio; Bellitto, Giovanni; Salanitri, Federica Proietto; Spampinato, Concetto; Palazzo, Simone
This paper summarizes the results of the first multimodal sign language recognition challenge, SignEval 2025, organized at ICCV 2025. The challenge featured two tracks: (i) Continuous sign language recognition (CSLR) task based on the newly curated Isharah dataset, a Saudi Sign Language dataset, and (ii) Isolated sign language recognition (ISLR) task using the MultiMeDaLIS dataset, a multimodal Italian Sign Language corpus tailored for doctor-patient communication. Two tasks are defined within the CSLR track: Signer-Independent and Unseen-Sentences. The Signer-Independent task tests the model's ability to generalize across signers, a critical property for scalable real-world CSLR systems. The Unseen-Sentences task evaluates the model's capability to recognize novel sentence compositions by leveraging learned grammar and semantics. The ISLR track utilized MultiMeDaLIS, a multi-modal dataset. The participants of this track were challenged to classify isolated signs using only radar and RGB modalities.
The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks and achieving state-of-the-art results on both tracks. More information on the challenges, tasks, leaderboard, baselines and development kits is available at https://multimodal-sign-language-recognition.github.io/ICCV-2025/.

Overview of FungiCLEF 2025: Few-Shot Classification With Rare Fungi Species (CEUR-WS, 2025)
Janouskova, Klara; Matas, Jiří; Picek, Lukáš
FungiCLEF 2025, the 4th edition of the FungiCLEF challenge, was organized as part of the LifeCLEF and the FGVC workshops. This year’s edition targeted few-shot classification of rare fungi species. Participants were tasked with identifying species from multimodal observations, including images, structured metadata, and environmental data. The data was collected through citizen science and underwent expert-based labeling. Building upon the FungiTastic dataset, FungiCLEF 2025 emphasized real-world constraints such as limited training samples, high intra-class variability, fine-grained inter-class similarities, and distribution shift. The competition attracted 74 teams, with the leading submissions demonstrating significant gains over the provided baselines, showcasing the potential of pretrained vision transformers, contrastive learning, and ensemble techniques. This overview summarizes the challenge setup, dataset, baselines, participant strategies, and key findings, and outlines directions for future work. The winning team achieved a top-5 accuracy of 78.9%, outperforming baselines by over 52%. © 2025 Copyright for this paper by its authors.

Overview of AnimalCLEF 2025: Recognizing Individual Animals in Images (CEUR-WS, 2025)
Adam, Lukáš; Papafitsoros, Kostas; Kovář, Roman; Čermák, Vojtěch; Picek, Lukáš
The first edition of the individual animal identification challenge, AnimalCLEF 2025, organized within LifeCLEF, advances the field of animal re-identification using computer vision and machine learning.
Building on the WildlifeReID-10k dataset and incorporating new data, AnimalCLEF 2025 challenges participants to recognize individual animals from images for three species: lynxes, salamanders and sea turtles. The mix of species with different image capture conditions is intended to make the submitted prediction models generalizable to unseen species. The competition attracted 270 participants across 230 teams, with 136 outperforming the provided baseline based on MegaDescriptor. This overview paper provides (i) a comprehensive description of the challenge and provided baseline method, (ii) detailed characteristics of the dataset and task specifications, (iii) an examination of the methods employed by contestants, and (iv) a discussion of the competition outcomes. The results highlight incremental advancements in animal re-identification, showcasing innovative approaches and techniques that push the limits of previous work.

Zero-shot hazard identification in Autonomous Driving: A Case Study on the COOOL Benchmark (Institute of Electrical and Electronics Engineers Inc., 2025)
Picek, Lukáš; Čermák, Vojtěch; Hanzl, Marek
This paper presents our submission to the COOOL competition, a novel benchmark for detecting and classifying out-of-label hazards in autonomous driving. Our approach integrates diverse methods across three core tasks: (i) driver reaction detection, (ii) hazard object identification, and (iii) hazard captioning. For driver reaction detection, we propose kernel-based change point detection on bounding boxes and optical flow dynamics to analyze motion patterns. For hazard identification, we combined a naive proximity-based strategy with object classification using a pre-trained ViT model. Finally, for hazard captioning, we used the Molmo vision-language model with tailored prompts to generate precise and context-aware descriptions of rare and low-resolution hazards.
The proposed pipeline outperformed the baseline methods by a large margin, reducing the relative error by 33%, and scored 2nd on the final leaderboard consisting of 32 teams.

WildlifeReID-10k: Wildlife Re-Identification Dataset with 10k Individual Animals (IEEE Computer Society, 2025)
Adam, Lukáš; Čermák, Vojtěch; Papafitsoros, Kostas; Picek, Lukáš
This paper introduces WildlifeReID-10k, a new large-scale re-identification benchmark with more than 10k animal identities of around 33 species across more than 140k images, resampled from 37 existing datasets. WildlifeReID-10k covers diverse animal species and poses significant challenges for SoTA methods, ensuring fair and robust evaluation through its time-aware and similarity-aware split protocol. The latter is designed to address the common issue of training-to-test data leakage caused by visually similar images appearing in both training and test sets. The WildlifeReID-10k dataset and benchmark are publicly available on Kaggle, along with strong baselines for both closed-set and open-set evaluation, enabling fair, transparent, and standardized evaluation of not just multi-species animal re-identification models.

Tensile Properties of 3D Printed Infill Structures with Different Densities (Technical University of Liberec, 2025)
Heczko, Jan; Krystek, Jan; Laš, Vladislav
The tensile properties of a 3D printed rectilinear infill pattern are investigated. Stress-strain curves are obtained for specimens with different infill densities, and the Young’s modulus and ultimate tensile strength are evaluated.

Time homogenization in modelling of rubber damage and ageing (CRC Press/Balkema, 2025)
Heczko, Jan
The contribution focuses on the accumulation of the effects of damage and ageing of elastomers on their stiffness properties. An earlier model is enhanced by introducing permanent set to an approximation of hyperelastic parameters as functions of time and loading type.
The nature of the model, however, does not offer any straightforward way of extrapolation in case of different loading modes or service conditions. Therefore, a complex material model that takes into account different mechanisms of ageing and damage is considered. In order to enable numerical simulations of high-cycle loading, the method of asymptotic series is applied, resulting in a time-homogenized version of the model. In addition to savings in computational time, the model is formally similar to the original approximation.

Short cycle covers and the colouring defect of a cubic graph (Elsevier B.V., 2025)
Karabáš, Ján; Máčajová, Edita; Nedela, Roman; Škoviera, Martin
A longstanding conjecture of Alon and Tarsi, and independently Jaeger (1985), suggests that the edges of every bridgeless graph can be covered with cycles of total length at most 7/5 · m, where m is the number of edges. We study the relationship between cycle covers and structural properties of cubic graphs, focusing on their colouring defect. This invariant, introduced by Steffen in 2015, is defined as the minimum number of edges left uncovered by any set of three perfect matchings of a cubic graph. We show that every bridgeless cubic graph with colouring defect not exceeding 3 admits a cycle cover of length at most 4/3 · m + 1, just one step above the universal lower bound of 4/3 · m for all cubic graphs. We also prove that, regardless of defect, the same bound holds for bridgeless cubic graphs that have an edge lying on a 5-cycle such that removing its end vertices yields a 3-edge-colourable graph. Motivated by our investigations, we introduce a new invariant for cubic graphs, their covering excess, to measure the deviation of the length of a shortest cycle cover from the mentioned lower bound.
Finally, we show that every bridgeless cubic graph with covering excess at most 1 admits a cycle double cover.

Colouring defect of strong snarks (Elsevier B.V., 2025)
Karabáš, Ján; Máčajová, Edita; Nedela, Roman; Škoviera, Martin
A strong snark is a 2-connected cubic graph which is not 3-edge-colourable and remains so after deleting any edge and suppressing the resulting 2-valent vertices. Strong snarks were introduced by Jaeger in 1985 as a class of cubic graphs that might include counterexamples to the cycle double cover conjecture, the 5-flow conjecture, or to other related longstanding conjectures. With these conjectures still widely open, strong snarks merit further investigation. In this paper we study the colouring defect of strong snarks, an invariant introduced by Steffen in 2015 as the minimum number of edges of a cubic graph left uncovered by any set of three perfect matchings. This invariant provides one of the measures of edge uncolourability of cubic graphs recently studied by several authors. Our main result shows that the colouring defect of a strong snark is at least 6, and that the bound is sharp.

RailSafeNet: Visual Scene Understanding for Tram Safety (Springer Cham, 2026)
Valach, Ondřej; Gruber, Ivan
Tram-human interaction safety is an important challenge, given that trams frequently operate in densely populated areas, where the outcomes of collisions can range from minor injuries to fatalities. This paper addresses the issue from the perspective of designing a solution leveraging digital image processing, deep learning, and artificial intelligence to improve the safety of pedestrians, drivers, cyclists, pets, and tram passengers. We present RailSafeNet, a real-time framework that fuses semantic segmentation, object detection and a rule-based Distance Assessor to highlight track intrusions.
Using only monocular video, the system identifies rails, localises nearby objects and classifies their risk by comparing projected distances with the standard 1435 mm rail gauge. Experiments on the diverse RailSem19 dataset show that a class-filtered SegFormer B3 model achieves 65% intersection-over-union (IoU), while a fine-tuned YOLOv8 attains 75.6% mean average precision (mAP) calculated at an IoU threshold of 0.50. RailSafeNet therefore delivers accurate, annotation-light scene understanding that can warn drivers before dangerous situations escalate. Code available at https://github.com/oValach/RailSafeNet.

Saudi Sign Language Translation Using T5 (Springer Cham, 2026)
Alhejab, Ali; Železný, Tomáš; Alkanhal, Lamya; Gruber, Ivan; Alharbi, Yazeed; Straka, Jakub; Javorek, Václav; Hrúz, Marek; Alkalifah, Badriah; Ali, Ahmed
This paper explores the application of T5 models for Saudi Sign Language (SSL) translation using a novel dataset. The SSL dataset includes three challenging testing protocols, enabling comprehensive evaluation across different scenarios. Additionally, it captures unique SSL characteristics, such as face coverings, which pose challenges for sign recognition and translation. In our experiments, we investigate the impact of pre-training on American Sign Language (ASL) data by comparing T5 models pre-trained on the YouTubeASL dataset with models trained directly on the SSL dataset. Experimental results demonstrate that pre-training on YouTubeASL significantly improves models' performance (roughly in BLEU-4), indicating cross-linguistic transferability in sign language models. Our findings highlight the benefits of leveraging large-scale ASL data to improve SSL translation and provide insights into the development of more effective sign language translation systems.
Our code is publicly available at our GitHub repository.

Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR (Springer, 2026)
Pražák, Aleš; Kunešová, Marie; Psutka, Josef
Overlapping speech remains a major challenge for automatic speech recognition (ASR) in real-world applications, particularly in broadcast media with dynamic, multi-speaker interactions. We propose a lightweight, target-speaker-based extension to an existing streaming ASR system to enable practical transcription of overlapping speech with minimal computational overhead. Our approach combines a speaker-independent (SI) model for standard operation with a speaker-conditioned (SC) model selectively applied in overlapping scenarios. Overlap detection is achieved using a compact binary classifier trained on frozen SI model output, offering accurate segmentation at negligible cost. The SC model employs Feature-wise Linear Modulation (FiLM) to incorporate speaker embeddings and is trained on synthetically mixed data to transcribe only the target speaker. Our method supports dynamic speaker tracking and reuses existing modules with minimal modifications. Evaluated on a challenging set of Czech television debates with 16% overlap, the system reduced WER on overlapping segments from 68.0% (baseline) to 35.78% while increasing total computational load by 44%. The proposed system offers an effective and scalable solution for overlap transcription in continuous ASR services.

An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-Shot Multi-speaker TTS (Springer, 2026)
Kunešová, Marie; Hanzlíček, Zdeněk; Matoušek, Jindřich
Zero-shot multi-speaker text-to-speech (TTS) systems rely on speaker embeddings to synthesize speech in the voice of an unseen speaker, using only a short reference utterance. While many speaker embeddings have been developed for speaker recognition, their relative effectiveness in zero-shot TTS remains underexplored.
In this work, we employ a YourTTS-based TTS system to compare three different speaker encoders (YourTTS’s original H/ASP encoder, x-vector embeddings, and ECAPA-TDNN embeddings) within an otherwise fixed zero-shot TTS framework. All models were trained on the same dataset of Czech read speech and evaluated on 24 out-of-domain target speakers using both subjective and objective methods. The subjective evaluation was conducted via a listening test focused on speaker similarity, while the objective evaluation measured cosine distances between speaker embeddings extracted from synthesized and real utterances. Across both evaluations, the original H/ASP encoder consistently outperformed the alternatives, with ECAPA-TDNN showing better results than x-vectors. These findings suggest that, despite the popularity of ECAPA-TDNN in speaker recognition, it does not necessarily offer improvements for speaker similarity in zero-shot TTS in this configuration. Our study highlights the importance of empirical evaluation when reusing speaker recognition embeddings in TTS and provides a framework for additional future comparisons.

Ruminal Probes with Reliable Wireless Data Transmission (IEEE, 2025)
Čečil, Roman; Kumprechtová, Dana; Koukolová, Veronika
The paper presents the design of in-vivo ruminal pH probes and compares them with current state-of-the-art alternatives. The main advantage of the proposed solution is a novel, patented approach for transmitting measured data from the bovine rumen to a server or cloud via a re-transmitter placed in a collar device at the animal’s neck.
Additionally, beyond commonly measured values such as ruminal temperature and pH, the proposed solution extends the data set with oxido-reduction potential (ORP) values measured by the ruminal probe and activity data acquired by a MEMS accelerometer located in the collar device.

Coordinated operation between distribution system operator and demand-side management (IEEE, 2025)
Hering, Pavel; Střelec, Martin
This paper presents a coordination scheme that aims to minimize the required data exchange between the distribution system operator (DSO) and stakeholders actively operating on the demand side (e.g. prosumers, EV charging point operators). The proposed methodology reduces the complexity of coordination among various stakeholders with the objective of maximizing the utilization of the available grid hosting capacity. The proposed decoupled optimization approach consists of two distinct optimization problems: i) a region-based optimization method that determines secure operational limits for power injections at consumption points from the grid perspective, and ii) a demand-side optimization concerning the operation of technology asset groups implementing local operational policies, such as cost-optimal operation, operation in accordance with the maximum secure active power injections and provision of system services to the DSO. The decoupled optimization problems are formally defined and the overall methodological approach is demonstrated using three use cases related to different demand-side operational policies.

Modelování a identifikace portálových jeřábů [Modelling and identification of gantry cranes] (Západočeská univerzita v Plzni, 2025)
Sukovatý, Daniel
The contribution deals with modelling and identification of the dynamics of a gantry crane for the purpose of designing a control system with automatic damping of undesired oscillations of the suspended load. The crane is approximated as a triple pendulum suspended from a sliding trolley, where each link is described by the position of its centre of mass, length, mass, moment of inertia, and damping coefficient. The equations of motion are derived using the Lagrangian method and linearized about the lower equilibrium position, yielding a state-space model suitable for controller synthesis. To identify the unknown load parameters, a method using the first two resonant frequencies of the system is proposed. From a comparison of the characteristic polynomial of the dynamics matrix and its Jordan form, an objective function is constructed whose minimum corresponds to the physical parameters; this minimum is sought using the gradient-free Pattern Search optimization method. The resonant frequencies are estimated experimentally using relay feedback, based on harmonic linearization. In simulations the procedures achieve sufficient accuracy; on the real system, however, significant deviations appear, particularly for the second resonance, and the parameter computations are time-consuming. The possibility of using online identification, for example based on a Kalman filter, is discussed.

Multi-label Classification and Named Entity Recognition for Historical Documents (Springer, 2025)
Gruber, Ivan; Hlaváč, Miroslav; Neduchal, Petr; Hrúz, Marek
In this paper, we present improvements to our processing pipeline for historical document digitization. The original pipeline is extended with two new functionalities: page labeling and named entity recognition. We handle page labeling as a multi-label classification task, for which we choose the Query2Label approach. Query2Label is tested on our internal NKVD dataset and reaches a mean average precision of 80.03% on the test set. For the named entity recognition task, we utilize pre-trained transformer-based DeepPavlov models and benchmark them on two entity types: person name and location. The best model reaches promising results despite not being trained on our data at all.
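
The mean average precision quoted for the page-labeling task can be illustrated with a minimal sketch. It assumes the common non-interpolated definition (per-label average precision over pages ranked by score, then averaged over labels); the abstract does not state the exact evaluation protocol, and the function names and toy data below are hypothetical.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], targets: Sequence[int]) -> float:
    """Non-interpolated AP for one label: mean of precision@k at each positive's rank."""
    # Rank page indices by descending score for this label.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, idx in enumerate(order, start=1):
        if targets[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix: Sequence[Sequence[float]],
                           target_matrix: Sequence[Sequence[int]]) -> float:
    """Rows are pages, columns are labels; labels without positives are skipped (an assumption)."""
    num_labels = len(score_matrix[0])
    aps: List[float] = []
    for j in range(num_labels):
        col_scores = [row[j] for row in score_matrix]
        col_targets = [row[j] for row in target_matrix]
        if any(col_targets):
            aps.append(average_precision(col_scores, col_targets))
    return sum(aps) / len(aps) if aps else 0.0


# Toy example: three pages, two labels (values are illustrative only).
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
toy_targets = [[1, 0], [0, 1], [1, 1]]
toy_map = mean_average_precision(toy_scores, toy_targets)
```

Averaging per label rather than per page weights rare labels the same as frequent ones, which is one reason this metric is often used for multi-label page classification.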