Conference Papers (KKY)
Recent Submissions
Aspects of density approximation by tensor trains
(IEEE, 2025) Ajgl, Jiří; Straka, Ondřej
Point-mass filters solve Bayesian recursive relations by approximating probability density functions of a system state over grids of discrete points. The approach suffers from the curse of dimensionality. The exponential increase in the number of grid points can be mitigated by applying low-rank approximations of multidimensional arrays. Tensor train decompositions represent individual values by the product of matrices. This paper focuses on selected issues that are substantial in state estimation. Namely, the contamination of the density approximations by negative values is discussed first. Functional decompositions of quadratic functions are compared with decompositions of discretised Gaussian densities next. In particular, the connection of correlation with tensor train ranks is explored. Last, the consequences of interpolating the density values from one grid to a new grid are analysed.

Stone Soup: ADS-B-based Multi-Target Tracking with Stochastic Integration Filter
(IEEE, 2025) Hiles, John; Matoušek, Jakub; Blasch, Erik; Niu, Ruixin; Straka, Ondřej; Duník, Jindřich
This paper focuses on multi-target tracking using the Stone Soup framework. In particular, we aim to evaluate two multi-target tracking scenarios based on the simulated class-B dataset and the ADS-B class-A dataset provided by OpenSky Network. The scenarios are evaluated w.r.t. the selection of a local state estimator using a range of the Stone Soup metrics. Source code with scenario definitions and Stone Soup set-up are provided along with the paper.

Interpretable Augmented Physics-Based Model for Estimation and Tracking
(IEEE, 2025) Straka, Ondřej; Duník, Jindřich; Closas, Pau; Imbiriba, Tales
State-space estimation and tracking rely on accurate dynamical models to perform well.
However, obtaining an accurate dynamical model for complex scenarios or adapting to changes in the system poses challenges to the estimation process. Recently, augmented physics-based models (APBMs) have appeared as an appealing strategy to cope with these challenges, where the composition of a small and adaptive neural network with known physics-based models (PBMs) is learned on the fly following an augmented state-space estimation approach. A major issue when introducing data-driven components in such a scenario is the danger of compromising the meaning (or interpretability) of estimated states. In this work, we propose a novel constrained estimation strategy that constrains the APBM dynamics close to the PBM. The novel state-space constrained approach leads to more flexible ways to impose constraints than the traditional APBM approach. Our experiments with a radar-tracking scenario demonstrate different aspects of the proposed approach and the trade-offs inherent in the imposed constraints.

Efficient Gaussian Mixture Filters Based on Transition Density Approximation
(IEEE, 2025) Straka, Ondřej; Hanebeck, Uwe D.
Gaussian mixture filters for nonlinear systems usually rely on severe approximations when calculating mixtures in the prediction and filtering steps. Thus, offline approximations of noise densities by Gaussian mixture densities to reduce the approximation error have been proposed. This results in exponential growth in the number of components, requiring ongoing component reduction, which is computationally complex. In this paper, the key idea is to approximate the true transition density by an axis-aligned Gaussian mixture, where two different approaches are derived. These approximations automatically ensure a constant number of components in the posterior densities without the need for explicit reduction.
In addition, they allow a trade-off between estimation quality and computational complexity.

Asking Questions: an Innovative Way to Interact with Oral History Archives
(International Speech Communication Association, 2023) Švec, Jan; Bulín, Martin; Frémund, Adam; Polák, Filip
The paper describes our initial effort to use Transformer-based neural networks for understanding and presenting oral history archives. Such archives of interviews often contain large passages of the interviewee’s speech. Our approach automatically generates relevant questions, which enrich such monotonous parts and allow the listener to better orient themselves in the interview. The generated questions also allow for finding interesting parts of the interview without changing the original meaning of the testimony. We present our working pipeline consisting of a Wav2Vec speech recognizer, BERT-based punctuation detection, a T5 question-asking model, and a BERT-based semantic continuity model.

The System for Efficient Indexing and Search in the Large Archives of Scanned Historical Documents
(Springer, 2023) Bulín, Martin; Švec, Jan; Ircing, Pavel
The paper introduces software capable of indexing and searching large archives of scanned historical documents. The system capabilities are demonstrated on a collection containing documents from the archives of the post-Soviet security services. The backend of the system was designed with a focus on flexibility (it is actually already being used for other related tasks) and scalability to larger volumes of data.
The graphical user interface was designed in consultation with historians interested in using the archived documents and was developed in several iterations, gradually including the changes induced both by the users’ requests and by our improving knowledge about the nature of the processed data.

Multimodal Low-Cost Robotic Entity based on Raspberry Pi
(Západočeská univerzita v Plzni, 2023) Bulín, Martin
With the presence of numerous high-priced robotic entities available in the market, there arises a pressing need to develop low-cost alternatives for proof-of-concept validation, underscoring the demand for affordable solutions. This study focuses on integrating students’ machine learning projects onto a real robotic platform, facilitating hands-on experience and bridging the theory-practice gap. The primary objective is to develop a versatile robotic device with multiple interfaces as a platform for implementing students’ projects and verifying fundamental ideas. By applying their ideas to an embodied entity and testing real-life scenarios, students can enhance their understanding of complex principles while fostering innovation.

Design of Efficient Point-Mass Filter with Terrain Aided Navigation Illustration
(IEEE, 2023) Matoušek, Jakub; Duník, Jindřich; Brandner, Marek
This paper deals with state estimation of stochastic models with linear state dynamics, continuous or discrete in time. The emphasis is laid on a numerical solution to the state prediction by the time-update step of the grid-point-based point-mass filter (PMF), which is the most computationally demanding part of the PMF algorithm. A novel efficient PMF (ePMF) estimator, unifying continuous and discrete approaches, is proposed, designed, and discussed. By numerical illustrations, it is shown that the proposed ePMF can lead to a time complexity reduction that exceeds 99.9% without compromising accuracy.
The MATLAB® code of the ePMF is released with this paper.

Distributed Point-Mass Filter with Reduced Data Transfer Using Copula Theory
(IEEE, 2023) Matoušek, Jakub; Duník, Jindřich; Forsling, Robin
This paper deals with distributed Bayesian state estimation of generally nonlinear stochastic dynamic systems. In particular, a distributed point-mass filter algorithm is developed. It comprises a basic part that is accurate but data-intensive and an optional step employing advanced copula theory. The optional step significantly reduces data transfer at the price of a small accuracy decrease. In the end, the developed algorithm is numerically compared to the usually employed distributed extended Kalman filter.

Automatic Testing of PI(D) Autotuners
(IEEE Nuclear and Plasma Sciences Society, 2023) Slavíček, Lukáš; Balda, Pavel; Schlegel, Miloš
This paper presents a software tool for the automatic testing of PI(D) autotuners. The described tool was created in the REXYGEN system. The measured data were subsequently processed in MATLAB. The PID_Compact controller from the global automation producer Siemens and the PIDMA controller from the Czech SME REX Controls were tested. The autotuning methods were compared on a subset of well-known PI(D) control benchmarks by K. J. Åström and T. Hägglund. The presented results show the number of failed tuning experiments or the number of stable and unstable closed loops for both controllers. Furthermore, the time of tuning experiments and frequency quality indices are compared. According to the results, the PIDMA autotuner significantly outperforms the Siemens one, and the stability margins also prove the robustness of PIDMA.

On Visualisation of Linear Estimation and Fusion: From Equations to Ellipses
(IEEE, 2023) Ajgl, Jiří; Straka, Ondřej
Visualisation of mathematical objects often leads to faster and easier comprehension of theories. On the other hand, deriving conclusions exclusively from a graphical interpretation can be misleading.
A typical case in estimation is an ellipse, which corresponds to contour lines of a bivariate Gaussian density. This paper combines insights from various areas. Algebraic relations are presented first, geometric objects are shown subsequently. Constructions of ellipses by determining radii and positions of tangent lines are discussed. The stress is laid on the exposition of linear estimation with a focus on the methodology of fusion of estimates, especially in the case when cross-correlations of estimation errors are not fully known. The exposition also covers multidimensional variables, where the ellipses become ellipsoids.

Approximate fusion of probability density functions using Gaussian copulas
(IEEE, 2023) Ajgl, Jiří; Straka, Ondřej
Subjective Bayesian estimation perceives probability density functions as expert opinions. Among various rules for combining the opinions, the product and the weighted geometric mean of densities are prominent. Nevertheless, closed-form representations are scarce and non-parametric approaches often suffer from the curse of dimensionality. This paper explores the fusion of densities represented by non-parametric marginal densities and a parametric Gaussian copula. The explicit reconstruction of the joint densities followed by an optimisation step is avoided. A cheap approximate combination is proposed instead. The combination of marginal densities is tuned by a Gaussian term, while the proposed copula parameter uses moments of the marginal densities. The presented examples illustrate the approximative nature of the approach for non-Gaussian densities and highlight some numerical issues.

Domain-centric ADAS Datasets
(CEUR-WS, 2023) Diviš, Václav; Schuster, Tobias; Hrúz, Marek
Since the rise of Deep Learning methods in the automotive field, multiple initiatives have been collecting datasets in order to train neural networks on different levels of autonomous driving.
This requires collecting relevant data and precisely annotating objects, which should represent uniformly distributed features for each specific use case. In this paper, we analyze several large-scale autonomous driving datasets with 2D and 3D annotations in regard to their statistics of appearance and their suitability for training robust object detection neural networks. We discovered that despite huge effort being spent on driving hundreds of hours in different regions of the world, hardly any attention is paid to analyzing the quality of the collected data from an operational domain perspective. The analysis of safety-relevant aspects of autonomous driving functions, in particular trajectory planning in relation to the time-to-collision feature, showed that most datasets lack annotated objects at greater distances and that the distributions of bounding boxes and object positions are unbalanced. We therefore propose a set of rules which help find objects or scenes with inconsistent annotation styles. Lastly, we questioned the relevance of mean Average Precision (mAP) without relation to the object size or distance.

Overview of SnakeCLEF 2023: Snake Identification in Medically Important Scenarios
(CEUR-WS, 2023) Picek, Lukáš; Chamidullin, Rail; Hrúz, Marek; Durso, Andrew M.
Developing an effective automatic system for snake species identification has significant importance for biodiversity, conservation, and global health. Snakebites result in over half a million deaths and disabilities worldwide each year, highlighting the urgent need for a system to enhance co-epidemiological data and improve treatment outcomes, especially in remote regions that lack the necessary expertise and data but have high snake diversity and a high incidence of snakebites. The SnakeCLEF challenge provides an evaluation ground that helps track the performance of AI-driven methods for snake species recognition systems on a global scale.
The fourth edition of the SnakeCLEF challenge focuses on (i) evaluation of gradual improvements in automatic snake species identification, (ii) testing worldwide generalization on two specific scenarios, i.e., India and Central America, and (iii) evaluation with uneven costs for different errors, such as mistaking a venomous snake for a harmless one. This paper showcases the vital role of a robust automatic identification system for snakes, particularly in regions with limited resources, and highlights the potential impact on biodiversity conservation and global health outcomes. We report (i) a comprehensive description of the provided data, (ii) an evaluation methodology, (iii) an overview of the submitted methods, and (iv) perspectives derived from the achieved results.

Object Detection Pipeline Using YOLOv8 for Document Information Extraction
(CEUR-WS, 2023) Straka, Jakub; Gruber, Ivan
The extraction of information from semi-structured documents is an ongoing problem. This task is often approached from the perspective of NLP, and large transformer-based models are employed. In our work, we successfully demonstrated that the Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR) tasks can be effectively addressed as object detection problems using a convolutional neural network (CNN) model. We utilized a relatively small and fast YOLOv8 model, for which we conducted a series of experiments to explore the impact of different factors on model training. With YOLOv8, we were able to achieve AP 0.716 on the KILE task and 0.638 on the LIR task. Our code is available at https://github.com/strakaj/YOLOv8-for-document-understanding.git.

Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak
(Springer International Publishing, 2023) Lehečka, Jan; Psutka, Josef; Psutka, Josef
In this paper, we compare several methods of training Slovak speech recognition models based on the Transformers architecture.
Specifically, we explore the approach of transfer learning from the existing Czech pre-trained Wav2Vec 2.0 model into Slovak. We demonstrate the benefits of the proposed approach on three Slovak datasets. Our Slovak models scored the best results when initializing the weights from the Czech model at the beginning of the pre-training phase. Our results show that the knowledge stored in the Czech pre-trained model can be successfully reused to solve tasks in Slovak while outperforming even much larger public multilingual models.

VITS, Tacotron or FastSpeech? Challenging some of the most popular synthesizers
(Springer, 2023) Matoušek, Jindřich; Tihelka, Daniel; Tihelková, Alice
The paper presents a comparative study of three neural speech synthesizers, namely VITS, Tacotron 2 and FastSpeech 2, which are among the most popular TTS systems nowadays. Due to their varying nature, they have been tested from several points of view, analysing not only the overall quality of the synthesized speech, but also the capability of processing either orthographic or phonetic inputs. The analysis has been carried out on two English voices and one Czech voice.

VITS: Quality vs. Speed Analysis
(Springer International Publishing, 2023) Matoušek, Jindřich; Tihelka, Daniel
In this paper, we analyze the performance of a modern end-to-end speech synthesis model called Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS). We build on the original VITS model and examine how different modifications to its architecture affect synthetic speech quality and computational complexity. Experiments with two Czech voices, a male and a female, were carried out. To assess the quality of speech synthesized by the different modified models, MUSHRA listening tests were performed. The computational complexity was measured in terms of synthesis speed over real time.
While the original VITS model is still preferred regarding speech quality, we present a modification of the original structure with a significantly better response time that still provides acceptable output quality. Such a configuration can be used when system response latency is critical.

Detection of objects and their parts using Transformers
(Západočeská univerzita v Plzni, 2023) Vyskočil, Jiří
Standard detection and segmentation methods find objects in an image that can often be clearly distinguished from each other. However, there are also tasks, e.g. Visual Question Answering, that require more detailed descriptions, such as attributes or relations with other objects. In such cases, there is already an intermingling, as a more detailed description can belong to several types of objects, e.g. the leg category can be part of the person category, but also the chair category. In this work, new basic methods for detecting objects and their parts are created. These methods are based on Transformers and the classification layer is created in the same way as in the case of the existing methods of the used dataset. Finally, the methods are compared and evaluated. The best-performing Transformer method is DAB-Deformable-DETR, which achieves 35.2 AP for objects and 16.2 AP for parts.

T5G2P: Multilingual Grapheme-to-Phoneme Conversion with Text-to-Text Transfer Transformer
(Springer, 2023) Řezáčková, Markéta; Frémund, Adam; Švec, Jan; Matoušek, Jindřich
In recent years, the Text-to-Text Transfer Transformer (T5) neural network has proved more powerful for many text-related tasks, including grapheme-to-phoneme conversion (G2P). The paper describes the training process of T5-base models for several languages. It shows the advantages of training G2P models using that language-specific basis over the G2P models fine-tuned from the multilingual base model.
The paper also explains the reasons for training G2P models on whole sentences (not a dictionary) and evaluates the trained G2P models on unseen sentences and words.