Conference Papers (KKY)

Recent Submissions

Showing 1 - 20 out of 274 results
  • Item
    Asking Questions: an Innovative Way to Interact with Oral History Archives
    (International Speech Communication Association, 2023) Švec, Jan; Bulín, Martin; Frémund, Adam; Polák, Filip
    The paper describes our initial effort to use Transformer-based neural networks for understanding and presenting oral history archives. Such archives of interviews often contain long passages of the interviewee’s speech. Our approach automatically generates relevant questions, which enrich these monotonous parts and allow the listener to orient themselves better in the interview. The generated questions also make it possible to find interesting parts of the interview without changing the original meaning of the testimony. We present our working pipeline, consisting of a Wav2Vec speech recognizer, BERT-based punctuation detection, a T5 question-asking model and a BERT-based semantic continuity model.
  • Item
    The System for Efficient Indexing and Search in the Large Archives of Scanned Historical Documents
    (Springer, 2023) Bulín, Martin; Švec, Jan; Ircing, Pavel
    The paper introduces software capable of indexing and searching large archives of scanned historical documents. The system's capabilities are demonstrated on a collection of documents from the archives of the post-Soviet security services. The backend of the system was designed with a focus on flexibility (it is already being used for other related tasks) and on scalability to larger volumes of data. The graphical user interface was designed in consultation with historians interested in using the archived documents and was developed in several iterations, gradually incorporating changes prompted both by users’ requests and by our improving understanding of the nature of the processed data.
  • Item
    Multimodal Low-Cost Robotic Entity based on Raspberry Pi
    (Západočeská univerzita v Plzni, 2023) Bulín, Martin
    Given the numerous high-priced robotic platforms available on the market, there is a pressing need for low-cost alternatives for proof-of-concept validation. This study focuses on integrating students’ machine learning projects into a real robotic platform, providing hands-on experience and bridging the gap between theory and practice. The primary objective is to develop a versatile robotic device with multiple interfaces that serves as a platform for implementing students’ projects and verifying fundamental ideas. By applying their ideas to an embodied entity and testing them in real-life scenarios, students can deepen their understanding of complex principles while fostering innovation.
  • Item
    Design of Efficient Point-Mass Filter with Terrain Aided Navigation Illustration
    (IEEE, 2023) Matoušek, Jakub; Duník, Jindřich; Brandner, Marek
    This paper deals with state estimation of stochastic models with linear state dynamics, continuous or discrete in time. The emphasis is on a numerical solution to the state prediction in the time-update step of the grid-point-based point-mass filter (PMF), which is the most computationally demanding part of the PMF algorithm. A novel efficient PMF (ePMF) estimator, unifying the continuous and discrete approaches, is proposed, designed, and discussed. Numerical illustrations show that the proposed ePMF can achieve a reduction in time complexity exceeding 99.9% without compromising accuracy. The MATLAB® code of the ePMF is released with this paper.
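To illustrate why the time-update step dominates the PMF cost, here is a minimal 1D sketch in Python with NumPy. It assumes the simplest linear model, a random walk x' = x + w with Gaussian noise w ~ N(0, q), on a uniform grid; all function names are hypothetical. The naive Chapman-Kolmogorov sum costs O(N^2) on N grid points, but because the transition density depends only on the difference x' - x, the same sum is a convolution that the FFT evaluates in O(N log N). This is a standard shortcut shown for intuition, not a reproduction of the ePMF algorithm or of the released MATLAB code.

```python
import numpy as np


def gaussian_pdf(x, var):
    """Zero-mean Gaussian density with variance var, evaluated elementwise."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)


def predict_naive(grid, pdf, q):
    """PMF time update for x' = x + w, w ~ N(0, q): O(N^2) on N grid points."""
    delta = grid[1] - grid[0]
    # transition[i, j] = p(x' = grid[i] | x = grid[j])
    transition = gaussian_pdf(grid[:, None] - grid[None, :], q)
    return transition @ pdf * delta


def predict_fft(grid, pdf, q):
    """Same sum written as a linear convolution and computed with the FFT."""
    delta = grid[1] - grid[0]
    n = len(grid)
    # Kernel sampled at every possible difference x' - x on the grid.
    offsets = (np.arange(2 * n - 1) - (n - 1)) * delta
    kernel = gaussian_pdf(offsets, q)
    size = n + kernel.size - 1  # length of the full linear convolution
    full = np.fft.irfft(np.fft.rfft(pdf, size) * np.fft.rfft(kernel, size), size)
    # Entry i + n - 1 of the full convolution corresponds to grid point i.
    return full[n - 1 : 2 * n - 1] * delta
```

Both functions return the same predicted density; for a Gaussian prior N(1, 0.5) and q = 0.3 the result matches the analytic N(1, 0.8). Non-identity dynamics and multidimensional grids, which the paper addresses, need more than this sketch.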
  • Item
    Distributed Point-Mass Filter with Reduced Data Transfer Using Copula Theory
    (IEEE, 2023) Matoušek, Jakub; Duník, Jindřich; Forsling, Robin
    This paper deals with distributed Bayesian state estimation of generally nonlinear stochastic dynamic systems. In particular, a distributed point-mass filter algorithm is developed. It consists of a basic part, which is accurate but data-intensive, and an optional step employing advanced copula theory. The optional step significantly reduces data transfer at the price of a small decrease in accuracy. Finally, the developed algorithm is numerically compared with the commonly used distributed extended Kalman filter.
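A rough illustration of how a copula can reduce data transfer: instead of sending a full 2D grid density (N1·N2 values), a node could send its two marginals plus a single Gaussian-copula correlation parameter (N1 + N2 + 1 values), from which the receiver rebuilds an approximate joint density. The Python/NumPy sketch below assumes exactly that scheme; the function names are hypothetical and the paper's actual protocol and copula construction may differ.

```python
import numpy as np
from statistics import NormalDist

# Elementwise standard-normal quantile function (stdlib NormalDist).
_inv_cdf = np.vectorize(NormalDist().inv_cdf)


def normal_scores(pdf, dx):
    """Map grid points to standard-normal scores through the marginal CDF."""
    cdf = np.cumsum(pdf) * dx - 0.5 * pdf * dx  # trapezoid-style CDF estimate
    return _inv_cdf(np.clip(cdf, 1e-12, 1.0 - 1e-12))


def compress(joint, dx, dy):
    """Summarise a 2D grid density by its marginals and one copula parameter."""
    f1 = joint.sum(axis=1) * dy  # marginal over the first coordinate
    f2 = joint.sum(axis=0) * dx  # marginal over the second coordinate
    a, b = normal_scores(f1, dx), normal_scores(f2, dy)
    rho = float(np.sum(joint * np.outer(a, b)) * dx * dy)  # E[a * b]
    return f1, f2, rho


def reconstruct(f1, f2, rho, dx, dy):
    """Rebuild the joint grid density from marginals and a Gaussian copula."""
    a, b = normal_scores(f1, dx), normal_scores(f2, dy)
    A, B = np.meshgrid(a, b, indexing="ij")
    exponent = -(rho**2 * (A**2 + B**2) - 2.0 * rho * A * B) / (2.0 * (1.0 - rho**2))
    copula = np.exp(exponent) / np.sqrt(1.0 - rho**2)
    return copula * np.outer(f1, f2)
```

On a 241 x 241 grid this sends 483 numbers instead of 58,081. The reconstruction is close only when the joint dependence really is Gaussian-copula-like, which is one plausible source of an accuracy decrease for general nonlinear systems.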
  • Item
    Automatic Testing of PI(D) Autotuners
    (IEEE Nuclear and Plasma Sciences Society, 2023) Slavíček, Lukáš; Balda, Pavel; Schlegel, Miloš
    This paper presents a software tool for the automatic testing of PI(D) autotuners. The tool was created in the REXYGEN system, and the measured data were subsequently processed in MATLAB. The PID_Compact controller from the global automation manufacturer Siemens and the PIDMA controller from the Czech SME REX Controls were tested. The autotuning methods were compared on a subset of well-known PI(D) control benchmarks by K. J. Åström and T. Hägglund. The presented results show the number of failed tuning experiments and the number of stable and unstable closed loops for both controllers. Furthermore, the durations of the tuning experiments and frequency-domain quality indices are compared. According to the results, the PIDMA autotuner significantly outperforms the Siemens autotuner, and the stability margins also confirm the robustness of PIDMA.
  • Item
    On Visualisation of Linear Estimation and Fusion: From Equations to Ellipses
    (IEEE, 2023) Ajgl, Jiří; Straka, Ondřej
    Visualisation of mathematical objects often leads to faster and easier comprehension of theories. On the other hand, deriving conclusions exclusively from a graphical interpretation can be misleading. A typical case in estimation is the ellipse, which corresponds to a contour line of a bivariate Gaussian density. This paper combines insights from various areas. Algebraic relations are presented first, followed by the corresponding geometric objects. Constructions of ellipses by determining their radii and the positions of tangent lines are discussed. The emphasis is on the exposition of linear estimation, with a focus on the methodology of fusing estimates, especially when the cross-correlations of estimation errors are not fully known. The exposition also covers multidimensional variables, where the ellipses become ellipsoids.
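The ellipse constructions mentioned above can be sketched numerically. Assuming a 2x2 positive definite covariance matrix P, the contour x^T P^-1 x = c^2 has semi-axes c·sqrt(λi) along the eigenvectors of P, and the tangent line with unit normal u lies at distance c·sqrt(u^T P u) from the centre (the support function of the ellipse). The Python/NumPy helpers below are illustrative, with hypothetical names, and are not taken from the paper.

```python
import numpy as np


def ellipse_axes(P, c=1.0):
    """Semi-axis lengths and directions of {x : x^T P^-1 x = c^2}."""
    eigvals, eigvecs = np.linalg.eigh(P)  # ascending eigenvalues, orthonormal columns
    return c * np.sqrt(eigvals), eigvecs


def ellipse_points(P, c=1.0, num=2000):
    """Points on the ellipse: a scaled and rotated unit circle, shape (2, num)."""
    radii, axes = ellipse_axes(P, c)
    t = np.linspace(0.0, 2.0 * np.pi, num, endpoint=False)
    unit_circle = np.stack([np.cos(t), np.sin(t)])
    return axes @ (radii[:, None] * unit_circle)


def tangent_offset(P, u, c=1.0):
    """Distance from the centre to the tangent line with normal u: c*sqrt(u^T P u)."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    return c * np.sqrt(u @ P @ u)
```

For P = [[4, 1.5], [1.5, 1]], every sampled point satisfies x^T P^-1 x = 1, and the largest projection of the points onto u matches `tangent_offset(P, u)`; the tangent-line form is convenient because it needs no eigendecomposition.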
  • Item
    Approximate fusion of probability density functions using Gaussian copulas
    (IEEE, 2023) Ajgl, Jiří; Straka, Ondřej
    Subjective Bayesian estimation perceives probability density functions as expert opinions. Among various rules for combining the opinions, the product and the weighted geometric mean of densities are prominent. Nevertheless, closed-form representations are scarce, and non-parametric approaches often suffer from the curse of dimensionality. This paper explores the fusion of densities represented by non-parametric marginal densities and a parametric Gaussian copula. The explicit reconstruction of the joint densities followed by an optimisation step is avoided; a cheap approximate combination is proposed instead. The combination of marginal densities is tuned by a Gaussian term, while the proposed copula parameter uses moments of the marginal densities. The presented examples illustrate the approximate nature of the approach for non-Gaussian densities and highlight some numerical issues.
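The fusion rules named above have a closed form only in special cases. The Python/NumPy sketch below shows one such case, assuming 1D Gaussian densities: p1^a · p2^b (normalised) is again Gaussian with precision a/v1 + b/v2, so a = b = 1 gives the product and a + b = 1 the weighted geometric mean. A brute-force grid evaluation is included for comparison; its cost grows as n^d in d dimensions, which is the curse of dimensionality the abstract refers to. This is not the paper's copula-based method, and the function names are hypothetical.

```python
import numpy as np


def fuse_gaussians_1d(m1, v1, m2, v2, a=0.5, b=0.5):
    """Closed-form mean and variance of normalised p1**a * p2**b for Gaussians."""
    precision = a / v1 + b / v2
    var = 1.0 / precision
    mean = var * (a * m1 / v1 + b * m2 / v2)
    return mean, var


def fuse_on_grid(m1, v1, m2, v2, a=0.5, b=0.5, grid=None):
    """Same fusion evaluated numerically on a 1D grid (works for any densities)."""
    if grid is None:
        grid = np.linspace(-20.0, 20.0, 40001)
    dx = grid[1] - grid[0]
    # Log-densities up to constants; the constants cancel after normalisation.
    log_fused = a * (-0.5 * (grid - m1) ** 2 / v1) + b * (-0.5 * (grid - m2) ** 2 / v2)
    fused = np.exp(log_fused - log_fused.max())  # stabilise before normalising
    fused /= fused.sum() * dx
    mean = np.sum(grid * fused) * dx
    var = np.sum((grid - mean) ** 2 * fused) * dx
    return mean, var
```

For N(0, 1) and N(2, 4) with equal weights, both functions give mean 0.4 and variance 1.6; the product (a = b = 1) keeps the mean at 0.4 but halves the variance to 0.8.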
  • Item
    Domain-centric ADAS Datasets
    (CEUR-WS, 2023) Diviš, Václav; Schuster, Tobias; Hrúz, Marek
    Since the rise of deep learning methods in the automotive field, multiple initiatives have been collecting datasets to train neural networks for different levels of autonomous driving. This requires collecting relevant data and precisely annotating objects, which should represent uniformly distributed features for each specific use case. In this paper, we analyze several large-scale autonomous driving datasets with 2D and 3D annotations with regard to their appearance statistics and their suitability for training robust object detection neural networks. We found that despite the huge effort spent on driving hundreds of hours in different regions of the world, hardly any attention is paid to analyzing the quality of the collected data from an operational domain perspective. The analysis of safety-relevant aspects of autonomous driving functions, in particular trajectory planning in relation to the time-to-collision feature, showed that most datasets lack annotated objects at longer distances and that the distributions of bounding boxes and object positions are unbalanced. We therefore propose a set of rules that help find objects or scenes with inconsistent annotation styles. Lastly, we question the relevance of mean Average Precision (mAP) when it is not related to object size or distance.
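The link between time-to-collision and detection distance can be made concrete with a short Python sketch. It assumes the simple constant-velocity definition TTC = distance / closing speed; the 3 s horizon, the speed and the distance bins used below are illustrative assumptions, not values from the paper, and the function names are hypothetical.

```python
import math

import numpy as np


def kmh_to_mps(speed_kmh):
    """Convert km/h to m/s."""
    return speed_kmh / 3.6


def time_to_collision(distance_m, closing_speed_mps):
    """Constant-velocity TTC; infinite if the gap is not closing."""
    if closing_speed_mps <= 0:
        return math.inf
    return distance_m / closing_speed_mps


def required_detection_range(closing_speed_mps, ttc_s):
    """Distance at which an object must be detected to leave ttc_s of reaction time."""
    return closing_speed_mps * ttc_s


def count_by_distance(distances_m, bin_edges_m):
    """Histogram of annotated-object distances, e.g. to spot a lack of far objects."""
    counts, _ = np.histogram(distances_m, bins=bin_edges_m)
    return counts.tolist()
```

At a closing speed of 130 km/h, a 3 s TTC horizon already requires detection at about 108 m, which is why the distance distribution of annotated objects matters for such use cases.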
  • Item
    Overview of SnakeCLEF 2023: Snake Identification in Medically Important Scenarios
    (CEUR-WS, 2023) Picek, Lukáš; Chamidullin, Rail; Hrúz, Marek; Durso, Andrew M.
    Developing an effective automatic system for snake species identification is of significant importance for biodiversity, conservation, and global health. Snakebites result in over half a million deaths and disabilities worldwide each year, highlighting the urgent need for a system that enhances eco-epidemiological data and improves treatment outcomes, especially in remote regions that lack the necessary expertise and data but have high snake diversity and a high incidence of snakebites. The SnakeCLEF challenge provides an evaluation ground that helps track the performance of AI-driven snake species recognition systems on a global scale. The fourth edition of the SnakeCLEF challenge focuses on (i) evaluation of gradual improvements in automatic snake species identification, (ii) testing worldwide generalization on two specific scenarios, i.e., India and Central America, and (iii) evaluation with uneven costs for different errors, such as mistaking a venomous snake for a harmless one. This paper showcases the vital role of a robust automatic identification system for snakes, particularly in regions with limited resources, and highlights its potential impact on biodiversity conservation and global health outcomes. We report (i) a comprehensive description of the provided data, (ii) an evaluation methodology, (iii) an overview of the submitted methods, and (iv) perspectives derived from the achieved results.
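The uneven-cost evaluation in point (iii) can be illustrated with a small Python sketch. The cost weights and species labels are hypothetical placeholders, not the official SnakeCLEF 2023 metric, which this abstract does not specify; the sketch only shows the idea that mistaking a venomous snake for a harmless one is penalised more heavily than other errors.

```python
from typing import Dict, Mapping, Optional, Sequence

# Hypothetical penalty weights for illustration only.
DEFAULT_COSTS: Dict[str, float] = {
    "venomous_as_harmless": 5.0,  # the most dangerous mistake
    "harmless_as_venomous": 2.0,
    "same_status": 1.0,  # wrong species, but same venom status
}


def mean_error_cost(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    is_venomous: Mapping[str, bool],
    costs: Optional[Mapping[str, float]] = None,
) -> float:
    """Average misclassification cost; correct predictions cost nothing."""
    costs = DEFAULT_COSTS if costs is None else costs
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    total = 0.0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            continue
        if is_venomous[truth] and not is_venomous[pred]:
            total += costs["venomous_as_harmless"]
        elif is_venomous[pred] and not is_venomous[truth]:
            total += costs["harmless_as_venomous"]
        else:
            total += costs["same_status"]
    return total / len(true_labels)
```

With species A and C venomous and B harmless, the predictions A, B, A, A for the true labels A, A, B, C cost 0 + 5 + 2 + 1, i.e. 2.0 on average.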
  • Item
    Object Detection Pipeline Using YOLOv8 for Document Information Extraction
    (CEUR-WS, 2023) Straka, Jakub; Gruber, Ivan
    The extraction of information from semi-structured documents is an ongoing problem. This task is often approached from the perspective of NLP, employing large transformer-based models. In our work, we successfully demonstrated that the Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR) tasks can be effectively addressed as object detection problems using a convolutional neural network (CNN) model. We used a relatively small and fast YOLOv8 model, with which we conducted a series of experiments to explore the impact of different factors on model training. With YOLOv8, we achieved an AP of 0.716 on the KILE task and 0.638 on the LIR task. Our code is available at https://github.com/strakaj/YOLOv8-for-document-understanding.git.
  • Item
    Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak
    (Springer International Publishing, 2023) Lehečka, Jan; Psutka, Josef; Psutka, Josef
    In this paper, we compare several methods of training Slovak speech recognition models based on the Transformer architecture. Specifically, we explore transfer learning from an existing Czech pre-trained Wav2Vec 2.0 model to Slovak. We demonstrate the benefits of the proposed approach on three Slovak datasets. Our Slovak models achieved the best results when the weights were initialized from the Czech model at the beginning of the pre-training phase. Our results show that the knowledge stored in the Czech pre-trained model can be successfully reused to solve tasks in Slovak, outperforming even much larger public multilingual models.
  • Item
    VITS, Tacotron or FastSpeech? Challenging some of the most popular synthesizers
    (Springer, 2023) Matoušek, Jindřich; Tihelka, Daniel; Tihelková, Alice
    The paper presents a comparative study of three neural speech synthesizers, namely VITS, Tacotron 2 and FastSpeech 2, which are among the most popular TTS systems today. Due to their differing nature, they have been tested from several points of view, analysing not only the overall quality of the synthesized speech but also their capability to process either orthographic or phonetic input. The analysis was carried out on two English voices and one Czech voice.
  • Item
    VITS: Quality vs. Speed Analysis
    (Springer International Publishing, 2023) Matoušek, Jindřich; Tihelka, Daniel
    In this paper, we analyze the performance of a modern end-to-end speech synthesis model called Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS). We build on the original VITS model and examine how different modifications to its architecture affect synthetic speech quality and computational complexity. Experiments were carried out with two Czech voices, one male and one female. To assess the quality of speech synthesized by the different modified models, MUSHRA listening tests were performed. Computational complexity was measured as synthesis speed relative to real time. While the original VITS model is still preferred in terms of speech quality, we present a modification of the original structure with a significantly faster response that still provides acceptable output quality. Such a configuration can be used when system response latency is critical.
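Synthesis speed relative to real time is commonly reported as a real-time factor (RTF): processing time divided by the duration of the generated audio, so values below 1 mean faster than real time (the reciprocal is the speed-up). A minimal Python sketch follows; `synthesize` stands for any hypothetical TTS callable that returns audio samples, and the exact measure used in the paper may be defined differently.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Processing time divided by audio duration; below 1 is faster than real time."""
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]], text: str, sample_rate: int
) -> float:
    """Time one call of a (hypothetical) TTS function and return its RTF."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)
```

For example, producing 10 s of 22.05 kHz audio (220,500 samples) in 0.5 s gives an RTF of 0.05, i.e. 20 times faster than real time. In practice one would average over many utterances and exclude warm-up runs.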
  • Item
    Detection of objects and their parts using Transformers
    (Západočeská univerzita v Plzni, 2023) Vyskočil, Jiří
    Standard detection and segmentation methods find objects in an image that can often be clearly distinguished from each other. However, some tasks, e.g. Visual Question Answering, require more detailed descriptions, such as attributes or relations with other objects. In such cases, categories begin to overlap, because a more detailed description can belong to several types of objects; e.g. the leg category can be part of the person category but also of the chair category. In this work, new baseline methods for detecting objects and their parts are created. These methods are based on Transformers, and the classification layer is built in the same way as in the existing methods for the dataset used. Finally, the methods are compared and evaluated. The best-performing Transformer method is DAB-Deformable-DETR, which achieves 35.2 AP for objects and 16.2 AP for parts.
  • Item
    T5G2P: Multilingual Grapheme-to-Phoneme Conversion with Text-to-Text Transfer Transformer
    (Springer, 2023) Řezáčková, Markéta; Frémund, Adam; Švec, Jan; Matoušek, Jindřich
    In recent years, the Text-to-Text Transfer Transformer (T5) neural network has proved powerful for many text-related tasks, including grapheme-to-phoneme (G2P) conversion. The paper describes the training process of T5-base models for several languages. It shows the advantages of training G2P models from such language-specific base models over G2P models fine-tuned from the multilingual base model. The paper also explains the reasons for training G2P models on whole sentences (rather than a dictionary) and evaluates the trained G2P models on unseen sentences and words.
  • Item
    Neural Speech Synthesis with Enriched Phrase Boundaries
    (International Speech Communication Association, 2023) Kunešová, Marie; Matoušek, Jindřich
    Prosodic phrasing is one of the factors influencing the naturalness of synthesized speech. In this paper, we enrich the phonetic representation for neural speech synthesis with additional markers denoting the strength of phrase breaks between words. These markers are assigned to the training data automatically, using our previously introduced model for audio-based phrase boundary detection. We tested the approach with two different resolutions for the break indices: either ten distinct levels (P10) or only four “ToBI-like” levels (P4). Listening tests with two different speaker voices show a statistically significant listener preference for P10 or P4 over the baseline speech synthesis without these markers (P0), although which version is judged better depends on the voice.
  • Item
    KWDOA: Adapted dataset for detection of the direction of arrival of the keyword
    (EasyChair, 2024) Beneš, David; Šmídl, Luboš
    This paper describes a simulated audio dataset of spoken words, designed for microphone arrays, for training and evaluating keyword spotting systems. With this dataset, a neural network can be trained to detect the direction of the speaker. It is an advanced version of the original dataset, with noise added during speech at random locations and in different rooms with different reverberation, so it should be closer to real-world long-range applications. This could pose a new challenge: direction-of-arrival estimation activated by keyword spotting, which we call KWDOA. The dataset could serve as an entry-level resource for microphone array designs.
  • Item
    Will XAI Provide Real Explanation or Just a Plausible Rationalization?
    (Springer, 2023) Ircing, Pavel; Švec, Jan
    This paper discusses the analogies between the mainstream theory of the human mind and the two broad paradigms employed when building artificial intelligence systems. It then considers how those analogies could be used to build truly explainable artificial intelligence (AI) applications. The core part is devoted to the problem of unwanted rationalization, which could disguise the true reasons behind the decisions of explainable AI systems.
  • Item
    Animal Pose Estimation Using Deep Learning Methods
    (Západočeská univerzita v Plzni, 2023) Majer, Tomáš
    Classical tracking methods most often use GPS collars, but this approach is invasive and only allows the movement of captured individuals to be observed. Camera traps are therefore gradually being adopted to track animal movement. The unique coat patterns of some species are used to identify individuals across images. However, the identification process requires an expert who recognizes the individual. If we were able to extract the pattern from an image and compare it with other known patterns, the identification process could be automated. The pattern can be obtained, for example, by estimating the 2D pose of the animal and mapping the texture onto this estimate. The aim of this work is to design a working model for animal pose estimation, focusing on the Eurasian lynx, using deep learning methods.