Using Pre-trained Models for Phoneme Representation in Czech Speech Synthesis
Date issued
2025
Publisher
Západočeská univerzita v Plzni
Abstract
Text-to-speech (TTS) systems, i.e., systems producing artificial speech, represent an important topic in the field of artificial intelligence. Modern approaches based on neural networks reach very good results, almost comparable to real human speech. Nguyen et al. (2023) argue that including a large-scale pre-trained model for phoneme representation in a neural TTS system can further improve the final synthetic speech. We used their pre-trained model called XPhoneBERT to investigate whether it can also enhance the quality of speech synthesis in the Czech language.
Subject(s)
phoneme representation, Czech speech, synthesis