Using Pre-trained Models for Phoneme Representation in Czech Speech Synthesis

Date issued

2025

Publisher

Západočeská univerzita v Plzni

Abstract

Text-to-speech (TTS) systems, i.e., systems producing artificial speech, represent an important topic in the field of artificial intelligence. Modern approaches based on neural networks reach very good results, almost comparable to real human speech. Nguyen et al. (2023) argue that including a large-scale pre-trained model for phoneme representation in a neural TTS system can further improve the final synthetic speech. We used their pre-trained model, called XPhoneBERT, to investigate whether it can also enhance the quality of speech synthesis in the Czech language.

Subject(s)

phoneme representation, Czech speech, synthesis

Citation