Evaluating Phoneme-Level Pretraining in Czech Text-to-Speech Synthesis

Vladař, Lukáš

dc.contributor.author	Vladař, Lukáš
dc.contributor.author	Matoušek, Jindřich
dc.contributor.author	Lehečka, Jan
dc.contributor.author	Řezáčková, Markéta
dc.date.accessioned	2026-04-20T18:05:59Z
dc.date.available	2026-04-20T18:05:59Z
dc.date.issued	2026
dc.date.updated	2026-04-20T18:05:59Z
dc.description.abstract	Pretrained phoneme-level models such as Phoneme-Level BERT (PL-BERT) and XPhoneBERT have shown promising results in enhancing prosody and expressiveness in English TTS systems. However, their effectiveness in less-studied languages with different prosodic characteristics, such as Czech, remains underexplored. This paper investigates their applicability to Czech text-to-speech synthesis by evaluating PL-BERT within the StyleTTS 2 framework and XPhoneBERT within the VITS architecture. We conduct experiments under both high- and low-resource conditions using professionally read Czech news-style speech to determine the benefits of these pretrained phoneme-level models in Czech speech synthesis and to compare them with each other.	en
dc.description.abstract	Modely předtrénované na úrovni fonémů, jako např. Phoneme-Level BERT či XPhoneBERT, prokazují slibné výsledky ve zlepšování prozodie a výrazu anglických systémů TTS. Jejich přínos v méně studovaných jazycích s odlišnými prozodickými charakteristikami—např. v češtině—však zatím není příliš prozkoumán. Tento článek se zabývá jejich použitelností pro syntézu řeči v češtině, konkrétně hodnotí použití modelu PL-BERT v rámci frameworku StyleTTS2 a modelu XPhoneBERT zakomponovaného do architektury VITS. Provedli jsme experimenty při dostatečném i omezeném množství trénovacích dat reprezentovaných profesionálně čtenými zpravodajskými nahrávkami, abychom odhalili výhody těchto modelů předtrénovaných na úrovni fonémů pro českou syntézu řeči a abychom zmíněné modely porovnaly navzájem.	cz
dc.format	12
dc.identifier.document-number	001576343000014
dc.identifier.doi	10.1007/978-3-032-02548-7_14
dc.identifier.isbn	978-3-032-02547-0
dc.identifier.issn	0302-9743
dc.identifier.obd	43947500
dc.identifier.orcid	Vladař, Lukáš 0009-0009-8047-7303
dc.identifier.orcid	Matoušek, Jindřich 0000-0002-7408-7730
dc.identifier.orcid	Lehečka, Jan 0000-0002-3889-8069
dc.identifier.orcid	Řezáčková, Markéta 0000-0002-6194-7826
dc.identifier.uri	http://hdl.handle.net/11025/67717
dc.language.iso	en
dc.project.ID	SGS-2025-011
dc.publisher	Springer
dc.relation.ispartofseries	28th International Conference on Text, Speech, and Dialogue, TSD 2025
dc.subject	phoneme-level pretraining	en
dc.subject	PL-BERT	en
dc.subject	XPhoneBERT	en
dc.subject	VITS	en
dc.subject	StyleTTS 2	en
dc.subject	modely předtrénované na úrovni fonémů	cz
dc.subject	PL-BERT	cz
dc.subject	XPhoneBERT	cz
dc.subject	VITS	cz
dc.subject	StyleTTS 2	cz
dc.title	Evaluating Phoneme-Level Pretraining in Czech Text-to-Speech Synthesis	en
dc.title	Význam modelů předtrénovaných na úrovni fonémů v české syntéze řeči	cz
dc.type	Paper in proceedings (D)
dc.type	PAPER IN PROCEEDINGS
dc.type.status	Published Version
local.files.count	1	*
local.files.size	1084793	*
local.has.files	yes	*
local.identifier.eid	2-s2.0-105014392462

Files

Original bundle

Name: paper.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Conference Papers (KKY)