Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model

Lehečka, Jan

dc.contributor.author	Lehečka, Jan
dc.contributor.author	Hanzlíček, Zdeněk
dc.contributor.author	Matoušek, Jindřich
dc.contributor.author	Tihelka, Daniel
dc.date.accessioned	2025-06-20T08:35:46Z
dc.date.available	2025-06-20T08:35:46Z
dc.date.issued	2024
dc.date.updated	2025-06-20T08:35:46Z
dc.description.abstract	In this paper, we experimented with the SpeechT5 model pre-trained on large-scale datasets. We pre-trained the foundation model from scratch and fine-tuned it on a large-scale robust multi-speaker text-to-speech (TTS) task. We tested the model capabilities in a zero- and few-shot scenario. Based on two listening tests, we evaluated the synthetic audio quality and the similarity of how synthetic voices resemble real voices. Our results showed that the SpeechT5 model can generate a synthetic voice for any speaker using only one minute of the target speaker's data. We successfully demonstrated the high quality and similarity of our synthetic voices on publicly known Czech politicians and celebrities.	en
dc.format	12
dc.identifier.document-number	001307848400005
dc.identifier.doi	10.1007/978-3-031-70566-3_5
dc.identifier.isbn	978-3-031-70565-6
dc.identifier.issn	0302-9743
dc.identifier.obd	43944108
dc.identifier.orcid	Lehečka, Jan 0000-0002-3889-8069
dc.identifier.orcid	Hanzlíček, Zdeněk 0000-0002-4001-9289
dc.identifier.orcid	Matoušek, Jindřich 0000-0002-7408-7730
dc.identifier.orcid	Tihelka, Daniel 0000-0002-3149-2330
dc.identifier.uri	http://hdl.handle.net/11025/60313
dc.language.iso	en
dc.project.ID	90254
dc.project.ID	GA22-27800S
dc.publisher	Springer International Publishing
dc.relation.ispartofseries	27th International Conference on Text, Speech, and Dialogue, TSD 2024
dc.subject	multi-speaker TTS	en
dc.subject	speechT5	en
dc.subject	few-shot TTS	en
dc.subject	zero-shot TTS	en
dc.title	Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model	en
dc.type	Conference proceedings paper (D)
dc.type	CONFERENCE PROCEEDINGS PAPER
dc.type.status	Published Version
local.files.count	1	*
local.files.size	288363	*
local.has.files	yes	*
local.identifier.eid	2-s2.0-85204384157

Files

Original bundle

Name: 978-3-031-70566-3_5.pdf
Size: 281.6 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Conference Papers (KKY)