Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model

dc.contributor.author: Lehečka, Jan
dc.contributor.author: Hanzlíček, Zdeněk
dc.contributor.author: Matoušek, Jindřich
dc.contributor.author: Tihelka, Daniel
dc.date.accessioned: 2025-06-20T08:35:46Z
dc.date.available: 2025-06-20T08:35:46Z
dc.date.issued: 2024
dc.date.updated: 2025-06-20T08:35:46Z
dc.description.abstract: In this paper, we experimented with the SpeechT5 model pre-trained on large-scale datasets. We pre-trained the foundation model from scratch and fine-tuned it on a large-scale robust multi-speaker text-to-speech (TTS) task. We tested the model capabilities in a zero- and few-shot scenario. Based on two listening tests, we evaluated the synthetic audio quality and the similarity of how synthetic voices resemble real voices. Our results showed that the SpeechT5 model can generate a synthetic voice for any speaker using only one minute of the target speaker's data. We successfully demonstrated the high quality and similarity of our synthetic voices on publicly known Czech politicians and celebrities. [en]
dc.format: 12
dc.identifier.document-number: 001307848400005
dc.identifier.doi: 10.1007/978-3-031-70566-3_5
dc.identifier.isbn: 978-3-031-70565-6
dc.identifier.issn: 0302-9743
dc.identifier.obd: 43944108
dc.identifier.orcid: Lehečka, Jan 0000-0002-3889-8069
dc.identifier.orcid: Hanzlíček, Zdeněk 0000-0002-4001-9289
dc.identifier.orcid: Matoušek, Jindřich 0000-0002-7408-7730
dc.identifier.orcid: Tihelka, Daniel 0000-0002-3149-2330
dc.identifier.uri: http://hdl.handle.net/11025/60313
dc.language.iso: en
dc.project.ID: 90254
dc.project.ID: GA22-27800S
dc.publisher: Springer International Publishing
dc.relation.ispartofseries: 27th International Conference on Text, Speech, and Dialogue, TSD 2024
dc.subject: multi-speaker TTS [en]
dc.subject: speechT5 [en]
dc.subject: few-shot TTS [en]
dc.subject: zero-shot TTS [en]
dc.title: Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model [en]
dc.type: Paper in proceedings (D)
dc.type: PAPER IN PROCEEDINGS
dc.type.status: Published Version
local.files.count: 1 [*]
local.files.size: 288363 [*]
local.has.files: yes [*]
local.identifier.eid: 2-s2.0-85204384157

Files

Original bundle
Name: 978-3-031-70566-3_5.pdf
Size: 281.6 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission