VITS: Quality vs. Speed Analysis

dc.contributor.author: Matoušek, Jindřich
dc.contributor.author: Tihelka, Daniel
dc.date.accessioned: 2025-06-20T08:55:16Z
dc.date.available: 2025-06-20T08:55:16Z
dc.date.issued: 2023
dc.date.updated: 2025-06-20T08:55:16Z
dc.description.abstract: In this paper, we analyze the performance of a modern end-to-end speech synthesis model called Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS). We build on the original VITS model and examine how different modifications to its architecture affect synthetic speech quality and computational complexity. Experiments with two Czech voices, a male and a female, were carried out. To assess the quality of speech synthesized by the different modified models, MUSHRA listening tests were performed. The computational complexity was measured in terms of synthesis speed over real time. While the original VITS model is still preferred regarding speech quality, we present a modification of the original structure with a significantly better response yet providing acceptable output quality. Such a configuration can be used when system response latency is critical. (en)
dc.format: 12
dc.identifier.doi: 10.1007/978-3-031-40498-6_19
dc.identifier.isbn: 978-3-031-40497-9
dc.identifier.issn: 0302-9743
dc.identifier.obd: 43940620
dc.identifier.orcid: Matoušek, Jindřich 0000-0002-7408-7730
dc.identifier.orcid: Tihelka, Daniel 0000-0002-3149-2330
dc.identifier.uri: http://hdl.handle.net/11025/61564
dc.language.iso: en
dc.project.ID: TL05000546
dc.publisher: Springer International Publishing
dc.relation.ispartofseries: 26th International Conference on Text, Speech, and Dialogue, TSD 2023
dc.subject: neural speech synthesis (en)
dc.subject: End-to-end modeling (en)
dc.subject: variational autoencoder (en)
dc.subject: VITS (en)
dc.subject: speed optimization (en)
dc.title: VITS: Quality vs. Speed Analysis (en)
dc.type: Proceedings paper (D)
dc.type: PROCEEDINGS PAPER
dc.type.status: Published Version
local.files.count: 1
local.files.size: 458168
local.has.files: yes
local.identifier.eid: 2-s2.0-85172002512

Files

Original bundle
Name: 978-3-031-40498-6_19.pdf
Size: 447.43 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission
Description: