VITS, Tacotron or FastSpeech? Challenging some of the most popular synthesizers
Date issued
2023
Publisher
Springer
Abstract
The paper presents a comparative study of three neural speech synthesizers, namely VITS, Tacotron 2 and FastSpeech 2, which are among the most popular TTS systems today. Because the systems differ in nature, they were tested from several points of view, analysing not only the overall quality of the synthesized speech but also their ability to process either orthographic or phonetic input. The analysis was carried out on two English voices and one Czech voice.
Subject(s)
text-to-speech synthesis, VITS, FastSpeech2, Tacotron2