Effects of Training Strategies and the Amount of Speech Data on the Quality of Speech Synthesis
Date issued
2024
Publisher
Springer International Publishing
Abstract
During the development of a speech synthesizer, we often face a lack of training data. This paper investigates how the amount of data used to train a speech synthesizer affects the quality of the resulting synthetic speech. To answer this question, we trained multiple VITS synthesizers on different amounts of training data and compared them using listening tests and the mel-cepstral distortion (MCD) objective measure. Furthermore, we compared three training strategies: training a speech synthesizer from scratch, fine-tuning a single-speaker model, and fine-tuning a multi-speaker model.
Subject(s)
fine-tuning, speech synthesis, training data, VITS