Effects of Training Strategies and the Amount of Speech Data on the Quality of Speech Synthesis

Vladař, Lukáš

Files

978-3-031-70566-3_9.pdf (408.5 KB)

Date issued

2024

Authors

Vladař, Lukáš

Matoušek, Jindřich

Publisher

Springer International Publishing

Abstract

During the development of a speech synthesizer, we often face a lack of training data. This paper investigates how the amount of data used to train a speech synthesizer affects the quality of the resulting synthetic speech. To answer this question, we trained multiple VITS synthesizers on different amounts of training data and compared them using listening tests and the mel-cepstral distortion (MCD) objective measure. Furthermore, we compared three training strategies: training a speech synthesizer from scratch, fine-tuning a single-speaker model, and fine-tuning a multi-speaker model.

Subject(s)

fine-tuning, speech synthesis, training data, VITS

Item identifier

http://hdl.handle.net/11025/60332
https://doi.org/10.1007/978-3-031-70566-3_9

Collections

Conference Papers (KKY)
