Sentences vs Phrases in Neural Speech Synthesis

Tihelka, Daniel

Files

978-3-031-70566-3_4.pdf (231.54 KB)

Date issued

2024

Publisher

Springer International Publishing

Abstract

Neural network-based TTS models are usually trained on, and run inference over, whole sentences or, more generally, longer chunks of speech. However, such long chunks may hurt the responsiveness of a TTS system when latency must be kept as low as possible. We present experiments with shorter chunks, namely phrases, and examine how speech quality is affected when various combinations of chunk length are used for training and inference in the VITS synthesizer.

Subject(s)

phrase, sentence, neural text-to-speech, VITS

Item identifier

http://hdl.handle.net/11025/60351
https://doi.org/10.1007/978-3-031-70566-3_4

Collections

Conference papers (NTIS)
