On Comparison of Phonetic Representations for Czech Neural Speech Synthesis
| dc.contributor.author | Matoušek, Jindřich | |
| dc.contributor.author | Tihelka, Daniel | |
| dc.date.accessioned | 2023-01-16T11:00:16Z | |
| dc.date.available | 2023-01-16T11:00:16Z | |
| dc.date.issued | 2022 | |
| dc.description.abstract-translated | In this paper, we investigate two research questions related to the phonetic representation of input text in Czech neural speech synthesis: 1) whether we can afford to reduce the phonetic alphabet, and 2) whether we can remove pauses from phonetic transcription and let the speech synthesis model predict the pause positions itself. In our experiments, three different modern speech synthesis models (FastSpeech 2 + Multi-band MelGAN, Glow-TTS + UnivNet, and VITS) were employed. We have found that the reduced phonetic alphabet outperforms the traditionally used full phonetic alphabet. On the other hand, removing pauses does not help. The presence of pauses (predicted by an external pause prediction tool) in phonetic transcription leads to a slightly better quality of synthetic speech. | en |
| dc.format | 13 pp. | en |
| dc.format.mimetype | application/pdf | |
| dc.identifier.citation | MATOUŠEK, J., TIHELKA, D. On Comparison of Phonetic Representations for Czech Neural Speech Synthesis. In Text, Speech, and Dialogue 25th International Conference, TSD 2022, Brno, Czech Republic, September 6–9, 2022, Proceedings. Cham: Springer International Publishing, 2022. pp. 410-422. ISBN: 978-3-031-16269-5, ISSN: 0302-9743 | en |
| dc.identifier.doi | 10.1007/978-3-031-16270-1_34 | |
| dc.identifier.isbn | 978-3-031-16269-5 | |
| dc.identifier.issn | 0302-9743 | |
| dc.identifier.obd | 43936699 | |
| dc.identifier.uri | 2-s2.0-85139064069 | |
| dc.identifier.uri | http://hdl.handle.net/11025/50927 | |
| dc.language.iso | en | en |
| dc.project.ID | 90140/Velká výzkumná infrastruktura_(J) - e-INFRA CZ | cs |
| dc.project.ID | TL05000546/Využití multimediálního výkladového slovníku pro moderní výuku češtiny | cs |
| dc.publisher | Springer International Publishing | en |
| dc.relation.ispartofseries | Text, Speech, and Dialogue 25th International Conference, TSD 2022, Brno, Czech Republic, September 6–9, 2022, Proceedings | en |
| dc.rights | The full text is accessible within the university to logged-in users. | en |
| dc.rights | © Springer Nature Switzerland AG | en |
| dc.rights.access | restrictedAccess | en |
| dc.subject.translated | neural speech synthesis | en |
| dc.subject.translated | phonetic representation | en |
| dc.subject.translated | phonetic reductions | en |
| dc.subject.translated | pause modeling | en |
| dc.subject.translated | Czech language | en |
| dc.title | On Comparison of Phonetic Representations for Czech Neural Speech Synthesis | en |
| dc.type | conference paper | en |
| dc.type | ConferenceObject | en |
| dc.type.status | Peer-reviewed | en |
| dc.type.version | publishedVersion | en |
Files
Original bundle
- Name: Matousek_Tihelka-On_Compariso_of_Phonetic_Representations_TSD_2022.pdf
- Size: 271.71 KB
- Format: Adobe Portable Document Format