Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations

Přibil, Jiří

dc.contributor.author	Přibil, Jiří
dc.contributor.author	Přibilová, Anna
dc.contributor.author	Matoušek, Jindřich
dc.date.accessioned	2021-02-08T11:00:24Z
dc.date.available	2021-02-08T11:00:24Z
dc.date.issued	2020
dc.description.abstract	Kvalita syntézy řeči je zásadním problémem při porovnávání různých systémů převodu textu na řeč (TTS). Navrhli jsme systém pro automatické hodnocení kvality řeči pomocí statistické analýzy časových příznaků (doba trvání, frázování a časové členění analyzované věty) spolu se standardními spektrálními a prozodickými příznaky. Tento systém byl úspěšně testován na větách produkovaných syntetizátorem řeči založeným na principu výběru jednotek s mužským i ženským hlasem s využitím dvou různých přístupů k manipulaci prozodie. Experimenty ukázaly, že pro správné a stabilní výsledky jsou všechny tři typy řečových příznaků (spektrální, prozodické a časové) nezbytné. Počet použitých statistických parametrů má navíc významný dopad na správnost a přesnost hodnocených výsledků. Bylo také prokázáno, že stabilitu celého procesu hodnocení lze vylepšit rozšířením použitého řečového materiálu. Funkčnost navrhovaného systému byla nakonec ověřena porovnáním s výsledky standardního poslechového testu.	cs
dc.description.abstract-translated	Quality of speech synthesis is a crucial issue in comparison of various text-to-speech (TTS) systems. We proposed a system for automatic evaluation of speech quality by statistical analysis of temporal features (time duration, phrasing, and time structuring of an analysed sentence) together with standard spectral and prosodic features. This system was successfully tested on sentences produced by a unit selection speech synthesizer with a male as well as a female voice using two different approaches to prosody manipulation. Experiments have shown that for correct, sharp, and stable results all three types of speech features (spectral, prosodic, and temporal) are necessary. Furthermore, the number of used statistical parameters has a significant impact on the correctness and precision of the evaluated results. It was also demonstrated that the stability of the whole evaluation process is improved by enlarging the used speech material. Finally, the functionality of the proposed system was verified by comparison of the results with those of the standard listening test.	en
dc.format	9 pp.	en
dc.format.mimetype	application/pdf
dc.identifier.citation	PŘIBIL, J., PŘIBILOVÁ, A., MATOUŠEK, J. Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations. Journal of Electrical Engineering, 2020, vol. 71, no. 2, pp. 78-86. ISSN 1335-3632.	en
dc.identifier.document-number	536287900002
dc.identifier.doi	10.2478/jee-2020-0012
dc.identifier.issn	1335-3632
dc.identifier.obd	43929603
dc.identifier.uri	2-s2.0-85085749611
dc.identifier.uri	http://hdl.handle.net/11025/42609
dc.language.iso	en	en
dc.project.ID	GA19-19324S/Fully trainable synthesis of Czech speech from text using deep neural networks (Plně trénovatelná syntéza české řeči z textu s využitím hlubokých neuronových sítí)	en
dc.publisher	De Gruyter	en
dc.relation.ispartofseries	Journal of ELECTRICAL ENGINEERING	en
dc.rights	© De Gruyter	en
dc.rights.access	openAccess	en
dc.subject	poslechový test	cs
dc.subject	objektivní a subjektivní hodnocení	cs
dc.subject	kvalita syntetické řeči	cs
dc.subject	statistická analýza	cs
dc.subject.translated	listening test	en
dc.subject.translated	objective and subjective evaluation	en
dc.subject.translated	quality of synthetic speech	en
dc.subject.translated	statistical analysis	en
dc.title	Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations	en
dc.title.alternative	Automatická statistická evaluace kvality syntézy řeči výběrem jednotek s různými prozodickými manipulacemi	cs
dc.type	článek	cs
dc.type	article	en
dc.type.status	Peer-reviewed	en
dc.type.version	publishedVersion	en

Collections

OBD
Articles (KKY)
