Ensemble of Deep Neural Network Models for MOS Prediction

Kunešová, Marie

dc.contributor.author	Kunešová, Marie
dc.contributor.author	Matoušek, Jindřich
dc.contributor.author	Lehečka, Jan
dc.contributor.author	Švec, Jan
dc.contributor.author	Michálek, Josef
dc.contributor.author	Tihelka, Daniel
dc.contributor.author	Bulín, Martin
dc.contributor.author	Hanzlíček, Zdeněk
dc.contributor.author	Řezáčková, Markéta
dc.date.accessioned	2025-06-20T08:23:49Z
dc.date.available	2025-06-20T08:23:49Z
dc.date.issued	2023
dc.date.updated	2025-06-20T08:23:49Z
dc.description.abstract	Automatic evaluation of the quality of synthetic speech has the potential to serve as a cheaper and less time-consuming alternative to standard listening tests. In this paper, we present our contribution to the ongoing research: a system for automatic prediction of the mean opinion score (MOS) given by human listeners. The system was specifically developed for the recent VoiceMOS Challenge. Following the success of fusion systems in similar challenges, our contribution is an ensemble that interpolates the outputs of seven different models: four different wav2vec models, a CNN-RNN model, QuartzNet, and the LDNet baseline. During the VoiceMOS Challenge, our system achieved the second-best utterance-level MSE of 0.171 and ranked between 2nd and 8th place among all 22 participating teams in terms of the other evaluation metrics.	en
dc.description.abstract	Automatické hodnocení kvality syntetické řeči má potenciál stát se levnější a méně časově náročnou alternativou ke standardním poslechovým testům. V tomto článku představujeme náš příspěvek k probíhajícímu výzkumu: systém pro automatickou predikci mean opinion score (MOS) daného lidskými posluchači. Systém byl speciálně vyvinut pro nedávnou soutěž VoiceMOS Challenge. V návaznosti na úspěch kombinovaných systémů v podobných soutěžích je náš systém koncipován jako ensemble interpolující výstupy sedmi různých modelů: čtyři různé wav2vec modely, CNN-RNN model, QuartzNet a soutěžní baseline LDNet. Během soutěže VoiceMOS náš systém dosáhl druhého nejlepšího výsledku z hlediska MSE na úrovni nahrávek - 0.171 - a podle ostatních vyhodnocovacích metrik se umístil mezi 2. a 8. místem z 22 účastnících se týmů.	cz
dc.format	5
dc.identifier.doi	10.1109/ICASSP49357.2023.10095676
dc.identifier.isbn	978-1-72816-327-7
dc.identifier.issn	1520-6149
dc.identifier.obd	43939754
dc.identifier.orcid	Kunešová, Marie 0000-0002-7187-8481
dc.identifier.orcid	Matoušek, Jindřich 0000-0002-7408-7730
dc.identifier.orcid	Lehečka, Jan 0000-0002-3889-8069
dc.identifier.orcid	Švec, Jan 0000-0001-8362-5927
dc.identifier.orcid	Michálek, Josef 0000-0001-7757-3163
dc.identifier.orcid	Tihelka, Daniel 0000-0002-3149-2330
dc.identifier.orcid	Bulín, Martin 0000-0003-0276-3143
dc.identifier.orcid	Hanzlíček, Zdeněk 0000-0002-4001-9289
dc.identifier.orcid	Řezáčková, Markéta 0000-0002-6194-7826
dc.identifier.uri	http://hdl.handle.net/11025/59575
dc.language.iso	en
dc.project.ID	SGS-2022-017
dc.project.ID	GA22-27800S
dc.publisher	IEEE
dc.relation.ispartofseries	48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)
dc.subject	MOS prediction	en
dc.subject	speech quality assessment	en
dc.subject	speech synthesis	en
dc.subject	mean opinion score	en
dc.subject	predikce MOS	cz
dc.subject	hodnocení kvality řeči	cz
dc.subject	syntéza řeči	cz
dc.subject	mean opinion score	cz
dc.title	Ensemble of Deep Neural Network Models for MOS Prediction	en
dc.title	Ensemble modelů hlubokých neuronových sítí pro predikci MOS	cz
dc.type	Article in proceedings (D)
dc.type	ARTICLE IN PROCEEDINGS
dc.type.status	Published Version
local.files.count	1	*
local.files.size	1099105	*
local.has.files	yes	*
local.identifier.eid	2-s2.0-85177577272

Files

Original bundle

Name: Kunesova_Ensemble_of_Deep_Neural_Network_Models_for_MOS_Prediction.pdf
Size: 1.05 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission

Collections

Conference Papers (KKY)