Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024

dc.contributor.author: Kunešová, Marie
dc.contributor.author: Pražák, Aleš
dc.contributor.author: Lehečka, Jan
dc.date.accessioned: 2026-05-15T09:12:40Z
dc.date.available: 2026-05-15T09:12:40Z
dc.date.issued: 2026
dc.description.abstract-translated: We present a system for non-intrusive prediction of speech quality in noisy and enhanced speech, developed for Track 3 of the VoiceMOS 2024 Challenge. The task required estimating the ITU-T P.835 metrics SIG, BAK, and OVRL without reference signals and with only 100 subjectively labeled utterances for training. Our approach uses wav2vec 2.0 with a two-stage transfer learning strategy: initial fine-tuning on automatically labeled noisy data, followed by adaptation to the challenge data. The system achieved the best performance on BAK prediction (LCC = 0.867) and a very close second place in OVRL (LCC = 0.711) in the official evaluation. Post-challenge experiments show that adding artificially degraded data to the first fine-tuning stage substantially improves SIG prediction, raising correlation with ground-truth scores from 0.207 to 0.516. These results demonstrate that transfer learning with targeted data generation is effective for predicting P.835 scores under severe data constraints. [en]
dc.description.sponsorship: EH23_021/0008436 VaV technologií pro pokročilou digitalizaci v plzeňské metropolitní oblasti (DigiTech) [cs]
dc.format: 6 s. [cs]
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1109/ICASSP55912.2026.11463227
dc.identifier.isbn: 979-8-3315-6701-9
dc.identifier.issn: 1520-6149
dc.identifier.uri: http://hdl.handle.net/11025/68048
dc.language.iso: en [en]
dc.publisher: IEEE [en]
dc.rights: © IEEE [en]
dc.rights.access: openAccess [en]
dc.subject: wav2vec 2.0 [cs]
dc.subject: ITU-T P.835 [cs]
dc.subject: VoiceMOS Challenge [cs]
dc.subject: posouzení kvality řeči [cs]
dc.subject.translated: speech quality assessment [en]
dc.subject.translated: wav2vec 2.0 [en]
dc.subject.translated: ITU-T P.835 [en]
dc.subject.translated: VoiceMOS Challenge [en]
dc.title: Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024 [en]
dc.type: postprint [en]
dc.type: postprint [cs]
dc.type.status: Peer reviewed [en]
dc.type.version: acceptedVersion [en]
local.files.count: 1 [*]
local.files.size: 243773 [*]
local.has.files: yes [*]

Files

Original bundle
Name: 2026_ICASSP_VoiceMOS_Track_3_post-print.pdf
Size: 238.06 KB
Format: Adobe Portable Document Format