A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives

Lehečka, Jan

dc.contributor.author	Lehečka, Jan
dc.contributor.author	Psutka, Josef
dc.contributor.author	Šmídl, Luboš
dc.contributor.author	Ircing, Pavel
dc.contributor.author	Psutka, Josef
dc.date.accessioned	2025-06-20T08:35:46Z
dc.date.available	2025-06-20T08:35:46Z
dc.date.issued	2024
dc.date.updated	2025-06-20T08:35:46Z
dc.description.abstract	In this paper, we compare monolingual Wav2Vec 2.0 models with various multilingual models to determine whether multilingual models can improve speech recognition performance on a unique oral history archive containing many mixed-language sentences. Our main goal is to advance research on this dataset, which is an extremely valuable part of our cultural heritage. Our results suggest that monolingual speech recognition models are, in most cases, superior to multilingual models, even when processing the oral history archive full of mixed-language sentences from non-native speakers. We also ran the same experiments on the public CommonVoice dataset to verify our results. We contribute to the research community by publicly releasing our pre-trained models.	en
dc.format	5
dc.identifier.document-number	001331850101086
dc.identifier.doi	10.21437/Interspeech.2024-472
dc.identifier.isbn	not specified
dc.identifier.issn	2308-457X
dc.identifier.obd	43944107
dc.identifier.orcid	Lehečka, Jan 0000-0002-3889-8069
dc.identifier.orcid	Psutka, Josef 0000-0003-4761-1645
dc.identifier.orcid	Šmídl, Luboš 0000-0002-8169-2410
dc.identifier.orcid	Ircing, Pavel 0000-0001-6967-1687
dc.identifier.orcid	Psutka, Josef 0000-0002-0764-3207
dc.identifier.uri	http://hdl.handle.net/11025/60312
dc.language.iso	en
dc.project.ID	90254
dc.project.ID	VJ01010108
dc.publisher	International Speech Communication Association (ISCA)
dc.relation.ispartofseries	25th Interspeech Conference 2024
dc.subject	speech recognition	en
dc.subject	bilingual models	en
dc.subject	trilingual models	en
dc.subject	oral history archives	en
dc.title	A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives	en
dc.type	Conference proceedings paper (D)
dc.type	CONFERENCE PROCEEDINGS PAPER
dc.type.status	Published Version
local.files.count	1	*
local.files.size	238126	*
local.has.files	yes	*
local.identifier.eid	2-s2.0-85208505026

Files

Original bundle

Name: lehecka24_interspeech.pdf
Size: 232.54 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Conference Papers (KKY)