A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives
| dc.contributor.author | Lehečka, Jan | |
| dc.contributor.author | Psutka, Josef | |
| dc.contributor.author | Šmídl, Luboš | |
| dc.contributor.author | Ircing, Pavel | |
| dc.contributor.author | Psutka, Josef | |
| dc.date.accessioned | 2025-06-20T08:35:46Z | |
| dc.date.available | 2025-06-20T08:35:46Z | |
| dc.date.issued | 2024 | |
| dc.date.updated | 2025-06-20T08:35:46Z | |
| dc.description.abstract | In this paper, we compare monolingual Wav2Vec 2.0 models with various multilingual models to see whether multilingual models can improve speech recognition performance on a unique oral history archive containing many mixed-language sentences. Our main goal is to advance research on this dataset, which is an extremely valuable part of our cultural heritage. Our results suggest that monolingual speech recognition models are, in most cases, superior to multilingual models, even when processing an oral history archive full of mixed-language sentences from non-native speakers. We also performed the same experiments on the public CommonVoice dataset to verify our results. We contribute to the research community by releasing our pre-trained models to the public. | en |
| dc.format | 5 | |
| dc.identifier.document-number | 001331850101086 | |
| dc.identifier.doi | 10.21437/Interspeech.2024-472 | |
| dc.identifier.isbn | not stated | |
| dc.identifier.issn | 2308-457X | |
| dc.identifier.obd | 43944107 | |
| dc.identifier.orcid | Lehečka, Jan 0000-0002-3889-8069 | |
| dc.identifier.orcid | Psutka, Josef 0000-0003-4761-1645 | |
| dc.identifier.orcid | Šmídl, Luboš 0000-0002-8169-2410 | |
| dc.identifier.orcid | Ircing, Pavel 0000-0001-6967-1687 | |
| dc.identifier.orcid | Psutka, Josef 0000-0002-0764-3207 | |
| dc.identifier.uri | http://hdl.handle.net/11025/60312 | |
| dc.language.iso | en | |
| dc.project.ID | 90254 | |
| dc.project.ID | VJ01010108 | |
| dc.publisher | International Speech Communication Association (ISCA) | |
| dc.relation.ispartofseries | 25th Interspeech Conference 2024 | |
| dc.subject | speech recognition | en |
| dc.subject | bilingual models | en |
| dc.subject | trilingual models | en |
| dc.subject | oral history archives | en |
| dc.title | A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives | en |
| dc.type | Paper in conference proceedings (D) | |
| dc.type.status | Published Version | |
| local.files.count | 1 | * |
| local.files.size | 238126 | * |
| local.has.files | yes | * |
| local.identifier.eid | 2-s2.0-85208505026 |
Files
Original bundle
- Name: lehecka24_interspeech.pdf
- Size: 232.54 KB
- Format: Adobe Portable Document Format