Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech

Lehečka, Jan

dc.contributor.author	Lehečka, Jan
dc.contributor.author	Švec, Jan
dc.contributor.author	Pražák, Aleš
dc.contributor.author	Psutka, Josef
dc.date.accessioned	2023-01-30T11:00:27Z
dc.date.available	2023-01-30T11:00:27Z
dc.date.issued	2022
dc.description.abstract-translated	In this paper, we present our progress in pretraining Czech monolingual audio transformers from a large dataset containing more than 80 thousand hours of unlabeled speech, and subsequently fine-tuning the model on automatic speech recognition tasks using a combination of in-domain data and almost 6 thousand hours of out-of-domain transcribed speech. We are presenting a large palette of experiments with various fine-tuning setups evaluated on two public datasets (CommonVoice and VoxPopuli) and one extremely challenging dataset from the MALACH project. Our results show that monolingual Wav2Vec 2.0 models are robust ASR systems, which can take advantage of large labeled and unlabeled datasets and successfully compete with state-of-the-art LVCSR systems. Moreover, Wav2Vec models proved to be good zero-shot learners when no training data are available for the target ASR task.	en
dc.format	5 pp.	cs
dc.format.mimetype	application/pdf
dc.identifier.citation	LEHEČKA, J. ŠVEC, J. PRAŽÁK, A. PSUTKA, J. Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. New York: Red Hook, 2022. pp. 1831-1835. ISBN: not stated, ISSN: 2308-457X	cs
dc.identifier.doi	10.21437/Interspeech.2022-10439
dc.identifier.isbn	not stated
dc.identifier.issn	2308-457X
dc.identifier.obd	43936705
dc.identifier.uri	2-s2.0-85139048808
dc.identifier.uri	http://hdl.handle.net/11025/51163
dc.language.iso	en	en
dc.project.ID	90140/Large research infrastructure_(J) - e-INFRA CZ	cs
dc.project.ID	GA22-27800S/Using multimodal Transformers for more natural spoken dialogue	cs
dc.project.ID	EF17_048/0007267/InteCom: R&D of intelligent components of advanced technologies for the Pilsen metropolitan area	cs
dc.publisher	International Speech Communication Association	en
dc.relation.ispartofseries	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	en
dc.rights	Full text is not accessible.	cs
dc.rights	© 2022 ISCA	en
dc.rights.access	closedAccess	en
dc.subject.translated	speech recognition, audio transformers, Wav2Vec	en
dc.title	Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech	en
dc.type	conference paper	cs
dc.type	ConferenceObject	en
dc.type.status	Peer-reviewed	en
dc.type.version	publishedVersion	en

Files

Original bundle

Name: Lehecka_Svec_Prazak_PsutkaJV-Exploring_Capabilties_Interspeech_2022.pdf
Size: 197.58 KB
Format: Adobe Portable Document Format

Collections

OBD
Articles (KKY)
Articles (NTIS)