Transformer-Based Automatic Punctuation Prediction and Word Casing Reconstruction of the ASR Output

dc.contributor.author: Švec, Jan
dc.contributor.author: Lehečka, Jan
dc.contributor.author: Šmídl, Luboš
dc.contributor.author: Ircing, Pavel
dc.date.accessioned: 2022-03-28T10:00:27Z
dc.date.available: 2022-03-28T10:00:27Z
dc.date.issued: 2021
dc.description.abstract-translated: The paper proposes a module for automatic punctuation prediction and casing reconstruction based on transformer architectures (BERT/T5), which constitute the current state of the art in many similar NLP tasks. The main motivation for our work was to increase the readability of ASR output, which usually takes the form of a continuous stream of text without punctuation marks and with all words in lowercase. The resulting punctuation and casing reconstruction module is evaluated on both written text and actual ASR output in three languages (English, Czech and Slovak). [en]
dc.format: 9 pp. [cs]
dc.format.mimetype: application/pdf
dc.identifier.citation: ŠVEC, J. LEHEČKA, J. ŠMÍDL, L. IRCING, P. Transformer-Based Automatic Punctuation Prediction and Word Casing Reconstruction of the ASR Output. In Text, Speech, and Dialogue 24th International Conference, TSD 2021, Olomouc, Czech Republic, September 6–9, 2021, Proceedings. Cham: Springer International Publishing, 2021. pp. 86-94. ISBN: 978-3-030-83526-2, ISSN: 0302-9743 [cs]
dc.identifier.doi: 10.1007/978-3-030-83527-9_7
dc.identifier.isbn: 978-3-030-83526-2
dc.identifier.issn: 0302-9743
dc.identifier.obd: 43933408
dc.identifier.uri: 2-s2.0-85115216462
dc.identifier.uri: http://hdl.handle.net/11025/47244
dc.language.iso: en [en]
dc.project.ID: TN01000024/Národní centrum kompetence - Kybernetika a umělá inteligence [cs]
dc.project.ID: 90140/Velká výzkumná infrastruktura_(J) - e-INFRA CZ [cs]
dc.publisher: Springer International Publishing [en]
dc.relation.ispartofseries: Text, Speech, and Dialogue 24th International Conference, TSD 2021, Olomouc, Czech Republic, September 6–9, 2021, Proceedings [en]
dc.rights: The full text is accessible within the university to logged-in users. [cs]
dc.rights: © Springer [en]
dc.rights.access: restrictedAccess [en]
dc.subject.translated: ASR [en]
dc.subject.translated: BERT [en]
dc.subject.translated: T5 [en]
dc.subject.translated: Punctuation predictor [en]
dc.subject.translated: Word casing reconstruction [en]
dc.title: Transformer-Based Automatic Punctuation Prediction and Word Casing Reconstruction of the ASR Output [en]
dc.type: conference paper [cs]
dc.type: ConferenceObject [en]
dc.type.status: Peer-reviewed [en]
dc.type.version: publishedVersion [en]

Files

Original bundle
Name: Svec_Transformer-BasedAutomatic_TSD2021.pdf
Size: 10.05 MB
Format: Adobe Portable Document Format