T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion
Date issued
2024
Abstract
The present paper explores the use of several deep neural network architectures for grapheme-to-phoneme (G2P) conversion, aiming to find a universal, language-independent approach to the task. The models are trained on whole sentences so that they automatically capture cross-word context (such as voicing assimilation) where it exists in the given language. Four languages, English, Czech, Russian, and German, were chosen because of their differing characteristics and requirements for the G2P task. Ultimately, the Text-to-Text Transfer Transformer (T5) based model achieved very high conversion accuracy on all tested languages. It also exceeded the accuracy of a similar system when trained on the public LibriSpeech database.
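To illustrate the sentence-level seq2seq formulation described in the abstract, the sketch below fine-tunes a T5 model to map grapheme sentences to phoneme strings using the Hugging Face transformers and datasets libraries. The checkpoint name (t5-small), the toy training pairs, and all hyperparameters are illustrative assumptions, not the paper's actual data or setup.

```python
# Minimal sketch: sentence-level G2P with a T5-style seq2seq model.
# Checkpoint, example data, and hyperparameters are placeholders.
from transformers import (
    T5ForConditionalGeneration,
    T5Tokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)
from datasets import Dataset

# Toy grapheme -> phoneme sentence pairs (hypothetical examples);
# training on whole sentences exposes the model to cross-word context.
pairs = [
    {"graphemes": "the cat sat", "phonemes": "DH AH0 . K AE1 T . S AE1 T"},
    {"graphemes": "read the book", "phonemes": "R IY1 D . DH AH0 . B UH1 K"},
]
dataset = Dataset.from_list(pairs)

model_name = "t5-small"  # placeholder checkpoint
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def preprocess(batch):
    # Encode the grapheme sentence as the source and the phoneme string as the target.
    inputs = tokenizer(batch["graphemes"], truncation=True, max_length=128)
    targets = tokenizer(text_target=batch["phonemes"], truncation=True, max_length=128)
    inputs["labels"] = targets["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="t5-g2p-sketch",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),  # pads labels per batch
)
trainer.train()

# Inference: convert an unseen sentence to a phonemic transcription.
ids = tokenizer("the cat read", return_tensors="pt").input_ids
print(tokenizer.decode(model.generate(ids, max_length=128)[0], skip_special_tokens=True))
```

The same recipe applies to any of the studied languages: only the phoneme inventory of the training pairs changes, which is what makes the sentence-level T5 formulation language-independent.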
Subject(s)
CNN, Czech, English, G2P, German, phonetic transcription, RNN, Russian, T5