Enhancing Masked Language Modeling in BERT Models Using Pretrained Static Embeddings

Mištera, Adam

Files

Mištera, Král Enhancing_Masked_Language_Modeling_in_BERT_Models_Using_Pretrained_Static_Embeddings.pdf (458.11 KB)

Date issued

2026

Authors

Mištera, Adam

Král, Pavel

Publisher

Springer

Abstract

This paper explores integrating pretrained static fastText word vectors into a simplified Transformer-based model to improve its efficiency and accuracy. Although static embeddings have been outperformed by large Transformer-based models, they can still contribute useful linguistic information when combined with contextual models, especially in low-resource or computationally constrained environments. We demonstrate this by incorporating static embeddings directly into our own models based on BERT-Tiny before pretraining them with masked language modeling. We train the models on seven languages covering three distinct language families. The results show that using static fastText embeddings in these models not only improves convergence for all tested languages but also significantly improves evaluation accuracy.
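The abstract does not say how the static vectors enter the model. One common option, shown here only as an illustrative assumption and not as the authors' method, is to initialise the token embedding table from the pretrained vectors before masked language modeling starts, and to fall back to small random values for tokens that have no pretrained vector. The function name, the toy vocabulary, and the 4-dimensional vectors below are all hypothetical; real fastText vectors are much larger and would typically be loaded from a file.

```python
import random


def build_embedding_table(vocab, static_vectors, dim, seed=0):
    """Build one embedding row per vocabulary token (hypothetical helper).

    Tokens present in `static_vectors` copy the pretrained vector.
    All other tokens (for example special tokens such as [MASK]) are
    initialised with small Gaussian noise.
    Returns the table and the number of tokens covered by static vectors.
    """
    rng = random.Random(seed)
    table = []
    covered = 0
    for token in vocab:
        vec = static_vectors.get(token)
        if vec is not None:
            if len(vec) != dim:
                raise ValueError(f"vector for {token!r} has wrong dimension")
            table.append(list(vec))  # copy so the source vectors stay untouched
            covered += 1
        else:
            table.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return table, covered


# Toy data standing in for pretrained static vectors (assumption, not real values).
toy_static = {
    "cat": [0.1, 0.2, 0.3, 0.4],
    "dog": [0.2, 0.1, 0.4, 0.3],
}
toy_vocab = ["[MASK]", "cat", "dog", "xyzzy"]

table, covered = build_embedding_table(toy_vocab, toy_static, dim=4)
print(f"{covered} of {len(toy_vocab)} tokens initialised from static vectors")
```

In a real setup the resulting table would be copied into the model's input embedding layer, possibly through a projection if the static dimension differs from the model's hidden size; whether the paper does this is not stated in the abstract.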

Subject(s)

transformers, embeddings, pretraining

Item identifier

http://hdl.handle.net/11025/67839
https://doi.org/10.1007/978-3-032-02551-7_19

Collections

Conference Papers (KIV)
