Multi-label Classification and Named Entity Recognition for Historical Documents

Gruber, Ivan

Date issued

2025

Publisher

Springer

Abstract

In this paper, we present improvements to our processing pipeline for historical document digitization. The original pipeline is extended with two new functionalities: page labeling and named entity recognition. We treat page labeling as a multi-label classification task, for which we choose the Query2Label approach. Query2Label is evaluated on our internal NKVD dataset and reaches a mean average precision (mAP) of 80.03% on the test set. For the named entity recognition task, we use pre-trained transformer-based DeepPavlov models and benchmark them on two entity types: person names and locations. The best model achieves promising results despite not having been trained on our data at all.
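The abstract reports page-labeling quality as mean average precision (mAP). As a hedged illustration of how this metric is commonly computed for multi-label classification, the sketch below assumes the usual macro definition: non-interpolated average precision per label, then averaged over labels that have at least one positive example. The function names and toy scores are hypothetical, and the paper's own evaluation code may differ (for example in tie handling or in how labels without positives are treated).

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], targets: Sequence[int]) -> float:
    """Non-interpolated AP for one label: mean of precision@k at each rank k holding a positive."""
    # Rank pages by descending confidence; the sort is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if targets[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    if hits == 0:
        raise ValueError("AP is undefined for a label with no positive samples")
    return precision_sum / hits


def mean_average_precision(scores: List[List[float]], targets: List[List[int]]) -> float:
    """Macro mAP: AP per label (column), averaged over labels that have positives (assumption)."""
    num_labels = len(scores[0])
    aps = []
    for label in range(num_labels):
        col_scores = [row[label] for row in scores]
        col_targets = [row[label] for row in targets]
        if any(col_targets):  # skip labels with no positive pages
            aps.append(average_precision(col_scores, col_targets))
    return sum(aps) / len(aps)


# Hypothetical toy data: 4 pages, 2 labels; each row holds per-label scores / 0-1 targets.
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
targets = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(f"{mean_average_precision(scores, targets):.4f}")  # prints 0.9167
```

Because mAP is computed from the ranking of scores, it does not depend on any decision threshold; this makes it a common choice for comparing multi-label models such as Query2Label independently of how the final label sets are cut off.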

Subject(s)

multi-label classification, named entity recognition, historical documents

Item identifier

http://hdl.handle.net/11025/67464
https://doi.org/10.1007/978-3-031-81010-7_2

Collections

Conference papers (NTIS)
