On self-supervision in historical handwritten document segmentation
Date issued
2025
Abstract
Historical document analysis plays a crucial role in understanding and preserving our past. However, this task is often hindered by challenges such as limited annotated training data and the diverse nature of historical handwritten documents. In this paper, we explore the potential of self-supervised learning (SSL) in historical document analysis, with a particular focus on historical handwritten document segmentation, to overcome the need for extensive annotated data while enhancing efficiency and robustness. We present an overview of SSL methods suitable for historical document analysis and discuss their potential applications and benefits. Furthermore, we present an approach for SSL in the document domain, considering various setups, augmentations, and resolutions. We also provide experimental results that demonstrate its feasibility and effectiveness. Our findings indicate that most document segmentation tasks can be effectively addressed using SSL features, highlighting the potential of SSL to advance historical document analysis and pave the way for more efficient and robust document processing workflows.
Subject(s)
historical handwritten document, self-supervised learning, document digitization, semantic segmentation