Exploring Oral History Archives Using State-of-the-Art Artificial Intelligence Methods

Bulín, Martin

dc.contributor.author	Bulín, Martin
dc.contributor.author	Švec, Jan
dc.contributor.author	Ircing, Pavel
dc.contributor.author	Frémund, Adam
dc.contributor.author	Polák, Filip
dc.date.accessioned	2026-04-02T18:05:30Z
dc.date.available	2026-04-02T18:05:30Z
dc.date.issued	2025
dc.date.updated	2026-04-02T18:05:30Z
dc.description.abstract	Background: The preservation and analysis of spoken data in oral history archives, such as Holocaust testimonies, provide a vast and complex knowledge source. These archives pose unique challenges and opportunities for computational methods, particularly in self-supervised learning and information retrieval. Objective: This study explores the application of state-of-the-art artificial intelligence (AI) models, particularly transformer-based architectures, to enhance navigation and engagement with large-scale oral history testimonies. The goal is to improve accessibility while preserving the authenticity and integrity of historical records. Methods: We developed an asking questions framework utilizing a fine-tuned T5 model to generate contextually relevant questions from interview transcripts. To ensure semantic coherence, we introduced a semantic continuity model based on a BERT-like architecture trained with contrastive loss. Results: The system successfully generated contextually relevant questions from oral history testimonies, enhancing user navigation and engagement. Filtering techniques improved question quality by retaining only semantically coherent outputs, ensuring alignment with the testimony content. The approach demonstrated effectiveness in handling spontaneous, unstructured speech, with a significant improvement in question relevance compared to models trained on structured text. Applied to real-world interview transcripts, the framework balanced enrichment of user experience with preservation of historical authenticity. Conclusion: By integrating generative AI models with robust retrieval techniques, we enhance the accessibility of oral history archives while maintaining their historical integrity. This research demonstrates how AI-driven approaches can facilitate interactive exploration of vast spoken data repositories, benefiting researchers, historians and the general public.	en
dc.description.abstract	Souvislosti: Uchovávání a analýza mluvených dat v archivech orální historie, jako jsou svědectví o holocaustu, poskytuje rozsáhlý a komplexní zdroj znalostí. Tyto archivy představují jedinečné výzvy a příležitosti pro výpočetní metody, zejména v oblasti samostudia a vyhledávání informací. Cíl: Tato studie zkoumá aplikaci nejmodernějších modelů umělé inteligence (AI), zejména architektur založených na Transformerech, pro zlepšení navigace a zapojení do rozsáhlých svědectví orální historie. Cílem je zlepšit dostupnost a zároveň zachovat autenticitu a integritu historických záznamů. Metody: Vyvinuli jsme rámec pro kladení otázek s využitím vyladěného modelu T5 pro generování kontextově relevantních otázek z přepisů rozhovorů. Pro zajištění sémantické koherence jsme zavedli model sémantické kontinuity založený na architektuře podobné BERT, trénované s kontrastivní ztrátou. Výsledky: Systém úspěšně generoval kontextově relevantní otázky z svědectví orální historie, čímž zlepšil navigaci a zapojení uživatelů. Techniky filtrování zlepšily kvalitu otázek tím, že zachovaly pouze sémanticky koherentní výstupy a zajistily soulad s obsahem svědectví. Tento přístup prokázal účinnost při zpracování spontánní, nestrukturované řeči s výrazným zlepšením relevance otázek ve srovnání s modely trénovanými na strukturovaném textu. Aplikováno na přepisy rozhovorů z reálného světa, tento rámec vyvažoval obohacení uživatelské zkušenosti se zachováním historické autenticity. Závěr: Integrací generativních modelů umělé inteligence s robustními technikami vyhledávání zlepšujeme dostupnost archivů orální historie a zároveň zachováváme jejich historickou integritu. Tento výzkum ukazuje, jak přístupy založené na umělé inteligenci mohou usnadnit interaktivní průzkum rozsáhlých úložišť mluvených dat, což je přínosem pro výzkumníky, historiky i širokou veřejnost.	cz
dc.format	8
dc.identifier.document-number	001538926300003
dc.identifier.doi	10.18267/j.aip.268
dc.identifier.issn	1805-4951
dc.identifier.obd	43947249
dc.identifier.orcid	Bulín, Martin 0000-0003-0276-3143
dc.identifier.orcid	Švec, Jan 0000-0001-8362-5927
dc.identifier.orcid	Ircing, Pavel 0000-0001-6967-1687
dc.identifier.orcid	Frémund, Adam 0000-0001-8780-6629
dc.identifier.orcid	Polák, Filip 0009-0003-3969-3772
dc.identifier.uri	http://hdl.handle.net/11025/67489
dc.language.iso	en
dc.project.ID	GA22-27800S
dc.relation.ispartofseries	Acta Informatica Pragensia
dc.rights.access	A
dc.subject	AI	en
dc.subject	machine learning in digital humanities	en
dc.subject	oral history archives	en
dc.subject	transformer-based models	en
dc.subject	umělá inteligence	cz
dc.subject	strojové učení v digitálních humanitních vědách	cz
dc.subject	archivy orální historie	cz
dc.subject	modely založené na Transformer architektuře	cz
dc.title	Exploring Oral History Archives Using State-of-the-Art Artificial Intelligence Methods	en
dc.title	Průzkum archivů orální historie s využitím nejmodernějších metod umělé inteligence	cz
dc.type	Article in the WoS database (Jimp)
dc.type	ARTICLE
dc.type.status	Published Version
local.files.count	1	*
local.files.size	866799	*
local.has.files	yes	*
local.identifier.eid	2-s2.0-105011686403

Files

Original bundle

Name: AIP_aip-202502-0006.pdf
Size: 846.48 KB
Format: Adobe Portable Document Format

Collections

Articles (KKY)