COMICORDA: Dialogue Act Recognition in Comic Books

dc.contributor.author: Martínek, Jiří
dc.contributor.author: Baloun, Josef
dc.contributor.author: Prantl, Martin
dc.contributor.author: Lenc, Ladislav
dc.contributor.author: Král, Pavel
dc.date.accessioned: 2025-06-20T08:38:52Z
dc.date.available: 2025-06-20T08:38:52Z
dc.date.issued: 2024
dc.date.updated: 2025-06-20T08:38:52Z
dc.description.abstract: Dialogue act (DA) recognition is usually realized from a speech signal that is transcribed and segmented into text. However, little work exists on DA recognition from images. Therefore, this paper concentrates on this modality and presents a novel DA recognition approach for image documents, namely comic books. To the best of our knowledge, this is the first study investigating dialogue acts in comic books, and it represents a first step toward building a model for comic book understanding. The proposed method is composed of the following steps: speech balloon segmentation, optical character recognition (OCR), and DA recognition itself. We use YOLOv8 for balloon segmentation, Google Vision for OCR, and Transformer-based models for DA classification. The experiments are performed on a newly created dataset comprising 1,438 annotated comic panels. It contains bounding boxes, transcriptions, and dialogue act annotations. We achieve nearly 98% average precision for speech balloon segmentation and exceed 70% accuracy on the DA recognition task. We also present an analysis of dialogue structure in the comics domain and compare it with standard DA datasets, which is another contribution of this paper. [en]
dc.format: 13
dc.identifier.isbn: 978-2-493-81410-4
dc.identifier.issn: 2951-2093
dc.identifier.obd: 43943632
dc.identifier.orcid: Martínek, Jiří 0000-0003-2981-1723
dc.identifier.orcid: Baloun, Josef 0000-0003-1923-5355
dc.identifier.orcid: Prantl, Martin 0000-0002-7900-5028
dc.identifier.orcid: Lenc, Ladislav 0000-0002-1066-7269
dc.identifier.orcid: Král, Pavel 0000-0002-3096-675X
dc.identifier.uri: http://hdl.handle.net/11025/60609
dc.language.iso: en
dc.project.ID: SGS-2022-016
dc.publisher: ELRA and ICCL
dc.relation.ispartofseries: Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
dc.subject: comics processing [en]
dc.subject: dialogue act recognition [en]
dc.subject: OCR [en]
dc.subject: speech balloon segmentation [en]
dc.subject: YOLOv8 [en]
dc.title: COMICORDA: Dialogue Act Recognition in Comic Books [en]
dc.type: Article in proceedings (D)
dc.type: ARTICLE IN PROCEEDINGS
dc.type.status: Published Version
local.files.count: 1 [*]
local.files.size: 694869 [*]
local.has.files: yes [*]
local.identifier.eid: 2-s2.0-85195960167

Files

Original bundle
Name: 2024.lrec-main.316.pdf
Size: 678.58 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission