Transformer-Based Encoder-Encoder Architecture for Spoken Term Detection

dc.contributor.author: Švec, Jan
dc.contributor.author: Šmídl, Luboš
dc.contributor.author: Lehečka, Jan
dc.date.accessioned: 2025-06-20T08:43:52Z
dc.date.available: 2025-06-20T08:43:52Z
dc.date.issued: 2023
dc.date.updated: 2025-06-20T08:43:52Z
dc.description.abstract: The paper presents a method for spoken term detection based on the Transformer architecture. We propose the encoder-encoder architecture employing two BERT-like encoders with additional modifications, including attention masking, convolutional and upsampling layers. The encoders project a recognized hypothesis and a searched term into a shared embedding space, where the score of the putative hit is computed using the calibrated dot product. In the experiments, we used the Wav2Vec 2.0 speech recognizer. The proposed system outperformed a baseline method based on deep LSTMs on the English and Czech STD datasets based on USC Shoah Foundation Visual History Archive (MALACH). [en]
dc.format: 12
dc.identifier.doi: 10.1007/978-3-031-47665-5_28
dc.identifier.isbn: 978-3-031-47664-8
dc.identifier.issn: 0302-9743
dc.identifier.obd: 43940821
dc.identifier.orcid: Švec, Jan 0000-0001-8362-5927
dc.identifier.orcid: Šmídl, Luboš 0000-0002-8169-2410
dc.identifier.orcid: Lehečka, Jan 0000-0002-3889-8069
dc.identifier.uri: http://hdl.handle.net/11025/60796
dc.language.iso: en
dc.project.ID: GA22-27800S
dc.project.ID: VJ01010108
dc.publisher: Springer
dc.relation.ispartofseries: 7th Asian Conference on Pattern Recognition (ACPR 2023)
dc.subject: neural networks [en]
dc.subject: transformer architecture [en]
dc.subject: spoken term detection [en]
dc.title: Transformer-Based Encoder-Encoder Architecture for Spoken Term Detection [en]
dc.type: Article in proceedings (D)
dc.type: ARTICLE IN PROCEEDINGS
dc.type.status: Published Version
local.files.count: 1 [*]
local.files.size: 1010436 [*]
local.has.files: yes [*]
local.identifier.eid: 2-s2.0-85177433088

Files

Original bundle
Name: Svec_Smidl_Lehecka_Transformer-Based_Encoder-Encoder_Architecture_for_Spoken_Term_Detection_2023.pdf
Size: 986.75 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission