Comparing and combining modeling techniques for sentence segmentation of spoken Czech using textual and prosodic information

dc.contributor.author: Kolář, Jáchym
dc.contributor.author: Liu, Yang
dc.date.accessioned: 2016-01-08T06:21:52Z
dc.date.available: 2016-01-08T06:21:52Z
dc.date.issued: 2010
dc.description.abstract-translated: This paper deals with automatic sentence boundary detection in spoken Czech using both textual and prosodic information. This task is important for making automatic speech recognition (ASR) output more readable and easier for downstream language processing modules to handle. We compare and combine three statistical models: hidden Markov model, maximum entropy, and adaptive boosting. We evaluate these methods on two Czech corpora, broadcast news and broadcast conversations, using both manual and ASR transcripts. Our experiments show that superior results are achieved when all three models are combined via posterior probability interpolation, and that there are substantial differences among the three methods when using different knowledge sources, as well as across genres. Feature analysis also reveals significant differences in prosodic feature usage patterns between the two genres. [en]
dc.format: 4 s. (4 pages) [cs]
dc.format.mimetype: application/pdf
dc.identifier.citation: KOLÁŘ, Jáchym; LIU, Yang. Comparing and combining modeling techniques for sentence segmentation of spoken Czech using textual and prosodic information. In: Proceedings of the conference Speech Prosody 2010, 11th-14th May 2010, Chicago, USA. Chicago: University of Illinois, 2010, p. [1-4]. [en]
dc.identifier.uri: http://www.kky.zcu.cz/cs/publications/JachymKolar_2010_Comparingand
dc.identifier.uri: http://hdl.handle.net/11025/17173
dc.language.iso: en [en]
dc.publisher: University of Illinois [en]
dc.rights: © Jáchym Kolář - Yang Liu [cs]
dc.rights.access: openAccess [en]
dc.subject: segmentace vět [cs]
dc.subject: prozodie [cs]
dc.subject: HMM [cs]
dc.subject: maximální entropie [cs]
dc.subject: posílení [cs]
dc.subject.translated: sentence segmentation [en]
dc.subject.translated: prosody [en]
dc.subject.translated: HMM [en]
dc.subject.translated: maximum entropy [en]
dc.subject.translated: boosting [en]
dc.title: Comparing and combining modeling techniques for sentence segmentation of spoken Czech using textual and prosodic information [en]
dc.type: článek [cs]
dc.type: article [en]
dc.type.status: Peer-reviewed [en]
dc.type.version: publishedVersion [en]

Files

Original bundle
Name: JachymKolar_2010_Comparingand.pdf
Size: 64.59 KB
Format: Adobe Portable Document Format
Description: Full text