Czech spontaneous speech corpus with structural metadata

Kolář, Jáchym

dc.contributor.author	Kolář, Jáchym
dc.contributor.author	Švec, Jan
dc.contributor.author	Strassel, Stephanie
dc.contributor.author	Walker, Christopher
dc.contributor.author	Kozlíková, Dagmar
dc.contributor.author	Psutka, Josef
dc.date.accessioned	2016-01-06T08:58:52Z
dc.date.available	2016-01-06T08:58:52Z
dc.date.issued	2005
dc.description.abstract	This article describes a Czech spontaneous speech corpus consisting of recordings of radio discussion programs. As the first complete non-English MDE corpus, it has been annotated with structural metadata, which increase the readability of transcripts for human readers and also enable further automatic processing. The annotation comprises the segmentation of transcripts into syntactic-semantic units and the identification of fillers and disfluencies. In addition to modifications needed only for Czech, we also propose some language-independent modifications, such as limited prosodic labeling at the boundaries of syntactic-semantic units.	en
dc.description.abstract-translated	This paper describes a Czech spontaneous speech corpus consisting of radio talk show recordings. As the first complete non-English MDE corpus, it has been annotated with structural metadata information beyond the words that is critical to both increasing transcript readability and allowing application of downstream NLP methods. Metadata annotation involves partitioning verbatim transcripts into syntactic/semantic units (SUs) that function to express a complete idea, and identifying fillers and edit disfluencies. Annotation guidelines for English metadata developed by the Linguistic Data Consortium were taken as the starting point, with changes applied to accommodate specific phenomena of Czech. In addition to the necessary language-dependent modifications, we further propose some language-independent modifications, including limited prosodic labeling at SU boundaries. Statistics about the structural metadata annotation present in the corpus and inter-annotator agreement numbers are also presented.	en
dc.format	4 pp.	en
dc.format.mimetype	application/pdf
dc.identifier.citation	KOLÁŘ, Jáchym; ŠVEC, Jan; STRASSEL, Stephanie; WALKER, Christopher; KOZLÍKOVÁ, Dagmar; PSUTKA, Josef. Czech spontaneous speech corpus with structural metadata. In: Proceedings of ICSPL 2005: 6th Annual Conference of the International Speech Communication Association 2005, Lisboa, Portugal, 4-8 September 2005. [Baixas]: ISCA, 2005, p. 1165-1168. ISSN 1990-9772.	en
dc.identifier.issn	1990-9772
dc.identifier.uri	http://www.kky.zcu.cz/cs/publications/KolarJ_2005_Czechspontaneous
dc.identifier.uri	http://hdl.handle.net/11025/17115
dc.language.iso	en	en
dc.publisher	ISCA	en
dc.rights	© Jáchym Kolář - Jan Švec - Stephanie Strassel - Christopher Walker - Dagmar Kozlíková - Josef Psutka	cs
dc.rights.access	openAccess	en
dc.subject.translated	structural metadata	en
dc.subject.translated	spontaneous speech	en
dc.subject.translated	disfluency	en
dc.subject.translated	fillers	en
dc.title	Czech spontaneous speech corpus with structural metadata	en
dc.title.alternative	Czech spontaneous speech corpus with structural metadata annotation	en
dc.type	article	en
dc.type.status	Peer-reviewed	en
dc.type.version	publishedVersion	en

Files

Original bundle

Name:: KolarJ_2005_Czechspontaneous.pdf
Size:: 80.02 KB
Format:: Adobe Portable Document Format
Description:: Full text

License bundle

Name:: license.txt
Size:: 1.71 KB
Format:: Item-specific license agreed to upon submission
Description:

Collections

Articles (KKY)