Evaluation Datasets for Cross-lingual Semantic Textual Similarity

Hercig, Tomáš

dc.contributor.author	Hercig, Tomáš
dc.contributor.author	Král, Pavel
dc.date.accessioned	2022-03-21T11:00:18Z
dc.date.available	2022-03-21T11:00:18Z
dc.date.issued	2021
dc.description.abstract	Systémy sémantické textové podobnosti (STS) odhadují míru významové podobnosti mezi dvěma větami. Mezijazyčné systémy STS odhadují míru významové podobnosti mezi dvěma větami, z nichž každá je v jiném jazyce. Nejmodernější algoritmy obvykle využívají přístupy s učitelem, které je obtížné použít pro jazyky s nedostatečnými zdroji. Každý přístup však musí mít k vyhodnocení výsledků anotovaná data. V tomto článku představujeme nové anotované datasety pro vícejazyčné a jednojazyčné STS pro jazyky, kde takové sady zatím nejsou k dispozici. Na těchto datech dále prezentujeme výsledky několika nejmodernějších metod, které lze použít jako základ pro další výzkum. Věříme, že tento článek nejen rozšíří současný výzkum STS pro další jazyky, ale také podpoří soutěž na těchto nových hodnotících datech.	cs
dc.description.abstract-translated	Semantic textual similarity (STS) systems estimate the degree of semantic similarity between two sentences. Cross-lingual STS systems estimate the degree of semantic similarity between two sentences, each in a different language. State-of-the-art algorithms usually employ a strongly supervised, resource-rich approach that is difficult to use for poorly resourced languages. However, any approach needs evaluation data to confirm its results. To simplify evaluation for languages that lack STS evaluation datasets, we present new datasets for cross-lingual and monolingual STS for these languages. We also present the results of several state-of-the-art methods on these data, which can be used as a baseline for further research. We believe that this article will not only extend current STS research to other languages, but will also encourage competition on this new evaluation data.	en
dc.format	6 s.	cs
dc.format.mimetype	application/pdf
dc.identifier.citation	HERCIG, T., KRÁL, P. Evaluation Datasets for Cross-lingual Semantic Textual Similarity. In Deep Learning for Natural Language Processing Methods and Applications. Shoumen: INCOMA, Ltd., 2021. s. 524-529. ISBN: 978-954-452-072-4, ISSN: 1313-8502	cs
dc.identifier.doi	10.26615/978-954-452-072-4_059
dc.identifier.isbn	978-954-452-072-4
dc.identifier.issn	1313-8502
dc.identifier.obd	43934750
dc.identifier.uri	2-s2.0-85123631732
dc.identifier.uri	http://hdl.handle.net/11025/47197
dc.language.iso	en	en
dc.project.ID	EF17_048/0007267/InteCom: VaV inteligentních komponent pokročilých technologií pro plzeňskou metropolitní oblast	cs
dc.publisher	INCOMA, Ltd.	en
dc.relation.ispartofseries	Deep Learning for Natural Language Processing Methods and Applications	en
dc.rights	© Incoma Ltd.	en
dc.rights.access	openAccess	en
dc.subject	Datová sada	cs
dc.subject	Evaluace	cs
dc.subject	mezijazyková	cs
dc.subject	sémantická textová podobnost	cs
dc.subject	STS	cs
dc.subject.translated	cross-lingual	en
dc.subject.translated	dataset	en
dc.subject.translated	evaluation	en
dc.subject.translated	Semantic Textual Similarity	en
dc.subject.translated	STS	en
dc.title	Evaluation Datasets for Cross-lingual Semantic Textual Similarity	en
dc.title.alternative	Hodnotící datové sady pro mezijazykovou sémantickou textovou podobnost	cs
dc.type	konferenční příspěvek	cs
dc.type	ConferenceObject	en
dc.type.status	Peer-reviewed	en
dc.type.version	publishedVersion	en

Files

Original bundle

Name: 2021.ranlp-main.59.pdf
Size: 173.99 KB
Format: Adobe Portable Document Format

Collections

OBD
Conference Papers (KIV)