Metody statistické sémantické analýzy

Steinberger, David

dc.contributor.advisor	Konopík Miloslav, Ing. Ph.D.
dc.contributor.author	Steinberger, David
dc.contributor.referee	Zelinka Jan, Ing. PhD.
dc.date.accepted	2016-9-6
dc.date.accessioned	2017-02-21T08:28:12Z
dc.date.available	2015-9-1
dc.date.available	2017-02-21T08:28:12Z
dc.date.issued	2016
dc.date.submitted	2016-6-23
dc.description.abstract	Tato práce se zabývá statistickou sémantickou podobností a zaměřuje se na nástroj word2vec. Byla navržena rozšíření s ohledem na český jazyk založená na stemmování a n-gramech znaků. Výsledky této práce podávají na českém jazyce o 12% lepší výsledky než původní model. Na anglickém jazyce bylo dosaženo zlepšení o 3%. Nový model poskytuje dobré výsledky i při velmi malém množství trénovacích dat. V rámci práce byly vytvořeny dva trénovací korpusy a jedna obsáhlá testovací datová sada založená na podobnosti dvojic slov. Sada byla získána z 9 různých zdrojů dvojic slov, obsahuje slova v kontextech, odlišuje podobnost a souvislost slov. Výsledná mezi anotátorská shoda dosáhla korelaci 0,81, která je plně srovnatelná s anglickými datovými sadami.	cs
dc.description.abstract-translated	The thesis deals with statistical semantic similarity and focuses on the word2vec tool. It introduces extensions for the Czech language based on stemming and character n-grams. The resulting model improves on the original tool by 12% on Czech and by 3% on English. The new model provides good results even with very small amounts of training data. The thesis also introduces two new training corpora and one large test dataset based on the similarity of word pairs. The dataset is compiled from 9 different sources of word pairs, contains words in their contexts, and distinguishes between the similarity and the relatedness of words. The final inter-annotator agreement reaches a correlation of 0.81, which is fully comparable with English datasets.	en
dc.description.result	Obhájeno	cs
dc.format	ii s., 60 s., XI s.	cs
dc.format.mimetype	application/pdf
dc.identifier	68335
dc.identifier.uri	http://hdl.handle.net/11025/23695
dc.language.iso	cs	cs
dc.publisher	Západočeská univerzita v Plzni	cs
dc.rights	Plný text práce je přístupný bez omezení.	cs
dc.rights.access	openAccess	en
dc.subject	word2vec	cs
dc.subject	distribuční hypotéza	cs
dc.subject	zpracování přirozeného jazyka	cs
dc.subject	sémantická podobnost	cs
dc.subject	umělé neuronové sítě	cs
dc.subject	sémantické vektory slov	cs
dc.subject.translated	word2vec	en
dc.subject.translated	vector space model	en
dc.subject.translated	distributional hypothesis	en
dc.subject.translated	nlp	en
dc.subject.translated	semantic similarity	en
dc.subject.translated	artificial neural networks	en
dc.subject.translated	word embeddings	en
dc.thesis.degree-grantor	Západočeská univerzita v Plzni. Fakulta aplikovaných věd	cs
dc.thesis.degree-level	Navazující	cs
dc.thesis.degree-name	Ing.	cs
dc.thesis.degree-program	Inženýrská informatika	cs
dc.title	Metody statistické sémantické analýzy	cs
dc.title.alternative	Statistical Semantic Analysis Methods	en
dc.type	diplomová práce	cs
local.relation.IS	https://portal.zcu.cz/StagPortletsJSR168/CleanUrl?urlid=prohlizeni-prace-detail&praceIdno=68335

Files

Original bundle

Name: D.Steinberger.Metody.statisticke.semanticke.analyzy.pdf
Size: 1.99 MB
Format: Adobe Portable Document Format
Description: Full text of the thesis

Name: A13N0095Pposudek-op.PDF
Size: 593.82 KB
Format: Adobe Portable Document Format
Description: Opponent's review of the thesis

Name: A13N0095Phodnoceni-ved.PDF
Size: 372.23 KB
Format: Adobe Portable Document Format
Description: Supervisor's review of the thesis

Name: A13N0095Pobhajoba.PDF
Size: 203.05 KB
Format: Adobe Portable Document Format
Description: Record of the thesis defense

Collections

Theses (KIV)