Efficient Hash Function for Duplicate Elimination in Dictionaries

dc.contributor.author: Skala, Václav
dc.contributor.author: Hrádek, Jan
dc.date.accessioned: 2014-12-18T09:26:45Z
dc.date.available: 2014-12-18T09:26:45Z
dc.date.issued: 2009
dc.description.abstract: Fast elimination of duplicate data is needed in many areas, especially in the context of textual data. A solution to this problem was recently found for geometrical data, using a hash function to speed up the process. Using a hash function is extremely efficient when incremental elimination is required, especially when processing large data sets. In this paper a new construction of the hash function is presented, yielding short clusters with only a few collisions. The proposed hash function is not a perfect hash function, but its properties are similar to those of one. The hash function takes advantage of the relatively large amount of memory available on modern computers and works well with large data sets. Experiments have shown that different approaches should be used for different types of languages, because the structures of Slavonic and Anglo-Saxon languages differ. Tests were therefore made with a Czech dictionary of 2.5 million words and an English dictionary of 130 thousand words. The algorithm was also tested on a few other languages. Experimental results are presented in this paper as well. [en]
dc.format: 10 s. (10 pages) [cs]
dc.format.mimetype: application/pdf
dc.identifier.citation: Algoritmy 2009: 18th Conference on Scientific Computing, p. 382-391. [en]
dc.identifier.isbn: 978-80-227-3032-7
dc.identifier.uri: http://hdl.handle.net/11025/11785
dc.language.iso: en [en]
dc.publisher: Slovenská technická univerzita v Bratislavě [cs]
dc.relation.ispartofseries: Algoritmy 2009 [en]
dc.rights: Plný text není přístupný. (Full text is not accessible.) [cs]
dc.rights.access: closedAccess [en]
dc.subject: hešovací funkce [cs]
dc.subject: hešovací tabulka [cs]
dc.subject: struktura dat [cs]
dc.subject.translated: hash function [en]
dc.subject.translated: hash table [en]
dc.subject.translated: data structure [en]
dc.title: Efficient Hash Function for Duplicate Elimination in Dictionaries [en]
dc.type: preprint [cs]
dc.type: preprint [en]
dc.type.status: Peer-reviewed [en]
dc.type.version: draft [en]
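
Illustrative note: the abstract describes incremental duplicate elimination backed by a hash table with short clusters. The full text is not accessible from this record, so the paper's hash construction is not reproduced here. The Python sketch below only illustrates the general idea under stated assumptions: the class `DedupTable`, the function `eliminate_duplicates`, the polynomial hash with multiplier 31, and the bucket count are all hypothetical choices, not the authors' method.

```python
from typing import Iterable, List


class DedupTable:
    """Fixed-size hash table with separate chaining, used to reject duplicates."""

    def __init__(self, num_buckets: int = 1 << 16) -> None:
        # Assumption: a table that is large relative to the input keeps chains short.
        self.buckets: List[List[str]] = [[] for _ in range(num_buckets)]
        self.size = 0

    def _hash(self, word: str) -> int:
        # Generic polynomial hash; NOT the function proposed in the paper.
        h = 0
        for ch in word:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def add(self, word: str) -> bool:
        """Insert a word; return True if it was new, False if it was a duplicate."""
        chain = self.buckets[self._hash(word)]
        if word in chain:
            return False
        chain.append(word)
        self.size += 1
        return True

    def max_cluster_length(self) -> int:
        """Length of the longest chain, a rough measure of clustering."""
        return max(len(chain) for chain in self.buckets)


def eliminate_duplicates(words: Iterable[str], num_buckets: int = 1 << 16) -> List[str]:
    """Return the words in first-seen order with duplicates dropped incrementally."""
    table = DedupTable(num_buckets)
    return [w for w in words if table.add(w)]
```

Because each word is checked and inserted in a single pass, the input can be processed as a stream; `max_cluster_length()` gives a quick way to compare how different hash functions cluster on a given dictionary.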

Files

Original bundle
Name: Skala_2009_HASH_Dictionary-Algoritmy.pdf
Size: 311.07 KB
Format: Adobe Portable Document Format
Description: Full text