Data advance preparation factors affecting results of sequence rule analysis in web log mining

dc.contributor.author: Munk, Michal
dc.contributor.author: Kapusta, Jozef
dc.contributor.author: Švec, Peter
dc.contributor.author: Turčáni, Milan
dc.date.accessioned: 2016-01-14T09:34:32Z
dc.date.available: 2016-01-14T09:34:32Z
dc.date.issued: 2010
dc.description.abstract-translated: One of the main tasks of web log mining is to discover the behaviour patterns of portal visitors. Based on the discovered patterns of user behaviour, which are represented by sequence rules, it is possible to modify and improve an organisation's website. This article uses an experiment to determine to what degree data preparation is necessary for web log mining and to specify the steps that are essential for obtaining valid data from the log file. The results of the experiment are very important for a portal that is regularly analysed and modified, since they can confirm the correctness of individual steps of the analysis or, by identifying "useless" steps, simplify the advance preparation of data. The results show that cleaning the data of crawler accesses has a significant impact on the quantity of extracted rules only when the path completion method is used. By contrast, an impact on reducing the portion of inexplicable rules, or on the quality of extracted rules in terms of their basic characteristics, was not proved. Path completion proved crucial in data preparation for web log mining: it has a significant impact on both the quantity and the quality of extracted rules. However, taking the browser used into account when identifying sessions has no significant impact on either the quantity or the quality of extracted rules. A number of models exist for identifying user sessions, which are crucial in data preparation; there is, however, also a method that identifies them explicitly. Our next goal is to add this functionality to the existing system and to analyse various parameters of the individual session identification methods in comparison with the reference direct identification.
The article also notes the need to pay attention to analysing web logs in real time, to reduce the time needed for the advance preparation of these logs, and at the same time to increase the accuracy of these data depending on the time of their collection. [en]
dc.format: 18 s. [cs]
dc.format.mimetype: application/pdf
dc.identifier.citation: E+M. Ekonomie a Management = Economics and Management. 2010, č. 4, s. 143-160. [cs]
dc.identifier.issn: 1212-3609 (Print)
dc.identifier.issn: 2336-5604 (Online)
dc.identifier.uri: http://www.ekonomie-management.cz/download/1331826744_e1b0/12_munk.pdf
dc.identifier.uri: http://hdl.handle.net/11025/17373
dc.language.iso: en [en]
dc.publisher: Technická univerzita v Liberci [cs]
dc.relation.ispartofseries: E+M. Ekonomie a Management = Economics and Management [cs]
dc.rights: © Technická univerzita v Liberci [cs]
dc.rights: CC BY-NC 4.0 [cs]
dc.rights.access: openAccess [en]
dc.subject: web log mining [cs]
dc.subject: příprava dat [cs]
dc.subject: hodnocení kvality dat [cs]
dc.subject: analýza sekvenčního pravidla [cs]
dc.subject: vzory [cs]
dc.subject.translated: web log mining [en]
dc.subject.translated: data preparation [en]
dc.subject.translated: data quality assessment [en]
dc.subject.translated: sequence rule analysis [en]
dc.subject.translated: patterns [en]
dc.title: Data advance preparation factors affecting results of sequence rule analysis in web log mining [en]
dc.type: článek [cs]
dc.type: article [en]
dc.type.status: Peer-reviewed [en]
dc.type.version: publishedVersion [en]

Files

Original bundle
Name: 12_munk.pdf
Size: 472.41 KB
Format: Adobe Portable Document Format
Description: Full text