Oct 19, 2019   4:48 a.m. Kristián
Academic information system

Final theses

Basic information

Basic information about a final thesis

Type of thesis: Bachelor thesis
Thesis title: Searching for Thematically Similar Documents
Written by (author): Ing. Barbora Brocková
Department: Institute of Informatics, Information Systems and Software Engineering (FIIT)
Thesis supervisor: Ing. Tomáš Kučečka
Opponent: Mgr. Jozef Tvarožek, PhD.
Final thesis progress: Final thesis was successfully defended.

Additional information

Additional information about the final thesis follows.

Language of final thesis: Slovak

Title of the thesis: Searching for Thematically Similar Documents
Summary: Nowadays, search engines let us search for similar documents by means of keyword comparison and frequent itemsets. Another approach is navigation through the references in a document, where a user can click a referenced document and visit it. In our work we decided to compare two approaches to estimating document similarity: a keyword-based approach and a reference-based approach. We proposed and implemented a method for document comparison based on the computed similarity of the documents' keyword frequencies. We compared the similarities of pairs of referenced documents with the similarities across the document corpus and found that the majority of referenced documents were the most similar documents in the corpus according to keyword frequencies. Furthermore, we explored the effects of altering documents by adding the text surrounding their references. The results showed that the similarity of referenced documents increased, while the similarity of other (non-referenced) documents could also decrease. In addition to document similarity computed with cosine similarity, we organized documents into frequent itemsets using the Apriori algorithm. We considered documents within the same frequent itemset to be similar and found that pairs of referenced documents occurred in the majority of frequent itemsets.
Key words: similarity, clustering, reference
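The summary names two techniques: cosine similarity over keyword-frequency vectors, and frequent itemsets mined with the Apriori algorithm. The following Python sketch illustrates both on a hypothetical three-document corpus. It is not the thesis implementation. The whitespace tokenization, the keyword list, the function names, and especially the transaction encoding (one transaction per keyword, listing the documents that contain it) are all assumptions made for illustration, because the abstract does not say how the vectors or transactions were built.

```python
import math
from collections import Counter
from itertools import combinations

def keyword_frequencies(text, keywords):
    """Count occurrences of each keyword in a lower-cased, whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    return {k: counts[k] for k in keywords if counts[k] > 0}

def cosine_similarity(a, b):
    """Cosine similarity of two sparse frequency vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical toy corpus and keyword list, for illustration only.
keywords = ["apriori", "frequent", "itemsets", "cosine", "similarity", "keyword"]
docs = {
    "A": "Apriori finds frequent itemsets in transactions itemsets",
    "B": "frequent itemsets and apriori support thresholds",
    "C": "cosine similarity of keyword vectors",
}
freqs = {name: keyword_frequencies(text, keywords) for name, text in docs.items()}

# For each document, the other document with the highest cosine similarity.
most_similar = {
    d: max((o for o in docs if o != d), key=lambda o: cosine_similarity(freqs[d], freqs[o]))
    for d in docs
}

# Assumed transaction encoding: one transaction per keyword, holding the
# documents that contain it, so a frequent itemset is a group of documents
# that share keywords.
transactions = [{d for d, f in freqs.items() if kw in f} for kw in keywords]
groups = [set(s) for s in apriori(transactions, min_support=2) if len(s) > 1]

print(round(cosine_similarity(freqs["A"], freqs["B"]), 3))  # 0.943
print(most_similar["A"])  # B
print(groups)
```

In this toy run, documents A and B share keywords. They come out as each other's nearest neighbour and also form a frequent two-document itemset. C shares no keywords with either, so it does not. A realistic setup would likely add stop-word removal or term weighting such as TF-IDF, which this sketch deliberately omits.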

Display and download files

Parts of thesis with postponed release:

Final thesis (final thesis appendices): unlimited
Reviews for final thesis: unlimited