Topic identification method for textual document

Abstract: Topic identification is a crucial task for discovering knowledge from textual documents. Existing methods for topic identification suffer from the word counting problem, as they depend on the most frequent terms in the text to produce the topic keywords. Not all frequent terms are relevant. T...

Bibliographic Details
Main Authors: Jamil, Nurul Syafidah; Ku-Mahamud, Ku Ruhana; Mohamed Din, Aniza
Format: Article
Language: English
Published: JMEST 2017
Subjects: QA76 Computer software
Online Access: https://repo.uum.edu.my/id/eprint/21719/1/JMEST%204%202%202017%206643%206647.pdf
collection UUM
description Abstract: Topic identification is a crucial task for discovering knowledge from textual documents. Existing methods for topic identification suffer from the word counting problem, as they depend on the most frequent terms in the text to produce the topic keywords. Not all frequent terms are relevant. This paper proposes a topic identification method that filters the important terms from the preprocessed text and applies a term weighting scheme to solve the synonym problem. A rule generation algorithm is used to determine the appropriate topics based on the weighted terms. The text document used in the experiment is the English translation of the Quran. The topics identified by the proposed method were compared with topics identified using Rough Set and by domain experts. The proposed method was consistently able to identify topics that are mostly close to those given by Rough Set and the experts. The result of the comparison showed that the proposed method can be used to capture topics for textual documents.
id uum-21719
institution Universiti Utara Malaysia
spelling uum-21719 2017-04-19T07:44:55Z https://repo.uum.edu.my/id/eprint/21719/
Jamil, Nurul Syafidah and Ku-Mahamud, Ku Ruhana and Mohamed Din, Aniza (2017) Topic identification method for textual document. Journal of Multidisciplinary Engineering Science and Technology (JMEST), 4 (2). pp. 6643-6647. ISSN 2458-9403
Article, PeerReviewed, application/pdf, en
https://repo.uum.edu.my/id/eprint/21719/1/JMEST%204%202%202017%206643%206647.pdf
http://www.jmest.org/wp-content/uploads/JMESTN42352037.pdf
topic QA76 Computer software