Comparison of document similarity algorithms in extracting document keywords from an academic paper

The idea of this study is to validate, using prominent document similarity algorithms, a list of keywords that a domain expert derived from a scientific article on the basis of years of knowledge. For this study, a list of handcrafted keywords generated by Electric Double Layer Capacitor (EDLC) experts is chosen,...

Bibliographic Details
Main Authors: Miah, M. Saef Ullah; Junaida, Sulaiman; Azad, Saiful; Kamal Z., Zamli; Rajan, Jose
Format: Conference or Workshop Item
Language: English
Published: Institute of Electrical and Electronics Engineers 2021
Subjects: QA75 Electronic computers. Computer science
Online Access: http://umpir.ump.edu.my/id/eprint/34309/7/Comparison%20of%20document%20similarity.pdf
author Miah, M. Saef Ullah
Junaida, Sulaiman
Azad, Saiful
Kamal Z., Zamli
Rajan, Jose
collection UMP
description The idea of this study is to validate, using prominent document similarity algorithms, a list of keywords that a domain expert derived from a scientific article on the basis of years of knowledge. For this study, a list of handcrafted keywords generated by Electric Double Layer Capacitor (EDLC) experts is chosen, and documents relevant to EDLC are considered for the comparison. Different similarity calculation algorithms were then applied to the documents in different settings: using the whole text of each document, selecting only the positive sentences, and computing similarity scores against keywords extracted automatically from the documents. The results show that the machine-generated keywords are mostly similar to the list curated by the domain experts. This study also suggests the preferable algorithms for similarity calculation and automated key-phrase extraction for the EDLC domain.
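The kind of comparison the description outlines can be illustrated with a minimal sketch. The record does not name the algorithms the authors used, so Jaccard overlap and bag-of-words cosine similarity are stand-ins chosen here for illustration, and the keyword lists are hypothetical examples rather than data from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(keywords_a, keywords_b):
    """Fraction of distinct keywords shared by the two lists (0.0 to 1.0)."""
    set_a = {k.lower() for k in keywords_a}
    set_b = {k.lower() for k in keywords_b}
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword lists, NOT taken from the paper.
expert_keywords = ["capacitance", "electrode", "electrolyte", "activated carbon"]
machine_keywords = ["electrode", "electrolyte", "specific capacitance", "activated carbon"]

# Phrase-level overlap between the curated and the machine-generated list.
keyword_overlap = jaccard_similarity(expert_keywords, machine_keywords)

# Word-level similarity when each list is treated as a short text.
text_score = cosine_similarity(" ".join(expert_keywords), " ".join(machine_keywords))

print(f"Jaccard overlap: {keyword_overlap:.2f}")
print(f"Cosine similarity: {text_score:.2f}")
```

The same two functions could in principle be applied to whole document texts or to a filtered subset of sentences, which loosely mirrors the different settings mentioned in the description; how the authors actually configured each setting is not stated in this record.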
first_indexed 2024-03-06T12:57:42Z
format Conference or Workshop Item
id UMPir34309
institution Universiti Malaysia Pahang
language English
last_indexed 2024-03-06T12:57:42Z
publishDate 2021
publisher Institute of Electrical and Electronics Engineers
record_format dspace
spelling UMPir34309 2022-07-29T03:58:14Z http://umpir.ump.edu.my/id/eprint/34309/ Comparison of document similarity algorithms in extracting document keywords from an academic paper. Miah, M. Saef Ullah; Junaida, Sulaiman; Azad, Saiful; Kamal Z., Zamli; Rajan, Jose. QA75 Electronic computers. Computer science. Institute of Electrical and Electronics Engineers, 2021-09-17. Conference or Workshop Item. PeerReviewed. pdf. en. http://umpir.ump.edu.my/id/eprint/34309/7/Comparison%20of%20document%20similarity.pdf Miah, M. Saef Ullah and Junaida, Sulaiman and Azad, Saiful and Kamal Z., Zamli and Rajan, Jose (2021) Comparison of document similarity algorithms in extracting document keywords from an academic paper. In: IEEE 2021 International Conference on Software Engineering & Computer Systems and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM), 24-26 August 2021, Pekan, Pahang, Malaysia. pp. 631-636. (Published) https://doi.org/10.1109/ICSECS52883.2021.00121
title Comparison of document similarity algorithms in extracting document keywords from an academic paper
topic QA75 Electronic computers. Computer science
url http://umpir.ump.edu.my/id/eprint/34309/7/Comparison%20of%20document%20similarity.pdf