LDAViewer: An Automatic Language-Agnostic System for Discovering State-of-the-Art Topics in Research Using Topic Modeling, Bidirectional Encoder Representations From Transformers, and Entity Linking

Bibliographic Details
Main Authors: Timothy Dillan, Dhomas Hatta Fudholi
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10147827/
Description
Summary: Advancements in knowledge are pivotal to academic progress, necessitating efficient methods for discovering the state of the art in various fields. Existing approaches, however, are language-specific and lack automation, limiting their efficacy. This study aims to develop language-agnostic software that streamlines the process of identifying state-of-the-art research across diverse academic topics. The software automatically retrieves articles from multiple databases and preprocesses the content through tokenization, case folding, token cleansing, stopword removal, and lemmatization. Subsequently, a numeric document-phrase matrix is created and analyzed using latent Dirichlet allocation (LDA) and bidirectional encoder representations from transformers (BERT) to discover and label topics automatically. The study introduces a novel topic-filtering method, based on entity linking and on filtering model outputs against a knowledge database, to ensure topic relevance. The visual representation employs nested bubble and line charts, effectively illustrating current topics, gaps, and research evolution trends. A user survey of 52 student researchers, assessing the interface, topic relevance, and research output of the developed software, revealed that the interface is user-friendly and easy to navigate and that the presented information is comprehensible. Survey results also indicated that the generated topics are consistent with the processed article content and relevant to the investigated topic. The visualization effectively aids in understanding the state of the art and the research map. This study demonstrates that integrating LDA, BERT, and the proposed topic filtering and labeling method yields a robust tool for preliminary research analysis with high precision and relevance.
ISSN: 2169-3536
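
The preprocessing steps named in the summary (tokenization, case folding, token cleansing, stopword removal, lemmatization) and the resulting numeric matrix can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopword list, the lemma lookup table, and the function names `preprocess` and `document_term_matrix` are hypothetical, and single words stand in for the phrases of the article's document-phrase matrix. A matrix of this kind is what a topic model such as LDA would take as input; the LDA, BERT, and entity-linking stages are not shown.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; a real system would load
# a fuller list for each language it supports.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}

# Assumption: a toy lookup table standing in for a real lemmatizer.
LEMMAS = {"models": "model", "topics": "topic", "documents": "document"}


def preprocess(text):
    """Apply the five preprocessing steps in the order the summary lists them."""
    tokens = text.split()                                 # tokenization (whitespace)
    tokens = [t.lower() for t in tokens]                  # case folding
    # Token cleansing: strip punctuation, digits, and underscores.
    tokens = [re.sub(r"[\W\d_]+", "", t) for t in tokens]
    # Stopword removal (also drops tokens emptied by cleansing).
    tokens = [t for t in tokens if t and t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]             # lemmatization (lookup)


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    processed = [preprocess(d) for d in docs]
    vocab = sorted({term for doc in processed for term in doc})
    counts = [Counter(doc) for doc in processed]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    vocab, matrix = document_term_matrix(
        ["Topic models find topics.", "The documents are in the model!"]
    )
    print(vocab)
    print(matrix)
```

Each row of the matrix corresponds to one article and each column to one vocabulary term, so the rows can be handed directly to a count-based topic model.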