Advanced Hierarchical Topic Labeling for Short Text

Bibliographic Details
Main Authors: Paras Tiwari, Ashutosh Tripathi, Avaneesh Singh, Sawan Rai
Format: Article
Language: English
Published: IEEE 2023-01-01
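Series: IEEE Access
Subjects: Document categorization, hierarchical topic modeling, hierarchical topic labeling, topic modeling, topic labeling
Online Access: https://ieeexplore.ieee.org/document/10092802/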

Abstract

Hierarchical topic modeling is a probabilistic approach for discovering latent topics that are distributed hierarchically among documents. Each topic is represented by its topic terms, and drawing an unambiguous conclusion from the topic-term distribution is challenging for readers. Hierarchical topic labeling eases this challenge by assigning an individual, appropriate label to each topic at every level. In this work, we propose a BERT-embedding-inspired methodology for labeling hierarchical topics in short-text corpora. Short texts have gained significant popularity on multiple platforms across diverse domains, but the limited information they contain makes them difficult to deal with. We use three diverse short-text datasets that include both structured and unstructured instances; this diversity ensures a broad scope of application for the work. With respect to label relevance, the proposed methodology is compared against both automatic and human annotators. It outperforms the benchmark, with average scores of 0.4185, 49.50, and 49.16 for cosine similarity, exact match, and partial match, respectively.
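To make the labeling and evaluation ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' BERT-based methodology: the hypothetical `TOY_VECTORS` table stands in for BERT embeddings, each topic node is labeled with the candidate whose mean-pooled vector is closest (by cosine similarity) to the centroid of the topic's terms, and the exact-match and partial-match definitions (case-insensitive equality, and sharing at least one word) are assumptions rather than the paper's formulations.

```python
import math

# HYPOTHETICAL toy word vectors: a stand-in for BERT embeddings, which the
# paper uses but which this sketch does not load. Dimensions are arbitrary.
TOY_VECTORS = {
    "goal": [0.9, 0.1, 0.0],
    "team": [0.85, 0.15, 0.05],
    "match": [0.8, 0.2, 0.1],
    "sports": [0.9, 0.15, 0.05],
    "election": [0.1, 0.9, 0.1],
    "vote": [0.05, 0.95, 0.1],
    "politics": [0.1, 0.92, 0.08],
    "economy": [0.2, 0.4, 0.9],
}
DIM = 3


def embed(text):
    """Mean-pool the toy vectors of the known words in `text`."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def label_topic(terms, candidates):
    """Pick the candidate label closest to the centroid of the topic terms."""
    centroid = embed(" ".join(terms))
    return max(candidates, key=lambda c: cosine(centroid, embed(c)))


def label_hierarchy(node, candidates):
    """Recursively label every node of a tree {"terms": [...], "children": [...]}."""
    return {
        "label": label_topic(node["terms"], candidates),
        "children": [label_hierarchy(ch, candidates) for ch in node.get("children", [])],
    }


# ASSUMED metric definitions; the paper's exact formulations may differ.
def exact_match(pred, gold):
    """Percentage of topics whose predicted label equals the reference label."""
    hits = sum(p.strip().lower() == g.strip().lower() for p, g in zip(pred, gold))
    return 100.0 * hits / len(gold)


def partial_match(pred, gold):
    """Percentage of topics whose predicted and reference labels share a word."""
    hits = sum(bool(set(p.lower().split()) & set(g.lower().split()))
               for p, g in zip(pred, gold))
    return 100.0 * hits / len(gold)


def mean_cosine(pred, gold):
    """Average embedding cosine similarity between predicted and reference labels."""
    return sum(cosine(embed(p), embed(g)) for p, g in zip(pred, gold)) / len(gold)
```

For example, a two-level tree whose root has the terms `goal team match` and whose child has `election vote` would be labeled `sports` and `politics` from the candidates `["sports", "politics", "economy"]`; the predicted labels could then be scored against human labels with the three metric functions. Choosing the label by cosine similarity to the term centroid keeps the sketch simple; a real system would replace `embed` with contextual embeddings and generate candidates rather than take them as a fixed list.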

Publication Details
Journal: IEEE Access, vol. 11, pp. 35158-35174, 2023
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3264678
Affiliations:
Paras Tiwari: Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, Uttar Pradesh, India
Ashutosh Tripathi (ORCID: https://orcid.org/0000-0003-3117-6722): Department of Computer Science and Engineering, PDPM Indian Institute of Information Technology Design and Manufacturing, Jabalpur, Madhya Pradesh, India
Avaneesh Singh (ORCID: https://orcid.org/0000-0002-1200-7897): Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
Sawan Rai: Department of Computer Science and Engineering, PDPM Indian Institute of Information Technology Design and Manufacturing, Jabalpur, Madhya Pradesh, India
Source: Directory of Open Access Journals (DOAJ)