Advanced Hierarchical Topic Labeling for Short Text
Hierarchical topic modeling is a probabilistic approach for discovering latent topics that are distributed hierarchically across documents. Each topic is represented by its topic terms. Drawing an unambiguous conclusion from a topic-term distribution is a challenge for readers. The h...
Main Authors: | Paras Tiwari, Ashutosh Tripathi, Avaneesh Singh, Sawan Rai |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
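Subjects: | Document categorization, hierarchical topic modeling, hierarchical topic labeling, topic modeling, topic labeling |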
Online Access: | https://ieeexplore.ieee.org/document/10092802/ |
_version_ | 1827967790272741376 |
---|---|
author | Paras Tiwari; Ashutosh Tripathi; Avaneesh Singh; Sawan Rai |
author_sort | Paras Tiwari |
collection | DOAJ |
description | Hierarchical topic modeling is a probabilistic approach for discovering latent topics that are distributed hierarchically across documents. Each topic is represented by its topic terms. Drawing an unambiguous conclusion from a topic-term distribution is a challenge for readers. Hierarchical topic labeling eases this challenge by providing an individual, appropriate label for each topic at every level. In this work, we propose a BERT-embedding-inspired methodology for labeling hierarchical topics in short text corpora. Short texts have gained significant popularity on multiple platforms in diverse domains, but the limited information they contain makes them difficult to process. We use three diverse short text datasets that include both structured and unstructured instances; this diversity supports a broad application scope for the work. To assess label relevance, the proposed methodology is compared against both automatic and human annotators. It outperformed the benchmark with average scores of 0.4185, 49.50, and 49.16 for cosine similarity, exact match, and partial match, respectively. |
first_indexed | 2024-04-09T18:10:27Z |
format | Article |
id | doaj.art-affbf192d816485f9ce727782cf27b4e |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-09T18:10:27Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-affbf192d816485f9ce727782cf27b4e; 2023-04-13T23:00:26Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 35158-35174; DOI 10.1109/ACCESS.2023.3264678; 10092802; Advanced Hierarchical Topic Labeling for Short Text; Paras Tiwari (0); Ashutosh Tripathi (1), https://orcid.org/0000-0003-3117-6722; Avaneesh Singh (2), https://orcid.org/0000-0002-1200-7897; Sawan Rai (3); (0) Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, Varanasi, Uttar Pradesh, India; (1) Department of Computer Science and Engineering, PDPM Indian Institute of Information Technology Design and Manufacturing, Jabalpur, Madhya Pradesh, India; (2) Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India; (3) Department of Computer Science and Engineering, PDPM Indian Institute of Information Technology Design and Manufacturing, Jabalpur, Madhya Pradesh, India; https://ieeexplore.ieee.org/document/10092802/; Document categorization; hierarchical topic modeling; hierarchical topic labeling; topic modeling; topic labeling |
title | Advanced Hierarchical Topic Labeling for Short Text |
topic | Document categorization; hierarchical topic modeling; hierarchical topic labeling; topic modeling; topic labeling |
url | https://ieeexplore.ieee.org/document/10092802/ |