Multi-Label Classification of Chinese Rural Poverty Governance Texts Based on XLNet and Bi-LSTM Fused Hierarchical Attention Mechanism

Hierarchical multi-label text classification (HMTC) is a highly relevant and widely discussed topic in the era of big data, particularly for efficiently classifying extensive amounts of text data. This study proposes the HTMC-PGT framework for poverty governance’s single-path hierarchical multi-labe...

Full description

Bibliographic Details
Main Authors: Xin Wang, Leifeng Guo
Format: Article
Language:English
Published: MDPI AG 2023-06-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/13/13/7377
Description
Summary:Hierarchical multi-label text classification (HMTC) is a highly relevant and widely discussed topic in the era of big data, particularly for efficiently classifying extensive amounts of text data. This study proposes the HTMC-PGT framework for poverty governance’s single-path hierarchical multi-label classification problem. The framework simplifies the HMTC problem into training and combination problems of multi-class classifiers in the classifier tree. Each independent classifier in this framework uses an XLNet pretrained model to extract char-level semantic embeddings of text and employs a hierarchical attention mechanism integrated with Bi-LSTM (BiLSTM + HA) to extract semantic embeddings at the document level for classification purposes. Simultaneously, this study proposes that the structure uses transfer learning (TL) between classifiers in the classifier tree. The experimental results show that the proposed XLNet + BiLSTM + HA + FC + TL model achieves micro-P, micro-R, and micro-F1 values of 96.1%, which is 7.5~38.1% higher than those of other baseline models. The HTMC-PGT framework based on XLNet, BiLSTM + HA, and transfer learning (TL) between classifier tree nodes proposed in this study solves the hierarchical multi-label classification problem of poverty governance text (PGT). It provides a new idea for solving the traditional HMTC problem.
ISSN:2076-3417