GenCo: A Generative Learning Model for Heterogeneous Text Classification Based on Collaborative Partial Classifications
The use of generative learning models in natural language processing (NLP) has significantly contributed to the advancement of natural language applications, such as sentiment analysis, topic modeling, text classification, chatbots, and spam filtering. With a large amount of text generated each da...
Main Authors: | Zie Eya Ekolle, Ryuji Kohno |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-07-01 |
Series: | Applied Sciences |
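Subjects: | natural language processing, text classification, probabilistic models, machine learning, generative learning, collaborative learning |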
Online Access: | https://www.mdpi.com/2076-3417/13/14/8211 |
_version_ | 1827734085977505792 |
---|---|
author | Zie Eya Ekolle Ryuji Kohno |
collection | DOAJ |
description | The use of generative learning models in natural language processing (NLP) has significantly contributed to the advancement of natural language applications, such as sentiment analysis, topic modeling, text classification, chatbots, and spam filtering. With a large amount of text generated each day from different sources, such as web pages, blogs, emails, social media, and articles, one of the most common tasks in NLP is the classification of a text corpus. This is important in many institutions for planning, decision-making, and creating archives of their projects. Many algorithms exist to automate text classification tasks, but the most intriguing of them are those that also learn these tasks automatically. In this study, we present a new model to infer and learn from data using probabilistic logic and apply it to text classification. This model, called GenCo, is a multi-input single-output (MISO) learning model that uses a collaboration of partial classifications to generate the desired output. It provides a heterogeneity measure to explain its classification results and enables a reduction in the curse of dimensionality in text classification. Experiments with the model were carried out on the Twitter US Airline dataset, the Conference Paper dataset, and the SMS Spam dataset, outperforming baseline models with 98.40%, 89.90%, and 99.26% accuracy, respectively. |
first_indexed | 2024-03-11T01:20:34Z |
format | Article |
id | doaj.art-57e43cd2e03043cabd8660a614566949 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T01:20:34Z |
publishDate | 2023-07-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-57e43cd2e03043cabd8660a614566949; 2023-11-18T18:09:53Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2023-07-01; 13; 14; 8211; 10.3390/app13148211; GenCo: A Generative Learning Model for Heterogeneous Text Classification Based on Collaborative Partial Classifications; Zie Eya Ekolle (0); Ryuji Kohno (1); Department of Electrical and Computer Engineering, Yokohama National University, Yokohama 240-8501, Japan; Department of Electrical and Computer Engineering, Yokohama National University, Yokohama 240-8501, Japan; https://www.mdpi.com/2076-3417/13/14/8211; natural language processing; text classification; probabilistic models; machine learning; generative learning; collaborative learning |
title | GenCo: A Generative Learning Model for Heterogeneous Text Classification Based on Collaborative Partial Classifications |
topic | natural language processing text classification probabilistic models machine learning generative learning collaborative learning |
url | https://www.mdpi.com/2076-3417/13/14/8211 |