GenCo: A Generative Learning Model for Heterogeneous Text Classification Based on Collaborative Partial Classifications


Bibliographic Details
Main Authors: Zie Eya Ekolle, Ryuji Kohno
Format: Article
Language: English
Published: MDPI AG 2023-07-01
Series: Applied Sciences
Subjects: natural language processing; text classification; probabilistic models; machine learning; generative learning; collaborative learning
Online Access: https://www.mdpi.com/2076-3417/13/14/8211
Affiliation: Department of Electrical and Computer Engineering, Yokohama National University, Yokohama 240-8501, Japan (both authors)
Volume/Issue: 13(14), article 8211
ISSN: 2076-3417
DOI: 10.3390/app13148211
Collection: DOAJ (Directory of Open Access Journals)

Description

The use of generative learning models in natural language processing (NLP) has significantly contributed to the advancement of natural language applications, such as sentimental analysis, topic modeling, text classification, chatbots, and spam filtering. With a large amount of text generated each day from different sources, such as web-pages, blogs, emails, social media, and articles, one of the most common tasks in NLP is the classification of a text corpus. This is important in many institutions for planning, decision-making, and creating archives of their projects. Many algorithms exist to automate text classification tasks but the most intriguing of them is that which also learns these tasks automatically. In this study, we present a new model to infer and learn from data using probabilistic logic and apply it to text classification. This model, called GenCo, is a multi-input single-output (MISO) learning model that uses a collaboration of partial classifications to generate the desired output. It provides a heterogeneity measure to explain its classification results and enables a reduction in the curse of dimensionality in text classification. Experiments with the model were carried out on the Twitter US Airline dataset, the Conference Paper dataset, and the SMS Spam dataset, outperforming baseline models with 98.40%, 89.90%, and 99.26% accuracy, respectively.