Multi-Label Classification of Microblogging Texts Using Convolution Neural Network

Microblogging sites contain a huge amount of textual data and their classification is an imperative task in many applications, such as information filtering, user profiling, topical analysis, and content tagging. Traditional machine learning approaches mainly use a bag of words or n-gram techniques...

Bibliographic Details
Main Authors:	Md. Aslam Parwez, Muhammad Abulaish, Jahiruddin
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Social network analysis; machine learning; deep learning; multi-label classification; word embedding; convolution neural network
Online Access:	https://ieeexplore.ieee.org/document/8723320/

author	Md. Aslam Parwez, Muhammad Abulaish, Jahiruddin
collection	DOAJ
description	Microblogging sites contain a huge amount of textual data, and its classification is an imperative task in many applications, such as information filtering, user profiling, topical analysis, and content tagging. Traditional machine learning approaches mainly use bag-of-words or n-gram techniques to generate feature vectors as text representations for training classifiers, and they perform well on many text information processing tasks. Because short texts, such as tweets, contain very few words, bag-of-words and n-gram feature representations cause traditional machine learning approaches to suffer from data sparsity and the curse of dimensionality. More recently, using feature vectors such as word embeddings as input to neural networks for text classification and clustering has yielded remarkable performance gains. In this paper, we present several neural network models for multi-label classification of microblogging data. The proposed models are based on convolutional neural network (CNN) architectures, which utilize pre-trained word embeddings from generic and domain-specific textual data sources. The word embeddings are used individually and in various combinations through different channels of the CNN to predict class labels. We also present a comparative analysis of the proposed CNN models with traditional machine learning models and one of the existing CNN architectures. The proposed models are evaluated over a real Twitter dataset, and the experimental results establish their efficacy in classifying microblogging texts with improved accuracy compared with the traditional machine learning approaches and the existing CNN models.
first_indexed	2024-12-23T23:36:45Z
format	Article
id	doaj.art-cbd8ad28c2cf4f98ad8ee7691e36ab8d
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-23T23:36:45Z
publishDate	2019-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-cbd8ad28c2cf4f98ad8ee7691e36ab8d | 2022-12-21T17:25:51Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | vol. 7, pp. 68678-68691 | 10.1109/ACCESS.2019.2919494 | 8723320 | Multi-Label Classification of Microblogging Texts Using Convolution Neural Network | Md. Aslam Parwez (https://orcid.org/0000-0003-0087-7171; Department of Computer Science, Jamia Millia Islamia (A Central University), New Delhi, India) | Muhammad Abulaish (https://orcid.org/0000-0003-3387-4743; Department of Computer Science, South Asian University, New Delhi, India) | Jahiruddin (Department of Computer Science, Jamia Millia Islamia (A Central University), New Delhi, India) | https://ieeexplore.ieee.org/document/8723320/ | Social network analysis; machine learning; deep learning; multi-label classification; word embedding; convolution neural network
title	Multi-Label Classification of Microblogging Texts Using Convolution Neural Network
topic	Social network analysis; machine learning; deep learning; multi-label classification; word embedding; convolution neural network
url	https://ieeexplore.ieee.org/document/8723320/

Similar Items