Distributed Framework for Automating Opinion Discretization From Text Corpora on Facebook

Bibliographic Details
Main Authors: Hiep Xuan Huynh, Vu Tuan Nguyen, Nghia Duong-Trung, Van-Huy Pham, Cang Thuong Phan
Format: Article
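Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Apache Spark, classification, convolutional neural networks, deep learning, opinion mining, TensorFlow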
Online Access:https://ieeexplore.ieee.org/document/8735698/
author Hiep Xuan Huynh
Vu Tuan Nguyen
Nghia Duong-Trung
Van-Huy Pham
Cang Thuong Phan
collection DOAJ
description The steadily growing volume of text corpora and the many research directions in general classification have created both a great opportunity and an unprecedented demand for a comprehensive evaluation of current achievements in natural language processing research. Unfortunately, few studies have combined convolutional neural networks (CNN) with Apache Spark for the task of automating opinion discretization. In this paper, the authors propose a new distributed architecture for solving an opinion classification problem in text mining, using CNN models and big data technologies on Vietnamese text sources. The proposed framework provides the implementation concepts a researcher needs to run experiments on text discretization problems. It covers all the steps and components of a complete, practical text mining pipeline: acquiring input data, preprocessing it, tokenizing it and converting it into a vector representation, applying machine learning algorithms, applying the trained models to unseen data, and evaluating their accuracy. Development of the framework started with a specific focus on binary text discretization but soon expanded toward many other text-categorization problems, distributed language modeling, and quantification. Several intensive evaluations were carried out to assess the robustness and efficiency of the proposed framework. The experiments yielded an accuracy of 72.99% ± 3.64, indicating that the proposed distributed framework is feasible for the task of opinion discretization on Facebook.
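The pipeline stages listed in the abstract (acquiring input, preprocessing, tokenizing and vectorizing, training, applying the model to unseen data, and evaluating accuracy) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it assumes whitespace tokenization instead of Vietnamese word segmentation, a bag-of-words vector instead of learned embeddings, and a perceptron swapped in for the CNN; no Spark or TensorFlow is involved. All function names and the toy sentences are hypothetical.

```python
"""Hypothetical single-machine sketch of a binary opinion classification pipeline.

Assumptions (not from the paper): whitespace tokenization, bag-of-words vectors,
and a perceptron in place of the authors' CNN; no Spark or TensorFlow.
"""
from collections import Counter
from typing import Dict, List, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def tokenize(text: str) -> List[str]:
    # Lowercase and split on whitespace (a simplification of Vietnamese word segmentation).
    return text.lower().split()


def build_vocab(docs: List[str]) -> Dict[str, int]:
    # Map each distinct token to a column index, in order of first appearance.
    vocab: Dict[str, int] = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text: str, vocab: Dict[str, int]) -> List[float]:
    # Bag-of-words token counts; tokens not seen during training are ignored.
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec


def score(model: Model, x: List[float]) -> float:
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(model: Model, x: List[float]) -> int:
    # Label 1 (positive opinion) when the linear score is strictly positive.
    return 1 if score(model, x) > 0 else 0


def train_perceptron(xs: List[List[float]], ys: List[int], epochs: int = 10) -> Model:
    # Classic perceptron: on each mistake, move the weights toward the true label.
    model: Model = ([0.0] * len(xs[0]), 0.0)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if predict(model, x) != y:
                step = 1.0 if y == 1 else -1.0
                w, b = model
                model = ([wi + step * xi for wi, xi in zip(w, x)], b + step)
    return model


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    # Fraction of predictions that match the reference labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data (1 = positive, 0 = negative); not the paper's corpus.
    train = [("sản phẩm rất tốt", 1), ("dịch vụ rất tệ", 0),
             ("giao hàng nhanh tốt", 1), ("chất lượng tệ", 0)]
    unseen = [("rất tốt", 1), ("quá tệ", 0)]
    vocab = build_vocab([t for t, _ in train])
    model = train_perceptron([vectorize(t, vocab) for t, _ in train], [y for _, y in train])
    preds = [predict(model, vectorize(t, vocab)) for t, _ in unseen]
    print(preds, accuracy([y for _, y in unseen], preds))
```

Running the script trains on four toy sentences and reports predictions and accuracy on two held-out ones. In the setting the abstract describes, these same stages would instead run distributed on Apache Spark with CNN models.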
id doaj.art-79c32807a6bf46cda486de9cf09262c5
institution Directory Open Access Journal
issn 2169-3536
doi 10.1109/ACCESS.2019.2922427
volume 7
pages 78675-78684
orcid Hiep Xuan Huynh: https://orcid.org/0000-0002-9213-131X
affiliation Hiep Xuan Huynh: College of Information and Communications Technology, Can Tho University, Can Tho, Vietnam
Vu Tuan Nguyen: College of Information and Communications Technology, Can Tho University, Can Tho, Vietnam
Nghia Duong-Trung: Department of Computer Science, Can Tho University of Technology, Can Tho University, Can Tho, Vietnam
Van-Huy Pham: NLP-KD Lab, Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh, Vietnam
Cang Thuong Phan: College of Information and Communications Technology, Can Tho University, Can Tho, Vietnam
title Distributed Framework for Automating Opinion Discretization From Text Corpora on Facebook
topic Apache Spark
classification
convolutional neural networks
deep learning
opinion mining
TensorFlow
url https://ieeexplore.ieee.org/document/8735698/