Customers’ Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth

Bibliographic Details
Main Authors: Jan Žižka, Arnošt Svoboda
Format: Article
Language: English
Published: Mendel University Press, 2015-01-01
Series: Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis
Online Access: https://acta.mendelu.cz/63/6/2229/
Description
Summary: Customers of various services are often invited to write a summarizing review on an Internet portal. Such reviews, written in natural language, are typically unstructured and also give a numeric rating on a scale from “good” to “bad.” The more reviews there are, the better the feedback that can be acquired for improving the service. However, once massive amounts of data accumulate, the processing complexity, which grows non-linearly, may exceed the computational capacity available for analyzing the text contents. Decision tree inducers such as C5.0 can reveal understandable knowledge from data, but they need the whole dataset at once. This article describes an application of windowing, a technique for generating dataset subsamples that provide enough information for an inducer to train a classifier whose results are similar to those of a model trained on the entire dataset. The results of windowing, which significantly reduces the complexity of the learning problem, are demonstrated on hundreds of thousands of reviews written in English by hotel-service customers. A user obtains knowledge represented by significant words. The results show classification errors, training and testing times, tree sizes, and the words relevant to the meaning of a review as functions of the training subsample size. Finally, a method for estimating a suitable training-set size is suggested.
ISSN: 1211-8516, 2464-8310
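The summary does not spell out the exact windowing procedure the authors used. As a rough illustration only, the sketch below follows the classic windowing scheme described by Quinlan for ID3 and C4.5. It trains on a small random window, classifies the examples outside it, moves misclassified ones into the window, and retrains until none remain. Several assumptions are made. A tiny ID3-style inducer stands in for C5.0. Each review is modeled as a set of words labeled "good" or "bad". All names (`induce_tree`, `windowing`, `make_reviews`) and all parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def induce_tree(examples, depth=0, max_depth=20):
    """Tiny ID3-style inducer (a hypothetical stand-in for C5.0, not C5.0 itself).

    Each example is (set_of_words, label). Returns a label for a leaf, or a tuple
    (word, subtree_if_word_present, subtree_if_word_absent) for an inner node.
    """
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    base = entropy(labels)
    best_word, best_gain = None, 0.0
    vocabulary = set().union(*(words for words, _ in examples))
    for word in sorted(vocabulary):  # sorted: deterministic tie-breaking
        present = [label for words, label in examples if word in words]
        absent = [label for words, label in examples if word not in words]
        if not present or not absent:
            continue  # this word does not split the node
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(labels)
        if base - remainder > best_gain:  # information gain of this split
            best_word, best_gain = word, base - remainder
    if best_word is None:
        return majority
    yes = [e for e in examples if best_word in e[0]]
    no = [e for e in examples if best_word not in e[0]]
    return (best_word, induce_tree(yes, depth + 1, max_depth),
            induce_tree(no, depth + 1, max_depth))


def classify(tree, words):
    """Follow word-presence tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        word, if_present, if_absent = tree
        tree = if_present if word in words else if_absent
    return tree


def windowing(examples, initial_size=50, max_added=50, max_rounds=20, seed=0):
    """Classic windowing (after Quinlan), not necessarily the article's procedure.

    Train on a random window, then repeatedly add misclassified examples from
    outside the window and retrain, until none remain or max_rounds is reached.
    Returns the final tree and the sorted indices of the examples in the window.
    """
    rng = random.Random(seed)
    order = list(range(len(examples)))
    rng.shuffle(order)
    window = set(order[:initial_size])
    tree = induce_tree([examples[i] for i in sorted(window)])
    for _ in range(max_rounds):
        errors = [i for i in order if i not in window
                  and classify(tree, examples[i][0]) != examples[i][1]]
        if not errors:
            break  # no example outside the window is misclassified
        window.update(errors[:max_added])
        tree = induce_tree([examples[i] for i in sorted(window)])
    return tree, sorted(window)


def make_reviews(n, seed=1):
    """Hypothetical synthetic 'reviews': two neutral words plus one sentiment cue."""
    rng = random.Random(seed)
    positive = ["clean", "friendly", "great", "quiet"]
    negative = ["dirty", "rude", "noisy", "broken"]
    neutral = ["room", "staff", "breakfast", "location"]
    reviews = []
    for _ in range(n):
        label = rng.choice(["good", "bad"])
        cue = rng.choice(positive if label == "good" else negative)
        reviews.append((set(rng.sample(neutral, 2)) | {cue}, label))
    return reviews


if __name__ == "__main__":
    data = make_reviews(2000)
    tree, window = windowing(data, initial_size=20)
    errors = sum(classify(tree, words) != label for words, label in data)
    print(f"window: {len(window)} of {len(data)} reviews; errors on all reviews: {errors}")
```

Run as a script, it prints how many of the synthetic reviews ended up in the window and how many reviews the final tree misclassifies. The point of interest is the ratio of window size to dataset size: the inducer only ever sees the window, which is where the reduction in learning effort would come from on a large corpus.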