NgramPOS a bigram-based linguistic and statistical feature process model for unstructured text classification



Bibliographic Details
Main Authors: Yazdani, Sepideh Foroozan, Tan, Zhiyuan, Kakavand, Mohsen, Mustapha, Aida
Format: Article
Language: English
Published: Springer 2018
Subjects:
Online Access: http://eprints.uthm.edu.my/5136/1/AJ%202018%20%28843%29%20NgramPOS%20a%20bigram-based%20linguistic%20and%20statistical%20feature%20process%20model%20for%20unstructured%20text%20classification.pdf
collection UTHM
description Research in the financial domain has shown that the sentiment of stock news has a profound impact on trading volume, volatility, stock prices and firm earnings. In-depth analysis of stock news now draws on financial reviews from various social networking and marketing sites to help improve decision making. Nonetheless, such reviews take the form of unstructured text, which requires natural language processing (NLP) to extract the sentiments. Accordingly, this study investigates the use of NLP tasks to improve the performance of sentiment classification when evaluating the information content of financial news as an instrument in an investment decision support system. At present, feature extraction is mainly based on the occurrence frequency of words, so low-frequency linguistic features that could be critical in sentiment classification are typically ignored. In this research, we attempt to improve current sentiment analysis approaches for financial news classification by focusing on low-frequency but informative linguistic expressions. Our proposed combination of low- and high-frequency linguistic expressions contributes a novel set of features for sentiment classification. The experimental results show that an optimal n-gram feature selection (a combination of optimal unigram and bigram features) enhances sentiment classification accuracy compared to other types of feature sets.
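The point about frequency-based feature extraction can be illustrated with a small sketch. It is not the NgramPOS method from the article, whose details are not given in this record; the tokenizer, the function names, the sample headlines and the `min_count` threshold are all assumptions made for illustration. The sketch builds unigram and bigram features, then applies a frequency-only cutoff to show how a potentially informative bigram such as "profit warning" would be discarded.

```python
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; a real
    # pipeline would use a proper NLP tokenizer).
    return text.lower().split()


def ngram_features(tokens):
    # Unigrams followed by adjacent-word bigrams joined with a space.
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def count_features(docs):
    # Corpus-wide occurrence counts for every unigram and bigram.
    counts = Counter()
    for doc in docs:
        counts.update(ngram_features(tokenize(doc)))
    return counts


def frequency_filter(counts, min_count):
    # Frequency-only selection: keep features seen at least min_count times.
    return {feat for feat, c in counts.items() if c >= min_count}


# Hypothetical headlines, invented for this example.
docs = [
    "shares fell sharply",
    "shares rose",
    "profit warning issued",
]

counts = count_features(docs)
kept = frequency_filter(counts, min_count=2)
dropped = set(counts) - kept  # low-frequency features lost by the cutoff
```

With this toy corpus only the unigram "shares" passes the cutoff, while every bigram falls into `dropped`. A selection scheme that also retains chosen low-frequency expressions, as the abstract proposes, would need a criterion other than raw frequency.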
first_indexed 2024-03-05T21:50:30Z
format Article
id uthm.eprints-5136
institution Universiti Tun Hussein Onn Malaysia
language English
last_indexed 2024-03-05T21:50:30Z
citation Yazdani, Sepideh Foroozan and Tan, Zhiyuan and Kakavand, Mohsen and Mustapha, Aida (2018) NgramPOS a bigram-based linguistic and statistical feature process model for unstructured text classification. Wireless Networks. pp. 1-11. ISSN 1022-0038. Peer reviewed. Record: http://eprints.uthm.edu.my/5136/
topic QA76 Computer software
TA Engineering (General). Civil engineering (General)
TA329-348 Engineering mathematics. Engineering analysis