An automatic non-English sentiment lexicon builder using unannotated corpus

Bibliographic Details
Main Authors: Kaity, Mohammed, Balakrishnan, Vimala
Format: Article
Published: Springer Verlag 2019
Subjects: QA75 Electronic computers. Computer science
_version_ 1825722195732594688
author Kaity, Mohammed
Balakrishnan, Vimala
collection UM
description Sentiment lexicons are widely accessible for the English language, while for many other languages these resources are extremely deficient. Current techniques and methods for sentiment analysis focus mainly on English, whereas other languages are neglected due to a lack of resources. To overcome the challenges faced in building non-English lexicons, we propose a language-independent method that automatically builds non-English sentiment lexicons from currently available English lexicons and an unannotated corpus in the target language. The proposed method automatically recognizes and extracts new polarity words from the unannotated corpus, starting from initial seed lexicons developed by translating three reliable English lexicons. The experimental results on the test datasets confirmed that the developed non-English sentiment lexicon could significantly enhance the performance of non-English sentiment classification compared with other methods and lexicons. The lexicon developed for the Arabic language outperformed other commonly used methods for developing non-English lexicons, with an F-measure of 0.74. The approach adopted in this study was shown to be language independent and can be applied to other languages as well. This paper also contributes to understanding the approaches to developing sentiment resources. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
first_indexed 2024-03-06T06:01:57Z
format Article
id um.eprints-24153
institution Universiti Malaya
last_indexed 2024-03-06T06:01:57Z
publishDate 2019
publisher Springer Verlag
record_format dspace
spelling um.eprints-24153 2020-04-06T15:27:57Z http://eprints.um.edu.my/24153/ Article PeerReviewed Kaity, Mohammed and Balakrishnan, Vimala (2019) An automatic non-English sentiment lexicon builder using unannotated corpus. The Journal of Supercomputing, 75 (4). pp. 2243-2268. ISSN 0920-8542, DOI https://doi.org/10.1007/s11227-019-02755-3
title An automatic non-English sentiment lexicon builder using unannotated corpus
topic QA75 Electronic computers. Computer science