An automatic non-English sentiment lexicon builder using unannotated corpus
Main Authors: | Kaity, Mohammed; Balakrishnan, Vimala |
---|---|
Format: | Article |
Published: | Springer Verlag, 2019 |
Subjects: | QA75 Electronic computers. Computer science |
_version_ | 1825722195732594688 |
---|---|
author | Kaity, Mohammed; Balakrishnan, Vimala |
collection | UM |
description | Sentiment lexicons are widely available for English, whereas in many other languages these resources are severely lacking. Current techniques and methods for sentiment analysis focus mainly on English, and other languages are neglected due to the lack of resources. To overcome the challenges of building non-English lexicons, we propose a language-independent method that automatically builds non-English sentiment lexicons from currently available English lexicons and an unannotated corpus in the target language. The proposed method automatically recognizes and extracts new polarity words from the unannotated corpus, starting from initial seed lexicons developed by translating three reliable English lexicons. Experimental results on the test datasets confirmed that the developed non-English sentiment lexicon significantly enhances the performance of non-English sentiment classification compared with other methods and lexicons. The developed Arabic lexicon outperformed other commonly used methods for building non-English lexicons, achieving an F-measure of 0.74. The adopted approach was shown to be language-independent and can be applied to other languages as well. This paper also contributes to understanding the approaches to developing sentiment resources. © 2019, Springer Science+Business Media, LLC, part of Springer Nature. |
first_indexed | 2024-03-06T06:01:57Z |
format | Article |
id | um.eprints-24153 |
institution | Universiti Malaya |
last_indexed | 2024-03-06T06:01:57Z |
publishDate | 2019 |
publisher | Springer Verlag |
record_format | dspace |
spelling | um.eprints-24153 (2020-04-06T15:27:57Z) http://eprints.um.edu.my/24153/ PeerReviewed. Kaity, Mohammed and Balakrishnan, Vimala (2019) An automatic non-English sentiment lexicon builder using unannotated corpus. The Journal of Supercomputing, 75 (4). pp. 2243-2268. ISSN 0920-8542, DOI https://doi.org/10.1007/s11227-019-02755-3 |
title | An automatic non-English sentiment lexicon builder using unannotated corpus |
topic | QA75 Electronic computers. Computer science |