A New Italian Cultural Heritage Data Set: Detecting Fake Reviews With BERT and ELECTRA Leveraging the Sentiment
The growth of the online review phenomenon, which has expanded from specialised trade magazines to end users via online platforms, has also increasingly involved the cultural heritage of countries, a source of tourism and growth driver of local economies. Unfortunately, this has been paralleled by t...
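Main Authors: | Rosario Catelli, Luca Bevilacqua, Nicola Mariniello, Vladimiro Scotto Di Carlo, Massimo Magaldi, Hamido Fujita, Giuseppe De Pietro, Massimo Esposito |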
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Italian cultural heritage; data set; fake reviews; sentiment analysis; deceptive |
Online Access: | https://ieeexplore.ieee.org/document/10129178/ |
_version_ | 1827932563635699712 |
---|---|
author | Rosario Catelli, Luca Bevilacqua, Nicola Mariniello, Vladimiro Scotto Di Carlo, Massimo Magaldi, Hamido Fujita, Giuseppe De Pietro, Massimo Esposito |
author_sort | Rosario Catelli |
collection | DOAJ |
description | The growth of the online review phenomenon, which has expanded from specialised trade magazines to end users via online platforms, has also increasingly involved the cultural heritage of countries, a source of tourism and growth driver of local economies. Unfortunately, this has been paralleled by the emergence and spread of the phenomenon of fake reviews, against which the scientific world has developed language models capable of distinguishing them from the truthful. The application of such models, often based on deep neural networks with transformer-type architectures, is however limited by the availability of local language data sets for specific domains, useful for both training and verification. The purpose of this article is twofold. Firstly, a new data set was created in the Italian language, generally considered low-resource, relating to the domain of cultural heritage in Italy, by collecting reviews available online, reorganising them in the form of a data set usable by the language models. Secondly, a baseline of results for the detection of misleading reviews was constructed by exploiting two widely used language models, namely BERT and ELECTRA. The performance achieved is interesting, around 95% accuracy and F1 score, using data set splits between training and testing of 80/20 and 90/10. In addition, SHAP was used as a tool to support the explicability of AI models: in this way, it was possible to show the usefulness of sentiment analysis as a support for the recognition of deceptiveness. |
first_indexed | 2024-03-13T07:11:41Z |
format | Article |
id | doaj.art-3b0c29303c954893b8925ddf2106a5ab |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-13T07:11:41Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
title | A New Italian Cultural Heritage Data Set: Detecting Fake Reviews With BERT and ELECTRA Leveraging the Sentiment |
topic | Italian cultural heritage; data set; fake reviews; sentiment analysis; deceptive |
url | https://ieeexplore.ieee.org/document/10129178/ |
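
The description above reports accuracy and F1 score on 80/20 and 90/10 training/testing splits. As a minimal sketch of what those terms mean, and not the authors' actual pipeline, the following Python shows one way to split a labelled review set and compute both metrics. The function names, the fixed seed, and the toy labels are assumptions made for illustration; no data from the article is used.

```python
import random


def train_test_split(items, test_fraction, seed=0):
    """Shuffle a copy of `items` and hold out `test_fraction` of it for testing.

    Hypothetical helper: the record does not say how the authors split the data.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy stand-in for a review collection: 100 placeholder review ids.
    reviews = list(range(100))
    for fraction in (0.2, 0.1):  # the 80/20 and 90/10 splits
        train, test = train_test_split(reviews, fraction)
        print(f"train={len(train)} test={len(test)}")

    # Made-up labels (1 = deceptive, 0 = truthful) to exercise the metrics.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
    print("accuracy:", accuracy(y_true, y_pred))
    print("F1:", f1_score(y_true, y_pred))
```

In practice the predictions would come from a fine-tuned classifier such as BERT or ELECTRA rather than a hand-written list; the metric definitions stay the same.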