A New Italian Cultural Heritage Data Set: Detecting Fake Reviews With BERT and ELECTRA Leveraging the Sentiment

The growth of the online review phenomenon, which has expanded from specialised trade magazines to end users via online platforms, has also increasingly involved the cultural heritage of countries, a source of tourism and growth driver of local economies. Unfortunately, this has been paralleled by t...

Bibliographic Details
Main Authors: Rosario Catelli, Luca Bevilacqua, Nicola Mariniello, Vladimiro Scotto Di Carlo, Massimo Magaldi, Hamido Fujita, Giuseppe De Pietro, Massimo Esposito
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Italian cultural heritage; data set; fake reviews; sentiment analysis; deceptive
Online Access: https://ieeexplore.ieee.org/document/10129178/
author Rosario Catelli
Luca Bevilacqua
Nicola Mariniello
Vladimiro Scotto Di Carlo
Massimo Magaldi
Hamido Fujita
Giuseppe De Pietro
Massimo Esposito
author_sort Rosario Catelli
collection DOAJ
description The growth of the online review phenomenon, which has expanded from specialised trade magazines to end users via online platforms, has also increasingly involved the cultural heritage of countries, a source of tourism and growth driver of local economies. Unfortunately, this has been paralleled by the emergence and spread of the phenomenon of fake reviews, against which the scientific world has developed language models capable of distinguishing them from the truthful. The application of such models, often based on deep neural networks with transformer-type architectures, is however limited by the availability of local language data sets for specific domains, useful for both training and verification. The purpose of this article is twofold. Firstly, a new data set was created in the Italian language, generally considered low-resource, relating to the domain of cultural heritage in Italy, by collecting reviews available online, reorganising them in the form of a data set usable by the language models. Secondly, a baseline of results for the detection of misleading reviews was constructed by exploiting two widely used language models, namely BERT and ELECTRA. The performance achieved is interesting, around 95% accuracy and F1 score, using data set splits between training and testing of 80/20 and 90/10. In addition, SHAP was used as a tool to support the explicability of AI models: in this way, it was possible to show the usefulness of sentiment analysis as a support for the recognition of deceptiveness.
first_indexed 2024-03-13T07:11:41Z
format Article
id doaj.art-3b0c29303c954893b8925ddf2106a5ab
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-13T07:11:41Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-3b0c29303c954893b8925ddf2106a5ab
2023-06-05T23:00:35Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2023-01-01
vol. 11, pp. 52214-52225
DOI 10.1109/ACCESS.2023.3277490
IEEE Xplore document 10129178
A New Italian Cultural Heritage Data Set: Detecting Fake Reviews With BERT and ELECTRA Leveraging the Sentiment
Rosario Catelli (https://orcid.org/0000-0001-5598-6477), Institute for High Performance Computing and Networking (ICAR), National Research Council, Naples, Italy
Luca Bevilacqua, Engineering Ingegneria Informatica S.p.A., Naples, Italy
Nicola Mariniello, Engineering Ingegneria Informatica S.p.A., Naples, Italy
Vladimiro Scotto Di Carlo (https://orcid.org/0000-0002-0979-1879), Engineering Ingegneria Informatica S.p.A., Naples, Italy
Massimo Magaldi, Engineering Ingegneria Informatica S.p.A., Naples, Italy
Hamido Fujita (https://orcid.org/0000-0001-5256-210X), Faculty of Information Technology, Ho Chi Minh City University of Technology (HUTECH), Ho Chi Minh City, Vietnam
Giuseppe De Pietro, Institute for High Performance Computing and Networking (ICAR), National Research Council, Naples, Italy
Massimo Esposito (https://orcid.org/0000-0002-7196-7994), Institute for High Performance Computing and Networking (ICAR), National Research Council, Naples, Italy
title A New Italian Cultural Heritage Data Set: Detecting Fake Reviews With BERT and ELECTRA Leveraging the Sentiment
topic Italian cultural heritage
data set
fake reviews
sentiment analysis
deceptive
url https://ieeexplore.ieee.org/document/10129178/