Interpretable Classification of Wiki-Review Streams

Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect...

Bibliographic Details
Main Authors: Silvia Garcia-Mendez, Fatima Leal, Benedita Malheiro, Juan Carlos Burguillo-Rial
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
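Subjects: Data reliability and fairness; data-stream processing and classification; synthetic data; transparency; vandalism; wikis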
Online Access: https://ieeexplore.ieee.org/document/10356073/
author Silvia Garcia-Mendez
Fatima Leal
Benedita Malheiro
Juan Carlos Burguillo-Rial
collection DOAJ
description Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real-time. The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained near-90% values for all evaluation metrics (accuracy, precision, recall, and F-measure).
first_indexed 2024-03-08T19:35:58Z
format Article
id doaj.art-0400dab2c00144c1a8922d2ad6ff0d18
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-08T19:35:58Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-0400dab2c00144c1a8922d2ad6ff0d18
2023-12-26T00:12:04Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
vol. 11, pp. 141137-141151
10.1109/ACCESS.2023.3342472
10356073
Interpretable Classification of Wiki-Review Streams
Silvia Garcia-Mendez (https://orcid.org/0000-0003-0533-1303), Information Technologies Group, atlanTTic, University of Vigo, Vigo, Spain
Fatima Leal (https://orcid.org/0000-0003-4418-2590), Research on Economics, Management and Information Technologies, Universidade Portucalense, Porto, Portugal
Benedita Malheiro (https://orcid.org/0000-0001-9083-4292), ISEP, Polytechnic of Porto, Porto, Portugal
Juan Carlos Burguillo-Rial, Information Technologies Group, atlanTTic, University of Vigo, Vigo, Spain
https://ieeexplore.ieee.org/document/10356073/
Data reliability and fairness
data-stream processing and classification
synthetic data
transparency
vandalism
wikis
title Interpretable Classification of Wiki-Review Streams
topic Data reliability and fairness
data-stream processing and classification
synthetic data
transparency
vandalism
wikis
url https://ieeexplore.ieee.org/document/10356073/