Misclassification Bias in Computational Social Science: A Simulation Approach for Assessing the Impact of Classification Errors on Social Indicators Research

Bibliographic Details
Main Authors:	Sergey Smetanin, Mikhail Komarov
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Misclassification bias; social indicators; classification; supervised machine learning; computational social science; sentiment analysis
Online Access:	https://ieeexplore.ieee.org/document/9706439/
ISSN:	2169-3536
Volume:	10
Pages:	18886-18898
DOI:	10.1109/ACCESS.2022.3149897
Article Number:	9706439
Affiliation:	Department of Business Informatics, Graduate School of Business, National Research University Higher School of Economics, Moscow, Russia (both authors)
ORCID:	Sergey Smetanin https://orcid.org/0000-0001-6373-3410; Mikhail Komarov https://orcid.org/0000-0001-7075-0016
Collection:	DOAJ (Directory of Open Access Journals)

Description
A growing body of literature has examined the potential of machine learning algorithms in constructing social indicators based on the automatic classification of digital traces. However, as long as the classification algorithms’ predictions are not completely error-free, the estimate of the relative occurrence of a particular class may be affected by misclassification bias, thereby affecting the value of the calculated social indicator. Although a significant number of studies have investigated misclassification bias correction techniques, they commonly rely on a set of assumptions that are likely to be violated in practice, which calls into question the effectiveness of these methods. Thus, there is a knowledge gap with respect to assessing the impact of misclassification bias on a specific social indicator formula without strict reference to the number of classes. Moreover, given the erroneous nature of automatic classification algorithms, the quality of a predicted indicator can be assessed not only using regression quality metrics, as was done in the existing literature, but also using correlation metrics. In this paper, we propose a simulation approach for assessing the impact of misclassification bias on the calculated social indicators in terms of regression and correlation metrics. The proposed approach focuses on indicators calculated based on the distribution of classes and can process any number of classes. It allows the selection of the most appropriate classification model for a particular social indicator, and vice versa. Moreover, it allows for an assessment of the optimistic level of correlation between the indicator calculated based on the results of the classification algorithm and the true underlying indicator.
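The abstract does not spell out the simulation procedure itself, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes a three-class sentiment task (negative, neutral, positive), a hypothetical indicator defined as the positive share minus the negative share, and hypothetical confusion matrices (`IDENTITY`, `NOISY`) whose entry C[i][j] is the probability that an item of true class i is labeled j. Misclassification shifts the expected predicted class shares to q_j = Σ_i p_i C[i][j]; the simulation draws random true class distributions, passes every item through the error model, and compares true and predicted indicator values with regression metrics (MAE, RMSE) and a correlation metric (Pearson's r). All function and constant names are illustrative.

```python
import math
import random

# Hypothetical confusion matrices for a 3-class task (negative, neutral, positive).
# Entry [i][j] is the assumed probability that an item of true class i is
# predicted as class j; every row sums to 1. Values are illustrative only.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
NOISY = [
    [0.80, 0.15, 0.05],
    [0.10, 0.75, 0.15],
    [0.05, 0.15, 0.80],
]


def sentiment_index(counts):
    """Hypothetical indicator: (positive - negative) / total, in [-1, 1]."""
    total = sum(counts)
    return (counts[2] - counts[0]) / total if total else 0.0


def expected_predicted_distribution(true_dist, confusion):
    """Expected predicted class shares q_j = sum_i p_i * C[i][j]."""
    k = len(confusion)
    return [sum(true_dist[i] * confusion[i][j] for i in range(k)) for j in range(k)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate(confusion, indicator, n_samples=200, sample_size=500, seed=0):
    """Simulate true vs. predicted indicator values; return (MAE, RMSE, r)."""
    rng = random.Random(seed)
    classes = list(range(len(confusion)))
    true_vals, pred_vals = [], []
    for _ in range(n_samples):
        # Random true class proportions: a flat Dirichlet built from gamma draws.
        weights = [rng.gammavariate(1.0, 1.0) for _ in classes]
        true_labels = rng.choices(classes, weights=weights, k=sample_size)
        # Each item's predicted label is drawn from its true class's row.
        pred_labels = [rng.choices(classes, weights=confusion[t])[0] for t in true_labels]
        true_vals.append(indicator([true_labels.count(c) for c in classes]))
        pred_vals.append(indicator([pred_labels.count(c) for c in classes]))
    errors = [p - t for p, t in zip(pred_vals, true_vals)]
    mae = sum(abs(e) for e in errors) / n_samples
    rmse = math.sqrt(sum(e * e for e in errors) / n_samples)
    return mae, rmse, pearson(true_vals, pred_vals)


if __name__ == "__main__":
    for name, matrix in [("identity", IDENTITY), ("noisy", NOISY)]:
        mae, rmse, r = simulate(matrix, sentiment_index)
        print(f"{name}: MAE={mae:.3f} RMSE={rmse:.3f} r={r:.3f}")
```

Because the indicator is passed in as a function and the confusion matrix may have any number of rows, the same loop could, in principle, compare several candidate classifiers for one indicator or several indicators for one classifier. With the identity matrix the errors vanish and r equals 1, which serves as a sanity check; with the noisy matrix the regression metrics grow while the correlation can stay high, which is why the two kinds of metrics are worth reporting separately.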

Similar Items