Quality Management of Workers in an In-House Crowdsourcing-Based Framework for Deduplication of Organizations’ Databases

While organizations in the current era of big data generate massive volumes of data, they also need to ensure that its quality is maintained so that it remains useful for decision-making. The problem of dirty data plagues every organization. One aspect of dirty data is the presence of duplic...

Bibliographic Details
Main Authors: Morteza Saberi, Omar Khadeer Hussain, Elizabeth Chang
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Quality management, quality control, data quality, duplicate detection, in-house crowdsourcing
Online Access: https://ieeexplore.ieee.org/document/8746175/
author Morteza Saberi
Omar Khadeer Hussain
Elizabeth Chang
collection DOAJ
description While organizations in the current era of big data generate massive volumes of data, they also need to ensure that its quality is maintained so that it remains useful for decision-making. The problem of dirty data plagues every organization. One aspect of dirty data is the presence of duplicate data records, which negatively impact the organization's operations in many ways. Many existing approaches attempt to address this problem by using traditional data cleansing methods. In this paper, we address this problem by using an in-house crowdsourcing-based framework, namely, DedupCrowd. One of the main challenges of crowdsourcing-based approaches is monitoring the performance of the crowd, which is needed to maintain the integrity of the whole process. In this paper, a statistical quality control-based technique is proposed to regulate the performance of the crowd. We apply our proposed framework in the context of a contact center, where the Customer Service Representatives serve as the crowd and assist in the process of duplicate detection. By using comprehensive working examples, we show how the different modules of DedupCrowd work not only to monitor the performance of the crowd but also to assist in duplicate detection.
first_indexed 2024-12-19T22:56:18Z
format Article
id doaj.art-d8dff5c07f4f4820ba14a73982024eaa
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-19T22:56:18Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-d8dff5c07f4f4820ba14a73982024eaa
2022-12-21T20:02:38Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2019-01-01
Volume 7, pages 90715-90730
DOI 10.1109/ACCESS.2019.2924979
IEEE document 8746175
Morteza Saberi (https://orcid.org/0000-0002-5168-2078), School of Information, Systems and Modelling, University of Technology Sydney, Sydney, NSW, Australia
Omar Khadeer Hussain (https://orcid.org/0000-0002-5738-6560), School of Business, University of New South Wales, Canberra, ACT, Australia
Elizabeth Chang, School of Business, University of New South Wales, Canberra, ACT, Australia
title Quality Management of Workers in an In-House Crowdsourcing-Based Framework for Deduplication of Organizations’ Databases
topic Quality management
quality control
data quality
duplicate detection
in-house crowdsourcing
url https://ieeexplore.ieee.org/document/8746175/