Quality Management of Workers in an In-House Crowdsourcing-Based Framework for Deduplication of Organizations’ Databases
Organizations in the current era of big data generate massive volumes of data, and they must maintain its quality for it to be useful in decision-making. The problem of dirty data plagues every organization. One aspect of dirty data is the presence of duplic...
Main Authors: | Morteza Saberi, Omar Khadeer Hussain, Elizabeth Chang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Quality management; quality control; data quality; duplicate detection; in-house crowdsourcing |
Online Access: | https://ieeexplore.ieee.org/document/8746175/ |
_version_ | 1818911286982868992 |
---|---|
author | Morteza Saberi; Omar Khadeer Hussain; Elizabeth Chang |
collection | DOAJ |
description | Organizations in the current era of big data generate massive volumes of data, and they must maintain its quality for it to be useful in decision-making. The problem of dirty data plagues every organization. One aspect of dirty data is the presence of duplicate records, which negatively impact the organization's operations in many ways. Many existing approaches address this problem with traditional data cleansing methods. In this paper, we address it with an in-house crowdsourcing-based framework named DedupCrowd. One of the main obstacles in crowdsourcing-based approaches is monitoring the performance of the crowd, on which the integrity of the whole process depends. We propose a statistical quality control-based technique to regulate the performance of the crowd. We apply the proposed framework in the context of a contact center, where the Customer Service Representatives serve as the crowd that assists in duplicate detection. Using comprehensive working examples, we show how the different modules of DedupCrowd work not only to monitor the performance of the crowd but also to assist in duplicate detection. |
first_indexed | 2024-12-19T22:56:18Z |
format | Article |
id | doaj.art-d8dff5c07f4f4820ba14a73982024eaa |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-19T22:56:18Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-d8dff5c07f4f4820ba14a73982024eaa; 2022-12-21T20:02:38Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2019-01-01; vol. 7, pp. 90715-90730; DOI 10.1109/ACCESS.2019.2924979; document 8746175; Morteza Saberi (https://orcid.org/0000-0002-5168-2078), School of Information, Systems and Modelling, University of Technology Sydney, Sydney, NSW, Australia; Omar Khadeer Hussain (https://orcid.org/0000-0002-5738-6560), School of Business, University of New South Wales, Canberra, ACT, Australia; Elizabeth Chang, School of Business, University of New South Wales, Canberra, ACT, Australia |
title | Quality Management of Workers in an In-House Crowdsourcing-Based Framework for Deduplication of Organizations’ Databases |
topic | Quality management; quality control; data quality; duplicate detection; in-house crowdsourcing |
url | https://ieeexplore.ieee.org/document/8746175/ |
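
The description mentions a statistical quality control-based technique for regulating crowd performance but does not say which one. As a rough illustration only, the sketch below uses a standard p-chart (a control chart for proportions) to flag workers whose error rate on checked answers exceeds the upper control limit. The choice of a p-chart, the function names, the worker ids, and the idea of scoring workers against checked answers are all assumptions made for this example, not details taken from the paper.

```python
import math
from typing import Dict, List, Tuple


def p_chart_limits(p_bar: float, n: int, sigmas: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) control limits for a proportion observed on n items."""
    if not 0.0 <= p_bar <= 1.0:
        raise ValueError("p_bar must be in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    # Standard p-chart spread: sigmas * sqrt(p(1-p)/n), clipped to [0, 1].
    spread = sigmas * math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - spread), min(1.0, p_bar + spread)


def flag_workers(errors: Dict[str, Tuple[int, int]], sigmas: float = 3.0) -> List[str]:
    """Flag workers whose error rate is above the upper control limit.

    `errors` maps a (hypothetical) worker id to (wrong answers, answers checked).
    """
    total_wrong = sum(wrong for wrong, _ in errors.values())
    total_checked = sum(checked for _, checked in errors.values())
    p_bar = total_wrong / total_checked  # pooled error rate across the crowd
    flagged = []
    for worker, (wrong, checked) in errors.items():
        _, upper = p_chart_limits(p_bar, checked, sigmas)
        if wrong / checked > upper:
            flagged.append(worker)
    return flagged


# Hypothetical data: three Customer Service Representatives, 100 checked answers each.
sample = {"csr_a": (2, 100), "csr_b": (3, 100), "csr_c": (20, 100)}
flagged_workers = flag_workers(sample)
```

With this made-up sample, only the worker with 20 wrong answers out of 100 lies above the upper limit. A real deployment would need to decide how checked answers are obtained and whether the limits should be computed from a baseline period rather than from the same data being judged.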