Mining and visualising contradictory data

Abstract Big datasets are often stored in flat files and can contain contradictory data. Contradictory data undermines the soundness of the information from a noisy dataset. Traditional tools such as pie charts and bar charts are overwhelmed when used to visually identify contradictory data in multidi...

Bibliographic Details
Main Authors: Honour Chika Nwagwu, George Okereke, Chukwuemeka Nwobodo
Format: Article
Language: English
Published: SpringerOpen 2017-10-01
Series: Journal of Big Data
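Subjects: ConTra; Comma separated values; Dataset; Contradictions; Contradictory data; Mutual exclusion values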
Online Access: http://link.springer.com/article/10.1186/s40537-017-0100-9
author Honour Chika Nwagwu
George Okereke
Chukwuemeka Nwobodo
collection DOAJ
description Abstract Big datasets are often stored in flat files and can contain contradictory data. Contradictory data undermines the soundness of the information from a noisy dataset. Traditional tools such as pie charts and bar charts are overwhelmed when used to visually identify contradictory data in the multidimensional attribute values of a big dataset. This work explains the importance of identifying contradictions in a noisy dataset. It also examines how contradictory data in a large and noisy dataset can be mined and visually analysed. The authors developed ‘ConTra’, an open source application which applies a mutual exclusion rule to identify contradictory data in comma separated values (CSV) datasets. ConTra’s capability to identify contradictory data in datasets of different sizes is examined. The results show that ConTra can process large datasets when hosted on servers with fast processors. The work also shows that ConTra is 100% accurate in identifying contradictory data of objects whose attribute values do not conform to the mutual exclusion rule of a dataset in CSV format. Different approaches through which ConTra can mine and identify contradictory data are also presented.
first_indexed 2024-12-19T23:44:59Z
format Article
id doaj.art-ec66885673ce48f5a500a9be0c9d0f71
institution Directory of Open Access Journals
issn 2196-1115
language English
last_indexed 2024-12-19T23:44:59Z
publishDate 2017-10-01
publisher SpringerOpen
record_format Article
series Journal of Big Data
spelling doaj.art-ec66885673ce48f5a500a9be0c9d0f71 | 2022-12-21T20:01:20Z | eng | SpringerOpen | Journal of Big Data | 2196-1115 | 2017-10-01 | 41111 | 10.1186/s40537-017-0100-9 | Mining and visualising contradictory data | Honour Chika Nwagwu (0) | George Okereke (1) | Chukwuemeka Nwobodo (2) | Computer Science Department, University of Nigeria | Computer Science Department, University of Nigeria | Care of Dr. Nwagwu Honour Chika, Computer Science Department, University of Nigeria | http://link.springer.com/article/10.1186/s40537-017-0100-9 | ConTra | Comma separated values | Dataset | Contradictions | Contradictory data | Mutual exclusion values
title Mining and visualising contradictory data
topic ConTra
Comma separated values
Dataset
Contradictions
Contradictory data
Mutual exclusion values
url http://link.springer.com/article/10.1186/s40537-017-0100-9
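example The abstract describes ConTra as applying a mutual exclusion rule to find contradictory data in a CSV dataset. This record does not show ConTra's code, so the following is only a minimal sketch of one way such a rule could be checked. It assumes that a rule is a group of columns of which an object (row) may set at most one. The function name, the sample columns, the `id` column and the set of "truthy" cell values are all hypothetical and are not taken from the article.

```python
import csv
import io

# Hypothetical sample data: each row is an object, each column an attribute.
SAMPLE_CSV = (
    "id,present,absent,low,high\n"
    "gene1,1,0,1,0\n"
    "gene2,1,1,0,1\n"
    "gene3,0,1,1,1\n"
)

# Hypothetical rules: within each group, at most one attribute may be set.
EXCLUSIVE_GROUPS = [("present", "absent"), ("low", "high")]

# Assumed encoding of a "set" attribute value in the CSV cells.
TRUTHY = {"1", "yes", "true", "x"}


def find_contradictions(csv_text, exclusive_groups, id_column="id"):
    """Return (object id, conflicting columns) for every row that sets more
    than one attribute of a mutually exclusive group."""
    reader = csv.DictReader(io.StringIO(csv_text))
    contradictions = []
    for row in reader:
        for group in exclusive_groups:
            # Collect the columns of this group that the row marks as set.
            set_columns = [
                column
                for column in group
                if (row.get(column) or "").strip().lower() in TRUTHY
            ]
            if len(set_columns) > 1:
                contradictions.append((row[id_column], set_columns))
    return contradictions


if __name__ == "__main__":
    for object_id, columns in find_contradictions(SAMPLE_CSV, EXCLUSIVE_GROUPS):
        print(object_id, "violates mutual exclusion on", columns)
```

In this sketch gene2 is flagged because it is marked both present and absent, and gene3 because it is marked both low and high; gene1 satisfies both rules. Reading from a file instead of a string would only require passing an open file object to `csv.DictReader`.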