Mining and visualising contradictory data
Abstract: Big datasets are often stored in flat files and can contain contradictory data. Contradictory data undermines the soundness of the information drawn from a noisy dataset. Traditional tools such as pie charts and bar charts are overwhelmed when used to visually identify contradictory data in multidi...
Main Authors: | Honour Chika Nwagwu, George Okereke, Chukwuemeka Nwobodo |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2017-10-01 |
Series: | Journal of Big Data |
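Subjects: | ConTra; Comma separated values; Dataset; Contradictions; Contradictory data; Mutual exclusion values |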
Online Access: | http://link.springer.com/article/10.1186/s40537-017-0100-9 |
_version_ | 1818914349875462144 |
---|---|
author | Honour Chika Nwagwu George Okereke Chukwuemeka Nwobodo |
collection | DOAJ |
description | Abstract: Big datasets are often stored in flat files and can contain contradictory data. Contradictory data undermines the soundness of the information drawn from a noisy dataset. Traditional tools such as pie charts and bar charts are overwhelmed when used to visually identify contradictory data in the multidimensional attribute values of a big dataset. This work explains the importance of identifying contradictions in a noisy dataset. It also examines how contradictory data in a large and noisy dataset can be mined and visually analysed. The authors developed ‘ConTra’, an open source application that applies a mutual exclusion rule to identify contradictory data in comma-separated values (CSV) datasets. ConTra’s ability to identify contradictory data in datasets of different sizes is examined. The results show that ConTra can process large datasets when hosted on servers with fast processors. The work also shows that ConTra is 100% accurate in identifying contradictory data for objects whose attribute values do not conform to the mutual exclusion rule of a dataset in CSV format. Different approaches through which ConTra can mine and identify contradictory data are also presented. |
first_indexed | 2024-12-19T23:44:59Z |
format | Article |
id | doaj.art-ec66885673ce48f5a500a9be0c9d0f71 |
institution | Directory of Open Access Journals |
issn | 2196-1115 |
language | English |
last_indexed | 2024-12-19T23:44:59Z |
publishDate | 2017-10-01 |
publisher | SpringerOpen |
record_format | Article |
series | Journal of Big Data |
spelling | doaj.art-ec66885673ce48f5a500a9be0c9d0f71; 2022-12-21T20:01:20Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2017-10-01; 41111; 10.1186/s40537-017-0100-9; Mining and visualising contradictory data; Honour Chika Nwagwu (Computer Science Department, University of Nigeria); George Okereke (Computer Science Department, University of Nigeria); Chukwuemeka Nwobodo (Care of Dr. Nwagwu Honour Chika, Computer Science Department, University of Nigeria); http://link.springer.com/article/10.1186/s40537-017-0100-9 |
title | Mining and visualising contradictory data |
topic | ConTra Comma separated values Dataset Contradictions Contradictory data Mutual exclusion values |
url | http://link.springer.com/article/10.1186/s40537-017-0100-9 |
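
The abstract describes applying a mutual exclusion rule to a CSV dataset to find objects whose attribute values contradict one another. The following is a minimal sketch of that general idea, not ConTra's actual implementation: the function name `find_contradictions`, the column names, the sample rows, and the set of values treated as "true" are all assumptions made for illustration. It flags any row in which more than one attribute from a mutually exclusive group is set.

```python
import csv
import io


def find_contradictions(csv_text, exclusive_groups, truthy=("1", "yes", "true")):
    """Return (row_number, object_id, conflicting_columns) for each violation.

    exclusive_groups is a list of column-name tuples; at most one column
    in each tuple may hold a truthy value for any given row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    violations = []
    for row_number, row in enumerate(reader, start=1):
        for group in exclusive_groups:
            # Collect the columns of this group that are set for this row.
            held = [
                col for col in group
                if (row.get(col) or "").strip().lower() in truthy
            ]
            # More than one set column breaks the mutual exclusion rule.
            if len(held) > 1:
                violations.append((row_number, row.get("id"), held))
    return violations


# Hypothetical sample data: each object should have exactly one status.
SAMPLE = """id,employed,unemployed,retired
a1,yes,no,no
a2,yes,yes,no
a3,no,no,yes
a4,no,yes,yes
"""

result = find_contradictions(SAMPLE, [("employed", "unemployed", "retired")])
for row_number, object_id, columns in result:
    print(f"row {row_number} ({object_id}): conflicting attributes {columns}")
```

In this sketch the rule is supplied by the caller as groups of column names, so the same check can be reused on other CSV files by changing only the `exclusive_groups` argument.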