SparkDWM: a scalable design of a Data Washing Machine using Apache Spark

Data volume has been one of the fastest-growing assets of most real-world applications. This increases the rate of human errors such as duplication of records, misspellings, and erroneous transpositions, among other data quality issues. Entity Resolution is an ETL process that aims to resolve data inco...

Bibliographic Details
Main Authors: Nicholas Kofi Akortia Hagan, John R. Talburt
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-09-01
Series: Frontiers in Big Data
Online Access: https://www.frontiersin.org/articles/10.3389/fdata.2024.1446071/full