FDup: a framework for general-purpose and efficient entity deduplication of record collections
Deduplication is a technique aiming at identifying and resolving duplicate metadata records in a collection. This article describes FDup (Flat Collections Deduper), a general-purpose software framework supporting a complete deduplication workflow to manage big data record collections: metadata recor...
Main Authors: Michele De Bonis, Paolo Manghi, Claudio Atzori
Format: Article
Language: English
Published: PeerJ Inc., 2022-09-01
Series: PeerJ Computer Science
Online Access: https://peerj.com/articles/cs-1058.pdf
Similar Items
- Using Deduplicating Storage for Efficient Disk Image Deployment
  by: Xing Lin, et al.
  Published: (2015-08-01)
- Time-conserving deduplicated data retrieval framework for the cloud computing environment
  by: P. Swathika, et al.
  Published: (2023-10-01)
- Survey on Deduplication Techniques in Flash-Based Storage
  by: Ilya Chernov, et al.
  Published: (2018-05-01)
- Data Deduplication System Based on Content-Defined Chunking Using Bytes Pair Frequency Occurrence
  by: Ahmed Sardar M. Saeed, et al.
  Published: (2020-11-01)
- A Record Linkage-Based Data Deduplication Framework with DataCleaner Extension
  by: Otmane Azeroual, et al.
  Published: (2022-04-01)