FDup: a framework for general-purpose and efficient entity deduplication of record collections

Deduplication is a technique aiming at identifying and resolving duplicate metadata records in a collection. This article describes FDup (Flat Collections Deduper), a general-purpose software framework supporting a complete deduplication workflow to manage big data record collections: metadata recor...

Bibliographic Details
Main Authors: Michele De Bonis, Paolo Manghi, Claudio Atzori
Format: Article
Language: English
Published: PeerJ Inc. 2022-09-01
Series: PeerJ Computer Science
Online Access: https://peerj.com/articles/cs-1058.pdf