FDup: a framework for general-purpose and efficient entity deduplication of record collections
Deduplication is a technique aiming at identifying and resolving duplicate metadata records in a collection. This article describes FDup (Flat Collections Deduper), a general-purpose software framework supporting a complete deduplication workflow to manage big data record collections: metadata recor...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
PeerJ Inc.
2022-09-01
|
Series: | PeerJ Computer Science |
Subjects: | |
Online Access: | https://peerj.com/articles/cs-1058.pdf |