CRBF: Cross-Referencing Bloom-Filter-Based Data Integrity Verification Framework for Object-Based Big Data Transfer Systems

Various components are involved in the end-to-end path of data transfer. Protecting data integrity from failures in these intermediate components is a key feature of big data transfer tools. Although most of these components provide some degree of data integrity, they are either too expensive or ine...

Bibliographic Details
Main Authors: Preethika Kasu, Prince Hamandawana, Tae-Sun Chung
Format: Article
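Language: English
Published: MDPI AG 2023-07-01
Series: Applied Sciences
Subjects: data integrity; Bloom filters; probabilistic structures; false-positive errors; distributed systems; high-performance computing
Online Access: https://www.mdpi.com/2076-3417/13/13/7830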
_version_ 1827735122888097792
author Preethika Kasu
Prince Hamandawana
Tae-Sun Chung
collection DOAJ
description Various components are involved in the end-to-end path of data transfer. Protecting data integrity from failures in these intermediate components is a key feature of big data transfer tools. Although most of these components provide some degree of data integrity, they are either too expensive or inefficient in recovering corrupted data. This problem highlights the need for application-level end-to-end integrity verification during data transfer. However, because the data are large, the computational, memory, and storage overhead of integrity verification can become a significant bottleneck for big data transfer tools. This paper proposes a novel framework for data integrity verification in big data transfer systems using a cross-referencing Bloom filter. This framework has three advantages over state-of-the-art data integrity techniques: lower computation overhead, lower memory overhead, and zero false-positive errors for a limited number of elements. This study evaluates the computation, memory, recovery time, and false-positive overhead of the proposed framework and compares them with state-of-the-art solutions. The evaluation results indicate that the proposed framework detects and recovers from integrity errors efficiently while eliminating false positives in the Bloom filter data structure. In addition, we observe negligible computation, memory, and recovery overheads for all workloads.
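The abstract does not spell out how the cross-referencing scheme works, so the following is only a minimal sketch of the underlying idea with a plain Bloom filter, not the CRBF design itself. It assumes the sender inserts a SHA-256 digest of each object into a filter and the receiver flags any received object whose digest is absent. All names (`BloomFilter`, `object_digest`, `corrupted_objects`) and the parameters `m_bits` and `k_hashes` are hypothetical and chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter with m bits and k hash positions per element.

    Illustrative only: this is a standard Bloom filter, not the
    cross-referencing variant proposed in the paper.
    """

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def object_digest(payload: bytes) -> bytes:
    """Digest of one transferred object (assumed unit of verification)."""
    return hashlib.sha256(payload).digest()


def corrupted_objects(received, sender_filter):
    """Indices of received objects whose digest is absent from the sender's filter."""
    return [i for i, obj in enumerate(received) if object_digest(obj) not in sender_filter]


# Sender side: record the digest of every object before transfer.
objects = [b"object-0", b"object-1", b"object-2"]
sender_filter = BloomFilter(m_bits=4096, k_hashes=4)
for obj in objects:
    sender_filter.add(object_digest(obj))

# Receiver side: one object was altered in transit.
received = [objects[0], b"object-1-corrupted", objects[2]]
suspect = corrupted_objects(received, sender_filter)
```

A plain Bloom filter never reports a false negative, so an intact object always passes the check. A corrupted object can slip through only when its digest collides with bits already set (a false positive), which is the error class the abstract says the proposed framework eliminates for a limited number of elements.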
first_indexed 2024-03-11T01:45:56Z
format Article
id doaj.art-372de2e728274472a388d8f1933af709
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-11T01:45:56Z
publishDate 2023-07-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-372de2e728274472a388d8f1933af709 2023-11-18T16:12:04Z eng
MDPI AG, Applied Sciences, ISSN 2076-3417, 2023-07-01, vol. 13, no. 13, article 7830
DOI 10.3390/app13137830
CRBF: Cross-Referencing Bloom-Filter-Based Data Integrity Verification Framework for Object-Based Big Data Transfer Systems
Preethika Kasu (Department of Artificial Intelligence, Ajou University, Suwon 16499, Republic of Korea)
Prince Hamandawana (Department of Software, Ajou University, Suwon 16499, Republic of Korea)
Tae-Sun Chung (Department of Artificial Intelligence, Ajou University, Suwon 16499, Republic of Korea)
https://www.mdpi.com/2076-3417/13/13/7830
data integrity; Bloom filters; probabilistic structures; false-positive errors; distributed systems; high-performance computing
title CRBF: Cross-Referencing Bloom-Filter-Based Data Integrity Verification Framework for Object-Based Big Data Transfer Systems
topic data integrity
Bloom filters
probabilistic structures
false-positive errors
distributed systems
high-performance computing
url https://www.mdpi.com/2076-3417/13/13/7830