BlockHDFS: Blockchain-integrated Hadoop distributed file system for secure provenance traceability

Hadoop Distributed File System (HDFS) is one of the widely used distributed file systems in big data analysis for frameworks such as Hadoop. HDFS allows one to manage large volumes of data using low-cost commodity hardware. However, vulnerabilities in HDFS can be exploited for nefarious activities....

Bibliographic Details
Main Authors: Viraaji Mothukuri, Sai S. Cheerla, Reza M. Parizi, Qi Zhang, Kim-Kwang Raymond Choo
Format: Article
Language: English
Published: Elsevier 2021-12-01
Series: Blockchain: Research and Applications
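Subjects: Big data; Hadoop; Blockchain; Hyperledger fabric; Hadoop distributed file system (HDFS); Traceability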
Online Access: http://www.sciencedirect.com/science/article/pii/S2096720921000270
author Viraaji Mothukuri
Sai S. Cheerla
Reza M. Parizi
Qi Zhang
Kim-Kwang Raymond Choo
collection DOAJ
description Hadoop Distributed File System (HDFS) is one of the widely used distributed file systems in big data analysis for frameworks such as Hadoop. HDFS allows one to manage large volumes of data using low-cost commodity hardware. However, vulnerabilities in HDFS can be exploited for nefarious activities. This reinforces the importance of ensuring robust security to facilitate file sharing in Hadoop as well as having a trusted mechanism to check the authenticity of shared files. This is the focus of this paper, where we aim to improve the security of HDFS using a blockchain-enabled approach (hereafter referred to as BlockHDFS). Specifically, the proposed BlockHDFS uses the enterprise-level Hyperledger Fabric platform to capitalize on files' metadata for building trusted data security and traceability in HDFS.
first_indexed 2024-12-19T02:51:30Z
format Article
id doaj.art-a805439e18984d79a5adf5606262e3da
institution Directory of Open Access Journals
issn 2666-9536
language English
last_indexed 2024-12-19T02:51:30Z
publishDate 2021-12-01
publisher Elsevier
record_format Article
series Blockchain: Research and Applications
spelling doaj.art-a805439e18984d79a5adf5606262e3da; 2022-12-21T20:38:37Z; eng; Elsevier; Blockchain: Research and Applications; ISSN 2666-9536; 2021-12-01; volume 2, issue 4, article 100032
Viraaji Mothukuri, College of Computing and Software Engineering, Kennesaw State University, Kennesaw, GA, 30144, USA
Sai S. Cheerla, College of Computing and Software Engineering, Kennesaw State University, Kennesaw, GA, 30144, USA
Reza M. Parizi, College of Computing and Software Engineering, Kennesaw State University, Kennesaw, GA, 30144, USA
Qi Zhang, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Kim-Kwang Raymond Choo (corresponding author), Department of Information Systems and Cyber Security, University of Texas at San Antonio, San Antonio, TX, 78249, USA
title BlockHDFS: Blockchain-integrated Hadoop distributed file system for secure provenance traceability
topic Big data
Hadoop
Blockchain
Hyperledger fabric
Hadoop distributed file system (HDFS)
Traceability
url http://www.sciencedirect.com/science/article/pii/S2096720921000270