Efficient logging and querying for blockchain-based cross-site genomic dataset access audit

Bibliographic Details
Main Authors: Shuaicheng Ma, Yang Cao, Li Xiong
Format: Article
Language: English
Published: BMC 2020-07-01
Series: BMC Medical Genomics
Online Access: http://link.springer.com/article/10.1186/s12920-020-0725-y
Collection: DOAJ (Directory of Open Access Journals)

Abstract

Background. Genomic data have been collected by different institutions and companies and need to be shared for broader use. In a cross-site genomic data sharing system, a secure and transparent access control audit module plays an essential role in ensuring accountability. A centralized access log audit system is vulnerable to a single point of attack and also lacks transparency, since the log could be tampered with by a malicious system administrator or internal adversaries. Several studies have proposed blockchain-based access audit to solve this problem, but without considering the efficiency of audit queries. The first track of the 2018 iDASH competition gave us an opportunity to design an efficient logging and querying system for cross-site genomic dataset access audit. We designed a blockchain-based log system that provides a lightweight and widely compatible module for existing blockchain platforms. The submitted solution won third place in the competition. In this paper, we report the technical details of our system.

Methods. We present two methods: a baseline method and an enhanced method. We started with the baseline method and then adjusted our implementation based on the competition evaluation criteria and the characteristics of the log system. To overcome the obstacles of indexing on an immutable blockchain system, we designed a hierarchical timestamp structure that supports efficient range queries on the timestamp field.

Results. We implemented our methods in Python 3, tested their scalability, and compared their performance using the test data supplied by the competition organizer. We boosted log retrieval speed for complex AND queries that contain multiple predicates. For range queries, we boosted speed by at least one order of magnitude. Storage usage is reduced by 25%.

Conclusion. We demonstrate that blockchain can be used to build a time- and space-efficient audit trail for logging and querying genomic dataset access. It therefore provides a promising solution for sharing genomic data across multiple sites with accountability requirements.
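
The Methods paragraph mentions a hierarchical timestamp structure that supports range queries on an immutable, append-only log, but this record does not describe how that structure is organized. The Python sketch below shows one plausible reading, under stated assumptions: each appended record is registered in a bucket at several time granularities, and a range query is greedily split into the largest aligned buckets that fit, so most of a long range is answered from a few coarse buckets instead of a full scan. The class name `HierarchicalTimestampIndex`, the day/hour/minute/second levels, and integer Unix-second timestamps are hypothetical choices for illustration, not the authors' design.

```python
from collections import defaultdict

# Hypothetical bucket granularities in seconds, coarsest first (day, hour,
# minute, second). Each level must evenly divide the level above it.
LEVELS = (86400, 3600, 60, 1)


class HierarchicalTimestampIndex:
    """Maps time buckets at several granularities to log record IDs.

    Illustrative sketch only (hypothetical design, not the paper's).
    Timestamps are assumed to be integer seconds.
    """

    def __init__(self, levels=LEVELS):
        self.levels = levels
        # buckets[granularity][bucket_number] -> list of record IDs
        self.buckets = {g: defaultdict(list) for g in levels}

    def add(self, timestamp, record_id):
        # The log is append-only, so the index only ever grows: a record is
        # registered once per level, in the bucket containing its timestamp.
        for g in self.levels:
            self.buckets[g][timestamp // g].append(record_id)

    def cover(self, start, end):
        """Greedily split [start, end) into the largest aligned buckets."""
        pieces = []
        t = start
        while t < end:
            for g in self.levels:
                # A bucket of size g starting at t is usable only if t is
                # aligned to g and the bucket does not overrun the range.
                # The 1-second level always qualifies, so the loop advances.
                if t % g == 0 and t + g <= end:
                    pieces.append((g, t // g))
                    t += g
                    break
        return pieces

    def range_query(self, start, end):
        """Return IDs of records with start <= timestamp < end."""
        result = []
        # Covering buckets are disjoint in time, so no ID is returned twice.
        for g, bucket in self.cover(start, end):
            # .get avoids inserting empty buckets into the defaultdict.
            result.extend(self.buckets[g].get(bucket, []))
        return result
```

For example, after adding records with timestamps 10, 3599, and 3600, `range_query(0, 7200)` is answered from two hour-level buckets rather than by examining each record. Because buckets are only ever appended to, nothing already written has to be modified, which is the property an index over an immutable log needs.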

Additional Details
ISSN: 1755-8794
Volume 13, Issue S7
DOI: 10.1186/s12920-020-0725-y
Affiliations: Shuaicheng Ma, Department of Computer Science, Emory University; Yang Cao, Department of Social Informatics, Kyoto University; Li Xiong, Department of Computer Science, Emory University
Subjects: Blockchain; Genome; Cross-site genomic datasets; Access log audit