Efficient logging and querying for blockchain-based cross-site genomic dataset access audit
Abstract. Background: Genomic data have been collected by different institutions and companies and need to be shared for broader use. In a cross-site genomic data sharing system, a secure and transparent access control audit module plays an essential role in ensuring accountability. A centralized...
Main Authors: | Shuaicheng Ma, Yang Cao, Li Xiong |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2020-07-01 |
Series: | BMC Medical Genomics |
Subjects: | Blockchain, Genome, Cross-site genomic datasets, Access log audit |
Online Access: | http://link.springer.com/article/10.1186/s12920-020-0725-y |
_version_ | 1818725882461683712 |
---|---|
author | Shuaicheng Ma Yang Cao Li Xiong |
author_facet | Shuaicheng Ma Yang Cao Li Xiong |
author_sort | Shuaicheng Ma |
collection | DOAJ |
description | Abstract. Background: Genomic data have been collected by different institutions and companies and need to be shared for broader use. In a cross-site genomic data sharing system, a secure and transparent access control audit module plays an essential role in ensuring accountability. A centralized access log audit system is vulnerable to a single point of attack and also lacks transparency, since the log could be tampered with by a malicious system administrator or internal adversaries. Several studies have proposed blockchain-based access audit to solve this problem, but without considering the efficiency of the audit queries. The first track of the 2018 iDASH competition provided us with an opportunity to design an efficient logging and querying system for cross-site genomic dataset access audit. We designed a blockchain-based log system that provides a lightweight and widely compatible module for existing blockchain platforms. The submitted solution won third place in the competition. In this paper, we report the technical details of our system. Methods: We present two methods: a baseline method and an enhanced method. We started with the baseline method and then adjusted our implementation based on the competition evaluation criteria and the characteristics of the log system. To overcome the obstacles to indexing on an immutable blockchain system, we designed a hierarchical timestamp structure that supports efficient range queries on the timestamp field. Results: We implemented our methods in Python 3, tested their scalability, and compared their performance using the test data supplied by the competition organizer. We successfully boosted the log retrieval speed for complex AND queries that contain multiple predicates. For range queries, we boosted the speed by at least one order of magnitude. Storage usage is reduced by 25%. Conclusion: We demonstrate that blockchain can be used to build a time- and space-efficient logging and querying system for genomic dataset audit trails. It therefore provides a promising solution for sharing genomic data with accountability requirements across multiple sites. |
first_indexed | 2024-12-17T21:49:22Z |
format | Article |
id | doaj.art-b404ccbd523f41e49c8d6acb09956528 |
institution | Directory Open Access Journal |
issn | 1755-8794 |
language | English |
last_indexed | 2024-12-17T21:49:22Z |
publishDate | 2020-07-01 |
publisher | BMC |
record_format | Article |
series | BMC Medical Genomics |
spelling | doaj.art-b404ccbd523f41e49c8d6acb09956528 2022-12-21T21:31:22Z eng BMC BMC Medical Genomics 1755-8794 2020-07-01 13S7113 10.1186/s12920-020-0725-y Efficient logging and querying for blockchain-based cross-site genomic dataset access audit Shuaicheng Ma (Department of Computer Science, Emory University) Yang Cao (Department of Social Informatics, Kyoto University) Li Xiong (Department of Computer Science, Emory University) http://link.springer.com/article/10.1186/s12920-020-0725-y Blockchain Genome Cross-site genomic datasets Access log audit |
spellingShingle | Shuaicheng Ma Yang Cao Li Xiong Efficient logging and querying for blockchain-based cross-site genomic dataset access audit BMC Medical Genomics Blockchain Genome Cross-site genomic datasets Access log audit |
title | Efficient logging and querying for blockchain-based cross-site genomic dataset access audit |
title_sort | efficient logging and querying for blockchain based cross site genomic dataset access audit |
topic | Blockchain Genome Cross-site genomic datasets Access log audit |
url | http://link.springer.com/article/10.1186/s12920-020-0725-y |
work_keys_str_mv | AT shuaichengma efficientloggingandqueryingforblockchainbasedcrosssitegenomicdatasetaccessaudit AT yangcao efficientloggingandqueryingforblockchainbasedcrosssitegenomicdatasetaccessaudit AT lixiong efficientloggingandqueryingforblockchainbasedcrosssitegenomicdatasetaccessaudit |
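
The abstract mentions a hierarchical timestamp structure that supports efficient range queries on an immutable store, but this record gives no further detail. The following is only a minimal sketch of one way such a structure could work, not the authors' implementation: each log entry is appended under one key per time granularity, and a range query is decomposed into the fewest aligned buckets that cover it exactly. The class name, the bucket widths in `WIDTHS`, and the in-memory dictionary standing in for a blockchain key-value stream are all assumptions made for illustration.

```python
from collections import defaultdict

# Bucket widths in seconds, coarsest first (assumed values for illustration).
# Each width divides the next coarser one, so buckets nest cleanly.
WIDTHS = (86400, 3600, 60, 1)


class HierarchicalTimestampIndex:
    """Hypothetical append-only index keyed by (bucket width, bucket number)."""

    def __init__(self):
        # Append-only map standing in for a blockchain key-value stream.
        self._store = defaultdict(list)
        self.lookups = 0  # counts key lookups made by range queries

    def insert(self, timestamp, record):
        # Publish the record once per granularity; nothing is ever updated.
        for width in WIDTHS:
            self._store[(width, timestamp // width)].append((timestamp, record))

    def _cover(self, start, end):
        # Yield disjoint (width, bucket) keys that cover [start, end] exactly,
        # always taking the coarsest aligned bucket that still fits.
        t = start
        while t <= end:
            for width in WIDTHS:
                if t % width == 0 and t + width - 1 <= end:
                    yield (width, t // width)
                    t += width
                    break

    def range_query(self, start, end):
        # Return all (timestamp, record) pairs with start <= timestamp <= end.
        results = []
        for key in self._cover(start, end):
            self.lookups += 1
            results.extend(self._store.get(key, []))
        return sorted(results, key=lambda item: item[0])
```

In this sketch a query spanning two whole days needs two key lookups instead of one lookup per second. The trade-off is that every entry is written once per level, so a design aiming at the storage reduction reported in the abstract would need a more compact encoding than this one.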