Efficient privacy-preserving variable-length substring match for genome sequence

Abstract The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies...

Full description

Bibliographic Details
Main Authors:	Yoshiki Nakagawa, Satsuya Ohata, Kana Shimizu
Format:	Article
Language:	English
Published:	BMC 2022-04-01
Series:	Algorithms for Molecular Biology
Subjects:	Private genome sequence search Secure multiparty computation Secret sharing FM-index Suffix array LCP array
Online Access:	https://doi.org/10.1186/s13015-022-00211-1

_version_	1811289658619854848
author	Yoshiki Nakagawa Satsuya Ohata Kana Shimizu
author_facet	Yoshiki Nakagawa Satsuya Ohata Kana Shimizu
author_sort	Yoshiki Nakagawa
collection	DOAJ
description	Abstract The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that $$V[V[\ldots V[p_0] \ldots ]]$$ V [ V [ … V [ p 0 ] … ] ] is computed for a given depth of recursion where $$p_0$$ p 0 is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment.
first_indexed	2024-04-13T03:59:31Z
format	Article
id	doaj.art-108657317ce04001baf4c1816de84486
institution	Directory Open Access Journal
issn	1748-7188
language	English
last_indexed	2024-04-13T03:59:31Z
publishDate	2022-04-01
publisher	BMC
record_format	Article
series	Algorithms for Molecular Biology
spelling	doaj.art-108657317ce04001baf4c1816de844862022-12-22T03:03:31ZengBMCAlgorithms for Molecular Biology1748-71882022-04-0117112210.1186/s13015-022-00211-1Efficient privacy-preserving variable-length substring match for genome sequenceYoshiki Nakagawa0Satsuya Ohata1Kana Shimizu2Department of Computer Science and Engineering, Waseda UniversitySelf-employmentDepartment of Computer Science and Engineering, Waseda UniversityAbstract The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that $$V[V[\ldots V[p_0] \ldots ]]$$ V [ V [ … V [ p 0 ] … ] ] is computed for a given depth of recursion where $$p_0$$ p 0 is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment.https://doi.org/10.1186/s13015-022-00211-1Private genome sequence searchSecure multiparty computationSecret sharingFM-indexSuffix arrayLCP array
spellingShingle	Yoshiki Nakagawa Satsuya Ohata Kana Shimizu Efficient privacy-preserving variable-length substring match for genome sequence Algorithms for Molecular Biology Private genome sequence search Secure multiparty computation Secret sharing FM-index Suffix array LCP array
title	Efficient privacy-preserving variable-length substring match for genome sequence
title_full	Efficient privacy-preserving variable-length substring match for genome sequence
title_fullStr	Efficient privacy-preserving variable-length substring match for genome sequence
title_full_unstemmed	Efficient privacy-preserving variable-length substring match for genome sequence
title_short	Efficient privacy-preserving variable-length substring match for genome sequence
title_sort	efficient privacy preserving variable length substring match for genome sequence
topic	Private genome sequence search Secure multiparty computation Secret sharing FM-index Suffix array LCP array
url	https://doi.org/10.1186/s13015-022-00211-1
work_keys_str_mv	AT yoshikinakagawa efficientprivacypreservingvariablelengthsubstringmatchforgenomesequence AT satsuyaohata efficientprivacypreservingvariablelengthsubstringmatchforgenomesequence AT kanashimizu efficientprivacypreservingvariablelengthsubstringmatchforgenomesequence

Efficient privacy-preserving variable-length substring match for genome sequence

Similar Items