Efficient privacy-preserving variable-length substring match for genome sequence

Abstract The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies...

Full description

Bibliographic Details
Main Authors: Yoshiki Nakagawa, Satsuya Ohata, Kana Shimizu
Format: Article
Language:English
Published: BMC 2022-04-01
Series:Algorithms for Molecular Biology
Subjects:
Online Access:https://doi.org/10.1186/s13015-022-00211-1
_version_ 1811289658619854848
author Yoshiki Nakagawa
Satsuya Ohata
Kana Shimizu
author_facet Yoshiki Nakagawa
Satsuya Ohata
Kana Shimizu
author_sort Yoshiki Nakagawa
collection DOAJ
description Abstract The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that $$V[V[\ldots V[p_0] \ldots ]]$$ V [ V [ … V [ p 0 ] … ] ] is computed for a given depth of recursion where $$p_0$$ p 0 is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment.
first_indexed 2024-04-13T03:59:31Z
format Article
id doaj.art-108657317ce04001baf4c1816de84486
institution Directory Open Access Journal
issn 1748-7188
language English
last_indexed 2024-04-13T03:59:31Z
publishDate 2022-04-01
publisher BMC
record_format Article
series Algorithms for Molecular Biology
spelling doaj.art-108657317ce04001baf4c1816de844862022-12-22T03:03:31ZengBMCAlgorithms for Molecular Biology1748-71882022-04-0117112210.1186/s13015-022-00211-1Efficient privacy-preserving variable-length substring match for genome sequenceYoshiki Nakagawa0Satsuya Ohata1Kana Shimizu2Department of Computer Science and Engineering, Waseda UniversitySelf-employmentDepartment of Computer Science and Engineering, Waseda UniversityAbstract The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that $$V[V[\ldots V[p_0] \ldots ]]$$ V [ V [ … V [ p 0 ] … ] ] is computed for a given depth of recursion where $$p_0$$ p 0 is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment.https://doi.org/10.1186/s13015-022-00211-1Private genome sequence searchSecure multiparty computationSecret sharingFM-indexSuffix arrayLCP array
spellingShingle Yoshiki Nakagawa
Satsuya Ohata
Kana Shimizu
Efficient privacy-preserving variable-length substring match for genome sequence
Algorithms for Molecular Biology
Private genome sequence search
Secure multiparty computation
Secret sharing
FM-index
Suffix array
LCP array
title Efficient privacy-preserving variable-length substring match for genome sequence
title_full Efficient privacy-preserving variable-length substring match for genome sequence
title_fullStr Efficient privacy-preserving variable-length substring match for genome sequence
title_full_unstemmed Efficient privacy-preserving variable-length substring match for genome sequence
title_short Efficient privacy-preserving variable-length substring match for genome sequence
title_sort efficient privacy preserving variable length substring match for genome sequence
topic Private genome sequence search
Secure multiparty computation
Secret sharing
FM-index
Suffix array
LCP array
url https://doi.org/10.1186/s13015-022-00211-1
work_keys_str_mv AT yoshikinakagawa efficientprivacypreservingvariablelengthsubstringmatchforgenomesequence
AT satsuyaohata efficientprivacypreservingvariablelengthsubstringmatchforgenomesequence
AT kanashimizu efficientprivacypreservingvariablelengthsubstringmatchforgenomesequence