Efficient privacy-preserving variable-length substring match for genome sequence
Abstract The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
BMC
2022-04-01
|
Series: | Algorithms for Molecular Biology |
Subjects: | |
Online Access: | https://doi.org/10.1186/s13015-022-00211-1 |
_version_ | 1811289658619854848 |
---|---|
author | Yoshiki Nakagawa Satsuya Ohata Kana Shimizu |
author_facet | Yoshiki Nakagawa Satsuya Ohata Kana Shimizu |
author_sort | Yoshiki Nakagawa |
collection | DOAJ |
description | Abstract The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that $$V[V[\ldots V[p_0] \ldots ]]$$ V [ V [ … V [ p 0 ] … ] ] is computed for a given depth of recursion where $$p_0$$ p 0 is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment. |
first_indexed | 2024-04-13T03:59:31Z |
format | Article |
id | doaj.art-108657317ce04001baf4c1816de84486 |
institution | Directory Open Access Journal |
issn | 1748-7188 |
language | English |
last_indexed | 2024-04-13T03:59:31Z |
publishDate | 2022-04-01 |
publisher | BMC |
record_format | Article |
series | Algorithms for Molecular Biology |
spelling | doaj.art-108657317ce04001baf4c1816de844862022-12-22T03:03:31ZengBMCAlgorithms for Molecular Biology1748-71882022-04-0117112210.1186/s13015-022-00211-1Efficient privacy-preserving variable-length substring match for genome sequenceYoshiki Nakagawa0Satsuya Ohata1Kana Shimizu2Department of Computer Science and Engineering, Waseda UniversitySelf-employmentDepartment of Computer Science and Engineering, Waseda UniversityAbstract The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that $$V[V[\ldots V[p_0] \ldots ]]$$ V [ V [ … V [ p 0 ] … ] ] is computed for a given depth of recursion where $$p_0$$ p 0 is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment.https://doi.org/10.1186/s13015-022-00211-1Private genome sequence searchSecure multiparty computationSecret sharingFM-indexSuffix arrayLCP array |
spellingShingle | Yoshiki Nakagawa Satsuya Ohata Kana Shimizu Efficient privacy-preserving variable-length substring match for genome sequence Algorithms for Molecular Biology Private genome sequence search Secure multiparty computation Secret sharing FM-index Suffix array LCP array |
title | Efficient privacy-preserving variable-length substring match for genome sequence |
title_full | Efficient privacy-preserving variable-length substring match for genome sequence |
title_fullStr | Efficient privacy-preserving variable-length substring match for genome sequence |
title_full_unstemmed | Efficient privacy-preserving variable-length substring match for genome sequence |
title_short | Efficient privacy-preserving variable-length substring match for genome sequence |
title_sort | efficient privacy preserving variable length substring match for genome sequence |
topic | Private genome sequence search Secure multiparty computation Secret sharing FM-index Suffix array LCP array |
url | https://doi.org/10.1186/s13015-022-00211-1 |
work_keys_str_mv | AT yoshikinakagawa efficientprivacypreservingvariablelengthsubstringmatchforgenomesequence AT satsuyaohata efficientprivacypreservingvariablelengthsubstringmatchforgenomesequence AT kanashimizu efficientprivacypreservingvariablelengthsubstringmatchforgenomesequence |