Seqpare: a novel metric of similarity between genomic interval sets [version 2; peer review: 2 approved]

Searching genomic interval sets produced by sequencing methods has been widely and routinely performed; however, existing metrics for quantifying similarities among interval sets are inconsistent. Here we introduce Seqpare, a self-consistent and effective metric of similarity and tool for comparing...

Bibliographic Details
Main Authors: Selena C. Feng, Nathan C. Sheffield, Jianglin Feng
Format: Article
Language: English
Published: F1000 Research Ltd 2021-01-01
Series: F1000Research
Online Access: https://f1000research.com/articles/9-581/v2
author Selena C. Feng
Nathan C. Sheffield
Jianglin Feng
collection DOAJ
description Searching genomic interval sets produced by sequencing methods has been widely and routinely performed; however, existing metrics for quantifying similarities among interval sets are inconsistent. Here we introduce Seqpare, a self-consistent and effective metric of similarity and tool for comparing sequences based on their interval sets. With this metric, the similarity of two interval sets is quantified by a single index, the ratio of their effective overlap over the union: an index of zero indicates unrelated interval sets, and an index of one means that the interval sets are identical. Analysis and tests confirm the effectiveness and self-consistency of the Seqpare metric.
first_indexed 2024-12-14T14:50:02Z
format Article
id doaj.art-157aedf3a25a47b6ad70413f9a3a41bb
institution Directory of Open Access Journals
issn 2046-1402
language English
last_indexed 2024-12-14T14:50:02Z
publishDate 2021-01-01
publisher F1000 Research Ltd
record_format Article
series F1000Research
spelling doaj.art-157aedf3a25a47b6ad70413f9a3a41bb
2022-12-21T22:57:09Z
eng
F1000 Research Ltd
F1000Research
2046-1402
2021-01-01
9
10.12688/f1000research.23390.2
31408
Seqpare: a novel metric of similarity between genomic interval sets [version 2; peer review: 2 approved]
Selena C. Feng (0)
Nathan C. Sheffield (1)
Jianglin Feng (2)
(0) Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
(1) Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
(2) Deepstanding LLC, Crozet, VA, 22932, USA
title Seqpare: a novel metric of similarity between genomic interval sets [version 2; peer review: 2 approved]
url https://f1000research.com/articles/9-581/v2
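
The description in this record characterizes the similarity index as a ratio of overlap to union that ranges from zero (unrelated sets) to one (identical sets). The sketch below illustrates that general shape with a plain base-pair Jaccard index. It is a simplified stand-in, not the Seqpare metric itself: the record does not define the "effective overlap" that Seqpare uses, so ordinary base-pair overlap is assumed here. The function names, the half-open `(start, end)` interval convention, and the single-chromosome restriction are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: half-open intervals [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Total number of positions covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Positions shared by two merged, sorted interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard_index(a: List[Interval], b: List[Interval]) -> float:
    """Plain overlap/union ratio; NOT Seqpare's effective-overlap index."""
    ma, mb = merge(a), merge(b)
    inter = overlap_bp(ma, mb)
    union = covered_bp(ma) + covered_bp(mb) - inter
    return inter / union if union else 0.0


if __name__ == "__main__":
    print(jaccard_index([(0, 10)], [(0, 10)]))   # identical sets: 1.0
    print(jaccard_index([(0, 10)], [(20, 30)]))  # disjoint sets: 0.0
```

Like the index described in the record, this ratio is 0 for disjoint sets and 1 for identical ones. What happens between those endpoints depends on how overlap is counted, which is where the published metric differs from this sketch.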