Mappability and Read Length
Power-law distributions are the main functional form for the distribution of repeat size and repeat copy number in the human genome. When the genome is broken into fragments for sequencing, the limited size of fragments and reads may prevent a unique alignment of repeat sequences to the reference seq...
Main Authors: | Wentian Li, Jan Freudenberg |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2014-11-01 |
Series: | Frontiers in Genetics |
Subjects: | Next-generation sequencing; Repeats; Copy Number Variations; power-law distribution; mappability |
Online Access: | http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00381/full |
_version_ | 1818196797667934208 |
---|---|
author | Wentian Li; Jan Freudenberg |
author_facet | Wentian Li; Jan Freudenberg |
author_sort | Wentian Li |
collection | DOAJ |
description | Power-law distributions are the main functional form for the distribution of repeat size and repeat copy number in the human genome. When the genome is broken into fragments for sequencing, the limited size of fragments and reads may prevent a unique alignment of repeat sequences to the reference sequence. Repeats in the human genome can be as long as $10^4$ bases, or $10^5$ to $10^6$ bases when allowing for mismatches between repeat units. Sequence reads from these regions are therefore unmappable when the read length is in the range of $10^3$ bases. With a read length of exactly 1000 bases, slightly more than 1% of the assembled genome, and slightly less than 1% of the 1 kb reads, are unmappable, excluding the unassembled portion of the human genome (8% in GRCh37). The slow decay (long tail) of the power-law function implies a diminishing return in converting unmappable regions/reads to mappable ones as the read length increases, with the understanding that increasing the read length always moves mappability towards 100%. |
first_indexed | 2024-12-12T01:39:48Z |
format | Article |
id | doaj.art-41dc01250a904de4987048ba1f18ba41 |
institution | Directory Open Access Journal |
issn | 1664-8021 |
language | English |
last_indexed | 2024-12-12T01:39:48Z |
publishDate | 2014-11-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-41dc01250a904de4987048ba1f18ba41; 2022-12-22T00:42:45Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2014-11-01; 5; 10.3389/fgene.2014.00381; 110803; Mappability and Read Length; Wentian Li (0); Jan Freudenberg (1); Feinstein Institute for Medical Research, North Shore LIJ Health System; Feinstein Institute for Medical Research, North Shore LIJ Health System; http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00381/full; Next-generation sequencing; Repeats; Copy Number Variations; power-law distribution; mappability |
title | Mappability and Read Length |
topic | Next-generation sequencing; Repeats; Copy Number Variations; power-law distribution; mappability |
url | http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00381/full |
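
The diminishing return described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' method: it assumes, purely for illustration, that the unmappable fraction itself falls off as a power law in read length, anchored at roughly 1% for 1000-base reads (the approximate figure quoted in the abstract). The exponent `alpha` and both function names are hypothetical and are not taken from the article.

```python
def unmappable_fraction(read_length, ref_length=1000.0, ref_fraction=0.01, alpha=1.0):
    """Toy model: unmappable fraction as a power law in read length.

    ref_length / ref_fraction anchor the curve at about 1% for 1000-base
    reads (approximate value from the abstract). alpha is a hypothetical
    exponent chosen for illustration only.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return ref_fraction * (read_length / ref_length) ** (-alpha)


def length_for_fraction(target_fraction, ref_length=1000.0, ref_fraction=0.01, alpha=1.0):
    """Invert the toy model: read length needed to reach a target unmappable fraction."""
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    return ref_length * (ref_fraction / target_fraction) ** (1.0 / alpha)


if __name__ == "__main__":
    # Each doubling of read length removes a smaller absolute share of
    # unmappable sequence than the previous doubling did.
    for length in (1_000, 2_000, 4_000, 8_000, 16_000):
        for alpha in (0.5, 1.0):
            frac = unmappable_fraction(length, alpha=alpha)
            print(f"read length {length:>6}, alpha {alpha}: unmappable {frac:.4%}")
```

Under this assumed form, cutting the unmappable fraction by a factor of 10 requires multiplying the read length by `10 ** (1 / alpha)`, so a smaller exponent (a heavier tail) makes each further gain in mappability more expensive, while the fraction still approaches zero as read length grows.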