Identifying elemental genomic track types and representing them uniformly

Bibliographic Details
Main Authors: Gundersen Sveinung, Kalaš Matúš, Abul Osman, Frigessi Arnoldo, Hovig Eivind, Sandve Geir
Format: Article
Language: English
Published: BMC 2011-12-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/12/494
author Gundersen Sveinung
Kalaš Matúš
Abul Osman
Frigessi Arnoldo
Hovig Eivind
Sandve Geir
collection DOAJ
description Abstract
Background: With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated.
Results: We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0.
Conclusions: The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience.
first_indexed 2024-12-10T12:28:12Z
format Article
id doaj.art-0022b44dc1354015be97db49d23bc8c6
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-10T12:28:12Z
publishDate 2011-12-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-0022b44dc1354015be97db49d23bc8c6 2022-12-22T01:48:53Z eng BMC BMC Bioinformatics 1471-2105 2011-12-01 12 1 494 10.1186/1471-2105-12-494 Identifying elemental genomic track types and representing them uniformly Gundersen Sveinung Kalaš Matúš Abul Osman Frigessi Arnoldo Hovig Eivind Sandve Geir http://www.biomedcentral.com/1471-2105/12/494
title Identifying elemental genomic track types and representing them uniformly
url http://www.biomedcentral.com/1471-2105/12/494