BinaryCIF and CIFTools-Lightweight, efficient and extensible macromolecular data management.

3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working w...

Bibliographic Details
Main Authors: David Sehnal, Sebastian Bittrich, Sameer Velankar, Jaroslav Koča, Radka Svobodová, Stephen K Burley, Alexander S Rose
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-10-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1008247
author David Sehnal
Sebastian Bittrich
Sameer Velankar
Jaroslav Koča
Radka Svobodová
Stephen K Burley
Alexander S Rose
author_sort David Sehnal
collection DOAJ
description 3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analyses, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: factors of ten and four versus CIF files and gzipped CIF files, respectively. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
first_indexed 2024-12-23T19:23:42Z
format Article
id doaj.art-313803daed7c480ca2e26869858d8f23
institution Directory Open Access Journal
issn 1553-734X
1553-7358
language English
last_indexed 2024-12-23T19:23:42Z
publishDate 2020-10-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
title BinaryCIF and CIFTools-Lightweight, efficient and extensible macromolecular data management.
url https://doi.org/10.1371/journal.pcbi.1008247