A framework for the preservation of a docker container


Bibliographic Details
Main Authors: Emsley, I, De Roure, D
Format: Journal article
Published: Digital Curation Centre 2018
collection OXFORD
description Reliably building and maintaining systems across environments is a continuing problem. A project or experiment may run for years, during which the software, the hardware and the operating system may all change. Containerisation is a technology used by companies such as Google, Amazon and IBM, as well as by scientific projects, to deploy a set of services rapidly and repeatably. Using Dockerfiles to ensure that a container is built repeatably, and to allow conformance and easy updating when changes take place, is becoming common within projects and is seen as part of sustainable software development. Containerisation technology occupies a dual space: it is both a repository of software and software itself. Considering Docker in this fashion, we should verify that the Dockerfile can be reproduced. Using a subset of the Dockerfile specification, we create a domain-specific language to ensure that Dockerfiles can be reused at a later stage to recreate the original environment. We provide a simple framework to address the question of preserving a container and its environment. We present experiments on an existing Dockerfile and conclude with a discussion of future work. Building on this work, we implemented a pipeline that checks that a defined Dockerfile conforms to our desired model and extracts the Docker and operating system details. This helps the reproducibility of results by recreating the machine environment and package versions. It also helps development and testing by ensuring that the system is built repeatably and that any changes in the software environment can be shared through the Dockerfile. This work supports not only the citation process but also the open scientific process, by providing environmental details of the work. As part of the pipeline that creates the container, we capture the processes used and express them in the W3C PROV ontology. This provides the potential for assigning this output a persistent identifier and for tracing the processes used to preserve the metadata. Our future work will look at linking this output to a workflow ontology, to preserve the complete workflow together with the commands and parameters given to the containers. We see this provenance as useful within the build process, where it provides a complete overview of the workflow.
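The conformance check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not list which Dockerfile instructions their domain-specific language admits, so the `ALLOWED` set, the rule that the base image must carry an explicit tag, and the function names `parse_instructions` and `check_dockerfile` are all assumptions made for illustration.

```python
import re

# Hypothetical subset of Dockerfile instructions. The paper's actual
# domain-specific language is not given in this record.
ALLOWED = {"FROM", "RUN", "COPY", "ENV", "WORKDIR", "CMD", "EXPOSE", "LABEL"}


def parse_instructions(text):
    """Return (INSTRUCTION, arguments) pairs, joining backslash-continued lines."""
    joined = re.sub(r"\\\s*\n", " ", text)
    pairs = []
    for line in joined.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        args = parts[1].strip() if len(parts) > 1 else ""
        pairs.append((keyword, args))
    return pairs


def check_dockerfile(text):
    """Check a Dockerfile against the assumed subset and extract base image details."""
    pairs = parse_instructions(text)
    errors = []
    base = None

    if not pairs or pairs[0][0] != "FROM":
        errors.append("first instruction must be FROM")

    for keyword, args in pairs:
        if keyword not in ALLOWED:
            errors.append(f"instruction not in subset: {keyword}")
        if keyword == "FROM" and base is None:
            tokens = args.split()
            if not tokens:
                errors.append("FROM has no image")
                continue
            image = tokens[0]
            # A colon only marks a tag if no "/" follows it (registry ports).
            name, sep, tag = image.rpartition(":")
            if not sep or "/" in tag:
                name, tag = image, None
            if tag is None or tag == "latest":
                # Assumed rule: an unpinned base cannot be rebuilt reliably later.
                errors.append(f"base image is not pinned to a version: {image}")
            base = {"image": name, "tag": tag}

    return {"conforms": not errors, "errors": errors, "base": base}
```

Calling `check_dockerfile` on the text of a Dockerfile returns a dictionary with a `conforms` flag, a list of `errors`, and the extracted `base` image name and tag. Such a record is the kind of environmental detail that could then be written out as provenance metadata.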
first_indexed 2024-03-07T01:41:35Z
id oxford-uuid:970ad8fd-625c-4289-aa54-e8bbcfd8f291
institution University of Oxford
last_indexed 2024-03-07T01:41:35Z
record_format dspace
spelling oxford-uuid:970ad8fd-625c-4289-aa54-e8bbcfd8f291; 2022-03-26T23:56:53Z; http://purl.org/coar/resource_type/c_dcae04bc; uuid:970ad8fd-625c-4289-aa54-e8bbcfd8f291; Symplectic Elements at Oxford