Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library

Abstract. Background: The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about t...


Bibliographic Details
Main Author: Page Roderic DM
Format: Article
Language: English
Published: BMC 2011-05-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/12/187
_version_ 1818540364510789632
author Page Roderic DM
collection DOAJ
description Abstract. Background: The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. Description: A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article locating service is exposed as a standard OpenURL resolver on the BioStor web site (http://biostor.org/openurl/). This resolver can be used on the web, or called by bibliographic tools that support OpenURL. Conclusions: BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/.
first_indexed 2024-12-11T21:54:22Z
format Article
id doaj.art-e32c235dba2547cab58b19cddfb55fc1
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-11T21:54:22Z
publishDate 2011-05-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-e32c235dba2547cab58b19cddfb55fc1 | 2022-12-22T00:49:22Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2011-05-01 | 12 | 1 | 187 | 10.1186/1471-2105-12-187 | Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library | Page Roderic DM | Abstract. Background: The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. Description: A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article locating service is exposed as a standard OpenURL resolver on the BioStor web site (http://biostor.org/openurl/). This resolver can be used on the web, or called by bibliographic tools that support OpenURL. Conclusions: BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/. | http://www.biomedcentral.com/1471-2105/12/187
title Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library
url http://www.biomedcentral.com/1471-2105/12/187
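
The abstract states that the article locating service is exposed as an OpenURL resolver at http://biostor.org/openurl/ and can be called by bibliographic tools. As a minimal sketch of what such a call might look like, the Python below builds a resolver URL from citation metadata. The resolver address comes from the abstract; the key names (genre, title, volume, spage, date, aulast) are common OpenURL-style citation keys, and it is an assumption here that the resolver accepts exactly these keys. The function name `build_openurl` and the sample citation are hypothetical and only illustrate the idea.

```python
from urllib.parse import urlencode

# Resolver address as given in the abstract.
RESOLVER = "http://biostor.org/openurl/"


def build_openurl(citation: dict) -> str:
    """Return a resolver URL with the citation encoded as a query string.

    Assumption: the resolver accepts common OpenURL-style keys such as
    genre, title (journal title), volume, spage (start page), date, aulast.
    """
    # Skip fields that are missing or empty so they do not appear in the query.
    params = {k: str(v) for k, v in citation.items() if v not in (None, "")}
    return RESOLVER + "?" + urlencode(params)


# Hypothetical citation, used only to show the shape of the request.
example = {
    "genre": "article",
    "title": "Example Journal of Zoology",
    "volume": 12,
    "spage": 187,
    "date": 1905,
    "aulast": "",  # unknown author surname: dropped from the query
}

url = build_openurl(example)
print(url)
```

A bibliographic tool would then request this URL over HTTP. The format of the response is not described in this record, so handling it is left out of the sketch.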