A machine reading system for assembling synthetic paleontological databases.
Main Authors: | Shanan E Peters, Ce Zhang, Miron Livny, Christopher Ré |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2014-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC4250071?pdf=render |
author | Shanan E Peters, Ce Zhang, Miron Livny, Christopher Ré |
---|---|
collection | DOAJ |
description | Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of paleontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance with new data types. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and extracts data from heterogeneous text, tables, and figures in publications. PaleoDeepDive performs comparably to humans in several complex data extraction and inference tasks and generates congruent synthetic results that describe the geological history of taxonomic diversity and genus-level rates of origination and extinction. Unlike traditional databases, PaleoDeepDive produces a probabilistic database that systematically improves as information is added. We show that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions. Our machine reading approach to scientific data integration and synthesis brings within reach many questions that are currently underdetermined and does so in ways that may stimulate entirely new modes of inquiry. |
first_indexed | 2024-12-14T11:44:48Z |
format | Article |
id | doaj.art-c8c5a154049646a8b8a734cac747c7f5 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-12-14T11:44:48Z |
publishDate | 2014-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
citation | PLoS ONE 9(12): e113523 |
doi | 10.1371/journal.pone.0113523 |
title | A machine reading system for assembling synthetic paleontological databases. |
url | http://europepmc.org/articles/PMC4250071?pdf=render |