A 1.2 Billion Pixel Human-Labeled Dataset for Data-Driven Classification of Coastal Environments


Bibliographic Details
Main Authors: Daniel Buscombe, Phillipe Wernette, Sharon Fitzpatrick, Jaycee Favela, Evan B. Goldstein, Nicholas M. Enwright
Format: Article
Language: English
Published: Nature Portfolio 2023-01-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-023-01929-2
ISSN: 2052-4463
Collection: Directory of Open Access Journals (DOAJ)

Affiliations:
Daniel Buscombe: Contractor, U.S. Geological Survey Pacific Coastal and Marine Science Center
Phillipe Wernette: U.S. Geological Survey Pacific Coastal and Marine Science Center
Sharon Fitzpatrick: Contractor, U.S. Geological Survey Pacific Coastal and Marine Science Center
Jaycee Favela: Contractor, U.S. Geological Survey Pacific Coastal and Marine Science Center
Evan B. Goldstein: Department of Geography, Environment, and Sustainability, University of North Carolina at Greensboro
Nicholas M. Enwright: U.S. Geological Survey Wetland and Aquatic Research Center

Abstract
The world’s coastlines are spatially highly variable, coupled human-natural systems that comprise a nested hierarchy of component landforms, ecosystems, and human interventions, each interacting over a range of spatial and temporal scales. Understanding and predicting coastline dynamics necessitates frequent observation from imaging sensors on remote sensing platforms. Machine learning models that carry out supervised (i.e., human-guided) pixel-based classification, or image segmentation, have transformative applications in spatio-temporal mapping of dynamic environments, including transient coastal landforms, sediments, habitats, waterbodies, and water flows. However, these models require large, well-documented training and testing datasets consisting of labeled imagery. We describe “Coast Train,” a multi-labeler dataset of orthomosaic and satellite images of coastal environments and corresponding labels. These data include imagery that is diverse in space and time, and contain 1.2 billion labeled pixels, representing over 3.6 million hectares. We use a human-in-the-loop tool especially designed for rapid and reproducible Earth surface image segmentation. Our approach permits image labeling by multiple labelers, in turn enabling quantification of pixel-level agreement over individual images and over collections of images.
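The abstract mentions quantifying pixel-level agreement between labelers, both per image and over collections of images. The following is a minimal sketch of one way to compute that. It assumes that each label is a 2-D grid of integer class codes and that agreement is the simple fraction of pixels given identical codes. The function names and class codes are hypothetical, and this record does not state which agreement metric the authors actually use.

```python
from typing import Sequence

# Hypothetical representation: a label mask is a 2-D grid of integer class
# codes, one per pixel. Coast Train's real file formats and class schemes
# are not described in this record.
Mask = Sequence[Sequence[int]]


def pixel_agreement(mask_a: Mask, mask_b: Mask) -> tuple[int, int]:
    """Return (matching_pixels, total_pixels) for two same-shaped masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of rows")
    matches = 0
    total = 0
    for row_a, row_b in zip(mask_a, mask_b):
        if len(row_a) != len(row_b):
            raise ValueError("masks must have the same number of columns")
        # Count pixels where both labelers assigned the same class code.
        matches += sum(1 for a, b in zip(row_a, row_b) if a == b)
        total += len(row_a)
    return matches, total


def image_agreement(mask_a: Mask, mask_b: Mask) -> float:
    """Fraction of pixels in one image on which two labelers agree."""
    matches, total = pixel_agreement(mask_a, mask_b)
    return matches / total if total else float("nan")


def collection_agreement(pairs: Sequence[tuple[Mask, Mask]]) -> float:
    """Agreement pooled over many images, weighting each pixel equally."""
    matches = total = 0
    for mask_a, mask_b in pairs:
        m, t = pixel_agreement(mask_a, mask_b)
        matches += m
        total += t
    return matches / total if total else float("nan")


if __name__ == "__main__":
    # Toy masks from two labelers (hypothetical codes: 0 water, 1 sand, 2 vegetation).
    a1 = [[0, 0, 1], [1, 1, 2]]
    b1 = [[0, 1, 1], [1, 1, 2]]  # disagrees on one of six pixels
    a2 = [[2, 2], [0, 0]]
    b2 = [[2, 2], [0, 0]]        # agrees on all four pixels
    print(image_agreement(a1, b1))                      # 0.8333333333333334
    print(collection_agreement([(a1, b1), (a2, b2)]))   # 0.9
```

Pooling matches and totals across images weights larger images more heavily than averaging per-image scores would. This is a raw percent agreement. Chance-corrected measures such as Cohen's kappa are a common alternative when class frequencies are very unbalanced.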