Maximizing citizen scientists’ contribution to automated species recognition

Bibliographic Details
Main Authors: Wouter Koch, Laurens Hogeweg, Erlend B. Nilsen, Anders G. Finstad
Format: Article
Language: English
Published: Nature Portfolio 2022-05-01
Series: Scientific Reports
Online Access:https://doi.org/10.1038/s41598-022-11257-x

Abstract
Technological advances and data availability have enabled artificial intelligence-driven tools that can assist in identifying species from images with increasing success. Especially within citizen science, an emerging source of information for filling the knowledge gaps that must be closed to solve the biodiversity crisis, such tools can allow participants to recognize and report less well-known species. This can be an important means of addressing the substantial taxonomic bias in biodiversity data, where broadly recognized, charismatic species are highly over-represented. However, the recognition models are trained on the same biased data, so it is important to consider which additional images are needed to improve them. In this study, we investigated how the amount of training data influenced the performance of species recognition models for various taxa. We used a large citizen science dataset collected in Norway, where images are added independently of identification. We demonstrate that while adding images of currently under-represented taxa will generally improve recognition models more than adding images of well-represented taxa, there are important deviations from this general pattern. Thus, prioritizing data collection more deliberately, beyond the basic paradigm that “more is better”, is likely to significantly improve species recognition models and advance the representativeness of biodiversity data.

ISSN: 2045-2322
Affiliations:
Wouter Koch: Department of Natural History, Norwegian University of Science and Technology
Laurens Hogeweg: Intel Benelux
Erlend B. Nilsen: Faculty of Biosciences and Aquaculture, Nord University
Anders G. Finstad: Department of Natural History, Norwegian University of Science and Technology