Recognizability bias in citizen science photographs

Citizen science and automated collection methods increasingly depend on image recognition to provide the amounts of observational data research and management needs. Recognition models, meanwhile, also require large amounts of data from these sources, creating a feedback loop between the methods and...

Bibliographic Details
Main Authors: Wouter Koch, Laurens Hogeweg, Erlend B. Nilsen, Robert B. O’Hara, Anders G. Finstad
Format: Article
Language: English
Published: The Royal Society 2023-02-01
Series: Royal Society Open Science
Subjects: citizen science, image recognition, machine learning, recognizability
Online Access: https://royalsocietypublishing.org/doi/10.1098/rsos.221063
author Wouter Koch
Laurens Hogeweg
Erlend B. Nilsen
Robert B. O’Hara
Anders G. Finstad
collection DOAJ
description Citizen science and automated collection methods increasingly depend on image recognition to provide the amounts of observational data research and management needs. Recognition models, meanwhile, also require large amounts of data from these sources, creating a feedback loop between the methods and tools. Species that are harder to recognize, both for humans and machine learning algorithms, are likely to be under-reported, and thus be less prevalent in the training data. As a result, the feedback loop may hamper training mostly for species that already pose the greatest challenge. In this study, we trained recognition models for various taxa, and found evidence for a ‘recognizability bias’, where species that are more readily identified by humans and recognition models alike are more prevalent in the available image data. This pattern is present across multiple taxa, and does not appear to relate to differences in picture quality, biological traits or data collection metrics other than recognizability. This has implications for the expected performance of future models trained with more data, including such challenging species.
first_indexed 2024-04-09T21:16:14Z
format Article
id doaj.art-5a0595c2448f4d77ada09f3497e958bd
institution Directory Open Access Journal
issn 2054-5703
language English
last_indexed 2024-04-09T21:16:14Z
publishDate 2023-02-01
publisher The Royal Society
record_format Article
series Royal Society Open Science
spelling doaj.art-5a0595c2448f4d77ada09f3497e958bd
2023-03-28T08:50:59Z
eng
The Royal Society
Royal Society Open Science
2054-5703
2023-02-01
10
2
10.1098/rsos.221063
Recognizability bias in citizen science photographs
Wouter Koch (0)
Laurens Hogeweg (1)
Erlend B. Nilsen (2)
Robert B. O’Hara (3)
Anders G. Finstad (4)
(0) Department of Natural History, Norwegian University of Science and Technology, 7491 Trondheim, Norway
(1) Intel Benelux, High Tech Campus 83, 5656 AE Eindhoven, The Netherlands
(2) Norwegian Institute for Nature Research, Postboks 5685 Torgarden, 7485 Trondheim, Norway
(3) Department of Mathematical Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway
(4) Department of Natural History, Norwegian University of Science and Technology, 7491 Trondheim, Norway
https://royalsocietypublishing.org/doi/10.1098/rsos.221063
citizen science
image recognition
machine learning
recognizability
title Recognizability bias in citizen science photographs
topic citizen science
image recognition
machine learning
recognizability
url https://royalsocietypublishing.org/doi/10.1098/rsos.221063