The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets

Bibliographic Details
Main Authors: Marianna Chimienti, Akiko Kato, Olivia Hicks, Frédéric Angelier, Michaël Beaulieu, Jazel Ouled-Cheikh, Coline Marciau, Thierry Raclot, Meagan Tucker, Danuta Maria Wisniewska, André Chiaradia, Yan Ropert-Coudert
Format: Article
Language: English
Published: Nature Portfolio 2022-11-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-022-22258-1
ISSN: 2045-2322
Source: Directory of Open Access Journals (DOAJ)

Affiliations
Marianna Chimienti: Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université
Akiko Kato: Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université
Olivia Hicks: Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université
Frédéric Angelier: Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université
Michaël Beaulieu: German Oceanographic Museum
Jazel Ouled-Cheikh: Institut de Recerca de la Biodiversitat (IRBio) and Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals (BEECA), Facultat de Biologia, Universitat de Barcelona
Coline Marciau: Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université
Thierry Raclot: Institut Pluridisciplinaire Hubert Curien, UMR7178, CNRS - Université de Strasbourg
Meagan Tucker: Conservation Department, Phillip Island Nature Parks
Danuta Maria Wisniewska: Sound Communication and Behaviour Group, Department of Biology, University of Southern Denmark
André Chiaradia: Conservation Department, Phillip Island Nature Parks
Yan Ropert-Coudert: Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université

Abstract

Animal-borne tagging (bio-logging) generates large and complex datasets. In particular, accelerometer tags, which provide information on the behaviour and energy expenditure of wild animals, produce high-resolution, multi-dimensional data that can be challenging to analyse. We tested the performance of commonly used artificial intelligence tools on datasets of increasing volume and dimensionality. When bio-logging data are collected across several sampling seasons, the resulting datasets are inherently characterized by inter-individual variability, and this information should be considered when predicting behaviour. We integrated unsupervised and supervised machine learning approaches to predict behaviours in two penguin species: the behaviours classified by the unsupervised approach, Expectation Maximisation, were used to train the supervised approach, Random Forest. We assessed the agreement between the two approaches, the performance of Random Forest on unknown data, and the implications for the calculation of energy expenditure. Accounting for behavioural variability resulted in high agreement (> 80%) in behavioural classifications and minimal differences in energy expenditure estimates. However, some outliers with < 70% agreement highlighted how behaviours characterized by similar signals are confused with one another. We advise the broad bio-logging community working with these large datasets to be cautious when upscaling predictions, as this might lead to less accurate estimates of behaviour and energy expenditure.
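The pipeline summarised in the abstract (Expectation Maximisation labels used to train a Random Forest, then agreement measured on unseen data) can be illustrated with a small, stdlib-only Python sketch. It is not the authors' implementation: the synthetic "birds", the two made-up features per window, the diagonal-covariance Gaussian mixture used as the EM model, the forest of depth-1 trees (stumps), and all function names are assumptions for illustration only. Holding out one whole individual at a time is one way to expose inter-individual variability, because the forest never sees the held-out bird during training.

```python
# Illustrative sketch only (not the authors' code). All names, features and
# data are hypothetical; EM is assumed to fit a diagonal Gaussian mixture.
import math
import random

def simulate(n_birds=6, n_per_state=100, seed=1):
    """Two made-up features for two behaviours; a per-bird offset stands in for inter-individual variability."""
    rng = random.Random(seed)
    rows = []  # (bird_id, (feature_1, feature_2), true_behaviour)
    for bird in range(n_birds):
        offset = rng.gauss(0.0, 0.3)
        for state, (c1, c2) in ((0, (1.0, 0.0)), (1, (4.0, 3.0))):
            for _ in range(n_per_state):
                x = (rng.gauss(c1 + offset, 0.5), rng.gauss(c2 + offset, 0.5))
                rows.append((bird, x, state))
    return rows

def em_gmm(X, k=2, iters=50):
    """EM for a diagonal-covariance Gaussian mixture (k >= 2); returns the most probable component per window."""
    d = len(X[0])
    order = sorted(X, key=sum)  # deterministic start: points spread along the feature sum
    means = [list(order[(len(X) - 1) * j // (k - 1)]) for j in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities proportional to weight * Gaussian density
        resp = []
        for x in X:
            logp = [math.log(weights[j]) + sum(
                -0.5 * (math.log(2 * math.pi * variances[j][m])
                        + (x[m] - means[j][m]) ** 2 / variances[j][m])
                for m in range(d)) for j in range(k)]
            top = max(logp)  # subtract the max for numerical stability
            p = [math.exp(v - top) for v in logp]
            resp.append([v / sum(p) for v in p])
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(X)
            for m in range(d):
                means[j][m] = sum(r[j] * x[m] for r, x in zip(resp, X)) / nj
                variances[j][m] = sum(r[j] * (x[m] - means[j][m]) ** 2
                                      for r, x in zip(resp, X)) / nj + 1e-6
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def fit_stump(X, y, rng):
    """Depth-1 tree on one random feature, threshold minimising training errors."""
    m = rng.randrange(len(X[0]))  # random feature subset of size one
    pairs = sorted(zip((x[m] for x in X), y))
    left = {c: 0 for c in set(y)}
    right = {c: y.count(c) for c in set(y)}
    best_err, best = len(y) + 1, None
    for i in range(len(pairs) - 1):
        v, lab = pairs[i]
        left[lab] += 1
        right[lab] -= 1
        if pairs[i + 1][0] == v:
            continue  # only split between distinct values
        err = (i + 1 - max(left.values())) + (len(pairs) - i - 1 - max(right.values()))
        if err < best_err:
            best_err = err
            best = (m, (v + pairs[i + 1][0]) / 2,
                    max(left, key=left.get), max(right, key=right.get))
    if best is None:  # constant feature: always predict the majority label
        major = max(set(y), key=y.count)
        best = (m, math.inf, major, major)
    return best

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagged, feature-randomised stumps: a minimal random forest."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict(forest, x):
    """Majority vote across the trees."""
    votes = [left_lab if x[m] <= t else right_lab for m, t, left_lab, right_lab in forest]
    return max(set(votes), key=votes.count)

def leave_one_bird_out(rows, labels):
    """Per held-out bird: % of windows where the forest agrees with the EM label."""
    scores = {}
    for held in sorted({b for b, _, _ in rows}):
        train = [(x, lab) for (b, x, _), lab in zip(rows, labels) if b != held]
        test = [(x, lab) for (b, x, _), lab in zip(rows, labels) if b == held]
        forest = fit_forest([x for x, _ in train], [lab for _, lab in train])
        hits = sum(predict(forest, x) == lab for x, lab in test)
        scores[held] = 100.0 * hits / len(test)
    return scores

rows = simulate()
em_labels = em_gmm([x for _, x, _ in rows])
for bird, pct in leave_one_bird_out(rows, em_labels).items():
    print(f"bird {bird}: {pct:.1f}% agreement with EM labels")
```

Reporting agreement per held-out individual, rather than one pooled figure, is what would surface outliers of the kind the abstract describes (< 70%); on this deliberately well-separated toy data all birds should score highly. A real analysis would more likely use full-depth trees from an established library, and would carry the predicted behaviours forward into energy-expenditure estimates, which this sketch does not attempt.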