The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets
Abstract Animal-borne tagging (bio-logging) generates large and complex datasets. In particular, accelerometer tags, which provide information on behaviour and energy expenditure of wild animals, produce high-resolution multi-dimensional data, and can be challenging to analyse. We tested the perform...
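Main Authors: | Marianna Chimienti, Akiko Kato, Olivia Hicks, Frédéric Angelier, Michaël Beaulieu, Jazel Ouled-Cheikh, Coline Marciau, Thierry Raclot, Meagan Tucker, Danuta Maria Wisniewska, André Chiaradia, Yan Ropert-Coudert |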
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2022-11-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-022-22258-1 |
_version_ | 1828094292591116288 |
---|---|
author | Marianna Chimienti Akiko Kato Olivia Hicks Frédéric Angelier Michaël Beaulieu Jazel Ouled-Cheikh Coline Marciau Thierry Raclot Meagan Tucker Danuta Maria Wisniewska André Chiaradia Yan Ropert-Coudert |
author_sort | Marianna Chimienti |
collection | DOAJ |
description | Abstract Animal-borne tagging (bio-logging) generates large and complex datasets. In particular, accelerometer tags, which provide information on behaviour and energy expenditure of wild animals, produce high-resolution multi-dimensional data, and can be challenging to analyse. We tested the performance of commonly used artificial intelligence tools on datasets of increasing volume and dimensionality. Because bio-logging data are collected across several sampling seasons, datasets are inherently characterized by inter-individual variability. Such information should be considered when predicting behaviour. We integrated both unsupervised and supervised machine learning approaches to predict behaviours in two penguin species. The classified behaviours obtained from the unsupervised approach Expectation Maximisation were used to train the supervised approach Random Forest. We assessed agreement between the approaches, the performance of Random Forest on unknown data and the implications for the calculation of energy expenditure. Consideration of behavioural variability resulted in high agreement (> 80%) in behavioural classifications and minimal differences in energy expenditure estimates. However, some outliers with < 70% agreement highlighted how behaviours characterized by signal similarity are confused. We advise the broad bio-logging community approaching these large datasets to be cautious when upscaling predictions, as this might lead to less accurate estimates of behaviour and energy expenditure. |
first_indexed | 2024-04-11T06:56:47Z |
format | Article |
id | doaj.art-76381449fbf94c0e953441686b81024a |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-04-11T06:56:47Z |
publishDate | 2022-11-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-76381449fbf94c0e953441686b81024a; 2022-12-22T04:39:00Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2022-11-01; 12 1 1 13; 10.1038/s41598-022-22258-1; Marianna Chimienti (Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université); Akiko Kato (Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université); Olivia Hicks (Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université); Frédéric Angelier (Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université); Michaël Beaulieu (German Oceanographic Museum); Jazel Ouled-Cheikh (Institut de Recerca de la Biodiversitat (IRBio) and Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals (BEECA), Facultat de Biologia, Universitat de Barcelona); Coline Marciau (Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université); Thierry Raclot (Institut Pluridisciplinaire Hubert Curien, UMR7178, CNRS-Universite de Strasbourg); Meagan Tucker (Conservation Department, Phillip Island Nature Parks); Danuta Maria Wisniewska (Sound Communication and Behaviour Group, Department of Biology, University of Southern Denmark); André Chiaradia (Conservation Department, Phillip Island Nature Parks); Yan Ropert-Coudert (Centre d’Etudes Biologiques de Chizé, UMR7372 CNRS - La Rochelle Université); https://doi.org/10.1038/s41598-022-22258-1 |
title | The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets |
url | https://doi.org/10.1038/s41598-022-22258-1 |