A review of supervised learning methods for classifying animal behavioural states from environmental features

Bibliographic Details
Main Authors: Silas Bergen, Manuela M. Huso, Adam E. Duerr, Melissa A. Braham, Sara Schmuecker, Tricia A. Miller, Todd E. Katzner
Format: Article
Language: English
Published: Wiley, 2023-01-01
Series: Methods in Ecology and Evolution
Online Access: https://doi.org/10.1111/2041-210X.14019
Description
Summary: Accurately predicting behavioural modes of animals in response to environmental features is important for ecology and conservation. Supervised learning (SL) methods are increasingly common in animal movement ecology for classifying behavioural modes. However, few examples exist of applying SL to classify polytomous animal behaviour from environmental features, especially in the context of millions of animal observations. We review SL methods (weighted k‐nearest neighbours, neural nets, random forests, and boosted classification trees with XGBoost) for classifying polytomous animal behaviour from environmental predictors. We also describe strategies for tuning parameter selection and model assessment, approaches for visualizing relationships between predictors and class outputs, and computational considerations. We demonstrate these methods by predicting three categories of risk to bald eagles from colliding with wind turbines, using as predictors 12 environmental state features associated with 1.7 million GPS telemetry data points from 57 eagles. Of the SL methods we considered, XGBoost yielded the most accurate model, with 86.2% classification accuracy and a pairwise‐averaged area under the ROC curve of 90.6. The computational time of XGBoost scaled better to large data than that of any other SL method. We also show how SHAP values, integrated into the R package xgboost, facilitate investigation of variable relationships and importance. For big data applications, XGBoost appears to provide superior classification accuracy and computational efficiency. Our results suggest that XGBoost should be considered as an early modelling option when the intent is to classify millions of animal behaviour observations from environmental predictors and to understand relationships between those predictors and movement behaviours. We also offer a tutorial to assist researchers in implementing this method.
ISSN: 2041-210X
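The summary reports two assessment metrics for the three-class risk model: classification accuracy and a "pairwise-averaged" area under the ROC curve. The sketch below shows one common way to compute both from predicted class probabilities. It assumes the one-vs-one definition, in which the AUC is computed for every pair of classes using only the observations from those two classes and then averaged. Whether the authors used exactly this averaging is an assumption. The class labels "low", "moderate" and "high", the function names and the probabilities are all hypothetical toy values, not data from the study.

```python
from itertools import combinations


def binary_auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def accuracy(probs, labels):
    """Share of observations whose highest-probability class is the true class."""
    hits = sum(1 for p, y in zip(probs, labels) if max(p, key=p.get) == y)
    return hits / len(labels)


def pairwise_averaged_auc(probs, labels, classes):
    """Mean of one-vs-one AUCs over all unordered class pairs (assumed definition).

    For each pair (i, j), only observations whose true class is i or j are used.
    A(i|j) ranks them by P(class i); A(j|i) ranks them by P(class j).
    Every class must occur in `labels`, otherwise a pair has no observations.
    """
    pair_aucs = []
    for i, j in combinations(classes, 2):
        obs_i = [p for p, y in zip(probs, labels) if y == i]
        obs_j = [p for p, y in zip(probs, labels) if y == j]
        a_i_given_j = binary_auc([p[i] for p in obs_i], [p[i] for p in obs_j])
        a_j_given_i = binary_auc([p[j] for p in obs_j], [p[j] for p in obs_i])
        pair_aucs.append((a_i_given_j + a_j_given_i) / 2)
    return sum(pair_aucs) / len(pair_aucs)


# Hypothetical toy data: three risk classes and made-up predicted probabilities.
CLASSES = ["low", "moderate", "high"]
LABELS = ["low", "low", "moderate", "moderate", "high", "high"]
PROBS = [
    {"low": 0.7, "moderate": 0.2, "high": 0.1},
    {"low": 0.5, "moderate": 0.3, "high": 0.2},
    {"low": 0.2, "moderate": 0.6, "high": 0.2},
    {"low": 0.1, "moderate": 0.5, "high": 0.4},
    {"low": 0.1, "moderate": 0.2, "high": 0.7},
    {"low": 0.25, "moderate": 0.4, "high": 0.35},
]

if __name__ == "__main__":
    print(round(accuracy(PROBS, LABELS), 3))                        # 0.833
    print(round(pairwise_averaged_auc(PROBS, LABELS, CLASSES), 3))  # 0.958
```

On these toy values, one "high" observation is misclassified as "moderate", so accuracy is 5/6. Only the moderate/high pair has a mis-ranked observation, which lowers that pair's AUC to 0.875 and the average to about 0.958. Both metrics are on a 0 to 1 scale here. Multiplying by 100 gives the percentage-style scale used in the summary. The pairwise loop is quadratic in the number of observations per class, so for millions of points a rank-based AUC computation would be preferable.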