Zero-shot personalization of speech foundation models for depressed mood monitoring

Bibliographic Details
Main Authors: Maurice Gerczuk, Andreas Triantafyllopoulos, Shahin Amiriparian, Alexander Kathan, Jonathan Bauer, Matthias Berking, Björn W. Schuller
Format: Article
Language: English
Published: Elsevier 2023-11-01
Series: Patterns
Online Access:http://www.sciencedirect.com/science/article/pii/S2666389923002635
Collection: DOAJ

Description:
Summary: The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient’s affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mostly build population-level models that aim to predict each individual’s diagnosis as a (mostly) static property. Because of inter-individual differences in symptomatology and mood regulation behaviors, these approaches are ill-suited to detect smaller temporal variations in depressed mood. We address this issue by introducing a zero-shot personalization of large speech foundation models. Unlike other personalization strategies, our approach does not require labeled speech samples for enrollment. Instead, it makes use of adapters conditioned on subject-specific metadata. On a longitudinal dataset, we show that the method improves performance compared with a set of suitable baselines. Finally, applying our personalization strategy improves individual-level fairness.

The bigger picture: Depression, one of the most prevalent mental health disorders, negatively impacts millions of lives. Diagnoses are made by assessing symptoms with standardized tests. However, recent studies indicate that continuously monitoring symptoms (e.g., with ecological momentary assessments [EMAs]) may provide relevant additional information for both diagnosis and treatment decisions. More recently, these manual methods have been complemented by passive sensing methods. Here, speech can serve as a valuable objective marker because it has been shown to be impacted by various pathologies, such as anxiety and mood disorders, and can be collected non-invasively and cheaply. Existing machine learning methods that aim to measure mood, however, often fail to accurately model intra-individual variations, assuming that data are sourced from homogeneous populations. We introduce and evaluate an effective zero-shot personalization of speech foundation models that utilizes diagnostic information about each patient to improve per-speaker depressive mood recognition over a 2-week EMA period.
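
The summary states only that the adapters are conditioned on subject-specific metadata; it does not say how. As an illustration, the minimal NumPy sketch below assumes a FiLM-style (feature-wise linear modulation) bottleneck adapter: a projection of a subject's metadata vector scales and shifts the adapter's hidden activations, so an unseen subject can be handled from metadata alone, without labeled enrollment speech. The class name `MetadataConditionedAdapter`, the dimensions, and the choice of metadata features are hypothetical and are not taken from the paper.

```python
import numpy as np


class MetadataConditionedAdapter:
    """Hypothetical residual bottleneck adapter with FiLM-style conditioning.

    Assumption: an illustrative stand-in for "adapters conditioned on
    subject-specific metadata", not the authors' exact architecture.
    """

    def __init__(self, d_model, d_bottleneck, d_meta, seed=0):
        rng = np.random.default_rng(seed)
        # Down- and up-projections of a standard residual bottleneck adapter.
        self.w_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        # Zero init: the adapter starts as an identity map on the features.
        self.w_up = np.zeros((d_bottleneck, d_model))
        # Project metadata to a per-channel scale (gamma) and shift (beta).
        self.w_gamma = rng.normal(0.0, 0.02, (d_meta, d_bottleneck))
        self.w_beta = rng.normal(0.0, 0.02, (d_meta, d_bottleneck))

    def __call__(self, hidden, metadata):
        # hidden: (time, d_model) features from a (presumably frozen) speech model.
        # metadata: (d_meta,) subject-level descriptors (hypothetical choice).
        gamma = 1.0 + metadata @ self.w_gamma  # scale centred at 1, (d_bottleneck,)
        beta = metadata @ self.w_beta  # shift, (d_bottleneck,)
        z = np.maximum(hidden @ self.w_down, 0.0)  # ReLU bottleneck, (time, d_bottleneck)
        z = gamma * z + beta  # FiLM modulation, broadcast over time
        return hidden + z @ self.w_up  # residual connection, (time, d_model)


# Usage with random stand-ins for foundation-model features (hypothetical shapes).
adapter = MetadataConditionedAdapter(d_model=768, d_bottleneck=64, d_meta=4)
features = np.random.default_rng(1).normal(size=(200, 768))
subject_meta = np.array([0.5, 1.0, 0.0, 0.3])  # metadata of an unseen subject
personalized = adapter(features, subject_meta)
print(personalized.shape)  # (200, 768)
```

For a new subject, only the metadata vector changes; the adapter weights learned on other subjects are reused unchanged, which is what makes the personalization zero-shot in this sketch. Zero-initializing the up-projection is a common adapter choice so that training starts from the unmodified foundation-model features; whether the authors do this is not stated in the summary.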

DOAJ record ID: doaj.art-e54ff776f2cd4a6e8cf760064c66b82c
Institution: Directory of Open Access Journals
ISSN: 2666-3899
Volume: 4
Issue: 11
Article number: 100873

Affiliations:
Maurice Gerczuk (corresponding author): Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
Andreas Triantafyllopoulos: Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
Shahin Amiriparian: Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
Alexander Kathan: Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
Jonathan Bauer: Department of Clinical Psychology and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Matthias Berking: Department of Clinical Psychology and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Björn W. Schuller: Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany; GLAM, Imperial College, London, UK
Topic: DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem