Zero-shot personalization of speech foundation models for depressed mood monitoring
Summary: The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient’s affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mos...
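Main Authors: | Maurice Gerczuk, Andreas Triantafyllopoulos, Shahin Amiriparian, Alexander Kathan, Jonathan Bauer, Matthias Berking, Björn W. Schuller |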
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-11-01 |
Series: | Patterns |
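Subjects: | DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem |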
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666389923002635 |
_version_ | 1797630638710325248 |
---|---|
author | Maurice Gerczuk; Andreas Triantafyllopoulos; Shahin Amiriparian; Alexander Kathan; Jonathan Bauer; Matthias Berking; Björn W. Schuller |
author_sort | Maurice Gerczuk |
collection | DOAJ |
description | Summary: The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient’s affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mostly build population-level models that aim to predict each individual’s diagnosis as a (mostly) static property. Because of inter-individual differences in symptomatology and mood regulation behaviors, these approaches are ill-suited to detect smaller temporal variations in depressed mood. We address this issue by introducing a zero-shot personalization of large speech foundation models. Compared with other personalization strategies, our work does not require labeled speech samples for enrollment. Instead, the approach makes use of adapters conditioned on subject-specific metadata. On a longitudinal dataset, we show that the method improves performance compared with a set of suitable baselines. Finally, applying our personalization strategy improves individual-level fairness. The bigger picture: Depression, as one of the most prevalent mental health diseases, negatively impacts millions of lives. Diagnoses are achieved by the assessment of symptoms with standardized tests. However, recent studies indicate that continuously monitoring symptoms (e.g., with ecological momentary assessments [EMAs]) may provide relevant additional information for both diagnosis and treatment decisions. More recently, these manual methods have been complemented by passive sensing methods. Here, speech can serve as a valuable objective marker because it has been shown to be impacted by various pathologies, such as anxiety and mood disorders, and can be collected non-invasively and cheaply. Existing machine learning methods that aim to measure mood, however, often fail to accurately model intra-individual variations, assuming that data are sourced from homogeneous populations. We introduce and evaluate an effective zero-shot personalization of speech foundation models that utilizes diagnostic information about each patient to improve per-speaker depressive mood recognition over a 2-week EMA period. |
first_indexed | 2024-03-11T11:09:56Z |
format | Article |
id | doaj.art-e54ff776f2cd4a6e8cf760064c66b82c |
institution | Directory Open Access Journal |
issn | 2666-3899 |
language | English |
last_indexed | 2024-03-11T11:09:56Z |
publishDate | 2023-11-01 |
publisher | Elsevier |
record_format | Article |
series | Patterns |
spelling | doaj.art-e54ff776f2cd4a6e8cf760064c66b82c. 2023-11-12T04:41:05Z. eng. Elsevier. Patterns, 2666-3899, 2023-11-01, 4(11), 100873. Zero-shot personalization of speech foundation models for depressed mood monitoring. Maurice Gerczuk (Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany; corresponding author), Andreas Triantafyllopoulos (Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany), Shahin Amiriparian (Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany), Alexander Kathan (Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany), Jonathan Bauer (Department of Clinical Psychology and Psychotherapy, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Erlangen, Germany), Matthias Berking (Department of Clinical Psychology and Psychotherapy, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Erlangen, Germany), Björn W. Schuller (Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany; GLAM, Imperial College, London, UK). http://www.sciencedirect.com/science/article/pii/S2666389923002635 DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem |
title | Zero-shot personalization of speech foundation models for depressed mood monitoring |
topic | DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem |
url | http://www.sciencedirect.com/science/article/pii/S2666389923002635 |