Zero-shot personalization of speech foundation models for depressed mood monitoring

Summary: The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient’s affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mos...

Bibliographic Details
Main Authors:	Maurice Gerczuk, Andreas Triantafyllopoulos, Shahin Amiriparian, Alexander Kathan, Jonathan Bauer, Matthias Berking, Björn W. Schuller
Format:	Article
Language:	English
Published:	Elsevier 2023-11-01
Series:	Patterns
Subjects:	DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem
Online Access:	http://www.sciencedirect.com/science/article/pii/S2666389923002635

_version_	1797630638710325248
author	Maurice Gerczuk, Andreas Triantafyllopoulos, Shahin Amiriparian, Alexander Kathan, Jonathan Bauer, Matthias Berking, Björn W. Schuller
author_sort	Maurice Gerczuk
collection	DOAJ
description	Summary: The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient’s affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mostly build population-level models that aim to predict each individual’s diagnosis as a (mostly) static property. Because of inter-individual differences in symptomatology and mood regulation behaviors, these approaches are ill-suited to detect smaller temporal variations in depressed mood. We address this issue by introducing a zero-shot personalization of large speech foundation models. Compared with other personalization strategies, our work does not require labeled speech samples for enrollment. Instead, the approach makes use of adapters conditioned on subject-specific metadata. On a longitudinal dataset, we show that the method improves performance compared with a set of suitable baselines. Finally, applying our personalization strategy improves individual-level fairness. The bigger picture: Depression, as one of the most prevalent mental health diseases, negatively impacts millions of lives. Diagnoses are achieved by the assessment of symptoms with standardized tests. However, recent studies indicate that continuously monitoring symptoms (e.g., with ecological momentary assessments [EMAs]) may provide relevant additional information for both diagnosis and treatment decisions. More recently, these manual methods have been complemented by passive sensing methods. Here, speech can serve as a valuable objective marker because it has been shown to be impacted by various pathologies, such as anxiety and mood disorders, and can be collected non-invasively and cheaply. Existing machine learning methods that aim to measure mood, however, often fail to accurately model intra-individual variations, assuming that data are sourced from homogeneous populations. We introduce and evaluate an effective zero-shot personalization of speech foundation models that utilizes diagnostic information about each patient to improve per-speaker depressive mood recognition over a 2-week EMA period.
first_indexed	2024-03-11T11:09:56Z
format	Article
id	doaj.art-e54ff776f2cd4a6e8cf760064c66b82c
institution	Directory Open Access Journal
issn	2666-3899
language	English
last_indexed	2024-03-11T11:09:56Z
publishDate	2023-11-01
publisher	Elsevier
record_format	Article
series	Patterns
spelling	doaj.art-e54ff776f2cd4a6e8cf760064c66b82c; 2023-11-12T04:41:05Z; eng; Elsevier; Patterns; 2666-3899; 2023-11-01; 4; 11; 100873; Zero-shot personalization of speech foundation models for depressed mood monitoring; Maurice Gerczuk (0), Andreas Triantafyllopoulos (1), Shahin Amiriparian (2), Alexander Kathan (3), Jonathan Bauer (4), Matthias Berking (5), Björn W. Schuller (6); (0) Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany; Corresponding author; (1) Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany; (2) Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany; (3) Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany; (4) Department of Clinical Psychology and Psychotherapy, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Erlangen, Germany; (5) Department of Clinical Psychology and Psychotherapy, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Erlangen, Germany; (6) Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany; GLAM, Imperial College, London, UK; http://www.sciencedirect.com/science/article/pii/S2666389923002635; DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem
title	Zero-shot personalization of speech foundation models for depressed mood monitoring
topic	DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem
url	http://www.sciencedirect.com/science/article/pii/S2666389923002635

Similar Items