Manifestation of depression in speech overlaps with characteristics used to represent and recognize speaker identity

Bibliographic Details
Main Authors: Sri Harsha Dumpala, Katerina Dikaios, Sebastian Rodriguez, Ross Langley, Sheri Rempel, Rudolf Uher, Sageev Oore
Format: Article
Language: English
Published: Nature Portfolio 2023-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-35184-7
ISSN: 2045-2322
Volume: 13
Affiliations: Faculty of Computer Science, Dalhousie University (Sri Harsha Dumpala, Sebastian Rodriguez, Sageev Oore); Dalhousie University (Katerina Dikaios, Ross Langley, Rudolf Uher); Nova Scotia Health (Sheri Rempel)

Abstract
The sound of a person’s voice is commonly used to identify the speaker. The sound of speech is also starting to be used to detect medical conditions, such as depression. It is not known whether the manifestations of depression in speech overlap with those used to identify the speaker. In this paper, we test the hypothesis that the representations of personal identity in speech, known as speaker embeddings, improve the detection of depression and the estimation of depressive symptom severity. We further examine whether changes in depression severity interfere with the recognition of the speaker’s identity. We extract speaker embeddings from models pre-trained on a large sample of speakers from the general population without information on depression diagnosis. We test these speaker embeddings for severity estimation in independent datasets consisting of clinical interviews (DAIC-WOZ), spontaneous speech (VocalMind), and longitudinal data (VocalMind). We also use the severity estimates to predict the presence of depression. Speaker embeddings, combined with established acoustic features (OpenSMILE), predicted severity with root mean square error (RMSE) values of 6.01 and 6.28 in the DAIC-WOZ and VocalMind datasets, respectively, lower than acoustic features alone or speaker embeddings alone. When used to detect depression, speaker embeddings showed higher balanced accuracy (BAc) and surpassed previous state-of-the-art performance in depression detection from speech, with BAc values of 66% and 64% in the DAIC-WOZ and VocalMind datasets, respectively. Results from a subset of participants with repeated speech samples show that speaker identification is affected by changes in depression severity. These results suggest that depression overlaps with personal identity in the acoustic space. While speaker embeddings improve depression detection and severity estimation, deterioration or improvement in mood may interfere with speaker verification.
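The abstract describes combining speaker embeddings with OpenSMILE acoustic features to estimate depression severity (scored by RMSE), then using the severity estimates to detect depression (scored by balanced accuracy). The NumPy sketch below illustrates that flow under assumptions the abstract does not state: fusion by simple concatenation, a closed-form ridge regressor, severity scores on a 0 to 24 scale, and a detection cutoff of 10. The helper names (`fuse_features`, `fit_ridge`, `predict`, `rmse`, `balanced_accuracy`), the feature dimensions, and the synthetic data are hypothetical; this is not the authors' model.

```python
import numpy as np


def fuse_features(speaker_emb: np.ndarray, acoustic: np.ndarray) -> np.ndarray:
    """Early fusion (an assumption): concatenate per-recording speaker
    embeddings with per-recording acoustic features along the feature axis."""
    return np.concatenate([speaker_emb, acoustic], axis=1)


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression; returns [intercept, coefficients...]."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply weights from fit_ridge (intercept first) to new feature rows."""
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ w


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean square error between true and estimated severity scores."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def balanced_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean of sensitivity (recall on depressed) and specificity (recall on non-depressed)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    if y_true.all() or not y_true.any():
        raise ValueError("balanced accuracy needs both classes in y_true")
    sensitivity = np.mean(y_pred[y_true])
    specificity = np.mean(~y_pred[~y_true])
    return float((sensitivity + specificity) / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_train, n_test = 80, 20
    # Illustrative sizes (assumptions): 192-d speaker embeddings, 88 acoustic features.
    emb = rng.normal(size=(n_train + n_test, 192))
    acoustic = rng.normal(size=(n_train + n_test, 88))
    # Synthetic 0-24 severity scores; test scores are spread evenly so both
    # classes occur after thresholding.
    severity = np.concatenate([rng.uniform(0, 24, n_train), np.linspace(0, 24, n_test)])

    X = fuse_features(emb, acoustic)
    w = fit_ridge(X[:n_train], severity[:n_train], alpha=10.0)
    estimates = predict(w, X[n_train:])
    print(f"RMSE: {rmse(severity[n_train:], estimates):.2f}")

    CUTOFF = 10.0  # assumed detection threshold on the severity scale
    truth = severity[n_train:] >= CUTOFF
    print(f"BAc: {balanced_accuracy(truth, estimates >= CUTOFF):.2f}")
```

With random features the printed scores carry no meaning; the point is the path from fused features to an RMSE and, via a threshold, to a BAc. Balanced accuracy averages sensitivity and specificity, so a detector that labels every recording as depressed scores 0.5 rather than the majority-class rate, which makes it a fairer measure on imbalanced clinical datasets.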
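The abstract's last finding concerns speaker verification, in which the embedding of a new recording is compared with an enrolled embedding of the claimed speaker. A minimal sketch, assuming cosine scoring against a fixed threshold (the abstract does not say which scoring back end or threshold the authors used): the `shift` vector is a hypothetical stand-in for a systematic between-session change in the embedding, such as one associated with a change in depression severity, and the 0.5 threshold, 192-dimensional embeddings, and synthetic vectors are illustrative assumptions.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two speaker embeddings."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("embeddings must be non-zero")
    return float(np.dot(a, b) / denom)


def verify(enrolled: np.ndarray, test: np.ndarray, threshold: float = 0.5) -> bool:
    """Accept the claimed identity when the cosine score reaches the threshold.

    The 0.5 default is arbitrary (an assumption); deployed systems calibrate
    the threshold on held-out trials."""
    return cosine_similarity(enrolled, test) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dim = 192  # illustrative embedding size
    speaker = rng.normal(size=dim)  # hypothetical speaker-specific component
    enrolled = speaker + 0.3 * rng.normal(size=dim)    # enrollment session
    same_state = speaker + 0.3 * rng.normal(size=dim)  # later session, no change
    shift = rng.normal(size=dim)                       # hypothetical systematic change
    changed_state = speaker + shift + 0.3 * rng.normal(size=dim)

    # On average the shifted trial scores lower, moving it toward the threshold.
    for label, emb in [("same state", same_state), ("changed state", changed_state)]:
        score = cosine_similarity(enrolled, emb)
        print(f"{label}: score={score:.2f}, accepted={verify(enrolled, emb)}")
```

Because acceptance depends only on the score crossing the threshold, any consistent drift in a speaker's embeddings between sessions lowers genuine-trial scores and raises the chance of a false rejection.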