Acoustic and Facial Features From Clinical Interviews for Machine Learning–Based Psychiatric Diagnosis: Algorithm Development

Bibliographic Details
Main Authors: Michael L Birnbaum, Avner Abrami, Stephen Heisig, Asra Ali, Elizabeth Arenare, Carla Agurto, Nathaniel Lu, John M Kane, Guillermo Cecchi
Format: Article
Language: English
Published: JMIR Publications 2022-01-01
Series: JMIR Mental Health
ISSN: 2368-7959
DOI: 10.2196/24699
Online Access: https://mental.jmir.org/2022/1/e24699

Description:
Background: In contrast to all other areas of medicine, psychiatry is still nearly entirely reliant on subjective assessments such as patient self-report and clinical observation. The lack of objective information on which to base clinical decisions can contribute to reduced quality of care. Behavioral health clinicians need objective and reliable patient data to support effective targeted interventions.
Objective: We aimed to investigate whether reliable inferences (psychiatric signs, symptoms, and diagnoses) can be extracted from audiovisual patterns in recorded evaluation interviews of participants with schizophrenia spectrum disorders and bipolar disorder.
Methods: We obtained audiovisual data from 89 participants (mean age 25.3 years; male: 48/89, 53.9%; female: 41/89, 46.1%): individuals with schizophrenia spectrum disorders (n=41), individuals with bipolar disorder (n=21), and healthy volunteers (n=27). We developed machine learning models based on acoustic and facial movement features extracted from participant interviews to predict diagnoses and detect clinician-coded neuropsychiatric symptoms, and we assessed model performance using the area under the receiver operating characteristic curve (AUROC) in 5-fold cross-validation.
Results: The model successfully differentiated between schizophrenia spectrum disorders and bipolar disorder (AUROC 0.73) when aggregating face and voice features. Facial action units, including the cheek-raising muscle (AUROC 0.64) and the chin-raising muscle (AUROC 0.74), provided the strongest signal for men. Vocal features, such as energy in the 1 to 4 kHz frequency band (AUROC 0.80) and spectral harmonicity (AUROC 0.78), provided the strongest signal for women. The lip corner-pulling muscle signal discriminated between diagnoses for both men (AUROC 0.61) and women (AUROC 0.62). Several psychiatric signs and symptoms were successfully inferred: blunted affect (AUROC 0.81), avolition (AUROC 0.72), lack of vocal inflection (AUROC 0.71), asociality (AUROC 0.63), and worthlessness (AUROC 0.61).
Conclusions: This study represents an advancement in efforts to capitalize on digital data to improve diagnostic assessment, and it supports the development of a new generation of innovative clinical tools through acoustic and facial data analysis.
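
To make the Methods concrete, the following is a minimal sketch, assuming Python with librosa and scikit-learn, of the kind of pipeline the abstract describes: per-interview acoustic features (relative energy in the 1 to 4 kHz band, plus spectral flatness as a crude inverse proxy for harmonicity) scored with 5-fold cross-validated AUROC on a binary diagnosis label. The feature definitions, the logistic regression classifier, and the helper names (acoustic_features, diagnosis_auroc) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, not the authors' pipeline: two acoustic features of the
# kind named in the abstract, scored with 5-fold cross-validated AUROC.
# Assumptions: librosa + scikit-learn; logistic regression as a stand-in
# classifier; spectral flatness as a crude inverse proxy for harmonicity.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def acoustic_features(wav_path, sr=16000):
    """Per-interview features: relative 1-4 kHz energy and spectral flatness."""
    y, sr = librosa.load(wav_path, sr=sr)
    S = np.abs(librosa.stft(y)) ** 2                 # power spectrogram
    freqs = librosa.fft_frequencies(sr=sr)           # STFT bin center frequencies
    band = (freqs >= 1000) & (freqs <= 4000)
    band_energy = S[band, :].sum() / S.sum()         # energy share in 1-4 kHz
    flatness = librosa.feature.spectral_flatness(y=y).mean()
    return [band_energy, flatness]

def diagnosis_auroc(wav_paths, labels):
    """Mean AUROC over 5 folds for a binary diagnosis label (e.g., SSD vs BD)."""
    X = np.array([acoustic_features(p) for p in wav_paths])
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, np.array(labels),
                           cv=5, scoring="roc_auc").mean()
```

The facial channel would contribute analogous per-interview summaries: in standard FACS terminology, the cheek-raising, chin-raising, and lip corner-pulling muscles correspond to action units AU06, AU17, and AU12, whose per-frame intensities (as produced by a face-tracking toolkit such as OpenFace) could be aggregated, for example as means and variances, and concatenated with the acoustic features.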

Author ORCIDs:
Michael L Birnbaum: https://orcid.org/0000-0002-4285-7868
Avner Abrami: https://orcid.org/0000-0003-3387-5607
Stephen Heisig: https://orcid.org/0000-0001-8096-1730
Asra Ali: https://orcid.org/0000-0001-8552-330X
Elizabeth Arenare: https://orcid.org/0000-0003-0911-3207
Carla Agurto: https://orcid.org/0000-0002-0617-4488
Nathaniel Lu: https://orcid.org/0000-0001-9695-2249
John M Kane: https://orcid.org/0000-0002-2628-9442
Guillermo Cecchi: https://orcid.org/0000-0003-1013-8348