Speech-Based Artificial Intelligence Emotion Biomarkers in Frontotemporal Dementia
Acoustic speech markers are well characterized in Frontotemporal Dementia (FTD), a heterogeneous spectrum of progressive neurodegenerative diseases that can affect speech production and comprehension as well as higher-order cognition, behavior, and motor control. Although profound apathy and deficits in emotion processing are also common symptoms, emotional content has yet to be explored in acoustic models of speech. We retrospectively analyze a dataset of standard elicited speech tasks from 69 FTD patients and 131 healthy elderly controls seen at the University of Melbourne. We develop two ResNet50 models that classify FTD vs. healthy elderly controls from spectrograms of speech samples: 1) a naive model, and 2) a model pretrained on an emotional speech dataset. We compare the validation accuracies of the two models on different speech tasks. The pretrained model better classifies FTD vs. healthy elderly controls and the behavioral variant of FTD (bvFTD) vs. healthy elderly controls, with validation accuracies of 79% and 84%, respectively, on the monologue speech task, and 93% and 90% on the picture description task. Considered singly, the 'happy' emotion discriminates best between FTD and healthy elderly controls among the latent emotions. Pretraining acoustic models on latent emotion increases classification accuracy for FTD, with the greatest improvement on elicited speech tasks with greater emotional content. More broadly, our findings suggest that including latent emotion in acoustic classification models is beneficial for neurologic diseases that affect emotion.
Main Author: | Parllaku, Fjona |
---|---|
Other Authors: | Glass, James |
Format: | Thesis (M.Eng., Department of Electrical Engineering and Computer Science) |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/147525 |
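The abstract describes a pipeline that converts speech recordings into spectrograms before classifying them with a ResNet50. As a rough, illustrative sketch of the spectrogram step only — the thesis's actual frame sizes, windowing, and feature settings are not given in this record, so every parameter below is an assumption:

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Log-magnitude STFT spectrogram of a 1-D signal.

    frame_len=400 / hop=160 correspond to 25 ms windows with a 10 ms
    step at 16 kHz — common illustrative defaults, not the thesis's.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # magnitude spectrum per frame
    return np.log(mag + 1e-8)                   # log scale, as typically fed to image CNNs

# usage: 2 s of a synthetic 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(2 * sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (time frames, frequency bins)
```

The ResNet50 classification and emotion-pretraining stages would sit downstream of such a representation; they are omitted here because none of those implementation details appear in this record.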