Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information

Bibliographic Details
Main Authors: A-Hyeon Jo, Keun-Chang Kwak
Format: Article
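Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects: Artificial intelligence; depression diagnosis; multi-modal; four-stream; bidirectional long short-term memory; convolutional neural networks
Online Access: https://ieeexplore.ieee.org/document/9998535/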
_version_ 1797974446177255424
author A-Hyeon Jo
Keun-Chang Kwak
author_facet A-Hyeon Jo
Keun-Chang Kwak
author_sort A-Hyeon Jo
collection DOAJ
description Recent trends in artificial intelligence applications show growing interest within the affective computing community in automated systems for depression detection and diagnosis. In particular, depression diagnosis based on multi-modal deep learning, which fuses varied data types to draw on several sources of information, has been actively researched. This study proposes a four-stream depression diagnosis model consisting of Bidirectional Long Short-Term Memory (Bi-LSTM) networks and convolutional neural networks (CNN), using speech and text data. One-dimensional features of audio signals are extracted using Mel Frequency Cepstral Coefficients and Gammatone Cepstral Coefficients, and two-dimensional features are extracted from Bark, equivalent rectangular bandwidth, and Log-Mel spectrograms, based on time-frequency transform. The extracted features are applied to Bi-LSTM and CNN-based transfer learning models. Word encoding is used to map text to sequences of numeric indices, and word embedding is used to represent all words as dense numeric vectors. These are applied to Bi-LSTM and n-gram-based CNN models. Finally, an ensemble of the softmax values output by the four deep learning models is used to perform depression diagnosis with the proposed four-stream model. Experiments with the proposed model were performed on the Extended Distress Analysis Interview Corpus Wizard of Oz depression database and other datasets. The results showed a performance improvement of 10.7% to 11.9% over two-stream-based state-of-the-art methods, which demonstrates that the proposed model is effective for depression diagnosis.
first_indexed 2024-04-11T04:20:05Z
format Article
id doaj.art-f1a9ff88c7d3434ab9417d2d3e0fd007
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T04:20:05Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-f1a9ff88c7d3434ab9417d2d3e0fd007
2022-12-31T00:01:09Z
eng
IEEE
IEEE Access
2169-3536
2022-01-01
10
134113
134135
10.1109/ACCESS.2022.3231884
9998535
Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information
A-Hyeon Jo, https://orcid.org/0000-0003-1909-5655, Department of Electronic Engineering, Chosun University, Gwangju, South Korea
Keun-Chang Kwak, https://orcid.org/0000-0002-3821-0711, Department of Electronic Engineering, Chosun University, Gwangju, South Korea
title Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information
title_sort diagnosis of depression based on four stream model of bi lstm and cnn from audio and text information
topic Artificial intelligence
depression diagnosis
multi-modal
four-stream
bidirectional long short-term memory
convolutional neural networks
url https://ieeexplore.ieee.org/document/9998535/
work_keys_str_mv AT ahyeonjo diagnosisofdepressionbasedonfourstreammodelofbilstmandcnnfromaudioandtextinformation
AT keunchangkwak diagnosisofdepressionbasedonfourstreammodelofbilstmandcnnfromaudioandtextinformation
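
The final step in the description, an ensemble of the softmax values output by the four models, can be illustrated with a small sketch. The record does not state the exact fusion rule, so the unweighted average used here is an assumption, and the function names, the optional weights, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Convert one model's raw class scores into probabilities.

    The maximum is subtracted first for numerical stability.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(stream_probs, weights=None):
    """Fuse per-stream class probabilities by (weighted) averaging.

    stream_probs: one probability list per stream, all of equal length.
    weights: optional per-stream weights; equal weights are assumed
             when omitted (an assumption, not the paper's stated rule).
    Returns the fused probabilities and the index of the winning class.
    """
    n_streams = len(stream_probs)
    if weights is None:
        weights = [1.0] * n_streams
    if len(weights) != n_streams:
        raise ValueError("need exactly one weight per stream")
    total_w = sum(weights)
    n_classes = len(stream_probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, stream_probs)) / total_w
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=fused.__getitem__)
    return fused, label


if __name__ == "__main__":
    # Made-up logits for one sample from four streams
    # (two audio streams, two text streams); two classes.
    logits_per_stream = [[2.0, 0.5], [0.3, 0.1], [-1.0, 1.2], [0.8, 0.2]]
    probs = [softmax(z) for z in logits_per_stream]
    fused, label = ensemble_predict(probs)
    print("fused probabilities:", fused)
    print("predicted class index:", label)
```

Averaging probabilities rather than raw scores keeps each stream on the same scale, so a single over-confident model cannot dominate merely because its logits are larger.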