Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information
Recent development trends in artificial intelligence applications have seen increasing interest in the design of automated systems for depression detection and diagnosis among the affective computing community. Particularly, active research has been conducted in depression diagnosis, based on multi-...
Main Authors: | A-Hyeon Jo, Keun-Chang Kwak |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2022-01-01 |
Series: | IEEE Access |
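Subjects: | Artificial intelligence; depression diagnosis; multi-modal; four-stream; bidirectional long short-term memory; convolutional neural networks |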
Online Access: | https://ieeexplore.ieee.org/document/9998535/ |
_version_ | 1797974446177255424 |
---|---|
author | A-Hyeon Jo, Keun-Chang Kwak |
author_facet | A-Hyeon Jo, Keun-Chang Kwak |
author_sort | A-Hyeon Jo |
collection | DOAJ |
description | Recent development trends in artificial intelligence applications have seen increasing interest in the design of automated systems for depression detection and diagnosis among the affective computing community. Particularly, active research has been conducted in depression diagnosis, based on multi-modal approaches in deep learning technology, which enable utilization of various information through fusion of varied data types. This study proposes a four-stream-based depression diagnosis model consisting of Bidirectional Long Short-Term Memory (Bi-LSTM) and convolutional neural networks (CNN), using speech and text data. One-dimensional features of audio signals are extracted using Mel Frequency Cepstral Coefficients and Gammatone Cepstral Coefficients, and two-dimensional features are extracted from Bark, equivalent rectangular bandwidth, and Log-Mel spectrograms, based on time-frequency transform. The extracted features are applied to Bi-LSTM and CNN-based transfer learning models. Word encoding was used for mapping of text to sequences with numeric indices, and word embedding used for representation of all words in numeric dense vectors. These were applied to Bi-LSTM and n-gram-based CNN models. Finally, an ensemble of the softmax values output from the four deep learning models was used to perform depression diagnosis, based on the proposed four-stream model. Using the proposed model, experiments were performed with the Extended Distress Analysis Interview Corpus Wizard of Oz depression database and other datasets. Experimental results showed improved performance by 10.7% to 11.9% over two-stream-based state-of-the-art methods. This demonstrates that the proposed model is effective for depression diagnosis. |
first_indexed | 2024-04-11T04:20:05Z |
format | Article |
id | doaj.art-f1a9ff88c7d3434ab9417d2d3e0fd007 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-11T04:20:05Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-f1a9ff88c7d3434ab9417d2d3e0fd007 2022-12-31T00:01:09Z eng IEEE IEEE Access 2169-3536 2022-01-01 10 134113 134135 10.1109/ACCESS.2022.3231884 9998535 Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information A-Hyeon Jo 0 https://orcid.org/0000-0003-1909-5655 Keun-Chang Kwak 1 https://orcid.org/0000-0002-3821-0711 Department of Electronic Engineering, Chosun University, Gwangju, South Korea Department of Electronic Engineering, Chosun University, Gwangju, South Korea Recent development trends in artificial intelligence applications have seen increasing interest in the design of automated systems for depression detection and diagnosis among the affective computing community. Particularly, active research has been conducted in depression diagnosis, based on multi-modal approaches in deep learning technology, which enable utilization of various information through fusion of varied data types. This study proposes a four-stream-based depression diagnosis model consisting of Bidirectional Long Short-Term Memory (Bi-LSTM) and convolutional neural networks (CNN), using speech and text data. One-dimensional features of audio signals are extracted using Mel Frequency Cepstral Coefficients and Gammatone Cepstral Coefficients, and two-dimensional features are extracted from Bark, equivalent rectangular bandwidth, and Log-Mel spectrograms, based on time-frequency transform. The extracted features are applied to Bi-LSTM and CNN-based transfer learning models. Word encoding was used for mapping of text to sequences with numeric indices, and word embedding used for representation of all words in numeric dense vectors. These were applied to Bi-LSTM and n-gram-based CNN models. Finally, an ensemble of the softmax values output from the four deep learning models was used to perform depression diagnosis, based on the proposed four-stream model. Using the proposed model, experiments were performed with the Extended Distress Analysis Interview Corpus Wizard of Oz depression database and other datasets. Experimental results showed improved performance by 10.7% to 11.9% over two-stream-based state-of-the-art methods. This demonstrates that the proposed model is effective for depression diagnosis. https://ieeexplore.ieee.org/document/9998535/ Artificial intelligence; depression diagnosis; multi-modal; four-stream; bidirectional long short-term memory; convolutional neural networks |
spellingShingle | A-Hyeon Jo Keun-Chang Kwak Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information IEEE Access Artificial intelligence depression diagnosis multi-modal four-stream bidirectional long short-term memory convolutional neural networks |
title | Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information |
title_full | Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information |
title_fullStr | Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information |
title_full_unstemmed | Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information |
title_short | Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information |
title_sort | diagnosis of depression based on four stream model of bi lstm and cnn from audio and text information |
topic | Artificial intelligence; depression diagnosis; multi-modal; four-stream; bidirectional long short-term memory; convolutional neural networks |
url | https://ieeexplore.ieee.org/document/9998535/ |
work_keys_str_mv | AT ahyeonjo diagnosisofdepressionbasedonfourstreammodelofbilstmandcnnfromaudioandtextinformation AT keunchangkwak diagnosisofdepressionbasedonfourstreammodelofbilstmandcnnfromaudioandtextinformation |
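
The description above states that the final diagnosis comes from an ensemble of the softmax values output by the four models. The record does not say how those values are combined, so the sketch below assumes a plain unweighted average followed by an argmax; the function names, the two-class layout, and the logits are all hypothetical and are not taken from the article.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_average(stream_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities across streams.

    Unweighted averaging is an assumption; the source only says
    that an ensemble of softmax values is used.
    """
    if not stream_probs:
        raise ValueError("at least one stream is required")
    n_classes = len(stream_probs[0])
    if any(len(p) != n_classes for p in stream_probs):
        raise ValueError("all streams must score the same classes")
    n_streams = len(stream_probs)
    return [sum(p[c] for p in stream_probs) / n_streams for c in range(n_classes)]


def predict(stream_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = ensemble_average(stream_probs)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up logits for one interview, ordered as [not depressed, depressed].
streams = [
    softmax([0.2, 1.1]),  # audio features -> Bi-LSTM
    softmax([1.3, 0.4]),  # audio spectrograms -> CNN
    softmax([0.1, 0.9]),  # text -> Bi-LSTM
    softmax([0.5, 0.6]),  # text -> n-gram CNN
]
fused = ensemble_average(streams)
label = predict(streams)
print(fused, label)
```

Averaging keeps each stream's confidence in play, so one strongly dissenting model (the audio CNN in this made-up case) lowers the fused score without overriding the other three. A weighted average or majority vote would be equally consistent with the wording of the description.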