Diagnosis of Depression Based on Four-Stream Model of Bi-LSTM and CNN From Audio and Text Information

Bibliographic Details
Main Authors:	A-Hyeon Jo, Keun-Chang Kwak
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Artificial intelligence; depression diagnosis; multi-modal; four-stream; bidirectional long short-term memory; convolutional neural networks
Online Access:	https://ieeexplore.ieee.org/document/9998535/

description	Recent advances in artificial intelligence have increased interest within the affective computing community in automated systems for detecting and diagnosing depression. In particular, depression diagnosis based on multi-modal deep learning approaches, which exploit diverse information by fusing different data types, has been an active area of research. This study proposes a four-stream depression diagnosis model consisting of Bidirectional Long Short-Term Memory (Bi-LSTM) networks and convolutional neural networks (CNN), using speech and text data. One-dimensional audio features are extracted using Mel Frequency Cepstral Coefficients and Gammatone Cepstral Coefficients, and two-dimensional features are extracted from Bark, equivalent rectangular bandwidth, and Log-Mel spectrograms obtained by time-frequency transforms. The extracted features are fed to Bi-LSTM and CNN-based transfer learning models. Word encoding was used to map text to sequences of numeric indices, and word embedding was used to represent every word as a dense numeric vector. These representations were fed to Bi-LSTM and n-gram-based CNN models. Finally, depression diagnosis was performed by an ensemble of the softmax outputs of the four deep learning models that make up the proposed four-stream model. Experiments with the proposed model were performed on the Extended Distress Analysis Interview Corpus Wizard of Oz depression database and other datasets. The experimental results showed performance improvements of 10.7% to 11.9% over state-of-the-art two-stream methods. This demonstrates that the proposed model is effective for depression diagnosis.
issn	2169-3536
doi	10.1109/ACCESS.2022.3231884
volume	10
pages	134113-134135
affiliation	Department of Electronic Engineering, Chosun University, Gwangju, South Korea (A-Hyeon Jo and Keun-Chang Kwak)
orcid	A-Hyeon Jo: https://orcid.org/0000-0003-1909-5655; Keun-Chang Kwak: https://orcid.org/0000-0002-3821-0711

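The record says only that the final diagnosis is made from an ensemble of the softmax values produced by the four deep learning models; it does not state the combination rule. The Python sketch below assumes the common choice of an (optionally weighted) average of per-class probabilities followed by an argmax. The function names, the mapping of streams to models, the two-class label set, and the example logits are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_softmax(stream_probs: Sequence[Sequence[float]],
                     weights: Optional[Sequence[float]] = None) -> List[float]:
    """Fuse per-stream class probabilities with a weighted average.

    ASSUMPTION: equal weights by default; the paper's actual rule may differ.
    """
    n_streams = len(stream_probs)
    n_classes = len(stream_probs[0])
    if weights is None:
        weights = [1.0 / n_streams] * n_streams
    if len(weights) != n_streams:
        raise ValueError("need one weight per stream")
    fused = [0.0] * n_classes
    for w, probs in zip(weights, stream_probs):
        if len(probs) != n_classes:
            raise ValueError("all streams must score the same classes")
        for c, p in enumerate(probs):
            fused[c] += w * p
    return fused


def predict(stream_probs: Sequence[Sequence[float]],
            labels: Sequence[str]) -> Tuple[str, List[float]]:
    """Return the label with the highest fused probability and the fused vector."""
    fused = ensemble_softmax(stream_probs)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused


if __name__ == "__main__":
    # Hypothetical logits from four streams (assumed: audio 1-D Bi-LSTM,
    # audio 2-D CNN, text Bi-LSTM, text n-gram CNN);
    # classes = [non-depressed, depressed].
    stream_logits = [[2.0, 1.0], [0.5, 1.5], [0.2, 0.8], [1.0, 1.0]]
    probs = [softmax(l) for l in stream_logits]
    print(predict(probs, ["non-depressed", "depressed"]))
```

Averaging probabilities rather than raw logits keeps every stream's contribution on the same [0, 1] scale; other fusion rules, such as a product of probabilities or majority voting, could replace `ensemble_softmax` without changing the rest of the sketch.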

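For the text streams, the record describes word encoding as mapping text to sequences of numeric indices, which an embedding layer then turns into dense vectors. The following Python sketch shows one way such an encoding could work. It assumes a lowercase regex tokenizer, a frequency-ordered vocabulary, and fixed-length right padding with reserved indices 0 (padding) and 1 (unknown word); none of these details come from the record, and `build_vocab`, `encode`, `PAD_ID`, and `UNK_ID` are hypothetical names.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

PAD_ID = 0  # hypothetical reserved index for padding
UNK_ID = 1  # hypothetical reserved index for out-of-vocabulary words


def tokenize(text: str) -> List[str]:
    """ASSUMPTION: lowercase and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(corpus: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map each word seen at least `min_count` times to an index >= 2."""
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    kept = [w for w, c in counts.items() if c >= min_count]
    # Most frequent words get the smallest indices; ties broken alphabetically.
    kept.sort(key=lambda w: (-counts[w], w))
    # Indices 0 and 1 are reserved for PAD_ID and UNK_ID.
    return {w: i + 2 for i, w in enumerate(kept)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode text as word indices, truncated or right-padded to `max_len`."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["I feel tired", "I feel fine today"])
    print(encode("I feel sad", vocab, max_len=5))
```

Fixed-length index sequences let transcripts of different lengths be batched for a Bi-LSTM or an n-gram CNN; in practice `max_len` would be chosen from the distribution of transcript lengths in the training data.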
Similar Items