A Hybrid Feature Selection and Ensemble Approach to Identify Depressed Users in Online Social Media

Depression has become one of the most common mental illnesses, and the widespread use of social media provides new ideas for detecting various mental illnesses. The purpose of this study is to use machine learning technology to detect depressed users based on user-shared content and pos...

Bibliographic Details
Main Authors: Jingfang Liu, Mengshi Shi
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-01-01
Series: Frontiers in Psychology
Subjects: depression; machine learning; ensemble learning; feature selection; social media
Online Access: https://www.frontiersin.org/articles/10.3389/fpsyg.2021.802821/full
author Jingfang Liu
Mengshi Shi
author_facet Jingfang Liu
Mengshi Shi
author_sort Jingfang Liu
collection DOAJ
description Depression has become one of the most common mental illnesses, and the widespread use of social media offers new ways to detect mental illness. This study uses machine learning to identify depressed users from the content they share and their posting behavior on social media. Existing research mostly relies on a single detection method, and unbalanced class distributions often lead to a low recognition rate. In addition, the many irrelevant or redundant features in high-dimensional data sets reduce recognition accuracy. To address these problems, this paper proposes a hybrid feature selection and stacking ensemble strategy for detecting depressed users. First, a recursive elimination method and an extremely randomized trees method are used to calculate feature importance and mutual information values, from which a feature weight vector is computed and the optimal feature subset is selected according to the feature weights. Second, naive Bayes, k-nearest neighbor, regularized logistic regression, and support vector machine models serve as base learners, and a simple logistic regression algorithm combines their outputs to build a stacking model. Experimental results show that, compared with other machine learning algorithms, the proposed hybrid method, which integrates feature selection and ensemble learning, achieves a higher accuracy of 90.27% in identifying depressed users online. We believe this study will help develop new methods to identify depressed people in social networks and provide guidance for future research.
first_indexed 2024-12-21T00:28:26Z
format Article
id doaj.art-a5ef3baad81c431081d1747a57ea49b5
institution Directory of Open Access Journals
issn 1664-1078
language English
last_indexed 2024-12-21T00:28:26Z
publishDate 2022-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Psychology
spelling doaj.art-a5ef3baad81c431081d1747a57ea49b5; 2022-12-21T19:21:56Z; eng; Frontiers Media S.A.; Frontiers in Psychology; 1664-1078; 2022-01-01; 12; 10.3389/fpsyg.2021.802821; 802821; A Hybrid Feature Selection and Ensemble Approach to Identify Depressed Users in Online Social Media; Jingfang Liu; Mengshi Shi; https://www.frontiersin.org/articles/10.3389/fpsyg.2021.802821/full; depression; machine learning; ensemble learning; feature selection; social media
title A Hybrid Feature Selection and Ensemble Approach to Identify Depressed Users in Online Social Media
title_sort hybrid feature selection and ensemble approach to identify depressed users in online social media
topic depression
machine learning
ensemble learning
feature selection
social media
url https://www.frontiersin.org/articles/10.3389/fpsyg.2021.802821/full
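
The feature-weighting step summarized in the description field can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not state how feature importance and mutual information are combined, so the equal-weight average of min-max-normalized scores used here is an assumption, as are the function names, the toy data, and the `importances` values (placeholders for scores that a recursive elimination or extremely randomized trees procedure would supply).

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between a discrete feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (count / n) * math.log(count * n / (count_x[x] * count_y[y]))
    return mi


def min_max(scores):
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def feature_weights(columns, labels, importances):
    """Assumed rule: average of normalized importance and normalized mutual information."""
    mi = [mutual_information(col, labels) for col in columns]
    return [(a + b) / 2 for a, b in zip(min_max(importances), min_max(mi))]


def select_top_k(weights, k):
    """Indices of the k highest-weighted features, returned in ascending order."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): 1 = depressed user, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
columns = [
    [1, 1, 1, 0, 0, 0],  # feature 0 matches the label on every sample
    [1, 0, 1, 0, 1, 0],  # feature 1 matches the label on 4 of 6 samples
    [1, 1, 1, 0, 0, 1],  # feature 2 matches the label on 5 of 6 samples
]
importances = [0.6, 0.1, 0.3]  # placeholder importance scores, not real results

weights = feature_weights(columns, labels, importances)
selected = select_top_k(weights, k=2)
print(selected)
```

In this toy setting the two features that track the label most closely receive the largest weights. In practice the subset size `k` and the combination rule would need to be tuned, and the selected subset would then feed the stacking model described in the abstract.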