Incorporating Multiple Textual Factors into Unbalanced Financial Distress Prediction: A Feature Selection Methods and Ensemble Classifiers Combined Approach

Abstract Textual-based factors have been widely regarded as promising features that can be applied to financial issues. This study focuses on extracting both basic and semantic textual features to supplement the traditionally used financial indicators. The main aim is to improve Chinese listed companie...

Bibliographic Details
Main Authors: Shixuan Li, Wenxuan Shi
Format: Article
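Language: English
Published: Springer 2023-10-01
Series: International Journal of Computational Intelligence Systems
Subjects: Textual factors; Feature selection; Ensemble classifiers; Financial distress prediction; Word embedding
Online Access: https://doi.org/10.1007/s44196-023-00342-2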
author Shixuan Li
Wenxuan Shi
collection DOAJ
description Abstract Textual-based factors have been widely regarded as promising features that can be applied to financial issues. This study focuses on extracting both basic and semantic textual features to supplement the traditionally used financial indicators. The main aim is to improve financial distress prediction (FDP) for Chinese listed companies. The study proposes a paradigm that combines financial and multi-type textual predictive factors, feature selection methods, classifiers, and time spans to achieve optimal FDP. Frequency counts, TF-IDF, TextRank, and word embedding approaches are employed to extract frequency count-based, keyword-based, sentiment, and readability indicators. The experimental results show that financial domain sentiment lexicons, word embedding-based readability analysis approaches, and the basic textual features of Management Discussion and Analysis can be important elements of FDP. Moreover, the findings show that incorporating financial and textual features can achieve optimal performance 4 or 5 years before the expected baseline year, and that the RF-GBDT combined model can outperform other classifiers. This study extends multi-method text analysis in the financial text mining field and provides new findings on how to provide early warning signs of financial risk. The approaches developed in this research can serve as a template for addressing other financial issues.
first_indexed 2024-03-10T17:04:20Z
format Article
id doaj.art-d8f5af1ce5b44310b6c2e4ffc17e59dd
institution Directory Open Access Journal
issn 1875-6883
language English
last_indexed 2024-03-10T17:04:20Z
publishDate 2023-10-01
publisher Springer
record_format Article
series International Journal of Computational Intelligence Systems
spelling doaj.art-d8f5af1ce5b44310b6c2e4ffc17e59dd; 2023-11-20T10:52:31Z; eng; Springer; International Journal of Computational Intelligence Systems; 1875-6883; 2023-10-01; 161124; 10.1007/s44196-023-00342-2; Shixuan Li (School of Safety Science and Emergency Management, Wuhan University of Technology); Wenxuan Shi (School of Information Management, Wuhan University); https://doi.org/10.1007/s44196-023-00342-2; Textual factors; Feature selection; Ensemble classifiers; Financial distress prediction; Word embedding
title Incorporating Multiple Textual Factors into Unbalanced Financial Distress Prediction: A Feature Selection Methods and Ensemble Classifiers Combined Approach
topic Textual factors
Feature selection
Ensemble classifiers
Financial distress prediction
Word embedding
url https://doi.org/10.1007/s44196-023-00342-2