Textual knowledge integration for financial asset management


Bibliographic Details
Main Author: Xing, Frank Zhutian
Other Authors: Erik Cambria
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2018
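Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence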
Online Access:https://hdl.handle.net/10356/87459
http://hdl.handle.net/10220/46751
author Xing, Frank Zhutian
author2 Erik Cambria
collection NTU
description The scenario in which investors need to manage a large number of financial assets differs essentially from the stock movement prediction that most people focus on today. In traditional asset allocation models, the expected returns and correlations of financial assets are difficult to estimate from historical price series, which are non-stationary and volatile. Therefore, I turn to the textual knowledge hidden in the huge amount of unstructured information produced by human beings. The research goals of this thesis include incorporating natural language processing techniques into several asset allocation models and finding the variables in financial models that naturally link to the contents of financial reports and to market sentiment. The new perspectives investigated in the thesis extend the current frameworks of the Markowitz model and the Black-Litterman model by rethinking asset expected returns and asset correlations, giving these two concepts new connotations. Both sub-symbolic and symbolic AI approaches are explored for semantic linkage and market view modeling, which are associated with key variables in asset allocation models. The introductory chapter reviews the types of financial texts; most existing approaches, however, treat heterogeneous information sources indiscriminately. I propose to consider separately the semantics conveyed in financial texts and the sentiment time series formulated from social media posts. Next, recent advances in the computational semantic representation of words and documents are leveraged to construct a dependence structure of financial assets. This structure (termed vine dependence) is useful for robust estimation of the covariance matrix of asset returns, a critical risk indicator for the combination of assets held by investors. A vine-growing algorithm is proposed, and a large vine structure for major US stocks is constructed.
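A minimal Python sketch of how a vine dependence structure can feed a covariance matrix, under stated assumptions. It uses a C-vine (a special case of the regular vines referred to above) whose edges carry partial correlations, converts them into a correlation matrix with the standard partial-correlation recursion, scales by asset volatilities, and computes global minimum-variance weights in the Markowitz sense. The function names, partial-correlation values, and volatilities are hypothetical; the thesis grows its vine from semantic linkages, which this sketch does not reproduce.

```python
import numpy as np


def cvine_to_correlation(partial: np.ndarray) -> np.ndarray:
    """Convert C-vine partial correlations into a full correlation matrix.

    partial[k, i] for k < i is the partial correlation of assets k and i
    given assets 0..k-1 (row 0 therefore holds ordinary correlations with
    the root asset). Entries on or below the diagonal are ignored.
    """
    d = partial.shape[0]
    corr = np.eye(d)
    for k in range(d - 1):
        for i in range(k + 1, d):
            p = partial[k, i]  # rho_{k,i | 0..k-1}
            # Undo one conditioning step per iteration, last-ordered asset first:
            # turn rho_{k,i | 0..l} into rho_{k,i | 0..l-1} using partial[l, i]
            # and partial[l, k], both conditioned on assets 0..l-1.
            for l in range(k - 1, -1, -1):
                p = (p * np.sqrt((1 - partial[l, i] ** 2) * (1 - partial[l, k] ** 2))
                     + partial[l, i] * partial[l, k])
            corr[k, i] = corr[i, k] = p
    return corr


def min_variance_weights(cov: np.ndarray) -> np.ndarray:
    """Global minimum-variance weights w = inv(cov) 1 / (1' inv(cov) 1)."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))  # avoids forming inv(cov)
    return x / x.sum()


# Hypothetical inputs for three assets (not data from the thesis).
partial = np.array([[1.0, 0.5, 0.3],
                    [0.0, 1.0, 0.2],
                    [0.0, 0.0, 1.0]])
vols = np.array([0.20, 0.25, 0.30])  # annualized volatilities

corr = cvine_to_correlation(partial)
cov = np.outer(vols, vols) * corr  # Sigma = D R D with D = diag(vols)
weights = min_variance_weights(cov)
print(np.round(corr, 3))
print(np.round(weights, 3))
```

Any choice of partial correlations strictly inside (-1, 1) yields a valid positive-definite correlation matrix, so a vine can be filled edge by edge without later repairs to the matrix.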
Furthermore, I study adding market sentiment to the posterior inference of asset expected returns. Specifically, sentic computing, a concept-level sentiment analysis method that takes advantage of syntactic features, is used to process mass opinion streams. A novel recurrent neural network design termed ECM-LSTM is used to form subjective investor views and is benchmarked against popular neural network architectures such as DENFIS and LSTM, and against forecasting models such as ARIMA and the Holt-Winters methods. The sentiment views make it possible to explain asset re-allocation decisions in a storytelling manner. Finally, as in many ambitious AI projects, the system needs maintenance to keep pace with demands, and commonsense knowledge must accumulate so that the system does not have to start all over again. I discuss a method for continuously optimizing the polarity scores in a sentiment knowledge base using newly arriving information. A series of experiments was conducted to test portfolio performance, the validity of the sentiment time series, and model scalability. I find the robust estimation of asset correlations via semantic linkages to be superior to estimation from historical price data, in the sense that with a proper semantic vine, the portfolio outperformed 80% to 90% of its peers in terms of annualized return. The improvement in annualized return is circa 2% from incorporating sentiment, and more than 10% from employing ECM-LSTM. This thesis increases our understanding of how to systematically integrate textual knowledge for financial asset management.
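The posterior inference of expected returns belongs to the Black-Litterman model. A minimal sketch of its standard posterior-mean formula, mu_BL = [(tau*Sigma)^-1 + P^T Omega^-1 P]^-1 [(tau*Sigma)^-1 pi + P^T Omega^-1 q], with the prior pi taken as the market-implied equilibrium returns delta*Sigma*w_mkt. A single view vector q stands in for a sentiment-derived view; in the thesis the views come from sentiment time series modeled with ECM-LSTM, which this sketch does not implement. The function names and the values of tau, delta, q, and Omega are illustrative assumptions.

```python
import numpy as np


def implied_equilibrium_returns(cov, market_weights, risk_aversion=2.5):
    """Prior expected returns pi = delta * Sigma * w_mkt (reverse optimization)."""
    return risk_aversion * cov @ market_weights


def black_litterman_posterior(cov, prior, P, q, omega, tau=0.05):
    """Posterior mean of expected returns under the Black-Litterman model.

    P (k x n) picks the assets each of the k views refers to, q (k,) holds the
    viewed returns, and omega (k x k) is the covariance of the view errors.
    """
    prior_precision = np.linalg.inv(tau * cov)
    omega_inv = np.linalg.inv(omega)
    precision = prior_precision + P.T @ omega_inv @ P
    rhs = prior_precision @ prior + P.T @ omega_inv @ q
    return np.linalg.solve(precision, rhs)


# Hypothetical two-asset example: one absolute view that asset 0 returns 8%,
# standing in for a view formed from a sentiment time series.
cov = np.array([[0.040, 0.006],
                [0.006, 0.090]])
w_mkt = np.array([0.6, 0.4])
pi = implied_equilibrium_returns(cov, w_mkt)
P = np.array([[1.0, 0.0]])
q = np.array([0.08])
omega = np.array([[0.001]])

mu_bl = black_litterman_posterior(cov, pi, P, q, omega)
print(pi, mu_bl)
```

Shrinking omega pulls the posterior toward the view; enlarging it leaves the equilibrium prior almost unchanged. Because the two assets are positively correlated, a view on asset 0 also shifts the expected return of asset 1.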
first_indexed 2024-10-01T04:52:17Z
format Thesis-Doctor of Philosophy
id ntu-10356/87459
institution Nanyang Technological University
language English
last_indexed 2024-10-01T04:52:17Z
publishDate 2018
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/87459 2020-03-07T11:52:00Z
Textual knowledge integration for financial asset management
Xing, Frank Zhutian
Erik Cambria
School of Computer Science and Engineering
Centre for Computational Intelligence
cambria@ntu.edu.sg
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Doctor of Philosophy
2018-11-30T05:28:58Z 2019-12-06T16:42:22Z
2018
Thesis-Doctor of Philosophy
Xing, F. Z. (2018). Textual knowledge integration for financial asset management. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/87459
http://hdl.handle.net/10220/46751
10.32657/10220/46751
en
136 p.
application/pdf
Nanyang Technological University
title Textual knowledge integration for financial asset management
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
url https://hdl.handle.net/10356/87459
http://hdl.handle.net/10220/46751