Justification Mining: Developing a novel machine learning method for identifying representative sentences and summarising sentiment in financial text


Bibliographic Details
Main Author: Patel, K
Other Authors: Coecke, B
Format: Thesis
Language: English
Published: 2021
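Subjects: Transfer learning (Machine learning); Quantitative analysts; Computational finance; Deep learning (Machine learning); Machine learning; Quantitative research; Trading rooms (Finance); Ensemble learning (Machine learning); Support vector machines; Sentiment analysis; Financial engineering; Algorithmic trading; Natural language processing (Computer science); Finance; Supervised learning (Machine learning)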
author Patel, K
author2 Coecke, B
collection OXFORD
description <p>As machine learning, natural language processing, and big data become increasingly important across industries, including finance, it becomes crucial to evaluate how they are being used and to understand the motivations behind algorithms' recommendations, in order to make sound decisions based on this information. In finance in particular, concise, easy-to-comprehend tools for understanding machine learning models are extremely beneficial. Such tools could serve multiple purposes, including helping managers explain machine-learning-based trading strategies to investors and convince them of their merits. Decision-makers might also act on the information these tools provide, which could help them, for example, make more ethically informed decisions. This type of research therefore has implications not only for machine learning, Natural Language Processing (NLP), and finance but also for fields such as AI explainability, model interpretability, and Responsible Research and Innovation.</p> <p>The central problem this thesis addresses is whether machine learning methods can mine representative sentences from financial text that capture the majority of a full document's sentiment in only a few sentences. These mined sentences are referred to as ‘justifications’, and the process is called ‘justification mining’. The full documents used here are 10-K filings. Before examining justification mining, however, the thesis assesses transfer learning methods suited to training data annotated at a different level from the testing data (sentence level versus document level). This addresses a common problem in NLP research: annotated training data perfectly suited to the task are often not readily available.
Methods must therefore make use of what is available; in this case, sentence-annotated training data are used to train classifiers that make predictions on document-level testing data. These transfer learning methods are then employed in the next step, justification mining. Justification mining itself first uses transformer models to encode text as embeddings and then uses clustering algorithms and cosine similarity to extract, or mine, justifications. The process is evaluated in two ways. First, sentiment from mined justifications is compared with sentiment from full documents, to assess how well justification mining captures or summarises sentiment. Second, predicted sentiment from both mined justifications and full documents is correlated with future stock returns, to gauge whether justification mining helps identify financially useful signals in the data.</p> <p>Little previous work has evaluated the results of financial sentiment analyses in this way. In NLP, the best way to extract aggregate justifications for sentiment is still an open question. Moreover, few research papers attempt to apply transfer learning from lower-level data to entire documents, and 10-K filings are not widely studied in this context because they are difficult to parse. The methods developed in this thesis might offer a novel way to provide information about the motivations behind sentiment analyses.</p> <p>In the transfer learning experiments, modifying the feature engineering and preprocessing steps yielded accuracies of up to 0.903. For justification mining, among statistically significant (p ≤ 0.05) correlations with |r| > 0.2, sentiment from mined justifications correlated with future stock returns more often than full-document sentiment did.
Although the correlations (r) were somewhat weak overall, they could potentially be combined with traditional alpha signals to enhance those signals (see Section 11 for further discussion). In addition, aggregated sentiment from mined justifications was highly similar to that from full documents, with similarity scores of up to 0.9999, supporting the efficacy of justification mining in capturing full-document sentiment. In these evaluations, transformer models outperformed traditional approaches such as Bag of Words at numericising text for input into ML models. In fact, every statistically significant sentiment-to-stock-return correlation but one, as well as every justification-to-full-document similarity and correlation score above 0.7, used transformer models for numericisation.</p> <p>These results suggest that justification mining might succeed in eliminating sentiment noise in financial data as well as in capturing the majority of a full document's sentiment. They also suggest that transformer models might offer an advantage over traditional approaches such as Bag of Words for numericising text. The mined justifications themselves provide an easily interpretable and presentable means of explaining the output of sentiment analysis and have numerous uses, including identifying the driving factors behind sentiment in a financial document. This could help decision-makers make more ethically informed decisions or build investor trust in the methodology of sentiment analysis algorithms.</p>
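As a rough illustration of the justification-mining step described above (transformer embeddings, then clustering and cosine similarity), the following is a minimal sketch, not the thesis's implementation. It assumes sentence embeddings have already been produced by some transformer encoder (toy 2-D vectors stand in for them here), uses plain k-means as the clustering algorithm (an assumption; the abstract does not name one), and, for each cluster, keeps the sentence whose embedding has the highest cosine similarity to the centroid. The names mine_justifications and kmeans, and the choice of k, are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def kmeans(vectors: List[Vector], k: int, iterations: int = 50, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means with Euclidean assignment; returns the centroids.

    Assumption: k-means stands in for whichever clustering algorithm
    the thesis actually used.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            # Assign each embedding to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])),
            )
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its members;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids


def mine_justifications(sentences: List[str], embeddings: List[Vector], k: int) -> List[str]:
    """Hypothetical helper: return up to k representative sentences.

    For each cluster centroid, pick the sentence whose embedding is most
    cosine-similar to it; the result keeps the original document order.
    """
    k = min(k, len(sentences))
    centroids = kmeans(embeddings, k)
    chosen = set()
    for centroid in centroids:
        best = max(
            range(len(sentences)),
            key=lambda i: cosine_similarity(embeddings[i], centroid),
        )
        chosen.add(best)
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    sentences = [
        "Revenue grew strongly.",
        "Margins improved.",
        "Litigation risk increased.",
        "Debt levels rose.",
    ]
    # Toy 2-D vectors standing in for transformer sentence embeddings.
    embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
    print(mine_justifications(sentences, embeddings, k=2))
    # ['Revenue grew strongly.', 'Litigation risk increased.']
```

Choosing the single sentence nearest each centroid is only one simple reading of "clustering algorithms and cosine similarity"; the mined sentences could then be passed to a sentiment classifier and compared with the full document's predicted sentiment, as in the evaluation described above.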
first_indexed 2024-03-07T07:22:26Z
format Thesis
id oxford-uuid:242abee5-b74d-4caa-bac3-c9aba0a977ac
institution University of Oxford
language English
last_indexed 2024-03-07T07:22:26Z
publishDate 2021
record_format dspace
spelling oxford-uuid:242abee5-b74d-4caa-bac3-c9aba0a977ac; 2022-10-26T15:17:51Z; Justification Mining: Developing a novel machine learning method for identifying representative sentences and summarising sentiment in financial text; Thesis; http://purl.org/coar/resource_type/c_db06; uuid:242abee5-b74d-4caa-bac3-c9aba0a977ac; Transfer learning (Machine learning); Quantitative analysts; Computational finance; Deep learning (Machine learning); Machine learning; Quantitative research; Trading rooms (Finance); Ensemble learning (Machine learning); Support vector machines; Sentiment analysis; Financial engineering; Algorithmic trading; Natural language processing (Computer science); Finance; Supervised learning (Machine learning); English; Hyrax Deposit; 2021; Patel, K; Coecke, B; Simpson, E
title Justification Mining: Developing a novel machine learning method for identifying representative sentences and summarising sentiment in financial text
topic Transfer learning (Machine learning)
Quantitative analysts
Computational finance
Deep learning (Machine learning)
Machine learning
Quantitative research
Trading rooms (Finance)
Ensemble learning (Machine learning)
Support vector machines
Sentiment analysis
Financial engineering
Algorithmic trading
Natural language processing (Computer science)
Finance
Supervised learning (Machine learning)