Scalable syntactic inductive biases for neural language models

<p>Natural language has a sequential surface form, although its underlying structure has been argued to be hierarchical and tree-structured in nature, whereby smaller linguistic units like words are recursively composed to form larger ones, such as phrases and sentences. This thesis aims to answer the following open research questions: To what extent---if at all---can more explicit notions of hierarchical syntactic structures <em>further improve</em> the performance of neural models within NLP, even within the context of successful models like BERT that learn from large amounts of data? And where exactly would stronger notions of syntactic structures be beneficial in different types of language understanding tasks?</p> <p>To answer these questions, we explore two approaches for augmenting neural sequence models with an inductive bias that encourages a more explicit modelling of hierarchical syntactic structures. In the first approach, we use existing techniques that design <strong>tree-structured</strong> neural networks, where the ordering of the computational operations is determined by hierarchical syntax trees. We discover that this approach is indeed effective for designing better and more robust models on various challenging benchmarks of syntactic competence, although these benefits come at the expense of <strong>scalability</strong>: in practice, such tree-structured models are much more challenging to scale to large datasets.</p> <p>Hence, in the second approach, we devise a novel <strong>knowledge distillation strategy</strong> for combining the best of both syntactic inductive biases and data scale. Our proposed approach is effective across different neural sequence modelling architectures and objective functions: by applying our approach on top of a left-to-right LSTM, we design a distilled syntax-aware (DSA) LSTM that achieves a new state of the art (as of mid-2019) and human-level performance on targeted syntactic evaluations. By applying our approach on top of a Transformer-based BERT masked language model that works well at scale, we outperform a strong BERT baseline on six structured prediction tasks---including those that are not explicitly syntactic in nature---in addition to the Corpus of Linguistic Acceptability. Notably, our approach yields a new state of the art (as of mid-2020)---among models pre-trained on the original BERT dataset---on four structured prediction tasks: in-domain and out-of-domain phrase-structure parsing, dependency parsing, and semantic role labelling.</p> <p>Altogether, our findings and methods in this work: (i) provide an example of how existing linguistic theories (particularly concerning the syntax of language), annotations, and resources can be used both as diagnostic evaluation tools and as a source of prior knowledge for crafting inductive biases that can improve the performance of computational models of language; (ii) showcase the <em>continued</em> relevance and benefits of more explicit syntactic inductive biases, even within the context of scalable neural models like BERT that can derive their knowledge from large amounts of data; (iii) contribute to a better understanding of where exactly syntactic biases are most helpful in different types of NLP tasks; and (iv) motivate the broader question of how we can design models that integrate stronger syntactic biases---and yet can be easily scalable at the same time---as a promising (if relatively underexplored) direction of NLP research.</p>
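The first approach above orders a network's computations along a parse tree rather than the left-to-right word sequence. The following is a toy, dependency-free sketch of that idea only, not the thesis's actual architecture: the `compose` function here is a simple elementwise average, whereas a real tree-structured network (e.g. a Tree-LSTM) would use learned weights and gating.

```python
# Toy sketch: composing word vectors bottom-up along a binary parse tree,
# so composition order follows the tree, not left-to-right word order.

def compose(left, right):
    """Toy composition: elementwise average of the two child vectors.
    (A real tree-structured model would learn this function.)"""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def encode(tree, embeddings):
    """tree is either a word (str) or a (left, right) pair of subtrees."""
    if isinstance(tree, str):          # leaf: look up the word vector
        return embeddings[tree]
    left, right = tree
    return compose(encode(left, embeddings), encode(right, embeddings))

# "((the dog) barks)" composed according to its parse:
emb = {"the": [1.0, 0.0], "dog": [0.0, 1.0], "barks": [1.0, 1.0]}
sentence_vec = encode((("the", "dog"), "barks"), emb)  # -> [0.75, 0.75]
```

Note that changing the tree (e.g. `("the", ("dog", "barks"))`) changes the order of composition, which is exactly the inductive bias a purely sequential model lacks.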
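The knowledge distillation strategy in the second approach trains a scalable student model against a syntax-aware teacher's predictions rather than one-hot labels. The sketch below shows only the generic distillation objective (cross-entropy against the teacher's soft targets, equivalent to KL(teacher || student) up to the teacher's constant entropy); the thesis's exact objective and models are not reproduced here, and the numbers are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(teacher_probs, student_logits):
    """Cross-entropy of the student's distribution against the teacher's
    soft targets: minimising this matches the student's predictive
    distribution to the teacher's, instead of fitting one-hot labels."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

teacher = [0.7, 0.2, 0.1]   # teacher's distribution over a 3-word vocabulary
loss = distillation_loss(teacher, [2.0, 1.0, 0.5])
```

The loss is minimised when the student's distribution equals the teacher's, which is how the teacher's syntactic knowledge is transferred to an architecture that scales.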


Bibliographic Details
Main Author: Kuncoro, AS
Other Authors: Blunsom, P
Format: Thesis
Language: English
Published: 2022
Institution: University of Oxford
Subjects: Deep learning (Machine learning); Natural language processing (Computer science); Syntax; Machine learning; Linguistics