Scalable syntactic inductive biases for neural language models

<p>Natural language has a sequential surface form, although its underlying structure has been argued to be hierarchical and tree-structured in nature, whereby smaller linguistic units like words are recursively composed to form larger ones, such as phrases and sentences. This thesis aims to answer the following open research questions: To what extent---if at all---can more explicit notions of hierarchical syntactic structures <em>further improve</em> the performance of neural models within NLP, even within the context of successful models like BERT that learn from large amounts of data? And where exactly would stronger notions of syntactic structures be beneficial in different types of language understanding tasks?</p> <p>To answer these questions, we explore two approaches for augmenting neural sequence models with an inductive bias that encourages a more explicit modelling of hierarchical syntactic structures. In the first approach, we use existing techniques that design <strong>tree-structured</strong> neural networks, where the ordering of the computational operations is determined by hierarchical syntax trees. We discover that this approach is indeed effective for designing better and more robust models on various challenging benchmarks of syntactic competence, although these benefits come at the expense of <strong>scalability</strong>: in practice, such tree-structured models are much more challenging to scale to large datasets.</p> <p>Hence, in the second approach, we devise a novel <strong>knowledge distillation strategy</strong> for combining the best of both syntactic inductive biases and data scale. Our proposed approach is effective across different neural sequence modelling architectures and objective functions: by applying our approach on top of a left-to-right LSTM, we design a distilled syntax-aware (DSA) LSTM that achieves a new state of the art (as of mid-2019) and human-level performance on targeted syntactic evaluations. By applying our approach on top of a Transformer-based BERT masked language model that works well at scale, we outperform a strong BERT baseline on six structured prediction tasks---including those that are not explicitly syntactic in nature---in addition to the Corpus of Linguistic Acceptability. Notably, our approach yields a new state of the art (as of mid-2020)---among models pre-trained on the original BERT dataset---on four structured prediction tasks: in-domain and out-of-domain phrase-structure parsing, dependency parsing, and semantic role labelling.</p> <p>Altogether, our findings and methods in this work: (i) provide an example of how existing linguistic theories (particularly concerning the syntax of language), annotations, and resources can be used both as diagnostic evaluation tools and as a source of prior knowledge for crafting inductive biases that can improve the performance of computational models of language; (ii) showcase the <em>continued</em> relevance and benefits of more explicit syntactic inductive biases, even within the context of scalable neural models like BERT that can derive their knowledge from large amounts of data; (iii) contribute to a better understanding of where exactly syntactic biases are most helpful in different types of NLP tasks; and (iv) motivate the broader question of how we can design models that integrate stronger syntactic biases---and yet can be easily scalable at the same time---as a promising (if relatively underexplored) direction of NLP research.</p>
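The first approach above orders a network's computations along a parse tree rather than the left-to-right word sequence. The following is a toy, dependency-free sketch of that idea only, not the thesis's actual architecture: the `compose` function here is a simple elementwise average, whereas a real tree-structured network (e.g. a Tree-LSTM) would use learned weights and gating.

```python
# Toy sketch: composing word vectors bottom-up along a binary parse tree,
# so composition order follows the tree, not left-to-right word order.

def compose(left, right):
    """Toy composition: elementwise average of the two child vectors.
    (A real tree-structured model would learn this function.)"""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def encode(tree, embeddings):
    """tree is either a word (str) or a (left, right) pair of subtrees."""
    if isinstance(tree, str):          # leaf: look up the word vector
        return embeddings[tree]
    left, right = tree
    return compose(encode(left, embeddings), encode(right, embeddings))

# "((the dog) barks)" composed according to its parse:
emb = {"the": [1.0, 0.0], "dog": [0.0, 1.0], "barks": [1.0, 1.0]}
sentence_vec = encode((("the", "dog"), "barks"), emb)  # -> [0.75, 0.75]
```

Note that changing the tree (e.g. `("the", ("dog", "barks"))`) changes the order of composition, which is exactly the inductive bias a purely sequential model lacks.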
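The knowledge distillation strategy in the second approach trains a scalable student model against a syntax-aware teacher's predictions rather than one-hot labels. The sketch below shows only the generic distillation objective (cross-entropy against the teacher's soft targets, equivalent to KL(teacher || student) up to the teacher's constant entropy); the thesis's exact objective and models are not reproduced here, and the numbers are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(teacher_probs, student_logits):
    """Cross-entropy of the student's distribution against the teacher's
    soft targets: minimising this matches the student's predictive
    distribution to the teacher's, instead of fitting one-hot labels."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

teacher = [0.7, 0.2, 0.1]   # teacher's distribution over a 3-word vocabulary
loss = distillation_loss(teacher, [2.0, 1.0, 0.5])
```

The loss is minimised when the student's distribution equals the teacher's, which is how the teacher's syntactic knowledge is transferred to an architecture that scales.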


Bibliographic Details
Main Author: Kuncoro, AS
Other Authors: Blunsom, P
Format: Thesis
Language: English
Published: 2022
Institution: University of Oxford
Subjects: Deep learning (Machine learning); Natural language processing (Computer science); Syntax; Machine learning; Linguistics