Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories
Abstract: Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories’ internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and comple...
Main Authors: | Jakob Prange, Nathan Schneider, Vivek Srikumar |
---|---|
Format: | Article |
Language: | English |
Published: | The MIT Press, 2021-01-01 |
Series: | Transactions of the Association for Computational Linguistics |
Online Access: | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00364/98238/Supertagging-the-Long-Tail-with-Tree-Structured |
_version_ | 1818251616142229504 |
---|---|
author | Jakob Prange, Nathan Schneider, Vivek Srikumar |
author_sort | Jakob Prange |
collection | DOAJ |
description |
Abstract: Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories’ internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and complex category types in the long tail. However, supertags are themselves trees. Rather than give up on rare tags, we investigate constructive models that account for their internal structure, including novel methods for tree-structured prediction. Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters. We further investigate how well different approaches generalize to out-of-domain evaluation sets. |
first_indexed | 2024-12-12T16:11:07Z |
format | Article |
id | doaj.art-334b9c621d174fe288dc15ab207d9bf3 |
institution | Directory Open Access Journal |
issn | 2307-387X |
language | English |
last_indexed | 2024-12-12T16:11:07Z |
publishDate | 2021-01-01 |
publisher | The MIT Press |
record_format | Article |
series | Transactions of the Association for Computational Linguistics |
spelling | doaj.art-334b9c621d174fe288dc15ab207d9bf3; 2022-12-22T00:19:11Z; eng; The MIT Press; Transactions of the Association for Computational Linguistics; 2307-387X; 2021-01-01; 9; 243; 260; 10.1162/tacl_a_00364; Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories; Jakob Prange (Georgetown University, United States, jp1724@georgetown.edu); Nathan Schneider (Georgetown University, United States, nathan.schneider@georgetown.edu); Vivek Srikumar (University of Utah, United States, svivek@cs.utah.edu); https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00364/98238/Supertagging-the-Long-Tail-with-Tree-Structured |
title | Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories |
url | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00364/98238/Supertagging-the-Long-Tail-with-Tree-Structured |