Efficient Content-Based Sparse Attention with Routing Transformers
Abstract: Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic computation and memory requirements with respect to sequence length. Successful approaches to reduce this complexity focus...
Main Authors: | Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier |
---|---|
Format: | Article |
Language: | English |
Published: | The MIT Press, 2021-01-01 |
Series: | Transactions of the Association for Computational Linguistics |
Online Access: | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00353/97776/Efficient-Content-Based-Sparse-Attention-with |
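The abstract above is truncated before the method itself, but the core idea of routing attention is to cluster queries and keys (the paper learns the centroids online with a k-means-style update) and let each query attend only to keys assigned to the same cluster. The following is a minimal, non-causal, single-head NumPy sketch of that idea under simplifying assumptions: centroids are taken as given rather than learned, clusters are not balanced, and the name `routing_attention` is illustrative rather than the authors' code.

```python
import numpy as np

def routing_attention(Q, K, V, centroids):
    """Sketch of content-based sparse attention via routing.

    Each query attends only to keys routed to the same centroid, so cost
    scales with cluster sizes rather than the full n x n attention grid.
    Q, K, V: (n, d) arrays; centroids: (k, d) array (assumed given here).
    """
    n, d = Q.shape
    # Route queries and keys to their nearest centroid (dot-product
    # similarity; equivalent to Euclidean k-means for normalized vectors).
    q_cluster = np.argmax(Q @ centroids.T, axis=-1)
    k_cluster = np.argmax(K @ centroids.T, axis=-1)

    out = np.zeros_like(V)
    for c in range(centroids.shape[0]):
        q_idx = np.flatnonzero(q_cluster == c)
        k_idx = np.flatnonzero(k_cluster == c)
        if q_idx.size == 0 or k_idx.size == 0:
            continue
        # Standard scaled dot-product attention, restricted to one cluster.
        scores = Q[q_idx] @ K[k_idx].T / np.sqrt(d)
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[q_idx] = weights @ V[k_idx]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 128, 16, 4
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    centroids = rng.standard_normal((k, d))
    print(routing_attention(Q, K, V, centroids).shape)  # (128, 16)
```

With roughly balanced clusters of size about sqrt(n), as used in the paper, this kind of within-cluster attention brings the quadratic cost down to roughly O(n^1.5 d); the sketch above omits the cluster balancing and online centroid updates that make this work in practice.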