Efficient Content-Based Sparse Attention with Routing Transformers

Abstract
Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic computation and memory requirements with respect to sequence length. Successful approaches to reduce this complexity focus...
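The abstract points to the quadratic cost of full self-attention and to content-based sparse attention as the remedy. The snippet below is a minimal, hedged sketch (not the authors' exact algorithm) contrasting dense softmax attention with a cluster-routed sparse variant; the centroid routing, function names, and shapes are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: content-based sparse attention where queries and keys
# are routed to clusters by nearest centroid, and each query attends only to keys
# in its own cluster. This is an assumption-laden toy, not the paper's method.
import numpy as np

def full_attention(Q, K, V):
    """Standard softmax attention: O(n^2) time and memory in sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])              # (m, k) score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def clustered_sparse_attention(Q, K, V, centroids):
    """Each position attends only to positions routed to the same centroid."""
    q_cluster = np.argmax(Q @ centroids.T, axis=-1)      # route queries to nearest centroid
    k_cluster = np.argmax(K @ centroids.T, axis=-1)      # route keys the same way
    out = np.zeros_like(V)
    for c in range(centroids.shape[0]):
        qi = np.where(q_cluster == c)[0]
        ki = np.where(k_cluster == c)[0]
        if len(qi) == 0 or len(ki) == 0:
            continue
        out[qi] = full_attention(Q[qi], K[ki], V[ki])    # attention within the cluster only
    return out

# Tiny usage example with random data.
rng = np.random.default_rng(0)
n, d, n_clusters = 16, 8, 4
Q, K, V = rng.normal(size=(3, n, d))
centroids = rng.normal(size=(n_clusters, d))
print(clustered_sparse_attention(Q, K, V, centroids).shape)  # (16, 8)
```

If cluster sizes stay roughly balanced, each query scores against only a fraction of the keys, which is the source of the sub-quadratic cost that content-based sparse attention targets.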


Bibliographic Details
Main Authors: Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier
Format: Article
Language: English
Published: The MIT Press 2021-01-01
Series: Transactions of the Association for Computational Linguistics
Online Access: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00353/97776/Efficient-Content-Based-Sparse-Attention-with