LongT5-Mulla: LongT5 With Multi-Level Local Attention for a Longer Sequence

Efficient Transformer models typically employ local and global attention methods, or utilize hierarchical or recurrent architectures, to process long text inputs in natural language processing tasks. However, these models face challenges in terms of sacrificing either efficiency, accuracy, or compat...
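For readers unfamiliar with the mechanism the abstract alludes to, below is a minimal, hypothetical sketch of sliding-window (local) attention in plain NumPy. The function name, window size, and tensor shapes are illustrative assumptions only; this is the generic local-attention idea, not the paper's LongT5-Mulla method or its multi-level variant.

```python
import numpy as np

def local_attention(q, k, v, window=4):
    """Each query attends only to keys within +/- `window` positions (illustrative sketch)."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                            # (n, n) scaled dot-product scores
    idx = np.arange(n)
    outside = np.abs(idx[:, None] - idx[None, :]) > window   # True outside the local band
    scores = np.where(outside, -np.inf, scores)              # block attention to distant positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over the local band only
    return weights @ v                                       # (n, d) attended values

# Illustrative usage: 16 positions, 8-dim vectors, window of 4 on each side
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out = local_attention(q, k, v)
print(out.shape)  # (16, 8)
```

Because each position only attends within a fixed band, the cost grows linearly with sequence length rather than quadratically, which is the efficiency motivation behind local-attention designs for long inputs.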


Bibliographic Details
Main Author: Le Zhou
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10348571/