ViTSTR-Transducer: Cross-Attention-Free Vision Transformer Transducer for Scene Text Recognition
Attention-based encoder–decoder scene text recognition (STR) architectures have been proven effective in recognizing text in the real world, thanks to their ability to learn an internal language model. Nevertheless, the cross-attention operation that is used to align visual and linguistic features d...
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-12-01 |
Series: | Journal of Imaging |
Online Access: | https://www.mdpi.com/2313-433X/9/12/276 |