UAT: Universal Attention Transformer for Video Captioning
Video captioning via encoder–decoder structures is a successful sentence generation method. In addition, using various feature extraction networks for extracting multiple features to obtain multiple kinds of visual features in the encoding process is a standard method for improving model performance...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2022-06-01
|
Series: | Sensors |
Subjects: | |
Online Access: | https://www.mdpi.com/1424-8220/22/13/4817 |