Enhancing semantics with multi‐objective reinforcement learning for video description
Abstract: Video description is challenging due to the high complexity of translating visual content into language. In most popular attention‐based pipelines for this task, visual features and previously generated words are usually concatenated as a vector to predict the current word. However, the err...
Main Authors: Qinyu Li, Longyu Yang, Pengjie Tang, Hanli Wang
Format: Article
Language: English
Published: Wiley, 2021-12-01
Series: Electronics Letters
Online Access: https://doi.org/10.1049/ell2.12334
Similar Items
- Unified multi‐stage fusion network for affective video content analysis
  by: Yun Yi, et al. Published: (2022-10-01)
- Attention‐based video object segmentation algorithm
  by: Ying Cao, et al. Published: (2021-06-01)
- A deep learning method for video‐based action recognition
  by: Guanwen Zhang, et al. Published: (2021-12-01)
- Remote sensing target tracking in satellite videos based on a variable‐angle‐adaptive Siamese network
  by: Fukun Bi, et al. Published: (2021-07-01)
- Anomaly detection in video sequences: A benchmark and computational model
  by: Boyang Wan, et al. Published: (2021-12-01)