Enhancing semantics with multi‐objective reinforcement learning for video description
Abstract Video description is challenging due to the high complexity of translating visual content into language. In most popular attention‐based pipelines for this task, visual features and previously generated words are usually concatenated as a vector to predict the current word. However, the err...
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Wiley
2021-12-01
|
Series: | Electronics Letters |
Subjects: | |
Online Access: | https://doi.org/10.1049/ell2.12334 |