Cross-modal graph with meta concepts for video captioning

Video captioning aims to interpret complex visual content as text descriptions, which requires the model to fully understand video scenes, including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to generate object proposals and use the attention...

Bibliographic Details
Main Authors: Wang, Hao; Lin, Guosheng; Hoi, Steven C. H.; Miao, Chunyan
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: 2022
Links: https://hdl.handle.net/10356/162546