Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated according to this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model.
Main Authors: Mao, Junhua; Xu, Wei; Yang, Yi; Wang, Jiang; Huang, Zhiheng; Yuille, Alan L.
Format: Technical Report
Language: English
Published: Center for Brains, Minds and Machines (CBMM), arXiv, 2015
Online Access: http://hdl.handle.net/1721.1/100198
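To make the architecture described in the abstract concrete, here is a minimal sketch of the word-generation step in PyTorch. The class name `MRNNSketch`, all layer sizes, and the use of a plain `nn.RNN` are illustrative assumptions, not the authors' implementation; the sketch only mirrors what the abstract states: a recurrent sub-network over the word sequence, a precomputed convolutional image feature, and a multimodal layer that fuses the two to produce a distribution over the next word.

```python
import torch
import torch.nn as nn

class MRNNSketch(nn.Module):
    """Hypothetical sketch of an m-RNN-style captioner: it scores
    P(w_t | w_1..w_{t-1}, image) at every time step.
    All dimensions here are placeholder assumptions."""

    def __init__(self, vocab_size, embed_dim=256, rnn_dim=256,
                 img_dim=4096, mm_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)          # word embedding
        self.rnn = nn.RNN(embed_dim, rnn_dim, batch_first=True)   # recurrent sub-network
        # Multimodal layer: project the word embedding, the recurrent
        # state, and a precomputed CNN image feature into one space.
        self.proj_w = nn.Linear(embed_dim, mm_dim)
        self.proj_r = nn.Linear(rnn_dim, mm_dim)
        self.proj_i = nn.Linear(img_dim, mm_dim)
        self.out = nn.Linear(mm_dim, vocab_size)                  # next-word logits

    def forward(self, words, img_feat):
        # words: (batch, time) token ids; img_feat: (batch, img_dim) CNN feature
        w = self.embed(words)                      # (B, T, embed_dim)
        r, _ = self.rnn(w)                         # (B, T, rnn_dim)
        i = self.proj_i(img_feat).unsqueeze(1)     # (B, 1, mm_dim), broadcast over T
        m = torch.tanh(self.proj_w(w) + self.proj_r(r) + i)
        return self.out(m)                         # logits for P(next word | prefix, image)

# Usage: feed a token prefix and an image feature, then sample or argmax
# the next word from the returned distribution to grow a caption.
model = MRNNSketch(vocab_size=10000)
logits = model(torch.randint(0, 10000, (2, 5)), torch.randn(2, 4096))
probs = logits.softmax(dim=-1)                     # (2, 5, 10000)
```

Captions are then generated exactly as the abstract describes: starting from a start token, the model repeatedly draws the next word from this conditional distribution until an end token is produced.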
Similar Items

- CARF-net: CNN attention and RNN fusion network for video-based person reidentification
  by: Prasad, Dilip Kumar, et al.
  Published: (2019)
- Stack-VS: stacked visual-semantic attention for image caption generation
  by: Cheng, Ling, et al.
  Published: (2021)
- Dimension reduction in recurrent networks by canonicalization
  by: Grigoryeva, Lyudmila, et al.
  Published: (2022)
- Inferring origin-destination distribution of agent transfer in a complex network using deep gated recurrent units
  by: Saw, Vee-Liem, et al.
  Published: (2023)
- The effect of varying kilovoltage (kVp) and tube current (mAs) on the image quality and dose of CTA head phantom / Sity Noor Ayseah Dzulkafli
  by: Dzulkafli, Sity Noor Ayseah
  Published: (2015)