Aligning source visual and target language domains for unpaired video captioning
Training a supervised video captioning model requires coupled video-caption pairs. However, for many target languages, sufficient paired data are not available. To address this, we introduce the unpaired video captioning task, which aims to train models without coupled video-caption pairs in the target language....
Main authors: | , , , , , |
---|---|
Format: | Journal article |
Language: | English |
Published in: | IEEE, 2022 |