Adaptive Multi-Modal Ensemble Network for Video Memorability Prediction

Video memorability prediction aims to quantify how likely a video is to be remembered based on its content, which is valuable for advertising design, social media recommendation, and other applications. However, the main attributes that affect the memorability prediction have no...


Bibliographic Details
Main Authors:	Jing Li, Xin Guo, Fumei Yue, Fanfu Xue, Jiande Sun
Format:	Article
Language:	English
Published:	MDPI AG 2022-08-01
Series:	Applied Sciences
Subjects:	multi-modal, video memorability, ensemble learning
Online Access:	https://www.mdpi.com/2076-3417/12/17/8599

_version_	1797496406543433728
author	Jing Li, Xin Guo, Fumei Yue, Fanfu Xue, Jiande Sun
author_facet	Jing Li, Xin Guo, Fumei Yue, Fanfu Xue, Jiande Sun
author_sort	Jing Li
collection	DOAJ
description	Video memorability prediction aims to quantify how likely a video is to be remembered based on its content, which is valuable for advertising design, social media recommendation, and other applications. However, the main attributes that affect memorability have not been determined, which makes designing a prediction model more challenging. Therefore, in this study, we analyze and experimentally verify how to select the most influential factors for predicting video memorability. Furthermore, we design a new framework, the Adaptive Multi-modal Ensemble Network, based on the chosen factors to predict video memorability efficiently. Specifically, we first identify three main factors that affect video memorability, i.e., temporal 3D information, spatial information, and semantics, derived from video, image, and caption, respectively. Then, the Adaptive Multi-modal Ensemble Network integrates three individual base learners (i.e., ResNet3D, Deep Random Forest, and Multi-Layer Perceptron) into a weighted ensemble framework to score video memorability. In addition, we design an adaptive learning strategy that updates the weights based on the importance of the memorability predicted by the base learners, rather than assigning weights manually. Finally, experiments on the public VideoMem dataset demonstrate that the proposed method provides competitive results and high efficiency for video memorability prediction.
first_indexed	2024-03-10T03:03:09Z
format	Article
id	doaj.art-da3eea449ad24d62b0d0a9a8b92887fc
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T03:03:09Z
publishDate	2022-08-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-da3eea449ad24d62b0d0a9a8b92887fc; 2023-11-23T12:42:25Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-08-01; 12; 17; 8599; 10.3390/app12178599; Adaptive Multi-Modal Ensemble Network for Video Memorability Prediction; Jing Li (School of Journalism and Communication, Shandong Normal University, Jinan 250061, China); Xin Guo (Shandong Haiyi Digital Technology Co., Ltd., Zibo 256410, China); Fumei Yue, Fanfu Xue, Jiande Sun (School of Information Science and Engineering, Shandong Normal University, Jinan 250061, China); https://www.mdpi.com/2076-3417/12/17/8599; multi-modal; video memorability; ensemble learning
title	Adaptive Multi-Modal Ensemble Network for Video Memorability Prediction
title_sort	adaptive multi modal ensemble network for video memorability prediction
topic	multi-modal, video memorability, ensemble learning
url	https://www.mdpi.com/2076-3417/12/17/8599


Similar Items