DeepQoE : a multimodal learning framework for video quality of experience (QoE) prediction

Recently, many models have been developed to predict video Quality of Experience (QoE), yet the applicability of these models still faces significant challenges. Firstly, many models rely on features that are unique to a specific dataset and thus lack the capability to generalize. Due to the intrica...


Bibliographic Details
Main Authors: Zhang, Huaizheng, Dong, Linsen, Gao, Guanyu, Hu, Han, Wen, Yonggang, Guan, Kyle
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: 2021
Subjects: Engineering::Computer science and engineering; Video Quality of Experience; Deep Learning
Online Access: https://hdl.handle.net/10356/152986
collection NTU
description Recently, many models have been developed to predict video Quality of Experience (QoE), yet the applicability of these models still faces significant challenges. Firstly, many models rely on features that are unique to a specific dataset and thus lack the capability to generalize. Due to the intricate interactions among these features, a unified representation that is independent of datasets with different modalities is needed. Secondly, existing models often lack the configurability to perform both classification and regression tasks. Thirdly, the sample size of the available datasets to develop these models is often very small, and the impact of limited data on the performance of QoE models has not been adequately addressed. To address these issues, in this work we develop a novel end-to-end framework termed DeepQoE. The proposed framework first uses a combination of deep learning techniques, such as word embedding and a 3D convolutional neural network (C3D), to extract generalized features. Next, these features are combined and fed into a neural network for representation learning. The learned representation then serves as input for classification or regression tasks. We evaluate the performance of DeepQoE with three datasets. The results show that for small datasets (e.g., WHU-MVQoE2016 and Live-Netflix Video Database), the performance of state-of-the-art machine learning algorithms is greatly improved by using the QoE representation from DeepQoE (e.g., 35.71% to 44.82%), while for the large dataset (e.g., VideoSet), our DeepQoE framework achieves a significant performance improvement over the best baseline method (90.94% vs. 82.84%). In addition to the much improved performance, DeepQoE has the flexibility to fit different datasets, to learn QoE representations, and to perform both classification and regression tasks. We also develop a DeepQoE-based adaptive bitrate streaming (ABR) system to verify that our framework can be easily applied to multimedia communication services. The software package of the DeepQoE framework has been released to facilitate current research on QoE.
first_indexed 2024-10-01T03:04:20Z
format Journal Article
id ntu-10356/152986
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:04:20Z
publishDate 2021
record_format dspace
spelling ntu-10356/152986 2021-10-27T05:57:44Z
National Research Foundation (NRF)
This work was supported in part and jointly by a gift fund from Microsoft Research Asia (Ref. FY18-Research-Theme-051), a project fund from DSAIR@NTU, and a BSEWWT project fund from Singapore National Research Foundation, administrated through the BSEWWT program office (Ref. BSEWWT2017_2_06), and in part by National Natural Science Foundation of China (NSFC) under Grant 61971457.
2020
Journal Article
Zhang, H., Dong, L., Gao, G., Hu, H., Wen, Y. & Guan, K. (2020). DeepQoE : a multimodal learning framework for video quality of experience (QoE) prediction. IEEE Transactions On Multimedia, 22(12), 3210-3223. https://dx.doi.org/10.1109/TMM.2020.2973828
1520-9210
https://hdl.handle.net/10356/152986
10.1109/TMM.2020.2973828
2-s2.0-85096581930
en
IEEE Transactions on Multimedia
© 2020 IEEE. All rights reserved.
topic Engineering::Computer science and engineering
Video Quality of Experience
Deep Learning
url https://hdl.handle.net/10356/152986