Neighbourhood representative sampling for efficient end-to-end video quality assessment

Bibliographic Details
Main Authors:	Wu, Haoning; Chen, Chaofeng; Liao, Liang; Hou, Jingwen; Sun, Wenxiu; Yan, Qiong; Gu, Jinwei; Lin, Weisi
Other Authors:	School of Computer Science and Engineering
Format:	Journal Article
Language:	English
Published:	2024
Subjects:	Computer and Information Science; Quality-Sensitive Neighbourhood Representatives; Video Quality Assessment
Online Access:	https://hdl.handle.net/10356/173445

Abstract:	The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA). On the one hand, keeping the original resolution leads to unacceptable computational costs. On the other hand, existing practices, such as resizing or cropping, change the quality of the original videos due to differences in detail or loss of content, and are therefore harmful to quality assessment. Studies of spatial-temporal redundancy in the human visual system suggest that visual quality within a neighbourhood is highly likely to be similar, which motivates us to investigate an effective quality-sensitive neighbourhood representative sampling scheme for VQA. In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS), and name the resultant samples fragments. In St-GMS, full-resolution videos are first divided into mini-cubes with predefined spatial-temporal grids; then temporally aligned quality representatives are sampled to compose the fragments that serve as inputs for VQA. In addition, we design the Fragment Attention Network (FANet), a network architecture tailored specifically for fragments. With fragments and FANet, the proposed FAST-VQA and FasterVQA (with an improved sampling scheme) achieve up to 1612× higher efficiency than the existing state of the art, while achieving significantly better performance on all relevant VQA benchmarks.
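
The grid mini-cube sampling step summarized in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' released implementation: the function name `grid_minicube_sample`, the (T, H, W, C) array layout, the illustrative defaults (a 7×7 grid of 32×32 patches, 4 temporal segments of 8 frames), and the uniform random choice of patch offsets and frame runs are all hypothetical choices made for this example. Each spatial grid cell contributes one raw-resolution patch whose position is shared by every frame of a temporal segment, mirroring the temporally aligned mini-cubes the abstract describes, and the patches are spliced together into a fragment.

```python
import numpy as np


def grid_minicube_sample(video, grid=7, patch=32, num_segments=4,
                         frames_per_segment=8, rng=None):
    """Hypothetical sketch: compose a 'fragment' from a full-resolution video.

    video: array of shape (T, H, W, C).
    Returns an array of shape
    (num_segments * frames_per_segment, grid * patch, grid * patch, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    t, h, w, c = video.shape
    cell_h, cell_w = h // grid, w // grid   # spatial grid cell size
    seg_len = t // num_segments             # temporal grid segment length
    if cell_h < patch or cell_w < patch or seg_len < frames_per_segment:
        raise ValueError("video too small for the requested grid/patch sizes")

    out = np.empty((num_segments * frames_per_segment,
                    grid * patch, grid * patch, c), dtype=video.dtype)
    for s in range(num_segments):
        # Assumption: take a contiguous run of frames at a random start in segment s.
        t0 = s * seg_len + rng.integers(0, seg_len - frames_per_segment + 1)
        frames = video[t0:t0 + frames_per_segment]
        for i in range(grid):
            for j in range(grid):
                # One random offset per cell, shared by all frames of the
                # segment (temporal alignment); pixels are copied, never resized.
                y = i * cell_h + rng.integers(0, cell_h - patch + 1)
                x = j * cell_w + rng.integers(0, cell_w - patch + 1)
                out[s * frames_per_segment:(s + 1) * frames_per_segment,
                    i * patch:(i + 1) * patch,
                    j * patch:(j + 1) * patch] = frames[:, y:y + patch, x:x + patch]
    return out


# Usage: a synthetic 32-frame 270x480 RGB clip.
video = np.random.default_rng(0).integers(0, 256, size=(32, 270, 480, 3), dtype=np.uint8)
frag = grid_minicube_sample(video, rng=np.random.default_rng(1))
print(frag.shape)  # (32, 224, 224, 3)
```

Because patches are copied at native resolution rather than resized, the fragment keeps local texture and compression artefacts intact, while its size depends only on the grid and patch parameters, not on the input resolution.
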
Institution:	Nanyang Technological University
Affiliations:	School of Computer Science and Engineering; S-Lab
Funding:	Agency for Science, Technology and Research (A*STAR). This work was supported in part by the RIE2020 Industry Alignment Fund Industry Collaboration Projects (IAF-ICP) Funding Initiative and in part by cash and in-kind contributions from the industry partner(s).
Citation:	Wu, H., Chen, C., Liao, L., Hou, J., Sun, W., Yan, Q., Gu, J. & Lin, W. (2023). Neighbourhood representative sampling for efficient end-to-end video quality assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12), 15185-15202. https://dx.doi.org/10.1109/TPAMI.2023.3319332
Journal:	IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 12, pp. 15185-15202
ISSN:	0162-8828
DOI:	10.1109/TPAMI.2023.3319332
Scopus ID:	2-s2.0-85172997445
Date Available:	2024-02-05
Rights:	© 2023 IEEE. All rights reserved.

Similar Items