Vision-language-model-based video quality assessment


Bibliographic Details
Main Author: Zhang, Erli
Other Authors: Lin, Weisi
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/175035
Description
Summary: This work introduces a comprehensive approach to video quality assessment (VQA), combining traditional deep-learning-based methods with vision-language-model-based methods. Through the development of the DIVIDE-3k database and the DOVER model, we offer nuanced insights into the multifaceted nature of video quality, capturing both technical and aesthetic dimensions. Further advancements are achieved with the Maxwell database, designed to pinpoint the specific quality factors that affect video perception, and the MaxVQA model, which leverages language-prompted mechanisms for a refined analysis of video quality across multiple dimensions. The findings underscore the complexity of VQA, revealing the significance of both content-based and technical factors in determining video quality. This work not only advances the state of the art in VQA but also sets the stage for future research on evaluating and enhancing the quality of in-the-wild videos.
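As a rough illustration of the language-prompted mechanism mentioned above, the sketch below scores a video frame with a generic CLIP-style model by comparing it against antonym text prompts per quality dimension. This is a minimal sketch of the general idea, not the MaxVQA implementation: the model checkpoint, prompt wording, dimension names, and frame-sampling strategy are all illustrative assumptions.

```python
# Minimal sketch of language-prompted quality scoring (not the MaxVQA
# implementation): a frame is compared against antonym text prompts, and
# the softmax probability of the positive prompt serves as the score.
# The checkpoint, prompt strings, and dimensions are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# One antonym prompt pair per quality dimension (hypothetical wording).
DIMENSIONS = {
    "technical": ("a sharp, clear photo", "a blurry, noisy photo"),
    "aesthetic": ("a well-composed, pleasing photo",
                  "a poorly composed, unpleasant photo"),
}

def score_frame(frame: Image.Image) -> dict:
    """Return a 0..1 score per quality dimension for one video frame."""
    scores = {}
    for name, (pos, neg) in DIMENSIONS.items():
        inputs = processor(text=[pos, neg], images=frame,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(**inputs).logits_per_image  # shape: (1, 2)
        # Probability mass on the positive prompt acts as the score.
        scores[name] = logits.softmax(dim=-1)[0, 0].item()
    return scores

# Usage: average per-frame scores over uniformly sampled frames of a video.
print(score_frame(Image.open("frame_000.png").convert("RGB")))
```

A video-level score would typically average these per-frame scores over a handful of uniformly sampled frames; learned text tokens and temporal feature fusion, as in the actual MaxVQA model, would refine this considerably.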