Summary: | This work introduces a comprehensive approach to video quality assessment (VQA), encompassing both traditional deep-learning-based methods and vision-language-model-based methods. Through the development of the DIVIDE-3k database and the DOVER model, we offer nuanced insights into the multifaceted nature of video quality, capturing both its technical and aesthetic dimensions. Further advances come from the Maxwell database, designed to pinpoint the specific quality factors affecting video perception, and the MaxVQA model, which leverages language-prompted mechanisms for a refined analysis of video quality across multiple dimensions. The findings underscore the complexity of VQA, revealing the significance of both content-based and technical factors in determining video quality. This work not only advances the state of the art in VQA but also sets the stage for future research on evaluating and enhancing the quality of in-the-wild videos.