Q-align: teaching LMMs for visual scoring via discrete text-defined levels
The explosion of visual content available online underscores the need for an accurate machine assessor that can robustly score diverse types of visual content. While recent studies have demonstrated the exceptional potential of large multi-modality models (LMMs) on a wide rang...
Main Authors: | |
---|---|
Other Authors: | |
Format: | Conference Paper |
Language: | English |
Published: | 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/178466 http://arxiv.org/abs/2312.17090v1 https://openreview.net/forum?id=PHjkVjR78A https://icml.cc/ |