Inducing high energy-latency of large vision-language models with verbose images

Large vision-language models (VLMs) such as GPT-4 have achieved exceptional performance across various multi-modal tasks. However, the deployment of VLMs necessitates substantial energy consumption and computational resources. Once attackers maliciously induce high energy consumption and latency tim...

Bibliographic details
Main Authors: Gao, K, Bai, Y, Gu, J, Xia, ST, Torr, P, Li, Z, Liu, W
Format: Conference item
Language: English
Published: OpenReview 2024