Inducing high energy-latency of large vision-language models with verbose images
Large vision-language models (VLMs) such as GPT-4 have achieved exceptional performance across various multi-modal tasks. However, the deployment of VLMs necessitates substantial energy consumption and computational resources. Once attackers maliciously induce high energy consumption and latency tim...
Main authors: | , , , , , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | OpenReview, 2024 |