An image is worth 1000 lies: adversarial transferability across prompts on vision-language models

Unlike traditional task-specific vision models, recent large vision-language models (VLMs) can readily adapt to different vision tasks simply by using different textual instructions, i.e., prompts. However, a well-known concern about traditional task-specific vision models is that they can be misled by imperceptible adversarial perturbations, and this concern is exacerbated by the fact that the same adversarial perturbations can fool different task-specific models. Given that VLMs rely on prompts to adapt to different tasks, an intriguing question emerges: can a single adversarial image mislead all of a VLM's predictions when a thousand different prompts are given? This question introduces a novel perspective on adversarial transferability: cross-prompt adversarial transferability. In this work, we propose the Cross-Prompt Attack (CroPA), which updates the visual adversarial perturbation with learnable textual prompts designed to counteract the misleading effects of the adversarial image. In doing so, CroPA significantly improves the transferability of adversarial examples across prompts. Extensive experiments verify the strong cross-prompt adversarial transferability of CroPA with prevalent VLMs, including Flamingo, BLIP-2, and InstructBLIP, on a variety of tasks.
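The abstract describes CroPA as an alternating optimisation: the image perturbation is trained to steer the model toward an attacker-chosen output under many prompts, while learnable prompt perturbations push in the opposite direction. The sketch below illustrates that min-max idea for a targeted variant under stated assumptions; `cross_prompt_attack`, `vlm_loss`, `embed_prompt`, and all hyperparameters are hypothetical placeholders for illustration, not the authors' released implementation or any specific library API.

```python
# Minimal, illustrative sketch of a cross-prompt attack in the spirit of CroPA.
# `vlm_loss(image, prompt_embeds, target_text)` is a hypothetical helper returning the
# VLM's language-modelling loss for generating `target_text`; `embed_prompt(p)` is a
# hypothetical helper returning the text-embedding tensor for prompt `p`.
import torch


def cross_prompt_attack(image, prompts, target_text, vlm_loss, embed_prompt,
                        eps=16 / 255, alpha_v=1 / 255, alpha_t=0.01,
                        steps=2000, prompt_update_interval=10):
    # Visual perturbation, shared across all prompts.
    delta_v = torch.zeros_like(image, requires_grad=True)
    # One learnable perturbation per prompt embedding.
    prompt_embeds = [embed_prompt(p).detach() for p in prompts]
    delta_t = [torch.zeros_like(e, requires_grad=True) for e in prompt_embeds]

    for step in range(steps):
        i = step % len(prompts)

        # Targeted objective: make the VLM output `target_text` under this prompt.
        loss = vlm_loss(image + delta_v, prompt_embeds[i] + delta_t[i], target_text)

        # The image perturbation *descends* on the loss (push toward the target output).
        (grad_v,) = torch.autograd.grad(loss, delta_v)
        with torch.no_grad():
            delta_v -= alpha_v * grad_v.sign()
            delta_v.clamp_(-eps, eps)

        # Periodically, the prompt perturbation *ascends* on the loss, i.e. it tries to
        # undo the adversarial image; an image perturbation that survives this pressure
        # tends to remain effective under unseen prompts.
        if step % prompt_update_interval == 0:
            loss_t = vlm_loss(image + delta_v, prompt_embeds[i] + delta_t[i], target_text)
            (grad_t,) = torch.autograd.grad(loss_t, delta_t[i])
            with torch.no_grad():
                delta_t[i] += alpha_t * grad_t.sign()

    return (image + delta_v).detach()
```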


Bibliographic Details
Main Authors: Luo, H., Gu, J., Liu, F., Torr, P.
Format: Conference item
Language: English
Published: OpenReview, 2024
Collection: OXFORD
Record ID: oxford-uuid:78185aca-89f0-4301-8c7d-63f7c67bcf5b
Institution: University of Oxford