As firm as their foundations: can open-sourced foundation models be used to create adversarial examples for downstream tasks?


Bibliographic Details
Main Authors: Hu, A, Gu, J, Pinto, F, Kamnitsas, K, Torr, P
Format: Conference item
Language: English
Published: 2024
Description
Summary: Foundation models pre-trained on web-scale vision-language data, such as CLIP, are widely used as cornerstones of powerful machine learning systems. While pre-training offers clear advantages for downstream learning, it also endows downstream models with shared adversarial vulnerabilities that can be easily identified through the open-sourced foundation model. In this work, we expose such vulnerabilities among CLIP’s downstream models and show that foundation models can serve as a basis for attacking their downstream systems. In particular, we propose a simple yet alarmingly effective adversarial attack strategy termed Patch Representation Misalignment (PRM). Solely based on open-sourced CLIP vision encoders, this method can produce highly effective adversaries that simultaneously fool more than 20 downstream models spanning 4 common vision-language tasks (semantic segmentation, object detection, image captioning and visual question-answering). Our findings highlight the concerning safety risks introduced by the extensive usage of publicly available foundational models in the development of downstream systems, calling for extra caution in these scenarios.
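
The summary describes the PRM idea only at a high level. As a rough illustration, not the authors' actual objective or implementation, the following is a minimal sketch of a PGD-style loop that perturbs an image so its per-patch embeddings from an open-source CLIP-like vision encoder drift away from those of the clean image. It assumes PyTorch, pixel values in [0, 1], and a user-supplied, hypothetical encode_patches callable that exposes patch embeddings (e.g. via a forward hook on the encoder's last transformer block) and handles any CLIP-specific preprocessing internally.

    import torch

    def prm_style_attack(encode_patches, image, eps=8/255, alpha=1/255, steps=50):
        """Sketch of a patch-representation-misalignment style attack.

        encode_patches: hypothetical callable mapping images (B, 3, H, W) to
            patch embeddings (B, num_patches, dim); not part of any real API,
            it must be implemented by the user (e.g. with a forward hook).
        image: clean images in [0, 1], shape (B, 3, H, W).
        eps / alpha / steps: L-inf budget, step size, and number of PGD steps.
        """
        image = image.detach()
        with torch.no_grad():
            clean_feats = encode_patches(image)              # reference patch embeddings

        delta = torch.zeros_like(image).uniform_(-eps, eps)  # random start in the L-inf ball
        delta.requires_grad_(True)

        for _ in range(steps):
            adv_feats = encode_patches((image + delta).clamp(0, 1))
            # Encourage misalignment: minimise cosine similarity between the
            # adversarial and clean patch embeddings, averaged over patches.
            loss = torch.nn.functional.cosine_similarity(adv_feats, clean_feats, dim=-1).mean()
            loss.backward()

            with torch.no_grad():
                delta -= alpha * delta.grad.sign()           # descend on similarity
                delta.clamp_(-eps, eps)                      # project back into the budget
                delta.grad = None

        return (image + delta).clamp(0, 1).detach()

Because the loss depends only on the open-sourced vision encoder, the same adversarial image can then be fed to any downstream model built on that encoder, which is the transfer setting the summary describes.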