Adversarial Prompt Transformation for Systematic Jailbreaks of LLMs
The rapid integration of Large Language Models (LLMs) such as OpenAI’s GPT series into diverse sectors has significantly enhanced digital interactions, but it has also introduced new security challenges, notably the risk of "jailbreaking," where inputs cause models to deviate from their operational gu...
Main Author: |
---|---
Other Authors: |
Format: | Thesis
Published: | Massachusetts Institute of Technology, 2024
Online Access: | https://hdl.handle.net/1721.1/157167