Code for the paper "xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking".
Please refer to the updated version of this repository: https://github.com/Bowen1911/xJailbreak_r