label | remark |
---|---|
Adversarial example generation algorithms | Common adversarial example generation algorithms |
PGD (Auto-PGD / AutoAttack) | Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks |
FGSM | Explaining and Harnessing Adversarial Examples |
LLM Self Defense | LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked |
LLM universal attack (GCG) | Universal and Transferable Adversarial Attacks on Aligned Language Models |
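The FGSM and PGD rows are both gradient-sign attacks. Below is a minimal sketch under several assumptions. The target is a toy logistic-regression classifier with hypothetical weights `w` and `b`, chosen so the input gradient has a closed form. The threat model is L-infinity with radius `eps`. The sketch does not clip inputs to a valid range such as [0, 1] pixels. FGSM takes a single step of size `eps` along sign(∇ₓL). PGD repeats smaller steps of size `alpha` and projects back onto the `eps`-ball after each step. The adaptive step-size schedule of Auto-PGD from the AutoAttack paper is not implemented here, and all function names are illustrative only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def input_grad(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x for a toy
    logistic model p = sigmoid(w @ x + b) with label y in {0, 1}.
    dL/dz = p - y and dz/dx = w, so dL/dx = (p - y) * w."""
    p = sigmoid(x @ w + b)
    return (p - y) * w


def fgsm(x, y, w, b, eps):
    """FGSM: one step of size eps along the sign of the input gradient."""
    return x + eps * np.sign(input_grad(x, y, w, b))


def pgd(x, y, w, b, eps, alpha, steps, rng=None):
    """L-infinity PGD: repeated signed-gradient steps of size alpha, each
    followed by projection (clipping) back onto the eps-ball around x."""
    x_adv = x.copy()
    if rng is not None:
        # Optional random start inside the eps-ball.
        x_adv = x_adv + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_grad(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection step
    return x_adv


# Hypothetical toy model and a single correctly labelled input.
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.2, -0.1, 0.4])
y = 1.0
eps = 0.1

x_fgsm = fgsm(x, y, w, b, eps)
x_pgd = pgd(x, y, w, b, eps, alpha=0.025, steps=10, rng=np.random.default_rng(0))

# Both attacks should lower the model's confidence in the true label y = 1.
for name, xa in [("clean", x), ("FGSM", x_fgsm), ("PGD", x_pgd)]:
    print(name, "p(y=1) =", round(float(sigmoid(xa @ w + b)), 4))
```

On a linear model, the sign of the input gradient never changes. PGD therefore lands on the same corner of the `eps`-ball as FGSM. The two attacks only diverge on non-linear models such as neural networks, where iterating lets PGD follow a gradient direction that changes from step to step.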