Notebook

Adversarial Examples

Label | Remark
Adversarial example generation algorithms | Common adversarial example generation algorithms
PGD | Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks
FGSM | Explaining and Harnessing Adversarial Examples
LLM Self Defense | LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked
LLM (universal attack) | Universal and Transferable Adversarial Attacks on Aligned Language Models
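
As a quick reminder of what the FGSM and PGD entries refer to, here is a minimal sketch in plain Python. It is not taken from either paper's code. The toy logistic-regression model, the helper names (`predict`, `loss_grad_x`, `fgsm`, `pgd`), and the values of `eps`, `alpha`, and `steps` are all assumptions chosen for illustration. FGSM takes a single step of size `eps` along the sign of the input gradient of the loss; PGD, in its common L-infinity form, repeats smaller signed steps and clips the result back into the `eps`-ball around the original input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model (assumed)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_grad_x(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For logistic regression this is (p - y) * w.
    """
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One signed-gradient step of size eps that increases the loss."""
    g = loss_grad_x(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(w, b, x, y, eps, alpha, steps):
    """Repeated signed-gradient steps of size alpha, each followed by a
    projection (clipping) back into the L-infinity ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad_x(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


# Hypothetical model and input, chosen only to make the example concrete.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1

x_fgsm = fgsm(w, b, x, y, eps=0.5)
x_pgd = pgd(w, b, x, y, eps=0.5, alpha=0.1, steps=10)

print("clean p(y=1):", predict(w, b, x))
print("FGSM  p(y=1):", predict(w, b, x_fgsm))
print("PGD   p(y=1):", predict(w, b, x_pgd))
```

On a linear model like this one the gradient sign never changes, so both attacks end up at the same corner of the `eps`-ball. The two only behave differently on non-linear models, where the gradient is recomputed at each PGD step.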