Notebook

Adversarial Examples

label	remark
Adversarial example generation algorithms	Common algorithms for generating adversarial examples
PGD	Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks (introduces Auto-PGD and AutoAttack)
FGSM	Explaining and Harnessing Adversarial Examples
LLM Self Defense	LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked
LLM (universal attack)	Universal and Transferable Adversarial Attacks on Aligned Language Models