Xiangyu QI
Verified email at princeton.edu - Homepage
Title
Cited by
Year
Fine-tuning aligned language models compromises safety, even when users do not intend to!
X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal, P Henderson
International Conference on Learning Representations (ICLR), 2024 (Oral), 2023
Cited by: 96; Year: 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
X Qi, K Huang, A Panda, P Henderson, M Wang, P Mittal
AAAI Conference on Artificial Intelligence, 2024 (Oral), 2023
Cited by: 67*; Year: 2023
Revisiting the assumption of latent separability for backdoor defenses
X Qi, T Xie, Y Li, S Mahloujifar, P Mittal
International Conference on Learning Representations (ICLR), 2023
Cited by: 65*; Year: 2023
Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks
X Qi, T Xie, R Pan, J Zhu, Y Yang, K Bu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral), 2021
Cited by: 44; Year: 2021
Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks
NM Gürel, X Qi, L Rimanic, C Zhang, B Li
International Conference on Machine Learning (ICML), 2021
Cited by: 29; Year: 2021
Subnet Replacement: Deployment-stage backdoor attack against deep neural networks in gray-box setting
X Qi, J Zhu, C Xie, Y Yang
ICLR Workshop, 2021
Cited by: 24; Year: 2021
Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
X Qi, T Xie, JT Wang, T Wu, S Mahloujifar, P Mittal
32nd USENIX Security Symposium (USENIX Security 23), 1685-1702, 2023
Cited by: 16*; Year: 2023
Assessing the brittleness of safety alignment via pruning and low-rank modifications
B Wei, K Huang, Y Huang, T Xie, X Qi, M Xia, P Mittal, M Wang, ...
arXiv preprint arXiv:2402.05162, 2024
Cited by: 4; Year: 2024
Uncovering Adversarial Risks of Test-Time Adaptation
T Wu, F Jia, X Qi, JT Wang, V Sehwag, S Mahloujifar, P Mittal
International Conference on Machine Learning (ICML), 2023
Cited by: 4; Year: 2023
BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection
T Xie, X Qi, P He, Y Li, JT Wang, P Mittal
International Conference on Learning Representations (ICLR), 2024, 2023
Cited by: 1; Year: 2023
Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment
J Wang, J Li, Y Li, X Qi, M Chen, J Hu, Y Li, B Li, C Xiao
arXiv preprint arXiv:2402.14968, 2024
Year: 2024