CMMLU: Measuring massive multitask language understanding in Chinese. H Li, Y Zhang, F Koto, Y Yang, H Zhao, Y Gong, N Duan, T Baldwin. arXiv preprint arXiv:2306.09212, 2023 | 183 | 2023 |
Confidence matters: Revisiting intrinsic self-correction capabilities of large language models. L Li, Z Chen, G Chen, Y Zhang, Y Su, E Xing, K Zhang. arXiv preprint arXiv:2402.12563, 2024 | 21 | 2024 |
Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents. R Wang, H Li, X Han, Y Zhang, T Baldwin. arXiv preprint arXiv:2402.11651, 2024 | 14 | 2024 |
Against The Achilles' Heel: A Survey on Red Teaming for Generative Models. L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang, J Gao, Y Zhang, W Che, ... Journal of Artificial Intelligence Research 82, 687-775, 2025 | 13 | 2025 |
Can Large Language Model Comprehend Ancient Chinese? A Preliminary Test on ACLUE. Y Zhang, H Li. Ancient Language Processing Workshop, 2023 | 12 | 2023 |
Causal Representation Learning from Multimodal Biological Observations. Y Sun, L Kong, G Chen, L Li, G Luo, Z Li, Y Zhang, Y Zheng, M Yang, ... arXiv preprint arXiv:2411.06518, 2024 | | 2024 |