CMMLU: Measuring Massive Multitask Language Understanding in Chinese. H Li, Y Zhang, F Koto, Y Yang, H Zhao, Y Gong, N Duan, T Baldwin. arXiv preprint arXiv:2306.09212, 2023. Cited by 98.
Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models. L Li, G Chen, Y Su, Z Chen, Y Zhang, E Xing, K Zhang. arXiv preprint arXiv:2402.12563, 2024. Cited by 6.
Can Large Language Models Comprehend Ancient Chinese? A Preliminary Test on ACLUE. Y Zhang, H Li. Ancient Language Processing Workshop, 2023. Cited by 6.
Against The Achilles' Heel: A Survey on Red Teaming for Generative Models. L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang, J Gao, Y Zhang, W Che, et al. arXiv preprint arXiv:2404.00629, 2024. Cited by 5.
Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents. R Wang, H Li, X Han, Y Zhang, T Baldwin. arXiv preprint arXiv:2402.11651, 2024. Cited by 1.