Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data Y Li*, P Yuan*, S Feng, B Pan, B Sun, X Wang, H Wang, K Li AAAI 2024, 2023 | 16 | 2023 |
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning Y Li*, P Yuan*, S Feng, B Pan, X Wang, B Sun, H Wang, K Li ICLR 2024, 2024 | 9 | 2024 |
BatchEval: Towards Human-like Text Evaluation P Yuan, S Feng, Y Li, X Wang, B Pan, H Wang, K Li ACL 2024 Oral, 2023 | 9 | 2023 |
Generative Dense Retrieval: Memory Can Be a Burden P Yuan*, X Wang*, S Feng, B Pan, Y Li, H Wang, X Miao, K Li EACL 2024 Oral, 2024 | 6 | 2024 |
Better correlation and robustness: a distribution-balanced self-supervised learning framework for automatic dialogue evaluation P Yuan, X Wang, J Shi, B Sun, Y Li Advances in Neural Information Processing Systems 36, 2024 | 3 | 2024 |
Make every penny count: Difficulty-adaptive self-consistency for cost-efficient reasoning X Wang, S Feng, Y Li, P Yuan, Y Zhang, B Pan, H Wang, Y Hu, K Li arXiv preprint arXiv:2408.13457, 2024 | 2 | 2024 |
Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation X Wang*, Y Li*, S Feng, P Yuan, B Pan, H Wang, Y Hu, K Li ACL 2024 Main, 2024 | 2 | 2024 |
Focused Large Language Models are Stable Many-Shot Learners P Yuan, S Feng, Y Li, X Wang, Y Zhang, C Tan, B Pan, H Wang, Y Hu, ... EMNLP 2024 Main, 2024 | 1 | 2024 |
Poor-Supervised Evaluation for SuperLLM via Mutual Consistency P Yuan, S Feng, Y Li, X Wang, B Pan, H Wang, Y Hu, K Li ACL 2024 Findings, 2024 | 1 | 2024 |
Instruction Embedding: Latent Representations of Instructions Towards Task Identification Y Li, J Shi, S Feng, P Yuan, X Wang, B Pan, H Wang, Y Hu, K Li NeurIPS 2024 DB poster, 2024 | | 2024 |
CogLM: Tracking Cognitive Development of Large Language Models X Wang*, P Yuan*, S Feng, Y Li, B Pan, H Wang, Y Hu, K Li arXiv preprint arXiv:2408.09150, 2024 | | 2024 |
Parallel Corpora Alignment Framework for Multilingual and Robust Automatic Dialogue Evaluation X Wang*, J Shi*, P Yuan*, K Li Proceedings of The Eleventh Dialog System Technology Challenge, 123-132, 2023 | | 2023 |
Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient Model Evaluation P Yuan, Y Zhang, S Feng, Y Li, X Wang, J Shi, C Tan, B Pan, Y Hu, K Li | | |
Mode: A Benchmark and a Probe into Multimodal Open-Domain Dialogue Evaluation H Yin*, X Wang*, Y Zhang, P Lu, B Sun, P Yuan, K Li Available at SSRN 4888542 | | |