Evaluation methods with better generalization #1170

cobraheleah · 2024-05-17T03:56:47Z

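LLMs are heavily gaming the leaderboards built on existing public evaluation sets, so the scores no longer reflect how well the models actually perform. Do you have a more robust evaluation method that generalizes better?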

tonysy · 2024-05-20T03:47:21Z

We provide the OpenCompass 2.0 leaderboard for LLMs, which is built on non-public data.

tonysy · 2024-05-23T09:01:54Z

Feel free to re-open if needed.

mm-assistant bot assigned tonysy May 17, 2024

tonysy closed this as completed May 23, 2024
