# foundation-models

Here are 165 public repositories matching this topic...

MixEval is a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures. It evaluates LLMs with a highly capable model ranking (0.96 correlation with Chatbot Arena) while running locally and quickly (about 6% of the time and cost of running MMLU), and its queries are updated every month to avoid contamination. (A minimal sketch of how such a rank-correlation figure can be computed follows this listing.)

  • Updated Jun 2, 2024
  • Python
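The headline agreement number above is a rank-correlation figure between two model rankings. As a hypothetical illustration only (not MixEval's actual code), the sketch below uses made-up scores for a few models and computes a Spearman rank correlation between a benchmark ranking and Chatbot Arena-style ratings:

```python
# Hypothetical sketch: computing a rank correlation between a benchmark's
# model ranking and Chatbot Arena-style ratings. Model names and scores
# are invented for illustration; this is not MixEval's implementation.
from scipy.stats import spearmanr

benchmark_scores = {"model-a": 82.1, "model-b": 75.4, "model-c": 68.9, "model-d": 61.2}
arena_ratings = {"model-a": 1250, "model-b": 1185, "model-c": 1120, "model-d": 1050}

# Align the two score lists by model name before correlating.
models = sorted(benchmark_scores)
rho, p_value = spearmanr(
    [benchmark_scores[m] for m in models],
    [arena_ratings[m] for m in models],
)
print(f"Spearman rank correlation: {rho:.2f} (p={p_value:.3f})")
```

A correlation near 1.0 would indicate that the benchmark orders models almost identically to the reference ranking, which is the sense in which the 0.96 figure is quoted.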
