MathCAMPS

Fine-grained Synthesis of Mathematical Problems From Human Curricula

Grade-wise and Overall Performance

Each score below is the fraction of problems the model answered correctly, overall (All) and for each grade from kindergarten (K) through grade 8. Click on a model's name to see its detailed performance by grade and standard, along with its responses to individual problems in each standard.

| Vendor | Model | All | K | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenAI | GPT-4o | 0.92 | 0.98 | 0.98 | 0.98 | 0.98 | 0.92 | 0.88 | 0.95 | 0.89 | 0.64 |
| Anthropic | Claude-3 Opus | 0.89 | 0.97 | 0.99 | 0.96 | 0.98 | 0.89 | 0.83 | 0.96 | 0.73 | 0.56 |
| Qwen | Qwen2-Math 72B | 0.89 | 0.98 | 0.99 | 0.98 | 0.97 | 0.90 | 0.80 | 0.91 | 0.77 | 0.59 |
| Google | Gemini-1.5 Pro | 0.89 | 0.95 | 0.98 | 0.97 | 0.97 | 0.89 | 0.83 | 0.93 | 0.78 | 0.54 |
| Google | Gemini-1.5 Flash | 0.87 | 0.98 | 0.98 | 0.97 | 0.98 | 0.80 | 0.80 | 0.90 | 0.84 | 0.56 |
| OpenAI | GPT-3.5 Turbo | 0.87 | 0.96 | 0.98 | 0.98 | 0.97 | 0.86 | 0.77 | 0.90 | 0.77 | 0.56 |
| Anthropic | Claude-3 Sonnet | 0.86 | 0.96 | 0.98 | 0.97 | 0.98 | 0.88 | 0.74 | 0.94 | 0.66 | 0.49 |
| Meta | Llama 3 70B | 0.85 | 0.96 | 0.97 | 0.97 | 0.97 | 0.85 | 0.71 | 0.87 | 0.73 | 0.50 |
| Mistral | Mixtral 8x22B | 0.84 | 0.96 | 0.99 | 0.98 | 0.96 | 0.79 | 0.69 | 0.88 | 0.73 | 0.61 |
| Anthropic | Claude-3 Haiku | 0.84 | 0.97 | 0.98 | 0.97 | 0.98 | 0.87 | 0.69 | 0.92 | 0.59 | 0.51 |
| Qwen | Qwen2-Math 7B | 0.83 | 0.96 | 0.99 | 0.97 | 0.93 | 0.85 | 0.66 | 0.91 | 0.58 | 0.62 |
| DeepSeek | DeepSeek 67B | 0.80 | 0.95 | 0.99 | 0.96 | 0.93 | 0.82 | 0.60 | 0.84 | 0.61 | 0.47 |
| DeepSeek | DeepSeek Math 7B Base | 0.78 | 0.94 | 0.97 | 0.93 | 0.89 | 0.75 | 0.63 | 0.86 | 0.53 | 0.55 |
| Numina | NuminaMath 7B TIR | 0.78 | 0.89 | 0.97 | 0.95 | 0.90 | 0.72 | 0.63 | 0.84 | 0.59 | 0.53 |
| Meta | Llama 3 8B | 0.77 | 0.94 | 0.97 | 0.96 | 0.94 | 0.78 | 0.55 | 0.79 | 0.53 | 0.43 |
| Mistral | Mixtral 8x7B | 0.76 | 0.94 | 0.96 | 0.93 | 0.91 | 0.75 | 0.52 | 0.80 | 0.53 | 0.45 |
| InternLM | InternLM-Math Base 20B | 0.74 | 0.95 | 0.96 | 0.95 | 0.86 | 0.68 | 0.55 | 0.79 | 0.52 | 0.47 |
| EleutherAI | LLemma 34B | 0.71 | 0.95 | 0.96 | 0.93 | 0.87 | 0.61 | 0.47 | 0.77 | 0.46 | 0.44 |
| Mistral | Mistral 7B | 0.68 | 0.89 | 0.94 | 0.91 | 0.84 | 0.61 | 0.42 | 0.66 | 0.45 | 0.42 |
| DeepSeek | DeepSeek Coder 33B | 0.65 | 0.88 | 0.93 | 0.92 | 0.83 | 0.54 | 0.36 | 0.66 | 0.44 | 0.38 |
| Meta | CodeLlama 34B | 0.64 | 0.90 | 0.94 | 0.92 | 0.85 | 0.51 | 0.38 | 0.70 | 0.37 | 0.30 |
| Microsoft | phi-2 | 0.63 | 0.95 | 0.96 | 0.89 | 0.78 | 0.46 | 0.38 | 0.61 | 0.37 | 0.41 |
| EleutherAI | LLemma 7B | 0.62 | 0.78 | 0.90 | 0.85 | 0.79 | 0.48 | 0.41 | 0.67 | 0.41 | 0.36 |
| Google | Gemma 7B | 0.62 | 0.83 | 0.92 | 0.90 | 0.82 | 0.47 | 0.36 | 0.65 | 0.36 | 0.30 |
| Meta | CodeLlama 13B | 0.58 | 0.87 | 0.92 | 0.87 | 0.75 | 0.41 | 0.30 | 0.61 | 0.32 | 0.34 |
| InternLM | InternLM-Math Base 7B | 0.58 | 0.71 | 0.73 | 0.73 | 0.72 | 0.54 | 0.38 | 0.61 | 0.37 | 0.39 |
| Meta | CodeLlama 7B | 0.52 | 0.85 | 0.92 | 0.84 | 0.69 | 0.37 | 0.25 | 0.57 | 0.25 | 0.16 |
| Google | Gemma 2B | 0.51 | 0.66 | 0.76 | 0.74 | 0.67 | 0.42 | 0.28 | 0.55 | 0.30 | 0.27 |
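For readers who prefer to analyze the scores rather than browse them, here is a minimal sketch of how the table could be explored offline. It assumes the table above has been exported to a CSV file; the file name mathcamps_leaderboard.csv and its column names are illustrative assumptions, not something the site itself provides.

```python
# Minimal sketch: rank models by overall accuracy and compare grade-wise means.
# Assumption: the leaderboard above was saved as "mathcamps_leaderboard.csv"
# with columns Vendor, Model, All, K, 1, 2, ..., 8 (not provided by the site).
import pandas as pd

df = pd.read_csv("mathcamps_leaderboard.csv")

grade_cols = ["K", "1", "2", "3", "4", "5", "6", "7", "8"]

# Models ranked by overall accuracy (the "All" column).
print(df.sort_values("All", ascending=False)[["Vendor", "Model", "All"]])

# Mean accuracy per grade across all models: scores are high for K-3
# and drop noticeably at grades 7 and 8, as the table above shows.
print(df[grade_cols].mean().round(2))
```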