Reflection AI: Advanced 70B & 405B LLMs

About Reflection 70B

Reflection 70B is currently the world's top open-source LLM. It is trained with Reflection-Tuning, a technique that enables the model to detect errors in its own reasoning and correct them, which improves its performance and reliability.

In the benchmarks listed under Performance Comparison, Reflection 70B posts the highest score on MMLU, MATH, GSM8K, and IFEval, and trails only Claude 3.5 Sonnet on GPQA and HumanEval. All of its results use 0-shot Reflection prompting.

Coming Soon: Reflection 405B

Our upcoming Reflection 405B model is expected to be the world's best-performing LLM, ahead of both open-source and closed-source models. Stay tuned for this breakthrough AI technology!

Advantages of ReflectionAI

Why Choose ReflectionAI?

Advanced Reflection-Tuning technology
Top-tier open-source LLM performance
Continuously improving reasoning capabilities
Wide range of application potential

Frequently Asked Questions

Performance Comparison

Benchmark test	Reflection 70B	Claude 3.5 Sonnet	Claude 3 Opus	GPT-4o	Gemini 1.5 Pro	Llama 3.1 405B
GPQA	55.3% (0-shot Reflection)	59.4%* (0-shot CoT)	50.4% (0-shot CoT)	53.6% (0-shot CoT)	-	50.7% (0-shot)
MMLU	89.9% (0-shot Reflection)	88.7%** (5-shot); 88.3% (0-shot CoT)	85.7% (0-shot CoT)	88.7% (5-shot); 85.9% (0-shot CoT)	87.3% (5-shot); 88.6% (0-shot CoT)	-
HumanEval	91% (0-shot Reflection)	92.0% (0-shot)	84.9% (0-shot)	90.2% (0-shot)	84.1%	89.0% (0-shot)
MATH	79.7% (0-shot Reflection)	71.1% (0-shot CoT)	60.1% (0-shot CoT)	76.6% (4-shot)	67.7%	73.8% (0-shot CoT)
GSM8K	99.2% (0-shot Reflection)	96.4% (0-shot CoT)	95.0% (0-shot CoT)	-	90.8%	96.8% (8-shot CoT)
IFEval	90.13% (0-shot Reflection)	-	-	85.6%	-	88.6%

Note: CoT stands for chain-of-thought prompting. The text in parentheses gives the prompting method or the number of shots used for that score. A dash (-) means no score is listed for that model.
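
To make the comparison easier to check, here is a minimal Python sketch that finds the leading model on each benchmark. The scores are transcribed from the table above. The names `SCORES`, `best_per_benchmark`, and `count_wins` are illustrative and not part of any ReflectionAI tooling. Where a cell lists two settings, the sketch assumes the higher one is used. Because the prompting methods differ between models, treat the ranking as indicative rather than a like-for-like comparison.

```python
# Scores (percent) transcribed from the Performance Comparison table.
# Cells with two settings keep both values; cells marked "-" are omitted.
SCORES = {
    "GPQA": {
        "Reflection 70B": [55.3],
        "Claude 3.5 Sonnet": [59.4],
        "Claude 3 Opus": [50.4],
        "GPT-4o": [53.6],
        "Llama 3.1 405B": [50.7],
    },
    "MMLU": {
        "Reflection 70B": [89.9],
        "Claude 3.5 Sonnet": [88.7, 88.3],
        "Claude 3 Opus": [85.7],
        "GPT-4o": [88.7, 85.9],
        "Gemini 1.5 Pro": [87.3, 88.6],
    },
    "HumanEval": {
        "Reflection 70B": [91.0],
        "Claude 3.5 Sonnet": [92.0],
        "Claude 3 Opus": [84.9],
        "GPT-4o": [90.2],
        "Gemini 1.5 Pro": [84.1],
        "Llama 3.1 405B": [89.0],
    },
    "MATH": {
        "Reflection 70B": [79.7],
        "Claude 3.5 Sonnet": [71.1],
        "Claude 3 Opus": [60.1],
        "GPT-4o": [76.6],
        "Gemini 1.5 Pro": [67.7],
        "Llama 3.1 405B": [73.8],
    },
    "GSM8K": {
        "Reflection 70B": [99.2],
        "Claude 3.5 Sonnet": [96.4],
        "Claude 3 Opus": [95.0],
        "Gemini 1.5 Pro": [90.8],
        "Llama 3.1 405B": [96.8],
    },
    "IFEval": {
        "Reflection 70B": [90.13],
        "GPT-4o": [85.6],
        "Llama 3.1 405B": [88.6],
    },
}


def best_per_benchmark(scores):
    """Return {benchmark: (leading_model, score)}.

    Assumption: each model is represented by its highest listed score.
    """
    leaders = {}
    for benchmark, by_model in scores.items():
        best = {model: max(values) for model, values in by_model.items()}
        top_model = max(best, key=best.get)
        leaders[benchmark] = (top_model, best[top_model])
    return leaders


def count_wins(scores, model):
    """Count the benchmarks on which `model` has the highest score."""
    leaders = best_per_benchmark(scores)
    return sum(1 for top_model, _ in leaders.values() if top_model == model)


if __name__ == "__main__":
    for benchmark, (model, score) in best_per_benchmark(SCORES).items():
        print(f"{benchmark}: {model} ({score}%)")
    print("Reflection 70B leads on", count_wins(SCORES, "Reflection 70B"), "benchmarks")
```

Run as a script, this prints one leader per benchmark and then the number of benchmarks on which Reflection 70B has the top score.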