About Reflection 70B
Reflection 70B is positioned as the top-performing open-source LLM at the time of writing. It is trained with Reflection-Tuning, a technique that enables the model to detect errors in its own reasoning and correct them, improving its performance and reliability.
In benchmark tests, Reflection 70B posts the highest listed score on MMLU, MATH, GSM8K, and IFEval, and trails only Claude 3.5 Sonnet on GPQA and HumanEval. It achieves these results with 0-shot Reflection prompting, that is, without any worked examples in the prompt.
Coming Soon: Reflection 405B
Our upcoming Reflection 405B model is expected to be the world's best-performing LLM, ahead of even closed-source models. Stay tuned for this breakthrough in AI technology!
Advantages of ReflectionAI
Why Choose ReflectionAI?
- Advanced Reflection-Tuning technology
- Top-tier open-source LLM performance
- Continuously improving reasoning capabilities
- Wide range of application potential
Performance Comparison
Benchmark test | Reflection 70B | Claude 3.5 Sonnet | Claude 3 Opus | GPT-4o | Gemini 1.5 Pro | Llama 3.1 405B |
---|---|---|---|---|---|---|
GPQA | 55.3% (0-shot Reflection) | 59.4%* (0-shot CoT) | 50.4% (0-shot CoT) | 53.6% (0-shot CoT) | - | 50.7% (0-shot) |
MMLU | 89.9% (0-shot Reflection) | 88.7%** (5-shot); 88.3% (0-shot CoT) | 85.7% (0-shot CoT) | 88.7% (5-shot); 85.9% (0-shot CoT) | 87.3% (5-shot); 88.6% (0-shot CoT) | - |
HumanEval | 91% (0-shot Reflection) | 92.0% (0-shot) | 84.9% (0-shot) | 90.2% (0-shot) | 84.1% | 89.0% (0-shot) |
MATH | 79.7% (0-shot Reflection) | 71.1% (0-shot CoT) | 60.1% (0-shot CoT) | 76.6% (4-shot) | 67.7% | 73.8% (0-shot CoT) |
GSM8K | 99.2% (0-shot Reflection) | 96.4% (0-shot CoT) | 95.0% (0-shot CoT) | - | 90.8% | 96.8% (8-shot CoT) |
IFEval | 90.13% (0-shot Reflection) | - | - | 85.6% | - | 88.6% |
Note: CoT stands for Chain-of-Thought prompting. The text in parentheses indicates the prompting method or the number of shots (worked examples) used. A dash (-) means no score is listed for that model.
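As a rough reading aid, the table can be transcribed into a small Python structure to see which model posts the highest listed score on each benchmark and how far Reflection 70B sits from the best competing score. This is only a sketch: the names `MODELS`, `TABLE`, `leader`, and `margin` are illustrative and not part of any ReflectionAI tooling, and where a cell lists two results the higher one is assumed. Because the scores come from different prompting setups (0-shot, 5-shot, CoT, and so on), the comparison is indicative rather than strictly like-for-like.

```python
# Column order matches the comparison table above.
MODELS = [
    "Reflection 70B", "Claude 3.5 Sonnet", "Claude 3 Opus",
    "GPT-4o", "Gemini 1.5 Pro", "Llama 3.1 405B",
]

# Scores in percent, transcribed from the table. None = no score listed.
# Assumption: when a cell lists two results, the higher one is used.
TABLE = {
    "GPQA":      [55.3,  59.4, 50.4, 53.6, None, 50.7],
    "MMLU":      [89.9,  88.7, 85.7, 88.7, 88.6, None],
    "HumanEval": [91.0,  92.0, 84.9, 90.2, 84.1, 89.0],
    "MATH":      [79.7,  71.1, 60.1, 76.6, 67.7, 73.8],
    "GSM8K":     [99.2,  96.4, 95.0, None, 90.8, 96.8],
    "IFEval":    [90.13, None, None, 85.6, None, 88.6],
}


def leader(benchmark):
    """Return the model with the highest listed score on a benchmark."""
    scored = [(s, m) for m, s in zip(MODELS, TABLE[benchmark]) if s is not None]
    return max(scored)[1]


def margin(benchmark, model="Reflection 70B"):
    """Percentage-point gap between `model` and the best other listed score."""
    row = dict(zip(MODELS, TABLE[benchmark]))
    best_other = max(s for m, s in row.items() if m != model and s is not None)
    return round(row[model] - best_other, 2)


for name in TABLE:
    print(f"{name:10s} leader: {leader(name):18s} Reflection 70B margin: {margin(name):+.2f}")
```

A positive margin means Reflection 70B has the highest listed score on that benchmark; a negative margin shows how many percentage points it trails the leader.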