gemma-3-1b-thinking-preview • 1.2B Parameters • 128k Context
The Gemma 3 1B Thinking model introduces chain-of-thought capabilities to the edge-device class. Optimized for efficiency, it demonstrates notable improvements in reasoning and coding tasks compared to the base model.
Performance summary: roughly a +15% relative gain on competition math (AIME 2025), and variable +6-10% relative gains on general reasoning and coding tasks.
Base vs. Thinking comparison. Scores are benchmark accuracies; the Boost column is the relative gain of the Thinking score over the Base score (6-15% across tasks).
| Benchmark | Category | Base Score (1B) | Thinking Score (1B) | Relative Boost |
|---|---|---|---|---|
| Terminal-Bench Hard | Agentic Coding | 5.0% | 5.4% | +8% |
| 𝜏²-Bench Telecom | Agentic Tool Use | 5.0% | 5.35% | +7% |
| AA-LCR | Long Context Reasoning | 10.0% | 10.9% | +9% |
| Humanity's Last Exam | Reasoning & Knowledge | 5.2% | 5.6% | +8% |
| MMLU-Pro | Reasoning & Knowledge | 14.0% | 15.3% | +9.3% |
| GPQA Diamond | Scientific Reasoning | 24.0% | 25.9% | +8% |
| LiveCodeBench | Coding | 2.0% | 2.16% | +8% |
| SciCode | Scientific Coding | 1.0% | 1.06% | +6% |
| IFBench | Instruction Following | 20.0% | 21.6% | +8% |
| AIME 2025 | Competition Math | 3.0% | 3.45% | +15% |
| CritPt | Physics Reasoning | 0.5% | 0.54% | +8% |
| MMMU Pro | Visual Reasoning | 0.0% | 0.0% | N/A |