OpenAI o1-mini: Advancing Cost-Efficient Reasoning for STEM

Today, OpenAI is introducing o1-mini, a compact and affordable model specifically designed for tasks requiring strong reasoning abilities, particularly in STEM fields such as mathematics and programming. Despite its smaller size, o1-mini demonstrates performance nearly on par with the larger OpenAI o1 model in key benchmarks like AIME and Codeforces, while offering faster responses and lower costs for applications where broad world knowledge is less critical.

o1-mini Now Available for Tier 5 API Users

o1-mini is now accessible to Tier 5 API users, priced 80% lower than the OpenAI o1-preview model. Users across ChatGPT Plus, Team, Enterprise, and Edu can also adopt o1-mini as a faster, more cost-effective alternative to o1-preview, benefiting from enhanced rate limits and reduced latency (see “Model Speed” for further details).
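
For developers with API access, here is a minimal sketch of calling o1-mini through the official OpenAI Python SDK. The prompt is purely illustrative, and the setup details (an installed openai package, an OPENAI_API_KEY environment variable) are assumptions; note that at launch the o1-series models rejected system messages and used fixed sampling settings, so only a plain user message is sent here.

    # Minimal sketch: querying o1-mini via the OpenAI Python SDK.
    # Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY automatically

    response = client.chat.completions.create(
        model="o1-mini",
        # At launch, o1-series models accepted only user messages
        # (no system role) and fixed sampling parameters.
        messages=[
            {
                "role": "user",
                "content": "Explain why the product of two consecutive integers is always even.",
            }
        ],
    )

    print(response.choices[0].message.content)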

Figure: OpenAI o1-mini stats.

Designed for STEM Reasoning Tasks

Large models like o1 are known for their expansive world knowledge, but this can come at the expense of speed and cost in practical applications. By contrast, o1-mini is built with a focus on reasoning, particularly in STEM disciplines, offering similar performance to o1 on many complex reasoning tasks, but at a significantly lower operational cost. The model has been optimized using the same reinforcement learning pipeline as o1, delivering robust results in reasoning-heavy areas.

Though o1-mini excels in STEM, it is less effective at tasks requiring comprehensive world knowledge (see “Limitations”).

Competitive Performance in Math and Coding

In benchmark tests, o1-mini has proven itself as a competitive option for both math and coding tasks, closely rivaling the performance of o1 at a fraction of the cost.

  • Mathematics (AIME): On the AIME math competition, o1-mini scored 70%, just behind o1's 74.4% and well ahead of o1-preview's 44.6%. This level of performance places o1-mini roughly among the top 500 US high school students, demonstrating its strength in mathematical reasoning.
  • Coding (Codeforces): On the Codeforces platform, o1-mini achieved an Elo score of 1650, nearly matching o1's 1673 and surpassing o1-preview's 1258. This places the model in the 86th percentile of Codeforces programmers, reflecting its strong capabilities in programming challenges (a brief usage sketch follows this list).
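
To make the coding result concrete, below is a hedged sketch of posing a competition-style programming task to o1-mini over the same API. The problem statement is a generic illustration written for this example, not an actual Codeforces problem.

    # Illustrative sketch: a competition-style coding prompt for o1-mini.
    from openai import OpenAI

    client = OpenAI()

    # Hypothetical task, chosen only to illustrate this kind of request.
    problem = (
        "Given an array of n integers, compute the length of the longest "
        "strictly increasing subsequence. Provide an O(n log n) Python solution."
    )

    response = client.chat.completions.create(
        model="o1-mini",
        messages=[{"role": "user", "content": problem}],
    )

    print(response.choices[0].message.content)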

Figure: o1-mini results on MMLU, GPQA, and MATH-500.

Broader STEM Capabilities

Beyond these benchmarks, o1-mini also performs well in other academic reasoning tasks, such as GPQA (science) and MATH-500. While it does not surpass GPT-4o in general knowledge tasks like MMLU, it consistently delivers strong results in STEM-specific domains.

Human preference evaluations further validate o1-mini's strengths: raters favored it over GPT-4o in areas like mathematical calculation, programming, and data analysis, though it trails in more language-focused tasks.

For users seeking a fast, efficient, and cost-effective solution for STEM reasoning tasks, o1-mini offers a compelling alternative to larger models, delivering high performance where it counts.
