On December 11, 2025, OpenAI released GPT-5.2, its most capable model series yet for professional knowledge work. This isn't just another incremental update. By the headline numbers, it is the first model to match or beat human experts across a wide range of real-world professional tasks: GPT-5.2 Thinking beats or ties industry professionals on 70.9% of GDPval tasks spanning 44 occupations.
The release came in response to Google's rapid Gemini 3 advances, with reports of a "Code Red" memo at OpenAI pushing the team to accelerate development. The result is a model that sets new benchmarks in coding, math, long-context reasoning, and agentic tool use.
What's New in GPT-5.2
GPT-5.2 arrives in three variants, each optimized for different use cases:
- GPT-5.2 Instant: Fast, capable workhorse for everyday work—writing, translation, and quick tasks
- GPT-5.2 Thinking: Deeper reasoning for complex tasks—coding, summarizing long documents, math, and multi-step planning
- GPT-5.2 Pro: Maximum capability for difficult questions where accuracy matters most
Key Specifications
- Context Window: 400,000 tokens (272K prompt, 128K completion)
- Modalities: Text, images, files (input); text (output)
- Knowledge Cutoff: August 31, 2025
- Adaptive Reasoning: Dynamically allocates compute based on task complexity
The adaptive reasoning capability is particularly notable: GPT-5.2 responds quickly to simple queries and spends more compute on complex tasks, so it can be fast or thorough depending on what you need.
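The context window split in the specifications above (272K prompt plus 128K completion) can be turned into a simple budget check. This is a minimal sketch: `checkBudget` and `BudgetCheck` are hypothetical names, not part of any OpenAI SDK, and real token counts would have to come from a tokenizer.

```typescript
// Token limits taken from the specification list above.
const CONTEXT_WINDOW = 400000;
const MAX_PROMPT_TOKENS = 272000;
const MAX_COMPLETION_TOKENS = 128000;

interface BudgetCheck {
  fits: boolean;
  reason: string;
}

// Hypothetical helper: compares a planned request against the published limits.
// It only does the arithmetic; it does not count tokens itself.
function checkBudget(promptTokens: number, completionTokens: number): BudgetCheck {
  if (promptTokens > MAX_PROMPT_TOKENS) {
    return { fits: false, reason: "prompt exceeds the 272K limit" };
  }
  if (completionTokens > MAX_COMPLETION_TOKENS) {
    return { fits: false, reason: "completion exceeds the 128K limit" };
  }
  return { fits: true, reason: "within limits" };
}

// A 250K-token prompt with a 100K-token completion budget fits.
console.log(checkBudget(250000, 100000));
// A 300K-token prompt does not, even with a tiny completion.
console.log(checkBudget(300000, 1000));
```

Note that the two limits add up to the full 400,000-token window, so a request that respects both also respects the total.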
Benchmark Performance
GPT-5.2 sets new state-of-the-art results across multiple categories. Here's how it compares to other current frontier models:
| Benchmark | GPT-5.2 Thinking | GPT-5.2 Pro | Gemini 3 Pro | Claude Opus 4.5 |
| --- | --- | --- | --- | --- |
| GDPval (Professional Tasks) | 70.9% | 74.1% | — | — |
| SWE-Bench Pro (Coding) | 55.6% | — | — | — |
| SWE-Bench Verified | 80.0% | — | 78% | — |
| GPQA Diamond (Science) | 92.4% | 93.2% | 91.9% | 87% |
| AIME 2025 (Math) | 100% | 100% | 95% | 100% |
| ARC-AGI-2 (Reasoning) | 52.9% | 54.2% | 31.1% | 37.6% |
Data sources: OpenAI Official Announcement, Artificial Analysis
Intelligence Index Comparison
According to Artificial Analysis, GPT-5.2 leads the intelligence index rankings.
Token Efficiency
GPT-5.2 shows significant improvements in token efficiency across reasoning effort levels.
Long Context Reasoning
GPT-5.2 achieves near-perfect scores on long context reasoning benchmarks.
Professional Work: First to Beat Human Experts
The headline achievement is GDPval performance. This benchmark measures real-world professional tasks across 44 occupations—creating presentations, spreadsheets, and other artifacts that knowledge workers produce daily. GPT-5.2 Thinking beats or ties industry professionals on 70.9% of tasks.
According to OpenAI, GPT-5.2 produced outputs at more than 11x the speed and less than 1% of the cost of expert professionals.
Coding: State-of-the-Art
GPT-5.2 achieves 55.6% on SWE-Bench Pro, a rigorous evaluation of real-world software engineering across four programming languages. On SWE-Bench Verified (Python-only), it reaches 80%, ahead of Gemini 3 Pro's 78% in the table above.
Early testers from Windsurf, JetBrains, and Augment Code report significantly stronger performance in interactive coding, code reviews, and bug finding.
Long Context: Near-Perfect Recall
GPT-5.2 Thinking achieves near 100% accuracy on the 4-needle MRCR variant out to 256k tokens—meaning it can find and connect information spread across hundreds of thousands of tokens of text. This makes it exceptional for document analysis, contract review, and multi-file projects.
Writingmate Evaluation
We ran GPT-5.2 through our standard evaluation suite. Here's what we found:
| Test | Result | Notes |
| --- | --- | --- |
| Creative Writing | Excellent | Model showed sophisticated reasoning about structure and word count |
| Code Generation | Excellent | Comprehensive TypeScript with generics, JSDoc, cancel/flush methods |
| Reasoning | Correct | Got the sheep puzzle right (9) with clear explanation |
| Instruction Following | Excellent | Exactly 5 items, all under 10 words, no intro/conclusion |
| Summarization | Excellent | Exactly 2 sentences, captured all key concepts |
Reasoning: Clear and Methodical
The classic "sheep puzzle" (a farmer has 17 sheep and all but 9 die; how many are left?) is a common test of logical reasoning. GPT-5.2 handled it perfectly:
The key phrase is "all but 9 die." Step by step:
1. The farmer starts with 17 sheep.
2. "All but 9 die" means all except 9 die.
3. So 9 sheep survive.
Answer: The farmer has 9 sheep left.
Code Generation: Production-Ready
We asked for a debounced async function with TypeScript types. GPT-5.2 delivered a sophisticated implementation including:
- Generic typing with Parameters<F> and ReturnType<F>
- Promise synchronization for multiple callers
- cancel() and flush() methods
- pending() status check
- Complete JSDoc documentation
The model's reasoning showed it was actively thinking about edge cases like "what happens when multiple callers invoke during the debounce window."
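To make the pattern concrete, here is a minimal sketch of a debounced async function. It is not the code GPT-5.2 produced: `debounceAsync`, `DebouncedFn`, `Waiter`, and `waitMs` are hypothetical names chosen for illustration, and flush() and the JSDoc comments are left out for brevity. It does show the edge case mentioned above, where every caller that invokes the function during one debounce window shares a single underlying call.

```typescript
// Any function that returns a promise.
type AsyncFn = (...args: any[]) => Promise<unknown>;

interface DebouncedFn<F extends AsyncFn> {
  (...args: Parameters<F>): Promise<Awaited<ReturnType<F>>>;
  cancel(): void; // drops the scheduled call and rejects its waiters
  pending(): boolean; // true while a call is scheduled but not yet started
}

interface Waiter<T> {
  resolve: (value: T) => void;
  reject: (reason: unknown) => void;
}

function debounceAsync<F extends AsyncFn>(fn: F, waitMs: number): DebouncedFn<F> {
  type Result = Awaited<ReturnType<F>>;
  let timer: ReturnType<typeof setTimeout> | undefined;
  let waiters: Waiter<Result>[] = [];

  const debounced = (...args: Parameters<F>): Promise<Result> => {
    // Each new call restarts the window; only the latest arguments are used.
    if (timer !== undefined) clearTimeout(timer);
    const promise = new Promise<Result>((resolve, reject) => {
      waiters.push({ resolve, reject });
    });
    timer = setTimeout(() => {
      timer = undefined;
      const settled = waiters;
      waiters = [];
      // Every caller from this window receives the same result or error.
      Promise.resolve()
        .then(() => fn(...args))
        .then(
          (value) => settled.forEach((w) => w.resolve(value as Result)),
          (error: unknown) => settled.forEach((w) => w.reject(error)),
        );
    }, waitMs);
    return promise;
  };

  const cancel = (): void => {
    if (timer === undefined) return;
    clearTimeout(timer);
    timer = undefined;
    const dropped = waiters;
    waiters = [];
    dropped.forEach((w) => w.reject(new Error("debounced call cancelled")));
  };

  const pending = (): boolean => timer !== undefined;

  return Object.assign(debounced, { cancel, pending });
}
```

Rejecting on cancel() is one possible design choice; a variant that leaves cancelled promises unsettled would avoid forcing callers to handle the rejection, at the cost of promises that never complete.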
Instruction Following: Precise
When asked for exactly 5 benefits of remote work, each under 10 words, with no intro or conclusion, GPT-5.2 delivered exactly that:
- Eliminates commuting time and transportation costs.
- Enables flexible schedules for personal productivity peaks.
- Expands hiring access beyond geographic limitations.
- Reduces office overhead costs for employers.
- Improves work-life balance through home-based routines.
Every item followed the constraints perfectly.
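A check like this is easy to automate. The sketch below is an assumption about how such a check could be scripted, not a description of our evaluation harness; `countWords`, `checkList`, and `ConstraintReport` are hypothetical names. It counts whitespace-separated words, so hyphenated terms such as "work-life" count as one word.

```typescript
// Counts whitespace-separated words in one list item.
function countWords(text: string): number {
  return text.trim().split(/\s+/).filter((w) => w.length > 0).length;
}

interface ConstraintReport {
  itemCountOk: boolean;
  wordLimitOk: boolean;
  wordCounts: number[];
}

// Hypothetical checker: exactly `expectedItems` items, each strictly under `maxWords` words.
function checkList(items: string[], expectedItems: number, maxWords: number): ConstraintReport {
  const wordCounts = items.map((item) => countWords(item));
  return {
    itemCountOk: items.length === expectedItems,
    wordLimitOk: wordCounts.every((n) => n < maxWords),
    wordCounts,
  };
}

// The five items from the response above.
const remoteWorkItems = [
  "Eliminates commuting time and transportation costs.",
  "Enables flexible schedules for personal productivity peaks.",
  "Expands hiring access beyond geographic limitations.",
  "Reduces office overhead costs for employers.",
  "Improves work-life balance through home-based routines.",
];

const remoteWorkReport = checkList(remoteWorkItems, 5, 10);
console.log(remoteWorkReport.wordCounts); // [6, 7, 6, 6, 6]
console.log(remoteWorkReport.itemCountOk && remoteWorkReport.wordLimitOk); // true
```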
Pricing
GPT-5.2 is priced higher than GPT-5.1, reflecting its greater capabilities:
| Model | Input | Cached Input | Output |
| --- | --- | --- | --- |
| GPT-5.2 / GPT-5.2-chat | $1.75/M | $0.175/M | $14/M |
| GPT-5.2 Pro | $21/M | — | $168/M |
| GPT-5.1 (previous) | $1.25/M | $0.125/M | $10/M |
While the per-token cost is 40% higher than GPT-5.1, OpenAI notes that GPT-5.2's greater token efficiency often results in lower total cost per task—you get better results with fewer tokens.
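The trade-off between a higher per-token price and fewer tokens can be worked through with the prices from the table above. The token counts in this sketch are hypothetical, picked only to illustrate the arithmetic, and `taskCost` and `Pricing` are illustrative names rather than part of any billing API.

```typescript
// Prices in USD per million tokens, taken from the pricing table above.
interface Pricing {
  inputPerM: number;
  outputPerM: number;
}

const GPT_5_2_PRICING: Pricing = { inputPerM: 1.75, outputPerM: 14 };
const GPT_5_1_PRICING: Pricing = { inputPerM: 1.25, outputPerM: 10 };

// Cost in USD of one task, ignoring cached-input discounts.
function taskCost(pricing: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * pricing.inputPerM + outputTokens * pricing.outputPerM) / 1000000;
}

// Hypothetical task: same 10,000-token prompt, but GPT-5.2 needs
// 3,000 output tokens where GPT-5.1 needs 5,000.
const costOnGpt51 = taskCost(GPT_5_1_PRICING, 10000, 5000);
const costOnGpt52 = taskCost(GPT_5_2_PRICING, 10000, 3000);

// Output-token ratio below which GPT-5.2 output alone costs less than GPT-5.1 output.
const outputBreakEvenRatio = GPT_5_1_PRICING.outputPerM / GPT_5_2_PRICING.outputPerM;

console.log(costOnGpt51.toFixed(4)); // "0.0625"
console.log(costOnGpt52.toFixed(4)); // "0.0595"
console.log(outputBreakEvenRatio.toFixed(3)); // "0.714"
```

In this made-up case the task is slightly cheaper on GPT-5.2 despite the higher rates. On output tokens alone, GPT-5.2 comes out ahead whenever it uses fewer than roughly 71% of the tokens GPT-5.1 would.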
On Writingmate, GPT-5.2 is included in your subscription along with 200+ other AI models.
How to Try GPT-5.2 on Writingmate
GPT-5.2 is available now on Writingmate through our OpenRouter integration. To use it:
- Open Writingmate and start a new chat
- Select GPT-5.2 from the model dropdown
- Start prompting
You can also try GPT-5.2 Pro for the most demanding tasks, or GPT-5.2 Chat (Instant) for fast, lightweight work.
Best Use Cases for GPT-5.2
Based on the benchmarks and our testing, here's where GPT-5.2 excels:
- Professional knowledge work: Creating presentations, spreadsheets, reports, and business artifacts
- Complex coding: Multi-file refactoring, debugging production code, implementing features end-to-end
- Long document analysis: Contracts, research papers, transcripts—anything requiring connection across hundreds of pages
- Scientific and mathematical reasoning: With 100% on AIME 2025 and 92.4% on GPQA Diamond
- Agentic workflows: Multi-step tasks with tool calling and long-horizon planning
Safety Improvements
GPT-5.2 includes significant safety improvements, particularly in handling sensitive conversations:
- 30% fewer responses with factual errors compared to GPT-5.1
- Improved responses to mental health, self-harm, and emotional reliance queries
- Better handling of refusals—fewer over-refusals while maintaining appropriate boundaries
The Competitive Landscape
GPT-5.2's release came amid intense competition with Google's Gemini 3 series. According to TechCrunch, OpenAI CEO Sam Altman expects the company to exit "Code Red" by January following this launch.
The result is a win for users: frontier AI models are advancing rapidly, with both OpenAI and Google pushing the boundaries of what's possible in coding, reasoning, and professional applications.
Written by
Artem Vysotsky
Ex-Staff Engineer at Meta. Building the technical foundation to make AI accessible to everyone.
Reviewed by
Sergey Vysotsky
Ex-Chief Editor / PM at Mosaic. Passionate about making AI accessible and affordable for everyone.



