GPT-5.3-Codex: OpenAI's Most Capable Agentic Coding Model

Updated at Feb 6, 2026


Introduction

On February 5, 2026, OpenAI announced GPT-5.3-Codex, its most advanced agentic coding model to date. The release is notable not only for its technical capabilities but also because OpenAI describes it as its first model that was instrumental in creating itself.
GPT-5.3-Codex represents a fundamental shift from a code-writing tool to an interactive AI collaborator capable of handling long-horizon, real-world technical work across the full spectrum of professional computing tasks.

What Makes GPT-5.3-Codex Different?

A True Agentic Model

Unlike traditional coding assistants that simply generate code snippets, GPT-5.3-Codex is designed as an "agentic" model. This means it can:
  • Maintain context over long-running tasks that span hours or even days
  • Use tools autonomously, including command-line interfaces, file systems, and development environments
  • Adapt and iterate based on real-time feedback without losing its place
  • Handle complex multi-step workflows that require research, planning, and execution
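The loop behind these bullets can be sketched roughly as follows. This is a toy illustration, not OpenAI's implementation: `FILES`, `TOOLS`, `scripted_policy`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stand-in that picks the next step from the transcript so far.

```python
from typing import Callable

# Hypothetical in-memory "file system" standing in for a real workspace.
FILES = {"app.py": "print('helo')"}

def read_file(path: str) -> str:
    return FILES.get(path, f"error: {path} not found")

def write_file(path: str, content: str) -> str:
    FILES[path] = content
    return f"wrote {len(content)} chars to {path}"

# Tool registry: name -> callable the agent is allowed to invoke.
TOOLS: dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "write_file": write_file,
}

def scripted_policy(history: list[dict]) -> dict:
    """Stand-in for a model call: chooses the next action from the history."""
    if not history:
        return {"tool": "read_file", "args": {"path": "app.py"}}
    last = history[-1]
    if last["tool"] == "read_file" and "helo" in last["result"]:
        fixed = last["result"].replace("helo", "hello")
        return {"tool": "write_file", "args": {"path": "app.py", "content": fixed}}
    return {"tool": "done", "args": {}}

def run_agent(policy, max_steps: int = 10) -> list[dict]:
    """Plan, act, observe. The history is the context carried across steps."""
    history: list[dict] = []
    for _ in range(max_steps):
        action = policy(history)
        if action["tool"] == "done":
            break
        result = TOOLS[action["tool"]](**action["args"])
        history.append({**action, "result": result})
    return history

history = run_agent(scripted_policy)
```

In a real agent the policy would be a model call and the tools would touch an actual shell and file system. The shape of the loop, in which each tool result is fed back as context for the next decision, is what allows the agent to iterate without losing its place.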

Self-Building Achievement

Perhaps the most remarkable aspect of GPT-5.3-Codex is that the Codex team used early versions of the model to:
  • Debug its own training process
  • Manage its own deployment
  • Diagnose test results and evaluations
  • Optimize infrastructure for the final release
This self-referential development cycle shows how AI is beginning to accelerate its own improvement. OpenAI says its team was "blown away by how much Codex was able to accelerate its own development."

Performance Improvements

GPT-5.3-Codex is 25% faster than its predecessor (GPT-5.2-Codex), thanks to improvements in OpenAI's infrastructure and inference stack. This speed improvement enables more responsive real-time collaboration and faster iteration cycles.

Benchmark Performance: The Data

GPT-5.3-Codex achieves state-of-the-art performance across several key benchmarks that measure coding, agentic capabilities, and real-world computer use.

SWE-Bench Pro

SWE-Bench Pro is a rigorous evaluation of real-world software engineering that spans four programming languages (Python, JavaScript, TypeScript, and Go). Unlike SWE-Bench Verified, which only tests Python, SWE-Bench Pro is designed to be more contamination-resistant and industry-relevant.

Terminal-Bench 2.0

The 13.3% improvement on Terminal-Bench 2.0 is particularly significant. This benchmark measures the terminal skills that a coding agent needs—navigating file systems, executing commands, and managing development workflows. Notably, GPT-5.3-Codex achieves this with fewer tokens than any prior model, making it more efficient.

OSWorld-Verified

The 26.5% jump on OSWorld-Verified demonstrates dramatically improved computer-use capabilities. OSWorld is an agentic computer-use benchmark where agents must complete productivity tasks in a visual desktop environment. This massive improvement shows that GPT-5.3-Codex is far better at navigating real-world interfaces than previous models.

Beyond Code: A General-Purpose Agent

While GPT-5.3-Codex excels at programming, its capabilities extend far beyond code generation. OpenAI positions it as an agent that can handle "nearly anything developers and professionals can do on a computer."

Software Lifecycle Support

The model is built to support the entire software development lifecycle:
  • Debugging - Identifying and fixing bugs
  • Deploying - Managing releases and infrastructure
  • Monitoring - Tracking performance and metrics
  • Writing PRDs - Product requirement documents
  • Editing copy - Documentation and marketing text
  • User research - Analyzing user feedback
  • Testing - Writing and running test suites
  • Metrics analysis - Data-driven decision making

Knowledge Work Capabilities

On GDPval (OpenAI's 2025 evaluation measuring performance on knowledge-work tasks across 44 occupations), GPT-5.3-Codex matches GPT-5.2's performance. This includes tasks like:
  • Creating slide decks and presentations
  • Analyzing data in spreadsheets
  • Document management and organization
  • Research and synthesis

Web Development Example

To demonstrate the model's capabilities, OpenAI asked GPT-5.3-Codex to build two complete games from scratch:
  • A racing game (version 2 of the Codex app launch game)
  • A diving game
Using only a "develop web game" skill and generic follow-up prompts like "fix the bug" or "improve the game," GPT-5.3-Codex iterated autonomously over millions of tokens, building highly functional, polished games.

Better Intent Understanding

Compared to GPT-5.2-Codex, the new model better understands user intent when building websites. Simple or underspecified prompts now default to sites with:
  • More functionality
  • Sensible defaults
  • Production-ready features
For example, when asked to build a pricing landing page, GPT-5.3-Codex automatically displayed the yearly plan as a discounted monthly price (making the discount clear) and created an automatically transitioning testimonial carousel with three distinct user quotes—resulting in a more complete and polished design.
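The pricing display described above comes down to simple arithmetic. A minimal sketch, assuming hypothetical prices ($20 per month, $192 per year) that do not come from OpenAI's example; `yearly_as_monthly` is an illustrative helper name:

```python
from decimal import Decimal, ROUND_HALF_UP

def yearly_as_monthly(monthly_price: Decimal, yearly_price: Decimal) -> tuple[Decimal, int]:
    """Return (effective monthly price of the yearly plan, discount % vs. monthly billing)."""
    cents = Decimal("0.01")
    effective = (yearly_price / 12).quantize(cents, rounding=ROUND_HALF_UP)
    discount = (1 - yearly_price / (monthly_price * 12)) * 100
    return effective, int(discount.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# Hypothetical plan prices, chosen only for illustration.
effective, discount = yearly_as_monthly(Decimal("20"), Decimal("192"))
label = f"${effective}/mo billed yearly (save {discount}%)"
```

With these assumed prices the label reads "$16.00/mo billed yearly (save 20%)". Using `Decimal` rather than floats avoids rounding surprises when formatting currency.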

Interactive Collaboration

One of the most significant user experience improvements is the ability to steer the model while it works.

Real-Time Interaction

Instead of waiting for a final output, users can now:
  • Ask questions during execution
  • Discuss different approaches
  • Steer toward specific solutions
  • Provide feedback mid-task
GPT-5.3-Codex talks through what it's doing, responds to feedback, and keeps users in the loop from start to finish. This can be enabled in the Codex app via Settings > General > Follow-up behavior.
This transforms the experience from giving commands to a machine to collaborating with a teammate—a fundamental shift in how humans interact with AI systems.

Cybersecurity Capabilities and Safety

GPT-5.3-Codex is the first model OpenAI classifies as "High capability" for cybersecurity-related tasks under its Preparedness Framework. It's also the first model directly trained to identify software vulnerabilities.

Dual-Use Nature

Because cybersecurity is inherently dual-use (useful for both defense and offense), OpenAI is taking a precautionary approach. The company says it has no definitive evidence that the model can automate cyber attacks end to end, but it is nonetheless:
  • Deploying a comprehensive cybersecurity safety stack
  • Implementing safety training and automated monitoring
  • Requiring trusted access for advanced capabilities

Trusted Access for Cyber

OpenAI is launching Trusted Access for Cyber, a pilot program to:
  • Accelerate cyber defense research
  • Get tools to defenders first
  • Support ecosystem resilience

$10M Commitment

Building on a $1M Cybersecurity Grant Program from 2023, OpenAI is committing $10 million in API credits to accelerate cyber defense, especially for:
  • Open source software
  • Critical infrastructure systems
  • Good-faith security research

Aardvark Security Agent

OpenAI is expanding the private beta of Aardvark, its security research agent, as the first offering in its suite of Codex Security products and tools. The company is also partnering with open-source maintainers to provide free codebase scanning for widely used projects like Next.js.

How OpenAI Used Codex to Build Codex

The development of GPT-5.3-Codex provides a fascinating case study in AI-accelerated research.

Research Team Use Cases

The research team used early versions of GPT-5.3-Codex to:
  • Monitor and debug the training run for the release
  • Track patterns throughout the course of training
  • Provide deep analysis on interaction quality
  • Propose fixes and build rich applications for human researchers
  • Precisely understand how the model's behavior differed from prior models

Engineering Team Use Cases

The engineering team used Codex to:
  • Optimize and adapt the harness for GPT-5.3-Codex
  • Identify context rendering bugs impacting users
  • Root-cause low cache hit rates
  • Dynamically scale GPU clusters to adjust to traffic surges
  • Keep latency stable during launch

Data Science Use Cases

During alpha testing, a data scientist worked with GPT-5.3-Codex to:
  • Build regex classifiers to estimate frequency of clarifications, user responses, and task progress
  • Run these classifiers scalably over all session logs
  • Build new data pipelines and visualize results more richly than standard dashboarding tools
  • Co-analyze results, with Codex summarizing key insights over thousands of data points in under three minutes
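A regex classifier of the kind described here might look something like the sketch below. The patterns, the labels (including the split of user responses into positive and negative), and the session-log format are all assumptions made for illustration; the classifiers OpenAI actually used are not public.

```python
import re
from collections import Counter

# Hypothetical patterns, chosen only to illustrate the technique.
CLASSIFIERS = {
    "clarification": re.compile(r"\b(could you clarify|do you mean|which (one|file|option))\b", re.I),
    "positive_response": re.compile(r"\b(thanks|great|perfect|looks good)\b", re.I),
    "negative_response": re.compile(r"\b(that's wrong|not what i (asked|meant)|still broken)\b", re.I),
    "task_progress": re.compile(r"\b(tests? pass(ed|ing)?|fixed|implemented|done)\b", re.I),
}

def classify(message: str) -> set[str]:
    """Return every label whose pattern matches the message."""
    return {label for label, pattern in CLASSIFIERS.items() if pattern.search(message)}

def label_frequencies(sessions: list[list[str]]) -> dict[str, float]:
    """Fraction of messages, across all sessions, that match each label."""
    counts = Counter()
    total = 0
    for session in sessions:
        for message in session:
            total += 1
            counts.update(classify(message))
    return {label: counts[label] / total if total else 0.0 for label in CLASSIFIERS}

# Two made-up sessions, each a list of message strings.
sessions = [
    ["Do you mean the staging config?", "Yes, that one.", "Fixed, all tests pass."],
    ["That's wrong, it is still broken.", "Implemented the retry logic.", "Looks good, thanks!"],
]
freqs = label_frequencies(sessions)
```

Regex classifiers are crude, but they are cheap enough to run over every session log, which makes them useful for estimating trends even when individual labels are noisy.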

Productivity Gains

The result? People building with Codex were happier as the agent:
  • Better understood their intent
  • Made more progress per turn
  • Asked fewer clarifying questions

Availability and Pricing

How to Access

GPT-5.3-Codex is available immediately for paid ChatGPT users across all Codex surfaces:
  • Desktop app (macOS and Windows)
  • Command-line interface (CLI)
  • IDE extensions (VS Code, JetBrains, etc.)
  • Web interface

Subscription Plans

For a limited time, paid plans will receive double the normal rate limits.

API Pricing

As of launch, OpenAI has not released official API pricing for GPT-5.3-Codex. API access is described as "rolling out soon" and "coming in the following weeks."
Until then, the published API pricing for the previous model (GPT-5.2-Codex) is the closest reference point.

Infrastructure

GPT-5.3-Codex was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems—a testament to the close collaboration between OpenAI and NVIDIA in pushing the boundaries of AI capability.

Comparison with Competitors

The release of GPT-5.3-Codex came just minutes after Anthropic's announcement of Claude Opus 4.6, setting up an immediate comparison between the two models.

GPT-5.3-Codex Strengths

  • Terminal-Bench 2.0: 77.3 vs Opus 4.6's 65.4 (an 11.9-point lead, about 18% higher in relative terms)
  • 25% faster performance
  • "High reliability, low variance" design philosophy
  • Self-building capability (helped create itself)
  • First "High capability" cybersecurity classification
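The Terminal-Bench 2.0 gap in the list above can be expressed either in percentage points or as a relative difference, and the two are easy to confuse. A quick check of the arithmetic, using only the two scores quoted above (`score_gap` is an illustrative helper name):

```python
def score_gap(a: float, b: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative gap as a percent of b)."""
    points = a - b
    relative = points / b * 100
    return round(points, 1), round(relative, 1)

points, relative = score_gap(77.3, 65.4)
print(f"{points} points, {relative}% relative")
```

For these scores the gap works out to 11.9 percentage points, or about 18.2% in relative terms.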

Claude Opus 4.6 Strengths

  • 1 million token context window (significantly larger)
  • Agent Teams collaborative functionality
  • Broader versatility across knowledge work scenarios
  • Higher creativity temperature (more personality)

The Bigger Picture

GPT-5.3-Codex represents more than just an incremental upgrade—it's a step change toward general-purpose agents that can reason, build, and execute across the full spectrum of real-world technical work.

From Code Agent to Computer Agent

OpenAI explicitly frames this evolution: "Codex is moving beyond writing code to using it as a tool to operate a computer and complete work end to end."
This is a profound shift. What started as a focus on being "the best coding agent" has become the foundation for a more general collaborator on the computer—expanding both who can build and what's possible with AI.

Accelerating AI Development

The fact that GPT-5.3-Codex helped build itself is a preview of what's to come. As OpenAI researchers note, "many researchers and engineers at OpenAI describe their job today as being fundamentally different from what it was just two months ago."
This suggests we're entering a period of accelerating returns in AI development, where each generation of models helps build the next—potentially compressing timelines from years to months.

Implications for Developers

For software developers, the implications are significant:
  • Faster development cycles - AI handles more of the routine work
  • Higher-level abstraction - Developers can focus on architecture and design
  • Interactive collaboration - Less like using a tool, more like working with a teammate
  • New capabilities - Tasks that previously required specialized knowledge are now accessible

Implications for Businesses

For businesses, GPT-5.3-Codex represents:
  • Increased productivity - More work gets done in less time
  • Lower barriers - Fewer specialized skills needed for certain tasks
  • New security considerations - "High capability" cybersecurity classification requires careful governance
  • Competitive advantage - Early adoption of powerful agentic AI

Conclusion

GPT-5.3-Codex is a landmark achievement in artificial intelligence. It combines:
  • State-of-the-art coding performance
  • Advanced agentic capabilities
  • Interactive collaboration
  • Self-improvement (it helped build itself)
  • Real-world computer use
The fact that it was instrumental in its own creation serves as both a technical achievement and a metaphor for where AI is headed. As models become more capable, they're not just tools we use—they're becoming partners in the creative and development process itself.
The simultaneous release with Claude Opus 4.6, just minutes apart, underscores the intensity of competition in the AI space. But more importantly, it signals that we've entered a new phase of AI capability—one where agents can reliably handle complex, long-horizon tasks across the full spectrum of professional computer work.
As OpenAI puts it: "What started as a focus on being the best coding agent has become the foundation for a more general collaborator on the computer."
The question now isn't just what these models can do—it's what we'll choose to build with them.




Disclaimer: This article is based on information available as of February 6, 2026. Specifications, pricing, and availability may change. Please refer to official OpenAI documentation for the most current information.