Introduction

On February 5, 2026, OpenAI announced GPT-5.3-Codex, its most advanced agentic coding model to date. The release is notable both for its technical capabilities and because OpenAI describes it as the company's first model that was instrumental in creating itself.

GPT-5.3-Codex represents a fundamental shift from a code-writing tool to an interactive AI collaborator capable of handling long-horizon, real-world technical work across the full spectrum of professional computing tasks.

What Makes GPT-5.3-Codex Different?

A True Agentic Model

Unlike traditional coding assistants that simply generate code snippets, GPT-5.3-Codex is designed as an "agentic" model. This means it can:

Maintain context over long-running tasks that span hours or even days

Use tools autonomously, including command-line interfaces, file systems, and development environments

Adapt and iterate based on real-time feedback without losing its place

Handle complex multi-step workflows that require research, planning, and execution
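
The loop these points describe (choose a tool, observe the result, decide the next step) can be illustrated with a toy sketch. Everything here is hypothetical: the tool names, the `scripted_policy` stand-in for a model, and the stopping rule are invented for illustration and say nothing about how GPT-5.3-Codex is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical tools; real agents would call a shell, file system, or IDE.
def list_files(arg: str) -> str:
    return "main.py tests.py"

def run_tests(arg: str) -> str:
    return "1 failed" if arg == "before_fix" else "all passed"

TOOLS: dict[str, Callable[[str], str]] = {
    "list_files": list_files,
    "run_tests": run_tests,
}

@dataclass
class AgentState:
    goal: str
    # Each entry is (tool name, argument, observation), kept as running context.
    history: list[tuple[str, str, str]] = field(default_factory=list)

def scripted_policy(state: AgentState) -> Optional[tuple[str, str]]:
    """Stand-in for a model: choose the next tool call from the history so far."""
    if not state.history:
        return ("list_files", ".")
    last_observation = state.history[-1][2]
    if "passed" in last_observation:
        return None  # goal reached, stop
    if "failed" in last_observation:
        return ("run_tests", "after_fix")  # pretend a fix was applied, then retest
    return ("run_tests", "before_fix")

def run_agent(goal: str, max_steps: int = 10) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        action = scripted_policy(state)
        if action is None:
            break
        tool, arg = action
        observation = TOOLS[tool](arg)  # act, then feed the result back in
        state.history.append((tool, arg, observation))
    return state

final_state = run_agent("make the tests pass")
for step in final_state.history:
    print(step)
```

The point of the sketch is the shape of the loop: the history carries context across steps, and each observation determines the next action.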

Self-Building Achievement

Perhaps the most remarkable aspect of GPT-5.3-Codex is that the Codex team used early versions of the model to:

Debug its own training process

Manage its own deployment

Diagnose test results and evaluations

Optimize infrastructure for the final release

This self-referential development cycle shows how AI is beginning to accelerate its own improvement. OpenAI researchers described being "blown away by how much Codex was able to accelerate its own development."

Performance Improvements

GPT-5.3-Codex is 25% faster than its predecessor (GPT-5.2-Codex), thanks to improvements in OpenAI's infrastructure and inference stack. This speed improvement enables more responsive real-time collaboration and faster iteration cycles.

Benchmark Performance: The Data

GPT-5.3-Codex achieves state-of-the-art performance across several key benchmarks that measure coding, agentic capabilities, and real-world computer use.

SWE-Bench Pro

SWE-Bench Pro is a rigorous evaluation of real-world software engineering that spans four programming languages (Python, JavaScript, TypeScript, and Go). Unlike its predecessor, SWE-Bench Verified, which tested only Python, SWE-Bench Pro is designed to be more contamination-resistant and industry-relevant.

Terminal-Bench 2.0

The 13.3-percentage-point improvement over GPT-5.2-Codex on Terminal-Bench 2.0 is particularly significant. This benchmark measures the terminal skills that a coding agent needs: navigating file systems, executing commands, and managing development workflows. Notably, GPT-5.3-Codex achieves this with fewer tokens than any prior model, making it more efficient.

OSWorld-Verified

The 26.5-percentage-point jump over GPT-5.2-Codex on OSWorld-Verified demonstrates substantially improved computer-use capabilities. OSWorld is an agentic computer-use benchmark where agents must complete productivity tasks in a visual desktop environment. The improvement indicates that GPT-5.3-Codex is far better at navigating real-world interfaces than previous models.

Beyond Code: A General-Purpose Agent

While GPT-5.3-Codex excels at programming, its capabilities extend far beyond code generation. OpenAI positions it as an agent that can handle "nearly anything developers and professionals can do on a computer."

Software Lifecycle Support

The model is built to support the entire software development lifecycle:

Debugging - Identifying and fixing bugs

Deploying - Managing releases and infrastructure

Monitoring - Tracking performance and metrics

Writing PRDs - Product requirement documents

Editing copy - Documentation and marketing text

User research - Analyzing user feedback

Testing - Writing and running test suites

Metrics analysis - Data-driven decision making

Knowledge Work Capabilities

On GDPval (OpenAI's 2025 evaluation measuring performance on knowledge-work tasks across 44 occupations), GPT-5.3-Codex matches GPT-5.2's performance. This includes tasks like:

Creating slide decks and presentations

Analyzing data in spreadsheets

Document management and organization

Research and synthesis

Web Development Example

To demonstrate the model's capabilities, OpenAI asked GPT-5.3-Codex to build two complete games from scratch:

A racing game (version 2 of the Codex app launch game)

A diving game

Using only a "develop web game" skill and generic follow-up prompts like "fix the bug" or "improve the game," GPT-5.3-Codex iterated autonomously over millions of tokens, building highly functional, polished games.

Better Intent Understanding

Compared to GPT-5.2-Codex, the new model better understands user intent when building websites. Simple or underspecified prompts now default to sites with:

More functionality

Sensible defaults

Production-ready features

For example, when asked to build a pricing landing page, GPT-5.3-Codex automatically displayed the yearly plan as a discounted monthly price (making the discount clear) and created an automatically transitioning testimonial carousel with three distinct user quotes—resulting in a more complete and polished design.
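
The yearly-plan display described above comes down to a small calculation. The following is a minimal sketch, assuming hypothetical prices ($20 per month, $192 per year) and a hypothetical helper named `yearly_as_monthly`; none of it is taken from the page the model generated.

```python
from decimal import Decimal, ROUND_HALF_UP

def yearly_as_monthly(monthly_price: Decimal, yearly_price: Decimal) -> tuple[Decimal, int]:
    """Return (per-month cost of the yearly plan, whole-percent discount vs. paying monthly)."""
    cents = Decimal("0.01")
    per_month = (yearly_price / 12).quantize(cents, rounding=ROUND_HALF_UP)
    full_year_at_monthly_rate = monthly_price * 12
    discount = (1 - yearly_price / full_year_at_monthly_rate) * 100
    discount_pct = int(discount.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return per_month, discount_pct

# Hypothetical prices, chosen only to make the arithmetic easy to follow.
per_month, discount_pct = yearly_as_monthly(Decimal("20"), Decimal("192"))
print(f"${per_month}/mo billed yearly (save {discount_pct}%)")
# prints: $16.00/mo billed yearly (save 20%)
```

Using `Decimal` rather than floats keeps currency rounding predictable, which matters when the displayed price has to match what is billed.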

Interactive Collaboration

One of the most significant user experience improvements is the ability to steer the model while it works.

Real-Time Interaction

Instead of waiting for a final output, users can now:

Ask questions during execution

Discuss different approaches

Steer toward specific solutions

Provide feedback mid-task

GPT-5.3-Codex talks through what it's doing, responds to feedback, and keeps users in the loop from start to finish. This can be enabled in the Codex app via Settings > General > Follow-up behavior.

This transforms the experience from giving commands to a machine to collaborating with a teammate—a fundamental shift in how humans interact with AI systems.

Cybersecurity Capabilities and Safety

GPT-5.3-Codex is the first model OpenAI classifies as "High capability" for cybersecurity-related tasks under its Preparedness Framework. It's also the first model directly trained to identify software vulnerabilities.

Dual-Use Nature

Because cybersecurity is inherently dual-use (useful for both defense and offense), OpenAI is taking a precautionary approach. The company says it has no definitive evidence that the model can automate cyber attacks end to end, but it is nonetheless:

Deploying a comprehensive cybersecurity safety stack

Implementing safety training and automated monitoring

Requiring trusted access for advanced capabilities

Trusted Access for Cyber

OpenAI is launching Trusted Access for Cyber, a pilot program to:

Accelerate cyber defense research

Get tools to defenders first

Support ecosystem resilience

$10M Commitment

Building on a $1M Cybersecurity Grant Program from 2023, OpenAI is committing $10 million in API credits to accelerate cyber defense, especially for:

Open source software

Critical infrastructure systems

Good-faith security research

Aardvark Security Agent

OpenAI is expanding the private beta of Aardvark, its security research agent, as the first offering in its suite of Codex Security products and tools. They're also partnering with open-source maintainers to provide free codebase scanning for widely used projects like Next.js.

How OpenAI Used Codex to Build Codex

The development of GPT-5.3-Codex provides a fascinating case study in AI-accelerated research.

Research Team Use Cases

The research team used early versions of GPT-5.3-Codex to:

Monitor and debug the training run for the release

Track patterns throughout the course of training

Provide deep analysis on interaction quality

Propose fixes and build rich applications for human researchers

Precisely understand how the model's behavior differed from prior models

Engineering Team Use Cases

The engineering team used Codex to:

Optimize and adapt the harness for GPT-5.3-Codex

Identify context rendering bugs impacting users

Root-cause low cache hit rates

Dynamically scale GPU clusters to adjust to traffic surges

Keep latency stable during launch

Data Science Use Cases

During alpha testing, a data scientist worked with GPT-5.3-Codex to:

Build regex classifiers to estimate frequency of clarifications, user responses, and task progress

Run these classifiers scalably over all session logs

Build new data pipelines and visualize results more richly than standard dashboarding tools

Co-analyze results, with Codex summarizing key insights over thousands of data points in under three minutes
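
A regex classifier of the kind described above can be sketched in a few lines. The patterns, label names, and sample messages below are hypothetical and are not the ones OpenAI used. A real classifier would be tuned against labeled sessions and run over far larger logs.

```python
import re
from collections import Counter

# Hypothetical patterns for three message categories.
CLASSIFIERS = {
    "clarification": re.compile(
        r"\b(could you clarify|do you mean|which (one|file|option))\b", re.I
    ),
    "positive_response": re.compile(r"\b(thanks|looks good|perfect|great)\b", re.I),
    "task_progress": re.compile(
        r"\b(tests? pass(ed|ing)?|fixed|implemented|done)\b", re.I
    ),
}

def classify(message: str) -> set[str]:
    """Return every label whose pattern matches the message."""
    return {label for label, pattern in CLASSIFIERS.items() if pattern.search(message)}

def label_frequencies(messages: list[str]) -> dict[str, float]:
    """Estimate the fraction of messages that carry each label."""
    counts = Counter(label for m in messages for label in classify(m))
    total = len(messages)
    return {label: counts[label] / total for label in CLASSIFIERS} if total else {}

# Hypothetical session log.
sample_log = [
    "Do you mean the config in src/ or the one in tests/?",
    "Fixed the off-by-one error; all tests passed.",
    "Looks good, thanks!",
    "Running the linter now.",
]
print(label_frequencies(sample_log))
```

Regex classifiers are crude but cheap, so they can run over every session log, and their frequency estimates can be spot-checked against a hand-labeled sample.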

Productivity Gains

The result: people building with Codex were happier because the agent:

Better understood their intent

Made more progress per turn

Asked fewer clarifying questions

Availability and Pricing

How to Access

GPT-5.3-Codex is available immediately for paid ChatGPT users across all Codex surfaces:

Desktop app (macOS and Windows)

Command-line interface (CLI)

IDE extensions (VS Code, JetBrains, etc.)

Web interface

Subscription Plans

For a limited time, paid plans will receive double the normal rate limits.

API Pricing

As of launch, OpenAI has not released official API pricing for GPT-5.3-Codex. API access is described as "rolling out soon" and "coming in the following weeks."

For reference, current API pricing for the previous model (GPT-5.2-Codex) is listed on the OpenAI Platform Pricing page.

Infrastructure

GPT-5.3-Codex was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems—a testament to the close collaboration between OpenAI and NVIDIA in pushing the boundaries of AI capability.

Comparison with Competitors

The release of GPT-5.3-Codex came just minutes after Anthropic's announcement of Claude Opus 4.6, setting up an immediate comparison between the two models.

GPT-5.3-Codex Strengths

Terminal-Bench 2.0: 77.3 vs Opus 4.6's 65.4 (an 11.9-point lead, about 18.2% in relative terms)

25% faster than its predecessor, GPT-5.2-Codex

"High reliability, low variance" design philosophy

Self-building capability (helped create itself)

First "High capability" cybersecurity classification
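
A benchmark gap can be stated in absolute points or as a relative percentage, and the two are easy to confuse. The sketch below computes both from the two Terminal-Bench 2.0 scores quoted above. The helper name `compare_scores` is made up for illustration.

```python
def compare_scores(ours: float, theirs: float) -> tuple[float, float]:
    """Return (absolute gap in points, relative gap as a percent of the lower score)."""
    point_gap = ours - theirs
    relative_gap_pct = point_gap / theirs * 100
    return round(point_gap, 1), round(relative_gap_pct, 1)

points, relative = compare_scores(77.3, 65.4)
print(f"{points} points, {relative}% relative")
# prints: 11.9 points, 18.2% relative
```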

Claude Opus 4.6 Strengths

1 million token context window (significantly larger)

Agent Teams collaborative functionality

Broader versatility across knowledge work scenarios

Higher creativity temperature (more personality)

Design Philosophy Differences

The Bigger Picture

GPT-5.3-Codex represents more than just an incremental upgrade—it's a step change toward general-purpose agents that can reason, build, and execute across the full spectrum of real-world technical work.

From Code Agent to Computer Agent

OpenAI explicitly frames this evolution: "Codex is moving beyond writing code to using it as a tool to operate a computer and complete work end to end."

This is a profound shift. What started as a focus on being "the best coding agent" has become the foundation for a more general collaborator on the computer—expanding both who can build and what's possible with AI.

Accelerating AI Development

The fact that GPT-5.3-Codex helped build itself is a preview of what's to come. As OpenAI researchers note, "many researchers and engineers at OpenAI describe their job today as being fundamentally different from what it was just two months ago."

This suggests we're entering a period of accelerating returns in AI development, where each generation of models helps build the next—potentially compressing timelines from years to months.

Implications for Developers

For software developers, the implications are significant:

Faster development cycles - AI handles more of the routine work

Higher-level abstraction - Developers can focus on architecture and design

Interactive collaboration - Less like using a tool, more like working with a teammate

New capabilities - Tasks that previously required specialized knowledge are now accessible

Implications for Businesses

For businesses, GPT-5.3-Codex represents:

Increased productivity - More work gets done in less time

Lower barriers - Fewer specialized skills needed for certain tasks

New security considerations - "High capability" cybersecurity classification requires careful governance

Competitive advantage - Early adoption of powerful agentic AI

Conclusion

GPT-5.3-Codex is a landmark achievement in artificial intelligence. It combines:

State-of-the-art coding performance

Advanced agentic capabilities

Interactive collaboration

Self-improvement (it helped build itself)

Real-world computer use

The fact that it was instrumental in its own creation serves as both a technical achievement and a metaphor for where AI is headed. As models become more capable, they're not just tools we use—they're becoming partners in the creative and development process itself.

The near-simultaneous release with Claude Opus 4.6, just minutes apart, underscores the intensity of competition in the AI space. More importantly, it signals a new phase of AI capability, one where agents can reliably handle complex, long-horizon tasks across the full spectrum of professional computer work.

As OpenAI puts it: "What started as a focus on being the best coding agent has become the foundation for a more general collaborator on the computer."

The question now isn't just what these models can do—it's what we'll choose to build with them.

Disclaimer: This article is based on information available as of February 6, 2026. Specifications, pricing, and availability may change. Please refer to official OpenAI documentation for the most current information.

GPT-5.3-Codex: OpenAI's Most Capable Agentic Coding Model