The name is misleading. Codex sounds like a tool for programmers, and it started as one. But what OpenAI has built with Codex and GPT 5.5 is something closer to what Every.to's team calls an operating system for knowledge work: a persistent agent workspace that runs goals across sessions, connects to your tools, and handles tasks across email, research, writing, planning, and meetings alongside code.

The companies using it well are not treating it as a faster way to write code. They are treating it as a team member. Katie Parrott, head of content at Every.to, describes it as an agent you can brief on what done looks like, point at your relevant data sources, and leave working while you do other things. You step in when it needs you, rather than it waiting on you for every step.

This article explains what Codex actually is, how GPT 5.5 differs from previous OpenAI models, what the key capabilities are, and how companies are using it across their business, not just in engineering.

What Codex Actually Is

A tool-using agentic workspace. Every word in that phrase matters.

Tool-using: Codex can read and write files, connect to external services through plugins and integrations, run scripts, control a browser, and take actions in apps. It is not just generating text. It is doing things.

Agentic: It runs multi-step tasks without asking for guidance at every step. You give it a goal, not a prompt. It plans, executes, checks its own work, revises, and reports back when it needs a decision or is done.

Workspace: It holds context across sessions. It remembers your files, your preferences, your ongoing work. When you return to a task the next day, it knows where things stand. This is fundamentally different from a chat interface that starts fresh with every conversation.

ChatGPT / traditional AI chat: Chat. Respond. Repeat.

  • Session model: One conversation at a time. Starts fresh every session with no memory of previous work.
  • Context: You provide all context in every message. Nothing carries over.
  • Work style: Stops when you stop asking. Waits for your next message.
  • Output: Generates text you then act on yourself.
  • Best for: Single questions, short tasks, and drafts you refine yourself.

OpenAI Codex: Goals. Actions. Results.

  • Session model: Multiple tasks running in parallel. Persistent workspace with memory across sessions.
  • Context: Holds your goals, files, and preferences between sessions. Set once, applies everywhere.
  • Work style: Works continuously toward goals while you do other things.
  • Output: Takes actions in your tools and systems directly.
  • Best for: Ongoing work, multi-step projects, and tasks that span days.

GPT 5.5: What Changed

GPT 5.5 is the model powering Codex. It is a step change from GPT-4o and the model line that came before it.

"For almost any topic, the top AIs now give better answers than the actual world-class experts I could call on the phone. And I can call basically anyone." — Marc Andreessen, on GPT 5.5 and the current generation of frontier models

The specific improvements that matter for Codex's knowledge work use cases:

What GPT 5.5 unlocks in Codex:

  • GPT 5.5: The underlying model in Codex. Part of the generation Marc Andreessen described as crossing an AGI threshold, alongside Claude Opus 4.8 and Gemini 3.
  • Parallel tasks: Run multiple goals simultaneously. Codex works on all of them while you focus elsewhere.
  • Persistent goals: Set a goal once with /goal. Codex keeps working toward it across session breaks.
  • Mobile control: Control Codex from your phone via the ChatGPT mobile app. Kick off tasks, approve decisions, and review results from anywhere.

Goals and Skills: The Key Concepts

Two concepts define how Codex works differently from a standard AI assistant. Understanding them explains most of what Codex can do.

Goals are persistent objectives. You set a goal with /goal and describe what done looks like, how success gets checked, and what constraints to respect. Codex then keeps working toward that outcome across interruptions and session breaks.

The test for when to use a goal: if you would otherwise type the same instructions in three separate messages ("always cite your sources, match our house style, never send without my review"), make them a goal instead. The goal then applies to everything Codex does toward that objective, without you restating it.

Skills are reusable instructions. A skill teaches Codex how to handle a recurring kind of task well. Once you have defined how you want weekly competitive analysis reports formatted, or how to process incoming sales enquiries, or how to handle end-of-day meeting summaries, that skill runs automatically for every instance of that task.

Together: Goals tell Codex what you are trying to accomplish. Skills tell it how to do recurring tasks within that goal. The combination is what makes it a workspace rather than a chat tool.
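To make the split between the two concrete, here is a minimal Python sketch of how a goal (what done looks like, how success is checked, which constraints apply) and a set of skills (instructions for one recurring kind of task) might be combined into the standing context for a single task. Everything in it is hypothetical: `Goal`, `Skill`, and `build_context` are illustrative names, not Codex's API, and this article does not document how Codex stores goals or skills internally.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """Hypothetical persistent objective: set once, reused for every task."""
    objective: str
    done_when: str        # what "done" looks like
    success_check: str    # how success gets checked
    constraints: list[str] = field(default_factory=list)


@dataclass
class Skill:
    """Hypothetical reusable instruction for one recurring kind of task."""
    task_type: str
    instructions: str


def build_context(goal: Goal, skills: list[Skill], task_type: str) -> str:
    """Combine the standing goal with any skills that match this task type."""
    lines = [
        f"Goal: {goal.objective}",
        f"Done when: {goal.done_when}",
        f"Check: {goal.success_check}",
    ]
    # Constraints ride along with every task, so they never need restating.
    lines += [f"Constraint: {c}" for c in goal.constraints]
    # Only skills for this task type are included; the rest stay out of the context.
    lines += [f"How to: {s.instructions}" for s in skills if s.task_type == task_type]
    return "\n".join(lines)


# Illustrative usage, loosely based on the CEO setup described in this article.
goal = Goal(
    objective="Prepare for Series B",
    done_when="Monday summary is ready before the standup",
    success_check="Every item cites a source",
    constraints=["Always cite your sources", "Never send without my review"],
)
skills = [
    Skill("competitor_news", "Include what changed, why it matters, and one question for the team."),
]
print(build_context(goal, skills, "competitor_news"))
```

The design choice mirrors the text: the goal is the part that never changes between tasks, while skills are looked up per task type, which is why adding a new skill does not require touching the goal.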

Goals and Skills in practice

A CEO's Codex setup

  • Goal: "My company is preparing for Series B. Track all relevant signals (competitor moves, customer feedback, market news) and surface anything I should know before my Monday morning standup."
  • Skill: "When summarising competitor news, always include: what changed, why it matters, and one question I should ask my team about it."
  • Skill: "Meeting notes format: attendees, decisions made, action items with owners, open questions. Post to Notion within 30 minutes of the meeting ending."
  • Result: Codex runs continuously. Monday morning summary is ready before you open your laptop.

What Every.to Learned Running Four Agents

Brandon Gell and the team at Every.to, a 25-person media company, documented what happened when they moved their core coordination work to four Codex agents:

Anton (Prioritisation): Routes daily priorities to team members and posts company-wide summaries to Slack. Synthesises launch calendars, strategy documents, and task lists. Answers the question every employee has every morning: "What should I work on today?"

Max (Meetings to Tasks): Extracts action items from meeting transcripts, posts them to Slack as numbered lists, and converts selected items into tasks linked to relevant projects.
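Max's job is essentially a small transformation: action items in, a numbered list out, and a chosen subset promoted to project-linked tasks. The Python sketch below assumes the action items have already been extracted from the transcript (that extraction is the model's work and is not shown). `ActionItem`, `Task`, `to_numbered_list`, `promote`, and the keyword-to-project mapping are all hypothetical; posting to Slack or creating tasks in a real project tool would go through that tool's own API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionItem:
    text: str
    owner: str


@dataclass
class Task:
    title: str
    owner: str
    project: Optional[str]


def to_numbered_list(items: list[ActionItem]) -> str:
    """Render items as the numbered list that would be posted for the team."""
    return "\n".join(f"{i}. {item.text} ({item.owner})" for i, item in enumerate(items, start=1))


def promote(items: list[ActionItem], selected: list[int], project_for: dict[str, str]) -> list[Task]:
    """Turn the list numbers someone picked into tasks linked to a project.

    project_for maps a keyword to a project name (a simplifying assumption);
    an item gets the first project whose keyword appears in its text, else None.
    """
    tasks = []
    for n in selected:
        item = items[n - 1]  # numbers shown to people are 1-based
        text = item.text.lower()
        project = next((p for kw, p in project_for.items() if kw in text), None)
        tasks.append(Task(item.text, item.owner, project))
    return tasks


# Hypothetical meeting output.
items = [
    ActionItem("Draft launch email", "Sam"),
    ActionItem("Update pricing page", "Priya"),
]
print(to_numbered_list(items))
tasks = promote(items, [2], {"pricing": "Website refresh"})
print(tasks)
```

Keeping the numbered list as the shared reference point is what lets a person reply with "promote 2" and have the right item turned into a task.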

Strategy Interviewer (OKR Planning): Conducts quarterly goal interviews with team members, pushing for specificity and measurable outcomes while ensuring alignment with company strategy. Reduced planning from weeks to two days.

Campaign Reporter (Growth Tracking): Delivers daily scorecards showing key metrics, pace indicators, and whether targets are being met.

"We moved from relying on the COO as a human router to automating core coordination work." — Every.to team

The COO stopped being the conduit for information. The agents became the conduit.

Three things the Every.to team found essential:

The Folder Is the Agent

Kieran Klaassen at Every.to discovered something surprising when managing 44 concurrent agents: you do not need complex orchestration. You need well-organised context.

A well-structured project folder, with CLAUDE.md or similar context files encoding conventions and institutional knowledge, turns the same AI model into a different specialist depending on which folder it reads. Point it at your codebase folder and it behaves like a Rails engineer; point it at your monitoring folder and it behaves like an operations engineer.

Klaassen runs 44 concurrent agents through a file-based dispatch system with two commands: /hey generates status reports across all projects, and /orchestrate breaks tasks into subtasks and spawns workers in appropriate folders.
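The folder-as-agent idea fits in a few lines of Python: the specialist role comes entirely from the context file in the folder the agent is pointed at, and dispatch and status reporting are just file reads and writes. This is an illustrative sketch, not Klaassen's implementation. `CLAUDE.md` comes from the text and `AGENTS.md` is the instructions file OpenAI's Codex reads; the `STATUS.md` and `TASKS.md` files and all function names are hypothetical, and the step that actually launches a worker in each folder is deliberately left out.

```python
import tempfile
from pathlib import Path

# Context files checked in order; the first one found defines the folder's specialist.
CONTEXT_FILES = ("CLAUDE.md", "AGENTS.md")
STATUS_FILE = "STATUS.md"  # hypothetical: where each worker records its progress
TASKS_FILE = "TASKS.md"    # hypothetical: where dispatched subtasks are written


def load_context(folder: Path) -> str:
    """Return the folder's conventions, which become the agent's role and rules."""
    for name in CONTEXT_FILES:
        path = folder / name
        if path.is_file():
            return path.read_text(encoding="utf-8")
    return ""


def hey(root: Path) -> str:
    """A /hey-style report: one line per project folder, read from its status file."""
    lines = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        status = folder / STATUS_FILE
        text = status.read_text(encoding="utf-8").strip() if status.is_file() else "no status yet"
        lines.append(f"{folder.name}: {text}")
    return "\n".join(lines)


def orchestrate(root: Path, subtasks: dict[str, list[str]]) -> list[Path]:
    """An /orchestrate-style dispatch: write each folder's subtasks where its worker reads them.

    Splitting the task (the model's job) and spawning workers are not shown.
    """
    written = []
    for folder_name, todo in subtasks.items():
        folder = root / folder_name
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / TASKS_FILE
        path.write_text("".join(f"- [ ] {item}\n" for item in todo), encoding="utf-8")
        written.append(path)
    return written


# Illustrative usage in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    orchestrate(root, {"api": ["Add rate limiting"], "monitoring": ["Alert on 5xx spikes"]})
    (root / "api" / "CLAUDE.md").write_text("You are a Rails engineer. Follow our conventions.", encoding="utf-8")
    (root / "api" / STATUS_FILE).write_text("rate limiting in review", encoding="utf-8")
    print(load_context(root / "api"))
    print(hey(root))
```

Because every piece of state lives in plain files inside each folder, adding a new specialist means adding a folder with a context file, which matches the point that the hard part is the context, not the orchestration.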

The implication: the barrier to running multiple agents is lower than it looks. The orchestration infrastructure is not the hard part. The hard part is having clean, well-structured context in each domain.

Codex vs Claude Code: What to Use When

For companies weighing both tools: both are agentic platforms for coding and knowledge work. Both can run parallel tasks, use tools, and connect to external systems. The differences are in emphasis.

Codex (GPT 5.5): Knowledge work. Operations. Mobile.

  • Primary strength: Knowledge work alongside coding, including email, research, writing, and ops tasks.
  • Mobile: First-class mobile control via the ChatGPT app. Run and approve tasks from your phone.
  • Persistence: Persistent goals that survive session breaks. Set once, runs indefinitely.
  • Ecosystem: Strong integration with OpenAI's broader tool and plugin ecosystem.

Claude Code (Opus 4.8): Engineering. Scale. Depth.

  • Primary strength: Deep coding and engineering focus. Long-horizon tasks such as migrations, audits, and refactoring.
  • Scale: Dynamic workflows that scale to hundreds of agents for large codebase tasks.
  • Effort control: Ultracode mode for maximum-effort engineering tasks when you need it done right.
  • Best for: Engineering-heavy workflows where code quality and depth matter most.

Neither is objectively better. The companies running at Wave 3 typically use both: Claude Code for engineering-heavy workflows and Codex for knowledge work and business operations. The nativefirst stack (Opus 4.8 plus Codex with GPT 5.5) reflects this.

Codex and Opus 4.8 are the tools. The architecture is what makes them work.

Book a free Diagnostic: 30–45 minutes, no deck, no pitch. It maps which knowledge work functions in your company are most ready for an agent setup, what the data access looks like, which goals to set first, and how to structure the context layer so agents actually produce useful output.

Sources
1. Katie Parrott, "Codex for Knowledge Work", Every.to, May 2026. Complete guide to using Codex as an operating system for non-engineering work.
2. Brandon Gell, "How We Run a 25-person Company on Four AI Agents", Every.to, 2026. Every.to's four-agent setup and what changed.
3. Kieran Klaassen, "The Folder Is the Agent", Every.to, 2026. On managing 44 concurrent agents through folder-based context.
4. Marc Andreessen on GPT 5.5 and AGI thresholds, via @itsolelehmann summary on X, May 2026.
5. @gdb (OpenAI), Codex real-time meeting transcription and Q&A, X, May 2026.
John Tan

Fractional AI & Product Founder at nativefirst.ai. Ex-CEO, Depict (Y Combinator). Embeds on-site with scaling founders and CEOs to ship Level-3 agents and AI workflows in production.