Model Loyalty Is the Bug: Codex Has the Hands, Claude Has the Lens

Picking one model and defending it like a sports team is the actual mistake. Codex lives in the repo and moves until the task is handled. Claude never touches the tree and tells you the task was wrong. Run both, on purpose, with a handoff that doesn't leak.

The worst move in 2026 is model loyalty. People pick a model the way they pick a baseball team: once, emotionally, and then they defend it through every loss. That instinct is the bug. A model lineup is a recipe book, not a jersey: the right tool at the right moment beats the tool you've decided to love.

The split, measured against shipping work:

Codex has hands. It works inside the repo — reads files, runs commands, patches code, tests, builds, reviews its own diff, and keeps moving until the task is handled, with proof.

Claude has the lens. It's the second strong reasoner that never touches the tree — broad planning, alternate implementations, copy, critique, architecture, the fresh review pass that hasn't already convinced itself.

The sweet spot uses both: Claude for the high-level second opinion, Codex for implementation and verification, then either one for final polish — whichever still has the context loaded.


Codex Lives in the Repo

Codex opens your codebase. The whole job is on-disk.

WHERE CODEX EARNS ITS KEEP
------------------------------------------------------------
repo-native work       read code, find patterns, scoped edits,
                       respect the structure that's already there
build/release chores   package config, entitlements, signing
                       checks, scripts, test runs, screenshots,
                       local verification
iterative debugging    run -> observe -> patch -> rerun, on a loop
dirty tree             check status, avoid unrelated edits,
                       preserve the changes you didn't mean to ship
frontend verify        launch the local preview, screenshot it,
                       catch the layout break with its own eyes
concrete deliverables  changed files, generated docs, review
                       notes, release checklists, build output
------------------------------------------------------------

The keyword in every line is verification. Codex runs the build and reads the output. When release work is on the line, command output is the entire job.
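
A minimal sketch of "command output is the job", assuming Python 3.9+ and nothing beyond the standard library. The names run_check and passed are hypothetical, and the command in the usage lines is a stand-in for whatever build or test command the repo really uses. The idea is only that the evidence is the captured output and exit code, not a summary of them.

```python
import shlex
import subprocess
import sys


def run_check(cmd: list[str]) -> dict:
    """Run one command and keep its output verbatim as evidence."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "command": shlex.join(cmd),
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def passed(evidence: dict) -> bool:
    """A claim counts as verified only when the command exited 0."""
    return evidence["exit_code"] == 0


# Stand-in command (assumption): swap in the repo's real build or test step.
evidence = run_check([sys.executable, "-c", "print('build ok')"])
print(evidence["command"])
print(evidence["stdout"], end="")
print("verified" if passed(evidence) else "NOT verified")
```

The dictionary is deliberately dumb. It can be pasted into a handoff as-is, and nobody has to trust a paraphrase of what the build said.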


Claude Never Touches the Tree

Claude works from the opposite side. With no hands on the tree, it's free to tell you the plan is wrong before any code exists to defend.

WHERE CLAUDE EARNS ITS KEEP
------------------------------------------------------------
fresh critique      architectural teardown BEFORE implementation
detached tradeoffs  explaining the cost of each path without
                    the bias of having just written one
product language    App Store copy, marketing, support text,
                    the words a release also ships
independent review  reads a Codex diff cold, hasn't already
                    rationalized the approach
design exploration  multiple directions before the repo gets
                    touched and the first idea calcifies
vague -> spec       turns a half-formed idea into a prompt or
                    spec another agent can actually execute
------------------------------------------------------------

"Independent" is doing the work there. A model reviewing its own diff is its own defense attorney. The point of two models is that the second one didn't write the first one's code and owes it nothing.


Five Patterns That Actually Pay

These handoffs survive contact with a deadline:

1. Claude strategizes  ->  Codex implements + verifies it
2. Codex makes the change  ->  Claude reviews the diff cold
                              for risks and missed cases
3. Claude drafts copy variants  ->  Codex drops the chosen
                                    text into repo docs / release assets
4. Codex produces real build/test output  ->  Claude reads the
                                    product + positioning implications
5. Files must change -> Codex.   Wider conceptual lens -> Claude.

Pattern 5 is the whole article compressed. If the deliverable is a changed file, you want hands. If the deliverable is a better decision, you want the lens. Most tasks are secretly both, in that order.
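
Pattern 5 is simple enough to write down as a routing rule. This is a sketch only: Task, route, and the two boolean fields are hypothetical names, and deciding whether a task "needs judgment" is still your call, not something the code can work out for you.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    changes_files: bool   # the deliverable includes an edited file
    needs_judgment: bool  # the deliverable includes a decision or critique


def route(task: Task) -> set[str]:
    """Return the roles a task needs: 'hands', 'lens', or both."""
    roles = set()
    if task.changes_files:
        roles.add("hands")
    if task.needs_judgment:
        roles.add("lens")
    return roles


# Hypothetical tasks, for illustration only.
print(route(Task("bump the version string", True, False)))
print(route(Task("pick a positioning angle", False, True)))
print(sorted(route(Task("rework the onboarding flow", True, True))))
```

A task that routes to both roles is the common case, which is why the handoff between them matters more than the choice of either one.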


Prompts the Hands Want

Codex responds to scope, verification, and a demand for evidence:

Inspect this repo, make the smallest safe change, run the
relevant checks, and summarize exactly what changed.

Prepare this Electron app for Mac App Store submission.
Create the needed entitlements, update package scripts,
flag App Store risks, and run the build if possible.

Review this UI locally. Take screenshots, point out
launch-blocking issues, and fix the obvious ones.

Every one ends in the same demand: show me the result. Smallest safe change. Run the checks. Take the screenshot. Proof is the deliverable.


Prompts the Lens Wants

Claude responds to being told to argue, diverge, and refuse to be agreeable:

Read this release plan and challenge it. What are the
biggest risks, missing assumptions, and better sequencing?

Give me three App Store positioning directions for this
app: practical, pro-audio, and playful.

Review this diff conceptually. Ignore formatting unless it
affects behavior, and focus on what could break.

Every one of these gives the lens permission to disagree. "Challenge it." "Three directions." "What could break." A reasoning model you didn't license to push back will just ratify whatever you brought it — an echo with better grammar.


The Handoff Is the Whole Game

Two models don't help you if the second one starts without the evidence. A cold review means no stake in the approach, not no context. The handoff is where multi-model work fails.

# Paste this verbatim into the receiving model. Fill the
# bracketed slots. Do not summarize the slot contents — paste
# the evidence as evidence.

ROLE: You are the [reviewer | implementer | copywriter |
release manager | skeptic]. The previous model in this chain
was the [other role].

STATE — what the previous model concretely did:
  - [changed files, with line counts]
  - [commands run, with stdout/stderr pasted, not summarized]
  - [decisions made, with one line of reasoning each]

ARTIFACTS — paste the evidence:
  - [file diffs, verbatim]
  - [full build/test output]
  - [screenshots if frontend]
  - [errors encountered, verbatim]

AUTHORITY: [The implementer | This handoff] owns final repo
edits. You [may | may not] write to the tree. If you may not,
your output is advisory text only — no shell, no edits.

TRUTH: When release matters, verified command output beats
either model's narration. If you need to verify something,
ask for the command output, not for opinions.

GOAL — what I want from you next:
  [one sentence, imperative, concrete deliverable]

The AUTHORITY block is non-negotiable. The instant two models both have write authority over the same tree, you get conflicting edits and a merge you have to referee. Pick one set of hands; the other model advises.
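
If you fill the template above by hand every time, you will eventually summarize instead of paste. One way to guard against that is to build the handoff from structured fields and refuse to send it with no evidence. This is a sketch under assumptions: Handoff and render are hypothetical names, the section labels follow the template, and the sample file name and test output are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    role: str                  # e.g. "reviewer"
    previous_role: str         # e.g. "implementer"
    state: list[str]           # what the previous model concretely did
    artifacts: dict[str, str]  # label -> evidence, pasted verbatim
    may_write: bool            # True for at most one model per tree
    goal: str                  # one imperative sentence


def render(h: Handoff) -> str:
    """Build the handoff text; refuse to send one with no evidence."""
    if not any(text.strip() for text in h.artifacts.values()):
        raise ValueError("handoff has no pasted evidence")
    if h.may_write:
        authority = "You may write to the tree."
    else:
        authority = ("You may not write to the tree. Your output is "
                     "advisory text only: no shell, no edits.")
    parts = [
        f"ROLE: You are the {h.role}. The previous model in this "
        f"chain was the {h.previous_role}.",
        "STATE:\n" + "\n".join(f"  - {item}" for item in h.state),
        "ARTIFACTS:\n" + "\n".join(
            f"  [{label}]\n{text}" for label, text in h.artifacts.items()),
        f"AUTHORITY: {authority}",
        "TRUTH: Verified command output beats either model's narration.",
        f"GOAL: {h.goal}",
    ]
    return "\n\n".join(parts)


# Illustrative values only; the file name and test output are invented.
example = Handoff(
    role="reviewer",
    previous_role="implementer",
    state=["changed src/app.py (+4 -1)", "ran the test suite"],
    artifacts={"test output": "3 passed in 0.12s"},
    may_write=False,
    goal="List the three riskiest lines in this diff.",
)
print(render(example))
```

The ValueError is the useful part. A handoff with empty artifacts is just narration, and the receiving model has nothing to check it against.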


One Hand on the Repo

The split I'd actually run, generalized from real app work:

Codex is the release engineer and local implementation agent. It touches the repo, builds the app, inspects the UI, produces the final release checklist. It is the only thing with write access to the tree, and it proves every claim with command output.

Claude is the second opinion on judgment. Positioning, store language, product calls, the cold diff review. It never edits. It argues, drafts, and breaks the plan before the plan is code.

Two angles, zero duplication. Multi-model work fails when both models do the same job twice and you trust whichever one flattered you. Give them different jobs. Keep one hand on the repo. Let the other tell you the repo is wrong. That's the recipe book working as designed.
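
The one-hand rule is easy to state and easy to drift away from, so it can help to check it mechanically before a session starts. A minimal sketch, assuming you keep some mapping of model name to write permission; single_writer and the mapping itself are hypothetical, not part of any tool's configuration.

```python
def single_writer(agents: dict[str, bool]) -> str:
    """Return the one agent allowed to write to the tree.

    `agents` maps a model name to whether it may edit the repo.
    Zero writers or more than one writer is treated as an error.
    """
    writers = [name for name, can_write in agents.items() if can_write]
    if len(writers) != 1:
        raise ValueError(
            f"expected exactly one writer, found {len(writers)}: {writers}")
    return writers[0]


print(single_writer({"codex": True, "claude": False}))  # prints "codex"
```

Two writers raises, and so does zero, because a setup where nobody can touch the repo ships nothing.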


GhostInThePrompt.com // Codex has hands and runs the build. Claude has the lens and breaks the plan. The only model you should be loyal to is the one holding the right tool this minute.