# Self-Verifying Agent Workflows Emerge Across Codex, OpenClaw, and Claude

*By Coding Agents Alpha Tracker • May 24, 2026*

The strongest signal today is operational, not model-driven: top practitioners are getting real leverage from verification harnesses, repo-specific skills, and scratch-logged long runs. Also inside: Claude terminal agents, Cursor Composer 2.5 comparisons, and the open-tooling story around Codex, Pi, and OpenCode.

## 🔥 TOP SIGNAL

- **Self-verifying agent loops are the highest-leverage pattern showing up right now.** Peter Steinberger’s OpenClaw setup gives the agent broad access to Slack, Discord, Notion, Linear, email, calendar, and other data sources, then lets it check out repos, compile, run, debug, and verify fixes end-to-end; his example is a bug report arriving via message and turning into a PR about 5 minutes later [^1]. The key is the harness, not the vibe: he explicitly builds replica environments so the agent can prove the fix before commit, and he treats this as *agentic engineering* that still requires human system understanding and meta-prompting when loops appear [^1].

## ⚡ TRY THIS

- **Codify repo intent, then autotriage the easy wins (Peter Steinberger).**
  1. Add a `VISION.md` to the repo.
  2. Restrict autonomous work to issues/PRs that **fit project vision**, are **inferrable in code with high confidence**, have a **clear fix**, and **can be live-tested** [^2].
  3. Let Codex run in a VM with computer vision via [crabbox.sh](https://crabbox.sh), then manually review its suggestions; if issue entry is slowing you down, steipete added quick selection to [repo.bar](http://repo.bar) [^2].

- **Force a scratch-log on any refactor that touches lots of files (steipete).**
  - Prompt: `Maintain a scratch-log while you work on this refactor with decisions, tradeoffs, and review fixes.`
  - Pair it with the public [`autoreview` skill](https://github.com/openclaw/agent-skills/blob/main/skills/autoreview/SKILL.md) for long-running cleanup; steipete says it has already run 5h+ on a large refactor and kept fixing issues [^3][^4][^3].
  - Read the log afterward to see what the agent decided and what you forgot to specify. Good antidote to ThePrimeagen’s warning that AI can make your own production reports less dense if you stop surfacing that information yourself [^5].

- **Give the agent a testable world, not just a prompt (Peter Steinberger).**
  1. Recreate the exact failing environment first — his example is a remote macOS box for a launchd bug [^1].
  2. Have the agent reproduce the issue, write the fix, and rerun the same environment to verify it works before commit [^1].
  3. If the run time or behavior feels off, stop it and ask on the meta-level where it is struggling instead of letting it spin [^1].

- **Roll out coding agents like a platform, not a gated pilot (Boris Cherny, Anthropic).**
  - Give everyone tokens and remove approval friction for everyday experimentation [^6].
  - Create psychological safety so people can try weird process changes, fail, and iterate [^6].
  - Do **not** pre-pick the use cases; Boris says the wins often come from unexpected roles, and you optimize only after a use case starts to scale [^6].
  - His production signal: Anthropic says code written per engineer grew about **250%** after Claude Code, while quality and reliability stayed stable [^6].

## 📡 WHAT SHIPPED

- **Claude Code terminal agents:** new multitasking flow for firing off multiple parallel tasks, browsing them with left/right arrows, and routing work through custom sub-agents like a Web Research Specialist [^7].
- **Cursor Composer 2.5:** Riley Brown says the new model is extremely cheap to run, includes a full in-app browser, and is noticeably faster than Codex or Claude on quick frontend/landing-page work; he still says he moved most of his own agentic work to Codex because it feels like a more unified product [^7].
- **Codex computer use:** Greg Brockman highlighted Codex building and debugging an iPhone simulator feature end-to-end, then driving the simulator to bug-bash the code it had just generated [^8][^9].
- **Open ecosystem signal from OpenAI:** Tibo says about **5%** of production traffic is on Pi harness and another **5%** on OpenCode; Romain Huet adds that ChatGPT subscriptions work across other tools too, and the Codex harness/app server are open source for bringing similar experiences into your own app [^10][^11].
- **pi.dev read tool debate:** the default read behavior changed, but Armin Ronacher notes the tool is fully swappable; one prompt restores the old behavior via this [gist](https://gist.github.com/mitsuhiko/0f5ddf204a956c15c128ff85ae343b92). Discussion is in [issue #4916](https://github.com/earendil-works/pi/issues/4916) [^12][^13][^14][^12].

## 🎬 GO DEEPER

- **24:28-25:08 — Peter Steinberger on the self-verifying bug-fix loop.** Remote repro box, agent fix, rerun the exact env, then commit. This is the cleanest timeless pattern in today’s set [^1].

[![Wie hast du OpenClaw gebaut und an OpenAI verkauft, Peter Steinberger?](https://img.youtube.com/vi/ufUUB7RIy18/hqdefault.jpg)](https://youtube.com/watch?v=ufUUB7RIy18&t=1467)
*Wie hast du OpenClaw gebaut und an OpenAI verkauft, Peter Steinberger? (24:27)*


- **17:28-18:25 — Ping to PR in 5 minutes.** Worth watching for the broader point: once the agent can read your messages and touch the systems you already use, bug triage becomes a full workflow instead of a chat demo [^1].

[![Wie hast du OpenClaw gebaut und an OpenAI verkauft, Peter Steinberger?](https://img.youtube.com/vi/ufUUB7RIy18/hqdefault.jpg)](https://youtube.com/watch?v=ufUUB7RIy18&t=1047)
*Wie hast du OpenClaw gebaut und an OpenAI verkauft, Peter Steinberger? (17:27)*


- **01:56-02:26 — Boris Cherny on the 250% code-volume jump.** Short clip, big management lesson: gains show up when teams actually let people use the tools without turning experimentation into a permissions process [^6].

[![Tokenmaxxing: The Obsession Taking Over Silicon Valley Right Now](https://img.youtube.com/vi/qC4FdIe2TfM/hqdefault.jpg)](https://youtube.com/watch?v=qC4FdIe2TfM&t=115)
*Tokenmaxxing: The Obsession Taking Over Silicon Valley Right Now (1:55)*


- **Repos and tools worth studying:**
  - [`autoreview` SKILL.md](https://github.com/openclaw/agent-skills/blob/main/skills/autoreview/SKILL.md) — small public reference for long-running review/fix loops on messy refactors [^3].
  - [crabbox.sh](https://crabbox.sh) + [repo.bar](http://repo.bar) — one is the verification layer, the other is the quick issue browser feeding Codex [^2].
  - [Pi read-tool restore gist](https://gist.github.com/mitsuhiko/0f5ddf204a956c15c128ff85ae343b92) — useful if you care about making agent tools swappable instead of arguing endlessly about defaults [^14][^12].

*Editorial take: the durable edge today is not which agent wins — it is who has the better verification harness, repo rules, and scratch-log discipline around the agent.* [^1][^2][^4]

---

### Sources

[^1]: [Wie hast du OpenClaw gebaut und an OpenAI verkauft, Peter Steinberger?](https://www.youtube.com/watch?v=ufUUB7RIy18)
[^2]: [𝕏 post by @steipete](https://x.com/steipete/status/2058240758801420530)
[^3]: [𝕏 post by @steipete](https://x.com/steipete/status/2058307930613518698)
[^4]: [𝕏 post by @steipete](https://x.com/steipete/status/2058308112134635528)
[^5]: [𝕏 post by @ThePrimeagen](https://x.com/ThePrimeagen/status/2058217397782491458)
[^6]: [Tokenmaxxing: The Obsession Taking Over Silicon Valley Right Now](https://www.youtube.com/watch?v=qC4FdIe2TfM)
[^7]: [AI Agent: The Biggest Updates You Missed This Week \(Codex, Claude Code, Cursor\)](https://www.youtube.com/watch?v=52ltebR4Jnw)
[^8]: [𝕏 post by @gdb](https://x.com/gdb/status/2058232892266836141)
[^9]: [𝕏 post by @JustinBleuel](https://x.com/JustinBleuel/status/2058228412158758950)
[^10]: [𝕏 post by @thsottiaux](https://x.com/thsottiaux/status/2058071172361998482)
[^11]: [𝕏 post by @romainhuet](https://x.com/romainhuet/status/2058322493144973579)
[^12]: [𝕏 post by @badlogicgames](https://x.com/badlogicgames/status/2058112885369295217)
[^13]: [𝕏 post by @mitsuhiko](https://x.com/mitsuhiko/status/2058135544891158829)
[^14]: [𝕏 post by @mitsuhiko](https://x.com/mitsuhiko/status/2058135547017670728)