# Load-Bearing Code Review, Cursor SDK, and Codex’s Expanding Surface

*By Coding Agents Alpha Tracker • April 30, 2026*

The real bottleneck has shifted from generating code to reviewing and orchestrating it. Today’s brief covers Anthropic’s code review playbook, Cursor’s new SDK, Codex’s widening integration surface, and practical workflow patterns from Pi, LLM 0.32a0, and hands-on builders.

## 🔥 TOP SIGNAL

Review is the bottleneck now. Cat Wu says Anthropic made AI code review load-bearing by front-loading planning, making authors own their PRs end-to-end, and running many agents to review across files for high recall; Peter Steinberger is pushing in the same direction in the open with Codex review inside Clawsweeper, which automerges and loops until no new issues are found [^1][^2].

Armin Ronacher and Andrej Karpathy land on the same meta-point from different angles: agents can outproduce human review capacity, so the winning pattern is better harnesses, deliberate friction, and human-owned specs—not blind dark-factory throughput [^3][^4].

## 🛠️ TOOLS & MODELS

- **Cursor SDK** — New release for building agents with the same runtime, harness, and models behind Cursor. It runs from CI/CD, workflow automations, or inside products; deploys locally or in Cursor’s cloud; and Cursor says customers including Rippling, Notion, C3 AI, and Faire are already using it for background agents, ticket-to-PR flows, and self-healing codebases [^5][^6][^7].
  - Starter repos: coding agent CLI, prototyping tool, agent-powered kanban board [^8].

- **LLM 0.32a0 alpha** — Simon Willison’s backwards-compatible refactor makes custom agent loops cleaner: inputs are message sequences; outputs stream as typed parts; and `reply()`, tool calling, and response serialization now work without hardwiring SQLite into the design [^9].

- **Pi** — Mario Zechner’s minimalist CLI agent is worth watching if you care about inspectable harnesses. The core is just read/write/edit/bash, but it is self-modifiable, has TypeScript hook points for custom tools and UI, and was built explicitly to avoid Claude Code and OpenCode behaviors like hidden context injection, pruning, and per-edit LSP noise [^3].

- **Codex is expanding past chat-with-a-repo** — Greg Brockman says Codex App has replaced the terminal as his primary interface; OpenAI is also pushing Codex into the design-to-app loop via GPT-5.5 + GPT-Image-2, into infra via Supabase support, and into other surfaces via App Server examples like Chromex [^10][^11][^12][^13][^14][^15].

- **Harness tuning is getting per-model** — LangChain Deep Agents added Harness Profiles so prompts, tools, and middleware can be adjusted per model instead of using one generic setup. Python is live now; TypeScript is next [^16][^17].
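The pattern is easy to picture outside any framework: key the harness settings by model and fall back to a conservative default. A minimal sketch; the profile fields, model names, and settings here are hypothetical illustrations, not Deep Agents’ actual API:

```python
from dataclasses import dataclass, field

@dataclass
class HarnessProfile:
    """Per-model harness settings: prompt, tool subset, middleware hooks."""
    system_prompt: str
    tools: list[str] = field(default_factory=list)
    middleware: list[str] = field(default_factory=list)

# Hypothetical profiles: adjust prompts, tools, and middleware per model
# instead of shipping one generic setup.
PROFILES = {
    "claude-sonnet": HarnessProfile(
        system_prompt="Plan before editing; cite files you touched.",
        tools=["read", "edit", "bash"],
        middleware=["trace-logger"],
    ),
    "gpt-5.5": HarnessProfile(
        system_prompt="Prefer small diffs; run tests after each change.",
        tools=["read", "edit", "bash", "browser"],
        middleware=["trace-logger", "diff-limiter"],
    ),
}

def harness_for(model: str) -> HarnessProfile:
    # Unknown models get a deliberately conservative default.
    return PROFILES.get(model, HarnessProfile(system_prompt="Be careful.", tools=["read"]))
```

The point of the indirection is that tuning one model’s harness cannot silently regress another’s.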

- **Tool-selection behavior is getting benchmarkable** — In an Amplifying survey across 4 project types, 20 categories, 3 runs each, and open-ended prompts, custom/DIY was the single most common primary recommendation, while GitHub Actions, Stripe, and Shadcn/UI were near-monopoly picks. Context mattered more than phrasing, and newer models picked newer tools. Claude Code still made bad stack-specific calls, like recommending Bun as the runtime for a Next.js project where Codex kept Node and flagged Bun as beta on Vercel [^18].

## 💡 WORKFLOWS & TRICKS

- **Anthropic’s load-bearing review recipe**
  1. Use the planning stage to teach architecture and the right verification method before code exists.
  2. Make the author own the PR end-to-end instead of dumping AI-generated diffs on a reviewer.
  3. Run the heavy review mode: multiple agents tracing across files, not just the diff.
  4. Optimize for bug recall: Cat Wu says this caught a ZFS type mismatch and an auth change that would have broken authentication before merge [^1].

- **Trace loop, not vibe loop** — LangChain’s recipe is simple: get traces, enrich them, improve from them, repeat. Good reminder that agent improvement should run on observed failures, not intuition alone [^19].

- **Message-native agent loop with LLM 0.32a0**
  - Seed prior conversation with `messages=[user(...), assistant(...), user(...)]` instead of inventing your own transcript format.
  - Stream typed events like `text`, `tool_call_name`, and `tool_call_args`.
  - Use `response.execute_tool_calls()` or `response.reply()` to continue the loop.
  - Persist state with `response.to_dict()` / `Response.from_dict()` when you need storage, without locking yourself to SQLite [^9].

- **Riley Brown’s multimodal context stack**
  - **Whisper Flow** for rapid voice prompts across apps [^20]
  - **Raycast** clipboard history for pasting links and images into Codex [^20]
  - **CleanShot X** annotated screenshots and screen recordings for precise visual feedback [^20]
  His core point is the right one: agents are only as good as the context you feed them, and visuals beat vague text when you want UI changes [^20].

- **When the tool you want does not exist, build the tiny one** — Riley used one Codex prompt to create an Electron desktop app for Google Docs comments with image/video uploads and Firebase storage. His takeaway is direct: if the gap is specific and recurring, ask the agent to build the small tool that does exactly that [^20].

- **CLI vs. MCP: choose based on composability** — The Pi team argues that CLI pipes and code execution compose better than MCP when tool counts or API surfaces get large, because the model can work from the result instead of transforming everything inside context. Their compensating controls are specialized harnesses, agent self-validation, and human-in-the-loop gates [^3].
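The composability claim fits in one call: let a shell pipeline reduce a large output to a small result, so only the result ever enters the model’s context. A toy sketch (the commands and data are illustrative; assumes a POSIX shell):

```python
import subprocess

# A pipeline condenses many lines of "commit history" down to one number.
# Handing the model `fix_count` instead of the raw log is the whole point.
result = subprocess.run(
    "printf 'fix: auth\\nfeat: ui\\nfix: db\\n' | grep '^fix' | wc -l",
    shell=True, capture_output=True, text=True,
)
fix_count = int(result.stdout.strip())  # only this value needs to enter context
```

An MCP server returning the full log would push the same reduction work into the model’s context window instead.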

- **Two small but sharp hacks**
  - In Codex CLI, @embirico shows you can extract GPT-5.5 base instructions from `~/.codex/models_cache.json`, edit or filter them, then point Codex at the modified file with `model_instructions_file` [^21].
  - For persistent memory, here.now added private storage with prompts like `after a session, save memory to /context in my drive`; Ben Tossell says Codex synced 2.5k files into a cloud agent drive [^22][^23].
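For the first hack, the mechanical part is a read-filter-write step. A hedged sketch using a throwaway copy; the JSON layout of `models_cache.json` shown here is a guess for illustration, so inspect the real file under `~/.codex/` before editing anything:

```python
import json
import pathlib
import tempfile

# Throwaway stand-in for ~/.codex/models_cache.json. The "instructions"
# key is an ASSUMED layout, not the documented format.
workdir = pathlib.Path(tempfile.mkdtemp())
cache = workdir / "models_cache.json"
cache.write_text(json.dumps({
    "instructions": "Always plan first.\nUse apply_patch for edits.\nKeep diffs small."
}))

data = json.loads(cache.read_text())
# Filter out the lines you want to override; keep the rest verbatim.
kept = "\n".join(
    line for line in data["instructions"].splitlines()
    if "apply_patch" not in line
)

override = workdir / "my_instructions.txt"
override.write_text(kept)
# Then point Codex at the edited file via the model_instructions_file
# setting, per @embirico's post.
```

Keeping the untouched lines verbatim matters: the goal is a targeted override, not a rewrite of the base instructions.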

## 👤 PEOPLE TO WATCH

- **Cat Wu** — Best current practitioner signal on AI code review as a real production control, not a demo feature. Her examples are about bug recall, review scope, and ownership—the hard parts teams are actually struggling with [^1].

- **Mario Zechner + Armin Ronacher** — Probably the strongest conversation today on harness design: why minimal cores matter, why self-modification can be a feature, and why more agents is often the wrong answer when review and context are the real constraints [^3].

- **Andrej Karpathy** — Useful because he keeps separating vibe coding from agentic engineering. His practical bar: humans still own spec, taste, and oversight; agents handle the tedious API/detail layer [^4].

- **Simon Willison** — Still one of the best builders to watch if you want reusable agent primitives instead of app-specific magic. LLM 0.32a0 is a clean example: small API changes that remove real friction [^9].

- **Theo** — High signal when you want tool-comparison evidence instead of vibes. He is surfacing both broad recommendation patterns across models and weird edge-case failures like Claude Code choking on OpenClaw-related commit history [^18][^24].

## 🎬 WATCH & LISTEN

- **0:00-2:59 — Cat Wu on making AI code review load-bearing.** Planning, author ownership, and high-recall multi-agent review in one compact segment [^1].


[![Making AI Code Review a Loadbearing Part of Your Process with Cat Wu](https://img.youtube.com/vi/1eBxpDE35Gk/hqdefault.jpg)](https://youtube.com/watch?v=1eBxpDE35Gk&t=0)
*Making AI Code Review a Loadbearing Part of Your Process with Cat Wu (0:00)*


- **19:35-22:10 — Karpathy on where agents still need adults in the room.** Great segment on taste, invariants, persistent user IDs, and why humans still have to own the spec [^4].


[![Andrej Karpathy: From Vibe Coding to Agentic Engineering](https://img.youtube.com/vi/96jN2OCOfLs/hqdefault.jpg)](https://youtube.com/watch?v=96jN2OCOfLs&t=1175)
*Andrej Karpathy: From Vibe Coding to Agentic Engineering (19:35)*


- **10:07-14:13 — Codex driving Paper in real time.** Good watch if you care about multimodal agent UX: one prompt generates multiple UI directions, then iterates off linked components and inline image edits [^20].


[![7 Tools That Make Codex 10x More Powerful](https://img.youtube.com/vi/SNAlFLV9MBE/hqdefault.jpg)](https://youtube.com/watch?v=SNAlFLV9MBE&t=607)
*7 Tools That Make Codex 10x More Powerful (10:07)*


## 📊 PROJECTS & REPOS

- **[Cursor cookbook](https://github.com/cursor/cookbook)** — Starter projects for a coding agent CLI, prototyping tool, and agent-powered kanban board. Adoption signal is stronger than the repo itself: Cursor says Rippling, Notion, C3 AI, and Faire are already using the SDK for background agents, ticket-to-PR, and self-healing workflows [^8][^7].

- **[Chromex](https://github.com/GENEXIS-AI/chromex)** — Open-source example of embedding Codex App Server into Chrome using your ChatGPT account. Good concrete repo for the pattern Greg Brockman is explicitly pushing: build your own agents with Codex App Server [^14][^15].

- **[Clawsweeper](https://github.com/openclaw/clawsweeper)** — Now has Codex review integrated with automerge and a loop that runs until it stops finding new issues. Good repo to study if you are building review-first automation around PRs [^2].

- **[LLM 0.32a0](https://llm.datasette.io/en/latest/changelog.html#a0-2026-04-28)** + **[plugin system](https://llm.datasette.io/en/stable/plugins/index.html)** + **[llm-anthropic](https://github.com/simonw/llm-anthropic)** — Not a flashy agent framework, but a strong substrate if you want message-native conversations, typed streaming parts, and tool calling without locking yourself into one provider or persistence layer [^9].

*Editorial take: today’s durable gains are showing up above the model layer—in review loops, context packaging, and per-model harness tuning* [^1][^19][^16][^4].

---

### Sources

[^1]: [Making AI Code Review a Loadbearing Part of Your Process with Cat Wu](https://www.youtube.com/watch?v=1eBxpDE35Gk)
[^2]: [𝕏 post by @steipete](https://x.com/steipete/status/2049518771023360010)
[^3]: [Building Pi, and what makes self-modifying software so fascinating](https://www.youtube.com/watch?v=n5f51gtuGHE)
[^4]: [Andrej Karpathy: From Vibe Coding to Agentic Engineering](https://www.youtube.com/watch?v=96jN2OCOfLs)
[^5]: [𝕏 post by @cursor_ai](https://x.com/cursor_ai/status/2049499866217185492)
[^6]: [𝕏 post by @cursor_ai](https://x.com/cursor_ai/status/2049499871980200082)
[^7]: [𝕏 post by @cursor_ai](https://x.com/cursor_ai/status/2049499876388454903)
[^8]: [𝕏 post by @cursor_ai](https://x.com/cursor_ai/status/2049499874043830389)
[^9]: [LLM 0.32a0 is a major backwards-compatible refactor](https://simonwillison.net/2026/Apr/29/llm)
[^10]: [𝕏 post by @gdb](https://x.com/gdb/status/2049477747882254775)
[^11]: [𝕏 post by @romainhuet](https://x.com/romainhuet/status/2049597180474970179)
[^12]: [𝕏 post by @coreyching](https://x.com/coreyching/status/2049576335157416115)
[^13]: [𝕏 post by @romainhuet](https://x.com/romainhuet/status/2049593209308004585)
[^14]: [𝕏 post by @gdb](https://x.com/gdb/status/2049609076351381580)
[^15]: [𝕏 post by @arrakis_ai](https://x.com/arrakis_ai/status/2049484893877637359)
[^16]: [𝕏 post by @LangChain](https://x.com/LangChain/status/2049540926603718969)
[^17]: [𝕏 post by @LangChain_OSS](https://x.com/LangChain_OSS/status/2049539590990557381)
[^18]: [Claude Code's favorite tech stack](https://www.youtube.com/watch?v=v1MptV67kSI)
[^19]: [𝕏 post by @LangChain](https://x.com/LangChain/status/2049589573550809347)
[^20]: [7 Tools That Make Codex 10x More Powerful](https://www.youtube.com/watch?v=SNAlFLV9MBE)
[^21]: [𝕏 post by @embirico](https://x.com/embirico/status/2049696160642990549)
[^22]: [𝕏 post by @adamludwin](https://x.com/adamludwin/status/2049553336861679719)
[^23]: [𝕏 post by @bentossell](https://x.com/bentossell/status/2049577922420207819)
[^24]: [𝕏 post by @theo](https://x.com/theo/status/2049645973350363168)