# Coding Agents Move Beyond Codegen Into Testing, Triage, and PR Ops

*By Coding Agents Alpha Tracker • May 13, 2026*

The strongest signal today: single-agent codegen is hitting a ceiling, while teams wiring agents into testing, review, and triage are seeing bigger gains. This brief covers concrete workflows to copy now, plus the key releases and projects worth tracking across Codex, Claude Code, OpenClaw, Bun, Executor, and Simon Willison’s tooling.

## 🔥 TOP SIGNAL

- **If your agents only write code, you're optimizing the wrong bottleneck.** Cursor says agent requests are up **15x YoY** and now exceed tab accepts [^1]. Internally, even with **98% of merged commits** written by AI, productivity seems to stall around **40%** when agents stay in single-step coding-assistant mode; the bigger gains come from removing the testing, review, and handoff bottlenecks around the code [^2]. Intuit is seeing the org-level version of that shift with a **3x idea-to-release** push from agent-driven development in the last 90-120 days [^3].

## ⚡ TRY THIS

- **Review proof before diff.** In Cursor's internal workflow, let a cloud agent write the code *and* run the tests, then review the returned **video / screenshots / logs** first. If the behavior looks right, take control of the remote dev environment for edge cases, then open the PR and bounce follow-up comments back to the agent [^4].

- **Push PR babysitting to the cloud.** Once a local agent gets you to `PR created`, click **Babysit PR in cloud** and let the cloud agent own the slow loop: CI failures, merge conflicts, and review comments. Only pull the human back in for ambiguous decisions [^4].

- **Debug by asking for a repro artifact, not a fix.** Simon Willison used Codex CLI with GPT-5.5 xhigh to generate a **minimal Dockerfile** that reproduced a Datasette segfault [^5]. Steal the pattern: first ask the agent for the smallest failing environment, then iterate on the bug from there. See [the issue comment](https://github.com/simonw/datasette/issues/2709#issuecomment-4435604727) [^5].

- **Turn your best prompt into a reusable skill.** Cursor says a precise `how` skill for rigorous codebase inspection was the missing piece in their bug-triage agents [^2]. Harrison Chase says the packaging pattern is shifting from piles of sub-agents toward **skills** that wrap tools and data cleanly [^6]. Practical move: extract your best bug-intake, codebase-inspection, or security-review routine into a named skill and call it first instead of pasting a giant prompt every time.

## 📡 WHAT SHIPPED

- **Codex app — in-app browser upgrades.** Codex can now test across viewport sizes from the in-app browser, capture key screenshots during long runs, hide the IAB to disable animations for **1-2x faster** testing, and send annotations faster with lower token use [^7][^8].

- **Claude Code 2.1.139 — `/goal`.** Set a completion condition and Claude keeps working across turns until it's met; works in interactive, `-p`, and Remote Control modes [^9].

- **Cursor — Fast mode for Claude Opus 4.7.** Faster by **2.5x** at **6x** the cost; Cursor recommends standard speed for most tasks [^10].

- **OpenClaw beta — unified structured-file pathing.** `openclaw path read|write|append` now works the same across `md`, `jsonc`, `jsonl`, and `yaml`, giving plugins and agents one addressing substrate for surgical edits. Study the PR: [#78678](https://github.com/openclaw/openclaw/pull/78678) [^11]. Peter Steinberger says Microsoft is helping get OpenClaw ready for enterprise use [^12].

- **Bun Rust rewrite — emerging agent migration case study.** Theo reports Jared's parallel-agent Zig→Rust experiment has already hit **99.8%** of Bun's pre-existing Linux x64 glibc test suite, with a **960k-line rewrite in 6 days** [^13]. The catch: the current port still carries **13,044** `unsafe` calls, and Theo's warning is fair — this may trade known Zig issues for unknown Rust ones [^13].

- **Executor desktop app.** Add MCPs, OpenAPIs, and GraphQL servers once and make them available to every agent; Executor converts them into **code mode** under the hood, aims to support thousands of tools without context bloat, and keeps everything local [^14][^15].

- **`llm` 0.32a2.** Simon Willison's CLI now supports OpenAI's `/v1/responses` endpoint, which matters because it enables **interleaved reasoning across tool calls** for GPT-5-class models. It also exposes summarized reasoning tokens, hideable with `-R` / `--hide-reasoning`. Release: [0.32a2](https://github.com/simonw/llm/releases/tag/0.32a2) [^16].

## 🎬 GO DEEPER

- **Build → test → artifact review (0:51-3:11).** Jonas shows the cleanest short demo in today's batch: one cloud agent writes the change, tests it, and returns a video proof so the human starts with behavior instead of a giant diff [^4].


[![What happens when agents get their own computers](https://img.youtube.com/vi/_GljcHROPX8/hqdefault.jpg)](https://youtube.com/watch?v=_GljcHROPX8&t=50)
*What happens when agents get their own computers (0:50)*


- **Slack bug report → triage pipeline (7:33-10:48).** This is the best clip today on chaining a codebase-inspection skill with follow-up questions so vague bug reports become structured triage and assignment input [^2].


[![How Cursor builds agentic workflows across the SDLC](https://img.youtube.com/vi/dJAVS1g3NDw/hqdefault.jpg)](https://youtube.com/watch?v=dJAVS1g3NDw&t=453)
*How Cursor builds agentic workflows across the SDLC (7:33)*


- **From coder to agent manager (5:57-7:53).** Michael Truell's `ghost colleagues` framing is worth watching because it explains the real workflow shift: less syntax, more delegation, review, testing, and parallelism — plus a clear warning about unsustainable AI-generated architecture if you skip review [^1].


[![The next era of AI coding](https://img.youtube.com/vi/8h9j2rskP14/hqdefault.jpg)](https://youtube.com/watch?v=8h9j2rskP14&t=356)
*The next era of AI coding (5:56)*


- **Artifact to study — [Simon’s minimal Dockerfile repro](https://github.com/simonw/datasette/issues/2709#issuecomment-4435604727).** Small artifact, big lesson: make the failure portable first, then let the agent help you reason about it [^5].

- **PR to study — [OpenClaw #78678](https://github.com/openclaw/openclaw/pull/78678).** A compact example of a useful agent-infra idea: give agents and plugins one shared way to address structured files so edits stop being brittle [^11].

- **PR to study — [llm #1435](https://github.com/simonw/llm/pull/1435).** If you build agent tooling, this is the plumbing change that matters: `/v1/responses` support and interleaved reasoning across tool calls [^16].

*Editorial take: the edge is moving from better code generation to better orchestration — proof artifacts, reusable skills, and explicit human checkpoints are where the real compounding gains are showing up* [^2][^4][^6].

---

### Sources

[^1]: [The next era of AI coding](https://www.youtube.com/watch?v=8h9j2rskP14)
[^2]: [How Cursor builds agentic workflows across the SDLC](https://www.youtube.com/watch?v=dJAVS1g3NDw)
[^3]: [How Intuit, DoorDash, and Atlassian are adopting AI coding](https://www.youtube.com/watch?v=aF-rolD9W7I)
[^4]: [What happens when agents get their own computers](https://www.youtube.com/watch?v=_GljcHROPX8)
[^5]: [datasette 1.0a29](https://simonwillison.net/2026/May/12/datasette)
[^6]: [Harrison Chase & Rajeev Dham on AI Agents, Memory & the Future of LangChain | Startup Grind 2026](https://www.youtube.com/watch?v=WrQ9bsO0YEY)
[^7]: [𝕏 post by @JamesZmSun](https://x.com/JamesZmSun/status/2054391975403774431)
[^8]: [𝕏 post by @thsottiaux](https://x.com/thsottiaux/status/2054398093886673260)
[^9]: [𝕏 post by @dani_avila7](https://x.com/dani_avila7/status/2053945243582488728)
[^10]: [𝕏 post by @cursor_ai](https://x.com/cursor_ai/status/2054274305345618163)
[^11]: [𝕏 post by @OmarShahine](https://x.com/OmarShahine/status/2054434090422812808)
[^12]: [𝕏 post by @steipete](https://x.com/steipete/status/2054435647650246967)
[^13]: [I wish this was clickbait](https://www.youtube.com/watch?v=gILMoijqeGA)
[^14]: [𝕏 post by @RhysSullivan](https://x.com/RhysSullivan/status/2054365791060218314)
[^15]: [𝕏 post by @kentcdodds](https://x.com/kentcdodds/status/2054366782656561256)
[^16]: [llm 0.32a2](https://simonwillison.net/2026/May/12/llm)