# Loop Engineering, Record & Replay, and New Automation Primitives

*By Coding Agents Alpha Tracker • June 19, 2026*

The strongest coding-agent signal today is the shift from manual prompting to durable loops. This brief covers the concrete workflows behind self-driving PRs, shared-state agent harnesses, and the latest releases from Codex, Cursor, Claude Code, LangSmith, and Datasette.

## 🔥 TOP SIGNAL

The clearest shift today is from manual prompting to loop design. Theo showed Codex clearing stale PRs overnight, and he woke up to four stacked PRs reviewed and merged [^1][^2]. Jason Zhou described support and SEO loops already running in production on 30-minute and daily cadences [^3], and Steve Yegge’s write-up of Ezra Savard’s Netflix study treats single-agent and multi-agent use as distinct literacy jumps with dedicated training for each [^4]. The common pattern across Addy Osmani and Geoffrey Huntley: the advantage is not a better one-shot prompt but a harness that can sleep, checkpoint state, recycle context, and use a separate evaluator [^5][^6].

## ⚡ TRY THIS

- **Run a repo-maintainer loop instead of a cleanup sprint.** Steipete’s exact pattern is: tell Codex to maintain your repos, wake every 5 minutes, and direct work to threads; back it with an orchestrator plus triage, autoreview, and computer-use skills [^7]. Theo’s concrete use: let the loop close useless stale PRs, revive the worthwhile ones, then give each revived PR one build thread and one review thread; for a big migration, he also raised Codex subagent parallelism from 3 to 20 and set a sharply defined goal [^1][^8]. Study the exact skill docs here: [maintainer-orchestrator](https://github.com/steipete/agent-scripts/blob/main/skills/maintainer-orchestrator/SKILL.md) and [github-project-triage](https://github.com/steipete/agent-scripts/blob/main/skills/github-project-triage/SKILL.md) [^7].

- **Move PR review handling off your keyboard.** Theo’s next step was giving a PR its own worktree on another machine, then telling the agent to watch for comments, address them, and keep going; one run kept working for 6+ hours [^2]. Once the code is written, have the agent run the dev server, verify behavior, commit, push the PR, fetch review comments itself, and even spin up reviewer threads; Theo’s dynamic loop created PRs, re-reviewed each new SHA, merged, and triggered the next PR automatically [^2]. Watch token burn on bad branches: Theo saw one feedback loop chew through 3M+ tokens on a small set of comments [^2].

- **Turn a good one-off run into a shared-state loop.** Jason Zhou’s setup flow is practical: manually run the task once, calibrate the behavior, then ask the agent to create a README contract with the goal, workflow, timeline, and schema before wiring a recurring trigger [^3]. Put outputs into shared folders for artifacts, signals, and tasks so other loops can read/write the same state, and add a global `worklog.md` so each agent reads the last 5-10 entries before starting [^3]. Triggers can be cron jobs, webhooks, or other agents [^3].

- **Split planner / builder / reviewer at both the agent and model layers.** Addy Osmani’s minimum bar for long-running agents is true sleep via events, durable checkpoints on every transition, and a separate evaluator because self-review overrates quality [^5]. Matthew Berman’s concrete implementation is model routing as a skill: plan with Fable, write with Composer, then review with GPT-5.5 [^9]. Geoffrey Huntley’s simpler orchestrator constraint is also worth stealing: allow one task only, recycle the context window after each task, and progress state with git commits plus a todo list [^6].
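The shared `worklog.md` idea from the shared-state loop above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Jason Zhou's implementation: the source does not specify an entry format, so the `## ` entry delimiter, the timestamp layout, and the function names `read_recent_entries` and `append_entry` are all hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumed entry delimiter; the source does not define a worklog format.
ENTRY_PREFIX = "## "


def read_recent_entries(worklog: Path, limit: int = 10) -> list[str]:
    """Return up to `limit` of the most recent entries, oldest first."""
    if limit <= 0 or not worklog.exists():
        return []
    entries: list[str] = []
    for line in worklog.read_text(encoding="utf-8").splitlines():
        if line.startswith(ENTRY_PREFIX):
            entries.append(line)            # a heading line starts a new entry
        elif entries:
            entries[-1] += "\n" + line      # body line of the current entry
    return [entry.strip() for entry in entries[-limit:]]


def append_entry(worklog: Path, agent: str, summary: str) -> None:
    """Append one timestamped entry so the next loop run can read it."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with worklog.open("a", encoding="utf-8") as fh:
        fh.write(f"{ENTRY_PREFIX}{stamp} {agent}\n{summary}\n\n")
```

A loop would call `read_recent_entries(path, limit=10)` before starting work and `append_entry(...)` when it finishes, so every agent sharing the folder sees the same recent history.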

## 📡 WHAT SHIPPED

- **Codex — Record & Replay.** OpenAI shipped a new primitive for teaching Codex by demonstration: record a recurring task once, stop recording when you want, and Codex turns the session into an inspectable, editable skill [^10]. Greg Brockman framed it as teaching Codex by demonstration, and Nick Baumann says he’s already using it for calendar formatting, PR-to-Slack posting, and onboarding-flow testing [^11][^12].
- **Cursor — `/automate` + new triggers.** Cursor added a plain-language `/automate` skill that configures triggers, instructions, and tools for you, plus Slack emoji triggers, GitHub triggers for issues/reviews/workflow runs, and computer use for cloud agents [^13][^14][^15]. Changelog: [cursor.com/changelog/06-18-26](http://cursor.com/changelog/06-18-26) [^15].
- **Claude Code — Artifacts (beta).** Team and Enterprise users can turn a session into an interactive page like a PR walkthrough or living project dashboard, then share it via private link [^16]. Boris Cherny says he’s using it for visual explanations of tricky code, system diagrams, animation previews, and shared dashboards; Mike Krieger’s tip is to ask Claude to diagram its work as tasks get deeper and more independent; @_catwu says teams are already using it to share architecture changes, analyses, and prototypes [^17][^18][^19].
- **LangSmith — LLM Gateway.** LangChain launched a gateway positioned as a budget guardrail against agents burning through large LLM bills overnight [^20]. Link: [Introducing LLM Gateway](https://www.langchain.com/blog/introducing-llm-gateway) [^20]. Timely context: Theo said his Codex loops drove more than $20,000 in inference over 48 hours [^21].
- **Datasette Agent / Datasette Apps.** Simon Willison’s latest write-up shows a coding-agent workflow that’s unusually clean: describe an app in chat, let the agent call `describe_table`, then `app_create`, and generate a single-file HTML app against a constrained API [^22]. His build stack is also a useful comparison point: Claude Opus 4.6 for the first plugin, Codex Desktop + GPT-5.5 for planning, and Claude Fable 5 for security review—which caught a real CSP privilege-escalation issue [^22].
- **GLM-5.2.** Simon notes the 753B MoE model has a 1M context window, open weights under MIT, ranks #2 on the Code Arena WebDev leaderboard behind only Claude Fable 5, and is listed on OpenRouter around $1.40 / $4.40 per million tokens input/output [^22]. In his testing it did especially well on animated SVG output, though one more complex illustration regressed versus GLM-5.1 [^22].
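To make the budget concern concrete, here is a rough sketch of per-run cost arithmetic with a hard spending cap. It is illustrative only and is not the LangSmith LLM Gateway API: `run_cost_usd`, `BudgetGuard`, and `BudgetExceeded` are hypothetical names, and the default prices are the approximate GLM-5.2 OpenRouter figures quoted above ($1.40 input / $4.40 output per million tokens).

```python
def run_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 1.40,   # approximate figure quoted in the brief
    output_price_per_m: float = 4.40,  # approximate figure quoted in the brief
) -> float:
    """Cost in USD for one call, given per-million-token prices."""
    return (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )


class BudgetExceeded(RuntimeError):
    """Raised when cumulative spend passes the configured limit."""


class BudgetGuard:
    """Tracks cumulative spend and stops a loop once the cap is crossed."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Record one call's cost; raise if the total now exceeds the cap."""
        self.spent_usd += run_cost_usd(input_tokens, output_tokens)
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.limit_usd:.2f}"
            )
        return self.spent_usd
```

At these prices, 3M input tokens alone cost about $4.20, so a loop that calls `guard.charge(...)` after every model response can halt a runaway branch long before the bill becomes a surprise.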

## 🎬 GO DEEPER

- **12:28-13:26 — Theo on loops that create more loops.** Short demo of the agentic endgame: one thread makes the PR, another reviews each new SHA, fixes get re-reviewed, then the PR merges and the next one starts [^2].


[![I guess we're writing loops now?](https://img.youtube.com/vi/iJVJwmCKW9o/hqdefault.jpg)](https://youtube.com/watch?v=iJVJwmCKW9o&t=748)
*I guess we're writing loops now? (12:28)*


- **18:24-19:29 — AI Jason on the handoff from manual run to production loop.** He shows the exact move most people skip: test the workflow once, then make the agent write a README contract and wire the recurring trigger around it [^3].


[![After spent 30+ hrs building loops...](https://img.youtube.com/vi/W6x-hb44C0c/hqdefault.jpg)](https://youtube.com/watch?v=W6x-hb44C0c&t=1104)
*After spent 30+ hrs building loops... (18:24)*


- **1:03-3:17 — Addy Osmani on why long-running agents fail.** Compact explanation of the three requirements: event-driven sleep, durable checkpoints, and a separate evaluator instead of self-grading [^5].
- **1:33-2:29 — Geoffrey Huntley on Ralph loops.** Good antidote to the `while true` meme: single-task constraint, context recycling, and state progression via git commit + todo list [^6].
- **Read Steve Yegge’s Netflix training note:** [The Flat Curve Society](https://steve-yegge.medium.com/the-flat-curve-society-36c8b01eb33b?source=rss-c1ec701babb7------2). Useful if you’re rolling agents out to a team: 0M / 4M / 12-15M qualified-day token cohorts, team-based training, and the shift from raw spend metrics to waste reduction and pocket evals [^4].
- **Study the exact skills behind the maintainer loop:** [maintainer-orchestrator](https://github.com/steipete/agent-scripts/blob/main/skills/maintainer-orchestrator/SKILL.md) and [github-project-triage](https://github.com/steipete/agent-scripts/blob/main/skills/github-project-triage/SKILL.md). These are the concrete skill docs steipete says he combines with triage, autoreview, and computer use so work can land autonomously [^7].
- **Study [Datasette Agent](https://agent.datasette.io/) + the [Datasette Apps article](https://simonw.substack.com/p/datasette-apps-host-custom-html-applications).** It’s a strong example of an agent with explicit tools, constrained APIs, and a copyable prompt template that other models can reuse [^22].
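The single-task constraint and durable checkpoints covered in the Osmani and Huntley clips can be sketched as follows. This is a simplified illustration, not either author's code: it swaps Huntley's git commits for an atomically written JSON file, and the state layout (`todo`, `name`, `done`, `result`) plus the function names `save_checkpoint` and `run_one_task` are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state atomically: temp file in the same directory, then rename."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp, path)  # a crash before this line leaves the old file intact


def run_one_task(path: Path, worker: Callable[[str, list], str]) -> Optional[str]:
    """Run the first pending todo item with a fresh context.

    Returns the task name, or None when nothing is pending.
    """
    state = json.loads(path.read_text(encoding="utf-8"))
    pending = next((t for t in state["todo"] if not t["done"]), None)
    if pending is None:
        return None
    context: list = []  # recycled context: nothing carries over between tasks
    pending["result"] = worker(pending["name"], context)
    pending["done"] = True
    save_checkpoint(path, state)  # checkpoint on every transition
    return pending["name"]
```

Each call handles exactly one task and persists progress before returning, so an external trigger such as a cron job or webhook can invoke `run_one_task` repeatedly and resume cleanly after a crash.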

*Editorial take: the winners are starting to look less like prompt whisperers and more like workflow engineers with budgets, checkpoints, and reusable state [^3][^5][^20].*

---

### Sources

[^1]: [𝕏 post by @theo](https://x.com/theo/status/2067688557448470761)
[^2]: [I guess we're writing loops now?](https://www.youtube.com/watch?v=iJVJwmCKW9o)
[^3]: [After spent 30+ hrs building loops...](https://www.youtube.com/watch?v=W6x-hb44C0c)
[^4]: [The Flat Curve Society](https://steve-yegge.medium.com/the-flat-curve-society-36c8b01eb33b?source=rss-c1ec701babb7------2)
[^5]: [3 patterns to build long-running AI agents](https://www.youtube.com/watch?v=l6KeLCuB90o)
[^6]: [Dark Factories, Cargo Cult AI, and Drunk Agents with @GeoffreyHuntley](https://www.youtube.com/watch?v=b5NE-tEkUCo)
[^7]: [𝕏 post by @steipete](https://x.com/steipete/status/2064998499780084154)
[^8]: [𝕏 post by @theo](https://x.com/theo/status/2067690174725886062)
[^9]: [I figured out the best way to vibe code](https://www.youtube.com/watch?v=wwfJlSF34n8)
[^10]: [𝕏 post by @OpenAIDevs](https://x.com/OpenAIDevs/status/2067681320281723113)
[^11]: [𝕏 post by @gdb](https://x.com/gdb/status/2067700691062464887)
[^12]: [𝕏 post by @nickbaumann_](https://x.com/nickbaumann_/status/2067683680177504624)
[^13]: [𝕏 post by @cursor_ai](https://x.com/cursor_ai/status/2067683814516858962)
[^14]: [𝕏 post by @cursor_ai](https://x.com/cursor_ai/status/2067683817113137173)
[^15]: [𝕏 post by @cursor_ai](https://x.com/cursor_ai/status/2067683818488930393)
[^16]: [𝕏 post by @claudeai](https://x.com/claudeai/status/2067671912038240487)
[^17]: [𝕏 post by @bcherny](https://x.com/bcherny/status/2067700226669060207)
[^18]: [𝕏 post by @mikeyk](https://x.com/mikeyk/status/2067674333502427620)
[^19]: [𝕏 post by @_catwu](https://x.com/_catwu/status/2067674836726694200)
[^20]: [𝕏 post by @LangChain](https://x.com/LangChain/status/2067639147917902331)
[^21]: [𝕏 post by @theo](https://x.com/theo/status/2067796681387864135)
[^22]: [Datasette Apps: Host custom HTML applications inside Datasette](https://simonw.substack.com/p/datasette-apps-host-custom-html-applications)