# Opus Autonomy Playbook and the Loop Backlash

*By Coding Agents Alpha Tracker • June 8, 2026*

Boris Cherny shared the clearest checklist yet for running coding agents autonomously for hours or days, while OpenAI’s Codex use-case push shows where AI teammates are actually landing today. The other big signal: senior engineers are embracing orchestration loops, but there is already backlash against treating that pattern as universal advice.

## 🔥 TOP SIGNAL

Long-running coding agents finally have an operator playbook. Boris Cherny says he is seeing benchmarks where **Opus** is the best model for long-running work, and he pairs that with a concrete setup for unattended runs: auto-permissions, dynamic workflows, `/goal` or `/loop`, cloud execution, and end-to-end self-verification [^1]. The bigger pattern is real—steipete, Armin Ronacher, and Boris are all pointing toward loops that prompt agents—but today’s best correction comes from ThePrimeagen: don’t turn an advanced orchestration pattern into universal advice [^2][^3][^4][^5].

## ⚡ TRY THIS

- **Boris Cherny — set up Opus for hours/days, not a single sitting.** 1) Enable auto mode for permissions. 2) Use dynamic workflows when the task needs hundreds or thousands of agents. 3) Start `/goal` or `/loop` so work continues until done. 4) Run Claude Code in the cloud via desktop/mobile so you can close the laptop. 5) Give the agent a way to verify end to end—Chrome for web, iOS/Android sim MCP for mobile, or a full backend/server start path [^1].

- **Loops are emerging as the power-user abstraction.** steipete’s pattern is blunt: stop manually prompting coding agents; design loops that prompt them instead. Armin Ronacher says he treats that view as “a glimpse into the future,” and Boris frames the same shift as moving from direct prompts to loops that figure out what to do [^2][^3][^4].

  > "99.9999% of you should in fact not being “looping” your agent" [^5]

  **Takeaway:** treat loops as an advanced orchestration technique, not a default requirement for every developer [^2][^4][^5].

- **OpenAI Codex — start with bounded teammate jobs.** OpenAI’s concrete examples are the right kind of starting point: review GitHub pull requests before human review, turn Figma designs into production-ready code, understand large codebases in minutes, automate bug triage and QA, deploy apps or websites from prompts, build Mac and iOS apps, and turn Slack threads into coding tasks [^6]. The use-cases page is here: [developers.openai.com/codex/use-cases](https://developers.openai.com/codex/use-cases) [^6].

- **Stress-test your evals with long-horizon tasks.** SWE-Marathon is designed to measure coherence across a **1B-token** budget on tasks like building Slack from scratch, rewriting a JAX codebase in PyTorch, and building a C compiler in Rust [^7][^1]. If you need examples of what long-horizon software work actually looks like, this is the best framing in today’s source set [^7][^1].

## 📡 WHAT SHIPPED

- **SWE-Marathon — benchmark announcement.** New benchmark for autonomous long-horizon software work, centered on 1B-token coherence and big build tasks. Thread: [x.com/rishi_desai2/status/2062930906818769356](https://x.com/rishi_desai2/status/2062930906818769356) [^7][^1].
- **Codex use-case roundup from OpenAI.** OpenAI is positioning Codex as an “AI teammate” across software engineering, design, data analysis, and operations, with concrete dev workflows listed in its use-cases doc. Doc: [developers.openai.com/codex/use-cases](https://developers.openai.com/codex/use-cases) [^6].
- **Comparison signal: Opus for long runs.** Boris Cherny says he is seeing multiple benchmarks showing Opus as the best model for long-running work [^1].

## 🎬 GO DEEPER

No video or podcast clips were available in today’s source set. These are the best primary links to read closely:

- **[Boris Cherny’s post on running Opus autonomously](https://x.com/bcherny/status/2063792263067754658)** — The most copyable post of the day. It compresses the current unattended-agent playbook into five operator decisions: permissions, orchestration, persistence, cloud runtime, and self-verification [^1].
- **[SWE-Marathon thread](https://x.com/rishi_desai2/status/2062930906818769356)** — Worth reading if your eval mindset is still stuck on short diffs. The task selection is the point: build a product, port a framework, or implement a compiler under a huge token budget [^7][^1].
- **[Codex use cases](https://developers.openai.com/codex/use-cases)** — Fastest way to scan OpenAI’s own picture of where Codex already fits inside a dev org. Good for picking one narrow workflow to test first [^6].
- **[steipete’s loop post](https://x.com/steipete/status/2063697162748260627), [Armin Ronacher’s response](https://x.com/mitsuhiko/status/2063754786302742911), and [ThePrimeagen’s rebuttal](https://x.com/ThePrimeagen/status/2063746333023043966)** — Read these together, not separately. The contrast is the signal: expert users are moving toward orchestration, and the backlash is against pretending that makes it beginner advice [^2][^3][^5].

*Editorial take: the edge is moving from better prompts to better runtime control—permissions, loops, and verification—but full agent looping still looks like an expert move, not a default workflow.* [^1][^2][^5]

---

### Sources

[^1]: [𝕏 post by @bcherny](https://x.com/bcherny/status/2063792263067754658)
[^2]: [𝕏 post by @steipete](https://x.com/steipete/status/2063697162748260627)
[^3]: [𝕏 post by @mitsuhiko](https://x.com/mitsuhiko/status/2063754786302742911)
[^4]: [𝕏 post by @rohanpaul_ai](https://x.com/rohanpaul_ai/status/2063289804708835412)
[^5]: [𝕏 post by @ThePrimeagen](https://x.com/ThePrimeagen/status/2063746333023043966)
[^6]: [𝕏 post by @suraj_sharma14](https://x.com/suraj_sharma14/status/2063589795813785841)
[^7]: [𝕏 post by @rishi_desai2](https://x.com/rishi_desai2/status/2062930906818769356)