ZeroNoise
After “coding is solved”: plan-first, parallel-agent ops, and sandboxing become the workflow
Feb 20
6 min read
147 docs
Boris Cherny’s strongest claim yet: coding (for his work) is “largely solved,” and the real frontier is end-to-end agentic ops—backed by +200% PR productivity and Claude reviewing 100% of PRs. Plus: Cursor’s cross-OS agent sandboxing, Claude Code perf/regression signals, and new lightweight OpenClaw clones worth cloning.

🔥 TOP SIGNAL

Boris Cherny (Head of Claude Code) is blunt: for the kinds of programming he does, “coding is largely solved”, and the frontier is shifting to adjacent, end-to-end agentic work (project management, paying tickets, general ops) rather than better IDE autocomplete. In that world, throughput isn’t hypothetical: he says Anthropic saw +200% productivity per engineer (measured in PRs), and Claude now reviews 100% of pull requests (with human review still in the loop).

🛠️ TOOLS & MODELS

  • Claude Code — stability + performance signals

    • v2.1.47: long-running sessions use less memory.
    • Team guidance: keep reporting issues and they’ll fix them.
    • Practitioner complaint: Theo reports Claude Code has “regressed an absurd amount” with UI/feedback issues (timestamps not updating, missing “thinking,” multi-minute hangs with 0 output) and suggests it “needs to be rewritten from scratch.”
  • Cursor — agent sandboxing shipped across desktop OSes

    • Cursor says it rolled out agent sandboxing on macOS, Linux, and Windows over the last three months.
    • Mechanism: agents run freely inside a sandbox, only requesting approval when they need to step outside it.
    • Implementation write-up: http://cursor.com/blog/agent-sandboxing.
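The "run freely inside, ask before stepping outside" mechanism can be illustrated with a small application-level sketch. This is not Cursor's implementation (the write-up linked above covers that); `run_in_sandbox`, `ask_approval`, and the path-based policy are hypothetical names and assumptions used only to show the shape of the idea.

```python
from pathlib import Path
from typing import Callable


def run_in_sandbox(
    target: str,
    sandbox_root: str,
    action: Callable[[Path], str],
    ask_approval: Callable[[Path], bool],
) -> str:
    """Run `action` on `target`, asking for approval only outside the sandbox.

    Hypothetical policy: a path counts as "inside" when it resolves to the
    sandbox root or to something beneath it.
    """
    root = Path(sandbox_root).resolve()
    # Resolve ".." segments (and symlinks) so escapes such as
    # "../secrets" are detected rather than trusted.
    path = (root / target).resolve()
    inside = path == root or root in path.parents
    if not inside and not ask_approval(path):
        raise PermissionError(f"denied: {path} is outside {root}")
    return action(path)
```

Inside the root, `ask_approval` is never called, which is what keeps the agent unblocked for routine work; the prompt only appears at the boundary.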
  • OpenAI Codex — pricing/availability + compute pressure

    • @thsottiaux: Codex is included with a ChatGPT subscription (even Plus has “very generous” usage); they attribute this to gpt-5.3-codex achieving “SoTA at lower cost”.
    • Same source: candidates increasingly ask how much dedicated inference compute they’ll have, and usage per user is growing faster than user count, so compute could become scarce.
  • Gemini 3.1 Pro — dev-workflow positioning (ramping up)

  • GitHub Copilot → Zed editor (GA)

  • Model choice drift + self-hosting pressure (reported trend)

    • Salvatore Sanfilippo says he’s seeing excellent programmers move off US models (Codex, Claude Code) toward Chinese open-weight models like Kimi 2.5 and GLM5, often via providers or by building in-house Nvidia GPU inference to avoid outages and keep sensitive data internal.
    • He frames DeepSeek v4 as a potentially major moment if it lands as SOTA (as rumors suggest), putting pressure on OpenAI/Anthropic business sustainability.

💡 WORKFLOWS & TRICKS

  • “Plan mode → execute” as a default loop (Claude Code / Boris Cherny)

    1. Start the task in plan mode (he says he does this for ~80% of tasks).
    2. Iterate on the plan (model goes back-and-forth).
    3. Once the plan is good, let it execute; he’ll auto-accept edits after that.
    • Implementation detail: plan mode is literally a prompt injection: “please don’t write any code yet”.
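The three steps above can be sketched as a small loop. This is a minimal illustration, not Claude Code's actual plan mode: `ask_model` and `review_plan` are hypothetical callables (one sends a prompt and returns text, the other returns an empty string to approve or feedback text to request a revision), and the suffix simply reuses the quoted instruction.

```python
from typing import Callable

# The planning instruction quoted above, appended to every planning prompt.
PLAN_ONLY_SUFFIX = "Please don't write any code yet."


def plan_then_execute(
    task: str,
    ask_model: Callable[[str], str],
    review_plan: Callable[[str], str],
    max_rounds: int = 5,
) -> str:
    """Iterate on a plan until it is approved, then ask for the implementation."""
    # Step 1: start in "plan mode" by forbidding code in the first prompt.
    plan = ask_model(f"{task}\n\n{PLAN_ONLY_SUFFIX}")

    # Step 2: go back and forth on the plan until the reviewer approves it.
    for _ in range(max_rounds):
        feedback = review_plan(plan)
        if not feedback:
            break
        plan = ask_model(
            f"{task}\n\nCurrent plan:\n{plan}\n\n"
            f"Feedback:\n{feedback}\n\n{PLAN_ONLY_SUFFIX}"
        )
    else:
        raise RuntimeError("plan was not approved within max_rounds")

    # Step 3: the plan is good, so drop the restriction and let it execute.
    return ask_model(f"{task}\n\nApproved plan:\n{plan}\n\nNow implement the plan.")
```

The `max_rounds` cap is an added safeguard, not something from the talk; it keeps an unattended loop from revising a plan indefinitely.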
  • Parallel agents, but treat “state” as a first-class problem

    • Cherny: he runs ~5 agents in parallel while working (terminal/desktop/iOS) and highlights that you can run many sessions in parallel.
    • Kent C. Dodds: similar “utter chaos” workflow: multiple projects, “a couple cloud agents” each, plus a locally guided agent.
    • Failure mode (real): Simon Willison describes “parallel agent psychosis”: losing track of where a feature lives across branches/worktrees/instances.
    • Recovery trick: after hacking in /tmp and crashing, Willison recovered the code from ~/.claude/projects/ session logs, and Claude Code could extract and recreate the missing feature.
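A first step in that kind of recovery is locating which session mentioned the lost code. The sketch below assumes the session logs are JSON Lines files (`*.jsonl`) somewhere under the logs directory; that file format, and the names `iter_strings` and `find_in_session_logs`, are assumptions for illustration rather than a documented interface.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string nested anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def find_in_session_logs(needle: str, logs_dir: Path) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line_number, text) for each logged string containing `needle`."""
    for log_file in sorted(logs_dir.rglob("*.jsonl")):
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip blank or truncated lines
                for text in iter_strings(record):
                    if needle in text:
                        yield log_file, line_number, text
```

Calling `find_in_session_logs("export_csv", Path.home() / ".claude" / "projects")` with a function name you remember (here a made-up `export_csv`) narrows the search to the sessions worth handing back to the agent.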
  • Turn your feedback firehose into PRs (fast iteration loop)

    • Cherny’s pattern: point Claude/Cowork at an internal Slack feedback thread; it proposes changes and opens PRs quickly, which encourages more feedback because users feel heard.
    • Bug-fix loop: “as long as the description is good,” he can fix a bug in minutes by delegating to Claude.
  • Token policy as a productivity lever (especially early)

    • Cherny recommends giving engineers as many tokens as possible early (even “unlimited tokens” as a perk) so they try ideas that would otherwise feel too expensive; optimize/cost-cut after an idea works.
  • Avoid over-orchestration: tools + goal > rigid workflows (model-first design principle)

    • Cherny: don’t “box the model in” with strict step-by-step workflows; give it tools + a goal and let it figure it out. He argues heavy scaffolding mattered a year ago but often isn’t necessary now.
  • “Ephemeral app” mindset + AI-native interfaces (Karpathy)

    • Karpathy built a one-off cardio experiment dashboard with Claude; it had to reverse engineer a treadmill cloud API, process/debug data, and build a web UI; he still had to chase bugs (units, calendar alignment).
    • His takeaway: the app-store model feels outdated for long-tail needs; instead, the industry needs AI-native sensors/actuators with agent-friendly APIs/CLIs so agents don’t have to click HTML UIs or reverse engineer services.
  • Agent “memory” ops in practice (LangSmith Agent Builder)

👤 PEOPLE TO WATCH

  • Boris Cherny: production-grade Claude Code habits (plan mode, parallel sessions) + strong claims about where “after coding” goes.
  • Andrej Karpathy: high-signal framing: ephemeral bespoke apps + “AI-native CLI/API” requirements for tools and hardware vendors.
  • Simon Willison: the best micro-case study of parallel-agent failure/recovery using session logs as the source of truth.
  • Steve Ruiz (tldraw): pragmatic company-building: code gets easier, but alignment/positioning/communication get harder, and he’s automating the overhead away.
  • Theo: sharp practitioner critique on Claude Code regressions plus continued pressure on “harness vs infra” policy differences across vendors.
  • François Chollet: frames agentic coding as ML optimization (spec/tests as constraints) and asks what the “Keras of agentic coding” will be; @swyx suggests DSPy as the presumptive community default.

🎬 WATCH & LISTEN

1) Boris Cherny — “Plan mode” as the default starter move (~1:09:52–1:10:41)

Hook: a simple, copyable workflow: force planning first (no code), iterate the plan, then execute + auto-accept when the plan is solid.

2) Boris Cherny — “Coding is largely solved… what’s next?” (~0:18:19–0:19:06)

Hook: his thesis on why the frontier is shifting from IDE coding to adjacent operational tasks and general automation.

3) Steve Ruiz — daily automated release notes from landed PRs (~0:20:35–0:21:02)

Hook: treat agents like scheduled staff: every day, Claude scans the last 24h PRs and drafts “release notes we’d publish if we shipped main today”.
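A scheduled job of this kind mostly comes down to selecting the PRs that landed in the window and turning them into a prompt. The sketch below is an assumption-laden illustration, not tldraw's setup: `build_release_notes_prompt`, the dictionary keys (`number`, `title`, `merged_at`), and the prompt wording are all hypothetical, and fetching the PR list is left to whatever API or CLI you already use.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def build_release_notes_prompt(
    prs: List[Dict], now: datetime, window_hours: int = 24
) -> str:
    """Build a release-notes prompt from PRs merged in the last `window_hours`.

    Each PR is a dict with hypothetical keys: "number", "title", "merged_at".
    Returns an empty string when nothing landed, so the job can skip the run.
    """
    cutoff = now - timedelta(hours=window_hours)
    recent = [pr for pr in prs if cutoff <= pr["merged_at"] <= now]
    recent.sort(key=lambda pr: pr["merged_at"])  # oldest first
    if not recent:
        return ""
    lines = [f"- #{pr['number']}: {pr['title']}" for pr in recent]
    return (
        "Draft the release notes we would publish if we shipped main today.\n"
        f"PRs landed in the last {window_hours} hours:\n" + "\n".join(lines)
    )
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to test and to re-run for a past day.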

📊 PROJECTS & REPOS


Editorial take: As agents make code cheap, the new edge is orchestration discipline: plan-first loops, sandboxing, session-log recoverability, and AI-native interfaces that don’t force your agent to “be the computer.”
