ZeroNoise Logo zeronoise
Post
After “coding is solved”: plan-first, parallel-agent ops, and sandboxing become the workflow
Feb 20
6 min read
147 docs
Boris Cherny’s strongest claim yet: coding (for his work) is “largely solved,” and the real frontier is end-to-end agentic ops—backed by +200% PR productivity and Claude reviewing 100% of PRs. Plus: Cursor’s cross-OS agent sandboxing, Claude Code perf/regression signals, and new lightweight OpenClaw clones worth cloning.

🔥 TOP SIGNAL

Boris Cherny (Head of Claude Code) is blunt: for the kinds of programming he does, “coding is largely solved”, and the frontier is shifting to adjacent, end-to-end agentic work (project management, paying tickets, general ops) rather than better IDE autocomplete . In that world, throughput isn’t hypothetical: he says Anthropic saw +200% productivity per engineer (PRs), and Claude now reviews 100% of pull requests (with human review still in the loop) .

🛠️ TOOLS & MODELS

  • Claude Code — stability + performance signals

    • v2.1.47: long-running sessions use less memory.
    • Team guidance: keep reporting issues and they’ll fix them .
    • Practitioner complaint: Theo reports Claude Code has “regressed an absurd amount” with UI/feedback issues (timestamps not updating, missing “thinking,” multi-minute hangs with 0 output) and suggests it “needs to be rewritten from scratch.
  • Cursor — agent sandboxing shipped across desktop OSes

    • Cursor says it rolled out agent sandboxing on macOS, Linux, and Windows over the last three months .
    • Mechanism: agents run freely inside a sandbox, only requesting approval when they need to step outside it .
    • Implementation write-up: http://cursor.com/blog/agent-sandboxing.
  • OpenAI Codex — pricing/availability + compute pressure

    • @thsottiaux: Codex is included with a ChatGPT subscription (even Plus has “very generous” usage) ; they attribute this to gpt-5.3-codex achieving “SoTA at lower cost” .
    • Same source: candidates increasingly ask how much dedicated inference compute they’ll have, and usage/user is growing faster than user count → compute could be scarce.
  • Gemini 3.1 Pro — dev-workflow positioning (ramping up)

  • GitHub Copilot → Zed editor (GA)

  • Model choice drift + self-hosting pressure (reported trend)

    • Salvatore Sanfilippo says he’s seeing excellent programmers move off US models (Codex, Claude Code) toward Chinese open-weight models like Kimi 2.5 and GLM5, often via providers or by building in-house Nvidia GPU inference to avoid outages and keep sensitive data internal .
    • He frames DeepSeek v4 as a potentially major moment if it lands as SOTA (as rumors suggest), putting pressure on OpenAI/Anthropic business sustainability .

💡 WORKFLOWS & TRICKS

  • “Plan mode → execute” as a default loop (Claude Code / Boris Cherny)

    1. Start the task in plan mode (he says he does this for ~80% of tasks) .
    2. Iterate on the plan (model goes back-and-forth) .
    3. Once the plan is good, let it execute; he’ll auto-accept edits after that .
    • Implementation detail: plan mode is literally a prompt injection: “please don’t write any code yet” .
  • Parallel agents, but treat “state” as a first-class problem

    • Cherny: he runs ~5 agents in parallel while working (terminal/desktop/iOS) and highlights you can run many sessions in parallel .
    • Kent C. Dodds: similar “utter chaos” workflow—multiple projects, “a couple cloud agents” each, plus a locally guided agent .
    • Failure mode (real): Simon Willison describes “parallel agent psychosis”—losing track of where a feature lives across branches/worktrees/instances .
    • Recovery trick: after hacking in /tmp and crashing, he recovered the code from ~/.claude/projects/ session logs, and Claude Code could extract and recreate the missing feature .
  • Turn your feedback firehose into PRs (fast iteration loop)

    • Cherny’s pattern: point Quad/Cowork at an internal Slack feedback thread; it proposes changes and opens PRs quickly, which encourages more feedback because users feel heard .
    • Bug-fix loop: “as long as the description is good,” he can fix a bug in minutes by delegating to Claude .
  • Token policy as a productivity lever (especially early)

    • Cherny recommends giving engineers as many tokens as possible early (even “unlimited tokens” as a perk) so they try ideas that would otherwise feel too expensive; optimize/cost-cut after an idea works .
  • Avoid over-orchestration: tools + goal > rigid workflows (model-first design principle)

    • Cherny: don’t “box the model in” with strict step-by-step workflows; give it tools + a goal and let it figure it out—he argues heavy scaffolding mattered a year ago but often isn’t necessary now .
  • “Ephemeral app” mindset + AI-native interfaces (Karpathy)

    • Karpathy built a one-off cardio experiment dashboard with Claude; it had to reverse engineer a treadmill cloud API, process/debug data, and build a web UI; he still had to chase bugs (units, calendar alignment) .
    • His takeaway: the app-store model feels outdated for long-tail needs; instead, the industry needs AI-native sensors/actuators with agent-friendly APIs/CLIs so agents don’t have to click HTML UIs or reverse engineer services .
  • Agent “memory” ops in practice (LangSmith Agent Builder)

👤 PEOPLE TO WATCH

  • Boris Cherny — production-grade Claude Code habits (plan mode, parallel sessions) + strong claims about where “after coding” goes .
  • Andrej Karpathy — high-signal framing: ephemeral bespoke apps + “AI-native CLI/API” requirements for tools and hardware vendors .
  • Simon Willison — the best micro-case study of parallel-agent failure/recovery using session logs as the source of truth .
  • Steve Ruiz (tldraw) — pragmatic company-building: code gets easier, but alignment/positioning/communication get harder—and he’s automating the overhead away .
  • Theo — sharp practitioner critique on Claude Code regressions plus continued pressure on “harness vs infra” policy differences across vendors .
  • François Chollet — frames agentic coding as ML optimization (spec/tests as constraints) and asks what the “Keras of agentic coding” will be ; @swyx suggests DSPy as the presumptive community default .

🎬 WATCH & LISTEN

1) Boris Cherny — “Plan mode” as the default starter move (~1:09:52–1:10:41)

Hook: a simple, copyable workflow: force planning first (no code), iterate the plan, then execute + auto-accept when the plan is solid .

2) Boris Cherny — “Coding is largely solved… what’s next?” (~0:18:19–0:19:06)

Hook: his thesis on why the frontier is shifting from IDE coding to adjacent operational tasks and general automation .

3) Steve Ruiz — daily automated release notes from landed PRs (~0:20:35–0:21:02)

Hook: treat agents like scheduled staff: every day, Claude scans the last 24h PRs and drafts “release notes we’d publish if we shipped main today” .

📊 PROJECTS & REPOS


Editorial take: As agents make code cheap, the new edge is orchestration discipline: plan-first loops, sandboxing, session-log recoverability, and AI-native interfaces that don’t force your agent to “be the computer.”

After “coding is solved”: plan-first, parallel-agent ops, and sandboxing become the workflow
Latent Space
youtube 1 doc

Martin Casado (a16z partner, codes daily on Spark JS open-source Gaussian splat rendering library ) shares firsthand LLM usage for coding:

  • Model comparison: Codex better than Opus 4.5 at finding hardest bugs; Opus excels in bedside manner for complex partner-like brainstorming .
  • Contrarian take: No specialized coding models needed—general models win as coding requires conversational skills (compliance, web search, history) beyond code generation .

Cursor (a16z portfolio): Dev tools co. built near-SOTA coding model at ~1/100th frontier cost, briefly most popular; acquired Graphite; extracts margin via app + models .

Repo: Spark JS on GitHub.

Boris Cherny
Profile 1 doc

Boris Cherny, Head of Claude Code at Anthropic (prolific coder, 100% of his code AI-written since Nov 2024, ships 10-30 PRs/day in production), shares firsthand workflows:.

Key workflows/tips (Claude Code, Opus 4.6, max effort):

  • Start 80% tasks in plan mode (Shift+Tab x2 in terminal; button elsewhere): model plans first, then executes (auto-accept edits after good plan) — one-shots most tasks.
  • Multi-quadding: Run 5+ parallel agents (terminal/desktop/iOS) for uninterrupted productivity.
  • Point at Slack feedback/bug channels: auto-suggests/implements PRs (fixes in minutes).
  • Human reviews for safety; Claude auto-reviews 100% PRs.

Quantitative: Anthropic productivity/engineer +200% (PRs); Boris: top producer pre/post-AI.

Patterns: Under-resource projects to force AI reliance/speed; unlimited tokens early; build for model 6mo ahead (bet on generality/tools over scaffolding/workflows); latent demand (e.g., non-coding uses → Cowork desktop app, built in 10 days w/ Claude Code).

Contrarian: Coding "largely solved"; no IDE needed; learn generalism over deep coding.

Salvatore Sanfilippo
Profile 1 doc

Redis creator Salvatore Sanfilippo observes top programmers abandoning US models (Codex, Cloud Code) for Chinese open-weight models (Kimi 2.5, GLM5) .

They use European/US providers or build in-house Nvidia GPU inference pipelines for companies, ensuring reliability, no cloud outages, and data privacy .

Anticipates imminent DeepSeek v4 as potential SOTA model, accelerating elite shift and challenging OpenAI/Anthropic businesses .

Secondhand trend reporting from followed experts.

Matthew Berman
youtube 1 doc

OpenClaw is a local, personal AI assistant powered by frontier models, used for coding and analysis . Matthew Berman, using it for personal projects, previously relied on Claude Opus 4.6 for most tasks (including coding, offloading some to Cursor CLI agent) due to its personality .

After Anthropic banned OAuth tokens from Claude subscriptions (Pro/Max) in third-party tools like OpenClaw—despite allowing API and Agent SDK for agentic loops—Berman switched . Subscription tokens ~90% cheaper than API; e.g., 50k input tokens (simple 'hello') costs $0.25 on Opus 4.6 API . OpenAI permits ChatGPT subscription OAuth with OpenClaw .

Current firsthand workflow (multi-model routing):

  • Coding councils (Security, Platform, Business meta-analysis, Innovation): GPT-4o-like ('GPT5.3 Codex X high fast'), offloaded to Cursor CLI agent .
  • CRM standard: GPT-4.5? ('GPT5.2') .
  • Fast tasks (notifications, rerankers): GPT-4o .
  • Knowledge base summarizer, prompt sync review, default: GPT-4.5? ('GPT5.2') . Backups: Sonnet 4.6, Opus 4.6, testing Gemini 2.0? ('Gemini 3.1') .
Andrej Karpathy
x 1 doc

Andrej Karpathy (@karpathy) used Claude to rapidly build a custom web dashboard (~300 lines) for tracking an 8-week cardio experiment: reverse-engineered Woodway treadmill cloud API, pulled/processed/filtered/debugged data, created UI frontend .

Workflow notes (firsthand, personal project):

  • Casual "vibe coding" took 1 hour total (vs. ~10 hours 2 years ago)
  • Fixed bugs like metric/imperial units and calendar date matching

Contrarian vision/patterns:

  • App stores outdated; LLM agents improvise ephemeral, bespoke apps on-the-spot
  • Need AI-native sensors/actuators with agent-friendly APIs/CLIs (not HTML UIs/instructions); currently 99% of products lack this
  • Ideal: 1 minute via personal context, Q&A, skill libraries
swyx
x 3 docs

@swyx predicts a shift to post-IDE form factors for agentic engineering, citing a prescient 2025 talk by @RealGeneKim and @Steve_Yegge .

After demo from @Wattenberger and Augment team, calls their Intent app the "ADE" (Agent Development Environment?), aggregating code agent management ideas without locking into Augment's agent .

Tool landscape comparisons:

  • Cursor 2.0: toe dip
  • Claude: folded into chat app
  • Codex: formalized Conductor patterns
  • Amazon Kiro: Spec Driven Dev

Firsthand context: @swyx (serves early adopters, @latentspacepod host) has "direct line of sight" to developments .

Resources:

Changelog
youtube 1 doc

Steve Ruiz (Tldraw founder, codes daily in production) shares firsthand workflows using Claude for accelerated development:

  • Rapid prototyping: Describe spec (e.g., ComfyUI-style image pipeline starter kit), reference 6 similar products; Claude builds 80% in ~2 hours (shipped in 4-5 days vs. weeks for 1-2 engineers). Steer UX/architecture without writing code.
  • Daily automation: Claude on Mac Mini scans 24h PRs, generates release notes from main branch MD.
  • Parallel agents: Run multiple coding agents simultaneously for design iterations/shots on goal (e.g., arrows, interactions); discard failures.
  • Internal tools: Feed CRM/meeting notes (Granola) to Gemini for insights/stories/product feedback.

Quantitative gains: Q3 2026 roadmap done in week 1; zero backlog via AI bug research.

Resources: tldraw agent starter kit (Cursor-like chat+canvas); upcoming fairies.tldraw.com (spatial AI agents).

Simon Willison
x 2 docs

Simon Willison (@simonw) described parallel agent psychosis—losing track of a feature across multiple branches, worktrees, and cloud instances during development .

He recovered a lost /tmp prototype after a crash from ~/.claude/projects/ session logs, where Claude Code extracted the code and recreated the feature .

Firsthand account from a top practitioner using coding agents for prototyping.

swyx
x 4 docs

New lightweight OpenClaw coding agent clones shared by @swyx:

Firsthand workflow from @swyx (affiliated with @cognition, @temporalio): Using deepwiki codemaps to interactively explore codebases with on-demand Q&A, preferring it over others' interpretations ("the map is very much not the territory") .

Theo - t3․gg
youtube 1 doc

Theo (full-stack TypeScript dev, T3.gg, T3 Chat founder) shares firsthand Claude Code usage: 420M+ input tokens (~2 months, one machine), mostly cache reads; output ~124k Opus tokens . Switched entire dev workflow to OpenAI Codex CLI.

T3 Chat prod usage (19 days): 6B input / 500M output tokens cost $23k on Anthropic API; Claude Code sub equiv. $1-1.5k. Claude Pro/Max subs yield up to $2708 inference for $200/mo (~13.5x subsidy) .

Building open-source agentic coding GUI alt. to Codex app using Codex app server + ChatGPT OAuth; plans Claude Code support via Agent SDK (e.g., read/edit/bash tools for bug fix) with user-BY0 subs .

Contrarian: Anthropic policies block 3rd-party harnesses/UI even for personal subs/OSS; praises OpenAI devrel/transparency .

swyx
x 2 docs

@fchollet frames advanced agentic coding as machine learning:

  • Optimization goal and constraints set via spec/tests; coding agents iterate to goal .
  • Yields blackbox codebase deployed sans internal inspection, like NN weights .

Classic ML issues apply: overfitting to spec, Clever Hans shortcuts failing generalization, data leakage, concept drift .

Seeks the "Keras of agentic coding": high-level abstractions for low-overhead codebase 'training' steering .

@swyx posits DSPy as community default/presumptive winner .

ThePrimeagen
x 2 docs

@ThePrimeagen observes a developer malaise similar to 2021 Neovim config enthusiasts: hype around building custom tech via prompts leads to endless optimization, ending with 8 agents managing disparate tasks, no progress, 100x brain speed, disrupted sleep/family time, and fixation on "just one more prompt" .

Contrarian take: Ability to build anything burdens with maintaining everything, worsened by loose contracts and rapid change .

Firsthand community observation from experienced engineer @ThePrimeagen.

Peter Steinberger 🦞
x 2 docs

Andrej Karpathy (@karpathy), renowned AI engineer, vibe-coded a custom cardio tracking dashboard using Claude (Anthropic LLM): Claude reverse-engineered Woodway treadmill cloud API, processed/filtered/debugged data, and built a web UI frontend—taking 1 hour total (vs. 10 hours 2 years ago), though iterative bug fixes were needed for metric/imperial units and calendar alignment .

Workflow tips (firsthand production-like use):

  • Loose, iterative prompting ("vibe coding") for custom, ephemeral apps.
  • Explicitly direct fixes for agent errors .

Quantitative insight: Should evolve to 1 minute with personal context, skill libraries, and Q&A .

Contrarian take: App stores outdated for long-tail needs; future is AI-native sensors/actuators (e.g., agent CLI/APIs) orchestrated via LLM glue into bespoke apps—99% of services lack this .

@steipete highlighted the CLI gap .

Quoted post: https://x.com/karpathy/status/2024583544157458452.

Kent C. Dodds ⚡
x 1 doc

Kent C. Dodds (@kentcdodds), a dev educator, shares his firsthand experience with a chaotic daily workflow involving several projects simultaneously, each with a couple cloud agents running and directly guiding an agent locally, alongside handling email, X, and DMs. He notes making progress and good work despite the chaos, which he enjoys .

LangChain
x 1 doc

LangSmith Agent Builder uses memory to improve agents with feedback . @LangChain shares three practical ways to maximize memory:

  • Tell agent to remember what works
  • Use skills for specialized context when needed
  • Edit instructions directly when faster

Full walkthrough: https://blog.langchain.com/how-to-use-memory-in-agent-builder/?utm_medium=social&utm_source=twitter&utm_campaign=q1-2026_ab-philosophy_aw

Theo - t3.gg
x 2 docs

GitHub Copilot subscription now supported in Zed editor (generally available).

Changelog: https://github.blog/changelog/2026-02-19-github-copilot-support-in-zed-generally-available/

Theo (@theo, developer/CEO): "ngl I love this timeline. Even Microsoft is behaving better than Anthropic" (contrarian take praising Microsoft/GitHub over Anthropic).

Theo - t3.gg
x 2 docs

Theo (CEO @t3dotchat, developer) reports Claude Code has regressed absurdly in recent days, making it genuinely unpleasant.

Specific issues:

  • Timestamps only update on tab un-focus/re-focus
  • No “thinking” indicator displays
  • Queries hang with 0 output, e.g., 6 minutes

Firsthand recommendation: Rewrite from scratch .

Cursor
x 1 doc

Cursor rolled out agent sandboxing across macOS, Linux, and Windows over the last three months .

Key features from the Cursor team (firsthand account): Sandboxes let agents run freely and securely, requesting approval only to step outside .

Implementation details: http://cursor.com/blog/agent-sandboxing.

Theo - t3.gg
x 1 doc

Theo (@theo, CEO @t3dotchat, developer) reports major recent regression in Claude Code.

Specific issues from firsthand use:

  • Timestamps fail to update without un-focusing/refocusing the tab
  • No “thinking” indicator shown
  • Query ran 6 minutes with zero output

Calls it “genuinely unpleasant to use” .

Armin Ronacher ⇌
x 2 docs

Claude Code v2.1.47 now uses less memory for long-running sessions, crediting @cirospaciari .

The team urges users to keep reporting issues for fixes .

Unofficial changelog: https://x.com/claudecodelog/status/2024240106324783247.

Flask creator @mitsuhiko praises embrace of inofficial changelogs .

Firsthand update from @jarredsumner (likely team member), shared by production-experienced developer.