🔥 TOP SIGNAL
Coding agents crossed a “works in practice” threshold since December, driven (per Andrej Karpathy) by improved model quality, long-term coherence, and tenacity—enough to be disruptive to the default programming workflow. His concrete example: he handed an agent a single English brief to set up vLLM + Qwen3-VL, build a video inference endpoint + web UI, debug issues, install systemd services, and return a markdown report—hands-off in ~30 minutes.
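For flavor, here is a minimal sketch of calling the kind of endpoint that brief would produce: vLLM serving Qwen3-VL behind its OpenAI-compatible API. The model id, port, and video-part schema are illustrative assumptions, not details from Karpathy's report.

```python
# Hypothetical client for a locally served Qwen3-VL model, assuming
# something like `vllm serve Qwen/Qwen3-VL-8B-Instruct` is running.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-VL-8B-Instruct",  # assumed model id
    messages=[{
        "role": "user",
        "content": [
            # vLLM accepts video inputs as a "video_url" content part for
            # video-capable models; the exact schema can vary by version.
            {"type": "video_url", "video_url": {"url": "file:///tmp/clip.mp4"}},
            {"type": "text", "text": "Describe what happens in this video."},
        ],
    }],
)
print(response.choices[0].message.content)
```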
🛠️ TOOLS & MODELS
GPT-5.3-Codex / Codex 5.3 vs Opus 4.6 (practitioner preference)
- Mitchell Hashimoto says Codex 5.3 is “much more effective” than Opus 4.6, and that after going back and forth he hasn’t touched Opus for a week—“first model to get me off of Opus… ever”.
- OpenAI’s Romain Huet says the team is “continuing to iterate and improve Codex every week”.
- Tool reliability signal: Brian Lovin hit Claude Code 500s, tried Codex, and reported “Codex is good!”
Reasoning settings (Codex)
- Sherwin Wu: they “basically only run [GPT-5.3-Codex] on `xhigh` nowadays for all coding tasks,” and notes speed improvements make it not feel slow even at `xhigh`.
- Greg Brockman’s advice: “always run with xhigh reasoning”.
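In API terms, `xhigh` corresponds to a reasoning-effort parameter. A hedged sketch using the OpenAI Responses API follows; the model id is taken from the discussion above, and whether the API accepts `xhigh` for it (rather than only the CLI config) is an assumption.

```python
# Sketch only: requesting maximum reasoning effort on a Codex model.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.3-codex",          # model id as discussed above (assumed)
    reasoning={"effort": "xhigh"},  # Brockman: "always run with xhigh reasoning"
    input="Refactor this recursive parser into an iterative one: ...",
)
print(response.output_text)
```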
Claude Code — Remote Control (new capability, rough edges in testing)
- Feature: run `claude remote-control` locally, then send prompts to that session from web/iOS/desktop; one session per machine, and each action requires approval.
- Simon Willison reports it’s “a little bit janky,” including repeated API 500 errors and confusing failure behavior after restarting the program.
Devin 2.2 (Cognition)
- Cognition markets Devin 2.2 as an autonomous agent that can test with computer use, self-verify, and auto-fix; it also claims 3× faster startup, a redesigned UI, and “computer use + virtual desktop”.
OpenClaw — new beta
- Peter Steinberger: beta includes security improvements, various fixes, DM “heartbeat” made configurable after feedback, better Slack threads, improved subagents, and a more reliable Telegram webhook.
- Releases: https://github.com/openclaw/openclaw/releases.
Sourcegraph 7.0 (positioning shift)
- Sourcegraph says 7.0 marks a new chapter: doubling down on being an “intelligence layer” that developers and AI agents rely on to navigate, understand, and operate on large codebases.
- Details: https://sourcegraph.com/blog/a-new-era-for-sourcegraph-the-intelligence-layer-for-ai-coding-agents-and-developers.
💡 WORKFLOWS & TRICKS
“English → parallel agents → you review” (Karpathy’s decomposition rule)
- Karpathy’s pattern: agents aren’t perfect—they need high-level direction, judgment, taste, oversight, iteration, hints, and they work best when tasks are well-specified and verifiable/testable.
- His operational heuristic: build intuition for task decomposition—hand off the parts that work well to agents, then “help out around the edges”.
- Scaling idea: build long-running orchestrators (“Claws”) with tools/memory/instructions managing multiple parallel “Code” instances; a minimal sketch of that shape follows this list.
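A minimal sketch of the orchestrator idea, assuming Claude Code's headless print mode (`claude -p`); the task list and per-task git worktrees are illustrative choices, not part of Karpathy's proposal.

```python
# Hypothetical "Claw": one long-running loop fanning tasks out to
# parallel headless agent runs, then collecting their reports.
import asyncio

TASKS = [
    "Add input validation to the upload endpoint",
    "Write tests for the retry logic in client.py",
    "Update the README quickstart for the new CLI flags",
]

async def run_agent(task: str, workdir: str) -> str:
    # `claude -p` runs one non-interactive turn and prints the result.
    proc = await asyncio.create_subprocess_exec(
        "claude", "-p", task,
        cwd=workdir,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.decode()

async def main() -> None:
    # One pre-created git worktree per agent keeps parallel edits
    # from colliding in a shared checkout.
    results = await asyncio.gather(
        *(run_agent(t, f"worktrees/task-{i}") for i, t in enumerate(TASKS))
    )
    for task, report in zip(TASKS, results):
        print(f"=== {task} ===\n{report}\n")

asyncio.run(main())
```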
Cursor cloud agent: “clone it from a video” as a starting point, then iterate for fidelity
- @swyx dropped a tweet + video into Cursor cloud expecting it not to work; he says Cursor Agent oneshotted a functional clone of Rachel Chen’s site from the video alone over 43 minutes (including a working “RachelLLM” sidebar).
- His follow-up prompt for fidelity is a reusable template (paraphrased as a snippet after this list):
- step through the video,
- discover assets (headless run / curl / network snooping),
- build a checklist + sitemap,
- spin up subagents/swarm for parallel work,
- don’t stop until behavior/visuals match closely; trade off fidelity vs. simplicity when ambiguous.
- He reports a second improved output after another 43 minutes.
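Packaged as a constant for reuse; this is a paraphrase assembled from the bullet summary above, not swyx's verbatim prompt.

```python
# Hedged paraphrase of the follow-up prompt, kept as a reusable template.
FIDELITY_PROMPT = """\
Step through the attached video carefully.
Discover the site's real assets: run it headless, curl pages, snoop network requests.
Build a checklist and sitemap covering every page and interaction.
Spin up subagents and work the checklist in parallel.
Do not stop until behavior and visuals closely match the video.
Where the video is ambiguous, trade fidelity off against simplicity.
"""
```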
Run many agents in parallel (Cursor) + let the agent do exploratory UX testing
- Kent C. Dodds: he can run “as many of these [Cursor agents]” as he wants; instead of filing issues for ideas, he fires off prompts and gets back what it built (with screenshots).
- He also reports the agent “noticed one UX edge case during walkthrough” while he was doing manual testing.
Long-running agent refactors overnight (Cursor) + “computer use” for steering
- Kent kicked off a long-running Cursor agent overnight and iterated in the morning using “computer use”.
- He reports it dropped ~15k lines in a refactor.
Code review aid: ask for a linear walkthrough of the codebase (Simon Willison)
- Willison’s prompt pattern: ask agents for “a linear walkthrough of the code that explains how it all works in detail” to understand vibe-coded output.
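One hedged way to script the pattern, assuming Claude Code's `claude -p` headless mode (any agent CLI with a non-interactive mode would do); the output filename mirrors the walkthrough.md in Willison's repo listed below.

```python
# Sketch: run the walkthrough prompt headlessly and save the result.
import subprocess

walkthrough = subprocess.run(
    ["claude", "-p",
     "Give me a linear walkthrough of the code that explains "
     "how it all works in detail"],
    capture_output=True, text=True, check=True,
).stdout

with open("walkthrough.md", "w") as f:
    f.write(walkthrough)
```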
Git hygiene for agentic work: small commits, then squash (Huntley)
- Geoffrey Huntley suggests an agent-friendly workflow: make incremental small commits, then squash them to a single commit so that “study git log” for a unit of work can be a single tool call (a sketch follows below).
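A minimal sketch of the squash step, assuming the incremental commits live on a branch ahead of some base ref; the wrapper function is illustrative, while the underlying git commands are standard.

```python
# Collapse all incremental agent commits since `base` into one commit,
# so "study git log" for the unit of work is a single entry.
import subprocess

def squash_unit_of_work(base: str, message: str) -> None:
    # Soft reset moves the branch pointer back to `base` while keeping
    # the working tree and index exactly as they are.
    subprocess.run(["git", "reset", "--soft", base], check=True)
    # Re-commit the accumulated changes as one squashed commit.
    subprocess.run(["git", "commit", "-m", message], check=True)

squash_unit_of_work("main", "feat: add retry logic with tests")
```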
Production caution: don’t trust “ranked” PR scores if they’re editable
- Steinberger says they use Greptile to rank PRs, but observed that someone had manually edited a PR review score from 2/5 to 5/5.
- Example PR: https://github.com/openclaw/openclaw/pull/13095.
OSS maintainer playbook shift: tests as “reimplementation fuel”
- Simon Willison notes that a comprehensive test suite can be enough to rebuild a library from scratch, and highlights tldraw moving tests to a private repo as a response pattern.
👤 PEOPLE TO WATCH
- Andrej Karpathy — clearest firsthand articulation of what changed since December, plus a concrete “30 minutes, hands-off” agent-run build story and an orchestration north star (“Claws”).
- Simon Willison — consistently turns agent usage into repeatable patterns (e.g., “linear walkthroughs”), and also documents sharp edges like Claude Code Remote Control’s failure modes.
- Mitchell Hashimoto — high-signal model/tool preference note: Codex 5.3 displaced Opus 4.6 for him after direct comparison.
- Kent C. Dodds — pragmatic day-to-day agent usage: parallel agents, long-running refactors, and agents surfacing UX edge cases during walkthroughs.
- ThePrimeagen — counterweight: after ~3 months of vibe-coding, he says he hates the generated code and the “subtle offness,” and plans to “tradcode” (a useful reality check on taste/intent gaps).
🎬 WATCH & LISTEN
- No YouTube videos or podcast episodes were included in today’s source set, so there are no embeddable clips to share.
📊 PROJECTS & REPOS
Simon Willison — “Present” (SwiftUI macOS presentation app) repo + walkthrough
- Repo: https://github.com/simonw/present
- Walkthrough doc: https://github.com/simonw/present/blob/main/walkthrough.md
OpenClaw — releases + active PR example
- Releases: https://github.com/openclaw/openclaw/releases
- PR referenced in Greptile score-editing report: https://github.com/openclaw/openclaw/pull/13095
tldraw — tests moving closed-source (issue)
Editorial take: The bottleneck is shifting from “can the agent write code?” to “can you reliably steer, verify, and govern what it did?”