🔥 TOP SIGNAL
Cursor’s big unlock this week is “demos, not diffs”: cloud agents can run the software they just built, test it end-to-end, and send you a video artifact as proof. Practitioners say this flips async agents from “fun but hard to trust” to “mergeable.” Jediah Katz reports that over the last two months, more than 50% of his PRs were written by cloud agents, once the agents could self-test and send videos.
🛠️ TOOLS & MODELS
Cursor Cloud Agents — “computer use” + video demo artifacts (shipping)
- Agents can onboard to your repo, use a cloud computer/remote desktop, and return video demos of the finished change.
- Cursor: “A third of the PRs we merge now come from agents running in cloud sandboxes.”
- Cursor CEO Michael Truell: “Over a third of our PRs are now created autonomously with this feature.”
- Internal example: Cursor agents modifying Cursor (e.g., adding secret redaction to model tool calls) and returning a multi-chapter demo video after end-to-end verification.
- Try/read: http://cursor.com/onboard · http://cursor.com/blog/agent-computer-use
Claude Code — Remote Control (rolled out to all Max users)
- /remote-control lets you start a local terminal session, then continue it from your phone.
- Boris Cherny says he’s been using it daily.
Claude Code — Slack plugin (context + updates)
- Install with /plugin install slack to connect Slack for search, messaging, doc creation, and pulling work context into Claude Code.
Claude Code — built-in git worktrees + tmux flags
- New flags: -w, --worktree [name] and --tmux; each session runs in its own worktree to avoid branch-switching chaos.
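A worktree is a separate checkout that shares one repository's history, which is why one per session avoids branch-switching collisions. For readers who want the same isolation with plain git (for example, to run several agents side by side outside Claude Code), the sketch below builds one git worktree add command per session. The directory layout and the session/<name> branch naming are illustrative assumptions, not how Claude Code lays out its worktrees.

```python
from pathlib import Path
from typing import List


def worktree_command(repo: Path, session: str, base: str = "HEAD") -> List[str]:
    """Build a `git worktree add` command that gives one session its own checkout.

    Assumed layout (illustrative only): <parent>/<repo-name>-worktrees/<session>,
    on a new branch named session/<session>.
    """
    if not session or "/" in session or session.startswith("-"):
        raise ValueError(f"invalid session name: {session!r}")
    repo = repo.resolve()  # absolute, so the worktree path is unaffected by `git -C`
    path = repo.parent / f"{repo.name}-worktrees" / session
    branch = f"session/{session}"
    # git worktree add -b <new-branch> <path> <commit-ish>: creates the branch and
    # checks it out in a separate directory that shares the repo's object store.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


if __name__ == "__main__":
    import shlex

    # Print (rather than run) the command for a hypothetical "fix-login" session.
    print(shlex.join(worktree_command(Path("."), "fix-login")))
```

Returning the argument list instead of executing it keeps the sketch side-effect free; pass the list to subprocess.run when you actually want the worktree created, and clean up later with git worktree remove.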
Claude Code — notable performance datapoint
- Reported: p99 memory usage dropped 40× in the last two weeks, and 6× since January, while shipping new features.
Devin (Cognition) — enterprise-first PMF story + self-serve UX catch-up
- Scott (via @swyx): Devin didn’t have internal PMF at launch; the first enterprise adoption took ~6 months; “async agents are the final boss of agent UX”.
- Claimed growth: after landing an enterprise, usage there doubled every 2 months in 2025, and has accelerated to doubling every 6 weeks so far this year; internal usage is now 4× its 2025 peak.
- Devin 2.2: a sprint to pay down self-serve UX debt; an omnibox; tighter “close the loop” integration with Devin Review.
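To see why the move from doubling every 2 months to every 6 weeks matters, compound both rates over a year. This is a back-of-the-envelope extrapolation that assumes each rate held for a full 12 months, which the claim itself does not say.

```python
def annualized_multiple(doubling_period: float, periods_per_year: float) -> float:
    """Growth multiple after one year if usage doubles every `doubling_period` units."""
    return 2 ** (periods_per_year / doubling_period)


every_2_months = annualized_multiple(2, 12)  # 2**6 = 64
every_6_weeks = annualized_multiple(6, 52)   # 2**(52/6), roughly 406
print(f"doubling every 2 months: ~{every_2_months:.0f}x per year")
print(f"doubling every 6 weeks:  ~{every_6_weeks:.0f}x per year")
```

Shaving two weeks off the doubling period turns roughly 64× per year into roughly 406×, because the saving compounds with every doubling.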
💡 WORKFLOWS & TRICKS
Close the agent loop with “proof artifacts,” not trust
- Jediah Katz’s bottleneck framing: review and testing were the limiter (“you’re responsible… to deliver code you have proven to work”); video demos from agents expand what he can confidently merge without checking the code out locally.
- Kent C. Dodds calls this “closing the agent loop” and credits Cursor’s computer-equipped cloud agents as a major step change for shipping from his phone.
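One way to make “proof, not trust” concrete in your own tooling is a merge gate that asks for evidence before a human spends review time. The sketch below is hypothetical (neither Cursor nor Katz describes this code): a run counts as carrying proof only if its tests ran and passed and it attached at least one demo artifact. The AgentRun fields are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentRun:
    """Hypothetical summary of one cloud-agent run (field names are illustrative)."""
    tests_exit_code: Optional[int]  # None means the agent never ran the tests
    artifacts: List[str] = field(default_factory=list)  # e.g. links to demo videos


def merge_blockers(run: AgentRun) -> List[str]:
    """Return reasons not to merge yet; an empty list means the run carries proof."""
    blockers = []
    if run.tests_exit_code is None:
        blockers.append("tests were never executed")
    elif run.tests_exit_code != 0:
        blockers.append(f"tests failed (exit code {run.tests_exit_code})")
    if not run.artifacts:
        blockers.append("no demo artifact attached")
    return blockers
```

A reviewer still watches the video; the gate only guarantees there is something to watch and that the agent actually executed its tests.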
“First run the tests” as your session opener (Simon Willison)
- Prompt: “First run the tests” to force test-suite discovery and put the agent into a testing mindset.
- Willison’s claim: automated tests are no longer optional when working with coding agents; if code hasn’t been executed, it’s luck if it works in production.
- If you use uv in Python, he prompts: Run "uv run pytest".
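Willison’s advice is a prompt for the agent, not a script. If you drive agents from your own harness, a rough equivalent is to run the suite yourself at the start of a session and record the baseline, so failures the agent introduces can be told apart from ones that were already there. A minimal sketch; the helper name and the plain-pytest fallback are assumptions, only the uv command comes from Willison.

```python
import subprocess
import sys
from typing import List, Tuple

# Willison's uv-based command; the plain-pytest fallback is an assumption for
# projects that don't use uv.
UV_PYTEST = ["uv", "run", "pytest"]
PLAIN_PYTEST = [sys.executable, "-m", "pytest"]


def run_tests_first(cmd: List[str], timeout: float = 600) -> Tuple[bool, str]:
    """Run the test suite and return (passed, combined output) as a session baseline."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return False, f"command not found: {cmd[0]}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    # pytest exits 0 only when every collected test passed.
    return proc.returncode == 0, proc.stdout + proc.stderr
```

Usage: call run_tests_first(UV_PYTEST) before handing the agent its task, and keep the returned output so a later red run can be compared against the starting state.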
Generate a “linear walkthrough” doc for any repo (also Simon Willison)
- Use an agent to read the source and produce a structured walkthrough, which is especially helpful if you “prompted the whole thing into existence” and now need to understand it.
- Willison’s implementation detail: use Showboat so the agent includes code snippets by running commands (showboat exec with sed, grep, or cat) instead of copying and pasting them manually, which reduces hallucination risk.
- Example prompt (verbatim):
"Read the source and then plan a linear walkthrough of the code that explains how it all works in detail"
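The idea behind the Showboat approach is that every snippet in the document is produced by actually running a command, so what the reader sees is what the file really contains. The sketch below shows only that idea in plain Python; it is not Showboat’s API, and exec_block and append_to_walkthrough are hypothetical names.

```python
import subprocess
from pathlib import Path

FENCE = "`" * 3  # built at runtime so this example can itself sit inside Markdown


def exec_block(command: str, timeout: float = 60) -> str:
    """Run a shell command and return Markdown holding the command and its real output."""
    # shell=True so pipelines such as `grep -n def app.py | head` work as typed.
    proc = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=timeout)
    output = (proc.stdout + proc.stderr).rstrip()
    return f"{FENCE}bash\n{command}\n{FENCE}\n\n{FENCE}\n{output}\n{FENCE}\n\n"


def append_to_walkthrough(doc: Path, markdown: str) -> None:
    """Append a chunk (agent-written prose or an exec_block result) to the walkthrough."""
    with doc.open("a", encoding="utf-8") as f:
        f.write(markdown)
```

Usage: append_to_walkthrough(Path("walkthrough.md"), exec_block("sed -n '1,40p' src/main.py")), where the source path is hypothetical. The agent writes the explanatory prose, but it never types the code listings itself.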
Peter Steinberger’s “conversational agent” habit: always ask for questions
- He treats coding with agents as a conversation and repeatedly asks, “Do you have any questions?” to surface hidden assumptions; otherwise, models default to making assumptions.
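Steinberger does this interactively. If you script agents, the same habit can be automated as a small loop. In the sketch below, ask_model and ask_human are hypothetical callables you would wire to your model client and to a person, and the NO QUESTIONS stop phrase is an assumed convention, not part of any tool.

```python
from typing import Callable, List, Tuple

SENTINEL = "NO QUESTIONS"  # assumed convention you ask the model to follow


def clarify_first(
    ask_model: Callable[[str], str],
    ask_human: Callable[[str], str],
    task: str,
    max_rounds: int = 3,
) -> List[Tuple[str, str]]:
    """Ask 'Do you have any questions?' until the model has none (or max_rounds).

    Returns (model question, human answer) pairs to include with the real coding
    request, so assumptions are settled before any code is written.
    """
    exchanges: List[Tuple[str, str]] = []
    message = (
        f"{task}\n\nBefore writing any code: do you have any questions? "
        f"Reply {SENTINEL} if not."
    )
    for _ in range(max_rounds):  # bounded so a chatty model can't loop forever
        reply = ask_model(message).strip()
        if reply.upper().startswith(SENTINEL):
            break
        answer = ask_human(reply)
        exchanges.append((reply, answer))
        message = f"Answers: {answer}\n\nDo you have any other questions? Reply {SENTINEL} if not."
    return exchanges
```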
PR review as intent review (not code review)
- Steinberger’s PR loop: first ask the model whether it understands the intent of the PR and whether the PR is the optimal solution; often the right fix is architectural or systemic.
Rubric separation to reduce “context rot” and bias (Doug O’Laughlin)
- He keeps task and rubric prompts separate because combining them can commingle information and increase bias and susceptibility; he also calls out sycophancy as a practical failure mode.
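A minimal sketch of that separation, assuming a stateless model(prompt) -> text function (hypothetical): the task and the rubric go to two independent calls, so neither prompt carries the other’s framing. Whether the grader should also see the original task is a judgment call O’Laughlin’s remark doesn’t settle; here it sees only the rubric and the answer.

```python
from typing import Callable, Tuple


def solve_then_grade(model: Callable[[str], str], task: str, rubric: str) -> Tuple[str, str]:
    """Two independent calls: the solver never sees the rubric, the grader never sees the solver's prompt."""
    # Call 1: fresh context containing only the task.
    answer = model(f"Task:\n{task}")
    # Call 2: fresh context containing only the rubric and the candidate answer.
    grade = model(
        f"Rubric:\n{rubric}\n\nCandidate answer:\n{answer}\n\n"
        "Score the answer strictly against the rubric."
    )
    return answer, grade
```

Because each call starts from a clean prompt, rubric wording cannot leak into the solution and the solver’s reasoning cannot nudge the grade.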
👤 PEOPLE TO WATCH
- Jediah Katz (Cursor): a concrete practitioner stat, with >50% of his PRs written by cloud agents once agents could self-test and send video proof.
- Michael Truell (Cursor CEO): a production signal, with over a third of Cursor PRs now created autonomously with demos.
- Boris Cherny (Anthropic): on the record that Claude Code does 100% of his coding; he “doesn’t write any of it anymore”.
- Simon Willison: turning agent work into repeatable patterns, namely “First run the tests” and agent-generated linear walkthroughs.
- Andrej Karpathy: pushing “build for agents” (CLI + Skills/MCP + exportable Markdown docs); he argues CLIs are uniquely agent-friendly.
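Karpathy’s point is about properties rather than a specific library. As an illustration of what those properties can look like (not anything he published), here is a toy CLI with self-describing --help, an opt-in JSON output mode, and exit codes an agent can branch on. The notes tool and its data are invented for the example.

```python
import argparse
import json
from typing import List, Optional

# Toy data standing in for whatever your tool actually serves.
NOTES = ["buy milk", "fix login bug", "login page copy"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="notes", description="Search notes (toy example).")
    sub = parser.add_subparsers(dest="command", required=True)
    search = sub.add_parser("search", help="list notes containing TERM")
    search.add_argument("term")
    search.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    hits = [n for n in NOTES if args.term.lower() in n.lower()]
    if args.json:
        print(json.dumps({"term": args.term, "hits": hits}))
    else:
        for hit in hits:
            print(hit)
    # Exit status an agent can branch on: 0 = found something, 1 = no matches.
    return 0 if hits else 1
```

Wire it up with raise SystemExit(main()) under an if __name__ == "__main__": guard; an agent can then run notes search login --json and parse the result instead of scraping human-oriented text.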
🎬 WATCH & LISTEN
1) Cursor: “A computer for every agent” (video artifacts as proof) (≈ 0:10–0:35)
Hook: Cursor shows agents testing their changes on a real desktop and returning a video artifact that demonstrates the feature works, not just a diff.
2) Cursor demo: “paste GitHub issue → agent works → browser proof” (≈ 0:47–1:05)
Hook: A concrete flow: paste an issue link; the agent works for ~40 minutes, then returns an artifact showing it navigated to the locally running app and verified the result in the browser.
3) Claude Code (Boris Cherny): what changed at Opus 4.5 (≈ 8:02–8:52)
Hook: The shift from “agent does a first pass, human fixes” to “agent runs tests, opens the browser, clicks around, and fixes UI issues,” so he no longer opens a text editor.
📊 PROJECTS & REPOS
- Showboat (Simon Willison): a tool that lets agents build trustworthy walkthrough documents from executed commands and captured output (instead of pasted snippets). Repo: https://github.com/simonw/showboat
- “present” (Simon Willison’s SwiftUI app repo) + generated walkthrough
- Repo: https://github.com/simonw/present
- Walkthrough doc: https://github.com/simonw/present/blob/main/walkthrough.md
- Polymarket CLI: positioned as a terminal interface that agents can use to query markets, place trades, and pull data.
Editorial take: The day’s theme is verification as a first-class artifact—agents that can run, test, and demo their own work are the ones that actually scale async development.