Your intelligence agent for what matters
Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.
Your time, back
An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.
Save hours
AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.
Full control over the agent
Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.
Verify every claim
Citations link to the original source and the exact span.
Discover sources on autopilot
Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.
Multi-media sources
Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.
Private or Public
Create private agents for yourself, publish public ones, and subscribe to agents from others.
3 steps to your first brief
Describe your goal
Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.
Review and launch
Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.
Sam Altman
3Blue1Brown
Paul Graham
The Pragmatic Engineer
r/MachineLearning
Naval Ravikant
AI High Signal
Stratechery
Get your briefs
Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.
Sam Altman
Harry Stebbings
Andrew Ng
Funding & Deals
- Humble — $24M seed. Eclipse led with Energy Impact Partners participating. The thesis is architectural: remove the cab, get 360° sensor coverage, cut weight, and optimize for 40- and 53-foot intermodal containers in dock-to-dock routes. CEO Eyal Cohen previously worked at Apple, Uber ATG, and Waabi, and co-founded Spark AI, acquired by John Deere in 2023; he says the team reached a prototype in under six months.
- AstorInvest — $5M seed. YC says Astor is building an AI investment advisor for everyday investors that connects to a brokerage account, analyzes the portfolio, and delivers personalized recommendations. Reported early traction: thousands of users connected more than $200M in assets less than two months after launch.
- ComfyUI — $30M at a $500M valuation. Led by Craft Ventures with PaceCap, Chemistry, TruArrow, and others, the round is a useful open-infrastructure read-through: ComfyUI says it has 4M users, 60k+ community-built nodes, and 150k+ daily downloads, and plans to spend on Comfy Cloud, collaborative workflows, local UX, ecosystem reliability, and day-one model support while keeping the platform open.
Emerging Teams
- Medra — robotic biology infrastructure with an AI scientist layer. Michelle Lee’s company opened a 38,000 sq ft San Francisco lab where roughly 100 robotic arms run experiments continuously. Its core wedge is using computer vision and manipulation models on standard lab equipment, which Lee says can raise the share of biotech tasks that can be automated from 5% to 75%; the company frames itself as TSMC for biology. In one cited customer example, the AI scientist proposed adding a vortexing step and improved antibody binding from 0% to above 70%.
- OpenWork — open-source enterprise rollout layer for AI. YC describes OpenWork as an open-source alternative to Claude Cowork that supports existing agents, on-prem deployment, and any LLM provider. Early distribution is strong for this category: 14k GitHub stars and more than 150k downloads. YC highlighted founder Benjamin Shafii at launch.
- Burrow — runtime security for agents from an operator who saw the failure mode firsthand. The founder says he leads cloud security at a company processing $80B in annual payments and started building Burrow after an internal AI agent deleted a production S3 bucket with customer data. The product lets teams define agent controls in plain English, create alerts for agent deviation, and investigate or quarantine agents through its Lookout service.
- Opero — small but measurable early traction in WhatsApp-native agents. Three weeks in, the founder reports 25 users and 2 paying customers. The sharper product ideas are an LLM-evaluated signals system that only emits structured CRM webhooks when a user-defined condition is met, plus a self-improving loop where the owner answers one question and the agent stores the answer for future use; reported median turnaround is under 90 seconds.
AI & Tech Breakthroughs
- DeepSeek V4 pushes the long-context efficiency frontier again. A technical deep dive describes V4 Pro as a 1.6T-parameter model with 49B active parameters and a new DSA hybrid attention architecture. At 1M context, the post says compute cost per token falls to 27% of V3.2 and KV cache to 10%, while LiveCodeBench reached 93.5, above GPT-5.4 at 91.7 in the cited comparison. The same post notes a weak spot in world knowledge, with SimpleQA-Verified at 57.9 versus Gemini 3.1 Pro at 75.6, and says DeepSeek describes itself as still 3-6 months behind the frontier there; the release is MIT licensed, with a 284B Flash model and 13B active parameters available.
- Agentic workflows still look like a bigger lever than base-model upgrades. Andrew Ng argues that iterative loops such as outlining, critiquing, researching, and revising produce much better work than one-shot prompting, and says his team found the gain from adding agentic workflow to GPT-3.5 on a coding benchmark was larger than the gain from moving from GPT-3.5 to GPT-4. AI Fund says it has been helping portfolio companies deploy these workflows, and Ng separately pointed to CrewAI, AutoGen, and LangGraph as agent workflow platforms to watch.
- Runtime retrieval is starting to close the training-cutoff gap for coding agents. Paper Lantern says its MCP server lets coding agents pull implementation guidance from more than 2M computer-science papers at runtime. In its 9-task benchmark, 5 tasks improved meaningfully; Python test generation moved from 63% bug catch to 87% using mutation-aware prompting from retrieved papers, and contract extraction improved from 44% to 76% using March 2026 papers that post-dated model training. Across the benchmark, 10 of the 15 most-cited papers were from 2025 or later.
- Frontier model launches are not automatically collapsing specialist infra. Sam Altman said GPT-5.5 and GPT-5.5 Pro are now available in the API, but LlamaIndex said its ParseBench testing still showed mixed OCR results: GPT-5.5 won on tables and visual grounding, lost on charts, content faithfulness, and semantic formatting in some comparisons, and came with materially higher per-page pricing than LlamaParse’s cited 1.25¢ per page.
Market Signals
- Bifurcation is no longer theoretical. SaaStr, citing Sapphire data, says enterprise software captured 52% of all VC funding in 2025, up from 41% in 2024, and that 80+ AI-native companies have already reached $100M+ ARR in under 18 months. AI-native operating profiles are diverging sharply from classic B2B, with cited ranges of 200-400% ARR growth, 130-200% NDR, 40-70% gross margins, and $1M-$5M ARR per employee. The same report says the top 10 private enterprise software companies are worth $1.93T, more than the pure SaaS public index at $1.88T, while public enterprise software has lost $2.4T in market cap since the October 2025 peak and pure SaaS trades at 3.1x NTM revenue.
- Valuation froth looks concentrated, not universal. Elizabeth Yin says the current bubble is strongest in AI infrastructure, where companies can reach millions in revenue in weeks or months, while crowded horizontal AI tools can attract few or no investors. She expects the frothiness to cool in 1-2 years as low-hanging use cases are exhausted, CAC rises, adoption slows, and investors pull back; her advice to founders is to optimize for business quality, not ease of fundraising.
- Due diligence is shifting from topline to engagement and moats. Harry Stebbings argues that in B2B AI, MAUs, WAUs, and DAUs now matter more than revenue because flat usage can hide stealth churn, while Clement Delangue says investors have become too fixated on top revenue growth and need to return to moats, product quality, and differentiated usage.
- Seed investing still rewards volume, even in an AI-heavy cycle. Newcomer, citing Dealroom, says YC leads seed-stage investing with 94 companies that later reached $100M+ revenue and now backs roughly 500-600 startups per year. SV Angel follows a similar small-check, wide-net approach with around 50-100 new investments annually, while Sequoia stands out as the most successful non-accelerator seed fund. The same Newcomer item notes Bill Gurley’s view that the AI boom remains heavily subsidized by VC cash.
- Founders may be underestimating non-AI opportunities and overestimating coding as the bottleneck. Paul Graham says AI is the biggest opportunity for startup founders, but non-AI ideas may be the most underpriced because others overlook them and some later become much larger through AI. Garry Tan’s related point is that in AI companies, deciding what to build, for whom, and how to get adoption is harder than writing the software.
Worth Your Time
Andrew Ng on agentic workflows
He argues that iterative agentic workflows can create larger gains than the GPT-3.5-to-GPT-4 jump on coding benchmarks, and pairs that view with falling training costs and better inference hardware.
Diana Hu on building an AI-native company
Useful for founders thinking about closed-loop companies, queryable orgs, software factories, and lean teams built around an intelligence layer rather than management middleware.
DeepSeek V4 primary materials
The paper and model collection.
Paper Lantern benchmarks and demo
Open benchmark repo and product demo: GitHub and paperlantern.ai/code.
Learning mechanics
Kanjun highlighted a new paper that tries to name and organize an emerging scientific theory of deep learning, framing learning mechanics as the physics to mechanistic interpretability’s biology. Read the paper.
Elizabeth Yin’s valuation essay
Her thread links a fuller argument for why some AI valuations may be justified by revenue velocity while crowded horizontal categories may reset as competition intensifies. Read it here.
Riley Brown
Geoffrey Huntley
Salvatore Sanfilippo
🔥 TOP SIGNAL
Harness design is becoming the real differentiator. GPT-5.5 is being singled out for long-running code/data/tool work and natural job monitoring, Cursor 3.2 is shipping async subagents and worktrees for parallel background execution, and Salvatore Sanfilippo shows DeepSeek v4 can be dropped into Claude Code with an endpoint swap while still feeling close to recent closed frontier models in real coding-agent work. The practical takeaway: model quality still matters, but orchestration, migration discipline, and review boundaries are increasingly what decide whether an agent actually ships useful work.
🛠️ TOOLS & MODELS
- GPT-5.5 keeps spreading across dev surfaces. Cursor says it is now available there, tops CursorBench at 72.8%, and is 50% off through May 2. OpenRouter frames it as SOTA for long-running work across code, data, and tools. Romain Huet’s API note is the most practical framing: for developers, it gets complex tasks done with fewer tokens and fewer retries. Devin’s team says it runs longer and more autonomously than any GPT model they have tested.
- Cursor 3.2 = better orchestration, not just better chat. /multitask runs async subagents instead of queueing requests, can multitask already-queued messages, adds improved worktrees for isolated background tasks across branches, and supports multi-root workspaces for cross-repo sessions. Jediah Katz’s recommended pattern: dedicate an async subagent to monitor a background job.
- DeepSeek v4 Pro is the strongest open-weight coding story in today’s notes. In Salvatore Sanfilippo’s testing, the 1.6T MOE model with 49B active params and 1M context feels aligned with closed frontier models from roughly 3-6 months ago and is especially competent for software development. His Claude Code setup was simple: redefine endpoints with env vars or a shell script, and the test session cost about $1/hour in tokens. He also flags the caveat that benchmark gains are outpacing real-world gains, so don’t confuse leaderboard movement with proportional productivity jumps.
- DeepSeek v4 Flash is the local angle to watch. Sanfilippo says the smaller Flash variant is viable for local inference on a 512GB Mac Studio, while Pro output pricing was quoted at about $3.48/M output tokens and Flash is cheaper. His warning: local coding-agent stability depends heavily on sampling settings, or smaller models can get stuck in repetition loops.
- Current practitioner stack rankings are moving fast. Mckay Wrigley says his coding split flipped from 80/20 Claude/GPT to 80/20 GPT/Claude in under three months, and if he could keep one model for engineering right now it would be GPT-5.5. His tool read is blunt: Codex and Claude Code are T1, Cursor is T2 in his current workflow; Codex feels like an engineer, Claude more like a general-purpose coworker.
- Google Cloud’s internal harness story matters more than another public benchmark chart. Thomas Kurian says many engineers use the internal JetSki coding harness, that feedback flows directly into Gemini improvement, and Gemini is already used to scan for security issues before senior review and to troubleshoot cloud incidents by exposing tools/APIs to the model.
💡 WORKFLOWS & TRICKS
- When migrating to GPT-5.5, don’t treat it like a drop-in. OpenAI’s guidance is to start from the smallest prompt that preserves the product contract, then retune reasoning effort, verbosity, tool descriptions, and output format instead of hauling over your whole old prompt stack. If you want the lazy path, Simon Willison points to the Codex command “openai-docs migrate this project to gpt-5.5”, and Romain Huet explicitly suggests asking Codex to migrate a Responses API integration for you.
- Force a short status update before any tool calls on multi-step work. OpenAI’s prompting guide recommends a 1-2 sentence user-visible update that acknowledges the request and states the first step; Simon notes Codex already does this, and it makes long runs feel much less like the model crashed.
- Cursor’s best new pattern: spawn a watcher, not more queue. Use /multitask to create an async subagent, let it monitor a background job, and keep the main thread moving; queued messages can also be converted into multitasked work instead of waiting for the current run to finish.
- If you evaluate coding agents, steal Sanfilippo’s harness. Give the model a small but real codebase, a hard line-count budget, a non-trivial test suite, benchmark programs, and explicit anti-benchmaxing rules; then only count wins if speed improves with no regressions. His optimization hints—dual-ported objects, stack-machine expressions, fixed local-variable slots—show how to give strong priors without hand-writing the patch.
- Human-in-the-loop still wins at the PR boundary. Kent C. Dodds says he can let agents work through personal-but-complex software mostly on their own, then review the PRs when they are done. Google Cloud is running the same shape at org scale: model-first inspection, human peer review retained, and exploration of separate supervisor models for review.
- Measure output like Google does: functions shipped, not lines of code. Kurian’s point is simple: senior engineers write more compact code, so LoC is a bad productivity metric in an agent-heavy workflow.
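Sanfilippo’s win-counting rule above is easy to encode. A minimal sketch, assuming invented names (`RunResult`, `counts_as_win`) rather than his actual harness: a candidate patch only counts as a win if it stays inside the hard line-count budget, passes the full test suite, and beats the baseline on the benchmark.

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    tests_passed: int     # tests the patched code passes
    total_tests: int      # size of the suite
    loc: int              # lines of code after the patch
    bench_seconds: float  # benchmark wall time

def counts_as_win(baseline: RunResult, candidate: RunResult,
                  loc_budget: int) -> bool:
    """Only count a win if the patch fits the line budget, regresses
    nothing, and is measurably faster than the baseline."""
    if candidate.loc > loc_budget:
        return False                       # hard code-size limit
    if candidate.tests_passed < candidate.total_tests:
        return False                       # any failing test is a regression
    return candidate.bench_seconds < baseline.bench_seconds
```

The anti-benchmaxing constraint lives in the structure: speed alone never wins; it only counts after correctness and size are already satisfied.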
👤 PEOPLE TO WATCH
- Salvatore Sanfilippo — one of the few people doing repeated, same-task comparisons across frontier and open-weight models instead of screenshot benchmarks. His DeepSeek v4 tests and local-inference notes are useful because they include both wins and caveats.
- Jediah Katz — high-signal on agent UX right now. The useful detail today was not just that GPT-5.5 is strong, but that it is strong specifically at multitasking and monitoring long-running work—and Cursor is shipping around that behavior.
- Geoffrey Huntley — worth tracking for timeless agent patterns. His Ralph Wiggum memory-management loop is now built into Claude, Cursor, and Copilot, and his bigger point is that deliberate practice still separates casual use from real leverage.
- Kent C. Dodds — clean articulation of the end-state workflow: autonomous execution first, human review second.
- zeeg + ThePrimeagen — useful anti-hype filter.
"The state of the art is still ‘can we even one shot a production quality patch that we wont regret later’, and its rarer than you’d expect based on discourse."
Primeagen says he likes this framing not because he is anti-AI, but because obsessive prompt-chasing can wreck sleep, relationships, and life balance.
🎬 WATCH & LISTEN
- 07:47-12:37 — Salvatore Sanfilippo’s coding-agent benchmark design (Italian). Best technical segment of the day if you care about evaluation quality: a tiny interpreter, 70 tests, hard code-size limits, explicit speed targets, and anti-benchmaxing constraints.
- 15:13-17:17 — Why local coding agents still loop and degrade (Italian). Useful reality check on OMLX and local inference: fast runtimes are not enough if repetition penalties and sampling are off.
- 00:05-03:12 — Riley Brown on Codex + Remotion. Good hands-on walkthrough of why built-in plugins matter: one interface for prompts, code generation, and rendered artifacts, with a very copyable project setup.
📊 PROJECTS & REPOS
- Gondolin — sandbox project supporting QEMU, krun, and WASM on a branch. The interesting bet: its builders picked QEMU over Firecracker because they think future agents need “the computer they’ll actually need,” not just a thin function runtime.
- Ralph Wiggum loop — not new, still relevant. Huntley describes it as the memory-management technique now built into Claude, Cursor, and Copilot, and says it spread through YC startups in early 2024. The core pattern is still simple: keep appending working memory to an array and resend it to a stateless API in a loop.
- OMLX — MLX-based local inference tooling for Mac worth watching if you want local agent runs with larger open-weight models. The caveat is the point: speed is nice, but stable coding-agent behavior depends on tuned repetition penalties and sampling.
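The Ralph Wiggum pattern described above is only a few lines when written out. A sketch with a stubbed model call (a real loop would hit a stateless chat API; `stateless_model` and `run_loop` are illustrative names, not Huntley’s code):

```python
# Append-and-resend memory loop: the API is stateless, so the client keeps
# the entire working memory in a list and resends all of it on every call.
def stateless_model(history: list[dict]) -> str:
    # Stub: a real implementation would call a stateless chat API here.
    return f"reply #{sum(1 for m in history if m['role'] == 'user')}"

def run_loop(user_turns: list[str]) -> list[dict]:
    memory: list[dict] = []               # the whole context, every turn
    for turn in user_turns:
        memory.append({"role": "user", "content": turn})
        reply = stateless_model(memory)   # full memory resent each call
        memory.append({"role": "assistant", "content": reply})
    return memory
```

The only state lives client-side in `memory`; the model sees the full transcript fresh on every call, which is exactly why the loop works against any stateless endpoint.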
Editorial take: the edge is shifting from “who writes the prettiest diff” to “who can keep a long-running agent on the rails, visible to the user, and reviewable at the end.”
Cognition
GitHub
Tencent Hy
Top Stories
Why it matters: Today’s clearest signals were distribution, efficient long-context inference, and the compute race behind frontier models.
GPT-5.5 moved from launch to broad deployment. OpenAI made GPT-5.5 and GPT-5.5 Pro available in the API, including a 1M context window and a higher-accuracy Pro option in the Responses API. GitHub Copilot, Cursor, Perplexity Computer, and Devin also rolled it out or began using it as a default/orchestrator model. The recurring theme was efficiency: on Notion’s knowledge-work benchmark, GPT-5.5 was 33% faster than Opus 4.7 while using half the tokens, and on LisanBench it used about 45.6% fewer tokens than GPT-5.4-medium while scoring 1.77x higher.
DeepSeek V4 made open-weight competition look more like a systems story than a parameter story. At 1M context, V4-Pro uses 27% of V3.2’s single-token FLOPs and 10% of its KV cache, which DeepSeek commentators say can translate into far more concurrent long-context requests on the same hardware. Artificial Analysis says V4 Pro leads open-weight models on GDPval-AA at 1554, while V4 Flash shifts the price/performance frontier; it also reports very high hallucination rates for both models.
Alphabet deepened the compute war around Anthropic. Alphabet said it will invest up to an additional $40 billion in Anthropic and provide at least 5 GW of computing power. The business implication is straightforward: frontier competition is increasingly being financed as dedicated infrastructure, not just model R&D.
Research & Innovation
Why it matters: The most interesting research today focused on harder math, more reliable tool use, and longer-horizon memory for agents.
OpenAI linked GPT-5.5 to a new Ramsey-number result. Sebastien Bubeck said an internal version of GPT-5.5 proved that the ratio R(k,n+1)/R(k,n) tends to 1 for all fixed k, solving Erdős problem #1014; OpenAI also published a proof PDF and a Lean verification.
A new paper targeted the “MCP tax” in tool-heavy agents. “Tool Attention Is All You Need” proposes dynamic tool gating plus lazy schema loading; on a simulated 120-tool benchmark it cut tool tokens 95%, from 47.3k to 2.4k per turn, while raising effective context utilization from 24% to 91%.
StructMem argues agent memory needs maintenance, not just retrieval. The paper stores simple memories first, then consolidates them in the background into structured relationships across time and events, targeting a common long-horizon failure mode: losing the links between facts.
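The gating-plus-lazy-loading idea in the tool paper above can be sketched simply: keep only cheap one-line tool summaries in context every turn, and expand a full schema only for tools the gate selects. All names below (`gate_tools`, `build_tool_context`) are illustrative, not the paper’s code, and the gate is a keyword stub standing in for a learned router.

```python
import json

# Full schemas are expensive; summaries are cheap. Keep summaries in context
# and expand a schema only when a tool is gated in for the current turn.
TOOL_SCHEMAS = {
    "search_issues": {"type": "object", "properties": {"query": {"type": "string"}}},
    "create_ticket": {"type": "object", "properties": {"title": {"type": "string"}}},
    # ... imagine ~120 of these
}
TOOL_SUMMARIES = {name: f"{name}: callable tool" for name in TOOL_SCHEMAS}

def gate_tools(user_message: str) -> list[str]:
    # Stub gate: a real system would use a router or attention signal here.
    return [name for name in TOOL_SCHEMAS if name.split("_")[0] in user_message]

def build_tool_context(user_message: str) -> str:
    active = gate_tools(user_message)
    context = {
        "summaries": list(TOOL_SUMMARIES.values()),       # always present, cheap
        "schemas": {n: TOOL_SCHEMAS[n] for n in active},  # lazy, per turn
    }
    return json.dumps(context)
```

With 120 tools, the per-turn cost becomes 120 summaries plus a handful of full schemas, rather than 120 full schemas, which is the shape of the reported 47.3k-to-2.4k token reduction.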
Products & Launches
Why it matters: Product competition is shifting from raw model access toward orchestration, parallelism, and tighter user control.
- Cursor 3.2 added /multitask, letting async subagents run requests in parallel instead of queueing them, plus background worktrees and multi-root workspaces for cross-repo changes.
- Gemini API added collaborative planning for Deep Research: users can request a plan, refine it, and only then approve execution.
- Gemini’s April Drops bundled a native Mac app, Lyria 3 Pro music generation, NotebookLM integration, interactive visuals, and conversation branching fixes.
Industry Moves
Why it matters: Major companies kept buying compute, sovereignty, and distribution rather than waiting for the next model cycle.
- Cohere and Aleph Alpha said they are forming a transatlantic AI powerhouse anchored in Canada and Germany to build sovereign, enterprise-grade AI for businesses and governments.
- Meta and AWS agreed to bring tens of millions of AWS Graviton cores into Meta’s compute portfolio to scale Meta AI and agentic experiences.
- Cloud GPU scarcity is tightening again. Reporting from The Information says providers like Microsoft are diverting GPUs to internal teams or larger customers, leaving smaller AI startups scrambling.
Quick Takes
Why it matters: These smaller updates add texture to where models, agents, and benchmarks are moving next.
- Anthropic’s Project Deal let Claude agents negotiate for 69 employees; they closed 186 deals worth over $4,000, and Opus models got substantially better deals than Haiku models.
- Xiaomi’s MiMo V2.5 Pro hit 54 on the Artificial Analysis Intelligence Index, tying Kimi K2.6, and scored 1578 on GDPval-AA; weights are expected soon.
- ParseBench found GPT-5.5 strong on tables and visual grounding for enterprise OCR, but weaker on charts, faithfulness, and semantic formatting, at 5.93¢ to 13¢ per page.
- Tencent open-sourced Hy3 preview as a 295B A21B reasoning/agent model, and it is now live on Arena for public evaluation.
Tim Ferriss
What stood out
Two kinds of authentic recommendations emerged today: a reflective article Tim Ferriss still considers unusually impactful, and a technical Cloudflare post Tomasz Tunguz credits with shaping an AI-agent implementation. Ferriss also shared a compact reading stack for category design, audience-building, and wellbeing.
Most compelling recommendation
- Title: The Tail End
Content type: Blog post / article
Author/creator: Tim Urban
Link/URL: https://waitbutwhy.com/2015/12/the-tail-end.html
Who recommended it: Tim Ferriss, who said Matt Mullenweg first pointed him to it on a hike in San Francisco
Key takeaway: Ferriss said the piece uses diagrams to underscore how short life is and can prompt a rethink of personal priorities
Why it matters: This had the strongest personal endorsement in today’s set. Ferriss said that if you only read one article this month, it should be this one, and later called it one of the most impactful blog posts he has ever read
“It turns out that when I graduated from high school, I had already used up 93% of my in-person parent time. I’m now enjoying the last 5% of that time. We’re in the tail end.”
Highest-utility operator pick
- Title: Cloudflare blog post on code mode tools
Content type: Blog post
Author/creator: Cloudflare
Link/URL: No direct URL provided in the source material; source context: SF AI Engineers: inside Vision AI, Coding Agents + Rust Systems
Who recommended it: Tomasz Tunguz
Key takeaway: Tunguz said the post was the main reason they implemented a discovery API in which the agent first asks which tools and functions are available, then builds a plan; he said this significantly compressed tokens and acted as living documentation for the model
Why it matters: This was the clearest recommendation today with measurable implementation impact. Tunguz tied it to using smaller open-source models and reducing monthly errors from roughly 50,000 to 114
“the main reason we did this is Cloudflare published a blog post a little while ago on code mode tools.”
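Tunguz’s discovery-API pattern above reduces to a two-phase loop: the agent first asks which tools and functions are available, then builds a plan constrained to that list. A hedged sketch, with invented function names (the Cloudflare post describes the idea, not this code), where the discovery response doubles as living documentation for the model:

```python
# Phase 1: discovery — the agent asks what tools and functions exist.
def discover_tools() -> dict[str, str]:
    # In a real system this would be an API endpoint the agent calls; the
    # name-to-description map acts as living documentation for the model.
    return {
        "get_customer": "fetch a customer record by id",
        "send_invoice": "send an invoice to a customer",
    }

# Phase 2: planning — build a plan only from tools confirmed to exist,
# which rules out hallucinated calls and keeps prompt tokens small.
def build_plan(goal: str, tools: dict[str, str]) -> list[str]:
    goal_words = set(goal.lower().split())
    return [name for name, doc in tools.items()
            if set(doc.lower().split()) & goal_words]
```

The word-overlap planner is a stand-in for the model’s own reasoning; the structural point is that planning happens strictly after, and against, the discovery response.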
Ferriss’s compact stack for positioning and sanity
In the same conversation, Ferriss recommended durable reads for category design and audience-building, then separately pointed to exercise as a foundation for wellbeing.
“you’re competing in an algo chasing game ... the window for that working is going to close very quickly.”
- Title: Blue Ocean Strategy
Content type: Book
Author/creator: Not specified in the source material
Link/URL: No direct URL provided in the source material; source context: How to Stay Sane & Productive with Tim Ferris & Dr. Laurie Santos
Who recommended it: Tim Ferriss
Key takeaway: Ferriss said he would be reading it in response to a market where algorithm-dependent tactics may not work longitudinally
Why it matters: He positioned it as a way to think about durable differentiation instead of short-term reach hacks
- Title: The 22 Immutable Laws of Marketing
Content type: Book
Author/creator: Not specified in the source material
Link/URL: No direct URL provided in the source material; source context: How to Stay Sane & Productive with Tim Ferris & Dr. Laurie Santos
Who recommended it: Tim Ferriss
Key takeaway: He singled out the chapter on the law of category
Why it matters: It sat inside the same advice set on building trust and credibility without leaning on fragile algorithmic distribution
- Title: 1,000 True Fans
Content type: Essay
Author/creator: Kevin Kelly
Link/URL: No direct URL provided in the source material; source context: How to Stay Sane & Productive with Tim Ferris & Dr. Laurie Santos
Who recommended it: Tim Ferriss
Key takeaway: Ferriss said he would be reading it and added that many of the people who convert best right now may not think of themselves as creators
Why it matters: It complements his broader advice to build durable audience relationships rather than chase platform volatility
- Title: Spark
Content type: Book
Author/creator: Not specified in the source material
Link/URL: No direct URL provided in the source material; source context: How to Stay Sane & Productive with Tim Ferris & Dr. Laurie Santos
Who recommended it: Tim Ferriss
Key takeaway: Ferriss recommended it as a book on the effects of exercise on cognition while arguing that taking care of the body supports the brain and mind
Why it matters: It was the clearest wellbeing recommendation in today's set, aimed at sanity and performance rather than positioning alone
Bottom line
If you open one resource first, start with The Tail End for the strength and durability of Ferriss’s endorsement. If you want the most immediately applicable operator read, follow Tunguz’s pointer to Cloudflare’s code mode tools post via the talk context above.
Shreyas Doshi
signüll
Big Ideas
1) Build ahead of model capability, then strip the scaffolding
Anthropic's Claude Code team built code review before the models were accurate enough; because the prototype already existed, they could swap in newer Opus models and test the idea again as capability improved. The same team audits prompts and workflow crutches after model releases, removing features that weaker models once needed, such as to-do lists once Opus 4 could track work natively.
- Why it matters: Waiting for perfect model capability can leave a team behind, while keeping old scaffolding for too long creates product debt.
- How to apply: Build promising ideas to the point where a model swap can be tested immediately, then run a release-by-release audit of prompts, guardrails, and helper steps to remove what stronger models no longer need.
2) The scarce PM skill is discernment plus diagnosis
"now it’s: do you know what’s worth building, & can you feel when it’s wrong."
Shreyas Doshi says this discernment is learnable with the right mindset, but requires unlearning prior habits. Anthropic adds a concrete AI-native diagnostic move: ask the model to explain its own mistakes, because the answer can reveal a confusing system prompt or a subagent that failed to verify its work.
- Why it matters: As building gets cheaper, PM leverage shifts toward choosing the right problems and understanding why a system failed.
- How to apply: When results are poor, separate the diagnosis into three questions: was the bet wrong, was the prompt or harness wrong, or did the verification flow fail?
3) Treat your operating model like a product
Tim Herbig's framing is to treat ways of working like products: optimize for value over theoretical correctness, and connect strategy, OKRs, and discovery to the team's specific context.
- Why it matters: In fast-moving environments, process that looks correct but creates little value becomes a drag.
- How to apply: Review your recurring rituals the way you would review features: what job they serve, what value they create, and whether they should be kept, adapted, or removed in your context.
4) PM work is moving toward supervising fleets of AI tasks
Anthropic describes a progression from single successful tasks to running many tasks at once—eventually 50 or 100 simultaneously—which requires remote execution, better task-management interfaces, output verification, and self-improving feedback loops.
- Why it matters: The human role shifts from doing every task directly to deciding what to inspect, verifying outputs, and improving the system over time.
- How to apply: In your own AI workflows, explicitly separate task definition, execution, verification, and feedback so you can see where orchestration breaks first.
Tactical Playbook
1) Run a model-introspection debugging loop
- When the model makes an unexpected decision, ask it why it made that choice.
- Check whether the explanation points to a confusing system prompt.
- Check whether a subagent delegated verification but failed to actually verify the work.
- Fix the harness, then rerun the task.
- Why it matters: This turns vague model failure into a fixable prompt or orchestration problem.
- How to apply: Make introspection a standard part of AI-product QA, not an ad hoc trick used only when a launch is already off track.
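The introspection steps above can be sketched as a small loop. This is a hypothetical illustration: `ask_why` stands in for a real "why did you make that choice?" prompt to the model, and the cause labels are the three diagnosis buckets from this playbook, not Anthropic's actual tooling.

```python
# Model-introspection debugging loop: classify the model's own explanation
# into one of three causes, then pick the matching harness fix.
CAUSES = ("confusing_system_prompt", "subagent_skipped_verification", "bad_bet")

def ask_why(decision: str) -> str:
    # Stub: a real system would prompt the model for an explanation and
    # classify the answer. Here we pattern-match the decision text.
    if "unverified" in decision:
        return "subagent_skipped_verification"
    if "misread instructions" in decision:
        return "confusing_system_prompt"
    return "bad_bet"

def debug_step(decision: str) -> str:
    cause = ask_why(decision)
    assert cause in CAUSES
    fixes = {
        "confusing_system_prompt": "rewrite the system prompt, then rerun",
        "subagent_skipped_verification": "add an explicit verify step, then rerun",
        "bad_bet": "revisit whether the task was worth doing at all",
    }
    return fixes[cause]
```

The value is in forcing a classification: every failure exits the loop as either a prompt fix, an orchestration fix, or a decision to stop.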
2) Add a build-ahead and release-audit cycle
- Build versions of promising ideas that are "on the edge of working" instead of waiting for perfect model capability.
- When stronger models ship, swap them into the existing prototype immediately to test whether the capability gap has closed.
- After each major model release, audit prompts and workflow steps for scaffolding the model may no longer need.
- Remove the crutches that have turned into debt, as Anthropic did with Claude Code's to-do lists.
- Why it matters: The same operating loop helps teams capture upside faster and simplify products as models improve.
- How to apply: Put model-release reviews on the team calendar the same way you schedule launch retrospectives.
3) Audit PM process for value, not framework purity
- Pick one practice at a time—strategy reviews, OKRs, or discovery rituals.
- Ask what value it creates for the team rather than whether it matches a textbook model.
- Check whether it actually connects strategy, OKRs, and discovery in your context.
- Keep, adapt, or drop the practice based on that value test.
- Why it matters: It is easier to remove low-value process when the evaluation standard is usefulness, not orthodoxy.
- How to apply: Use this audit when a team is debating process changes but cannot explain what better outcomes the current process creates.
4) Be selective if you formalize decision memory
A Reddit thread highlighted a recurring problem: new PMs may not know why a decision was made, and teams can end up re-debating issues that were closed months earlier. A commenter also warned that trying to track everything can become "a death by a thousand cuts" or a liability in some industries.
- Why it matters: Decision memory can reduce ramp-up friction, but documenting every decision has real overhead and risk.
- How to apply: If you try to solve this, start with the decisions that most often cause onboarding delays or repeat debate, rather than exhaustive logging.
Case Studies & Lessons
1) Rakuten: one managed agent per department
Rakuten deployed one Claude Managed Agent for each department—engineering, product, sales, marketing, and finance—and each agent went live in under a week. Reported results were a 97% reduction in critical errors and a release cadence change from quarterly to biweekly. Aakash Gupta argues the old gating problem—sandboxed execution, credential vaulting, audit trails, and scoped permissions—was handled by Anthropic, letting Rakuten focus on defining the job each agent should do.
- Lesson: When infrastructure constraints move to the vendor, PM work shifts toward scoping, ownership, and adoption.
- How to apply: Start with a department-sized job to be done, define the agent's scope clearly, and do not assume the rollout still needs quarter-scale custom infrastructure work.
2) Claude Code: prototype early, simplify later
Claude Code's code review product failed multiple times because earlier models were not accurate enough, but the prototype was already built, so Anthropic could quickly test it again with Opus 4.5 and 4.6. As model capability improved, the team also removed legacy scaffolding that weaker models had needed.
- Lesson: In AI products, "not ready yet" can still be a reason to build the surrounding product shell if you expect model quality to improve.
- How to apply: For high-upside ideas blocked by current model performance, build enough of the experience, measurement, and harness that a better model can be evaluated immediately when it arrives.
Career Corner
1) Discernment is trainable, but it requires unlearning
Shreyas Doshi says the new question is not whether you can build it, but whether you know what is worth building and can feel when it is wrong. He also says this discernment is learnable with the right mindset, but requires unlearning prior teachings.
- Why it matters: AI raises the value of product judgment relative to delivery mechanics.
- How to apply: In your own work, review launches and misses with an explicit "what did I misread?" lens, not just a "what did we ship?" lens.
2) AI teams are hiring for resilience, not just PM fundamentals
Anthropic says it looks for people who can lean into chaos, stay optimistic, and tackle hard challenges without burning out as priorities change quickly.
- Why it matters: In high-velocity environments, the ability to keep operating through shifting priorities is itself a career asset.
- How to apply: In interviews, use examples that show calm execution under ambiguity, not only polished planning artifacts.
3) For PM interns, optimize for relationships, questions, and notes
Advice from former PM interns on Reddit focused on accepting limited direct impact, bringing curiosity and energy, attending events, setting up 3–5 new 1:1s each week, finding a mentor, keeping a running question list, and getting strong at note-taking. One commenter also recommended reading Inspired and Empowered.
- Why it matters: The advice prioritizes network-building, context gathering, and observation over trying to look like a fully formed PM on day one.
- How to apply: Build a simple weekly cadence: new 1:1s, one mentor conversation, one question list, and one clean set of meeting notes.
4) Customer-focus is a fair interview bar; surprise unpaid research is not
One Reddit candidate described preparing a take-home presentation, then being asked without prior notice which company customers they had interviewed; the interviewer reportedly argued that presentations are easy because AI tools can help, while talking to customers is the real value. Commenters suggested more valid alternatives: ask about the candidate's research sources and how trustworthy they are, or role-play a customer discovery conversation.
- Why it matters: Strong PM interviews should test discovery judgment, but the test itself should be explicit and job-relevant.
- How to apply: Clarify expected research inputs before take-homes, and use the interview design itself as a signal about how the company works.
Tools & Resources
1) Cat Wu on Claude Code PM practices
https://x.com/lennysan/status/2047669259380383955 covers build-ahead prototyping, model introspection, scaffolding audits, and the shift toward managing many AI tasks at once.
- Why explore it: It packages several concrete AI-native PM operating ideas in one place.
- How to use it: Review it with your team and decide which one change to test first: introspection debugging, build-ahead prototyping, or release audits for scaffolding.
2) Rakuten case study
http://claude.com/customers/rakuten is the source linked in Aakash Gupta's note about Rakuten's managed-agent rollout.
- Why explore it: It includes concrete reported outcomes—97% fewer critical errors and releases moving from quarterly to biweekly.
- How to use it: Use it to frame internal discussions around departmental scope, deployment speed, and where vendor infrastructure changes the rollout plan.
3) Anthropic automations deep dive
https://www.news.aakashg.com/p/claude-automation-pms is Aakash Gupta's deeper breakdown of Anthropic's automation surfaces.
- Why explore it: It translates the Rakuten example into a planning implication for PMs: the constraint may have moved from infrastructure to task definition.
- How to use it: Share it when stakeholders still assume an internal AI-agent deployment must be scoped as a multi-quarter engineering project.
4) Uncertainty-Driven Discovery
https://runthebusiness.substack.com/p/uncertainty-driven-discovery features Tim Herbig's argument for value-first product practices that fit context instead of rigid frameworks.
- Why explore it: It is useful when the team is debating process more than value.
- How to use it: Use it as a prompt for a retrospective on whether your current OKR and discovery routines are actually helping the team make better decisions.
GPT-5.5 becomes a distribution story
Microsoft and developer tools move quickly
OpenAI's new model moved into major work products quickly. Microsoft said GPT-5.5 is rolling out to GitHub Copilot, M365 Copilot, Copilot Studio, and Foundry, where it is positioned for deeper reasoning, stronger multistep execution, and better performance on long, complex tasks; in Copilot CLI, users can switch models by job, while the Rubber Duck agent adds a multi-model review loop. GitHub said GPT-5.5 is generally available and rolling out in Copilot, with early testing showing its strongest performance on complex agentic coding tasks and real-world coding challenges previous GPT models could not resolve; Cursor also said the model is now available there and currently leads CursorBench at 72.8%.
Why it matters: The notable development today is how quickly GPT-5.5 is being embedded into everyday coding and enterprise workflows.
Agent products adopt it as an execution layer
Cognition released GPT-5.5 in Devin as an Agent Preview, saying it runs longer and more autonomously than any GPT model it tested, surfaces bugs other models miss, and can investigate and fix production issues end-to-end. Perplexity is also rolling out GPT-5.5 as the default orchestrator model for Perplexity Computer, replacing Opus 4.7 as it monitors user sentiment during the rollout. OpenRouter said GPT-5.5 and GPT-5.5 Pro are live, describing GPT-5.5 as state of the art for long-running work across code, data, and tools, with Pro aimed at more complex reasoning and analysis.
Why it matters: This extends the story from a model launch to adoption inside products built for longer-running agent work.
Agents get a more concrete market test
Anthropic's negotiation experiment found demand and a hidden model gap
Anthropic said Claude agents interviewed 69 colleagues about what they wanted to buy and sell, then completed 186 deals worth more than $4,000; survey respondents generally viewed the outcomes as fair, and nearly half said they might pay for a service like this. The company also found that model quality mattered materially: Opus got substantially better deals than Haiku in simulated runs, while participants did not notice the gap, and Anthropic says AI-agent markets may create value but still have rough edges that policy and legal frameworks will need to address. It also logged some odd behavior, including one agent buying 19 ping-pong balls for itself and another buying a duplicate snowboard after inferring its user's taste from a casual mention of skiing.
Why it matters: This is a useful step beyond benchmark talk. It shows agents can transact in a small market, but it also shows that hidden model advantages can shape outcomes without users noticing.
Strategic moves beyond the model race
Cohere and Aleph Alpha pair up around sovereign AI
Cohere and Aleph Alpha said they are forming a transatlantic AI partnership anchored in Canada and Germany, combining Cohere's global scale with Aleph Alpha's European R&D to build sovereign, enterprise-grade AI with security, privacy, and trust as the focus. The announcement included executives from Cohere, Aleph Alpha, and Schwarz Digits alongside ministers from Canada and Germany, and Aidan Gomez framed the deal around deep Canada-Germany strategic backing and Germany's role as Europe's economic powerhouse.
Why it matters: This is a clear sign that sovereign AI is becoming a concrete strategy for competing for business and government demand.
Meta adds AWS compute to its AI portfolio
Meta said it has agreed with AWS to bring tens of millions of AWS Graviton cores into its compute portfolio, expanding the infrastructure behind Meta AI and its agentic experiences that serve billions of people.
Why it matters: As AI products become more agentic, infrastructure scale and supply diversification are becoming strategic differentiators.
ComfyUI raises to scale open creative tooling
ComfyUI said it raised $30 million at a $500 million valuation, bringing total funding to $47 million, and reported 4 million users, more than 60,000 community-built nodes, and more than 150,000 daily downloads. The company said the funding will go toward Comfy Cloud, collaborative workflows, a better local experience, more dependable node infrastructure, and day-one compatibility for major model releases, while emphasizing open infrastructure rather than a walled garden.
Why it matters: This is a notable funding signal for the open tooling layer that sits between fast-moving model releases and production creative workflows.