Your intelligence agent for what matters

Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.

Set up your agent
What should this agent keep you on top of?
Discovering sources...
Syncing sources 0/180...
Extracting information
Generating brief

Your time, back

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, podcasts, X accounts, Substack newsletters, Reddit communities, and blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

3 steps to your first brief

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Weekly report on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Startup funding digest with key venture capital trends
Weekly digest on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2

Review and launch

Your agent finds relevant channels and profiles based on your instructions. Review suggestions: keep what fits, remove what doesn't, and add your own. Launch when ready; you can adjust sources at any time.

Discovering sources...
Sam Altman (Profile)
3Blue1Brown (Channel)
Paul Graham (Account)
The Pragmatic Engineer (Newsletter)
r/MachineLearning (Community)
Naval Ravikant (Profile)
AI High Signal (List)
Stratechery (RSS)

3

Get your briefs

Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.

Humble’s Seed, Medra’s Lab, and DeepSeek V4’s Efficiency Push
Apr 25
7 min read
838 docs
Sam Altman
Harry Stebbings
Andrew Ng
+17
Two fresh financings and several emerging teams point to durable wedges in autonomy, biotech automation, enterprise AI rollout, and agent security. The larger pattern is a sharper split between AI-native winners with real pull and crowded categories where usage, moats, and cost structure matter more than topline alone.

Funding & Deals

  • Humble — $24M seed. Eclipse led with Energy Impact Partners participating. The thesis is architectural: remove the cab, get 360° sensor coverage, cut weight, and optimize for 40- and 53-foot intermodal containers in dock-to-dock routes. CEO Eyal Cohen previously worked at Apple, Uber ATG, and Waabi, and co-founded Spark AI, acquired by John Deere in 2023; he says the team reached a prototype in under six months.
  • AstorInvest — $5M seed. YC says Astor is building an AI investment advisor for everyday investors that connects to a brokerage account, analyzes the portfolio, and delivers personalized recommendations. Reported early traction: thousands of users connected more than $200M in assets less than two months after launch.
  • ComfyUI — $30M at a $500M valuation. Led by Craft Ventures with PaceCap, Chemistry, TruArrow, and others, the round is a useful open-infrastructure read-through: ComfyUI says it has 4M users, 60k+ community-built nodes, and 150k+ daily downloads, and plans to spend on Comfy Cloud, collaborative workflows, local UX, ecosystem reliability, and day-one model support while keeping the platform open.

Emerging Teams

  • Medra — robotic biology infrastructure with an AI scientist layer. Michelle Lee’s company opened a 38,000 sq ft San Francisco lab where roughly 100 robotic arms run experiments continuously. Its core wedge is using computer vision and manipulation models on standard lab equipment, which Lee says can raise the share of biotech tasks that can be automated from 5% to 75%; the company frames itself as TSMC for biology. In one cited customer example, the AI scientist proposed adding a vortexing step and improved antibody binding from 0% to above 70%.
  • OpenWork — open-source enterprise rollout layer for AI. YC describes OpenWork as an open-source alternative to Claude Cowork that supports existing agents, on-prem deployment, and any LLM provider. Early distribution is strong for this category: 14k GitHub stars and more than 150k downloads. YC highlighted founder Benjamin Shafii at launch.
  • Burrow — runtime security for agents from an operator who saw the failure mode firsthand. The founder says he leads cloud security at a company processing $80B in annual payments and started building Burrow after an internal AI agent deleted a production S3 bucket with customer data. The product lets teams define agent controls in plain English, create alerts for agent deviation, and investigate or quarantine agents through its Lookout service.
  • Opero — small but measurable early traction in WhatsApp-native agents. Three weeks in, the founder reports 25 users and 2 paying customers. The sharper product ideas are an LLM-evaluated signals system that only emits structured CRM webhooks when a user-defined condition is met, plus a self-improving loop where the owner answers one question and the agent stores the answer for future use; reported median turnaround is under 90 seconds.

AI & Tech Breakthroughs

  • DeepSeek V4 pushes the long-context efficiency frontier again. A technical deep dive describes V4 Pro as a 1.6T-parameter model with 49B active parameters and a new DSA hybrid attention architecture. At 1M context, the post says compute cost per token falls to 27% of V3.2 and KV cache to 10%, while LiveCodeBench reached 93.5, above GPT-5.4 at 91.7 in the cited comparison. The same post notes a weak spot in world knowledge, with SimpleQA-Verified at 57.9 versus Gemini 3.1 Pro at 75.6, and says DeepSeek describes itself as still 3-6 months behind the frontier there; the release is MIT licensed, with a 284B Flash model and 13B active parameters available.
  • Agentic workflows still look like a bigger lever than base-model upgrades. Andrew Ng argues that iterative loops such as outlining, critiquing, researching, and revising produce much better work than one-shot prompting, and says his team found the gain from adding agentic workflow to GPT-3.5 on a coding benchmark was larger than the gain from moving from GPT-3.5 to GPT-4. AI Fund says it has been helping portfolio companies deploy these workflows, and Ng separately pointed to CrewAI, AutoGen, and LangGraph as agent workflow platforms to watch.
  • Runtime retrieval is starting to close the training-cutoff gap for coding agents. Paper Lantern says its MCP server lets coding agents pull implementation guidance from more than 2M computer-science papers at runtime. In its 9-task benchmark, 5 tasks improved meaningfully; Python test generation moved from 63% bug catch to 87% using mutation-aware prompting from retrieved papers, and contract extraction improved from 44% to 76% using March 2026 papers that post-dated model training. Across the benchmark, 10 of the 15 most-cited papers were from 2025 or later.
  • Frontier model launches are not automatically collapsing specialist infra. Sam Altman said GPT-5.5 and GPT-5.5 Pro are now available in the API, but LlamaIndex said its ParseBench testing still showed mixed OCR results: GPT-5.5 won on tables and visual grounding, lost on charts, content faithfulness, and semantic formatting in some comparisons, and came with materially higher per-page pricing than LlamaParse’s cited 1.25¢ per page.

Market Signals

  • Bifurcation is no longer theoretical. SaaStr, citing Sapphire data, says enterprise software captured 52% of all VC funding in 2025, up from 41% in 2024, and that 80+ AI-native companies have already reached $100M+ ARR in under 18 months. AI-native operating profiles are diverging sharply from classic B2B, with cited ranges of 200-400% ARR growth, 130-200% NDR, 40-70% gross margins, and $1M-$5M ARR per employee. The same report says the top 10 private enterprise software companies are worth $1.93T, more than the pure SaaS public index at $1.88T, while public enterprise software has lost $2.4T in market cap since the October 2025 peak and pure SaaS trades at 3.1x NTM revenue.
  • Valuation froth looks concentrated, not universal. Elizabeth Yin says the current bubble is strongest in AI infrastructure, where companies can reach millions in revenue in weeks or months, while crowded horizontal AI tools can attract few or no investors. She expects the frothiness to cool in 1-2 years as low-hanging use cases are exhausted, CAC rises, adoption slows, and investors pull back; her advice to founders is to optimize for business quality, not ease of fundraising.
  • Due diligence is shifting from topline to engagement and moats. Harry Stebbings argues that in B2B AI, MAUs, WAUs, and DAUs now matter more than revenue because flat usage can hide stealth churn, while Clement Delangue says investors have become too fixated on top revenue growth and need to return to moats, product quality, and differentiated usage.
  • Seed investing still rewards volume, even in an AI-heavy cycle. Newcomer, citing Dealroom, says YC leads seed-stage investing with 94 companies that later reached $100M+ revenue and now backs roughly 500-600 startups per year. SV Angel follows a similar small-check, wide-net approach with around 50-100 new investments annually, while Sequoia stands out as the most successful non-accelerator seed fund. The same Newcomer item notes Bill Gurley’s view that the AI boom remains heavily subsidized by VC cash.
  • Founders may be underestimating non-AI opportunities and overestimating coding as the bottleneck. Paul Graham says AI is the biggest opportunity for startup founders, but non-AI ideas may be the most underpriced because others overlook them and some later become much larger through AI. Garry Tan’s related point is that in AI companies, deciding what to build, for whom, and how to get adoption is harder than writing the software.

Worth Your Time

Andrew Ng on agentic workflows

He argues that iterative agentic workflows can create larger gains than the GPT-3.5-to-GPT-4 jump on coding benchmarks, and pairs that view with falling training costs and better inference hardware.
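
The loop Ng describes can be sketched in a few lines of control flow. Here `llm` is a deterministic stand-in for whatever chat-completion call you actually use, and the round count is arbitrary; this shows the outline-critique-revise shape, not any specific framework's API:

```python
def llm(prompt: str) -> str:
    # Deterministic stand-in for a real chat-completion call (hypothetical).
    return f"response to: {prompt}"

def agentic_draft(task: str, rounds: int = 2) -> str:
    """Iterative outline -> draft -> critique -> revise loop,
    the pattern Ng contrasts with one-shot prompting."""
    outline = llm(f"Outline a plan for: {task}")
    draft = llm(f"Write a first draft following this outline: {outline}")
    for _ in range(rounds):
        critique = llm(f"Critique this draft for errors and gaps: {draft}")
        draft = llm(f"Revise the draft to address this critique: {critique}")
    return draft
```

The claim in the talk is about output quality per model generation; the sketch only shows the control flow, not the evaluation.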

Diana Hu on building an AI-native company

Useful for founders thinking about closed-loop companies, queryable orgs, software factories, and lean teams built around an intelligence layer rather than management middleware.

DeepSeek V4 primary materials

The paper and model collection.

Paper Lantern benchmarks and demo

Open benchmark repo and product demo: GitHub and paperlantern.ai/code.

Learning mechanics

Kanjun highlighted a new paper that tries to name and organize an emerging scientific theory of deep learning, framing learning mechanics as the physics to mechanistic interpretability’s biology. Read the paper.

Elizabeth Yin’s valuation essay

Her thread links a fuller argument for why some AI valuations may be justified by revenue velocity while crowded horizontal categories may reset as competition intensifies. Read it here.

Harnesses Matter Now: Cursor Multitasks, GPT-5.5 Spreads, DeepSeek v4 Gets Real
Apr 25
6 min read
155 docs
Riley Brown
Geoffrey Huntley
Salvatore Sanfilippo
+16
The clearest signal today is that coding agents are becoming an orchestration problem as much as a model problem. This brief covers Cursor 3.2’s async agent upgrades, practical GPT-5.5 migration tactics, DeepSeek v4’s strongest hands-on review yet, and the workflow patterns serious users are actually keeping.

🔥 TOP SIGNAL

Harness design is becoming the real differentiator. GPT-5.5 is being singled out for long-running code/data/tool work and natural job monitoring, Cursor 3.2 is shipping async subagents and worktrees for parallel background execution, and Salvatore Sanfilippo shows DeepSeek v4 can be dropped into CloudCode with an endpoint swap while still feeling close to recent closed frontier models in real coding-agent work. The practical takeaway: model quality still matters, but orchestration, migration discipline, and review boundaries are increasingly what decide whether an agent actually ships useful work.

🛠️ TOOLS & MODELS

  • GPT-5.5 keeps spreading across dev surfaces. Cursor says it is now available there, tops CursorBench at 72.8%, and is 50% off through May 2. OpenRouter frames it as SOTA for long-running work across code, data, and tools. Romain Huet’s API note is the most practical framing: for developers, it gets complex tasks done with fewer tokens and fewer retries. Devin’s team says it runs longer and more autonomously than any GPT model they have tested.
  • Cursor 3.2 = better orchestration, not just better chat. /multitask runs async subagents instead of queueing requests, can multitask already-queued messages, adds improved worktrees for isolated background tasks across branches, and supports multi-root workspaces for cross-repo sessions. Jediah Katz’s recommended pattern: dedicate an async subagent to monitor a background job.
  • DeepSeek v4 Pro is the strongest open-weight coding story in today’s notes. In Salvatore Sanfilippo’s testing, the 1.6T MoE model with 49B active params and 1M context feels aligned with closed frontier models from roughly 3-6 months ago and is especially competent for software development. His CloudCode setup was simple: redefine endpoints with env vars/shell script, and the test session cost about $1/hour in tokens. He also flags the caveat that benchmark gains are outpacing real-world gains, so don’t confuse leaderboard movement with proportional productivity jumps.
  • DeepSeek v4 Flash is the local angle to watch. Sanfilippo says the smaller Flash variant is viable for local inference on a 512GB Mac Studio, while Pro output pricing was quoted at about $3.48/M output tokens and Flash is cheaper. His warning: local coding-agent stability depends heavily on sampling settings, or smaller models can get stuck in repetition loops.
  • Current practitioner stack rankings are moving fast. Mckay Wrigley says his coding split flipped from 80/20 Claude/GPT to 80/20 GPT/Claude in under three months, and if he could keep one model for engineering right now it would be GPT-5.5. His tool read is blunt: Codex and Claude Code are T1, Cursor is T2 in his current workflow; Codex feels like an engineer, Claude more like a general-purpose coworker.
  • Google Cloud’s internal harness story matters more than another public benchmark chart. Thomas Kurian says many engineers use the internal JetSki coding harness, that feedback flows directly into Gemini improvement, and that Gemini is already used to scan for security issues before senior review and to troubleshoot cloud incidents by exposing tools/APIs to the model.

💡 WORKFLOWS & TRICKS

  • When migrating to GPT-5.5, don’t treat it like a drop-in. OpenAI’s guidance is to start from the smallest prompt that preserves the product contract, then retune reasoning effort, verbosity, tool descriptions, and output format instead of hauling over your whole old prompt stack. If you want the lazy path, Simon Willison points to the Codex command “openai-docs migrate this project to gpt-5.5”, and Romain Huet explicitly suggests asking Codex to migrate a Responses API integration for you.
  • Force a short status update before any tool calls on multi-step work. OpenAI’s prompting guide recommends a 1-2 sentence user-visible update that acknowledges the request and states the first step; Simon notes Codex already does this, and it makes long runs feel much less like the model crashed.
  • Cursor’s best new pattern: spawn a watcher, not more queue. Use /multitask to create an async subagent, let it monitor a background job, and keep the main thread moving; queued messages can also be converted into multitasked work instead of waiting for the current run to finish.
  • If you evaluate coding agents, steal Sanfilippo’s harness. Give the model a small but real codebase, a hard line-count budget, a non-trivial test suite, benchmark programs, and explicit anti-benchmaxing rules; then only count wins if speed improves with no regressions. His optimization hints—dual-ported objects, stack-machine expressions, fixed local-variable slots—show how to give strong priors without hand-writing the patch.
  • Human-in-the-loop still wins at the PR boundary. Kent C. Dodds says he can let agents work through personal-but-complex software mostly on their own, then review the PRs when they are done. Google Cloud is running the same shape at org scale: model-first inspection, human peer review retained, and exploration of separate supervisor models for review.
  • Measure output like Google does: functions shipped, not lines of code. Kurian’s point is simple: senior engineers write more compact code, so LoC is a bad productivity metric in an agent-heavy workflow.
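
Sanfilippo’s “only count wins if speed improves with no regressions” rule reduces to a small predicate worth hard-coding into any harness. A minimal sketch; the field names are illustrative, not taken from his setup:

```python
def is_win(baseline: dict, candidate: dict) -> bool:
    # A candidate patch counts only if the full test suite still
    # passes and the benchmark got strictly faster than baseline.
    no_regressions = (
        candidate["tests_passed"] == candidate["tests_total"]
        and candidate["tests_total"] >= baseline["tests_total"]
    )
    faster = candidate["bench_seconds"] < baseline["bench_seconds"]
    return no_regressions and faster
```

Scoring this way makes “faster but breaks one test” and “passes but no speedup” both losses, which is the point of the anti-benchmaxing rules.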

👤 PEOPLE TO WATCH

  • Salvatore Sanfilippo — one of the few people doing repeated, same-task comparisons across frontier and open-weight models instead of screenshot benchmarks. His DeepSeek v4 tests and local-inference notes are useful because they include both wins and caveats.
  • Jediah Katz — high-signal on agent UX right now. The useful detail today was not just that GPT-5.5 is strong, but that it is strong specifically at multitasking and monitoring long-running work—and Cursor is shipping around that behavior.
  • Geoffrey Huntley — worth tracking for timeless agent patterns. His Ralph Wiggum memory-management loop is now built into Claude, Cursor, and Copilot, and his bigger point is that deliberate practice still separates casual use from real leverage.
  • Kent C. Dodds — clean articulation of the end-state workflow: autonomous execution first, human review second.
  • zeeg + ThePrimeagen — useful anti-hype filter.

"The state of the art is still ‘can we even one shot a production quality patch that we wont regret later’, and its rarer than you’d expect based on discourse."

Primeagen says he likes this framing not because he is anti-AI, but because obsessive prompt-chasing can wreck sleep, relationships, and life balance.

🎬 WATCH & LISTEN

  • 07:47-12:37 — Salvatore Sanfilippo’s coding-agent benchmark design (Italian). Best technical segment of the day if you care about evaluation quality: a tiny interpreter, 70 tests, hard code-size limits, explicit speed targets, and anti-benchmaxing constraints.
  • 15:13-17:17 — Why local coding agents still loop and degrade (Italian). Useful reality check on OMLX and local inference: fast runtimes are not enough if repetition penalties and sampling are off.
  • 00:05-03:12 — Riley Brown on Codex + Remotion. Good hands-on walkthrough of why built-in plugins matter: one interface for prompts, code generation, and rendered artifacts, with a very copyable project setup.

📊 PROJECTS & REPOS

  • Gondolin — sandbox project supporting QEMU, krun, and WASM on a branch. The interesting bet: its builders picked QEMU over Firecracker because they think future agents need “the computer they’ll actually need,” not just a thin function runtime.
  • Ralph Wiggum loop — not new, still relevant. Huntley describes it as the memory-management technique now built into Claude, Cursor, and Copilot, and says it spread through YC startups in early 2024. The core pattern is still simple: keep appending working memory to an array and resend it to a stateless API in a loop.
  • OMLX — MLX-based local inference tooling for Mac worth watching if you want local agent runs with larger open-weight models. The caveat is the point: speed is nice, but stable coding-agent behavior depends on tuned repetition penalties and sampling.
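
As described, the Ralph Wiggum loop is nothing more than a conversation buffer resent in full on every call. A minimal sketch, with `complete` standing in for any stateless chat API (all names here are illustrative):

```python
def complete(messages: list) -> str:
    # Stand-in for a stateless chat-completion API (hypothetical):
    # the server keeps no state, so the full history arrives every call.
    return f"reply #{len(messages)}"

def run_loop(user_turns: list) -> list:
    memory = []                      # working memory, appended forever
    for turn in user_turns:
        memory.append({"role": "user", "content": turn})
        reply = complete(memory)     # resend the whole array each time
        memory.append({"role": "assistant", "content": reply})
    return memory
```

Because the API holds no state, context cost grows with every turn, which is exactly why memory-management variants of the pattern exist.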

Editorial take: the edge is shifting from “who writes the prettiest diff” to “who can keep a long-running agent on the rails, visible to the user, and reviewable at the end.”

GPT-5.5 Spreads Across AI Products as DeepSeek Pushes 1M Context and Alphabet Backs Anthropic
Apr 25
4 min read
969 docs
Cognition
GitHub
Tencent Hy
+21
OpenAI’s GPT-5.5 spread quickly into APIs and developer products, while DeepSeek’s V4 release sharpened the debate around efficient million-token inference. The day also brought a massive new compute commitment to Anthropic, practical agent research, and a fresh round of product workflow upgrades.

Top Stories

Why it matters: Today’s clearest signals were distribution, efficient long-context inference, and the compute race behind frontier models.

  • GPT-5.5 moved from launch to broad deployment. OpenAI made GPT-5.5 and GPT-5.5 Pro available in the API, including a 1M context window and a higher-accuracy Pro option in the Responses API. GitHub Copilot, Cursor, Perplexity Computer, and Devin also rolled it out or began using it as a default/orchestrator model. The recurring theme was efficiency: on Notion’s knowledge-work benchmark, GPT-5.5 was 33% faster than Opus 4.7 while using half the tokens, and on LisanBench it used about 45.6% fewer tokens than GPT-5.4-medium while scoring 1.77x higher.

  • DeepSeek V4 made open-weight competition look more like a systems story than a parameter story. At 1M context, V4-Pro uses 27% of V3.2’s single-token FLOPs and 10% of its KV cache, which DeepSeek commentators say can translate into far more concurrent long-context requests on the same hardware. Artificial Analysis says V4 Pro leads open-weight models on GDPval-AA at 1554, while V4 Flash shifts the price/performance frontier; it also reports very high hallucination rates for both models.

  • Alphabet deepened the compute war around Anthropic. Alphabet said it will invest up to an additional $40 billion in Anthropic and provide at least 5 GW of computing power. The business implication is straightforward: frontier competition is increasingly being financed as dedicated infrastructure, not just model R&D.

Research & Innovation

Why it matters: The most interesting research today focused on harder math, more reliable tool use, and longer-horizon memory for agents.

  • OpenAI linked GPT-5.5 to a new Ramsey-number result. Sebastien Bubeck said an internal version of GPT-5.5 proved that the ratio R(k,n+1)/R(k,n) tends to 1 for all fixed k, solving Erdős problem #1014; OpenAI also published a proof PDF and a Lean verification.

  • A new paper targeted the MCP tax in tool-heavy agents. “Tool Attention Is All You Need” proposes dynamic tool gating plus lazy schema loading; on a simulated 120-tool benchmark it cut tool tokens 95%, from 47.3k to 2.4k per turn, while raising effective context utilization from 24% to 91%.

  • StructMem argues agent memory needs maintenance, not just retrieval. The paper stores simple memories first, then consolidates them in the background into structured relationships across time and events, targeting a common long-horizon failure mode: losing the links between facts.
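
The gating idea in the tool-tax item above can be sketched as a two-stage filter: keep cheap one-line tool descriptions in context at all times, and load full schemas only for the few tools a relevance pass selects. Everything below (tool names, the keyword scoring, `top_k`) is illustrative, not the paper’s actual method:

```python
TOOL_INDEX = {  # cheap metadata, always in context
    "search_flights": "find flights between two airports",
    "book_hotel": "reserve a hotel room",
    "convert_currency": "convert an amount between currencies",
}

FULL_SCHEMAS = {  # expensive JSON schemas, loaded lazily
    name: {"name": name, "parameters": {"type": "object"}}
    for name in TOOL_INDEX
}

def gate_tools(query: str, top_k: int = 2) -> list:
    # Stage 1: score tools by crude keyword overlap with the query
    # (a real system would use embeddings or a small model).
    words = set(query.lower().split())
    scored = sorted(
        TOOL_INDEX,
        key=lambda n: -len(words & set(TOOL_INDEX[n].split())),
    )
    # Stage 2: load full schemas only for the gated subset.
    return [FULL_SCHEMAS[n] for n in scored[:top_k]]
```

The token savings come from the schemas being the expensive part: the index stays small no matter how many tools exist, while only the survivors pay full schema cost per turn.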

Products & Launches

Why it matters: Product competition is shifting from raw model access toward orchestration, parallelism, and tighter user control.

  • Cursor 3.2 added /multitask, letting async subagents run requests in parallel instead of queueing them, plus background worktrees and multi-root workspaces for cross-repo changes.
  • Gemini API added collaborative planning for Deep Research: users can request a plan, refine it, and only then approve execution.
  • Gemini’s April Drops bundled a native Mac app, Lyria 3 Pro music generation, NotebookLM integration, interactive visuals, and conversation branching fixes.

Industry Moves

Why it matters: Major companies kept buying compute, sovereignty, and distribution rather than waiting for the next model cycle.

  • Cohere and Aleph Alpha said they are forming a transatlantic AI powerhouse anchored in Canada and Germany to build sovereign, enterprise-grade AI for businesses and governments.
  • Meta and AWS agreed to bring tens of millions of AWS Graviton cores into Meta’s compute portfolio to scale Meta AI and agentic experiences.
  • Cloud GPU scarcity is tightening again. Reporting from The Information says providers like Microsoft are diverting GPUs to internal teams or larger customers, leaving smaller AI startups scrambling.

Quick Takes

Why it matters: These smaller updates add texture to where models, agents, and benchmarks are moving next.

  • Anthropic’s Project Deal let Claude agents negotiate for 69 employees; they closed 186 deals worth over $4,000, and Opus models got substantially better deals than Haiku models.
  • Xiaomi’s MiMo V2.5 Pro hit 54 on the Artificial Analysis Intelligence Index, tying Kimi K2.6, and scored 1578 on GDPval-AA; weights are expected soon.
  • ParseBench found GPT-5.5 strong on tables and visual grounding for enterprise OCR, but weaker on charts, faithfulness, and semantic formatting, at 5.93¢ to 13¢ per page.
  • Tencent open-sourced Hy3 preview as a 295B A21B reasoning/agent model, and it is now live on Arena for public evaluation.

The Tail End, Cloudflare’s Code Mode Tools Post, and Ferriss’s Positioning Stack
Apr 25
4 min read
192 docs
Tim Ferriss
Tomasz Tunguz
Tim Ferriss
+2
Today’s strongest organic recommendations split between Tim Ferriss’s enduring reads—one reflective essay and a compact stack on positioning, audience-building, and wellbeing—and Tomasz Tunguz’s Cloudflare engineering pick tied to a concrete AI-agent implementation win.

What stood out

Two kinds of authentic recommendations emerged today: a reflective article Tim Ferriss still considers unusually impactful, and a technical Cloudflare post Tomasz Tunguz credits with shaping an AI-agent implementation. Ferriss also shared a compact reading stack for category design, audience-building, and wellbeing.

Most compelling recommendation

  • Title: The Tail End
    Content type: Blog post / article
    Author/creator: Tim Urban
    Link/URL: https://waitbutwhy.com/2015/12/the-tail-end.html
    Who recommended it: Tim Ferriss, who said Matt Mullenweg first pointed him to it on a hike in San Francisco
    Key takeaway: Ferriss said the piece uses diagrams to underscore how short life is and can prompt a rethink of personal priorities
    Why it matters: This had the strongest personal endorsement in today’s set. Ferriss said that if you only read one article this month, it should be this one, and later called it one of the most impactful blog posts he has ever read

“It turns out that when I graduated from high school, I had already used up 93% of my in-person parent time. I’m now enjoying the last 5% of that time. We’re in the tail end.”

Highest-utility operator pick

  • Title: Cloudflare blog post on code mode tools
    Content type: Blog post
    Author/creator: Cloudflare
    Link/URL: No direct URL provided in the source material; source context: SF AI Engineers: inside Vision AI, Coding Agents + Rust Systems
    Who recommended it: Tomasz Tunguz
    Key takeaway: Tunguz said the post was the main reason they implemented a discovery API in which the agent first asks which tools and functions are available, then builds a plan; he said this significantly compressed tokens and acted as living documentation for the model
    Why it matters: This was the clearest recommendation today with measurable implementation impact. Tunguz tied it to using smaller open-source models and reducing monthly errors from roughly 50,000 to 114

“the main reason we did this is Cloudflare published a blog post a little while ago on code mode tools.”
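
The discovery pattern Tunguz describes inverts the usual setup: rather than embedding every tool schema in the prompt, the agent’s first call asks what is available and plans from the answer. A toy sketch, where every endpoint and tool name is hypothetical and the “planning” is reduced to keyword overlap a real agent would delegate to the model:

```python
def list_capabilities() -> list:
    # Discovery endpoint: the agent calls this first, so the prompt
    # never needs to embed every tool definition up front.
    return [
        {"name": "get_invoice", "summary": "fetch an invoice by id"},
        {"name": "send_email", "summary": "send an email to a customer"},
    ]

def build_plan(goal: str) -> list:
    # Step 1: ask which tools and functions are available.
    caps = list_capabilities()
    # Step 2: build a plan using only discovered capabilities
    # (here via crude keyword overlap; a model would do this in practice).
    words = set(goal.lower().split())
    return [c["name"] for c in caps if words & set(c["summary"].split())]
```

The token win comes from the prompt carrying one discovery instruction instead of the full catalog, and the endpoint doubles as living documentation because its answer always reflects what is actually deployed.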

Ferriss’s compact stack for positioning and sanity

In the same conversation, Ferriss recommended durable reads for category design and audience-building, then separately pointed to exercise as a foundation for wellbeing.

“you’re competing in an algo chasing game ... the window for that working is going to close very quickly.”

  • Title: Blue Ocean Strategy
    Content type: Book
    Author/creator: Not specified in the source material
    Link/URL: No direct URL provided in the source material; source context: How to Stay Sane & Productive with Tim Ferris & Dr. Laurie Santos
    Who recommended it: Tim Ferriss
    Key takeaway: Ferriss said he would be reading it in response to a market where algorithm-dependent tactics may not work longitudinally
    Why it matters: He positioned it as a way to think about durable differentiation instead of short-term reach hacks

  • Title: The 22 Immutable Laws of Marketing
    Content type: Book
    Author/creator: Not specified in the source material
    Link/URL: No direct URL provided in the source material; source context: How to Stay Sane & Productive with Tim Ferris & Dr. Laurie Santos
    Who recommended it: Tim Ferriss
    Key takeaway: He singled out the chapter on the law of category
    Why it matters: It sat inside the same advice set on building trust and credibility without leaning on fragile algorithmic distribution

  • Title: 1,000 True Fans
    Content type: Essay
    Author/creator: Kevin Kelly
    Link/URL: No direct URL provided in the source material; source context: How to Stay Sane & Productive with Tim Ferris & Dr. Laurie Santos
    Who recommended it: Tim Ferriss
    Key takeaway: Ferriss said he would be reading it and added that many of the people who convert best right now may not think of themselves as creators
    Why it matters: It complements his broader advice to build durable audience relationships rather than chase platform volatility

  • Title: Spark
    Content type: Book
    Author/creator: Not specified in the source material
    Link/URL: No direct URL provided in the source material; source context: How to Stay Sane & Productive with Tim Ferris & Dr. Laurie Santos
    Who recommended it: Tim Ferriss
    Key takeaway: Ferriss recommended it as a book on the effects of exercise on cognition while arguing that taking care of the body supports the brain and mind
    Why it matters: It was the clearest wellbeing recommendation in today’s set, aimed at sanity and performance rather than positioning alone

Bottom line

If you open one resource first, start with The Tail End for the strength and durability of Ferriss’s endorsement. If you want the most immediately applicable operator read, follow Tunguz’s pointer to Cloudflare’s code mode tools post via the talk context above.

Managed Agents, Model Introspection, and the Next PM Operating Loop
Apr 25
9 min read
148 docs
Shreyas Doshi
signüll
Run the Business
+3
This issue focuses on newer AI-era PM operating patterns: build ahead of model capability, use model introspection to debug failures, and rethink process based on value. It also covers Rakuten's managed-agent rollout, early-career PM tactics, and interview signals worth watching.

Big Ideas

1) Build ahead of model capability, then strip the scaffolding

Anthropic's Claude Code team built code review before the models were accurate enough; because the prototype already existed, they could swap in newer Opus models and test the idea again as capability improved. The same team audits prompts and workflow crutches after model releases, removing features that weaker models once needed, such as to-do lists once Opus 4 could track work natively.

  • Why it matters: Waiting for perfect model capability can leave a team behind, while keeping old scaffolding for too long creates product debt .
  • How to apply: Build promising ideas to the point where a model swap can be tested immediately, then run a release-by-release audit of prompts, guardrails, and helper steps to remove what stronger models no longer need .

2) The scarce PM skill is discernment plus diagnosis

"now it’s: do you know what’s worth building, & can you feel when it’s wrong."

Shreyas Doshi says this discernment is learnable with the right mindset, but requires unlearning prior habits. Anthropic adds a concrete AI-native diagnostic move: ask the model to explain its own mistakes, because the answer can reveal a confusing system prompt or a subagent that failed to verify its work.

  • Why it matters: As building gets cheaper, PM leverage shifts toward choosing the right problems and understanding why a system failed.
  • How to apply: When results are poor, separate the diagnosis into three questions: was the bet wrong, was the prompt or harness wrong, or did the verification flow fail?

3) Treat your operating model like a product

Tim Herbig's framing is to treat ways of working like products: optimize for value over theoretical correctness, and connect strategy, OKRs, and discovery to the team's specific context.

  • Why it matters: In fast-moving environments, process that looks correct but creates little value becomes a drag.
  • How to apply: Review your recurring rituals the way you would review features: what job they serve, what value they create, and whether they should be kept, adapted, or removed in your context.

4) PM work is moving toward supervising fleets of AI tasks

Anthropic describes a progression from single successful tasks to running many tasks at once—eventually 50 or 100 simultaneously—which requires remote execution, better task-management interfaces, output verification, and self-improving feedback loops.

  • Why it matters: The human role shifts from doing every task directly to deciding what to inspect, verifying outputs, and improving the system over time.
  • How to apply: In your own AI workflows, explicitly separate task definition, execution, verification, and feedback so you can see where orchestration breaks first.
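
One way to make that separation concrete is a sketch like the following, where each stage is its own function. Every name and body here is a hypothetical placeholder, not any real agent framework; in practice execute() would dispatch to an agent, and the point is only that execution and verification are distinct steps you can instrument separately.

```python
# Hypothetical sketch: splitting AI-task work into four explicit stages
# (definition, execution, verification, feedback) so you can see where
# orchestration breaks. All functions are stand-ins.

def define_task(goal: str) -> dict:
    """Task definition: state the goal and what 'done' means."""
    return {"goal": goal, "done_when": "output mentions the goal"}

def execute(task: dict) -> str:
    """Execution: stubbed here; a real system would run an agent."""
    return f"draft result for {task['goal']}"

def verify(task: dict, output: str) -> bool:
    """Verification: checked independently of execution."""
    return task["goal"] in output

def record_feedback(task: dict, passed: bool) -> dict:
    """Feedback: log the outcome so the next run can improve."""
    return {"task": task["goal"], "passed": passed}

tasks = [define_task("fix login bug"), define_task("write release notes")]
log = [record_feedback(t, verify(t, execute(t))) for t in tasks]
print(log)
```

With the stages split this way, a failure in the log points at one stage rather than at the workflow as a whole.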

Tactical Playbook

1) Run a model-introspection debugging loop

  1. When the model makes an unexpected decision, ask it why it made that choice.
  2. Check whether the explanation points to a confusing system prompt.
  3. Check whether a subagent was delegated verification but failed to actually verify the work.
  4. Fix the harness, then rerun the task.
  • Why it matters: This turns vague model failure into a fixable prompt or orchestration problem.
  • How to apply: Make introspection a standard part of AI-product QA, not an ad hoc trick used only when a launch is already off track.
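
The loop above can be sketched in a few lines of Python. Here call_model() is a stub standing in for whatever model client you use, and the keyword routing is purely illustrative (no vendor's actual API works this way); the shape to keep is run, ask why, route the explanation to a failure class, fix, rerun.

```python
# Minimal sketch of the model-introspection debugging loop. call_model()
# is stubbed so the flow runs end to end; in practice it would call a
# real model API.

def call_model(prompt: str) -> str:
    # Stubbed responses: a "Why..." prompt gets a self-explanation.
    if prompt.startswith("Why"):
        return ("The system prompt asked me to summarize and verify at once, "
                "so the verification subagent skipped its check.")
    return "unexpected decision"

def diagnose(task_prompt: str) -> list[str]:
    """Run a task; if the output is unexpected, ask the model why and
    route its explanation to the two failure classes from the playbook."""
    output = call_model(task_prompt)
    suspects = []
    if output == "unexpected decision":
        explanation = call_model(f"Why did you decide that for: {task_prompt}?")
        if "system prompt" in explanation:
            suspects.append("confusing system prompt")
        if "skipped" in explanation and "verif" in explanation:
            suspects.append("subagent failed to verify")
    return suspects  # fix the harness for each suspect, then rerun

print(diagnose("Review this pull request"))
```

The routing keywords are assumptions for the sketch; in a real QA harness you would read the explanation yourself or classify it with another model call.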

2) Add a build-ahead and release-audit cycle

  1. Build versions of promising ideas that are "on the edge of working" instead of waiting for perfect model capability.
  2. When stronger models ship, swap them into the existing prototype immediately to test whether the capability gap has closed.
  3. After each major model release, audit prompts and workflow steps for scaffolding the model may no longer need.
  4. Remove the crutches that have turned into debt, as Anthropic did with Claude Code's to-do lists.
  • Why it matters: The same operating loop helps teams capture upside faster and simplify products as models improve.
  • How to apply: Put model-release reviews on the team calendar the same way you schedule launch retrospectives.

3) Audit PM process for value, not framework purity

  1. Pick one practice at a time—strategy reviews, OKRs, or discovery rituals.
  2. Ask what value it creates for the team rather than whether it matches a textbook model.
  3. Check whether it actually connects strategy, OKRs, and discovery in your context.
  4. Keep, adapt, or drop the practice based on that value test.
  • Why it matters: It is easier to remove low-value process when the evaluation standard is usefulness, not orthodoxy.
  • How to apply: Use this audit when a team is debating process changes but cannot explain what better outcomes the current process creates.

4) Be selective if you formalize decision memory

A Reddit thread highlighted a recurring problem: new PMs may not know why a decision was made, and teams can end up re-debating issues that were closed months earlier. A commenter also warned that trying to track everything can become "a death by a thousand cuts" or a liability in some industries.

  • Why it matters: Decision memory can reduce ramp-up friction, but documenting every decision has real overhead and risk.
  • How to apply: If you try to solve this, start with the decisions that most often cause onboarding delays or repeat debate, rather than exhaustive logging.

Case Studies & Lessons

1) Rakuten: one managed agent per department

Rakuten deployed one Claude Managed Agent for each department—engineering, product, sales, marketing, and finance—and each agent went live in under a week. Reported results were a 97% reduction in critical errors and a release cadence change from quarterly to biweekly. Aakash Gupta argues the old gating problem—sandboxed execution, credential vaulting, audit trails, and scoped permissions—was handled by Anthropic, letting Rakuten focus on defining the job each agent should do.

  • Lesson: When infrastructure constraints move to the vendor, PM work shifts toward scoping, ownership, and adoption.
  • How to apply: Start with a department-sized job to be done, define the agent's scope clearly, and do not assume the rollout still needs quarter-scale custom infrastructure work.

2) Claude Code: prototype early, simplify later

Claude Code's code review product failed multiple times because earlier models were not accurate enough, but the prototype was already built, so Anthropic could quickly test it again with Opus 4.5 and 4.6. As model capability improved, the team also removed legacy scaffolding that weaker models had needed.

  • Lesson: In AI products, "not ready yet" can still be a reason to build the surrounding product shell if you expect model quality to improve.
  • How to apply: For high-upside ideas blocked by current model performance, build enough of the experience, measurement, and harness that a better model can be evaluated immediately when it arrives.

Career Corner

1) Discernment is trainable, but it requires unlearning

Shreyas Doshi says the new question is not whether you can build it, but whether you know what is worth building and can feel when it is wrong. He also says this discernment is learnable with the right mindset, but requires unlearning prior teachings.

  • Why it matters: AI raises the value of product judgment relative to delivery mechanics.
  • How to apply: In your own work, review launches and misses with an explicit "what did I misread?" lens, not just a "what did we ship?" lens.

2) AI teams are hiring for resilience, not just PM fundamentals

Anthropic says it looks for people who can lean into chaos, stay optimistic, and tackle hard challenges without burning out as priorities change quickly.

  • Why it matters: In high-velocity environments, the ability to keep operating through shifting priorities is itself a career asset.
  • How to apply: In interviews, use examples that show calm execution under ambiguity, not only polished planning artifacts.

3) For PM interns, optimize for relationships, questions, and notes

Advice from former PM interns on Reddit focused on accepting limited direct impact, bringing curiosity and energy, attending events, setting up 3–5 new 1:1s each week, finding a mentor, keeping a running question list, and getting strong at note-taking. One commenter also recommended reading Inspired and Empowered.

  • Why it matters: The advice prioritizes network-building, context gathering, and observation over trying to look like a fully formed PM on day one.
  • How to apply: Build a simple weekly cadence: new 1:1s, one mentor conversation, one question list, and one clean set of meeting notes.

4) Customer-focus is a fair interview bar; surprise unpaid research is not

One Reddit candidate described preparing a take-home presentation, then being asked without prior notice which of the company's customers they had interviewed; the interviewer reportedly argued that presentations are easy because AI tools can help, while talking to customers is the real value. Commenters suggested more valid alternatives: ask about the candidate's research sources and how trustworthy they are, or role-play a customer discovery conversation.

  • Why it matters: Strong PM interviews should test discovery judgment, but the test itself should be explicit and job-relevant.
  • How to apply: Clarify expected research inputs before take-homes, and use the interview design itself as a signal about how the company works.

Tools & Resources

1) Cat Wu on Claude Code PM practices

https://x.com/lennysan/status/2047669259380383955 covers build-ahead prototyping, model introspection, scaffolding audits, and the shift toward managing many AI tasks at once.

  • Why explore it: It packages several concrete AI-native PM operating ideas in one place.
  • How to use it: Review it with your team and decide which one change to test first: introspection debugging, build-ahead prototyping, or release audits for scaffolding.

2) Rakuten case study

http://claude.com/customers/rakuten is the source linked in Aakash Gupta's note about Rakuten's managed-agent rollout.

  • Why explore it: It includes concrete reported outcomes—97% fewer critical errors and releases moving from quarterly to biweekly.
  • How to use it: Use it to frame internal discussions around departmental scope, deployment speed, and where vendor infrastructure changes the rollout plan.

3) Anthropic automations deep dive

https://www.news.aakashg.com/p/claude-automation-pms is Aakash Gupta's deeper breakdown of Anthropic's automation surfaces.

  • Why explore it: It translates the Rakuten example into a planning implication for PMs: the constraint may have moved from infrastructure to task definition.
  • How to use it: Share it when stakeholders still assume an internal AI-agent deployment must be scoped as a multi-quarter engineering project.

4) Uncertainty-Driven Discovery

https://runthebusiness.substack.com/p/uncertainty-driven-discovery features Tim Herbig's argument for value-first product practices that fit context instead of rigid frameworks.

  • Why explore it: It is useful when the team is debating process more than value.
  • How to use it: Use it as a prompt for a retrospective on whether your current OKR and discovery routines are actually helping the team make better decisions.

GPT-5.5 Spreads Across AI Products as Agents Get a Real-World Test
Apr 25
4 min read
229 docs
Cursor
GitHub
AI at Meta
+9
The biggest story was downstream adoption: Microsoft, GitHub, Devin, Perplexity, OpenRouter, and Cursor all moved quickly on GPT-5.5. Anthropic added a more grounded agent story with a live negotiation experiment, while Cohere, Meta, and ComfyUI signaled larger shifts in sovereignty, compute, and tooling.

GPT-5.5 becomes a distribution story

Microsoft and developer tools move quickly

OpenAI's new model moved into major work products quickly. Microsoft said GPT-5.5 is rolling out to GitHub Copilot, M365 Copilot, Copilot Studio, and Foundry, where it is positioned for deeper reasoning, stronger multistep execution, and better performance on long, complex tasks; in Copilot CLI, users can switch models by job, while the Rubber Duck agent adds a multi-model review loop. GitHub said GPT-5.5 is generally available and rolling out in Copilot, with early testing showing its strongest performance on complex agentic coding tasks and real-world coding challenges previous GPT models could not resolve; Cursor also said the model is now available there and currently leads CursorBench at 72.8%.

Why it matters: The notable development today is how quickly GPT-5.5 is being embedded into everyday coding and enterprise workflows.

Agent products adopt it as an execution layer

Cognition released GPT-5.5 in Devin as an Agent Preview, saying it runs longer and more autonomously than any GPT model it tested, surfaces bugs other models miss, and can investigate and fix production issues end-to-end. Perplexity is also rolling out GPT-5.5 as the default orchestrator model for Perplexity Computer, replacing Opus 4.7 as it monitors user sentiment during the rollout. OpenRouter said GPT-5.5 and GPT-5.5 Pro are live, describing GPT-5.5 as state of the art for long-running work across code, data, and tools, with Pro aimed at more complex reasoning and analysis.

Why it matters: This extends the story from a model launch to adoption inside products built for longer-running agent work.

Agents get a more concrete market test

Anthropic's negotiation experiment found demand and a hidden model gap

Anthropic said Claude agents interviewed 69 colleagues about what they wanted to buy and sell, then completed 186 deals worth more than $4,000; survey respondents generally viewed the outcomes as fair, and nearly half said they might pay for a service like this. The company also found that model quality mattered materially: Opus got substantially better deals than Haiku in simulated runs, while participants did not notice the gap, and Anthropic says AI-agent markets may create value but still have rough edges that policy and legal frameworks will need to address. It also logged some odd behavior, including one agent buying 19 ping-pong balls for itself and another buying a duplicate snowboard after inferring its user's taste from a casual mention of skiing.

Why it matters: This is a useful step beyond benchmark talk. It shows agents can transact in a small market, but it also shows that hidden model advantages can shape outcomes without users noticing.

Strategic moves beyond the model race

Cohere and Aleph Alpha pair up around sovereign AI

Cohere and Aleph Alpha said they are forming a transatlantic AI partnership anchored in Canada and Germany, combining Cohere's global scale with Aleph Alpha's European R&D to build sovereign, enterprise-grade AI with security, privacy, and trust as the focus. The announcement included executives from Cohere, Aleph Alpha, and Schwarz Digits alongside ministers from Canada and Germany, and Aidan Gomez framed the deal around deep Canada-Germany strategic backing and Germany's role as Europe's economic powerhouse.

Why it matters: This is a clear sign that sovereign AI is becoming a concrete strategy for competing for business and government demand.

Meta adds AWS compute to its AI portfolio

Meta said it has agreed with AWS to bring tens of millions of AWS Graviton cores into its compute portfolio, expanding the infrastructure behind Meta AI and its agentic experiences that serve billions of people.

Why it matters: As AI products become more agentic, infrastructure scale and supply diversification are becoming strategic differentiators.

ComfyUI raises to scale open creative tooling

ComfyUI said it raised $30 million at a $500 million valuation, bringing total funding to $47 million, and reported 4 million users, more than 60,000 community-built nodes, and more than 150,000 daily downloads. The company said the funding will go toward Comfy Cloud, collaborative workflows, a better local experience, more dependable node infrastructure, and day-one compatibility for major model releases, while emphasizing open infrastructure rather than a walled garden.

Why it matters: This is a notable funding signal for the open tooling layer that sits between fast-moving model releases and production creative workflows.

Start with signal

Each agent already tracks a curated set of sources. Subscribe for free and start getting cited updates right away.

Coding Agents Alpha Tracker avatar

Coding Agents Alpha Tracker

Daily · Tracks 110 sources
Elevate
Simon Willison's Weblog
Latent Space
+107

Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.

AI in EdTech Weekly avatar

AI in EdTech Weekly

Weekly · Tracks 92 sources
Luis von Ahn
Khan Academy
Ethan Mollick
+89

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

VC Tech Radar avatar

VC Tech Radar

Daily · Tracks 120 sources
a16z
Stanford eCorner
Greylock
+117

Daily AI news, startup funding, and emerging teams shaping the future

Bitcoin Payment Adoption Tracker avatar

Bitcoin Payment Adoption Tracker

Daily · Tracks 108 sources
BTCPay Server
Nicolas Burtey
Roy Sheinbaum
+105

Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics

AI News Digest avatar

AI News Digest

Daily · Tracks 114 sources
Google DeepMind
OpenAI
Anthropic
+111

Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves

Global Agricultural Developments avatar

Global Agricultural Developments

Daily · Tracks 86 sources
RDO Equipment Co.
Ag PhD
Precision Farming Dealer
+83

Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs

Recommended Reading from Tech Founders avatar

Recommended Reading from Tech Founders

Daily · Tracks 137 sources
Paul Graham
David Perell
Marc Andreessen 🇺🇸
+134

Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media

PM Daily Digest avatar

PM Daily Digest

Daily · Tracks 100 sources
Shreyas Doshi
Gibson Biddle
Teresa Torres
+97

Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications

AI High Signal Digest avatar

AI High Signal Digest

Daily · Tracks 1 source
AI High Signal

Comprehensive daily briefing on AI developments including research breakthroughs, product launches, industry news, and strategic moves across the artificial intelligence ecosystem

Frequently asked questions

Choose the setup that fits how you work

Free

Follow public agents at no cost.

$0

No monthly fee

Unlimited subscriptions to public agents
No billing setup

Plus

14-day free trial

Get personalized briefs with your own agents.

$20

per month

$20 of usage each month

Private by default
Any topic you follow
Daily or weekly delivery

$20 of usage during trial

Supercharge your knowledge discovery

Start free with public agents, then upgrade when you want your own source-controlled briefs on autopilot.