Your intelligence agent for what matters

Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.

Set up your agent
What should this agent keep you on top of?

Your time, back

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, podcasts, X accounts, Substack, Reddit, and blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

3 steps to your first brief

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Weekly report on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Startup funding digest with key venture capital trends
Weekly digest on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2

Review and launch

Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, and add your own. Launch when ready; you can adjust sources anytime.

Sam Altman (Profile)
3Blue1Brown (Channel)
Paul Graham (Account)
The Pragmatic Engineer (Newsletter)
r/MachineLearning (Community)
Naval Ravikant (Profile)
AI High Signal (List)
Stratechery (RSS)

3

Get your briefs

Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.

Multi-Agent Coding Becomes the Default Workflow
May 21
4 min read
129 docs
Boris Cherny
Logan Kilpatrick
Kevin Hou
+14
The clearest signal today is operational: top practitioners are moving from one-agent chat tabs to multi-agent control surfaces with evals, observability, and safe delegation. Inside: copyable ADK and MCP workflows, Cursor and Claude Code updates, benchmark context on Composer 2.5 vs Gemini 3.5 Flash, and three clips worth your time.

🔥 TOP SIGNAL

  • Multi-agent is becoming the default posture. Boris Cherny says most Claude Code users now run many instances at once, and he personally runs ~5 locally plus hundreds or thousands in parallel overnight; Google says Antigravity is shifting away from the IDE toward a UI for managing multiple agents, and Railway is designing for thousands of coordinated agents with explicit human intervention points. Anthropic says code written per engineer rose about 250% after Claude Code, which makes this look like an operational shift, not a demo trick.

⚡ TRY THIS

  • Use a manager-agent pattern. Boris Cherny says he no longer writes code directly; he prompts one Claude that prompts other Claudes. Copy the structure, not the scale: give one lead agent the spec, let it coordinate other agent instances, start with a handful of parallel tasks locally, then expand to bigger overnight batches only after the review loop feels safe.

  • Use a zero-to-one build prompt, then gate on evals. In Kevin Hou's demo, the prompt was "build me a daily news bot using adk ... I want RSS feeds ... summarize the five stories ... I should be able to deploy this and fetch the latest stories." The agent produced an implementation plan you can comment on, created the scaffolding and task list, then kept running evals/tests while fixing its own mistakes; steal this pattern for small internal tools instead of starting from folders and boilerplate.

  • Debug prod with MCPs and one plain-English prompt. In Google's demo, once the cloud MCPs were configured, the prompt "find out what's wrong with my DynoQuest app" was enough for the agent to inspect the right services and logs. The trick is boring but powerful: wire the MCPs first, then keep the prompt high-level.

  • Compress long-horizon evals before changing prompts. Palash Shah's process for a 30+ minute agent run is: extract the reasoning from traces, identify the cause of the behavior, recreate only that minimal situation, then iterate on a much smaller eval. This is the cleanest prompt-debug loop in today's sources.
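
The manager-agent pattern above can be sketched roughly as follows. This is a minimal illustration, not how Claude Code or any vendor tool works: `run_worker` and `review` are hypothetical stubs standing in for real agent calls and a real review step, and the one-task-per-line split of the spec is an assumption made for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class TaskResult:
    task: str
    output: str
    approved: bool


def run_worker(task: str) -> str:
    """Hypothetical stand-in for one worker agent; swap in a real agent call."""
    return f"draft for: {task}"


def review(task: str, output: str) -> bool:
    """Hypothetical review gate; here it only checks the draft mentions its task."""
    return task in output


def run_manager(spec: str, max_parallel: int = 3) -> list[TaskResult]:
    """Split a spec into tasks, run workers in parallel, and review every result."""
    # Assumption: one task per non-empty line of the spec.
    tasks = [line.strip() for line in spec.splitlines() if line.strip()]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        outputs = list(pool.map(run_worker, tasks))  # map preserves task order
    return [TaskResult(t, o, review(t, o)) for t, o in zip(tasks, outputs)]


results = run_manager("add login form\nwrite unit tests\nupdate README")
for r in results:
    print(r.task, "->", "approved" if r.approved else "needs human review")
```

Keeping `max_parallel` small at first mirrors the advice above: start with a handful of local tasks and raise the limit only once the review gate is trusted.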

📡 WHAT SHIPPED

  • Cursor 3.5 automations. Multi-repo automations can now work across codebases to execute, test, and verify tasks; you can also create repo-less automations for jobs like a daily Slack digest. Automations now live in the Agents Window, new automation runs are 50% off for 7 days, and templates are live at cursor.com/marketplace; download is at cursor.com/download.

  • Claude Code Auto Mode. Anthropic replaced per-tool permission prompts with layered safety checks where a second Claude evaluates tool use, backed by thousands of safety benchmarks; Boris Cherny says this reduced approval fatigue and was safer than the old prompt-by-prompt flow.

  • Cursor Composer 2.5: fresh benchmark context. Artificial Analysis puts it at 62 on the Coding Agent Index, third behind Claude Opus 4.7 and GPT-5.5; standard is $0.07/task, Fast is $0.44/task and averages 6.7 minutes per task, with availability limited to Cursor IDE and CLI.

  • Gemini 3.5 Flash: strong on Google's coding evals, weak on CursorBench. Google says Flash improved materially on terminal benchmarks, SWE-bench-style coding, MCP calling, and tool use for agentic coding; Theo points to cursor.com/evals and argues Flash 3.5 scored below Composer 2 there while costing 4x more. Treat this as a benchmark split-screen, not a settled verdict.

  • Antigravity tooling is consolidating. Google says the old Apache-licensed Gemini CLI will stop working with subscription plans on June 18 and be replaced by Antigravity CLI; Simon Willison notes the broader suite now spans a desktop app, CLI, IDE, and an open-source Python SDK wrapper. Repos: google-antigravity/antigravity-sdk-python, google-antigravity/antigravity-cli, google-gemini/gemini-cli.

  • CodeMinder is one to watch. Sundar Pichai says Google's internal security teams already use agentic workflows to detect vulnerabilities and patch them, and the internal CodeMinder system being externalized can identify issues, generate patches, test them, verify them, and deploy fixes.

🎬 GO DEEPER

  • 2:24:06-2:28:43 — Kevin Hou's zero-to-one ADK build. Best concrete demo of the day if you want to watch an agent go from a natural-language request to plan, scaffolding, fixes, evals, and a finished artifact.
  • 1:04:53-1:07:02 — MCP-powered prod debugging. Good watch if your agents still depend on humans to paste logs around: the key prompt is almost trivial once the MCP wiring exists.
  • 50:07-51:57 — Railway's self-deploy loop. Advanced, but worth the two minutes: put the platform CLI inside a process already running on the platform, let the agent provision what it needs, deploy itself, and throw away bad copies.

Editorial take: the model is eating the scaffolding, but the durable edge is ops—evals, observability, and safe multi-agent control.

Blank Bio's Seed, Exa's Search Bet, and the Agent-Native Infrastructure Shift
May 21
5 min read
724 docs
Exa
Aidan Gomez
Cohere
+12
Blank Bio's seed and Exa's search financing framed the capital signals, while YC launches and new commentary from Baseten, Railway, and Cohere sharpened the investment case around agent-native software, post-training, and compute economics.

Funding & Deals

  • Blank Bio: Blank Bio raised a $7.2M seed with a strategic collaboration from PacBio. The company is training foundation models on bulk RNA-seq to help pharma design better clinical trials by learning patient heterogeneity and building prognostic and predictive biomarkers from tumor transcriptomes. Announcement

  • Exa: Exa raised $250M at a $2.2B valuation in a Series C led by a16z. Not seed-stage, but still a clear thesis-confirming financing: Exa is positioning as search infrastructure for AI agents, especially on long-tail, high-alpha queries where traditional engines fail, and a16z says developers and agents are already reaching for it first. The founders started building years before ChatGPT, betting transformers would change how information is accessed.

Emerging Teams

  • Lab0: Lab0 is building an AI forward deployment engineer for enterprise software, automating client process discovery, configuration, testing, and go-live. The key datapoint is implementation speed: YC says deployment cycles fall from six months to ten days. Founders: Onkar Borade, tokenaware, and Sujay Sriv.

  • InLoopRobotics: InLoopRobotics is selling warehouse automation as a monthly service rather than capex: packing, kitting, and fulfillment with no integrators and no 6-month PoC. Paid pilots are already live at 300+ picks per hour. Founders: FeduniakS, Zakariea_sh, and Pasha Rizali.

  • Armature: Armature is an early signal that "agent experience" may become its own software category. It runs real agent workflows to monitor and optimize how AI agents experience products, with a focus on improving MCP or CLI surfaces. Founders: Totzenberger and Louis Scremin.

  • AI code-review tooling is starting to cluster: YC-backed Stage is a guided code-review platform for understanding AI-generated code and claims faster review than GitHub, while Prix AI independently pitches AI as the first reviewer on GitHub PRs, flagging repetitive issues such as edge cases, logic mistakes, performance, security, and style problems before humans step in. The overlap suggests a real wedge is forming around QA for AI-written software.

AI & Tech Breakthroughs

  • Baseten's "owned intelligence" thesis is getting production proof points: Baseten describes its stack as production-grade inference for companies moving from rented to owned intelligence by post-training models on their own application data. It cited Abridge, Decagon, OpenEvidence, Cursor, and Intercom as companies already adopting this pattern, and its technical work is pushing toward continual learning for long-horizon agentic tasks where models evolve with real-time data, tools, and specialized evals.

  • Cohere Command A+: Cohere said Command A+ is its most powerful LLM yet, optimized to run on minimal hardware and released as the company's first fully open-source Apache 2 model. For investors, it is a clean signal that efficient open models are still improving at the high end.

  • Context control is turning into real infrastructure: Compresh reports roughly 60% fewer input tokens on long agent sessions by keeping the last four rounds raw and compressing older context into a partitioned memory view; in separate architecture writing, an "Adaptive Agent Architecture" proposes state-driven micro-agents, hard retry limits, and reflection anchors, with a claimed reduction from 15,000-50,000 tokens per task to 3,000-7,000. The broader takeaway is that memory and retry control are becoming first-class product surfaces.

  • Efficient-model research keeps moving: A BitNet 1.58 writeup highlighted a ternary-weight approach using {-1, 0, +1} instead of FP16/FP32 weights, trading precision for higher dimensionality to preserve output quality while reducing memory and compute demands.
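
To make the ternary-weight idea above concrete, here is a small sketch of quantizing a weight vector to {-1, 0, +1}. The absmean scaling used here is an assumption based on how the BitNet b1.58 recipe is commonly described; the writeup's own details are not given in the source, and the function names are illustrative.

```python
def ternary_quantize(weights: list[float], eps: float = 1e-8) -> tuple[list[int], float]:
    """Map weights to {-1, 0, +1} using a mean-absolute-value (absmean) scale."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Divide by the scale, round to the nearest integer, then clip to [-1, 1].
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Approximate the original weights from ternary values and the shared scale."""
    return [q * scale for q in quantized]


weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.02]
q, scale = ternary_quantize(weights)
print(q)  # [1, 0, 1, -1, 0, 0]
print([round(w, 3) for w in dequantize(q, scale)])
```

Each weight then needs only about 1.58 bits (log2 of 3 states) plus one shared scale per tensor, which is where the memory savings come from.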

Market Signals

  • The competitive layer is moving above the base model: Railway argues agent workloads need tighter control over network, compute, storage, orchestration, versioning, observability, and branching at 1,000x human scale; Armature is explicitly measuring how agents experience products; and a Reddit discussion around Google's enterprise agent platform framed the shift as moving from model hosting toward orchestration, governance, and multi-agent tooling.

"Pull request is definitely dying."

  • Compute scarcity is creating infra moats: Railway says its own bare-metal data centers deliver roughly three-month payback and ~70% margins, while cloud bursting across five providers helps avoid compute bottlenecks. Baseten says capacity constraints are worse than most outsiders think and has responded by distributing inference across 15-20 clouds and 80-100 regions.

  • Search and distribution are being rebuilt for AI agents: Exa's financing rests on the idea that agent-first search wins hard, long-tail queries, while Georion is building a growth dashboard around AI visibility scanning, prompt tracking, AI crawler logs, and revenue attribution across engines such as ChatGPT, Claude, and Perplexity.

  • Capital structure may matter more than many app founders expect: Gavin Baker argued that disaggregating prefill and inference could extend GPU useful lives from 3-4 years to 10-15 years, lowering financing rates and helping fund the AI buildout; in the same discussion, he said TSMC's capacity decisions are the key indicator for whether AI infrastructure turns into an overbuild.

  • Policy risk is rising around frontier releases: Bindu Reddy flagged a planned White House executive order requiring frontier models to be reviewed 90 days before release, and argued it would boost China and open-source AI.

Worth Your Time

  • GBrain thread and follow-up: Quick read on open-source agent memory infrastructure, benchmarked long-memory performance, and context-engineering-driven idea generation.

  • Blank Bio seed announcement: Short, useful read if you want the cleanest primary-source framing for the RNA-seq foundation-model thesis in clinical trials.

OpenAI’s Math Milestone and Anthropic’s Colossus-Scale Bet
May 21
4 min read
865 docs
Google DeepMind
clem 🤗
Databricks AI Research
+17
OpenAI claimed the first autonomous AI solution to a major open math problem, while Anthropic paired a multiyear Colossus compute deal with projected profitability. Also inside: agent research advances, new product launches from Cohere, OpenAI, and Google, plus notable funding and policy moves.

Top Stories

Why it matters: today’s clearest signals were AI reaching a new research threshold, and frontier-model economics becoming even more tied to massive infrastructure commitments.

  • OpenAI said a general-purpose reasoning model solved the planar unit distance problem, a famous Erdős question posed in 1946. The model found a new family of constructions that beats the square-grid approach mathematicians had treated as best for nearly 80 years, which OpenAI called the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

“What’s significant about this moment is that it’s the first really clear example of AI solving — not just an unsolved math problem — but a really well-known math problem.”

The result also comes less than a year after frontier models reached IMO gold-level performance, marking a fast jump from competition math to original research.

  • Anthropic’s scale story sharpened on both compute and finances. A SpaceX filing says Anthropic is paying $1.25B per month through May 2029 for capacity across Colossus and Colossus II, while WSJ-reported projections put Q2 revenue at $10.9B and first operating profit at about $559M. Together, those numbers suggest frontier AI demand is now supporting both multiyear infrastructure commitments and near-term profitability.

Research & Innovation

Why it matters: beyond headline model releases, the most useful research updates focused on making agents cheaper, more scientifically useful, and easier to evaluate honestly.

  • Databricks introduced MemEx, a programmable Python scratchpad that lets agents transform, slice, and persist tool outputs as typed objects instead of flooding the context window. On enterprise agentic tasks, Databricks says frontier models gained 2–5 accuracy points at 25–30% lower cost, while Qwen 122B and 397B nearly doubled accuracy at 40–50% lower cost.

  • Hugging Face released Carbon, an open DNA base model with open weights, training code, and data pipeline for downstream biology tasks. The team says Carbon is 275x faster than the next best model at its size, can run locally, and can process a whole human genome on a single GPU in under two days.

  • InferenceBench offered a reality check on AI R&D automation. Its creators say current frontier agents still struggle with system-level engineering and complex dependencies, and underperform simple hyperparameter-tuning baselines for vLLM and SGLang. They also found weak strategy diversity, with most agents defaulting to vLLM rather than exploring alternatives.
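
The scratchpad idea in the MemEx item above can be illustrated with a short sketch. This is not Databricks' API: the `Scratchpad` class and its `put` and `take` methods are hypothetical names, shown only to convey the pattern of persisting a large tool output outside the prompt and handing the model a short summary instead.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Scratchpad:
    """Hypothetical store that keeps large tool outputs out of the prompt."""
    _items: dict[str, Any] = field(default_factory=dict)

    def put(self, name: str, value: Any) -> str:
        """Persist a tool output and return a one-line summary for the context window."""
        self._items[name] = value
        size = len(value) if hasattr(value, "__len__") else 1
        return f"{name}: {type(value).__name__} with {size} item(s)"

    def take(self, name: str, start: int, stop: int) -> Any:
        """Return only the requested slice of a stored output."""
        return self._items[name][start:stop]


pad = Scratchpad()
rows = [{"id": i, "score": i * 10} for i in range(1000)]  # stand-in for a big tool result
summary = pad.put("query_result", rows)
top = pad.take("query_result", 0, 3)
print(summary)  # query_result: list with 1000 item(s)
print(top)
```

The agent sees a one-line summary rather than 1,000 rows, and pulls slices only when a step needs them.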

Products & Launches

Why it matters: new launches are pushing AI from standalone chat into reusable workflows for coding, science, and open deployment.

  • Cohere open-sourced Command A+, its fastest and most powerful model yet, under Apache 2.0. Cohere says it supports 48 languages, multimodal input, and can run on as few as two H100s.

  • OpenAI brought Codex to mobile. Users can work with Codex from the ChatGPT mobile app, answer questions on the go, and continue the same thread later from a computer.

  • Google DeepMind launched Science Skills for Antigravity, integrating insights from more than 30 life-science sources including UniProt and the AlphaFold Database. In a test on a rare disease caused by AK2 mutations, the team said the tool sped up structural analysis and surfaced novel insights into the condition’s mechanisms.

Industry Moves

Why it matters: capital and partnerships are concentrating around the data, search, and vertical workflows that agents need to be useful in production.

  • Exa raised $250M at a $2.2B valuation to keep building search for agents. The company says it now serves 5,000+ companies and 500,000+ developers, and makes agents smarter and cheaper by returning 90% less text with little to no tradeoff in RAG quality.

  • Genesis and Incyte expanded their AI drug-discovery partnership with $120M upfront, recurring research funding, and potential milestone and royalty payments. Incyte’s proprietary experimental data will help train the next generation of Genesis foundation models, starting with at least five new collaboration targets.

  • RADAR became a new physical-AI unicorn with a $170M Series B. The company says its retail system delivers 99% item-level inventory accuracy in real time and is already deployed in more than 1,400 stores.

Policy & Regulation

Why it matters: governments are moving closer to pre-release oversight of frontier systems.

  • The White House briefed OpenAI, Anthropic, and Reflection AI on a planned executive order that would create a voluntary framework asking labs to share frontier models with government agencies up to 90 days before public release.

Quick Takes

Why it matters: a few smaller updates added signal on agents, search efficiency, and generative media.

  • More than 50% of Devin sessions are now triggered by agents rather than humans.
  • Perplexity said its query-aware context compression cuts context tokens by up to 70% while improving answer quality.
  • Stable Audio 3 launched with three open-source variants plus a closed “large” model.
  • MiniMax Speech 2.8 Turbo arrived on Together AI with 600+ voices and sub-250ms latency across 40+ languages.

Parkinson's Law, FDEs, and Bubble Frameworks
May 21
4 min read
225 docs
Aaron Levie
vas
tobi lutke
+6
The strongest organic recommendations today clustered around operating under acceleration: Tobi Lütke on Parkinson's Law and pace, Aaron Levie on why FDEs matter in AI deployment, Bill Gurley on healthcare AI operations, plus conceptual frames from Garry Tan and Gavin Baker. Parkinson's Law stood out as the clearest single signal because Lütke tied it directly to leadership and execution.

What stood out

A clear pattern ran through today’s recommendations: pace, implementation, market cycles, and how to combine ambition with a good life. Tobi Lütke pointed to a short book on compressing time windows, Aaron Levie highlighted why AI deployment creates a durable FDE role, Bill Gurley shared a healthcare AI operating case, Gavin Baker reached for a historical bubble framework, and Garry Tan recommended an essay rejecting the false choice between meaningful work and living well.

Most compelling recommendation

Parkinson's Law

  • Content type: Book
  • Author/creator: Not specified in the source discussion
  • Link/URL: Direct book link not provided; source discussion: Tobi Lütke interview
  • Who recommended it: Tobi Lütke
  • Key takeaway: Work expands to fill the time allocated to it; a leader’s job is to compress time windows to create pace.
  • Why it matters: This was the strongest signal of the day because Lütke said it is one of his most recommended books, that he keeps original 1960s/1970s copies, and that he gives them to executives. He tied it directly to how Shopify thinks about pace.

"This is basically one of the most important functions of a leader is to just compress time windows."

Timely operating reads

Post on Forward Deployed Engineers (title not provided in source)

  • Content type: Article/post
  • Author/creator: Linked via X post; a formal byline/title is not provided in the source
  • Link/URL: Article and post
  • Who recommended it: Aaron Levie
  • Key takeaway: FDE work should persist because AI agent rollouts are both highly technical and deeply tied to workflow change management; rapid model improvement also keeps shifting what is possible and what implementation scaffolding is obsolete.
  • Why it matters: Levie was unusually explicit about why this is worth reading now: he framed it as a durable job category as long as AI keeps changing rapidly.

Interview on healthcare price transparency and AI (title not provided in source)

  • Content type: Podcast/video interview
  • Author/creator: Shared via Chrissy Farr’s post; the source emphasizes the interview topics rather than naming the episode creator/title
  • Link/URL: Episode thread/post
  • Who recommended it: Bill Gurley
  • Key takeaway: The discussion covers healthcare price transparency, consumer cost visibility, Solv’s move toward an AI-first OS, automated voice agents, and the operational phases of AI adaptation.
  • Why it matters: It is a grounded example of AI being applied to a complex market where pricing, workflow redesign, and frontline operations all matter.

Conceptual frameworks worth saving

What Is Intelligence?

  • Content type: Book
  • Author/creator: Not specified in the source discussion
  • Link/URL: Direct book link not provided; source discussion: Tobi Lütke interview
  • Who recommended it: Tobi Lütke
  • Key takeaway: The book re-explains biology through the lens of prediction and emergence, and Lütke said it felt "existentially profound."
  • Why it matters: This was the clearest case today of a recommendation offered because it changed the reader’s frame, not merely because it was timely.

Carlota Perez book on technological revolutions, financial capital, and bubbles (title not provided in source)

  • Content type: Book
  • Author/creator: Carlota Perez
  • Link/URL: Direct book link not provided; source discussion: Invest Like The Best episode
  • Who recommended it: Gavin Baker on Invest Like The Best
  • Key takeaway: Past foundational technologies have often produced bubbles because markets correctly see their importance and investor diversity breaks down into crowded bullishness.
  • Why it matters: It gives a historical frame for thinking about AI exuberance without denying AI’s importance.

Essay linked by Garry Tan (title not provided in source)

  • Content type: Essay/article
  • Author/creator: Not specified in the source; linked via X article
  • Link/URL: Essay link
  • Who recommended it: Garry Tan
  • Key takeaway: The essay argues against the false dichotomy of "grindslop vs aristocratic malaise" and for combining hard work on meaningful things with a life that includes beauty, food, experiences, and real connections.
  • Why it matters: Tan made the value explicit in his endorsement, which turns this from a vague lifestyle share into a clear principle for how to work and live.

Bottom line

If you open only one thing, start with Parkinson's Law for the clearest operating principle in today’s set. If you want the most timely AI read, open Aaron Levie’s FDE post next; it gives a concise explanation of why implementation and change management remain central even as models improve quickly.

OpenAI’s Math Claim, Cohere’s Open Model, and Europe’s Sovereign AI Push
May 21
4 min read
306 docs
Arthur Mensch
Yoshua Bengio
Aidan Gomez
+14
The day’s dominant story was OpenAI’s claim that a general-purpose model solved a long-open Erdős problem in geometry. Elsewhere, Cohere open-sourced Command A+, SAP and Mistral pushed a sovereign European enterprise stack, DeepMind shipped more concrete science tooling, and pressure kept building for stronger agent evaluation.

OpenAI’s math claim led the day

OpenAI says a general-purpose model solved a long-open Erdős problem

OpenAI said one of its models solved the planar unit distance problem, an open question posed by Paul Erdős in 1946, by finding a new family of constructions that outperformed the long-assumed square-grid-like approach. The company described it as the first time AI has autonomously solved a prominent open problem central to a field of mathematics, and said the proof came from a general-purpose reasoning model rather than a system built specifically for this task. Sam Altman separately called it "a kinda big milestone".

Why it matters: If borne out, this would be a notable step from AI-assisted math toward AI-generated mathematical discovery: the reported result did not just optimize within the accepted picture, but replaced it with a better family of constructions. OpenAI framed it as evidence that models are getting better at sustaining long chains of reasoning that could also help in biology, physics, engineering, and medicine, while stressing that human judgment still determines which problems matter and how results are interpreted.

"AI can help search, suggest, and verify. People choose the problems that matter, interpret the results, and decide what questions to pursue next."

A caution came quickly: Gary Marcus argued that outsiders still do not know how the new model works, how it was trained, what it costs, or how it performs on other tasks, and said judgment should wait for more facts.

Open models and enterprise deployment strategies kept diverging

Cohere open-sourced Command A+ and leaned into efficient deployment

Cohere introduced Command A+, calling it its most powerful LLM yet, optimized to run on minimal hardware and released open source under Apache 2.0, its first fully open-source Apache 2 model. Discussion around the release highlighted a parallel block design that, per the cited tech report excerpt, keeps equivalent performance while improving throughput versus a vanilla transformer block.

Why it matters: In separate commentary, Aidan Gomez argued that Chinese open-source models are matching or nearly matching U.S. frontier benchmarks at far lower cost, and that Western labs’ pricing power will increasingly concentrate in regulated sectors that require secure, democratically aligned deployment.

SAP and Mistral sharpened the European sovereignty pitch

SAP launched a new Business AI platform and Autonomous Suite spanning finance, supply chain, HCM, and industry AI. SAP also said Mistral AI’s full platform is now generally available in SAP’s sovereign European environment, with agents already live for public tender management and complex finance workflows under EU regulation.

Why it matters: The emphasis here was less about raw model performance than business context, governance, auditability, and sovereignty. Arthur Mensch said enterprises deploying agents need traceability, explainability, and protection from extraterritorial exposure, positioning the SAP-Mistral stack as built for production use in Europe.

The agent stack kept moving closer to real workflows

Exa raised $250M to build search and web agents for AI systems

Exa said it raised $250 million at a $2.2 billion valuation, led by a16z, to continue organizing the web for agents. The company said it already serves search to Cursor, Cognition, OpenRouter, 5,000+ companies, and 500k+ developers, and that it makes agents cheaper by returning 90% less text with little to no RAG quality tradeoff while building end-to-end web agents optimized for price, performance, and latency.

Why it matters: Exa is pitching retrieval compression and end-to-end web agents as core infrastructure for AI products, not just a search feature.

Google DeepMind made AI-for-science more operational

Google DeepMind launched Science Skills for Google Antigravity, integrating insights from more than 30 life-science sources including UniProt and the AlphaFold Database. In a test on a rare genetic disease caused by AK2 mutations, the toolkit produced a highly complex structural analysis faster than usual and led to novel insights into the condition’s underlying mechanisms.

Why it matters: Both the tooling and the measurement stack are getting closer to day-to-day research work, from integrated life-science sources inside Antigravity to benchmark tasks built from real scientific workflows.

Reliability questions kept pace with the agent push

More experts are arguing for stronger evaluation and safety evidence

Gary Marcus cited a METR finding that agents "routinely violated constraints" on hard tasks, and argued that this shows current safety approaches are not sufficient. François Chollet separately warned that unconstrained agents will exploit shortcuts or drift toward easier but useless sub-goals instead of solving the real problem.

Why it matters: As agent systems move into research and software workflows, the debate is becoming more concrete: constraint-following, goal stability, and proof of safety are all being treated as operational requirements rather than abstract principles. Yoshua Bengio argued that developers should have to demonstrate safety with scientifically valid risk assessments, and that AI adoption choices should be discussed honestly rather than sold through false confidence about jobs, safety, or social impact.

Continuous Discovery, Cursor-Layer AI, and Open Source PM Lessons
May 21
4 min read
85 docs
Sachin Rekhi
Teresa Torres
Melissa Perri
+9
This brief covers the latest thinking on continuous discovery, cursor-layer AI design, team structure realities, enterprise AI trust patterns, and open source product management. It also highlights practical resources for PM knowledge management and AI feature prioritization.

Big Ideas

Continuous discovery is a structure and a cadence

Teresa Torres frames discovery as three linked moves: define the outcome, uncover opportunities (customer needs, pain points, desires), then test solutions against those opportunities. The method can vary, but the rhythm should not: teams should talk to customers every week, synthesize continuously, and keep roadmaps as living documents instead of rebuilding them in separate planning phases.

"You synthesize as you go."

Why it matters: Julia Austin argues AI can speed prototyping, but it cannot replace ethnographic research or direct contact with real users and buyers; skipping that foundation often means building fast without understanding adoption problems.

How to apply: make one customer touchpoint per week a team habit, update your opportunity map after each session, and treat roadmap items as current/next opportunities rather than fixed quarterly promises.

The next AI interface may sit beside the cursor, not inside a chatbox

Aakash Gupta argues many AI features still force Stage 1 behavior: users open a separate window, restate context, then return to work. Cursor-layer products such as Clicky and Magic Pointer remove that round-trip by letting AI see the screen and answer in place.

Why it matters: teams may think they shipped embedded AI when users are still doing manual context handoffs.

How to apply: audit current AI features for context re-establishment. If the user still has to explain what is on screen, fix that friction before adding another sidebar or chat feature.

Team definitions fail when they ignore how work really happens

Product, design, technology, and actual collaboration patterns all create different maps of the same organization, and those maps rarely align cleanly. Product can redraw boxes cheaply, while engineering absorbs headcount, on-call, and reliability consequences; design often sees the seams without having the structural power to resolve them.

How to apply: map teams honestly on a few spectra (customer proximity, technology ownership, work intake, performance criteria, and real mandate) before redesigning the org.

Tactical Playbook

A practical checklist for getting enterprise AI through review

One repeatable playbook for enterprise AI approval is: lead with isolated VPC-first architecture, frame AI as deterministic background workflows instead of open chatboxes, add human approval pause-states for high-risk actions, and keep prompts/rules in version control for audits. Julie Zhuo adds the product-side complement: observability, audit trails, structured data, and clear trust signals are what turn AI from a demo into a tool.

How to apply: bring those controls into the first Legal, Compliance, or buyer review, before debating model choice.
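As a rough illustration of the "human approval pause-state" idea from the checklist above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Workflow` class, the `HIGH_RISK` action names, and the audit log format are hypothetical and do not come from any product or source mentioned in this brief.

```python
from dataclasses import dataclass, field

# Hypothetical set of actions that must wait for a human reviewer.
HIGH_RISK = {"delete_records", "send_external_email"}


@dataclass
class Workflow:
    """Toy background workflow with a pause-state and an audit trail."""
    audit_log: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)

    def request(self, action_id: str, action: str) -> str:
        # High-risk actions are parked until a reviewer approves them.
        if action in HIGH_RISK:
            self.pending[action_id] = action
            self.audit_log.append((action_id, action, "paused"))
            return "paused"
        # Low-risk actions run immediately but are still logged.
        self.audit_log.append((action_id, action, "executed"))
        return "executed"

    def approve(self, action_id: str, reviewer: str) -> str:
        # Raises KeyError if nothing is pending under this id.
        action = self.pending.pop(action_id)
        self.audit_log.append((action_id, action, f"approved_by:{reviewer}"))
        return "executed"


wf = Workflow()
low = wf.request("a1", "summarize_ticket")
high = wf.request("a2", "delete_records")
released = wf.approve("a2", "alice")
```

The point of the sketch is that the pause and the approval both leave an audit entry, which is the kind of evidence a compliance review can inspect.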

For AI-scale VoC, combine tagging with targeted outcome slices

At roughly 50k AI agent conversations per month, one PM team found 1% random sampling useful for "vibe" but not for statistical decisions, while generic LLM topic tagging still failed to explain why specific customers did not convert. Their practical workaround: keep LLM tagging, but review targeted slices like pricing, rage clicks, or handoff and tie those slices to outcomes.

How to apply: define 3-5 slices tied to a business outcome before transcript review; start there instead of browsing random conversations.
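One way to picture "slices tied to outcomes" is a small per-slice conversion report. The sketch below is only illustrative: the record shape, the tag names, and the `conversion_by_slice` helper are assumptions, and the five sample conversations are made-up placeholders, not data from the team described above.

```python
from collections import defaultdict

# Hypothetical records: LLM-assigned tags plus a business outcome flag.
conversations = [
    {"id": 1, "tags": {"pricing"}, "converted": False},
    {"id": 2, "tags": {"pricing", "handoff"}, "converted": True},
    {"id": 3, "tags": {"rage_click"}, "converted": False},
    {"id": 4, "tags": set(), "converted": True},
    {"id": 5, "tags": {"handoff"}, "converted": False},
]

# Slices chosen up front, before anyone reads transcripts.
SLICES = ("pricing", "rage_click", "handoff")


def conversion_by_slice(records, slices):
    """Return {slice: (conversation_count, conversion_rate)} for non-empty slices."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for rec in records:
        for name in slices:
            if name in rec["tags"]:
                totals[name] += 1
                wins[name] += int(rec["converted"])
    return {
        name: (totals[name], wins[name] / totals[name])
        for name in slices
        if totals[name]
    }


report = conversion_by_slice(conversations, SLICES)
for name, (count, rate) in report.items():
    print(f"{name}: {count} conversations, {rate:.0%} converted")
```

A conversation can fall into more than one slice, so slice counts are not expected to sum to the total; the report is meant to point reviewers at which transcripts to read first.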

Case Studies & Lessons

Open source PM trades control for a bigger market

Dan Cerulli's Kubernetes-era lesson is that open-source PMs do not fully own the roadmap, and success often requires letting competitors participate. Google concluded it could not define a standard alone, but could as part of a consortium; monetization then came through proprietary tools, managed services, or support layered on top of open source. Cerulli's advice when open sourcing internally: be explicit about business value and bring other companies in early for legitimacy and safer adoption. He also notes the model adds drama and loss of control, but created more value than solo efforts for early Kubernetes participants.

Lesson: if ecosystem adoption matters, optimize for shared legitimacy before perfect ownership.

Career Corner

The AI-native PM pitch is resonating

After speaking to more than 800 PMs at PM3 Summit, Sachin Rekhi said the strongest reaction came from the upside of becoming an AI-native PM: more time on solving customer problems and less time on coordination overhead. He believes this could be a "golden era of product management".

How to apply: start by identifying the coordination-heavy parts of your week and judge AI tools by whether they give that time back to product craft.

Tools & Resources

  • PM Brain OS: a local markdown + CLAUDE.md system that loads relevant context before tasks, updates the right files afterward, surfaces contradictions, and runs a weekly maintenance sweep. Its key design choice is provenance tagging: decisions outrank research, which outranks verbal claims. In the walkthrough, it immediately exposed a strategy gap: 38 of 47 shipped Jira tickets focused on enterprise permissions/admin tooling while only 4 touched the activation funnel. MIT-licensed and installable with one shell command.
  • Cursor-layer toolkit: Gupta's package includes a design spec, three prototypes to test this week, and a 30-minute audit that scores AI roadmap items; in his worked example, the top priority was a one-sprint fix rather than a multi-quarter rewrite.
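
The provenance ordering described for PM Brain OS (decisions outrank research, which outranks verbal claims) can be sketched in a few lines of Python. This is not the tool's actual implementation: the `PROVENANCE_RANK` table, the claim dictionaries, and the `resolve` helper are hypothetical names chosen only to show how such a ranking could settle a contradiction.

```python
# Hypothetical ranking: higher number wins when two notes disagree.
PROVENANCE_RANK = {"decision": 3, "research": 2, "verbal": 1}


def resolve(claims):
    """Return (winning_claim, conflicting_claims) for one topic."""
    winner = max(claims, key=lambda c: PROVENANCE_RANK[c["provenance"]])
    conflicts = [c for c in claims if c["value"] != winner["value"]]
    return winner, conflicts


# Made-up example notes about the same topic.
claims = [
    {"provenance": "verbal", "value": "focus on enterprise admin"},
    {"provenance": "research", "value": "focus on activation"},
    {"provenance": "decision", "value": "focus on activation"},
]

winner, conflicts = resolve(claims)
```

Here the recorded decision wins, and the single verbal claim that disagrees with it is returned as a contradiction to surface.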

Start with signal

Each agent already tracks a curated set of sources. Subscribe for free and start getting cited updates right away.

Coding Agents Alpha Tracker

Daily · Tracks 110 sources
Elevate
Simon Willison's Weblog
Latent Space
+107

Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.

AI in EdTech Weekly

Weekly · Tracks 92 sources
Luis von Ahn
Khan Academy
Ethan Mollick
+89

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

VC Tech Radar

Daily · Tracks 120 sources
a16z
Stanford eCorner
Greylock
+117

Daily AI news, startup funding, and emerging teams shaping the future

Bitcoin Payment Adoption Tracker

Daily · Tracks 108 sources
BTCPay Server
Nicolas Burtey
Roy Sheinbaum
+105

Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics

AI News Digest

Daily · Tracks 114 sources
Google DeepMind
OpenAI
Anthropic
+111

Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves

Global Agricultural Developments

Daily · Tracks 86 sources
RDO Equipment Co.
Ag PhD
Precision Farming Dealer
+83

Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs

Recommended Reading from Tech Founders

Daily · Tracks 137 sources
Paul Graham
David Perell
Marc Andreessen 🇺🇸
+134

Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media

PM Daily Digest

Daily · Tracks 100 sources
Shreyas Doshi
Gibson Biddle
Teresa Torres
+97

Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications

AI High Signal Digest

Daily · Tracks 1 source
AI High Signal

Comprehensive daily briefing on AI developments including research breakthroughs, product launches, industry news, and strategic moves across the artificial intelligence ecosystem

Frequently asked questions

Choose the setup that fits how you work

Free

Follow public agents at no cost.

$0

No monthly fee

Unlimited subscriptions to public agents
No billing setup

Plus

14-day free trial

Get personalized briefs with your own agents.

$20

per month

$20 of usage each month

Private by default
Any topic you follow
Daily or weekly delivery

$20 of usage during trial

Supercharge your knowledge discovery

Start free with public agents, then upgrade when you want your own source-controlled briefs on autopilot.