Your intelligence agent for what matters

Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.

Set up your agent
What should this agent keep you on top of?

Your time, back

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

3 steps to your first brief

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Weekly report on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Startup funding digest with key venture capital trends
Weekly digest on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2

Review and launch

Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.

Discovering sources...
Sam Altman (Profile)
3Blue1Brown (Channel)
Paul Graham (Account)
The Pragmatic Engineer (Newsletter)
r/MachineLearning (Community)
Naval Ravikant (Profile)
AI High Signal (List)
Stratechery (RSS)

3

Get your briefs

Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.

Agentic Engineering Takes Over the Coding Agent Playbook
May 22
5 min read
161 docs
Harrison Chase
Addy Osmani
Boris Cherny
+16
Karpathy’s “agentic engineering” framing was the clearest signal today: the leverage is moving from raw codegen to spec-writing, context design, execution environments, and cleanup discipline. This brief covers copyable workflows, new Codex and LangChain releases, Composer 2.5 economics, and the clips/repos worth studying.

🔥 TOP SIGNAL

  • Karpathy gave the clearest framing of the day: vibe coding now raises the floor, but agentic engineering is the discipline of keeping the old quality bar intact while agents handle recall-heavy implementation and boilerplate . Addy Osmani and swyx showed the practical version of the same shift: push your effort up-stack into precise specs, adversarial review, and repo hygiene—not more manual typing .

⚡ TRY THIS

  • Use prompts as the install script. Karpathy’s OpenClaw example is simple: replace a branching shell script with a copy-paste instruction block, hand it to the agent, and let it inspect the environment, adapt the setup, debug failures, and finish the install. This works best on tasks where the result is easy to verify.

  • End every session with a learning pass, then a hostile review pass. Add an instruction to your agent markdown/config like: "At the end of this session, codify the learnings or ask me what was new today." Before you push, ask: "What did we miss?", "What objections exist?", and "What parts are unclear?" If needed, add "don't be nice." Addy Osmani’s point is mutual amplification: both you and the agent should get better every day.

  • Split planning from execution. Matthew Berman’s recommended routing: use frontier models such as Opus 4.7 / GPT 5.5 for the upfront plan, then hand actual code writing and iteration to a cheaper workhorse model like Composer 2.5 . His economics case is concrete: Composer 2.5 sits roughly 1.5 points off frontier on Cursor Bench at about $0.55/task, with pricing at $0.50/M input and $2.50/M output . For your own evals, Boris Cherny recommends keeping recurring real-world “test cases” and rerunning them across models to spot genuine capability jumps .

  • Automate only the verified slice, then widen it. Berman’s OpenClaw lesson was that maintenance overhead can dominate: he says ~95% of his total effort ended up in maintaining the automation as fast-moving software kept breaking it . His safer loop is manual first, then automate one part, confirm it works, then extend the automation step by step . Good rule when the end-to-end agent demo is impressive but brittle .
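To sanity-check the Composer 2.5 economics quoted above against your own workloads, the per-token prices can be turned into a per-task estimate. This is a minimal sketch: the two prices come from the brief, while the function name and the example token mix are hypothetical and chosen only for illustration, not taken from Cursor or Berman.

```python
# Prices quoted in the brief for Composer 2.5 (USD per million tokens).
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.50


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one task at the quoted per-million prices."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical token mix (an assumption, not a figure from the source).
example = task_cost(input_tokens=600_000, output_tokens=100_000)
print(f"${example:.2f} per task")
```

Swap in token counts from your own recurring test cases to see where a planner/workhorse split actually saves money.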

📡 WHAT SHIPPED

  • Codex got a real usability push on Mac + iPhone. Appshots lets Mac users press Command-Command to attach the current app window to a Codex thread, including a screenshot plus full text from the window, even beyond what’s visible onscreen; it’s live across plans on Mac, with enterprise access coming soon. Remote Computer Use now lets Codex Mobile control Mac apps even when the machine is locked (see the locked-use docs). The ChatGPT iOS app also added turn-completion push notifications, better reconnection, a more compact conversations UI, /fork, and improved diffs with full-file open.

  • Codex enterprise controls are showing up too. Greg Brockman called out token analytics and plugin sharing for business/enterprise users.

  • Claude Code is adding token attribution. The upcoming /usage command breaks down token consumption by Skills, Agents, MCPs, and Plugins; CLI first, Desktop next.

  • LangChain opened private beta for Managed Deep Agents. It’s model-agnostic managed infra for deep agents, with deepagents init → deepagents deploy → live endpoint, a versioned context hub, per-thread sandboxes with persistent filesystem, shell access, file I/O, auth proxy, snapshotting, and a durable execution harness with no Dockerfile or infra glue. Full breakdown: Introducing Managed Deep Agents.

  • Cursor filled in the Composer 2.5 stack and pricing. The model is built on Kimi K2.5 and trained with 25x more synthetic tasks than Composer 2. Pricing is $0.50/M input and $2.50/M output, performance is ~64% on Cursor Bench and 63 on Artificial Analysis’s Coding Agent Index, and it’s only available inside Cursor.

  • Daytona is an infra project worth watching if you care about agent sandboxes. In Latent.Space’s interview, CEO Ivan Burazin says Daytona starts one sandbox in ~60ms, can bring up 50,000 concurrent sandboxes in ~75 seconds, and already serves a customer close to 850k sandboxes/day; RL/eval workloads have gone from 0% to about 50% of usage in just a few months . The comparison to EKS/GKE is speed, ergonomics, stateful snapshots, and dynamic resizing rather than generic container orchestration .
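The Daytona figures above are easier to compare once they share a unit. A rough sketch, assuming the quoted numbers are exact and that the daily customer volume is spread evenly over 24 hours (an assumption; real traffic is likely bursty). The helper name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(count: float, seconds: float) -> float:
    """Convert a count over a time window into a per-second rate."""
    return count / seconds


# Burst figure from the interview: 50,000 sandboxes in about 75 seconds.
burst_rate = per_second(50_000, 75)

# Customer figure: about 850,000 sandboxes per day, averaged evenly.
daily_average = per_second(850_000, SECONDS_PER_DAY)

print(f"burst: {burst_rate:.0f} sandboxes/s")
print(f"largest customer, averaged: {daily_average:.1f} sandboxes/s")
```

Under these assumptions the claimed burst capacity is well above the averaged load of the largest cited customer, which fits the pitch that startup speed is the differentiator.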

🎬 GO DEEPER

  • 15:57-17:10 — Karpathy on “vibe coding” vs “agentic engineering.” Best short clip of the day if you want a clean mental model for where the human still matters: spec, taste, security, and fundamentals .
  • 27:23-28:40 — Addy Osmani’s adversarial mentor pattern. Good pre-push ritual: ask the model what you missed, what objections exist, and what is still unclear. This is one of the best quality-over-velocity habits in today’s sources .
  • 08:46-09:18 — LangChain on Metaharness. Worth watching if you still think model choice is the whole game: they cite MIT/Stanford’s Metaharness and say they moved from top-30 to top-5 on Terminal Bench 2 by editing the harness only, with no model change .
  • 16:45 — Daytona on sandbox economics. If you’re deciding whether agent infrastructure belongs on generic Kubernetes or something more stateful, this podcast segment is the most concrete operator-level discussion in today’s set: startup time, concurrent scale, snapshot distribution, and why RL/eval workloads look different from “follow-the-sun” agents .

  • Repo to study: datasette-agent. Simon Willison says Claude Code and Codex are both strong at writing plugins against this repo, which makes it a nice small-but-real benchmark if you want to compare coding agents on something more practical than a toy prompt . If you want a local-model loop, he also shared a uv one-liner for running it via LM Studio .

Editorial take: the durable edge is shifting away from “who can prompt fastest” and toward who can define the spec, shape the context, choose the right execution layer, and keep the cleanup bar high.

World Models, Agent Sandboxes, and New Vertical AI Wedges
May 22
6 min read
857 docs
Sarah Guo
Yann LeCun
Elad Gil
+18
The clearest signals this cycle are a concrete shift toward world models, strong traction in agent infrastructure, and a fresh set of vertical AI startups in fraud, pathology, and human-agent coordination. The brief also covers new financing structures, frontier-model economics, and a short list of source material worth reading or watching.

Funding & Deals

  • OpenAI's YC token-for-equity program is still the clearest financing experiment in this batch. Sam Altman offered $2M in OpenAI tokens to every YC startup in the current batch in exchange for equity; YC separately said the offer covers the spring and summer batches and extended the summer deadline to May 25. External commentary framed the tokens as compute credits that can de-risk early product work and may lift valuations at the margin.

  • Round mechanics are lagging company growth. Harry Stebbings said founders are agreeing terms around $3M ARR and reaching $15M-$20M ARR by the time legals finish, with company progression outpacing legal completion.

  • A small angel round came with a clear product lesson. An AI video editor founder said they raised $30K two weeks earlier, then learned from 10 beta users that the real pain was workflow speed, not output fidelity. They responded by cutting validation from 18 gates to 5, limiting retries, and moving to a preview-first flow.

Emerging Teams

  • Daytona: stateful compute for agents. CEO Ivan Burazin previously co-founded CodeAnywhere, used by about 3 million people, and later ran developer experience at InfoBip. After a January 2025 pivot from human dev environments to agent sandboxes, Daytona reported 74% month-over-month growth; one customer runs about 850K sandboxes a day; RL/eval workloads moved from 0% to roughly 50% of usage. The system runs on bare metal with its own scheduler, using local NVMe snapshots to start one sandbox in about 60 ms or 50,000 in about 75 seconds.

  • Incandor: behavioral intelligence for bank fraud. YC says the product links behavior across accounts, making fraud rings, mule handoffs, and banned operators visible. Founders are Matthew Yekell and Luc Rosenzweig.

  • Limrun: mobile development infrastructure for cloud agents. The product provides remote Xcode plus iOS and Android simulators so cloud agents can build mobile software; YC says customers already include Replit, Rork, and Momentic AI. Founder: @muvaff.

  • Voquill: voice AI for pathologists. Voquill listens while pathologists work and drafts sign-out-ready reports in real time, targeting a workflow where many pathologists spend more time writing reports than diagnosing. Founders are @HenryHabibAI, @josiahsrc, and Michael.

  • Human-agent coordination is becoming a software layer. Pentagon, launched by @edgarpavlovsky, argues that agents are already doing coding, research, ops, and customer work but still operate in isolation, turning humans into middleware. Lightsprint is attacking the adjacent problem with a platform for visual planning, parallel cloud agents, live previews, and more reliable shipping.

AI & Tech Breakthroughs

  • World models are moving from research rhetoric into startup formation. Yann LeCun said he founded Advanced Machine Intelligence to pursue world models and physical AI beyond LLMs, predicted 2026 will be "the year of the world model," and argued that LLM-style next-token architectures do not work for video, sensor, or biological data because there are infinitely many plausible next states. Fei-Fei Li said World Labs is building foundation models for spatial intelligence, with world models and world action models that learn from pixels to generate states, policies, and actions for robots and physical systems. Bioptimists is applying similar beyond-language ideas to biology with multimodal, multiscale models aimed at drug discovery and rational medicine design.

"I think 2026 is going to be the year of the world model"

  • OpenAI's unit-distance result is a real symbolic milestone. An OpenAI model discovered a new family of constructions for the planar unit distance problem, outperforming square-grid-based approaches and disproving a belief held since Erdős posed the problem in 1946. Multiple sources framed it as the first time AI autonomously solved a prominent open problem central to a field of mathematics; one account said the model connected geometry to deep number theory, and experts including Noga Alon, Melanie Wood, and Tim Gowers called it "a milestone in AI mathematics."

  • Runway is productizing a stronger video-editing primitive. Aleph 2.0 lets users edit a single frame, preview the change, and propagate that edit through the rest of the video inside the web-based Edit Studio. Cristóbal Valenzuela said Aleph 1.0 had already changed editing workflows and positioned 2.0 as a new standard for the category.

  • Fast inference remains one of the few infrastructure advantages users immediately feel. Cerebras said its wafer-scale AI systems are 15-20x faster than GPUs at inference and are built around a 46,000 square millimeter chip. CEO Andrew Feldman said demand accelerated in 2025 once models became useful enough for everyday work, and argued that speed opens new business models rather than just marginal efficiency gains.

Market Signals

  • AI adoption is now showing measurable GTM leverage. ICONIQ/SaaStr data says companies with AI fully embedded in GTM generate roughly 2x the net new revenue per FTE of medium and low adopters. AI-heavy pipelines also show better top-of-funnel conversion: new lead to MQL is 38% versus 27%, and MQL to SQL is 37% versus 29%. Daily AI use passed 50% in marketing, SDR/BDR, and RevOps.

  • Returns still appear concentrated at the frontier, and the supporting stack is expensive. In recent conversations cited by Patrick O'Shaughnessy, Anthropic's Krishna, Dylan Patel, and Gavin Baker all argued that frontier models capture most economic returns at the model layer; Krishna said customers spend heavily on newer models because frontier intelligence drives meaningful ROI. Sarah Guo added that this is a capex-intensive cycle, that Nvidia is 2-5 years ahead in areas like neoclouds and inference cloud, and that startups still want frontier chip performance because it enables products such as current coding agents.

  • The geopolitics of open versus closed models are shifting. Fei-Fei Li said the 2026 AI Index shows the US-China capability gap has closed for the first time; she added that China now leads in open LLMs, video models, and even world models, while the US is closing models.

  • Efficiency gains are real, but the energy map will get more complicated. Fei-Fei Li said inference costs for language models fell about 280x in the last two to three years through distillation, quantization, and newer chips. At the same time, she said AI's current power buildout is being driven by training and inference on language models, while embodied AI will eventually add a much more distributed pattern of on-machine compute and energy demand.

  • Early-stage distribution is being subsidized with tokens, and that may invite backlash. Harry Stebbings said token spend is becoming a core marketing line item, with founders willing to give away $20K-$50K per month in tokens to drive usage and temporarily out-hustle incumbents. He also warned that layoffs and capital shifts into machines are creating a political and social backlash the tech industry is underestimating.
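The ICONIQ/SaaStr conversion rates in the first bullet above compound across funnel stages, so the end-to-end gap is larger than either stage suggests. A small sketch using only the four percentages quoted; the variable and function names are illustrative, and it assumes the two stages multiply independently.

```python
def lead_to_sql(lead_to_mql: float, mql_to_sql: float) -> float:
    """Compound two funnel stages into a single lead-to-SQL rate."""
    return lead_to_mql * mql_to_sql


# Rates quoted in the brief.
ai_heavy = lead_to_sql(0.38, 0.37)
others = lead_to_sql(0.27, 0.29)

print(f"AI-heavy pipelines: {ai_heavy:.1%} of leads reach SQL")
print(f"Other pipelines:    {others:.1%} of leads reach SQL")
print(f"Relative lift:      {ai_heavy / others:.2f}x")
```

Under that assumption the AI-heavy funnel turns roughly 14% of new leads into SQLs versus about 8%, which is in the same range as the roughly 2x revenue-per-FTE figure.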

Worth Your Time

  • OpenAI's planar unit distance thread — Primary-source summary of the math result and what changed relative to long-standing square-grid intuition.

  • Runway Aleph 2.0 demo — Quick product demo of a potentially important editing primitive: change one frame, then propagate the edit across the clip.

Qwen 3.7-Max Raises the Agent Bar as Codex Expands Computer Use
May 22
4 min read
777 docs
Y Combinator
Simon Eskildsen
Figure
+20
Alibaba’s Qwen3.7-Max narrowed the gap to frontier labs with stronger agentic performance, while OpenAI pushed Codex further into persistent computer use. The brief also covers new research on AI review quality, long-context architectures, product launches from Cohere and Devin, and the latest business signals across AI infrastructure.

Top Stories

Why it matters: the clearest signals today were stronger agentic models, faster productization of computer-use systems, and continued compression in AI price-performance.

  • Alibaba pushed deeper into the frontier with Qwen3.7-Max. The company introduced it as a flagship model for the “Agent Era,” highlighting end-to-end coding, MCP-based productivity workflows, and a 35-hour kernel-optimization run that used 1,158 tool calls and achieved a 10.0x geometric-mean speedup over the Triton reference . Artificial Analysis scored it at 56.6, up 4.8 points from Qwen3.6 Max Preview and the closest Alibaba has come to frontier labs, with gains concentrated in scientific reasoning, agentic capability, and coding . Part of that gain came from higher abstention that reduced hallucination rate, not just higher factual recall .

  • Model quality keeps getting cheaper. Text Arena’s price-performance view says the cost of frontier-quality output fell from about $50 per million tokens in 2023 to about $0.10 today, while the gap between sub-$0.20 models and the leader shrank from roughly 350 Arena points to 60. In coding agents, Cursor’s Composer 2.5 reached 62 on the Artificial Analysis Coding Agent Index—third overall—at $0.07 per task in standard mode or $0.44 in Fast mode, versus $4.10-$4.82 for the two higher-ranked systems .
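The price figures above are easier to weigh as ratios. A minimal sketch using only the dollar amounts quoted in this section; the helper name is hypothetical, and the comparison ignores any quality difference between the systems.

```python
def times_cheaper(expensive: float, cheap: float) -> float:
    """Return how many times cheaper `cheap` is than `expensive`."""
    return expensive / cheap


# Frontier-quality output: about $50 per million tokens (2023) vs about $0.10 today.
frontier_drop = times_cheaper(50.0, 0.10)

# Composer 2.5 standard mode ($0.07/task) vs the two higher-ranked systems.
vs_low = times_cheaper(4.10, 0.07)
vs_high = times_cheaper(4.82, 0.07)

print(f"frontier-quality price drop: about {frontier_drop:.0f}x")
print(f"Composer 2.5 standard mode: {vs_low:.0f}x to {vs_high:.0f}x cheaper per task")
```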

Research & Innovation

Why it matters: the most useful research updates were about better evaluation, better long-context architectures, and better harnesses rather than just bigger models.

  • AI paper review got a strong benchmark result. A study on 82 Nature-family papers found that frontier LMs in an agent harness were judged by 45 expert scientists to outperform the best human reviewer; the authors also said AI reviews were accurate and well-evidenced but less grounded in scientific norms and more homogeneous than human panels .

  • Gated DeltaNet-2 advanced linear attention. The architecture decouples erase and write gates, outperformed KDA and Mamba-3 at 1.3B scale, and showed especially large long-context gains, including S-NIAH-3: 63 → 90 and multi-key needle retrieval: 28 → 38.

  • Harness quality still matters. The new Physics-Intern scaffold wraps a model with a dedicated subagent for science problems; it raised Gemini 3.1 Pro from 17.7 to 31.4, beating GPT 5.5 Pro, while GPT 5.5 Pro itself did not improve under the harness .

Products & Launches

Why it matters: leading products are moving from chat and code generation toward persistent work, computer use, and lower-cost deployment.

  • OpenAI expanded Codex into a more persistent computer-use product. New updates let Codex securely use apps on a locked Mac from a phone, run in Goal mode across the app, IDE extension, and CLI for tasks lasting hours or days, pull screenshots plus visible and off-screen text into threads via Appshots, and make direct webpage changes with advanced annotation . OpenAI also added richer business analytics and team plugin sharing .

  • Cohere open-sourced Command A+. Cohere called it its most powerful LLM yet, optimized to run on minimal hardware and released for broad access; separate posts described it as the company’s first fully open-source Apache 2 model, and a W4A4 Hugging Face release promises sharply lower serving footprint with little performance loss .

  • Devin gained native Windows support. Cognition said Devin can now run in a Windows VM with support for MSBuild, IIS, PowerShell, SQL Server, and enterprise controls including isolated sessions, SOC 2 Type II, ISO 27001, SSO, and RBAC .

Industry Moves

Why it matters: the business story is increasingly about who can convert model demand into durable revenue, infra scale, and profitable software.

  • The frontier revenue race is separating into growth and profitability stories. Posts citing reported figures put OpenAI at about $5.7B in Q1 revenue versus Anthropic at roughly $4.7B-$4.8B. But Anthropic’s recent annualized revenue reportedly neared $45B and it is projecting about $600M in operating profit, while separate commentary said OpenAI was losing $1.22 for every dollar earned and had user growth stalled near 905M weekly actives .

  • Modal raised $355M at a $4.65B valuation. The round was led by General Catalyst and Redpoint, with the company framing its mission around infrastructure that improves developer productivity for AI and data teams as workloads scale .

  • turbopuffer reported breakout traction in search infrastructure. The company said it crossed $100M run-rate in March, just 19 months after $1M ARR, while staying profitable with under $1M raised; it says customers include Cursor, Anthropic, and Cognition, and that Cursor cut costs 95% after migrating production search workloads .

Quick Takes

Why it matters: a few smaller updates sharpened the picture on chips, local AI, robotics, and startup distribution.

  • HBM is eating more of AI chip budgets: its share of frontier chip component spend rose from 52% to 63%, with total spend growing from about $12B to $32B.
  • llama.cpp added WebGPU support, enabling GPU-accelerated local models in the browser with no data leaving the device.
  • Figure said F.03 reached 200 hours of autonomous operation without failure after processing 238,000 packages.
  • OpenAI is offering $2M in tokens to every YC company in the spring and summer batches.

Seven Powers, a Productivity-Growth Paper, and a San Jose Case Study
May 22
2 min read
214 docs
Garry Tan
Patrick Collison
Amjad Masad
+1
Patrick Collison surfaced the day’s strongest learning resources: Hamilton Helmer’s *Seven Powers* for thinking about durable software moats, and a Nick Bloom paper on innovation and productivity growth. Garry Tan’s notable pick was a New York Times video on San Jose’s homelessness approach, highlighted for its concrete design and measurable outcomes.

What stood out

Today’s strongest signals centered on durable frameworks and outcome-focused case studies. Patrick Collison supplied the clearest founder-learning recommendations: a favorite book on software moats and a paper on innovation and productivity growth. Garry Tan’s pick was a policy video, but it fit the same filter: a resource endorsed for its concrete model and observed results.

Most compelling recommendation

Seven Powers

  • Content type: Book
  • Author/creator: Hamilton Helmer
  • Link/URL: Direct book link was not provided in the source; source discussion: Patrick Collison & Amjad Masad
  • Who recommended it: Patrick Collison
  • Key takeaway: Collison said software moats may not change all that much over the next 5–10 years and pointed to Seven Powers as one of his favorite frameworks because it reduces the question to “seven moats.”
  • Why it matters: This was the strongest recommendation because Collison used it directly when answering how durable competitive advantages in software should be analyzed.

"one of my favorite books on the subject is Hamilton Helmer's ... seven powers."

Two other recommendations worth saving

Paper on innovation and productivity growth (title not provided in source)

  • Content type: Research paper
  • Author/creator: Nick Bloom
  • Link/URL: Direct paper link was not provided in the source; source discussion: Patrick Collison & Amjad Masad
  • Who recommended it: Patrick Collison
  • Key takeaway: Collison described it as a “very interesting paper” about how innovation and productivity growth, at least on a per-person basis, appeared to be declining and how that broader stagnation might be explained.
  • Why it matters: It offers a macro frame for readers who want to think about innovation, productivity, and stagnation at the system level rather than only at the company level.

San Jose's approach to homelessness (video)

  • Content type: Video
  • Author/creator: The New York Times
  • Link/URL: NYT video
  • Who recommended it: Garry Tan
  • Key takeaway: Tan highlighted a pragmatic approach built around converting rundown motels into transitional housing, prioritizing people from the local neighborhood first, and creating no-encampment zones only after offering real alternatives.
  • Why it matters: He emphasized concrete outcomes rather than abstract intent: fewer 911 calls, less blight, and more stable neighborhoods.

Bottom line

If you open only one resource, start with Seven Powers. It had the clearest endorsement and the most direct application to a durable founder problem: how to reason about moats in software. After that, Nick Bloom’s paper is the best macro follow-on, while the NYT video is the most concrete non-business case study in today’s set.

Inference Economics Tighten as AI Moves Deeper Into Work
May 22
5 min read
318 docs
Sarah Guo
Yann LeCun
Elad Gil
+24
Cerebras’ IPO and a widening compute squeeze made inference economics hard to ignore, while OpenAI, LangChain, and agent-infrastructure companies pushed AI further into real software workflows. At the same time, leading researchers pointed beyond language models toward physical AI and world models.

Compute economics are getting harder to ignore

Cerebras’ IPO turns inference into a capital-markets story

Cerebras went public at roughly a $60B-$63B market cap and says its wafer-scale chips deliver 15-20x faster inference than GPUs across model sizes . The company also disclosed a >$20B OpenAI agreement and an AWS deployment deal, while founder Andrew Feldman said demand accelerated once models became useful in everyday work and speed became essential .

Why it matters: This is one of the clearest signs yet that inference speed and latency are becoming core business drivers, not just technical specs .

The compute squeeze is spreading beyond the biggest clouds

Discussion around NVIDIA’s latest results emphasized that demand is expanding beyond hyperscalers into enterprises, AI labs, industry, and robotics, with NVIDIA’s growth tracking ahead of hyperscaler capex alone . Sarah Guo said some startups are now trying to secure $100M blocks of compute on multi-year commitments, and argued that today’s coding agents would not have been possible on hardware from three years ago .

Why it matters: Access to frontier hardware is becoming part of product strategy, especially as more startups push into agentic and inference-heavy workloads .

OpenAI is widening from model provider to workflow platform

Codex is moving closer to the operating system

OpenAI launched Codex Appshots, letting Mac users attach an app window to a Codex thread with both a screenshot and extracted text, including off-screen content . It also released Remote Computer Use so Codex Mobile can operate Mac apps while the computer stays at home and locked, and rolled out ChatGPT for PowerPoint for building, querying, and editing decks inside PowerPoint .

Separately, Greg Brockman said major banks are using OpenAI’s Daybreak for cybersecurity defense .

"the model alone is no longer the product"

OpenAI is also tightening its startup pipeline

OpenAI is offering $2M in tokens to every YC company in the spring and summer batches, with the summer deadline extended to May 25 . Publicly framed as "OpenAI for YC companies," the offer signals a deeper OpenAI-YC partnership around early-stage AI startups .

Why it matters: OpenAI is pushing on both ends at once: deeper workflow integration for users and closer platform ties to the next wave of startups .

Agent software is becoming its own layer

LangChain is packaging agents for operators, not just developers

LangChain launched LangSmith Fleet, a no-code managed agent builder with 200+ built-in tools, 7,500 more through Arcade, native Slack/Gmail/Outlook integration, and support for open or closed models on its Deep Agents harness . LangChain says its own teams already use agents for talent sourcing, marketing research, incident response, and go-to-market work, with the go-to-market agent lifting lead-to-qualified conversion 240% .

Why it matters: The control layer is moving upward from raw model APIs to agent builders, tools, channels, and governance that non-engineers can use directly .

The runtime layer for agents is starting to look like new cloud infrastructure

Daytona said its pivot from human developer environments to AI-agent sandboxes is now producing 74% month-over-month growth, with roughly 60ms startup times, stateful snapshots, dynamic resizing, and customers running up to about 850,000 sandboxes per day . CEO Ivan Burazin argued that agents need full "composable computers" rather than simple code-execution boxes, and that the resulting stack may look like a dedicated cloud for agents .

Andrej Karpathy described a December inflection where agentic workflows became reliable enough for sustained "vibe coding," and argued that "Software 3.0" means prompting and context increasingly act as the program .

"you can outsource your thinking but you can't outsource your understanding."

Why it matters: If agents become long-running software workers, their runtime layer may become its own infrastructure category .

Beyond language, leading researchers are converging on physical AI

World models are becoming the next big research bet

Yann LeCun said research is moving from language and other discrete symbols toward "physical AI" and world models, arguing that the architectures that work so well for language do not transfer cleanly to real-world prediction because there are infinitely many plausible next states . Fei-Fei Li made a similar case, describing the next wave as spatial and embodied intelligence built from world, action, and video models—and warning that once these systems mature they will create a fresh surge in energy demand beyond today’s language-model data centers .

Why it matters: Some of the field’s most influential researchers are increasingly talking as if the post-LLM race is already underway .

The hardware stack may not be ready for it

Sara Hooker argued that the industry is shifting from pure pretraining scale toward post-training, test-time compute, and sequential interaction with the world—exactly the kinds of workloads current GPUs handle poorly . In her view, the next gains will come from co-designing algorithms and hardware, plus systems that can keep learning without forgetting as they operate over longer horizons .

Why it matters: If physical AI becomes the next major frontier, the bottleneck may be architectures, hardware, and interfaces built for interaction, not just bigger text models.

Also worth watching

A meta-analysis of 210 biomedical AI studies found that 97% of papers using statistical comparisons under cross-validation relied on invalid statistical tests, prompting warnings of a replication crisis in biomedical AI. The result comes from a new preprint led by Tianchu Zeng and coauthors.

Why it matters: As AI-for-science claims accelerate, evaluation methodology is becoming a story in its own right.

Decision-First Prototyping, AI-Native PM Workflows, and Behavior-Driven Product Design
May 22
3 min read
82 docs
Aakash Gupta
Product Management - The place for all things product
Nir Eyal
+5
This brief covers why prototypes should be built to force decisions, how AI is reshaping PM work toward design and agent orchestration, and what recent discovery examples suggest about behavior-driven product design.

Big Ideas

  • Decision-first prototyping. Ravi Mehta’s core reminder: prototypes are discovery tools, not delivery artifacts. AI makes it cheap to spin up demos, which increases two risks: over-polishing and generating more variants than the team can learn from. Why it matters: both failure modes feel productive but delay the actual decision. Apply: define the decision first, choose the lightest prototype that can answer it, and plan to discard it.

"The prototype itself is never the point. The decision it enables is."

  • The AI-era PM role is narrowing around design, customers, and systems. Several operators describe the same shift: PM/design boundaries are blurring, PMs should code in playgrounds rather than ship production code, and the irreplaceable work becomes customer time, system management, and output review. Andre Albuquerque adds a more radical operating model: execution should be solo while discovery and delivery stay collaborative; his PM agent routes work to five specialists, and with half of build time spent on agent infrastructure, three people now match the output of teams of 15. Apply: get explicit about which PM archetype your company values, then train for that version of the role.

Tactical Playbook

  • Match prototype type to the question.

    1. Concept prototype when the problem is clear but solutions compete; keep it low-fidelity with mock data
    2. Design prototype once a direction is chosen; use working flows to replace deck-driven alignment
    3. Research prototype when you need real behavior; use realistic data and instrumentation
    4. Technical prototype when feasibility is the question; focus on latency, quality thresholds, and scale, especially for AI

    End each one by naming the decision it should force.

  • Use backend language in AI specs and prompts. When working in Claude Code or reviewing AI-generated specs, explicitly ask for: async handling with loading states, race-condition checks for read/write flows, idempotency keys for retries, and graceful degradation with happy/loading/error states so one failure does not take down the whole experience. Why it matters: the output improves when PMs “speak the system’s language”.
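To make two of the terms in the last item concrete (idempotency keys for retries, and a graceful error state), here is a minimal Python sketch. Everything in it is an assumption for illustration: `PaymentService`, `FlakyClient`, and `charge_with_retries` are hypothetical names, the backend is an in-memory stand-in, and none of it comes from the brief's sources.

```python
import uuid


class PaymentService:
    """Hypothetical in-memory backend that deduplicates by idempotency key."""

    def __init__(self):
        self._results = {}      # idempotency key -> stored result
        self.charges_made = 0   # how many real charges were executed

    def charge(self, idempotency_key, amount_cents):
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"status": "ok", "charge_id": self.charges_made,
                  "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


class FlakyClient:
    """Hypothetical transport that loses the first `failures` responses."""

    def __init__(self, service, failures):
        self.service = service
        self.failures = failures

    def charge(self, idempotency_key, amount_cents):
        result = self.service.charge(idempotency_key, amount_cents)
        if self.failures > 0:
            self.failures -= 1
            # The charge happened, but the caller never hears about it.
            raise TimeoutError("response lost")
        return result


def charge_with_retries(client, amount_cents, max_attempts=3):
    # One key per logical operation, reused on every retry.
    key = str(uuid.uuid4())
    for _ in range(max_attempts):
        try:
            return client.charge(key, amount_cents)
        except TimeoutError:
            continue
    # Graceful degradation: return an error state instead of crashing.
    return {"status": "error", "message": "Payment could not be confirmed"}
```

The point a spec should capture is that the key is generated once, outside the retry loop. Without it, each retry after a lost response would create a second charge.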

Case Studies & Lessons

  • Goal abandonment looked more social than motivational. After 10 semi-structured interviews anchored on “What actually killed the goal?”, one PM heard recurring themes: no social consequence, urgency miscalibration, identity fragmentation, and procrastination disguised as productivity. Existing tools looked like symptom-fixes, so StrideWithMe deliberately avoided leaderboards, points, and public-by-default sharing. Lesson: map insights to mechanisms before mapping them to features.

  • AI raises the bar on stored value. Nir Eyal says the Hook Model’s four steps still hold, but AI supercharges the investment phase by letting products remember prior behavior and adapt in real time. His TikTok example: immediate reward on first open, then dwell-time and interaction data improve future recommendations. Lesson: define what user behavior should make the next session better, and keep the design on the side of persuasion, not coercion or addiction.

Career Corner

  • Show the PM shape you fit, and prove it with artifacts. The PM role is becoming more design-focused in some companies, engineering-focused in others, and more traditional elsewhere, so candidates need to understand what their company actually values. One transition example packaged that proof as a technical PRD, an independent discovery case study, and a self-deployed portfolio site; the most direct feedback was to add metrics to every project. Shreyas Doshi’s durable edge: get better at explaining the user psychology behind why products resonate, because that compounds creativity over time.

Tools & Resources

  • Operating template: a CLAUDE.md can act as a lightweight PM system prompt, encoding agent roles, routing rules, and constraints before work begins. Andre’s rule was simple: always call the PM agent first; when something fails, fix the agent or rule and rerun the pipeline.
  • Free event: Product leadership skills in the AI era with Shreyas Doshi and Gil Feig focuses on what AI-native teams expect from product leaders.
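The "always call the PM agent first" rule from the operating template can be sketched as a small router in Python. This is an illustration under stated assumptions, not Andre's setup: the specialist names, the keyword-based rule format, and the functions `pm_agent` and `run_pipeline` are all hypothetical, and the source does not name the five specialists it mentions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical specialists, standing in for real agents.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "research": lambda task: f"research notes for: {task}",
    "design": lambda task: f"design draft for: {task}",
    "build": lambda task: f"implementation plan for: {task}",
}

# Assumed rule format: the first keyword found in the task wins.
ROUTING_RULES: List[Tuple[str, str]] = [
    ("interview", "research"),
    ("mockup", "design"),
    ("endpoint", "build"),
]


def pm_agent(task: str) -> str:
    """Pick a specialist for the task, or fail loudly if no rule matches."""
    lowered = task.lower()
    for keyword, specialist in ROUTING_RULES:
        if keyword in lowered:
            return specialist
    # An unmatched task is a signal to fix the rules, then rerun.
    raise LookupError(f"no routing rule matches: {task!r}")


def run_pipeline(task: str) -> str:
    # The PM agent is always called first; specialists never self-select.
    specialist = pm_agent(task)
    return SPECIALISTS[specialist](task)
```

Failing loudly on an unmatched task mirrors the second half of the rule: when something breaks, the fix goes into the agent or the rule, and the pipeline is rerun rather than patched by hand.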

Start with signal

Each agent already tracks a curated set of sources. Subscribe for free and start getting cited updates right away.

Coding Agents Alpha Tracker

Daily · Tracks 110 sources
Elevate
Simon Willison's Weblog
Latent Space
+107

Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.

AI in EdTech Weekly

Weekly · Tracks 92 sources
Luis von Ahn
Khan Academy
Ethan Mollick
+89

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

VC Tech Radar

Daily · Tracks 120 sources
a16z
Stanford eCorner
Greylock
+117

Daily AI news, startup funding, and emerging teams shaping the future

Bitcoin Payment Adoption Tracker

Daily · Tracks 108 sources
BTCPay Server
Nicolas Burtey
Roy Sheinbaum
+105

Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics

AI News Digest

Daily · Tracks 114 sources
Google DeepMind
OpenAI
Anthropic
+111

Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves

Global Agricultural Developments

Daily · Tracks 86 sources
RDO Equipment Co.
Ag PhD
Precision Farming Dealer
+83

Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs

Recommended Reading from Tech Founders

Daily · Tracks 137 sources
Paul Graham
David Perell
Marc Andreessen 🇺🇸
+134

Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media

PM Daily Digest

Daily · Tracks 100 sources
Shreyas Doshi
Gibson Biddle
Teresa Torres
+97

Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications

AI High Signal Digest

Daily · Tracks 1 source
AI High Signal

Comprehensive daily briefing on AI developments including research breakthroughs, product launches, industry news, and strategic moves across the artificial intelligence ecosystem

Choose the setup that fits how you work

Free

Follow public agents at no cost.

$0

No monthly fee

Unlimited subscriptions to public agents
No billing setup

Plus

14-day free trial

Get personalized briefs with your own agents.

$20

per month

$20 of usage each month

Private by default
Any topic you follow
Daily or weekly delivery

$20 of usage during trial

Supercharge your knowledge discovery

Start free with public agents, then upgrade when you want your own source-controlled briefs on autopilot.