Your intelligence agent for what matters

Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.

Set up your agent
What should this agent keep you on top of?

Your time, back

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, podcasts, X accounts, Substack, Reddit, and blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

3 steps to your first brief

1. Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Weekly report on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Startup funding digest with key venture capital trends
Weekly digest on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2. Review and launch

Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, and add your own. Launch when ready; you can adjust sources anytime.

Sam Altman (Profile)
3Blue1Brown (Channel)
Paul Graham (Account)
The Pragmatic Engineer (Newsletter)
r/MachineLearning (Community)
Naval Ravikant (Profile)
AI High Signal (List)
Stratechery (RSS)

3. Get your briefs

Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.

Goal Loops Take Over Coding Agents; Google Ships Managed Agents and Cursor Adds /loop
Jun 4
6 min read
153 docs
Elad Gil
Logan Kilpatrick
Romain Huet
+12
Goal-based coding agents are converging across OpenAI, Google, Cursor, and Microsoft. This brief covers the copyable workflows behind that shift, what shipped today, and the strongest anti-hype lessons from engineers who benchmark agent output against expert baselines.

🔥 TOP SIGNAL

The biggest practical shift today: coding agents are standardizing around goal loops, not chat turns. Romain Huet shows Codex's goal flow as one ambitious task plus a verifiable completion condition that can run for hours or days, while Google's managed-agents team says Gemini's Interactions API had to become agent-first because real agents do tool calls, sub-agents, and continuous steps rather than simple user/model turns. Jediah Katz's new Cursor /loop skill and Satya Nadella's note that Copilot now needs UI for 100+ concurrent agent sessions make the same point from the product side: the valuable skill is increasingly orchestration (wake-up conditions, state, and review), not just faster prompting. Also: don't confuse autonomy with quality. Mitchell Hashimoto's loop found a big local win, but his handwritten baseline was still ~75x better, and Alexander Embiricos's "bring the taste" principle is the right operating model for serious work.

"AI can write your code... But it cannot care... You have to bring the taste."

⚡ TRY THIS

  • Batch a whole class of work with one hard done-condition (Romain Huet / Codex). In Codex, type goal, then phrase the task as a final state the agent can verify: "pull all of the bugs from the backlog from yesterday's launch and prepare a PR for each of them and make sure all of the tests pass." OpenAI says goal mode is meant for tasks that run for hours or even days; Huet shows the same pattern on a large migration with "migrate this entire code base to Java 26" plus the requirement that everything keeps going until tests pass.

  • Add a wake-up loop for waiting tasks (Jediah Katz / Cursor). Cursor can now watch terminal output and take action; Katz used that to build a public /loop skill. Copyable prompts: "/loop until this PR merges" and "/loop 1h check #infra-logs for anything critical". Caveats: it does not work in Cloud Agents yet, and it will not fire while your computer is sleeping.

  • Constrain optimization agents like you would a junior performance engineer (Mitchell Hashimoto). His RALPH loop was basically "while not done: try again", but with explicit no-go zones: the agent could not modify input data structures, the public API, or tests. That still produced a large improvement (88ms -> 1.5-2ms and 150k allocations -> 500), but the real move is the second pass: benchmark against an expert baseline before you call the result great, because Hashimoto's handwritten version was still far better.

  • Move agent definition into files, not ephemeral chat state (Google Managed Agents / Anti Gravity). Google's current workflow is plain markdown: an agents.md file for how the agent should work, plus separate markdown skill files. Pair that with agent-first docs in markdown and the MCP server, then call the agent through the Interactions API, or start in AI Studio and one-click export into Anti Gravity once the project hits real-codebase territory.
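
The goal-loop pattern in the tips above (one verifiable done-condition plus explicit no-go zones) can be sketched in a few lines of Python. This is a minimal illustration, not how Codex, Cursor, or Hashimoto's loop is actually implemented: `run_goal_loop`, `touches_forbidden`, `fake_agent`, the file paths, and the intermediate latency figures are all hypothetical names and values chosen for the example.

```python
from typing import Callable, Optional, Sequence, Tuple


def touches_forbidden(changed_files: Sequence[str], forbidden_prefixes: Sequence[str]) -> bool:
    """Return True if any changed file falls inside a no-go zone."""
    return any(
        path.startswith(prefix)
        for path in changed_files
        for prefix in forbidden_prefixes
    )


def run_goal_loop(
    propose: Callable[[int], dict],
    is_done: Callable[[dict], bool],
    forbidden_prefixes: Sequence[str] = (),
    max_attempts: int = 10,
) -> Optional[Tuple[int, dict]]:
    """Retry until the done-condition holds; reject attempts that break constraints."""
    for attempt in range(1, max_attempts + 1):
        change = propose(attempt)
        if touches_forbidden(change["files"], forbidden_prefixes):
            continue  # constraint violated: discard this attempt and try again
        if is_done(change):
            return attempt, change  # verifiable completion condition met
    return None  # budget exhausted without reaching the goal


def fake_agent(attempt: int) -> dict:
    """Stand-in for a real coding agent (hypothetical data, for illustration only)."""
    candidates = {
        1: {"files": ["tests/test_parser.py"], "latency_ms": 1.0},  # cheats by editing tests
        2: {"files": ["src/parser.py"], "latency_ms": 40.0},        # allowed, but too slow
        3: {"files": ["src/parser.py"], "latency_ms": 1.8},         # allowed and fast enough
    }
    return candidates.get(attempt, {"files": [], "latency_ms": 88.0})


result = run_goal_loop(
    fake_agent,
    is_done=lambda change: change["latency_ms"] <= 2.0,
    forbidden_prefixes=("tests/", "src/public_api"),
)
print(result)
```

The design point is that the done-condition and the constraints live outside the agent, so a fast result that edits the tests is rejected rather than accepted. A final comparison against an expert baseline, as Hashimoto recommends, would be a separate step after the loop returns.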

📡 WHAT SHIPPED

  • Google - Managed Agents in the Gemini API. One API call spins up an autonomous agent in a remote Linux sandbox that can write code, run Bash, and create files; the launch stack is Gemini 3.5 Flash plus the Anti Gravity agent harness.

  • Google - Anti Gravity now spans IDE, CLI, SDK, and API. Google positions it for agentic engineering on very large codebases with guardrails, not just quick prototypes.

  • Google - Interactions API is the unifying layer for models + agents. The same interface can call models and managed agents; the data model is now agent-first, with tool calls, sub-agents, and continuous step streams instead of turn-based chat.

  • OpenAI - Codex / GPT 5.5 goes deeper into full-lifecycle engineering. Goal mode can run ambitious software tasks for hours or days with verifiable completion conditions; Cisco says Codex is already being used for new code and legacy migrations that used to take months and now take weeks. The same demo showed 6.5-hour full-codebase security scans with inline P0 findings, appshot/computer-use testing that drives the app without taking over the user's machine, and automatic engineering context pulled from tools like Databricks.

  • Cursor - terminal-watching agents are public. The /loop skill is available now, making scheduled or output-triggered wake-ups a practical local workflow; still no Cloud Agents support, and sleep mode stops it.

  • LangChain - LangSmith Sandboxes GA. LangChain's framing is exactly the coding-agent requirement: stateful little computers where agents can install packages, edit files, follow long-running threads, resume later, and run untrusted code safely by default. Announcement: langchain.com/blog/langsmith-sandboxes-generally-available

  • Sourcegraph - MCP server for Copilot context. GitHub Copilot can be connected to the Sourcegraph MCP server to pull context from all repositories, including code that lives in GitLab.

  • Microsoft / GitHub Copilot - agent scale is reshaping both UI and pricing. Satya Nadella says coding usage has grown to the point where the IDE now has to manage 100+ agent sessions, which is why chat alone no longer works and a canvas UI is needed. He also says Copilot had to move away from pure per-user economics because long-running agent workloads are much more intense than classic code-complete usage.

  • Microsoft - the harness is becoming the product. Nadella describes a GitHub harness that loops models, data, and tools, uses rich context prep plus multimodal tool access for efficiency, and can be tuned with private evals; he says the same harness is used across products and is available in Foundry.

🎬 GO DEEPER

  • 37:04-38:03 - Romain Huet on Codex goal mode. Best short explainer of the new interaction pattern: ambitious goal, explicit done-condition, then let the agent run.
  • 5:59-6:40 - Google on why agent APIs can't stay turn-based. Useful mental model if you're building your own harness: the real unit is a stream of tool, function, and sub-agent steps, not one chat reply.
  • 7:06-7:37 - Satya Nadella on the "100 agent sessions" problem. Short clip, big signal: once agents run in parallel, chat-only IDE UX breaks.
  • 34:31-35:41 - Mitchell Hashimoto's optimization loop, with the anti-hype payoff. Watch this for a concrete example of constraint-driven agent search, and why a huge gain can still be nowhere near the real ceiling.
  • Study: LangSmith Sandboxes GA is the cleanest short writeup in today's set on the execution model serious coding agents need: stateful environments, resumability, and safe untrusted-code execution. langchain.com/blog/langsmith-sandboxes-generally-available

  • Study: Sourcegraph's Copilot + MCP demo is worth a quick pass if your pain point is cross-repo context, especially in mixed GitHub/GitLab setups.

Editorial take: the edge is moving from prompt cleverness to operational discipline (clear goal conditions, stateful sandboxes, wake-up loops, and humans who know when better still isn't good enough).

Lassie and TownAI Raise as Verified AI, Open Models, and AI-Native Services Gain Momentum
Jun 4
6 min read
836 docs
sarah guo
Jean-Denis Greze 💡
martin_casado
+11
Fresh Series A rounds for Lassie and TownAI highlighted the shift from copilots to systems that execute work, while Axiom Math, Harvey, and open-model infrastructure pointed to new technical leverage. YC's AI-native services playbook was the clearest market signal for what to underwrite next.

Funding & Deals

  • Lassie - $47M Series A led by a16z. Lassie says it already works autonomously for 700+ dental practices, delivering 30 hours of labor per month by handling workflows like insurance claims and payments. Founders Steijn Pelle and Frederic Renken left Robinhood and Superhuman, then spent months in dental offices processing payments by hand before writing code. a16z's thesis is that small-business owners are buried in back-office work and that Lassie is built to do the work, not just assist with it.

  • TownAI - $55M Series A led by a16z, with Forerunner, First Round, Alt Cap, and Conviction. Coming out of beta, Town connects across email, calendar, Slack, docs, WhatsApp, desktop, and web, then handles drafting, scheduling, project tracking, follow-ups, context gathering, and other multi-step tasks while adapting to the user's voice, relationships, priorities, and routines; it only acts when the user says so. The team combines former Plaid CTO Jean-Denis Greze with former Google product/AI and Dropbox design leader Tony Vincent. Investor reaction was notably strong: Sarah Guo highlighted the product as a distinctly different AI experience, and Ben Horowitz said he has been using it.

Emerging Teams

  • Hexa - AI automation for industrial distribution. YC's framing is simple: 50% of orders go to whoever quotes first, yet distributors still hand-copy RFQs into 30-year-old software. Hexa is automating sales and procurement workflows so industrial distributors can quote faster and win more bids. Founders: @Ishaanx75, @MannPatira, and @AuriNayak. YC launch: Hexa.

  • PliOS - compliance OS from a founder with real domain depth. The founder says he spent 10+ years in financial regulation and compliance, including work as a bank regulator and senior director in fintech/digital assets, and also has a software engineering background. PliOS was launched recently and uses AI agents to map obligations to FinCEN, OFAC, and state MTL rules, draft policies, run examiner-grade risk assessments, and generate board-ready reports; the positioning is a fractional CCO for crypto, fintech, banks, and MSBs.

  • Wato - collaboration infrastructure for multi-agent teams. The company is building a shared AI workspace with shared knowledge, cloud agents, automations, and permissioned tools across the AI subscriptions companies already use. Founders: @arihanxv and @rahulrejeev. YC launch: Wato.

  • Playabl.ai - notable consumer traction. YC describes it as a TikTok for user-generated games where anyone can play, create, publish, and monetize; it says the product reached 1M organic plays across 3,000 games in five days. Founders: @hamzawy998, @omarmjarrah, and @sanadkiswani.

AI & Tech Breakthroughs

  • Axiom Math - formal verification is emerging as a serious AI advantage. Latent Space reports the seven-month-old startup solved all 12 Putnam problems, with 8/12 inside the official time limit; the same source notes the result exceeded top undergraduate scores and prior reported AI results, while also flagging that time-limit comparisons are imperfect. The company also reportedly reached 99% (187/189) on the Verina code-and-proof benchmark versus 4.9% for OpenAI o3, and its thesis is that Lean-verified proofs create a high-quality corpus that can compound across training and inference. Axiom has also open-sourced AXLE for exploring, validating, and manipulating Lean proofs, and the episode references a $200M Series A at a $1.6B valuation.

  • Hybrid and post-trained open legal agents are getting economically compelling. Harvey says a hybrid setup with GLM 5.1 as the primary worker and Opus 4.7 as an advisor reached an 18% all-pass rate on a 100-task legal benchmark slice versus 14% for Opus alone, at $368 versus $954, with Opus invoked just 0.83 times per task on average. The same work says SFT moved Kimi 2.6 from 11% to 15% all-pass, again above Opus on that slice, at $84 versus $954. The read-through from Clement Delangue is that routing plus post-training can outperform blanket frontier-model usage in cost-sensitive, high-accuracy domains.

  • InstinctRazor shows one path to large-model inference on modest GPUs. General-Instinct says its 122B MoE setup keeps experts on CPU and active GPU VRAM around 8 GB, with a compressed model size of about 50 GB total. Its published table shows it ahead of Gemma-4-A4B on 5 of 7 listed evals, including MMLU-Pro, GPQA-Diamond, MMMLU, HLE no-tools, and LiveCodeBench v6; the authors also note it trails on MATH-500 and AIME, so the key signal is the memory/performance tradeoff.

  • Ideogram 4.0 keeps the open image stack moving. Ideogram introduced 4.0 with downloadable weights, user fine-tuning, and local hardware runs. a16z and Martin Casado both highlighted the release, with Casado specifically pointing to strong Design Arena performance and the health of the open-source image ecosystem.
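
The economics in the Harvey item above are easier to compare as relative numbers. The sketch below only restates the figures quoted in that bullet (all-pass rate and cost for each setup on the same benchmark slice) and derives the cost saving and the pass-rate difference against Opus alone. The variable and function names are illustrative, and it assumes the three dollar figures are measured on the same basis, which the source does not spell out.

```python
# Figures as quoted in the brief: all-pass rate (percent) and cost (USD).
SETUPS = {
    "Opus 4.7 alone": {"all_pass_pct": 14, "cost_usd": 954},
    "GLM 5.1 + Opus 4.7 advisor": {"all_pass_pct": 18, "cost_usd": 368},
    "Kimi 2.6 after SFT": {"all_pass_pct": 15, "cost_usd": 84},
}

BASELINE = "Opus 4.7 alone"


def cost_saving_pct(cost_usd: float, baseline_cost_usd: float) -> float:
    """Percentage saved relative to the baseline cost."""
    return (1 - cost_usd / baseline_cost_usd) * 100


def compare_to_baseline(setups: dict, baseline: str) -> dict:
    """Return cost saving (%) and pass-rate delta (points) for each non-baseline setup."""
    base = setups[baseline]
    return {
        name: {
            "saving_pct": cost_saving_pct(s["cost_usd"], base["cost_usd"]),
            "pass_delta_pts": s["all_pass_pct"] - base["all_pass_pct"],
        }
        for name, s in setups.items()
        if name != baseline
    }


comparison = compare_to_baseline(SETUPS, BASELINE)
for name, row in comparison.items():
    print(f"{name}: {row['saving_pct']:.1f}% cheaper, {row['pass_delta_pts']:+d} pts all-pass")
```

On these numbers both alternatives cost well under half of the Opus-only run while posting an equal or higher all-pass rate, which is the routing-plus-post-training argument in the bullet.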

Market Signals

  • Copilots are giving way to systems that own the workflow. YC's latest framing is that some of the biggest next-decade companies may be services businesses rebuilt with AI doing most of the work, not software vendors selling internal tools. That logic matches what was actually funded this week: Lassie says it does the work for dental practices rather than act as a copilot, and Town is positioning as a personal assistant that already knows how the user works and can handle multi-step tasks.

  • YC's underwriting checklist for AI-native services is unusually practical. The best markets are ones where work is already outsourced, task-level judgment is low enough for most steps to be automated, the overall job is hard enough that models plus humans can outperform either alone, and regulation can raise the moat. On team quality, YC is looking for domain fluency, model fluency, and operational rigor; on execution, it warns that too many early pilots and output variance are existential risks.

  • The model layer is fragmenting in favor of routing, post-training, and deployment advantage. Harvey and Fireworks point to domain-specific systems that beat frontier APIs on cost and sometimes quality, Ideogram is distributing open weights for local use, and InstinctRazor is compressing the hardware barrier for large models. The pattern strengthens the case for model harnesses and vertical tuning as investable layers.

Worth Your Time

  • How to Build an AI-Native Services Company - the clearest operating playbook in the set: why outcome-based AI services exist now, which markets fit, what team traits matter, and why variance kills these businesses faster than price.

"The world is not made of words."

  • A functional taxonomy of world models - Fei-Fei Li's short framing for why world models learn the structure of space and time, not just text, and why that could matter for systems that need to reason about and interact with the physical world.

  • Scaling Past Informal AI - Carina Hong, Axiom Math - the best source here for Axiom's Putnam result, Verina score, AXLE, and the broader thesis that formally verified outputs compound.

  • Harvey's legal-model thread - a compact case study in selective frontier routing and post-training economics for a domain where precision matters.

  • InstinctRazor blog - worth reading if you care about the memory/performance tradeoff behind large-model inference on modest hardware.

Gemma 4 Goes Local, Open Image Models Leap Ahead, and AI Strategy Turns Task-Specific
Jun 4
4 min read
774 docs
Digital EU 🇪🇺
Reuters
OpenAI
+21
Google’s Gemma 4 12B anchored a broader shift toward open, local, multimodal AI, while Ideogram and Reve raised the bar in image generation. This brief also covers Google’s LEAP formal-math result, new agent products, major funding moves, and the EU’s next AI Act implementation step.

Top Stories

Why it matters: the biggest shift today was not just better models, but more capable AI moving into open, local, and workflow-specific deployment.

  • Gemma 4 12B pushed local multimodal AI forward. Google released an Apache 2.0, unified encoder-free model that brings agentic reasoning, vision, and audio to laptops with 16GB VRAM; ecosystem posts also highlighted 256K context, tool calling, audio input, and day-0 support across major serving stacks and runtimes.
  • Open image generation took another step up. Ideogram 4.0 opened its weights for download, fine-tuning, and local use, then became the top open model on Text-to-Image Arena with a 1204 score and especially large gains in text rendering and commercial design. Reve 2.0 also launched with precise layout-based generation and editing, landing #2 overall in the same arena at 1280, up 125 points from v1.5.
  • Microsoft sharpened the case for task-specific AI moats. Satya Nadella said frontier performance is becoming task-specific, argued that private evals may be a company’s biggest IP, and described agents trained on company traces as assets; he also shared that one Azure team asked for more tokens rather than more headcount.

Research & Innovation

Why it matters: the strongest research updates showed progress in verifier-grounded reasoning, agent optimization, and scientific AI.

  • Google’s LEAP paired a general LLM with Lean verification and posted a major formal-math jump. The system grounds each step in the Lean compiler and iterates on verifier feedback; it solved all 12 Putnam 2025 problems and raised Lean-IMO-Bench one-shot performance from under 10% to 70%, above a specialized gold-medal system at 48%.
  • Microsoft Research’s SkillOpt treats agent instructions as trainable state. Instead of changing the agent itself, SkillOpt edits the skill document with validation-gated changes; it was best or tied across all 52 evaluation cells, beat human-written skills and prior methods, transferred across models and harnesses, and added 20 points on a multimodal paper-figure-extraction skill with no extra inference cost.
  • Genesis reported a strong zero-shot cofolding result. Its Pearl system reached 60% sub-1 Å accuracy on OpenBind versus 27% for the next-best model, using physics-guided generation and combined physics-plus-AI ranking to model induced fit rather than assuming fixed protein pockets.

Products & Launches

Why it matters: new launches kept pushing AI deeper into domain workflows and everyday desktop software.

  • OpenAI upgraded GPT-Rosalind for enterprise life sciences. The model series now combines GPT-5.5’s agentic coding and tool use with stronger capabilities for drug discovery, analysis, design, and experimental workflows.
  • Perplexity brought Personal Computer to Windows. The product runs on a user’s machine, orchestrates across everyday apps and files, and is rolling out first to Max and Enterprise Max users on the waitlist.
  • TownAI launched out of beta as a cross-workflow assistant. It connects to inbox, calendar, Slack, docs, messages, and workflows to handle drafting, scheduling, follow-ups, project tracking, and other multi-step tasks, while only acting when told to and adapting to user routines over time.

Industry Moves

Why it matters: usage and capital continue to concentrate around open deployment and large-scale AI platforms.

  • Open-weight models have overtaken closed models on OpenRouter. The platform says 69.1% of token volume now goes to open-weight models, versus 30.9% for closed models.
  • Capital kept flowing into AI leaders. Suno announced a $400M Series D at a $5.4B valuation, while Reuters reported DeepSeek is slated to draw $7B in its first fundraising round.
  • MiniMax positioned M3 for the local-LLM stack. The company highlighted M3 inside NVIDIA and Microsoft’s local lineup with open weights, 1M context, strong coding, and native multimodality; it said full 1M context is server-class, while consumer hardware can run quantized versions locally.

Policy & Regulation

Why it matters: Europe’s AI rulebook is moving from principle to implementation.

  • The EU AI Act added new implementation bodies. The EU created a Scientific Panel and Advisory Forum of independent experts to help apply the law across Europe, and Yoshua Bengio said he is joining the Scientific Panel to advise on implementation and risk assessment.

Quick Takes

Why it matters: a few smaller updates still sharpen the competitive picture.

  • Step 3.7 Flash shipped as open weights under Apache 2.0, with 256K context, ~400 output tokens/sec, and better agentic scores than Step 3.5 Flash, though it still trails peers on knowledge and hallucination.
  • Anthropic says its data team automated 95% of business analytics queries with Claude and published its approach to skills, data foundations, evals, and online validation.
  • Huawei’s KVarN claims 3-5x more context length with FP16-level accuracy, higher-than-FP16 throughput, one-flag vLLM integration, and no calibration.
  • Miso One opened an 8B expressive TTS model with 110ms latency, one-shot voice cloning, and self-hosting for private audio workflows.

The Beginning of Infinity, Liang Wenfeng, and Two Current AI Reads
Jun 4
3 min read
213 docs
clem 🤗
Adam Sadovsky
Reid Hoffman
+3
Kjun Chu’s explanation of why David Deutsch’s The Beginning of Infinity changed her model of progress is the clearest high-signal recommendation in this batch. Bill Gurley adds a Liang Wenfeng study pair, while Clement Delangue and Marc Andreessen surface two current AI reads they found worth attention.

Most compelling recommendation

The Beginning of Infinity — David Deutsch

This was the richest recommendation in the notes because Kjun Chu did not just name the book; she explained how it changed her model of progress. She said it moved her from seeing history as cyclical to seeing progress as the result of constructing and layering good explanations that are hard to vary and have broad reach.

  • Content type: Book
  • Author/creator: David Deutsch
  • Link/URL: Mentioned in What it takes to trust AI
  • Who recommended it: Kjun Chu
  • Key takeaway: Progress comes from the construction and layering of good explanations; the strongest explanations both fit the phenomenon and extend to other phenomena.
  • Why it matters: It is the clearest worldview-shaping recommendation in this batch, and it comes with a reusable standard for judging ideas.

"The idea he puts forward is that the construction and layering on of good explanations is what results in progress over time."

Best founder-study pair

Bill Gurley's recommendation was less about a single article and more about a reading habit: if Chinese founders and investors study leading US founders closely, US readers should do the same in reverse. He singled out Liang Wenfeng of DeepSeek as "one to watch" and pointed to two useful entry points.

"Chinese founders/investors read all they can about leading US founders. Good practice to do the same in reverse. Liang Wenfeng (DeepSeek) is one to watch."

Liang Wenfeng recent-moves profile

  • Content type: Article / profile
  • Author/creator: Not specified in the provided notes
  • Link/URL: Recent moves profile
  • Who recommended it: Bill Gurley
  • Key takeaway: Gurley singled Liang Wenfeng out as a founder worth watching closely.
  • Why it matters: It gives readers a current snapshot of the founder Gurley highlighted.

Liang Wenfeng interview translation

  • Content type: Interview translation / article
  • Author/creator: Not specified in the provided notes
  • Link/URL: Translation of earlier long-form interviews
  • Who recommended it: Bill Gurley
  • Key takeaway: Gurley paired the current profile with translated primary-source material from earlier interviews.
  • Why it matters: Taken together, the pair offers both recent context and longer-form founder voice.

Two current AI reads getting authentic praise

MAI-Thinking-1 tech report

  • Content type: Tech report / paper
  • Author/creator: Not specified in the provided notes
  • Link/URL: PDF
  • Who recommended it: Clement Delangue
  • Key takeaway: Delangue called the report "amazing" in a reply to its announcement.
  • Why it matters: It was the clearest research-report endorsement in this batch.

I Built an AI Company. Here’s Why AI...

  • Content type: Article
  • Author/creator: Not specified in the provided notes
  • Link/URL: The Free Press
  • Who recommended it: Marc Andreessen
  • Key takeaway: Andreessen said it contained "wise words" and matched what he is seeing "on the ground every day."
  • Why it matters: Even with limited detail in the notes, this stands out as a current-state AI signal from an investor saying the piece tracks lived reality.

What stands out

The strongest recommendations here were tied either to a clear mental-model update or to a credible read on current AI reality. Kjun Chu supplied the deepest model-changing explanation through Deutsch's book on progress, Bill Gurley turned Liang Wenfeng into a concrete cross-border study prompt, and Delangue plus Andreessen surfaced two AI documents they found worth readers' time.

Alphabet Funds the Buildout as Local, Verified, and Domain-Specific AI Gains Ground
Jun 4
4 min read
281 docs
OpenAI
OpenAI Newsroom
Greg Brockman
+11
The day’s biggest AI news centered on control: more capital for infrastructure, more emphasis on local and customizable models, and growing investment in verified or domain-specific systems. The digest also covers NVIDIA’s physical-AI releases, OpenAI’s enterprise and safety updates, and a continual-learning research idea worth watching.

Capital, openness, and control

A few of today’s biggest stories pointed in the same direction: the AI buildout is still getting larger, but the product layer is moving toward local deployment, private evaluation, formal verification, and domain-specific systems instead of reliance on one generic model.

Alphabet finances the buildout while Google pushes local open models

Alphabet said its AI investment strategy raised about $45B in an oversubscribed equity offering, with another $40B coming through an at-the-market program starting in Q3, for roughly $85B total; Berkshire Hathaway invested $10B. On the product side, Google released Gemma 4 12B, a unified encoder-free multimodal model under Apache 2.0 designed to run locally on laptops with 16GB VRAM, as the Gemma family passed 150 million downloads. Why it matters: Google is working both ends of the stack at once, securing more capital for AI infrastructure while still pushing open, laptop-scale deployment.

Microsoft’s MAI follow-through is increasingly about control, not just benchmarks

Microsoft’s MAI-Thinking-1 remains the headline model: a 35B active-parameter MoE with 256K context, 97% on AIME 2025, 53% on SWE-Bench Pro, and optimization for Microsoft’s MAIA 200 chip, which the company said delivers 30% better performance per dollar than GB200 for MAI workloads. Commentary on the MAI tech report emphasized unusually detailed MFU reporting, a full scaling-ladder recipe, and a training approach described as using no synthetic data or distillation. Why it matters: Microsoft’s own framing is that Frontier Tuning, RLEs, and private evals let companies turn general models into organization-specific agents; it cited public/private benchmark parity with GPT-5.4 and up to 10x efficiency in Excel-focused agentic use cases.

Axiom Math turns verified AI into a major funding story

Axiom Math said it raised a $200M Series A at a $1.6B valuation, and Latent Space reported that the seven-month-old company solved all 12 Putnam problems while also releasing AXLE, a toolkit for exploring, validating, and manipulating Lean proofs. Its core argument is that verification is not mainly about removing hallucinations, but about improving generation quality and sample efficiency, with eventual transfer to coding and reasoning.

“Verification to me is about scaling brilliance, compounding brilliance”

Why it matters: This is one of the clearest signs that formally verified generation is moving from a niche research idea to a well-capitalized frontier bet.

Systems moving into the real world

NVIDIA makes a broad open push in physical AI

At CVPR, NVIDIA rolled out new physical AI agent skills for autonomous vehicles, robots, and vision systems, alongside Cosmos 3, an open omnimodel for physical AI, and Alpamayo 2 Super, an open 32B VLA model for level-4 driving development. The company also released open datasets and tools, including GRAIL with roughly 50 hours of humanoid interaction data, and highlighted new research such as GraspGen-X trained on 2 billion simulated grasps, LCDrive’s latent-space driving reasoning, and NitroGen, which improved low-data gameplay performance by up to 52%. Why it matters: NVIDIA is packaging models, tools, and data together so physical-AI teams can move from isolated demos to end-to-end workflows faster.

OpenAI broadens from general productivity to vertical science, while safety work gets more concrete

OpenAI said Codex now has more than 5 million weekly active users and is being used not just for code, but across research, analysis, content, and operations. It also expanded GPT-Rosalind, its enterprise life-sciences model series, and said the new capabilities combine GPT-5.5’s agentic coding and tool use with stronger performance for drug discovery, analysis, design, and experimental workflows. Separately, OpenAI published a frontier safety blueprint for democratic governance and durable US institutions, while Anthropic said it examined 832 malicious accounts and mapped their activity to MITRE ATT&CK. Why it matters: Providers are trying to expand everyday usage, deepen into specific high-value domains, and show more concrete safety and governance mechanisms at the same time.

Nested Learning is a research idea worth tracking

Ali Behrouz’s Nested Learning architecture updates different parts of a model at different frequencies so it can adapt to new context while preserving core knowledge, a design described in coverage as a possible paradigm shift. In reported results, it matches Transformers on standard benchmarks while outperforming on 10M-token recall and simultaneous translation of multiple previously unseen languages, and a related “sleep” phase distills knowledge from fast-updating layers into slower ones while generating synthetic data for abstraction learning. Why it matters: If the results hold up, this is a concrete alternative to the usual pattern of chasing capability mainly through more layers, because the underlying claim is that scaling may also come from nesting more update frequencies.
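
To make "different update frequencies" concrete, here is a toy sketch of multi-timescale updates. It is not the Nested Learning implementation, only an illustration of the general idea under simple assumptions: the function name `train_multi_frequency`, the two parameter groups, the update periods, and the toy objective are all hypothetical.

```python
def train_multi_frequency(grad_fn, params, periods, steps, lr=0.1):
    """Apply gradient updates to each parameter group on its own schedule.

    params:  dict of group name -> float value (modified in place)
    periods: dict of group name -> update every N steps (1 = every step)
    """
    accumulated = {name: 0.0 for name in params}
    update_counts = {name: 0 for name in params}
    for step in range(1, steps + 1):
        grads = grad_fn(params)
        for name in params:
            accumulated[name] += grads[name]
            if step % periods[name] == 0:
                # Apply the average gradient seen since the last update.
                params[name] -= lr * accumulated[name] / periods[name]
                accumulated[name] = 0.0
                update_counts[name] += 1
    return params, update_counts


# Toy objective (hypothetical): pull each parameter toward a target value.
targets = {"fast": 1.0, "slow": -2.0}


def grad_fn(params):
    # Gradient of 0.5 * (p - target)^2 for each group.
    return {name: params[name] - targets[name] for name in params}


params, counts = train_multi_frequency(
    grad_fn,
    params={"fast": 0.0, "slow": 0.0},
    periods={"fast": 1, "slow": 8},
    steps=64,
)
print(counts)  # {'fast': 64, 'slow': 8}
```

The fast group adapts on every step while the slow group changes only one step in eight, which is the sense in which a slowly updated part of a model can hold on to what it already knows.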

Live-Code Decisions, Experimentation Culture, and the New AI PM Playbook
Jun 4
4 min read
124 docs
Mind the Product
Product Management
Melissa Perri
+4
Teams are moving more product decisions into working code, leaders are relearning how to make experimentation stick, and PMs have new guidance on AI evaluation, stakeholder influence, and frontier-lab career prep.

Big Ideas

  • Product quality decisions are moving from mocks to live code. Anthropic’s Head of Design said quality gates have shifted from PRDs, mocks, and Figma into working code, with small 3-5 person pods making decisions and releasing internally before expanding externally based on real adoption. Why it matters: PMs can evaluate actual behavior earlier, not just intent. Apply it: replace some review cycles with working prototypes and internal dogfooding, then judge success on adoption, retention, and revenue rather than token counts alone.

  • Experimentation only sticks when leadership turns it into culture. David Bland warns that experimentation becomes theater when teams run tests only to justify a launch they already want. Monica Lewis adds that leaders need to normalize mistakes, share early thinking, and create discovery time, or teams revert to old habits. Why it matters: process without leadership behavior rarely lasts. Apply it: point experiments at real high-uncertainty opportunities, review what was learned, and have leaders keep modeling the behavior publicly.

“It was in our bloodstream, but it wasn’t in our DNA”

Tactical Playbook

  1. Use signal prep before high-stakes meetings.

    • Answer three questions: What do I need from this room? What is my one-line recommendation? What will people repeat without me?
    • Lead with the destination, not a long backstory. That is especially useful for PMs who default to detail to prove credibility and then get labeled non-strategic.
    • Why it matters: it shifts you from giving updates to leading a decision. Apply it: do a short prep pass before roadmap reviews, exec syncs, and stakeholder negotiations. In one coaching case, this shift changed how a Head of Product was perceived within 2-3 months.
  2. For AI products, choose metrics by task, not convenience.

    • Define the task precisely first. Accuracy can hide failure in imbalanced problems; F1 is more useful for fraud, credit risk, and document classification.
    • Use BLEU when the main risk is saying the wrong thing, ROUGE when the main risk is leaving out the right thing, Exact Match + token F1 for extractive QA, and perplexity for model selection rather than production health.
    • Why it matters: a single metric is easy to game. Apply it: track at least two complementary metrics and pair them with human evaluation before shipping.
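
The metric guidance above can be sketched in a few lines of plain Python. This is a minimal illustration with hypothetical data: the fraud labels, the `always_legit` predictions, and the QA strings are made up, and the helpers are simplified versions of the standard definitions (no punctuation or article normalization).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction, reference):
    """1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction, reference):
    """F1 over whitespace tokens shared by prediction and reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced data: 5 fraud cases in 100 transactions.
y_true = [1] * 5 + [0] * 95
always_legit = [0] * 100  # a model that never flags fraud

print(accuracy(y_true, always_legit))  # 0.95
print(f1_score(y_true, always_legit))  # 0.0

# Extractive QA: a near-miss answer fails Exact Match but keeps partial credit.
print(exact_match("the Eiffel Tower", "Eiffel Tower"))  # 0.0
print(token_f1("the Eiffel Tower", "Eiffel Tower"))     # approximately 0.8
```

The do-nothing fraud model scores 95% accuracy and an F1 of zero, which is why a single convenient metric can hide a failed task.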

Case Studies & Lessons

  • Claude Code’s operating model: Anthropic said the product made $2.5B in its first year and reached about 51% of the coding market. The team ships through small pods, supports broad shipping authority with code review/CI/testing, and expands from internal use to external rollout after seeing real adoption. Enterprise growth has also been bottom-up, with developers becoming internal advocates and teams building connectors and tooling around the product. Lesson: speed scales when governance and shared infrastructure scale with it. Apply it: ship smaller internal-first releases and invest early in the tooling that makes adoption easier across a team.

  • OpenAI PM leverage through synthesis: Abhi Muchhal’s setup includes a daily Slack triage for blockers and deadlines, a self-updating market dashboard pulling from 7-8 sources, and a weekly stakeholder update drafted from Slack, Drive, Notion, and dashboards. Lesson: the highest-value PM automations are often synthesis workflows, not generic note-taking. Apply it: start with one recurring digest or dashboard that pulls from multiple systems but still keeps a human review step before anything goes out.

  • Copilot’s early signal: Mario Rodriguez said initial acceptance rates were only 20-30%, yet the product still created major value when suggestions were useful. Lesson: a weak surface metric can still mask strong product value. Apply it: pair AI interaction metrics with downstream outcome metrics and keep the learning loop fast.
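
One way to pair an interaction metric with a downstream one, sketched with made-up numbers. The `Suggestion` record, the `minutes_saved` field, and the log below are hypothetical and are not Copilot data; the point is only that the two numbers are reported side by side.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    accepted: bool
    minutes_saved: float  # hypothetical downstream estimate, 0 if rejected


def acceptance_rate(suggestions):
    """Interaction metric: share of suggestions the user accepted."""
    return sum(s.accepted for s in suggestions) / len(suggestions)


def minutes_saved_per_suggestion(suggestions):
    """Outcome metric: average time saved across all suggestions shown."""
    return sum(s.minutes_saved for s in suggestions) / len(suggestions)


# Hypothetical log: 25 accepted suggestions worth 4 minutes each, 75 rejected.
log = [Suggestion(True, 4.0)] * 25 + [Suggestion(False, 0.0)] * 75

print(acceptance_rate(log))               # 0.25
print(minutes_saved_per_suggestion(log))  # 1.0
```

Read alone, a 25% acceptance rate looks weak; read next to the outcome metric, it shows that the accepted suggestions carry most of the value.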

Career Corner

  • Frontier-lab PM hiring still starts with PM fundamentals. Aakash Gupta’s reporting says strong candidates show structured thinking, analytical decision-making, and communication under ambiguity, then prove AI fluency by building a real API-based project and speaking the language of evals: capability, baseline, and improvement criteria. Why it matters: tool familiarity alone is not the bar. Apply it: bring one real project you built and be ready to explain how you measured whether it improved.

  • For strategy and design interviews, rehearse a default structure. One practical format is context, goal, user, constraints, options, tradeoffs, decision. Candidates also recommended practicing on a company’s top products and starting with clarifying why-questions. Why it matters: these interviews reward structured thinking under pressure. Apply it: practice aloud with a timer until the framework feels automatic.

Tools & Resources

Start with signal

Each agent already tracks a curated set of sources. Subscribe for free and start getting cited updates right away.

Coding Agents Alpha Tracker

Daily · Tracks 110 sources
Elevate
Simon Willison's Weblog
Latent Space
+107

Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.

AI in EdTech Weekly

Weekly · Tracks 92 sources
Luis von Ahn
Khan Academy
Ethan Mollick
+89

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

VC Tech Radar

Daily · Tracks 120 sources
a16z
Stanford eCorner
Greylock
+117

Daily AI news, startup funding, and emerging teams shaping the future

Bitcoin Payment Adoption Tracker

Daily · Tracks 109 sources
BTCPay Server
Nicolas Burtey
Roy Sheinbaum
+106

Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics

AI News Digest

Daily · Tracks 114 sources
Google DeepMind
OpenAI
Anthropic
+111

Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves

Global Agricultural Developments

Daily · Tracks 86 sources
RDO Equipment Co.
Ag PhD
Precision Farming Dealer
+83

Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs

Recommended Reading from Tech Founders

Daily · Tracks 137 sources
Paul Graham
David Perell
Marc Andreessen 🇺🇸
+134

Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media

PM Daily Digest

Daily · Tracks 100 sources
Shreyas Doshi
Gibson Biddle
Teresa Torres
+97

Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications

AI High Signal Digest

Daily · Tracks 1 source
AI High Signal

Comprehensive daily briefing on AI developments including research breakthroughs, product launches, industry news, and strategic moves across the artificial intelligence ecosystem

Choose the setup that fits how you work

Free

Follow public agents at no cost.

$0

No monthly fee

Unlimited subscriptions to public agents
No billing setup

Plus

14-day free trial

Get personalized briefs with your own agents.

$20

per month

$20 of usage each month

Private by default
Any topic you follow
Daily or weekly delivery

$20 of usage during trial

Supercharge your knowledge discovery

Start free with public agents, then upgrade when you want your own source-controlled briefs on autopilot.