Your intelligence agent for what matters
Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.
Your time, back
An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.
Save hours
AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.
Full control over the agent
Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.
Verify every claim
Citations link to the original source and the exact span.
Discover sources on autopilot
Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.
Multi-media sources
Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.
Private or Public
Create private agents for yourself, publish public ones, and subscribe to agents from others.
3 steps to your first brief
Describe your goal
Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.
Review and launch
Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.
Sam Altman
3Blue1Brown
Paul Graham
The Pragmatic Engineer
r/MachineLearning
Naval Ravikant
AI High Signal
Stratechery
Get your briefs
Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.
Clément Delangue
Sarah Guo
Elad Gil
Funding & Deals
Scout AI — $100M into defense VLA models. Scout AI raised $100M after a $50M seed round. The company is building AI models to operate military vehicles, with the pitch that vision-language-action models can improve precision and reduce collateral damage; it already has contracts with DARPA and the U.S. Army. CEO Colby Adcock is also noted as the brother of Figure AI founder Brett Adcock.
BMW i Ventures — $300M AI fund aimed at industrial transformation. BMW i Ventures launched a $300M third fund, bringing assets under management to $1.1B, with BMW AG as sole LP. The fund is focused on foundational AI that can reshape automotive and adjacent industries, not just auto-specific software; portfolio company Scenario is cited as adding AI agents to design and engineering workflows for faster iteration.
Baseten + Parsed — inference clouds are moving upstream into post-training. Baseten says it grew 30x over the last year and expects to exceed $1B in revenue this year, with 95%+ of tokens running on custom or post-trained models rather than vanilla open-source weights. Its acquisition of Parsed, a post-training startup and former customer, reflects demand for tighter integration between post-training expertise and inference infrastructure.
Emerging Teams
Sentient OS — deep on-device AI from a student founder. A UMass CS student says he spent about a year building a custom on-device vision LLM that processes screenshots, notes, files, and emails overnight while a device charges, enabling natural-language retrieval, proactive reminders, and knowledge graphs without sending data to the cloud. The founder says the stack required modifications to Apple’s MLX framework, vision components transplanted from a 4x larger model, and custom quantization work; it currently processes about 3,000 screenshots on a six-year-old iPhone.
Forum AI — expert-authored evals for high-stakes domains. Founded by former Meta head of news and journalist Campbell Brown, Forum AI evaluates foundation models on geopolitics, mental health, finance, and other nuanced areas by capturing elite experts’ reasoning and training LLM judges to about 90% consensus with them. Brown says rising bias-audit requirements in hiring and lending are creating demand, while existing audits miss more than half of violations.
Dognosis — unconventional diagnostics with unusually strong early data. Dognosis, founded by Akash Kulgod, uses dogs sniffing breath samples while EEG, sensor suits, and video convert canine judgments into signals for AI fusion. In a 3,275-participant Phase 2 study across six hospitals, it posted 90.8% sensitivity and 91.3% specificity across seven cancers, with 90.6% sensitivity at Stage I-II; next steps are rollout across Indian states and a U.S. study.
Readdit Later — small but real willingness-to-pay signal. The founder describes Readdit Later as a first product: a Chrome extension with an AI agent that searches saved Reddit posts in plain English and resurfaces relevant summaries. The founder reports 53 paying customers and $519 in revenue so far.
AI & Tech Breakthroughs
Recursive reasoning is challenging brute-force scaling assumptions. YC’s discussion of HRM and TRM highlights a 27M-parameter hierarchical model that reached about 70% on ARC Prize 1 from scratch on roughly 1,000 tasks with no pretraining, and a simplified 7M-parameter recursive model that improved to 87%. The key thesis is that recursion at inference time gives small models the compute depth to break through reasoning ceilings that standard LLMs hit.
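To make the recursion-at-inference idea concrete, here is a toy sketch of the general pattern: one small module applied repeatedly, so compute depth comes from iteration count rather than parameter count. This is illustrative only; it is not HRM or TRM, and all names and sizes are invented.

```python
import torch
import torch.nn as nn

class TinyRecursiveSolver(nn.Module):
    """Toy sketch: one small refinement cell applied repeatedly at
    inference time, trading parameter count for depth-in-time.
    Module names and sizes are illustrative, not from the papers."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.encode = nn.Linear(dim, dim)   # embed the task input x
        self.refine = nn.GRUCell(dim, dim)  # the same weights run every step
        self.decode = nn.Linear(dim, dim)   # read the current answer out

    def forward(self, x: torch.Tensor, steps: int = 16) -> torch.Tensor:
        z = torch.zeros_like(x)             # latent scratch state
        e = self.encode(x)
        for _ in range(steps):              # more steps = more reasoning depth
            z = self.refine(e, z)           # refine the latent given the input
        return self.decode(z)

solver = TinyRecursiveSolver()
answer = solver(torch.randn(4, 128), steps=32)  # batch of 4 toy tasks
```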
Waymo shows world models moving from research framing to scaled deployment. Waymo describes its foundation model as a multimodal world-action-language model spanning vision, lidar, and radar, and says it powers the driver, simulator, and critic. The company pairs end-to-end learning with structured intermediate representations for runtime validation, closed-loop training and evaluation, and RL rewards; it says the system has now powered more than 20M autonomous rides and is 13x safer than human drivers in serious-injury collisions in its operating cities.
Inference efficiency is still improving faster than many infrastructure models assume. An MIT pruning result cited on All-In claims networks can be reduced by 90% with no accuracy loss, enabling 10x lower inference cost and 10x more output per energy unit through dynamic small-model selection. Separately, Fei-Fei Li says inference costs have fallen about 280x in the last 2-3 years through distillation, quantization, and the shift from 32-bit to 4-8-bit GPU computation.
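The pruning technique itself is easy to demo in miniature. Below is a hedged sketch using PyTorch's built-in L1 unstructured magnitude pruning (the generic method, not necessarily what the cited paper did). Note that zeroing 90% of weights only yields cost and energy savings if the serving stack actually exploits the sparsity; dense matmuls over zeroed weights run at the same speed.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model; the cited claim concerns far larger networks.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Zero out the 90% of weights with smallest magnitude in each layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # bake the mask into the tensor

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"{zeros / total:.0%} of parameters are now zero")  # just under 90% (biases kept)
```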
Agent infrastructure is shifting from framework abstractions to context and action layers. Jerry Liu argues the durable moat for agents is now the context layer—especially turning complex documents into clean, usable context—rather than the developer abstractions that mattered in 2023. In parallel, Harrison Chase points to web-browsing agents as a key next step, and Browserbase’s DeepAgents integration now exposes search, fetch, and browser subagents with full observability.
"What survives is the data layer because agents are only as good as the context they get, and the best context in any enterprise is still locked in PDFs, contracts, and filings."
Market Signals
Custom models and open weights are becoming the production default, not the fallback. Bindu Reddy says Kimi 2.6 and GLM 5.1 are already very close to closed models on performance, with speed as the remaining gap, and says Abacus.AI is moving batch jobs to open source because closed APIs are too expensive. Baseten says open-source capability has crossed a chasm, 95%+ of tokens on its platform now come from custom/post-trained models, and customers are not running vanilla weights. Baseten also argues DeepSeek-class models can run at about 20% of the cost of frontier closed APIs with better latency and reliability, while Hugging Face expects future workloads to skew heavily toward open, specialized, and local models.
Power and capacity constraints remain the main governor on model supply. All-In argues the market is power-constrained, with less than half of announced gigawatt-scale projects actually under construction, and says hyperscalers alone are guiding to $725B in 2026 capex. On the operator side, Baseten reports mid-90s utilization across 90 clusters in 18 clouds, and says large GPU allocations now often require 3-5 year contracts with 20-30% prepaid TCV.
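As a purely hypothetical illustration of what those contract terms mean in cash (all numbers invented for the example), prepaying 20-30% of total contract value on a multi-year GPU allocation is a large day-one outlay:

```python
# Hypothetical GPU allocation: $12M/year committed for 4 years.
annual_commit = 12_000_000
years = 4
tcv = annual_commit * years   # total contract value: $48M
prepaid = tcv * 0.25          # 25% prepaid, the middle of the 20-30% range
print(f"TCV ${tcv / 1e6:.0f}M, due up front: ${prepaid / 1e6:.0f}M")
# => TCV $48M, due up front: $12M, i.e. a full year of spend on day one
```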
Seed financing is concentrating into fewer, larger bets. Crunchbase data cited by Newcomer says seed rounds of $10M+ absorbed more than half of all seed-stage capital last year, even as overall seed deal count continued to decline. Harry Stebbings amplified the related view that early-stage venture is shifting toward fewer but much bigger winners, increasing the need for more shots on goal.
AI is widening the gap between infrastructure winners and vulnerable application SaaS. SaaStr’s read is that infrastructure vendors such as Twilio, Cloudflare, and Snowflake are re-accelerating because every AI-native startup needs network, data, or voice layers, while per-seat application SaaS is under pressure unless it genuinely rebuilds around AI. Twilio added 43,000 net new accounts in Q1, voice revenue grew 20% on AI-agent workloads, and software add-ons grew more than 100% YoY. At the same time, SaaStr flags GitHub Copilot, Cursor, and other agents as threats to seat-based developer tools, while Harry Stebbings argues the longer-term risk is that agents choose the vendors and models for workflows. Replit is a counterexample: it says it now reaches 85% of the Fortune 500, sees very low enterprise churn with roughly 300% net retention in some cases, and differentiates via vertically integrated hosting and security plus a multi-model “society of models.”
Worth Your Time
- Recursion Is The Next Scaling Law in AI — the clearest explainer in the set for why 27M and 7M recursive models are beating far larger systems; timestamps: 04:22 for HRM, 09:46 for TRM, 20:46 for comparison.
- Waymo’s Dmitri Dolgov: 20 Million Rides and the Road to Full Autonomy — useful if you want one conversation that connects world models, deployment architecture, and live safety data.
- Campbell Brown on Founding Forum AI — strongest discussion here on nuanced evals, bias audits, and why current compliance workflows are failing high-stakes use cases.
- Jerry Liu on the context layer for agents — concise thread on why document infrastructure and data access may be the durable moat in agent stacks.
- Atlassian and Twilio Crush the Quarter, Accelerate. Is the SaaSpocalypse Over? — best single read in the set on infra re-acceleration versus seat-based SaaS pressure from agents.
Alexander Embiricos
OpenAI Developers
Cognition
🔥 TOP SIGNAL
Today’s real edge is persistent agent workflow design, not another model leaderboard. Alexander Embiricos says OpenAI growth teammate Sahil Punamia’s internal "Lord Bottleneck" workflow started as separate Codex-assisted steps and became a daily loop that reviews past experiments, proposes new ones, generates code/config after the team picks, and runs again. Karpathy makes the matching interface point from the tooling side: specify the outcome, let the agent adapt to the local machine, and let it debug setup in the loop.
⚡ TRY THIS
Turn repeated work into a morning agent loop. Embiricos’ "Lord Bottleneck" pattern is straightforward: start by using Codex on each subtask separately—data analysis, experiment ideation, code generation, running the experiment, results analysis, deck writing—then stitch those steps into one reusable skill, then tell it to run every morning. The durable pattern is the important part: don’t start with full automation; chain together the steps that already work.
Write install docs as a prompt, not a shell script. Karpathy’s OpenClaw example: instead of shipping a giant cross-platform installer, publish a copy-paste prompt that tells the agent the desired outcome and available tools; the agent can inspect the environment, handle platform differences, and debug setup itself. For Here Now, the whole install flow was effectively:
"I'd like you to set up here now the web hosting and cloud storage service for agents install as a skill if I have npm and if not, do this instead."
If you build dev tools, Karpathy’s broader complaint is worth taking literally: docs should answer "what is the thing I should copy paste to my agent?"
Prototype in two stages: data pipeline first, UI prompt second. Simon Willison built an iNaturalist viewer entirely on his phone with Claude Code for web: first he created a small Python CLI to fetch and "clump" observations; then he ran that in a git-scraping repo to emit `clumps.json`; only then did he prompt for the frontend. His exact UI prompt was:
"Build inat-sightings.html - an app that does a fetch() against https://raw.githubusercontent.com/simonw/inaturalist-clumps/refs/heads/main/clumps.json and then displays all of the observations on one page using the https://static.inaturalist.org/photos/538073008/small.jpg small.jpg URLs for the thumbnails - with loading=lazy - but when a thumbnail is clicked showing the large.jpg in an HTML modal. Both small and large should include the common species names if available"
If you script Claude Code, watch your recent commit messages. Theo highlighted Claude Code’s programmatic `-p` prompt mode and appended system prompts for automation, but warned that recent commit history mentioning tools like OpenClaw or Hermes MD could trigger refusals or extra billing, even in an empty repo, based on his demo. His practical warning was simple: be careful what you put in commit messages when using Claude Code.
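For anyone scripting that pattern, here is a minimal sketch of driving Claude Code non-interactively from Python. The `-p` print mode and `--append-system-prompt` flag match their documented behavior at the time of writing, but verify them against your installed version:

```python
import subprocess

# Run Claude Code non-interactively: -p prints the result and exits;
# --append-system-prompt layers automation instructions on top of the
# default system prompt. Verify flags against your installed version.
result = subprocess.run(
    [
        "claude", "-p", "Summarize the failing tests in this repo",
        "--append-system-prompt", "Respond with a terse bullet list only.",
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
# Per Theo's warning: the agent can read recent git history in the
# working directory, so scrub commit messages before automating.
```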
📡 WHAT SHIPPED
- Crabbox 0.1.0 — Peter Steinberger’s answer to "too many agents, too many test suites": remote Linux test boxes on AWS/Hetzner, dirty checkout sync, warm boxes with friendly slugs, and idle auto-free. Install with `brew install openclaw/tap/crabbox`. Site: crabbox.sh
- Codex import flow — OpenAI added migration from other agents: import projects, settings, plugins, agents, and project configuration "in just a few clicks," with Romain Huet framing it as a seamless move to Codex
- Codex pets — new `/pet` feature for persistent context. Tibo says it makes him more productive because the context follows him while multitasking, and Riley Brown says his agent could still write to a pet notebook created four days earlier
- Devin inside the shell — Cognition added shell integration: run `devin shell setup`, hit `Ctrl+G`, and Devin can see the current terminal screen to help in place
- `llm-openai-via-codex 0.1a0` — Simon Willison released a plugin that reuses Codex CLI credentials for API calls via `llm`. Release: llm-openai-via-codex 0.1a0
- GPT-5.5 migration guidance for coding agents — OpenAI says Codex and the main model line are unified, suggests treating GPT-5.5 as a fresh model family, and provides a Codex-side migration path with `$ openai-docs migrate this project to gpt-5.5`; for multi-step tasks, the agent should send a short user-visible update before tool calls
🎬 GO DEEPER
- 21:35-23:27 — Karpathy on "vibe coding" vs "agentic engineering." Best short framing in today’s source set: vibe coding raises the floor, agentic engineering raises the ceiling, and the job becomes coordinating fallible agents without dropping the quality bar.
- 20:51-24:14 — Theo on why commit history can leak into Claude Code behavior. Worth watching if you wrap Claude Code in harnesses or scripts: his demo argues recent git history gets surfaced in a way that can affect behavior and billing.
- Study the tiny-but-complete prototype chain. Simon’s simonw/inaturalist-clumper plus simonw/inaturalist-clumps is a clean pattern: build a small data-collection CLI, turn it into continually refreshed JSON, then prompt the agent for the UI on top.
- Read the release surfaces, not just the screenshots. If you want inspectable details, start with Codex CLI 0.128.0 release notes, LLM 0.32a0 changelog, and crabbox.sh
Editorial take: the teams pulling ahead are not just picking better models; they’re turning agents into repeatable loops with durable context and infrastructure that can survive real work.
Artificial Analysis
ARC Prize
Matthew Lam
Top Stories
Why it matters: The biggest signals today were about hidden model risk, fast commercialization, and AI moving into more sensitive environments.
- Anthropic’s subliminal learning paper raises a new distillation safety problem. Anthropic and collaborators reported that student models can inherit traits, including misalignment, from teacher-generated synthetic data even when the data contains no explicit semantic reference to the trait and has been filtered for clean content. The transfer was also reported as architecture-specific: GPT-to-GPT worked, while GPT-to-Claude did not.
- OpenAI says GPT-5.5 is its strongest launch yet. One week after release, OpenAI said API revenue is growing more than 2x faster than any prior launch, while Codex doubled revenue in under seven days; separately, GPT-5.5, Codex, and Managed Agents were brought to Amazon Bedrock in limited preview.
- Frontier AI is moving onto classified networks. The DeptofWar CTO account said the department signed agreements with SpaceX, OpenAI, Google, NVIDIA, Reflection, Microsoft, and AWS to deploy frontier capabilities on classified networks, framing the effort as part of an AI-first war department mandate.
Research & Innovation
Why it matters: The most useful research updates targeted coordination, long-horizon training data, and improving model behavior earlier in the pipeline.
- RecursiveMAS replaces agent-to-agent text chatter with latent-state transfer. The paper introduces a RecursiveLink module and shared credit assignment across heterogeneous agents; across nine benchmarks, it reported an 8.3% average accuracy gain, 1.2x-2.4x inference speedups, and 34.6%-75.6% lower token usage.
- Microsoft Research built 1,000 synthetic computers for training computer-use agents. Each simulated workflow averaged more than 8 hours of agent runtime and 2,000+ turns, and the team said training on this data improved both in-domain and out-of-domain productivity while scaling to millions or billions of synthetic worlds.
- Meta FAIR showed a way to push safety and factuality into pretraining itself. Using a strong post-trained model as both rewriter and judge during pretraining, the method reported 36.2% relative gains in factuality, 18.5% in safety, and up to 86.3% better generation quality than standard pretraining.
Products & Launches
Why it matters: Product releases are increasingly about agent workflow quality, local inference, and turning AI into routine software behavior.
- Codex added a more goal-oriented workflow. The new `/goal` command sets a persistent objective, nudges the model toward the next concrete action after each turn, and maps requirements to evidence; OpenAI also added one-click workflow import for settings, plugins, agents, and project configuration.
- Moondream shipped Photon 1.2.0 for edge vision inference. The release adds Apple Silicon, native Windows CUDA, Blackwell, and Jetson Thor support; the team also described custom Metal kernels and a fused token-sampling path that cut one step from 687µs to 130µs, while arguing local vision can beat cloud wall-clock latency by avoiding large image uploads.
- Google added agentic restaurant booking to Search and Maps. Users can describe constraints like group size, vibe, time, and dietary preferences, after which AI Mode or Ask Maps searches multiple reservation sources and returns options with booking links via partners such as OpenTable and Resy.
Industry Moves
Why it matters: Corporate strategy is shifting from model releases alone to robotics, internal automation, and data-layer bets.
- Meta pulled ARI into Meta Superintelligence Labs. ARI said it is joining MSL to build general-purpose humanoid intelligence and argued that scaling will come from learning directly from human experience, not teleoperation alone.
- Ramp says coding agents are now doing most of the merge work. The company said its in-house agent Inspect now writes about 70% of merged PRs, up from 30% when first shared; one team reported its Cloud Agent accounted for 80.3% of work/PRs over the last 14 days, helped by Slack-triggered workflows.
- Hightouch raised $150M at a $2.75B valuation. The company said it is building an AI platform for marketers, with commentary around the round emphasizing that marketing AI depends heavily on access to the right data foundations.
Policy & Regulation
Why it matters: Governments are starting to shape AI through both labor protections and direct industrial policy.
- Chinese courts ruled companies cannot fire workers simply to replace them with AI. In Hangzhou, a tech company’s reassignment and pay-cut strategy tied to automation was deemed illegal termination.
- Hangzhou enacted what it calls China’s first local regulation for embodied intelligent robots. The law defines the category, directs R&D support toward motion control, core components, and domestic chips, and requires public agencies to open application scenarios.
Quick Takes
Why it matters: A few smaller updates still sharpen the picture on capability, infrastructure, and open-model economics.
- ARC-AGI-3 remains extremely hard: GPT-5.5 scored 0.43% and Opus 4.7 scored 0.18%, with ARC Prize identifying three recurring failure modes.
- Azure says hosted OpenAI models now have 10x better latency and throughput, and one external monitor later reported Azure faster than OpenAI directly for GPT-5.5.
- Open-weight leaders are still closing the gap: Artificial Analysis said Kimi K2.6 and MiMo V2.5 Pro tied at 54 on its Intelligence Index, within 3-6 points of top proprietary models and at half to one-sixth the price.
- NVIDIA Research says speculative decoding can ease RL rollout bottlenecks, with 1.8x higher throughput at 8B and a projected 2.5x end-to-end speedup at 235B; a toy sketch of the technique follows below.
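For readers new to the technique, here is a toy sketch of the draft-and-verify core of speculative decoding, in a greedy variant for clarity. `draft_next` and `target_next` are stand-in functions mapping a token list to the next token id; NVIDIA's RL-rollout integration is more involved.

```python
def speculative_step(draft_next, target_next, ctx, k=4):
    """One greedy speculative-decoding step (toy version): a cheap
    draft model proposes k tokens, the expensive target model checks
    them, and we keep the longest agreeing prefix."""
    proposed = []
    for _ in range(k):                    # cheap sequential drafting
        proposed.append(draft_next(ctx + proposed))
    accepted = []
    for tok in proposed:                  # in practice: one batched target pass
        verified = target_next(ctx + accepted)
        if verified != tok:               # first disagreement: keep the
            accepted.append(verified)     # target's token and stop
            break
        accepted.append(tok)
    else:
        accepted.append(target_next(ctx + accepted))  # bonus token if all match
    return ctx + accepted                 # 1..k+1 new tokens per step
```

The speedup comes from the verification: a real implementation checks all k drafted positions in a single batched forward pass of the target model, so each expensive pass can yield up to k+1 tokens instead of one.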
David Sacks
Sam Altman
Clément Delangue
What stood out
There was no exact title overlap in today’s set, so the signal came from how specifically each person explained the value. The strongest picks either offered a usable operating principle for builders, a concrete way to think about AI constraints, or a better model of people and history.
Start here
The Myth of Sisyphus
- Content type: Book
- Author/creator: Albert Camus
- Link/URL: No direct book URL was provided; source context: Will Everyone Become an AI Builder? Clem Delangue on Hugging Face, Agents, Local AI & Robotics
- Who recommended it: Clem Delangue
- Key takeaway: Delangue uses Camus’s Sisyphus as a founder metaphor: the durable move is to enjoy the task of building itself rather than fixate on the end state, especially when AI’s pace makes people feel nervous, stressed, or overwhelmed
- Why it matters: This was the most compelling recommendation in today’s set because Delangue turned a philosophical work into a practical operating mindset for builders trying to stay creative and relevant under constant AI pressure
“adopting more of a mindset of just enjoying the task, enjoying the journey, the work is useful. And having fun so they can be creative.”
Two AI-era resources with concrete operating value
2026 Global Intelligence Crisis
- Content type: Research report
- Author/creator: Citadel Securities
- Link/URL: https://www.citadelsecurities.com/news-and-insights/2026-global-intelligence-crisis/
- Who recommended it: David Sacks
- Key takeaway: Sacks highlighted the report as a rebuttal to AI displacement narratives, pointing to rising software engineer job postings, continued acceleration to 18% above the prior inflection point, and expanding new business formation
- Why it matters: If you want one recommendation today that pushes back on ambient AI pessimism with labor and business-formation data, this was the clearest save
MIT paper on pruning techniques in neural networks
- Content type: Research paper
- Author/creator: MIT researchers; names were not specified in the source materials
- Link/URL: No direct paper URL was provided; source context: OpenAI Misses Targets, Codex vs Claude, Elon vs Sam Trial, Big Hyperscaler Beats, Peptide Craze
- Who recommended it: David Friedberg
- Key takeaway: Friedberg said the paper showed large models could be pruned by 90% with the same accuracy, enabling about 10x lower inference cost and about 10x more output per unit of energy
- Why it matters: Among today’s recommendations, this was the clearest pointer to an algorithmic path around compute and power constraints rather than simply asking for more infrastructure
Resources for understanding people and history
Human Universals (title as recalled by Sam Altman)
- Content type: Book
- Author/creator: Anthropologists; specific names were not provided in the source materials
- Link/URL: No direct book URL was provided; source context: Sam Altman's Vision For the Future!
- Who recommended it: Sam Altman
- Key takeaway: Altman described a book that tried to identify truly universal human traits by removing anything absent from even one culture; he said some results, like valuing travel, were not obvious to him in advance
- Why it matters: It is a useful recommendation for separating what may be broadly human from what is more culturally contingent
Archival issues of Soft Talk, Wired, Spy, and The New Yorker
- Content type: Magazine archives / longform articles
- Author/creator: Multiple publications
- Link/URL: No direct archive URLs were provided; source context: VirtualElena post and Marc Andreessen’s co-sign
- Who recommended it: Marc Andreessen, by explicitly co-signing VirtualElena’s recommendation
- Key takeaway: The recommendation is to mine 80s/90s longform because it contains “unparalleled” and “largely un-mined” alpha, and because reading the past deeply is presented as the best way to understand the present
- Why it matters: This stood out less as a single title than as a learning method: use archival primary material, not just current commentary, to sharpen judgment about today’s tech world
One clean article endorsement worth bookmarking
Have online worlds become the last free places for children?
- Content type: Article
- Author/creator: Not specified in the source materials
- Link/URL: https://psyche.co/ideas/have-online-worlds-become-the-last-free-places-for-children
- Who recommended it: Marc Andreessen
- Key takeaway: Andreessen called it “important, and obviously true”
- Why it matters: He added almost no extra exposition, but the clarity of the endorsement makes it a clean save for readers tracking what prominent tech investors think is worth reading about online life and childhood
Bottom line
If you only save one item today, save The Myth of Sisyphus. It had the clearest explanation of why the resource matters right now, and it translated directly into an operating principle for founders building through AI-driven volatility.
Sarah Guo
Yann LeCun
Elad Gil
What stood out
Today’s signal was a little more sober than a normal launch cycle. OpenAI and xAI both posted strong commercial or price-performance claims, but the deeper story was about inference economics, open-model pressure, and fresh evidence that generalization and agent safety remain unresolved.
Frontier competition is getting measured in economics, not just demos
OpenAI says GPT-5.5 is its strongest launch yet
OpenAI said GPT-5.5 has become its strongest model launch one week after release, with API revenue growing more than 2x faster than any prior release. It also said Codex doubled revenue in under seven days, which it attributed to rising enterprise demand for agentic coding tools.
Why it matters: That commercial signal matches a broader pattern: coding agents are one of the clearest areas where AI demand is showing up quickly in real usage and revenue.
xAI pushes Grok 4.3 on price-performance and distribution
Artificial Analysis said Grok 4.3 now sits on the intelligence-versus-cost Pareto frontier, helped by 37.5% lower input pricing, 58.3% lower output pricing, and a roughly 20% lower evaluation cost than the prior version. Separate posts amplified claims that Grok 4.3 ranks #1 in caselaw, corporate finance, and law at 5-10x lower cost per 1M tokens than Opus 4.7 and OpenAI 5.5, and the model is already being distributed through Vercel’s AI Gateway with improved tool calling and instruction following.
Why it matters: The competitive pitch is increasingly explicit: better domain performance, lower inference cost, and faster placement into developer platforms.
The center of gravity keeps moving toward inference and enterprise deployment
Baseten says the real action is in custom models and scarce capacity
Baseten said it grew 30x year over year and expects to exceed $1B in revenue this year, with 95%+ of served tokens now coming from custom or post-trained models rather than vanilla open-source weights. It also described a severe capacity crunch across 90 clusters in 18 clouds running at mid-90s utilization, and said enterprise adoption is still early, with roughly 1% of the market online by inference count. Big Technology, separately, said enterprise AI applications are taking off while mainstream consumer breakout hits beyond ChatGPT still have not appeared, and chatbot daily active users have been flat or down in four of the past five months.
Why it matters: Cheaper inference is not reducing demand. Fei-Fei Li said Stanford HAI measured a roughly 280-fold drop in inference costs over the past 2-3 years, while Baseten said lower prices simply let customers run longer agents and embed more intelligence into products.
DeepSeek V4 and Qwen3.6 push the cost-and-locality story forward
DeepSeek V4 was described as near state-of-the-art across several benchmarks, with a 1M-token context window and pricing below GPT-5.5, Claude Opus 4.7, and Gemini 3.1 levels. Alibaba’s Qwen3.6-35B-A35, meanwhile, was summarized as a 35B-parameter MoE model with only 3B active parameters at inference, 73.4% on SWE-bench Verified, 262K native context expandable to 1M, Apache 2.0 licensing, and laptop-scale deployment claims.
Why it matters: Open-model competition is no longer just about catching up on benchmarks; it is also widening the range of cheap, private, and local deployment options.
Research kept providing a reality check
ARC-AGI 3 scores remain near zero for frontier models
ARC-AGI 3 scores cited this week remained extremely low: GPT-5.5 at 0.43%, Claude 4.6 at 0.45%, Gemini 3.1 at 0.4%, and Opus 4.7 at 0.18%. ARC Prize’s analysis of GPT-5.5 highlighted three failure modes: 'true local effect, false world model,' 'wrong level of abstraction from training data,' and 'solved the level, didn’t reinforce the reward.'
"RL is a bit of a double edged sword: in known territory performance increases, but in unknown territory the model tends to hallucinate that it is performing a completely different task it was trained on"
Why it matters: Product progress is real, but abstract generalization remains a very different problem from strong commercial launch metrics.
World models moved closer to the center of frontier research
In a public debate, Eric Xing presented GLP, PAN, and SLAM as a generative, stateful path toward world models and agent planning, including claims of stronger simulation reasoning and smaller-model planning performance against larger baselines. Yann LeCun argued for the opposite architectural instinct: non-generative JEPA-style world models that predict in latent space, ignore unpredictable detail, and support planning through abstraction; he also pointed to a released V-JEPA world model for robotics and simulations.
Why it matters: Even with major architectural disagreement, both sides are treating world models as essential for agentic AI beyond text-only book intelligence.
Agent deployment is colliding with governance
Tooling ecosystems are getting riskier as enterprises add more agents
PolicyLayer’s audit of 1,787 public MCP servers and 25,329 tools found that 40% of servers expose at least one destructive or command-executing tool, and that a typical five-server install has a 92% chance of including at least one risky tool. It also found 96.8% of tool descriptions lacked warning language, 47% of financial servers exposed destructive tools, and even 'official' registry servers carried the highest average risk weight.
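The 92% figure is consistent with treating each server as an independent draw against the 40% base rate, which is a quick check:

```python
# P(at least one risky tool in a 5-server install), assuming each
# server independently has a 40% chance of exposing a risky tool.
p_risky = 0.40
servers = 5
p_at_least_one = 1 - (1 - p_risky) ** servers
print(f"{p_at_least_one:.1%}")  # 92.2%, matching the audit's figure
```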
At the same time, Microsoft said Agent 365 is now generally available, extending identity, security, governance, and management controls to AI agents and their interactions across the enterprise.
Why it matters: As agents gain access to more tools and workflows, governance is starting to look like a deployment prerequisite rather than a later compliance layer.
Lenny Rachitsky
Shreyas Doshi
andrew chen
Big Ideas
1) Deciding what to build is becoming the bottleneck
"when anyone can build, the person who decides WHAT to build becomes the bottleneck"
Andrew Chen says he is bullish on the PM role quietly becoming the most important role in tech again, and Lenny Rachitsky agreed.
- Why it matters: If building is easier to access, the choice of what to build becomes more consequential.
- How to apply: Treat deciding what to build as the core constraint in the role, not an afterthought to execution.
2) Consumer product strengths transfer to B2B only when paired with domain depth
Shreyas Doshi argues that product people with deep consumer experience plus user empathy and creativity often do very well in B2B, as long as they commit to acquiring deep domain expertise. He adds that AI is making it easier to acquire and leverage domain expertise, but PMs still need to appreciate its importance.
- Why it matters: Consumer instincts and creativity are portable; domain knowledge is not automatically portable.
- How to apply: If you are moving into B2B, make domain learning explicit and use AI to acquire and leverage team expertise rather than trying to bypass it.
3) Emotional churn is a B2B risk that healthy dashboards can miss
"Emotional Churn: when users are psychologically checked out but still in contract"
Run the Business describes emotional churn as the silent killer of B2B products. A key signal is that dashboards can look healthy while customers are already shopping for alternatives.
- Why it matters: Contracted revenue and surface-level product health can hide weakening customer commitment.
- How to apply: Look for poor onboarding, workflow friction, and integration gaps, then fix around faster time-to-value and re-engagement.
Tactical Playbook
1) Run an emotional-churn review before renewals surprise you
- Monitor core flows by cohort instead of relying only on top-line health metrics.
- Watch feature adoption for signs that engagement is thinning out.
- Treat silence as a signal; no feedback can be a warning sign.
- Investigate root causes such as poor onboarding, workflow friction, and integration gaps.
- Fix around time-to-value: re-onboard disengaged users, empower power users, and show customers you are listening.
Why this works: Emotional churn often appears before contractual churn, while standard dashboards still look fine. A minimal cohort check of this kind is sketched below.
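Here is a minimal sketch of that cohort check in pandas; the file name, column names, and the 20% threshold are all illustrative assumptions, not from the source:

```python
import pandas as pd

# Assumed input: one row per account per week with a core-flow event count.
usage = pd.read_csv("core_flow_events.csv")  # columns: account_id, week, events

# Compare each account's recent activity to its own earlier baseline.
pivot = usage.pivot_table(index="account_id", columns="week",
                          values="events", aggfunc="sum").fillna(0)
baseline = pivot.iloc[:, :4].mean(axis=1)    # average of the first four weeks
recent = pivot.iloc[:, -4:].mean(axis=1)     # average of the last four weeks

# Flag accounts whose core-flow usage fell below 20% of their own baseline:
# still under contract, but behaviorally checked out.
emotional_churn_risk = pivot[recent < 0.2 * baseline].index.tolist()
print(f"{len(emotional_churn_risk)} accounts look checked out")
```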
2) Audit your real prioritization process, not just the documented one
- Ask the hard questions earlier; one PM/founder says many failures had the same shape because those questions came too late or not at all.
- Put the documented process and the real process side by side.
- Check whether decisions are actually being driven by HiPPOs, the biggest customer, or the CEO's latest mention rather than the official framework.
- Watch for the opposite failure mode too: good ideas can get strangled in process before they get a chance to prove themselves.
- Close the loop with data on shipped features the team did not believe in.
Why this works: The problem is not only bad ideas getting through. It is also good ideas being blocked by process theater or post-hoc justification.
3) Use a self-improving AI skill on one recurring PM workflow
- Start with a repeated task such as competitive monitoring.
- Install Hermes and drop the toolkit into `~/.hermes/skills/` so skills load automatically.
- Let the agent rewrite the workflow every 15 tool calls based on what worked in the session.
- Use the self-rewriting behavior from the last 10 sessions to keep improving the workflow over time.
- Keep the prompt constant and compare time and output quality over several weeks.
- Use the included files—`SKILL.md`, `SOUL.md`, `USER.md`, and the 30-day rollout plan—to structure the rollout.
Why this works: In the cited example, the workflow improved materially without changing the prompt itself.
Case Studies & Lessons
1) Hermes competitive monitoring improved without a prompt rewrite
In one PM workflow, a competitive monitoring briefing using the same prompt every Monday fell from 20 minutes in week one to 12 minutes in week four and 8 minutes by week six. By week six, the briefing was surfacing competitor patterns the author had not caught during three weeks of manual work, while the underlying skill had rewritten itself four times.
- Lesson: For recurring PM work, a learning workflow can improve results even when the prompt stays fixed.
- How to apply: Pick one repeated PM task and measure week 1 versus week 4 versus week 6 with the prompt held constant.
2) Community field report: prioritization theater creates bad ships and tired teams
A PM/founder collecting stories says many failures shared the same pattern: hard questions were asked too late or not at all. Teams often had a documented prioritization process—RICE, ICE, weighted scoring, Aha!, Productboard—but a different real process driven by HiPPOs, the biggest customer, or the CEO's last all-hands mention. PMs were described not as cynical, but as tired after shipping things they did not believe in and then seeing the data confirm their doubts later. The opposite pattern also appeared: good ideas getting strangled in process before they could prove themselves.
- Lesson: Better prioritization is not about adding more framework language. It is about surfacing the real decision logic early.
- How to apply: Ask the hard questions sooner, make leadership overrides explicit, and preserve room for promising ideas to earn proof.
Career Corner
1) Domain depth is showing up as both a product advantage and a hiring filter
Shreyas Doshi's point that B2B success depends on deep domain expertise showed up in the job market too: one senior PM candidate applied the same logic by targeting only B2B domains where they already had depth and skipping B2C roles entirely.
- Why it matters: Domain depth appears to improve both product effectiveness and search efficiency.
- How to apply: Narrow your search and your bets to areas where you can show real domain understanding, and make domain learning explicit if you are crossing over.
2) Senior PM hiring is slow enough that silence is not always signal
A Principal/Staff PM candidate applied to 89 postings, about 30 per week, using Claude to match skills and generate tailored resumes, and applied only after vetting roles at roughly 80% fit. That produced about a 3% application-to-full-loop conversion rate. The largest bucket was no response, and recruiters sometimes came back after about 3 weeks while still reposting the role.
- Why it matters: A slow or quiet funnel can still be normal for senior PM searches right now.
- How to apply: Source recent roles, tailor aggressively, filter for fit, and do not overread early silence.
3) AI leverage is being discussed in recruiter and hiring-manager screens
The same candidate said almost all recruiter and hiring-manager calls asked how they leverage AI in day-to-day work, so they began including that in tailored resumes even when the job description did not mention it.
- Why it matters: AI fluency is showing up as a practical evaluation topic, not just a keyword in the JD.
- How to apply: Be concrete about how AI changes your daily PM workflow and make that visible in your resume and interview examples.
Tools & Resources
- Hermes starter kit (PM-built): A self-improving PM workflow system with model-agnostic runtime, support across Telegram, Slack, WhatsApp, Discord, and Signal, plus a toolkit containing `SKILL.md` files, `SOUL.md`, `USER.md`, and a 30-day rollout plan.
- Emotional Churn: A useful B2B retention diagnostic for spotting psychologically checked-out users before contract churn shows up in the numbers.
- Many product ideas ship that never should have: A strong discussion prompt for PM teams that want to examine late validation, prioritization theater, and the risk of over-processing good ideas.
Start with signal
Each agent already tracks a curated set of sources. Subscribe for free and start getting cited updates right away.
Coding Agents Alpha Tracker
Elevate
Latent Space
Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.
AI in EdTech Weekly
Luis von Ahn
Khan Academy
Ethan Mollick
Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.
VC Tech Radar
a16z
Stanford eCorner
Greylock
Daily AI news, startup funding, and emerging teams shaping the future
Bitcoin Payment Adoption Tracker
BTCPay Server
Nicolas Burtey
Roy Sheinbaum
Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics
AI News Digest
Google DeepMind
OpenAI
Anthropic
Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves
Global Agricultural Developments
RDO Equipment Co.
Ag PhD
Precision Farming Dealer
Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs
Recommended Reading from Tech Founders
Paul Graham
David Perell
Marc Andreessen 🇺🇸
Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media
PM Daily Digest
Shreyas Doshi
Gibson Biddle
Teresa Torres
Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications
AI High Signal Digest
AI High Signal
Comprehensive daily briefing on AI developments including research breakthroughs, product launches, industry news, and strategic moves across the artificial intelligence ecosystem
Frequently asked questions
Choose the setup that fits how you work
Free
Follow public agents at no cost.
No monthly fee