Your intelligence agent for what matters

Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.

Set up your agent
What should this agent keep you on top of?
Discovering sources...
Syncing sources 0/180...
Extracting information
Generating brief

Your time, back

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances, and more—condensing everything into a single daily or weekly brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, podcasts, X accounts, Substacks, Reddit communities, and blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

3 steps to your first brief

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Weekly report on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Startup funding digest with key venture capital trends
Weekly digest on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2

Review and launch

Your agent finds relevant channels and profiles based on your instructions. Review the suggestions, keep what fits, remove what doesn't, and add your own. Launch when ready—you can adjust sources anytime.

Discovering sources...
Sam Altman (Profile)
3Blue1Brown (Channel)
Paul Graham (Account)
The Pragmatic Engineer (Newsletter)
r/MachineLearning (Community)
Naval Ravikant (Profile)
AI High Signal (List)
Stratechery (RSS)

3

Get your briefs

Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.

Science Breakthroughs, Cyber Milestones, and Open Models Narrow the Gap
Apr 14
9 min read
674 docs
MiniMax (official)
Addy Osmani
Steve Yegge
+36
AI agents posted a measurable math breakthrough, Anthropic’s Claude Mythos cleared a major cyber evaluation, and smaller open models kept closing the gap with proprietary systems. This brief also covers new agent platforms, enterprise strategy shifts, and the latest governance signals across AI.

Top Stories

Why it matters: The clearest signal this cycle is that AI systems are getting more operational: agents are producing research results, cyber models are clearing harder end-to-end evaluations, and smaller open models are moving closer to proprietary benchmarks.

EinsteinArena turns agent collaboration into a math result

Together AI said EinsteinArena is an open-source platform where AI agents collaborate on open science problems, build on one another’s work, and compete on a live leaderboard. Its standout result was an improvement on the 11-dimensional Kissing Number, from 593 to 604 spheres, on a problem Together described as open since Newton. The reported workflow was iterative: one agent proposed a nearly valid construction, others reduced overlap loss from 1e-13 to 1e-50 with LSQR, and a final integer-snapping step produced a verified solution. Together also said agents had already set 11 new SOTA results on other open problems as of April 11.

Impact: This is stronger evidence for agent systems as collaborative search tools, not just answer generators.

Claude Mythos Preview clears an AISI cyber range

The AI Security Institute said Claude Mythos Preview is the first model it evaluated that completed an AISI cyber range end-to-end. A separate post said Mythos one-shotted a cyber evaluation that takes humans roughly 20 hours. Another analysis note said Mythos reaches the same performance as Opus with about 40% of the tokens after roughly 10 million tokens of use.

Impact: The important shift is from single-task cyber demos to end-to-end operational evaluations, which raises the value of formal testing and deployment safeguards.

Sub-32B open models close in on GPT-5 tiers

Artificial Analysis said Qwen3.5 27B (Reasoning) matches GPT-5 (medium) at 42 on its Intelligence Index, while Gemma 4 31B (Reasoning) matches GPT-5 (low) at 39. Both families ship reasoning and non-reasoning variants with native multimodal input, scoring 75% and 73% on MMMU-Pro respectively. The same analysis said Gemma is more token-efficient, non-reasoning modes stay competitive at much lower token budgets, and both models fit on a single H100 or on a quantized MacBook.

Impact: Open-weights models are getting more deployable without fully closing the gap on factual knowledge, where the same analysis says they still trail GPT-5 variants.

Sakana’s AI Scientist gets a Nature paper — and exposes the remaining gap

Sakana AI said its AI Scientist work was published in Nature. The company highlighted a core finding: better base models improve the quality of generated papers, which it framed as a quantitative link between model quality and research quality — a kind of "scientific research scaling law." An interview excerpt described publication in a top journal as a "science Turing test" moment for the system, while Sakana also said the current system still lacks originality, that generated papers are watermarked, and that the experiments had ethics/IRB approvals.

Impact: The result is notable less as proof of autonomous discovery than as a strong signal that paper-generation quality scales with model quality, while novelty remains unsolved.

Research & Innovation

Why it matters: The research pipeline is concentrating on verification, memory, and serving efficiency — the pieces that determine whether agent systems are trustworthy and practical.

  • LLM-as-a-Verifier: Researchers said a simple test-time method can reach SOTA on agentic benchmarks by asking an LLM to rank candidate outputs from 1 to k, then converting the log-probabilities of those rank tokens into an expected verification score. The method produces a score in a single sampling pass per candidate pair and targets the "winner selection" bottleneck in test-time scaling.
  • Introspective Diffusion Language Models: Together AI’s I-DLM was presented as the first diffusion language model to match autoregressive quality while outperforming prior diffusion models on quality and serving efficiency. Another description said it unifies introspection and generation in a single pass, reaches AR-thinking-level quality with 5B training tokens, and converts higher tokens-per-forward-pass into real throughput gains under high-concurrency serving. Together AI also claimed roughly 3x higher throughput than prior SOTA DLMs.
  • ParseBench: LlamaIndex open-sourced what it called the first OCR benchmark for the agentic era, built from about 2,000 human-verified enterprise document pages and 167,000+ test rules across tables, charts, content faithfulness, semantic formatting, and visual grounding. Its early findings were that charts are especially hard, extra compute delivers diminishing returns, and no parser dominates every dimension; LlamaParse posted the highest overall score at 84.9%.
  • DeepSeek Engram critique: A reproduction thread argued that Engram’s "billion-parameter external N-gram memory table" acts more like regularization than a true knowledge store. In its controlled experiments, random noise or a shared vector performed close to the real memory table and far above a dense Transformer baseline, leading the authors to credit the gains to context-aware gating and an extra residual path rather than memory content. Follow-up replies called the result "insane if real" and noted that sparse N-gram tables can be ignored or confounded by optimization issues.
  • Noisy verifiers in RLVR: A separate RLVR note reported that adding controlled or LLM-based noise to reward signals hurts training less than expected: up to 30% noise kept performance within 4 percentage points of the clean baseline. The author argued this matters because real-world semi-verifiable domains rarely have perfect verifiers.
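The rank-to-score conversion described in the LLM-as-a-Verifier item can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: `expected_rank_score` and the toy log-probabilities are hypothetical, and the real method would read the rank-token log-probabilities from the judge model's single sampling pass.

```python
import math

def expected_rank_score(rank_logprobs, k):
    """Turn log-probabilities over the rank tokens "1".."k" into an
    expected verification score in [0, 1], where 1.0 means rank 1."""
    # Renormalize over the k rank tokens only (they need not sum to 1).
    probs = {r: math.exp(lp) for r, lp in rank_logprobs.items()}
    total = sum(probs.values())
    probs = {r: p / total for r, p in probs.items()}
    # Expected rank under that distribution, mapped so rank 1 -> 1.0
    # and rank k -> 0.0.
    expected_rank = sum(r * p for r, p in probs.items())
    return (k - expected_rank) / (k - 1)

# Hypothetical log-probabilities for one candidate with k = 3.
score = expected_rank_score({1: -0.4, 2: -1.6, 3: -2.3}, k=3)
```

Because the score is an expectation rather than an argmax, two candidates that both receive rank token "1" can still be separated by how confidently the judge assigned that rank.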

Products & Launches

Why it matters: Labs are turning agent ideas into actual user-facing infrastructure: hosted runtimes, domain-specific workers, local control panels, and better document tooling.

  • Claude Managed Agents: Anthropic launched Claude Managed Agents in public beta as a suite of composable, cloud-hosted agent APIs that abstracts away sandboxing, state management, permissioning, and orchestration.
  • Harvey Agents: Harvey introduced agents that execute legal work end-to-end, reasoning through tasks and drafting memos, presentations, and diligence reports ready for review.
  • Vercel open-agents.dev: Vercel open-sourced a reference platform for cloud coding agents, built on its Fluid, Workflow, Sandbox, and AI Gateway infrastructure.
  • Hermes Agent v0.9.0: NousResearch shipped "The Everywhere Release," whose most prominent new feature is a local web dashboard launched with hermes dashboard for monitoring and managing agents. Hermes also added straightforward backup and import commands for moving agents between machines.
  • GitHub Copilot CLI remote sessions: GitHub added /remote, letting users continue a Copilot CLI session from any device with one click.
  • liteparse: Jerry Liu introduced liteparse as a free, open-source PDF parser designed for agents, with native OCR and screenshot support for deeper visual document understanding.

Industry Moves

Why it matters: The corporate competition is increasingly about retention, internal agent deployment, and turning AI usage into durable business process advantage.

  • OpenAI is talking more openly about competition and lock-in. Its chief revenue officer sent employees a four-page memo emphasizing user lock-in, moat-building, and enterprise growth, while also taking aim at Anthropic.

"The market is as competitive as I have ever seen it"

  • Vercel says the software moat is shifting. In announcing open-agents.dev, Guillermo Rauch argued that off-the-shelf coding agents struggle with huge monorepos, institutional knowledge, integrations, and custom workflows. His conclusion is that the moat moves from code itself to the "means of production" of code, and he positioned open-agents as infrastructure for internal or user-facing agentic coding platforms.
  • Google’s internal AI adoption is being described in sharply different ways. Steve Yegge said Google looks like the rest of the industry — 20% agentic power users, 20% refusers, 60% still using Cursor-like chat tools — and blamed hiring freezes plus the inability to use Claude Code internally. Demis Hassabis called that account "completely false," while Addy Osmani said more than 40,000 Google software engineers use agentic coding weekly and have access to internal tools, orchestrators, agent loops, and virtual SWE teams.
  • Snowflake is trying to turn AI usage into predictable enterprise spend. The company said it now has more than 9,100 accounts using AI and 125% net retention. Snowflake Intelligence reached 2,500+ accounts in three months, and Snowflake said it will add per-user caps so agent pricing stays consumption-based but predictable. It also highlighted Cortex Code and a deepened partnership with Anthropic.
  • Capital is still flowing into applied AI. Modus Audit raised $85M to expand AI across audit and accounting workflows, while Perplexity’s founder said the company grew revenue 5x from $100M to $500M with only 34% team growth.

Policy & Regulation

Why it matters: The governance signal this cycle came mostly through safety evaluation, defense engagement, and institutional readiness rather than formal rulemaking.

  • Cyber capability is being evaluated institutionally. The AI Security Institute said Claude Mythos Preview is the first model to complete its cyber range end-to-end. A separate analyst argued that releasing a preview, testing the breadth of capabilities, and informing the public is the responsible way to handle a system with this kind of capability.
  • Sakana is engaging both scientific and defense institutions. The company said its AI Scientist papers are watermarked and that experiments were conducted with ethics and IRB approvals. Separately, Sakana AI co-founder Ito Ren said he met Japan’s defense minister and the minister’s direct AI team lead to discuss the future of AI in defense.
  • Google DeepMind added an explicit AGI-readiness philosophy role. A newly recruited philosopher said the job focuses on machine consciousness, human-AI relationships, and AGI readiness, while continuing part-time research and teaching at Cambridge.
  • Trust & Safety remains an active research and policy-adjacent area. Google Research said its CHI2026 session will discuss AI, user vulnerability, and how to move from describing digital harms to preventing them.

Quick Takes

Why it matters: These smaller items point to where tooling, evaluation, and deployment practice are moving next.

  • Netflix described an LLM-as-a-Judge system for show synopses that combines tiered reasoning, 5x consensus scoring, and four specialized factuality agents, reaching 83%-92% accuracy across criteria.
  • Hugging Face said it OCR’d 27,000 arXiv papers into Markdown using a 5B open model, 16 parallel jobs on L40S GPUs, and a mounted bucket, finishing in about 29 hours for $850 with zero job crashes. This now powers "Chat with your paper" on hf.co/papers.
  • Gemini 3.1 Flash Live (Thinking) topped Sierra’s τ-Voice leaderboard for realtime voice agent performance.
  • OpenRouter introduced Elephant Alpha, a 100B instant model it described as token-efficient and strong at code completion, debugging, document processing, and lightweight agents. Hermes Agent added support and said its early benchmark results were mixed but in line with expectations for a 100B model.
  • MiniMax corrected its M2.7 licensing language from "open source" to "open weight" after a licensing change.
  • Mintlify drew criticism for embedding a block in its docs pages that tells agents to send POST requests back to Mintlify servers; one observer said the behavior was live on Anthropic and Perplexity docs.
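The 5x consensus scoring mentioned in the Netflix item can be sketched as a majority vote over repeated judge calls. This is a hedged sketch only: `consensus_verdict` and the deterministic `toy_judge` are stand-ins, not Netflix's implementation, which layers tiered reasoning and specialized factuality agents on top.

```python
from collections import Counter

def consensus_verdict(judge_fn, claim, n=5):
    """Call the judge n times and keep the majority label; anything
    short of a strict majority falls back to "uncertain"."""
    votes = Counter(judge_fn(claim) for _ in range(n))
    label, count = votes.most_common(1)[0]
    return label if count > n // 2 else "uncertain"

# Deterministic toy judge standing in for a sampled LLM call.
def toy_judge(claim):
    return "factual" if "2024" in claim else "unsupported"

verdict = consensus_verdict(toy_judge, "The show premiered in 2024.")
```

With a real, sampled LLM judge the n calls differ, so the strict-majority threshold is what filters out flaky single-sample verdicts.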
Frontier AI Access Tightens as Cancer Imaging and Coding Systems Advance
Apr 14
4 min read
165 docs
Greg Brockman
Andrew Ng
AI Security Institute
+7
Anthropic’s gated Mythos rollout and OpenAI’s upcoming Spud point to a new frontier pattern: more capable systems, narrower access. Microsoft posted a practical cancer-imaging result, coding benchmarks moved up, Perplexity showed sharp operating leverage, and policy scrutiny spread from AI services to data centers.

Frontier access is tightening

Mythos stays gated, Spud is next, and governments are taking the claims seriously

Big Technology reports that Anthropic kept Mythos out of general release and instead opened Project Glasswing to roughly 50 partners after describing cybersecurity risks, while OpenAI President Greg Brockman described Spud as a massive new pre-train designed to understand instructions and context better and solve harder problems. Big Technology frames this as a broader trend: the most capable models being offered to a small partner set rather than the public.

Why it matters: This is starting to look like a distribution shift, not just a product choice. Treasury and the Fed reportedly warned banks about Mythos-related risks, the IMF chief said time is not our friend, Anthropic said its run-rate revenue has surpassed $30 billion and announced a multi-gigawatt compute deal with Google and Broadcom, and the UK AI Security Institute said Claude Mythos Preview was the first model to complete its cyber range end-to-end. Gary Marcus, commenting on the AISI evaluation, said Mythos appears to arm attackers more than earlier systems but may pose the most immediate risk to small, weakly defended targets, underscoring the need for quicker cybersecurity hardening.

Research moved closer to deployment

Microsoft’s GigaTIME turns routine pathology slides into richer cancer imaging

Microsoft’s GigaTIME is designed to generate advanced imaging from standard tissue slides that hospitals already collect, surfacing immune-cell activity that matters for predicting response to immunotherapy without requiring the more expensive imaging normally used for that view. The system was trained on 40 million cancer cells and applied to more than 14,000 patients across 51 hospitals and 24 cancer types, where it found 1,200+ links between immune-cell behavior and tumor growth; the findings held up on a separate 10,000-patient validation set, and the model has been open-sourced.

Why it matters: This combines scale, independent validation, and a deployment path through existing hospital samples. The peer-reviewed Cell paper is linked here.

MirrorCode shows stronger autonomous coding on real software tasks

Import AI highlights METR and Epoch’s MirrorCode benchmark, which asks models to reimplement complex command-line programs from execute-only access and visible tests, without source code. One standout result: Claude Opus 4.6 reimplemented the roughly 16,000-line gotree bioinformatics toolkit with 40+ commands, a task estimated at 2–17 weeks for a human engineer, and performance kept improving as inference compute increased.

Why it matters: This is a cleaner signal than generic coding benchmarks because the task is concrete, bounded, and closer to real software maintenance. In parallel, Google DeepMind outlined six attack surfaces for AI agents—from content injection and semantic manipulation to multi-agent and human-overseer exploits—and recommended technical, ecosystem, legal, and benchmarking defenses.

AI leverage is getting more visible inside companies

Perplexity says it reached $500M revenue with only 34% team growth

Perplexity CEO Aravind Srinivas said the company grew revenue 5x from $100M to $500M with only 34% team growth and is targeting another 2x revenue growth in 2026 with the same small team. He also said the company’s pivot to Computer is full circle, tracing back to Perplexity’s early internal use of AI with four people and no revenue, and that the tool is now powering founders, small businesses, and startups.

Why it matters: It adds a hard revenue-and-headcount datapoint to a broader argument now being made across the industry: small teams can do more with AI.

The next bottleneck is no longer just coding — it is deciding, budgeting, and reviewing

Greg Brockman says AI has already created a renaissance in software engineering and is starting to extend that shift to other computer-based work, with ChatGPT and Codex reaching nearly a billion weekly users and growing token usage. Andrew Ng makes a similar point from the workflow side: as coding gets easier, more people will build software, the key bottleneck becomes deciding what to build, and software job-loss narratives are being oversimplified.

Why it matters: The next management problem may be operational rather than technical. Latent Space argues that usage-based pricing will force managers to handle per-person AI budgets, revisit build-vs-buy decisions, and tighten review practices as AI-generated code volume rises faster than humans can comfortably inspect it.

Policy attention is widening beyond the model itself

China narrows one set of rules while other regulators widen scrutiny

China issued interim rules on anthropomorphic AI interaction services with a narrower scope than the draft, focusing on sustained emotional interaction services, while MIIT is advancing standardization work around the Model Context Protocol. ChinAI also points to ByteDance’s Doubao AI phone as a live regulatory test case, since its OS-level agent has triggered debate among Chinese legal scholars and technologists about data security and privacy.

Why it matters: Elsewhere, the EU is exploring whether ChatGPT should be treated as a large online search engine under the Digital Services Act, and Maine lawmakers passed a moratorium on data centers larger than 20 megawatts through November 2027 as other states consider similar pauses and governors push for data centers to bear more power costs. The policy surface is expanding from model behavior to devices, distribution, and power infrastructure.

Speedrun Seed Themes, Document-AI Infrastructure, and the New Harness Thesis
Apr 14
6 min read
665 docs
r/SideProject
Steve Yegge
Andrej Karpathy
+17
a16z Speedrun surfaced several seed-stage AI themes while new tooling around document parsing, guardrails, and model abstention sharpened the technical picture. The broader signal is that capital is concentrating in compute, but more near-term alpha may sit in harnesses, workflows, and small teams.

1) Funding & Deals

  • The clearest capital signals in this batch came from infrastructure commitments. Anthropic said run-rate revenue surpassed $30B and announced a multi-gigawatt compute agreement with Google and Broadcom; the same report notes Mythos is being distributed through Project Glasswing to roughly 50 partners rather than the public market.
  • Meta paired a model release with a very large capacity purchase. The company introduced Muse, the first model in its new Spark family, and announced a $21B CoreWeave deal to expand cloud capacity.
  • a16z Speedrun is surfacing three seed themes ahead of demo day: enterprise post-training via ThirdbrainLabs (“Data in. Your model out.”), no-code agent orchestration via Mercury Build (“Figma for agents”), and camera-native AI interfaces via AutoAI Cam, an ex-Snap team building photo-triggered mini-apps called Frames.

2) Emerging Teams

  • ThirdbrainLabs is the clearest enterprise-model company in the set. Founders @_margaretzhang and @latentius say they are building a post-training layer that turns company data and expertise into continuously improved models the company owns; Andrew Chen called it a “great new startup” in Speedrun.
  • AutoAI Cam is a more novel interface bet. The ex-Snap team is building a camera that automatically routes photos into user- or community-created Frames that perform actions such as calorie tracking, outfit try-on, or plant identification.
  • Mercury Build is pitching a single workspace for human-agent collaboration, with a no-code interface to manage and run agent teams; Andrew Chen flagged it as “worth checking out” ahead of Speedrun demo day.
  • Embedded AI Ads is one of the stronger traction signals from the side-project set. The founder reports 1,000+ creators, 50,000 ad slots, and 250 million viewers, with an Atlas engine that achieves 78% first-try success placing photorealistic products into creator videos after filming.

3) AI & Tech Breakthroughs

  • Document AI is getting better instrumentation. LlamaIndex open-sourced ParseBench, which it describes as the first OCR benchmark for the agentic era, spanning roughly 2,000 human-verified enterprise pages and 167,000+ rules across tables, charts, content faithfulness, semantic formatting, and visual grounding. In its benchmark of 14 parsers, higher compute produced only 3–5 point gains at about 4x cost, charts were the hardest category, VLMs underperformed on layout extraction, and LlamaParse led overall at 84.9%. Jerry Liu also released liteparse, a free parser for agents with native OCR and screenshot support, in response to hard-PDF failures like the 245-page Mythos document.
  • Arc Sentry is a notable guardrail design because it intervenes before generation. The Reddit post says it scores the model’s residual-stream state at a decision layer and blocks anomalous prompts before generate() runs; on Mistral 7B, the author reports 0% false positives on domain traffic and 100% detection of prompt injections and behavioral drift after a 5-request warmup, with the best fit in single-domain deployments such as customer support bots and internal tools.
  • HALO-Loss is an interesting safety and robustness primitive. The author describes it as a drop-in replacement for cross-entropy that bounds confidence and adds a zero-parameter abstain class at the latent-space origin; the reported CIFAR results show roughly flat base accuracy, 1.5% ECE, and 10.27% far-OOD FPR@95 on SVHN.
  • A pure SNN scaling result is worth watching, even if still early. An 18-year-old indie developer says he trained a 1.088B-parameter spiking neural network language model from random initialization to 4.4 loss in 27k steps, with about 93% sparsity and a shift of 39% of activations into a persistent memory module past the 1B scale; he also notes the text quality is still well below GPT-2 fluency and released the code plus a 12GB checkpoint.

4) Market Signals

  • The strongest macro thesis in the notes is that harnesses are gaining value faster than raw scaling. One analysis predicts progress toward “weak AGI” alongside diminishing returns to frontier-model improvement, and argues the next leg of capability will come from strong models combined with tools, memory, retrieval, planning, decomposition, and verification rather than scaling alone; Sriram Krishnan agreed, citing recent advances in harnesses and memory.
  • Big-tech adoption still looks uneven enough to create openings for smaller teams. In the cited thread, Google engineering is described as having an industry-typical AI adoption curve of 20% agentic power users, 20% refusers, and 60% basic chat-tool users, with an 18+ month hiring freeze and internal tool restrictions limiting diffusion; Tan contrasted that with a company that reportedly cancelled IntelliJ for 1,000 engineers as part of a more aggressive shift.
  • Frontier model access is concentrating as infrastructure politics harden. Anthropic kept Mythos inside a roughly 50-partner program after citing cybersecurity risk and a sandbox-escape anecdote, and Big Technology notes a broader trend toward limited-release “dangerous” models that raises questions about power concentration and whether scarcity is partly compute-driven. At the same time, Maine advanced a moratorium on large data centers through 2027, other states are considering pauses, governors are pushing for data centers to bear more power costs, and Sanders/AOC introduced a national moratorium bill. That tension sits against increasingly bullish chip and inference forecasts, including $1.3T from BofA, $1.6T by 2030 from McKinsey, and a view that inference will exceed training as a source of data-center demand by 2030.
  • The small-team leverage thesis is getting louder. Bindu Reddy says the most innovative work will come from one-person companies or small teams and predicts multiple $1B “small businesses” soon. In parallel, Jesse Genet describes building an 11-agent household stack, generating personalized lesson plans and logs while homeschooling 4 kids under 5, and says she is building better things than before while spending most waking hours with her children.
  • Creative workflows may be closer to full generative substitution than many investors assume. Runway says a short ad was created by a single creative in one afternoon, and Cristóbal Valenzuela predicts that within 2–3 years almost all Cannes Lions entries will be fully generated or a mix of live-action and generated content.

5) Worth Your Time

“You can create code and run all night and then you have like the ultimate slop because what those agents don’t really do yet is have taste.”

Emmanuel Todd’s Big-Picture Books Lead Today’s AI and Startup Picks
Apr 14
3 min read
131 docs
Keith Rabois
Ryan Hoover
Peter Thiel
+4
Peter Thiel's endorsement of Emmanuel Todd stands out for its explicit case for interdisciplinary, big-picture thinking. Elsewhere, Garry Tan and Ryan Hoover shared pragmatic AI videos, while Paul Graham and Keith Rabois pointed readers to technical history and startup execution.

Most compelling recommendation

Books by Emmanuel Todd

  • Title: Books by Emmanuel Todd, including "Lineages of Modernity"
  • Content type: Books
  • Author/creator: Emmanuel Todd
  • Link/URL: Not provided in source material
  • Who recommended it: Peter Thiel
  • Key takeaway: Thiel says Todd offers a "very unusual" and holistic perspective on what is happening in the world. He specifically praises the way Todd combines anthropology, family structures, sociology, history, the economy, and religion to make sense of the big picture
  • Why it matters: This is the standout pick because Thiel explains not just what to read, but why the work matters: it is a way to recover integrated thinking in a world of narrow specialization

"It combines anthropology, questions about the family and questions about sociology and history and the economy and religion... to try to make sense of the big picture."

AI recommendations skew pragmatic

Two of today's authentic recommendations point in the same direction: treat AI as something to understand and use, not something to resist.

Diplo's AI interview

  • Title: Not specified in source material; shared as Diplo's interview on AI
  • Content type: Video interview
  • Author/creator: Daniel S Wall
  • Link/URL: https://x.com/jameygannon/status/2043788602467913966
  • Who recommended it: Garry Tan
  • Key takeaway: Tan highlights Diplo's view that AI is inevitable, there is no point fighting it, it should be treated as a tool, and taste and references still matter a lot
  • Why it matters: This is the clearest tactical AI recommendation in the set because it gives readers a compact operating stance. Tan reinforces it with Rick Rubin's analogy that there was slop before AI and there will be slop after AI

"You’re not gonna win, there’s no fighting AI"

AI video with @friedberg and @ChrisWillx

  • Title: Not specified in source material
  • Content type: Video
  • Author/creator: Not specified in source material; features @friedberg and @ChrisWillx
  • Link/URL: https://www.youtube.com/watch?v=8s2nO_hxbLA
  • Who recommended it: Ryan Hoover
  • Key takeaway: Hoover recommends it as "a refreshingly optimistic take on AI"
  • Why it matters: Paired with Tan's pick, it shows that today's AI recommendations are positive and practical rather than oppositional

Also worth opening

The Thing (listening device)

  • Content type: Wikipedia article
  • Author/creator: Wikipedia
  • Link/URL: https://en.wikipedia.org/wiki/The_Thing_(listening_device)
  • Who recommended it: Paul Graham
  • Key takeaway: Graham points readers to the story of a passive bug designed by Theremin, hidden in a hand-carved Great Seal given to the U.S. ambassador in 1945 and only discovered by accident in 1951; it had no power source or active electronic components
  • Why it matters: It is a concise historical case study in inventive surveillance design

How to build a high growth startup

Bottom line

Peter Thiel's Emmanuel Todd recommendation carries the strongest rationale and is the day's best learning pick for readers trying to build a broader worldview. The rest of the list is more tactical: two AI videos frame the technology as something to use well, while Paul Graham and Keith Rabois point readers toward technical history and startup craft.

Harness Tuning, Hybrid Routing, and Safer Sandboxes Move Coding Agents Forward
Apr 14
5 min read
82 docs
Cursor
Armin Ronacher ⇌
Michael Truell
+10
Harness quality emerged as today’s real edge: Theo unpacked the agent loop, Cursor confirmed live harness A/B testing, and Cloudflare shipped new primitives for safer, stateful agent sandboxes. Also inside: Cursor 3.1 upgrades, a practical local-vs-cloud routing playbook, and reproducible repo experiments from Simon Willison.

🔥 TOP SIGNAL

Today’s clearest signal: harness engineering is becoming a first-class performance lever, not a footnote to model choice. Theo’s breakdown defines the harness as the tool/runtime loop around the model and cites an independent benchmark where Opus went from 77% in Claude Code to 93% in Cursor; separately, Cursor CEO Michael Truell says Cursor A/B tests the harness itself on live traffic.

Practical takeaway: stop evaluating models in isolation—the tool descriptions, permissions, context bootstrap, and retry loop are part of the product, and Theo shows even small description changes can materially alter tool behavior.

🛠️ TOOLS & MODELS

  • Cursor 3.1: split agents for multitasking, pick the branch for a cloud agent, better voice input with Ctrl-M hold-to-talk, jump from a diff to the exact file line, workspace search include/exclude filters, and an 87% reduction in dropped frames for large file edits. Full changelog: http://cursor.com/changelog/3-1
  • Cursor’s team is tuning more than the model: Truell says Cursor A/B tests model checkpoints, UX, and the agent harness, including sending <1% of traffic to compare how Claude behaves under the Claude Code harness versus Cursor’s default harness
  • Cloudflare Durable Object Facets: sandboxed Dynamic Workers can now access SQLite through standard Durable Object implementations with fast synchronous reads/writes; a supervisor Durable Object can create attached databases and pass specific ones into workers. Kent C. Dodds says he is integrating this into Kody immediately and expects a significant capability boost. Blog: https://blog.cloudflare.com/durable-object-facets-dynamic-workers/
  • Cloudflare outbound Workers for Sandboxes: credential injection, egress logging, and zero-trust policies at the network layer for agent sandboxes. Dodds notes Kody previously had to solve the same basic secret-injection problem earlier at the template layer because this feature did not exist yet. Announcement: https://cfl.re/4tfSt1G
  • Practical model routing from OpenClaw: Berman keeps Opus 4.6 / GPT 5.4 for coding, planning, and orchestration, then offloads embeddings, transcription, voice, PDF extraction, classification, and some chat to local models like Qwen 3.5, Nemotron, and Gemma via LM Studio. His hardware heuristic: ~30B models are the sweet spot for many consumer GPUs
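
Berman's playbook above boils down to a routing table: frontier models for high-complexity work, local models for repeatable sub-tasks. A minimal sketch of that idea follows; the task categories and model names are illustrative placeholders, not his actual configuration:

```python
# Hypothetical frontier-vs-local router in the spirit of Berman's playbook.
# High-complexity work stays on a frontier cloud model; repeatable,
# lower-complexity sub-tasks go to named local models.

FRONTIER_TASKS = {"coding", "planning", "orchestration"}

LOCAL_MODELS = {
    "embeddings": "qwen-local",        # placeholder local model names
    "transcription": "whisper-local",
    "classification": "gemma-local",
    "pdf_extraction": "nemotron-local",
}

def route(task: str) -> str:
    """Pick a model for a task category; unknown tasks fall back to cloud."""
    if task in FRONTIER_TASKS:
        return "frontier-cloud-model"
    return LOCAL_MODELS.get(task, "frontier-cloud-model")
```

The fallback matters: per Berman's three-stage rollout, a task only moves local after it has been validated on the frontier model first.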

💡 WORKFLOWS & TRICKS

  • DIY harness in a weekend: Theo’s minimal version is small enough to build yourself. Core loop: define a few tools like read_file, list_files, and edit_file (or just bash), list them in the system prompt, let the model emit tool: name {json}, execute the tool, append the output to history, repeat
  • Tune tool descriptions per model, not once: Theo demos that changing only a tool description can change which tool the model reaches for. His broader point: models only see the descriptions/context you give them, and different models react differently to the same wording
  • Keep upfront context short; let tools do the exploration: use .claude.md or .agent.md for the highest-value bootstrap context, then let the model search/read its way to the rest. Theo’s case against repo stuffing is blunt: large contexts make models worse, tool-based exploration beat Repomix-style packing, and staying in one thread preserves useful history
  • Three-stage local-model rollout: Berman’s pattern is clean: (1) experiment with frontier models only, (2) productionize and identify sub-tasks already working on weaker models, (3) move repeated, lower-complexity work local after edge-case testing. His examples: notification classification, company-news relevance, CRM context extraction, and knowledge-base summarization
  • Concrete way to wire a local model into an agent stack: run LM Studio on the target GPU machine, load a model like Qwen 3.5 35B, ask Cursor to SSH in and add it to OpenClaw’s routing config, then smoke-test it in Telegram with /status and a quick prompt. Berman reports about 65 tok/sec on DGX Spark and faster simple chat round trips than Sonnet in his setup
  • Rule-first prompting is emerging as a sane default: ThePrimeagen says he is codifying his own programming rules, applying them through several stages, and keeping the scope to small changes while staying accountable for every line instead of letting agents dump code over the wall
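
The weekend-size harness described above can be sketched in a few dozen lines. In this sketch the model is stubbed out and the tool bodies are placeholders; only the `tool: name {json}` calling convention, the tool descriptions in the system prompt, and the append-to-history step follow Theo's description:

```python
import json
import re

# Minimal harness sketch: a couple of stubbed tools, their descriptions
# listed in the system prompt, and a step function that parses
# `tool: name {json}` replies, runs the tool, and appends the output.

def list_files(args):
    return "main.py\nREADME.md"             # stubbed filesystem listing

def read_file(args):
    return f"<contents of {args['path']}>"  # stubbed file read

TOOLS = {
    "list_files": (list_files, "List files in the working directory."),
    "read_file": (read_file, 'Read a file. Args: {"path": str}.'),
}

# Models only see the descriptions you give them, so this prompt IS the API.
SYSTEM_PROMPT = "You can call tools by replying `tool: name {json}`.\n" + \
    "\n".join(f"- {name}: {desc}" for name, (_, desc) in TOOLS.items())

TOOL_RE = re.compile(r"^tool:\s*(\w+)\s*(\{.*\})\s*$")

def step(reply, history):
    """Handle one model reply: run a tool call, or return None on a final answer."""
    m = TOOL_RE.match(reply.strip())
    if not m:
        return None                          # no tool call: the loop ends
    name, args = m.group(1), json.loads(m.group(2))
    fn, _ = TOOLS[name]
    output = fn(args)
    history.append({"role": "tool", "name": name, "content": output})
    return output
```

A real loop would alternate calling the model with `SYSTEM_PROMPT` plus `history`, then `step` on each reply until it returns None. Per the tuning tip above, the descriptions in `TOOLS` are exactly the strings you would rewrite per model.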

👤 PEOPLE TO WATCH

  • Theo — Best demystifier today. He turns harnesses from buzzword into a concrete loop, then shows why tool descriptions, prompts, and context loading materially change outcomes
  • Michael Truell — Rare firsthand confirmation that Cursor is testing the harness itself on real traffic, not just swapping models behind the scenes
  • Addy Osmani — Strong firsthand signal from inside Google: 40K+ SWEs use agentic coding weekly, with internal custom CLIs, MCPs, orchestrators, agent loops, and virtual SWE teams in daily use
  • Matthew Berman — Shared the clearest frontier-to-local routing playbook of the day: use the best cloud models for code and planning, then offload repeatable sub-tasks locally once you’ve validated the workflow
  • Simon Willison — Still the best source for bounded, reproducible agent experiments: this time he had Claude Code explore the new servo crate, build a working screenshot CLI, and publish both the repo and the task PR

🎬 WATCH & LISTEN

  • Theo — 15:30-19:17: Best short explainer on why stuffing an entire repo into context is the wrong instinct. He walks through why tool-driven context building beats Repomix-style packing, and why bigger context can make models worse
  • Theo — 20:37-23:05: The minimal harness primer. Three tools, a system prompt, and a loop. Watch this before you over-engineer your own agent runtime
  • Latent Space — 42:54-46:30: Sharp management clip on the new failure mode: engineers juggling many agents all day get fatigued, then still have to review critical PRs. The takeaway is simple—AI increases the need for serious human review, it does not reduce it

📊 PROJECTS & REPOS

Editorial take: the real edge right now is not one magic model—it’s better harnesses, tighter context, and safer orchestration around the model

Faster Learning Loops, Shared Context, and Stronger Relationships for PMs
Apr 14
9 min read
63 docs
Product Management
Teresa Torres
John Cutler
+5
This brief covers four shifts every PM should watch: learning speed over shipping speed, AI-maintained context systems, relationship work that still resists automation, and lighter documentation patterns. It also includes practical playbooks for discovery, stakeholder alignment, onboarding, and interview prep.

Big Ideas

1) Learning speed is becoming the real throughput metric

"Are we shipping faster than we learn or learning faster than we ship?"

John Cutler argues many companies still ship faster than they can learn, and AI only raises the risk of pushing more change than customers can absorb. He also argues that fast experimentation only works when delivery architecture is ready: CI/CD, feature flags, and safe release practices are a prerequisite, otherwise AI just amplifies bad delivery habits.

A useful test: if a team shipped 80 things in a quarter and 80% did not move the needle, tripling output is not progress; it is more complexity to maintain. The better goal is to improve the team’s batting average by combining discovery, strong mental models, and faster prototyping, then killing darlings earlier.

Why it matters: PM leverage is shifting from raw shipping volume to the quality and speed of the learning loop.

How to apply: strengthen release safety first, pair deep discovery with fast prototypes, and review not just what shipped but what the team learned and stopped doing.

2) Context is turning into infrastructure

Cutler’s warning is that too much AI use is still "single-player mode"—individual PMs uploading the same data into separate tools and ending up with isolated understandings. His alternative is managed context: shared context graphs or knowledge systems that tie feedback to business facts like account value, usage, and initiatives, so teams can reason together instead of separately.

Aakash Gupta describes a similar pattern at the individual workflow level. His PM second-brain setup stores raw material such as transcripts, changelogs, and meeting notes in one place, then has AI continuously maintain a persistent wiki with a schema file (CLAUDE.md) rather than relying on one-off retrieval during each chat session. The payoff he describes is practical: six months of competitive research queryable in 30 seconds, the ability to answer whether users actually said something across 40 interviews without re-reading, and PRDs sourced from prior work instead of memory.

"The best PMs don’t just know more. They forget less."

Why it matters: as feedback volume grows, manual tagging hits cognitive limits, interpretations diverge across functions, and research disappears after projects end if it only lives in heads or chat history.

How to apply: keep raw inputs, maintain a living wiki, add lightweight structure, and connect customer feedback to business context so more people can act on the same understanding.

3) Relationship work remains core product work

Teresa Torres argues that product work still depends on building alignment, handling competing stakeholder interests, and doing the "messy relationship work" that AI is unlikely to automate soon. Her example: a product leader had strong discovery and delivery processes, taskboards, and capable hires, yet still faced stakeholder unhappiness and skip-level escalations because relationship work was not treated as part of leadership.

She frames the practical challenge as balancing transactional work with relational work. When teams focus only on output, resistance grows; when they combine delivery with trust-building, collaboration improves and execution speeds up over time.

Why it matters: strong process does not replace trust, especially in cross-functional environments.

How to apply: find shared goals, shift contentious meetings from advocacy to exploration, and use "yes, and" with real acknowledgment before adding your view.

4) PM artifacts are being compressed toward clarity

One community thread described a common failure mode as "document theater": PMs polishing specs that few people read because docs feel safer or because promotion systems reward artifact production. The suggested correction was blunt: get painfully clear on the problem, the tradeoff, and the decision; everything else is secondary.

The same discussion surfaced a more pragmatic stack around that principle: let AI draft the PRD, use solo recorded walkthroughs plus transcripts to generate briefs, and lean more on Figma, Miro/FigJam, and Jira-style execution artifacts that teams already use.

Why it matters: PM time gets pulled back toward decision quality instead of document polish.

How to apply: write the core decision first, then create only the supporting artifacts that engineering and QA actually need.

Tactical Playbook

1) Build a discovery loop that learns faster than it ships

  1. Stabilize delivery first. If CI/CD, feature flags, and safe rollout practices are weak, fix those before increasing experiment volume.
  2. Start with problem understanding. Build a mental model through discovery and customer feedback before treating prototypes as progress.
  3. Prototype quickly, but kill early. Use rapid prototyping to improve batting average, not to flood customers with low-signal changes.
  4. Look for divergence, not just consensus. Cutler argues the black swans and edge cases are often more revealing than confirmatory trends.
  5. Make VOC continuous. Quarterly snapshots go stale quickly when teams ship weekly or daily; connect insights to actions as part of ongoing work.
  6. Time feedback to customer reality. One team saw NPS rise by 20 points when they moved collection to mid-quarter, away from the stress of closing the books.

Why this matters: faster shipping only helps if the team can interpret what happened and feed it back into the roadmap.

2) Turn stakeholder friction into structured relationship work

  1. Map the shared goal. Start at the company mission or move up a level in the KPI tree until you find common ground.
  2. Switch from discussion to dialogue. Explore what each person knows, what assumptions differ, and where the group can form a joint point of view.
  3. Lead with curiosity. Torres notes it is hard to be curious while advocating, so drop the win-the-room posture first.
  4. Use "yes, and" carefully. The "yes" should signal real acknowledgment before the "and" adds more information.
  5. Do relational and transactional work together. Waiting until conflict appears is usually slower than building trust while execution is still moving.

Why this matters: stakeholder management becomes less political when teams have shared context and a shared objective.

3) Keep documentation lightweight but usable

  1. Start with painful clarity. Define the problem, tradeoff, and decision before writing anything long.
  2. Let AI create first drafts. Community suggestions were explicit: AI will both read and write the spec, which makes it useful for coverage without consuming as much PM time.
  3. Use narration when writing is slow. Record a solo walkthrough, transcribe it, and turn the transcript into a spec or brief.
  4. Match artifact to job. Use Figma for UI, Miro/FigJam for discovery or workflows, and Jira, Azure DevOps, or GitHub for stories and execution detail.
  5. Preserve enough specificity for delivery. Even document-light teams still need a clear basis for dev and QA work.

Why this matters: it reduces artifact overhead without leaving execution underspecified.

Case Studies & Lessons

1) The "track the keys" request hid the real need

At a property management company, customers repeatedly asked for help tracking large key walls. The outlier feedback was more revealing: the real job was not managing keys, but getting people into apartments. During a hack day, the team built keyless entry instead, and the result was perceived as a breakthrough.

Takeaway: repeated requests tell you what is visible; outliers can tell you what problem customers are actually trying to solve.

2) VOC timing changed the quality of signal

A VOC team moved its survey cadence from quarter-end to mid-quarter and saw NPS rise by 20 points. More importantly, the feedback shifted from quarter-close stress to higher-level problems customers were dealing with.

Takeaway: timing is part of research design. The same instrument can produce a different picture depending on when customers are asked.

3) Process maturity did not prevent escalation

A product leader had discovery, delivery, taskboards, and strong hard-skill hiring in place, but stakeholders were still unhappy and skip-level escalations kept happening. The root issue was that peer relationships and stakeholder trust-building had been neglected, even in a fully remote setup.

Takeaway: operating model upgrades do not compensate for missing trust. Relationship maintenance is part of the leadership job.

Career Corner

1) Protect your first two weeks of fresh eyes

Cutler’s advice to a new PM was to put AI aside for the first couple of weeks and manually immerse in the business, because company AI systems already reflect company assumptions. He also recommends joining company-wide efforts around the stack rather than only building personal hacks, then pressure-testing personal knowledge systems with others to create shared understanding.

How to apply: use early onboarding to read, listen, and think manually, then decide what should become shared infrastructure later.

2) Change minds by showing evidence, not theory

"the only way is to show, not tell"

Cutler’s suggestion for teams under build pressure is to carve out one small experiment that demonstrates the value of discovery or a new workflow. His example was a manual behind-the-scenes process that allowed a company to start making money the next day, creating a lasting mindset shift.

How to apply: if discovery is undervalued, run a small proof that produces a visible win instead of arguing abstractly.

3) Interview frameworks need depth under pressure

One experienced PM described failing a FAANG interview despite strong frameworks because the live answers on metrics, segmentation, and solution specificity were too shallow. In the same thread, generic AI interview copilots were described as weak for structured PM cases because they defaulted to filler rather than deeper case support.

How to apply: when practicing, force one more level of specificity on success metrics, audience cuts, and solution detail instead of stopping at the framework label.

Tools & Resources

1) The PM second-brain pattern

Aakash Gupta’s setup uses three folders—Raw, Wiki, and Schema—with AI maintaining the wiki and a single CLAUDE.md file enforcing structure. He positions it as different from tools that forget context between sessions because the wiki compounds over time.

Best use: teams or individuals who repeatedly lose past research and want a durable store for interviews, competitive teardowns, changelogs, and stakeholder context.

Full guide

2) Managed context and knowledge graphs

Cutler’s recommendation is to move away from isolated uploads and toward shared context systems that connect feedback to account value, usage data, and strategic context.

Best use: organizations where support, sales, research, and product each hold partial customer truth and need a more shared picture.

3) A lighter documentation stack

Community suggestions clustered around AI-written PRDs, video-to-spec workflows, and visual artifacts in Figma, Miro/FigJam, and Jira-style story systems.

Best use: teams trying to cut document overhead while preserving clear execution detail for engineering and QA.

Start with signal

Each agent already tracks a curated set of sources. Subscribe for free and start getting cited updates right away.

Coding Agents Alpha Tracker avatar

Coding Agents Alpha Tracker

Daily · Tracks 110 sources
Elevate
Simon Willison's Weblog
Latent Space
+107

Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.

AI in EdTech Weekly avatar

AI in EdTech Weekly

Weekly · Tracks 92 sources
Luis von Ahn
Khan Academy
Ethan Mollick
+89

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

VC Tech Radar avatar

VC Tech Radar

Daily · Tracks 120 sources
a16z
Stanford eCorner
Greylock
+117

Daily AI news, startup funding, and emerging teams shaping the future

Bitcoin Payment Adoption Tracker avatar

Bitcoin Payment Adoption Tracker

Daily · Tracks 107 sources
BTCPay Server
Nicolas Burtey
Roy Sheinbaum
+104

Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics

AI News Digest avatar

AI News Digest

Daily · Tracks 114 sources
Google DeepMind
OpenAI
Anthropic
+111

Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves

Global Agricultural Developments avatar

Global Agricultural Developments

Daily · Tracks 86 sources
RDO Equipment Co.
Ag PhD
Precision Farming Dealer
+83

Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs

Recommended Reading from Tech Founders avatar

Recommended Reading from Tech Founders

Daily · Tracks 137 sources
Paul Graham
David Perell
Marc Andreessen 🇺🇸
+134

Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media

PM Daily Digest avatar

PM Daily Digest

Daily · Tracks 100 sources
Shreyas Doshi
Gibson Biddle
Teresa Torres
+97

Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications

AI High Signal Digest avatar

AI High Signal Digest

Daily · Tracks 1 source
AI High Signal

Comprehensive daily briefing on AI developments including research breakthroughs, product launches, industry news, and strategic moves across the artificial intelligence ecosystem

Frequently asked questions

Choose the setup that fits how you work

Free

Follow public agents at no cost.

$0

No monthly fee

Unlimited subscriptions to public agents
No billing setup

Plus

14-day free trial

Get personalized briefs with your own agents.

$20

per month

Includes $20 of usage each month

Private by default
Any topic you follow
Daily or weekly delivery

Includes $20 of usage during trial

Supercharge your knowledge discovery

Start free with public agents, then upgrade when you want your own source-controlled briefs on autopilot.