Your intelligence agent for what matters

Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.


Your time, back

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, podcasts, X accounts, Substack, Reddit, and blogs. You can also follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

3 steps to your first brief

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Weekly report on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Startup funding digest with key venture capital trends
Weekly digest on longevity, health optimization, and wellness breakthroughs

2

Review and launch

Your agent finds relevant channels and profiles based on your instructions. Review the suggestions, keep what fits, remove what doesn't, and add your own. Launch when ready; you can adjust sources anytime.

Sam Altman (Profile)
3Blue1Brown (Channel)
Paul Graham (Account)
The Pragmatic Engineer (Substack newsletter)
r/MachineLearning (Reddit community)
Naval Ravikant (Profile)
AI High Signal (X list)
Stratechery (RSS feed)

3

Get your briefs

Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.

HUD, Reactor, and AMP Point to AI’s Next Infrastructure Layer
Jun 19
4 min read
792 docs
Karan Singhal
clem 🤗
Poolside
+9
New funding clustered around data and world-model infrastructure, while AMP, open weights, and test-time compute offered the strongest signals on where early AI investing and technical leverage may move next.

1) Funding & Deals

  • HUD — $16M total, Series A led by Dalton Caldwell. HUD is building a platform for high-quality post-training datasets plus a toolset and marketplace for RL environments. The company says more than 50 businesses already use it to build RL environments, sell them to AI labs, or train their own models. Backers named include Standard Capital, Y Combinator, Exceptional Capital, Liquid2 Ventures, 22VC, and angels including Dylan Patel, tszzl, Ivan Burazin, and Theo.

"HUD : ScaleAI :: Airbnb : Hilton"

  • Reactor — Lightspeed is leading the Series A around real-time video and world-model infrastructure. Reactor is building an infrastructure platform for real-time video and world models across interactive generative media, robotics, embodied AI, and hybrid movie/game experiences. The company says the capital will go to GPUs and cloud compute, team expansion, and R&D to improve model efficiency at scale.

2) Emerging Teams

  • AMP is the strongest new compute-market design in the set. Anjney Midha—previously at Discord’s developer platform and an investor in Anthropic, Mistral, Black Forest Labs, and Periodic—is building an independent compute grid intended to make FLOPs flow like megawatts across clouds and silicon. The target is 1.2GW of base-load capacity plus roughly 6GW of spike capacity over four years. The technical bench includes ex-Google scheduler builders Seb and Mihai.

  • Reactor’s founders are unusually on-thesis for world-model infrastructure. Alberto Tayutti previously built Luma AI’s core 3D and video foundation-model stack, while Bryce Schmidchen came from the early Apple Vision Pro / VisionOS effort and specializes in low-power, sub-10ms real-time systems. Early pull is coming from real-time media, educational apps, video editing, targeted advertising, and robotics.

  • LabGeni has a concrete enterprise-biotech validation signal. The Airstreet portfolio company partnered with LG Chem to develop tumor-targeting antibodies using its AI-driven platform.

  • Chion is a bottom-up data-tooling concept worth tracking. Its solo founder connects a read-only Postgres database, compiles analyst-verified SQL into a portable skill library, and exports those skills to Claude, Codex, Cursor, or any LLM via MCP. The wedge is reliability: teams reuse trusted queries instead of generating fresh SQL in meetings.
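A minimal sketch of the "reuse trusted queries" idea, assuming a hypothetical registry of analyst-verified SQL keyed by name. It uses Python's built-in sqlite3 in place of a read-only Postgres connection and does not reflect Chion's actual skill format or its MCP export; callers pick a vetted query and supply parameters instead of writing new SQL.

```python
import sqlite3

# Hypothetical library of analyst-verified queries. Each entry is reviewed SQL
# with named parameters, so callers pass values rather than writing fresh SQL.
VERIFIED_QUERIES = {
    "revenue_by_region": (
        "SELECT region, SUM(amount) AS revenue "
        "FROM orders WHERE order_date >= :since "
        "GROUP BY region ORDER BY region"
    ),
}


def run_verified(conn: sqlite3.Connection, name: str, **params):
    """Run a query from the verified library; unknown names are rejected."""
    if name not in VERIFIED_QUERIES:
        raise KeyError(f"no verified query named {name!r}")
    # Parameters are bound by the driver, never pasted into the SQL string.
    return conn.execute(VERIFIED_QUERIES[name], params).fetchall()


if __name__ == "__main__":
    # In-memory demo data; a real setup would connect with a read-only role.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL, order_date TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("EU", 100.0, "2026-06-01"), ("US", 250.0, "2026-06-02"),
         ("EU", 50.0, "2026-05-01")],
    )
    print(run_verified(conn, "revenue_by_region", since="2026-06-01"))
```

The design choice is the allow-list: an agent can only choose among vetted queries, which is where the reliability claim comes from.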

3) AI & Tech Breakthroughs

  • Rare-disease diagnosis is becoming a concrete test-time-compute use case. Published evidence now suggests reasoning models can help with rare undiagnosed diseases in some of the hardest pediatric cases.

  • Poolside pushed further into open weights. The company released Laguna M.1, its most capable model, with 256K context; both base and post-trained checkpoints are on Hugging Face under Apache 2.0.

"Open weights are now our default"

  • World models are looking like a separate infra category, not an LLM add-on. Reactor argues that real-time, stateful, interactive generation changes the full stack—from inference and GPU/cloud orchestration to streaming, networking, APIs, and developer experience—and sees a new open-source world-model wave that rhymes with the LLM infra buildout.

  • Model composition is becoming more explicit at the agent layer. Bindu Reddy outlined different pairings for backend coding, search, video, image, massively parallel work, and expert coding, suggesting model routing and combination are becoming product primitives rather than hidden implementation detail.
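Model routing as a product primitive can be as small as a lookup table. The sketch below is hypothetical: the task categories mirror the ones listed above, but the model identifiers are placeholders, not Bindu Reddy's actual pairings.

```python
# Hypothetical routing table: task category -> model identifier.
# Categories mirror the list above; model names are placeholders.
ROUTES: dict[str, str] = {
    "backend_coding": "coding-model",
    "search": "search-model",
    "video": "video-model",
    "image": "image-model",
    "massively_parallel": "small-fast-model",
    "expert_coding": "frontier-model",
}
DEFAULT_MODEL = "general-model"


def route(task_kind: str) -> str:
    """Pick the model for a task category, falling back to a general model."""
    return ROUTES.get(task_kind, DEFAULT_MODEL)


if __name__ == "__main__":
    for kind in ("search", "expert_coding", "summarize"):
        print(kind, "->", route(kind))
```

Keeping the pairings as data rather than scattering them through code is what makes routing an explicit, reviewable primitive instead of a hidden implementation detail.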

4) Market Signals

  • Startup-native AI is still the dominant investor posture. Foundation Capital says the best AI products it is seeing come from Bay Area founders and early-stage startups, not from established companies, and explicitly says this moment favors startups building from scratch.

  • Efficiency is becoming doctrine on both the compute and product sides. Anjney calls the frontier-systems mindset "output maxing" rather than brute-force scaling, while Foundation Capital is pushing founders toward 12-24 hour loops from customer conversation to shipped feature.

  • The next infra layer is forming around bottlenecks in memory, heterogeneous compute, and agent governance. Foundation Capital is watching KV-cache efficiency, CUDA-virtualization-style layers for mixed chip environments, and telemetry/governance tools for billions of agents. AMP is attacking the same scarcity problem via scheduling and utilization across multi-cloud, multi-silicon supply.

  • Test-time compute may create another demand kink in inference. A 20VC discussion argued that frontier models keep improving as more compute is applied at inference time, with no clear wall yet identified; the rare-disease diagnosis result above is one example of that thesis showing up in a real application.

5) Worth Your Time

  • Latent Space: The Professor of Outputmaxxing — Anjney Midha, AMP — useful for understanding compute pooling, utilization discipline, and why an ISO-style control layer could emerge in AI infrastructure.

  • HUD founder interview — useful for a primary-source walkthrough of HUD’s RL-environment marketplace thesis.

  • Laguna M.1 collection — useful if you are tracking the strength of the open-weights camp and Poolside’s Apache 2.0 posture.

  • Reactor on why world models require a new stack

  • Foundation Capital on compressing founder cycle time

OpenAI Advances Health AI as New Benchmarks Expose Agent Limits
Jun 19 · 4 min read · 919 docs
Sources: clem 🤗, Poolside, Dean W. Ball, and 19 more
OpenAI’s health-focused model update and rare-disease study led the day, while new evaluations showed how far frontier agents still are from reliable long-horizon work. The rest of the brief covers memory systems, reusable skills, open-weight strategy, and a new White House-Anthropic jailbreak framework.

Top Stories

Why it matters: today’s biggest signals showed where AI is getting more useful in high-stakes settings, where agents still fall short, and where open models are becoming more practical.

  • OpenAI pushed health AI on both product and research fronts. GPT-5.5 Instant is now on par with OpenAI’s frontier Thinking models for health questions, with better urgent-care detection, context gathering, and uncertainty communication across the 230M+ weekly health queries ChatGPT sees; possible factuality errors fell 71%, and the model is free to all users. In parallel, OpenAI, Boston Children’s Hospital, and Harvard reported in NEJM AI that o3 Deep Research helped clinicians find 18 diagnoses across 376 previously unsolved pediatric cases, with every result undergoing human adjudication.

  • New agent benchmarks were a reality check for long-horizon work. AA-Briefcase evaluates multi-week projects with thousands of messy inputs, including documents, transcripts, 25,000+ Slack messages, and 3,500+ emails. Claude Fable 5 leads at 1587 Elo, but it satisfies all rubric criteria on only 3% of tasks, and no model clears 50% on 31 of 91 tasks. Terminal-Bench Challenges reported a similar pattern: even the strongest frontier models still score very low on large-scale autonomous software tasks.

  • GLM-5.2 kept strengthening the case for open models. It is now the top open model on Agent Arena at #10 overall, scored 1266 Elo on AA-Briefcase at an average cost of $2.40 per task, and can now run locally in a 2-bit version that shrinks from 1.51TB to 238GB while retaining about 82% accuracy. The notable shift is that the story is no longer just leaderboard strength; it is also price and local execution.
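A quick sanity check on the GLM-5.2 size figures, assuming the 753B total parameter count reported for the model in the coding brief below, decimal units (1 TB = 10^12 bytes), and a 16-bit baseline for the 1.51TB figure. Under those assumptions, 16-bit weights come to about 1.51 TB, a pure 2-bit encoding would be about 188 GB, and the reported 238GB build averages roughly 2.53 bits per weight (about 6.3x smaller). That average suggests some tensors are stored above 2 bits, but this is an inference, not something the brief states.

```python
PARAMS = 753e9  # total parameters (753B MoE); assumed to be the full count


def size_bytes(params: float, bits_per_weight: float) -> float:
    """Weight storage in bytes, ignoring metadata and file-format overhead."""
    return params * bits_per_weight / 8


full = size_bytes(PARAMS, 16)       # assumed 16-bit baseline
pure_2bit = size_bytes(PARAMS, 2)   # if every weight were exactly 2 bits
reported = 238e9                    # reported size of the 2-bit build
avg_bits = reported * 8 / PARAMS    # implied average bits per weight

print(f"16-bit: {full / 1e12:.2f} TB")
print(f"pure 2-bit: {pure_2bit / 1e9:.0f} GB")
print(f"implied average: {avg_bits:.2f} bits/weight")
print(f"compression vs. 1.51TB: {1.51e12 / reported:.1f}x")
```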

Research & Innovation

Why it matters: the most interesting technical work today focused on alignment that transfers across domains and on faster ways to customize models.

  • OpenAI released new work on broadly beneficial RL. After reinforcement learning on realistic conversations across 12 domains, the trained model improved on 44 of 53 independent evaluations spanning deception, reward hacking, safety, health, and mental health. Health-only training also improved non-health misalignment, deception, and reward-hacking evaluations, and the model was harder to steer toward harmful behavior with adversarial prompts.

  • Sakana AI introduced Doc-to-LoRA and Text-to-LoRA. The methods use a hypernetwork to generate LoRA adapters on demand, letting models specialize to new tasks or internalize documents with sub-second latency. In experiments, Doc-to-LoRA reached near-perfect needle-in-a-haystack accuracy on inputs five times longer than the base model’s context window and could transfer visual information from a vision-language model into a text-only LLM.
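For readers new to LoRA, the standard low-rank adapter update is shown below, together with a generic statement of what "generating adapters with a hypernetwork" means. This is the textbook formulation, not Sakana AI's published architecture: W is a frozen weight matrix, D is the input document or task description, and the encoder enc and hypernetwork h_φ are placeholders.

```latex
\begin{aligned}
W' &= W + \Delta W = W + BA,
  \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k) \\
(A, B) &= h_\phi\big(\operatorname{enc}(D)\big)
\end{aligned}
```

Because the hypernetwork only has to emit r(d + k) numbers per adapted matrix instead of dk, one forward pass can produce an adapter without any gradient steps, which is one intuition for how on-demand specialization can run with low latency.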

Products & Launches

Why it matters: product releases are moving from chat responses toward memory, reusable skills, and better team-facing outputs.

  • Perplexity launched Brain in Computer, a continuously learning memory system that builds a context graph from sessions, files, and connectors; on context-heavy tasks it improved answer correctness by 25% and recall by 16%, and ran 13% cheaper per task.

  • Claude Code added Artifacts: interactive pages built from a session, such as PR walkthroughs or living dashboards, shared through private team links on Team and Enterprise plans.

  • OpenAI added Codex Record & Replay, which turns a demonstrated recurring workflow into an inspectable, editable skill; recording is user-controlled, and the rollout starts in select markets.

Industry Moves

Why it matters: companies are making bigger bets on policy influence, open-weight positioning, and new infrastructure layers for output quality.

  • OpenAI hired Dean Ball to lead a new Strategic Futures team focused on shaping frontier AI policy, starting July 6.

  • Poolside paired a model release with a clearer strategy signal. It released Laguna M.1 under Apache 2.0 and said open weights are now its default.

  • Taste Labs emerged from stealth with an $18.5M seed round. Its pitch is building the data and infrastructure layer that gives models and agents taste, and it says it is already working with frontier labs on post-training data and RL environments.

Policy & Regulation

Why it matters: AI governance is becoming more technical and more operational, not just a debate about principles.

  • The White House and Anthropic are developing a formal jailbreak-severity framework, with proposed benchmarks for how far safeguards were bypassed, what capabilities were exposed, and the practical consequences of a breach.

  • Google DeepMind published its AI Control Roadmap for managing advanced AI systems inside Google, arguing that most agent failures come from misinterpreting commands or over-pursuing goals, and warning that there is a narrow window to embed structural security protocols before multi-agent systems scale.

Quick Takes

Why it matters: these smaller releases still point to where tooling and infrastructure are improving fastest.

  • Liquid AI released multilingual retrieval models with end-to-end latency as low as 1.5ms across 11 languages.
  • VS Code now lets users bring any model, including local models, to Chat without a GitHub Copilot account.
  • Devin now performs automatic security reviews on every PR, ranks findings by severity, and drafts merge-ready fixes.

Loop Engineering, Record & Replay, and New Automation Primitives
Jun 19 · 6 min read · 149 docs
Sources: Peter Steinberger 🦞, Addy Osmani, Geoffrey Huntley, and 17 more
The strongest coding-agent signal today is the shift from manual prompting to durable loops. This brief covers the concrete workflows behind self-driving PRs, shared-state agent harnesses, and the latest releases from Codex, Cursor, Claude Code, LangSmith, and Datasette.

🔥 TOP SIGNAL

The clearest shift today is from manual prompting to loop design. Theo had Codex clear stale PRs overnight and woke up to four stacked PRs reviewed and merged; Jason Zhou described support and SEO loops already running in production on 30-minute and daily cadences; and Steve Yegge’s write-up of Ezra Savard’s Netflix study treats single-agent and multi-agent use as distinct literacy jumps, with dedicated training for each. The common pattern across Addy Osmani and Geoffrey Huntley: the advantage comes from a harness that can sleep, checkpoint state, recycle context, and use a separate evaluator, rather than from a better one-shot prompt.
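Two of the harness properties named above, durable checkpoints and a separate evaluator, can be illustrated with a small state machine. This is a hypothetical sketch, not Addy Osmani's or Geoffrey Huntley's code: the step names and the checkpoint file are made up, event-driven sleep is left out, and `work` and `evaluator` stand in for separate model calls. State is written to JSON after every transition, so a restarted process resumes where it stopped.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def load_checkpoint(path: Path) -> dict:
    """Resume from the last saved state, or start a fresh run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"step": "plan", "attempts": 0}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write a temp file, then atomically swap it in, so a crash never leaves half a checkpoint."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def run(path: Path, work: Callable[[str], None], evaluator: Callable[[], bool],
        max_attempts: int = 3) -> dict:
    """Advance plan -> build -> evaluate -> done, checkpointing after every transition."""
    state = load_checkpoint(path)
    while state["step"] != "done":
        step = state["step"]
        if step == "plan":
            work("plan")
            state["step"] = "build"
        elif step == "build":
            work("build")
            state["step"] = "evaluate"
        elif step == "evaluate":
            # A separate evaluator grades the build; the builder never grades itself.
            state["attempts"] += 1
            if evaluator():
                state["step"] = "done"
            elif state["attempts"] >= max_attempts:
                state.update(step="done", failed=True)
            else:
                state["step"] = "build"
        else:
            raise ValueError(f"unknown step in checkpoint: {step!r}")
        save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = Path(d) / "agent_state.json"
        verdicts = iter([False, True])  # evaluator rejects the first build
        print(run(ckpt, lambda s: print("working on", s), lambda: next(verdicts)))
```

The atomic write via `os.replace` is the design choice that makes "checkpoint on every transition" safe: the file on disk is always either the old state or the new one.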

⚡ TRY THIS

  • Run a repo-maintainer loop instead of a cleanup sprint. Steipete’s exact pattern is to tell Codex to maintain your repos, wake every 5 minutes, and direct work to threads, backed by an orchestrator plus triage, autoreview, and computer-use skills. Theo’s concrete use: let the loop close useless stale PRs, revive the worthwhile ones, then give each revived PR one build thread and one review thread. For a big migration, he also bumped Codex subagent parallelism from 3 to 20 and set a sharply defined goal. Study the exact skill docs here: maintainer-orchestrator and github-project-triage.

  • Move PR review handling off your keyboard. Theo’s next step was giving a PR its own worktree on another machine, then telling the agent to watch for comments, address them, and keep going; one run kept working for 6+ hours. After the code lands, have the agent run the dev server, verify behavior, commit, push the PR, fetch review comments itself, and even spin up reviewer threads; his dynamic loop created PRs, re-reviewed each new SHA, merged, and triggered the next PR automatically. Watch token burn on bad branches: Theo saw one feedback loop chew through 3M+ tokens on a small set of comments.

  • Turn a good one-off run into a shared-state loop. Jason Zhou’s setup flow is practical: manually run the task once, calibrate the behavior, then ask the agent to create a README contract with the goal, workflow, timeline, and schema before wiring a recurring trigger. Put outputs into shared folders for artifacts, signals, and tasks so other loops can read and write the same state, and add a global worklog.md so each agent reads the last 5-10 entries before starting. Triggers can be cron jobs, webhooks, or other agents.

  • Split planner, builder, and reviewer at both the agent and model layers. Addy Osmani’s minimum bar for long-running agents is true sleep via events, durable checkpoints on every transition, and a separate evaluator, because self-review overrates quality. Matthew Berman’s concrete implementation is model routing as a skill: plan with Fable, write with Composer, then review with GPT-5.5. Geoffrey Huntley’s simpler orchestrator constraint is also worth stealing: allow one task only, recycle the context window after each task, and progress state with git commits plus a todo list.
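The shared worklog convention from Jason Zhou's setup is simple enough to sketch. The helpers below are hypothetical; only the worklog.md file name and the "read the last 5-10 entries before starting" rule come from the brief. They assume one entry per line and append-only writes; with several loops writing concurrently you may also want file locking.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_entry(log: Path, agent: str, message: str) -> None:
    """Append one timestamped line; every loop writes to the same file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    with log.open("a", encoding="utf-8") as f:
        f.write(f"- {stamp} [{agent}] {message}\n")


def recent_entries(log: Path, n: int = 10) -> list[str]:
    """Return the last n entries, which an agent reads before it starts work."""
    if n <= 0 or not log.exists():
        return []
    entries = [line for line in log.read_text(encoding="utf-8").splitlines()
               if line.startswith("- ")]
    return entries[-n:]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = Path(d) / "worklog.md"
        for i in range(12):
            append_entry(log, "seo-loop", f"published draft {i}")
        for line in recent_entries(log, n=5):  # the brief suggests 5-10
            print(line)
```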

📡 WHAT SHIPPED

  • Codex — Record & Replay. OpenAI shipped a new primitive: record a recurring task once, stop recording when you want, and Codex turns the session into an inspectable, editable skill. Greg Brockman framed it as teaching Codex by demonstration, and Nick Baumann says he’s already using it for calendar formatting, PR-to-Slack posting, and onboarding-flow testing.
  • Cursor — /automate + new triggers. Cursor added a plain-language /automate skill that configures triggers, instructions, and tools for you, plus Slack emoji triggers, GitHub triggers for issues/reviews/workflow runs, and computer use for cloud agents. Changelog: cursor.com/changelog/06-18-26.
  • Claude Code — Artifacts (beta). Team and Enterprise users can turn a session into an interactive page, such as a PR walkthrough or living project dashboard, then share it via private link. Boris Cherny says he’s using it for visual explanations of tricky code, system diagrams, animation previews, and shared dashboards; Mike Krieger’s tip is to ask Claude to diagram its work as tasks get deeper and more independent; @_catwu says teams are already using it to share architecture changes, analyses, and prototypes.
  • LangSmith — LLM Gateway. LangChain launched a gateway positioned as a budget guardrail against agents burning through large LLM bills overnight. Link: Introducing LLM Gateway. Timely context: Theo said his Codex loops drove more than $20,000 in inference over 48 hours.
  • Datasette Agent / Datasette Apps. Simon Willison’s latest write-up shows an unusually clean coding-agent workflow: describe an app in chat, let the agent call describe_table, then app_create, and generate a single-file HTML app against a constrained API. His build stack is also a useful comparison point: Claude Opus 4.6 for the first plugin, Codex Desktop + GPT-5.5 for planning, and Claude Fable 5 for security review, which caught a real CSP privilege-escalation issue.
  • GLM-5.2. Simon notes the 753B MoE model has a 1M context window and open weights under MIT, ranks #2 on the Code Arena WebDev leaderboard behind only Claude Fable 5, and is listed on OpenRouter at around $1.40 / $4.40 per million input/output tokens. In his testing it did especially well on animated SVG output, though one more complex illustration regressed versus GLM-5.1.
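Runaway loop spend (3M+ tokens on one branch, more than $20,000 over 48 hours) is the problem the LLM Gateway item targets. The sketch below is a hypothetical in-process guard, not LangSmith's API: it prices each finished call with the per-million-token rates quoted for GLM-5.2 on OpenRouter (about $1.40 input and $4.40 output, which may change) and raises once cumulative spend passes a hard cap, so the surrounding loop stops.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised once cumulative spend passes the cap, so the loop can stop."""


@dataclass
class BudgetGuard:
    cap_usd: float
    input_per_m: float = 1.40   # USD per million input tokens (quoted rate; may change)
    output_per_m: float = 4.40  # USD per million output tokens
    spent_usd: float = 0.0

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Price one call from its token counts."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Record a finished call and raise if the running total is over the cap."""
        c = self.cost(input_tokens, output_tokens)
        self.spent_usd += c
        if self.spent_usd > self.cap_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f}, cap is ${self.cap_usd:.2f}")
        return c


if __name__ == "__main__":
    guard = BudgetGuard(cap_usd=10.0)
    print(f"${guard.cost(2_000_000, 500_000):.2f} per call")  # $2.80 input + $2.20 output
    try:
        for _ in range(5):  # stands in for an agent loop
            guard.charge(2_000_000, 500_000)
    except BudgetExceeded as err:
        print("stopping loop:", err)
```

Because token counts are only known after a call returns, this guard stops the loop after the call that crosses the cap rather than before it; a gateway sitting in front of the provider can enforce limits earlier.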

🎬 GO DEEPER

  • 12:28-13:26 — Theo on loops that create more loops. Short demo of the agentic endgame: one thread makes the PR, another reviews each new SHA, fixes get re-reviewed, then the PR merges and the next one starts.
  • 18:24-19:29 — AI Jason on the handoff from manual run to production loop. He shows the exact move most people skip: test the workflow once, then make the agent write a README contract and wire the recurring trigger around it.
  • 1:03-3:17 — Addy Osmani on why long-running agents fail. Compact explanation of the three requirements: event-driven sleep, durable checkpoints, and a separate evaluator instead of self-grading.
  • 1:33-2:29 — Geoffrey Huntley on Ralph loops. A good antidote to the "while true" meme: single-task constraint, context recycling, and state progression via git commit + todo list.
  • Read Steve Yegge’s Netflix training note: The Flat Curve Society. Useful if you’re rolling agents out to a team: 0M / 4M / 12-15M qualified-day token cohorts, team-based training, and the shift from raw spend metrics to waste reduction and pocket evals.
  • Study the exact skills behind the maintainer loop: maintainer-orchestrator and github-project-triage. These are the concrete skill docs steipete says he combines with triage, autoreview, and computer use so work can land autonomously.
  • Study Datasette Agent + the Datasette Apps article. It’s a strong example of an agent with explicit tools, constrained APIs, and a copyable prompt template that other models can reuse.
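A Ralph-style loop, as described in the Huntley item above, takes one task per iteration, starts each iteration with a fresh context, and records progress outside the model. The sketch below is hypothetical: `run_agent` stands in for whatever model call you use, and a `commit` callback replaces the git commit so the example runs without a repository.

```python
from typing import Callable


def ralph_loop(todo: list[str],
               run_agent: Callable[[str, str], bool],
               commit: Callable[[str], None],
               max_iterations: int = 100) -> list[str]:
    """Work a todo list one task at a time and return whatever is still open.

    Each iteration builds a brand-new context from the current task and the
    remaining list only, so nothing from earlier attempts leaks in; progress
    survives only through `commit` and the todo list itself.
    """
    remaining = list(todo)
    for _ in range(max_iterations):
        if not remaining:
            break
        task = remaining[0]  # single-task constraint: only the head of the list
        context = f"Task: {task}\nStill open after this: {remaining[1:]}"
        if run_agent(task, context):
            remaining.pop(0)
            commit(f"done: {task}")  # stands in for a git commit
        # On failure the task stays at the head and is retried with a fresh context.
    return remaining


if __name__ == "__main__":
    log: list[str] = []
    left = ralph_loop(["add tests", "fix lint"], lambda task, ctx: True, log.append)
    print(log, left)
```

The `max_iterations` cap is an added safeguard, not part of the described pattern; without it, a task the agent can never finish would retry forever.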

Editorial take: the winners are starting to look less like prompt whisperers and more like workflow engineers with budgets, checkpoints, and reusable state.

Health AI Expands, Open Models Close Gaps, and the Grid Becomes an AI Issue
Jun 19 · 4 min read · 323 docs
Sources: Tanishq Mathew Abraham, Ph.D., Midjourney, Nathan Benaich, and 8 more
Today’s biggest signals came from healthcare and biology: OpenAI paired a broad health upgrade with published rare-disease results, Profluent signed a $2.25B Lilly deal, and Midjourney surfaced a medical imaging project. Elsewhere, new benchmark data showed open-weight momentum amid persistent agent limits, while labs and policymakers focused on deeper safety and infrastructure questions.

Health and biology led the day

OpenAI paired a broad health rollout with published clinical evidence

OpenAI said GPT-5.5 Instant is now on par with its frontier Thinking models for health-related questions, with better urgent-care recognition, context gathering, uncertainty explanation, and clarity across more than 230 million weekly health and wellness queries. The update is available to all free ChatGPT users and was shaped with feedback from hundreds of physicians across 60 countries, 49 languages, and 26 specialties. Separately, OpenAI, Boston Children’s Hospital, and Harvard published a study in NEJM AI showing that o3 Deep Research helped clinicians identify 18 diagnoses across 376 previously unsolved rare pediatric disease cases, with every result going through human adjudication and clinical confirmation.

Why it matters: one announcement widened access to health guidance inside ChatGPT, while the other tested AI inside an expert-led rare-disease reanalysis workflow whose cases had already resisted years of specialist review.

Profluent signed a $2.25B Lilly deal for AI-designed gene editors

Profluent said it signed a $2.25 billion milestone deal with Eli Lilly to develop AI-designed gene editors for therapeutic large-gene insertion, framing the work as an example of AI unlocking a problem that could not previously be solved this way. The company says its transformer-based sequence models are trained on more than 100 billion protein sequences and used to generate proteins from scratch; it also pointed to OpenCRISPR as the first demonstration of AI-generated functional gene editors, and said peer-reviewed comparisons found sequence models outperforming structure-based approaches on complex multi-domain proteins.

Why it matters: this is a large commercial signal for generative biology, and it ties frontier-model methods directly to therapeutic gene-editing programs rather than to discovery tooling alone.

Midjourney surfaced a new medical imaging project with clear tradeoffs

Midjourney published a technical deep dive on a new "Scanner" project, which François Chollet described as a hardware effort for full-body internal 3D scans without MRI. A separate technical summary described the system as radiation-free, magnet-free, fast, and low-cost, while also noting current constraints: it requires a water immersion tank, and its resolution is still coarser than CT or MRI.

Why it matters: it is a notable expansion from an AI image company into medical hardware, but the present limitations are substantial and part of the story.

Open-weight competition kept getting stronger

A new benchmark showed both momentum and stubborn limits

Artificial Analysis launched AA-Briefcase, a benchmark for long-horizon knowledge work across multi-week projects with thousands of fragmented inputs, including 25,000+ Slack messages and 3,500+ emails. Its headline result was sobering: the top model, Claude Fable 5, satisfied all rubric criteria on just 3% of tasks, and no model scored above 50% on 31 of 91 tasks. Within that field, GLM-5.2 was the next-best non-Anthropic model at 1266 Elo and one of the strongest price/performance options, at $2.40 per task versus $31 for Claude Fable 5. Poolside added to the open-weight push by releasing Apache 2.0 weights for its 256K-context Laguna M.1 and saying that "open weights are now our default."

Why it matters: open-weight models are getting more competitive on cost and capability, but the benchmark also underscores how far the field still is from reliable end-to-end agentic knowledge work.

Safety work is moving below the interface layer

OpenAI and DeepMind both argued for more structural approaches

"Instead of assuming AI will always do what we intend, we ask: what if it doesn’t?"

OpenAI said its new work on broadly beneficial reinforcement learning used realistic conversations across 12 domains and improved a compute-matched model on 44 of 53 independent evaluations spanning deception, reward hacking, safety, health, and mental health. It also reported cross-domain transfer: training only on health conversations improved non-health misalignment evaluations. The company added that the trained model was harder to steer toward harmful behavior with adversarial prompts and showed preliminary resistance to harmful fine-tuning while remaining responsive to helpful instructions. In parallel, Google DeepMind published an AI Control Roadmap arguing that most agent failures come from misinterpreting commands or becoming over-enthusiastic, and that there is a narrow window to embed structural security protocols before multi-agent systems scale globally.

Why it matters: both efforts point toward safety techniques that try to shape persistent behavior and system design, rather than relying only on after-the-fact prompt guardrails.

AI infrastructure is becoming energy policy

FERC took a meaningful step on large-load interconnection

FERC issued a large-load interconnection milestone that affects how AI factories, semiconductor fabrication support systems, and advanced manufacturing facilities connect to the grid. The policy direction highlighted in the announcement includes large-load customers funding their own network upgrades, bringing new generation online, and offering flexible load; customers that can demonstrate flexibility may qualify for accelerated study timelines as short as 60 days. NVIDIA also said it and Emerald AI are already working on flexible AI factories designed as grid assets, with commercial deployment beginning later this year.

Why it matters: AI capacity planning is no longer just a chip and data-center story; grid access and load flexibility are becoming part of the competitive stack too.

Pew’s AI Adoption Report and a Favorite Essay on Real-World Complexity
Jun 19 · 2 min read · 173 docs
Sources: Marc Andreessen 🇺🇸, 20VC with Harry Stebbings
Today’s strongest authentic recommendations pair a new Pew snapshot of AI chatbot adoption with a favorite mini-essay about hidden complexity in the physical world. Marc Andreessen shared the data-rich report, while a Benchmark partner highlighted an essay he said he really loves.

What stood out

Today’s authentic recommendations split into two useful kinds of learning resources: one offers fresh data on how far AI chatbot adoption has spread, and the other offers a mental model for why physical-world work is harder than it first appears.

Most compelling recommendation

reality has a surprising amount of detail

  • Content type: Blog post / essay
  • Author/creator: Not specified in the notes
  • Link/URL: Not provided in the source notes
  • Who recommended it: Ev, Benchmark partner
  • Key takeaway: He said he really loves the piece because it uses the example of building stairs to show how much hidden complexity exists in the real, physical world.
  • Why it matters: This was the strongest explicit personal endorsement in today’s set, and it points readers to a compact lesson about real-world complexity that goes beyond the specific example.

"But the whole point is like in the real world, in the physical world, stuff is just really complex."

Also worth reading

Americans and AI 2026: Chatbots, Smart Devices, and Views on Impact

  • Content type: Article
  • Author/creator: Pew Research Center
  • Link/URL: https://www.pewresearch.org/internet/2026/06/17/americans-and-ai-2026-chatbots-smart-devices-and-views-on-impact/
  • Who recommended it: Marc Andreessen
  • Key takeaway: Andreessen shared the report to support his view that AI chatbots may be the most rapidly democratized technology in history, highlighting that about half of US adults now use them and one in four use them daily.
  • Why it matters: If you want a grounded benchmark for how quickly AI use has moved into the mainstream, this is the most concrete resource in today’s set .

"About half of US adults now report using AI chatbots, up substantially from the summer of 2024. One-in-four use these tools on daily basis."

If you only pick one

Start with "reality has a surprising amount of detail" for the clearest personal recommendation and the most general lesson in today’s set. Then read the Pew report for a hard-data view of how quickly AI tools have already spread.

Independent AI Review Loops and the Feedback Habits Behind Profitability
Jun 19 · 4 min read · 50 docs
Sources: The community for ventures designed to scale rapidly | Read our rules before posting ❤️, Aakash Gupta
This brief highlights one important AI execution pattern for PMs—separating workers from judges—and a startup case study on reaching profitability through stronger feedback loops, onboarding, analytics, Android expansion, and partnerships.

Big Ideas

  • AI execution is moving toward independent review. Aakash Gupta notes that OpenAI and Anthropic converged on a "separation of duties" pattern after hitting the same failure: agents were approving half-built work. The fix is structural: one model executes, and a separate model verifies whether the output met a stated condition. Why it matters: if PMs are delegating work to AI, the leverage point shifts from better prompting to clearer success criteria and stronger review design. How to apply it: separate "do the work" from "judge completion," and make the judge answer one concrete pass/fail question.

"The worker never gets a vote on its own completion."

  • Profitability often starts with better listening, not bigger roadmaps. In one startup account, the early problems were bugs, weak design, poor feedback habits, bad analytics, and building features users did not want, including 5-minute summaries when customers preferred longer ones. Why it matters: PM errors often start when teams miss or misread user signals. How to apply it: treat instrumentation, direct feedback, and post-cancellation learning as core product work.

Tactical Playbook

  1. Run AI work with a worker/judge loop.

    1. Define the completion condition before execution
    2. Let the worker model do the task
    3. Give a separate judge model the transcript and ask only whether the condition was met
    4. Keep iterating until the proof is visible; in Gupta's example, the judge rejected premature completion claims until evidence appeared

    Example: a bug backlog that one-shot prompting left 12 issues deep was cleared in 31 unsupervised turns: 11 fixes passed tests, 2 issues were correctly marked blocked, and 1 duplicate was caught.

  2. Build a tighter product-feedback system.

    • Add an in-app feedback form
    • Pay a small set of users for detailed input; this team paid select users $100
    • Ask for reviews after clear AHA moments such as finishing a summary or quiz
    • Review competitor feedback weekly
    • Email cancelled users to learn why they left
    • Run user testing when the UI feels unintuitive

    Why it matters: this gives PMs a steady evidence pipeline for prioritization instead of relying on assumptions.
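
One way to implement the "ask for reviews after clear AHA moments" step is an event-gated prompt. This Python sketch assumes hypothetical analytics event names (`summary_finished`, `quiz_completed`) and an assumed once-per-user rule; neither comes from the team's actual app.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Set

# Hypothetical analytics events treated as AHA moments; swap in your own.
AHA_EVENTS = {"summary_finished", "quiz_completed"}


@dataclass
class ReviewPrompter:
    """Decides whether to show a review prompt after a product event.

    Assumed policy: prompt only right after an AHA event, at most once per user.
    """

    asked: Set[str] = field(default_factory=set)

    def should_ask(self, user_id: str, event: str) -> bool:
        if event not in AHA_EVENTS or user_id in self.asked:
            return False
        self.asked.add(user_id)  # remember so the user is not asked again
        return True


prompter = ReviewPrompter()
events = ["app_opened", "summary_finished", "quiz_completed"]
print([prompter.should_ask("u1", e) for e in events])  # → [False, True, False]
```

Gating on specific events keeps the request tied to a moment of demonstrated value rather than to app launch or a timer.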

Case Studies & Lessons

  • A book-summary app reached profitability by correcting bad assumptions. After early quality and product mistakes, the team shifted toward what users actually wanted and added differentiators including text, audio, video, and visual summaries, quizzes, infographics, AI "Ask a Book," AI reading plans, and gamification. They also launched Android despite assuming only iOS users would pay; Android became a meaningful revenue driver. Personalized onboarding increased conversion, and a switch to Amplitude made analytics easier to use and broadened tracking. The founder also says corporate partnerships were a major factor in reaching profitability. What PMs should take from it:
    • Re-test willingness-to-pay assumptions by platform or segment
    • Treat onboarding as a conversion lever, not just setup
    • Use AI as differentiation only when it supports real user demand

Career Corner

  • Practice writing testable outcomes. The AI-agent example shows that vague completion criteria create false positives, while clear pass/fail conditions let a separate judge catch unfinished work. Why it matters for PMs: this is the same skill behind strong specs, crisp success metrics, and cleaner stakeholder alignment. How to build it: rewrite delegated tasks so they include observable proof of completion, plus valid blocked or duplicate states.

  • Keep one recurring user-learning ritual on your calendar. Weekly competitor review analysis, cancellation follow-ups, and direct user testing helped this founder identify what to fix. Why it matters: staying close to raw user language improves prioritization judgment. How to build it: own at least one weekly feedback review yourself.
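
The first Career Corner habit, writing delegated tasks with observable proof plus valid blocked or duplicate states, can be sketched as a small outcome check. The `Outcome` and `Evidence` names and the precedence order (duplicate, then blocked, then passing tests) are illustrative assumptions, not a published template.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DONE = "done"            # observable proof: tests pass
    BLOCKED = "blocked"      # valid terminal state with a stated reason
    DUPLICATE = "duplicate"  # valid terminal state pointing at another issue
    NOT_DONE = "not_done"    # anything else, including unsupported claims


@dataclass
class Evidence:
    tests_passed: bool = False
    blocker_reason: Optional[str] = None
    duplicate_of: Optional[str] = None
    claims_done: bool = False  # what the worker says; deliberately ignored below


def judge_outcome(ev: Evidence) -> Outcome:
    """Map observable evidence to exactly one outcome."""
    if ev.duplicate_of:
        return Outcome.DUPLICATE
    if ev.blocker_reason:
        return Outcome.BLOCKED
    if ev.tests_passed:
        return Outcome.DONE
    return Outcome.NOT_DONE


print(judge_outcome(Evidence(claims_done=True)).value)   # → not_done
print(judge_outcome(Evidence(tests_passed=True)).value)  # → done
```

Because `claims_done` never influences the result, a bare "I'm finished" cannot produce a false positive; only evidence moves a task into a terminal state.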

Tools & Resources

  • Aakash Gupta's PM playbook and goal templates for structuring AI work around explicit success conditions and review criteria
  • Amplitude is worth exploring if your current analytics setup is hard to use; in this case, the team switched, found it easier to work with, and started tracking much more broadly

Start with signal

Each agent already tracks a curated set of sources. Subscribe for free and start getting cited updates right away.

Coding Agents Alpha Tracker

Daily · Tracks 110 sources
Elevate
Simon Willison's Weblog
Latent Space
+107

Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.

AI in EdTech Weekly

Weekly · Tracks 92 sources
Luis von Ahn
Khan Academy
Ethan Mollick
+89

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning, covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

VC Tech Radar

Daily · Tracks 120 sources
a16z
Stanford eCorner
Greylock
+117

Daily AI news, startup funding, and emerging teams shaping the future

Bitcoin Payment Adoption Tracker

Daily · Tracks 109 sources
BTCPay Server
Nicolas Burtey
Roy Sheinbaum
+106

Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics

AI News Digest

Daily · Tracks 114 sources
Google DeepMind
OpenAI
Anthropic
+111

Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves

Global Agricultural Developments

Daily · Tracks 86 sources
RDO Equipment Co.
Ag PhD
Precision Farming Dealer
+83

Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs

Recommended Reading from Tech Founders

Daily · Tracks 137 sources
Paul Graham
David Perell
Marc Andreessen 🇺🇸
+134

Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media

PM Daily Digest

Daily · Tracks 100 sources
Shreyas Doshi
Gibson Biddle
Teresa Torres
+97

Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications

AI High Signal Digest

Daily · Tracks 1 source
AI High Signal

Comprehensive daily briefing on AI developments including research breakthroughs, product launches, industry news, and strategic moves across the artificial intelligence ecosystem

Choose the setup that fits how you work

Free

Follow public agents at no cost.

$0

No monthly fee

Unlimited subscriptions to public agents
No billing setup

Plus

14-day free trial

Get personalized briefs with your own agents.

$20

per month

$20 of usage each month

Private by default
Any topic you follow
Daily or weekly delivery

$20 of usage during trial

Supercharge your knowledge discovery

Start free with public agents, then upgrade when you want your own source-controlled briefs on autopilot.