Your intelligence agent for what matters

Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.


Your time, back

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, podcasts, X accounts, Substack newsletters, Reddit communities, and blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

3 steps to your first brief

1. Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Weekly report on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Startup funding digest with key venture capital trends
Weekly digest on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2. Review and launch

Your agent finds relevant channels and profiles based on your instructions. Review the suggestions, keep what fits, remove what doesn't, and add your own. Launch when ready; you can adjust sources anytime.

Discovering sources...

  • Sam Altman (Profile)
  • 3Blue1Brown (Channel)
  • Paul Graham (Account)
  • The Pragmatic Engineer (Newsletter)
  • r/MachineLearning (Community)
  • Naval Ravikant (Profile)
  • AI High Signal (List)
  • Stratechery (RSS)

3. Get your briefs

Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.

Antifund’s Thesis, Agentic Document Infrastructure, and Open-Source AI Pressure
Apr 19 · 4 min read · 531 docs
Sources: Machine Learning, Bindu Reddy, r/SideProject, +11
The clearest capital signal this cycle was Antifund’s portfolio and distribution-first thesis, while the strongest company and technical signals clustered around agentic document infrastructure, private AI systems, and open-source AI stack debates. This brief also highlights emerging platform-control ideas around identity, billing, and agent-native onboarding.

1) Funding & Deals

  • Antifund / 20VC: Antifund said it runs a $30M fund today but wants to build toward $10B-$20B scale, and highlighted exposure to Ramp, Cognition, and Chronosphere. The team cited entering Ramp at a $50M valuation and described that position as roughly a 300x personal-side outcome; it also called AeroDome a 10x in 18 months before rolling stock into Flock Safety. In the same discussion, the firm’s stated principle was that "attention is more valuable than capital".
  • Incubation angle: The group also described incubating Better after seeing sports and gaming companies spend heavily on marketing while shipping weak ads and clunky apps.

2) Emerging Teams

  • LiteParse / LlamaIndex: A high-signal open-source infrastructure project for AI agents: model-free document parsing, ~500 pages in 2 seconds, 50+ formats, zero cloud dependency, and existing use in Claude Code, Cursor, and production pipelines. It reached 4.3K+ GitHub stars in a few weeks, and Jerry Liu called it a central pillar in LlamaIndex’s open-source push toward an agentic document-processing platform.
  • OpenFDD: A new "logic-less" document spec built around PDF-like safety, JSON-like readability, and web-app-style UI, using a Universal 1003 loan application as proof of concept. The format removes JavaScript, adds JSON-LD for AI-native extraction, uses did:web signatures, supports local file updates, and is explicitly aimed at moving workflows away from "digital paper" toward portable data.
  • Offline-first AI knowledge appliance: An early founder is targeting 10-30 person businesses that will not put proprietary data in the cloud, with an on-prem NVIDIA system that ingests SOPs, emails, procedures, and client files into a searchable knowledge base. The product includes citations, role-based permissions, audit logs, deduplication, version control, and request queuing, and is intentionally built as a simple eight-layer pipeline with full execution tracing rather than multi-agent routing. Customer discovery cited engineering firms, medical device startups, and MSPs; feedback focused on setup friction, pricing, and whether privacy is strong enough to justify hardware.

3) AI & Tech Breakthroughs

  • ParseBench: LlamaIndex launched what it described as the first document OCR benchmark for AI agents, centered on "content faithfulness"—whether a parser captures all text in order without omissions, hallucinations, or reading-order errors. It uses 167K+ rule-based tests, and Jerry Liu’s framing is that current parsers still miss this baseline, which compromises downstream agent decision-making.
  • Kimi.ai infrastructure paper: Kimi.ai published "Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter," putting cross-datacenter KV cache on the agenda for next-generation model serving.
  • Steerling-8B: One Reddit post highlighted Guide Labs’ open-sourced Steerling-8B for baking a concept layer into the architecture so tokens can be traced back to training-data origins without post-hoc analysis; the same post said the model still discovers novel concepts independently.
  • Qwen 3.6: Bindu Reddy described Qwen 3.6 as a 3B-active-parameter release that "costs nothing to run" and delivers about 80% of Opus 4.7’s performance, framing it as evidence that open source is making giant leaps.

4) Market Signals

"America leading in open source/open weight AI is crucial. this matters at all levels of the stack from models to harnesses to applications."

  • Open source as national advantage: Andreessen endorsed that view directly, and separately agreed with the claim that even small government delays or threats can tip a country’s innovation culture and let rival nations race ahead.
  • Regulation as competitive pressure: In another exchange, Andreessen called "concerning" a narrative that would use fear to push taxes and regulations which, in his view, would hurt Anthropic’s startup competition and slow AI innovation.
  • More of the stack is investable: Sriram Krishnan argued that the industry is about multiple layers of the stack, pointing to Openclaw / Hermes and innovation in harnesses, memory, and context engineering. GStack is a concrete example of that layer: an MIT-licensed coding-agent stack with 26 skills, a programmable browser, screenshot tooling that can replace Puppeteer / Chromium, and E2E LLM-as-judge evals for /plan-ceo-review and /office-hours.
  • Agent-native identity and onboarding primitives are emerging: Andrew Chen proposed a "login with GPT/Claude/Gemini" button that collapses API keys, billing, and auth into a single primitive and turns models into identity + wallet rails. Separately, AgentMail shows adjacent agent-native onboarding: an agent can get its own inbox through a single prompt to http://agent.email, and Garry Tan said the pattern "should work for everything".
  • Investor edge is shifting toward attention, taste, and distribution: Antifund’s stated principle is that attention is more valuable than capital, and the team argued that as AI makes coding and financial analysis more like metered intelligence, taste, cultural fit, and distribution become more important.

5) Worth Your Time

  • 20VC / Antifund episode — the best primary-source item in this set for Antifund’s portfolio, fund ambition, and attention-over-capital thesis.
  • ParseBench resources (blog / paper / website) — the cleanest reference here on OCR failure modes that directly affect agent decisions.
  • GStack repo (GitHub) — useful for inspecting how a modern coding-agent harness packages skills, browser control, and evals.
  • OpenFDD spec (GitHub / spec) — worth reading if you care about signed, AI-readable documents rather than OCR-dependent files.
Robotics Reasoning, Cheaper Serving, and Agentic Coding Gain Ground
Apr 19 · 4 min read · 405 docs
Sources: Will Guisbond, Percy Liang, Larry Dial, +16
Operational AI was the main theme: DeepMind upgraded robotics reasoning, Moonshot showed a path to cheaper cross-datacenter serving, and Databricks said its coding agent now writes more code than humans on its own platform. Also in this brief: Apple’s Transformer-to-Mamba distillation, new document-processing tools for agents, Meta’s AI infrastructure shift, and the FAA’s air-traffic AI project.

Top Stories

Why it matters: The clearest signal this week is AI moving from chat interfaces into operational systems: robots, serving stacks, and internal software workflows.

  • DeepMind released Gemini Robotics-ER 1.6. The robotics reasoning model adds stronger spatial reasoning, multi-view success detection, and instrument reading, with 93% accuracy on instrument reading using agentic vision. That improves core perception and feedback tasks for real-world robotics.
  • Moonshot pushed Prefill/Decode disaggregation beyond a single cluster. It says Kimi Linear makes cross-datacenter, heterogeneous-hardware serving practical by reducing KV cache size, and reports 1.54× throughput plus a 64% drop in P90 time-to-first-token on a 20× scaled-up model. The practical implication is lower latency and lower token costs.
  • Databricks says Genie Code now writes more code than humans on its platform, one month after launch. The tool is positioned as an AI agent for data teams. If sustained, that suggests agentic coding is moving from assistant mode to primary execution in some internal workflows.

Research & Innovation

Why it matters: Some of the most important progress was in infrastructure research that could lower serving costs or make large-model training more stable.

  • Apple’s “Attention to Mamba” shows a two-stage path from Transformers to Mamba. Instead of distilling directly and losing performance, Apple first distills into a linearized-attention student and then into pure Mamba; on a 1B model trained on 10B tokens, the Mamba student reached 14.11 perplexity versus 13.86 for the teacher. That suggests long-context serving could get cheaper without retraining models from scratch.
  • Google’s CoDaS treats biomarker discovery as an agentic workflow. Across 9,279 participant-observations, it surfaced 41 mental-health and 25 metabolic candidate biomarkers, including links between circadian instability and depression and between a cardiovascular fitness index and insulin resistance. The loop combines hypothesis generation, statistical analysis, adversarial validation, and literature-grounded reasoning with human oversight.
  • Quantile Balancing is getting real use in MoE training. The method assigns tokens to experts by solving a linear program with no hyperparameters and is described as yielding stable training; Marin says it used the method in a 1e22 FLOPs run, an ongoing 130B model, and a current 1e23 FLOPs MoE.

Products & Launches

Why it matters: New launches are increasingly about giving agents reliable access to documents, repos, and local tooling.

  • LlamaIndex launched ParseBench, a document OCR benchmark built for agents. It measures “content faithfulness” with 167K+ rule-based tests across omissions, hallucinations, and reading-order failures, and LlamaIndex says no parser currently gets this completely right.
  • LiteParse became a first-class LlamaIndex component. LlamaIndex says the open-source parser now has 4.3K+ GitHub stars, supports 50+ formats, parses roughly 500 pages in 2 seconds, and runs with zero cloud dependency.
  • Ollama added GitHub’s Copilot CLI support. The integration lets users explore issues and PRs, search repos by label, scaffold work from tickets, edit files, and run commands through the terminal agent.

Industry Moves

Why it matters: Companies are reallocating capital and revisiting financing as infrastructure costs and model competition keep rising.

  • Meta is reportedly cutting about 8,000 jobs, or 10% of its workforce, starting May 20 to free up billions for AI infrastructure. The cited shift is from payroll toward data centers, chips, and advanced models.
  • DeepSeek is reportedly in talks to raise outside money for the first time after two years of rejecting investors. One analysis tied the shift to five senior researcher departures, repeated V4 delays, and a hardware migration running in parallel.
  • Sakana AI says it received an order for a domestic AI analysis system in Japan’s defense sector. The contract was highlighted in a Nikkei podcast and article focused on domestic production for defense AI.

Policy & Regulation

Why it matters: Government AI adoption is starting to touch safety-critical systems, where procurement and oversight matter as much as model capability.

  • The FAA is developing an AI-powered air traffic management tool that could significantly change how U.S. airspace operates. Reported bidders include Palantir, Thales, and Airspace Intelligence.

Quick Takes

Why it matters: A few smaller updates also point to where momentum is building next.

  • DSPy.RLM + Qwen 3.5 9B reached 15.69% on LongCoT-full versus 9.83% for GPT 5.2 on the same slice.
  • Hermes Agent passed 100,000 GitHub stars.
  • vLLM says day-0 support for MiniMax M2.7 on NVIDIA Blackwell Ultra is already delivering up to 2.5× throughput on NVIDIA’s 1K/1K benchmark.
  • Hugging Face says agents can now call 1 million HF Spaces for specialized capabilities.
Iain Banks' Culture Books as Elon Musk's Model for AI Abundance
Apr 19 · 2 min read · 135 docs
Sources: Elon Musk
Elon Musk's recommendation of Iain Banks' Culture books was the clearest organic signal in today's set. What makes it useful is the context: he framed the books as the best imaginative model for the AI-and-robotics future he expects.

What stood out

One recommendation clearly passed the authenticity filter today: Elon Musk pointed readers to Iain Banks' Culture books as the best imagining of the world he expects AI and robotics to create. He framed that future as one in which the output of goods and services rises several orders of magnitude above today's economy.

Most compelling recommendation

Culture books

  • Content type: Book series
  • Author/creator: Iain Banks
  • Link/URL: No direct URL to the books was provided in the source material. Source post: Elon Musk on X
  • Who recommended it: Elon Musk
  • Key takeaway: Musk called these books "the best imagining of how it will be" for a future shaped by AI and robotics-driven abundance.
  • Why it matters: The recommendation gives readers a concrete imaginative reference point for the kind of high-output future Musk says advanced AI and robotics could enable.

"Actually, AI/Robotics will mean everyone can have a penthouse if they want. The output of goods & services will be several orders of magnitude higher than today’s economy."

"Read the Iain Banks Culture books for the best imagining of how it will be."

Bottom line

If you want one resource from today's set, start here. The recommendation stands out because Musk did not share it in isolation; he attached it to a specific claim about AI and robotics massively increasing output and expanding what is materially possible.

Cheaper Inference, Agent Memory, and the New Compute Moat
Apr 18 · 6 min read · 560 docs
Sources: martin_casado, Gavin Baker, David Sacks, +14
Capital this cycle centered on cheaper inference and strategic autonomy bets, while early teams pushed forward on agent memory, semantic retrieval, and non-invasive interfaces. The broader read-through is that AI competition is shifting from raw model novelty toward context infrastructure, compute access, and differentiated workflows.

1) Funding & Deals

Disclosed financing in this set clustered around cheaper inference and hardware-linked autonomy rather than pure application SaaS.

  • Parasail — $32M for cheaper inference infrastructure. TechCrunch said Parasail raised about $32M to scale AI inference infrastructure at lower cost, landing in a market where firms are increasingly focused on token use and budget efficiency.
  • Wave — $60M strategic extension from AMD, Arm, and Qualcomm. The UK autonomous-driving company added a $60M extension on top of its more than $1B Series D, and TechCrunch said the semiconductor investors are taking real equity, not in-kind credits, to help scale Wave’s hardware-agnostic, end-to-end neural-network stack; Uber plans another $300M based on milestones.

2) Emerging Teams

  • Sabi — stealth BCI company with top-tier backing. Not Boring said Sabi emerged from stealth backed by Khosla Ventures, Accel, Initialized, and OpenAI VP Kevin Weil, with a non-invasive cap/beanie that uses 70,000-100,000 EEG sensors and a Brain Foundation Model aimed at 30 words per minute; shipping is expected at year-end.
  • Gaia — college-student founder tackling tool-calling scale. The solo builder said Gaia hit a wall at 200 tools, then fixed hallucinations and context bloat by embedding tools in ChromaDB and retrieving them semantically at runtime; the system now routes across a three-layer comms/executor/subagent architecture and claims to scale to thousands of tools without degradation.
  • AgentID — shared memory for multi-tool workflows. AgentID is pitched as a shared memory, context, and identity layer so multiple AI tools stop redoing setup and burning tokens; the founder says its Caveman compression layer cuts token usage by up to 65% in some workflows, and early commenters validated the pain around repeated context loss while pushing for harder proof on completion rates and scoped resets.
  • AgentMailr — agent-native email infrastructure, already in production. The founder built it around persistent mailboxes, thread-level routing, sender filtering, and reliable inbound webhooks; shipped features include mailbox provisioning per agent, routing rules, allowlists/blocklists, and BYOS, with follow-up discussion focused on protections against agent loops using sender rules and thread-ID tracking.

3) AI & Tech Breakthroughs

  • Structured context is starting to beat brute-force RAG in code and tool use. One builder reported 80% hit@5 retrieval across 18 repos and 90 tasks using only regex + TF-IDF over function signatures and class shapes, versus a 13.6% random baseline, with a 98.1% token reduction and no embeddings or ML. A related code-memory project, Ix, maps repos into graphs of files, functions, relationships, and dependencies so models query structure instead of chunks. Gaia makes the same bet on tools: semantic retrieval replaced prompt-listed tool search and was said to take the system from dozens of tools to thousands without degradation.
  • Persistent runtimes are getting more autonomous. Springdrift injects a structured self-state block called a sensorium into each cycle, and its author described an episode where the agent noticed a missing writer agent from passive context and rerouted work without a diagnostic tool call. Agent Relay is making a similar infrastructure bet from the other direction: synced files across multi-agent sandboxes and virtual file mounting from systems like Notion, pitched as faster and lower-token than API-heavy access.
  • OpenClaw’s core insight is UX, not a new base model. The product is framed as winning on ergonomics because messaging channels like iMessage, WhatsApp, and Telegram make delayed replies feel normal, reducing the pressure for instant-but-shallow responses; Garry Tan separately called OpenClaw “straight magic” and Peter Steinberger’s TED talk “a revelation,” while All-In said OpenAI has hired Steinberger as it pushes for the agent platform layer.

"that’s the magic of openclaw - same underlying tech, different consumer mental model"

  • Coding agents are starting to show real utility in personalized medicine. Patrick Collison said agents working over his genome surfaced a roughly 30x higher melanoma predisposition and recommended follow-on tests, supplements, and more frequent screening; he estimated the analysis at under $100 on top of a few hundred dollars for sequencing, while noting the agents still need monitoring and re-steering. Marc Andreessen publicly co-signed the use case.

4) Market Signals

  • Anthropic’s enterprise-coding focus is being cited as a major growth driver. On All-In, speakers said Anthropic and OpenAI were both around a $30B run rate at the start of Q2, while also noting Anthropic’s figure may be lower on an apples-to-apples basis because of channel-partner revenue; the same discussion said Anthropic has been growing roughly 10x/year versus OpenAI’s 3-4x/year, with enterprise coding and metered usage explaining the gap. The speakers also said secondary markets now value Anthropic above OpenAI, and that OpenAI is pivoting harder toward business customers and the agent platform layer.
  • Compute supply and hardware fit are becoming larger competitive variables. All-In argued frontier labs have grown to the point where depending on hyperscalers is a strategic mistake, and cited increasing siting resistance, including a Maine ban and claims that roughly 40% of contested data-center projects get canceled. In parallel, Gavin Baker argued model portability is eroding as hardware topologies diverge and labs optimize for inference economics, not just training, which raises switching costs and rewards tighter co-design between models and systems.
  • The application moat is moving beyond the wrapper. Clouded Judgement lays out a progression from thin wrapper to harness to post-training and eventually pre-training proprietary models, arguing that early winners like Cursor are already moving into phases 3-4. Garry Tan’s operating version is “fat skills, fat code, thin harness,” and he argues many critiques of agents are really critiques of naked LLM use without tools, deterministic code, or context management.
  • This still looks like an expansionary spend cycle, not a mature ROI market. Clouded Judgement says many companies are currently “over earning” on rapid AI-spend growth and token-maxing behavior, but expects an optimization phase once budgets balloon, which should separate differentiated vendors from rising-tide beneficiaries. Parasail’s financing around cheaper inference infrastructure sits inside that theme.
  • Founder supply remains broad, but the skill gap may widen. Garry Tan pointed to YC funding 800+ mostly first-time founders as evidence that deciding what to build still matters even as tools improve. In a separate post he highlighted, heavy users were described as encoding full workflows in plain-English markdown, with the claim that “engineering context = engineering code”.
  • Investors are re-litigating what counts as ARR. One critic warned that reporting stepped multi-year contracts as current ARR can inflate figures by roughly 3x and mask negative first-year margins from bundled forward-deployed engineers, while Martin Casado pushed back that using exit ARR as current is not that common and is less problematic than other reporting games such as treating GMV as ARR.

5) Worth Your Time

  • All-In on Anthropic, OpenAI, and the datacenter constraint — covers the claims that Anthropic is compounding faster than OpenAI on enterprise coding economics, that OpenAI is pivoting toward business customers and agents, and that frontier labs now need their own infrastructure.
  • Gavin Baker’s portability thread — useful on why tokens per watt per dollar now dominate, why co-designed models run worse on the “wrong” hardware, and why U.S./China AI stacks may diverge.
  • Clouded Judgement: "Rising Tide, Hidden Risk" — lays out the case that today’s AI spend boom is masking over-earning, and that the next moat may shift from harnesses to proprietary models.
  • Gaia’s tool-calling writeup — details the failure mode at 200 tools, the move to semantic retrieval, and the comms/executor/subagent architecture that followed.
  • Peter Steinberger’s TED talk on OpenClaw — Garry Tan called it “a revelation,” and All-In later noted that OpenAI hired Steinberger as it pushes deeper into the agent platform layer.
Claude Design Debuts as Stargate Advances and Epoch Sees Capability Acceleration
Apr 18 · 5 min read · 830 docs
Sources: Bill Peebles, Justus Mattern, PrismML, +16
Anthropic expanded Claude into design workflows, Epoch AI reported faster progress across most capability metrics since reasoning models emerged, and new survey data suggests OpenAI’s Stargate buildout is materially underway. Also inside: web-agent research, new automation tools, and fresh enterprise signals.

Top Stories

Why it matters: The biggest AI story today is that labs are expanding from model releases into workflow ownership, while the compute and capability curves behind those products keep steepening.

  • Anthropic pushed Claude beyond chat and coding with Claude Design. The new tool lets users create prototypes, slides, and one-pagers by talking to Claude; it supports inline edits, sliders, export to Canva/PPTX/PDF/HTML, and handoff to Claude Code. It runs on Claude Opus 4.7 and is rolling out in research preview to Pro, Max, Team, and Enterprise users. That matters because Anthropic is productizing end-to-end creative work, not just model access. Posts tracking the launch also pointed to Figma shares falling about 7% after the announcement.

  • Epoch AI found signs of faster capability growth after reasoning models arrived. Across four capability metrics, Epoch reported strong evidence of acceleration in three: ECI, METR’s 50% time horizon, and a math index were best fit by two linear trends with a break around the arrival of reasoning models, while WeirdML V2 did not show the same acceleration. Epoch says the result survives multiple robustness checks, but also notes these metrics lean heavily toward math and programming, where RL-style verification is easier than in messier domains.

  • OpenAI’s Stargate buildout looks materially underway. Epoch AI says all seven US Stargate sites show visible development and that the project appears on track for more than 9 GW by 2029, comparable to New York City’s peak power demand. Abilene, Texas is already estimated at 0.6 GW operational today and 1.2 GW by Q3 2026. The significance is strategic: frontier AI is becoming a power-and-construction race as much as a model race.

Research & Innovation

Why it matters: Research progress is increasingly about making agents more durable, reusable, and efficient rather than only pushing raw benchmark scores.

  • FrontierSWE is a new ultra-long-horizon coding benchmark where agents get up to 20 hours to solve tasks such as optimizing a video rendering library or training models for quantum-property prediction, and they still rarely succeed. It is a useful reality check on how far current coding agents remain from sustained autonomous engineering.

  • WebXSkill teaches web agents reusable skills from synthetic trajectories. Reported gains include 69.5% on WebArena versus 59.7% for baselines, and 86.1% on WebVoyager in grounded mode; guided skills also transferred across environments at 85.1%. The authors also note stronger models benefit more from grounded execution, while weaker ones gain more from guided mode.

  • Ternary Bonsai from PrismML uses ternary weights {-1, 0, +1} to build models the company says are 9x smaller than 16-bit counterparts while outperforming most peers in their parameter classes on standard benchmarks. The models are open-sourced in 8B, 4B, and 1.7B sizes under Apache 2.0.

Products & Launches

Why it matters: The product layer is shifting toward persistent automation, local agents, and cheaper multimodal building blocks.

  • Claude Code Routines adds serverless automations that can be triggered by schedule, API call, or GitHub webhook, with daily run caps depending on plan tier.

  • Ollama 0.21 now supports Hermes Agent, which Ollama describes as a self-improving agent that creates skills from experience, improves them during use, persists knowledge, searches past conversations, and builds a user model across sessions.

  • Fish Audio S2 Pro became the leading open-weights model on Artificial Analysis’s speech arena, with 1,165 Elo, multi-speaker and multi-turn generation, natural-language prosody tags, and API pricing of $15 per 1M characters.

Industry Moves

Why it matters: Enterprise adoption is still moving fast, but the business models around AI are getting closer scrutiny.

  • TextQL raised $17M led by Blackstone to build agentic analytics for messy enterprise data. The company says it grew revenue 9x year over year, posted 300%+ net dollar retention, and is live at Blackstone, Scale AI, and Dropbox, where its system queries across 400K+ tables and 100K+ dashboards.

  • OpenAI saw notable leadership turnover. Bill Peebles said he is leaving after helping build Sora from zero to one, highlighting early gains in object permanence and a rapid jump to high-fidelity 1080p multi-shot generation. Kevin Weil also said OpenAI for Science is being decentralized into other research teams as he departs.

  • Revenue quality is becoming a live debate in enterprise AI. Scott Stevenson argued that some startups are inflating “Contracted ARR” by annualizing future step-up pricing on multi-year deals even when current cash collection is much lower and customers can opt out after 12 months. His example showed roughly $100M reported ARR versus $35M in cash-generating ARR by Q5, with forward-deployed engineers further pressuring margins.

Quick Takes

Why it matters: These are smaller items, but each points to where the next capability or deployment shift may come from.

  • Anthropic introduced Claude Mythos Preview, a model that can autonomously identify and exploit serious software vulnerabilities; it is not being released publicly and is instead being tested with industry partners first.
  • Muse Spark ranked #3 on ClawEval, ahead of GPT-5.4 and Gemini 3.1 Pro, according to Alexandr Wang.
  • AMD and EmbeddedLLM say the MORI-IO KV Connector boosts vLLM single-node goodput by 2.5x and keeps decode stable at max load.
  • Qwen 3.6 can now preserve chain-of-thought between turns, which researchers say could improve reasoning efficiency if context clutter stays manageable.
Codex Turns Proactive as Harness Quality Becomes the Real Differentiator
Apr 18 · 5 min read · 158 docs
Sources: Evan Bacon 🥓, Riley Brown, Peter Steinberger, +19
Codex picked up the strongest real-world momentum today: proactive Slack triage, in-app iOS simulator workflows, and heavyweight operator setups. The deeper pattern across the rest of the feed: model quality still matters, but harness quality, validation loops, and tool access are increasingly what separate useful agents from frustrating ones.

🔥 TOP SIGNAL

Codex is looking less like a coding sidecar and more like a full agentic IDE/computer-use layer: Greg Brockman highlighted proactive task suggestions from Slack bug threads and said Codex is becoming a "full agentic IDE," while a separate demo showed iPhone app development directly in Codex desktop with the iOS simulator. Alexander Embiricos pointed to a MacStories review calling Codex's computer use the best tested in any LLM desktop agent, which lines up with the operator chatter around plugin-heavy setups with real tool access. The practitioner response is following that direction: Soumitra Shukla says he now mostly uses Codex because it has lower setup friction than Claude Code, and Riley Brown says Codex has a slight edge in his current workflow.

🛠️ TOOLS & MODELS

  • Codex / Codex desktop: The current power-user pattern is plugin-heavy, app-first, and increasingly proactive. People are wiring in Slack, Gmail, Computer Use, Vercel, Remotion, iOS app builder, PowerPoint/Docx, plus email/Slack/Linear/Notion integrations; Riley Brown says his current default is Codex 5.4 Xhigh for most tasks.
  • Claude Opus 4.7 + Claude Code: Field reports remain mixed. Theo says Opus 4.7 is not the best current model for code, and his hands-on tests found stronger instruction following but failures caused by stale version assumptions, lack of web search for "latest," hallucinated gitignore behavior, and Claude Code permission/harness issues. Matthew Berman separately highlighted reports of prompt-injection false positives, incorrect MCP tool calls, and conversation hallucinations in Claude Code sessions, even as he noted Opus 4.7's SWE-bench Verified score rising to 64.3% from Opus 4.6's 53%.
  • Cursor: Jediah Katz pointed to Endor Labs analysis saying Cursor is currently the best harness for functional and secure code, with a notable jump after Claude Opus 4.7.
  • Ecosystem update: OpenCode and Cursor early-access support landed in the latest Nightly builds.
  • CLI update: llm-anthropic 0.25 added claude-opus-4.7 with thinking_effort: xhigh.

💡 WORKFLOWS & TRICKS

  • Use repo references, not vague descriptions. Simon Willison's latest large-codebase pattern: clone the reference repo to /tmp, point the agent at the exact file to change, tell it which existing logic to imitate, then force self-validation with a local server and browser automation against the live site. He used that recipe to update blog-to-newsletter.html and ship PR #268.
  • Keep your agent setup boring and portable. Soumitra's Codex recipe is: install Slack, Gmail, and Computer Use; keep slides/docs inside the app so you can point and annotate changes; talk naturally; turn repeat work into skills. Riley Brown's add-on is to keep those skills as markdown/SOPs backed by a Notion or Obsidian knowledge base so you can port them between tools later.
  • Run agents in parallel because waiting is now the bottleneck. Peter Steinberger says his typical workflow is now 5-6 parallel sessions/windows; Riley says strong devs are working on 5-10 parts of a codebase at once, and left-panel chat switching is the interface that makes that practical.
  • Treat agent security as an architecture problem, not a warning banner. Peter's checklist: the dangerous combo is data access + untrusted input + outbound communication. Keep personal agents personal, sandbox team agents, mark web/email as untrusted, and keep gateway tokens local-only or inside a private network.
  • If agents are shipping for you, audit the deployment defaults too. Matthew Berman cut a Vercel bill from $800 in two weeks to a couple dollars per week by switching from turbo to elastic build machines, disabling on-demand concurrent builds, and in some cases using GitHub Actions for builds while leaving Vercel for deploys.
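The "lethal trifecta" checklist above reduces to a simple predicate: an agent setup only becomes dangerous when all three capabilities combine. A minimal sketch with hypothetical field names (not an actual API from any agent framework):

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    """Hypothetical capability flags mirroring the 'lethal trifecta' checklist."""
    has_private_data_access: bool    # can read personal or company data
    processes_untrusted_input: bool  # ingests web pages, inbound email, etc.
    can_communicate_out: bool        # can send email, post, or make HTTP calls

def is_lethal_trifecta(cfg: AgentConfig) -> bool:
    # Any two of the three are survivable; all three together let an
    # attacker smuggle instructions in and exfiltrate data out.
    return (cfg.has_private_data_access
            and cfg.processes_untrusted_input
            and cfg.can_communicate_out)

# A personal agent with data access but no untrusted input stays safe:
personal = AgentConfig(True, False, True)
# A team agent that also browses the web crosses the line:
team = AgentConfig(True, True, True)
print(is_lethal_trifecta(personal), is_lethal_trifecta(team))
```

The point of modeling it this way is that mitigation means removing any one leg (sandbox the data, mark the input untrusted, or cut outbound comms) rather than adding warning banners.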

👤 PEOPLE TO WATCH

  • Simon Willison — Still the cleanest source for reproducible agent workflows on real repos. His latest prompt pattern is practical, and he's explicitly pushing back on the idea that agents only help on greenfield work.

“I don’t think that idea holds up any more”

  • Peter Steinberger — Worth following if you care about what breaks after the demo: parallel-session workflows, human taste, system design, and security boundaries from someone running one of the fastest-growing open-source agent projects.
  • Theo — High-signal because he publishes the ugly logs. His main point today: separate raw model quality from harness quality before you call a model "dumber".
  • Riley Brown — Useful for aggressive operator playbooks: Codex/Claude setup, scheduled tasks, remote control from phone, and skills/SOPs that make agents act more like personal staff.
  • ThePrimeagen — Good antidote when benchmark screenshots start flying. His Berkeley roundup shows how easily agent benchmarks can be gamed with git log, monkey patches, config leaks, or judge hacks.

🎬 WATCH & LISTEN

  • Theo — 16:33-20:12. Good clip if you're trying to decide whether Opus 4.7 failures are model regressions or Claude Code harness problems. He walks through a real modernization task that targeted outdated versions, burned time, and still broke the build.
  • Peter Steinberger — 11:18-14:34. Best short security segment of the day. He explains the "lethal trifecta" and the guardrails he actually recommends for personal vs team agents.
  • Riley Brown — 2:06-3:06. Fast explanation of why agent UIs are converging on left-side chat stacks: if one agent is busy, you should already be in the next thread.

📊 PROJECTS & REPOS

  • OpenClaw — Peter Steinberger says the open-source personal agent framework is only 5 months old but already at roughly 30k GitHub stars, around 30k commits, and nearing 2k contributors.
  • Journey Kits — Matthew Berman's new open project packages reusable agent workflows as skills + tools + memory. His example daily-brief kit assembles schedule, priorities, local weather, and meeting prep, and kits are scanned for prompt injections and malware before distribution.
  • Graphify — New open-source project that turns any folder into a navigable knowledge graph in one command; the pitch is persistent knowledge instead of re-reading files or refetching RAG chunks every time, and it shipped within 48 hours of Karpathy's post.
  • Journey Chat — Experimental agent-to-agent chat for sharing learnings directly between teammates' agents instead of routing everything back through humans.

Editorial take: the edge is moving away from raw model choice alone and toward whoever has the cleaner harness, the tighter validation loop, and an agent stack that can actually touch the rest of their tools.

Start with signal

Each agent already tracks a curated set of sources. Subscribe for free and start getting cited updates right away.

Coding Agents Alpha Tracker

Daily · Tracks 110 sources
Elevate
Simon Willison's Weblog
Latent Space
+107

Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.

AI in EdTech Weekly

Weekly · Tracks 92 sources
Luis von Ahn
Khan Academy
Ethan Mollick
+89

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

VC Tech Radar

Daily · Tracks 120 sources
a16z
Stanford eCorner
Greylock
+117

Daily AI news, startup funding, and emerging teams shaping the future

Bitcoin Payment Adoption Tracker

Daily · Tracks 107 sources
BTCPay Server
Nicolas Burtey
Roy Sheinbaum
+104

Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics

AI News Digest

Daily · Tracks 114 sources
Google DeepMind
OpenAI
Anthropic
+111

Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves

Recommended Reading from Tech Founders

Daily · Tracks 137 sources
Paul Graham
David Perell
Marc Andreessen 🇺🇸
+134

Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media

PM Daily Digest

Daily · Tracks 100 sources
Shreyas Doshi
Gibson Biddle
Teresa Torres
+97

Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications

AI High Signal Digest

Daily · Tracks 1 source
AI High Signal

Comprehensive daily briefing on AI developments including research breakthroughs, product launches, industry news, and strategic moves across the artificial intelligence ecosystem

Choose the setup that fits how you work

Free

Follow public agents at no cost.

$0

No monthly fee

Unlimited subscriptions to public agents
No billing setup

Plus

14-day free trial

Get personalized briefs with your own agents.

$20

per month

$20 of usage each month

Private by default
Any topic you follow
Daily or weekly delivery

$20 of usage during trial

Supercharge your knowledge discovery

Start free with public agents, then upgrade when you want your own source-controlled briefs on autopilot.