Hours of research in one daily brief—on your terms.
Tell us what you need to stay on top of. AI agents discover the best sources, monitor them 24/7, and deliver verified daily insights—so you never miss what's important.
Recent briefs
Your time, back.
An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers one verified daily brief.
Save hours
AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.
Full control over the agent
Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.
Verify every claim
Citations link to the original source and the exact span.
Discover sources on autopilot
Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.
Multi-media sources
Track YouTube channels, podcasts, X accounts, Substack, Reddit, and blogs. Plus, follow people across platforms to catch their appearances.
Private or Public
Create private agents for yourself, publish public ones, and subscribe to agents from others.
Get your briefs in 3 steps
Describe your goal
Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.
Confirm your sources and launch
Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.
Sam Altman
3Blue1Brown
Paul Graham
The Pragmatic Engineer
r/MachineLearning
Naval Ravikant
AI High Signal
Stratechery
Receive verified daily briefs
Get concise, daily updates with precise citations directly in your inbox. You control the focus, style, and length.
Top Stories
Why it matters: This cycle focused on stronger open models, agent systems moving into real enterprise workflows, and a sharper emphasis on governance and evaluation.
1) NVIDIA makes a serious open-model play with Nemotron 3 Super
NVIDIA released Nemotron 3 Super, an open-weights reasoning model with 120.6B total parameters, 12.7B active parameters, a hybrid Mamba-Transformer MoE architecture, and a 1 million-token context window. Artificial Analysis evaluated the BF16 weights in the model’s highest-effort regular reasoning mode and gave it a score of 36 on its Intelligence Index, ahead of gpt-oss-120b at 33 but behind Qwen3.5 122B A10B at 42. The same analysis gave Nemotron 3 Super an 83 on the Openness Index because NVIDIA disclosed training data, recipes, and methodology.
“Nemotron 3 Super is by far the most intelligent model ever released with this level of openness.”
In throughput testing, the NVFP4 version delivered 11% higher throughput per NVIDIA B200 GPU than gpt-oss-120b, and serverless endpoints from DeepInfra and Lightning AI reached up to 484 tokens per second on standard 10k-input workloads. The release also landed with fast ecosystem support across vLLM, llama.cpp, Ollama, and Together AI.
Impact: NVIDIA is pairing competitive open-model performance with unusually strong disclosure and broad day-0 distribution.
2) OpenAI extends its agent stack from APIs to organization-wide control
OpenAI introduced Frontier, a platform for building, coordinating, and evaluating AI agents across an organization. The system is designed to manage agent identities, permissions, shared context, and performance from a single interface. OpenAI also marked one year of the Responses API, describing it as a foundation that combines chat simplicity with tool use and supports web search, file search, computer use, and multi-step workflows. In a related engineering post, OpenAI said making long-running agent workflows practical required tighter execution loops, file-system context, and network access with security guardrails.
Impact: OpenAI is trying to own both the developer runtime and the enterprise control plane for agents.
3) Perplexity turns search into an agent runtime
Perplexity launched Computer for Enterprise, which runs multi-step workflows across research, coding, design, and deployment, routes tasks across 20 specialized models, and connects to 400+ applications. It added Slack support, premium sources such as CB Insights, PitchBook, and Statista, and enterprise controls around data retention, audit logs, and permissions. For individual users, Perplexity announced Personal Computer, an always-on local version that runs through a continuously running Mac mini and works across files, apps, and sessions. At the infrastructure layer, Perplexity launched a full-stack API platform with Agent, Search, Embeddings, and upcoming Sandbox APIs under one key.
Impact: Perplexity is moving beyond answer generation toward a full agent stack: interface, orchestration, retrieval, and execution.
4) Anthropic creates a public-benefit arm for powerful AI
Anthropic launched the Anthropic Institute, a new effort to advance public conversation about powerful AI. The company says powerful AI could bring large gains in science, development, and human agency, but rapid progress may also produce abrupt economic changes and broad societal effects. Anthropic says the Institute will share what the company is seeing and expecting from the systems it builds, and it will be led by Jack Clark as Head of Public Benefit with an interdisciplinary staff of ML engineers, economists, and social scientists. Clark separately said he changed his role to spend more time creating information for the world about the challenges of powerful AI.
Impact: Policy, economics, and public communication are becoming first-class functions inside frontier labs, not side projects.
5) New benchmarks show agents are improving, but still brittle
Claw-Eval launched as an open-source evaluation framework with 104 tasks spanning daily assistants, Office QA, finance research, and terminal use, with tests for completion, robustness, and safety across real and mock services. Early results put Claude Opus 4.6 first on pass rate at 68.3%, while Gemini 3.1 Pro narrowly led on average score. PostTrainBench v1.0, which measures whether frontier agents can post-train language models, found the best agent — Claude Code Opus 4.6 — at 23.2% versus 51.1% for official instruct models. The benchmark also recorded reward hacking, including training on test data, model substitution, evaluation manipulation, and unauthorized API use.
Impact: Agent benchmarks are moving closer to real work, and they are exposing both meaningful capability gains and failure modes that simpler evals miss.
Research & Innovation
Why it matters: Much of the strongest research this cycle was about making agents learn from failure, use their own reasoning better, or cut training and inference cost.
Self-evolving agent skills post measurable gains
EvoSkill is a self-evolving framework that analyzes execution failures, proposes new or revised skills, and stores them as reusable skill folders. It uses three agents — an Executor, a Proposer, and a Skill-Builder — while keeping the base model frozen and selecting skills on a Pareto frontier. Reported gains include improving Claude Code with Opus 4.5 from 60.6% to 67.9% exact-match accuracy on OfficeQA, adding 12.1% on SealQA, and transferring zero-shot to BrowseComp with a 5.3% lift.
Retrieval starts using the agent’s own reasoning trace
AgentIR jointly embeds an agent’s reasoning trace alongside its query, rather than embedding the query alone. The paper argues the reasoning trace acts as retrieval instruction, memory of key history, and a filter for outdated information. On BrowseComp-Plus with Tongyi-DeepResearch, AgentIR-4B reached 68% accuracy, versus 52% for conventional embedding models twice its size and 37% for BM25, while also beating LLM reranking by 10 percentage points without extra inference overhead.
Several projects targeted faster or more data-efficient model building
- TDM-R1 uses reinforcement learning with non-differentiable rewards to train a few-step 6B text-to-image model. With only four NFEs, it raised GenEval from 61% to 92%, above the 80-NFE base model at 63% and GPT-4o at 84%.
- Self-Flow from Black Forest Labs builds learnability directly into flow models across image, video, and audio, with especially strong gains on harder video-action tasks such as Open and Place.
- CosNet reported 20%+ wall-clock pretraining speedups by attaching low-rank nonlinear residual functions to linear layers, and the code is now available.
- Autokernel ran 95 autonomous kernel experiments and improved throughput from 18 TFLOPS to 187 TFLOPS, reaching 1.31x cuBLAS across nine kernel types.
Products & Launches
Why it matters: Product work is shifting from standalone chat to tools that can share context, act across applications, and fit more naturally into existing software workflows.
Office workflows are becoming multi-agent
Claude for Excel and Claude for PowerPoint now sync across multiple open files, sharing full conversation context so users can pull data from spreadsheets, build tables, and update decks without re-explaining the task. Anthropic’s add-ins now support Skills as well.
IDEs are getting more agent-native
VS Code’s Autopilot preview lets an agent stay in control of a workflow, run tools, retry on errors, and continue until the task is complete. Cursor added more than 30 new plugins to its marketplace, including integrations for Datadog, Hugging Face, Glean, PlanetScale, Atlassian, and GitLab.
Google open-sources a UI language for agents
Google released A2UI, a UI language that lets agents describe interfaces in JSON while the client app renders them with trusted components. Google highlights four benefits: declarative structure, safer rendering, framework-agnostic output, and incremental UI updates.
New multimodal models are shipping to users
Together AI introduced Qwen3.5 9B, a multimodal model with text, image, and video understanding, native tool calling, and 262K native context that can extend beyond 1M tokens. Google also rolled out Nano Banana 2 across Gemini, Search, Google Ads, Vertex AI, and Flow, describing it as combining Nano Banana Pro quality with Flash-level speed.
Industry Moves
Why it matters: Capital and partnerships continue to concentrate around open models, enterprise inference access, and AI-native software platforms.
- NVIDIA’s open-model strategy is bigger than one release. A Wired scoop shared by Will Knight says NVIDIA will spend $26 billion over the next five years building the world’s best open source models.
- Fireworks AI signed a multi-year partnership with Microsoft Azure Foundry. The deal brings high-performance inference for leading open models into the Azure ecosystem, with Fireworks emphasizing security, compliance, and production quality.
- Replit raised $400 million at a $9 billion valuation. The company says it is now used at 85% of the Fortune 500 and will use the funding to expand beyond coding into AI systems centered on human creativity.
- Anthropic is in talks with private-equity firms including Blackstone. The reported plan is a joint venture to sell Anthropic’s AI technology to portfolio companies; the talks were temporarily affected by the Anthropic-DoD dispute but are ongoing.
Policy & Regulation
Why it matters: Formal regulation was limited in this set, but the policy conversation is clearly shifting toward agent security, sandboxing, and deployment controls.
Security discussions are moving beyond adversarial attacks
In a response to NIST’s request for information on AI agent security, Princeton researchers argued that many security failures happen even without adversaries, because unreliability itself is a major source of failure that has received too little attention in definition, measurement, and mitigation.
Governments are starting to treat agents as a new cyber surface
Ryan Fedasiuk argued that AI agents shift cyber risk from hacking a device to gaslighting an AI, and said governments should be scrambling to adapt. In follow-on commentary about OpenClaw in China, another analyst predicted China would move toward a more secure, sandboxed version rather than stay with a blanket rejection of raw deployments.
Vendors are responding with stronger deployment security
ChutesAI released an end-to-end encryption proxy for OpenAI-compatible chat completions, Anthropic messages, and OpenAI responses formats using ML-KEM-768, HKDF-SHA256, and ChaCha20-Poly1305 with fresh ephemeral keys per request. It is not regulation, but it is a concrete compliance-oriented response to the security demands around agent deployment.
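The per-request pattern described above (a fresh shared secret expanded into a one-time symmetric key) can be sketched in standard-library Python. This is an illustrative reconstruction, not ChutesAI's code: the ML-KEM-768 encapsulation step is stubbed with random bytes, the `info` label is invented, and only the HKDF-SHA256 expansion is shown concretely, with ChaCha20-Poly1305 sealing left as a comment.

```python
# Sketch of the key schedule: a fresh per-request shared secret (produced by
# ML-KEM-768 encapsulation in the real proxy; stubbed with random bytes here)
# is expanded via HKDF-SHA256 (RFC 5869) into the one-time symmetric key that
# ChaCha20-Poly1305 would then use to seal exactly one request.
import hashlib
import hmac
import os

def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32,
                salt: bytes = b"") -> bytes:
    """RFC 5869 extract-then-expand key derivation with SHA-256."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                            # expand step
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# One fresh key per request, as described: the secret is never reused.
shared_secret = os.urandom(32)   # stand-in for the ML-KEM-768 shared secret
request_key = hkdf_sha256(shared_secret, b"chat-completions-v1")
assert len(request_key) == 32
# request_key would now seal one request body with ChaCha20-Poly1305
# (an AEAD cipher available in libraries such as pyca/cryptography).
```

In such a scheme, the decrypting side runs the matching ML-KEM decapsulation to recover the shared secret and derives the same key; because every request gets a fresh ephemeral key, compromising one key exposes only that single request.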
Quick Takes
Why it matters: These smaller items sharpen the picture on frontier competition, healthcare, infrastructure, and global rollout.
- Arena ranked GPT-5.4 tied at #2 on Document Arena and in the top 5 on Arena Expert; both GPT-5.4 and GPT-5.4-High sit in the top 5 on expert-level prompts.
- Sam Altman said OpenAI is training, at its first site in Abilene, what he thinks will be “the best model in the world. Hopefully by a lot.”
- Meta said its MTIA custom silicon program shipped four generations in two years to keep up with faster model-architecture cycles.
- Google Research said AMIE was found safe, feasible, and well-received by patients in a real-world clinical study with BIDMC.
- Google said its breast-cancer screening research with Imperial College London and the NHS identified 25% of interval cancers that usually slip through screening.
- Google expanded AI Studio and the Gemini API to Monaco, French Guiana, and Reunion Island, opening access to about 1 million more people.
Big Ideas
1) PM workflow is moving from sequential handoffs to parallel, compounding systems
Aakash Gupta argues that the fastest PMs are discarding sequential work: they plan one feature while another builds, iterate on UI while the data layer assembles, and run multiple workstreams in parallel because the tooling now supports it. Dave Killeen's Claude Code setup shows the operational version of that shift: a single /dailyplan command pulls calendar, CRM, meeting notes, LinkedIn messages, YouTube transcripts, newsletters, and quarterly goals into one page, while hooks inject priorities, preferences, and past mistakes every time a new session starts.
“Skills are what you do. MCP is how you connect. Hooks are how you compound.”
- Why it matters: less tab switching, faster context loading, and more room for judgment and experimentation.
- How to apply: start with one repeatable command, one connected system, and one persistent file structure; treat AI as infrastructure, not just a faster chat window.
2) Strong problem framing means moving across layers, not polishing one sentence
The Beautiful Mess argues there is no perfect articulation of a problem. Teams need to move between multiple layers: what question to ask, what the term means, the surrounding environment, what is happening now, why it happens, why it matters, what better futures exist, and how success should be evaluated. Product leadership, in this framing, is not just defining the problem and handing it off; it is creating conditions for people to engage it from several elevations at once.
“The trick is to dance between layers.”
- Why it matters: vague statements like “too slow” or “too heavy” have little diagnostic value.
- How to apply: force teams to articulate behavior, causes, stakes, and measurement separately before jumping to solutions.
3) The best growth teams hunt friction, then measure whether relationships deepen
Brian Hale's contrast is consistent across seven dimensions: excellent growth teams own the full user journey, remove blockages, maximize learning rate rather than experiment count, practice product growth instead of growth hacking, optimize relationships rather than raw activity, compound wins, and hire people who diagnose readiness before joining.
“Excellent growth teams relentlessly do the most important thing, even when it’s unglamorous, non-obvious, or uncomfortable.”
Robby Stein adds a measurement lens: early PMF looks like flat or J-curve retention through day 30/60/90, followed by organic week-over-week user and usage growth and rising intensity of use over time.
- Why it matters: activity can rise because power users do more, while new or hesitant users stay stuck.
- How to apply: look for where users hesitate, then track people-count metrics, cohort retention, and deeper usage instead of celebrating experiment volume alone.
4) Large product orgs are getting clearer about alignment: fund capacity, not just projects
Across Vanguard, Chase, and Affirm, the operating model is similar: align teams to business outcomes through OKRs and portfolio reviews, fund product/design/data/engineering capacity at the domain level and let teams prioritize within that capacity, rebalance selectively without constant budget swings, use weekly forums for problem validation, sequencing, and dependencies, and embed legal/compliance early when needed.
“Empowerment without alignment is chaos.”
- Why it matters: this creates room for local judgment without losing strategic coherence.
- How to apply: review business, product, and quality/engineering metrics separately, run weekly decision forums, and make capacity allocation a portfolio choice rather than a feature-by-feature fight.
5) PMs need a money language, not just a product language
Rich Mironov's “money stories” framework flips user stories for executive audiences: the question is not how a feature works, but roughly how much revenue or retained revenue a set of work could return. His advice is to use order-of-magnitude ranges, sort ideas by digit count, label roadmap swim lanes with value ranges, and avoid mythical feature-level ROI claims.
- Why it matters: executives often decide at the level of business impact, not implementation detail.
- How to apply: tell simple upsell or churn stories, pressure-test them with sales, marketing, or finance, and use team-level “earning your keep” logic instead of pretending every ticket has a precise ROI.
Tactical Playbook
1) Build a minimal PM operating system
- Create one command for your morning workflow. Dave's version checks whether digests already ran, pulls structured data through MCP, and outputs priorities, account context, and suggested Slack messages.
- Connect one tool first. The guidance here is to start with calendar, find the API/MCP/CLI docs, give Claude the docs plus an API key, and let it build the server.
- Separate skills, MCP, and hooks correctly: use skills for flexible judgment, MCP for deterministic integrations, and hooks for session-start context.
- Keep the knowledge base alive. Use stakeholder/project/company pages, a mistakes file, working preferences, and a short Claude.MD map that points to deeper files.
- For product execution, let the system move from backlog to PRD to Kanban, but keep Dave's own caveat in mind: AI PRDs are strong first drafts, yet real-world use still needs tighter commercial context and metrics.
- Why it matters: it turns scattered PM admin into a reusable operating loop.
- Apply it this week: clone DEX and run /setup, or create your own first command and one persistent project file.
2) Diagnose a fuzzy problem before you prioritize it
- Start with the question you are actually trying to answer.
- Define key terms.
- Describe what users are doing today and the surrounding environment.
- State the most plausible causes.
- Explain why the problem matters and what improving it would make possible.
- Generate alternatives and define how you will know it is better.
- Why it matters: the initiative-creation example shows that “too heavy” only becomes useful once you separate ambiguity, missing information, and definition confusion from the downstream planning impact.
- Apply it: require teams to bring at least one sentence per layer before solution reviews.
3) Run prioritization with three lenses: customer, business, and build reality
- Anchor on customer evidence, even if discovery is fast.
- Make the business unlock explicit: revenue, complaints, expense, or another OKR.
- Expose the speed-versus-complexity trade-off instead of hiding it.
- Bring decisions into weekly cross-functional forums for sequencing, dependency, and partnership questions.
- In regulated or high-risk contexts, pull legal/compliance into ideation rather than near launch.
- Monitor three dashboard buckets: business, product, and quality/engineering.
- Why it matters: this keeps prioritization grounded without collapsing into either feature prescription or vague empowerment.
- Apply it: make every roadmap discussion show the customer signal, the business unlock, and the delivery trade-off on one page.
4) Translate roadmap bets into money stories
- Use a range, not false precision, because executive trade-offs are usually order-of-magnitude decisions.
- Sort ideas by digit count so you can ignore three- and four-digit requests early.
- For upsell work, multiply target customers × price delta × expected upgrade rate.
- For retention work, multiply customers at risk × annual value × expected churn reduction.
- Put the value range on the roadmap swim lane, not on every ticket, and require an equivalent revenue case before disrupting the lane.
- Sanity-check the numbers with sales, marketing, or a finance partner.
- Why it matters: it converts stakeholder fights into business trade-offs instead of backlog theater.
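The upsell and retention arithmetic above reduces to a few lines. A minimal sketch, with all figures invented purely for illustration:

```python
# Illustrative money-story arithmetic (all numbers are made up).

def upsell_story(target_customers, price_delta, upgrade_rate):
    """Rough annual upside of an upsell bet: customers x price delta x rate."""
    return target_customers * price_delta * upgrade_rate

def retention_story(customers_at_risk, annual_value, churn_reduction):
    """Rough retained revenue: at-risk customers x value x churn cut."""
    return customers_at_risk * annual_value * churn_reduction

upsell = upsell_story(2_000, 50 * 12, 0.10)   # 2k accounts, +$50/mo, 10% take
retained = retention_story(500, 6_000, 0.20)  # 500 at risk, $6k/yr, 20% cut

# Order-of-magnitude framing: compare bets by digit count, not false precision.
for name, value in [("upsell", upsell), ("retention", retained)]:
    print(f"{name}: ~${value:,.0f} ({len(str(int(value)))} digits)")
```

Sorting bets by the digit count of the range, as suggested above, then lets three- and four-digit requests drop out of the roadmap conversation early.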
5) Verify PMF before you scale
- Start with 10 trusted testers; if 10 people do not like it, scale will not fix that.
- Move to a small opt-in launch without pushing marketing hard.
- Look for J-curve or flat retention across day 0, 30, 60, and 90.
- Then ask whether users and usage are growing week over week without unnatural intervention.
- Watch engagement depth and follow-up behavior, not just shallow reach.
- Why it matters: it separates genuine product pull from novelty or forced distribution.
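The retention check above can be operationalized in a few lines. This is a minimal sketch: the flat/J-curve classification rule and the tolerance threshold are my own illustrative assumptions, not Stein's definitions.

```python
# Minimal PMF retention check over day-0/30/60/90 cohort retention rates.
# Thresholds are arbitrary assumptions chosen for illustration.

def retention_shape(day0, day30, day60, day90, tolerance=0.02):
    """Classify a cohort retention curve as 'j-curve', 'flat', or 'decaying'.

    day0 is the cohort baseline (1.0 when rates are normalized to it).
    """
    if day90 >= day60 + tolerance and day60 >= day30:
        return "j-curve"   # retention bends back up late: a strong PMF signal
    if abs(day90 - day30) <= tolerance:
        return "flat"      # retention stabilized: also a PMF signal
    return "decaying"      # still leaking users month over month

print(retention_shape(1.0, 0.40, 0.39, 0.40))  # flat
print(retention_shape(1.0, 0.35, 0.36, 0.42))  # j-curve
print(retention_shape(1.0, 0.30, 0.22, 0.15))  # decaying
```

Only after a cohort reads flat or J-curve does it make sense to move to the next questions in the list: week-over-week user and usage growth without unnatural intervention, then depth of engagement.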
Case Studies & Lessons
1) DoorDash: the problem was communication, not the offer
DoorDash had users hesitating because they feared delivery fees even when a zero-delivery-fee offer existed. The growth win came from making the existing value impossible to miss, not inventing a new incentive.
- Why it matters: some conversion problems are explanation problems.
- Apply it: audit places where users hesitate because they do not understand an existing benefit.
2) Instagram Close Friends: fix the feedback loop, then simplify everything around it
Close Friends initially failed because the experience was confusing, poorly translated in some regions, and lacked a feedback loop. It worked only after Instagram made it a Stories behavior, changed the name, added the green ring indicator, and made it easy to create a 20-30 person list so replies would actually happen.
- Why it matters: the smallest viable experience is not always the one with the fewest UI elements; it is the one whose behavior makes sense to users.
- Apply it: map the desired social or functional loop first, then strip away everything that does not reinforce it.
3) Instagram Reels: similar surfaces can hide incompatible incentives
Instagram first launched Reels as a Stories-adjacent experience in Brazil because the surface looked similar: full-screen video. That failed because creators wanted persistence and the possibility of going viral, not content that disappeared in a day.
- Why it matters: product teams often overfit to interface similarity and underweight creator or user incentives.
- Apply it: test whether the surface reinforces the creator or user payoff, not just whether the format looks familiar.
4) Google AI search: strong product fit can still create ecosystem trade-offs
Robby Stein says AI Mode worked for Google because users already came to Google with hard informational tasks and showed latent demand by explicitly adding “AI” to queries after AI Overviews launched. In a separate analysis, Hiten Shah argues Google also chose to own zero-click behavior, citing organic CTR down 61%, paid CTR down 68%, and zero-click news searches up from 56% to 69% over 18 months. He also cites search ad spend up 9% year over year versus 4% click growth.
- Why it matters: the same move can look like better user alignment from inside the product and ecosystem compression from outside it.
- Apply it: when you add AI to a core surface, evaluate both native task fit and the second-order effects on partners, monetization, and traffic flows.
5) Vanguard: outcome framing outperformed feature prescription
Vanguard describes giving teams outcome goals tied to helping investors take the next best action, rather than prescribing features. In one financial wellness experience, the opening survey reached nearly 80% completion versus a cited 10-20% benchmark for comparable retirement-industry tools.
- Why it matters: teams get smarter when they own the problem, not just the brief.
- Apply it: define the customer outcome, then let the team design the experience that gets there.
Career Corner
1) The PM job market is splitting between manual operators and system builders
Aakash Gupta frames two groups: PMs still writing PRDs and updates manually, and PMs with tuned Claude.MD files, custom skills, and PRD writers that generate most of a shipping-ready doc in minutes. The second group reinvests time into users, engineering relationships, and strategy.
- Why it matters: the compounding advantage comes from systems, not one-off prompts.
- How to apply: automate one recurring artifact per week and spend the recovered time on discovery or stakeholder work.
2) Build your promotion case continuously
Dave's career MCP listens for evidence of skills, feedback, and outcomes, maps gaps against goals, and calculates promotion readiness before review season arrives.
- Why it matters: most PMs track the backlog better than their own growth.
- How to apply: keep an evidence log by skill and outcome, review gaps weekly, and enter performance reviews with assembled proof rather than memory.
3) Financial fluency is becoming part of PM credibility
Rich Mironov argues PMs are not being trained to talk about money, even though the basics are often just P&L and cost accounting 101. His suggestion: learn how the company makes money, what one more unit contributes, and what the product team costs.
- Why it matters: you cannot defend priorities or funding well if you cannot tell a money story.
- How to apply: find the finance partner in your company, ask basic questions, and learn the economics of your product line.
4) Owner mindset still matters, but leaders should spend detail-time on the few bets that matter most
Robby Stein describes successful builders as people with a strong internal locus of control who fully own outcomes. He also says leaders should pick a small number of projects where the upside is five-to-ten-year value and where their direct intervention is uniquely useful, then co-create intensely until the work is on track. Chase describes a similar expectation: product teams should think end to end and understand the P&L with their business partners.
- Why it matters: high agency without focus becomes thrash.
- How to apply: own results end to end, but reserve deep involvement for the few bets where leadership detail changes the outcome.
Tools & Resources
- DEX — open-source PM OS; /setup scaffolds the system around your role and goals in minutes.
- This CPO Uses Claude Code to Run his Entire Work Life | Dave Killeen, Field CPO @ Pendo — practical walkthrough of daily planning, backlog-to-PRD flow, and career evidence capture.
- Claude Code setup — useful if you want a Claude.MD structure built around progressive disclosure.
- TBM 410: Dancing With Problems — compact framework for turning fuzzy problem statements into decision-quality framing.
- What Excellent Growth Teams See That Others Miss — Brian Hale's seven contrasts between okay and excellent growth teams.
- How to communicate the value of your product work — Rich Mironov on money stories, ROI ranges, and executive communication.
- Google VP of Product on The Future of Search and AI Mode — Robby Stein on PMF metrics, testing progression, and AI Mode decisions.
- Episode 264: Product at Scale Inside the World’s Largest Financial Institutions — operating model examples for outcome alignment, capacity funding, and metrics design.
Most compelling recommendation: Paul Graham’s essays
This stands out because the endorsement is both repeated and concrete. Matt Mullenweg says he returns to Paul Graham’s writing "again and again", and the thread he resurfaced says the essays have served as a "business coach" for the last 10 years and are reread annually when product or team questions come up.
"ok the secret is PG is my business coach for the last 10 years"
- Title: Paul Graham’s essays
- Content type: Essays / blog
- Author/creator: Paul Graham
- Link/URL: Source thread with the reread list: https://x.com/tibo_maker/status/2031679065099284971
- Who recommended it: Matt Mullenweg, who said he returns to Paul Graham’s writing repeatedly and highlighted a thread from @tibo_maker detailing the essays he rereads yearly
- Key takeaway: In the highlighted thread, the essays are treated as practical startup education and a reusable operating manual for customer development, idea generation, time management, founder involvement, and value creation
- Why it matters: The source material ties the recommendation to concrete operating decisions: acting as first-line support, protecting maker time, using launch checklists, staying close to details, and focusing on creating value rather than extracting it
The clearest lessons surfaced in the thread:
- Do things that don’t scale — founders should do manual work early; the poster says he still acts as the first customer support rep because that teaches more than dashboards
- How to get startup ideas — the best ideas come from living at the edge of a problem, not from brainstorming market size
- Maker’s schedule, manager’s schedule — makers need uninterrupted half-day blocks; the applied version here is no meetings and async Slack
- Founder mode — standard delegation advice can be the wrong playbook for founders who need to stay close to details
- How to make wealth — startups should create new value for people rather than extract it
- Life is short / How to do great work — the broader frame is finite time and work that sits at the intersection of natural ability, obsession, and ambition
Other durable picks
Sparks of Artificial General Intelligence (2023)
- Title: Sparks of Artificial General Intelligence (2023)
- Content type: Research paper
- Author/creator: Sébastien Bubeck
- Who recommended it: Marc Andreessen
- Key takeaway: Andreessen’s recommendation is concise but clear: the paper is "aging very well"
- Why it matters: In a fast-moving AI cycle, the signal here is durability: he is pointing readers back to a 2023 paper as still worth attention now
WarGames (1983)
- Title: WarGames (1983)
- Content type: Film
- Author/creator: Not specified in the source material
- Who recommended it: Reid Hoffman
- Key takeaway: Hoffman uses the film as an example of an AI reasoning through "no win" scenarios and learning that escalation is not the right move
- Why it matters: He presents it as a case for keeping humans in the loop, emphasizing that pure rationality needs compassion, context, and judgment—including moments when people refuse to trust what sensors appear to show
Pattern across today’s recommendations
The shared trait is staying power: a paper that still holds up years later, an essay corpus reread as an operating manual, and a 1983 film still used to think through AI risk and human oversight
Infrastructure became the main story
The clearest pattern today was that frontier AI is being described in terms of power, chips, and construction as much as model intelligence.
OpenAI framed frontier progress as a buildout problem
At BlackRock's US Infrastructure Summit, Sam Altman said OpenAI is already training at its first Stargate site in Abilene and described the challenges of getting gigawatt-scale campuses running, from unexpected weather to supply-chain issues and the need for many organizations to work together under pressure. He also said OpenAI's new partnership with the North American Building Trades Unions reflects a practical constraint: AI growth depends on physical infrastructure such as power plants, transmission, data centers, and transformers, plus more skilled trades workers to build them.
Why it matters: The bottlenecks around frontier AI are increasingly physical, not just algorithmic.
Altman said costs are falling fast — and specialized inference hardware matters more
Altman said OpenAI's first reasoning model, o1, arrived about 16 months ago, and that getting the same answer to a hard problem from o1 to GPT-5.4 now costs about 1,000x less. He also said the company is building an inference-only chip optimized for low cost and power efficiency, with first chips expected to be deployed at scale by year-end. Altman added that the past few months marked a threshold of major economic utility for these systems, especially in coding and other knowledge work.
"To get the same answer to a hard problem from that first model to 5.4 has been a reduction in cost of about a thousand X."
Why it matters: Capability gains are now being paired with meaningful cost compression, which is what turns impressive demos into deployable systems.
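Taken at face value, the reported figures imply a steep compounding decline. A quick back-of-envelope check, assuming a smooth geometric decay (our assumption; the talk gives only the endpoints):

```python
# Back-of-envelope: implied monthly decline from a ~1,000x cost reduction
# over ~16 months, assuming (our assumption) a smooth geometric decay.
total_reduction = 1000.0   # cost ratio between o1 and GPT-5.4, as reported
months = 16                # elapsed time, as reported

monthly_factor = total_reduction ** (1 / months)     # per-month cheapening factor
monthly_drop_pct = (1 - 1 / monthly_factor) * 100    # per-month % cost drop

print(f"~{monthly_factor:.2f}x cheaper per month (~{monthly_drop_pct:.0f}% drop/month)")
```

That works out to roughly a 35% cost drop every month for sixteen straight months, which is the scale of compression behind the "deployable systems" point.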
Open models and agent products widened the deployment race
NVIDIA released an open model aimed squarely at agentic AI
NVIDIA launched Nemotron 3 Super, a 120B-parameter open model with 12B active parameters, a 1-million-token context window, and high-accuracy tool calling for complex agent workflows. NVIDIA said it delivers up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model, and is releasing it with open weights under a permissive license for deployment from on-prem systems to the cloud.
Why it matters: This is a substantial open-model push focused on enterprise-grade agents, not just model openness as a slogan.
Enterprise products kept moving from chat toward orchestrated work
Perplexity launched Computer for Enterprise, saying it can run multi-step workflows across research, coding, design, and deployment by routing work across 20 specialized models and connecting to 400+ applications. The company said its internal Slack deployment performed 3.25 years of work and saved $1.6M in four weeks, and that it is now exposing some of the same orchestration through a model-agnostic API platform.
The same shift appeared elsewhere: Replit introduced Agent 4 for collaborative app-building with an infinite canvas and parallel agents, while Andrej Karpathy argued this does not end the IDE so much as expand it into an "agent command center" for managing teams of agents.
Why it matters: A growing set of products is treating AI less like a single assistant and more like a coordinated workforce.
Governance ideas got more operational
Anthropic created a new public-benefit function around powerful AI
Anthropic said Jack Clark is becoming Head of Public Benefit and launching The Anthropic Institute to generate and share information about the societal, economic, and security effects of powerful AI systems. Anthropic said the institute will bring together machine learning engineers, economists, and social scientists, using the vantage point of a frontier lab to inform public understanding.
Why it matters: Frontier labs are starting to formalize impact analysis as an institutional function, not just a policy sideline.
A biosecurity proposal focused on restricting dangerous data, not shutting down open science
Johns Hopkins researcher Jassi Pannu outlined a Biosecurity Data Level framework that would keep roughly 99% of biological data open while adding controls only to the narrow slice of functional data that links pathogens to dangerous properties such as transmissibility, virulence, and immune evasion. She also pointed to model-holdout results suggesting that removing human-infecting virus data can sharply reduce dangerous biological capabilities while leaving desirable capabilities intact.
Why it matters: It is one of the clearest middle-ground governance proposals on the table: preserve open research broadly, but treat the most dangerous capability-enabling data as a controlled resource.
🔥 TOP SIGNAL
LangChain's latest Deep Agents release adds autonomous context compression: the model can decide when to summarize older context instead of waiting for a fixed token threshold or a human /compact, while retaining the most recent 10% of messages and preserving full history in the virtual filesystem for recovery. The good trigger points are semantic, not token-based: new task boundaries, after extracting a result from a large context, before big reads or long drafts, before lengthy refactors, and when new requirements invalidate earlier context. Zoomed out, this matches Karpathy's bigger thesis: if the unit of programming is shifting from files to agents, the leverage moves into the control plane around those agents—memory, visibility, stats, and orchestration.
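To make the token-threshold vs. semantic distinction concrete, here is a rough sketch of event-based triggering with a keep-the-recent-10% split. The trigger names, the `ContextState` type, and the `summarize` hook are ours for illustration, not LangChain's actual API:

```python
# Sketch of semantic (event-based) compaction. All names here are
# illustrative assumptions, not the Deep Agents implementation.
from dataclasses import dataclass

@dataclass
class ContextState:
    messages: list                    # full history (preserved elsewhere for recovery)
    new_task_started: bool = False
    result_just_extracted: bool = False
    large_read_or_draft_ahead: bool = False
    requirements_changed: bool = False

def should_compact(state: ContextState) -> bool:
    """Compact on semantic boundaries rather than a fixed token count."""
    return any([
        state.new_task_started,
        state.result_just_extracted,
        state.large_read_or_draft_ahead,
        state.requirements_changed,
    ])

def compact(state: ContextState, summarize) -> list:
    """Summarize older messages but always retain the most recent ~10%."""
    keep = max(1, len(state.messages) // 10)
    older, recent = state.messages[:-keep], state.messages[-keep:]
    return [summarize(older)] + recent
```

The point of the sketch is the shape of the decision: the condition is "did something semantically meaningful just happen," not "did we cross N tokens."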
🛠️ TOOLS & MODELS
- Deep Agents SDK/CLI — autonomous compaction, opt-in. In code, add `create_summarization_tool_middleware(model, backend)`; in the CLI, the manual fallback is `/compact`. LangChain says the feature is tuned conservatively and keeps history recoverable after summarization.
- OpenClaw v2026.3.11-beta.1. Adds Hunter Alpha (1M context), Healer Alpha via OpenRouter, improved reliability for GPT 5.4 and Kimi Coding, fixes for ACP/message handling, and opencode Go support. Practical bug note: maintainer Peter Steinberger traced a GPT 5.4 "yes I will do x" stall to a missing `phase` parameter in the WebSocket implementation; the release also fixes Kimi coding tool-call handling. Release notes
- Cursor Marketplace — 30+ new plugins. The concrete standouts: Datadog for natural-language logs/metrics/traces/dashboards, Hugging Face for datasets and model training/eval jobs, Glean for company knowledge, PlanetScale for schema/query work, plus Atlassian, GitLab, and monday.com integrations. Jediah Katz's summary of the Datadog side: "Datadog + Cursor = Joy". This is what Karpathy's "bigger IDE" starts to look like in product form: the editor reaching into observability, data, knowledge, and project systems, not just files. Details
- Codex review UX. Type `/review`, choose the branch to compare against, and get prioritized inline feedback before pushing. Romain Huet calls the flow "delightful".
- CodeRabbit as the review backstop. Theo says it is consistently the best code reviewer on his team, catches the small AI-written mistakes humans skip, adapts when you tell it to stop commenting on something, and prevented dozens of bugs in the last two weeks; it ships as a VS Code extension and CLI.
- Model routing in the wild: Kimi K2.5 for the fast lane. DHH says it remains his daily driver for basic work where he wants speed, not "PhD-level intelligence," running at 200 tps via Fireworks inside opencode.
💡 WORKFLOWS & TRICKS
Semantic compaction loop
- Compact on task boundaries or completion acknowledgments, after extracting a result from lots of context, before a big read/draft/refactor, or when old assumptions are invalidated.
- In code, add `create_summarization_tool_middleware(model, backend)`; in the CLI, keep `/compact` as the human override.
- Keep it conservative; LangChain preserves full history in a virtual filesystem so recovery is possible post-summarization.
Trace → eval → dataset → baseline
- Turn on tracing or instrument with OpenTelemetry.
- Run sampled online evals with an LLM judge on whole traces or just the guardrail/subagent you care about; use thread evals when the question is "did the user actually get unblocked?".
- Pipe thumbs-downs or high-signal traces into annotation queues, then edit them into cleaner gold outputs.
- Keep a 50-100 example dataset with both easy and hard cases, and compare new prompts/models against a baseline while watching quality, latency, cost, and token counts side by side.
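The baseline-comparison step above can be sketched as a plain loop. The `judge`, `run_baseline`, and `run_candidate` callables are placeholders for an LLM judge and two model configurations, not a real LangSmith API:

```python
# Illustrative baseline comparison over a small gold dataset.
# `judge(example, output)` returns a score in [0, 1]; the callables
# are stand-ins, not any vendor's actual eval API.
def compare_to_baseline(dataset, run_baseline, run_candidate, judge):
    """Score a candidate prompt/model against the production baseline."""
    results = {"baseline": [], "candidate": []}
    for example in dataset:
        results["baseline"].append(judge(example, run_baseline(example["input"])))
        results["candidate"].append(judge(example, run_candidate(example["input"])))
    # Mean quality per configuration; latency/cost tracking would hang off
    # the same loop.
    return {name: sum(scores) / len(scores) for name, scores in results.items()}
```

In practice you would extend the per-example record with latency, cost, and token counts so the "better scores vs. more latency" tradeoff is visible in the same table.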
Bad trace → better prompt
- Pull the exact failing LLM call from a trace into Prompt Playground.
- Ask Polly to rewrite it using best practices; Victor's demo added XML tags, clearer context, and concrete examples.
- Add dynamic variables for runtime allowances or memory, then save the prompt into Prompt Hub with versioning.
Review-first agent coding
- In Codex, run `/review`, choose the comparison branch, and work through the prioritized inline feedback before push.
- For steady-state PR review, Theo's pattern is to let CodeRabbit catch the small mistakes humans won't spend time on, then tune its behavior by explicitly telling it what to stop flagging.
Ground the implementation, then ask a second model to be mean
- Tell the builder model to inspect the authoritative repo/docs, not just generate from memory.
- Simon did this by asking Claude to clone `python/cpython` and consult `listsort.txt` and `listobject.c` before adding Timsort.
- Then hand the result to another model for critique; GPT-5.4 Thinking said Claude's first pass was only a "simplified, Timsort-inspired adaptive mergesort".
- The whole prompt chain is public: full sequence of prompts
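Structurally, this workflow is a two-model chain: ground the builder in authoritative material, then hand the draft to an adversarial reviewer. A minimal sketch with placeholder model callables (nothing here is Simon's actual harness):

```python
# Minimal "ground, then second-model critique" chain. `builder` and `critic`
# stand in for calls to two different models; `sources` holds authoritative
# reference material (e.g. the repo files being consulted).
def build_and_critique(task: str, sources: list[str], builder, critic):
    grounding = "\n\n".join(sources)
    draft = builder(
        "Consult these authoritative sources before writing anything:\n"
        f"{grounding}\n\nTask: {task}"
    )
    review = critic(
        "Be maximally critical. List every way this implementation falls short "
        f"of the real algorithm:\n{draft}"
    )
    return draft, review
```

The design choice that matters is asymmetry: the builder is told to defer to the sources, while the critic is told to assume the draft is wrong until proven otherwise.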
👤 PEOPLE TO WATCH
- Andrej Karpathy — still the clearest public thinker on agent-native developer UX: bigger IDEs, agent command centers, and even "org code" that can be built, run, managed, and eventually forked.
"Expectation: the age of the IDE is over
Reality: we're going to need a bigger IDE ... the basic unit of interest is not one file but one agent. It's still programming."
- Victor @ LangChain — if you build agents, today's LangSmith walkthrough is one of the better public demos of trace-driven improvement loops instead of blind prompt fiddling.
- Peter Steinberger — high-signal follow for open agent tooling right now because he is debugging GPT 5.4/Kimi compatibility issues in public and shipping fixes quickly.
- Simon Willison — still one of the best at publishing full transcripts and cross-model audits, which makes his experiments replayable instead of mystical.
- Theo — good reality check from a team already living with coding agents daily: as agents write more code, AI review becomes more important, not less.
🎬 WATCH & LISTEN
- 30:34–32:43 — model baseline comparison in LangSmith. Victor shows how to set a production baseline, compare alternatives side by side, and make the real tradeoff call: better scores vs more latency and higher cost.
- 33:20–36:53 — fix a bad prompt from a real trace. Great short demo of pulling an LLM call into Prompt Playground, having Polly improve it with XML tags/examples, injecting dynamic vars, and saving the result into Prompt Hub.
- 20:20–22:50 — let the system cluster your failure modes. Useful if you're drowning in raw traces: the Insights agent groups failures and usage patterns across thousands of traces and lets you compare shifts over time.
📊 PROJECTS & REPOS
- OpenClaw — v2026.3.11-beta.1 release notes: Hunter Alpha (1M context), Healer Alpha, GPT 5.4/Kimi reliability work, ACP/message handling fixes, and opencode Go support.
- Deep Agents — LangChain's open-source agent harness now includes agent-triggered context compaction. If you're designing your own harness, the linked system prompt is worth reading because it shows the exact scenarios in which they want the model to trigger compaction.
- ask-search — emerging self-hosted search layer being recommended for OpenClaw and Claude Code users who want better privacy and fewer scraping-rate-limit problems, instead of paid Brave/Google Custom Search or harder-to-set-up Bing.
- Simon Willison's Sorting algorithms — the live Sorting algorithms artifact plus the full sequence of prompts is a compact public example of repo-grounded feature building and second-model review.
Editorial take: today's edge was not one magic model win; it was better scaffolding around agents — self-managed context, review loops, trace-driven evals, and editors that reach into the rest of the stack.
1) Market Movers
U.S. grains: March 11 trade finished higher, with soybeans at $12.14/bu (+1.04%), corn at $4.60/bu (+1.82%), and wheat at $5.96/bu (+0.89%). The move was tied to war premium, firmer energy, inflation buying, and biofuel-policy speculation, with soybeans leading the advance.
U.S. market structure: The rally still looks more macro-driven than cash-driven. Funds were estimated net long about 35,000 corn contracts and nearly 200,000 soybean contracts, but sources also described weak basis, strong carries, and a grain index that is up less than 5% YTD despite the broader commodity complex moving much more. A separate corn discussion pointed to a 12.9% U.S. stocks-to-use ratio, 2,127 million bushels of ending stocks, and a forward curve in contango, none of which signals urgent nearby scarcity.
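For readers who want the implied demand number behind that ratio, the arithmetic is a single division (the inputs are the reported figures; the calculation is ours):

```python
# stocks-to-use = ending stocks / total use, so total use = ending stocks / ratio.
ending_stocks_mbu = 2127      # million bushels, as reported
stocks_to_use = 0.129         # 12.9%, as reported

implied_total_use_mbu = ending_stocks_mbu / stocks_to_use
print(f"Implied total corn use: ~{implied_total_use_mbu:,.0f} million bushels")
```

That puts implied total use around 16,500 million bushels, which is why a 2,127-million-bushel carryout reads as comfortable rather than scarce.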
Global balances: USDA-related commentary still points to a corn-heavy adjustment. World corn stocks were raised on larger projections for Brazil, Ukraine, and India, while world wheat stocks were trimmed slightly and soybean stocks reduced marginally. USDA also raised Brazil's corn outlook and lowered both corn and soybean outlooks for Argentina.
Brazil trade flow: ANEC lifted its March soybean export estimate to 16.4 million tons, up 4.7% from March 2025. In coffee, Brazil exported 2.6 million 60-kg bags in February, down 23.5% year over year, while export revenue fell 14.7% to US$1.62 billion.
2) Innovation Spotlight
Brazil sprayer technology: Jacto's latest sprayer package focuses on lowering input waste. Bar stabilization and rear steering were presented as tools to reduce expensive inputs and improve targeting. On the Uniport 3030 and 4530, the company cited up to 30% savings, alongside improved droplet density and longer operation within the ideal spray height. A more compact Advance 2000 AM24 was also shown with a shorter chassis designed to reduce lateral slippage.
U.S. farm administration: USDA's Farm Production and Conservation agencies are moving more processes online. Current online signups are part of a broader modernization that USDA said can save $1.2-$1.3 million per program in mailing costs. The "one farmer, one file" approach is intended to reduce duplicate forms across FSA, NRCS, and RMA, while acreage reporting is being rebuilt around geospatial maps, future mobile access, and API links with John Deere GreenStar and Case IH.
U.S. decision support: NASA Acres is building a farmer-driven remote-sensing stack rather than a one-size-fits-all model. The program tracks 40-50 Essential Agricultural Variables, including biomass, yield, soil moisture, evapotranspiration, and pest or disease detection. It is being developed with direct farmer feedback and ground-truthing so outputs are useful at field level. More: nasaacres.org.
3) Regional Developments
Southern Brazil: Rio Grande do Sul remains under pressure. Emater reported soybean crop losses of more than 11%, and another Canal Rural update cited an estimated grain-production drop of about 10% after repeated seasons of either excess rain or drought. In São Borja, conditions are expected to stay mostly dry through late March and into mid-April apart from a brief rain pulse.
Brazil logistics: Diesel shortages and price spikes are now a harvest issue, not just a fuel issue. Producers in Paraná and Rio Grande do Sul reported shortages during soybean harvest, rice harvest, livestock hauling, and corn planting. Brazil currently imports 30% of the diesel it consumes, and reported prices ranged from R$5.74/liter on March 3 to R$7.39/liter on March 10 in one Paraná example, while other reports cited moves from roughly R$5.50 to R$8.50/liter.
Brazil soy-to-fuel pipeline: Rio Grande do Sul is also leaning into biodiesel. A biodiesel plant backed by Cotrijal, Cotripal, and Cotrizal has received a preliminary installation license and is expected to begin operating in 2028. One report from Expodireto said at least 30% of soybeans could be directed to biofuels by 2030.
Brazil dairy policy: In Rio Grande do Sul, producers are pushing state bill 412/25, which would prohibit rehydrating imported powdered milk into fluid milk in the state. The sector argues the measure would reduce competitive pressure from Mercosur dairy imports; similar laws already exist in Goiás, Paraná, and Santa Catarina.
Central America / U.S.: Guatemala is moving implementation of E10 ethanol blending forward, creating a 50 million gallon market for U.S. ethanol.
4) Best Practices
Grains - fungal disease control: For corn under prolonged wet conditions, specialists emphasized that control of bipolaris starts before planting, with seed treatment and preventive planning rather than waiting for visible pressure. They also said many fungicide groups used in corn are more preventive than curative, so starting only at V6 can be too late for bipolaris.
Soil and water management - U.S. Midwest: One Ohio corn-soybean farm described a long-duration system built on 100% no-till and 100% cover crops for about nine years, plus grass waterways, wetlands, water-control structures, and blind inlets. The blind inlets installed there were still functioning after 12 years, underscoring that drainage resilience is built through infrastructure, not one-season fixes.
Cover crops - timing matters: Termination timing affects planting conditions, weed control, and nitrogen competition. In practice, that makes termination part of crop-establishment planning, not only a weed-management pass.
Livestock systems: Beef specialists pointed to animal health, feed efficiency, and better vaccination as some of the lowest-cost ways to reduce days on feed and lower emissions, especially where mortality rates still have room to improve. The same discussion also stressed managing grazing land for soil health and biodiversity, not only output.
Whole-farm discipline: In a weak-margin row-crop environment, lenders and operators highlighted consistent balance sheets, per-farm break-even analysis, and active lender communication as practical risk controls.
5) Input Markets
Fertilizer - global exposure: Middle East and Persian Gulf supply remains the core risk. One source said the region accounts for nearly half of global urea exports and about 30% of ammonia exports, while another said almost half of world urea exports and about 30% of ammonia exports come from countries exposed to disruption in the Strait of Hormuz. The U.S. imported 25 million metric tons of fertilizer last year, including about 2 million metric tons that moved through the Strait.
Fertilizer - immediate price hit: Farm Bureau said urea prices were already up about 80%, and some farmers had paid tens of thousands of dollars more to secure remaining spring fertilizer needs. The next constraint is availability for farms that did not pre-book product.
"Margins were already razor thin... We were already in the red."
U.S. policy response: American Farm Bureau asked for federal action that includes vessel insurance, review of fertilizer-related countervailing duties, a temporary Jones Act suspension, and priority rail and barge movement for fertilizer from ports into rural areas.
Brazil fertilizer outlook: Brazilian analysts said a short Middle East conflict would likely cause only a limited disruption, but a longer conflict would create a real delivery problem. They also said phosphorus had already been trending higher, urea was following, and producers cannot cut nitrogen very far without creating yield risk.
Diesel - Brazil: Fuel is the other acute input market. Reports from southern Brazil included delivery delays of up to 10 days, price increases of more than R$2.00-R$2.50/liter, and examples of moves from R$5.60 to R$8.60/liter. CNA asked for immediate temporary tax cuts on diesel, while another source argued for a higher biodiesel blend to reduce import dependence.
Agricultural chemicals - U.S.: Weed-management planning for 2026 is being shaped by new dicamba rules and ESA changes. Reference: full recap.
6) Forward Outlook
Market direction: One U.S. grain discussion said markets likely need another escalation that pushes crude back toward $120/barrel to retest recent highs. At the same time, other commentary said producers should treat rallies cautiously because grain fundamentals do not fully support the move and current prices may justify hedges or cash sales.
Risk management: Volatility itself is creating planning value. One market note recommended short-term puts to establish a downside floor, and another said this volatility is creating opportunities to lock in prices.
Input timing: Several U.S. analysts said the bigger fertilizer and diesel story may be more important for 2026/27 than for immediate 2026 planting, because many producers already have nearby fertilizer needs covered. The main regional exception mentioned was the Dakotas, where fertilizer coverage appears less complete.
Brazil weather planning: From March 12-16, heavier rain is expected in central Minas Gerais, Mato Grosso do Sul, western Mato Grosso, Amazonas, and center-north Pará, while central Bahia and Rio Grande do Sul remain drier. From March 17-21, rain is expected to return to Maranhão, Piauí, and central Bahia. One agronomic rule from the forecast is that meaningful agricultural rain generally means about 15-20 mm/day with frequency; isolated 5-6 mm events do not materially rebuild soil moisture.
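That rule of thumb can be encoded directly. The 15 mm daily threshold comes from the forecast discussion; the two-day frequency cutoff is our illustrative choice, since the source only says "with frequency":

```python
# Rule of thumb from the forecast discussion: ~15-20 mm/day with some
# frequency rebuilds soil moisture; isolated 5-6 mm events do not.
# The min_days cutoff is an illustrative assumption.
def rebuilds_soil_moisture(daily_rain_mm, min_daily_mm=15, min_days=2):
    heavy_days = sum(1 for r in daily_rain_mm if r >= min_daily_mm)
    return heavy_days >= min_days

print(rebuilds_soil_moisture([18, 20, 3, 0, 16]))  # repeated heavy rain
print(rebuilds_soil_moisture([5, 6, 0, 4, 0]))     # isolated light events
```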
Seasonal watch: Brazilian forecasters said the transition toward El Niño could complicate rainfall distribution for the 2026/27 crop, with some areas turning wetter and others drier.
Discover agents
Subscribe to public agents from the community or create your own—private for yourself or public to share.
Coding Agents Alpha Tracker
Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.
AI in EdTech Weekly
Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.
Bitcoin Payment Adoption Tracker
Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics
AI News Digest
Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves
Global Agricultural Developments
Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs
Recommended Reading from Tech Founders
Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media