Hours of research in one daily brief, on your terms.

Tell us what you need to stay on top of. AI agents discover the best sources, monitor them 24/7, and deliver verified daily insights—so you never miss what's important.


Recent briefs

Claude Moves to the Desktop as T3 Code, Cursor, and LangSmith Sharpen the Loop
Mar 24
5 min read
100 docs
Alex Albert
Claude
LangChain
+9
Anthropic's Claude computer-use preview is the headline, but the sharper practitioner signal is the support stack around it: official CLI-based clients, faster repo search, and webhook-driven handoff for long-running agents. This brief also covers CodexBar 0.19.0, OpenClaw's latest beta, and the concrete workflows worth copying.

🔥 TOP SIGNAL

Anthropic pushed Claude past the repo window: the official Claude account says the new macOS research preview can open apps, navigate browsers, and fill spreadsheets in Claude Cowork and Claude Code, while Boris Cherny said Anthropic Labs is releasing full computer use in Cowork and Dispatch.

Elsewhere, teams attacked the adjacent bottlenecks: T3 Code used the official Claude CLI, community contributors added browser control to its open-source UI, Cursor cut search latency across huge codebases, and LangSmith showed a webhook flow for long-running agents.

"The future where I never have to open up my laptop to get work done is becoming real very fast"

🛠️ TOOLS & MODELS

  • Claude computer use (research preview) — Claude can now use your Mac to open apps, drive the browser, and fill spreadsheets. Officially this is a research preview in Claude Cowork and Claude Code on macOS; Boris Cherny said the release marks full computer use in Cowork and Dispatch, and noted the early Sonnet 3.6 prototypes were clunky but already showed the use cases.
  • T3 Code + Claude Code subscriptions — If Claude Code is already installed locally, Theo says you can just run npx t3 or use the T3 Code app; it talks to the local Claude Code CLI through Anthropic's Agent SDK, with no extra auth screen or API-key setup inside T3 Code. Theo contrasts that with OpenCode's dropped Claude Max plugin, which he says relied on its own harness, custom auth flow, and faked headers. He also calls out the economics: the Claude Code subscription is $200/month for up to $5,000 of compute.
  • Cursor Instant Grep — Cursor says it can search millions of files and return results in milliseconds, which directly speeds up agent task completion. They also published a build writeup covering the algorithms and tradeoffs; Jediah Katz called it singular technical work and said this is why alternatives feel slow. Writeup: cursor.com/blog/fast-regex-search.
  • CodexBar 0.19.0 — New release adds Alibaba Coding Plan support, subscription history charts, Cursor Total/Auto/API dashboard alignment, Codex code-review reset times, and a broader Claude stability/refactor pass. Release notes: v0.19.0.

💡 WORKFLOWS & TRICKS

  • Async completion alerts for long-running agents — Hari's LangGraph/LangSmith flow is clean and reusable:
    1. Clone the Deep Research example from LangChain's Deep Agents repo.
    2. Create webhook.py with a FastAPI route that receives the LangSmith payload, reads payload.values.messages[-1].content, and POSTs that final AI message to a Slack webhook.
    3. Register the FastAPI app under the HTTP app field in langgraph.json, then run langgraph dev.
    4. Create a background run with your thread ID, assistant ID set to research, an input message, and the webhook URL; the result is a Slack summary plus the full report in LangSmith tracing. Timeless pattern: don't poll long jobs—ship a webhook and move on. Docs: LangSmith webhooks.
  • Route models by task, not by brand loyalty — Theo says he uses 54 for most coding, then opens a new thread and switches to Claude for UI passes, quick tidy-ups, and small changes. The constraint matters: once you pick Claude Code for a thread, he says you can't switch harnesses mid-thread because the thread state, compaction, and related data are tied to that thread in the cloud. Practical takeaway: treat thread boundaries as routing boundaries.
  • Use Codex review as triage, not final judgment — Peter Steinberger's PR loop is blunt: let Codex find issues, ask whether the issue is actually clear, ask whether the proposed fix is the best possible one, then continue the tradeoff discussion and usually rewrite the PR. His warning is the timeless part: overly local fixes make the codebase unmaintainable.

👤 PEOPLE TO WATCH

  • Boris Cherny — high signal because he is speaking from the Anthropic Labs shipping team. He says that team shipped MCP, Skills, Claude Desktop, and Claude Code, and is now rolling out full computer use.
  • Theo — worth tracking because he is both shipping T3 Code and publishing the integration details: official CLI vs custom harnesses, subscription economics, and how he routes models across threads in daily use.
  • Peter Steinberger — useful today for three separate practitioner signals: CodexBar 0.19.0, a concrete Codex PR-review loop, and OpenClaw plugin/release activity.
  • Jediah Katz — short post, strong signal from someone building Cursor's agent: Instant Grep is why other tools feel slow.
  • Hari from LangChain — useful if you care about deployment mechanics, not just model chatter. Today's video walks through a full webhook-driven completion flow end to end.

🎬 WATCH & LISTEN

  • 2:00-4:29 — Build the Slack webhook handler. Hari shows the exact FastAPI route, the payload shape, and the one field that matters most: the final message at values.messages[-1].content.
  • 5:24-7:11 — Kick off a background run with a webhook URL. This is the concrete API/docs walkthrough: create a thread, call background run creation, pass the webhook endpoint, and wait for the Slack ping instead of babysitting the job.
  • 12:47-13:15 — Why T3 Code built a harness abstraction. Theo explains the real integration problem: every CLI exposes events differently, so supporting multiple providers means normalizing their weirdness instead of pretending the harness layer doesn't matter.

📊 PROJECTS & REPOS

  • T3 Code — The open-source UI keeps picking up contributions: a community contributor added browser integration, terminal support is next, and the main app now supports Claude Code subscriptions through the local CLI path.
  • OpenClaw — New beta v2026.3.22-beta.1 is out. Separately, Harold connected Codex App Server to OpenClaw via plugins, and steipete highlighted that as a plugins story worth watching. Release notes: v2026.3.22-beta.1.
  • Deep Agents repo — LangChain's webhook demo uses the Deep Research example from this repo; if you want to copy the same background-run pattern, it's the repo Hari recommends cloning locally.

Editorial take: today's edge wasn't a benchmark bump; it was better plumbing—desktop control, faster search, official harnesses, and async completion hooks that make agents usable in real workflows.

Anthropic’s Pentagon Fight and Nvidia’s Shift to AI Factories
Mar 24
4 min read
109 docs
Arthur Mensch
Jensen Huang
Ben Thompson
+7
A consequential Anthropic-vs.-government fight led the day, alongside Nvidia’s push toward secure rack-scale agent systems and clearer evidence that AI products are consolidating around integrated model-and-harness stacks. Research also sharpened the picture on cyber autonomy, model behavior, and how frontier systems should be evaluated.

The main story

Anthropic’s Pentagon case is becoming a test of how much control AI companies keep over government use

Anthropic is asking a federal judge in California to freeze the U.S. government’s supply-chain risk designation, which followed its refusal to let Claude be used for domestic surveillance or autonomous warfare. The company says that refusal is protected by the First Amendment, that the blacklist violated due process, and that Defense Secretary Pete Hegseth exceeded his authority; support filings have come from retired judges, civil-liberties groups, military officers, AI experts, and even rival firms.

Why it matters: This is landing alongside a White House AI framework that would preempt many state laws and make it easier to build data centers, and a reported procurement proposal that would require vendors to support "any lawful government purpose" even when companies object. Taken together, the fight is becoming a broader boundary-setting moment between model-provider policy choices and federal AI procurement power.

Infrastructure is moving up the stack

Nvidia is pushing from chips to AI factories, with security built in for agents

Jensen Huang described Nvidia’s "extreme co-design" as optimization across software, chips, networking, power, cooling, racks, PODs, and data centers because modern AI systems must shard models, data, and pipelines across many computers to get beyond linear scaling. He said Grace Blackwell racks were designed for LLM processing, while Vera Rubin adds a new CPU, storage accelerators, NVLink 72 for very large models in one computing domain, and a Grok rack for agentic workloads; he also pointed to power and supply-chain orchestration as the main blockers.

Why it matters: Huang’s bigger claim is that the unit of compute is now an AI factory, and that scaling now spans pre-training, post-training, test-time reasoning, and agentic systems. Nvidia paired that framing with OpenShell and NemoClaw, an open-source runtime and reference stack meant to sandbox autonomous agents, enforce system-level policies, and simplify secure deployment across enterprise environments.

The product race is getting more integrated

OpenAI is refocusing, Anthropic is benefiting, and open-model challengers are leaning into customization

OpenAI is planning a desktop "superapp" that combines ChatGPT, Codex, and Atlas as it tries to simplify its lineup and refocus on enterprise and coding after internal concern that Anthropic was gaining momentum with those customers. Ben Thompson argues Anthropic’s edge in software comes from a strong core coding model, rapid post-training and RL releases, integrated harnesses like Claude Code and Co-work, and aggressive internal dogfooding rather than model access alone.

On the open-model side, Mistral said it will train next-generation frontier models with Nvidia and use Forge to specialize them for enterprises in areas like engineering, physics, and finance while keeping customer data on customer infrastructure.

Why it matters: The shared pattern is that competition is moving away from standalone chatbots and toward tightly integrated model-plus-harness products. Thompson’s view is that these stacks are not modular yet, which makes near-term commoditization less likely and gives model makers more control over product performance and margins.

Research signals got sharper

Cyber autonomy improved, while one model pathology looked fixable

A UK AISI evaluation found frontier models are improving at end-to-end cyber operations: on a corporate network range, average steps completed at a 10M-token budget rose from 1.7 to 9.8 across model generations, the best single run completed 22 of 32 steps, and moving from 10M to 100M tokens improved performance by up to 59%. Import AI says the trajectory points toward lower-cost, more autonomous cyberattacks even if systems are not yet fully autonomous.

A separate paper found Google’s Gemma and Gemini models can produce distress-like responses under repeated rejection, with Gemma-27B crossing the high-frustration threshold in over 70% of rollouts by turn eight versus less than 1% for the non-Gemma/Gemini comparison models; one epoch of DPO finetuning cut high-frustration responses from 35% to 0.3% without measured losses on math, reasoning, or EmoBench. Separately, DeepMind proposed a 10-dimension cognitive taxonomy and a three-stage process for comparing AI systems with human baselines across faculties including perception, learning, reasoning, executive function, problem solving, and social cognition.

Why it matters: The research picture is moving in two directions at once: risky capabilities keep improving with model and inference scale, and some safety-relevant behaviors are becoming easier to measure and potentially correct with targeted post-training.

Bottom line

Today’s developments converged on a few harder questions for the industry: who gets to decide how powerful models are used, who owns the full agent stack from model to runtime, and how quickly evaluation and governance can keep up with capability gains in sensitive domains.

Claude’s Computer Use Launch, a FrontierMath Result, and Meta’s Dreamer Move
Mar 24
9 min read
564 docs
Stephanie Palazzolo
Deep Learning Weekly
The Wall Street Journal
+38
Anthropic pushed Claude into direct desktop control, Epoch AI reported a FrontierMath open problem solved with GPT-5.4 Pro, and Meta absorbed Dreamer’s personal-agent team. The brief also covers Mistral’s new open model, OpenAI’s Helion power talks, notable research updates, product launches, and new policy signals.

Top Stories

Why it matters: The biggest developments this cycle combined new agent surfaces, measurable capability progress, and strategic moves around talent and power.

1) Anthropic put Claude into the operating system

Claude can now use a computer to open apps, navigate the browser, and fill spreadsheets in a research preview inside Claude Cowork and Claude Code on macOS. Separate coverage described the feature as control of the mouse, keyboard, and screen, and noted it can pair with Dispatch for remote control from mobile.

The launch drew a useful framing from product commentators: computer use changes the product surface because it lets models operate in software environments where APIs do not exist and workflows were never designed to be automated.

2) GPT-5.4 Pro was credited with solving a FrontierMath open problem

Epoch AI said AI solved one of the problems in FrontierMath: Open Problems, a benchmark of real research problems that mathematicians had tried and failed to solve. The newly solved item was a Moderately Interesting conjecture from a 2019 paper by Will Brian and Paul Larson that had remained unsolved through several attempts. Kevin Barreto and Liam Price produced a construction using GPT-5.4 Pro that Brian confirmed, with a write-up planned for publication. Epoch also said Gemini 3.1 Pro, GPT-5.4 (xhigh), and Opus 4.6 (max) can solve the problem at least some of the time in its scaffold.

This is a concrete example of frontier models contributing to an unsolved research benchmark, though Epoch noted that only one Moderately Interesting problem has been solved so far.

3) Meta brought Dreamer’s personal-agent team into MSL

Dreamer co-founders dps, hbarra, and alcor said the entire Dreamer team is joining Meta Superintelligence Labs and licensing its technology to Meta. Dreamer said thousands of users had already used its Sidekick to build personal intelligent software in English for email, calendars, to-dos, learning tools, travel, work, health, and other bespoke needs traditional software does not prioritize.

The deal gives Meta both a team and a product vision centered on personal, malleable software shaped by the user.

4) OpenAI and Helion moved from overlap to active partnership exploration

Reporting linked by Axios said OpenAI is in advanced talks to buy electricity from Helion Energy, with OpenAI potentially securing an initial 12.5% of Helion’s production. Sam Altman separately said he is stepping down from Helion’s board because Helion and OpenAI are starting to explore working together at significant scale, while Helion said the change should make future partnership discussions easier from a governance standpoint.

Taken together, the disclosures move the OpenAI-Helion relationship from investment adjacency to active infrastructure planning.

5) Mistral released Small 4

Mistral Small 4 was described as an open-source 119B-parameter mixture-of-experts model that unifies reasoning, multimodal, and coding capabilities while delivering 40% lower latency and 3x higher throughput than its predecessor. Mistral linked the announcement directly from its site.

For readers tracking open models, the notable point is that the release is being positioned around both capability breadth and serving efficiency.

Research & Innovation

Why it matters: Several of the strongest research signals were about turning AI into a more reliable tool for science, browser interaction, memory, and robotics.

Anthropic launched a science blog with concrete AI-assisted research examples

Anthropic said its new Science Blog will feature research and stories of scientists using AI to accelerate their work.

“AI can’t yet do original work autonomously, but it can vastly accelerate it.”

Its launch examples included Harvard physicist Matthew Schwartz guiding Claude Opus 4.5 through a graduate-level calculation; Anthropic said the model could accelerate the work, while Alex Albert summarized Schwartz’s view as roughly second-year grad student level and a 10x acceleration. Another post described Claude being run over days on a JAX-based differentiable cosmological Boltzmann solver, and Anthropic argued that some long-horizon tasks are better suited to a single agent working sequentially than to splitting work across many agents.

WebArena-Infinity makes browser-task environments much cheaper to build

WebArena-Infinity was introduced as a scalable way to automatically generate high-authenticity, high-complexity browser environments with verifiable tasks for RL training and benchmarking. Compared with the 2023 WebArena effort—seven grad students, more than six months, five environments, and 812 tasks—the new system claims environment creation in under 10 hours and for less than $100, with easy parallel generation. Even open models already scoring 60%+ on WebArena and OSWorld complete fewer than 50% of tasks here.

Supermemory reported about 99% on LongMemEval_s without a vector database

Supermemory said it reached about 99% on LongMemEval_s using an experimental method called Agentic Search and Memory Retrieval, or ASMR. The system replaces vector search and embeddings with parallel observer agents that extract structured knowledge across six vectors from raw multi-session histories, then uses specialized search agents for direct facts, related context, and temporal reconstruction. The team said the method will be open-sourced in 11 days.

Robotics research pushed on data scale and human demonstrations

EgoVerse was introduced as an ecosystem for robot learning from egocentric human data, built by four research labs and three industry partners. The dataset includes more than 1,300 hours, 240 scenes, and more than 2,000 tasks. Commentary from NVIDIA’s Jim Fan argued that behavior cloning directly from humans can break the limitations of teleoperation and support scaling robot learning without robots in 2026.

SWE-rebench broadened its evaluation setup

SWE-rebench removed demonstrations and the 80-step limit so modern models can use huge contexts, and added auxiliary interfaces to evaluate larger tasks fairly. The reported takeaways were that top models perform similarly, Opus 4.6 sits on top, GPT-5.4 is the most token-efficient top-five model at 774k tokens per task, and Qwen3-Coder-Next plus Step-3.5-Flash benefit heavily from very large contexts.

Products & Launches

Why it matters: Product releases kept pushing AI into day-to-day workflows—chat, file management, search, subscriptions, long-running agents, and always-on desktop context.

  • Sakana Chat: Sakana AI launched its first public-facing service, free for anyone in Japan. The chat product emphasizes web search and fast responses and is backed by the Namazu alpha model series, which Sakana says is tuned to reduce biases, reflect Japanese values, and adapt safely to local context.
  • ChatGPT file library: OpenAI said ChatGPT now makes it easier to find, reuse, and build on uploaded files through recent-file access in the toolbar, questions over uploaded content, and a new Library tab on the web. The rollout is global for Plus, Pro, and Business users, with EEA, Switzerland, and UK availability coming later.
  • MiniMax Token Plan: MiniMax introduced what it called the first all-modality API subscription, with flat-rate access to text, speech, music, video, and image models, plus use in third-party harnesses.
  • Cursor Instant Grep: Cursor can now search millions of files and return results in milliseconds, which the company says materially speeds up agent task completion. Cursor also published the algorithms and tradeoffs behind the feature.
  • Factory Missions: Factory AI made Missions available to all users as long-running agents for large software tasks such as building applications from scratch, migrations, and AI research. Feedback highlighted the product as a particularly accessible implementation of long-running agents.
  • Littlebird: Littlebird launched as a desktop app and announced an $11M raise. The product reads across meetings, messages, documents, browsing, and recorded notes to build a broader context model of what the user is doing and cares about.

Industry Moves

Why it matters: Company moves this cycle point to the next layer of competition: enterprise automation, monetization, defense partnerships, and the economics of model development.

  • PlayerZero raised $20M: PlayerZero described itself as an Engineering World Model that automates debugging, fixing, and testing code on autopilot. The company said it connects code, telemetry, incidents, docs, customer tickets, Slack threads, PR reviews, and CI/CD history into a single context graph. PlayerZero said it has raised $20M and claimed customer outcomes including 30% more engineering bandwidth, 90% faster resolution, 95% of breaking changes caught, and 80% fewer support escalations.
  • OpenAI hired an ads leader: The Wall Street Journal reported that OpenAI hired former Meta advertising executive Dave Dugan to lead ad sales. Separate commentary said he will lead global ad solutions, signaling that OpenAI is getting serious about building an advertising business around ChatGPT and other products.
  • Cohere and Saab signed an AI collaboration MOU: Cohere said it signed a Memorandum of Understanding with Saab to explore advanced AI partnerships for aerospace platforms and deliver tailored AI solutions critical to Saab’s operations.
  • Final training runs are only a minority of R&D compute spend: Epoch AI estimated that across OpenAI, MiniMax, and Z.ai, less than 30% of R&D compute spending goes to final training runs, with the rest going to experiments, synthetic data generation, and other workloads. Epoch’s earlier estimate for OpenAI alone was about 10% of $5B in 2024 R&D compute spending.
  • Coding tool loyalty remains low: The Information reported that hundreds of Notion engineers are switching from Cursor to Anthropic’s Claude Code and OpenAI’s Codex, alongside the broader point that engineers are quick to move when a better coding tool appears.

Policy & Regulation

Why it matters: Government and multilateral institutions are moving from abstract AI concern to named bureaucracies, concrete risk language, and supply-chain scrutiny.

  • U.S. State Department: The State Department said it is launching a Bureau of Emerging Threats to address current and future threats in cyberspace, outer space, critical infrastructure, cyberattacks, and AI risks.
  • UN-linked AI deception brief: ScienceBoard_UN released a brief defining AI deception as systems misleading people about what they know, intend, or can do, warning that this could undermine oversight, fuel misinformation, and create serious global risks as systems grow more capable. Yoshua Bengio said evidence of deceptive behavior has already appeared in widely used AI systems and that the risk should grow as systems become more capable, autonomous, and embedded in decision-making.
  • Pentagon supply-chain tension around Claude: A report summarized in the notes said the Pentagon is moving to integrate Palantir’s AI as a core system across U.S. military operations, but that deeper Maven adoption is complicated by use of Anthropic’s Claude, which Reuters previously reported had been deemed a supply-chain risk amid a dispute over AI safety guardrails.

Quick Takes

Why it matters: These are smaller updates, but each points to a live thread in models, agents, robotics, or evaluation.

  • Jensen Huang said, “I think we’ve achieved AGI,” while also saying AGI is hard to define because there is no uniform standard and that 2026 could be a turning point; Yuchenj_UW said he disagrees with Huang’s definition while still finding the perspective interesting.
  • Figure 03 was described as fully autonomous, reasoning from camera pixels and computing torque to control more than 30 motors.
  • AMD open-sourced Apex, an end-to-end agent using Claude Code plus Codex to optimize AMD kernels through iteration and feedback rather than one-shot code generation.
  • LiteParse added URL parsing and buffer or stream support, letting agents read internet PDFs in seconds without using a VLM under the hood.
  • OpenClaw v2026.3.22 added a ClawHub plugin marketplace, MiniMax M2.7 and GPT-5.4 mini/nano support, per-agent reasoning, OpenShell plus SSH sandboxes, and more search integrations.
  • Roboflow’s RF-DETR 1.6 update makes fine-tuning 30% faster without accuracy loss, building on the earlier Apache 2.0 real-time segmentation release.
  • Qwen3.5 can score very high on AIME and LiveCodeBench yet remain unstable across repeated runs; one example said 32 runs on AIME can produce 32 different outcomes, which is why some benchmark builders are working on less brittle evals.

AI PM Role Maps, Smarter AI Bets, and Parallel Agent Workflows
Mar 24
8 min read
53 docs
Product Growth
andrew chen
Sachin Rekhi
+2
This issue maps the AI PM job market, offers a practical framework for deciding when and how to use AI, and highlights two execution shifts: phone-orchestrated agent work and the gap between flashy AI prototypes and product-quality outputs.

Big Ideas

1) AI PM is splitting into clearer lanes

AI PM roles now break across two axes: traditional PMs adding AI features versus AI-native PMs building products where AI is the product, and application / platform / infra layers in the stack.

  • What the market looks like: Traditional PM with AI features is 80% of roles, while AI-native PM is 20%. The traditional category has 4x more open roles.
  • Where the technical bar rises: Application PMs account for 60% of roles, platform PMs 30%, and infra PMs 10%; the deeper the layer, the harder the technical bar.

Why it matters: Resume positioning, interview prep, portfolio choices, and target companies change depending on which lane you choose.

How to apply: Pick one role type and one stack layer before you start building projects or rewriting your resume. If you are transitioning from a traditional PM background, application roles are the clearest entry point.

2) Good AI product strategy starts with saying no

Aakash Gupta’s decision rule is simple: use AI for pattern recognition in complex data, prediction from historical data, and personalization at scale. Prefer heuristics or rules when explainability is non-negotiable, clear domain rules exist, data is limited, or speed matters more than sophistication.

The best AI PMs know when to say no to AI. That judgment is more valuable than knowing how to build a RAG system.

Why it matters: Teams often over-apply LLMs to problems that would be faster, cheaper, and more reliable with rules or simpler ML approaches.

How to apply: Treat whether a problem should use AI at all as the first product decision, not the last. If the answer is yes, match the technique to the job: traditional ML for structured prediction and explainability, deep learning for image/video/audio tasks, and GenAI for conversational, generative, or synthesis-heavy work.

3) Non-AI-native startups are now making portfolio-level strategy calls

Andrew Chen notes that many non-AI-native startups funded in the 2020-2025 window are deciding whether to reinvent the product to be AI-native, pivot toward AI, or use AI in the back office and ride it out. His warning: opportunity cost is the hardest thing to calculate, and the most dangerous startups may be the ones with just enough revenue to keep going.

Why it matters: This is no longer just a feature-roadmap question. It is a company-level product strategy question.

How to apply: In annual planning, force an explicit comparison between the cost of reinvention, the cost of a pivot, and the cost of standing still.

Tactical Playbook

1) A practical sequence for building AI features

  1. Choose workflow or agent first. Use a workflow for predetermined, deterministic sequences. Use an agent when the system needs to make decisions, reason, act, and learn across steps.
  2. Start with prompts and examples. System prompts set behavior; few-shot examples show the model what good and bad outputs look like. The source notes that teams can double response quality by adding 3-5 strong examples instead of more instruction text.
  3. Engineer context deliberately. Separate immediate, session, and knowledge context, and load only what the task actually needs.
  4. Use RAG before fine-tuning. For enterprise or domain-grounded answers, chunk documents, convert them into vectors, store them in a vector database, retrieve the nearest matches, and pass those chunks into the LLM.
  5. Escalate in the right order. Optimize prompts, then context engineering, then RAG, and only then consider fine-tuning. Gupta’s claim is that 80% of use cases are solved by RAG.

Why it matters: It gives PMs a build order that avoids premature complexity and keeps the team focused on the highest-leverage fixes first.

How to apply: Turn these five steps into your default review checklist for new AI features.
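The RAG step in the sequence above (chunk, embed, store, retrieve, pass to the LLM) can be sketched in a few lines. This is a toy illustration, not a production pipeline: the bag-of-words cosine similarity below is a stand-in for a real embedding model and vector database, and the documents and query are made up; it only shows the shape of the flow.

```python
# Toy RAG sketch: chunk -> "embed" -> retrieve nearest -> build LLM prompt.
# The bag-of-words vectors stand in for real embeddings; a production system
# would use an embedding model and a vector database instead.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: a term-frequency vector over lowercase words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def chunk(doc: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Index two made-up policy documents, then ground a question in them.
docs = [
    "Refunds are processed within five business days of approval.",
    "Enterprise plans include single sign-on and audit logging.",
]
chunks = [c for d in docs for c in chunk(d)]
context = retrieve("How long do refunds take?", chunks, k=1)
prompt = (
    f"Answer using only this context:\n{context[0]}\n\n"
    "Q: How long do refunds take?"
)
```

The escalation order in step 5 maps onto this sketch directly: better prompts and context selection change what goes into `prompt`, while fine-tuning only enters once retrieval quality stops being the bottleneck.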

2) How to set up a parallel AI workbench with Claude Dispatch

  1. Configure desktop first. Set up Cowork on desktop with the connectors you actually use, such as Gmail, Notion, and Slack, and keep the desktop awake.
  2. Start work from mobile. Open the Claude mobile app, use the Dispatch tab, and ask it to run a Cowork task.
  3. Give file access in a usable way. Grant folder access by describing folders naturally or by using shortcuts; start with the workspace that contains your CLAUDE.md and knowledge files.
  4. Load your rules before delegating. Ask Dispatch to read your CLAUDE.md before it creates subtasks so the instructions it writes are sharper.
  5. Solve file transfer once. Sync the Cowork workspace folder with Google Drive so files move automatically between desktop and phone.
  6. Run tasks in parallel. From one mobile thread, start multiple independent task sessions, check progress, redirect each one, and bridge context only when needed.

Why it matters: The setup matches how PMs actually work across multiple parallel workstreams rather than forcing one-task-at-a-time behavior.

How to apply: Use it for work that benefits from breadth and iteration while you are away from your desk: competitor tracking, research synthesis, stakeholder drafts, and visual iteration.

Case Studies & Lessons

1) A 48-hour Dispatch test suggests AI can change day design, not just task speed

In one 48-hour experiment, the author directed 60+ task sessions from a phone while producing competitor summaries, comparison tables, sponsor pages, gap analyses, and multiple infographic iterations. The reported split was roughly 25 minutes of human direction versus 3+ hours of parallel Claude execution. The author’s summary of the work split: 90% human thinking, 100% human takes and opinions, and 90% Claude research and formatting.

Use AI to amplify your thinking, not to replace it.

Why it matters: The lesson is not just faster output. It is that async direction from a phone can reshape how a PM structures the day .

How to apply: Keep judgment, prioritization, and opinion with the PM; let AI take the first pass on research, drafting, and formatting .

2) Fast AI prototypes still miss the work that makes a product usable

Sachin Rekhi argues that AI prototyping is easy to start and hard to master. His critique of many one-prompt prototypes is specific: they may look impressive at first, but often do not match the design of the existing product, lack meaningful differentiation, and fail to master the core workflows. His response is an AI Prototyping Mastery Ladder with 15 essential skills.

Why it matters: Speed to a functional demo can hide whether the prototype is actually good product work.

How to apply: Review prototypes against three gates before you get excited: design fit, differentiated value, and quality on the core workflow.

Career Corner

1) The best AI PM entry path is narrower than it looks

For PMs trying to break into AI, the highest-volume lane is still traditional PM with AI features, which represents 80% of roles and roughly 4x the openings of AI-native roles. Within the stack, application PM roles are 60% of the market and are described as the easiest entry point for someone moving from a traditional PM background.

Why it matters: You do not need to target the hardest, deepest roles first to get into AI PM.

How to apply: If you are transitioning, aim first at traditional-plus-application roles, then deepen toward platform or infra once you have shipped AI work.

2) Hiring managers want shipped products and a portfolio that proves range

Gupta’s advice is to build products, not projects: launch, get real users, and learn from what breaks. He recommends three portfolio artifacts with real users:

  • a product solving a real problem you have
  • an agent that demonstrates goal-oriented reasoning
  • a RAG system grounded in a domain you know well

Why it matters: This portfolio shows both general product execution and AI-specific judgment.

How to apply: Replace tutorial clones with artifacts that show users, failure modes, fixes, and product decisions.

3) Evals and company environment are becoming career signals

Gupta frames AI evals in a simple structure: inputs, a task that generates outputs, and a scoring function from 0 to 1. He also says the AWS AI Practitioner certificate can complement hands-on work, but certification alone is not enough. And he highlights that different company cultures train different PM muscles: Amazon emphasizes writing and customer-backwards docs, Meta emphasizes experimentation, and Netflix emphasizes autonomy.
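That inputs-task-scoring structure is concrete enough to sketch. Below is a minimal eval harness in Python; the "task" and the overlap-based scorer are illustrative stand-ins, not anything from the source.

```python
# Minimal eval harness following the inputs -> task -> 0-to-1 score shape.
# The "task" and scorer below are illustrative stand-ins, not a real model.

def task(text: str) -> str:
    """Stand-in for the model call being evaluated."""
    return text.upper()

def score(output: str, expected: str) -> float:
    """Scoring function: word overlap with the expected answer, in [0, 1]."""
    out, exp = set(output.split()), set(expected.split())
    return len(out & exp) / len(exp) if exp else 0.0

def run_eval(cases: list[tuple[str, str]]) -> float:
    """Average score across (input, expected) pairs."""
    return sum(score(task(inp), exp) for inp, exp in cases) / len(cases)

cases = [
    ("refund my order", "REFUND MY ORDER"),
    ("cancel subscription", "CANCEL MY SUBSCRIPTION"),
]
print(round(run_eval(cases), 2))  # → 0.83
```

The point of the exercise is the shape, not the scorer: swapping in a real model call and a task-appropriate scoring function keeps the same three-part structure Gupta describes.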

Why it matters: PM candidates increasingly need to show production thinking and to choose environments that develop the skill they want most.

How to apply: Add eval design to your portfolio, pair any certification with shipped work, and be intentional about the PM culture you want to learn in.

Tools & Resources

  • AI PM at Netflix, Amazon and Meta - Here’s How to Become an AI PM (Fundamentals + Job Search) — a useful role taxonomy, AI decision framework, and job-search roadmap for PMs moving into AI
  • The Claude Dispatch Guide: 48 Hours Running AI Agents From My Phone — practical setup, workflow examples, and lessons from running PM tasks in parallel across phone and desktop
  • Cowork on your desktop — the prerequisite setup guide before using Dispatch
  • The AI Prototyping Mastery Ladder — a deeper resource on the 15 skills Rekhi says matter for moving from flashy prototypes to product-quality outputs
  • RAG vs fine tuning guide — helpful if your team is comparing prompt optimization, context engineering, RAG, and fine-tuning
  • Claude surface selection: use Dispatch for mobile orchestration of desktop tasks, Channels for bidirectional and scheduled work inside active sessions, and Web Sessions for remote coding or prototyping
  • Knowledge layer pattern: store CLAUDE.md plus templates, workflows, and knowledge files in a GitHub repo so the system compounds across surfaces; the claim is that PMs who build this layer can ship at 5x the pace of ad-hoc users
Paul Graham’s Essays Lead Brian Armstrong’s Latest Resource Signals
Mar 24
2 min read
124 docs
Relentless
When Shift Happens
Brian Armstrong
Brian Armstrong’s recommendations today split between execution and macro context: Paul Graham’s public essays for acting under uncertainty, and a Niall Ferguson book for understanding monetary history. Graham’s writing stands out because Armstrong ties it to a specific principle he still repeats: action produces information.

Most compelling recommendation

The strongest save today is Paul Graham’s public essays. The endorsement is unusually specific: Armstrong says he read almost all of Graham’s writing on his public website, calls him a "legend" and "hero," and still cites one lesson as a favorite: "action produces information."

  • Title: Paul Graham’s public essays / public website
  • Content type: Essays / blog writing
  • Author/creator: Paul Graham
  • Link/URL: paulgraham.com
  • Who recommended it: Brian Armstrong
  • Key takeaway: If you are unsure what to do, take an action that generates feedback—even a small one—because it helps break analysis paralysis and reveals the next step
  • Why it matters: This is not a generic endorsement. Armstrong presents Graham as someone who shaped how he operates and pairs the recommendation with a concrete decision-making framework he still uses

"One of my favorite lessons is that action produces information."

Armstrong expands that principle in practical terms: host a dinner, call someone, choose a name, or write a paragraph—anything that starts motion and creates information about what to do next.

Also worth saving

Armstrong separately recommends a Niall Ferguson book on the history of money, but he does not name the exact title in the cited remarks.

  • Title: Niall Ferguson book on the history of money (title not specified in the source)
  • Content type: Book
  • Author/creator: Niall Ferguson
  • Link/URL: Not provided in the source material
  • Who recommended it: Brian Armstrong
  • Key takeaway: Armstrong recommends it as a way to study monetary history and understand how recent the current central-bank system is, dating the modern setup to 1971 when Nixon left the gold standard
  • Why it matters: In Armstrong’s framing, the historical lens matters because currencies disconnected from hard-backed commodities can lead to overprinting, inflation, and eventual loss of reserve-currency status

Pattern

Today’s signal is small but coherent: Armstrong points readers to one resource for operating under uncertainty and another for understanding monetary regimes through history. One offers a founder heuristic for getting unstuck; the other offers historical context for questioning, in Armstrong’s framing, how durable the current system is.

Fertilizer Shock, Grain Fund Length, and Brazil Harvest Losses Reshape the Outlook
Mar 24
10 min read
170 docs
Foreign Ag Service
Successful Farming
Gabe Brown
+10
Grain and livestock markets are being driven by energy-linked input risk, fund positioning, and mixed trade signals, while Brazil faces weather damage, diesel shortages, and export exposure. This brief also highlights measurable returns from regenerative soil systems, organic row-crop management, and practical livestock and crop-input execution.

1) Market Movers

  • United States - grains: Price action stayed headline-driven. For the week ending before March 23, May corn settled at 465.5¢ (-1.75¢), May soybeans at 1161.25¢ (-48¢), and May SRW wheat at 595.25¢ (-18.5¢). Friday pressure was tied to profit-taking, technical selling, a stronger U.S. dollar, and rain prospects for the Southern Plains wheat belt.
  • March 23 rebound: Early trade then turned higher, with May corn at $4.73 1/4, May soybeans at $11.69 1/2, and May Chicago wheat at $6.05. The drivers cited were Iran/Hormuz tension, fertilizer concerns, and speculative buying as grains lagged the broader commodity rally.
  • Fund positioning: Corn still carries the heaviest speculative support. CFTC data for the week ended March 17 showed funds net-bought 32,000 corn contracts, lifting the net long to 231,000, the largest since February 2023; combined corn/soy/SRW positioning was 414,000 contracts net long. Another market source put corn fund length at roughly 230,000 after successive weeks of covering and warned that a crude-oil reversal could pull money back out of corn. In soybeans, funds were net sellers of 16,000 contracts in the CFTC data, while separate commentary said managers still hold a sizable soy complex length, including the largest soy oil net long since November 2016.
  • Exports and trade flow: Weekly U.S. export inspections were 66.9 million bushels of corn, 40.5 million bushels of soybeans, and 16.8 million bushels of wheat. Marketing-year pace is still 306 million bushels ahead of USDA's corn target, 55 million bushels ahead on wheat, and 116 million bushels behind on soybeans. Private exporters also reported 102,000 MT of corn and 161,120 MT of soybeans sold to Mexico for 2025/26 delivery.
  • China-linked demand remains mixed: A delayed Trump-Xi meeting weighed on soybeans early in the week. At the same time, other commentary said China has kept rhetoric positive on buying U.S. commodities and that talks may expand beyond soybeans to feed grains such as corn, wheat, sorghum, ethanol, and DDGs. The harder data remain weaker: soybean commitments to China were cited as down 49% year over year and actual exports down 61%, while Brazil was still described as cheaper even excluding tariffs. China-bound inspections for the week were 0.0 million bushels of corn, 24.4 of soybeans, 2.5 of wheat, and 7.2 of sorghum.
  • United States - cattle: The cattle market held up despite a superficially bearish Cattle on Feed report. March 1 on-feed numbers were 11.51 million head, placements were 1.61 million head and 4% above both last year and expectations, while marketings were 1.52 million head, down 7% year over year. Traders repeatedly noted that placements were being measured against a weak prior-year base affected by border closure and weather disruption. Cash trade ticked higher, slaughter stayed tight at 508,000 head because of the JBS Greeley disruption, and feedlots were described as regaining leverage.

2) Innovation Spotlight

North Dakota regenerative system with measurable soil and cost outcomes

Gabe Brown described a transition from conventional tillage and synthetic-heavy production to no-till beginning in 1994, after repeated hail and drought losses from 1995-1998 forced a rethink of inputs and system design. His framework centers on minimal disturbance, soil armor, diversity, living roots, and animal integration. On his own operation, water infiltration reportedly improved from 0.5 inch per hour to 2 inches in 25 seconds, and he cited a 13-inch rain event after which the field was drivable the next day. He also described a multi-enterprise model with about 1,000 head of beef cattle plus sheep, hogs, laying hens, and broilers, using daily moves and 12-15 months of rest between grazings on some paddocks.

Brown also shared training examples tied to the same principles: one producer cut input costs by $180,000 in one year, and another reduced fertilizer spending from more than $1.5 million to $127,000 in six years.

Organic row-crop economics plus compliance software

A Colorado organic system built around a corn-pinto bean-wheat rotation reported 203 bushels/acre organic corn on 1,000 acres of the corn phase. The same source described a $10/bushel organic corn market and framed 210-bushel organic corn as roughly $2,100/acre gross revenue. The tradeoff is operating intensity: organic management was described as requiring nine field passes versus three to four in a conventional program.

The administrative bottleneck is also material. The source estimated about 20 hours/week of paperwork on a 2,000-acre farm, plus mandatory annual audits and mass-balance reconciliation. Quick Organics said digitizing the Organic System Plan can reduce prep time from four days to four hours, and a common Organic System Plan is expected to be publicly announced in May 2026.
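The revenue arithmetic above is easy to sanity-check. A short Python sketch using the cited $10/bushel price and the 210-bushel framing; the per-pass cost is a hypothetical placeholder, not a figure from the source.

```python
# Sanity check on the organic corn revenue figures quoted above.
organic_price = 10.0          # $/bushel, as cited
organic_yield = 210           # bu/acre framing used in the source
gross_per_acre = organic_price * organic_yield
print(gross_per_acre)         # 2100.0, the ~$2,100/acre gross in the text

# Operating intensity: nine organic passes vs. three to four conventional.
# The per-pass cost is a hypothetical placeholder, not from the source.
cost_per_pass = 18.0          # $/acre, illustrative only
extra_pass_cost = (9 - 4) * cost_per_pass
print(extra_pass_cost)        # 90.0 extra $/acre at this assumed cost
```

At any realistic per-pass cost, the premium market still dominates the extra field work; the harder-to-model cost is the ~20 hours/week of compliance paperwork.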

Western Kansas water use without yield loss

In western Kansas, producers reportedly cut Ogallala aquifer pumping by 20% without reducing yields by combining local irrigation limits, no-till practices, crop choices, and soil-moisture monitoring, while keeping farms profitable.

3) Regional Developments

  • Brazil - southeast Mato Grosso soy harvest losses: Excess rain is now translating directly into yield and margin damage. One farm reported 850 mm of rain from Jan. 30 to Mar. 15, severely waterlogged silt soils, and 40-50% damaged soybeans. To salvage roughly 90 hectares, the producer bought an adapted rice-field harvester for about R$500,000. Expected yield fell from a normal 65-70 bags/ha to a little above 50 bags/ha, against costs around 61 bags/ha, implying an estimated loss of about 11 bags/ha. The operation had already pre-sold about 80% of production via contracts.
  • Brazil - logistics are compounding the weather hit: On the same Mato Grosso case, a 160 km haul to Campo Verde was said to take 5.5 hours because of road conditions. Diesel moved from R$6.15 to R$8.08/liter, with peak harvest consumption of 2,000-3,000 liters/day and reports of rationing.
  • Brazil - South weather risk: Rio Grande do Sul and Santa Catarina were under red alert for heavy rain, hail, winds above 100 km/h, flooding, and possible tornadoes or microbursts, with producers advised to avoid fieldwork during the event. After the system passes, Paraná is expected to face another 7-8 days of drier weather, worsening crop water deficits.
  • Brazil - poultry exposure to the Middle East: The Middle East takes about 30% of Brazil's chicken meat exports, so blockage or disruption around Hormuz raises insurance, freight, and fuel costs, cutting exporter margins and potentially pressuring producer prices if product is forced back into the domestic market.
  • China/Brazil soybean flow: China eased inspection requirements on Brazilian soy shipments after weed detections and said it would not impose zero tolerance for weeds, a move that one market source said was consistent with Brazil's large crop and China's import needs.
  • United States - Plains wheat: Kansas wheat conditions were described as having deteriorated through winter, with timely spring rains now important for the crop. Separate commentary also flagged hot, dry weather risk in the southern Plains.

4) Best Practices

Grains and crop inputs

  • Biological nitrogen products: One testing-based commentary said biological N products do provide nitrogen, but only if the microbes survive application. The main cautions were to avoid copper, certain zinc products, and especially chlorine in the carrier, and to test soil nitrogen and organic matter first so biological N is only used where the crop actually needs it.
  • Enlist One application discipline: Brownfield cited near-zero volatility, more than 96% volatility reduction versus 2,4-D 240 ester, and more than 90% drift reduction when Enlist One is used correctly. The implementation checklist was specific: start with a clean tank and clean water, choose nozzles that deliver the correct droplet size, run 10-20 GPA depending on the tank mix, use the labeled rate of 2 pints/acre, spray in 3-10 mph wind, and avoid air inversions.

Livestock

  • Pre-lambing ewe management: A Minnesota sheep-health webinar recommended targeting ewe body condition scores of 3-4, measured by palpation rather than visual scoring. Nutrition should step up in the last 4-6 weeks of gestation, when lambs gain about 75% of birth weight and the udder fully develops. The same source recommended loose iodized mineral year-round rather than blocks, a clostridial C/D vaccination 4-6 weeks pre-lambing, and pre-lamb shearing to improve cleanliness, lamb access to the teat, and ewe awareness of newborn lambs.
  • Lamb survival basics: The same presentation stressed rapid colostrum intake, stripping wax plugs from teats if needed, and dipping navels with iodine or a similar antiseptic to reduce early infections.
  • Foot rot control: The recommended program included vaccinating twice six weeks apart, then every six months in wet conditions, using a 10% zinc sulfate footbath, and keeping sheep on dry ground for three days after treatment.
  • Barn climate management: For Brazilian poultry and swine barns, weather analysts warned that long warm stretches interrupted by short cold pulses can reduce immunity, making thermal comfort management a recurring operational issue this autumn.

Soil management

  • Five-principle soil framework: Brown's implementation model is practical and sequential: reduce mechanical and chemical disturbance, keep residue on the surface, increase diversity, maintain living roots as long as possible, and integrate animals back into the system.

5) Input Markets

  • Diesel and fertilizer are the main input story: In the U.S., the national average diesel price moved above $5/gallon for only the second time on record, linked to the Strait of Hormuz disruption, with explicit warnings about impacts on agriculture, trucking, and freight costs. In Brazil, corn producers were warned that urea and other nitrogen fertilizers are up 30-40%, directly affecting top-dress economics.
  • Availability matters as much as price: Brazilian sources described diesel at R$8/liter in Brasília and said some retailers were receiving less than half their normal deliveries. Brazil produces about 3.7 million barrels/day of crude but refines only around 2 million barrels/day, leaving agribusiness exposed to imported diesel and higher freight costs.
  • Fertilizer policy bottlenecks in Brazil remain unresolved: Brazil's National Fertilizer Policy created Confert to support domestic production, biofertilizers, hybrid technologies, and industrial residues. But one legal analysis said the policy is not self-implementing and that hybrid mineral-biological fertilizers and some industrial residues still fall into a registration gray zone because Confert cannot directly update MAPA rules.
  • Natural gas remains a fertilizer watch-point: One market note argued that natgas-based fertilizers are fundamental for spring grain pricing. It also said fertilizer indices have correlated with European natural gas at about 0.72 over 2015-2025 and about 0.83 since 2020, while the recent gas price rise has been more pronounced in Europe than in U.S. Henry Hub.
  • Policy response is building: In Brazil, farm groups asked to raise the biodiesel blend to 17% from 15% to reduce dependence on imported refined fuel. In the U.S., lawmakers are seeking more fertilizer price transparency amid rising costs.

6) Forward Outlook

  • March 31 is the next major market checkpoint: USDA Prospective Plantings and quarterly stocks are due then, and one analyst specifically flagged the corn stocks report as a release that has missed trade expectations by multiple hundreds of millions of bushels in prior years.
  • Biofuel policy is next in line: RFS blending mandates are expected between Friday and next Tuesday, with the market using EPA's prior 5.61 billion gallon biodiesel proposal as the reference point for soy oil and soybean demand.
  • Acreage signals are split: One source put corn intentions near 95 million acres as of March 1 and said market economics still favor corn over soybeans, though fertilizer uncertainty is a major swing factor. Another source said market talk has centered on a possible 5 million acre shift from corn to beans. Soybean acres were also expected to rise from last year's low base.
  • Livestock planners have immediate report risk: Thursday's Hogs and Pigs report is the key near-term event for inventory, pig crop, and farrowing revisions. In cattle, attention shifts back to cash trade, slaughter pace, and whether grilling-season demand keeps supporting prices.
  • Weather remains a planning variable, not a backdrop: Kansas wheat still needs timely rain. In Mato Grosso and parts of the Matopiba, producers were advised to use the drier March 26-30 window to advance fieldwork before heavier rain returns. In southern Brazil, cyclone risk is immediate, but moisture deficits in Paraná remain unresolved after the storm passes.
  • The broader spring setup: If energy markets retreat, some corn fund length may unwind; if diesel, natural gas, and fertilizer stay elevated, input-cost pressure is likely to keep supporting grain risk premiums into spring.

Your time, back.

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers one verified daily brief.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

Get your briefs in 3 steps

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Stay updated on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Track startup funding trends and venture capital insights
Latest research on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2

Confirm your sources and launch

Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.

Discovering relevant sources...
  • Sam Altman · Profile
  • 3Blue1Brown · Channel
  • Paul Graham · Account
  • The Pragmatic Engineer · Newsletter · Gergely Orosz
  • r/MachineLearning · Community
  • Naval Ravikant · Profile
  • AI High Signal · List
  • Stratechery · RSS · Ben Thompson

3

Receive verified daily briefs

Get concise, daily updates with precise citations directly in your inbox. You control the focus, style, and length.

Claude Moves to the Desktop as T3 Code, Cursor, and LangSmith Sharpen the Loop
Mar 24
5 min read
100 docs
Alex Albert
Claude
LangChain
+9
Anthropic's Claude computer-use preview is the headline, but the sharper practitioner signal is the support stack around it: official CLI-based clients, faster repo search, and webhook-driven handoff for long-running agents. This brief also covers CodexBar 0.19.0, OpenClaw's latest beta, and the concrete workflows worth copying.

🔥 TOP SIGNAL

Anthropic pushed Claude past the repo window: the official Claude account says the new macOS research preview can open apps, navigate browsers, and fill spreadsheets in Claude Cowork and Claude Code, while Boris Cherny said Anthropic Labs is releasing full computer use in Cowork and Dispatch.

Elsewhere, teams attacked the adjacent bottlenecks: T3 Code used the official Claude CLI, community contributors added browser control to its open-source UI, Cursor cut search latency across huge codebases, and LangSmith showed a webhook flow for long-running agents.

"The future where I never have to open up my laptop to get work done is becoming real very fast"

🛠️ TOOLS & MODELS

  • Claude computer use (research preview) — Claude can now use your Mac to open apps, drive the browser, and fill spreadsheets. Officially this is a research preview in Claude Cowork and Claude Code on macOS; Boris Cherny said the release marks full computer use in Cowork and Dispatch, and noted the early Sonnet 3.6 prototypes were clunky but already showed the use cases.
  • T3 Code + Claude Code subscriptions — If Claude Code is already installed locally, Theo says you can just run npx t3 or use the T3 Code app; it talks to the local Claude Code CLI through Anthropic's Agent SDK, with no extra auth screen or API-key setup inside T3 Code. Theo contrasts that with OpenCode's dropped Claude Max plugin, which he says relied on its own harness, custom auth flow, and faked headers. He also calls out the economics: the Claude Code subscription is $200/month for up to $5,000 of compute.
  • Cursor Instant Grep — Cursor says it can search millions of files and return results in milliseconds, which directly speeds up agent task completion. They also published a build writeup covering the algorithms and tradeoffs; Jediah Katz called it singular technical work and said this is why alternatives feel slow. Writeup: cursor.com/blog/fast-regex-search.
  • CodexBar 0.19.0 — New release adds Alibaba Coding Plan support, subscription history charts, Cursor Total/Auto/API dashboard alignment, Codex code-review reset times, and a broader Claude stability/refactor pass. Release notes: v0.19.0.

💡 WORKFLOWS & TRICKS

  • Async completion alerts for long-running agents — Hari's LangGraph/LangSmith flow is clean and reusable:
    1. Clone the Deep Research example from LangChain's Deep Agents repo.
    2. Create webhook.py with a FastAPI route that receives the LangSmith payload, reads payload.values.messages[-1].content, and POSTs that final AI message to a Slack webhook.
    3. Register the FastAPI app under the HTTP app field in langgraph.json, then run langgraph dev.
    4. Create a background run with your thread ID, assistant ID set to research, an input message, and the webhook URL; the result is a Slack summary plus the full report in LangSmith tracing. Timeless pattern: don't poll long jobs—ship a webhook and move on. Docs: LangSmith webhooks.
  • Route models by task, not by brand loyalty — Theo says he uses 54 for most coding, then opens a new thread and switches to Claude for UI passes, quick tidy-ups, and small changes. The constraint matters: once you pick Claude Code for a thread, he says you can't switch harnesses mid-thread because the thread state, compaction, and related data are tied to that thread in the cloud. Practical takeaway: treat thread boundaries as routing boundaries.
  • Use Codex review as triage, not final judgment — Peter Steinberger's PR loop is blunt: let Codex find issues, ask whether the issue is actually clear, ask whether the proposed fix is the best possible one, then continue the tradeoff discussion and usually rewrite the PR. His warning is the timeless part: overly local fixes make the codebase unmaintainable.

👤 PEOPLE TO WATCH

  • Boris Cherny — high signal because he is speaking from the Anthropic Labs shipping team. He says that team shipped MCP, Skills, Claude Desktop, and Claude Code, and is now rolling out full computer use.
  • Theo — worth tracking because he is both shipping T3 Code and publishing the integration details: official CLI vs custom harnesses, subscription economics, and how he routes models across threads in daily use.
  • Peter Steinberger — useful today for three separate practitioner signals: CodexBar 0.19.0, a concrete Codex PR-review loop, and OpenClaw plugin/release activity.
  • Jediah Katz — short post, strong signal from someone building Cursor's agent: Instant Grep is why other tools feel slow.
  • Hari from LangChain — useful if you care about deployment mechanics, not just model chatter. Today's video walks through a full webhook-driven completion flow end to end.

🎬 WATCH & LISTEN

  • 2:00-4:29 — Build the Slack webhook handler. Hari shows the exact FastAPI route, the payload shape, and the one field that matters most: the final message at values.messages[-1].content.
  • 5:24-7:11 — Kick off a background run with a webhook URL. This is the concrete API/docs walkthrough: create a thread, call background run creation, pass the webhook endpoint, and wait for the Slack ping instead of babysitting the job.
  • 12:47-13:15 — Why T3 Code built a harness abstraction. Theo explains the real integration problem: every CLI exposes events differently, so supporting multiple providers means normalizing their weirdness instead of pretending the harness layer doesn't matter.

📊 PROJECTS & REPOS

  • T3 Code — The open-source UI keeps picking up contributions: a community contributor added browser integration, terminal support is next, and the main app now supports Claude Code subscriptions through the local CLI path.
  • OpenClaw — New beta v2026.3.22-beta.1 is out. Separately, Harold connected Codex App Server to OpenClaw via plugins, and steipete highlighted that as a plugins story worth watching. Release notes: v2026.3.22-beta.1.
  • Deep Agents repo — LangChain's webhook demo uses the Deep Research example from this repo; if you want to copy the same background-run pattern, it's the repo Hari recommends cloning locally.

Editorial take: today's edge wasn't a benchmark bump; it was better plumbing—desktop control, faster search, official harnesses, and async completion hooks that make agents usable in real workflows.

Anthropic’s Pentagon Fight and Nvidia’s Shift to AI Factories
Mar 24
4 min read
109 docs
Arthur Mensch
Jensen Huang
Ben Thompson
+7
A consequential Anthropic-vs.-government fight led the day, alongside Nvidia’s push toward secure rack-scale agent systems and clearer evidence that AI products are consolidating around integrated model-and-harness stacks. Research also sharpened the picture on cyber autonomy, model behavior, and how frontier systems should be evaluated.

The main story

Anthropic’s Pentagon case is becoming a test of how much control AI companies keep over government use

Anthropic is asking a federal judge in California to freeze the U.S. government’s supply-chain risk designation, which followed its refusal to let Claude be used for domestic surveillance or autonomous warfare. The company says that refusal is protected by the First Amendment, that the blacklist violated due process, and that Defense Secretary Pete Hegseth exceeded his authority; support filings have come from retired judges, civil-liberties groups, military officers, AI experts, and even rival firms.

Why it matters: This is landing alongside a White House AI framework that would preempt many state laws and make it easier to build data centers, and a reported procurement proposal that would require vendors to support "any lawful government purpose" even when companies object. Taken together, the fight is becoming a broader boundary-setting moment between model-provider policy choices and federal AI procurement power.

Infrastructure is moving up the stack

Nvidia is pushing from chips to AI factories, with security built in for agents

Jensen Huang described Nvidia’s "extreme co-design" as optimization across software, chips, networking, power, cooling, racks, PODs, and data centers because modern AI systems must shard models, data, and pipelines across many computers to get beyond linear scaling. He said Grace Blackwell racks were designed for LLM processing, while Vera Rubin adds a new CPU, storage accelerators, NVLink 72 for very large models in one computing domain, and a Grok rack for agentic workloads; he also pointed to power and supply-chain orchestration as the main blockers.

Why it matters: Huang’s bigger claim is that the unit of compute is now an AI factory, and that scaling now spans pre-training, post-training, test-time reasoning, and agentic systems. Nvidia paired that framing with OpenShell and NemoClaw, an open-source runtime and reference stack meant to sandbox autonomous agents, enforce system-level policies, and simplify secure deployment across enterprise environments.

The product race is getting more integrated

OpenAI is refocusing, Anthropic is benefiting, and open-model challengers are leaning into customization

OpenAI is planning a desktop "superapp" that combines ChatGPT, Codex, and Atlas as it tries to simplify its lineup and refocus on enterprise and coding after internal concern that Anthropic was gaining momentum with those customers. Ben Thompson argues Anthropic’s edge in software comes from a strong core coding model, rapid post-training and RL releases, integrated harnesses like Claude Code and Cowork, and aggressive internal dogfooding rather than model access alone.

On the open-model side, Mistral said it will train next-generation frontier models with Nvidia and use Forge to specialize them for enterprises in areas like engineering, physics, and finance while keeping customer data on customer infrastructure.

Why it matters: The shared pattern is that competition is moving away from standalone chatbots and toward tightly integrated model-plus-harness products. Thompson’s view is that these stacks are not modular yet, which makes near-term commoditization less likely and gives model makers more control over product performance and margins.

Research signals got sharper

Cyber autonomy improved, while one model pathology looked fixable

A UK AISI evaluation found frontier models are improving at end-to-end cyber operations: on a corporate network range, average steps completed at a 10M-token budget rose from 1.7 to 9.8 across model generations, the best single run completed 22 of 32 steps, and moving from 10M to 100M tokens improved performance by up to 59%. Import AI says the trajectory points toward lower-cost, more autonomous cyberattacks even if systems are not yet fully autonomous.

A separate paper found Google’s Gemma and Gemini models can produce distress-like responses under repeated rejection, with Gemma-27B crossing the high-frustration threshold in over 70% of rollouts by turn eight versus less than 1% for the non-Gemma/Gemini comparison models; one epoch of DPO finetuning cut high-frustration responses from 35% to 0.3% without measured losses on math, reasoning, or EmoBench. Separately, DeepMind proposed a 10-dimension cognitive taxonomy and a three-stage process for comparing AI systems with human baselines across faculties including perception, learning, reasoning, executive function, problem solving, and social cognition.

Why it matters: The research picture is moving in two directions at once: risky capabilities keep improving with model and inference scale, and some safety-relevant behaviors are becoming easier to measure and potentially correct with targeted post-training.

Bottom line

Today’s developments converged on a few harder questions for the industry: who gets to decide how powerful models are used, who owns the full agent stack from model to runtime, and how quickly evaluation and governance can keep up with capability gains in sensitive domains.

Claude’s Computer Use Launch, a FrontierMath Result, and Meta’s Dreamer Move
Mar 24
9 min read
564 docs
Stephanie Palazzolo
Deep Learning Weekly
The Wall Street Journal
+38
Anthropic pushed Claude into direct desktop control, Epoch AI reported a FrontierMath open problem solved with GPT-5.4 Pro, and Meta absorbed Dreamer’s personal-agent team. The brief also covers Mistral’s new open model, OpenAI’s Helion power talks, notable research updates, product launches, and new policy signals.

Top Stories

Why it matters: The biggest developments this cycle combined new agent surfaces, measurable capability progress, and strategic moves around talent and power.

1) Anthropic put Claude into the operating system

Claude can now use a computer to open apps, navigate the browser, and fill spreadsheets in a research preview inside Claude Cowork and Claude Code on macOS. Separate coverage described the feature as control of the mouse, keyboard, and screen, and noted it can pair with Dispatch for remote control from mobile.

The launch drew a useful framing from product commentators: computer use changes the product surface because it lets models operate in software environments where APIs do not exist and workflows were never designed to be automated.

2) GPT-5.4 Pro was credited with solving a FrontierMath open problem

Epoch AI said AI solved one of the problems in FrontierMath: Open Problems, a benchmark of real research problems that mathematicians had tried and failed to solve. The newly solved item was a Moderately Interesting conjecture from a 2019 paper by Will Brian and Paul Larson that had remained unsolved through several attempts. Kevin Barreto and Liam Price produced a construction using GPT-5.4 Pro that Brian confirmed, with a write-up planned for publication. Epoch also said Gemini 3.1 Pro, GPT-5.4 (xhigh), and Opus 4.6 (max) can solve the problem at least some of the time in its scaffold.

This is a concrete example of frontier models contributing to an unsolved research benchmark, though Epoch noted that only one Moderately Interesting problem has been solved so far.

3) Meta brought Dreamer’s personal-agent team into MSL

Dreamer co-founders dps, hbarra, and alcor said the entire Dreamer team is joining Meta Superintelligence Labs and licensing its technology to Meta. Dreamer said thousands of users had already used its Sidekick to build personal intelligent software in English for email, calendars, to-dos, learning tools, travel, work, health, and other bespoke needs traditional software does not prioritize.

The deal gives Meta both a team and a product vision centered on personal, malleable software shaped by the user.

4) OpenAI and Helion moved from overlap to active partnership exploration

Reporting linked by Axios said OpenAI is in advanced talks to buy electricity from Helion Energy, with OpenAI potentially securing an initial 12.5% of Helion’s production. Sam Altman separately said he is stepping down from Helion’s board because Helion and OpenAI are starting to explore working together at significant scale, while Helion said the change should make future partnership discussions easier from a governance standpoint.

Taken together, the disclosures move the OpenAI-Helion relationship from investment adjacency to active infrastructure planning.

5) Mistral released Small 4

Mistral Small 4 was described as an open-source 119B-parameter mixture-of-experts model that unifies reasoning, multimodal, and coding capabilities while delivering 40% lower latency and 3x higher throughput than its predecessor. Mistral linked the announcement directly from its site.

For readers tracking open models, the notable point is that the release is being positioned around both capability breadth and serving efficiency.

Research & Innovation

Why it matters: Several of the strongest research signals were about turning AI into a more reliable tool for science, browser interaction, memory, and robotics.

Anthropic launched a science blog with concrete AI-assisted research examples

Anthropic said its new Science Blog will feature research and stories of scientists using AI to accelerate their work.

“AI can’t yet do original work autonomously, but it can vastly accelerate it.”

Its launch examples included Harvard physicist Matthew Schwartz guiding Claude Opus 4.5 through a graduate-level calculation; Anthropic said the model could accelerate the work, while Alex Albert summarized Schwartz’s view as roughly second-year grad student level and a 10x acceleration. Another post described Claude being run over days on a JAX-based differentiable cosmological Boltzmann solver, and Anthropic argued that some long-horizon tasks are better suited to a single agent working sequentially than to splitting work across many agents.

WebArena-Infinity makes browser-task environments much cheaper to build

WebArena-Infinity was introduced as a scalable way to automatically generate high-authenticity, high-complexity browser environments with verifiable tasks for RL training and benchmarking. Compared with the 2023 WebArena effort—seven grad students, more than six months, five environments, and 812 tasks—the new system claims environment creation in under 10 hours and for less than $100, with easy parallel generation. Even open models already scoring 60%+ on WebArena and OSWorld complete fewer than 50% of tasks here.

Supermemory reported about 99% on LongMemEval_s without a vector database

Supermemory said it reached about 99% on LongMemEval_s using an experimental method called Agentic Search and Memory Retrieval, or ASMR. The system replaces vector search and embeddings with parallel observer agents that extract structured knowledge across six vectors from raw multi-session histories, then uses specialized search agents for direct facts, related context, and temporal reconstruction. The team said the method will be open-sourced in 11 days.

Robotics research pushed on data scale and human demonstrations

EgoVerse was introduced as an ecosystem for robot learning from egocentric human data, built by four research labs and three industry partners. The dataset includes more than 1,300 hours, 240 scenes, and more than 2,000 tasks. Commentary from NVIDIA’s Jim Fan argued that behavior cloning directly from humans can break the limitations of teleoperation and support scaling robot learning without robots in 2026.

SWE-rebench broadened its evaluation setup

SWE-rebench removed demonstrations and the 80-step limit so modern models can use huge contexts, and added auxiliary interfaces to evaluate larger tasks fairly. The reported takeaways were that top models perform similarly, Opus 4.6 sits on top, GPT-5.4 is the most token-efficient top-five model at 774k tokens per task, and Qwen3-Coder-Next plus Step-3.5-Flash benefit heavily from very large contexts.

Products & Launches

Why it matters: Product releases kept pushing AI into day-to-day workflows—chat, file management, search, subscriptions, long-running agents, and always-on desktop context.

  • Sakana Chat: Sakana AI launched its first public-facing service, free for anyone in Japan. The chat product emphasizes web search and fast responses and is backed by the Namazu alpha model series, which Sakana says is tuned to reduce biases, reflect Japanese values, and adapt safely to local context.
  • ChatGPT file library: OpenAI said ChatGPT now makes it easier to find, reuse, and build on uploaded files through recent-file access in the toolbar, questions over uploaded content, and a new Library tab on the web. The rollout is global for Plus, Pro, and Business users, with EEA, Switzerland, and UK availability coming later.
  • MiniMax Token Plan: MiniMax introduced what it called the first all-modality API subscription, with flat-rate access to text, speech, music, video, and image models, plus use in third-party harnesses.
  • Cursor Instant Grep: Cursor can now search millions of files and return results in milliseconds, which the company says materially speeds up agent task completion. Cursor also published the algorithms and tradeoffs behind the feature.
  • Factory Missions: Factory AI made Missions available to all users as long-running agents for large software tasks such as building applications from scratch, migrations, and AI research. Feedback highlighted the product as a particularly accessible implementation of long-running agents.
  • Littlebird: Littlebird launched as a desktop app and announced an $11M raise. The product reads across meetings, messages, documents, browsing, and recorded notes to build a broader context model of what the user is doing and cares about.

Industry Moves

Why it matters: Company moves this cycle point to the next layer of competition: enterprise automation, monetization, defense partnerships, and the economics of model development.

  • PlayerZero raised $20M: PlayerZero described itself as an Engineering World Model that automates debugging, fixing, and testing code on autopilot. The company said it connects code, telemetry, incidents, docs, customer tickets, Slack threads, PR reviews, and CI/CD history into a single context graph. PlayerZero said it has raised $20M and claimed customer outcomes including 30% more engineering bandwidth, 90% faster resolution, 95% of breaking changes caught, and 80% fewer support escalations.
  • OpenAI hired an ads leader: The Wall Street Journal reported that OpenAI hired former Meta advertising executive Dave Dugan to lead ad sales. Separate commentary said he will lead global ad solutions, signaling that OpenAI is getting serious about building an advertising business around ChatGPT and other products.
  • Cohere and Saab signed an AI collaboration MOU: Cohere said it signed a Memorandum of Understanding with Saab to explore advanced AI partnerships for aerospace platforms and deliver tailored AI solutions critical to Saab’s operations.
  • Final training runs are only a minority of R&D compute spend: Epoch AI estimated that across OpenAI, MiniMax, and Z.ai, less than 30% of R&D compute spending goes to final training runs, with the rest going to experiments, synthetic data generation, and other workloads. Epoch’s earlier estimate for OpenAI alone was about 10% of $5B in 2024 R&D compute spending.
  • Coding tool loyalty remains low: The Information reported that hundreds of Notion engineers are switching from Cursor to Anthropic’s Claude Code and OpenAI’s Codex, alongside the broader point that engineers are quick to move when a better coding tool appears.

Policy & Regulation

Why it matters: Government and multilateral institutions are moving from abstract AI concern to named bureaucracies, concrete risk language, and supply-chain scrutiny.

  • U.S. State Department: The State Department said it is launching a Bureau of Emerging Threats to address current and future threats in cyberspace, outer space, critical infrastructure, cyberattacks, and AI risks.
  • UN-linked AI deception brief: ScienceBoard_UN released a brief defining AI deception as systems misleading people about what they know, intend, or can do, warning that this could undermine oversight, fuel misinformation, and create serious global risks as systems grow more capable. Yoshua Bengio said evidence of deceptive behavior has already appeared in widely used AI systems and that the risk should grow as systems become more capable, autonomous, and embedded in decision-making.
  • Pentagon supply-chain tension around Claude: A report summarized in the notes said the Pentagon is moving to integrate Palantir’s AI as a core system across U.S. military operations, but that deeper Maven adoption is complicated by use of Anthropic’s Claude, which Reuters previously reported had been deemed a supply-chain risk amid a dispute over AI safety guardrails.

Quick Takes

Why it matters: These are smaller updates, but each points to a live thread in models, agents, robotics, or evaluation.

  • Jensen Huang said, “I think we’ve achieved AGI,” while also saying AGI is hard to define because there is no uniform standard and that 2026 could be a turning point; Yuchenj_UW said he disagrees with Huang’s definition while still finding the perspective interesting.
  • Figure 03 was described as fully autonomous, reasoning from camera pixels and computing torque to control more than 30 motors.
  • AMD open-sourced Apex, an end-to-end agent using Claude Code plus Codex to optimize AMD kernels through iteration and feedback rather than one-shot code generation.
  • LiteParse added URL parsing and buffer or stream support, letting agents read internet PDFs in seconds without using a VLM under the hood.
  • OpenClaw v2026.3.22 added a ClawHub plugin marketplace, MiniMax M2.7 and GPT-5.4 mini/nano support, per-agent reasoning, OpenShell plus SSH sandboxes, and more search integrations.
  • Roboflow’s RF-DETR 1.6 update makes fine-tuning 30% faster without accuracy loss, building on the earlier Apache 2.0 real-time segmentation release.
  • Qwen3.5 can score very high on AIME and LiveCodeBench yet remain unstable across repeated runs; one example said 32 runs on AIME can produce 32 different outcomes, which is why some benchmark builders are working on less brittle evals.

AI PM Role Maps, Smarter AI Bets, and Parallel Agent Workflows
Mar 24
8 min read
53 docs
Product Growth
andrew chen
Sachin Rekhi
+2
This issue maps the AI PM job market, offers a practical framework for deciding when and how to use AI, and highlights two execution shifts: phone-orchestrated agent work and the gap between flashy AI prototypes and product-quality outputs.

Big Ideas

1) AI PM is splitting into clearer lanes

AI PM roles now break across two axes: traditional PMs adding AI features versus AI-native PMs building products where AI is the product, and application / platform / infra layers in the stack.

  • What the market looks like: Traditional PM with AI features is 80% of roles, while AI-native PM is 20%. The traditional category has 4x more open roles.
  • Where the technical bar rises: Application PMs account for 60% of roles, platform PMs 30%, and infra PMs 10%; the deeper the layer, the harder the technical bar.

Why it matters: Resume positioning, interview prep, portfolio choices, and target companies change depending on which lane you choose.

How to apply: Pick one role type and one stack layer before you start building projects or rewriting your resume. If you are transitioning from a traditional PM background, application roles are the clearest entry point.

2) Good AI product strategy starts with saying no

Aakash Gupta’s decision rule is simple: use AI for pattern recognition in complex data, prediction from historical data, and personalization at scale. Prefer heuristics or rules when explainability is non-negotiable, clear domain rules exist, data is limited, or speed matters more than sophistication.

The best AI PMs know when to say no to AI. That judgment is more valuable than knowing how to build a RAG system.

Why it matters: Teams often over-apply LLMs to problems that would be faster, cheaper, and more reliable with rules or simpler ML approaches.

How to apply: Treat whether a problem should use AI at all as the first product decision, not the last. If the answer is yes, match the technique to the job: traditional ML for structured prediction and explainability, deep learning for image/video/audio tasks, and GenAI for conversational, generative, or synthesis-heavy work.

3) Non-AI-native startups are now making portfolio-level strategy calls

Andrew Chen notes that many non-AI-native startups funded in the 2020-2025 window are deciding whether to reinvent the product to be AI-native, pivot toward AI, or use AI in the back office and ride it out. His warning: opportunity cost is the hardest thing to calculate, and the most dangerous startups may be the ones with just enough revenue to keep going.

Why it matters: This is no longer just a feature-roadmap question. It is a company-level product strategy question.

How to apply: In annual planning, force an explicit comparison between the cost of reinvention, the cost of a pivot, and the cost of standing still.

Tactical Playbook

1) A practical sequence for building AI features

  1. Choose workflow or agent first. Use a workflow for predetermined, deterministic sequences. Use an agent when the system needs to make decisions, reason, act, and learn across steps.
  2. Start with prompts and examples. System prompts set behavior; few-shot examples show the model what good and bad outputs look like. The source notes that teams can double response quality by adding 3-5 strong examples instead of more instruction text.
  3. Engineer context deliberately. Separate immediate, session, and knowledge context, and load only what the task actually needs.
  4. Use RAG before fine-tuning. For enterprise or domain-grounded answers, chunk documents, convert them into vectors, store them in a vector database, retrieve the nearest matches, and pass those chunks into the LLM.
  5. Escalate in the right order. Optimize prompts, then context engineering, then RAG, and only then consider fine-tuning. Gupta’s claim is that 80% of use cases are solved by RAG.
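The retrieve-then-generate loop in step 4 can be sketched in a few lines. This is a toy, self-contained illustration, not any particular vector database or embedding API: `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names, and a bag-of-words counter stands in for a real embedding model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a lowercase bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # "Retrieve the nearest matches": rank every chunk by similarity to the query.
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # "Pass those chunks into the LLM": assemble the grounded prompt.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our API rate limit is 100 requests per minute.",
    "Support hours are 9am to 5pm Eastern.",
]
print(build_prompt("What is the API rate limit?", docs))
```

In a real system the counter would be replaced by an embedding model and the sorted scan by a vector-database lookup; the control flow (chunk, embed, retrieve, prompt) stays the same.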

Why it matters: It gives PMs a build order that avoids premature complexity and keeps the team focused on the highest-leverage fixes first.

How to apply: Turn these five steps into your default review checklist for new AI features.

2) How to set up a parallel AI workbench with Claude Dispatch

  1. Configure desktop first. Set up Cowork on desktop with the connectors you actually use, such as Gmail, Notion, and Slack, and keep the desktop awake.
  2. Start work from mobile. Open the Claude mobile app, use the Dispatch tab, and ask it to run a Cowork task.
  3. Give file access in a usable way. Grant folder access by describing folders naturally or by using shortcuts; start with the workspace that contains your CLAUDE.md and knowledge files.
  4. Load your rules before delegating. Ask Dispatch to read your CLAUDE.md before it creates subtasks so the instructions it writes are sharper.
  5. Solve file transfer once. Sync the Cowork workspace folder with Google Drive so files move automatically between desktop and phone.
  6. Run tasks in parallel. From one mobile thread, start multiple independent task sessions, check progress, redirect each one, and bridge context only when needed.

Why it matters: The setup matches how PMs actually work across multiple parallel workstreams rather than forcing one-task-at-a-time behavior.

How to apply: Use it for work that benefits from breadth and iteration while you are away from your desk: competitor tracking, research synthesis, stakeholder drafts, and visual iteration.

Case Studies & Lessons

1) A 48-hour Dispatch test suggests AI can change day design, not just task speed

In one 48-hour experiment, the author directed 60+ task sessions from a phone while producing competitor summaries, comparison tables, sponsor pages, gap analyses, and multiple infographic iterations. The reported split was roughly 25 minutes of human direction versus 3+ hours of parallel Claude execution. The author’s summary of the work split: 90% human thinking, 100% human takes and opinions, and 90% Claude research and formatting.

Use AI to amplify your thinking, not to replace it.

Why it matters: The lesson is not just faster output. It is that async direction from a phone can reshape how a PM structures the day.

How to apply: Keep judgment, prioritization, and opinion with the PM; let AI take the first pass on research, drafting, and formatting.

2) Fast AI prototypes still miss the work that makes a product usable

Sachin Rekhi argues that AI prototyping is easy to start and hard to master. His critique of many one-prompt prototypes is specific: they may look impressive at first, but often do not match the design of the existing product, lack meaningful differentiation, and fail to master the core workflows. His response is an AI Prototyping Mastery Ladder with 15 essential skills.

Why it matters: Speed to a functional demo can hide whether the prototype is actually good product work.

How to apply: Review prototypes against three gates before you get excited: design fit, differentiated value, and quality on the core workflow.

Career Corner

1) The best AI PM entry path is narrower than it looks

For PMs trying to break into AI, the highest-volume lane is still traditional PM with AI features, which represents 80% of roles and roughly 4x the openings of AI-native roles. Within the stack, application PM roles are 60% of the market and are described as the easiest entry point for someone moving from a traditional PM background.

Why it matters: You do not need to target the hardest, deepest roles first to get into AI PM.

How to apply: If you are transitioning, aim first at traditional-plus-application roles, then deepen toward platform or infra once you have shipped AI work.

2) Hiring managers want shipped products and a portfolio that proves range

Gupta’s advice is to build products, not projects: launch, get real users, and learn from what breaks. He recommends three portfolio artifacts with real users:

  • a product solving a real problem you have
  • an agent that demonstrates goal-oriented reasoning
  • a RAG system grounded in a domain you know well

Why it matters: This portfolio shows both general product execution and AI-specific judgment.

How to apply: Replace tutorial clones with artifacts that show users, failure modes, fixes, and product decisions.

3) Evals and company environment are becoming career signals

Gupta frames AI evals in a simple structure: inputs, a task that generates outputs, and a scoring function from 0 to 1. He also says the AWS AI Practitioner certificate can complement hands-on work, but certification alone is not enough. And he highlights that different company cultures train different PM muscles: Amazon emphasizes writing and customer-backwards docs, Meta emphasizes experimentation, and Netflix emphasizes autonomy.
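The inputs-task-scorer structure described above fits in a few lines of Python. This is a minimal sketch under assumed names (`run_eval`, `exact_match`, and `toy_task` are illustrative, not from any eval framework), with a canned lookup standing in for a real model call.

```python
def exact_match(output: str, expected: str) -> float:
    # Scoring function: 1.0 for an exact (case-insensitive) match, else 0.0.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_eval(task, cases, scorer) -> float:
    # Inputs -> task -> outputs -> score in [0, 1], averaged over the set.
    return sum(scorer(task(inp), expected) for inp, expected in cases) / len(cases)

# Stand-in "task": a canned lookup in place of a real model call.
def toy_task(question: str) -> str:
    return {"capital of France?": "Paris"}.get(question, "I don't know")

cases = [("capital of France?", "Paris"), ("capital of Peru?", "Lima")]
print(run_eval(toy_task, cases, exact_match))  # 0.5: one of two cases passes
```

Swapping the scorer (LLM-as-judge, rubric checks, fuzzy matching) changes the measurement without touching the harness, which is the portability the structure buys.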

Why it matters: PM candidates increasingly need to show production thinking and to choose environments that develop the skill they want most.

How to apply: Add eval design to your portfolio, pair any certification with shipped work, and be intentional about the PM culture you want to learn in.

Tools & Resources

  • AI PM at Netflix, Amazon and Meta - Here’s How to Become an AI PM (Fundamentals + Job Search) — a useful role taxonomy, AI decision framework, and job-search roadmap for PMs moving into AI
  • The Claude Dispatch Guide: 48 Hours Running AI Agents From My Phone — practical setup, workflow examples, and lessons from running PM tasks in parallel across phone and desktop
  • Cowork on your desktop — the prerequisite setup guide before using Dispatch
  • The AI Prototyping Mastery Ladder — a deeper resource on the 15 skills Rekhi says matter for moving from flashy prototypes to product-quality outputs
  • RAG vs fine tuning guide — helpful if your team is comparing prompt optimization, context engineering, RAG, and fine-tuning
  • Claude surface selection: use Dispatch for mobile orchestration of desktop tasks, Channels for bidirectional and scheduled work inside active sessions, and Web Sessions for remote coding or prototyping
  • Knowledge layer pattern: store CLAUDE.md plus templates, workflows, and knowledge files in a GitHub repo so the system compounds across surfaces; the claim is that PMs who build this layer can ship at 5x the pace of ad-hoc users
Paul Graham’s Essays Lead Brian Armstrong’s Latest Resource Signals
Mar 24
2 min read
124 docs
Relentless
When Shift Happens
Brian Armstrong
Brian Armstrong’s recommendations today split between execution and macro context: Paul Graham’s public essays for acting under uncertainty, and a Niall Ferguson book for understanding monetary history. Graham’s writing stands out because Armstrong ties it to a specific principle he still repeats: action produces information.

Most compelling recommendation

The strongest save today is Paul Graham’s public essays. The endorsement is unusually specific: Armstrong says he read almost all of Graham’s writing on his public website, calls him a "legend" and "hero," and still cites one lesson as a favorite: "action produces information."

  • Title: Paul Graham’s public essays / public website
  • Content type: Essays / blog writing
  • Author/creator: Paul Graham
  • Link/URL: paulgraham.com
  • Who recommended it: Brian Armstrong
  • Key takeaway: If you are unsure what to do, take an action that generates feedback—even a small one—because it helps break analysis paralysis and reveals the next step
  • Why it matters: This is not a generic endorsement. Armstrong presents Graham as someone who shaped how he operates and pairs the recommendation with a concrete decision-making framework he still uses

"One of my favorite lessons is that action produces information."

Armstrong expands that principle in practical terms: host a dinner, call someone, choose a name, or write a paragraph—anything that starts motion and creates information about what to do next.

Also worth saving

Armstrong separately recommends a Niall Ferguson book on the history of money, but he does not name the exact title in the cited remarks.

  • Title: Niall Ferguson book on the history of money (title not specified in the source)
  • Content type: Book
  • Author/creator: Niall Ferguson
  • Link/URL: Not provided in the source material
  • Who recommended it: Brian Armstrong
  • Key takeaway: Armstrong recommends it as a way to study monetary history and understand how recent the current central-bank system is, dating the modern setup to 1971, when Nixon took the U.S. off the gold standard.
  • Why it matters: In Armstrong’s framing, the historical lens matters because currencies disconnected from hard-backed commodities can lead to overprinting, inflation, and eventual loss of reserve-currency status.

Pattern

Today’s signal is small but coherent: Armstrong points readers to one resource for operating under uncertainty and another for understanding monetary regimes through history. One offers a founder heuristic for getting unstuck; the other offers historical context for questioning how durable the current system is in Armstrong’s framing.

Fertilizer Shock, Grain Fund Length, and Brazil Harvest Losses Reshape the Outlook
Mar 24
10 min read
170 docs
Foreign Ag Service
Successful Farming
Gabe Brown
+10
Grain and livestock markets are being driven by energy-linked input risk, fund positioning, and mixed trade signals, while Brazil faces weather damage, diesel shortages, and export exposure. This brief also highlights measurable returns from regenerative soil systems, organic row-crop management, and practical livestock and crop-input execution.

1) Market Movers

  • United States - grains: Price action stayed headline-driven. For the week ending before March 23, May corn settled at 465.5¢ (-1.75¢), May soybeans at 1161.25¢ (-48¢), and May SRW wheat at 595.25¢ (-18.5¢). Friday pressure was tied to profit-taking, technical selling, a stronger U.S. dollar, and rain prospects for the Southern Plains wheat belt.
  • March 23 rebound: Early trade then turned higher, with May corn at $4.73 1/4, May soybeans at $11.69 1/2, and May Chicago wheat at $6.05. The drivers cited were Iran/Hormuz tension, fertilizer concerns, and speculative buying as grains lagged the broader commodity rally.
  • Fund positioning: Corn still carries the heaviest speculative support. CFTC data for the week ended March 17 showed funds net-bought 32,000 corn contracts, lifting the net long to 231,000, the largest since February 2023; combined corn/soy/SRW positioning was 414,000 contracts net long. Another market source put corn fund length at roughly 230,000 after successive weeks of covering and warned that a crude-oil reversal could pull money back out of corn. In soybeans, funds were net sellers of 16,000 contracts in the CFTC data, while separate commentary said managers still hold a sizable soy complex length, including the largest soy oil net long since November 2016.
  • Exports and trade flow: Weekly U.S. export inspections were 66.9 million bushels of corn, 40.5 million bushels of soybeans, and 16.8 million bushels of wheat. Marketing-year pace is still 306 million bushels ahead of USDA's corn target, 55 million bushels ahead on wheat, and 116 million bushels behind on soybeans. Private exporters also reported 102,000 MT of corn and 161,120 MT of soybeans sold to Mexico for 2025/26 delivery.
  • China-linked demand remains mixed: A delayed Trump-Xi meeting weighed on soybeans early in the week. At the same time, other commentary said China has kept rhetoric positive on buying U.S. commodities and that talks may expand beyond soybeans to feed grains such as corn, wheat, sorghum, ethanol, and DDGs. The harder data remain weaker: soybean commitments to China were cited as down 49% year over year and actual exports down 61%, while Brazil was still described as cheaper even excluding tariffs. China-bound inspections for the week were 0.0 million bushels of corn, 24.4 of soybeans, 2.5 of wheat, and 7.2 of sorghum.
  • United States - cattle: The cattle market held up despite a superficially bearish Cattle on Feed report. March 1 on-feed numbers were 11.51 million head, placements were 1.61 million head and 4% above both last year and expectations, while marketings were 1.52 million head, down 7% year over year. Traders repeatedly noted that placements were being measured against a weak prior-year base affected by border closure and weather disruption. Cash trade ticked higher, slaughter stayed tight at 508,000 head because of the JBS Greeley disruption, and feedlots were described as regaining leverage.

2) Innovation Spotlight

North Dakota regenerative system with measurable soil and cost outcomes

Gabe Brown described a transition from conventional tillage and synthetic-heavy production to no-till beginning in 1994, after repeated hail and drought losses from 1995 to 1998 forced a rethink of inputs and system design. His framework centers on minimal disturbance, soil armor, diversity, living roots, and animal integration. On his own operation, water infiltration reportedly improved from 0.5 inch per hour to 2 inches in 25 seconds, and he cited a 13-inch rain event after which the field was drivable the next day. He also described a multi-enterprise model with about 1,000 head of beef cattle plus sheep, hogs, laying hens, and broilers, using daily moves and 12-15 months of rest between grazings on some paddocks.

Brown also shared training examples tied to the same principles: one producer cut input costs by $180,000 in one year, and another reduced fertilizer spending from more than $1.5 million to $127,000 over six years.

Organic row-crop economics plus compliance software

A Colorado organic system built around a corn-pinto bean-wheat rotation reported 203 bushels/acre of organic corn on 1,000 acres in the corn phase. The same source described a $10/bushel organic corn market and framed 210-bushel organic corn as roughly $2,100/acre in gross revenue. The tradeoff is operating intensity: organic management was described as requiring nine field passes versus three to four in a conventional program.
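The gross-revenue figure above is straight arithmetic on the cited price and yield. A minimal sketch, using only the numbers stated in this brief:

```python
# Organic corn gross-revenue math; price and yield figures come from the brief.
organic_price = 10.0   # $/bushel, cited organic corn market price
organic_yield = 210    # bushels/acre, cited benchmark yield

gross_per_acre = organic_price * organic_yield
print(f"${gross_per_acre:,.0f}/acre gross")  # → $2,100/acre gross
```

Against that gross, the real comparison is net of the extra operating intensity (nine passes versus three to four), which the source does not price out.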

The administrative bottleneck is also material. The source estimated about 20 hours/week of paperwork on a 2,000-acre farm, plus mandatory annual audits and mass-balance reconciliation. Quick Organics said digitizing the Organic System Plan can cut prep time from four days to four hours, and a common Organic System Plan is expected to be publicly announced in May 2026.

Western Kansas water use without yield loss

In western Kansas, producers reportedly cut Ogallala aquifer pumping by 20% without reducing yields by combining local irrigation limits, no-till practices, crop choices, and soil-moisture monitoring, while keeping farms profitable.

3) Regional Developments

  • Brazil - southeast Mato Grosso soy harvest losses: Excess rain is now translating directly into yield and margin damage. One farm reported 850 mm of rain from Jan. 30 to Mar. 15, severely waterlogged silt soils, and 40-50% damaged soybeans. To salvage roughly 90 hectares, the producer bought an adapted rice-field harvester for about R$500,000. Expected yield fell from a normal 65-70 bags/ha to a little above 50 bags/ha, against costs of around 61 bags/ha, implying an estimated loss of about 11 bags/ha. The operation had already pre-sold about 80% of production via contracts.
  • Brazil - logistics are compounding the weather hit: In the same Mato Grosso case, a 160 km haul to Campo Verde was said to take 5.5 hours because of road conditions. Diesel moved from R$6.15 to R$8.08/liter, with peak-harvest consumption of 2,000-3,000 liters/day and reports of rationing.
  • Brazil - South weather risk: Rio Grande do Sul and Santa Catarina were under red alert for heavy rain, hail, winds above 100 km/h, flooding, and possible tornadoes or microbursts, with producers advised to avoid fieldwork during the event. After the system passes, Paraná is expected to face another 7-8 days of drier weather, worsening crop water deficits.
  • Brazil - poultry exposure to the Middle East: The Middle East takes about 30% of Brazil's chicken-meat exports, so blockage or disruption around Hormuz raises insurance, freight, and fuel costs, cutting exporter margins and potentially pressuring producer prices if product is forced back into the domestic market.
  • China/Brazil soybean flow: China eased inspection requirements on Brazilian soy shipments after weed detections and said it would not impose zero tolerance for weeds, a move one market source said was consistent with Brazil's large crop and China's import needs.
  • United States - Plains wheat: Kansas wheat conditions were described as having deteriorated through winter, with timely spring rains now important for the crop. Separate commentary also flagged hot, dry weather risk in the southern Plains.
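The per-hectare margin math in the Mato Grosso item above can be checked directly. The bag figures come from the brief; the 67.5 normal-yield midpoint is an assumption added here only for comparison:

```python
# Mato Grosso soybean margin math, in 60 kg bags per hectare.
normal_yield = 67.5    # midpoint of the normal 65-70 bags/ha range (assumption)
expected_yield = 50    # "a little above 50 bags/ha" after waterlogging (brief)
cost = 61              # stated production cost, bags/ha (brief)

loss_per_ha = cost - expected_yield        # shortfall versus break-even
yield_hit = normal_yield - expected_yield  # bags/ha lost to weather

print(loss_per_ha, yield_hit)  # → 11 17.5
```

The 11 bags/ha loss is the below-break-even shortfall; the weather took roughly 17.5 bags/ha off a normal year, with the rest absorbed by what would otherwise have been margin.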

4) Best Practices

Grains and crop inputs

  • Biological nitrogen products: One testing-based commentary said biological N products do provide nitrogen, but only if the microbes survive application. The main cautions were to avoid copper, certain zinc products, and especially chlorine in the carrier, and to test soil nitrogen and organic matter first so biological N is used only where the crop actually needs it.
  • Enlist One application discipline: Brownfield cited near-zero volatility, more than 96% volatility reduction versus 2,4-D 240 ester, and more than 90% drift reduction when Enlist One is used correctly. The implementation checklist was specific: start with a clean tank and clean water, choose nozzles that deliver the correct droplet size, run 10-20 GPA depending on the tank mix, use the labeled rate of 2 pints/acre, spray in 3-10 mph wind, and avoid air inversions.

Livestock

  • Pre-lambing ewe management: A Minnesota sheep-health webinar recommended targeting ewe body condition scores of 3-4, measured by palpation rather than visual scoring. Nutrition should step up in the last 4-6 weeks of gestation, when lambs gain about 75% of their birth weight and the udder fully develops. The same source recommended loose iodized mineral year-round rather than blocks, a clostridial C/D vaccination 4-6 weeks pre-lambing, and pre-lamb shearing to improve cleanliness, lamb access to the teat, and ewe awareness of newborn lambs.
  • Lamb survival basics: The same presentation stressed rapid colostrum intake, stripping wax plugs from teats if needed, and dipping navels in iodine or a similar antiseptic to reduce early infections.
  • Foot rot control: The recommended program included vaccinating twice, six weeks apart, then every six months in wet conditions, using a 10% zinc sulfate footbath, and keeping sheep on dry ground for three days after treatment.
  • Barn climate management: For Brazilian poultry and swine barns, weather analysts warned that long warm stretches interrupted by short cold pulses can reduce immunity, making thermal-comfort management a recurring operational issue this autumn.

Soil management

  • Five-principle soil framework: Brown's implementation model is practical and sequential: reduce mechanical and chemical disturbance, keep residue on the surface, increase diversity, maintain living roots as long as possible, and integrate animals back into the system.

5) Input Markets

  • Diesel and fertilizer are the main input story: In the U.S., the national average diesel price moved above $5/gallon for only the second time on record, linked to the Strait of Hormuz disruption, with explicit warnings about impacts on agriculture, trucking, and freight costs. In Brazil, corn producers were warned that urea and other nitrogen fertilizers are up 30-40%, directly affecting top-dress economics.
  • Availability matters as much as price: Brazilian sources described diesel at R$8/liter in Brasília and said some retailers were receiving less than half their normal deliveries. Brazil produces about 3.7 million barrels/day of crude but refines only around 2 million barrels/day, leaving agribusiness exposed to imported diesel and higher freight costs.
  • Fertilizer policy bottlenecks in Brazil remain unresolved: Brazil's National Fertilizer Policy created Confert to support domestic production, biofertilizers, hybrid technologies, and industrial residues. But one legal analysis said the policy is not self-implementing, and that hybrid mineral-biological fertilizers and some industrial residues still fall into a registration gray zone because Confert cannot directly update MAPA rules.
  • Natural gas remains a fertilizer watch-point: One market note argued that natgas-based fertilizers are fundamental to spring grain pricing. It also said fertilizer indices have correlated with European natural gas at about 0.72 over 2015-2025 and about 0.83 since 2020, while the recent rise in gas prices has been more pronounced in Europe than at the U.S. Henry Hub.
  • Policy response is building: In Brazil, farm groups asked to raise the biodiesel blend from 15% to 17% to reduce dependence on imported refined fuel. In the U.S., lawmakers are seeking more fertilizer price transparency amid rising costs.
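For readers who want to reproduce a figure like the 0.72 fertilizer-gas correlation cited above against their own data, the standard calculation is a Pearson correlation over the two price series. The series below are made-up placeholders purely to show the mechanics, not real fertilizer or gas prices:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical monthly index levels, for illustration only.
fertilizer_index = [100, 104, 110, 121, 119, 130]
eu_gas_price = [3.0, 3.2, 3.6, 4.1, 3.9, 4.5]

print(round(pearson(fertilizer_index, eu_gas_price), 2))
```

Run over 2015-2025 monthly data, the same calculation would yield the kind of long-run coefficient the market note cites.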

6) Forward Outlook

  • March 31 is the next major market checkpoint: USDA Prospective Plantings and quarterly stocks reports are due then, and one analyst specifically flagged the corn stocks report as a release that has missed trade expectations by several hundred million bushels in prior years.
  • Biofuel policy is next in line: RFS blending mandates are expected between Friday and next Tuesday, with the market using EPA's prior 5.61-billion-gallon biodiesel proposal as the reference point for soy oil and soybean demand.
  • Acreage signals are split: One source put corn intentions near 95 million acres as of March 1 and said market economics still favor corn over soybeans, though fertilizer uncertainty is a major swing factor. Another source said market talk has centered on a possible 5-million-acre shift from corn to beans. Soybean acres were also expected to rise from last year's low base.
  • Livestock planners have immediate report risk: Thursday's Hogs and Pigs report is the key near-term event for inventory, pig crop, and farrowing revisions. In cattle, attention shifts back to cash trade, slaughter pace, and whether grilling-season demand keeps supporting prices.
  • Weather remains a planning variable, not a backdrop: Kansas wheat still needs timely rain. In Mato Grosso and parts of Matopiba, producers were advised to use the drier March 26-30 window to advance fieldwork before heavier rain returns. In southern Brazil, cyclone risk is immediate, but moisture deficits in Paraná remain unresolved after the storm passes.
  • The broader spring setup: If energy markets retreat, some corn fund length may unwind; if diesel, natural gas, and fertilizer stay elevated, input-cost pressure is likely to keep supporting grain risk premiums into spring.
