Hours of research in one daily brief, on your terms.

Tell us what you need to stay on top of. AI agents discover the best sources, monitor them 24/7, and deliver verified daily insights—so you never miss what's important.


Recent briefs

4-day autonomous agents, Cursor MCP Apps, and the push from code review to evidence
Mar 4
4 min read
111 docs
Romain Huet
Tristan Rhodes
Latent.Space
+8
Cursor claims a 4-day fully autonomous agent run produced a stronger-than-official solution to a math research challenge—suggesting coordination techniques may generalize beyond coding. Also: Cursor’s MCP Apps (interactive UIs in-chat), model/tool value debates (Codex vs others), and concrete execution patterns like Implementation Plans, 95%-confidence autopilot loops, and async checkpoints.

🔥 TOP SIGNAL

Cursor’s CEO says their agent harness ran fully autonomously for 4 days (no nudges/hints) and produced what they believe is a novel, stronger solution to Problem Six of the First Proof math research challenge—an early signal that “scaling agent coordination” may generalize beyond coding tasks. The claimed improvements include using the Marcus–Spielman–Srivastava interlacing polynomial method, improving a constant from c=0.03 → 0.13, and partitioning the entire vertex set into light components (vs a subset).

🛠️ TOOLS & MODELS

  • Cursor — MCP Apps support (new): Cursor now supports MCP Apps, so agents can render interactive UIs inside conversations.

  • OpenAI Codex — “most agentic coding per dollar” (practitioner claim): Romain Huet says Codex is currently the best option by far for agentic coding value.

  • Antigravity (agentic coding platform) — “Implementation Plan” + screenshot-to-Flutter UI

    • Recommended flow: request an “Implementation Plan” artifact first, review/edit the markdown architecture/logic, then approve execution—explicitly warning “don’t let AI write code blindly”.
    • “Screenshot → functional Flutter UI” demo: drop a screenshot and ask to rebuild as Flutter UI; described as powered by Gemini 3 Flash and launching on-device.
  • Claude Opus 4.5 / 4.6 (Copilot workflow) — quality jump (firsthand): Burke Holland describes Opus as a practical inflection point for building tools quickly, contrasting it with Sonnet 3.5 output he calls “spaghetti code” and “willy nilly” changes.

💡 WORKFLOWS & TRICKS

  • Steal this: “Implementation Plan → approve → execute” as your default safety rail (Antigravity)

    1. Ask the agent for an Implementation Plan artifact first.
    2. Review and edit the architecture + markdown logic yourself.
    3. Only then approve execution (the explicit goal: control the outcome vs blind codegen).
  • Plan mode isn’t about the plan—it’s about flushing missing constraints (Burke Holland)

    • Start in “plan mode” and do 4–6 loops where the agent proposes what you forgot to specify + multiple options, before you let it implement.
  • Autopilot / loop-until-confidence (Burke Holland)

    • Run the agent in a loop that feeds its output back into itself, but change the stop condition from “until it’s done” to “until you have 95% confidence it’s done”.
  • Task classification + model routing + sub-agent fanout (multi-model orchestration) (Burke Holland)

    • Use a “front” agent to classify tasks as easy/medium/hard and change the workflow accordingly (hard tasks: plan + sub-agents + farm-out work).
    • In the described Copilot setup, different models can be used in one run (example routing: Gemini for design, other models for refactoring) and scaled up to many sub-agents—but the workflow must still output something verifiable.
  • Async agent + human checkpoints (Burke Holland)

    • Pattern: give the CLI a big job, walk away, and have it message you (example: Telegram) with progress + a “what next?” checkpoint so you can approve/deny and let it continue.
  • Reality check: “polish” is still synchronous (Kent C. Dodds)

    • Kent calls out that with cloud agents, polish requires real-time back-and-forth while you try outputs and iterate—hard to do asynchronously from phone/desktop today.
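The loop-until-confidence trick is easy to wire up. A minimal Python sketch: the agent call and its self-reported confidence score are stand-ins here (a real version would call your coding agent, then prompt it to rate, 0 to 1, how confident it is that the task is actually complete):

```python
def run_agent_step(task: str, prior_output: str) -> tuple[str, float]:
    """Hypothetical stand-in for one agent iteration.

    A real implementation would call your agent with the task plus its own
    prior output, then ask it to self-assess completion confidence.
    Here we fake monotonically rising confidence so the sketch is runnable.
    """
    confidence = min(1.0, 0.5 + 0.12 * len(prior_output.split("|")))
    return prior_output + "|step", confidence

def autopilot(task: str, target_confidence: float = 0.95, max_iters: int = 20) -> str:
    """Feed the agent's output back into itself; stop when it reports
    >= target_confidence that it's done, not merely when it says 'done'."""
    output, confidence = "", 0.0
    for _ in range(max_iters):
        output, confidence = run_agent_step(task, output)
        if confidence >= target_confidence:
            break
    return output
```

The design point is the stop condition: bounding iterations by self-assessed confidence (with a hard `max_iters` cap) forces extra self-checking passes instead of accepting the first "done".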
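The classify-then-route pattern can be sketched the same way; the classifier below is a keyword heuristic standing in for a real "front" agent, and the model names and route fields are purely illustrative:

```python
def classify(task: str) -> str:
    """Hypothetical 'front agent': in practice an LLM call that labels a
    task easy / medium / hard; here, a trivial keyword heuristic."""
    lowered = task.lower()
    if any(w in lowered for w in ("rename", "typo", "comment")):
        return "easy"
    if any(w in lowered for w in ("refactor", "migrate", "redesign")):
        return "hard"
    return "medium"

ROUTES = {  # illustrative model names, not a real config
    "easy":   {"model": "small-fast-model", "plan": False, "subagents": 0},
    "medium": {"model": "general-model",    "plan": True,  "subagents": 0},
    "hard":   {"model": "strong-model",     "plan": True,  "subagents": 4},
}

def route(task: str) -> dict:
    """Hard tasks get a plan plus sub-agent fanout; every route must
    still end with something verifiable (tests, screenshots, logs)."""
    cfg = dict(ROUTES[classify(task)])
    cfg["require_evidence"] = True
    return cfg
```

Note that `require_evidence` is set unconditionally: per the pattern above, scaling sub-agents only works if the workflow still outputs something a human (or another agent) can verify.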

👤 PEOPLE TO WATCH

  • Michael Truell (Cursor) — concrete evidence of long-horizon autonomy: same harness previously used to “build a browser from scratch,” now used for a 4-day autonomous run on a math research problem.

  • Burke Holland (GitHub Copilot DevRel) — unusually replicable patterns for “agent experience”: plan-mode loops, 95% confidence autopilot loops, and multi-model orchestration with evidence requirements.

  • Simon Willison — frames the core bottleneck as security review at scale: treat coding agents like “teams of mixed ability engineers” shipping under deadline pressure; security issues are “directly harmful” vs survivable code quality issues.

  • swyx (+ Ankitxg) — continued push to remove review bottlenecks: calls “killing the Code Review” the “Final Boss of Agentic Engineering,” pointing to a layered playbook and “Dark Factory” anecdotes (no human code and no human review).

🎬 WATCH & LISTEN

1) Changelog — “Plan mode” loops that prevent bad prompts (≈20:55–22:04)

Hook: plan mode as a structured way to surface what you forgot to ask for, plus multiple implementation options before execution.

2) Changelog — Autopilot: loop until 95% confidence (≈22:16–23:03)

Hook: changing the stopping condition (“until it’s done” → “until 95% confident”) to force deeper self-checking iterations.

📊 PROJECTS & REPOS


Editorial take: The frontier is shifting from “write code” to “run loops + produce evidence”—and the hardest unsolved piece is how you scale review (especially security) without slowing agents back down.

Gemini 3.1 Flash‑Lite launches as GPT‑5.3 Instant rolls out and Anthropic nears $19B run-rate
Mar 4
9 min read
862 docs
Thariq
Logan Kilpatrick
Sam Altman
+43
Gemini 3.1 Flash‑Lite Preview lands with “thinking levels,” aggressive speed claims, and $0.25/$1.50 per MTok pricing, while OpenAI rolls out GPT‑5.3 Instant broadly and adds GPT‑5.3-chat-latest to the API. Also: Anthropic’s reported $19B run-rate and business share shift, Arena’s new Document Arena leaderboard, and continued turbulence inside Alibaba’s Qwen team.

Top Stories

1) Google ships Gemini 3.1 Flash‑Lite Preview (speed + cost focus, with adjustable “thinking levels”)

Why it matters: The release is positioned for high-volume, low-latency workloads, and adds a new control surface (“thinking levels”) that lets developers trade off compute vs. complexity on a per-task basis—useful for agent pipelines and real-time processing.

Key details from Google and independent evals:

  • Availability: Rolling out in preview via the Gemini API in Google AI Studio and Vertex AI.
  • Pricing: $0.25 / 1M input tokens and $1.50 / 1M output tokens.
  • Speed claims (vs Gemini 2.5 Flash): 2.5× faster time to first answer token and 45% faster output speed.
  • Benchmarks shared by Google: 1432 Elo on Arena leaderboard, up to 86.9% on GPQA Diamond, and 76.8% on MMMU‑Pro.
  • “Thinking levels”: Google describes adjustable compute with “zero thinking overhead” on high-volume tasks, while reasoning through complex edge cases.
  • Artificial Analysis (Gemini 3.1 Flash‑Lite Preview): scored 34 on the Artificial Analysis Intelligence Index (up 12 vs Gemini 2.5 Flash‑Lite) while served at >360 output tokens/s with ~5.1s average answer latency.
  • Context + features (AA): retains 1M token context and supports tool calling, structured outputs, and JSON mode.
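At those rates, per-call cost is straightforward to estimate. A quick sketch (the prices come from the pricing bullet above; the token counts in the example are hypothetical):

```python
INPUT_PER_MTOK = 0.25   # USD per 1M input tokens (quoted preview price)
OUTPUT_PER_MTOK = 1.50  # USD per 1M output tokens (quoted preview price)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at Gemini 3.1 Flash-Lite preview pricing."""
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# e.g. a 10k-token prompt with a 1k-token answer:
# 10_000 * 0.25/1e6 + 1_000 * 1.50/1e6 = 0.0025 + 0.0015 = 0.004 USD
```

Output tokens cost 6× input tokens here, so for reasoning-heavy or verbose responses the output side dominates the bill even when prompts are long.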

2) OpenAI rolls out GPT‑5.3 Instant broadly (and adds GPT‑5.3‑chat‑latest to the API)

Why it matters: This is a “most-used model” refresh focused on more direct, less defensive responses and improved web search behavior—the kinds of UX shifts that can materially change product adoption even without a headline benchmark jump.

What’s new / where it’s available:

  • ChatGPT rollout: “GPT‑5.3 Instant in ChatGPT is now rolling out to everyone.”
  • Stated behavioral goals: fewer unnecessary refusals, fewer defensive disclaimers, and answers that “get to the point more directly.”
  • Web search improvements called out by OpenAI: sharper contextualization, better understanding of question subtext, and more consistent tone within a chat.
  • Hallucination/factuality note: for “questions where factuality matters most,” one contributor reports 26.8% better (when searching) and 19.7% better (when not searching).
  • API: “GPT‑5.3‑chat‑latest now also in the API.”
  • Benchmarking access: “GPT‑5.3‑Chat‑Latest” is available in Arena’s Text Arena for testing.

OpenAI also teased:

“5.4 sooner than you Think.”

3) Anthropic momentum: $19B run‑rate reports + business share shift + senior talent move

Why it matters: Multiple signals point to rapid enterprise pull: reported revenue acceleration, business market share movement, and a high-profile research leadership transition.

  • Revenue run‑rate: Sources cited by Bloomberg via Techmeme say Anthropic recently surpassed $19B run‑rate revenue (up from $9B at the end of 2025 and ~$14B a few weeks earlier).
  • Run‑rate disclaimer: described as “annualized run-rate,” not realized revenue.
  • US business AI market share claim: Feb 2025: ChatGPT 90%; Feb 2026: Claude ~70%.
  • Talent move: Max Schwarzer (OpenAI post‑training leadership) said he’s leaving OpenAI and joining Anthropic to work on RL research.

4) “Document Arena” launches with PDF-based evaluations (Claude Opus 4.6 leads)

Why it matters: Document reasoning is closer to many real workflows (contracts, reports, technical PDFs). Arena’s new format uses user-uploaded PDFs and side-by-side voting, making the leaderboard a live signal for “doc work” performance.

  • Document Arena is live and compares frontier models on document reasoning using PDFs.
  • Leaderboard snapshot: Claude Opus 4.6 is #1 at 1525 (+51 lead).
  • Arena says Opus 4.6 is now #1 across Text, Code, Search, and Document arenas.
  • PDF upload workflows highlighted: summarize complex content, ask questions against the file, extract key insights.

5) Alibaba Qwen team turbulence (leadership change + departures + org restructure signals)

Why it matters: Qwen is widely credited as core infrastructure for open-weight ecosystems; leadership and staffing instability could change the pace and direction of open model releases.

  • Leadership change: “Alibaba‑Cloud kicked out Qwen’s tech lead.”
  • Departure posts: Qwen tech lead @JustinLin610: “me stepping down. bye my beloved qwen.” and @huybery: “bye qwen, me too.”
  • Restructure context (Tongyi conference summary): Qwen described as a group priority with plans for expansion; references to resource constraints (including compute) and organizational changes.
  • External view on impact: Qwen 1.0 launched in fall 2023; subsequent releases “pushing the frontier of open-weights,” enabling “hundreds, maybe thousands” of papers and many products/startups.

Research & Innovation

What to watch: reliability + efficiency are increasingly “core research,” not just engineering

Two clusters stood out this cycle: (1) methods that reduce the memory/compute cost of training and (2) evidence that multi-agent coordination is still fragile without deliberate design.

Training efficiency: FlashOptim (Databricks AI Research)

  • Claim: cuts training memory by over 50% with no measurable loss in model quality.
  • Concrete metric: AdamW training typically needs 16 bytes/parameter for weights, gradients, and optimizer state; FlashOptim reduces this to 7 bytes (or 5 with gradient release).
  • Example: Llama‑3.1‑8B finetuning peak GPU memory drops from 175 GiB → 113 GiB.
  • Compatibility: drop-in replacement for SGD, AdamW, Lion; supports DDP and FSDP2; open source.
  • Techniques summarized by Databricks: improved master weight splitting + companded optimizer-state quantization.
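The bytes-per-parameter figures translate directly into GPU memory. A back-of-the-envelope check (the 16 and 7 bytes/param numbers are the figures reported above; the breakdown in the first comment is the standard mixed-precision AdamW accounting, added here as an assumption):

```python
ADAMW_BYTES = 16      # assumed split: bf16 weights (2) + bf16 grads (2)
                      # + fp32 master weights (4) + fp32 m (4) + fp32 v (4)
FLASHOPTIM_BYTES = 7  # reported figure (5 with gradient release)
PARAMS = 8_000_000_000  # Llama-3.1-8B, roughly

def state_gib(bytes_per_param: int, params: int = PARAMS) -> float:
    """GiB needed for weights + gradients + optimizer state alone."""
    return bytes_per_param * params / 2**30

adamw_gib = state_gib(ADAMW_BYTES)       # ~119 GiB
flash_gib = state_gib(FLASHOPTIM_BYTES)  # ~52 GiB
# The reported finetuning peaks (175 GiB -> 113 GiB) sit above these
# baselines because activations, buffers, and allocator fragmentation
# add on top of the per-parameter state.
```

The ~67 GiB gap between the two baselines is consistent with the ~62 GiB peak-memory drop Databricks reports for the 8B finetuning example.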

Optimization + search: SkyDiscover (open-source)

  • Releases an open-source framework with two adaptive algorithms reported to match/exceed AlphaEvolve on many benchmarks and outperform OpenEvolve/GEPA/ShinkaEvolve across 200+ optimization tasks.
  • Reports +34% median score improvement on 172 Frontier‑CS problems and “discovers system optimizations beyond human-designed SOTA.”

Agent reliability: consensus + coordination don’t “just emerge”

  • Byzantine consensus games: research finds valid agreement is unreliable even in benign settings and degrades with group size; most failures are convergence stalls/timeouts (not subtle value corruption).
  • Theory of Mind (ToM) in multi-agent systems: a ToM/BDI + symbolic verification architecture shows ToM-like mechanisms don’t automatically improve coordination; effectiveness depends on underlying LLM capability.

Biology: Eubiota “AI co-scientist” claims lab-validated discoveries

  • Eubiota is described as a multi-agent AI framework for end-to-end discovery (planning, tool use, evidence verification, wet-lab validation).
  • Reports 87.7% mechanistic reasoning accuracy (vs GPT‑5.1 77.3%).
  • Reported validated outcomes include: identifying the uvr‑ruv stress axis (screening 1,945 genes and 10K papers), designing a microbial therapy reducing colitis inflammation, engineering antibiotics, and discovering anti-inflammatory metabolites.

Products & Launches

What to watch: tools are converging on “agent runtimes” (compute + context + UI + eval)

This week’s releases focus less on single APIs and more on the scaffolding around agents: sandboxes, computer-use, document pipelines, and debugging/observability.

Developer agents and orchestration

  • Cursor cloud agents: run in isolated VMs with full computer-use capabilities; produce merge-ready PRs and validation artifacts (video/screenshot) across web/mobile/Slack/GitHub.
  • Cursor MCP Apps (v2.6): agents can render interactive UIs inside conversations; also adds private plugin marketplaces for teams.
  • OpenAI Codex: shipped a new $chatgpt-apps skill in the Codex app for building ChatGPT apps with the Apps SDK (scaffolding, wiring tools to widget resources, iterating host-aware UI).

Search + research APIs

  • you.com Research API: claims SOTA on DeepSearchQA and top scores on BrowseComp/FRAMES/SimpleQA “at a fraction of the latency and cost.” Offers one endpoint with five depth levels, up to “1,000+ reasoning turns” per query.

Document workflows: evaluation and production tooling

  • Arena Document Arena: PDF upload + side-by-side voting and leaderboard for document reasoning tasks.
  • LlamaIndex positioning: says it has evolved from a RAG framework to an “agentic document processing platform,” with LlamaParse processing 300k+ users across 50+ formats using multi-agent workflows (OCR + computer vision + LLM reasoning).

Speech / realtime

  • AssemblyAI Universal‑3‑Pro streaming: brings AssemblyAI’s most accurate speech model to streaming audio; highlights include real-time speaker labels, strong entity detection, code-switching, and global language coverage.

Specialized models in production contexts

  • Baseten: says it trained a specialist model that beats Gemini on emergency medicine documentation and runs 6–8× faster.

Industry Moves

What to watch: “distribution + workflow integration” is reshaping competition

  • OpenAI building a GitHub alternative: The Information reports OpenAI is developing an internal alternative to GitHub after outages; staff discussed potentially selling it to customers.
  • Perplexity Computer as a packaged runtime: Perplexity says its “Computer” orchestrates 20 different AI models and can be embedded into apps without developers managing API keys, using a secure sandboxed runtime they orchestrate end-to-end.
  • US business market share claim: a post asserts ChatGPT fell from 90% (Feb 2025) to Claude ~70% (Feb 2026).
  • Apple local-compute signal: Apple introduced M5 Pro and M5 Max with a “Fusion Architecture” merging two 3nm dies; claims include over 4× peak GPU compute for AI vs prior generation and 614GB/s unified memory bandwidth.

Policy & Regulation

What to watch: legal definitions are hardening into product constraints

US copyright: AI can’t be the author (Thaler v. Perlmutter stands)

  • US courts held that “authorship” must be human (Thaler v. Perlmutter), and the US Supreme Court declined review (so the D.C. Circuit ruling stands).
  • USCO guidance: prompt-only AI output can’t be registered; meaningful human creative contribution can be protected (and similar logic applies to AI-generated code absent human authorship).

New York bill targeting chatbot legal advice (SB 7263)

  • SB 7263 would prohibit chatbot operators from permitting substantive legal advice that would constitute unauthorized practice of law; it passed the Internet & Technology Committee last week.
  • Includes a private right of action with mandatory attorneys’ fees.

OpenAI–DoW/DoD contract language scrutiny continues

  • OpenAI amended its agreement to state the AI system “shall not be intentionally used” for domestic surveillance of US persons/nationals, including deliberate tracking via commercially acquired personal/identifiable information.
  • The Department affirmed services won’t be used by DoW intelligence agencies (e.g., NSA) without a follow-on modification.
  • Commentators note the full contract text is not public; some argue language could still be porous given legal definitions of “collect/surveil” and “incidental” collection mechanisms.

Global governance signal

  • The UN’s Independent International Scientific Panel on AI elected co-chairs Yoshua Bengio and Maria Ressa, with the first report slated for July 2026.

Quick Takes

What to watch: smaller signals that may compound

  • METR Evals correction: fixed a modeling mistake that inflated recent 50%-time horizons by 10–20%; for Opus 4.6, one update reports P50 11h 59m (down from 14.5h) and P80 1h 20m (up from ~1h).
  • Claude Code voice mode: rolling out (reported live for ~5% of users), toggled via /voice.
  • Codex voice transcription: available to 100% of Codex users; in-app via mic or Ctrl + M, and in CLI via config + press-and-hold Space.
  • Gemini 3 Pro sunset: Google is “turning down Gemini 3 Pro” on March 9; users can upgrade to Gemini 3.1 Pro Preview.
  • Qwen 3.5 GPTQ Int4 weights: Alibaba released GPTQ‑Int4 weights with native vLLM and SGLang support (less VRAM, faster inference).
  • H100 shortage watch: posts report near-zero H100 capacity on Prime Intellect and Lambda dashboards; one provider suggests capacity may improve in coming weeks.
  • Bipartisan opposition to AI data centers: reported escalation includes New York proposing three-year construction moratoriums and communities pulling tax incentives.

Gemini 3.1 Flash-Lite raises the speed-per-dollar bar as open-weights surge and agent tooling proliferates
Mar 4
7 min read
167 docs
Sam Altman
Jeremy Howard
+30
Google’s Gemini 3.1 Flash-Lite leads today’s updates, with multiple benchmarks and pricing framed around speed-per-dollar and adjustable “thinking levels.” The digest also covers OpenAI’s GPT-5.3 Instant rollout, a surge (and shakeup) in open-weight models around Qwen 3.5, expanding agent/research APIs, and fresh scrutiny on reliability, surveillance language, and AI infrastructure impacts.

Gemini 3.1 Flash-Lite sets a new efficiency bar (and ships broadly)

Google: faster, cheaper, with adjustable “thinking levels”

Google DeepMind announced Gemini 3.1 Flash-Lite, positioning it as the most cost-efficient Gemini 3 series model and “built for intelligence at scale”. Multiple Google leaders said it outperforms Gemini 2.5 Flash while being “smarter, faster, cheaper”, including faster performance at a lower price.

By Google’s shared metrics, Flash-Lite delivers:

  • 2.5× faster time to first answer token than Gemini 2.5 Flash (with “significantly higher quality”)
  • 45% increase in output speed vs. 2.5 Flash
  • $0.25 per 1M input tokens
  • 1432 Elo on LMArena and 86.9% on GPQA Diamond

DeepMind also highlighted new “thinking levels” that let developers dial reasoning up or down depending on the task, including complex workloads like generating UI/dashboards or simulations. In a side-by-side comparison, Google said Flash-Lite is significantly faster in tokens/s and can use ~1/3 as many tokens to complete complex tasks in at least one example.

Availability: DeepMind said Flash-Lite is rolling out in preview via the Gemini API in Google AI Studio, and Jeff Dean noted it’s available in Google AI Studio and Vertex AI.

“Small but mighty — our new Gemini 3.1 Flash-Lite model is incredibly fast and cost-efficient for its performance.”

More: https://goo.gle/3OO11NK

OpenAI rolls out GPT-5.3 Instant to all ChatGPT users

OpenAI said GPT-5.3 Instant in ChatGPT is rolling out to everyone, framed as “more accurate” and “less cringe”. The company also claims the update reduces “unnecessary refusals” and “preachy disclaimers”, with improved behavior like sharper contextualization and better understanding of question subtext.

Details: https://openai.com/index/gpt-5-3-instant/

Open weights: new frontier releases, plus turbulence at Qwen

A burst of flagship open-weight models (and a new adoption lens)

Interconnects reported a “busy month” in open-weights AI, including new flagship models from Qwen, MiniMax, Z.ai, Ant Ling, and StepFun. Highlights include:

  • Qwen 3.5 (0.8B–27B dense; 35B-A3B–397B-A17B MoE): described as multimodal, using reasoning by default, with improved style/instruction-following and multilingual support (with a note that small models may “overthink”)
  • Step-3.5-Flash (196B-A11B MoE): reported as especially strong in math benchmarks, beating models several times larger
  • GLM-5 (744B-A40B): demand reportedly rose enough that the team raised prices for its coding plan
  • MiniMax-M2.5: described as a small model that can rival others (e.g., GLM-5 and Kimi K2.5) and quickly became a community favorite

Interconnects also introduced Relative Adoption Metrics (RAM), which normalizes downloads relative to peer models in the same size class. In its late-2025 snapshot, it noted winners like Kimi K2 Thinking and some OCR models, while stating DeepSeek V3.2 underperformed DeepSeek’s earlier 2025 releases.

Qwen 3.5 goes local (and becomes easier to fine-tune)

A separate YouTube demo highlighted Qwen 3.5 releases (800M, 2B, 4B, 9B) and showed an iOS app (“Locally AI”) running them fully on-device. The video emphasized that prompts and data can stay on the phone (no cloud transmission).

On the tooling side, a Reddit crosspost said Unsloth now enables local fine-tuning of Qwen3.5 with 5GB VRAM.

Qwen departures spark concern about near-term open-weight incentives

Multiple posts flagged apparent Qwen team departures, including “bye qwen, me too” and “me stepping down”. Jeremy Howard reacted publicly, calling the situation “sad and worrying” and suggesting the team is losing “some of their very best researchers”.

Separately, a report attributed to “word on the street” claimed Alibaba is tightening the screws to make money via proprietary cloud/API rather than open source; Nathan Lambert described this as an “existential risk” for near-term open-weight models and argued only a few actors may have durable business incentives to build them.

Agents and “research infrastructure” products keep expanding

Perplexity Computer: multi-model orchestration + embed into apps

Perplexity announced Perplexity Computer, saying it orchestrates 20 different AI models and can be embedded directly inside apps developers create. CEO Aravind Srinivas highlighted an operational differentiator: users don’t need to manage their own API keys, with workloads run in a “secure sandbox” orchestrated end-to-end.

Perplexity also demoed “CEO Chat,” described as letting users text tech CEOs (e.g., Elon, Jensen, Zuck) and receive responses. In another post, Perplexity Computer was claimed to replicate a Bloomberg Terminal feature and “oneshot” a POSH use case involving high-end assets (yachts, watches, supercars, mansions).

you.com launches a Research API with “depth levels”

you.com launched its Research API, claiming state of the art on DeepSearchQA and top benchmark performance on BrowseComp, FRAMES, and SimpleQA, at a “fraction of the latency and cost”. It offers “one endpoint” with five levels of research depth, ranging from a 2-second lookup to 1,000+ reasoning turns on a single query.

Blog: https://you.com/resources/research-api-by-you-com

LlamaIndex: “not a RAG framework” → agentic document processing platform

LlamaIndex said it is shifting from being “connective tissue” between LLMs and data to an agentic document processing platform focused on automating knowledge work over documents. It highlighted LlamaParse processing 300k+ users across 50+ formats (PDF, Word, PowerPoint, Excel, etc.) using multi-agent workflows combining OCR, computer vision, and LLM reasoning, while stating it will continue OSS work aligned to this document-processing focus.

Reliability and behavior: code review debates and “sycophancy” research

“Kill the code review” becomes a stated goal for agentic engineering

Posts amplified an emerging view that removing human code review is the “Final Boss” for fully productive coding agents, citing rising PR volume and examples like StrongDM’s “Dark Factory” (claimed as no human code and no human review).

Jeremy Howard: “vibe coding is a slot machine”

In a YouTube segment, Jeremy Howard argued AI-based coding can feel like a slot machine—an “illusion of control” that can produce code “no one understands”. He also said a recent study showed only a “tiny uptick” in what people are actually shipping, pushing back on narratives of massive productivity leaps.

Princeton study cited on X: sycophancy can suppress discovery

Gary Marcus highlighted a 557-person Princeton study described as finding that “default GPT” suppressed discovery at a rate comparable to a “yes-man” AI, while “unbiased feedback” produced 5× better results. In a separate post, he quoted a general mechanism: when models are trained to be helpful, they may “prioritize data that validates the user’s narrative” over truth-seeking data.

Policy watch: OpenAI’s DoD language tightens, but “incidental” surveillance concerns persist

Posts circulated updated language stating OpenAI’s system “shall not be intentionally used for domestic surveillance of U.S. persons and nationals,” including prohibiting deliberate tracking/monitoring (including via commercially acquired personal/identifiable data). Another excerpt said the Department affirmed OpenAI services will not be used by DoD intelligence agencies (e.g., NSA) without a follow-on contract modification.

A separate post summarized the decision as withholding deployment to NSA and other DoD intelligence agencies “for now,” to allow time to address potential surveillance loopholes through the democratic process. Jeremy Howard pointed to how FISA Section 702 and EO 12333 can classify mass collection as “incidental,” suggesting that studying PRISM and Upstream would be instructive for understanding how “incidental” surveillance has been justified and used.

Meanwhile, Gary Marcus reacted skeptically, criticizing the retention of the phrase “consistent with applicable laws” in the updated agreement language.

Compute and externalities: xAI emissions claims and pushback on datacenter narratives

One widely shared claim said xAI is operating 62 unpermitted methane gas turbines across two data centers (Memphis, TN and Southaven, MS), and that xAI’s own permit application suggests the facilities could emit more than 6 million tons of greenhouse gases and over 1,300 tons of health-harming air pollutants annually. Separately, a post said Grok scored “way below” peers on the latest ARC AGI leaderboard.

In contrast, Emad Mostaque argued that narratives about AI datacenter water and power impacts are politically charged, and claimed that golf courses use 10× the water of AI data centers globally.

Quick data point

A Reddit post (crossposted from r/MachineLearning) claimed a benchmark of 94 LLM endpoints (Jan 2026) found open source models within 5 quality points of proprietary models.

Debugging team performance, scaling discovery, and building for habits
Mar 4
9 min read
81 docs
Sachin Rekhi
Aakash Gupta
Nir Eyal
+9
This edition highlights two complementary levers for PMs: debugging team performance with the Waterline Model (structure → dynamics → people) and increasing discovery speed as AI accelerates delivery. It also includes a widget-first B2C habit case study, practical tactics for closing the insight and feedback-to-build gaps, and a roundup of PM automation skills worth exploring.

Big Ideas

1) Debug underperformance by checking systems before people (the Waterline Model)

When timelines slip and execution feels messy, it’s tempting to jump to people-based explanations—but the Waterline Model is designed to help you diagnose where the problem is coming from before deciding what to do. The rule of thumb is:

  • “Snorkel before you scuba”—start with shared systems first (goals, roles, decision-making) before diagnosing personalities.
  • Work through four layers in order: structure → dynamics → interpersonal → individual.

Why it matters: The model aims to prevent misdiagnosis (e.g., cycling through people unnecessarily) and focus fixes where they have the most leverage.

How to apply: Start by evaluating structure (vision, goals, context, expectations, role clarity, org design), then dynamics (decision-making, conflict, information flow), before moving into interpersonal/individual causes.


2) AI is accelerating delivery, but discovery is the bottleneck

Sachin Rekhi argues that AI has handed engineering a “jetpack” (tools like Cursor, Codex CLI, Claude Code), but discovery is now the constraint—building faster than you can learn risks “expensive mistakes, sooner”. His framing: the next wave of winning PMs will run 10x the customer learning with the same team.

Why it matters: If shipping gets cheaper/faster, the cost of building the wrong thing becomes the primary failure mode.

How to apply: Pick one discovery workflow where AI can compress cycle time—e.g.,

  • Analyze thousands of NPS verbatims/support tickets/app reviews into structured themes/quotes in an afternoon.
  • Set up “feedback rivers” that continuously monitor channels and surface actionable signals.
  • Scale interviews (what used to be 10 interviews becomes 100) via AI-moderated interviews.
  • Prototype to collect real behavioral data (heatmaps, drop-offs, in-product surveys) before production code.
  • Ask your database plain-English questions and get charts back—collapsing the hypothesis→answer loop from days to minutes.

3) “What won’t change” is still a product advantage: human behavior

Ryan Hoover highlights a Bezos-style lens: amid accelerating change, it can be more useful to ask what won’t change—he points to human nature and the evergreen value of understanding psychology.

In a related conversation, Nir Eyal distinguishes persuasive technology (helping people do what they want to do) from coercive technology (getting people to do what they don’t want to do, which he calls unethical). He also emphasizes:

  • Reducing friction increases the likelihood a behavior occurs.
  • Habits are common (he says ~50% of daily actions are habitual), while addiction is a harmful compulsive dependency—and “there’s no such thing as a good addiction”.

Why it matters: New tooling changes what’s possible, but many product problems still come down to behavior change, ethics, and repeatable usage patterns.

How to apply: When you’re designing engagement, explicitly choose (and document) whether you’re reducing friction toward behaviors users want, and avoid framing “addiction” as a goal.


Tactical Playbook

1) A practical Waterline diagnostic (structure → dynamics → interpersonal → individual)

Use the model in order, and stop as soon as you find a plausible upstream cause.

  1. Structure check (start here): Confirm people know what they’re doing and how success is defined (vision, goals, context, expectations, role clarity, org design). One practical prompt is to ask each person how they describe their role, what goals the team owns, and which numbers they personally own.
  2. Dynamics check: Look at how decisions get made, how conflict shows up and gets resolved, and how information flows (or doesn’t). Dynamics are “experienced, not written down,” and can bottleneck decisions or create confusion about who decides.
  3. Interpersonal check (only after ruling out above): Interpersonal tension (trust, unresolved conflict, style clashes) can be real, but is often caused/amplified by unclear roles/overlapping ownership/incentives.
  4. Individual check (last): Only after goals/roles/dynamics are sound: evaluate whether the gap is coachable in the time the business can afford; if not, change the role or make a clean exit—avoid “lingering in ambiguity”.

2) Shrink the “Insight Gap” by naming where the lifecycle breaks

A Reddit thread frames the “Insight Lifecycle” as three common time sinks—even with tools like Amplitude, Stripe, Salesforce, and Gong:

  • Collection (“Fetch”): chasing data via BI tickets, exports, or locating where a metric is tracked.
  • Synthesis (“Tying it together”): manual CSV cleaning or dashboard-building to correlate data across sources.
  • Interpretation (“So what?”): deciding which signal matters and how it affects prioritization.

Step-by-step way to apply this:

  1. Label your current bottleneck (collection vs. synthesis vs. interpretation).
  2. If your bottleneck is interpretation speed, trial faster loops like natural-language metric analysis (ask in plain English, get a chart back).
  3. If your bottleneck is input sprawl, consider automation that continuously monitors feedback channels and surfaces signals (so you’re not constantly triaging manually).

3) Close the feature-request → implementation gap by staying in the loop

One PM described a flow where users submit requests → PM writes tickets → devs code → a 2–3 week cycle, by which time momentum is lost; they’re exploring whether AI can bridge the gap. A reply argues this resembles a waterfall-style handoff and warns it’s a “recipe for disaster”.

Step-by-step adjustment to apply (process, not tooling):

  1. Treat your job as confirming the solution meets user needs across discovery → definition → delivery and beyond, not just ticket-writing.
  2. During the 2–3 week cycle, repeatedly check what’s being built against what the user asked for (instead of waiting until the end).
  3. Increase customer involvement frequency—framing this as the original goal of Agile: bringing customers into the loop to ensure what’s delivered meets their needs.

4) Use AI prototyping tools after research (don’t skip the problem space)

Aakash Gupta shares a warning from Nadav Abrahami (Wix co-founder, now building the AI prototyping platform Dazl): he’s seen PMs rush straight into building and skip the step that determines whether a feature succeeds.

“You need to understand what problem you’re solving, what user story, and the rough shape of the feature… you can’t just jump in immediately to the solution space.”

Itamar Gilad similarly flags the risk of using prototyping for ideation in a way that moves too fast into the solution space.

How to apply (minimum viable sequence):

  1. Understand the problem.
  2. Map the user stories.
  3. Define the rough shape of the feature.
  4. Then prototype—“the tool is a hammer; make sure you’ve found the right nail first”.

Case Studies & Lessons

1) Underperformance that was actually structural: roles/goals weren’t coherent

In a Waterline Model example, a leader taking over a struggling marketing team avoided diagnosing people first and instead asked each person how they described their role, what goals they owned, and what metrics they were expected to move. The answers were “wildly inconsistent,” and leadership’s view of responsibility and success measurement differed from how individuals understood their jobs.

Lesson: Structural clarity (mandate, goals, role definitions, success criteria) can improve performance quickly, before personnel changes.


2) A “speed problem” that was actually dynamics: decisions weren’t stable

Another example describes a founder frustrated by slow execution—yet when the team tried to move quickly, the founder “swoops in” and unmakes decisions or second-guesses judgment, making it feel unsafe to move quickly. The team adapted rationally by slowing down, adding extra alignment layers, and escalating decisions that didn’t need escalation.

Lesson: Dynamics can create rational “self-protection” behaviors that look like performance issues from the outside.


3) B2C habit design via a widget-first product (Bible verse app MVP)

A startup founder (not religious) says competitor research across top Bible apps surfaced a repeated review theme: users “keep forgetting to open it,” which they interpreted as a habit problem rather than a content problem. Their solution: a home screen widget that shows the daily verse automatically—“the widget is the product; the app is secondary” (journaling, verse history, and exploration live in-app).

Monetization + distribution choices (as shared):

  • Pricing: $4.99/month or $39.99/year; free tier includes unlimited daily verse + 3 extra exploration verses/day; premium removes limits/ads and adds customization; AdMob on free tier.
  • Launch focus: US/UK/Canada/Australia/Philippines day one; “available in 175 countries technically” but focusing on English markets.
  • Go-to-market: an “ASO/Reddit/Product Hunt” lane plus “Christian micro-influencers” given premium access + small gift cards to create authentic content.

Lesson (connected to a general behavior principle): Nir Eyal notes that reducing friction increases the likelihood a behavior occurs—and this product’s core design puts the habit on the home screen, before other apps.


Career Corner

1) Don’t let “people” be your default diagnosis

Lenny Rachitsky amplifies a leadership trap: when a team underperforms, most people’s first instinct is to blame the people—and he argues that’s “almost always wrong,” pointing instead to structural problems below the surface.

How to apply: When you’re escalating performance concerns, bring a Waterline-style writeup: what you checked in structure and dynamics before attributing issues to individuals.


2) The differentiator in the AI era: higher learning velocity

Rekhi’s claim is directional: the PMs who “win in the next wave” won’t be the ones who only learned prompting to build—they’ll be the ones who learned to run 10x the customer learning with the same team. Separately, Jason Long predicts AI will widen the gap: driven, entrepreneurial people with time and resources will accelerate, while others may be pushed further back; he also expects job losses and a “greater consolidation of power and wealth”.

How to apply: Treat discovery leverage as a core skill—e.g., shorten cycles on feedback analysis, interviews, prototype-based learning, or metric analysis (pick one and systematize it).


3) Scope signals: senior PMs being asked to own P&L

One PM with 10 years of experience shared that they were asked to handle P&L statements for their charter for the first time and asked if that’s now common for senior PMs.

How to apply: Use this as a prompt to clarify expectations in your org: whether “charter ownership” includes financials, and what support (Finance partnership, tooling, cadence) comes with that responsibility.


Tools & Resources

1) OpenClaw: a PM-oriented “skills catalog” (with automation examples)

Jason Long (a former product owner) shares a custom OpenClaw skills catalog for PM workflows, distributed as a zip file he describes as safe. Example skills he describes include:

  • Second brain: builds an ontology by reading emails/Google Docs/Asana (read-only); maps projects, team members, strengths/weaknesses, and connections to improve the quality of generated work (e.g., more informed emails).
  • Competitive monitor: tracks competitors’ websites/releases and posts updates day-by-day; can centralize outputs into a Notion database/wiki.
  • Feedback aggregator: reads 7 days of Slack/Discord channels, extracts themes, maps to product areas, counts frequency, and generates a structured Monday digest (with charts/graphs).

He also notes a security posture choice: he uses Discord instead of Slack for certain workflows due to sensitive data and not feeling “comfortable enough” securing the system in Slack yet.
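The feedback-aggregator skill itself isn’t published here, but its core step—map messages to product areas and count frequency for a Monday digest—can be sketched in a few lines. The keyword map and sample messages below are made up; the real skill reads Slack/Discord via their APIs and adds charts:

```python
from collections import Counter

# Hypothetical keyword -> product-area map; a real skill would extract
# themes with an LLM rather than fixed keywords.
AREA_KEYWORDS = {
    "billing": ["invoice", "charge", "refund"],
    "onboarding": ["signup", "tutorial", "first run"],
    "performance": ["slow", "lag", "timeout"],
}

def digest(messages):
    """Count how many messages touch each product area, most-mentioned first."""
    counts = Counter()
    for msg in messages:
        text = msg.lower()
        for area, words in AREA_KEYWORDS.items():
            if any(w in text for w in words):
                counts[area] += 1  # one hit per message per area
    return counts.most_common()

week = [
    "Checkout felt slow again today",
    "Customer asked for a refund on a duplicate charge",
    "New user couldn't find the tutorial",
    "Dashboard timeout during the demo",
]
print(digest(week))  # [('performance', 2), ('billing', 1), ('onboarding', 1)]
```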


2) Waterline Model (read + share)


3) AI discovery workflows (live session)

  • Rekhi is sharing his “10 AI discovery workflows” at the Lean Product Meetup in Mountain View on March 5 (registration link provided).
Moats in the AI era, limits of software “silver bullets,” and leadership under pressure
Mar 4
4 min read
138 docs
Garry Tan
Jeremy Howard
+6
Today’s strongest picks include Jeremy Howard’s reference to Fred Brooks’ “No Silver Bullet” as a direct counterpoint to modern “no more coders” narratives, plus a thoughtful essay on AI-era moats, a psychology evergreen, and standout recommendations on leadership, geopolitics, and capital allocation.

Most compelling recommendation: a reality check on “no more engineers” narratives

No Silver Bullet (essay) — Fred Brooks

  • Link/URL: Essay link not provided; recommended in this video: https://www.youtube.com/watch?v=dHBEQ-Ryo24
  • Recommended by: Jeremy Howard
  • Key takeaway (as shared): Howard points to Brooks’ “No Silver Bullet” as sounding like it was written for today—responding to claims that new tools would mean we “won’t need any coders anymore.” Brooks argued productivity gains would be limited (Howard cites Brooks’ “30% improvement” framing) and that most software engineering work isn’t typing code.
  • Why it matters: It directly addresses the same premise driving current AI coding hype, and frames where the real bottlenecks may lie (beyond code entry).

AI + product strategy: moats and psychology

“AI and Startup Moats” (essay) — Agam

  • Link/URL: https://unzip.dev/0x01f-ai-and-startup-moats/
  • Recommended by: Ryan Hoover (Product Hunt)
  • Key takeaway (as shared): Hoover calls it a “thoughtful essay” on how moats are changing.
  • Why it matters: A direct pointer to a single, focused argument about defensibility in an AI-shifting landscape (without being framed as promotion).

Hooked: How to Build Habit-Forming Products (book) — Nir Eyal

  • Link/URL: Not provided
  • Recommended by: Ryan Hoover
  • Key takeaway (as shared): Hoover argues that “what won’t change” is human nature, and that a deep understanding of psychology is “evergreen” across roles—even as technology changes.
  • Why it matters: It’s recommended as durable foundational knowledge (human behavior) rather than a tool-specific tactic.

Building teams that compound: “slope” and hands-on intuition

“A Little Bit of Slope Makes Up for a Lot of Intercept” (lecture) — John Ousterhout

  • Link/URL: Lecture link not provided; recommended in this video: https://www.youtube.com/watch?v=dHBEQ-Ryo24
  • Recommended by: Jeremy Howard
  • Key takeaway (as shared): Howard summarizes Ousterhout’s idea: prioritize activities that increase your growth rate (“slope”) over things you’re already good at (“high intercept”). He adds that for his company, he focuses on his team’s “slope.”
  • Why it matters: It’s a crisp framework for evaluating work and development: optimize for compounding capability, not just near-term output.

Bret Victor’s work (talks/demos) — Bret Victor

  • Link/URL: Not provided
  • Recommended by: Jeremy Howard
  • Key takeaway (as shared): Howard says Victor best expresses the importance of a “direct, visceral connection” with the thing you’re working on, and encourages people who haven’t watched Victor to do so.
  • Why it matters: It highlights a learning principle that’s easy to lose in modern tooling: staying close to the system you’re building to develop intuition.

Investing: why “process” isn’t the whole story

Capital Allocators episode featuring Gavin Baker (podcast episode)

  • Link/URL: https://podcasts.apple.com/us/podcast/capital-allocators-inside-the-institutional/id1223764016?i=1000752436222
  • Recommended by: Ian Cassel (shared), endorsed by Keith Rabois (“This.”)
  • Key takeaway (as shared): The post argues that while a repeatable process matters, any repeatable edge that produces significant alpha gets “quickly arbed away.” It suggests repeatable outperformance often comes from a small number of key people (estimated “2–10”)—illustrated via a Michael Jordan / Bulls analogy.
  • Why it matters: It’s a concrete lens for evaluating investment organizations (and teams) beyond process narratives: who are the few individuals that actually drive outcomes?

Geopolitics and competition

Breakneck (book) — Dan Wang

  • Link/URL: Book link not provided; recommended in this episode: https://www.youtube.com/watch?v=s830OB11pqw
  • Recommended by: Garry Tan (Y Combinator)
  • Key takeaway (as shared): Tan recommends Breakneck as a book about China vs. the US, calling it “incredible,” with another participant adding: “It’s really good.”
  • Why it matters: A clear signal that this is a high-quality read (per Tan) for thinking about US–China dynamics in a period where many founders/investors are reassessing national competitiveness.

Leadership under pressure

Lee Kuan Yew speech on handling a strike (speech/video)

  • Link/URL: Speech link not provided; recommendation shared here: https://x.com/davidsenra/status/2028952451584561387
  • Recommended by: Brian Armstrong (Coinbase)
  • Key takeaway (as shared): Armstrong describes a Lee Kuan Yew speech as “great,” and highlights the stance of being willing to “rebuild it all from scratch” rather than “allow you to bring this country down.” He says it’s “very inspiring” and something he feels he needs to do as a leader.
  • Why it matters: A rare example of a founder pointing to a specific leadership moment as a personal benchmark for decisiveness and resolve.

“I sat across the table from them and said get back to work. I will not allow you to bring this country down. And if you don’t do it, I’m prepared to rebuild it all from scratch again.”

Urea jumps on Hormuz risk as Brazil harvest delays reshape corn/soy decisions
Mar 4
7 min read
124 docs
ABC Rural
Successful Farming
Joel Salatin
+10
Fertilizer and grain markets reacted to Strait of Hormuz risk with sharp urea gains and renewed supply-timing concerns, while Brazil’s soybean harvest delays and regional weather issues continue to influence safrinha corn decisions. This digest also highlights actionable on-farm innovations—from biological nematode control and soybean variable-rate seeding to AI-driven livestock nutrition and genomic testing for heifer selection.

1) Market Movers

Energy + geopolitics feeding into ag

  • Fertilizer reacted immediately to Middle East shipping risk. In NOLA (New Orleans), April physical urea barges traded at $457/ton Friday and around $550/ton Monday (with commentary noting prices up roughly $70–$93/ton). Phosphate was described as up about $30/ton, while UAN and anhydrous ammonia moved less sharply and potash was said to be unaffected so far.
  • Grain markets are also being pulled by the biofuel/energy channel: one market recap highlighted soybean oil’s outperformance tied to crude strength amid the conflict, and another framed soybean oil as up $14.50 (+29.5%) since the end of last year.

Grains: price action + demand signals

  • Futures levels cited in market coverage: May corn 451¾ (+6¢), May soybeans 1182½ (+18½¢), May Chicago wheat 584 (+6¾¢), May Kansas City wheat 583½ (+8¾¢), and May spring wheat 613¾ (+3¾¢).

  • US export inspections (week ending Feb 26):

    • Corn: 73M bushels (down 8% vs prior week, up 37% YoY)
    • Soybeans: 42M bushels (up 67% vs prior week, up 62% YoY); China accounted for ~65% of inspections
    • Wheat: 13M bushels (down 39% vs prior week; down 12% YoY)
  • Spain demand + policy headline risk: Spain has bought 2.4 MMT of corn so far this year (up 1 MMT vs last year) with about 225,000 MT still waiting to ship. Separately, a post quoted Trump as saying: “SPAIN HAS BEEN TERRIBLE, I TOLD BESSENT TO CUT OFF ALL DEALING WITH SPAIN.”

2) Innovation Spotlight

Biological nematode control in soybeans (US/Brazil retail rollout)

  • Indigo Ag’s Biotrinsic Nemora FP (biological nematicide) won The Scoop’s 2025 New Product of the Year.
  • Mechanism & measured efficacy: a bacterial seed treatment that grows with the plant and reduces soybean cyst nematode (SCN) egg hatch by ~70%; the discussion notes 4–6 nematode life cycles per season and describes the compounding effect across cycles.
  • Use & handling metrics: applied as a flowable powder dry planter-box treatment at 1 oz/cwt at planting; product shelf life described as 18–24 months at room temperature, with on-seed planting windows from 60 days up to 1 year (by product).
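To see why the per-cycle effect compounds, here is the arithmetic only (illustrative, not field data): if egg hatch is cut ~70% in each cycle and the season allows 4–6 cycles, residual pressure shrinks geometrically:

```python
# If egg hatch is reduced ~70% in a cycle, roughly 30% of the pressure
# carries into the next one; compounding over 4-6 cycles shows how a
# per-cycle effect adds up across a season.
hatch_reduction = 0.70
surviving = 1.0 - hatch_reduction  # 0.30 per cycle

for cycles in (1, 4, 6):
    remaining = surviving ** cycles
    print(f"after {cycles} cycle(s): {remaining:.4%} of baseline pressure")
```

After 4 cycles the toy model leaves under 1% of baseline pressure; the real field effect depends on how fully the reduction applies each cycle.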

AI-assisted nutrition decisions in poultry and swine (Brazil)

  • A segment on animal production described AI systems that integrate feed intake, weight gain, feed conversion, water consumption, barn environment, health history, and ingredient quality to spot patterns and flag issues early (e.g., subtle intake drops plus rising bird temperature).
  • Reported operational benefit: earlier diet/formulation adjustment, avoiding performance loss and reducing waste.
  • Implementation constraints called out: data quality, system integration, tech access, and trained staff.

Genomic testing for replacement-heifer selection (US beef)

  • Zoetis’ Inherit Select genomic testing is presented as a way for commercial producers to select replacements beyond “looks,” including traits like cow fertility (lifetime calves up to 9 years), feed efficiency, and BRD health/survival.
  • Operational timeline & economics: turnaround advised at roughly 30 days, with testing ideally as early as possible but at least 30 days before the first keep/sell decision point (e.g., branding/weaning). ROI was characterized as 3–4:1 or higher in their experience/modeling.

3) Regional Developments

Brazil (Mato Grosso): excess rain delays soy, compresses safrinha corn window

  • In northern Mato Grosso (Peixoto de Azevedo, Matupá, Marcelândia), reporting described atypical precipitation that waterlogged soils and damaged infrastructure, slowing soybean harvest and pressuring second-crop corn planting.
  • Marcelândia details included: about 35% of 200,000 ha still unharvested, an emergency declaration, and rainfall projected near 3,000 mm versus an average 1,800–2,000 mm.
  • Producer-reported impacts included soybean losses of 8–10 sacks/ha (from an expected 75–80) and corn area reductions around 20% tied to a ~10-day delay beyond the ideal planting window.

Brazil (national): harvest still behind pace despite some drying

  • Conab-linked coverage described national soybean harvest ~7% behind last year and 10% behind the 5-year average. Reported state gaps included Maranhão around 31% behind, with delays of 10–15% in Minas Gerais and Goiás.

Brazil (south): drought-driven soybean crop downgrades

  • Market commentary cited crop cuts attributed to southern Brazil drought: AgRural at 178 MMT (from 181) and StoneX at 177.8 MMT (from 181.6), with examples of rainfall at 60% of normal in Paraná over 30 days.

Mercosur rice: weak pricing + expected area reductions

  • An outlook for rice producers described low prices and poor yields in 2026, with expectations of about a 19% rice area reduction across Mercosur countries by 2028 as producers shift into other options (soy, corn, livestock).

4) Best Practices (actionable)

Grains & oilseeds

  • Variable-rate soybeans (think “opposite of corn”). Ag PhD recommended lowering seeding rates in the best zones (example: ~120,000) to shorten plants, improve standability, and increase airflow to reduce disease pressure; and raising rates in poorer/IDC areas (example: ~160,000–180,000) to push height for better weed control and potentially reduce IDC via greater root acid exudation and nutrient availability. They also noted variable seeding should not increase total seed cost (reallocating dollars by zone).

  • Soybean inoculation + nodulation check. Ag PhD described inoculation as putting live rhizobia on seed or in-furrow to support nitrogen fixation, noting these bacteria can persist but may be outcompeted, so inoculating “each time” can help. To check nodulation, slice nodules: pink/beefsteak red = active, black/brown = dead.

Soil fertility (rate-setting under volatile prices)

  • Canal Rural’s agronomy segment emphasized that more fertilizer doesn’t always mean more productivity: after a point, marginal returns decline. It also stressed that the maximum economic efficiency rate rarely equals the maximum yield rate—especially when fertilizer prices are volatile.
  • Specific “too much” risks cited included nitrogen increasing lodging risk and N₂O emissions, potassium pushing Mg/Ca out of balance, and phosphorus exceeding soil fixation capacity, leading to immobilization or environmental loss.
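The gap between the maximum economic efficiency rate and the maximum yield rate falls out of simple calculus. A worked example with a hypothetical quadratic yield response (coefficients and prices are illustrative, not from the source):

```python
# Hypothetical quadratic yield response:
#   yield(N) = 100 + b*N - c*N**2   (bu/ac, N in lb/ac)
# Agronomic optimum:  d(yield)/dN = 0            ->  N = b / (2c)
# Economic optimum:   p_grain * d(yield)/dN = p_N ->  N = (b - p_N/p_grain) / (2c)
b, c = 1.2, 0.004
p_grain, p_n = 4.50, 0.60  # illustrative $/bu grain and $/lb N

n_max_yield = b / (2 * c)
n_max_profit = (b - p_n / p_grain) / (2 * c)

print(f"max-yield rate:    {n_max_yield:.0f} lb N/ac")   # ~150
print(f"max-economic rate: {n_max_profit:.0f} lb N/ac")  # ~133
```

The economic optimum always sits below the yield optimum while fertilizer has a positive price, and it drops further as the fertilizer-to-grain price ratio rises—which is why volatile fertilizer prices move the right rate even when agronomy doesn’t change.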

Livestock & small-farm systems

  • Deep-bedded pig pens as a compost engine (small acreage). Joel Salatin recommended stationary deep-bedded pens (example 15×20 ft for 3–5 pigs) with 24–36 inches of wood-chip bedding; adding carbon to the pigs’ “toilet corner” was presented as a way to accumulate enough material to create 3–4 pickup loads of compost in ~6 months.

5) Input Markets

Fertilizer: price spikes + logistics constraints (US and global)

  • The Strait of Hormuz was described as a critical chokepoint for moving oil and a significant share of global fertilizer. One market source estimated ~⅓ of global crop nutrients pass through the strait, including ~25% of global anhydrous ammonia exports and ~20–25% of global urea exports.

  • Supply timing risk into spring: a Farm Journal interview described a “two-month-ahead” calendar—~30 days ocean transit plus another ~3–4 weeks to move product inland, with an example that a vessel loading “today” might not be readily available until around May 1. A related discussion framed urea as dependent on the Middle East for a large share of US import needs and emphasized the tightness of the spring calendar.

  • Market behavior: one grain-market video said many US retailers went “no bid” on nitrogen as they waited on expected price spikes.

  • Relative affordability: StoneX commentary said urea values are high versus history and that the urea-to-corn ratio is the second-highest on record for this time of year (with the peak referenced as 2005).

Potash and phosphate: diverging setups

  • Potash was described as well supplied and steady, with Canada’s 2025 exports referenced as the most ever and additional tonnage expected from multiple origins and expansions.
  • For phosphate, one discussion highlighted concentration (five countries controlling ~85–90% of flows) and cited China not exporting until August.

Chemicals and regulatory/process signals

  • Successful Farming flagged new dicamba label changes, ESA documentation requirements, and glyphosate litigation as factors shaping 2026 weed plans.

Producer risk tools (old-crop corn)

  • A hedging note for corn that must move by May advised:
    • With weak basis: “protect the board first” via April/May puts or selling futures if basis is expected to firm before movement.
    • With strong basis: “capture basis”.
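That rule of thumb reduces to a tiny decision function. The numeric threshold separating “weak” from “strong” basis below is hypothetical; a real desk would set it per local market:

```python
# Old-crop corn hedging rule of thumb as a decision helper.
# weak_threshold (cents under futures) is a made-up illustration.
def hedge_action(basis_cents, weak_threshold=-25):
    """Return the suggested focus given local basis (cents vs futures)."""
    if basis_cents <= weak_threshold:
        # Weak basis: board risk dominates, so protect futures first
        # (April/May puts, or sell futures if basis should firm later).
        return "protect the board"
    # Strong basis: lock the basis in before it fades.
    return "capture basis"

print(hedge_action(-40))  # protect the board
print(hedge_action(-5))   # capture basis
```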

6) Forward Outlook

  • Spring planting decisions may hinge on fertilizer availability. One Farm Journal segment noted February fertilizer imports were “healthy,” but still warned that acreage could shift from corn to soybeans if fertilizer supplies don’t arrive.

  • Fuel and fertilizer remain linked to oil moves. A Brownfield interview cited oil around $71–$72/barrel (up roughly $6) and said fertilizer prices often move with oil as a proxy for natural gas, emphasizing fertilizer as a larger farm expense than fuel.

  • Brazil export exposure to Iran is material for corn. Canal Rural reported Iran imported about 9 million tons of Brazilian corn in 2025 (about 22–23% of Brazil’s corn exports). The same coverage warned that conflict-driven disruptions to freight, port activity, and regional logistics could affect Brazil’s export potential and (if exports are constrained) contribute to heavier domestic supply and weaker internal prices.

  • Financial risk management theme (US): a Brownfield segment encouraged farmers to manage controllable risk by limiting debt capital borrowed (beyond operating loans) during heightened input and price uncertainty.

Your time, back.

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers one verified daily brief.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

Get your briefs in 3 steps

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Stay updated on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Track startup funding trends and venture capital insights
Latest research on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2

Confirm your sources and launch

Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.

Discovering relevant sources...
Sam Altman Profile

Sam Altman

Profile
3Blue1Brown Avatar

3Blue1Brown

Channel
Paul Graham Avatar

Paul Graham

Account
Example Substack Avatar

The Pragmatic Engineer

Newsletter · Gergely Orosz
Reddit Machine Learning

r/MachineLearning

Community
Naval Ravikant Profile

Naval Ravikant

Profile ·
Example X List

AI High Signal

List
Example RSS Feed

Stratechery

RSS · Ben Thompson

3

Receive verified daily briefs

Get concise, daily updates with precise citations directly in your inbox. You control the focus, style, and length.

4-day autonomous agents, Cursor MCP Apps, and the push from code review to evidence
Mar 4
4 min read
111 docs
Romain Huet
Tristan Rhodes
Latent.Space
+8
Cursor claims a 4-day fully autonomous agent run produced a stronger-than-official solution to a math research challenge—suggesting coordination techniques may generalize beyond coding. Also: Cursor’s MCP Apps (interactive UIs in-chat), model/tool value debates (Codex vs others), and concrete execution patterns like Implementation Plans, 95%-confidence autopilot loops, and async checkpoints.

🔥 TOP SIGNAL

Cursor’s CEO says their agent harness ran fully autonomously for 4 days (no nudges/hints) and produced what they believe is a novel, stronger solution to Problem Six of the First Proof math research challenge—an early signal that “scaling agent coordination” may generalize beyond coding tasks. The claimed improvements include using the Marcus–Spielman–Srivastava interlacing polynomial method, improving a constant from c=0.03 → 0.13, and partitioning the entire vertex set into light components (vs a subset).

🛠️ TOOLS & MODELS

  • Cursor — MCP Apps support (new): Cursor now supports MCP Apps, so agents can render interactive UIs inside conversations.

  • OpenAI Codex — “most agentic coding per dollar” (practitioner claim): Romain Huet says Codex is currently the best option by far for agentic coding value.

  • Antigravity (agentic coding platform) — “Implementation Plan” + screenshot-to-Flutter UI

    • Recommended flow: request an “Implementation Plan” artifact first, review/edit the markdown architecture/logic, then approve execution—explicitly warning “don’t let AI write code blindly”.
    • “Screenshot → functional Flutter UI” demo: drop a screenshot and ask to rebuild as Flutter UI; described as powered by Gemini 3 Flash and launching on-device.
  • Claude Opus 4.5 / 4.6 (Copilot workflow) — quality jump (firsthand): Burke Holland describes Opus as a practical inflection point for building tools quickly, contrasting it with Sonnet 3.5 output he calls “spaghetti code” and “willy nilly” changes.

💡 WORKFLOWS & TRICKS

  • Steal this: “Implementation Plan → approve → execute” as your default safety rail (Antigravity)

    1. Ask the agent for an Implementation Plan artifact first.
    2. Review and edit the architecture + markdown logic yourself.
    3. Only then approve execution (the explicit goal: control the outcome vs blind codegen).
  • Plan mode isn’t about the plan—it’s about flushing missing constraints (Burke Holland)

    • Start in “plan mode” and do 4–6 loops where the agent proposes what you forgot to specify + multiple options, before you let it implement.
  • Autopilot / loop-until-confidence (Burke Holland)

    • Run the agent in a loop that feeds its output back into itself, but change the stop condition from “until it’s done” to “until you have 95% confidence it’s done”.
  • Task classification + model routing + sub-agent fanout (multi-model orchestration) (Burke Holland)

    • Use a “front” agent to classify tasks as easy/medium/hard and change the workflow accordingly (hard tasks: plan + sub-agents + farm-out work).
    • In the described Copilot setup, different models can be used in one run (example routing: Gemini for design, other models for refactoring) and scaled up to many sub-agents—but the workflow must still output something verifiable.
  • Async agent + human checkpoints (Burke Holland)

    • Pattern: give the CLI a big job, walk away, and have it message you (example: Telegram) with progress + a “what next?” checkpoint so you can approve/deny and let it continue.
  • Reality check: “polish” is still synchronous (Kent C. Dodds)

    • Kent calls out that with cloud agents, polish requires real-time back-and-forth while you try outputs and iterate—hard to do asynchronously from phone/desktop today.
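The “loop until 95% confidence” pattern above can be sketched as a small driver loop. `run_step` and `self_check` are hypothetical stand-ins for an agent iteration and its self-assessment, not a real Copilot API:

```python
def autopilot(run_step, self_check, target=0.95, max_iters=10):
    """Feed the agent its own output until self-checked confidence >= target."""
    output, confidence = None, 0.0
    for i in range(max_iters):
        output = run_step(output)        # previous output becomes the input
        confidence = self_check(output)  # "am I actually done?"
        if confidence >= target:
            return output, confidence, i + 1
    return output, confidence, max_iters

# Toy stand-ins: each pass refines the output and raises confidence.
steps = iter([0.4, 0.7, 0.96])
result = autopilot(
    run_step=lambda prev: (prev or "") + "*",
    self_check=lambda out: next(steps),
)
print(result)  # ('***', 0.96, 3)
```

The key change from a plain agent loop is the stop condition: iteration continues past “looks done” until the self-check clears the confidence bar.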
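The classify-then-route pattern reduces to a toy router. The keyword classifier and model names below are placeholders; in the described setup a “front” agent does the classification:

```python
# Placeholder routing table: hard tasks get a plan step plus sub-agent fanout.
ROUTES = {
    "easy":   ["small-fast-model"],
    "medium": ["general-model"],
    "hard":   ["planner-model", "sub-agent-1", "sub-agent-2"],
}

def classify(task):
    """Crude keyword heuristic standing in for a 'front' classifier agent."""
    text = task.lower()
    if any(w in text for w in ("rename", "typo", "comment")):
        return "easy"
    if any(w in text for w in ("migrate", "redesign", "refactor across")):
        return "hard"
    return "medium"

def route(task):
    tier = classify(task)
    return tier, ROUTES[tier]

print(route("Fix a typo in the README"))
print(route("Migrate the auth service to the new schema"))
```

Whatever the routing logic, the point in the source holds: every tier must still end by producing something verifiable.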
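The async-checkpoint pattern—do a chunk of work, message a human, continue only on approval—can be sketched with the transport injected, so no real messaging (Telegram/Slack) API is involved:

```python
def run_with_checkpoints(stages, notify, get_reply):
    """Work through stages, pausing at a human checkpoint after each one."""
    done = []
    for stage in stages:
        done.append(stage)  # stand-in for real agent work on this stage
        notify(f"finished {stage!r} - continue? (yes/no)")
        if get_reply() != "yes":
            return done, "stopped by human"
    return done, "completed"

# Fake transport: messages collect in a list, replies come from a script.
sent = []
replies = iter(["yes", "no"])
outcome = run_with_checkpoints(
    stages=["scaffold tests", "implement feature", "refactor"],
    notify=sent.append,
    get_reply=lambda: next(replies),
)
print(outcome)  # (['scaffold tests', 'implement feature'], 'stopped by human')
```

Because the notifier and reply channel are plain callables, the same loop works over Telegram, Slack, or email without touching the agent logic.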

👤 PEOPLE TO WATCH

  • Michael Truell (Cursor) — concrete evidence of long-horizon autonomy: same harness previously used to “build a browser from scratch,” now used for a 4-day autonomous run on a math research problem .

  • Burke Holland (GitHub Copilot DevRel) — unusually replicable patterns for “agent experience”: plan-mode loops, 95% confidence autopilot loops, and multi-model orchestration with evidence requirements .

  • Simon Willison — frames the core bottleneck as security review at scale: treat coding agents like “teams of mixed ability engineers” shipping under deadline pressure; security issues are “directly harmful” vs survivable code quality issues.

  • swyx (+ Ankitxg) — continued push to remove review bottlenecks: calls “killing the Code Review” the “Final Boss of Agentic Engineering,” pointing to a layered playbook and “Dark Factory” anecdotes (no human code and no human review).

🎬 WATCH & LISTEN

1) Changelog — “Plan mode” loops that prevent bad prompts (≈20:55–22:04)

Hook: plan mode as a structured way to surface what you forgot to ask for, plus multiple implementation options before execution.

2) Changelog — Autopilot: loop until 95% confidence (≈22:16–23:03)

Hook: changing the stopping condition (“until it’s done” → “until 95% confident”) to force deeper self-checking iterations.

Editorial take: The frontier is shifting from “write code” to run loops + produce evidence—and the hardest unsolved piece is how you scale review (especially security) without slowing agents back down.

Gemini 3.1 Flash‑Lite launches as GPT‑5.3 Instant rolls out and Anthropic nears $19B run-rate
Mar 4
9 min read
862 docs
Thariq
Logan Kilpatrick
Sam Altman
+43
Gemini 3.1 Flash‑Lite Preview lands with “thinking levels,” aggressive speed claims, and $0.25/$1.50 per MTok pricing, while OpenAI rolls out GPT‑5.3 Instant broadly and adds GPT‑5.3-chat-latest to the API. Also: Anthropic’s reported $19B run-rate and business share shift, Arena’s new Document Arena leaderboard, and continued turbulence inside Alibaba’s Qwen team.

Top Stories

1) Google ships Gemini 3.1 Flash‑Lite Preview (speed + cost focus, with adjustable “thinking levels”)

Why it matters: The release is positioned for high-volume, low-latency workloads, and adds a new control surface (“thinking levels”) that lets developers trade off compute vs. complexity on a per-task basis—useful for agent pipelines and real-time processing.

Key details from Google and independent evals:

  • Availability: Rolling out in preview via the Gemini API in Google AI Studio and Vertex AI.
  • Pricing: $0.25 / 1M input tokens and $1.50 / 1M output tokens.
  • Speed claims (vs Gemini 2.5 Flash): 2.5× faster time to first answer token and 45% faster output speed.
  • Benchmarks shared by Google: 1432 Elo on Arena leaderboard, up to 86.9% on GPQA Diamond, and 76.8% on MMMU‑Pro.
  • “Thinking levels”: Google describes adjustable compute with “zero thinking overhead” on high-volume tasks, while reasoning through complex edge cases.
  • Artificial Analysis (Gemini 3.1 Flash‑Lite Preview): scored 34 on the Artificial Analysis Intelligence Index (up 12 vs Gemini 2.5 Flash‑Lite) while served at >360 output tokens/s with ~5.1s average answer latency.
  • Context + features (AA): retains 1M token context and supports tool calling, structured outputs, and JSON mode.
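At those rates, per-request cost is simple arithmetic. A quick sketch: the per-token prices are the ones quoted above; the request sizes and volume are hypothetical, chosen to show the kind of high-volume workload the model is positioned for.

```python
# Quoted Gemini 3.1 Flash-Lite preview pricing (USD per 1M tokens).
INPUT_PER_MTOK = 0.25
OUTPUT_PER_MTOK = 1.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the quoted preview rates."""
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# Illustrative high-volume workload: 1M requests/day at 2,000 in / 500 out each.
daily_cost = 1_000_000 * request_cost(2_000, 500)  # $1,250/day
```

Note how output tokens dominate: at a 6× price ratio, trimming verbose outputs moves the bill more than trimming prompts.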

2) OpenAI rolls out GPT‑5.3 Instant broadly (and adds GPT‑5.3‑chat‑latest to the API)

Why it matters: This is a “most-used model” refresh focused on more direct, less defensive responses and improved web search behavior—the kinds of UX shifts that can materially change product adoption even without a headline benchmark jump.

What’s new / where it’s available:

  • ChatGPT rollout: “GPT‑5.3 Instant in ChatGPT is now rolling out to everyone.”
  • Stated behavioral goals: fewer unnecessary refusals, fewer defensive disclaimers, and answers that “get to the point more directly.”
  • Web search improvements called out by OpenAI: sharper contextualization, better understanding of question subtext, and more consistent tone within a chat.
  • Hallucination/factuality note: for “questions where factuality matters most,” one contributor reports 26.8% better (when searching) and 19.7% better (when not searching).
  • API: “GPT‑5.3‑chat‑latest now also in the API.”
  • Benchmarking access: “GPT‑5.3‑Chat‑Latest” is available in Arena’s Text Arena for testing.

OpenAI also teased:

“5.4 sooner than you Think.”

3) Anthropic momentum: $19B run‑rate reports + business share shift + senior talent move

Why it matters: Multiple signals point to rapid enterprise pull: reported revenue acceleration, business market share movement, and a high-profile research leadership transition.

  • Revenue run‑rate: Sources cited by Bloomberg via Techmeme say Anthropic recently surpassed $19B run‑rate revenue (up from $9B end of 2025 and ~$14B a few weeks earlier).
  • Run‑rate disclaimer: described as “annualized run-rate,” not realized revenue.
  • US business AI market share claim: Feb 2025: ChatGPT 90%; Feb 2026: Claude ~70%.
  • Talent move: Max Schwarzer (OpenAI post‑training leadership) said he’s leaving OpenAI and joining Anthropic to work on RL research.

4) “Document Arena” launches with PDF-based evaluations (Claude Opus 4.6 leads)

Why it matters: Document reasoning is closer to many real workflows (contracts, reports, technical PDFs). Arena’s new format uses user-uploaded PDFs and side-by-side voting, making the leaderboard a live signal for “doc work” performance.

  • Document Arena is live and compares frontier models on document reasoning using PDFs.
  • Leaderboard snapshot: Claude Opus 4.6 is #1 at 1525 (+51 lead).
  • Arena says Opus 4.6 is now #1 across Text, Code, Search, and Document arenas.
  • PDF upload workflows highlighted: summarize complex content, ask questions against the file, extract key insights.

5) Alibaba Qwen team turbulence (leadership change + departures + org restructure signals)

Why it matters: Qwen is widely credited as core infrastructure for open-weight ecosystems; leadership and staffing instability could change the pace and direction of open model releases.

  • Leadership change: “Alibaba‑Cloud kicked out Qwen’s tech lead.”
  • Departure posts: Qwen tech lead @JustinLin610: “me stepping down. bye my beloved qwen.” and @huybery: “bye qwen, me too.”
  • Restructure context (Tongyi conference summary): Qwen described as a group priority with plans for expansion; references to resource constraints (including compute) and organizational changes.
  • External view on impact: Qwen 1.0 launched in fall 2023; subsequent releases “pushing the frontier of open-weights,” enabling “hundreds, maybe thousands” of papers and many products/startups.

Research & Innovation

What to watch: reliability + efficiency are increasingly “core research,” not just engineering

Two clusters stood out this cycle: (1) methods that reduce the memory/compute cost of training and (2) evidence that multi-agent coordination is still fragile without deliberate design.

Training efficiency: FlashOptim (Databricks AI Research)

  • Claim: cuts training memory by over 50% with no measurable loss in model quality.
  • Concrete metric: AdamW training typically needs 16 bytes/parameter for weights, gradients, and optimizer state; FlashOptim reduces this to 7 bytes (or 5 with gradient release).
  • Example: Llama‑3.1‑8B finetuning peak GPU memory drops from 175 GiB → 113 GiB.
  • Compatibility: drop-in replacement for SGD, AdamW, Lion; supports DDP and FSDP2; open source.
  • Techniques summarized by Databricks: improved master weight splitting + companded optimizer-state quantization.
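The bytes-per-parameter figures above follow from standard AdamW bookkeeping: fp32 weights, gradients, and two moment tensors at 4 bytes each. A back-of-the-envelope sketch, using an approximate Llama-3.1-8B parameter count; note the quoted 175 → 113 GiB peak also includes activations and other buffers, so these state-only numbers come out smaller.

```python
# Back-of-the-envelope for the bytes/parameter figures quoted above.
ADAMW_BYTES = 4 + 4 + 4 + 4   # fp32 weights + grads + Adam m + Adam v = 16 B/param
FLASHOPTIM_BYTES = 7          # quoted FlashOptim figure (5 with gradient release)

def state_gib(n_params: float, bytes_per_param: float) -> float:
    """Weight/gradient/optimizer state in GiB (excludes activations, buffers)."""
    return n_params * bytes_per_param / 2**30

LLAMA_8B = 8.03e9             # approximate Llama-3.1-8B parameter count
baseline = state_gib(LLAMA_8B, ADAMW_BYTES)       # ~119.7 GiB
reduced = state_gib(LLAMA_8B, FLASHOPTIM_BYTES)   # ~52.4 GiB
```

The state-only delta (~67 GiB) is in the same ballpark as the quoted peak-memory drop (62 GiB), which is consistent with the claim that the savings come from optimizer-state compression rather than activation tricks.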

Optimization + search: SkyDiscover (open-source)

  • An open-source framework with two adaptive algorithms reported to match/exceed AlphaEvolve on many benchmarks and outperform OpenEvolve/GEPA/ShinkaEvolve across 200+ optimization tasks.
  • Reports +34% median score improvement on 172 Frontier‑CS problems and “discovers system optimizations beyond human-designed SOTA.”

Agent reliability: consensus + coordination don’t “just emerge”

  • Byzantine consensus games: research finds valid agreement is unreliable even in benign settings and degrades with group size; most failures are convergence stalls/timeouts (not subtle value corruption).
  • Theory of Mind (ToM) in multi-agent systems: a ToM/BDI + symbolic verification architecture shows ToM-like mechanisms don’t automatically improve coordination; effectiveness depends on underlying LLM capability.

Biology: Eubiota “AI co-scientist” claims lab-validated discoveries

  • Eubiota is described as a multi-agent AI framework for end-to-end discovery (planning, tool use, evidence verification, wet-lab validation).
  • Reports 87.7% mechanistic reasoning accuracy (vs GPT‑5.1 77.3%).
  • Reported validated outcomes include: identifying the uvr‑ruv stress axis (screening 1,945 genes and 10K papers), designing a microbial therapy reducing colitis inflammation, engineering antibiotics, and discovering anti-inflammatory metabolites.

Products & Launches

What to watch: tools are converging on “agent runtimes” (compute + context + UI + eval)

This week’s releases focus less on single APIs and more on the scaffolding around agents: sandboxes, computer-use, document pipelines, and debugging/observability.

Developer agents and orchestration

  • Cursor cloud agents: run in isolated VMs with full computer-use capabilities; produce merge-ready PRs and validation artifacts (video/screenshot) across web/mobile/Slack/GitHub.
  • Cursor MCP Apps (v2.6): agents can render interactive UIs inside conversations; also adds private plugin marketplaces for teams.
  • OpenAI Codex: shipped a new $chatgpt-apps skill in the Codex app for building ChatGPT apps with the Apps SDK (scaffolding, wiring tools to widget resources, iterating host-aware UI).

Search + research APIs

  • you.com Research API: claims SOTA on DeepSearchQA and top scores on BrowseComp/FRAMES/SimpleQA “at a fraction of the latency and cost.” Offers one endpoint with five depth levels, up to “1,000+ reasoning turns” per query.

Document workflows: evaluation and production tooling

  • Arena Document Arena: PDF upload + side-by-side voting and leaderboard for document reasoning tasks.
  • LlamaIndex positioning: says it has evolved from a RAG framework to an “agentic document processing platform,” with LlamaParse serving 300k+ users across 50+ formats using multi-agent workflows (OCR + computer vision + LLM reasoning).

Speech / realtime

  • AssemblyAI Universal‑3‑Pro streaming: brings AssemblyAI’s most accurate speech model to streaming audio; highlights include real-time speaker labels, strong entity detection, code-switching, and global language coverage.

Specialized models in production contexts

  • Baseten: says it trained a specialist model that beats Gemini on emergency medicine documentation and runs 6–8× faster.

Industry Moves

What to watch: “distribution + workflow integration” is reshaping competition

  • OpenAI building a GitHub alternative: The Information reports OpenAI is developing an internal alternative to GitHub after outages; staff discussed potentially selling it to customers.
  • Perplexity Computer as a packaged runtime: Perplexity says its “Computer” orchestrates 20 different AI models and can be embedded into apps without developers managing API keys, using a secure sandboxed runtime they orchestrate end-to-end.
  • US business market share claim: a post asserts ChatGPT’s share fell from 90% (Feb 2025) while Claude reached ~70% (Feb 2026).
  • Apple local-compute signal: Apple introduced M5 Pro and M5 Max with a “Fusion Architecture” merging two 3nm dies; claims include over 4× peak GPU compute for AI vs prior generation and 614GB/s unified memory bandwidth.

Policy & Regulation

What to watch: legal definitions are hardening into product constraints

US copyright: AI can’t be the author (Thaler v. Perlmutter stands)

  • US courts held that “authorship” must be human (Thaler v. Perlmutter), and the US Supreme Court declined review (so the D.C. Circuit ruling stands).
  • USCO guidance: prompt-only AI output can’t be registered; meaningful human creative contribution can be protected (and similar logic applies to AI-generated code absent human authorship).

New York bill targeting chatbot legal advice (SB 7263)

  • SB 7263 would prohibit chatbot operators from permitting substantive legal advice that would constitute unauthorized practice of law; it passed the Internet & Technology Committee last week.
  • Includes a private right of action with mandatory attorneys’ fees.

OpenAI–DoW/DoD contract language scrutiny continues

  • OpenAI amended its agreement to state the AI system “shall not be intentionally used” for domestic surveillance of US persons/nationals, including deliberate tracking via commercially acquired personal/identifiable information.
  • The Department affirmed services won’t be used by DoW intelligence agencies (e.g., NSA) without a follow-on modification.
  • Commentators note the full contract text is not public; some argue language could still be porous given legal definitions of “collect/surveil” and “incidental” collection mechanisms.

Global governance signal

  • The UN’s Independent International Scientific Panel on AI elected co-chairs Yoshua Bengio and Maria Ressa, with the first report slated for July 2026.

Quick Takes

What to watch: smaller signals that may compound

  • METR Evals correction: fixed a modeling mistake that inflated recent 50%-time horizons by 10–20%; for Opus 4.6, one update reports P50 11h 59m (down from 14.5h) and P80 1h 20m (up from ~1h).
  • Claude Code voice mode: rolling out (reported live for ~5% of users), toggled via /voice.
  • Codex voice transcription: available to 100% of Codex users; in-app via mic or Ctrl + M, and in CLI via config + press-and-hold Space.
  • Gemini 3 Pro sunset: Google is “turning down Gemini 3 Pro” on March 9; users can upgrade to Gemini 3.1 Pro Preview.
  • Qwen 3.5 GPTQ Int4 weights: Alibaba released GPTQ‑Int4 weights with native vLLM and SGLang support (less VRAM, faster inference).
  • H100 shortage watch: posts report near-zero H100 capacity on Prime Intellect and Lambda dashboards; one provider suggests capacity may improve in coming weeks.
  • Bipartisan opposition to AI data centers: reported escalation includes New York proposing three-year construction moratoriums and communities pulling tax incentives.
Gemini 3.1 Flash-Lite raises the speed-per-dollar bar as open-weights surge and agent tooling proliferates
Mar 4
7 min read
167 docs
Sam Altman
Jeremy Howard
Jeremy Howard
+30
Google’s Gemini 3.1 Flash-Lite leads today’s updates, with multiple benchmarks and pricing framed around speed-per-dollar and adjustable “thinking levels.” The digest also covers OpenAI’s GPT-5.3 Instant rollout, a surge (and shakeup) in open-weight models around Qwen 3.5, expanding agent/research APIs, and fresh scrutiny on reliability, surveillance language, and AI infrastructure impacts.

Gemini 3.1 Flash-Lite sets a new efficiency bar (and ships broadly)

Google: faster, cheaper, with adjustable “thinking levels”

Google DeepMind announced Gemini 3.1 Flash-Lite, positioning it as the most cost-efficient Gemini 3 series model and “built for intelligence at scale”. Multiple Google leaders said it outperforms Gemini 2.5 Flash while being “smarter, faster, cheaper”, including faster performance at a lower price.

By Google’s shared metrics, Flash-Lite delivers:

  • 2.5× faster time to first answer token than Gemini 2.5 Flash (with “significantly higher quality”)
  • 45% increase in output speed vs. 2.5 Flash
  • $0.25 per 1M input tokens
  • 1432 Elo on LMArena and 86.9% on GPQA Diamond

DeepMind also highlighted new “thinking levels” that let developers dial reasoning up or down depending on the task, including complex workloads like generating UI/dashboards or simulations. In a side-by-side comparison, Google said Flash-Lite is significantly faster in tokens/s and can use ~1/3 as many tokens to complete complex tasks in at least one example.

Availability: DeepMind said Flash-Lite is rolling out in preview via the Gemini API in Google AI Studio, and Jeff Dean noted it’s available in Google AI Studio and Vertex AI.

“Small but mighty — our new Gemini 3.1 Flash-Lite model is incredibly fast and cost-efficient for its performance.”

More: https://goo.gle/3OO11NK

OpenAI rolls out GPT-5.3 Instant to all ChatGPT users

OpenAI said GPT-5.3 Instant in ChatGPT is rolling out to everyone, framed as “more accurate” and “less cringe”. The company also claims the update reduces “unnecessary refusals” and “preachy disclaimers”, with improved behavior like sharper contextualization and better understanding of question subtext.

Details: https://openai.com/index/gpt-5-3-instant/

Open weights: new frontier releases, plus turbulence at Qwen

A burst of flagship open-weight models (and a new adoption lens)

Interconnects reported a “busy month” in open-weights AI, including new flagship models from Qwen, MiniMax, Z.ai, Ant Ling, and StepFun. Highlights include:

  • Qwen 3.5 (0.8B–27B dense; 35B-A3B–397B-A17B MoE): described as multimodal, using reasoning by default, with improved style/instruction-following and multilingual support (with a note that small models may “overthink”)
  • Step-3.5-Flash (196B-A11B MoE): reported as especially strong in math benchmarks, beating models several times larger
  • GLM-5 (744B-A40B): demand reportedly rose enough that the team raised prices for its coding plan
  • MiniMax-M2.5: described as a small model that can rival others (e.g., GLM-5 and Kimi K2.5) and quickly became a community favorite

Interconnects also introduced Relative Adoption Metrics (RAM), which normalizes downloads relative to peer models in the same size class. In its late-2025 snapshot, it noted winners like Kimi K2 Thinking and some OCR models, while stating DeepSeek V3.2 underperformed DeepSeek’s earlier 2025 releases.

Qwen 3.5 goes local (and becomes easier to fine-tune)

A separate YouTube demo highlighted Qwen 3.5 releases (800M, 2B, 4B, 9B) and showed an iOS app (“Locally AI”) running them fully on-device. The video emphasized that prompts and data can stay on the phone (no cloud transmission).

On the tooling side, a Reddit crosspost said Unsloth now enables local fine-tuning of Qwen3.5 with 5GB VRAM.

Qwen departures spark concern about near-term open-weight incentives

Multiple posts flagged apparent Qwen team departures, including “bye qwen, me too” and “me stepping down”. Jeremy Howard reacted publicly, calling the situation “sad and worrying” and suggesting the team is losing “some of their very best researchers”.

Separately, a report attributed to “word on the street” claimed Alibaba is tightening the screws to make money via proprietary cloud/API rather than open source; Nathan Lambert described this as an “existential risk” for near-term open-weight models and argued only a few actors may have durable business incentives to build them.

Agents and “research infrastructure” products keep expanding

Perplexity Computer: multi-model orchestration + embed into apps

Perplexity announced Perplexity Computer, saying it orchestrates 20 different AI models and can be embedded directly inside apps developers create. CEO Arav Srinivas highlighted an operational differentiator: users don’t need to manage their own API keys, with workloads run in a “secure sandbox” orchestrated end-to-end.

Perplexity also demoed “CEO Chat,” described as letting users text tech CEOs (e.g., Elon, Jensen, Zuck) and receive responses. In another post, Perplexity Computer was claimed to replicate a Bloomberg Terminal feature and “oneshot” a POSH use case involving high-end assets (yachts, watches, supercars, mansions).

you.com launches a Research API with “depth levels”

you.com launched its Research API, claiming state of the art on DeepSearchQA and top benchmark performance on BrowseComp, FRAMES, and SimpleQA, at a “fraction of the latency and cost”. It offers “one endpoint” with five levels of research depth, ranging from a 2-second lookup to 1,000+ reasoning turns on a single query.

Blog: https://you.com/resources/research-api-by-you-com

LlamaIndex: “not a RAG framework” → agentic document processing platform

LlamaIndex said it is shifting from being “connective tissue” between LLMs and data to an agentic document processing platform focused on automating knowledge work over documents. It highlighted LlamaParse serving 300k+ users across 50+ formats (PDF, Word, PowerPoint, Excel, etc.) using multi-agent workflows combining OCR, computer vision, and LLM reasoning, while stating it will continue OSS work aligned to this document-processing focus.

Reliability and behavior: code review debates and “sycophancy” research

“Kill the code review” becomes a stated goal for agentic engineering

Posts amplified an emerging view that removing human code review is the “Final Boss” for fully productive coding agents, citing rising PR volume and examples like StrongDM’s “Dark Factory” (claimed as no human code and no human review).

Jeremy Howard: “vibe coding is a slot machine”

In a YouTube segment, Jeremy Howard argued AI-based coding can feel like a slot machine—an “illusion of control” that can produce code “no one understands”. He also said a recent study showed only a “tiny uptick” in what people are actually shipping, pushing back on narratives of massive productivity leaps.

Princeton study cited on X: sycophancy can suppress discovery

Gary Marcus highlighted a 557-person Princeton study described as finding that “default GPT” suppressed discovery at a rate comparable to a “yes-man” AI, while “unbiased feedback” produced 5× better results. In a separate post, he quoted a general mechanism: when models are trained to be helpful, they may “prioritize data that validates the user’s narrative” over truth-seeking data.

Policy watch: OpenAI’s DoD language tightens, but “incidental” surveillance concerns persist

Posts circulated updated language stating OpenAI’s system “shall not be intentionally used for domestic surveillance of U.S. persons and nationals,” including prohibiting deliberate tracking/monitoring (including via commercially acquired personal/identifiable data). Another excerpt said the Department affirmed OpenAI services will not be used by DoD intelligence agencies (e.g., NSA) without a follow-on contract modification.

A separate post summarized the decision as withholding deployment to NSA and other DoD intelligence agencies “for now,” to allow time to address potential surveillance loopholes through the democratic process. Jeremy Howard pointed to how FISA Section 702 and EO 12333 can classify mass collection as “incidental,” suggesting that studying PRISM and Upstream would be instructive for understanding how “incidental” surveillance has been justified and used.

Meanwhile, Gary Marcus criticized the retention of the phrase “consistent with applicable laws” in the updated agreement language, reacting skeptically.

Compute and externalities: xAI emissions claims and pushback on datacenter narratives

One widely shared claim said xAI is operating 62 unpermitted methane gas turbines across two data centers (Memphis, TN and Southaven, MS), and that xAI’s own permit application suggests the facilities could emit more than 6 million tons of greenhouse gases and over 1,300 tons of health-harming air pollutants annually. Separately, a post said Grok scored “way below” peers on the latest ARC AGI leaderboard.

In contrast, Emad Mostaque argued that narratives about AI datacenter water and power impacts are politically charged, and claimed that golf courses use 10× the water of AI data centers globally.

Quick data point

A Reddit post (crossposted from r/MachineLearning) claimed a benchmark of 94 LLM endpoints (Jan 2026) found open source models within 5 quality points of proprietary models.

Debugging team performance, scaling discovery, and building for habits
Mar 4
9 min read
81 docs
Sachin Rekhi
Aakash Gupta
Nir Eyal
+9
This edition highlights two complementary levers for PMs: debugging team performance with the Waterline Model (structure → dynamics → people) and increasing discovery speed as AI accelerates delivery. It also includes a widget-first B2C habit case study, practical tactics for closing the insight and feedback-to-build gaps, and a roundup of PM automation skills worth exploring.

Big Ideas

1) Debug underperformance by checking systems before people (the Waterline Model)

When timelines slip and execution feels messy, it’s tempting to jump to people-based explanations—but the Waterline Model is designed to help you diagnose where the problem is coming from before deciding what to do. The rule of thumb is:

  • “Snorkel before you scuba”—start with shared systems first (goals, roles, decision-making) before diagnosing personalities.
  • Work through four layers in order: structure → dynamics → interpersonal → individual.

Why it matters: The model aims to prevent misdiagnosis (e.g., cycling through people unnecessarily) and focus fixes where they have the most leverage.

How to apply: Start by evaluating structure (vision, goals, context, expectations, role clarity, org design), then dynamics (decision-making, conflict, information flow), before moving into interpersonal/individual causes.


2) AI is accelerating delivery, but discovery is the bottleneck

Sachin Rekhi argues that AI has handed engineering a “jetpack” (tools like Cursor, Codex CLI, Claude Code), but discovery is now the constraint—building faster than you can learn risks “expensive mistakes, sooner”. His framing: the next wave of winning PMs will run 10x the customer learning with the same team.

Why it matters: If shipping gets cheaper/faster, the cost of building the wrong thing becomes the primary failure mode.

How to apply: Pick one discovery workflow where AI can compress cycle time—e.g.,

  • Analyze thousands of NPS verbatims/support tickets/app reviews into structured themes/quotes in an afternoon.
  • Set up “feedback rivers” that continuously monitor channels and surface actionable signals.
  • Scale interviews (what used to be 10 interviews becomes 100) via AI-moderated interviews.
  • Prototype to collect real behavioral data (heatmaps, drop-offs, in-product surveys) before production code.
  • Ask your database plain-English questions and get charts back—collapsing the hypothesis→answer loop from days to minutes.

3) “What won’t change” is still a product advantage: human behavior

Ryan Hoover highlights a Bezos-style lens: amid accelerating change, it can be more useful to ask what won’t change—he points to human nature and the evergreen value of understanding psychology.

In a related conversation, Nir Eyal distinguishes persuasive technology (helping people do what they want to do) from coercive technology (getting people to do what they don’t want to do, which he calls unethical). He also emphasizes:

  • Reducing friction increases the likelihood a behavior occurs.
  • Habits are common (he says ~50% of daily actions are habitual), while addiction is a harmful compulsive dependency—and “there’s no such thing as a good addiction”.

Why it matters: New tooling changes what’s possible, but many product problems still come down to behavior change, ethics, and repeatable usage patterns.

How to apply: When you’re designing engagement, explicitly choose (and document) whether you’re reducing friction toward behaviors users want, and avoid framing “addiction” as a goal.


Tactical Playbook

1) A practical Waterline diagnostic (structure → dynamics → interpersonal → individual)

Use the model in order, and stop as soon as you find a plausible upstream cause.

  1. Structure check (start here): Confirm people know what they’re doing and how success is defined (vision, goals, context, expectations, role clarity, org design). One practical prompt is to ask each person how they describe their role, what goals the team owns, and which numbers they personally own.
  2. Dynamics check: Look at how decisions get made, how conflict shows up and gets resolved, and how information flows (or doesn’t). Dynamics are “experienced, not written down,” and can bottleneck decisions or create confusion about who decides.
  3. Interpersonal check (only after ruling out above): Interpersonal tension (trust, unresolved conflict, style clashes) can be real, but is often caused/amplified by unclear roles/overlapping ownership/incentives.
  4. Individual check (last): Only after goals/roles/dynamics are sound: evaluate whether the gap is coachable in the time the business can afford; if not, change the role or make a clean exit—avoid “lingering in ambiguity”.

2) Shrink the “Insight Gap” by naming where the lifecycle breaks

A Reddit thread frames the “Insight Lifecycle” as three common time sinks—even with tools like Amplitude, Stripe, Salesforce, and Gong:

  • Collection (“Fetch”): chasing data via BI tickets, exports, or locating where a metric is tracked.
  • Synthesis (“Tying it together”): manual CSV cleaning or dashboard-building to correlate data across sources.
  • Interpretation (“So what?”): deciding which signal matters and how it affects prioritization.

Step-by-step way to apply this:

  1. Label your current bottleneck (collection vs. synthesis vs. interpretation).
  2. If your bottleneck is interpretation speed, trial faster loops like natural-language metric analysis (ask in plain English, get a chart back).
  3. If your bottleneck is input sprawl, consider automation that continuously monitors feedback channels and surfaces signals (so you’re not constantly triaging manually).

3) Close the feature-request → implementation gap by staying in the loop

One PM described a flow where users submit requests → PM writes tickets → devs code → a 2–3 week cycle, by which time momentum is lost; they’re exploring whether AI can bridge the gap. A reply argues this resembles a waterfall-style handoff and warns it’s a “recipe for disaster”.

Step-by-step adjustment to apply (process, not tooling):

  1. Treat your job as confirming the solution meets user needs across discovery → definition → delivery and beyond, not just ticket-writing.
  2. During the 2–3 week cycle, repeatedly check what’s being built against what the user asked for (instead of waiting until the end).
  3. Increase customer involvement frequency—framing this as the original goal of Agile: bringing customers into the loop to ensure what’s delivered meets their needs.

4) Use AI prototyping tools after research (don’t skip the problem space)

Aakash Gupta shares a warning from Nadav Abrahami (Wix co-founder, now building the AI prototyping platform Dazl): he’s seen PMs rush straight into building and skip the step that determines whether a feature succeeds.

“You need to understand what problem you’re solving, what user story, and the rough shape of the feature… you can’t just jump in immediately to the solution space.”

Itamar Gilad similarly flags the risk of using prototyping for ideation in a way that moves too fast into the solution space.

How to apply (minimum viable sequence):

  1. Understand the problem.
  2. Map the user stories.
  3. Define the rough shape of the feature.
  4. Then prototype—“the tool is a hammer; make sure you’ve found the right nail first”.

Case Studies & Lessons

1) Underperformance that was actually structural: roles/goals weren’t coherent

In a Waterline Model example, a leader taking over a struggling marketing team avoided diagnosing people first and instead asked each person how they described their role, what goals they owned, and what metrics they were expected to move. The answers were “wildly inconsistent,” and leadership’s view of responsibility and success measurement differed from how individuals understood their jobs.

Lesson: Structural clarity (mandate, goals, role definitions, success criteria) can improve performance quickly, before personnel changes.


2) A “speed problem” that was actually dynamics: decisions weren’t stable

Another example describes a founder frustrated by slow execution—yet when the team tried to move quickly, the founder “swoops in” and unmakes decisions or second-guesses judgment, making it feel unsafe to move quickly. The team adapted rationally by slowing down, adding extra alignment layers, and escalating decisions that didn’t need escalation.

Lesson: Dynamics can create rational “self-protection” behaviors that look like performance issues from the outside.


3) B2C habit design via a widget-first product (Bible verse app MVP)

A startup founder (not religious) reports competitor research across top Bible apps and found a repeated review theme: users “keep forgetting to open it,” which they interpreted as a habit problem rather than a content problem. Their solution: a home screen widget that shows the daily verse automatically—“the widget is the product; the app is secondary” (journaling, verse history, and exploration live in-app).

Monetization + distribution choices (as shared):

  • Pricing: $4.99/month or $39.99/year; free tier includes unlimited daily verse + 3 extra exploration verses/day; premium removes limits/ads and adds customization; AdMob on free tier.
  • Launch focus: US/UK/Canada/Australia/Philippines day one; “available in 175 countries technically” but focusing on English markets.
  • Go-to-market: an “ASO/Reddit/Product Hunt” lane plus “Christian micro-influencers” given premium access + small gift cards to create authentic content.

Lesson (connected to a general behavior principle): Nir Eyal notes that reducing friction increases the likelihood a behavior occurs—and this product’s core design puts the habit on the home screen, before other apps.


Career Corner

1) Don’t let “people” be your default diagnosis

Lenny Rachitsky amplifies a leadership trap: when a team underperforms, most people’s first instinct is to blame the people—and he argues that’s “almost always wrong,” pointing instead to structural problems below the surface.

How to apply: When you’re escalating performance concerns, bring a Waterline-style writeup: what you checked in structure and dynamics before attributing issues to individuals.


2) The differentiator in the AI era: higher learning velocity

Rekhi’s claim is directional: the PMs who “win in the next wave” won’t be the ones who only learned prompting to build—they’ll be the ones who learned to run 10x the customer learning with the same team. Separately, Jason Long predicts AI will widen the gap: driven, entrepreneurial people with time and resources will accelerate, while others may be pushed further back; he also expects job losses and a “greater consolidation of power and wealth.”

How to apply: Treat discovery leverage as a core skill—e.g., shorten cycles on feedback analysis, interviews, prototype-based learning, or metric analysis (pick one and systematize it).


3) Scope signals: senior PMs being asked to own P&L

One PM with 10 years of experience shared that they were asked to handle P&L statements for their charter for the first time and asked if that’s now common for senior PMs.

How to apply: Use this as a prompt to clarify expectations in your org: whether “charter ownership” includes financials, and what support (Finance partnership, tooling, cadence) comes with that responsibility.


Tools & Resources

1) OpenClaw: a PM-oriented “skills catalog” (with automation examples)

Jason Long (a former product owner) shares a custom OpenClaw skills catalog for PM workflows, distributed as a zip file he describes as safe. Example skills he describes include:

  • Second brain: builds an ontology by reading emails/Google Docs/Asana (read-only); maps projects, team members, strengths/weaknesses, and connections to improve the quality of generated work (e.g., more informed emails).
  • Competitive monitor: tracks competitors’ websites/releases and posts updates day-by-day; can centralize outputs into a Notion database/wiki.
  • Feedback aggregator: reads 7 days of Slack/Discord channels, extracts themes, maps to product areas, counts frequency, and generates a structured Monday digest (with charts/graphs).

He also notes a security posture choice: he uses Discord instead of Slack for certain workflows due to sensitive data and not feeling “comfortable enough” securing the system in Slack yet.
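The feedback-aggregator skill’s core loop (read messages, extract themes, map to product areas, count frequency, render a digest) is easy to picture in code. A minimal sketch, assuming feedback has already been exported as plain strings; the `PRODUCT_AREAS` mapping and function names are illustrative, not part of OpenClaw:

```python
from collections import Counter

# Illustrative keyword -> product-area mapping (not from the actual skill).
PRODUCT_AREAS = {
    "onboarding": ["signup", "sign up", "activation", "first run"],
    "billing": ["invoice", "payment", "refund", "pricing"],
    "performance": ["slow", "latency", "timeout", "lag"],
}

def aggregate_feedback(messages):
    """Map raw feedback messages to product areas and count mention frequency."""
    counts = Counter()
    for msg in messages:
        text = msg.lower()
        for area, keywords in PRODUCT_AREAS.items():
            if any(kw in text for kw in keywords):
                counts[area] += 1
    return counts

def monday_digest(counts):
    """Render a simple frequency-ranked digest."""
    lines = ["Weekly feedback digest:"]
    for area, n in counts.most_common():
        lines.append(f"- {area}: {n} mentions")
    return "\n".join(lines)

msgs = ["Signup flow is confusing", "Payment page keeps failing", "App feels slow today"]
print(monday_digest(aggregate_feedback(msgs)))
```

A real version would replace the keyword lists with LLM-extracted themes and pull messages via the chat platform’s API, but the counting and digest structure stay the same.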


2) Waterline Model (read + share)


3) AI discovery workflows (live session)

  • Rekhi is sharing his “10 AI discovery workflows” at the Lean Product Meetup in Mountain View on March 5 (registration link provided).
Moats in the AI era, limits of software “silver bullets,” and leadership under pressure
Mar 4
4 min read
138 docs
Garry Tan
Jeremy Howard
+6
Today’s strongest picks include Jeremy Howard’s reference to Fred Brooks’ “No Silver Bullet” as a direct counterpoint to modern “no more coders” narratives, plus a thoughtful essay on AI-era moats, a psychology evergreen, and standout recommendations on leadership, geopolitics, and capital allocation.

Most compelling recommendation: a reality check on “no more engineers” narratives

No Silver Bullet (essay) — Fred Brooks

  • Link/URL: Not provided (recommended in this video: https://www.youtube.com/watch?v=dHBEQ-Ryo24)
  • Recommended by: Jeremy Howard
  • Key takeaway (as shared): Howard points to Brooks’ “No Silver Bullet” as sounding like it was written for today—responding to claims that new tools would mean we “won’t need any coders anymore.” Brooks argued productivity gains would be limited (Howard cites Brooks’ “30% improvement” framing) and that most software engineering work isn’t typing code.
  • Why it matters: It directly addresses the same premise driving current AI coding hype, and frames where the real bottlenecks may lie (beyond code entry).

AI + product strategy: moats and psychology

“AI and Startup Moats” (essay) — Agam

  • Link/URL: https://unzip.dev/0x01f-ai-and-startup-moats/
  • Recommended by: Ryan Hoover (Product Hunt)
  • Key takeaway (as shared): Hoover calls it a “thoughtful essay” on how moats are changing.
  • Why it matters: A direct pointer to a single, focused argument about defensibility in an AI-shifting landscape (without being framed as promotion).

Hooked: How to Build Habit-Forming Products (book) — Nir Eyal

  • Link/URL: Not provided
  • Recommended by: Ryan Hoover
  • Key takeaway (as shared): Hoover argues that “what won’t change” is human nature, and that a deep understanding of psychology is “evergreen” across roles—even as technology changes.
  • Why it matters: It’s recommended as durable foundational knowledge (human behavior) rather than a tool-specific tactic.

Building teams that compound: “slope” and hands-on intuition

“A Little Bit of Slope Makes Up for a Lot of Intercept” (lecture) — John Ousterhout

  • Link/URL: Not provided (recommended in this video: https://www.youtube.com/watch?v=dHBEQ-Ryo24)
  • Recommended by: Jeremy Howard
  • Key takeaway (as shared): Howard summarizes Ousterhout’s idea: prioritize activities that increase your growth rate (“slope”) over things you’re already good at (“high intercept”). He adds that for his company, he focuses on his team’s “slope.”
  • Why it matters: It’s a crisp framework for evaluating work and development: optimize for compounding capability, not just near-term output.

Bret Victor’s work (talks/demos) — Bret Victor

  • Link/URL: Not provided
  • Recommended by: Jeremy Howard
  • Key takeaway (as shared): Howard says Victor best expresses the importance of a “direct, visceral connection” with the thing you’re working on, and encourages people who haven’t watched Victor to do so.
  • Why it matters: It highlights a learning principle that’s easy to lose in modern tooling: staying close to the system you’re building to develop intuition.

Investing: why “process” isn’t the whole story

Capital Allocators episode featuring Gavin Baker (podcast episode)

  • Link/URL: https://podcasts.apple.com/us/podcast/capital-allocators-inside-the-institutional/id1223764016?i=1000752436222
  • Recommended by: Ian Cassel (shared), endorsed by Keith Rabois (“This.”)
  • Key takeaway (as shared): The post argues that while a repeatable process matters, any repeatable edge that produces significant alpha gets “quickly arbed away.” It suggests repeatable outperformance often comes from a small number of key people (estimated “2–10”)—illustrated via a Michael Jordan / Bulls analogy.
  • Why it matters: It’s a concrete lens for evaluating investment organizations (and teams) beyond process narratives: who are the few individuals that actually drive outcomes?

Geopolitics and competition

Breakneck (book) — Dan Wang

  • Link/URL: Not provided (recommended in this episode: https://www.youtube.com/watch?v=s830OB11pqw)
  • Recommended by: Garry Tan (Y Combinator)
  • Key takeaway (as shared): Tan recommends Breakneck as a book about China vs. the US, calling it “incredible,” with another participant adding: “It’s really good.”
  • Why it matters: A clear signal that this is a high-quality read (per Tan) for thinking about US–China dynamics in a period where many founders/investors are reassessing national competitiveness.

Leadership under pressure

Lee Kuan Yew speech on handling a strike (speech/video)

  • Link/URL: Not provided (recommendation shared here: https://x.com/davidsenra/status/2028952451584561387)
  • Recommended by: Brian Armstrong (Coinbase)
  • Key takeaway (as shared): Armstrong describes a Lee Kuan Yew speech as “great,” and highlights the stance of being willing to “rebuild it all from scratch” rather than “allow you to bring this country down.” He says it’s “very inspiring” and something he feels he needs to do as a leader.
  • Why it matters: A rare example of a founder pointing to a specific leadership moment as a personal benchmark for decisiveness and resolve.

“I sat across the table from them and said get back to work. I will not allow you to bring this country down. And if you don’t do it, I’m prepared to rebuild it all from scratch again.”

Urea jumps on Hormuz risk as Brazil harvest delays reshape corn/soy decisions
Mar 4
7 min read
124 docs
ABC Rural
Successful Farming
Joel Salatin
+10
Fertilizer and grain markets reacted to Strait of Hormuz risk with sharp urea gains and renewed supply-timing concerns, while Brazil’s soybean harvest delays and regional weather issues continue to influence safrinha corn decisions. This digest also highlights actionable on-farm innovations—from biological nematode control and soybean variable-rate seeding to AI-driven livestock nutrition and genomic testing for heifer selection.

1) Market Movers

Energy + geopolitics feeding into ag

  • Fertilizer reacted immediately to Middle East shipping risk. In NOLA (New Orleans) physical barges for April, urea traded at $457/ton Friday and around $550/ton Monday (with commentary noting prices up roughly $70–$93/ton). Phosphate was described as up about $30/ton, while UAN and anhydrous ammonia moved less sharply and potash was said to be unaffected so far.
  • Grain markets are also being pulled by the biofuel/energy channel: one market recap highlighted soybean oil’s outperformance tied to crude strength amid the conflict, and another framed soybean oil as up $14.50 (+29.5%) since the end of last year.

Grains: price action + demand signals

  • Futures levels cited in market coverage: May corn 451¾ (+6¢), May soybeans 1182½ (+18½¢), May Chicago wheat 584 (+6¾¢), May Kansas City wheat 583½ (+8¾¢), and May spring wheat 613¾ (+3¾¢).

  • US export inspections (week ending Feb 26):

    • Corn: 73M bushels (down 8% vs prior week, up 37% YoY)
    • Soybeans: 42M bushels (up 67% vs prior week, up 62% YoY); China accounted for ~65% of inspections
    • Wheat: 13M bushels (down 39% vs prior week; down 12% YoY)
  • Spain demand + policy headline risk: Spain has bought 2.4 MMT of corn so far this year (up 1 MMT vs last year) with about 225,000 MT still waiting to ship. Separately, a post quoted Trump as saying: “SPAIN HAS BEEN TERRIBLE, I TOLD BESSENT TO CUT OFF ALL DEALING WITH SPAIN.”

2) Innovation Spotlight

Biological nematode control in soybeans (US/Brazil retail rollout)

  • Indigo Ag’s Biotrinsic Nemora FP (biological nematicide) won The Scoop’s 2025 New Product of the Year.
  • Mechanism & measured efficacy: a bacterial seed treatment that grows with the plant and reduces soybean cyst nematode (SCN) egg hatch by ~70%; the discussion notes 4–6 nematode life cycles per season and describes the compounding effect across cycles.
  • Use & handling metrics: applied as a dry, flowable-powder planter-box treatment at 1 oz/cwt at planting; product shelf life described as 18–24 months at room temperature, with on-seed planting windows from 60 days up to 1 year (by product).

AI-assisted nutrition decisions in poultry and swine (Brazil)

  • A segment on animal production described AI systems that integrate feed intake, weight gain, feed conversion, water consumption, barn environment, health history, and ingredient quality to spot patterns and flag issues early (e.g., subtle intake drops plus rising bird temperature).
  • Reported operational benefit: earlier diet/formulation adjustment, avoiding performance loss and reducing waste.
  • Implementation constraints called out: data quality, system integration, tech access, and trained staff.

Genomic testing for replacement-heifer selection (US beef)

  • Zoetis’ Inherit Select genomic testing is presented as a way for commercial producers to select replacements beyond “looks,” including traits like cow fertility (lifetime calves up to 9 years), feed efficiency, and BRD health/survival.
  • Operational timeline & economics: turnaround advised at roughly 30 days, with testing ideally as early as possible but at least 30 days before the first keep/sell decision point (e.g., branding/weaning). ROI was characterized as 3–4:1 or higher in their experience/modeling.

3) Regional Developments

Brazil (Mato Grosso): excess rain delays soy, compresses safrinha corn window

  • In northern Mato Grosso (Peixoto de Azevedo, Matupá, Marcelândia), reporting described atypical precipitation that waterlogged soils and damaged infrastructure, slowing soybean harvest and pressuring second-crop corn planting.
  • Marcelândia details included: about 35% of 200,000 ha still unharvested, an emergency declaration, and rainfall projected near 3,000 mm versus an average 1,800–2,000 mm.
  • Producer-reported impacts included soybean losses of 8–10 sacks/ha (from an expected 75–80) and corn area reductions around 20% tied to a ~10-day delay beyond the ideal planting window.

Brazil (national): harvest still behind pace despite some drying

  • Conab-linked coverage described the national soybean harvest as ~7% behind last year and 10% behind the 5-year average. Reported state gaps included Maranhão around 31% behind, with delays of 10–15% in Minas Gerais and Goiás.

Brazil (south): drought-driven soybean crop downgrades

  • Market commentary cited crop cuts attributed to southern Brazil drought: AgRural at 178 MMT (from 181) and StoneX at 177.8 MMT (from 181.6), with examples of rainfall at 60% of normal in Paraná over 30 days.

Mercosur rice: weak pricing + expected area reductions

  • An outlook for rice producers described low prices and poor yields in 2026, with expectations of about a 19% rice area reduction across Mercosur countries by 2028 as producers shift into other options (soy, corn, livestock).

4) Best Practices (actionable)

Grains & oilseeds

  • Variable-rate soybeans (think “opposite of corn”). Ag PhD recommended lowering seeding rates in the best zones (example: ~120,000) to shorten plants, improve standability, and increase airflow to reduce disease pressure, and raising rates in poorer/IDC areas (example: ~160,000–180,000) to push height for better weed control and potentially reduce IDC via greater root acid exudation and nutrient availability. They also noted variable seeding should not increase total seed cost (reallocating dollars by zone).

  • Soybean inoculation + nodulation check. Ag PhD described inoculation as putting live rhizobia on seed or in-furrow to support nitrogen fixation, noting these bacteria can persist but may be outcompeted, so inoculating “each time” can help. To check nodulation, slice nodules: pink/beefsteak red = active; black/brown = dead.
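The cost-neutrality note in the variable-rate bullet above is just arithmetic: seed pulled off the best zones funds the higher rates on IDC-prone acres. A toy check, with made-up zone sizes and rates (not Ag PhD’s numbers):

```python
# Illustrative field: acres per zone and zone-specific soybean seeding rates (seeds/acre).
zones = {
    "best": {"acres": 40, "rate": 120_000},   # lower rate: standability, airflow
    "avg":  {"acres": 40, "rate": 140_000},
    "poor": {"acres": 20, "rate": 180_000},   # IDC-prone: push population
}
FLAT_RATE = 140_000  # what a uniform prescription would plant everywhere

def total_seeds(plan):
    """Total seeds for a zone-based prescription."""
    return sum(z["acres"] * z["rate"] for z in plan.values())

variable = total_seeds(zones)
flat = sum(z["acres"] for z in zones.values()) * FLAT_RATE
print(variable, flat)  # equal here: seed dollars are reallocated, not added
```

With these (hypothetical) zone splits the variable prescription plants exactly the same total seed as the flat rate, so seed cost is unchanged while each zone gets a rate suited to it.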

Soil fertility (rate-setting under volatile prices)

  • Canal Rural’s agronomy segment emphasized that more fertilizer doesn’t always mean more productivity: after a point, marginal returns decline. It also stressed that the maximum economic efficiency rate rarely equals the maximum yield rate—especially when fertilizer prices are volatile.
  • Specific “too much” risks cited included nitrogen increasing lodging risk and N₂O emissions, potassium imbalancing Mg/Ca, and phosphorus exceeding soil fixation capacity, leading to immobilization or environmental loss.
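The gap between the economic optimum and the maximum yield rate falls out of any diminishing-returns model. A sketch with an illustrative quadratic response curve and assumed prices (none of these numbers come from the segment):

```python
# Quadratic yield response (illustrative coefficients): yield = a + b*N - c*N^2
a, b, c = 60.0, 0.5, 0.002   # bu/acre intercept, response slope, diminishing-returns term
corn_price = 4.5             # $/bu (assumed)
n_price = 0.6                # $/lb N (assumed)

# Agronomic maximum: dY/dN = b - 2cN = 0  ->  N = b / (2c)
n_max_yield = b / (2 * c)

# Economic optimum: marginal value product equals input price:
# corn_price * (b - 2cN) = n_price  ->  N = (b - n_price/corn_price) / (2c)
n_econ = (b - n_price / corn_price) / (2 * c)

print(n_max_yield, n_econ)  # the economic rate sits below the max-yield rate
```

When the N price rises or the grain price falls, `n_econ` drops while `n_max_yield` stays put, which is exactly why the two rates diverge most under volatile fertilizer prices.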

Livestock & small-farm systems

  • Deep-bedded pig pens as a compost engine (small acreage). Joel Salatin recommended stationary deep-bedded pens (example 15×20 ft for 3–5 pigs) with 24–36 inches of wood-chip bedding; adding carbon to the pigs’ “toilet corner” was presented as a way to accumulate enough material to create 3–4 pickup loads of compost in ~6 months.

5) Input Markets

Fertilizer: price spikes + logistics constraints (US and global)

  • The Strait of Hormuz was described as a critical chokepoint for moving oil and a significant share of global fertilizer. One market source estimated ~⅓ of global crop nutrients pass through the strait, including ~25% of global anhydrous ammonia exports and ~20–25% of global urea exports.

  • Supply timing risk into spring: a Farm Journal interview described a “two-month-ahead” calendar—~30 days ocean transit plus another ~3–4 weeks to move product inland, with an example that a vessel loading “today” might not be readily available until around May 1. A related discussion framed urea as dependent on the Middle East for a large share of US import needs and emphasized the tightness of the spring calendar.

  • Market behavior: one grain-market video said many US retailers went “no bid” on nitrogen as they waited on expected price spikes.

  • Relative affordability: StoneX commentary said urea values are high versus history and that the urea-to-corn ratio is the second-highest on record for this time of year (with the peak referenced as 2005).

Potash and phosphate: diverging setups

  • Potash was described as well supplied and steady, with Canada’s 2025 exports referenced as the most ever and additional tonnage expected from multiple origins and expansions.
  • For phosphate, one discussion highlighted concentration (five countries controlling ~85–90% of flows) and cited China as not exporting until August.

Chemicals and regulatory/process signals

  • Successful Farming flagged new dicamba label changes, ESA documentation requirements, and glyphosate litigation as factors shaping 2026 weed plans.

Producer risk tools (old-crop corn)

  • A hedging note for corn that must move by May advised:
    • With weak basis: “protect the board first” via April/May puts or selling futures if basis is expected to firm before movement.
    • With strong basis: “capture basis.”
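The put-based leg of this advice can be sanity-checked with a simple effective-price calculation. The strike, premium, and basis values below are made up for illustration and ignore commissions and lot sizing:

```python
def effective_price_with_put(futures_at_sale, basis_at_sale, strike, premium):
    """Cash sale price plus put payoff, net of premium paid (illustrative, no fees)."""
    cash = futures_at_sale + basis_at_sale
    put_payoff = max(strike - futures_at_sale, 0.0)
    return cash + put_payoff - premium

# Hypothetical: buy a May put at a 4.50 strike for 0.12; basis firms to -0.20 by sale.
down = effective_price_with_put(4.20, -0.20, 4.50, 0.12)  # board falls: put pays out
up   = effective_price_with_put(4.80, -0.20, 4.50, 0.12)  # board rallies: keep the upside
print(down, up)
```

The put sets a floor near strike + basis - premium if the board falls, while leaving the rally open, which is why it fits the “protect the board first, let basis firm” sequencing.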

6) Forward Outlook

  • Spring planting decisions may hinge on fertilizer availability. One Farm Journal segment noted February fertilizer imports were “healthy,” but still warned that acreage could shift from corn to soybeans if fertilizer supplies don’t arrive.

  • Fuel and fertilizer remain linked to oil moves. A Brownfield interview cited oil around $71–$72/barrel (up roughly $6) and said fertilizer prices often move with oil as a proxy for natural gas, emphasizing fertilizer as a larger farm expense than fuel.

  • Brazil export exposure to Iran is material for corn. Canal Rural reported Iran imported about 9 million tons of Brazilian corn in 2025 (about 22–23% of Brazil’s corn exports). The same coverage warned that conflict-driven disruptions to freight, port activity, and regional logistics could affect Brazil’s export potential and (if exports are constrained) contribute to heavier domestic supply and weaker internal prices.

  • Financial risk management theme (US): a Brownfield segment encouraged farmers to manage controllable risk by limiting debt capital borrowed (beyond operating loans) during heightened input and price uncertainty.

Discover agents

Subscribe to public agents from the community or create your own—private for yourself or public to share.

Active

Coding Agents Alpha Tracker

Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.

110 sources
Active

AI in EdTech Weekly

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

92 sources
Active

Bitcoin Payment Adoption Tracker

Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics

101 sources
Active

AI News Digest

Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves

114 sources
Active

Global Agricultural Developments

Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs

86 sources
Active

Recommended Reading from Tech Founders

Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media

137 sources
