
PM Daily Digest

Public · Daily at 7:00 AM · Agent time: 8:00 AM (GMT+01:00 – Europe/London)

by avergin · 78 sources

Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications

AI‑Native PM: Strategy, GTM Payback, Demo Patterns, and Career Leverage
27 September 2025
10 minutes read
Teresa Torres · Lenny Rachitsky · Andrew Chen · +7 more
AI-native strategy you can run this quarter, GTM budgeting by CAC payback and high-intent targeting, demo patterns that convert, founder-transition lessons, and vetted tools—from evals to async ops—backed by real metrics and examples.

Big Ideas

1) AI‑native product strategy before roadmaps

“Plans are worthless, but planning is everything.”

substack.com

Why it matters: Teams waste cycles “sprinkling AI” or copy‑pasting competitors without reshaping the value prop or building defensible moats.

How to apply:

  • Step 1 — Mission and problem: Deeply understand the problem and how AI could change your value proposition.
  • Step 2 — Vision: Define the next‑gen experience and create a visual artifact of how users will interact differently.
  • Step 3 — Strategy: Choose capabilities that unlock advantage and moats tied to your current position.
  • Step 4 — Then roadmap: Only after 1–3, specify features and sequence.

2) Close the AI promise–delivery gap

Why it matters: Demos look effortless, but production reveals messy data, resistant workflows, and compliance; pilots stall without disciplined delivery.

How to apply:

  • Budget for integration, data cleanup, and governance; set “pilot → production” exit criteria and owners.
  • Make discipline (not grand vision) the success variable in your plan; hold reviews on closing the gap, not just showing demos.

3) Don’t fall for the “illusion of continuity” — Do Your Own Research

Hype moves fast. Truth takes work. Do Your Own Research. Degens never forgot it. Everyone else is just catching up.

x.com

Why it matters: Past‑cycle playbooks (GTM, retention, pricing) won’t map cleanly to AI; fundamentals are shifting and everything feels “up for grabs.”

How to apply:

  • Run micro‑experiments and interviews before changing strategy; treat old instincts as hypotheses, not truths.

4) UI isn’t dead; it’s evolving (choose “both”, not “either/or”)

The UI is far from dead. But it will evolve.

www.reddit.com

Why it matters: Chat can add friction for common tasks; agents should augment, not replace, predictable UIs. Human‑in‑the‑loop and NLP will reshape search, analysis, and automation, but not every interaction.

How to apply:

  • Use AI to automate background work and surface edge cases (HITL).
  • Apply NLP where it reduces complexity (e.g., search, insights).
  • Keep fast, tactile controls for simple tasks; offer an on‑demand chatbot alongside the standard UI.

5) GTM budgeting: from percentages to payback and propensity

Why it matters: %‑of‑revenue models fail early; funded B2B teams that anchor on CAC payback and high‑intent targeting scale more efficiently.

How to apply:

  • Pre‑PMF: don’t spend heavily on acquisition. Post‑PMF: front‑load spend (often 40–60% of early funding) to capture share; expect costs 2–3× higher for US entry.
  • Treat marketing as an investment: scale channels until marginal ROI falls, not to an arbitrary cap.

6) A practical bar for automation: the Economic Turing Test

Why it matters: Replace abstract “AI can do X” with a concrete job trial to judge real replaceability.

How to apply:

  • Contract an agent to perform a role for 1–3 months; if you would hire it afterward, it passed for that role.
  • Use GDPval‑style work benchmarks to quantify capabilities and labor impact.
  • Passing 50% of money‑weighted jobs would imply AGI/transformative AI.

7) Maximize value in founder‑as‑brand businesses

Why it matters: Heavy founder dependency depresses valuation and limits the buyer pool.

How to apply:

  • Systematize: document SOPs, write role descriptions, hire to remove “only I can do that” bottlenecks.
  • Sell timing: sell “on the launchpad, full of fuel” (growing, profitable).
  • Packaging: plan ~18 months to reduce key‑person risk; faster exits trade price for limited KPI history.
  • Structure: expect a transition/earn‑out (e.g., 1‑year contract) for brand transfer.

Tactical Playbook

1) Discovery to PMF: ship, talk, delight

Why it matters: Time is your scarcest resource; learning speed beats polish.

How to apply (steps):

  1. Launch immediately; learn from real use.
  2. Manually recruit first users; do unscalable things.
  3. Delight early adopters; a few who love you > many who shrug.
  4. Avoid traps: fake work (conferences/press), overhiring, competitor obsession.

2) Demos that convert and survive the internal handoff

Why it matters: Champions can’t resell a 30‑minute recording; multi‑persona calls bury needs.

How to apply:

  • Build persona‑specific, shorter flows (Ops → workflows, Finance → ROI, IT → security).
  • Give champions click‑through demos they can forward to preserve the story.
  • For dev audiences: offer full‑featured trials, public docs indexable by an LLM, demo scripts with authentic sample data, and repo links.
  • Frame demos around problem → how we solve → business value (not feature dumps).

3) CAC‑payback budgeting + high‑propensity targeting

Why it matters: Payback anchors spend; high‑intent lists compress payback dramatically.

How to apply:

  • Anchor to payback: with 80% gross margin and a 12‑month target, monthly marketing ≈ LTV margin from new customers ÷ 12 (see the sketch after this list).
  • Start with measurable channels; scale until marginal ROI declines; run small experiments explicitly as experiments.
  • Prioritize high‑propensity accounts: build 200–300 account lists from triggers (hiring sprees, tool changes, funding) and re‑allocate budget; teams reported shortening payback from ~18 to ~6 months.
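A minimal sketch of the payback arithmetic above; every input (ARPA, volume) beyond the stated margin and target is an assumed example, not a benchmark:

```python
# CAC-payback budgeting: size monthly marketing so gross-margin dollars from
# each month's new customers repay the spend within the payback target.
GROSS_MARGIN = 0.80           # 80% gross margin (from the rule above)
PAYBACK_MONTHS = 12           # 12-month payback target
ARPA_MONTHLY = 100.0          # assumed average revenue per account per month
NEW_CUSTOMERS_PER_MONTH = 50  # assumed acquisition volume

# Max allowable CAC: margin a customer generates within the payback window.
max_cac = ARPA_MONTHLY * GROSS_MARGIN * PAYBACK_MONTHS   # $960

# Monthly budget that keeps payback at target for this volume.
monthly_budget = max_cac * NEW_CUSTOMERS_PER_MONTH       # $48,000

print(f"max CAC: ${max_cac:,.0f}; monthly marketing budget: ${monthly_budget:,.0f}")
```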

4) Niche GTM (scraping/botting) without getting throttled

Why it matters: Policy and platform risk break mainstream ads; communities buy.

How to apply:

  • Validate the funnel: 17k ad impressions with 1 contact‑page visit signals offer/LP problems; fix before scaling spend.
  • Budget meaningfully; $20/day on Meta won’t teach the algorithm.
  • Go community‑first (Discord) where you already get ~7 leads/week; work within posting limits.
  • If cold outreach, build compliance safeguards to avoid bans.

5) AI‑agent product metrics that drive adoption

Why it matters: If the agent automates core workflows, track adoption, retention, and feature mix to steer the roadmap.

How to apply:

  • Instrument: chat adoption rate (%), “stickiness” (% using chat >2 days/period), AI feature distribution (see the sketch after this list).
  • Improve fit: analyze conversations; increase fulfilled requests to lift adoption.
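A minimal sketch of computing those metrics from a hypothetical event log; the field names and event data are illustrative assumptions:

```python
# Agent adoption metrics from (user_id, day, feature) usage events.
from collections import Counter, defaultdict

events = [
    ("u1", 1, "chat"), ("u1", 2, "chat"), ("u1", 3, "chat"),
    ("u2", 1, "chat"), ("u3", 2, "summarize"), ("u4", 5, "chat"),
]
total_active_users = 5  # assumed: all active users in the period, chat or not

chat_days = defaultdict(set)
feature_mix = Counter()
for user, day, feature in events:
    feature_mix[feature] += 1
    if feature == "chat":
        chat_days[user].add(day)

adoption_rate = len(chat_days) / total_active_users        # % of users who used chat
sticky = sum(1 for d in chat_days.values() if len(d) > 2)  # used chat >2 days/period
stickiness = sticky / total_active_users

print(f"chat adoption: {adoption_rate:.0%}, stickiness: {stickiness:.0%}")
print("AI feature distribution:", dict(feature_mix))
```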

6) Product Hunt launch plan (treat PH as amplification)

Why it matters: ~200 products launch daily; early momentum determines visibility.

How to apply:

  • Launch only once you have hundreds of customers primed to upvote; aim for >100 customer upvotes early, then 200–300 from PH to contend for #1.
  • Launch as early in the day as possible; prepare assets and outreach in advance.
  • Decide identity: personal profile + company page vs brand account.
  • If you have just ~53 on a waitlist, focus on customer acquisition first; use PH to amplify later.

7) Data ingestion: prefer APIs and exports; scraping as last resort

Why it matters: Scrapers break when HTML shifts; brittle locators create noisy data.

How to apply:

  • First ask: do they have an API? Can you export CSV? Is it one‑off or recurring? Provide the target URL for scoping.
  • If scraping: start with Scrapy; prefer Playwright’s more reliable selectors (placeholders, labels, ids) over brittle XPath/CSS (see the sketch after this list).
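A minimal sketch of the selector advice using Playwright’s Python sync API; the URL and field names are placeholders, not a real target:

```python
# Prefer user-facing locators (label, placeholder, role) that survive HTML
# refactors over positional XPath/CSS that breaks when layout shifts.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/login")  # placeholder URL

    page.get_by_label("Email").fill("user@example.com")
    page.get_by_placeholder("Password").fill("secret")
    page.get_by_role("button", name="Sign in").click()

    # Brittle alternative to avoid:
    # page.locator("//div[2]/form/input[3]").fill("...")

    browser.close()
```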

8) Speed to learning > waiting for perfect

Why it matters: Micro‑experiments reduce debate cycles and reveal real messages/pain points.

How to apply:

  • Run polls, segment test emails, or single‑CTA landing pages; iterate quickly.
  • Operate with the rule: speed to learning always wins.

Case Studies & Lessons

eSpark shipped 4 AI features without an ML team

What happened: A designer, a learning designer/PM, and a VP of Product & Engineering prototyped → tested → shipped features (e.g., Teacher Assistant) by leveraging LLMs; they navigated failed chatbots, RAG/embeddings, and built evals teachers could trust.

Takeaways: Small, cross‑functional teams can move fast with LLMs if they iterate on prototypes and invest in trustworthy evaluations.

High‑propensity targeting cut payback from ~18 to ~6 months

What happened: Teams used trigger‑based lists (200–300 accounts) instead of broad blasts; they still did some brand work but shifted the bulk of budget to high‑intent accounts.

Takeaways: Pair a payback target with intent data; spend where buyers are “in market” to compress payback.

Community‑first beats throttled ads in gray‑area niches

What happened: Meta ads produced 17k impressions but only one contact‑page visit; Discord yielded ~7 leads/week; $20/day was too low for learning; compliance risk is high for automated outreach.

Takeaways: Fix the offer/LP, double down on communities where customers live, and treat paid as a secondary, well‑funded experiment.

Founder‑as‑brand, at scale: multiple paths

Context: An edu business reporting ~$1.7M revenue ($1.1M cohort classes; $600k subscriptions) and ~$1.55M profit was heavily tied to a 1.4M‑follower personal brand.

Options raised by operators: hire domain staff and a “ghostwriter” to produce in the founder’s voice; review until trust is built, then step back; reduce cadence (e.g., daily → weekly); invest ~$350k in payroll to make it more passive; or build production ops to handle non‑camera work.

Takeaways: Systematize and staff before a sale; reduce single‑point‑of‑failure risk to protect valuation.

Live‑shopping is marketing, not a retail killer (mind the regs and GEO)

Context: Commenters framed live shopping as QVC‑style marketing, not an e‑commerce killer; terminology needs defining; historical precedents exist.

Regulatory/GEO notes: Consumer‑protection rules (ads vs sales conversations) and jurisdictional enforcement change viability; adoption patterns differ by culture and region (e.g., more likely in parts of SE Asia/Africa; less likely in developed markets).

Takeaways: Position it as a channel, validate per‑market legal exposure, and localize strategy by culture/regs.


Career Corner

PMs aren’t being replaced by AI — but PMs who don’t use AI will fall behind

Why it matters: The risk is obsolescence through non‑adoption, not immediate replacement; the core PM work (strategy, problem selection, discovery, stakeholder empathy) remains human‑led.

How to apply: Automate low‑cognitive tasks with AI, quality‑check outputs, and reinvest time in discovery and hypothesis testing; product sense is more important than ever.

Managing up: present options, trade‑offs, and time costs

Why it matters: Early startups bias speed over process; “no” without alternatives stalls trust.

How to apply: Frame constraints as choices with revenue impact, and quantify change by time (not just complexity); your job often includes educating leadership and “managing the CEO.”

Hiring and positioning: match the firm’s risk profile

Why it matters: Risk‑averse markets favor high‑floor, specialized resumes; innovators hire for breadth.

How to apply: Read JDs as ideals unless marked “hard requirement”; pitch transferable skills and your speed to ramp; many postings include extra criteria for filtering or to satisfy process/formality.

Build networks by leading with value (and avoid success distractions)

Why it matters: Networks form around value you create — not cold asks; as success grows, invitations can distract you from what made you successful.

How to apply: Share open source, essays, talks; open with an insight, bug report, or suggestion; “network your systems” via APIs/integrations; talk to everyone from interns to CEOs; focus on long‑term, non‑transactional relationships.

Funding and programs

  • Accelerators/credits: have traction; cloud credits often require a “real investor.”
  • a16z Speedrun: up to $1M; quick wire; SF‑based 12‑week program with a deep operator network; solo and non‑technical founders welcome (show a prototype or the engineer you recruited).

Leadership, culture, and habits

  • Values and rituals: values make hard decisions obvious; “what teams repeat in private” shows up in what they ship.
  • Turn mistakes into stories: codify learning; conviction fuels early momentum.
  • Lead yourself first: teams match the standard you model.

Operating remote teams

  • Replace some stand‑ups with end‑of‑day Slack handover notes; use daily 1:1 handovers for shared codebases; publish 1‑page “operating manuals” (work style, comms, feedback) to reduce friction.

Tools & Resources

  • AI Prototyping (Oct 6, SF): Free in‑person session on building, testing, and iterating faster; RSVP on Partiful.

    • Why it matters: Hands‑on tactics to increase discovery and delivery speed.
    • How to use: Bring a current idea; plan to prototype and test immediately.
  • AI‑native roadmapping by Aakash Gupta: Failure modes to avoid; a 4‑step process; curated links to AI prototyping, vibe coding, copilots, agents.

    • Why it matters: Avoid tech‑first fantasies; build moats.
    • How to use: Produce the vision artifact before any feature list.
  • AI evals (Hamel Husain & Shreya via Lenny): Step‑by‑step how‑to; start with error analysis; you likely need only 4–7 evals; try LLM‑as‑judge and code‑based approaches.

    • Why it matters: Reliable evals keep you out of “vibes‑only” territory.
    • How to use: Build a few high‑signal evals; iterate after error analysis.
  • Fireflies: Meeting transcription that auto‑logs to CRM; test domain‑jargon accuracy first.

  • Arcade (arcade.software): Fast, embeddable product walkthroughs for click‑through demos.

  • Scrapy + Playwright: Start with Scrapy for web crawling; use Playwright’s robust selectors to reduce brittle locators.

  • Teresa Torres “Just Now Possible” — eSpark deep‑dive: How a small team shipped 4 AI features without ML engineers (Spotify/Apple/YouTube links in thread).

  • Async playbook ideas: Slack handover notes, daily 1:1 handovers, personal “operating manuals” in Notion.

  • International employment platforms: Mixed experiences with Papaya; alternative: Arc.dev.

  • Benchmarks for AI work capability: GDPval and the Economic Turing Test definition (50% of money‑weighted jobs → AGI threshold).


“Your competitor’s weakness is rarely on their homepage.”

“Speed to learning always beats waiting for perfect.”

“What teams repeat in private shapes what they ship in public.”

PM Intelligence: Evals as PRDs, Finance‑First Storytelling, and AI That Delivers
26 September 2025
9 minutes read
Shreyas Doshi · Teresa Torres · Lenny Rachitsky · +13 more
Actionable PM intelligence: how to turn evals into living PRDs, speak finance to win prioritization, ship AI that delivers immediate ROI, and use micro‑surveys to reveal the “why”—plus case studies (Wise, Nextdoor, SaaStr, eSpark), career tactics, and tools to try now.

1) Big Ideas

  • Evals are the new PRDs for AI products

    • Why it matters: You need a reliable way to measure and iterate on AI behavior as your product scales beyond “vibe checks.” Well‑crafted eval prompts become living requirements that continuously test your AI in real time.
    • Apply it: Start with manual error analysis on real traces, then codify the most important failure modes into a small suite of evals (often 4–7), and run them in CI and production monitoring.
  • Prototypes accelerate discovery, not delivery

    • Why it matters: Teams get misaligned when they expect prototype code to ship. The value is in insights, not code quality.
    • Apply it: Use high‑fidelity prototypes to align stakeholders, validate demand, and win executive buy‑in; plan a separate production build.

The code was never meant to ship — the insights were.

x.com

  • Differentiate AI with building blocks: AI + your data + your functionality

    • Why it matters: Off‑the‑shelf models are widely available; durable advantage comes from proprietary data and native product capabilities (workflows, rules, integrations) that give AI unique “superpowers.”
    • Apply it: Start from unmet customer problems, then combine AI capabilities with your data and product tools. Examples: Miro’s “canvas is the prompt” AI prototyping and Granola’s amplification‑first note‑taking using off‑the‑shelf models.
  • AI must deliver immediate, meaningful ROI (no “lame copilots”)

    • Why it matters: CIO budgets favor AI that unlocks new, powerful outcomes fast; marginal efficiency tools struggle to get funded.
    • Apply it: Design for instant value, run pilots, and use forward‑deployed implementation to prove ROI pre‑contract.
  • Integrations are the hidden unlock for AI value

    • Why it matters: AI is only as good as the data you feed it; without clean, authorized, timely data across systems, the model is “basically blind.”
    • Apply it: Normalize cross‑tool data, sync ACLs in real time, maintain freshness/throughput, and monitor integrations (issue detection, logs). Consider a unified integration layer when you face dozens of third‑party systems.
  • Speak finance: map product metrics to revenue and costs

    • Why it matters: Executives listen in the language of finance. Reframe roadmap asks as investments with revenue or OPEX impact.
    • Apply it: Answer four questions with numbers (what/why/when/cost); translate DAU/MAU, retention, engagement, and tickets into LTV, revenue per user, and OPEX. Use horizon planning to stage bets.
  • One respectful question beats a month of dashboards

    • Why it matters: Micro‑surveys reveal the “why” behind behavior—often yielding immediate conversion lifts (e.g., +10% of users reporting pricing clarity, signup increases, higher CTR/lower CAC).
    • Apply it: Make listening a habit—one snippet, one question at the moment of friction, act the same day. Free, unlimited micro‑surveys remove the caps that break the habit.
  • Measure real‑work capability, not just benchmarks

    • Why it matters: GDPval evaluates economically valuable tasks across 44 occupations; top models are approaching expert performance while being faster and cheaper.
    • Apply it: Use task‑based evals to choose where AI augments teams; expect a human‑in‑the‑loop at scale (the ratio drops over time, but humans don’t disappear—for now).

2) Tactical Playbook

  • Build a high‑leverage eval program (4–7 tests)

    1. Sample real traces and do manual error analysis; capture the first upstream issue per trace using specific “open codes” (avoid vague labels like “janky”).
    2. Cluster into actionable “axial codes”; you can draft with an LLM, then refine as a “benevolent dictator” (one domain expert—often the PM).
    3. Count and prioritize with simple pivot tables; decide which issues need code fixes vs. automated evaluators.
    4. Choose code‑based checks for deterministic rules; use LLM‑as‑judge for narrow, subjective failure modes—make each judge binary (pass/fail).
    5. Validate judges against human labels and inspect the confusion matrix; don’t trust raw agreement % alone (a sketch of this step follows the list).
    6. Operationalize in CI and production sampling; expect 3–4 days to stand up, then ~30 minutes/week to maintain.
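A minimal sketch of step 5 with illustrative labels; `judged` stands in for verdicts from your actual LLM judge:

```python
# Validate a binary LLM-as-judge against human ground truth.
# Raw agreement can look fine while the judge misses the failures it exists to catch.
human  = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # 1 = pass, 0 = fail (human labels)
judged = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]  # judge verdicts on the same traces

tp = sum(h == 1 and j == 1 for h, j in zip(human, judged))
tn = sum(h == 0 and j == 0 for h, j in zip(human, judged))
fp = sum(h == 0 and j == 1 for h, j in zip(human, judged))
fn = sum(h == 1 and j == 0 for h, j in zip(human, judged))

agreement = (tp + tn) / len(human)
print(f"agreement={agreement:.0%}  TP={tp} TN={tn} FP={fp} FN={fn}")
# agreement=80%, yet FP=1 of only 2 true failures: the judge waves through
# half the failure mode it was built to catch. Inspect the matrix, not the %.
```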
  • Fix your funnel this week with micro‑surveys

    1. Place one question at a key drop‑off (pricing, signup, landing). Examples: “What information is missing before you choose a plan?”; “What almost stopped you from signing up today?”
    2. Act on the language in responses (rename features, clarify next steps, align ad→page messaging). Expect immediate lifts (e.g., >10% more users reporting pricing clarity; signups increase; higher CTR, lower costs).
    3. Make it a habit: one snippet, unlimited surveys/responses, ship changes the same day.
  • Present roadmap as an investment (finance first)

    1. Translate product metrics to financials (e.g., DAU/MAU → revenue; retention → LTV; tickets → OPEX); worked numbers in the sketch after this list.
    2. Use a short business case with revenue math (e.g., the search example → $500K), then stage bets via Horizons 1–3.
    3. In exec reviews, move from “resources” to “investment,” and from “UX” to “revenue impact.”
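A minimal sketch of that translation; every input is an assumed example value, chosen only to show the shape of the argument:

```python
# Product metrics -> finance language: revenue, LTV, OPEX.
mau = 200_000                    # monthly active users
arpu_monthly = 4.0               # revenue per user per month
retention_gain = 0.02            # +2 pts retention from the proposed feature
avg_lifetime_months = 10         # current average customer lifetime
tickets_avoided_per_month = 500  # support deflection from the feature
cost_per_ticket = 6.0

# Engagement/retention -> revenue: incremental retained users times ARPU.
extra_revenue = mau * retention_gain * arpu_monthly * 12       # $192,000/yr

# Retention -> LTV: a longer lifetime raises per-user value (+1 month assumed).
ltv_before = arpu_monthly * avg_lifetime_months                # $40
ltv_after = arpu_monthly * (avg_lifetime_months + 1)           # $44

# Tickets -> OPEX: deflected volume is a cost line executives recognize.
opex_saved = tickets_avoided_per_month * cost_per_ticket * 12  # $36,000/yr

print(f"revenue +${extra_revenue:,.0f}/yr; LTV ${ltv_before:.0f} -> ${ltv_after:.0f}; "
      f"OPEX -${opex_saved:,.0f}/yr")
```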
  • Design AI features to win budget and adoption

    1. Aim for outcomes users couldn’t do before; avoid marginal copilots.
    2. Run pilots with forward‑deployed engineering; prove ROI pre‑contract.
    3. Keep agents current: ingest your freshest assets (site, prospectus, stats), recrawl daily, and “augment every day.”
    4. Build the data plumbing: normalize multi‑tool data, enforce ACLs, monitor throughput/freshness.
  • Scale outbound with AI SDRs (what actually works)

    1. Don’t upload “good emails”; train on personas and how to add value. Let the agent scrape signals (LinkedIn/site/news) for deep personalization.
    2. Lean on multivariate testing—agents iterate far beyond a human’s capacity.
    3. Expect a ramp (usually several tuned touches to book meetings), not day‑one wins.
    4. Automate qualification and calendar booking to stop losing warm prospects.
    5. Measure impact: example results—1,206 leads, 5,216 messages, 136 responses, 52 positive engagements.
  • Choose the right interaction pattern for AI in workflows

    • If users need predictable, curriculum‑aligned outputs, a structured workflow can beat a chatbot. Pair RAG/embeddings with rubric‑based evals and human‑in‑the‑loop checks before wide rollout.
  • “What not to build” and when to say no

    • Treat it as strategy: reframe assumptions; get paid signals early (“features can wait, survival first”).
    • For B2B, aim for transformative ROI (50–90% savings or 2–10× outcomes) or expect a slog; switching has real costs.
  • Better data and feedback requests

    • Start with a hypothesis—always ask “for what?” before adding instrumentation or surveys; avoid degrading UX for low‑value data.

3) Case Studies & Lessons

  • Wise: mission‑first, risk‑aware execution

    • What happened: To protect customers, Wise temporarily blocked high‑amount transfers when source‑of‑funds queues ballooned; the squad adopted “never introduce risk without understanding consequences,” did side‑by‑side work with agents, and later shipped invoice→transfer with >95% quality across 41 currencies.
    • Why it matters: Strong product culture (values, autonomy, execution) plus risk discipline is a moat.
    • Apply it: Codify stop‑loss thresholds; do ground‑truthing with operations; set quality bars before expansion.
  • Nextdoor Maps: from feature to revenue platform

    • What happened: Proved H1 (can it be built; engagement; sponsorship revenue), then layered seasonal campaigns, integrations, and platform effects; framed Maps as recurring revenue vs. a one‑off feature.
    • Apply it: Use horizons to stage learning→scale; model revenue explicitly.
  • SaaStr: AI SDRs and dynamic collateral

    • What happened: Replaced human SDRs with an AI SDR, ran multivariate outreach, automated qualification/calendar, and adopted dynamic AI‑generated decks—driving measurable pipeline.
    • Lesson: Outbound isn’t dead; poor execution is. Pilots still work.
  • Micro‑surveys: small changes, big gains

    • Pricing clarity: +10% more users reported clear pricing after copy fixes; conversions followed.
    • Signup flow: simplifying password rules and clarifying “what happens next” increased signups immediately.
    • Ad→landing fit: aligning language boosted CTR, lowered cost, and raised conversions.
  • eSpark Teacher Assistant: pick the right UX for AI

    • What happened: The initial chatbot failed in testing; the team shipped a structured UI, RAG retrieval, and rubric‑based evals; thousands of teachers now use the feature.
  • Measuring real‑work capability with GDPval

    • Insight: The best models are close to experts on real tasks and are faster/cheaper—use this signal to decide between augmentation and automation paths.
  • Guard against AI‑washing and preserve trust

    • Lesson: Over‑claiming “AI‑first” without real value increases churn; keep a human‑in‑the‑loop and prioritize reliability on high‑impact features.

4) Career Corner

  • The AI PM market is broad—and growing

    • Snapshot: 9,164 AI PMs hired in 2025 (so far); the top 10 companies account for only 7% of hires; 3.1K more hires expected this year.
    • Compensation (examples): Senior PM TC ranges from ~$226K–$562K depending on company/level.
    • Action: Every PM should build AI fluency—prompt libraries, automations, “vibe coding,” GitHub prototypes, and side projects with users/revenue.
  • Navigate politics by creating clarity, not conflict

    • Senior PM work skews to influence and alignment—less “what to build,” more “how to get it built.”
    • Tactic: In the moment, acknowledge strong exec input without committing; follow up with a crisp trade‑off analysis to guide the decision.
    • Reframe politics as “getting the most important things built,” not self‑advancement.
  • Equity and role clarity at early‑stage startups

    • Guidance: 0.5% with no salary at the pre‑product stage is a hard no; if you’re building core tech, you’re a co‑founder—target co‑founder‑level equity (often 20–30%) or salary+equity aligned to contribution.
    • Benchmarks: Founding engineers commonly get ~0.5–1% plus significant salary after seed; “you can’t negotiate a zero.”
    • Red flags: Teams spinning wheels for a year without a tech co‑founder; avoid equity‑only “before founding engineer” roles.
  • Mentors, coaches, and peers—use them

    • Where to ask: PM Slack groups, mentors/coaches with weekly groups, and domain bots (e.g., the Lenny bot). Many PMs under‑utilize these channels.
  • Onboarding and handoffs that accelerate impact

    • Playbook: The outgoing PM creates an offboarding doc; the new PM spends a week as a fresh user (self‑serve/onboarding recording), then reviews questions—this often surfaces non‑obvious issues.
    • Enablement: Provide dev/QA access on day one; document commitments in one place.
  • What hiring managers look for (Wise examples)

    • Outcomes and complexity: Show numbers and impact (not fluff); experience handling complex domains; strong UX (comprehension & emotion) and analytics skills as the role demands.
  • Personal trait to cultivate

    • Product obsession: Many gifted product people “enjoy obsessively thinking about their product” outside work; sustained curiosity compounds judgment.

5) Tools & Resources

  • Learn and implement AI evals

    • Watch/listen: Step‑by‑step walkthrough (YouTube/Spotify/Apple) covering error analysis, code‑based evals vs. LLM‑as‑judge, and why 4–7 evals suffice.
  • Free, unlimited micro‑surveys (Crazy Egg Surveys)

    • What: One‑question, in‑context surveys; unlimited surveys/responses; designed to unblock continuous learning.
    • Why: The fastest path to the “why”; fixes funnels quickly.
  • GDPval: task‑based evals across 44 occupations

    • Use: Benchmark models on real‑work outputs (docs/slides/sheets/diagrams) to inform product integration and staffing strategies.
  • Reforge: AI Productivity (October cohort)

    • Learn: Using prototypes effectively for discovery; practical AI productivity patterns.
  • Integrations for AI features (Merge)

    • Platform: 220+ integrations; normalization, ACL sync, observability; “Agent Handler” (closed beta) to securely connect agents to third‑party tools.
  • Product Hunt launch checklist (field‑tested)

    • Tactics: Drive all traffic to the PH page, never ask for upvotes (ask for feedback/comments), stagger promotion, engage all day, and avoid vote‑buying (algo penalties).
  • AEO and a founder‑led newsletter for early growth

    • Tips: Optimize for answer engines (FAQs/Q&A/entities) and ship a useful founder‑led newsletter every two weeks; measure replies over likes.

“Evals help you create metrics to measure your AI application and improve it with confidence.”

Building in the AI Era: Agent Design, Ops Leverage, and GTM That Works
25 September 2025
10 minutes read
Shreyas Doshi · Lenny Rachitsky · Melissa Perri · +13 more
Actionable PM intelligence for the AI era: agent design frameworks, product-ops leverage, GTM that converts, case studies from Wise, Fyxer, and Remedial Health, plus career strategies and ready-to-use tools.

Big Ideas

  • AI-native product management (what changes, what doesn’t)

    • Why it matters: AI accelerates engineering; product becomes the bottleneck without stronger strategy, evaluation, and UX rigor. PMs must manage AI as a resource and pair data diagnosis with creative solution design.

Andrew Ng: With AI making software engineers much faster, product management is increasingly the bottleneck.

substack.com

    • How to apply:
      • Treat AI like any feature: define outcomes, evaluate value, and measure impact vs. vanity adoption.
      • Use “diagnose with data, treat with design” as a loop for prioritization and solutioning.
  • Agent design as a product discipline (not just prompting)

    • Why it matters: Agent UX introduces new tensions—user control vs. autonomy, transparency vs. simplicity, and power vs. predictability—requiring new patterns, testing, and metrics.
    • How to apply:
      • Build progressive trust: shadow → suggestions → supervised autonomy → full autonomy for narrow, well-understood tasks (see the sketch after this list).
      • Pre-design error handling: confidence expression, escalation triggers, and robust recovery/rollbacks.
      • Choose the right integration pattern (augmentation, parallel, gateway, federation) based on org readiness and the systems landscape.
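A minimal sketch of that trust ladder as a gating policy; the stages, the 0.7 threshold, and the action names are illustrative assumptions:

```python
# Progressive-trust gating for agent actions, with pre-designed escalation.
from enum import Enum

class TrustStage(Enum):
    SHADOW = 0       # agent observes and logs; takes no action
    SUGGEST = 1      # agent proposes; a human executes
    SUPERVISED = 2   # agent acts; a human approves high-risk steps
    AUTONOMOUS = 3   # agent acts alone on narrow, well-understood tasks

def route(stage: TrustStage, confidence: float, high_risk: bool) -> str:
    """Decide what an agent may do with one proposed action."""
    if confidence < 0.7:                  # escalation trigger, designed up front
        return "escalate_to_human"
    if stage is TrustStage.SHADOW:
        return "log_only"
    if stage is TrustStage.SUGGEST:
        return "propose_for_human"
    if stage is TrustStage.SUPERVISED and high_risk:
        return "queue_for_approval"
    return "execute_with_rollback"        # recovery path exists before launch

print(route(TrustStage.SUPERVISED, confidence=0.9, high_risk=True))
# -> queue_for_approval
```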
  • Product ops > tools: model reality, reduce friction, measure leverage

    • Why it matters: Overloaded rituals and tool sprawl slow teams. High utilization is a perverse game; value throughput wins.
    • How to apply:
      • Anchor on four organizational “graphs”: context, intent, collaboration, investment. Keep intent as “bets,” presented as a fabric (not a top-down cascade).
      • Spend 60–80% of product-ops effort on pain-killing (copy/recontextualize automation, fewer decks) to free cognitive bandwidth.
      • Reframe prioritization for finance with a simple leverage score and coarse allocation “chips” instead of resource Tetris.
  • AI GTM fundamentals (what’s actually working)

    • Why it matters: Distribution advantages are neutralized (everyone’s on mobile); adoption and brand move fast, but budgets unlock only when products “just work” with near‑instant ROI.
    • How to apply:
      • Use pilots and forward-deployed engineers to achieve “working in production” before signature—pilots still convert.
      • Favor bottom-up proof over top-down mandates; personal AI usage often precedes enterprise success.
  • Market entry: vertical product > horizontal infra (for most teams)

    • Why it matters: Incumbents dominate enterprise infrastructure and switching costs are high; technical merit alone loses to GTM and relationships.
    • How to apply:
      • Start where you have a vertical unfair advantage (e.g., fashion e‑commerce with computer vision), even if the TAM is smaller—positioning improves and customers can switch.

Tactical Playbook

  • Discovery at AI speed: “vibe coding” without skipping product fundamentals

    • Why it matters: You can get high-fidelity, interactive prototypes in hours; use that speed to validate, not to ship prematurely.
    • How to apply (steps):
      1. Start in the problem space; separate problem from solution.
      2. Move up in fidelity: sketch → wireframe → clickable mockups → live prototype; validate before coding.
      3. Generate UI from screenshots/Figma; paste annotated screenshots directly into the tool to target edits.
      4. Version safely: bookmark good states; fork after ~20–30 messages; “reroll” because outputs are probabilistic.
      5. Test in waves of 5–8 target users; synthesize, iterate.
  • Micro-surveys that unblock conversion (analytics tells “what,” one question finds “why”)

    • Why it matters: Tiny, targeted prompts can reveal friction that dashboards miss—and fix rates, costs, and signups fast.
    • How to apply (questions + fixes):
      • Pricing page: “What information is missing before you choose a plan?” → Clarify tier differences; +10.5% “pricing clarity” and higher conversion.
      • Signup page: “What almost stopped you from signing up today?” → Simplify password rules; explain “what happens next”; signups increased without more traffic.
      • Campaign landing page: “What are you looking for today?” → Align ad and page intent; conversions up, CAC down.
  • Paid pilots that actually convert (and when “free” is okay)

    • Why it matters: Pilots de‑risk enterprise sales; but free pilots create misaligned incentives.
    • How to apply:
      • Charge for pilots (credit toward the contract); run weekly stakeholder check‑ins to ensure usage and iterate quickly.
      • Cultivate a user “evangelist” to push buying decisions up the chain; don’t rely on owner-only sponsorship.
      • Use marquee logos carefully; avoid losing your shirt on “name-only” deals; recruit more clients in parallel.
      • Rule of thumb: require “skin in the game” unless the pilot is total automation with no user burden (then pure R&D can be defensible).
  • Design an AI SDR program (what to train, what to measure)

    • Why it matters: AI agents can scale outbound with far greater testing and signal-scraping than human SDRs—if you train on value, not past emails.
    • How to apply (framework):
      • Train on personas and value-add; don’t upload past “good” emails.
      • Let the agent scrape live signals (LinkedIn, site, news) to personalize.
      • Run multivariate sequences; many wins need 3 quality emails; expect ~1 month of training plus daily tuning.
      • Auto‑ingest and recrawl your content (site, RSS, YouTube) to stay current.
      • Track a simple run, e.g., 5,216 messages → 136 responses → 52 positive intents; 1,206 leads attributed in one campaign (funnel rates sketched below).
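A minimal sketch of tracking that run as a funnel; the counts are the digest’s example campaign numbers:

```python
# Outbound funnel rates from the example AI SDR campaign.
leads = 1206
messages = 5216
responses = 136
positive_intents = 52

print(f"response rate: {responses / messages:.1%} of messages")      # 2.6%
print(f"positive-intent rate: {positive_intents / messages:.1%}")    # 1.0%
print(f"responses -> positive: {positive_intents / responses:.0%}")  # 38%
print(f"messages per lead: {messages / leads:.1f}")                  # 4.3
```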
  • Agent design starter checklist (from research to guardrails)

    • Why it matters: Trust and safety must be designed up front in agent products.
    • How to apply:
      • Pick an interaction pattern (dashboard controller, delegation workspace, augmented canvas) that matches the workflow.
      • Define trust stages, success thresholds, and escalation triggers before code.
      • Express confidence and provide progressive, role-aware explanations; include counterfactuals and references.
      • Instrument feedback loops (implicit + explicit), visualize learning to users, and measure feedback-loop velocity.
      • Test behaviorally (routine, edge, adverse, regression) and monitor drift in production with human sampling.
  • High‑signal stakeholder communication (fast alignment, less thrash)

    • Why it matters: Repetition is alignment; format and framing speed decisions.
    • How to apply:
      • Prepare three decks (IT, business, exec) for the same initiative; adjust detail/emphasis. Publish a press‑release‑first draft to clarify narrative and scope.
      • Standardize templates; solicit feedback on formats; teach sales how to pitch (not the internals), and use visuals/diagrams liberally.
      • Guard your backlog: “no data = no place”; reserve ~10% every cycle for tech debt; time‑box heavy paydowns when needed (~80% for a quarter).
      • Own risk trade‑offs: choosing features over fixes means you answer for outcomes; don’t expect overnight heroics from engineering.

Case Studies & Lessons

  • Wise: shipping a mission, not slides

    • What happened: Wise anchors on “Money without borders—instant, convenient, transparent, and eventually free”; it prices at cost plus marginal market pricing with no hidden fees. It split international transfers into two local rails to achieve speed, moved ~£150B last year with 70% of same‑currency transfers instant, and its average price is ~1.57 and trending down.
    • Decisions & outcomes:
      • High‑amount volatility: partnered with treasury (hedging/safeguarding), learned the domain, adjusted guaranteed-rate behavior.
      • KYC backlog: paused service, went “side-by-side” with agents, did the job personally, then fixed the flow before reopening.
      • AI feature: invoice/screenshot → auto-populated recipient/transfer; live for 41/50 currencies with >95% quality.
    • Takeaways: Mission → pricing → architecture → ops → product; PMs own details and consequences (“never introduce risk without understanding its consequences”).
  • Fyxer: consumer-to-enterprise AI playbook at speed

    • What happened: Grew from $1M to $17M ARR in eight months (team ~40), then raised a $30M Series B.
    • Playbook:
      1. Dead-simple onboarding (presets; credit card to trial).
      2. Influencer volume: DM LinkedIn signups; ~20–30 signups/post; paid ~$1k/post when warranted.
      3. Performance marketing as a DTC brand: 200+ ads live, 100–150 new creatives per sprint; channel mix Meta/YouTube/Google. Performance marketing drives ~50% of new growth; 90% of payers stay 3+ months.
      4. Refine to business users: work-email signups are ~10× more valuable; optimize cost‑per‑work‑email.
      5. Continuous AI improvements + reactivation: founder‑style human emails perform best; re-open trials; acknowledge eval regressions.
    • Pricing: $30/$50 per user/month; avg ~$41; enterprises request usage‑based pricing—being explored.
  • Remedial Health (Nigeria): fix the whole chain, then localize and train

    • What happened: Addressed fragmentation, authenticity/fraud, logistics, and financing; sourced from manufacturers, built regional hubs, offered credit, and partnered with HMOs.
    • Adoption levers: localized the app (Hausa) to improve inclusivity—~32% adoption lift in ~6–7 months; invested in hands‑on training for lower digital‑literacy contexts.
    • Takeaways: Don’t digitize a broken system piecemeal; combine tech + ops + financing + partnerships.
  • Market entry (search): go vertical when infra is locked up

    • What happened: Enterprise search infra is dominated by incumbents with high switching costs; technical superiority often loses to distribution/relationships.
    • Takeaway: Pick a vertical where your tech creates an unfair advantage (e.g., fashion e‑commerce) and accept a smaller TAM for higher winnability.

Career Corner

  • Upgrade your AI PM skills (and avoid “hopium”)

    • Why it matters: Your org will likely demand an AI roadmap; most fail by “sprinkling AI,” tech‑first fantasies, or copy‑paste strategies.
    • How to apply: Follow Vision → Strategy → Roadmap; choose among 8 roadmap types/templates; use data to prove signal before scale.
  • Getting hired (positioning and pathways)

    • Recast founder work: it’s real PM experience—present outcomes, leadership, and metrics.
    • Reduce “new grad” bias: remove your graduation date; some recruiters default to early-career buckets.
    • Break in via PO/analyst roles: internal promotions beat resumes for the first PM job.
    • The market is tight: network and pursue referrals; blind-apply response rates can be <2%; internships build brand signal.
  • How PMs win day-to-day (communication and ownership)

    • Storytelling is the job: tailor the message per audience; visuals beat walls of text.
    • Facilitate decisions: PMs “do the unglamorous work” and enable the org to decide; don’t add processes that increase cognitive load.
  • Hiring signals from Wise

    • What they look for: problem-solving under complexity, strong UX (comprehension + emotion) for front‑end teams, and analytics literacy; CVs must show numbers and outcomes.
    • Entry paths: product academy and internal transfers from Ops/CS/Analytics/Engineering.
  • Mindset (ego off, receipts on)

    • Humility unlocks growth—and separates winners from ego preservation:

I regret to inform you that personal growth rarely comes from acquiring new knowledge and always from periods of intense humility (i.e. your ego finally relenting)

x.com

  • Critical thinking = radical honesty (with receipts).

Tools & Resources

  • AI roadmap toolkit and course

    • What: 6‑page “Do you need an AI roadmap?” assessment; 8 roadmap types/templates; Vision → Strategy → Roadmap process.
    • Why/use: Avoid hopium, pick the right roadmap type, and articulate value coherently. The AI PM certification (by Miqdad Jaffer, OpenAI) is recommended by Aakash and includes a cohort discount.
  • Prototype faster (and safer)

    • Dan Olsen’s free PRD template to scope effectively before generation.
    • Magic Patterns (screenshot → UI), Cursor (IDE copilot), and selective “build/chat/edit” modes to control changes.
    • Recruit testers quickly with userinterviews.com.
  • Micro-surveys you can ship this week

    • Three one‑question prompts and where to place them (pricing, signup, campaign LPs), plus examples of impact.
    • Heads‑up: Crazy Egg is rolling out a free way to run these in‑page surveys.
  • Communication staples

    • Amazon 6‑pager guide for deep decisions. Aha! market research primer (market landscape, competitors, customers).
    • Press‑release‑first as a shared narrative framework.
  • Lightweight growth infra

    • Getform for unlimited-subscriber email signup pages (free plan, double opt‑in included).
    • EmailDetective.io to validate emails in a hand‑rolled flow.
  • Free help with board decks

    • An experienced founder offering free support: build your deck, narrative, agenda, and blind‑spot review; raised $40M+ prior; DM to engage.

Appendix: Quick Reference Metrics

  • Wise: ~£150B moved last year; 70% of same‑currency transfers instant; avg price ~1.57.
  • Fyxer: $1M → $17M ARR in 8 months; performance marketing ≈50% of new growth; 90% of payers stay 3+ months. Work-email signups worth ~10× personal; cost‑per‑work‑email is the KPI.
  • Micro-surveys: +10.5% “pricing clarity” after copy fixes; signups improved after simplifying password rules and clarifying next steps.
  • AI SDR run: 5,216 messages → 136 responses → 52 positive intents; 1,206 leads attributed.
  • Remedial Health: ~32% adoption lift after Hausa localization (6–7 months).

Delight that Drives Revenue, Focus on the Core, Momentum-as-Moat, and AI Agents that Work
24 September 2025
9 minutes read
Teresa Torres · Lenny Rachitsky · SaaStr AI · +13 more
Actionable PM intelligence: delight that drives revenue, focusing the core, momentum-as-moat in consumer AI, buy-vs-build for agents, and portfolio thinking. Includes step-by-step tactics, case studies (Peloton, Shopify, SaaStr), career frameworks, and a vetted toolset.

Big Ideas

1) Product delight as a measurable growth engine

Why it matters: Emotionally connected users are more likely to buy, stay, and recommend; multiple studies show retention, revenue, and referral can double with emotional connection. AI is accelerating functional delivery, but not the emotional side—without intent, you risk “robotic” products.

How to apply:

  • Design on three pillars: remove friction, anticipate needs, exceed expectations (surprise + joy).
  • Segment by motivations (functional and emotional) to target why users come to you.
  • Allocate roadmap capacity using 50/40/10: ~50% core functional, 40% deep-delight (functional + emotional), 10% surface touches (see the sketch after this list).
  • Operationalize with a 4‑step loop: identify motivators → convert to opportunities → map on the Delight Grid (emotional vs functional) → validate using the Light Excellence checklist (inclusion, business alignment, measurement, harm avoidance).
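A minimal sketch of the 50/40/10 split applied to an assumed capacity budget; the 20 engineer-sprints are an invented example:

```python
# Roadmap capacity allocation per the 50/40/10 delight mix.
capacity = 20  # assumed engineer-sprints available this quarter
mix = {
    "core functional": 0.50,
    "deep delight (functional + emotional)": 0.40,
    "surface touches": 0.10,
}
for bucket, share in mix.items():
    print(f"{bucket}: {capacity * share:.0f} engineer-sprints")
```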

It's about the impact of emotional connections on product adoptions. And it's really funny because I read four of them and there've been a consensus, a very clear consensus. The consensus is that emotionally connected users are more likely to buy product from you, to stay longer with you and to recommend your products. So retention, revenue and referral are doubled with emotional connections. So, and these three metrics are key.

www.youtube.com

2) Focus the core—and break process deliberately

Why it matters: “The main thing is the main thing.” Many teams under‑invest in the core while chasing side bets; big gains often remain in the core if it’s resourced properly. At Peloton, a separate app (StrengthPlus) shipped faster by breaking process intentionally.

How to apply:

  • Resource allocation: explicitly balance innovation vs scaling vs mature products; check whether more growth remains in the core and fund accordingly.
  • Monthly “what to break” review: set a calendar reminder to ask where to move faster and what process to break, then pitch changes with clear business impact.
  • Guard against “beautiful orgs” that slow you down—re‑scope teams to the biggest opportunities.

3) In consumer AI, momentum beats moats

Why it matters: Models and infra shift weekly; old playbooks (paid, SEO, hacks) are less effective. For now, momentum is the moat.

How to apply:

  • Build fast learning loops. Run weekly micro‑surveys at points of friction—ask one real question and ship based on the answer—to reduce guessing and maintain momentum.
  • Normalize failure documentation; teams that write down failures compound learning faster.

4) AI agents: buy when you can, train every day

Why it matters: Most AI agent failures stem from poor training or no data ingestion. There is no “set and forget” today.

How to apply:

  • Buy > build for 99% of teams; build only where no off‑the‑shelf tool exists.
  • Vendor choice: prioritize onboarding and data compatibility over slick demos; bail if they can’t ingest your data or help you onboard.
  • Train for weeks; then monitor outputs daily and keep a human‑in‑the‑loop.
  • Expect mailbox warm‑up (2–3 weeks) for outbound agents.

5) Crypto as a product substrate (and AI enabler)

Why it matters: Chains are now cheap and fast; stablecoins enable global, programmable dollars. Entire payment/finance subsystems can be re‑written into compact smart contracts, opening 10× product opportunities.

How to apply:

  • Find 10× acceptance/lending/trading flows to “translate” into smart contracts (lower cost/complexity).
  • Case in point: Shopify’s acceptance logic (escrow, fees, refunds) was distilled to ~1,000 lines of smart‑contract code after nine months.
  • Design for multiple stablecoins and composability in payments and wallets.
  • Crypto can also verify AI outputs and serve as native rails for agent‑to‑agent transactions.

6) Manage bets like a portfolio

Why it matters: Evaluate success at the portfolio level, reward productive failure, and empower early‑career contributors. Excess conservatism yields too few advances per dollar.

How to apply:

  • Centralize consistent review; empower domain leaders to shape portfolios against strategy.
  • Incentivize early‑career pathways and mentorship in funding decisions.
  • Track outcomes at the portfolio level (health impact/translation), not just per‑project hit rates.

Tactical Playbook

A) Reactivate dormant trials before chasing cold leads

Why it matters: Reactivation is often cheaper and more effective than net‑new cold acquisition.

Step‑by‑step:

  1. Segment dormant trials/churned users and ask why they left; review usage patterns (a segmentation sketch follows this list).
  2. Send short, personalized emails/calls—no generic “come back” blasts.
  3. Fix root causes (bugs, missing features), and offer targeted incentives (free month, training, priority support).
  4. Run consistently and measure reactivation revenue uplift.
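A minimal sketch of step 1; the field names, dates, and 30‑day dormancy cutoff are illustrative assumptions:

```python
# Segment dormant trials vs. lapsed customers from a usage log before outreach.
from datetime import date, timedelta

users = [
    {"id": "u1", "plan": "trial", "last_active": date(2025, 7, 1),  "converted": False},
    {"id": "u2", "plan": "trial", "last_active": date(2025, 9, 20), "converted": False},
    {"id": "u3", "plan": "paid",  "last_active": date(2025, 2, 1),  "converted": True},
]

today = date(2025, 9, 24)
DORMANT_AFTER = timedelta(days=30)

dormant_trials = [
    u for u in users
    if u["plan"] == "trial" and not u["converted"]
    and today - u["last_active"] > DORMANT_AFTER
]
lapsed_customers = [
    u for u in users
    if u["converted"] and today - u["last_active"] > DORMANT_AFTER
]

# Each segment gets a short, personalized note and a "why did you leave?"
# question -- not a generic blast.
print([u["id"] for u in dormant_trials], [u["id"] for u in lapsed_customers])
```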

B) Communicating tech‑debt work in business terms

Why it matters: Framing debt as “cleanup” gets it deprioritized; leaders respond to business impact.

Step‑by‑step:

  1. Quantify the chain: debt → bugs → complaints → churn; translate it into dollars.
  2. Get sales to publicly commit a revenue impact for the competing initiative (e.g., the API).
  3. If a pivot is mandated, be upfront with engineering, share the “why,” and secure a hard start date for the debt sprint.

C) Sprint planning and estimation that scales

Why it matters: PMs maximize value; engineering estimates effort.

Step‑by‑step:

  1. Pre‑plan with the tech lead: clarify goals, walk tickets, negotiate scope.
  2. Write stories with acceptance criteria and a feature‑level flow; refine with the whole team.
  3. Size work (t‑shirts/points) with engineering; slice vertically so each ticket is independently deployable.
  4. Pull parallel, non‑blocking stories up to velocity; move the lowest‑value items to the next sprint.

D) ICP = Pain + Buying Power + Urgency; in mature markets, distribution wins

Why it matters: Broad targeting wastes cycles; distribution often beats product in established spaces.

Step‑by‑step:

  1. Define a narrow ICP; validate who actually converts (update as reality diverges).
  2. Prioritize segments actively seeking alternatives now (urgency).
  3. Dominate one persona before expanding; better to own 80% of a small market than 2% of a big one.
  4. Build distribution advantages early (channels, partnerships, repeatable reach).

E) Launching/steering social or community products

Why it matters: Trust is fragile; early community dynamics determine survival.

Step‑by‑step:

  • Avoid fake content; users detect it and trust collapses.
  • Start in a niche with real engagement pain; partner with micro‑influencers; use invite‑only access + founder attention to create FOMO and evangelists.
  • Seed transparently (cross‑posting), host expert AMAs, and be patient; premature “growth hacks” kill products.

F) Handle PII from day one

Why it matters: Costs and risks balloon if deferred; small issues scale.

Checklist:

  • Budget real compliance costs (example: ~$68k initial audit; ~+$25k for SOC 2) and ongoing audits.
  • Define breach comms, RTO targets, and cadence up front.
  • Expect added complexity across regions (e.g., GDPR).

G) AI evals: know if your AI actually works

Why it matters: Prompts/orchestration aren’t enough; you need continuous, reliable evaluation.

How to apply:

  • Use golden datasets, synthetic data, and real‑world traces; identify error modes and turn them into evals.
  • Choose code‑based evals vs LLM‑as‑judge based on context; maintain evals to prevent criteria drift; align with guardrails + human oversight.

H) Make delight practical (and safe)

Why it matters: Deep delight drives outcomes—and misfires can harm users and brand.

How to apply:

  • Use the 4‑step delight process and the 50/40/10 mix.
  • Review anti‑delight risks (e.g., deceptive notifications, inappropriate reactions) in your checklist before launch.

Case Studies & Lessons

  • Peloton: When the core app became overwhelming, a new app (StrengthPlus) with a fresh codebase shipped and scaled faster—because the team broke process to move quicker. Takeaway: protect core focus and be willing to isolate innovation paths.

  • SaaStr’s 20 AI agents (9 core):

    • Outbound SDR agent sent ~15,000 hyper‑personalized messages in ~100 days; expect 2–3 weeks of mailbox warm‑up and design 3‑email + 1‑LinkedIn sequences.
    • Inbound “Amelia AI” books meetings automatically; deep Salesforce/Marketo integration boosts buyer intelligence and conversion.
    • CRM hygiene automated: call summaries, attaching contacts, stage moves, weekly activity reports.
    • Lessons: train/monitor daily; start with smaller lists and spot‑check; vendor onboarding quality matters.
  • Shopify x Base: Decades of merchant acceptance logic (escrow, fees, refunds) re‑implemented as ~1,000 lines of smart contracts—an order‑of‑magnitude simplification. Takeaway: seek “translation” opportunities where programmable platforms collapse complexity.

  • Delight—wins and hazards:

    • Removing friction: Uber’s two‑click refund lowered stress and increased trust.
    • Anticipating needs: Revolut’s eSIM for travelers—surprise utility in the moment.
    • Celebrating earned achievement: Airbnb’s confetti on Superhost renewal.
    • Anti‑delight: deceptive “missed call” promos or auto‑celebrations in sensitive contexts create harm; run inclusion/harm reviews.
  • Fundraising narrative: Avoid “shiny object syndrome.” Pick your strongest product, tell one story, and present others only as experiments—especially for tier‑1 investors. A focused pitch avoids looking uncommitted.

  • Users ≠ customers: Sttabot’s reported metrics—20K+ visitors, ~8.3K paid, ~1K LTDs—underline why to separate users from payers and avoid misleading metrics framing.

Career Corner

  • Master the 15 prototyping skills (Apprentice → Journeyman → Master): prompting/editing/tools, versioning/debugging/customer validation, and ultimately technical editing/functional prototyping/hand‑offs/product shaping. Use this to self‑assess and plan growth.

  • Interview prep that maps to reality:

    • Behavioral interviews dominate (~85%)—build tight narratives and frameworks.
    • Be data‑driven: when asked “DAU dropped 15%,” don’t guess—diagnose with data before prescribing fixes.
    • Use a structured roadmap: estimation frameworks, product sense/design/strategy/metrics/system design; tailor for roles at Google, Meta, OpenAI, etc.
  • Habits that differentiate:

    • Weekly micro‑survey rhythm at friction points; ship based on answers.
    • Document failures to compound learning.
  • Managing energy and fit:

    • If motivation dips: set hard boundaries and prioritize ruthlessly; try a new org before leaving the craft; create small “joy” projects.
    • PM is a leadership role—take the high road, share credit, and shape perception through consistent, outcome‑based communication.
  • If you’re non‑technical: get help hiring/managing engineers via an advisor or fractional CTO. (Counterpoint: some advocate a technical co‑founder—pick the model that fits your context.)

Tools & Resources

  • Magic Patterns (AI prototyping for PMs/designers): generate front‑end prototypes with your component library; ideal for fast concept validation and hand‑off (workflow: PRD → prototype → share with engineering).

  • Gamma (AI docs/decks): turn meeting notes and PRDs into polished decks; create custom prospect decks in ~10 minutes; track views for internal sharing.

  • Descript (AI video/audio): edit like a doc to produce launch videos and design walkthroughs without pro editing skills.

  • Mobbin (design reference): benchmark flows, import best‑in‑class patterns to unblock design decisions.

  • Replit + Warp (build and automate faster): vibe‑code production‑ready apps with Agent 3 (200+ minutes of autonomous build/test/fix), and use an AI terminal to handle complex tasks via plain language.

  • Linear (tasks/roadmaps): modern issue and roadmap tool with agent integrations for first‑pass ticket work.

  • Wispr Flow (voice transcription): fast, context‑aware dictation that learns your acronyms across devices.

  • Cursor + GPT‑5‑Codex: coding‑optimized model now in Cursor—useful even for PMs as a power tool.

  • AI evals primer (Teresa Torres): practical overview of eval methods (golden/synthetic/real‑world traces, code‑based vs LLM‑judge) and continuous maintenance.

  • Lenny’s Product Pass (curated bundle): hands‑on guides and access to tools across building, collaborating, making it beautiful, and getting more done.

“Product management isn’t running a stopwatch against a scrum team. It’s making sure we’re not sprinting full speed off a cliff building the wrong thing.”

PM Playbook: Contextual Discovery, Platform Metrics, AI Security, and Vibe Coding
23 September 2025
9 minutes read
Exponent · Product Growth · Lenny Rachitsky · +10 more
Actionable PM intelligence: contextual discovery over dashboards, platform metrics that matter, AI security as a PM skill, the rise of vibe coding, and a field-tested playbook for prioritization, stakeholder management, and AI product QA—plus real case studies, career moves, and tools.

Big Ideas

  • Contextual discovery beats dashboards

    • Why it matters: Analytics tell you what happened, not why. The fastest way to uncover causality is to ask one focused question at the exact moment of friction (pricing page, signup hesitation, cart abandon). Teams have reported learning more in two days of in‑context questions than from months of reports, and a single pricing‑page fix drove a ~30% conversion lift.
    • How to apply: Place micro‑surveys on key moments (“What’s missing before you pick a plan?” “What almost stopped you from joining?”). Keep it to one question, deliver answers instantly, and iterate on the top patterns.
  • Platform products: measure differently, invest deliberately

    • Why it matters: Not every investment shows immediate ROI, and platform metrics (reliability, performance, security, velocity, costs) are one step removed from topline KPIs. Big “behind‑the‑scenes” bets can be your biggest levers for pricing, profitability, ecosystem moats, and velocity .
    • How to apply: Agree on outcome hypotheses early, use leading indicators for long‑horizon bets, and share metric ownership across R&D and GTM so platform value connects to business results .
  • AI security is a core PM competency

    • Why it matters: Attacks have shifted to users and identity; >80% of breaches now target identity. AI makes it trivial to clone login pages and voices, and to scan APIs for vulnerabilities .
    • How to apply: “Red team” your own product with AI before launch; default to passkeys over passwords, set carrier PINs, and freeze credit files. Use AI to accelerate the work (80/20), not replace judgment, and only where it’s warranted—security incident response needs deterministic steps .
  • Design for “weightlessness”

    • Why it matters: Products feel magical when they reduce cognitive/physical load and disappear—there when needed, gone when not. That usually means faster perceived responsiveness (e.g., time‑to‑first‑token) and fewer demands on attention .
    • How to apply: Prioritize first‑token latency and remove micro‑frictions that make the tool noticeable. Use before/after moments to validate that the product “feels lighter” to users .
  • Data‑driven = progressive de‑risking

    • Why it matters: You rarely get perfect, statistically significant data on schedule. Qualitative signals still count; the goal is to size the first step to your confidence and iterate .
    • How to apply: Start with lower‑risk experiments informed by imperfect data, then keep evaluating signals (qual + quant) as you scale up .
  • Momentum beats vision

    • Why it matters: A half‑working thing in real users’ hands yields better learning than a perfect plan .
    • How to apply: Ship a thin, working slice, observe behavior (micro‑surveys + session recordings), and iterate quickly .
  • From no‑code to “vibe coding”

    • Why it matters: The toolchain is shifting from classic no‑code (Airtable, Retool) to new AI‑assisted builders (v0, Lovable, Bolt, Replit), changing how quickly teams can prototype and ship .
    • How to apply: Audit your build stack against these tools and skim v0’s State of Vibe Coding to spot opportunities to speed up concept‑to‑test cycles .

Tactical Playbook

  • Discovery: ask one question at the point of pain

    • Steps: Identify 3 high‑leak moments (pricing, signup, checkout) → trigger a single open‑ended prompt (“What almost stopped you from signing up?”) → ship fixes on the top two themes → re‑ask to confirm lift .
    • Why: You’ll uncover the “why” that analytics miss, often revealing high‑ROI copy, packaging, or flow fixes .
  • Validate before building: fail cheap

    • Steps: Put up a landing page with a clear offer and a “Buy Now”/waitlist to test demand; preorders are a legitimate validation tactic. Talk to customers first and build as late as possible .
    • Why: Small, skewed surveys aren’t enough; real intent signals de‑risk scope and recruiting claims .
  • Reduce funnel friction

    • Steps: Audit steps between value and payment; remove any post‑signup re‑checks; run session recordings to observe unexpected drop‑offs .
    • Why: Every extra step between value and payment kills conversion .
  • Backlog to decisions: a simple operating system

    • Steps: Centralize feedback (Jira Product Discovery/Asana), bucket by theme, attach customer metadata, and tag by Strategic Theme, Impact, Effort. Use a Now/Next/Later roadmap and close the loop on “no/later.” Consider a monthly “Feature Shark Tank” with Sales/Field to pitch top items .
    • Why: Keeps the team aligned to business outcomes vs. an ever‑growing black hole of requests .
  • Prioritize by the primary KPI, then dedupe with AI

    • Steps: Pick one primary KPI (e.g., churn, acquisition, reviews), score items with RICE/weighted impact, and use AI clustering to dedupe 100s of requests (see the RICE sketch after this playbook) .
    • Why: Focus on what moves the metric; reduce noise and rework .
  • Customer‑voice portals with purchase intent

    • Steps: Let customers (and Sales/CS by proxy) vote on ideas, but filter for buyers—remove voters who won’t buy. Track who requested what to enable targeted follow‑ups .
    • Why: Prioritizes signals from accounts that translate to revenue .
  • Stakeholder management: say no with data and clarity

    • Steps: Tie requests to user/business impact; tactfully explain trade‑offs with data; avoid glorifying heroics and micromanagement tendencies .
    • Why: Prevents impossible missions and builds trust; focus on outcomes over activity .
  • Tech debt vs. sales asks: a 5‑step playbook

    • Steps: 1) Quantify the metrics moved by the sales request vs. the tech debt (CSAT, retention, delivery speed); 2) check the pipeline for committed partners; 3) frame debt as business value (speed, fewer outages) and propose iterative slices; 4) communicate the change as a reprioritization, with rationale; 5) align EM and engineering leadership goals to carry the debt agenda .
    • Why: Balances near‑term revenue with long‑term velocity and reliability without breaking trust .
  • AI product QA: observability → evals → monitoring

    • Steps: 1) Manage prompts like code (version, params, replay, A/B); 2) log LLM requests and multi‑step traces via SDK/proxy/OTEL; 3) evaluate quality with code‑checks, LLM‑as‑judge, and human labels, sampling live traffic; 4) build datasets and run continuous monitoring to catch drift (see the logging sketch after this playbook) .
    • Why: Converts LLM behavior into measurable, improvable performance instead of hopeful release cycles .
  • AI UX: optimize time‑to‑first‑token

    • Steps: Track time‑to‑first‑token (TTFT) separately from total latency; design the interaction to show useful partial output early; consider infrastructure patterns that improve perceived speed (see the TTFT sketch after this playbook) .
    • Why: Users prefer faster first token even if total response is slightly longer .
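
For the “prioritize by the primary KPI” item above, here is a minimal sketch of the RICE arithmetic (reach × impact × confidence ÷ effort). The request list and field values are illustrative; the AI‑clustering dedupe step is out of scope here.

```python
# RICE = (Reach * Impact * Confidence) / Effort, scored against one primary KPI.
def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """Reach: users/quarter; Impact: 0.25-3 scale; Confidence: 0-1; Effort: person-months."""
    return (reach * impact * confidence) / effort

# Illustrative requests; in practice these come from your deduped feedback corpus.
requests = [
    {"name": "Fix churn-prone onboarding step", "reach": 4000, "impact": 2.0, "confidence": 0.8, "effort": 2},
    {"name": "New export format", "reach": 500, "impact": 1.0, "confidence": 0.5, "effort": 3},
]

for r in sorted(requests, key=lambda x: rice_score(x["reach"], x["impact"], x["confidence"], x["effort"]), reverse=True):
    score = rice_score(r["reach"], r["impact"], r["confidence"], r["effort"])
    print(f'{r["name"]}: {score:,.0f}')
```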
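
For the AI product QA item above, a minimal sketch of steps 1–2: prompts versioned like code, and every call logged with enough context to replay or A/B it later. `call_llm`, the prompt registry, and the log schema are assumptions, not any specific vendor’s API.

```python
import json, time, uuid

# Step 1: prompts live in a versioned registry, like code.
PROMPTS = {
    "summarize_v2": {"template": "Summarize for a PM audience:\n{text}", "params": {"temperature": 0.2}},
}

def call_llm(prompt: str, **params) -> str:
    return "stub response"  # placeholder for your actual model client

# Step 2: log prompt, params, and output so any call can be replayed or A/B'd.
def logged_call(prompt_id: str, log_path: str = "llm_log.jsonl", **variables) -> str:
    spec = PROMPTS[prompt_id]
    prompt = spec["template"].format(**variables)
    output = call_llm(prompt, **spec["params"])
    record = {
        "trace_id": str(uuid.uuid4()), "ts": time.time(),
        "prompt_id": prompt_id, "prompt": prompt,
        "params": spec["params"], "output": output,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return output

print(logged_call("summarize_v2", text="Churn fell 2% after the onboarding fix."))
```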
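
And for the AI UX item, a small sketch of tracking time‑to‑first‑token separately from total latency; `stream_tokens` stands in for whatever streaming API you use.

```python
import time

def stream_tokens():
    # Stand-in for a streaming model response.
    for tok in ["Here", " are", " your", " results."]:
        time.sleep(0.1)
        yield tok

start = time.monotonic()
ttft = None
chunks = []
for tok in stream_tokens():
    if ttft is None:
        ttft = time.monotonic() - start  # TTFT: the latency users actually feel
    chunks.append(tok)
total = time.monotonic() - start

print(f"TTFT: {ttft:.2f}s | total: {total:.2f}s | output: {''.join(chunks)}")
```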

Case Studies & Lessons

  • Quiz app concepts: features don’t fix retention

    • What happened: Commenters highlighted that “fun quizzes” without a clear outcome suffer churn, the space is crowded (Duolingo/Quizlet/Anki/Kahoot), and B2C is saturated and hard .
    • What worked: Aim at a concrete outcome (e.g., exam prep), partner with institutions (mandated use drives retention), and consider freemium for learners while monetizing B2B buyers (HR/schools) despite slower sales cycles .
    • Pricing insight: Consumers are more open to a one‑time ~$2.99 than $10/month for general learning; in‑game currency without real‑world value was rejected .
    • Gaps/opportunities: Accessibility and cultural relevance are under‑served niches .
    • Takeaway: Pick a niche with a clear outcome and institutional distribution; don’t assume gamification will carry usage .
  • Micro‑surveys in the wild: a 30% conversion lift

    • What happened: Teams deployed a single onsite question at the moment of hesitation (e.g., pricing), found that a confusing headline was blocking purchases, and saw ~30% conversion lift after fixing it .
    • Takeaway: Ask one question at the right time—in context—and act on patterns quickly .
  • Bootstrapping lessons: don’t over‑hire or over‑spend on ads

    • What happened: Early cheap FB mobile ads drove the first 50k users for one team, but another team became over‑dependent on paid acquisition and early full‑time hiring, which increased burn and complexity .
    • Takeaway: Maintain revenue streams before cutting others, invest in compounding channels (content/SEO/partnerships), and operate lean (AI tools + contractors) .
  • AI infra strategy: ship fast, manage blast radius (NVIDIA/Oracle)

    • What happened: NVIDIA cut features to ship faster (e.g., adding tensor cores to Volta just months before tape‑out) and cultivated first‑time‑right silicon to accelerate ramps; modeling data‑center MW → chips → rental revenue enabled forecasting (e.g., Oracle/OpenAI) .
    • Operational reality: Reliability challenges in large coherent GPU domains increase failure blast radius; realized TCO depends on uptime and scheduling, not just perf/$ .
    • Takeaway: Move quickly where possible, but design for operational failure modes and end‑to‑end economics.
  • System design patterns for marketplaces (Uber Eats interview)

    • Highlights: Start simple (DB search), then add Elasticsearch for low‑latency reads; ensure payment idempotency (client‑generated keys) and use synchronous flows; keep one driver per order; stream driver location with thresholds; geo‑shard; scale as independent services .
    • Takeaway: Validate that the design meets each functional/non‑functional requirement, and articulate trade‑offs as you scale .
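
A minimal sketch of the client‑generated idempotency keys mentioned in the highlights above: the client creates one key per logical charge and reuses it on retries, so a network retry can never double‑charge. The in‑memory dict stands in for a database with a unique‑key constraint.

```python
import uuid

_processed: dict[str, dict] = {}  # idempotency_key -> stored result

def charge(idempotency_key: str, order_id: str, amount_cents: int) -> dict:
    if idempotency_key in _processed:
        return _processed[idempotency_key]  # retry: return the original result, never charge twice
    result = {"order_id": order_id, "amount_cents": amount_cents, "status": "charged"}
    _processed[idempotency_key] = result
    return result

key = str(uuid.uuid4())                 # generated client-side, reused across retries
first = charge(key, "order-42", 1999)
retry = charge(key, "order-42", 1999)   # simulated network retry with the same key
assert first is retry
```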

Career Corner

  • Redefine “finished” for PMs

    • PM deliverables are clarity, alignment, and direction; celebrate small wins (alignment on a hard decision, unblocking the team) and validate impact by demoing shipped work .
  • Have a point of view (and conviction)

    • New/mid managers: check your belief in the direction; acting as a “soldier” without conviction undermines execution and inspiration .
  • Influence > authority—own the outcomes

    • Great PMs act as the business voice to engineering and the technical voice to the business, optimizing for value delivered (not just shipped). Strategic PMs function as “mini‑CEOs,” accountable for profitability, fit, and adoption .
  • Move from QA/IC roles into PM

    • Pathways: QA builds ops rigor, product knowledge, and analytical thinking—strong entry points—but learn discovery and strategy. Multiple practitioners report higher salary trajectories after moving to PM; try the role if you can .
  • Guardrails for healthy teams

    • Avoid glorifying heroics and late‑night “saves”; don’t lead by metrics alone—pair quant with qualitative insight .
  • If you’re stuck doing engineering

    • If orgs keep pulling you back to build instead of PM’ing, it may block your growth—seek a role with clear PM expectations .

“we get paid to create and execute ideas that matter”

Momentum beats vision. A half-working thing in people’s hands is worth more than a perfect plan in your head.

x.com

Tools & Resources

  • AI Observability & Evals (hands‑on guide)

    • What you’ll get: A stacked approach—prompt management, observability (requests/traces), evals (code/LLM/human), datasets, and monitoring—with a 30‑minute chatbot example .
  • AI security masterclass (Okta’s VP of PM & AI)

    • Covers: Cloned login pages/voices, API attack surface, and practical protections (passkeys, phone PINs, credit freezes); includes tactics to use AI to attack your app before attackers do .
  • PRD generator prompt (free)

    • Why useful: A tested prompt that consistently produces high‑quality PRDs using ChatGPT; no paid tools required .
  • JIRA Dashboard how‑to

    • Resource: Configure widgets from the All Work view (video walkthrough) .
  • Trending AI feature breakdowns (for PM planning)

    • Why useful: Shows how complex AI features were actually built, why certain technical decisions were made, and how they affected scalability/UX—each with implementation paths and pitfalls .
  • State of Vibe Coding

    • Resource: Overview of the shift from no‑code to new AI‑assisted builders; scan for faster MVP options .
  • Lenny’s Product Pass

    • What it is: A bundle of premium tools (e.g., Linear, Lovable, Replit, n8n, Perplexity) included with an annual subscription; codes are limited/first‑come .

Quick Metrics & Signals to Watch

  • 3:1 maintenance vs. creative work; 83% do most important work after hours; 93% cite collaboration issues; 76% believe AI can help—use “say no,” timeboxing, and AI‑in‑tools to reclaim focus .
  • Retention is a lagging indicator of whether your team is honest about what’s working—watch it closely .

“If you want to know why people leave, or what’s stopping them from buying, you still need to ask the right question in the right moment.”

PM in the AI era: Builder teams, data-as-diagnosis, and shipping what matters
22 September 2025
9 minutes read
Lenny's Newsletter Lenny's Newsletter
Julie Zhuo Julie Zhuo
Lenny's Podcast Lenny's Podcast
+9
Frameworks, tactics, and resources for AI-enabled product teams: manage AI like people, instrument conversational products, balance growth vs infrastructure, fix MVP misconceptions, and explore SMS AI. Includes step-by-step discovery, stakeholder alignment, GTM patterns with measured impact, and career moves like AI PM portfolios.

Big Ideas

  • Managing AI is converging with managing people — and teams are becoming “builders”

Hot take: Managing AI well is extremely similar to managing people well.

x.com

Why it matters: Clear goals, purpose, and process are now required not just for people but for agentic systems; traditional role boundaries (PM/design/eng) are dissolving as AI lets individuals cover more of the stack . Julie Zhuo argues these same manager skills will be essential as “everyone’s about to become a manager—of AI” .

How to apply:

  • Define success upfront (write objective evals/tests agents and teams can use) .

  • Decide which skills must live on the team; supplement selectively when a critical skill is missing .

  • Start small, multi‑disciplinary squads that own outcomes and use AI to fill gaps .

  • Diagnose with data; treat with design

    • Why it matters: Data shows where the problem/opportunity is; creative design/experimentation solves it. This prevents “data tells us what to build” traps . For conversational/agentic products, instrument intent and flow quality (not just clicks) and use evals to define success .
    • How to apply:

    • Instrument conversational intents and “flow health” early .
    • Pair quantitative diagnostics with targeted design experiments; keep evals crisp so agents/teams know success criteria .
  • Build a Minimum Viable Infrastructure (MVI) before you go hard on growth

    • Why it matters: It’s harder to win back users after bad experiences than to delay for foundational readiness . Keep deploying for traction, but intentionally choose between blitzscaling (when competition/timing requires) and fast scaling (minimize risk while moving quickly) .
    • How to apply:

    • Define your MVI—what’s the least infra needed to reliably serve customers—then invest to that bar before stepping on the gas .
  • MVP ≠ minimum thought

    • Why it matters: MVPs fail when teams ship crippleware or pile on features by request to justify sunk costs; viability depends on deep customer understanding and focus on the core pain . Lean Startup works when applied as intended—not as a pretext for “build it and they will come” .
    • How to apply:

    • Validate the pain and “must‑have” scope; cut distracting features .
    • Avoid “feature-by-request” accounting; treat requests as hypotheses, not pending customers .
  • SMS + AI is likely the next billion‑user form factor

    • Why it matters: Text is ubiquitous; shipping AI via SMS can drive distribution fast . Example: Lennybot is trained on owned content and is accessible via SMS and web .
    • How to apply:

    • Start with your owned corpus for training .
    • Offer SMS as a primary interface (plus web) to meet users where they are .
  • Investors are risk managers; show proof points that de‑risk your story

    • Why it matters: Angels and VCs evaluate raise sufficiency, team strength, traction, market access, customer validation, scalability, execution, and adaptability . Team quality ranks at the top of many scorecards .
    • How to apply:

    • Package evidence across those markers before fundraise discussions .

Tactical Playbook

  • Continuous discovery that uncovers real problems

    • Why it matters: Most PMs either don’t talk to customers or fall into confirmation bias .
    • How to apply:

    • Talk to 3–5 customers weekly; use 1–2 open‑ended recall prompts to surface pain (e.g., “Walk me through your busiest day…”) .
    • Ground questions in real behavior (see The Mom Test) .
    • Combine quant and qual; use surveys for breadth, interviews/usability tests for depth; Maze can help with early concept tests .
  • Turn conversations into insights and action

    • Why it matters: Insights come from organized data, pattern‑finding, and informed judgment—not tools alone .
    • How to apply:

    • Check readiness: Do you have enough data? Is it organized? Is the right person analyzing it?
    • Establish process before tooling; add tools when the process doesn’t scale .
    • Use ChatGPT to consolidate context and auto‑generate artifacts (prioritized shaping queues, WIP queues); when context limits hit, move source data into Linear and use MCP to integrate workflows .
    • Keep a living knowledge base; create/refresh docs on a cadence so teams can self‑serve (e.g., Helpjuice) .
  • Stakeholder alignment and launch comms that stick

    • Why it matters: Misalignment creates silos that are costly to unwind .
    • How to apply:

    • Run an early joint meeting to preview direction, solicit desired outcomes, and incorporate feasible input; repeat with an almost‑baked plan before execution .
    • Over‑communicate asynchronously (Slack/email/shared doc). Keep a traceable log you can reference if pushback arises .
  • GTM outreach that earns attention (and conversions)

    • Why it matters: Buyers respond to relevance and value; covert tactics erode trust .
    • How to apply:

    • Lead with the prospect’s challenges (consultative first contact); don’t pitch yourself out of the gate .
    • Give before you get: events, webinars with ICP‑relevant guests, useful assets/lead magnets .
    • For cold outbound, be direct, short, and diagnostic (“Is X a pain?”). Warm intros and engaging less‑solicited teammates can open doors; avoid deceptive “partnership” pretexts .
    • Borrow what works in ads: lead with pain, then your obvious fix; evoke emotion, call out competitors, and create FOMO where appropriate. One practitioner increased CTR from 0.8% to ~2.3% by adopting these patterns .
    • Early consumer apps (<$1k–5k MRR): use a hybrid of promoting high‑performing posts plus organic content (e.g., multiple TikTok accounts) rather than relying solely on paid .
  • Early‑stage build: when “vibe coding” is a smart bet

    • Why it matters: Cash is usually the binding constraint; ship revenue‑driving features fast, then harden .
    • How to apply:

    • At ~€750 MRR you’re at an inflection point; prioritize features that move acquisition/retention and aim for €2–3K MRR to afford proper dev help .
    • Use quick, minimal implementations when they let you ship customer‑requested value in days and demonstrably grow subscriptions; avoid complexity that doesn’t impact retention/acquisition .
    • Note the trade‑offs: shortcuts can create scaling headaches; acceptable if the app is simple with limited long‑term scope .
  • Prioritization from user workflows (evidence you can show)

    • Why it matters: A bottom‑up list of real workflows beats opinion battles.
    • How to apply:

    • Interview 10 actual users (not managers); capture their top workflows, then ask a larger group to rank features/tasks to create an evidence‑backed plan .
  • First 30–60 days in a new PM role

    • Why it matters: Early momentum compounds.
    • How to apply:

    • Listen, build relationships, and dive into the product before pushing work .
    • Learn customers’ “why,” the product itself, and the commercials (model/finance/sales/marketing); understand how decisions are made and prioritization works .
    • Structured ramp: Weeks 1–2 shadow CS and attend Eng standups; Weeks 3–4 rotate through Sales/CS and stress‑test staging; then set focus based on findings .
    • Go for quick wins in your first 90 days; The First 90 Days is widely recommended .
  • Measure what matters

    • Why it matters: Shared understanding of metrics accelerates decisions.
    • How to apply:

    • Learn definitions, data access paths, and fields/filters; build your own dashboards to move fast .
    • For conversational products, update your analytics to bucket user intents and assess conversation flow quality .
  • Ship release assets faster

    • Why it matters: Turning features into docs/changelogs/updates can eat half a day; automation can reclaim time.
    • How to apply:

    • Draft release assets from a walkthrough video (e.g., cliptokit.com); one founder reports reclaiming a few hours per week .

Case Studies & Lessons

  • LinkedIn ad patterns that lifted CTR ~3x

    • What happened: A practitioner saved 138 high‑performing LinkedIn ads, analyzed the patterns, and improved CTR from ~0.8% to ~2.3% after adopting the approaches .
    • What worked: Lead with pain, then make your solution the obvious fix; use emotion, competitor callouts, and FOMO judiciously .
    • Resource: The shared collection link .

  • Availability‑first travel planning

    • Problem: Curated itineraries often ignore real availability (e.g., fully booked venues, appointment‑only tastings) .
    • Product lesson: Show “what I can do” (filtered by real‑time availability) in addition to “what I should do,” and handle waitlists/ticketing automatically .

  • Don’t sell ideas; demonstrate differentiation with a working product

    • Context: Founders pitching flexible API rate limiting assert differentiation vs NGINX/gateways/CDNs through per‑user buckets and instant VIP overrides without redeploys .
    • Lesson: Buyers don’t buy ideas; reach out with a product teams can try/buy. In crowded categories you must demonstrate something technically astonishing to displace the status quo .

  • Fundraising dynamics to internalize

    • Risk lens: Angels/VCs manage risk across team, traction, access, scalability, execution, and adaptability; more proof points → lower perceived risk .
    • Signals: Team tops many investor scorecards . Advisors who won’t invest may be a red flag; some investors view pay‑to‑apply models as ecosystem‑degrading and poor deal‑flow filters .

Career Corner

  • Create an AI PM portfolio (fast) — and publish it where it converts

    • Why it matters: Only 18% of PMs have a portfolio, so it differentiates you from 4/5 candidates; inbound LinkedIn drove 76/124 offers for one mentor’s mentees .
    • How to apply:

    • Vibe‑code a portfolio in ~30 minutes; publish on the open web, atop your resume, and in LinkedIn’s Featured section (many don’t use it) .
    • Track profile visits and visit→message conversion .
    • Case: A mentee vibe‑coded a portfolio and landed an AI PM role at Meta (TC: $675K) .
    • Guide: The AI PM Portfolio (how to vibe code; 30‑min/8‑hour/40‑hour templates; common mistakes) .
  • Breaking into PM / leveling up

    • Classic transitions: Analyst → Product Owner → PM; QA to PM is possible—coding skills help .
    • Role flexibility: PM/APM can be tailored; discuss scope with your manager when pivoting from design .
    • Resume titles: One commenter notes you’re under “basically no obligation” to list exact historical titles; an example retroactively aligned titles to Product Marketing Manager .
  • Mindsets that sustain you

    • Don’t expect the journey to feel “worth it” every day; push through the bad days and judge progress “on balance” .
    • Leadership is hard; every problem lands on your desk, and optimism still matters .
    • Beware “focus inflation”: calling a Zoom day “locked in” lowers the bar for real deep work .

Tools & Resources

  • Lenny’s Podcast: Julie Zhuo on managing AI like people — watch for:

    • Why AI will make everyone a manager (08:41) .
    • “Diagnose with data, treat with design” (32:02) .
    • Dissolving role boundaries → “builders” (11:38) .
    • Feedback that lands (script at 57:49) .
  • Masters of Scale: Growth vs infrastructure — when to blitz vs fast‑scale; why MVI matters .

  • Vibe‑coding your PM portfolio — full guide with templates .

  • Discovery and research

    • Maze for early concept testing; combine surveys, interviews, and iterative testing .
    • Insightlab to synthesize interviews over time .
    • Public backlog options (Trello/Jira) vs. private steering groups under NDA—choose based on expectation‑setting/competitor risk .
  • Team knowledge and planning

    • ChatGPT for shaping queues, WIP agendas, and learning curricula; move source data into Linear and use MCP when context limits apply .
    • Helpjuice (or similar) for an up‑to‑date knowledge base .
  • Comms and release ops

    • cliptokit.com drafts release docs/changelogs/Slack updates from a walkthrough video; reported to save a few hours/week .
  • SMS AI form factor

    • Lennybot is trained on newsletter/podcast content; access via SMS +1 (877) 537‑9455 and web (lennybot.com) as a concrete example of SMS AI distribution .

“Team will always be at the top of investors’ scorecards.”

“There was a time when being locked in meant you were deep in the work… We’ve lowered the bar on what focus actually is.”

PM in the AI era: strategy-as-context, evals, and execution that ships
21 September 2025
9 minutes read
Productify by Bandan Productify by Bandan
Masters of Scale Masters of Scale
Lenny Rachitsky Lenny Rachitsky
+6
What matters now for PMs: make strategy machine‑readable, master AI evals, ship faster pre‑MVP, and navigate hiring constraints. Practical playbooks, real case studies, and job‑market tactics—plus curated tools to upskill in AI PM.

Big Ideas

1) Strategy as a context layer for AI

  • Why it matters: AI accelerates execution but has no inherent sense of direction; without feeding strategy into AI, teams drift—shipping technically valid work that doesn’t advance the roadmap .
  • How to apply:
    • Codify strategy in machine‑friendly structure (problem, audience, differentiation, value) so AI tools can consume it .
    • Distribute a reusable “context pack” (prompts, templates, system instructions) across engineering, marketing, sales, and support so every assistant operates with the same compass .
    • Govern it like a dataset—refresh as positioning and segments evolve; ensure teams reference the latest version .
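
A sketch of what a machine‑readable strategy “context pack” could look like, assuming a simple JSON structure over the problem/audience/differentiation/value fields above; all names and values are illustrative.

```python
import json

STRATEGY = {
    "version": "2025-09-21",  # govern like a dataset: bump on every repositioning
    "problem": "PMs drown in unprioritized feedback",
    "audience": "B2B SaaS product teams, 10-200 employees",
    "differentiation": "in-context micro-surveys tied to revenue accounts",
    "value": "cut discovery-to-decision time from weeks to days",
}

def system_preamble(strategy: dict) -> str:
    """Prepend to any assistant prompt so outputs stay on-strategy."""
    return (f"Company strategy (authoritative, version {strategy['version']}):\n"
            + json.dumps(strategy, indent=2))

print(system_preamble(STRATEGY))
```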

2) Evals are becoming a core PM skill

  • Why it matters: OpenAI’s CPO emphasized that evals are now a core PM competency, and the topic runs deep .
  • How to apply (5 steps):
    1. Bootstrap ~100 diverse traces; prioritize quality via aggressive filtering .
    2. Open coding: label failure modes, find the first upstream failure; continue to theoretical saturation .
    3. Structure failures into binary categories; emphasize “Gulf of Generalization” issues .
    4. Build automated evaluators: code‑based checks plus LLM‑as‑Judge with clear pass/fail and structured outputs .
    5. Deploy the improvement flywheel: integrate evals in CI/CD, monitor bias‑corrected success rates, and loop Analyze → Measure → Improve .
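
A compact sketch of steps 3–5: binary, code‑based checks per trace, rolled up into a success rate you can gate CI on. The traces and checks are illustrative; an LLM‑as‑judge check with a structured pass/fail output would slot in alongside the code‑based ones.

```python
# Illustrative traces; in practice these come from logged production traffic.
traces = [
    {"output": "Refund issued. Policy: 30 days.", "cites_policy": True},
    {"output": "", "cites_policy": False},
]

# Step 3: each failure mode becomes a binary check (pass/fail, no partial credit).
CHECKS = {
    "non_empty": lambda t: bool(t["output"].strip()),
    "cites_policy": lambda t: t["cites_policy"],
}

# Steps 4-5: automated evaluation, monitored as a per-check success rate.
def evaluate(traces: list[dict]) -> dict[str, float]:
    return {name: sum(check(t) for t in traces) / len(traces) for name, check in CHECKS.items()}

print(evaluate(traces))  # e.g. {'non_empty': 0.5, 'cites_policy': 0.5}; gate CI on thresholds
```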

3) AI PM is “eating” traditional PM

  • Why it matters: AI PMs build AI features/models and use AI to optimize their own workflows; as most software embeds AI, companies upskilling teams in AI PM practices are outpacing those clinging to older approaches .
  • How to apply: Upskill across strategy/PRDs, discovery, prototyping, architecture (agents/RAG/fine‑tuning), and observability to stay competitive .

4) Focus your AI product on one must‑do

  • Why it matters: Spreading early efforts across everything AI can do yields demos; depth on one essential capability yields products .
  • How to apply: Identify the single user problem your AI must solve, build deeply until it’s durable, and measure adoption before expanding scope .

Builders can’t resist the act of building. But the hard lesson is that building isn’t enough. Until you stand close enough to your customers to feel what matters to them, every line of code risks becoming wasted motion.

x.com

5) Ship fast, learn faster (pre‑MVP)

  • Why it matters: Pre‑MVP, most answers are “no.” Teams that release fast to validate demand pivot sooner and waste less time .
  • How to apply: Accept technical debt to reach customers (wizard‑of‑oz operations, tiny regression tests), keep process minimal pre‑revenue, and prioritize selling/learning over polish. One founder example shipped zero‑to‑sale in 8 weeks for a $100K ARR ticket—then iterated hard .

6) Hiring in a constrained visa environment

  • Why it matters: Many startups hire locally or offshore and do not sponsor visas; H‑1B supply is capped (~85k/year), tech captures ~60%, ~75% go to existing employees, and the lottery success rate is ~1/3, with renewals required in 3 years .
  • How to apply: Default to domestic/remote/EOR (e.g., nearshore, platforms like Deel), and reserve sponsorship for truly exceptional hires (consider O‑1); plan months ahead if you do sponsor .

7) Compete by controlling the controllables

  • Why it matters: Competitors and funding cycles are outside your control; what you can control—product, customer happiness, funding—drives outcomes .
  • How to apply: Treat competition as market validation; move faster to serve unmet needs. Ask, “Are we actively losing deals to them?” If not, they’re not a real competitor .

Tactical Playbook

Discovery that de‑risks: from “hair‑on‑fire” to presales

  • Why it matters: Surveys and casual interest ≠ demand; launching before understanding users leads to churn .
  • How to apply:
    1. Hunt “hair‑on‑fire” problems users pay to solve .
    2. Validate research‑to‑pay conversion—what % of your research pool paid? .
    3. Run cheap tests: landing page + ads → collect signups → call everyone; treat ghosting as failed validation .
    4. Presell via demo before prototyping to avoid building what no one wants .
    5. Use 2‑day design‑thinking sessions with customers/stakeholders to narrow scope and produce artifacts engineers can act on .

Execution that ships: minimal process, maximal learning

  • Why it matters: Heavy process slows early teams without users .
  • How to apply:
    • Pre‑MVP mantra: “Make minimal, ship fast, sell, fix later” .
    • Keep testing lightweight (small regressions, wizard‑of‑oz ops), invest more after revenue signals .
    • Avoid over‑spec’ing; continuously talk to customers—even pre‑MVP .

Release readiness without blocking

  • Why it matters: PMs must balance risk with momentum.
  • How to apply:
    • Identify features that are off‑strategy, confusing to early adopters, not MVP‑critical, or carry regulatory risk; propose cutting them or hiding them in the UI while still shipping the backend, so they can be enabled later .
    • Present a requirements testing doc that defines must‑pass vs can‑wait criteria; let performance against agreed standards drive go/no‑go .
    • Flag risks and propose workarounds; when leadership won’t move dates, minimize exposure and keep learning going .

PM–Eng collaboration on AI features

  • Why it matters: Role boundaries blur with AI; mis‑ownership slows iteration .
  • How to apply:
    • PM sets outcome requirements and evals; engineering owns prompt iteration (PM collaborates with context/examples) .
    • Iterate prompts cross‑functionally when outputs are off .
    • For high‑accuracy inference at scale, staff with data scientists/AI scientists rather than only an EM .

Team scaling, autonomy, and accountability

  • Why it matters: Adding people can reduce velocity; early‑stage PMs must be autonomous .
  • How to apply:
    • Plan precisely what a new dev will do, how they’ll integrate with the current team, and how they’ll ramp on the codebase (Brooks’ Law) .
    • Empower teams with high autonomy and clear guardrails to avoid micromanagement of day‑to‑day tasks .
    • When performance lags, realign expectations, use a documented PIP, and—if gaps persist—move on quickly .

Pricing and freemium you can trust

  • Why it matters: Users accept limits; they churn on surprises.
  • How to apply:
    • Ship a usable free tier with clear upgrade triggers (Calendly: free 30‑min meetings; paid for additional lengths, co‑hosting, SSO) .
    • Be upfront about caps; “limits are fine, just be upfront about them” .
    • Avoid “rug pulls” (e.g., retroactive caps after advertising “forever free”); messaging mismatches erode trust .
    • Alternative: time‑boxed free trial (2–4 weeks), then subscription .

Marketplace activation (courier/travel example)

  • Why it matters: High friction and unclear rewards kill activation.
  • How to apply:
    • Validate traffic quality; treat bot‑heavy channels and general subs with caution .
    • State earning potential clearly (e.g., “pay for a night out on your trip”) to prevent churn after low payouts .
    • Reduce onboarding friction if expected reward is small .
    • Address comparisons to incumbent alternatives (e.g., UPS) head‑on in positioning .

Visuals that convert

  • Why it matters: Studio‑quality product shots stand out, build trust, and lift clicks/sales .
  • How to apply: Upgrade catalog images (before/after proof assets help); categories include jewelry, home decor, clothing .

Case Studies & Lessons

Traveler‑courier marketplaces: trust and legality first

  • What happened: Transactions often rely on friend‑of‑friend trust; travelers accept only fresh‑from‑store items, open/search packages (“no sealed bags”); an app must first solve trust and tariff friction to add value .
  • Additional signals: Insurance/liability concerns are a primary adoption blocker; coordination costs (meeting both sender/recipient, flight delays) and low payouts deter couriers .
  • Takeaway: Don’t scale acquisition until legal/insurance and trust mechanisms are solved and payout/value propositions are explicit.

Speed over polish wins early revenue (enterprise example)

  • What happened: A team shipped “some of the worst code” but hit zero‑to‑sale in 8 weeks at a $100K ARR ticket size; they scrapped codebases five times before finding a sellable product .
  • Takeaway: Time‑to‑learning beats early code quality—optimize for paid signals, then refactor.

Apparel GTM: niche beats “aesthetic for everyone”

  • What happened: Paid social alone struggled against low‑cost giants; teams that focused on a specific niche avoided being drowned out by copycat content .
  • Takeaway: Start narrow, own a segment, then expand.

Hardware sizing: listen for form‑factor preferences

  • What happened: A user preferred a physical keyboard extension over a huge display and would “just buy the small one” .
  • Takeaway: Offer distinct variants anchored to real usage preferences; validate whether the spotlight SKU is the one customers actually buy.

Fractional execs: mixed results by function

  • What happened: Founders reported limited value from fractional CFOs and skepticism on fractional COOs due to lack of day‑to‑day context, while a fractional CISO model worked well (stakeholder calls, pen testing, yearly tech audits) .
  • Takeaway: Use fractional roles where context transfer is tractable (security); for finance/ops, consider targeted accounting/FP&A support or a fractional specialist for board reporting/budgets/pitching, with a plan to transition in‑house .

Competition and funding: focus on the controllables

  • What happened: Community consensus: competition expands markets and signals value to investors; a single seed round elsewhere is unlikely to determine your success; ask whether you’re actually losing deals to them .
  • Takeaway: Double down on product, customer happiness, and funding—what you can control .

Career Corner

The PM job market: tight, long, and interview‑heavy

  • Signals: “The market is unusually tight”; candidates report 40–50 interviews over 10 months, 24 interviews at ~5% conversion, or needing 38 interviews for a single offer .
  • How to adapt:
    • Apply the PM process to your job search; ask recruiters/interviewers for feedback .
    • Prioritize referrals—they go a long way vs. cold applications .
    • Calibrate targets (consider a step down, broaden beyond hyper‑competitive brands) .
    • Expect freezes even at smaller firms; persistence pays .

Skills to prioritize in the AI era

  • Build competence in prompt/context, evals, AI architecture, and observability; leverage structured guides and tutorials to upskill .
  • Consider free or cohort programs to formalize learning and signal currency in AI PM .

Resume and portfolio signals (PMM‑to‑PM crossover relevance)

  • Lead bullets with metrics; reframe “sales collateral” as end‑to‑end enablement mapped to the funnel; highlight analyst influence, GTM activation impact, thought leadership partnerships, customer references (named when allowed), and any category creation contributions .

Domain knowledge vs. PM competency (what hiring managers say)

  • Mixed views: some prioritize domain expertise; others hire for core PdM skills (execution, commercial acumen), especially outside highly regulated domains .
  • Tactic: target roles where your profile clearly matches the search; avoid mass‑applying to every “product” title .

Transitioning from engineering to PM

  • Smoothest path: internal transition if possible; if not, move to a more flexible company as an engineer first, then into product; build PM skills and apply for entry‑level PM roles in parallel .

Influence over authority: the PM reality

  • PMs own outcomes without CEO‑level authority—build political capital, invest in internal/external relationships, and escalate thoughtfully to unblock work .
  • Executive override is real; plan for it and keep teams moving .

Tools & Resources

  • AI PM resource hub (Aakash Gupta): strategy/PRDs, discovery, prototyping, analytics/observability, prompt engineering, evals, agents/RAG/fine‑tuning; includes job‑search/portfolio guidance and a free certificate/cohort options .
  • Starter stack: Google Workspace ($5–$10/mo) for email infra, video calls, and Google AI Studio for lightweight creative assets; expand tooling only as needed .
  • Dual‑track Agile overview and search terms to deepen your process toolkit (SDLC, dual‑track planning, release best practices) .
  • Book: The Making of Prince of Persia—journal entries on building under constraints; a recommended read for founders/PMs .
  • Leadership/audio: Masters of Scale—on developing entrepreneurial mindset, problem framing (problem → customer → GTM → TAM); useful prompts for discovery workshops
  • Brand front door: consider conversational entry points—“ChatGPT is the new front door for your brand” .

Appendix: quick heuristics

  • Two‑pizza rule as team size guidance (or simply 5–10 people) to reduce alignment costs .
  • Don’t obsess over perfect handles or domains; prioritize SEO/discoverability—“the interface to a company is Google” .
  • When a competitor raises money, ask if you’re losing deals to them; if not, recalibrate rather than panic .

AI workflows, org operating systems, and hiring shifts shaping PM practice
20 September 2025
9 minutes read
The Product Compass The Product Compass
Product Growth Product Growth
Perspectives Perspectives
+10
AI workflows, org operating systems, and hiring shifts lead this week’s PM intelligence—plus a tactical playbook for comms, discovery, retros, and managing up, with case studies (Mercor, Chrome AI Mode, disputes), career guidance, and tools to adopt now.

Big Ideas

  • Treat your environment like a product
    • Why it matters: Founder style becomes the hidden operating system that shapes decision speed, accountability, and truth flow. What energizes a 10‑person team becomes failure at 50 if you don’t rebuild the environment .

A startup is not a family. It’s not even a team. It’s an environment that either amplifies or suffocates the people inside it. Founders who scale treat environment as the product. They test it, rebuild it, and ship new versions relentlessly.

x.com
  • How to apply: Watch for signals of an outdated OS—decisions slowing, priorities colliding, truth surfacing slowly—and rebuild processes and decision rights before swapping people .

“Chaos at ten people is culture. Chaos at fifty is failure.” “Scale punishes founders who cling to the style that worked at ten people.”

  • Org coupling is a strategy choice, not a moral one

    • Why it matters: Companies oscillate between modes (e.g., Federated Islands → Hands‑Off Gridlock → Locked Grid → Command Towers) as dependencies and macro cycles shift. Each box has costs; sometimes you need more vertical coupling to survive heavy horizontal coupling . Front‑line teams often experience dense dependencies even when leaders feel “empowered” .
    • How to apply: Diagnose coupling by cost, not labels. Use ruthless prioritization to manage dependencies; accept temporary passes through “worse” boxes while you re‑architect .
  • AI agents as PM force-multipliers (workflows > autonomy)

    • Why it matters: Only ~2% of PMs are using agent workflows even as GTM teams scale them; the upside is a 10x productivity gain on repetitive PM work . Most “agents” that work in production are predefined workflows with AI steps; true autonomy is brittle except for simple, low‑stakes tasks .
    • How to apply: Map repeatable work, implement workflow steps with human review where stakes are high, and pick models by task (e.g., GPT‑4o for research/dossiers, Claude Sonnet 4 for high‑quality writing, Gemini for long PDFs) .
  • Distribution beats innovation—until it doesn’t

    • Why it matters: Incumbents can flip the board by placing AI where users already are (e.g., Chrome’s AI Mode in the address bar; contextual suggestions on the page you’re viewing) . Rollout focus started in the U.S. .
    • How to apply: Prioritize distribution steps (placement, defaults, entry points) in your roadmap alongside innovation, and measure both.
  • Hiring and team strategy are shifting

    • Why it matters: Community reports point to multiple, sometimes opposing dynamics—remote/offshoring continuing or accelerating, calls to “hire local first,” O‑1 for top 1% talent, and emphasis on training native pipelines . Startups vary widely in H‑1B reliance (from rare to 25–50% in some cases) .
    • How to apply: Plan for mixed sourcing: local hiring and upskilling for core teams; remote/offshore with clear quality gates when needed. Be explicit about trade‑offs (tech debt, time zones, code quality) .
  • PM roles are evolving in the AI era

    • Why it matters: Leaders are consolidating roles and asking which work is essential as AI lets one person do what took teams; competition for roles is intense (1,814 applicants for a single PM opening cited) . Craft (empathy, judgment, asking the right questions) remains irreplaceable; AI is a lever, not a substitute .

As my friend Li Fan says, “You will not be replaced by AI. You will be replaced by someone who knows how to leverage AI better than you.”

debliu.substack.com
  • How to apply: Build AI leverage while investing in craft, courage, and community to stay essential .

Tactical Playbook

  • Agentize your week (start small, scale safely)

    • Steps:
      1. Map repetitive tasks (bug/issue digests, competitor tracking, stakeholder updates) .
      2. Choose a platform (Zapier/Relay/Lindy, or n8n/Make) .
      3. Build a 12‑agent “EA” for meeting briefs, competitor price alerts, and follow‑ups; start from a template and up‑level agents over time .
      4. Enforce human‑in‑the‑loop where stakes are high (review matrix) .
      5. Schedule cadence (e.g., Fri=competitive research; Mon=support summaries) and use digests to manage notifications .
    • Why it works: Predefined workflows succeed far more often than autonomous agents; triggers in your existing tools (e.g., calendar) reduce friction (see the workflow sketch after this playbook) .
  • Ship product updates that get used (not ignored)

    • Steps:
      • Gate with “value evidence” questions: What user value does this deliver, and what evidence supports it? If B2B, confirm you’re reaching the right roles; set the objective (internal buy‑in vs external adoption) .
      • Communicate benefit‑first: say what changed for the user; omit irrelevant technical details .
      • Prefer in‑app, contextual nudges; measure impact with telemetry; expect email opens to be low and declining .
      • Build a user community (e.g., Slack) via update emails to recruit vocal testers and fast feedback .
      • WhatsApp templates: run language/compliance checks with GPT; avoid “marketing” reclassification; keep human review for tone and politics .
  • Discovery that keeps your backlog small

    • Steps (intake→validation→delivery): Central intake with auto‑tags → trend analysis → hypothesis with verbatims → solution + success metrics → validate (interviews, fake doors, surveys) → prototype/scope → score/plan in Kanban. Don’t pick up work until validated .
    • Additions: Use feature–product fit surveys when adoption is weak . Do your own research—observe competitor promos, then talk to their users . Use The Mom Test for interviews and let ChatGPT organize notes and surface insights .
  • Lightweight retros that drive change

    • Options: Jira retros ; Confluence Start/Stop/Continue with 5 minutes silent writing and action items, plus past‑actions/velocity review ; FigJam with rotating moderators and a 5‑step agenda .
    • PM involvement: Varies—from “fly on the wall” to leading in immature orgs; PM ≠ Scrum Master .
  • Managing up with clarity and calm

    • Playbook: Make strategy one sentence + two outcome metrics; review via weekly run charts . Tie every new bet to a “stop list” (focus is what you stop) . Define decision rights, deadlines, and record dissent + learn logs .
    • Practice: Use empathetic, concise weekly updates linking to a single progress doc; the aim is reducing surprise and signaling control; present as cool, calm, collected .
  • Curating UGC across touchpoints without losing your brand

    • Why it matters: Embedding UGC into product pages, emails, and in‑app can boost authenticity and engagement—if it aligns to the product story .
    • How to apply: Define guardrails for brand voice and relevance; instrument to measure impact; balance authenticity, engagement, and control .
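
A minimal sketch of the workflow‑over‑autonomy pattern from the “Agentize your week” item above: predefined steps, with a human‑approval gate before anything high‑stakes ships. The step functions are stubs, not any specific platform’s API.

```python
def draft_competitor_digest() -> str:
    # AI step (low stakes, automated): research and draft.
    return "Competitor X cut prices 10%; two enterprise logos churned to them."

def human_approves(draft: str) -> bool:
    # High-stakes gate: in practice, route to Slack/email for sign-off.
    print("REVIEW REQUIRED:\n" + draft)
    return True  # simulated approval for this sketch

def send_to_stakeholders(text: str) -> None:
    print("sent:", text)  # stand-in for the actual comms step

def weekly_digest_workflow() -> None:
    draft = draft_competitor_digest()
    if human_approves(draft):      # humans stay in the loop where stakes are high
        send_to_stakeholders(draft)
    else:
        print("held for edits")

weekly_digest_workflow()
```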

Case Studies & Lessons

  • Mercor: turning the “evals” bottleneck into a business

    • What happened: AI labs needed human experts to create evals; Mercor connected labs with lawyers, doctors, and engineers ($95–$500/hour), now working with 6 of the Magnificent 7 and all top 5 labs; zero churn; NRR ~1,600% .
    • Takeaway: Look for system bottlenecks around AI (e.g., evals, data ops) and solve them operationally.
  • Distribution in the address bar

    • What happened: Chrome added AI Mode directly to the address bar with contextual suggestions; rolling out in the U.S. .
    • Takeaway: Treat placement as a feature; meet users where intent already lives.
  • “Forever free” that isn’t

    • What happened: Product advertised 100GB free; account enforced 100GB + 300 asset cap; user perceived it as bait‑and‑switch, trust eroded, and they switched. Open question: how to keep free tiers generous without burning early adopters .
    • Takeaway: Don’t undermine trust with hidden caps; publish constraints clearly; protect early adopters.
  • Payment disputes: price in the friction

    • What happened: For small disputes, Stripe reverses payment, keeps processing fee, charges dispute and counter‑dispute fees; likely automated in favor of cardholder. One operator treats it as a cost and raises prices elsewhere .
    • Takeaway: Model dispute/refund leakage explicitly; optimize thresholds for contesting vs writing off.
  • Offshore/onshore trade‑offs

    • What happened: Teams report technical debt and coordination costs from outsourcing, often onshoring later; offshore talent can be inexpensive but top local devs in those markets are costly and less available .
    • Takeaway: If you outsource, bound it to non‑core modules, enforce code quality gates, and account for rework.
  • Product data model at scale (Spotify DS interviews)

    • What matters: DS embedded with PM and UX; emphasis on experimentation, A/B tests, and causal inference; presentations should be high‑level first, tie to company goals, and end with next‑step recommendations as if you already work there .
  • Fashion DTC stack that prints revenue (with caveats)

    • What happened: Teams report CRM cost cuts (~70%) moving to modern CRMs; Klaviyo automations can drive ~30% of revenue; SMS campaigns see 20–30% CTR; real photoshoots beat AI content; avoid over‑tooling early .
    • Takeaway: Invest in lifecycle automation and authentic content before adding tool sprawl.

Career Corner

  • Compete by leveraging AI, not replacing craft

    • Reality: Roles are consolidating; leaders are testing “builder” capacity; one PM role drew 1,814 applicants . Craft (empathy, judgment, asking the right questions) remains central; “You will be replaced by someone who knows how to leverage AI better than you.” .
    • Actions: Prototype faster with AI, then refine ICP, UX, and pricing (value vs cost) .
  • Hiring signals are evolving

    • Practice: Some assess candidates by having them build with AI tools in an hour; PMs who can do more with AI will be well‑positioned .
  • Meetings, politics, and job fit

    • Guidance: Meetings/politics are unavoidable; make meetings high‑value and empower PMs; if you want to build hands‑on all day, the role may be misaligned .
  • Leading product (HoP) expectations

    • Table stakes: Vision/strategy articulation, stakeholder alignment, metrics/outcomes, market knowledge, and partnering well with engineering. Tactics include insulating teams, being transparent about goals, and iterative target‑setting .
    • Managing up: Crucial skill—anticipate issues, align incentives/decision rights, and signal control .
  • Navigating broken orgs

    • Tactics: Define product vs project (“product is a promise plus a learning loop”), tag roadmap items by outcomes vs outputs, run short cross‑functional “next bet” rooms, and ask in every review who owns adoption and what was learned this week . If toxic, start looking early; market remains tough .
  • Sustainable pace > 100‑hour weeks

    • Lesson: Output ≠ outcomes; chronic overwork erodes judgment and life; choose to work smarter or move on .
  • PMM progression

    • Tip: Speak your promotion goals openly—if you don’t sell yourself, no one else will .

Tools & Resources

  • Aakash Gupta’s 10‑step agent playbook for PMs (podcast + templates)

    • Why it matters: Concrete steps to 10x repetitive PM work; early adopters already run marketing with dozens of agents .
    • How to apply: Start with 3–5 tasks, use templates, keep approval loops for high‑stakes comms, and chain agents into workflows .
  • Product Compass: no‑code/AI builder stack

    • Why it matters: Delivery that took 2–3 sprints can be done in 1–2 days; example: 6 hours to ship a video‑course MVP .
    • How to apply: Use the curated stack (Supabase, Pinecone, Lovable/Replit/Cursor, Netlify/Vercel, Clerk, GrowthBook, PostHog/Clarity, Logtail, Grafana, n8n/LangChain, LangSmith, Stripe) and the hands‑on templates (RAG chatbot, voice agent, multi‑agent research system) .
  • Usersnap (beta positioning): PM collaboration for discovery→planning

    • Highlights: Auto‑tagging, precise segmentation/targeting, hypothesis with verbatims, context‑grounded solutioning, “score differently” (ICP impact, novelty, frequency, urgency, effort), validated‑only roadmap, and close‑the‑loop comms. Public upvoting available for teams that need it .
  • Boost Toad: 2‑minute feedback widget

    • Why it matters: Fast installs increase signal; one user reports it surfaced two major bugs quickly .
  • Free workshop: User Story Templates That Actually Work

    • When/what: Sep 30, 2 PM CEST; structured story formats, acceptance criteria, DoD checklists, workflow integration; free with registration .
  • Feedback framing by Julie Zhuo

    • Why it matters: Turns gut reactions into actionable critique. Ask: journey, desired outcome/feeling, importance, scope/timeline/team, confidence vs current, what to remove, whether legacy constraints still apply .

“Clarity is the closest thing to truth in business.”

PM Intelligence: Evals-as-PRDs, the Three‑Speed Problem, and AI‑First Execution
19 September 2025
9 minutes read
Acquired Acquired
Mind the Product Mind the Product
Lenny's Podcast Lenny's Podcast
+10
Actionable PM intelligence: why evals are the new PRDs, how to keep discovery and GTM in sync with faster build, when to use small domain models, and how to organize AI‑first teams—plus case studies (Stack Overflow, Mercor, Shopify), career moves, and tools you can use now.

Big Ideas

1) Evals are the new PRDs for AI products

  • Why it matters: When “the model is the product,” evaluations define what “good” looks like, guide research, and double as sales collateral to prove capability to customers .
If the model is the product, then the eval is the product requirement document.
www.youtube.com
  • How to apply:
    • Have domain experts author rubrics (like professors grading work) that enumerate required behaviors and scoring criteria .
    • Use concrete examples/“unit tests” as success checks and post‑training signals; favor reinforcement learning from AI feedback for scalable improvement .
    • In enterprises, build evals around your core value chain to measure automation impact end‑to‑end .
    • Maintain a reproducible “eval harness” (e.g., a folder of prompts with expected/judged results) and run it on each model iteration to track progress .

2) The three-speed problem: build velocity outpaces discovery and GTM

  • Why it matters: Code will ship ~10x faster, but customer listening and go‑to‑market won’t speed up automatically—creating gridlock and a flood of unvalidated features if you don’t adapt .
  • How to apply:
    • Stand up a customer listening system: consolidate ~10 feedback channels into one corpus, auto‑synthesize, and ask outcome questions (e.g., retention/activation); connect to Jira/code so the system drafts tickets/PRDs .
    • Treat “code + annotations” as the new PRD to get to working prototypes faster with engineering .
    • Time‑box your week: a feedback day, an insight day, a building/prototyping day, and a GTM day to keep speed aligned across functions .

3) Explore vs. Exploit in the AI era

  • Why it matters: Most GenAI initiatives fail when leaders apply “exploit” rules (time/budget/ROI) to high‑uncertainty exploration. Use different KPIs and funding for search vs. scale .
  • How to apply:
    • Run an AI disruption risk assessment: ask if AI makes your value prop irrelevant or free, whether you own critical data, and whether AI‑first cost structures undercut you .
    • If you lack data access, prioritize partnerships to avoid being shut out (e.g., health, wearables) .
    • Kill or pivot exploration projects based on learning—not just delivery milestones .

4) AI‑first organizations, context engineering, and decision ownership

  • Why it matters: Mandating AI as a first pass lifts org output and builds institutional evals. Clear decision owners (DRIs) beat consensus, which tends to produce safe, middling outcomes .
  • How to apply:
    • Set an AI‑first norm; review projects on a regular cadence with AI assistance against written product principles .
    • Train teams in “context engineering” (state tasks with just enough context to be solvable); it raises success rates and lowers cost .
    • Assign DRIs for major decisions; use councils to surface proposals, but avoid defaulting to consensus for final calls .

5) Smaller domain models vs. frontier models: choose deliberately

  • Why it matters: Task‑specific small models can match large models on narrow problems and be cheaper; however, hosting overhead, engineering cost, and rapid leapfrogging by frontier models often erase the ROI .
  • How to apply:
    • Benchmark on your exact task(s) and compare accuracy and total cost (engineering + hosting). Prefer smaller models only when accuracy is comparable and control/constraints justify it; otherwise prompt‑tuned frontier models are often “good enough” .
    • Validate demand with customers before building an SLM; large providers and enterprise agreements are rapidly covering many use cases .
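
A back‑of‑the‑envelope sketch of that benchmark comparison: accuracy plus total first‑year cost (engineering, hosting, usage). Every number here is illustrative; substitute your own benchmark results and cost data.

```python
options = [
    {"name": "prompt-tuned frontier model", "accuracy": 0.91,
     "eng_cost": 5_000, "monthly_hosting": 0, "per_1k_calls": 2.00},
    {"name": "fine-tuned small model", "accuracy": 0.92,
     "eng_cost": 60_000, "monthly_hosting": 3_000, "per_1k_calls": 0.20},
]

def first_year_total(o: dict, calls_per_month: int = 500_000) -> float:
    usage = 12 * (calls_per_month / 1_000) * o["per_1k_calls"]
    return o["eng_cost"] + 12 * o["monthly_hosting"] + usage

for o in options:
    print(f'{o["name"]}: accuracy {o["accuracy"]:.0%}, first-year total ${first_year_total(o):,.0f}')
# Prefer the small model only if accuracy is comparable AND the total cost favors it.
```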

6) The AI app wave and talent gravity in SF/LA

  • Why it matters: As foundation layers stabilize, the next wave moves to UX/app experiences. SF/LA density (Tech Week scale) accelerates founder, hiring, and GTM collisions .
  • How to apply:
    • If you’re shipping consumer AI, consider LA’s “cultural product” edge; for platform/infra, leverage Bay Area model/engineering density .

“You budget for headcount. You budget for spend. But the scarcest resource in your company is focused attention.”

“You can’t invent the future if you’re still clinging to how things used to work.”

Tactical Playbook

A) Ship evals like PRDs (step‑by‑step)

  • Why it matters: Evals let you set targets, measure progress, and sell capabilities .
  • How to apply:
    1. Select a high‑value workflow (e.g., contract redlining) .
    2. Recruit an expert to author a rubric (criteria + scoring) and sample answers .
    3. Convert criteria into automated checks (e.g., unit tests; see the sketch after this list) .
    4. Use RLAIF to “hill‑climb” against the eval set for scalable capability gains .
    5. Treat the eval as product spec and sales proof in materials .
    6. Maintain a “Toby Eval” harness and re‑run it on each new model release, at least weekly .
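
A sketch of step 3, turning an expert rubric into automated checks; the contract‑redlining criteria and the `review_contract` stub are hypothetical:

```python
# Turn rubric criteria into programmatic checks that run like unit tests.
def review_contract(clause: str) -> str:
    raise NotImplementedError("call your model here")

# Each rubric criterion becomes a named predicate over the model output.
RUBRIC_CHECKS = {
    "flags missing liability cap": lambda out: "liability cap" in out.lower(),
    "cites the clause verbatim":   lambda out: '"' in out,
    "keeps redline under 200 words": lambda out: len(out.split()) <= 200,
}

def score(clause: str) -> dict:
    out = review_contract(clause)
    return {name: check(out) for name, check in RUBRIC_CHECKS.items()}
```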

B) Build a continuous customer‑listening engine

  • Why it matters: Prevent fast code from outpacing real customer needs .
  • How to apply:
    • Centralize churn/NPS/bugs/community/sales/AM/support into one bucket and auto‑synthesize patterns .
    • Ask the system targeted questions (e.g., “top retention opportunities”) and push prioritized tickets/PRDs into Jira/code (see the sketch after this list) .
    • Add feature‑level feedback buttons; run community panels (Discord/Slack) for quick checks; use AI research modes for rapid pricing scans .
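
A minimal consolidation sketch, assuming feedback exports land in one folder as CSV/JSON files with a `text` field; `summarize` stands in for your LLM call and the file names are hypothetical:

```python
# Merge feedback channels into one corpus, then ask one outcome question.
import csv
import json
from pathlib import Path

def load_channel(path: Path) -> list[str]:
    if path.suffix == ".json":
        return [record["text"] for record in json.loads(path.read_text())]
    with path.open() as f:
        return [row["text"] for row in csv.DictReader(f)]

def build_corpus(feedback_dir: str = "feedback") -> list[str]:
    corpus = []
    # e.g., churn.csv, nps.json, support.csv, community.json ...
    for path in sorted(Path(feedback_dir).iterdir()):
        corpus.extend(load_channel(path))
    return corpus

def summarize(question: str, corpus: list[str]) -> str:
    raise NotImplementedError("send question + corpus to your LLM here")

if __name__ == "__main__":
    corpus = build_corpus()
    print(summarize("What are the top 3 retention opportunities?", corpus))
```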

C) Quarterly planning that works (and wastes less)

  • Why it matters: Giant planning sessions are expensive and often unproductive .
  • How to apply:
    • Before the meeting: align on new objectives, refine requirements with business/IT, and complete RICE prioritization (scored as in the sketch after this list) .
    • In the room: keep it to senior folks + PMs, review the roadmap and tweak—no one should hear anything for the first time .
    • Avoid 80–100 person refinements where people “talk past each other”; bottom‑up without top‑down direction is a time sink .
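
For the pre-meeting RICE step, a small scoring sketch using the standard formula, score = (Reach × Impact × Confidence) / Effort; the backlog items and numbers are invented:

```python
# RICE prioritization: (Reach x Impact x Confidence) / Effort.
def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    return (reach * impact * confidence) / effort

backlog = {
    "self-serve upgrade flow": rice(reach=4000, impact=2, confidence=0.8, effort=3),
    "SSO for enterprise":      rice(reach=600,  impact=3, confidence=1.0, effort=5),
    "onboarding checklist":    rice(reach=8000, impact=1, confidence=0.5, effort=2),
}
for feature, score in sorted(backlog.items(), key=lambda kv: -kv[1]):
    print(f"{score:7.0f}  {feature}")
```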

D) Form a “shipyard” (or “heist”) team for new AI bets

  • Why it matters: Small, autonomous groups with six core skills move faster on novel product bets .
  • How to apply:
    • Assemble PM, engineering, design, user research, data/ML/prompt engineering, and product marketing; mandate autonomy and clear metrics .
    • For a radical exploration, build on a fresh codebase and empower multi‑hat leaders (Typeform’s “heist team” pattern) .

E) Accessibility triage for launch

  • Why it matters: It’s both ethical and a legal risk area; quick baselines catch high‑impact issues .
  • How to apply:
    • Run Lighthouse/WAVE for a fast baseline (contrast, alt text); see the sketch after this list .
    • Manual checks anyone can do: keyboard‑only navigation and a screen‑reader pass (NVDA/VoiceOver) .
    • Use SaaS quick‑fixes (WebAbility/AccessiBe) for basics; schedule a formal audit later when funded .
    • Learn ARIA/semantic HTML; use native components to reduce debt; leverage OS tools (screen readers, grayscale) for spot checks .
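
One way to script that Lighthouse baseline, assuming the CLI is installed (npm install -g lighthouse) and Chrome is available; the URL is a placeholder:

```python
# Run the Lighthouse CLI headlessly and pull out the accessibility score.
import json
import subprocess

def accessibility_score(url: str) -> float:
    subprocess.run(
        ["lighthouse", url, "--only-categories=accessibility",
         "--output=json", "--output-path=report.json",
         "--chrome-flags=--headless"],
        check=True,
    )
    report = json.loads(open("report.json").read())
    return report["categories"]["accessibility"]["score"]  # 0.0 to 1.0

print(f"accessibility: {accessibility_score('https://example.com'):.0%}")
```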

F) SaaS chargeback resilience

  • Why it matters: Disputes can cost multiples of the original payment and banks often side with the cardholder .
  • How to apply:
    • Prevention: no card until subscription; pre‑renewal reminders (see the sketch after this list); clear statement descriptors; self‑serve cancellation .
    • Evidence: keep exhaustive logs, but be prepared for disputes that only the customer’s withdrawal can win .
    • Pricing: model dispute incidence; consider price increases to offset costs .
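
A sketch of the pre-renewal reminder, assuming a subscription record carries a renewal date, price, and manage URL; `send_email` is a placeholder for your provider:

```python
# Email every subscriber N days before their renewal date.
from datetime import date, timedelta

def send_email(address: str, subject: str, body: str) -> None:
    raise NotImplementedError("wire up your email provider here")

def send_renewal_reminders(subscriptions: list[dict], days_ahead: int = 7) -> None:
    target = date.today() + timedelta(days=days_ahead)
    for sub in subscriptions:
        if sub["renews_on"] == target:
            send_email(
                sub["email"],
                subject=f"Your subscription renews in {days_ahead} days",
                body=(f"You'll be charged ${sub['price']:.2f} on "
                      f"{sub['renews_on']}. Manage or cancel any time: "
                      f"{sub['manage_url']}"),
            )
```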

G) Native‑feeling AI email assistants

  • Why it matters: Security and adoption hinge on local processing and staying inside Outlook/Gmail; draft replies and summaries deliver immediate value .
  • How to apply: Prioritize on‑prem/local options for sensitive orgs; ship “drafts + summarize long threads” first; integrate where users already work .

H) PM communication hygiene

  • Why it matters: Missed follow‑ups erode trust.
  • How to apply: Triage simple replies immediately; reserve 30–60 minutes daily to clear the rest; attach reminders to the conversation (email/Slack/LinkedIn/WhatsApp) to avoid juggling tools .

Case Studies & Lessons

  • Stack Overflow’s Overflow AI: Four iterations (keyword chat → semantic search → GPT‑4 fallback → RAG with attribution) couldn’t meet developer quality bars; the team sunset the feature and pivoted to licensing its 14M+ Q&A corpus and building benchmarks with SMEs to prove model accuracy gains . Takeaway: Prototype fast, evaluate with SMEs via simple scoring, and sunset when standards aren’t met—then monetize your unique assets .

  • Mercor: Marketplace for expert‑authored evals/training data. Grew from ~$1M to $500M revenue in 17 months; works with six of the Magnificent 7 and all top 5 AI labs; zero churn; NRR 1,600% . Takeaway: Solve the true bottleneck (evals), measure pull with flagship customers, and expand through excellence before scaling sales .

  • Shopify merchant localization: Swapping product images from Malibu beach houses to Parisian apartments (no new photoshoot) tripled sales; modern tools made this technically feasible only recently . Takeaway: Low‑effort, AI‑assisted localization can unlock outsized conversion gains .

  • Chargebacks: A $10 payment ultimately cost $43.95 after disputes; bank sided with cardholder despite evidence . Takeaway: Harden billing UX and model dispute costs into pricing .

  • Exit vs. scale (crypto casino thread): Many buyers will require non‑competes; earn‑outs/clawbacks shift risk; multiples depend on model/moat/regulation; some advise scaling the proven product with an operator before selling . Takeaway: Compare “delegate and scale” vs. “sell with restrictions,” factoring payment structure and regulatory risk .

  • Build vs. buy streaming: Owning an in‑house platform avoids fees and gives control, but can be a money sink; target customers that value sovereignty/local hosting (e.g., universities, select regions) . Takeaway: Tight ICP and TCO modeling are essential .

Career Corner

  • AI PM jobs are growing (e.g., Anthropic PM at $460K, with equity upside pushing total comp toward ~$1M per author’s note) . A 6‑step playbook: background → fundamentals → interviews → strategic applications → ace homework/presentations → negotiate using comps/competing offers . How to apply: Build projects (e.g., Karpathy course), practice AI case/system design, and value equity realistically .

  • Product Ops, done right: Not a dumping ground; when set up as a partner, it brings structure, scalability, and data‑driven process so PMs can focus on UXR/strategy . How to apply: Define scope (tools, integrations, decision cadences); avoid “just bandwidth relief” expectations; concrete responsibilities include enabling Aha!, value scoring, and Salesforce/ADO integrations .

  • Mentorship without dependency: Ask for 15 minutes, be explicit you’re too early to sell, and request blunt feedback on why it will/won’t work; define objectives per relationship and let them run their natural course .

  • Where to point your PM skills: Finance/fintech and healthtech need PMs to get their data house in order for AI .

  • Hiring/networking: SF/LA Tech Week functions like a “startup campus” for meeting cofounders, early hires, and investors at scale .

  • Interview signal: Candidates reveal themselves by the questions they ask—resource on identifying Operators/Craftspersons/Visionaries .

Tools & Resources

  • Strategyzer Playbooks: Step‑by‑step workspaces (e.g., Business Model Canvas as reusable asset), AI helper, and automated customer‑interview flows for jobs/pains/gains; codifies two decades of practice .
  • Lenny’s Podcast — Evals deep dive with Brendan Foody: Why evals are PRDs and sales collateral; how to build rubrics/unit tests and use RLAIF; hiring/NRR lessons .
  • Session intelligence: FullStory/LogRocket can now surface the top 10 issues and draft PRDs—use excess eng capacity for fit‑and‑finish earlier and run more experiments .

  • Knowledge accessibility: eesel AI shows how a simple Slack Q&A bot (Confluence/Drive) can erase repetitive support work .

  • Lead/listening utilities: ParseStream (alerts on relevant Reddit/other threads), Firecrawl+LLM for targeted scraping/export, Trialhook to route context‑rich signup alerts to Slack .

  • Accessibility starters: Lighthouse/WAVE; rapid widgets (WebAbility/AccessiBe) for basics; schedule audits later .

  • One‑page post‑mortem template: Exec summary, what went well, what didn’t, and learnings for next time .

  • PM rituals challenge: Map how you’ll harness new speed; run a continuous discovery experiment with AI this month; write and annotate your first prototype, and hand it to dev—aim to ship within 60 days .

  • Homepage trust blueprint (anti‑hype): A playbook built from raw buyer research to earn trust in the first 30 seconds .

Appendix: Notable quotes

“The best brands get closer to their users with every iteration.”

“No metric matters if you’re ignoring what your best customers actually value.”

PM Playbook: AI workflows that ship, PRDs that stick, onboarding that converts
18 September 2025
10 minutes read
Melissa Perri
andrew chen
Teresa Torres
+13
Actionable PM intelligence: how to mix deterministic automation with agentic AI, pair prototypes with short specs, fix onboarding and homepages, incubate new products with dedicated squads, and ride the evals/data‑labeling wave—plus case studies, career plays, and tools.

Big Ideas

  • Mix deterministic automation with agentic AI, and keep a human in the last mile

    • Why it matters: Teams that combine exact, cheap, repeatable steps (deterministic) with AI-driven synthesis (agentic) get the best of both worlds, while avoiding AI errors where stakes are high .
    • How to apply: Break workflows into steps; automate CRUD/ETL with deterministic tools and add agentic steps (e.g., generating briefs) where judgment or synthesis is needed; keep a human to verify and act on the recommendation .
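
A minimal sketch of that split, with deterministic fetch steps, one agentic synthesis step, and a human gate at the end; all helpers are hypothetical placeholders:

```python
# Mixed workflow: deterministic steps gather data, an agentic step
# synthesizes, and a human reviews before anything is sent.
def fetch_usage(account_id: str) -> dict:
    return {"seats_active": 42, "trend": "up"}      # deterministic: warehouse query

def fetch_tickets(account_id: str) -> list:
    return ["billing question", "feature request"]  # deterministic: support API

def draft_brief(context: dict) -> str:
    raise NotImplementedError("agentic step: LLM synthesis call goes here")

def renewal_brief(account_id: str) -> str:
    context = {
        "usage": fetch_usage(account_id),
        "tickets": fetch_tickets(account_id),
    }
    draft = draft_brief(context)
    # Last-mile human: the draft is queued for review, never auto-sent.
    return f"[NEEDS HUMAN REVIEW]\n{draft}"
```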

  • Prototypes do not replace PRDs

PROTOTYPES DO NOT REPLACE PRDS

x.com

    • Why it matters: Prototypes convey UX but omit strategy, full functionality, and non‑UI requirements; short specs and PRDs are still essential to align on strategy, monetization, and backend/security needs .
    • How to apply: Pair AI prototypes with concise specs that cover strategy, edge flows, and non‑UI constraints .

  • Commerce in the age of AI: design for purchase type and fix attribution

    • Why it matters: AI struggles with impulse buys and very high‑consideration purchases; agentic flows are most effective in mid‑consideration, SKU/UPC‑defined categories. Meanwhile, last‑click attribution remains misleading and will get harder with AI intermediaries .
    • How to apply: Segment flows by intent; automate UPC/SKU purchases through agents; instrument multi‑touch attribution from the start .
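
For the attribution point, a toy linear multi-touch model that splits conversion credit evenly across touchpoints instead of crediting the last click; the journeys are invented:

```python
# Linear multi-touch attribution over illustrative customer journeys.
from collections import defaultdict

journeys = [  # touchpoint paths, each ending in a $100 conversion
    (["organic_search", "email", "ai_assistant"], 100.0),
    (["paid_social", "ai_assistant"], 100.0),
]

credit: dict[str, float] = defaultdict(float)
for touchpoints, value in journeys:
    for channel in touchpoints:
        credit[channel] += value / len(touchpoints)

for channel, amount in sorted(credit.items(), key=lambda kv: -kv[1]):
    print(f"{channel:15s} ${amount:6.2f}")
# Last-click would have credited ai_assistant with the full $200.
```
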
  • Clarity over speed: make the “why/what/scope/how” explicit

    • Why it matters: Clarity is the fuel of engineering; without it, teams over‑engineer or ship the wrong thing. Leaders should give the why and what, not the how, and redefine “done” to include evidence of impact .
    • How to apply: Set vision, scope, and success criteria; let engineering propose implementation; validate outcomes with users .
  • Fix AI homepages: plain language + real examples build trust

    • Why it matters: In a study of AI tool homepages, 97% of visitors couldn’t tell what the product did, 67% never saw a real use case, and over half left without trusting the page .
    • How to apply: Lead with a simple explanation, show a screenshot and a concrete example/customer story; avoid buzzword mazes .
  • Partnerships are growth engines

    • Why it matters: The right partners extend reach, add credibility, accelerate go‑to‑market, and fuel innovation .
    • How to apply: Prioritize partner types that map to your product gaps (distribution, trust, co‑innovation) and measure partner‑attributed pipeline.
  • Sales enablement platform decisions: adoption and data quality make or break ROI

    • Why it matters: PMM goals include findability, correct usage, and proof of impact in deals; poor CRM data and low seller adoption erode value. Many teams churn tools due to single‑digit engagement (example: only 8% of a sales team logged in before the tool was decommissioned) .
    • How to apply: Meet sellers in their channels (Slack/email), avoid multi‑repo versioning, and ensure CRM data quality before leaning on recommendations; the right setup depends on your scenarios .
  • Incubate new products with dedicated, cross‑functional teams

    • Why it matters: Protecting zero→one work speeds learning, reduces trade‑off conflicts with the core, and aligns product with GTM from day one .
    • How to apply: Give new products a dedicated squad (PM, eng, sales/SE), qualify‑in the right early customers, and win with one killer differentiator (e.g., performance) rather than feature parity .
  • Expert data‑labeling and evals are a breakout AI opportunity

    • Why it matters: Demand from AI labs for high‑quality evals/labeling has created hypergrowth businesses; example trajectories include $1M→$500M in 17 months and $0→$100M in <12 months .
    • How to apply: Leverage proprietary expert networks; isolate the new business with separate teams/offices to scale focus .

Tactical Playbook

  • Diagnose “0 trials from 500 installs” in one week

    1. Instrument the funnel end‑to‑end (install → open → see paywall → start trial → cancel) with Mixpanel/Amplitude (see the sketch after this list); add session replays (Hotjar/FullStory) .
    2. Run 10–20 interviews watching users onboard live; identify the first‑minute confusion and the “job” they came to do .
    3. Deliver one outcome in <60 seconds before any hard paywall; try a softer trial prompt afterward .
    4. If traffic is low‑intent, pause paid and recruit problem‑aware users via communities/search; ship one onboarding improvement per day and measure stepwise impact .
    5. If users never reach a trial decision, clarify the value prop in the first 30 seconds; consider a free tier → upgrade path .
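
The instrumentation sketch referenced in step 1, using Mixpanel's Python SDK (pip install mixpanel); the event names and project token are assumptions, so name yours consistently so the funnel report lines up:

```python
# Track each funnel step as a named event with properties.
from mixpanel import Mixpanel

mp = Mixpanel("YOUR_PROJECT_TOKEN")

FUNNEL = ["app_installed", "app_opened", "paywall_viewed",
          "trial_started", "trial_cancelled"]

def track_step(user_id: str, step: str, **props) -> None:
    assert step in FUNNEL, f"unknown funnel step: {step}"
    mp.track(user_id, step, props)

track_step("user_123", "paywall_viewed", source="onboarding", screen=3)
```
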
  • Boost onboarding and adoption

    • Use product tours and A/B test them with tools like Hopscotch or Chameleon .
    • Don’t over‑index on social proof or flashy transitions; they can be noise near signup .
    • Success signal: “Onboarding is working when users start teaching each other” .
  • Write crisp requirements, avoid waterfall drift

    • Use boundary objects (journeys, flowcharts, mocks, rules/edge‑case sheets) across functions; avoid a wall of bullet‑point “requirements” .
    • Time‑box learning with short increments; investigate unclear requirements early and stop pushing them forward .
    • Document strategy, problem statements (what’s in/out), success metrics, and user stories before build .
  • Choose and manage agencies like a product leader

    • Don’t outsource core product; service incentives trend toward saying “yes” to rack up hours .
    • Good partners push back, call out bad plans, and protect budget; verify recent results and who actually does the work; prioritize transparency over price .
    • Inside your team, learn to say “no”—products lose money by saying yes to everything .
  • Make enablement assets usable (and used)

    • Surface the right content where sellers work (Slack/email), avoid duplicate repositories and stale decks, and integrate with CRM only if data quality supports recommendations .
  • Align co‑founder equity with contribution (and protect control)

    • Tie equity to milestones with vesting/cliffs; investors dislike “dead equity” .
    • Keep at least 51% to avoid deadlocks; design performance‑based adjustments over time .
    • Community benchmarks (for context only): proposals like 70/30 or 75/25 when one founder has built the product; use calculators and clear targets to reduce friction .
  • Match channels to margins (paid ads aren’t for $20 AOV)

    • Best‑in‑class CAC often lands ~$30–$60; paid channels work when first purchase is $80–$100 or LTV tops $100+ .
    • Rule: margin dictates channel; low AOV, one‑time products can’t sustain paid distribution .
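
A worked check of “margin dictates channel”, with illustrative numbers:

```python
# Can this product sustain paid acquisition? LTV must clear CAC.
def paid_viable(cac: float, aov: float, gross_margin: float,
                repeat_purchases: float) -> bool:
    ltv = aov * gross_margin * repeat_purchases
    return ltv > cac

# $20 AOV, one-time purchase, 60% margin vs. a best-in-class $30 CAC:
print(paid_viable(cac=30, aov=20, gross_margin=0.6, repeat_purchases=1))  # False
# A $90 first purchase with repeat buying clears the bar:
print(paid_viable(cac=30, aov=90, gross_margin=0.6, repeat_purchases=2))  # True
```
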
  • Operationalize AI adoption across the org

    • Run recurring hackathons (every 4–6 months) and weekly “show & tell” to share working examples and raise internal fluency .
    • Define AI baseline skills and role‑specific use cases function‑by‑function; assess candidates with live screen‑share tasks .
    • Mix deterministic and agentic steps; leave last‑mile decisions to humans until error rates are acceptable .
  • Communicate like an executive in 30 minutes

    • Use Pain → Problem → Promise (5 min), then three points each with One‑liner → Analogy → Examples (3×6 min), then a Final story + CTA (4 min). Be ruthless with time .
  • Protect deep work while keeping customers happy

    • Turn off notifications and batch replies in predictable windows; reserve instant responses for paid support levels .
    • Use a VA or a “secretary” persona to negotiate/reschedule politely when schedules get crowded .

Case Studies & Lessons

  • Zapier’s renewals agent (human in the loop)

    • What they built: an agent aggregates usage trends, call transcripts, and support tickets, then recommends upsell/flat/down renewals and drafts emails; all data flows into HubSpot .
    • Outcome: the agent does ~90% of the analysis; the AE verifies/sends (last‑mile human) .
    • Lesson: automate data gathering/recommendation; keep humans for accountability and risk control .
  • Rubrik’s perpetual→SaaS and security repositioning

    • Migration playbook: set a hard date for new logos to buy the new cloud model, then migrate existing cohorts over 2–3 years with exceptions and clear paths .
    • Guardrails: balance SEC/audit constraints and churn risk when changing packaging/pricing and release cadence .
    • Result: subscription ARR grew from ~$100M to ~$1.1B in four years .
  • AI homepages fail users without clarity

    • Finding: 97% couldn’t tell what the tool does, 67% saw no real example/use case, >50% left without trust .
    • Fix: lead with a plain explanation, screenshot, and one credible example .
  • Platformizing government services at Irembo

    • Decision: spin payments into a standalone platform with its own PM, engineers, and brand; require the gov product to be a client of payments .
    • Sales enablement gap: customers and sales weren’t brought along early, delaying value realization .
    • Mobile focus: a super‑app strategy anchored on frequent use (traffic fines) produced ~70% 30‑day retention among car owners .
  • Sales enablement adoption pitfall

    • Example: several hundred licenses provisioned, only 8% of the sales team logged in, leading to churn of the platform .
  • Expert data‑labeling within an existing network (Handshake)

    • Move: leverage a network of experts/students to supply frontier labs; separate teams/offices to focus .
    • Traction: zero to $50M in four months; $100M+ in 12 months .

Career Corner

  • Make your prep visible (and package it well)

    • Example benchmark: 35 applications → 3 screenings → 1 next round; screening calls are progress. Research helps get calls, but interview execution and how you present insights matter .
    • Market reality: hiring outcomes also hinge on luck, timing, and networks; don’t throw out a sound strategy after one miss .
  • Build mentorship and your learning loop

    • Find a mentor; proactive outreach works. Follow credible voices (e.g., April Dunford, Tamara Grominsky, Emma Stratton) and tap free PMA podcasts/case studies and local meetups .
  • Move internally before paying for certificates

    • Many teams value experience and fit over certifications; consider internal transfers, mentorship, and hands‑on frameworks like Quartz’s six‑stage model from idea→market .
  • Product Ops vs Product Manager (choose intentionally)

    • Product Ops can lead to broader ops/strategy roles and higher comp, but may pull you away from discovery/strategy; if you aim for product leadership, prioritize strategic skill‑building .
  • Write like a PM: clear, concise, data‑led

    • Resources: On Writing Well; The Elements of Style; Pyramid Principle. Aim for audience‑focused brevity; let data tell the story .
  • Follow the evals/data‑labeling wave

    • Demand for evals is surging—“evals” is the #1 course on Maven, signaling strong market interest .
  • Programs & roles worth exploring

    • Entrepreneur First and 50 Years (health/pharma) as selective programs; review eligibility/fit .
    • a16z Speedrun invests via 12‑week cohorts (150+ startups; $180M+ deployed); they’re hiring a PLG GTM partner—study the competencies even if you’re not applying .
  • Mind investor/customer optics

    • Visible political signaling can alienate investors/customers unless core to your niche; many investors prefer professionalism over public politics in fundraising contexts .

Tools & Resources

  • AI homepage teardown (free report + template)

    • What you get: buyer transcripts, quotes, expert fixes, and a practical homepage template to improve clarity and trust; full report at Crazy Egg .
  • AI prototyping, done right

    • Learn AI prototyping from Sachin Rekhi (Reforge); pair with short specs to preserve product strategy and non‑UI requirements .
  • Interview prep (PM at Microsoft)

    • Resource: tryexponent.com with Microsoft‑specific question examples .
  • Governance upskilling for AI products

    • Blue Dot Impact’s AI governance course (evals and observability metrics) to raise your team’s AI readiness .
  • Onboarding tools to experiment faster

    • Try Hopscotch or Chameleon for guided tours and A/B testing .
  • PM community content

    • PMA’s free podcasts, case studies, and meetups for ongoing learning .
  • Teresa Torres on building daily AI skills

    • Practical ways to use LLMs in everyday problem‑solving; ramps complexity step‑by‑step .
  • New dev tool to watch

    • Macroscope: uses your codebase to answer questions and auto‑review PRs; raised a $30M Series A .

One‑page checklist to apply this brief

  • Map one critical workflow into deterministic vs agentic steps; add a last‑mile human review .
  • For your next feature, ship an AI prototype and a one‑page short spec covering strategy, non‑UI requirements, and monetization .
  • Rewrite your homepage hero to state plainly what the product does; add a screenshot and one real example .
  • Pick one onboarding experiment; ship in a week and measure stepwise conversion with session replays .
  • Schedule a partner meeting to explore distribution or credibility lifts; define partner‑attributed KPIs .
  • Start a monthly cross‑functional show‑and‑tell on AI automations that are actually in use .