# Agentic Workflows, Adoption Bottlenecks, and the Rise of AI System Design

*By PM Daily Digest • April 16, 2026*

This issue tracks how AI is reshaping product management: prototype-first workflows for agentic products, adoption and alignment becoming harder than shipping, and AI PM interviews shifting toward system design. It also includes practical guidance on PXD writing, AI quality measurement, discovery, and lightweight coding practice.

## Big Ideas

### 1) Agentic development is changing the unit of PM work

Andrew Chen frames a progression: **waterfall** optimized for being right upfront when iteration was expensive, **agile** optimized for human iteration speed as iteration got cheaper, and **agentic** development optimizes for a world where iteration is effectively free [^1]. He expects that shift to reorganize product teams, reduce reliance on traditional PM artifacts, and create more role collision across product, design, and engineering [^1].

Rags Vadali describes what that looks like in practice for agentic products: start with the problem, let engineers prototype directly on the codebase with AI tools, then have PM and design shape the **product experience** after seeing what is actually possible [^2]. His underlying claim is that, for agentic products, the real product is often the **experience layer on top of the agent** rather than a conventional UI spec [^2].

The trade-off is speed versus entropy. Weekly sprints become feasible, but 50-60% of experiments may still be thrown away if they do not solve a real user problem [^2]. Chen’s warning is similar: velocity can explode, but so can system drift, which means teams need product-level pruning and refactoring—not just more output [^1].

**Why it matters:** PM leverage is moving away from writing exhaustive specs and toward defining interaction quality, guardrails, and what should be kept versus discarded.

**How to apply:** Use this pattern when the product is primarily agentic. Vadali explicitly says UI-heavy work still benefits from more traditional experience design and sequencing [^2].

### 2) Faster building is making adoption and alignment the real bottlenecks

> "Your roadmap feels like progress to you. To customers, it feels like the product won’t sit still. That gap is where adoption goes to die." [^3]

Hiten Shah extends the same idea: building is faster than ever, but customers cannot absorb change at the same pace, which makes **marketing load-bearing again** [^4]. Inside teams, practitioners describe a parallel failure mode: alignment is usually strong at the start of a project, then weakens as changes accumulate, and updates do not fully carry across tools like Jira, Notion, and Figma when work moves quickly [^5][^6].

**Why it matters:** AI can increase product output without increasing customer understanding or internal coherence.

**How to apply:** Treat customer communication and cross-functional re-alignment as part of delivery work, not cleanup work after shipping.

### 3) AI products now need continuous stewardship, not launch-and-leave ownership

John Cutler argues that understanding the return on product and engineering investment is **not** a one-time calculation. It is a set of behaviors practiced continuously over time [^7]. That discipline starts with constant leverage questions—are we solving the right problem, can we do it with less complexity, can we avoid hiring around dysfunction—and compounds over time [^7]. He also warns against weak proxies such as flow metrics, revenue-per-engineer benchmarks, or short-term vanity measures [^7].

His suggested operating model is Bayesian: start with priors, define leading indicators, review the evidence quarterly, and update confidence without demanding false certainty [^7]. It also requires hiring discipline; Cutler notes that money is the easy part, while turning money into valuable software is hard, and adding people rarely fixes a broken team [^7].
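
Cutler's operating model is a mindset, not a formula, but the quarterly loop of priors, leading indicators, and evidence reviews can be sketched as a toy Beta-prior update. Everything below (the prior, the indicator values, the function name) is a hypothetical illustration, not something from the source:

```python
# Toy sketch of a "Bayesian" quarterly review: track confidence that an
# investment is paying off by folding each quarter's leading-indicator
# results (1 = hit, 0 = missed) into a Beta(hits, misses) belief.

def update_confidence(prior_hits, prior_misses, quarter_results):
    """Fold one quarter of leading-indicator results into the belief."""
    hits = prior_hits + sum(quarter_results)
    misses = prior_misses + len(quarter_results) - sum(quarter_results)
    confidence = hits / (hits + misses)  # posterior mean: P(it is working)
    return hits, misses, confidence

# Start with a weakly optimistic prior, then review evidence each quarter.
hits, misses = 2, 1
for quarter in [[1, 1, 0, 1], [1, 0, 0, 0]]:
    hits, misses, confidence = update_confidence(hits, misses, quarter)
    print(f"confidence the investment is working: {confidence:.2f}")
```

The point is the shape of the loop, not the arithmetic: confidence moves gradually with evidence, and a weak quarter lowers it without demanding false certainty either way.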

That maps directly to current AI product operations. PMs in the field describe grappling with subjective output quality, systems that work in demos but fail unpredictably in production, users who say an AI feature is "not helping" without specifics, and cases where engineering metrics improve while support tickets worsen [^8]. Operators running real agentic apps report preview-environment failures, daily micro-hallucination maintenance, and regressions caused by silent model changes even when the code stays the same [^9].

**Why it matters:** AI products behave more like systems that require active stewardship than features that can be launched and left alone.

**How to apply:** Budget owner time for evals, maintenance, and quarterly evidence reviews—not just for shipping new capability.

### 4) Product org change works better through thin slices than copied frameworks

Cutler’s broader organizational point is that rigid frameworks fail because product work is inherently messy and context-specific [^10]. He warns that copying the current structure of a successful company rarely works, because those companies themselves have swung repeatedly between centralized and decentralized structures, monoliths and microservices, and documentation-heavy versus emergent cultures [^10].

The more durable change pattern described here is **top-down air cover plus bottom-up support**, with middle managers facilitating rather than carrying the whole transformation alone [^10]. ThoughtWorks describes a similar method: start with problem discovery, shape a custom path, and use pilots or small groups to prove value and create local champions before broader rollout [^10].

**Why it matters:** AI-driven process change is easy to oversell and hard to institutionalize if it arrives as a big-bang template.

**How to apply:** Start with one workflow, one team, or one pilot. Use local wins to justify broader change.

## Tactical Playbook

### 1) Write a Product Experience Document when the product is primarily agentic

1. **Start with the problem and the user segment**, not the system or feature list [^2][^11].
2. **Let engineering show the first prototype directly in the codebase** so PM and design can see the actual capability edge before over-specifying it [^2].
3. Document the **Why** and explicit **success criteria**. Vadali’s version includes both quantitative measures and qualitative bars, including whether the system can generate insights comparable to a real 15-minute user conversation [^2].
4. Add **experience principles** that define how the interaction should feel—for example, one question at a time, or breakpoints when the user surfaces something important [^2].
5. Include **example good, bad, and anti-pattern interactions**. Non-deterministic systems need ranges and guardrails, not just an ideal answer [^2].
6. Define **critical moments** where the agent should stop, probe, and go deeper, and specify how the conversation should close to encourage repeat use [^2].
7. Hand the document to engineering as something both humans and coding agents can use. Vadali says engineers fed these principles into Claude to generate prompts [^2].

**Why it matters:** This gives agentic products structure without freezing discovery.

### 2) Measure AI features across four layers, then wire in failure handling

1. Track **model metrics** such as recall or hallucination rate [^11].
2. Track **latency metrics** such as p95 response time [^11].
3. Track **user metrics** such as CSAT or percentage resolved without escalation [^11].
4. Track **business metrics** such as retention or revenue [^11].
5. Map **failure modes** and human handoff rules up front: model down, latency above 30 seconds, or repeated user questioning all route to a human in the example shared here [^11].
6. Add real-world checks beyond dashboards. Practitioners are still relying on random output checks and support-ticket review because users often say an AI feature is not helping without isolating the failure [^8].
7. Re-run checks whenever prompts or models change, since regressions can appear without any code change [^8][^9].
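
The handoff rules in step 5 can be wired in as a simple routing check. This is a hedged sketch, not the source's implementation; the thresholds and parameter names are illustrative assumptions:

```python
# Route an AI response to a human when any failure-mode rule trips:
# model down, latency above 30 seconds, or repeated user questioning.

LATENCY_LIMIT_S = 30
REPEAT_QUESTION_LIMIT = 2

def should_hand_off(model_up, latency_s, repeated_questions):
    """Return (hand_off, reason) for a single AI response."""
    if not model_up:
        return True, "model down"
    if latency_s > LATENCY_LIMIT_S:
        return True, "latency above 30s"
    if repeated_questions >= REPEAT_QUESTION_LIMIT:
        return True, "user asked the same thing repeatedly"
    return False, "ok"

print(should_hand_off(model_up=True, latency_s=42.0, repeated_questions=0))
# -> (True, 'latency above 30s')
```

The design choice worth copying is that each rule returns a named reason, so support tickets and dashboards can count *why* handoffs happen, not just how often.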

**Why it matters:** AI can look strong in demos and still fail unpredictably in production [^8]. This turns quality from a vibes-based discussion into an operating loop [^8].

### 3) Tighten the definition of done before AI-assisted development starts

1. Write **tests or clear acceptance criteria before coding**. In one discussion, two very different teams both roughly doubled output, but the common factor was a precise definition of done [^12].
2. Reduce ambiguity aggressively. One practitioner argues that widespread distrust of AI-generated code is often a **clarity problem**, not an AI problem [^12].
3. Keep scope **small and isolated**. The most sustainable PM coding reps in this set are copy changes, frontend tweaks, config fixes, and tracking events—not full features [^13].
4. Frame AI-generated PRs as **drafts**, not finished work, and include the prompts so reviewers can understand the intent behind the diff [^13].
5. Write a detailed **PR description** that answers the reviewer’s likely questions before they open the files [^13].
6. Review AI output like a **design comp** and look for unrequested changes, because seemingly helpful edits are often the hidden risk [^13].
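
Step 1's "tests before coding" can be as small as one executable acceptance test. The ticket, function name, and numbers below are hypothetical; the point is that the definition of done exists before any AI-generated code does:

```python
# Acceptance criteria written first, as a runnable test: promo codes never
# discount more than 20%, and an unknown code leaves the price unchanged.

def test_discount_is_capped_at_20_percent():
    assert apply_promo(100.0, "SPRING20") == 80.0
    assert apply_promo(100.0, "BOGUS") == 100.0

# The AI-generated implementation is then judged against the test, not vibes.
def apply_promo(price, code):
    discounts = {"SPRING20": 0.20}  # no code maps above the 0.20 cap
    return price * (1 - discounts.get(code, 0.0))

test_discount_is_capped_at_20_percent()
```

A vague ticket plus AI produces confidently wrong code; a test-shaped ticket turns the same AI draft into something a reviewer can verify in seconds.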

**Why it matters:** Vague tickets that once cost a day of back-and-forth can now create hundreds of lines of confidently wrong code [^12].

### 4) Use AI to widen discovery, but keep real users in the loop

1. Start with **real conversations**. The products resonating most strongly in this set still came from actually speaking to users [^2].
2. Look for patterns after a **small number of interviews**. Vadali says six conversations were often enough to start seeing a real pattern [^2].
3. Use AI search across places like **Reddit** and **G2** to scale early problem discovery and collect citations faster [^2].
4. Use **synthetic personas** only for early, revert-to-the-mean feedback. Vadali says they are useful, but only around 50-60% of what real interviews provide today [^2].
5. Ask **negative questions**. "Why would you click buy?" invites pleasing answers; "what would make you pause before clicking?" surfaces more useful friction [^2].
6. If your product is API-first, remember that some users may now be **AI agents using your API**, not just humans [^14].

**Why it matters:** AI can widen the top of the discovery funnel, but it does not replace direct user contact.

## Case Studies & Lessons

### 1) Zoom is extending from meetings to work outcomes

Zoom’s CPO argues that the meeting itself is an **ephemeral moment**. Users care about pre-meeting scheduling and coordination, asynchronous chat and email between meetings, and the documents, spreadsheets, or slide decks they need afterward [^15]. That is why Zoom is broadening its surface area around the full work lifecycle rather than competing on meeting quality alone [^15].

It is also doing so through a **coexistence strategy**. Zoom is integrating with ecosystems like Google, Microsoft, and Salesforce because switching costs are high, and it needs to add value even when users do not fully migrate [^15]. Its AI Companion 3.0 is positioned as agentic retrieval across first- and third-party tools that can produce work products, not just answers [^15].

[![Zoom CPO on Rebuilding the Future of Work with Agentic AI](https://img.youtube.com/vi/sSFIG1fDOkU/hqdefault.jpg)](https://youtube.com/watch?v=sSFIG1fDOkU&t=472)
*Zoom CPO on Rebuilding the Future of Work with Agentic AI (7:52)*

**Key takeaway:** the opportunity is not the meeting itself; it is the job around the meeting. Zoom is also measuring **depth** of AI engagement, not just frequency—for example, whether users move from passive summaries to deeper research and document creation [^15]. The same speaker says AI efficiency should be spent on better quality and stronger strategic direction, not just more velocity [^15].

### 2) Prototype-first experience design can work—if you are willing to throw work away

Vadali describes a workflow where the team no longer starts with a PRD. It starts from a problem, asks engineers to prototype directly on the codebase with AI tooling, then lets PM and design shape the product experience after seeing what is possible [^2]. The practical reason is speed: with weekly sprints, the PM and designer had become the bottleneck when they tried to spec everything up front [^2].

The discipline is just as important as the speed. The team throws away 50-60% of what it builds if it does not solve a real user problem [^2].

**Key takeaway:** prototype-first only works if the team is comfortable discarding work and if the product is mostly agentic. Vadali explicitly says this is a worse fit for UI-heavy products that still require tighter UX sequencing and design control [^2].

### 3) Internal agents can create coverage humans cannot—but they drift

One operator story here includes an "AI VP of Marketing" built on roughly five years of attendee and sponsor revenue data—about $100 million worth—which generates year-over-year and week-over-week analysis, daily Slack and email check-ins, graphs, and proactive prompts [^9]. A separate "AI VP of Customer Success" follows up on onboarding deliverables, detects placeholders or incomplete assets, and sends neutral reminders that humans often struggle to send consistently [^9].

> "You do not need to be technical, but you need someone pretty damn product savvy to maintain it." [^9]

The trade-off is operational. The team describes preview-environment outages, daily micro-hallucinations that require about 15 minutes of maintenance, and silent model changes that caused anomalous outputs even when the underlying code did not change [^9].

**Key takeaway:** agentic apps can create real value and full-process coverage, but they behave more like living systems than shipped features. Ownership, evals, and maintenance time have to be planned up front.

## Career Corner

### 1) AI PM interviews are moving from product design to AI system design

Aakash Gupta argues that classic prompts like "design a pencil for the blind" are giving way to **AI system design** rounds for AI PM jobs [^16]. These rounds focus on data pipelines, model trade-offs, orchestration, agent architecture, and failure modes—not just product framing [^11][^16]. The weighting is notably technical: **technical fluency** 30-40%, **system architecture** 25-30%, **product judgment within constraints** 20-25%, and **trade-off articulation** 10-15% [^16].

This is also explicitly **not** a software-engineering system design interview. The ask is deeper than standard product design, but not about load balancers or database sharding [^16].

> "If you spend more time on personas than on your system diagram, you will not pass this round." [^16]

**How to apply:** Practice drawing the system live, get comfortable explaining when a traditional ML model beats an LLM, and surface trade-offs before you are prompted [^11][^16]. The stakes are high: the roles discussed here are described at $500K-$800K+ total compensation, with staff-level packages clearing $1M with equity [^16].

### 2) Product sense is becoming a cross-functional hiring requirement

Zoom’s CPO argues that as AI takes over more formatting and implementation work, the durable human skills become **strategy, organizing people, and asking the right questions** [^15]. He also describes research increasingly drawing on telemetry, customer-service insights, and sentiment; specs written from early stakeholder conversations; and design systems feeding prototype automation [^15]. As those activities compress, the bottleneck shifts upstream to strategic direction and outcome-driven decision making [^15].

Vadali pushes the implication further: he would make **product sense** a required part of the interview loop for everyone shipping code, not just PMs, and says AI-native candidates with strong product sense can fit multiple roles [^2].

**How to apply:** Keep doing product critiques, ask candidates to explain a favorite product and why they like it, and stay close to users so feature speed does not outrun judgment [^2].

### 3) Small coding loops are becoming a practical PM skill builder

The most grounded advice in this set is narrow: use AI to work on **small PRs**, frame them as drafts, include the prompts, and review the diff carefully for unintended changes [^13]. The same note warns that trying to ship full features this way often adds work for engineering rather than removing it [^13].

**How to apply:** Treat these loops as judgment training, not role replacement. The reported payoff is a much faster feedback cycle—review comments can turn into a revised draft in minutes rather than a full day [^13]. For interview practice, Gupta also shared a full mock AI system design video: [watch the mock here](https://www.youtube.com/watch?v=eUFnulsUIBg) [^11].

## Tools & Resources

### 1) The Product Experience Document (PXD)

For agentic products, the PXD structure here is worth copying: **Why**, **Success Criteria**, **Experience Principles**, **Example Interactions**, **Critical Moments**, and **Close Conversation** [^2]. It is designed to be usable by both engineers and coding agents [^2]. A strong starting point is the source interview: [The document that can replace PRDs](https://www.youtube.com/watch?v=wMwrPT4rHDA) [^2].

### 2) Claude-based low-risk PM coding reps

The most practical usage described here is not big features; it is small, isolated PRs like copy edits, config fixes, and tracking events, with prompts attached and reviewer context written up front [^13]. That makes this a useful practice loop for PMs trying to build AI fluency without creating review debt.

### 3) Replit, Lovable, and Vercel v0 for fast B2B/AI prototyping

SaaStr’s operating experience is that non-technical builders can now get real B2B and AI apps into production on platforms like Replit, Lovable, and Vercel v0 [^9]. The caveat is maintenance: once the app becomes complex, someone product-savvy still needs to own it [^9].

### 4) A lightweight localization workflow

One concrete example: a team used Replit to add a translation toggle backed by OpenAI, then QA’d the output with Claude and Cowork [^9]. The reported implementation time was about 20 minutes for Chinese and Spanish, versus a much heavier manual localization process [^9].

### 5) Two books worth adding to the PM shelf

John Cutler’s conversation recommends **Viral Change** for organizational transformation and **Sales Pitch** for buyer-centric framing and helping customers buy rather than just pushing features [^10].

---

### Sources

[^1]: [𝕏 post by @andrewchen](https://x.com/andrewchen/status/2044476929374622082)
[^2]: [The document that can replace PRDs — Rags Vadali](https://www.youtube.com/watch?v=wMwrPT4rHDA)
[^3]: [𝕏 post by @hnshah](https://x.com/hnshah/status/2044552950497042804)
[^4]: [𝕏 post by @hnshah](https://x.com/hnshah/status/2044438096603644299)
[^5]: [r/ProductManagementJobs post by u/PerformerAny3503](https://www.reddit.com/r/ProductManagementJobs/comments/1smbq3u/)
[^6]: [r/ProductManagementJobs post by u/Expensive_Entry_69](https://www.reddit.com/r/ProductManagementJobs/comments/1smbrjv/)
[^7]: [TBM 416: Investment Stewardship \(As Habit\)](https://cutlefish.substack.com/p/tbm-416-investment-stewardship-as)
[^8]: [r/prodmgmt post by u/print2ssk](https://www.reddit.com/r/prodmgmt/comments/1smqggq/)
[^9]: [Our AI Agent Outages, Hallucinations, and Upsell Traps | The Agents Ep. 1](https://www.youtube.com/watch?v=fCkj9eRScAc)
[^10]: [Navigating the messy middle: A guide to real product organization transformation](https://www.youtube.com/watch?v=gepDpZ1Idlc)
[^11]: [Substack note by @aakashgupta](https://substack.com/@aakashgupta/note/c-244206134)
[^12]: [r/ProductManagement post by u/arapkuliev](https://www.reddit.com/r/ProductManagement/comments/1sm6hwp/)
[^13]: [Substack note by @aakashgupta](https://substack.com/@aakashgupta/note/c-243988585)
[^14]: [𝕏 post by @andrewchen](https://x.com/andrewchen/status/2044523650515407268)
[^15]: [Zoom CPO on Rebuilding the Future of Work with Agentic AI](https://www.youtube.com/watch?v=sSFIG1fDOkU)
[^16]: [Product Design Questions Are Dead. Here's What Replaced Them.](https://www.news.aakashg.com/p/ai-system-design-interview-your)