Opening hook
The PM craft does not change. Discovery, strategy, prioritization, execution still apply. What changes is that one component of your product is non-deterministic, evolves between deployments without you shipping a new version, and fails in ways traditional QA was never designed to catch. If you treat an AI product like a regular software product, you will ship something that works on Monday and silently degrades by Friday. This lesson is about what is actually different.
Core teaching
The first principle: AI products are probabilistic. Traditional software either does the thing or returns an error. AI products do the thing well most of the time, badly some of the time, and confidently wrong occasionally. The PM's job is to specify what "well" means quantitatively, design for graceful failure, and build the evaluation discipline that lets the team know whether the product is getting better or worse over time. Drucker's management principles still apply (Drucker, 1954): solve real problems, validate evidence, ship outcomes, not outputs. The probabilistic substrate is the new constraint underneath those principles.
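One way to make "well" quantitative is a release gate on an eval set's pass rate. Because the rate is measured on a sample, the gate below compares the lower bound of a Wilson score interval against the target instead of the raw percentage. The 85% target and the case counts are illustrative assumptions, and the Wilson interval is one reasonable choice among several, not a prescribed method.

```python
import math


def wilson_interval(passes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("need at least one eval case")
    p = passes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


def meets_spec(passes: int, total: int, target: float) -> bool:
    """Pass the release gate only if the interval's lower bound clears the target."""
    lower, _ = wilson_interval(passes, total)
    return lower >= target


# Same 92% observed pass rate, different amounts of evidence:
print(meets_spec(46, 50, target=0.85))    # False: 50 cases cannot rule out a true rate below 85%
print(meets_spec(184, 200, target=0.85))  # True
```

The design point for the PM: a spec of "92% on the eval set" means little without saying how many cases back it up.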
The second principle: the model is a vendor dependency that updates without you. Anthropic releases a new Claude version. OpenAI deprecates an old model. Google cuts Gemini prices. Each of those events can change your product's behavior, its cost, or both. If you have not engineered for it, you find out in production. If you have, you have a model abstraction layer, an evaluation suite, a model upgrade playbook, and a rollback path. The PM is responsible for ensuring those exist. This is unlike most other vendor dependencies because the vendor is shipping changes to a non-deterministic system, not just adding features or fixing bugs.
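A minimal sketch of the abstraction layer, upgrade gate, and rollback path named above. The class names and the model IDs ("vendor-model-v1", "vendor-model-v2") are hypothetical placeholders, not any vendor's SDK; the point is that product code asks for a step, and only the registry knows which pinned model serves it.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRoute:
    """Pinned model version for one product step, plus the versions it replaced."""
    current: str
    history: list[str] = field(default_factory=list)

    def upgrade(self, new_model: str, evals_passed: bool) -> None:
        # Upgrade playbook: promote a new version only after it passes the eval suite.
        if not evals_passed:
            raise ValueError(f"{new_model} failed evals; staying on {self.current}")
        self.history.append(self.current)
        self.current = new_model

    def rollback(self) -> str:
        # Rollback path: restore the most recent previous version.
        if not self.history:
            raise RuntimeError("no earlier version to roll back to")
        self.current = self.history.pop()
        return self.current


class ModelRegistry:
    """Abstraction layer: product code asks for a step, never a hard-coded model ID."""

    def __init__(self) -> None:
        self._routes: dict[str, ModelRoute] = {}

    def register(self, step: str, model: str) -> None:
        self._routes[step] = ModelRoute(current=model)

    def route(self, step: str) -> ModelRoute:
        return self._routes[step]

    def model_for(self, step: str) -> str:
        return self._routes[step].current


# Hypothetical placeholder IDs; in practice, use the pinned, dated identifiers your vendor publishes.
registry = ModelRegistry()
registry.register("summarize", "vendor-model-v1")
registry.route("summarize").upgrade("vendor-model-v2", evals_passed=True)
registry.route("summarize").rollback()  # e.g. a regression surfaced in production
print(registry.model_for("summarize"))  # vendor-model-v1
```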
The third principle: evaluation is the PM's job, not just the engineer's. In traditional product, QA writes test cases and acceptance criteria are mostly deterministic. In AI product, the question "is this good enough" is statistical and the metric design is consequential. If you measure summarization quality with ROUGE, you reward overlap with the reference summary's wording, whether or not the summary is faithful. If you measure with LLM-as-judge, you reward whatever the judge model favors, which can include length or style. Both choices shape the product. The PM should be in the room when those choices are made and should understand them well enough to argue for or against them. Hamel Husain and Eugene Yan have argued that eval-driven development is a core discipline of AI engineering (Yan, 2023; Husain, 2024). The PM is the customer of those evals; if the PM does not own which evals get built, the product has no honest definition of "good."
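To see how ROUGE rewards word overlap rather than correctness, here is a deliberately simplified ROUGE-1 recall (whitespace tokens, no stemming; real ROUGE implementations differ in tokenization and report precision and F-measure as well). The example sentences are made up for illustration.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall: fraction of reference words also found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clip each word's credit at the number of times it appears in the reference.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


reference = "the board approved the budget on friday"
faithful_paraphrase = "directors signed off on the spending plan at the end of the week"
wrong_but_wordy = "the board rejected the budget on friday"

print(f"{rouge1_recall(faithful_paraphrase, reference):.2f}")  # 0.43
print(f"{rouge1_recall(wrong_but_wordy, reference):.2f}")      # 0.86
```

The factually wrong summary scores twice as high as the faithful paraphrase, which is exactly the kind of metric behavior a PM should be able to argue about.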
The fourth principle: the cost surface is different. Traditional software has mostly fixed costs, and serving one more request costs very little. AI products have variable cost per request, driven by tokens, model selection, retrieval scope, agent tool use, and how chatty users are. A user who asks five questions on Tuesday and five hundred on Thursday costs you a hundred times more on Thursday. Pricing strategy that was straightforward in SaaS becomes a margin engineering problem. McKenna's relationship-marketing work (McKenna, 1986) still informs how you position and package the product, but it predates per-request compute costs, so the margin arithmetic is new territory. PMs who do not internalize unit economics for AI ship products that look great in a free trial and lose money at scale.
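The Tuesday/Thursday arithmetic, written out. The per-million-token prices and the 2,000-input/500-output token profile are hypothetical placeholders, not any vendor's actual rates; substitute a current price sheet.

```python
# Hypothetical prices in USD per million tokens; check your vendor's current price sheet.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request; input and output tokens are usually billed at different rates."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def day_cost(questions: int, input_tokens: int = 2_000, output_tokens: int = 500) -> float:
    # Assumes every question costs the same; resending chat history would make later ones cost more.
    return questions * request_cost(input_tokens, output_tokens)


tuesday, thursday = day_cost(5), day_cost(500)
print(f"Tuesday ${tuesday:.4f}, Thursday ${thursday:.2f}, {thursday / tuesday:.0f}x")
```

The flat token profile is the optimistic case: if each turn resends the conversation so far, input tokens per question grow with conversation length and the heavy day costs even more.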
The fifth principle: the UX must communicate uncertainty. Traditional UX hides implementation details. AI UX must surface at least one of them: how far to trust a given answer. Confidence indicators, citations, hedging language, "did this answer your question" prompts, and graceful "I do not know" states are part of the design. Apple's Human Interface Guidelines for Apple Intelligence, Microsoft's HAX Toolkit, and Google's PAIR guidelines are all useful grounding for this work. The PM either chooses a posture (confident with sources, careful with hedges, conversational with disclaimers) or the engineering team picks defaults that may not match the brand or the user need.
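A sketch of what "choosing a posture" can look like once it is written down as rendering rules. It assumes the system already produces a confidence score between 0 and 1 and a list of citations; how that score is computed (a verifier model, retrieval scores, or something else) is outside this sketch, and the thresholds and copy are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelAnswer:
    text: str
    confidence: float  # assumed 0..1 score from your own pipeline
    citations: list[str] = field(default_factory=list)


def render(answer: ModelAnswer, min_confidence: float = 0.6, hedge_below: float = 0.85) -> str:
    """Hypothetical 'confident with sources' posture."""
    # Graceful "I do not know" state: low confidence or nothing to cite.
    if answer.confidence < min_confidence or not answer.citations:
        return ("I do not know enough to answer that reliably. "
                "Try rephrasing, or check the source documents directly.")
    # Hedging language for middling confidence.
    hedge = "" if answer.confidence >= hedge_below else "This may be incomplete. "
    sources = "; ".join(answer.citations)
    return f"{hedge}{answer.text}\nSources: {sources}\nDid this answer your question?"


print(render(ModelAnswer("Refunds take 5 business days.", 0.72, ["policy.pdf"])))
```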
The sixth principle: failure modes are weirder. Hallucination, prompt injection, jailbreaks, model drift, context window degradation, prompt-format sensitivity (a model behaving differently based on whitespace), tokenization quirks. These are not bugs in the traditional sense. They are properties of language models that PMs must plan for. A PM who responds to a hallucination report with "the engineers should fix the hallucination" has not understood the technology. The right response is a set of questions: What is our evaluation telling us about hallucination rates? What rate is acceptable? What is our mitigation strategy if rates spike? Is our UX communicating uncertainty appropriately?
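The "threshold for acceptable" and "if rates spike" questions can be made operational with a rolling-window monitor. This sketch assumes each response is flagged as unsupported or not by some upstream check (an eval, a grader, or user reports); the class name, window size, and 2% threshold are hypothetical.

```python
from collections import deque


class HallucinationMonitor:
    """Share of recent responses flagged as unsupported, over a fixed-size window."""

    def __init__(self, window: int = 500, threshold: float = 0.02) -> None:
        self.flags: deque = deque(maxlen=window)  # oldest flags drop off automatically
        self.threshold = threshold

    def record(self, flagged: bool) -> None:
        self.flags.append(flagged)

    @property
    def rate(self) -> float:
        return sum(self.flags) / len(self.flags) if self.flags else 0.0

    def should_mitigate(self) -> bool:
        # Wait for a full window so a few early flags do not trigger the playbook.
        return len(self.flags) == self.flags.maxlen and self.rate > self.threshold
```

What "mitigate" means (more hedging in the UI, a stronger model for the step, tighter retrieval) is a product decision the PM should settle before the alert ever fires.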
The seventh principle: discovery still matters and is harder. Continuous discovery (Torres), grounded in Schön's reflection-in-action (Schön, 1983), still applies, with one twist: users do not know what AI can do. They overestimate it (asking the model to do things outside its capability) and underestimate it (failing to use the model for things it would solve). User interviews about AI products require demonstrating capability before asking about needs. Otherwise the responses reflect the user's mental model of AI from the news, not their actual workflow.
The eighth principle: the team composition is different. AI products require AI engineers, ML engineers, sometimes applied scientists, plus the standard product team. The PM has to communicate across roles that have different vocabularies, different success metrics, and different planning cadences. We cover this in Lesson 1.4 (working with AI engineers) and Lesson 1.5 (working with applied scientists). The PM's first job is to learn the languages well enough to translate between them.
The ninth principle: the regulatory surface is changing. The EU AI Act's phased compliance deadlines have already started. Sectoral regulations in healthcare, finance, employment, and insurance are changing. US state laws are emerging. A PM whose product is classified as high-risk under the EU AI Act faces obligations that do not exist for traditional software: conformity assessments, post-market monitoring, transparency, registration. Course 6 covers this in depth. For now, the PM should know that "compliance" is a roadmap input, not a legal checkbox at the end.
The tenth principle: the field changes faster than your roadmap. Foundation model capabilities at the frontier improve every six to twelve months. Pricing changes every few months. Open-weight models move quickly. A roadmap built on the November 2025 capability frontier is likely out of date by April 2026. PMs who plan rigid 12-month roadmaps in this environment lose. PMs who plan with explicit assumptions about model capabilities, with tagged decisions to revisit when those assumptions change, win.
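"Tagged decisions" can be as simple as a data structure that maps each roadmap decision to the assumptions it rests on, so a new model release or price change produces a concrete revisit list. The decisions and assumption tags below are hypothetical examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    name: str
    assumptions: frozenset  # tags for the capability or pricing assumptions it rests on


def decisions_to_revisit(decisions: list, changed: set) -> list:
    """Return the names of decisions that depend on any assumption that just changed."""
    return [d.name for d in decisions if d.assumptions & changed]


roadmap = [
    Decision("Build custom OCR pipeline", frozenset({"frontier-models-weak-at-tables"})),
    Decision("Use small model for summaries", frozenset({"small-model-price", "summary-quality-ok"})),
    Decision("Human review for all refunds", frozenset({"regulatory-requirement"})),
]

# A new model release reads tables well: which roadmap items are now in question?
print(decisions_to_revisit(roadmap, {"frontier-models-weak-at-tables"}))
```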
AI-specific application
For the PM transitioning to AI products in 2026, the first 90 days should focus on building three muscles. First, evaluation literacy: read Hamel Husain's eval posts, build your own eval set on a real task, run a model swap and see what changes. Second, cost intuition: instrument a real workflow with token counters, build a cost-per-task spreadsheet, calculate margin at different price points. Third, capability calibration: spend an hour a week with frontier models on tasks adjacent to your product, note where they succeed and fail.
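For the "run a model swap and see what changes" exercise, the useful output is a per-case diff, not just two aggregate scores: a swap can keep the same pass rate while different cases break. This sketch assumes each eval run yields a pass/fail result keyed by case ID; the function name and case IDs are hypothetical.

```python
def swap_report(baseline: dict, candidate: dict) -> dict:
    """Compare per-case pass/fail results from two models on the same eval set."""
    if baseline.keys() != candidate.keys():
        raise ValueError("both runs must cover the same eval cases")
    return {
        "regressions": sorted(k for k in baseline if baseline[k] and not candidate[k]),
        "improvements": sorted(k for k in baseline if not baseline[k] and candidate[k]),
    }


baseline = {"case-01": True, "case-02": False, "case-03": True}
candidate = {"case-01": False, "case-02": True, "case-03": True}
# Both runs pass 2 of 3 cases, but case-01 regressed.
print(swap_report(baseline, candidate))
```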
The PMs who succeed at AI products are the ones who treat the model as a co-worker with specific strengths and specific limitations, not as a magic box or a search bar. They write product specs that say "for the summarization step, we use a small model because latency matters and quality is fine; for the verification step, we use a frontier model because correctness matters and we can spend the latency budget." They write evaluation criteria that the team can run automatically before every model swap. They write graceful failure paths into the user flow so the product does not break when the model has a bad day.
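The spec sentence above ("small model for summarization, frontier model for verification") can also live as data the team checks automatically. A minimal sketch, assuming hypothetical model tiers and per-step latency budgets; a real spec would also attach the eval criteria each step must pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSpec:
    step: str
    model_tier: str         # placeholder tiers: "small" or "frontier"
    reason: str             # the product rationale, kept next to the choice
    latency_budget_ms: int  # hypothetical per-step budget


PIPELINE = [
    StepSpec("summarize", "small", "latency matters; quality is fine", 800),
    StepSpec("verify", "frontier", "correctness matters; can spend the latency budget", 4_000),
]


def check_budget(pipeline: list, total_budget_ms: int) -> bool:
    """True if the per-step budgets fit inside the end-to-end latency budget."""
    return sum(s.latency_budget_ms for s in pipeline) <= total_budget_ms


print(check_budget(PIPELINE, total_budget_ms=5_000))
```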
Practice exercises
Pick an AI product you use (not yours). Identify its probabilistic component. Identify what happens when that component fails. Identify how the UX communicates uncertainty. Write a one-paragraph critique.
Calculate unit economics on a hypothetical AI feature. Pick a use case from your domain. Estimate input and output tokens per request (use tiktoken for OpenAI models, or the vendor's own tokenizer or token-counting tool). Multiply each by current Anthropic and OpenAI per-token prices, which are quoted separately for input and output. Estimate request volume per user. Identify margin at three price points: $10/month, $50/month, $200/month per user.
Write three differences between AI product PM and traditional product PM that you think will matter most for your career. Be specific. The list will be your own roadmap for what to learn first.
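For the unit-economics exercise, a minimal starting sketch. Every number in it (prices, tokens per request, 600 requests per user per month) is a hypothetical placeholder to replace with your own estimates, and "margin" here counts only model spend, not hosting, support, or other costs.

```python
# Hypothetical placeholders; replace with your estimates and current vendor prices.
INPUT_PRICE_PER_MTOK = 3.00    # USD per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per million output tokens


def monthly_cost_per_user(requests_per_month: int, input_tokens: int, output_tokens: int) -> float:
    """Model spend for one user in one month."""
    per_request = (input_tokens * INPUT_PRICE_PER_MTOK
                   + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000
    return requests_per_month * per_request


def gross_margin(price: float, cost: float) -> float:
    """Margin after model spend only, as a fraction of price."""
    return (price - cost) / price


cost = monthly_cost_per_user(requests_per_month=600, input_tokens=3_000, output_tokens=600)
for price in (10, 50, 200):
    print(f"${price}/month: margin {gross_margin(price, cost):.0%}")
```

Run it twice, once for a typical user and once for your heaviest users: with these placeholder numbers the $10 plan is already underwater, and the heavy tail usually decides whether a plan is viable.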
Knowledge check
Question 1. What is the core change AI products introduce to the PM craft, in one sentence?
a) The PM craft is irrelevant
b) The non-deterministic substrate requires statistical specification, evaluation discipline, and graceful failure design that traditional product QA was not built for [correct]
c) PMs must learn to code
d) AI products require no roadmap
Question 2. Why is model-vendor dependency different from typical SaaS dependencies?
a) It is not different
b) The vendor ships changes to a non-deterministic system, so vendor changes can change product behavior in ways that are not visible until production [correct]
c) Vendors are smaller
d) Vendors charge per seat
Question 3. What is the PM's role in evaluation for AI products?
a) None
b) The PM owns what "good" means quantitatively, what evals get built, and how results inform the roadmap [correct]
c) Only engineers handle evaluation
d) Evaluation is automatic
Question 4. Why is unit economics different for AI products?
a) It is not
b) Variable cost per request scales with tokens, model selection, retrieval scope, and user behavior, which can vary 100x between users on the same plan [correct]
c) AI products always have fixed cost
d) Tokens are free
Question 5. What does a UX that communicates uncertainty look like?
a) Hide the uncertainty
b) Confidence indicators, citations, hedging language, "I do not know" states, and "did this answer your question" prompts where appropriate [correct]
c) Always claim certainty
d) Use longer responses
Question 6. Why are user interviews harder for AI products?
a) They are not
b) Users have inaccurate mental models from news coverage, so capability must be demonstrated before workflow needs are surfaced [correct]
c) Users dislike AI
d) Users are unavailable
Question 7. Why do rigid 12-month roadmaps fail for AI products?
a) They do not fail
b) Frontier capabilities improve every six to twelve months and pricing and open-weight options move even faster, so roadmaps need explicit assumptions tied to capability checkpoints [correct]
c) AI products do not need roadmaps
d) PMs cannot plan
Slide deck outline
- Title slide: "Lesson 1.1: What's different about AI products"
- Hook: AI products fail in ways traditional QA did not anticipate
- The PM craft does not change; the constraints under it do
- Probabilistic substrate (Drucker, 1954)
- Model-vendor dependency that updates without you
- Evaluation as PM responsibility (Yan, Husain)
- Cost surface and unit economics (McKenna, 1986)
- UX that communicates uncertainty (Apple HIG, Microsoft HAX, Google PAIR)
- Failure modes: hallucination, injection, drift
- Continuous discovery for AI products (Torres adapted)
- The "demonstrate capability before asking" interview pattern
- Team composition and translation across disciplines
- Regulatory surface (EU AI Act preview, sectoral regulations)
- Capability frontier shifts and roadmap rhythm
- Three muscles to build in the first 90 days
- Eval literacy starter: Hamel Husain reading list
- Cost intuition starter: instrumenting a workflow
- Capability calibration starter: weekly hour with frontier models
- The model as co-worker, not magic box, not search bar
- Citations: Drucker, McKenna, Schön, Torres, Yan, Husain
- Practice exercises summary
- Transition to Lesson 1.2
Reference reading
- Drucker, P. F. (1954). The Practice of Management. Harper & Brothers: https://archive.org/details/practiceofmanage0000druc
- McKenna, R. (1986). The Regis Touch. Addison-Wesley: https://archive.org/details/registouchnewmar0000mcke
- Schön, D. A. (1983). The Reflective Practitioner. Basic Books.
- Yan, E. (2023). Patterns for Building LLM-based Systems & Products: https://eugeneyan.com/writing/llm-patterns/
- Husain, H. (2024). Your AI Product Needs Evals: https://hamel.dev/blog/posts/evals/
Transition
You now know what is different. Next: where the AI PM market sits and which roles within it match which backgrounds. Lesson 1.2 maps the AI PM landscape: roles, levels, companies.