Chapter 01 · Module 02 · Beginner–Intermediate · 24–28 min

Chapter 1: Model Families and the Competitive Landscape — The PM Version

Map foundation families, tiers, and deployment paths — compare models like infrastructure, not fandom.

Book: AI Learning

Fit beats fame — your memo starts with constraints, not headlines

Introduction

In Module 01, Chapter 11, we closed the mechanics loop: cutoffs, private knowledge, reasoning limits, and why the product stack must own truth — not the model alone.

This module shifts from how models fail to which models you pick and what that choice costs. You do not need a favorite lab. You need a decision framework that survives the next release cycle.

Compare models like infrastructure choices — not brand fandom.

The simple PM version

Use case → Constraints → Eval → Tier → Vendor path.
The “best model” headline is marketing; your memo is engineering plus economics.

1. Why PMs Need a Model Landscape Map

Engineers read release notes. Leadership reads headlines. Finance reads invoices. PMs sit in the middle — translating capability into scope, cost into roadmap, and risk into UX.

Without a landscape map, teams default to:

  • whatever the last demo used,
  • whatever procurement already signed,
  • whatever scored highest on a benchmark that tests tasks none of your users perform.

A landscape map is not a vendor brochure. It is a taxonomy: who builds foundation models, how they are licensed, what tiers exist, and which dimensions actually move your product.

PM takeaway

Your job is to name the decision dimensions before the sprint picks a model ID from habit.

2. Foundation Model Families — The PM View

A family is a line of models from one builder with a shared training philosophy, safety posture, and API shape. Model names change with nearly every release; families are the stable mental buckets.

Family | Builder lineage (PM shorthand) | Typical product association
GPT | OpenAI | Broad general assistant, tool-rich APIs, enterprise adoption
Claude | Anthropic | Long documents, careful tone, enterprise policy workflows
Gemini | Google | Workspace-adjacent products, multimodal in Google stack
Llama | Meta (open weights) | Self-host, customize, air-gapped or cost-controlled inference
Mistral | Mistral AI | Efficient open and hosted options, EU footprint considerations

Families overlap in capability. Differences show up in how you deploy: hosted API vs your VPC, default refusals, context packaging, tool schemas, and commercial terms — not in who “wins” a generic IQ contest.

PM takeaway

Learn families as deployment and policy buckets first; benchmark scores second.

3. Closed APIs vs Open-Weight Models

Closed API models run on the provider’s stack. You get fast iteration, managed safety layers, and published SLAs, in exchange for data-handling terms and per-token pricing that you do not control.

Open-weight models ship weights you can host. You gain deployment control, fine-tuning freedom, and potentially lower marginal cost at scale — with engineering ownership for inference, patching, evals, and guardrails.

Dimension | Closed API | Open-weight (self/hosted)
Time to first prototype | Usually faster | Slower; infra required
Data residency | Contract + region choices | You define boundary
Customization | Prompting, tools, limited fine-tune | Fine-tune, distill, route locally
Cost at huge volume | Negotiate enterprise; still per call | CapEx/OpEx tradeoff; unit cost can fall
Upgrade path | Provider deprecates models | You own migration testing

PM takeaway

Open-weight is not “free.” It shifts spend from API bills to platform teams, GPUs, and reliability work.
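
To make that shift in spend concrete, here is a rough break-even sketch. Every figure is a hypothetical placeholder rather than a vendor price, and the model is deliberately simple: it treats self-hosting as a fixed monthly cost (GPUs plus platform staff) and ignores any per-token cost on your own hardware.

```python
def monthly_api_cost(tokens_per_month: int, usd_per_million_tokens: float) -> float:
    """Pay-per-token spend on a closed API."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def monthly_self_host_cost(gpu_usd: float, platform_staff_usd: float) -> float:
    """Fixed monthly spend for self-hosting: GPUs plus the people who run them."""
    return gpu_usd + platform_staff_usd


def break_even_tokens(fixed_usd: float, usd_per_million_tokens: float) -> float:
    """Monthly token volume at which the API bill equals the fixed self-host cost."""
    return fixed_usd / usd_per_million_tokens * 1_000_000


# Hypothetical inputs for illustration only; substitute your own quotes.
API_PRICE = 5.0        # USD per million tokens, blended input/output
GPU_COST = 20_000.0    # USD per month
STAFF_COST = 30_000.0  # USD per month

fixed = monthly_self_host_cost(GPU_COST, STAFF_COST)
threshold = break_even_tokens(fixed, API_PRICE)
print(f"Self-hosting breaks even near {threshold:,.0f} tokens/month")
```

Below the threshold the API is the cheaper path; above it, self-hosting starts to pay back, provided the platform team can actually deliver the reliability the API gave you.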

4. Frontier, Mid, and Small Tiers

Providers ship multiple sizes per family. PMs should think in tiers, not a single model name:

  • Frontier: hardest tasks, highest cost and latency, broadest modality support; use sparingly on critical paths.
  • Mid: default production tier for most user-facing reasoning with guardrails.
  • Small: classification, routing, extraction, high-volume chat turns; pairs well with cascade architectures.

Tier strategy is a product decision: route easy work to small models, escalate on uncertainty signals, and cap frontier usage with budgets and feature flags.
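
The routing rule in that paragraph can be sketched in a few lines. This is a minimal illustration, not a production router: the confidence score, the thresholds, and the budget object are all hypothetical names introduced here, and a real system would tune them against its own evals.

```python
from dataclasses import dataclass


@dataclass
class FrontierBudget:
    """Caps how many requests per period may reach the frontier tier."""
    max_calls: int
    used: int = 0

    def try_spend(self) -> bool:
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True


def pick_tier(confidence: float, budget: FrontierBudget,
              frontier_enabled: bool = True,
              small_ok: float = 0.85, mid_ok: float = 0.60) -> str:
    """Route on a hypothetical 0..1 confidence signal for the cheap answer.

    The thresholds are placeholders to tune on your own eval set.
    """
    if confidence >= small_ok:
        return "small"
    if confidence >= mid_ok:
        return "mid"
    # Escalate only if the feature flag is on and budget remains.
    if frontier_enabled and budget.try_spend():
        return "frontier"
    return "mid"  # degrade gracefully instead of failing the request


budget = FrontierBudget(max_calls=1)
print([pick_tier(c, budget) for c in (0.95, 0.70, 0.30, 0.20)])
```

The last request falls back to the mid tier because the frontier budget is already spent, which is the behavior a PRD should state explicitly.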

PM takeaway

Spec tier routing in the PRD — not “we’ll use the big model everywhere and optimize later.”

5. General vs Specialized Models

General models aim for broad language, reasoning, and tool use across domains. Specialized models target a single modality or task: code completion, embeddings, reranking, speech, OCR, or variants tuned for medical or legal text.

When general wins | When specialized wins
Multi-step assistant with tools | Embedding search at billion-scale
Policy Q&A with RAG | On-device wake-word classifiers
Draft-and-edit copilots | Deterministic code completion in IDE

Anti-pattern: forcing a general frontier model to do cheap perception work (e.g., simple layout OCR) because it is the only model ID in the repo.

PM takeaway

Map each workflow step to the cheapest competent model type — general or specialized.
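
One way to read "cheapest competent" is as a filter followed by a minimum: drop every candidate that misses the quality bar for the step, then take the lowest-cost survivor. The sketch below assumes you already have per-step pass rates from your own evals; the candidate names, prices, and scores are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str                # hypothetical label, not a real model ID
    kind: str                # "general" or "specialized"
    usd_per_1k_calls: float
    pass_rate: float         # measured on your own eval set for this step


def cheapest_competent(candidates: list[Candidate],
                       min_pass_rate: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate that clears the quality bar, if any."""
    competent = [c for c in candidates if c.pass_rate >= min_pass_rate]
    return min(competent, key=lambda c: c.usd_per_1k_calls, default=None)


# Hypothetical candidates for a simple layout-OCR step.
ocr_step = [
    Candidate("frontier-general", "general", 40.0, 0.98),
    Candidate("ocr-specialist", "specialized", 2.0, 0.97),
    Candidate("tiny-general", "general", 1.0, 0.80),
]
choice = cheapest_competent(ocr_step, min_pass_rate=0.95)
print(choice.name if choice else "no candidate meets the bar")
```

With these made-up numbers the specialized model wins: the frontier model is competent but far more expensive, and the tiny general model is cheap but below the bar.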

6. PM Comparison Table — Dimensions That Matter

Do not maintain a static “winner” column. Maintain fit against your constraints. Refresh cells when you run evals — not when a blog post drops.

Dimension | What to ask | Why PMs care
Task quality | Pass rate on your eval set? | Defines MVP bar and regression gates
Context | Enough window for real prompts + RAG? | Affects architecture from Chapter 9
Latency | p95 at target output length? | UX and agent loop cost
Cost | $ per successful outcome at projected volume? | Unit economics and tier routing
Safety / refusal | False refusals vs leaks on your policy? | Support load and compliance
Tool use | Reliable function calling for your schemas? | Agent features live or die here
Data terms | Training opt-out, retention, regions? | Legal and enterprise sales
Operability | Observability, versioning, fallbacks? | Incident response and A/B tests
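
Two of these cells are simple arithmetic once you have eval data. The sketch below shows one plausible way to compute them: cost per successful outcome as total spend divided by passing attempts, and p95 latency using the nearest-rank method. The sample numbers are hypothetical.

```python
def cost_per_success(total_cost_usd: float, successes: int) -> float:
    """Spend divided by the number of attempts that met the quality bar."""
    if successes <= 0:
        raise ValueError("no successful outcomes to divide by")
    return total_cost_usd / successes


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of the observed latencies."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


# Hypothetical eval run: 1,000 attempts per candidate.
cheap = cost_per_success(total_cost_usd=10.0, successes=400)   # 40% pass rate
pricey = cost_per_success(total_cost_usd=16.0, successes=800)  # 80% pass rate
print(f"cheap model:  ${cheap:.3f} per success")
print(f"pricey model: ${pricey:.3f} per success")
print("p95 latency (ms):", p95([float(ms) for ms in range(100, 2100, 100)]))
```

In this invented example the model with the higher bill is cheaper per successful outcome, which is why the table asks about outcomes rather than price per call.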

PM takeaway

Publish this table inside your model selection memo — empty cells are risks you have not measured yet.
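
If the memo lives in a structured form, finding unmeasured cells can be automated. The following is a small sketch under that assumption; the dimension keys mirror the table rows, while the candidate names and cell values are placeholders.

```python
from typing import Optional

DIMENSIONS = ["task_quality", "context", "latency", "cost",
              "safety", "tool_use", "data_terms", "operability"]

# Hypothetical memo rows: None or a missing key means "not measured yet".
memo: dict[str, dict[str, Optional[str]]] = {
    "candidate-a": {"task_quality": "92% pass", "latency": "p95 1.8 s", "cost": None},
    "candidate-b": {d: "measured" for d in DIMENSIONS},
}


def unmeasured(memo_rows: dict[str, dict[str, Optional[str]]]) -> dict[str, list[str]]:
    """List, per candidate, every dimension that is missing or empty."""
    return {
        name: [d for d in DIMENSIONS if not cells.get(d)]
        for name, cells in memo_rows.items()
    }


for name, gaps in unmeasured(memo).items():
    print(name, "open risks:", gaps or "none")
```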

7. Competitive Landscape — Builders, Hosters, and Integrators

The market has three layers PMs should separate:

  1. Foundation builders — train base models (OpenAI, Anthropic, Google, Meta, Mistral, others).
  2. Hosters / clouds — sell inference endpoints, GPUs, and managed fine-tuning (hyperscalers, Together, Fireworks, etc.).
  3. Integrators — frameworks and gateways (LangChain, LlamaIndex, LiteLLM, internal platforms) that abstract providers.

Your product rarely commits to only one layer. You might call Anthropic’s Claude through Amazon Bedrock, serve Llama on your own cluster, and log everything through an internal gateway. The PM’s value is in making those layers explicit in architecture diagrams and cost attribution.
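
One way to make the layers explicit is to record them on every call and roll costs up by layer. The sketch below assumes a simple in-memory record; the field names, the gateway label, the SLA owners, and the costs are illustrative assumptions, not a description of any real platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InferencePath:
    builder: str    # who trained the model
    hoster: str     # who serves the endpoint
    gateway: str    # what abstracts the call
    sla_owner: str  # who is accountable when the call fails (assumed here)


@dataclass(frozen=True)
class CallRecord:
    path: InferencePath
    cost_usd: float


def cost_by_hoster(records: list[CallRecord]) -> dict[str, float]:
    """Attribute spend to the layer that actually serves each call."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.path.hoster] += record.cost_usd
    return dict(totals)


claude_on_bedrock = InferencePath("Anthropic", "Amazon Bedrock",
                                  "internal-gateway", "cloud provider")
llama_on_cluster = InferencePath("Meta", "own cluster",
                                 "internal-gateway", "platform team")
records = [
    CallRecord(claude_on_bedrock, 0.5),
    CallRecord(llama_on_cluster, 0.25),
    CallRecord(claude_on_bedrock, 0.25),
]
print(cost_by_hoster(records))
```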

PM takeaway

Ask “who owns the SLA for this call?” — builder, hoster, or your platform team.

8. The Wrong Question: “What Is the Best Model?”

Best for whom, on what task, under which constraints, measured how?

Public leaderboards answer benchmark authors’ tasks. Your users bring messy PDFs, acronyms, tool schemas, and compliance language. A model that tops a multiple-choice science suite may still fail your prior-auth letter workflow.

Replace “best” with:

  • Sufficient — meets quality bar with acceptable cost/latency.
  • Replaceable — abstraction layer so you can swap providers.
  • Observable — you can detect regressions when versions change.
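
"Replaceable" and "observable" both come down to a thin interface between product code and providers. Here is a minimal sketch using stub providers; the interface, class names, and model IDs are hypothetical and stand in for whatever SDKs your team actually wraps.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Minimal interface the product depends on (hypothetical)."""
    model_id: str

    def complete(self, prompt: str) -> str: ...


class StubProviderA:
    model_id = "provider-a/model-v1"  # placeholder, not a real model ID

    def complete(self, prompt: str) -> str:
        return f"[A] {prompt}"


class StubProviderB:
    model_id = "provider-b/model-v3"  # placeholder, not a real model ID

    def complete(self, prompt: str) -> str:
        return f"[B] {prompt}"


call_log: list[tuple[str, str]] = []


def answer(model: ChatModel, prompt: str) -> str:
    """Product code sees only the interface; every call records the model ID."""
    reply = model.complete(prompt)
    call_log.append((model.model_id, prompt))  # lets you trace regressions to a version
    return reply


print(answer(StubProviderA(), "Summarize the policy."))
print(answer(StubProviderB(), "Summarize the policy."))  # swapped without touching answer()
```

Because `answer` never names a provider, swapping one is a configuration change, and the logged model ID tells you which version produced any given output.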

PM takeaway

Ban the word “best” in steering meetings. Require “sufficient for workflow X with evidence Y.”

9. Qualities by Use Case — Fit Beats Fame

Use case pattern | Prioritize | Deprioritize
High-stakes Q&A with citations | Grounding, refusal calibration, audit logs | Creative temperature
Agent with many tool steps | Function-call reliability, latency at medium context | Poetic prose quality
High-volume triage | Small-tier cost, consistent JSON | Frontier reasoning
Document-heavy review | Effective context + retrieval strategy | Single-shot “read everything” myths
Regulated self-host | Data boundary, patch process, eval ownership | Day-one benchmark rank

PM takeaway

Write use-case-specific acceptance criteria before comparing model names in a spreadsheet.

10. New Model Announcement Checklist

When marketing drops a new flagship name, run this checklist before rewriting the roadmap:

  • What tier does it replace — frontier, mid, or small?
  • Context window and modality — does it change RAG vs long-context assumptions?
  • Pricing model — input vs output vs cached tokens; any batch endpoints?
  • Safety delta — more refusals or fewer on your policy eval?
  • Breaking changes — deprecated endpoints, new tool schema, different JSON modes?
  • Your eval suite — schedule regression on top 5 production workflows, not a demo prompt.
  • Rollback — can you flip a feature flag to the prior model ID in one deploy?
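
The last two checklist items can be combined into a simple promotion gate: compare the new model against the pinned one on your top workflows and keep the pinned ID unless nothing regresses. The sketch below is one possible shape; the model IDs, workflow names, pass rates, and tolerance are hypothetical.

```python
PINNED_MODEL = "vendor/model-2024-01"     # hypothetical ID currently in production
CANDIDATE_MODEL = "vendor/model-2024-06"  # hypothetical new release


def regressions(baseline: dict[str, float], candidate: dict[str, float],
                tolerance: float = 0.02) -> list[str]:
    """Workflows where the candidate's pass rate drops by more than tolerance."""
    return [
        workflow for workflow, base in baseline.items()
        if candidate.get(workflow, 0.0) < base - tolerance
    ]


def choose_model(baseline: dict[str, float], candidate: dict[str, float]) -> str:
    """Promote the candidate only if no workflow regresses; otherwise stay pinned."""
    return PINNED_MODEL if regressions(baseline, candidate) else CANDIDATE_MODEL


# Hypothetical pass rates on top production workflows.
baseline = {"prior_auth_letter": 0.91, "policy_qa": 0.88, "triage_json": 0.97}
candidate = {"prior_auth_letter": 0.94, "policy_qa": 0.80, "triage_json": 0.97}

print("regressed:", regressions(baseline, candidate))
print("serve:", choose_model(baseline, candidate))
```

Here the new model improves one workflow but regresses another, so the gate keeps the pinned ID. Since the served model is a single value, rolling back is a one-line change behind a flag.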

PM takeaway

Treat launches as migration projects with rollback — not as automatic upgrades.

11. Common PM Mistakes

Mistake | Why it hurts | Instead
Single-model mandate | No routing; runaway cost | Tier + cascade in architecture
Benchmark-driven roadmap | Optimizes tasks users never do | Product eval harness first
Ignoring data terms | Deal-blocker late in enterprise cycle | Legal review in selection memo
Confusing family with deployment | “We use Llama” but only via one vendor API | Document actual inference path
No version pin | Silent behavior drift in production | Pin model IDs; monitor regressions

Chapter Summary

Concept | PM understanding
Landscape map | Taxonomy of families, tiers, and deployment paths
Families | GPT, Claude, Gemini, Llama, Mistral as buckets, not religions
Closed vs open | Speed vs control; cost shifts to platform when self-hosting
Tiers | Frontier / mid / small for routing and economics
General vs specialized | Right model type per workflow step
“Best model” | Wrong question: ask for sufficient, replaceable, observable
Announcements | Checklist + eval + rollback before hype upgrades

Closing Thought

The landscape will keep adding names. Your advantage is a stable decision language: constraints, evals, tiers, and deployment paths. Chapter 2 applies that language to a hands-on comparison of GPT, Claude, Gemini, and Llama on your own prompts.

The real PM lesson

Model choice is portfolio management — not picking a winner on social media.
