Introduction
In Chapter 2, we saw how transformers became Large Language Models.
In Chapter 3, we covered tokens and context windows — the working memory of an AI system.
In Chapter 4, we introduced AI safety, RLHF, and Constitutional AI — why alignment matters and how human feedback shapes behavior.
In Chapter 5, we went deeper into InstructGPT and the RLHF pipeline that turned base models into instruction-following assistants.
Now we come to a decision every AI product manager will face:
Should we solve this with prompting, or should we fine-tune the model?
This question sounds technical. It is not only technical. It is a product strategy question. The answer affects cost, latency, quality, reliability, data requirements, release timelines, maintainability, evaluation design, and long-term scalability.
A weak product team jumps to one extreme: “Let’s fine-tune the model” or “We can solve everything with better prompts.” A strong product team asks: What behavior are we trying to improve, and what is the cheapest, safest, most maintainable way to improve it?
This chapter explains prompting vs fine-tuning from a PM lens — not as a machine learning tutorial, but as a practical product decision framework.
The simple PM version
Prompt first. Evaluate early. Retrieve knowledge. Use tools for exactness.
Fine-tune only when behavior is repeated, measurable, stable, and valuable.
1. The Simple Difference
Prompting means you guide the model at runtime. Fine-tuning means you change the model’s learned behavior through training.
| Approach | Simple meaning |
|---|---|
| Prompting | Tell the model what to do in the input |
| Fine-tuning | Train the model on examples so it learns the behavior |
| RAG | Retrieve external knowledge and give it to the model as context |
| Tools | Let the model call APIs, databases, calculators, or workflows |
These are not mutually exclusive. Good AI products often use a combination. A simple product may use only prompting; a mature enterprise product may use prompting, RAG, tools, fine-tuning, evals, monitoring, human review, and audit logs.
The key is knowing which layer solves which problem.
2. Prompting: The Fastest Way to Shape Behavior
Prompting is the first lever most product teams should use. A prompt gives the model instructions, context, examples, constraints, and output format.
Example: “Summarize this claim document in five bullet points. Highlight missing documents, policy risks, and next action. Do not make a final approval decision.” The model is not retrained. Its weights do not change. You are giving it better instructions for this request.
Why prompting is powerful
| Prompting strength | Product benefit |
|---|---|
| Fast to test | Quick iteration |
| No training data required | Good for early discovery |
| Easy to modify | PMs and SMEs can participate |
| Low upfront cost | Useful for MVPs |
| Works well with strong models | Good for many general tasks |
| Can include current context | Useful with RAG and tools |
| Transparent | Easy to inspect and debug |
For most early-stage AI features, prompting should be the default starting point.
PM takeaway
If you are still discovering the user problem, do not fine-tune first. Prompt first. Use prompting to understand what users ask, where the model fails, what context is needed, what format users prefer, what edge cases appear, and what evaluation criteria matter.
3. What Prompting Is Best For
Prompting works well when the behavior change is mostly about instruction, context, or output format.
| Use case | Why prompting works |
|---|---|
| Summarization | Model already knows how to summarize |
| Rewriting | Model already understands tone and language |
| Structured output | Format can be specified in prompt |
| Basic classification | Instructions and examples may be enough |
| Drafting emails | Strong models handle this well |
| Explaining concepts | Model already has general knowledge |
| One-off tasks | Fine-tuning would be overkill |
| Rapid experimentation | Prompts are easy to change |
Example: an AI feature that rewrites customer support replies in a more polite tone. Prompting may be enough: “Rewrite this reply in a polite, concise, professional tone. Keep all factual details unchanged.” No fine-tuning needed at the start.
Prompting works best when
| Condition | Meaning |
|---|---|
| Task is simple or moderately complex | The model already has the ability |
| Data is limited | No training set exists yet |
| Behavior changes often | Prompts are easier to update |
| Use case is exploratory | Product is still evolving |
| Context changes every request | RAG/prompting may be better |
| Risk is manageable | Human review can catch issues |
Prompting is flexible. That flexibility is its biggest advantage.
4. The Limits of Prompting
Prompting is not magic. At some point, prompts become too long, too fragile, or too hard to maintain.
You may see this when:
- the prompt becomes a giant rulebook,
- the model ignores some instructions,
- output format keeps drifting,
- examples do not fit in the context window,
- behavior changes across model versions,
- latency increases due to long prompts,
- token cost becomes high at scale,
- edge cases require many examples,
- or the same mistakes happen repeatedly.
Signs prompting is reaching its limit
| Symptom | What it means |
|---|---|
| Prompt is becoming very long | You are encoding too much behavior at runtime |
| Model keeps missing the same pattern | It may need training examples |
| Output format is inconsistent | Prompt may not be strong enough |
| You need many few-shot examples | Context window is being used as a training set |
| Latency is too high | Prompt size may be too large |
| Cost is rising | Repeated examples cost tokens every call |
| SMEs keep patching instructions | Product behavior is not stable |
| Different models behave differently | Prompt is brittle |
This is where fine-tuning may become relevant.
PM takeaway
Prompting is best for steering. Fine-tuning is better when you need behavior to become more native, repeatable, and consistent.
5. Fine-Tuning: Teaching the Model Through Examples
Fine-tuning means training a model on examples of the behavior you want. Instead of writing a giant prompt that says “Always respond like this…,” you give the model many examples showing: “When the input looks like this, the output should look like that.”
| Input | Desired output |
|---|---|
| Raw claim note | Structured claim summary |
| Customer complaint | Correct category and escalation priority |
| Internal SOP question | Approved style answer |
| Messy support message | Clean response in company tone |
| Product feedback | Feature theme classification |
Fine-tuning changes the model’s learned behavior for the target task. It is not just instruction. It is training.
PM analogy
Prompting is like giving an employee instructions before each task. Fine-tuning is like training the employee over time using repeated examples. If the task happens once, instructions are enough. If it happens thousands of times, training may be better.
Further reading
OpenAI: Fine-tuning guide — when and how to fine-tune models for your use case.
Weights & Biases: OpenAI fine-tuning integration — track training runs, datasets, and model versions.
6. What Fine-Tuning Is Best For
Fine-tuning works best when you need repeatable behavior across many similar tasks.
| Use case | Why fine-tuning helps |
|---|---|
| Consistent output format | Model learns the structure |
| Brand or domain tone | Model learns repeated style |
| Repeated classification | Model learns decision pattern |
| Domain-specific extraction | Model learns field patterns |
| Shorter prompts at scale | Less need for repeated examples |
| Lower latency | Smaller prompts can reduce request size |
| Smaller model specialization | A cheaper model may perform well on one task |
| Correcting repeated instruction failures | Model learns what prompt alone cannot fix |
Fine-tuning is especially useful when the model already has the general capability but does not perform the task consistently enough.
Example: customer support tone
Prompting: “Reply in our brand tone: warm, direct, calm, non-defensive, no over-apology.” This may work. But if the product generates thousands of replies every day and tone consistency is critical, fine-tuning on approved examples may produce more reliable behavior.
Example: claims summary format
Prompting: “Summarize the claim in this exact structure…” may work for a demo. If the model keeps missing fields or producing inconsistent labels, fine-tuning on hundreds or thousands of approved summaries may help.
7. Fine-Tuning Is Not for Adding Fresh Knowledge
This is one of the most important PM lessons. Fine-tuning is often misunderstood as a way to “upload knowledge into the model.” That is usually the wrong mental model.
If your problem is “The model does not know our latest policy, product catalog, SOP, or insurer guideline,” your first solution should usually not be fine-tuning. It should usually be RAG, database lookup, or tool use.
Business knowledge changes. Policies change. Pricing changes. SOPs change. Regulations change. Product features change. Customer data changes. Claim status changes. You do not want to retrain a model every time business knowledge changes.
PM rule
| Problem | Better first approach |
|---|---|
| Model needs latest policy | RAG |
| Model needs customer record | Database/API tool |
| Model needs claim status | System integration |
| Model needs live price | API lookup |
| Model needs latest regulation | Retrieval/search |
| Model needs current inventory | Tool call |
| Model needs internal SOP | RAG/document index |
Fine-tuning is better for behavior. RAG and tools are better for knowledge.
Clear mental model
| Need | Best fit |
|---|---|
| Teach behavior | Fine-tuning |
| Provide current facts | RAG |
| Fetch live data | Tool/API |
| Enforce business rules | Rule engine/workflow |
| Improve wording | Prompting or fine-tuning |
| Improve repeated format | Fine-tuning |
| Improve one-off answer | Prompting |
PM takeaway
Do not fine-tune when the problem is missing context. Pass the context.
8. Fine-Tuning vs Prompting vs RAG
A mature PM should not compare only prompting and fine-tuning. The real choice is usually between prompting, RAG, fine-tuning, tools, or a combination.
| Problem | Prompting | RAG | Fine-tuning | Tools |
|---|---|---|---|---|
| Need better instruction-following | Good first step | Not enough alone | Strong if repeated failure | Not relevant |
| Need current knowledge | Weak | Strong | Weak | Strong |
| Need consistent format | Good first step | Not enough | Strong | Not relevant |
| Need live system data | Weak | Maybe | Weak | Strong |
| Need domain tone | Good | Not enough | Strong | Not relevant |
| Need exact calculation | Weak | Not enough | Weak | Strong |
| Need repeated classification | Good first step | Maybe | Strong | Maybe |
| Need auditability | Prompt helps | Strong with citations | Needs evals | Strong with logs |
PM takeaway
Do not ask “Prompting or fine-tuning?” Ask: “Is the problem instruction, context, behavior, knowledge, calculation, or action?” Then choose the right layer.
9. The Evaluation-First Mindset
Before deciding to fine-tune, build an evaluation set. This is non-negotiable. If you cannot measure the model’s failure, you cannot prove fine-tuning helped.
A proper evaluation set contains representative inputs and expected behavior.
| Eval component | Example |
|---|---|
| Normal cases | Common user requests |
| Edge cases | Ambiguous, incomplete, messy inputs |
| High-risk cases | Compliance-sensitive examples |
| Format checks | JSON/table/field-level validation |
| Ground truth | Expert-approved expected outputs |
| Failure categories | Hallucination, wrong format, missed field |
| Scoring method | Human review, rubric, automated grader |
Without evals, teams argue based on vibes. One person says the fine-tuned model feels better; another says the prompt version is good enough. That is not product management. That is opinion management.
Before fine-tuning, define:
- What does good output mean?
- What are the top failure modes?
- What score are we trying to improve?
- What is the baseline prompt performance?
- How will we compare the fine-tuned model?
- What trade-offs are acceptable?
- What happens if fine-tuning improves one metric but hurts another?
PM takeaway
Fine-tuning without evals is gambling.
10. The Product Optimization Loop
A healthy AI product does not jump directly from prompt to fine-tune. It follows a loop.
| Step | Action |
|---|---|
| 1 | Define task and success criteria |
| 2 | Build baseline prompt |
| 3 | Run evals on real-world-like data |
| 4 | Analyze failure patterns |
| 5 | Improve prompt/context/RAG/tools |
| 6 | Re-run evals |
| 7 | Decide whether fine-tuning is justified |
| 8 | Build fine-tuning dataset |
| 9 | Train and evaluate |
| 10 | Monitor in production |
Many failures are not fine-tuning problems. They are product design problems.
| Failure | Real fix |
|---|---|
| Model gives outdated policy | RAG, not fine-tuning |
| Model misses customer data | API integration, not fine-tuning |
| Model cannot calculate correctly | Calculator/tool, not fine-tuning |
| Model output format drifts | Prompt first, fine-tune if repeated |
| Model tone is inconsistent | Prompt first, fine-tune if scale justifies |
| Model fails edge classification | Fine-tuning may help |
| Model hallucinates missing docs | Context and source grounding first |
PM takeaway
Fine-tuning should be a decision after diagnosis — not a reflex.
11. Data Readiness: The Hidden Cost of Fine-Tuning
Fine-tuning requires data — not just any data, but good data. That is where many teams fail. “We have a lot of data” is not the same as “We have clean, representative, labeled examples.”
Good fine-tuning data should be
| Requirement | Meaning |
|---|---|
| Representative | Matches real production inputs |
| High-quality | Outputs are correct and approved |
| Consistent | Similar cases are labeled similarly |
| Diverse | Covers normal and edge cases |
| Safe | Does not include unnecessary sensitive data |
| Versioned | Dataset changes are tracked |
| Reviewed | SMEs validate examples |
| Measurable | Supports evaluation |
Bad data trains bad behavior. If your historical data contains inconsistent human decisions, messy formatting, outdated policies, or operational shortcuts, fine-tuning may teach the model the wrong thing.
Questions to ask before fine-tuning
| Question | Why it matters |
|---|---|
| Do we have approved examples? | Model learns from examples |
| Are examples current? | Avoid outdated behavior |
| Are labels consistent? | Avoid confusing training signal |
| Are edge cases included? | Improve robustness |
| Has SME reviewed the dataset? | Ensure domain correctness |
| Is sensitive data minimized? | Reduce privacy risk |
| Is dataset versioned? | Support traceability |
| Is there a separate test set? | Avoid self-deception |
Fine-tuning is only as good as the dataset.
12. Fine-Tuning Can Reduce Prompt Length
One real benefit of fine-tuning is that it can reduce the amount of instruction and examples you need to send in every request. Without fine-tuning, you may need a long system prompt, many rules, several examples, detailed formatting instructions, tone guidance, and edge-case handling. With fine-tuning, some of that behavior can be learned by the model.
Shorter prompts may reduce token cost, latency, context clutter, and prompt maintenance burden.
Example
Before fine-tuning: a 2,000-token prompt with 8 examples and strict formatting instructions.
After fine-tuning: a 300-token prompt with the task and current context.
This can matter at scale.
| Condition | Why |
|---|---|
| High request volume | Token savings compound |
| Long repeated prompts | Fine-tuning can internalize examples |
| Latency matters | Shorter prompts can help |
| Format consistency matters | Learned structure reduces drift |
| Smaller model can be used | Cost may drop significantly |
PM takeaway
Do the math. Fine-tuning has upfront training and maintenance cost. The savings must justify the complexity.
13. Fine-Tuning Can Improve Consistency, Not Guarantee Truth
Fine-tuning can make outputs more consistent. But consistency is not the same as truth. A fine-tuned model may consistently produce the wrong answer if trained on poor data. It may consistently follow a structure while still hallucinating facts. It may consistently sound like your brand while missing policy nuance.
Example
A fine-tuned language model trained on old claim summaries may become very good at writing summaries in your preferred format. But if policy rules changed, it may still produce outdated reasoning unless it receives current context.
Fine-tuning improves behavior patterns. It does not remove the need for RAG, source grounding, live data, validation, human review, and monitoring.
PM takeaway
A fine-tuned model can still be wrong. It may just be wrong more consistently. That is dangerous if you do not evaluate it.
14. Prompting Is Easier to Debug
Prompting has one big advantage: it is visible. You can inspect the prompt and say: this instruction is unclear, this example is wrong, this rule conflicts with another rule, this context is missing.
Fine-tuning is less transparent. If a fine-tuned model behaves badly, it may be harder to know why — the data, the labels, the training method, the base model, the evaluation set, the deployment prompt, or the model update.
PM takeaway
Use prompting while product behavior is still changing. Use fine-tuning only when behavior is stable, failures are repeated, examples are available, evals are ready, and the business case is clear. Fine-tuning too early creates a slower product learning loop.
15. Fine-Tuning Is a Product Commitment
A prompt can be edited quickly. A fine-tuned model must be managed: dataset management, training jobs, validation, deployment, rollback, versioning, monitoring, governance, retraining, and cost tracking.
You need traceability:
| Artifact | Why it matters |
|---|---|
| Training dataset | What did the model learn from? |
| Validation dataset | How was it tested? |
| Model version | Which version is in production? |
| Training configuration | How was it trained? |
| Evaluation results | Did it improve? |
| Deployment date | When did behavior change? |
| Rollback version | What if performance drops? |
Fine-tuning turns model behavior into a managed product asset.
PM takeaway
Do not fine-tune unless you are ready to own the lifecycle. Fine-tuning is not a one-time experiment. It is a product operations responsibility.
16. When Prompting Is the Right Choice
Prompting is the right choice when speed, flexibility, and learning matter more than hard-coded consistency.
| Situation | Why prompting fits |
|---|---|
| MVP or prototype | Fast iteration |
| Task is still evolving | Easy to change |
| Few examples exist | No dataset yet |
| User context changes often | Runtime context matters |
| Output does not need extreme consistency | Prompt is enough |
| Human review exists | Mistakes can be caught |
| Volume is low | Token cost is manageable |
| Need current information | Use prompt + RAG |
Example
You are building an AI assistant that helps product managers draft PRDs. Prompting is probably enough at first. The product is exploratory, user expectations vary, the format may evolve, and examples are still being collected. Fine-tuning would be premature.
17. When Fine-Tuning Is the Right Choice
Fine-tuning is the right choice when the task is stable, repeated, measurable, and valuable enough to justify training.
| Situation | Why fine-tuning fits |
|---|---|
| Same task repeats frequently | Training value compounds |
| Output format must be consistent | Model learns structure |
| Tone/style must be highly specific | Examples teach style |
| Prompt is too long | Fine-tuning can reduce examples |
| Model repeats same failure | Training may correct behavior |
| You have high-quality examples | Dataset is ready |
| Evaluation is clear | Improvement can be measured |
| Scale justifies cost | Business case exists |
Example
You are building a customer support reply generator for a high-volume operation. The brand tone is specific. The answer format is stable. Thousands of examples exist. Human agents already edit responses. Acceptance and edit rates can be measured. Fine-tuning may be justified.
18. When RAG Is the Right Choice
RAG is the right choice when the model needs current or private knowledge.
| Situation | Why RAG fits |
|---|---|
| Knowledge changes often | Retrieval stays current |
| Source citation is needed | Retrieved docs can be shown |
| Documents are large | Retrieve relevant chunks |
| User asks factual questions | Ground answer in source |
| Internal knowledge base exists | Index and retrieve |
| Compliance needs traceability | Cite source documents |
| Fine-tuning would go stale | RAG separates knowledge from behavior |
Example
You are building an internal policy assistant. Policy documents change monthly. Fine-tuning on old policies would be risky. RAG lets the model retrieve the latest approved policy document at runtime.
19. When Tools Are the Right Choice
Tools are the right choice when the model needs to do something exact or interact with a system.
| Situation | Why tools fit |
|---|---|
| Need live data | Query API/database |
| Need calculation | Use calculator/code |
| Need transaction | Trigger workflow |
| Need validation | Call rules engine |
| Need lookup | Search records |
| Need action | Create ticket/update system |
| Need audit | Log tool call and result |
Example
User asks: “Is this claim eligible for approval?” The model should not guess. It may need tools to fetch policy details, check member eligibility, calculate sum insured balance, check waiting period, validate hospital network status, and retrieve prior authorization history. That is not a fine-tuning problem. That is a system integration problem.
20. The PM Decision Framework
Use this framework before deciding.
Step 1: Identify the failure
| Failure type | Likely solution |
|---|---|
| Model does not understand instruction | Better prompt first |
| Model lacks current knowledge | RAG |
| Model needs live data | Tool/API |
| Model gives inconsistent format | Prompt first, then fine-tune if repeated |
| Model tone is inconsistent | Prompt first, fine-tune if scale matters |
| Model makes repeated domain classification errors | Fine-tuning may help |
| Model calculates incorrectly | Tool/calculator |
| Model acts without permission | Workflow control |
| Model hallucinates | RAG, citations, evals, guardrails |
Step 2: Check data readiness
| Question | If no |
|---|---|
| Do we have high-quality examples? | Do not fine-tune yet |
| Are outputs SME-approved? | Build review process first |
| Is data representative? | Collect more samples |
| Is task stable? | Use prompting |
| Is evaluation ready? | Build evals first |
| Is privacy handled? | Clean/redact data first |
Step 3: Compare cost and complexity
| Approach | Upfront cost | Runtime cost | Maintenance |
|---|---|---|---|
| Prompting | Low | Can rise with long prompts | Easy |
| RAG | Medium | Moderate | Requires document pipeline |
| Fine-tuning | Higher | Can reduce prompt cost | Requires model lifecycle |
| Tools | Medium to high | Depends on calls | Requires integration governance |
Step 4: Choose the minimum sufficient strategy
Do not choose the most advanced strategy. Choose the minimum strategy that reliably solves the product problem.
21. A Practical Example: Claims Shortfall Assistant
Product goal: build an AI assistant that reviews claim documents and suggests whether a shortfall should be raised.
Problem components
| Need | Best strategy |
|---|---|
| Understand user instruction | Prompting |
| Read claim documents | OCR + RAG/context |
| Know latest policy rules | RAG/rule engine |
| Identify missing documents | Prompt + rules + examples |
| Format shortfall reason | Prompt or fine-tuning |
| Use consistent tone | Prompt first, fine-tune later if needed |
| Check eligibility | Tool/API |
| Avoid final unauthorized decision | Workflow guardrail |
| Improve repeated classification | Fine-tuning if data exists |
Likely architecture
| Layer | Role |
|---|---|
| Prompt | Defines task and output format |
| RAG | Provides relevant policy/SOP clauses |
| Tools | Fetch live claim/member data |
| Rules engine | Handles deterministic checks |
| Fine-tuning | Optional later for repeated summary/shortfall patterns |
| Human review | Final approval |
| Evals | Measure correctness and safety |
Notice the answer is not “fine-tune” or “prompt.” The answer is product architecture.
22. A Practical Example: Brand Voice Assistant
Product goal: generate customer replies in a very specific brand voice.
Early stage
Use prompting: “Rewrite this in a calm, direct, helpful tone. Avoid apology-heavy language. Keep the answer under 120 words.” Collect examples. Measure edit rate. Ask users to approve/reject.
Later stage
If the task scales and tone consistency matters, fine-tune on approved examples.
| Signal | Meaning |
|---|---|
| High volume | Fine-tuning may save cost |
| Repeated tone edits | Prompt not enough |
| Stable brand guidelines | Training target is clear |
| Approved examples exist | Dataset ready |
| Acceptance rate measurable | Eval ready |
This is a good fine-tuning candidate.
23. A Practical Example: Policy Q&A Assistant
Product goal: answer employee questions from internal policy documents.
Do not start with fine-tuning. Start with RAG — because the issue is knowledge retrieval, not behavior learning.
| Requirement | Best strategy |
|---|---|
| Use latest policy | RAG |
| Cite source | RAG |
| Avoid outdated answers | RAG with versioning |
| Answer in simple language | Prompting |
| Refuse unsupported answer | Prompt + guardrail |
| Track usage | Analytics |
| Improve repeated phrasing | Maybe fine-tune later |
Fine-tuning on policies may go stale. RAG keeps knowledge external and updateable.
24. What PMs Should Not Do
Avoid these mistakes.
| Mistake | Why it is bad |
|---|---|
| Fine-tuning before evals | No proof of improvement |
| Fine-tuning to add current knowledge | Knowledge may become stale |
| Using prompting for everything forever | Prompts become bloated and brittle |
| Ignoring data quality | Model learns bad behavior |
| Skipping human review | High-risk outputs may slip |
| Comparing models by demo only | Demos hide edge cases |
| Not tracking versions | Cannot explain behavior changes |
| Not measuring cost | Product may become commercially unviable |
| Not testing against real data | Lab performance may not match production |
Fine-tuning is powerful. But badly managed fine-tuning creates expensive confusion.
25. The PM Mental Model
Prompting is instruction.
RAG is knowledge.
Tools are action.
Fine-tuning is behavior learning.
Evals are measurement.
Monitoring is production truth.
If you remember only one thing from this chapter, remember this:
Prompt first. Evaluate early. Retrieve knowledge. Use tools for exactness. Fine-tune only when the behavior is repeated, measurable, stable, and valuable.
That is the practical PM answer.
Chapter Summary
| Concept | PM understanding |
|---|---|
| Prompting | Runtime instructions that guide model behavior |
| Fine-tuning | Training the model on examples to learn repeated behavior |
| RAG | Bringing current or private knowledge into context |
| Tools | APIs or systems the model can call for exact data/actions |
| Evals | Measurement layer before and after optimization |
| Prompt limits | Long, fragile, expensive, inconsistent prompts |
| Fine-tuning value | Consistency, shorter prompts, repeated behavior, scale |
| Fine-tuning risk | Data quality, lifecycle cost, harder debugging |
| Data readiness | Fine-tuning needs clean, representative, approved examples |
| Minimum sufficient strategy | Use the simplest architecture that solves the problem reliably |
| PM role | Diagnose the failure before choosing prompting, RAG, tools, or fine-tuning |
Closing Thought
Fine-tuning vs prompting is the wrong question if asked too early. The better question is: What problem are we actually trying to solve?
If the model needs clearer instructions, prompt it. If it needs current knowledge, retrieve it. If it needs exact data or actions, give it tools. If it repeatedly fails at a stable behavior and you have good examples, fine-tune it.
That is how product managers should think. The goal is not to use the most advanced AI technique. The goal is to build the most reliable product system.
Fine-tuning is not a magic upgrade. Prompting is not a permanent shortcut. Both are tools. The product manager’s job is to know when each tool is the right one.
The next step in this module is controlling how the model generates each token — temperature, top-p, and sampling — the knobs that shape creativity versus consistency in live products.
The real PM lesson
Choose the minimum sufficient strategy — not the most impressive one on a slide deck.
Chapter navigation
Chapter 5: InstructGPT and RLHF — The PM Version
How InstructGPT and RLHF turned base language models into instruction-following assistants.
Read chapter → Next →Chapter 7: Temperature, Top-p, and Sampling — The PM Version
Why the same prompt can give different answers — and how sampling shapes reliability.
Read chapter →