Chapter 03 · Module 01 · Beginner–Intermediate · 22–26 min

Chapter 3: Tokens and Context Windows — The PM Version

Why tokens and context windows matter when building AI products.

Book: AI Learning Beginner–Intermediate 22–26 min
Start reading Back to module
Text Tokens Context window Product

Four layers: input → units → working memory → experience

Introduction

In Chapter 2, we understood how transformers became Large Language Models and why LLMs should be seen as prediction systems, not databases.

This chapter goes one level deeper into something every AI product manager must understand: tokens and context windows.

At first, this may sound like a technical backend topic. It is not.

Tokens and context windows directly influence:

  • how much information the model can read,
  • how much it can remember during a task,
  • how much an AI call costs,
  • how reliable the answer is,
  • how long an AI workflow can run,
  • how well an AI agent can use tools,
  • and whether the final experience feels intelligent or confused.

A lot of AI products fail not because the model is weak, but because the product passes the wrong context, too much context, stale context, or no context at all.

The simple PM version

Context is not storage. Context is working memory.
And working memory must be designed.

1. What Is a Token?

A token is the unit of text that an LLM reads and generates.

People casually say that models read "words," but that is not technically accurate. Models process tokens.

InputPossible Token Meaning
hospitalA full word
authorizationMay be split into smaller pieces
₹50,000Currency symbol, number, comma, or grouped chunks
function getClaim()Code tokens
Pre-AuthWord fragments and punctuation
.Punctuation
2026Number token

The exact tokenization depends on the model's tokenizer, but the product implication is simple:

PM translation

The model does not see your content exactly like a human sees paragraphs, pages, and documents. It sees a sequence of tokens — and that sequence has limits.

2. Why Tokens Matter for Product Managers

Tokens are not just a technical billing detail. They are a product constraint.

Every AI feature has a token economy.

When a user asks a question, the model may receive:

Context ComponentExample
System prompt"You are a claim adjudication assistant."
User message"Review this case."
Conversation historyPrevious user and assistant turns
Retrieved documentsPolicy wording, SOP, claim documents
Tool definitionsAvailable APIs or tools the model can call
Tool resultsDatabase lookup, OCR output, search result
InstructionsOutput format, refusal rules, tone
Current responseTokens generated by the model

All of this consumes the available context window.

The model does not only process the user's visible question. It processes the full package sent to it — hidden instructions, prior conversation, retrieved files, tool schemas, API results, and expected output space.

PM Takeaway

When you design an AI feature, you are not only designing the UI. You are deciding:

PM DecisionToken Impact
How much chat history to retainMore history = more tokens
How many documents to retrieveMore documents = more tokens
How detailed the system prompt should beMore instruction = more tokens
Whether to include full files or summariesFull files consume more context
Whether to use toolsTool definitions and outputs consume tokens
Whether to support long workflowsLong workflows need context strategy

PM takeaway

A PM who does not understand tokens will design AI features that look good in demos and fail in production.

3. What Is a Context Window?

The context window is the total amount of information the model can consider while generating a response.

Think of it as the model's working desk. You can place documents, instructions, user queries, examples, tool results, and prior conversation on the desk. The model can work with what is on the desk.

But if something is not on the desk, the model cannot reliably use it.

The context window is different from the model's training data.

ConceptMeaning
Training dataThe broad data the model learned from during training
Context windowThe current information available for this specific request

This distinction is critical. The model may have general knowledge from training, but for a specific business task, you need to provide the right working context.

Simple Analogy

Human WorkflowLLM Workflow
A person has long-term memoryModel has training data
A person opens files on a deskModel receives context
A person can only focus on visible materialModel can only attend to tokens in context
A person may forget earlier conversationModel may lose access when context is removed
A person performs better with organized notesModel performs better with curated context

PM takeaway

The context window should not be treated casually. It is one of the most important design surfaces in an AI product.

4. Context Window Is Not the Same as Memory

This is one of the most common misunderstandings in AI product discussions.

People say: "The model should remember this." But what do they mean by remember?

There are different types of memory in AI products.

TypeMeaningExample
Training memoryPatterns learned during model trainingGeneral language, coding patterns, public facts
Context windowCurrent working memoryCurrent prompt, documents, chat history
Product memoryStored user or project dataCRM notes, preferences, prior tasks
Agent stateWorkflow state across stepsCurrent task, completed actions, next action
Long-term memoryPersistent external memoryVector DB, database, file store

The context window is only one layer. It is temporary. Once the conversation changes, gets summarized, or exceeds the limit, earlier information may be dropped, compressed, or no longer visible to the model.

PM Takeaway

Do not use the context window as your product's long-term memory. Use proper storage.

NeedBetter Design
Remember user preferenceStore in profile/database
Remember past claim decisionStore in system of record
Remember prior chat summaryStore conversation summary
Remember project knowledgeUse RAG/document index
Remember agent progressStore workflow state
Remember audit trailStore structured logs

Mental model

A context window is for active reasoning. It is not permanent memory.

5. How Context Builds Up During a Conversation

In a multi-turn conversation, earlier user messages and assistant responses are usually included again in the next request. That means context grows as the conversation continues.

TurnWhat Gets Sent
Turn 1System prompt + user message
Turn 2System prompt + Turn 1 user + Turn 1 assistant + new user message
Turn 3System prompt + earlier history + new user message
Turn 10A lot of accumulated conversation

This is fine for short chats. It becomes risky for long-running workflows.

Why? Because the model may be forced to process:

  • old instructions,
  • repeated details,
  • irrelevant discussion,
  • long tool outputs,
  • large documents,
  • intermediate reasoning,
  • duplicated content,
  • stale context.

At some point, the model's working memory gets cluttered.

PM takeaway

The issue is not only hitting the maximum limit. Performance can degrade before the limit — when context gets crowded.

6. Bigger Context Is Useful, But Not Automatically Better

A larger context window allows the model to handle longer prompts, larger documents, bigger codebases, and more complex workflows. That sounds great.

But bigger context is not automatically better.

A large context window gives capacity. It does not automatically create clarity.

PM Analogy

A large desk helps if the right documents are placed neatly. But if the desk is filled with random files, duplicate printouts, old notes, irrelevant emails, and half-read PDFs, the person working on the task may actually perform worse.

Same with LLMs.

Bad Product ThinkingBetter Product Thinking
"The model supports a huge context window. Let us send everything.""The model supports a large context window. Let us decide what deserves to enter context, in what order, and in what format."

That is context engineering

Capacity alone does not create a reliable product. Curation does.

7. Context Rot: The Hidden Product Risk

As token count grows, model performance may degrade.

Anthropic refers to this type of degradation as context rot.

In simple language

The more cluttered the context becomes, the harder it can become for the model to reliably find and use the right information.

In AI products, context rot may look like:

SymptomWhat User Sees
Missed instructionModel ignores an important rule
Lost factModel forgets a document detail
Weak retrievalModel cites irrelevant context
ContradictionModel says something inconsistent
Shallow answerModel summarizes instead of reasoning
Bad tool choiceAgent calls the wrong tool
Workflow driftAgent moves away from the original task

For PMs, context rot explains why long AI workflows can become unreliable even when they technically fit inside the context window.

Example: Claims Review

Suppose you pass:

  • full policy document,
  • hospital bill,
  • discharge summary,
  • previous authorization letter,
  • email trail,
  • insurer guideline,
  • claim notes,
  • OCR output,
  • user instructions,
  • processor comments.

The model may have enough token capacity to receive all of it. But will it focus on the right sections? Not automatically.

PM takeaway

You still need retrieval, prioritization, structure, and clear instructions.

8. Context Engineering: The New PM Skill

Prompt engineering gets attention, but context engineering is more important for serious AI products.

Prompt engineering asks: What instruction should we give the model?

Context engineering asks: What information should the model see before it answers?

That second question is often more important.

Context Engineering Includes

AreaPM Question
Context selectionWhat information should be included?
Context exclusionWhat should be left out?
OrderingWhat should appear first or last?
CompressionWhat can be summarized?
FreshnessIs this the latest source?
AuthorityWhich source should override others?
RelevanceIs this needed for this task?
FormatShould this be table, JSON, bullet, or text?
LifecycleWhen should old context be removed?

A PM does not need to implement tokenization logic. But a PM must define the product rules for context.

Example

Weak context design: Send all claim documents to the model.

Better context design: Send the claim type, policy summary, relevant policy clauses, latest bill extract, discharge diagnosis, previous authorization status, missing document checklist, and user's current question. Exclude unrelated email trail unless the user asks for communication history.

PM takeaway

This is the difference between a demo and a product.

9. Token Budgeting: How PMs Should Think

A token budget is a planning tool. It answers: how much of the context window should be reserved for each part of the task?

A simple token budget may look like this:

Context ComponentBudget Priority
System instructionMust include
User requestMust include
Relevant retrieved documentsMust include
Tool definitionsInclude only required tools
Tool resultsInclude only useful results
Conversation historySummarize when long
Output spaceReserve enough for final answer
Reasoning budgetReserve if using extended reasoning

A common mistake is filling the input context so aggressively that the model has too little room to produce a useful answer.

Remember

The context window includes both what the model reads and what it generates.

PM Takeaway

When defining an AI feature, ask:

QuestionWhy It Matters
How much input context is needed?Controls grounding
How much output is expected?Controls user experience
How much history is useful?Controls continuity
How much tool output should remain?Controls agent reliability
What happens when budget is exceeded?Controls failure behavior

Token budgeting should be part of AI product requirements.

10. Token Counting: Measurement Before Execution

Token counting is the discipline of estimating how large your prompt is before sending it to the model.

Token count affects:

  • cost,
  • latency,
  • rate limits,
  • model routing,
  • context overflow,
  • prompt optimization.

For a product team, token counting should not be treated as a developer-only diagnostic. It should support product decisions.

Product ScenarioToken Counting Use
User uploads 100-page PDFCheck whether it fits before processing
Agent uses many toolsEstimate tool-definition overhead
Chat has 50 turnsDecide when to summarize
Enterprise workflow has cost capRoute to cheaper model when possible
User asks for detailed reportReserve output space
Mobile AI featureOptimize for latency

If your product has no token visibility, you are flying blind.

Further reading

For tokenization and context window fundamentals, see Anthropic's context window documentation.

11. Context Overflow: What Happens When You Run Out of Room?

Context overflow happens when the input plus expected output exceeds what the model can handle. In product terms, this means the model has run out of working memory.

Bad experience: "Error: context length exceeded."

Better experience: "This document set is too large to review in one pass. I'll summarize the older documents first, then analyze the most relevant sections."

Product Fallback Options

ProblemProduct Response
Too many documentsAsk user to select priority documents
Long conversationSummarize earlier discussion
Large tool outputKeep only important fields
Too many images/PDF pagesSplit into batches
Long report requestedGenerate section by section
Agent workflow too longSave state and start new context

PM takeaway

This is where AI product maturity shows. A serious AI product should not collapse when context is full — it should recover gracefully.

12. Context Management Strategies

For long-running conversations and agentic workflows, context has to be actively managed. There are three important product patterns.

Pattern 1: Summarize

Use when the conversation is long but the full detail is no longer needed.

"Summarize the first 30 turns into user goals, decisions made, unresolved questions, and next actions."

Pattern 2: Select

Use when only some documents are relevant.

Retrieve only the policy clauses related to maternity waiting period instead of sending the full policy booklet.

Pattern 3: Clear

Use when old tool outputs are no longer useful.

Remove raw API responses after extracting the final structured values.

PM Takeaway

Context management should be designed as a lifecycle.

StageContext Action
Start of taskLoad task instructions and key data
During taskAdd tool results and user clarifications
MidwaySummarize or compact old context
After tool useClear irrelevant raw outputs
Before final answerKeep only evidence and decision logic
After taskStore summary and audit trail externally

Do not let context grow randomly

Random context creates random behavior.

13. Extended Thinking and Context

Some models support extended thinking, where the model can spend more tokens on internal reasoning before giving a final answer.

For PMs

Reasoning also consumes budget. If you ask a model to think deeply, you must reserve room for that thinking and for the final answer.

Where Extended Thinking Helps

Use CaseWhy It Helps
Complex claim adjudicationMultiple policy checks and deductions
Codebase analysisNeed to trace dependencies
Legal clause comparisonNeed careful interpretation
Financial reconciliationMultiple calculations and exceptions
Strategic planningNeeds trade-off analysis

Where It May Be Unnecessary

Use CaseWhy Not
Simple rewriteLow reasoning need
Basic classificationShort task
FAQ answerRAG may be enough
Short summaryFast response preferred

PM takeaway

Do not enable deep reasoning everywhere. Use it where accuracy and complexity justify cost and latency.

14. Tool Use Adds More Context Pressure

AI agents often use tools. Tools are powerful, but they add context pressure.

A tool-using workflow may include:

  • tool definitions,
  • tool call request,
  • tool result,
  • assistant interpretation,
  • follow-up tool call,
  • additional result,
  • final response.

Every tool call creates context overhead.

If your AI agent calls too many tools, keeps too many raw results, or carries unnecessary history, the workflow becomes expensive and fragile.

PM Questions for Tool-Based Products

QuestionWhy It Matters
Which tools should be available for this task?Too many tools confuse and increase overhead
What fields should each tool return?Raw payloads may waste context
Should tool results be summarized?Reduces clutter
When should tool results be cleared?Prevents context rot
Which actions need approval?Controls risk
What should be logged outside context?Supports audit

PM takeaway

Tool use is not just engineering. It is product behavior design.

15. Context Awareness: Models Knowing Their Remaining Budget

Some newer models can be made aware of their remaining context budget during long workflows. This matters because a model that knows its remaining budget can behave more intelligently.

It can decide:

  • whether to be concise,
  • whether to summarize,
  • whether to continue,
  • whether to ask for prioritization,
  • whether to avoid unnecessary detail,
  • whether to save state before context runs out.

PM Takeaway

Future AI products will not only manage user journeys. They will manage context journeys.

For long-running workflows, your system should know:

StateProduct Behavior
Plenty of context leftContinue normal workflow
Moderate context leftStart being selective
Low context leftSummarize and compact
Near limitSave state and restart
Exceeded limitRecover gracefully

Agentic products

An AI agent without context management is like an employee working on a complex case while their notes keep disappearing.

16. Long Context vs RAG: Which Should a PM Choose?

A common PM question is: If the model has a huge context window, do we still need RAG?

Yes, often you do.

Long context and RAG solve related but different problems.

ApproachStrengthWeakness
Long contextCan process a large amount of material togetherExpensive, slower, risk of context rot
RAGRetrieves focused, relevant informationDepends on retrieval quality
SummarizationReduces volumeMay lose details
Tool useGets exact current dataAdds workflow complexity
Memory storePreserves long-term stateNeeds retrieval and governance

Product Rule of Thumb

  • Use long context when the model genuinely needs to reason across many pieces together.
  • Use RAG when the model needs the most relevant pieces from a larger knowledge base.
  • Use both when the task needs broad awareness and precise grounding.

Examples

Use CaseBetter Approach
Summarize one long contractLong context
Answer policy question from 10,000 documentsRAG
Compare 20 uploaded claim documentsLong context + structured extraction
Customer support from knowledge baseRAG
Codebase migration planningLong context + repo graph + retrieval
Agent running multi-step workflowRAG + tools + compaction

PM takeaway

A large context window does not remove the need for product architecture.

17. How Poor Context Design Creates Hallucination

Hallucination is not only caused by weak models. It is often caused by poor context design.

Poor Context DesignLikely Failure
Missing source documentModel guesses
Too many irrelevant documentsModel focuses on wrong detail
Old policy mixed with new policyModel gives outdated answer
No source priorityModel treats weak source as strong
Raw OCR errors includedModel trusts bad extraction
Long chat history retainedModel follows stale instruction
Tool output too largeModel misses key fields

PM Takeaway

When an AI answer is wrong, do not only blame the model. Ask:

  1. Was the right context retrieved?
  2. Was irrelevant context removed?
  3. Was the source current?
  4. Was the context structured?
  5. Was the model told which source to trust?
  6. Was there enough output budget?
  7. Was the task too broad?

PM takeaway

Many AI failures are context failures.

18. Context Design for Enterprise Workflows

Enterprise AI products need stricter context design than consumer chatbots. A consumer can tolerate a slightly vague answer. A business workflow often cannot.

Example: Pre-Auth Claim Review

A good context package may include:

Context ItemInclude?Reason
Current user questionYesDefines task
Policy number and product typeYesDefines coverage context
Admission dateYesNeeded for eligibility
Diagnosis and procedureYesNeeded for medical decision
Relevant policy clausesYesGrounding
Full policy bookletMaybeOnly if targeted clauses are not enough
Hospital bill line itemsYesNeeded for deductions
Previous authorization historyYesAvoids duplicate decisioning
Raw email trailMaybeOnly if communication issue exists
Old irrelevant claimsNoAdds noise
Internal notesYes, if relevantOperational context
Tool logsNo, unless neededAvoid clutter

How PMs should think

Not "send everything." Not "send only the user query." Send the right context.

19. A PM Checklist for Tokens and Context Windows

Before shipping an AI feature, ask:

QuestionWhy It Matters
What is the maximum expected input size?Avoid overflow
What is the expected output size?Reserve generation budget
What documents enter context?Control grounding
Who decides relevance?Product or retrieval logic
What gets summarized?Manage long sessions
What gets deleted from context?Prevent clutter
How are old instructions handled?Avoid conflict
How are tool results handled?Prevent context bloat
What happens when limit is near?Graceful fallback
How is token cost monitored?Commercial viability
How is latency monitored?User experience
How is context quality evaluated?Accuracy and trust

Add these questions to your AI PRDs. Tokens and context should not be afterthoughts.

20. Final Mental Model

Tokens are the units the model reads and writes.
The context window is the model's working memory.
A large context window gives capacity, but not automatic accuracy.
More context can help, but bad context can hurt.

For product managers, the real skill is not simply asking: "How many tokens does the model support?"

The better question is: "What context does the model need to complete this task reliably?"

That is the shift. AI product quality depends on context quality.

The best AI products will not dump everything into the model. They will curate, compress, retrieve, prioritize, clear, and preserve context deliberately.

The real PM version

Tokens and context windows are not just engineering details. They are product architecture.

Chapter Summary

ConceptPM Understanding
TokenThe unit of text the model processes.
Context windowThe model's working memory for the current request.
Training data vs contextTraining data is learned earlier; context is what the model sees now.
Token budgetPlanning how much room each part of the task gets.
Token countingEstimating prompt size before execution.
Context rotAccuracy and recall can degrade as context grows.
Long contextUseful capacity, but not a replacement for context design.
RAGRetrieves relevant external knowledge into context.
CompactionSummarizes older context so long workflows can continue.
Context editingRemoves unnecessary old tool results or blocks.
Extended thinkingUses additional output tokens for deeper reasoning.
Tool useAdds power but also context overhead.
Context awarenessModel capability to track remaining token budget.
PM roleDesign what enters context, what stays out, and what happens near limits.

Closing Thought

In traditional software, product managers designed screens, flows, fields, and rules. In AI products, product managers must also design context.

That means deciding what the model sees, what it ignores, what it retrieves, what it remembers, what it forgets, and what it should do when the working memory gets crowded.

This is not a small technical detail. This is one of the foundations of reliable AI product design.

A model with a large context window can still fail if the product gives it the wrong context. A smaller model with clean, relevant, well-structured context can often outperform a larger model drowning in noise.

The real PM lesson

Better context beats bigger context.

Chapter navigation

← Previous

Chapter 2: From Transformers to LLMs — The PM Version

How Transformer architecture became the foundation of modern AI products.

Read chapter →
Next →

Chapter 4: AI Safety, RLHF, and Constitutional AI

Why safety is product architecture — not just policy — and how RLHF and Constitutional AI shape behavior.

Read chapter →