Not long ago, product managers could get by with a working knowledge of APIs, databases, and sprint ceremonies. That era is over.
In 2026, AI is no longer a feature; it's the foundation. Product teams are embedding language models into core workflows, building autonomous agents that take actions on behalf of users, and shipping entirely new categories of software that didn't exist two years ago. If you can't speak the language of AI, you will struggle to lead in it.
This glossary covers 120 essential AI terms, organized into 12 categories, with plain-language definitions built for working product managers. Whether you're working with an in-house ML team or integrating third-party AI APIs, this is the reference you'll return to.
1. Foundation AI Concepts
Artificial Intelligence (AI)
Software systems designed to perform tasks that typically require human intelligence: reasoning, understanding language, recognizing patterns, and making decisions.
Machine Learning (ML)
A subset of AI where models improve their performance by learning from data, without being explicitly programmed for every scenario.
Deep Learning
A type of machine learning using neural networks with many layers (hence "deep") to learn complex representations of data. Powers most modern AI systems.
Neural Network
A computing architecture loosely inspired by the human brain. Layers of interconnected nodes process and transform data to produce outputs.
Model
The trained artifact that powers an AI system: the result of exposing a neural network to large amounts of data during training.
Training
The process of exposing a model to data so it learns patterns, weights, and representations. Training is expensive and happens before deployment.
Inference
Running a trained model to generate an output from a new input. Every time a user interacts with an AI feature, inference is happening. Inference cost directly affects your unit economics.
Parameters
The numerical values inside a model that are learned during training. More parameters generally mean more capacity, but also more cost.
Foundation Model
A large model trained on broad data that can be adapted for many tasks. GPT-4, Claude, Gemini, and Llama are all foundation models.
Pre-training
The initial phase of training where a model learns general language understanding from massive datasets. This is the most expensive phase.
2. Large Language Model (LLM) Terms
Large Language Model (LLM)
A type of AI model trained on massive amounts of text data to understand and generate human-like language. The backbone of most modern AI products.
Token
The basic unit of text an LLM processes. A token is roughly 0.75 words in English. Tokens affect both cost (you pay per token) and context limits.
Context Window
The maximum amount of text an LLM can process in a single interaction, counting input and output combined. The context window determines how much the model can "remember" at once.
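To make the token and context-window arithmetic concrete, here is a minimal Python sketch. It assumes the rough 0.75-words-per-token rule of thumb quoted above; real tokenizers differ by model and language, and the function names and the 8,000-token window are illustrative, not any provider's API.

```python
import math

# Rule of thumb from the glossary: one token is roughly 0.75 English words.
# Real tokenizers vary by model and language, so this is only an estimate.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Approximate the token count from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_context(prompt: str, max_output_tokens: int, context_window: int) -> bool:
    """The window covers input AND output, so reserve room for the reply too."""
    return estimate_tokens(prompt) + max_output_tokens <= context_window


prompt = "Summarize the attached customer interview notes in three bullet points."
print(estimate_tokens(prompt))  # 14
print(fits_in_context(prompt, max_output_tokens=500, context_window=8_000))  # True
```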
Prompt
The input sent to an LLM. The quality and structure of a prompt significantly affect output quality.
Completion
The output generated by a model in response to a prompt. Also called a "response" or "generation."
Temperature
A setting that controls how random or creative a model's output is. Low temperature = more predictable; high temperature = more varied.
Top-p (Nucleus Sampling)
Another output randomness control. Restricts sampling to the smallest set of most probable tokens whose cumulative probability reaches a threshold p (for example, 0.9).
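The sketch below shows, on a toy four-token vocabulary, how temperature reshapes the next-token distribution and how top-p then trims it. The logits are made up and the helper names are hypothetical; real models apply the same ideas to vocabularies of many thousands of tokens, and providers may implement sampling details differently.

```python
import math


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; dividing by temperature sharpens (low T) or flattens (high T)."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max before exponentiating, for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the most probable tokens until their cumulative probability reaches p, then renormalize."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}


logits = {"great": 2.0, "good": 1.5, "fine": 0.5, "purple": -1.0}  # toy next-token scores

cool = softmax_with_temperature(logits, temperature=0.5)
warm = softmax_with_temperature(logits, temperature=1.5)
print(cool["great"] > warm["great"])  # True: low temperature concentrates probability on the top token

nucleus = top_p_filter(softmax_with_temperature(logits, temperature=1.0), p=0.9)
print(sorted(nucleus))  # the unlikely "purple" is dropped from the candidate set
```

A real sampler would then draw the next token at random from the filtered distribution.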
System Prompt
Instructions given to an LLM before any user interaction. Defines the model's behavior, persona, constraints, and task context.
Reasoning Model
An LLM specifically optimized for step-by-step logical reasoning, often producing intermediate "thinking" steps before a final answer. Examples: OpenAI o1, Claude's extended thinking mode.
Long-Context Model
An LLM with an exceptionally large context window (100K+ tokens), enabling it to process entire documents, codebases, or conversation histories.
Multimodal Model
An AI model that can process (and in some cases generate) multiple types of data, such as text, images, audio, and video, within a single model.
Open-Source (Open-Weight) Model
An AI model whose weights are publicly available for anyone to download, run, and fine-tune. Examples: Llama, Mistral, Falcon.
Closed-Source Model
A proprietary model accessed only through an API. The underlying weights are not publicly released. Examples: GPT-4o, Claude 3.5 Sonnet, Gemini Ultra.
3. Prompt Engineering Terms
Prompt Engineering
The practice of designing and optimizing input prompts to improve model outputs. A core skill for teams building AI-powered features.
Context Engineering
A broader discipline than prompt engineering: designing and managing the full context that an AI model receives, including system prompts, retrieved content, conversation history, tool outputs, and more.
Few-Shot Prompting
Including a small number of input-output examples in a prompt to guide the model's behavior. Typically more reliable than zero-shot for structured tasks.
Zero-Shot Prompting
Asking a model to perform a task without examples, relying entirely on its pre-trained knowledge.
Chain-of-Thought (CoT) Prompting
A prompting technique that instructs the model to reason through a problem step by step before giving a final answer. Can significantly improve accuracy on complex tasks.
Role Prompting
Assigning the model a specific persona or role ("You are an expert tax advisor…") to shape its tone, style, and knowledge focus.
Instruction Tuning
A fine-tuning approach where a model is trained specifically on instruction-following examples to make it more reliable and responsive to user commands.
Prompt Injection
A security vulnerability where malicious input overrides a system prompt's instructions, causing the model to behave in unintended ways. A real risk in production AI systems.
Context Compression
Techniques that shrink the input sent to a model's context window without losing critical information. Important for managing cost and latency.
Structured Output
Configuring a model to return responses in a defined format (JSON, XML, etc.) rather than free-form text. Enables reliable downstream processing.
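A common pattern is to treat structured output as untrusted until it parses and validates. This minimal sketch assumes a hypothetical two-field schema (`category`, `priority`) and uses only Python's standard `json` module; on failure, the caller can trigger a fallback instead of passing bad data downstream.

```python
import json

# Hypothetical schema for a ticket-classification feature: field name -> expected Python type.
REQUIRED_FIELDS = {"category": str, "priority": int}


def parse_structured_output(raw: str) -> dict:
    """Parse a model's JSON reply and verify required fields before downstream code uses it."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError subclass) on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected_type) or isinstance(value, bool):
            raise ValueError(f"missing or invalid field: {field}")
    return data


print(parse_structured_output('{"category": "billing", "priority": 2}'))
# {'category': 'billing', 'priority': 2}

try:
    parse_structured_output("Sure! Here is the JSON you asked for.")
except ValueError as err:
    print("trigger fallback:", err)
```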
4. AI Agent & Automation Terms
AI Agent
An AI system capable of acting autonomously: using tools, making decisions, and completing multi-step tasks based on a goal rather than a single prompt.
Agentic AI
AI systems designed to operate with persistent goals, memory, and tool access across extended tasks. Represents a shift from reactive chatbots to autonomous software actors.
Autonomous Agent
An agent that can execute tasks end-to-end without human approval at each step. Requires robust guardrails and error handling in production.
Multi-Agent System
An architecture where multiple AI agents collaborate, each handling specialized subtasks. One agent might plan, another might execute, and a third might verify.
Orchestration
The process of coordinating multiple agents, tools, and model calls in a structured workflow. Orchestration defines the logic that governs complex AI pipelines.
Tool Use
The ability of an LLM to invoke external tools (APIs, code executors, databases) to extend its capabilities beyond language generation.
Function Calling
A specific implementation of tool use where the model generates structured function calls that the host application executes. Popularized by the OpenAI API and now supported by most major model providers.
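On the host side, function calling usually comes down to looking up the requested function in an allowlist and running it with the model's arguments. This sketch assumes a simplified call shape of `{"name": ..., "arguments": "<JSON string>"}`, loosely modeled on how several provider APIs return arguments as JSON text; `get_order_status` is a hypothetical tool.

```python
import json
from typing import Any, Callable


def get_order_status(order_id: str) -> str:
    """Hypothetical tool; a real product would query an order database here."""
    return f"Order {order_id} has shipped."


# Allowlist of functions the host application lets the model call.
TOOLS: dict[str, Callable[..., Any]] = {"get_order_status": get_order_status}


def execute_tool_call(call: dict) -> Any:
    """Run a model-generated call of the assumed form {"name": ..., "arguments": "<JSON string>"}."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool: {name}")
    arguments = json.loads(call["arguments"])  # arguments arrive as JSON text
    return TOOLS[name](**arguments)


model_call = {"name": "get_order_status", "arguments": '{"order_id": "A-1042"}'}
print(execute_tool_call(model_call))  # Order A-1042 has shipped.
```

The model never runs code itself: it only proposes a call, and the host decides whether and how to execute it. The result is then typically sent back to the model as context for its next response.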
Model Context Protocol (MCP)
An open protocol that standardizes how AI models connect to external tools, data sources, and services. MCP enables interoperable, reusable tool integrations across agents and applications.
AI Workflow
A defined sequence of AI-powered steps designed to complete a business process. For example, "receive support ticket → classify → retrieve context → draft reply → escalate if needed."
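The support-ticket example above can be sketched as plain function composition. Every step here is a hypothetical stand-in (keyword matching instead of a model-based classifier, a dictionary instead of a retrieval system, a string template instead of an LLM draft), so treat it as an outline of the control flow, not a working support bot.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical knowledge base; a real workflow would query a retrieval system here.
KNOWLEDGE_BASE = {
    "billing": "Refunds are processed within 5 business days.",
    "login": "Password resets can be requested from the sign-in page.",
}


@dataclass
class TicketResult:
    category: str
    reply: Optional[str]
    escalated: bool


def classify(ticket: str) -> str:
    """Stand-in for a model-based classifier: simple keyword matching."""
    text = ticket.lower()
    for category in KNOWLEDGE_BASE:
        if category in text:
            return category
    return "other"


def retrieve_context(category: str) -> Optional[str]:
    return KNOWLEDGE_BASE.get(category)


def draft_reply(context: str) -> str:
    """Stand-in for an LLM call that drafts a reply grounded in the retrieved context."""
    return f"Thanks for reaching out. {context}"


def handle_ticket(ticket: str) -> TicketResult:
    """receive ticket -> classify -> retrieve context -> draft reply -> escalate if needed"""
    category = classify(ticket)
    context = retrieve_context(category)
    if context is None:
        # Nothing to ground a reply in, so hand the ticket to a human.
        return TicketResult(category, reply=None, escalated=True)
    return TicketResult(category, reply=draft_reply(context), escalated=False)


print(handle_ticket("Question about my billing statement"))
print(handle_ticket("The app crashes when I upload a video"))
```

The escalation branch is where a human-in-the-loop checkpoint would typically sit.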
Copilot
An AI assistant embedded within a product to assist users with tasks rather than replace them. Copilots augment human work; agents automate it.
Human-in-the-Loop (HITL)
A design pattern where a human approves, reviews, or corrects AI actions at defined checkpoints. Critical for high-stakes or error-prone workflows.
Planner Agent
An agent responsible for breaking a high-level goal into smaller, executable subtasks and routing them to appropriate agents or tools.
Executor Agent
An agent that carries out specific tasks assigned by a planner, such as writing a file, calling an API, or querying a database.
5. Machine Learning Concepts
Supervised Learning
Training a model on labeled input-output pairs. The model learns to predict outputs for new inputs. Used in classification, regression, and many traditional ML tasks.
Unsupervised Learning
Training without labeled data: the model discovers patterns, clusters, or structure on its own. Useful for anomaly detection and data segmentation.
Reinforcement Learning (RL)
Training where an agent learns by taking actions and receiving reward signals. The model improves by maximizing cumulative reward over time.
Reinforcement Learning from Human Feedback (RLHF)
A technique for aligning LLMs with human preferences. Human raters evaluate model outputs; their feedback trains a reward model that guides further fine-tuning. Used to make ChatGPT, Claude, and Gemini safer and more helpful.
Fine-Tuning
Continuing to train a pre-trained model on a smaller, task-specific dataset. Produces a model adapted to your domain or use case without training from scratch.
Transfer Learning
Using knowledge from a model trained on one task to improve performance on a different (but related) task. Foundation models are the ultimate expression of transfer learning.
Overfitting
When a model learns training data too precisely and fails to generalize to new inputs. A common quality problem in custom-trained models.
Underfitting
When a model is too simple to capture the underlying patterns in data, resulting in poor performance even on training data.
Benchmark
A standardized test used to measure and compare model capabilities, for example performance on reasoning, coding, or factual accuracy tasks.
Synthetic Data
Artificially generated data used to train or augment ML models. Useful when real data is scarce, sensitive, or expensive to label.
6. AI Infrastructure Terms
AI Stack
The full set of infrastructure components that power an AI product: models, APIs, vector databases, orchestration layers, observability tools, and more.
AI Infrastructure
The underlying systems (compute, storage, model serving, and tooling) required to build and run AI products reliably at scale.
GPU (Graphics Processing Unit)
Specialized hardware that dramatically accelerates AI training and inference. Access to GPUs is a core infrastructure consideration for AI teams.
Model Serving
The infrastructure for deploying a trained model so it can receive inputs and return outputs at production scale with acceptable latency.
Latency
The time between sending a request to an AI model and receiving a response. Latency is a key product metric, especially for real-time, user-facing features.
Throughput
The number of AI requests a system can handle per unit of time. Critical for scaling AI features under high load.
Inference Cost
The per-request cost of running model inference, typically priced per thousand or per million tokens. Inference cost directly affects AI product unit economics and pricing strategy.
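Back-of-the-envelope inference cost math is simple multiplication, which makes it easy to sanity-check unit economics early. The prices below are hypothetical placeholders (check your provider's current price sheet), and the sketch assumes input and output tokens are billed at different rates, as is common.

```python
# Hypothetical prices in USD per million tokens; not any provider's actual rates.
PRICE_PER_MILLION_INPUT = 3.00
PRICE_PER_MILLION_OUTPUT = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one inference call, with input and output tokens priced separately."""
    return (input_tokens * PRICE_PER_MILLION_INPUT + output_tokens * PRICE_PER_MILLION_OUTPUT) / 1_000_000


def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Scale the per-call cost up to a month of traffic."""
    return request_cost(input_tokens, output_tokens) * requests_per_day * days


per_call = request_cost(input_tokens=2_000, output_tokens=500)
print(f"${per_call:.4f} per call")  # $0.0135 per call
print(f"${monthly_cost(10_000, 2_000, 500):,.2f} per month")  # $4,050.00 per month
```

Even at a fraction of a cent per call, 10,000 daily requests add up quickly, which is why prompt length and output limits are product decisions, not just engineering details.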
Quantization
A technique that reduces model size and inference cost by representing model weights with lower numerical precision (for example, 8-bit integers instead of 16- or 32-bit floats). A common optimization for deploying models efficiently.
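As an illustration of the idea, this sketch applies symmetric int8 quantization with a single scale factor to a handful of weights. It is a simplified teaching example under that assumption; production toolchains typically use finer-grained (per-channel or per-group) scales and more careful calibration.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127] using one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp in case floating-point error pushes a value just past the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights for computation."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
print(q)  # [42, -127, 5, 90]
restored = dequantize(q, scale)
print(max(abs(a - b) for a, b in zip(weights, restored)))  # small rounding error
```

Each int8 value takes one byte instead of the four a 32-bit float needs, which is where the size savings come from; the cost is a rounding error of at most half the scale per weight.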
Model Distillation
Training a smaller "student" model to replicate the behavior of a larger "teacher" model. Produces cheaper, faster models with comparable quality for specific tasks.
API (Application Programming Interface)
The interface through which your product sends inputs to and receives outputs from an AI model. Most teams access foundation models via API.
Rate Limits
Restrictions on how many API requests your application can make in a given time window. Relevant for capacity planning and feature design.
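One way a client can stay under a provider's rate limit is a token-bucket limiter, sketched below. Here "token" means a unit of request budget, not an LLM token; the rate and burst size are hypothetical, and some providers also limit tokens per minute, which this sketch does not model.

```python
import time


class TokenBucket:
    """Client-side limiter: allows bursts up to `capacity`, refilling at `rate` requests per second."""

    def __init__(self, rate: float, capacity: int) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        """Return True if a request may be sent now, consuming one unit of budget."""
        now = time.monotonic()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=2, capacity=5)  # hypothetical limit: ~2 requests/second, bursts of 5
allowed = [bucket.try_acquire() for _ in range(8)]
print(allowed)  # the first 5 pass; the rest are refused until the bucket refills
```

Refused requests can be queued, retried after a delay, or routed to a fallback, which is where rate limits turn into a feature-design question.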
AI Observability
The practice of monitoring AI system behavior in production (tracking inputs, outputs, latency, errors, and quality metrics) to detect and diagnose issues.
Reliability
The consistency and stability of AI system outputs over time. High reliability means the system produces expected results with low error rates under real-world conditions.
7. Retrieval & Memory Systems
Retrieval-Augmented Generation (RAG)
A technique that enhances LLM responses by first retrieving relevant documents from an external knowledge base, then including that content in the model's context. Reduces hallucination and enables up-to-date knowledge without fine-tuning.
Vector Database
A database optimized for storing and querying high-dimensional vector embeddings. Powers semantic search and retrieval in AI applications. Examples: Pinecone, Weaviate, pgvector.
Embedding
A numerical representation of text (or other data) as a vector in high-dimensional space. Similar meanings produce similar vectors. Embeddings power semantic search and retrieval.
Semantic Search
Search that finds results based on meaning rather than exact keyword match. Powered by embeddings and vector databases.
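Semantic search boils down to comparing embedding vectors, most often with cosine similarity. In this sketch the three-dimensional vectors are invented for illustration (real embeddings come from an embedding model and have hundreds or thousands of dimensions), and a vector database would replace the brute-force `sorted` call at scale.

```python
import math

# Toy "embeddings" invented for illustration; real ones come from an embedding model.
DOCS = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Update billing information": [0.1, 0.9, 0.1],
    "Recover access to my account": [0.8, 0.2, 0.1],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the vectors point the same way; values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def semantic_search(query_vec: list[float], k: int = 2) -> list[str]:
    """Rank documents by vector similarity to the query, not by keyword overlap."""
    ranked = sorted(DOCS, key=lambda doc: cosine_similarity(query_vec, DOCS[doc]), reverse=True)
    return ranked[:k]


# Hypothetical embedding of the query "I forgot my login"
print(semantic_search([0.85, 0.15, 0.05]))
# ['How do I reset my password?', 'Recover access to my account']
```

Neither top result contains the word "login", yet both outrank the billing document because their vectors point in a similar direction to the query's.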
Grounding
Connecting an AI model's outputs to verifiable, real-world data sources to improve accuracy and reduce hallucination. RAG is a form of grounding.
Memory
A component of an AI system that persists information across sessions or interactions, enabling the model to "remember" prior context. Types include short-term (session), long-term (user profile), and episodic memory.
Knowledge Graph
A structured database of entities and their relationships. AI systems can query it to retrieve structured facts, which is more precise than vector search for certain tasks.
Index
In retrieval systems, a data structure that enables fast lookup of relevant documents or embeddings given a query.
Chunking
Splitting long documents into smaller segments before embedding and indexing, so retrieval returns focused, relevant passages rather than entire documents.
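A minimal word-based chunker with overlap is sketched below, under the assumption that whitespace-separated words are a good-enough unit. Many real pipelines split by tokens, sentences, or document structure instead, and the default sizes here are arbitrary.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words of context."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the document
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
print(chunk_words(doc, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.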
Reranking
A second-stage retrieval step that re-scores and reorders retrieved documents for relevance before passing them to the model. Improves retrieval quality.
8. AI Product Development Terms
AI-Native Product
A product built from the ground up around AI capabilities, not a traditional product with AI bolted on. AI-native products have fundamentally different design, UX, and architecture patterns.
AI Feature
A product capability powered by AI, such as smart search, content generation, classification, or recommendation.
Prompt Template
A reusable, parameterized prompt structure that can be filled with dynamic values at runtime. Central to building consistent, maintainable AI features.
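Python's standard `string.Template` is enough to illustrate the idea. The template text and field names below are hypothetical; `substitute` raises `KeyError` when a placeholder is left unfilled, which catches missing values before a malformed prompt reaches the model.

```python
from string import Template

# Hypothetical template for a support-reply feature; $-placeholders are filled at runtime.
SUPPORT_REPLY = Template(
    "You are a support assistant for $product.\n"
    "Answer in a $tone tone using only the context below.\n"
    "Context:\n$context\n\n"
    "Customer question: $question"
)

prompt = SUPPORT_REPLY.substitute(
    product="Acme Invoicing",
    tone="friendly",
    context="Refunds are processed within 5 business days.",
    question="When will I get my refund?",
)
print(prompt)
```

Keeping the template in one place means a wording change ships to every request consistently, and the template itself can be versioned and evaluated like code.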
Guardrails
Constraints applied to an AI model's inputs or outputs to enforce safety, quality, and policy compliance. Examples: content filtering, output validation, topic restrictions.
Fallback
A behavior defined for when an AI system fails, produces low-confidence output, or hits a rate limit. Good AI product design always includes graceful fallbacks.
Confidence Score
A numeric value indicating how certain a model is about its output. Used to trigger fallbacks, human review, or escalation logic.
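Confidence scores typically feed a simple routing rule like the one below. The two thresholds are hypothetical and would normally be tuned against eval data; model-reported confidence is not always well calibrated, so treat the numbers as a starting point.

```python
def route_output(answer: str, confidence: float,
                 auto_threshold: float = 0.85, review_threshold: float = 0.5) -> str:
    """Decide what to do with a model output based on its confidence score (thresholds are hypothetical)."""
    if confidence >= auto_threshold:
        return f"show: {answer}"
    if confidence >= review_threshold:
        return "queue for human review"
    return "fallback: I'm not sure. Connecting you with a support agent."


for score in (0.92, 0.6, 0.2):
    print(score, route_output("Your plan renews on March 1.", score))
```

Three tiers map onto three terms from this section: show the answer, hand off to human review, or use a graceful fallback.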
Streaming
Delivering model output token by token as it's generated, rather than waiting for the full response. Significantly improves perceived performance for long outputs.
AI Product Strategy
The framework for deciding when, where, and how to use AI in a product, including model selection, build vs. buy decisions, and user experience tradeoffs.
Evaluation (Evals)
The process of measuring AI system quality (accuracy, helpfulness, safety, and consistency) systematically and at scale. Evals are critical for safe deployment.
Red Teaming
Deliberately attempting to find failure modes, safety issues, or prompt injection vulnerabilities in an AI system before deployment.
Shadow Mode
Running an AI feature in production without surfacing outputs to users, using real traffic to evaluate quality before a full launch.
9. AI Safety & Governance Terms
AI Alignment
The challenge of ensuring AI systems pursue goals that are consistent with human values and intentions. A core concern as AI systems become more autonomous.
Hallucination
When an AI model generates plausible-sounding but factually incorrect information. A fundamental limitation of LLMs and a key risk to manage in product design.
Bias
Systematic errors in model outputs caused by skewed training data or model design. Can manifest as unfair treatment of specific groups or topics.
AI Governance
The policies, processes, and structures organizations use to manage AI risk, compliance, fairness, and accountability.
Content Moderation
Automated or human review processes that detect and filter unsafe, harmful, or policy-violating AI outputs before they reach users.
Responsible AI
A framework for developing and deploying AI ethically, considering fairness, transparency, safety, privacy, and accountability.
Data Privacy
The protection of user data used in AI systems (covering collection, storage, usage, and deletion) in compliance with regulations like GDPR and CCPA.
Model Card
A standardized document that describes a model's intended use, training data, evaluation results, limitations, and ethical considerations. A governance best practice.
AI Audit
A systematic review of an AI system's behavior, data practices, and decision-making processes, often required in regulated industries.
Watermarking
Techniques for embedding imperceptible markers in AI-generated content to enable later detection of AI origin. Increasingly required by regulation.
10. AI Analytics & Evaluation Terms
Eval Suite
Structured testing to assess AI output quality across dimensions like accuracy, coherence, safety, and task completion. Fills the role unit tests play in traditional software, since non-deterministic model outputs rarely suit exact-match tests alone.
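An eval suite can start as nothing more than labeled cases plus a scoring loop. This sketch uses a keyword-based stand-in instead of a real model call and a hypothetical release threshold; the deliberately failing case shows how an eval surfaces regressions before users do.

```python
from typing import Callable

# Hypothetical labeled test cases for a ticket classifier.
EVAL_CASES = [
    {"input": "I was charged twice", "expected": "billing"},
    {"input": "Can't log in after reset", "expected": "login"},
    {"input": "Refund still pending", "expected": "billing"},
]
RELEASE_THRESHOLD = 0.9  # hypothetical quality bar


def run_eval(model: Callable[[str], str], cases: list[dict]) -> float:
    """Score any model callable on the cases and return accuracy between 0 and 1."""
    passed = sum(1 for case in cases if model(case["input"]) == case["expected"])
    return passed / len(cases)


def keyword_classifier(text: str) -> str:
    """Stand-in for a real model call so the harness runs offline."""
    lowered = text.lower()
    if "log in" in lowered:
        return "login"
    if "charge" in lowered:
        return "billing"
    return "other"


score = run_eval(keyword_classifier, EVAL_CASES)
print(f"accuracy: {score:.0%}")  # accuracy: 67%
print("ship" if score >= RELEASE_THRESHOLD else "investigate failures before shipping")
```

Because `run_eval` accepts any callable, the same cases can score a new prompt, a different model, or a fine-tuned variant side by side.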
LLM-as-a-Judge
Using a language model to evaluate the outputs of another language model: a scalable alternative to human evaluation for quality assessment.
Benchmarking
Running standardized tests to compare model performance across providers, versions, or configurations. Helps inform model selection decisions.
A/B Testing
Comparing two AI configurations (different models, prompts, or parameters) with live users to determine which produces better outcomes.
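For A/B tests on AI features, users are usually assigned to a variant deterministically so they see a consistent experience across visits. The sketch below hashes the experiment name plus user ID with SHA-256 from Python's `hashlib`; the identifiers and variant names are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("control", "treatment")) -> str:
    """Deterministically bucket a user so they always get the same AI configuration."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


print(assign_variant("user-42", "prompt-v2-test"))
```

Including the experiment name in the hash keeps assignments independent across experiments, so the same users don't always land in the treatment group.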
P95 / P99 Latency
The response time within which 95% (or 99%) of requests complete. More informative than average latency for understanding real-world user experience, because it exposes the slow tail.
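The gap between average and tail latency is easy to see with a small synthetic sample. This sketch uses the nearest-rank percentile definition (monitoring tools may interpolate instead) and made-up latencies for 100 requests.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


latencies_ms = [120] * 90 + [400] * 8 + [2500] * 2  # hypothetical: 100 requests with a slow tail
print(sum(latencies_ms) / len(latencies_ms))  # 190.0 (mean)
print(percentile(latencies_ms, 95), percentile(latencies_ms, 99))  # 400 2500
```

The 190 ms mean looks healthy, but one request in fifty takes 2.5 seconds, which is exactly what p99 surfaces.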
Token Usage Tracking
Tracking how many tokens your AI features consume per request, per user, and per feature. Essential for cost management and capacity planning.
Model Drift
When model outputs change over time as underlying models are updated by providers, or as user input patterns shift. Monitoring for drift is a key AI ops practice.
Failure Mode Analysis
Systematically identifying the ways an AI feature can produce incorrect, harmful, or unhelpful outputs, and designing mitigations.
11. Enterprise AI Terms
Enterprise LLM
An LLM deployment configured for enterprise use, with data privacy guarantees, access controls, compliance features, and SLA commitments.
Self-Hosted Deployment
Running an AI model on infrastructure controlled by the organization, rather than via a public cloud API. Preferred when data sensitivity is high.
The end-to-end process for adapting a foundation model to an organization's specific data and use case — including data preparation, training, evaluation, and deployment.
An internal team responsible for establishing AI standards, tooling, governance, and enablement across an organization.
RAG Pipeline
The full infrastructure for ingesting, chunking, embedding, indexing, and querying documents in a RAG system.
AI Vendor Evaluation
The process of evaluating, selecting, and contracting with AI model providers and tooling vendors. Involves legal, security, and compliance review.
Access Control
Governing which users or teams can access which AI features, models, or data. Critical for enterprise security and compliance.
12. Emerging AI Trends for 2026
Compound AI System
A product architecture that combines multiple AI models, tools, retrievers, and logic layers, rather than relying on a single model call, to accomplish complex tasks reliably.
AI UX Patterns
User experience design patterns built specifically for AI-powered interactions: managing uncertainty, streaming output, transparency, and human override.
Ambient AI
AI that operates continuously in the background, monitoring context and taking action without requiring explicit user commands.
Emerging architectures that combine long-chain reasoning models with dynamic retrieval, enabling AI to reason over large, live knowledge bases.
On-Device AI
Running AI models directly on user devices (phones, laptops) rather than in the cloud, enabling offline use, lower latency, and improved privacy.
Agent Memory
Persistent storage that allows AI agents to remember past actions, user preferences, and prior context across multiple sessions.
Agent Communication Protocols
Protocols that allow multiple AI agents to exchange information, delegate tasks, and coordinate actions, enabling more complex multi-agent workflows.
Test-Time Compute
Using additional compute at the moment of inference (rather than only at training) to improve model reasoning quality. Powers modern reasoning models.
Model Routing
A system that dynamically selects the most appropriate model for a given task based on complexity, cost, latency, or capability requirements.
Human-Agent Collaboration
Product design patterns for workflows where humans and AI agents collaborate dynamically: sharing tasks, escalating edge cases, and dividing decision authority.
If you're building your AI vocabulary and want to start with the highest-impact terms, focus here:
"AI is just ChatGPT." ChatGPT is one product built on one foundation model. The AI landscape includes hundreds of models, architectures, and deployment patterns. Equating AI with a single product limits your strategic thinking.
"Prompt engineering is enough." Prompt engineering matters, but production AI quality depends on context design, retrieval systems, evaluation infrastructure, guardrails, and more. A better prompt is not a substitute for a well-architected system.
"AI agents are fully autonomous and reliable." Current AI agents are impressive but fail in unpredictable ways. Production agentic systems require human checkpoints, error handling, and fallback logic. Plan for failure, not just success.
"Bigger models always mean better products." Larger models are generally more capable, but not necessarily better for your specific use case. A smaller, fine-tuned model can outperform a frontier model on domain-specific tasks at a fraction of the cost.
"AI replaces product strategy." AI is a capability, not a strategy. The PMs who will win are those who pair AI fluency with sharp user understanding, clear judgment, and strong execution, not those who outsource thinking to models.
| Dimension | Prompt Engineering | Context Engineering |
| --- | --- | --- |
| Scope | The input prompt | The full model context |
| Includes | Instructions, examples, framing | System prompts, retrieved docs, history, tool results |
| Maturity | Established practice | Emerging discipline |
| Impact | Moderate | High, especially for complex systems |

| Dimension | RAG | Fine-Tuning |
| --- | --- | --- |
| Use Case | Access to up-to-date or proprietary data | Adapting model behavior/style for a domain |
| Cost | Lower upfront; retrieval infrastructure needed | Higher (training compute + data prep) |
| Freshness | Real-time | Static until retrained |
| Risk | Retrieval quality matters | Overfitting, data quality risks |
| Best For | Knowledge-heavy Q&A, document search | Consistent tone, specialized tasks |

| Dimension | AI Agent | Traditional Automation |
| --- | --- | --- |
| Logic | Dynamic, model-driven | Static, rule-based |
| Handles Ambiguity | Yes | No |
| Tool Use | Flexible | Predefined |
| Failure Mode | Unpredictable | Predictable |
| Best For | Open-ended tasks, judgment calls | Repetitive, well-defined processes |

| Dimension | Open-Source | Closed-Source |
| --- | --- | --- |
| Data Privacy | Full control | Shared with provider |
| Cost | Infrastructure only | Per-token API pricing |
| Customization | Full fine-tuning access | Limited |
| Capability (frontier) | Slightly behind | Leading edge |
| Deployment | Self-managed | Managed by provider |
What AI terms should product managers know? PMs should prioritize: LLM, token, context window, RAG, embeddings, hallucination, inference cost, AI agents, guardrails, and evals. These terms connect directly to product decisions about architecture, cost, safety, and quality.
What is RAG in AI? RAG (Retrieval-Augmented Generation) is a technique where an AI model retrieves relevant documents from an external knowledge base before generating a response. It grounds the model in real, up-to-date information, reducing hallucination and enabling domain-specific knowledge without fine-tuning.
What is the difference between AI agents and chatbots? Chatbots respond to individual messages in a turn-by-turn format. AI agents pursue goals autonomously over multiple steps: taking actions, using tools, making decisions, and adapting based on results. Agents are significantly more capable and more complex.
What is context engineering? Context engineering is the practice of designing and managing everything that goes into an AI model's context window — not just the prompt, but retrieved documents, conversation history, tool outputs, and system instructions. It's a broader, more systematic discipline than prompt engineering.
What is an embedding? An embedding is a numerical vector representation of text. Words or passages with similar meanings produce similar vectors. Embeddings power semantic search, recommendation systems, and retrieval in RAG architectures.
What is MCP? MCP (Model Context Protocol) is an open standard that defines how AI models connect to external tools, data sources, and services. It enables interoperable, reusable integrations so an agent can use a tool built for one platform in another without custom code.
What AI concepts matter most in 2026? In 2026, the most strategically important AI concepts for PMs are: agentic AI architecture, context engineering, inference economics, evaluation infrastructure, and AI observability. These are the concepts that determine whether AI features ship safely, scale efficiently, and improve over time.
AI literacy is no longer a nice-to-have for product managers. It is a professional baseline, as fundamental today as knowing how to write a PRD or run a user interview.
The 120 terms in this glossary are not academic. They are the vocabulary of the products being built right now, and the decisions being made in product reviews, architecture discussions, and roadmap sessions every week. Understanding these concepts won't make you an AI engineer. But it will make you a better partner to your engineering team, a more credible voice in AI strategy conversations, and a more effective decision-maker on the products you own.
The field will continue to evolve. New terms will emerge. Architectures will shift. But the underlying patterns (models, context, memory, retrieval, agents, evaluation) will remain the conceptual bedrock of AI product management for years to come.
Keep learning. Keep building. Stay curious.
"The future product managers won't necessarily be AI engineers. But they will be fluent in the language of intelligent systems."
