Glossary
This glossary defines key terms used throughout the workshops. Terms are organized alphabetically for quick reference.
A
Absence Blindness
The tendency to focus on what you can see (like generation quality) while ignoring what you cannot easily observe (like retrieval failures). Teams often spend weeks fine-tuning prompts without checking whether retrieval returns relevant documents in the first place.
Example: A team optimizes their prompt for three weeks, only to discover their retrieval system returns completely irrelevant documents for 40% of queries.
See: Chapter 1
B
Bi-encoder
An embedding model architecture where queries and documents are encoded independently into vectors, then compared using similarity metrics like cosine distance. Fast at query time because document embeddings can be precomputed, but less accurate than cross-encoders for ranking.
Contrast with: Cross-encoder, Re-ranker
See: Chapter 2
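A minimal sketch of the bi-encoder pattern, assuming a hypothetical `embed()` stub in place of a trained encoder. The point is the structure: document vectors are computed once at write time, and each query costs one encoding plus cheap vector math.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a real bi-encoder; returns a deterministic random unit vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.normal(size=384)           # illustrative 384-dim embedding
    return vec / np.linalg.norm(vec)     # unit-normalized, so dot product = cosine similarity

# Write time: encode every document once and store the matrix.
documents = ["How Python garbage collection works", "Habitats of ball pythons"]
doc_matrix = np.stack([embed(d) for d in documents])

# Query time: encode only the query, then rank documents by cosine similarity.
query_vec = embed("Python memory management")
scores = doc_matrix @ query_vec
ranking = sorted(zip(documents, scores.tolist()), key=lambda pair: -pair[1])
```

With the stub the scores are meaningless; what matters is that only the query is encoded at request time.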
C
Cold Start Problem
The challenge of building and improving a RAG system before you have real user data. Solved through synthetic data generation—creating realistic test queries from your document corpus.
Example: Generating 200 synthetic queries from legal case documents to establish baseline metrics before launching to users.
See: Chapter 1
Contrastive Learning
A training approach where models learn to distinguish between similar and dissimilar examples. For embeddings, this means training on triplets of (query, positive document, negative document) so the model learns to place queries closer to relevant documents in vector space.
See: Chapter 2
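A toy sketch of the triplet objective described above, using small numpy vectors in place of real embeddings; the 0.2 margin is illustrative, not a recommended setting.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(query, positive, negative, margin: float = 0.2) -> float:
    """Penalize the model unless the query is at least `margin` closer
    (in cosine similarity) to the positive document than to the negative one."""
    return max(0.0, margin - cosine(query, positive) + cosine(query, negative))

# Toy stand-ins for embeddings of (query, relevant doc, hard negative doc).
q = np.array([1.0, 0.0])
pos = np.array([0.9, 0.1])
neg = np.array([0.5, 0.8])
print(triplet_loss(q, pos, neg))  # 0.0 here, because the positive is already much closer
```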
Cross-encoder
A model architecture that processes query and document together as a single input, producing a relevance score. More accurate than bi-encoders but much slower because it cannot precompute document representations.
Contrast with: Bi-encoder
See: Chapter 2
D
Data Flywheel
A self-reinforcing cycle where user interactions generate data that improves the system, which attracts more users, generating more data. The core concept of this workshop series.
User Interactions → Data Collection → System Improvements → Better UX → More Users → ...
E
Embedding
A dense vector representation of text (or other content) that captures semantic meaning. Similar texts have similar embeddings, enabling semantic search through vector similarity.
Related: Vector database, Cosine similarity
Embedding Alignment
The match between what your queries ask about and what information your embeddings capture. If you embed only the first message of conversations but search for conversation patterns, you have an alignment problem—the embeddings do not contain the information the queries seek.
Example: Embedding product descriptions but searching for "products similar to what I bought last month" fails because purchase history is not in the embeddings.
See: Chapter 5
Experiment Velocity
The rate at which you can test hypotheses about your RAG system. The most important leading metric for early-stage systems. Teams that run 10 experiments per week improve faster than teams that run 1 experiment per month.
See: Chapter 1
F
Few-shot Learning
Providing examples in the prompt to guide model behavior. For routing, 10 examples might achieve 88% accuracy while 40 examples reach 95%.
See: Chapter 6
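A small sketch of few-shot prompting for routing: labeled examples are inlined into the prompt so the model can imitate the mapping. The queries and route names below are made up for illustration.

```python
# Hypothetical labeled examples: (user query, route name).
EXAMPLES = [
    ("How do I reset my password?", "account_docs"),
    ("What was my invoice total last month?", "billing_api"),
    ("Summarize my open support tickets", "support_tickets"),
]

def build_routing_prompt(query: str) -> str:
    shots = "\n\n".join(f"Query: {q}\nRoute: {r}" for q, r in EXAMPLES)
    return (
        "Route each query to exactly one tool.\n\n"
        f"{shots}\n\n"
        f"Query: {query}\nRoute:"
    )

print(build_routing_prompt("Update my credit card on file"))
```

Moving from 10 to 40 examples (the comparison above) means extending `EXAMPLES`; the trade-off is prompt length versus routing accuracy.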
H
Hard Negative
A document that appears relevant based on surface features (keywords, topic) but is actually not helpful for answering a specific query. Hard negatives are the most valuable training examples for improving retrieval because they teach the model subtle distinctions.
Example: For the query "Python memory management," a document about "Python snake habitats" is an easy negative (obviously wrong). A document about "Python garbage collection in version 2.7" when the user needs Python 3.11 information is a hard negative (seems relevant but is not).
Contrast with: Easy negative (completely unrelated documents)
Hybrid Search
Combining lexical search (keyword matching) with semantic search (embedding similarity). Often outperforms either approach alone because lexical search handles exact matches and rare terms while semantic search handles paraphrasing and conceptual similarity.
See: Chapter 1
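One common way to combine the two result lists is reciprocal rank fusion (RRF), sketched below; the definition above describes hybrid search generally, so treat RRF as one option rather than the prescribed method. The constant `k=60` is a conventional default.

```python
# Merge a lexical (e.g. BM25) ranking and a semantic (embedding) ranking with
# reciprocal rank fusion: each list contributes 1 / (k + rank) per document.

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc3", "doc1", "doc7"]    # keyword-match ranking
semantic = ["doc1", "doc5", "doc3"]   # embedding-similarity ranking
print(reciprocal_rank_fusion([lexical, semantic]))  # doc1 and doc3 end up on top
```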
I
Implicit Feedback
Signals about user satisfaction derived from behavior rather than explicit ratings. Includes query refinements (user rephrases immediately), abandonment, dwell time, citation clicks, and copy actions.
Contrast with: Explicit feedback (thumbs up/down, ratings)
See: Chapter 3
Intervention Bias
The tendency to make changes just to feel like progress is being made, without measuring impact. Manifests as constantly switching models, tweaking prompts, or adding features without clear hypotheses.
See: Chapter 1
Inventory Problem
When a RAG system fails because the answer does not exist in the knowledge base—not because retrieval failed. No amount of better embeddings or re-ranking can fix missing data.
Contrast with: Capabilities problem (answer exists but system cannot find it)
See: Chapter 0
L
Lagging Metric
An outcome metric you care about but cannot directly control: user satisfaction, churn rate, revenue. Like body weight—easy to measure, hard to change directly.
Contrast with: Leading metric
See: Chapter 1
Leading Metric
An actionable metric that predicts future performance and that you can directly influence: experiment velocity, evaluation coverage, feedback collection rate. Like calories consumed—you have direct control.
Contrast with: Lagging metric
See: Chapter 1
P
Precision
Of the documents you retrieved, what percentage were actually relevant? If you returned 10 documents but only 2 were relevant, precision is 20%.
Formula: Precision = |Relevant ∩ Retrieved| / |Retrieved|
Contrast with: Recall
See: Chapter 1
Precision@K
Precision calculated for the top K results. Precision@5 means: of the top 5 documents returned, how many were relevant?
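A minimal implementation over a single query, assuming document ids and a known set of relevant ids:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k

# 2 relevant docs exist, 1 appears in the top 5 -> precision@5 = 0.2
print(precision_at_k(["d1", "d2", "d3", "d4", "d5"], {"d2", "d9"}, k=5))
```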
Q
Query Routing
Directing user queries to the appropriate specialized retriever or tool based on query characteristics. A router that achieves 95% accuracy with retrievers at 82% accuracy yields 78% end-to-end success (0.95 × 0.82).
See: Chapter 6
R
RAG (Retrieval-Augmented Generation)
A pattern where relevant documents are retrieved from a knowledge base and provided as context to a language model for generating responses. Combines the knowledge storage of search systems with the language capabilities of LLMs.
RAPTOR
Recursive Abstractive Processing for Tree-Organized Retrieval. A technique for handling long documents by creating hierarchical summaries—summaries of summaries—enabling retrieval at different levels of abstraction.
See: Chapter 5
Recall
Of all the relevant documents that exist, what percentage did you find? If there are 10 relevant documents and you found 4, recall is 40%.
Formula: Recall = |Relevant ∩ Retrieved| / |Relevant|
Contrast with: Precision
See: Chapter 1
Recall@K
Recall calculated when retrieving K documents. Recall@10 means: if you retrieve 10 documents, what percentage of all relevant documents did you find?
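The recall counterpart of the Precision@K sketch above, under the same assumption of known relevant-document ids per query:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    found = sum(doc in relevant for doc in retrieved[:k])
    return found / len(relevant) if relevant else 0.0

# 2 relevant docs exist, 1 was retrieved in the top 3 -> recall@3 = 0.5
print(recall_at_k(["d1", "d2", "d3"], {"d2", "d9"}, k=3))
```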
Re-ranker
A model that re-scores retrieved documents to improve ranking. Typically a cross-encoder that is more accurate but slower than the initial bi-encoder retrieval. Applied to top-N results (e.g., retrieve 50, re-rank to top 10).
Typical improvement: 12-20% at top-5
See: Chapter 2
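A sketch of the retrieve-then-re-rank step; `cross_encoder_score()` is a stub standing in for a real cross-encoder model, and the keyword-overlap scoring exists only to keep the example runnable.

```python
def cross_encoder_score(query: str, document: str) -> float:
    """Stub: jointly score (query, document). Replace with a real cross-encoder."""
    return sum(1.0 for word in query.lower().split() if word in document.lower())

def rerank(query: str, candidates: list[str], top_k: int = 10) -> list[str]:
    scored = sorted(candidates, key=lambda doc: cross_encoder_score(query, doc), reverse=True)
    return scored[:top_k]

# Typical usage: first-stage retrieval returns ~50 candidates cheaply,
# and the re-ranker keeps only the 10 best for the prompt.
```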
S
Semantic Cache
A cache that returns stored responses for queries that are semantically similar (not just identical) to previous queries. Requires setting a similarity threshold (e.g., 0.95 cosine similarity).
See: Chapter 7
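A minimal sketch of the lookup logic, assuming unit-normalized query embeddings so a dot product gives cosine similarity; the 0.95 threshold mirrors the example above and is not a universal setting. A production cache would use a vector index rather than a linear scan.

```python
import numpy as np

class SemanticCache:
    """Return a cached response when a new query embedding is similar enough
    to a previously seen one; otherwise signal a miss."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []

    def get(self, query_vec: np.ndarray) -> str | None:
        for cached_vec, response in self.entries:
            if float(cached_vec @ query_vec) >= self.threshold:  # cosine similarity
                return response
        return None  # cache miss: run the full RAG pipeline, then put() the result

    def put(self, query_vec: np.ndarray, response: str) -> None:
        self.entries.append((query_vec, response))
```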
Synthetic Data
Artificially generated evaluation data, typically created by having an LLM generate questions that a document chunk should answer. Used to overcome the cold start problem and establish baselines before real user data exists.
See: Chapter 1
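A sketch of the generation loop; `call_llm()` is a stub for whichever model provider you use, and the prompt wording and sample output are illustrative.

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a real chat-model call; replace with your provider's client."""
    return "- What is the refund window for annual plans?\n- Can I get a refund after 30 days?"

def generate_questions(chunk: str, n: int = 3) -> list[str]:
    prompt = (
        "Here is a passage from our knowledge base:\n\n"
        f"{chunk}\n\n"
        f"Write {n} realistic user questions that this passage answers, one per line."
    )
    raw = call_llm(prompt)
    return [line.lstrip("-* ").strip() for line in raw.splitlines() if line.strip()]

# Each (question, source chunk) pair becomes an evaluation example:
# the retriever should return that chunk for that question.
```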
T
Trellis Framework
A framework for organizing production monitoring of AI systems: (1) Discretize infinite outputs into specific buckets, (2) Prioritize by Volume × Negative Sentiment × Achievable Delta × Strategic Relevance, (3) Recursively refine within buckets.
See: Chapter 1
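The prioritization score as plain arithmetic, with all factors on a 0-1 scale; the scale and the example numbers are assumptions for illustration.

```python
def priority_score(volume: float, negative_sentiment: float,
                   achievable_delta: float, strategic_relevance: float) -> float:
    """Higher score = fix this bucket first."""
    return volume * negative_sentiment * achievable_delta * strategic_relevance

# A bucket with 30% of traffic, 45% negative sentiment, an estimated 0.20
# achievable delta, and 0.8 strategic relevance:
print(priority_score(0.30, 0.45, 0.20, 0.8))  # ≈ 0.022
```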
Two-Level Performance Formula
For systems with routing to specialized retrievers, overall success = P(correct router) × P(correct retrieval | correct router). A 95% router with 82% retrieval yields 78% overall, while a 67% router with 80% retrieval yields only 54%.
See: Chapter 6
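The formula restated as code, with the two examples from the definition:

```python
def end_to_end_success(p_router: float, p_retrieval_given_correct_route: float) -> float:
    """Overall success = P(correct route) * P(correct retrieval | correct route)."""
    return p_router * p_retrieval_given_correct_route

print(end_to_end_success(0.95, 0.82))  # ≈ 0.78
print(end_to_end_success(0.67, 0.80))  # ≈ 0.54
```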
V
Vector Database
A database optimized for storing and querying high-dimensional vectors (embeddings). Supports approximate nearest neighbor search to find similar vectors efficiently.
Examples: Pinecone, ChromaDB, pgvector, LanceDB, Weaviate
See: Chapter 1
W
Write-time vs Read-time Computation
A fundamental architectural trade-off. Write-time computation (preprocessing) increases storage costs but improves query latency. Read-time computation (on-demand) reduces storage but increases latency. Choose based on content stability and latency requirements.
See: Chapter 7
Quick Reference: Key Formulas
| Metric | Formula | Use Case |
|---|---|---|
| Precision@K | Relevant in top K / K | Measuring result quality |
| Recall@K | Relevant in top K / Total relevant | Measuring coverage |
| End-to-end success | P(router) × P(retrieval) | System performance |
| Prioritization score | Volume × (1 - Satisfaction) × Delta × Relevance | Roadmap planning |
Return to Workshop Index