Top-k Sampling
A decoding strategy that restricts token selection to the k most probable next tokens, filtering out unlikely candidates to balance output quality with diversity.
Top-k sampling is a straightforward approach to controlling the randomness of LLM output. At each generation step, only the k tokens with the highest probabilities are considered; their probabilities are renormalized, and the model samples from this reduced set. With k=1, it reduces to greedy decoding (always picking the single most likely token). With k=50, it samples from 50 candidates, allowing varied but still plausible outputs.
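The mechanics can be sketched in a few lines of NumPy. This is an illustrative implementation, not any particular library's API; the function name and signature are assumptions.

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample a token id from the k highest-probability candidates.

    `logits` is a 1-D array of unnormalized scores. Illustrative
    sketch only; not a specific framework's sampling API.
    """
    rng = rng or np.random.default_rng()
    # Indices of the k largest logits.
    top_indices = np.argsort(logits)[-k:]
    top_logits = logits[top_indices]
    # Softmax over the reduced set only, which renormalizes the
    # probability mass across the surviving k candidates.
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    return int(rng.choice(top_indices, p=probs))

# With k=1 this reduces to greedy decoding: the argmax of the logits.
logits = np.array([2.0, 0.5, -1.0, 3.0])
assert top_k_sample(logits, k=1) == 3
```

Note that the softmax is computed after the cut, so the k survivors always sum to probability 1 regardless of how much mass the discarded tokens held.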
The limitation of top-k is its fixed candidate pool size. When the model is highly confident, k=50 might include many irrelevant tokens that dilute quality. When the model faces a genuinely ambiguous choice among many viable options, k=50 might exclude valid candidates. This inflexibility led to the development of top-p sampling as an adaptive alternative.
Despite this limitation, top-k remains useful in practice, especially in combination with other sampling strategies. A common configuration applies both top-k and top-p: top-k first reduces the pool to the k most likely candidates, then top-p keeps only the smallest subset of those whose cumulative probability reaches p. This layered approach provides robust quality control. For most production applications, a top-k between 20 and 100 combined with a top-p of 0.9 and a moderate temperature gives reliable results.
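The layered pipeline described above can be sketched as follows. Parameter names and defaults here are illustrative assumptions, not a specific library's interface.

```python
import numpy as np

def top_k_top_p_sample(logits, k=50, p=0.9, temperature=1.0, rng=None):
    """Apply top-k, then top-p (nucleus) filtering, then sample.

    A sketch of the combined configuration: temperature-scaled
    softmax, fixed-size top-k cut, adaptive top-p cut, renormalize.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Step 1: keep only the k most probable tokens, highest first.
    order = np.argsort(probs)[::-1][:k]

    # Step 2: within those, keep the smallest prefix whose
    # cumulative probability reaches p (always at least one token).
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    kept = order[:cutoff]

    # Renormalize over the surviving candidates and sample.
    kept_probs = probs[kept] / probs[kept].sum()
    return int(rng.choice(kept, p=kept_probs))
```

Either filter alone can act as the binding constraint: when the distribution is peaked, the top-p cut removes most of the k candidates; when it is flat, the top-k cut caps the pool size.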
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.