Tokenization
The process of splitting text into smaller units (tokens) that an LLM can process, typically subword pieces averaging about 4 characters per token.
Tokenization is the first step in any LLM pipeline: converting human-readable text into a sequence of integer IDs that the model can process. Modern tokenizers, most commonly based on Byte Pair Encoding (BPE), split text into subword pieces, balancing vocabulary size with coverage. Common words like "the" get their own token, while rare words are split into pieces: "tokenization" might become "token" + "ization."
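The merge loop at the heart of BPE training can be sketched in a few lines. This is a toy trainer for illustration only (function names are mine, not from any real library): it starts from individual characters and repeatedly merges the most frequent adjacent pair, which is how subwords like "token" emerge from raw characters.

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

def learn_bpe(corpus, num_merges):
    """Learn `num_merges` BPE merge rules from a list of words, in order."""
    words = Counter(tuple(w) for w in corpus)  # each word starts as characters
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        words = merge_pair(words, best)
    return merges
```

Running `learn_bpe(["token", "tokens", "tokenize"], 4)` merges `t+o`, then `to+k`, `tok+e`, and `toke+n`, so the shared stem "token" becomes a single symbol while the rarer suffixes stay split. Production tokenizers add byte-level fallbacks, special tokens, and pre-tokenization rules on top of this core idea.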
Understanding tokenization matters for practical reasons. LLM pricing is per token, not per word — and a token is roughly 3/4 of a word in English. Context window limits are in tokens, so a 128K token window holds roughly 96K words. Non-English languages and code often tokenize less efficiently (more tokens per word), meaning they cost more and fill context windows faster.
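The rules of thumb above translate directly into back-of-the-envelope estimates. A minimal sketch, where the prices are hypothetical placeholders (not any provider's actual rates) and the character heuristic is the ~4 chars/token average for English prose:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from character length (~4 chars/token for English).
    Non-English text and code usually need more tokens than this suggests."""
    return max(1, round(len(text) / 4))

def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  price_in_per_1k: float = 0.003,    # hypothetical $/1K input tokens
                  price_out_per_1k: float = 0.015):  # hypothetical $/1K output tokens
    """Estimated request cost in dollars, given per-1K-token prices."""
    return (prompt_tokens / 1000) * price_in_per_1k \
         + (completion_tokens / 1000) * price_out_per_1k
```

For exact counts you would run the model's own tokenizer, since estimates like this can be off by 20% or more on code or non-English text; but for budgeting at scale, the heuristic is usually close enough.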
For prompt engineering, token awareness helps you optimize costs: shorter prompts with the same meaning save money at scale. For RAG systems, chunk sizes should account for token limits, not just character or word counts. And for evaluation, token-level analysis helps you understand why a model produced unexpected output — sometimes it's the tokenizer splitting a word in an unexpected way.
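Token-aware chunking for RAG can be sketched as below. This uses the ~4 characters/token heuristic to set a character budget; a production system would count with the model's actual tokenizer rather than estimate:

```python
def chunk_by_tokens(text: str, max_tokens: int = 512, chars_per_token: int = 4):
    """Split text into chunks that fit a token budget, breaking on word boundaries.

    chars_per_token is a heuristic (~4 for English prose); swap in a real
    tokenizer's count for exact limits.
    """
    budget = max_tokens * chars_per_token  # approximate character budget
    chunks, current, current_len = [], [], 0
    for word in text.split():
        # +1 accounts for the joining space before this word
        if current and current_len + 1 + len(word) > budget:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(word)
        current_len += len(word) + (1 if current_len else 0)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Breaking on word boundaries keeps chunks readable; real pipelines usually go further and split on sentence or paragraph boundaries, with some overlap between chunks, so that retrieved passages stay coherent.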
Related Terms
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Transformer
The neural network architecture behind modern LLMs, using self-attention mechanisms to process and generate sequences of tokens in parallel.
Attention Mechanism
A neural network component that dynamically weights the relevance of different parts of the input sequence when producing each output token.
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
Further Reading
Understanding LLM Context Windows: What 128K Really Means
Context window size is more than just a number. Let's explore what it actually means for your applications.
LLM Cost Optimization: Cut Your API Bill by 80%
Spending $10K+/month on OpenAI or Anthropic? Here are the exact tactics that reduced our LLM costs from $15K to $3K/month without sacrificing quality.