Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Fine-tuning takes a general-purpose model and makes it an expert at your specific task. You provide hundreds to thousands of example input/output pairs, and the model's weights adjust to reproduce those patterns. The result is a model that handles your use case with more consistent style, better accuracy, and often lower latency than a larger prompted model.
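To make "example input/output pairs" concrete, here is a minimal sketch of preparing a training file. The chat-message JSONL layout follows the format used by OpenAI's fine-tuning API; the ticket-classification task, labels, and filename are hypothetical, and other providers use similar but not identical structures.

```python
import json

# Hypothetical examples for a support-ticket classifier. Each training
# example is a short conversation ending with the assistant output you
# want the model to learn to reproduce.
examples = [
    {"messages": [
        {"role": "system", "content": "Classify the ticket as billing, bug, or feature_request."},
        {"role": "user", "content": "I was charged twice this month."},
        {"role": "assistant", "content": "billing"},
    ]},
    {"messages": [
        {"role": "system", "content": "Classify the ticket as billing, bug, or feature_request."},
        {"role": "user", "content": "The export button crashes the app."},
        {"role": "assistant", "content": "bug"},
    ]},
]

# One JSON object per line -- the usual upload format for training files.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A real dataset repeats this pattern hundreds to thousands of times; the consistency of the assistant outputs matters as much as their volume.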
The decision to fine-tune versus prompt-engineer is one of the most important technical choices in AI product development. Fine-tuning requires significant upfront investment: curating a high-quality dataset (500-5,000+ examples), running training jobs ($500-$5,000+), and building evaluation pipelines. The payoff is a smaller, cheaper, faster model that nails your specific task.
Most teams should start with prompt engineering on a large model and only fine-tune when they hit clear limitations: inconsistent output format, inability to match a specific tone, cost too high for their volume, or latency requirements that demand a smaller model. The hybrid approach — a fine-tuned small model for high-volume tasks plus a prompted large model for edge cases — is increasingly common in production.
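The hybrid approach can be sketched as a simple router: routine, high-volume tasks go to the fine-tuned small model, and anything unfamiliar or low-confidence escalates to the prompted large model. The task names, model stubs, and confidence threshold below are all placeholders; a real system would plug its provider's API calls into the two stub functions.

```python
# Tasks the fine-tuned small model was trained on (hypothetical names).
ROUTINE_TASKS = {"classify_ticket", "extract_fields"}

def call_small_finetuned(task: str, text: str) -> tuple[str, float]:
    """Stub for the fine-tuned small model; returns (answer, confidence)."""
    return "billing", 0.93  # placeholder values

def call_large_prompted(task: str, text: str) -> str:
    """Stub for the prompted large model used as a fallback."""
    return "escalated"  # placeholder value

def route(task: str, text: str, threshold: float = 0.8) -> str:
    """Send routine, high-confidence work to the small model; escalate the rest."""
    if task in ROUTINE_TASKS:
        answer, confidence = call_small_finetuned(task, text)
        if confidence >= threshold:
            return answer
    # Unknown task or low confidence: fall back to the large model.
    return call_large_prompted(task, text)
```

The design choice is that the cheap path handles the bulk of traffic, while the threshold gives you a single dial for trading cost against quality on the edge cases.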
Related Terms
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.
Transformer
The neural network architecture behind modern LLMs, using self-attention mechanisms to process entire sequences of tokens in parallel.
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings, with low-latency (typically approximate) nearest-neighbor search.
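The embeddings and vector-database entries above both hinge on the same operation: comparing vectors by similarity. A toy sketch of cosine similarity in plain Python, with made-up 3-dimensional "embeddings" standing in for the hundreds of dimensions a real embedding model produces:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors; real embeddings come from a model, not by hand.
query = [0.1, 0.9, 0.2]
docs = {
    "refund policy": [0.2, 0.8, 0.1],
    "release notes": [0.9, 0.1, 0.3],
}

# Retrieval is just "which stored vector is most similar to the query?"
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
```

A vector database does exactly this comparison, but over millions of vectors with an index that avoids scanning every one.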
Further Reading
Fine-tuning vs Prompting: The Real Trade-offs
An honest look at when each approach makes sense, with real cost comparisons and performance data.
LLM Cost Optimization: Cut Your API Bill by 80%
Spending $10K+/month on OpenAI or Anthropic? Here are the exact tactics that reduced our LLM costs from $15K to $3K/month without sacrificing quality.