Foundation Model
A large-scale AI model trained on broad data that can be adapted to a wide range of downstream tasks through fine-tuning or prompting, serving as a general-purpose base for specialized applications.
Foundation models represent a paradigm shift in AI development. Instead of training a separate model for each task, you train one massive model on diverse data and then adapt it. GPT-4, Claude, Gemini, and Llama are foundation models for language; CLIP and DALL-E are foundation models for vision-language tasks such as image understanding and image generation. These models encode general knowledge that transfers to specific applications.
The term was coined by Stanford's Center for Research on Foundation Models to emphasize both the power and the risks of this approach. The power is efficiency: a single foundation model can be adapted to thousands of tasks with minimal additional training. The risk is concentration: biases, errors, or vulnerabilities in the foundation model propagate to every downstream application built on it.
For product teams, foundation models are the starting point for most AI features today. The strategic decisions involve which foundation model to build on (balancing capability, cost, and vendor lock-in), how much to customize it (prompting versus fine-tuning), and how to mitigate the risks of depending on a model you do not control. Multi-model architectures that use different foundation models for different tasks reduce single-provider dependency.
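The multi-model idea above can be sketched as a simple routing layer. This is a minimal illustration, not a production pattern: the task types and model identifiers are hypothetical placeholders, not real provider endpoints.

```python
# Sketch of a multi-model architecture: route each task type to a different
# foundation model to reduce single-provider dependency.
# All model names below are illustrative placeholders.

TASK_MODEL_ROUTES = {
    "summarization": "provider_a/general-model",
    "code_generation": "provider_b/code-model",
    "classification": "provider_c/small-cheap-model",
}

def route(task_type: str, default: str = "provider_a/general-model") -> str:
    """Pick a model for a task, falling back to a default provider."""
    return TASK_MODEL_ROUTES.get(task_type, default)
```

In practice the routing table would also encode cost and latency constraints, and the fallback gives a degradation path if a specialized provider is unavailable.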
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
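The retrieve-then-inject flow can be shown with a toy sketch. For self-containment it scores relevance by keyword overlap; a real RAG system would use embedding similarity for retrieval and pass the assembled prompt to an LLM, both omitted here.

```python
# Toy RAG sketch: retrieve the most relevant documents for a query and
# inject them into the prompt context. Keyword overlap stands in for
# embedding-based retrieval, which a real system would use.

def _overlap(query: str, doc: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(docs, key=lambda d: _overlap(query, d), reverse=True)[:k]

def build_rag_prompt(query: str, docs: list[str]) -> str:
    """Assemble a prompt that grounds the answer in retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```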
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
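The "similarity search" part of this definition usually means cosine similarity between embedding vectors; a minimal sketch (using toy hand-written vectors rather than real model embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction,
    0.0 = orthogonal (unrelated), -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way: nearby vectors indicate semantically similar inputs.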
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings, typically using approximate nearest-neighbor indexes for low-latency similarity search.
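Conceptually, a vector database maps keys to vectors and answers top-k similarity queries. A minimal in-memory sketch using brute-force exact search (real systems use approximate indexes such as HNSW to scale):

```python
import math

class VectorStore:
    """Toy in-memory vector store: exact brute-force top-k search.
    Illustrative only; not how production vector databases are built."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, key: str, vector: list[float]) -> None:
        self._items.append((key, vector))

    def query(self, vector: list[float], k: int = 1) -> list[str]:
        """Return keys of the k stored vectors most similar to `vector`."""
        def sim(v: list[float]) -> float:
            dot = sum(x * y for x, y in zip(vector, v))
            norm = (math.sqrt(sum(x * x for x in vector))
                    * math.sqrt(sum(y * y for y in v)))
            return dot / norm if norm else 0.0
        ranked = sorted(self._items, key=lambda kv: sim(kv[1]), reverse=True)
        return [key for key, _ in ranked[:k]]
```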
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.
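One common prompt-engineering pattern is a reusable template with few-shot examples. A minimal sketch, where the classification task, labels, and examples are all hypothetical:

```python
# Sketch of prompt engineering as code: a fixed instruction plus few-shot
# examples, assembled into a prompt for each new input.
# The task and examples below are illustrative, not from a real system.

FEW_SHOT_EXAMPLES = [
    ("The app crashes when I tap login.", "bug"),
    ("Please add a dark mode option.", "feature_request"),
]

def build_ticket_prompt(ticket: str) -> str:
    """Build a few-shot classification prompt for a support ticket."""
    examples = "\n\n".join(
        f"Ticket: {text}\nLabel: {label}" for text, label in FEW_SHOT_EXAMPLES
    )
    return (
        "Classify each support ticket as 'bug' or 'feature_request'.\n\n"
        f"{examples}\n\nTicket: {ticket}\nLabel:"
    )
```

Keeping the template in code makes it easy to iterate: examples and instructions can be versioned, A/B tested, and adjusted without touching the rest of the application.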