AI Fundamentals Glossary

Core concepts in artificial intelligence and machine learning — from transformer architectures and embeddings to prompt engineering and RAG pipelines.

Activation Function

A nonlinear mathematical function applied to each neuron's output in a neural network, enabling the network to learn complex, nonlinear patterns that a purely linear model could not represent.
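
For intuition, here is a plain-Python sketch of two common activation functions, ReLU and sigmoid (illustrative code, not tied to any particular framework):

```python
import math

def relu(x: float) -> float:
    # ReLU: pass positives through unchanged, zero out negatives
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Sigmoid: squash any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(sigmoid(0.0))            # 0.5
```

Without a nonlinearity like these between layers, stacking linear layers would collapse into a single linear transformation.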

Attention Mechanism

A neural network component that dynamically weights the relevance of different parts of the input sequence when producing each output token.

AUC (Area Under the Curve)

A summary metric computed as the area under the ROC curve, representing the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
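
The rank-probability interpretation can be computed directly, without drawing the curve, by comparing every positive score against every negative score (a small illustrative sketch with made-up scores):

```python
def auc(scores_pos, scores_neg):
    # AUC = probability that a random positive outscores a random negative;
    # ties count as half a win
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

# 8 of the 9 positive/negative pairs are ranked correctly
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 0.888...
```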

Autoencoder

A neural network trained to compress input data into a compact latent representation and then reconstruct the original input from that representation, learning efficient data encodings in the process.

Backpropagation

The algorithm that efficiently computes gradients of the loss function with respect to every weight in a neural network by propagating error signals backward from the output layer to the input layer.

Batch Normalization

A technique that normalizes layer inputs across the training batch to have zero mean and unit variance, stabilizing and accelerating neural network training by reducing internal covariate shift.

Benchmarks

Standardized tests and datasets used to evaluate and compare AI model performance across specific tasks, providing consistent metrics for measuring progress and informing model selection decisions.

Bias-Variance Tradeoff

The fundamental tension in machine learning between bias (error from overly simplistic assumptions that cause the model to miss patterns) and variance (error from excessive sensitivity to training data fluctuations).

Classification

A supervised learning task that assigns input data to one of several predefined categories based on learned patterns, used for tasks like spam detection, sentiment analysis, and churn prediction.

Clustering

An unsupervised learning technique that groups similar data points together without predefined labels, discovering natural structure and segments within datasets based on feature similarity.

Confusion Matrix

A table that visualizes a classification model's performance by showing the counts of true positives, true negatives, false positives, and false negatives across all predicted and actual class combinations.
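
For a binary classifier, the four cells can be tallied in a few lines (a minimal sketch with toy labels):

```python
def confusion_matrix(actual, predicted):
    # Count each (actual, predicted) combination for a binary classifier
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts

actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))
# {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```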

Constitutional AI

An alignment approach developed by Anthropic where an AI is trained to follow a set of principles (a constitution) through self-critique and revision, reducing the need for human feedback on every example.

Context Window

The maximum number of tokens an LLM can process in a single inference call, encompassing both the input prompt and the generated output, typically ranging from 4K to 1M tokens.

Convolutional Neural Network (CNN)

A neural network architecture designed for processing grid-structured data like images, using convolutional filters that slide over the input to detect local patterns like edges, textures, and shapes.

Cross-Validation

A model evaluation technique that splits data into multiple folds, training and testing on different subsets in rotation, providing a more reliable estimate of model performance than a single train-test split.
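
A sketch of the fold rotation itself (index bookkeeping only; the actual model training is out of scope here):

```python
def kfold_indices(n_samples, k):
    # Split indices into k contiguous folds; each fold serves as the
    # test set once while the remaining folds form the training set.
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

for train, test in kfold_indices(6, 3):
    print(test)   # [0, 1] then [2, 3] then [4, 5]
```

In practice the data is usually shuffled (or stratified by class) before splitting.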

Deep Learning

A subset of machine learning that uses neural networks with many layers (deep architectures) to automatically learn hierarchical feature representations from raw data.

Diffusion Model

A generative AI model that creates data (typically images) by learning to gradually denoise random noise into coherent outputs, producing high-quality results through an iterative refinement process.

Direct Preference Optimization (DPO)

An alignment technique that fine-tunes LLMs directly on human preference data without training a separate reward model, simplifying the RLHF pipeline while achieving comparable results.

Dropout

A regularization technique that randomly deactivates a fraction of neurons during each training step, forcing the network to learn redundant representations and preventing over-reliance on any single neuron.
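
A sketch of "inverted" dropout, the common formulation in which surviving activations are scaled up during training so no rescaling is needed at inference (function name and interface are illustrative):

```python
import random

def apply_dropout(activations, drop_prob=0.5, training=True, seed=None):
    # Training: zero each activation with probability drop_prob and scale
    # survivors by 1/(1 - drop_prob) to keep the expected value unchanged.
    # Inference: pass activations through untouched.
    if not training:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

print(apply_dropout([1.0, 2.0, 3.0, 4.0], training=False))
# [1.0, 2.0, 3.0, 4.0]  (unchanged at inference)
```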

Embeddings

Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
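
Similarity between embeddings is typically measured with cosine similarity; a toy sketch with hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # 1.0 for vectors pointing the same direction, 0.0 for orthogonal ones
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: semantically close words get nearby vectors
cat    = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car    = [0.0, 0.1, 0.9]
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```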

Ensemble Methods

Techniques that combine multiple models to produce predictions that are more accurate and robust than any single model, leveraging the principle that diverse models make different errors that cancel out.

F1 Score

The harmonic mean of precision and recall, providing a single metric that balances both types of classification error, ranging from 0 (worst) to 1 (perfect).
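
Computed from the confusion-matrix counts (a minimal sketch):

```python
def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 8 true positives, 2 false positives, 4 false negatives:
# precision = 0.8, recall = 0.667, F1 ~ 0.727
print(round(f1_score(8, 2, 4), 3))  # 0.727
```

Because the harmonic mean is dominated by the smaller value, a model cannot score well by excelling at only one of precision or recall.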

Few-Shot Learning

A prompting technique where a small number of input-output examples are included in the prompt to guide the model's behavior on a specific task, improving consistency without fine-tuning.

Fine-Tuning

The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.

Foundation Model

A large-scale AI model trained on broad data that can be adapted to a wide range of downstream tasks through fine-tuning or prompting, serving as a general-purpose base for specialized applications.

GAN (Generative Adversarial Network)

A generative model architecture consisting of two neural networks, a generator and a discriminator, that compete against each other, with the generator learning to create increasingly realistic outputs.

Generative AI

AI systems that create new content such as text, images, code, audio, or video, rather than simply analyzing or classifying existing data, powered by models like LLMs and diffusion models.

Gradient Boosting

An ensemble technique that builds models sequentially, with each new model specifically trained to correct the errors of the previous ones, producing a powerful combined predictor through iterative refinement.

Gradient Descent

The core optimization algorithm used to train neural networks, which iteratively adjusts model parameters in the direction that most reduces the loss function, guided by computed gradients.
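
The update rule is a one-liner; here is a sketch minimizing a single-variable loss whose gradient is known in closed form (real training computes gradients via backpropagation over millions of parameters):

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    # Repeatedly step opposite the gradient to reduce the loss
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0, the true minimum
```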

Guardrails

Safety mechanisms applied to AI system inputs and outputs that detect, filter, or modify content to prevent harmful, off-topic, or policy-violating responses in production.

Hallucination

When an LLM generates plausible-sounding but factually incorrect or fabricated information that has no basis in its training data or provided context.

Hyperparameter Tuning

The process of systematically searching for the best hyperparameter values: configuration settings such as learning rate, batch size, and architecture choices that are set before training and control the learning process itself.

Inference

The process of running a trained AI model on new inputs to generate predictions or outputs, as opposed to training where the model learns from data. This is what happens every time a user interacts with an AI feature.

Knowledge Distillation

A model compression technique where a smaller student model is trained to mimic the outputs of a larger teacher model, preserving most of the teacher's performance at a fraction of the compute cost.

LLM (Large Language Model)

A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.

LoRA (Low-Rank Adaptation)

A parameter-efficient fine-tuning method that trains small, low-rank matrices alongside frozen model weights, enabling task-specific adaptation with a fraction of the memory and compute of full fine-tuning.

Loss Function

A mathematical function that quantifies the difference between a model's predictions and the actual target values, providing the signal that guides the optimization process during training.

Mixture of Experts (MoE)

A neural network architecture that routes each input to a subset of specialized sub-networks (experts), achieving the capacity of a very large model while only activating a fraction of parameters per inference.

Model Quantization

A technique that reduces model size and inference cost by representing weights and activations with lower-precision numbers, such as converting 32-bit floats to 8-bit or 4-bit integers.

Multimodal Model

An AI model that can process and generate multiple types of data such as text, images, audio, and video within a single unified architecture, enabling cross-modal understanding and generation.

Named Entity Recognition (NER)

An NLP task that identifies and classifies named entities in text into predefined categories such as person names, organizations, locations, dates, and monetary values.

Natural Language Processing (NLP)

The branch of AI focused on enabling computers to understand, interpret, and generate human language, encompassing tasks from text classification to machine translation.

Neural Network

A computational model inspired by the human brain, composed of layers of interconnected nodes (neurons) that learn patterns from data by adjusting connection weights during training.

Overfitting

When a model fits its training data too closely, memorizing noise and outliers along with the real patterns, resulting in excellent training performance but poor generalization to new, unseen data.

Perplexity

A metric that measures how well a language model predicts a sequence of text; lower perplexity means the model assigns higher probability to the actual text and is therefore a better model of the language.
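
Formally, perplexity is the exponential of the average negative log-probability per token (a minimal sketch, assuming you already have the model's probability for each actual token):

```python
import math

def perplexity(token_probs):
    # exp of the average negative log-probability per token
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# A model assigning probability 0.25 to every token is as "confused"
# as one choosing uniformly among 4 options
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # 4.0
```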

Precision and Recall

Complementary classification metrics where precision measures the fraction of positive predictions that are correct, and recall measures the fraction of actual positives that are detected.

Prompt Engineering

The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.

QLoRA (Quantized LoRA)

An extension of LoRA that combines 4-bit quantization of the base model with low-rank adaptation, enabling fine-tuning of large language models on a single consumer GPU.

RAG (Retrieval-Augmented Generation)

A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.

Random Forest

An ensemble learning method that trains many decision trees on random subsets of data and features, then aggregates their predictions through voting or averaging to produce more accurate and stable results.

Recurrent Neural Network (RNN)

A neural network architecture designed for sequential data that maintains a hidden state updated at each time step, allowing it to process variable-length sequences like text, time series, and audio.

Red Teaming

The practice of systematically probing an AI system for vulnerabilities, failure modes, and harmful outputs by simulating adversarial user behavior before and after deployment.

Regression

A supervised learning task that predicts a continuous numerical value based on input features, used for forecasting metrics like revenue, estimating customer lifetime value, and predicting engagement scores.

Reinforcement Learning from Human Feedback (RLHF)

A training method that aligns LLM outputs with human preferences by using human ratings of model responses to train a reward model, which then guides the LLM via reinforcement learning.

ROC Curve

A graphical plot that illustrates a binary classifier's performance across all classification thresholds by plotting the true positive rate against the false positive rate.

Sentiment Analysis

An NLP technique that determines the emotional tone or opinion expressed in text, classifying it as positive, negative, or neutral, often with fine-grained intensity scores.

Support Vector Machine (SVM)

A classification algorithm that finds the optimal hyperplane separating different classes by maximizing the margin between the nearest data points of each class, called support vectors.

Synthetic Data

Artificially generated training data created by AI models or statistical methods that mimics the statistical properties of real data, used when real data is scarce, expensive, or privacy-sensitive.

Temperature

A parameter that controls the randomness of LLM outputs by scaling the probability distribution over possible next tokens, where lower values produce more deterministic responses and higher values increase creativity.
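
The scaling happens before the softmax that turns logits into a probability distribution; a sketch of the effect (logit values are made up):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by temperature before softmax: low T sharpens the
    # distribution toward the top token, high T flattens it
    scaled = [l / temperature for l in logits]
    max_l = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - max_l) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, temperature=0.2)
hot = softmax_with_temperature(logits, temperature=2.0)
# The top token takes almost all the mass when cold, far less when hot
print(round(cold[0], 3), round(hot[0], 3))
```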

Tokenization

The process of splitting text into smaller units (tokens) that an LLM can process, typically subword pieces averaging about 4 characters per token.

Top-k Sampling

A decoding strategy that restricts token selection to the k most probable next tokens, filtering out unlikely candidates to balance output quality with diversity.
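
A sketch of the filtering step, using a toy probability table (sampling from the filtered distribution would follow):

```python
def top_k_filter(token_probs, k):
    # Keep only the k most probable tokens, then renormalize
    top = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

probs = {"the": 0.5, "a": 0.3, "banana": 0.15, "xylophone": 0.05}
print(top_k_filter(probs, k=2))
# {'the': 0.625, 'a': 0.375} -- only the 2 most likely tokens survive
```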

Top-p Sampling (Nucleus Sampling)

A decoding strategy that samples from the smallest set of tokens whose cumulative probability exceeds a threshold p, dynamically adjusting the candidate pool based on the model's confidence.
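
Unlike top-k's fixed cutoff, the nucleus grows or shrinks with the model's confidence; a sketch with the same toy probability table used for top-k:

```python
def top_p_filter(token_probs, p=0.9):
    # Take tokens in probability order until cumulative mass reaches p,
    # then renormalize over that nucleus
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for tok, prob in ranked:
        nucleus.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in nucleus)
    return {tok: prob / total for tok, prob in nucleus}

probs = {"the": 0.5, "a": 0.3, "banana": 0.15, "xylophone": 0.05}
print(top_p_filter(probs, p=0.9))
# nucleus is {"the", "a", "banana"}: cumulative mass 0.95 >= 0.9
```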

Training Data

The dataset used to teach an AI model patterns and relationships during the training process, whose quality, size, diversity, and representativeness directly determine the model's capabilities and limitations.

Transfer Learning

A technique where a model trained on one task is repurposed as the starting point for a different but related task, dramatically reducing the data and compute needed for the new task.

Transformer

The neural network architecture behind modern LLMs, using self-attention mechanisms to process and generate sequences of tokens in parallel.

Underfitting

When a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test data because it fails to learn the relevant relationships.

Vector Database

A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.

Vision-Language Model (VLM)

A multimodal AI model specifically designed to jointly understand images and text, enabling tasks like image captioning, visual question answering, and document understanding.

Weight Initialization

The strategy for setting initial parameter values before training a neural network, which critically affects training dynamics, convergence speed, and whether the network can learn at all.

Zero-Shot Learning

The ability of a model to perform a task it was not explicitly trained on, using only a natural language description of the task without any task-specific examples.
