AUC (Area Under the Curve)
A summary metric computed as the area under the ROC curve, representing the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
AUC provides a single number that captures a classifier's overall ability to discriminate between classes, independent of the classification threshold. An AUC of 1.0 means perfect separation; 0.5 means no better than random guessing; and below 0.5 means the model's ranking is inverted relative to actual outcomes (flipping its predictions yields a model with AUC above 0.5).
The probabilistic interpretation is intuitive: an AUC of 0.85 means that if you randomly select one positive and one negative example, the model will correctly rank the positive example higher 85% of the time. This threshold-independence is AUC's main advantage: it evaluates the model's ranking ability regardless of what decision threshold you ultimately choose.
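The pairwise interpretation translates directly into code. Below is a minimal pure-Python sketch (the function name `auc_by_ranking` is illustrative, not a library API): count, over all positive/negative pairs, how often the positive is scored higher, counting ties as half-correct. Libraries such as scikit-learn compute the same quantity via the ROC curve with `roc_auc_score`.

```python
def auc_by_ranking(y_true, y_score):
    """AUC via its probabilistic definition: the fraction of
    (positive, negative) pairs the model ranks correctly.
    Ties count as half-correct, matching trapezoidal ROC AUC."""
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0      # positive ranked above negative
            elif p == n:
                correct += 0.5      # tie: half credit
    return correct / (len(pos) * len(neg))

# Toy example: two positives, two negatives, one mis-ranked pair.
print(auc_by_ranking([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

The O(pos × neg) double loop is fine for illustration; production implementations sort once and compute the same value in O(n log n).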
For production systems, AUC is a popular metric for model comparison during development because it is threshold-independent and scale-invariant. However, it has limitations: it can be overly optimistic for highly imbalanced datasets (a model that performs well on the easy majority class can still have high AUC while performing poorly on the important minority class). For imbalanced problems common in growth applications (fraud, churn), complement AUC with precision-recall AUC or lift charts that focus on model performance at the operating points you actually care about.
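The imbalance caveat is easy to demonstrate with average precision, a standard single-number summary of the precision-recall curve. In the toy data below (one positive among ten examples), the positive is ranked second, so it outranks 8 of 9 negatives and pairwise AUC is about 0.89, yet average precision is only 0.5 because precision at the positive's rank is 1/2. This is a minimal sketch under those assumptions; scikit-learn's `average_precision_score` computes the same quantity.

```python
def average_precision(y_true, y_score):
    """Mean of the precision values at the rank of each true positive,
    with examples sorted by descending score. Summarizes the
    precision-recall curve in one number. Assumes at least one positive."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits = 0
    ap_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap_sum += hits / rank   # precision at this positive's rank
    return ap_sum / hits

# One positive among ten examples, ranked second by score:
y_true = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_score = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
print(average_precision(y_true, y_score))  # → 0.5
```

The gap between the two numbers (≈0.89 vs. 0.5) is exactly the failure mode described above: AUC is propped up by correct rankings against easy negatives, while average precision reflects how pure the top of the ranking actually is.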
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
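The similarity search mentioned above is typically cosine similarity between embedding vectors. A minimal pure-Python sketch (real systems use NumPy or a vector database for batches of high-dimensional vectors):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    1.0 = same direction (semantically similar),
    0.0 = orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```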
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.