Slowly Changing Dimension (SCD)
A data warehousing technique for tracking changes to dimension attributes over time, preserving historical context so that past facts can be analyzed against the dimension values that were current at that time.
Dimensions change: customers move cities, products change categories, employees change departments. Slowly changing dimensions handle these updates while preserving analytical accuracy. Without SCD, historical analysis would incorrectly attribute all past activity to the current dimension values, distorting trends and comparisons.
The standard types are SCD Type 1 (overwrite the old value, discarding history), Type 2 (add a new row with effective dates, preserving full history), and Type 3 (add a column for the previous value, tracking one level of history). Type 2 is the most widely used in practice because it preserves complete history while enabling both current and historical analysis.
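A Type 2 update can be sketched in a few lines. This is a minimal illustration, not a standard implementation: the column names (`valid_from`, `valid_to`, `is_current`) and the in-memory list of dicts are assumptions; a real warehouse would do this in SQL or a transformation framework.

```python
from datetime import date

def scd2_update(history, key, new_attrs, change_date):
    """Apply an SCD Type 2 change: close the current row, append a new version.

    `history` is a list of dicts; `valid_to=None` marks an open-ended row.
    (Column names are illustrative, not a standard.)
    """
    for row in history:
        if row["key"] == key and row["is_current"]:
            row["valid_to"] = change_date   # close out the old version
            row["is_current"] = False
    history.append({
        "key": key, **new_attrs,
        "valid_from": change_date, "valid_to": None, "is_current": True,
    })
    return history

# A customer moves from Boston to Austin on 2024-03-01:
hist = [{"key": 42, "city": "Boston",
         "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True}]
scd2_update(hist, 42, {"city": "Austin"}, date(2024, 3, 1))
# hist now holds two rows: the closed Boston row and the current Austin row.
```

The key property is that no information is destroyed: the old row is closed with an end date rather than overwritten, so both versions remain queryable.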
For AI teams, SCD Type 2 is essential for building point-in-time correct training data. A churn prediction model must use the customer's attributes as they were at the time of the prediction, not their current attributes. Without proper SCD handling, training data would contain future information (data leakage), inflating apparent model performance and degrading real-world accuracy.
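Point-in-time correctness comes down to a lookup that respects the effective dates. A minimal sketch, assuming the Type 2 row shape above (`valid_from`/`valid_to`, with `None` meaning open-ended); the function name is illustrative:

```python
from datetime import date

def attributes_as_of(history, key, as_of):
    """Return the dimension row that was valid for `key` on the `as_of` date.

    Rows follow SCD Type 2 conventions: half-open intervals
    [valid_from, valid_to), where valid_to=None means still current.
    """
    for row in history:
        if row["key"] != key:
            continue
        if row["valid_from"] <= as_of and (
            row["valid_to"] is None or as_of < row["valid_to"]
        ):
            return row
    return None

hist = [
    {"key": 42, "city": "Boston",
     "valid_from": date(2020, 1, 1), "valid_to": date(2024, 3, 1)},
    {"key": 42, "city": "Austin",
     "valid_from": date(2024, 3, 1), "valid_to": None},
]

# A training example whose label was observed in mid-2023 must see Boston,
# not the customer's current city -- otherwise the feature leaks the future.
row = attributes_as_of(hist, 42, date(2023, 6, 15))
```

Joining facts to the dimension version that was valid at the fact's timestamp, rather than to the current version, is exactly what prevents the leakage described above.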
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
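The definition maps directly to the dot product divided by the product of vector norms. A small self-contained sketch of that formula:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cosine_similarity([1.0, 0.0], [1.0, 0.0])   # identical direction -> 1.0
cosine_similarity([1.0, 0.0], [0.0, 1.0])   # orthogonal -> 0.0
cosine_similarity([1.0, 0.0], [-1.0, 0.0])  # opposite -> -1.0
```

Because the norms cancel out vector magnitude, only direction matters, which is why it suits embedding comparison.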
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one-at-a-time on demand, optimizing for throughput and cost over latency.
Real-Time Inference
Generating ML predictions on-demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.