Star Schema
A data warehouse modeling pattern that organizes data into a central fact table containing measurable events surrounded by dimension tables containing descriptive attributes, resembling a star shape.
The star schema is the most common data warehouse modeling pattern. The fact table at the center contains the quantitative data you want to analyze: sales amounts, click counts, session durations. Surrounding dimension tables provide the context: who (customer dimension), what (product dimension), when (date dimension), and where (location dimension). Foreign keys in the fact table reference each dimension.
This denormalized structure is optimized for analytical queries. A query such as "total revenue by product category by month by region" joins the fact table to three dimension tables (product, date, and location) using simple key lookups. Queries stay intuitive and fast because every dimension is exactly one join away from the fact table, and most analytical questions follow the pattern "measure X sliced by dimensions Y and Z."
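The join pattern above can be sketched with SQLite. This is a minimal illustration, not a production schema: the table and column names (`fact_sales`, `dim_product`, `dim_date`, `dim_location`) and the sample rows are invented for the example.

```python
import sqlite3

# Minimal star schema: one fact table, three dimension tables.
# All names and data here are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    revenue      REAL
);
""")
con.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
con.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(1, "2024-01-15", "2024-01"), (2, "2024-02-10", "2024-02")])
con.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                [(1, "Berlin", "EU"), (2, "Austin", "US")])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 1, 1, 100.0), (2, 1, 2, 250.0), (1, 2, 1, 75.0)])

# "Total revenue by product category by month by region": the fact table
# joins to each dimension through a single foreign key.
rows = con.execute("""
    SELECT p.category, d.month, l.region, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_product  p ON f.product_key  = p.product_key
    JOIN dim_date     d ON f.date_key     = d.date_key
    JOIN dim_location l ON f.location_key = l.location_key
    GROUP BY p.category, d.month, l.region
    ORDER BY d.month, l.region
""").fetchall()
for row in rows:
    print(row)
```

Note that no join ever goes more than one hop from the fact table; adding another slice (say, customer segment) just adds one more single-key join.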
For AI teams building feature pipelines, star schemas provide a clean structure for aggregating features. User-level features are computed by grouping the fact table on the customer dimension key. Time-based features use the date dimension for windowed aggregations. The clear separation of facts and dimensions keeps feature engineering queries straightforward and maintainable.
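A feature-engineering pass over a star schema might look like the following sketch, again using SQLite with invented names (`fact_orders`, `dim_customer`, `dim_date`) and a hypothetical date window. User-level features come from grouping the fact table by the customer key; the windowed feature resolves dates through the date dimension.

```python
import sqlite3

# Illustrative star schema for feature engineering; all names and data
# are assumptions for the sketch, not a real pipeline.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, signup_date TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE fact_orders (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    amount       REAL
);
""")
con.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                [(1, "2023-11-01"), (2, "2024-01-05")])
con.executemany("INSERT INTO dim_date VALUES (?, ?)",
                [(1, "2024-01-10"), (2, "2024-02-20"), (3, "2024-03-01")])
con.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)",
                [(1, 1, 40.0), (1, 2, 60.0), (2, 2, 20.0), (1, 3, 10.0)])

# Per-user features: lifetime order count and total spend, plus spend
# inside a fixed window (here 2024-02-01 to 2024-03-31) computed by
# filtering through the date dimension.
features = con.execute("""
    SELECT c.customer_key,
           COUNT(f.amount)                           AS order_count,
           SUM(f.amount)                             AS total_spend,
           SUM(CASE WHEN d.full_date >= '2024-02-01'
                     AND d.full_date <  '2024-04-01'
                    THEN f.amount ELSE 0 END)        AS spend_in_window
    FROM fact_orders f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    JOIN dim_date     d ON f.date_key     = d.date_key
    GROUP BY c.customer_key
    ORDER BY c.customer_key
""").fetchall()
for row in features:
    print(row)
```

Each output row is a ready-made feature vector per customer, which is why the fact/dimension split maps so directly onto feature pipelines: facts supply the measures to aggregate, dimensions supply the grouping keys and filters.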
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one-at-a-time on demand, optimizing for throughput and cost over latency.
Real-Time Inference
Generating ML predictions on-demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.