Data Warehouse
A centralized analytical database optimized for complex queries across large volumes of structured historical data, designed for reporting, business intelligence, and data-driven decision making.
Data warehouses collect data from multiple operational systems into a single analytical repository. Unlike transactional databases optimized for fast writes and point lookups, warehouses are optimized for complex analytical queries that scan millions of rows: aggregations, joins across large tables, time-series analysis, and multi-dimensional reporting.
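The kind of query a warehouse is built for can be sketched with a small example. This is a minimal illustration using Python's built-in SQLite as a stand-in for a warehouse engine; the `orders` table and its columns are hypothetical, but the shape of the query (a grouped aggregation over a fact table) is the typical analytical workload.

```python
import sqlite3

# In-memory SQLite stands in for a warehouse; `orders` is a hypothetical fact table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, region TEXT, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "EU", "2024-01-05", 120.0),
        (2, "EU", "2024-02-11", 80.0),
        (3, "US", "2024-01-20", 200.0),
        (4, "US", "2024-02-02", 50.0),
    ],
)

# A typical analytical query: scan the table, aggregate revenue by region and month.
rows = conn.execute(
    """
    SELECT region, substr(order_date, 1, 7) AS month, SUM(amount) AS revenue
    FROM orders
    GROUP BY region, month
    ORDER BY region, month
    """
).fetchall()
for region, month, revenue in rows:
    print(region, month, revenue)
```

In a real warehouse the same GROUP BY would scan millions of rows, which is why warehouses use columnar storage and parallel execution rather than the row-at-a-time access patterns of transactional databases.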
Modern cloud data warehouses like Snowflake, Google BigQuery, and Amazon Redshift separate storage from compute, allowing each to scale independently. You can store petabytes of data cheaply and spin up large compute clusters only when running heavy queries. This separation keeps storage costs low and means you pay for compute only while queries actually run.
For AI teams, the data warehouse often serves as the source of truth for feature engineering and model training data. Historical user behavior, transaction records, and product data flow into the warehouse, where feature engineering queries transform raw data into model inputs. Many teams use the warehouse as the computation layer for batch feature pipelines, with results exported to feature stores for real-time model serving.
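A batch feature pipeline of this kind can be sketched as follows. The example again uses SQLite as a stand-in for the warehouse; the `events` table, the feature names, and the in-memory `features` dict (standing in for an export to a feature store) are all hypothetical illustrations, not a specific product's API.

```python
import sqlite3

# Hypothetical raw-events table, standing in for warehouse data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-03-01"),
        (1, 30.0, "2024-03-02"),
        (2, 5.0, "2024-03-01"),
    ],
)

# Feature-engineering query: per-user aggregates become model inputs.
features = {
    user_id: {"txn_count": n, "total_spend": total}
    for user_id, n, total in conn.execute(
        "SELECT user_id, COUNT(*), SUM(amount) FROM events GROUP BY user_id"
    )
}

# In a real pipeline, `features` would be exported to a feature store
# on a schedule so the serving layer can look them up at low latency.
print(features[1])  # user 1 has 2 transactions totaling 40.0
```

The key design point is that the heavy computation (the GROUP BY over historical events) happens in the warehouse on a batch schedule, while serving only needs a cheap key lookup.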
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
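The cosine similarity definition above maps directly to a few lines of code. This is a minimal self-contained sketch of the standard formula, cos(θ) = (a · b) / (‖a‖ ‖b‖), using plain Python lists as vectors.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1, 0], [1, 0]))   # same direction -> 1.0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite direction -> -1.0
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal -> 0.0
```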
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one at a time on demand, optimizing for throughput and cost over latency.
Real-Time Inference
Generating ML predictions on-demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.