ELT (Extract, Load, Transform)
A modern data integration pattern that loads raw data directly into a target system first and then transforms it in place, leveraging the processing power of cloud data warehouses.
ELT reverses the traditional ETL order. Raw data is extracted from sources and loaded directly into a cloud data warehouse (Snowflake, BigQuery, Redshift) without transformation. Transformations then happen inside the warehouse using SQL, taking advantage of its massively parallel processing (MPP) engine.
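The load-then-transform order can be sketched in a few lines. This is a toy illustration, not a production pipeline: Python's `sqlite3` stands in for the cloud warehouse, and the table and field names (`raw_orders`, `orders`, `amount`, `status`) are invented for the example.

```python
import json
import sqlite3

# sqlite3 stands in for a cloud warehouse (Snowflake, BigQuery, Redshift);
# requires a SQLite build with JSON functions (default in recent versions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (payload TEXT)")  # raw, untransformed JSON

# Extract + Load: source records land in the warehouse exactly as extracted.
source_records = [
    {"order_id": 1, "amount": "19.99", "status": "shipped"},
    {"order_id": 2, "amount": "5.00", "status": "cancelled"},
]
conn.executemany(
    "INSERT INTO raw_orders VALUES (?)",
    [(json.dumps(r),) for r in source_records],
)

# Transform: happens inside the warehouse, in SQL, after loading --
# typing the columns and filtering out cancelled orders.
conn.execute("""
    CREATE TABLE orders AS
    SELECT
        CAST(json_extract(payload, '$.order_id') AS INTEGER) AS order_id,
        CAST(json_extract(payload, '$.amount')   AS REAL)    AS amount,
        json_extract(payload, '$.status')                    AS status
    FROM raw_orders
    WHERE json_extract(payload, '$.status') != 'cancelled'
""")
rows = conn.execute("SELECT order_id, amount FROM orders").fetchall()
```

The key property is that `raw_orders` persists unchanged, so the transform step can be rewritten and re-run at any time without touching the source systems.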
This approach became practical with the rise of cloud data warehouses, whose elastic, pay-as-you-go compute makes in-warehouse transformation affordable at scale. Tools like Fivetran and Airbyte handle the extract-load phase, syncing data from hundreds of sources into the warehouse. dbt (data build tool) then handles the transform phase, applying SQL transformations with version control, testing, and documentation.
For AI teams, ELT offers flexibility. Raw data is preserved in the warehouse, so new features can be computed from historical data without re-extracting from sources. Data scientists can experiment with different transformations using SQL before productionizing them as dbt models. The approach also supports schema evolution more gracefully, since raw data is always available for reprocessing when requirements change.
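The "new features from preserved raw data" point can be shown concretely. In this hedged sketch, again using `sqlite3` as a stand-in warehouse with invented names (`raw_events`, `user_id`, `event`), a feature that nobody anticipated at load time is computed later with a single query over the raw table, with no re-extraction from the source:

```python
import json
import sqlite3

# sqlite3 stands in for the warehouse; table and field names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (payload TEXT)")
events = [
    {"user_id": 7, "event": "click",    "ts": "2024-01-01"},
    {"user_id": 7, "event": "purchase", "ts": "2024-01-02"},
    {"user_id": 9, "event": "click",    "ts": "2024-01-02"},
]
conn.executemany(
    "INSERT INTO raw_events VALUES (?)",
    [(json.dumps(e),) for e in events],
)

# Later, a new model needs a purchases-per-user feature. Because the raw
# events were preserved, it is derived in SQL -- no source re-extraction.
feature = conn.execute("""
    SELECT json_extract(payload, '$.user_id') AS user_id,
           SUM(json_extract(payload, '$.event') = 'purchase') AS purchases
    FROM raw_events
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
```

Under ETL, where only the transformed output is kept, this backfill would require re-extracting history from the operational source, if it still exists there at all.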
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one at a time on demand, optimizing for throughput and cost over latency.
Real-Time Inference
Generating ML predictions on-demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.