Data Mesh
A decentralized data architecture paradigm where domain teams own and operate their data as products, with federated governance and self-serve infrastructure taking the place of a centralized data team that owns every pipeline.
Data mesh, introduced by Zhamak Dehghani, applies domain-driven design principles to data architecture. Instead of a central data team owning all data pipelines and the data warehouse, each domain team (payments, user engagement, content) owns its data end-to-end and publishes it as a product that other teams can discover and consume.
The four principles are domain ownership (teams own their data), data as a product (data is treated with the same rigor as customer-facing products), self-serve infrastructure (platform teams provide tools that domain teams use independently), and federated computational governance (global standards with local autonomy).
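Federated computational governance is easiest to picture as "policy as code": the platform defines a small set of global checks that run automatically against every domain's data product, while each domain still chooses its own values. The sketch below is illustrative only; the `DataProduct` descriptor, its fields, and the policy functions are hypothetical names, not part of any standard or specific tool.

```python
from dataclasses import dataclass, field


# Hypothetical descriptor a domain team publishes alongside its data product.
@dataclass
class DataProduct:
    name: str
    owner_team: str
    schema: dict[str, str]          # column name -> type name
    freshness_sla_hours: int        # value chosen locally by the owning domain
    pii_columns: list[str] = field(default_factory=list)


# Global policies: each returns a list of violation messages (empty if compliant).
def has_owner(p: DataProduct) -> list[str]:
    return [] if p.owner_team.strip() else [f"{p.name}: missing owner_team"]


def has_schema(p: DataProduct) -> list[str]:
    return [] if p.schema else [f"{p.name}: empty schema"]


def pii_declared_in_schema(p: DataProduct) -> list[str]:
    return [
        f"{p.name}: PII column '{c}' not in schema"
        for c in p.pii_columns
        if c not in p.schema
    ]


def sla_is_positive(p: DataProduct) -> list[str]:
    # The global rule only requires *some* positive SLA; the domain picks the number.
    if p.freshness_sla_hours > 0:
        return []
    return [f"{p.name}: freshness_sla_hours must be positive"]


GLOBAL_POLICIES = [has_owner, has_schema, pii_declared_in_schema, sla_is_positive]


def check_product(p: DataProduct) -> list[str]:
    """Run every global policy against one domain's data product."""
    violations: list[str] = []
    for policy in GLOBAL_POLICIES:
        violations.extend(policy(p))
    return violations


# Example products from two hypothetical domains.
payments = DataProduct(
    name="payments.transactions",
    owner_team="payments",
    schema={"txn_id": "string", "amount": "decimal", "card_number": "string"},
    freshness_sla_hours=1,
    pii_columns=["card_number"],
)
sessions = DataProduct(
    name="engagement.sessions",
    owner_team="",
    schema={"session_id": "string"},
    freshness_sla_hours=24,
    pii_columns=["user_email"],
)

print(check_product(payments))
print(check_product(sessions))
```

The payments product passes every check. The sessions product is flagged for a missing owner and an undeclared PII column, even though its 24-hour freshness SLA (a local decision) is accepted.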
For AI teams in large organizations, data mesh addresses the bottleneck of centralized data teams. Instead of waiting weeks for a central team to build a pipeline for a new model input, the domain team that generates the data provides it as a well-documented, quality-assured data product. AI teams consume these data products as model inputs, with clear contracts around freshness, quality, and schema stability.
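On the consuming side, an AI team can check a batch from a data product against its contract before using it for training or inference. This is a minimal sketch assuming the batch arrives as a list of row dictionaries plus a "last updated" timestamp; `DataContract`, `validate_batch`, and the column names are hypothetical, and real deployments would typically rely on a dedicated validation or contract tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


# Hypothetical contract published by the producing domain team.
@dataclass(frozen=True)
class DataContract:
    required_columns: dict[str, type]   # column -> expected Python type
    max_staleness: timedelta            # freshness guarantee
    max_null_fraction: float            # simple quality guarantee per column


def validate_batch(
    rows: list[dict],
    last_updated: datetime,
    contract: DataContract,
    now: datetime,
) -> list[str]:
    """Return a list of contract violations (empty if the batch is usable)."""
    problems: list[str] = []

    # Freshness: the product must have been refreshed within max_staleness.
    age = now - last_updated
    if age > contract.max_staleness:
        problems.append(f"stale: updated {age} ago (limit {contract.max_staleness})")

    for col, expected_type in contract.required_columns.items():
        # Schema stability: every row must carry the column.
        if any(col not in row for row in rows):
            problems.append(f"schema: column '{col}' missing from some rows")
            continue
        # Schema stability: non-null values must have the agreed type.
        wrong = [
            row[col]
            for row in rows
            if row[col] is not None and not isinstance(row[col], expected_type)
        ]
        if wrong:
            problems.append(
                f"schema: column '{col}' has non-{expected_type.__name__} values"
            )
        # Quality: bound the fraction of nulls in the column.
        if rows:
            null_frac = sum(row[col] is None for row in rows) / len(rows)
            if null_frac > contract.max_null_fraction:
                problems.append(
                    f"quality: column '{col}' null fraction {null_frac:.2f} "
                    f"exceeds {contract.max_null_fraction}"
                )
    return problems


contract = DataContract(
    required_columns={"user_id": str, "sessions_7d": int},
    max_staleness=timedelta(hours=6),
    max_null_fraction=0.1,
)
now = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)

good_rows = [{"user_id": "u1", "sessions_7d": 4}, {"user_id": "u2", "sessions_7d": 0}]
bad_rows = [{"user_id": "u1", "sessions_7d": None}, {"user_id": "u2", "sessions_7d": "3"}]

good_problems = validate_batch(good_rows, now - timedelta(hours=2), contract, now)
bad_problems = validate_batch(bad_rows, now - timedelta(hours=8), contract, now)
print(good_problems)
print(bad_problems)
```

Failing fast on a contract violation keeps a stale or drifted upstream product from silently degrading a model; the violation messages also point back to the owning domain, which is responsible for the fix.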
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction), commonly used to compare embeddings.
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one at a time on demand, optimizing for throughput and cost over latency.
Real-Time Inference
Generating ML predictions on demand as requests arrive, often with latency requirements under 200ms for user-facing features.
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.