Document Database
A NoSQL database that stores data as flexible, self-describing documents (typically JSON or BSON), allowing varied structures within the same collection without requiring a predefined schema.
Document databases store records as documents, each containing its own structure. Unlike relational databases where every row in a table must conform to the same schema, documents in the same collection can have different fields. This flexibility accommodates evolving data models, semi-structured data, and hierarchical information that maps naturally to application objects.
MongoDB, CouchDB, and Amazon DocumentDB are popular document databases. They excel at use cases with rapidly evolving schemas, hierarchical data (user profiles with nested preferences), and applications where each record may have different attributes (product catalogs with varying specifications per category).
For AI teams, document databases are often used to store unstructured or semi-structured data that feeds ML pipelines: user interaction logs with varying event schemas, content metadata with flexible attributes, and model configuration documents. The schemaless nature allows data structures to evolve alongside model requirements without migration overhead. However, the lack of enforced schema can lead to data quality issues if governance is not maintained.
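The core idea above can be illustrated with a minimal in-memory sketch: a "collection" is just a list of documents, and documents in it need not share fields. The `find` helper is a hypothetical stand-in; real document databases such as MongoDB expose far richer query operators.

```python
# Minimal in-memory illustration of a document "collection":
# documents live in the same collection but carry different fields,
# unlike rows in a relational table.
products = [
    {"_id": 1, "name": "Laptop", "specs": {"cpu": "8-core", "ram_gb": 16}},
    {"_id": 2, "name": "T-Shirt", "size": "M", "material": "cotton"},
    {"_id": 3, "name": "Novel", "author": "A. Writer", "pages": 320},
]

def find(collection, **criteria):
    """Return documents whose top-level fields match all criteria."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

matches = find(products, size="M")
print(matches)  # only the T-shirt document carries a "size" field
```

Documents without a queried field simply never match, which is also why ungoverned collections can silently drift apart in structure.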
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
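The definition translates directly into code: the dot product of the two vectors divided by the product of their magnitudes. A minimal sketch using only the standard library:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # 1.0  (identical direction)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (orthogonal)
```

Because the magnitudes are divided out, only the direction of the vectors matters, which is why it is a natural fit for comparing embeddings.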
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
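One common such technique is principal component analysis (PCA). A short sketch, assuming NumPy is available, that projects data onto its top-k directions of variance via the singular value decomposition:

```python
import numpy as np

def pca(X, k):
    """Project X (n samples x d features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # coordinates in the top-k subspace

# Synthetic 2-D data that is almost perfectly 1-D:
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t + 0.01 * rng.normal(size=(200, 1))])
Z = pca(X, 1)
print(Z.shape)  # (200, 1): two features reduced to one
```

Here the single retained component recovers nearly all of the structure, which is the sense in which "meaningful structure" is preserved.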
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one at a time on demand, optimizing for throughput and cost over latency.
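The pattern can be sketched in a few lines: accumulate records, then score them in fixed-size chunks. `score_batch` below is a hypothetical placeholder for a real model's vectorized predict call.

```python
def score_batch(records):
    """Toy stand-in for a model that scores many records in one call."""
    return [len(r["text"]) > 10 for r in records]

def run_batch_job(records, batch_size=1000):
    """Score all accumulated records in fixed-size chunks."""
    predictions = []
    for i in range(0, len(records), batch_size):
        predictions.extend(score_batch(records[i:i + batch_size]))
    return predictions

preds = run_batch_job(
    [{"text": "hello"}, {"text": "a much longer message"}],
    batch_size=1,
)
print(preds)  # [False, True]
```

In production the same loop typically runs on a scheduler (nightly, hourly), and `batch_size` is tuned so each chunk saturates the model's hardware.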
Real-Time Inference
Generating ML predictions on demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
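In contrast to the batch pattern, a real-time endpoint serves one request at a time and is judged against a latency budget. A minimal sketch, with `predict` as a hypothetical stand-in for the deployed model:

```python
import time

def predict(features):
    """Toy stand-in for a deployed model's per-request prediction."""
    return sum(features) > 1.0

def handle_request(features, budget_ms=200):
    """Serve one prediction and report whether it met the latency budget."""
    start = time.perf_counter()
    result = predict(features)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"prediction": result,
            "latency_ms": elapsed_ms,
            "within_budget": elapsed_ms < budget_ms}

response = handle_request([0.7, 0.6])
print(response["prediction"])  # True
```

Real serving stacks wrap this handler in an HTTP or gRPC server and spend most of their engineering effort on keeping `latency_ms` inside the budget at high request rates.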
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
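At its simplest, a pipeline is an ordered list of transformation steps applied in sequence. The sketch below is a bare-bones illustration; real orchestrators (Airflow, Dagster, and the like) add scheduling, retries, and lineage on top of the same idea.

```python
# Each step takes the previous step's output and returns the next stage's input.
def parse(rows):
    return [r.strip().split(",") for r in rows if r.strip()]

def to_records(rows):
    return [{"name": name, "score": int(score)} for name, score in rows]

def keep_passing(records):
    return [r for r in records if r["score"] >= 50]

PIPELINE = [parse, to_records, keep_passing]

def run_pipeline(data, steps=PIPELINE):
    for step in steps:
        data = step(data)
    return data

result = run_pipeline(["ann,72", "bob,41", "", "eve,90"])
print(result)  # [{'name': 'ann', 'score': 72}, {'name': 'eve', 'score': 90}]
```

Keeping each step a small pure function is what makes the flow repeatable: the same inputs always yield the same outputs, and steps can be tested in isolation.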
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.
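The three phases map directly onto three functions. A hedged sketch using only the standard library, with an in-memory SQLite table standing in for the target warehouse:

```python
import csv
import io
import sqlite3

RAW = "user_id,amount\n1, 19.99 \n2,5.00\n"  # extracted source data (CSV text)

def extract(csv_text):
    """Extract: read raw rows from the source system."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: coerce types and normalize values into a uniform shape."""
    return [(int(r["user_id"]), round(float(r["amount"]), 2)) for r in rows]

def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 24.99
```

The same shape scales up by swapping the pieces: `extract` pulls from an API or operational database, `transform` runs in a distributed engine, and `load` targets a warehouse instead of SQLite.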