Graph Database
A database that uses graph structures with nodes, edges, and properties to store and query data, excelling at traversing complex relationships that would require expensive joins in relational databases.
Graph databases model data as interconnected entities. Nodes represent objects (users, products, documents), edges represent relationships (follows, purchased, references), and properties store attributes on both nodes and edges. This structure makes relationship traversal a first-class operation rather than an expensive join.
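To make the model concrete, here is a minimal sketch of a property graph in plain Python: nodes and edges both carry properties, and traversing a relationship is a direct lookup rather than a join. All names and data are illustrative, not a real graph database API.

```python
# Hypothetical property graph: nodes are objects, edges are relationships,
# and both carry property maps.
nodes = {
    "u1": {"label": "User", "name": "Ada"},
    "u2": {"label": "User", "name": "Bram"},
    "p1": {"label": "Product", "title": "Keyboard"},
}
edges = [
    # (source, relationship type, target, edge properties)
    ("u1", "FOLLOWS", "u2", {"since": 2022}),
    ("u1", "PURCHASED", "p1", {"price": 49.99}),
]

def neighbors(node_id, rel_type):
    """Follow outgoing edges of one type: traversal as a first-class operation."""
    return [dst for src, rel, dst, _ in edges if src == node_id and rel == rel_type]

print(neighbors("u1", "PURCHASED"))  # -> ['p1']
```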
Queries like "find all friends of friends who purchased product X" or "what is the shortest path between user A and user B" require multiple joins in relational databases but are natural and performant in graph databases. Popular graph database platforms include Neo4j, Amazon Neptune, and ArangoDB, typically queried with languages such as Cypher or Gremlin.
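Both query patterns named above can be sketched as graph traversals: "friends of friends who purchased X" is two hops plus a filter, and the shortest path is a breadth-first search. The adjacency-list data below is hypothetical, standing in for what a graph engine stores natively.

```python
from collections import deque

# Toy follow and purchase graphs (illustrative data only).
follows = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"D", "E"},
    "D": {"F"},
    "E": set(),
    "F": set(),
}
purchased = {"D": {"X"}, "E": {"X"}, "F": {"Y"}}

def friends_of_friends_who_bought(user, product):
    """Two-hop traversal, then filter on a PURCHASED edge."""
    direct = follows.get(user, set())
    fof = set().union(*(follows.get(f, set()) for f in direct)) - direct - {user}
    return {f for f in fof if product in purchased.get(f, set())}

def shortest_path(start, goal):
    """Unweighted shortest path via breadth-first search."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in follows.get(path[-1], set()) - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

print(friends_of_friends_who_bought("A", "X"))  # -> {'D', 'E'}
print(shortest_path("A", "F"))                  # a 4-node path from A to F
```

A graph engine executes the same traversals against index-free adjacency, so cost scales with the subgraph visited rather than with total table size.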
For AI and growth teams, graph databases power recommendation systems (collaborative filtering through user-product graphs), fraud detection (identifying suspicious connection patterns), knowledge graphs (structured representations of domain knowledge for RAG systems), and social network analysis (identifying influencers, communities, and viral paths). The graph structure also enables graph neural networks, an emerging ML approach that learns directly from graph-structured data.
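The recommendation use case above can be illustrated with a tiny collaborative-filtering pass over a bipartite user-product graph: score items bought by users who share purchases with the target user. The data and unweighted scoring are a simplified sketch; production systems normalize and weight these signals.

```python
# Hypothetical bipartite purchase graph: user -> set of products.
purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "desk"},
    "cara": {"lamp", "chair"},
}

def recommend(user):
    """Score candidate items by purchase overlap with other users."""
    mine = purchases[user]
    scores = {}
    for other, theirs in purchases.items():
        if other == user or not (mine & theirs):
            continue  # only users connected through a shared item contribute
        for item in theirs - mine:
            scores[item] = scores.get(item, 0) + len(mine & theirs)
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # recommends 'desk' and 'chair'
```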
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
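As a minimal sketch of the idea, the PCA variant of dimensionality reduction can be worked by hand in two dimensions: center the points, take the leading eigenvector of the 2x2 covariance matrix, and project each point onto that axis. The sample points are illustrative.

```python
import math

# Illustrative 2-D points with correlated coordinates.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]

n = len(points)
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n
centered = [(x - mx, y - my) for x, y in points]

# Covariance matrix [[sxx, sxy], [sxy, syy]] and its larger eigenvalue.
sxx = sum(x * x for x, _ in centered) / n
syy = sum(y * y for _, y in centered) / n
sxy = sum(x * y for x, y in centered) / n
lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)

# Unit eigenvector for lam (valid here because sxy != 0).
vx, vy = sxy, lam - sxx
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Each 2-D point is reduced to one coordinate along the principal axis.
projected = [x * vx + y * vy for x, y in centered]
```

The projections preserve the direction of greatest variance: their variance equals the leading eigenvalue.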
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one at a time on demand, optimizing for throughput and cost over latency.
Real-Time Inference
Generating ML predictions on-demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.
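The three ETL stages can be sketched end to end in a few lines, using in-memory CSV text as the source and SQLite as a stand-in for the target warehouse. Table and column names are illustrative.

```python
import csv
import io
import sqlite3

# Extract: parse rows out of a source system (here, raw CSV text).
raw = "user_id,amount\n1,19.99\n2,5.00\n"

def extract(text):
    return list(csv.DictReader(io.StringIO(text)))

# Transform: coerce strings into typed, analysis-ready tuples.
def transform(rows):
    return [(int(r["user_id"]), float(r["amount"])) for r in rows]

# Load: write the transformed rows into the target database.
def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS orders (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(raw)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)
```

Real pipelines add scheduling, incremental loads, and error handling around the same three stages.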