Clustering

Clustering finds groups in data without being told what the groups should be. Unlike classification, which trains on labeled examples, clustering algorithms infer structure from the data alone. The most common approaches are k-means (partitioning data into k groups by minimizing the squared distance from each point to its cluster centroid), DBSCAN (density-based clustering that finds arbitrarily shaped clusters), and hierarchical clustering (building a tree of nested clusters).
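To make the k-means idea concrete, here is a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) in plain Python. The function name, the toy points, and the random initialization are illustrative assumptions, not a production implementation; real libraries add smarter initialization and multiple restarts.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means sketch: alternate assignment and centroid updates."""
    rng = random.Random(seed)
    # Assumption: initialize centroids by sampling k distinct input points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Toy data: two well-separated groups of 2-D points (made up for illustration).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary and depend on the initialization; what matters is which points share a label.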

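The choice of clustering algorithm depends on your data and goals. K-means is fast and scalable but works best on roughly spherical, similarly sized clusters and requires specifying k in advance. DBSCAN handles arbitrary shapes, determines the number of clusters from the data, and labels low-density points as noise, but its results are sensitive to its two density parameters (the neighborhood radius and the minimum number of points per neighborhood). Hierarchical clustering provides a multi-scale view, but standard agglomerative implementations work from all pairwise distances, so their cost grows at least quadratically with the number of points.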
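The contrast with k-means is easier to see in code. The sketch below is a simplified, brute-force DBSCAN in plain Python: it is never told how many clusters to find, and it marks isolated points as noise. The parameter names eps and min_samples follow common usage, and the toy data and parameter values are assumptions chosen for illustration.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Simplified DBSCAN sketch using brute-force neighbor search."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and expand it.
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is also a core point
        cluster += 1
    return labels

# Two dense groups plus one isolated point (made-up data).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
          (4.5, 20.0)]
labels = dbscan(points, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Changing eps or min_samples can merge the groups or turn every point into noise, which is the parameter sensitivity described above.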

For growth teams, clustering is useful for customer segmentation (discovering behavioral groups), content categorization (automatically grouping similar items), anomaly detection (points that fall outside every cluster, such as DBSCAN's noise points, are potential outliers), and market research (finding natural segments in survey or behavioral data). Embedding-based clustering, where you cluster vector representations rather than raw features, works especially well for unstructured data like text and user behavior sequences.
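As a rough illustration of embedding-based clustering, the sketch below groups a few short texts by the cosine distance between their vectors, using a small average-linkage agglomerative procedure. The three-dimensional vectors are invented stand-ins for real embeddings (which would come from an embedding model and have far more dimensions), and the 0.2 distance threshold is an arbitrary assumption.

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def agglomerative(vectors, threshold):
    """Average-linkage merging that stops once clusters are farther than threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical texts with made-up "embeddings" for illustration only.
texts = ["refund request", "money back", "reset password", "login problem"]
embeddings = [(0.9, 0.1, 0.0), (0.8, 0.2, 0.1), (0.1, 0.9, 0.1), (0.0, 0.8, 0.2)]

clusters = agglomerative(embeddings, threshold=0.2)
for members in clusters:
    print([texts[i] for i in members])
```

With these toy vectors the two billing-related texts end up together and the two account-access texts end up together. Recomputing every pairwise linkage on each pass keeps the sketch short but is too slow for large datasets.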

Related Terms

RAG (Retrieval-Augmented Generation)

Embeddings

Vector Database

LLM (Large Language Model)

Fine-Tuning

Prompt Engineering