Classification

Classification is one of the most common ML tasks in production systems. Given an input and a fixed set of possible categories, the model predicts which category the input belongs to. Binary classification handles two classes (spam/not spam, churn/retain). Multi-class classification assigns exactly one of three or more classes (product category, intent detection). Multi-label classification allows multiple simultaneous labels (a support ticket can be both "billing" and "urgent").
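The three variants differ mainly in how raw model scores become a decision. The sketch below is illustrative only: it assumes the model emits one raw score (logit) per class or label, and the function names and the 0.5 default threshold are hypothetical choices, not part of any particular library.

```python
import math


def sigmoid(z: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores: list[float]) -> list[float]:
    """Map raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_binary(score: float, threshold: float = 0.5) -> bool:
    """Binary: one score, one yes/no decision."""
    return sigmoid(score) >= threshold


def predict_multiclass(scores: dict[str, float]) -> str:
    """Multi-class: classes compete, exactly one wins."""
    labels = list(scores)
    probs = softmax([scores[label] for label in labels])
    return labels[probs.index(max(probs))]


def predict_multilabel(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multi-label: each label is an independent yes/no decision."""
    return [label for label, s in scores.items() if sigmoid(s) >= threshold]
```

With these definitions, a support ticket scored as `{"billing": 1.2, "urgent": 0.4, "spam": -2.0}` would receive both "billing" and "urgent" under the multi-label rule, while the multi-class rule would return only the single highest-scoring class.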

The modeling spectrum ranges from simple (logistic regression, decision trees) to complex (gradient boosting, deep neural networks). For structured (tabular) data, gradient boosting (XGBoost, LightGBM) is often the strongest choice, though a simpler model can be preferable when interpretability or latency matters. For text, LLMs can classify with zero-shot prompting, few-shot prompting, or fine-tuning. For images, convolutional neural networks and vision transformers are standard.

For growth applications, classification powers many critical features: classifying leads by likelihood to convert, categorizing support tickets for automated routing, detecting fraudulent transactions, segmenting users by behavior type, and moderating user-generated content. The key to production classification is not just model accuracy but the entire system: feature engineering, threshold selection based on business costs, monitoring for distribution shift, and graceful handling of low-confidence predictions.
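Threshold selection and low-confidence handling can be sketched in a few lines. The example assumes a binary classifier that outputs reasonably well-calibrated probabilities, and that the costs of a false positive and a false negative are known constants; the function names and the `review_margin` parameter are hypothetical. Under those assumptions, predicting positive minimizes expected cost when the probability is at least `cost_fp / (cost_fp + cost_fn)`.

```python
def cost_based_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which predicting positive has lower expected cost.

    Predicting positive costs (1 - p) * cost_fp in expectation; predicting
    negative costs p * cost_fn. Positive is cheaper when
    p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def decide(p: float, threshold: float, review_margin: float = 0.1) -> str:
    """Return 'positive', 'negative', or 'review' for borderline cases.

    Predictions within `review_margin` of the threshold are routed to a
    fallback (for example, human review) instead of being auto-actioned.
    """
    if abs(p - threshold) < review_margin:
        return "review"
    return "positive" if p >= threshold else "negative"
```

For fraud detection, where a missed fraud might cost nine times as much as a false alarm, `cost_based_threshold(1.0, 9.0)` places the threshold near 0.1 rather than the default 0.5, so far more transactions are flagged. The width of the review band is a business decision that trades review workload against error rate.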

Related Terms

RAG (Retrieval-Augmented Generation)

Embeddings

Vector Database

LLM (Large Language Model)

Fine-Tuning

Prompt Engineering