Load Balancer
A network component that distributes incoming traffic across multiple backend servers to maximize throughput, minimize response time, and ensure no single server is overwhelmed.
Load balancers sit between clients and servers, routing each request to the most appropriate backend instance. Common algorithms include round-robin (sequential distribution), least connections (route to the server with the fewest active connections), and weighted routing (distribute based on server capacity).
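The three algorithms can be sketched in a few lines of Python. This is a minimal, in-process illustration, not any product's implementation: `Backend` and the class names are hypothetical, and weighted routing is shown with smooth weighted round-robin (the scheme nginx uses for weighted upstreams), which is one common way to honor weights among several.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1   # relative capacity (hypothetical field), used by weighted routing
    active: int = 0   # in-flight connections (hypothetical field), used by least connections


class RoundRobin:
    """Hand out backends in a fixed, repeating order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Choose the backend with the fewest active connections."""

    def __init__(self, backends):
        self.backends = backends

    def pick(self):
        # min() returns the first backend on ties, so list order breaks ties.
        return min(self.backends, key=lambda b: b.active)


class SmoothWeightedRoundRobin:
    """Spread picks in proportion to weight, interleaving rather than bursting."""

    def __init__(self, backends):
        self.backends = backends
        self.current = {b.name: 0 for b in backends}
        self.total = sum(b.weight for b in backends)

    def pick(self):
        # Every backend earns its weight; the richest one is chosen and pays the total.
        for b in self.backends:
            self.current[b.name] += b.weight
        best = max(self.backends, key=lambda b: self.current[b.name])
        self.current[best.name] -= self.total
        return best
```

With weights 5, 1, 1 for backends `a`, `b`, `c`, seven weighted picks yield `a a b a c a a`: the heavy backend gets five of seven requests, but they are interleaved with the others instead of arriving in one burst.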
Modern load balancers operate at different network layers. Layer 4 (transport-layer, e.g. TCP or UDP) load balancers route based on IP addresses and ports, offering high throughput with minimal processing overhead. Layer 7 (HTTP) load balancers inspect request content, enabling path-based routing, header-based decisions, and cookie-based sticky sessions. Cloud providers offer managed load balancers (Application Load Balancer and Network Load Balancer on AWS; Cloud Load Balancing on GCP) that integrate with auto-scaling groups.
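The difference between the two layers is what the balancer can see. The sketch below assumes a Layer 4 balancer that hashes the connection's addresses and ports, and a Layer 7 balancer that reads the parsed HTTP request. The routing table, pool names, the `X-Canary` header, and the `lb_backend` cookie are all hypothetical, chosen only for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def _stable_index(key: str, n: int) -> int:
    # sha256 rather than hash(): Python's str hash is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big") % n


def route_l4(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
             backends: list[str]) -> str:
    """L4: only addresses and ports are visible, so hash the connection tuple.
    Every packet of one connection maps to the same backend."""
    return backends[_stable_index(f"{src_ip}:{src_port}->{dst_ip}:{dst_port}",
                                  len(backends))]


@dataclass
class HttpRequest:
    path: str
    headers: dict[str, str] = field(default_factory=dict)  # assumed already normalized
    cookies: dict[str, str] = field(default_factory=dict)


# Hypothetical path-prefix routing table; the longest matching prefix wins.
ROUTES = {"/api/": "api-pool", "/api/v2/": "api-v2-pool", "/static/": "static-pool"}
DEFAULT_POOL = "web-pool"
STICKY_COOKIE = "lb_backend"  # hypothetical cookie name


def route_l7(req: HttpRequest, pools: dict[str, list[str]]) -> str:
    """L7: the HTTP request is visible, so route on path, headers and cookies."""
    # Path-based routing: choose the pool for the longest matching prefix.
    matches = [p for p in ROUTES if req.path.startswith(p)]
    pool_name = ROUTES[max(matches, key=len)] if matches else DEFAULT_POOL
    # Header-based decision: a hypothetical "X-Canary: 1" selects a canary pool.
    if req.headers.get("X-Canary") == "1" and f"{pool_name}-canary" in pools:
        pool_name = f"{pool_name}-canary"
    backends = pools[pool_name]
    # Sticky session: reuse the cookie's backend if it is still in the pool.
    sticky = req.cookies.get(STICKY_COOKIE)
    if sticky in backends:
        return sticky
    # Otherwise spread requests by hashing the path.
    return backends[_stable_index(req.path, len(backends))]
```

The L4 function never parses the payload, which is why it is cheap; the L7 function must parse HTTP first, which costs CPU but unlocks the path, header, and cookie rules.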
For AI serving infrastructure, load balancers are critical. They distribute inference requests across GPU servers, route traffic during canary deployments of new models, and perform health checks to remove unhealthy instances from the pool. Intelligent load balancing can also route requests based on model version, request complexity, or available GPU memory.
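A minimal sketch of the AI-serving behaviours described above: health checks that eject failing servers, a fixed canary fraction for a new model version, and routing by available GPU memory. It assumes each server reports its free GPU memory; `GpuServer`, `InferenceBalancer`, the failure threshold, and the canary fallback policy are hypothetical choices, not any managed service's API.

```python
import random
from dataclasses import dataclass


@dataclass
class GpuServer:
    name: str
    model_version: str
    free_gpu_mem_gb: float  # assumed to be reported by the server's metrics endpoint
    healthy: bool = True
    consecutive_failures: int = 0


class InferenceBalancer:
    """Health-checked, canary-aware routing across GPU inference servers (sketch)."""

    def __init__(self, servers, stable, canary, canary_fraction=0.05,
                 failure_threshold=3, rng=None):
        self.servers = servers
        self.stable = stable                    # model version serving most traffic
        self.canary = canary                    # new model version under evaluation
        self.canary_fraction = canary_fraction  # share of requests sent to the canary
        self.failure_threshold = failure_threshold
        self.rng = rng or random.Random()

    def record_health_check(self, server: GpuServer, ok: bool) -> None:
        """Eject a server after N consecutive failed checks. A single passing check
        restores it here; many balancers use a separate recovery threshold."""
        if ok:
            server.consecutive_failures = 0
            server.healthy = True
        else:
            server.consecutive_failures += 1
            if server.consecutive_failures >= self.failure_threshold:
                server.healthy = False

    def _candidates(self, version: str, required_mem_gb: float):
        return [s for s in self.servers
                if s.healthy and s.model_version == version
                and s.free_gpu_mem_gb >= required_mem_gb]

    def pick(self, required_mem_gb: float) -> GpuServer:
        # Canary split: a fixed fraction of requests tries the new version first,
        # falling back to the stable version if no canary server can take it.
        use_canary = self.rng.random() < self.canary_fraction
        versions = [self.canary, self.stable] if use_canary else [self.stable]
        for version in versions:
            candidates = self._candidates(version, required_mem_gb)
            if candidates:
                # Prefer the healthy server with the most free GPU memory.
                return max(candidates, key=lambda s: s.free_gpu_mem_gb)
        raise RuntimeError("no healthy server with enough free GPU memory")
```

Falling back to the stable version keeps a canary outage from turning into user-facing errors; a stricter rollout might instead surface those failures to halt the canary.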
Related Terms
A/B Testing
A controlled experiment comparing two or more variants to determine which performs better on a defined metric, using statistical methods to ensure reliable results.
Feature Flag
A software mechanism that enables or disables features at runtime without deploying new code, used for gradual rollouts, A/B testing, and targeting specific user segments.
MLOps
The set of practices combining machine learning, DevOps, and data engineering to reliably deploy, monitor, and maintain ML models in production.
Model Serving
The infrastructure and systems that host trained ML models and handle inference requests in production, optimizing for latency, throughput, and cost.
Semantic Search
Search that understands the meaning and intent behind a query rather than just matching keywords, typically powered by embedding-based similarity comparison.
CI/CD (Continuous Integration / Continuous Deployment)
An automated software practice where code changes are continuously integrated into a shared repository, tested, and deployed to production, reducing manual intervention and accelerating delivery cycles.