Latency Percentiles
Statistical measures of the response time distribution: p50 is the median latency, and p99 is the latency that 99% of requests complete within, so only the slowest 1% of requests exceed it. Percentiles reveal tail performance that averages hide.
Averages lie about latency. A service with a 50ms average latency might have a p99 of 2 seconds, meaning 1 in 100 requests takes at least 40x longer than the average suggests. Percentiles reveal this distribution: p50 (median), p90, p95, and p99 each describe a progressively slower slice of requests, and together they give a fuller picture of user experience than any single number.
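A minimal sketch of how the mean and the percentiles can diverge on the same data. The latency values are synthetic and chosen only for illustration, and the `percentile` helper is a hypothetical nearest-rank implementation; monitoring systems often use interpolation or histogram-based estimates instead, so their numbers may differ slightly.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Rank is 1-based; multiply before dividing to avoid rounding surprises.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic latencies in milliseconds (made-up data):
# 98 fast requests and 2 very slow ones.
latencies_ms = [30] * 98 + [2000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f}ms  p50={p50_ms}ms  p99={p99_ms}ms")
```

In this sample the mean sits near 70ms and the median at 30ms, while p99 is 2000ms. Neither the mean nor the median gives any hint of the two-second tail.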
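For AI products, tail latency matters enormously. If your LLM inference has a p99 of 8 seconds, 1 in 100 requests takes 8 seconds or longer, so a user who sends dozens of requests per session will hit unacceptably slow responses regularly. In distributed systems, tail latencies compound: if a page makes 10 parallel API calls, the page latency is determined by the slowest call. Assuming the calls are independent, the chance that at least one of them lands in its slowest 1% is 1 - 0.99^10, or about 9.6%, so a tail that affects 1% of calls affects roughly 1 in 10 page loads.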
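The compounding effect of parallel calls can be estimated with a short calculation. This is a sketch under a simplifying assumption: each call's latency is independent of the others, which real systems often violate (shared databases, correlated load spikes). The function name `prob_any_call_slow` is hypothetical.

```python
def prob_any_call_slow(num_calls, pct=99.0):
    """Probability that at least one of `num_calls` parallel calls is slower
    than its own `pct`-th percentile latency, assuming independent calls."""
    per_call_fast = pct / 100          # chance a single call is within the percentile
    all_fast = per_call_fast ** num_calls
    return 1 - all_fast

for n in (1, 10, 100):
    print(f"{n:>3} parallel calls: {prob_any_call_slow(n):.1%} of pages hit a p99-or-worse call")
```

Under that assumption, a page that fans out to 10 calls sees a p99-or-worse call on roughly 1 in 10 loads, and with 100 calls it happens on most loads.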
Teams should set SLOs (Service Level Objectives) on percentiles, not averages. An example set of targets might be p50 under 100ms, p95 under 500ms, and p99 under 2 seconds. Percentile-based dashboards reveal degradation that average-based metrics miss, because a regression confined to the slowest few percent of requests barely moves the mean but shows up directly in p95 and p99. This lets teams address performance issues before they reach most users.
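A percentile-based SLO check might look like the sketch below. The thresholds mirror the example targets in the text; the sample data, the `SLO_TARGETS_MS` name, and the `check_slos` helper are assumptions made for illustration, and the nearest-rank percentile is one of several reasonable definitions.

```python
import math

# Example targets from the text: percentile -> maximum latency in ms.
SLO_TARGETS_MS = {50: 100, 95: 500, 99: 2000}

def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def check_slos(samples, targets=SLO_TARGETS_MS):
    """Return {percentile: (observed_ms, target_ms, met)} for each target."""
    report = {}
    for p, target in targets.items():
        observed = percentile(samples, p)
        report[p] = (observed, target, observed < target)
    return report

# Synthetic window of 100 requests (made-up data) with a degraded tail.
latencies_ms = [40] * 90 + [300] * 6 + [2500] * 4

report = check_slos(latencies_ms)
for p, (observed, target, met) in report.items():
    status = "OK" if met else "VIOLATED"
    print(f"p{p}: {observed}ms (target < {target}ms) {status}")
```

In this sample the mean is 154ms, which looks healthy, and the p50 and p95 targets are met. Only the p99 check exposes the 2.5-second tail.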
Related Terms
A/B Testing
A controlled experiment comparing two or more variants to determine which performs better on a defined metric, using statistical methods to ensure reliable results.
Feature Flag
A software mechanism that enables or disables features at runtime without deploying new code, used for gradual rollouts, A/B testing, and targeting specific user segments.
MLOps
The set of practices combining machine learning, DevOps, and data engineering to reliably deploy, monitor, and maintain ML models in production.
Model Serving
The infrastructure and systems that host trained ML models and handle inference requests in production, optimizing for latency, throughput, and cost.
Semantic Search
Search that understands the meaning and intent behind a query rather than just matching keywords, typically powered by embedding-based similarity comparison.
CI/CD (Continuous Integration / Continuous Deployment)
An automated software practice where code changes are continuously integrated into a shared repository, tested, and deployed to production, reducing manual intervention and accelerating delivery cycles.