SLA (Service Level Agreement)

SLAs are legally binding commitments that specify the minimum acceptable level of service. A typical SLA might guarantee 99.9% uptime (allowing approximately 8.76 hours of downtime per year), response times under 500 ms for 95% of requests, and a support response within 4 hours for critical issues. Violations usually trigger service credits or financial penalties.
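The downtime figure above follows directly from the uptime percentage. A minimal sketch of that arithmetic, assuming a 365-day year; the function name and the 30-day month are illustrative choices, not part of any standard:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime (in hours) that an uptime target permits over a period."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return (1 - uptime_pct / 100) * period_hours


if __name__ == "__main__":
    # Yearly allowance for a 99.9% target
    print(f"{allowed_downtime_hours(99.9):.2f} hours/year")  # 8.76 hours/year
    # Same target over an assumed 30-day month, expressed in minutes
    print(f"{allowed_downtime_hours(99.9, 30 * 24) * 60:.1f} minutes/month")  # 43.2 minutes/month
```

Running the same calculation over the billing period named in the contract (often a calendar month) is usually more useful than the yearly figure, because credits are typically assessed per period.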

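SLAs should be set slightly below your actual capability to provide a safety margin. If your system achieves 99.95% uptime, an SLA of 99.9% gives room for unexpected incidents without breaching the agreement. The gap between your SLA and your actual performance is your safety margin. It is related to, but not the same as, an error budget, which conventionally means the unreliability a target itself permits (0.1% for a 99.9% target).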
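To put the 99.95% versus 99.9% example in concrete terms, the gap can be converted into hours per year. This is a rough sketch assuming a 365-day year; `headroom_hours` is a hypothetical helper name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours(uptime_pct: float) -> float:
    """Yearly downtime implied by an uptime percentage."""
    return (1 - uptime_pct / 100) * HOURS_PER_YEAR


def headroom_hours(measured_pct: float, committed_pct: float) -> float:
    """Extra downtime per year you could absorb before breaching the commitment.

    A negative result means measured performance is already below the SLA.
    """
    return downtime_hours(committed_pct) - downtime_hours(measured_pct)


if __name__ == "__main__":
    # System measured at 99.95%, SLA committed at 99.9%
    print(f"{headroom_hours(99.95, 99.9):.2f} hours/year of headroom")  # 4.38 hours/year of headroom
```

In this example the system normally incurs about 4.38 hours of downtime a year and the SLA tolerates 8.76, so one additional multi-hour incident fits inside the margin but two probably do not.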

For AI-powered products, SLAs require careful consideration. LLM API dependencies introduce latency variability and availability risks outside your control. Teams should architect fallback paths, caching layers, and degraded-mode experiences that maintain SLA compliance even when upstream AI providers experience issues. Where no such fallback exists, your SLA to customers should be no more aggressive than the combined availability of the dependencies on the request path. That combined figure is lower than the weakest link alone: assuming independent failures, three serial dependencies at 99.9% each yield roughly 99.7%.
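One way the fallback, caching, and degraded-mode layers might fit together is sketched below. Everything here is hypothetical: `call_provider` stands in for whatever client wraps your LLM API, `UpstreamError` is an assumed exception that the wrapper raises on failure or timeout, and the in-memory dict is a placeholder for a real cache.

```python
from typing import Callable, Dict, Tuple


class UpstreamError(Exception):
    """Assumed to be raised by the provider wrapper on failure or timeout."""


def answer_with_fallback(
    prompt: str,
    call_provider: Callable[[str], str],
    cache: Dict[str, str],
    degraded_message: str = "AI features are temporarily limited.",
) -> Tuple[str, str]:
    """Return (answer, source), where source is 'live', 'cache', or 'degraded'."""
    try:
        answer = call_provider(prompt)
    except UpstreamError:
        # Provider is unavailable: serve a previously cached answer if one exists.
        cached = cache.get(prompt)
        if cached is not None:
            return cached, "cache"
        # Nothing cached: return a degraded but valid response instead of an error.
        return degraded_message, "degraded"
    # Live call succeeded: remember the answer for future outages.
    cache[prompt] = answer
    return answer, "live"
```

Returning the source alongside the answer lets you track how often requests are served from the cache or in degraded mode, which is the data you need to judge whether the fallback path is actually protecting the SLA.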

Related Terms

A/B Testing

Feature Flag

MLOps

Model Serving

Semantic Search

CI/CD (Continuous Integration / Continuous Deployment)