Rolling Deployment

Rolling deployments update application instances incrementally. If you have 10 instances running version 1, a rolling deployment might update 2 at a time: take 2 instances out of the load balancer, update them to version 2, verify they are healthy, add them back, then repeat for the next batch. Because only one batch is out of service at any moment, the remaining instances (8 of 10 in this example) keep serving traffic throughout the deployment.
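The batch loop described above can be sketched in a few lines of Python. This is a minimal in-memory simulation, not a real deployment tool: the `Instance` class, the `rolling_deploy` function, and the `is_healthy` callback are hypothetical names introduced for illustration, and "updating" an instance simply changes a version string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    # Hypothetical stand-in for a server or container behind a load balancer.
    name: str
    version: str
    in_service: bool = True


def rolling_deploy(
    instances: List[Instance],
    new_version: str,
    batch_size: int,
    is_healthy: Callable[[Instance], bool],
) -> int:
    """Update instances batch by batch; return the lowest in-service count seen."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    min_in_service = sum(inst.in_service for inst in instances)
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        for inst in batch:
            inst.in_service = False      # take out of the load balancer
            inst.version = new_version   # stand-in for the real update step
        in_service_now = sum(inst.in_service for inst in instances)
        min_in_service = min(min_in_service, in_service_now)
        # Verify the batch before returning it to service; stop on failure.
        unhealthy = [inst.name for inst in batch if not is_healthy(inst)]
        if unhealthy:
            raise RuntimeError(f"halting rollout, unhealthy: {unhealthy}")
        for inst in batch:
            inst.in_service = True       # add back to the load balancer
    return min_in_service


fleet = [Instance(f"app-{n}", "v1") for n in range(10)]
lowest = rolling_deploy(fleet, "v2", batch_size=2, is_healthy=lambda inst: True)
print(lowest)                                  # 8
print(sorted({inst.version for inst in fleet}))  # ['v2']
```

In this sketch a failed health check stops the rollout and leaves the failing batch out of service, so instances that have not yet been touched keep running the old version.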

The key parameters are batch size (how many instances update simultaneously) and health check criteria (what conditions must be met before proceeding to the next batch). Conservative settings (small batches, strict health checks) are slower but safer, because a bad release reaches fewer instances before it is caught. Aggressive settings (large batches, minimal checks) are faster but riskier.
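To make the batch-size trade-off concrete, the arithmetic can be worked through for a fixed fleet. The helper below is a rough sketch with a hypothetical name (`rollout_profile`); it assumes instances are replaced in place with no extra capacity added during the rollout.

```python
import math
from typing import Tuple


def rollout_profile(total: int, batch_size: int) -> Tuple[int, float]:
    """Return (number of batches, lowest fraction of instances in service)."""
    if total < 1 or batch_size < 1:
        raise ValueError("total and batch_size must be at least 1")
    batches = math.ceil(total / batch_size)
    # At most one batch is out of service at a time.
    min_capacity = (total - min(batch_size, total)) / total
    return batches, min_capacity


for size in (1, 2, 5):
    batches, capacity = rollout_profile(10, size)
    print(f"batch_size={size}: {batches} batches, {capacity:.0%} minimum capacity")
# batch_size=1: 10 batches, 90% minimum capacity
# batch_size=2: 5 batches, 80% minimum capacity
# batch_size=5: 2 batches, 50% minimum capacity
```

Larger batches finish in fewer rounds but leave less capacity in service, and a faulty release reaches more instances before the first health check can stop it.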

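Rolling deployments are the default strategy for Kubernetes Deployments (the RollingUpdate strategy) and are common across container orchestration platforms. For AI model deployments, rolling updates allow gradual transition to a new model version while maintaining serving capacity. However, during the rollout window, both old and new model versions serve traffic simultaneously, which can be a concern if the versions produce meaningfully different outputs: the same request may get a different answer depending on which instance handles it. Canary releases offer more control when model version consistency matters, because the share of traffic sent to the new version is set explicitly rather than following from how many instances have been updated so far.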
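The mixed-version window can be quantified with a small calculation. The sketch below assumes requests are spread uniformly at random across in-service instances with no session affinity, which real load balancers may not do; the function names are hypothetical.

```python
def new_version_share(total: int, batch_size: int, batches_done: int) -> float:
    """Fraction of instances on the new version after some completed batches."""
    updated = min(batches_done * batch_size, total)
    return updated / total


def mismatch_probability(share: float) -> float:
    """Chance that two independent requests land on different versions."""
    return 2 * share * (1 - share)


# 10 instances updated 2 at a time, as in the example above.
for done in range(6):
    share = new_version_share(10, 2, done)
    print(done, share, round(mismatch_probability(share), 2))
```

Under these assumptions the chance of a client seeing two different model versions on consecutive requests is zero at the start and end of the rollout and highest near the midpoint, when the fleet is split roughly evenly.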

Related Terms

A/B Testing

Feature Flag

MLOps

Model Serving

Semantic Search

CI/CD (Continuous Integration / Continuous Deployment)