Zero-Downtime Deployment
A deployment strategy that updates production systems without any period of unavailability. Zero-downtime deployments use techniques like rolling updates, blue-green switching, or canary releases to transition traffic between versions seamlessly.
Zero-downtime deployment eliminates the maintenance window that traditional deployments require. Rolling updates gradually replace old instances with new ones, maintaining a minimum number of healthy instances throughout. Blue-green deployments maintain two complete environments, switching traffic atomically between them. Canary deployments route a small percentage of traffic to the new version, expanding gradually if metrics remain healthy.
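The canary pattern described above can be sketched as a weighted router that sends a small fraction of requests to the new version. This is a minimal illustration, not a production router: the names `stable_handler`, `canary_handler`, and `canary_fraction` are hypothetical, and real systems typically implement the split at the load balancer or service mesh rather than in application code.

```python
import random

def make_canary_router(stable_handler, canary_handler, canary_fraction=0.05):
    """Return a handler that routes a share of traffic to the canary.

    Illustrative sketch: production systems do this in the load
    balancer and adjust `canary_fraction` as health metrics come in.
    """
    def route(request):
        # Send roughly `canary_fraction` of requests to the new version.
        if random.random() < canary_fraction:
            return canary_handler(request)
        return stable_handler(request)
    return route
```

Expanding the rollout is then a matter of rebuilding the router with a larger fraction (or 1.0 to complete the cutover) while unhealthy metrics trigger a rebuild with fraction 0 to roll back.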
For AI product teams, zero-downtime deployments are essential because AI products often serve global user bases across all time zones: there is no good time for downtime. Model updates are particularly sensitive because loading new model weights into memory can take minutes, during which the instance cannot serve requests. Techniques like pre-loading the new model in a separate process and atomically switching the serving endpoint prevent any interruption. Growth teams depend on continuous deployment to ship experiment variations and feature flags without scheduling maintenance windows; deploying multiple times per day without downtime is what sustains the rapid experimentation cadence they need to iterate quickly.
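The pre-load-then-switch technique for model updates can be illustrated with a small in-process sketch. The `ModelServer` class and `load_model` callable below are hypothetical; the essential idea is that the slow work (loading weights) finishes entirely before the serving reference is swapped, so requests are never blocked on a half-loaded model.

```python
class ModelServer:
    """Serve predictions while allowing zero-downtime model swaps.

    Sketch under assumed names: the new weights are fully loaded
    *before* the serving reference is switched, so `predict` never
    sees a partially initialized model.
    """

    def __init__(self, model):
        self._model = model  # reference assignment is atomic in CPython

    def predict(self, x):
        model = self._model  # snapshot the current reference
        return model(x)

    def swap(self, load_model, path):
        new_model = load_model(path)  # slow step: may take minutes
        self._model = new_model       # atomic switch, no interruption
```

In a real deployment the same principle applies across processes: the new model is warmed in a standby process and the load balancer or endpoint alias is repointed only once it reports healthy.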
Related Terms
Content Delivery Network
A geographically distributed network of proxy servers that caches and delivers content from locations closest to end users. CDNs reduce latency, improve load times, and absorb traffic spikes by serving content from edge nodes rather than a single origin server.
Edge Computing
A distributed computing paradigm that processes data closer to the source of generation rather than in a centralized data center. Edge computing reduces latency, conserves bandwidth, and enables real-time processing for latency-sensitive applications.
Serverless Computing
A cloud execution model where the provider dynamically manages server allocation and scaling. Developers deploy functions or containers without provisioning infrastructure, paying only for actual compute time consumed rather than reserved capacity.
Function as a Service
A serverless computing category where developers deploy individual functions that execute in response to events. FaaS platforms like AWS Lambda, Google Cloud Functions, and Azure Functions handle all infrastructure management, scaling each function independently.
Platform as a Service
A cloud computing model that provides a complete development and deployment environment without managing underlying infrastructure. PaaS offerings like Heroku, Vercel, and Google App Engine handle servers, storage, networking, and runtime configuration.
Infrastructure as a Service
A cloud computing model that provides virtualized computing resources over the internet. IaaS offerings like AWS EC2, Google Compute Engine, and Azure Virtual Machines give teams full control over servers, storage, and networking without owning physical hardware.