Cold Start
The initial latency spike that occurs when a serverless function, container, or service instance is invoked after a period of inactivity and must initialize its runtime environment before processing the request.
Cold starts happen because serverless platforms reclaim resources from idle functions. When a request arrives and no warm instance is available, the platform must provision a container and load the runtime, and the function's own initialization code must then load its dependencies and open any database connections before the request can be handled. This adds anywhere from hundreds of milliseconds to several seconds of latency to that request.
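To make the split between one-time initialization and per-request work concrete, here is a minimal Python sketch. It assumes a Lambda-style handler signature; the names `handler`, `_is_cold`, and the returned fields are hypothetical and chosen only for illustration. Code at module scope runs once per execution environment (that is, during a cold start), while the handler body runs on every invocation.

```python
import time

# Module scope: executed once per execution environment, i.e. during a cold start.
_init_started = time.perf_counter()
_is_cold = True  # flips to False after the first invocation in this environment
_init_seconds = time.perf_counter() - _init_started  # time spent on module-level setup


def handler(event, context=None):
    """Hypothetical handler that reports whether this invocation was a cold start."""
    global _is_cold
    was_cold = _is_cold
    _is_cold = False  # later invocations in the same environment are warm
    return {"cold_start": was_cold, "init_seconds": _init_seconds}
```

Logging a flag like `cold_start` alongside request latency is one simple way to measure how often cold starts occur and how much they cost in practice.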
Cold start severity varies by runtime and platform. Node.js and Python functions on AWS Lambda typically cold-start in 200-500 ms. Java and .NET functions can take 1-3 seconds due to heavier runtimes. AI inference functions loading large models can take 10-30 seconds, making cold starts a serious UX concern.
Mitigation strategies include provisioned concurrency (keeping instances warm at a fixed cost), periodic pinging to prevent deallocation, minimizing dependency size, using lightweight runtimes, and lazy-loading heavy resources. For AI features, teams often keep model-serving containers warm with minimum replica counts rather than relying on purely serverless scaling.
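As an illustration of the lazy-loading strategy, the following Python sketch defers an expensive load until the first request that actually needs it, then reuses the result while the instance stays warm. The function `get_model`, the `/health` path, and the `time.sleep` stand-in for loading model weights are all assumptions made for this example, not part of any particular platform's API.

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the heavy resource on first use and cache it for later invocations."""
    time.sleep(0.05)  # stand-in for an expensive load, such as reading model weights
    return {"name": "hypothetical-model"}


def handler(event, context=None):
    """Hypothetical handler: only paths that need the model pay the loading cost."""
    if event.get("path") == "/health":
        return {"status": "ok"}  # cheap path, never touches the model
    model = get_model()  # slow on the first call, cached afterwards
    return {"status": "ok", "model": model["name"]}
```

The trade-off is that lazy loading moves the cost from instance startup to the first request that needs the resource, so it helps most when many requests never touch the heavy dependency.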
Related Terms
A/B Testing
A controlled experiment comparing two or more variants to determine which performs better on a defined metric, using statistical methods to ensure reliable results.
Feature Flag
A software mechanism that enables or disables features at runtime without deploying new code, used for gradual rollouts, A/B testing, and targeting specific user segments.
MLOps
The set of practices combining machine learning, DevOps, and data engineering to reliably deploy, monitor, and maintain ML models in production.
Model Serving
The infrastructure and systems that host trained ML models and handle inference requests in production, optimizing for latency, throughput, and cost.
Semantic Search
Search that understands the meaning and intent behind a query rather than just matching keywords, typically powered by embedding-based similarity comparison.
CI/CD (Continuous Integration / Continuous Deployment)
An automated software practice where code changes are continuously integrated into a shared repository, tested, and deployed to production, reducing manual intervention and accelerating delivery cycles.