Vertical Scaling

The practice of increasing capacity by adding resources such as CPU cores, memory, or GPUs to an existing machine rather than adding more machines. Vertical scaling is simpler to implement but has physical limits and creates single points of failure.

Vertical scaling, also called scaling up, increases the power of individual servers. Upgrading from 16GB to 64GB of RAM, moving from a 4-core to a 16-core processor, or switching from a single GPU to a multi-GPU configuration are all examples. This approach requires no architectural changes to the application, making it the simplest scaling strategy.

For AI workloads, vertical scaling is often the first step because larger models require more memory and compute than a single small instance can provide. A language model that needs 24GB of VRAM cannot be split across multiple small GPUs without complex model parallelism. Growth teams should understand vertical scaling limits because they define the ceiling for AI feature performance on a single request: a bigger instance can run a larger, more accurate model or process more context.

However, vertical scaling has hard limits determined by the largest available machine type, creates availability risk from single-point failures, and can be expensive since the largest instances command premium pricing. Most production AI systems combine vertical scaling for individual inference quality with horizontal scaling for throughput and reliability.
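A back-of-the-envelope memory estimate makes the ceiling concrete. The sketch below (illustrative assumptions: 2 bytes per parameter for fp16 weights and roughly 20% overhead for activations and KV cache) shows why a mid-sized model forces a move to a larger GPU rather than more small ones:

```python
def vram_needed_gb(params_billions: float,
                   bytes_per_param: int = 2,
                   overhead_factor: float = 1.2) -> float:
    """Rough serving-memory estimate: weights plus ~20% overhead.

    Assumes fp16 weights (2 bytes/param); overhead_factor is a
    placeholder for activations and KV cache, not a measured value.
    """
    weights_gb = params_billions * bytes_per_param  # 1B params at fp16 = ~2 GB
    return weights_gb * overhead_factor

# A 7B-parameter model (~16.8 GB) fits on a single 24GB GPU,
# but a 13B model (~31.2 GB) does not: without model parallelism,
# the only option is scaling up to a GPU with more VRAM.
print(vram_needed_gb(7))   # fits in 24 GB
print(vram_needed_gb(13))  # exceeds 24 GB
```

The same arithmetic in reverse gives the vertical-scaling ceiling: the largest single GPU available caps the largest model a single request can hit without the architectural complexity the text mentions.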

Related Terms

Content Delivery Network

A geographically distributed network of proxy servers that caches and delivers content from locations closest to end users. CDNs reduce latency, improve load times, and absorb traffic spikes by serving content from edge nodes rather than a single origin server.

Edge Computing

A distributed computing paradigm that processes data closer to the source of generation rather than in a centralized data center. Edge computing reduces latency, conserves bandwidth, and enables real-time processing for latency-sensitive applications.

Serverless Computing

A cloud execution model where the provider dynamically manages server allocation and scaling. Developers deploy functions or containers without provisioning infrastructure, paying only for actual compute time consumed rather than reserved capacity.

Function as a Service

A serverless computing category where developers deploy individual functions that execute in response to events. FaaS platforms like AWS Lambda, Google Cloud Functions, and Azure Functions handle all infrastructure management, scaling each function independently.

Platform as a Service

A cloud computing model that provides a complete development and deployment environment without managing underlying infrastructure. PaaS offerings like Heroku, Vercel, and Google App Engine handle servers, storage, networking, and runtime configuration.

Infrastructure as a Service

A cloud computing model that provides virtualized computing resources over the internet. IaaS offerings like AWS EC2, Google Compute Engine, and Azure Virtual Machines give teams full control over servers, storage, and networking without owning physical hardware.