API Gateway

A server that acts as the single entry point for all API requests, handling routing, authentication, rate limiting, and request transformation. API gateways decouple client applications from the internal microservice topology and centralize cross-cutting concerns.

An API gateway sits between clients and backend services, providing a unified interface while routing each request to the appropriate microservice. It handles authentication and authorization, request and response transformation, rate limiting, caching, and monitoring. Popular options include Kong, Amazon API Gateway, and Nginx-based solutions.
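The gateway responsibilities above can be sketched in a few dozen lines. This is a minimal illustration, not a production implementation: the route table, API keys, and backend names are all hypothetical, and a real gateway would forward HTTP requests rather than return strings.

```python
import time
from collections import defaultdict, deque

# Hypothetical route table mapping path prefixes to backend services.
ROUTES = {
    "/users": "user-service",
    "/orders": "order-service",
}

# Hypothetical set of valid API keys.
API_KEYS = {"key-123"}


class RateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


limiter = RateLimiter(limit=3, window=60.0)


def handle(path, api_key):
    """Gateway entry point: authenticate, rate-limit, then route by path prefix."""
    if api_key not in API_KEYS:
        return 401, "unauthorized"
    if not limiter.allow(api_key):
        return 429, "rate limit exceeded"
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return 200, f"forwarded to {backend}"
    return 404, "no route"
```

The ordering matters: authentication runs before the rate limiter so that unauthenticated traffic never consumes a caller's quota, and routing runs last so that only admitted requests reach a backend.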

For AI product teams, the API gateway is a critical component: AI features typically expose multiple model endpoints that need consistent authentication, rate limiting, and monitoring. The gateway can route requests to different model versions based on experiment assignments, cache frequent predictions to reduce inference costs, and enforce usage quotas that protect expensive GPU capacity. Growth teams use API gateways to implement feature flags at the API level, A/B test different backend behaviors, and collect usage analytics without modifying individual services. A well-configured gateway also provides the request-level metrics needed to understand API performance patterns and identify bottlenecks in the inference pipeline.

Related Terms

Content Delivery Network

A geographically distributed network of proxy servers that caches and delivers content from locations closest to end users. CDNs reduce latency, improve load times, and absorb traffic spikes by serving content from edge nodes rather than a single origin server.

Edge Computing

A distributed computing paradigm that processes data closer to the source of generation rather than in a centralized data center. Edge computing reduces latency, conserves bandwidth, and enables real-time processing for latency-sensitive applications.

Serverless Computing

A cloud execution model where the provider dynamically manages server allocation and scaling. Developers deploy functions or containers without provisioning infrastructure, paying only for actual compute time consumed rather than reserved capacity.

Function as a Service

A serverless computing category where developers deploy individual functions that execute in response to events. FaaS platforms like AWS Lambda, Google Cloud Functions, and Azure Functions handle all infrastructure management, scaling each function independently.
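A FaaS function is just a handler the platform invokes per event. The sketch below follows AWS Lambda's Python handler convention (`event` dict, `context` object); the payload shape and response body are illustrative assumptions.

```python
import json


def handler(event, context):
    """Minimal AWS-Lambda-style handler. The platform calls this once per
    event and scales instances independently; `context` (runtime metadata)
    is unused here. The event payload shape is a hypothetical example."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```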

Platform as a Service

A cloud computing model that provides a complete development and deployment environment without managing underlying infrastructure. PaaS offerings like Heroku, Vercel, and Google App Engine handle servers, storage, networking, and runtime configuration.

Infrastructure as a Service

A cloud computing model that provides virtualized computing resources over the internet. IaaS offerings like AWS EC2, Google Compute Engine, and Azure Virtual Machines give teams full control over servers, storage, and networking without owning physical hardware.