API Gateway
A server that acts as the single entry point for client requests, handling cross-cutting concerns such as authentication, rate limiting, request routing, and protocol translation before forwarding those requests to backend services.
An API gateway consolidates client-facing API concerns into a single managed layer. Instead of each microservice implementing its own authentication, rate limiting, logging, and CORS handling, the gateway handles these uniformly. Clients interact with one endpoint; the gateway routes requests to the appropriate backend service.
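As a rough illustration of this idea, the sketch below authenticates once at the edge and then picks a backend by longest matching path prefix. The `Gateway` class, its method names, and the status codes it returns are hypothetical choices for this example, not the API of any particular product; real gateways proxy over the network rather than calling in-process handlers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple

# A backend "service" is modelled as a plain function: path in, body out.
Handler = Callable[[str], str]


@dataclass
class Gateway:
    """Hypothetical in-process gateway: one auth check, then path routing."""

    api_keys: Set[str]
    routes: Dict[str, Handler] = field(default_factory=dict)

    def register(self, prefix: str, handler: Handler) -> None:
        """Map a path prefix to the backend that should serve it."""
        self.routes[prefix] = handler

    def handle(self, path: str, api_key: Optional[str]) -> Tuple[int, str]:
        # Cross-cutting concern handled once here instead of in every service.
        if api_key not in self.api_keys:
            return 401, "unauthorized"

        # Path-based routing: the longest matching prefix wins.
        matches = [prefix for prefix in self.routes if path.startswith(prefix)]
        if not matches:
            return 404, "no route"
        best = max(matches, key=len)
        return 200, self.routes[best](path)
```

With `/users` and `/users/admin` both registered, a request for `/users/admin/1` reaches the admin handler because its prefix is the longer match, while a request without a valid key is rejected before any backend is involved.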
Key features include request routing (path-based, header-based), authentication and authorization, rate limiting, request/response transformation, caching, and API analytics. Products like Kong, AWS API Gateway, Apigee, and Traefik provide these capabilities with different trade-offs in flexibility, performance, and managed vs. self-hosted operation.
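Rate limiting is one of the features above that is easy to show in isolation. The following is a minimal token-bucket sketch, which is one common approach and not necessarily what any of the named products use internally. The `TokenBucket` class and its parameters are assumptions made for this example; the clock is injectable only so the behaviour can be checked deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical per-client limiter: bursts up to `capacity`, then a steady rate."""

    def __init__(
        self,
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A gateway would typically keep one bucket per API key or client IP and answer with HTTP 429 when `allow()` returns `False`.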
For AI products, API gateways serve as the control point for model routing. A single API endpoint can route requests to different model versions based on request content, user tier, or A/B test assignment. The gateway can also enforce per-customer token quotas, cache responses to frequently repeated inference requests, and translate between different model API formats, giving clients a unified interface that hides the complexity of multiple AI backends.
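A minimal sketch of such a control point, assuming a simple policy: check the customer's token quota first, then send a fixed percentage of customers to an experimental model and everyone else to the model for their tier. The `ModelRouter` class, the model names used with it, and the hashing scheme for A/B assignment are all illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ModelRouter:
    """Hypothetical routing policy: quota check, then A/B split, then tier default."""

    tier_models: Dict[str, str]   # user tier -> default model name
    experiment_model: str         # model served to the A/B test group
    experiment_percent: int       # 0-100, share of customers in the test group
    quotas: Dict[str, int]        # customer id -> total token allowance
    used: Dict[str, int] = field(default_factory=dict)

    def _bucket(self, customer_id: str) -> int:
        # Stable hash so a customer always lands in the same A/B group.
        digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % 100

    def route(self, customer_id: str, tier: str, tokens: int) -> str:
        """Return the model name for this request, or raise if over quota."""
        spent = self.used.get(customer_id, 0)
        if spent + tokens > self.quotas.get(customer_id, 0):
            raise PermissionError("token quota exceeded")
        self.used[customer_id] = spent + tokens

        if self._bucket(customer_id) < self.experiment_percent:
            return self.experiment_model
        return self.tier_models[tier]
```

Hashing the customer ID, rather than choosing randomly per request, keeps each customer on one variant for the whole experiment. A production gateway would also need shared quota storage and quota resets per billing period, which this sketch leaves out.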
Related Terms
A/B Testing
A controlled experiment comparing two or more variants to determine which performs better on a defined metric, using statistical methods to ensure reliable results.
Feature Flag
A software mechanism that enables or disables features at runtime without deploying new code, used for gradual rollouts, A/B testing, and targeting specific user segments.
MLOps
The set of practices combining machine learning, DevOps, and data engineering to reliably deploy, monitor, and maintain ML models in production.
Model Serving
The infrastructure and systems that host trained ML models and handle inference requests in production, optimizing for latency, throughput, and cost.
Semantic Search
Search that understands the meaning and intent behind a query rather than just matching keywords, typically powered by embedding-based similarity comparison.
CI/CD (Continuous Integration / Continuous Deployment)
An automated software practice where code changes are continuously integrated into a shared repository, tested, and deployed to production, reducing manual intervention and accelerating delivery cycles.