Message Queue

An asynchronous communication mechanism where producers send messages to a queue and consumers process them independently, decoupling system components and absorbing traffic spikes.

Message queues buffer work between producers and consumers. When a web server receives a request that triggers expensive processing (generating a report, sending emails, running ML inference), it places a message on the queue and responds immediately. Worker processes consume messages from the queue at their own pace, processing them reliably even during traffic spikes.
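The pattern can be sketched with Python's standard-library `queue` module standing in for a real broker; the handler and worker names here are hypothetical, chosen only to mirror the report-generation example above:

```python
import queue
import threading

# In-process stand-in for a broker such as RabbitMQ or SQS.
task_queue = queue.Queue()
results = []

def handle_request(request_id):
    """Web handler: enqueue the expensive work and respond immediately."""
    task_queue.put({"id": request_id, "task": "generate_report"})
    return {"status": "accepted", "id": request_id}

def worker():
    """Consumer: drains the queue at its own pace."""
    while True:
        msg = task_queue.get()
        if msg is None:  # sentinel telling the worker to stop
            task_queue.task_done()
            break
        results.append(f"processed {msg['task']} for request {msg['id']}")
        task_queue.task_done()

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    handle_request(i)   # each call returns without waiting for the work
task_queue.put(None)    # shut the worker down once the queue drains
t.join()
```

The key property is that `handle_request` never blocks on the expensive work; the worker thread could just as well be a separate process or machine.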

Popular message queue systems include RabbitMQ (feature-rich, general-purpose), Amazon SQS (fully managed, simple), Apache Kafka (high-throughput streaming), and Redis Streams (lightweight, fast). Each offers different guarantees around message ordering, delivery semantics (at-least-once vs. exactly-once), and persistence.
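At-least-once delivery, the most common of these semantics, can be illustrated with a toy queue (this `ToyQueue` class is an illustrative sketch, not any real broker's API): a received message stays "in flight" until the consumer acknowledges it, and is redelivered if the consumer crashes before acking, much like SQS's visibility timeout.

```python
import time

class ToyQueue:
    """Minimal at-least-once queue: messages stay in flight until acked."""
    def __init__(self, visibility_timeout=1.0):
        self.pending = []        # messages waiting to be received
        self.in_flight = {}      # msg_id -> (body, redelivery deadline)
        self.timeout = visibility_timeout
        self._next_id = 0

    def send(self, body):
        self.pending.append((self._next_id, body))
        self._next_id += 1

    def receive(self):
        # Redeliver any in-flight message whose deadline passed
        # (i.e. its consumer presumably crashed before acking).
        now = time.monotonic()
        for mid, (body, deadline) in list(self.in_flight.items()):
            if now > deadline:
                del self.in_flight[mid]
                self.pending.append((mid, body))
        if not self.pending:
            return None
        mid, body = self.pending.pop(0)
        self.in_flight[mid] = (body, time.monotonic() + self.timeout)
        return mid, body

    def ack(self, mid):
        self.in_flight.pop(mid, None)  # done for good

q = ToyQueue(visibility_timeout=0.01)
q.send("charge-order-42")

mid, body = q.receive()     # consumer receives, then "crashes" before acking
time.sleep(0.02)            # visibility timeout expires
redelivered = q.receive()   # same message comes back: at-least-once delivery
q.ack(redelivered[0])       # this time the ack removes it permanently
```

Because redelivery can hand the same message to a consumer twice, at-least-once systems require idempotent processing; exactly-once semantics push that deduplication into the broker instead.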

For AI systems, message queues are essential for managing inference workloads. Batch prediction requests are queued and processed by GPU workers at optimal batch sizes. Content moderation tasks are queued and processed asynchronously. Training job triggers are placed in queues with priority ordering. The queue absorbs variable demand, ensuring expensive GPU resources are utilized efficiently rather than sitting idle or being overwhelmed.
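The batching idea can be sketched as a drain loop that a (hypothetical) GPU worker runs: block briefly for the first request, then opportunistically grab more up to the batch size, so batches are full under load but latency stays bounded when traffic is light. The function name and parameters are illustrative assumptions, not a specific library's API:

```python
import queue

def drain_batch(q, batch_size=8, timeout=0.05):
    """Collect up to batch_size queued requests, waiting at most
    `timeout` seconds for the first one."""
    batch = []
    try:
        batch.append(q.get(timeout=timeout))  # block for the first item
        while len(batch) < batch_size:
            batch.append(q.get_nowait())      # grab the rest if ready
    except queue.Empty:
        pass
    return batch

requests = queue.Queue()
for i in range(10):
    requests.put({"prompt": f"request-{i}"})

first = drain_batch(requests)    # a full batch of 8
second = drain_batch(requests)   # the remaining 2
```

In a real deployment the `queue.Queue` would be a broker shared across machines, and each batch would be handed to the model for a single forward pass.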