Message Queue

An asynchronous communication mechanism where producers send messages to a queue and consumers process them independently, decoupling system components and absorbing traffic spikes.

Message queues buffer work between producers and consumers. When a web server receives a request that triggers expensive processing (generating a report, sending emails, running ML inference), it places a message on the queue and responds immediately. Worker processes consume messages from the queue at their own pace, processing them reliably even during traffic spikes.
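The pattern can be sketched with Python's standard-library `queue` module standing in for a real broker; the handler and worker names here are hypothetical, chosen only to mirror the report-generation example above:

```python
import queue
import threading

# In-process stand-in for a broker such as RabbitMQ or SQS.
task_queue = queue.Queue()
results = []

def handle_request(request_id):
    """Web handler: enqueue the expensive work and respond immediately."""
    task_queue.put({"id": request_id, "task": "generate_report"})
    return {"status": "accepted", "id": request_id}

def worker():
    """Consumer: drains the queue at its own pace."""
    while True:
        msg = task_queue.get()
        if msg is None:  # sentinel telling the worker to stop
            task_queue.task_done()
            break
        results.append(f"processed {msg['task']} for request {msg['id']}")
        task_queue.task_done()

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    handle_request(i)   # each call returns without waiting for the work
task_queue.put(None)    # shut the worker down once the queue drains
t.join()
```

The key property is that `handle_request` never blocks on the expensive work; the worker thread could just as well be a separate process or machine.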

Popular message queue systems include RabbitMQ (feature-rich, general-purpose), Amazon SQS (fully managed, simple), Apache Kafka (high-throughput streaming), and Redis Streams (lightweight, fast). Each offers different guarantees around message ordering, delivery semantics (at-least-once vs. exactly-once), and persistence.
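At-least-once delivery, the most common of these semantics, can be illustrated with a toy queue (this `ToyQueue` class is an illustrative sketch, not any real broker's API): a received message stays "in flight" until the consumer acknowledges it, and is redelivered if the consumer crashes before acking, much like SQS's visibility timeout.

```python
import time

class ToyQueue:
    """Minimal at-least-once queue: messages stay in flight until acked."""
    def __init__(self, visibility_timeout=1.0):
        self.pending = []        # messages waiting to be received
        self.in_flight = {}      # msg_id -> (body, redelivery deadline)
        self.timeout = visibility_timeout
        self._next_id = 0

    def send(self, body):
        self.pending.append((self._next_id, body))
        self._next_id += 1

    def receive(self):
        # Redeliver any in-flight message whose deadline passed
        # (i.e. its consumer presumably crashed before acking).
        now = time.monotonic()
        for mid, (body, deadline) in list(self.in_flight.items()):
            if now > deadline:
                del self.in_flight[mid]
                self.pending.append((mid, body))
        if not self.pending:
            return None
        mid, body = self.pending.pop(0)
        self.in_flight[mid] = (body, time.monotonic() + self.timeout)
        return mid, body

    def ack(self, mid):
        self.in_flight.pop(mid, None)  # done for good

q = ToyQueue(visibility_timeout=0.01)
q.send("charge-order-42")

mid, body = q.receive()     # consumer receives, then "crashes" before acking
time.sleep(0.02)            # visibility timeout expires
redelivered = q.receive()   # same message comes back: at-least-once delivery
q.ack(redelivered[0])       # this time the ack removes it permanently
```

Because redelivery can hand the same message to a consumer twice, at-least-once systems require idempotent processing; exactly-once semantics push that deduplication into the broker instead.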

For AI systems, message queues are essential for managing inference workloads. Batch prediction requests are queued and processed by GPU workers at optimal batch sizes. Content moderation tasks are queued and processed asynchronously. Training job triggers are placed in queues with priority ordering. The queue absorbs variable demand, ensuring expensive GPU resources are utilized efficiently rather than sitting idle or being overwhelmed.
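The batching idea can be sketched as a drain loop that a (hypothetical) GPU worker runs: block briefly for the first request, then opportunistically grab more up to the batch size, so batches are full under load but latency stays bounded when traffic is light. The function name and parameters are illustrative assumptions, not a specific library's API:

```python
import queue

def drain_batch(q, batch_size=8, timeout=0.05):
    """Collect up to batch_size queued requests, waiting at most
    `timeout` seconds for the first one."""
    batch = []
    try:
        batch.append(q.get(timeout=timeout))  # block for the first item
        while len(batch) < batch_size:
            batch.append(q.get_nowait())      # grab the rest if ready
    except queue.Empty:
        pass
    return batch

requests = queue.Queue()
for i in range(10):
    requests.put({"prompt": f"request-{i}"})

first = drain_batch(requests)    # a full batch of 8
second = drain_batch(requests)   # the remaining 2
```

In a real deployment the `queue.Queue` would be a broker shared across machines, and each batch would be handed to the model for a single forward pass.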