Streaming Data
Continuously generated data that is processed and analyzed in real time or near-real time as it arrives, rather than being stored first and processed in batches at scheduled intervals.
Streaming data architectures process events as they flow through the system, enabling sub-second reactions to new information. User clicks, IoT sensor readings, transaction events, and log entries are examples of streaming data. Platforms like Apache Kafka, Apache Flink, Apache Spark Streaming, and AWS Kinesis provide the infrastructure for ingesting and processing these continuous data flows.
The key architectural difference from batch processing is that streaming systems process events individually or in micro-batches (milliseconds to seconds) rather than large batches (minutes to hours). This enables use cases that require immediacy: real-time fraud detection, live dashboards, instant personalization, and alerting on anomalous patterns.
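The micro-batching idea above can be sketched in a few lines. This is a simplified, size-based grouping (real engines like Spark Streaming typically cut batches on a time trigger rather than a count); the function name `micro_batches` is illustrative, not from any library:

```python
def micro_batches(events, batch_size=3):
    """Group a stream of events into small micro-batches.

    Simplified sketch: batches are cut by count rather than by a
    time trigger, but the flow mirrors micro-batch engines, which
    process small groups continuously instead of one large batch.
    """
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush any trailing partial batch
        yield batch

stream = iter(range(7))
print(list(micro_batches(stream)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because the input is consumed lazily as a generator, each micro-batch can be processed as soon as it is cut, without waiting for the stream to end, which is the property that distinguishes this flow from batch processing.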
For AI products, streaming data enables real-time feature computation for model inference. Instead of relying on features computed hours ago in a batch pipeline, streaming pipelines compute up-to-the-second features like "number of page views in the last 5 minutes" or "running average session duration." This freshness can significantly improve model accuracy for time-sensitive predictions like fraud detection and real-time recommendations.
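A streaming feature like "page views in the last 5 minutes" can be maintained with a trailing time window. The sketch below uses an in-memory deque; production feature stores distribute this state, but the eviction logic is the same. The class name `SlidingWindowCounter` and the timestamps are illustrative:

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events in a trailing time window,
    e.g. page views in the last 5 minutes (300 s)."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()  # event timestamps, oldest first

    def add(self, ts):
        """Record an event occurring at timestamp ts (seconds)."""
        self.events.append(ts)
        self._evict(ts)

    def count(self, now):
        """Return the number of events in (now - window, now]."""
        self._evict(now)
        return len(self.events)

    def _evict(self, now):
        # Drop timestamps that have fallen outside the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()


counter = SlidingWindowCounter(window_seconds=300)
for t in [0, 60, 240, 290, 310]:
    counter.add(t)
print(counter.count(now=310))  # 4: the event at t=0 has aged out of the window
```

At inference time the model reads `counter.count(now)` directly, so the feature reflects activity seconds ago rather than the state of the last batch run.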
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
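The definition above corresponds to the dot product of the two vectors divided by the product of their magnitudes. A minimal sketch using only the standard library (embedding libraries typically provide an optimized version):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b.

    Returns a value in [-1, 1]: 1 for vectors pointing the same
    direction, 0 for orthogonal vectors, -1 for opposite vectors.
    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal)
print(cosine_similarity([1, 2], [2, 4]))  # 1.0 (same direction, magnitude ignored)
```

Note that scaling a vector does not change the result, which is why cosine similarity compares the direction of embeddings rather than their length.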
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one at a time on demand, optimizing for throughput and cost over latency.
Real-Time Inference
Generating ML predictions on-demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.