What is backpressure in the context of data streaming?


When producers are faster than consumers, data is either dropped or accumulated; if it accumulates in memory, it eventually leads to an OutOfMemoryError. Memory is not the only resource that can be overloaded this way. Backpressure is the mechanism designed to counter-balance this over-pressure: the consumer takes charge and performs a reactive pull, essentially pulling data at its own pace. This only works with cold data sources, though. What about hot data such as incoming HTTP requests or tweets? Then the only options are to enqueue requests in a bounded queue and delay processing until there is capacity downstream, or to eagerly reject requests, something circuit breakers do very well.

Backpressure is an end-to-end problem and can be tackled at multiple layers: how load balancers are configured, queue bounds, and the concurrency of requests between layers. Using Apache Kafka as a push-pull converter will spare you headaches: the excess data simply accumulates on the brokers' disks. A Reactive Streams implementation like Akka Streams can also go a long way, since Reactive Streams treats backpressure as a primary concept (Akka HTTP is built on Akka Streams and can be used to create both clients and servers).
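The bounded-queue idea above can be sketched in plain Java, without Akka or Kafka. This is a minimal illustration, not production code: `ArrayBlockingQueue` gives us a queue with a hard capacity, so when the fast producer fills it, `put()` blocks and the producer is forced down to the consumer's pace instead of growing memory without limit. The class and method names here are made up for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueDemo {

    // Runs a slow consumer against a fast producer over a bounded queue and
    // returns how many items were consumed. `capacity` bounds memory usage:
    // while the queue is full, put() blocks, which is the backpressure.
    static int runPipeline(int items, int capacity) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int[] consumed = {0};

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    queue.take();        // pull at the consumer's own pace
                    Thread.sleep(5);     // simulate slow downstream work
                    consumed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (int i = 0; i < items; i++) {
            queue.put(i);                // blocks while the queue is full
        }
        consumer.join();
        return consumed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("consumed " + runPipeline(16, 4) + " items");
    }
}
```

Swapping `put()` for `offer()` with a timeout would turn the blocking strategy into the eager-rejection strategy mentioned above.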


Two notions of time come up in stream processing:

“event time”, determined by the timestamp on the data element itself

“processing time”, determined by the clock on the system processing the element
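The distinction can be made concrete with a small sketch (class and method names are invented for illustration): each element carries its own event time, while processing time is read from the clock of the machine handling it, and the gap between the two is the element's lag.

```java
import java.time.Duration;
import java.time.Instant;

public class EventTimeDemo {

    // A stream element carries its own event time, set at the source.
    record Event(String payload, Instant eventTime) {}

    // Processing time is whatever the processing machine's clock says when
    // the element is handled; the difference is the element's lag.
    static Duration lag(Event e, Instant processingTime) {
        return Duration.between(e.eventTime(), processingTime);
    }

    public static void main(String[] args) {
        Instant processingTime = Instant.now();
        Event click = new Event("click", processingTime.minusSeconds(5));
        System.out.println("lag = " + lag(click, processingTime));
    }
}
```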
