In this blog post, we will discuss Structured Streaming and the related concepts needed to build a reliable streaming application that processes all incoming data without losing any of it.
One of the most important concerns when designing a streaming application is making sure that every batch of incoming data actually gets processed. But how? Read on to find out.
Structured Streaming
Structured Streaming is Apache Spark's engine for efficiently ingesting and processing large quantities of data from a variety of sources.
Problem
We have a stream of data coming in from a TCP/IP socket, Kafka, Kinesis, or another source.
The data is arriving faster than it can be consumed.
How do we solve this problem?
Micro-Batch as a Solution
Many APIs use micro-batching to solve this problem. We take our firehose of data and collect it for a set interval of time, called the Trigger Interval. Let's say the Trigger Interval is 2 seconds.
As we are processing data, the next batch of data is being collected for us.
In our example, we are processing two seconds' worth of data in about one second.
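The batching idea can be sketched in plain Python (this is an illustrative model, not Spark's actual API; the `micro_batches` function and the 2-second interval are assumptions for the example):

```python
import math

def micro_batches(events, trigger_interval):
    """Group (timestamp, payload) events into micro-batches.

    Batch k holds every event whose timestamp falls in
    [k * trigger_interval, (k + 1) * trigger_interval).
    """
    batches = {}
    for ts, payload in events:
        batch_id = math.floor(ts / trigger_interval)
        batches.setdefault(batch_id, []).append(payload)
    return [batches[k] for k in sorted(batches)]

# Events arriving over roughly 5 seconds, with a 2-second Trigger Interval.
events = [(0.4, "a"), (1.9, "b"), (2.1, "c"), (3.0, "d"), (4.5, "e")]
print(micro_batches(events, 2.0))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Each inner list is one micro-batch handed to the processing step, while the next interval's data keeps accumulating.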
What happens if we don't process the data fast enough?
In that case, we will most likely lose data.
But our goal is to process the data for the previous interval before data from the next interval arrives.
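A simplified back-of-the-envelope model (an assumption for illustration, ignoring Spark's scheduling details) shows why: with a single processor handling batches in order, the backlog grows by (processing time − trigger interval) seconds per batch whenever processing is slower than the interval.

```python
def backlog_after(n_batches, trigger_interval, processing_time):
    """Seconds of unprocessed data queued after n batches, assuming each
    micro-batch covers trigger_interval seconds of data but takes
    processing_time seconds to process, one batch at a time."""
    lag_per_batch = max(0.0, processing_time - trigger_interval)
    return n_batches * lag_per_batch

# Keeping up: 2 s of data processed in 1 s -> no backlog.
print(backlog_after(10, trigger_interval=2.0, processing_time=1.0))  # 0.0

# Falling behind: 2 s of data takes 3 s -> backlog grows 1 s per batch.
print(backlog_after(10, trigger_interval=2.0, processing_time=3.0))  # 10.0
```

So as long as each batch is processed within its trigger interval, the pipeline keeps up indefinitely; once it cannot, the queue grows without bound and data is eventually dropped.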