Azure Event Hub: Part 1

Well, recently I got the chance to work with Azure Event Hubs, and I found it to be a very good service to work with; I personally liked it very much. So I thought of sharing my views on Event Hubs in a simple way.

I'm planning to explain this topic as a series of 2-3 parts so it's easier to digest. Now, without wasting much of your time, let's start from the very beginning.

Before coming to Event Hubs, though, let's first discuss: what is a message broker?

What is a Message Broker?

A message broker is a program that translates a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver. The primary purpose of a broker is to take incoming messages from applications and perform some action on them.

This means that when you have a lot of messages (millions or billions), it could be worth looking into a message broker to create a centralized store/processor for them so that other applications or users can work with those messages.

So, here comes Event Hubs, a message broker and a PaaS (Platform as a Service) offering of the Microsoft Azure cloud that collects large amounts of data/events on a real-time or near real-time basis.

There are many message brokers available in the market, for instance Kafka, RabbitMQ, etc. Many of you have already heard of Kafka, so if I map Kafka concepts to Event Hubs concepts, it looks like this:
        
Kafka Concept      Event Hubs Concept
Cluster            Namespace
Topic              Event Hub
Partition          Partition
Consumer Group     Consumer Group
Offset             Offset

I hope Kafka developers can now easily relate these concepts to Event Hubs.

Event Hub

Azure Event Hubs is a big data streaming platform and event ingestion service. It can receive and process millions of events per second. Data sent to an event hub can be transformed and stored by using any real-time analytics provider or batching/storage adapters.
The data can be collected from websites, mobile apps, IoT sensors, and various social networking sites. The data in an event hub can later be processed or analyzed using Spark, Storm, and other services, and the processed data can then be stored back into Blob Storage or other storage accounts.

Event Hub Architecture & Terminologies

1) Event Producers

Any entity that sends data to an event hub. Event publishers can publish events using HTTPS or AMQP 1.0 or Apache Kafka (1.0 and above). 

The Event Hubs service provides a REST API and .NET, Java, Python, JavaScript, and Go client libraries for publishing events to an event hub. (In the next part of this blog, I will share Python code to send/publish messages to Event Hubs.)

Event Hubs Capacity 

  • The throughput capacity of Event Hubs is controlled by throughput units 
  • A single throughput unit includes: 
    • Ingress - up to 1 MB per second or 1,000 events per second (whichever comes first)
    • Egress - up to 2 MB per second or 4,096 events per second
  • Throughput units purchased for a namespace are shared across all event hubs in that namespace
  • A single partition has a minimum scale of 1 throughput unit 
  • Publishing events larger than the ingress limit will be rejected
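The "whichever comes first" ingress rule above can be sketched as a toy model. This is purely illustrative (the constants come from the limits listed above; the function and its name are my own, not the service's actual throttling logic):

```python
# Toy model of a single throughput unit's ingress limit (illustrative only):
# up to 1 MB per second OR 1,000 events per second, whichever is hit first.

MAX_BYTES_PER_SEC = 1 * 1024 * 1024   # 1 MB
MAX_EVENTS_PER_SEC = 1000

def ingress_allowed(event_sizes_this_second):
    """Return True if a batch of events (sizes in bytes, all arriving within
    one one-second window) fits within a single throughput unit."""
    total_bytes = sum(event_sizes_this_second)
    total_events = len(event_sizes_this_second)
    return total_bytes <= MAX_BYTES_PER_SEC and total_events <= MAX_EVENTS_PER_SEC

# 500 events of 1 KB each = 0.5 MB and 500 events -> within both limits
print(ingress_allowed([1024] * 500))       # True
# 1,500 tiny events -> the event-count limit is hit first
print(ingress_allowed([10] * 1500))        # False
# 2 events of 600 KB -> the byte limit is hit first
print(ingress_allowed([600 * 1024] * 2))   # False
```

If you need more capacity, you purchase more throughput units for the namespace; the limits above scale per unit.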

2) Partitions

To manage the scale of messages being sent into the event hub, the system creates separate ordered message streams called partitions. Each partition contains the messages along with metadata in an ordered list. Messages can be sent with a partition key to identify the partition that the message is targeting. If no partition key is sent, a round-robin distribution is applied.

Event Hubs organizes sequences of events into one or more partitions. As newer events arrive, they're added to the end of this sequence, which can be thought of as a "commit log."

3) Consumers

Any process can consume messages from the event hub. Consumption of events is done only via the AMQP protocol, which allows the client to receive messages without polling.

4) Consumer Groups

All the event data is accessed through Consumer Groups. A consumer group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple consuming applications to each have a separate view of the event stream, and to read the stream independently at their own pace and with their own offsets.

There is a default consumer group for all event hubs called $Default, and you can create up to 20 consumer groups. However, only 5 concurrent readers can be connected to a single partition within a consumer group.

Well, now we have gone through with the architecture and its components. Let's discuss some more terminologies present in Event Hub.

Other Terminologies

1) Offsets

In simple terms, an offset is the position of an event within a partition. The offset enables an event consumer to specify a point in the event stream from which it wants to begin reading events. 

For a better explanation, let's take it this way. Say you have a producer that is continuously streaming data, and on the other hand a consumer that is consuming it. Now suppose at some point you stop the consumer, or the consumer simply stops consuming. When you restart it, how will the consumer know how much data was previously consumed and where it needs to start consuming again? This is where the offset comes in: the offset tracks this position and lets the consumer pick up the right data without any duplication. 
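That stop-and-resume scenario can be sketched as follows (illustrative only; `consume_from` is a made-up helper and a list stands in for the partition):

```python
# Sketch: resuming consumption from a stored offset.
partition = [f"event-{i}" for i in range(10)]

def consume_from(offset, count):
    """Read `count` events starting at `offset`; return them with the new offset."""
    events = partition[offset:offset + count]
    return events, offset + len(events)

# First run: consume 4 events, then the consumer stops.
batch1, offset = consume_from(0, 4)
# Restart: because the offset was kept, we resume exactly where we left off --
# nothing is re-read (no duplicates) and nothing is skipped.
batch2, offset = consume_from(offset, 3)
print(batch1)  # ['event-0', 'event-1', 'event-2', 'event-3']
print(batch2)  # ['event-4', 'event-5', 'event-6']
```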

2) Checkpointing 

Checkpoints are similar to client-side cursors that allow clients to store a partition/offset for failover processing at a later time.
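The cursor idea can be sketched like this. The dict stands in for the durable store (e.g. Azure Blob Storage) that real checkpointing would use; the function, its parameters, and the checkpoint interval are all illustrative:

```python
# Sketch: periodic checkpointing of the last processed offset.
checkpoint_store = {}  # (consumer_group, partition_id) -> last checkpointed offset

def process_events(group, partition_id, events, start_offset, checkpoint_every=2):
    """Process events, checkpointing the offset every `checkpoint_every` events."""
    offset = start_offset
    for event in events:
        # ... handle the event here ...
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint_store[(group, partition_id)] = offset
    return offset

process_events("$Default", 0, ["a", "b", "c", "d", "e"], start_offset=0)
# After a crash, a new reader starts from the last checkpoint, not from zero.
print(checkpoint_store[("$Default", 0)])  # 4
```

Note the trade-off: event "e" (offset 5) was processed but never checkpointed, so after a failover it would be processed again. This is why Event Hubs consumers should be written to tolerate at-least-once delivery.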

3) Capture

Enabling Capture while provisioning the event hub gives you the ability to save the stream data to either a Blob Storage or an Azure Data Lake Storage account. After enabling it, one needs to specify the size and time window to perform the capture. Captured data is written in Avro format. 

I hope you are now clear on what Event Hubs is. In the next blog of this series, I will come up with some hands-on practice, demonstrating how to create an Event Hub using the Azure Portal, followed by integration with PySpark. 

If you like this blog, please show your appreciation by hitting the like button and sharing it. Also, drop a comment about the post and any improvements needed. Till then, HAPPY LEARNING.
