Dat tricky Throughput Unit

This is a perfect test of whether you understand the service or not. Pardon me if this is obvious to you - apparently for me it was not. What's more, some people still seem to be confused, and we're here to avoid confusion. Let's start!

Hit the limit

Event Hub's model (both from a technical and a pricing point of view) is one of the easiest to understand and really, really straightforward. Let's perform a quick calculation:

I need to perform 1K ops / sec - I need 1 TU
I need to perform 10K ops / sec - I need 10 TUs
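
If you prefer code over prose, here's the same arithmetic as a minimal Python sketch - the 1K events/s of ingress per TU is the documented figure, the function name is mine:

```python
import math

# 1 TU buys you 1K events (or 1MB) of ingress per second.
EVENTS_PER_SEC_PER_TU = 1_000

def tus_for_ingress(events_per_sec: int) -> int:
    """Minimum number of TUs needed to ingest a given event rate."""
    return math.ceil(events_per_sec / EVENTS_PER_SEC_PER_TU)

print(tus_for_ingress(1_000))   # -> 1
print(tus_for_ingress(10_000))  # -> 10
```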

In the overall price we can exclude the cost related to the number of events processed (since it's like 5% of the money spent on this service). Now there's one more thing worth mentioning - TUs come as a pair: 1 TU gives you 1MB (or 1K events) of ingress and 2MB (or 2K events) of egress per second. The question is:

"How to kill Event Hub's egress?"

Let's focus a little and try to find a scenario. We cannot easily exceed 1MB of egress when ingress is capped at 1MB. Maybe it'd be doable by loading lots of data into EH up front and then introducing a consumer able to fetch and process 2MB of events per second. Still, this doesn't allow us to exceed the maximum of 2MB of egress. We're safe.

But what if you introduce another *consumer group*? Since there's no filtering in Event Hub, each consumer group gets the same set of events (in other words, when you have N consumer groups, you read the stream N times). Now consider the following scenario:

1MB of ingress
Consumer1 (Consumer Group A)
Consumer2 (Consumer Group B)

You've just hit the limit of 1 TU (since you now have 2MB of egress). Now let's try to scale and extend this solution. Let's introduce another consumer group:

1MB of ingress
Consumer1 (Consumer Group A)
Consumer2 (Consumer Group B)
Consumer3 (Consumer Group C)

Now 1 TU is no longer sufficient - three consumer groups reading 1MB of ingress generate 3MB of egress. By scaling our Event Hub out to 2 TUs we can handle up to 4MB of egress and, at the same time, up to 2MB of ingress. So if for some reason throttling was your friend and kept the load below a certain limit, you can quickly face problems and be forced to scale out once more.
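
To make the trap explicit, here's a minimal sketch of the sizing rule we just walked through - every consumer group reads the full stream, so egress is ingress multiplied by the number of consumer groups, while a single TU caps ingress at 1MB/s and egress at 2MB/s (the function is mine, the per-TU limits come from the pricing model):

```python
import math

INGRESS_MB_PER_TU = 1  # 1 MB/s of ingress per TU
EGRESS_MB_PER_TU = 2   # 2 MB/s of egress per TU

def tus_required(ingress_mb_per_sec: float, consumer_groups: int) -> int:
    # Every consumer group reads the whole stream, so egress scales linearly.
    egress_mb_per_sec = ingress_mb_per_sec * consumer_groups
    return max(
        math.ceil(ingress_mb_per_sec / INGRESS_MB_PER_TU),
        math.ceil(egress_mb_per_sec / EGRESS_MB_PER_TU),
    )

print(tus_required(1, 2))  # -> 1, exactly at the 2MB egress limit
print(tus_required(1, 3))  # -> 2, the third consumer group forces a scale-out
```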

Be smarter

As you can see, mistakes can be made, and relying on consumer groups to filter (or orchestrate) events is not the way to go. In such a scenario it'd be much better to post events directly to e.g. Event Grid, or to use Service Bus topics, so messages can be easily routed. You have to understand that the main purpose of Event Hub is to act as a really big pipe for data that can be easily digested - misusing it can give you serious headaches.
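
For comparison, here's a minimal sketch of such routing with Service Bus topics and the Python SDK (azure-servicebus). The connection string, topic name and the event_type property are placeholders of mine, and the actual filtering happens in subscription rules configured on the topic:

```python
from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = "<service-bus-connection-string>"  # placeholder
TOPIC = "events"                              # placeholder

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    with client.get_topic_sender(topic_name=TOPIC) as sender:
        # Subscriptions can filter on application properties (SQL rules),
        # so each consumer sees only the messages it cares about -
        # something Event Hub's consumer groups simply cannot do.
        sender.send_messages(
            ServiceBusMessage(
                b'{"id": 42}',
                application_properties={"event_type": "order-created"},
            )
        )
```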


"You're older than I expected" - tricky message retention in Azure Event Hub

Event Hub (or Service Bus, if we're talking about the service as "a whole") has a concept of message retention. In short, you can set a fixed period that determines after how many days a message is considered outdated and is no longer available on the bus. So far so good - nothing tricky here and the concept is fairly simple. Unfortunately, there are some not-so-obvious gotchas that can hit you really hard if you forget about them.

You're older than I expected

The confusion comes from the fact that Event Hub is part of a bigger service called Service Bus and offers only a subset of the available functionalities. When we're considering Service Bus, we're talking about queues, topics and so on. All of those have a property called TTL (time-to-live), which is attached to the messages being passed. Although TTL means a different thing for each concept (e.g. it interacts with at-least-once/at-most-once delivery for queues), it's there and its definition is intuitive.
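
As a quick illustration (again azure-servicebus in Python, with placeholder connection string and queue name of mine), TTL is set per message and the broker expires the message once the period passes:

```python
from datetime import timedelta
from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = "<service-bus-connection-string>"  # placeholder

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    with client.get_queue_sender(queue_name="jobs") as sender:
        # After 5 minutes this message expires and won't be delivered
        # (or lands in the dead-letter queue, depending on queue settings).
        sender.send_messages(
            ServiceBusMessage(b"do-the-thing", time_to_live=timedelta(minutes=5))
        )
```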

The question is: how is this related to the message retention mentioned earlier? The answer is that different services within Service Bus are designed for different purposes - because of that, each treats the definition of a message or entity in a slightly different way. Since Event Hubs is considered a big-scale solution (as opposed to e.g. queues), it tracks whole blocks of messages rather than each single message being pushed through the pipeline.

This being said, there's a reason why message retention is not always what you'd expect - if you're using Event Hub for a fairly small amount of data (tens of thousands of events per day at most), there's quite a big likelihood that a message won't be considered "outdated" until the container that holds it is full.

I saw this... twice?

Now imagine the following situation: you're about to go live with a solution that is already deployed to a production environment and uses Event Hub as its heart. Let's say Event Hub has been gathering data for the last seven days (message retention is set to only one day, so this shouldn't be an issue) because you wanted to avoid a cold start. Now the consumers start and... your clients are receiving events/notifications/pop-ups from a week ago.

The first problem: you forgot to check in your code whether a message is still valid from your point of view (a sketch of such a check follows below). This happens, especially if you treat the documentation as the only source of information about a service.

The second: well, nowhere was it said that message retention is not what it looks like at first glance.
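
Back to the first problem - such a defensive check is cheap. Here's a sketch using the azure-eventhub Python SDK, where the one-day freshness window, the hub name and the handler body are my assumptions:

```python
from datetime import datetime, timedelta, timezone
from azure.eventhub import EventHubConsumerClient

CONN_STR = "<event-hub-connection-string>"  # placeholder
MAX_AGE = timedelta(days=1)  # matches the configured message retention

def on_event(partition_context, event):
    # Don't trust retention alone - events older than the configured
    # period may still be sitting in the stream.
    age = datetime.now(timezone.utc) - event.enqueued_time
    if age > MAX_AGE:
        return  # stale event - skip it instead of notifying the client
    print(f"processing: {event.body_as_str()}")  # real handling goes here

client = EventHubConsumerClient.from_connection_string(
    CONN_STR, consumer_group="$Default", eventhub_name="telemetry"
)
with client:
    # "-1" means: start from the beginning of the stream.
    client.receive(on_event=on_event, starting_position="-1")
```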


As you can see, it's a good idea to remodel your way of thinking about messages and entities when working with Event Hub to avoid confusion. Apply a certain level of abstraction to your infrastructure and ask yourself a question: am I working with single messages, or with whole blocks that make sense only when fully processed? If your answer is the former, you will trick yourself sooner or later.