Event Hub Capture as a service

For some reason I've had a hard time finding proper use cases for the Event Hub Capture feature. At first glance it seems like reasonable functionality that can be used for many different purposes:

  • in-built archive of events
  • seamless integration with Event Grid
  • input for batch processing of events

On the other hand, I haven't seen Capture used in any real scenario (I mean - besides the documentation). What's more, the price of this feature (roughly $70 per TU) could be a real blocker in some projects: for 10 TUs you pay ~$200 monthly for processing events, now add an additional ~$700 - my heart bleeds... So what is Capture really for?
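Just to visualize that pain, here's a quick back-of-envelope sketch in Python - the per-TU figures are the rough ones quoted above, not official list prices:

```python
# Rough monthly Event Hubs bill; ~$20 per TU for processing and ~$70 per TU
# for Capture are the ballpark figures from the paragraph above.
TU_MONTHLY_USD = 20
CAPTURE_PER_TU_MONTHLY_USD = 70

def monthly_cost_usd(throughput_units: int, capture: bool) -> int:
    cost = throughput_units * TU_MONTHLY_USD
    if capture:
        cost += throughput_units * CAPTURE_PER_TU_MONTHLY_USD
    return cost

print(monthly_cost_usd(10, capture=False))  # ~200 USD
print(monthly_cost_usd(10, capture=True))   # ~900 USD - Capture dominates the bill
```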

Compare

Experience shows that you should never, ever disqualify a service until you've compared it with other cloud components. Recently my team has been struggling to choose the right tool to process messages in our data ingestion platform. To be honest, the scenario is not as obvious as it looks. You could choose:

  • EH Capture with Azure Data Lake Analytics
  • Direct processing by Azure Functions
  • Azure Batch with event processors
  • Stream Analytics
  • VM with event processors
  • EH Capture + Azure Batch

Really, we can easily imagine several solutions, each with its pros and cons. Some are geared more towards real-time processing (like Stream Analytics or a VM with event processors), some require delicate precision when designing (Functions), and some seem fun until you calculate the cost and find it 10x bigger than in the other choices (ADLA). All are more or less justified. But what about functionality?

IT'S JUST NOT THERE

Now imagine you'd like to distribute your events between different directories (e.g. in Data Lake Store) using a dynamic parameter (e.g. a value taken from the event itself). This simple requirement easily kills some of the solutions listed above (see the sketch after this list):

  • ADLA has this feature in private preview
  • Stream Analytics doesn't have this even in the backlog
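For the record, the requirement itself is trivial to express in code. Here's a minimal sketch of the routing we wanted, where the 'tenant' and 'date' properties and the directory layout are invented for illustration:

```python
import json

def route_event(event_body: bytes) -> str:
    # Pick the target directory from the event itself; 'tenant' and 'date'
    # are hypothetical properties of our payloads.
    event = json.loads(event_body)
    return f"/ingestion/{event['tenant']}/{event['date']}/"

route_event(b'{"tenant": "contoso", "date": "2018-04-01"}')
# -> '/ingestion/contoso/2018-04-01/'
```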

On the other hand, even if this feature were available, I'd consider a different path. If in the end I expect to have my events catalogued, Azure Batch + EH Capture seems like a good idea (especially since it allows me to perform batch processing). This doubles the amount of storage needed but greatly simplifies the solution (and gives me plenty of flexibility).

There's one big flaw in such a design, however - if we're considering dropping events directly into a Data Lake Store instance, both have to be in the same resource group (which doesn't always work). In such a scenario you have to use Blob Storage as a staging area (which could be an advantage with the recent addition of soft delete for blobs).
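Capture stores events as Avro files, so the batch part boils down to reading those files from the staging account and re-routing each event. A rough sketch of a single Batch task, assuming the fastavro package and a hypothetical write_to_data_lake helper:

```python
from fastavro import reader  # pip install fastavro

def catalogue_capture_file(path: str) -> None:
    # Each record in a Capture Avro file carries the raw payload in its "Body" field.
    with open(path, "rb") as f:
        for record in reader(f):
            body = record["Body"]
            target = route_event(body)        # the routing sketch from above
            write_to_data_lake(target, body)  # hypothetical helper
```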

What about money?

Still, Capture is more expensive than a simple solution using Azure Functions. But is that always the case? I find the pricing of Azure Functions better if you're able to process events in batches. If for some reason you're unable to do that (or the batches are really small), the price goes up and up. That's why I said this requires delicate precision when designing - if you think about all the problems upfront, you'll be able to use the easiest and simplest solution.
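To see why batching matters so much, here's a tiny sketch comparing the per-execution charge with and without it - the rates come from the Consumption Plan pricing discussed in the next post, and the 108M monthly events are that post's example load:

```python
# Per-execution charge with and without batching, Consumption Plan rates.
PRICE_PER_MILLION_EXECUTIONS_EUR = 0.169
FREE_EXECUTIONS = 1_000_000

def execution_cost_eur(events_per_month: int, batch_size: int) -> float:
    executions = events_per_month / batch_size
    billable = max(executions - FREE_EXECUTIONS, 0)
    return billable / 1_000_000 * PRICE_PER_MILLION_EXECUTIONS_EUR

print(execution_cost_eur(108_000_000, batch_size=1))    # ~18.08 EUR just for executions
print(execution_cost_eur(108_000_000, batch_size=100))  # ~0.01 EUR - almost all within the free grant
```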

Conclusion

I find Capture useful in some of the listed scenarios, but the competition is strong here. It's hard to compete with well-designed serverless services, which offer better pricing and often perform comparably. Remember to always choose what you need, not what you're told to. Each architecture has different requirements, and the many available guides won't cover your solution in most cases.

How to pay MORE with Azure Functions?

This post is based on my recent calculations for a new version of an old project I'm about to develop for one of my clients. I'm aiming at a perfect balance between performance, flexibility and overall cost. To make things easier, I'll assume that the fixed limit for the solution's monthly bill is 50 EUR. I'll show you how easily you can misjudge the Consumption Plan pricing model and pay much more than you expected to.

Pricing

Azure Functions pricing is actually pretty straightforward:

  • €0.000014/GB-s
  • €0.169 per million executions

Of course we have a free grant of 400,000 GB-s and 1M executions. Pretty sweet! It's gonna be ultra cheap!

Assumptions

This is the expected load we're going to handle:

  • 150k executions per hour
  • less than 100 ms per execution
  • each execution should use less than 128 MB of memory

After a quick calculation, these are the numbers we're interested in:

  • 108M executions per month (150k × 24 hours × 30 days)
  • 1.35M GB-s (108M executions × 0.1 s × 128 MB, i.e. 0.125 GB)

Total cost will be:

(108M - 1M free)/1M * 0.169 + (1.35M - 0.4M free) * 0.000014 = 18.083 + 13.3 = 31.383 EUR

As you can see, the free grants are already subtracted in the calculation above.
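Here's a minimal Python sketch of the same calculation, with the free grants and the billing floors (described in the gotchas below) baked in - the constants are the rates quoted above:

```python
import math

# Consumption Plan rates and free grants as quoted above.
PRICE_PER_GB_S_EUR = 0.000014
PRICE_PER_MILLION_EXECUTIONS_EUR = 0.169
FREE_GB_S = 400_000
FREE_EXECUTIONS = 1_000_000

def monthly_cost_eur(executions: int, avg_duration_s: float, avg_memory_mb: float) -> float:
    # Billing floors: at least 100 ms per execution, and memory rounded up
    # to the nearest 128 MB bucket.
    billed_mb = math.ceil(avg_memory_mb / 128) * 128
    gb_s = executions * max(avg_duration_s, 0.1) * (billed_mb / 1024)
    execution_cost = max(executions - FREE_EXECUTIONS, 0) / 1_000_000 * PRICE_PER_MILLION_EXECUTIONS_EUR
    gb_s_cost = max(gb_s - FREE_GB_S, 0) * PRICE_PER_GB_S_EUR
    return execution_cost + gb_s_cost

print(monthly_cost_eur(108_000_000, 0.1, 128))  # ~31.38 EUR
```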

Gotchas

While paying something like 30 EUR per month for 108M executions is not a big deal, I'd like to focus on a few gotchas here. When calculating the cost of Azure Functions, you have to remember that there's a minimum billed execution time, which is roughly 100 ms, and a minimum of 128 MB of memory used. What does this mean? Well, there's little point in struggling to go below 100 ms. On the other hand, you should fight for each and every millisecond above this threshold.

Another thing is how memory usage is calculated - for each function execution, consumption is calculated by taking the memory used and rounding it up to the nearest 128 MB. This means that if you consume 129 MB each time, you will be billed as if you'd consumed 256 MB.

Let's check what happens if my function exceeds the 128 MB limit - 129 MB of actual usage is billed as 256 MB, so the month now costs 2.7M GB-s:

(108M - 1M free)/1M * 0.169 + (2.7M - 0.4M free) * 0.000014 = 18.083 + 32.2 = 50.283 EUR

So that's an extra 20 EUR per month. Please take into consideration that we're talking about a simple app which handles merely ~40 requests per second.
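For completeness, plugging 129 MB into the pricing sketch above reproduces this bill:

```python
print(monthly_cost_eur(108_000_000, 0.1, 129))  # ~50.28 EUR - 1 MB over the bucket boundary
```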

Alternatives

So what now? Is the Consumption Plan really for me? Well - it all depends on your needs. One of its best features is the possibility to scale cost with your application's growth. On the other hand, maybe you need to execute tiny functions which seem to cost too much because of the lower billing limits per execution? In such a scenario it'd be viable to use an App Service Plan and just pay a fixed price (or maybe reuse an existing one to host both a simple web application and your Azure Functions).

With the current pricing you could select between S1, B1 and B2 instances and still have plenty of additional features. 

Conclusion

Being aware of how a service works (and, even more importantly, how its pricing works) can be crucial in enterprise scenarios where you have high load and each millisecond and MB matters. Imagine a situation where a simple optimization (like an adjusted algorithm, or a package updated to the newest version) could lead to, say, 10 EUR of savings per function per month. If you multiply this by hundreds of functions and twelve months, you could end up saving thousands of EURs each year. This is of course the happiest path, but in many cases being aware of the full cost genuinely changes your mindset.