Choosing an appropriate app service plan for your Azure Functions

As we all (or most of us) know, Azure Functions can be hosted using either a regular app service plan or a consumption plan. We can quickly summarize the pros and cons of both:

App service plan:

  • fixed cost (+)
  • easy to scale (+)
  • ability to run 64-bit applications (+)
  • can reuse an existing app service plan (+)
  • fixed cost (--)
  • some triggers need Always On enabled (-)

Consumption plan:

  • pay-as-you-go (++)
  • no need to have Always On for e.g. TimerTrigger (+)
  • somewhat more difficult to scale (-)
  • if not designed carefully, the cost may seriously exceed our expectations (-)

All right - but what should I really consider when choosing the correct plan for my application?

Scalability

Nowadays scalability is something that really should be considered when designing an application and choosing technologies for it. Let's consider the following example: you're designing an e-commerce application that has to handle really big traffic spikes from time to time (imagine Black Friday or any other black something). The traffic on your website could be described as:

  • stable, low traffic for most of the week
  • an increase during the weekend
  • occasional huge spikes during special events

Now let's relate this to our service plans. Which is better for us in such a scenario?

Well, the problem with the consumption plan is that it needs a running start. You can't expect that, if your application is hit by a huge traffic spike, it'll scale out immediately. What is more, you cannot scale up - you're limited to the resources allocated to your instance of the consumption plan. There is one more thing to consider: execution of functions is not throttled in any way by default, so under heavy load your functions may use too much CPU/memory at once. You can control throttling with the following three properties:

  • maxOutstandingRequests
  • maxConcurrentRequests
  • dynamicThrottlesEnabled

which are described in the host.json documentation. One thing to remember when using throttling is that your clients may get HTTP 429 Too Busy responses. Whether that's a problem, only you can decide.
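The three properties above live in the http section of host.json. As a rough sketch (the helper name and the values 200/100 are illustrative assumptions, not recommendations), the fragment could be generated like this. Note that in newer runtime versions the http section is nested under "extensions", so check which layout your runtime expects.

```python
import json


def http_throttling_section(max_outstanding, max_concurrent, dynamic_throttles):
    """Build the `http` section of host.json with the three throttling properties.

    The helper itself is hypothetical; only the three property names
    come from the Azure Functions host configuration.
    """
    return {
        "http": {
            # Queued + in-flight requests allowed before new ones are rejected.
            "maxOutstandingRequests": max_outstanding,
            # Requests allowed to execute in parallel.
            "maxConcurrentRequests": max_concurrent,
            # Let the host reject requests when it detects resource pressure.
            "dynamicThrottlesEnabled": dynamic_throttles,
        }
    }


# Illustrative values only; tune them against your own load tests.
host_json = http_throttling_section(200, 100, True)
print(json.dumps(host_json, indent=2))
```

Whatever limits you pick, requests rejected by them are the ones that come back as HTTP 429, so the caller needs a retry or fallback strategy.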

When using a regular app service plan, those traffic spikes are much easier to handle. Since you can have scaling rules, you don't have to care when scaling out happens - it just does once a CPU/memory threshold is hit (or another metric, if autoscale is enabled). Additionally, you can pre-provision extra resources if you know when a traffic spike will happen - this way you're prepared.

Cost

When designing cloud solutions, cost is one of the most important factors. If you choose components poorly or overdesign them, the bill at the end of the month certainly won't make you happy. This is the second thing directly related to service plans that affects which one we choose.

The pay-as-you-go model of the consumption plan is what really makes functions interesting. When designed carefully, you can run them almost for free each month (or pay only a few USD/EUR after the free quota is exceeded). The problem starts when you keep your functions "red" (constantly under load) - in such a scenario it may be easier and cheaper to use a regular app service plan, which keeps the cost of this component constant and won't surprise you after a busy weekend (how is it that I have to pay an extra $500 this month?).
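To make the trade-off concrete, here is a back-of-the-envelope sketch of the comparison. All prices, free quotas and the fixed plan cost are placeholder assumptions passed in as parameters; replace them with the current figures for your region. The sketch also ignores billing details such as rounding of execution time and memory.

```python
def consumption_monthly_cost(executions, avg_duration_s, memory_gb,
                             price_per_million_exec=0.20,   # assumed price
                             price_per_gb_s=0.000016,       # assumed price
                             free_executions=1_000_000,     # assumed free quota
                             free_gb_s=400_000):            # assumed free quota
    """Estimate a monthly consumption-plan bill from executions and GB-seconds."""
    gb_seconds = executions * avg_duration_s * memory_gb
    billable_exec = max(0, executions - free_executions)
    billable_gb_s = max(0.0, gb_seconds - free_gb_s)
    return (billable_exec / 1_000_000 * price_per_million_exec
            + billable_gb_s * price_per_gb_s)


def cheaper_plan(executions, avg_duration_s, memory_gb, fixed_plan_monthly_cost):
    """Return the name of the cheaper option and its estimated monthly cost."""
    consumption = consumption_monthly_cost(executions, avg_duration_s, memory_gb)
    if consumption <= fixed_plan_monthly_cost:
        return "consumption", consumption
    return "app service plan", fixed_plan_monthly_cost


# A quiet month versus a month of functions kept constantly busy.
# The fixed plan price of 150 is a hypothetical figure.
quiet = cheaper_plan(500_000, 0.2, 0.25, fixed_plan_monthly_cost=150.0)
busy = cheaper_plan(100_000_000, 1.0, 0.5, fixed_plan_monthly_cost=150.0)
print(quiet, busy)
```

Running the numbers for your own expected load before choosing a plan is cheap, and it shows roughly where the break-even point sits.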

Of course, with a regular app service plan you lose flexibility and have to remember to scale your application down (or at least have something to automate this). The compromise here depends on your current needs and on what your business model looks like. However, it's still better to discuss it now and be aware of your options than to discover them when functions start to respond with HTTP 503.

"You're older than I expected" - tricky message retention in Azure Event Hub

Event Hub (or Service Bus, if we're talking about the service "as a whole") has a concept of message retention. In short, you can set a fixed period that determines after how many days a message is considered outdated and is no longer available in the bus. So far so good - nothing tricky here and the concept is fairly simple. Unfortunately, there are some not-so-obvious gotchas which can hit you really hard if you forget about them.

You're older than I expected

The confusion comes from the fact that Event Hub is a part of a bigger service called Service Bus and offers only a subset of its functionality. When we're considering Service Bus, we're talking about queues, topics and so on. All of those have a property called TTL (time-to-live), which is attached to the messages being passed. Although TTL means a different thing for each concept (e.g. at-least-once/at-most-once for queues), it's there and its definition is intuitive. The question is - how is this related to the message retention mentioned earlier?

The answer is that different services in Service Bus are designed for different purposes, and because of that each treats the definition of a message or entity in a slightly different way. Since Event Hubs are considered a large-scale solution (as opposed to e.g. queues), they track whole blocks of messages rather than each single message being pushed through a pipeline.

That being said, there's a reason why message retention is not always what you'd expect - if you're using Event Hub for a fairly small amount of data (tens of thousands of events per day at most), there's a good chance that messages won't be considered "outdated" until the container that holds them is full.

I saw this... twice?

Now imagine the following situation - you're about to go live with a solution which is already deployed to a production environment and uses Event Hub as its heart. Let's say Event Hub has been gathering data for the last seven days (message retention is set to only one day, so this shouldn't be a problem) because you wanted to avoid a cold start. Now the consumers start and... your clients are receiving events/notifications/pop-ups from a week ago.

The first problem - you forgot to check in your code whether a message is still valid from your point of view. This happens, especially if you treat the documentation as your only source of information about a service.

The second - well, nowhere was it said that message retention is not what it looks like at first glance.
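A minimal sketch of such a check, assuming every received event carries the time it was enqueued. The Event class and its field names below are hypothetical stand-ins for whatever your Event Hub client library hands you, and the one-day limit simply mirrors the retention period from the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    """Hypothetical stand-in for an event received from Event Hub."""
    body: str
    enqueued_at: datetime  # assumed to be timezone-aware (UTC)


def is_still_valid(event, max_age, now=None):
    """Return True if the event is not older than max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - event.enqueued_at <= max_age


def drop_outdated(events, max_age, now=None):
    """Keep only events that are still valid from the consumer's point of view."""
    return [e for e in events if is_still_valid(e, max_age, now)]


# Example: a week-old event is skipped even though the hub still delivered it.
now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
events = [
    Event("week-old notification", now - timedelta(days=7)),
    Event("fresh notification", now - timedelta(hours=1)),
]
fresh = drop_outdated(events, max_age=timedelta(days=1), now=now)
print([e.body for e in fresh])
```

The point of the design is that the consumer decides what "too old" means instead of relying on the service to have removed the message already.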

Summary

As you can see, it's a good idea to remodel your way of thinking about messages and entities when working with Event Hub to avoid confusion. Apply a certain level of abstraction to your infrastructure and ask yourself a question - am I working with single messages, or maybe with whole blobs which make sense only when they are fully processed? If the answer is the former, you may trip yourself up sooner or later.