Durable Functions - Durable Task Framework basics

In the previous post I presented some basic concepts behind Durable Functions, the reasoning behind why they were introduced, and what we can achieve with them. This time I'll focus on the very foundation of Durable Functions - the Durable Task Framework - and its features. We'll try to understand its mechanics and build a very simple workflow to get an idea of how it works. Let's go!

Mechanics

The basic concept of the Durable Task Framework is to use Service Bus to orchestrate work and to use it as temporary storage for state. When the framework is initialized, it creates a Service Bus queue under the hood, which will be used as the main channel for passing messages. Note that the queue size is restricted - you have to choose one of the following values:

  • 1024MB
  • 2048MB
  • 3072MB
  • 4096MB
  • 5120MB

Any other size will be treated as an error and will result in an exception.
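The restriction above can be expressed as a simple validation. Here's a conceptual sketch in Python - the real Durable Task Framework is a .NET library, so the names here are purely illustrative:

```python
# Conceptual sketch: only these Service Bus queue sizes (in MB) are accepted.
# Names are illustrative; the real Durable Task Framework is a .NET library.
ALLOWED_QUEUE_SIZES_MB = {1024, 2048, 3072, 4096, 5120}


def validate_queue_size(size_mb: int) -> int:
    """Return the size if valid, otherwise raise - mirroring the
    framework's behavior of throwing an exception for other values."""
    if size_mb not in ALLOWED_QUEUE_SIZES_MB:
        raise ValueError(f"Unsupported Service Bus queue size: {size_mb} MB")
    return size_mb
```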

If you go to the Core Concepts section for the Durable Task Framework, you can find the following diagram:

https://github.com/Azure/durabletask/wiki/images/concepts.png

It shows the underlying structure of the Durable Task Framework and all the main elements of the architecture. It may be a little confusing now, but once we start creating orchestrations, everything will become easier to understand.

The most important element here is the Task Hub Worker, which allows adding Task Orchestrations and Task Activities and dispatching them - to make a long story short, it acts as the foundation of your solution.

The difference between an Orchestration and an Activity is fairly simple: an Activity is the actual action to be performed - a simple, atomic task that will be executed - while an Orchestration aggregates Activities and orchestrates them. You can think of it as a conductor responsible for keeping everything on the right path.
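The split can be illustrated with a toy sketch. This is plain Python with made-up function names, not the framework's actual API - the point is only that Activities do the work while the Orchestration only sequences them:

```python
# Toy sketch of the Activity/Orchestration split (illustrative only -
# the real Durable Task Framework is a .NET library with its own types).

def download_activity(url: str) -> str:
    """An Activity: a simple, atomic unit of work."""
    return f"content of {url}"  # stand-in for a real download


def word_count_activity(text: str) -> int:
    """Another atomic Activity."""
    return len(text.split())


def orchestration(url: str) -> int:
    """The Orchestration: it performs no real work itself; it only
    decides which Activities run and in what order."""
    text = download_activity(url)
    return word_count_activity(text)
```

Notice that the orchestration contains no business logic of its own - swapping an Activity implementation never requires touching the conductor.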

From Durable Tasks to Durable Functions

You may ask "how do Durable Tasks connect to Durable Functions?" - in fact, initially there's no explicit connection. We have to consider what the best way to achieve orchestration in the serverless world would be. In the previous post I mentioned that the current solution involves using Azure Storage queues, which certainly lets you achieve the goal but is far from ideal. The natural evolution of this idea is to utilize something called event sourcing and, instead of pushing and fetching messages from queues, just raise an event and wait for an eventual response:

  • Function1 started executing
  • Function1 called Function2
  • Function2 started executing
  • Function2 finished executing
  • Function1 called Function3
  • Function1 finished executing

This is a trivial concept, yet a really powerful one. By storing state in such a manner (using an append-only log) you gain many benefits:

  • there's no way to mutate state other than by appending another event
  • the state is immutable - difficult to corrupt
  • no locking
  • it's easy to recreate the state if needed
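The last point is worth seeing in code. Below is a minimal append-only log sketch in Python - a conceptual stand-in for what the runtime does, not the actual implementation - that recreates state by replaying events like the ones listed above:

```python
# Minimal append-only event log: state is never mutated in place;
# it is recreated on demand by replaying every recorded event.
# (Conceptual sketch only - not the actual Durable Functions runtime.)
from dataclasses import dataclass, field


@dataclass
class EventLog:
    events: list = field(default_factory=list)

    def append(self, event: str) -> None:
        self.events.append(event)  # append-only: no updates, no deletes

    def replay(self) -> dict:
        """Recreate the current state by folding over the whole log."""
        state = {"finished": set()}
        for event in self.events:
            name, action = event.split(" ", 1)
            if action == "finished executing":
                state["finished"].add(name)
        return state
```

Because the log is the single source of truth, crashing halfway through is harmless - replaying the log restores exactly the state that existed before the crash.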

Now if you consider that Activities can be treated as events, there's a clear path to Durable Functions, where each Activity is another function and the state is stored in Azure Storage and maintained by the runtime.

Summary

Today we went a bit deeper into the Durable Task Framework and considered the connection between this library and Durable Functions. In the next post I'll try to present a basic example of a Durable Function and what changes in this approach when creating a serverless application.

Durable Functions - basic concepts and justification

Recently the team responsible for Azure Functions announced that a cool new feature is entering the alpha preview stage - Durable Functions. Because this introduces a completely new way of thinking about serverless in Azure, I decided to go in-depth and prepare a few blog posts covering both the new capabilities and the foundations of Durable Functions.

Concepts

Conceptually, Durable Functions force you to rethink what you've already learned about serverless in Azure. When writing common functions, like inserting something into a database or passing a message to a queue, you've always tried to avoid storing state and to perform all actions as quickly as possible. This has many advantages:

  • it's easy to write a simple function that performs basic operations without boilerplate code
  • dividing your module into small services really helps when maintaining your solution
  • scaling is quite simple and unambiguous

All right - it seems we had everything we needed, so why introduce a completely different concept that raises the learning curve of Functions?

Communication

Normally, if you want to communicate between functions, you have to use queues. It's a perfectly valid solution, and in simple scenarios it won't be cumbersome. However, if you're creating a bigger system with several functions orchestrating work between each other, sooner rather than later you'll hit a wall - communication via queues will become a bottleneck and maintenance will turn into a nightmare.

Additionally, more advanced scenarios (like fan-in/fan-out) are ridiculously hard to achieve with queues alone.
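To see why fan-out/fan-in matters, here's a conceptual sketch of the pattern using plain Python asyncio - a stand-in for what Durable Functions provide as a built-in pattern, with made-up function names:

```python
# Sketch of the fan-out/fan-in pattern using plain asyncio (a conceptual
# stand-in for the built-in Durable Functions pattern; names are made up).
import asyncio


async def process_item(item: int) -> int:
    """Stand-in for a worker function handling one piece of work."""
    await asyncio.sleep(0)  # simulate asynchronous work
    return item * 2


async def fan_out_fan_in(items: list) -> int:
    # Fan-out: start one concurrent task per item...
    results = await asyncio.gather(*(process_item(i) for i in items))
    # ...fan-in: aggregate all partial results into one answer.
    return sum(results)
```

With bare queues you'd have to track every outstanding message and detect completion yourself; an orchestrator makes the aggregation point a single line.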

What about stateless?

For some people, the concept of introducing state into functions destroys what is best about them - the ability to scale out seamlessly and to treat them as independent fragments of your system. In fact, you have to distinguish between "traditional" functions and durable ones - they have different use cases and the reasoning behind them differs a lot. The former should be used as a reaction to an event; they're ideal when you have to take action in response to a message. The latter are subtler to adopt - you'll use them for orchestrating workflows and pipelines, which let you easily perform end-to-end processing of a message.

Pricing

One more thing to consider with Durable Functions is pricing, mostly because that's what makes serverless so interesting. With Durable Functions it doesn't change - you still pay only for the time when a function executes; when a function awaits the results of other running functions, no cost is incurred. This is thanks to the fact that once a task is scheduled, execution of the function returns to the Durable Task Framework layer, which waits there for further actions.

I strongly recommend you take a look and try something with Durable Functions. This feature is still in an early preview, so it might be unstable in some ways, but it already offers so many possibilities that it's really worth a try. You can find more info here: Alpha Preview for Durable Functions.