How introducing state in Azure Functions punches you in the face

This post is a slightly longer explanation of an issue I reported a few days ago. To make a long story short - I was trying to fix a bug in my function, where almost every call resulted in an error logged to Application Insights. The exception was a simple Collection was modified; enumeration operation may not execute - I knew what was failing and where, but I couldn't find an answer as to why. I'll try to explain this a little and - on this occasion - tell you why state in Azure Functions is a bad, bad thing.

As you can see, each call of my function was polluted by an exception

The background

As you probably know (and if not - you'll get it in a second), before an Azure Function can actually be triggered, it has to be initialized. There's a ScriptHostManager class, which is created at the very beginning of the first call to your function - it's responsible for handling the whole host. Once initialized, it waits and listens for one of three things:

  • incoming calls (triggers)
  • a signal for restart
  • an exception (which, by the way, will cause a restart)

The host will be kept alive for a fixed amount of time and, if it's not triggered, it will be closed. Of course it can be scaled if needed, but that's not the point here.

A host is also responsible for digesting all the metadata and descriptions related to your functions. It will read the function.json file to get information about bindings, find an entry point and validate it against a configuration file. All these operations happen only one time, when the host is created, and once it knows everything it needs, it can invoke your function again and again. However, this is where a problem can occur.
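
Just to give you an idea what the host actually reads, a minimal function.json for a queue-triggered function with a table output could look more or less like the sketch below. Treat it as an illustration only - the queue name, table name, connection setting, assembly path and entry point are placeholders made up for this example.

/
{
  "bindings": [
    {
      "name": "myQueueItem",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "foo-items",
      "connection": "AzureWebJobsStorage"
    },
    {
      "name": "fooTable",
      "type": "table",
      "direction": "out",
      "tableName": "FooTable",
      "connection": "AzureWebJobsStorage"
    }
  ],
  "scriptFile": "..\\bin\\FooFunctions.dll",
  "entryPoint": "FooFunction.Run"
}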

The habits

Let's consider the following function:

/
public class FooFunction
{
	public static void Run(string myQueueItem,
		ICollector<FooTable> fooTable,
		TraceWriter log)
	{
		// Do some processing...
		fooTable.Add(new FooTable());
	}
}

So far so good - nothing unexpected happens. In fact, this example will work just fine. But let's say we'd like to refactor it a little and extract the logic responsible for processing an item:

/
public class FooFunction
{
	private static ICollector<FooTable> _fooTable;

	public static void Run(string myQueueItem,
		ICollector<FooTable> fooTable,
		TraceWriter log)
	{
		_fooTable = fooTable;

		Proc(myQueueItem);
	}

	private static void Proc(string myQueueItem)
	{
		// Do some processing...
		_fooTable.Add(new FooTable());
	}
}

Initially there's nothing wrong with this design. On the other hand, we've just introduced some kind of state, which will last as long as our function host is alive. It could still work though, and we could live unaware of this flaw. This is how I introduced this bug into my system - I was refactoring my functions as always, with a little help from Resharper, and at some point just moved the ICollector<T> parameter to a static field.

The problem

As I mentioned, initially you could live with state in your function and not even see any problems. If you're using e.g. a TimerTrigger, it will surely work - you need just one instance of a function called at a specific interval. However, what about triggering a function for each queue item? For each HTTP request? Or for each event in Service Bus? In those scenarios, your function will be called concurrently, and the concurrent calls will simultaneously access your static field, overwriting what other calls have added to the collection. Sooner rather than later you'll end up with a pretty Collection was modified; enumeration operation may not execute everywhere in your logs. This is also why the problem initially won't affect you - if traffic is low enough, the function won't be triggered fast enough to actually make this exception happen.

Please keep in mind that the Azure Functions documentation makes it rather clear that state should be avoided, so if you introduce it, well... you deserve to be punished :)

The solution

The solution to this problem is pretty simple - just pass the input of your entry point as parameters to the other methods in your function. This will help keep your logs clean (an interesting fact is that the Monitor tab of your function won't show this problem - each call will be marked as successful while the error count keeps growing!) and save you from other potential problems related to sharing state within a function.
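
To make it concrete, here's how the refactored function from above could look once the static field is gone - a minimal sketch where the extracted method simply receives the collector as a parameter:

/
public class FooFunction
{
	public static void Run(string myQueueItem,
		ICollector<FooTable> fooTable,
		TraceWriter log)
	{
		// No static field - the collector travels with this particular call.
		Proc(myQueueItem, fooTable);
	}

	private static void Proc(string myQueueItem, ICollector<FooTable> fooTable)
	{
		// Do some processing...
		fooTable.Add(new FooTable());
	}
}

Each invocation now works only on the collector instance it was given by the runtime, so concurrent calls no longer step on each other.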

 

Running additional VSTS agents in Azure

When working in a smaller team or on a small project, you usually aim to reduce the costs of your development pipeline. This is also true for huge projects, but there we're talking about bigger budgets and more money. With a free VSTS subscription, you're limited to only 1 hosted pipeline (and 4 hours / month) to run your builds and releases. To be honest - that's nothing once you have real CI/CD set up and working. However, if you consider the fact that you also get 1 private pipeline, it's possible to mitigate this problem.

Hosting your own agent

VSTS allows you to host and connect your own agents running on your machines. It's a great way to take advantage of a private pipeline when you hit the 4-hour limit, and it gives you a backup when you need more builds and releases in a short period. To use an agent you need two things:

  • a machine which will host it
  • an installer

A good way to get a machine is to use your existing Azure subscription (especially if you have MSDN/BizSpark) and create a VM, which can be started and stopped easily. Another thing is the capabilities of your machine, which have to match the requirements of your build (e.g. MSBuild, Grunt, gulp and so on).

The Get agent screen will help you determine both what is needed to install the agent and how to do it

Because of those capabilities, using different agents (you can choose from Windows/OSX/Linux) can be easier or more difficult. In the future I will show you how to set up e.g. a Linux agent; for now we'll focus on the Windows one.

The easiest way is to create a Visual Studio VM from the marketplace in the Azure Portal. That way you will have most of the components needed for a build already installed and configured.

Using a Visual Studio virtual machine will ease the whole process a lot

To get an installer, go to the Agent queues screen in VSTS and click the Download button. As shown in one of the screens above, the information needed to install and configure it is already there, so I won't reinvent the wheel and will just ask you to go through it yourself.

Configuration

Configuration of your agent is pretty straightforward. When you enter the following command:

/
C:\your_agent_directory> .\config.cmd

You'll be asked a couple of questions regarding your VSTS account, a name for the agent, the agent pool and a method of authentication. The easiest way to authenticate is to use a PAT (Personal Access Token), which can be generated here: https://{your_account}.visualstudio.com/_details/security/tokens.
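
If you'd rather script the whole thing than answer the prompts, the agent also supports an unattended configuration. Below is a rough sketch of such a call - the pool, agent name and token are placeholders, and the exact switches can differ between agent versions, so check .\config.cmd --help first:

/
C:\your_agent_directory> .\config.cmd --unattended --url https://{your_account}.visualstudio.com ^
    --auth pat --token {your_pat} --pool default --agent {agent_name} --runAsService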

Once you've configured your agent, you can run it with the .\run.cmd command. If everything's all right, when you go once more to the Agent queues screen, you'll see your build agent available in the selected agent pool:

Building and releasing on your agent

So far so good - we have an additional build agent, which can be used in our private pipeline. But what do we have to do to schedule a build and a release?

Builds

Go to the Builds screen. When you click the Queue new build... button, you will be asked to select a couple of options related to the build. The one we're interested in is Queue. Just select the one which has your build agent configured and click OK. Your build should start even though you've used all the available minutes of the hosted pipeline.

Selecting a different queue will allow you to use a different pipeline

Releases

Releases are a bit tricky. When triggering a release, you have no option to specify a queue. To do so, you have to go to a specific release definition and find the Run on agent link above the release steps. When you click it, it will show you another screen, where you can find the Deployment queue list, from which your private pipeline can be selected.

The somewhat hidden Run on agent screen allows you to run your releases on your own agent

Summary

With the above solution you have to consider the costs generated by using a VM to perform builds and releases. Currently you can buy another hosted pipeline for $40, which is about half the price of the smallest VM that can run Visual Studio. On the other hand, you can automate the VM so it runs only during your working hours, saving some money. If you have a subscription like BizSpark available, you can save even more. Both solutions are viable, but as you can see, starting another build agent is a very easy task, and it can help when you either want to control the usage of your pipelines or need to build something that couldn't be built by a hosted agent.
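
If you decide to automate the VM's working hours, a couple of Azure CLI calls fired from any scheduler are enough - a rough sketch, with a made-up resource group and VM name:

/
REM Deallocate the build agent VM in the evening (stops billing for compute)...
az vm deallocate --resource-group rg-build-agents --name vsts-agent-vm

REM ...and bring it back in the morning, before the first builds are queued.
az vm start --resource-group rg-build-agents --name vsts-agent-vm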