Azure Functions, WebJobs and Data Lake - writing a custom extension

I've been struggling to find some time to write something interesting about Azure Functions recently and... finally! In the upcoming blog posts I'll show you how to easily create a custom extension, which can be used to automatically bind a function parameter to a Data Lake. No more boilerplate code, no need to handle the whole process on your own - a clean and easy way to extend your functions with even more syntactic sugar. I can't wait to present some of these ideas, so let's dive into the solution.

Extension model

As you may know, Azure Functions are actually built on top of the WebJobs SDK. This SDK originally introduced the concept of bindings to different Azure components - later on, the people responsible for Functions extended it and added even more triggers and other elements, so you can easily integrate with e.g. Table Storage using only attributes.

This whole extension model allows you to write a custom binding (either as a trigger or an output) and use it in your code, so part of the work can be done automatically. Note that currently there's no easy way to run your custom extension within a Function - nonetheless, we'll try to bypass those theoretical limits and prepare a solution you'll be able to use in a real scenario.

To make a long story short - to create an extension you need the following things:

  • an actual attribute for binding a parameter
  • a custom binding provider implementing IBindingProvider
  • a config provider implementing IExtensionConfigProvider

This is only a high-level picture of what we're about to build, but it should give you an idea of the shape of the extension we'll make. Let's try to write some code.

What do I need?

Preferably VS2017 with a class library and a console project. We'll be using the WebJobs SDK, so references to the Microsoft.Azure.WebJobs and Microsoft.Azure.WebJobs.Extensions packages will be needed.
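Both packages are available on NuGet - for example, from the Package Manager Console (pick the versions matching your SDK):

Install-Package Microsoft.Azure.WebJobs
Install-Package Microsoft.Azure.WebJobs.Extensions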

Attribute

This is the easiest part of our project. All you need is to create the following class deriving from the base Attribute class:

using System;

namespace WebJobs.DataLake
{
    [AttributeUsage(AttributeTargets.Parameter)]
    public sealed class DataLakeAttribute : Attribute
    {
        public DataLakeAttribute(string clientId, string clientSecret)
        {
            ClientId = clientId;
            ClientSecret = clientSecret;
        }

        public string ClientId { get; private set; }
        public string ClientSecret { get; private set; }
    }
}

This attribute will be used to pass the parameters needed to connect to a Data Lake. In fact, we're done here - nothing more is needed to make it work.

Binding provider

The binding provider is a bit more tricky and requires much more work to finish. In fact, it consists of more than one component:

  • IBindingProvider which encapsulates the actual logic
  • IBinding which tells how the actual binding happens
  • IValueProvider which is used to link a binding to a parameter instance

I think a bit of clarification is needed here. Let's consider the following example:

public static void CustomBinding([TimerTrigger("*/15 * * * * *")] TimerInfo timerInfo, [DataLake("clientId", "clientSecret")] DataLakeProvider dataLake)
{
}

This function is triggered by a TimerTrigger with some interval between each call. In addition, it binds a DataLakeProvider parameter using the connection info passed in the attribute. To bind the information from the DataLake attribute to the DataLakeProvider, you have to go through the whole flow: IBindingProvider -> IBinding -> IValueProvider. I won't go into the details of how to implement each component - let's assume that for now it looks like this:

using System;
using System.Globalization;
using System.Reflection;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs.Extensions.Bindings;
using Microsoft.Azure.WebJobs.Host.Bindings;
using Microsoft.Azure.WebJobs.Host.Protocols;
internal class DataLakeAttributeBindingProvider : IBindingProvider
{
	public Task<IBinding> TryCreateAsync(BindingProviderContext context)
	{
		if (context == null)
		{
			throw new ArgumentNullException(nameof(context));
		}

		var parameter = context.Parameter;
		var attribute = parameter.GetCustomAttribute<DataLakeAttribute>(inherit: false);
		if (attribute == null)
		{
			return Task.FromResult<IBinding>(null);
		}

		if (!ValueBinder.MatchParameterType(context.Parameter, new[] { typeof(DataLakeProvider)}))
		{
			throw new InvalidOperationException(string.Format(CultureInfo.CurrentCulture,
				"Can't bind DataLakeAttribute to type '{0}'.", parameter.ParameterType));
		}

		return Task.FromResult<IBinding>(new DataLakeBinding(parameter));
	}

	private class DataLakeBinding : IBinding
	{
		private readonly ParameterInfo _parameter;

		public DataLakeBinding(ParameterInfo parameter)
		{
			_parameter = parameter;
		}

		public Task<IValueProvider> BindAsync(object value, ValueBindingContext context)
		{
			// This overload is used when binding from an invoke string - not needed here
			throw new NotImplementedException();
		}

		public Task<IValueProvider> BindAsync(BindingContext context)
		{
			if (context == null)
			{
				throw new ArgumentNullException(nameof(context));
			}

			var attribute = _parameter.GetCustomAttribute<DataLakeAttribute>(inherit: false);
			var valueProviderType = typeof(DataLakeValueProvider);
			var valueProvider = (IValueProvider)Activator.CreateInstance(
				valueProviderType, _parameter, attribute);

			return Task.FromResult(valueProvider);
		}

		public ParameterDescriptor ToParameterDescriptor()
		{
			return new ParameterDescriptor
			{
				Name = _parameter.Name
			};
		}

		public bool FromAttribute => true;
	}

	private class DataLakeValueProvider : IValueProvider
	{
		private readonly ParameterInfo _parameter;
		private readonly DataLakeAttribute _resolvedAttribute;

		public DataLakeValueProvider(ParameterInfo parameter,
			DataLakeAttribute resolvedAttribute)
		{
			_parameter = parameter;
			_resolvedAttribute = resolvedAttribute;
		}

		public Task<object> GetValueAsync()
		{
			// For now we ignore _resolvedAttribute and return an empty provider -
			// the actual connection logic will be added in the next post
			var value = new DataLakeProvider();

			return Task.FromResult<object>(value);
		}

		public string ToInvokeString()
		{
			return string.Empty;
		}

		public Type Type => _parameter.ParameterType;
	}
}

It's not fully implemented yet, but it gives us some basic functionality and allows some initial testing.

Running a solution

To run the solution you need a console application. Make sure you've added references to Microsoft.Azure.WebJobs and Microsoft.Azure.WebJobs.Extensions and paste in the following code:

using System.Collections.Generic;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Timers;
internal class Program
{
	private static void Main()
	{
		var config = new JobHostConfiguration();
		config.UseTimers();
		config.UseDataLake();
		config.TypeLocator = new TypeLocator(typeof(Function));

		var host = new JobHost(config);

		host.Call(typeof(Function).GetMethod("CustomBinding"),
			new Dictionary<string, object>
			{
				{"timerInfo", new TimerInfo(new CronSchedule("*/15 * * * * *"), new ScheduleStatus())}
			});

		host.RunAndBlock();
	}
}

Some of those types won't be available to you out of the box. First, you have to implement a dummy TypeLocator:

using System;
using System.Collections.Generic;
using Microsoft.Azure.WebJobs;
internal class TypeLocator : ITypeLocator
{
	private readonly Type[] _types;

	public TypeLocator(params Type[] types)
	{
		_types = types;
	}

	public IReadOnlyList<Type> GetTypes()
	{
		return _types;
	}
}

It's used to choose which functions will be indexed by the JobHost instance, so you're required to add your function to it. The next thing is the UseDataLake() extension method, which enables the host to actually perform the binding:

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Microsoft.Azure.WebJobs.Host.Config;
public static class DataLakeHostConfigurationExtension
{
	public static void UseDataLake(this JobHostConfiguration config)
	{
		if (config == null)
		{
			throw new ArgumentNullException(nameof(config));
		}

		// Register our extension configuration provider
		config.RegisterExtensionConfigProvider(new DataLakeExtensionConfig());
	}

	private class DataLakeExtensionConfig : IExtensionConfigProvider
	{
		public void Initialize(ExtensionConfigContext context)
		{
			if (context == null)
			{
				throw new ArgumentNullException(nameof(context));
			}

			// Register our extension binding providers
			context.Config.RegisterBindingExtensions(new DataLakeAttributeBindingProvider());
		}
	}
}

Note that we're registering an extension for an attribute binding only. If we'd like to enable triggering a function based on a custom trigger, we'd have to register a trigger binding provider as well - a hypothetical sketch is shown below, where DataLakeTriggerBindingProvider is a made-up name for a class that would have to implement ITriggerBindingProvider (from Microsoft.Azure.WebJobs.Host.Triggers):
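private class DataLakeExtensionConfigWithTrigger : IExtensionConfigProvider
{
	public void Initialize(ExtensionConfigContext context)
	{
		// Non-trigger attribute bindings are registered as before
		context.Config.RegisterBindingExtensions(new DataLakeAttributeBindingProvider());

		// Hypothetical: a custom trigger goes through the extension registry;
		// DataLakeTriggerBindingProvider is a made-up class that would
		// implement ITriggerBindingProvider
		var extensions = context.Config.GetService<IExtensionRegistry>();
		extensions.RegisterExtension<ITriggerBindingProvider>(new DataLakeTriggerBindingProvider());
	}
}

The last thing is the actual function we'd like to trigger: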

using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Timers;
using WebJobs.DataLake;
public static class Function
{
	public static void CustomBinding([TimerTrigger("*/15 * * * * *")] TimerInfo timerInfo, [DataLake("clientId", "clientSecret")] DataLakeProvider dataLake)
	{
	} 
}

For this moment, DataLakeProvider can be just an empty class - the minimal placeholder below is enough. The important thing is that when you hit F5, you should see your function being triggered.
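public class DataLakeProvider
{
	// Intentionally empty for now - the actual Data Lake connection
	// logic will be implemented in the next post
}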

Summary

As you can see, writing an extension for WebJobs is pretty easy and gives you almost unlimited possibilities when it comes to adding custom functionality to your solution. In the next post I'll show you how to properly implement the Data Lake connection, extend our API so we can do something useful in the function, and how to actually use it all in a function. Stay tuned!

It's so easy - backup build and release definitions from VSTS using Azure Functions

When working with build and release definitions in VSTS, we're blessed with the ability to check audit logs - what was changed, when, and by whom. This - together with a proper permissions setup - allows proper access management and an easy rollback if something was misused. Unfortunately, VSTS lacks an easy way to export those definitions so we can back them up or version them in our repository. In this post I'll show you a quick way to schedule daily backups using Azure Functions.

Prerequisites

To perform the actions from this post you'll need Visual Studio 2017 15.3 with the Azure Functions SDK installed. Since those tools are no longer in preview, I no longer use CSX to create examples and proofs of concept. I strongly advise you to update to the latest VS version so you can get the most from the new SDK.

What's more, you'll also need a personal access token (PAT) for VSTS. If you don't have one yet, please read this article for an idea of how to get it.

Creating functions

To be able to schedule our backups, we'll need two functions. Both will be triggered by a timer and both will upload a blob to a Blob Storage container. Here's the infrastructure needed:

ARM template visualization created by ARMata

As you can see, this is the basic infrastructure needed to use Functions, and it can easily be set up in the Azure Portal. Once the required components are provisioned, we can prepare the code that will create the backups.

In VS, when you go to the Create project wizard, you'll see a window with the available templates. When you go to the Cloud tab, you should see the Azure Functions template ready to be created:

Once the project is created, right-click on it, go to the Add menu and select New item:

From the available items select Azure Function and click Add. You'll see plenty of different function templates, from which we have to choose Timer trigger. Change the schedule to 0 0 0 */1 * * so it will be triggered once a day and click OK.
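The scaffolded function should look roughly like this (the exact template output may differ slightly between SDK versions); the six-field NCRONTAB schedule reads {second} {minute} {hour} {day} {month} {day-of-week}:

public static class Function1
{
	[FunctionName("Function1")]
	public static void Run([TimerTrigger("0 0 0 */1 * *")]TimerInfo myTimer, TraceWriter log)
	{
		// Runs at midnight every day
		log.Info($"C# Timer trigger function executed at: {DateTime.Now}");
	}
}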

Creating a backup

To create a backup we'll once more use the VSTS REST API. Here are the endpoints we'll use (they also appear in the code below):

  • https://{instance}.visualstudio.com/DefaultCollection/{project}/_apis/build/definitions - build definitions
  • https://{instance}.vsrm.visualstudio.com/{project}/_apis/Release/definitions - release definitions

They return JSON definitions, which can be easily stored and versioned. The actual code for creating a build definition backup looks like this:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
// ReadAsAsync<T> comes from the Microsoft.AspNet.WebApi.Client package
using Newtonsoft.Json.Linq;
public static class BuildBackup
{
	private const string Personalaccesstoken = "PAT"; // replace with your PAT
	private const string Instance = "instance";       // placeholder - your VSTS account name
	private const string Project = "project";         // placeholder - your team project name

	[FunctionName("BackupBuild")]
	public static async Task Run([TimerTrigger("0 0 0 */1 * *")]TimerInfo myTimer, [Blob("devops/build.json", FileAccess.Write)] Stream output, TraceWriter log)
	{
		try
		{
			using (var client = new HttpClient())
			{
				client.DefaultRequestHeaders.Accept.Add(
					new MediaTypeWithQualityHeaderValue("application/json"));

				client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic",
					Convert.ToBase64String(
						System.Text.Encoding.ASCII.GetBytes(
							string.Format("{0}:{1}", "", Personalaccesstoken))));

				using (var response = await client.GetAsync(
					$"https://{Instance}.visualstudio.com/DefaultCollection/{Project}/_apis/build/definitions?api-version=2.0")
				)
				{
					response.EnsureSuccessStatusCode();
					var data = await response.Content.ReadAsAsync<JObject>();
					foreach (var pr in data.SelectToken("$.value"))
					{
						var id = pr.SelectToken("$.id");
						using (var release = await client.GetAsync(
							$"https://{Instance}.visualstudio.com/DefaultCollection/{Project}/_apis/build/definitions/{id}?api-version=2.0")
						)
						{
							release.EnsureSuccessStatusCode();
							var releaseData = await release.Content.ReadAsStringAsync();
							var bytes = Encoding.UTF8.GetBytes(releaseData);
							await output.WriteAsync(bytes, 0, bytes.Length);
						}
					}
				}
			}
		}
		catch (Exception ex)
		{
			log.Info(ex.ToString());
		}
	}
}

To create a backup of the release definitions you can use the following function:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Newtonsoft.Json.Linq;
public static class ReleaseBackup
{
	private const string Personalaccesstoken = "PAT"; // replace with your PAT
	private const string Instance = "instance";       // placeholder - your VSTS account name
	private const string Project = "project";         // placeholder - your team project name

	[FunctionName("BackupRelease")]
	public static async Task Run([TimerTrigger("0 0 0 */1 * *")]TimerInfo myTimer, [Blob("devops/release.json", FileAccess.Write)] Stream output, TraceWriter log)
	{
		try
		{
			using (var client = new HttpClient())
			{ 
				client.DefaultRequestHeaders.Accept.Add(
					new MediaTypeWithQualityHeaderValue("application/json"));

				client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic",
					Convert.ToBase64String(
						Encoding.ASCII.GetBytes(
							string.Format("{0}:{1}", "", Personalaccesstoken))));

				using (var response = await client.GetAsync(
					$"https://{Instance}.vsrm.visualstudio.com/{Project}/_apis/Release/definitions")
				)
				{
					response.EnsureSuccessStatusCode();
					var data = await response.Content.ReadAsAsync<JObject>();
					foreach (var pr in data.SelectToken("$.value"))
					{
						var id = pr.SelectToken("$.id");
						using (var release = await client.GetAsync(
							$"https://{Instance}.vsrm.visualstudio.com/{Project}/_apis/Release/definitions/{id}")
						)
						{
							release.EnsureSuccessStatusCode();
							var releaseData = await release.Content.ReadAsStringAsync();
							var bytes = Encoding.UTF8.GetBytes(releaseData);
							await output.WriteAsync(bytes, 0, bytes.Length);
						}
					}
				}
			}
		}
		catch (Exception ex)
		{
			log.Info(ex.ToString());
		}
	}
}

Some details:

  • I used a blob container named devops - of course you can use any name you like
  • Unfortunately, there's no way to combine those two functions (as long as you'd like to use different blobs for holding the build and release definitions)
  • You can easily version those JSON definitions by - instead of storing them in Blob Storage - calling the VSTS REST API for a repository and pushing them there (see the sketch below)
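
Since the last point may sound a bit abstract, here's a hedged sketch of what such a push could look like. The payload shape follows the VSTS Git pushes endpoint, but treat the details as assumptions to verify: repositoryId and tipCommitId (the current commit of the target branch, which you can read from the refs endpoint first) are values you'd have to obtain yourself, and JsonConvert comes from Newtonsoft.Json:

private static async Task PushBackupAsync(HttpClient client, string definitionsJson,
	string repositoryId, string tipCommitId)
{
	// Assumed payload for the Git "pushes" endpoint - a single commit
	// editing the backup file on master
	var payload = new
	{
		refUpdates = new[] { new { name = "refs/heads/master", oldObjectId = tipCommitId } },
		commits = new[]
		{
			new
			{
				comment = "Daily backup of build definitions",
				changes = new[]
				{
					new
					{
						changeType = "edit", // use "add" the first time the file is pushed
						item = new { path = "/backups/build.json" },
						newContent = new { content = definitionsJson, contentType = "rawtext" }
					}
				}
			}
		}
	};

	var response = await client.PostAsync(
		$"https://{Instance}.visualstudio.com/DefaultCollection/{Project}/_apis/git/repositories/{repositoryId}/pushes?api-version=2.0-preview",
		new StringContent(JsonConvert.SerializeObject(payload), Encoding.UTF8, "application/json"));
	response.EnsureSuccessStatusCode();
}

The method reuses the authenticated HttpClient from the functions above, so the same PAT applies - it will need code read & write permissions.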