Azure Function and custom extension for Data Lake - it really works!

It seems that this particular topic (extending Azure Functions with custom bindings) has gone down very well (especially taking into account the recent news regarding Functions, which you can find here). In this final post about the Data Lake extension I'll show you two things - how to register your app in AD so you can easily obtain a clientId and clientSecret for authentication, and how to actually run the Data Lake bindings inside the Azure Functions runtime!

Registering an application

This is a fairly easy step. You have to:

  • go to Azure Portal
  • find Azure Active Directory blade and then go to App Registrations
  • next click on New application registration - you'll see a form, which you can fill in like this:

  • once the application is created, go to the Keys blade and create a new key - its value is your clientSecret
  • clientId in our extension is the value of the Application ID field from the overview screen
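If you prefer the command line, the same registration can be sketched with the Azure CLI (a hedged alternative to the portal steps above - it assumes you're already logged in via az login, and the display name is just an example):

```shell
# Create an AD application plus service principal with a password.
# The JSON output contains appId (your clientId), password (your
# clientSecret) and tenant (your AD domain/tenant id).
az ad sp create-for-rbac --name "datalake-webjobs-extension"
```

The values printed here map directly onto the constructor parameters of the extension.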

Making it work with Functions runtime

So far we've prepared our extension to work with the WebJobs runtime, which is a bit different from the one used by Functions. What we don't control when working with Functions is how extensions are registered inside the host. If you reference the current assembly and run your function, you'll probably see a screen similar to this one:

This clearly shows that there's a problem discovering our custom binding. Fortunately, it's easy to fix - as long as you know where the problem occurs.

Consider the following code:

public class DataLakeExtensionConfigProvider : IExtensionConfigProvider
{
	public void Initialize(ExtensionConfigContext context)
	{
		var rule = context.AddBindingRule<DataLakeAttribute>();
		rule.BindToInput(attribute => new DataLakeProvider(attribute.ClientId, attribute.ClientSecret));
	}
}

This is an extension config provider, which is needed to make the binding discoverable inside the Functions runtime. You may ask what we're doing here. Well, it's pretty straightforward - we're creating a new binding rule, which allows the [DataLake] attribute to be bound to the DataLakeProvider class responsible for performing all actions. In fact, this provider is similar in purpose to the DataLakeAttributeBindingProvider needed for WebJobs.

Additionally, we have to indicate that our attribute is actually a binding. This requires decorating it with the [Binding] attribute:

[AttributeUsage(AttributeTargets.Parameter)]
[Binding]
public sealed class DataLakeAttribute : Attribute
{
	public DataLakeAttribute(string clientId, string clientSecret)
	{
		ClientId = clientId;
		ClientSecret = clientSecret;
	}

	public string ClientId { get; private set; }

	public string ClientSecret { get; private set; }
}

If this attribute is missing, make sure you're referencing the Microsoft.Azure.WebJobs assembly in version 2.1.0-beta1 or later.
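For reference, in an SDK-style VS2017 project the dependency could look like the snippet below (a sketch assuming NuGet PackageReference format; the version is the minimum mentioned above):

```xml
<ItemGroup>
  <!-- [Binding] lives in Microsoft.Azure.WebJobs starting with 2.1.0-beta1 -->
  <PackageReference Include="Microsoft.Azure.WebJobs" Version="2.1.0-beta1" />
</ItemGroup>
```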

Once we have this code in place, we can go a step further - let's create a function that will really test our extension!

Running a custom extension

To perform the actions from this step you'll need VS2017 with the Azure Functions SDK installed. If you're ready, just add a new function with a timer trigger:

public static class DataLakeExample
{
	[FunctionName("DataLakeExample")]
	public static void Run([TimerTrigger("0 */5 * * * *")]TimerInfo myTimer, TraceWriter log)
	{
		log.Info($"C# Timer trigger function executed at: {DateTime.Now}");
	}
}

You'll also need a reference to the assembly containing the custom binding. Now you can extend your function with the logic from the previous examples:

public static class DataLakeExample
{
	[FunctionName("DataLakeExample")]
	public static async Task Run([TimerTrigger("*/15 * * * * *")] TimerInfo myTimer,
		[DataLake("clientId", "clientSecret")]
		DataLakeProvider dataLake, TraceWriter log)
	{
		log.Info($"C# Timer trigger function executed at: {DateTime.Now}");

		using (dataLake)
		{
			var path = Path.Combine("This", "Is", "Just", "A", "Test2");
			await dataLake.CreateDirectory(path);
			await dataLake.AppendToFile(Path.Combine(path, "foo"), "THIS IS JUST A TEST");
		}
	}
}

Let's run our function:

What is even more important, the data is available inside the Data Lake storage:
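If you want to verify this without opening the portal, you could list the created files with the Azure CLI (a hedged sketch - the account name and path are the placeholder values used in the example, so substitute your own):

```shell
# List the directory created by the function in a Data Lake Store account
az dls fs list --account datalakeaccount --path /This/Is/Just/A/Test2
```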

Azure Functions, WebJobs and Data Lake - writing a custom extension #2

In the previous post I showed you how to write a simple WebJobs extension, which we were able to execute using a TimerTrigger. Beyond just running, it didn't provide much value - this is why today I'll introduce functionality that will really work with a Data Lake instance and help us push our simple project even further!

Extending DataLakeProvider

Previously DataLakeProvider was only a dummy class, which didn't have any real value. Today we'll make it the centre of our logic, enabling easy work with Data Lake and acting as a simple adapter for our storage. Let's focus on our binding signature:

public static async Task CustomBinding([TimerTrigger("*/15 * * * * *")] TimerInfo timerInfo,
            [DataLake("clientId", "clientSecret")]
            DataLakeProvider dataLake)

As you can see, we're passing two parameters - clientId and clientSecret - to the DataLakeProvider instance. You may ask what those values are and where we need them. Well, consider the following snippet:

public class DataLakeProvider : IDisposable
{
	private readonly DataLakeStoreFileSystemManagementClient _client;

	public DataLakeProvider(string clientId, string clientSecret)
	{
		// Authenticate against Azure AD with the application's credentials.
		// Note: blocking on .Result in a constructor is a simplification.
		var clientCredential = new ClientCredential(clientId, clientSecret);
		var creds = ApplicationTokenProvider.LoginSilentAsync("domainId", clientCredential).Result;
		_client = new DataLakeStoreFileSystemManagementClient(creds);
	}

	// Creates a directory (including any missing parents) in the store
	public Task CreateDirectory(string path)
	{
		return _client.FileSystem.MkdirsAsync("datalakeaccount", path);
	}

	// Appends content to a file, creating it if it doesn't exist yet
	public async Task AppendToFile(string destinationPath, string content)
	{
		using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(content)))
		{
			await _client.FileSystem.ConcurrentAppendAsync("datalakeaccount", destinationPath, stream, appendMode: AppendModeType.Autocreate);
		}
	}

	public void Dispose()
	{
		_client.Dispose();
	}
}

This is all we need to be able to:

  • create a directory in Data Lake
  • perform a concurrent append to a chosen file

The logic behind working on files stored in Data Lake is pretty simple, so I won't focus on it for now. What surely requires some explanation is the authentication. As you can see, I'm doing a couple of things:

  • I'm creating a ClientCredential instance, which is a wrapper for AD credentials (we'll go through this later)
  • next I log in silently to my AD to obtain an access token
  • with the token received I can finally create a Data Lake client

This flow is required since all actions on Data Lake storage are authorized using permissions assigned to a specific user or group in Azure. Once we're done here, two more things remain - fixing DataLakeAttributeBindingProvider so it passes the attribute parameters to DataLakeProvider, and extending our function so it performs some real tasks.

Doing it for real!

We need to change one thing in DataLakeAttributeBindingProvider - previously we didn't need to pass anything to DataLakeProvider, so GetValueAsync() looked like this:

public Task<object> GetValueAsync()
{
	var value = new DataLakeProvider();

	return Task.FromResult<object>(value);
}

The only thing to do now is to use the right constructor:

public Task<object> GetValueAsync()
{
	var value = new DataLakeProvider(_resolvedAttribute.ClientId, _resolvedAttribute.ClientSecret);

	return Task.FromResult<object>(value);
}

Let's also extend our function and try to create a directory and append something to a file:

public static async Task CustomBinding([TimerTrigger("*/15 * * * * *")] TimerInfo timerInfo,
            [DataLake("clientId", "clientSecret")]
            DataLakeProvider dataLake)
{
	using (dataLake)
	{
		var path = Path.Combine("This", "Is", "Just", "A", "Test");
		await dataLake.CreateDirectory(path);
		await dataLake.AppendToFile(Path.Combine(path, "foo"), "THIS IS JUST A TEST");
	}
}

Result

When you run the function, you should see a result similar to mine:

In the final post about this topic I'll show you how to integrate this extension with a Function App and describe how to obtain a clientId and clientSecret - for those who are not familiar with Azure Active Directory :)