Azure Functions, WebJobs and Data Lake - writing a custom extension #2

In the previous post I showed you how to write a simple WebJobs extension, which we were able to execute using TimerTrigger. Beyond running, it doesn't provide much value, so today I'll introduce functionality that really works with a Data Lake instance and pushes our simple project even further!

Extending DataLakeProvider

Previously DataLakeProvider was only a dummy class with no real value. Today we'll make it the centre of our logic, enabling easy work with Data Lake and acting as a simple adapter to our storage. Let's focus on our binding signature:

public static async Task CustomBinding([TimerTrigger("*/15 * * * * *")] TimerInfo timerInfo,
            [DataLake("clientId", "clientSecret")]
            DataLakeProvider dataLake)

As you can see, we're passing two parameters - clientId and clientSecret - to the DataLakeProvider instance. You may ask what those values are and where we need them. Consider the following snippet:

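using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.Management.DataLake.Store;
using Microsoft.Azure.Management.DataLake.Store.Models;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest.Azure.Authentication;

public class DataLakeProvider : IDisposable
{
	private readonly DataLakeStoreFileSystemManagementClient _client;

	public DataLakeProvider(string clientId, string clientSecret)
	{
		// Wrap the AD application credentials.
		var clientCredential = new ClientCredential(clientId, clientSecret);
		// "domainId" is a placeholder for your AD tenant; note that .Result blocks the calling thread.
		var creds = ApplicationTokenProvider.LoginSilentAsync("domainId", clientCredential).Result;
		_client = new DataLakeStoreFileSystemManagementClient(creds);
	}

	// "datalakeaccount" below is a placeholder for the Data Lake Store account name.
	public Task CreateDirectory(string path)
	{
		return _client.FileSystem.MkdirsAsync("datalakeaccount", path);
	}

	public async Task AppendToFile(string destinationPath, string content)
	{
		// Autocreate makes the append create the file if it doesn't exist yet.
		using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(content)))
		{
			await _client.FileSystem.ConcurrentAppendAsync("datalakeaccount", destinationPath, stream, appendMode: AppendModeType.Autocreate);
		}
	}

	public void Dispose()
	{
		_client.Dispose();
	}
}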

This is all we need to be able to:

  • create a directory in Data Lake
  • perform a concurrent append to a chosen file

The logic behind working with files stored in Data Lake is pretty simple, so I won't focus on it for now. What does require some explanation is authentication. As you can see, I'm doing a couple of things:

  • I'm creating a ClientCredential instance, which is a wrapper for AD credentials (we'll go through this later)
  • Next, I log in silently to my AD to obtain an access token
  • With the token received, I can finally create a Data Lake client

This flow is required because all actions on Data Lake storage are authorized using permissions assigned to a specific user or group in Azure. Once we're done here, two things remain - fixing DataLakeAttributeBindingProvider so it passes the attribute parameters to DataLakeProvider, and extending our function so it performs some real tasks.
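
The three authentication steps listed above can also be written without blocking on .Result. What follows is only a sketch: DataLakeClientFactory and its CreateAsync method are hypothetical names that are not part of the extension, and the domainId parameter stands for the same tenant placeholder used in the constructor.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Management.DataLake.Store;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest.Azure.Authentication;

// Hypothetical helper, shown only to illustrate the flow.
public static class DataLakeClientFactory
{
    public static async Task<DataLakeStoreFileSystemManagementClient> CreateAsync(
        string domainId, string clientId, string clientSecret)
    {
        // 1. Wrap the AD application credentials.
        var clientCredential = new ClientCredential(clientId, clientSecret);

        // 2. Log in silently to obtain token-based credentials, awaiting instead of blocking.
        var creds = await ApplicationTokenProvider
            .LoginSilentAsync(domainId, clientCredential)
            .ConfigureAwait(false);

        // 3. Create the Data Lake client with those credentials.
        return new DataLakeStoreFileSystemManagementClient(creds);
    }
}
```

Using a factory like this would mean awaiting it wherever the provider is built, so treat it as an option to consider rather than a drop-in replacement for the constructor.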

Doing it for real!

We need to change one thing in DataLakeAttributeBindingProvider - previously we didn't need to pass anything to DataLakeProvider, so GetValueAsync() looked like this:

public Task<object> GetValueAsync()
{
	var value = new DataLakeProvider();

	return Task.FromResult<object>(value);
}

The only thing to do now is to use the right constructor:

public Task<object> GetValueAsync()
{
	var value = new DataLakeProvider(_resolvedAttribute.ClientId, _resolvedAttribute.ClientSecret);

	return Task.FromResult<object>(value);
}
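
For the call above to compile, the attribute has to expose both values. A minimal sketch, assuming the attribute simply stores its constructor arguments in ClientId and ClientSecret properties (the real attribute from the previous post may carry additional WebJobs-specific metadata that is omitted here):

```csharp
using System;

// Sketch only: the property names match what GetValueAsync() reads;
// everything else about this class is an assumption.
[AttributeUsage(AttributeTargets.Parameter)]
public sealed class DataLakeAttribute : Attribute
{
    public DataLakeAttribute(string clientId, string clientSecret)
    {
        ClientId = clientId;
        ClientSecret = clientSecret;
    }

    // Application (client) ID of the AD application.
    public string ClientId { get; }

    // Secret (key) generated for that application.
    public string ClientSecret { get; }
}
```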

Let's also extend our function and try to create a directory and append something to a file:

public static async Task CustomBinding([TimerTrigger("*/15 * * * * *")] TimerInfo timerInfo,
            [DataLake("clientId", "clientSecret")]
            DataLakeProvider dataLake)
{
	using (dataLake)
	{
		var path = Path.Combine("This", "Is", "Just", "A", "Test");
		await dataLake.CreateDirectory(path);
		await dataLake.AppendToFile(Path.Combine(path, "foo"), "THIS IS JUST A TEST");
	}
}
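
One thing to keep in mind: Path.Combine uses the directory separator of the machine the function runs on, which is a backslash on Windows, while Data Lake paths are written with forward slashes. I haven't checked how the SDK treats backslashes, so if you'd rather not depend on that, a small helper can build the path explicitly. DataLakePath below is a hypothetical name, not something from the SDK or from this extension:

```csharp
using System;
using System.Linq;

// Hypothetical helper: joins path segments with '/' regardless of the host OS.
public static class DataLakePath
{
    public static string Combine(params string[] segments)
    {
        if (segments == null) throw new ArgumentNullException(nameof(segments));

        // Skip null or empty segments so no doubled separators appear.
        return string.Join("/", segments.Where(s => !string.IsNullOrEmpty(s)));
    }
}
```

With it, DataLakePath.Combine("This", "Is", "Just", "A", "Test") gives "This/Is/Just/A/Test" on every platform, and it could be used in place of both Path.Combine calls in the function.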

Result

When you run the function, you should see a result similar to mine:

In the final post about this topic I'll show you how to integrate this extension with a Function App and describe how to obtain clientId and clientSecret - for those who are not familiar with Azure Active Directory :)