Building your Big Data playground with Azure

Let's say you've been assigned a task that requires provisioning a whole new environment with technologies that are no fun to run on your dev machine. Take Hadoop: it's becoming more and more popular, yet it's still a black box for many people (me included). What if you'd like to play with it a little? Well, here are the instructions for installing and running it on Windows. Trust me - it's doable... and that's the only "good" part of the whole process.

Do it for me?

I don't like wasting my time and my computer's resources on temporary things that I need only for a few hours. What I like is having something do it for me. If you take a look at the Azure Marketplace, you'll see plenty of available software images, many of them containing OSS software. They can be installed and used without any additional charges. Does that include Hadoop? Yes, it does. Let's grab it and install it.

Do I have money for it?

If you have an Azure subscription, don't worry about installing this image. As I said, you're charged only for the resources you use. Once you're done with it, you can either delete the whole resource group together with the provisioned resources or stop the VM used by Hadoop. The latter saves you time when you need the environment again, and the cost in that case is negligible.

Got it! What's next?

The Linux VM that Hadoop is installed on is accessible through an SSH client. To connect, you need either an SSH key or the password you provided (you can use whichever client you like, such as PuTTY or even the terminal from SourceTree). Once connected, you can run tasks and scripts designed for Hadoop.

Just to make things clear - in the Azure Portal, when you go to the Overview tab of the VM provisioned for Hadoop, you'll see its public IP address, which you can use to connect. What's more, you can use SFTP to upload files to the VM or download them from it. Open an FTP client that supports SFTP, use your_VM_IP:22 as the host and enter your credentials. You'll see the default directory of your VM. From this point everything is set - you have your very own Hadoop playground, which you can use whenever you want.
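If the SSH or SFTP client refuses to connect, it can help to check first whether port 22 on the VM is reachable at all. Here is a minimal C# sketch of such a check, using only the standard library; the class name SshPortCheck, the method name and the 5-second timeout are made up for illustration, and a successful check says nothing about whether your key or password will be accepted.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class SshPortCheck
{
    // Returns true if a TCP connection to host:port can be opened within the timeout.
    public static async Task<bool> IsPortOpenAsync(string host, int port, TimeSpan timeout)
    {
        using var client = new TcpClient();
        try
        {
            Task connect = client.ConnectAsync(host, port);
            Task finished = await Task.WhenAny(connect, Task.Delay(timeout));
            if (finished != connect)
            {
                return false; // timed out
            }
            await connect; // rethrows a connection error, if there was one
            return client.Connected;
        }
        catch (SocketException)
        {
            return false; // refused, unreachable or unknown host
        }
    }

    public static async Task Main(string[] args)
    {
        // Pass the public IP address from the Overview tab as the first argument.
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: SshPortCheck <vm-public-ip>");
            return;
        }

        bool open = await IsPortOpenAsync(args[0], 22, TimeSpan.FromSeconds(5));
        Console.WriteLine(open ? "Port 22 is reachable" : "Port 22 is not reachable");
    }
}
```

If the port is not reachable, the VM is most likely stopped or the connection is blocked somewhere along the way.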

Building feature branches on VSTS automatically

Currently VSTS lacks (see the edit below) a really handy feature: triggering a build based on a branch wildcard. For instance, I'd like to run my build definition only for refs/heads/feature/* branches. While that's super easy in e.g. TeamCity, with VSTS it requires a smart workaround.

Service hooks

VSTS provides a wide collection of service hooks. They can be used to command builds, analyze logs or trigger any kind of event. In fact, the list of available services is already impressive and keeps growing - I strongly recommend taking a look when you have a minute.

Besides integrating with third-party services like Slack, Bamboo or Trello, VSTS lets us use a simple Web Hook, which sends a specific event to a specific endpoint. Because we can send any event to any endpoint, the sky is the limit - you can create a "lambda" using an Azure Function (like me), a simple Web API or an old but still reliable MVC application, using the technology stack of your choice. Whatever solution you create, it will work as long as it can process HTTP requests.

Kick it back!

Receiving an event from VSTS is cool, but it only gives us information. What we really need is to tell VSTS to schedule a build using a specific build definition on a specific branch. Once again, the REST API provided by VSTS comes to the rescue:

POST https://{instance}/DefaultCollection/{project}/_apis/build/builds?api-version={version}

The important part of this request is its body:

{
  "definition": {
    "id": 25
  },
  "sourceBranch": "refs/heads/master",
  "parameters": "{\"system.debug\":\"true\",\"BuildConfiguration\":\"debug\",\"BuildPlatform\":\"x64\"}"
}

Two things are required to make it work: the build definition ID and the source branch. You can read the former from the URL when you open the definition you've chosen, while the latter is passed along with the incoming request in the data.resource.refUpdates[0].name field.
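To make these two values concrete, here is a minimal C# sketch that reads the pushed branch from a heavily trimmed git.push payload and builds the request body from it. It uses System.Text.Json rather than the dynamic access used in this post's function; the class name QueueBuildBody, the method names and the sample payload are made up for illustration, and a real payload contains many more fields.

```csharp
using System;
using System.Text.Json;

public static class QueueBuildBody
{
    // Reads resource.refUpdates[0].name from a git.push service hook payload.
    public static string GetPushedBranch(string payloadJson)
    {
        using JsonDocument doc = JsonDocument.Parse(payloadJson);
        return doc.RootElement
            .GetProperty("resource")
            .GetProperty("refUpdates")[0]
            .GetProperty("name")
            .GetString();
    }

    // Serializes the two required values; the serializer takes care of escaping.
    public static string Build(int definitionId, string sourceBranch)
    {
        var body = new
        {
            definition = new { id = definitionId },
            sourceBranch
        };
        return JsonSerializer.Serialize(body);
    }

    public static void Main()
    {
        // Hypothetical payload, trimmed down to the fields used here.
        const string payload =
            "{\"eventType\":\"git.push\",\"resource\":{\"refUpdates\":[{\"name\":\"refs/heads/feature/login\"}]}}";

        string branch = GetPushedBranch(payload);
        Console.WriteLine(Build(25, branch));
    }
}
```

Letting a serializer produce the body avoids the brace and quote escaping that a hand-written JSON string needs.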

Code example

An example of an Azure Function that can handle this:

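using System.Net;
using System.Net.Http.Headers;
using System.Text;

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    log.Info("C# HTTP trigger function processed a request.");

    // ID of the build definition to queue (25 is the example value from the request body above)
    const int definitionId = 25;

    // Get request body
    dynamic data = await req.Content.ReadAsAsync<object>();

    // React only to pushes whose first updated ref is a feature branch
    if (data.eventType == "git.push" && data.resource.refUpdates[0].name.Value.Contains("refs/heads/feature"))
    {
        using (var client = new HttpClient())
        {
            string branch = data.resource.refUpdates[0].name.Value;
            log.Info($"Build will be scheduled for {branch} branch.");

            // "yourToken" is a placeholder; for a personal access token it is the Base64 encoding of ":" + token
            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", "yourToken");
            var response = await client.PostAsync("https://{yourInstance}{projectId}/_apis/build/builds?api-version=2.0",
                new StringContent($"{{\"definition\": {{\"id\": {definitionId}}},\"sourceBranch\": \"{branch}\"}}", Encoding.UTF8, "application/json"));
            response.EnsureSuccessStatusCode(); // throws if VSTS rejected the request

            log.Info("Build scheduled successfully!");
            return req.CreateResponse(HttpStatusCode.OK, "Build scheduled");
        }
    }

    // Any other event or branch is answered with an error
    return req.CreateResponse(HttpStatusCode.BadRequest, "Error occurred");
}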
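One caveat about the Contains check in the function: it also accepts names such as refs/heads/feature-flags, which is looser than the refs/heads/feature/* wildcard this post is after. A stricter filter could look like the sketch below. It is only one possible approach; the class name BranchFilter and the method name Matches are made up for illustration, and here "*" matches any run of characters, including "/".

```csharp
using System;
using System.Text.RegularExpressions;

public static class BranchFilter
{
    // Returns true when refName matches pattern, where '*' stands for any run of characters.
    public static bool Matches(string pattern, string refName)
    {
        // Escape the whole pattern, then turn each escaped '*' back into a regex wildcard.
        string regex = "^" + Regex.Escape(pattern).Replace("\\*", ".*") + "$";
        return Regex.IsMatch(refName, regex);
    }

    public static void Main()
    {
        const string pattern = "refs/heads/feature/*";

        Console.WriteLine(Matches(pattern, "refs/heads/feature/login")); // True
        Console.WriteLine(Matches(pattern, "refs/heads/feature-flags")); // False
        Console.WriteLine(Matches(pattern, "refs/heads/master"));        // False
    }
}
```

In the function, BranchFilter.Matches("refs/heads/feature/*", branch) could then take the place of the Contains call.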

Combining it all

To summarize the steps needed here:

  1. Go to the Service Hooks tab in VSTS and create a new Web Hook pointing to your service endpoint, using Code pushed as the event
  2. Create a service that handles the requests sent by VSTS and calls its API to schedule a build
  3. Enjoy builds that run only for the filtered branches


Edit: after this was pointed out, I checked the recently updated docs, and it seems wildcards are now officially supported! You can treat this post as an inspiration for building new features around VSTS. You can find the new documentation here ->