Managing your git repository via REST API in VSTS

Let's say you'd like to store some results of a build or a release inside a repository. The reason could be anything - easy access, versioning, or the repository being the only tool you have access to. By default VSTS doesn't provide any git-related steps that could help in such a case. Fortunately, once more its REST API comes to the rescue, letting us fully manage repositories, including pushing multiple commits.

Before we start, take a look at the overview to understand the general capabilities of this API.

Making an initial commit

Once you have your git repository created in VSTS, you can either initialize it with a commit from the GUI or push an initial commit using the API. Take a look at the example from the documentation:

```
POST /_apis/git/repositories/{repository}/pushes?api-version={version}
```

and its body:

```json
{
  "refUpdates": [
    {
      "name": "refs/heads/master",
      "oldObjectId": "0000000000000000000000000000000000000000"
    }
  ],
  "commits": [
    {
      "comment": "Initial commit.",
      "changes": [
        {
          "changeType": "add",
          "item": {
            "path": "/readme.md"
          },
          "newContent": {
            "content": "My first file!",
            "contentType": "rawtext"
          }
        }
      ]
    }
  ]
}
```

you'll see that the properties building this JSON are mostly self-descriptive. The only one that is not so obvious is oldObjectId. It is the SHA-1 of the commit this commit is based on - since there are no commits yet, it's simply a string full of zeros.
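If you'd rather build this body in code than by hand, a minimal sketch could look like the following. The helper name and the way the request is sent are my own assumptions, not part of the API; the account, repository and api-version placeholders are left for you to fill in:

```python
import json

# All-zeros SHA-1, meaning "no parent commit yet" (brand-new branch).
NO_PARENT_ID = "0" * 40

def build_initial_push(path, content, branch="refs/heads/master"):
    """Build the push body from the example above as a Python dict."""
    return {
        "refUpdates": [{"name": branch, "oldObjectId": NO_PARENT_ID}],
        "commits": [{
            "comment": "Initial commit.",
            "changes": [{
                "changeType": "add",
                "item": {"path": path},
                "newContent": {"content": content, "contentType": "rawtext"},
            }],
        }],
    }

body = json.dumps(build_initial_push("/readme.md", "My first file!"))

# POSTing it (PAT-based auth assumed; fill in your instance, repository
# and api-version before uncommenting):
# import urllib.request
# req = urllib.request.Request(
#     "https://{instance}/DefaultCollection/_apis/git/repositories/"
#     "{repository}/pushes?api-version={version}",
#     data=body.encode(), method="POST",
#     headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```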

Pushing data

Making an initial commit is a piece of cake. What if we'd like to update an existing file? The main issue here is finding the oldObjectId, which is required to make the request successful. Once more the API comes in handy - what we can do is fetch a list of all pushes and take the latest one. Take a look at the signature from the documentation:

```
GET https://{instance}/DefaultCollection/_apis/git/repositories/{repository}/pushes?api-version={version}[&fromDate={dateTime}&toDate={dateTime}&pusherId={guid}&$skip={integer}&$top={integer}]
```

What is great about this request is the possibility to filter the data - we don't have to download all pushes, only those from a given date interval, made by a specific pusher, or maybe only the top N. The response gives us a list of pushes ordered from the newest to the oldest. What is important here is to pass the includeRefUpdates=true parameter in the query string. This way we'll get the following additional property in the response:
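Putting the two pieces together, a small sketch could build the filtered URL and pick the latest commit id out of the response. The function names are mine, and I'm assuming the usual VSTS list envelope with a value array of pushes ordered newest-first:

```python
from urllib.parse import urlencode

def pushes_url(instance, repository, version, top=1):
    """Query only the newest push, asking for ref updates in the response."""
    query = urlencode({
        "api-version": version,
        "$top": top,                  # just the latest push
        "includeRefUpdates": "true",  # adds the refUpdates property
    })
    return (f"https://{instance}/DefaultCollection/_apis/git/"
            f"repositories/{repository}/pushes?{query}")

def latest_object_id(pushes_response, branch="refs/heads/master"):
    """Pull newObjectId for the given branch out of a pushes response."""
    for push in pushes_response.get("value", []):
        for ref in push.get("refUpdates", []):
            if ref.get("name") == branch:
                return ref["newObjectId"]
    return None

# Sample shaped like the response fragment shown below.
sample_response = {"value": [{"refUpdates": [{
    "repositoryId": "04baf35b-faec-4619-9e42-ce2d0ccafa4c",
    "name": "refs/heads/master",
    "oldObjectId": "0" * 40,
    "newObjectId": "5e108508e2151f5513fffaf47f3377eb6e571b20"}]}]}
```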

```json
{
  "repositoryId": "04baf35b-faec-4619-9e42-ce2d0ccafa4c",
  "name": "refs/heads/master",
  "oldObjectId": "0000000000000000000000000000000000000000",
  "newObjectId": "5e108508e2151f5513fffaf47f3377eb6e571b20"
}
```

and we're able to refer to the newObjectId property to make an update. Once we have it, we can once more use the endpoint that created the initial commit, with a slightly modified body:

```json
{
  "refUpdates": [
    {
      "name": "refs/heads/master",
      "oldObjectId": "5e108508e2151f5513fffaf47f3377eb6e571b20"
    }
  ],
  "commits": [
    {
      "comment": "Added a few more items to the task list.",
      "changes": [
        {
          "changeType": "edit",
          "item": {
            "path": "/readme.md"
          },
          "newContent": {
            "content": "Modified readme file!",
            "contentType": "rawtext"
          }
        }
      ]
    }
  ]
}
```

Once we post this request, a new commit should be pushed and visible when you access the repository.
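The update body can be generated the same way as the initial one; only the changeType and oldObjectId differ. A sketch (again with a helper name of my own choosing):

```python
def build_edit_push(old_object_id, path, content,
                    comment="Update file.", branch="refs/heads/master"):
    """Same shape as the initial push, but changeType is 'edit' and
    oldObjectId is the tip commit fetched from the pushes endpoint."""
    return {
        "refUpdates": [{"name": branch, "oldObjectId": old_object_id}],
        "commits": [{
            "comment": comment,
            "changes": [{
                "changeType": "edit",
                "item": {"path": path},
                "newContent": {"content": content, "contentType": "rawtext"},
            }],
        }],
    }

body = build_edit_push("5e108508e2151f5513fffaf47f3377eb6e571b20",
                       "/readme.md", "Modified readme file!",
                       comment="Added a few more items to the task list.")
```

Chaining the two calls - fetch the latest push, then reuse its newObjectId as the next oldObjectId - lets you push any number of consecutive commits from a build or release.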

Building your Big Data playground with Azure

Let's say you were assigned a task that requires you to provision a whole new environment using technologies that are not "cool" when run on your dev machine. Take Hadoop - it becomes more and more popular, yet it's still a black box for many people (including me). What if you'd like to play with it a little? Well, there are instructions for what you have to do to install and run it on Windows. Trust me - it's doable... and that's the only "good" part of the whole process.

Do it for me?

I don't like wasting my time and my computer's resources on temporary things that I need only for a few hours. What I like is to have something do it for me. If you take a look at the Azure Marketplace, you'll see plenty of available software images, many of them including OSS software that can be installed and used without any additional charges. Does that include Hadoop? Yes it does. Let's grab it and install it.

Do I have money for it?

If you have an Azure subscription, don't worry about installing this image. As I said - you're charged only for the resources you're using. When you're done with it, you can either delete the whole resource group with the provisioned resources or stop the VM used by Hadoop - keeping it saves you time when it's needed next time, and in that case the cost is negligible.

Got it! What's next?

The Linux VM instance used to install Hadoop is accessible through an SSH client and requires either an SSH key or the password you provided to connect to it (you can use whichever client you like, such as PuTTY or even the terminal from SourceTree). Once connected, you can run tasks and scripts designed for Hadoop.

Just to make things clear - in the Azure Portal, when you go to the Overview tab of the VM provisioned for Hadoop, you'll see the public IP address that can be used to connect to it. What is more, you can use SFTP to upload files to the VM or download them. Go to your FTP client, use your_VM_IP:22 as the host and enter your credentials. You'll see the default directory of your VM. From this point everything is set - you have your very own Hadoop playground, which you can use whenever you want.