Building your Big Data playground with Azure

Let's say you've been assigned a task that requires you to provision a whole new environment using technologies that aren't exactly "cool" when run on your dev machine. Take Hadoop - it's becoming more and more popular, yet it's still a black box for many people (including me). What if you'd like to play with it a little? Well, there are instructions out there for installing and running it on Windows. Trust me - it's doable... and that's about the only "good" part of the whole process.

Do it for me?

I don't like wasting my time and my computer's resources on temporary things I need only for a few hours. What I like is to make something else do the work for me. If you take a look at the Azure Marketplace, you'll see plenty of available software images, many of them featuring OSS software that can be installed and used without any additional licensing charges - you pay only for the underlying resources. Does that include Hadoop? Yes, it does. Let's grab it and install it.
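If you'd rather script the provisioning than click through the portal, a minimal sketch might look like the one below. It shells out to the Azure CLI (which you'd need installed and logged in); the resource group, VM name, and image URN are placeholders I made up - swap in the actual Hadoop image you pick from the Marketplace.

```python
import subprocess

# Hypothetical names - substitute your own resource group, VM name and
# the URN of the Hadoop image you picked from the Azure Marketplace
# (you can browse URNs with 'az vm image list').
RESOURCE_GROUP = "hadoop-playground"
VM_NAME = "hadoop-vm"
IMAGE_URN = "publisher:offer:sku:version"  # placeholder, not a real image

# Create a resource group to keep everything in one place.
subprocess.run(["az", "group", "create",
                "--name", RESOURCE_GROUP,
                "--location", "westeurope"], check=True)

# Provision the VM from the Marketplace image; --generate-ssh-keys
# creates a key pair if you don't already have one.
subprocess.run(["az", "vm", "create",
                "--resource-group", RESOURCE_GROUP,
                "--name", VM_NAME,
                "--image", IMAGE_URN,
                "--admin-username", "azureuser",
                "--generate-ssh-keys"], check=True)
```

Keeping everything in one resource group pays off later - cleanup becomes a single command.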

Do I have money for it?

If you have an Azure subscription, don't worry about installing this image. As I said, it charges you only for the resources you're actually using. When you're done with it, you can either delete the whole resource group with all the provisioned resources or just stop (deallocate) the VM running Hadoop - that saves you setup time the next time you need it, and the cost of a deallocated VM is negligible.
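Both cleanup options are one CLI call each - a short sketch, reusing the hypothetical names from above:

```python
import subprocess

RESOURCE_GROUP = "hadoop-playground"  # hypothetical names from the earlier sketch
VM_NAME = "hadoop-vm"

# Option 1: stop paying for compute but keep the setup for next time.
# A deallocated VM only incurs storage costs for its disks.
subprocess.run(["az", "vm", "deallocate",
                "--resource-group", RESOURCE_GROUP,
                "--name", VM_NAME], check=True)

# Option 2: tear everything down when you're done for good.
# subprocess.run(["az", "group", "delete",
#                 "--name", RESOURCE_GROUP, "--yes"], check=True)
```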

Got it! What's next?

The Linux VM instance that Hadoop is installed on is accessible through an SSH client and requires either the SSH key or the password you provided during provisioning (you can use whichever client you like - PuTTY, or even the terminal bundled with SourceTree). Once connected, you can run tasks and scripts designed for Hadoop.
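If you prefer to drive the VM from code rather than an interactive session, here's a minimal sketch using the paramiko library (my choice for illustration, not something the image requires) - the IP, username, and key path are placeholders:

```python
import os
import paramiko

# Placeholders - use the public IP from the VM's Overview tab and the
# credentials you chose while provisioning.
HOST = "13.95.0.1"  # hypothetical public IP
USER = "azureuser"
KEY = os.path.expanduser("~/.ssh/id_rsa")

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER, key_filename=KEY)  # or password="..."

# Run any Hadoop command just as you would in a terminal session.
stdin, stdout, stderr = client.exec_command("hadoop version")
print(stdout.read().decode())

client.close()
```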

Just to make things clear - in the Azure Portal, when you go to the Overview tab of the VM provisioned for Hadoop, you'll see its public IP address, which you can use to connect. What's more, you can use SFTP to upload files to the VM or download them from it. Point your SFTP client at your_VM_IP on port 22 and enter your credentials, and you'll land in the default directory of your VM. From this point everything is set - you have your very own Hadoop playground, which you can use whenever you want.
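The same paramiko connection covers the SFTP part too - another short sketch, again assuming the hypothetical host, credentials, and file paths:

```python
import os
import paramiko

HOST = "13.95.0.1"  # hypothetical public IP from the Overview tab
USER = "azureuser"
KEY = os.path.expanduser("~/.ssh/id_rsa")

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, port=22, username=USER, key_filename=KEY)

# Relative remote paths resolve against the VM's default (home) directory.
sftp = client.open_sftp()
sftp.put("input/data.csv", "data.csv")        # upload a local file to the VM
sftp.get("results/output.txt", "output.txt")  # download results back (hypothetical paths)
sftp.close()
client.close()
```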
