Creating a Hadoop cluster on Azure is straightforward, but it involves several steps. In this blog entry we provide a skeleton PowerShell script to get you started.
Requirements
To run this script, you need the Azure PowerShell SDK installed on your machine. To check whether it is already installed, run the command:
PS C:\> Get-Module -ListAvailable Azure
    Directory: C:\Program Files (x86)\Microsoft SDKs\Azure\PowerShell\ServiceManagement

ModuleType Version    Name     ExportedCommands
---------- -------    ----     ----------------
Manifest   1.0.1      Azure    {Disable-AzureServiceProjectRemoteDesktop, Enable-AzureSer...

PS C:\>
If the above command returns no data, you need to install the SDK before proceeding. You can install it in one of two ways: via PowerShell Gallery or via Web PI. Follow this link for more details on how to install the required components.
Starting the provisioning
Now that you have Azure PowerShell SDK installed, all you need to do is tweak the parameters below and then execute the script that follows.
Parameters
To make it easier to manipulate the script, we will use a set of variables to adjust how we want to create our cluster. These are as follows:
# This is the prefix that we use to make our cluster elements unique
# Change this to suit your taste
$myPrefix = "clounce"
$clusterName = $myPrefix + "cluster"
# Adjust the number of nodes that you want
$clusterNodes = 2
$clusterVersion = "3.3"
# Change the names below if you need to use pre-existing resources
$resourceGroupName = $myPrefix + "rg"
$storageAccountName = $myPrefix + "sa"
$storageContainerName= $myPrefix + "cnt"
# Change this to your nearest Azure data centre
$location = "West Europe"
# Change the subscription name to yours. You can read this by running the command
# Get-AzureSubscription. You may be asked for your Azure credentials.
$subscriptionName = "Azure Pass"
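Since the resource names are built from the prefix, it can help to check them before provisioning. The following is a minimal sketch, not part of the original script: the function name Test-StorageNames is hypothetical, and it assumes the usual Azure naming rules (storage account names of 3 to 24 lowercase letters and digits; container names of 3 to 63 lowercase letters, digits and single hyphens).

```powershell
# Hypothetical helper (not part of the original script): checks the storage
# account and container names against the assumed Azure naming rules.
function Test-StorageNames {
    param(
        [string]$AccountName,
        [string]$ContainerName
    )

    # Storage account: 3-24 characters, lowercase letters and digits only.
    # -cmatch is the case-sensitive variant of -match.
    $accountOk = $AccountName -cmatch '^[a-z0-9]{3,24}$'

    # Container: lowercase letters, digits and single hyphens, starting and
    # ending with a letter or digit, 3-63 characters in total.
    $containerOk = ($ContainerName -cmatch '^[a-z0-9](-?[a-z0-9])+$') -and
                   ($ContainerName.Length -ge 3) -and
                   ($ContainerName.Length -le 63)

    return ($accountOk -and $containerOk)
}

# Example with the names the default "clounce" prefix produces
Test-StorageNames -AccountName "clouncesa" -ContainerName "clouncecnt"
```

If the function returns False, adjust $myPrefix (for example, remove uppercase letters or hyphens) before running the rest of the script.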
Log in to your subscription
Before we can start running Azure PowerShell commands, we need to authenticate and select the subscription that we want to use.
Login-AzureRmAccount
Select-AzureRmSubscription -SubscriptionName $subscriptionName
Setting up resources
The next step is to set up the resources needed to run a Hadoop cluster. The code that follows checks whether each resource already exists and, if so, reuses it instead of creating a new one.
# Create resource group
# The -Force parameter suppresses any warning and uses the old ResourceGroup if it already exists
New-AzureRmResourceGroup -name $resourceGroupName -Location $location -Force
# Create storage account
if (!(Test-AzureName -Storage $storageAccountName))
{
New-AzureRmStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName -Location $location -Type Standard_RAGRS
}
# Get storage key
$storageAccountKey = Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -Name $storageAccountName | %{ $_.Key1 }
# Create a storage context object
$storageContext = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey
# Create a Blob storage container
$blobContainer = Get-AzureStorageContainer -Name $storageContainerName -Context $storageContext -Verbose:$false -ErrorAction SilentlyContinue
if($blobContainer -eq $null)
{
New-AzureStorageContainer -Name $storageContainerName -Context $storageContext
}
Create the Cluster
Now that the required resources are in place, we are ready to provision our cluster. Note that provisioning takes some time to complete. Please be patient!
# Get user credentials to use when provisioning the cluster.
Write-Verbose "Prompt user for ssh credentials to set during provisioning."
$credentials = Get-Credential
Write-Verbose "Use these credentials to login to the cluster via ssh when the script is complete."
# Create a new HDInsight cluster
New-AzureRmHDInsightCluster -ResourceGroupName $resourceGroupName `
-ClusterName $clusterName `
-Location $location `
-DefaultStorageAccountName "$storageAccountName.blob.core.windows.net" `
-DefaultStorageAccountKey $storageAccountKey `
-DefaultStorageContainer $storageContainerName `
-ClusterType Hadoop `
-OSType Linux `
-Version $clusterVersion `
-ClusterSizeInNodes $clusterNodes `
-SshCredential $credentials
Viewing Cluster Details
Once the cluster is provisioned, you can get its details by running the Get-AzureRmHDInsightCluster cmdlet as follows:
PS> Get-AzureRmHDInsightCluster -ClusterName $clusterName
Connecting to your Cluster
You have now created your cluster and can connect to it using ssh or a terminal client such as PuTTY. You can obtain the address from the Azure Portal (look for Secure Connect under the cluster configuration) or derive it using the following pattern.
First view your cluster information:
PS C:\Windows\System32> Get-AzureRmHDInsightCluster -ClusterName $clusterName
Location            : West Europe
ClusterVersion      : 3.3.1000.0
OperatingSystemType : Linux
ClusterState        : Running
ClusterType         : Hadoop
CoresUsed           : 16
HttpEndpoint        : myclustername.azurehdinsight.net
Next, take the HttpEndpoint, and re-format it as follows:
myclustername-ssh.azurehdinsight.net
Finally, connect with ssh to the host name above.
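The re-formatting step above can also be done in PowerShell rather than by hand. This is a small sketch that assumes the endpoint follows the pattern shown above; the variable names $httpEndpoint and $sshHost are illustrative and do not appear in the original script.

```powershell
# Example value; in practice, copy the HttpEndpoint from the cluster details
$httpEndpoint = "myclustername.azurehdinsight.net"

# Append "-ssh" to the first label of the host name (everything before the first dot)
$sshHost = $httpEndpoint -replace '^([^.]+)\.', '${1}-ssh.'

$sshHost   # myclustername-ssh.azurehdinsight.net
```

You can then pass $sshHost to your ssh client together with the user name you entered when provisioning the cluster.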
Note of caution: Clusters can consume a large share of your subscription credit. We suggest that you remove the cluster when you have finished with it. You can keep the storage for later use. To delete the cluster, run the following command:
Remove-AzureRmHDInsightCluster -ClusterName $clusterName
You can download the script from here.
Enjoy!