Introduction: This PySpark notebook introduces Spark GraphFrames. The dataset comes from SNAP (https://snap.stanford.edu/data/egonets-Facebook.html). Dataset description: 4039 nodes, 88234 edges. Abstract from Stanford’s website: This dataset consists of ‘circles’ (or ‘friends lists’) from Facebook. Facebook data was collected from survey participants using a Facebook app. The dataset includes node features (profiles), circles, and ego networks. Facebook data has … Continue reading Facebook circles – A Gentle Introduction to Apache Spark GraphFrames
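As a rough illustration of what working with such a dataset involves, here is a minimal sketch of turning a whitespace-separated edge list into the two tables GraphFrames expects: a vertices table with an `id` column and an edges table with `src` and `dst` columns. The sample data and the helper names `parse_edge_list` and `build_graphframe` are hypothetical and not taken from the notebook. The Spark step is kept in a separate function because it assumes `pyspark` and the `graphframes` package are installed.

```python
# Hypothetical sample in "src dst" form, standing in for a real edge-list file.
SAMPLE_EDGES = """\
0 1
0 2
1 2
2 3
"""


def parse_edge_list(text):
    """Return (vertex_rows, edge_rows) shaped for GraphFrames.

    Vertices are 1-tuples for an `id` column; edges are (src, dst) pairs.
    Blank lines and lines starting with '#' are skipped.
    """
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((int(src), int(dst)))
    # Every node that appears at either end of an edge becomes a vertex.
    vertex_ids = sorted({v for edge in edges for v in edge})
    vertices = [(v,) for v in vertex_ids]
    return vertices, edges


def build_graphframe(spark, vertices, edges):
    """Build a GraphFrame from the parsed rows (needs pyspark + graphframes)."""
    # Imported lazily so the parsing step above runs without Spark installed.
    from graphframes import GraphFrame

    v_df = spark.createDataFrame(vertices, ["id"])
    e_df = spark.createDataFrame(edges, ["src", "dst"])
    return GraphFrame(v_df, e_df)


vertices, edges = parse_edge_list(SAMPLE_EDGES)
print(len(vertices), "vertices,", len(edges), "edges")
```

With a running `SparkSession` named `spark`, calling `build_graphframe(spark, vertices, edges)` would give a graph object to query; for the real dataset, the text passed to `parse_edge_list` would be the contents of the downloaded edge file.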
Category: Cloud
Adding new images to Amazon Web Services (AWS)
In this article we discuss how to upload and use Microsoft Windows client operating systems in Amazon Web Services. Unfortunately, there are no pre-canned images with Microsoft Windows client operating systems in AWS. However, this does not mean that one cannot use them in Amazon’s EC2. Although it is trickier than just attaching an ISO image … Continue reading Adding new images to Amazon Web Services (AWS)
HDInsight – Provision a Hadoop Cluster on Azure
Creating a Hadoop cluster on Azure is easy, but it does take a few steps. In this blog entry we provide a skeleton PowerShell script to get you started. Requirements: To run this script, you need the Azure PowerShell SDK installed on your machine. To check whether it is already installed, run the command: … Continue reading HDInsight – Provision a Hadoop Cluster on Azure
Azure PowerShell SDK
Before running any Azure PowerShell command, you need to install the Azure PowerShell SDK on your machine. To check whether it is already installed, run the command:

PS C:\> Get-Module -ListAvailable Azure

    Directory: C:\Program Files (x86)\Microsoft SDKs\Azure\PowerShell\ServiceManagement

ModuleType Version Name  ExportedCommands
---------- ------- ----  ----------------
Manifest   1.0.1   Azure {Disable-AzureServiceProjectRemoteDesktop, Enable-AzureSer…

… Continue reading Azure PowerShell SDK