Connecting to an Azure Workspace from the Python SDK

Use the Azure ML Python SDK to connect to Azure Machine Learning Studio

In this article, we’ll explore two ways of connecting to an Azure Machine Learning Studio workspace from Python code using the Azure Machine Learning SDK for Python, as well as some of the things you can do with that workspace once you’re connected.

Note: this article assumes you have already installed the Azure Machine Learning SDK for Python. Please see Microsoft’s installation guide for current installation instructions.

Connecting to Azure Machine Learning Studio

To connect to a workspace, the workspace must already exist, so see my article on creating an Azure Machine Learning Workspace if you haven’t created one yet.

Once you have a workspace, you’ll need three pieces of information for the SDK:

  • Your Azure Subscription ID
  • The Resource Group the Workspace is in
  • The name of the Workspace

These three values identify a workspace in Azure, and there are two ways of supplying them to the SDK.

Connecting with Raw Credentials

Although I do not recommend this approach, it is possible to manually connect to a workspace by providing the necessary information as strings to the Workspace constructor:

from azureml.core import Workspace

subscription_id = 'some-id-goes-here'
resource_group = 'my-resource-group-name'
workspace_name = 'MattOnDataScience'

ws = Workspace(subscription_id, resource_group, workspace_name)

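If you haven’t created the workspace yet, the constructor can’t help you, because it only connects to an existing workspace. Instead, you can call Workspace.create(), which provisions the workspace and, with create_resource_group=True, also creates the resource group if it doesn’t exist:

from azureml.core import Workspace

subscription_id = 'some-id-goes-here'
resource_group = 'my-resource-group-name'
workspace_name = 'MattOnDataScience'
location = 'eastus2'  # Azure region for the new workspace (and resource group, if created)

# Workspace.create() provisions a new workspace; the Workspace constructor
# has no create_resource_group or location parameters
ws = Workspace.create(name=workspace_name,
                      subscription_id=subscription_id,
                      resource_group=resource_group,
                      create_resource_group=True,
                      location=location)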

This approach works and has the benefit of making it very clear which workspace you are connecting to. The major downsides are that your workspace information is stored in code, which is likely tracked in version control, and that it becomes harder to run the same code against multiple workspaces.

Connecting with a Config.json File

The second way to connect to a workspace, and the one I personally recommend, is to use a config.json file to represent your workspace. This file typically lives in the same directory as your Python code (by default, Workspace.from_config() starts looking in the current working directory) and looks something like the following:

{
    "subscription_id": "some-id-goes-here",
    "resource_group": "my-resource-group-name",
    "workspace_name": "MattOnDataScience"
}

While this is a fairly simple file, you don’t need to create it yourself. Instead, navigate to your Azure Machine Learning workspace in the Azure Portal and click the Download config.json link in the upper left corner, as pictured below:

Download Config.json

This will download a completed config.json file that does not require any modifications.
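If you’d like to sanity-check a config.json file before connecting, the standard library is enough. The sketch below only illustrates the file format shown above; it is not how the SDK parses the file (Workspace.from_config() does its own reading), and the helper name read_workspace_config() is hypothetical.

```python
import json
from pathlib import Path

# The three keys needed to identify a workspace
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def read_workspace_config(path="config.json"):
    """Load a config.json file and return its three workspace values as a tuple.

    Hypothetical helper for illustration; raises KeyError if any key is missing.
    """
    with Path(path).open(encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    # Return the values in a fixed order: subscription, resource group, workspace
    return tuple(config[key] for key in REQUIRED_KEYS)


# Usage:
# subscription_id, resource_group, workspace_name = read_workspace_config()
```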

Next, you can connect to the workspace in Python via the following code:

from azureml.core import Workspace

# Connect to the workspace (requires config.json to be present)
ws = Workspace.from_config()

Azure subscription IDs can be considered sensitive information, so if your code is stored in a publicly visible repository, you may want to add config.json to your .gitignore file so it isn’t tracked in version control.
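Adding the entry is usually a one-line edit, but if you set up projects from a script, a small idempotent helper can append config.json only when it isn’t already listed. The ensure_gitignored() function below is a hypothetical sketch: it does an exact line match and does not interpret wildcard patterns that might already cover the file.

```python
from pathlib import Path


def ensure_gitignored(entry="config.json", gitignore_path=".gitignore"):
    """Append `entry` to a .gitignore file unless an identical line already exists.

    Returns True if the file was changed. Only exact line matches are detected;
    wildcard patterns that would also cover `entry` are not evaluated.
    """
    path = Path(gitignore_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # Make sure the new entry starts on its own line
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + entry + "\n", encoding="utf-8")
    return True


# Usage, run from the project root:
# ensure_gitignored("config.json")
```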

A nice benefit of the config.json approach is that running a data science experiment in a different workspace can be as simple as pointing to a different config.json file, with no other code changes needed.
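Workspace.from_config() accepts a path argument that can point at a specific config file or at a directory to start searching from, so switching workspaces can come down to choosing a path. The sketch below assumes a hypothetical layout with one folder per workspace (for example configs/dev/config.json and configs/prod/config.json); the folder names and the config_path_for() and connect() helpers are illustrative only.

```python
from pathlib import Path

# Hypothetical layout: one folder per workspace, each holding its own config.json
CONFIG_ROOT = Path("configs")


def config_path_for(environment, root=CONFIG_ROOT):
    """Return the config.json path for a named environment such as 'dev' or 'prod'."""
    return Path(root) / environment / "config.json"


def connect(environment):
    """Connect to the workspace described by that environment's config.json.

    Requires the azureml-core package; it is imported inside the function so
    config_path_for() can be used without the SDK installed.
    """
    from azureml.core import Workspace

    return Workspace.from_config(path=str(config_path_for(environment)))


# Usage:
# ws = connect("dev")
```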

What exactly is a Workspace?

Okay, so now we have a Workspace instance. What exactly is it?

Well, in the Azure ML Python SDK, the Workspace is the central object you use to get other objects, track experiments, and otherwise interact with Azure.

To see some basic details about a workspace in Python, you could run the following code:

# Display high-level info on the workspace
print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location,
      'Resource group: ' + ws.resource_group, sep='\n')

If you need additional details on your workspace, you can call the get_details() method, which returns a dict of key-value pairs describing the workspace and its associated resources. You can display it with the following code:

details = ws.get_details()
print(details)
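Because get_details() returns an ordinary dict, printing it directly produces one long line. A small sketch using the standard json module gives indented, key-sorted output instead; default=str is a precaution in case any value isn’t JSON-serializable. The helper name format_details() is hypothetical.

```python
import json


def format_details(details):
    """Return a workspace details dict as indented, key-sorted JSON text."""
    # default=str converts any non-JSON-serializable values (e.g. dates) to strings
    return json.dumps(details, indent=2, sort_keys=True, default=str)


# Usage with a connected workspace:
# print(format_details(ws.get_details()))
```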

What do I do with a Workspace?

Once you have access to your workspace, there are many things you can do from the SDK, including:

  • Run a machine learning experiment
  • Register a data set
  • Register a trained model
  • Load registered data sets
  • Load registered models
  • Download a trained model’s files for use outside of Azure
  • Create compute resources
  • Deploy trained models as endpoints
  • View past experiment runs
  • View model details, metrics, and explanations

Just to show you how simple it can be to work with a Workspace, the following code lists the names of all experiments (now called Jobs) that you have run on your workspace:

# ws.experiments is a dict keyed by experiment name, so iterating yields the names
for experiment_name in ws.experiments:
    print(experiment_name)
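Since ws.experiments is a dict mapping experiment names to Experiment objects, ordinary dict and list operations apply to it. As a small sketch, the hypothetical summarize_experiments() helper below returns the names alphabetically along with a total count; it only uses the dict’s keys.

```python
def summarize_experiments(experiments):
    """Return a short text summary of an experiments dict.

    `experiments` is assumed to map experiment names to Experiment objects,
    as ws.experiments does; only the keys (names) are used here.
    """
    names = sorted(experiments)
    lines = [f"{len(names)} experiment(s):"]
    lines.extend(f"  {name}" for name in names)
    return "\n".join(lines)


# Usage with a connected workspace:
# print(summarize_experiments(ws.experiments))
```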

Another common task you may wind up doing with a Workspace is retrieving the blob storage container that acts as its default datastore. This container stores datasets, trained models, logs, and more, and can be accessed with the following code:

# The default datastore is a blob storage container where datasets are stored
datastore = ws.get_default_datastore()

print('Default data store: ' + datastore.name)

As you can see, there’s not too much to connecting to a workspace, but this is a vital step in the process of working with machine learning on Azure via the Python SDK.

Let me know which aspects of the Azure ML Python SDK interest you the most and I’ll try to prioritize those pieces of content!