Managing Compute Resources from the Azure ML Python SDK
Provisioning and deleting compute clusters & compute instances using the Azure ML Python SDK
When running machine learning experiments using the Azure ML Python SDK, one of the things you will need to have handy is a compute resource.
This article walks you through some common Python code you might write to retrieve, provision, update, and even delete compute instances and compute clusters in Azure Machine Learning. Typically these steps are done to accomplish some goal such as launching an Auto ML regression run or an Auto ML classification run, but this code can also be helpful in its own right.
Note: The code in this article assumes you have an Azure ML Workspace and can retrieve it from config files. See my tutorial on connecting to an Azure Machine Learning Workspace from the Azure ML Python SDK for additional steps if you’ve never done this before.
Retrieving an existing compute resource
If you have already defined a compute resource via the SDK or in Azure Machine Learning Studio, you can retrieve that resource fairly easily with the Azure ML Python SDK by declaring a ComputeTarget.
A ComputeTarget will attempt to find a compute instance or cluster by name in your workspace and will throw a ComputeTargetException if one is not found.
The following code will find a compute resource named My-Compute in the workspace indicated in the config.json file:
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# Get the workspace from the config file
ws = Workspace.from_config()

# The name of the compute resource to look for
resource_name = "My-Compute"

# Try to fetch the compute resource
try:
    compute = ComputeTarget(workspace=ws, name=resource_name)
    print('Found existing compute: ' + resource_name)
except ComputeTargetException:
    # A ComputeTargetException is thrown if the ComputeTarget did not exist
    print('The compute resource was not present')
Assuming that a compute resource was found, it will now be available in the compute variable. However, it is very common to write this code a little differently so that it provisions a compute resource if one was not found.
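If you are not sure which names exist in a workspace, it can help to list them before looking one up. The sketch below assumes the SDK v1 Workspace.compute_targets property, which returns a dictionary of compute names to ComputeTarget objects, and reads each target's type and provisioning_state attributes. The summarize_compute helper and the stand-in workspace are hypothetical, added only so the example runs without Azure credentials.

```python
from types import SimpleNamespace


def summarize_compute(ws):
    """Return (name, type, provisioning_state) tuples for every compute target in ws."""
    rows = []
    # ws.compute_targets maps each compute name to its ComputeTarget object
    for name, target in sorted(ws.compute_targets.items()):
        rows.append((name, target.type, target.provisioning_state))
    return rows


# Stand-in workspace with illustrative values so the sketch runs offline.
# With the real SDK you would pass Workspace.from_config() instead.
fake_ws = SimpleNamespace(compute_targets={
    "My-Compute": SimpleNamespace(type="ComputeInstance", provisioning_state="Succeeded"),
    "My-Cluster": SimpleNamespace(type="AmlCompute", provisioning_state="Succeeded"),
})

for name, kind, state in summarize_compute(fake_ws):
    print(f"{name}: {kind} ({state})")
```

Because the helper only reads attributes, the same function should work unchanged when handed a real workspace object.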
Create a new compute instance
Creating a compute resource can be done inside of the except block. This way, the compute resource is only created if it did not exist before.
The code for creating the compute instance can be found in the sample below, starting inside of the except block:
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, ComputeInstance
from azureml.core.compute_target import ComputeTargetException

# Load the workspace from config.json
ws = Workspace.from_config()

# The name of the compute instance to fetch or create
instance_name = "My-Compute"

# Fetch or create the compute resource
try:
    instance = ComputeTarget(workspace=ws, name=instance_name)
    print('Using existing compute: ' + instance_name)
except ComputeTargetException:
    # Create the instance
    print('Provisioning compute instance...')
    sku = 'STANDARD_DS1_V2'
    compute_config = ComputeInstance.provisioning_configuration(vm_size=sku, ssh_public_access=False)
    instance = ComputeInstance.create(ws, instance_name, compute_config)

# Ensure the instance is ready to go
instance.wait_for_completion(show_output=True)
Note that sku here must be a supported VM type that exists in your workspace’s region and for which you have available quota.
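One way to catch an unsupported size before provisioning is to compare the sku against the list of sizes the SDK reports. The sketch below is a small pure function, is_supported_size (a hypothetical helper, not part of the SDK), that checks a name against a list of dictionaries. It assumes the list has the shape I would expect from ComputeInstance.supported_vmsizes(workspace=ws) or AmlCompute.supported_vmsizes(workspace=ws), where each entry carries a 'name' key, and it assumes size names can be compared without regard to case. The sample data is illustrative only.

```python
def is_supported_size(sku, vm_sizes):
    """Return True if sku matches a 'name' entry in vm_sizes, ignoring case."""
    wanted = sku.lower()
    return any(size.get("name", "").lower() == wanted for size in vm_sizes)


# Illustrative data only. With the real SDK this list would come from
# ComputeInstance.supported_vmsizes(workspace=ws) (assumed to return dicts with a 'name' key).
sample_sizes = [
    {"name": "Standard_DS1_v2"},
    {"name": "Standard_D2ds_v4"},
]

print(is_supported_size("STANDARD_DS1_V2", sample_sizes))
print(is_supported_size("Standard_NC6", sample_sizes))
```

A check like this only tells you that the size exists in the region; it says nothing about whether you have quota left for it.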
Billing Tip: I strongly recommend you manually select the appropriate VM size name in Azure Machine Learning Studio, weighing cost, performance, memory, and other factors of interest to you, in order to avoid billing surprises.
Create a new compute cluster
Creating a compute cluster is extremely similar to creating a compute instance, but we work with slightly different classes and parameters as shown below:
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Load the workspace from config.json
ws = Workspace.from_config()

# The name of the compute cluster to fetch or create
cluster_name = "My-Cluster"

# Fetch or create the compute resource
try:
    # This will throw a ComputeTargetException if this doesn't exist
    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Using existing compute: ' + cluster_name)
except ComputeTargetException:
    # Create the cluster
    print('Provisioning cluster...')
    max_nodes = 4
    sku = 'Standard_D2DS_V4'
    compute_config = AmlCompute.provisioning_configuration(vm_size=sku, min_nodes=0, max_nodes=max_nodes)
    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

# Ensure the cluster is ready to go
cpu_cluster.wait_for_completion(show_output=True)
Instead of using ComputeInstance, we now build the provisioning configuration with AmlCompute.
Additionally, when provisioning a cluster, it is important to specify the minimum and maximum number of nodes in that cluster.
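The instance and cluster samples share the same fetch-or-create shape, so it can be factored into one small helper. The sketch below is one possible way to do that, not something the SDK provides: get_or_create is a hypothetical function that takes a fetch callable, a create callable, and the exception type that signals "not found". The demo uses stand-in functions and a stand-in exception so it runs without Azure access.

```python
def get_or_create(fetch, create, not_found):
    """Return (resource, created): fetch() if it succeeds, otherwise create()."""
    try:
        return fetch(), False
    except not_found:
        # Only provision when the lookup reported that nothing exists
        return create(), True


# Stand-ins for the SDK calls, used only to make the sketch runnable offline
class FakeNotFound(Exception):
    """Plays the role of ComputeTargetException in this demo."""


existing = {"My-Compute": "instance-object"}


def fetch_cluster():
    if "My-Cluster" not in existing:
        raise FakeNotFound("My-Cluster")
    return existing["My-Cluster"]


def create_cluster():
    existing["My-Cluster"] = "cluster-object"
    return existing["My-Cluster"]


cluster, created = get_or_create(fetch_cluster, create_cluster, FakeNotFound)
print(cluster, created)
```

With the real SDK, fetch would wrap ComputeTarget(workspace=ws, name=cluster_name), create would wrap ComputeTarget.create(ws, cluster_name, compute_config), and not_found would be ComputeTargetException.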
Billing Tip: Always make sure that the min_nodes parameter on your AmlCompute provisioning configuration is set to 0. This allows your cluster to go offline when there is no work for it.
See my compute resources article for additional details and reasoning on compute clusters.
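To see why min_nodes matters for billing, here is a rough back-of-the-envelope calculation. It assumes that every node counted in min_nodes stays allocated, and therefore billed, while the cluster has no work. The hourly rate is a made-up number for illustration, not a real Azure price, and idle_cost is a hypothetical helper.

```python
def idle_cost(min_nodes, hourly_rate, idle_hours):
    """Cost of keeping min_nodes allocated for idle_hours with no work queued."""
    return min_nodes * hourly_rate * idle_hours


rate = 0.10           # hypothetical price per node-hour, not a real Azure rate
hours_idle = 24 * 30  # a month in which the cluster never receives a job

print("min_nodes=1:", idle_cost(1, rate, hours_idle))
print("min_nodes=0:", idle_cost(0, rate, hours_idle))
```

With min_nodes set to 0 the idle cost is zero regardless of the rate, which is the point of the tip above.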
Deleting a Compute Resource
If you find yourself wanting to delete a compute resource, the process is fairly easy; just call the delete method, then wait_for_completion to ensure the delete goes through!
See the following code for an example of deleting a compute resource and note that this works on both compute clusters and compute instances:
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# Load the workspace from config.json
ws = Workspace.from_config()

# The name of the compute resource to delete
instance_name = "My-Compute"

# Fetch the compute resource and delete it if it exists
try:
    instance = ComputeTarget(workspace=ws, name=instance_name)
    instance.delete()
    # Tell the SDK it is waiting on a delete so it does not try to refresh the removed resource
    instance.wait_for_completion(show_output=True, is_delete_operation=True)
    print('Deleted compute resource')
except ComputeTargetException:
    print('Already deleted!')
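If you clean up several resources at once, the same steps can be wrapped in a loop. The sketch below is one way to do it under a few assumptions: it looks names up in the Workspace.compute_targets dictionary rather than catching an exception, and it passes is_delete_operation=True to wait_for_completion. The delete_compute helper and the FakeTarget and FakeWorkspace classes are hypothetical stand-ins so the example runs without Azure access.

```python
def delete_compute(ws, names):
    """Delete each named compute target that exists in ws; return the names deleted."""
    deleted = []
    targets = ws.compute_targets  # name -> ComputeTarget dictionary
    for name in names:
        target = targets.get(name)
        if target is None:
            continue  # nothing to delete under this name
        target.delete()
        target.wait_for_completion(show_output=False, is_delete_operation=True)
        deleted.append(name)
    return deleted


class FakeTarget:
    """Stand-in for a ComputeTarget so the sketch runs offline."""

    def __init__(self, registry, name):
        self.registry = registry
        self.name = name

    def delete(self):
        # The real delete() starts a service-side operation; the fake removes immediately
        self.registry.pop(self.name, None)

    def wait_for_completion(self, show_output=False, is_delete_operation=False):
        pass  # nothing to wait for in the fake


class FakeWorkspace:
    """Stand-in for a Workspace exposing only compute_targets."""

    def __init__(self, names):
        self.compute_targets = {}
        for name in names:
            self.compute_targets[name] = FakeTarget(self.compute_targets, name)


demo_ws = FakeWorkspace(["My-Compute", "My-Cluster"])
print(delete_compute(demo_ws, ["My-Compute", "Not-There"]))
```

Names that are not present are skipped silently here; you may prefer to log them instead.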
Next Steps
That’s it! Managing compute in the Azure ML Python SDK is fairly straightforward.
Being able to retrieve and provision compute resources in the Python SDK will set you up nicely for running machine learning experiments directly from the Azure ML Python SDK as we shall see in future articles.