Work with Azure Machine Learning using the Python SDK
In this tutorial, I will show you how to use the Azure Machine Learning Python SDK to submit code from your local computer to an Azure Machine Learning workspace and run it on cloud compute resources.
Install the Azure Machine Learning SDK
You can install the Azure Machine Learning SDK with pip, for example: pip install azureml-sdk
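Once the packages are installed, a quick way to confirm that the current interpreter can see them is to ask importlib for each module. This is a minimal sketch: missing_packages is a hypothetical helper (not part of the SDK), and the module list (azureml.core for the SDK, torch and torchvision for this project) is an assumption based on the libraries used later.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names from module_names that cannot be found in this interpreter."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # For dotted names such as "azureml.core", find_spec imports the parent
            # package first and raises if that parent is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Modules assumed to be needed by this project; adjust to your own requirements.
    print(missing_packages(["azureml.core", "torch", "torchvision"]))
```

An empty list means every listed module can be imported.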
The following picture illustrates the directory structure of our first project:
- The .azureml subdirectory stores Azure Machine Learning configuration files:
  - config.json: contains the metadata needed to connect to your Azure Machine Learning workspace. It is generated when you run create-workspace.py
  - pytorch-env.yml: defines the environment in which our code is executed
- The src subdirectory contains:
  - model.py: defines the architecture of our convolutional neural network
  - train.py: trains the network
  - configuration.py: defines a class that holds some constants
- create-workspace.py: creates a workspace in Azure Machine Learning
- create-compute.py: creates a compute resource inside the workspace
- run-pytorch.py: submits the code to the workspace and runs it on the compute resource created above
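If you are starting from an empty folder, the layout above can be created with a few lines of pathlib. scaffold_project is a hypothetical helper that only creates empty placeholder files. Placing configuration.py in src follows the order of the list and is an assumption; config.json is deliberately left out because create-workspace.py generates it.

```python
import tempfile
from pathlib import Path

# Files from the layout above; config.json is omitted because create-workspace.py writes it.
PROJECT_FILES = [
    ".azureml/pytorch-env.yml",
    "src/model.py",
    "src/train.py",
    "src/configuration.py",  # assumed to live in src/
    "create-workspace.py",
    "create-compute.py",
    "run-pytorch.py",
]


def scaffold_project(root):
    """Create any missing directories and empty files under root; return the files created."""
    root = Path(root)
    created = []
    for relative in PROJECT_FILES:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(relative)
    return created


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of the current folder.
    with tempfile.TemporaryDirectory() as tmp:
        print(scaffold_project(tmp))
```

Because existing files are skipped, running the helper twice is harmless.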
Create an Azure Machine Learning Workspace
We create the workspace in the create-workspace.py file. Running this script creates the workspace and writes a file named config.json to the .azureml subdirectory.
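A minimal sketch of what create-workspace.py can look like with the v1 SDK (azureml-core) follows. The workspace name, subscription ID, resource group, and region are placeholders you must replace, and create_workspace and read_workspace_config are hypothetical helper names. The SDK import sits inside the function so the file loads without the SDK; actually calling create_workspace requires the SDK and an Azure login. read_workspace_config shows the three fields config.json stores, which is what the later scripts use to reconnect.

```python
import json
from pathlib import Path


def create_workspace(name, subscription_id, resource_group, location):
    """Create the workspace and save its connection details to .azureml/config.json."""
    from azureml.core import Workspace  # requires azureml-core and Azure credentials

    ws = Workspace.create(
        name=name,
        subscription_id=subscription_id,
        resource_group=resource_group,
        create_resource_group=True,  # create the resource group if it does not exist yet
        location=location,           # Azure region, e.g. "westeurope"
    )
    ws.write_config(path=".azureml")  # writes .azureml/config.json
    return ws


def read_workspace_config(path=".azureml/config.json"):
    """Return the fields that identify the workspace in config.json."""
    data = json.loads(Path(path).read_text())
    return {key: data[key] for key in ("subscription_id", "resource_group", "workspace_name")}


# Usage (placeholders, replace with your own values):
# create_workspace("<workspace-name>", "<subscription-id>", "<resource-group>", "<region>")
```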
Create an Azure Machine Learning compute cluster
In create-compute.py, we create a compute cluster with up to four nodes. The cluster autoscales between zero and four nodes depending on the workload submitted to it, so it scales down to zero nodes when idle.
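Here is a sketch of create-compute.py with the v1 SDK. The cluster name cpu-cluster, the VM size STANDARD_D2_V2, and the 2400-second idle timeout are illustrative assumptions, not requirements; min_nodes=0 and max_nodes=4 express the zero-to-four autoscaling range described above. get_or_create_cluster is a hypothetical helper that reuses the cluster if it already exists, so the script can be rerun safely.

```python
def get_or_create_cluster(ws, name="cpu-cluster", vm_size="STANDARD_D2_V2",
                          min_nodes=0, max_nodes=4):
    """Return the named AmlCompute cluster in workspace ws, provisioning it if needed."""
    # Imported here so the module loads without the SDK; calling this needs azureml-core.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        cluster = ComputeTarget(workspace=ws, name=name)  # raises if no such cluster exists
        print(f"Found existing cluster {name!r}, reusing it.")
    except ComputeTargetException:
        config = AmlCompute.provisioning_configuration(
            vm_size=vm_size,
            min_nodes=min_nodes,  # allows scaling down to zero nodes when idle
            max_nodes=max_nodes,  # upper bound used by autoscaling
            idle_seconds_before_scaledown=2400,  # idle time before nodes are released
        )
        cluster = ComputeTarget.create(ws, name, config)
    cluster.wait_for_completion(show_output=True)
    return cluster


# Usage inside create-compute.py (requires azureml-core and .azureml/config.json):
# from azureml.core import Workspace
# get_or_create_cluster(Workspace.from_config())
```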
Define a Convolutional Neural Network
In this project, we use the PyTorch framework to build our model. We define the architecture of the neural network and its forward propagation in the model.py file. Our network has two convolutional layers, each followed by a max pooling layer, then a flatten layer, and finally three consecutive fully connected layers.
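To size the first fully connected layer, you need the number of features that come out of the flatten layer. The sketch below works this out with the standard output-size formula floor((n + 2p - k) / s) + 1, applied to each convolution and pooling step. The 32x32 input, 5x5 kernels without padding, 2x2 pooling with stride 2, and the 6 and 16 output channels are assumptions for illustration, not values stated in this tutorial.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size=32, conv_kernel=5, pool_kernel=2, channels=(6, 16)):
    """Number of values the flatten layer produces after each conv + max-pool pair."""
    size = input_size
    for _ in channels:
        size = output_size(size, conv_kernel)                      # convolution, no padding
        size = output_size(size, pool_kernel, stride=pool_kernel)  # 2x2 max pooling, stride 2
    return channels[-1] * size * size


if __name__ == "__main__":
    print(flattened_features())  # 32 -> 28 -> 14 -> 10 -> 5, so 16 * 5 * 5 = 400
```

In PyTorch, this number is the in_features argument of the first nn.Linear layer.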
The code that trains the network is in the train.py file. Training stops after 2 epochs.
In train.py, we get the run object from within the training script itself with the Run.get_context() method, then track the loss metric by calling that run object's log() method with the metric name "loss" and the current loss value.
Logging metrics this way lets you visualize them in Azure Machine Learning studio and compare them across multiple runs.
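The training loop and the logging call fit together roughly as in the sketch below. To keep it runnable without PyTorch or Azure, the per-batch work is passed in as a step function and the logger as a log callable; in train.py you would pass the run object's log method. train, step, and log_every are hypothetical names, and logging the average loss every log_every mini-batches is an assumed convention.

```python
def train(step, log, batches_per_epoch, epochs=2, log_every=2000):
    """Train for a fixed number of epochs, logging the average loss every log_every batches.

    step(epoch, batch_index) runs one mini-batch and returns its loss as a float.
    log(name, value) records a metric, for example the log method of an Azure ML run.
    """
    for epoch in range(epochs):  # training stops after `epochs` passes over the data
        running_loss = 0.0
        for i in range(batches_per_epoch):
            running_loss += step(epoch, i)
            if (i + 1) % log_every == 0:
                log("loss", running_loss / log_every)  # one point on the loss chart
                running_loss = 0.0


# In train.py on Azure Machine Learning (requires azureml-core);
# train_one_batch and trainloader are placeholders for your own code:
# from azureml.core import Run
# run = Run.get_context()
# train(step=train_one_batch, log=run.log, batches_per_epoch=len(trainloader))
```

Each call to log adds one value to the "loss" series, which is what the studio plots across the run.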
Define the environment
Because our project depends on external libraries, we need to define an environment that lists all the dependencies our model and training script require. We do this in the pytorch-env.yml file.
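A conda specification for this project might look like the YAML string below, written to .azureml/pytorch-env.yml by a small Python helper. The channels, the Python version, and the package list are assumptions chosen for illustration; pin the versions your own code needs. write_env_file is a hypothetical helper.

```python
from pathlib import Path

# Conda environment specification (channels and versions are illustrative assumptions).
PYTORCH_ENV_YML = """\
name: pytorch-env
channels:
  - defaults
  - pytorch
dependencies:
  - python=3.8
  - pytorch
  - torchvision
"""


def write_env_file(path=".azureml/pytorch-env.yml"):
    """Write the conda specification to path, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(PYTORCH_ENV_YML)
    return path
```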
Create the control script
The run-pytorch.py file contains the code that submits our files to the Azure Machine Learning workspace and runs them on the cloud compute resource.
First, we create a Workspace object that connects to our Azure Machine Learning workspace, using Workspace.from_config(). The connection details come from the config.json file generated earlier, so we can now communicate with our Azure Machine Learning resources. Next, we specify the experiment that this run belongs to.
An Experiment organizes multiple runs under a single name, which makes it easy to compare metrics across runs.
ScriptRunConfig wraps the train.py script in the src subdirectory so that it can be sent to the workspace, and specifies which compute resource the script runs on. In short, this class configures how your script runs in Azure Machine Learning. We then create an environment from the pytorch-env.yml file and attach it to the run configuration.
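The steps described so far (connect, choose an experiment, configure the script, attach the environment) can be sketched with the v1 SDK as follows. The experiment name, the cluster name cpu-cluster, and the environment file path are assumptions that must match what you created earlier; build_training_run is a hypothetical helper, and the SDK import is inside the function so the file loads without the SDK.

```python
def build_training_run(experiment_name="pytorch-experiment",
                       cluster_name="cpu-cluster",
                       env_file=".azureml/pytorch-env.yml"):
    """Return (experiment, script_config), ready for experiment.submit(script_config)."""
    # Imported here so the module loads without the SDK; calling this needs azureml-core.
    from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace

    ws = Workspace.from_config()  # connects using .azureml/config.json
    experiment = Experiment(workspace=ws, name=experiment_name)

    script_config = ScriptRunConfig(
        source_directory="./src",     # this folder is uploaded with the run
        script="train.py",            # entry point inside source_directory
        compute_target=cluster_name,  # name of an existing compute target
    )
    env = Environment.from_conda_specification(name="pytorch-env", file_path=env_file)
    script_config.run_config.environment = env  # attach the dependencies to the run
    return experiment, script_config
```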
Finally, we submit the configured script to the experiment.
This submission is called a run. A run encapsulates a single execution of your code; use it to monitor the script's progress, capture the output, analyze the results, visualize metrics, and more. You can follow the link printed on the screen to monitor the run's progress in Azure Machine Learning studio.
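Submitting and monitoring a run can be sketched as follows. submit_and_monitor is a hypothetical helper that takes an Experiment and a ScriptRunConfig such as the ones set up in run-pytorch.py; it relies on the v1 SDK methods submit, get_portal_url, wait_for_completion, and get_metrics. Because it only calls methods on the objects passed in, it needs no imports of its own.

```python
def submit_and_monitor(experiment, config, wait=True):
    """Submit the configured script as a run, print its studio link, and return its metrics."""
    run = experiment.submit(config)  # starts the run on the configured compute target
    print("Monitor the run at:", run.get_portal_url())
    if wait:
        run.wait_for_completion(show_output=True)  # stream the run's output until it finishes
    # A dict of logged metrics; a metric logged several times maps to a list of values.
    return run.get_metrics()


# In run-pytorch.py:
# metrics = submit_and_monitor(experiment, config)
```

Passing wait=False returns immediately after submission, which is handy when you only want the studio link.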
In the Metrics tab of the run in Azure Machine Learning studio, you can see a chart of the logged loss metric.