Push files to Azure Machine Learning Datastore

In this tutorial, we are going to learn how to push local files to an Azure Machine Learning Datastore.

Directory Structure

We are going to work on a demand forecasting problem. Our project is structured as follows:

  • The .azureml subdirectory includes some configuration files for our workspace (the same as in the previous tutorial)
  • The data directory includes our train and test set files
  • In the src subdirectory, we have:
    • The get_data_from_azure.py file, which retrieves data from the Azure Machine Learning Datastore
    • The models.py file, which defines functions to train and test our models for this problem
    • The predict.py file, which uses the functions defined in the files above to get the data and train the model
  • The upload_data.py file is responsible for pushing our data to the Azure Machine Learning Datastore

Push data from local to Azure Machine Learning Datastore

We can use the datastore's upload() method to push the files in a specific directory to the datastore.

When we created a workspace in the previous tutorial, a default datastore was bound to it, and we can get it with the get_default_datastore() method of Workspace. We also need to specify the source directory where our files are located and the target path on the Azure Machine Learning Datastore.
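The upload step described above could look roughly like the following sketch. The helper names upload_directory and push_local_data, the ./data source directory, and the demand_forecasting/data target path are all hypothetical and chosen only for illustration; the keyword arguments passed to upload() assume the azureml-core (SDK v1) signature.

```python
from pathlib import Path


def upload_directory(datastore, src_dir, target_path, overwrite=True):
    """Upload every file under src_dir to target_path on the given datastore."""
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Source directory not found: {src}")
    # upload() is the method named in this tutorial; the keyword arguments
    # are assumed to match the azureml-core (SDK v1) datastore signature.
    datastore.upload(
        src_dir=str(src),
        target_path=target_path,
        overwrite=overwrite,
        show_progress=True,
    )
    return target_path


def push_local_data(src_dir="./data", target_path="demand_forecasting/data"):
    """Connect to the workspace and push the local data directory (hypothetical paths)."""
    # Imported lazily so upload_directory can be used without the SDK installed.
    from azureml.core import Workspace

    ws = Workspace.from_config()            # reads the config stored under .azureml
    datastore = ws.get_default_datastore()  # default datastore bound to the workspace
    return upload_directory(datastore, src_dir, target_path)
```

Calling push_local_data() from upload_data.py would then upload the contents of the data directory. Keeping the upload logic in a function that receives the datastore as an argument makes it easy to try out with a stand-in object before touching the real workspace.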

Let’s take a glance at our dataset:

We are going to predict future sales by using historical sales information.

Create regression models

There are a lot of regression models that you can choose for this problem, such as Linear Regression, k-NN, and Decision Tree, among others. In this tutorial, I built Ridge and Random Forest models. I used scikit-learn, a Python library that helps you easily create your own machine learning models.
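As a minimal sketch of what such model-building functions in models.py might look like, assuming scikit-learn's Ridge and RandomForestRegressor estimators: the function names train_ridge and train_random_forest and the default hyper-parameter values are illustrative choices, not taken from the original project.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge


def train_ridge(X_train, y_train, alpha=1.0):
    """Fit a Ridge regression model; alpha controls the L2 regularization strength."""
    model = Ridge(alpha=alpha)
    model.fit(X_train, y_train)
    return model


def train_random_forest(X_train, y_train, n_estimators=100, max_depth=None,
                        random_state=0):
    """Fit a random forest; n_estimators and max_depth are the main knobs here."""
    model = RandomForestRegressor(
        n_estimators=n_estimators,
        max_depth=max_depth,
        random_state=random_state,  # fixed seed so repeated runs are comparable
    )
    model.fit(X_train, y_train)
    return model
```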

These functions allow you to choose the hyper-parameters for your models. By changing these hyper-parameters and inspecting the performance of your models, you can choose the one that works best for your problem.

An evaluation function can help you do this inspection. It returns the root mean squared error (RMSE) of your model’s predictions on the test set. You need to choose the model that minimizes this metric.
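One possible shape for such an evaluation function is sketched below. The name evaluate_rmse is hypothetical, and the only assumption about the model is that it exposes a predict() method, as scikit-learn estimators do.

```python
import numpy as np


def evaluate_rmse(model, X_test, y_test):
    """Return the root mean squared error of model.predict(X_test) against y_test."""
    predictions = np.asarray(model.predict(X_test), dtype=float)
    errors = np.asarray(y_test, dtype=float) - predictions
    # RMSE = sqrt(mean(error^2)); lower is better.
    return float(np.sqrt(np.mean(errors ** 2)))
```

Because RMSE is expressed in the same unit as the target (here, sales), it can be read directly as a typical prediction error.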

Get data from Azure Machine Learning Datastore

In the previous step, we pushed our data to a specific path on the datastore. Now we can access and retrieve the data via this path, then transform it into a pandas DataFrame for further calculations.

Calling take(-1) means that we retrieve all rows in our files.
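The retrieval step can be sketched as follows, assuming the files are CSVs and that the azureml-core (SDK v1) Dataset.Tabular.from_delimited_files API is used. The function name load_csv_from_datastore and the max_rows parameter are hypothetical. This sketch makes the row limit optional: leaving max_rows unset loads every row, which matches the intent of the take(-1) call described above.

```python
def load_csv_from_datastore(datastore, path_on_datastore, max_rows=None):
    """Read a CSV stored on the datastore into a pandas DataFrame."""
    # Imported lazily so this module can be imported without the SDK installed.
    from azureml.core import Dataset

    # A (datastore, relative_path) tuple points at the file uploaded earlier.
    dataset = Dataset.Tabular.from_delimited_files(
        path=[(datastore, path_on_datastore)]
    )
    if max_rows is not None:
        # Optionally keep only the first max_rows records.
        dataset = dataset.take(max_rows)
    return dataset.to_pandas_dataframe()
```

The datastore argument would be the object returned by get_default_datastore(), and path_on_datastore the target path used during the upload plus the file name.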

Train and evaluate models

First, we need to define all the features needed by our models and the target values that our models are going to predict.
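A small helper for this step might look like the sketch below. The function name split_features_target is hypothetical, and no column names from the actual dataset are assumed; the caller passes them in.

```python
import pandas as pd


def split_features_target(df, feature_columns, target_column):
    """Split a DataFrame into a feature matrix X and a target vector y."""
    missing = [c for c in list(feature_columns) + [target_column]
               if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in the data: {missing}")
    X = df[list(feature_columns)]  # inputs the model learns from
    y = df[target_column]          # value the model should predict
    return X, y
```

For example, with hypothetical columns named store, month, and sales, split_features_target(df, ["store", "month"], "sales") would return the two feature columns as X and the sales column as y.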

We can set a list of hyper-parameter values. For each value, we create a corresponding model, then choose the one with the lowest error.
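The selection loop can be sketched as follows for the Ridge model, where the hyper-parameter being varied is alpha. The function name select_best_alpha is hypothetical; the same pattern would apply to the Random Forest model with, for example, different tree counts.

```python
import numpy as np
from sklearn.linear_model import Ridge


def select_best_alpha(X_train, y_train, X_test, y_test, alphas):
    """Train one Ridge model per alpha and return the one with the lowest RMSE."""
    best_alpha, best_model, best_rmse = None, None, float("inf")
    for alpha in alphas:
        model = Ridge(alpha=alpha).fit(X_train, y_train)
        errors = np.asarray(y_test, dtype=float) - model.predict(X_test)
        rmse = float(np.sqrt(np.mean(errors ** 2)))
        if rmse < best_rmse:  # keep the best candidate seen so far
            best_alpha, best_model, best_rmse = alpha, model, rmse
    return best_alpha, best_model, best_rmse
```

This follows the tutorial in scoring candidates on the test set. A common refinement is to hold out a separate validation split for the selection, so that the final test score stays an unbiased estimate.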

Finally, we get the result: