Connect to SQL database
Sometimes, we need to get data from a data warehouse. How to do it on Azure Machine Learning? In this tutorial, I am going to share with you the way to connect to a database on data warehouse. Let’s start!
Create a datastores
Open Azure Machine Learning workspace, click on Datastores tab and you can see the New datastore button:
Next, you can see this window:
You have to specify the name for your datastore. Because we want to get data from a table on database, on Datastore type, choose Azure SQL database option.
You also need to enter user ID and password when create a “connection” to our database.
Here, I have created a datastore named “customer_segmentation”:
Create a dataset
As I have mentioned, you can consider datastore as a “connection” to our data source, and we can create a dataset within this “connection”. Click on Datasets tab, you can see Create dataset button:
Select “From datastore” option:
Enter the name for our dataset, choose Tabular option on Dataset type:
Choose the datastore for this dataset, then enter the SQL query, the result of this query is going to be the data in our dataset:
So we have just learnt how to create a dataset from a database.
Open Notebook tab, create a new notebook
Get data from dataset
Import necessary modules:
You need to specify the workspace name, subscription_id, resource_group name to create a workspace object. Then you can use get_by_name() method to get data from the corresponding dataset, and convert it to pandas dataframe for further calculations.
After getting data from our dataset, we will execute some calculations such as data aggregation, machine learning, …, depending on your problem. To make a long story short, I am going to skip this section.
Save result to a table on database
We have carried out some calculations, now we get the result. How to commit it to a table on a database?
You need to install a python module named pyodbc, which is an open source Python module making accessing to ODBC databases simple. Using this module, you can easily connect Python application to data sources with an ODBC driver.
Firstly, we need to specify the server’s name, database’s name, username, password and the driver for creating a connection to our database.
Create a connection to our database:
Then you can execute any SQL query on this database, like an insert query below:
You can access to your database and check the result:
New data have just been inserted into our table.