Recently, Machine Learning has become a buzzword that appears in many tech-related articles and magazines. It is also one of the hottest areas for startups. But what is Machine Learning, and why is it such a big deal?
Let’s first enjoy a video showing how Machine Learning helps to improve our lives.
The elevator shown in the video is a ThyssenKrupp Elevator, a smart elevator that can alert technicians when it needs servicing. With the help of Microsoft Azure Machine Learning, data collected by sensors installed on different parts of the elevator is sent to the cloud, where Azure Machine Learning turns it into useful information that helps technicians maintain the elevator.
This elevator is cool, isn’t it? If you would like to know more, you can read this article about how ThyssenKrupp Elevator uses Predictive Maintenance.
Last month, we were proud to have Doli, a Big Data engineer at iProperty Group in Malaysia, give us a good introduction to Azure Machine Learning at the Azure Community Singapore monthly meetup. He used the same video during the meetup as an example of how Machine Learning can improve our lives.
So… What is Machine Learning?
Just like ThyssenKrupp Elevator, can we make our computers learn and behave more intelligently based on data?
For example, can we use flight and weather data to predict which scheduled flights are going to be delayed? Or, from the products customers choose on our e-commerce website, can we work out which products should be recommended to which groups of customers? Machine Learning makes this possible: it takes historical data and makes predictions about future trends.
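To make "learning from historical data" concrete, here is a deliberately tiny Python sketch. It assumes hypothetical historical flight records, each tagged with a weather condition and whether the flight was delayed. It "learns" the delay rate for each weather condition and predicts a delay when that rate is above 0.5. This is only an illustration of the idea, not how Azure Machine Learning works internally.

```python
from collections import defaultdict

# Hypothetical historical records: (weather condition, was the flight delayed?)
HISTORY = [
    ("clear", False), ("clear", False), ("clear", True),
    ("storm", True), ("storm", True), ("storm", False),
    ("fog", True), ("fog", False),
]


def learn_delay_rates(history):
    """For each weather condition, compute the fraction of flights that were delayed."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for weather, was_delayed in history:
        totals[weather] += 1
        if was_delayed:
            delayed[weather] += 1
    return {weather: delayed[weather] / totals[weather] for weather in totals}


def predict_delay(rates, weather, threshold=0.5):
    """Predict a delay when the learned delay rate for this weather exceeds the threshold.

    Weather conditions never seen in the history default to a rate of 0.0 (no delay).
    """
    return rates.get(weather, 0.0) > threshold


rates = learn_delay_rates(HISTORY)
print(predict_delay(rates, "storm"))  # True  (2 of 3 storm flights were delayed)
print(predict_delay(rates, "clear"))  # False (1 of 3 clear flights was delayed)
```

Real models use many more inputs than a single weather label, but the pattern is the same: summarise the past, then apply that summary to new cases.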
Making machine learning easy to use is a challenge. It requires data scientists to build learning systems and training sets, and machine learning also consumes significant computing resources. However, with the friendly tools available in Azure Machine Learning, developers can now easily import data, pick suitable algorithms, and then publish the predictive model as a web service that can be used to make predictions on new data.
Supervised vs. Unsupervised Learning
Doli’s talk highlighted two types of Machine Learning tasks: supervised and unsupervised learning.
In supervised learning, the system learns by example from training data that is accompanied by labels (the known answers), and then uses what it has learned to make predictions about new data. The application how-old.net, which went viral recently, is a good example of supervised learning. There is an interesting discussion on Quora about how how-old.net works, in which you can also read the views of Eason Wang, a Senior Program Manager at Microsoft Bing who worked on the project.
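Supervised learning can be sketched with one of the simplest algorithms, k-nearest neighbours (k-NN). This is a hedged illustration, not what how-old.net actually uses: the labelled points below are made up, each is a pair of numeric features plus a label, and a new point receives the majority label of its k closest training examples.

```python
import math
from collections import Counter

# Hypothetical labelled training data: (feature vector, label).
TRAINING_SET = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((2.0, 1.5), "small"),
    ((8.0, 8.0), "large"),
    ((8.5, 9.0), "large"),
    ((9.0, 8.5), "large"),
]


def knn_predict(training_set, point, k=3):
    """Return the majority label among the k training examples closest to `point`."""
    # Sort training examples by Euclidean distance to the new point (math.dist needs Python 3.8+).
    neighbours = sorted(training_set, key=lambda example: math.dist(example[0], point))
    top_labels = [label for _, label in neighbours[:k]]
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(top_labels).most_common(1)[0][0]


print(knn_predict(TRAINING_SET, (1.2, 1.4)))  # small
print(knn_predict(TRAINING_SET, (7.9, 8.8)))  # large
```

The labels are what make this "supervised": without them, the algorithm would have nothing to vote with.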
Unlike supervised learning, unsupervised learning tries to find structure in unlabeled data. For example, one unsupervised learning technique is called Clustering. It divides data into groups based on similarity, such that data in the same group are as similar as possible and data in different groups are as different as possible.
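Clustering can be illustrated with k-means, one common clustering algorithm (the talk as described here does not say which one was demonstrated). A minimal sketch, assuming 2-D points, a fixed number of clusters k, and the first k points as starting centroids so that the result is deterministic. Note that no labels are involved.

```python
import math


def k_means(points, k, iterations=10):
    """Group 2-D points into k clusters; returns (centroids, clusters)."""
    # Simple deterministic initialisation: use the first k points as starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled data: two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (2.0, 1.0), (9.5, 8.0)]
centroids, clusters = k_means(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

Production implementations use smarter initialisation (for example, several random restarts) because a poor starting choice can lead to poor clusters.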
Anyone who has tried out how-old.net will have noticed that its guesses are not always accurate. Doli stressed several times in his talk that Machine Learning predictions are never about perfect accuracy.
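One common way to put a number on "not perfectly accurate" for an age predictor like how-old.net is the mean absolute error (MAE): the average size of the gap between predicted and true values. A minimal sketch; the ages below are made-up numbers for illustration, not measured how-old.net results.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between true and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical true ages and model guesses.
true_ages = [25, 31, 47, 62]
guessed_ages = [28, 30, 40, 65]
print(mean_absolute_error(true_ages, guessed_ages))  # 3.5
```

An MAE of 3.5 would mean the guesses are off by three and a half years on average. Useful, but not perfect, which is exactly the point.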
Azure Machine Learning
With Azure Machine Learning, we are now able to perform cloud-based predictive analysis.
Azure Machine Learning is a service that developers can use to build predictive analytics models from training datasets. These models can then be deployed as web services and consumed from C#, Python, and R. The process can be summarized as follows.
- Data Collection: Understanding the problem and collecting data
- Train: Training the model
- Analyze: Validating and tuning the model
- Deploy: Exposing the model to be consumed
Doli walked us through these steps in detail during the meetup. Here, I will just use the tutorial available in Azure Machine Learning Studio to guide you through the key steps in the process.
Collecting data is part of the Experiment stage in Machine Learning. In case some of you are wondering where to get large datasets, Doli shared a link to a Quora discussion about where to find publicly accessible large datasets.
There are also quite a number of sample datasets available in Azure Machine Learning Studio. Alternatively, you can use the Reader module to connect to a Microsoft SQL Server database and get your own data.
After getting the data, we need to pre-process it, i.e. clean it up. For example, we may need to remove rows that have missing values.
In addition, we choose the columns of the dataset (known as features in machine learning) that will help with the prediction. It usually takes a few rounds of experiments to find a good set of features for a predictive model.
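Outside of Studio, the same two pre-processing steps (dropping rows with missing values, then keeping only the useful columns) can be sketched in plain Python. The CSV content and the column names (`flight_id`, `carrier`, `dep_hour`, `weather`, `delayed`) are hypothetical, and an empty field is treated as a missing value. The column selection drops `flight_id`, which identifies a flight but would not help predict a delay.

```python
import csv
import io

# Hypothetical raw data; empty fields count as missing values.
RAW_CSV = """flight_id,carrier,dep_hour,weather,delayed
1,AA,8,clear,0
2,BB,,storm,1
3,AA,17,storm,1
4,CC,12,,0
5,BB,21,fog,1
"""


def drop_rows_with_missing(rows):
    """Keep only rows in which every field has a non-empty value (None or blank counts as missing)."""
    return [row for row in rows if all((value or "").strip() for value in row.values())]


def select_columns(rows, columns):
    """Project each row onto the chosen feature (and label) columns."""
    return [{name: row[name] for name in columns} for row in rows]


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
clean = drop_rows_with_missing(rows)  # rows 2 and 4 are removed
features = select_columns(clean, ["carrier", "dep_hour", "weather", "delayed"])
for row in features:
    print(row)
```

Dropping rows is the simplest cleaning strategy; when data is scarce, filling in missing values (for example, with a column average) is a common alternative.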
Train and Analyze
Machine Learning is about learning from a dataset and applying what was learned to new data. Hence, to evaluate an algorithm, the collected data is split into two sets: the Training Set, which is used to train the model, and the Testing Set, which is used to check the model’s predictions on data it has not seen.
Doli shared that the more data we use to train the model, the better the results we can expect. However, opinions differ. For example, there is an interesting online discussion about the optimal ratio between the Training Set and the Testing Set: some suggest 3:2, some 1:1, and some 3:1. I left it at 1:1, as in the Machine Learning Studio tutorial.
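A random split like this can be sketched as follows. The function and its `train_fraction` parameter are this sketch's own (hypothetical) names: 0.5 gives the 1:1 split from the tutorial, 0.6 gives 3:2, and 0.75 gives 3:1. Shuffling with a fixed seed keeps the split reproducible from run to run.

```python
import random


def train_test_split(rows, train_fraction=0.5, seed=42):
    """Shuffle a copy of the rows and cut it into (training_set, testing_set)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


data = list(range(20))  # stand-in for 20 rows of a dataset
for fraction in (0.5, 0.6, 0.75):  # 1:1, 3:2 and 3:1
    train, test = train_test_split(data, train_fraction=fraction)
    print(fraction, len(train), len(test))
```

Shuffling before cutting matters: if the data is sorted (say, by date), taking the first half without shuffling would train and test on systematically different rows.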
Azure Machine Learning Studio offers many learning algorithms to choose from for your model. After choosing an algorithm, we just need to click the “Run” button on the command bar to train the model and make predictions on the test dataset. Once the run finishes, we can view the prediction results.
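Prediction results on the testing set are usually summarised with metrics. Here is a minimal sketch of two common ones for a yes/no prediction: accuracy and the four confusion-matrix counts. The label lists are hypothetical, and Studio reports its own set of evaluation metrics, so treat this only as an illustration of what such numbers mean.

```python
def evaluate(actual, predicted):
    """Return (accuracy, confusion-matrix counts) for boolean labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    counts = {"true_positive": 0, "false_positive": 0, "true_negative": 0, "false_negative": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["true_positive"] += 1
        elif guess and not truth:
            counts["false_positive"] += 1
        elif not guess and not truth:
            counts["true_negative"] += 1
        else:
            counts["false_negative"] += 1
    accuracy = (counts["true_positive"] + counts["true_negative"]) / len(actual)
    return accuracy, counts


# Hypothetical test-set labels (was the flight delayed?) and model predictions.
actual = [True, False, True, True, False, False, True, False]
predicted = [True, False, False, True, False, True, True, False]
accuracy, counts = evaluate(actual, predicted)
print(accuracy)  # 0.75
print(counts)
```

The confusion counts show *which* mistakes the model makes: here one delay was missed (false negative) and one on-time flight was flagged (false positive), something a single accuracy number hides.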
If you are not satisfied with the results, you can improve the model by changing the features, the properties of the algorithm, or even the algorithm itself.
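Changing the properties of an algorithm is usually called hyperparameter tuning. A minimal sketch, assuming a k-nearest-neighbours model whose only property is k: each candidate k is scored on a held-out validation set, and the best-scoring k is kept. The data is made up and includes one deliberately noisy training point, so that k = 1 does worse than larger k. The block is self-contained, so it defines its own small k-NN predictor.

```python
import math
from collections import Counter


def knn_predict(training_set, point, k):
    """Majority label among the k nearest training examples (Euclidean distance)."""
    neighbours = sorted(training_set, key=lambda ex: math.dist(ex[0], point))
    return Counter(label for _, label in neighbours[:k]).most_common(1)[0][0]


def accuracy_for_k(training_set, validation_set, k):
    """Fraction of validation examples the model labels correctly for a given k."""
    correct = sum(knn_predict(training_set, x, k) == y for x, y in validation_set)
    return correct / len(validation_set)


def pick_best_k(training_set, validation_set, candidates=(1, 3, 5)):
    """Try each candidate k and keep the one with the highest validation accuracy."""
    scores = {k: accuracy_for_k(training_set, validation_set, k) for k in candidates}
    best = max(scores, key=scores.get)  # ties go to the first k listed
    return best, scores


# Hypothetical data: two groups, plus one noisy "b" point inside the "a" group.
TRAINING = [
    ((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((0.0, 1.0), "a"),
    ((0.6, 0.6), "b"),
    ((5.0, 5.0), "b"), ((6.0, 5.0), "b"), ((5.0, 6.0), "b"),
]
VALIDATION = [((0.7, 0.7), "a"), ((0.2, 0.1), "a"), ((5.5, 5.5), "b")]

best, scores = pick_best_k(TRAINING, VALIDATION)
print(best, scores)
```

Tuning on a separate validation set, rather than on the testing set, keeps the final test score an honest estimate of how the model will do on new data.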
Finally, we can publish the model as a web service so that we can use it directly on new data in the future. Alternatively, you can download an Excel workbook from Machine Learning Studio that contains a macro to compute the predicted values.
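Once published, the web service can be called over HTTP from any language. The sketch below only *builds* a request with Python’s standard library and does not send it. The endpoint URL, API key, and column names are hypothetical placeholders. The JSON layout (`Inputs`, then `input1` with `ColumnNames` and `Values`, plus an empty `GlobalParameters`) and the `Bearer` authorization header are assumptions modelled on the sample code Studio generated for request/response services, so check the API help page of your own web service for the exact format.

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real values from your web service's dashboard.
ENDPOINT_URL = "https://example.invalid/score"
API_KEY = "YOUR_API_KEY"


def build_scoring_request(column_names, rows):
    """Build (but do not send) a JSON scoring request for a published model.

    The body layout is an assumption based on Studio's generated sample code.
    """
    body = {
        "Inputs": {
            "input1": {
                "ColumnNames": column_names,
                "Values": rows,
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )


request = build_scoring_request(["carrier", "dep_hour", "weather"], [["AA", "8", "clear"]])
print(request.get_method(), request.full_url)
# To actually call the service (requires a real URL and key):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending a call against a real endpoint.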
Join Our Meetup!
If you would like to find out more about Azure Machine Learning, here are a few resources I would like to share with you.
- A detailed step-by-step guide in the Microsoft Azure documentation on how to create an experiment in Machine Learning Studio;
- Free e-book from Microsoft about Azure Machine Learning;
- Microsoft Project Oxford, which tells you more about the Face APIs, Speech APIs, Computer Vision APIs, and other cool APIs that you can use.
Please correct me if you spot any mistakes in my post, because I am still very, very new to Machine Learning. And if you would like to know more about Azure, please join our Azure Community meetup too. Hope to see you there! =)