Sunday, April 22, 2018

Easy Steps To Creating And Deploying A Predictive Model Using Azure Machine Learning Studio

Yesterday, I demonstrated how, in 10 minutes, you can create a predictive model, deploy it as a web service and test it without installing anything or paying any money. That is the awesomeness Microsoft has made possible with the super easy to use Azure Machine Learning Studio. Yesterday's event was the Lagos edition of the Global Azure Bootcamp, held at the Microsoft office in Lagos.

Participants were able to follow along and create and deploy their own predictive models too. In today's post, I will guide you through easy steps you can follow to create and deploy a predictive model, cost-free, in a few minutes with Azure Machine Learning Studio.

Step 1
Download the sample data we will use: the Bank Marketing data from the UCI Machine Learning Repository. If you download it from the UCI Machine Learning Repository directly, you want the bank-additional-full.csv file inside the zip file you end up with. You then need to split the data into separate columns, rather than leaving each record as a single delimited line, using Excel's Text to Columns. For your ease, I have shared a cleaned version you can use directly without any extra work: Bank Marketing data download
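If you would rather not use Excel, the same column split can be sketched in Python with the standard csv module. This is a minimal sketch that assumes the raw UCI file is semicolon-delimited with double-quoted text values; the function name is hypothetical.

```python
import csv
import io


def to_comma_separated(raw_text: str, source_delimiter: str = ";") -> str:
    """Rewrite delimited text as a standard comma-separated CSV.

    Each line is parsed with the given delimiter (the csv module strips the
    surrounding quotes), then written back out with commas so that every
    value lands in its own column when the file is opened or uploaded.
    """
    reader = csv.reader(io.StringIO(raw_text), delimiter=source_delimiter)
    output = io.StringIO()
    # lineterminator="\n" keeps the output identical on every platform
    writer = csv.writer(output, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return output.getvalue()


# Two illustrative lines in the assumed semicolon-delimited layout
sample = '"age";"job";"marital";"y"\n56;"housemaid";"married";"no"\n'
print(to_comma_separated(sample))
```

For the real file, you would read bank-additional-full.csv with `open(path, newline="")`, pass its contents through the function and write the result to a new .csv file before uploading it.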

The sample data is marketing campaign data from a Portuguese bank, covering May 2008 to November 2010. It records the details of prospects reached via phone calls and whether they eventually took up the service (a term deposit) the bank was trying to sell them.
The cleaned sample data
Below is the explanation of the different fields in the data records.

Input variables:
# bank client data:
1 - age (numeric)
2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
4 - education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
5 - default: has credit in default? (categorical: 'no','yes','unknown')
6 - housing: has housing loan? (categorical: 'no','yes','unknown')
7 - loan: has personal loan? (categorical: 'no','yes','unknown')
# related with the last contact of the current campaign:
8 - contact: contact communication type (categorical: 'cellular','telephone') 
9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
10 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
# other attributes:
12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client (numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')
# social and economic context attributes
16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric) 
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric) 
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)

Output variable (desired target):
21 - y - has the client subscribed a term deposit? (binary: 'yes','no')

Step 2
Sign up for Azure ML Studio. It is easy and free: https://studio.azureml.net



Step 3
Upload the Bank Marketing dataset. From the Datasets section in the left menu pane, click New at the bottom left.



Step 4
Create a new experiment. From the Experiments section in the left menu pane, click New at the bottom left. Choose a blank experiment, as we are creating ours from scratch.



Step 5
After renaming the experiment, we start dragging the tasks we want to carry out into the experiment workspace.


Drag in the dataset we uploaded; it is in the Saved Datasets section on the left.


Next, we need to isolate the fields that would be useful for our predictive model. If you look at the description of all the fields in the dataset, it is clear that some are not practically useful for predicting whether a prospect will take up the marketed service. An example is the length of the call (duration): there is no way you would know it until the call ends, so it is no use for deciding whom to call (targeted marketing). By my thinking, the fields that would be of real-world use in creating an actionable predictive model are age, job, marital, education, default, housing and loan.

Drag in Select Columns in Dataset from the Manipulation subsection of the Data Transformation section, and connect the dataset you dragged in earlier to it. Then click Launch column selector and select the columns needed, including the outcome we want to predict (y), so that the model can be trained.
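Outside Studio, this column selection amounts to keeping a fixed list of fields for every record. A minimal Python sketch, assuming each record is a dict keyed by the column names described in Step 1; the helper name and the example record are illustrative only.

```python
from typing import Dict, List, Sequence

# Columns chosen above: client attributes known before any call is made,
# plus the target y (needed only for training).
SELECTED_COLUMNS = ("age", "job", "marital", "education",
                    "default", "housing", "loan", "y")


def select_columns(records: List[Dict[str, str]],
                   columns: Sequence[str] = SELECTED_COLUMNS) -> List[Dict[str, str]]:
    """Return new records containing only `columns`, in that order."""
    return [{name: record[name] for name in columns} for record in records]


# An illustrative record; contact and duration are dropped by the selection
row = {"age": "56", "job": "housemaid", "marital": "married",
       "education": "basic.4y", "default": "no", "housing": "no",
       "loan": "no", "contact": "telephone", "duration": "261", "y": "no"}
result = select_columns([row])
print(result)
```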



Next, split the data into a training set and a testing set for building our predictive model. Drag in Split Data, connect it to Select Columns in Dataset and, in the properties pane on the right, set the fraction of rows in the first output (the training set) to 0.75, i.e. 75% of the entire dataset.
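What Split Data does with a 0.75 fraction can be sketched as a shuffle followed by a cut. This is a minimal Python sketch of a random 75/25 split, not Studio's own implementation (its rounding and options may differ); the function name is hypothetical, and the fixed seed is there only so the split is repeatable.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_data(rows: Sequence[T], fraction: float = 0.75,
               seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split rows into (training, testing) lists.

    `fraction` is the share of rows sent to the first (training) output.
    Using a dedicated Random(seed) makes the split repeatable between runs.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


train, test = split_data(list(range(100)))
print(len(train), len(test))  # 75 25
```

Every row ends up in exactly one of the two sets, which is what keeps the later evaluation honest: the model is scored only on rows it never saw during training.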



Drag in the algorithm to use. It is under Initialize Model in the Machine Learning section. I chose the Two-Class Decision Forest. At the end, you will evaluate the model to see whether it fits well or whether you should try another algorithm.



Drag in Train Model. Connect it to both the algorithm you dragged in and the left output of Split Data (the training set). Then, in the properties pane, select the outcome to predict (y).



Drag in Score Model. Connect it to Train Model and to the right output of Split Data (the testing set).


Lastly, drag in Evaluate Model and connect it to Score Model.


Now run the entire experiment.


Wait for it to finish running.


Right-click Evaluate Model and visualize the evaluation results to see how well the algorithm fits (for example, its accuracy).
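The evaluation results come down to comparing the scored labels with the actual outcomes on the testing set. A minimal Python sketch of that arithmetic, assuming "yes" is the positive class and a 0.5 probability threshold; the function name and the toy numbers are made up for illustration and are not results from the bank data.

```python
from typing import Dict, List


def evaluate(actual: List[str], probabilities: List[float],
             threshold: float = 0.5, positive: str = "yes",
             negative: str = "no") -> Dict[str, float]:
    """Turn scored probabilities into labels and compute common metrics."""
    predicted = [positive if p >= threshold else negative for p in probabilities]
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1  # correctly predicted a subscriber
        elif guess == positive:
            fp += 1  # predicted yes, actually no
        elif truth == positive:
            fn += 1  # predicted no, actually yes
        else:
            tn += 1  # correctly predicted a non-subscriber
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


actual = ["yes", "no", "no", "yes", "no", "no", "no", "no"]
probabilities = [0.9, 0.6, 0.2, 0.4, 0.1, 0.3, 0.05, 0.2]
print(evaluate(actual, probabilities))
# {'accuracy': 0.75, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

On these toy rows, predicting "no" for everyone would also reach 0.75 accuracy, which is why precision and recall are worth checking alongside accuracy when one outcome is much rarer than the other.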



If you are happy with the fit, all that is left is to publish. Otherwise, you can change the algorithm, re-run the experiment and re-evaluate the fit.



Step 6
Now set up the model as a web service that can be deployed online.



Change the web service input connector so that it points to Score Model.


Also, remove the predicted column (y) from Select Columns in Dataset, as it was only needed for training the model.



Now re-run the experiment and deploy it as a web service.



You are then presented with the web service details (such as the API key) to use when integrating it with any app or online tool. You can even test the API directly.
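As an example of integrating with an app, here is a minimal Python sketch that builds a call to the deployed request-response service using only the standard library. It assumes the JSON layout used by Studio's classic sample code ("Inputs" with an "input1" entry holding "ColumnNames" and "Values", plus "GlobalParameters") and a Bearer API key header; the URL, key and input name below are placeholders, so copy the real ones from your service's API help page. The request is built but not sent.

```python
import json
import urllib.request
from typing import Sequence

# Hypothetical placeholders: replace with the request URI and API key
# shown for your own web service.
SERVICE_URL = ("https://<region>.services.azureml.net/workspaces/<workspace-id>"
               "/services/<service-id>/execute?api-version=2.0&details=true")
API_KEY = "<your-api-key>"

# Input columns after removing the target y in Step 6.
INPUT_COLUMNS = ["age", "job", "marital", "education",
                 "default", "housing", "loan"]


def build_request(rows: Sequence[Sequence[str]], url: str = SERVICE_URL,
                  api_key: str = API_KEY) -> urllib.request.Request:
    """Build (but do not send) a scoring request for the web service."""
    payload = {
        "Inputs": {
            # "input1" is the input name assumed from the sample code;
            # check your service's help page in case it differs.
            "input1": {"ColumnNames": INPUT_COLUMNS,
                       "Values": [list(row) for row in rows]},
        },
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer " + api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_request([["56", "housemaid", "married", "basic.4y", "no", "no", "no"]])
# With a real URL and key, the call would be:
# with urllib.request.urlopen(req) as response:
#     print(json.loads(response.read()))
```

Keeping the column list in one place makes it easy to match the service's expected inputs if you later change the columns selected in the experiment.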






And that's how you create and deploy a predictive model in Azure Machine Learning Studio without installing anything on your computer and without paying a cent/kobo.

Enjoy!

Thursday, February 22, 2018

February 2018 Webinar: 45 mins Crash Course on Power BI



I have been teaching professionals from different companies and countries Power BI since 2015.

Many people still find it tough to grasp the value of using Power BI, or what business intelligence really is. Some even see it as one of those shiny new tools that line consultants' pockets with money but provide little tangible value for the company.

In this webinar, I will attempt to condense my two-day class into a 45-minute session, focusing on helping you understand the value of business intelligence and how Power BI creates value beyond what is possible with Microsoft Excel.

So join us, and you'll be on your way to grasping the practical use of business intelligence via Power BI.

Date: Wednesday, 28th February 2018
Time: 3:00pm -- 4:00pm (Nigerian time)
Venue: YouTube live (https://www.youtube.com/watch?v=mJNHmMf3nNw)

See you!

Tuesday, January 9, 2018

Understanding The 29 In-built Power BI Visuals And How To Access Additional (Custom) Visuals

Power BI visuals are the actual elements (tables, charts, filters/slicers, maps, etc.) that present the data in your report. By default, Power BI comes with 29 visuals.




They are:

1. Stacked Bar Chart: This lets you create a bar chart with the breakdowns (the field in Legend) stacked on top of each other. It can be used to show total sales broken down by product or region. It has five components: Axis (where you put the field that should get separate bars, like date), Legend (where you put the field to stack for each category on the axis, e.g. products or regions; anything you drag into Legend comes out in different colors), Value (where you put the field with the figures you want to plot), Color Saturation (represents the values of a field as light-to-dark color intensity on the plotted bars; you can’t use it together with Legend, and a likely use is to show the volume/quantity of products sold while the bar values present the sales amount) and Tooltips (shows extra details, like the price per unit of the product).




2. Stacked Column Chart: Technically the same as the Stacked Bar Chart; only the orientation is different, as its bars are vertical.




3. Clustered Bar Chart: The difference between this and the Stacked Bar Chart is that it plots the breakdowns (legend values) as independent bars rather than stacking them one on another.





4. Clustered Column Chart: The difference between this and the Stacked Column Chart is that it plots the breakdowns (legend values) as independent columns rather than stacking them one on another.




5. 100% Stacked Bar Chart: This expresses the legend values (breakdown) as percentages of the total value per axis item. It is useful for showing the relative contribution of different branches to total sales each day/month and, if you work with market research data, excellent for representing market share.




6. 100% Stacked Column Chart: Just as you would have guessed, it is the column version of the 100% Stacked Bar Chart.




7. Line Chart: It has all the components of the bar/column charts except Color Saturation. The line chart is meant to show trends (change over time), so you should always put a date or time field in Axis.




8. Area Chart: It is very much like the line chart, but with the area under the lines shaded. It has the same components as the line chart.






9. Stacked Area Chart: You already know the area chart; this is the version that stacks the legend entries one on another.







10. Line and Stacked Column Chart: This simply combines a line chart with a column chart in the same visual. It is what we call a Combo Chart in Excel, and it can be useful for showing two distinct insights in one visual, like the gross margin as a line and the revenue as columns over a period of time.




11. Line and Clustered Column Chart: Again, just like the Line and Stacked Column Chart, except that the columns aren’t stacked.




12. Ribbon Chart: This chart is a lot like the area chart, with the added advantage that it makes it easier to see how the values and ranking of the entries in the legend change.





13. Waterfall Chart: This chart shows the movement in a metric over a period of time, emphasizing the initial value and the end value. It is a beloved chart of finance analysts and is often used to present changes in a company’s cash flow, from opening to closing position, over a financial reporting period.





14. Scatter Chart: This chart shows the relationship between two variables, which is why it requires you to put one field in X Axis and another in Y Axis. It can also serve as a bubble chart: just drag the field that should determine the bubble size into Size.










15. Pie Chart: This chart shows the relative contribution of the entries of the field in Legend to the field in Values. You can also put a field with additional useful information in Details.





16. Donut Chart: It is exactly a pie chart, but with the traditional donut hole.





17. Treemap: This chart shows relative contribution, but unlike the pie chart, which fits everything into a big circle, this one fits everything into a resizable rectangle.



18. Map: The name is self-explanatory. Normally, you would drag countries/cities or any other location field into Location, but if the locations are not well-known places, you might need to get their GPS coordinates and put them in Latitude and Longitude.



19. Filled Map: It is like the Map, but it fills the entire location area on the map, taking the shape of the country/state/city.





20. Funnel: This chart is best for stage-like fields and values, and it is popular for sales conversion records. You can put the sales/conversion stages in Group.




21. Gauge: This chart is great for showing a value against its target on a gauge-like scale, and you can set the dimensions (the minimum and maximum of the scale). It is one of the few visuals you can set an alert on in a published Power BI dashboard.






22. Card: It displays just one value. It can be very useful for showing total sales, a KPI figure or counts (like the number of stores or orders). It is also one of the visuals you can set an alert on in a published Power BI dashboard.




23. Multi-row Card: It is like the Card, with the extra ability to display the values of more than one field. An example is sales by branch.







24. KPI: It shows the variance between a value and its set target. Very useful for key performance indicators (KPIs).





25. Slicers: I often call them nicer filters. They work exactly like a filter.



26. Table: It is an intuitive table that intelligently aggregates the fields you put in it.





27. Matrix: It is an exact replica of Excel's PivotTable. We have already used it in the last sample project.




28. R Script Visual: This allows you to run R scripts in Power BI. It might be of particular interest to people already proficient in data analysis using R.





29. ArcGIS Maps for Power BI: This is very much like Map, but with some peculiar features you might find very useful.



Lastly, Power BI allows you to access more visuals to use in your reports via Custom Visuals on the Home menu.
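The arithmetic behind a few of the visuals above is simple enough to sketch. The Python below is an illustration only, not how Power BI computes anything internally: it shows the per-axis-item percentages a 100% stacked chart plots, the running totals a waterfall chart steps through from the initial value to the end value, and the value-versus-target variance a KPI visual shows. The function names and all figures are made up.

```python
from typing import Dict, List, Tuple


def percent_of_total(breakdown: Dict[str, float]) -> Dict[str, float]:
    """Share (%) of each legend entry in one axis item's total (100% stacked)."""
    total = sum(breakdown.values())
    return {name: 100 * value / total for name, value in breakdown.items()}


def waterfall(opening: float,
              changes: List[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Return (label, start, end) for each step, beginning at the opening value."""
    steps = []
    running = opening
    for label, change in changes:
        steps.append((label, running, running + change))
        running += change
    return steps


def kpi_variance(value: float, target: float) -> float:
    """Difference between the actual value and its target."""
    return value - target


# One month's sales by branch (made-up numbers)
print(percent_of_total({"Lagos": 60, "Abuja": 30, "Kano": 10}))
# Opening cash of 100, then an inflow and two outflows over the period
steps = waterfall(100, [("Sales", 50), ("Salaries", -30), ("Rent", -20)])
print(steps[-1][2])  # closing value: 100
print(kpi_variance(95, 100))  # -5
```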