Tuesday, April 18, 2017

Webinar: Excel vs Python vs R for Data Analysis. Making the right pick.



There is a lot of textbook arguments on Excel vs Python vs R. In fact, a few weeks back I used to be part of that argument. I took sides with Excel. And I had my solid reasons.

But now I don't take sides anymore. I focus on a bigger picture, a picture I became aware of only after I transitioned for reading textbooks and doing video tutorials on Python and R to carrying out live interesting projects with them.

In this webinar I will be sharing my experience on using them all and what practical insight you too need to have to move from those textbook arguments to seeing the big picture. And most importantly, I will be showing you that big picture.

You shouldn't miss this webinar.

Time: 3:00pm to 4:00pm
Date: Tuesday, 21 March 2017
Venue: YouTube Live

You should set it up in your calendar with reminders so you won't miss this special edition.

See you!

Note: The webinar already took place but you can view the YouTube recorded session at https://www.youtube.com/watch?v=YLoOynh8iNQ 

Tuesday, April 11, 2017

Real-time Interactive Nigerian Stocks Analysis Portal

You asked and I have delivered. And I want to say a big thanks to everyone who kept asking and nudging me do to this -- I am happy to say its done.

Everyday, it automatically updates with the close of trading day price (for now set to 12 midnight, will adjust to late afternoon). It also shows the historical price from as far back as year 2002. That is something you won't get on any other platform for free. And the best part is that it is very interactive, I built it more like I do dashboards for very high paying clients.

It also allows you to compare the important financial statement lines of the stocks you are interested in: from Revenue, Net Profit, Return on Equity, Return on Assets, Total Assets, Gross Margin, Cashflow from Operations, Cashflow from Financing, Cashflow from Investing, Net Cash Flow, Current Assets, Current Liabilities to Total Liabilities.
And all very visual.
It has a scroll bar that shows you the latest closing price of all the stocks.
At the backend, I have a program that populates an Azure SQL database table with the closing prices for the stocks. I already have a stockpile of the annual reports of the companies from as far back as 2001 (you can check them out and download for free at https://www.slideshare.net/olafusimichael/presentations). Each year, I update the financial statement metrics from the Income Statement, Balance Sheet and Cash Flow.

Important Notice

Due to the hard work involved, I only cover the most active stocks on the Nigerian Stock Exchange. If there is any you would like to see that isn't there, do let me know and I may include it. Also there is an error in the Total Assets and Total Equities line for a couple of the bank stocks, I am working on making the correction. Other than that, all is working great (hopefully) and I use it for my own stock analysis. And that is the big edge, it is like I am sharing my quantitative stock analysis model with you all for free and in a very beautiful (to see and use) dashboard.
There are no 2016 financial metrics in yet. I am still gathering all the 2016 annual reports. Some companies have made theirs available online in the last two weeks, some are yet to. Maybe I won't wait till I have them all before I start the financial metrics extraction. I have already gotten for GTBank, First Bank, Total Nigeria, Access Bank, Dangote Cement, Cadbury, PZ and a few other companies.
The whole set-up (programming, hosting, virtual machine, SQL server etc) costs me money and I often pay freelance finance analysts to help share the work of extracting the data from the PDF annual reports into our standardized model. One day, I will put a door that opens only to paid subscription in front of it. So enjoy while its free!

Tuesday, April 4, 2017

Introduction to R and Python

R is, arguably, the world's most popular programming language for data analysis. It is an adaptation of S language developed by Bells Laboratories strictly for statistics and data analysis. S language became very popular and R grew out of it as the open-source implementation of S language in August 1993.

The popularity of R is due to its open-source nature and the huge community contributing packages to it that handle all types of common data analysis work. You can read more about R at http://www.r-project.org/ and a more detailed history of R at http://www.r-project.org/about.html

Python, on the other hand, is a general purpose programming language. It is used for all types of programming from building websites (web applications) to computer programs and data analysis. It was created by Guido van Rossum in 1991. It is also an open-source language, benefiting hugely from a large community actively contributing to it. You can read more about Python at https://www.python.org/about/ 

image: medium.com

R and Python are the two most popular languages for data analysis. And anyone serious in becoming a data scientist must be proficient in at least one of the two. I recommend that you have average knowledge in both and then become an expert in one.

Both languages have libraries. More often called packages in R. They are already built algorithms that help you achieve specific tasks. More like Excel formulas, though way more robust in nature. You load them into your R or Python work space and can access the functions they provide.

A common need you will encounter is creating graphs/charts. In R, the most commonly used package for that is ggplot2 and in Python you would use Matplotlib.

To start using R and Python, you will need to install them. 

For R, you can download R at https://cran.r-project.org/ and it is recommended to install RStudio to make using R enjoyable. RStudio is an IDE (integrated development environment) and can be installed at https://www.rstudio.com/  With those two installations, you are set to begin analysing data with R. And they work whether you have a Windows PC, or Mac or Linux.

For Python, it is recommended that you download Anaconda at https://www.continuum.io/downloads. It is regarded as the best distribution of Python for data analysis work. And as for an IDE to use, there is no obvious best as RStudio is for R. Some people are a die-hard fan of Jupyter Notebook (formerly IPython Notebook), luckily it comes pre-installed with Anaconda. Others love Spyder (again, comes pre-installed with Anaconda). And there is PyCharm, you will have to install that at https://www.jetbrains.com/pycharm/. For this training series, I will be using Rodeo downloadable at https://www.yhat.com/products/rodeo, it is an RStudio lookalike. That way you won't have to stress yourself too much in getting used to the IDEs. Once you become familiar with RStudio, you will become more comfortable with Rodeo and vice-versa.

In the next sections we will dig deeper into carrying out simple data analysis tasks in both R and Python.

Tuesday, March 28, 2017

Sentiment Analysis of Today's Tweets About President Buhari Using Python's Vader and Sentiwordnet


After a long weekend crisscrossing Lagos, Kaduna and Abuja to deliver our quarterly Business Data Analysis and In-depth Excel Training in Abuja, I rested today and decided to do an interesting data analysis task. 

Do a sentiment analysis on today's tweets about the President, Buhari.

It was a very interesting task. Made me get familiar with Python's Tweepy library. I got some help from Marco Bonzanini. You should check out his Mastering Social Media Mining with Python book.

Here is the breakdown of the steps in achieving the goal:

  1. Created the needed credentials to access Twitter API (it's at https://apps.twitter.com/app/new)
  2. Use Tweepy to search in realtime for tweets about Buhari
  3. Save the tweets as a JSON file
  4. Handle emoticons and strings peculiar to Twitter (like @, # and so on)
  5. Exclude stop words (words with no significant, for sentiment analysis, meaning) like to, be, is etc
  6. Apply Vader to each tweet text and calculate the entire tweet stream sentiment
  7. Apply Sentiwordnet, an alternative to Vader, to do the same stream sentiment analysis
Below are the screenshots of some of the steps.

Twitter developer setup

Twitter developer setup
Vader Sentiment Analysis results

Sentiwordnet sentiment analysis result
And the results?

Well let's just say if the tweets are a mirror of what people genuinely feel about President Buhari, he should be terribly worried. Most people were very emotional in their tweets about him and overwhelmingly negative.


Vader classified the tweet stream as 54.67% negative vs 17.76% positive and 27.57% neutral.


Sentiwordnet gave a more damning result: 92.06% negative vs 7.48% positive and 0.47% neutral.



Both Vader and Sentiwordnet are rule-based supervised classification algorithms. In a context like Nigeria where we have our own Naija words and peculiar word mix, machine learning algorithm, supervised or non-supervised, might be more reliable.

Monday, February 27, 2017

Tutorial On Creating A Frequency Distribution Chart With Microsoft Excel, R and Python

I got to join this amazing community of Data Scientists in Nigeria. We are a mix of experts and beginners. Today, I created a tutorial for the beginners to see how to do a common task like frequency distribution plot in both Python and R, also decided to include my dearest Microsoft Excel as a control.

The sample data is a fictionalized data for Dominos Pizza Nigeria. One day sales data for their Lekki branch. You can download the practice along raw data file here: https://dl.dropboxusercontent.com/u/28140414/Dominos%20Pizza.csv 


So the business question we want to tackle is: Is there a pattern in the quantities each customer buys? To be more specific, we want to examine the frequency distribution of the quantities purchased per sales transaction.

In Excel, it is extremely straightforward. Just plot a histogram on the quantity field.


Now let's head to doing same with R

I use R 3.3.2 and RStudio.

First, I import the csv file into RStudio.




Though not necessary for what we want to do, but I like doing it for any data I bring into R, I run the summary command on the dataframe/table. > summary(Dominos_Pizza)


Again, not a required step. I check out the standard plot graph on the Quantity field. > plot(Dominos_Pizza$Quantity)


Finally, I do the histogram chart on the Quantity field. > hist(Dominos_Pizza$Quantity)



For now I don't bother customizing the graph elements (labels, color, title, etc.)

It is Python time.

I use Rodeo IDE and Anaconda. 


I import Pandas and use it to read in the csv file. 




And here is the plot graph, like we did in R.


Finally, I create the histogram.



I will try to follow up with more tutorials of complex tasks, and some that are best suited to R and others that are best suited to Python. As per Excel, it is in a completely different class. It is a spreadsheet application. 

Got any particular task you will like me to create a tutorial around? Ask away!

Wednesday, February 1, 2017

Setting Up Power BI Embedded on Microsoft Dynamics NAV 2017

We got a request from a big company to show them how to integrate Power BI with Microsoft Dynamics NAV 2017. 

In other to prepare demo content for the training, I had to install Microsoft Dynamics NAV 2017 and set it up with Power BI. NAV 2017 can integrate with Power BI via four doors -- Power BI Embedded, Power BI service NAV content pack, Power BI Desktop OData and SQL connection to the NAV database. The Power BI Embedded one is more of you bringing in your Power BI dashboard (any one, not necessarily one built from NAV datasets) into NAV reports.

I made sure to take a lot of screenshots along the way. I know this won't be interesting to a lot of us, but for the few who some day will stumble on this when they are stuck getting to make all the frustratingly many moving parts work in sync, this can be a time/life saver.

First, what is Power BI Embedded?

It is when you make your Power BI reports show in another application. Like I did to make my stock analysis dashboard show up on my Nigerian Elite web app. So I embedded it there. And the new NAV 2017 also comes with that ability for you to embed your Power BI dashboards in its reports page.



So below are the visual steps to setting Power BI embedded up in NAV 2017.

In the beginning, there was no Power BI Reports, just a link to setting it up.



Then I followed the setup instructions.



At this stage, I had to go to Microsoft Azure on the account that is the O365 administrator and has Power BI pro enabled. Under Azure Active Directory, I created a new App Registration.






Copied the sign-on URL from the NAV reply URL (in the setup dialog box). Ignore the 8080 showing up in mine, it was the result of following a wrong instruction online, I had to verbatim copy the Reply URL value here before I could get it to work.



Got the App Id and Key.




Set the Permissions and add Power BI service (enabled all the delegate permissions for Power BI service).





Continued the setup on NAV 2017.




Now it was half time.  I was half done.

Then continued.

I got a new dialogbox asking me to authorize azure services to connect to Power BI services.




On clicking on the link, I had to login again.





All set! I can now embed any of my many Power BI reports.

:)

Wednesday, January 25, 2017

The Seven New Charts Exclusive To Excel 2016

In August 2016 we wrote about the new chart types in Excel 2016

Since then, the list has grown slightly. Now we have seven new chart types in Excel 2016. 

The latest entry is Filled Maps. It allows you do a map of your location/state/branch/region/city/country data with a graduated fill to show the values (of any metric) across those locations.



The other ones are:
 Treemap

Sunburst

Histogram

Box and Whisker

Waterfall


Funnel

For extra commentary on the charts other than the filled map, you should read: http://blog.urbizedge.com/2016/08/new-chart-types-in-excel-treemap.html