Wednesday, June 21, 2017

Scrapping Tweets About Buhari, Osinbajo and Trends: How To Schedule R Scripts To Run Daily Using Windows Task Scheduler

Today (5th June, 2017), I finally got around to doing the R script that will daily mine social media data off Twitter about conversations around President Buhari, Acting President Osinbajo and whatever is trending in Nigeria twittosphere. Will keep it running to do fun or even commercial projects in and beyond 2019 (election year).

The interesting challenge there was to schedule this to run daily, saving the tweets in a CSV file incrementally.

I found two ways online -- the familiar Windows Task Scheduler way and using taskscheduleR

I already use Windows Task Scheduler for a couple of Python scripts and familiar with the setup, so I decided to go along with it.

It is straightforward to use. I have broken the entire process into two steps:

Step 1: Create a Batch file to run the R script
Create an empty text file and save it as .bat file. Then get the executable path of your R installation. Mine is C:\Program Files\Microsoft\MRO-3.3.1\bin\R.exe

Also get the path of your R script. 

Now set up the .bat file like I did mine below

@echo off
"C:\Program Files\Microsoft\MRO-3.3.1\bin\R.exe" CMD BATCH "C:\Users\Michael Olafusi\Documents\TwitterScrapper.R"

Step 2: Create the Scheduler Task
Launch the Task Scheduler (just search for it if you use Windows 10, 8 and 7).

In the Task Scheduler window, at the right pane, click on Create Task.

And follow my screenshot guide below.

And that is it!

And that's how I scheduled the R script to daily mine tweets about Buhari, Osinbajo, Biafra and what's trending in Nigeria.

I intend to do a trend sentimental analysis on them going into the Election days in 2019.

Wednesday, June 14, 2017

How To Select, Delete or Replace All Blank Records In Excel

First, I have got to thank my fellow Excel consultant, Olukunle Babajide, for opening my eyes to this interesting use of Go To Special. Seeing him do it was a lightbulb moment for me and I have used it to solve many issues that my usual method would have taken me many extra steps to fix.

What do you do when you have blank rows in a table of thousands of rows? That is what I usually have when I copy daily stock price or Crude Oil price from the Bloomberg Terminal into Excel, the weekend days come up as blank rows. Or in your case, it could be a malformed table someone sent you and you need to remove those blank rows.

Well, today is your lucky day! I will be sharing with you a neat and quick way to get it done. Below is a snapshot of the table I will be using for illustration and as you can see, it has a few blank rows.

To clean off (delete those rows), I select the entire messed up table and navigate to Find & SelectGo To Special.

In the Go To Special dialog box that pops up, select Blanks.

And you'll see all the Blanks selected. Now you can use Delete Cells or the keyboard shortcut (CTRL + -) to wipe off those rows by indicating Shift cells up.

And voila! It is done.

And rather than deleting those blank records, you can also replace them with any values of your choice all at once, using CTRL + Enter.

Saturday, June 10, 2017

Comprehensive Introduction To Power BI And The Value Proposition

Power BI is Microsoft's self-service Business Intelligence solution and is the most popular self-service BI solution in Nigeria. Many companies in Nigeria are beginning to realise the huge importance of having flexible and robust BI reports and a lot of them are turning to Power BI due to the local presence of Microsoft and the low entry fee of Power BI compared to the other self-service BI solutions.

As a business professional, what do you need to know about Power BI and its potential value to your organization? Well, that is what we are going to discuss in the next few paragraphs.

Before the advent of self-service BI solutions, companies used to have a data warehouse or IT team who manage the company-wide data using Microsoft SQL servers, Oracle database servers and MySQL database servers (there are others but these are the most popular ones). These tech-savvy team model the company data and provide portals for the other teams/departments in the company to download data they need for their reports. Sometimes, they create reports too for the business managers to consume but these reports are often static and not robust enough for day-to-day business needs. Hence, the need for a second group of people -- analysts. They could be sales analysts, operations analysts, customer service analysts, marketing analysts or data analysts. This set of people create the highly important day-to-day reports that management use to keep the business running smoothly and for strategic planning. These reports are mostly made with Microsoft Excel because of its ease of use and managers' preference for it. And again, before now, this was good enough.

However, since the advent of self-service BI, Excel is now no longer good enough for reporting all the business operations. Excel is still extremely important and useful but not great at providing real-time robust dashboards/reports that can be consumed on the go by managers (without having to lug around a laptop and Excel). Managers are increasingly looking for reporting tools that provide them real-time access to their business data reports and accessible from their iPads, tablets and smartphones even in the middle of the night or while on a vacation in a remote island/village. And that is the very thing with many other valuable features that self-service BI solutions like Power BI provide.

With Excel, you have to create the reports repeatedly and email them out. I used to work as a business analyst and MIS analyst for Comviva on an Airtel Africa CRBT project. I often say I worked for Airtel Africa as Comviva was a sister company to Airtel, both were owned by the Bharti Group. I used to create 11 daily reports -- one for each of the 10 countries we were operating in in Africa and one pan-Africa report to the Airtel HQ in Kenya. Then every Friday, I create a weekly report and presentation. Been very good in Excel, I created a template to automate the reports and reduce my daily repetitive tasks. Yet two things could not be automated away and gave the management concern -- they only get the reports when I email them out and they always have to view it on their laptops. How were these a problem? On Saturday and Sunday, they are blind. They don't see what has happened all throughout the weekend till I send out reports on Monday. Then, whenever they are in a meeting or on a flight, they have to wait till they can open their PC, download the reports and interact with it before they can fully know what's in it.

Now, with Power BI, those problems are no more. I can design the reports once and set it to update automatically each day, so I don't have to daily recreate the reports. Then the managers can view the reports any day, especially on Saturday and Sunday, and see the updated (real-time) analysis. Lastly, they don't have to be on their PC. They can access the reports on their smartphones, tablets and any internet connect device that has a good browser.

And those aren't just the benefits. You get drill-down capabilities, enhanced visualizations, KPI trigger alerts and many more benefits that managers won't want to turn a blind eye to. Below is the extract of the slides I use to present the benefits of Power BI to managers. You can also view one of my openly available Power BI dashboards that I often use to show the power of Power BI: here and here. Enjoy! 

Note: the sample dashboard slides are extract from publicly available dashboards on Power BI Data Stories Gallery and the quote + architecture slides are from Brandon George's presentation.

Tuesday, May 9, 2017

Cluster and Word Cloud Analysis of Tweets About Buhari Today Using R

Last week I wrote about doing a sentiment analysis of tweets about the President using Python's Sentiwordnet and Vader. Today, I switched to R and did a clustering and word cloud of the tweets about President Buhari.

Below are the steps I took:
  1. I imported all the necessary libraries.
  2. I connected to Twitter and created a search stream to gather tweets about Buhari
  3. I saved the results in a csv file with append set to true so I can keep piling up the search results from different time of the day. Then I removed punctuation and stopwords.
  4. I extracted the most frequently used words and created a word cloud from them. Lastly, I did a clustering of the words.
The require statements that were struck out were of libraries I didn't use but forgot take out before the screenshot

Below is a screenshot of the scrapped tweets.

Tuesday, May 2, 2017

Data Types and Data Structures In R

R recognizes four main data types:
  1. Numeric values: These are number values which can have decimal parts. Examples are 55, 27.8 and 100.255
  2. Integer values: These are number values with no decimal parts. To differentiate them from Numeric values, when manually inputting them in R you append the number with the letter "L". So you'll write 5 as 5L, 10 as 10L and 50 as 50L to make R recognize them as Integer values rather than treat them as Numeric values.
  3. Character values: These are text values. You surround them with quotes when manually inputting them into R. You should note that if you input number values in R but surround them with quotes R will recognize them as Character and not Numeric values. Examples are "Michael", "Data" and "200".
  4. Logical values. These are TRUE and FALSE (must always be in CAPS). You enter them in R without quotes, unlike Character values. Also when you carry out comparison operations in R (often called logical operations) the results are logical values. 
Besides these four common ones that you have to be very familiar with and will extensively use in your data analysis work in R, there are two other less common data types: complex values and raw values. I won't bother discussing them because I don't see much real life practical use for them.

Above the layer of data types, we have data structures in R. These are the different standard ways you can organize your data in R. And there are six data structures in R.

  1. Vectors: These are the most basic data structure in R and the first you should be familiar with. Usually the other data structures are built on top of vectors, so a proper understanding of vectors provide a fundamental advantage to using the ones built on it. A vector is a collection of values of the same data type. A very common way to create vectors in R is to use the combine function c(). For example c(2,6,7,9) creates a vector that holds the values 2, 6, 7 and 9. You can call out the elements by providing its position number from the left in a square bracket. So to call out the value 6 if I assign the previous vector to a variable called sample_vector, I can write sample_vector[2]. 
  2. Factors: These are character values vectors. They hold what in statistics is called nominal data. Data that represent different categories. An example of factor is factor(c("Lagos","New York","Sydney","London")).
  3. Lists: These are a more advanced data structure than vectors and factors. They allow for storage of values of different data types and allow you to give each value a name you can reference. Examples of a list are list("Michael",21,"Lagos",FALSE) and list(name="Michael", age=21,city="Lagos",married=FALSE). It is also possible to create list of lists.
  4. Data Frames: These are tabular representation of data. Very similar to the way regular database (SQL tables) and spreadsheet (Excel tables) present data. They allow you to reference the values by row and column address. They are a powerful data structure often used for analysis of large records.An example is data.frame(name = c("John","Michael","Tunde"), age=c(32,21,28), city = c("Abuja","Lagos","Kaduna"), married = c(TRUE,FALSE,TRUE))
  5. Matrices: Matrices are two dimensional representation of values of the same data type. The values can also be addressed by their row and column position. An example of a matrix is matrix(c(4,5,6,7,8,9), nrow=3)
  6. Arrays: Arrays are multi-dimensional tables. They are not limited to two dimensions like the matrix, they can take up as many number of dimensions as desired. An example of an array is array(c(1,2,3,4,5,6,7,8), dim=c(2,2,2)). This is a three dimensional array.

Tuesday, April 25, 2017

Analyzing Nigeria Stock Market, Bond Yield, Exchange Rate and GDP Using R

Today I decided to do an interesting analysis using R. I compiled actual/live data from as far as 1998 on Nigerian All Share Index and 48 of the most valuable stocks. You can access the raw data here:

Here is the R code text.

# Load in necessary libraries

nse_asi<- csv="" data_analysis="" font="" read.csv="">
nse_asi$Date<- ate="" font="" mdy="" nse_asi="">

# Uncomment the following lines to see the structure and preview of the raw data

# This next lines of code are to generate the charts for each company in a separate window
# It is not compulsory, and might be worth commenting out if it makes your computer freeze
for (cmpy in unique(nse_asi$Company)){
  print(ggplot(data=nse_asi[nse_asi$Company==cmpy,], aes(x=Date, y=Last_Price))+
  geom_line() +
  ggtitle(cmpy) +
  labs(x="Date",y="Price") )

# This is the most important chart code. Generates the charts in a neatly grouped way
ggplot(data=nse_asi, aes(x=Date, y=Last_Price, group=Ticker, color=Ticker))+
  geom_line() +
  facet_wrap(~Ticker,scales="free") +
  labs(x="Date",y="Price") +

And below are the results. Enjoy. Lots of screenshots. And make sure you notice the interesting insights: like how investors in Presco, Okomu Oil, United Capitals and Mobil would have been smiling to the bank despite the market depression and economic recession. I personally made some gain from Mobil. Also notice how not very long ago, 1 USD exchanged for 22 Naira.

NSE ASI (pointed out with the red line) and 48 top stocks

FGN 10 Year Bond Yield

Nigeria Real GDP Growth from 1960 till Today

US Dollar to Nigerian Naira Exchange rate