After a long weekend crisscrossing Lagos, Kaduna and Abuja to deliver our quarterly Business Data Analysis and In-depth Excel Training in Abuja, I rested today and decided to do an interesting data analysis task.
The task: run a sentiment analysis on today's tweets about the President, Buhari.
It was a very interesting task, and it got me familiar with Python's Tweepy library. I got some help from Marco Bonzanini. You should check out his book, Mastering Social Media Mining with Python.
Here is the breakdown of the steps in achieving the goal:
- Created the credentials needed to access the Twitter API (at https://apps.twitter.com/app/new)
- Used Tweepy to search in real time for tweets about Buhari
- Saved the tweets as a JSON file
- Handled emoticons and strings peculiar to Twitter (like @, # and so on)
- Excluded stop words (words that carry no significant meaning for sentiment analysis), like "to", "be" and "is"
- Applied VADER to each tweet's text and calculated the sentiment of the entire tweet stream
- Applied SentiWordNet, an alternative to VADER, to do the same stream sentiment analysis
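To make the saving and clean-up steps concrete, here is a minimal sketch using only Python's standard library. It is not the exact code I ran: the one-JSON-object-per-line layout, the "text" field, the tiny stop-word set and every function name are assumptions made for illustration. The idea is to treat emoticons, @mentions, #hashtags and URLs as single tokens, then drop stop words, punctuation and links.

```python
import json
import re
import string

# Emoticons such as :) :-( ;D =P (a deliberately small, illustrative pattern).
EMOTICON = r"[:=;][oO\-]?[D\)\]\(\[/\\OpP]"
EMOTICON_RE = re.compile(EMOTICON)

# Alternatives are tried left to right, so Twitter-specific tokens win
# over plain words and single characters.
TOKEN_RE = re.compile("|".join([
    EMOTICON,
    r"https?://\S+",     # URLs
    r"@\w+",             # @mentions
    r"#\w+",             # hashtags
    r"\w+(?:'\w+)?",     # words, optionally with an apostrophe
    r"\S",               # any other non-space character, one at a time
]))

# Assumption: a toy stop-word set. A real run would use a fuller list.
STOP_WORDS = {"to", "be", "is", "the", "a", "an", "of", "and", "in", "on", "rt", "via"}
PUNCTUATION = set(string.punctuation)


def to_json_line(tweet):
    """Serialise one tweet (a dict) as a single JSON line."""
    return json.dumps(tweet)


def read_texts(lines):
    """Read tweet texts from JSON lines (an open file works as `lines`)."""
    return [json.loads(line)["text"] for line in lines if line.strip()]


def preprocess(text):
    """Tokenise a tweet and drop stop words, punctuation and URLs."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if EMOTICON_RE.fullmatch(tok):
            tokens.append(tok)          # keep emoticons exactly as typed
            continue
        tok = tok.lower()
        if tok in STOP_WORDS or tok in PUNCTUATION or tok.startswith("http"):
            continue
        tokens.append(tok)
    return tokens


print(preprocess("RT @user: Buhari is to be blamed :( #Nigeria https://t.co/x"))
# ['@user', 'buhari', 'blamed', ':(', '#nigeria']
```

Keeping the emoticon as its own token matters because a sentiment tool can only score ":(" if it survives tokenisation in one piece.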
Below are the screenshots of some of the steps.
[Screenshot: Twitter developer setup]
[Screenshot: VADER sentiment analysis results]
[Screenshot: SentiWordNet sentiment analysis result]
And the results?
Well, let's just say that if the tweets are a mirror of what people genuinely feel about President Buhari, he should be terribly worried. Most people were very emotional in their tweets about him, and overwhelmingly negative.
VADER classified the tweet stream as 54.67% negative, 17.76% positive and 27.57% neutral.
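For anyone wondering how per-tweet scores become stream-level percentages like these, here is a rough sketch. It assumes NLTK's VADER implementation and the commonly used cut-off of 0.05 on the compound score; both are assumptions for illustration, as are the function names, and my own run may have differed in the details. The NLTK import sits inside its function so that the counting helpers work without NLTK installed.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score (-1 to 1) to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def stream_breakdown(labels):
    """Return the percentage of tweets carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    names = ("negative", "positive", "neutral")
    if total == 0:
        return {name: 0.0 for name in names}
    return {name: round(100 * counts[name] / total, 2) for name in names}


def vader_labels(texts):
    """Label each tweet text with NLTK's VADER analyser.

    Assumption: NLTK is installed and nltk.download("vader_lexicon")
    has been run once.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return [
        label_from_compound(analyzer.polarity_scores(text)["compound"])
        for text in texts
    ]


# Made-up compound scores, only to show the arithmetic.
labels = [label_from_compound(c) for c in (-0.6, -0.2, 0.0, 0.4)]
print(stream_breakdown(labels))
# {'negative': 50.0, 'positive': 25.0, 'neutral': 25.0}
```

With real tweets, `stream_breakdown(vader_labels(texts))` would give the three percentages in one call.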
SentiWordNet gave a more damning result: 92.06% negative, 7.48% positive and 0.47% neutral.
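SentiWordNet scores word senses rather than whole sentences, so each tweet's score has to be assembled from its words. The sketch below shows one simple way to do that: average the positive and negative scores over a word's senses, sum them across the tweet, and label the tweet by the sign of the difference. This aggregation rule is an assumption, not necessarily what produced the numbers above, and the toy lexicon holds made-up values that only stand in for real SentiWordNet scores.

```python
def score_tokens(tokens, lookup):
    """Sum positive minus negative word scores over a tweet's tokens."""
    pos_total = 0.0
    neg_total = 0.0
    for tok in tokens:
        pos, neg = lookup(tok)
        pos_total += pos
        neg_total += neg
    return pos_total - neg_total


def label_from_score(score):
    """Label a tweet by the sign of its aggregate score."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiwordnet_lookup(word):
    """Average SentiWordNet scores over all senses of a word.

    Assumption: NLTK is installed and the "sentiwordnet" and "wordnet"
    corpora have been downloaded with nltk.download().
    """
    from nltk.corpus import sentiwordnet as swn

    synsets = list(swn.senti_synsets(word))
    if not synsets:
        return 0.0, 0.0   # unknown words contribute nothing
    n = len(synsets)
    pos = sum(s.pos_score() for s in synsets) / n
    neg = sum(s.neg_score() for s in synsets) / n
    return pos, neg


# Hypothetical scores for illustration only, not real SentiWordNet values.
TOY_LEXICON = {"good": (0.75, 0.0), "bad": (0.0, 0.625)}


def toy_lookup(word):
    return TOY_LEXICON.get(word, (0.0, 0.0))


print(label_from_score(score_tokens(["bad", "leader"], toy_lookup)))
# negative
```

Swapping `toy_lookup` for `sentiwordnet_lookup` runs the same logic against the real lexicon, and counting the resulting labels gives the stream-level percentages.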
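Both VADER and SentiWordNet are lexicon-based tools rather than trained classifiers: VADER combines a sentiment lexicon with hand-written rules, and SentiWordNet attaches sentiment scores to WordNet synsets. In a context like Nigeria, where we have our own Naija words and peculiar word mix that such lexicons may not cover, a machine learning algorithm, supervised or unsupervised, might be more reliable.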