DataViz.ro

Python, SSIS, Tableau and Twitter – a good team

January 23, 2019

A few months ago I started learning Python, and my first project was a Python scraping script for the earthquake database in Romania.

From there, I completed the “Python for Data Science” course on Dataquest.io and learned a lot of new things. Alongside this, I also started learning SSIS (SQL Server Integration Services) on Pluralsight.com, because I wanted a deeper understanding of an ETL tool.

First project

This project started from a simple question: “What word do people use the most when they post on social media?” …and that was it.

I started researching this and found out that I needed to know how to use at least three tools:

  • Python for text analysis
  • SSIS to make the Python output more usable in a BI tool
  • Tableau to turn the insight into a visual form

To make the process easier to follow, here is a diagram of the flow:

Python

The source data processed with Python was downloaded from data.world. It contains 20,000 tweets, each with a username and gender.
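A minimal sketch of loading such a file with Python's standard `csv` module, assuming it is a CSV with `gender` and `text` columns. These column names, the `load_tweets` helper and the sample rows are hypothetical; check the actual headers of the data.world file before reusing it.

```python
import csv
import io

# Made-up sample rows; the real data.world export has its own headers.
SAMPLE_CSV = """username,gender,text
alice,female,Coffee and cake with friends
bob,male,The game was great
carol,unknown,Hello world
"""


def load_tweets(fh, text_col="text", gender_col="gender", keep=("male", "female")):
    """Return (gender, text) pairs for rows whose gender is in `keep`.

    The default column names are assumptions, not a documented schema.
    """
    rows = []
    for record in csv.DictReader(fh):
        gender = (record.get(gender_col) or "").strip().lower()
        text = record.get(text_col) or ""
        # Skip rows with an unwanted gender label or an empty tweet.
        if gender in keep and text:
            rows.append((gender, text))
    return rows


if __name__ == "__main__":
    for gender, text in load_tweets(io.StringIO(SAMPLE_CSV)):
        print(gender, "|", text)
```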

Starting from this, the script splits each tweet into words and then counts them to get the frequency of each word.
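The split-and-count step can be sketched with Python's standard `re` module and `collections.Counter`. This is an illustration, not the original script: the tokenizer regex, the `count_words_by_gender` helper and the sample tweets are assumptions.

```python
import re
from collections import Counter

# Made-up (gender, tweet text) rows for illustration only.
SAMPLE_TWEETS = [
    ("male", "The game was great and the crowd was loud"),
    ("female", "Coffee and cake with friends and family"),
    ("male", "Watching the news"),
]

# Assumed tokenizer: runs of ASCII lowercase letters, digits and apostrophes.
# Simplification: accented letters, emoji and the # / @ symbols are dropped.
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Split a tweet into lowercase words, discarding punctuation."""
    return WORD_RE.findall(text.lower())


def count_words_by_gender(rows):
    """Return a dict mapping each gender to a Counter of word frequencies."""
    counts = {}
    for gender, text in rows:
        counts.setdefault(gender, Counter()).update(tokenize(text))
    return counts


if __name__ == "__main__":
    for gender, counter in count_words_by_gender(SAMPLE_TWEETS).items():
        print(gender, counter.most_common(3))
```

Lowercasing before counting matters here: without it, “The” and “the” would be counted as two different words.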

After a few hours of work, the script told me that the most used word on Twitter at that time was “the” (by males) and “and” (by females).

[Image: top words on Twitter]

SSIS (SQL Server Integration Services)

The SSIS package is quite robust for this project. In fact, the most annoying part was the Code page setting on the CSV connection, which caused me a lot of problems in the flow.

[Image: SSIS flow]

In Python, I processed each gender into a separate file to keep clear track of them. In the end, we are talking about roughly 60k lines.
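A sketch of writing one file per gender, assuming a simple `word,count` layout. The `write_counts_per_gender` helper and the `words_<gender>.csv` naming are hypothetical. Writing with an explicit encoding makes it easier to set a matching Code page on the SSIS CSV connection; 65001 is the Windows code page for UTF-8.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def write_counts_per_gender(counts, out_dir, encoding="utf-8"):
    """Write one CSV per gender (hypothetical naming: words_<gender>.csv).

    `counts` maps gender -> Counter of word frequencies. The SSIS
    connection's Code page should match `encoding` (65001 for UTF-8).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for gender, counter in counts.items():
        path = out / f"words_{gender}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding=encoding) as fh:
            writer = csv.writer(fh)
            writer.writerow(["word", "count"])
            # most_common() with no argument yields all words, highest first.
            writer.writerows(counter.most_common())
        paths.append(path)
    return paths


if __name__ == "__main__":
    demo = {"male": Counter({"the": 3, "was": 2}), "female": Counter({"and": 2})}
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_counts_per_gender(demo, tmp):
            print(p.name, p.read_text(encoding="utf-8").splitlines())
```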

Tableau comes into action

Because this is an atypical data set with multiple anomalies, it was difficult to find a single visualization that represents everything.

That’s why I split the analysis in two:

  • Pronoun analysis: I was curious about who we are talking about (me, you, they…)
  • Men vs Women war of words

Pronoun analysis in Tableau

This is how I discovered that half of the time we are talking about ourselves.

[Image: pronoun analysis in Tableau]

It’s not the best viz, but I wanted to show the whole picture in one piece.

And “We” is almost in the last position.
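One way to compute each pronoun's share of all pronoun occurrences from per-word counts is sketched below. The grouping of forms (for example, counting “my” and “myself” under “I/me”) and the `pronoun_shares` helper are assumptions; the post does not list which words were included.

```python
from collections import Counter

# Hypothetical grouping of pronoun forms; the original word list is not known.
PRONOUN_GROUPS = {
    "I/me": {"i", "me", "my", "mine", "myself"},
    "you": {"you", "your", "yours", "yourself", "yourselves"},
    "we/us": {"we", "us", "our", "ours", "ourselves"},
    "they/them": {"they", "them", "their", "theirs", "themselves"},
}


def pronoun_shares(word_counts):
    """Return each pronoun group's share (0..1) of all pronoun occurrences."""
    totals = {
        group: sum(word_counts.get(word, 0) for word in forms)
        for group, forms in PRONOUN_GROUPS.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No pronouns at all: avoid dividing by zero.
        return {group: 0.0 for group in totals}
    return {group: n / grand_total for group, n in totals.items()}


if __name__ == "__main__":
    sample = Counter({"i": 3, "me": 2, "you": 2, "we": 1, "they": 2})  # made-up counts
    for group, share in sorted(pronoun_shares(sample).items(), key=lambda kv: -kv[1]):
        print(f"{group}: {share:.0%}")
```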

Men vs Women war of words

To get a full view of this, I used a horizontal line chart with women on the left side and men on the right side.
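A common way to prepare data for a left/right chart is to negate one side's values so both sides can share a single axis around zero. The sketch below assumes that approach; it is not necessarily how the original chart was built, and the `diverging_rows` helper and the long `(word, gender, signed_count)` layout are hypothetical.

```python
from collections import Counter


def diverging_rows(female_counts, male_counts, top_n=10):
    """Build (word, gender, signed_count) rows for a left/right chart.

    Words are the top_n by combined frequency. Female counts are negated so
    they fall left of zero on a shared axis; male counts stay positive.
    """
    combined = female_counts + male_counts
    rows = []
    for word, _ in combined.most_common(top_n):
        rows.append((word, "female", -female_counts.get(word, 0)))
        rows.append((word, "male", male_counts.get(word, 0)))
    return rows


if __name__ == "__main__":
    female = Counter({"and": 2, "the": 1})  # made-up counts
    male = Counter({"the": 3, "was": 2})
    for row in diverging_rows(female, male, top_n=3):
        print(row)
```

The long layout (one row per word and gender) is convenient for a BI tool, because gender can go on the color shelf while the signed count drives the bar length.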

We can also see that most words are used by both sides. There are a few exceptions, but nothing really significant.
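Checking which words appear on both sides only takes a set comparison. A sketch, assuming per-gender word counts stored as `collections.Counter` objects; the `vocabulary_overlap` helper is hypothetical, and it only lists the words, without any test of whether a difference is significant.

```python
from collections import Counter


def vocabulary_overlap(female_counts, male_counts):
    """Split the vocabulary into words used by both groups or by only one."""
    # Ignore words whose count is zero or negative.
    female_words = {w for w, n in female_counts.items() if n > 0}
    male_words = {w for w, n in male_counts.items() if n > 0}
    return {
        "shared": female_words & male_words,
        "female_only": female_words - male_words,
        "male_only": male_words - female_words,
    }


if __name__ == "__main__":
    overlap = vocabulary_overlap(
        Counter({"and": 2, "the": 1, "cake": 1}),  # made-up counts
        Counter({"the": 3, "game": 1}),
    )
    for key, words in overlap.items():
        print(key, sorted(words))
```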

As you can see, women use more words than men, and their vocabulary is also more diverse.
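The post does not say how vocabulary diversity was measured. Two simple options are the number of distinct words and the type-token ratio (distinct words divided by total words); the sketch below computes both under that assumption, with a hypothetical `vocabulary_stats` helper.

```python
from collections import Counter


def vocabulary_stats(word_counts):
    """Total words, distinct words and type-token ratio (distinct / total)."""
    total = sum(word_counts.values())
    unique = sum(1 for n in word_counts.values() if n > 0)
    # Guard against an empty Counter to avoid dividing by zero.
    ratio = unique / total if total else 0.0
    return {"total_words": total, "unique_words": unique, "type_token_ratio": ratio}


if __name__ == "__main__":
    print(vocabulary_stats(Counter({"the": 3, "was": 2, "great": 1})))  # made-up counts
```

The type-token ratio tends to fall as a text gets longer, so when one group writes many more words than the other, the raw count of distinct words may be easier to interpret.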

[Image: Tableau gender comparison]
