DataViz.ro, Statistics, Tableau

Freedom of movement for workers in the European Union – a story using Tableau and Python

august 19, 2019

Why this analysis?

A few days ago cursdeguvernare.ro, an online newspaper has published an article about the fact that 21.4% of the Romanian labor force are working abroad.

The article had also the following graph from Eurostat attached. Which presents 2 very interesting cases, Romania and Cyprus on the other side.

On one side is Cyprus, that in 2008 it was a leader in this ranking, but in 2018 people came back home for work. While in Romania from 9% it went up to 21% in 2018.

tableau line

My analysis

Starting with this, I did my own research and this and I found two datasets from Eurostat:

The data available is only from 2009 to 2018. And I have been trying to present the time changes using a slider filter and a dot for Romania in order to emphasize its position.

And Tableau did a great job representing this:

Why Pandas and Python

The data wasn’t that clean so I had to clean it, but at the same time, I took this opportunity to practice my Python skills as well.

So, I used Pandas (mostly) and NumPy to process the data that leed me to the above Tableau Dashboard.

Steps:

  1. read CSV files
  2. rename columns
  3. replace some values
  4. type conversions
  5. join the population file with the workers outside the country file
  6. select the necessary columns for my analysis
  7. export the final file

Please see the full script below:

import pandas as pd
import numpy as np

pd.options.display.width = 0


population_file = r'C:\Users\stefan\Google Drive\Storytelling\RomaniaNoYouth\demo_pjangroup_1_Data.csv'
population = pd.read_csv(population_file)

workAnotherEU_file = r'C:\Users\stefan\Google Drive\Storytelling\RomaniaNoYouth\lfst_lmbpcita_1_Data.csv'
workAnotherEU = pd.read_csv(workAnotherEU_file)

# -------- population processing

population = population.rename(columns = {'TIME':'Year', 'GEO':'Country', 'SEX':'Gender', 'AGE':'Age', 'UNIT':'Unit'})

subString = 'Euro'
population['IsCountry'] = np.where (population['Country'].str.contains(subString), 'Euro', 'Country')


population['Age'] = population['Age'].str.replace('From ', '', regex=True)
population['Age'] = population['Age'].str.replace('to', '-', regex=True)
population['Age'] = population['Age'].str.replace(' years', '', regex=True)
population['Value'] = population['Value'].str.replace('\:', '0', regex=True)
population['Value'] = population['Value'].str.replace(',', '', regex=True)

population['Value'] = population['Value'].astype(float)

# group by per Year_x  Country_x  Gender Age_x  Unit   Value_x    Flag and Footnotes_x   IsCountry
population = population.groupby(['Year', 'Country', 'Gender', 'Unit', 'IsCountry'], as_index=False)['Value'].sum()

populationFilter = population['Gender'] == 'Total'

population = population[populationFilter]

population['Key'] = population['Year'].astype(str) + '_' + population['Country']


# population.to_csv(r'C:\Users\stst\Google Drive\Storytelling\RomaniaNoYouth\TestOutput.csv')
# print(population.head)


# -------- workAnotherEU processing, workAnotherEU = people that works in another country

workAnotherEU = workAnotherEU.rename(columns = {'TIME':'Year', 'CITIZEN':'Country', 'C_RESID':'WorkingArea', 'AGE':'Age'})

workAnotherEU['WorkingArea'] = workAnotherEU['WorkingArea'].str.replace('European Union - 28 countries and European Free Trade Association \(EFTA\)', 'EU28 + EFTA', regex=True)
workAnotherEU['WorkingArea'] = workAnotherEU['WorkingArea'].str.replace('European Union - 28 countries', 'EU28', regex=True)

workAnotherEU['Age'] = workAnotherEU['Age'].str.replace('From ','', regex=True)
workAnotherEU['Age'] = workAnotherEU['Age'].str.replace('to','-', regex=True)
workAnotherEU['Age'] = workAnotherEU['Age'].str.replace(' years','', regex=True)
workAnotherEU['Value'] = workAnotherEU['Value'].str.replace('\:', '0', regex=True)
workAnotherEU['Value'] = workAnotherEU['Value'].str.replace(',', '', regex=True)

workAnotherEU['Value'] = workAnotherEU['Value'].astype(float)
workAnotherEU['Value'] = workAnotherEU['Value'] * 1000


workAnotherEUFilter = workAnotherEU['WorkingArea'] == 'EU28 + EFTA'
workAnotherEU = workAnotherEU[workAnotherEUFilter]

workAnotherEU['Key'] = workAnotherEU['Year'].astype(str) + '_' + workAnotherEU['Country']


#--------- join the files

finalFile = pd.merge(population, workAnotherEU, on='Key', how='inner')


finalFile = finalFile[['Year_x', 'Country_x', 'Gender', 'IsCountry', 'Value_x', 'WorkingArea', 'Value_y']]
finalFile = finalFile.rename(columns = {'Year_x':'Year', 'Country_x':'Country', 'Gender':'Gender', 'IsCountry':'IsCountry', 'Value_x':'Population', 'WorkingArea':'WorkingArea', 'Value_y':'WorkingInEU'})

finalFile['Percentage'] = finalFile['WorkingInEU'] / finalFile['Population']

print(finalFile.head())

#--------- Export CSV

finalFile.to_csv(r'C:\Users\stefan\Google Drive\Storytelling\RomaniaNoYouth\finalFile.csv')

You Might Also Like

No Comments

Leave a Reply