Why this analysis?
A few days ago cursdeguvernare.ro, an online newspaper has published an article about the fact that 21.4% of the Romanian labor force are working abroad.
The article had also the following graph from Eurostat attached. Which presents 2 very interesting cases, Romania and Cyprus on the other side.
On one side is Cyprus, that in 2008 it was a leader in this ranking, but in 2018 people came back home for work. While in Romania from 9% it went up to 21% in 2018.
My analysis
Starting with this, I did my own research and this and I found two datasets from Eurostat:
The data available is only from 2009 to 2018. And I have been trying to present the time changes using a slider filter and a dot for Romania in order to emphasize its position.
And Tableau did a great job representing this:
Why Pandas and Python
The data wasn’t that clean so I had to clean it, but at the same time, I took this opportunity to practice my Python skills as well.
So, I used Pandas (mostly) and NumPy to process the data that leed me to the above Tableau Dashboard.
Steps:
- read CSV files
- rename columns
- replace some values
- type conversions
- join the population file with the workers outside the country file
- select the necessary columns for my analysis
- export the final file
Please see the full script below:
import pandas as pd import numpy as np pd.options.display.width = 0 population_file = r'C:\Users\stefan\Google Drive\Storytelling\RomaniaNoYouth\demo_pjangroup_1_Data.csv' population = pd.read_csv(population_file) workAnotherEU_file = r'C:\Users\stefan\Google Drive\Storytelling\RomaniaNoYouth\lfst_lmbpcita_1_Data.csv' workAnotherEU = pd.read_csv(workAnotherEU_file) # -------- population processing population = population.rename(columns = {'TIME':'Year', 'GEO':'Country', 'SEX':'Gender', 'AGE':'Age', 'UNIT':'Unit'}) subString = 'Euro' population['IsCountry'] = np.where (population['Country'].str.contains(subString), 'Euro', 'Country') population['Age'] = population['Age'].str.replace('From ', '', regex=True) population['Age'] = population['Age'].str.replace('to', '-', regex=True) population['Age'] = population['Age'].str.replace(' years', '', regex=True) population['Value'] = population['Value'].str.replace('\:', '0', regex=True) population['Value'] = population['Value'].str.replace(',', '', regex=True) population['Value'] = population['Value'].astype(float) # group by per Year_x Country_x Gender Age_x Unit Value_x Flag and Footnotes_x IsCountry population = population.groupby(['Year', 'Country', 'Gender', 'Unit', 'IsCountry'], as_index=False)['Value'].sum() populationFilter = population['Gender'] == 'Total' population = population[populationFilter] population['Key'] = population['Year'].astype(str) + '_' + population['Country'] # population.to_csv(r'C:\Users\stst\Google Drive\Storytelling\RomaniaNoYouth\TestOutput.csv') # print(population.head) # -------- workAnotherEU processing, workAnotherEU = people that works in another country workAnotherEU = workAnotherEU.rename(columns = {'TIME':'Year', 'CITIZEN':'Country', 'C_RESID':'WorkingArea', 'AGE':'Age'}) workAnotherEU['WorkingArea'] = workAnotherEU['WorkingArea'].str.replace('European Union - 28 countries and European Free Trade Association \(EFTA\)', 'EU28 + EFTA', regex=True) workAnotherEU['WorkingArea'] = workAnotherEU['WorkingArea'].str.replace('European Union - 28 countries', 'EU28', regex=True) workAnotherEU['Age'] = workAnotherEU['Age'].str.replace('From ','', regex=True) workAnotherEU['Age'] = workAnotherEU['Age'].str.replace('to','-', regex=True) workAnotherEU['Age'] = workAnotherEU['Age'].str.replace(' years','', regex=True) workAnotherEU['Value'] = workAnotherEU['Value'].str.replace('\:', '0', regex=True) workAnotherEU['Value'] = workAnotherEU['Value'].str.replace(',', '', regex=True) workAnotherEU['Value'] = workAnotherEU['Value'].astype(float) workAnotherEU['Value'] = workAnotherEU['Value'] * 1000 workAnotherEUFilter = workAnotherEU['WorkingArea'] == 'EU28 + EFTA' workAnotherEU = workAnotherEU[workAnotherEUFilter] workAnotherEU['Key'] = workAnotherEU['Year'].astype(str) + '_' + workAnotherEU['Country'] #--------- join the files finalFile = pd.merge(population, workAnotherEU, on='Key', how='inner') finalFile = finalFile[['Year_x', 'Country_x', 'Gender', 'IsCountry', 'Value_x', 'WorkingArea', 'Value_y']] finalFile = finalFile.rename(columns = {'Year_x':'Year', 'Country_x':'Country', 'Gender':'Gender', 'IsCountry':'IsCountry', 'Value_x':'Population', 'WorkingArea':'WorkingArea', 'Value_y':'WorkingInEU'}) finalFile['Percentage'] = finalFile['WorkingInEU'] / finalFile['Population'] print(finalFile.head()) #--------- Export CSV finalFile.to_csv(r'C:\Users\stefan\Google Drive\Storytelling\RomaniaNoYouth\finalFile.csv')