Find interesting patterns/trends through analysis & comparison of the world bank loan commitments, HDI,Freedom Index and GDP
To analyze what insight open data can give us as to how effective initiatives and funding actually is as opposed to what it’s meant to be.
An economics and development think tank based in Washington, D.C.; they analyze and keep track of economic freedom around the world with the influential Index of Economic Freedom.
from pandas import DataFrame, Series
import pandas as pd
import os
import codecs
# Verify existence of & read in the datasets: Projects & Operations and Freedom Index.
DATA_FILES={"projdict":"data/projects_operations_api.csv", "fredict":"data/FreedomIndex.csv"}

def file_path(key):
    # Data files live in the parent directory relative to this notebook.
    return os.path.join(os.pardir, DATA_FILES[key])

# Sanity check: print each resolved path and whether it exists on disk.
for file_key in DATA_FILES.keys():
    abs_fname = file_path(file_key)
    print abs_fname, os.path.exists(abs_fname)

# The World Bank CSV is Latin-1 encoded, so open it with an explicit codec
# instead of letting read_csv guess.  NOTE(review): handle is never closed.
f = codecs.open(file_path("projdict"), encoding='iso-8859-1')
initial_proj_df = pd.read_csv(f)
initial_proj_df.columns
In this section, we explore the World Bank's commitments to Africa by looking at the total amount loaned to different African countries over the years.
# Boolean mask selecting projects in the AFRICA region.
is_africa = initial_proj_df['regionname']=='AFRICA'
initial_proj_df[is_africa]['countryname'][:5]
initial_proj_df[is_africa][['countryname', 'totalamt']][:5]
# The totalamt value is not properly formatted: strip the ';' delimiters
# so the column can be parsed as a number.
initial_proj_df['totalamt'] = initial_proj_df['totalamt'].str.replace(';','')
initial_proj_df[is_africa]['totalamt'][:5]
initial_proj_df['totalamt'] = initial_proj_df['totalamt'].astype('float32')
# Quick sanity check: sum of the first five African commitments.
sum(initial_proj_df[is_africa]['totalamt'][:5])
In the next steps, we clean up the data. For example, the amounts in the Projects & Operations dataset contain semicolon delimiters, so those need to be stripped out and the values parsed as floats.
initial_proj_df[['regionname','countryname','projectstatusdisplay','totalamt']][:2]
# Data cleaning: the monetary columns use ';' as the thousands delimiter
# (the prose above says comma, but the raw data actually uses semicolons).
# Strip the delimiter, then parse each column as float32.
initial_proj_df['lendprojectcost'] = initial_proj_df['lendprojectcost'].str.replace(';','')
initial_proj_df['lendprojectcost'] = initial_proj_df['lendprojectcost'].astype('float32')
initial_proj_df['ibrdcommamt'] = initial_proj_df['ibrdcommamt'].str.replace(';','')
initial_proj_df['ibrdcommamt'] = initial_proj_df['ibrdcommamt'].astype('float32')
initial_proj_df['idacommamt'] = initial_proj_df['idacommamt'].str.replace(';','')
initial_proj_df['idacommamt'] = initial_proj_df['idacommamt'].astype('float32')
initial_proj_df['grantamt'] = initial_proj_df['grantamt'].str.replace(';','')
initial_proj_df['grantamt'] = initial_proj_df['grantamt'].astype('float32')
initial_proj_df[is_africa][['countryname','project_name','boardapprovaldate','status','lendprojectcost','grantamt']][:10]
# Work on a copy and drop the columns not needed for this analysis.
projcp_df = initial_proj_df.copy()
projcp_df = projcp_df.drop(['lendinginstrtype','envassesmentcategorycode','productlinetype','closingdate','url','sector2','sector3','sector4','sector5','sector','mjsector1','mjsector2','mjsector3','mjsector4','mjsector5','mjsector','theme1','theme2','theme3','theme4','theme5','financier','mjtheme2name','mjtheme3name','mjtheme4name','mjtheme5name'],axis=1)
del projcp_df['projectstatusdisplay']
projcp_df2 = projcp_df.drop(['prodline','supplementprojectflg','goal','mjtheme1name','location'], axis=1)
projcp_df2.columns
projcp_df2[is_africa][:5]
# Group the trimmed frame by region for aggregate statistics.
grouped = projcp_df2.groupby('regionname')
In this section we look further into the Projects and Operations dataset and try to find any interesting or surprising facts to analyze further. We also compute some basic statistics on the data, such as the sum, mean, and standard deviation.
# Aggregate the World Bank's total committed amount for one group
# (a country or a regional operating body).
def func(x):
    return Series({'totalamt': x['totalamt'].sum()})
# Total amount lent per region (one-column DataFrame).
result = grouped.apply(func)
# Create a new 'year' column from the board approval date
# (an ISO-style string, so its first 4 characters are the year).
projcp_df2['year'] = projcp_df2['boardapprovaldate'].str[:4]
projcp_df2['year'][:2]
# group data by year and region name
grouped3 = projcp_df2.groupby(['regionname','year'])
# statistics on the bank's lending commitments to different regions over time
grouped3['totalamt'].describe()
grouped4 = projcp_df2.groupby(['regionname','year','board_approval_month'])
result4 = grouped4.apply(func)
result4.unstack('regionname')[:5]
# Year-by-region totals; missing (region, year) combinations become 0.
result5 = grouped3.apply(func).unstack('regionname').fillna(0)
result5[:5]
# python-us-cpi parses the latest US Consumer Price Index and provides an
# inflation-calculator API.  We use it to express loan commitments from
# other years in today's dollars for better comparison.
from uscpi import UsCpi
cpi = UsCpi() # downloads the latest CPI data
# $100 in 2012 is worth how much in 1980?
cpi.value_with_inflation(100, 2012, 1980)
# Keep only the columns needed for the inflation-adjusted analysis.
projcpi = projcp_df2[['regionname','countryname','project_name','totalamt','grantamt','sector1','year']].copy()
# Convert one row's 'totalamt' into 2013 dollars via the CPI API.
# CPI data only covers 1914-2013, so years outside that range pass
# through unadjusted.
def fun2(y):
    amount = y['totalamt']
    year = int(y['year'])
    if 1914 <= year <= 2013:
        amount = cpi.value_with_inflation(amount, year, 2013)
    fields = ['regionname', 'countryname', 'project_name',
              'totalamt', 'grantamt', 'sector1', 'year']
    values = [y['regionname'], y['countryname'], y['project_name'],
              amount, y['grantamt'], y['sector1'], year]
    return Series(values, index=fields)
# Apply the inflation adjustment row-by-row (NaNs filled with 0 first so
# int(y['year']) cannot fail on missing years).
resultcpi = projcpi.fillna(0).apply(fun2, axis=1)
# Data cleaning: keep only the sector name before the '!' marker and only
# the country name before the ';' suffix.
resultcpi['sectorMain'] = resultcpi['sector1'].str.split("!").str[0]
resultcpi['country'] = resultcpi['countryname'].str.split(";").str[0]
# validate that the data is cleaned.
resultcpi[:4]
resultcpi['year'] = resultcpi['year'].astype(int)
# Restrict the analysis to projects funded from 2000 to 2013.
is_bv = (resultcpi['year'] >= 2000) & (resultcpi['year'] <= 2013)
resultcpi2 = resultcpi[is_bv]
# verify that data is formatted in the way we want to analyze it.
resultcpi2[:4]
# Total bank commitments to Africa per year, 2000-2013, in 2013 dollars.
ggroup_africa = resultcpi2[resultcpi2['regionname']=='AFRICA'].groupby('year').apply(func)
ggroup_africa.plot(kind='bar', title='Bank lending commitments to Africa in year 2000 - 2013'); plt.tight_layout()
# Total bank commitments per region per year, in 2013 dollars.
amtByRegion = resultcpi2.groupby(['regionname','year']).apply(func).unstack('regionname')
amtByRegion[:2]
amtByRegion.plot(kind='bar',figsize=(16,8), title='Lendig commitments by the Bank from 1947 - 2013'); plt.legend(loc='best')
# Count the number of World Bank projects from 2000 - 2013 per country.
# NOTE(review): Series.order() is the pre-0.17 pandas API (now sort_values).
numOfproj_by_country = resultcpi2.groupby('country').size().order(na_last=True, ascending=False, kind='mergesort')
numOfproj_by_country[:5]
The top 3 borrowers from the world bank are part of the BRICS. We are interested in analyzing patterns of borrowing between the BRIC Nations, their freedom index, Human Development Index and GDP. The next steps analyze the lending of the world bank to these nations.
# From above, the top funded nations are BRICS members; this list filters
# them out (using World Bank naming) for further analysis.
listBRICS = ['Federative Republic of Brazil','Russian Federation','Republic of India','People\'s Republic of China','Republic of South Africa']
# Number of Bank-funded projects per BRICS country per year.
brics_nations = resultcpi2[resultcpi2['country'].isin(listBRICS)].groupby(['country','year']).size()
# One bar subplot per country, project counts 2000 - 2013.
brics_nations.unstack('country').fillna(0).plot(subplots=True, figsize=(8, 8),kind='bar'); plt.legend(loc='best');plt.tight_layout()
# Project counts per (country, sector), largest first.
# NOTE(review): resultcpi2 was built without a 'sector3' column (only
# sector1/sectorMain survive the projection at the top of this section) --
# confirm this groupby actually runs.
df_of_BRICS = resultcpi2[resultcpi2['country'].isin(listBRICS)].groupby(['country','sector3']).size().order(na_last=True, ascending=False, kind='mergesort')
df_of_BRICS.unstack('country').fillna(0)
# Import the Freedom Index CSV (Latin-1 encoded) for comparison analysis.
f = codecs.open(file_path("fredict"), encoding='iso-8859-1')
free_df = pd.read_csv(f)
free_df[:2]
# As with the projects data, restrict to index years from 2000 onwards.
free_df2 = free_df[free_df['index year']>=2000].copy()
free_df2.columns
# Extract the BRICS (the Freedom Index uses short country names).
free_df2 = free_df2[free_df2['name'].isin(['China', 'India', 'Russia', 'Brazil', 'South Africa'])]
free_df2[:5]
free_df3 = free_df2[['name','index year','overall score']].copy()
free_df3[:2]
free_df3['overall score'] = free_df3['overall score'].astype(float)
# NOTE(review): pivot_table(rows=..., cols=...) is the pre-0.14 pandas API
# (now index=/columns=).
free_df3.pivot_table(['overall score'], rows=['index year'], cols='name').plot(kind='line', title='freedom Index per BRICS country', figsize=(10,10))
free_df3.pivot_table(['overall score'], rows=['index year'], cols='name').plot(subplots=True, figsize=(8, 8)); plt.legend(loc='best');plt.tight_layout();plt.ylabel('Freedom Index');
# Repeat of the BRICS project-count plot for side-by-side comparison.
brics_nations.unstack('country').fillna(0).plot(subplots=True, figsize=(8, 8),kind='bar'); plt.legend(loc='best');plt.tight_layout()
Freedom versus funding
# Reload the full Freedom Index and take the 2013 snapshot only, to compare
# countries' current state against the historical funding.
f = codecs.open(file_path("fredict"), encoding='iso-8859-1')
free_df = pd.read_csv(f)
free_df2 = free_df[free_df['index year']==2013].copy()
# For simplicity, ignore countries that have not been scored.
free_df2 = free_df2[free_df2['overall score']!='N/A'].copy()
free_df2.columns
# NOTE(review): 'overall score' is still a string here, so these sorts are
# lexicographic, not numeric -- convert with astype(float) first to be safe.
# Ten lowest-freedom countries.
low_freedom = free_df2.sort(['overall score'], ascending=True)
low_freedom = low_freedom[:10]
low_freedom[['name', 'overall score']]
# Ten highest-freedom countries.
high_freedom = free_df2.sort(['overall score'], ascending=False)
high_freedom = high_freedom[:10]
high_freedom[['name', 'overall score']]
# high corruption (lowest 'freedom from corruption' scores)
high_corruption = free_df2.sort(['freedom from corruption'], ascending=True)
high_corruption = high_corruption[:10]
high_corruption[['name', 'freedom from corruption']]
# low corruption (highest 'freedom from corruption' scores)
low_corruption = free_df2.sort(['freedom from corruption'], ascending=False)
low_corruption = low_corruption[:10]
low_corruption[['name', 'freedom from corruption']]
numOfproj_by_country[:10]
# Recall resultcpi: the projects funded, converted to 2013 dollars.
# Sort by country.
# NOTE(review): DataFrame.sort(column=...) is the deprecated pre-0.17 API.
country_cpi = resultcpi.sort(column='country', ascending=True)
country_cpi[['country', 'totalamt', 'grantamt', 'year']][:2]
# The country column also contains continents/aggregates; for now only NaN
# rows are dropped (an explicit continent filter was started and abandoned).
country_cpi = country_cpi.dropna()
# Because the Freedom Index and the World Bank use different naming
# conventions, the countries of interest are listed manually using the
# World Bank's official names.
low_Freedom_list= ['Belize','Turkmenistan','Republic of Zimbabwe','Republic of Uzbekistan',
    'Republic of Haiti', 'Republic of Burundi', 'Republic of Equatorial Guinea',
    'People\'s Republic of Angola', 'Republica Bolivariana de Venezuela']
# NOTE(review): 'Common of Australia' looks like a typo for
# 'Commonwealth of Australia' and will match nothing.
high_Freedom_list= ['Kingdom of Norway', 'New Zealand', 'Kingdom of Denmark', 'Republic of Finland',
    'Republic of Sweden', 'Kingdom of The Netherlands', 'Common of Australia']
# Number of projects committed to each low-freedom country.
low_Freedom_nations = country_cpi[country_cpi['country'].isin(low_Freedom_list)].groupby(['country']).size()
low_Freedom_nations
low_Freedom_nations.plot(kind='bar', title='Lending to Countries with low Freedom Index'); plt.tight_layout()
From this point, we are interested in expanding on the previous analysis by retrieving information from Wikipedia and other sources.
# Retrieve country-level indicators (HDI, Gini, GDP, population) from
# Wikipedia infoboxes to enrich the World Bank data.
# (Bug fix: a stray '¶' character before the first import made this cell a
# syntax error.)
import pandas as pd
import wikipydia as wk
import mwparserfromhell
from wikitools import wiki
from wikitools import api
from wikitools import category
from wikitools import page
import itertools
import re

# Entry point of the MediaWiki API used for all page lookups below.
wikisite = "http://en.wikipedia.org/w/api.php"
wikiObject = wiki.Wiki(wikisite)
projectsAPI = pd.read_csv('../data/projects_operations_api.csv')
wikipediadf = pd.read_csv('../data/matchcountries.csv')
# some cleaning on the datasets
wikipediadf.index =wikipediadf['countryname']
projectsAPI['countryname'] = [str(country).split(";")[0] for country in projectsAPI['countryname']]
#print matchNames.columns
#print projectsAPI.columns
projects = pd.merge(projectsAPI,wikipediadf, on='countryname', how = 'left')
projects = projects[projects['countryname'].map(type) != type(0.0)]
projectsAPI = projectsAPI[projectsAPI['countryname'].map(type) != type(0.0)]
projects['totalamt'] = projects['totalamt'].str.replace(';','')
projects['totalamt'] = projects['totalamt'].astype('float32')
print projects.columns
projects['year'] = [str(x)[0:4] for x in projects['boardapprovaldate']]
projects[projects.year == 'nan'] =[str(x)[0:4] for x in projects[projects.year == 'nan']['closingdate']]
import matplotlib.pyplot as plt
import matplotlib.colors as col
def color_variant(hex_color, brightness_offset=1):
    """Return hex_color with each RGB channel shifted by brightness_offset.

    hex_color must look like '#87c95f'; channels are clamped to [0, 255].
    Raises Exception on a malformed color string.
    """
    if len(hex_color) != 7:
        raise Exception("Passed %s into color_variant(), needs to be in #87c95f format." % hex_color)
    # Parse the three two-character channel substrings as base-16 ints.
    rgb = [int(hex_color[i:i + 2], 16) for i in (1, 3, 5)]
    # Clamp each shifted channel into the valid byte range.
    clamped = [min(255, max(0, c + brightness_offset)) for c in rgb]
    # '%02x' zero-pads single-digit values (the original padded by hand
    # with an if/else on hex() slices).
    return "#" + "".join("%02x" % c for c in clamped)
def drawBarCharReference(Color, targetlist, field, title, labels):
    """Bar chart of `field` sorted ascending, shaded in 10 brightness bands.

    Side effect: records each row's band color into targetlist['color'] so
    the map-drawing functions can reuse the same palette.
    labels is (xlabel, ylabel).
    """
    fig = plt.figure(num=None, figsize=(24, 8), dpi=700, facecolor='w', edgecolor='k')
    ax = fig.add_subplot(111)
    ColorBase = Color
    changeRange = 0.10
    i = 0
    for x in targetlist.sort(columns=field, ascending=True).index:
        # Brighten the base color after each 10% slice of the rows.
        if i / float(len(targetlist.index)) > changeRange:
            ColorBase = color_variant(ColorBase, 20)
            changeRange = changeRange + 0.10
        targetlist['color'][x] = ColorBase
        # Bug fix: the bare name `matplotlib` was never imported (only
        # `matplotlib.pyplot as plt` and `matplotlib.colors as col` are
        # bound), so matplotlib.colors.colorConverter raised NameError.
        ax.bar(i, float(targetlist[field][x]), 1, color=col.colorConverter.to_rgb(ColorBase))
        i += 1
    # Label each bar with the second level of the (wikiname, mapname) index.
    ax.set_xticklabels([x[1] for x in targetlist.sort(columns=field, ascending=True).index])
    plt.xticks(np.arange(0.5, i + 1, 1))
    plt.setp(ax.get_xticklabels(), fontsize=9, rotation='vertical')
    plt.setp(ax.get_yticklabels(), fontsize=10)
    plt.title(title)
    plt.xlabel(labels[0], fontsize=18)
    plt.ylabel(labels[1], fontsize=18)
    plt.show()
# http://www.geophysique.be/2013/02/12/matplotlib-basemap-tutorial-10-shapefiles-unleached-continued/
#
# BaseMap example by geophysique.be
# tutorial 10
import os
import inspect
import numpy as np
import matplotlib.pyplot as plt
from itertools import islice, izip
from mpl_toolkits.basemap import Basemap
def zip_filter_by_state(records, shapes, included_states=None):
    """Yield (record, shape) pairs whose record[1] is in included_states.

    included_states is a list of state/country FIPS prefixes.  None (the
    default) means no filtering, as the original comment promised -- the
    old code raised TypeError on the None default because it tested
    `record[1] in None`.
    """
    for (record, state) in zip(records, shapes):
        if included_states is None or record[1] in included_states:
            yield (record, state)
def draw_global_map(colors, indexlist, titles):
    """Draw a world map: every country in the background colors, then
    overlay the countries present in indexlist using their per-country
    'color' column.

    colors    -- [facecolor, edgecolor] for the background countries
    indexlist -- DataFrame indexed by (wikiname, mapname) with a 'color' column
    titles    -- list whose first element is the plot title
    """
    ### PARAMETERS FOR MATPLOTLIB :
    import matplotlib as mpl
    mpl.rcParams['font.size'] = 14.
    mpl.rcParams['font.family'] = 'Serif'
    mpl.rcParams['axes.labelsize'] = 8.
    mpl.rcParams['xtick.labelsize'] = 40.
    mpl.rcParams['ytick.labelsize'] = 20.
    fig = plt.figure(figsize=(11.7,8.3))
    # Custom adjust of the subplots.
    plt.subplots_adjust(left=0.05,right=0.95,top=0.90,bottom=0.05,wspace=0.15,hspace=0.05)
    ax = plt.subplot(111)
    # Mercator projection covering the whole world except the polar regions.
    x1 = -179.
    x2 = 179.
    y1 = -60.
    y2 = 80.
    i=0
    m = Basemap(resolution='i',projection='merc', llcrnrlat=y1,urcrnrlat=y2,llcrnrlon=x1,urcrnrlon=x2,lat_ts=(y1+y2)/2)
    m.drawcountries(linewidth=0.5)
    m.drawcoastlines(linewidth=0.5)
    m.drawparallels(np.arange(y1,y2,20.),labels=[1,0,0,0],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw parallels
    m.drawmeridians(np.arange(x1,x2,20.),labels=[0,0,0,1],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw meridians
    from matplotlib.collections import LineCollection
    from matplotlib import cm
    import shapefile
    basemap_data_dir = os.path.join(os.path.dirname(inspect.getfile(Basemap)), "data")
    # Both branches load the same world-boundary shapefile; the existence
    # check is a leftover from the US-county tutorial this is based on.
    if os.path.exists(os.path.join(basemap_data_dir,"UScounties.shp")):
        shpf = shapefile.Reader("../data/world_country_admin_boundary_shapefile_with_fips_codes.shp")
    else:
        shpf = shapefile.Reader("../data/world_country_admin_boundary_shapefile_with_fips_codes.shp")
    shapes = shpf.shapes()
    records = shpf.records()
    # First pass: draw every country polygon in the background colors.
    for record, shape in zip(records, shapes):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
        # shape.parts holds the start index of each polygon ring; split
        # the projected point array into one segment per ring.
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
        lines = LineCollection(segs,antialiaseds=(1,))
        lines.set_facecolors(colors[0])
        lines.set_edgecolors(colors[1])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    # Second pass: re-draw only the countries listed in indexlist, colored
    # by their precomputed 'color' column.
    for record, shape in zip_filter_by_state(records, shapes, [x[1] for x in indexlist.index]):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
        lines = LineCollection(segs,antialiaseds=(1,))
        i=i+1
        x_color=None
        # NOTE(review): this reads the global `heatmapfounding` instead of
        # the `indexlist` parameter -- it only works when the caller passes
        # that same DataFrame.  Confirm before reusing with other data.
        for w in heatmapfounding.index:
            if record[1] in w[1]:
                x_color = w
                break
        lines.set_facecolors(indexlist['color'][x_color])
        lines.set_edgecolors(indexlist['color'][x_color])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    plt.title(titles[0])
    plt.savefig('tutorial10.png',dpi=300)
    plt.show()
#draw_global_map(['#3C989E','#424242'],heatmapfounding, ['Total World Bank Lending Commitments Accumulated 2001-2013'])
# Accumulate total commitments per (wikiname, mapname) for 2001-2013.
heatmapfounding = pd.DataFrame(projects[projects.type == 'Country'], columns=['wikiname','mapname','totalamt','year'])
heatmapfounding = pd.DataFrame(heatmapfounding[heatmapfounding.year>='2001'], columns=['wikiname','mapname','totalamt','year'])
heatmapfounding = heatmapfounding.groupby(['wikiname','mapname']).sum()
# Placeholder color column; drawBarCharReference overwrites it with the
# real band colors as a side effect.
heatmapfounding['color'] = pd.Series(["hola" for x in heatmapfounding.index], index=heatmapfounding.index)
drawBarCharReference( '#C73F2A',heatmapfounding, "totalamt","Total World Bank Lending Commitments Accumulated 2001-2013",['Country','US$'])
draw_global_map(['#ffffff','#000000'],heatmapfounding, ['Total World Bank Lending Commitments Accumulated 2001-2013'])
def cleanFloatnumber(x):
    """Extract the first float found in a wiki-markup string.

    Floats pass through unchanged; empty strings, strings without a
    number, and any other input type yield None.
    """
    if type(x) is float:
        return float(x)
    if type(x) is not str:
        return None
    if not x:
        return None
    # Drop HTML comments and tag-like fragments, then scan what remains.
    stripped = re.sub('<!--.*?-->', '', x)
    stripped = re.sub('<*?>.*?<*?>', '', stripped)
    stripped = stripped.strip()
    matches = re.findall(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?', stripped)
    return float(matches[0]) if matches else None
def cleanIntNumber(x):
    """Extract the first integer (returned as a float) from a wiki string.

    Commas are treated as thousands separators and removed.  Floats pass
    through unchanged; anything unparseable yields None.
    """
    if type(x) is float:
        return float(x)
    if type(x) is not str:
        return None
    if not x:
        return None
    # Strip HTML comments, tag-like fragments, and thousands separators.
    cleaned = re.sub('<!--.*?-->', '', x)
    cleaned = re.sub('<*?>.*?<*?>', '', cleaned)
    cleaned = re.sub(',', '', cleaned)
    cleaned = cleaned.strip()
    digits = re.findall(r'[0-9]+', cleaned)
    return float(digits[0]) if digits else None
def get_infobox_from_wikipedia(countryname):
    """Fetch a country's Wikipedia page and pull HDI, Gini, GDP, nominal
    GDP per capita, and population out of its 'Infobox country' template.

    Returns a 5-tuple (hdi, gini, GDP, GDP_nominal_per_capita, population);
    each field is None when the page, the infobox, or that parameter is
    missing.
    """
    country_found = False
    hdi = None
    gini = None
    GDP = None
    GDP_nominal_per_capita = None
    population = None
    # Guard against blank / NaN country names coming from the merge.
    if str(countryname).strip() == "" or countryname is None or str(countryname).strip()=='nan':
        return hdi,gini,GDP,GDP_nominal_per_capita, population
    try:
        wikipage = page.Page(wikiObject,title=countryname)
    except Exception as inst:
        print "No results from Wikipedia: "+str(countryname)
        return hdi,gini,GDP,GDP_nominal_per_capita, population
    wikiraw = wikipage.getWikiText()
    wikiraw = wikiraw.decode('UTF-8')
    parsedWikiText = mwparserfromhell.parse(wikiraw)
    for x in parsedWikiText.nodes:
        # Only the 'Infobox country' template node carries the indicators.
        if "template" in str(type(x)) and "Infobox country" in str(x.name):
            country_found = True
            # Prefer the census figure; fall back to the estimate.
            if x.has_param('population_census'):
                population = cleanIntNumber(str(x.get('population_census').value))
            if population is None:
                if x.has_param('population_estimate'):
                    population = cleanIntNumber(str(x.get('population_estimate').value))
            if x.has_param('HDI'):
                hdi = cleanFloatnumber(str(x.get('HDI').value))
            if x.has_param('Gini'):
                gini = cleanFloatnumber(str(x.get('Gini').value))
            if x.has_param('GDP'):
                # NOTE(review): GDP is kept as raw wiki markup, unlike the
                # other numeric fields -- confirm downstream usage.
                GDP = x.get('GDP').value
            if x.has_param('GDP_nominal_per_capita'):
                GDP_nominal_per_capita = str(x.get('GDP_nominal_per_capita').value)
            break
    if country_found == False:
        print "No Infobox: "+str(countryname)
    return hdi,gini,GDP,GDP_nominal_per_capita,population
# Map every wikiname through the scraper; zip(*...) splits the returned
# 5-tuples into five new columns.
wikipediadf["HDI"], wikipediadf["gini"],wikipediadf['GDP'],wikipediadf['GDP_nominal_per_capita'],wikipediadf['population'] = zip(*wikipediadf['wikiname'].map(get_infobox_from_wikipedia))
# It was not possible to process this data from wikipedia, so I decided to filter it (Ignacio)
# NOTE(review): `typeFound is not None` is always True (typeFound is a type
# object, never the value None), and the `break` stops after the first drop,
# so each loop removes at most one row.  Confirm whether that is intended.
for i in wikipediadf[wikipediadf.type == 'Country'].index:
    typeFound = type(wikipediadf['population'][i])
    if typeFound is not float and typeFound is not None:
        print "deleted"
        wikipediadf=wikipediadf.drop([i])
        break
for i in wikipediadf[wikipediadf.type == 'Country'].index:
    typeFound = type(wikipediadf['GDP_nominal_per_capita'][i])
    if typeFound is not float and typeFound is not None:
        print "deleted"
        wikipediadf=wikipediadf.drop([i])
        break
# Re-merge the projects with the enriched wikipedia table and repeat the
# cleaning from the earlier cell.
projects = pd.merge(projectsAPI,wikipediadf, on='countryname', how = 'left')
projects = projects[projects['countryname'].map(type) != type(0.0)]
projectsAPI = projectsAPI[projectsAPI['countryname'].map(type) != type(0.0)]
projects['totalamt'] = projects['totalamt'].str.replace(';','')
projects['totalamt'] = projects['totalamt'].astype('float32')
print projects.columns
projects['year'] = [str(x)[0:4] for x in projects['boardapprovaldate']]
# NOTE(review): this assigns year strings across entire rows (every column)
# for rows whose year is 'nan'; probably only the 'year' column was meant.
projects[projects.year == 'nan'] =[str(x)[0:4] for x in projects[projects.year == 'nan']['closingdate']]
def drawBarCharReference2(Color, targetlist, field, title, labels):
    """Bar chart of `field` sorted ascending, shaded in 10 brightness bands.

    Variant of drawBarCharReference that labels bars with the 'mapname'
    column instead of the index.  Side effect: records each row's band
    color into targetlist['color'].  labels is (xlabel, ylabel).
    """
    fig = plt.figure(num=None, figsize=(24, 8), dpi=700, facecolor='w', edgecolor='k')
    ax = fig.add_subplot(111)
    ColorBase = Color
    changeRange = 0.10
    i = 0
    for x in targetlist.sort(columns=field, ascending=True).index:
        # Brighten the base color after each 10% slice of the rows.
        if i / float(len(targetlist.index)) > changeRange:
            ColorBase = color_variant(ColorBase, 20)
            changeRange = changeRange + 0.10
        targetlist['color'][x] = ColorBase
        # Bug fix: the bare name `matplotlib` was never imported (only
        # `matplotlib.pyplot as plt` and `matplotlib.colors as col` are
        # bound), so matplotlib.colors.colorConverter raised NameError.
        ax.bar(i, float(targetlist[field][x]), 1, color=col.colorConverter.to_rgb(ColorBase))
        i += 1
    # Label each bar with the row's 'mapname' value.
    ax.set_xticklabels([targetlist['mapname'][x] for x in targetlist.sort(columns=field, ascending=True).index])
    plt.xticks(np.arange(0.5, i + 1, 1))
    plt.setp(ax.get_xticklabels(), fontsize=9, rotation='vertical')
    plt.setp(ax.get_yticklabels(), fontsize=10)
    plt.title(title)
    plt.xlabel(labels[0], fontsize=18)
    plt.ylabel(labels[1], fontsize=18)
    plt.show()
def zip_filter_by_state2(records, shapes, included_states=None):
    """Yield (record, shape) pairs whose record[1] is in included_states.

    included_states is a list of state/country FIPS prefixes.  None (the
    default) means no filtering, as the original comment promised -- the
    old code raised TypeError on the None default because it tested
    `record[1] in None`.
    """
    for (record, state) in zip(records, shapes):
        if included_states is None or record[1] in included_states:
            yield (record, state)
def draw_global_map2(colors, indexlist, titles):
    """Draw a world map: every country in the background colors, then
    overlay the countries listed in indexlist['mapname'] using their
    per-country 'color' column.

    Variant of draw_global_map that matches countries on the 'mapname'
    column instead of the DataFrame index.
    colors    -- [facecolor, edgecolor] for the background countries
    indexlist -- DataFrame with 'mapname' and 'color' columns
    titles    -- list whose first element is the plot title
    """
    ### PARAMETERS FOR MATPLOTLIB :
    import matplotlib as mpl
    mpl.rcParams['font.size'] = 14.
    mpl.rcParams['font.family'] = 'Serif'
    mpl.rcParams['axes.labelsize'] = 8.
    mpl.rcParams['xtick.labelsize'] = 40.
    mpl.rcParams['ytick.labelsize'] = 20.
    fig = plt.figure(figsize=(11.7,8.3))
    # Custom adjust of the subplots.
    plt.subplots_adjust(left=0.05,right=0.95,top=0.90,bottom=0.05,wspace=0.15,hspace=0.05)
    ax = plt.subplot(111)
    # Mercator projection covering the whole world except the polar regions.
    x1 = -179.
    x2 = 179.
    y1 = -60.
    y2 = 80.
    i=0
    m = Basemap(resolution='i',projection='merc', llcrnrlat=y1,urcrnrlat=y2,llcrnrlon=x1,urcrnrlon=x2,lat_ts=(y1+y2)/2)
    m.drawcountries(linewidth=0.5)
    m.drawcoastlines(linewidth=0.5)
    m.drawparallels(np.arange(y1,y2,20.),labels=[1,0,0,0],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw parallels
    m.drawmeridians(np.arange(x1,x2,20.),labels=[0,0,0,1],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw meridians
    from matplotlib.collections import LineCollection
    from matplotlib import cm
    import shapefile
    basemap_data_dir = os.path.join(os.path.dirname(inspect.getfile(Basemap)), "data")
    # Both branches load the same world-boundary shapefile; the existence
    # check is a leftover from the US-county tutorial this is based on.
    if os.path.exists(os.path.join(basemap_data_dir,"UScounties.shp")):
        shpf = shapefile.Reader("../data/world_country_admin_boundary_shapefile_with_fips_codes.shp")
    else:
        shpf = shapefile.Reader("../data/world_country_admin_boundary_shapefile_with_fips_codes.shp")
    shapes = shpf.shapes()
    records = shpf.records()
    # First pass: draw every country polygon in the background colors.
    for record, shape in zip(records, shapes):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
        # shape.parts holds the start index of each polygon ring; split
        # the projected point array into one segment per ring.
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
        lines = LineCollection(segs,antialiaseds=(1,))
        lines.set_facecolors(colors[0])
        lines.set_edgecolors(colors[1])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    # Second pass: re-draw only the countries whose mapname appears in
    # indexlist, colored by the precomputed 'color' column.
    for record, shape in zip_filter_by_state2(records, shapes, [indexlist['mapname'][x] for x in indexlist.index]):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
        lines = LineCollection(segs,antialiaseds=(1,))
        i=i+1
        x_color=None
        # Find the index entry whose mapname contains this record's name.
        for (w,x) in [(indexlist['mapname'][x],x) for x in indexlist.index]:
            if type(w) is str and record[1] in w:
                x_color = x
                break
        lines.set_facecolors(indexlist['color'][x_color])
        lines.set_edgecolors(indexlist['color'][x_color])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    plt.title(titles[0])
    plt.savefig('tutorial10.png',dpi=300)
    plt.show()
The Human Development Index (HDI) is a composite statistic of life expectancy, education, and income indices to rank countries into four tiers of human development. It was created by economist Mahbub ul Haq, followed by economist Amartya Sen in 1990,[1] and published by the United Nations Development Programme.[2]
Published on 4 November 2010 (and updated on 10 June 2011), starting with the 2010 Human Development Report the HDI combines three dimensions:
# Columns added to wikipediadf by the scraper above: HDI, gini, GDP,
# GDP_nominal_per_capita, population.
# Build the per-country HDI table used for the bar chart and the world map.
heatmapHDI = pd.DataFrame(projects[projects.type == 'Country'], columns=['wikiname','mapname','HDI'])
heatmapHDI = heatmapHDI.fillna(0)
heatmapHDI = heatmapHDI.drop_duplicates()
# Placeholder color column; drawBarCharReference2 overwrites it with the
# real band colors as a side effect.
heatmapHDI['color'] = pd.Series(["hola" for x in heatmapHDI.index], index=heatmapHDI.index)
drawBarCharReference2( '#425910',heatmapHDI, 'HDI',"Human Development Index (Wikipedia)", ['Country','HDI Index'])
draw_global_map2(['#ffffff','#000000'],heatmapHDI, ['Human Development Index Map (Wikipedia)'])