import pandas as pd
import numpy as np
OPEC_df = pd.DataFrame({
'Country': ['Algeria','Angola','Equatorial Guinea','Gabon','Iran','Iraq','Kuwait','Libya','Nigeria','Republic of the Congo','Saudi Arabia','UAE','Venezuela'],
'Region': ['North Africa','Southern Africa','Central Africa','Central Africa','Middle East','Middle East','Middle East','North Africa','West Africa','Central Africa','Middle East','Middle East','South America'],
'Population': [42228408,30809787,1308975,2119275,81800188,38433600,4137312,6678559,195874685,5125821,33702756,9630959,28887118],
'Oil Production': [1348361,1769615,np.nan,210820,3990956,4451516,2923825,384686,1999885,260000,10460710,3106077,2276967],
'Proven Reserves': [12.2e9,8.423e9,np.nan,2e9,157.53e9,143.069e9,101.5e9,48.363e9,37.07e9,1.6e9,266.578e9,97.8e9,299.953e9]
})
We can do boolean operations on a column of a DataFrame to obtain an array of boolean values. This boolean array can be used to filter a DataFrame - all entries corresponding to a False value are dropped from the DataFrame.
# Get Middle Eastern countries data only
is_ME = (OPEC_df['Region'] == "Middle East")
OPEC_df[is_ME]
Country | Region | Population | Oil Production | Proven Reserves | |
---|---|---|---|---|---|
4 | Iran | Middle East | 81800188 | 3990956.0 | 1.575300e+11 |
5 | Iraq | Middle East | 38433600 | 4451516.0 | 1.430690e+11 |
6 | Kuwait | Middle East | 4137312 | 2923825.0 | 1.015000e+11 |
10 | Saudi Arabia | Middle East | 33702756 | 10460710.0 | 2.665780e+11 |
11 | UAE | Middle East | 9630959 | 3106077.0 | 9.780000e+10 |
Let's say we want to filter using some other condition (e.g. countries with production over 2M bbl/day):
prod_over_2M = (OPEC_df["Oil Production"] > 2e6)
OPEC_df[prod_over_2M]
Country | Region | Population | Oil Production | Proven Reserves | |
---|---|---|---|---|---|
4 | Iran | Middle East | 81800188 | 3990956.0 | 1.575300e+11 |
5 | Iraq | Middle East | 38433600 | 4451516.0 | 1.430690e+11 |
6 | Kuwait | Middle East | 4137312 | 2923825.0 | 1.015000e+11 |
10 | Saudi Arabia | Middle East | 33702756 | 10460710.0 | 2.665780e+11 |
11 | UAE | Middle East | 9630959 | 3106077.0 | 9.780000e+10 |
12 | Venezuela | South America | 28887118 | 2276967.0 | 2.999530e+11 |
The logical_and function we saw before can be used to filter on multiple conditions.
OPEC_df[np.logical_and(is_ME,prod_over_2M)]
Country | Region | Population | Oil Production | Proven Reserves | |
---|---|---|---|---|---|
4 | Iran | Middle East | 81800188 | 3990956.0 | 1.575300e+11 |
5 | Iraq | Middle East | 38433600 | 4451516.0 | 1.430690e+11 |
6 | Kuwait | Middle East | 4137312 | 2923825.0 | 1.015000e+11 |
10 | Saudi Arabia | Middle East | 33702756 | 10460710.0 | 2.665780e+11 |
11 | UAE | Middle East | 9630959 | 3106077.0 | 9.780000e+10 |
# Include countries with lapsed OPEC membership
OPEC_df = OPEC_df.append({'Country':'Ecuador',
'Region':'South America',
'Population': 17084358,
'Oil Production': 548421,
'Proven Reserves': 8.273e9},
ignore_index=True)
OPEC_df = OPEC_df.append({'Country':'Indonesia',
'Region':'Southeast Asia',
'Population': 267670543,
'Oil Production': 833667,
'Proven Reserves': 3.6925e9},
ignore_index=True)
OPEC_df = OPEC_df.append({'Country':'Qatar',
'Region':'Middle East',
'Population': 2781682,
'Oil Production': 1522902,
'Proven Reserves': 25.244e9},
ignore_index=True)
OPEC_df
Country | Region | Population | Oil Production | Proven Reserves | |
---|---|---|---|---|---|
0 | Algeria | North Africa | 42228408 | 1348361.0 | 1.220000e+10 |
1 | Angola | Southern Africa | 30809787 | 1769615.0 | 8.423000e+09 |
2 | Equatorial Guinea | Central Africa | 1308975 | NaN | NaN |
3 | Gabon | Central Africa | 2119275 | 210820.0 | 2.000000e+09 |
4 | Iran | Middle East | 81800188 | 3990956.0 | 1.575300e+11 |
5 | Iraq | Middle East | 38433600 | 4451516.0 | 1.430690e+11 |
6 | Kuwait | Middle East | 4137312 | 2923825.0 | 1.015000e+11 |
7 | Libya | North Africa | 6678559 | 384686.0 | 4.836300e+10 |
8 | Nigeria | West Africa | 195874685 | 1999885.0 | 3.707000e+10 |
9 | Republic of the Congo | Central Africa | 5125821 | 260000.0 | 1.600000e+09 |
10 | Saudi Arabia | Middle East | 33702756 | 10460710.0 | 2.665780e+11 |
11 | UAE | Middle East | 9630959 | 3106077.0 | 9.780000e+10 |
12 | Venezuela | South America | 28887118 | 2276967.0 | 2.999530e+11 |
13 | Ecuador | South America | 17084358 | 548421.0 | 8.273000e+09 |
14 | Indonesia | Southeast Asia | 267670543 | 833667.0 | 3.692500e+09 |
15 | Qatar | Middle East | 2781682 | 1522902.0 | 2.524400e+10 |
New columns in a DataFrame can be created by setting a column of that name to a list/array of the right length:
OPEC_df['Membership Active'] = [True]*13 + [False]*3
OPEC_df
Country | Region | Population | Oil Production | Proven Reserves | Membership Active | |
---|---|---|---|---|---|---|
0 | Algeria | North Africa | 42228408 | 1348361.0 | 1.220000e+10 | True |
1 | Angola | Southern Africa | 30809787 | 1769615.0 | 8.423000e+09 | True |
2 | Equatorial Guinea | Central Africa | 1308975 | NaN | NaN | True |
3 | Gabon | Central Africa | 2119275 | 210820.0 | 2.000000e+09 | True |
4 | Iran | Middle East | 81800188 | 3990956.0 | 1.575300e+11 | True |
5 | Iraq | Middle East | 38433600 | 4451516.0 | 1.430690e+11 | True |
6 | Kuwait | Middle East | 4137312 | 2923825.0 | 1.015000e+11 | True |
7 | Libya | North Africa | 6678559 | 384686.0 | 4.836300e+10 | True |
8 | Nigeria | West Africa | 195874685 | 1999885.0 | 3.707000e+10 | True |
9 | Republic of the Congo | Central Africa | 5125821 | 260000.0 | 1.600000e+09 | True |
10 | Saudi Arabia | Middle East | 33702756 | 10460710.0 | 2.665780e+11 | True |
11 | UAE | Middle East | 9630959 | 3106077.0 | 9.780000e+10 | True |
12 | Venezuela | South America | 28887118 | 2276967.0 | 2.999530e+11 | True |
13 | Ecuador | South America | 17084358 | 548421.0 | 8.273000e+09 | False |
14 | Indonesia | Southeast Asia | 267670543 | 833667.0 | 3.692500e+09 | False |
15 | Qatar | Middle East | 2781682 | 1522902.0 | 2.524400e+10 | False |
New columns can also be created from operations on existing columns:
OPEC_df['Production per cap'] = OPEC_df['Oil Production'] / OPEC_df['Population']
OPEC_df
Country | Region | Population | Oil Production | Proven Reserves | Membership Active | Production per cap | |
---|---|---|---|---|---|---|---|
0 | Algeria | North Africa | 42228408 | 1348361.0 | 1.220000e+10 | True | 0.031930 |
1 | Angola | Southern Africa | 30809787 | 1769615.0 | 8.423000e+09 | True | 0.057437 |
2 | Equatorial Guinea | Central Africa | 1308975 | NaN | NaN | True | NaN |
3 | Gabon | Central Africa | 2119275 | 210820.0 | 2.000000e+09 | True | 0.099477 |
4 | Iran | Middle East | 81800188 | 3990956.0 | 1.575300e+11 | True | 0.048789 |
5 | Iraq | Middle East | 38433600 | 4451516.0 | 1.430690e+11 | True | 0.115824 |
6 | Kuwait | Middle East | 4137312 | 2923825.0 | 1.015000e+11 | True | 0.706697 |
7 | Libya | North Africa | 6678559 | 384686.0 | 4.836300e+10 | True | 0.057600 |
8 | Nigeria | West Africa | 195874685 | 1999885.0 | 3.707000e+10 | True | 0.010210 |
9 | Republic of the Congo | Central Africa | 5125821 | 260000.0 | 1.600000e+09 | True | 0.050724 |
10 | Saudi Arabia | Middle East | 33702756 | 10460710.0 | 2.665780e+11 | True | 0.310381 |
11 | UAE | Middle East | 9630959 | 3106077.0 | 9.780000e+10 | True | 0.322510 |
12 | Venezuela | South America | 28887118 | 2276967.0 | 2.999530e+11 | True | 0.078823 |
13 | Ecuador | South America | 17084358 | 548421.0 | 8.273000e+09 | False | 0.032101 |
14 | Indonesia | Southeast Asia | 267670543 | 833667.0 | 3.692500e+09 | False | 0.003115 |
15 | Qatar | Middle East | 2781682 | 1522902.0 | 2.524400e+10 | False | 0.547475 |
We can sort a DataFrame by the values in a given column:
OPEC_df.sort_values(by='Oil Production', ascending=False)
Country | Region | Population | Oil Production | Proven Reserves | Membership Active | Production per cap | |
---|---|---|---|---|---|---|---|
10 | Saudi Arabia | Middle East | 33702756 | 10460710.0 | 2.665780e+11 | True | 0.310381 |
5 | Iraq | Middle East | 38433600 | 4451516.0 | 1.430690e+11 | True | 0.115824 |
4 | Iran | Middle East | 81800188 | 3990956.0 | 1.575300e+11 | True | 0.048789 |
11 | UAE | Middle East | 9630959 | 3106077.0 | 9.780000e+10 | True | 0.322510 |
6 | Kuwait | Middle East | 4137312 | 2923825.0 | 1.015000e+11 | True | 0.706697 |
12 | Venezuela | South America | 28887118 | 2276967.0 | 2.999530e+11 | True | 0.078823 |
8 | Nigeria | West Africa | 195874685 | 1999885.0 | 3.707000e+10 | True | 0.010210 |
1 | Angola | Southern Africa | 30809787 | 1769615.0 | 8.423000e+09 | True | 0.057437 |
15 | Qatar | Middle East | 2781682 | 1522902.0 | 2.524400e+10 | False | 0.547475 |
0 | Algeria | North Africa | 42228408 | 1348361.0 | 1.220000e+10 | True | 0.031930 |
14 | Indonesia | Southeast Asia | 267670543 | 833667.0 | 3.692500e+09 | False | 0.003115 |
13 | Ecuador | South America | 17084358 | 548421.0 | 8.273000e+09 | False | 0.032101 |
7 | Libya | North Africa | 6678559 | 384686.0 | 4.836300e+10 | True | 0.057600 |
9 | Republic of the Congo | Central Africa | 5125821 | 260000.0 | 1.600000e+09 | True | 0.050724 |
3 | Gabon | Central Africa | 2119275 | 210820.0 | 2.000000e+09 | True | 0.099477 |
2 | Equatorial Guinea | Central Africa | 1308975 | NaN | NaN | True | NaN |
We can also group entries by a common property. In this dataset, Region is the natural grouping. We have to specify what operation is applied to numeric data, e.g. adding or averaging the values for a given group.
OPEC_df.groupby('Region').sum()
Population | Oil Production | Proven Reserves | Membership Active | Production per cap | |
---|---|---|---|---|---|
Region | |||||
Central Africa | 8554071 | 470820.0 | 3.600000e+09 | 3 | 0.150201 |
Middle East | 170486497 | 26455986.0 | 7.917210e+11 | 5 | 2.051676 |
North Africa | 48906967 | 1733047.0 | 6.056300e+10 | 2 | 0.089530 |
South America | 45971476 | 2825388.0 | 3.082260e+11 | 1 | 0.110924 |
Southeast Asia | 267670543 | 833667.0 | 3.692500e+09 | 0 | 0.003115 |
Southern Africa | 30809787 | 1769615.0 | 8.423000e+09 | 1 | 0.057437 |
West Africa | 195874685 | 1999885.0 | 3.707000e+10 | 1 | 0.010210 |
OPEC_df.groupby('Region').mean()
Population | Oil Production | Proven Reserves | Membership Active | Production per cap | |
---|---|---|---|---|---|
Region | |||||
Central Africa | 2.851357e+06 | 235410.0 | 1.800000e+09 | 1.000000 | 0.075100 |
Middle East | 2.841442e+07 | 4409331.0 | 1.319535e+11 | 0.833333 | 0.341946 |
North Africa | 2.445348e+07 | 866523.5 | 3.028150e+10 | 1.000000 | 0.044765 |
South America | 2.298574e+07 | 1412694.0 | 1.541130e+11 | 0.500000 | 0.055462 |
Southeast Asia | 2.676705e+08 | 833667.0 | 3.692500e+09 | 0.000000 | 0.003115 |
Southern Africa | 3.080979e+07 | 1769615.0 | 8.423000e+09 | 1.000000 | 0.057437 |
West Africa | 1.958747e+08 | 1999885.0 | 3.707000e+10 | 1.000000 | 0.010210 |
Add to the DataFrame a column for proven reserves per capita. Output: