import pandas as pd
import numpy as np
OPEC_df = pd.DataFrame({
'Country': ['Algeria','Angola','Equatorial Guinea','Gabon','Iran','Iraq','Kuwait','Libya','Nigeria','Republic of the Congo','Saudi Arabia','UAE','Venezuela'],
'Region': ['North Africa','Southern Africa','Central Africa','Central Africa','Middle East','Middle East','Middle East','North Africa','West Africa','Central Africa','Middle East','Middle East','South America'],
'Population': [42228408,30809787,1308975,2119275,81800188,38433600,4137312,6678559,195874685,5125821,33702756,9630959,28887118],
'Oil Production': [1348361,1769615,np.nan,210820,3990956,4451516,2923825,384686,1999885,260000,10460710,3106077,2276967],
'Proven Reserves': [12.2e9,8.423e9,np.nan,2e9,157.53e9,143.069e9,101.5e9,48.363e9,37.07e9,1.6e9,266.578e9,97.8e9,299.953e9]
})
OPEC_df
| | Country | Region | Population | Oil Production | Proven Reserves |
---|---|---|---|---|---|
0 | Algeria | North Africa | 42228408 | 1348361.0 | 1.220000e+10 |
1 | Angola | Southern Africa | 30809787 | 1769615.0 | 8.423000e+09 |
2 | Equatorial Guinea | Central Africa | 1308975 | NaN | NaN |
3 | Gabon | Central Africa | 2119275 | 210820.0 | 2.000000e+09 |
4 | Iran | Middle East | 81800188 | 3990956.0 | 1.575300e+11 |
5 | Iraq | Middle East | 38433600 | 4451516.0 | 1.430690e+11 |
6 | Kuwait | Middle East | 4137312 | 2923825.0 | 1.015000e+11 |
7 | Libya | North Africa | 6678559 | 384686.0 | 4.836300e+10 |
8 | Nigeria | West Africa | 195874685 | 1999885.0 | 3.707000e+10 |
9 | Republic of the Congo | Central Africa | 5125821 | 260000.0 | 1.600000e+09 |
10 | Saudi Arabia | Middle East | 33702756 | 10460710.0 | 2.665780e+11 |
11 | UAE | Middle East | 9630959 | 3106077.0 | 9.780000e+10 |
12 | Venezuela | South America | 28887118 | 2276967.0 | 2.999530e+11 |
For large DataFrames, rather than printing out all of the data, we can peek at the first or last few rows with the 'head' and 'tail' functions:
OPEC_df.head(3)
| | Country | Region | Population | Oil Production | Proven Reserves |
---|---|---|---|---|---|
0 | Algeria | North Africa | 42228408 | 1348361.0 | 1.220000e+10 |
1 | Angola | Southern Africa | 30809787 | 1769615.0 | 8.423000e+09 |
2 | Equatorial Guinea | Central Africa | 1308975 | NaN | NaN |
OPEC_df.tail(3)
| | Country | Region | Population | Oil Production | Proven Reserves |
---|---|---|---|---|---|
10 | Saudi Arabia | Middle East | 33702756 | 10460710.0 | 2.665780e+11 |
11 | UAE | Middle East | 9630959 | 3106077.0 | 9.780000e+10 |
12 | Venezuela | South America | 28887118 | 2276967.0 | 2.999530e+11 |
We can also take a look at what columns we have, and what type of data they store, with the 'info' function.
OPEC_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Country          13 non-null     object
 1   Region           13 non-null     object
 2   Population       13 non-null     int64
 3   Oil Production   12 non-null     float64
 4   Proven Reserves  12 non-null     float64
dtypes: float64(2), int64(1), object(2)
memory usage: 648.0+ bytes
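The 'info' output reports 12 non-null values out of 13 entries for 'Oil Production' and 'Proven Reserves', which means each of those columns has one missing value. As a minimal sketch, the missing values can also be counted directly with `isna()`. The small DataFrame `df` below is a stand-in built from the first three rows of the example data, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Stand-in frame (hypothetical name `df`) with the same kind of gap as OPEC_df
df = pd.DataFrame({
    'Country': ['Algeria', 'Angola', 'Equatorial Guinea'],
    'Population': [42228408, 30809787, 1308975],
    'Oil Production': [1348361, 1769615, np.nan],
})

# isna() marks each missing cell as True; sum() counts the True values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# dtypes lists the data type of every column without the rest of the info() report
print(df.dtypes)
```

Note that a single NaN is enough to make pandas store an otherwise whole-number column as float64, which is why 'Oil Production' is shown with a trailing `.0`.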
We can also get a statistical description of our data. The 'describe' function will return basic stats on all columns with numeric data. There are also functions for each individual statistic.
OPEC_df.describe()
| | Population | Oil Production | Proven Reserves |
---|---|---|---|
count | 1.300000e+01 | 1.200000e+01 | 1.200000e+01 |
mean | 3.697980e+07 | 2.765285e+06 | 9.800717e+10 |
std | 5.295242e+07 | 2.796082e+06 | 1.021860e+11 |
min | 1.308975e+06 | 2.108200e+05 | 1.600000e+09 |
25% | 5.125821e+06 | 1.107442e+06 | 1.125575e+10 |
50% | 2.888712e+07 | 2.138426e+06 | 7.308150e+10 |
75% | 3.843360e+07 | 3.327297e+06 | 1.466842e+11 |
max | 1.958747e+08 | 1.046071e+07 | 2.999530e+11 |
# get the mean of a single column
OPEC_df['Oil Production'].mean()
2765284.8333333335
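The other statistics reported by 'describe' are available the same way. The sketch below copies the 'Oil Production' values into a standalone Series so it runs on its own; the name `oil` is only illustrative.

```python
import numpy as np
import pandas as pd

# 'Oil Production' values copied from OPEC_df above; `oil` is an illustrative name
oil = pd.Series(
    [1348361, 1769615, np.nan, 210820, 3990956, 4451516, 2923825,
     384686, 1999885, 260000, 10460710, 3106077, 2276967],
    name='Oil Production',
)

print(oil.count())           # number of non-missing values
print(oil.median())          # middle value, the 50% row of describe()
print(oil.min(), oil.max())  # smallest and largest values
print(oil.std())             # sample standard deviation

# Missing values are skipped by default; skipna=False lets NaN propagate instead
print(oil.mean(skipna=False))
```

Because missing values are skipped by default, the mean above is computed over 12 countries rather than 13, which matches the 'count' row of the 'describe' output.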
Pandas can also write a DataFrame to a file format such as CSV or Excel, or read a DataFrame in from a file.
# write OPEC_df to a CSV file (index=False leaves out the row index)
OPEC_df.to_csv('OPEC_df.csv', index=False)
# create a new DataFrame from OPEC_df.csv
OPEC_df_copy = pd.read_csv('OPEC_df.csv')
OPEC_df_copy
| | Country | Region | Population | Oil Production | Proven Reserves |
---|---|---|---|---|---|
0 | Algeria | North Africa | 42228408 | 1348361.0 | 1.220000e+10 |
1 | Angola | Southern Africa | 30809787 | 1769615.0 | 8.423000e+09 |
2 | Equatorial Guinea | Central Africa | 1308975 | NaN | NaN |
3 | Gabon | Central Africa | 2119275 | 210820.0 | 2.000000e+09 |
4 | Iran | Middle East | 81800188 | 3990956.0 | 1.575300e+11 |
5 | Iraq | Middle East | 38433600 | 4451516.0 | 1.430690e+11 |
6 | Kuwait | Middle East | 4137312 | 2923825.0 | 1.015000e+11 |
7 | Libya | North Africa | 6678559 | 384686.0 | 4.836300e+10 |
8 | Nigeria | West Africa | 195874685 | 1999885.0 | 3.707000e+10 |
9 | Republic of the Congo | Central Africa | 5125821 | 260000.0 | 1.600000e+09 |
10 | Saudi Arabia | Middle East | 33702756 | 10460710.0 | 2.665780e+11 |
11 | UAE | Middle East | 9630959 | 3106077.0 | 9.780000e+10 |
12 | Venezuela | South America | 28887118 | 2276967.0 | 2.999530e+11 |
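To see what `index=False` does, the sketch below writes a small stand-in DataFrame twice, once without and once with the row index, and reads each file back. The frame `df`, the file name `sample.csv`, and the use of a temporary directory are assumptions made so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Stand-in frame (hypothetical) with one missing value
df = pd.DataFrame({
    'Country': ['Algeria', 'Angola', 'Equatorial Guinea'],
    'Population': [42228408, 30809787, 1308975],
    'Oil Production': [1348361, 1769615, np.nan],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'sample.csv'

    # Without the index: the file holds only the three data columns
    df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

    # With the index (the default): the row labels are written as an unnamed first column
    df.to_csv(path)
    with_index = pd.read_csv(path)

print(list(without_index.columns))
print(list(with_index.columns))

# The missing value is written as an empty field and read back as NaN
print(without_index['Oil Production'].isna().sum())
```

If a file was saved with its index, passing `index_col=0` to `pd.read_csv` turns that first column back into the row index instead of leaving it as a column named 'Unnamed: 0'.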
Obtain a spreadsheet with some data you would like to explore. You can use the example 'countries.csv' on the course page.
Make sure the spreadsheet is in the same folder as this notebook. Read the file in as a DataFrame, and use the above functions to explore the data.