#!/usr/bin/env python
# coding: utf-8

# # Module 3
# 
# ## Video 13: Pandas
# **Python for the Energy Industry**
# 
# Pandas is a Python module designed for working with tabular data. The core data structure of pandas is the DataFrame. DataFrames have a lot in common with numpy arrays, with two main differences:
# - They can also store non-numeric data
# - The columns in a DataFrame are generally given text labels
# 
# A DataFrame can be created from a dictionary, where each key is a column label and each value is a list holding that column's value for every 'entry' (row).

# In[1]:


import pandas as pd

country_df = pd.DataFrame({
    'Country': ['China','US','Russia','UK'],
    'Population': [1439323776, 331002651, 145934462, 67886011],
    'HDI': [0.758, 0.920, 0.824, 0.920]
})

print(country_df)
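

# As a quick check of the two differences listed above, the sketch below inspects a small DataFrame. The name `demo_df` is made up for this illustration and holds the first two rows of the example data; `shape`, `columns` and `dtypes` are standard DataFrame attributes.

# In[ ]:


import pandas as pd

# A small illustrative DataFrame, built the same way as country_df
demo_df = pd.DataFrame({
    'Country': ['China', 'US'],
    'Population': [1439323776, 331002651],
    'HDI': [0.758, 0.920]
})

# Number of rows and columns, much like the shape of a numpy array
print(demo_df.shape)

# The text labels given to the columns
print(list(demo_df.columns))

# Each column has its own data type; the 'Country' column is not numeric
print(demo_df.dtypes)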


# In the `country_df` example, there are 4 entries (rows), one per country, each with a corresponding population and HDI (Human Development Index) value. A particular column can be accessed in the same way as a value is accessed in a dictionary, using the column label as the key:

# In[2]:


print(country_df['Population'])


# Multiple columns can be accessed at once by passing a list of column labels:

# In[3]:


print(country_df[['Country','Population']])
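

# The two kinds of column access return different types, which the minimal sketch below illustrates: a single label gives a pandas Series (one labelled column), while a list of labels gives a DataFrame, even if the list has only one item. The names `demo_df`, `one_column` and `sub_table` are made up for this illustration.

# In[ ]:


import pandas as pd

# A small illustrative DataFrame, built the same way as country_df
demo_df = pd.DataFrame({
    'Country': ['China', 'US'],
    'Population': [1439323776, 331002651],
    'HDI': [0.758, 0.920]
})

one_column = demo_df['Population']     # single label: a Series
sub_table = demo_df[['Population']]    # list of labels: a DataFrame

print(type(one_column))
print(type(sub_table))

# A Series supports numpy-style summaries such as max()
print(one_column.max())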


# You can see that these entries have numeric indices from 0 to 3. We can use text labels for the index instead, by setting one of the columns to be the index:

# In[4]:


# Use the 'Country' column as the index (inplace=True modifies country_df itself)
country_df.set_index('Country', inplace=True)

print(country_df)
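

# As an alternative to `inplace=True`, `set_index` can be called without it, in which case it returns a new DataFrame and leaves the original untouched. The sketch below shows this, along with `reset_index`, which moves the index back into an ordinary column. The names `demo_df` and `indexed_df` are made up for this illustration.

# In[ ]:


import pandas as pd

# A small illustrative DataFrame with a default numeric index
demo_df = pd.DataFrame({
    'Country': ['Russia', 'UK'],
    'HDI': [0.824, 0.920]
})

# Without inplace=True, set_index returns a new DataFrame
indexed_df = demo_df.set_index('Country')
print(indexed_df)

# 'Country' is now the index, so it is no longer listed as a column
print(list(indexed_df.columns))

# reset_index turns the index back into a regular column
print(indexed_df.reset_index())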


# With country names as the index, each value in a column is labelled by its country, which can make a single column a bit easier to read:

# In[5]:


print(country_df['HDI'])
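

# A text index also allows a single value to be looked up by name. The sketch below is one way to do this, assuming a DataFrame whose index holds country names; `demo_df` and `hdi` are made-up names, and the index is supplied here through the `index` argument of `pd.DataFrame` rather than with `set_index`.

# In[ ]:


import pandas as pd

# A small illustrative DataFrame with country names as the index
demo_df = pd.DataFrame(
    {'HDI': [0.824, 0.920]},
    index=['Russia', 'UK']
)

# Select the column, then look up one entry by its index label
hdi = demo_df['HDI']
print(hdi['UK'])

# The same value using .loc with a row label and a column label
print(demo_df.loc['UK', 'HDI'])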


# We can also access all data corresponding to a single entry in the DataFrame. This can be done either by the entry's numerical position, using `.iloc`, or by its index label (if text indices are being used), using `.loc`.

# In[6]:


# Accessing the third entry in the DataFrame (position 2, counting from 0)
print(country_df.iloc[2])


# In[7]:


# Accessing the entry with the index 'China'
print(country_df.loc['China'])
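

# Both `.loc` and `.iloc` can also select several entries at once. The sketch below shows a list of labels and a slice for each accessor; note that an `.iloc` slice excludes its end position, while a `.loc` slice includes its end label. The name `demo_df` is made up for this illustration.

# In[ ]:


import pandas as pd

# A small illustrative DataFrame with country names as the index
demo_df = pd.DataFrame({
    'Country': ['China', 'US', 'Russia', 'UK'],
    'HDI': [0.758, 0.920, 0.824, 0.920]
}).set_index('Country')

# A list of labels selects several entries at once
print(demo_df.loc[['China', 'UK']])

# An iloc slice excludes the end position: entries 0 and 1
print(demo_df.iloc[0:2])

# A loc slice includes the end label: 'China', 'US' and 'Russia'
print(demo_df.loc['China':'Russia'])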


# *Note: we will be working with pandas a lot throughout the course. If you want to learn more about any particular features of pandas, check out the [pandas documentation.](https://pandas.pydata.org/docs/)*

# ### Exercise
# 
# You can use numpy arrays as a source of data when creating a DataFrame. Make a DataFrame with two columns 'A' and 'B', each of which has 10 random numbers.
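# 
# As a hint, the sketch below shows two ways numpy arrays can feed a DataFrame, using consecutive integers rather than random numbers so the exercise is left for you. The names `values`, `array_df` and `dict_df` and the column labels 'X' and 'Y' are made up for this illustration.

# In[ ]:


import numpy as np
import pandas as pd

# A 3x2 array of consecutive integers (deliberately not random)
values = np.arange(6).reshape(3, 2)

# Each array column becomes a DataFrame column; `columns` supplies the labels
array_df = pd.DataFrame(values, columns=['X', 'Y'])
print(array_df)

# A dictionary of 1-D arrays works as well, one array per column
dict_df = pd.DataFrame({'X': np.arange(3), 'Y': np.ones(3)})
print(dict_df)
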

# In[ ]: