#!/usr/bin/env python
# coding: utf-8

# # Module 3
# 
# ## Video 13: Pandas
# **Python for the Energy Industry**
# 
# Pandas is a Python module designed for working with tabular data. The core data structure of pandas is the DataFrame. DataFrames have a lot in common with NumPy arrays, with two main differences:
# - They can also store non-numeric data
# - The columns in a DataFrame are generally given text labels
# 
# A DataFrame can be created from a dictionary, where each key is a column label and each value is a list holding that column's value for every entry.

# In[1]:


import pandas as pd

country_df = pd.DataFrame({
    'Country': ['China', 'US', 'Russia', 'UK'],
    'Population': [1439323776, 331002651, 145934462, 67886011],
    'HDI': [0.758, 0.920, 0.824, 0.920]
})

print(country_df)


# In this example, there are 4 entries representing countries, each with a corresponding population and HDI (Human Development Index) value. A particular column can be accessed in the same way as a value is accessed in a dictionary, using the column label as the key:

# In[2]:


print(country_df['Population'])


# Multiple columns can be accessed at once by passing a list of labels:

# In[3]:


print(country_df[['Country', 'Population']])


# You can see that these entries have numeric indices from 0 to 3. We can also use text labels for the indices instead, by setting one of the columns to be the index:

# In[4]:


# set_index returns a new DataFrame, so assign it back to keep the change
country_df = country_df.set_index('Country')
print(country_df)


# This makes a single column easier to read, since each value is now labelled with its country rather than a number:

# In[5]:


print(country_df['HDI'])


# We can also access all the data for a single entry in the DataFrame. This can be done either by its integer position with `.iloc`, or by its index label with `.loc` (if text indices are being used).

# In[6]:


# Accessing the third entry in the DataFrame (position 2, counting from 0)
print(country_df.iloc[2])


# In[7]:


# Accessing the entry with the index 'China'
print(country_df.loc['China'])


# *Note: we will be working with pandas a lot throughout the course. If you want to learn more about any particular feature of pandas, check out the [pandas documentation](https://pandas.pydata.org/docs/).*

# ### Exercise
# 
# You can use NumPy arrays as a source of data when creating a DataFrame. Make a DataFrame with two columns, 'A' and 'B', each of which has 10 random numbers.

# In[ ]:
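# One possible approach to the exercise is sketched below. It assumes NumPy's `default_rng` random generator (available in NumPy 1.17 and later) and a fixed seed of 42, chosen only so that repeated runs give the same numbers; neither the seed nor the variable name `random_df` comes from the original exercise. Each column is filled with 10 uniform random floats in the range [0, 1).

```python
import numpy as np
import pandas as pd

# Seeded generator so repeated runs give the same numbers (seed 42 is an arbitrary choice)
rng = np.random.default_rng(42)

# Build the DataFrame from a dictionary of NumPy arrays, one array per column
random_df = pd.DataFrame({
    'A': rng.random(10),  # 10 uniform random floats in [0, 1)
    'B': rng.random(10),
})

print(random_df)
```

# An equivalent alternative is to pass a single 10 x 2 array and supply the column labels separately: `pd.DataFrame(rng.random((10, 2)), columns=['A', 'B'])`.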