Data origins from https://opendata.swiss/en/dataset/elektrizitatsproduktionsanlagen and we use the following .csv files.
We use pd.read_csv
to read the csv files into a DataFrame
.
After reading we set the index to the corresponding column which makes it easier to join tables and join the (sub)-category names in order to have all information in one epp
table
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams["figure.figsize"] = (20, 10)
epp = pd.read_csv('../data/ch.bfe.elektrizitaetsproduktionsanlagen/ElectricityProductionPlant.csv', parse_dates=['BeginningOfOperation']).set_index('xtf_id')
mainCat = pd.read_csv('../data/ch.bfe.elektrizitaetsproduktionsanlagen/MainCategoryCatalogue.csv').set_index('Catalogue_id')
subCat = pd.read_csv('../data/ch.bfe.elektrizitaetsproduktionsanlagen/SubCategoryCatalogue.csv').set_index('Catalogue_id')
plantCat = pd.read_csv('../data/ch.bfe.elektrizitaetsproduktionsanlagen/PlantCategoryCatalogue.csv').set_index('Catalogue_id')
lang='de'
epp = epp.merge(mainCat[lang].rename("MainCategoryName"),how='left', left_on='MainCategory', right_index=True)
epp = epp.merge(subCat[lang].rename("SubCategoryName"),how='left', left_on='SubCategory', right_index=True)
epp = epp.merge(plantCat[lang].rename("PlantCategoryName"),how='left', left_on='PlantCategory', right_index=True)
display(epp.head())
Address | PostCode | Municipality | Canton | BeginningOfOperation | InitialPower | TotalPower | MainCategory | SubCategory | PlantCategory | _x | _y | MainCategoryName | SubCategoryName | PlantCategoryName | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
xtf_id | |||||||||||||||
5646 | Rue des Creusets 41 | 1948 | Fionnay | VS | 1958-03-07 | 1872000.0 | 1872000.0 | maincat_1 | subcat_1 | plantcat_6 | 2589880.0 | 1097661.0 | Wasserkraft | Wasserkraft | Pumpspeicherkraftwerk |
5686 | Binenweg 5 | 3904 | Naters | VS | 1969-09-01 | 349576.0 | 349576.0 | maincat_1 | subcat_1 | plantcat_7 | 2644115.0 | 1131390.0 | Wasserkraft | Wasserkraft | Speicherkraftwerk |
5726 | Robbia 504G | 7741 | San Carlo | GR | 1910-11-03 | 29150.0 | 29150.0 | maincat_1 | subcat_1 | plantcat_2 | 2801863.0 | 1136379.0 | Wasserkraft | Wasserkraft | Ausleitkraftwerk |
5727 | Via Principale 16 | 7744 | Campocologno | GR | 1907-03-01 | 55000.0 | 55000.0 | maincat_1 | subcat_1 | plantcat_7 | 2808646.0 | 1123676.0 | Wasserkraft | Wasserkraft | Speicherkraftwerk |
5730 | Büdemli 65B | 7240 | Küblis | GR | 1922-01-01 | 44200.0 | 44200.0 | maincat_1 | subcat_1 | plantcat_7 | 2778481.0 | 1198505.0 | Wasserkraft | Wasserkraft | Speicherkraftwerk |
Create a pie chart that displays the relative contribution of the main categories to the total power produced in Switzerland.
Create a pie chart that displays the relative contribution of the main categories to the total number of power plants in Switzerland.
(Optional) Can you plot both figures as subplots in one plot?
(Optional) Plot the relative contribution for the sub categories for maincat_2
.
Hints
DataFrame.groupby()
function and sum on the TotalPower
column: epp.groupby('MainCategoryName')['TotalPower'].sum()
Series.plot.pie()
Pandas Doc.count()
Use sns.countplot
to create a bar chart that displays the number of fossil fuel power plants (epp.MainCategory == 'maincat_4'
) for each canton. Can you also display the different sub-categories?
Create a stacked bar chart that displays the contribution to alternative energies (epp.MainCategory == 'maincat_2'
) for each canton.
(Optional) Can you create a similar bar chart with seaborn, i.e. sns.barplot()
? P
(Optional) Play around with the different seaborn
themes darkgrid
, whitegrid
, dark
, white
, and ticks
.
Hints
sns.countplot
you can provide your entier datasetDataFrame.groupby()
function and sum on the TotalPower
column: epp.groupby(['Canton', 'SubCategoryName'], as_index=False)['TotalPower'].sum()
. Furthermore, you can use DataFrame.pivot_table()
to explode the sub categories into columns.
Step 1: Create a line plot over the time that cumulates the energy that is produced until then.
Step 2: Create a line plot with multiple lines. Each line represents the cumulative sum of the energy produced by one main category
Step 3: Create an area plot with the same data as in step 2. With the area plot you can show the individual as well as the total values.
Hints
Step 1:
df = epp.groupby('BeginningOfOperation')['TotalPower' ].sum().sort_index()
you can create a DataFrame
with the added TotalPower
for each BeginningOfOperations
datedf.cumsum
function to return the cumulative sum that you can then plot directly with pandas into a df.plot.line
plot.Step 2:
groupby
not only by BeginningOfOperation
but by ['BeginningOfOperation', 'MainCategoryName']
. Use as_index=False
to have a DataFrame
as a result of the grouping and not a Series
DataFrame.pivot_table()
to explode the main categories into columns.data.replace(np.nan, 0)
to replace missing values with 0