National generation capacity: Processing notebook

This Notebook is part of the National Generation Capacity Datapackage of Open Power System Data.

Table of Contents¶

1. Introductory notes
2. Script setup
3. Data download and processing
4. Convert stacked data to crosstable format
5. Output
6. Documentation of the data package

1. Introductory notes¶

The script processes the compiled nationally aggregated generation capacity for European countries. Due to varying formats and data specifications of references for national generation capacity, the script firstly focuses on rearranging the manually compiled data. Thus, the script itself does not collect, select, download or manage data from original sources. Secondly, international data sources, such as EUROSTAT and ENTSO-E, are directly downloaded from original web sources and complement the initial data set.

2. Script setup¶

In [1]:

# some functions and classes that are defined in seperate files
import functions.helper_functions as func
import functions.soaf as soaf

# core packages
import os
import pandas as pd
import numpy as np

# packages to copy files, write sqllite data bases and manipulate excel files
import shutil
import sqlite3
import openpyxl
from openpyxl.styles import PatternFill, colors, Font, Alignment
from openpyxl.utils import get_column_letter

import yaml
import json

3. Data download and processing¶

We compile data from different national and international sources. Firstly, national data sources are manually compiled due to varying data formats and specifications. Secondly, international sources are compiled directly and appended to the compiled data set. The international data sources comprise:

In the following section, the data sets are downloaded and uploaded to Python.

3.1 Manually compiled dataset¶

The manually compiled dataset is imported and rearranged to a DataFrame for further processing. The dataset comprises for each European country and specified generation technology different data entries, which are based on different sources. As these sources differ by country and year, information on the corresponding reference are directly given with the data entry.

In [2]:

data_file = 'National_Generation_Capacities.xlsx'
filepath = os.path.join('input', data_file)

# Read data into pandas
data_raw = pd.read_excel(filepath,
                         sheet_name='Summary',
                         header=None,
                         na_values=['-'],
                         skiprows=0)




# Deal with merged cells from Excel: fill first three rows with information
data_raw.iloc[0:2] = data_raw.iloc[0:2].fillna(method='ffill', axis=1)

# Set index for rows
data_raw = data_raw.set_index([0])
data_raw.index.name = 'technology'

# Extract energylevels from raw data for later use
energylevels_raw = data_raw.iloc[:, 0:5]

energylevels_raw.head()

Out[2]:

	1	2	3	4	5
technology
NaN	Energy source levels and technology	Energy source levels and technology	Energy source levels and technology	Energy source levels and technology	Energy source levels and technology
NaN	Total generation capacity	Generation capacity by energy type (fossil, nu...	Generation capacity by fuel or energy source (...	Generation capacity by fuel or energy source a...	Generation capacity by fuel and technology (e....
NaN	NaN	NaN	NaN	NaN	NaN
NaN	NaN	NaN	NaN	NaN	NaN
NaN	NaN	NaN	NaN	NaN	NaN

In [3]:

# Delete definition of energy levels from raw data
data_raw.drop(data_raw.columns[[0, 1, 2, 3, 4, 5]], axis=1, inplace=True)

level_names = ['country', 'type', 'year', 'source',
               'source_type', 'weblink', 'capacity_definition']
# Set multiindex column names
data_raw.columns = pd.MultiIndex.from_arrays(data_raw[:7].values,
                                             names=level_names)

# Remove 3 rows which are already used as column names
data_raw = data_raw[pd.notnull(data_raw.index)]

# Extract the ordering of technologies
technology_order = data_raw.index.str.replace('- ', '').values.tolist()

data_raw.head()

Out[3]:

country	AT						BE				...	SI			SK
type	Installed capacity in MW						Installed capacity in MW				...	Installed capacity in MW			Installed capacity in MW
year	2014.0	2015.0	2016.0	2017.0	2018.0	2019.0	2015.0	2016.0	2017.0	2018.0	...	2015.0	2016.0	2017.0	2013.0	2014.0	2015.0	2016.0	2017.0	2018.0	NaN
source	e-control	e-control	e-control	e-control	e-control	e-control	ELIA	ELIA	ELIA	ELIA	...	Agencija za energijo	Agencija za energijo	Agencija za energijo	Statistical Office of Slovakia	Statistical Office of Slovakia	Statistical Office of Slovakia	Statistical Office of Slovakia	Statistical Office of Slovakia	Statistical Office of Slovakia
source_type	Regulatory Authority	Regulatory Authority	Regulatory Authority	Regulatory Authority	Regulatory Authority	Regulatory Authority	Transmission System Operator	Transmission System Operator	Transmission System Operator	Transmission System Operator	...	Ministry	Ministry	Ministry	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	NaN
weblink	https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538	https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538	https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538	https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538	https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538	https://www.e-control.at/documents/1785851/1811756/BeStGes-2019_KW2EPLTyp.xlsx/e9993ee3-838f-ed2e-2031-7503418fac2d?t=1596094915390	https://www.elia.be/en/grid-data/power-generation/generating-facilities	https://www.elia.be/en/grid-data/power-generation/generating-facilities	https://www.elia.be/en/grid-data/power-generation/generating-facilities	https://www.elia.be/en/grid-data/power-generation/generating-facilities	...	https://www.agen-rs.si/documents/54870/68629/Report-on-the-energy-sector-in-Slovenia-for-2015/f1302ae0-7267-4ae7-b74d-7ce8c4323043	https://www.agen-rs.si/documents/54870/68629/Report-on-the-energy-sector-in-Slovenia-for-2016/de8cc94f-b3f6-4d32-8e79-ce3b0ec386f5	https://www.agen-rs.si/documents/54870/68629/a/78f74b68-dbfc-415e-ab88-882652558d94	Link unavailable	https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsRZJIHHEGAOxQwwYEuvSASM31AE7jWsnb1_Z8aRtGEM7LQcNsP8Hu6t_hQS6R6LJ99WXfFdtmvxRPS-E-TliAR8OdRuAGSMIrtNR5CWWDhTQ3UngeLZP2RSATz0DAttPYysiBGyChArbURQl0ywDL8NjCIjuQZimADE783DhsuHP-Gw6e0vQY3MdgjBM0oxH5GZOT_x7_nziOCpszEdGFpKZS898h6CLt6GHP-f_M2wwUOFE5Z4lkECbbwnE5fLcELrrPwo6-DTo6Z87Njp5h3bzxx_8E-986P-tY7nKf75pU-bqlDOUIYHEstltdw9osSme8wdNNtr2e3EFanms1vmyklewf5a79euvbwCXukGkoVFJLLWsQLO4WWiSyWVBVyujMI3jt7fLqkQLHUtSyJxpK4xXGjUtS-OlBUpdMpaTEuu8_LiX7WER3Vb_wLeL_b0XsZq3YIh1nzoTwjPW4lsC0TfL4iTpmua-Gt4E42SGbYt6zii-VQM7dzBPpiYG0M-CLsP1WV70nQndgrmJQn9TS5TIBi1Uy9jFnkwwuttX8oDSZvOtVsdh8peO8AFdI1EV9eCwrAcw0DFnJsVqMy1M1P3xfK2-Pj0JW5l40-zkyw7d_3cXb-v0_ao5ea0qbR27sX-4cQkVi08_AHhGIRI!/dz/d5/L2dBISEvZ0FBIS9nQSEh/	https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsVZIgDhijIHYIQYMiXXpCAwNdcBOQ-3k7Ss7nrQNY2in1YEB9v9gd_WvEEf3iDdiX30RbbVtxKN8XnH9c2j4bDzGFoChTcC_TiahG5sYKKC7k8B2LY8acwA2dzXwLS-JzJAQsAjiMmyFYRjP0xTcVJ2CT7ALQZIARMaZhwvLgj_j0_niLUHXWGLwgyBOUhaSmyU98e_5s5lty7C2nGhpQBYOPfM9gj7eggH-nP_PsGaADMcy9zSGGLp8R8Avl-cE0F__UdDDJ_5A_5yp1svbtJ8__uCfePtD_29t05H-83SLGg6mzEAp4ojnTbtrH9Bqmz2LB6VolN337Ark5bHaiLwqrmD_XLSb11_fiLUotVKjSi4IVijOiMKKfK0wprFCFQwyio_f3uXVGq0wlEKYTFd0zRAKZSJXmF6aikl1BkIzszIvP-5ld1h4v9U_8N1if-9FJOfNH6vYo_aMsNTo8B0BH5plfpL0TfNQDW-CabxQLZO69iS6lQO7tFUWz3UVAJ8FfYYbsjwfOhP6BUsdBd62LlBcNGglW2Zc7MlMRXf7qjigpNl-q-VxGP-lIzxA14hXWT065PUIRlhlhk5VuZmmSuT98Xytvj49cUuaeNu0xUuL7v-7i3d18r5qRl6rStlETuQdbhxC-erTD_ffAqA!/dz/d5/L2dBISEvZ0FBIS9nQSEh/	https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsRYkIXHEGAOxQwwYGnTpAMYNdcBOQ-3m7Ss7nrQ1Y2in1YEB9v9gd_WvkEAPSDTZvvqctdW2yZ7kcyr0TwHz-HismgCMTsC7jSeBExkqEEAfTwLLMV3C5gB87lDwTDcOjQBjMDESMmwGQRDNkwScRJuCh1UH_DgGCNmZhyvLhD_jk_niLUGHLVXwfD-KEx7guyU58e_585llyTBdTmji44VNznyPoI83YYA_5_8zTBnIcCRzTyKIoMt3BOJ6ebYP_fUfBT187A30z57SXt4i_fzxB__EWxf9v7cMW_rP1U3CbJVwhhIkkCiadtc-onSbv2SPStkou2_5DcjLU7XJiqq8gf1L2W5ef32TGWrGVsZaWfFMV4hOqMKpvCsxLwzAJIeCHr-9K6oVSiHT8hUjupIX60Ihqk4VyZfKeo1xSTnOoNAu97I7LKLf6hd8t9jfexHKefPGmuoSa4Z5wjp8RyCGZlmcJH3TPFTDm2AaLTTTII41Ce_lwC4tjUdzXQNQz4I-ww1ZXgydCf2CpY58d1uXKCoblMqWsas9mck93VflAcXN9mstj8PoLx3hArpFosrr0aGoRzBSNc50osnNNDQs74_na_Xl-VmY0sTbpi2_t-jhv7t4V8fvq-b4taqUTWiH7uHOxkSkH34AcI_n7A!!/dz/d5/L2dBISEvZ0FBIS9nQSEh/	https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZRRd5owFMc_yx76iLmEQJJHRASqpQLCal52AOPKLGhXp-u3X7SebpMjbGfLAwe4_x_ce_O_QQI9INHk--pzvqs2Tf6knhfC-hTRgA2Hug1AzREEt-ko8hKuAwH08SRwPNsndArApp4Jge2nMY8MA2wDCRW2oyhKplkGXobHEBi6B2GaAsT0zMOVZcOf8dl09pagR-c6BGGYpBmLjLs5OfHv-bOJ46iwOR-ZWWjMXHLmOwRdvA09_Dn_n2GTggonKvcsgQTafEsgrpfnhtBd_1HQwadBT__csdnJO6SbP_7gn3jnov_3DneV_3zLJtTVCaMoQwKJstltd49osSle8kdNNtr2W3ED6vJUrfOykjewf5G79euvbzDnJTcLoi1zk2gEl1IruGFpRcFXJsF0xTAcv70tqyVa6IwA6HylSWZxjchlqTFpgsYMbEnMTGtF8eVetodFdFv9gm8X-3svYjVvwRDrPnEmBstoi28JRN8si5Oka5r7angTjJMZtjnxnFF8rwZ27mCWTC2sWngWdBmuz_Ki70zoFswtFPqbWqJENmihWkav9mSi9nRfyQNKm83XWh2HyV86wgd0i0RV1INDWQ9goGNGLYLVZnJsqPvj-Vp9eX4WtjLxptnJ7zv08N9dvK3T91Uz47WqtHXsxv7hzjWIWHz4AZRUN9Q!/dz/d5/L2dBISEvZ0FBIS9nQSEh/	https://slovak.statistics.sk/PortalTraffic/fileServlet?Dokument=da19bcb8-8c9c-4757-aba5-2d0720266efa
capacity_definition	Gross capacity	Gross capacity	Gross capacity	Gross capacity	Gross capacity	Gross capacity	Unknown	Unknown	Unknown	Unknown	...	Net capacity	Net capacity	Net capacity	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	NaN
technology
Fossil fuels	7243.63	7059.07	7323	6469	6492	6041	NaN	NaN	NaN	NaN	...	1442.97	1450.5	1223	3348	2907	2586	3164.51	3266.97	2581.47	NaN
- Lignite	0	0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
- Hard coal	1171	1171	NaN	1012	1012	598	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
- Oil	333.736	173.636	NaN	168	168	164	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	140	138	138	NaN	NaN	NaN	NaN
- Natural gas	4888.66	4820.19	NaN	4853	4876	4429	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	1016	1022	959	NaN	NaN	NaN	NaN

5 rows × 157 columns

In [4]:

data_raw.columns[data_raw.columns.duplicated()]

Out[4]:

MultiIndex([], names=['country', 'type', 'year', 'source', 'source_type', 'weblink', 'capacity_definition'])

In [5]:

data_raw[data_raw.index.duplicated()]

Out[5]:

country	AT						BE				...	SI			SK
type	Installed capacity in MW						Installed capacity in MW				...	Installed capacity in MW			Installed capacity in MW
year	2014.0	2015.0	2016.0	2017.0	2018.0	2019.0	2015.0	2016.0	2017.0	2018.0	...	2015.0	2016.0	2017.0	2013.0	2014.0	2015.0	2016.0	2017.0	2018.0	NaN
source	e-control	e-control	e-control	e-control	e-control	e-control	ELIA	ELIA	ELIA	ELIA	...	Agencija za energijo	Agencija za energijo	Agencija za energijo	Statistical Office of Slovakia	Statistical Office of Slovakia	Statistical Office of Slovakia	Statistical Office of Slovakia	Statistical Office of Slovakia	Statistical Office of Slovakia
source_type	Regulatory Authority	Regulatory Authority	Regulatory Authority	Regulatory Authority	Regulatory Authority	Regulatory Authority	Transmission System Operator	Transmission System Operator	Transmission System Operator	Transmission System Operator	...	Ministry	Ministry	Ministry	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	NaN
weblink	https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538	https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538	https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538	https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538	https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538	https://www.e-control.at/documents/1785851/1811756/BeStGes-2019_KW2EPLTyp.xlsx/e9993ee3-838f-ed2e-2031-7503418fac2d?t=1596094915390	https://www.elia.be/en/grid-data/power-generation/generating-facilities	https://www.elia.be/en/grid-data/power-generation/generating-facilities	https://www.elia.be/en/grid-data/power-generation/generating-facilities	https://www.elia.be/en/grid-data/power-generation/generating-facilities	...	https://www.agen-rs.si/documents/54870/68629/Report-on-the-energy-sector-in-Slovenia-for-2015/f1302ae0-7267-4ae7-b74d-7ce8c4323043	https://www.agen-rs.si/documents/54870/68629/Report-on-the-energy-sector-in-Slovenia-for-2016/de8cc94f-b3f6-4d32-8e79-ce3b0ec386f5	https://www.agen-rs.si/documents/54870/68629/a/78f74b68-dbfc-415e-ab88-882652558d94	Link unavailable	https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsRZJIHHEGAOxQwwYEuvSASM31AE7jWsnb1_Z8aRtGEM7LQcNsP8Hu6t_hQS6R6LJ99WXfFdtmvxRPS-E-TliAR8OdRuAGSMIrtNR5CWWDhTQ3UngeLZP2RSATz0DAttPYysiBGyChArbURQl0ywDL8NjCIjuQZimADE783DhsuHP-Gw6e0vQY3MdgjBM0oxH5GZOT_x7_nziOCpszEdGFpKZS898h6CLt6GHP-f_M2wwUOFE5Z4lkECbbwnE5fLcELrrPwo6-DTo6Z87Njp5h3bzxx_8E-986P-tY7nKf75pU-bqlDOUIYHEstltdw9osSme8wdNNtr2e3EFanms1vmyklewf5a79euvbwCXukGkoVFJLLWsQLO4WWiSyWVBVyujMI3jt7fLqkQLHUtSyJxpK4xXGjUtS-OlBUpdMpaTEuu8_LiX7WER3Vb_wLeL_b0XsZq3YIh1nzoTwjPW4lsC0TfL4iTpmua-Gt4E42SGbYt6zii-VQM7dzBPpiYG0M-CLsP1WV70nQndgrmJQn9TS5TIBi1Uy9jFnkwwuttX8oDSZvOtVsdh8peO8AFdI1EV9eCwrAcw0DFnJsVqMy1M1P3xfK2-Pj0JW5l40-zkyw7d_3cXb-v0_ao5ea0qbR27sX-4cQkVi08_AHhGIRI!/dz/d5/L2dBISEvZ0FBIS9nQSEh/	https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsVZIgDhijIHYIQYMiXXpCAwNdcBOQ-3k7Ss7nrQNY2in1YEB9v9gd_WvEEf3iDdiX30RbbVtxKN8XnH9c2j4bDzGFoChTcC_TiahG5sYKKC7k8B2LY8acwA2dzXwLS-JzJAQsAjiMmyFYRjP0xTcVJ2CT7ALQZIARMaZhwvLgj_j0_niLUHXWGLwgyBOUhaSmyU98e_5s5lty7C2nGhpQBYOPfM9gj7eggH-nP_PsGaADMcy9zSGGLp8R8Avl-cE0F__UdDDJ_5A_5yp1svbtJ8__uCfePtD_29t05H-83SLGg6mzEAp4ojnTbtrH9Bqmz2LB6VolN337Ark5bHaiLwqrmD_XLSb11_fiLUotVKjSi4IVijOiMKKfK0wprFCFQwyio_f3uXVGq0wlEKYTFd0zRAKZSJXmF6aikl1BkIzszIvP-5ld1h4v9U_8N1if-9FJOfNH6vYo_aMsNTo8B0BH5plfpL0TfNQDW-CabxQLZO69iS6lQO7tFUWz3UVAJ8FfYYbsjwfOhP6BUsdBd62LlBcNGglW2Zc7MlMRXf7qjigpNl-q-VxGP-lIzxA14hXWT065PUIRlhlhk5VuZmmSuT98Xytvj49cUuaeNu0xUuL7v-7i3d18r5qRl6rStlETuQdbhxC-erTD_ffAqA!/dz/d5/L2dBISEvZ0FBIS9nQSEh/	https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsRYkIXHEGAOxQwwYGnTpAMYNdcBOQ-3m7Ss7nrQ1Y2in1YEB9v9gd_WvkEAPSDTZvvqctdW2yZ7kcyr0TwHz-HismgCMTsC7jSeBExkqEEAfTwLLMV3C5gB87lDwTDcOjQBjMDESMmwGQRDNkwScRJuCh1UH_DgGCNmZhyvLhD_jk_niLUGHLVXwfD-KEx7guyU58e_585llyTBdTmji44VNznyPoI83YYA_5_8zTBnIcCRzTyKIoMt3BOJ6ebYP_fUfBT187A30z57SXt4i_fzxB__EWxf9v7cMW_rP1U3CbJVwhhIkkCiadtc-onSbv2SPStkou2_5DcjLU7XJiqq8gf1L2W5ef32TGWrGVsZaWfFMV4hOqMKpvCsxLwzAJIeCHr-9K6oVSiHT8hUjupIX60Ihqk4VyZfKeo1xSTnOoNAu97I7LKLf6hd8t9jfexHKefPGmuoSa4Z5wjp8RyCGZlmcJH3TPFTDm2AaLTTTII41Ce_lwC4tjUdzXQNQz4I-ww1ZXgydCf2CpY58d1uXKCoblMqWsas9mck93VflAcXN9mstj8PoLx3hArpFosrr0aGoRzBSNc50osnNNDQs74_na_Xl-VmY0sTbpi2_t-jhv7t4V8fvq-b4taqUTWiH7uHOxkSkH34AcI_n7A!!/dz/d5/L2dBISEvZ0FBIS9nQSEh/	https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZRRd5owFMc_yx76iLmEQJJHRASqpQLCal52AOPKLGhXp-u3X7SebpMjbGfLAwe4_x_ce_O_QQI9INHk--pzvqs2Tf6knhfC-hTRgA2Hug1AzREEt-ko8hKuAwH08SRwPNsndArApp4Jge2nMY8MA2wDCRW2oyhKplkGXobHEBi6B2GaAsT0zMOVZcOf8dl09pagR-c6BGGYpBmLjLs5OfHv-bOJ46iwOR-ZWWjMXHLmOwRdvA09_Dn_n2GTggonKvcsgQTafEsgrpfnhtBd_1HQwadBT__csdnJO6SbP_7gn3jnov_3DneV_3zLJtTVCaMoQwKJstltd49osSle8kdNNtr2W3ED6vJUrfOykjewf5G79euvbzDnJTcLoi1zk2gEl1IruGFpRcFXJsF0xTAcv70tqyVa6IwA6HylSWZxjchlqTFpgsYMbEnMTGtF8eVetodFdFv9gm8X-3svYjVvwRDrPnEmBstoi28JRN8si5Oka5r7angTjJMZtjnxnFF8rwZ27mCWTC2sWngWdBmuz_Ki70zoFswtFPqbWqJENmihWkav9mSi9nRfyQNKm83XWh2HyV86wgd0i0RV1INDWQ9goGNGLYLVZnJsqPvj-Vp9eX4WtjLxptnJ7zv08N9dvK3T91Uz47WqtHXsxv7hzjWIWHz4AZRUN9Q!/dz/d5/L2dBISEvZ0FBIS9nQSEh/	https://slovak.statistics.sk/PortalTraffic/fileServlet?Dokument=da19bcb8-8c9c-4757-aba5-2d0720266efa
capacity_definition	Gross capacity	Gross capacity	Gross capacity	Gross capacity	Gross capacity	Gross capacity	Unknown	Unknown	Unknown	Unknown	...	Net capacity	Net capacity	Net capacity	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	NaN
technology

0 rows × 157 columns

In [6]:

# Reshape dataframe to list
data_opsd = pd.DataFrame(data_raw.stack(level=level_names))

# Reset index for dataframe
data_opsd.reset_index(inplace=True)
data_opsd['technology'] = data_opsd['technology'].str.replace('- ', '')

data_opsd.rename(columns={0: 'capacity'}, inplace=True)
data_opsd['capacity'] = pd.to_numeric(data_opsd['capacity'], errors='coerce')

# For some source, permission to publish data
banlist = ['ELIA', 'BMWi', 'Mavir']
davail = 'data available, but cannot be provided'
data_opsd.loc[data_opsd['source'].isin(banlist), 'comment'] = davail

data_opsd.head()

Out[6]:

	technology	country	type	year	source	source_type	weblink	capacity_definition	capacity	comment
0	Fossil fuels	AT	Installed capacity in MW	2014	e-control	Regulatory Authority	https://www.e-control.at/documents/1785851/181...	Gross capacity	7243.634	NaN
1	Fossil fuels	AT	Installed capacity in MW	2015	e-control	Regulatory Authority	https://www.e-control.at/documents/1785851/181...	Gross capacity	7059.065	NaN
2	Fossil fuels	AT	Installed capacity in MW	2016	e-control	Regulatory Authority	https://www.e-control.at/documents/1785851/181...	Gross capacity	7323.000	NaN
3	Fossil fuels	AT	Installed capacity in MW	2017	e-control	Regulatory Authority	https://www.e-control.at/documents/1785851/181...	Gross capacity	6469.000	NaN
4	Fossil fuels	AT	Installed capacity in MW	2018	e-control	Regulatory Authority	https://www.e-control.at/documents/1785851/181...	Gross capacity	6492.000	NaN

3.2 EUROSTAT data¶

EUROSTAT publishes annual structural data on national electricity generation capacities for European countries. The dataset is available in the EUROSTAT database within the category 'Environment and Energy' (nrg_113a).

In [7]:

url_eurostat = 'Link unavailable'

filepath_eurostat = os.path.join('input', 'Eurostat', 'Eurostat.tsv.gz')

data_eurostat = pd.read_csv(filepath_eurostat,
                               compression='gzip',
                               sep='\t|,',
                               engine='python'
                               )
data_eurostat.head()

Out[7]:

	unit	product	indic_nrg	geo\time	2016	2015	2014	2013	2012	2011	...	1999	1998	1997	1996	1995	1994	1993	1992	1991	1990
0	MW	9007	12_1176011	AL	98	98	98	98	97	0	...	0	0	0	0	0	0	0	0	0	0
1	MW	9007	12_1176011	AT	5561	5964	6136	6326	6347	6339	...	5089	4917	5199	5031	5028	5079	5061	5133	4750	4743
2	MW	9007	12_1176011	BA	2065	1765	1765	0	0	0	...	0	0	0	0	0	0	0	0	0	0
3	MW	9007	12_1176011	BE	7182	7199	7202	7559	8182	9128	...	7746	7650	6945	7126	7211	7298	6433	6394	6445	6490
4	MW	9007	12_1176011	BG	3635	3854	4410	4636	4849	4495	...	5172	4977	0	0	0	0	0	0	0	0

5 rows × 31 columns

In [8]:

id_vars = ['unit', 'product','indic_nrg', 'geo\\time']
data_eurostat = pd.melt(data_eurostat, id_vars=id_vars,
                        var_name='year', value_name='value')

data_eurostat.head()

Out[8]:

	unit	product	indic_nrg	geo\time	year	value
0	MW	9007	12_1176011	AL	2016	98
1	MW	9007	12_1176011	AT	2016	5561
2	MW	9007	12_1176011	BA	2016	2065
3	MW	9007	12_1176011	BE	2016	7182
4	MW	9007	12_1176011	BG	2016	3635

In [9]:

data_definition = pd.read_csv(os.path.join('input', 'definition_EUROSTAT_indic.txt'),
                              header=None,
                              names=['indic', 'description',
                                     'energy source'],
                              sep='\t')

data_eurostat = data_eurostat.merge(data_definition,
                                    how='left',
                                    left_on='indic_nrg',
                                    right_on='indic')

The classification of generation capacities in the EUROSTAT dataset is specified in Regulation (EC) No 1099/2008 (Annex B, 3.3). The available EUROSTAT dataset nrg_113a covers the following indicators:

indic_nrg	Description	Technology in OPSD
12_1176011	Electrical capacity, main activity producers - Combustible Fuels	Fossil fuels & bioenergy
12_1176012	Electrical capacity, autoproducers - Combustible Fuels	Fossil fuels & bioenergy
12_1176061	Electrical capacity, main activity producers - Mixed plants
12_1176101	Electrical capacity, main activity producers - Other Sources
12_1176102	Electrical capacity, autoproducers - Other Sources
12_1176111	Electrical capacity, main activity producers - Steam
12_1176112	Electrical capacity, autoproducers - Steam
12_1176121	Electrical capacity, main activity producers - Gas Turbine
12_1176122	Electrical capacity, autoproducers - Gas Turbine
12_1176131	Electrical capacity, main activity producers - Combined Cycle
12_1176132	Electrical capacity, autoproducers - Combined Cycle
12_1176141	Electrical capacity, main activity producers - Internal Combustion
12_1176142	Electrical capacity, autoproducers - Internal Combustion
12_1176401	Electrical capacity, main activity producers - Other Type of Generation
12_1176402	Electrical capacity, autoproducers - Other Type of Generation
12_1176253	Net maximum capacity - Municipal Wastes	Non-renewable waste
12_1176263	Net maximum capacity - Wood/Wood Wastes/Other Solid Wastes	Other bioenergy and renewable waste
12_1176273	Net maximum capacity - Biogases	Biomass and biogas
12_1176283	Net maximum capacity - Industrial Wastes (non-renewable)	Non-renewable waste
12_1176343	Net maximum capacity - Liquid Biofuels	Biomass and biogas
12_1176031	Electrical capacity, main activity producers - Nuclear	Nuclear
12_1176032	Electrical capacity, autoproducers - Nuclear	Nuclear
12_1176051	Electrical capacity, main activity producers - Hydro	Hydro
12_1176052	Electrical capacity, autoproducers - Hydro	Hydro
12_1176071	Net electrical capacity, main activity producers - Pure Pumped Hydro	Pumped storage
12_1176072	Net electrical capacity, autoproducers - Pure Pumped Hydro	Pumped storage
12_117615	Net maximum capacity - Hydro <1 MW
12_117616	Net maximum capacity - Hydro >= 1 MW and <= 10 MW
12_117617	Net maximum capacity - Hydro 10 MW and over
12_1176301	Electrical capacity, main activity producers - Tide, wave and ocean	Marine
12_1176302	Electrical capacity, autoproducers - Tide, wave and ocean	Marine
12_1176303	Net maximum capacity - Tide, Wave, Ocean
12_1176081	Electrical capacity, main activity producers - Geothermal	Geothermal
12_1176082	Electrical capacity, autoproducers - Geothermal	Geothermal
12_1176083	Net maximum capacity - Geothermal
12_1176091	Electrical capacity, main activity producers - Wind	Wind
12_1176092	Electrical capacity, autoproducers - Wind	Wind
12_1176233	Net maximum capacity - Solar Photovoltaic	Photovoltaics
12_1176243	Net maximum capacity - Solar Thermal Electric	Concentrated solar power

Bold rows indicate top level classes within the EUROSTAT classification, whereas normal and italic rows cover different kinds of subclassifications. Especially within the top level 'Combustible fuels' different kinds of subcategorizations based on fuel or technology are available. Simarily, 'Hydro' is differentiated by type (e.g. pumped-hydro storage) or capacity classes. Italic rows are not further considered within the OPSD dataset due to the mismatch with existing technology classes.

In [10]:

data_eurostat = data_eurostat[data_eurostat['energy source'].isnull() == False]

values_as_string = data_eurostat['value'].astype(str)
string_values = values_as_string.str.split(' ', 1).str[0]
string_values.replace(':', np.nan, inplace=True)
subset_nan = string_values.isnull()

data_eurostat['value'] = string_values
data_eurostat['year'] = data_eurostat['year'].astype(int)
data_eurostat['value'] = data_eurostat['value'].astype(float)

data_eurostat.head()

Out[10]:

	unit	product	indic_nrg	geo\time	year	value	indic	description	energy source
0	MW	9007	12_1176011	AL	2016	98.0	12_1176011	Electrical capacity, main activity producers -...	Fossil fuels
1	MW	9007	12_1176011	AT	2016	5561.0	12_1176011	Electrical capacity, main activity producers -...	Fossil fuels
2	MW	9007	12_1176011	BA	2016	2065.0	12_1176011	Electrical capacity, main activity producers -...	Fossil fuels
3	MW	9007	12_1176011	BE	2016	7182.0	12_1176011	Electrical capacity, main activity producers -...	Fossil fuels
4	MW	9007	12_1176011	BG	2016	3635.0	12_1176011	Electrical capacity, main activity producers -...	Fossil fuels

In [11]:

data_eurostat = data_eurostat.drop(['unit', 'product', 'indic_nrg',
                                    'indic', 'description'], axis=1)

data_eurostat = data_eurostat.rename(columns={'geo\\time': 'country',
                                              'energy source': 'technology',
                                              'value': 'capacity'})

In [12]:

data_eurostat['country'].replace({'UK': 'GB', 'EL': 'GR'}, inplace=True)
drop_list = data_eurostat[data_eurostat['country'].isin(['EU28','EA19'])].index
data_eurostat.drop(drop_list, inplace=True)

by_columns = ['technology', 'year', 'country']
data_eurostat = pd.DataFrame(data_eurostat.groupby(by_columns)['capacity'].sum())
data_eurostat_isnull = data_eurostat['capacity'].isnull() == True
data_eurostat.reset_index(inplace=True)

data_eurostat.head()

Out[12]:

	technology	year	country	capacity
0	Biomass and biogas	1990	AL	0.0
1	Biomass and biogas	1990	AT	0.0
2	Biomass and biogas	1990	BA	0.0
3	Biomass and biogas	1990	BE	1.0
4	Biomass and biogas	1990	BG	0.0

In [13]:

eurostat_pivot = data_eurostat.pivot_table(values='capacity',
                                 index=['country','year'],
                                 columns='technology')

eurostat_pivot.head()

Out[13]:

	technology	Biomass and biogas	Concentrated solar power	Fossil fuels	Geothermal	Hydro	Marine	Non-renewable waste	Nuclear	Other bioenergy and renewable waste	Photovoltaics	Pumped storage	Wind
country	year
AL	1990	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
	1991	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
	1992	0.0	0.0	0.0	0.0	1453.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
	1993	0.0	0.0	0.0	0.0	1453.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
	1994	0.0	0.0	0.0	0.0	1453.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

In [14]:

eurostat_pivot['Differently categorized solar'] = 0
eurostat_pivot['Solar'] = eurostat_pivot[['Photovoltaics', 'Concentrated solar power']].sum(axis=1)

eurostat_pivot['Differently categorized wind'] = eurostat_pivot['Wind']

bio_arr = ['Biomass and biogas', 'Other bioenergy and renewable waste']
eurostat_pivot['Bioenergy and renewable waste'] = eurostat_pivot[bio_arr].sum(axis=1)

res_arr = ['Hydro', 'Wind', 'Solar', 'Geothermal', 'Marine', 'Bioenergy and renewable waste']
eurostat_pivot['Renewable energy sources'] = eurostat_pivot[res_arr].sum(axis=1)


eurostat_pivot['Fossil fuels'] = eurostat_pivot['Fossil fuels'] - eurostat_pivot['Bioenergy and renewable waste']
eurostat_pivot['Differently categorized fossil fuels'] = eurostat_pivot['Fossil fuels']\
                                                         - eurostat_pivot['Non-renewable waste']

total_arr = ['Fossil fuels','Nuclear','Renewable energy sources']
eurostat_pivot['Total'] = eurostat_pivot[total_arr].sum(axis=1)

eurostat_pivot.head()

01 Oct 2020 16:07:59 - numexpr.utils - INFO - NumExpr defaulting to 4 threads.

Out[14]:

	technology	Biomass and biogas	Concentrated solar power	Fossil fuels	Geothermal	Hydro	Marine	Non-renewable waste	Nuclear	Other bioenergy and renewable waste	Photovoltaics	Pumped storage	Wind	Differently categorized solar	Solar	Differently categorized wind	Bioenergy and renewable waste	Renewable energy sources	Differently categorized fossil fuels	Total
country	year
AL	1990	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0.0	0.0	0.0	0.0	0.0	0.0
	1991	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0.0	0.0	0.0	0.0	0.0	0.0
	1992	0.0	0.0	0.0	0.0	1453.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0.0	0.0	0.0	1453.0	0.0	1453.0
	1993	0.0	0.0	0.0	0.0	1453.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0.0	0.0	0.0	1453.0	0.0	1453.0
	1994	0.0	0.0	0.0	0.0	1453.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0.0	0.0	0.0	1453.0	0.0	1453.0

In [15]:

data_eurostat = eurostat_pivot.stack().reset_index().rename(columns={0: 'capacity'})

data_eurostat['source'] = 'EUROSTAT'
data_eurostat['source_type'] = 'Statistical Office'
data_eurostat['capacity_definition'] = 'Unknown'
data_eurostat['type'] = 'Installed capacity in MW'
data_eurostat['weblink'] = url_eurostat

data_eurostat.head()

Out[15]:

	country	year	technology	source	source_type	capacity_definition	type	weblink
0	AL	1990	Biomass and biogas	EUROSTAT	Statistical Office	Unknown	Installed capacity in MW	Link unavailable
1	AL	1990	Concentrated solar power	EUROSTAT	Statistical Office	Unknown	Installed capacity in MW	Link unavailable
2	AL	1990	Fossil fuels	EUROSTAT	Statistical Office	Unknown	Installed capacity in MW	Link unavailable
3	AL	1990	Geothermal	EUROSTAT	Statistical Office	Unknown	Installed capacity in MW	Link unavailable
4	AL	1990	Hydro	EUROSTAT	Statistical Office	Unknown	Installed capacity in MW	Link unavailable

3.3 ENTSO-E data¶

The ENTSO-E publishes annual data on national generation capacites in different specifications and formats. We use two relevant data sources from the ENTSOE-E, which comprises firstly statistical data within the Data Portal (up to 2015) or ENTSO-E Transparency Platform, and secondly datasets compiled within the ENTSO-E System Outlook & Adequacy Forecast (SO&AF). The ENTSO-E Transparency Platform is currently not implemented as a data source for national generation capacities.

The advantage of the ENTSO-E SO&AF is the higher granularity of the data with respect to the main fuel or technology. However, as the SO&AF provides a forecast on future system conditions in particular peak hours, the dataset also accounts for expected capacity changes throughout the years. Therefore, we only consider years which are closest to the publication year of the respective SO&AF.

3.3.1 ENTSO-E Statistical Data¶

In the following, we use the statistical data available in the Data Portal (up to 2015).

In [16]:

url_entsoe = 'https://docstore.entsoe.eu/Documents/Publications/Statistics/NGC_2010-2015.xlsx'

filepath_entsoe = func.downloadandcache(url_entsoe, 'Statistics.xls',
                                     os.path.join('ENTSO-E','Data Portal 2010-2015')
                                     )

data_entsoe_raw = pd.read_excel(filepath_entsoe)

data_entsoe_raw.head()

01 Oct 2020 16:07:59 - root - INFO - Downloading file Statistics.xls

Out[16]:

	Country	year	hydro	of which storage	of which run of river	renewable	of which wind	of which solar	of which biomass	fossil_fueals	other	representativity
0	AT	2010	12665.0	7205.0	5241.0	1031.0	1002.0	0.0	0.0	7389.0	0.0	100.0
1	AT	2011	13200.0	0.0	0.0	1542.0	1056.0	55.0	0.0	7818.0	68.0	100.0
2	AT	2012	13350.0	0.0	0.0	1906.0	1307.0	172.0	427.0	7834.0	74.0	100.0
3	AT	2013	13427.0	0.0	0.0	2305.0	1555.0	324.0	426.0	7847.0	244.0	100.0
4	AT	2014	13569.0	0.0	0.0	3413.0	2110.0	589.0	512.0	7243.0	0.0	100.0

In [17]:

dict_energy_source = {'hydro': 'Hydro',
                      'of which storage': 'Reservoir',
                      'of which run of river': 'Run-of-river',
                      'of which pumped storage': 'Pumped storage',
                      'nuclear': 'Nuclear',
                      'of which wind': 'Wind',
                      'of which solar': 'Solar',
                      'of which biomass': 'Biomass and biogas',
                      'fossil_fuels': 'Fossil fuels',
                      'other': 'Other or unspecified energy sources',
                      "Country": "country",
                      'fossil_fueals': 'Fossil fuels'}

data_entsoe_raw.rename(columns=dict_energy_source,
                       inplace=True)

data_entsoe_raw.drop(columns='representativity', inplace=True)

data_entsoe_raw.head()

Out[17]:

	country	year	Hydro	Reservoir	Run-of-river	renewable	Wind	Solar	Biomass and biogas	Fossil fuels	Other or unspecified energy sources
0	AT	2010	12665.0	7205.0	5241.0	1031.0	1002.0	0.0	0.0	7389.0	0.0
1	AT	2011	13200.0	0.0	0.0	1542.0	1056.0	55.0	0.0	7818.0	68.0
2	AT	2012	13350.0	0.0	0.0	1906.0	1307.0	172.0	427.0	7834.0	74.0
3	AT	2013	13427.0	0.0	0.0	2305.0	1555.0	324.0	426.0	7847.0	244.0
4	AT	2014	13569.0	0.0	0.0	3413.0	2110.0	589.0	512.0	7243.0	0.0

In [18]:

data_entsoe_raw['Differently categorized solar'] = data_entsoe_raw['Solar']
data_entsoe_raw['Differently categorized wind'] = data_entsoe_raw['Wind']
data_entsoe_raw['Bioenergy and renewable waste'] = data_entsoe_raw['Biomass and biogas']
data_entsoe_raw['Differently categorized fossil fuels'] = data_entsoe_raw['Fossil fuels']


data_entsoe_raw['Differently categorized hydro'] = (
        data_entsoe_raw['Hydro']
        - data_entsoe_raw['Run-of-river']
        - data_entsoe_raw['Reservoir']
        - data_entsoe_raw['Pumped storage'])

data_entsoe_raw['Differently categorized renewable energy sources'] = (
        data_entsoe_raw['renewable']
        - data_entsoe_raw['Wind']
        - data_entsoe_raw['Solar']
        - data_entsoe_raw['Biomass and biogas'])

data_entsoe_raw.drop(columns='renewable', inplace=True)

data_entsoe_raw['Renewable energy sources'] = (
        data_entsoe_raw['Hydro']
        + data_entsoe_raw['Wind']
        + data_entsoe_raw['Solar']
        + data_entsoe_raw['Bioenergy and renewable waste']
        + data_entsoe_raw['Differently categorized renewable energy sources'])

data_entsoe_raw['Total'] = (
        data_entsoe_raw['Renewable energy sources']
        + data_entsoe_raw['Nuclear']
        + data_entsoe_raw['Fossil fuels']
        + data_entsoe_raw['Other or unspecified energy sources'])

data_entsoe = pd.melt(data_entsoe_raw,
                      id_vars=['country', 'year'],
                      var_name='technology',
                      value_name='capacity')

data_entsoe.head()

Out[18]:

	country	year	technology	capacity
0	AT	2010	Hydro	12665.0
1	AT	2011	Hydro	13200.0
2	AT	2012	Hydro	13350.0
3	AT	2013	Hydro	13427.0
4	AT	2014	Hydro	13569.0

In [19]:

data_entsoe['country'].replace('NI', 'GB', inplace=True)
# set negative capacities to zero
data_entsoe.loc[data_entsoe['capacity'] < 0, 'capacity'] = 0

data_entsoe['source'] = 'ENTSO-E Data Portal'
data_entsoe['source_type'] = 'Other association'
data_entsoe['capacity_definition'] = 'Net capacity'
data_entsoe['type'] = 'Installed capacity in MW'

data_entsoe.head()

Out[19]:

	country	year	technology	capacity	source	source_type	capacity_definition	type
0	AT	2010	Hydro	12665.0	ENTSO-E Data Portal	Other association	Net capacity	Installed capacity in MW
1	AT	2011	Hydro	13200.0	ENTSO-E Data Portal	Other association	Net capacity	Installed capacity in MW
2	AT	2012	Hydro	13350.0	ENTSO-E Data Portal	Other association	Net capacity	Installed capacity in MW
3	AT	2013	Hydro	13427.0	ENTSO-E Data Portal	Other association	Net capacity	Installed capacity in MW
4	AT	2014	Hydro	13569.0	ENTSO-E Data Portal	Other association	Net capacity	Installed capacity in MW

3.3.2 ENTSO-E SO&AF data¶

In [20]:

soafs = [soaf.SoafDataRaw('https://www.entsoe.eu/fileadmin/user_upload/_library/SDC/SOAF/SO_AF_2011_-_2025_.zip',
                          'SO_AF_2011_-_2025_.zip',
                          'SO&AF 2011 - 2025 Scenario B.xls',
                          2011),

        soaf.SoafDataRaw('https://www.entsoe.eu/fileadmin/user_upload/_library/SDC/SOAF/120705_SOAF_2012_Dataset.zip',
                         '120705_SOAF_2012_Dataset.zip',
                         'SOAF 2012 Scenario B.xls',
                         2012),
                             
        soaf.SoafDataRaw('https://www.entsoe.eu/fileadmin/user_upload/_library/publications/entsoe/So_AF_2013-2030/130403_SOAF_2013-2030_dataset.zip',
                         '130403_SOAF_2013-2030_dataset.zip',
                         'ScB.xls',
                         2013),
                             
        soaf.SoafDataRaw('https://www.entsoe.eu/Documents/SDC%20documents/SOAF/140602_SOAF%202014_dataset.zip',
                         '140602_SOAF%202014_dataset.zip',
                         'ScB.xlsx',
                         2014),
                             
        soaf.SoafDataRaw('https://www.entsoe.eu/Documents/Publications/SDC/data/SO_AF_2015_dataset.zip',
                         'SO_AF_2015_dataset.zip',
                         os.path.join('SO&AF 2015 dataset', 'ScB_publication.xlsx'),
                         2016)]


data_soaf = pd.concat([s.transformed_df for s in soafs])

# Correct that in the Soaf2015 datatset the year column is 2016 instead of 2015
data_soaf['year'].replace({2016 : 2015}, inplace=True)

data_soaf.head()

01 Oct 2020 16:08:00 - root - INFO - Downloading file SO_AF_2011_-_2025_.zip
01 Oct 2020 16:08:01 - root - INFO - Extracting SO_AF_2011_-_2025_.zip into the directory download\ENTSO-E\SO&AF\2011
01 Oct 2020 16:08:01 - root - INFO - Reading download\ENTSO-E\SO&AF\2011\SO&AF 2011 - 2025 Scenario B.xls and transforming into a DataFrame
01 Oct 2020 16:08:02 - root - INFO - Downloading file 120705_SOAF_2012_Dataset.zip
01 Oct 2020 16:08:02 - root - INFO - Extracting 120705_SOAF_2012_Dataset.zip into the directory download\ENTSO-E\SO&AF\2012
01 Oct 2020 16:08:02 - root - INFO - Reading download\ENTSO-E\SO&AF\2012\SOAF 2012 Scenario B.xls and transforming into a DataFrame
01 Oct 2020 16:08:03 - root - INFO - Downloading file 130403_SOAF_2013-2030_dataset.zip
01 Oct 2020 16:08:04 - root - INFO - Extracting 130403_SOAF_2013-2030_dataset.zip into the directory download\ENTSO-E\SO&AF\2013
01 Oct 2020 16:08:04 - root - INFO - Reading download\ENTSO-E\SO&AF\2013\ScB.xls and transforming into a DataFrame
01 Oct 2020 16:08:05 - root - INFO - Downloading file 140602_SOAF%202014_dataset.zip
01 Oct 2020 16:08:05 - root - INFO - Extracting 140602_SOAF%202014_dataset.zip into the directory download\ENTSO-E\SO&AF\2014
01 Oct 2020 16:08:05 - root - INFO - Reading download\ENTSO-E\SO&AF\2014\ScB.xlsx and transforming into a DataFrame
01 Oct 2020 16:08:06 - root - INFO - Downloading file SO_AF_2015_dataset.zip
01 Oct 2020 16:08:07 - root - INFO - Extracting SO_AF_2015_dataset.zip into the directory download\ENTSO-E\SO&AF\2016
01 Oct 2020 16:08:07 - root - INFO - Reading download\ENTSO-E\SO&AF\2016\SO&AF 2015 dataset\ScB_publication.xlsx and transforming into a DataFrame

Out[20]:

	technology	country	year
1	Nuclear	AL	2011
2	Fossil fuels	AL	2011
3	Lignite	AL	2011
4	Hard coal	AL	2011
5	Natural gas	AL	2011

In [21]:

soaf_unstacked = func.unstackData(data_soaf)

soaf_unstacked['Differently categorized solar'] = soaf_unstacked['Solar']

soaf_unstacked['Differently categorized wind'] = soaf_unstacked['Wind']\
                                                - soaf_unstacked['Offshore']\
                                                - soaf_unstacked['Onshore']

soaf_unstacked['Differently categorized hydro'] = soaf_unstacked['Hydro']\
                                                - soaf_unstacked['Run-of-river']\
                                                - soaf_unstacked['Reservoir including pumped storage']
                                                

soaf_unstacked['Bioenergy and renewable waste'] = soaf_unstacked['Biomass and biogas']
                                                
soaf_unstacked['Differently categorized renewable energy sources'] = (
                                          soaf_unstacked['renewable']
                                        - soaf_unstacked['Wind']
                                        - soaf_unstacked['Solar']
                                        - soaf_unstacked['Biomass and biogas'])

soaf_unstacked.drop(columns='renewable', inplace=True)

subtract_fossils_arr = ['Lignite','Hard coal','Oil','Natural gas','Mixed fossil fuels']

soaf_unstacked['Differently categorized fossil fuels'] = soaf_unstacked['Fossil fuels']\
                                                        - soaf_unstacked[subtract_fossils_arr].sum(axis=1)


res_arr = ['Solar','Wind','Bioenergy and renewable waste','Hydro',
           'Differently categorized renewable energy sources']

soaf_unstacked['Renewable energy sources'] = soaf_unstacked[res_arr].sum(axis=1)

total_arr = ['Renewable energy sources','Fossil fuels','Nuclear',
             'Other or unspecified energy sources']

soaf_unstacked['Total'] = soaf_unstacked[total_arr].sum(axis=1)

soaf_unstacked.head()

Out[21]:

	technology	Biomass and biogas	Fossil fuels	Hard coal	Hydro	Lignite	Mixed fossil fuels	Natural gas	Nuclear	Offshore	Oil	...	Solar	Wind	Differently categorized solar	Differently categorized wind	Differently categorized hydro	Bioenergy and renewable waste	Differently categorized renewable energy sources	Differently categorized fossil fuels	Renewable energy sources	Total
country	year
AL	2011	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
	2012	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
	2013	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	NaN	0.0	0.0	0.0	0.0	0.0
	2014	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	NaN	0.0	0.0	0.0	0.0	0.0
	2015	0.0	100.0	0.0	1860.0	0.0	0.0	100.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	NaN	0.0	0.0	0.0	1860.0	1960.0

5 rows × 24 columns

In [22]:

data_soaf = func.restackData(soaf_unstacked)

data_soaf.loc[data_soaf['capacity'] < 0, 'capacity'] = 0

data_soaf['source'] = 'ENTSO-E SOAF'
data_soaf['type'] = 'Installed capacity in MW'
data_soaf['capacity_definition'] = 'Net capacity'
data_soaf['source_type'] = 'Other association'
data_soaf['weblink'] = url_entsoe

data_soaf.head()

Out[22]:

	country	year	technology	source	type	capacity_definition	source_type	weblink
0	AL	2011	Biomass and biogas	ENTSO-E SOAF	Installed capacity in MW	Net capacity	Other association	https://docstore.entsoe.eu/Documents/Publicati...
1	AL	2011	Fossil fuels	ENTSO-E SOAF	Installed capacity in MW	Net capacity	Other association	https://docstore.entsoe.eu/Documents/Publicati...
2	AL	2011	Hard coal	ENTSO-E SOAF	Installed capacity in MW	Net capacity	Other association	https://docstore.entsoe.eu/Documents/Publicati...
3	AL	2011	Hydro	ENTSO-E SOAF	Installed capacity in MW	Net capacity	Other association	https://docstore.entsoe.eu/Documents/Publicati...
4	AL	2011	Lignite	ENTSO-E SOAF	Installed capacity in MW	Net capacity	Other association	https://docstore.entsoe.eu/Documents/Publicati...

3.3.3 ENTSO-E Transparency Platform¶

In [23]:

# file pattern for the single years
filenamepattern = '_1_InstalledGenerationCapacityAggregated.csv'
list_of_data_tables = [] # list to append

# iterate over the years from 2015 to 2020
for i in range(2015,2021):
    filepath = os.path.join('input',
                            'ENTSO-E',
                            'Transparency',
                            'InstalledGenerationCapacityAggregated',
                            str(i) + filenamepattern)
    
    list_of_data_tables.append(pd.read_csv(filepath, delimiter="\t", encoding = "UTF-16"))

# merge the datasets of the single of files into one pandas dataframe
data_transparency = pd.concat(list_of_data_tables, ignore_index=True)

data_transparency.head()

Out[23]:

	Year	Month	Day	DateTime	ResolutionCode	areacode	AreaTypeCode	AreaName	MapCode	ProductionType	AggregatedInstalledCapacity	UpdateTime
0	2015	1	1	2015-01-01 00:00:00.000	P1Y	10YAT-APG------L	CTA	APG CA	AT	Other	22.8	2017-09-13 15:14:19
1	2015	1	1	2015-01-01 00:00:00.000	P1Y	10YAT-APG------L	CTA	APG CA	AT	Wind Offshore	0.0	2017-09-13 15:14:19
2	2015	1	1	2015-01-01 00:00:00.000	P1Y	10YAT-APG------L	CTA	APG CA	AT	Fossil Brown coal/Lignite	0.0	2017-09-13 15:14:19
3	2015	1	1	2015-01-01 00:00:00.000	P1Y	10YAT-APG------L	CTA	APG CA	AT	Nuclear	0.0	2017-09-13 15:14:19
4	2015	1	1	2015-01-01 00:00:00.000	P1Y	10YAT-APG------L	CTA	APG CA	AT	Fossil Hard coal	1171.0	2017-09-13 15:14:19

In [24]:

# rename columns according to the opsd standards
data_transparency.rename(columns={'ProductionType': 'technology',
                                  'AggregatedInstalledCapacity': 'capacity',
                                  'MapCode': 'country',
                                  'Year': 'year'}, inplace=True)

# drop non relevant columns
data_transparency = data_transparency.filter(items=['technology','capacity','country','year'], axis=1)
# drop countries that are not part of opsd
data_transparency = data_transparency[data_transparency['country'].isin(data_opsd.country.unique())]

data_transparency.head()

Out[24]:

	technology	capacity	country	year
0	Other	22.8	AT	2015
1	Wind Offshore	0.0	AT	2015
2	Fossil Brown coal/Lignite	0.0	AT	2015
3	Nuclear	0.0	AT	2015
4	Fossil Hard coal	1171.0	AT	2015

In [25]:

# adapt energy source notation
dict_energy_source = {'Biomass': 'Biomass and biogas',
                      'Fossil Brown coal/Lignite': 'Lignite',
                      'Fossil Coal-derived gas': 'Mixed fossil fuels',
                      'Fossil Gas': 'Natural gas',
                      'Fossil Hard coal': 'Hard coal',
                      'Fossil Oil': 'Oil',
                      'Fossil Oil shale': 'Oil',
                      'Fossil Peat': 'Other fossil fuels',
                      'Hydro Pumped Storage': 'Pumped storage',
                      'Hydro Run-of-river and poundage': 'Run-of-river',
                      'Hydro Water Reservoir': 'Reservoir',
                      'Other': 'Other or unspecified energy sources',
                      'Other renewable': 'Differently categorized renewable energy sources',
                      'Waste': 'Other bioenergy and renewable waste',
                      'Wind Offshore': 'Offshore',
                      'Wind Onshore': 'Onshore',
                      ' ': np.nan}

data_transparency['technology'].replace(dict_energy_source, inplace=True)


data_transparency.head()

Out[25]:

	technology	capacity	country	year
0	Other or unspecified energy sources	22.8	AT	2015
1	Offshore	0.0	AT	2015
2	Lignite	0.0	AT	2015
3	Nuclear	0.0	AT	2015
4	Hard coal	1171.0	AT	2015

In [26]:

# add missing categories
transparency_pivot = data_transparency.pivot_table(values='capacity',
                                                   index=['country','year'],
                                                   columns='technology')

# technology level
transparency_pivot['Differently categorized solar'] = transparency_pivot['Solar']
transparency_pivot['Differently categorized natural gas'] = transparency_pivot['Natural gas']
transparency_pivot['Non-renewable waste'] = 0
transparency_pivot['Differently categorized fossil fuels'] = 0

# level 3
hydro_arr = ['Pumped storage', 'Reservoir', 'Run-of-river']
transparency_pivot['Hydro'] = transparency_pivot[hydro_arr].sum(axis=1)

wind_arr = ['Onshore', 'Offshore']
transparency_pivot['Wind'] = transparency_pivot[wind_arr].sum(axis=1)

# level 2
bio_arr = ['Biomass and biogas', 'Other bioenergy and renewable waste']
transparency_pivot['Bioenergy and renewable waste'] = transparency_pivot[bio_arr].sum(axis=1)

#level 1
res_arr = ['Hydro', 'Wind', 'Solar', 'Geothermal', 'Marine',
           'Bioenergy and renewable waste', 'Differently categorized renewable energy sources']
transparency_pivot['Renewable energy sources'] = transparency_pivot[res_arr].sum(axis=1)

fossil_arr = ['Lignite', 'Hard coal', 'Oil', 'Natural gas', 'Mixed fossil fuels', 'Other fossil fuels']
transparency_pivot['Fossil fuels'] = transparency_pivot[fossil_arr].sum(axis=1)

# level 0
total_arr = ['Fossil fuels','Nuclear','Renewable energy sources', 'Other or unspecified energy sources']
transparency_pivot['Total'] = transparency_pivot[total_arr].sum(axis=1)

transparency_pivot.reset_index(inplace=True)

transparency_pivot.head()

Out[26]:

technology	country	year	Biomass and biogas	Differently categorized renewable energy sources	Geothermal	Hard coal	Natural gas	...	Differently categorized solar	Differently categorized natural gas	Hydro	Wind	Bioenergy and renewable waste	Renewable energy sources	Fossil fuels	Total
0	AT	2015	464.2	31.3	0.9	1171.0	4501.2	...	587.0	4501.2	11478.7	2120.9	608.3	14827.1	5761.15	20611.05
1	AT	2016	473.6	32.5	0.9	766.0	4465.8	...	723.0	4465.8	11510.7	2497.4	617.7	15382.2	5320.75	20725.75
2	AT	2017	473.8	32.5	0.9	598.0	4465.8	...	1031.0	4465.8	11946.8	2696.0	617.9	16325.1	5152.75	21500.65
3	AT	2018	491.0	42.3	0.9	598.0	4467.7	...	1193.0	4467.7	11990.4	2887.0	640.8	16754.4	5154.65	21931.85
4	AT	2019	499.8	42.3	0.9	598.0	4463.2	...	1192.7	4463.2	11117.5	3034.6	649.8	16037.8	5150.15	21210.75

5 rows × 31 columns

In [27]:

data_transparency = pd.melt(transparency_pivot,
                      id_vars=['country', 'year'],
                      var_name='technology',
                      value_name='capacity')

data_transparency = data_transparency.loc[data_transparency["year"] < 2020, :]

data_transparency['source'] = 'ENTSO-E Transparency Platform'
data_transparency['source_type'] = 'Other association'
data_transparency['capacity_definition'] = 'Net capacity'
data_transparency['type'] = 'Installed capacity in MW'
data_transparency['weblink'] = ('https://transparency.entsoe.eu/generation'
                                '/r2/installedGenerationCapacityAggregation/show')

data_transparency.head()

Out[27]:

	country	year	technology	capacity	source	source_type	capacity_definition	type	weblink
0	AT	2015	Biomass and biogas	464.2	ENTSO-E Transparency Platform	Other association	Net capacity	Installed capacity in MW	https://transparency.entsoe.eu/generation/r2/i...
1	AT	2016	Biomass and biogas	473.6	ENTSO-E Transparency Platform	Other association	Net capacity	Installed capacity in MW	https://transparency.entsoe.eu/generation/r2/i...
2	AT	2017	Biomass and biogas	473.8	ENTSO-E Transparency Platform	Other association	Net capacity	Installed capacity in MW	https://transparency.entsoe.eu/generation/r2/i...
3	AT	2018	Biomass and biogas	491.0	ENTSO-E Transparency Platform	Other association	Net capacity	Installed capacity in MW	https://transparency.entsoe.eu/generation/r2/i...
4	AT	2019	Biomass and biogas	499.8	ENTSO-E Transparency Platform	Other association	Net capacity	Installed capacity in MW	https://transparency.entsoe.eu/generation/r2/i...

3.3.4 ENTSO-E Power Statistics¶

In [28]:

row_of_year = {2014: 9,
               2015: 53,
               2016: 97,
               2017: 141,
               2018: 185}

dataframes = []

for year, row in row_of_year.items():
    # read the dataframe for each year
    power_statistics_raw = pd.read_excel(os.path.join('input',
                                                      'ENTSO-E',
                                                      'Power Statistics',
                                                      'NGC.xlsx'),
                                         header=[0,1], 
                                         sheet_name='NGC',
                                         skiprows=row,
                                         nrows=42)
    
    # drop non relevant columns
    power_statistics_raw.drop(columns='Coverage ratio in %', level=1, inplace=True)
    power_statistics_raw.drop(columns=['Unnamed: 1_level_1','Unnamed: 2_level_1','Unnamed: 3_level_1','Unnamed: 4_level_1'], level=1, inplace=True)

    # get rid of multi index
    df = power_statistics_raw.set_index(year).stack().reset_index().drop('level_1', axis=1)
    
    # remove leftovers of multi index in the index column
    df["technology"] = df[year].apply(lambda x: x[0])
    df.drop(columns=year, inplace=True)
    
    # stack df to the opsd standard format
    stacked_df = df.melt(id_vars='technology', var_name='country', value_name='capacity')
    
    # add information about the year
    stacked_df['year'] = year
    
    # append to the main list of dataframes
    dataframes.append(stacked_df)
    
power_statistics = pd.concat(dataframes)
power_statistics.head()

Out[28]:

	technology	country	capacity	year
0	Non-Renewable	AL	0	2014
1	Nuclear	AL	NaN	2014
2	Non-renwable hydro	AL	0	2014
3	Of which hydro pure pumped storage	AL	NaN	2014
4	Of which Hydro mixed pumped storage (non renew...	AL	NaN	2014

In [29]:

# drop countries that are not covered in opsd
opsd_countries = data_opsd.country.unique()
drop_list_country = power_statistics.loc[~power_statistics['country'].isin(opsd_countries)].index.to_list()
power_statistics.drop(drop_list_country, inplace=True)

# technology classes to be dropped
tech_to_drop = ['Non-Renewable', 'Fossil fuels', 'Renewable','Non-renwable hydro',
                'Total Waste', 'Bio', 'Renewable Hydro', 'Comments', 'Total NGC']

drop_list_tech = power_statistics.loc[power_statistics['technology'].isin(tech_to_drop)].index.to_list()
power_statistics.drop(drop_list_tech, inplace=True)

# replace string with values that can be used in math operations 
power_statistics['capacity'].replace(to_replace='Not Expected', value=0, inplace=True)
power_statistics['capacity'].replace(to_replace='Not Available', value=np.nan, inplace=True)

power_statistics.head()

Out[29]:

	technology	country	year
41	Nuclear	AT	2014
43	Of which hydro pure pumped storage	AT	2014
44	Of which Hydro mixed pumped storage (non renew...	AT	2014
46	Of which Fossil Brown coal/Lignite	AT	2014
47	Of which Fossil Coal-derived gas	AT	2014

In [30]:

# Not included because already categorized in OPSD standard:
# Nuclear, Solar, Geothermal, Wind
dict_energy_source = {'Of which hydro pure pumped storage':'Pumped storage',
                      'Of which Hydro mixed pumped storage (non renewable part)':'Pumped storage',
                      'Of which Fossil Brown coal/Lignite':'Lignite',
                      'Of which Fossil Coal-derived gas':'Differently categorized fossil fuels',
                      'Of which Fossil Gas':'Natural gas',
                      'Of which Fossil Hard coal':'Hard coal', 
                      'Of which Fossil Oil':'Oil',
                      'Of which Fossil Oil shale':'Oil', 
                      'Of which Fossil Peat':'Differently categorized fossil fuels',
                      'Of which Mixed fuels':'Mixed fossil fuels', 
                      'Of which Other fossil fuels':'Other fossil fuels',
                      'Non-renewable Waste':'Non-renewable waste', 
                      'Other non-renewable':'Differently categorized fossil fuels',                      
                      'Of which Wind offshore':'Offshore', 
                      'Of which Wind onshore':'Onshore', 
                      'Of which Solar PV':'Photovoltaics', 
                      'Of which Solar Thermal':'Differently categorized solar', 
                      'Of which Biomass':'Biomass and biogas', 
                      'Of which Biogas':'Biomass and biogas', 
                      'Renewable Waste':'Other bioenergy and renewable waste', 
                      'Of which Hydro Pure storage':'Reservoir',
                      'Of which Hydro Run-of-river and pondage':'Run-of-river',
                      'Of which Hydro mixed pumped storage (renewable part)':'Pumped storage',
                      'Of which Hydro Marine (tidal/wave)':'Marine',
                      'Other renewable (not listed)':'Differently categorized renewable energy sources',
                      'Non identified (other not listed)':'Other or unspecified energy sources', 
                      'Total Hydro':'Hydro'}

power_statistics["technology"].replace(dict_energy_source, inplace=True)
power_statistics.head()

Out[30]:

	technology	country	year
41	Nuclear	AT	2014
43	Pumped storage	AT	2014
44	Pumped storage	AT	2014
46	Lignite	AT	2014
47	Differently categorized fossil fuels	AT	2014

In [31]:

powerstats_pivot = power_statistics.pivot_table(values='capacity',
                                                   index=['country','year'],
                                                   columns='technology').reset_index()
powerstats_pivot.head()

Out[31]:

technology	country	year	Biomass and biogas	Differently categorized fossil fuels	Differently categorized renewable energy sources	Geothermal	Hard coal	Hydro	...	Onshore	Other bioenergy and renewable waste	Other fossil fuels	Other or unspecified energy sources	Photovoltaics	Pumped storage	Reservoir	Run-of-river	Solar	Wind
0	AT	2014	302.685	29.0	0.0	1.0	1171.0	13568.0	...	2110.0	21.96	414.0	0.0	586.12	2656.333333	0.0	5599.0	586.12	2110.0
1	AT	2015	298.000	30.0	NaN	1.0	1171.0	13656.0	...	2489.0	23.00	414.0	NaN	732.00	3997.000000	NaN	5662.0	732.00	2489.0
2	AT	2016	286.000	35.0	NaN	1.0	598.0	14116.0	...	2730.0	22.00	414.0	NaN	1031.00	4212.000000	NaN	5692.0	1031.00	2730.0
3	AT	2017	286.000	35.0	NaN	1.0	598.0	14116.0	...	2730.0	22.00	414.0	NaN	1031.00	4212.000000	NaN	5692.0	1031.00	2730.0
4	AT	2018	293.000	35.0	NaN	1.0	598.0	14150.0	...	2887.0	22.00	414.0	NaN	1193.00	4218.000000	NaN	5714.0	1193.00	2887.0

5 rows × 27 columns

In [32]:

# technology level
powerstats_pivot['Differently categorized natural gas'] = powerstats_pivot['Natural gas']

# level 2
powerstats_pivot['Bioenergy and renewable waste'] = (
          powerstats_pivot['Biomass and biogas'] +
          powerstats_pivot['Other bioenergy and renewable waste'])

#level 1
fossil_techs = ['Lignite', 'Hard coal', 'Oil', 'Natural gas', 'Non-renewable waste',
               'Mixed fossil fuels', 'Other fossil fuels', 'Differently categorized fossil fuels']

powerstats_pivot['Fossil fuels'] = powerstats_pivot[fossil_techs].sum(axis=1)

res_tech = ['Hydro', 'Wind', 'Solar', 'Geothermal', 'Marine', 'Bioenergy and renewable waste',
           'Differently categorized renewable energy sources']

powerstats_pivot['Renewable energy sources'] = powerstats_pivot[res_tech].sum(axis=1)

total_arr = ['Fossil fuels','Nuclear','Renewable energy sources']
powerstats_pivot['Total'] = powerstats_pivot[total_arr].sum(axis=1)

powerstats_pivot.head()

Out[32]:

technology	country	year	Biomass and biogas	Differently categorized fossil fuels	Differently categorized renewable energy sources	Geothermal	Hard coal	Hydro	...	Pumped storage	Reservoir	Run-of-river	Solar	Wind	Differently categorized natural gas	Bioenergy and renewable waste	Fossil fuels	Renewable energy sources	Total
0	AT	2014	302.685	29.0	0.0	1.0	1171.0	13568.0	...	2656.333333	0.0	5599.0	586.12	2110.0	4888.0	324.645	7105.0	16589.765	23694.765
1	AT	2015	298.000	30.0	NaN	1.0	1171.0	13656.0	...	3997.000000	NaN	5662.0	732.00	2489.0	4820.0	321.000	7002.0	17199.000	24201.000
2	AT	2016	286.000	35.0	NaN	1.0	598.0	14116.0	...	4212.000000	NaN	5692.0	1031.00	2730.0	4841.0	308.000	6407.0	18186.000	24593.000
3	AT	2017	286.000	35.0	NaN	1.0	598.0	14116.0	...	4212.000000	NaN	5692.0	1031.00	2730.0	4841.0	308.000	6407.0	18186.000	24593.000
4	AT	2018	293.000	35.0	NaN	1.0	598.0	14150.0	...	4218.000000	NaN	5714.0	1193.00	2887.0	4853.0	315.000	6420.0	18546.000	24966.000

5 rows × 32 columns

In [33]:

data_power_statistics = powerstats_pivot.melt(id_vars=['country', 'year'],
                                         var_name='technology',
                                         value_name='capacity')

data_power_statistics['source'] = 'ENTSO-E Power Statistics'
data_power_statistics['source_type'] = 'Other association'
data_power_statistics['capacity_definition'] = 'Net capacity'
data_power_statistics['type'] = 'Installed capacity in MW'
data_power_statistics['weblink'] = ('https://www.entsoe.eu/data/'
                                    'power-stats/net-gen-capacity/')

data_power_statistics.head()

Out[33]:

	country	year	technology	capacity	source	source_type	capacity_definition	type	weblink
0	AT	2014	Biomass and biogas	302.685	ENTSO-E Power Statistics	Other association	Net capacity	Installed capacity in MW	https://www.entsoe.eu/data/power-stats/net-gen...
1	AT	2015	Biomass and biogas	298.000	ENTSO-E Power Statistics	Other association	Net capacity	Installed capacity in MW	https://www.entsoe.eu/data/power-stats/net-gen...
2	AT	2016	Biomass and biogas	286.000	ENTSO-E Power Statistics	Other association	Net capacity	Installed capacity in MW	https://www.entsoe.eu/data/power-stats/net-gen...
3	AT	2017	Biomass and biogas	286.000	ENTSO-E Power Statistics	Other association	Net capacity	Installed capacity in MW	https://www.entsoe.eu/data/power-stats/net-gen...
4	AT	2018	Biomass and biogas	293.000	ENTSO-E Power Statistics	Other association	Net capacity	Installed capacity in MW	https://www.entsoe.eu/data/power-stats/net-gen...

3.4 Merge data sources¶

In [34]:

dataframes = [data_opsd, data_eurostat, data_soaf, data_entsoe, data_transparency, data_power_statistics]
data = pd.concat(dataframes, sort=False)
data['comment'] = data['comment'].fillna('').astype(str)

col_order = ['technology', 'source', 'source_type', 'weblink', 'year', 'type',
             'country', 'capacity_definition', 'capacity', 'comment']

data = data[col_order]


energy_source_mapping = pd.read_csv(os.path.join('input','energy_source_mapping.csv'),
                                    index_col ='name')
energy_source_mapping.replace({0: False, 1: True}, inplace=True)

data = data.merge(energy_source_mapping,
                  left_on='technology',
                  right_index=True,
                  how='left')

new_level_names = {"Level 0": "energy_source_level_0",
                   "Level 1": "energy_source_level_1",
                   "Level 2": "energy_source_level_2",
                   "Level 3": "energy_source_level_3",
                   "Technology level": "technology_level"}

data.rename(columns=new_level_names, inplace=True)

data.head()

Out[34]:

	technology	source	source_type	weblink	year	type	country	capacity_definition	capacity	energy_source_level_0	energy_source_level_1	energy_source_level_2	energy_source_level_3	technology_level
0	Fossil fuels	e-control	Regulatory Authority	https://www.e-control.at/documents/1785851/181...	2014	Installed capacity in MW	AT	Gross capacity	7243.634	False	True	False	False	False
1	Fossil fuels	e-control	Regulatory Authority	https://www.e-control.at/documents/1785851/181...	2015	Installed capacity in MW	AT	Gross capacity	7059.065	False	True	False	False	False
2	Fossil fuels	e-control	Regulatory Authority	https://www.e-control.at/documents/1785851/181...	2016	Installed capacity in MW	AT	Gross capacity	7323.000	False	True	False	False	False
3	Fossil fuels	e-control	Regulatory Authority	https://www.e-control.at/documents/1785851/181...	2017	Installed capacity in MW	AT	Gross capacity	6469.000	False	True	False	False	False
4	Fossil fuels	e-control	Regulatory Authority	https://www.e-control.at/documents/1785851/181...	2018	Installed capacity in MW	AT	Gross capacity	6492.000	False	True	False	False	False

4. Convert stacked data to crosstable format¶

In [35]:

cols = ['technology', 'source', 'source_type', 'weblink','year',
        'type', 'country', 'capacity_definition', 'capacity']

data_crosstable = pd.pivot_table(data[cols],
                                 index=['technology'],
                                 columns=['country', 'type', 'year',
                                          'source', 'source_type',
                                          'weblink', 'capacity_definition'],
                                 values='capacity')

# Apply initial ordering of technologies
data_crosstable = data_crosstable.reindex(technology_order)

# Delete index naming
data_crosstable.index.name = None
data_crosstable.columns.names = ('Country (ISO code)',
                                 'Type of data', 'Year',
                                 'Source', 'Type of source',
                                 'Weblink',
                                 'Capacity definition (net, gross, unknown)')

data_crosstable.head()

Out[35]:

Country (ISO code)	AL										...	XK
Type of data	Installed capacity in MW										...	Installed capacity in MW
Year	1990	1991	1992	1993	1994	1995	1996	1997	1998	1999	...	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016
Source	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	...	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT	EUROSTAT
Type of source	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	...	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office	Statistical Office
Weblink	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	...	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable	Link unavailable
Capacity definition (net, gross, unknown)	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	...	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown
Fossil fuels	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	915.0	915.0	915.0	915.0	915.0	915.0	915.0	915.0	915.0	915.0
Lignite	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Hard coal	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Oil	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Natural gas	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 1633 columns

In [36]:

energylevels_table = energylevels_raw[6:]
energylevels_table.columns = pd.MultiIndex.from_arrays(energylevels_raw[:6].values,
                                                       names=['country', 'type', 'year',
                                                              'source', 'source_type',
                                                              'capacity_definition'
                                                              ])

energylevels_table = energylevels_table.reset_index()
energylevels_table['technology'] = energylevels_table['technology'].str.replace('- ', '')
energylevels_table = energylevels_table.set_index('technology')

# Delete index naming
energylevels_table.index.name = None
energylevels_table.columns.names = ('Country (ISO code)',
                                    'Description', None,
                                    None, None,
                                    'Level')

energylevels_table.head()

Out[36]:

Country (ISO code)	Energy source levels and technology
Description	Total generation capacity	Generation capacity by energy type (fossil, nuclear, renewable, other)	Generation capacity by fuel or energy source (e.g. coal, lignite, hard coal, natural gas, wind)	Generation capacity by fuel or energy source and detailed fuel for bioenergy	Generation capacity by fuel and technology (e.g. combined cycle natural gas, gas turbine, onshore wind, offshore wind)
	NaN	NaN	NaN	NaN	NaN
	NaN	NaN	NaN	NaN	NaN
	NaN	NaN	NaN	NaN	NaN
Level	NaN	NaN	NaN	NaN	NaN
NaN	Level 0	Level 1	Level 2	Level 3	Technology level
Fossil fuels	0	1	0	0	0
Lignite	0	0	1	1	1
Hard coal	0	0	1	1	1
Oil	0	0	1	1	1

5. Output¶

Delete downloaded zip files

In [37]:

for root, dirs, files in os.walk("download"):
    for file in files:
        item = os.path.join(root, file)
        if item.endswith(".zip"):
            os.remove(item)
            print("Deleted: " + item)

Deleted: download\ENTSO-E\SO&AF\2011\SO_AF_2011_-_2025_.zip
Deleted: download\ENTSO-E\SO&AF\2012\120705_SOAF_2012_Dataset.zip
Deleted: download\ENTSO-E\SO&AF\2013\130403_SOAF_2013-2030_dataset.zip
Deleted: download\ENTSO-E\SO&AF\2014\140602_SOAF%202014_dataset.zip
Deleted: download\ENTSO-E\SO&AF\2016\SO_AF_2015_dataset.zip

Copy input files

In [38]:

orig_data_path = os.path.join('output', 'original_data')
shutil.rmtree(orig_data_path, ignore_errors=True)
func.copydir(os.path.join('input'), orig_data_path)
func.copydir(os.path.join('download'), orig_data_path)

5.1 Write results to file¶

Write stacked data to formats: csv, xls and sql.

In [39]:

# Write the result to file
data.to_csv(os.path.join('output', 'national_generation_capacity_stacked.csv'),
            encoding='utf-8', index_label='ID')

# Write the results to excel file
data.to_excel(os.path.join('output', 'national_generation_capacity_stacked.xlsx'),
              sheet_name='output', index_label='ID')

# Write the results to sql database
data.to_sql('national_generation_capacity_stacked',
            sqlite3.connect(os.path.join('output',
                                         'national_generation_capacity.sqlite')),
            if_exists="replace", index_label='ID')

Write data in human readable form to excel.

In [40]:

# Write crosstable data to excel file
writer = pd.ExcelWriter(os.path.join('output', 'national_generation_capacity.xlsx'))
data_crosstable.to_excel(writer, sheet_name='output')
energylevels_table.to_excel(writer, sheet_name='technology levels')
writer.save()

5.2 Formatting of Excel tables¶

In [41]:

outputxls = openpyxl.load_workbook(os.path.join('output',
                                          'national_generation_capacity.xlsx'))

ws1 = outputxls['output']
ws2 = outputxls['technology levels']
ws1_rows, ws1_cols = data_crosstable.shape
amount_cols = ws1_cols + 1 # correct 0 index

ws1.column_dimensions['A'].width = 50
ws2.column_dimensions['A'].width = 50

In [42]:

blackfont = Font(color=colors.BLACK, italic=False, bold=False)
blackfontitalic = Font(color=colors.BLACK, italic=True, bold=False)
blackfontbold = Font(color=colors.BLACK, italic=False, bold=True)

align0 = Alignment(horizontal='left', indent=0)
align1 = Alignment(horizontal='left', indent=1)
align2 = Alignment(horizontal='left', indent=2)

# darkest grey
colour = "{0:02X}{1:02X}{2:02X}".format(166, 166, 166)
grey166 = PatternFill(fgColor=colour, bgColor=colour, patternType="solid")

# darker grey
colour = "{0:02X}{1:02X}{2:02X}".format(191, 191, 191)
grey191 = PatternFill(fgColor=colour, bgColor=colour, patternType="solid")

# lighter grey
colour = "{0:02X}{1:02X}{2:02X}".format(217, 217, 217)
grey217 = PatternFill(fgColor=colour, bgColor=colour, patternType="solid")

# lightest grey
colour = "{0:02X}{1:02X}{2:02X}".format(242, 242, 242)
grey242 = PatternFill(fgColor=colour, bgColor=colour, patternType="solid")

In [43]:

for col in range(2, amount_cols+1):
    colname = openpyxl.utils.cell.get_column_letter(col)
    ws1.column_dimensions[colname].width = 16


for col in range(1, amount_cols+1):
    # format column name block
    for row in range(2,8):
        
        ws1.cell(row=row, column=col).font = blackfont
        

    # format cells that contain the values
    for row in range(9, 48):
        ws1.cell(row=row, column=col).fill = grey242
        ws1.cell(row=row, column=1).font = blackfontitalic
        ws1.cell(row=row, column=1).alignment = align2
    
    
    # format row 'Total' with dark grey
    ws1.cell(row=47, column=col).fill = grey166
    ws1.cell(row=47, column=col).font = blackfontbold
    
    
    # format level 1
    for row in [9, 22, 23, 46]:
        ws1.cell(row=row, column=col).fill = grey191
        ws1.cell(row=row, column=col).font = blackfontbold
        ws1.cell(row=row, column=1).alignment = align0
        
    
    # format level 2
    for row in [10, 11, 12, 13, 18, 19, 20, 21, 24, 31, 35, 39, 40, 41, 45]:
        ws1.cell(row=row, column=col).fill = grey217
        ws1.cell(row=row, column=1).alignment = align1

    
ws1.cell(row=47, column=1).alignment = align0
ws1.freeze_panes = ws1['B8'] #freeze first column and header rows

In [44]:

# do the same for the second worksheet 'technology levels' 
for col in range(1, 7):
    colname = get_column_letter(col + 1)
    ws2.column_dimensions[colname].width = 25
    
    for row in range(2, 8):
        ws2.cell(row=row, column=col).font = blackfont
        
        
    for row in range(9, 48):     
        ws2.cell(row=row, column=col).fill = grey242
        ws2.cell(row=row, column=1).font = blackfontitalic
        ws2.cell(row=row, column=1).alignment = align2
        
        
    # format row 'Total' with dark grey
    ws2.cell(row=46, column=col).fill = grey166
    
    
    # format level 1
    for row in [8, 21, 22, 45]:
        ws2.cell(row=row, column=col).fill = grey191
        ws2.cell(row=row, column=col).font = blackfontbold
        ws2.cell(row=row, column=1).font = blackfontbold
        ws2.cell(row=row, column=1).alignment = align0
        
    
    # format level 2
    for row in [9, 10, 11, 12, 17, 18, 19, 20, 23, 30, 34, 38, 39, 40, 44]:
        ws2.cell(row=row, column=col).fill = grey217
        ws2.cell(row=row, column=1).alignment = align1

    
ws2.cell(row=46, column=1).alignment = align0

In [45]:

additional_notes = openpyxl.load_workbook(os.path.join('input',
                                          'National_Generation_Capacities.xlsx'))['Additional notes']
# copy additional notes to output file
for col in range(1, 3):
    for row in range(1, 10):
        add_notes_value = additional_notes.cell(row=row, column=col).value
        ws1.cell(row=row + 50, column=col).value = add_notes_value
        ws1.cell(row=51, column=1).font = blackfontbold
        ws1.cell(row=row + 51, column=1).font = blackfontitalic

In [46]:

outputxls.save(os.path.join('output', 'national_generation_capacity.xlsx'))

5.3 Write checksums¶

In [47]:

files = ['national_generation_capacity.xlsx',
        'national_generation_capacity_stacked.csv',
        'national_generation_capacity_stacked.xlsx',
        'national_generation_capacity.sqlite']

hash_dict = {}
filesize_dict = {}

with open('checksums.txt', 'w') as f:
    for file_name in files:
        path = os.path.join('output', file_name)
        file_hash = func.get_sha_hash(path)
        hash_dict[file_name] = file_hash
        filesize_dict[file_name] = os.path.getsize(path)
        f.write('{},{}\n'.format(file_name, file_hash))

6. Documentation of the data package¶

We document the data packages meta data in the specific format JSON as proposed by the Open Knowledge Foundation. See the Frictionless Data project by OKFN (http://data.okfn.org/) and the Data Package specifications (http://dataprotocols.org/data-packages/) for more details.

In order to keep the notebook more readable, we first formulate the metadata in the human-readable YAML format using a multi-line string. We then parse the string into a Python dictionary and save that to disk as a JSON file.

In [48]:

with open(os.path.join('input', 'metadata.yml'), 'r') as f:
    metadata = yaml.load(f.read(), Loader=yaml.BaseLoader)
    
metadata['resources'][0]['hash'] = hash_dict['national_generation_capacity.xlsx']
metadata['resources'][1]['hash'] = hash_dict['national_generation_capacity_stacked.csv']
metadata['resources'][0]['bytes'] = filesize_dict['national_generation_capacity.xlsx']
metadata['resources'][1]['bytes'] = filesize_dict['national_generation_capacity_stacked.csv']

In [49]:

datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))
    
# Write the information of the metadata
with open(os.path.join('output', 'datapackage.json'), 'w') as f:
    f.write(datapackage_json)

End of script.