National generation capacity: Processing notebook
This Notebook is part of the National Generation Capacity Datapackage of Open Power System Data. |
The script processes the compiled nationally aggregated generation capacity for European countries. Due to varying formats and data specifications of references for national generation capacity, the script firstly focuses on rearranging the manually compiled data. Thus, the script itself does not collect, select, download or manage data from original sources. Secondly, international data sources, such as EUROSTAT and ENTSO-E, are directly downloaded from original web sources and complement the initial data set.
# some functions and classes that are defined in seperate files
import functions.helper_functions as func
import functions.soaf as soaf
# core packages
import os
import pandas as pd
import numpy as np
# packages to copy files, write sqllite data bases and manipulate excel files
import shutil
import sqlite3
import openpyxl
from openpyxl.styles import PatternFill, colors, Font, Alignment
from openpyxl.utils import get_column_letter
import yaml
import json
We compile data from different national and international sources. Firstly, national data sources are manually compiled due to varying data formats and specifications. Secondly, international sources are compiled directly and appended to the compiled data set. The international data sources comprise:
In the following section, the data sets are downloaded and uploaded to Python.
The manually compiled dataset is imported and rearranged to a DataFrame for further processing. The dataset comprises for each European country and specified generation technology different data entries, which are based on different sources. As these sources differ by country and year, information on the corresponding reference are directly given with the data entry.
data_file = 'National_Generation_Capacities.xlsx'
filepath = os.path.join('input', data_file)
# Read data into pandas
data_raw = pd.read_excel(filepath,
sheet_name='Summary',
header=None,
na_values=['-'],
skiprows=0)
# Deal with merged cells from Excel: fill first three rows with information
data_raw.iloc[0:2] = data_raw.iloc[0:2].fillna(method='ffill', axis=1)
# Set index for rows
data_raw = data_raw.set_index([0])
data_raw.index.name = 'technology'
# Extract energylevels from raw data for later use
energylevels_raw = data_raw.iloc[:, 0:5]
energylevels_raw.head()
1 | 2 | 3 | 4 | 5 | |
---|---|---|---|---|---|
technology | |||||
NaN | Energy source levels and technology | Energy source levels and technology | Energy source levels and technology | Energy source levels and technology | Energy source levels and technology |
NaN | Total generation capacity | Generation capacity by energy type (fossil, nu... | Generation capacity by fuel or energy source (... | Generation capacity by fuel or energy source a... | Generation capacity by fuel and technology (e.... |
NaN | NaN | NaN | NaN | NaN | NaN |
NaN | NaN | NaN | NaN | NaN | NaN |
NaN | NaN | NaN | NaN | NaN | NaN |
# Delete definition of energy levels from raw data
data_raw.drop(data_raw.columns[[0, 1, 2, 3, 4, 5]], axis=1, inplace=True)
level_names = ['country', 'type', 'year', 'source',
'source_type', 'weblink', 'capacity_definition']
# Set multiindex column names
data_raw.columns = pd.MultiIndex.from_arrays(data_raw[:7].values,
names=level_names)
# Remove 3 rows which are already used as column names
data_raw = data_raw[pd.notnull(data_raw.index)]
# Extract the ordering of technologies
technology_order = data_raw.index.str.replace('- ', '').values.tolist()
data_raw.head()
country | AT | BE | ... | SI | SK | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
type | Installed capacity in MW | Installed capacity in MW | ... | Installed capacity in MW | Installed capacity in MW | ||||||||||||||||
year | 2014.0 | 2015.0 | 2016.0 | 2017.0 | 2018.0 | 2019.0 | 2015.0 | 2016.0 | 2017.0 | 2018.0 | ... | 2015.0 | 2016.0 | 2017.0 | 2013.0 | 2014.0 | 2015.0 | 2016.0 | 2017.0 | 2018.0 | NaN |
source | e-control | e-control | e-control | e-control | e-control | e-control | ELIA | ELIA | ELIA | ELIA | ... | Agencija za energijo | Agencija za energijo | Agencija za energijo | Statistical Office of Slovakia | Statistical Office of Slovakia | Statistical Office of Slovakia | Statistical Office of Slovakia | Statistical Office of Slovakia | Statistical Office of Slovakia | |
source_type | Regulatory Authority | Regulatory Authority | Regulatory Authority | Regulatory Authority | Regulatory Authority | Regulatory Authority | Transmission System Operator | Transmission System Operator | Transmission System Operator | Transmission System Operator | ... | Ministry | Ministry | Ministry | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | NaN |
weblink | https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538 | https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538 | https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538 | https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538 | https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538 | https://www.e-control.at/documents/1785851/1811756/BeStGes-2019_KW2EPLTyp.xlsx/e9993ee3-838f-ed2e-2031-7503418fac2d?t=1596094915390 | https://www.elia.be/en/grid-data/power-generation/generating-facilities | https://www.elia.be/en/grid-data/power-generation/generating-facilities | https://www.elia.be/en/grid-data/power-generation/generating-facilities | https://www.elia.be/en/grid-data/power-generation/generating-facilities | ... | https://www.agen-rs.si/documents/54870/68629/Report-on-the-energy-sector-in-Slovenia-for-2015/f1302ae0-7267-4ae7-b74d-7ce8c4323043 | https://www.agen-rs.si/documents/54870/68629/Report-on-the-energy-sector-in-Slovenia-for-2016/de8cc94f-b3f6-4d32-8e79-ce3b0ec386f5 | https://www.agen-rs.si/documents/54870/68629/a/78f74b68-dbfc-415e-ab88-882652558d94 | Link unavailable | https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsRZJIHHEGAOxQwwYEuvSASM31AE7jWsnb1_Z8aRtGEM7LQcNsP8Hu6t_hQS6R6LJ99WXfFdtmvxRPS-E-TliAR8OdRuAGSMIrtNR5CWWDhTQ3UngeLZP2RSATz0DAttPYysiBGyChArbURQl0ywDL8NjCIjuQZimADE783DhsuHP-Gw6e0vQY3MdgjBM0oxH5GZOT_x7_nziOCpszEdGFpKZS898h6CLt6GHP-f_M2wwUOFE5Z4lkECbbwnE5fLcELrrPwo6-DTo6Z87Njp5h3bzxx_8E-986P-tY7nKf75pU-bqlDOUIYHEstltdw9osSme8wdNNtr2e3EFanms1vmyklewf5a79euvbwCXukGkoVFJLLWsQLO4WWiSyWVBVyujMI3jt7fLqkQLHUtSyJxpK4xXGjUtS-OlBUpdMpaTEuu8_LiX7WER3Vb_wLeL_b0XsZq3YIh1nzoTwjPW4lsC0TfL4iTpmua-Gt4E42SGbYt6zii-VQM7dzBPpiYG0M-CLsP1WV70nQndgrmJQn9TS5TIBi1Uy9jFnkwwuttX8oDSZvOtVsdh8peO8AFdI1EV9eCwrAcw0DFnJsVqMy1M1P3xfK2-Pj0JW5l40-zkyw7d_3cXb-v0_ao5ea0qbR27sX-4cQkVi08_AHhGIRI!/dz/d5/L2dBISEvZ0FBIS9nQSEh/ | https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsVZIgDhijIHYIQYMiXXpCAwNdcBOQ-3k7Ss7nrQNY2in1YEB9v9gd_WvEEf3iDdiX30RbbVtxKN8XnH9c2j4bDzGFoChTcC_TiahG5sYKKC7k8B2LY8acwA2dzXwLS-JzJAQsAjiMmyFYRjP0xTcVJ2CT7ALQZIARMaZhwvLgj_j0_niLUHXWGLwgyBOUhaSmyU98e_5s5lty7C2nGhpQBYOPfM9gj7eggH-nP_PsGaADMcy9zSGGLp8R8Avl-cE0F__UdDDJ_5A_5yp1svbtJ8__uCfePtD_29t05H-83SLGg6mzEAp4ojnTbtrH9Bqmz2LB6VolN337Ark5bHaiLwqrmD_XLSb11_fiLUotVKjSi4IVijOiMKKfK0wprFCFQwyio_f3uXVGq0wlEKYTFd0zRAKZSJXmF6aikl1BkIzszIvP-5ld1h4v9U_8N1if-9FJOfNH6vYo_aMsNTo8B0BH5plfpL0TfNQDW-CabxQLZO69iS6lQO7tFUWz3UVAJ8FfYYbsjwfOhP6BUsdBd62LlBcNGglW2Zc7MlMRXf7qjigpNl-q-VxGP-lIzxA14hXWT065PUIRlhlhk5VuZmmSuT98Xytvj49cUuaeNu0xUuL7v-7i3d18r5qRl6rStlETuQdbhxC-erTD_ffAqA!/dz/d5/L2dBISEvZ0FBIS9nQSEh/ | https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsRYkIXHEGAOxQwwYGnTpAMYNdcBOQ-3m7Ss7nrQ1Y2in1YEB9v9gd_WvkEAPSDTZvvqctdW2yZ7kcyr0TwHz-HismgCMTsC7jSeBExkqEEAfTwLLMV3C5gB87lDwTDcOjQBjMDESMmwGQRDNkwScRJuCh1UH_DgGCNmZhyvLhD_jk_niLUGHLVXwfD-KEx7guyU58e_585llyTBdTmji44VNznyPoI83YYA_5_8zTBnIcCRzTyKIoMt3BOJ6ebYP_fUfBT187A30z57SXt4i_fzxB__EWxf9v7cMW_rP1U3CbJVwhhIkkCiadtc-onSbv2SPStkou2_5DcjLU7XJiqq8gf1L2W5ef32TGWrGVsZaWfFMV4hOqMKpvCsxLwzAJIeCHr-9K6oVSiHT8hUjupIX60Ihqk4VyZfKeo1xSTnOoNAu97I7LKLf6hd8t9jfexHKefPGmuoSa4Z5wjp8RyCGZlmcJH3TPFTDm2AaLTTTII41Ce_lwC4tjUdzXQNQz4I-ww1ZXgydCf2CpY58d1uXKCoblMqWsas9mck93VflAcXN9mstj8PoLx3hArpFosrr0aGoRzBSNc50osnNNDQs74_na_Xl-VmY0sTbpi2_t-jhv7t4V8fvq-b4taqUTWiH7uHOxkSkH34AcI_n7A!!/dz/d5/L2dBISEvZ0FBIS9nQSEh/ | https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZRRd5owFMc_yx76iLmEQJJHRASqpQLCal52AOPKLGhXp-u3X7SebpMjbGfLAwe4_x_ce_O_QQI9INHk--pzvqs2Tf6knhfC-hTRgA2Hug1AzREEt-ko8hKuAwH08SRwPNsndArApp4Jge2nMY8MA2wDCRW2oyhKplkGXobHEBi6B2GaAsT0zMOVZcOf8dl09pagR-c6BGGYpBmLjLs5OfHv-bOJ46iwOR-ZWWjMXHLmOwRdvA09_Dn_n2GTggonKvcsgQTafEsgrpfnhtBd_1HQwadBT__csdnJO6SbP_7gn3jnov_3DneV_3zLJtTVCaMoQwKJstltd49osSle8kdNNtr2W3ED6vJUrfOykjewf5G79euvbzDnJTcLoi1zk2gEl1IruGFpRcFXJsF0xTAcv70tqyVa6IwA6HylSWZxjchlqTFpgsYMbEnMTGtF8eVetodFdFv9gm8X-3svYjVvwRDrPnEmBstoi28JRN8si5Oka5r7angTjJMZtjnxnFF8rwZ27mCWTC2sWngWdBmuz_Ki70zoFswtFPqbWqJENmihWkav9mSi9nRfyQNKm83XWh2HyV86wgd0i0RV1INDWQ9goGNGLYLVZnJsqPvj-Vp9eX4WtjLxptnJ7zv08N9dvK3T91Uz47WqtHXsxv7hzjWIWHz4AZRUN9Q!/dz/d5/L2dBISEvZ0FBIS9nQSEh/ | https://slovak.statistics.sk/PortalTraffic/fileServlet?Dokument=da19bcb8-8c9c-4757-aba5-2d0720266efa | |
capacity_definition | Gross capacity | Gross capacity | Gross capacity | Gross capacity | Gross capacity | Gross capacity | Unknown | Unknown | Unknown | Unknown | ... | Net capacity | Net capacity | Net capacity | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | NaN |
technology | |||||||||||||||||||||
Fossil fuels | 7243.63 | 7059.07 | 7323 | 6469 | 6492 | 6041 | NaN | NaN | NaN | NaN | ... | 1442.97 | 1450.5 | 1223 | 3348 | 2907 | 2586 | 3164.51 | 3266.97 | 2581.47 | NaN |
- Lignite | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
- Hard coal | 1171 | 1171 | NaN | 1012 | 1012 | 598 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
- Oil | 333.736 | 173.636 | NaN | 168 | 168 | 164 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 140 | 138 | 138 | NaN | NaN | NaN | NaN |
- Natural gas | 4888.66 | 4820.19 | NaN | 4853 | 4876 | 4429 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 1016 | 1022 | 959 | NaN | NaN | NaN | NaN |
5 rows × 157 columns
data_raw.columns[data_raw.columns.duplicated()]
MultiIndex([], names=['country', 'type', 'year', 'source', 'source_type', 'weblink', 'capacity_definition'])
data_raw[data_raw.index.duplicated()]
country | AT | BE | ... | SI | SK | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
type | Installed capacity in MW | Installed capacity in MW | ... | Installed capacity in MW | Installed capacity in MW | ||||||||||||||||
year | 2014.0 | 2015.0 | 2016.0 | 2017.0 | 2018.0 | 2019.0 | 2015.0 | 2016.0 | 2017.0 | 2018.0 | ... | 2015.0 | 2016.0 | 2017.0 | 2013.0 | 2014.0 | 2015.0 | 2016.0 | 2017.0 | 2018.0 | NaN |
source | e-control | e-control | e-control | e-control | e-control | e-control | ELIA | ELIA | ELIA | ELIA | ... | Agencija za energijo | Agencija za energijo | Agencija za energijo | Statistical Office of Slovakia | Statistical Office of Slovakia | Statistical Office of Slovakia | Statistical Office of Slovakia | Statistical Office of Slovakia | Statistical Office of Slovakia | |
source_type | Regulatory Authority | Regulatory Authority | Regulatory Authority | Regulatory Authority | Regulatory Authority | Regulatory Authority | Transmission System Operator | Transmission System Operator | Transmission System Operator | Transmission System Operator | ... | Ministry | Ministry | Ministry | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | NaN |
weblink | https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538 | https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538 | https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538 | https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538 | https://www.e-control.at/documents/1785851/1811756/BeStGes-JR_KWEPL.xlsx/d4357de2-9d0e-a3ae-4349-17d10306677e?t=1568280083538 | https://www.e-control.at/documents/1785851/1811756/BeStGes-2019_KW2EPLTyp.xlsx/e9993ee3-838f-ed2e-2031-7503418fac2d?t=1596094915390 | https://www.elia.be/en/grid-data/power-generation/generating-facilities | https://www.elia.be/en/grid-data/power-generation/generating-facilities | https://www.elia.be/en/grid-data/power-generation/generating-facilities | https://www.elia.be/en/grid-data/power-generation/generating-facilities | ... | https://www.agen-rs.si/documents/54870/68629/Report-on-the-energy-sector-in-Slovenia-for-2015/f1302ae0-7267-4ae7-b74d-7ce8c4323043 | https://www.agen-rs.si/documents/54870/68629/Report-on-the-energy-sector-in-Slovenia-for-2016/de8cc94f-b3f6-4d32-8e79-ce3b0ec386f5 | https://www.agen-rs.si/documents/54870/68629/a/78f74b68-dbfc-415e-ab88-882652558d94 | Link unavailable | https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsRZJIHHEGAOxQwwYEuvSASM31AE7jWsnb1_Z8aRtGEM7LQcNsP8Hu6t_hQS6R6LJ99WXfFdtmvxRPS-E-TliAR8OdRuAGSMIrtNR5CWWDhTQ3UngeLZP2RSATz0DAttPYysiBGyChArbURQl0ywDL8NjCIjuQZimADE783DhsuHP-Gw6e0vQY3MdgjBM0oxH5GZOT_x7_nziOCpszEdGFpKZS898h6CLt6GHP-f_M2wwUOFE5Z4lkECbbwnE5fLcELrrPwo6-DTo6Z87Njp5h3bzxx_8E-986P-tY7nKf75pU-bqlDOUIYHEstltdw9osSme8wdNNtr2e3EFanms1vmyklewf5a79euvbwCXukGkoVFJLLWsQLO4WWiSyWVBVyujMI3jt7fLqkQLHUtSyJxpK4xXGjUtS-OlBUpdMpaTEuu8_LiX7WER3Vb_wLeL_b0XsZq3YIh1nzoTwjPW4lsC0TfL4iTpmua-Gt4E42SGbYt6zii-VQM7dzBPpiYG0M-CLsP1WV70nQndgrmJQn9TS5TIBi1Uy9jFnkwwuttX8oDSZvOtVsdh8peO8AFdI1EV9eCwrAcw0DFnJsVqMy1M1P3xfK2-Pj0JW5l40-zkyw7d_3cXb-v0_ao5ea0qbR27sX-4cQkVi08_AHhGIRI!/dz/d5/L2dBISEvZ0FBIS9nQSEh/ | https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsVZIgDhijIHYIQYMiXXpCAwNdcBOQ-3k7Ss7nrQNY2in1YEB9v9gd_WvEEf3iDdiX30RbbVtxKN8XnH9c2j4bDzGFoChTcC_TiahG5sYKKC7k8B2LY8acwA2dzXwLS-JzJAQsAjiMmyFYRjP0xTcVJ2CT7ALQZIARMaZhwvLgj_j0_niLUHXWGLwgyBOUhaSmyU98e_5s5lty7C2nGhpQBYOPfM9gj7eggH-nP_PsGaADMcy9zSGGLp8R8Avl-cE0F__UdDDJ_5A_5yp1svbtJ8__uCfePtD_29t05H-83SLGg6mzEAp4ojnTbtrH9Bqmz2LB6VolN337Ark5bHaiLwqrmD_XLSb11_fiLUotVKjSi4IVijOiMKKfK0wprFCFQwyio_f3uXVGq0wlEKYTFd0zRAKZSJXmF6aikl1BkIzszIvP-5ld1h4v9U_8N1if-9FJOfNH6vYo_aMsNTo8B0BH5plfpL0TfNQDW-CabxQLZO69iS6lQO7tFUWz3UVAJ8FfYYbsjwfOhP6BUsdBd62LlBcNGglW2Zc7MlMRXf7qjigpNl-q-VxGP-lIzxA14hXWT065PUIRlhlhk5VuZmmSuT98Xytvj49cUuaeNu0xUuL7v-7i3d18r5qRl6rStlETuQdbhxC-erTD_ffAqA!/dz/d5/L2dBISEvZ0FBIS9nQSEh/ | https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZTBcpswEIafpYccsRYkIXHEGAOxQwwYGnTpAMYNdcBOQ-3m7Ss7nrQ1Y2in1YEB9v9gd_WvkEAPSDTZvvqctdW2yZ7kcyr0TwHz-HismgCMTsC7jSeBExkqEEAfTwLLMV3C5gB87lDwTDcOjQBjMDESMmwGQRDNkwScRJuCh1UH_DgGCNmZhyvLhD_jk_niLUGHLVXwfD-KEx7guyU58e_585llyTBdTmji44VNznyPoI83YYA_5_8zTBnIcCRzTyKIoMt3BOJ6ebYP_fUfBT187A30z57SXt4i_fzxB__EWxf9v7cMW_rP1U3CbJVwhhIkkCiadtc-onSbv2SPStkou2_5DcjLU7XJiqq8gf1L2W5ef32TGWrGVsZaWfFMV4hOqMKpvCsxLwzAJIeCHr-9K6oVSiHT8hUjupIX60Ihqk4VyZfKeo1xSTnOoNAu97I7LKLf6hd8t9jfexHKefPGmuoSa4Z5wjp8RyCGZlmcJH3TPFTDm2AaLTTTII41Ce_lwC4tjUdzXQNQz4I-ww1ZXgydCf2CpY58d1uXKCoblMqWsas9mck93VflAcXN9mstj8PoLx3hArpFosrr0aGoRzBSNc50osnNNDQs74_na_Xl-VmY0sTbpi2_t-jhv7t4V8fvq-b4taqUTWiH7uHOxkSkH34AcI_n7A!!/dz/d5/L2dBISEvZ0FBIS9nQSEh/ | https://slovak.statistics.sk/wps/portal/!ut/p/z1/rZRRd5owFMc_yx76iLmEQJJHRASqpQLCal52AOPKLGhXp-u3X7SebpMjbGfLAwe4_x_ce_O_QQI9INHk--pzvqs2Tf6knhfC-hTRgA2Hug1AzREEt-ko8hKuAwH08SRwPNsndArApp4Jge2nMY8MA2wDCRW2oyhKplkGXobHEBi6B2GaAsT0zMOVZcOf8dl09pagR-c6BGGYpBmLjLs5OfHv-bOJ46iwOR-ZWWjMXHLmOwRdvA09_Dn_n2GTggonKvcsgQTafEsgrpfnhtBd_1HQwadBT__csdnJO6SbP_7gn3jnov_3DneV_3zLJtTVCaMoQwKJstltd49osSle8kdNNtr2W3ED6vJUrfOykjewf5G79euvbzDnJTcLoi1zk2gEl1IruGFpRcFXJsF0xTAcv70tqyVa6IwA6HylSWZxjchlqTFpgsYMbEnMTGtF8eVetodFdFv9gm8X-3svYjVvwRDrPnEmBstoi28JRN8si5Oka5r7angTjJMZtjnxnFF8rwZ27mCWTC2sWngWdBmuz_Ki70zoFswtFPqbWqJENmihWkav9mSi9nRfyQNKm83XWh2HyV86wgd0i0RV1INDWQ9goGNGLYLVZnJsqPvj-Vp9eX4WtjLxptnJ7zv08N9dvK3T91Uz47WqtHXsxv7hzjWIWHz4AZRUN9Q!/dz/d5/L2dBISEvZ0FBIS9nQSEh/ | https://slovak.statistics.sk/PortalTraffic/fileServlet?Dokument=da19bcb8-8c9c-4757-aba5-2d0720266efa | |
capacity_definition | Gross capacity | Gross capacity | Gross capacity | Gross capacity | Gross capacity | Gross capacity | Unknown | Unknown | Unknown | Unknown | ... | Net capacity | Net capacity | Net capacity | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | NaN |
technology |
0 rows × 157 columns
# Reshape dataframe to list
data_opsd = pd.DataFrame(data_raw.stack(level=level_names))
# Reset index for dataframe
data_opsd.reset_index(inplace=True)
data_opsd['technology'] = data_opsd['technology'].str.replace('- ', '')
data_opsd.rename(columns={0: 'capacity'}, inplace=True)
data_opsd['capacity'] = pd.to_numeric(data_opsd['capacity'], errors='coerce')
# For some source, permission to publish data
banlist = ['ELIA', 'BMWi', 'Mavir']
davail = 'data available, but cannot be provided'
data_opsd.loc[data_opsd['source'].isin(banlist), 'comment'] = davail
data_opsd.head()
technology | country | type | year | source | source_type | weblink | capacity_definition | capacity | comment | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Fossil fuels | AT | Installed capacity in MW | 2014 | e-control | Regulatory Authority | https://www.e-control.at/documents/1785851/181... | Gross capacity | 7243.634 | NaN |
1 | Fossil fuels | AT | Installed capacity in MW | 2015 | e-control | Regulatory Authority | https://www.e-control.at/documents/1785851/181... | Gross capacity | 7059.065 | NaN |
2 | Fossil fuels | AT | Installed capacity in MW | 2016 | e-control | Regulatory Authority | https://www.e-control.at/documents/1785851/181... | Gross capacity | 7323.000 | NaN |
3 | Fossil fuels | AT | Installed capacity in MW | 2017 | e-control | Regulatory Authority | https://www.e-control.at/documents/1785851/181... | Gross capacity | 6469.000 | NaN |
4 | Fossil fuels | AT | Installed capacity in MW | 2018 | e-control | Regulatory Authority | https://www.e-control.at/documents/1785851/181... | Gross capacity | 6492.000 | NaN |
EUROSTAT publishes annual structural data on national electricity generation capacities for European countries. The dataset is available in the EUROSTAT database within the category 'Environment and Energy' (nrg_113a).
url_eurostat = 'Link unavailable'
filepath_eurostat = os.path.join('input', 'Eurostat', 'Eurostat.tsv.gz')
data_eurostat = pd.read_csv(filepath_eurostat,
compression='gzip',
sep='\t|,',
engine='python'
)
data_eurostat.head()
unit | product | indic_nrg | geo\time | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | ... | 1999 | 1998 | 1997 | 1996 | 1995 | 1994 | 1993 | 1992 | 1991 | 1990 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | MW | 9007 | 12_1176011 | AL | 98 | 98 | 98 | 98 | 97 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | MW | 9007 | 12_1176011 | AT | 5561 | 5964 | 6136 | 6326 | 6347 | 6339 | ... | 5089 | 4917 | 5199 | 5031 | 5028 | 5079 | 5061 | 5133 | 4750 | 4743 |
2 | MW | 9007 | 12_1176011 | BA | 2065 | 1765 | 1765 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | MW | 9007 | 12_1176011 | BE | 7182 | 7199 | 7202 | 7559 | 8182 | 9128 | ... | 7746 | 7650 | 6945 | 7126 | 7211 | 7298 | 6433 | 6394 | 6445 | 6490 |
4 | MW | 9007 | 12_1176011 | BG | 3635 | 3854 | 4410 | 4636 | 4849 | 4495 | ... | 5172 | 4977 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 31 columns
id_vars = ['unit', 'product','indic_nrg', 'geo\\time']
data_eurostat = pd.melt(data_eurostat, id_vars=id_vars,
var_name='year', value_name='value')
data_eurostat.head()
unit | product | indic_nrg | geo\time | year | value | |
---|---|---|---|---|---|---|
0 | MW | 9007 | 12_1176011 | AL | 2016 | 98 |
1 | MW | 9007 | 12_1176011 | AT | 2016 | 5561 |
2 | MW | 9007 | 12_1176011 | BA | 2016 | 2065 |
3 | MW | 9007 | 12_1176011 | BE | 2016 | 7182 |
4 | MW | 9007 | 12_1176011 | BG | 2016 | 3635 |
data_definition = pd.read_csv(os.path.join('input', 'definition_EUROSTAT_indic.txt'),
header=None,
names=['indic', 'description',
'energy source'],
sep='\t')
data_eurostat = data_eurostat.merge(data_definition,
how='left',
left_on='indic_nrg',
right_on='indic')
The classification of generation capacities in the EUROSTAT dataset is specified in Regulation (EC) No 1099/2008 (Annex B, 3.3). The available EUROSTAT dataset nrg_113a covers the following indicators:
indic_nrg | Description | Technology in OPSD |
---|---|---|
12_1176011 | Electrical capacity, main activity producers - Combustible Fuels | Fossil fuels & bioenergy |
12_1176012 | Electrical capacity, autoproducers - Combustible Fuels | Fossil fuels & bioenergy |
12_1176061 | Electrical capacity, main activity producers - Mixed plants | |
12_1176101 | Electrical capacity, main activity producers - Other Sources | |
12_1176102 | Electrical capacity, autoproducers - Other Sources | |
12_1176111 | Electrical capacity, main activity producers - Steam | |
12_1176112 | Electrical capacity, autoproducers - Steam | |
12_1176121 | Electrical capacity, main activity producers - Gas Turbine | |
12_1176122 | Electrical capacity, autoproducers - Gas Turbine | |
12_1176131 | Electrical capacity, main activity producers - Combined Cycle | |
12_1176132 | Electrical capacity, autoproducers - Combined Cycle | |
12_1176141 | Electrical capacity, main activity producers - Internal Combustion | |
12_1176142 | Electrical capacity, autoproducers - Internal Combustion | |
12_1176401 | Electrical capacity, main activity producers - Other Type of Generation | |
12_1176402 | Electrical capacity, autoproducers - Other Type of Generation | |
12_1176253 | Net maximum capacity - Municipal Wastes | Non-renewable waste |
12_1176263 | Net maximum capacity - Wood/Wood Wastes/Other Solid Wastes | Other bioenergy and renewable waste |
12_1176273 | Net maximum capacity - Biogases | Biomass and biogas |
12_1176283 | Net maximum capacity - Industrial Wastes (non-renewable) | Non-renewable waste |
12_1176343 | Net maximum capacity - Liquid Biofuels | Biomass and biogas |
12_1176031 | Electrical capacity, main activity producers - Nuclear | Nuclear |
12_1176032 | Electrical capacity, autoproducers - Nuclear | Nuclear |
12_1176051 | Electrical capacity, main activity producers - Hydro | Hydro |
12_1176052 | Electrical capacity, autoproducers - Hydro | Hydro |
12_1176071 | Net electrical capacity, main activity producers - Pure Pumped Hydro | Pumped storage |
12_1176072 | Net electrical capacity, autoproducers - Pure Pumped Hydro | Pumped storage |
12_117615 | Net maximum capacity - Hydro <1 MW | |
12_117616 | Net maximum capacity - Hydro >= 1 MW and <= 10 MW | |
12_117617 | Net maximum capacity - Hydro 10 MW and over | |
12_1176301 | Electrical capacity, main activity producers - Tide, wave and ocean | Marine |
12_1176302 | Electrical capacity, autoproducers - Tide, wave and ocean | Marine |
12_1176303 | Net maximum capacity - Tide, Wave, Ocean | |
12_1176081 | Electrical capacity, main activity producers - Geothermal | Geothermal |
12_1176082 | Electrical capacity, autoproducers - Geothermal | Geothermal |
12_1176083 | Net maximum capacity - Geothermal | |
12_1176091 | Electrical capacity, main activity producers - Wind | Wind |
12_1176092 | Electrical capacity, autoproducers - Wind | Wind |
12_1176233 | Net maximum capacity - Solar Photovoltaic | Photovoltaics |
12_1176243 | Net maximum capacity - Solar Thermal Electric | Concentrated solar power |
Bold rows indicate top level classes within the EUROSTAT classification, whereas normal and italic rows cover different kinds of subclassifications. Especially within the top level 'Combustible fuels' different kinds of subcategorizations based on fuel or technology are available. Simarily, 'Hydro' is differentiated by type (e.g. pumped-hydro storage) or capacity classes. Italic rows are not further considered within the OPSD dataset due to the mismatch with existing technology classes.
data_eurostat = data_eurostat[data_eurostat['energy source'].isnull() == False]
values_as_string = data_eurostat['value'].astype(str)
string_values = values_as_string.str.split(' ', 1).str[0]
string_values.replace(':', np.nan, inplace=True)
subset_nan = string_values.isnull()
data_eurostat['value'] = string_values
data_eurostat['year'] = data_eurostat['year'].astype(int)
data_eurostat['value'] = data_eurostat['value'].astype(float)
data_eurostat.head()
unit | product | indic_nrg | geo\time | year | value | indic | description | energy source | |
---|---|---|---|---|---|---|---|---|---|
0 | MW | 9007 | 12_1176011 | AL | 2016 | 98.0 | 12_1176011 | Electrical capacity, main activity producers -... | Fossil fuels |
1 | MW | 9007 | 12_1176011 | AT | 2016 | 5561.0 | 12_1176011 | Electrical capacity, main activity producers -... | Fossil fuels |
2 | MW | 9007 | 12_1176011 | BA | 2016 | 2065.0 | 12_1176011 | Electrical capacity, main activity producers -... | Fossil fuels |
3 | MW | 9007 | 12_1176011 | BE | 2016 | 7182.0 | 12_1176011 | Electrical capacity, main activity producers -... | Fossil fuels |
4 | MW | 9007 | 12_1176011 | BG | 2016 | 3635.0 | 12_1176011 | Electrical capacity, main activity producers -... | Fossil fuels |
data_eurostat = data_eurostat.drop(['unit', 'product', 'indic_nrg',
'indic', 'description'], axis=1)
data_eurostat = data_eurostat.rename(columns={'geo\\time': 'country',
'energy source': 'technology',
'value': 'capacity'})
data_eurostat['country'].replace({'UK': 'GB', 'EL': 'GR'}, inplace=True)
drop_list = data_eurostat[data_eurostat['country'].isin(['EU28','EA19'])].index
data_eurostat.drop(drop_list, inplace=True)
by_columns = ['technology', 'year', 'country']
data_eurostat = pd.DataFrame(data_eurostat.groupby(by_columns)['capacity'].sum())
data_eurostat_isnull = data_eurostat['capacity'].isnull() == True
data_eurostat.reset_index(inplace=True)
data_eurostat.head()
technology | year | country | capacity | |
---|---|---|---|---|
0 | Biomass and biogas | 1990 | AL | 0.0 |
1 | Biomass and biogas | 1990 | AT | 0.0 |
2 | Biomass and biogas | 1990 | BA | 0.0 |
3 | Biomass and biogas | 1990 | BE | 1.0 |
4 | Biomass and biogas | 1990 | BG | 0.0 |
eurostat_pivot = data_eurostat.pivot_table(values='capacity',
index=['country','year'],
columns='technology')
eurostat_pivot.head()
technology | Biomass and biogas | Concentrated solar power | Fossil fuels | Geothermal | Hydro | Marine | Non-renewable waste | Nuclear | Other bioenergy and renewable waste | Photovoltaics | Pumped storage | Wind | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
country | year | ||||||||||||
AL | 1990 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1991 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
1992 | 0.0 | 0.0 | 0.0 | 0.0 | 1453.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
1993 | 0.0 | 0.0 | 0.0 | 0.0 | 1453.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
1994 | 0.0 | 0.0 | 0.0 | 0.0 | 1453.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
eurostat_pivot['Differently categorized solar'] = 0
eurostat_pivot['Solar'] = eurostat_pivot[['Photovoltaics', 'Concentrated solar power']].sum(axis=1)
eurostat_pivot['Differently categorized wind'] = eurostat_pivot['Wind']
bio_arr = ['Biomass and biogas', 'Other bioenergy and renewable waste']
eurostat_pivot['Bioenergy and renewable waste'] = eurostat_pivot[bio_arr].sum(axis=1)
res_arr = ['Hydro', 'Wind', 'Solar', 'Geothermal', 'Marine', 'Bioenergy and renewable waste']
eurostat_pivot['Renewable energy sources'] = eurostat_pivot[res_arr].sum(axis=1)
eurostat_pivot['Fossil fuels'] = eurostat_pivot['Fossil fuels'] - eurostat_pivot['Bioenergy and renewable waste']
eurostat_pivot['Differently categorized fossil fuels'] = eurostat_pivot['Fossil fuels']\
- eurostat_pivot['Non-renewable waste']
total_arr = ['Fossil fuels','Nuclear','Renewable energy sources']
eurostat_pivot['Total'] = eurostat_pivot[total_arr].sum(axis=1)
eurostat_pivot.head()
01 Oct 2020 16:07:59 - numexpr.utils - INFO - NumExpr defaulting to 4 threads.
technology | Biomass and biogas | Concentrated solar power | Fossil fuels | Geothermal | Hydro | Marine | Non-renewable waste | Nuclear | Other bioenergy and renewable waste | Photovoltaics | Pumped storage | Wind | Differently categorized solar | Solar | Differently categorized wind | Bioenergy and renewable waste | Renewable energy sources | Differently categorized fossil fuels | Total | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
country | year | |||||||||||||||||||
AL | 1990 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1991 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
1992 | 0.0 | 0.0 | 0.0 | 0.0 | 1453.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 1453.0 | 0.0 | 1453.0 | |
1993 | 0.0 | 0.0 | 0.0 | 0.0 | 1453.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 1453.0 | 0.0 | 1453.0 | |
1994 | 0.0 | 0.0 | 0.0 | 0.0 | 1453.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 1453.0 | 0.0 | 1453.0 |
data_eurostat = eurostat_pivot.stack().reset_index().rename(columns={0: 'capacity'})
data_eurostat['source'] = 'EUROSTAT'
data_eurostat['source_type'] = 'Statistical Office'
data_eurostat['capacity_definition'] = 'Unknown'
data_eurostat['type'] = 'Installed capacity in MW'
data_eurostat['weblink'] = url_eurostat
data_eurostat.head()
country | year | technology | capacity | source | source_type | capacity_definition | type | weblink | |
---|---|---|---|---|---|---|---|---|---|
0 | AL | 1990 | Biomass and biogas | 0.0 | EUROSTAT | Statistical Office | Unknown | Installed capacity in MW | Link unavailable |
1 | AL | 1990 | Concentrated solar power | 0.0 | EUROSTAT | Statistical Office | Unknown | Installed capacity in MW | Link unavailable |
2 | AL | 1990 | Fossil fuels | 0.0 | EUROSTAT | Statistical Office | Unknown | Installed capacity in MW | Link unavailable |
3 | AL | 1990 | Geothermal | 0.0 | EUROSTAT | Statistical Office | Unknown | Installed capacity in MW | Link unavailable |
4 | AL | 1990 | Hydro | 0.0 | EUROSTAT | Statistical Office | Unknown | Installed capacity in MW | Link unavailable |
The ENTSO-E publishes annual data on national generation capacites in different specifications and formats. We use two relevant data sources from the ENTSOE-E, which comprises firstly statistical data within the Data Portal (up to 2015) or ENTSO-E Transparency Platform, and secondly datasets compiled within the ENTSO-E System Outlook & Adequacy Forecast (SO&AF). The ENTSO-E Transparency Platform is currently not implemented as a data source for national generation capacities.
The advantage of the ENTSO-E SO&AF is the higher granularity of the data with respect to the main fuel or technology. However, as the SO&AF provides a forecast on future system conditions in particular peak hours, the dataset also accounts for expected capacity changes throughout the years. Therefore, we only consider years which are closest to the publication year of the respective SO&AF.
In the following, we use the statistical data available in the Data Portal (up to 2015).
url_entsoe = 'https://docstore.entsoe.eu/Documents/Publications/Statistics/NGC_2010-2015.xlsx'
filepath_entsoe = func.downloadandcache(url_entsoe, 'Statistics.xls',
os.path.join('ENTSO-E','Data Portal 2010-2015')
)
data_entsoe_raw = pd.read_excel(filepath_entsoe)
data_entsoe_raw.head()
01 Oct 2020 16:07:59 - root - INFO - Downloading file Statistics.xls
Country | year | hydro | of which storage | of which run of river | of which pumped storage | nuclear | renewable | of which wind | of which solar | of which biomass | fossil_fueals | other | representativity | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AT | 2010 | 12665.0 | 7205.0 | 5241.0 | 0.0 | 0 | 1031.0 | 1002.0 | 0.0 | 0.0 | 7389.0 | 0.0 | 100.0 |
1 | AT | 2011 | 13200.0 | 0.0 | 0.0 | 0.0 | 0 | 1542.0 | 1056.0 | 55.0 | 0.0 | 7818.0 | 68.0 | 100.0 |
2 | AT | 2012 | 13350.0 | 0.0 | 0.0 | 0.0 | 0 | 1906.0 | 1307.0 | 172.0 | 427.0 | 7834.0 | 74.0 | 100.0 |
3 | AT | 2013 | 13427.0 | 0.0 | 0.0 | 0.0 | 0 | 2305.0 | 1555.0 | 324.0 | 426.0 | 7847.0 | 244.0 | 100.0 |
4 | AT | 2014 | 13569.0 | 0.0 | 0.0 | 0.0 | 0 | 3413.0 | 2110.0 | 589.0 | 512.0 | 7243.0 | 0.0 | 100.0 |
dict_energy_source = {'hydro': 'Hydro',
'of which storage': 'Reservoir',
'of which run of river': 'Run-of-river',
'of which pumped storage': 'Pumped storage',
'nuclear': 'Nuclear',
'of which wind': 'Wind',
'of which solar': 'Solar',
'of which biomass': 'Biomass and biogas',
'fossil_fuels': 'Fossil fuels',
'other': 'Other or unspecified energy sources',
"Country": "country",
'fossil_fueals': 'Fossil fuels'}
data_entsoe_raw.rename(columns=dict_energy_source,
inplace=True)
data_entsoe_raw.drop(columns='representativity', inplace=True)
data_entsoe_raw.head()
country | year | Hydro | Reservoir | Run-of-river | Pumped storage | Nuclear | renewable | Wind | Solar | Biomass and biogas | Fossil fuels | Other or unspecified energy sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AT | 2010 | 12665.0 | 7205.0 | 5241.0 | 0.0 | 0 | 1031.0 | 1002.0 | 0.0 | 0.0 | 7389.0 | 0.0 |
1 | AT | 2011 | 13200.0 | 0.0 | 0.0 | 0.0 | 0 | 1542.0 | 1056.0 | 55.0 | 0.0 | 7818.0 | 68.0 |
2 | AT | 2012 | 13350.0 | 0.0 | 0.0 | 0.0 | 0 | 1906.0 | 1307.0 | 172.0 | 427.0 | 7834.0 | 74.0 |
3 | AT | 2013 | 13427.0 | 0.0 | 0.0 | 0.0 | 0 | 2305.0 | 1555.0 | 324.0 | 426.0 | 7847.0 | 244.0 |
4 | AT | 2014 | 13569.0 | 0.0 | 0.0 | 0.0 | 0 | 3413.0 | 2110.0 | 589.0 | 512.0 | 7243.0 | 0.0 |
data_entsoe_raw['Differently categorized solar'] = data_entsoe_raw['Solar']
data_entsoe_raw['Differently categorized wind'] = data_entsoe_raw['Wind']
data_entsoe_raw['Bioenergy and renewable waste'] = data_entsoe_raw['Biomass and biogas']
data_entsoe_raw['Differently categorized fossil fuels'] = data_entsoe_raw['Fossil fuels']
data_entsoe_raw['Differently categorized hydro'] = (
data_entsoe_raw['Hydro']
- data_entsoe_raw['Run-of-river']
- data_entsoe_raw['Reservoir']
- data_entsoe_raw['Pumped storage'])
data_entsoe_raw['Differently categorized renewable energy sources'] = (
data_entsoe_raw['renewable']
- data_entsoe_raw['Wind']
- data_entsoe_raw['Solar']
- data_entsoe_raw['Biomass and biogas'])
data_entsoe_raw.drop(columns='renewable', inplace=True)
data_entsoe_raw['Renewable energy sources'] = (
data_entsoe_raw['Hydro']
+ data_entsoe_raw['Wind']
+ data_entsoe_raw['Solar']
+ data_entsoe_raw['Bioenergy and renewable waste']
+ data_entsoe_raw['Differently categorized renewable energy sources'])
data_entsoe_raw['Total'] = (
data_entsoe_raw['Renewable energy sources']
+ data_entsoe_raw['Nuclear']
+ data_entsoe_raw['Fossil fuels']
+ data_entsoe_raw['Other or unspecified energy sources'])
data_entsoe = pd.melt(data_entsoe_raw,
id_vars=['country', 'year'],
var_name='technology',
value_name='capacity')
data_entsoe.head()
country | year | technology | capacity | |
---|---|---|---|---|
0 | AT | 2010 | Hydro | 12665.0 |
1 | AT | 2011 | Hydro | 13200.0 |
2 | AT | 2012 | Hydro | 13350.0 |
3 | AT | 2013 | Hydro | 13427.0 |
4 | AT | 2014 | Hydro | 13569.0 |
data_entsoe['country'].replace('NI', 'GB', inplace=True)
# set negative capacities to zero
data_entsoe.loc[data_entsoe['capacity'] < 0, 'capacity'] = 0
data_entsoe['source'] = 'ENTSO-E Data Portal'
data_entsoe['source_type'] = 'Other association'
data_entsoe['capacity_definition'] = 'Net capacity'
data_entsoe['type'] = 'Installed capacity in MW'
data_entsoe.head()
country | year | technology | capacity | source | source_type | capacity_definition | type | |
---|---|---|---|---|---|---|---|---|
0 | AT | 2010 | Hydro | 12665.0 | ENTSO-E Data Portal | Other association | Net capacity | Installed capacity in MW |
1 | AT | 2011 | Hydro | 13200.0 | ENTSO-E Data Portal | Other association | Net capacity | Installed capacity in MW |
2 | AT | 2012 | Hydro | 13350.0 | ENTSO-E Data Portal | Other association | Net capacity | Installed capacity in MW |
3 | AT | 2013 | Hydro | 13427.0 | ENTSO-E Data Portal | Other association | Net capacity | Installed capacity in MW |
4 | AT | 2014 | Hydro | 13569.0 | ENTSO-E Data Portal | Other association | Net capacity | Installed capacity in MW |
soafs = [soaf.SoafDataRaw('https://www.entsoe.eu/fileadmin/user_upload/_library/SDC/SOAF/SO_AF_2011_-_2025_.zip',
'SO_AF_2011_-_2025_.zip',
'SO&AF 2011 - 2025 Scenario B.xls',
2011),
soaf.SoafDataRaw('https://www.entsoe.eu/fileadmin/user_upload/_library/SDC/SOAF/120705_SOAF_2012_Dataset.zip',
'120705_SOAF_2012_Dataset.zip',
'SOAF 2012 Scenario B.xls',
2012),
soaf.SoafDataRaw('https://www.entsoe.eu/fileadmin/user_upload/_library/publications/entsoe/So_AF_2013-2030/130403_SOAF_2013-2030_dataset.zip',
'130403_SOAF_2013-2030_dataset.zip',
'ScB.xls',
2013),
soaf.SoafDataRaw('https://www.entsoe.eu/Documents/SDC%20documents/SOAF/140602_SOAF%202014_dataset.zip',
'140602_SOAF%202014_dataset.zip',
'ScB.xlsx',
2014),
soaf.SoafDataRaw('https://www.entsoe.eu/Documents/Publications/SDC/data/SO_AF_2015_dataset.zip',
'SO_AF_2015_dataset.zip',
os.path.join('SO&AF 2015 dataset', 'ScB_publication.xlsx'),
2016)]
data_soaf = pd.concat([s.transformed_df for s in soafs])
# Correct that in the Soaf2015 datatset the year column is 2016 instead of 2015
data_soaf['year'].replace({2016 : 2015}, inplace=True)
data_soaf.head()
01 Oct 2020 16:08:00 - root - INFO - Downloading file SO_AF_2011_-_2025_.zip 01 Oct 2020 16:08:01 - root - INFO - Extracting SO_AF_2011_-_2025_.zip into the directory download\ENTSO-E\SO&AF\2011 01 Oct 2020 16:08:01 - root - INFO - Reading download\ENTSO-E\SO&AF\2011\SO&AF 2011 - 2025 Scenario B.xls and transforming into a DataFrame 01 Oct 2020 16:08:02 - root - INFO - Downloading file 120705_SOAF_2012_Dataset.zip 01 Oct 2020 16:08:02 - root - INFO - Extracting 120705_SOAF_2012_Dataset.zip into the directory download\ENTSO-E\SO&AF\2012 01 Oct 2020 16:08:02 - root - INFO - Reading download\ENTSO-E\SO&AF\2012\SOAF 2012 Scenario B.xls and transforming into a DataFrame 01 Oct 2020 16:08:03 - root - INFO - Downloading file 130403_SOAF_2013-2030_dataset.zip 01 Oct 2020 16:08:04 - root - INFO - Extracting 130403_SOAF_2013-2030_dataset.zip into the directory download\ENTSO-E\SO&AF\2013 01 Oct 2020 16:08:04 - root - INFO - Reading download\ENTSO-E\SO&AF\2013\ScB.xls and transforming into a DataFrame 01 Oct 2020 16:08:05 - root - INFO - Downloading file 140602_SOAF%202014_dataset.zip 01 Oct 2020 16:08:05 - root - INFO - Extracting 140602_SOAF%202014_dataset.zip into the directory download\ENTSO-E\SO&AF\2014 01 Oct 2020 16:08:05 - root - INFO - Reading download\ENTSO-E\SO&AF\2014\ScB.xlsx and transforming into a DataFrame 01 Oct 2020 16:08:06 - root - INFO - Downloading file SO_AF_2015_dataset.zip 01 Oct 2020 16:08:07 - root - INFO - Extracting SO_AF_2015_dataset.zip into the directory download\ENTSO-E\SO&AF\2016 01 Oct 2020 16:08:07 - root - INFO - Reading download\ENTSO-E\SO&AF\2016\SO&AF 2015 dataset\ScB_publication.xlsx and transforming into a DataFrame
technology | capacity | country | year | |
---|---|---|---|---|
1 | Nuclear | 0.0 | AL | 2011 |
2 | Fossil fuels | 0.0 | AL | 2011 |
3 | Lignite | 0.0 | AL | 2011 |
4 | Hard coal | 0.0 | AL | 2011 |
5 | Natural gas | 0.0 | AL | 2011 |
soaf_unstacked = func.unstackData(data_soaf)
soaf_unstacked['Differently categorized solar'] = soaf_unstacked['Solar']
soaf_unstacked['Differently categorized wind'] = soaf_unstacked['Wind']\
- soaf_unstacked['Offshore']\
- soaf_unstacked['Onshore']
soaf_unstacked['Differently categorized hydro'] = soaf_unstacked['Hydro']\
- soaf_unstacked['Run-of-river']\
- soaf_unstacked['Reservoir including pumped storage']
soaf_unstacked['Bioenergy and renewable waste'] = soaf_unstacked['Biomass and biogas']
soaf_unstacked['Differently categorized renewable energy sources'] = (
soaf_unstacked['renewable']
- soaf_unstacked['Wind']
- soaf_unstacked['Solar']
- soaf_unstacked['Biomass and biogas'])
soaf_unstacked.drop(columns='renewable', inplace=True)
subtract_fossils_arr = ['Lignite','Hard coal','Oil','Natural gas','Mixed fossil fuels']
soaf_unstacked['Differently categorized fossil fuels'] = soaf_unstacked['Fossil fuels']\
- soaf_unstacked[subtract_fossils_arr].sum(axis=1)
res_arr = ['Solar','Wind','Bioenergy and renewable waste','Hydro',
'Differently categorized renewable energy sources']
soaf_unstacked['Renewable energy sources'] = soaf_unstacked[res_arr].sum(axis=1)
total_arr = ['Renewable energy sources','Fossil fuels','Nuclear',
'Other or unspecified energy sources']
soaf_unstacked['Total'] = soaf_unstacked[total_arr].sum(axis=1)
soaf_unstacked.head()
technology | Biomass and biogas | Fossil fuels | Hard coal | Hydro | Lignite | Mixed fossil fuels | Natural gas | Nuclear | Offshore | Oil | ... | Solar | Wind | Differently categorized solar | Differently categorized wind | Differently categorized hydro | Bioenergy and renewable waste | Differently categorized renewable energy sources | Differently categorized fossil fuels | Renewable energy sources | Total | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
country | year | |||||||||||||||||||||
AL | 2011 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2012 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
2013 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
2014 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
2015 | 0.0 | 100.0 | 0.0 | 1860.0 | 0.0 | 0.0 | 100.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 1860.0 | 1960.0 |
5 rows × 24 columns
data_soaf = func.restackData(soaf_unstacked)
data_soaf.loc[data_soaf['capacity'] < 0, 'capacity'] = 0
data_soaf['source'] = 'ENTSO-E SOAF'
data_soaf['type'] = 'Installed capacity in MW'
data_soaf['capacity_definition'] = 'Net capacity'
data_soaf['source_type'] = 'Other association'
data_soaf['weblink'] = url_entsoe
data_soaf.head()
country | year | technology | capacity | source | type | capacity_definition | source_type | weblink | |
---|---|---|---|---|---|---|---|---|---|
0 | AL | 2011 | Biomass and biogas | 0.0 | ENTSO-E SOAF | Installed capacity in MW | Net capacity | Other association | https://docstore.entsoe.eu/Documents/Publicati... |
1 | AL | 2011 | Fossil fuels | 0.0 | ENTSO-E SOAF | Installed capacity in MW | Net capacity | Other association | https://docstore.entsoe.eu/Documents/Publicati... |
2 | AL | 2011 | Hard coal | 0.0 | ENTSO-E SOAF | Installed capacity in MW | Net capacity | Other association | https://docstore.entsoe.eu/Documents/Publicati... |
3 | AL | 2011 | Hydro | 0.0 | ENTSO-E SOAF | Installed capacity in MW | Net capacity | Other association | https://docstore.entsoe.eu/Documents/Publicati... |
4 | AL | 2011 | Lignite | 0.0 | ENTSO-E SOAF | Installed capacity in MW | Net capacity | Other association | https://docstore.entsoe.eu/Documents/Publicati... |
# file pattern for the single years
filenamepattern = '_1_InstalledGenerationCapacityAggregated.csv'
list_of_data_tables = [] # list to append
# iterate over the years from 2015 to 2020
for i in range(2015,2021):
filepath = os.path.join('input',
'ENTSO-E',
'Transparency',
'InstalledGenerationCapacityAggregated',
str(i) + filenamepattern)
list_of_data_tables.append(pd.read_csv(filepath, delimiter="\t", encoding = "UTF-16"))
# merge the datasets of the single of files into one pandas dataframe
data_transparency = pd.concat(list_of_data_tables, ignore_index=True)
data_transparency.head()
Year | Month | Day | DateTime | ResolutionCode | areacode | AreaTypeCode | AreaName | MapCode | ProductionType | AggregatedInstalledCapacity | DeletedFlag | UpdateTime | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2015 | 1 | 1 | 2015-01-01 00:00:00.000 | P1Y | 10YAT-APG------L | CTA | APG CA | AT | Other | 22.8 | 0 | 2017-09-13 15:14:19 |
1 | 2015 | 1 | 1 | 2015-01-01 00:00:00.000 | P1Y | 10YAT-APG------L | CTA | APG CA | AT | Wind Offshore | 0.0 | 0 | 2017-09-13 15:14:19 |
2 | 2015 | 1 | 1 | 2015-01-01 00:00:00.000 | P1Y | 10YAT-APG------L | CTA | APG CA | AT | Fossil Brown coal/Lignite | 0.0 | 0 | 2017-09-13 15:14:19 |
3 | 2015 | 1 | 1 | 2015-01-01 00:00:00.000 | P1Y | 10YAT-APG------L | CTA | APG CA | AT | Nuclear | 0.0 | 0 | 2017-09-13 15:14:19 |
4 | 2015 | 1 | 1 | 2015-01-01 00:00:00.000 | P1Y | 10YAT-APG------L | CTA | APG CA | AT | Fossil Hard coal | 1171.0 | 0 | 2017-09-13 15:14:19 |
# rename columns according to the opsd standards
data_transparency.rename(columns={'ProductionType': 'technology',
'AggregatedInstalledCapacity': 'capacity',
'MapCode': 'country',
'Year': 'year'}, inplace=True)
# drop non relevant columns
data_transparency = data_transparency.filter(items=['technology','capacity','country','year'], axis=1)
# drop countries that are not part of opsd
data_transparency = data_transparency[data_transparency['country'].isin(data_opsd.country.unique())]
data_transparency.head()
technology | capacity | country | year | |
---|---|---|---|---|
0 | Other | 22.8 | AT | 2015 |
1 | Wind Offshore | 0.0 | AT | 2015 |
2 | Fossil Brown coal/Lignite | 0.0 | AT | 2015 |
3 | Nuclear | 0.0 | AT | 2015 |
4 | Fossil Hard coal | 1171.0 | AT | 2015 |
# adapt energy source notation
dict_energy_source = {'Biomass': 'Biomass and biogas',
'Fossil Brown coal/Lignite': 'Lignite',
'Fossil Coal-derived gas': 'Mixed fossil fuels',
'Fossil Gas': 'Natural gas',
'Fossil Hard coal': 'Hard coal',
'Fossil Oil': 'Oil',
'Fossil Oil shale': 'Oil',
'Fossil Peat': 'Other fossil fuels',
'Hydro Pumped Storage': 'Pumped storage',
'Hydro Run-of-river and poundage': 'Run-of-river',
'Hydro Water Reservoir': 'Reservoir',
'Other': 'Other or unspecified energy sources',
'Other renewable': 'Differently categorized renewable energy sources',
'Waste': 'Other bioenergy and renewable waste',
'Wind Offshore': 'Offshore',
'Wind Onshore': 'Onshore',
' ': np.nan}
data_transparency['technology'].replace(dict_energy_source, inplace=True)
data_transparency.head()
technology | capacity | country | year | |
---|---|---|---|---|
0 | Other or unspecified energy sources | 22.8 | AT | 2015 |
1 | Offshore | 0.0 | AT | 2015 |
2 | Lignite | 0.0 | AT | 2015 |
3 | Nuclear | 0.0 | AT | 2015 |
4 | Hard coal | 1171.0 | AT | 2015 |
# add missing categories
transparency_pivot = data_transparency.pivot_table(values='capacity',
index=['country','year'],
columns='technology')
# technology level
transparency_pivot['Differently categorized solar'] = transparency_pivot['Solar']
transparency_pivot['Differently categorized natural gas'] = transparency_pivot['Natural gas']
transparency_pivot['Non-renewable waste'] = 0
transparency_pivot['Differently categorized fossil fuels'] = 0
# level 3
hydro_arr = ['Pumped storage', 'Reservoir', 'Run-of-river']
transparency_pivot['Hydro'] = transparency_pivot[hydro_arr].sum(axis=1)
wind_arr = ['Onshore', 'Offshore']
transparency_pivot['Wind'] = transparency_pivot[wind_arr].sum(axis=1)
# level 2
bio_arr = ['Biomass and biogas', 'Other bioenergy and renewable waste']
transparency_pivot['Bioenergy and renewable waste'] = transparency_pivot[bio_arr].sum(axis=1)
#level 1
res_arr = ['Hydro', 'Wind', 'Solar', 'Geothermal', 'Marine',
'Bioenergy and renewable waste', 'Differently categorized renewable energy sources']
transparency_pivot['Renewable energy sources'] = transparency_pivot[res_arr].sum(axis=1)
fossil_arr = ['Lignite', 'Hard coal', 'Oil', 'Natural gas', 'Mixed fossil fuels', 'Other fossil fuels']
transparency_pivot['Fossil fuels'] = transparency_pivot[fossil_arr].sum(axis=1)
# level 0
total_arr = ['Fossil fuels','Nuclear','Renewable energy sources', 'Other or unspecified energy sources']
transparency_pivot['Total'] = transparency_pivot[total_arr].sum(axis=1)
transparency_pivot.reset_index(inplace=True)
transparency_pivot.head()
technology | country | year | Biomass and biogas | Differently categorized renewable energy sources | Geothermal | Hard coal | Lignite | Marine | Mixed fossil fuels | Natural gas | ... | Differently categorized solar | Differently categorized natural gas | Non-renewable waste | Differently categorized fossil fuels | Hydro | Wind | Bioenergy and renewable waste | Renewable energy sources | Fossil fuels | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AT | 2015 | 464.2 | 31.3 | 0.9 | 1171.0 | 0.0 | 0.0 | 0.0 | 4501.2 | ... | 587.0 | 4501.2 | 0 | 0 | 11478.7 | 2120.9 | 608.3 | 14827.1 | 5761.15 | 20611.05 |
1 | AT | 2016 | 473.6 | 32.5 | 0.9 | 766.0 | 0.0 | 0.0 | 0.0 | 4465.8 | ... | 723.0 | 4465.8 | 0 | 0 | 11510.7 | 2497.4 | 617.7 | 15382.2 | 5320.75 | 20725.75 |
2 | AT | 2017 | 473.8 | 32.5 | 0.9 | 598.0 | 0.0 | 0.0 | 0.0 | 4465.8 | ... | 1031.0 | 4465.8 | 0 | 0 | 11946.8 | 2696.0 | 617.9 | 16325.1 | 5152.75 | 21500.65 |
3 | AT | 2018 | 491.0 | 42.3 | 0.9 | 598.0 | 0.0 | 0.0 | 0.0 | 4467.7 | ... | 1193.0 | 4467.7 | 0 | 0 | 11990.4 | 2887.0 | 640.8 | 16754.4 | 5154.65 | 21931.85 |
4 | AT | 2019 | 499.8 | 42.3 | 0.9 | 598.0 | 0.0 | 0.0 | 0.0 | 4463.2 | ... | 1192.7 | 4463.2 | 0 | 0 | 11117.5 | 3034.6 | 649.8 | 16037.8 | 5150.15 | 21210.75 |
5 rows × 31 columns
data_transparency = pd.melt(transparency_pivot,
id_vars=['country', 'year'],
var_name='technology',
value_name='capacity')
data_transparency = data_transparency.loc[data_transparency["year"] < 2020, :]
data_transparency['source'] = 'ENTSO-E Transparency Platform'
data_transparency['source_type'] = 'Other association'
data_transparency['capacity_definition'] = 'Net capacity'
data_transparency['type'] = 'Installed capacity in MW'
data_transparency['weblink'] = ('https://transparency.entsoe.eu/generation'
'/r2/installedGenerationCapacityAggregation/show')
data_transparency.head()
country | year | technology | capacity | source | source_type | capacity_definition | type | weblink | |
---|---|---|---|---|---|---|---|---|---|
0 | AT | 2015 | Biomass and biogas | 464.2 | ENTSO-E Transparency Platform | Other association | Net capacity | Installed capacity in MW | https://transparency.entsoe.eu/generation/r2/i... |
1 | AT | 2016 | Biomass and biogas | 473.6 | ENTSO-E Transparency Platform | Other association | Net capacity | Installed capacity in MW | https://transparency.entsoe.eu/generation/r2/i... |
2 | AT | 2017 | Biomass and biogas | 473.8 | ENTSO-E Transparency Platform | Other association | Net capacity | Installed capacity in MW | https://transparency.entsoe.eu/generation/r2/i... |
3 | AT | 2018 | Biomass and biogas | 491.0 | ENTSO-E Transparency Platform | Other association | Net capacity | Installed capacity in MW | https://transparency.entsoe.eu/generation/r2/i... |
4 | AT | 2019 | Biomass and biogas | 499.8 | ENTSO-E Transparency Platform | Other association | Net capacity | Installed capacity in MW | https://transparency.entsoe.eu/generation/r2/i... |
row_of_year = {2014: 9,
2015: 53,
2016: 97,
2017: 141,
2018: 185}
dataframes = []
for year, row in row_of_year.items():
# read the dataframe for each year
power_statistics_raw = pd.read_excel(os.path.join('input',
'ENTSO-E',
'Power Statistics',
'NGC.xlsx'),
header=[0,1],
sheet_name='NGC',
skiprows=row,
nrows=42)
# drop non relevant columns
power_statistics_raw.drop(columns='Coverage ratio in %', level=1, inplace=True)
power_statistics_raw.drop(columns=['Unnamed: 1_level_1','Unnamed: 2_level_1','Unnamed: 3_level_1','Unnamed: 4_level_1'], level=1, inplace=True)
# get rid of multi index
df = power_statistics_raw.set_index(year).stack().reset_index().drop('level_1', axis=1)
# remove leftovers of multi index in the index column
df["technology"] = df[year].apply(lambda x: x[0])
df.drop(columns=year, inplace=True)
# stack df to the opsd standard format
stacked_df = df.melt(id_vars='technology', var_name='country', value_name='capacity')
# add information about the year
stacked_df['year'] = year
# append to the main list of dataframes
dataframes.append(stacked_df)
power_statistics = pd.concat(dataframes)
power_statistics.head()
technology | country | capacity | year | |
---|---|---|---|---|
0 | Non-Renewable | AL | 0 | 2014 |
1 | Nuclear | AL | NaN | 2014 |
2 | Non-renwable hydro | AL | 0 | 2014 |
3 | Of which hydro pure pumped storage | AL | NaN | 2014 |
4 | Of which Hydro mixed pumped storage (non renew... | AL | NaN | 2014 |
# drop countries that are not covered in opsd
opsd_countries = data_opsd.country.unique()
drop_list_country = power_statistics.loc[~power_statistics['country'].isin(opsd_countries)].index.to_list()
power_statistics.drop(drop_list_country, inplace=True)
# technology classes to be dropped
tech_to_drop = ['Non-Renewable', 'Fossil fuels', 'Renewable','Non-renwable hydro',
'Total Waste', 'Bio', 'Renewable Hydro', 'Comments', 'Total NGC']
drop_list_tech = power_statistics.loc[power_statistics['technology'].isin(tech_to_drop)].index.to_list()
power_statistics.drop(drop_list_tech, inplace=True)
# replace string with values that can be used in math operations
power_statistics['capacity'].replace(to_replace='Not Expected', value=0, inplace=True)
power_statistics['capacity'].replace(to_replace='Not Available', value=np.nan, inplace=True)
power_statistics.head()
technology | country | capacity | year | |
---|---|---|---|---|
41 | Nuclear | AT | 0.0 | 2014 |
43 | Of which hydro pure pumped storage | AT | 0.0 | 2014 |
44 | Of which Hydro mixed pumped storage (non renew... | AT | 0.0 | 2014 |
46 | Of which Fossil Brown coal/Lignite | AT | 0.0 | 2014 |
47 | Of which Fossil Coal-derived gas | AT | 0.0 | 2014 |
# Not included because already categorized in OPSD standard:
# Nuclear, Solar, Geothermal, Wind
dict_energy_source = {'Of which hydro pure pumped storage':'Pumped storage',
'Of which Hydro mixed pumped storage (non renewable part)':'Pumped storage',
'Of which Fossil Brown coal/Lignite':'Lignite',
'Of which Fossil Coal-derived gas':'Differently categorized fossil fuels',
'Of which Fossil Gas':'Natural gas',
'Of which Fossil Hard coal':'Hard coal',
'Of which Fossil Oil':'Oil',
'Of which Fossil Oil shale':'Oil',
'Of which Fossil Peat':'Differently categorized fossil fuels',
'Of which Mixed fuels':'Mixed fossil fuels',
'Of which Other fossil fuels':'Other fossil fuels',
'Non-renewable Waste':'Non-renewable waste',
'Other non-renewable':'Differently categorized fossil fuels',
'Of which Wind offshore':'Offshore',
'Of which Wind onshore':'Onshore',
'Of which Solar PV':'Photovoltaics',
'Of which Solar Thermal':'Differently categorized solar',
'Of which Biomass':'Biomass and biogas',
'Of which Biogas':'Biomass and biogas',
'Renewable Waste':'Other bioenergy and renewable waste',
'Of which Hydro Pure storage':'Reservoir',
'Of which Hydro Run-of-river and pondage':'Run-of-river',
'Of which Hydro mixed pumped storage (renewable part)':'Pumped storage',
'Of which Hydro Marine (tidal/wave)':'Marine',
'Other renewable (not listed)':'Differently categorized renewable energy sources',
'Non identified (other not listed)':'Other or unspecified energy sources',
'Total Hydro':'Hydro'}
power_statistics["technology"].replace(dict_energy_source, inplace=True)
power_statistics.head()
technology | country | capacity | year | |
---|---|---|---|---|
41 | Nuclear | AT | 0.0 | 2014 |
43 | Pumped storage | AT | 0.0 | 2014 |
44 | Pumped storage | AT | 0.0 | 2014 |
46 | Lignite | AT | 0.0 | 2014 |
47 | Differently categorized fossil fuels | AT | 0.0 | 2014 |
powerstats_pivot = power_statistics.pivot_table(values='capacity',
index=['country','year'],
columns='technology').reset_index()
powerstats_pivot.head()
technology | country | year | Biomass and biogas | Differently categorized fossil fuels | Differently categorized renewable energy sources | Differently categorized solar | Geothermal | Hard coal | Hydro | Lignite | ... | Onshore | Other bioenergy and renewable waste | Other fossil fuels | Other or unspecified energy sources | Photovoltaics | Pumped storage | Reservoir | Run-of-river | Solar | Wind |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AT | 2014 | 302.685 | 29.0 | 0.0 | 0.0 | 1.0 | 1171.0 | 13568.0 | 0.0 | ... | 2110.0 | 21.96 | 414.0 | 0.0 | 586.12 | 2656.333333 | 0.0 | 5599.0 | 586.12 | 2110.0 |
1 | AT | 2015 | 298.000 | 30.0 | NaN | 0.0 | 1.0 | 1171.0 | 13656.0 | 0.0 | ... | 2489.0 | 23.00 | 414.0 | NaN | 732.00 | 3997.000000 | NaN | 5662.0 | 732.00 | 2489.0 |
2 | AT | 2016 | 286.000 | 35.0 | NaN | 0.0 | 1.0 | 598.0 | 14116.0 | 0.0 | ... | 2730.0 | 22.00 | 414.0 | NaN | 1031.00 | 4212.000000 | NaN | 5692.0 | 1031.00 | 2730.0 |
3 | AT | 2017 | 286.000 | 35.0 | NaN | 0.0 | 1.0 | 598.0 | 14116.0 | 0.0 | ... | 2730.0 | 22.00 | 414.0 | NaN | 1031.00 | 4212.000000 | NaN | 5692.0 | 1031.00 | 2730.0 |
4 | AT | 2018 | 293.000 | 35.0 | NaN | 0.0 | 1.0 | 598.0 | 14150.0 | 0.0 | ... | 2887.0 | 22.00 | 414.0 | NaN | 1193.00 | 4218.000000 | NaN | 5714.0 | 1193.00 | 2887.0 |
5 rows × 27 columns
# technology level
powerstats_pivot['Differently categorized natural gas'] = powerstats_pivot['Natural gas']
# level 2
powerstats_pivot['Bioenergy and renewable waste'] = (
powerstats_pivot['Biomass and biogas'] +
powerstats_pivot['Other bioenergy and renewable waste'])
#level 1
fossil_techs = ['Lignite', 'Hard coal', 'Oil', 'Natural gas', 'Non-renewable waste',
'Mixed fossil fuels', 'Other fossil fuels', 'Differently categorized fossil fuels']
powerstats_pivot['Fossil fuels'] = powerstats_pivot[fossil_techs].sum(axis=1)
res_tech = ['Hydro', 'Wind', 'Solar', 'Geothermal', 'Marine', 'Bioenergy and renewable waste',
'Differently categorized renewable energy sources']
powerstats_pivot['Renewable energy sources'] = powerstats_pivot[res_tech].sum(axis=1)
total_arr = ['Fossil fuels','Nuclear','Renewable energy sources']
powerstats_pivot['Total'] = powerstats_pivot[total_arr].sum(axis=1)
powerstats_pivot.head()
technology | country | year | Biomass and biogas | Differently categorized fossil fuels | Differently categorized renewable energy sources | Differently categorized solar | Geothermal | Hard coal | Hydro | Lignite | ... | Pumped storage | Reservoir | Run-of-river | Solar | Wind | Differently categorized natural gas | Bioenergy and renewable waste | Fossil fuels | Renewable energy sources | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AT | 2014 | 302.685 | 29.0 | 0.0 | 0.0 | 1.0 | 1171.0 | 13568.0 | 0.0 | ... | 2656.333333 | 0.0 | 5599.0 | 586.12 | 2110.0 | 4888.0 | 324.645 | 7105.0 | 16589.765 | 23694.765 |
1 | AT | 2015 | 298.000 | 30.0 | NaN | 0.0 | 1.0 | 1171.0 | 13656.0 | 0.0 | ... | 3997.000000 | NaN | 5662.0 | 732.00 | 2489.0 | 4820.0 | 321.000 | 7002.0 | 17199.000 | 24201.000 |
2 | AT | 2016 | 286.000 | 35.0 | NaN | 0.0 | 1.0 | 598.0 | 14116.0 | 0.0 | ... | 4212.000000 | NaN | 5692.0 | 1031.00 | 2730.0 | 4841.0 | 308.000 | 6407.0 | 18186.000 | 24593.000 |
3 | AT | 2017 | 286.000 | 35.0 | NaN | 0.0 | 1.0 | 598.0 | 14116.0 | 0.0 | ... | 4212.000000 | NaN | 5692.0 | 1031.00 | 2730.0 | 4841.0 | 308.000 | 6407.0 | 18186.000 | 24593.000 |
4 | AT | 2018 | 293.000 | 35.0 | NaN | 0.0 | 1.0 | 598.0 | 14150.0 | 0.0 | ... | 4218.000000 | NaN | 5714.0 | 1193.00 | 2887.0 | 4853.0 | 315.000 | 6420.0 | 18546.000 | 24966.000 |
5 rows × 32 columns
data_power_statistics = powerstats_pivot.melt(id_vars=['country', 'year'],
var_name='technology',
value_name='capacity')
data_power_statistics['source'] = 'ENTSO-E Power Statistics'
data_power_statistics['source_type'] = 'Other association'
data_power_statistics['capacity_definition'] = 'Net capacity'
data_power_statistics['type'] = 'Installed capacity in MW'
data_power_statistics['weblink'] = ('https://www.entsoe.eu/data/'
'power-stats/net-gen-capacity/')
data_power_statistics.head()
country | year | technology | capacity | source | source_type | capacity_definition | type | weblink | |
---|---|---|---|---|---|---|---|---|---|
0 | AT | 2014 | Biomass and biogas | 302.685 | ENTSO-E Power Statistics | Other association | Net capacity | Installed capacity in MW | https://www.entsoe.eu/data/power-stats/net-gen... |
1 | AT | 2015 | Biomass and biogas | 298.000 | ENTSO-E Power Statistics | Other association | Net capacity | Installed capacity in MW | https://www.entsoe.eu/data/power-stats/net-gen... |
2 | AT | 2016 | Biomass and biogas | 286.000 | ENTSO-E Power Statistics | Other association | Net capacity | Installed capacity in MW | https://www.entsoe.eu/data/power-stats/net-gen... |
3 | AT | 2017 | Biomass and biogas | 286.000 | ENTSO-E Power Statistics | Other association | Net capacity | Installed capacity in MW | https://www.entsoe.eu/data/power-stats/net-gen... |
4 | AT | 2018 | Biomass and biogas | 293.000 | ENTSO-E Power Statistics | Other association | Net capacity | Installed capacity in MW | https://www.entsoe.eu/data/power-stats/net-gen... |
dataframes = [data_opsd, data_eurostat, data_soaf, data_entsoe, data_transparency, data_power_statistics]
data = pd.concat(dataframes, sort=False)
data['comment'] = data['comment'].fillna('').astype(str)
col_order = ['technology', 'source', 'source_type', 'weblink', 'year', 'type',
'country', 'capacity_definition', 'capacity', 'comment']
data = data[col_order]
energy_source_mapping = pd.read_csv(os.path.join('input','energy_source_mapping.csv'),
index_col ='name')
energy_source_mapping.replace({0: False, 1: True}, inplace=True)
data = data.merge(energy_source_mapping,
left_on='technology',
right_index=True,
how='left')
new_level_names = {"Level 0": "energy_source_level_0",
"Level 1": "energy_source_level_1",
"Level 2": "energy_source_level_2",
"Level 3": "energy_source_level_3",
"Technology level": "technology_level"}
data.rename(columns=new_level_names, inplace=True)
data.head()
technology | source | source_type | weblink | year | type | country | capacity_definition | capacity | comment | energy_source_level_0 | energy_source_level_1 | energy_source_level_2 | energy_source_level_3 | technology_level | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Fossil fuels | e-control | Regulatory Authority | https://www.e-control.at/documents/1785851/181... | 2014 | Installed capacity in MW | AT | Gross capacity | 7243.634 | False | True | False | False | False | |
1 | Fossil fuels | e-control | Regulatory Authority | https://www.e-control.at/documents/1785851/181... | 2015 | Installed capacity in MW | AT | Gross capacity | 7059.065 | False | True | False | False | False | |
2 | Fossil fuels | e-control | Regulatory Authority | https://www.e-control.at/documents/1785851/181... | 2016 | Installed capacity in MW | AT | Gross capacity | 7323.000 | False | True | False | False | False | |
3 | Fossil fuels | e-control | Regulatory Authority | https://www.e-control.at/documents/1785851/181... | 2017 | Installed capacity in MW | AT | Gross capacity | 6469.000 | False | True | False | False | False | |
4 | Fossil fuels | e-control | Regulatory Authority | https://www.e-control.at/documents/1785851/181... | 2018 | Installed capacity in MW | AT | Gross capacity | 6492.000 | False | True | False | False | False |
cols = ['technology', 'source', 'source_type', 'weblink','year',
'type', 'country', 'capacity_definition', 'capacity']
data_crosstable = pd.pivot_table(data[cols],
index=['technology'],
columns=['country', 'type', 'year',
'source', 'source_type',
'weblink', 'capacity_definition'],
values='capacity')
# Apply initial ordering of technologies
data_crosstable = data_crosstable.reindex(technology_order)
# Delete index naming
data_crosstable.index.name = None
data_crosstable.columns.names = ('Country (ISO code)',
'Type of data', 'Year',
'Source', 'Type of source',
'Weblink',
'Capacity definition (net, gross, unknown)')
data_crosstable.head()
Country (ISO code) | AL | ... | XK | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Type of data | Installed capacity in MW | ... | Installed capacity in MW | ||||||||||||||||||
Year | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | ... | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
Source | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | ... | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT | EUROSTAT |
Type of source | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | ... | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office | Statistical Office |
Weblink | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | ... | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable | Link unavailable |
Capacity definition (net, gross, unknown) | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | ... | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
Fossil fuels | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 915.0 | 915.0 | 915.0 | 915.0 | 915.0 | 915.0 | 915.0 | 915.0 | 915.0 | 915.0 |
Lignite | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Hard coal | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Oil | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Natural gas | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 1633 columns
energylevels_table = energylevels_raw[6:]
energylevels_table.columns = pd.MultiIndex.from_arrays(energylevels_raw[:6].values,
names=['country', 'type', 'year',
'source', 'source_type',
'capacity_definition'
])
energylevels_table = energylevels_table.reset_index()
energylevels_table['technology'] = energylevels_table['technology'].str.replace('- ', '')
energylevels_table = energylevels_table.set_index('technology')
# Delete index naming
energylevels_table.index.name = None
energylevels_table.columns.names = ('Country (ISO code)',
'Description', None,
None, None,
'Level')
energylevels_table.head()
Country (ISO code) | Energy source levels and technology | ||||
---|---|---|---|---|---|
Description | Total generation capacity | Generation capacity by energy type (fossil, nuclear, renewable, other) | Generation capacity by fuel or energy source (e.g. coal, lignite, hard coal, natural gas, wind) | Generation capacity by fuel or energy source and detailed fuel for bioenergy | Generation capacity by fuel and technology (e.g. combined cycle natural gas, gas turbine, onshore wind, offshore wind) |
NaN | NaN | NaN | NaN | NaN | |
NaN | NaN | NaN | NaN | NaN | |
NaN | NaN | NaN | NaN | NaN | |
Level | NaN | NaN | NaN | NaN | NaN |
NaN | Level 0 | Level 1 | Level 2 | Level 3 | Technology level |
Fossil fuels | 0 | 1 | 0 | 0 | 0 |
Lignite | 0 | 0 | 1 | 1 | 1 |
Hard coal | 0 | 0 | 1 | 1 | 1 |
Oil | 0 | 0 | 1 | 1 | 1 |
Delete downloaded zip files
for root, dirs, files in os.walk("download"):
for file in files:
item = os.path.join(root, file)
if item.endswith(".zip"):
os.remove(item)
print("Deleted: " + item)
Deleted: download\ENTSO-E\SO&AF\2011\SO_AF_2011_-_2025_.zip Deleted: download\ENTSO-E\SO&AF\2012\120705_SOAF_2012_Dataset.zip Deleted: download\ENTSO-E\SO&AF\2013\130403_SOAF_2013-2030_dataset.zip Deleted: download\ENTSO-E\SO&AF\2014\140602_SOAF%202014_dataset.zip Deleted: download\ENTSO-E\SO&AF\2016\SO_AF_2015_dataset.zip
Copy input files
orig_data_path = os.path.join('output', 'original_data')
shutil.rmtree(orig_data_path, ignore_errors=True)
func.copydir(os.path.join('input'), orig_data_path)
func.copydir(os.path.join('download'), orig_data_path)
Write stacked data to formats: csv, xls and sql.
# Write the result to file
data.to_csv(os.path.join('output', 'national_generation_capacity_stacked.csv'),
encoding='utf-8', index_label='ID')
# Write the results to excel file
data.to_excel(os.path.join('output', 'national_generation_capacity_stacked.xlsx'),
sheet_name='output', index_label='ID')
# Write the results to sql database
data.to_sql('national_generation_capacity_stacked',
sqlite3.connect(os.path.join('output',
'national_generation_capacity.sqlite')),
if_exists="replace", index_label='ID')
Write data in human readable form to excel.
# Write crosstable data to excel file
writer = pd.ExcelWriter(os.path.join('output', 'national_generation_capacity.xlsx'))
data_crosstable.to_excel(writer, sheet_name='output')
energylevels_table.to_excel(writer, sheet_name='technology levels')
writer.save()
outputxls = openpyxl.load_workbook(os.path.join('output',
'national_generation_capacity.xlsx'))
ws1 = outputxls['output']
ws2 = outputxls['technology levels']
ws1_rows, ws1_cols = data_crosstable.shape
amount_cols = ws1_cols + 1 # correct 0 index
ws1.column_dimensions['A'].width = 50
ws2.column_dimensions['A'].width = 50
blackfont = Font(color=colors.BLACK, italic=False, bold=False)
blackfontitalic = Font(color=colors.BLACK, italic=True, bold=False)
blackfontbold = Font(color=colors.BLACK, italic=False, bold=True)
align0 = Alignment(horizontal='left', indent=0)
align1 = Alignment(horizontal='left', indent=1)
align2 = Alignment(horizontal='left', indent=2)
# darkest grey
colour = "{0:02X}{1:02X}{2:02X}".format(166, 166, 166)
grey166 = PatternFill(fgColor=colour, bgColor=colour, patternType="solid")
# darker grey
colour = "{0:02X}{1:02X}{2:02X}".format(191, 191, 191)
grey191 = PatternFill(fgColor=colour, bgColor=colour, patternType="solid")
# lighter grey
colour = "{0:02X}{1:02X}{2:02X}".format(217, 217, 217)
grey217 = PatternFill(fgColor=colour, bgColor=colour, patternType="solid")
# lightest grey
colour = "{0:02X}{1:02X}{2:02X}".format(242, 242, 242)
grey242 = PatternFill(fgColor=colour, bgColor=colour, patternType="solid")
for col in range(2, amount_cols+1):
colname = openpyxl.utils.cell.get_column_letter(col)
ws1.column_dimensions[colname].width = 16
for col in range(1, amount_cols+1):
# format column name block
for row in range(2,8):
ws1.cell(row=row, column=col).font = blackfont
# format cells that contain the values
for row in range(9, 48):
ws1.cell(row=row, column=col).fill = grey242
ws1.cell(row=row, column=1).font = blackfontitalic
ws1.cell(row=row, column=1).alignment = align2
# format row 'Total' with dark grey
ws1.cell(row=47, column=col).fill = grey166
ws1.cell(row=47, column=col).font = blackfontbold
# format level 1
for row in [9, 22, 23, 46]:
ws1.cell(row=row, column=col).fill = grey191
ws1.cell(row=row, column=col).font = blackfontbold
ws1.cell(row=row, column=1).alignment = align0
# format level 2
for row in [10, 11, 12, 13, 18, 19, 20, 21, 24, 31, 35, 39, 40, 41, 45]:
ws1.cell(row=row, column=col).fill = grey217
ws1.cell(row=row, column=1).alignment = align1
ws1.cell(row=47, column=1).alignment = align0
ws1.freeze_panes = ws1['B8'] #freeze first column and header rows
# do the same for the second worksheet 'technology levels'
for col in range(1, 7):
colname = get_column_letter(col + 1)
ws2.column_dimensions[colname].width = 25
for row in range(2, 8):
ws2.cell(row=row, column=col).font = blackfont
for row in range(9, 48):
ws2.cell(row=row, column=col).fill = grey242
ws2.cell(row=row, column=1).font = blackfontitalic
ws2.cell(row=row, column=1).alignment = align2
# format row 'Total' with dark grey
ws2.cell(row=46, column=col).fill = grey166
# format level 1
for row in [8, 21, 22, 45]:
ws2.cell(row=row, column=col).fill = grey191
ws2.cell(row=row, column=col).font = blackfontbold
ws2.cell(row=row, column=1).font = blackfontbold
ws2.cell(row=row, column=1).alignment = align0
# format level 2
for row in [9, 10, 11, 12, 17, 18, 19, 20, 23, 30, 34, 38, 39, 40, 44]:
ws2.cell(row=row, column=col).fill = grey217
ws2.cell(row=row, column=1).alignment = align1
ws2.cell(row=46, column=1).alignment = align0
additional_notes = openpyxl.load_workbook(os.path.join('input',
'National_Generation_Capacities.xlsx'))['Additional notes']
# copy additional notes to output file
for col in range(1, 3):
for row in range(1, 10):
add_notes_value = additional_notes.cell(row=row, column=col).value
ws1.cell(row=row + 50, column=col).value = add_notes_value
ws1.cell(row=51, column=1).font = blackfontbold
ws1.cell(row=row + 51, column=1).font = blackfontitalic
outputxls.save(os.path.join('output', 'national_generation_capacity.xlsx'))
files = ['national_generation_capacity.xlsx',
'national_generation_capacity_stacked.csv',
'national_generation_capacity_stacked.xlsx',
'national_generation_capacity.sqlite']
hash_dict = {}
filesize_dict = {}
with open('checksums.txt', 'w') as f:
for file_name in files:
path = os.path.join('output', file_name)
file_hash = func.get_sha_hash(path)
hash_dict[file_name] = file_hash
filesize_dict[file_name] = os.path.getsize(path)
f.write('{},{}\n'.format(file_name, file_hash))
We document the data packages meta data in the specific format JSON as proposed by the Open Knowledge Foundation. See the Frictionless Data project by OKFN (http://data.okfn.org/) and the Data Package specifications (http://dataprotocols.org/data-packages/) for more details.
In order to keep the notebook more readable, we first formulate the metadata in the human-readable YAML format using a multi-line string. We then parse the string into a Python dictionary and save that to disk as a JSON file.
with open(os.path.join('input', 'metadata.yml'), 'r') as f:
metadata = yaml.load(f.read(), Loader=yaml.BaseLoader)
metadata['resources'][0]['hash'] = hash_dict['national_generation_capacity.xlsx']
metadata['resources'][1]['hash'] = hash_dict['national_generation_capacity_stacked.csv']
metadata['resources'][0]['bytes'] = filesize_dict['national_generation_capacity.xlsx']
metadata['resources'][1]['bytes'] = filesize_dict['national_generation_capacity_stacked.csv']
datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))
# Write the information of the metadata
with open(os.path.join('output', 'datapackage.json'), 'w') as f:
f.write(datapackage_json)
End of script.