Import of Python modules needed to process the data and creation of required output folders.
import numpy as np
import pandas as pd
import os
import yaml
import json
import sqlite3
import hashlib
from download_and_process_functions import get_sha_hash
from download_and_process_functions import add_location_and_EIC
# create output folder if it does not exist
os.makedirs(os.path.join('output'), exist_ok=True)
# set data & input directory
data_directory = os.path.join('input','data')
locations_directory = os.path.join('input', 'locations')
Unlike previous releases of this package, where the data was partially downloaded within this script, all data in the current release is pre-downloaded and provided. The following lists all relevant data sources.
meta_data = """
BE:
  filename: ProductionParkOverview.xls
  source: https://www.elia.be/en/grid-data/power-generation/generating-facilities#
  source_file: https://griddata.elia.be/eliabecontrols.prod/interface/fdn/download/generatingfacilities/xls
  filetype: xls
  date_of_access: Feb 2020
  manually_assembled: no
  provider: Elia
  institution: TSO
NL:
  filename: export.csv
  source: https://www.tennet.org/english/operational_management/export_data.aspx
  source_file: www.tennet.org/english/operational_management/export_data.aspx?exporttype=installedcapacity&format=csv&quarter=2019-4&submit=3
  filetype: csv
  date_of_access: Feb 2020
  manually_assembled: no
  provider: Tennet
  institution: TSO
FR:
  filename: Production_Capacities.csv
  source: https://www.services-rte.com/en/view-data-published-by-rte/production-installed-capacity.html
  source_file: NA
  filetype: csv
  date_of_access: Feb 2020
  manually_assembled: no
  provider: RTE
  institution: TSO
PL:
  filename: units_list_2019_11_29_PL.csv
  source: http://gpi.tge.pl/en/wykaz-jednostek
  source_file: http://gpi.tge.pl/en/wykaz-jednostek
  date_of_access: Dec 2019
  manually_assembled: no
  provider: GPI Power Market Data
  institution: Information platform
CZ:
  filename: 21915_2019.pdf
  source: https://www.ceps.cz/cs/priprava-provozu
  source_file: https://www.ceps.cz/cs/priprava-provozu
  date_of_access: Feb 2020
  manually_assembled: no
  provider: Ceps
  institution: TSO
CH:
  filename: 2018 Statistik der Wasserkraftanlagen der Schweiz 31.12.2018.csv
  source: https://www.bfe.admin.ch/bfe/de/home/versorgung/statistik-und-geodaten/geoinformation/geodaten/wasser/statistik-der-wasserkraftanlagen.html
  source_file: https://www.bfe.admin.ch/bfe/de/home/versorgung/statistik-und-geodaten/geoinformation/geodaten/wasser/statistik-der-wasserkraftanlagen.html
  date_of_access: Dec 2019
  manually_assembled: no
  provider: Swiss Federal Office of Energy
  institution: Federal Administration
IT:
  filename: 18.xlsx
  source: http://www2018.terna.it/it-it/sistemaelettrico/transparencyreport/generation/installedgenerationcapacity.aspx
  source_file: http://download.terna.it/terna/0000/0216/18.XLSX
  date_of_access: Feb 2020
  manually_assembled: no
  provider: Terna
  institution: TSO
FI:
  filename: Energiaviraston voimalaitosrekisteri.csv
  source: https://energiavirasto.fi/toimitusvarmuus
  source_file: https://energiavirasto.fi/toimitusvarmuus
  date_of_access: Feb 2020
  manually_assembled: no
  provider: Energiavirasto
  institution: Energy agency
ES:
  filename: Registro_16_12_2019.csv
  source: https://sede.minetur.gob.es/en-US/datosabiertos/catalogo/registro-productores-electrica
  source_file: https://sede.minetur.gob.es/en-US/datosabiertos/catalogo/registro-productores-electrica
  date_of_access: Dec 2019
  manually_assembled: no
  provider:
  institution:
UK:
  filename: DUKES_5.11_UK.csv
  source: https://www.gov.uk/government/statistics/electricity-chapter-5-digest-of-united-kingdom-energy-statistics-dukes#content
  source_file: https://www.gov.uk/government/statistics/electricity-chapter-5-digest-of-united-kingdom-energy-statistics-dukes#content
  date_of_access: Dec 2019
  manually_assembled: no
  provider: UK gov
  institution: UK statistics
NO:
  filename_thermal: termiske-kraftverk-i-norge-2019.xlsx
  filename_hydro: Vannkraftverk.csv
  source: https://www.nve.no/
  source_file_thermal: https://www.nve.no/media/8967/termiske-kraftverk-i-norge-2019.xlsx
  source_file_hydro: https://www.nve.no/energiforsyning/kraftproduksjon/vannkraft/vannkraftdatabase/#
  date_of_access: Feb 2020
  manually_assembled: no
  provider: Norwegian Water Resources and Energy Directorate
  institution: Ministry
SE:
  filename: input_plant-list_SE.csv
  source: https://www.nordpoolgroup.com/
  source_file: NA
  date_of_access: 2014
  manually_assembled: no
  provider: Nordpool Group
  institution: Market operator
SK:
  filename: input_plant-list_SK.csv
  source: https://www.seas.sk/thermal-power-plants
  source_file: https://www.seas.sk/thermal-power-plants
  date_of_access: Feb 2020
  manually_assembled: yes
  provider: Slovenské elektrárne
  institution: joint-stock company
SI:
  filename: input_plant-list_SI.csv
  source: multiples (in document)
  source_file:
  date_of_access: Dec 2019
  manually_assembled: yes
  provider: multiples (in document)
  institution: Private company
AT:
  filename_thermal: input_plant-list_AT_thermal.csv
  filename_hydro: input_plant-list_AT_hydro.csv
  source: multiples (in document)
  source_file:
  date_of_access: Feb 2020
  manually_assembled: yes
  provider: multiples (in document)
  institution: multiples (in document)
DK:
  filename: input_plant-list_DK.csv
  source: multiples (in document)
  source_file:
  date_of_access: Jan 2020
  manually_assembled: yes
  provider: multiples (in document)
  institution: multiples (in document)
"""
# Conversion from YAML to a Python dict (if needed)
# meta_data = yaml.load(meta_data, Loader=yaml.BaseLoader)
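As a sketch of that conversion: `yaml.load` with `BaseLoader` turns the block into nested dictionaries keyed by country code, with every value kept as a plain string. The excerpt below is a shortened, hypothetical two-entry version of the block above.

```python
import yaml

# Shortened, hypothetical excerpt of the meta_data block
meta_yaml = """
BE:
  filename: ProductionParkOverview.xls
  date_of_access: Feb 2020
NL:
  filename: export.csv
  date_of_access: Feb 2020
"""
# BaseLoader performs no implicit typing: every scalar stays a string
meta = yaml.load(meta_yaml, Loader=yaml.BaseLoader)
print(meta['BE']['filename'])        # ProductionParkOverview.xls
print(meta['NL']['date_of_access'])  # Feb 2020
```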
To provide a standardized set of power plant information across all national data sources, a set of required columns is defined and subsequently filled with the available data. The following columns and their structure are the basis for all national data sources.
Note: If information for a specific column is not available, the data entry is left empty. Conversely, if a national data source provides information beyond what the scheme requires, that information is not processed.
columns_sorted = ['name',
'company',
'street',
'postcode',
'city',
'country',
'capacity',
'energy_source',
'technology',
'chp',
'commissioned',
'type',
'lat',
'lon',
'eic_code',
'additional_info',
'comment',
'source']
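The effect of applying this template via `reindex(columns=columns_sorted)`, as done for each country below, can be illustrated with a hypothetical raw data set and a shortened column list: required-but-missing columns are created empty, and columns outside the scheme are dropped.

```python
import pandas as pd

columns_sorted = ['name', 'capacity', 'energy_source']  # shortened for illustration

# Hypothetical raw national data: 'energy_source' is missing,
# 'operator_phone' is not part of the scheme
raw = pd.DataFrame({'name': ['Plant A'],
                    'capacity': [120.5],
                    'operator_phone': ['000']})

std = raw.reindex(columns=columns_sorted)
print(list(std.columns))                  # ['name', 'capacity', 'energy_source']
print(std['energy_source'].isna().all())  # True: missing information stays empty
```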
The data is provided by the Belgian transmission network operator ELIA. It encompasses a detailed list of Belgian generation units with comprehensive information on technologies and energy fuels.
filepath_BE = os.path.join(data_directory, 'BE','ProductionParkOverview.xls')
data_BE = pd.read_excel(filepath_BE,
sheet_name='ProductionParkOverview',
skiprows=1)
data_BE.head()
The imported data is standardized with respect to the columns defined in section 2.3. First, existing output-relevant columns are renamed; the remaining columns are then deleted. Columns that do not exist in the data set but are required for the output are added in this process.
# Translate columns
dict_columns_BE = {'ARP': 'company',
'Generation plant': 'name',
'Plant Type': 'technology',
'Technical Nominal Power (MW)': 'capacity',
'Remarks': 'comment',
'Fuel': 'energy_source',
'Country': 'country',
'Source': 'source'}
# Apply general template of columns
data_BE = data_BE.rename(columns=dict_columns_BE).reindex(columns=columns_sorted)
# Drop rows without capacity entries, so that row with
# "Unit connected to Distribution Grid" is dropped
data_BE.dropna(subset=['capacity'], inplace=True)
# Adjust types of entries in all columns
data_BE.capacity = data_BE.capacity.astype(float)
The generation type provides information on the 'usage' of the power plant (besides electricity generation), i.e. whether the plant is an industrial power plant or provides thermal heat for district heating.
The Belgian data source provides only general information on heat supply (here: WKK). Due to this general information, we classify corresponding plants as both industrial and combined heat and power plants, since the two types cannot be distinguished.
# Generate entries in column "type" according to technology "WKK"
data_BE.loc[data_BE['technology'] == 'WKK', 'type'] = 'CHP/IPP'
data_BE.loc[data_BE['name'].str.contains('WKK'), 'type'] = 'CHP/IPP'
# Generate entries in column "CHP" according to column "type"
data_BE.loc[(data_BE['type'] == 'CHP') |
(data_BE['type'] == 'IPP') |
(data_BE['type'] == 'CHP/IPP'), 'chp'] = 'Yes'
Overall translation of all technology types mentioned in the column "technology".
# Translate technologies
dict_technology_BE = {'GT': 'Gas turbine',
'BG': np.nan,
'CL': 'Steam turbine',
'WKK': np.nan,
'CCGT': 'Combined cycle',
'D': np.nan,
'HU': np.nan,
'IS': np.nan,
'NU': 'Steam turbine',
'TJ': 'Gas turbine',
'WT': np.nan,
' ': np.nan,
'nan': np.nan,
}
data_BE["technology"].replace(dict_technology_BE, inplace=True)
# add technology parameter for steam and gas turbines
data_BE.loc[data_BE['name'].str.contains('ST') &
data_BE['technology'].isna(), 'technology'] = 'Steam turbine'
data_BE.loc[data_BE['name'].str.contains('GT') &
data_BE['technology'].isna(), 'technology'] = 'Gas turbine'
data_BE.head()
Overall translation of all energy source types mentioned in the column "energy_source" and subsequent translation check. Deletion of rows containing "Wind" as energy source.
# Translate energy sources
dict_energysources_BE = {'BIO': 'Biomass and biogas',
'BF': 'Other fossil fuels',
'CL': 'Lignite',
'CP': 'Hard coal',
'CG': 'Other fossil fuels',
'GO': 'Oil',
'LF': 'Oil',
'LV': 'Oil',
'CP/BF': 'Mixed fossil fuels',
'CP/CG': 'Mixed fossil fuels',
'FA/BF': 'Mixed fossil fuels',
'NG/BF': 'Mixed fossil fuels',
'NG': 'Natural gas',
'NU': 'Nuclear',
'WR': 'Non-renewable waste',
'WA': 'Hydro',
'WI': 'Wind',
'WP': 'Biomass and biogas'}
data_BE["energy_source"].replace(dict_energysources_BE, inplace=True)
data_BE["energy_source"].replace('NaN', np.nan, inplace=True)
# Delete unwanted energy source
data_BE = data_BE[data_BE.energy_source != 'Wind']
In this section a manually compiled list is used to define the geographic coordinates of individual power plants.
data_BE = add_location_and_EIC('BE', data_BE)
# add source
data_BE["source"] = "https://www.elia.be/en/grid-data/power-generation/generating-facilities#"
data_BE.head()
The data is provided by the Dutch transmission network operator TenneT. It encompasses the daily available generation capacity, i.e. a list of Dutch generation units that are operational on a specific day. The data is downloaded for all four quarters in 2018.
Import of quarterly data
filepath_NL = os.path.join(data_directory, 'NL', 'export.csv')
data_NL = pd.read_csv(filepath_NL, encoding='utf-8')
data_NL.head()
The imported data is standardized with respect to the columns defined in section 2.3. First, existing output-relevant columns are renamed; the remaining columns are then deleted. Columns that do not exist in the data set but are required for the output are added in this process.
# Merge columns "street" and "Number" to one column called "Street"
data_NL['street'] = data_NL[['street', 'Number']].apply(
lambda x: '{} {}'.format(x[0], x[1]), axis=1)
# Drop columns not needed anymore
colsToDrop = ['Location', 'Date', 'Number']
data_NL = data_NL.drop(colsToDrop, axis=1)
# Rename columns
dict_columns_NL = {'Connected body': 'company',
'Entity': 'name',
'Fuel': 'energy_source',
'Capacity': 'capacity',
'zipcode': 'postcode',
'place-name': 'city'}
data_NL.rename(columns=dict_columns_NL, inplace=True)
# Adjust types of entries in all columns
data_NL.capacity = data_NL.capacity.astype(float)
Overall translation of all energy source types mentioned in the column "energy_source". Generation of entries for the column "technology" according to information given in the column "energy_source" by TenneT.
# Rename types of energy sources
dict_energysources_NL = {'E01': 'Solar',
'E02': 'Wind',
'E03': 'Hydro',
'E04': 'Biomass and biogas',
'E05': 'Hard coal',
'E06': 'Natural gas',
'E07': 'Oil',
'E08': 'Nuclear',
'E09': 'Other or unspecified energy sources'}
data_NL["energy_source"].replace(dict_energysources_NL, inplace=True)
# Generate technology entry according to energy source
data_NL.loc[data_NL['energy_source'] == 'Nuclear',
'technology'] = 'Steam turbine'
data_NL.loc[data_NL['energy_source'] == 'Hard coal',
'technology'] = 'Steam turbine'
# Delete unwanted energy sources in column "energy_source"
data_NL = data_NL[data_NL.energy_source != 'Solar']
data_NL = data_NL[data_NL.energy_source != 'Wind']
We estimate the installed capacity by the highest available daily capacity for each unit.
# Filter rows by considering "name" and maximum "capacity"
data_NL = data_NL.sort_values(
'capacity', ascending=False).groupby('name', as_index=False).first()
# Apply general template of columns
data_NL = data_NL.reindex(columns=columns_sorted)
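The sort/groupby step above can be illustrated with hypothetical daily snapshots: sorting by capacity in descending order and keeping the first row per name retains each unit's maximum reported capacity.

```python
import pandas as pd

# Hypothetical daily capacity snapshots for two units
daily = pd.DataFrame({'name': ['Unit X', 'Unit X', 'Unit Y'],
                      'capacity': [400.0, 430.0, 120.0]})

# After sorting descending, .first() per group picks the max-capacity row
peak = (daily.sort_values('capacity', ascending=False)
             .groupby('name', as_index=False).first())
print(peak)  # Unit X -> 430.0, Unit Y -> 120.0
```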
In this section a manually compiled list is used to define the geographic coordinates of individual power plants.
data_NL = add_location_and_EIC('NL', data_NL)
# add source
data_NL["source"] = "https://www.tennet.org/english/operational_management/export_data.aspx"
data_NL.head()
The data is provided by the French transmission network operator RTE. It encompasses a detailed list of French generation units with a capacity of more than 100 MW.
filepath_FR = os.path.join(data_directory, 'FR', 'Production_Capacities.csv')
data_FR = pd.read_csv(filepath_FR)
data_FR.head()
The imported data is standardized with respect to the columns defined in section 2.3. First, existing output-relevant columns are renamed; the remaining columns are then deleted. Columns that do not exist in the data set but are required for the output are added in this process.
# Translate columns
dict_columns_FR = {'Type': 'energy_source',
'Name': 'name',
'Installed capacity (MW)': 'capacity',
'Start date of the current version': 'commissioned',
'Location': 'country'
}
data_FR.rename(columns=dict_columns_FR, inplace=True)
# Apply general template of columns
data_FR = data_FR.reindex(columns=columns_sorted)
# Delete placeholder datetime
data_FR["commissioned"].replace('01/01/2000', np.nan, inplace=True)
# Map commissioned year to Timestamp col
data_FR['commissioned_year'] = pd.to_datetime(data_FR['commissioned'], format='%d/%m/%Y')
# Reassign commissioned col with year only
mask = data_FR['commissioned_year'].notna()
data_FR.loc[mask, 'commissioned'] = data_FR.loc[mask].commissioned_year.apply(lambda x: x.year)
# Drop the no longer needed helper col
data_FR.drop('commissioned_year', axis=1, inplace=True)
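An equivalent, slightly more compact way (an alternative sketch, not the script's code) extracts the year with the `.dt.year` accessor, which keeps missing dates as NaN:

```python
import pandas as pd

# Hypothetical commissioning dates in the French day/month/year format
s = pd.Series(['15/06/1985', None, '01/03/2012'])
years = pd.to_datetime(s, format='%d/%m/%Y').dt.year
print(years.tolist())  # [1985.0, nan, 2012.0]
```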
# Adjust types of entries in all columns
data_FR.capacity = data_FR.capacity.astype(float)
Generation of entries for technologies. Overall translation of all energy source types mentioned in the column "energy_source" and subsequent translation check.
# Generate technology entries according to energy sources
data_FR.loc[data_FR['energy_source'] == 'Pumping',
'technology'] = 'Pumped storage'
data_FR.loc[data_FR['energy_source'] == 'Hydraulic over water / guided through',
'technology'] = 'Run-of-river'
data_FR.loc[data_FR['energy_source'] == 'Hydraulic lakes',
'technology'] = 'Reservoir'
data_FR.loc[data_FR['energy_source'] == 'Nuclear',
'technology'] = 'Steam turbine'
data_FR.loc[data_FR['energy_source'] == 'Hard coal',
'technology'] = 'Steam turbine'
# Translate types of energy sources
dict_energysources_FR = {'Other': 'Other or unspecified energy sources',
'Gas': 'Natural gas',
'Pumping': 'Hydro',
'Hydraulic over water / guided through': 'Hydro',
'Hydraulic lakes': 'Hydro',
'Biomass': 'Biomass and biogas'}
data_FR["energy_source"].replace(dict_energysources_FR, inplace=True)
# Delete unwanted energy sources in column "energy_source"
data_FR = data_FR[data_FR.energy_source != 'Wind']
data_FR = data_FR[data_FR.energy_source != 'Solar']
data_FR = data_FR[data_FR.energy_source != 'Marine']
In this section a manually compiled list is used to define the geographic coordinates of individual power plants.
data_FR = add_location_and_EIC('FR', data_FR)
# add source
data_FR["source"] = "https://www.services-rte.com/en/view-data-published-by-rte/production-installed-capacity.html"
data_FR.head()
The data is provided by the Polish Power Exchange GPI. It encompasses a detailed list of large Polish generation units with information on the single power plant blocks.
filepath_PL = os.path.join(data_directory, 'PL', 'units_list_2019_11_29_PL.csv')
data_PL = pd.read_csv(filepath_PL, sep=';')
data_PL.head()
The imported data is standardized with respect to the columns defined in section 2.3. First, existing output-relevant columns are renamed; the remaining columns are then deleted. Columns that do not exist in the data set but are required for the output are added in this process.
# Rename columns
dict_columns_PL = {'Generating unit name': 'name',
'Comments': 'comment',
'Available capacity [MW]': 'capacity',
'Basic fuel': 'energy_source',
'Country': 'country',
'Source': 'source'}
# Rename first column
data_PL.columns.values[0] = 'company'
data_PL = data_PL.rename(columns=dict_columns_PL)
# Fill columns "energy_source" and "company" with the belonging entries
cols = ['energy_source', 'company']
data_PL[cols] = data_PL[cols].ffill()
# Delete empty and therefore unwanted rows by referring to column "Generating unit code"
data_PL = data_PL.dropna(subset=['Generating unit code'])
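The forward-fill step mirrors the structure of spreadsheet exports where merged cells leave the company and fuel only on the first row of each block. A minimal sketch with hypothetical entries:

```python
import pandas as pd
import numpy as np

# Hypothetical block-structured export: company and fuel appear
# only on the first row of each plant block
df = pd.DataFrame({'company': ['Some Utility', np.nan, np.nan],
                   'energy_source': ['Hard coal', np.nan, np.nan],
                   'name': ['Block 1', 'Block 2', 'Block 3']})

# Propagate the block-level entries down to the unit rows
df[['company', 'energy_source']] = df[['company', 'energy_source']].ffill()
print(df['company'].tolist())  # ['Some Utility', 'Some Utility', 'Some Utility']
```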
# Apply general template of columns
data_PL = data_PL.reindex(columns=columns_sorted)
# Adjust types of entries in all columns
data_PL.capacity = data_PL.capacity.astype(float)
Overall translation of all energy source types mentioned in the column "energy_source".
# Rename energy sources types
dict_energysources_PL = {'Brown coal': 'Lignite',
'Black coal': 'Hard coal',
'Water': 'Hydro',
'Natural gas': 'Natural gas',
}
data_PL["energy_source"].replace(dict_energysources_PL, inplace=True)
Generation of entries for the column "technology" according to information given in the column "energy_source".
# Generate entries in column "technology" according to energy source "hydro"
data_PL.loc[data_PL['energy_source'] == 'Hydro', 'technology'] = 'Pumped storage'
In this section a manually compiled list is used to define the geographic coordinates of individual power plants.
data_PL = add_location_and_EIC('PL', data_PL)
# add source
data_PL["source"] = "http://gpi.tge.pl/en/wykaz-jednostek"
data_PL.head()
The data is provided by the Czech transmission network operator CEPS. It encompasses the daily available capacity reported by the transmission system operator.
filepath_CZ = os.path.join(data_directory, 'CZ', '21915_2019.csv')
data_CZ = pd.read_csv(filepath_CZ, encoding='utf-8')
data_CZ.head()
The imported data is standardized with respect to the columns defined in section 2.3. First, existing output-relevant columns are renamed; the remaining columns are then deleted. Columns that do not exist in the data set but are required for the output are added in this process.
# Remove white space from names
data_CZ['název elektrárny'] = data_CZ['název elektrárny'].str.strip()
data_CZ['oznacení bloku'] = data_CZ['oznacení bloku'].str.strip()
data_CZ['Typ'] = data_CZ['Typ'].str.strip()
data_CZ['Palivo'] = data_CZ['Palivo'].str.strip()
# Insert dummy G1 where plant block is NA
data_CZ.loc[data_CZ['oznacení bloku'].isna(), 'oznacení bloku'] = 'G1'
# Merge name and block to one column called "name"
data_CZ['name'] = data_CZ['název elektrárny'].map(str) + ' ' + data_CZ['oznacení bloku']
# Rename columns
dict_columns_CZ = {'Typ': 'technology',
'Palivo': 'energy_source',
'výkon instalovaný (MW)': 'capacity'
}
data_CZ.rename(columns=dict_columns_CZ, inplace=True)
# Apply general template of columns
data_CZ = data_CZ.reindex(columns=columns_sorted)
# Adjust types of entries in all columns
data_CZ.capacity = data_CZ.capacity.astype(float)
Overall translation of all energy source and technology types mentioned in the columns "energy_source" and "technology".
# Translate energy source
dict_energy_source_CZ = {'VODA': 'Hydro',
'PLYN': 'Natural gas',
'OLEJ': 'Oil',
'URAN': 'Nuclear',
'HU': 'Lignite',
'CU': 'Hard coal',
'BIO': 'Bioenergy'}
data_CZ["energy_source"].replace(dict_energy_source_CZ, inplace=True)
# Translate technologies
dict_technologies_CZ = {'PE': 'Steam turbine',
'PPE': 'Combined cycle',
'PSE': 'Combined cycle',
'JE': 'Steam turbine',
'VE': np.nan,
'PVE': 'Pumped storage'}
data_CZ["technology"].replace(dict_technologies_CZ, inplace=True)
In this section a manually compiled list is used to define the geographic coordinates of individual power plants.
data_CZ = add_location_and_EIC('CZ', data_CZ)
# add source
data_CZ["source"] = "https://www.ceps.cz/cs/priprava-provozu"
data_CZ.head()
The data is provided by the Swiss Federal Office of Energy (BFE). It encompasses a detailed list of Swiss hydro generation units with comprehensive information on technical specifications. The list of nuclear generators is manually assembled and provided separately.
filepath_CH_hydro = os.path.join(data_directory, 'CH', '2018 Statistik der Wasserkraftanlagen der Schweiz 31.12.2018.csv')
data_CH = pd.read_csv(filepath_CH_hydro, error_bad_lines=False, sep=';',decimal=',')
filepath_CH_nuclear = os.path.join(data_directory, 'CH', 'input_plant-list_CH_nuclear.csv')
data_nuclear_CH = pd.read_csv(filepath_CH_nuclear, encoding='utf-8', header=0, index_col=None)
In this section, the imported list of hydro generators is standardized.
Consolidation of columns
# Merge columns "ZE-Erste Inbetriebnahme" and "ZE-Letzte Inbetriebnahme" to one column called "Commissioned"
data_CH['commissioned'] = data_CH[
['ZE-Erste-Inbetriebnahme', 'ZE-Letzte-Inbetriebnahme']].apply(
lambda x: max(x[0], x[1]), axis=1)
# Merge columns "Bemerkung (1) - (10)" to one column "Comment"
data_CH['comment'] = data_CH[['Bemerkung (1)',
'Bemerkung (2)',
'Bemerkung (3)',
'Bemerkung (4)',
'Bemerkung (5)',
'Bemerkung (6)',
'Bemerkung (7)',
'Bemerkung (8)',
'Bemerkung (9)',
'Bemerkung (10)']].apply(
lambda x:
'{}; {}; {}; {}; {}; {}; {}; {}; {}; {}'.format(
x[0],
x[1],
x[2],
x[3],
x[4],
x[5],
x[6],
x[7],
x[8],
x[9]), axis=1)
data_CH['comment'] = data_CH['comment'].str.replace('nan; ', '')
data_CH['comment'] = data_CH['comment'].str.replace('nan', '')
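The ten-column concatenation followed by the 'nan; ' clean-up can also be written as a per-row join of the non-null remarks. A sketch with hypothetical data (same separator; rows without remarks yield an empty string):

```python
import pandas as pd
import numpy as np

# Hypothetical remark columns in the style of "Bemerkung (1)" .. "Bemerkung (10)"
remarks = pd.DataFrame({'Bemerkung (1)': ['built 1932', np.nan],
                        'Bemerkung (2)': [np.nan, np.nan],
                        'Bemerkung (3)': ['renovated 2005', np.nan]})

# Join only the non-null remarks in each row
comment = remarks.apply(lambda row: '; '.join(row.dropna().astype(str)), axis=1)
print(comment.tolist())  # ['built 1932; renovated 2005', '']
```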
Translation and harmonization of columns
The imported data is standardized with respect to the columns defined in section 2.3. First, existing output-relevant columns are renamed; the remaining columns are then deleted. Columns that do not exist in the data set but are required for the output are added in this process.
# Translate columns
dict_columns_CH = {'WKA-Name': 'name',
'ZE-Standort': 'city',
'WKA-Typ': 'technology',
'ZE-Status': 'availability',
'Inst. Turbinenleistung': 'capacity'}
data_CH.rename(columns=dict_columns_CH, inplace=True)
# Adjust type of entries in column "Capacity"
data_CH.capacity = data_CH.capacity.astype(float)
# Adjust availabilities
dict_availabilities_CH = {'im Normalbetrieb': '1',
'im Bau': '0',
'im Umbau': '0',
'stillgelegt': '0'}
data_CH["availability"].replace(dict_availabilities_CH, inplace=True)
# List only operating plants
data_CH = data_CH[data_CH.availability != '0']
# Apply general template of columns
data_CH = data_CH.reindex(columns=columns_sorted)
# Set energy source to "hydro"
data_CH['energy_source'] = 'Hydro'
# Adjust technologies
dict_technologies_CH = {'L': 'Run-of-river',
'S': 'Reservoir',
'P': 'Pumped storage with natural inflow',
'U': 'Pumped storage'}
data_CH["technology"].replace(dict_technologies_CH, inplace=True)
# add source for hydro
data_CH["source"] = "https://www.bfe.admin.ch/bfe/de/home/versorgung/statistik-und-geodaten/geoinformation/geodaten/wasser/statistik-der-wasserkraftanlagen.html"
# Concat dataframes
data_CH = pd.concat([data_CH, data_nuclear_CH], ignore_index=True, sort=False)
data_CH.head()
In this section a manually compiled list is used to define the geographic coordinates of individual hydro power plants.
data_CH = add_location_and_EIC('CH', data_CH)
data_CH.head()
The data is provided by the Italian transmission network operator TERNA. It encompasses a detailed list of Italian generation units of more than 100 MW.
filepath_IT = os.path.join(data_directory, 'IT', '18.XLSX')
data_IT = pd.read_excel(filepath_IT, sheet_name='UPR PmaxOver 100MW')
data_IT.head()
The imported data is standardized with respect to the columns defined in section 2.3. First, existing output-relevant columns are renamed; the remaining columns are then deleted. Columns that do not exist in the data set but are required for the output are added in this process.
# Translate columns
dict_columns_IT = {'Descrizione Impianto': 'name',
'TIPOLOGIA': 'energy_source',
'Comune': 'city',
'PMAX [MW]': 'capacity',
'Country': 'country',
'Source': 'source',
'Zona': 'additional_info'}
data_IT.rename(columns=dict_columns_IT, inplace=True)
# Apply general template of columns
data_IT = data_IT.reindex(columns=columns_sorted)
# Keep geographical information in column "additional_info"
data_IT['additional_info'] = data_IT[['additional_info']].apply(
lambda x: 'Zone: {}'.format(x[0]), axis=1)
# Adjust types of entries in all columns
data_IT.capacity = data_IT.capacity.astype(float)
Overall translation of all energy source types mentioned in the column "energy_source". Deletion of rows containing "Wind" and "Geothermal" as energy source.
# Translate types of energy sources
dict_energysources_IT = {'GEOTERMICO': 'Geothermal',
'TERMOELETTRICO': 'Mixed fossil fuels',
'IDROELETTRICO': 'Hydro',
'EOLICO': 'Wind'}
data_IT["energy_source"].replace(dict_energysources_IT, inplace=True)
# Delete unwanted energy sources in column "energy_source"
data_IT = data_IT[data_IT.energy_source != 'Wind']
data_IT = data_IT[data_IT.energy_source != 'Geothermal']
In this section a manually compiled list is used to define the geographic coordinates of individual power plants.
data_IT = add_location_and_EIC('IT', data_IT)
# add source
data_IT["source"] = "http://www2018.terna.it/it-it/sistemaelettrico/transparencyreport/generation/installedgenerationcapacity.aspx"
data_IT.head()
The data is provided by the Finnish Energy Authority. It encompasses a detailed list of Finnish generation units of at least one megavolt-ampere (1 MVA).
filepath_FI = os.path.join(data_directory, 'FI', 'Energiaviraston voimalaitosrekisteri.csv')
data_FI = pd.read_csv(filepath_FI, sep=';')
data_FI.head()
The imported data is standardized with respect to the columns defined in section 2.3. First, existing output-relevant columns are renamed; the remaining columns are then deleted. Columns that do not exist in the data set but are required for the output are added in this process.
# Generate entries in column "CHP"
data_FI.loc[data_FI[
'Combined Heat and Power Production, Industry,Maximum, Total, MW'] > 0,
'chp'] = 'Yes'
data_FI.loc[data_FI[
'Combined Heat and Power Production, District Heating, Total, MW'] > 0,
'chp'] = 'Yes'
# Rename columns
dict_columns_FI = {'Name': 'name',
'Company': 'company',
'Type': 'type',
'Address': 'street',
'Town': 'city',
'Postal code': 'postcode',
'Maximum, total, MW': 'capacity',
'Main fuel': 'energy_source',
'Country': 'country',
'Source': 'source'}
data_FI.rename(columns=dict_columns_FI, inplace=True)
# Apply general template of columns
data_FI = data_FI.reindex(columns=columns_sorted)
# Adjust types of entries in all columns
data_FI.capacity = data_FI.capacity.astype(float)
Overall translation of all energy source types mentioned in the column "energy_source". Generation of entries for the column "energy_source" according to information given in the column "type".
# Rename types of energy sources
dict_energysources_FI = {'Biogas': 'Biomass and biogas',
'Black liquor and concentrated liquors': 'Biomass and biogas',
'Blast furnace gas': 'Other fossil fuels',
'By-products from wood processing industry': 'Biomass and biogas',
'Exothermic heat from industry': 'Other or unspecified energy sources',
'Forest fuelwood': 'Biomass and biogas',
'Gasified waste': 'Non-renewable waste',
'Hard coal and anthracite': 'Hard coal',
'Heavy distillates': 'Oil',
'Industrial wood residues': 'Biomass and biogas',
'Light distillates': 'Oil',
'Medium heavy distillates': 'Oil',
'Mixed fuels': 'Mixed fossil fuels',
'Natural gas': 'Natural gas',
'Nuclear energy': 'Nuclear',
'Other by-products and wastes used as fuel': 'Other fossil fuels',
'Other non-specified energy sources': 'Other or unspecified energy sources',
'Peat': 'Biomass and biogas',
' ': 'Other or unspecified energy sources',
np.nan: 'Other or unspecified energy sources'}
data_FI["energy_source"].replace(dict_energysources_FI, inplace=True)
data_FI["energy_source"].replace('NaN', np.nan, inplace=True)
# Generate entries in column "energy_source" for hydro and wind stations according to column "type"
data_FI.loc[data_FI['type'] == 'Hydro power', 'energy_source'] = 'Hydro'
data_FI.loc[data_FI['type'] == 'Wind power', 'energy_source'] = 'Wind'
Generation of entries for the column "technology" according to information given in the column "energy_source". Deletion of rows containing "Wind" as energy source.
# Generate entries in column "technology" according to column "energy_source"
data_FI.loc[data_FI['energy_source'] == 'Nuclear',
'technology'] = 'Steam turbine'
data_FI.loc[data_FI['energy_source'] == 'Hard coal',
'technology'] = 'Steam turbine'
# Delete unwanted energy source (wind) in column "energy_source"
data_FI = data_FI[data_FI.energy_source != 'Wind']
Overall translation of all types mentioned in the column "type" and subsequent translation check.
# Rename types
dict_types_FI = {'District heating CHP': 'CHP',
'Hydro power': 'NaN',
'Industry CHP': 'IPP',
'Nuclear energy': 'NaN',
'Separate electricity production': 'NaN',
'Wind power': 'NaN'}
data_FI["type"].replace(dict_types_FI, inplace=True)
data_FI["type"].replace('NaN', np.nan, inplace=True)
# drop solar generator (redundant)
data_FI = data_FI[data_FI.type != 'Solar']
In this section a manually compiled list is used to define the geographic coordinates of individual power plants.
data_FI = add_location_and_EIC('FI', data_FI)
# add source
data_FI["source"] = "https://energiavirasto.fi/toimitusvarmuus"
data_FI.head()
The data is provided by the Spanish SEDE - Ministry of Industry, Energy and Tourism. It encompasses a detailed list of Spanish generation units with comprehensive information on technologies and energy fuels.
filepath_ES = os.path.join(data_directory, 'ES', 'Registro_16_12_2019.csv')
data_ES = pd.read_csv(filepath_ES, error_bad_lines=False, sep=';',decimal=',')
data_ES.head()
Overall adjustment of all columns within the dataframe: translation, addition, deletion, and sorting of columns, as well as adjustment of the column entries' types. The capacity entries (here: the gross capacity 'Potencia Bruta Total en MW' of the original data set) are converted to numeric values.
# Select value of 'Potencia Instalada en MW' if gross capacity is empty
data_ES['Potencia Bruta Total en MW'] = np.where(data_ES['Potencia Bruta Total en MW'].isnull(),data_ES['Potencia Instalada en MW'],data_ES['Potencia Bruta Total en MW'])
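The `np.where` fallback above selects the installed capacity wherever the gross capacity is missing. A small sketch with hypothetical values shows it is equivalent to `Series.fillna`:

```python
import pandas as pd
import numpy as np

# Hypothetical gross and installed capacities; one gross value is missing
gross = pd.Series([500.0, np.nan, 80.0])
installed = pd.Series([480.0, 300.0, 75.0])

# np.where: take installed capacity where gross capacity is null
via_where = pd.Series(np.where(gross.isnull(), installed, gross))
# fillna does the same element-wise substitution
via_fillna = gross.fillna(installed)

print(via_where.tolist())            # [500.0, 300.0, 80.0]
print(via_where.equals(via_fillna))  # True
```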
# Rename columns
dict_columns_ES = {'Nombre del Titular de la Unidad de Producción': 'company',
'Nombre de la Unidad de Producción': 'name',
'Municipio de la Unidad de Producción': 'city',
'CPostal del Titular': 'postcode',
'Tecnología de la Unidad de Producción': 'technology',
'Comment': 'comment',
'Potencia Bruta Total en MW': 'capacity',
'Tipo de Unidad de Producción': 'energy_source',
'Fecha de la puesta en servicio de la Unidad de Producción': 'commissioned',
'Country': 'country',
'Source': 'source'}
data_ES.rename(columns=dict_columns_ES, inplace=True)
# Convert capacity entries to float (',' acts as thousands and decimal separator)
numeric_capacity = []
for cap in data_ES.capacity:
    if isinstance(cap, float):
        # entry is already numeric
        numeric_capacity.append(cap)
    else:
        split_cap = cap.split(',')
        if len(split_cap) == 1:
            # plain integer, e.g. '123'
            numeric_capacity.append(int(split_cap[0]))
        elif len(split_cap) == 2:
            # decimal comma, e.g. '123,45' -> 123.45
            numeric_capacity.append(int(split_cap[0]) + float('.' + split_cap[1]))
        elif len(split_cap) == 3:
            # thousands and decimal comma, e.g. '1,234,56' -> 1234.56
            numeric_capacity.append(int(split_cap[0])*1000 + int(split_cap[1]) + float('.' + split_cap[2]))
        else:
            numeric_capacity.append(np.nan)
data_ES["capacity"] = numeric_capacity
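The loop above can be condensed into a single helper. The following is a sketch of the same idea, where `parse_es_capacity` is a hypothetical name and the last comma-separated chunk is always treated as the decimal part:

```python
import pandas as pd

def parse_es_capacity(series):
    """Hypothetical helper: parse strings where ',' acts as both
    thousands and decimal separator, e.g. '1,234,56' -> 1234.56."""
    def parse_one(cap):
        if isinstance(cap, float):
            return cap  # entry is already numeric
        parts = str(cap).split(',')
        if len(parts) == 1:
            return float(parts[0])
        # the last chunk is the decimal part; the rest form the integer part
        return float(''.join(parts[:-1]) + '.' + parts[-1])
    return series.apply(parse_one)
```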
# Apply general template of columns
data_ES = data_ES.reindex(columns=columns_sorted)
Overall translation of all energy source types mentioned in the column "energy_source".
dict_energysources_ES = {'Biocombustibles liquidos': 'Biomass and biogas',
'Biogas': 'Biomass and biogas',
'Biogas de digestion': 'Biomass and biogas',
'Biogas de vertedero': 'Biomass and biogas',
'Biomasa industrial agricola': 'Biomass and biogas',
'Biomasa industrial forestal': 'Biomass and biogas',
'Biomasa primaria': 'Biomass and biogas',
'Calor residual': 'Other or unspecified energy sources',
'Carbon': 'Hard coal',
'CARBON IMPORTADO': 'Hard coal',
'Cultivos energeticos agricolas o forestales': 'Biomass and biogas',
'DIESEL': 'Oil',
'Energias residuales': 'Non-renewable waste',
'Fuel': 'Oil',
'FUEL-OIL 0,3': 'Oil',
'FUELOLEO': 'Oil',
'GAS DE REFINERIA': 'Natural gas',
'Gas natural': 'Natural gas',
'GAS NATURAL': 'Natural gas',
'Gas residual': 'Natural gas',
'Gasoleo': 'Oil',
'GASOLEO': 'Oil',
'HULLA+ANTRACITA': 'Hard coal',
'Licores negros': 'Biomass and biogas',
'LIGNITO NEGRO': 'Lignite',
'LIGNITO PARDO': 'Lignite',
'NUCLEAR': 'Nuclear',
'Propano': 'Natural gas',
'Residuo aprovechamiento forestal o selvicola': 'Other bioenergy and renewable waste',
'Residuos': 'Non-renewable waste',
'Residuos actividad agricolas o jardineria': 'Other bioenergy and renewable waste',
'Residuos industriales': 'Non-renewable waste',
'Residuos solidos urbanos': 'Non-renewable waste',
'RESIDUOS SOLIDOS URBANOS': 'Non-renewable waste',
' ': 'Other or unspecified energy sources',
np.nan: 'Other or unspecified energy sources',
'HIDRÁULICA': 'Hydro',
'TERMONUCLEAR': 'Nuclear',
'TÉRMICA': 'Hard coal',
'TÉRMICA CLÁSICA': 'Hard coal'}
data_ES["energy_source"].replace(dict_energysources_ES, inplace=True)
data_ES["energy_source"].replace('NaN', np.nan, inplace=True)
Overall translation of all technology types mentioned in the column "technology".
# Translate technologies
data_ES.loc[data_ES.technology == "COGENERACIÓN", "chp"] = "yes"
data_ES["technology"].replace('COGENERACIÓN','chp', inplace=True)
dict_technologies_ES = {'FLUYENTE': 'Run-of-river',
'EMBALSE': 'Reservoir',
'BOMBEO MIXTO': 'Pumped storage with natural inflow',
'CT CARBÓN': '',
'CN PWR': '',
'CN BWR': '',
'COGENERACIÓN': 'chp',
'Turbinas de Vapor de Fuel': '',
'CICLO COMBINADO': 'Combined cycle',
'Ciclo combinado configuración 2x1': 'Combined cycle',
'RESÍDUOS SÓLIDOS URBANOS': '',
'Turbinas de vapor de Carbón': 'Steam turbine',
'Ciclo combinado configuración 3x1': 'Combined cycle',
'Turbinas de gas aeroderivadas': 'Gas turbine',
'TURBINA DE GAS Y DE VAPOR': 'Gas turbine',
'BOMBEO PURO': 'Pumped storage',
'CT FUELOLEO': '',
'Grupos Diésel - 4T': '',
'MOTORES DIESEL': '',
'Turbinas de gas heavy duty': 'Gas turbine',
'Grupos Diésel - 2T': '',
'BOMBEO+ EOLICA': '',
'TURBINA DE GAS': 'Gas turbine',
'CT FUEL-GAS': '',
}
data_ES.loc[:, "technology"] = data_ES["technology"].replace(dict_technologies_ES)
data_ES.loc[data_ES.technology == "chp", "chp"] = "yes"
Exclude renewable energy sources, delete power stations without names or capacities, and adjust the format of the commissioning year.
# Delete unwanted energy source in column "energy_source" & "technology"
data_ES = data_ES[data_ES.energy_source != 'TERMOELÉCTRICA']
data_ES = data_ES[data_ES.energy_source != 'COGENERACIÓN']
data_ES = data_ES[data_ES.energy_source != 'EXPERIMENTAL']
data_ES = data_ES[data_ES.name != '']
# Delete power stations with no name and no capacities
data_ES = data_ES[data_ES.name.notna()]
# Map commissioned year to Timestamp col
data_ES['commissioned_year'] = pd.to_datetime(data_ES['commissioned'], format='%d.%m.%Y %H:%M')
# Reassign commissioned column with the year only
mask = data_ES['commissioned_year'].notna()
data_ES.loc[mask, 'commissioned'] = data_ES.loc[mask].commissioned_year.apply(lambda x: x.year)
# Drop the helper column
data_ES.drop('commissioned_year', axis=1, inplace=True)
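The same year extraction can be written without a helper column. A sketch on a hypothetical sample, assuming unparseable dates should simply become missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the format of the 'commissioned' column
df = pd.DataFrame({'commissioned': ['01.06.1985 00:00', np.nan]})

# errors='coerce' turns unparseable entries into NaT instead of raising,
# and .dt.year extracts the year directly
df['commissioned'] = pd.to_datetime(df['commissioned'],
                                    format='%d.%m.%Y %H:%M',
                                    errors='coerce').dt.year
```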
In this section, a manually compiled list is used to assign geographic coordinates to individual power plants.
data_ES = add_location_and_EIC('ES', data_ES)
# add source
data_ES["source"] = "https://sede.minetur.gob.es/en-US/datosabiertos/catalogo/registro-productores-electrica"
data_ES.head()
The data is provided by the British government's Statistical Office. It encompasses a detailed list of British generation units with comprehensive information on technologies and energy fuels.
filepath_UK = os.path.join(data_directory, 'UK', 'DUKES_5.11_UK.csv')
data_UK = pd.read_csv(filepath_UK, sep=';')
data_UK.head()
The imported data is standardized with respect to the columns as defined in section 2.3. First, existing output-relevant columns are translated; the remaining columns are then deleted. Columns which do not exist in the data set but are required for the output are added in this process.
# Rename sixth column
data_UK.columns.values[6] = 'Location'
# Rename columns
dict_columns_UK = {'Company Name': 'company',
'Station Name': 'name',
'Installed Capacity (MW)': 'capacity',
'Country': 'country',
'Location': 'location',
'Fuel': 'energy_source',
'Year of commission or year generation began': 'commissioned',
'Source': 'source'}
data_UK.rename(columns=dict_columns_UK, inplace=True)
dict_regions_UK = {'East': 'England',
'East Midlands': 'England',
'London': 'England',
'North East': 'England',
'North West': 'England',
'South East': 'England',
'South West': 'England',
'West Midlands': 'England',
'Yorkshire and the Humber': 'England',
'N Ireland': 'Northern Ireland'}
data_UK["location"] = data_UK["location"].replace(dict_regions_UK)
# Copy the region information from column "Location" into "additional_info"
data_UK['additional_info'] = data_UK[['location']].apply(
lambda x: 'Region: {}'.format(x[0]), axis=1)
# Drop column "Location" after merger
colsToDrop = ['location']
data_UK = data_UK.drop(colsToDrop, axis=1)
# Apply general template of columns
data_UK = data_UK.reindex(columns=columns_sorted)
# Copy the specific territory (column "country") into "additional_info"
data_UK['additional_info'] = data_UK['country']
# Solve comma problem in capacity column and convert to float
data_UK.capacity = data_UK.capacity.str.replace(',', '').astype(float)
Generation of entries for the column "technology" according to information given in the column "energy_source".
# Generate entries in column "technology" according to column "energy_source"
data_UK.loc[data_UK['energy_source'] == 'Hydro / pumped storage',
'technology'] = 'Pumped storage'
data_UK.loc[data_UK['energy_source'] == 'Pumped storage',
'technology'] = 'Pumped storage'
data_UK.loc[data_UK['energy_source'] == 'Wind',
'technology'] = 'Onshore'
data_UK.loc[data_UK['energy_source'] == 'Wind (offshore)',
'technology'] = 'Offshore'
data_UK.loc[data_UK['energy_source'] == 'Nuclear',
'technology'] = 'Steam turbine'
data_UK.loc[data_UK['energy_source'] == 'CCGT',
'technology'] = 'Combined cycle'
data_UK.loc[data_UK['energy_source'] == 'OCGT',
'technology'] = 'Gas turbine'
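The repeated `.loc` assignments above amount to a single fuel-to-technology mapping. A compact sketch of the same idea (`tech_by_fuel` is a hypothetical name; fuels not in the dictionary become NaN):

```python
import pandas as pd

# Toy frame with a few of the UK fuel labels used above
df = pd.DataFrame({'energy_source': ['CCGT', 'OCGT', 'Nuclear', 'Coal']})

tech_by_fuel = {'Hydro / pumped storage': 'Pumped storage',
                'Pumped storage': 'Pumped storage',
                'Wind': 'Onshore',
                'Wind (offshore)': 'Offshore',
                'Nuclear': 'Steam turbine',
                'CCGT': 'Combined cycle',
                'OCGT': 'Gas turbine'}

# Series.map applies the dictionary; unmapped fuels yield NaN
df['technology'] = df['energy_source'].map(tech_by_fuel)
```

Note that `map` overwrites the whole column, so this only matches the `.loc` approach when "technology" starts out empty.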
Overall translation of all energy source types mentioned in the column "energy_source" and a subsequent translation check. Deletion of rows with wind or solar as energy source.
dict_energysources_UK = {'Biomass': 'Biomass and biogas',
'Biomass / gas / waste derived fuel': 'Mixed fossil fuels',
'Natural Gas': 'Natural gas',
'CCGT': 'Natural gas',
'Sour gas': 'Natural gas',
'Coal': 'Hard coal',
'Coal / biomass': 'Mixed fossil fuels',
'Coal / biomass / gas / waste derived fuel': 'Mixed fossil fuels',
'Coal / oil': 'Mixed fossil fuels',
'Coal/oil': 'Mixed fossil fuels',
'Diesel': 'Oil',
'Gas': 'Natural gas',
'Gas / oil': 'Mixed fossil fuels',
'Diesel/gas oil': 'Mixed fossil fuels',
'Gas oil': 'Oil',
'Gas oil / kerosene': 'Oil',
'Hydro': 'Hydro',
'Hydro / pumped storage': 'Hydro',
'Pumped Storage': 'Hydro',
'Light oil': 'Oil',
'Meat & bone meal': 'Other bioenergy and renewable waste',
'Nuclear': 'Nuclear',
'OCGT': 'Natural gas',
'Oil': 'Oil',
'Light oil ': 'Oil',
'Pumped storage': 'Hydro',
'Straw': 'Biomass and biogas',
'Biomass (wood pellets, sunflower/oat husk pellets)': 'Biomass and biogas',
'Biomass (woodchip)': 'Biomass and biogas',
'Biomass (litter, woodchip)': 'Biomass and biogas',
'Biomass (meat and bone meal)': 'Biomass and biogas',
'Biomass (poultry litter, waste wood)': 'Biomass and biogas',
'Biomass (straw)': 'Biomass and biogas',
'Biomass (recycled wood)': 'Biomass and biogas',
'Biomass (poultry litter, woodchip)': 'Biomass and biogas',
'Biomass (wood pellets)': 'Biomass and biogas',
'Waste (municipal solid waste)': 'Non-renewable waste',
'Biomass (recycled wood, virgin wood)': 'Biomass and biogas',
'Biomass (virgin wood)': 'Biomass and biogas',
'Waste': 'Non-renewable waste',
'Waste (anaerobic digestion)': 'Non-renewable waste',
'Wind': 'Wind',
'Wind (offshore)': 'Wind',
'Wind (onshore)': 'Wind',
'Solar': 'Solar'}
data_UK["energy_source"].replace(dict_energysources_UK, inplace=True)
# Delete unwanted energy sources
data_UK = data_UK[data_UK.energy_source != 'Wind']
data_UK = data_UK[data_UK.energy_source != 'Solar']
In this section, a manually compiled list is used to assign geographic coordinates to individual power plants.
data_UK = add_location_and_EIC('UK', data_UK)
# add source
data_UK["source"] = "https://www.gov.uk/government/statistics/electricity-chapter-5-digest-of-united-kingdom-energy-statistics-dukes#content"
data_UK.head()
The data is provided by the Norwegian Water Resources and Energy Directorate. It encompasses a database on the installed capacity of thermal generators as well as on hydro generators.
filepath_NO_hydro = os.path.join(data_directory, 'NO', 'Vannkraftverk.csv')
data_NO_hydro = pd.read_csv(filepath_NO_hydro,
skiprows=2,
sep=";",
decimal=",",
header=0,
index_col=False,
encoding='latin-1')
filepath_NO_thermal = os.path.join(data_directory, 'NO', 'termiske-kraftverk-i-norge-2019.xlsx')
data_NO_thermal = pd.read_excel(filepath_NO_thermal,
sheet_name='Ark1')
The imported data is standardized with respect to the columns as defined in section 2.3. First, existing output-relevant columns are translated; the remaining columns are then deleted. Columns which do not exist in the data set but are required for the output are added in this process.
# Translate columns
dict_columns_NO_hydro = {'Navn': 'name',
'Type': 'technology',
'Kommune': 'city',
'Kommunenr.': 'postcode',
'Maks ytelse [MW]': 'capacity',
'Hovedeier': 'company',
'Dato for første utnyttelse av fallet': 'commissioned',
'Elspotområde': 'additional_info'}
data_NO_hydro.rename(columns=dict_columns_NO_hydro, inplace=True)
# Apply general template of columns
data_NO_hydro = data_NO_hydro.reindex(columns=columns_sorted)
# Fill with general information
data_NO_hydro['country'] = 'NO'
data_NO_hydro['energy_source'] = 'Hydro'
data_NO_hydro['additional_info'] = 'Zone: NO' + data_NO_hydro['additional_info'].astype(str)
# Reduce the commissioning date to the year only
data_NO_hydro['commissioned'] = data_NO_hydro['commissioned'].apply(lambda x: x[0:4]).astype(int)
Overall translation of all technology types mentioned in the column "technology".
# Add comment for plants with pump only (later categorized as pumped storage)
data_NO_hydro.loc[data_NO_hydro['technology'] == 'Pumpe', 'comment'] = 'Pump only'
# Take the absolute value of the negative capacities of pump-only plants
mask = data_NO_hydro['technology'] == 'Pumpe'
data_NO_hydro.loc[mask, 'capacity'] = data_NO_hydro.loc[mask].capacity.apply(lambda x: abs(x))
# Translate technologies
dict_technologies_NO_hydro = {
'Kraftverk': 'Reservoir',
'Pumpekraftverk': 'Pumped storage',
'Pumpe': 'Pumped storage'
}
data_NO_hydro['technology'].replace(dict_technologies_NO_hydro, inplace=True)
data_NO_hydro.head()
The imported data is standardized with respect to the columns as defined in section 2.3. First, existing output-relevant columns are translated; the remaining columns are then deleted. Columns which do not exist in the data set but are required for the output are added in this process.
# Translate columns
dict_columns_NO_thermal = {'Kraftverk': 'name',
'Kommentar': 'comment',
'Brensel': 'energy_source',
'Kommune': 'city',
'Kommunenr.': 'postcode',
'Installert effekt [MW]': 'capacity',
'Idriftsettelsesår': 'commissioned',
'Elspotområde': 'additional_info'}
data_NO_thermal.rename(columns=dict_columns_NO_thermal, inplace=True)
# Apply general template of columns
data_NO_thermal = data_NO_thermal.reindex(columns=columns_sorted)
# Fill with general information
data_NO_thermal['country'] = 'NO'
data_NO_thermal['additional_info'] = 'Zone: ' + data_NO_thermal['additional_info'].astype(str)
Overall translation of all energy source types mentioned in the column "energy_source".
# Translate energy sources
dict_energy_sources_NO_thermal = {
'Avfallsforbrenning': 'Non-renewable waste',
'Varmegjenvinning': 'Other fossil fuels',
'Naturgass': 'Natural gas',
'Biogass fra avfall': 'Biomass and biogas',
'Ukjent': np.nan,
'Bark, returfiberavfall, slam, rivningsvirke og olje ': 'Mixed fossil fuels',
'Flis fra impregnert tre, avfallsforbrenning': 'Other bioenergy and renewable waste',
'Biogass': 'Biomass and biogas',
'CO gass': 'Other fossil fuels'
}
data_NO_thermal['energy_source'].replace(dict_energy_sources_NO_thermal,
inplace=True)
# Delete unwanted energy sources
data_NO_thermal = data_NO_thermal[data_NO_thermal.energy_source != 'Biomass and biogas']
data_NO_thermal = data_NO_thermal[data_NO_thermal.energy_source != 'Other bioenergy and renewable waste']
data_NO_thermal.head()
# add sources
data_NO_hydro["source"] = "https://www.nve.no/energiforsyning/kraftproduksjon/vannkraft/vannkraftdatabase/#"
data_NO_thermal["source"] = "https://www.nve.no/media/8967/termiske-kraftverk-i-norge-2019.xlsx"
data_NO = pd.concat([data_NO_hydro, data_NO_thermal], ignore_index=True)
data_NO.head()
The data is provided by the power exchange Nordpool. It encompasses a detailed list of Swedish generation units with a capacity of more than 100 MW for 2014. Since no newer data on Swedish generators was found, the 2014 list is still used in this release.
filepath_SE = os.path.join(data_directory, 'SE', 'input_plant-list_SE.csv')
data_SE = pd.read_csv(filepath_SE, encoding='utf-8', header=0, index_col=None)
data_SE.head()
The data is provided by the Slovakian utility Slovenské elektrárne a.s. (SEAS). It encompasses a detailed list of Slovak generation units with comprehensive information on technologies and energy fuels.
filepath_SK = os.path.join(data_directory, 'SK', 'input_plant-list_SK.csv')
data_SK = pd.read_csv(filepath_SK, encoding='utf-8', header=0, index_col=None)
data_SK.head()
The data is provided by several Slovenian utilities. The respective data links are given in the column "source". This list encompasses Slovenian generation units with comprehensive information on technologies and energy fuels.
filepath_SI = os.path.join(data_directory, 'SI', 'input_plant-list_SI.csv')
data_SI = pd.read_csv(filepath_SI, encoding='utf-8')
data_SI.head()
The data for conventional power plants is provided by several Austrian utilities. The respective data links are given in the column "source". The specifications of Austrian hydro power plants, however, are based solely on Verbund AG. The resulting list encompasses Austrian generation units with comprehensive information on technologies and energy fuels.
filepath_AT_hydro = os.path.join(data_directory, 'AT', 'input_plant-list_AT_hydro.csv')
data_AT_hydro = pd.read_csv(filepath_AT_hydro, encoding="latin1")
filepath_AT_thermal = os.path.join(data_directory, 'AT', 'input_plant-list_AT_thermal.csv')
data_AT_thermal = pd.read_csv(filepath_AT_thermal, encoding="latin1")
The imported data is standardized with respect to the columns as defined in section 2.3. First, existing output-relevant columns are translated; the remaining columns are then deleted. Columns which do not exist in the data set but are required for the output are added in this process.
#Delete MW in capacity column
data_AT_hydro.capacity = data_AT_hydro.capacity.apply(lambda x: x.replace('MW',''))
#Apply general template of columns
data_AT_hydro = data_AT_hydro.reindex(columns=columns_sorted)
data_AT_thermal = data_AT_thermal.reindex(columns=columns_sorted)
data_AT = pd.concat([data_AT_hydro, data_AT_thermal], ignore_index=True)
data_AT.head()
The data is assembled using the information of several websites. The sources can be found within the document. It encompasses a detailed list of Danish generation units with comprehensive information on technologies and energy fuels.
filepath_DK = os.path.join(data_directory, 'DK', 'input_plant-list_DK.csv')
data_DK = pd.read_csv(filepath_DK, encoding='utf-8', header=0, index_col=None)
All generators that are unavailable or only partly available are dropped. The imported data is then standardized with respect to the columns as defined in section 2.3.
# List only operating plants
data_DK = data_DK[data_DK.availability != '0']
data_DK = data_DK[data_DK.availability != 'partly']
#Drop unwanted columns
data_DK = data_DK.drop('availability', axis=1)
# Apply general template of columns
data_DK=data_DK.reindex(columns=columns_sorted)
data_DK.head()
In the following, the national datasets are consolidated into a single European dataset. Unfortunately, the Belgian dataset cannot be integrated due to copyright held by the data owner Elia.
dataframes = [data_BE,
data_NL,
data_FR,
data_PL,
data_CZ,
data_CH,
data_IT,
data_FI,
data_ES,
data_UK,
data_NO,
data_SE,
data_SK,
data_SI,
data_AT,
data_DK]
data_EU = pd.concat(dataframes, sort=False)
data_EU.head()
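With the default outer join and `sort=False`, `pd.concat` keeps every column that appears in any national frame and fills the gaps with NaN. A toy illustration:

```python
import pandas as pd

# Two hypothetical national frames with partly different columns
a = pd.DataFrame({'name': ['Plant A'], 'capacity': [100.0]})
b = pd.DataFrame({'name': ['Plant B'], 'chp': ['yes']})

# Outer join on columns: cells missing from a frame become NaN
combined = pd.concat([a, b], sort=False)
```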
# Import energy source level definition
energy_source_levels = pd.read_csv(
os.path.join('input', 'energy_source_levels.csv'), index_col=None, header=0)
# Merge energy source levels to data set
data_EU = data_EU.reset_index().merge(
energy_source_levels,
how='left',
left_on='energy_source',
right_on='energy_source_level_1').drop_duplicates(
subset=['name',
'city',
'country',
'capacity'], keep='first').set_index('name')
data_EU = data_EU.reset_index().merge(
energy_source_levels,
how='left',
left_on='energy_source',
right_on='energy_source_level_2').drop_duplicates(
subset=['name',
'city',
'country',
'capacity'], keep='first').set_index('name')
data_EU = data_EU.reset_index().merge(
energy_source_levels,
how='left',
left_on='energy_source',
right_on='energy_source_level_3').drop_duplicates(
subset=['name',
'city',
'country',
'capacity'], keep='first').set_index('name')
# Combine different energy source levels created by merge
data_EU['energy_source_level_1'] = data_EU[
['energy_source_level_1',
'energy_source_level_1_x',
'energy_source_level_1_y']].fillna('').sum(axis=1)
data_EU['energy_source_level_2'] = data_EU[
['energy_source_level_2',
'energy_source_level_2_y']].fillna('').sum(axis=1)
data_EU['energy_source_level_3'] = data_EU[
['energy_source_level_3']].fillna('').sum(axis=1)
# Drop auxiliary columns due to merge
colsToDrop = ['energy_source_level_1_y',
'energy_source_level_2_y',
'energy_source_level_3_y',
'energy_source_level_1_x',
'energy_source_level_2_x',
'energy_source_level_3_x']
data_EU = data_EU.drop(colsToDrop, axis=1)
# replace false energy source levels for plants without energy source in original data
index_with_NAN = data_EU[data_EU.energy_source.isna()].index
data_EU.loc[index_with_NAN,['energy_source_level_1',
'energy_source_level_2',
'energy_source_level_3']] = np.NaN
data_EU.loc[index_with_NAN, 'comment'] = 'Energy source not in original data'
data_EU.head()
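The three merges above each try to match a plant's `energy_source` against a different level of the hierarchy table. A minimal sketch of one such lookup, using a toy stand-in for `energy_source_levels.csv`:

```python
import pandas as pd

# Toy stand-in for input/energy_source_levels.csv: each row links a
# level-2 label to its parent level-1 label (the real file also has level 3)
levels = pd.DataFrame({
    'energy_source_level_1': ['Fossil fuels', 'Renewable energy sources'],
    'energy_source_level_2': ['Natural gas', 'Hydro']})

plants = pd.DataFrame({'name': ['P1', 'P2'],
                       'energy_source': ['Natural gas', 'Hydro']})

# A match on level 2 attaches the whole hierarchy row to the plant
out = plants.merge(levels, how='left',
                   left_on='energy_source',
                   right_on='energy_source_level_2')
```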
First, we define the ordering of the columns. Second, the data types are redefined. At the moment, this has the drawback that empty columns are redefined as float instead of object.
columns_sorted_output = ['name',
'company',
'street',
'postcode',
'city',
'country',
'capacity',
'energy_source',
'technology',
'chp',
'commissioned',
'type',
'lat',
'lon',
'eic_code',
'energy_source_level_1',
'energy_source_level_2',
'energy_source_level_3',
'additional_info',
'comment',
'source']
# Set ordering of columns
data_EU = data_EU.reset_index()
data_EU = data_EU.reindex(columns=columns_sorted_output)
# Set data types for columns
data_EU = data_EU.astype(str)
data_EU[['capacity', 'commissioned', 'lat', 'lon']] = data_EU[
['capacity', 'commissioned', 'lat', 'lon']].astype(float)
data_EU.replace('nan', np.nan, inplace=True)
# data_EU.dtypes
# Set index
data_EU = data_EU.set_index('name')
data_EU.head()
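The float drawback noted above arises because a column containing only missing values is coerced to float64. Pandas' extension dtypes avoid this; a small sketch:

```python
import numpy as np
import pandas as pd

# An all-empty column: plain NaN handling makes it float64 ...
empty_float = pd.Series([np.nan, np.nan])
# ... while the 'string' extension dtype keeps it text-typed with <NA>
empty_text = empty_float.astype('string')
```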
Obtain the DE list and concatenate it. Note that the stand-alone DE list contains more information.
# call DE script if not already executed
# %run ./download_and_process_DE.ipynb
data_DE = pd.read_csv('output/conventional_power_plants_DE.csv', index_col=0)
data_DE = data_DE.rename(columns={'eic_code_plant': 'eic_code',
'capacity_net_bnetza': 'capacity'})
data_DE["source"] = "BNetzA/UBA"
data_DE["additional_info"] = ""
data_EU = pd.concat([data_EU, data_DE.loc[:, data_EU.columns]])
Write the final list to file.
output_path = 'output'
data_EU.to_csv(os.path.join(
output_path, 'conventional_power_plants_EU.csv'),
encoding='utf-8',
index_label='name')
data_EU.to_excel(
os.path.join(output_path, 'conventional_power_plants_EU.xlsx'),
sheet_name='plants',
index_label='name')
data_EU.to_sql(
'conventional_power_plants_EU',
sqlite3.connect(os.path.join(output_path, 'conventional_power_plants.sqlite')),
if_exists="replace",
index_label='name')
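A quick way to check that the SQLite export round-trips is to read the table back. A sketch using an in-memory database and a toy frame instead of the real output file:

```python
import sqlite3
import pandas as pd

# Toy frame standing in for data_EU
df = pd.DataFrame({'name': ['Plant A'], 'capacity': [100.0]}).set_index('name')

con = sqlite3.connect(':memory:')
df.to_sql('conventional_power_plants_EU', con,
          if_exists='replace', index_label='name')

# Read the table back and confirm the values survived the round trip
back = pd.read_sql('SELECT * FROM conventional_power_plants_EU', con,
                   index_col='name')
con.close()
```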
End of script.