RES Timeseries Data Extraction

This Jupyter Notebook downloads and cleans up solar and wind timeseries from the German TSOs TransnetBW and Tennet.

Loading some python libraries needed later

In [8]:
import urllib
import os
import pandas
import numpy as np

Configuring URLs

Note: The placeholder syntax for the month (and year) variables in the URL look quite ugly at the moment. I use a trick there to avoid running into the problem that the default placeholder marker in python is the percentage sign, but the URL also contains percentage signs itself which must be escaped by two percentage signs %%.

In [9]:
conf = {
    'transnetbw': {
        'pv':  "https://www.transnetbw.de/de/kennzahlen/erneuerbare-energien/fotovoltaik?app=solar&activeTab=csv&selectMonatDownload=__ESCAPE__d&view=1&download=true".replace("%", "%%").replace("__ESCAPE__","%"),
        'wind':  'https://www.transnetbw.de/de/kennzahlen/erneuerbare-energien/windenergie?app=wind&activeTab=csv&selectMonatDownload=__ESCAPE__d&view=1&download=true'.replace("%", "%%").replace("__ESCAPE__","%"),
    }
    ,'tennet': {
        'pv':  "http://www.tennettso.de/site/de/phpbridge?commandpath=Tatsaechliche_und_prognostizierte_Solarenergieeinspeisung%2FmonthDataSheetCsv.php&sub=total&querystring=monat%3D__ESCAPE__(year)d-__ESCAPE__(month)02d&contenttype=text%2Fx-csv".replace("%", "%%").replace("__ESCAPE__","%"),
        'wind':  "http://www.tennettso.de/site/de/phpbridge?commandpath=Tatsaechliche_und_prognostizierte_Windenergieeinspeisung%2FmonthDataSheetCsv.php&querystring=monat%3D__ESCAPE__(year)-__ESCAPE__(month)02d&contenttype=text%2Fx-csv".replace("%", "%%").replace("__ESCAPE__","%"),
    },
}

Downloading the data

Here we loop through the configuration defined above by TSO (transnetbw, tennet) and Technology (pv, wind).

We then do slightly different things depending on which TSO we're in, because Tennet has month and year as variables in the URL whereas TransnetBW only has the month in there.

In [10]:
# Download data

for tsoName, tsoConf in conf.iteritems():
    for techName, url in tsoConf.iteritems():
        
        if tsoName == 'tennet':
            for year in range (2014, 2015): # This means a range from 2014 to 2014 (including)
                path = 'csv/'+tsoName+'/'+techName+'/'+str(year)+'/'
                if not os.path.exists(path): os.makedirs(path)

                for month in range(1, 13): # This means a range from 1 to 12 (including)
                    full_url = url % {'month': month, 'year': year}
                    urllib.urlretrieve(full_url, path+"%d.csv" % month)

        if tsoName == 'transnetbw':
            path = 'csv/'+tsoName+'/'+techName+'/'
            if not os.path.exists(path): os.makedirs(path)
                
            for month in range(1, 25): # This means the last 24 months
                full_url = url % month
                urllib.urlretrieve(full_url, path+"%d.csv" % month)

Define individual read functions

The TSOs have different columns and date formats they use in their CSV files. Here we define individual functions for the two TSOs to read-in a single monthly file in their specific format.

We're going to use these functions in the loop for all months later on.

In [11]:
# Define specific readData functions for the TSOs

def readData_tennet(filePath, tsoName, techName):
    data = pandas.read_csv(
        filePath,
        sep=";",
        skiprows=4,
        names=['datum','position','forecast_'+tsoName+'_'+techName,'actual_'+tsoName+'_'+techName,'unknown']
    )

    data.fillna(method='ffill',inplace=True)

    data['hour'] = (np.trunc((data['position']-1) /4)).astype(int).astype(str)
    data['minute'] = (((data['position']-1) % 4)*15).astype(int).astype(str)
    data['time'] = data['datum']+' '+data['hour']+':'+data['minute']

    data['Timestamp'] = pandas.to_datetime(data['time'])

    data.set_index('Timestamp',inplace=True)

    del data['position']
    del data['hour'] 
    del data['minute'] 
    del data['datum'] 
    del data['time'] 
    del data['unknown']
    return data

                
def readData_transnet(filePath, tsoName, techName):
    data = pandas.read_csv(
        filePath,
        decimal=',',
        sep=";",
        parse_dates = {'Timestamp' : ['Datum von', 'Uhrzeit von']},
        index_col = "Timestamp",
        dayfirst=True,
    )

    data = data.drop('Unnamed: 6', 1)
    data = data.drop('Datum bis', 1)
    data = data.drop('Uhrzeit bis', 1)
    data.fillna(0,inplace=True)
    
    forecastCol = 'forecast_'+tsoName+'_'+techName
    actualCol = 'actual_'+tsoName+'_'+techName

    data.rename(
        columns={'Prognose (MW)': forecastCol, 'Ist-Wert (MW)': actualCol},
        inplace=True
    )
    

    if 'Datum' in data.columns: 
        del data['Datum']
    if 'Uhrzeit' in data.columns: 
        del data['Uhrzeit']    
    
    return data

Testing the read functions

If you want to test the individual read functions just for checking if everything works, you can uncomment the line below.

In [12]:
# transnetTestDataWind = readData_transnet("csv/transnetbw/wind/7.csv", 'transnetbw', 'wind')

# And now output the TestData:
# transnetTestDataWind

Loading the data into memory and cleaning it up

In [13]:
# Read and merge data

resultDataSet = pandas.DataFrame();

for tsoName, tsoConf in conf.iteritems():
    for techName, url in tsoConf.iteritems():

        if tsoName == 'tennet':
            for year in range (2014, 2015):
                for month in range(1, 12):
                    filePath = 'csv/'+tsoName+'/'+techName+'/'+str(year)+'/'+str(month)+'.csv'
                    try:
                        df = readData_tennet(filePath, tsoName, techName)
                        try:
                            df = readData_tennet(filePath, tsoName, techName)
                            resultDataSet = resultDataSet.combine_first(df)
                        except TypeError:
                            atLeastOneError = 'yes'
                    except ValueError:
                        atLeastOneError = 'yes'

        if tsoName == 'transnetbw':
            for month in range(1, 20):
                filePath = 'csv/'+tsoName+'/'+techName+'/'+str(month)+'.csv'
                df = readData_transnet(filePath, tsoName, techName)
                try:
                    df = readData_transnet(filePath, tsoName, techName)
                    resultDataSet = resultDataSet.combine_first(df)
                except ValueError:
                    atLeastOneError = 'yes'
                    print 'error'


if 'Datum' in resultDataSet.columns: 
    del resultDataSet['Datum']
if 'Uhrzeit' in resultDataSet.columns: 
    del resultDataSet['Uhrzeit']                    

Validation and Plausibility Checks

Ideally you would do plausibility checks on your data here. (Check for missing data, check that output is never negative, etc.)

In [14]:
# Here you would do plausibility checks...

Save to one big CSV file

In [15]:
resultDataSet.to_csv("csv/allData.csv")

Display the data

Something seems to be broken with the Tennet wind data... Other than that, the time spans of the different TSO datasets read-in above are different.

In [16]:
resultDataSet
Out[16]:
actual_tennet_pv actual_tennet_wind actual_transnetbw_pv actual_transnetbw_wind forecast_tennet_pv forecast_tennet_wind forecast_transnetbw_pv forecast_transnetbw_wind
Timestamp
2014-01-01 00:00:00 0 NaN 0 134 0 NaN 0 79
2014-01-01 00:15:00 0 NaN 0 124 0 NaN 0 79
2014-01-01 00:30:00 0 NaN 0 115 0 NaN 0 80
2014-01-01 00:45:00 0 NaN 0 118 0 NaN 0 80
2014-01-01 01:00:00 0 NaN 0 128 0 NaN 0 80
2014-01-01 01:15:00 0 NaN 0 119 0 NaN 0 83
2014-01-01 01:30:00 0 NaN 0 106 0 NaN 0 86
2014-01-01 01:45:00 0 NaN 0 79 0 NaN 0 89
2014-01-01 02:00:00 0 NaN 0 88 0 NaN 0 91
2014-01-01 02:15:00 0 NaN 0 95 0 NaN 0 96
2014-01-01 02:30:00 0 NaN 0 85 0 NaN 0 101
2014-01-01 02:45:00 0 NaN 0 89 0 NaN 0 106
2014-01-01 03:00:00 0 NaN 0 104 0 NaN 0 111
2014-01-01 03:15:00 0 NaN 0 118 0 NaN 0 117
2014-01-01 03:30:00 0 NaN 0 133 0 NaN 0 124
2014-01-01 03:45:00 0 NaN 0 171 0 NaN 0 130
2014-01-01 04:00:00 0 NaN 0 113 0 NaN 0 136
2014-01-01 04:15:00 0 NaN 0 104 0 NaN 0 142
2014-01-01 04:30:00 0 NaN 0 100 0 NaN 0 148
2014-01-01 04:45:00 0 NaN 0 92 0 NaN 0 155
2014-01-01 05:00:00 0 NaN 0 69 0 NaN 0 161
2014-01-01 05:15:00 0 NaN 0 96 0 NaN 0 166
2014-01-01 05:30:00 0 NaN 0 97 0 NaN 0 171
2014-01-01 05:45:00 0 NaN 0 100 0 NaN 0 176
2014-01-01 06:00:00 0 NaN 0 87 0 NaN 0 181
2014-01-01 06:15:00 0 NaN 0 106 0 NaN 0 183
2014-01-01 06:30:00 0 NaN 0 116 0 NaN 0 186
2014-01-01 06:45:00 0 NaN 0 105 0 NaN 0 188
2014-01-01 07:00:00 0 NaN 0 99 0 NaN 0 190
2014-01-01 07:15:00 0 NaN 0 90 0 NaN 0 188
... ... ... ... ... ... ... ... ...
2015-07-31 16:30:00 NaN NaN 2636 32 NaN NaN 2759 43
2015-07-31 16:45:00 NaN NaN 2438 29 NaN NaN 2579 45
2015-07-31 17:00:00 NaN NaN 2208 33 NaN NaN 2383 47
2015-07-31 17:15:00 NaN NaN 1998 36 NaN NaN 2171 49
2015-07-31 17:30:00 NaN NaN 1791 30 NaN NaN 1948 52
2015-07-31 17:45:00 NaN NaN 1535 32 NaN NaN 1718 54
2015-07-31 18:00:00 NaN NaN 1292 37 NaN NaN 1490 56
2015-07-31 18:15:00 NaN NaN 1096 36 NaN NaN 1267 58
2015-07-31 18:30:00 NaN NaN 911 41 NaN NaN 1050 60
2015-07-31 18:45:00 NaN NaN 754 53 NaN NaN 844 63
2015-07-31 19:00:00 NaN NaN 601 65 NaN NaN 660 65
2015-07-31 19:15:00 NaN NaN 448 68 NaN NaN 502 68
2015-07-31 19:30:00 NaN NaN 359 70 NaN NaN 373 71
2015-07-31 19:45:00 NaN NaN 274 79 NaN NaN 279 75
2015-07-31 20:00:00 NaN NaN 199 83 NaN NaN 200 79
2015-07-31 20:15:00 NaN NaN 123 95 NaN NaN 123 84
2015-07-31 20:30:00 NaN NaN 45 109 NaN NaN 63 89
2015-07-31 20:45:00 NaN NaN 14 122 NaN NaN 22 94
2015-07-31 21:00:00 NaN NaN 0 137 NaN NaN 2 98
2015-07-31 21:15:00 NaN NaN 0 153 NaN NaN 2 103
2015-07-31 21:30:00 NaN NaN 0 174 NaN NaN 1 107
2015-07-31 21:45:00 NaN NaN 0 195 NaN NaN 0 112
2015-07-31 22:00:00 NaN NaN 0 220 NaN NaN 0 117
2015-07-31 22:15:00 NaN NaN 0 224 NaN NaN 0 122
2015-07-31 22:30:00 NaN NaN 0 230 NaN NaN 0 127
2015-07-31 22:45:00 NaN NaN 0 243 NaN NaN 0 133
2015-07-31 23:00:00 NaN NaN 0 231 NaN NaN 0 135
2015-07-31 23:15:00 NaN NaN 0 224 NaN NaN 0 135
2015-07-31 23:30:00 NaN NaN 0 235 NaN NaN 0 135
2015-07-31 23:45:00 NaN NaN 0 235 NaN NaN 0 135

55417 rows × 8 columns

Example: Resample the timeseries data to 60minute

Here we use Pandas' powerful timeseries functionality to resample the data to 60minute intervals

In [17]:
resultDataSet.resample('60Min',how='sum')
Out[17]:
actual_tennet_pv actual_tennet_wind actual_transnetbw_pv actual_transnetbw_wind forecast_tennet_pv forecast_tennet_wind forecast_transnetbw_pv forecast_transnetbw_wind
Timestamp
2014-01-01 00:00:00 0 NaN 0 491 0 NaN 0 318
2014-01-01 01:00:00 0 NaN 0 432 0 NaN 0 338
2014-01-01 02:00:00 0 NaN 0 357 0 NaN 0 394
2014-01-01 03:00:00 0 NaN 0 526 0 NaN 0 482
2014-01-01 04:00:00 0 NaN 0 409 0 NaN 0 581
2014-01-01 05:00:00 0 NaN 0 362 0 NaN 0 674
2014-01-01 06:00:00 0 NaN 0 414 0 NaN 0 738
2014-01-01 07:00:00 0 NaN 3 364 12 NaN 0 746
2014-01-01 08:00:00 240 NaN 124 347 568 NaN 123 686
2014-01-01 09:00:00 1740 NaN 811 288 2910 NaN 740 580
2014-01-01 10:00:00 4451 NaN 2253 289 4778 NaN 1976 465
2014-01-01 11:00:00 6976 NaN 4309 242 5759 NaN 3116 368
2014-01-01 12:00:00 8847 NaN 6028 144 5590 NaN 3849 274
2014-01-01 13:00:00 9347 NaN 5952 84 4614 NaN 4053 184
2014-01-01 14:00:00 7358 NaN 4639 129 3127 NaN 3226 126
2014-01-01 15:00:00 3353 NaN 2310 162 1504 NaN 1633 100
2014-01-01 16:00:00 513 NaN 394 254 171 NaN 371 137
2014-01-01 17:00:00 0 NaN 1 411 0 NaN 0 245
2014-01-01 18:00:00 0 NaN 0 451 0 NaN 0 405
2014-01-01 19:00:00 0 NaN 0 693 0 NaN 0 547
2014-01-01 20:00:00 0 NaN 0 803 0 NaN 0 647
2014-01-01 21:00:00 0 NaN 0 821 0 NaN 0 700
2014-01-01 22:00:00 0 NaN 0 681 0 NaN 0 710
2014-01-01 23:00:00 0 NaN 0 553 0 NaN 0 684
2014-01-02 00:00:00 0 NaN 0 423 0 NaN 0 544
2014-01-02 01:00:00 0 NaN 0 397 0 NaN 0 464
2014-01-02 02:00:00 0 NaN 0 390 0 NaN 0 444
2014-01-02 03:00:00 0 NaN 0 401 0 NaN 0 434
2014-01-02 04:00:00 0 NaN 0 429 0 NaN 0 427
2014-01-02 05:00:00 0 NaN 0 460 0 NaN 0 455
... ... ... ... ... ... ... ... ...
2015-07-30 18:00:00 NaN NaN 3792 152 NaN NaN 4059 239
2015-07-30 19:00:00 NaN NaN 1507 90 NaN NaN 1692 200
2015-07-30 20:00:00 NaN NaN 333 101 NaN NaN 366 170
2015-07-30 21:00:00 NaN NaN 1 251 NaN NaN 6 160
2015-07-30 22:00:00 NaN NaN 0 239 NaN NaN 0 160
2015-07-30 23:00:00 NaN NaN 0 159 NaN NaN 0 153
2015-07-31 00:00:00 NaN NaN 0 131 NaN NaN 0 138
2015-07-31 01:00:00 NaN NaN 0 117 NaN NaN 0 125
2015-07-31 02:00:00 NaN NaN 0 126 NaN NaN 0 122
2015-07-31 03:00:00 NaN NaN 0 111 NaN NaN 0 99
2015-07-31 04:00:00 NaN NaN 0 113 NaN NaN 0 76
2015-07-31 05:00:00 NaN NaN 1 115 NaN NaN 0 58
2015-07-31 06:00:00 NaN NaN 413 150 NaN NaN 439 47
2015-07-31 07:00:00 NaN NaN 2342 161 NaN NaN 2249 40
2015-07-31 08:00:00 NaN NaN 5853 80 NaN NaN 5611 40
2015-07-31 09:00:00 NaN NaN 9459 78 NaN NaN 9165 40
2015-07-31 10:00:00 NaN NaN 12361 53 NaN NaN 12029 49
2015-07-31 11:00:00 NaN NaN 14304 60 NaN NaN 14020 67
2015-07-31 12:00:00 NaN NaN 15264 79 NaN NaN 15137 83
2015-07-31 13:00:00 NaN NaN 15221 76 NaN NaN 15405 100
2015-07-31 14:00:00 NaN NaN 14488 112 NaN NaN 14897 129
2015-07-31 15:00:00 NaN NaN 13221 110 NaN NaN 13560 152
2015-07-31 16:00:00 NaN NaN 10802 131 NaN NaN 11333 170
2015-07-31 17:00:00 NaN NaN 7532 131 NaN NaN 8220 202
2015-07-31 18:00:00 NaN NaN 4053 167 NaN NaN 4651 237
2015-07-31 19:00:00 NaN NaN 1682 282 NaN NaN 1814 279
2015-07-31 20:00:00 NaN NaN 381 409 NaN NaN 408 346
2015-07-31 21:00:00 NaN NaN 0 659 NaN NaN 5 420
2015-07-31 22:00:00 NaN NaN 0 917 NaN NaN 0 499
2015-07-31 23:00:00 NaN NaN 0 925 NaN NaN 0 540

13848 rows × 8 columns