import tensorflow as tf
import numpy as np
import pandas as pd
np.random.seed(42)
In fact, Luis has already published his solution on GitHub (https://github.com/luisandresilva/Kaggle/tree/master/Rain-II). He used the Keras API with Python 2. Here, we will reproduce his model with the latest tf.keras API and Python 3.
First, download the data for the "How Much Did It Rain? II" competition and put it in the datasets/rain2 folder. Then read train.csv:
rain2 = pd.read_csv('datasets/rain2/train.csv')
Let's view some rows via the head() method of pandas's DataFrame:
rain2.head(n=30)
Id | minutes_past | radardist_km | Ref | Ref_5x5_10th | Ref_5x5_50th | Ref_5x5_90th | RefComposite | RefComposite_5x5_10th | RefComposite_5x5_50th | ... | RhoHV_5x5_90th | Zdr | Zdr_5x5_10th | Zdr_5x5_50th | Zdr_5x5_90th | Kdp | Kdp_5x5_10th | Kdp_5x5_50th | Kdp_5x5_90th | Expected | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 3 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.254000 |
1 | 1 | 16 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.254000 |
2 | 1 | 25 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.254000 |
3 | 1 | 35 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.254000 |
4 | 1 | 45 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.254000 |
5 | 1 | 55 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.254000 |
6 | 2 | 1 | 2.0 | 9.0 | 5.0 | 7.5 | 10.5 | 15.0 | 10.5 | 16.5 | ... | 0.998333 | 0.3750 | -0.1250 | 0.3125 | 0.8750 | 1.059998 | -1.410004 | -0.350006 | 1.059998 | 1.016000 |
7 | 2 | 6 | 2.0 | 26.5 | 22.5 | 25.5 | 31.5 | 26.5 | 26.5 | 28.5 | ... | 1.005000 | 0.0625 | -0.1875 | 0.2500 | 0.6875 | NaN | NaN | NaN | 1.409988 | 1.016000 |
8 | 2 | 11 | 2.0 | 21.5 | 15.5 | 20.5 | 25.0 | 26.5 | 23.5 | 25.0 | ... | 1.001667 | 0.3125 | -0.0625 | 0.3125 | 0.6250 | 0.349991 | NaN | -0.350006 | 1.759994 | 1.016000 |
9 | 2 | 16 | 2.0 | 18.0 | 14.0 | 17.5 | 21.0 | 20.5 | 18.0 | 20.5 | ... | 1.001667 | 0.2500 | 0.1250 | 0.3750 | 0.6875 | 0.349991 | -1.059998 | 0.000000 | 1.059998 | 1.016000 |
10 | 2 | 21 | 2.0 | 24.5 | 16.5 | 21.0 | 24.5 | 24.5 | 21.0 | 24.0 | ... | 0.998333 | 0.2500 | 0.0625 | 0.1875 | 0.5625 | -0.350006 | -1.059998 | -0.350006 | 1.759994 | 1.016000 |
11 | 2 | 26 | 2.0 | 12.0 | 12.0 | 16.0 | 20.0 | 16.5 | 17.0 | 19.0 | ... | 0.998333 | 0.5625 | 0.2500 | 0.4375 | 0.6875 | -1.760010 | -1.760010 | -0.350006 | 0.709991 | 1.016000 |
12 | 2 | 31 | 2.0 | 22.5 | 19.0 | 22.0 | 25.0 | 26.0 | 23.5 | 25.5 | ... | 1.001667 | 0.0000 | -0.1875 | 0.2500 | 0.6250 | -1.059998 | -2.120010 | -0.710007 | 0.349991 | 1.016000 |
13 | 2 | 37 | 2.0 | 14.0 | 14.0 | 18.5 | 21.0 | 19.5 | 20.0 | 21.0 | ... | 0.998333 | 0.5000 | 0.1875 | 0.4375 | 0.8125 | 0.000000 | -1.760010 | -0.350006 | 1.059998 | 1.016000 |
14 | 2 | 42 | 2.0 | 12.0 | 11.0 | 12.5 | 17.0 | 19.5 | 18.0 | 21.0 | ... | 0.998333 | 0.6250 | 0.3750 | 0.6250 | 0.8750 | -0.350006 | -0.350006 | 0.000000 | 0.349991 | 1.016000 |
15 | 2 | 47 | 2.0 | 1.5 | 3.5 | 7.0 | 10.5 | 18.0 | 16.5 | 18.5 | ... | 0.998333 | 0.3750 | 0.1875 | 0.5000 | 0.6875 | 0.349991 | -2.110001 | -0.350006 | 1.059998 | 1.016000 |
16 | 2 | 53 | 2.0 | 16.0 | 14.5 | 18.0 | 23.5 | 28.0 | 23.5 | 26.5 | ... | 0.998333 | 0.8750 | 0.6250 | 0.9375 | 1.3750 | -0.350006 | -1.410004 | -0.350006 | 2.119995 | 1.016000 |
17 | 2 | 58 | 2.0 | 22.0 | 16.5 | 22.5 | 26.5 | 31.5 | 26.5 | 29.0 | ... | 1.001667 | 0.3750 | 0.1875 | 0.3750 | 0.8750 | -1.410004 | NaN | -0.350006 | 0.699997 | 1.016000 |
18 | 3 | 4 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
19 | 3 | 9 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
20 | 3 | 14 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
21 | 3 | 18 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
22 | 3 | 23 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
23 | 3 | 28 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
24 | 3 | 33 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
25 | 3 | 38 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
26 | 3 | 43 | 10.0 | NaN | NaN | NaN | 8.5 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
27 | 3 | 48 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
28 | 3 | 53 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.801667 | NaN | NaN | NaN | 2.0625 | NaN | NaN | NaN | NaN | 26.162014 |
29 | 3 | 58 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 26.162014 |
30 rows × 24 columns
There are 24 columns. The first column, Id, is a unique value identifying a sequence. The last column, Expected, is the target. We simply remove the rows whose Ref column is NaN; when Ref is NaN, the other radar columns are almost always NaN as well.
train_df = rain2.dropna(subset=['Ref'])
Using the describe() method, we get statistics for the dataset. The features (excluding Id and Expected) have different scales, but the differences in order of magnitude are not large.
train_df.describe()
Id | minutes_past | radardist_km | Ref | Ref_5x5_10th | Ref_5x5_50th | Ref_5x5_90th | RefComposite | RefComposite_5x5_10th | RefComposite_5x5_50th | ... | RhoHV_5x5_90th | Zdr | Zdr_5x5_10th | Zdr_5x5_50th | Zdr_5x5_90th | Kdp | Kdp_5x5_10th | Kdp_5x5_50th | Kdp_5x5_90th | Expected | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 6.349375e+06 | 6.349375e+06 | 6.349375e+06 | 6.349375e+06 | 5.269105e+06 | 6.156846e+06 | 6.349288e+06 | 6.349375e+06 | 5.550921e+06 | 6.207490e+06 | ... | 4.600898e+06 | 4.069799e+06 | 3.577674e+06 | 4.095586e+06 | 4.600898e+06 | 3.494665e+06 | 2.978691e+06 | 3.511354e+06 | 4.062702e+06 | 6.349375e+06 |
mean | 5.894813e+05 | 2.927051e+01 | 9.286182e+00 | 2.292666e+01 | 1.998385e+01 | 2.302103e+01 | 2.812518e+01 | 2.540071e+01 | 2.256900e+01 | 2.541082e+01 | ... | 1.013113e+00 | 5.295568e-01 | -6.975932e-01 | 3.815768e-01 | 2.057199e+00 | 3.230033e-02 | -3.409001e+00 | -3.781228e-01 | 4.038652e+00 | 1.714876e+01 |
std | 3.406124e+05 | 1.715383e+01 | 4.068784e+00 | 1.035516e+01 | 9.195141e+00 | 9.882618e+00 | 1.028228e+01 | 1.044013e+01 | 9.547383e+00 | 1.003713e+01 | ... | 4.179456e-02 | 1.476643e+00 | 1.017368e+00 | 9.260191e-01 | 1.617217e+00 | 3.699795e+00 | 2.719573e+00 | 2.087361e+00 | 3.902270e+00 | 2.025279e+02 |
min | 2.000000e+00 | 0.000000e+00 | 0.000000e+00 | -3.100000e+01 | -3.200000e+01 | -3.200000e+01 | -2.850000e+01 | -2.800000e+01 | -3.050000e+01 | -2.750000e+01 | ... | 2.083333e-01 | -7.875000e+00 | -7.875000e+00 | -7.875000e+00 | -7.875000e+00 | -9.604000e+01 | -8.079000e+01 | -7.168000e+01 | -1.002000e+02 | 1.000000e-02 |
25% | 2.923290e+05 | 1.400000e+01 | 6.000000e+00 | 1.600000e+01 | 1.450000e+01 | 1.650000e+01 | 2.150000e+01 | 1.850000e+01 | 1.650000e+01 | 1.850000e+01 | ... | 9.983333e-01 | -1.875000e-01 | -1.062500e+00 | 0.000000e+00 | 1.062500e+00 | -1.410004e+00 | -4.540008e+00 | -7.100067e-01 | 2.059998e+00 | 5.080003e-01 |
50% | 5.906240e+05 | 2.900000e+01 | 1.000000e+01 | 2.250000e+01 | 2.000000e+01 | 2.300000e+01 | 2.750000e+01 | 2.500000e+01 | 2.250000e+01 | 2.500000e+01 | ... | 1.005000e+00 | 3.750000e-01 | -6.250000e-01 | 3.125000e-01 | 1.687500e+00 | 0.000000e+00 | -2.820007e+00 | 0.000000e+00 | 3.509994e+00 | 1.425001e+00 |
75% | 8.883170e+05 | 4.400000e+01 | 1.200000e+01 | 2.950000e+01 | 2.600000e+01 | 2.950000e+01 | 3.500000e+01 | 3.200000e+01 | 2.900000e+01 | 3.200000e+01 | ... | 1.051667e+00 | 1.062500e+00 | -1.875000e-01 | 6.875000e-01 | 2.562500e+00 | 1.409988e+00 | -1.740006e+00 | 3.499908e-01 | 5.629990e+00 | 4.064002e+00 |
max | 1.180945e+06 | 5.900000e+01 | 2.100000e+01 | 7.100000e+01 | 6.250000e+01 | 6.900000e+01 | 7.250000e+01 | 9.250000e+01 | 6.600000e+01 | 7.100000e+01 | ... | 1.051667e+00 | 7.937500e+00 | 7.937500e+00 | 7.937500e+00 | 7.937500e+00 | 1.797500e+02 | 3.169998e+00 | 1.280000e+01 | 1.446000e+02 | 3.301773e+04 |
8 rows × 24 columns
We removed the rows whose Ref column is NaN, but there are still NaN values. Let's check how many NaNs remain in train_df.
train_df.isna().sum()
Id 0 minutes_past 0 radardist_km 0 Ref 0 Ref_5x5_10th 1080270 Ref_5x5_50th 192529 Ref_5x5_90th 87 RefComposite 0 RefComposite_5x5_10th 798454 RefComposite_5x5_50th 141885 RefComposite_5x5_90th 67 RhoHV 2279576 RhoHV_5x5_10th 2771701 RhoHV_5x5_50th 2253789 RhoHV_5x5_90th 1748477 Zdr 2279576 Zdr_5x5_10th 2771701 Zdr_5x5_50th 2253789 Zdr_5x5_90th 1748477 Kdp 2854710 Kdp_5x5_10th 3370684 Kdp_5x5_50th 2838021 Kdp_5x5_90th 2286673 Expected 0 dtype: int64
Ok. For simplicity, just fill each NaN cell with 0.
train_df = train_df.fillna(0)
train_df.isna().sum()
Id 0 minutes_past 0 radardist_km 0 Ref 0 Ref_5x5_10th 0 Ref_5x5_50th 0 Ref_5x5_90th 0 RefComposite 0 RefComposite_5x5_10th 0 RefComposite_5x5_50th 0 RefComposite_5x5_90th 0 RhoHV 0 RhoHV_5x5_10th 0 RhoHV_5x5_50th 0 RhoHV_5x5_90th 0 Zdr 0 Zdr_5x5_10th 0 Zdr_5x5_50th 0 Zdr_5x5_90th 0 Kdp 0 Kdp_5x5_10th 0 Kdp_5x5_50th 0 Kdp_5x5_90th 0 Expected 0 dtype: int64
We can group by the Id column via the groupby() method. Each group is a sequence we feed into the model.
train_seq = train_df.groupby(['Id'])
Now, let's check the number of unique Ids and the maximum sequence length. The size() method returns the length of each group as a pandas Series, which has count() and max() methods.
train_seq_size = train_seq.size()
train_seq_size.count(), train_seq_size.max()
(731556, 19)
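The same count/max pattern is easy to verify on a toy frame (the data here is made up, just to illustrate groupby and size):

```python
import pandas as pd

# Toy frame with three Ids of different sequence lengths.
df = pd.DataFrame({'Id':  [1, 1, 2, 3, 3, 3],
                   'Ref': [10.0, 12.0, 2.0, 5.0, 6.0, 7.0]})

lengths = df.groupby(['Id']).size()    # pandas Series: length of each group
print(lengths.count(), lengths.max())  # number of sequences, longest sequence -> 3 3
```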
Now we know the number of sequences and the maximum sequence length, so we can create the numpy arrays X and y. X holds 731,556 sequences; each sequence has up to 19 rows, and each row has 22 features. y has a single column for Expected.
X = np.zeros((731556, 19, 22))
y = np.zeros((731556, 1))
Luis applied some preprocessing and feature extraction. Here, we just feed the raw dataset to the LSTM without any preprocessing.
i = 0
for name, group in train_seq:
    # d.shape is (seq_length, 24)
    d = group.values
    # Column 0 is Id, columns 1~22 are features, column 23 is the target.
    # Save the 22 features to indices 0~21, up to d.shape[0].
    X[i, :d.shape[0], 0:22] = d[:, 1:23]
    y[i, 0] = d[0, 23]
    i += 1
print(i)
731556
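The zero-padding behavior of this loop can be checked on a tiny synthetic frame (column names and values here are illustrative, not from the competition data):

```python
import numpy as np
import pandas as pd

# Two toy sequences: Id 1 has 2 timesteps, Id 2 has 3; one feature plus a target.
df = pd.DataFrame({'Id':       [1, 1, 2, 2, 2],
                   'feature':  [0.5, 0.6, 1.0, 1.1, 1.2],
                   'Expected': [0.3, 0.3, 2.0, 2.0, 2.0]})

max_len, n_feat = 3, 1
X_toy = np.zeros((2, max_len, n_feat))
y_toy = np.zeros((2, 1))
for i, (name, group) in enumerate(df.groupby(['Id'])):
    d = group.values
    X_toy[i, :d.shape[0], :] = d[:, 1:2]   # feature column
    y_toy[i, 0] = d[0, 2]                  # target column

print(X_toy[0].ravel())  # [0.5 0.6 0. ] -- the shorter sequence is zero-padded
```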
We make a Python generator function to feed batches of data to the fit_generator() method.
def feed_gen(X, y, batch_size=1024):
    shuffled_index = np.random.permutation(len(X))  # shuffle index
    start = 0
    while 1:
        end = start + batch_size
        if end > len(X):  # cannot exceed X's length
            end = len(X)
        yield X[shuffled_index[start:end]], y[shuffled_index[start:end]]
        start = end
        if end >= len(X):  # if we reach the end, shuffle again
            shuffled_index = np.random.permutation(len(X))
            start = 0
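A quick sanity check of this generator on small synthetic arrays (the generator is reproduced here so the snippet is self-contained):

```python
import numpy as np

def feed_gen(X, y, batch_size=1024):
    shuffled_index = np.random.permutation(len(X))  # shuffle index
    start = 0
    while 1:
        end = min(start + batch_size, len(X))  # cannot exceed X's length
        yield X[shuffled_index[start:end]], y[shuffled_index[start:end]]
        start = end
        if end >= len(X):  # reached the end: reshuffle and restart
            shuffled_index = np.random.permutation(len(X))
            start = 0

X_small = np.arange(10).reshape(10, 1)
y_small = np.arange(10).reshape(10, 1)
gen = feed_gen(X_small, y_small, batch_size=4)
b1, b2, b3 = next(gen), next(gen), next(gen)
print(b1[0].shape, b2[0].shape, b3[0].shape)  # (4, 1) (4, 1) (2, 1) -- last batch is the remainder
```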
We split the dataset into a train set and a test set. For this task, the test set actually serves as a validation set. The total dataset size is 731,556, so holding out 10% of the data for the test set is sufficient.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
Calculate steps_per_epoch and validation_steps for the fit_generator() method. We set batch_size to 1024.
batch_size = 1024
steps_per_epoch = np.ceil(X_train.shape[0] / batch_size)
validation_steps = np.ceil(X_test.shape[0] / batch_size)
steps_per_epoch, validation_steps
(643.0, 72.0)
Now, we can make the two generators for training and validation.
train_gen = feed_gen(X_train, y_train, batch_size)
val_gen = feed_gen(X_test, y_test, batch_size)
This model has two layers. The LSTM layer has 35 units; this was simply Luis's choice. The final Dense layer has 1 unit without an activation, for the Expected target. All other parameters of the LSTM and Dense layers use default values. The loss is 'mae' for this regression task, and we choose the 'rmsprop' optimizer with default values.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
model = Sequential()
model.add(LSTM(35, input_shape=(None, 22)))
model.add(Dense(1))
model.compile(loss='mae', optimizer='rmsprop')
Model training is simple. We just pass the two generators and the step counts for training and validation. Let's try 100 epochs.
history = model.fit_generator(train_gen,
                              steps_per_epoch=steps_per_epoch,
                              epochs=100,
                              validation_data=val_gen,
                              validation_steps=validation_steps)
Epoch 1/100 643/643 [==============================] - 15s 23ms/step - loss: 23.3592 - val_loss: 22.4884 Epoch 2/100 643/643 [==============================] - 12s 19ms/step - loss: 23.2784 - val_loss: 22.8763 Epoch 3/100 643/643 [==============================] - 12s 19ms/step - loss: 23.2461 - val_loss: 22.5458 Epoch 4/100 643/643 [==============================] - 12s 19ms/step - loss: 23.2264 - val_loss: 22.2134 Epoch 5/100 643/643 [==============================] - 12s 19ms/step - loss: 23.2092 - val_loss: 22.8454 Epoch 6/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1981 - val_loss: 23.1129 Epoch 7/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1891 - val_loss: 22.3145 Epoch 8/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1821 - val_loss: 21.9953 Epoch 9/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1786 - val_loss: 22.3672 Epoch 10/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1720 - val_loss: 23.1699 Epoch 11/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1685 - val_loss: 22.4761 Epoch 12/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1650 - val_loss: 21.7791 Epoch 13/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1613 - val_loss: 22.2748 Epoch 14/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1596 - val_loss: 23.2295 Epoch 15/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1558 - val_loss: 22.7003 Epoch 16/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1543 - val_loss: 22.4209 Epoch 17/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1514 - val_loss: 21.8673 Epoch 18/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1495 - val_loss: 22.3021 Epoch 19/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1477 - val_loss: 
22.4935 Epoch 20/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1462 - val_loss: 23.3478 Epoch 21/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1431 - val_loss: 21.3769 Epoch 22/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1428 - val_loss: 21.9813 Epoch 23/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1396 - val_loss: 22.3207 Epoch 24/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1373 - val_loss: 22.6694 Epoch 25/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1369 - val_loss: 22.1238 Epoch 26/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1365 - val_loss: 21.3457 Epoch 27/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1358 - val_loss: 22.6521 Epoch 28/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1337 - val_loss: 21.9777 Epoch 29/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1312 - val_loss: 22.9756 Epoch 30/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1305 - val_loss: 22.3396 Epoch 31/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1287 - val_loss: 22.1856 Epoch 32/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1278 - val_loss: 22.3289 Epoch 33/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1268 - val_loss: 21.7444 Epoch 34/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1245 - val_loss: 23.2074 Epoch 35/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1238 - val_loss: 22.0729 Epoch 36/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1228 - val_loss: 21.4301 Epoch 37/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1223 - val_loss: 22.5052 Epoch 38/100 643/643 [==============================] - 12s 19ms/step - loss: 
23.1210 - val_loss: 21.5444 Epoch 39/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1195 - val_loss: 22.3118 Epoch 40/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1193 - val_loss: 22.7369 Epoch 41/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1195 - val_loss: 22.1068 Epoch 42/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1191 - val_loss: 21.6861 Epoch 43/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1169 - val_loss: 22.4820 Epoch 44/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1169 - val_loss: 22.3503 Epoch 45/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1156 - val_loss: 22.2530 Epoch 46/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1145 - val_loss: 21.5551 Epoch 47/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1153 - val_loss: 22.6081 Epoch 48/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1135 - val_loss: 21.7304 Epoch 49/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1125 - val_loss: 22.7773 Epoch 50/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1113 - val_loss: 22.8420 Epoch 51/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1113 - val_loss: 22.8178 Epoch 52/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1106 - val_loss: 22.4296 Epoch 53/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1100 - val_loss: 21.7362 Epoch 54/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1084 - val_loss: 22.7183 Epoch 55/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1081 - val_loss: 22.3555 Epoch 56/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1084 - val_loss: 22.4259 Epoch 57/100 643/643 [==============================] - 12s 
19ms/step - loss: 23.1069 - val_loss: 23.2153 Epoch 58/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1061 - val_loss: 22.9800 Epoch 59/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1052 - val_loss: 22.7757 Epoch 60/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1048 - val_loss: 21.8380 Epoch 61/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1039 - val_loss: 21.5761 Epoch 62/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1042 - val_loss: 22.3047 Epoch 63/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1023 - val_loss: 23.1328 Epoch 64/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1026 - val_loss: 22.4232 Epoch 65/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1021 - val_loss: 22.2933 Epoch 66/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1015 - val_loss: 21.1154 Epoch 67/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1006 - val_loss: 22.1923 Epoch 68/100 643/643 [==============================] - 12s 19ms/step - loss: 23.1005 - val_loss: 21.9716 Epoch 69/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0997 - val_loss: 22.3442 Epoch 70/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0983 - val_loss: 22.2724 Epoch 71/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0991 - val_loss: 22.5911 Epoch 72/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0983 - val_loss: 21.6359 Epoch 73/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0982 - val_loss: 22.7203 Epoch 74/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0974 - val_loss: 22.4947 Epoch 75/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0953 - val_loss: 20.7540 Epoch 76/100 643/643 
[==============================] - 12s 19ms/step - loss: 23.0951 - val_loss: 22.6389 Epoch 77/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0967 - val_loss: 22.2818 Epoch 78/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0939 - val_loss: 22.0626 Epoch 79/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0944 - val_loss: 22.2057 Epoch 80/100 643/643 [==============================] - 12s 19ms/step - loss: 23.0934 - val_loss: 23.2137 Epoch 81/100 643/643 [==============================] - 13s 21ms/step - loss: 23.0940 - val_loss: 21.4464 Epoch 82/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0915 - val_loss: 22.6907 Epoch 83/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0922 - val_loss: 22.5729 Epoch 84/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0912 - val_loss: 22.4931 Epoch 85/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0902 - val_loss: 22.3457 Epoch 86/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0901 - val_loss: 21.6814 Epoch 87/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0899 - val_loss: 21.8171 Epoch 88/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0895 - val_loss: 22.9623 Epoch 89/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0900 - val_loss: 23.2738 Epoch 90/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0888 - val_loss: 21.2127 Epoch 91/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0879 - val_loss: 22.2649 Epoch 92/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0868 - val_loss: 22.1007 Epoch 93/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0862 - val_loss: 22.3556 Epoch 94/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0863 - val_loss: 22.7753 Epoch 
95/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0865 - val_loss: 22.3068 Epoch 96/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0860 - val_loss: 21.9186 Epoch 97/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0861 - val_loss: 21.5697 Epoch 98/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0845 - val_loss: 22.2280 Epoch 99/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0857 - val_loss: 22.9889 Epoch 100/100 643/643 [==============================] - 13s 20ms/step - loss: 23.0853 - val_loss: 21.9630
Now we can make predictions for the test data. Read the test.csv file and fill NaN with 0.
rain2_test = pd.read_csv('datasets/rain2/test.csv')
test_df = rain2_test.fillna(0)
The test data has 23 columns, not 24; it doesn't have the Expected column.
test_df.head()
Id | minutes_past | radardist_km | Ref | Ref_5x5_10th | Ref_5x5_50th | Ref_5x5_90th | RefComposite | RefComposite_5x5_10th | RefComposite_5x5_50th | ... | RhoHV_5x5_50th | RhoHV_5x5_90th | Zdr | Zdr_5x5_10th | Zdr_5x5_50th | Zdr_5x5_90th | Kdp | Kdp_5x5_10th | Kdp_5x5_50th | Kdp_5x5_90th | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 8.0 | 0.0 | 0.0 | 0.0 | 14.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 1 | 5 | 8.0 | 10.0 | 0.0 | 10.0 | 18.0 | 11.5 | 0.0 | 11.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 1 | 8 | 8.0 | 0.0 | 0.0 | 7.0 | 14.5 | 0.0 | 0.0 | 7.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 1 | 12 | 8.0 | 14.0 | 0.0 | 9.0 | 16.0 | 14.0 | 0.0 | 9.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 1 | 15 | 8.0 | 10.5 | 0.0 | 9.0 | 15.5 | 13.5 | 0.0 | 9.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 23 columns
Ok. Let's check the number of sequences and maximum length just like above.
test_seq = test_df.groupby(['Id'])
test_seq_size = test_seq.size()
test_seq_size.count(), test_seq_size.max()
(717625, 19)
Create the X_test numpy array. Its shape is (717625, 19, 22).
X_test = np.zeros((717625, 19, 22))
Just like with the train data, copy test_seq into X_test.
i = 0
for name, group in test_seq:
    # d.shape is (seq_length, 23)
    d = group.values
    # Column 0 is Id, columns 1~22 are features.
    # Save the 22 features to indices 0~21, up to d.shape[0].
    X_test[i, :d.shape[0], 0:22] = d[:, 1:23]
    i += 1
print(i)
717625
Now we can feed X_test
to our model and get the predictions!
pred = model.predict(X_test)
To make the submission file, we stack the pred array and its index horizontally. Using numpy.savetxt, we can write a CSV file with the format "%d,%f" and the header "Id,Expected".
pred_with_index = np.hstack((np.arange(1, pred.shape[0]+1).reshape(-1,1), pred))
np.savetxt("datasets/rain2/test_prediction.csv", pred_with_index, "%d,%f",
delimiter=",", header="Id,Expected", comments="")
!head "datasets/rain2/test_prediction.csv"
Id,Expected 1,0.611155 2,1.193402 3,3.511984 4,7.424457 5,0.799749 6,0.733051 7,2.819939 8,0.949913 9,0.663545
If you submit this file to the Kaggle competition (https://www.kaggle.com/c/how-much-did-it-rain-ii/submit), you will get a score of about 23.76, which would place within the top 100. Cool!
You can stack another LSTM layer and change the weight initializer or optimizer. If you want to search over more parameters, it's better to use keras.wrappers.scikit_learn.KerasRegressor with scikit-learn's GridSearchCV.