Prognostics using an Autoencoder

In [1]:
import addutils.toc ; addutils.toc.js(ipy_notebook=True)
Out[1]:
In [2]:
from addutils import css_notebook
css_notebook()
Out[2]:

1. Introduction

An autoencoder is an artificial neural network that learns an efficient encoding of its input: the network learns a compressed representation for a set of data.

The basic idea is that the first part of the network, the encoder, learns to produce an encoding z of the input X. The second part, the decoder, learns to reconstruct X as X' from the encoding z. In other words, the goal is to minimize the difference between the input X and the reconstruction X'.
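As a minimal sketch of this idea in code (assuming TensorFlow 2.x with the tf.keras API; the layer sizes here are illustrative and not the ones used later in this notebook):

import tensorflow as tf

# Illustrative dimensions: 26 input features compressed to a 10-dimensional code.
n_inputs, n_code = 26, 10

# Encoder: maps the input X to the encoding z.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(n_inputs,)),
    tf.keras.layers.Dense(n_code, activation="relu"),
])

# Decoder: reconstructs X' from the encoding z.
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(n_code,)),
    tf.keras.layers.Dense(n_inputs, activation="linear"),
])

# The autoencoder chains the two and is trained to minimize ||X - X'||^2.
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, ...)  # note: input and target are the same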

In this notebook we will use an autoencoder to learn an encoding of 26 signals from turbofan engines within a window of 26 timesteps. We will then use this encoding to predict breakdowns of the turbofan and to visualize its current state. We will use TensorFlow as our deep learning framework.
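Cutting such fixed-length windows out of each engine's time series can be sketched as follows (the helper name make_windows and its defaults are our illustration, not code from this notebook; it assumes a DataFrame with unit and time columns like the ones loaded below):

import numpy as np
import pandas as pd

def make_windows(df, signal_cols, window_size=26):
    """Slice each engine's multivariate time series into overlapping
    windows of shape (window_size, n_signals) for the autoencoder."""
    windows = []
    for _, df_unit in df.groupby("unit"):
        values = df_unit.sort_values("time")[signal_cols].values
        for start in range(len(values) - window_size + 1):
            windows.append(values[start:start + window_size])
    return np.array(windows)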

2. Dataset

The dataset we will use can be found at https://ti.arc.nasa.gov/tech/dash/pcoe/prognostic-data-repository/ (entry number 6 in the list, the Turbofan Engine Degradation Simulation Data Set).

Download the data set and update DATA_DIR below to point to the correct directory.
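A quick, optional sanity check that the expected training files are in place (the path below is a placeholder; the FD001 to FD004 file names follow the dataset's own naming):

import os

DATA_DIR = "/path/to/CMAPSSData"  # update to your download location
expected = ["train_FD00%d.txt" % i for i in range(1, 5)]
missing = [f for f in expected if not os.path.exists(os.path.join(DATA_DIR, f))]
print("Missing files:", missing or "none")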

In [3]:
import os
import time

from sklearn import preprocessing
from sklearn.model_selection import train_test_split

import numpy as np
import pandas as pd
import tensorflow as tf

DATA_DIR = "/mnt/hdd_data/nasa/turbofan_engine/CMAPSSData" 
HEADER_NAMES = ["unit", "time", "op_set_1", "op_set_2", "op_set_3"] + ["sensor_%02d" % i for i in range(1,24)]

%pylab inline
pylab.rcParams['figure.figsize'] = (15, 15)
Populating the interactive namespace from numpy and matplotlib

First we take a look at the data using pandas.

In [4]:
df = pd.read_csv(os.path.join(DATA_DIR, "train_FD001.txt"), sep=" ", index_col=None, header=None, names=HEADER_NAMES)

print("First five rows")
print(df.head(5))

print("\n Unique units (engines)")
print(df["unit"].unique())
First five rows
   unit  time  op_set_1  op_set_2  op_set_3  sensor_01  sensor_02  sensor_03  \
0     1     1   -0.0007   -0.0004     100.0     518.67     641.82    1589.70   
1     1     2    0.0019   -0.0003     100.0     518.67     642.15    1591.82   
2     1     3   -0.0043    0.0003     100.0     518.67     642.35    1587.99   
3     1     4    0.0007    0.0000     100.0     518.67     642.35    1582.79   
4     1     5   -0.0019   -0.0002     100.0     518.67     642.37    1582.85   

   sensor_04  sensor_05    ...      sensor_14  sensor_15  sensor_16  \
0    1400.60      14.62    ...        8138.62     8.4195       0.03   
1    1403.14      14.62    ...        8131.49     8.4318       0.03   
2    1404.20      14.62    ...        8133.23     8.4178       0.03   
3    1401.87      14.62    ...        8133.83     8.3682       0.03   
4    1406.22      14.62    ...        8133.80     8.4294       0.03   

   sensor_17  sensor_18  sensor_19  sensor_20  sensor_21  sensor_22  sensor_23  
0        392       2388      100.0      39.06    23.4190        NaN        NaN  
1        392       2388      100.0      39.00    23.4236        NaN        NaN  
2        390       2388      100.0      38.95    23.3442        NaN        NaN  
3        392       2388      100.0      38.88    23.3739        NaN        NaN  
4        393       2388      100.0      38.90    23.4044        NaN        NaN  

[5 rows x 28 columns]

 Unique units (engines)
[  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
  91  92  93  94  95  96  97  98  99 100]

We define helper functions to load the training and test data. Since every training unit runs until failure, the Remaining Useful Life (RUL) at a given timestep is simply the unit's final timestep minus the current one; train_data can optionally add this as a column, along with a discretized RUL_class.

In [5]:
def train_data(set_number, with_rul=False, with_rul_class=False):
    filepath = os.path.join(DATA_DIR, "train_FD00%s.txt" % (set_number))
    df = pd.read_csv(filepath, sep=" ", index_col=None, header=None, names=HEADER_NAMES)
    # The two trailing NaN sensor columns are filled with zeros.
    df.fillna(0, inplace=True)

    if with_rul:
        df["RUL"] = None

        # Each training unit runs until failure, so the RUL at a timestep is
        # the unit's last timestep minus the current timestep.
        for unit, df_unit in df.groupby("unit"):
            until_failure = df_unit["time"].max()
            df.loc[df_unit.index, "RUL"] = until_failure - df_unit["time"]

        if with_rul_class:
            # LEVELS is a list of (class_name, RUL_threshold) pairs, ordered from
            # least to most severe; it must be defined before this is called.
            df["RUL_class"] = "normal"
            for name, level in LEVELS:
                df.loc[df["RUL"] < level, "RUL_class"] = name

    return df

def test_data(set_number, with_rul=False):
    # Note: with_rul is kept for symmetry with train_data but is unused here;
    # the test sets' true RUL values live in the separate RUL_FD00x.txt files.
    filepath = os.path.join(DATA_DIR, "test_FD00%s.txt" % (set_number))
    df = pd.read_csv(filepath, sep=" ", index_col=None, header=None, names=HEADER_NAMES)
    df.fillna(0, inplace=True)

    return df
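As a usage example (the LEVELS thresholds below are illustrative placeholders for the (class_name, RUL_threshold) pairs that train_data expects; they are not values prescribed by the dataset):

# Illustrative (class_name, RUL_threshold) pairs; not the notebook's actual values.
LEVELS = [("warning", 50), ("critical", 15)]

df_train = train_data(1, with_rul=True, with_rul_class=True)
df_test = test_data(1)
print(df_train[["unit", "time", "RUL", "RUL_class"]].head())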

Next we define a function that plots all (min-max normalized) signals of a specific unit against its RUL.

In [6]:
def plot_unit(set_number, unit):
    df = train_data(set_number, with_rul=True)

    ignore_cols = ["unit", "time", "RUL"]
    units = [unit]

    fig, ax = plt.subplots(1)
    ax = [ax]

    for i, (unit, df_unit) in enumerate(df[df["unit"].isin(units)].groupby("unit")):

        rul = df_unit["RUL"].values
        labels = [c for c in df_unit.columns if c not in ignore_cols]

        # Min-max scale each signal to [0, 1] so they all fit on one axis.
        x = df_unit[labels].values  # returns a numpy array
        min_max_scaler = preprocessing.MinMaxScaler()
        x_scaled = min_max_scaler.fit_transform(x)

        for values in x_scaled.T:
            ax[i].plot(rul, values)

        # Reverse the x axis so time runs towards failure (RUL = 0) on the right.
        ax[i].set_xlim(max(rul), 0)
        ax[i].set_title("Unit %d" % unit)
        ax[i].set_xlabel('Remaining Useful Life (RUL)')
        ax[i].set_ylabel('normalized values')

    fig.subplots_adjust(hspace=0.4, wspace=0.2)
    plt.show()
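The groupby loop and the list of axes mean the function is already written as if it could show several units at once; a hypothetical multi-unit variant (our sketch, not part of this notebook) only needs to allocate one subplot per unit:

def plot_units(set_number, units):
    """Hypothetical variant of plot_unit: one subplot per unit."""
    df = train_data(set_number, with_rul=True)
    ignore_cols = ["unit", "time", "RUL"]

    fig, axes = plt.subplots(len(units), 1, squeeze=False)

    for i, (unit, df_unit) in enumerate(df[df["unit"].isin(units)].groupby("unit")):
        rul = df_unit["RUL"].values
        labels = [c for c in df_unit.columns if c not in ignore_cols]
        x_scaled = preprocessing.MinMaxScaler().fit_transform(df_unit[labels].values)

        for values in x_scaled.T:
            axes[i, 0].plot(rul, values)
        axes[i, 0].set_xlim(max(rul), 0)
        axes[i, 0].set_title("Unit %d" % unit)

    fig.subplots_adjust(hspace=0.4)
    plt.show()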

Now we plot the 26 signals of units 5 and 63 from dataset 1 (FD001).

In [7]:
plot_unit(1, 5)
plot_unit(1, 63)