
Build a chatbot from scratch

Hello everyone! Since the release of ChatGPT, a natural-language AI, data science and AI are on the rise once again. I have been interested in data science for a long time, and I see this as an opportunity to learn more about the world of AI and data science. Today I am going to try to build a chatbot from scratch. Doing it this way gives me the opportunity to learn how to prepare the data, what training an AI looks like, and what the process of making an AI involves.

The data I am using to train my AI comes from Kaggle. It is a good dataset for training my chatbot.

I - Data Preparation

Import statements

I will first import the libraries that I will be using.

In [ ]:

import json
import random
import string

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf

I also need to load the data!

In [ ]:

# Use a context manager so the file is closed after reading
with open('/content/sample_data/Intent.json') as f:
    data = json.load(f)
data = data["intents"]

If you look closely at the data, you will see a few important columns that will be useful for building a chatbot:

intent: the intention behind a group of phrases
text: the phrases that are usually associated with that intention. For example, "Hi" is a text usually associated with the Greeting intent.
responses: the possible responses for that intent

These three columns will be enough for us to build a chatbot.
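To make the structure concrete, here is a tiny hand-written stand-in for Intent.json. The two intents and their phrases are hypothetical, trimmed from the data shown in the table below, and only the three keys discussed above are included (the real file also has keys such as extension and context). The sketch reads it the same way the notebook does and keeps just the three useful columns.

```python
import json

import pandas as pd

# Hypothetical miniature version of Intent.json: a top-level "intents" list
# where each entry holds an intent name, example texts, and possible responses.
raw = """
{
  "intents": [
    {"intent": "Greeting",
     "text": ["Hi", "Hello"],
     "responses": ["Hi human, please tell me your GeniSys user"]},
    {"intent": "CurrentHumanQuery",
     "text": ["What is my name?", "What do you call me?"],
     "responses": ["You are <HUMAN>! How can I help?"]}
  ]
}
"""

data = json.loads(raw)["intents"]   # same access pattern as the notebook
df = pd.DataFrame(data)[["intent", "text", "responses"]]
print(df)
```

Each cell in text and responses still holds a whole list at this point, which is why the lists get exploded into rows later on.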

The data

Now let's build a pandas DataFrame from the data.

In [ ]:

df = pd.DataFrame(data)
df.head()

Out[ ]:

	intent	text	responses	extension	context	entityType	entities
0	Greeting	[Hi, Hi there, Hola, Hello, Hello there, Hya, ...	[Hi human, please tell me your GeniSys user, H...	{'function': '', 'entities': False, 'responses...	{'in': '', 'out': 'GreetingUserRequest', 'clea...	NA	[]
1	GreetingResponse	[My user is Adam, This is Adam, I am Adam, It ...	[Great! Hi <HUMAN>! How can I help?, Good! Hi ...	{'function': 'extensions.gHumans.updateHuman',...	{'in': 'GreetingUserRequest', 'out': '', 'clea...	NA	[{'entity': 'HUMAN', 'rangeFrom': 3, 'rangeTo'...
2	CourtesyGreeting	[How are you?, Hi how are you?, Hello how are ...	[Hello, I am great, how are you? Please tell m...	{'function': '', 'entities': False, 'responses...	{'in': '', 'out': 'CourtesyGreetingUserRequest...	NA	[]
3	CourtesyGreetingResponse	[Good thanks! My user is Adam, Good thanks! Th...	[Great! Hi <HUMAN>! How can I help?, Good! Hi ...	{'function': 'extensions.gHumans.updateHuman',...	{'in': 'GreetingUserRequest', 'out': '', 'clea...	NA	[{'entity': 'HUMAN', 'rangeFrom': 5, 'rangeTo'...
4	CurrentHumanQuery	[What is my name?, What do you call me?, Who d...	[You are <HUMAN>! How can I help?, Your name i...	{'function': 'extensions.gHumans.getCurrentHum...	{'in': '', 'out': 'CurrentHumanQuery', 'clear'...	NA	[]

II - Chatbot strategy

To build a chatbot, we need to develop a strategy.

In real life, when we are in a conversation, we listen to what the speaker says, process that information by working out what they mean, and choose the most fitting answer to what they have just said.

This is also what we are going to implement in our chatbot. We will define the various intents (kinds of user input) and their corresponding responses, let the user type something, let the AI guess the intention behind that input, and reply with a response for that intent.

Create the necessary DataFrames

I will create two DataFrames: df_patterns, which consists of the text and intent columns, and df_responses, which has the responses and intent columns.

In [ ]:

df_patterns = df[['text', 'intent']]
df_responses = df[['responses', 'intent']]

In [ ]:

df_patterns.head()

Out[ ]:

	text	intent
0	[Hi, Hi there, Hola, Hello, Hello there, Hya, ...	Greeting
1	[My user is Adam, This is Adam, I am Adam, It ...	GreetingResponse
2	[How are you?, Hi how are you?, Hello how are ...	CourtesyGreeting
3	[Good thanks! My user is Adam, Good thanks! Th...	CourtesyGreetingResponse
4	[What is my name?, What do you call me?, Who d...	CurrentHumanQuery

In [ ]:

df_responses.head()

Out[ ]:

	responses	intent
0	[Hi human, please tell me your GeniSys user, H...	Greeting
1	[Great! Hi <HUMAN>! How can I help?, Good! Hi ...	GreetingResponse
2	[Hello, I am great, how are you? Please tell m...	CourtesyGreeting
3	[Great! Hi <HUMAN>! How can I help?, Good! Hi ...	CourtesyGreetingResponse
4	[You are <HUMAN>! How can I help?, Your name i...	CurrentHumanQuery

As you can see, the responses and texts are matched to the correct intents, but there is one problem: they are stored in lists, which we don't want. We need to explode the lists so that each element becomes its own row.

In [ ]:

df_patterns = df_patterns.explode('text')
df_responses = df_responses.explode('responses')

df_patterns.head()

Out[ ]:

	text	intent
0	Hi	Greeting
0	Hi there	Greeting
0	Hola	Greeting
0	Hello	Greeting
0	Hello there	Greeting
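A toy version shows what explode does, including the repeated index labels visible in the output above. The data is a hypothetical two-row stand-in for df_patterns.

```python
import pandas as pd

# Hypothetical toy frame: one row per intent, with a list of texts per row.
df_toy = pd.DataFrame({
    "text": [["Hi", "Hello"], ["What is my name?"]],
    "intent": ["Greeting", "CurrentHumanQuery"],
})

# explode() gives each list element its own row and repeats the original
# index label, which is why index 0 appears several times in the real output.
exploded = df_toy.explode("text")
print(exploded)

# If a fresh 0..n-1 index is preferred, it can be reset afterwards.
exploded_reset = exploded.reset_index(drop=True)
```

The repeated index does no harm in this notebook, because later steps only use the column values.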

In [ ]:

df_patterns['text'] = df_patterns['text'].apply(lambda x: ''.join(ch for ch in x if ch not in string.punctuation).lower())
df_responses['responses'] = df_responses['responses'].apply(lambda x: ''.join(ch for ch in x if ch not in string.punctuation).lower())
df_responses.head()

Out[ ]:

	responses	intent
0	hi human please tell me your genisys user	Greeting
0	hello human please tell me your genisys user	Greeting
0	hola human please tell me your genisys user	Greeting
1	great hi human how can i help	GreetingResponse
1	good hi human how can i help you	GreetingResponse

The Algorithm

So my approach would be this:

We will first train the algorithm to recognize which intent each text belongs to. To achieve that, we must break each text into separate words, tokenize those words, and turn them into sequences of numbers for easier computation. Then we will fit those tokenized sequences to the appropriate intents.
After training the model, I will get input from the user. The input will also be tokenized and matched with the most likely intent, and then we will use df_responses to randomly choose a response to the user's input.
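The last step of this plan, going from a predicted intent to a reply, can be sketched without the trained model. Below, probs stands in for one row of the model's softmax output, classes for le.classes_, and responses_by_intent for a lookup built from df_responses; all three values are made up for illustration, and pick_response is a hypothetical helper name.

```python
import random

import numpy as np


def pick_response(probs, classes, responses_by_intent, rng=random):
    """Map a softmax output row to an intent, then pick a random reply for it."""
    intent = classes[int(np.argmax(probs))]  # most probable intent
    return intent, rng.choice(responses_by_intent[intent])


# Made-up values: three intents and a softmax row favouring the first one.
classes = np.array(["Greeting", "GreetingResponse", "CurrentHumanQuery"])
responses_by_intent = {
    "Greeting": ["hi human please tell me your genisys user",
                 "hello human please tell me your genisys user"],
    "GreetingResponse": ["great hi human how can i help"],
    "CurrentHumanQuery": ["you are human how can i help"],
}
probs = np.array([0.7, 0.2, 0.1])

intent, reply = pick_response(probs, classes, responses_by_intent, random.Random(0))
print(intent, "->", reply)
```

In the notebook itself, one way to build such a lookup would be df_responses.groupby("intent")["responses"].apply(list).to_dict().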

In [ ]:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words = 2000)
tokenizer.fit_on_texts(df_patterns['text'])
sequences = tokenizer.texts_to_sequences(df_patterns['text'])
x_train = pad_sequences(sequences)


from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(df_patterns["intent"])

In this section, the Tokenizer class from Keras is imported. A tokenizer splits text into individual words (tokens) and assigns each unique word a numerical index. Here the tokenizer is initialized with num_words=2000, which means that only the most frequent words (those with an index below 2000) are kept when texts are converted to sequences; since this dataset has only 117 unique words, the limit has no effect. The fit_on_texts method processes the text in df_patterns['text'] and learns the vocabulary from it, numbering words from 1 in order of decreasing frequency. Next, the texts_to_sequences method converts each text into a sequence of those indices, stored in the sequences variable. Finally, pad_sequences adds zeros to the front of the shorter sequences so that every sequence has the length of the longest one; the result is x_train.
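To see what fit_on_texts, texts_to_sequences and pad_sequences produce, here is a simplified pure-Python imitation. It is not the Keras code: it only lowercases and splits on spaces (enough here, since punctuation was already stripped), and it assumes the Keras conventions that word indices are ranked by frequency starting at 1, that words whose index is num_words or more are dropped, and that zero padding is added at the front. The function names are hypothetical.

```python
from collections import Counter


def fit_word_index(texts):
    """Rank words by frequency (ties keep first-seen order); index 0 is unused."""
    counts = Counter()
    for t in texts:
        counts.update(t.lower().split())
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)  # stable sort
    return {w: i for i, w in enumerate(ranked, start=1)}


def to_sequences(texts, word_index, num_words=None):
    """Replace each known word by its index, skipping indices >= num_words."""
    seqs = []
    for t in texts:
        seq = [word_index[w] for w in t.lower().split() if w in word_index]
        if num_words is not None:
            seq = [i for i in seq if i < num_words]
        seqs.append(seq)
    return seqs


def pad_front(seqs):
    """Left-pad with zeros to the longest length, like pad_sequences' default."""
    maxlen = max(len(s) for s in seqs)
    return [[0] * (maxlen - len(s)) + s for s in seqs]


texts = ["hi there", "hello there", "hi"]
word_index = fit_word_index(texts)
padded = pad_front(to_sequences(texts, word_index, num_words=2000))
print(word_index)  # {'hi': 1, 'there': 2, 'hello': 3}
print(padded)      # [[1, 2], [3, 2], [0, 1]]
```

Note that unknown words are simply skipped, which matches Keras' behaviour when no oov_token is configured, as in this notebook.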

Next, the LabelEncoder class from scikit-learn is imported. A label encoder converts categorical labels into numbers. Here a label encoder is initialized as le, and its fit_transform method is called on df_patterns["intent"]. This method learns the unique intents, sorts them, and assigns each one an integer starting from 0. The encoded labels are stored in the y_train variable.
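The same encoding can be reproduced with NumPy alone, which makes it easier to see what y_train and le.classes_ contain: the classes are the sorted unique labels, and each label is replaced by its position in that sorted array. This is an equivalent sketch for illustration, not scikit-learn's code, and the labels are a hypothetical sample.

```python
import numpy as np

labels = np.array(["Greeting", "CurrentHumanQuery", "Greeting", "CourtesyGreeting"])

# classes: sorted unique labels (what le.classes_ holds);
# codes: position of each label within classes (what fit_transform returns).
classes, codes = np.unique(labels, return_inverse=True)
print(classes)  # ['CourtesyGreeting' 'CurrentHumanQuery' 'Greeting']
print(codes)    # [2 1 2 0]

# Decoding, like le.inverse_transform, is plain indexing.
print(classes[codes])
```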

By tokenizing the text data and padding the sequences, we prepare the input (x_train) for training the model. The label encoder helps encode the intents into numerical representations (y_train) suitable for model training and evaluation.

In [ ]:

input_shape = x_train.shape[1]
print(input_shape)

In the first line, x_train.shape[1] retrieves the second dimension of the x_train array, which corresponds to the length of the sequences in the input data. The shape attribute of a NumPy array returns a tuple indicating the dimensions of the array.

By assigning x_train.shape[1] to the input_shape variable, we capture the length of the sequences, which is essential information for defining the shape of the input layer in the subsequent model architecture.

The second line, print(input_shape), simply prints the value of input_shape to the console, allowing you to verify the calculated value. This step can be useful for debugging or understanding the shape of the input data before building the model.

In [ ]:

#Define vocab
num_vocabulary = len(tokenizer.word_index)
print("number of unique words:", num_vocabulary)
output_length = le.classes_.shape[0]
print("output length:", output_length)

number of unique words: 117
output length: 22

In this section, len(tokenizer.word_index) retrieves the number of unique words in the vocabulary that was learned by the tokenizer. The word_index attribute of the tokenizer is a dictionary that maps each word to its corresponding index. By calculating the length of this dictionary, we obtain the total number of unique words in the vocabulary. The value is assigned to the variable num_vocabulary. The subsequent line prints the number of unique words to the console, providing visibility into the size of the vocabulary.

le.classes_ holds the unique intent names learned by the label encoder: the original strings in sorted order, where each intent's position in the array is its encoded label. Taking shape[0] of this array gives the number of unique intents, which is assigned to the variable output_length. The subsequent line prints the output length, telling us how many intents the model has to choose between.

By determining the vocabulary size (num_vocabulary) and the output length (output_length), these values can be utilized in further steps of the model architecture, such as defining the dimensions of the embedding layer and the output layer.

In [ ]:

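from tensorflow.keras.layers import Input, Embedding, LSTM, Flatten, Dense
from tensorflow.keras.models import Model

i = Input(shape=(input_shape,))                    # one padded sequence of word indices
x = Embedding(num_vocabulary + 1, 10)(i)           # 10-dim vector per index (row 0 = padding)
x = LSTM(10, return_sequences=True)(x)             # 10 features for every time step
x = Flatten()(x)                                   # join all time steps into one vector
x = Dense(output_length, activation="softmax")(x)  # probability for each intent
model = Model(i, x)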

This cell defines a Keras model with an input layer, an embedding layer, an LSTM layer, a flatten layer, and a dense output layer. Let's break it down:

Importing Libraries:
```
from tensorflow.keras.layers import Input, Embedding, LSTM, Flatten, Dense
from tensorflow.keras.models import Model
```
This section imports the necessary layers and models from the Keras library to build the model.
Input Layer:
```
i = Input(shape=(input_shape,))
```
This line defines an input layer that accepts sequences of length input_shape, the padded sequence length computed earlier. The trailing comma makes the shape a one-element tuple, which is the form Keras expects.
Embedding Layer:
```
x = Embedding(num_vocabulary + 1, 10)(i)
```
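This line adds an embedding layer to the model. The embedding layer maps each word index in the input sequence to a dense vector of dimension 10. The input dimension is num_vocabulary + 1 because the tokenizer numbers words from 1 to num_vocabulary and index 0 is reserved for padding, so the lookup table needs one extra row. (No out-of-vocabulary token was configured, so unknown words are simply dropped by texts_to_sequences.)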
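An embedding layer is essentially a trainable lookup table with one row per index. The NumPy sketch below uses a random table in place of the learned weights (an assumption for illustration only) and shows why the table needs num_vocabulary + 1 rows: the highest word index, 117, must still be a valid row, and index 0 used for padding needs a row too.

```python
import numpy as np

num_vocabulary = 117  # from the notebook's printed output
embedding_dim = 10    # matches Embedding(num_vocabulary + 1, 10)

rng = np.random.default_rng(0)
# Random stand-in for the trained weights: one 10-dim row per index 0..117.
table = rng.normal(size=(num_vocabulary + 1, embedding_dim))

# A padded sequence: leading zeros, then word indices (hypothetical values).
sequence = np.array([0, 0, 5, 117])
vectors = table[sequence]  # row lookup, one vector per position
print(vectors.shape)       # (4, 10)
```

With only num_vocabulary rows, looking up index 117 would fail with an out-of-range error.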
LSTM Layer:
```
x = LSTM(10, return_sequences=True)(x)
```
This line adds an LSTM layer to the model. The LSTM layer has 10 units and is set to return the entire sequence of outputs rather than just the last output.
Flatten Layer:
```
x = Flatten()(x)
```
This line adds a flatten layer to the model. For each sample, it reshapes the LSTM output of shape (input_shape, 10) into a single vector of length input_shape × 10, preparing it for the subsequent dense layer.
Dense Layer:
```
x = Dense(output_length, activation="softmax")(x)
```
This line adds a dense layer to the model. The dense layer has output_length units, which corresponds to the number of classes or intents in the output. The activation function used is softmax, which produces a probability distribution over the output classes.
Model Creation:
```
model = Model(i, x)
```
This line creates a Keras model using the input layer i and output layer x. The resulting model can be compiled, trained, and evaluated for intent recognition tasks.

In summary, the code defines a simple stack of layers using the Keras functional API: an embedding layer to represent the input words, an LSTM layer to capture sequential patterns, a flatten layer, and a dense softmax layer that predicts the intent from the learned features.
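As a sanity check on the architecture, the number of trainable parameters in each layer can be worked out by hand with the standard formulas: embedding rows × dimension; LSTM 4 × ((input_dim + units) × units + units) for its four gates; dense inputs × outputs + outputs. The vocabulary size of 117 and the 22 intents come from the printed output above, but the printed value of input_shape is not shown in this notebook, so seq_len=9 below is only a placeholder; substitute the real value and compare the result with model.summary().

```python
def count_params(vocab, seq_len, classes, emb_dim=10, lstm_units=10):
    """Trainable parameters per layer of the notebook's model."""
    embedding = (vocab + 1) * emb_dim
    # 4 gates, each with input weights, recurrent weights and a bias vector.
    lstm = 4 * ((emb_dim + lstm_units) * lstm_units + lstm_units)
    flat = seq_len * lstm_units  # size of the Flatten output
    dense = flat * classes + classes
    return {"embedding": embedding, "lstm": lstm, "dense": dense,
            "total": embedding + lstm + dense}


print(count_params(vocab=117, seq_len=9, classes=22))  # seq_len=9 is a placeholder
```

Only the dense layer depends on the sequence length, because Flatten keeps one 10-dim vector per time step.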

In [ ]:

model.compile(loss = "sparse_categorical_crossentropy", optimizer = "adam", metrics=["accuracy"])

This cell compiles the Keras model with a loss function, an optimizer, and a metric. Let's break it down:

Loss Function:
```
loss = "sparse_categorical_crossentropy"
```
This line sets the loss function for the model. In this case, it uses the "sparse_categorical_crossentropy" loss function. This loss function is suitable for multi-class classification tasks where the target variable (intents in this case) is encoded as integers.
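Sparse categorical cross-entropy is the negative log of the probability the model assigns to the correct integer label, averaged over samples. A quick NumPy check also explains the start of training: a model that spreads its probability evenly over the 22 intents has a loss of ln(22) ≈ 3.091, close to the 3.0924 logged at epoch 1 below. The labels are hypothetical.

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, probs):
    """Mean of -log(probability of the true class); y_true holds integer labels."""
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(picked)))


num_classes = 22
y_true = np.array([0, 5, 21, 7])  # hypothetical integer labels
uniform = np.full((len(y_true), num_classes), 1 / num_classes)

print(sparse_categorical_crossentropy(y_true, uniform))  # ~3.091, i.e. ln(22)
```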
Optimizer:
```
optimizer = "adam"
```
This line specifies the optimizer used during training. Adam is a popular optimization algorithm that adapts the step size for each parameter individually, using running averages of that parameter's gradients and of their squares. It works well with its default settings for a wide range of deep learning tasks.
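The core of Adam fits in a few lines: it keeps running averages of each parameter's gradient (m) and squared gradient (v), corrects them for their zero initialisation, and scales the step by their ratio. The sketch below is the textbook update with Keras' default hyperparameters (learning rate 0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7); Keras' own implementation may differ in small numerical details, and the parameters and gradient are made up.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


theta = np.array([0.5, -0.3])
grad = np.array([2.0, -0.5])  # gradients of very different sizes
m = np.zeros_like(theta)
v = np.zeros_like(theta)
theta, m, v = adam_step(theta, grad, m, v, t=1)
print(theta)  # both parameters move by about lr = 0.001, against their gradients
```

On the first step both parameters move by roughly the learning rate even though their gradients differ by a factor of four; this per-parameter scaling is what adapting the step size means here.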
Metrics:
```
metrics = ["accuracy"]
```
This line sets the evaluation metric(s) to be used during training and evaluation. In this case, the model will be evaluated based on accuracy, which measures the fraction of correctly predicted intents.
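Accuracy here is the fraction of training sentences whose highest-probability intent equals the true label. A small NumPy version is below, with made-up predictions. As a side observation, the accuracy values in the training log below appear to be multiples of 1/143 (for example 0.0350 ≈ 5/143 and 0.9860 ≈ 141/143), which is consistent with about 143 training sentences split into 5 batches of up to 32.

```python
import numpy as np


def accuracy(y_true, probs):
    """Fraction of rows whose argmax matches the integer label."""
    return float(np.mean(np.argmax(probs, axis=1) == y_true))


y_true = np.array([0, 2, 1, 1])  # hypothetical labels
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.3, 0.5],
                  [0.6, 0.3, 0.1],  # wrong: predicts class 0
                  [0.1, 0.7, 0.2]])
print(accuracy(y_true, probs))  # 0.75
```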

With the loss function, optimizer, and metric specified, the model is ready to be trained and evaluated. These choices depend on the specific requirements of the intent-recognition task and can be changed for other scenarios.

In [ ]:

train = model.fit(x_train, y_train, epochs = 200)

Epoch 1/200
5/5 [==============================] - 3s 12ms/step - loss: 3.0924 - accuracy: 0.0350
Epoch 2/200
5/5 [==============================] - 0s 10ms/step - loss: 3.0891 - accuracy: 0.0490
Epoch 3/200
5/5 [==============================] - 0s 12ms/step - loss: 3.0862 - accuracy: 0.0490
Epoch 4/200
5/5 [==============================] - 0s 10ms/step - loss: 3.0834 - accuracy: 0.0629
Epoch 5/200
5/5 [==============================] - 0s 9ms/step - loss: 3.0807 - accuracy: 0.0699
Epoch 6/200
5/5 [==============================] - 0s 10ms/step - loss: 3.0775 - accuracy: 0.0769
Epoch 7/200
5/5 [==============================] - 0s 9ms/step - loss: 3.0741 - accuracy: 0.0839
Epoch 8/200
5/5 [==============================] - 0s 9ms/step - loss: 3.0702 - accuracy: 0.0979
Epoch 9/200
5/5 [==============================] - 0s 8ms/step - loss: 3.0657 - accuracy: 0.1189
Epoch 10/200
5/5 [==============================] - 0s 9ms/step - loss: 3.0606 - accuracy: 0.1259
Epoch 11/200
5/5 [==============================] - 0s 8ms/step - loss: 3.0543 - accuracy: 0.1259
Epoch 12/200
5/5 [==============================] - 0s 8ms/step - loss: 3.0469 - accuracy: 0.1329
Epoch 13/200
5/5 [==============================] - 0s 8ms/step - loss: 3.0386 - accuracy: 0.1329
Epoch 14/200
5/5 [==============================] - 0s 8ms/step - loss: 3.0280 - accuracy: 0.1399
Epoch 15/200
5/5 [==============================] - 0s 8ms/step - loss: 3.0179 - accuracy: 0.1329
Epoch 16/200
5/5 [==============================] - 0s 9ms/step - loss: 3.0025 - accuracy: 0.1608
Epoch 17/200
5/5 [==============================] - 0s 10ms/step - loss: 2.9866 - accuracy: 0.1608
Epoch 18/200
5/5 [==============================] - 0s 10ms/step - loss: 2.9682 - accuracy: 0.1958
Epoch 19/200
5/5 [==============================] - 0s 9ms/step - loss: 2.9462 - accuracy: 0.2168
Epoch 20/200
5/5 [==============================] - 0s 6ms/step - loss: 2.9199 - accuracy: 0.2727
Epoch 21/200
5/5 [==============================] - 0s 6ms/step - loss: 2.8927 - accuracy: 0.2727
Epoch 22/200
5/5 [==============================] - 0s 6ms/step - loss: 2.8586 - accuracy: 0.2657
Epoch 23/200
5/5 [==============================] - 0s 6ms/step - loss: 2.8204 - accuracy: 0.2587
Epoch 24/200
5/5 [==============================] - 0s 6ms/step - loss: 2.7794 - accuracy: 0.2727
Epoch 25/200
5/5 [==============================] - 0s 9ms/step - loss: 2.7350 - accuracy: 0.2937
Epoch 26/200
5/5 [==============================] - 0s 7ms/step - loss: 2.6853 - accuracy: 0.3077
Epoch 27/200
5/5 [==============================] - 0s 6ms/step - loss: 2.6362 - accuracy: 0.3287
Epoch 28/200
5/5 [==============================] - 0s 6ms/step - loss: 2.5821 - accuracy: 0.3497
Epoch 29/200
5/5 [==============================] - 0s 7ms/step - loss: 2.5276 - accuracy: 0.3636
Epoch 30/200
5/5 [==============================] - 0s 8ms/step - loss: 2.4731 - accuracy: 0.3776
Epoch 31/200
5/5 [==============================] - 0s 6ms/step - loss: 2.4176 - accuracy: 0.4266
Epoch 32/200
5/5 [==============================] - 0s 6ms/step - loss: 2.3604 - accuracy: 0.4476
Epoch 33/200
5/5 [==============================] - 0s 7ms/step - loss: 2.3001 - accuracy: 0.4545
Epoch 34/200
5/5 [==============================] - 0s 6ms/step - loss: 2.2511 - accuracy: 0.4545
Epoch 35/200
5/5 [==============================] - 0s 6ms/step - loss: 2.1897 - accuracy: 0.4755
Epoch 36/200
5/5 [==============================] - 0s 6ms/step - loss: 2.1401 - accuracy: 0.4965
Epoch 37/200
5/5 [==============================] - 0s 6ms/step - loss: 2.0851 - accuracy: 0.5105
Epoch 38/200
5/5 [==============================] - 0s 6ms/step - loss: 2.0332 - accuracy: 0.5175
Epoch 39/200
5/5 [==============================] - 0s 6ms/step - loss: 1.9825 - accuracy: 0.5245
Epoch 40/200
5/5 [==============================] - 0s 5ms/step - loss: 1.9342 - accuracy: 0.5455
Epoch 41/200
5/5 [==============================] - 0s 6ms/step - loss: 1.8946 - accuracy: 0.5524
Epoch 42/200
5/5 [==============================] - 0s 5ms/step - loss: 1.8484 - accuracy: 0.5594
Epoch 43/200
5/5 [==============================] - 0s 6ms/step - loss: 1.8091 - accuracy: 0.6014
Epoch 44/200
5/5 [==============================] - 0s 7ms/step - loss: 1.7715 - accuracy: 0.6014
Epoch 45/200
5/5 [==============================] - 0s 6ms/step - loss: 1.7318 - accuracy: 0.6224
Epoch 46/200
5/5 [==============================] - 0s 6ms/step - loss: 1.6970 - accuracy: 0.6224
Epoch 47/200
5/5 [==============================] - 0s 8ms/step - loss: 1.6637 - accuracy: 0.6084
Epoch 48/200
5/5 [==============================] - 0s 7ms/step - loss: 1.6295 - accuracy: 0.6224
Epoch 49/200
5/5 [==============================] - 0s 6ms/step - loss: 1.5980 - accuracy: 0.6224
Epoch 50/200
5/5 [==============================] - 0s 6ms/step - loss: 1.5674 - accuracy: 0.6364
Epoch 51/200
5/5 [==============================] - 0s 9ms/step - loss: 1.5375 - accuracy: 0.6573
Epoch 52/200
5/5 [==============================] - 0s 6ms/step - loss: 1.5120 - accuracy: 0.6503
Epoch 53/200
5/5 [==============================] - 0s 6ms/step - loss: 1.4813 - accuracy: 0.6713
Epoch 54/200
5/5 [==============================] - 0s 6ms/step - loss: 1.4533 - accuracy: 0.6713
Epoch 55/200
5/5 [==============================] - 0s 6ms/step - loss: 1.4275 - accuracy: 0.6783
Epoch 56/200
5/5 [==============================] - 0s 7ms/step - loss: 1.4053 - accuracy: 0.6713
Epoch 57/200
5/5 [==============================] - 0s 7ms/step - loss: 1.3788 - accuracy: 0.6993
Epoch 58/200
5/5 [==============================] - 0s 8ms/step - loss: 1.3545 - accuracy: 0.6923
Epoch 59/200
5/5 [==============================] - 0s 5ms/step - loss: 1.3288 - accuracy: 0.6993
Epoch 60/200
5/5 [==============================] - 0s 7ms/step - loss: 1.3057 - accuracy: 0.7063
Epoch 61/200
5/5 [==============================] - 0s 6ms/step - loss: 1.2882 - accuracy: 0.6923
Epoch 62/200
5/5 [==============================] - 0s 6ms/step - loss: 1.2631 - accuracy: 0.6923
Epoch 63/200
5/5 [==============================] - 0s 6ms/step - loss: 1.2431 - accuracy: 0.6923
Epoch 64/200
5/5 [==============================] - 0s 7ms/step - loss: 1.2248 - accuracy: 0.7063
Epoch 65/200
5/5 [==============================] - 0s 6ms/step - loss: 1.2085 - accuracy: 0.6853
Epoch 66/200
5/5 [==============================] - 0s 6ms/step - loss: 1.1946 - accuracy: 0.7203
Epoch 67/200
5/5 [==============================] - 0s 6ms/step - loss: 1.1668 - accuracy: 0.7203
Epoch 68/200
5/5 [==============================] - 0s 6ms/step - loss: 1.1469 - accuracy: 0.7273
Epoch 69/200
5/5 [==============================] - 0s 6ms/step - loss: 1.1338 - accuracy: 0.7552
Epoch 70/200
5/5 [==============================] - 0s 6ms/step - loss: 1.1350 - accuracy: 0.7273
Epoch 71/200
5/5 [==============================] - 0s 7ms/step - loss: 1.1079 - accuracy: 0.7552
Epoch 72/200
5/5 [==============================] - 0s 7ms/step - loss: 1.0948 - accuracy: 0.7273
Epoch 73/200
5/5 [==============================] - 0s 7ms/step - loss: 1.0678 - accuracy: 0.7483
Epoch 74/200
5/5 [==============================] - 0s 6ms/step - loss: 1.0482 - accuracy: 0.7692
Epoch 75/200
5/5 [==============================] - 0s 6ms/step - loss: 1.0297 - accuracy: 0.7622
Epoch 76/200
5/5 [==============================] - 0s 8ms/step - loss: 1.0157 - accuracy: 0.7692
Epoch 77/200
5/5 [==============================] - 0s 7ms/step - loss: 1.0010 - accuracy: 0.7972
Epoch 78/200
5/5 [==============================] - 0s 6ms/step - loss: 0.9888 - accuracy: 0.7832
Epoch 79/200
5/5 [==============================] - 0s 6ms/step - loss: 0.9751 - accuracy: 0.8252
Epoch 80/200
5/5 [==============================] - 0s 6ms/step - loss: 0.9642 - accuracy: 0.8252
Epoch 81/200
5/5 [==============================] - 0s 6ms/step - loss: 0.9458 - accuracy: 0.8322
Epoch 82/200
5/5 [==============================] - 0s 6ms/step - loss: 0.9319 - accuracy: 0.8182
Epoch 83/200
5/5 [==============================] - 0s 7ms/step - loss: 0.9210 - accuracy: 0.7972
Epoch 84/200
5/5 [==============================] - 0s 5ms/step - loss: 0.9036 - accuracy: 0.8252
Epoch 85/200
5/5 [==============================] - 0s 7ms/step - loss: 0.8891 - accuracy: 0.8322
Epoch 86/200
5/5 [==============================] - 0s 6ms/step - loss: 0.8785 - accuracy: 0.7902
Epoch 87/200
5/5 [==============================] - 0s 8ms/step - loss: 0.8654 - accuracy: 0.8112
Epoch 88/200
5/5 [==============================] - 0s 6ms/step - loss: 0.8538 - accuracy: 0.8462
Epoch 89/200
5/5 [==============================] - 0s 8ms/step - loss: 0.8408 - accuracy: 0.8462
Epoch 90/200
5/5 [==============================] - 0s 9ms/step - loss: 0.8288 - accuracy: 0.8462
Epoch 91/200
5/5 [==============================] - 0s 6ms/step - loss: 0.8170 - accuracy: 0.8531
Epoch 92/200
5/5 [==============================] - 0s 6ms/step - loss: 0.8062 - accuracy: 0.8741
Epoch 93/200
5/5 [==============================] - 0s 6ms/step - loss: 0.7990 - accuracy: 0.8671
Epoch 94/200
5/5 [==============================] - 0s 6ms/step - loss: 0.7854 - accuracy: 0.8741
Epoch 95/200
5/5 [==============================] - 0s 7ms/step - loss: 0.7749 - accuracy: 0.8811
Epoch 96/200
5/5 [==============================] - 0s 8ms/step - loss: 0.7630 - accuracy: 0.8741
Epoch 97/200
5/5 [==============================] - 0s 6ms/step - loss: 0.7531 - accuracy: 0.8881
Epoch 98/200
5/5 [==============================] - 0s 7ms/step - loss: 0.7434 - accuracy: 0.8881
Epoch 99/200
5/5 [==============================] - 0s 6ms/step - loss: 0.7314 - accuracy: 0.8741
Epoch 100/200
5/5 [==============================] - 0s 6ms/step - loss: 0.7214 - accuracy: 0.8951
Epoch 101/200
5/5 [==============================] - 0s 6ms/step - loss: 0.7095 - accuracy: 0.8811
Epoch 102/200
5/5 [==============================] - 0s 9ms/step - loss: 0.7031 - accuracy: 0.8811
Epoch 103/200
5/5 [==============================] - 0s 6ms/step - loss: 0.6916 - accuracy: 0.9021
Epoch 104/200
5/5 [==============================] - 0s 6ms/step - loss: 0.6824 - accuracy: 0.9021
Epoch 105/200
5/5 [==============================] - 0s 6ms/step - loss: 0.6731 - accuracy: 0.8951
Epoch 106/200
5/5 [==============================] - 0s 6ms/step - loss: 0.6648 - accuracy: 0.8881
Epoch 107/200
5/5 [==============================] - 0s 6ms/step - loss: 0.6553 - accuracy: 0.9091
Epoch 108/200
5/5 [==============================] - 0s 6ms/step - loss: 0.6476 - accuracy: 0.9091
Epoch 109/200
5/5 [==============================] - 0s 6ms/step - loss: 0.6439 - accuracy: 0.9161
Epoch 110/200
5/5 [==============================] - 0s 6ms/step - loss: 0.6303 - accuracy: 0.9091
Epoch 111/200
5/5 [==============================] - 0s 8ms/step - loss: 0.6229 - accuracy: 0.8881
Epoch 112/200
5/5 [==============================] - 0s 6ms/step - loss: 0.6154 - accuracy: 0.9091
Epoch 113/200
5/5 [==============================] - 0s 7ms/step - loss: 0.6073 - accuracy: 0.9091
Epoch 114/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5990 - accuracy: 0.9021
Epoch 115/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5902 - accuracy: 0.9091
Epoch 116/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5823 - accuracy: 0.9161
Epoch 117/200
5/5 [==============================] - 0s 8ms/step - loss: 0.5743 - accuracy: 0.9161
Epoch 118/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5662 - accuracy: 0.9161
Epoch 119/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5600 - accuracy: 0.9231
Epoch 120/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5525 - accuracy: 0.9231
Epoch 121/200
5/5 [==============================] - 0s 7ms/step - loss: 0.5471 - accuracy: 0.9301
Epoch 122/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5396 - accuracy: 0.9161
Epoch 123/200
5/5 [==============================] - 0s 7ms/step - loss: 0.5326 - accuracy: 0.9301
Epoch 124/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5253 - accuracy: 0.9301
Epoch 125/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5179 - accuracy: 0.9301
Epoch 126/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5118 - accuracy: 0.9231
Epoch 127/200
5/5 [==============================] - 0s 6ms/step - loss: 0.5098 - accuracy: 0.9301
Epoch 128/200
5/5 [==============================] - 0s 10ms/step - loss: 0.4973 - accuracy: 0.9231
Epoch 129/200
5/5 [==============================] - 0s 6ms/step - loss: 0.4964 - accuracy: 0.9301
Epoch 130/200
5/5 [==============================] - 0s 6ms/step - loss: 0.4884 - accuracy: 0.9371
Epoch 131/200
5/5 [==============================] - 0s 6ms/step - loss: 0.4846 - accuracy: 0.9371
Epoch 132/200
5/5 [==============================] - 0s 6ms/step - loss: 0.4736 - accuracy: 0.9371
Epoch 133/200
5/5 [==============================] - 0s 8ms/step - loss: 0.4680 - accuracy: 0.9301
Epoch 134/200
5/5 [==============================] - 0s 6ms/step - loss: 0.4621 - accuracy: 0.9441
Epoch 135/200
5/5 [==============================] - 0s 6ms/step - loss: 0.4570 - accuracy: 0.9371
Epoch 136/200
5/5 [==============================] - 0s 7ms/step - loss: 0.4490 - accuracy: 0.9510
Epoch 137/200
5/5 [==============================] - 0s 10ms/step - loss: 0.4446 - accuracy: 0.9441
Epoch 138/200
5/5 [==============================] - 0s 7ms/step - loss: 0.4379 - accuracy: 0.9580
Epoch 139/200
5/5 [==============================] - 0s 7ms/step - loss: 0.4333 - accuracy: 0.9580
Epoch 140/200
5/5 [==============================] - 0s 7ms/step - loss: 0.4277 - accuracy: 0.9580
Epoch 141/200
5/5 [==============================] - 0s 7ms/step - loss: 0.4220 - accuracy: 0.9650
Epoch 142/200
5/5 [==============================] - 0s 6ms/step - loss: 0.4159 - accuracy: 0.9650
Epoch 143/200
5/5 [==============================] - 0s 7ms/step - loss: 0.4120 - accuracy: 0.9510
Epoch 144/200
5/5 [==============================] - 0s 6ms/step - loss: 0.4062 - accuracy: 0.9650
Epoch 145/200
5/5 [==============================] - 0s 6ms/step - loss: 0.4016 - accuracy: 0.9580
Epoch 146/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3953 - accuracy: 0.9580
Epoch 147/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3916 - accuracy: 0.9720
Epoch 148/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3852 - accuracy: 0.9860
Epoch 149/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3812 - accuracy: 0.9790
Epoch 150/200
5/5 [==============================] - 0s 8ms/step - loss: 0.3773 - accuracy: 0.9860
Epoch 151/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3719 - accuracy: 0.9790
Epoch 152/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3669 - accuracy: 0.9790
Epoch 153/200
5/5 [==============================] - 0s 8ms/step - loss: 0.3620 - accuracy: 0.9860
Epoch 154/200
5/5 [==============================] - 0s 7ms/step - loss: 0.3582 - accuracy: 0.9860
Epoch 155/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3541 - accuracy: 0.9860
Epoch 156/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3491 - accuracy: 0.9790
Epoch 157/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3444 - accuracy: 0.9790
Epoch 158/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3416 - accuracy: 0.9860
Epoch 159/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3358 - accuracy: 0.9860
Epoch 160/200
5/5 [==============================] - 0s 7ms/step - loss: 0.3322 - accuracy: 0.9860
Epoch 161/200
5/5 [==============================] - 0s 7ms/step - loss: 0.3278 - accuracy: 0.9860
Epoch 162/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3241 - accuracy: 0.9860
Epoch 163/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3206 - accuracy: 0.9860
Epoch 164/200
5/5 [==============================] - 0s 8ms/step - loss: 0.3159 - accuracy: 0.9860
Epoch 165/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3120 - accuracy: 0.9860
Epoch 166/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3081 - accuracy: 0.9860
Epoch 167/200
5/5 [==============================] - 0s 6ms/step - loss: 0.3049 - accuracy: 0.9860
Epoch 168/200
5/5 [==============================] - 0s 8ms/step - loss: 0.3014 - accuracy: 0.9860
Epoch 169/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2972 - accuracy: 0.9860
Epoch 170/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2944 - accuracy: 0.9860
Epoch 171/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2905 - accuracy: 0.9860
Epoch 172/200
5/5 [==============================] - 0s 7ms/step - loss: 0.2873 - accuracy: 0.9860
Epoch 173/200
5/5 [==============================] - 0s 9ms/step - loss: 0.2843 - accuracy: 0.9860
Epoch 174/200
5/5 [==============================] - 0s 7ms/step - loss: 0.2806 - accuracy: 0.9860
Epoch 175/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2768 - accuracy: 0.9860
Epoch 176/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2740 - accuracy: 0.9860
Epoch 177/200
5/5 [==============================] - 0s 7ms/step - loss: 0.2705 - accuracy: 0.9860
Epoch 178/200
5/5 [==============================] - 0s 7ms/step - loss: 0.2670 - accuracy: 0.9860
Epoch 179/200
5/5 [==============================] - 0s 7ms/step - loss: 0.2638 - accuracy: 0.9860
Epoch 180/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2620 - accuracy: 0.9860
Epoch 181/200
5/5 [==============================] - 0s 7ms/step - loss: 0.2588 - accuracy: 0.9860
Epoch 182/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2561 - accuracy: 0.9860
Epoch 183/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2519 - accuracy: 0.9860
Epoch 184/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2490 - accuracy: 0.9860
Epoch 185/200
5/5 [==============================] - 0s 8ms/step - loss: 0.2456 - accuracy: 0.9860
Epoch 186/200
5/5 [==============================] - 0s 7ms/step - loss: 0.2430 - accuracy: 0.9860
Epoch 187/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2406 - accuracy: 0.9860
Epoch 188/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2387 - accuracy: 0.9860
Epoch 189/200
5/5 [==============================] - 0s 7ms/step - loss: 0.2352 - accuracy: 0.9860
Epoch 190/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2325 - accuracy: 0.9860
Epoch 191/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2290 - accuracy: 0.9860
Epoch 192/200
5/5 [==============================] - 0s 10ms/step - loss: 0.2269 - accuracy: 0.9860
Epoch 193/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2243 - accuracy: 0.9860
Epoch 194/200
5/5 [==============================] - 0s 7ms/step - loss: 0.2227 - accuracy: 0.9860
Epoch 195/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2198 - accuracy: 0.9860
Epoch 196/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2173 - accuracy: 0.9860
Epoch 197/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2140 - accuracy: 0.9930
Epoch 198/200
5/5 [==============================] - 0s 7ms/step - loss: 0.2118 - accuracy: 0.9930
Epoch 199/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2093 - accuracy: 0.9930
Epoch 200/200
5/5 [==============================] - 0s 6ms/step - loss: 0.2079 - accuracy: 0.9860

The fit() method in Keras trains the model on the provided data. It iterates over the dataset for the specified number of epochs, adjusting the model's internal parameters to minimize the loss function. Within each epoch the data is processed in mini-batches (five per epoch here, as the 5/5 counter in the log shows). For each batch, the model makes predictions, calculates the loss, and updates its parameters using the optimizer specified during compilation.

In this case, the model will be trained for 200 epochs, meaning it will go through the entire dataset 200 times. This allows the model to learn from the data and improve its performance over time.
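The sketch below makes the epoch, batch and update cycle concrete. It is a minimal NumPy sketch of what fit() does under the hood, assuming a plain softmax classifier trained with mini-batch gradient descent. It is not the notebook's Keras model, which has its own layers and optimizer. The function name fit, the learning rate and the toy data are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: a softmax classifier trained with mini-batch gradient
# descent, standing in for the notebook's Keras model to show what fit() loops over.

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit(X, y, n_classes, epochs=200, batch_size=32, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    history = {"loss": [], "accuracy": []}
    for _ in range(epochs):
        order = rng.permutation(len(X))  # reshuffle the samples every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            probs = softmax(X[idx] @ W + b)         # forward pass on one mini-batch
            grad = probs.copy()
            grad[np.arange(len(idx)), y[idx]] -= 1  # gradient of cross-entropy w.r.t. logits
            grad /= len(idx)
            W -= lr * X[idx].T @ grad               # one gradient-descent update per batch
            b -= lr * grad.sum(axis=0)
        # Metrics on the full training set after the epoch (Keras instead
        # reports a running average over the epoch's batches)
        probs = softmax(X @ W + b)
        history["loss"].append(float(-np.log(probs[np.arange(len(X)), y] + 1e-12).mean()))
        history["accuracy"].append(float((probs.argmax(axis=1) == y).mean()))
    return W, b, history

# Toy data: three well-separated 2-D clusters, 50 points each.
# 150 samples with batch_size 32 gives 5 batches per epoch.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
y = np.repeat(np.arange(3), 50)
W, b, hist = fit(X, y, n_classes=3, epochs=100)
print(hist["loss"][0], hist["loss"][-1], hist["accuracy"][-1])
```

The structure (shuffle, loop over batches, forward pass, loss gradient, parameter update, record metrics) is the same idea Keras runs for us. Keras just does it with automatic differentiation and whichever optimizer was passed to compile().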

III - The result

Now let's plot the training history to see the result.

In [ ]:

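# train is the History object returned by model.fit(); train.history holds one value per epoch
plt.plot(train.history["accuracy"], label="training set accuracy")
plt.plot(train.history["loss"], label="training set loss")
plt.xlabel("Epoch")
plt.legend()
plt.show()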

Out[ ]:

(Figure: training set accuracy and training set loss per epoch)

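According to the graph, the training accuracy climbs toward 100% as training goes on. It finishes at 98.6%, with a brief 99.3% at epochs 197 to 199. Meanwhile, the loss keeps decreasing, reaching 0.2079 at epoch 200. Both curves are measured on the training set only, so they show how well the model fits the phrases it was trained on, not how well it handles new ones.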
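Instead of reading values off the plot, we can pull the numbers straight out of the history dict. Below is a small sketch with a hypothetical helper, summarize_history (not part of Keras). It assumes the dict has "accuracy" and "loss" keys, as it does in this notebook because the model was compiled with the accuracy metric.

```python
def summarize_history(history):
    """Summarize a Keras-style history dict ({"accuracy": [...], "loss": [...]})."""
    acc, loss = history["accuracy"], history["loss"]
    best = max(range(len(acc)), key=lambda i: acc[i])  # first epoch reaching the best accuracy
    return {
        "final_accuracy": acc[-1],
        "best_accuracy": acc[best],
        "best_epoch": best + 1,           # 1-based, matching the "Epoch N/200" log lines
        "final_loss": loss[-1],
        "loss_drop": loss[0] - loss[-1],  # how much the loss fell over training
    }

# Made-up numbers, only to show the shape of the output
toy = {"accuracy": [0.40, 0.80, 0.95, 0.90], "loss": [1.5, 0.9, 0.4, 0.35]}
print(summarize_history(toy))
```

In the notebook itself the call would be summarize_history(train.history). That works because train.history is the plain dict that Keras fills with one value per epoch for each metric.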

IV - The chatbot

Setting up an Infinite Loop: The code starts with a while True loop, which keeps running until the break statement in the last step is reached.
User Input: The user is prompted to enter their input with the input("You: ") line. The input is stored in the prediction_input variable.
Preprocessing the User Input: The user input is preprocessed by converting all characters to lowercase and removing any punctuation using a list comprehension.
Tokenization and Padding: The preprocessed input is tokenized using the tokenizer.texts_to_sequences() function to convert it into a sequence of word indices. Then, the input sequence is padded using pad_sequences() to ensure it has the same length as the input_shape defined earlier.
Making a Prediction: The preprocessed and padded input is passed to the model's predict() function, which returns one probability per intent; argmax() then picks the index of the most likely intent.
Mapping the Output to a Response: The predicted output is mapped back to its corresponding intent label using the inverse_transform() function of the label encoder (le). The resulting intent is stored in the response_tag variable.
Generating a Random Response: A random response is chosen from a dataframe (df_responses) based on the predicted intent. The response is printed as the AI's reply.
Exiting the Loop: If the predicted intent is "GoodBye", indicating the user wants to end the conversation, the loop is terminated with the break statement.
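The preprocessing, tokenization and padding steps above can be sketched without TensorFlow. The sketch assumes a hypothetical toy vocabulary (word_index) in place of the fitted tokenizer. It also assumes two defaults: unknown words are skipped, which is what the Keras Tokenizer does when no oov_token is set, and padding and truncation both happen at the front, which are pad_sequences' defaults. The helper names clean, texts_to_sequences and pad_pre are illustrative. Unlike Keras, pad_pre returns plain lists, not a NumPy array.

```python
import string

def clean(text):
    # Lowercase and drop punctuation characters, like the list comprehension in the loop
    return "".join(ch.lower() for ch in text if ch not in string.punctuation)

def texts_to_sequences(texts, word_index):
    # Map each word to its index; words missing from the vocabulary are skipped
    return [[word_index[w] for w in t.split() if w in word_index] for t in texts]

def pad_pre(seqs, maxlen, value=0):
    # Keep the LAST maxlen items and left-pad with zeros
    # (mirrors padding="pre", truncating="pre")
    out = []
    for s in seqs:
        s = s[-maxlen:] if maxlen > 0 else []
        out.append([value] * (maxlen - len(s)) + s)
    return out

word_index = {"hello": 1, "who": 2, "are": 3, "you": 4, "goodbye": 5}  # hypothetical toy vocabulary
print(pad_pre(texts_to_sequences([clean("Who are you?")], word_index), maxlen=6))  # [[0, 0, 0, 2, 3, 4]]
print(pad_pre(texts_to_sequences([clean("GoodBuye")], word_index), maxlen=6))      # [[0, 0, 0, 0, 0, 0]]
```

The second call shows that a misspelled word such as "GoodBuye" disappears entirely, leaving an all-zero input for the model to classify.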

In [ ]:

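# Uses objects from earlier cells: model, tokenizer, le, df_responses, input_shape
from tensorflow.keras.preprocessing.sequence import pad_sequences

while True:
    prediction_input = input("You: ")

    # Lowercase the input and strip punctuation characters
    prediction_input = ''.join(ch.lower() for ch in prediction_input if ch not in string.punctuation)

    # Convert words to indices, then pad to the fixed length the model expects
    prediction_input = tokenizer.texts_to_sequences([prediction_input])
    prediction_input = pad_sequences(prediction_input, maxlen=input_shape)

    # predict() returns one probability per intent; argmax() picks the most likely one
    output = model.predict(prediction_input)
    output = output.argmax()

    # Map the index back to its intent label and reply with a random matching response
    response_tag = le.inverse_transform([output])[0]
    print("AI:", random.choice(df_responses[df_responses["intent"] == response_tag]["responses"].values))

    # End the conversation once a goodbye is detected
    if response_tag == "GoodBye":
        break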

You: hello
1/1 [==============================] - 1s 660ms/step
AI: hi human please tell me your genisys user
You: who are you 
1/1 [==============================] - 0s 29ms/step
AI: you can call me geni
You: GoodBuye
1/1 [==============================] - 0s 37ms/step
AI: hello human please tell me your genisys user
You: GoodBye
1/1 [==============================] - 0s 22ms/step
AI: have a nice day
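The last steps of the loop (argmax(), le.inverse_transform() and random.choice()) can be tried in isolation with a fake probability vector. In this sketch, the classes array stands in for le.classes_. scikit-learn's LabelEncoder keeps classes_ in sorted order, and inverse_transform() looks labels up by index in it. The "Identity" intent, the "see you later" reply and the decode helper are hypothetical. The other labels and replies come from the notebook's transcript.

```python
import random

import numpy as np

# Hypothetical stand-ins for le.classes_ (sorted) and the per-intent responses
classes = np.array(["GoodBye", "Greeting", "Identity"])
responses = {
    "GoodBye": ["see you later", "have a nice day"],
    "Greeting": ["hi human please tell me your genisys user"],
    "Identity": ["you can call me geni"],
}

def decode(probs, classes, responses, rng=random):
    idx = int(np.argmax(probs))  # index of the most probable intent
    tag = str(classes[idx])      # the same index-to-label lookup inverse_transform performs
    return tag, rng.choice(responses[tag])

probs = np.array([[0.05, 0.15, 0.80]])  # made-up model output for one input
tag, reply = decode(probs, classes, responses)
print(tag, "->", reply)  # Identity -> you can call me geni
```

Because the reply is drawn with random.choice, intents with several responses (like GoodBye here) can answer differently each time.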

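Yay, so we just built our chatbot from scratch! Not too hard, eh? :) It isn't perfect, though. The misspelled "GoodBuye" above got a greeting in reply, probably because the tokenizer had never seen that word, so the model received an input with no known words to go on.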