ScalarStop helps you train machine learning models by keeping your datasets, model architectures, and training metrics organized.
ScalarStop is available on PyPI. You can install it from the command line using:
pip3 install scalarstop
First, we will organize your training, validation, and test sets with subclasses of DataBlob.

Second, we will describe the architecture of your machine learning models with subclasses of ModelTemplate.

Third, we'll create a Model subclass instance that initializes a model from a ModelTemplate and trains it on a DataBlob's training and validation sets.

Finally, we will save the hyperparameters and training metrics from many DataBlobs, ModelTemplates, and Models into a SQLite or PostgreSQL database using the TrainStore client.
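Putting those four pieces together, the overall workflow looks roughly like this. This is just a sketch: it uses the fashion_mnist_v1 and small_dense_10_way_classifier_v1 classes and the train_store object that we define later in this tutorial.

datablob = fashion_mnist_v1(hyperparams=dict(num_training_samples=50)).batch(2)
model_template = small_dense_10_way_classifier_v1(hyperparams=dict(hidden_units=3))
model = sp.KerasModel(datablob=datablob, model_template=model_template)
model.fit(final_epoch=5, train_store=train_store)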
But first, let's import the modules we'll need for this demo.
import os
import scalarstop as sp
import tensorflow as tf
DataBlob: Keeping your training dataset organized

The first step to training machine learning models with ScalarStop is to encase your dataset in a DataBlob.

A DataBlob is a set of three tf.data.Dataset pipelines, representing your training, validation, and test sets.

When you create a DataBlob, the variables that affect the creation of the tf.data.Dataset pipelines are stored in a nested Python dataclass named Hyperparams. Only store simple JSON-serializable types in the Hyperparams dataclass.

Creating a new DataBlob with Hyperparams looks roughly like this:
from typing import List, Dict

import scalarstop as sp

class my_datablob_group_name(sp.DataBlob):

    @sp.dataclass
    class Hyperparams(sp.HyperparamsType):
        a: int
        b: str
        c: Dict[str, float]
        d: List[int]

    # ... more setup below ...
Then, we define three methods on our DataBlob subclass:

set_training()
set_validation()
set_test()

Each one of them has to create a new instance of a tf.data.Dataset pipeline with data samples and labels zipped together. Typically, that looks like:
# Create a tf.data.Dataset for your training samples.
samples = tf.data.Dataset.from_tensor_slices([1, 2, 3])
# And another tf.data.Dataset for your training labels.
labels = tf.data.Dataset.from_tensor_slices([0, 1, 0])
# And zip them together.
tf.data.Dataset.zip((samples, labels))
Do not apply any batching at this stage. We will do that later.
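Batching is instead applied to the DataBlob as a whole, once it is constructed. A minimal sketch, where my_datablob is a placeholder instance and 32 is an arbitrary batch size:

# Batch the DataBlob (assuming DataBlob.batch() applies the batch size
# to the training, validation, and test pipelines together).
batched_datablob = my_datablob.batch(32)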
Now we'll create a DataBlob that contains the Fashion MNIST dataset.
class fashion_mnist_v1(sp.DataBlob):

    @sp.dataclass
    class Hyperparams(sp.HyperparamsType):
        num_training_samples: int

    def __init__(self, hyperparams):
        """
        You only need to override __init__ if you want to validate
        your hyperparameters or add arguments that are not hyperparameters.
        One example of a non-hyperparameter argument would be a
        database connection URL.
        """
        if hyperparams["num_training_samples"] > 50_000:
            raise ValueError("num_training_samples should be <= 50_000")
        super().__init__(hyperparams=hyperparams)
        (self._train_images, self._train_labels), \
            (self._test_images, self._test_labels) = \
            tf.keras.datasets.fashion_mnist.load_data()

    def set_training(self) -> tf.data.Dataset:
        """The training set."""
        samples = tf.data.Dataset.from_tensor_slices(
            self._train_images[:self.hyperparams.num_training_samples]
        )
        labels = tf.data.Dataset.from_tensor_slices(
            self._train_labels[:self.hyperparams.num_training_samples]
        )
        return tf.data.Dataset.zip((samples, labels))

    def set_validation(self) -> tf.data.Dataset:
        """
        The validation set.

        In this example, the validation set does not change with the
        hyperparameters. This allows us to compare results from
        different training sets against the same validation set.

        However, if your hyperparameters specify how to engineer
        features, then you might want the validation set and
        training set to rely on the same hyperparameters.
        """
        samples = tf.data.Dataset.from_tensor_slices(
            self._train_images[50_000:]
        )
        labels = tf.data.Dataset.from_tensor_slices(
            self._train_labels[50_000:]
        )
        return tf.data.Dataset.zip((samples, labels))

    def set_test(self) -> tf.data.Dataset:
        """The test set. Used to evaluate models, but not to train them."""
        samples = tf.data.Dataset.from_tensor_slices(self._test_images)
        labels = tf.data.Dataset.from_tensor_slices(self._test_labels)
        return tf.data.Dataset.zip((samples, labels))
Here, we create a DataBlob instance with a dictionary to set our Hyperparams.

The DataBlob name is computed by hashing your DataBlob subclass name together with the names and values of your Hyperparams.
datablob1 = fashion_mnist_v1(hyperparams=dict(num_training_samples=10))
datablob1.name
'fashion_mnist_v1-p166sf7xz19hg8n3mj8f93m8'
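Because the name is a deterministic hash of the subclass name and hyperparameters, constructing another instance with identical hyperparameters should yield the identical name. A quick sketch to illustrate:

# Same subclass and same hyperparameters should hash to the same name.
datablob1_again = fashion_mnist_v1(hyperparams=dict(num_training_samples=10))
assert datablob1_again.name == datablob1.name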
The DataBlob group name is, by default, the DataBlob subclass name.
datablob1.group_name
'fashion_mnist_v1'
print(datablob1.hyperparams)
fashion_mnist_v1.Hyperparams(num_training_samples=10)
Now we create another DataBlob instance with a different value for Hyperparams. Note that it has a different automatically-generated name, but the same group_name.
datablob2 = fashion_mnist_v1(hyperparams=dict(num_training_samples=50))
datablob2.name, datablob2.group_name
('fashion_mnist_v1-3wzktz1cmz86vs1r7rbmdoza', 'fashion_mnist_v1')
datablob1.training.take(1)
<TakeDataset shapes: ((28, 28), ()), types: (tf.uint8, tf.uint8)>
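The validation and test pipelines should be available the same way, through the validation and test properties. To inspect an actual sample, we can iterate over a pipeline (a sketch):

# Take one (sample, label) pair from the training set and print it.
for sample, label in datablob1.training.take(1).as_numpy_iterator():
    print(sample.shape, label)  # a 28x28 uint8 image and its class index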
We can save a DataBlob to the filesystem and load it back later.
os.makedirs("datablobs_directory", exist_ok=True)
datablob1.save("datablobs_directory")
<sp.DataBlob fashion_mnist_v1-p166sf7xz19hg8n3mj8f93m8>
Here, we use the classmethod from_filesystem() to calculate the exact path of our saved DataBlob using a copy of the DataBlob's hyperparameters.
loaded_datablob1 = fashion_mnist_v1.from_filesystem(
    hyperparams=dict(num_training_samples=10),
    datablobs_directory="datablobs_directory",
)
loaded_datablob1
<sp.DataBlob fashion_mnist_v1-p166sf7xz19hg8n3mj8f93m8>
Alternatively, if we know the exact directory name of our saved DataBlob, we can load it with from_exact_path().
loaded_datablob2 = fashion_mnist_v1.from_exact_path(
    os.path.join("datablobs_directory", datablob1.name)
)
loaded_datablob2
<sp.DataBlob fashion_mnist_v1-p166sf7xz19hg8n3mj8f93m8>
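The loaded copy should carry the same name and hyperparameters as the original, which we can verify with a quick check:

assert loaded_datablob2.name == datablob1.name
print(loaded_datablob2.hyperparams)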
ModelTemplate: Parameterizing your model creation

The ModelTemplate is the same concept as the DataBlob, but instead of three tf.data.Datasets, the ModelTemplate creates a machine learning framework model object.

Here is an example of creating a Keras model. Building and compiling the model is parameterized by values in the Hyperparams dataclass.
class small_dense_10_way_classifier_v1(sp.ModelTemplate):

    @sp.dataclass
    class Hyperparams(sp.HyperparamsType):
        hidden_units: int
        optimizer: str = "adam"

    def new_model(self):
        model = tf.keras.Sequential(
            layers=[
                tf.keras.layers.Flatten(input_shape=(28, 28)),
                tf.keras.layers.Dense(
                    units=self.hyperparams.hidden_units,
                    activation="relu",
                ),
                tf.keras.layers.Dense(units=10),
            ],
            name=self.name,
        )
        model.compile(
            optimizer=self.hyperparams.optimizer,
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=["accuracy"],
        )
        return model
Once again, the ModelTemplate has a unique name, generated by hashing your subclass name and the Hyperparams.
model_template = small_dense_10_way_classifier_v1(hyperparams=dict(hidden_units=3))
model_template.name
'small_dense_10_way_classifier_v1-uptyfbjofo7rqv8antxrwhjs'
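We can ask the template for a freshly built and compiled Keras model at any time by calling the new_model() method we defined above. For example:

keras_model = model_template.new_model()
keras_model.summary()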
Model: Combine your ModelTemplate with your DataBlob

DataBlobs and ModelTemplates are not very useful until you bring them together with a Model.

A Model is an object created by pairing a ModelTemplate instance with a DataBlob instance, for the purpose of training the machine learning model created by the ModelTemplate on the DataBlob's training and validation sets.

Make sure to batch your DataBlob before using it.
datablob = datablob2.batch(2)
model = sp.KerasModel(
    datablob=datablob,
    model_template=model_template,
)
Once again, the Model has a unique name, but this time it is just a concatenation of the DataBlob and ModelTemplate names.
model.name
'mt_small_dense_10_way_classifier_v1-uptyfbjofo7rqv8antxrwhjs__d_fashion_mnist_v1-3wzktz1cmz86vs1r7rbmdoza'
model.fit(final_epoch=2, verbose=1)
Epoch 1/2 25/25 [==============================] - 3s 115ms/step - loss: 39.2720 - accuracy: 0.3199 - val_loss: 2.6354 - val_accuracy: 0.1039 Epoch 2/2 25/25 [==============================] - 2s 99ms/step - loss: 2.3040 - accuracy: 0.1014 - val_loss: 2.5192 - val_accuracy: 0.1040
{'loss': [23.980653762817383, 2.3024940490722656], 'accuracy': [0.18000000715255737, 0.11999999731779099], 'val_loss': [2.635443925857544, 2.5192012786865234], 'val_accuracy': [0.1039000004529953, 0.10400000214576721]}
In ScalarStop, training a machine learning model is an idempotent operation. Instead of saying, "Train for $n$ more epochs," we say, "Train until the model has been trained for $n$ epochs total."
If we call model.fit() again with final_epoch still at 2, we get the same metrics back, but no additional training happens.
model.fit(final_epoch=2, verbose=1)
{'loss': [23.980653762817383, 2.3024940490722656], 'accuracy': [0.18000000715255737, 0.11999999731779099], 'val_loss': [2.635443925857544, 2.5192012786865234], 'val_accuracy': [0.1039000004529953, 0.10400000214576721]}
Training ScalarStop Models is idempotent because each Model keeps track of how many epochs it has been trained for and of the training metrics it generated (e.g. loss, accuracy, etc.). This information is saved to the filesystem when you call model.save() and is loaded back from disk when you create a new Model object with Model.from_filesystem() or Model.from_filesystem_or_new().
os.makedirs("models_directory", exist_ok=True)
model.save("models_directory")
os.listdir("models_directory")
INFO:tensorflow:Assets written to: models_directory/mt_small_dense_10_way_classifier_v1-uptyfbjofo7rqv8antxrwhjs__d_fashion_mnist_v1-3wzktz1cmz86vs1r7rbmdoza/assets
['mt_small_dense_10_way_classifier_v1-uptyfbjofo7rqv8antxrwhjs__d_fashion_mnist_v1-3wzktz1cmz86vs1r7rbmdoza']
Here is an example of loading the model back, calculating the exact filename from the hyperparameters of both the DataBlob and the ModelTemplate.
model2 = sp.KerasModel.from_filesystem(
    datablob=datablob,
    model_template=model_template,
    models_directory="models_directory",
)
print(model2.name)
model2.history
mt_small_dense_10_way_classifier_v1-uptyfbjofo7rqv8antxrwhjs__d_fashion_mnist_v1-3wzktz1cmz86vs1r7rbmdoza
{'accuracy': [0.18000000715255737, 0.11999999731779099], 'loss': [23.980653762817383, 2.3024940490722656], 'val_accuracy': [0.1039000004529953, 0.10400000214576721], 'val_loss': [2.635443925857544, 2.5192012786865234]}
If you provide models_directory as an argument to fit(), ScalarStop will save the model to the filesystem after every epoch.
_ = model2.fit(final_epoch=5, verbose=1, models_directory="models_directory")
Epoch 3/5 25/25 [==============================] - 3s 103ms/step - loss: 2.3016 - accuracy: 0.1200 - val_loss: 2.5192 - val_accuracy: 0.1040 INFO:tensorflow:Assets written to: models_directory/mt_small_dense_10_way_classifier_v1-uptyfbjofo7rqv8antxrwhjs__d_fashion_mnist_v1-3wzktz1cmz86vs1r7rbmdoza/assets Epoch 4/5 25/25 [==============================] - 2s 88ms/step - loss: 2.2994 - accuracy: 0.1000 - val_loss: 2.5192 - val_accuracy: 0.1060 INFO:tensorflow:Assets written to: models_directory/mt_small_dense_10_way_classifier_v1-uptyfbjofo7rqv8antxrwhjs__d_fashion_mnist_v1-3wzktz1cmz86vs1r7rbmdoza/assets Epoch 5/5 25/25 [==============================] - 2s 90ms/step - loss: 2.2980 - accuracy: 0.1600 - val_loss: 2.5192 - val_accuracy: 0.1060 INFO:tensorflow:Assets written to: models_directory/mt_small_dense_10_way_classifier_v1-uptyfbjofo7rqv8antxrwhjs__d_fashion_mnist_v1-3wzktz1cmz86vs1r7rbmdoza/assets
Once again, ScalarStop saves the model's training history alongside the model's weights, but this is not very convenient if you want to do large-scale analysis of the training metrics of many models at once.

A better way to store training metrics is to use the TrainStore.
TrainStore: Save and query your machine learning metrics in a database

The TrainStore is a client that saves hyperparameters and training metrics to a SQLite or PostgreSQL database. Let's create a new TrainStore instance that saves its data to a file named train_store.sqlite3.
train_store = sp.TrainStore.from_filesystem(filename="train_store.sqlite3")
train_store
<sp.TrainStore sqlite:///train_store.sqlite3>
The TrainStore is also available as a Python context manager.
with sp.TrainStore.from_filesystem(filename="train_store.sqlite3") as train_store:
    ...  # use the TrainStore here

# Here, the TrainStore database connection has been automatically closed for you.
We don't use it that way in this example because we want to use the TrainStore across multiple Jupyter notebook cells.
And if we want to connect to a PostgreSQL database, the syntax looks like this:

connection_string = "postgresql://username:password@hostname:port/database"
with sp.TrainStore(connection_string=connection_string) as train_store:
    ...  # query and save metrics here
The TrainStore will automatically save your DataBlob and ModelTemplate names, group names, and hyperparameters to the database. And when you train a Model, the TrainStore will persist the model name and the per-epoch training metrics.

All of this happens automatically if you pass the TrainStore instance to Model.fit().
_ = model.fit(final_epoch=5, train_store=train_store)
Epoch 3/5 25/25 [==============================] - 2s 87ms/step - loss: 2.3012 - accuracy: 0.0800 - val_loss: 2.5129 - val_accuracy: 0.1039 Epoch 4/5 25/25 [==============================] - 2s 88ms/step - loss: 2.2999 - accuracy: 0.0800 - val_loss: 2.5124 - val_accuracy: 0.0959 Epoch 5/5 25/25 [==============================] - 2s 90ms/step - loss: 2.2985 - accuracy: 0.0600 - val_loss: 2.5124 - val_accuracy: 0.0959
Once you have some information in the TrainStore, you can query it and receive the results as a Pandas DataFrame.
First, let's list the DataBlobs that we have saved:
train_store.list_datablobs()
|   | name | group_name | hyperparams | last_modified |
|---|------|------------|-------------|---------------|
| 0 | fashion_mnist_v1-3wzktz1cmz86vs1r7rbmdoza | fashion_mnist_v1 | {'num_training_samples': 50} | 2021-04-11 20:40:11.891137 |
...and the ModelTemplates that we have saved:
train_store.list_model_templates()
|   | name | group_name | hyperparams | last_modified |
|---|------|------------|-------------|---------------|
| 0 | small_dense_10_way_classifier_v1-uptyfbjofo7rq... | small_dense_10_way_classifier_v1 | {'hidden_units': 3, 'optimizer': 'adam'} | 2021-04-11 20:40:11.892808 |
...and the models that we have trained:
train_store.list_models()
|   | model_name | model_class_name | model_last_modified | datablob_name | datablob_group_name | model_template_name | model_template_group_name | dbh__num_training_samples | mth__hidden_units | mth__optimizer |
|---|------------|------------------|---------------------|---------------|---------------------|---------------------|---------------------------|---------------------------|-------------------|----------------|
| 0 | mt_small_dense_10_way_classifier_v1-uptyfbjofo... | KerasModel | 2021-04-11 20:40:11.894643 | fashion_mnist_v1-3wzktz1cmz86vs1r7rbmdoza | fashion_mnist_v1 | small_dense_10_way_classifier_v1-uptyfbjofo7rq... | small_dense_10_way_classifier_v1 | 50 | 3 | adam |
...and this is how we query the training history for a given model:
train_store.list_model_epochs(model_name=model.name)
|   | epoch_num | model_name | last_modified | metric__loss | metric__accuracy | metric__val_loss | metric__val_accuracy |
|---|-----------|------------|---------------|--------------|------------------|------------------|----------------------|
| 0 | 3 | mt_small_dense_10_way_classifier_v1-uptyfbjofo... | 2021-04-11 20:40:13.981798 | 2.301222 | 0.08 | 2.512903 | 0.1039 |
| 1 | 4 | mt_small_dense_10_way_classifier_v1-uptyfbjofo... | 2021-04-11 20:40:16.110798 | 2.299875 | 0.08 | 2.512378 | 0.0959 |
| 2 | 5 | mt_small_dense_10_way_classifier_v1-uptyfbjofo... | 2021-04-11 20:40:18.271750 | 2.298548 | 0.06 | 2.512365 | 0.0959 |
model_template_2 = small_dense_10_way_classifier_v1(hyperparams=dict(hidden_units=5))
model_2 = sp.KerasModel(datablob=datablob, model_template=model_template_2)
_ = model_2.fit(final_epoch=10, train_store=train_store)
Epoch 1/10 25/25 [==============================] - 2s 91ms/step - loss: 28.2529 - accuracy: 0.1749 - val_loss: 2.5577 - val_accuracy: 0.1217 Epoch 2/10 25/25 [==============================] - 2s 92ms/step - loss: 2.8903 - accuracy: 0.2733 - val_loss: 2.4093 - val_accuracy: 0.1032 Epoch 3/10 25/25 [==============================] - 2s 89ms/step - loss: 2.2542 - accuracy: 0.2733 - val_loss: 2.3985 - val_accuracy: 0.1081 Epoch 4/10 25/25 [==============================] - 2s 95ms/step - loss: 2.2393 - accuracy: 0.2733 - val_loss: 2.3969 - val_accuracy: 0.1096 Epoch 5/10 25/25 [==============================] - 2s 89ms/step - loss: 2.2373 - accuracy: 0.2733 - val_loss: 2.3966 - val_accuracy: 0.1097 Epoch 6/10 25/25 [==============================] - 2s 89ms/step - loss: 2.2353 - accuracy: 0.2733 - val_loss: 2.3966 - val_accuracy: 0.1097 Epoch 7/10 25/25 [==============================] - 2s 88ms/step - loss: 2.2333 - accuracy: 0.2733 - val_loss: 2.3966 - val_accuracy: 0.1097 Epoch 8/10 25/25 [==============================] - 2s 93ms/step - loss: 2.2313 - accuracy: 0.2733 - val_loss: 2.3966 - val_accuracy: 0.1097 Epoch 9/10 25/25 [==============================] - 2s 91ms/step - loss: 2.2293 - accuracy: 0.2733 - val_loss: 2.3967 - val_accuracy: 0.1097 Epoch 10/10 25/25 [==============================] - 2s 88ms/step - loss: 2.2273 - accuracy: 0.2733 - val_loss: 2.3967 - val_accuracy: 0.1097
train_store.list_model_epochs(model_name=model_2.name)
|   | epoch_num | model_name | last_modified | metric__loss | metric__accuracy | metric__val_loss | metric__val_accuracy |
|---|-----------|------------|---------------|--------------|------------------|------------------|----------------------|
| 0 | 1 | mt_small_dense_10_way_classifier_v1-axos7t2rck... | 2021-04-11 20:40:20.782506 | 13.858220 | 0.14 | 2.557676 | 0.1217 |
| 1 | 2 | mt_small_dense_10_way_classifier_v1-axos7t2rck... | 2021-04-11 20:40:22.989173 | 2.634313 | 0.18 | 2.409259 | 0.1032 |
| 2 | 3 | mt_small_dense_10_way_classifier_v1-axos7t2rck... | 2021-04-11 20:40:25.121682 | 2.264078 | 0.18 | 2.398494 | 0.1081 |
| 3 | 4 | mt_small_dense_10_way_classifier_v1-axos7t2rck... | 2021-04-11 20:40:27.412649 | 2.252162 | 0.18 | 2.396904 | 0.1096 |
| 4 | 5 | mt_small_dense_10_way_classifier_v1-axos7t2rck... | 2021-04-11 20:40:29.547414 | 2.250889 | 0.18 | 2.396623 | 0.1097 |
| 5 | 6 | mt_small_dense_10_way_classifier_v1-axos7t2rck... | 2021-04-11 20:40:31.679167 | 2.249614 | 0.18 | 2.396614 | 0.1097 |
| 6 | 7 | mt_small_dense_10_way_classifier_v1-axos7t2rck... | 2021-04-11 20:40:33.794024 | 2.248346 | 0.18 | 2.396612 | 0.1097 |
| 7 | 8 | mt_small_dense_10_way_classifier_v1-axos7t2rck... | 2021-04-11 20:40:36.031745 | 2.247089 | 0.18 | 2.396631 | 0.1097 |
| 8 | 9 | mt_small_dense_10_way_classifier_v1-axos7t2rck... | 2021-04-11 20:40:38.216244 | 2.245847 | 0.18 | 2.396651 | 0.1097 |
| 9 | 10 | mt_small_dense_10_way_classifier_v1-axos7t2rck... | 2021-04-11 20:40:40.344820 | 2.244625 | 0.18 | 2.396712 | 0.1097 |
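Because we created this TrainStore outside of a with block, its database connection is still open. A final sketch, assuming the TrainStore exposes the close() method that its context-manager support implies:

# Close the database connection we opened at the start of this section
# (assumed API; the with-block form closes it automatically instead).
train_store.close()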