pylearn2 tutorial: Softmax regression

by Ian Goodfellow


This ipython notebook will teach you the basics of how softmax regression works, and show you how to do softmax regression in pylearn2.

To do this, we will go over several concepts:

Part 1: What pylearn2 is doing for you in this example

  • What softmax regression is, and the math of how it works

  • The basic theory of how softmax regression training works

Part 2: How to use pylearn2 to do softmax regression

  • How to load data in pylearn2, and specifically how to load the MNIST dataset

  • How to configure the pylearn2 SoftmaxRegression model

  • How to set up a pylearn2 training algorithm

  • How to run training with the pylearn2 train script, and interpret its output

  • How to analyze the results of training

Note that this won't explain in detail how the individual classes are implemented. The classes follow pretty good naming conventions and have pretty good docstrings, but if you have trouble understanding them, write to me and I might add a part 3 explaining how some of the parts work under the hood.

Please write to if you encounter any problem with this tutorial.


Before running this notebook, you must have installed pylearn2. Follow the download and installation instructions if you have not yet done so.

Part 1: What pylearn2 is doing for you in this example

In this part, we won't get into any specifics of pylearn2 yet. We'll just discuss how to train a softmax regression model. If you already know about softmax regression, feel free to skip straight to part 2, where we show how to do all of this in pylearn2.

What softmax regression is, and the math of how it works

Softmax regression is a type of classification model (so the "regression" in the name is really a misnomer), which means it is a pattern recognition algorithm that maps input patterns to categories. In this tutorial, the input patterns will be images of handwritten digits, and the output category will be the identity of the digit (0-9). In other words, we will use softmax regression to solve a simple optical character recognition problem.

You may have heard of logistic regression. Logistic regression is a special case of softmax regression. Specifically, it is the case where there are only two possible output categories. Softmax regression is a generalization of logistic regression to multiple categories.

Let's define some basic terms. First, we'll use the variable $x$ to represent the input to the softmax regression model. We'll use the variable $y$ to represent the output category. Let $y$ be a non-negative integer, such that $0 \leq y < k$, where $k$ is the number of categories $x$ may belong to. In our example, we are classifying handwritten digits ranging in value from 0 to 9, so the value of $y$ is very easy to interpret. When $y = 7$, the category identified is 7. In most applications, we interpret $y$ as being a numeric code identifying a category, e.g., 0 = cat, 1 = dog, 2 = airplane, etc.

The job of the softmax regression classifier is to predict the probability of $x$ belonging to each class, i.e., we want to be able to compute $p(y = i \mid x)$ for all $k$ possible values of $i$.

The role of a parametric model like softmax regression is to define a set of parameters and describe how they map to functions $f$ defining $p(y \mid x)$. In the case of softmax regression, the model assumes that the log probability of $y=i$ is an affine function of the input $x$, up to some constant $c(x)$. $c(x)$ is defined to be whatever constant is needed to make the distribution add up to 1.

To make this more formal, let $p(y)$ be written as a vector $[ p(y=0), p(y=1), \dots, p(y=k-1) ]^T$. Assume that $x$ can be represented as a vector of numbers (in this example, we will regard each pixel of a grayscale image as being represented by a number in [0,1], and we will turn the 2D array of the image into a vector by using numpy's reshape method). Then the assumption that softmax regression makes is that

$$\log p(y \mid x) = x^T W + b + c(x) $$

where $W$ is a matrix and $b$ is a vector. Note that $c(x)$ is just a scalar but here I am adding it to a vector. I'm using numpy broadcasting rules in my math here, so this means to add $c(x)$ to every element of the vector. I'll use numpy broadcasting rules throughout this tutorial.

$W$ and $b$ are the parameters of the model, and determine how inputs are mapped to output categories. We usually call $W$ the "weights" and $b$ the "biases."

By doing some algebra, using the constraint that $p(y)$ must add up to 1, we get

$$ p(y \mid x) = \frac { \exp( x^T W + b ) } { \sum_i \exp(x^T W + b)_i } = \text{softmax}( x^T W + b) $$

where $\text{softmax}$ is the softmax activation function.
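To make the formula concrete, here is a minimal numpy sketch (not pylearn2 code). The input x and the weights W are random stand-ins made up for illustration; only the sizes 784 and 10 come from the MNIST setup used in this tutorial. It also checks numerically that $c(x) = -\log \sum_i \exp(x^T W + b)_i$ is the constant that makes the distribution add up to 1.

```python
import numpy as np

def softmax(z):
    """Map a vector of scores z to a vector of probabilities."""
    # Subtracting the max leaves the result unchanged (it cancels in the
    # ratio) but keeps exp() from overflowing on large scores.
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.RandomState(0)
n_inputs, k = 784, 10                          # MNIST sizes from this tutorial
x = rng.uniform(0.0, 1.0, size=n_inputs)       # stand-in for a flattened image
W = rng.normal(0.0, 0.01, size=(n_inputs, k))  # illustrative weights
b = np.zeros(k)                                # illustrative biases

z = x.dot(W) + b      # x^T W + b: one score per category
p = softmax(z)        # p(y = i | x) for i = 0, ..., k-1

# c(x) is a scalar; adding it to the vector z uses numpy broadcasting.
c = -np.log(np.sum(np.exp(z)))
log_p = z + c
```

Here np.exp(log_p) and p agree, which is the "some algebra" step done numerically.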

The basic theory of how softmax regression training works

Of course, the softmax model will only assign $x$ to the right category if its parameters have been adjusted to make them specify the right mapping. To do this we need to train the model.

The basic idea is that we have a collection of training examples, $\mathcal{D}$. Each example is an (x, y) tuple. We will fit the model to the training set, so that when run on the training data, it outputs a good estimate of the probability distribution over $y$ for all of the $x$s.

One way to fit the model is maximum likelihood estimation. Suppose we draw a category variable $\hat{y}$ from our model's distribution $p(y \mid x)$ for every training example independently. We want to maximize the probability of all of those labels being correct. To do this, we maximize the function

$$ J( \mathcal{D}, W, b) = \prod_{x,y \in \mathcal{D} } p(y \mid x ). $$

That function involves lots of multiplication, of possibly very small numbers (note that the softmax activation function guarantees none of them will ever be exactly zero). Multiplying together many small numbers can result in numerical underflow. In practice, we usually take the logarithm of this function to avoid underflow. Since the logarithm is a monotonically increasing function, it doesn't change which parameter value is optimal. It does get rid of the multiplication though:

$$ J( \mathcal{D}, W, b) = \sum_{x,y \in \mathcal{D} } \log p(y \mid x ). $$
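Here is a small numpy illustration of the underflow problem. The probabilities are made up for the example: 2000 training examples that each get probability 0.5 for their correct label.

```python
import numpy as np

# Hypothetical values of p(y | x) for the correct label of each example.
probs = np.full(2000, 0.5)

# The product is 0.5 ** 2000, far below the smallest positive float64,
# so it underflows to exactly 0.0.
likelihood = np.prod(probs)

# The sum of logs is 2000 * log(0.5), a perfectly ordinary number.
log_likelihood = np.sum(np.log(probs))
```

The product form has lost all information about the parameters, while the log form can still be compared between different values of $W$ and $b$.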

Many different algorithms can maximize $J$. In this tutorial, we will use an algorithm called nonlinear conjugate gradient descent to minimize $-J$. In the case of softmax regression, maximizing $J$ is a convex optimization problem so any optimization algorithm should find the same solution. The choice of nonlinear conjugate gradient is mostly to demonstrate that feature of pylearn2.
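To give a feel for what an optimizer does with $J$, here is a sketch that fits a softmax regression model to toy data in plain numpy. Note the assumptions: it uses ordinary fixed-step gradient descent rather than the nonlinear conjugate gradient method used later in this tutorial, it averages $-J$ over the examples instead of summing, and the data (three Gaussian blobs), the step size, and the number of steps are all made up for illustration.

```python
import numpy as np

def softmax_rows(Z):
    """Apply softmax to each row of Z."""
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def mean_nll(X, y, W, b):
    """Average negative log likelihood, i.e. -J divided by the number of examples."""
    P = softmax_rows(X.dot(W) + b)
    return -np.mean(np.log(P[np.arange(len(y)), y]))

def gradients(X, y, W, b):
    """Gradient of mean_nll with respect to W and b."""
    P = softmax_rows(X.dot(W) + b)
    P[np.arange(len(y)), y] -= 1.0        # P minus the one-hot labels
    return X.T.dot(P) / len(y), P.mean(axis=0)

# Toy data: three well-separated Gaussian blobs in 2D, one per class.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 3.0], [3.0, -3.0], [-3.0, -3.0]])
y = rng.randint(0, 3, size=300)
X = centers[y] + rng.normal(size=(300, 2))

W = np.zeros((2, 3))
b = np.zeros(3)
initial_cost = mean_nll(X, y, W, b)       # log(3): uniform over 3 classes
for step in range(200):
    grad_W, grad_b = gradients(X, y, W, b)
    W -= 0.1 * grad_W                     # step against the gradient of -J
    b -= 0.1 * grad_b
final_cost = mean_nll(X, y, W, b)
accuracy = np.mean(np.argmax(X.dot(W) + b, axis=1) == y)
```

A different optimizer started from the same point would be working on the same convex objective; what changes is the path it takes and how quickly it gets there.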

One problem with maximum likelihood estimation is that it can suffer from overfitting. The basic intuition is that the model can memorize patterns in the training set that are specific to the training examples, i.e. patterns that are spurious and not indicative of the correct way to categorize new, previously unseen inputs. One way to prevent this is to use early stopping. Most optimization methods are iterative, in that they try out several values of $W$ and $b$ gradually looking for the best one. Early stopping refers to stopping this search before finding the absolute best values on the training set. If we start with $W$ close to the origin, then stopping early means that $W$ will not travel as far from the origin as it would if we ran the optimization procedure to completion. Early stopping corresponds to assuming that the correlations between input features and output categories are not as strong as pure maximum likelihood estimation would determine them to be.

In order to pick the right point in time to stop, we divide the training set into two subsets: one that we will actually train on, and one that we use to see how well the model is generalizing to new data, the "validation set." The idea is to return the model that does the best at classifying the validation set, rather than the model that assigns the highest probability to the training set.
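The selection rule in the last sentence can be sketched in a few lines of Python. The numbers are the valid_y_misclass values from the training log shown in part 2 of this notebook (epochs 0 through 8); the helper name best_epoch is mine, not part of pylearn2.

```python
# Validation misclassification rate after each epoch, starting at epoch 0.
valid_misclass = [0.9009, 0.0807, 0.0775, 0.0758, 0.0746,
                  0.0734, 0.0738, 0.0738, 0.0717]

def best_epoch(errors):
    """Return (epoch, error) for the lowest validation error in the list."""
    epoch = min(range(len(errors)), key=lambda i: errors[i])
    return epoch, errors[epoch]

# Best model if we had stopped after epoch 7, and after epoch 8.
best_after_7 = best_epoch(valid_misclass[:8])
best_after_8 = best_epoch(valid_misclass)
```

After epoch 7 the best model is still the one from epoch 5, even though training continued for two more epochs. Epoch 8 then sets a new best, which is why it pays not to stop at the first uptick.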

Part 2: How to use pylearn2 to do softmax regression

Now that we've described the theory of what we're going to do, it's time to do it! This part describes how to use pylearn2 to run the algorithms described above.

How to load data in pylearn2, and specifically how to load the MNIST dataset

To train a model in pylearn2, we need to construct several objects specifying how to train it. There are two ways to do this. One is to explicitly construct them as python objects. The other is to specify them using YAML strings. The latter option is better supported at present, so we will use that.

In this ipython notebook, we will construct YAML strings in python. Most of the time when I use pylearn2, I write the yaml string out on disk, then run pylearn2's script on that YAML file. In the format of this tutorial, in an ipython notebook, it's easier to just do everything in python though.

YAML allows the definition of third-party tags that specify how the YAML string should be deserialized, and pylearn2 has a few of those. One of them is the !obj tag, which specifies that what follows is a full specification of a python callable that returns an object. Usually this will just be a class name.

In this tutorial, we will train our model on the MNIST dataset. In order to load that, we use an !obj tag to construct an instance of pylearn2's MNIST class, found in the pylearn2.datasets.mnist python module.

We can pass arguments to the MNIST class's init method by defining a dictionary mapping argument names to their values.
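Conceptually, an !obj tag followed by a dictionary amounts to "import this callable and call it with these keyword arguments." The sketch below mimics that idea with the standard library. It is not pylearn2's actual implementation, the helper name construct is hypothetical, and fractions.Fraction stands in for a class like MNIST so that the example runs without pylearn2.

```python
import importlib

def construct(dotted_name, kwargs):
    """Import the callable named by dotted_name and call it with kwargs."""
    module_name, attr_name = dotted_name.rsplit('.', 1)
    module = importlib.import_module(module_name)
    factory = getattr(module, attr_name)
    return factory(**kwargs)

# Rough analogue of: !obj:fractions.Fraction { numerator: 3, denominator: 4 }
value = construct('fractions.Fraction', {'numerator': 3, 'denominator': 4})
```

The dictionary keys play the role of argument names, which is why the YAML for MNIST uses keys such as which_set.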

The MNIST dataset is split into a training set and a test set. Since the object we are constructing now will be used as the training set, we must specify that we want to load the training data. We can use the 'which_set' argument to do this.

Finally, as described above, we will use early stopping, so we shouldn't train on the entire training set. The MNIST training set contains 60,000 examples. We use the 'start' and 'stop' arguments to train on the first 50,000 of them.

In [1]:
import os
import pylearn2
dirname = os.path.abspath(os.path.dirname('softmax_regression.ipynb'))
with open(os.path.join(dirname, 'sr_dataset.yaml'), 'r') as f:
    dataset = f.read()
hyper_params = {'train_stop' : 50000}
dataset = dataset % (hyper_params)
print dataset
!obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: 50000
}

How to configure the pylearn2 SoftmaxRegression model

Next, we need to specify an object representing the model to be trained. To do this, we need to make an instance of the SoftmaxRegression class defined in pylearn2.models.softmax_regression. We need to specify a few details of how to configure the model.

The "nvis" argument stands for "number of visible units." In neural network terminology, the "visible units" are the pieces of data that the model gets to observe. This argument is asking for the dimension of $x$. If we didn't want $x$ to be a vector, there is another more flexible way of configuring the input of the model, but for vector-based models, "nvis" is the easiest piece of the API to use. The MNIST dataset contains 28x28 grayscale images, not vectors, so the SoftmaxRegression model will ask pylearn2 to flatten the images into vectors. That means it will receive a vector with 28*28=784 elements.
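As a quick illustration of the flattening step, here is a numpy sketch. The image values are random stand-ins rather than real MNIST digits, and this is only meant to show the shapes involved, not how pylearn2 does the conversion internally.

```python
import numpy as np

# Stand-in batch of 5 grayscale images, 28 x 28 pixels each, values in [0, 1).
images = np.random.RandomState(0).uniform(size=(5, 28, 28))

# One row per image and one column per pixel: 28 * 28 = 784 columns.
vectors = images.reshape(images.shape[0], -1)
nvis = vectors.shape[1]
```

With this row-major layout, pixel (row 3, column 7) of an image ends up at position 28 * 3 + 7 of its vector.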

We also need to specify how many categories or classes there are with the "n_classes" argument.

Finally, the matrix $W$ will be randomly initialized. There are a few different initialization schemes in pylearn2. Specifying the "irange" argument will make each element of $W$ be initialized from $U(-\text{irange}, \text{irange})$. Since softmax regression training is a convex optimization problem, we can set irange to 0 to initialize all of $W$ to 0. (Some other models require that the different columns of $W$ differ from each other initially in order for them to train correctly)
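The effect of irange can be sketched in numpy as follows. The function name init_weights and the seed argument are hypothetical, and the sketch assumes the biases start at zero. With irange set to 0, every score is 0, so the model starts out predicting probability 0.1 for each of the 10 classes.

```python
import numpy as np

def init_weights(nvis, n_classes, irange, seed=0):
    """Draw each element of W from U(-irange, irange)."""
    rng = np.random.RandomState(seed)
    return rng.uniform(-irange, irange, size=(nvis, n_classes))

W = init_weights(784, 10, irange=0.0)   # all zeros
b = np.zeros(10)                        # assumed zero initial biases

x = np.ones(784)                        # any input gives the same result here
scores = x.dot(W) + b
p = np.exp(scores) / np.exp(scores).sum()
nll = -np.log(p[0])                     # log(10), whatever the true label is
```

The value log(10) is about 2.3026, which matches the objective reported before any training in the log in part 2.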

In [2]:
import os
import pylearn2
dirname = os.path.abspath(os.path.dirname('softmax_regression.ipynb'))
with open(os.path.join(dirname, 'sr_model.yaml'), 'r') as f:
    model = f.read()

print model
!obj:pylearn2.models.softmax_regression.SoftmaxRegression {
    n_classes: 10,
    irange: 0.,
    nvis: 784,
}

How to set up a pylearn2 training algorithm

Next, we need to specify a training algorithm to maximize the log likelihood with. (Actually, we will minimize the negative log likelihood, because all of pylearn2's optimization algorithms are written in terms of minimizing a cost function. theano will optimize out any double-negation that results, so this has no effect on the runtime of the algorithm)

We can use an !obj tag to load pylearn2's BGD class. BGD stands for batch gradient descent. It is a class designed to train models by moving in the direction of the gradient of the objective function applied to large batches of examples.

The "batch_size" argument determines how many examples the BGD class will act on at one time. This should be a fairly large number so that the updates are more likely to generalize to other batches.

Setting "line_search_mode" to exhaustive means that the BGD class will try to binary search for the best possible point along the direction of the gradient of the cost function, rather than just trying out a few pre-selected step sizes. This implements the method of steepest descent.
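To illustrate steepest descent with a line search, here is a toy sketch on a two-dimensional quadratic cost. Everything in it is an assumption made for illustration: the cost function, the search interval [0, 1], and the use of a ternary search (a close cousin of binary search for finding the minimum of a one-dimensional function). pylearn2's own line search may work differently.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
c = np.array([1.0, -1.0])

def cost(w):
    return 0.5 * w.dot(A).dot(w) - c.dot(w)

def gradient(w):
    return A.dot(w) - c

def line_search(w, direction, lo=0.0, hi=1.0, n_iters=100):
    """Ternary search for the step size t in [lo, hi] minimizing cost(w + t * direction)."""
    for _ in range(n_iters):
        left = lo + (hi - lo) / 3.0
        right = hi - (hi - lo) / 3.0
        if cost(w + left * direction) < cost(w + right * direction):
            hi = right    # the minimum cannot lie to the right of `right`
        else:
            lo = left     # the minimum cannot lie to the left of `left`
    return 0.5 * (lo + hi)

w = np.zeros(2)
for _ in range(100):
    g = gradient(w)
    if np.linalg.norm(g) < 1e-6:          # close enough to the minimum
        break
    direction = -g                        # steepest descent direction
    w = w + line_search(w, direction) * direction
```

For this cost the exact minimizer solves A w = c, which gives w = (0.6, -0.8), so the result is easy to check.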

"conjugate" is a boolean flag. By setting it to 1, we make BGD modify the gradient directions to preserve conjugacy prior to doing the line search. This implements nonlinear conjugate gradient descent.

During training, we will keep track of several different quantities of interest to the experimenter, such as the number of examples that are classified correctly, the objective function value, etc. The quantities to track are determined by the model class and by the training algorithm class. These quantities are referred to as "channels" and the act of tracking them is called "monitoring" in pylearn2 terms. In order to track them, we need to specify a monitoring dataset. In this case, we use a dictionary to make multiple, named monitoring datasets.

We use "*train" to define the training set. The * is YAML syntax saying to reference an object defined elsewhere in the YAML file. Later, when we specify which dataset to train on, we will define this reference.

Finally, the BGD algorithm needs to know when to stop training. We therefore give it a "termination criterion." In this case, we use a monitor-based termination criterion that says to stop when too little progress is being made at reducing the value tracked by one of the monitoring channels. In this case, we use "valid_y_misclass", which is the rate at which the model mislabels examples on the validation set. MonitorBased has some other arguments that we don't bother to specify here, and just use the defaults. These defaults will result in the training algorithm running for a while after the lowest value of the validation error has been reached, to make sure that we don't stop too soon just because the validation error randomly bounced upward for a few epochs.
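A stopping rule of this flavor can be sketched as below. This is not pylearn2's MonitorBased code: the function name, the parameter names n_epochs and min_rel_decrease, and their default values are all hypothetical, chosen only to show the idea of waiting a few epochs before giving up.

```python
def should_stop(history, n_epochs=5, min_rel_decrease=0.01):
    """Return True if the last n_epochs values failed to beat the earlier
    best value by at least the fraction min_rel_decrease."""
    if len(history) <= n_epochs:
        return False                      # not enough history to judge yet
    best_before = min(history[:-n_epochs])
    best_recent = min(history[-n_epochs:])
    return best_recent > (1.0 - min_rel_decrease) * best_before

# Made-up channel values: a quick drop followed by a long plateau.
plateau = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
# Made-up channel values: still improving, with one small bounce upward.
improving = [0.5, 0.3, 0.2, 0.21, 0.15, 0.12, 0.10]

stop_on_plateau = should_stop(plateau)
stop_while_improving = should_stop(improving)
```

The bounce from 0.2 to 0.21 in the second list does not trigger a stop, because the rule looks at the best value over a window of epochs rather than at a single step.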

You might expect the BGD algorithm to need to be told what objective function to minimize. It turns out that if the user doesn't say what objective function to minimize, BGD will ask the model for some default objective function, by calling the model's "get_default_cost" method. In this case, the SoftmaxRegression model provides the negative log likelihood as the default objective function.

In [3]:
import os
import pylearn2
dirname = os.path.abspath(os.path.dirname('softmax_regression.ipynb'))
with open(os.path.join(dirname, 'sr_algorithm.yaml'), 'r') as f:
    algorithm = f.read()
hyper_params = {'batch_size' : 10000,
                'valid_stop' : 60000}
algorithm = algorithm % (hyper_params)
print algorithm
!obj:pylearn2.training_algorithms.bgd.BGD {
        batch_size: 10000,
        line_search_mode: 'exhaustive',
        conjugate: 1,
        monitoring_dataset:
            {
                'train' : *train,
                'valid' : !obj:pylearn2.datasets.mnist.MNIST {
                              which_set: 'train',
                              start: 50000,
                              stop:  60000
                          },
                'test'  : !obj:pylearn2.datasets.mnist.MNIST {
                              which_set: 'test',
                          }
            },
        termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased {
            channel_name: "valid_y_misclass"
        }
    }

How to run training with the pylearn2 train script, and interpret its output

We now use a pylearn2 Train object to represent the training problem.

We use "&train" here to define the reference used with the "*train" line in the algorithm section.

We use the python %(varname)s syntax and the locals() dictionary to paste the dataset, model, and algorithm strings from the earlier sections into this final string here.
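Here is a small, self-contained illustration of this substitution mechanism. The template strings are made up for the example (they are not the contents of the sr_*.yaml files), and the function build exists only to give locals() a small, predictable set of names.

```python
# %(name)i inserts an integer looked up by name in a dictionary.
template = "stop: %(train_stop)i"
filled = template % {'train_stop': 50000}

def build():
    # Stand-ins for the YAML strings built in the earlier sections.
    dataset = "DATASET"
    model = "MODEL"
    # locals() is a dictionary of the local variables, so %(dataset)s and
    # %(model)s are replaced by the strings defined just above.
    return "dataset: %(dataset)s, model: %(model)s" % locals()

combined = build()
```

Using locals() saves writing out a dictionary by hand, at the price of making the template depend on the variable names in the surrounding code.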

As specified in the previous section, the model will keep training for a while after the lowest validation error is reached, just to make sure that the validation error won't start going down again. However, the final model we would like to return is the one with the lowest validation error. We add an "extension" to the training algorithm here. Extensions are objects with callbacks that get triggered at different points in time, such as the end of a training epoch. In this case, we use the MonitorBasedSaveBest extension. Whenever the monitoring channels are updated, MonitorBasedSaveBest will check if a specific channel decreased, and if so, it will save a copy of the model. This way, the best model is saved at the end. Here we save the model with the lowest validation set error to "softmax_regression_best.pkl."

In [4]:
import os
import pylearn2
dirname = os.path.abspath(os.path.dirname('softmax_regression.ipynb'))
with open(os.path.join(dirname, 'sr_train.yaml'), 'r') as f:
    train = f.read()
save_path = '.'
train = train % locals()

Execute the cell below to see the final YAML string.

In [5]:
print train
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.softmax_regression.SoftmaxRegression {
    n_classes: 10,
    irange: 0.,
    nvis: 784,
    },
    algorithm: !obj:pylearn2.training_algorithms.bgd.BGD {
        batch_size: 10000,
        line_search_mode: 'exhaustive',
        conjugate: 1,
        monitoring_dataset:
            {
                'train' : *train,
                'valid' : !obj:pylearn2.datasets.mnist.MNIST {
                              which_set: 'train',
                              start: 50000,
                              stop:  60000
                          },
                'test'  : !obj:pylearn2.datasets.mnist.MNIST {
                              which_set: 'test',
                          }
            },
        termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased {
            channel_name: "valid_y_misclass"
        }
    },
    extensions: [
        !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
             channel_name: 'valid_y_misclass',
             save_path: "softmax_regression_best.pkl"
        },
    ],
    save_path: "softmax_regression.pkl",
    save_freq: 1
}

Now, we use pylearn2's yaml_parse.load to construct the Train object, and run its main loop. The same thing could be accomplished by running pylearn2's script on a file containing the yaml string.

Execute the next cell to train the model. This will take a few minutes, and it will print out output periodically as it runs.

In [6]:
from pylearn2.config import yaml_parse
train = yaml_parse.load(train)
train.main_loop()
compiling begin_record_entry...
/u/almahaia/Code/pylearn2/pylearn2/models/ UserWarning: MLP changing the recursion limit.
  warnings.warn("MLP changing the recursion limit.")
compiling begin_record_entry done. Time elapsed: 0.127929 seconds
Monitored channels: 
Compiling accum...
graph size: 58
graph size: 53
graph size: 53
Compiling accum done. Time elapsed: 1.825620 seconds
Monitoring step:
	Epochs seen: 0
	Batches seen: 0
	Examples seen: 0
	ave_grad_mult: 0.0
	ave_grad_size: 0.0
	ave_step_size: 0.0
	test_objective: 2.30258509299
	test_y_col_norms_max: 0.0
	test_y_col_norms_mean: 0.0
	test_y_col_norms_min: 0.0
	test_y_max_max_class: 0.1
	test_y_mean_max_class: 0.1
	test_y_min_max_class: 0.1
	test_y_misclass: 0.902
	test_y_nll: 2.30258509299
	test_y_row_norms_max: 0.0
	test_y_row_norms_mean: 0.0
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 0.0
	train_objective: 2.30258509299
	train_y_col_norms_max: 0.0
	train_y_col_norms_mean: 0.0
	train_y_col_norms_min: 0.0
	train_y_max_max_class: 0.1
	train_y_mean_max_class: 0.1
	train_y_min_max_class: 0.1
	train_y_misclass: 0.90136
	train_y_nll: 2.30258509299
	train_y_row_norms_max: 0.0
	train_y_row_norms_mean: 0.0
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 0.0
	valid_objective: 2.30258509299
	valid_y_col_norms_max: 0.0
	valid_y_col_norms_mean: 0.0
	valid_y_col_norms_min: 0.0
	valid_y_max_max_class: 0.1
	valid_y_mean_max_class: 0.1
	valid_y_min_max_class: 0.1
	valid_y_misclass: 0.9009
	valid_y_nll: 2.30258509299
	valid_y_row_norms_max: 0.0
	valid_y_row_norms_mean: 0.0
	valid_y_row_norms_min: 0.0
Time this epoch: 47.135716 seconds
Monitoring step:
	Epochs seen: 1
	Batches seen: 5
	Examples seen: 50000
	ave_grad_mult: 2.55542355706
	ave_grad_size: 0.694843087116
	ave_step_size: 1.82795330924
	test_objective: 0.301359300793
	test_y_col_norms_max: 3.23311685335
	test_y_col_norms_mean: 2.91097673718
	test_y_col_norms_min: 2.20925662298
	test_y_max_max_class: 0.99999504546
	test_y_mean_max_class: 0.883456583251
	test_y_min_max_class: 0.18919041972
	test_y_misclass: 0.0824
	test_y_nll: 0.301359300793
	test_y_row_norms_max: 0.894549596168
	test_y_row_norms_mean: 0.245640441388
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 0.0
	train_objective: 0.312732697075
	train_y_col_norms_max: 3.23311685335
	train_y_col_norms_mean: 2.91097673718
	train_y_col_norms_min: 2.20925662298
	train_y_max_max_class: 0.999997104388
	train_y_mean_max_class: 0.878126747054
	train_y_min_max_class: 0.210295235229
	train_y_misclass: 0.08648
	train_y_nll: 0.312732697075
	train_y_row_norms_max: 0.894549596168
	train_y_row_norms_mean: 0.245640441388
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 47.135716
	valid_objective: 0.294293650438
	valid_y_col_norms_max: 3.23311685335
	valid_y_col_norms_mean: 2.91097673718
	valid_y_col_norms_min: 2.20925662298
	valid_y_max_max_class: 0.999998686662
	valid_y_mean_max_class: 0.885458000598
	valid_y_min_max_class: 0.175666181209
	valid_y_misclass: 0.0807
	valid_y_nll: 0.294293650438
	valid_y_row_norms_max: 0.894549596168
	valid_y_row_norms_mean: 0.245640441388
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.037422 seconds
Time this epoch: 48.883598 seconds
Monitoring step:
	Epochs seen: 2
	Batches seen: 10
	Examples seen: 100000
	ave_grad_mult: 2.56596440313
	ave_grad_size: 0.441610700411
	ave_step_size: 1.16066396621
	test_objective: 0.285237258524
	test_y_col_norms_max: 3.91262647813
	test_y_col_norms_mean: 3.46406578099
	test_y_col_norms_min: 2.63731792036
	test_y_max_max_class: 0.999998908541
	test_y_mean_max_class: 0.89510200425
	test_y_min_max_class: 0.172724479158
	test_y_misclass: 0.0786
	test_y_nll: 0.285237258524
	test_y_row_norms_max: 1.04003162633
	test_y_row_norms_mean: 0.300680907131
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 48.107276
	train_objective: 0.289143688973
	train_y_col_norms_max: 3.91262647813
	train_y_col_norms_mean: 3.46406578099
	train_y_col_norms_min: 2.63731792036
	train_y_max_max_class: 0.999999368949
	train_y_mean_max_class: 0.890736819369
	train_y_min_max_class: 0.224060606814
	train_y_misclass: 0.08084
	train_y_nll: 0.289143688973
	train_y_row_norms_max: 1.04003162633
	train_y_row_norms_mean: 0.300680907131
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.883598
	valid_objective: 0.276589904503
	valid_y_col_norms_max: 3.91262647813
	valid_y_col_norms_mean: 3.46406578099
	valid_y_col_norms_min: 2.63731792036
	valid_y_max_max_class: 0.999998435824
	valid_y_mean_max_class: 0.897311954342
	valid_y_min_max_class: 0.225660718987
	valid_y_misclass: 0.0775
	valid_y_nll: 0.276589904503
	valid_y_row_norms_max: 1.04003162633
	valid_y_row_norms_mean: 0.300680907131
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.032445 seconds
Time this epoch: 48.469979 seconds
Monitoring step:
	Epochs seen: 3
	Batches seen: 15
	Examples seen: 150000
	ave_grad_mult: 2.64427660828
	ave_grad_size: 0.28695642402
	ave_step_size: 0.756031211218
	test_objective: 0.280055491744
	test_y_col_norms_max: 4.38959255973
	test_y_col_norms_mean: 3.85022961032
	test_y_col_norms_min: 3.00824662805
	test_y_max_max_class: 0.999999413457
	test_y_mean_max_class: 0.900924742822
	test_y_min_max_class: 0.232922344039
	test_y_misclass: 0.0779
	test_y_nll: 0.280055491744
	test_y_row_norms_max: 1.12073028284
	test_y_row_norms_mean: 0.339333001502
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 49.867692
	train_objective: 0.278605710329
	train_y_col_norms_max: 4.38959255973
	train_y_col_norms_mean: 3.85022961032
	train_y_col_norms_min: 3.00824662805
	train_y_max_max_class: 0.999999704668
	train_y_mean_max_class: 0.896695616738
	train_y_min_max_class: 0.225369612588
	train_y_misclass: 0.0778
	train_y_nll: 0.278605710329
	train_y_row_norms_max: 1.12073028284
	train_y_row_norms_mean: 0.339333001502
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.469979
	valid_objective: 0.272806812447
	valid_y_col_norms_max: 4.38959255973
	valid_y_col_norms_mean: 3.85022961032
	valid_y_col_norms_min: 3.00824662805
	valid_y_max_max_class: 0.999998390007
	valid_y_mean_max_class: 0.902116310016
	valid_y_min_max_class: 0.222342784632
	valid_y_misclass: 0.0758
	valid_y_nll: 0.272806812447
	valid_y_row_norms_max: 1.12073028284
	valid_y_row_norms_mean: 0.339333001502
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.034981 seconds
Time this epoch: 48.332214 seconds
Monitoring step:
	Epochs seen: 4
	Batches seen: 20
	Examples seen: 200000
	ave_grad_mult: 2.7328310823
	ave_grad_size: 0.192327609673
	ave_step_size: 0.510358148581
	test_objective: 0.27844626375
	test_y_col_norms_max: 4.70789506799
	test_y_col_norms_mean: 4.16072458576
	test_y_col_norms_min: 3.2649938495
	test_y_max_max_class: 0.999999857874
	test_y_mean_max_class: 0.904837677386
	test_y_min_max_class: 0.235765614223
	test_y_misclass: 0.0784
	test_y_nll: 0.27844626375
	test_y_row_norms_max: 1.18924856325
	test_y_row_norms_mean: 0.370615415845
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 49.427461
	train_objective: 0.271860810833
	train_y_col_norms_max: 4.70789506799
	train_y_col_norms_mean: 4.16072458576
	train_y_col_norms_min: 3.2649938495
	train_y_max_max_class: 0.999999937335
	train_y_mean_max_class: 0.900762911307
	train_y_min_max_class: 0.215634585856
	train_y_misclass: 0.07636
	train_y_nll: 0.271860810833
	train_y_row_norms_max: 1.18924856325
	train_y_row_norms_mean: 0.370615415845
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.332214
	valid_objective: 0.26726459723
	valid_y_col_norms_max: 4.70789506799
	valid_y_col_norms_mean: 4.16072458576
	valid_y_col_norms_min: 3.2649938495
	valid_y_max_max_class: 0.99999968428
	valid_y_mean_max_class: 0.906190087062
	valid_y_min_max_class: 0.242091672253
	valid_y_misclass: 0.0746
	valid_y_nll: 0.26726459723
	valid_y_row_norms_max: 1.18924856325
	valid_y_row_norms_mean: 0.370615415845
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.032767 seconds
Time this epoch: 48.096713 seconds
Monitoring step:
	Epochs seen: 5
	Batches seen: 25
	Examples seen: 250000
	ave_grad_mult: 2.79555296073
	ave_grad_size: 0.135665128631
	ave_step_size: 0.362963828727
	test_objective: 0.273817364497
	test_y_col_norms_max: 5.03140113083
	test_y_col_norms_mean: 4.43736580535
	test_y_col_norms_min: 3.4996553347
	test_y_max_max_class: 0.999999924695
	test_y_mean_max_class: 0.908420561625
	test_y_min_max_class: 0.20105158513
	test_y_misclass: 0.0784
	test_y_nll: 0.273817364497
	test_y_row_norms_max: 1.27503247719
	test_y_row_norms_mean: 0.398188014557
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 49.265087
	train_objective: 0.266190019836
	train_y_col_norms_max: 5.03140113083
	train_y_col_norms_mean: 4.43736580535
	train_y_col_norms_min: 3.4996553347
	train_y_max_max_class: 0.999999949637
	train_y_mean_max_class: 0.904266352694
	train_y_min_max_class: 0.214983817154
	train_y_misclass: 0.0747
	train_y_nll: 0.266190019836
	train_y_row_norms_max: 1.27503247719
	train_y_row_norms_mean: 0.398188014557
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.096713
	valid_objective: 0.262773057665
	valid_y_col_norms_max: 5.03140113083
	valid_y_col_norms_mean: 4.43736580535
	valid_y_col_norms_min: 3.4996553347
	valid_y_max_max_class: 0.999999834998
	valid_y_mean_max_class: 0.90977309107
	valid_y_min_max_class: 0.227467467432
	valid_y_misclass: 0.0734
	valid_y_nll: 0.262773057665
	valid_y_row_norms_max: 1.27503247719
	valid_y_row_norms_mean: 0.398188014557
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.040086 seconds
Time this epoch: 48.270360 seconds
Monitoring step:
	Epochs seen: 6
	Batches seen: 30
	Examples seen: 300000
	ave_grad_mult: 2.86472520335
	ave_grad_size: 0.0995817704689
	ave_step_size: 0.27145607381
	test_objective: 0.274650362478
	test_y_col_norms_max: 5.29667864648
	test_y_col_norms_mean: 4.6681660556
	test_y_col_norms_min: 3.69537437025
	test_y_max_max_class: 0.999999953454
	test_y_mean_max_class: 0.909515981021
	test_y_min_max_class: 0.240807995221
	test_y_misclass: 0.0762
	test_y_nll: 0.274650362478
	test_y_row_norms_max: 1.34757538106
	test_y_row_norms_mean: 0.421872641122
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 49.063998
	train_objective: 0.263465024685
	train_y_col_norms_max: 5.29667864648
	train_y_col_norms_mean: 4.6681660556
	train_y_col_norms_min: 3.69537437025
	train_y_max_max_class: 0.999999967452
	train_y_mean_max_class: 0.904810346072
	train_y_min_max_class: 0.222843798769
	train_y_misclass: 0.07312
	train_y_nll: 0.263465024685
	train_y_row_norms_max: 1.34757538106
	train_y_row_norms_mean: 0.421872641122
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.27036
	valid_objective: 0.264160131695
	valid_y_col_norms_max: 5.29667864648
	valid_y_col_norms_mean: 4.6681660556
	valid_y_col_norms_min: 3.69537437025
	valid_y_max_max_class: 0.999999944173
	valid_y_mean_max_class: 0.910249991543
	valid_y_min_max_class: 0.230041435408
	valid_y_misclass: 0.0738
	valid_y_nll: 0.264160131695
	valid_y_row_norms_max: 1.34757538106
	valid_y_row_norms_mean: 0.421872641122
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.343817 seconds
Time this epoch: 48.146142 seconds
Monitoring step:
	Epochs seen: 7
	Batches seen: 35
	Examples seen: 350000
	ave_grad_mult: 2.94614410339
	ave_grad_size: 0.0791918096972
	ave_step_size: 0.22197684405
	test_objective: 0.27202668856
	test_y_col_norms_max: 5.53374863965
	test_y_col_norms_mean: 4.89227673333
	test_y_col_norms_min: 3.8870453917
	test_y_max_max_class: 0.999999976028
	test_y_mean_max_class: 0.91265320303
	test_y_min_max_class: 0.224122342798
	test_y_misclass: 0.0774
	test_y_nll: 0.27202668856
	test_y_row_norms_max: 1.41189262807
	test_y_row_norms_mean: 0.444192206987
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 49.509842
	train_objective: 0.260971194008
	train_y_col_norms_max: 5.53374863965
	train_y_col_norms_mean: 4.89227673333
	train_y_col_norms_min: 3.8870453917
	train_y_max_max_class: 0.999999976047
	train_y_mean_max_class: 0.908725530838
	train_y_min_max_class: 0.232349314658
	train_y_misclass: 0.0732
	train_y_nll: 0.260971194008
	train_y_row_norms_max: 1.41189262807
	train_y_row_norms_mean: 0.444192206987
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.146142
	valid_objective: 0.26436051024
	valid_y_col_norms_max: 5.53374863965
	valid_y_col_norms_mean: 4.89227673333
	valid_y_col_norms_min: 3.8870453917
	valid_y_max_max_class: 0.999999949706
	valid_y_mean_max_class: 0.912402441241
	valid_y_min_max_class: 0.22991605073
	valid_y_misclass: 0.0738
	valid_y_nll: 0.26436051024
	valid_y_row_norms_max: 1.41189262807
	valid_y_row_norms_mean: 0.444192206987
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 1.046755 seconds
Time this epoch: 48.541562 seconds
Monitoring step:
	Epochs seen: 8
	Batches seen: 40
	Examples seen: 400000
	ave_grad_mult: 2.9589170095
	ave_grad_size: 0.0661130527395
	ave_step_size: 0.187367468335
	test_objective: 0.270976679584
	test_y_col_norms_max: 5.75592674885
	test_y_col_norms_mean: 5.08230762173
	test_y_col_norms_min: 4.02942401417
	test_y_max_max_class: 0.999999960476
	test_y_mean_max_class: 0.911246352209
	test_y_min_max_class: 0.202601016211
	test_y_misclass: 0.0765
	test_y_nll: 0.270976679584
	test_y_row_norms_max: 1.5186928872
	test_y_row_norms_mean: 0.46350948492
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 50.092236
	train_objective: 0.256781505833
	train_y_col_norms_max: 5.75592674885
	train_y_col_norms_mean: 5.08230762173
	train_y_col_norms_min: 4.02942401417
	train_y_max_max_class: 0.99999996249
	train_y_mean_max_class: 0.907843630275
	train_y_min_max_class: 0.227267591038
	train_y_misclass: 0.07108
	train_y_nll: 0.256781505833
	train_y_row_norms_max: 1.5186928872
	train_y_row_norms_mean: 0.46350948492
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.541562
	valid_objective: 0.261108444735
	valid_y_col_norms_max: 5.75592674885
	valid_y_col_norms_mean: 5.08230762173
	valid_y_col_norms_min: 4.02942401417
	valid_y_max_max_class: 0.999999906762
	valid_y_mean_max_class: 0.912796132628
	valid_y_min_max_class: 0.240817912865
	valid_y_misclass: 0.0717
	valid_y_nll: 0.261108444735
	valid_y_row_norms_max: 1.5186928872
	valid_y_row_norms_mean: 0.46350948492
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.074899 seconds
Time this epoch: 48.921041 seconds
Monitoring step:
	Epochs seen: 9
	Batches seen: 45
	Examples seen: 450000
	ave_grad_mult: 3.07640978739
	ave_grad_size: 0.0578164084249
	ave_step_size: 0.168970324213
	test_objective: 0.269887997532
	test_y_col_norms_max: 5.97236412318
	test_y_col_norms_mean: 5.28605773752
	test_y_col_norms_min: 4.20493453263
	test_y_max_max_class: 0.999999964196
	test_y_mean_max_class: 0.914368745924
	test_y_min_max_class: 0.232516262448
	test_y_misclass: 0.0759
	test_y_nll: 0.269887997532
	test_y_row_norms_max: 1.61850785481
	test_y_row_norms_mean: 0.483502165396
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 52.954261
	train_objective: 0.255329041014
	train_y_col_norms_max: 5.97236412318
	train_y_col_norms_mean: 5.28605773752
	train_y_col_norms_min: 4.20493453263
	train_y_max_max_class: 0.999999968304
	train_y_mean_max_class: 0.911166781743
	train_y_min_max_class: 0.233844764224
	train_y_misclass: 0.07102
	train_y_nll: 0.255329041014
	train_y_row_norms_max: 1.61850785481
	train_y_row_norms_mean: 0.483502165396
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.921041
	valid_objective: 0.260598210855
	valid_y_col_norms_max: 5.97236412318
	valid_y_col_norms_mean: 5.28605773752
	valid_y_col_norms_min: 4.20493453263
	valid_y_max_max_class: 0.99999995121
	valid_y_mean_max_class: 0.9160507243
	valid_y_min_max_class: 0.247938293814
	valid_y_misclass: 0.0707
	valid_y_nll: 0.260598210855
	valid_y_row_norms_max: 1.61850785481
	valid_y_row_norms_mean: 0.483502165396
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.045346 seconds
Time this epoch: 47.756314 seconds
Monitoring step:
	Epochs seen: 10
	Batches seen: 50
	Examples seen: 500000
	ave_grad_mult: 3.17751170721
	ave_grad_size: 0.0527696885026
	ave_step_size: 0.161604222181
	test_objective: 0.272032441361
	test_y_col_norms_max: 6.15020937219
	test_y_col_norms_mean: 5.45818015281
	test_y_col_norms_min: 4.32908868031
	test_y_max_max_class: 0.999999982392
	test_y_mean_max_class: 0.912849967862
	test_y_min_max_class: 0.243103761542
	test_y_misclass: 0.0766
	test_y_nll: 0.272032441361
	test_y_row_norms_max: 1.70016734551
	test_y_row_norms_mean: 0.500912519066
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 50.532318
	train_objective: 0.253810005663
	train_y_col_norms_max: 6.15020937219
	train_y_col_norms_mean: 5.45818015281
	train_y_col_norms_min: 4.32908868031
	train_y_max_max_class: 0.999999988222
	train_y_mean_max_class: 0.909491011432
	train_y_min_max_class: 0.245272533783
	train_y_misclass: 0.07128
	train_y_nll: 0.253810005663
	train_y_row_norms_max: 1.70016734551
	train_y_row_norms_mean: 0.500912519066
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 47.756314
	valid_objective: 0.262035188475
	valid_y_col_norms_max: 6.15020937219
	valid_y_col_norms_mean: 5.45818015281
	valid_y_col_norms_min: 4.32908868031
	valid_y_max_max_class: 0.999999987956
	valid_y_mean_max_class: 0.913773812077
	valid_y_min_max_class: 0.257571659974
	valid_y_misclass: 0.0732
	valid_y_nll: 0.262035188475
	valid_y_row_norms_max: 1.70016734551
	valid_y_row_norms_mean: 0.500912519066
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.078452 seconds
Time this epoch: 48.350366 seconds
Monitoring step:
	Epochs seen: 11
	Batches seen: 55
	Examples seen: 550000
	ave_grad_mult: 3.20139199162
	ave_grad_size: 0.0497989460367
	ave_step_size: 0.15526891131
	test_objective: 0.269176335486
	test_y_col_norms_max: 6.35982269721
	test_y_col_norms_mean: 5.62561655342
	test_y_col_norms_min: 4.51263517136
	test_y_max_max_class: 0.999999982551
	test_y_mean_max_class: 0.913888237951
	test_y_min_max_class: 0.211160862124
	test_y_misclass: 0.0769
	test_y_nll: 0.269176335486
	test_y_row_norms_max: 1.77393186259
	test_y_row_norms_mean: 0.517690625322
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 48.733069
	train_objective: 0.251423795062
	train_y_col_norms_max: 6.35982269721
	train_y_col_norms_mean: 5.62561655342
	train_y_col_norms_min: 4.51263517136
	train_y_max_max_class: 0.999999981008
	train_y_mean_max_class: 0.910697294132
	train_y_min_max_class: 0.22384883143
	train_y_misclass: 0.07048
	train_y_nll: 0.251423795062
	train_y_row_norms_max: 1.77393186259
	train_y_row_norms_mean: 0.517690625322
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.350366
	valid_objective: 0.260297250861
	valid_y_col_norms_max: 6.35982269721
	valid_y_col_norms_mean: 5.62561655342
	valid_y_col_norms_min: 4.51263517136
	valid_y_max_max_class: 0.999999976584
	valid_y_mean_max_class: 0.914831005565
	valid_y_min_max_class: 0.253224141585
	valid_y_misclass: 0.0707
	valid_y_nll: 0.260297250861
	valid_y_row_norms_max: 1.77393186259
	valid_y_row_norms_mean: 0.517690625322
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.287064 seconds
Time this epoch: 48.337238 seconds
Monitoring step:
	Epochs seen: 12
	Batches seen: 60
	Examples seen: 600000
	ave_grad_mult: 3.18421082504
	ave_grad_size: 0.0480066437265
	ave_step_size: 0.149554335764
	test_objective: 0.268167696714
	test_y_col_norms_max: 6.53694805673
	test_y_col_norms_mean: 5.78651338071
	test_y_col_norms_min: 4.62123127003
	test_y_max_max_class: 0.999999994824
	test_y_mean_max_class: 0.916031399063
	test_y_min_max_class: 0.253071105052
	test_y_misclass: 0.0745
	test_y_nll: 0.268167696714
	test_y_row_norms_max: 1.82371102711
	test_y_row_norms_mean: 0.533708300513
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 49.521323
	train_objective: 0.250407471096
	train_y_col_norms_max: 6.53694805673
	train_y_col_norms_mean: 5.78651338071
	train_y_col_norms_min: 4.62123127003
	train_y_max_max_class: 0.999999992194
	train_y_mean_max_class: 0.912168287933
	train_y_min_max_class: 0.234702556895
	train_y_misclass: 0.07012
	train_y_nll: 0.250407471096
	train_y_row_norms_max: 1.82371102711
	train_y_row_norms_mean: 0.533708300513
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.337238
	valid_objective: 0.261415798892
	valid_y_col_norms_max: 6.53694805673
	valid_y_col_norms_mean: 5.78651338071
	valid_y_col_norms_min: 4.62123127003
	valid_y_max_max_class: 0.999999982776
	valid_y_mean_max_class: 0.916843313029
	valid_y_min_max_class: 0.236225524738
	valid_y_misclass: 0.0721
	valid_y_nll: 0.261415798892
	valid_y_row_norms_max: 1.82371102711
	valid_y_row_norms_mean: 0.533708300513
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.076885 seconds
Time this epoch: 48.408465 seconds
Monitoring step:
	Epochs seen: 13
	Batches seen: 65
	Examples seen: 650000
	ave_grad_mult: 3.32896781367
	ave_grad_size: 0.0465828940367
	ave_step_size: 0.152447504409
	test_objective: 0.273306774992
	test_y_col_norms_max: 6.73632193798
	test_y_col_norms_mean: 5.95297548891
	test_y_col_norms_min: 4.7863381339
	test_y_max_max_class: 0.999999991705
	test_y_mean_max_class: 0.916068767119
	test_y_min_max_class: 0.228716432447
	test_y_misclass: 0.0765
	test_y_nll: 0.273306774992
	test_y_row_norms_max: 1.908978446
	test_y_row_norms_mean: 0.550136388278
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 49.294602
	train_objective: 0.250124890774
	train_y_col_norms_max: 6.73632193798
	train_y_col_norms_mean: 5.95297548891
	train_y_col_norms_min: 4.7863381339
	train_y_max_max_class: 0.99999999497
	train_y_mean_max_class: 0.912162787419
	train_y_min_max_class: 0.242706558496
	train_y_misclass: 0.06994
	train_y_nll: 0.250124890774
	train_y_row_norms_max: 1.908978446
	train_y_row_norms_mean: 0.550136388278
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.408465
	valid_objective: 0.264447621587
	valid_y_col_norms_max: 6.73632193798
	valid_y_col_norms_mean: 5.95297548891
	valid_y_col_norms_min: 4.7863381339
	valid_y_max_max_class: 0.999999995746
	valid_y_mean_max_class: 0.916910058012
	valid_y_min_max_class: 0.232430962179
	valid_y_misclass: 0.0726
	valid_y_nll: 0.264447621587
	valid_y_row_norms_max: 1.908978446
	valid_y_row_norms_mean: 0.550136388278
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 1.955159 seconds
Time this epoch: 48.118448 seconds
Monitoring step:
	Epochs seen: 14
	Batches seen: 70
	Examples seen: 700000
	ave_grad_mult: 3.3627779367
	ave_grad_size: 0.0461675912642
	ave_step_size: 0.152041664159
	test_objective: 0.271266060728
	test_y_col_norms_max: 6.9214493801
	test_y_col_norms_mean: 6.10749929728
	test_y_col_norms_min: 4.91335986129
	test_y_max_max_class: 0.999999995112
	test_y_mean_max_class: 0.91712864321
	test_y_min_max_class: 0.240703665988
	test_y_misclass: 0.0766
	test_y_nll: 0.271266060728
	test_y_row_norms_max: 1.96655587113
	test_y_row_norms_mean: 0.565461485822
	test_y_row_norms_min: 0.0
	total_seconds_last_epoch: 51.252898
	train_objective: 0.247700760285
	train_y_col_norms_max: 6.9214493801
	train_y_col_norms_mean: 6.10749929728
	train_y_col_norms_min: 4.91335986129
	train_y_max_max_class: 0.999999996171
	train_y_mean_max_class: 0.913135796548
	train_y_min_max_class: 0.237545493213
	train_y_misclass: 0.0687
	train_y_nll: 0.247700760285
	train_y_row_norms_max: 1.96655587113
	train_y_row_norms_mean: 0.565461485822
	train_y_row_norms_min: 0.0
	training_seconds_this_epoch: 48.118448
	valid_objective: 0.261790115276
	valid_y_col_norms_max: 6.9214493801
	valid_y_col_norms_mean: 6.10749929728
	valid_y_col_norms_min: 4.91335986129
	valid_y_max_max_class: 0.999999994777
	valid_y_mean_max_class: 0.917263145852
	valid_y_min_max_class: 0.238750828718
	valid_y_misclass: 0.0718
	valid_y_nll: 0.261790115276
	valid_y_row_norms_max: 1.96655587113
	valid_y_row_norms_mean: 0.565461485822
	valid_y_row_norms_min: 0.0
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.047610 seconds
Saving to softmax_regression.pkl...
Saving to softmax_regression.pkl done. Time elapsed: 0.072144 seconds

As the model trained, it should have printed out progress messages. Most of these are the values of the various channels being monitored throughout training.
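
The progress messages follow a regular layout: a "Monitoring step:" header, then one tab-indented "name: value" line per channel. Here is a minimal sketch of collecting them into one dictionary per monitoring step, assuming the output was captured as a string. parse_monitor_log is a hypothetical helper written for this illustration, not part of pylearn2.

```python
def parse_monitor_log(text):
    """Return a list with one {channel_name: value} dict per monitoring step."""
    steps = []
    current = None
    for line in text.splitlines():
        if line.strip() == "Monitoring step:":
            # A new monitoring report begins.
            current = {}
            steps.append(current)
        elif current is not None and line[:1].isspace() and ":" in line:
            # Channel lines are indented; "Saving to ..." lines are not.
            name, _, value = line.strip().partition(":")
            try:
                current[name.strip()] = float(value)
            except ValueError:
                pass  # indented line without a numeric value; ignore it
    return steps


# A short excerpt in the same layout as the output above.
sample = (
    "Time this epoch: 48.146142 seconds\n"
    "Monitoring step:\n"
    "\tEpochs seen: 7\n"
    "\ttest_y_misclass: 0.0774\n"
    "\tvalid_y_misclass: 0.0738\n"
    "Saving to softmax_regression.pkl...\n"
)
steps = parse_monitor_log(sample)
print(steps[0]["valid_y_misclass"])
```

Each dictionary can then be compared across epochs, for example to see how valid_y_misclass changes as training proceeds.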

How to analyze the results of training

We can use the print_monitor script to print the last monitoring entry of a saved model. By running it on "softmax_regression_best.pkl", we can see the performance of the model at the point where it did best on the validation set. Executing the next cell (the leading ! tells ipython to run a shell command) shows that the test set misclassification rate is 0.0759, obtained after training for 9 epochs.

In [7]:
! print_monitor.py softmax_regression_best.pkl
/u/almahaia/Code/pylearn2/pylearn2/models/ UserWarning: MLP changing the recursion limit.
  warnings.warn("MLP changing the recursion limit.")
epochs seen:  9
time trained:  458.503871202
ave_grad_mult : 3.07640978739
ave_grad_size : 0.0578164084249
ave_step_size : 0.168970324213
test_objective : 0.269887997532
test_y_col_norms_max : 5.97236412318
test_y_col_norms_mean : 5.28605773752
test_y_col_norms_min : 4.20493453263
test_y_max_max_class : 0.999999964196
test_y_mean_max_class : 0.914368745924
test_y_min_max_class : 0.232516262448
test_y_misclass : 0.0759
test_y_nll : 0.269887997532
test_y_row_norms_max : 1.61850785481
test_y_row_norms_mean : 0.483502165396
test_y_row_norms_min : 0.0
total_seconds_last_epoch : 52.954261
train_objective : 0.255329041014
train_y_col_norms_max : 5.97236412318
train_y_col_norms_mean : 5.28605773752
train_y_col_norms_min : 4.20493453263
train_y_max_max_class : 0.999999968304
train_y_mean_max_class : 0.911166781743
train_y_min_max_class : 0.233844764224
train_y_misclass : 0.07102
train_y_nll : 0.255329041014
train_y_row_norms_max : 1.61850785481
train_y_row_norms_mean : 0.483502165396
train_y_row_norms_min : 0.0
training_seconds_this_epoch : 48.921041
valid_objective : 0.260598210855
valid_y_col_norms_max : 5.97236412318
valid_y_col_norms_mean : 5.28605773752
valid_y_col_norms_min : 4.20493453263
valid_y_max_max_class : 0.99999995121
valid_y_mean_max_class : 0.9160507243
valid_y_min_max_class : 0.247938293814
valid_y_misclass : 0.0707
valid_y_nll : 0.260598210855
valid_y_row_norms_max : 1.61850785481
valid_y_row_norms_mean : 0.483502165396
valid_y_row_norms_min : 0.0
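
The saved "best" model corresponds to the epoch with the lowest validation misclassification rate. Here is a minimal sketch of that selection, using the valid_y_misclass and test_y_misclass values copied from the training output above for epochs 7 through 14. The tie-breaking rule (the earliest epoch wins) is an assumption chosen to agree with the output above, where epochs 9 and 11 both reach 0.0707 and the saved model reports 9 epochs.

```python
# (epochs seen, valid_y_misclass, test_y_misclass), copied from the log above.
records = [
    (7, 0.0738, 0.0774),
    (8, 0.0717, 0.0765),
    (9, 0.0707, 0.0759),
    (10, 0.0732, 0.0766),
    (11, 0.0707, 0.0769),
    (12, 0.0721, 0.0745),
    (13, 0.0726, 0.0765),
    (14, 0.0718, 0.0766),
]

# Select on the validation error only. min() returns the first minimal
# item, so a tie goes to the earliest epoch.
best_epoch, best_valid, test_at_best = min(records, key=lambda r: r[1])
print(best_epoch, best_valid, test_at_best)
```

Note that the test error at the selected epoch (0.0759) is not the lowest test error in the log (0.0745 at epoch 12). Choosing the epoch by test error would make the reported figure optimistically biased, which is why the selection uses the validation set.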

Another common way of analyzing trained models is to look at their weights. Here we use the show_weights script to visualize $W$:

In [1]:
! show_weights.py softmax_regression_best.pkl
making weights report
loading model
/u/almahaia/Code/pylearn2/pylearn2/models/ UserWarning: MLP changing the recursion limit.
  warnings.warn("MLP changing the recursion limit.")
loading done
loading dataset...
smallest enc weight magnitude: 0.0
mean enc weight magnitude: 0.121750386838
max enc weight magnitude: 1.46967125826
min norm:  4.20493453263
mean norm:  5.28605773752
max norm:  5.97236412318
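
The three norms printed by the script agree with the col_norms channels in the monitor output above (4.20493453263, 5.28605773752, 5.97236412318). Here is a minimal sketch of how such column norms can be computed, assuming each column of $W$ holds the weights for one class and that the norm is the Euclidean norm. The tiny matrix and the helper column_norms are made up for illustration.

```python
import math


def column_norms(W):
    """Euclidean norm of each column of W, given as a list of rows."""
    n_cols = len(W[0])
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(n_cols)]


# Toy weight matrix: 3 input features (rows), 2 classes (columns).
W = [
    [3.0, 0.0],
    [4.0, 0.0],
    [0.0, 2.0],
]
norms = column_norms(W)

# The same three summaries the report prints: min, mean, and max norm.
print(min(norms), sum(norms) / len(norms), max(norms))
```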

Further reading

You can find more information on softmax regression from the following sources:

LISA lab's Deep Learning Tutorials: Classifying MNIST digits using Logistic Regression

Stanford's Unsupervised Feature Learning and Deep Learning wiki: Softmax Regression

This is by no means a complete list.