import torch
import warnings
import sys
sys.path.append('/home/jovyan/work/d2l_solutions/notebooks/exercises/d2l_utils/')
import d2l
warnings.filterwarnings('ignore')
class Classifier(d2l.Module):
    """Base class for classification models."""
    def validation_step(self, batch):
        # Log loss and accuracy on each validation minibatch.
        y_hat = self(*batch[:-1])
        self.plot('loss', self.loss(y_hat, batch[-1]), train=False)
        self.plot('acc', self.accuracy(y_hat, batch[-1]), train=False)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr)

    def accuracy(self, y_hat, y, averaged=True):
        """Compute the fraction of correct predictions."""
        y_hat = y_hat.reshape((-1, y_hat.shape[-1]))
        preds = y_hat.argmax(axis=1).type(y.dtype)
        compare = (preds == y.reshape(-1)).type(torch.float32)
        return compare.mean() if averaged else compare
We assume that the validation dataset contains $N$ samples and that each minibatch contains $M$ samples, so there are $N/M$ minibatches in total. The quick and dirty estimate $L_v^q$ is computed by averaging the per-minibatch losses $l_v^{b_i}$. Since each minibatch loss is itself an average over its $M$ samples, the true validation loss $L_v$, i.e. the average over all $N$ samples, can be expressed in terms of $l_v^b$, $N$, and $M$ as an average of the batch losses: $$L_v = \frac{M}{N} \sum_{i=1}^{N/M} l_v^{b_i}$$ When $M$ divides $N$ exactly, this is precisely the mean of the $N/M$ batch losses, so $L_v = L_v^q$.
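A quick numerical check of this identity (a minimal sketch; the sizes, random logits, and cross-entropy loss are assumed for illustration only):

    import torch
    import torch.nn.functional as F

    N, M, num_classes = 120, 30, 10  # assumed sizes; M divides N
    logits = torch.randn(N, num_classes)
    labels = torch.randint(0, num_classes, (N,))

    # True validation loss: average cross-entropy over all N samples.
    L_v = F.cross_entropy(logits, labels)

    # Quick and dirty estimate: average of the per-minibatch losses.
    batch_losses = [F.cross_entropy(logits[i:i + M], labels[i:i + M])
                    for i in range(0, N, M)]
    L_v_q = torch.stack(batch_losses).mean()

    print(torch.isclose(L_v, L_v_q))  # tensor(True) when M divides N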
Hint: express the expected loss, using $l$ and $p(y|x)$.
The optimal selection of $y^\prime$ in a multiclass classification problem can be formulated via the expected loss. Let $p(y \mid x)$ be the probability that the true class is $y$ given input $x$, and let $l(y, y^\prime)$ be the loss incurred when the true class is $y$ and we predict $y^\prime$. The expected loss of predicting $y^\prime$ is the loss averaged over the conditional distribution of the true class, $\mathbb{E}_{y \sim p(y \mid x)}[l(y, y^\prime)] = \sum_{y} p(y \mid x)\, l(y, y^\prime)$. The optimal prediction is the one that minimizes this expectation: $$y^\prime = \arg\min_{y^\prime} \sum_{y} p(y \mid x) \cdot l(y, y^\prime)$$
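A small sketch of evaluating this rule directly (the three-class posterior and the asymmetric loss matrix below are made-up illustrative values):

    import torch

    # Assumed example: 3 classes, posterior p(y|x) and a loss matrix l[y, y_prime].
    p_y_given_x = torch.tensor([0.1, 0.6, 0.3])
    loss = torch.tensor([[0., 1., 4.],   # true class 0
                         [1., 0., 1.],   # true class 1
                         [4., 1., 0.]])  # true class 2

    # Expected loss of each candidate prediction y': sum_y p(y|x) * l(y, y').
    expected_loss = p_y_given_x @ loss   # shape: (num_classes,)
    y_prime = expected_loss.argmin()     # optimal prediction under this loss
    print(expected_loss, y_prime)        # tensor([1.8000, 0.4000, 1.0000]) tensor(1)

With a plain 0-1 loss this rule reduces to picking $\arg\max_{y} p(y \mid x)$; an asymmetric loss matrix like the one above can shift the optimal prediction away from the most probable class.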