#!/usr/bin/env python
# coding: utf-8

# # Chapter 7 - Training a State-of-the-Art Model
# > Deep Learning for Coders with fastai & PyTorch - Training a State-of-the-Art Model. This chapter is a bit different: it covers several techniques that make results better. My plan is to take notes on these practical techniques and come back to them later when I need them.
# - toc: true
# - badges: true
# - comments: true
# - categories: [fastbook]
# - image: images/fastbook_images/chapter-07/taner_ceylan.png

# ![](images/chapter-07/taner_ceylan.png)

# This is my favorite Turkish coffee cup, designed by German-Turkish artist Taner Ceylan; check out his work [here](https://www.tanerceylan.com).

# In[19]:

#!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
get_ipython().run_line_magic('config', 'Completer.use_jedi = False')

# ## Imagenette

# `Imagenette` is a subset of `ImageNet` that contains 10 classes from the full ImageNet which look very different from one another. Given the size of ImageNet, prototyping a project on the full dataset is costly and time consuming. Smaller datasets let you run many more experiments and can give you insight into your project's direction.

# In[3]:

from fastai.vision.all import *
path = untar_data(URLs.IMAGENETTE)

# In[4]:

dblock = DataBlock(blocks=(ImageBlock(), CategoryBlock()),
                   get_items=get_image_files,
                   get_y=parent_label,
                   item_tfms=Resize(460),
                   batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = dblock.dataloaders(path, bs=64)

# In[5]:

model = xresnet50(n_out=dls.c)
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)

# ## Normalization

# Normalized data helps the model train better. Normalization means your data has a mean of 0 and a standard deviation of 1. But image data is usually encoded as numbers between 0 and 255, or sometimes between 0 and 1. Let's check the data in Imagenette:

# In[6]:

x, y = dls.one_batch()
x.mean(dim=[0, 2, 3]), x.std(dim=[0, 2, 3])

# Our data has a mean of around 0.5 and a standard deviation of around 0.3, so it is not in the desired range. With fastai we can normalize our data by adding the `Normalize` transform:

# In[7]:

def get_dls(bs, size):
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_items=get_image_files,
                       get_y=parent_label,
                       item_tfms=Resize(460),
                       batch_tfms=[*aug_transforms(size=size, min_scale=0.75),
                                   Normalize.from_stats(*imagenet_stats)])
    return dblock.dataloaders(path, bs=bs)

# In[8]:

dls = get_dls(64, 224)

# In[9]:

x, y = dls.one_batch()
x.mean(dim=[0, 2, 3]), x.std(dim=[0, 2, 3])

# Now it is better. Let's check whether it helps the training process, using the same training code again.

# In[10]:

model = xresnet50(n_out=dls.c)
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)

# A little bit better, but normalization matters much more when we use a pretrained model: normalizing our data with the statistics the original model was trained on gives better transfer learning results.
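# Since we are training `xresnet50` from scratch here, `imagenet_stats` is not strictly required; as a side note, if you wanted to normalize with statistics computed from your own data instead, a minimal sketch (assuming a single batch, taken from dataloaders that do *not* already include `Normalize`, is representative enough) could look like this:
#
# ```python
# # Estimate channel-wise mean/std from one un-normalized batch and build a
# # Normalize transform from those statistics. Only a sketch: for a pretrained
# # model you should keep the statistics it was originally trained with.
# x, y = dls.one_batch()
# my_stats = (x.mean(dim=[0, 2, 3]), x.std(dim=[0, 2, 3]))
# norm = Normalize.from_stats(*my_stats)
# ```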
# ## Progressive Resizing

# from the book:
#
# Spending most of the epochs training with small images helps training complete much faster. Completing training using large images makes the final accuracy much higher. We call this approach progressive resizing.

# ### My check on using progressive resizing

# In[11]:

import time
start_time = time.time()

# start training with small (128px) images
dls = get_dls(128, 128)
learn = Learner(dls, xresnet50(n_out=dls.c), loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(4, 3e-3)

# In[12]:

# switch to larger (224px) images and continue training
learn.dls = get_dls(64, 224)
learn.fine_tune(6, 3e-3)
print("--- %s seconds ---" % (time.time() - start_time))

# In[14]:

# for comparison: train at 224px from the start
import time
start_time = time.time()
dls = get_dls(32, 224)
learn = Learner(dls, xresnet50(n_out=dls.c), loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(8, 3e-3)
print("--- %s seconds ---" % (time.time() - start_time))

# I changed some hyperparameters, such as the number of epochs and the learning rate. Progressive resizing is faster and gives a better result most of the time (though not in every situation), nice.

# ## Test Time Augmentation

# Random cropping sometimes leads to surprising problems, especially when it is used with multi-category images: objects close to the edges of the image can be cropped out and ignored entirely. There are some workarounds (squishing or stretching the images), but most of them cause other kinds of problems that can hurt the results. Test time augmentation avoids this by averaging predictions over several augmented versions of each validation image; the only downside is that validation becomes slower.

# > Warning: How is this possible? Since we do not use the validation loss for backpropagation, how does it improve our results?

# In[16]:

preds, targs = learn.tta()
accuracy(preds, targs).item()

# from the book:
#
# jargon: test time augmentation (TTA): During inference or validation, creating multiple versions of each image, using data augmentation, and then taking the average or maximum of the predictions for each augmented version of the image.

# ## Mixup

# Mixup is especially useful when we don't have much data and don't have a pretrained model that was trained on data similar to our dataset.
#
# from the book:
# Mixup works as follows, for each image:
#
# 1. Select another image from your dataset at random.
# 1. Pick a weight at random.
# 1. Take a weighted average (using the weight from step 2) of the selected image with your image; this will be your independent variable.
# 1. Take a weighted average (with the same weight) of this image's labels with your image's labels; this will be your dependent variable.
#
# The paper explains: "While data augmentation consistently leads to improved generalization, the procedure is dataset-dependent, and thus requires the use of expert knowledge." For instance, it's common to flip images as part of data augmentation, but should you flip only horizontally, or also vertically? The answer is that it depends on your dataset. In addition, if flipping (for instance) doesn't provide enough data augmentation for you, you can't "flip more." It's helpful to have data augmentation techniques where you can "dial up" or "dial down" the amount of change, to see what works best for you.
#
# The figure below shows what it looks like when we take a linear combination of images, as done in Mixup.

# In[18]:

#hide_input
#id mixup_example
#caption Mixing a church and a gas station
#alt An image of a church, a gas station and the two mixed up.
church = PILImage.create(get_image_files(path/'train'/'n03028079')[0])
gas = PILImage.create(get_image_files(path/'train'/'n03425413')[0])
church = church.resize((256, 256))
gas = gas.resize((256, 256))
tchurch = tensor(church).float() / 255.
tgas = tensor(gas).float() / 255.

_, axs = plt.subplots(1, 3, figsize=(12, 4))
show_image(tchurch, ax=axs[0]);
show_image(tgas, ax=axs[1]);
show_image((0.3*tchurch + 0.7*tgas), ax=axs[2]);

# I replaced the lines below with the `get_image_files` calls above, since there doesn't seem to be a `get_image_files_sorted` function in fastai:
#
# ```python
# church = PILImage.create(get_image_files_sorted(path/'train'/'n03028079')[0])
# gas = PILImage.create(get_image_files_sorted(path/'train'/'n03425413')[0])
# ```
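# The cells above only show what mixed-up images look like. To actually train with Mixup, the book passes fastai's `MixUp` callback to the `Learner`; a minimal sketch (not run here, and Mixup typically needs many more epochs than this to show its benefit):
#
# ```python
# # MixUp mixes both the images and the one-hot-encoded labels within each batch,
# # so the data block and loss function stay the same.
# model = xresnet50(n_out=dls.c)
# learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(),
#                 metrics=accuracy, cbs=MixUp())
# learn.fit_one_cycle(5, 3e-3)
# ```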
# ## Label Smoothing

# > Warning: Check the original notebook for this part. The only thing I can say is that it makes the model less confident in its classifications, which helps against overfitting.
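# To have something to come back to: in the book, label smoothing is enabled simply by swapping the loss function for `LabelSmoothingCrossEntropy`. A minimal sketch (not run here; the default smoothing parameter is used):
#
# ```python
# # With label smoothing the target for the true class becomes 1 - eps + eps/N
# # and eps/N for every other class, so the model is never pushed toward
# # predicting a class with 100% confidence.
# model = xresnet50(n_out=dls.c)
# learn = Learner(dls, model, loss_func=LabelSmoothingCrossEntropy(), metrics=accuracy)
# learn.fit_one_cycle(5, 3e-3)
# ```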