#!/usr/bin/env python
# coding: utf-8

# # (MultiFiT) Portuguese Text Classifier on TCU jurisprudência dataset
# ### MultiFiT configuration
# - **Architecture 4 QRNN with 1550 hidden parameters by layer, SentencePiece tokenizer (15 000 tokens)**
# - **Hyperparameters and training method from the MultiFiT paper**

# - Author: [Pierre Guillou](https://www.linkedin.com/in/pierreguillou)
# - Date: **edition of October 15, 2019** (initial publication on September 2019)
# - Post in medium: [link](https://medium.com/@pierre_guillou/nlp-fastai-portuguese-language-model-980c8ec75362)
# - Ref: [Fastai v1](https://docs.fast.ai/) (Deep Learning library on PyTorch)

# ## Warning (15/10/2019)

# **This notebook is a modified version of the v1 published in September 2019.** Indeed (thanks to [David Vieira](https://medium.com/@davidhsv/ol%C3%A1-pierre-tudo-bom-2bc8ae36dc14)), we noticed that the fine-tuning of the LM and classifier did not use the SentencePiece model and vocab trained for the General Portuguese Language Model ([lm3-portuguese.ipynb](https://github.com/piegu/language-models/blob/master/lm3-portuguese.ipynb)).
# 
# For example, the code used to create the fine-tuned Portuguese forward LM was : 
# 
# ```data_lm = (TextList.from_df(df_trn_val, path, cols=reviews, 
#                             processor=[OpenFileProcessor(), SPProcessor(max_vocab_sz=15000)])
#     .split_by_rand_pct(0.1, seed=42)
#     .label_for_lm()           
#     .databunch(bs=bs, num_workers=1))```
#     
# It has been corrected by using the [SPProcessor.load()](https://github.com/fastai/fastai/blob/master/fastai/text/data.py#L481) function:
# 
# ```data_lm = (TextList.from_df(df_trn_val, path, cols=reviews, processor=SPProcessor.load(dest))
#     .split_by_rand_pct(0.1, seed=42)
#     .label_for_lm()           
#     .databunch(bs=bs, num_workers=1))```
#     
# Therefore, we retrained the fine-tuned Portuguese forward LM and the classifier on TCU jurisprudência dataset and **got better results! :-)** (see the Results paragraph to get all results)
# 
# - **(fine-tuned) Language Model**   
#     - forward : (accuracy) **51.56%** instead of 44.66% | (perplexity) 11.38 instead of 15.97
#     - backward: (accuracy) **52.15%** instead of 44.97% | (perplexity) 12.54 instead of 18.73
# 
# - **(fine-tuned) Text Classifier**
#     - **Accuracy** (ensemble) **97.95%** instead of 97.39%
#     - **f1 score** (ensemble): **0.9795** instead of 0.9737

# ## Information

# ### Overview
# 
# According to this new article "[MultiFiT: Efficient Multi-lingual Language Model Fine-tuning](https://arxiv.org/abs/1909.04761)" (September 10, 2019), the QRNN architecture and the SentencePiece tokenizer give better results than AWD-LSTM and the spaCy tokenizer respectively. 
# 
# Therefore, they have been used in this notebook to **fine-tune a Portuguese Bidirectional Language Model** by Transfer Learning of a Portuguese Bidirectional Language Model (with the QRNN architecture and the SentencePiece tokenizer, too) trained on a Wikipedia corpus of 100 millions tokens ([lm3-portuguese.ipynb](https://github.com/piegu/language-models/blob/master/lm3-portuguese.ipynb)). 
# 
# This Portuguese Bidirectional Language Model has been **fine-tuned on the [tcu_jurisp_reduzido.csv dataset about TCU jurisprudência](https://github.com/fastai-bsb/nlp-tcu-enunciados/blob/master/tcu_jurisp_reduzido.csv?raw=true)"** and **its encoder part has been transfered to a text classifier which has been finally trained on this corpus**.
# 
# This process **LM General --> LM fine-tuned --> Classifier fine-tuned** is called [ULMFiT](http://nlp.fast.ai/category/classification.html) but we trained our 3 models with the hyperparameters values and method of the [MultiFiT](https://arxiv.org/abs/1909.04761) paper that are given at the end of the MultiFiT paper.

# ### Hyperparameters values
# 
# - Language Model
#     - (batch size) bs = 50
#     - (QRNN) 4 QRNN (default: 3) with 1550 hidden parameters each one (default: 1152)
#     - (SentencePiece) vocab of 15000 tokens
#     - (dropout) mult_drop = 1.0
#     - (weight decay) wd = 0.1
#     - (number of training epochs) 20 epochs
#     - (learning rate) modified version of 1-cycle learning rate schedule (Smith, 2018) that uses cosine instead of linear annealing, cyclical momentum and discriminative finetuning
#     - (loss) FlattenedLoss of weighted LabelSmoothingCrossEntropy
#     
# 
# - Sentiment Classifier
#     - (batch size) bs = 18
#     - (SentencePiece) vocab of 15000 tokens
#     - (dropout) mult_drop = 0.3
#     - (weight decay) wd = 0.1
#     - (number of training epochs) 14 epochs (forward) and 19 epochs (backward)
#     - (learning rate) modified version of 1-cycle learning rate schedule (Smith, 2018) that uses cosine instead of linear annealing, cyclical momentum and discriminative finetuning
#     - (loss) FlattenedLoss of weighted LabelSmoothingCrossEntropy 

# ## Results

# **We can conclude that this Bidirectional Portuguese LM model using the MultiFiT configuration is a good model to perform text classification but with about 46 millions of parameters, it is far from being a LM that can gan compete with [GPT-2](https://openai.com/blog/better-language-models/) or [BERT](https://arxiv.org/abs/1810.04805) in NLP tasks like text generation.**
#     
#     
# - **About the data**: the dataset [tcu_jurisp_reduzido.csv](https://github.com/fastai-bsb/nlp-tcu-enunciados/blob/master/tcu_jurisp_reduzido.csv?raw=true) about "TCU jurisprudência" is unbalanced. Therefore, we used a weighted loss function (FlattenedLoss of weighted LabelSmoothingCrossEntropy).
#     - number of texts: 10263
#         - class 0: 3468 (33.79%)
#         - class 1: 2723 (26.53%)
#         - class 2: 2297 (22.38%)
#         - class 3: 1775 (17.3%)
# 
# 
# - **(fine-tuned) Language Model**   
#     - forward : (accuracy) 51.56% | (perplexity) 11.38
#     - backward: (accuracy) 52.15% | (perplexity) 12.54
#     
# 
# - **(fine-tuned) Text Classifier**
# 
#     - **Accuracy**
#         - forward : (global) 97.08% | (class 0) 98.49% | (class 1) 98.24% | (class 2) 96.71% | (class 3) 93.40%
#         - backward: (global) 97.07% | (class 0) 99.10% | (class 1) 97.89% | (class 2) 96.71% | (class 3) 92.89%
#         - ensemble: (global) **97.95%** | (class 0) **99.40%** | (class 1) **99.30%** | (class 2) **97.18%** | (class 3) **94.42%**
# 
#     - **f1 score**
#         - forward: 0.9707
#         - backward: 0.9708
#         - ensemble: **0.9795**
# 
# (neg = negative reviews | pos = positive reviews)

# ## Initialisation

# In[1]:


get_ipython().run_line_magic('reload_ext', 'autoreload')
get_ipython().run_line_magic('autoreload', '2')
get_ipython().run_line_magic('matplotlib', 'inline')

from fastai import *
from fastai.text import *
from fastai.callbacks import *

import matplotlib.cm as cm


# In[2]:


get_ipython().system('python -m fastai.utils.show_install')


# In[2]:


# bs=48
# bs=24
bs=50


# In[3]:


torch.cuda.set_device(0)


# In[4]:


data_path = Config.data_path()


# This will create a `{lang}wiki` folder, containing a `{lang}wiki` text file with the wikipedia contents. (For other languages, replace `{lang}` with the appropriate code from the [list of wikipedias](https://meta.wikimedia.org/wiki/List_of_Wikipedias).)

# In[5]:


lang = 'pt'


# In[6]:


name = f'{lang}wiki'
path = data_path/name
path.mkdir(exist_ok=True, parents=True)

lm_fns3 = [f'{lang}_wt_sp15_multifit', f'{lang}_wt_vocab_sp15_multifit']
lm_fns3_bwd = [f'{lang}_wt_sp15_multifit_bwd', f'{lang}_wt_vocab_sp15_multifit_bwd']


# In[7]:


from sklearn.metrics import f1_score

@np_func
def f1(inp,targ): return f1_score(targ, np.argmax(inp, axis=-1), average='weighted')


# In[8]:


# source: https://github.com/fastai/fastai/blob/master//fastai/layers.py#L300:7
# blog: https://bfarzin.github.io/Label-Smoothing/
class WeightedLabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, weight, eps:float=0.1, reduction='mean'):
        super().__init__()
        self.weight,self.eps,self.reduction = weight,eps,reduction
        
    def forward(self, output, target):
        c = output.size()[-1]
        log_preds = F.log_softmax(output, dim=-1)
        if self.reduction=='sum': loss = -log_preds.sum()
        else:
            loss = -log_preds.sum(dim=-1)
            if self.reduction=='mean':  loss = loss.mean()
        return loss*self.eps/c + (1-self.eps) * F.nll_loss(log_preds, target, weight=self.weight, reduction=self.reduction)


# In[9]:


import warnings
warnings.filterwarnings('ignore')  # "error", "ignore", "always", "default", "module" or "on


# ## Data

# TCU jurisprudência:
# - reduzido: https://github.com/fastai-bsb/nlp-tcu-enunciados/blob/master/tcu_jurisp_reduzido.csv
# - completo: https://github.com/fastai-bsb/nlp-tcu-enunciados/blob/master/tcu_jurisp.csv

# ### Download

# In[8]:


import urllib.request
from converter import *


# In[9]:


# create TCU folder
name_data = 'TCU'
path_data = data_path/name_data
path_data.mkdir(exist_ok=True, parents=True)


# In[10]:


get_ipython().run_cell_magic('time', '', "# Download each file from url and save it locally under file_name\n\nurl = 'https://github.com/fastai-bsb/nlp-tcu-enunciados/blob/master/tcu_jurisp_reduzido.csv?raw=true'\nfile_name = 'tcu_jurisp_reduzido.csv'\nurl_file = path_data/file_name\nurllib.request.urlretrieve(url, url_file)\n\nurl = 'https://raw.githubusercontent.com/fastai-bsb/nlp-tcu-enunciados/master/tcu_jurisp.csv'\nfile_name = 'tcu_jurisp.csv'\nurl_file = path_data/file_name\nurllib.request.urlretrieve(url, url_file)\n")


# In[11]:


path_data.ls()


# In[12]:


get_ipython().system('head -n4 {path_data.ls()[0]}')


# ### Overview

# In[13]:


# to solve display error of pandas dataframe
get_ipython().config.get('IPKernelApp', {})['parent_appname'] = ""


# In[14]:


df = pd.read_csv(path_data/'tcu_jurisp_reduzido.csv', encoding='utf-8')
print(len(df))
print(Counter(df.labels))
df.head()


# In[15]:


df = pd.read_csv(path_data/'tcu_jurisp.csv', encoding='utf-8')
print(len(df))
print(Counter(df.labels))
df.head()


# ### Analysis (reduzido file)

# In[16]:


df = pd.read_csv(path_data/'tcu_jurisp_reduzido.csv', encoding='utf-8')
print(len(df))
print(Counter(df.labels))
df.head()


# In[17]:


# columns names
reviews = "text"
label = "labels"

# keep columns
df2 = df[[reviews,label]].copy()


# In[18]:


# number of reviews
print(f'(orginal csv) number of all reviews: {len(df2)}')

# keep not null reviews
## delete nan reviews
empty_nan = (df2[reviews].isnull()).sum()
df2 = df2[df2[reviews].notnull()]
## delete empty reviews
list_idx_none = []
for idxs, row in df2.iterrows():
    if row[reviews].strip() == "":
        df2.drop(idxs, axis=0, inplace=True)
        list_idx_none.append(idxs)
empty_none = len(list_idx_none)
## print results
empty = empty_nan+empty_none
if empty != 0:
    print(f'{empty} empty reviews were deleted')
else:
    print('there is no empty review.')

# # check that there is no twice the same review
# # keep the first of unique review_id reviews
# same = len(df2) - len(df2[idx].unique())
# if same != 0:
#     df2.drop_duplicates(subset=[idx], inplace=True)
#     print(f'from the {same} identical reviews ids, only the first one has been kept.')
# else:
#     print('there is no identical review id.')

## delete nan labels
empty_label_nan = (df2[label].isnull()).sum()
df2 = df2[df2[label].notnull()]
print(f'{empty_label_nan} reviews with nan label were deleted')

# number of reviews by class
counter = Counter(df2[label])
clas_0, clas_1, clas_2, clas_3 = counter[0], counter[1], counter[2], counter[3]
num = len(df2)
pc_clas_0, pc_clas_1 = round((clas_0/num)*100,2), round((clas_1/num)*100,2)
pc_clas_2, pc_clas_3 = round((clas_2/num)*100,2), round((clas_3/num)*100,2)
print(f'\nnumber of text of class 0: {clas_0} ({pc_clas_0}%)')
print(f'number of text of class 1: {clas_1} ({pc_clas_1}%)')
print(f'number of text of class 2: {clas_2} ({pc_clas_2}%)')
print(f'number of text of class 3: {clas_3} ({pc_clas_3}%)')
print(f'\n(final) number of all texts: {num}')  

# convert HTML caracters to normal letters
df2[reviews] = df2[reviews].apply(convert)

df2.head(5)


# In[19]:


df_trn_val = df2.copy()

# number of reviews by class
counter = Counter(df_trn_val[label])
clas_0, clas_1, clas_2, clas_3 = counter[0], counter[1], counter[2], counter[3]
num = len(df_trn_val)
pc_clas_0, pc_clas_1 = round((clas_0/num)*100,2), round((clas_1/num)*100,2)
pc_clas_2, pc_clas_3 = round((clas_2/num)*100,2), round((clas_3/num)*100,2)
print(f'\nnumber of text of class 0: {clas_0} ({pc_clas_0}%)')
print(f'number of text of class 1: {clas_1} ({pc_clas_1}%)')
print(f'number of text of class 2: {clas_2} ({pc_clas_2}%)')
print(f'number of text of class 3: {clas_3} ({pc_clas_3}%)')
print(f'\n(final) number of all texts: {num}') 

# plot histogram
keys = list(df_trn_val[label].value_counts().keys())
values = list(df_trn_val[label].value_counts().array)
plt.bar(keys, values[::-1]) 
plt.xticks(keys, keys[::-1])
# print(df_trn_val['label'].value_counts())
plt.show()


# In[20]:


df_trn_val.head()


# In[21]:


df_trn_val.to_csv(path_data/'tcu_jurisp_reduzido_preprocessed.csv', index = None, header=True)


# ## Fine-tuning "forward LM"

# In[10]:


name_data = 'TCU'
path_data = data_path/name_data

# Load csv
df_trn_val = pd.read_csv(path_data/'tcu_jurisp_reduzido_preprocessed.csv')

# columns names
reviews = "text"
label = "labels"


# In[11]:


dest = path/'corpus2_100'
(dest/'tmp').ls()


# ### Databunch

# In[22]:


get_ipython().run_cell_magic('time', '', 'data_lm = (TextList.from_df(df_trn_val, path, cols=reviews, processor=SPProcessor.load(dest))\n    .split_by_rand_pct(0.1, seed=42)\n    .label_for_lm()           \n    .databunch(bs=bs, num_workers=1))\n')


# In[23]:


data_lm.save(f'{path}/{lang}_databunch_lm_tcu_jurisp_reduzido_sp15_multifit_v2')


# ### Training

# In[24]:


data_lm = load_data(path, f'{lang}_databunch_lm_tcu_jurisp_reduzido_sp15_multifit_v2', bs=bs)


# In[25]:


config = awd_lstm_lm_config.copy()
config['qrnn'] = True
config['n_hid'] = 1550 #default 1152
config['n_layers'] = 4 #default 3


# In[26]:


get_ipython().run_cell_magic('time', '', 'perplexity = Perplexity()\nlearn_lm = language_model_learner(data_lm, AWD_LSTM, config=config, pretrained_fnames=lm_fns3, drop_mult=1., \n                                  metrics=[error_rate, accuracy, perplexity]).to_fp16()\n')


# In[27]:


# number of model parameters
sum([p.numel() for p in learn_lm.model.parameters()])


# In[28]:


learn_lm.model


# #### Change loss function

# In[29]:


learn_lm.loss_func


# In[30]:


learn_lm.loss_func = FlattenedLoss(LabelSmoothingCrossEntropy)


# In[31]:


learn_lm.loss_func


# #### Training

# In[32]:


learn_lm.lr_find()


# In[33]:


learn_lm.recorder.plot()


# In[35]:


lr = 2e-2
lr *= bs/48

wd = 0.1


# In[36]:


learn_lm.fit_one_cycle(2, lr*10, wd=wd, moms=(0.8,0.7))


# In[37]:


learn_lm.save(f'{lang}fine_tuned1_tcu_jurisp_reduzido_sp15_multifit_v2')
learn_lm.save_encoder(f'{lang}fine_tuned1_enc_tcu_jurisp_reduzido_sp15_multifit_v2')


# In[38]:


learn_lm.unfreeze()
learn_lm.fit_one_cycle(18, lr, wd=wd, moms=(0.8,0.7), callbacks=[ShowGraph(learn_lm)])


# In[39]:


learn_lm.save(f'{lang}fine_tuned2_lenerbr_sp15_multifit_v2')
learn_lm.save_encoder(f'{lang}fine_tuned2_enc_lenerbr_sp15_multifit_v2')


# Save best LM learner and its encoder

# In[40]:


learn_lm.save(f'{lang}fine_tuned_tcu_jurisp_reduzido_sp15_multifit_v2')
learn_lm.save_encoder(f'{lang}fine_tuned_enc_tcu_jurisp_reduzido_sp15_multifit_v2')


# ## Fine-tuning "backward LM"

# ### Databunch

# In[41]:


get_ipython().run_cell_magic('time', '', 'data_lm = (TextList.from_df(df_trn_val, path, cols=reviews, processor=SPProcessor.load(dest))\n    .split_by_rand_pct(0.1, seed=42)\n    .label_for_lm()           \n    .databunch(bs=bs, num_workers=1, backwards=True))\n')


# In[42]:


data_lm.save(f'{path}/{lang}_databunch_lm_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# ### Training

# In[43]:


get_ipython().run_cell_magic('time', '', "data_lm = load_data(path, f'{lang}_databunch_lm_tcu_jurisp_reduzido_sp15_multifit_bwd_v2', bs=bs, backwards=True)\n")


# In[44]:


config = awd_lstm_lm_config.copy()
config['qrnn'] = True
config['n_hid'] = 1550 #default 1152
config['n_layers'] = 4 #default 3


# In[45]:


get_ipython().run_cell_magic('time', '', 'perplexity = Perplexity()\nlearn_lm = language_model_learner(data_lm, AWD_LSTM, config=config, pretrained_fnames=lm_fns3_bwd, drop_mult=1., \n                                  metrics=[error_rate, accuracy, perplexity]).to_fp16()\n')


# #### Change loss function

# In[46]:


learn_lm.loss_func


# In[47]:


learn_lm.loss_func = FlattenedLoss(LabelSmoothingCrossEntropy)


# In[48]:


learn_lm.loss_func


# #### Training

# In[49]:


learn_lm.lr_find()


# In[50]:


learn_lm.recorder.plot()


# In[51]:


lr = 2e-2
lr *= bs/48

wd = 0.1


# In[52]:


learn_lm.fit_one_cycle(2, lr*10, wd=wd, moms=(0.8,0.7))


# In[53]:


learn_lm.save(f'{lang}fine_tuned1_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')
learn_lm.save_encoder(f'{lang}fine_tuned1_enc_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# In[54]:


learn_lm.unfreeze()
learn_lm.fit_one_cycle(18, lr, wd=wd, moms=(0.8,0.7), callbacks=[ShowGraph(learn_lm)])


# In[55]:


learn_lm.save(f'{lang}fine_tuned2_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')
learn_lm.save_encoder(f'{lang}fine_tuned2_enc_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# Save best LM learner and its encoder

# In[56]:


learn_lm.save(f'{lang}fine_tuned_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')
learn_lm.save_encoder(f'{lang}fine_tuned_enc_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# ## Fine-tuning "forward Classifier"

# In[12]:


bs = 18


# ### Databunch

# In[13]:


get_ipython().run_cell_magic('time', '', "data_lm = load_data(path, f'{lang}_databunch_lm_tcu_jurisp_reduzido_sp15_multifit_v2', bs=bs)\n")


# In[14]:


get_ipython().run_cell_magic('time', '', 'data_clas = (TextList.from_df(df_trn_val, path, vocab=data_lm.vocab, cols=reviews, processor=SPProcessor.load(dest))\n    .split_by_rand_pct(0.1, seed=42)\n    .label_from_df(cols=label)\n    .databunch(bs=bs, num_workers=1))\n')


# In[15]:


get_ipython().run_cell_magic('time', '', "data_clas.save(f'{lang}_textlist_class_tcu_jurisp_reduzido_sp15_multifit_v2')\n")


# ### Get weights to penalize loss function of the majority class

# In[17]:


get_ipython().run_cell_magic('time', '', "data_clas = load_data(path, f'{lang}_textlist_class_tcu_jurisp_reduzido_sp15_multifit_v2', bs=bs, num_workers=1)\n")


# In[18]:


num_trn = len(data_clas.train_ds.x)
num_val = len(data_clas.valid_ds.x)
num_trn, num_val, num_trn+num_val


# In[19]:


trn_LabelCounts = np.unique(data_clas.train_ds.y.items, return_counts=True)[1]
val_LabelCounts = np.unique(data_clas.valid_ds.y.items, return_counts=True)[1]
trn_LabelCounts, val_LabelCounts


# In[20]:


trn_weights = [1 - count/num_trn for count in trn_LabelCounts]
val_weights = [1 - count/num_val for count in val_LabelCounts]
trn_weights, val_weights


# ### Training (Loss = FlattenedLoss of weighted LabelSmoothingCrossEntropy)

# In[46]:


get_ipython().run_cell_magic('time', '', "data_clas = load_data(path, f'{lang}_textlist_class_tcu_jurisp_reduzido_sp15_multifit_v2', bs=bs, num_workers=1)\n")


# In[47]:


config = awd_lstm_clas_config.copy()
config['qrnn'] = True
config['n_hid'] = 1550 #default 1152
config['n_layers'] = 4 #default 3


# In[48]:


learn_c = text_classifier_learner(data_clas, AWD_LSTM, config=config, pretrained=False, drop_mult=0.3, 
                                  metrics=[accuracy,f1]).to_fp16()
learn_c.load_encoder(f'{lang}fine_tuned_enc_tcu_jurisp_reduzido_sp15_multifit_v2');


# #### Change loss function

# In[49]:


learn_c.loss_func


# In[50]:


loss_weights = torch.FloatTensor(trn_weights).cuda()
learn_c.loss_func = FlattenedLoss(WeightedLabelSmoothingCrossEntropy, weight=loss_weights)


# In[51]:


learn_c.loss_func


# #### Training

# In[52]:


learn_c.freeze()


# In[28]:


learn_c.lr_find()


# In[29]:


learn_c.recorder.plot()


# In[53]:


lr = 2e-1
lr *= bs/48

wd = 0.1


# In[54]:


learn_c.fit_one_cycle(2, lr, wd=wd, moms=(0.8,0.7))


# In[55]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2')


# In[56]:


learn_c.fit_one_cycle(2, lr, wd=wd, moms=(0.8,0.7))


# In[57]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2')


# In[58]:


learn_c.freeze_to(-2)
learn_c.fit_one_cycle(2, slice(lr/(2.6**4),lr), wd=wd, moms=(0.8,0.7))


# In[59]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2')


# In[60]:


learn_c.freeze_to(-3)
learn_c.fit_one_cycle(2, slice(lr/2/(2.6**4),lr/2), wd=wd, moms=(0.8,0.7))


# In[61]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2')


# In[62]:


learn_c.unfreeze()
learn_c.fit_one_cycle(4, slice(lr/10/(2.6**4),lr/10), wd=wd, moms=(0.8,0.7))


# In[63]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2')


# In[64]:


learn_c.load(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2')
learn_c.fit_one_cycle(4, slice(lr/100/(2.6**4),lr/100), wd=wd, moms=(0.8,0.7))


# In[65]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2')


# In[69]:


learn_c.load(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2')
learn_c.fit_one_cycle(2, slice(lr/1000/(2.6**4),lr/1000), wd=wd, moms=(0.8,0.7))


# In[70]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2')


# ### Confusion matrix

# In[71]:


get_ipython().run_cell_magic('time', '', "data_clas = load_data(path, f'{lang}_textlist_class_tcu_jurisp_reduzido_sp15_multifit_v2', bs=bs, num_workers=1);\n\nconfig = awd_lstm_clas_config.copy()\nconfig['qrnn'] = True\nconfig['n_hid'] = 1550 #default 1152\nconfig['n_layers'] = 4 #default 3\n\nlearn_c = text_classifier_learner(data_clas, AWD_LSTM, config=config)\n")


# In[72]:


learn_c.load(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2', purge=False);


# In[73]:


preds,y,losses = learn_c.get_preds(with_loss=True)
predictions = np.argmax(preds, axis = 1)

interp = ClassificationInterpretation(learn_c, preds, y, losses)
interp.plot_confusion_matrix()


# In[74]:


from sklearn.metrics import confusion_matrix
cm = confusion_matrix(np.array(y), np.array(predictions))
print(cm)

## acc
print(f'accuracy global: {(cm[0,0]+cm[1,1]+cm[2,2]+cm[3,3])/(cm.sum())}')

# acc neg, acc pos
print(f'accuracy on class 0: {cm[0,0]/(cm.sum(1)[0])*100}') 
print(f'accuracy on class 1: {cm[1,1]/(cm.sum(1)[1])*100}')
print(f'accuracy on class 2: {cm[2,2]/(cm.sum(1)[2])*100}')
print(f'accuracy on class 3: {cm[3,3]/(cm.sum(1)[3])*100}')


# In[75]:


learn_c.show_results()


# ### Predictions some random sentences

# In[76]:


# Get the prediction
test_text = "A medida cautelar do TCU que determina a suspensão de licitação por falhas no edital não impede o órgão ou a entidade de rever seu ato convocatório, valendo-se do poder de autotutela (art. 49 da Lei 8.666/1993 c/c o art. 9º da Lei 10.520/2002) , com o objetivo de, antecipando-se a eventual deliberação do Tribunal, promover de modo próprio a anulação da licitação e o refazimento do edital, livre dos vícios apontados."
pred = learn_c.predict(test_text)
print(pred)


# In[77]:


# The darker the word-shading in the below example, the more it contributes to the classification. 
txt_ci = TextClassificationInterpretation.from_learner(learn_c)
txt_ci.show_intrinsic_attention(test_text,cmap=plt.cm.Purples)


# In[78]:


txt_ci.intrinsic_attention(test_text)[1]


# In[79]:


# tabulation showing the first k texts in top_losses along with their prediction, actual,loss, and probability of actual class.
# max_len is the maximum number of tokens displayed. If max_len=None, it will display all tokens.
txt_ci.show_top_losses(5)


# ## Fine-tuning "backward Classifier"

# In[80]:


import warnings
warnings.filterwarnings('ignore')  # "error", "ignore", "always", "default", "module" or "on


# In[81]:


bs = 18


# ### Databunch

# In[82]:


get_ipython().run_cell_magic('time', '', "data_lm = load_data(path, f'{lang}_databunch_lm_tcu_jurisp_reduzido_sp15_multifit_bwd_v2', bs=bs, backwards=True)\n")


# In[83]:


get_ipython().run_cell_magic('time', '', 'data_clas = (TextList.from_df(df_trn_val, path, cols=reviews, processor=SPProcessor.load(dest), vocab=data_lm.vocab)\n    .split_by_rand_pct(0.1, seed=42)\n    .label_from_df(cols=label)\n    .databunch(bs=bs, num_workers=1, backwards=True))\n')


# In[84]:


get_ipython().run_cell_magic('time', '', "data_clas.save(f'{lang}_textlist_class_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')\n")


# ### Get weights to penalize loss function of the majority class

# In[85]:


get_ipython().run_cell_magic('time', '', "data_clas = load_data(path, f'{lang}_textlist_class_tcu_jurisp_reduzido_sp15_multifit_bwd_v2', bs=bs, num_workers=1, backwards=True)\n")


# In[86]:


num_trn = len(data_clas.train_ds.x)
num_val = len(data_clas.valid_ds.x)
num_trn, num_val, num_trn+num_val


# In[87]:


trn_LabelCounts = np.unique(data_clas.train_ds.y.items, return_counts=True)[1]
val_LabelCounts = np.unique(data_clas.valid_ds.y.items, return_counts=True)[1]
trn_LabelCounts, val_LabelCounts


# In[88]:


trn_weights = [1 - count/num_trn for count in trn_LabelCounts]
val_weights = [1 - count/num_val for count in val_LabelCounts]
trn_weights, val_weights


# ### Training (Loss = FlattenedLoss of weighted LabelSmoothingCrossEntropy)

# In[89]:


get_ipython().run_cell_magic('time', '', "data_clas = load_data(path, f'{lang}_textlist_class_tcu_jurisp_reduzido_sp15_multifit_bwd_v2', bs=bs, num_workers=1, backwards=True)\n")


# In[90]:


config = awd_lstm_clas_config.copy()
config['qrnn'] = True
config['n_hid'] = 1550 #default 1152
config['n_layers'] = 4 #default 3


# In[91]:


learn_c = text_classifier_learner(data_clas, AWD_LSTM, config=config, drop_mult=0.3, metrics=[accuracy,f1]).to_fp16()
learn_c.load_encoder(f'{lang}fine_tuned_enc_tcu_jurisp_reduzido_sp15_multifit_bwd_v2');


# #### Change loss function

# In[92]:


learn_c.loss_func


# In[93]:


loss_weights = torch.FloatTensor(trn_weights).cuda()
learn_c.loss_func = FlattenedLoss(WeightedLabelSmoothingCrossEntropy, weight=loss_weights)


# In[94]:


learn_c.loss_func


# #### Training

# In[95]:


learn_c.freeze()


# In[96]:


learn_c.lr_find()


# In[97]:


learn_c.recorder.plot()


# In[98]:


lr = 2e-1
lr *= bs/48

wd = 0.1


# In[99]:


learn_c.fit_one_cycle(2, lr, wd=wd, moms=(0.8,0.7))


# In[100]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# In[101]:


learn_c.fit_one_cycle(2, lr, wd=wd, moms=(0.8,0.7))


# In[102]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# In[103]:


learn_c.freeze_to(-2)
learn_c.fit_one_cycle(2, slice(lr/(2.6**4),lr), wd=wd, moms=(0.8,0.7))


# In[104]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# In[105]:


learn_c.freeze_to(-3)
learn_c.fit_one_cycle(2, slice(lr/2/(2.6**4),lr/2), wd=wd, moms=(0.8,0.7))


# In[106]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# In[107]:


learn_c.unfreeze()
learn_c.fit_one_cycle(4, slice(lr/10/(2.6**4),lr/10), wd=wd, moms=(0.8,0.7))


# In[108]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# In[109]:


learn_c.fit_one_cycle(4, slice(lr/100/(2.6**4),lr/100), wd=wd, moms=(0.8,0.7))


# In[110]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# In[115]:


learn_c.load(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')
learn_c.fit_one_cycle(1, slice(lr/1000/(2.6**4),lr/1000), wd=wd, moms=(0.8,0.7))


# In[116]:


learn_c.fit_one_cycle(1, slice(lr/1000/(2.6**4),lr/1000), wd=wd, moms=(0.8,0.7))


# In[117]:


learn_c.save(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# In[118]:


learn_c.load(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2');
learn_c.to_fp32().export(f'{lang}_classifier_tcu_jurisp_reduzido_sp15_multifit_bwd_v2')


# ### Confusion matrix

# In[119]:


get_ipython().run_cell_magic('time', '', "data_clas = load_data(path, f'{lang}_textlist_class_tcu_jurisp_reduzido_sp15_multifit_bwd_v2', bs=bs, num_workers=1, backwards=True)\n\nconfig = awd_lstm_clas_config.copy()\nconfig['qrnn'] = True\nconfig['n_hid'] = 1550 #default 1152\nconfig['n_layers'] = 4 #default 3\n\nlearn_c = text_classifier_learner(data_clas, AWD_LSTM, config=config)\n")


# In[120]:


learn_c.load(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2', purge=False);


# In[121]:


preds,y,losses = learn_c.get_preds(with_loss=True)
predictions = np.argmax(preds, axis = 1)

interp = ClassificationInterpretation(learn_c, preds, y, losses)
interp.plot_confusion_matrix()


# In[122]:


from sklearn.metrics import confusion_matrix
cm = confusion_matrix(np.array(y), np.array(predictions))
print(cm)

## acc
print(f'accuracy global: {(cm[0,0]+cm[1,1]+cm[2,2]+cm[3,3])/(cm.sum())}')

# acc neg, acc pos
print(f'accuracy on class 0: {cm[0,0]/(cm.sum(1)[0])*100}') 
print(f'accuracy on class 1: {cm[1,1]/(cm.sum(1)[1])*100}')
print(f'accuracy on class 2: {cm[2,2]/(cm.sum(1)[2])*100}')
print(f'accuracy on class 3: {cm[3,3]/(cm.sum(1)[3])*100}')


# In[123]:


learn_c.show_results()


# ### Predictions some random sentences

# In[124]:


# Get the prediction
test_text = "A medida cautelar do TCU que determina a suspensão de licitação por falhas no edital não impede o órgão ou a entidade de rever seu ato convocatório, valendo-se do poder de autotutela (art. 49 da Lei 8.666/1993 c/c o art. 9º da Lei 10.520/2002) , com o objetivo de, antecipando-se a eventual deliberação do Tribunal, promover de modo próprio a anulação da licitação e o refazimento do edital, livre dos vícios apontados."
pred = learn_c.predict(test_text)
print(pred)


# In[125]:


# The darker the word-shading in the below example, the more it contributes to the classification. 
txt_ci = TextClassificationInterpretation.from_learner(learn_c)
txt_ci.show_intrinsic_attention(test_text,cmap=plt.cm.Purples)


# In[126]:


txt_ci.intrinsic_attention(test_text)[1]


# In[127]:


# tabulation showing the first k texts in top_losses along with their prediction, actual,loss, and probability of actual class.
# max_len is the maximum number of tokens displayed. If max_len=None, it will display all tokens.
txt_ci.show_top_losses(5)


# ## Ensemble

# In[128]:


bs = 18


# In[129]:


config = awd_lstm_clas_config.copy()
config['qrnn'] = True
config['n_hid'] = 1550 #default 1152
config['n_layers'] = 4 #default 3


# In[130]:


data_clas = load_data(path, f'{lang}_textlist_class_tcu_jurisp_reduzido_sp15_multifit_v2', bs=bs, num_workers=1)
learn_c = text_classifier_learner(data_clas, AWD_LSTM, config=config, drop_mult=0.3, metrics=[accuracy,f1]).to_fp16()
learn_c.load(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_v2', purge=False);


# In[131]:


preds,targs = learn_c.get_preds(ordered=True)
accuracy(preds,targs),f1(preds,targs)


# In[132]:


data_clas_bwd = load_data(path, f'{lang}_textlist_class_tcu_jurisp_reduzido_sp15_multifit_bwd_v2', bs=bs, num_workers=1, backwards=True)
learn_c_bwd = text_classifier_learner(data_clas_bwd, AWD_LSTM, config=config, drop_mult=0.3, metrics=[accuracy,f1]).to_fp16()
learn_c_bwd.load(f'{lang}clas_tcu_jurisp_reduzido_sp15_multifit_bwd_v2', purge=False);


# In[133]:


preds_b,targs_b = learn_c_bwd.get_preds(ordered=True)
accuracy(preds_b,targs_b),f1(preds_b,targs_b)


# In[134]:


preds_avg = (preds+preds_b)/2


# In[135]:


accuracy(preds_avg,targs_b),f1(preds_avg,targs_b)


# In[136]:


from sklearn.metrics import confusion_matrix

predictions = np.argmax(preds_avg, axis = 1)
cm = confusion_matrix(np.array(targs_b), np.array(predictions))
print(cm)

## acc
print(f'accuracy global: {(cm[0,0]+cm[1,1]+cm[2,2]+cm[3,3])/(cm.sum())}')

# acc neg, acc pos
print(f'accuracy on class 0: {cm[0,0]/(cm.sum(1)[0])*100}') 
print(f'accuracy on class 1: {cm[1,1]/(cm.sum(1)[1])*100}')
print(f'accuracy on class 2: {cm[2,2]/(cm.sum(1)[2])*100}')
print(f'accuracy on class 3: {cm[3,3]/(cm.sum(1)[3])*100}')


# In[ ]: