from fastai import * # Quick access to most common functionality
from fastai.text import * # Quick access to NLP functionality
An example of creating a language model and then transferring it to a classifier.
path = untar_data(URLs.IMDB_SAMPLE)
path
PosixPath('/home/jhoward/.fastai/data/imdb_sample')
Open and view the independent and dependent variables:
df = pd.read_csv(path/'texts.csv', header=None)
df.head()
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | label | text | is_valid |
| 1 | negative | Un-bleeping-believable! Meg Ryan doesn't even ... | False |
| 2 | positive | This is a extremely well-made film. The acting... | False |
| 3 | negative | Every once in a long while a movie will come a... | False |
| 4 | positive | Name just says it all. I watched this movie wi... | False |
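Note that because `texts.csv` ships with a header row, passing `header=None` makes pandas treat that row as ordinary data, which is why `label | text | is_valid` shows up as row 0 above. A minimal sketch with an inline two-row CSV (a hypothetical stand-in for the real file) shows the difference:

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for texts.csv; the real file is larger.
csv_text = "label,text,is_valid\nnegative,Bad film.,False\npositive,Great film.,False\n"

# header=None: the header line becomes an ordinary data row.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(raw.iloc[0].tolist())  # ['label', 'text', 'is_valid']

# Default header handling: column names come from the first line instead.
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))  # ['label', 'text', 'is_valid']
print(len(df))           # 2
```

Either way is fine here, since the `DataBunch` factories below read the CSV themselves.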
Create a DataBunch for each of the language model and the classifier:
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv', vocab=data_lm.train_ds.vocab, bs=42)
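Passing `vocab=data_lm.train_ds.vocab` matters: the classifier must numericalize text with the same token-to-id mapping as the language model, or the transferred encoder would look up the wrong embeddings. A pure-Python sketch of the idea (illustrative only; fastai's Vocab class handles this internally):

```python
def build_vocab(texts):
    """Map each unique whitespace token to an integer id, in first-seen order."""
    itos = []   # id -> token
    stoi = {}   # token -> id
    for text in texts:
        for tok in text.split():
            if tok not in stoi:
                stoi[tok] = len(itos)
                itos.append(tok)
    return stoi, itos

def numericalize(text, stoi, unk=-1):
    """Convert a text to ids using a fixed vocab; unseen tokens get `unk`."""
    return [stoi.get(tok, unk) for tok in text.split()]

# Vocab built on the language-model corpus...
lm_texts = ["the movie was great", "the plot was thin"]
stoi, itos = build_vocab(lm_texts)

# ...then reused for the classifier data, so shared words keep the same ids.
ids = numericalize("the movie was awful", stoi)
print(ids)  # [0, 1, 2, -1]
```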
We'll fine-tune the language model. fast.ai has a pre-trained English model available for download; we just have to specify it like this:
moms = (0.8,0.7)
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103)
learn.unfreeze()
learn.fit_one_cycle(4, slice(1e-2), moms=moms)
Total time: 00:22

epoch | train_loss | valid_loss | accuracy | time
---|---|---|---|---
1 | 4.937087 | 4.177917 | 0.246990 | 00:05
2 | 4.648961 | 4.078122 | 0.255556 | 00:05
3 | 4.429574 | 4.040652 | 0.257715 | 00:05
4 | 4.258292 | 4.030890 | 0.259000 | 00:05
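`fit_one_cycle` raises the learning rate from a fraction of its maximum up to the max, then anneals it back toward zero, while momentum moves in the opposite direction between the two `moms` values. A rough cosine version of that schedule, written from scratch for illustration (the warmup fraction and divisors here are assumptions, not fastai's exact implementation):

```python
import math

def _cos(start, end, frac):
    "Cosine interpolation from start (frac=0) to end (frac=1)."
    return start + (end - start) * (1 - math.cos(math.pi * frac)) / 2

def one_cycle(pct, lr_max=1e-2, div=25.0, moms=(0.8, 0.7), warmup=0.3):
    """Return (lr, momentum) at training progress pct in [0, 1].
    Sketch of a one-cycle policy: lr climbs from lr_max/div to lr_max over
    the warmup phase, then anneals toward ~0; momentum mirrors it between
    moms[0] and moms[1]."""
    if pct < warmup:
        frac = pct / warmup
        return _cos(lr_max / div, lr_max, frac), _cos(moms[0], moms[1], frac)
    frac = (pct - warmup) / (1 - warmup)
    return _cos(lr_max, lr_max / 1e4, frac), _cos(moms[1], moms[0], frac)

lr, mom = one_cycle(0.3)  # peak of the cycle
print(lr, mom)            # 0.01 0.7
```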
Save our language model's encoder:
learn.save_encoder('enc')
Load that encoder into a classifier and fine-tune it:
learn = text_classifier_learner(data_clas)
learn.load_encoder('enc')
learn.freeze()
learn.fit_one_cycle(4, moms=moms)
Total time: 00:23

epoch | train_loss | valid_loss | accuracy | time
---|---|---|---|---
1 | 0.667434 | 0.633349 | 0.676617 | 00:05
2 | 0.656408 | 0.583322 | 0.696517 | 00:06
3 | 0.637747 | 0.562945 | 0.736318 | 00:05
4 | 0.611062 | 0.560547 | 0.736318 | 00:05
learn.unfreeze()
learn.fit_one_cycle(8, slice(1e-5,1e-3), moms=moms)
Total time: 01:31

epoch | train_loss | valid_loss | accuracy | time
---|---|---|---|---
1 | 0.599531 | 0.552196 | 0.741294 | 00:10
2 | 0.593956 | 0.546969 | 0.706468 | 00:11
3 | 0.580071 | 0.536979 | 0.716418 | 00:10
4 | 0.589724 | 0.512848 | 0.746269 | 00:12
5 | 0.578401 | 0.495420 | 0.766169 | 00:10
6 | 0.593369 | 0.505710 | 0.791045 | 00:11
7 | 0.594004 | 0.514395 | 0.781095 | 00:11
8 | 0.601742 | 0.503753 | 0.791045 | 00:12
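Passing `slice(1e-5, 1e-3)` applies discriminative learning rates after unfreezing: the earliest layer group gets the low end, the head gets the high end, and groups in between are commonly spaced geometrically. A small sketch of that spacing (an illustration of the idea, not fastai's source):

```python
def discriminative_lrs(lr_min, lr_max, n_groups):
    """Geometrically spaced per-group learning rates from lr_min (earliest
    layers) to lr_max (head). Sketch of the slice(lr_min, lr_max) idea."""
    if n_groups == 1:
        return [lr_max]
    ratio = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]

lrs = discriminative_lrs(1e-5, 1e-3, 3)
print(lrs)  # approximately [1e-5, 1e-4, 1e-3]
```

Earlier layers encode general language features that fine-tuning should disturb less, which is why they get the smaller rates.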