fastai offers several widgets to support the workflow of a deep learning practitioner. The purpose of the widgets is to help you organize, clean, and prepare your data for your model. Widgets are separated by data type.
from fastai.vision import *
from fastai.widgets import DatasetFormatter, ImageCleaner, ImageDownloader, download_google_images
from fastai.gen_doc.nbdoc import *
%reload_ext autoreload
%autoreload 2
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learn = cnn_learner(data, models.resnet18, metrics=error_rate)
learn.fit_one_cycle(2)
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 0.167665 | 0.106727 | 0.037291 |
2 | 0.103579 | 0.077936 | 0.023553 |
learn.save('stage-1')
We create a databunch with all the data in the training set and no validation set (DatasetFormatter uses only the training set).
db = (ImageList.from_folder(path)
.split_none()
.label_from_folder()
.databunch())
learn = cnn_learner(db, models.resnet18, metrics=[accuracy])
learn.load('stage-1');
show_doc(DatasetFormatter)
class DatasetFormatter()
Returns a dataset with the appropriate format and file indices to be displayed.
The DatasetFormatter class prepares your image dataset for widgets by returning a formatted DatasetTfm based on the DatasetType specified. Use from_toplosses to grab the most problematic images directly from your learner. Optionally, you can restrict the formatted dataset returned to n_imgs.
show_doc(DatasetFormatter.from_similars)
from_similars(learn, layer_ls:list=[0, 7, 2], **kwargs)
Gets the indices for the most similar images.
from fastai.gen_doc.nbdoc import *
from fastai.widgets.image_cleaner import *
show_doc(DatasetFormatter.from_toplosses)
from_toplosses(learn, n_imgs=None, **kwargs)
Gets indices with top losses.
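The ranking behind from_toplosses can be pictured with a few lines of plain Python. This is only an illustrative sketch, not fastai's implementation: top_loss_indices is a hypothetical helper and the per-sample losses are made-up numbers.

```python
from typing import List, Optional, Sequence


def top_loss_indices(losses: Sequence[float], n_imgs: Optional[int] = None) -> List[int]:
    """Return sample indices ordered from highest to lowest loss (hypothetical helper)."""
    # Sort the positions by their loss value, largest first.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    # Optionally keep only the n_imgs worst-predicted samples.
    return order if n_imgs is None else order[:n_imgs]


# Made-up per-sample losses for four images.
losses = [0.02, 1.35, 0.40, 2.10]
print(top_loss_indices(losses, n_imgs=2))  # [3, 1]
```

The samples with the highest loss come first, which is why they are the natural candidates to review in ImageCleaner.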
show_doc(ImageCleaner)
class ImageCleaner(dataset, fns_idxs, path, batch_size:int=5, duplicates=False)
Tests found for ImageCleaner:
pytest -sv tests/test_widgets_image_cleaner.py::test_image_cleaner_index_length_mismatch
pytest -sv tests/test_widgets_image_cleaner.py::test_image_cleaner_length_correct
pytest -sv tests/test_widgets_image_cleaner.py::test_image_cleaner_wrong_input_type
Displays images for relabeling or deletion and saves changes in path as 'cleaned.csv'.
ImageCleaner is for cleaning up images that don't belong in your dataset. It renders images in a row and gives you the opportunity to flag each one for deletion or relabeling. To use ImageCleaner we must first use DatasetFormatter().from_toplosses to get the suggested indices for misclassified images.
ds, idxs = DatasetFormatter().from_toplosses(learn)
ImageCleaner(ds, idxs, path)
ImageCleaner does not change anything on disk (neither the labels nor the existence of images). Instead, it creates a 'cleaned.csv' file in your data path, from which you need to load your new databunch for the changes to be applied.
df = pd.read_csv(path/'cleaned.csv', header='infer')
# We create a databunch from our csv. We include the data in the training set and we don't use a validation set (DatasetFormatter uses only the training set)
np.random.seed(42)
db = (ImageList.from_df(df, path)
.split_none()
.label_from_df()
.databunch(bs=64))
learn = cnn_learner(db, models.resnet18, metrics=error_rate)
learn = learn.load('stage-1')
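To make the role of 'cleaned.csv' more concrete, here is a rough sketch of how such a file could be assembled from the choices made in the widget. The helper names (cleaned_rows, write_cleaned_csv) and the name/label column headers are assumptions for illustration only; inspect the file ImageCleaner actually writes before relying on its layout.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List, Set, Tuple


def cleaned_rows(items: Iterable[Tuple[str, str]], deleted: Set[str],
                 relabeled: Dict[str, str]) -> List[List[str]]:
    """Build the rows to keep: skip deleted files and apply new labels."""
    rows = []
    for name, label in items:
        if name in deleted:
            continue  # flagged for deletion: simply left out of the CSV
        rows.append([name, relabeled.get(name, label)])
    return rows


def write_cleaned_csv(path, rows: List[List[str]]) -> Path:
    """Write the rows to <path>/cleaned.csv; the image files are not touched."""
    csv_path = Path(path) / 'cleaned.csv'
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'label'])  # assumed header names
        writer.writerows(rows)
    return csv_path


# Made-up file names and labels.
items = [('train/3/a.png', '3'), ('train/7/b.png', '7'), ('train/7/c.png', '7')]
rows = cleaned_rows(items, deleted={'train/7/b.png'}, relabeled={'train/7/c.png': '3'})
print(rows)  # [['train/3/a.png', '3'], ['train/7/c.png', '3']]
```

Because only the CSV reflects the edits, the original folder stays intact and the cleaning step can be repeated or discarded at any time.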
You can then use ImageCleaner again to find duplicates in the dataset. To do this, you can specify duplicates=True while calling ImageCleaner after getting the indices and dataset from .from_similars. Note that if you are using a layer's output which has dimensions (n_batches, n_features, 1, 1) then you don't need any pooling (this is the case with the last layer). The suggested use of .from_similars() with resnets is using the last layer and no pooling, like in the following cell.
ds, idxs = DatasetFormatter().from_similars(learn, layer_ls=[0,7,1], pool=None)
Getting activations...
Computing similarities...
ImageCleaner(ds, idxs, path, duplicates=True)
show_doc(ImageDownloader)
The ImageDownloader widget gives you a way to quickly bootstrap your image dataset without leaving the notebook. It searches and downloads images that match the search criteria and resolution / quality requirements and stores them on your filesystem within the provided path.
Images for each search query (or label) are stored in a separate folder within path. For example, if you populate tiger with a path set up to ./data, you'll get a folder ./data/tiger/ with the tiger images in it.
ImageDownloader will automatically clean up and verify the downloaded images with verify_images() after downloading them.
path = Config.data_path()/'image_downloader'
os.makedirs(path, exist_ok=True)
ImageDownloader(path)
path = Config.data_path()/'image_downloader'
files = download_google_images(path, 'aussie shepherd', size='>1024*768', n_images=30)
len(files)
30
show_doc(download_google_images)
download_google_images(path:PathOrStr, search_term:str, size:str='>400*300', n_images:int=10, format:str='jpg', max_workers:int=4, timeout:int=4) → FilePathList
Search for n_images images on Google, matching search_term and size requirements, download them into path/search_term and verify them, using max_workers threads.
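The folder-per-search-term layout is what later lets labels be read back from folder names. The sketch below illustrates that mapping with pathlib only; label_folder and group_by_label are hypothetical helpers and the file names are invented, so treat it as a picture of the convention rather than fastai code.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def label_folder(path: str, search_term: str) -> PurePosixPath:
    """Folder that would hold the images for one search term (hypothetical helper)."""
    return PurePosixPath(path) / search_term


def group_by_label(files: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names by the name of their parent folder, used as the label."""
    groups = defaultdict(list)
    for f in files:
        p = PurePosixPath(f)
        groups[p.parent.name].append(p.name)
    return dict(groups)


print(label_folder('./data', 'tiger'))  # data/tiger

# Invented file names following the path/search_term layout.
files = ['data/tiger/001.jpg', 'data/tiger/002.jpg', 'data/lion/001.jpg']
print(group_by_label(files))  # {'tiger': ['001.jpg', '002.jpg'], 'lion': ['001.jpg']}
```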
After populating images with ImageDownloader, you can get an ImageDataBunch by calling ImageDataBunch.from_folder(path, size=size), or using the data block API.
# Setup path and labels to search for
path = Config.data_path()/'image_downloader'
labels = ['boston terrier', 'french bulldog']
# Download images
for label in labels:
    download_google_images(path, label, size='>400*300', n_images=50)
# Build a databunch and train!
src = (ImageList.from_folder(path)
.split_by_rand_pct()
.label_from_folder()
.transform(get_transforms(), size=224))
db = src.databunch(bs=16, num_workers=0)
learn = cnn_learner(db, models.resnet34, metrics=[accuracy])
learn.fit_one_cycle(3)
epoch | train_loss | valid_loss | accuracy |
---|---|---|---|
1 | 1.161491 | 0.424679 | 0.807692 |
2 | 0.751288 | 0.086240 | 0.961538 |
3 | 0.523341 | 0.066993 | 1.000000 |
To fetch more than a hundred images, ImageDownloader uses selenium and chromedriver to scroll through the Google Images search results page and scrape image URLs. They're not required as dependencies by default. If you don't have them installed on your system, the widget will show you an error message.
To install selenium, just pip install selenium in your fastai environment.
On a mac, you can install chromedriver with brew cask install chromedriver.
On Ubuntu, take a look at the latest chromedriver version available, then run something like:
wget https://chromedriver.storage.googleapis.com/2.45/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
Note that downloading under 100 images doesn't require any dependencies other than fastai itself; downloading more than a hundred images requires selenium and chromedriver.
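If you want to check up front whether the optional dependencies are present, something like the following sketch may help. It is not part of fastai: needs_browser_scraping and missing_dependencies are hypothetical helpers, and the cut-off of exactly 100 images is an assumption based on the note above.

```python
import importlib.util
import shutil
from typing import List, Sequence

SCROLL_THRESHOLD = 100  # assumption: above this count the browser-based path is used


def needs_browser_scraping(n_images: int) -> bool:
    """True when the request is large enough to need selenium and chromedriver."""
    return n_images > SCROLL_THRESHOLD


def missing_dependencies(n_images: int,
                         modules: Sequence[str] = ('selenium',),
                         binaries: Sequence[str] = ('chromedriver',)) -> List[str]:
    """List the optional dependencies that are needed but cannot be found."""
    if not needs_browser_scraping(n_images):
        return []  # small downloads need nothing beyond fastai itself
    # Python packages are looked up on the import path.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # Executables are looked up on the PATH.
    missing += [b for b in binaries if shutil.which(b) is None]
    return missing


print(needs_browser_scraping(30))   # False
print(needs_browser_scraping(150))  # True
print(missing_dependencies(30))     # []
```

Calling missing_dependencies(150) on your own machine returns whichever of 'selenium' and 'chromedriver' still need to be installed.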
size can be one of:
'>400*300'
'>640*480'
'>800*600'
'>1024*768'
'>2MP'
'>4MP'
'>6MP'
'>8MP'
'>10MP'
'>12MP'
'>15MP'
'>20MP'
'>40MP'
'>70MP'
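Since size only accepts the strings listed above, it can be convenient to validate a value before starting a download. The snippet below is a small sketch under that assumption; VALID_SIZES and check_size are hypothetical names, not fastai API.

```python
# The accepted values, copied from the list above.
VALID_SIZES = (
    '>400*300', '>640*480', '>800*600', '>1024*768',
    '>2MP', '>4MP', '>6MP', '>8MP', '>10MP',
    '>12MP', '>15MP', '>20MP', '>40MP', '>70MP',
)


def check_size(size: str) -> str:
    """Return size unchanged if it is an accepted option, else raise ValueError."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    return size


print(check_size('>1024*768'))  # >1024*768
```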
show_doc(ImageCleaner.make_dropdown_widget)
make_dropdown_widget(description='Description', options=['Label 1', 'Label 2'], value='Label 1', file_path=None, layout=Layout(), handler=None)
Return a Dropdown widget with specified handler.
show_doc(ImageCleaner.next_batch)
next_batch(_)
Handler for 'Next Batch' button click. Deletes all flagged images and renders the next batch.
show_doc(DatasetFormatter.sort_idxs)
sort_idxs(similarities)
Sorts similarities and returns the indexes in pairs ordered by highest similarity.
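The idea of returning index pairs ordered by similarity can be illustrated without any tensors. The following is a simplified sketch, not the library's code: most_similar_pairs is a hypothetical helper that works on a square, symmetric similarity matrix given as nested lists.

```python
from typing import List, Optional, Sequence, Tuple


def most_similar_pairs(similarities: Sequence[Sequence[float]],
                       n: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs with i < j, ordered from most to least similar."""
    size = len(similarities)
    # Only the upper triangle is used, so each pair appears once
    # and an image is never paired with itself.
    pairs = [(i, j) for i in range(size) for j in range(i + 1, size)]
    pairs.sort(key=lambda p: similarities[p[0]][p[1]], reverse=True)
    return pairs if n is None else pairs[:n]


# Made-up similarities between three images.
sims = [
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.5],
    [0.9, 0.5, 1.0],
]
print(most_similar_pairs(sims))  # [(0, 2), (1, 2), (0, 1)]
```

Pairs at the front of the list are the likeliest duplicates, which is the order in which they are worth reviewing.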
show_doc(ImageCleaner.make_vertical_box)
make_vertical_box(children, layout=Layout(), duplicates=False)
Make a vertical box with children and layout.
show_doc(ImageCleaner.relabel)
relabel(change)
Relabel images by moving from parent dir with old label class_old to parent dir with new label class_new.
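When labels are encoded as folder names, relabeling amounts to swapping the parent directory in the file path. The sketch below only computes the new path and moves nothing on disk; relabeled_path is a hypothetical helper and the example path is invented.

```python
from pathlib import PurePosixPath


def relabeled_path(file_path: str, class_new: str) -> PurePosixPath:
    """Return the path a file would have under the new label folder."""
    p = PurePosixPath(file_path)
    # p.parent is the old label folder (class_old); its parent holds all label folders.
    return p.parent.parent / class_new / p.name


print(relabeled_path('data/train/7/img_12.png', '3'))  # data/train/3/img_12.png
```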
show_doc(DatasetFormatter.largest_indices)
largest_indices(arr, n)
Returns the n largest indices from a numpy array arr.
show_doc(ImageCleaner.delete_image)
delete_image(file_path)
show_doc(ImageCleaner.empty)
empty()
show_doc(ImageCleaner.empty_batch)
empty_batch()
show_doc(DatasetFormatter.comb_similarity)
comb_similarity(t1:Tensor, t2:Tensor, **kwargs)
Computes the similarity function between each embedding of t1 and t2 matrices.
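As a rough picture of what comparing each embedding of t1 with each embedding of t2 means, here is a plain-Python sketch. It assumes cosine similarity as the similarity function, which is an assumption made for illustration; cosine and pairwise_similarity are hypothetical helpers operating on lists rather than tensors.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_similarity(t1: Sequence[Sequence[float]],
                        t2: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry [i][j] is the similarity between row i of t1 and row j of t2."""
    return [[cosine(u, v) for v in t2] for u in t1]


# Two tiny sets of made-up 2-dimensional embeddings.
t1 = [[1.0, 0.0], [0.0, 1.0]]
t2 = [[1.0, 0.0], [1.0, 1.0]]
sims = pairwise_similarity(t1, t2)
print([[round(s, 3) for s in row] for row in sims])  # [[1.0, 0.707], [0.0, 0.707]]
```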
show_doc(ImageCleaner.get_widgets)
get_widgets(duplicates)
Create and format widget set.
show_doc(ImageCleaner.write_csv)
write_csv()
show_doc(ImageCleaner.create_image_list)
create_image_list(dataset, fns_idxs)
Create a list of images, filenames and labels, first removing files that are not supposed to be displayed.
show_doc(ImageCleaner.render)
render()
Re-render Jupyter cell for batch of images.
show_doc(DatasetFormatter.get_similars_idxs)
get_similars_idxs(learn, layer_ls, **kwargs)
Gets the indices for the most similar images in ds_type dataset.
show_doc(ImageCleaner.on_delete)
on_delete(btn)
Flag this image as delete or keep.
show_doc(ImageCleaner.make_button_widget)
make_button_widget(label, file_path=None, handler=None, style=None, layout=Layout(width='auto'))
Return a Button widget with specified handler.
show_doc(ImageCleaner.make_img_widget)
make_img_widget(img, layout=Layout(), format='jpg')
Returns an image widget for specified file name img.
show_doc(DatasetFormatter.get_actns)
get_actns(learn, hook:Hook, dl:DataLoader, pool='AdaptiveConcatPool2d', pool_dim:int=4, **kwargs)
Gets activations at the layer specified by hook, applies pool of dim pool_dim and concatenates.
show_doc(ImageCleaner.batch_contains_deleted)
batch_contains_deleted()
Check if current batch contains already deleted images.
show_doc(ImageCleaner.make_horizontal_box)
make_horizontal_box(children, layout=Layout())
Make a horizontal box with children and layout.
show_doc(DatasetFormatter.get_toplosses_idxs)
get_toplosses_idxs(learn, n_imgs, **kwargs)
Sorts ds_type dataset by top losses and returns dataset and sorted indices.
show_doc(DatasetFormatter.padded_ds)
padded_ds(ll_input, size=(250, 300), resize_method=<ResizeMethod.CROP: 1>, padding_mode='zeros', **kwargs)
For a LabelList ll_input, resize each image to size using resize_method and padding_mode.