Computer vision data¶

In [ ]:

%matplotlib inline
from fastai.gen_doc.nbdoc import *
from fastai.vision import * 

This module contains the classes that define datasets handling Image objects and their transformations. As usual, we'll start with a quick overview before we get into the detailed API docs.

Before any work can be done, a dataset needs to be converted into a DataBunch object, and in the case of computer vision data, specifically into an ImageDataBunch subclass.

This is done with the help of the data block API and the ImageList class and its subclasses.

However, ImageDataBunch also provides a group of shortcut methods that reduce the multiple stages of the data block API to a single wrapper method. These shortcut methods work really well for:

Imagenet-style datasets (ImageDataBunch.from_folder)
A pandas DataFrame with a column of filenames and a column of labels, which can be strings for classification, strings separated by a label_delim for multi-label classification, or floats for a regression problem (ImageDataBunch.from_df)
A csv file with the same format as above (ImageDataBunch.from_csv)
A list of filenames and a list of targets (ImageDataBunch.from_lists)
A list of filenames and a function to get the target from the filename (ImageDataBunch.from_name_func)
A list of filenames and a regex pattern to get the target from the filename (ImageDataBunch.from_name_re)

The last five factory methods perform a random split between the training and validation sets. The first one can either perform a random split or rely on separate training and validation folders.
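To make the idea of a random split concrete, here is a minimal sketch in plain Python. The helper name `random_split` and the `seed` argument are assumptions made for illustration; this is not the library's implementation, only the general technique of shuffling indices and holding out a `valid_pct` share.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    "Hypothetical helper: shuffle the indices and hold out a `valid_pct` share for validation."
    rng = random.Random(seed)
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    cut = int(valid_pct * len(items))      # number of validation items
    valid_idxs = set(idxs[:cut])
    train = [item for i, item in enumerate(items) if i not in valid_idxs]
    valid = [item for i, item in enumerate(items) if i in valid_idxs]
    return train, valid

fnames = [f"img_{i}.png" for i in range(10)]   # made-up filenames
train, valid = random_split(fnames, valid_pct=0.2, seed=42)
print(len(train), len(valid))  # 8 2
```

Which items end up in the validation set depends on the seed; only the sizes are fixed by `valid_pct`.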

If you're just starting out you may choose to experiment with these shortcut methods, as they are also used in the first lessons of the fastai deep learning course. However, you can completely skip them and start building your code using the data block API from the very beginning. Internally, these shortcuts use this API anyway.

The first part of this document is dedicated to the shortcut ImageDataBunch factory methods. Then all the other computer vision data-specific methods that are used with the data block API are presented.

Quickly get your data ready for training¶

To get you started as easily as possible, fastai provides two helper functions to create a DataBunch object that you can directly use for training a classifier. To demonstrate them, you'll first need to download and untar the file by executing the following cell. This will create a data folder containing an MNIST subset in data/mnist_sample.

In [ ]:

path = untar_data(URLs.MNIST_SAMPLE); path

Out[ ]:

PosixPath('/home/ubuntu/.fastai/data/mnist_sample')

There are a number of ways to create an ImageDataBunch. One common approach is to use Imagenet-style folders (see further down this page for details) with ImageDataBunch.from_folder:

In [ ]:

tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=24)

Here the datasets are created automatically from the Imagenet-style folder structure. The parameters specified are:

the transforms to apply to the images in ds_tfms (here with do_flip=False because we don't want to flip numbers),
the target size of our pictures (here 24).

As with all DataBunch usage, a train_dl and a valid_dl are created, both of which are PyTorch DataLoader objects.

If you want to have a look at a few images inside a batch, you can use DataBunch.show_batch. The rows argument is the number of rows and columns to display.

In [ ]:

data.show_batch(rows=3, figsize=(5,5))

The second way to define the data for a classifier requires a structure like this:

path\
  train\
  test\
  labels.csv

where the labels.csv file defines the label(s) of each image in the training set. This is the format you will need to use when each image can have multiple labels. It also works with single labels:

In [ ]:

pd.read_csv(path/'labels.csv').head()

Out[ ]:

	name	label
0	train/3/7463.png	0
1	train/3/21102.png	0
2	train/3/31559.png	0
3	train/3/46882.png	0
4	train/3/26209.png	0

You can then use ImageDataBunch.from_csv:

In [ ]:

data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)

In [ ]:

data.show_batch(rows=3, figsize=(5,5))

An example of multi-label classification can be downloaded with the following cell. It's a sample of the planet dataset.

In [ ]:

planet = untar_data(URLs.PLANET_SAMPLE)

If we open the labels file, we see that each image has one or more tags, separated by a space.

In [ ]:

df = pd.read_csv(planet/'labels.csv')
df.head()

Out[ ]:

	image_name	tags
0	train_21983	partly_cloudy primary
1	train_9516	clear cultivation primary water
2	train_12664	haze primary
3	train_36960	clear primary
4	train_5302	haze primary road

In [ ]:

data = ImageDataBunch.from_csv(planet, folder='train', size=128, suffix='.jpg', label_delim=' ',
    ds_tfms=get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.))

The show_batch method will then print all the labels that correspond to each image.

In [ ]:

data.show_batch(rows=3, figsize=(10,8), ds_type=DatasetType.Valid)
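As a rough illustration of what label_delim does conceptually, the sketch below splits the tags from the five rows shown above and turns them into multi-hot targets using plain Python. The `multi_hot` helper is hypothetical, and the encoding the library uses internally may differ in detail.

```python
# The five rows displayed by df.head() above
rows = [
    ("train_21983", "partly_cloudy primary"),
    ("train_9516", "clear cultivation primary water"),
    ("train_12664", "haze primary"),
    ("train_36960", "clear primary"),
    ("train_5302", "haze primary road"),
]
label_delim = " "

# Split each tags string on the delimiter, then collect the sorted set of classes
tag_lists = [tags.split(label_delim) for _, tags in rows]
classes = sorted({tag for tags in tag_lists for tag in tags})

def multi_hot(tags, classes):
    "Hypothetical helper: 1 where the class is present in `tags`, 0 elsewhere."
    return [1 if c in tags else 0 for c in classes]

targets = [multi_hot(tags, classes) for tags in tag_lists]
print(classes)
print(targets[4])  # encoding of 'haze primary road'
```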

You can find more ways to build an ImageDataBunch without the factory methods in data_block.

In [ ]:

show_doc(ImageDataBunch)

`class` `ImageDataBunch`[source][test]

ImageDataBunch(train_dl:DataLoader, valid_dl:DataLoader, fix_dl:DataLoader=None, test_dl:Optional[DataLoader]=None, device:device=None, dl_tfms:Optional[Collection[Callable]]=None, path:PathOrStr='.', collate_fn:Callable='data_collate', no_check:bool=False) :: DataBunch

Some other tests where ImageDataBunch is used:

pytest -sv tests/test_vision_data.py::test_clean_tear_down [source]
pytest -sv tests/test_vision_data.py::test_denormalize [source]
pytest -sv tests/test_vision_data.py::test_from_csv_and_from_df [source]
pytest -sv tests/test_vision_data.py::test_from_folder [source]
pytest -sv tests/test_vision_data.py::test_from_lists [source]
pytest -sv tests/test_vision_data.py::test_from_name_re [source]
pytest -sv tests/test_vision_data.py::test_image_resize [source]
pytest -sv tests/test_vision_data.py::test_multi_iter [source]
pytest -sv tests/test_vision_data.py::test_multi_iter_broken [source]
pytest -sv tests/test_vision_data.py::test_normalize [source]
pytest -sv tests/test_vision_data.py::test_path_can_be_str_type [source]

To run tests please refer to this guide.

DataBunch suitable for computer vision.

This is the same initialization as a regular DataBunch so you probably don't want to use this directly, but one of the factory methods instead.

Factory methods¶

If you want to quickly get an ImageDataBunch and train a model, you should process your data to have it in one of the formats the following functions handle.

In [ ]:

show_doc(ImageDataBunch.from_folder)

`from_folder`[source][test]

from_folder(path:PathOrStr, train:PathOrStr='train', valid:PathOrStr='valid', valid_pct=None, classes:Collection[T_co]=None, **kwargs:Any) → ImageDataBunch

Tests found for from_folder:

pytest -sv tests/test_vision_data.py::test_from_folder [source]

Some other tests where from_folder is used:

pytest -sv tests/test_vision_data.py::test_camvid [source]
pytest -sv tests/test_vision_data.py::test_clean_tear_down [source]
pytest -sv tests/test_vision_data.py::test_coco [source]
pytest -sv tests/test_vision_data.py::test_coco_pickle [source]
pytest -sv tests/test_vision_data.py::test_coco_same_size [source]
pytest -sv tests/test_vision_data.py::test_denormalize [source]
pytest -sv tests/test_vision_data.py::test_image_resize [source]
pytest -sv tests/test_vision_data.py::test_image_to_image_different_tfms [source]
pytest -sv tests/test_vision_data.py::test_image_to_image_different_y_size [source]
pytest -sv tests/test_vision_data.py::test_multi_iter [source]
pytest -sv tests/test_vision_data.py::test_multi_iter_broken [source]
pytest -sv tests/test_vision_data.py::test_normalize [source]
pytest -sv tests/test_vision_data.py::test_points [source]
pytest -sv tests/test_vision_data.py::test_vision_datasets [source]

To run tests please refer to this guide.

Create from imagenet style dataset in path with train,valid,test subfolders (or provide valid_pct).

Refer to create_from_ll to see all the **kwargs arguments.

"Imagenet-style" datasets look something like this (note that the test folder is optional):

path\
  train\
    class1\
    class2\
    ...
  valid\
    class1\
    class2\
    ...
  test\

For example:

In [ ]:

data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=24)
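To see how an Imagenet-style layout encodes labels, the following sketch builds a tiny folder tree in a temporary directory and reads each label from the name of the parent folder. The helper `labels_from_folders` and the file names are made up for illustration; the sketch uses only the standard library and does not go through fastai.

```python
import tempfile
from pathlib import Path

def labels_from_folders(root, splits=("train", "valid")):
    "Hypothetical helper: map each split to sorted (relative path, class name) pairs."
    root = Path(root)
    found = {}
    for split in splits:
        files = sorted(p for p in (root / split).glob("*/*") if p.is_file())
        # The class is simply the name of the folder that contains the file
        found[split] = [(p.relative_to(root).as_posix(), p.parent.name) for p in files]
    return found

with tempfile.TemporaryDirectory() as tmp:
    # Build a miniature Imagenet-style tree with empty placeholder files
    for rel in ["train/3/a.png", "train/7/b.png", "valid/3/c.png", "valid/7/d.png"]:
        f = Path(tmp) / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    found = labels_from_folders(tmp)

print(found["train"])  # [('train/3/a.png', '3'), ('train/7/b.png', '7')]
```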

Note that this method (like all factory methods in this section) passes any kwargs to DataBunch.create.

In [ ]:

show_doc(ImageDataBunch.from_csv)

`from_csv`[source][test]

from_csv(path:PathOrStr, folder:PathOrStr=None, label_delim:str=None, csv_labels:PathOrStr='labels.csv', valid_pct:float=0.2, fn_col:int=0, label_col:int=1, suffix:str='', delimiter:str=None, header:Union[int, str, NoneType]='infer', **kwargs:Any) → ImageDataBunch

Tests found for from_csv:

pytest -sv tests/test_vision_data.py::test_from_csv_and_from_df [source]
pytest -sv tests/test_vision_data.py::test_path_can_be_str_type [source]

Some other tests where from_csv is used:

pytest -sv tests/test_vision_data.py::test_multi [source]

To run tests please refer to this guide.

Create from a csv file in path/csv_labels.

Refer to create_from_ll to see all the **kwargs arguments.

Create an ImageDataBunch from path by splitting the data in folder, labelled in the file csv_labels, between a training and a validation set. Use valid_pct to indicate the fraction of the total images to use as the validation set (0.2 means 20%). An optional test folder contains unlabelled data, and suffix contains an optional suffix to add to the filenames in csv_labels (such as '.jpg'). fn_col is the index (or the name) of the column containing the filenames, and label_col is the index or indices (or the name or names) of the column(s) containing the labels. Use header to specify the format of the csv header, and delimiter to specify a non-standard csv field separator. If your csv has no header, column parameters can only be specified as indices. If label_delim is passed, the content of the label column is split according to that separator.

For example:

In [ ]:

data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=24);
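The roles of folder, suffix, fn_col and label_col described above can be sketched with the standard csv module. The helper `resolve_rows` and the base path "planet_sample" are assumptions for illustration, and the two rows are copied from the planet sample shown earlier; this is not how the library reads the file internally.

```python
import csv
import io
from pathlib import PurePosixPath

# Header plus two rows in the same format as the planet labels.csv
csv_text = (
    "image_name,tags\n"
    "train_21983,partly_cloudy primary\n"
    "train_9516,clear cultivation primary water\n"
)

def resolve_rows(csv_text, path, folder=None, suffix="", fn_col=0, label_col=1):
    "Hypothetical helper: build (file path, raw label) pairs from csv rows."
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    base = PurePosixPath(path) / folder if folder else PurePosixPath(path)
    # folder goes in front of the filename, suffix at the end
    return [(str(base / (row[fn_col] + suffix)), row[label_col]) for row in reader]

rows = resolve_rows(csv_text, "planet_sample", folder="train", suffix=".jpg")
print(rows[0])  # ('planet_sample/train/train_21983.jpg', 'partly_cloudy primary')
```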

In [ ]:

show_doc(ImageDataBunch.from_df)

`from_df`[source][test]

from_df(path:PathOrStr, df:DataFrame, folder:PathOrStr=None, label_delim:str=None, valid_pct:float=0.2, fn_col:IntsOrStrs=0, label_col:IntsOrStrs=1, suffix:str='', **kwargs:Any) → ImageDataBunch

Tests found for from_df:

pytest -sv tests/test_vision_data.py::test_from_csv_and_from_df [source]

To run tests please refer to this guide.

Create from a DataFrame df.

Refer to create_from_ll to see all the **kwargs arguments.

Same as ImageDataBunch.from_csv, but passing in a DataFrame instead of a csv file. For example:

In [ ]:

df = pd.read_csv(path/'labels.csv', header='infer')
df.head()

Out[ ]:

	name	label
0	train/3/7463.png	0
1	train/3/21102.png	0
2	train/3/31559.png	0
3	train/3/46882.png	0
4	train/3/26209.png	0

In [ ]:

data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)

Different datasets are labeled in many different ways. The following methods can help extract the labels from the dataset in a wide variety of situations. The way they are built in fastai is constructive: there are methods which do a lot for you but apply in specific circumstances and there are methods which do less for you but give you more flexibility.

In this case the hierarchy is:

ImageDataBunch.from_name_re: Gets the labels from the filenames using a regular expression
ImageDataBunch.from_name_func: Gets the labels from the filenames using any function
ImageDataBunch.from_lists: Labels need to be provided as an input in a list

In [ ]:

show_doc(ImageDataBunch.from_name_re)

`from_name_re`[source][test]

from_name_re(path:PathOrStr, fnames:FilePathList, pat:str, valid_pct:float=0.2, **kwargs)

Tests found for from_name_re:

pytest -sv tests/test_vision_data.py::test_from_name_re [source]
pytest -sv tests/test_vision_data.py::test_image_resize [source]

To run tests please refer to this guide.

Create from list of fnames in path with re expression pat.

Refer to create_from_ll to see all the **kwargs arguments.

Creates an ImageDataBunch from fnames, calling a regular expression (containing one re group) on the file names to get the labels, putting aside valid_pct for the validation. In the same way as ImageDataBunch.from_csv, an optional test folder contains unlabelled data.

Our previously created dataframe contains the labels in the filenames so we can leverage it to test this new method. ImageDataBunch.from_name_re needs the exact path of each file so we will append the data path to each filename before creating our ImageDataBunch object.

In [ ]:

fn_paths = [path/name for name in df['name']]; fn_paths[:2]

Out[ ]:

[PosixPath('/home/ubuntu/.fastai/data/mnist_sample/train/3/7463.png'),
 PosixPath('/home/ubuntu/.fastai/data/mnist_sample/train/3/21102.png')]

In [ ]:

pat = r"/(\d)/\d+\.png$"
data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)

In [ ]:

data.classes

Out[ ]:

['3', '7']
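The pattern used above can be checked in isolation with Python's re module, without building an ImageDataBunch. In this sketch the first path is the one shown in the output above, while the second path is a made-up example of a file in the 7 folder; `label_from_path` is a hypothetical helper, not a fastai function.

```python
import re
from pathlib import PurePosixPath

pat = re.compile(r"/(\d)/\d+\.png$")   # one group: the digit folder name

fn_paths = [
    PurePosixPath("/home/ubuntu/.fastai/data/mnist_sample/train/3/7463.png"),
    PurePosixPath("/home/ubuntu/.fastai/data/mnist_sample/train/7/1234.png"),  # hypothetical file
]

def label_from_path(p):
    "Hypothetical helper: return the text captured by the single group in `pat`."
    match = pat.search(str(p))
    if match is None:
        raise ValueError(f"pattern did not match {p}")
    return match.group(1)

labels = [label_from_path(p) for p in fn_paths]
classes = sorted(set(labels))
print(labels, classes)  # ['3', '7'] ['3', '7']
```

Testing a pattern on a few paths this way is a cheap way to catch a regex that silently fails to match before you start loading data.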

In [ ]:

show_doc(ImageDataBunch.from_name_func)

`from_name_func`[source][test]

from_name_func(path:PathOrStr, fnames:FilePathList, label_func:Callable, valid_pct:float=0.2, **kwargs)

No tests found for from_name_func. To contribute a test please refer to this guide and this discussion.

Create from list of fnames in path with label_func.

Refer to create_from_ll to see all the **kwargs arguments.

Works in the same way as ImageDataBunch.from_name_re, but instead of a regular expression it expects a function that will determine how to extract the labels from the filenames. (Note that from_name_re uses this function in its implementation).

To test it we could build a function with our previous regex. Let's try another, similar approach to show that the labels can be obtained in a different way.

In [ ]:

def get_labels(file_path): return '3' if '/3/' in str(file_path) else '7'
data = ImageDataBunch.from_name_func(path, fn_paths, label_func=get_labels, ds_tfms=tfms, size=24)
data.classes

Out[ ]:

['3', '7']

In [ ]:

show_doc(ImageDataBunch.from_lists)

`from_lists`[source][test]

from_lists(path:PathOrStr, fnames:FilePathList, labels:StrList, valid_pct:float=0.2, item_cls:Callable=None, **kwargs)

Tests found for from_lists:

pytest -sv tests/test_vision_data.py::test_from_lists [source]

To run tests please refer to this guide.

Create from list of fnames in path.

Refer to create_from_ll to see all the **kwargs arguments.

The most flexible factory function; pass in a list of labels that correspond to each of the filenames in fnames.

To show an example we have to build the labels list outside our ImageDataBunch object and give it as an argument when we call from_lists. Let's use our previously created function to create our labels list.

In [ ]:

labels_ls = list(map(get_labels, fn_paths))
data = ImageDataBunch.from_lists(path, fn_paths, labels=labels_ls, ds_tfms=tfms, size=24)
data.classes

Out[ ]:

['3', '7']

In [ ]:

show_doc(ImageDataBunch.create_from_ll)

`create_from_ll`[source][test]

create_from_ll(lls:LabelLists, bs:int=64, val_bs:int=None, ds_tfms:Union[Callable, Collection[Callable], NoneType]=None, num_workers:int=4, dl_tfms:Optional[Collection[Callable]]=None, device:device=None, test:Union[Path, str, NoneType]=None, collate_fn:Callable='data_collate', size:int=None, no_check:bool=False, resize_method:ResizeMethod=None, mult:int=None, padding_mode:str='reflection', mode:str='bilinear', tfm_y:bool=False) → ImageDataBunch

No tests found for create_from_ll. To contribute a test please refer to this guide and this discussion.

Create an ImageDataBunch from LabelLists lls with potential ds_tfms.

Use bs, num_workers, collate_fn and a potential test folder. ds_tfms is a tuple of two lists of transforms to be applied to the training set and to the validation set (and, optionally, the test set). dl_tfms are the transforms to apply to the DataLoader. The size and the kwargs are passed to the transforms for data augmentation.

In [ ]:

show_doc(ImageDataBunch.single_from_classes)

`single_from_classes`[source][test]

single_from_classes(path:PathOrStr, classes:StrList, ds_tfms:Union[Callable, Collection[Callable]]=None, **kwargs)

No tests found for single_from_classes. To contribute a test please refer to this guide and this discussion.

Create an empty ImageDataBunch in path with classes. Typically used for inference.

In [ ]:

jekyll_note('This method is deprecated, you should use DataBunch.load_empty now.')

Note: This method is deprecated, you should use DataBunch.load_empty now.

Other methods¶

In the next few methods we will use another dataset, CIFAR. This is because the second method will get the statistics for our dataset and we want to be able to show different statistics per channel. If we were to use MNIST, these statistics would be the same for every channel. White pixels are [255,255,255] and black pixels are [0,0,0] (or in normalized form [1,1,1] and [0,0,0]) so there is no variance between channels.

In [ ]:

path = untar_data(URLs.CIFAR); path

Out[ ]:

PosixPath('/home/ubuntu/.fastai/data/cifar10')

In [ ]:

show_doc(channel_view)

`channel_view`[source][test]

channel_view(x:Tensor) → Tensor

No tests found for channel_view. To contribute a test please refer to this guide and this discussion.

Make channel the first axis of x and flatten remaining axes

In [ ]:

data = ImageDataBunch.from_folder(path, ds_tfms=tfms, valid='test', size=24)

In [ ]:

def channel_view(x:Tensor)->Tensor:
    "Make channel the first axis of `x` and flatten remaining axes"
    return x.transpose(0,1).contiguous().view(x.shape[1],-1) 

This function takes a tensor and flattens all dimensions except the channels, which it keeps as the first axis. This function is used to feed ImageDataBunch.batch_stats so that it can get the pixel statistics of a whole batch.

Let's take as an example the dimensions of our MNIST batches: 128, 3, 24, 24.

In [ ]:

t = torch.Tensor(128, 3, 24, 24)

In [ ]:

t.size()

Out[ ]:

torch.Size([128, 3, 24, 24])

In [ ]:

tensor = channel_view(t)

In [ ]:

tensor.size()

Out[ ]:

torch.Size([3, 73728])
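The reshaping above can be reproduced on nested Python lists to see which values end up in which row. This is a pure-Python analogue written for illustration (it does not use torch), applied to a tiny batch shaped (2, 3, 2, 2) in which each pixel stores 10 * image_index + channel_index.

```python
def channel_view(batch):
    "Pure-Python analogue: `batch` is nested lists shaped (N, C, H, W)."
    n_channels = len(batch[0])
    # For each channel, concatenate that channel's pixels across all images
    return [[px for img in batch for row in img[c] for px in row]
            for c in range(n_channels)]

# 2 images, 3 channels, 2x2 pixels; pixel value = 10 * image index + channel index
batch = [[[[10 * n + c] * 2 for _ in range(2)] for c in range(3)] for n in range(2)]

flat = channel_view(batch)
print(len(flat), len(flat[0]))  # 3 8
print(flat[1])                  # [1, 1, 1, 1, 11, 11, 11, 11]

# Same arithmetic as the tensor example: 128 images of 24x24 pixels per channel
print(128 * 24 * 24)            # 73728
```

Each row gathers one channel across the whole batch, which is what makes per-channel reductions such as a mean straightforward.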

In [ ]:

show_doc(ImageDataBunch.batch_stats)

`batch_stats`[source][test]

batch_stats(funcs:Collection[Callable]=None, ds_type:DatasetType=<DatasetType.Train: 1>) → Tensor

No tests found for batch_stats. To contribute a test please refer to this guide and this discussion.

Grab a batch of data and call each reduction function in funcs per channel.

In [ ]:

data.batch_stats()

Out[ ]:

[tensor([0.4928, 0.4767, 0.4671]), tensor([0.2677, 0.2631, 0.2630])]

In [ ]:

show_doc(ImageDataBunch.normalize)

`normalize`[source][test]

normalize(stats:Collection[Tensor]=None, do_x:bool=True, do_y:bool=False)

Tests found for normalize:

pytest -sv tests/test_vision_data.py::test_normalize [source]

Some other tests where normalize is used:

pytest -sv tests/test_vision_data.py::test_clean_tear_down [source]
pytest -sv tests/test_vision_data.py::test_denormalize [source]
pytest -sv tests/test_vision_data.py::test_multi_iter [source]

To run tests please refer to this guide.

Add normalize transform using stats (defaults to DataBunch.batch_stats)

In the fast.ai library we have imagenet_stats, cifar_stats and mnist_stats so we can add normalization easily with any of these datasets. Let's see an example with our dataset of choice: CIFAR.

In [ ]:

data.normalize(cifar_stats)

Out[ ]:

ImageDataBunch;

Train: LabelList
y: CategoryList (50000 items)
[Category truck, Category truck, Category truck, Category truck, Category truck]...
Path: /home/ubuntu/.fastai/data/cifar10
x: ImageList (50000 items)
[Image (3, 32, 32), Image (3, 32, 32), Image (3, 32, 32), Image (3, 32, 32), Image (3, 32, 32)]...
Path: /home/ubuntu/.fastai/data/cifar10;

Valid: LabelList
y: CategoryList (10000 items)
[Category truck, Category truck, Category truck, Category truck, Category truck]...
Path: /home/ubuntu/.fastai/data/cifar10
x: ImageList (10000 items)
[Image (3, 32, 32), Image (3, 32, 32), Image (3, 32, 32), Image (3, 32, 32), Image (3, 32, 32)]...
Path: /home/ubuntu/.fastai/data/cifar10;

Test: None

In [ ]:

data.batch_stats()

Out[ ]:

[tensor([ 0.0074, -0.0219,  0.0769]), tensor([1.0836, 1.0829, 1.0078])]

Data normalization¶

You may also want to normalize your data, which can be done by using the following functions.

In [ ]:

show_doc(normalize)

`normalize`[source][test]

normalize(x:Tensor, mean:FloatTensor, std:FloatTensor) → Tensor

Some other tests where normalize is used:

pytest -sv tests/test_vision_data.py::test_clean_tear_down [source]
pytest -sv tests/test_vision_data.py::test_denormalize [source]
pytest -sv tests/test_vision_data.py::test_multi_iter [source]
pytest -sv tests/test_vision_data.py::test_normalize [source]

To run tests please refer to this guide.

Normalize x with mean and std.

In [ ]:

show_doc(denormalize)

`denormalize`[source][test]

denormalize(x:Tensor, mean:FloatTensor, std:FloatTensor, do_x:bool=True) → Tensor

Tests found for denormalize:

pytest -sv tests/test_vision_data.py::test_denormalize [source]

To run tests please refer to this guide.

Denormalize x with mean and std.
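The arithmetic behind normalizing and denormalizing is simple enough to sketch on plain lists. The functions below are illustrative stand-ins that share names with the library functions but operate on channel-major lists rather than tensors; the pixel values are made up, and using the sample standard deviation is an assumption.

```python
import math
import statistics

def normalize(x, mean, std):
    "Per channel: subtract the mean, then divide by the standard deviation."
    return [[(v - m) / s for v in ch] for ch, m, s in zip(x, mean, std)]

def denormalize(x, mean, std):
    "Inverse operation: multiply by the standard deviation, then add the mean."
    return [[v * s + m for v in ch] for ch, m, s in zip(x, mean, std)]

# Made-up pixel values for three channels (one flat list per channel)
x = [[0.1, 0.5, 0.9, 0.3],
     [0.2, 0.4, 0.6, 0.8],
     [0.0, 1.0, 0.5, 0.5]]

mean = [statistics.mean(ch) for ch in x]
std = [statistics.stdev(ch) for ch in x]   # sample standard deviation (assumption)

normed = normalize(x, mean, std)
restored = denormalize(normed, mean, std)

# After normalization each channel has mean close to 0 and std close to 1
print([round(statistics.mean(ch), 6) for ch in normed])
print([round(statistics.stdev(ch), 6) for ch in normed])
```

The round trip restores the original values only up to floating point error, which is worth keeping in mind when denormalized data is plotted.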

In [ ]:

show_doc(normalize_funcs)

`normalize_funcs`[source][test]

normalize_funcs(mean:FloatTensor, std:FloatTensor, do_x:bool=True, do_y:bool=False) → Tuple[Callable, Callable]

No tests found for normalize_funcs. To contribute a test please refer to this guide and this discussion.

Create normalize/denormalize func using mean and std, can specify do_y and device.

On MNIST the mean and std are 0.1307 and 0.3081 respectively (looked on Google). If you're using a pretrained model, you'll need to use the normalization that was used to train the model. The imagenet norm and denorm functions are stored as constants inside the library named imagenet_norm and imagenet_denorm. If you're training a model on CIFAR-10, you can also use cifar_norm and cifar_denorm.

You may sometimes see warnings about clipping input data when plotting normalized data. That's because, even though the data is denormalized automatically when plotting, floating point errors may push some values slightly out of the correct range. You can safely ignore these warnings in this case.

In [ ]:

data = ImageDataBunch.from_folder(untar_data(URLs.MNIST_SAMPLE),
                                  ds_tfms=tfms, size=24)
data.normalize()
data.show_batch(rows=3, figsize=(6,6))

In [ ]:

show_doc(get_annotations)

`get_annotations`[source][test]

get_annotations(fname, prefix=None)

Some other tests where get_annotations is used:

pytest -sv tests/test_vision_data.py::test_coco [source]
pytest -sv tests/test_vision_data.py::test_coco_pickle [source]
pytest -sv tests/test_vision_data.py::test_points [source]

To run tests please refer to this guide.

Open a COCO-style json in fname and return the lists of filenames (optionally with prefix) and labelled bboxes.

To use this dataset and collate samples into batches, you'll need the following function:

In [ ]:

show_doc(bb_pad_collate)

`bb_pad_collate`[source][test]

bb_pad_collate(samples:BatchSamples, pad_idx:int=0) → Tuple[FloatTensor, Tuple[LongTensor, LongTensor]]

No tests found for bb_pad_collate. To contribute a test please refer to this guide and this discussion.

Function that collects samples of labelled bboxes and adds padding with pad_idx.
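Padding is needed because images in one batch can contain different numbers of objects. The sketch below shows the general idea on plain lists: every sample is padded to the length of the longest one. The helper `pad_bboxes`, the sample values, and the choice to put the padding in front of the real boxes are all assumptions made for illustration.

```python
def pad_bboxes(samples, pad_idx=0):
    "Hypothetical helper: pad every (bboxes, labels) pair to the longest sample in the batch."
    max_len = max(len(labels) for _, labels in samples)
    padded_boxes, padded_labels = [], []
    for bboxes, labels in samples:
        n_pad = max_len - len(labels)
        # Padding boxes are all zeros; padding labels use pad_idx
        padded_boxes.append([[0.0] * 4 for _ in range(n_pad)] + [list(b) for b in bboxes])
        padded_labels.append([pad_idx] * n_pad + list(labels))
    return padded_boxes, padded_labels

samples = [
    ([[0.1, 0.2, 0.5, 0.6]], [3]),                           # one object
    ([[0.0, 0.0, 0.3, 0.3], [0.4, 0.4, 0.9, 0.9]], [1, 2]),  # two objects
]
boxes, labels = pad_bboxes(samples, pad_idx=0)
print(labels)  # [[0, 3], [1, 2]]
```

After padding, every sample has the same number of boxes and labels, so the batch can be stacked into rectangular tensors.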

Finally, to apply transformations to Image in a Dataset, we use this last class.

ItemList specific to vision¶

The vision application adds a few subclasses of ItemList specific to images.

In [ ]:

show_doc(ImageList, title_level=3)

`class` `ImageList`[source][test]

ImageList(*args, convert_mode='RGB', after_open:Callable=None, **kwargs) :: ItemList

Some other tests where ImageList is used:

pytest -sv tests/test_vision_data.py::test_image_resize [source]
pytest -sv tests/test_vision_data.py::test_multi [source]
pytest -sv tests/test_vision_data.py::test_vision_datasets [source]

To run tests please refer to this guide.

ItemList suitable for computer vision.

Create an ItemList in path from filenames in items. create_func will default to open_image. label_cls can be specified for the labels, xtra contains any extra information (usually in the form of a dataframe) and processor is applied to the ItemList after splitting and labelling.

In [ ]:

show_doc(ImageList.from_folder)

`from_folder`[source][test]

from_folder(path:PathOrStr='.', extensions:StrList=None, **kwargs) → ItemList

Tests found for from_folder:

pytest -sv tests/test_vision_data.py::test_vision_datasets [source]

Some other tests where from_folder is used:

pytest -sv tests/test_vision_data.py::test_camvid [source]
pytest -sv tests/test_vision_data.py::test_clean_tear_down [source]
pytest -sv tests/test_vision_data.py::test_coco [source]
pytest -sv tests/test_vision_data.py::test_coco_pickle [source]
pytest -sv tests/test_vision_data.py::test_coco_same_size [source]
pytest -sv tests/test_vision_data.py::test_denormalize [source]
pytest -sv tests/test_vision_data.py::test_from_folder [source]
pytest -sv tests/test_vision_data.py::test_image_resize [source]
pytest -sv tests/test_vision_data.py::test_image_to_image_different_tfms [source]
pytest -sv tests/test_vision_data.py::test_image_to_image_different_y_size [source]
pytest -sv tests/test_vision_data.py::test_multi_iter [source]
pytest -sv tests/test_vision_data.py::test_multi_iter_broken [source]
pytest -sv tests/test_vision_data.py::test_normalize [source]
pytest -sv tests/test_vision_data.py::test_points [source]

To run tests please refer to this guide.

Get the list of files in path that have an image suffix. recurse determines if we search subfolders.

In [ ]:

show_doc(ImageList.from_df)

`from_df`[source][test]

from_df(df:DataFrame, path:PathOrStr, cols:IntsOrStrs=0, folder:PathOrStr=None, suffix:str='', **kwargs) → ItemList

Some other tests where from_df is used:

pytest -sv tests/test_vision_data.py::test_from_csv_and_from_df [source]

To run tests please refer to this guide.

Get the filenames in cols of df with folder in front of them, suffix at the end.

In [ ]:

show_doc(get_image_files)

`get_image_files`[source][test]

get_image_files(c:PathOrStr, check_ext:bool=True, recurse=False) → FilePathList

Some other tests where get_image_files is used:

pytest -sv tests/test_vision_data.py::test_from_name_re [source]
pytest -sv tests/test_vision_data.py::test_image_resize [source]

To run tests please refer to this guide.

Return list of files in c that are images. check_ext will filter to image_extensions.
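A comparable filter can be sketched with the standard library alone. In this illustration the set of image extensions is derived from Python's mimetypes table, and `list_image_files` is a hypothetical stand-in for get_image_files; the temporary files exist only for the demonstration.

```python
import mimetypes
import tempfile
from pathlib import Path

# Every extension that Python's mimetypes table maps to an image/* type
IMAGE_EXTENSIONS = {ext for ext, mime in mimetypes.types_map.items()
                    if mime.startswith("image/")}

def list_image_files(folder, check_ext=True, recurse=False):
    "Hypothetical stand-in: list files in `folder`, optionally keeping only image extensions."
    folder = Path(folder)
    candidates = folder.rglob("*") if recurse else folder.iterdir()
    return sorted(p for p in candidates
                  if p.is_file() and (not check_ext or p.suffix.lower() in IMAGE_EXTENSIONS))

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ["a.png", "b.jpg", "notes.txt", "sub/c.png"]:
        (root / name).touch()
    top_level = [p.name for p in list_image_files(root)]
    recursive = [p.relative_to(root).as_posix() for p in list_image_files(root, recurse=True)]

print(top_level)   # ['a.png', 'b.jpg']
print(recursive)   # ['a.png', 'b.jpg', 'sub/c.png']
```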

In [ ]:

show_doc(ImageList.open)

`open`[source][test]

open(fn)

Some other tests where open is used:

pytest -sv tests/test_vision_data.py::test_download_images [source]
pytest -sv tests/test_vision_data.py::test_verify_image [source]
pytest -sv tests/test_vision_data.py::test_verify_images [source]
pytest -sv tests/test_vision_data.py::test_vision_pil2tensor [source]
pytest -sv tests/test_vision_data.py::test_vision_pil2tensor_16bit [source]

To run tests please refer to this guide.

Open image in fn, subclass and overwrite for custom behavior.

In [ ]:

show_doc(ImageList.show_xys)

`show_xys`[source][test]

show_xys(xs, ys, imgsize:int=4, figsize:Optional[Tuple[int, int]]=None, **kwargs)

No tests found for show_xys. To contribute a test please refer to this guide and this discussion.

Show the xs (inputs) and ys (targets) on a figure of figsize.

In [ ]:

show_doc(ImageList.show_xyzs)

`show_xyzs`[source][test]

show_xyzs(xs, ys, zs, imgsize:int=4, figsize:Optional[Tuple[int, int]]=None, **kwargs)

No tests found for show_xyzs. To contribute a test please refer to this guide and this discussion.

Show xs (inputs), ys (targets) and zs (predictions) on a figure of figsize.

In [ ]:

show_doc(ObjectCategoryList, title_level=3)

`class` `ObjectCategoryList`[source][test]

ObjectCategoryList(items:Iterator[T_co], classes:Collection[T_co]=None, label_delim:str=None, one_hot:bool=False, **kwargs) :: MultiCategoryList

No tests found for ObjectCategoryList. To contribute a test please refer to this guide and this discussion.

ItemList for labelled bounding boxes.

In [ ]:

show_doc(ObjectItemList, title_level=3)

`class` `ObjectItemList`[source][test]

ObjectItemList(*args, convert_mode='RGB', after_open:Callable=None, **kwargs) :: ImageList

Tests found for ObjectItemList:

pytest -sv tests/test_vision_data.py::test_coco [source]
pytest -sv tests/test_vision_data.py::test_coco_pickle [source]
pytest -sv tests/test_vision_data.py::test_coco_same_size [source]

To run tests please refer to this guide.

ItemList suitable for object detection.

In [ ]:

show_doc(SegmentationItemList, title_level=3)

`class` `SegmentationItemList`[source][test]

SegmentationItemList(*args, convert_mode='RGB', after_open:Callable=None, **kwargs) :: ImageList

Tests found for SegmentationItemList:

pytest -sv tests/test_vision_data.py::test_camvid [source]

To run tests please refer to this guide.

ItemList suitable for segmentation tasks.

In [ ]:

show_doc(SegmentationLabelList, title_level=3)

`class` `SegmentationLabelList`[source][test]

SegmentationLabelList(items:Iterator[T_co], classes:Collection[T_co]=None, **kwargs) :: ImageList

No tests found for SegmentationLabelList. To contribute a test please refer to this guide and this discussion.

ItemList for segmentation masks.

In [ ]:

show_doc(PointsLabelList, title_level=3)

`class` `PointsLabelList`[source][test]

PointsLabelList(items:Iterator[T_co], path:PathOrStr='.', label_cls:Callable=None, inner_df:Any=None, processor:Union[PreProcessor, Collection[PreProcessor]]=None, x:ItemList=None, ignore_empty:bool=False) :: ItemList

No tests found for PointsLabelList. To contribute a test please refer to this guide and this discussion.

ItemList for points.

In [ ]:

show_doc(PointsItemList, title_level=3)

`class` `PointsItemList`[source][test]

PointsItemList(*args, convert_mode='RGB', after_open:Callable=None, **kwargs) :: ImageList

Tests found for PointsItemList:

pytest -sv tests/test_vision_data.py::test_points [source]

To run tests please refer to this guide.

ItemList for Image to ImagePoints tasks.

In [ ]:

show_doc(ImageImageList, title_level=3)

`class` `ImageImageList`[source][test]

ImageImageList(*args, convert_mode='RGB', after_open:Callable=None, **kwargs) :: ImageList

Some other tests where ImageImageList is used:

pytest -sv tests/test_vision_data.py::test_image_to_image_different_tfms [source]
pytest -sv tests/test_vision_data.py::test_image_to_image_different_y_size [source]

To run tests please refer to this guide.

ItemList suitable for Image to Image tasks.

Building your own dataset¶

This module also contains a few helper functions to allow you to build your own dataset for image classification.

In [ ]:

show_doc(download_images)

`download_images`[source][test]

download_images(urls:StrList, dest:PathOrStr, max_pics:int=1000, max_workers:int=8, timeout=4)

Tests found for download_images:

pytest -sv tests/test_vision_data.py::test_download_images [source]

To run tests please refer to this guide.

Download images listed in text file urls to path dest, at most max_pics

In [ ]:

show_doc(verify_images)

`verify_images`[source][test]

verify_images(path:PathOrStr, delete:bool=True, max_workers:int=4, max_size:int=None, recurse:bool=False, dest:PathOrStr='.', n_channels:int=3, interp=2, ext:str=None, img_format:str=None, resume:bool=None, **kwargs)

Tests found for verify_images:

pytest -sv tests/test_vision_data.py::test_verify_images [source]

To run tests please refer to this guide.

Check that the images in path aren't broken, optionally resize them, and copy them to dest.

It checks that every image in this folder can be opened and has n_channels channels. If n_channels is 3, it tries to convert the image to RGB. If delete=True, an image is removed when this check fails. If resume is set, images that already exist in dest are skipped. If max_size is specified, the image is resized, keeping its aspect ratio, so that both sides are no larger than max_size, using interp. The result is stored in dest; ext forces an extension type, and img_format and kwargs are passed to PIL.Image.save. Uses max_workers CPUs.

Undocumented Methods - Methods moved below this line will intentionally be hidden¶

In [ ]:

show_doc(PointsItemList.get)

`get`[source][test]

get(i)

No tests found for get. To contribute a test please refer to this guide and this discussion.

Subclass if you want to customize how to create item i from self.items.
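As a rough illustration of this subclassing pattern, here is a minimal sketch in plain Python. `ToyItemList` and `UpperCaseList` are hypothetical stand-ins invented for the example, not fastai classes; the only idea borrowed from the text above is that indexing is routed through `get`, so overriding `get` changes how item i is built from `self.items`.

```python
class ToyItemList:
    """Hypothetical, simplified stand-in for an ItemList (not the fastai class)."""

    def __init__(self, items):
        self.items = list(items)

    def get(self, i):
        # Default behavior: return the raw stored item.
        return self.items[i]

    def __getitem__(self, i):
        # Indexing goes through get, so subclasses only need to override get.
        return self.get(i)

    def __len__(self):
        return len(self.items)


class UpperCaseList(ToyItemList):
    """Customizes item creation by post-processing the raw item."""

    def get(self, i):
        raw = super().get(i)
        return raw.upper()
```

In an image setting, the override would typically open the file stored in `self.items[i]` instead of upper-casing a string.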

In [ ]:

show_doc(SegmentationLabelList.new)

`new`[source][test]

new(items:Iterator[T_co], processor:Union[PreProcessor, Collection[PreProcessor]]=None, **kwargs) → ItemList

No tests found for new. To contribute a test please refer to this guide and this discussion.

Create a new ItemList from items, keeping the same attributes.

In [ ]:

show_doc(ImageList.from_csv)

`from_csv`[source][test]

from_csv(path:PathOrStr, csv_name:str, header:str='infer', **kwargs) → ItemList

Tests found for from_csv:

pytest -sv tests/test_vision_data.py::test_multi [source]

Some other tests where from_csv is used:

pytest -sv tests/test_vision_data.py::test_from_csv_and_from_df [source]
pytest -sv tests/test_vision_data.py::test_path_can_be_str_type [source]

To run tests please refer to this guide.

Get the filenames in path/csv_name opened with header.
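To make the idea concrete, the following standard-library sketch reads one column of a csv file as a list of filenames. It is an approximation under stated assumptions: `filenames_from_csv` is a hypothetical helper, the `has_header` flag is a simplified stand-in for the header argument above, and taking the first column by default is an assumption for this example.

```python
import csv
from pathlib import Path


def filenames_from_csv(path, csv_name, has_header=True, col=0):
    """Return one column of path/csv_name as a list of filenames (hypothetical helper)."""
    with open(Path(path) / csv_name, newline="") as f:
        rows = list(csv.reader(f))
    if has_header and rows:
        rows = rows[1:]  # drop the header row
    # Skip empty rows and keep only the requested column.
    return [row[col] for row in rows if row]
```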

In [ ]:

show_doc(ObjectCategoryList.get)

`get`[source][test]

get(i)

No tests found for get. To contribute a test please refer to this guide and this discussion.

Subclass if you want to customize how to create item i from self.items.

In [ ]:

show_doc(ImageList.get)

`get`[source][test]

get(i)

No tests found for get. To contribute a test please refer to this guide and this discussion.

Subclass if you want to customize how to create item i from self.items.

In [ ]:

show_doc(SegmentationLabelList.reconstruct)

`reconstruct`[source][test]

reconstruct(t:Tensor)

No tests found for reconstruct. To contribute a test please refer to this guide and this discussion.

Reconstruct one of the underlying items from its data t.

In [ ]:

show_doc(ImageImageList.show_xys)

`show_xys`[source][test]

show_xys(xs, ys, imgsize:int=4, figsize:Optional[Tuple[int, int]]=None, **kwargs)

No tests found for show_xys. To contribute a test please refer to this guide and this discussion.

Show the xs (inputs) and ys (targets) on a figure of figsize.

In [ ]:

show_doc(ImageImageList.show_xyzs)

`show_xyzs`[source][test]

show_xyzs(xs, ys, zs, imgsize:int=4, figsize:Optional[Tuple[int, int]]=None, **kwargs)

No tests found for show_xyzs. To contribute a test please refer to this guide and this discussion.

Show xs (inputs), ys (targets) and zs (predictions) on a figure of figsize.

In [ ]:

show_doc(ImageList.open)

`open`[source][test]

open(fn)

Tests found for open:

Some other tests where open is used:

pytest -sv tests/test_vision_data.py::test_download_images [source]
pytest -sv tests/test_vision_data.py::test_verify_image [source]
pytest -sv tests/test_vision_data.py::test_verify_images [source]
pytest -sv tests/test_vision_data.py::test_vision_pil2tensor [source]
pytest -sv tests/test_vision_data.py::test_vision_pil2tensor_16bit [source]

To run tests please refer to this guide.

Open the image in fn; subclass and override for custom behavior.

In [ ]:

show_doc(PointsItemList.analyze_pred)

`analyze_pred`[source][test]

analyze_pred(pred:Tensor)

No tests found for analyze_pred. To contribute a test please refer to this guide and this discussion.

Called on pred before reconstruct for additional preprocessing.

In [ ]:

show_doc(SegmentationLabelList.analyze_pred)

`analyze_pred`[source][test]

analyze_pred(pred, thresh:float=0.5)

No tests found for analyze_pred. To contribute a test please refer to this guide and this discussion.

Called on pred before reconstruct for additional preprocessing.
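The order of the two hooks, `analyze_pred` first and `reconstruct` second, can be illustrated with a toy multi-label example in plain Python. `ToyMultiLabelList` is a hypothetical class written for this sketch; it does not reproduce how any fastai label list actually processes predictions, and the way thresh is used here is an assumption.

```python
class ToyMultiLabelList:
    """Hypothetical stand-in showing the analyze_pred -> reconstruct order."""

    def __init__(self, classes):
        self.classes = list(classes)

    def analyze_pred(self, pred, thresh: float = 0.5):
        # Turn raw per-class scores into 0/1 decisions.
        return [1 if p >= thresh else 0 for p in pred]

    def reconstruct(self, t):
        # Rebuild a human-readable item from the processed data.
        return [c for c, keep in zip(self.classes, t) if keep]


labels = ToyMultiLabelList(["cat", "dog", "bird"])
# Raw scores go through analyze_pred before reconstruct sees them.
decoded = labels.reconstruct(labels.analyze_pred([0.9, 0.2, 0.5]))
```

Keeping the two steps separate means the thresholding rule can change without touching the code that rebuilds the item.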

In [ ]:

show_doc(PointsItemList.reconstruct)

`reconstruct`[source][test]

reconstruct(t:Tensor)

No tests found for reconstruct. To contribute a test please refer to this guide and this discussion.

Reconstruct one of the underlying items from its data t.

In [ ]:

show_doc(SegmentationLabelList.open)

`open`[source][test]

open(fn)

Tests found for open:

Some other tests where open is used:

pytest -sv tests/test_vision_data.py::test_download_images [source]
pytest -sv tests/test_vision_data.py::test_verify_image [source]
pytest -sv tests/test_vision_data.py::test_verify_images [source]
pytest -sv tests/test_vision_data.py::test_vision_pil2tensor [source]
pytest -sv tests/test_vision_data.py::test_vision_pil2tensor_16bit [source]

To run tests please refer to this guide.

Open the image in fn; subclass and override for custom behavior.

In [ ]:

show_doc(ImageList.reconstruct)

`reconstruct`[source][test]

reconstruct(t:Tensor)

No tests found for reconstruct. To contribute a test please refer to this guide and this discussion.

Reconstruct one of the underlying items from its data t.

In [ ]:

show_doc(resize_to)

`resize_to`[source][test]

resize_to(img, targ_sz:int, use_min:bool=False)

No tests found for resize_to. To contribute a test please refer to this guide and this discussion.

Size to resize to, to hit targ_sz at the same aspect ratio, in PIL coords (i.e. w*h).
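The arithmetic behind such a resize can be sketched as follows. This is an assumption about the intended behavior rather than a copy of the library code: `target_size` is a hypothetical helper that takes the width and height directly instead of an image, and it assumes use_min selects the shorter side as the one that should land on targ_sz.

```python
def target_size(w: int, h: int, targ_sz: int, use_min: bool = False):
    """Return (new_w, new_h) with the same aspect ratio (hypothetical helper)."""
    # Pick the side that should land on targ_sz: the longer one by default,
    # the shorter one when use_min is True.
    ref = min(w, h) if use_min else max(w, h)
    ratio = targ_sz / ref
    # Truncate to whole pixels, keeping the PIL (width, height) order.
    return int(w * ratio), int(h * ratio)
```

For a 400x200 image and a target of 100, the default scales the longer side to 100, while `use_min=True` scales the shorter side to 100 instead.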

In [ ]:

show_doc(ObjectCategoryList.reconstruct)

`reconstruct`[source][test]

reconstruct(t, x)

No tests found for reconstruct. To contribute a test please refer to this guide and this discussion.

Reconstruct one of the underlying items from its data t.

In [ ]:

show_doc(PointsLabelList.reconstruct)

`reconstruct`[source][test]

reconstruct(t, x)

No tests found for reconstruct. To contribute a test please refer to this guide and this discussion.

Reconstruct one of the underlying items from its data t.

In [ ]:

show_doc(PointsLabelList.analyze_pred)

`analyze_pred`[source][test]

analyze_pred(pred, thresh:float=0.5)

No tests found for analyze_pred. To contribute a test please refer to this guide and this discussion.

Called on pred before reconstruct for additional preprocessing.

In [ ]:

show_doc(PointsLabelList.get)

`get`[source][test]

get(i)

No tests found for get. To contribute a test please refer to this guide and this discussion.

Subclass if you want to customize how to create item i from self.items.

New Methods - Please document or move to the undocumented section¶

In [ ]:

show_doc(ObjectCategoryList.analyze_pred)

`analyze_pred`[source][test]

analyze_pred(pred)

No tests found for analyze_pred. To contribute a test please refer to this guide and this discussion.

Called on pred before reconstruct for additional preprocessing.

