Augmentation can be a slow process, especially when working with large images and when combining many different augmentation techniques. Take a look at the performance documentation for some lower bounds on the expected single core performance using outdated hardware.
One way to improve performance is to augment simultaneously on multiple CPU cores. imgaug
offers a native system to do that. It is based on roughly the following steps:

1. Split the dataset into batches. Each batch contains one or more images plus the data associated with them (e.g. keypoints or bounding boxes).
2. Start one or more child processes, ideally one per CPU core.
3. Distribute the batches among the child processes.
4. Let each child process augment the batches it received.
5. Collect the augmented batches from the child processes.
A few important points can be derived from these steps. First, the data has to be split into batches. Second, combining all data into one batch and using multicore augmentation is pointless, as each individual batch is augmented by exactly one core. Third, using multicore augmentation for small amounts of data can also be pointless as starting the child processes might take up more time than simply augmenting the dataset on a single CPU core. (Though, you can re-use the child processes between epochs, so it might still pay off.)
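To make the batching step concrete, here is a minimal sketch of how a flat list of images could be split into imgaug.Batch instances; the helper to_batches() and the batch size are made up for this illustration.

import imgaug as ia

def to_batches(images, batch_size):
    # group the images into consecutive chunks and wrap each chunk in a Batch
    return [
        ia.Batch(images=images[i:i + batch_size])
        for i in range(0, len(images), batch_size)
    ]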
Important: imgaug
offers multicore features and it is recommended to use them for multicore augmentation. It is not recommended to execute imgaug
in a custom-made multicore routine using e.g. Python's multiprocessing
library or by using the multiprocessing support of some deep learning libraries. Doing so runs a major risk of accidentally applying the same augmentations in each child worker (just to different images). If you still decide to build a custom implementation, make sure to call imgaug.seed(value)
and augmenter.reseed(value)
with different seeds per child process. Generating debug outputs per child process is then also recommended. Messing this up is easy and hard to even notice.
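If you nevertheless build such a custom routine, the sketch below shows the kind of per-worker seeding described above, using Python's multiprocessing directly; the worker function, the seed values and the dummy images are made up for this illustration.

import multiprocessing
import numpy as np
import imgaug as ia
from imgaug import augmenters as iaa

def augment_in_child(args):
    # each work item carries its own seed, so no two items share random states
    child_seed, images = args
    ia.seed(child_seed)        # seed the library's global random state
    aug = iaa.Fliplr(0.5)
    aug.reseed(child_seed)     # seed the augmenter itself
    return aug.augment_images(images)

if __name__ == "__main__":
    images = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]
    work = [(1000 + i, images) for i in range(8)]  # distinct seed per work item
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(augment_in_child, work)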
The easiest way to do multicore augmentation in imgaug
is to call augment_batches(..., background=True)
. It works similarly to e.g. augment_images()
. The difference is that it expects a list of imgaug.Batch
instances. Each of these instances contains the data of a batch, e.g. images or bounding boxes. Creating a batch is trivial and can be done via e.g. batch = imgaug.Batch(images=<list of numpy arrays>, bounding_boxes=<list of imgaug.BoundingBoxesOnImage>)
. Another difference to augment_images()
is that augment_batches()
returns a generator, which continuously yields augmented batches as they are received from the child processes. The final (and important) difference is that augment_batches()
currently does not use the random state set in the augmenter, but rather picks a new one. That is because otherwise all child processes would apply the same augmentations (just to different images). If you need more control over the random state use pool()
or imgaug.multicore.Pool
instead (see further below).
Let's try to use augment_batches()
. First, we define some example data.
import numpy as np
import imgaug as ia
%matplotlib inline
BATCH_SIZE = 16
NB_BATCHES = 100
image = ia.quokka_square(size=(256, 256))
images = [np.copy(image) for _ in range(BATCH_SIZE)]
Now we combine the images into imgaug.Batch
instances:
batches = [ia.Batch(images=images) for _ in range(NB_BATCHES)]
Our augmentation sequence contains PiecewiseAffine
, which tends to be a very slow augmenter. We further slow it down by using a denser grid of points on the image. Each such point will lead to more local affine transformations being applied.
from imgaug import augmenters as iaa
aug = iaa.Sequential([
    iaa.PiecewiseAffine(scale=0.05, nb_cols=6, nb_rows=6),  # very slow
    iaa.Fliplr(0.5),  # very fast
    iaa.CropAndPad(px=(-10, 10))  # very fast
])
Now we augment the generated batches. Let's first augment without multicore augmentation to see how long a single CPU core needs. augment_batches()
returns a generator of imgaug.Batch
instances. We can then access the augmented images via the attribute imgaug.Batch.images_aug
.
import time
time_start = time.time()
batches_aug = list(aug.augment_batches(batches, background=False)) # list() converts generator to list
time_end = time.time()
print("Augmentation done in %.2fs" % (time_end - time_start,))
ia.imshow(batches_aug[0].images_aug[0])
Augmentation done in 134.91s
Roughly 130 seconds for 100 batches, each containing 16 images of size 256x256, i.e. about 0.08s per image. That is not very fast; a GPU would most likely consume images faster than a single core can augment them here. Let's try the same thing with multicore augmentation.
time_start = time.time()
batches_aug = list(aug.augment_batches(batches, background=True)) # background=True for multicore aug
time_end = time.time()
print("Augmentation done in %.2fs" % (time_end - time_start,))
ia.imshow(batches_aug[0].images_aug[0])
Augmentation done in 28.07s
Down to less than 30 seconds, roughly a fifth of the single-core time. That is already much better. Note that this is on an outdated CPU with 4 cores and 8 threads. A modern 8-core CPU should benefit even more.
The example above only showed how to augment images. Often enough, you will also want to augment e.g. keypoints or bounding boxes on these. That is achieved by a trivial change when creating imgaug.Batch
objects. You do not have to worry about random states or stochastic/deterministic mode in this case. imgaug
will automatically handle that and make sure that the augmentations between images and associated data align.
Let's extend our previous example data with some keypoints.
BATCH_SIZE = 16
NB_BATCHES = 100
image = ia.quokka(size=0.2)
images = [np.copy(image) for _ in range(BATCH_SIZE)]
keypoints = ia.quokka_keypoints(size=0.2)
keypoints = [keypoints.deepcopy() for _ in range(BATCH_SIZE)]
batches = [ia.Batch(images=images, keypoints=keypoints) for _ in range(NB_BATCHES)]
And now augment the data in the same way as before:
time_start = time.time()
batches_aug = list(aug.augment_batches(batches, background=True)) # background=True for multicore aug
time_end = time.time()
print("Augmentation done in %.2fs" % (time_end - time_start,))
ia.imshow(
batches_aug[0].keypoints_aug[0].draw_on_image(
batches_aug[0].images_aug[0]
)
)
Augmentation done in 83.81s
And that's it. Simply add keypoints=<list of imgaug.KeypointsOnImage>
when instantiating an imgaug.Batch()
instance and the rest is handled by the library. The same can be done for bounding boxes (bounding_boxes=<list of imgaug.BoundingBoxesOnImage>
), heatmaps (heatmaps=<list of imgaug.HeatmapsOnImage>
) or segmentation maps (segmentation_maps=<list of SegmentationMapOnImage>
). Just make sure that the lists have the same length and entries with the same index actually belong to each other (e.g. image 0014059.jpg
and the keypoints for image 0014059.jpg
).
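As a quick sketch, a batch that pairs each image with its bounding boxes could look like the following; the box coordinates are invented purely for illustration.

import imgaug as ia

images = [ia.quokka(size=0.2) for _ in range(4)]
bbs = [
    ia.BoundingBoxesOnImage(
        [ia.BoundingBox(x1=10, y1=10, x2=60, y2=60)],  # made-up coordinates
        shape=img.shape
    )
    for img in images
]
# entry i of images and entry i of bbs belong to the same example
batch = ia.Batch(images=images, bounding_boxes=bbs)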
You might have noticed that the augmentation time here went up from ~30 seconds to ~80 seconds -- just by adding keypoints. That is because PiecewiseAffine
uses an image-based method for keypoint augmentation, because transforming the keypoints directly as coordinates would be too inaccurate. It is currently the slowest keypoint augmenter in the library (so avoid using PiecewiseAffine
when augmenting keypoints or bounding boxes).
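If you need geometric distortion together with keypoints and speed matters, one option is to replace PiecewiseAffine with a cheaper augmenter. The sketch below swaps it for Affine; the parameter ranges are made up and not equivalent to the PiecewiseAffine settings used above.

aug_fast = iaa.Sequential([
    iaa.Affine(scale=(0.95, 1.05), rotate=(-5, 5)),  # much cheaper than PiecewiseAffine
    iaa.Fliplr(0.5),  # very fast
    iaa.CropAndPad(px=(-10, 10))  # very fast
])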
augment_batches()
is easy to use, but does not offer much customization. If you want to e.g. control the number of used CPU cores or the random number seed, augmenter.pool()
is a simple alternative (and it is the backend that augment_batches()
uses). The example below augments again the previously defined batches, this time with pool()
. We configure the pool to use all CPU cores except one (processes=-1
), restart child processes after 20 tasks (maxtasksperchild=20
) and to start with a random number seed of 1
. The argument maxtasksperchild
can be useful if you deal with memory leaks that lead to more and more memory consumption over time. If you don't have this problem, there is no reason to use the argument (and it does cost performance to use it).
with aug.pool(processes=-1, maxtasksperchild=20, seed=1) as pool:
    batches_aug = pool.map_batches(batches)
ia.imshow(batches_aug[0].images_aug[0])
Note that we called map_batches()
here exactly once to augment the input batches. In practice, we can call that command many times for each generated pool using different input batches -- and it is recommended to do so, because creating a new pool requires respawning the child processes, which does cost some time.
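As a sketch, one could keep a single pool alive and feed it several chunks of batches, one map_batches() call per chunk; the split of the batches list into two chunks below is arbitrary.

with aug.pool(processes=-1, maxtasksperchild=20) as pool:
    for batches_chunk in [batches[:50], batches[50:]]:
        batches_aug = pool.map_batches(batches_chunk)
        # do something with batches_aug here, e.g. feed it to the training loop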
augmenter.pool()
is a shortcut that creates an instance of imgaug.multicore.Pool
, which in turn is a wrapper around Python's multiprocessing.Pool
. The wrapper deals mainly with the correct management of random states between child processes. The below example shows the usage of imgaug.multicore.Pool
, using the same seed as in the augmenter.pool()
example above and hence generating the same output.
from imgaug import multicore
with multicore.Pool(aug, processes=-1, maxtasksperchild=20, seed=1) as pool:
    batches_aug = pool.map_batches(batches)
ia.imshow(batches_aug[0].images_aug[0])
The two previous examples showed how to use lists with imgaug
's Pool. For large datasets, using generators can be more appropriate to avoid having to store the whole dataset in memory. This is trivially done by replacing map_batches(<list>)
with imap_batches(<generator>)
. The output of that function is also a generator.
def create_generator(lst):
    for list_entry in lst:
        yield list_entry

my_generator = create_generator(batches)

with aug.pool(processes=-1, maxtasksperchild=20, seed=1) as pool:
    batches_aug = pool.imap_batches(my_generator)

    for i, batch_aug in enumerate(batches_aug):
        if i == 0:
            ia.imshow(batch_aug.images_aug[0])
        # do something else with the batch here
So to use multicore augmentation with imgaug
just do the following:
1. Convert your data to imgaug.Batch instances. Make sure that corresponding data has the same list index within the batch, e.g. images and their corresponding keypoints.
2. Call augmenter.augment_batches(batches, background=True). This returns a generator.
3. Use augmenter.pool([processes], [maxtasksperchild], [seed]) if you need more control or want to use a generator as input. Call pool.map_batches(list) or pool.imap_batches(generator) on the pool.