Augmentation can be a slow process, especially when working with large images and when combining many different augmentation techniques. Take a look at the performance documentation for some lower bounds on the expected single core performance using outdated hardware.
One way to improve performance is to augment simultaneously on multiple CPU cores. imgaug
offers a native system to do that. It is based on roughly the following steps:

1. Split the dataset into batches. Each batch contains one or more images plus the data associated with them (e.g. keypoints or bounding boxes).
2. Start one or more child processes, ideally one per CPU core.
3. Distribute the batches among the child processes.
4. Let each child process augment the batches it received.
5. Collect the augmented batches from the child processes.
A few important points can be derived from these steps. First, the data has to be split into batches. Second, combining all data into one batch and using multicore augmentation is pointless, as each individual batch is augmented by exactly one core. Third, using multicore augmentation for small amounts of data can also be pointless as starting the child processes might take up more time than simply augmenting the dataset on a single CPU core. (Though, you can re-use the child processes between epochs, so it might still pay off.)
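To make the batching step concrete, here is a minimal sketch of how a flat list of images could be split into imgaug.Batch instances; the helper to_batches() and the batch size are made up for this illustration.

import imgaug as ia

def to_batches(images, batch_size):
    # group the images into consecutive chunks and wrap each chunk in a Batch
    return [
        ia.Batch(images=images[i:i + batch_size])
        for i in range(0, len(images), batch_size)
    ]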
Important: imgaug
offers multicore features and it is recommended to use them for multicore augmentation. It is not recommended to execute imgaug
in a custom-made multicore routine using e.g. Python's multiprocessing
library or by using the multiprocessing support of some deep learning libraries. Doing so runs a major risk of accidentally applying the same augmentations in each child worker (just to different images). If you still decide to build a custom implementation, make sure to call imgaug.seed(value)
and augmenter.reseed(value)
with different seeds per child process. Generating debug outputs per child process is then also recommended. Messing this up is easy and hard to even notice.
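If you nevertheless build such a custom routine, the sketch below shows the kind of per-worker seeding described above, using Python's multiprocessing directly; the worker function, the seed values and the dummy images are made up for this illustration.

import multiprocessing
import numpy as np
import imgaug as ia
from imgaug import augmenters as iaa

def augment_in_child(args):
    # each work item carries its own seed, so no two items share random states
    child_seed, images = args
    ia.seed(child_seed)        # seed the library's global random state
    aug = iaa.Fliplr(0.5)
    aug.reseed(child_seed)     # seed the augmenter itself
    return aug.augment_images(images)

if __name__ == "__main__":
    images = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]
    work = [(1000 + i, images) for i in range(8)]  # distinct seed per work item
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(augment_in_child, work)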
The easiest way to do multicore augmentation in imgaug
is to call augment_batches(..., background=True)
. It works similarly to e.g. augment_images()
. The difference is that it expects a list of imgaug.Batch
instances. Each of these instances contains the data of a batch, e.g. images or bounding boxes. Creating a batch is trivial and can be done via e.g. batch = imgaug.Batch(images=<list of numpy arrays>, bounding_boxes=<list of imgaug.BoundingBoxesOnImage>)
. Another difference to augment_images()
is that augment_batches()
returns a generator, which continuously yields augmented batches as they are received from the child processes. The final (and important) difference is that augment_batches()
currently does not use the random state set in the augmenter, but rather picks a new one. That is because otherwise all child processes would apply the same augmentations (just to different images). If you need more control over the random state use pool()
or imgaug.multicore.Pool
instead (see further below).
Let's try to use augment_batches()
. First, we define some example data.
import numpy as np
import imgaug as ia
%matplotlib inline
BATCH_SIZE = 16
NB_BATCHES = 100
image = ia.quokka_square(size=(256, 256))
images = [np.copy(image) for _ in range(BATCH_SIZE)]
Now we combine the images into imgaug.Batch
instances:
batches = [ia.Batch(images=images) for _ in range(NB_BATCHES)]
Our augmentation sequence contains PiecewiseAffine
, which tends to be a very slow augmenter. We further slow it down by using a denser grid of points on the image. Each such point will lead to more local affine transformations being applied.
from imgaug import augmenters as iaa
aug = iaa.Sequential([
    iaa.PiecewiseAffine(scale=0.05, nb_cols=6, nb_rows=6),  # very slow
    iaa.Fliplr(0.5),  # very fast
    iaa.CropAndPad(px=(-10, 10))  # very fast
])
Now we augment the generated batches. Let's first augment without multicore augmentation to see how long a single CPU core needs. augment_batches()
returns a generator of imgaug.Batch
instances. We can then access the augmented images via the attribute imgaug.Batch.images_aug
.
import time
time_start = time.time()
batches_aug = list(aug.augment_batches(batches, background=False)) # list() converts generator to list
time_end = time.time()
print("Augmentation done in %.2fs" % (time_end - time_start,))
ia.imshow(batches_aug[0].images_aug[0])
Augmentation done in 134.91s
Roughly 130 seconds for 100 batches, each containing 16 images of size 256x256, i.e. about 0.08s per image. That is not very fast; a GPU would most likely consume images faster than a single core can augment them here. Let's try the same thing with multicore augmentation.
time_start = time.time()
batches_aug = list(aug.augment_batches(batches, background=True)) # background=True for multicore aug
time_end = time.time()
print("Augmentation done in %.2fs" % (time_end - time_start,))
ia.imshow(batches_aug[0].images_aug[0])
Augmentation done in 28.07s
Down to less than 30 seconds, roughly a fifth of the single-core time. That is already much better. Note that this is on an outdated CPU with 4 cores and 8 threads. A modern 8-core CPU should benefit even more.
The example above only showed how to augment images. Often enough, you will also want to augment e.g. keypoints or bounding boxes on these. That is achieved by a trivial change when creating imgaug.Batch
objects. You do not have to worry about random states or stochastic/deterministic mode in this case. imgaug
will automatically handle that and make sure that the augmentations between images and associated data align.
Let's extend our previous example data with some keypoints.
BATCH_SIZE = 16
NB_BATCHES = 100
image = ia.quokka(size=0.2)
images = [np.copy(image) for _ in range(BATCH_SIZE)]
keypoints = ia.quokka_keypoints(size=0.2)
keypoints = [keypoints.deepcopy() for _ in range(BATCH_SIZE)]
batches = [ia.Batch(images=images, keypoints=keypoints) for _ in range(NB_BATCHES)]
And now augment the data in the same way as before:
time_start = time.time()
batches_aug = list(aug.augment_batches(batches, background=True)) # background=True for multicore aug
time_end = time.time()
print("Augmentation done in %.2fs" % (time_end - time_start,))
ia.imshow(
batches_aug[0].keypoints_aug[0].draw_on_image(
batches_aug[0].images_aug[0]
)
)
Augmentation done in 83.81s
And that's it. Simply add keypoints=<list of imgaug.KeypointsOnImage>
when instantiating an imgaug.Batch()
instance and the rest is handled by the library. The same can be done for bounding boxes (bounding_boxes=<list of imgaug.BoundingBoxesOnImage>
), heatmaps (heatmaps=<list of imgaug.HeatmapsOnImage>
) or segmentation maps (segmentation_maps=<list of SegmentationMapOnImage>
). Just make sure that the lists have the same length and entries with the same index actually belong to each other (e.g. image 0014059.jpg
and the keypoints for image 0014059.jpg
).
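As a quick sketch, a batch that pairs each image with its bounding boxes could look like the following; the box coordinates are invented purely for illustration.

import imgaug as ia

images = [ia.quokka(size=0.2) for _ in range(4)]
bbs = [
    ia.BoundingBoxesOnImage(
        [ia.BoundingBox(x1=10, y1=10, x2=60, y2=60)],  # made-up coordinates
        shape=img.shape
    )
    for img in images
]
# entry i of images and entry i of bbs belong to the same example
batch = ia.Batch(images=images, bounding_boxes=bbs)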
You might have noticed that the augmentation time here went up from ~30 seconds to ~80 seconds -- just by adding keypoints. That is because PiecewiseAffine
uses an image-based method for keypoint augmentation, because transforming the keypoints directly as coordinates would be too inaccurate. It is currently the slowest keypoint augmenter in the library (so avoid using PiecewiseAffine
when augmenting keypoints or bounding boxes).
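If you need geometric distortion together with keypoints and speed matters, one option is to replace PiecewiseAffine with a cheaper augmenter. The sketch below swaps it for Affine; the parameter ranges are made up and not equivalent to the PiecewiseAffine settings used above.

aug_fast = iaa.Sequential([
    iaa.Affine(scale=(0.95, 1.05), rotate=(-5, 5)),  # much cheaper than PiecewiseAffine
    iaa.Fliplr(0.5),  # very fast
    iaa.CropAndPad(px=(-10, 10))  # very fast
])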
augment_batches()
is easy to use, but does not offer much customization. If you want to e.g. control the number of used CPU cores or the random number seed, augmenter.pool()
is a simple alternative (and it is the backend that augment_batches()
uses). The example below augments again the previously defined batches, this time with pool()
. We configure the pool to use all CPU cores except one (processes=-1
), restart child processes after 20 tasks (maxtasksperchild=20
) and to start with a random number seed of 1
. The argument maxtasksperchild
can be useful if you deal with memory leaks that lead to more and more memory consumption over time. If you don't have this problem, there is no reason to use the argument (and it does cost performance to use it).
with aug.pool(processes=-1, maxtasksperchild=20, seed=1) as pool:
    batches_aug = pool.map_batches(batches)
ia.imshow(batches_aug[0].images_aug[0])
Note that we called map_batches()
here exactly once to augment the input batches. In practice, we can call that command many times for each generated pool using different input batches -- and it is recommended to do so, because creating a new pool requires respawning the child processes, which does cost some time.
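As a sketch, one could keep a single pool alive and feed it several chunks of batches, one map_batches() call per chunk; the split of the batches list into two chunks below is arbitrary.

with aug.pool(processes=-1, maxtasksperchild=20) as pool:
    for batches_chunk in [batches[:50], batches[50:]]:
        batches_aug = pool.map_batches(batches_chunk)
        # do something with batches_aug here, e.g. feed it to the training loop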
augmenter.pool()
is a shortcut that creates an instance of imgaug.multicore.Pool
, which in turn is a wrapper around Python's multiprocessing.Pool
. The wrapper deals mainly with the correct management of random states between child processes. The below example shows the usage of imgaug.multicore.Pool
, using the same seed as in the augmenter.pool()
example above and hence generating the same output.
from imgaug import multicore
with multicore.Pool(aug, processes=-1, maxtasksperchild=20, seed=1) as pool:
    batches_aug = pool.map_batches(batches)
ia.imshow(batches_aug[0].images_aug[0])
The two previous examples showed how to use lists with imgaug
's Pool. For large datasets, using generators can be more appropriate to avoid having to store the whole dataset in memory. This is trivially done by replacing map_batches(<list>)
with imap_batches(<generator>)
. The output of that function is also a generator.
def create_generator(lst):
    for list_entry in lst:
        yield list_entry

my_generator = create_generator(batches)

with aug.pool(processes=-1, maxtasksperchild=20, seed=1) as pool:
    batches_aug = pool.imap_batches(my_generator)

    for i, batch_aug in enumerate(batches_aug):
        if i == 0:
            ia.imshow(batch_aug.images_aug[0])
        # do something else with the batch here
So to use multicore augmentation with imgaug
just do the following:
1. Convert your data to imgaug.Batch instances. Make sure that corresponding data has the same list index within the batch, e.g. images and their corresponding keypoints.
2. Call augmenter.augment_batches(batches, background=True). This returns a generator.
3. Use augmenter.pool([processes], [maxtasksperchild], [seed]) if you need more control or want to use a generator as input. Call pool.map_batches(list) or pool.imap_batches(generator) on the pool.