#!/usr/bin/env python
# coding: utf-8

# # Faster Feature Location Through Parallel Computation
#
# Feature-finding can easily be parallelized: each frame is an independent task, and the tasks can be divided among the multiple CPU cores in most modern computers. Instead of running in a single process as usual, your code is spread across multiple "worker" processes, each running on its own CPU core.
#
# First, let's set up the movie to track:

# In[1]:


import pims
import trackpy as tp

@pims.pipeline
def gray(image):
    return image[:, :, 1]

frames = gray(pims.ImageSequence('../sample_data/bulk_water/*.png'))


# In[2]:


tp.quiet()  # Disabling progress reports makes this a fairer comparison


# # Using trackpy.batch
#
# Beginning with trackpy v0.4.2, use the `processes` argument to have `trackpy.batch` run on multiple CPU cores at once (using Python's built-in `multiprocessing` module). Give the number of cores you want to use, or specify `'auto'` to let trackpy detect how many cores your computer has.
#
# Let's compare the time required to process the first 100 frames:

# In[6]:


get_ipython().run_cell_magic('timeit', '', "features = tp.batch(frames[:100], 13, invert=True, processes='auto')\n")


# For comparison, here's the same thing running in a single process. This was run on a laptop with only 2 cores, so we should expect `batch` to take roughly twice as long as the parallel version:

# In[8]:


get_ipython().run_cell_magic('timeit', '', 'features = tp.batch(frames[:100], 13, invert=True)\n')


# # Using IPython Parallel
#
# Using [IPython parallel](https://github.com/ipython/ipyparallel) is a little more involved, but it gives you a lot of flexibility if you need to go beyond `batch`, for example by having the parallel workers run your own custom image processing. It also works with all versions of trackpy.
#
# ## Install ipyparallel and start a cluster
#
# As of IPython 6.2 (November 2017), IPython parallel is a separate package.
# If you are not using a comprehensive distribution like Anaconda, you may need to install this package at the command prompt using `pip install ipyparallel` or `conda install ipyparallel`.
#
# It is simplest to start a cluster on the CPUs of your local machine. To start a cluster, go to a terminal and type:
# ```
# ipcluster start
# ```
#
# This automatically uses all available CPU cores, but you can also use the `-n` option to specify how many workers to start. Now you are running a cluster — it's that easy! More information on IPython parallel is available in [the IPython parallel documentation](http://ipyparallel.readthedocs.io/en/latest/intro.html).

# In[10]:


from ipyparallel import Client
client = Client()
view = client.load_balanced_view()


# We can see that there are four cores available.

# In[11]:


client[:]


# Use a little magic, ``%%px``, to import trackpy on all cores.

# In[12]:


get_ipython().run_cell_magic('px', '', 'import trackpy as tp\ntp.quiet()\n')


# ## Use the workers to locate features
#
# Define a function from ``locate`` with all the parameters specified, so that the function's only argument is the image to be analyzed. We can map this function directly onto our collection of images. (This is called "currying" the function, hence the choice of name.)

# In[13]:


curried_locate = lambda image: tp.locate(image, 13, invert=True)


# In[14]:


view.map(curried_locate, frames[:4])  # Optionally, prime each engine: make it set up numba.


# Compare the time it takes to locate features in the first 100 images with and without parallelization.
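As an aside on the currying step: `functools.partial` achieves the same thing as the lambda and, unlike a lambda, can be pickled, which matters if you ever reuse the curried function with process-based workers. A minimal, self-contained sketch, with a hypothetical `locate_like` function standing in for `tp.locate`:

```python
from functools import partial

def locate_like(image, diameter, invert=False):
    # Hypothetical stand-in for tp.locate: just report the parameters it
    # would use, plus the size of the "image" it received.
    return {'diameter': diameter, 'invert': invert, 'n_pixels': len(image)}

# Fix every argument except the image, exactly as the lambda does.
curried = partial(locate_like, diameter=13, invert=True)

# The curried function takes a single argument, so it maps directly
# over a collection of (fake) images.
results = list(map(curried, [[0] * 100, [0] * 200]))
print(results)
```

Either form works with `view.map`; the curried function's single-argument signature is what lets the load-balanced view hand one image to each worker.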
# In[15]:


get_ipython().run_cell_magic('timeit', '', 'amr = view.map_async(curried_locate, frames[:100])\namr.wait_interactive()\nresults = amr.get()\n')


# In[16]:


get_ipython().run_cell_magic('timeit', '', 'serial_result = list(map(curried_locate, frames[:100]))\n')


# Finally, if we want to get output similar to `batch`, we collect the results into a single DataFrame:

# In[17]:


import pandas as pd

amr = view.map_async(curried_locate, frames[:100])
amr.wait_interactive()
results = amr.get()

features_ipy = pd.concat(results, ignore_index=True)
features_ipy.head()
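The `pd.concat(results, ignore_index=True)` step deserves a closer look: each worker returns one DataFrame per frame, and `ignore_index=True` renumbers the rows so the combined table gets a clean 0..N-1 index instead of repeating each frame's own row numbers. A self-contained sketch with fabricated per-frame tables (the column names mimic `locate` output for illustration):

```python
import pandas as pd

# Fabricated stand-ins for per-frame locate() results: one small
# DataFrame per frame, each with its own 0-based row index.
results = [
    pd.DataFrame({'x': [10.2, 55.1], 'y': [3.7, 41.0], 'frame': [0, 0]}),
    pd.DataFrame({'x': [11.0], 'y': [4.1], 'frame': [1]}),
]

# Stack the per-frame tables; ignore_index=True assigns a fresh
# 0..N-1 index to the combined table.
features = pd.concat(results, ignore_index=True)
print(features)
```

Without `ignore_index=True`, the combined table would carry duplicate index values (0, 1, 0, ...), which trips up later indexing and linking steps.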