import cv2
Low-level code written in Python, as looping in big arrays, can be slow, mainly because Python is dynamically typed and interpreted. However, in the scientific computing environment described above, this is rarely a problem: the OpenCV interface just access optimized C/C++ code, and most of the software in the SciPy Stack relies in a base of numerical software implemented in C, C++ and Fortran, including the efficient NumPy arrays.
But in the few situations where a low-level looping must be implemented (if the task cannot be implemented using NumPy capabilities) or the functionality of a external library is needed, Cython raises as an alternative. Cython is a static compiler capable of working in a super-set of the Python language that supports C-like static type declarations. It compiles Python code to C, further producing a Python module that can be imported and used from the interpreter. As noted by Behnel et al., the key idea behind Cython is the Pareto Principle, also known as the "80/20 rule": 80% of the run-time is spent in 20% of the source code. Cython’s goal is to speed up the critical parts of the code while avoiding too much overhead on coding by the programmer.
Other performance issues can be addressed by parallelization. IPython.parallel is a powerful architecture for parallel and distributed computing supporting different styles of parallelism, such as single program, multiple data (SPMD) and multiple program, multiple data (MPMD). Parallel applications can be easily developed, executed and monitored interactively from the IPython shell. Computer vision tasks can involve large sets of images or big point clouds, but many times the parallelization of these tasks is trivial and, using IPython.parallel, implemented in a few lines of code. The dynamic load balancing feature allows the use of all the available processing threads in the computer or all the processing power available in a cluster, but keeping the interactive computing environment free from large amounts of specific code for parallel computing.
In this example, SIFT descriptors of a reference image $I_1$ are computed. Then, descriptors are extracted for every image $I_n$ in a list, and the matches to $I_1$ descriptors are computed. The processing of the list is done in parallel, using all the available cores in the user’s machine.
Let $D_1$ be an array containing the descriptors of $I_1$.
T1 = cv2.imread('data/templeRing/templeR0001.png', cv2.IMREAD_GRAYSCALE)
sift = cv2.xfeatures2d.SIFT_create(nfeatures=5000)
_, D_1 = sift.detectAndCompute(T1, mask=None)
In a system shell, an IPython cluster for parallel computing is started using:
ipcluster start --n=8
Eight nodes are started (in this example, the number of clusters is selected based on the number of cores available in the user’s machine).
Note: if ipcluster
is not available, it can be installed using, for example, pip
:
$ pip install ipyparallel
Back to the IPython shell, the next step is the creation of a Client
object. A LoadBalancedView
object is created to provide a load-balanced parallel execution:
from ipyparallel import Client
rc = Client()
lview = rc.load_balanced_view()
@lview.parallel
defines a parallel, load-balanced funtionget_num_matches
will:BFMatcher
andNext, a Python decorator is used to define a parallel function that computes the descriptors and the matches (the decorator starts with a "@
" symbol). The function below takes a path to an image in the file system, computes the SIFT features and uses OpenCV’s BFMatcher
to get the matches to $D_1$, returning the number of matches found and the image’s path:
@lview.parallel()
def get_num_matches(arg):
fname, D_src = arg
import cv2
frame = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
print frame.shape
sift = cv2.SIFT(nfeatures=5000)
_, D = sift.detectAndCompute(frame, mask=None)
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = matcher.match(D_src, D)
return fname, len(matches)
map
function starts the parallelized callIPython capability to access the system’s shell is employed to list all the files in a directory and store the file paths in a list of strings, fnames
. Finally, the map
function calls get_num_matches
to every string in the fnames
list, automatically performing the load balance on the nodes:
fnames = !ls data/templeRing/temple*.png
args = [(fname, D_1) for fname in fnames]
async_res = get_num_matches.map(args)
--------------------------------------------------------------------------- BadYieldError Traceback (most recent call last) <ipython-input-10-0ade8fe7f65c> in <module>() 2 3 args = [(fname, D_1) for fname in fnames] ----> 4 async_res = get_num_matches.map(args) /usr/local/lib/python2.7/dist-packages/ipyparallel/client/remotefunction.pyc in map(self, *sequences) 283 and mismatched sequence lengths will be padded with None. 284 """ --> 285 return self(*sequences, __ipp_mapping=True) 286 287 __all__ = ['remote', 'parallel', 'RemoteFunction', 'ParallelFunction'] <decorator-gen-128> in __call__(self, *sequences, **kwargs) /usr/local/lib/python2.7/dist-packages/ipyparallel/client/remotefunction.pyc in sync_view_results(f, self, *args, **kwargs) 77 view._in_sync_results = True 78 try: ---> 79 ret = f(self, *args, **kwargs) 80 finally: 81 view._in_sync_results = False /usr/local/lib/python2.7/dist-packages/ipyparallel/client/remotefunction.pyc in __call__(self, *sequences, **kwargs) 257 view = self.view if balanced else client[t] 258 with view.temp_flags(block=False, **self.flags): --> 259 ar = view.apply(f, *args) 260 ar.owner = False 261 /usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in apply(self, f, *args, **kwargs) 209 ``f(*args, **kwargs)``. 210 """ --> 211 return self._really_apply(f, args, kwargs) 212 213 def apply_async(self, f, *args, **kwargs): <decorator-gen-144> in _really_apply(self, f, args, kwargs, block, track, after, follow, timeout, targets, retries) /usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in sync_results(f, self, *args, **kwargs) 45 """sync relevant results from self.client to our results attribute.""" 46 if self._in_sync_results: ---> 47 return f(self, *args, **kwargs) 48 self._in_sync_results = True 49 try: <decorator-gen-143> in _really_apply(self, f, args, kwargs, block, track, after, follow, timeout, targets, retries) /usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in save_ids(f, self, *args, **kwargs) 33 n_previous = len(self.client.history) 34 try: ---> 35 ret = f(self, *args, **kwargs) 36 finally: 37 nmsgs = len(self.client.history) - n_previous /usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in _really_apply(self, f, args, kwargs, block, track, after, follow, timeout, targets, retries) 1035 1036 future = self.client.send_apply_request(self._socket, f, args, kwargs, track=track, -> 1037 metadata=metadata) 1038 1039 ar = AsyncResult(self.client, future, fname=getname(f), /usr/local/lib/python2.7/dist-packages/ipyparallel/client/client.pyc in send_apply_request(self, socket, f, args, kwargs, metadata, track, ident) 1397 1398 future = self._send(socket, "apply_request", buffers=bufs, ident=ident, -> 1399 metadata=metadata, track=track) 1400 1401 msg_id = future.msg_id /usr/local/lib/python2.7/dist-packages/ipyparallel/client/client.pyc in _send(self, socket, msg_type, content, parent, ident, buffers, track, header, metadata) 967 self.metadata.pop(msg_id, None) 968 --> 969 multi_future(futures).add_done_callback(cleanup) 970 971 def _really_send(): /usr/local/lib/python2.7/dist-packages/tornado/gen.pyc in multi_future(children, quiet_exceptions) 776 else: 777 keys = None --> 778 children = list(map(convert_yielded, children)) 779 assert all(is_future(i) for i in children) 780 unfinished_children = set(children) /usr/local/lib/python2.7/dist-packages/singledispatch.pyc in wrapper(*args, **kw) 208 209 def wrapper(*args, **kw): --> 210 return dispatch(args[0].__class__)(*args, **kw) 211 212 registry[object] = func /usr/local/lib/python2.7/dist-packages/tornado/gen.pyc in convert_yielded(yielded) 1227 return _wrap_awaitable(yielded) 1228 else: -> 1229 raise BadYieldError("yielded unknown object %r" % (yielded,)) 1230 1231 if singledispatch is not None: BadYieldError: yielded unknown object <Future at 0x7f10477efb50 state=pending>
for f, n in async_res:
print f, n
/tmp/templeRing/templeR0001.png 802 /tmp/templeRing/templeR0002.png 549 /tmp/templeRing/templeR0003.png 491 /tmp/templeRing/templeR0004.png 482 /tmp/templeRing/templeR0005.png 470 /tmp/templeRing/templeR0006.png 441 /tmp/templeRing/templeR0007.png 401 /tmp/templeRing/templeR0008.png 358 /tmp/templeRing/templeR0009.png 393 /tmp/templeRing/templeR0010.png 438 /tmp/templeRing/templeR0011.png 456 /tmp/templeRing/templeR0012.png 455 /tmp/templeRing/templeR0013.png 444 /tmp/templeRing/templeR0014.png 405 /tmp/templeRing/templeR0015.png 436 /tmp/templeRing/templeR0016.png 421 /tmp/templeRing/templeR0017.png 410 /tmp/templeRing/templeR0018.png 398 /tmp/templeRing/templeR0019.png 408 /tmp/templeRing/templeR0020.png 451 /tmp/templeRing/templeR0021.png 430 /tmp/templeRing/templeR0022.png 444 /tmp/templeRing/templeR0023.png 454 /tmp/templeRing/templeR0024.png 476 /tmp/templeRing/templeR0025.png 475 /tmp/templeRing/templeR0026.png 509 /tmp/templeRing/templeR0027.png 509 /tmp/templeRing/templeR0028.png 546 /tmp/templeRing/templeR0029.png 558 /tmp/templeRing/templeR0030.png 662 /tmp/templeRing/templeR0031.png 577 /tmp/templeRing/templeR0032.png 465 /tmp/templeRing/templeR0033.png 477 /tmp/templeRing/templeR0034.png 479 /tmp/templeRing/templeR0035.png 457 /tmp/templeRing/templeR0036.png 456 /tmp/templeRing/templeR0037.png 473 /tmp/templeRing/templeR0038.png 477 /tmp/templeRing/templeR0039.png 473 /tmp/templeRing/templeR0040.png 454 /tmp/templeRing/templeR0041.png 461 /tmp/templeRing/templeR0042.png 431 /tmp/templeRing/templeR0043.png 442 /tmp/templeRing/templeR0044.png 454 /tmp/templeRing/templeR0045.png 448 /tmp/templeRing/templeR0046.png 454 /tmp/templeRing/templeR0047.png 459
This simple example is able to explore all the available cores in the local machine, just asking for a few extra lines of code. But the parallel computing capabilities in IPython go farbeyond, supporting SPMD and MPMD parallelism and the use of StarCluster for execution in Amazon’s Elastic Compute Cloud (EC$_2$). The interested reader is referred to the section Using IPython for parallel computing in the IPython documentation.