clEsperanto provides pixel-by-pixel comparison functions that are also available in numpy and cupy. Let's see how they perform compared to our OpenCL-based implementation. In similar comparisons with ImageJ, GPU acceleration paid off more for 3D operations than for 2D operations: https://clij.github.io/clij-benchmarking/benchmarking_operations_jmh
Note: Benchmarking results vary heavily depending on image size, kernel size, the operations used, their parameters and the hardware. Adapt this notebook to your use case and benchmark on your target hardware. If you have different scenarios or use cases, you are very welcome to submit your notebook as a pull request!
import pyclesperanto_prototype as cle
import numpy as np
import cupy as cp
import time
# To measure kernel execution duration properly, we need to set this flag. It will slow down execution of workflows a bit, though.
cle.set_wait_for_kernel_finish(True)
# Select a GPU with the following in the name. This will fall back to any other GPU if none with this name is found.
cle.select_device('RTX')
<NVIDIA GeForce RTX 3050 Ti Laptop GPU on Platform: NVIDIA CUDA (1 refs)>
# test data
test_image1 = np.random.random([100, 512, 512])
test_image2 = np.random.random([100, 512, 512])
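To put the durations in context, it can help to know how much data each operation touches. The sketch below computes the footprint of one test image from its shape. `np.random.random` returns 64-bit floats, so 8 bytes per pixel applies to the numpy arrays. The float32 figure is shown only for comparison and is an assumption, since the notebook does not state which data type the GPU buffers use.

```python
import math

# Shape of one test image as used above: (z, y, x)
shape = (100, 512, 512)
num_elements = math.prod(shape)

# np.random.random produces float64, i.e. 8 bytes per pixel
mib_float64 = num_elements * 8 / 2**20

# For comparison only (assumption): the same image stored as float32
mib_float32 = num_elements * 4 / 2**20

print("pixels per image:", num_elements)
print("MiB per image as float64:", mib_float64)
print("MiB per image as float32:", mib_float32)
```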
# greater_or_equal with pyclesperanto
result_image = None
cl_test_image1 = cle.push_zyx(test_image1)
cl_test_image2 = cle.push_zyx(test_image2)
for i in range(0, 10):
    start_time = time.time()
    result_image = cle.greater_or_equal(cl_test_image1, cl_test_image2, result_image)
    print("clEsperanto greater_or_equal duration: " + str(time.time() - start_time))
clEsperanto greater_or_equal duration: 0.031006336212158203
clEsperanto greater_or_equal duration: 0.003000974655151367
clEsperanto greater_or_equal duration: 0.0020003318786621094
clEsperanto greater_or_equal duration: 0.003000497817993164
clEsperanto greater_or_equal duration: 0.002000570297241211
clEsperanto greater_or_equal duration: 0.003000497817993164
clEsperanto greater_or_equal duration: 0.0020008087158203125
clEsperanto greater_or_equal duration: 0.003000497817993164
clEsperanto greater_or_equal duration: 0.002000570297241211
clEsperanto greater_or_equal duration: 0.002000570297241211
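In the output above, the first iteration takes roughly ten times longer than the following ones, presumably because of one-time costs such as kernel compilation and output allocation. The later values also cluster near whole milliseconds, which suggests the timer resolution is a limiting factor. A minimal sketch of a timing helper that reports warm-up calls separately and uses `time.perf_counter` could look like this. The `benchmark` function and the `workload` stand-in are hypothetical names for illustration and are not part of pyclesperanto.

```python
import statistics
import time


def benchmark(func, repeats=10, warmup=1):
    """Call func() repeatedly; return (warmup_durations, measured_durations) in seconds."""
    def timed_call():
        start = time.perf_counter()  # high-resolution monotonic clock
        func()
        return time.perf_counter() - start

    warmup_durations = [timed_call() for _ in range(warmup)]
    measured_durations = [timed_call() for _ in range(repeats)]
    return warmup_durations, measured_durations


# Stand-in workload (hypothetical); replace it with the operation under test.
def workload():
    return sum(i * i for i in range(100_000))


warmup_durations, measured_durations = benchmark(workload, repeats=9, warmup=1)
print("first call:", warmup_durations[0])
print("median of remaining calls:", statistics.median(measured_durations))
```

For GPU libraries, `func` must only return once the kernel has finished, for example by keeping `cle.set_wait_for_kernel_finish(True)` enabled or by synchronizing the cupy stream inside `func`, as this notebook does.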
# compare with cupy
result_image = None
cp_test_image1 = cp.asarray(test_image1)
cp_test_image2 = cp.asarray(test_image2)
for i in range(0, 10):
    start_time = time.time()
    result_image = cp.greater_equal(cp_test_image1, cp_test_image2)
    cp.cuda.stream.get_current_stream().synchronize()  # we need to wait here to measure time properly
    print("cupy greater_equal duration: " + str(time.time() - start_time))
# compare with numpy
result_image = None
for i in range(0, 10):
    start_time = time.time()
    result_image = np.greater_equal(test_image1, test_image2)
    print("Numpy greater_equal duration: " + str(time.time() - start_time))