In clesperanto, you can list available GPUs and select them for processing.
import numpy as np
import pyclesperanto_prototype as cle
cle.available_device_names()
['NVIDIA GeForce RTX 3050 Ti Laptop GPU', 'gfx902']
To select a GPU, pass part of its name to cle.select_device(). Subsequent operations are then processed on that device.
cle.select_device('gfx')
<gfx902 on Platform: AMD Accelerated Parallel Processing (2 refs)>
image = np.random.random((10, 100, 100))
processed_image = cle.gaussian_blur(image, sigma_x=10)
cle.imshow(processed_image)
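The calls above pass only a fragment of the device name ('gfx' for 'gfx902'). The idea of matching by name fragment can be sketched in plain Python as follows. This is an illustration only, not pyclesperanto's implementation: find_device is a hypothetical helper, and how the library resolves several matching devices is not covered here.

```python
def find_device(names, part):
    """Return the first device name that contains `part`, or None if nothing matches.

    Hypothetical helper for illustration; not part of pyclesperanto.
    """
    for name in names:
        if part in name:
            return name
    return None


# Device names copied from the cle.available_device_names() output above
names = ['NVIDIA GeForce RTX 3050 Ti Laptop GPU', 'gfx902']

print(find_device(names, 'gfx'))
print(find_device(names, 'RTX'))
```

Checking the fragment against the listed names first is a cheap way to catch a typo before starting a long processing run.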
To compare the performance of multiple devices, run each operation several times to get a reliable impression of general performance. A single time measurement can be misleading. You can either write your own for-loop or use timeit, which automates the repetition for you.
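Outside a notebook, the same repeated measurement can be approximated with Python's standard timeit module. The following is a minimal sketch under stated assumptions: benchmark and workload are hypothetical names, and the workload is a pure-Python stand-in so that the snippet runs without a GPU. In practice you would pass a callable that wraps the operation of interest, for example lambda: cle.gaussian_blur(image, sigma_x=10).

```python
import statistics
import timeit


def benchmark(func, repeats=7, loops=100):
    """Call `func` `loops` times in each of `repeats` runs.

    Returns (mean, population std. dev.) of the time per call in seconds.
    """
    totals = timeit.repeat(func, repeat=repeats, number=loops)
    per_call = [total / loops for total in totals]
    return statistics.mean(per_call), statistics.pstdev(per_call)


def workload():
    # Stand-in for the real operation (assumption); replace with e.g.
    # cle.gaussian_blur(image, sigma_x=10) after selecting a device.
    return sum(i * i for i in range(1000))


mean, std = benchmark(workload)
print(f"{mean * 1e6:.1f} µs ± {std * 1e6:.1f} µs per call")
```

The defaults of 7 runs with 100 loops each mirror the settings reported in the timeit output of this notebook; fewer loops may be enough for slow operations.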
cle.select_device('gfx')
<gfx902 on Platform: AMD Accelerated Parallel Processing (2 refs)>
%%timeit
cle.gaussian_blur(image, sigma_x=10)
3.97 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cle.select_device("RTX")
<NVIDIA GeForce RTX 3050 Ti Laptop GPU on Platform: NVIDIA CUDA (1 refs)>
%%timeit
cle.gaussian_blur(image, sigma_x=10)
2.54 ms ± 502 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
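To put the two measurements side by side, the reported means and standard deviations can be compared with a few lines of arithmetic. The numbers are copied from the two timeit outputs above; the one-standard-deviation gap is only a rough plausibility check, not a statistical test.

```python
# Mean and std. dev. per loop from the two timeit runs above, in milliseconds
amd_mean, amd_std = 3.97, 0.223        # gfx902
nvidia_mean, nvidia_std = 2.54, 0.502  # NVIDIA GeForce RTX 3050 Ti Laptop GPU

# Ratio of the mean run times
speedup = amd_mean / nvidia_mean

# Distance between the one-std. dev. intervals; positive means they do not overlap
gap = (amd_mean - amd_std) - (nvidia_mean + nvidia_std)

print(f"speedup: {speedup:.2f}x, gap between intervals: {gap:.3f} ms")
```

On this machine and for this image size, the NVIDIA device is roughly 1.5 times faster for the Gaussian blur. Other operations or image sizes may behave differently.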
cl_info
cle.cl_info() returns all available information about the OpenCL platforms and devices found on the system.
print(cle.cl_info())
NVIDIA CUDA EXTENSIONS:cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid EXTENSIONS_WITH_VERSION:[<pyopencl._cl.NameVersion object at 0x000002554B5110B0>, <pyopencl._cl.NameVersion object at 0x000002554B511130>, <pyopencl._cl.NameVersion object at 0x000002554B5111B0>, <pyopencl._cl.NameVersion object at 0x000002554B5111F0>, <pyopencl._cl.NameVersion object at 0x000002554B5112B0>, <pyopencl._cl.NameVersion object at 0x000002554B5112F0>, <pyopencl._cl.NameVersion object at 0x000002554B511330>, <pyopencl._cl.NameVersion object at 0x000002554B511630>, <pyopencl._cl.NameVersion object at 0x000002554B5116F0>, <pyopencl._cl.NameVersion object at 0x000002554B5113B0>, <pyopencl._cl.NameVersion object at 0x000002554B511770>, <pyopencl._cl.NameVersion object at 0x000002554B5117F0>, <pyopencl._cl.NameVersion object at 0x000002554B511870>, <pyopencl._cl.NameVersion object at 0x000002554B5118F0>, <pyopencl._cl.NameVersion object at 0x000002554B511930>, <pyopencl._cl.NameVersion object at 0x000002554B511970>, <pyopencl._cl.NameVersion object at 0x000002554B5119F0>, <pyopencl._cl.NameVersion object at 0x000002554B511A30>, <pyopencl._cl.NameVersion object at 0x000002554B511A70>, <pyopencl._cl.NameVersion object at 0x000002554B511AF0>] HOST_TIMER_RESOLUTION:0 NAME:NVIDIA CUDA NUMERIC_VERSION:12582912 PROFILE:FULL_PROFILE VENDOR:NVIDIA Corporation VERSION:OpenCL 3.0 CUDA 11.3.121 NVIDIA GeForce RTX 3050 Ti Laptop GPU ADDRESS_BITS:64 ATOMIC_FENCE_CAPABILITIES:19 ATOMIC_MEMORY_CAPABILITIES:17 ATTRIBUTE_ASYNC_ENGINE_COUNT_NV:5 AVAILABLE:1 
AVAILABLE_ASYNC_QUEUES_AMD:None BOARD_NAME_AMD:None BUILT_IN_KERNELS: BUILT_IN_KERNELS_WITH_VERSION:[] COMPILER_AVAILABLE:1 COMPUTE_CAPABILITY_MAJOR_NV:8 COMPUTE_CAPABILITY_MINOR_NV:6 DEVICE_ENQUEUE_CAPABILITIES:2564095475712 DOUBLE_FP_CONFIG:63 DRIVER_VERSION:466.77 ENDIAN_LITTLE:1 ERROR_CORRECTION_SUPPORT:0 EXECUTION_CAPABILITIES:1 EXTENSIONS:cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid EXTENSIONS_WITH_VERSION:[<pyopencl._cl.NameVersion object at 0x000002554B511B30>, <pyopencl._cl.NameVersion object at 0x000002554B5110B0>, <pyopencl._cl.NameVersion object at 0x000002554B511130>, <pyopencl._cl.NameVersion object at 0x000002554B5111B0>, <pyopencl._cl.NameVersion object at 0x000002554B5111F0>, <pyopencl._cl.NameVersion object at 0x000002554B5112F0>, <pyopencl._cl.NameVersion object at 0x000002554B511330>, <pyopencl._cl.NameVersion object at 0x000002554B511630>, <pyopencl._cl.NameVersion object at 0x000002554B5116F0>, <pyopencl._cl.NameVersion object at 0x000002554B5112B0>, <pyopencl._cl.NameVersion object at 0x000002554B5113B0>, <pyopencl._cl.NameVersion object at 0x000002554B511770>, <pyopencl._cl.NameVersion object at 0x000002554B5117F0>, <pyopencl._cl.NameVersion object at 0x000002554B511870>, <pyopencl._cl.NameVersion object at 0x000002554B5118F0>, <pyopencl._cl.NameVersion object at 0x000002554B511930>, <pyopencl._cl.NameVersion object at 0x000002554B511970>, <pyopencl._cl.NameVersion object at 0x000002554B5119F0>, <pyopencl._cl.NameVersion object at 0x000002554B511A30>, <pyopencl._cl.NameVersion object at 0x000002554B511A70>] 
EXT_MEM_PADDING_IN_BYTES_QCOM:None GENERIC_ADDRESS_SPACE_SUPPORT:0 GFXIP_MAJOR_AMD:None GFXIP_MINOR_AMD:None GLOBAL_FREE_MEMORY_AMD:None GLOBAL_MEM_CACHELINE_SIZE:128 GLOBAL_MEM_CACHE_SIZE:573440 GLOBAL_MEM_CACHE_TYPE:2 GLOBAL_MEM_CHANNELS_AMD:None GLOBAL_MEM_CHANNEL_BANKS_AMD:None GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD:None GLOBAL_MEM_SIZE:4294967296 GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE:0 GPU_OVERLAP_NV:1 HALF_FP_CONFIG:None HOST_UNIFIED_MEMORY:0 ILS_WITH_VERSION:[] IL_VERSION: IMAGE2D_MAX_HEIGHT:32768 IMAGE2D_MAX_WIDTH:32768 IMAGE3D_MAX_DEPTH:16384 IMAGE3D_MAX_HEIGHT:16384 IMAGE3D_MAX_WIDTH:16384 IMAGE_MAX_ARRAY_SIZE:2048 IMAGE_MAX_BUFFER_SIZE:268435456 IMAGE_SUPPORT:1 INTEGRATED_MEMORY_NV:0 KERNEL_EXEC_TIMEOUT_NV:1 LINKER_AVAILABLE:1 LOCAL_MEM_BANKS_AMD:None LOCAL_MEM_SIZE:49152 LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD:None LOCAL_MEM_TYPE:1 MAX_CLOCK_FREQUENCY:1035 MAX_COMPUTE_UNITS:20 MAX_CONSTANT_ARGS:9 MAX_CONSTANT_BUFFER_SIZE:65536 MAX_GLOBAL_VARIABLE_SIZE:0 MAX_MEM_ALLOC_SIZE:1073741824 MAX_NUM_SUB_GROUPS:0 MAX_ON_DEVICE_EVENTS:0 MAX_ON_DEVICE_QUEUES:0 MAX_PARAMETER_SIZE:4352 MAX_PIPE_ARGS:0 MAX_READ_IMAGE_ARGS:256 MAX_READ_WRITE_IMAGE_ARGS:0 MAX_SAMPLERS:32 MAX_WORK_GROUP_SIZE:1024 MAX_WORK_GROUP_SIZE_AMD:None MAX_WORK_ITEM_DIMENSIONS:3 MAX_WORK_ITEM_SIZES:[1024, 1024, 64] MAX_WRITE_IMAGE_ARGS:32 MEM_BASE_ADDR_ALIGN:4096 ME_VERSION_INTEL:None MIN_DATA_TYPE_ALIGN_SIZE:128 NAME:NVIDIA GeForce RTX 3050 Ti Laptop GPU NATIVE_VECTOR_WIDTH_CHAR:1 NATIVE_VECTOR_WIDTH_DOUBLE:1 NATIVE_VECTOR_WIDTH_FLOAT:1 NATIVE_VECTOR_WIDTH_HALF:0 NATIVE_VECTOR_WIDTH_INT:1 NATIVE_VECTOR_WIDTH_LONG:1 NATIVE_VECTOR_WIDTH_SHORT:1 NON_UNIFORM_WORK_GROUP_SUPPORT:0 NUMERIC_VERSION:12582912 NUM_SIMULTANEOUS_INTEROPS_INTEL:None OPENCL_C_ALL_VERSIONS:[<pyopencl._cl.NameVersion object at 0x000002554B511BF0>, <pyopencl._cl.NameVersion object at 0x000002554B511AF0>, <pyopencl._cl.NameVersion object at 0x000002554B511C30>, <pyopencl._cl.NameVersion object at 0x000002554B511CB0>] 
OPENCL_C_FEATURES:[<pyopencl._cl.NameVersion object at 0x000002554B511CF0>, <pyopencl._cl.NameVersion object at 0x000002554B511D70>, <pyopencl._cl.NameVersion object at 0x000002554B511DF0>, <pyopencl._cl.NameVersion object at 0x000002554B511E30>] OPENCL_C_VERSION:OpenCL C 1.2 PAGE_SIZE_QCOM:None PARTITION_AFFINITY_DOMAIN:[0] PARTITION_MAX_SUB_DEVICES:1 PARTITION_PROPERTIES:[0] PARTITION_TYPE:[0] PCIE_ID_AMD:None PCI_BUS_ID_NV:1 PCI_DOMAIN_ID_NV:0 PCI_SLOT_ID_NV:0 PIPE_MAX_ACTIVE_RESERVATIONS:0 PIPE_MAX_PACKET_SIZE:0 PIPE_SUPPORT:0 PLATFORM:<pyopencl.Platform 'NVIDIA CUDA' at 0x2551805f6b0> PREFERRED_CONSTANT_BUFFER_SIZE_AMD:None PREFERRED_GLOBAL_ATOMIC_ALIGNMENT:0 PREFERRED_INTEROP_USER_SYNC:0 PREFERRED_LOCAL_ATOMIC_ALIGNMENT:0 PREFERRED_PLATFORM_ATOMIC_ALIGNMENT:0 PREFERRED_VECTOR_WIDTH_CHAR:1 PREFERRED_VECTOR_WIDTH_DOUBLE:1 PREFERRED_VECTOR_WIDTH_FLOAT:1 PREFERRED_VECTOR_WIDTH_HALF:0 PREFERRED_VECTOR_WIDTH_INT:1 PREFERRED_VECTOR_WIDTH_LONG:1 PREFERRED_VECTOR_WIDTH_SHORT:1 PREFERRED_WORK_GROUP_SIZE_AMD:None PREFERRED_WORK_GROUP_SIZE_MULTIPLE:32 PRINTF_BUFFER_SIZE:None PROFILE:FULL_PROFILE PROFILING_TIMER_OFFSET_AMD:None PROFILING_TIMER_RESOLUTION:1000 QUEUE_ON_DEVICE_MAX_SIZE:0 QUEUE_ON_DEVICE_PREFERRED_SIZE:0 QUEUE_ON_DEVICE_PROPERTIES:0 QUEUE_ON_HOST_PROPERTIES:3 QUEUE_PROPERTIES:3 REFERENCE_COUNT:1 REGISTERS_PER_BLOCK_NV:65536 SIMD_INSTRUCTION_WIDTH_AMD:None SIMD_PER_COMPUTE_UNIT_AMD:None SIMD_WIDTH_AMD:None SIMULTANEOUS_INTEROPS_INTEL:None SINGLE_FP_CONFIG:191 SPIR_VERSIONS:None SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS:0 SVM_CAPABILITIES:1 THREAD_TRACE_SUPPORTED_AMD:None TOPOLOGY_AMD:None TYPE:4 VENDOR:NVIDIA Corporation VENDOR_ID:4318 VERSION:OpenCL 3.0 CUDA WARP_SIZE_NV:32 WAVEFRONT_WIDTH_AMD:None WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT:0 AMD Accelerated Parallel Processing EXTENSIONS:cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices EXTENSIONS_WITH_VERSION:None HOST_TIMER_RESOLUTION:100 
NAME:AMD Accelerated Parallel Processing NUMERIC_VERSION:None PROFILE:FULL_PROFILE VENDOR:Advanced Micro Devices, Inc. VERSION:OpenCL 2.1 AMD-APP (3180.7) gfx902 ADDRESS_BITS:64 ATOMIC_FENCE_CAPABILITIES:None ATOMIC_MEMORY_CAPABILITIES:None ATTRIBUTE_ASYNC_ENGINE_COUNT_NV:None AVAILABLE:1 AVAILABLE_ASYNC_QUEUES_AMD:2 BOARD_NAME_AMD:AMD Radeon(TM) Graphics BUILT_IN_KERNELS: BUILT_IN_KERNELS_WITH_VERSION:None COMPILER_AVAILABLE:1 COMPUTE_CAPABILITY_MAJOR_NV:None COMPUTE_CAPABILITY_MINOR_NV:None DEVICE_ENQUEUE_CAPABILITIES:None DOUBLE_FP_CONFIG:63 DRIVER_VERSION:3180.7 (PAL,HSAIL) ENDIAN_LITTLE:1 ERROR_CORRECTION_SUPPORT:0 EXECUTION_CAPABILITIES:1 EXTENSIONS:cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_copy_buffer_p2p cl_amd_planar_yuv EXTENSIONS_WITH_VERSION:None EXT_MEM_PADDING_IN_BYTES_QCOM:None GENERIC_ADDRESS_SPACE_SUPPORT:None GFXIP_MAJOR_AMD:9 GFXIP_MINOR_AMD:2 GLOBAL_FREE_MEMORY_AMD:[12600032] GLOBAL_MEM_CACHELINE_SIZE:64 GLOBAL_MEM_CACHE_SIZE:16384 GLOBAL_MEM_CACHE_TYPE:2 GLOBAL_MEM_CHANNELS_AMD:4 GLOBAL_MEM_CHANNEL_BANKS_AMD:4 GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD:256 GLOBAL_MEM_SIZE:12981370880 GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE:12981370880 GPU_OVERLAP_NV:None HALF_FP_CONFIG:0 HOST_UNIFIED_MEMORY:1 ILS_WITH_VERSION:None IL_VERSION:None IMAGE2D_MAX_HEIGHT:16384 IMAGE2D_MAX_WIDTH:16384 IMAGE3D_MAX_DEPTH:2048 IMAGE3D_MAX_HEIGHT:2048 IMAGE3D_MAX_WIDTH:2048 
IMAGE_MAX_ARRAY_SIZE:2048 IMAGE_MAX_BUFFER_SIZE:134217728 IMAGE_SUPPORT:1 INTEGRATED_MEMORY_NV:None KERNEL_EXEC_TIMEOUT_NV:None LINKER_AVAILABLE:1 LOCAL_MEM_BANKS_AMD:32 LOCAL_MEM_SIZE:32768 LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD:65536 LOCAL_MEM_TYPE:1 MAX_CLOCK_FREQUENCY:2100 MAX_COMPUTE_UNITS:8 MAX_CONSTANT_ARGS:8 MAX_CONSTANT_BUFFER_SIZE:10577824972 MAX_GLOBAL_VARIABLE_SIZE:9520042240 MAX_MEM_ALLOC_SIZE:10577824972 MAX_NUM_SUB_GROUPS:None MAX_ON_DEVICE_EVENTS:1024 MAX_ON_DEVICE_QUEUES:1 MAX_PARAMETER_SIZE:1024 MAX_PIPE_ARGS:16 MAX_READ_IMAGE_ARGS:128 MAX_READ_WRITE_IMAGE_ARGS:64 MAX_SAMPLERS:16 MAX_WORK_GROUP_SIZE:256 MAX_WORK_GROUP_SIZE_AMD:None MAX_WORK_ITEM_DIMENSIONS:3 MAX_WORK_ITEM_SIZES:[1024, 1024, 1024] MAX_WRITE_IMAGE_ARGS:64 MEM_BASE_ADDR_ALIGN:2048 ME_VERSION_INTEL:None MIN_DATA_TYPE_ALIGN_SIZE:128 NAME:gfx902 NATIVE_VECTOR_WIDTH_CHAR:4 NATIVE_VECTOR_WIDTH_DOUBLE:1 NATIVE_VECTOR_WIDTH_FLOAT:1 NATIVE_VECTOR_WIDTH_HALF:1 NATIVE_VECTOR_WIDTH_INT:1 NATIVE_VECTOR_WIDTH_LONG:1 NATIVE_VECTOR_WIDTH_SHORT:2 NON_UNIFORM_WORK_GROUP_SUPPORT:None NUMERIC_VERSION:None NUM_SIMULTANEOUS_INTEROPS_INTEL:None OPENCL_C_ALL_VERSIONS:None OPENCL_C_FEATURES:None OPENCL_C_VERSION:OpenCL C 2.0 PAGE_SIZE_QCOM:None PARTITION_AFFINITY_DOMAIN:[0] PARTITION_MAX_SUB_DEVICES:8 PARTITION_PROPERTIES:[0] PARTITION_TYPE:[0] PCIE_ID_AMD:None PCI_BUS_ID_NV:None PCI_DOMAIN_ID_NV:None PCI_SLOT_ID_NV:None PIPE_MAX_ACTIVE_RESERVATIONS:16 PIPE_MAX_PACKET_SIZE:1987890380 PIPE_SUPPORT:None PLATFORM:<pyopencl.Platform 'AMD Accelerated Parallel Processing' at 0x7ffd9d15a490> PREFERRED_CONSTANT_BUFFER_SIZE_AMD:None PREFERRED_GLOBAL_ATOMIC_ALIGNMENT:0 PREFERRED_INTEROP_USER_SYNC:1 PREFERRED_LOCAL_ATOMIC_ALIGNMENT:0 PREFERRED_PLATFORM_ATOMIC_ALIGNMENT:0 PREFERRED_VECTOR_WIDTH_CHAR:4 PREFERRED_VECTOR_WIDTH_DOUBLE:1 PREFERRED_VECTOR_WIDTH_FLOAT:1 PREFERRED_VECTOR_WIDTH_HALF:1 PREFERRED_VECTOR_WIDTH_INT:1 PREFERRED_VECTOR_WIDTH_LONG:1 PREFERRED_VECTOR_WIDTH_SHORT:2 PREFERRED_WORK_GROUP_SIZE_AMD:None 
PREFERRED_WORK_GROUP_SIZE_MULTIPLE:None PRINTF_BUFFER_SIZE:None PROFILE:FULL_PROFILE PROFILING_TIMER_OFFSET_AMD:1627642789794715400 PROFILING_TIMER_RESOLUTION:1 QUEUE_ON_DEVICE_MAX_SIZE:8388608 QUEUE_ON_DEVICE_PREFERRED_SIZE:262144 QUEUE_ON_DEVICE_PROPERTIES:3 QUEUE_ON_HOST_PROPERTIES:2 QUEUE_PROPERTIES:2 REFERENCE_COUNT:1 REGISTERS_PER_BLOCK_NV:None SIMD_INSTRUCTION_WIDTH_AMD:None SIMD_PER_COMPUTE_UNIT_AMD:4 SIMD_WIDTH_AMD:None SIMULTANEOUS_INTEROPS_INTEL:None SINGLE_FP_CONFIG:190 SPIR_VERSIONS:1.2 SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS:None SVM_CAPABILITIES:3 THREAD_TRACE_SUPPORTED_AMD:1 TOPOLOGY_AMD:<pyopencl._cl.DeviceTopologyAmd object at 0x000002554B511AF0> TYPE:4 VENDOR:Advanced Micro Devices, Inc. VENDOR_ID:4098 VERSION:OpenCL 2.0 AMD-APP (3180.7) WARP_SIZE_NV:None WAVEFRONT_WIDTH_AMD:None WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT:None Current device: NVIDIA GeForce RTX 3050 Ti Laptop GPU
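Since the full listing is long, it can help to pull out just a few entries. The following sketch assumes that the text contains one KEY:VALUE entry per line; parse_info is a hypothetical helper, and the sample is a short excerpt of the gfx902 values shown above rather than real cl_info() output. The real text covers several platforms and devices, so keys repeat, and you would need to split it per device first.

```python
def parse_info(text):
    """Collect KEY:VALUE lines into a dict (a repeated key overwrites the earlier value)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key:
            info[key] = value
    return info


# Excerpt of the gfx902 entries from the listing above (assumed one entry per line)
sample = """
NAME:gfx902
GLOBAL_MEM_SIZE:12981370880
MAX_COMPUTE_UNITS:8
MAX_WORK_GROUP_SIZE:256
"""

info = parse_info(sample)
mem_gib = int(info["GLOBAL_MEM_SIZE"]) / 2**30  # bytes to GiB
print(info["NAME"], f"{mem_gib:.2f} GiB,", info["MAX_COMPUTE_UNITS"], "compute units")
```

Entries such as GLOBAL_MEM_SIZE and MAX_MEM_ALLOC_SIZE are useful for estimating whether an image fits on a device before processing it.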