The CPU is the most common type of processor for executing your code. CPUs have one or more serial processors, each of which takes single instructions from a stream and executes them sequentially.
GPUs are a form of coprocessor commonly used for video and image rendering, but they are now extremely popular in machine learning and data science too. GPUs have one or more streaming multiprocessors which take in arrays of instructions and execute them in parallel.
This video may help explain the concept visually.
from IPython.display import YouTubeVideo
YouTubeVideo(id='-P28LKWTzrI', width=1000, height=600)
Executing code on your GPU feels a lot like executing code on a second computer over a network.
If I wanted to send a Python program to another machine to be executed I would need a few things:

- a connection to the other machine
- a way to transfer my code and data over that connection
- an environment on the other machine capable of executing the code

To achieve the same things with the GPU we use CUDA over PCIe. But the idea is still the same: we need to move data and code to the device and execute that code.
CUDA is an extension to C++ which allows us to compile GPU code and interact with the GPU.
Over the last few years NVIDIA has invested in bringing CUDA functionality to Python.
Today there are packages like Numba, which Just In Time (JIT) compiles Python code into something compatible with CUDA and provides bindings to transfer data to the GPU and execute that code.
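For example, here is a minimal sketch of a Numba CUDA kernel that adds two arrays element-wise (it assumes a CUDA-capable GPU plus the numba and numpy packages; the names add_kernel and threads_per_block are illustrative):

import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)   # absolute index of this thread within the grid
    if i < x.size:     # guard threads that fall beyond the end of the array
        out[i] = x[i] + y[i]

n = 100_000
x = np.arange(n, dtype=np.float32)
y = 2 * x
out = np.zeros_like(x)

threads_per_block = 128
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks_per_grid, threads_per_block](x, y, out)  # Numba copies the arrays to and from the GPU for us

The kernel body looks like ordinary Python, but it is compiled for the GPU and executed by many threads at once, each handling one element.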
There are also many high level packages such as CuPy, cuDF, cuML, cuGraph, cuSignal and more which implement functionality in CUDA C++ and then package it with Python bindings so that it can be used directly from Python. The NVIDIA-developed packages among these (cuDF, cuML, cuGraph, cuSignal) are collectively known as RAPIDS; CuPy is an independent project that follows the same model.
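To give a feel for these packages, here is a minimal CuPy sketch (assuming the cupy package is installed); the API deliberately mirrors NumPy, but the arrays live in GPU memory and the operations run on the GPU:

import cupy as cp

x = cp.arange(10)      # array allocated in GPU memory
y = (x * 2).sum()      # multiply and reduce on the GPU
print(cp.asnumpy(y))   # copy the scalar result back to the host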
Lastly, there is the recently announced CUDA Python, which provides Cython/Python wrappers for the CUDA driver and runtime APIs and is currently in preview.
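As a rough sketch of what that looks like (assuming the cuda-python package; since it is a preview release, the module layout may change), the driver API can be called almost one-to-one from Python, with each call returning its error code alongside any result:

from cuda import cuda

err, = cuda.cuInit(0)                        # initialise the driver API
err, device_count = cuda.cuDeviceGetCount()  # how many GPUs are visible
err, device = cuda.cuDeviceGet(0)            # handle to the first GPU
print(f"Found {device_count} CUDA device(s)")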
This tutorial will focus on Numba and RAPIDS.