TACO, which stands for Tensor Algebra Compiler, is a library for performing sparse and dense linear algebra and tensor algebra computations.
The computations can range from relatively simple ones like sparse matrix-vector multiplication to more complex ones like matricized tensor times Khatri-Rao product. All these computations can be performed on any mix of dense and sparse tensors. Under the hood, TACO automatically generates efficient code to perform these computations.
This notebook provides a brief introduction to the TACO Python library. For a more comprehensive overview, please see the TACO documentation; we will also link to relevant pages as we progress.
First, let's import TACO. Press `Shift + Enter` to run the code below.
import pytaco as pt
from pytaco import dense, compressed
In the above, `dense` and `compressed` are mode (dimension) formats. We can think of tensors as multi-dimensional arrays, and the mode formats allow us to specify how we would like to store the data in each dimension:

- If a dimension's format is `dense`, then all of the elements in that dimension are stored.
- If a dimension's format is `compressed`, then only the nonzeros are stored.

For example, we can declare a $512 \times 64 \times 2048$ tensor whose first dimension is dense and second and third dimensions are compressed:
T = pt.tensor([512, 64, 2048], pt.format([dense, compressed, compressed]))
We can initialize $T$ by calling its `insert` method to add a nonzero element to the tensor. The `insert` method takes two arguments: a list of coordinates and the value to be inserted at those coordinates:
# Set T(0, 1, 0) = 42.0
T.insert([0, 1, 0], 42.0)
If multiple elements are inserted at the same coordinates, they are summed together:
# Set T(0, 0, 1) = 12.0 + 24.0 = 36.0
T.insert([0, 0, 1], 12.0)
T.insert([0, 0, 1], 24.0)
We can then iterate over the nonzero elements of the tensor as follows:
for coordinates, value in T:
    print("Coordinates: {}, Value: {}".format(coordinates, value))
Consider a matrix $M$ (aka a two-dimensional tensor) containing the following values:
$$M = \begin{bmatrix} 6 & \cdot & 9 & 8 \\ \cdot & \cdot & \cdot & \cdot \\ 5 & \cdot & \cdot & 7 \end{bmatrix}$$

Denote the rows and columns as dimensions $d_1$ and $d_2$, respectively.
Let's look at how $M$ is represented in different formats. For convenience, we first define a helper function to initialize $M$.
def make_example_matrix(format):
    M = pt.tensor([3, 4], format)
    M.insert([0, 0], 6)
    M.insert([0, 2], 9)
    M.insert([0, 3], 8)
    M.insert([2, 0], 5)
    M.insert([2, 3], 7)
    return M
Note that passing in `dense` makes all of the dimensions dense; this is equivalent to `pt.format([dense, dense])`.
make_example_matrix(dense)
For this example, we focus on the last line of the output, the `vals` array: since all values are stored, it is a flattened $3 \times 4$ matrix stored in row-major order.
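Working it out by hand from the definition of $M$, we would expect the `vals` array to be:

[6, 0, 9, 8, 0, 0, 0, 0, 5, 0, 0, 7]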
Next, let's store $d_1$ as dense and $d_2$ as compressed; this is known as the compressed sparse row (CSR) format.
csr = pt.format([dense, compressed])
make_example_matrix(csr)
Since $d_1$ is dense, we need only store the `size` of the dimension; values, both zero and nonzero, are stored for every coordinate in dimension $d_1$. Since $d_2$ is compressed, we store a `pos` array (`[0, 3, 3, 5]`) and an `idx` array (`[0, 2, 3, 0, 3]`); together these form a segmented vector with one segment per entry in the previous dimension. The `idx` array stores all the indices with nonzero values in the dimension, while the `pos` array stores the location in the `idx` array where each segment begins. In particular, segment $i$ is stored in locations `pos[i]:pos[i+1]` of the `idx` array.
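To make the segment arithmetic concrete, here is a small plain-Python sketch (independent of pytaco) that reads off the nonzeros of one row of $M$ from the arrays above:

pos = [0, 3, 3, 5]
idx = [0, 2, 3, 0, 3]
vals = [6, 9, 8, 5, 7]  # nonzeros of M in row-major order

row = 0
# Segment `row` of the idx array spans pos[row]:pos[row + 1].
for k in range(pos[row], pos[row + 1]):
    print("M({}, {}) = {}".format(row, idx[k], vals[k]))
# Prints: M(0, 0) = 6, M(0, 2) = 9, M(0, 3) = 8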
The below animation visualizes the format. Hover over any non-empty entry of the matrix on the left to see how the value is stored.
%run ./animation/animation_2.py
We can switch the order of the dimensions by passing in `[1, 0]` for the optional parameter `mode_ordering`. This results in a column-major (rather than row-major) format called compressed sparse column (CSC).
csc = pt.format([dense, compressed], [1, 0])
make_example_matrix(csc)
In this format, $d_2$ stores only its `size`, while $d_1$ has the `pos` and `idx` arrays.
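Working out the CSC-level arrays for $M$ by hand (column 0 holds rows 0 and 2, column 1 is empty, column 2 holds row 0, and column 3 holds rows 0 and 2), we would expect:

pos  = [0, 2, 2, 3, 5]
idx  = [0, 2, 0, 0, 2]
vals = [6, 5, 9, 8, 7]  # nonzeros of M in column-major order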
%run ./animation/animation_3.py
To more clearly visualize the compressed sparse fiber (CSF) format, where all dimensions are sparse, we move to three dimensions. The tensor $N$ defined below is a $2 \times 3 \times 4$ tensor.
As above, passing in `compressed` makes all of the dimensions compressed; for a three-dimensional tensor, this is equivalent to `pt.format([compressed, compressed, compressed])`.
N = pt.tensor([2, 3, 4], compressed)
# First layer.
N.insert([0, 0, 0], 6)
N.insert([0, 0, 2], 9)
N.insert([0, 0, 3], 8)
N.insert([0, 2, 0], 5)
N.insert([0, 2, 3], 7)
# Second layer.
N.insert([1, 0, 1], 2)
N.insert([1, 1, 0], 3)
N.insert([1, 1, 3], 1)
N.insert([1, 2, 2], 6)
N
This animation represents $N$ as two $3 \times 4$ matrices.
%run ./animation/animation_4.py
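Working out the CSF-level arrays for $N$ by hand, with one `pos`/`idx` pair per dimension (the names `pos1`, `idx1`, etc. below are just labels for exposition), we would expect:

# Dimension d1: both layers are nonempty.
pos1 = [0, 2];     idx1 = [0, 1]
# Dimension d2: layer 0 holds rows 0 and 2; layer 1 holds rows 0, 1, and 2.
pos2 = [0, 2, 5];  idx2 = [0, 2, 0, 1, 2]
# Dimension d3: one segment per (layer, row) pair above.
pos3 = [0, 3, 5, 6, 8, 9]
idx3 = [0, 2, 3, 0, 3, 1, 0, 3, 2]
vals = [6, 9, 8, 5, 7, 2, 3, 1, 6]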
We can also initialize tensors with NumPy arrays or SciPy sparse (CSR or CSC) matrices. Let's start by importing the packages we need.
import numpy as np
import scipy as sp
import scipy.sparse  # ensure the sparse submodule is loaded
Given a NumPy array such as the one randomly generated below, we can convert it to a TACO tensor using the `pytaco.from_array` function, which creates a dense tensor.
array = np.random.uniform(size=10)
tensor = pt.from_array(array)
tensor
We can also export TACO tensors to NumPy arrays.
tensor.to_array()
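As a quick sanity check, the round trip should reproduce the original data (a sketch using NumPy's `allclose`):

np.allclose(tensor.to_array(), array)  # expected: True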
Similarly, given a SciPy sparse matrix, we can convert it to a TACO tensor. For a CSR matrix like the one below, we use the `pytaco.from_sp_csr` function, which creates a CSR tensor.
size = 100
density = 0.1
sparse_matrix = sp.sparse.rand(size, size, density, format='csr')
A = pt.from_sp_csr(sparse_matrix)
And we can export the tensor to a SciPy sparse matrix.
A.to_sp_csr()
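Again we can sanity-check the round trip; a standard SciPy idiom for comparing sparse matrices is to check that the elementwise difference has no nonzeros:

(A.to_sp_csr() != sparse_matrix).nnz == 0  # expected: True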
The following example demonstrates how computations on tensors are performed using TACO.
Sparse matrix-vector multiplication (SpMV) is a bottleneck in many scientific and engineering applications. Mathematically, SpMV can be expressed as $$y = Ax + z,$$
where $A$ is a sparse matrix and $x$, $y$, and $z$ are dense vectors. The computation can also be expressed in index notation as
$$y_i = A_{ij} \cdot x_j + z_i.$$

Starting with the $A$ generated above, we can view its `shape` attribute, which we expect to be `[size, size]`:
A.shape
Examining the formula, we see that we need a vector $x$ whose length is the number of columns of $A$, and a vector $z$ whose length is the number of rows. We generate $x$ and $z$ randomly with NumPy.
x = pt.from_array(np.random.uniform(size=A.shape[1]))
z = pt.from_array(np.random.uniform(size=A.shape[0]))
We can express the result $y$ as a dense vector.
y = pt.tensor([A.shape[0]], dense)
The syntax for TACO computations closely mirrors index notation, with the caveat that we also have to explicitly declare the index variables beforehand:
i, j = pt.get_index_vars(2)
y[i] = A[i, j] * x[j] + z[i]
Once a tensor algebra computation has been defined, we can simply invoke the result tensor's `evaluate` method to perform the actual computation.
Under the hood, TACO will first invoke the result tensor's `compile` method to generate code that performs the computation. It will then perform the actual computation by invoking `assemble` to compute the sparsity structure of the result, followed by `compute` to compute the values of the result's nonzero elements.
y.evaluate()
If we define a computation and then access the result without first manually invoking `evaluate` or `compile`/`assemble`/`compute`, TACO will automatically perform the computation immediately before the result is accessed.
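For finer control, we could invoke the three steps ourselves instead of `evaluate`; a minimal sketch, assuming the methods behave as described above:

y.compile()   # generate code for the computation
y.assemble()  # compute the sparsity structure of y
y.compute()   # compute the values of y's nonzero elements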
Finally, we can display the result.
y
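As a final sanity check (a sketch, not part of the pipeline above), we can compare $y$ against an equivalent SciPy/NumPy computation:

expected = sparse_matrix.dot(x.to_array()) + z.to_array()
np.allclose(y.to_array(), expected)  # expected: True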