:label:sec_ndarray
In order to get anything done, we need some way to store and manipulate data.
Generally, there are two important things we need to do with data:
(i) acquire them; and (ii) process them once they are inside the computer.
There is no point in acquiring data without some way to store them,
so let us get our hands dirty first by playing with synthetic data.
To start, we introduce the $n$-dimensional array (`ndarray`),
MXNet's primary tool for storing and transforming data.
In MXNet, `ndarray` is a class and we call any instance "an `ndarray`".
If you have worked with NumPy, the most widely used
scientific computing package in Python,
then you will find this section familiar.
That is by design: we designed MXNet's `ndarray` to be
an extension to NumPy's `ndarray` with a few killer features.
First, MXNet's `ndarray` supports asynchronous computation
on CPU, GPU, and distributed cloud architectures,
whereas NumPy only supports CPU computation.
Second, MXNet's `ndarray` supports automatic differentiation.
These properties make MXNet's `ndarray` suitable for deep learning.
Throughout the book, when we say `ndarray`,
we are referring to MXNet's `ndarray` unless otherwise stated.
In this section, we aim to get you up and running, equipping you with the basic math and numerical computing tools that you will build on as you progress through the book. Do not worry if you struggle to grok some of the mathematical concepts or library functions. The following sections will revisit this material in the context of practical examples, and it will sink in. On the other hand, if you already have some background and want to go deeper into the mathematical content, just skip this section.
To start, we import the `api` and `mxnet-engine` modules from the Deep Java Library (DJL) on Maven.
The `api` module includes all of the high-level Java APIs used for data processing, training, and inference. The `mxnet-engine` module implements those high-level APIs using the Apache MXNet framework.
With DJL's automatic engine mode, the MXNet native libraries, which implement the basic operations and functions in C++, are downloaded automatically the first time DJL is used.
```
%load ../utils/djl-imports
```
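If you are running outside the book's notebook setup, equivalent dependencies can be declared in Maven along these lines. This is only a sketch: pin the DJL version that matches your environment.

```xml
<!-- Sketch only: add the version appropriate for your setup. -->
<dependency>
    <groupId>ai.djl</groupId>
    <artifactId>api</artifactId>
</dependency>
<dependency>
    <groupId>ai.djl.mxnet</groupId>
    <artifactId>mxnet-engine</artifactId>
    <scope>runtime</scope>
</dependency>
```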
An `ndarray` represents a (possibly multi-dimensional) array of numerical values.
With one axis, an `ndarray` corresponds (in math) to a vector.
With two axes, an `ndarray` corresponds to a matrix.
Arrays with more than two axes do not have special
mathematical names; we simply call them tensors.

To start, we can use `arange` to create a row vector `x`
containing the first $12$ integers starting with $0$,
though they are created as floats by default.
Each of the values in an `ndarray` is called an element of the `ndarray`.
For instance, there are $12$ elements in the `ndarray` `x`.
Unless otherwise specified, a new `ndarray`
will be stored in main memory and designated for CPU-based computation.
```java
NDManager manager = NDManager.newBaseManager();
var x = manager.arange(12);
x
```
Here we are using an `NDManager` to create the `ndarray` `x`. `NDManager` implements the `AutoCloseable` interface and manages the life cycles of the `ndarray`s it creates. This is needed to manage native memory consumption, which the Java Garbage Collector has no control over. We usually wrap `NDManager` in a try-with-resources block so that all `ndarray`s are closed in time. To learn more about memory management, read DJL's documentation.
```java
try (NDManager manager = NDManager.newBaseManager()) {
    NDArray x = manager.arange(12);
}
```
We can access an `ndarray`'s shape (the length along each axis)
by inspecting its `shape` property.

```java
x.getShape()
```
If we just want to know the total number of elements in an `ndarray`,
i.e., the product of all of the shape elements,
we can inspect its `size` property.
Because we are dealing with a vector here,
the single element of its `shape` is identical to its `size`.

```java
x.size()
```
To change the shape of an `ndarray` without altering
either the number of elements or their values,
we can invoke the `reshape` function.
For example, we can transform our `ndarray`, `x`,
from a row vector with shape ($12$,) to a matrix with shape ($3$, $4$).
This new `ndarray` contains the exact same values,
but views them as a matrix organized as $3$ rows and $4$ columns.
To reiterate, although the shape has changed,
the elements in `x` have not.
Note that the `size` is unaltered by reshaping.

```java
x = x.reshape(3, 4);
x
```
Reshaping by manually specifying every dimension is unnecessary.
If our target shape is a matrix with shape (height, width),
then once we know the width, the height is given implicitly.
Why should we have to perform the division ourselves?
In the example above, to get a matrix with $3$ rows,
we specified both that it should have $3$ rows and $4$ columns.
Fortunately, `ndarray` can automatically work out one dimension given the rest.
We invoke this capability by placing `-1` for the dimension
that we would like `ndarray` to infer automatically.
In our case, instead of calling `x.reshape(3, 4)`,
we could have equivalently called `x.reshape(-1, 4)` or `x.reshape(3, -1)`.
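Under the hood, inferring the wildcard dimension is just a division. The following plain-Java sketch illustrates the rule (it is not DJL's actual implementation): the inferred dimension is the total number of elements divided by the product of the known dimensions.

```java
public class ShapeInference {
    // Infer the dimension marked -1 so that the product of all
    // dimensions equals the total number of elements.
    // A sketch of the rule, not DJL's implementation.
    static long[] inferShape(long totalSize, long[] shape) {
        long known = 1;
        int wildcard = -1;
        for (int i = 0; i < shape.length; i++) {
            if (shape[i] == -1) {
                wildcard = i;      // remember which dimension to infer
            } else {
                known *= shape[i]; // product of the known dimensions
            }
        }
        long[] result = shape.clone();
        if (wildcard >= 0) {
            result[wildcard] = totalSize / known; // e.g., 12 / 4 = 3
        }
        return result;
    }

    public static void main(String[] args) {
        long[] inferred = inferShape(12, new long[]{-1, 4});
        System.out.println(inferred[0] + "," + inferred[1]); // prints "3,4"
    }
}
```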
Calling the `create` method with only a `Shape` grabs a chunk of memory and hands us back a matrix
without bothering to change the value of any of its entries.
This is remarkably efficient, but we must be careful:
the entries might take arbitrary values, including very big ones!

```java
manager.create(new Shape(3, 4))
```
Typically, we will want our matrices initialized
either with zeros, ones, some other constants,
or numbers randomly sampled from a specific distribution.
We can create an `ndarray` representing a tensor with all elements
set to $0$ and a shape of ($2$, $3$, $4$) as follows:

```java
manager.zeros(new Shape(2, 3, 4))
```
Similarly, we can create tensors with each element set to $1$ as follows:

```java
manager.ones(new Shape(2, 3, 4))
```
Often, we want to randomly sample the values
for each element in an `ndarray`
from some probability distribution.
For example, when we construct arrays to serve
as parameters in a neural network, we will
typically initialize their values randomly.
The following snippet creates an `ndarray` with shape ($3$, $4$).
Each of its elements is randomly sampled
from a standard Gaussian (normal) distribution
with a mean of $0$ and a standard deviation of $1$.

```java
manager.randomNormal(0f, 1f, new Shape(3, 4), DataType.FLOAT32)
```
You can also just pass the shape, in which case the default values of $0$ and $1$ are used for the mean and standard deviation.

```java
manager.randomNormal(new Shape(3, 4))
```
We can also specify the exact values for each element in the desired `ndarray`
by supplying an array containing the numerical values and the desired shape.

```java
manager.create(new float[]{2, 1, 4, 3, 1, 2, 3, 4, 4, 3, 2, 1}, new Shape(3, 4))
```
This book is not about software engineering. Our interests are not limited to simply reading and writing data from/to arrays. We want to perform mathematical operations on those arrays. Some of the simplest and most useful operations are the elementwise operations. These apply a standard scalar operation to each element of an array. For functions that take two arrays as inputs, elementwise operations apply some standard binary operator on each pair of corresponding elements from the two arrays. We can create an elementwise function from any function that maps from a scalar to a scalar.
In mathematical notation, we would denote such a unary scalar operator (taking one input) by the signature $f: \mathbb{R} \rightarrow \mathbb{R}$. This just means that the function is mapping from any real number ($\mathbb{R}$) onto another. Likewise, we denote a binary scalar operator (taking two real inputs, and yielding one output) by the signature $f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}$. Given any two vectors $\mathbf{u}$ and $\mathbf{v}$ of the same shape, and a binary operator $f$, we can produce a vector $\mathbf{c} = F(\mathbf{u},\mathbf{v})$ by setting $c_i \gets f(u_i, v_i)$ for all $i$, where $c_i, u_i$, and $v_i$ are the $i^\mathrm{th}$ elements of vectors $\mathbf{c}, \mathbf{u}$, and $\mathbf{v}$. Here, we produced the vector-valued $F: \mathbb{R}^d, \mathbb{R}^d \rightarrow \mathbb{R}^d$ by lifting the scalar function to an elementwise vector operation.
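This lifting can be sketched in plain Java (illustrative only, not DJL code): given a scalar binary operator $f$, apply it to corresponding elements of two equal-length arrays to produce $c_i \gets f(u_i, v_i)$.

```java
import java.util.function.DoubleBinaryOperator;

public class Lift {
    // Lift a scalar binary operator f: R, R -> R to an
    // elementwise vector operation F: R^d, R^d -> R^d.
    static double[] lift(DoubleBinaryOperator f, double[] u, double[] v) {
        double[] c = new double[u.length];
        for (int i = 0; i < u.length; i++) {
            c[i] = f.applyAsDouble(u[i], v[i]); // c_i = f(u_i, v_i)
        }
        return c;
    }

    public static void main(String[] args) {
        double[] u = {1, 2, 4, 8};
        double[] v = {2, 2, 2, 2};
        double[] sum = lift(Double::sum, u, v);
        System.out.println(java.util.Arrays.toString(sum)); // [3.0, 4.0, 6.0, 10.0]
    }
}
```

Any scalar function, such as `(a, b) -> a * b` or `Math::pow`, can be lifted the same way.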
In DJL, the common standard arithmetic operations
(addition, subtraction, multiplication, division, and exponentiation)
have all been lifted to elementwise operations
for any identically-shaped tensors of arbitrary shape.
We can call elementwise operations on any two tensors of the same shape.
In the following example, we apply each of the five elementwise operations to a pair of vectors.
Note: since Java does not support operator overloading, you need to call the methods `add`, `sub`, `mul`, `div`, and `pow` instead of using `+`, `-`, `*`, `/`, and `**`.
```java
var x = manager.create(new float[]{1f, 2f, 4f, 8f});
var y = manager.create(new float[]{2f, 2f, 2f, 2f});
x.add(y);
x.sub(y);
x.mul(y);
x.div(y);
x.pow(y);
```
Many more operations can be applied elementwise, including unary operators like exponentiation.

```java
x.exp()
```
In addition to elementwise computations,
we can also perform linear algebra operations,
including vector dot products and matrix multiplication.
We will explain the crucial bits of linear algebra
(with no assumed prior knowledge) in :numref:`sec_linear-algebra`.
We can also concatenate multiple `ndarray`s together,
stacking them end-to-end to form a larger `ndarray`.
We just need to provide a list of `ndarray`s
and tell the system along which axis to concatenate.
The example below shows what happens when we concatenate
two matrices along rows (axis $0$, the first element of the shape)
vs. columns (axis $1$, the second element of the shape).
We can see that the first output `ndarray`'s axis-$0$ length ($6$)
is the sum of the two input `ndarray`s' axis-$0$ lengths ($3 + 3$),
while the second output `ndarray`'s axis-$1$ length ($8$)
is the sum of the two input `ndarray`s' axis-$1$ lengths ($4 + 4$).

```java
x = manager.arange(12f).reshape(3, 4);
y = manager.create(new float[]{2, 1, 4, 3, 1, 2, 3, 4, 4, 3, 2, 1}, new Shape(3, 4));
x.concat(y) // default axis = 0
x.concat(y, 1)
```
Sometimes, we want to construct a binary `ndarray` via logical statements.
Take `x.eq(y)` as an example.
For each position, if `x` and `y` are equal at that position,
the corresponding entry in the new `ndarray` takes a value of $1$,
meaning that the logical statement `x.eq(y)` is true at that position;
otherwise that position takes $0$.

```java
x.eq(y)
```
Summing all the elements in the `ndarray` yields an `ndarray` with only one element.

```java
x.sum()
```
In the above section, we saw how to perform elementwise operations
on two `ndarray`s of the same shape. Under certain conditions,
even when shapes differ, we can still perform elementwise operations
by invoking the broadcasting mechanism.
This mechanism works in the following way:
first, expand one or both arrays
by copying elements appropriately
so that after this transformation,
the two `ndarray`s have the same shape;
second, carry out the elementwise operations
on the resulting arrays.
In most cases, we broadcast along an axis where an array initially only has length $1$, such as in the following example:

```java
var a = manager.arange(3f).reshape(3, 1);
var b = manager.arange(2f).reshape(1, 2);
a
b
```
Since `a` and `b` are $3\times1$ and $1\times2$ matrices respectively,
their shapes do not match up if we want to add them.
We broadcast the entries of both matrices into a larger $3\times2$ matrix as follows:
for matrix `a` it replicates the columns,
and for matrix `b` it replicates the rows,
before adding both up elementwise.

```java
a.add(b)
```
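The row and column replication just described can be mimicked in plain Java (illustrative only; DJL performs broadcasting without materializing the copies):

```java
public class Broadcast {
    // Broadcast-add an (m x 1) column vector and a (1 x n) row vector
    // into an (m x n) matrix, mimicking the replication described above.
    static double[][] broadcastAdd(double[] col, double[] row) {
        double[][] c = new double[col.length][row.length];
        for (int i = 0; i < col.length; i++) {
            for (int j = 0; j < row.length; j++) {
                // col is replicated across columns, row across rows
                c[i][j] = col[i] + row[j];
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] c = broadcastAdd(new double[]{0, 1, 2}, new double[]{0, 1});
        System.out.println(java.util.Arrays.deepToString(c)); // [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
    }
}
```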
DJL uses the same syntax as NumPy in Python for indexing and slicing. Just as in any Python array, elements in an `ndarray` can be accessed by index.
The first element has index $0$,
and ranges include the first element but exclude the last.
As in standard Python lists, we can access elements
according to their position relative to the end of the list
by using negative indices.
Thus, `"-1"` selects the last element and `"1:3"`
selects the second and third elements, as follows:

```java
x.get("-1");
x.get("1:3")
```
Beyond reading, we can also write elements of a matrix by specifying indices.

```java
x.set(new NDIndex("1, 2"), 9);
x
```
If we want to assign multiple elements the same value,
we simply index all of them and then assign them the value.
For instance, `"0:2, :"` accesses the first and second rows,
where `:` takes all the elements along axis $1$ (the columns).
While we discussed indexing for matrices,
this obviously also works for vectors
and for tensors of more than $2$ dimensions.

```java
x.set(new NDIndex("0:2, :"), 12);
x
```
Running operations can cause new memory to be allocated to host results.
For example, if we write `y = x.add(y)`,
we will dereference the `ndarray` that `y` used to point to
and instead point `y` at the newly allocated memory.
This might be undesirable for two reasons. First, we do not want to run around allocating memory unnecessarily all the time. In machine learning, we might have hundreds of megabytes of parameters and update all of them multiple times per second. Typically, we will want to perform these updates in place. Second, we might point at the same parameters from multiple variables. If we do not update in place, other references will still point to the old memory location, making it possible for parts of our code to inadvertently reference stale parameters.
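The distinction can be illustrated with plain Java arrays (illustrative only; in DJL the `addi` family plays the analogous in-place role):

```java
public class InPlace {
    // Out-of-place: allocate a fresh array for the sum.
    static double[] addOutOfPlace(double[] x, double[] y) {
        double[] fresh = new double[y.length];
        for (int i = 0; i < y.length; i++) fresh[i] = x[i] + y[i];
        return fresh; // caller rebinds its variable to new memory
    }

    // In-place: write the results into y itself and return it.
    static double[] addInPlace(double[] x, double[] y) {
        for (int i = 0; i < y.length; i++) y[i] += x[i];
        return y; // same array object, updated values
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 8};
        double[] y = {2, 2, 2, 2};
        System.out.println(addOutOfPlace(x, y) == y); // false: new memory was allocated
        System.out.println(addInPlace(x, y) == y);    // true: same memory, updated values
    }
}
```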
Fortunately, performing in-place operations in DJL is easy.
We can assign the result of an operation
to a previously allocated array using in-place operators
like `addi`, `subi`, `muli`, and `divi`.

```java
var original = manager.zeros(y.getShape());
var actual = original.addi(x);
original == actual
```
## Summary

* MXNet's `ndarray` is an extension to NumPy's `ndarray` with a few killer advantages that make it suitable for deep learning.
* DJL's `ndarray` provides a variety of functionalities, including basic mathematics operations, broadcasting, indexing, slicing, and memory saving.

## Exercises

1. Run the code in this section. Change the conditional statement `x.eq(y)` to `x.lt(y)` (less than) or `x.gt(y)` (greater than), and then see what kind of `ndarray` you can get.
1. Replace the two `ndarray`s that operate by element in the broadcasting mechanism with other shapes, e.g., three-dimensional tensors. Is the result the same as expected?