Notebook

WrightTools for Numpy Users¶

As scientists transitioned to the Python Scientific Library during its rise in popularity, new users needed translations to their familiar software, such as Matlab. In the same way, it's important to compare Numpy strategies for organizing data with the framework of WrightTools. This notebook attempts to show how common tools of Numpy (especially advanced indexing) can translate to the WrightTools framework relatively easily.

These examples are concerned with the Data objects that retain the rectangular shapes of ndarrays. The benefits of numpy arrays can generally be accessed within WrightTools. It is a subset of the more general Data object representations that can be used (via making axes that are linear combinations of variables).

Don't forget to also consult the WrightTools documentation or examine WrightTools test scripts for examples.

numpy ndarrays --> `data` object¶

In [ ]:

import numpy as np
import WrightTools as wt
print(np.__version__)  # tested on 1.18.1
print(wt.__version__)  # tested on 3.3.1

In [ ]:

x = np.linspace(0, 1, 5)  # Hz
y = np.linspace(500, 700, 3)  # nm
z = np.exp(-x[:, None]) * np.sqrt(y - 500)[None, :]

data = wt.Data()
data.create_channel(name="z", values=z)
# BE SURE TO BROADCAST REDUCED DIM VARIABLES--this is how wt.Data knows dimensionality
data.create_variable(name="x", values=x[:, None], units="Hz")  
data.create_variable(name="y", values=y[None, :], units="nm")
data.transform("x", "y")  # create axes

data.print_tree()

Note that we need to broadcast the variables into the data object. The reason is similar to why we need to broadcast x and y arrays when defining the z array: otherwise it is not clear that the data is multidimensional. Failing to broadcast is a "gotcha", because WrightTools will still create data (just as with the z-array, you can make a 1D array from x and y); you will only run into problems once you try to work with the data.

WARNING: Array index order does not correspond to axes numbers!

In [ ]:

print([ax.natural_name for ax in data.axes])
print(f"data.z.shape= {data.z.shape}")

data.transform("y", "x")
print([ax.natural_name for ax in data.axes])  # order of axes switches
print(f"data.z.shape = {data.z.shape}")  # shape of channel does not

Indexing, advanced indexing¶

Under the hood, WrightTools objects, such as Data, Channel, and Variable, are a type of hdf5 object, not numpy arrays (for more info, read about the wt5 format). As such, slice commands emulate ndarray behavior, and only support a subset of all numpy array indexing tricks.

Access datasets as numpy arrays using slice commands:¶

All regular slice commands work. Using a null slice returns a full view of the channel or variable as a numpy array.

In [ ]:

print(type(data.z))
print(type(data.z[:]))  # a view of the z values as a numpy array
print(data.z[:5, 2:3])  # typical slicing arguments works here as well

REMINDER: the relationship between axis number and channel indices is not fixed (cf. data.transform, above), and can be difficult to discern. For this reason, index slicing can get confusing quickly, especially if several dimensions have the same size. For a versatile option that leverages the strengths of WrightTools, use the split and chop methods to slice and iterate along dimensions, respectively. Examples using both methods are shown further below.

Do not use `newaxis` or `None` indexing to expand dimensionality¶

In [ ]:

try:  # raises TypeError
    data.z[..., np.newaxis]  # or, equivalently, data.z[..., None]
except TypeError:
    print("didn't work!")

A quick workaround, of course, is to work directly with the ndarray view (i.e. data.z[:]), which is a numpy array and accepts all regular numpy indexing tricks. For example, newaxis works here, as do other advanced indexing methods:

In [ ]:

data.z[:][..., np.newaxis]  # no error
data.z[:][np.isnan(data.z[:])]  # no error

Be careful, though! Applying this indexing will not give you write capabilities:

In [ ]:

temp = data.copy(verbose=False)
temp.z[:][..., np.newaxis] *= -1  # no error, but z does not change because boolean indexing copies the view

print(np.all(temp.z[:] == data.z[:]))  # temp.z is unchanged!

Alternatives to expanding dimensionality include making a new data object with expanded ndarrays, or using wt.data.join. We show how to expand dimensionality using wt.data.join further below.

Do not use boolean array indexing on channels and variables - consider `split`¶

In [ ]:

positive = z > 0  # first column is False
z_advind = z[positive]  # traditional boolean array indexing with a numpy array

try:
    data.z[positive]  # doesn't work
except TypeError:
    print("Boolean indexing of channel did not work!")

data.z cannot be indexed with a boolean array: instead, the split method provides the analogous indexing tricks. To use split, we first establish the boolean logic as an expression, and then use split to parse that expression.

For this example, we can pass the boolean logic it as a Variable, and then split based on the variable value:

In [ ]:

temp = data.copy()
temp.create_variable(name="zvar", values=positive)
zpositive = temp.split("zvar", [True])[1]
print(data.z[:], '\n')
print(positive, '\n')
print(zpositive.z[:], '\n')

Here's a few more good examples of using split

Slicing (Keep Dimensionality) --> `data.split`¶

In [ ]:

z_subset = z[x >= 0.5]
data_subset = data.split("x", [0.5], units="Hz", verbose=False)[1]

print("ndim:", data_subset.ndim)
print(np.all(data_subset.z[:] == z_subset))

Slicing (Reduce Dimensionality) --> `data.chop`¶

In [ ]:

z_subset = z[2]  # z_subset = z[x==x[2]] is equivalent
data_subset = data.chop("y", at={"x": [data.x[2], "Hz"]}, verbose=False)[0]

print("ndim:", data_subset.ndim)
print(np.all(data_subset.z[:] == z_subset))

Use chop to loop through reduced dimensions of the data.

In [ ]:

# option 1: iterate through collection
chop = data.chop("y")
for di in chop.values():
    print(di.constants)
print("\r")
# option 2: iterate through points, use "at" kwarg
for xi in data.x.points:
    di = data.chop("y", at={"x": [xi, data.x.units]}, verbose=False)[0]
    print(di.constants)

For very large datasets, the second option is useful because you never deal with the whole collection. Thus you can loop through individual chop elements and close them after each iteration, saving memory.

`np.newaxis` (or `arr[:, None]`) --> `create_variable`¶

For a Data object to understand another dimension, create a new variable for the dataset (and transform to the new variable). Since np.newaxis makes an orthogonal array dimension, the new variable will be a constant over the data currently spanned:

In [ ]:

z_na = z[..., None]

new_data = data.copy()
new_data.create_variable(name="Temperature", values=np.ones((1, 1)))
# note the variable shape--variable is broadcast to all elements of the data
# optional: declare Temperature a constant via `create constant`
# new_data.create_constant("Temperature")
new_data.transform("x", "y", "Temperature")

print("z_na.shape: ", z_na.shape, f" ({z_na.ndim}D)")
print("new_data.shape: ", new_data.shape, f" ({new_data.ndim}D)")

We have added a new variable, but new_data does not increase dimensionality. Dimensionality corresponds to the array shapes, not the number of variables (the dimensionality would still be 2 even if Temperature changed for each x and y coordinate).

Even though the dimensionality has not changed, new_data now understands another axis is in play. The above procedure allows us to expand the dimensionality via join.

`np.concatenate`/`tile`/`stack`/`block`, etc --> `wt.data.join`¶

Case I: increase dimensionality (stack, block)¶

If we have two datasets with a trivial dimension of different values, we can combine them to achieve a higher dimensionality data object:

In [ ]:

new_data2 = data.copy(verbose=False)
new_data2.create_variable(name="Temperature", values=np.ones((1, 1))*2)
# new_data2.create_constant("Temperature")
new_data2.transform("x", "y", "Temperature")

data_with_temps = wt.data.join([new_data, new_data2])
data_with_temps.print_tree()

Note that this strategy can be used to undo the chop operation:

In [ ]:

chopped = data.chop("y", verbose=False)  # data objects in `chopped` have the same axes and points, but differing "y" values
# pre-condition data as higher dimensionality
for di in chopped.values():
    di.transform("x", "y")
stacked = wt.data.join(chopped.values(), name="stacked", verbose=False)
stacked.print_tree()

print(np.all(stacked.z[:] == data.z[:]))

Case II: same dimensionality/mixed dimensionality (concatenate)¶

This problem is equivalent to inverting split. Note that this rectification will only recompose the array shape to within a transpose.

In [ ]:

splitted = data.split("x", [0.5], units="Hz")  # data objects with the same axes, but different points
concatenated = wt.data.join(splitted.values(), name="concatenated", verbose=False)  

print(data.shape, concatenated.shape)  # note:  different shapes can arise!
print(np.all(data.z[:].T == concatenated.z[:]))

Channel Array Math¶

Binary (channel + constant) operations¶

In [ ]:

z **= 2
data.z **= 2  # works for +, -, /, *, **

print(np.all(data.z[:] == z))

data.z **= 0.5
z **= 0.5

Binary (channel + channel) Operations¶

In [ ]:

# within the same data object:
data.create_channel(name="zed", values=-data.z[:])
data.zed += data.z

print(data.zed[:])
data.remove_channel("zed")

In [ ]:

# between two data objects
data2 = data.copy(verbose=False)
data2.z += data.z

print(np.all(data2.z[:] == 2 * data.z[:]))
data2.close()

Variable Math¶

Variables require tricky syntax to change (the above channel math will not work). But the following will work:

In [ ]:

change_x = data.copy(verbose=False)
x = change_x["x"]  # the x reference is necessary to use setitem (*=, +=, etc.) syntax on a variable
x **=2
print(np.all(data["x"][:]**2 == change_x["x"][:]))

However, do you really want to change your variables, or just use different units? It's often the latter. In that case, apply convert to the axes:

In [ ]:

units_data = data.copy()
units_data.convert("wn")  # all axes with frequency/wavelength units will be converted to wavenumbers
units_data.print_tree()
units_data.x.convert("Hz")  # apply conversion only to x axis
units_data.print_tree()

Maybe you do need to do math that is not a unit conversion (for instance, shift delay values). If needed, you can overwrite the old variable by removing it and renaming the new variable:

In [ ]:

print(*data.x[:])
# define replacement variable
data.create_variable(name="_x", values=np.linspace(0, 2, data.x.size).reshape(data["x"].shape), units=data.x.units)
# remove target variable
data.transform("y")
data.remove_variable("x")
# replace target variable
data.rename_variables(_x="x")
data.transform("x", "y")
data.print_tree()
print(*data.x[:])

Axes Math - `transform`¶

Axes are just expressions of variables, so the scope of Axis math is the linear combinations of variables (plus offsets). Keep in mind that calling linear combinations of variables will force the Data object to rectify the units of all variables involved.

In [ ]:

data = data.copy(verbose=False)
data.transform("2*x", "3*y")  # do not use spaces when defining axes
print(*data.axes)
data.transform("2*x-y", "2*y")  # note that axis[0] is now 2D
print(*data.axes)
data.transform("x-2", "y")  # constant 2 is interpreted in units of `data.x.units`
print(*data.axes)

Summary¶

We've gone over many of the common array manipulations that make numpy so powerful, and shown how similar manipulations can be done within the WrightTools framework. Hopefully these examples will empower you explore your datasets with ease!

Need more examples? Are there some numpy manipulations you are still curious about? Give us feedback by making an issue at our GitHub page!

In [ ]:

WrightTools for Numpy Users¶

numpy ndarrays --> data object¶

Indexing, advanced indexing¶

Access datasets as numpy arrays using slice commands:¶

Do not use newaxis or None indexing to expand dimensionality¶

Do not use boolean array indexing on channels and variables - consider split¶

Slicing (Keep Dimensionality) --> data.split¶

Slicing (Reduce Dimensionality) --> data.chop¶

np.newaxis (or arr[:, None]) --> create_variable¶

np.concatenate/tile/stack/block, etc --> wt.data.join¶

Case I: increase dimensionality (stack, block)¶

Case II: same dimensionality/mixed dimensionality (concatenate)¶

Channel Array Math¶

Binary (channel + constant) operations¶

Binary (channel + channel) Operations¶

Variable Math¶

Axes Math - transform¶

Summary¶

numpy ndarrays --> `data` object¶

Do not use `newaxis` or `None` indexing to expand dimensionality¶

Do not use boolean array indexing on channels and variables - consider `split`¶

Slicing (Keep Dimensionality) --> `data.split`¶

Slicing (Reduce Dimensionality) --> `data.chop`¶

`np.newaxis` (or `arr[:, None]`) --> `create_variable`¶

`np.concatenate`/`tile`/`stack`/`block`, etc --> `wt.data.join`¶

Axes Math - `transform`¶