As scientists transitioned to the Python Scientific Library during its rise in popularity, new users needed translations to their familiar software, such as Matlab. In the same way, it's important to compare Numpy strategies for organizing data with the framework of WrightTools. This notebook attempts to show how common tools of Numpy (especially advanced indexing) can translate to the WrightTools framework relatively easily.
These examples are concerned with the Data
objects that retain the rectangular shapes of ndarrays. The benefits of numpy arrays can generally be accessed within WrightTools. It is a subset of the more general Data
object representations that can be used (via making axes that are linear combinations of variables).
Don't forget to also consult the WrightTools documentation or examine WrightTools test scripts for examples.
data
object¶import numpy as np
import WrightTools as wt
print(np.__version__) # tested on 1.18.1
print(wt.__version__) # tested on 3.3.1
x = np.linspace(0, 1, 5) # Hz
y = np.linspace(500, 700, 3) # nm
z = np.exp(-x[:, None]) * np.sqrt(y - 500)[None, :]
data = wt.Data()
data.create_channel(name="z", values=z)
# BE SURE TO BROADCAST REDUCED DIM VARIABLES--this is how wt.Data knows dimensionality
data.create_variable(name="x", values=x[:, None], units="Hz")
data.create_variable(name="y", values=y[None, :], units="nm")
data.transform("x", "y") # create axes
data.print_tree()
Note that we need to broadcast the variables into the data object. The reason is similar to why we need to broadcast x
and y
arrays when defining the z
array: otherwise it is not clear that the data is multidimensional. Failing to broadcast is a "gotcha", because WrightTools will still create data
(just as with the z-array, you can make a 1D array from x
and y
); you will only run into problems once you try to work with the data
.
WARNING: Array index order does not correspond to axes numbers!
print([ax.natural_name for ax in data.axes])
print(f"data.z.shape= {data.z.shape}")
data.transform("y", "x")
print([ax.natural_name for ax in data.axes]) # order of axes switches
print(f"data.z.shape = {data.z.shape}") # shape of channel does not
Under the hood, WrightTools objects, such as Data
, Channel
, and Variable
, are a type of hdf5
object, not numpy arrays (for more info, read about the wt5 format). As such, slice commands emulate ndarray behavior, and only support a subset of all numpy array indexing tricks.
All regular slice commands work. Using a null slice returns a full view of the channel or variable as a numpy array.
print(type(data.z))
print(type(data.z[:])) # a view of the z values as a numpy array
print(data.z[:5, 2:3]) # typical slicing arguments works here as well
REMINDER: the relationship between axis number and channel indices is not fixed (cf. data.transform
, above), and can be difficult to discern. For this reason, index slicing can get confusing quickly, especially if several dimensions have the same size. For a versatile option that leverages the strengths of WrightTools, use the split
and chop
methods to slice and iterate along dimensions, respectively. Examples using both methods are shown further below.
newaxis
or None
indexing to expand dimensionality¶try: # raises TypeError
data.z[..., np.newaxis] # or, equivalently, data.z[..., None]
except TypeError:
print("didn't work!")
A quick workaround, of course, is to work directly with the ndarray view (i.e. data.z[:]
), which is a numpy array and accepts all regular numpy indexing tricks. For example, newaxis
works here, as do other advanced indexing methods:
data.z[:][..., np.newaxis] # no error
data.z[:][np.isnan(data.z[:])] # no error
Be careful, though! Applying this indexing will not give you write capabilities:
temp = data.copy(verbose=False)
temp.z[:][..., np.newaxis] *= -1 # no error, but z does not change because boolean indexing copies the view
print(np.all(temp.z[:] == data.z[:])) # temp.z is unchanged!
Alternatives to expanding dimensionality include making a new data object with expanded ndarrays, or using wt.data.join
. We show how to expand dimensionality using wt.data.join
further below.
split
¶positive = z > 0 # first column is False
z_advind = z[positive] # traditional boolean array indexing with a numpy array
try:
data.z[positive] # doesn't work
except TypeError:
print("Boolean indexing of channel did not work!")
data.z
cannot be indexed with a boolean array: instead, the split
method provides the analogous indexing tricks. To use split
, we first establish the boolean logic as an expression
, and then use split
to parse that expression.
For this example, we can pass the boolean logic it as a Variable
, and then split based on the variable value:
temp = data.copy()
temp.create_variable(name="zvar", values=positive)
zpositive = temp.split("zvar", [True])[1]
print(data.z[:], '\n')
print(positive, '\n')
print(zpositive.z[:], '\n')
data.split
¶z_subset = z[x >= 0.5]
data_subset = data.split("x", [0.5], units="Hz", verbose=False)[1]
print("ndim:", data_subset.ndim)
print(np.all(data_subset.z[:] == z_subset))
data.chop
¶z_subset = z[2] # z_subset = z[x==x[2]] is equivalent
data_subset = data.chop("y", at={"x": [data.x[2], "Hz"]}, verbose=False)[0]
print("ndim:", data_subset.ndim)
print(np.all(data_subset.z[:] == z_subset))
Use chop to loop through reduced dimensions of the data.
# option 1: iterate through collection
chop = data.chop("y")
for di in chop.values():
print(di.constants)
print("\r")
# option 2: iterate through points, use "at" kwarg
for xi in data.x.points:
di = data.chop("y", at={"x": [xi, data.x.units]}, verbose=False)[0]
print(di.constants)
For very large datasets, the second option is useful because you never deal with the whole collection. Thus you can loop through individual chop elements and close them after each iteration, saving memory.
np.newaxis
(or arr[:, None]
) --> create_variable
¶For a Data object to understand another dimension, create a new variable for the dataset (and transform to the new variable). Since np.newaxis
makes an orthogonal array dimension, the new variable will be a constant over the data currently spanned:
z_na = z[..., None]
new_data = data.copy()
new_data.create_variable(name="Temperature", values=np.ones((1, 1)))
# note the variable shape--variable is broadcast to all elements of the data
# optional: declare Temperature a constant via `create constant`
# new_data.create_constant("Temperature")
new_data.transform("x", "y", "Temperature")
print("z_na.shape: ", z_na.shape, f" ({z_na.ndim}D)")
print("new_data.shape: ", new_data.shape, f" ({new_data.ndim}D)")
We have added a new variable, but new_data
does not increase dimensionality. Dimensionality corresponds to the array shapes, not the number of variables (the dimensionality would still be 2 even if Temperature
changed for each x
and y
coordinate).
Even though the dimensionality has not changed, new_data
now understands another axis is in play. The above procedure allows us to expand the dimensionality via join
.
np.concatenate
/tile
/stack
/block
, etc --> wt.data.join
¶If we have two datasets with a trivial dimension of different values, we can combine them to achieve a higher dimensionality data object:
new_data2 = data.copy(verbose=False)
new_data2.create_variable(name="Temperature", values=np.ones((1, 1))*2)
# new_data2.create_constant("Temperature")
new_data2.transform("x", "y", "Temperature")
data_with_temps = wt.data.join([new_data, new_data2])
data_with_temps.print_tree()
Note that this strategy can be used to undo the chop
operation:
chopped = data.chop("y", verbose=False) # data objects in `chopped` have the same axes and points, but differing "y" values
# pre-condition data as higher dimensionality
for di in chopped.values():
di.transform("x", "y")
stacked = wt.data.join(chopped.values(), name="stacked", verbose=False)
stacked.print_tree()
print(np.all(stacked.z[:] == data.z[:]))
This problem is equivalent to inverting split
. Note that this rectification will only recompose the array shape to within a transpose.
splitted = data.split("x", [0.5], units="Hz") # data objects with the same axes, but different points
concatenated = wt.data.join(splitted.values(), name="concatenated", verbose=False)
print(data.shape, concatenated.shape) # note: different shapes can arise!
print(np.all(data.z[:].T == concatenated.z[:]))
z **= 2
data.z **= 2 # works for +, -, /, *, **
print(np.all(data.z[:] == z))
data.z **= 0.5
z **= 0.5
# within the same data object:
data.create_channel(name="zed", values=-data.z[:])
data.zed += data.z
print(data.zed[:])
data.remove_channel("zed")
# between two data objects
data2 = data.copy(verbose=False)
data2.z += data.z
print(np.all(data2.z[:] == 2 * data.z[:]))
data2.close()
Variables require tricky syntax to change (the above channel math will not work). But the following will work:
change_x = data.copy(verbose=False)
x = change_x["x"] # the x reference is necessary to use setitem (*=, +=, etc.) syntax on a variable
x **=2
print(np.all(data["x"][:]**2 == change_x["x"][:]))
However, do you really want to change your variables, or just use different units? It's often the latter. In that case, apply convert
to the axes:
units_data = data.copy()
units_data.convert("wn") # all axes with frequency/wavelength units will be converted to wavenumbers
units_data.print_tree()
units_data.x.convert("Hz") # apply conversion only to x axis
units_data.print_tree()
Maybe you do need to do math that is not a unit conversion (for instance, shift delay values). If needed, you can overwrite the old variable by removing it and renaming the new variable:
print(*data.x[:])
# define replacement variable
data.create_variable(name="_x", values=np.linspace(0, 2, data.x.size).reshape(data["x"].shape), units=data.x.units)
# remove target variable
data.transform("y")
data.remove_variable("x")
# replace target variable
data.rename_variables(_x="x")
data.transform("x", "y")
data.print_tree()
print(*data.x[:])
transform
¶Axes are just expressions of variables, so the scope of Axis
math is the linear combinations of variables (plus offsets). Keep in mind that calling linear combinations of variables will force the Data object to rectify the units of all variables involved.
data = data.copy(verbose=False)
data.transform("2*x", "3*y") # do not use spaces when defining axes
print(*data.axes)
data.transform("2*x-y", "2*y") # note that axis[0] is now 2D
print(*data.axes)
data.transform("x-2", "y") # constant 2 is interpreted in units of `data.x.units`
print(*data.axes)
We've gone over many of the common array manipulations that make numpy so powerful, and shown how similar manipulations can be done within the WrightTools framework. Hopefully these examples will empower you explore your datasets with ease!
Need more examples? Are there some numpy manipulations you are still curious about? Give us feedback by making an issue at our GitHub page!