Arrays can be generated in a number of ways, depending on their properties and the application they are used for. In many situations it is necessary to generate arrays with elements that follow some given rule, such as filled with constant values, increasing integers, uniformly spaced numbers, random numbers, etc. In other cases we might need to create arrays from data stored in a file.
The requirements are many and varied, and the NumPy library provides a comprehensive set of functions for generating arrays of various types. A summary of frequently used array-generating functions is given in the following table:
Function Name | Type of Array |
---|---|
np.array | Creates an array for which the elements are given by an array-like object, which, for example, can be a (nested) Python list, a tuple, an iterable sequence, or another ndarray instance. |
np.zeros | Creates an array with the specified dimensions and data type that is filled with zeros. |
np.ones | Creates an array with the specified dimensions and data type that is filled with ones. |
np.diag | Creates a diagonal array with specified values along the diagonal and zeros elsewhere. |
np.arange | Creates an array with evenly spaced values between the specified start, end, and increment values. |
np.linspace | Creates an array with evenly spaced values between the specified start and end values, using a specified number of elements. |
np.logspace | Creates an array with values that are logarithmically spaced between the given start and end values. |
np.meshgrid | Generates coordinate matrices (and higher-dimensional coordinate arrays) from one-dimensional coordinate vectors. |
np.fromfunction | Creates an array and fills it with values specified by a given function, which is evaluated for each combination of indices for the given array size. |
np.fromfile | Creates an array with the data from a binary (or text) file. NumPy arrays also provide a corresponding method, ndarray.tofile, with which an array can be stored to disk and later read back using np.fromfile. |
np.genfromtxt, np.loadtxt | Create an array from data read from a text file, for example, a comma-separated values (CSV) file. The function np.genfromtxt also supports data files with missing values. |
np.random.rand | Generates an array with random numbers that are uniformly distributed between 0 and 1. Other types of distributions are also available in the np.random module. |
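
As a brief illustration of one table entry that is not revisited in the text, the following sketch shows how np.fromfunction fills an array from its indices. The helper name index_sum is hypothetical and chosen only for this example; the function is called once with whole index arrays, so it should be written with vectorized operations.

```python
import numpy as np

def index_sum(i, j):
    # Hypothetical helper: i and j arrive as index arrays, not scalars,
    # so the addition is evaluated for every (row, column) pair at once.
    return i + j

# dtype=int makes the index arrays (and hence the result) integer-valued
table = np.fromfunction(index_sum, (3, 3), dtype=int)
print(table)
```

Each element of the result equals the sum of its row and column index.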
Using the np.array function, NumPy arrays can be constructed from explicit Python lists, iterable expressions, and other array-like objects (such as other ndarray instances). For example, to create a one-dimensional array from a Python list, we simply pass the Python list as an argument to the np.array function:
# Importing the NumPy library
import numpy as np
data = np.array([1, 2, 3, 4]); data
array([1, 2, 3, 4])
data.ndim
1
data.shape
(4,)
To create a two-dimensional array with the same data as in the previous example, we can use a nested Python list:
data = np.array([[1, 2], [3, 4]]); data
array([[1, 2], [3, 4]])
data.ndim
2
data.shape
(2, 2)
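The other array-like inputs mentioned above can be sketched in the same way. The variable names below are chosen for illustration only. Note that np.array copies its input by default, so the new array is independent of the one it was built from.

```python
import numpy as np

from_tuple = np.array((1, 2, 3))              # a tuple works like a list
from_range = np.array(range(4))               # so does a range sequence
from_array = np.array(from_tuple)             # another ndarray is copied by default
as_float = np.array([1, 2, 3], dtype=float)   # dtype can be given explicitly

from_array[0] = 99      # changing the copy leaves the original untouched
print(from_tuple)       # [1 2 3]
print(as_float.dtype)   # float64
```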
The functions np.zeros and np.ones create and return arrays filled with zeros and ones, respectively. They take, as their first argument, an integer or a tuple that describes the number of elements along each dimension of the array. For example, to create a $2 \times 3$ array filled with zeros, and an array of length 4 filled with ones, we can use:
np.zeros((2, 3))
array([[0., 0., 0.], [0., 0., 0.]])
np.ones(4)
array([1., 1., 1., 1.])
Like other array-generating functions, the np.zeros and np.ones functions also accept an optional keyword argument that specifies the data type for the elements in the array. By default, the data type is float64, and it can be changed to the required type by explicitly specifying the dtype argument:
data = np.ones(4)
data.dtype
dtype('float64')
data = np.ones(4, dtype=np.int64)
data.dtype
dtype('int64')
An array filled with an arbitrary constant value can be generated by first creating an array filled with ones and then multiplying the array by the desired fill value. However, NumPy also provides the function np.full, which does this in one step. The following two ways of constructing an array with ten elements, each initialized to the value 5.4, produce the same result, but using np.full is slightly more efficient since it avoids the multiplication.
x1 = 5.4 * np.ones(10)
x2 = np.full(10, 5.4)
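A quick check of this equivalence, as a minimal sketch: the two arrays compare equal element by element. The last lines show one detail worth knowing: when no dtype is given, np.full takes its data type from the fill value, so an integer fill value gives an integer array.

```python
import numpy as np

x1 = 5.4 * np.ones(10)
x2 = np.full(10, 5.4)
print(np.array_equal(x1, x2))   # True
print(x2.dtype)                 # float64

# An integer fill value yields an integer array unless dtype is specified
x3 = np.full((2, 3), 7)
print(np.issubdtype(x3.dtype, np.integer))   # True
```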
An already created array can also be filled with a constant value using the fill method of the ndarray object, which takes a value as its argument and sets all elements in the array to that value. The following two methods of creating an array therefore give the same result:
x1 = np.empty(5)
x1.fill(3.0)
x1
array([3., 3., 3., 3., 3.])
x2 = np.full(5, 3.0)
x2
array([3., 3., 3., 3., 3.])
In this last example, we also used the np.empty function, which generates an array of the given size with uninitialized values. This function should only be used when the initialization of all elements can be guaranteed by other means, such as an explicit loop over the array elements or another explicit assignment.
In numerical computing it is very common to require arrays with evenly spaced values between a starting value and ending value. NumPy provides two similar functions to create such arrays: np.arange and np.linspace. Both functions take three arguments, where the first two arguments are the start and end values. The third argument of np.arange is the increment, while for np.linspace it is the total number of points in the array.
For example, to generate arrays with values between 0 and 10, with increment 1, we could use either of the following:
np.arange(0, 10, 1)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.linspace(0, 10, 11)
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
However, note that np.arange does not include the end value (10), while by default np.linspace does (although this behavior can be changed using the optional endpoint keyword argument). Whether to use np.arange or np.linspace is mostly a matter of personal preference, but it is generally recommended to use np.linspace whenever the increment is a noninteger.
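The endpoint keyword mentioned above can be sketched as follows. With endpoint=False, np.linspace covers the same half-open interval as np.arange. The retstep keyword, also part of np.linspace, additionally returns the spacing that was used, which is convenient when the increment is a noninteger.

```python
import numpy as np

# endpoint=False drops the end value, mirroring np.arange's half-open interval
a = np.arange(0, 10, 1)
b = np.linspace(0, 10, 10, endpoint=False)
print(np.allclose(a, b))   # True

# retstep=True returns the array together with the computed spacing
grid, step = np.linspace(0.0, 1.0, 5, retstep=True)
print(grid)   # [0.   0.25 0.5  0.75 1.  ]
print(step)   # 0.25
```

Because np.linspace fixes the number of points rather than the increment, the length of its result does not depend on floating-point rounding of the step.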
The function np.logspace is similar to np.linspace, but the increments between the elements in the array are logarithmically distributed, and the first two arguments, for the start and end values, are the powers of the optional base keyword argument (which defaults to 10).
For example, to generate an array with logarithmically distributed values between 1 and 100, we can use:
np.logspace(0, 2, 5) # 5 data points between 10**0=1 to 10**2=100
array([ 1. , 3.16227766, 10. , 31.6227766 , 100. ])
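To make the role of the base argument concrete, the following sketch generates powers of 2 and checks that np.logspace agrees with raising the base to a linearly spaced set of exponents.

```python
import numpy as np

# Exponents 0..4 with base 2 give 2**0, 2**1, ..., 2**4
powers_of_two = np.logspace(0, 4, 5, base=2)
print(powers_of_two)   # [ 1.  2.  4.  8. 16.]

# logspace(start, stop, n) corresponds to base ** linspace(start, stop, n)
same = 10 ** np.linspace(0, 2, 5)
print(np.allclose(same, np.logspace(0, 2, 5)))   # True
```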
Multidimensional coordinate grids can be generated using the function np.meshgrid. Given two one-dimensional coordinate arrays (i.e., arrays containing a set of coordinates along a given dimension), we can generate two-dimensional coordinate arrays using the np.meshgrid function.
An illustration of this is given in the following example:
x = np.array([-1, 0, 1])
y = np.array([-2, 0, 2])
X, Y = np.meshgrid(x, y)
X
array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
Y
array([[-2, -2, -2], [ 0, 0, 0], [ 2, 2, 2]])
A common use-case of two-dimensional coordinate arrays, like $X$ and $Y$ in this example, is to evaluate functions over two variables $x$ and $y$. This can be used when plotting functions over two variables, such as colormap plots and contour plots.
For example, to evaluate the expression $(X + Y)^{2}$ at all combinations of values from the $x$ and $y$ arrays in the preceding example, we can use the two-dimensional coordinate arrays $X$ and $Y$:
Z = (X + Y) ** 2
Z
array([[9, 4, 1], [1, 0, 1], [1, 4, 9]], dtype=int32)
It is also possible to generate higher-dimensional coordinate arrays by passing more arrays as arguments to the np.meshgrid function. Alternatively, the functions np.mgrid and np.ogrid can also be used to generate coordinate arrays, using a slightly different syntax based on indexing and slice objects.
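As a minimal sketch of the slice-based syntax, the grids from the earlier example can be rebuilt with np.mgrid and np.ogrid. One assumption to keep in mind: these two use matrix-style index ordering, so their output corresponds to np.meshgrid called with indexing="ij", which is the transpose of the default ordering shown above.

```python
import numpy as np

x = np.array([-1, 0, 1])
y = np.array([-2, 0, 2])
X, Y = np.meshgrid(x, y)   # default "xy" ordering, as in the earlier example

# np.mgrid takes start:stop:step slices; the first output varies along axis 0
Xm, Ym = np.mgrid[-1:2:1, -2:3:2]
Xi, Yi = np.meshgrid(x, y, indexing="ij")
print(np.array_equal(Xm, Xi), np.array_equal(Ym, Yi))   # True True

# np.ogrid returns "open" grids that rely on broadcasting when combined
xo, yo = np.ogrid[-1:2:1, -2:3:2]
print(xo.shape, yo.shape)   # (3, 1) (1, 3)

Z_open = (xo + yo) ** 2     # broadcasts to a full 3x3 array
print(np.array_equal(Z_open, ((X + Y) ** 2).T))   # True
```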
To create an array of a specific size and data type, but without initializing the elements in the array to any particular values, we can use the function np.empty. The advantage of using this function instead of, for example, np.zeros, which creates an array initialized with zero-valued elements, is that we can avoid the initialization step. If all elements are guaranteed to be initialized later in the code, this can save a little bit of time, especially when working with large arrays.
To illustrate the use of the np.empty function, consider the following example:
np.empty(3, dtype=float)
array([8.04892414e-312, 0.00000000e+000, 4.22795269e-307])
Here we generated a new array with three elements of type float. There is no guarantee that the elements have any particular values, and the actual values will vary from time to time. For this reason it is important that all values are explicitly assigned before the array is used; otherwise unpredictable errors are likely to arise. Often the np.zeros function is a safer alternative to np.empty, and if the performance gain is not essential, it is better to use np.zeros, to minimize the likelihood of subtle and hard-to-reproduce bugs due to uninitialized values in the array returned by np.empty.
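The safe pattern for np.empty can be sketched as follows: allocate the array, then assign every element before reading any of them. The variable names are illustrative only.

```python
import numpy as np

n = 5
squares = np.empty(n, dtype=np.float64)   # contents are arbitrary at this point

# Every element is assigned in the loop, so no uninitialized value survives
for i in range(n):
    squares[i] = i ** 2

print(squares)   # [ 0.  1.  4.  9. 16.]
```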
It is often necessary to create new arrays that share properties, such as shape and data type, with another array. NumPy provides a family of functions for this purpose: np.ones_like, np.zeros_like, np.full_like, and np.empty_like. A typical use-case is a function that takes arrays of unspecified type and size as arguments and requires working arrays of the same size and type.
A boilerplate example of this situation is given in the following function:
def f(x):
    y = np.ones_like(x)
    # compute with x and y
    return y
At the first line of the body of this function, a new array $y$ is created using np.ones_like, which results in an array of the same size and data type as $x$, filled with ones.
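The other members of this family behave analogously. The sketch below, with illustrative variable names, shows that shape and data type are inherited from the template array, that dtype can be overridden, and that np.full_like casts the fill value to the template's data type.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)

zeros = np.zeros_like(x)          # same shape and dtype as x, filled with zeros
filled = np.full_like(x, 2.5)     # same shape and dtype, filled with 2.5
scratch = np.empty_like(x)        # same shape and dtype, uninitialized
mask = np.zeros_like(x, dtype=bool)   # dtype can be overridden

print(zeros.shape, zeros.dtype)   # (2, 3) float32

# The fill value is cast to the template's dtype: 2.5 becomes 2 for integers
ints = np.full_like(np.array([1, 2, 3]), 2.5)
print(ints)   # [2 2 2]
```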
### Creating Matrix Arrays
Matrices, or two-dimensional arrays, are an important case for numerical computing. NumPy provides functions for generating commonly used matrices. In particular, the function np.identity generates a square matrix with ones on the diagonal and zeros elsewhere:
np.identity(4)
array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]])
The similar function np.eye generates matrices with ones on a diagonal that can optionally be offset from the main diagonal using the keyword argument k. This is illustrated in the following example, which produces matrices with nonzero diagonals above and below the main diagonal, respectively:
np.eye(3, k=1)
array([[0., 1., 0.], [0., 0., 1.], [0., 0., 0.]])
np.eye(3, k=-1)
array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
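Two further properties of np.eye, sketched briefly: without an offset it reproduces np.identity, and unlike np.identity it also accepts a second size argument for rectangular matrices.

```python
import numpy as np

# With no offset, np.eye matches np.identity
print(np.array_equal(np.eye(4), np.identity(4)))   # True

# A second positional argument gives the number of columns
rect = np.eye(2, 3)
print(rect.shape)   # (2, 3)

# Offsets can be combined, here into a matrix with ones next to the diagonal
band = np.eye(3, k=1) + np.eye(3, k=-1)
print(band)
```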
To construct a matrix with an arbitrary one-dimensional array on the diagonal, we can use the np.diag function (which also takes the optional keyword argument k to specify an offset from the diagonal), as demonstrated here:
np.diag(np.arange(0, 20, 5))
array([[ 0, 0, 0, 0], [ 0, 5, 0, 0], [ 0, 0, 10, 0], [ 0, 0, 0, 15]])
Here we gave a third argument to the np.arange function, which specifies the step size in the enumeration of elements in the array returned by the function. The resulting array therefore contains the values [0, 5, 10, 15], which are inserted on the diagonal of a two-dimensional matrix by the np.diag function.
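The optional k argument of np.diag can be sketched in the same way. Placing n values on an offset diagonal produces a larger square matrix, and passing a two-dimensional array to np.diag extracts a diagonal instead of building one.

```python
import numpy as np

# Three values on the first superdiagonal require a 4x4 matrix
upper = np.diag([1, 2, 3], k=1)
print(upper.shape)   # (4, 4)
print(upper)

# Given a 2-D array, np.diag returns its diagonal as a 1-D array
m = np.diag(np.arange(0, 20, 5))
print(np.diag(m))    # [ 0  5 10 15]
```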