# The Numeric Python Library (numpy)¶

This is a beginner guide looking at one of the most common Python libraries. It can be viewed in the browser; however, the interactive Python Notebook file can also be opened in JupyterLab. The Numeric Python Library (abbreviated using the lower case numpy) is used for numeric operations.

## Prerequisites¶

Before looking at any Python library it is recommended to familiarise yourself with the Python programming language and object oriented programming concepts. Details about these, in addition to Python installation instructions, are available in my prerequisite guide below.

Object Oriented Programming

## Practical Applications of Arrays¶

Arrays are often taught in mathematics classes with little discussion of their practical applications and may therefore seem esoteric when first encountered. You are actually surrounded by data arrays daily, and while reading this tutorial on a screen you are interfacing with a 4D data array.

### Scalars (0D Arrays)¶

A scalar is just a single number which you are used to working with when performing a simple calculation.

In [1]:
x=0.1
y=0.2
x+y

Out[1]:
0.30000000000000004

A scalar float can be used to set the intensity level on a white LED where 0 corresponds to the LED being off and 1 is a level of maximum brightness suitable for the eye. Intermediate values can be set using a float between 0 and 1, which can be visualised using a dimmer switch. We can think of a single black and white pixel as having a miniaturized version of this mechanism.
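
In practice the hardware level is often an integer (for example an 8-bit value from 0 to 255) rather than a float. The conversion below is my own sketch to illustrate the 0 to 1 mapping; the function names are hypothetical, not part of any library.

```python
# Sketch: convert between an 8-bit LED level (0-255) and a
# normalised intensity float (0.0 = off, 1.0 = full brightness).
def level_to_intensity(level):
    return level / 255

def intensity_to_level(intensity):
    return round(intensity * 255)

print(level_to_intensity(128))     # roughly 0.5
print(intensity_to_level(1.0))     # 255
```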

### Vectors (1D Arrays)¶

A vector is a series of numbers. Vectors are commonly used to store a series of linked values. For example, suppose a velocity measurement is carried out at times 0, 10, 15, 20, 22.5 and 30, giving velocities of 0, 227.04, 362.78, 517.35, 602.97 and 901.67 respectively. The data may be stored as two lists.

In [2]:
t=[0,10,15,20,22.5,30]
v=[0,227.04,362.78,517.35,602.97,901.67]


These equally sized vectors can then be fed into plotting programs, for example matplotlib.pyplot which I will discuss in a later guide, and graphed to visually represent the dataset.
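
Because these are plain Python lists, they can already be processed with a list comprehension. As a quick sketch (the finite-difference calculation is my own addition, not part of the original dataset), the average acceleration over each interval is the change in velocity divided by the change in time:

```python
t = [0, 10, 15, 20, 22.5, 30]
v = [0, 227.04, 362.78, 517.35, 602.97, 901.67]

# Average acceleration over each of the 5 intervals: delta-v / delta-t.
accel = [(v[i + 1] - v[i]) / (t[i + 1] - t[i]) for i in range(len(t) - 1)]
print(accel)
```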

Another purpose of a list is the depiction of a color.

Visible light is defined as electromagnetic radiation that can be detected by the human eye, which can see in the wavelength range of 390-750 nm. The human eye has three types of detectors: S-cones, M-cones and L-cones, for short, medium and long wavelength detection respectively. The average response of the three detectors in a human eye is shown below, and this is of course the origin of the three "primary colors".

Instead of a single white LED, three colored LEDs may be used: one red, one green and one blue. These LEDs can be designed so that their output overlaps spatially.

If we examine the overlapping region between the red and the green LEDs, both the M-cones and L-cones of the eye will detect light while the S-cones won’t. The brain translates this ratio as the color "yellow", i.e. what the brain receives from the eyes is a color ratio, and the brain maps this to a color which we "see".

Each LED can be thought of as having its own independent dimmer switch. This means each LED's brightness can be varied from 0, where the LED is off, to 1, a level of maximum brightness suitable for the eye. Any color can be represented by [R,G,B] where R, G and B are floats between 0 and 1.

The three primary colors occur when the respective LED intensity is set to 1 and the other 2 LEDs are set to 0.

In [3]:
red=[1,0,0]
green=[0,1,0]
blue=[0,0,1]


The three secondary colors occur when 2 of the LEDs have a value of 1 and the third is set to 0, as seen in the overlapping regions of the diagram above.

In [4]:
cyan=[0,1,1]
yellow=[1,1,0]
magenta=[1,0,1]


White is produced when all LEDs are at 1, as seen in the centre of the diagram above.

In [5]:
white=[1,1,1]


Black is the absence of color i.e. all the LEDs are at 0.

In [6]:
black=[0,0,0]


The three LEDs depicted can be miniaturized to make a single colored pixel.

### Matrices (2D Arrays)¶

We can construct a matrix using a list of lists. We will construct a very simple matrix that has 3 rows and 3 columns. Each row is constructed individually as a list, and the matrix is then created as a list of the rows.

Note that each row has the same number of elements.

In [7]:
row0=[0.0,0.1,0.2]
row1=[0.4,0.5,0.6]
row2=[0.8,0.9,1.0]

image=[row0,row1,row2]


Looking at the output.

In [8]:
image

Out[8]:
[[0.0, 0.1, 0.2], [0.4, 0.5, 0.6], [0.8, 0.9, 1.0]]

We can see the outermost [] is for the outer list and there is a nested [] for each individual row. Sometimes it is easier to construct the matrix over multiple lines, which makes it easier to visualize.

In [9]:
image=[[0.0, 0.1, 0.2],
       [0.4, 0.5, 0.6],
       [0.8, 0.9, 1.0]]

In [10]:
image

Out[10]:
[[0.0, 0.1, 0.2], [0.4, 0.5, 0.6], [0.8, 0.9, 1.0]]

Earlier we discussed a white LED with a dimmer switch ranging from 0 (off) to 1 (a maximum intensity suitable for the eye). In a black and white image, each pixel can be thought of as an independently controlled LED with its own dimmer switch. The 3 by 3 matrix above can be visualized below. Here we can see how each float corresponds to an intensity level; the third axis, the z-scale, denotes the brightness, which in this case is normalised between 0 (black) and 1 (white).

The black and white picture above has 3 rows by 3 columns, giving 9 pixels. A black and white picture taken with a camera typically has many more pixels, with a 1 MP camera having roughly a million pixels (for example 1024 rows by 1024 columns).

We can print this image onto a piece of paper using a so called "dot matrix printer". Newer printer technologies exist, but we will use the term dot matrix as it describes exactly what a printer does. The dot matrix printer essentially treats the piece of paper as a grid and prints a dot of ink at each point in the grid. If the pixel is to be white, no ink is added (0) and if the pixel is black (1), the maximum amount of ink is added. If the shade is intermediate, an intermediate level of ink is added (a float between 0 and 1). Each dot can therefore be translated to a number representing the level of ink at that dot. A common printing resolution is 600 dpi, meaning there are 600 dots per inch on the piece of paper. A piece of A4 paper (11.69 by 8.27 inches) is then merely treated as a matrix that has 7014 rows and 4962 columns. The product of these two dimensions gives 34,803,468 dots, which is abbreviated to ~35 million or 35 megapixels.
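
The dot counts quoted above can be checked with a few lines of arithmetic. This is my own sketch; the A4 dimensions in inches are the same rounded values used in the text.

```python
# A4 paper is roughly 11.69 by 8.27 inches; at 600 dots per inch each
# dimension becomes a dot count.
dpi = 600
rows = round(11.69 * dpi)   # 7014
cols = round(8.27 * dpi)    # 4962
dots = rows * cols          # 34803468, ~35 megapixels
print(rows, cols, dots)
```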

### 3D Arrays¶

A black and white book is an example of a 3D array. In a book, each page (a 2D array) has the same number of rows and columns, as indicated by the gridlines in Microsoft Word.

We have seen that a page can be thought of as a list of equally sized lists. A book can be thought of as a list of equally sized pages; or, more directly, a book is a list of equally sized lists of equally sized lists.

In [11]:
page0row0=[0.0,1.0,0.0]
page0row1=[0.0,0.0,0.0]
page0row2=[0.0,1.0,0.0]

page0=[page0row0,page0row1,page0row2]

page1row0=[0.0,0.0,0.0]
page1row1=[1.0,0.0,1.0]
page1row2=[0.0,0.0,0.0]

page1=[page1row0,page1row1,page1row2]

page2row0=[1.0,1.0,1.0]
page2row1=[1.0,1.0,1.0]
page2row2=[1.0,0.0,1.0]

page2=[page2row0,page2row1,page2row2]

book=[page0,page1,page2]

In [12]:
book

Out[12]:
[[[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
[[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0]]]

The above output may initially look confusing; for convenience it can also be split over multiple lines.

In [13]:
[[[0.0, 1.0, 0.0],
  [0.0, 0.0, 0.0],
  [0.0, 1.0, 0.0]],

 [[0.0, 0.0, 0.0],
  [1.0, 0.0, 1.0],
  [0.0, 0.0, 0.0]],

 [[1.0, 1.0, 1.0],
  [1.0, 1.0, 1.0],
  [1.0, 0.0, 1.0]]]

Out[13]:
[[[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
[[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0]]]

For the white LED we know a value of 1.0 is white and a value of 0.0 is black. For this simple book there are 3 pages, and each page has 3 rows by 3 columns. The pages displayed side by side spell out "HI."
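
To make the letters easier to see without a plot, we can sketch a tiny text renderer (my own addition, not part of the guide) that maps each value to a character: 0.0 (black ink) becomes '#' and 1.0 (white paper) becomes '.'.

```python
book = [[[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
        [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0]]]

def render(page):
    # Map 0.0 (black) to '#' and anything else (white) to '.'.
    return '\n'.join(''.join('#' if v == 0.0 else '.' for v in row)
                     for row in page)

for page in book:
    print(render(page))
    print()
```

Page 0 renders as an "H", page 1 as an "I" and page 2 as a period.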

We have seen earlier that a single black and white image is a 2D array. A colored image is therefore a 3D array. Behind the scenes the colored picture can be thought of as a book with 3 pages: a red page, a green page and a blue page. Earlier we discussed the origins of the primary, secondary, black and white colors. We can see how color mixing works by examining the constructed image.

In [14]:
redrow0=[1.0,0.0,0.0]
redrow1=[1.0,0.0,1.0]
redrow2=[0.0,0.5,1.0]

redm=[redrow0,redrow1,redrow2]

greenrow0=[0.0,1.0,0.0]
greenrow1=[1.0,1.0,0.0]
greenrow2=[0.0,0.5,1.0]

greenm=[greenrow0,greenrow1,greenrow2]

bluerow0=[0.0,0.0,1.0]
bluerow1=[0.0,1.0,1.0]
bluerow2=[0.0,0.5,1.0]

bluem=[bluerow0,bluerow1,bluerow2]

colorimage=[redm,greenm,bluem]


The color image (top left) is the result of color mixing the red matrix (top right), the green matrix (bottom left) and the blue matrix (bottom right).

In [15]:
colorimage

Out[15]:
[[[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.5, 1.0]],
[[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 1.0]],
[[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.5, 1.0]]]

The above output may initially look confusing; for convenience it can also be split over multiple lines.

In [16]:
[[[1.0, 0.0, 0.0],
  [1.0, 0.0, 1.0],
  [0.0, 0.5, 1.0]],

 [[0.0, 1.0, 0.0],
  [1.0, 1.0, 0.0],
  [0.0, 0.5, 1.0]],

 [[0.0, 0.0, 1.0],
  [0.0, 1.0, 1.0],
  [0.0, 0.5, 1.0]]]

Out[16]:
[[[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.5, 1.0]],
[[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 1.0]],
[[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.5, 1.0]]]

The image taken from the camera was originally a color image. It can likewise be split into its red, green and blue components.
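
Since colorimage is just a list of the three color matrices, the channels can be recovered by unpacking it. The helper function below is a hypothetical convenience of my own; it reads off the [R,G,B] triple of a single pixel.

```python
colorimage = [[[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.5, 1.0]],
              [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 1.0]],
              [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.5, 1.0]]]

def pixel_color(image, row, col):
    # Unpack the red, green and blue pages and read one value from each.
    red, green, blue = image
    return [red[row][col], green[row][col], blue[row][col]]

print(pixel_color(colorimage, 0, 1))  # [0.0, 1.0, 0.0] i.e. pure green
```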

### 4D Arrays¶

A colored book is an example of a 4D array.

In [17]:
frame0redrow0=[1.0,0.0,1.0]
frame0redrow1=[1.0,1.0,1.0]
frame0redrow2=[1.0,0.0,1.0]

frame0redm=[frame0redrow0,frame0redrow1,frame0redrow2]

frame0bluerow0=[0.0,0.0,0.0]
frame0bluerow1=[0.0,0.0,0.0]
frame0bluerow2=[0.0,0.0,0.0]

frame0bluem=[frame0bluerow0,frame0bluerow1,frame0bluerow2]

frame0greenrow0=[0.0,0.0,0.0]
frame0greenrow1=[0.0,0.0,0.0]
frame0greenrow2=[0.0,0.0,0.0]

frame0greenm=[frame0greenrow0,frame0greenrow1,frame0greenrow2]

frame0=[frame0redm,frame0greenm,frame0bluem]

frame1redrow0=[0.0,0.0,0.0]
frame1redrow1=[0.0,0.0,0.0]
frame1redrow2=[0.0,0.0,0.0]

frame1redm=[frame1redrow0,frame1redrow1,frame1redrow2]

frame1bluerow0=[1.0,1.0,1.0]
frame1bluerow1=[0.0,1.0,0.0]
frame1bluerow2=[1.0,1.0,1.0]

frame1bluem=[frame1bluerow0,frame1bluerow1,frame1bluerow2]

frame1greenrow0=[0.0,0.0,0.0]
frame1greenrow1=[0.0,0.0,0.0]
frame1greenrow2=[0.0,0.0,0.0]

frame1greenm=[frame1greenrow0,frame1greenrow1,frame1greenrow2]

frame1=[frame1redm,frame1greenm,frame1bluem]

frame2redrow0=[0.0,0.0,0.0]
frame2redrow1=[0.0,0.0,0.0]
frame2redrow2=[0.0,0.0,0.0]

frame2redm=[frame2redrow0,frame2redrow1,frame2redrow2]

frame2bluerow0=[0.0,0.0,0.0]
frame2bluerow1=[0.0,0.0,0.0]
frame2bluerow2=[0.0,0.0,0.0]

frame2bluem=[frame2bluerow0,frame2bluerow1,frame2bluerow2]

frame2greenrow0=[0.0,0.0,0.0]
frame2greenrow1=[0.0,0.0,0.0]
frame2greenrow2=[0.0,1.0,0.0]

frame2greenm=[frame2greenrow0,frame2greenrow1,frame2greenrow2]

frame2=[frame2redm,frame2greenm,frame2bluem]

colorbook=[frame0,frame1,frame2]


We can examine the output.

In [18]:
colorbook

Out[18]:
[[[[1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]],
[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[1.0, 1.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]],
[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]]

Or we can write it over multiple lines so it is easier to visualize.

In [19]:
[[[[1.0, 0.0, 1.0],
   [1.0, 1.0, 1.0],
   [1.0, 0.0, 1.0]],

  [[0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0]],

  [[0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0]]],

 [[[0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0]],

  [[0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0]],

  [[1.0, 1.0, 1.0],
   [0.0, 1.0, 0.0],
   [1.0, 1.0, 1.0]]],

 [[[0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0]],

  [[0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0],
   [0.0, 1.0, 0.0]],

  [[0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0],
   [0.0, 0.0, 0.0]]]]

Out[19]:
[[[[1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]],
[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[1.0, 1.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]],
[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]]

When typing in such a long expression, ensure you match the brackets. Examining the bottom right, the currently selected bracket will be highlighted in green, and its corresponding opening half will also be highlighted in green. I have highlighted each set of brackets and emphasized it in red in the animated gif below.

A video is essentially a colored book with a fixed frame rate. We can view the above as an animated gif with a 1 fps frame rate (each frame lasts 1 s).

A screen has multiple pixels and a color image on a screen is an example of a 3D array. Screens are commonly found in most electronic devices: laptops, phones, cameras etc. Screen resolution is usually defined in terms of pixels; in the case of the Dell XPS 13 this is 3200×1600 pixels, which means 1600 rows and 3200 columns. A common refresh rate in computers is 60 frames per second. This means our computer flicks through 60 3D arrays every second when playing a color video, which is a 4D array.
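
To get a feel for the size of such a 4D array, a quick back-of-envelope calculation (my own sketch, assuming one value per color channel per pixel):

```python
rows, cols = 1600, 3200   # Dell XPS 13 screen resolution
channels = 3              # red, green and blue
fps = 60                  # refresh rate in frames per second

# Number of individual intensity values displayed every second.
values_per_second = rows * cols * channels * fps
print(values_per_second)  # 921600000
```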

## Python lists¶

Let's have a look at some of the common datatypes present within the Python programming language. First let's look at a str, which is an abbreviation for a string of characters.

In [20]:
a='apple'

In [21]:
type(a)

Out[21]:
str

The object has a number of attributes and methods which can be accessed by typing in the object's name followed by a dot . and tab ↹. Unsurprisingly for a str, most of these are optimised towards text operations.

We can instantiate another str more directly by calling the str class itself.

In [22]:
b=str('banana')


Recall that operators are defined using a class's special methods. For a str the + operator performs concatenation.

In [23]:
a+b

Out[23]:
'applebanana'

Now let's have a look at a float, an abbreviation for a floating point number.

In [24]:
c=0.4

In [25]:
type(c)

Out[25]:
float

The object has a number of attributes and methods which can be accessed by typing in the object's name followed by a dot . and tab ↹. Unsurprisingly for a float, most of these are optimised towards numeric data.

We can instantiate another float more directly by calling the float class itself.

In [26]:
d=float(0.1)


Recall that operators are defined using a class's special methods. For a float the + operator performs addition.

In [27]:
c+d

Out[27]:
0.5

Now let's have a look at a list.

In [28]:
e=[0.1,0.2,0.3,0.4]

In [29]:
type(e)

Out[29]:
list

It has a number of attributes and methods available to it and most of these are optimised towards list operations.

We can instantiate another list more directly by calling the list class itself.

In [30]:
f=list([0.3,0.3,0.3,0.3])


Recall that operators are defined using a class's special methods. For a list the + operator performs concatenation.

In [31]:
e+f

Out[31]:
[0.1, 0.2, 0.3, 0.4, 0.3, 0.3, 0.3, 0.3]

To perform elementwise addition we would instead need to use something like a for loop, which is possible but a bit convoluted.

In [32]:
def sumlists(list1,list2):
    summed_list=[]
    for idx,val in enumerate(list1):
        summed_val=list1[idx]+list2[idx]
        summed_list.append(summed_val)
    return summed_list

In [33]:
sumlists(e,f)

Out[33]:
[0.4, 0.5, 0.6, 0.7]
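
For comparison, the same elementwise addition can be written more concisely with the built-in zip function. This is a sketch of an alternative, not part of the original guide.

```python
e = [0.1, 0.2, 0.3, 0.4]
f = [0.3, 0.3, 0.3, 0.3]

# zip pairs up corresponding elements, so each pair can be summed.
summed = [x + y for x, y in zip(e, f)]
print(summed)  # [0.4, 0.5, 0.6, 0.7]
```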

Lists are a very flexible data structure and each index can store a different datatype. This flexibility is very useful in some applications but can be a hindrance when dealing with numeric data and can result in TypeErrors.

In [34]:
g=['a',0.1,0.1,0.1]


Calling sumlists(e,g) now raises a TypeError because the str 'a' cannot be added to the float 0.1.

sumlists(e,g)

## NumPy Arrays (ndarray)¶

We therefore want a data structure that is similar to a list but optimized for numeric data, i.e. a numeric array. The Numeric Python library is based around such a datatype, known as a numpy array.

### Importing the NumPy Library¶

To use the numpy library we need to import it. As numpy is the most commonly used numeric library for Python, it is usually imported using the 2 letter abbreviation np. Note everything is lower case in the import line.

In [35]:
import numpy as np


Once the numpy library is imported, a multitude of objects and functions can be accessed from it. To view these, type in np followed by a dot . and a tab ↹.

This is a large list and its contents can be viewed in a cell output using the dir function.

In [36]:
dir(np)

Out[36]:
['ALLOW_THREADS',
'AxisError',
'BUFSIZE',
'CLIP',
'ComplexWarning',
'DataSource',
'ERR_CALL',
'ERR_DEFAULT',
'ERR_IGNORE',
'ERR_LOG',
'ERR_PRINT',
'ERR_RAISE',
'ERR_WARN',
'FLOATING_POINT_SUPPORT',
'FPE_DIVIDEBYZERO',
'FPE_INVALID',
'FPE_OVERFLOW',
'FPE_UNDERFLOW',
'False_',
'Inf',
'Infinity',
'MAXDIMS',
'MAY_SHARE_BOUNDS',
'MAY_SHARE_EXACT',
'MachAr',
'ModuleDeprecationWarning',
'NAN',
'NINF',
'NZERO',
'NaN',
'PINF',
'PZERO',
'RAISE',
'RankWarning',
'SHIFT_DIVIDEBYZERO',
'SHIFT_INVALID',
'SHIFT_OVERFLOW',
'SHIFT_UNDERFLOW',
'ScalarType',
'Tester',
'TooHardError',
'True_',
'UFUNC_BUFSIZE_DEFAULT',
'UFUNC_PYVALS_NAME',
'VisibleDeprecationWarning',
'WRAP',
'_NoValue',
'_UFUNC_API',
'__NUMPY_SETUP__',
'__all__',
'__builtins__',
'__cached__',
'__config__',
'__dir__',
'__doc__',
'__file__',
'__getattr__',
'__git_revision__',
'__mkl_version__',
'__name__',
'__package__',
'__path__',
'__spec__',
'__version__',
'_distributor_init',
'_globals',
'_mat',
'_pytesttester',
'abs',
'absolute',
'alen',
'all',
'allclose',
'alltrue',
'amax',
'amin',
'angle',
'any',
'append',
'apply_along_axis',
'apply_over_axes',
'arange',
'arccos',
'arccosh',
'arcsin',
'arcsinh',
'arctan',
'arctan2',
'arctanh',
'argmax',
'argmin',
'argpartition',
'argsort',
'argwhere',
'around',
'array',
'array2string',
'array_equal',
'array_equiv',
'array_repr',
'array_split',
'array_str',
'asanyarray',
'asarray',
'asarray_chkfinite',
'ascontiguousarray',
'asfarray',
'asfortranarray',
'asmatrix',
'asscalar',
'atleast_1d',
'atleast_2d',
'atleast_3d',
'average',
'bartlett',
'base_repr',
'binary_repr',
'bincount',
'bitwise_and',
'bitwise_not',
'bitwise_or',
'bitwise_xor',
'blackman',
'block',
'bmat',
'bool',
'bool8',
'bool_',
'busday_count',
'busday_offset',
'busdaycalendar',
'byte',
'byte_bounds',
'bytes0',
'bytes_',
'c_',
'can_cast',
'cast',
'cbrt',
'cdouble',
'ceil',
'cfloat',
'char',
'character',
'chararray',
'choose',
'clip',
'clongdouble',
'clongfloat',
'column_stack',
'common_type',
'compare_chararrays',
'compat',
'complex',
'complex128',
'complex64',
'complex_',
'complexfloating',
'compress',
'concatenate',
'conj',
'conjugate',
'convolve',
'copy',
'copysign',
'copyto',
'core',
'corrcoef',
'correlate',
'cos',
'cosh',
'count_nonzero',
'cov',
'cross',
'csingle',
'ctypeslib',
'cumprod',
'cumproduct',
'cumsum',
'datetime64',
'datetime_as_string',
'datetime_data',
'degrees',
'delete',
'deprecate',
'deprecate_with_doc',
'diag',
'diag_indices',
'diag_indices_from',
'diagflat',
'diagonal',
'diff',
'digitize',
'disp',
'divide',
'divmod',
'dot',
'double',
'dsplit',
'dstack',
'dtype',
'e',
'ediff1d',
'einsum',
'einsum_path',
'emath',
'empty',
'empty_like',
'equal',
'errstate',
'euler_gamma',
'exp',
'exp2',
'expand_dims',
'expm1',
'extract',
'eye',
'fabs',
'fastCopyAndTranspose',
'fft',
'fill_diagonal',
'find_common_type',
'finfo',
'fix',
'flatiter',
'flatnonzero',
'flexible',
'flip',
'fliplr',
'flipud',
'float',
'float16',
'float32',
'float64',
'float_',
'float_power',
'floating',
'floor',
'floor_divide',
'fmax',
'fmin',
'fmod',
'format_float_positional',
'format_float_scientific',
'format_parser',
'frexp',
'frombuffer',
'fromfile',
'fromfunction',
'fromiter',
'frompyfunc',
'fromregex',
'fromstring',
'full',
'full_like',
'fv',
'gcd',
'generic',
'genfromtxt',
'geomspace',
'get_array_wrap',
'get_include',
'get_printoptions',
'getbufsize',
'geterr',
'geterrcall',
'geterrobj',
'greater',
'greater_equal',
'half',
'hamming',
'hanning',
'heaviside',
'histogram',
'histogram2d',
'histogram_bin_edges',
'histogramdd',
'hsplit',
'hstack',
'hypot',
'i0',
'identity',
'iinfo',
'imag',
'in1d',
'index_exp',
'indices',
'inexact',
'inf',
'info',
'infty',
'inner',
'insert',
'int',
'int0',
'int16',
'int32',
'int64',
'int8',
'int_',
'intc',
'integer',
'interp',
'intersect1d',
'intp',
'invert',
'ipmt',
'irr',
'is_busday',
'isclose',
'iscomplex',
'iscomplexobj',
'isfinite',
'isfortran',
'isin',
'isinf',
'isnan',
'isnat',
'isneginf',
'isposinf',
'isreal',
'isrealobj',
'isscalar',
'issctype',
'issubclass_',
'issubdtype',
'issubsctype',
'iterable',
'ix_',
'kaiser',
'kron',
'lcm',
'ldexp',
'left_shift',
'less',
'less_equal',
'lexsort',
'lib',
'linalg',
'linspace',
'little_endian',
'log',
'log10',
'log1p',
'log2',
'logical_and',
'logical_not',
'logical_or',
'logical_xor',
'logspace',
'long',
'longcomplex',
'longdouble',
'longfloat',
'longlong',
'lookfor',
'ma',
'mafromtxt',
'mat',
'math',
'matmul',
'matrix',
'matrixlib',
'max',
'maximum',
'maximum_sctype',
'may_share_memory',
'mean',
'median',
'memmap',
'meshgrid',
'mgrid',
'min',
'min_scalar_type',
'minimum',
'mintypecode',
'mirr',
'mkl',
'mod',
'modf',
'moveaxis',
'msort',
'multiply',
'nan',
'nan_to_num',
'nanargmax',
'nanargmin',
'nancumprod',
'nancumsum',
'nanmax',
'nanmean',
'nanmedian',
'nanmin',
'nanpercentile',
'nanprod',
'nanquantile',
'nanstd',
'nansum',
'nanvar',
'nbytes',
'ndarray',
'ndenumerate',
'ndfromtxt',
'ndim',
'ndindex',
'nditer',
'negative',
'nested_iters',
'newaxis',
'nextafter',
'nonzero',
'not_equal',
'nper',
'npv',
'numarray',
'number',
'obj2sctype',
'object',
'object0',
'object_',
'ogrid',
'oldnumeric',
'ones',
'ones_like',
'os',
'outer',
'packbits',
'partition',
'percentile',
'pi',
'piecewise',
'place',
'pmt',
'poly',
'poly1d',
'polyder',
'polydiv',
'polyfit',
'polyint',
'polymul',
'polynomial',
'polysub',
'polyval',
'positive',
'power',
'ppmt',
'printoptions',
'prod',
'product',
'promote_types',
'ptp',
'put',
'put_along_axis',
'pv',
'quantile',
'r_',
'random',
'rate',
'ravel',
'ravel_multi_index',
'real',
'real_if_close',
'rec',
'recarray',
'recfromcsv',
'recfromtxt',
'reciprocal',
'record',
'remainder',
'repeat',
'require',
'reshape',
'resize',
'result_type',
'right_shift',
'rint',
'roll',
'rollaxis',
'roots',
'rot90',
'round',
'round_',
'row_stack',
's_',
'safe_eval',
'save',
'savetxt',
'savez',
'savez_compressed',
'sctype2char',
'sctypeDict',
'sctypeNA',
'sctypes',
'searchsorted',
'select',
'set_numeric_ops',
'set_printoptions',
'set_string_function',
'setbufsize',
'setdiff1d',
'seterr',
'seterrcall',
'seterrobj',
'setxor1d',
'shape',
'shares_memory',
'short',
'show_config',
'sign',
'signbit',
'signedinteger',
'sin',
'sinc',
'single',
'singlecomplex',
'sinh',
'size',
'sometrue',
'sort',
'sort_complex',
'source',
'spacing',
'split',
'sqrt',
'square',
'squeeze',
'stack',
'std',
'str',
'str0',
'str_',
'string_',
'subtract',
'sum',
'swapaxes',
'sys',
'take',
'take_along_axis',
'tan',
'tanh',
'tensordot',
'test',
'testing',
'tile',
'timedelta64',
'trace',
'tracemalloc_domain',
'transpose',
'trapz',
'tri',
'tril',
'tril_indices',
'tril_indices_from',
'trim_zeros',
'triu',
'triu_indices',
'triu_indices_from',
'true_divide',
'trunc',
'typeDict',
'typeNA',
'typecodes',
'typename',
'ubyte',
'ufunc',
'uint',
'uint0',
'uint16',
'uint32',
'uint64',
'uint8',
'uintc',
'uintp',
'ulonglong',
'unicode',
'unicode_',
'union1d',
'unique',
'unpackbits',
'unravel_index',
'unsignedinteger',
'unwrap',
'use_hugepage',
'ushort',
'vander',
'var',
'vdot',
'vectorize',
'version',
'void',
'void0',
'vsplit',
'vstack',
'warnings',
'where',
'who',
'zeros',
'zeros_like']
In [37]:
? np

Type:        module
String form: <module 'numpy' from 'C:\\Users\\Phili\\anaconda3\\lib\\site-packages\\numpy\\__init__.py'>
File:        c:\users\phili\anaconda3\lib\site-packages\numpy\__init__.py
Docstring:
NumPy
=====

Provides
1. An array object of arbitrary homogeneous items
2. Fast mathematical operations over arrays
3. Linear Algebra, Fourier Transforms, Random Number Generation

How to use the documentation
----------------------------
Documentation is available in two forms: docstrings provided
with the code, and a loose standing reference guide, available from
the NumPy homepage <https://www.scipy.org>_.

We recommend exploring the docstrings using
IPython <https://ipython.org>_, an advanced Python shell with
TAB-completion and introspection capabilities.  See below for further
instructions.

The docstring examples assume that numpy has been imported as np::

>>> import numpy as np

Code snippets are indicated by three greater-than signs::

>>> x = 42
>>> x = x + 1

Use the built-in help function to view a function's docstring::

>>> help(np.sort)
... # doctest: +SKIP

For some objects, np.info(obj) may provide additional help.  This is
particularly true if you see the line "Help on ufunc object:" at the top
of the help() page.  Ufuncs are implemented in C, not Python, for speed.
The native Python help() does not know how to view their help, but our
np.info() function does.

To search for documents containing a keyword, do::

>>> np.lookfor('keyword')
... # doctest: +SKIP

General-purpose documents like a glossary and help on the basic concepts
of numpy are available under the doc sub-module::

>>> from numpy import doc
>>> help(doc)
... # doctest: +SKIP

Available subpackages
---------------------
doc
Topical documentation on broadcasting, indexing, etc.
lib
Basic functions used by several sub-packages.
random
Core Random Tools
linalg
Core Linear Algebra Tools
fft
Core FFT routines
polynomial
Polynomial tools
testing
NumPy testing tools
f2py
Fortran to Python Interface Generator.
distutils
Enhancements to distutils with support for
Fortran compilers support and more.

Utilities
---------
test
Run numpy unittests
show_config
Show numpy build configuration
dual
Overwrite certain functions with high-performance Scipy tools
matlib
Make everything matrices.
__version__
NumPy version string

Viewing documentation using IPython
-----------------------------------
Start IPython with the NumPy profile (ipython -p numpy), which will
import numpy under the alias np.  Then, use the cpaste command to
paste examples into the shell.  To see which functions are available in
numpy, type np.<TAB> (where <TAB> refers to the TAB key), or use
np.*cos*?<ENTER> (where <ENTER> refers to the ENTER key) to narrow
down the list.  To view the docstring for a function, use
np.cos?<ENTER> (to view the docstring) and np.cos??<ENTER> (to view
the source code).

Copies vs. in-place operation
-----------------------------
Most of the functions in numpy return a copy of the array argument
(e.g., np.sort).  In-place versions of these functions are often
available as array methods, i.e. x = np.array([1,2,3]); x.sort().
Exceptions to this rule are documented.


The numpy library contains a number of useful mathematical constants as attributes, such as pi and e. There are also representations for infinity and not a number (nan).

In [38]:
np.pi

Out[38]:
3.141592653589793
In [39]:
np.e

Out[39]:
2.718281828459045
In [40]:
np.inf

Out[40]:
inf
In [41]:
np.nan

Out[41]:
nan
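
As a quick check of how these two representations behave (my own sketch; the comparisons follow standard IEEE floating point rules), inf compares larger than any finite float, while nan compares unequal to everything, including itself.

```python
import numpy as np

print(np.inf > 1e308)    # True: inf exceeds any finite float
print(np.nan == np.nan)  # False: nan is unequal even to itself
print(np.isnan(np.nan))  # True: use isnan to test for nan
```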

We can create a numpy array from an existing data structure such as an int, float, list or tuple by use of the numpy function array. To get details about the function we can type it in followed by shift ⇧ and tab ↹. We see that the positional input argument is object, corresponding to the original data structure. There are additional keyword input arguments such as dtype, which has a default value of None, meaning the dtype is inferred from the input data. The other keyword input arguments are for advanced applications, such as data imports from Fortran, and can be left at their defaults.

In [42]:
? np.array

Docstring:
array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0)

Create an array.

Parameters
----------
object : array_like
An array, any object exposing the array interface, an object whose
__array__ method returns an array, or any (nested) sequence.
dtype : data-type, optional
The desired data-type for the array.  If not given, then the type will
be determined as the minimum type required to hold the objects in the
sequence.
copy : bool, optional
If true (default), then the object is copied.  Otherwise, a copy will
only be made if __array__ returns a copy, if obj is a nested sequence,
or if a copy is needed to satisfy any of the other requirements
(dtype, order, etc.).
order : {'K', 'A', 'C', 'F'}, optional
Specify the memory layout of the array. If object is not an array, the
newly created array will be in C order (row major) unless 'F' is
specified, in which case it will be in Fortran order (column major).
If object is an array the following holds.

===== ========= ===================================================
order  no copy                     copy=True
===== ========= ===================================================
'K'   unchanged F & C order preserved, otherwise most similar order
'A'   unchanged F order if input is F and not C, otherwise C order
'C'   C order   C order
'F'   F order   F order
===== ========= ===================================================

When copy=False and a copy is made for other reasons, the result is
the same as if copy=True, with some exceptions for A, see the
Notes section. The default order is 'K'.
subok : bool, optional
If True, then sub-classes will be passed-through, otherwise
the returned array will be forced to be a base-class array (default).
ndmin : int, optional
Specifies the minimum number of dimensions that the resulting
array should have.  Ones will be pre-pended to the shape as
needed to meet this requirement.

Returns
-------
out : ndarray
An array object satisfying the specified requirements.

See Also
--------
empty_like : Return an empty array with shape and type of input.
ones_like : Return an array of ones with shape and type of input.
zeros_like : Return an array of zeros with shape and type of input.
full_like : Return a new array with shape of input filled with value.
empty : Return a new uninitialized array.
ones : Return a new array setting values to one.
zeros : Return a new array setting values to zero.
full : Return a new array of given shape filled with value.

Notes
-----
When order is 'A' and object is an array in neither 'C' nor 'F' order,
and a copy is forced by a change in dtype, then the order of the result is
not necessarily 'C' as expected. This is likely a bug.

Examples
--------
>>> np.array([1, 2, 3])
array([1, 2, 3])

Upcasting:

>>> np.array([1, 2, 3.0])
array([ 1.,  2.,  3.])

More than one dimension:

>>> np.array([[1, 2], [3, 4]])
array([[1, 2],
[3, 4]])

Minimum dimensions 2:

>>> np.array([1, 2, 3], ndmin=2)
array([[1, 2, 3]])

Type provided:

>>> np.array([1, 2, 3], dtype=complex)
array([ 1.+0.j,  2.+0.j,  3.+0.j])

Data-type consisting of more than one element:

>>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
>>> x['a']
array([1, 3])

Creating an array from sub-classes:

>>> np.array(np.mat('1 2; 3 4'))
array([[1, 2],
[3, 4]])

>>> np.array(np.mat('1 2; 3 4'), subok=True)
matrix([[1, 2],
[3, 4]])
Type:      builtin_function_or_method


Let's create a list l1.

In [43]:
l1=[1,2,3,4]


If we attempt to add the value 1 to the list we will get a TypeError because the + operator performs concatenation for lists and a list cannot be concatenated with an int.

l1+1

If however we convert the list l1 to a numeric array a1.

In [44]:
a1=np.array(l1)

In [45]:
a1

Out[45]:
array([1, 2, 3, 4])

For a numpy array the + operator is set up for numeric addition, so the value 1 is added to the original value of each element in the array.

In [46]:
a1+1

Out[46]:
array([2, 3, 4, 5])
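
To make the contrast with list behaviour concrete, here is a short supplementary sketch (not one of the numbered notebook cells) showing that all of the arithmetic operators act elementwise on an ndarray:

```python
import numpy as np

a1 = np.array([1, 2, 3, 4])

# each arithmetic operator is applied to every element in turn
print(a1 + 1)   # [2 3 4 5]
print(a1 * 2)   # [2 4 6 8]
print(a1 ** 2)  # [ 1  4  9 16]
```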

The numpy array is an instance of the ndarray class and has a number of attributes and methods which can be accessed by typing the object's name followed by a dot . and then pressing tab ↹.

In [47]:
type(a1)

Out[47]:
numpy.ndarray

To view the full list of attributes and methods in a cell we can use the dir function.

In [48]:
dir(a1)

Out[48]:
['T',
'__abs__',
'__and__',
'__array__',
'__array_finalize__',
'__array_function__',
'__array_interface__',
'__array_prepare__',
'__array_priority__',
'__array_struct__',
'__array_ufunc__',
'__array_wrap__',
'__bool__',
'__class__',
'__complex__',
'__contains__',
'__copy__',
'__deepcopy__',
'__delattr__',
'__delitem__',
'__dir__',
'__divmod__',
'__doc__',
'__eq__',
'__float__',
'__floordiv__',
'__format__',
'__ge__',
'__getattribute__',
'__getitem__',
'__gt__',
'__hash__',
'__iand__',
'__ifloordiv__',
'__ilshift__',
'__imatmul__',
'__imod__',
'__imul__',
'__index__',
'__init__',
'__init_subclass__',
'__int__',
'__invert__',
'__ior__',
'__ipow__',
'__irshift__',
'__isub__',
'__iter__',
'__itruediv__',
'__ixor__',
'__le__',
'__len__',
'__lshift__',
'__lt__',
'__matmul__',
'__mod__',
'__mul__',
'__ne__',
'__neg__',
'__new__',
'__or__',
'__pos__',
'__pow__',
'__rand__',
'__rdivmod__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__rfloordiv__',
'__rlshift__',
'__rmatmul__',
'__rmod__',
'__rmul__',
'__ror__',
'__rpow__',
'__rrshift__',
'__rshift__',
'__rsub__',
'__rtruediv__',
'__rxor__',
'__setattr__',
'__setitem__',
'__setstate__',
'__sizeof__',
'__str__',
'__sub__',
'__subclasshook__',
'__truediv__',
'__xor__',
'all',
'any',
'argmax',
'argmin',
'argpartition',
'argsort',
'astype',
'base',
'byteswap',
'choose',
'clip',
'compress',
'conj',
'conjugate',
'copy',
'ctypes',
'cumprod',
'cumsum',
'data',
'diagonal',
'dot',
'dtype',
'dump',
'dumps',
'fill',
'flags',
'flat',
'flatten',
'getfield',
'imag',
'item',
'itemset',
'itemsize',
'max',
'mean',
'min',
'nbytes',
'ndim',
'newbyteorder',
'nonzero',
'partition',
'prod',
'ptp',
'put',
'ravel',
'real',
'repeat',
'reshape',
'resize',
'round',
'searchsorted',
'setfield',
'setflags',
'shape',
'size',
'sort',
'squeeze',
'std',
'strides',
'sum',
'swapaxes',
'take',
'tobytes',
'tofile',
'tolist',
'tostring',
'trace',
'transpose',
'var',
'view']
In [49]:
? a1

Type:            ndarray
String form:     [1 2 3 4]
Length:          4
File:            c:\users\phili\anaconda3\lib\site-packages\numpy\__init__.py
Docstring:       <no docstring>
Class docstring:
ndarray(shape, dtype=float, buffer=None, offset=0,
strides=None, order=None)

An array object represents a multidimensional, homogeneous array
of fixed-size items.  An associated data-type object describes the
format of each element in the array (its byte-order, how many bytes it
occupies in memory, whether it is an integer, a floating point number,
or something else, etc.)

Arrays should be constructed using array, zeros or empty (refer
to the See Also section below).  The parameters given here refer to
a low-level method (ndarray(...)) for instantiating an array.

For more information, refer to the numpy module and examine the
methods and attributes of an array.

Parameters
----------
(for the __new__ method; see Notes below)

shape : tuple of ints
Shape of created array.
dtype : data-type, optional
Any object that can be interpreted as a numpy data type.
buffer : object exposing buffer interface, optional
Used to fill the array with data.
offset : int, optional
Offset of array data in buffer.
strides : tuple of ints, optional
Strides of data in memory.
order : {'C', 'F'}, optional
Row-major (C-style) or column-major (Fortran-style) order.

Attributes
----------
T : ndarray
Transpose of the array.
data : buffer
The array's elements, in memory.
dtype : dtype object
Describes the format of the elements in the array.
flags : dict
Dictionary containing information related to memory use, e.g.,
'C_CONTIGUOUS', 'OWNDATA', 'WRITEABLE', etc.
flat : numpy.flatiter object
Flattened version of the array as an iterator.  The iterator
allows assignments, e.g., x.flat = 3 (See ndarray.flat for
assignment examples; TODO).
imag : ndarray
Imaginary part of the array.
real : ndarray
Real part of the array.
size : int
Number of elements in the array.
itemsize : int
The memory use of each array element in bytes.
nbytes : int
The total number of bytes required to store the array data,
i.e., itemsize * size.
ndim : int
The array's number of dimensions.
shape : tuple of ints
Shape of the array.
strides : tuple of ints
The step-size required to move from one element to the next in
memory. For example, a contiguous (3, 4) array of type
int16 in C-order has strides (8, 2).  This implies that
to move from element to element in memory requires jumps of 2 bytes.
To move from row-to-row, one needs to jump 8 bytes at a time
(2 * 4).
ctypes : ctypes object
Class containing properties of the array needed for interaction
with ctypes.
base : ndarray
If the array is a view into another array, that array is its base
(unless that array is also a view).  The base array is where the
array data is actually stored.

See Also
--------
array : Construct an array.
zeros : Create an array, each element of which is zero.
empty : Create an array, but leave its allocated memory unchanged (i.e.,
it contains "garbage").
dtype : Create a data-type.

Notes
-----
There are two modes of creating an array using __new__:

1. If buffer is None, then only shape, dtype, and order
are used.
2. If buffer is an object exposing the buffer interface, then
all keywords are interpreted.

No __init__ method is needed because the array is fully initialized
after the __new__ method.

Examples
--------
These examples illustrate the low-level ndarray constructor.  Refer
to the See Also section above for easier ways of constructing an
ndarray.

First mode, buffer is None:

>>> np.ndarray(shape=(2,2), dtype=float, order='F')
array([[0.0e+000, 0.0e+000], # random
[     nan, 2.5e-323]])

Second mode:

>>> np.ndarray((2,), buffer=np.array([1,2,3]),
...            offset=np.int_().itemsize,
...            dtype=int) # offset = 1*itemsize, i.e. skip first element
array([2, 3])


### ndarray dtype¶

Recall that attributes are objects or instances of a class and are called without input arguments. We can use the dtype attribute to examine the datatype of each element in the numpy array.

In [50]:
a1

Out[50]:
array([1, 2, 3, 4])
In [51]:
a1.dtype

Out[51]:
dtype('int32')

This differs from the function type which tells us the object a1 is a numpy ndarray. Note that the default integer dtype is platform dependent; int32 is shown here because this notebook was run on Windows, while on most Linux and macOS builds it would be int64.

In [52]:
type(a1)

Out[52]:
numpy.ndarray

We can specify the datatype while creating the array. To do so we change the keyword input argument dtype. In this example we will override the default behaviour of None (inferred from the input) and set it to float.

In [53]:
a2=np.array(a1,dtype=float)


We can see the decimal point after each value in the numpy array.

In [54]:
a2

Out[54]:
array([1., 2., 3., 4.])
In [55]:
a2.dtype

Out[55]:
dtype('float64')
In [56]:
type(a2)

Out[56]:
numpy.ndarray
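
As a supplementary sketch, an existing array can also be converted after creation using the astype method, which returns a new array of the requested datatype:

```python
import numpy as np

a1 = np.array([1, 2, 3, 4])
a2 = a1.astype(float)  # new float64 array; a1 itself is unchanged
print(a2)        # [1. 2. 3. 4.]
print(a2.dtype)  # float64
```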

Another important attribute is the shape of an array.

### ndarray shape, reshape, ndim and size¶

We can use the attribute shape to get the dimensions of the array.

In [57]:
a1.shape

Out[57]:
(4,)

Many of the attributes have an analogous function which can be called directly from the np library. To get details about the input arguments type in the function name followed by shift ⇧ and tab ↹. We see that the array a1 needs to be referenced as an input argument.

In [58]:
np.shape(a1)

Out[58]:
(4,)

Notice that the output above is (4,) which means the array has only a single dimension. It is neither a column nor a row. We can use the attribute ndim or the numpy function ndim to check the number of dimensions.

In [59]:
a1.ndim

Out[59]:
1
In [60]:
np.ndim(a1)

Out[60]:
1

The attribute size or the numpy function size will give the number of elements in the array.

In [61]:
a1.size

Out[61]:
4
In [62]:
np.size(a1)

Out[62]:
4

Let's create a matrix using a list of equally sized lists.

In [63]:
m1=[[1,2,3,4],
[1,0,3,4],
[2,3,1,1]]


Now let's convert it to a numpy array.

In [64]:
m1=np.array(m1)

In [65]:
m1

Out[65]:
array([[1, 2, 3, 4],
[1, 0, 3, 4],
[2, 3, 1, 1]])

Now let's look at the number of dimensions. This is 2 as expected (rows and columns).

In [66]:
m1.ndim

Out[66]:
2

Now shape returns a tuple corresponding to the number of rows and the number of columns respectively.

In [67]:
m1.shape

Out[67]:
(3, 4)

In some cases we may want to use tuple unpacking to get the dimensions as variables.

In [68]:
(nrows,ncols)=m1.shape

In [69]:
nrows

Out[69]:
3
In [70]:
ncols

Out[70]:
4

We can now look at the size. This is 12 as expected: 3 rows × 4 columns = 12 elements.

In [71]:
m1.size

Out[71]:
12
In [72]:
m1

Out[72]:
array([[1, 2, 3, 4],
[1, 0, 3, 4],
[2, 3, 1, 1]])

We can use the method flatten to flatten a 2D ndarray matrix into a 1D ndarray. If we type in the method followed by shift ⇧ and tab ↹ we see that it has a keyword input argument order which has a default value of 'C'. This stands for the C programming language and will flatten the matrix using row-major order. N.B. 'C' in this context shouldn't be confused with columns.

In [73]:
m1.flatten()

Out[73]:
array([1, 2, 3, 4, 1, 0, 3, 4, 2, 3, 1, 1])
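
As a supplementary sketch, passing order='F' flattens the same matrix in column-major (Fortran) order instead:

```python
import numpy as np

m1 = np.array([[1, 2, 3, 4],
               [1, 0, 3, 4],
               [2, 3, 1, 1]])

print(m1.flatten())           # row-major (C) order
print(m1.flatten(order='F'))  # column-major (Fortran) order: down each column
```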

We can use the reshape method to reshape an array to new dimensions. Once again to get the input arguments we can type in the method name followed by a shift ⇧ and tab ↹. We have the positional input argument shape. Note the size of the new shape must match the size of the old shape. We can reshape the original 3 rows by 4 columns matrix (3×4=12) to a 2 rows by 6 columns matrix (2×6=12). The shape is input as a tuple.

In [74]:
m1.reshape((2,6))

Out[74]:
array([[1, 2, 3, 4, 1, 0],
[3, 4, 2, 3, 1, 1]])

The numpy function reshape works in a similar manner.

In [75]:
np.reshape(m1,(2,6))

Out[75]:
array([[1, 2, 3, 4, 1, 0],
[3, 4, 2, 3, 1, 1]])

Note in either case the output has not been assigned to an object name and so displays in the cell. We can reassign the value to the object name m1.

In [76]:
m1=np.reshape(m1,(2,6))


The value is not shown as it has been reassigned to the object name m1.

In [77]:
m1

Out[77]:
array([[1, 2, 3, 4, 1, 0],
[3, 4, 2, 3, 1, 1]])

Let's look again at the 1D array a1.

In [78]:
a1

Out[78]:
array([1, 2, 3, 4])
In [79]:
a1.shape

Out[79]:
(4,)
In [80]:
a1.ndim

Out[80]:
1
In [81]:
a1.size

Out[81]:
4

We see that it has a single dimension. In some cases we want to explicitly reshape it as a row or a column. Recall that the input tuple used in the method reshape has the form (nrows,ncols). A row vector by definition has 1 row and multiple columns. Each element will be in a separate column and we can use a1.size to specify this.

In [82]:
a1_row=a1.reshape((1,a1.size))


Now we see that a1_row is explicitly a row vector ndarray, with a shape of 1 row and 4 columns, 2 dimensions and a size of 4.

In [83]:
a1_row

Out[83]:
array([[1, 2, 3, 4]])
In [84]:
a1_row.shape

Out[84]:
(1, 4)
In [85]:
a1_row.ndim

Out[85]:
2
In [86]:
a1_row.size

Out[86]:
4

We can also use the tuple (1,-1). The -1 indicates that we want all remaining elements along that dimension.

In [87]:
a1_row=a1.reshape((1,-1))

In [88]:
a1_row

Out[88]:
array([[1, 2, 3, 4]])

To get a column vector by definition we have multiple rows and 1 column. We can set the tuple shape to be (-1,1).

In [89]:
a1_col=a1.reshape((-1,1))

In [90]:
a1_col

Out[90]:
array([[1],
[2],
[3],
[4]])
In [91]:
a1_col.shape

Out[91]:
(4, 1)
In [92]:
a1_col.ndim

Out[92]:
2
In [93]:
a1_col.size

Out[93]:
4
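
An alternative to reshape worth knowing (a supplementary sketch, not used elsewhere in this guide) is np.newaxis, which inserts a new axis of length 1 during indexing:

```python
import numpy as np

a1 = np.array([1, 2, 3, 4])

row = a1[np.newaxis, :]  # a (1, 4) row vector
col = a1[:, np.newaxis]  # a (4, 1) column vector
print(row.shape)  # (1, 4)
print(col.shape)  # (4, 1)
```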

### Indexing ndarrays¶

Let's have a look at the list.

In [94]:
l2=[2,4,6,8]

In [95]:
l2

Out[95]:
[2, 4, 6, 8]

Recall we can index into a list using square brackets. For example to retrieve the value at index 2 we can use.

In [96]:
l2[2]

Out[96]:
6

Recall we use zero-order indexing so we start from 0 and therefore the value at index 0 is.

In [97]:
l2[0]

Out[97]:
2

We can see how this works in more detail by using a for loop.

In [98]:
for idx,val in enumerate(l2):
    print(idx,val)

0 2
1 4
2 6
3 8


The number before 0 is -1 and it retrieves the value at the last index. This idea is also why -1 is used in the ndarray method reshape.

In [99]:
l2[-1]

Out[99]:
8

We can see how negative indexing works using the for loop.

In [100]:
for idx,val in enumerate(l2):
    print(idx-len(l2),val)

-4 2
-3 4
-2 6
-1 8


Let's now create a numpy array.

In [101]:
a2=np.array(l2)

In [102]:
a2

Out[102]:
array([2, 4, 6, 8])

We see that indexing works in a similar manner.

In [103]:
a2[2]

Out[103]:
6
In [104]:
a2[0]

Out[104]:
2
In [105]:
for idx,val in enumerate(a2):
    print(idx,val)

0 2
1 4
2 6
3 8

In [106]:
for idx,val in enumerate(a2):
    print(idx-len(a2),val)

-4 2
-3 4
-2 6
-1 8


Let's now have a look at a 2D ndarray known as a matrix. We can make a list of lists.

In [107]:
m2=[[2,4,6,8],
[1,3,5,7]]

In [108]:
m2

Out[108]:
[[2, 4, 6, 8], [1, 3, 5, 7]]

Indexing using square brackets will index into the outer list. If we select the 1st index we get the nested list.

In [109]:
m2[1]

Out[109]:
[1, 3, 5, 7]

We can then index into this, for example to get the 2nd element within this nested list.

In [110]:
m2[1][2]

Out[110]:
5

Notice the square brackets are separate as we perform two separate index operations, i.e. indexing into the outer list and then indexing into the nested inner list.

In [111]:
m2

Out[111]:
[[2, 4, 6, 8], [1, 3, 5, 7]]

Let's now create an array from this matrix.

In [112]:
m2=np.array(m2)

In [113]:
m2

Out[113]:
array([[2, 4, 6, 8],
[1, 3, 5, 7]])

Compare the output of m2 as a list of lists and as an ndarray. The ndarray looks more like a matrix. We use a single indexing operation to select the desired row and column in an ndarray. We can specify a tuple as the input argument (analogous to the form of the shape when using the shape and reshape methods).

In [114]:
m2[(1,2)]

Out[114]:
5

Alternatively tuple unpacking works so the outer parentheses do not need to be specified.

In [115]:
m2[1,2]

Out[115]:
5
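
A supplementary sketch: unlike the two-step list indexing, a single index on an ndarray also selects a whole row, and the tuple form selects an element in one operation:

```python
import numpy as np

m2 = np.array([[2, 4, 6, 8],
               [1, 3, 5, 7]])

print(m2[1])     # the whole 1st row, as with the nested list
print(m2[1, 2])  # a single element in one indexing operation
```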

In [116]:
color_image=[[[1.0,0.0,0.0,0.0,0.0],
[1.0,0.0,1.0,0.0,0.0],
[0.0,0.5,1.0,0.0,0.0],
[0.0,0.0,0.0,0.0,0.0]],

[[0.0,1.0,0.0,0.0,0.0],
[1.0,1.0,0.0,0.0,0.0],
[0.0,0.5,1.0,0.0,0.0],
[0.0,0.0,0.0,0.0,0.0]],

[[0.0,0.0,1.0,0.0,0.0],
[0.0,1.0,1.0,0.0,0.0],
[0.0,0.5,1.0,0.0,0.0],
[0.0,0.0,0.0,0.0,0.0]]]


Now let's convert this to a ndarray.

In [117]:
color_image=np.array(color_image)


We see the ndarray displays this higher-dimensional array as separate matrices, similar to the manner we laid out above.

In [118]:
color_image

Out[118]:
array([[[1. , 0. , 0. , 0. , 0. ],
[1. , 0. , 1. , 0. , 0. ],
[0. , 0.5, 1. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ]],

[[0. , 1. , 0. , 0. , 0. ],
[1. , 1. , 0. , 0. , 0. ],
[0. , 0.5, 1. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ]],

[[0. , 0. , 1. , 0. , 0. ],
[0. , 1. , 1. , 0. , 0. ],
[0. , 0.5, 1. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ]]])

The number of dimensions is now 3.

In [119]:
color_image.ndim

Out[119]:
3

If we have a look at the shape we see three elements which represent the number of pages, number of rows and then number of columns.

In [120]:
color_image.shape

Out[120]:
(3, 4, 5)
In [121]:
(npags,nrows,ncols)=color_image.shape

In [122]:
npags

Out[122]:
3
In [123]:
nrows

Out[123]:
4
In [124]:
ncols

Out[124]:
5

We can once again index using a tuple of the form above.

In [125]:
color_image[(2,1,0)]

Out[125]:
0.0

Let's assign this to a new value.

In [126]:
color_image[(2,1,0)]=99


We see the value previously highlighted is now 99.

In [127]:
color_image

Out[127]:
array([[[ 1. ,  0. ,  0. ,  0. ,  0. ],
[ 1. ,  0. ,  1. ,  0. ,  0. ],
[ 0. ,  0.5,  1. ,  0. ,  0. ],
[ 0. ,  0. ,  0. ,  0. ,  0. ]],

[[ 0. ,  1. ,  0. ,  0. ,  0. ],
[ 1. ,  1. ,  0. ,  0. ,  0. ],
[ 0. ,  0.5,  1. ,  0. ,  0. ],
[ 0. ,  0. ,  0. ,  0. ,  0. ]],

[[ 0. ,  0. ,  1. ,  0. ,  0. ],
[99. ,  1. ,  1. ,  0. ,  0. ],
[ 0. ,  0.5,  1. ,  0. ,  0. ],
[ 0. ,  0. ,  0. ,  0. ,  0. ]]])

This can once again be accessed by use of tuple unpacking.

In [128]:
color_image[2,1,0]

Out[128]:
99.0

We can have a look at a higher dimension object, the colorbook from earlier and convert it into a numpy array.

In [129]:
colorbook

Out[129]:
[[[[1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]],
[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[1.0, 1.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]],
[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]]
In [130]:
colorbook=np.array(colorbook)

In [131]:
colorbook

Out[131]:
array([[[[1., 0., 1.],
[1., 1., 1.],
[1., 0., 1.]],

[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],

[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]],

[[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],

[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],

[[1., 1., 1.],
[0., 1., 0.],
[1., 1., 1.]]],

[[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],

[[0., 0., 0.],
[0., 0., 0.],
[0., 1., 0.]],

[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]]])

If we have a look at the shape we see four elements which represent the number of frames, number of pages, number of rows and then number of columns. Note that as we go up in dimensionality the index for the new dimension is added to the left.

In [132]:
colorbook.shape

Out[132]:
(3, 3, 3, 3)
In [133]:
(nframe,npags,nrows,ncols)=colorbook.shape

In [134]:
nframe

Out[134]:
3
In [135]:
npags

Out[135]:
3
In [136]:
ncols

Out[136]:
3

We can index using a tuple of the form above.

In [137]:
colorbook[(0,1,2,1)]

Out[137]:
0.0

Once again we can use tuple unpacking, so we don't need to specify the parentheses. Let's reassign the value to 99.

In [138]:
colorbook[0,1,2,1]=99


We see the value previously highlighted is now updated.

In [139]:
colorbook

Out[139]:
array([[[[ 1.,  0.,  1.],
[ 1.,  1.,  1.],
[ 1.,  0.,  1.]],

[[ 0.,  0.,  0.],
[ 0.,  0.,  0.],
[ 0., 99.,  0.]],

[[ 0.,  0.,  0.],
[ 0.,  0.,  0.],
[ 0.,  0.,  0.]]],

[[[ 0.,  0.,  0.],
[ 0.,  0.,  0.],
[ 0.,  0.,  0.]],

[[ 0.,  0.,  0.],
[ 0.,  0.,  0.],
[ 0.,  0.,  0.]],

[[ 1.,  1.,  1.],
[ 0.,  1.,  0.],
[ 1.,  1.,  1.]]],

[[[ 0.,  0.,  0.],
[ 0.,  0.,  0.],
[ 0.,  0.,  0.]],

[[ 0.,  0.,  0.],
[ 0.,  0.,  0.],
[ 0.,  1.,  0.]],

[[ 0.,  0.,  0.],
[ 0.,  0.,  0.],
[ 0.,  0.,  0.]]]])

### concatenate and axis of ndarrays¶

Recall that for two lists.

In [140]:
l1=[2,4,6,8]

In [141]:
l2=[1,3,5,7]


The + operator performs concatenation.

In [142]:
l1+l2

Out[142]:
[2, 4, 6, 8, 1, 3, 5, 7]

Let's now look at two numpy arrays.

In [143]:
a1=np.array([2,4,6,8])

In [144]:
a2=np.array([1,3,5,7])


To perform concatenation we can use the function concatenate. To view the docstring we can type in the function followed by a shift ⇧ and tab ↹. We see the positional input argument is a tuple of all the arrays to be concatenated.

We can perform concatenation analogous to the list above using.

In [145]:
np.concatenate((a1,a2))

Out[145]:
array([2, 4, 6, 8, 1, 3, 5, 7])
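
To underline the difference from lists, a supplementary sketch: the + operator on these two arrays performs elementwise addition, so concatenation must be requested explicitly with the function:

```python
import numpy as np

a1 = np.array([2, 4, 6, 8])
a2 = np.array([1, 3, 5, 7])

print(a1 + a2)                   # elementwise addition: [ 3  7 11 15]
print(np.concatenate((a1, a2)))  # [2 4 6 8 1 3 5 7]
```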

If we also specify both arrays as rows.

In [146]:
a1=a1.reshape((1,-1))

In [147]:
a1

Out[147]:
array([[2, 4, 6, 8]])
In [148]:
a2=a2.reshape((1,-1))

In [149]:
a2

Out[149]:
array([[1, 3, 5, 7]])

We can also perform concatenation to obtain a matrix.

Recall that the shape for a 2D array, the 0th index corresponds to the number of rows and 1st index corresponds to the number of columns. For a 2D array the axis keyword argument in concatenate behaves in a corresponding manner. axis=0 means we are carrying out a row operation (concatenating along axis 0 i.e. increasing the number of rows). axis=1 means we are carrying out a column operation (concatenating along axis 1 i.e. increasing the number of columns).

In [150]:
a1.shape

Out[150]:
(1, 4)
In [151]:
a2.shape

Out[151]:
(1, 4)

For 2D arrays, axis=0 concatenates the rows:

In [152]:
np.concatenate((a1,a2),axis=0)

Out[152]:
array([[2, 4, 6, 8],
[1, 3, 5, 7]])

For 2D arrays, axis=1 concatenates the columns:

In [153]:
np.concatenate((a1,a2),axis=1)

Out[153]:
array([[2, 4, 6, 8, 1, 3, 5, 7]])

It is worth examining higher order dimensions briefly. As we increase the number of dimensions, the higher order dimension is found at the beginning of the tuple (and not at the end of the tuple) when we look up the attribute shape. We can see this explicitly if we use tuple unpacking.

(ncol,)=array1d.shape

(nrow,ncol)=array2d.shape

(npage,nrow,ncol)=array3d.shape

(nframe,npage,nrow,ncol)=array4d.shape

...

This can cause quite some confusion when first using numpy, as axis=0 refers to ncol in a 1D array, nrow in a 2D array, npage in a 3D array and nframe in a 4D array...

If we instead use negative indexing we see that axis=-1 will always perform concatenation along columns. axis=-2 (when present) will always perform concatenation along rows, and so on...

In [154]:
np.concatenate((a1,a2),axis=-1)

Out[154]:
array([[2, 4, 6, 8, 1, 3, 5, 7]])
In [155]:
np.concatenate((a1,a2),axis=-2)

Out[155]:
array([[2, 4, 6, 8],
[1, 3, 5, 7]])
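
numpy also provides the convenience functions vstack and hstack, which for 2D arrays behave like concatenate with axis=0 and axis=1 respectively (a supplementary sketch):

```python
import numpy as np

a1 = np.array([[2, 4, 6, 8]])
a2 = np.array([[1, 3, 5, 7]])

v = np.vstack((a1, a2))  # stack as rows, like concatenate with axis=0
h = np.hstack((a1, a2))  # stack as columns, like concatenate with axis=1
print(v.shape)  # (2, 4)
print(h.shape)  # (1, 8)
```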

Suppose for example we create the following book.

In [156]:
book=np.array([[[1,2,3,4],
[1,2,3,4],
[1,2,3,4]],

[[5,6,7,8],
[5,6,7,8],
[5,6,7,8]]])

In [157]:
book

Out[157]:
array([[[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]],

[[5, 6, 7, 8],
[5, 6, 7, 8],
[5, 6, 7, 8]]])
In [158]:
book.shape

Out[158]:
(2, 3, 4)

If we create the following matrix and want to concatenate it to book as an additional column (on each page).

In [159]:
col=np.array([[9,9,9],
[10,10,10]])

In [160]:
col

Out[160]:
array([[ 9,  9,  9],
[10, 10, 10]])
In [161]:
col.shape

Out[161]:
(2, 3)

We will need to explicitly reshape it as a 3D array. Note the number of pages (axis=0) and number of rows (axis=1) now match that of the book allowing for concatenation (along axis=2).

In [162]:
col=col.reshape((2,3,1))

In [163]:
col

Out[163]:
array([[[ 9],
[ 9],
[ 9]],

[[10],
[10],
[10]]])
In [164]:
np.concatenate((book,col),axis=2)

Out[164]:
array([[[ 1,  2,  3,  4,  9],
[ 1,  2,  3,  4,  9],
[ 1,  2,  3,  4,  9]],

[[ 5,  6,  7,  8, 10],
[ 5,  6,  7,  8, 10],
[ 5,  6,  7,  8, 10]]])

This is the last axis i.e. axis=-1 which as mentioned will always perform concatenation along ncols.

In [165]:
np.concatenate((book,col),axis=-1)

Out[165]:
array([[[ 1,  2,  3,  4,  9],
[ 1,  2,  3,  4,  9],
[ 1,  2,  3,  4,  9]],

[[ 5,  6,  7,  8, 10],
[ 5,  6,  7,  8, 10],
[ 5,  6,  7,  8, 10]]])

We will mainly be sticking to 2D arrays, in which case axis=0 will work on rows and axis=1 will work on columns.

### Indexing ndarrays and concatenation continued¶

Suppose we have the following matrix.

In [166]:
m=np.array([[1,2,3,4],
[5,6,7,8],
[9,10,11,12],
[13,14,15,16]])


Now suppose we want to make the following selections from it, explicitly as a row or column as highlighted.

$$m=\left[\begin{matrix}\color{#7030A0}{1}&\color{#FF0000}{2}&\color{#FF0000}{3}&\color{#FF0000}{4}\\\color{#7030A0}{5}&\color{#FF00FF}{6}&\color{#FF00FF}{7}&\color{#FF00FF}{8}\\\color{#7030A0}{9}&\color{#00B050}{10}&\color{#00B050}{11}&\color{#00FFFF}{12}\\\color{#7030A0}{13}&\color{#00B050}{14}&\color{#00B050}{15}&\color{#00FFFF}{16}\\\end{matrix}\right]$$

Let's start with the purple selection. We want rows 0, 1, 2 and 3 and only column 0. We can index using a tuple: the 0th element contains the rows we want to select, in this case the tuple (0,1,2,3), and the 1st element contains the columns we want to select, in this case column 0.

In [167]:
purple=m[((0,1,2,3),0)]

In [168]:
purple

Out[168]:
array([ 1,  5,  9, 13])

Once again we can use tuple unpacking to remove the outer parentheses.

In [169]:
purple=m[(0,1,2,3),0]

In [170]:
purple

Out[170]:
array([ 1,  5,  9, 13])

We can alternatively select the rows using a list.

In [171]:
purple=m[[0,1,2,3],0]

In [172]:
purple

Out[172]:
array([ 1,  5,  9, 13])

We can index this more efficiently using a colon :. Let's create a basic list and refresh how colon indexing works.

In [173]:
l1=[2,4,6,8,10,12,14,16,18]

In [174]:
for idx, val in enumerate(l1):
    print(idx,val)

0 2
1 4
2 6
3 8
4 10
5 12
6 14
7 16
8 18


When we use a colon to index we use zero-order indexing. This means that we index from the lower bound up to but not including the upper bound, i.e. we are inclusive of the lower bound and exclusive of the upper bound.

l1[lower:upper]

In [175]:
l1[2:5]

Out[175]:
[6, 8, 10]

If the lower bound is not specified it is assumed to be 0.

l1[0:upper]

l1[:upper]

In [176]:
l1[0:5]

Out[176]:
[2, 4, 6, 8, 10]
In [177]:
l1[:5]

Out[177]:
[2, 4, 6, 8, 10]

If the upper bound is not specified it is assumed to be the length of the dimension.

l1[lower:len(l1)]

l1[lower:]

In [178]:
l1[2:len(l1)]

Out[178]:
[6, 8, 10, 12, 14, 16, 18]
In [179]:
l1[2:]

Out[179]:
[6, 8, 10, 12, 14, 16, 18]

If neither the lower bound nor the upper bound is specified the entire list is selected.

l1[0:len(l1)]

l1[:]

In [180]:
l1[0:len(l1)]

Out[180]:
[2, 4, 6, 8, 10, 12, 14, 16, 18]
In [181]:
l1[:]

Out[181]:
[2, 4, 6, 8, 10, 12, 14, 16, 18]

A second colon can be used to indicate a step (if it is not specified the step is by default set to 1).

l1[lower:upper:step]

Let's use a step of 2 and -1 for example.

In [182]:
l1[::2]

Out[182]:
[2, 6, 10, 14, 18]
In [183]:
l1[::-1]

Out[183]:
[18, 16, 14, 12, 10, 8, 6, 4, 2]

Returning to our array.

$$m=\left[\begin{matrix}\color{#7030A0}{1}&\color{#FF0000}{2}&\color{#FF0000}{3}&\color{#FF0000}{4}\\\color{#7030A0}{5}&\color{#FF00FF}{6}&\color{#FF00FF}{7}&\color{#FF00FF}{8}\\\color{#7030A0}{9}&\color{#00B050}{10}&\color{#00B050}{11}&\color{#00FFFF}{12}\\\color{#7030A0}{13}&\color{#00B050}{14}&\color{#00B050}{15}&\color{#00FFFF}{16}\\\end{matrix}\right]$$

The purple selection is all rows, which can be represented using a colon.

In [184]:
purple=m[:,0]

In [185]:
purple

Out[185]:
array([ 1,  5,  9, 13])

We need to explicitly reshape it to a column.

In [186]:
purple=purple.reshape((-1,1))

In [187]:
purple

Out[187]:
array([[ 1],
[ 5],
[ 9],
[13]])
$$m=\left[\begin{matrix}\color{#7030A0}{1}&\color{#FF0000}{2}&\color{#FF0000}{3}&\color{#FF0000}{4}\\\color{#7030A0}{5}&\color{#FF00FF}{6}&\color{#FF00FF}{7}&\color{#FF00FF}{8}\\\color{#7030A0}{9}&\color{#00B050}{10}&\color{#00B050}{11}&\color{#00FFFF}{12}\\\color{#7030A0}{13}&\color{#00B050}{14}&\color{#00B050}{15}&\color{#00FFFF}{16}\\\end{matrix}\right]$$

The red selection is the 0th row and all columns except the 0th column.

In [188]:
red=m[0,1:]

In [189]:
red

Out[189]:
array([2, 3, 4])

We need to explicitly reshape it to a row.

In [190]:
red=red.reshape((1,-1))

In [191]:
red

Out[191]:
array([[2, 3, 4]])

The magenta selection is similar except for being the 1st row opposed to the 0th row.

In [192]:
magenta=m[1,1:]
magenta=magenta.reshape((1,-1))
magenta

Out[192]:
array([[6, 7, 8]])
$$m=\left[\begin{matrix}\color{#7030A0}{1}&\color{#FF0000}{2}&\color{#FF0000}{3}&\color{#FF0000}{4}\\\color{#7030A0}{5}&\color{#FF00FF}{6}&\color{#FF00FF}{7}&\color{#FF00FF}{8}\\\color{#7030A0}{9}&\color{#00B050}{10}&\color{#00B050}{11}&\color{#00FFFF}{12}\\\color{#7030A0}{13}&\color{#00B050}{14}&\color{#00B050}{15}&\color{#00FFFF}{16}\\\end{matrix}\right]$$

The cyan selection is the last column from the 2nd row onwards.

In [193]:
cyan=m[2:,-1]

In [194]:
cyan

Out[194]:
array([12, 16])

We need to explicitly reshape it to a column.

In [195]:
cyan=cyan.reshape((-1,1))

In [196]:
cyan

Out[196]:
array([[12],
[16]])
$$m=\left[\begin{matrix}\color{#7030A0}{1}&\color{#FF0000}{2}&\color{#FF0000}{3}&\color{#FF0000}{4}\\\color{#7030A0}{5}&\color{#FF00FF}{6}&\color{#FF00FF}{7}&\color{#FF00FF}{8}\\\color{#7030A0}{9}&\color{#00B050}{10}&\color{#00B050}{11}&\color{#00FFFF}{12}\\\color{#7030A0}{13}&\color{#00B050}{14}&\color{#00B050}{15}&\color{#00FFFF}{16}\\\end{matrix}\right]$$

Finally the green selection is a matrix. It takes the rows from the 2nd onwards and the 1st and 2nd columns (1:3 is inclusive of the lower bound and exclusive of the upper bound).

In [197]:
green=m[2:,1:3]

In [198]:
green

Out[198]:
array([[10, 11],
[14, 15]])

As this is already a 2D array it doesn't need to be reshaped.

Now that we have the fragments we can attempt to make this reconstructed matrix m2.

$$m2=\left[\begin{matrix}\color{#FF0000}{2}&\color{#FF0000}{3}&\color{#FF0000}{4}&\color{#7030A0}{1}\\\color{#00FFFF}{12}&\color{#00B050}{10}&\color{#00B050}{11}&\color{#7030A0}{5}\\\color{#00FFFF}{16}&\color{#00B050}{14}&\color{#00B050}{15}&\color{#7030A0}{9}\\\color{#FF00FF}{6}&\color{#FF00FF}{7}&\color{#FF00FF}{8}&\color{#7030A0}{13}\\\end{matrix}\right]$$

We see that the cyan and green fragments share the same number of rows and therefore can be concatenated along the columns axis=-1 (or 1 in the case of a 2D matrix).

In [199]:
cyangreen=np.concatenate((cyan,green),axis=-1)

In [200]:
cyangreen

Out[200]:
array([[12, 10, 11],
[16, 14, 15]])

We see that red, cyangreen and magenta fragments share the same number of columns and can therefore be concatenated along the rows axis=-2 (or 0 in the case of a 2D matrix).

In [201]:
redcyangreenmagenta=np.concatenate((red,cyangreen,magenta),axis=-2)

In [202]:
redcyangreenmagenta

Out[202]:
array([[ 2,  3,  4],
[12, 10, 11],
[16, 14, 15],
[ 6,  7,  8]])

Finally we see that the redcyangreenmagenta and the purple fragments share the same number of rows and can therefore be concatenated along the columns axis=-1 (or 1 in the case of a 2D matrix).

In [203]:
m2=np.concatenate((redcyangreenmagenta,purple),axis=-1)

In [204]:
m2

Out[204]:
array([[ 2,  3,  4,  1],
[12, 10, 11,  5],
[16, 14, 15,  9],
[ 6,  7,  8, 13]])
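
One subtlety worth flagging before moving on (a supplementary sketch): colon (basic) slicing returns a view into the original array, whereas indexing with a list or tuple of indices (fancy indexing) returns a copy:

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

s = m[:, 0]   # basic slicing: s is a view into m
s[0] = 99
print(m[0, 0])  # 99 -- modifying the view changed the original

f = m[[0, 1], 1]  # fancy indexing: f is a copy
f[0] = -1
print(m[0, 1])    # 2 -- the original array is unchanged
```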

### numpy functions for rapid array generation¶

So far we have created all our arrays manually. numpy has a number of functions to quickly generate arrays. The functions zeros and ones can be used to generate matrices of constant values. To get details about the input arguments type the function name followed by shift ⇧ and tab ↹.

We see the positional input argument shape (which is a tuple of dimensions analogous to the same positional input argument in the reshape function or method) and the keyword input argument dtype which is analogous to the same keyword input argument found in the array function.

In [205]:
np.zeros((3,2))

Out[205]:
array([[0., 0.],
[0., 0.],
[0., 0.]])
In [206]:
np.zeros(4)

Out[206]:
array([0., 0., 0., 0.])

ones has the same form as zeros.

In [207]:
np.ones((3,2))

Out[207]:
array([[1., 1.],
[1., 1.],
[1., 1.]])
In [208]:
np.ones(4)

Out[208]:
array([1., 1., 1., 1.])

Any other number can be made by multiplication of a scalar, for example.

In [209]:
2*np.ones((3,2))

Out[209]:
array([[2., 2.],
[2., 2.],
[2., 2.]])

This can also be done using a custom function which incorporates ones.

In [210]:
def twos(*args):
    return 2*np.ones(*args)

In [211]:
twos((3,2))

Out[211]:
array([[2., 2.],
[2., 2.],
[2., 2.]])
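
numpy also has a function full which creates a constant array directly, avoiding both the multiplication and the custom function (a supplementary sketch):

```python
import numpy as np

# a 3 by 2 array filled with the value 2.0
print(np.full((3, 2), 2.0))
```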

The method diagonal can be used to obtain the diagonal from a matrix. It has the keyword offset with a default value of 0 which means it looks at the main diagonal by default.

In [212]:
m

Out[212]:
array([[ 1,  2,  3,  4],
[ 5,  6,  7,  8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])

In [213]:
m.diagonal()

Out[213]:
array([ 1,  6, 11, 16])
In [214]:
m.diagonal(offset=1)

Out[214]:
array([ 2,  7, 12])
In [215]:
m.diagonal(offset=-1)

Out[215]:
array([ 5, 10, 15])
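
A related supplementary sketch: the sum of the main diagonal is called the trace, and the trace method computes it directly:

```python
import numpy as np

m = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12],
              [13, 14, 15, 16]])

print(m.diagonal().sum())  # 34
print(m.trace())           # 34: the trace is the sum of the main diagonal
```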

The function diag also behaves in a similar manner when used on matrices. However if we type it in followed by shift ⇧ and tab ↹ we get details about its input arguments and see that it also works on a 1D array, creating a matrix with that array along the diagonal.

In [216]:
np.diag(m)

Out[216]:
array([ 1,  6, 11, 16])

All values not at the specified diagonal are 0.

In [217]:
np.diag(np.array([1,2,3,4]))

Out[217]:
array([[1, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]])
In [218]:
np.diag(np.array([1,2,3,4]),k=1)

Out[218]:
array([[0, 1, 0, 0, 0],
[0, 0, 2, 0, 0],
[0, 0, 0, 3, 0],
[0, 0, 0, 0, 4],
[0, 0, 0, 0, 0]])

To get the antidiagonal instead we can use the function fliplr, an abbreviation for flip in the left right direction. We can view the docstring for the function by typing in the function name followed by shift ⇧ and tab ↹.

In [219]:
m3=np.diag(np.array([1,2,3,4]))

In [220]:
m3

Out[220]:
array([[1, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]])
In [221]:
np.fliplr(m3)

Out[221]:
array([[0, 0, 0, 1],
[0, 0, 2, 0],
[0, 3, 0, 0],
[4, 0, 0, 0]])

We can also use flipud, an abbreviation for flip in the up down direction. We can view the docstring for the function by typing in the function name followed by shift ⇧ and tab ↹.

In [222]:
np.flipud(m3)

Out[222]:
array([[0, 0, 0, 4],
[0, 0, 3, 0],
[0, 2, 0, 0],
[1, 0, 0, 0]])

The identity matrix, often referred to in mathematics as I, has values of 1 along the main diagonal and 0 elsewhere. It is important for matrix multiplication, particularly of square matrices (we will examine its use later when we look at interpolation).

In Python, case is important, with upper case names typically reserved for classes. In mathematics the lower case i is also used for imaginary numbers, although j is used in engineering (and by default in Python). To prevent confusion, the numpy function used to create the identity matrix is called eye (because I and eye sound the same in the English language).

The eye function has one positional input argument, the integer dimension of the square matrix. In cases where the user wants a non-square matrix, both the number of rows and the number of columns can be specified.

In [223]:
np.eye(4)

Out[223]:
array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
In [224]:
np.eye(4,3)

Out[224]:
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 0.]])

The numpy function arange can be used to create an array of equally spaced values. This function has 1-3 positional input arguments: start, stop and step (which also work as keyword input arguments).

All three positional arguments can be specified.

In [225]:
np.arange(start=0,stop=10,step=1)

Out[225]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [226]:
np.arange(0,10,1)

Out[226]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Note that the function arange uses zero-order indexing. This means we are inclusive of the start bound but exclusive of the stop bound, i.e. in this case we go up towards 10 in steps of 1 without reaching 10.

If only two input arguments are specified, step is automatically assumed to be 1.

In [227]:
np.arange(start=0,stop=10)

Out[227]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [228]:
np.arange(0,10)

Out[228]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

If only one input argument is specified, step is automatically assumed to be 1 and start is assumed to be 0.

In [229]:
np.arange(10)

Out[229]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The keyword input argument dtype can be used to assign the dtype while creating the array (and behaves in the same way as seen before in the numpy function array). If not assigned it will be inferred from the data.
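As a minimal sketch of this, integer arguments with dtype=float produce a float array:

```python
import numpy as np

# the arguments are integers but the dtype keyword forces a float array
np.arange(4, dtype=float)
```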

Earlier we created the array m.

In [230]:
m=np.array([[1,2,3,4],
[5,6,7,8],
[9,10,11,12],
[13,14,15,16]])


We can create this using the numpy function arange and the ndarray method reshape. Note that because we are using zero-order indexing, we need to explicitly start at 1 and set the stop value 1 step above 16, i.e. 17, in order to include 16.

In [231]:
m=np.arange(start=1,stop=17,step=1)

In [232]:
m

Out[232]:
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16])
In [233]:
m=m.reshape((4,4))

In [234]:
m

Out[234]:
array([[ 1,  2,  3,  4],
[ 5,  6,  7,  8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])

When we use the function arange we need to consider zero-order indexing.

In [235]:
np.arange(start=-2,stop=2,step=0.5)

Out[235]:
array([-2. , -1.5, -1. , -0.5,  0. ,  0.5,  1. ,  1.5])
In [236]:
np.arange(start=-2,stop=2.5,step=0.5)

Out[236]:
array([-2. , -1.5, -1. , -0.5,  0. ,  0.5,  1. ,  1.5,  2. ])

Sometimes it is more convenient to create equally spaced arrays inclusive of both the start and stop bound. This can be done using the similar numpy function linspace.

In [237]:
np.linspace(start=-2,stop=2,num=9)

Out[237]:
array([-2. , -1.5, -1. , -0.5,  0. ,  0.5,  1. ,  1.5,  2. ])
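linspace also has a keyword input argument endpoint (default True); setting it to False excludes the stop bound, mirroring the behaviour of arange. A quick sketch:

```python
import numpy as np

# with endpoint=False the stop bound is excluded, like arange
np.linspace(start=-2, stop=2, num=8, endpoint=False)
```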

### Element by Element Operations¶

The following operators are setup for numeric operation in a numpy array.

Operator Description double underscore (dunder) method
+ element by element addition __add__
- element by element subtraction __sub__
* element by element multiplication __mul__
** element by element exponentiation __pow__
// element by element integer division __floordiv__
% element by element modulus __mod__
/ element by element float division __truediv__

This means that an array can interact with a numeric value; for example, an array with 4 elements can interact with a scalar.

In [238]:
a1=np.array([1,2,3,4])

In [239]:
a1+2

Out[239]:
array([3, 4, 5, 6])
In [240]:
a1-2

Out[240]:
array([-1,  0,  1,  2])
In [241]:
a1*2

Out[241]:
array([2, 4, 6, 8])
In [242]:
a1**2

Out[242]:
array([ 1,  4,  9, 16], dtype=int32)

This is also used for the square root.

In [243]:
a1**0.5

Out[243]:
array([1.        , 1.41421356, 1.73205081, 2.        ])
In [244]:
a1//2

Out[244]:
array([0, 1, 1, 2], dtype=int32)
In [246]:
a1%2

Out[246]:
array([1, 0, 1, 0], dtype=int32)
In [247]:
a1/2

Out[247]:
array([0.5, 1. , 1.5, 2. ])

Recall that these operators work via use of the dunder methods, which can be expressed as:

In [248]:
a1.__add__(2)

Out[248]:
array([3, 4, 5, 6])

In all of the cases above scalar expansion is performed (a simple case of what numpy calls broadcasting). Essentially, in the background, the scalar is multiplied by ones matching the dimensions of the numpy array.

In [249]:
a1

Out[249]:
array([1, 2, 3, 4])
In [250]:
a2=2*np.ones(a1.shape,dtype=int)

In [251]:
a2

Out[251]:
array([2, 2, 2, 2])

The value in each index of a1 is added to the corresponding value in each index of a2.

In [252]:
np.array([1,2,3,4])+np.array([2,2,2,2])

Out[252]:
array([3, 4, 5, 6])

If the dimensions do not match, there is an error as there is no clear instruction to perform the operation.

np.array([1, 2, 3, 4])+np.array([2, 2, 2])

Therefore when two arrays are added they have to be of equal size.

In [253]:
np.array([1,2,3,4])+np.array([2,4,6,8])

Out[253]:
array([ 3,  6,  9, 12])

For matrices it is possible to use scalar expansion and also vector expansion (providing the vector matches one of the dimensions of the matrix).

In [254]:
m=np.array([[1,2],
[3,4]])

In [255]:
m

Out[255]:
array([[1, 2],
[3, 4]])

Note that the vector is not explicitly specified as a row or a column.

In [256]:
v=np.array([5,6])

In [257]:
v

Out[257]:
array([5, 6])

It is assumed to be an object with a single dimension; in terms of the shape tuple it has a number of columns and no row dimension. Recall the following from when we looked at shape.

(ncol)=array1d.shape

(nrow,ncol)=array2d.shape

(npage,nrow,ncol)=array3d.shape

(nframe,npage,nrow,ncol)=array4d.shape

...

In [258]:
m

Out[258]:
array([[1, 2],
[3, 4]])
In [259]:
v

Out[259]:
array([5, 6])

Therefore vector expansion of a 1D vector is always assumed to be along rows.

In [260]:
m+v

Out[260]:
array([[ 6,  8],
[ 8, 10]])

The vector can explicitly be reshaped as a row or column. In such a scenario, vector expansion will occur in line with the matching dimensions.

In [261]:
vrow=v.reshape((1,-1))

In [262]:
vrow

Out[262]:
array([[5, 6]])
In [263]:
vcol=v.reshape((-1,1))

In [264]:
vcol

Out[264]:
array([[5],
[6]])

Let's have a look at m+vrow.

In [265]:
m

Out[265]:
array([[1, 2],
[3, 4]])
In [266]:
vrow

Out[266]:
array([[5, 6]])
In [267]:
m+vrow

Out[267]:
array([[ 6,  8],
[ 8, 10]])

Let's now have a look at m+vcol.

In [268]:
m

Out[268]:
array([[1, 2],
[3, 4]])
In [269]:
vcol

Out[269]:
array([[5],
[6]])
In [270]:
m+vcol

Out[270]:
array([[ 6,  7],
[ 9, 10]])

The inplace operators also work (they will perform an inplace update on the array on the left hand side).

Operator Description
+= element by element in place addition
-= element by element in place subtraction
*= element by element in place multiplication
**= element by element in place exponentiation
//= element by element in place integer division
%= element by element in place modulus
/= element by element in place float division

Let's create the arrays a1 and a2.

In [271]:
a1=np.array([2,4,6,8])

In [272]:
a2=np.array([1,3,5,7])

In [273]:
a1

Out[273]:
array([2, 4, 6, 8])
In [274]:
a2

Out[274]:
array([1, 3, 5, 7])

Now let's perform a += operation.

In [275]:
a1+=a2

In [276]:
a1

Out[276]:
array([ 3,  7, 11, 15])
In [277]:
a2

Out[277]:
array([1, 3, 5, 7])

We see that a1 has been mutated and a2 remains unchanged.
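The distinction between an inplace operator and a reassignment can be sketched using the builtin id function, which returns the identity of an object:

```python
import numpy as np

a1 = np.array([2, 4, 6, 8])
original = id(a1)

a1 += 1                      # inplace: the same array object is updated
assert id(a1) == original

a1 = a1 + 1                  # reassignment: a brand new array is created
assert id(a1) != original
```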

Let's have a look at an array with negative numbers.

In [278]:
a3=np.array([-1,2,-3,4])


We can use the numpy function abs to return element by element the absolute values in the array.

In [279]:
np.abs(a3)

Out[279]:
array([1, 2, 3, 4])

Returning to the negative array: because the array has negative numbers, it is not possible to calculate the square root directly. A warning, invalid value encountered in power, will display and nan is returned for the negative elements.

a3**0.5

If the dtype is changed to complex however the square root will display the real and imaginary components.

In [280]:
a3=np.array(a3,dtype=complex)

In [281]:
a3**0.5

Out[281]:
array([0.        +1.j        , 1.41421356+0.j        ,
0.        +1.73205081j, 2.        +0.j        ])
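Alternatively (not shown in this tutorial), the helper np.emath.sqrt automatically switches to a complex result whenever the input contains negative values, without changing the dtype by hand:

```python
import numpy as np

# emath.sqrt promotes to the complex domain when negatives are present
np.emath.sqrt(np.array([-1, 2, -3, 4]))
```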

### Element by Element Multiplication vs Array Multiplication¶

For a matrix there are some operations which occur element by element and others which occur across the array. Multiplication can be carried out element by element or across an array. The difference between these is best visualized using a practical example.

Operator Description double underscore (dunder) method
* element by element multiplication __mul__
@ array multiplication __matmul__

Let's have a look at multiplication of a scalar. Assume you are going to a shop and you want to purchase 5 pens that cost £2 each.

In [282]:
quantity=np.array(5)
price=np.array(2)

In [283]:
quantity*price

Out[283]:
10

Now let's instead look at buying two different item types in a shop, 5 pens and 3 pads. These items have a different price £2 and £6 respectively.

In [284]:
quantity=np.array([5,3])

In [285]:
price=np.array([2,6])


We can calculate the cost we spend on each item type using element by element multiplication. Alternatively we can calculate the total cost using array multiplication.

For element by element multiplication the dimensions of both arrays must match. They can both be columns or both be rows.

$$\left[\begin{matrix}5\\3\\\end{matrix}\right]\ast\left[\begin{matrix}2\\6\\\end{matrix}\right]=\left[\begin{matrix}5\ast2\\3\ast6\\\end{matrix}\right]=\left[\begin{matrix}10\\18\\\end{matrix}\right]$$

In this example we will reshape both quantity and price as columns.

In [286]:
quantity=quantity.reshape((-1,1))

In [287]:
quantity

Out[287]:
array([[5],
[3]])
In [288]:
price=price.reshape((-1,1))

In [289]:
price

Out[289]:
array([[2],
[6]])
In [290]:
quantity*price

Out[290]:
array([[10],
[18]])

For array multiplication the number of columns of the array on the left-hand side must match the number of rows of the array on the right-hand side. In other words, the inner dimensions of the arrays must match.

$$\left[\begin{matrix}5&3\\\end{matrix}\right]@\left[\begin{matrix}2\\6\\\end{matrix}\right]=\left[5\ast2+3\ast6\right]=\left[10+18\right]=\left[28\right]$$

We need quantity as a row vector and price as a column vector. price is already a column vector, so we will just reshape quantity.

In [291]:
quantity=quantity.reshape((1,-1))

In [292]:
quantity

Out[292]:
array([[5, 3]])
In [293]:
price

Out[293]:
array([[2],
[6]])

For matrix multiplication, the inner dimensions must match i.e. the number of columns in the left hand side matrix must match the number of rows in the right hand side matrix.

In [294]:
quantity.shape

Out[294]:
(1, 2)
In [295]:
price.shape

Out[295]:
(2, 1)
In [296]:
quantity@price

Out[296]:
array([[28]])

The output matrix will have the dimensions of the outer matrices i.e. the number of rows of the left hand side matrix and the number of columns of the right hand side matrix.

In [297]:
total_cost=quantity@price

In [298]:
total_cost.shape

Out[298]:
(1, 1)

Because vectors are used, we must be careful with array multiplication and ensure that we reshape them so that they have the correct dimensionality. In the problem above we performed inner multiplication where the vector dimensions were (1,2) and (2,1) respectively leading to an output array that was (1,1).

Now consider a different problem which has the same numeric values. This time we have 5 people and 3 people in the red and green office respectively, and we want to order 2 pens and 6 pads for each person. Array multiplication will now calculate, instead of a total cost, the total cost per item type in each office. In this problem the vector dimensions are (2,1) and (1,2) respectively, leading to an output array that is (2,2).

$$\left[\begin{matrix}5\\3\\\end{matrix}\right]@\left[\begin{matrix}2&6\\\end{matrix}\right]=\left[\begin{matrix}5\ast2&5\ast6\\3\ast2&3\ast6\\\end{matrix}\right]=\left[\begin{matrix}10&30\\6&18\\\end{matrix}\right]$$

We need quantity as a column vector and price as a row vector.

In [299]:
quantity=quantity.reshape((-1,1))

In [300]:
price=price.reshape((1,-1))

In [301]:
quantity

Out[301]:
array([[5],
[3]])
In [302]:
price

Out[302]:
array([[2, 6]])

Again, for matrix multiplication the inner dimensions must match i.e. the number of columns in the left hand side matrix must match the number of rows in the right hand side matrix.

In [303]:
quantity.shape

Out[303]:
(2, 1)
In [304]:
price.shape

Out[304]:
(1, 2)
In [305]:
quantity@price

Out[305]:
array([[10, 30],
[ 6, 18]])

The output matrix will have the dimensions of the outer matrices i.e. the number of rows of the left hand side matrix and the number of columns of the right hand side matrix.

In [306]:
total_item_cost_per_office=quantity@price

In [307]:
total_item_cost_per_office.shape

Out[307]:
(2, 2)
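The two calculations above are the inner and outer products of the vectors. numpy also exposes these directly as np.inner and np.outer, which work on 1D arrays without any reshaping (a minimal sketch):

```python
import numpy as np

quantity = np.array([5, 3])
price = np.array([2, 6])

np.inner(quantity, price)   # scalar total cost
np.outer(quantity, price)   # cost per item type per office
```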

### Sorting Data in ndarrays and Basic Statistics of ndarrays¶

There are several statistical functions within the numpy library or ndarray methods that utilize the input argument axis (as previously discussed when looking at concatenation).

Recall that a tuple containing the number of dimensions can be obtained using the method shape. The index of each dimension corresponds to the axis.

(ncol)=array1d.shape

(nrow,ncol)=array2d.shape

(npage,nrow,ncol)=array3d.shape

(nframe,npage,nrow,ncol)=array4d.shape

...

The higher order dimension is added to the beginning of the tuple as the number of dimensions in the array is increased. This means that if we use a positive index then for a 1D array axis0=ncol, for a 2D array axis0=nrow, for a 3D array axis0=npage and for a 4D array axis0=nframe.

If we instead index the axis using negative indexing then axis=-1 will correspond to ncol for all arrays and thus mean when we use axis=-1 in any ndarray method or numpy function we will always "act on columns". axis=-2 means we will always act on rows (for a 2D array or higher dimension) and so on.
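This axis convention can be sketched on a small 2D array using the method sum:

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

m.sum(axis=0)    # act down the rows: one result per column
m.sum(axis=-1)   # act along the columns: one result per row
```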

Let's create a vector v.

In [308]:
v=np.array([4,6,5])


We can sort the data in this array using the method sort. To get details about the input arguments we can type in the method followed by shift ⇧ and tab ↹. We see that there are no positional input arguments and the main keyword input argument is axis, which has a default value of -1, meaning it always acts on columns.

The method sort has no output and will always instead perform an inplace update as indicated in the docstring.

In [309]:
v.sort()

In [310]:
v

Out[310]:
array([4, 5, 6])
In [311]:
v=np.array([4,6,5])


The function sort has similar behaviour but requires the array to be specified as a positional input argument. Note that this function has a return statement and does not perform an inplace update.

In [312]:
np.sort(v)

Out[312]:
array([4, 5, 6])

For this 1D array, axis=-1 (the default) corresponds to the only axis, axis=0.

In [313]:
np.sort(v,axis=0)

Out[313]:
array([4, 5, 6])
In [314]:
np.sort(v,axis=-1)

Out[314]:
array([4, 5, 6])

Let's return to v and compare it to the output of the numpy sort function.

In [315]:
v

Out[315]:
array([4, 6, 5])
In [316]:
np.sort(v)

Out[316]:
array([4, 5, 6])

We see the 0th index remains unchanged.

In [317]:
v[0]

Out[317]:
4

We see the 1st index of the sorted output now corresponds to the element that was in the 2nd index of v i.e.

In [318]:
v[2]

Out[318]:
5

We see the 2nd index of the sorted output now corresponds to the element that was in the 1st index of v i.e.

In [319]:
v[1]

Out[319]:
6

Therefore if we index using a list, we get the sorted array.

In [320]:
v[([0,2,1],)]

Out[320]:
array([4, 5, 6])

Note the above is commonly abbreviated to the following.

In [321]:
v[[0,2,1]]

Out[321]:
array([4, 5, 6])

We can get this list of indexes using the ndarray method argsort (an abbreviation for argument sort). This returns an array and like the method sort uses the same keyword input argument axis.

In [322]:
v.argsort()

Out[322]:
array([0, 2, 1], dtype=int64)

With the following giving the same output as the method sort.

In [323]:
v[v.argsort()]

Out[323]:
array([4, 5, 6])

The numpy function argsort behaves in a similar manner.

Instead of returning the full array we can also return the maximum and minimum value in an array using the methods max and min, as well as the index of the maximum and minimum using the methods argmax and argmin. These methods all possess a similar keyword input argument axis, however it defaults to a value of None (meaning the array is flattened and treated as a 1D array).

In [324]:
v.max()

Out[324]:
6
In [325]:
v.min()

Out[325]:
4
In [326]:
v[v.argmin()]

Out[326]:
4
In [327]:
v.argmax()

Out[327]:
1
In [328]:
v.argmin()

Out[328]:
0
In [329]:
v[v.argmax()]

Out[329]:
6

They can all be called as functions. When calling them as a function the array needs to be specified as the first positional input argument.

In [330]:
np.max(v)

Out[330]:
6
In [331]:
np.min(v)

Out[331]:
4
In [332]:
np.argmax(v)

Out[332]:
1
In [333]:
np.argmin(v)

Out[333]:
0

Suppose we want to find the maximum value in each pair of corresponding elements of two equally sized vectors.

In [334]:
v1=np.array([4,6,5])

In [335]:
v2=np.array([9,3,7])


We can create a higher dimensional 2D array m using concatenation. In this case we can explicitly set v1 and v2 to be row vectors.

In [336]:
v1=v1.reshape((1,-1))
v2=v2.reshape((1,-1))

In [337]:
v1

Out[337]:
array([[4, 6, 5]])
In [338]:
v2

Out[338]:
array([[9, 3, 7]])

We can concatenate along the rows which for a 2D array is axis 0 but more generally the rows correspond to an axis of -2.

In [339]:
m=np.concatenate((v1,v2),axis=-2)

In [340]:
m

Out[340]:
array([[4, 6, 5],
[9, 3, 7]])

For our specific query, we want to find the maximum value down the rows of every column. Because we are acting on rows, this corresponds to axis=0 for a 2D array or axis=-2 more generally.

In [341]:
m.max(axis=-2)

Out[341]:
array([9, 6, 7])
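For this specific case of an elementwise maximum of two equally sized arrays, numpy also provides the function maximum (and its counterpart minimum), which avoids the concatenation step entirely:

```python
import numpy as np

v1 = np.array([4, 6, 5])
v2 = np.array([9, 3, 7])

np.maximum(v1, v2)   # elementwise maximum of the two vectors
```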

In [342]:
v1=np.array([4,6,5])


We can have a look at the sum, product, mean, cumulative sum, cumulative product, variance and standard deviation using the methods sum, prod, mean, cumsum, cumprod, var and std respectively. All of these methods have the keyword input argument axis with a default of None, which can be set to the desired axis in a multi-dimensional array. They can also be called as functions from the numpy library.

In [343]:
v1

Out[343]:
array([4, 6, 5])

The sum is the addition of all elements in an array (or along an axis).

$$4+6+5$$
In [344]:
v1.sum()

Out[344]:
15

The prod is the multiplication of all elements in an array (or along an axis).

$$4*6*5$$
In [345]:
v1.prod()

Out[345]:
120

The mean is the addition of all elements in an array (or along an axis) divided by the number of elements in the array (or the axis).

In [346]:
v1.mean()

Out[346]:
5.0
In [347]:
v1.sum()/v1.size

Out[347]:
5.0

The cumsum calculates the cumulative sum going through all elements in an array or along an axis.

In [348]:
v1.cumsum()

Out[348]:
array([ 4, 10, 15], dtype=int32)

The cumprod calculates the cumulative product going through all elements in an array or along an axis.

In [349]:
v1.cumprod()

Out[349]:
array([  4,  24, 120], dtype=int32)

The mean gives the average value of every element in an array or along an axis. An additional metric is often useful to describe the difference of data points about the mean.

By definition the sum of differences between each datapoint and the mean will equate to 0.

To get a metric of the differences, we instead take the sum of the differences squared which will always yield a positive number. We can then divide this through by the number of datapoints (analogous to the mean).

In [350]:
v1.var()

Out[350]:
0.6666666666666666
In [351]:
((v1-v1.mean())**2).sum()/v1.size

Out[351]:
0.6666666666666666

In this mathematical expression, when one has knowledge of the mean and all the datapoints in a sample except one, the last datapoint can be calculated from the existing knowledge, so only n-1 of the differences are independent.

For this reason the delta degrees of freedom is commonly set to 1. var has a keyword input argument ddof (an abbreviation for delta degrees of freedom) which has a default value of 0. We can assign this to 1.

In [352]:
v1.var(ddof=1)

Out[352]:
1.0
In [353]:
((v1-v1.mean())**2).sum()/(v1.size-1)

Out[353]:
1.0

Notice that the variance uses the squared terms. If the unit being measured was a length, for example, the variance would be an area. It is quite useful to take the square root of the variance to get a term of the same dimensionality as the mean. This is known as the standard deviation.

In [354]:
v1.std(ddof=1)

Out[354]:
1.0
In [355]:
(((v1-v1.mean())**2).sum()/(v1.size-1))**0.5

Out[355]:
1.0

In the above a very basic example was selected which can readily be checked using pen and paper. These calculations can be scaled to larger multi-dimensional arrays with careful use of the keyword input argument axis.
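As a minimal sketch of this, the mean of the 2 row by 3 column matrix used earlier can be taken over the whole array, per column, or per row:

```python
import numpy as np

m = np.array([[4, 6, 5],
              [9, 3, 7]])

m.mean()          # mean of all elements
m.mean(axis=-2)   # mean down the rows: one value per column
m.mean(axis=-1)   # mean along the columns: one value per row
```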

In [356]:
np.pi

Out[356]:
3.141592653589793

### Array Division and Interpolation¶

Array division is the opposite process to multiplication, so let's have a look at the two multiplications we have just carried out, starting with the example where we applied inner multiplication. Assume we know the array on the left-hand side and the result of the array multiplication on the right-hand side, and we are instead trying to calculate the values of the array in the middle:

Then what we have is:

$$\left[\begin{matrix}5&3\\\end{matrix}\right]@\left[\begin{matrix}x\\y\\\end{matrix}\right]=28$$

Or:

$$5x+3y=28$$

Here we run into a problem: we have two unknowns and only a single equation. Our solution of $x=2$ and $y=6$ works because we just back-tracked from it, however we never specified that we had to have a solution in whole £s; if pennies are included then a solution such as $x=3.8$ and $y=3$ could also work. In other words, from the limited data we have, there are multiple solutions to the problem and we do not have the necessary number of equations to narrow down our result.
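This can be checked numerically; both of the price columns below (the second being one hypothetical alternative among many) satisfy the single equation:

```python
import numpy as np

quantity = np.array([[5, 3]])               # row vector of quantities

# two different hypothetical price columns both give a total of 28
assert (quantity @ np.array([[2], [6]])).item() == 28
assert (quantity @ np.array([[5], [1]])).item() == 28
```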

Let's now look at the example where we applied outer multiplication. Once again, assume we know the array on the left-hand side and the result of the array multiplication on the right-hand side, and we are instead trying to calculate the values of the array in the middle:

Then what we have is:

$$\left[\begin{matrix}5\\3\\\end{matrix}\right]@\left[\begin{matrix}x&y\\\end{matrix}\right]=\left[\begin{matrix}10&30\\6&18\\\end{matrix}\right]$$

Then we have four equations for only two unknowns.

$$5x=10$$
$$5y=30$$
$$3x=6$$
$$3y=18$$

We only need to solve two of the equations for the two unknowns, and we can see that $x=2$ and $y=6$, which is the correct answer. The other two equations confirm this result.

To solve a linear system of equations with accuracy we require the same number of equations as unknowns. This is commonly used in interpolation. If we have the following data we can ask ourselves the question, what is the velocity at time=16?

time (s) velocity (m/s)
0 0
10 227.04
15 362.78
20 517.35
22.5 602.97
30 901.67

We can estimate the value of v at t=16 using a single point, the nearest point; this is known as nearest point interpolation.

time (s) Δt (s) velocity (m/s)
0 0-16=-16 0
10 10-16=-6 227.04
15 15-16=-1 362.78
20 20-16=+4 517.35
22.5 22.5-16=+6.5 602.97
30 30-16=+14 901.67

Let's create the following arrays.

In [357]:
t_orig=np.array([0,10,15,20,22.5,30])
v_orig=np.array([0,227.04,362.78,517.35,602.97,901.67])


We can read off the table to get the nearest datapoint and use this for an estimate at t=16.

In [358]:
v16=362.78
v16

Out[358]:
362.78

To get a more accurate estimation, we can use the two nearest points. This allows us to plot a straight line between the nearest two points and the value where this straight line intersects t=16 will yield a more accurate estimate.

The equation for a straight line is:

$$v[t]=c_0+c_1 t$$

To solve for the two unknowns:

$$c_0 , c_1$$

We require 2 linear equations:

$$v_0 = c_0 + c_1 t_0$$
$$v_1 = c_0 + c_1 t_1$$

Which in matrix form is:

$$\left[\begin{matrix}v_0\\v_1\\\end{matrix}\right]=\left[\begin{matrix}c_0+c_1t_0\\c_0+c_1t_1\\\end{matrix}\right]=\left[\begin{matrix}1&t_0\\1&t_1\\\end{matrix}\right]@\left[\begin{matrix}\color{#FF0000}{c_0}\\\color{#FF0000}{c_1}\\\end{matrix}\right]$$

Now we can input our values from the table:

$$\left[\begin{matrix}362.78\\517.35\\\end{matrix}\right]=\left[\begin{matrix}{15}^0&{15}^1\\{20}^0&{20}^1\\\end{matrix}\right]@\left[\begin{matrix}\color{#FF0000}{c_0}\\\color{#FF0000}{c_1}\\\end{matrix}\right]$$
$$\left[\begin{matrix}362.78\\517.35\\\end{matrix}\right]=\left[\begin{matrix}1&15\\1&20\\\end{matrix}\right]@\left[\begin{matrix}\color{#FF0000}{c_0}\\\color{#FF0000}{c_1}\\\end{matrix}\right]$$

For convenience we can assign these arrays as:

In [359]:
v=np.array([362.78,517.35])
v=v.reshape((-1,1))
v

Out[359]:
array([[362.78],
[517.35]])
In [360]:
t=np.array([[1,15],
[1,20]])
t

Out[360]:
array([[ 1, 15],
[ 1, 20]])

To solve these linear systems of equations we need to import the linalg module. A basic version of this module is actually included in numpy and a more advanced one is imported from scipy. scipy is a scientific python library built upon numpy and split into a number of modules for specific purposes. To view the list of modules type in from scipy import followed by a tab ↹.

You will see a number of modules to import.

In [361]:
from scipy import linalg


Once the module is imported we can access a number of additional functions (or other specific modules) by typing in linalg followed by a dot . and tab ↹.

From the linalg module we want to use the function solve. We can type this in followed by a shift ⇧ and tab ↹ to view details about its input arguments.

In [362]:
? linalg.solve

Signature:
linalg.solve(
a,
b,
sym_pos=False,
lower=False,
overwrite_a=False,
overwrite_b=False,
debug=None,
check_finite=True,
assume_a='gen',
transposed=False,
)
Docstring:
Solves the linear equation set a * x = b for the unknown x
for square a matrix.

If the data matrix is known to be a particular type then supplying the
corresponding string to assume_a key chooses the dedicated solver.
The available options are

===================  ========
generic matrix       'gen'
symmetric            'sym'
hermitian            'her'
positive definite    'pos'
===================  ========

If omitted, 'gen' is the default structure.

The datatype of the arrays define which solver is called regardless
of the values. In other words, even when the complex array entries have
precisely zero imaginary parts, the complex solver will be called based
on the data type of the array.

Parameters
----------
a : (N, N) array_like
Square input data
b : (N, NRHS) array_like
Input data for the right hand side.
sym_pos : bool, optional
Assume a is symmetric and positive definite. This key is deprecated
and assume_a = 'pos' keyword is recommended instead. The functionality
is the same. It will be removed in the future.
lower : bool, optional
If True, only the data contained in the lower triangle of a. Default
is to use upper triangle. (ignored for 'gen')
overwrite_a : bool, optional
Allow overwriting data in a (may enhance performance).
Default is False.
overwrite_b : bool, optional
Allow overwriting data in b (may enhance performance).
Default is False.
check_finite : bool, optional
Whether to check that the input matrices contain only finite numbers.
Disabling may give a performance gain, but may result in problems
(crashes, non-termination) if the inputs do contain infinities or NaNs.
assume_a : str, optional
Valid entries are explained above.
transposed: bool, optional
If True, a^T x = b for real matrices, raises NotImplementedError
for complex matrices (only for True).

Returns
-------
x : (N, NRHS) ndarray
The solution array.

Raises
------
ValueError
If size mismatches detected or input a is not square.
LinAlgError
If the matrix is singular.
LinAlgWarning
If an ill-conditioned input a is detected.
NotImplementedError
If transposed is True and input a is a complex matrix.

Examples
--------
Given a and b, solve for x:

>>> a = np.array([[3, 2, 0], [1, -1, 0], [0, 5, 1]])
>>> b = np.array([2, 4, -1])
>>> from scipy import linalg
>>> x = linalg.solve(a, b)
>>> x
array([ 2., -2.,  9.])
>>> np.dot(a, x) == b
array([ True,  True,  True], dtype=bool)

Notes
-----
If the input b matrix is a 1-D array with N elements, when supplied
together with an NxN input a, it is assumed as a valid column vector
despite the apparent size mismatch. This is compatible with the
numpy.dot() behavior and the returned result is still 1-D array.

The generic, symmetric, Hermitian and positive definite solutions are
obtained via calling ?GESV, ?SYSV, ?HESV, and ?POSV routines of
LAPACK respectively.
File:      c:\users\phili\anaconda3\lib\site-packages\scipy\linalg\basic.py
Type:      function


Here we see the function solve solves an equation $ax=b$ for $x$. a and b are the positional input arguments and x is the return value. We can rearrange our problem accordingly.

$$\left[\begin{matrix}362.78\\517.35\\\end{matrix}\right]=\left[\begin{matrix}1&15\\1&20\\\end{matrix}\right]@\left[\begin{matrix}\color{#FF0000}{c_0}\\\color{#FF0000}{c_1}\\\end{matrix}\right]$$
$$\left[\begin{matrix}1&15\\1&20\\\end{matrix}\right]@\left[\begin{matrix}\color{#FF0000}{c_0}\\\color{#FF0000}{c_1}\\\end{matrix}\right]=\left[\begin{matrix}362.78\\517.35\\\end{matrix}\right]$$

the positional input arguments:

• a is our square coefficient matrix of times t
• b is our velocity column vector v.

The result x will be our vector c of coefficients which will have equal dimensions to the velocity vector.

In [363]:
c=linalg.solve(t,v)
c

Out[363]:
array([[-100.93 ],
[  30.914]])

Now that we have these coefficients we can create a time vector at time 16 (which has 2 elements, $16^0$ and $16^1$).

In [364]:
t16=np.array([1,16])
t16=t16.reshape((1,-1))
t16

Out[364]:
array([[ 1, 16]])

Then calculate a linearly interpolated value for the velocity.

In [365]:
v16=t16@c
v16

Out[365]:
array([[393.694]])
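For linear interpolation specifically, numpy also has the convenience function interp, which reproduces this value directly from the original data without setting up the equations manually:

```python
import numpy as np

t_orig = np.array([0, 10, 15, 20, 22.5, 30])
v_orig = np.array([0, 227.04, 362.78, 517.35, 602.97, 901.67])

# piecewise linear interpolation at t=16
np.interp(16, t_orig, v_orig)
```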

Instead of performing linear interpolation (2 nearest datapoints) we can perform quadratic interpolation (3 nearest datapoints). Now v and t become:

In [366]:
v=np.array([227.04,362.78,517.35])
v=v.reshape((-1,1))
v

Out[366]:
array([[227.04],
[362.78],
[517.35]])
In [367]:
t=np.array([[1,10,10**2],
[1,15,15**2],
[1,20,20**2]])
t

Out[367]:
array([[  1,  10, 100],
[  1,  15, 225],
[  1,  20, 400]])

We can solve for c.

In [368]:
c=linalg.solve(t,v)
c

Out[368]:
array([[12.05  ],
[17.733 ],
[ 0.3766]])

Now that we have these coefficients we can create a time row vector for time 16 (which has 3 elements: a 1 for the intercept term, the time 16 and its square 16**2).

In [369]:
t16=np.array([1,16,16**2])
t16=t16.reshape((1,-1))
t16

Out[369]:
array([[  1,  16, 256]])

Then calculate a quadratically interpolated value for the velocity.

In [370]:
v16=t16@c
v16

Out[370]:
array([[392.1876]])
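An alternative route to the same quadratic (a sketch using numpy's polynomial helpers rather than an explicit matrix solve): np.polyfit through exactly 3 points with degree 2 recovers the same polynomial, though it returns coefficients highest power first, the reverse of our c vector.

```python
import numpy as np

# The 3 nearest data points around t=16
t_pts = np.array([10.0, 15.0, 20.0])
v_pts = np.array([227.04, 362.78, 517.35])

# Degree-2 fit through 3 points is exact; coefficients are highest power first
coeffs = np.polyfit(t_pts, v_pts, deg=2)  # ≈ [0.3766, 17.733, 12.05]

v16 = np.polyval(coeffs, 16)
# v16 ≈ 392.1876, matching t16 @ c above
```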

We created a system of equations and then used linalg.solve to solve these equations to interpolate an unknown datapoint. scipy also has an interpolation module which we can use for this purpose.

In [371]:
t_orig=np.array([0,10,15,20,22.5,30])
v_orig=np.array([0,227.04,362.78,517.35,602.97,901.67])

In [372]:
from scipy import interpolate


Once imported, a number of interpolation functions and classes can be accessed by typing in interpolate followed by a dot . and a tab ↹.

We see that most of these are classes. We can instantiate the interp1d class to create a mathematical interpolation function which we can assign to an object name. Note that although interp1d is a class, it does not follow the usual CamelCase naming convention for classes.

In [373]:
? interpolate.interp1d

Init signature:
interpolate.interp1d(
x,
y,
kind='linear',
axis=-1,
copy=True,
bounds_error=None,
fill_value=nan,
assume_sorted=False,
)
Docstring:
Interpolate a 1-D function.

x and y are arrays of values used to approximate some function f:
y = f(x). This class returns a function whose call method uses
interpolation to find the value of new points.

Parameters
----------
x : (N,) array_like
A 1-D array of real values.
y : (...,N,...) array_like
A N-D array of real values. The length of y along the interpolation
axis must be equal to the length of x.
kind : str or int, optional
Specifies the kind of interpolation as a string or as an integer
specifying the order of the spline interpolator to use.
The string has to be one of 'linear', 'nearest', 'nearest-up', 'zero',
'slinear', 'quadratic', 'cubic', 'previous', or 'next'. 'zero',
'slinear', 'quadratic' and 'cubic' refer to a spline interpolation of
zeroth, first, second or third order; 'previous' and 'next' simply
return the previous or next value of the point; 'nearest-up' and
'nearest' differ when interpolating half-integers (e.g. 0.5, 1.5)
in that 'nearest-up' rounds up and 'nearest' rounds down. Default
is 'linear'.
axis : int, optional
Specifies the axis of y along which to interpolate.
Interpolation defaults to the last axis of y.
copy : bool, optional
If True, the class makes internal copies of x and y.
If False, references to x and y are used. The default is to copy.
bounds_error : bool, optional
If True, a ValueError is raised any time interpolation is attempted on
a value outside of the range of x (where extrapolation is
necessary). If False, out of bounds values are assigned fill_value.
By default, an error is raised unless fill_value="extrapolate".
fill_value : array-like or (array-like, array_like) or "extrapolate", optional
- if a ndarray (or float), this value will be used to fill in for
requested points outside of the data range. If not provided, then
the default is NaN. The array-like must broadcast properly to the
dimensions of the non-interpolation axes.
- If a two-element tuple, then the first element is used as a
fill value for x_new < x[0] and the second element is used for
x_new > x[-1]. Anything that is not a 2-element tuple (e.g.,
list or ndarray, regardless of shape) is taken to be a single
array-like argument meant to be used for both bounds as
below, above = fill_value, fill_value.

- If "extrapolate", then points outside the data range will be
extrapolated.

assume_sorted : bool, optional
If False, values of x can be in any order and they are sorted first.
If True, x has to be an array of monotonically increasing values.

Attributes
----------
fill_value

Methods
-------
__call__

See Also
--------
splrep, splev
Spline interpolation/smoothing based on FITPACK.
UnivariateSpline : An object-oriented wrapper of the FITPACK routines.
interp2d : 2-D interpolation

Notes
-----
Calling interp1d with NaNs present in input values results in
undefined behaviour.

Input values x and y must be convertible to float values like
int or float.

Examples
--------
>>> import matplotlib.pyplot as plt
>>> from scipy import interpolate
>>> x = np.arange(0, 10)
>>> y = np.exp(-x/3.0)
>>> f = interpolate.interp1d(x, y)

>>> xnew = np.arange(0, 9, 0.1)
>>> ynew = f(xnew)   # use interpolation function returned by interp1d
>>> plt.plot(x, y, 'o', xnew, ynew, '-')
>>> plt.show()
Init docstring: Initialize a 1-D linear interpolation class.
File:           c:\users\phili\anaconda3\lib\site-packages\scipy\interpolate\interpolate.py
Type:           type
Subclasses:


Let's change the keyword input argument kind to 'nearest'.

In [374]:
f_near=interpolate.interp1d(t_orig,v_orig,kind='nearest')


We can now apply this function to the new t data which we wish to interpolate the unknown v values at, for example the time point t=16.

In [375]:
f_near(16)

Out[375]:
array(362.78)
In [376]:
f_lin=interpolate.interp1d(t_orig,v_orig,kind='linear')

In [377]:
f_lin(16)

Out[377]:
array(393.694)
In [378]:
f_quad=interpolate.interp1d(t_orig,v_orig,kind='quadratic')

In [379]:
f_quad(16)

Out[379]:
array(392.09275532)
In [380]:
f_cubic=interpolate.interp1d(t_orig,v_orig,kind='cubic')

In [381]:
f_cubic(16)

Out[381]:
array(392.07076444)
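The four kinds tried above can also be compared side by side (a small sketch reusing the same t_orig and v_orig arrays):

```python
import numpy as np
from scipy import interpolate

t_orig = np.array([0, 10, 15, 20, 22.5, 30])
v_orig = np.array([0, 227.04, 362.78, 517.35, 602.97, 901.67])

# Build one interpolation function per kind and evaluate each at t=16
results = {}
for kind in ('nearest', 'linear', 'quadratic', 'cubic'):
    f = interpolate.interp1d(t_orig, v_orig, kind=kind)
    results[kind] = float(f(16))

results
```

The higher-order kinds agree closely with each other at t=16 but differ from the nearest-neighbour estimate, which simply returns the value at t=15.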

We can also interpolate a list of values.

In [382]:
t_new=np.arange(start=0,stop=31,step=1)
t_new

Out[382]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])
In [383]:
f_new=f_cubic(t_new)
f_new

Out[383]:
array([  0.        ,  20.51444   ,  41.45489778,  62.84408   ,
84.70469333, 107.05944444, 129.93104   , 153.34218667,
177.31559111, 201.87396   , 227.04      , 252.83641778,
279.28592   , 306.41121333, 334.23500444, 362.78      ,
392.07076444, 422.13929333, 453.01944   , 484.74505778,
517.35      , 550.87058   , 585.35295111, 620.84572667,
657.39752   , 695.05694444, 733.87261333, 773.89314   ,
815.16713778, 857.74322   , 901.67      ])
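The docstring above notes that querying outside the range of x raises a ValueError unless fill_value="extrapolate" is supplied. A brief sketch, reusing the same data:

```python
import numpy as np
from scipy import interpolate

t_orig = np.array([0, 10, 15, 20, 22.5, 30])
v_orig = np.array([0, 227.04, 362.78, 517.35, 602.97, 901.67])

# fill_value='extrapolate' extends the interpolant beyond [0, 30]
f_extrap = interpolate.interp1d(t_orig, v_orig, kind='linear',
                                fill_value='extrapolate')

v32 = float(f_extrap(32))  # linearly extrapolated past the last data point
```

Extrapolation should be used with care, as the fitted trend may not hold outside the measured range.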

The constants module contains a number of useful scientific constants.

In [384]:
from scipy import constants


Once imported the constants can be accessed by typing in constants followed by a dot . and a tab ↹.

These are mainly attributes. Although attributes are normally lower case, in this module attributes named after scientists are capitalized.

In [385]:
constants.Avogadro

Out[385]:
6.02214076e+23
In [386]:
constants.Boltzmann

Out[386]:
1.380649e-23
In [387]:
constants.G

Out[387]:
6.6743e-11
In [388]:
constants.angstrom

Out[388]:
1e-10
In [389]:
constants.atomic_mass

Out[389]:
1.6605390666e-27
In [390]:
constants.electron_volt

Out[390]:
1.602176634e-19
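Beyond the plain attributes above, the constants module also exposes the physical_constants dictionary, which maps a CODATA name to a (value, unit, uncertainty) tuple, alongside SI prefixes and unit conversion factors:

```python
from scipy import constants

# physical_constants maps a CODATA name to (value, unit, uncertainty)
mass, unit, uncertainty = constants.physical_constants['electron mass']
# unit is 'kg'

# SI prefixes and unit conversion factors are plain attributes
constants.kilo    # 1000.0
constants.minute  # 60.0 (one minute in seconds)
```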

Before diving any deeper into scipy it is recommended to familiarize yourself with two other commonly used Python libraries. I have put together additional JupyterLab Notebooks on these:

Python and Data Analysis Library (pandas)

The Python Plotting Library (matplotlib)