Numpy - multidimensional data arrays¶

Based on lectures from http://github.com/jrjohansson/scientific-python-lectures

The numpy package (module) is used in almost all numerical computation using Python. It is a package that provide high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

In [1]:

from numpy import *

Creating `numpy` arrays¶

There are a number of ways to initialize new numpy arrays, for example from

a Python list or tuples
using functions that are dedicated to generating numpy arrays, such as arange, linspace, zeros, ones, etc.
reading data from files

In [2]:

M = array([[1,2],[3,4]])  # from list
M.shape, M.size, M.dtype

Out[2]:

((2, 2), 4, dtype('int64'))

In [3]:

array(((1,2),(3,5.0+0j)))

Out[3]:

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 5.+0.j]])

In [4]:

M = array([[1,2],[3,4]], dtype=complex)  # from list
print(M)
M.shape, M.size, M.dtype

[[1.+0.j 2.+0.j]
 [3.+0.j 4.+0.j]]

Out[4]:

((2, 2), 4, dtype('complex128'))

In [5]:

arange(1,10,0.1)

Out[5]:

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
       3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
       4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1,
       6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1, 7.2, 7.3, 7.4,
       7.5, 7.6, 7.7, 7.8, 7.9, 8. , 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7,
       8.8, 8.9, 9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9])

In [6]:

array(range(10,100,1))/10.

Out[6]:

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
       3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
       4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1,
       6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1, 7.2, 7.3, 7.4,
       7.5, 7.6, 7.7, 7.8, 7.9, 8. , 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7,
       8.8, 8.9, 9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9])

In [7]:

linspace(1,10,91)

Out[7]:

array([ 1. ,  1.1,  1.2,  1.3,  1.4,  1.5,  1.6,  1.7,  1.8,  1.9,  2. ,
        2.1,  2.2,  2.3,  2.4,  2.5,  2.6,  2.7,  2.8,  2.9,  3. ,  3.1,
        3.2,  3.3,  3.4,  3.5,  3.6,  3.7,  3.8,  3.9,  4. ,  4.1,  4.2,
        4.3,  4.4,  4.5,  4.6,  4.7,  4.8,  4.9,  5. ,  5.1,  5.2,  5.3,
        5.4,  5.5,  5.6,  5.7,  5.8,  5.9,  6. ,  6.1,  6.2,  6.3,  6.4,
        6.5,  6.6,  6.7,  6.8,  6.9,  7. ,  7.1,  7.2,  7.3,  7.4,  7.5,
        7.6,  7.7,  7.8,  7.9,  8. ,  8.1,  8.2,  8.3,  8.4,  8.5,  8.6,
        8.7,  8.8,  8.9,  9. ,  9.1,  9.2,  9.3,  9.4,  9.5,  9.6,  9.7,
        9.8,  9.9, 10. ])

In [8]:

logspace(-3,2,20)

Out[8]:

array([1.00000000e-03, 1.83298071e-03, 3.35981829e-03, 6.15848211e-03,
       1.12883789e-02, 2.06913808e-02, 3.79269019e-02, 6.95192796e-02,
       1.27427499e-01, 2.33572147e-01, 4.28133240e-01, 7.84759970e-01,
       1.43844989e+00, 2.63665090e+00, 4.83293024e+00, 8.85866790e+00,
       1.62377674e+01, 2.97635144e+01, 5.45559478e+01, 1.00000000e+02])

In [9]:

x = arange(0,10,0.5)  # linear mesh start:stop:increment
print(x)
y = linspace(0,10,21) # linear mesh start,stop,number of points
print(y)
z = logspace(-3,2,10) # log mesh, 10^start, 10^stop, number of points
print(z)

[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5.  5.5 6.  6.5 7.  7.5 8.  8.5
 9.  9.5]
[ 0.   0.5  1.   1.5  2.   2.5  3.   3.5  4.   4.5  5.   5.5  6.   6.5
  7.   7.5  8.   8.5  9.   9.5 10. ]
[1.00000000e-03 3.59381366e-03 1.29154967e-02 4.64158883e-02
 1.66810054e-01 5.99484250e-01 2.15443469e+00 7.74263683e+00
 2.78255940e+01 1.00000000e+02]

In [10]:

Z=zeros((3,3,2),dtype=complex)

Z.shape, Z.size, Z.dtype

Out[10]:

((3, 3, 2), 18, dtype('complex128'))

In [11]:

Z=ones((3,3),dtype=float)*1j

Z.shape, Z.size, Z.dtype

Out[11]:

((3, 3), 9, dtype('complex128'))

In [12]:

Z=ones((3,3))
# Z *= 1j   # issues an error, but would not work
Z = Z*1j
Z.shape, Z.size, Z.dtype

Out[12]:

((3, 3), 9, dtype('complex128'))

Random number generators¶

In [13]:

from numpy import random

print( random.rand() )  # uniformly distributed random number between [0,1]

print( random.rand(5,5) ) # uniform distributed (5x5) matrix

print( random.randn(5,5) ) # standard normal distribution random matrix

0.17129319280423427
[[0.68136812 0.48474291 0.19586362 0.05304944 0.13935424]
 [0.61944048 0.73399677 0.5378894  0.63996137 0.18797896]
 [0.30404113 0.98573661 0.46519993 0.86791191 0.85096155]
 [0.16435511 0.31602604 0.96019131 0.06160088 0.21851716]
 [0.02242947 0.50900989 0.79989502 0.42830787 0.23565606]]
[[ 0.50124616 -0.18421149 -0.94434126 -0.09408017 -0.76212889]
 [-1.25422183 -0.48305494  1.91924483 -0.72006581 -1.12775672]
 [-0.83642159  0.12483796 -0.06792395 -0.87661959 -1.00020029]
 [-1.69659347  1.58098831  0.24925687 -1.34600216 -0.09466769]
 [-0.90416851  0.41066708 -0.77173271 -0.22797163  0.96824553]]

File I/O¶

Very common form is comma-separated values (CSV) or tab-separated values (TSV). To read data from such files into Numpy arrays we can use the numpy.loadtxt or numpy.genfromtxt

File stockholm_td_adj.dat.txt contains Stockholm temperature over the years. The columns are [$year$,$month$,$day$,$T_{average}$,$T_{min}$,$T_{max}$]

In [14]:

## Check if file exists
!tail stockholm_td_adj.dat.txt

2011 12 22    -0.4    -1.0    -1.0 1
2011 12 23     3.7     3.1     3.1 1
2011 12 24     3.2     2.6     2.6 1
2011 12 25     4.2     3.5     3.5 1
2011 12 26     8.2     7.5     7.5 1
2011 12 27     8.3     7.6     7.6 1
2011 12 28     2.6     1.9     1.9 1
2011 12 29     4.9     4.2     4.2 1
2011 12 30     0.6    -0.1    -0.1 1
2011 12 31    -2.6    -3.3    -3.3 1

In [15]:

data = loadtxt('stockholm_td_adj.dat.txt')
data.shape

Out[15]:

(77431, 7)

In [16]:

# inline figures from matplotlib
%matplotlib inline
import matplotlib.pyplot as plt

In [17]:

import pylab as plt

In [18]:

# time in years when we have year/month/day
t = data[:,0]+data[:,1]/12.+data[:,2]/365

In [19]:

t[:10]

Out[19]:

array([1800.08607306, 1800.08881279, 1800.09155251, 1800.09429224,
       1800.09703196, 1800.09977169, 1800.10251142, 1800.10525114,
       1800.10799087, 1800.11073059])

In [20]:

plt.plot(t, data[:,3])

Out[20]:

[<matplotlib.lines.Line2D at 0x7fe0a8dde4f0>]

In [21]:

# a bit more extended in x-direction
plt.figure(figsize=(14,4))
plt.plot(t, data[:,3])
plt.title('temperature in Stockholm')
plt.xlabel('year')
plt.ylabel('temperature $[\degree C]$');

Lets save the data in the form [t,$T_{average}$]

In [22]:

a=[1,2,3]
b=[4,5,6]
vstack((a,b))

Out[22]:

array([[1, 2, 3],
       [4, 5, 6]])

In [23]:

vstack((t,data[:,3])).T.shape

Out[23]:

(77431, 2)

In [24]:

data.shape

Out[24]:

(77431, 7)

In [25]:

savetxt('StockholmT.dat', vstack((t,data[:,3])).T)

In [26]:

!tail StockholmT.dat

2.012060273972602772e+03 -4.000000000000000222e-01
2.012063013698630130e+03 3.700000000000000178e+00
2.012065753424657487e+03 3.200000000000000178e+00
2.012068493150684844e+03 4.200000000000000178e+00
2.012071232876712429e+03 8.199999999999999289e+00
2.012073972602739786e+03 8.300000000000000711e+00
2.012076712328767144e+03 2.600000000000000089e+00
2.012079452054794501e+03 4.900000000000000355e+00
2.012082191780821859e+03 5.999999999999999778e-01
2.012084931506849216e+03 -2.600000000000000089e+00

More efficient binary storage of data to the disc¶

In [27]:

save('ST_data',data)
!ls -ltr

total 38048
-rw-r--r--@ 1 haule  staff  1756407 Dec 18  2020 Scientific-Computing-with-Python.pdf
-rw-r--r--@ 1 haule  staff   611082 Jan 20  2021 00_Introduction.html
-rw-r--r--@ 1 haule  staff   615891 Jan 20  2021 01_Basic_Python.html
-rw-r--r--@ 1 haule  staff   627779 Jan 20  2021 02_Numpy.html
-rw-r--r--@ 1 haule  staff  1303621 Mar  4  2021 03_Scipy.html
-rw-r--r--@ 1 haule  staff  1040661 Mar  4  2021 04_Scipy_example_Hydrogen_atom.html
-rw-r--r--  1 haule  staff   210051 Jan  3 17:57 02_Numpy.ipynb
-rw-r--r--  1 haule  staff   826208 Jan  3 17:57 03_Scipy.ipynb
-rw-r--r--  1 haule  staff   533268 Jan  3 17:57 04_Scipy_example_Hydrogen_atom.ipynb
-rw-r--r--@ 1 haule  staff   430767 Jan  3 17:57 double_pendulum.mp4
-rw-r--r--  1 haule  staff     2038 Jan  3 17:57 pendulum.py
-rw-r--r--  1 haule  staff  2864984 Jan  3 17:57 stockholm_td_adj.dat.txt
-rw-r--r--  1 haule  staff      264 Jan  3 17:57 ttt.cc
-rw-r--r--  1 haule  staff    24263 Jan  6 16:06 00_Introduction.ipynb
-rw-r--r--  1 haule  staff      752 Jan  6 16:06 mymodule.py
drwxr-xr-x  3 haule  staff       96 Jan  6 16:06 __pycache__
-rw-r--r--  1 haule  staff    52785 Jan  6 16:07 01_Basic_Python.ipynb
-rw-r--r--  1 haule  staff  3889864 Jan  6 16:07 StockholmT.dat
-rw-r--r--  1 haule  staff  4336264 Jan  6 16:07 ST_data.npy

In [28]:

data2=load('ST_data.npy')

In [29]:

allclose(data,data2)  # how close are the two sets of data

Out[29]:

True

In [30]:

max(abs(data-data2).ravel())

Out[30]:

0.0

In [31]:

amax(abs(data-data2))

Out[31]:

0.0

Manipulating data¶

Indexing and slicing¶

data[lower:upper:step, lower:upper:step]

In [32]:

print(data[:,0])  # display year for all datapoints
array(data[::365,0],dtype=int) # the years with 365 spacings, and then last years

[1800. 1800. 1800. ... 2011. 2011. 2011.]

Out[32]:

array([1800, 1801, 1802, 1803, 1804, 1804, 1805, 1806, 1807, 1808, 1809,
       1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817, 1818, 1819, 1820,
       1821, 1822, 1823, 1824, 1825, 1826, 1827, 1828, 1829, 1830, 1831,
       1832, 1833, 1834, 1835, 1836, 1837, 1838, 1839, 1840, 1841, 1842,
       1843, 1844, 1845, 1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853,
       1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863, 1864,
       1865, 1866, 1867, 1868, 1869, 1870, 1871, 1872, 1873, 1874, 1875,
       1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, 1885, 1886,
       1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894, 1895, 1896, 1897,
       1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908,
       1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919,
       1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930,
       1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941,
       1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952,
       1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963,
       1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974,
       1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985,
       1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996,
       1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
       2008, 2009, 2010, 2011])

Fancy indexing Index is itself an array of integer numbers, i.e, which rows or columns?

In [33]:

x=arange(100)
print(x[:10:2])
print(x[ [0,2,4,6,8] ])  # fancy indexing. Does not exists in C++

[0 2 4 6 8]
[0 2 4 6 8]

In [34]:

data[[0,365,2*365,3*365,4*365,5*365+1],0] # fancy indexing for multidimensional array

Out[34]:

array([1800., 1801., 1802., 1803., 1804., 1805.])

In [35]:

data[range(0,365*20,365),0]

Out[35]:

array([1800., 1801., 1802., 1803., 1804., 1804., 1805., 1806., 1807.,
       1808., 1809., 1810., 1811., 1812., 1813., 1814., 1815., 1816.,
       1817., 1818.])

In [36]:

data[range(3,3+365*10,365),0]

Out[36]:

array([1800., 1801., 1802., 1803., 1804., 1805., 1806., 1807., 1808.,
       1809.])

Using mask to pick data*

In addition to fancy indexing, Python allows yet another way to pick the data. Create a mask of [True,....False....] values, and pick from the array only columns/rows where True.

How to compute average temperature in the year of 1973?

In [37]:

data[:,0] >= 1973
data[:,0] < 1974

Out[37]:

array([ True,  True,  True, ..., False, False, False])

In [38]:

# Create mask for the year 1973
mask = logical_and(data[:,0] >= 1973, data[:,0] < 1974)

In [39]:

mask

Out[39]:

array([False, False, False, ..., False, False, False])

In [40]:

data[mask,0]  # All should have 1973

Out[40]:

array([1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.,
       1973., 1973., 1973., 1973., 1973.])

In [41]:

T1973 = data[mask,3]
print('Average temperature in 1973=', sum(T1973)/len(T1973))

Average temperature in 1973= 7.414794520547944

In [42]:

where(mask)

Out[42]:

(array([63187, 63188, 63189, 63190, 63191, 63192, 63193, 63194, 63195,
        63196, 63197, 63198, 63199, 63200, 63201, 63202, 63203, 63204,
        63205, 63206, 63207, 63208, 63209, 63210, 63211, 63212, 63213,
        63214, 63215, 63216, 63217, 63218, 63219, 63220, 63221, 63222,
        63223, 63224, 63225, 63226, 63227, 63228, 63229, 63230, 63231,
        63232, 63233, 63234, 63235, 63236, 63237, 63238, 63239, 63240,
        63241, 63242, 63243, 63244, 63245, 63246, 63247, 63248, 63249,
        63250, 63251, 63252, 63253, 63254, 63255, 63256, 63257, 63258,
        63259, 63260, 63261, 63262, 63263, 63264, 63265, 63266, 63267,
        63268, 63269, 63270, 63271, 63272, 63273, 63274, 63275, 63276,
        63277, 63278, 63279, 63280, 63281, 63282, 63283, 63284, 63285,
        63286, 63287, 63288, 63289, 63290, 63291, 63292, 63293, 63294,
        63295, 63296, 63297, 63298, 63299, 63300, 63301, 63302, 63303,
        63304, 63305, 63306, 63307, 63308, 63309, 63310, 63311, 63312,
        63313, 63314, 63315, 63316, 63317, 63318, 63319, 63320, 63321,
        63322, 63323, 63324, 63325, 63326, 63327, 63328, 63329, 63330,
        63331, 63332, 63333, 63334, 63335, 63336, 63337, 63338, 63339,
        63340, 63341, 63342, 63343, 63344, 63345, 63346, 63347, 63348,
        63349, 63350, 63351, 63352, 63353, 63354, 63355, 63356, 63357,
        63358, 63359, 63360, 63361, 63362, 63363, 63364, 63365, 63366,
        63367, 63368, 63369, 63370, 63371, 63372, 63373, 63374, 63375,
        63376, 63377, 63378, 63379, 63380, 63381, 63382, 63383, 63384,
        63385, 63386, 63387, 63388, 63389, 63390, 63391, 63392, 63393,
        63394, 63395, 63396, 63397, 63398, 63399, 63400, 63401, 63402,
        63403, 63404, 63405, 63406, 63407, 63408, 63409, 63410, 63411,
        63412, 63413, 63414, 63415, 63416, 63417, 63418, 63419, 63420,
        63421, 63422, 63423, 63424, 63425, 63426, 63427, 63428, 63429,
        63430, 63431, 63432, 63433, 63434, 63435, 63436, 63437, 63438,
        63439, 63440, 63441, 63442, 63443, 63444, 63445, 63446, 63447,
        63448, 63449, 63450, 63451, 63452, 63453, 63454, 63455, 63456,
        63457, 63458, 63459, 63460, 63461, 63462, 63463, 63464, 63465,
        63466, 63467, 63468, 63469, 63470, 63471, 63472, 63473, 63474,
        63475, 63476, 63477, 63478, 63479, 63480, 63481, 63482, 63483,
        63484, 63485, 63486, 63487, 63488, 63489, 63490, 63491, 63492,
        63493, 63494, 63495, 63496, 63497, 63498, 63499, 63500, 63501,
        63502, 63503, 63504, 63505, 63506, 63507, 63508, 63509, 63510,
        63511, 63512, 63513, 63514, 63515, 63516, 63517, 63518, 63519,
        63520, 63521, 63522, 63523, 63524, 63525, 63526, 63527, 63528,
        63529, 63530, 63531, 63532, 63533, 63534, 63535, 63536, 63537,
        63538, 63539, 63540, 63541, 63542, 63543, 63544, 63545, 63546,
        63547, 63548, 63549, 63550, 63551]),)

In [43]:

# where tells you the index where True
indices = where(mask)
X1973 = data[indices,3]; # This gives similar data in 1973, but not identical

In [44]:

print(T1973.shape, X1973.shape, X1973[0].shape)
print('Average temperature in 1973=', sum(X1973[0,:])/len(X1973[0,:]))
print('Min/Max temperature in 1973=', min(X1973[0,:]),'/',max(X1973[0,:]))

(365,) (1, 365) (365,)
Average temperature in 1973= 7.414794520547944
Min/Max temperature in 1973= -12.8 / 25.6

What is the mean monthly temperatures for each month of the year?

Let's do Ferbrurary first

In [45]:

Febr=data[:,1]==2
mean(data[Febr,3])

Out[45]:

-3.212109570736596

Now loop for all months

In [46]:

monthly_mean=[mean(data[data[:,1]==month,3]) for month in range(1,13)]

In [47]:

print(monthly_mean)

[-3.0447656725502132, -3.212109570736596, -0.8321515520389532, 3.888474842767295, 9.560970785149117, 14.659591194968554, 17.318837492391967, 16.117650639074864, 11.81937106918239, 6.760057821059039, 1.9458490566037738, -1.211275106512477]

In [48]:

plt.bar(range(1,13),monthly_mean);
plt.xlabel('month')
plt.ylabel('average temperature')

Out[48]:

Text(0, 0.5, 'average temperature')

Linear Algebra¶

It is implemented in low level fortran/C code, and is much more efficient than code written in Python.

In [49]:

A = random.rand(3,3)


print(A * A)  # It is not matrix-matrix product, but element-wise product
print()
print(multiply(A,A))
print()
print(A @ A)  # It is a matrix product
print()
print(matmul(A,A))

[[0.22276154 0.0009548  0.61689552]
 [0.62565466 0.0665923  0.19328271]
 [0.70376555 0.60291592 0.16846522]]

[[0.22276154 0.0009548  0.61689552]
 [0.62565466 0.0665923  0.19328271]
 [0.70376555 0.60291592 0.16846522]]

[[0.90610321 0.63242347 0.70666229]
 [0.94625905 0.43240315 0.91515852]
 [1.35444983 0.54499666 1.16873531]]

[[0.90610321 0.63242347 0.70666229]
 [0.94625905 0.43240315 0.91515852]
 [1.35444983 0.54499666 1.16873531]]

Matrix product or matrix-vector product can be performed by dot command

In [50]:

print(dot(A,A))
print(matmul(A,A))
# dot == A[i,j,:] * B[l,:,n] 

[[0.90610321 0.63242347 0.70666229]
 [0.94625905 0.43240315 0.91515852]
 [1.35444983 0.54499666 1.16873531]]
[[0.90610321 0.63242347 0.70666229]
 [0.94625905 0.43240315 0.91515852]
 [1.35444983 0.54499666 1.16873531]]

In [51]:

v1 = random.rand(3)
print(v1)
print( dot(A,v1) ) # matrix.vector product
print( dot(v1,v1) ) # length of vector^2

[0.81999594 0.24776367 0.00199407]
[0.39624061 0.71341642 0.88110177]
0.7337841462337032

In [52]:

print(A@v1)
print(dot(A,v1))

[0.39624061 0.71341642 0.88110177]
[0.39624061 0.71341642 0.88110177]

In [53]:

w=A*v1
sum(w,axis=1)

Out[53]:

array([0.39624061, 0.71341642, 0.88110177])

Slightly less efficient, but nicer code can be obtained by matrix clas

In [54]:

M = matrix(A)
v = matrix(v1).T # create a column vector

In [55]:

M*M  # this is now matrix-matrix product

Out[55]:

matrix([[0.90610321, 0.63242347, 0.70666229],
        [0.94625905, 0.43240315, 0.91515852],
        [1.35444983, 0.54499666, 1.16873531]])

In [56]:

M.T

Out[56]:

matrix([[0.4719762 , 0.79098335, 0.83890735],
        [0.03089979, 0.25805484, 0.7764766 ],
        [0.78542697, 0.4396393 , 0.41044515]])

In [57]:

M*v # this is matrix*vector product

Out[57]:

matrix([[0.39624061],
        [0.71341642],
        [0.88110177]])

In [58]:

v.T * M # vector*matrix product

Out[58]:

matrix([[0.58466834, 0.09082266, 0.75379202]])

In [59]:

v.T * v # inner-product

Out[59]:

matrix([[0.73378415]])

Array/Matrix transformations

.T or transpose(M) transposes matrix
.H hermitian conjugate
conjugate(M) conjugates
real(M) and imag(M) takes real and imaginary part of the matrix

In [60]:

conjugate(A.T)

Out[60]:

array([[0.4719762 , 0.79098335, 0.83890735],
       [0.03089979, 0.25805484, 0.7764766 ],
       [0.78542697, 0.4396393 , 0.41044515]])

More advanced linear algebra operations¶

Library linalg:

linalg.det(A)
linalg.inv(A) or just M.I
linalg.eig, linalg.eigvals, linalg.eigh
linalg.svd()
linalg.solve()
linalg.cholesky()

In [61]:

from numpy import linalg

In [62]:

print( linalg.det(A) )
linalg.inv(A)

0.20259783890966102

Out[62]:

array([[-1.16216576,  2.94762767, -0.93336865],
       [ 0.21797549, -2.29607639,  2.04227434],
       [ 1.96298224, -1.68094794,  0.48053093]])

In [63]:

M.I

Out[63]:

matrix([[-1.16216576,  2.94762767, -0.93336865],
        [ 0.21797549, -2.29607639,  2.04227434],
        [ 1.96298224, -1.68094794,  0.48053093]])

The eigenvalue problem for a matrix $A$:

$\displaystyle A v_n = \lambda_n v_n$

where $v_n$ is the $n$th eigenvector and $\lambda_n$ is the $n$th eigenvalue.

To calculate eigenvalues of a matrix, use the eigvals (symmetric/hermitian eigvalsh) and for calculating both eigenvalues and eigenvectors, use the function eig (or eigh):

In [64]:

linalg.eigvals(A)

Out[64]:

array([ 1.59750194+0.j        , -0.22851287+0.27313645j,
       -0.22851287-0.27313645j])

In [65]:

linalg.eigvalsh(A)

Out[65]:

array([-0.45257226, -0.39695136,  1.98999982])

In [66]:

A = array([[1,2,3], [4,5,6], [7,8,9]])
b = array([1,2,3])
x = linalg.solve(A,b) # A*x==b
print(x)
dot(A,x)-b

[-0.33333333  0.66666667  0.        ]

Out[66]:

array([ 0.00000000e+00, -2.22044605e-16,  0.00000000e+00])

sum, cumsum, trace, diag¶

In [67]:

v1

Out[67]:

array([0.81999594, 0.24776367, 0.00199407])

In [68]:

print( sum(v1) )
print( cumsum(v1) )
print( trace(A) )
print( diag(A) )
print( sum(diag(A)) )

1.06975367109378
[0.81999594 1.0677596  1.06975367]
15
[1 5 9]
15

Reshaping, resizing, and stacking arrays¶

In [69]:

print(A.shape)
Ag = reshape(A, (9,1))  # this is not new data
print(Ag.shape)

(3, 3)
(9, 1)

In [70]:

Ag[0]=10
A        # we change A when we change Ag

Out[70]:

array([[10,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9]])

In [71]:

Ax = A.flatten()  # flatten creates 1D array of all data, but creates a copy
Ax[8]=100         # changing a copy
print(A)

[[10  2  3]
 [ 4  5  6]
 [ 7  8  9]]

In [72]:

Ay=A.ravel()
Ay[8]=100
print(A)

[[ 10   2   3]
 [  4   5   6]
 [  7   8 100]]

Vectorizing functions¶

Every function written in Python is very slow. However numpy type operations are fast, because they are written in fortran/C

In [73]:

Temp = data[:,3]

Temp**2  #  this is fast, written in C

Out[73]:

array([ 37.21, 237.16, 225.  , ...,  24.01,   0.36,   6.76])

In [74]:

array([Temp[i]**2 for i in range(len(Temp))])  # This is slow, written in Python

Out[74]:

array([ 37.21, 237.16, 225.  , ...,  24.01,   0.36,   6.76])

What if we have a function that can not simply work on arrays?

For example, theta function?

In [75]:

def Theta(x):
    if x>=0:
        return 1
    else:
        return 0

In [76]:

# Does not work on array
Theta(Temp)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[76], line 2
      1 # Does not work on array
----> 2 Theta(Temp)

Cell In[75], line 2, in Theta(x)
      1 def Theta(x):
----> 2     if x>=0:
      3         return 1
      4     else:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

We can vectorize Theta, to make it applicable to arrays.

This is simply achieved by call to numpy function vectorize, which will create low-level routine from your function

In [77]:

Theta_vec = vectorize(Theta)

In [78]:

# This is very fast now, and creates 0 or ones
positive_temperatures=Theta_vec(Temp)
positive_temperatures

Out[78]:

array([0, 0, 0, ..., 1, 1, 0])

How to calculate number of days in a year with positive temperatures?

In [79]:

# Boolean array to select data with positive temperatures
positives = array(positive_temperatures, dtype=bool)
# keeps data with positive temperatures only
kept = data[positives,0]
# now we just need to check how many of these data are in each year

In [80]:

years = list(range(1800,2013,1))
hist = histogram(kept, bins=years)

In [81]:

plt.plot(hist[1][:-1], hist[0]);

In [82]:

negative = array(1-Theta_vec(Temp), dtype=bool)
kept2 = data[negative,0]
hist2 = histogram(kept2, bins=years)
plt.plot(hist2[1][:-1], hist[0]);

In [ ]: