Based on lectures from http://github.com/jrjohansson/scientific-python-lectures
The numpy
package (module) is used in almost all numerical computation using Python. It is a package that provide high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran so when calculations are vectorized (formulated with vectors and matrices), performance is very good.
from numpy import *
numpy
arrays¶There are a number of ways to initialize new numpy arrays, for example from
arange
, linspace
, zeros
, ones
, etc.M = array([[1,2],[3,4]]) # from list
M.shape, M.size, M.dtype
((2, 2), 4, dtype('int64'))
array(((1,2),(3,5.0+0j)))
array([[1.+0.j, 2.+0.j], [3.+0.j, 5.+0.j]])
M = array([[1,2],[3,4]], dtype=complex) # from list
print(M)
M.shape, M.size, M.dtype
[[1.+0.j 2.+0.j] [3.+0.j 4.+0.j]]
((2, 2), 4, dtype('complex128'))
arange(1,10,0.1)
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8. , 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9])
array(range(10,100,1))/10.
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8. , 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9])
linspace(1,10,91)
array([ 1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8. , 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10. ])
logspace(-3,2,20)
array([1.00000000e-03, 1.83298071e-03, 3.35981829e-03, 6.15848211e-03, 1.12883789e-02, 2.06913808e-02, 3.79269019e-02, 6.95192796e-02, 1.27427499e-01, 2.33572147e-01, 4.28133240e-01, 7.84759970e-01, 1.43844989e+00, 2.63665090e+00, 4.83293024e+00, 8.85866790e+00, 1.62377674e+01, 2.97635144e+01, 5.45559478e+01, 1.00000000e+02])
x = arange(0,10,0.5) # linear mesh start:stop:increment
print(x)
y = linspace(0,10,21) # linear mesh start,stop,number of points
print(y)
z = logspace(-3,2,10) # log mesh, 10^start, 10^stop, number of points
print(z)
[0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5 6. 6.5 7. 7.5 8. 8.5 9. 9.5] [ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5 6. 6.5 7. 7.5 8. 8.5 9. 9.5 10. ] [1.00000000e-03 3.59381366e-03 1.29154967e-02 4.64158883e-02 1.66810054e-01 5.99484250e-01 2.15443469e+00 7.74263683e+00 2.78255940e+01 1.00000000e+02]
Z=zeros((3,3,2),dtype=complex)
Z.shape, Z.size, Z.dtype
((3, 3, 2), 18, dtype('complex128'))
Z=ones((3,3),dtype=float)*1j
Z.shape, Z.size, Z.dtype
((3, 3), 9, dtype('complex128'))
Z=ones((3,3))
# Z *= 1j # issues an error, but would not work
Z = Z*1j
Z.shape, Z.size, Z.dtype
((3, 3), 9, dtype('complex128'))
from numpy import random
print( random.rand() ) # uniformly distributed random number between [0,1]
print( random.rand(5,5) ) # uniform distributed (5x5) matrix
print( random.randn(5,5) ) # standard normal distribution random matrix
0.17129319280423427 [[0.68136812 0.48474291 0.19586362 0.05304944 0.13935424] [0.61944048 0.73399677 0.5378894 0.63996137 0.18797896] [0.30404113 0.98573661 0.46519993 0.86791191 0.85096155] [0.16435511 0.31602604 0.96019131 0.06160088 0.21851716] [0.02242947 0.50900989 0.79989502 0.42830787 0.23565606]] [[ 0.50124616 -0.18421149 -0.94434126 -0.09408017 -0.76212889] [-1.25422183 -0.48305494 1.91924483 -0.72006581 -1.12775672] [-0.83642159 0.12483796 -0.06792395 -0.87661959 -1.00020029] [-1.69659347 1.58098831 0.24925687 -1.34600216 -0.09466769] [-0.90416851 0.41066708 -0.77173271 -0.22797163 0.96824553]]
Very common form is comma-separated values (CSV) or tab-separated values (TSV). To read data from such files into Numpy arrays we can use the numpy.loadtxt
or numpy.genfromtxt
File stockholm_td_adj.dat.txt
contains Stockholm temperature over the years. The columns are [$year$,$month$,$day$,$T_{average}$,$T_{min}$,$T_{max}$]
## Check if file exists
!tail stockholm_td_adj.dat.txt
2011 12 22 -0.4 -1.0 -1.0 1 2011 12 23 3.7 3.1 3.1 1 2011 12 24 3.2 2.6 2.6 1 2011 12 25 4.2 3.5 3.5 1 2011 12 26 8.2 7.5 7.5 1 2011 12 27 8.3 7.6 7.6 1 2011 12 28 2.6 1.9 1.9 1 2011 12 29 4.9 4.2 4.2 1 2011 12 30 0.6 -0.1 -0.1 1 2011 12 31 -2.6 -3.3 -3.3 1
data = loadtxt('stockholm_td_adj.dat.txt')
data.shape
(77431, 7)
# inline figures from matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import pylab as plt
# time in years when we have year/month/day
t = data[:,0]+data[:,1]/12.+data[:,2]/365
t[:10]
array([1800.08607306, 1800.08881279, 1800.09155251, 1800.09429224, 1800.09703196, 1800.09977169, 1800.10251142, 1800.10525114, 1800.10799087, 1800.11073059])
plt.plot(t, data[:,3])
[<matplotlib.lines.Line2D at 0x7fe0a8dde4f0>]
# a bit more extended in x-direction
plt.figure(figsize=(14,4))
plt.plot(t, data[:,3])
plt.title('temperature in Stockholm')
plt.xlabel('year')
plt.ylabel('temperature $[\degree C]$');
Lets save the data in the form [t,$T_{average}$]
a=[1,2,3]
b=[4,5,6]
vstack((a,b))
array([[1, 2, 3], [4, 5, 6]])
vstack((t,data[:,3])).T.shape
(77431, 2)
data.shape
(77431, 7)
savetxt('StockholmT.dat', vstack((t,data[:,3])).T)
!tail StockholmT.dat
2.012060273972602772e+03 -4.000000000000000222e-01 2.012063013698630130e+03 3.700000000000000178e+00 2.012065753424657487e+03 3.200000000000000178e+00 2.012068493150684844e+03 4.200000000000000178e+00 2.012071232876712429e+03 8.199999999999999289e+00 2.012073972602739786e+03 8.300000000000000711e+00 2.012076712328767144e+03 2.600000000000000089e+00 2.012079452054794501e+03 4.900000000000000355e+00 2.012082191780821859e+03 5.999999999999999778e-01 2.012084931506849216e+03 -2.600000000000000089e+00
save('ST_data',data)
!ls -ltr
total 38048
-rw-r--r--@ 1 haule staff 1756407 Dec 18 2020 Scientific-Computing-with-Python.pdf
-rw-r--r--@ 1 haule staff 611082 Jan 20 2021 00_Introduction.html
-rw-r--r--@ 1 haule staff 615891 Jan 20 2021 01_Basic_Python.html
-rw-r--r--@ 1 haule staff 627779 Jan 20 2021 02_Numpy.html
-rw-r--r--@ 1 haule staff 1303621 Mar 4 2021 03_Scipy.html
-rw-r--r--@ 1 haule staff 1040661 Mar 4 2021 04_Scipy_example_Hydrogen_atom.html
-rw-r--r-- 1 haule staff 210051 Jan 3 17:57 02_Numpy.ipynb
-rw-r--r-- 1 haule staff 826208 Jan 3 17:57 03_Scipy.ipynb
-rw-r--r-- 1 haule staff 533268 Jan 3 17:57 04_Scipy_example_Hydrogen_atom.ipynb
-rw-r--r--@ 1 haule staff 430767 Jan 3 17:57 double_pendulum.mp4
-rw-r--r-- 1 haule staff 2038 Jan 3 17:57 pendulum.py
-rw-r--r-- 1 haule staff 2864984 Jan 3 17:57 stockholm_td_adj.dat.txt
-rw-r--r-- 1 haule staff 264 Jan 3 17:57 ttt.cc
-rw-r--r-- 1 haule staff 24263 Jan 6 16:06 00_Introduction.ipynb
-rw-r--r-- 1 haule staff 752 Jan 6 16:06 mymodule.py
drwxr-xr-x 3 haule staff 96 Jan 6 16:06 __pycache__
-rw-r--r-- 1 haule staff 52785 Jan 6 16:07 01_Basic_Python.ipynb
-rw-r--r-- 1 haule staff 3889864 Jan 6 16:07 StockholmT.dat
-rw-r--r-- 1 haule staff 4336264 Jan 6 16:07 ST_data.npy
data2=load('ST_data.npy')
allclose(data,data2) # how close are the two sets of data
True
max(abs(data-data2).ravel())
0.0
amax(abs(data-data2))
0.0
print(data[:,0]) # display year for all datapoints
array(data[::365,0],dtype=int) # the years with 365 spacings, and then last years
[1800. 1800. 1800. ... 2011. 2011. 2011.]
array([1800, 1801, 1802, 1803, 1804, 1804, 1805, 1806, 1807, 1808, 1809, 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825, 1826, 1827, 1828, 1829, 1830, 1831, 1832, 1833, 1834, 1835, 1836, 1837, 1838, 1839, 1840, 1841, 1842, 1843, 1844, 1845, 1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863, 1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871, 1872, 1873, 1874, 1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894, 1895, 1896, 1897, 1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011])
Fancy indexing Index is itself an array of integer numbers, i.e, which rows or columns?
x=arange(100)
print(x[:10:2])
print(x[ [0,2,4,6,8] ]) # fancy indexing. Does not exists in C++
[0 2 4 6 8] [0 2 4 6 8]
data[[0,365,2*365,3*365,4*365,5*365+1],0] # fancy indexing for multidimensional array
array([1800., 1801., 1802., 1803., 1804., 1805.])
data[range(0,365*20,365),0]
array([1800., 1801., 1802., 1803., 1804., 1804., 1805., 1806., 1807., 1808., 1809., 1810., 1811., 1812., 1813., 1814., 1815., 1816., 1817., 1818.])
data[range(3,3+365*10,365),0]
array([1800., 1801., 1802., 1803., 1804., 1805., 1806., 1807., 1808., 1809.])
Using mask to pick data*
In addition to fancy indexing, Python allows yet another way to pick the data.
Create a mask of [True,....False....]
values, and pick from the array only columns/rows where True
.
How to compute average temperature in the year of 1973?
data[:,0] >= 1973
data[:,0] < 1974
array([ True, True, True, ..., False, False, False])
# Create mask for the year 1973
mask = logical_and(data[:,0] >= 1973, data[:,0] < 1974)
mask
array([False, False, False, ..., False, False, False])
data[mask,0] # All should have 1973
array([1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973., 1973.])
T1973 = data[mask,3]
print('Average temperature in 1973=', sum(T1973)/len(T1973))
Average temperature in 1973= 7.414794520547944
where(mask)
(array([63187, 63188, 63189, 63190, 63191, 63192, 63193, 63194, 63195, 63196, 63197, 63198, 63199, 63200, 63201, 63202, 63203, 63204, 63205, 63206, 63207, 63208, 63209, 63210, 63211, 63212, 63213, 63214, 63215, 63216, 63217, 63218, 63219, 63220, 63221, 63222, 63223, 63224, 63225, 63226, 63227, 63228, 63229, 63230, 63231, 63232, 63233, 63234, 63235, 63236, 63237, 63238, 63239, 63240, 63241, 63242, 63243, 63244, 63245, 63246, 63247, 63248, 63249, 63250, 63251, 63252, 63253, 63254, 63255, 63256, 63257, 63258, 63259, 63260, 63261, 63262, 63263, 63264, 63265, 63266, 63267, 63268, 63269, 63270, 63271, 63272, 63273, 63274, 63275, 63276, 63277, 63278, 63279, 63280, 63281, 63282, 63283, 63284, 63285, 63286, 63287, 63288, 63289, 63290, 63291, 63292, 63293, 63294, 63295, 63296, 63297, 63298, 63299, 63300, 63301, 63302, 63303, 63304, 63305, 63306, 63307, 63308, 63309, 63310, 63311, 63312, 63313, 63314, 63315, 63316, 63317, 63318, 63319, 63320, 63321, 63322, 63323, 63324, 63325, 63326, 63327, 63328, 63329, 63330, 63331, 63332, 63333, 63334, 63335, 63336, 63337, 63338, 63339, 63340, 63341, 63342, 63343, 63344, 63345, 63346, 63347, 63348, 63349, 63350, 63351, 63352, 63353, 63354, 63355, 63356, 63357, 63358, 63359, 63360, 63361, 63362, 63363, 63364, 63365, 63366, 63367, 63368, 63369, 63370, 63371, 63372, 63373, 63374, 63375, 63376, 63377, 63378, 63379, 63380, 63381, 63382, 63383, 63384, 63385, 63386, 63387, 63388, 63389, 63390, 63391, 63392, 63393, 63394, 63395, 63396, 63397, 63398, 63399, 63400, 63401, 63402, 63403, 63404, 63405, 63406, 63407, 63408, 63409, 63410, 63411, 63412, 63413, 63414, 63415, 63416, 63417, 63418, 63419, 63420, 63421, 63422, 63423, 63424, 63425, 63426, 63427, 63428, 63429, 63430, 63431, 63432, 63433, 63434, 63435, 63436, 63437, 63438, 63439, 63440, 63441, 63442, 63443, 63444, 63445, 63446, 63447, 63448, 63449, 63450, 63451, 63452, 63453, 63454, 63455, 63456, 63457, 63458, 63459, 63460, 63461, 63462, 63463, 63464, 63465, 63466, 63467, 63468, 63469, 63470, 63471, 63472, 63473, 63474, 63475, 63476, 63477, 63478, 63479, 63480, 63481, 63482, 63483, 63484, 63485, 63486, 63487, 63488, 63489, 63490, 63491, 63492, 63493, 63494, 63495, 63496, 63497, 63498, 63499, 63500, 63501, 63502, 63503, 63504, 63505, 63506, 63507, 63508, 63509, 63510, 63511, 63512, 63513, 63514, 63515, 63516, 63517, 63518, 63519, 63520, 63521, 63522, 63523, 63524, 63525, 63526, 63527, 63528, 63529, 63530, 63531, 63532, 63533, 63534, 63535, 63536, 63537, 63538, 63539, 63540, 63541, 63542, 63543, 63544, 63545, 63546, 63547, 63548, 63549, 63550, 63551]),)
# where tells you the index where True
indices = where(mask)
X1973 = data[indices,3]; # This gives similar data in 1973, but not identical
print(T1973.shape, X1973.shape, X1973[0].shape)
print('Average temperature in 1973=', sum(X1973[0,:])/len(X1973[0,:]))
print('Min/Max temperature in 1973=', min(X1973[0,:]),'/',max(X1973[0,:]))
(365,) (1, 365) (365,) Average temperature in 1973= 7.414794520547944 Min/Max temperature in 1973= -12.8 / 25.6
What is the mean monthly temperatures for each month of the year?
Let's do Ferbrurary first
Febr=data[:,1]==2
mean(data[Febr,3])
-3.212109570736596
Now loop for all months
monthly_mean=[mean(data[data[:,1]==month,3]) for month in range(1,13)]
print(monthly_mean)
[-3.0447656725502132, -3.212109570736596, -0.8321515520389532, 3.888474842767295, 9.560970785149117, 14.659591194968554, 17.318837492391967, 16.117650639074864, 11.81937106918239, 6.760057821059039, 1.9458490566037738, -1.211275106512477]
plt.bar(range(1,13),monthly_mean);
plt.xlabel('month')
plt.ylabel('average temperature')
Text(0, 0.5, 'average temperature')
It is implemented in low level fortran/C code, and is much more efficient than code written in Python.
A = random.rand(3,3)
print(A * A) # It is not matrix-matrix product, but element-wise product
print()
print(multiply(A,A))
print()
print(A @ A) # It is a matrix product
print()
print(matmul(A,A))
[[0.22276154 0.0009548 0.61689552] [0.62565466 0.0665923 0.19328271] [0.70376555 0.60291592 0.16846522]] [[0.22276154 0.0009548 0.61689552] [0.62565466 0.0665923 0.19328271] [0.70376555 0.60291592 0.16846522]] [[0.90610321 0.63242347 0.70666229] [0.94625905 0.43240315 0.91515852] [1.35444983 0.54499666 1.16873531]] [[0.90610321 0.63242347 0.70666229] [0.94625905 0.43240315 0.91515852] [1.35444983 0.54499666 1.16873531]]
Matrix product or matrix-vector product can be performed by dot
command
print(dot(A,A))
print(matmul(A,A))
# dot == A[i,j,:] * B[l,:,n]
[[0.90610321 0.63242347 0.70666229] [0.94625905 0.43240315 0.91515852] [1.35444983 0.54499666 1.16873531]] [[0.90610321 0.63242347 0.70666229] [0.94625905 0.43240315 0.91515852] [1.35444983 0.54499666 1.16873531]]
v1 = random.rand(3)
print(v1)
print( dot(A,v1) ) # matrix.vector product
print( dot(v1,v1) ) # length of vector^2
[0.81999594 0.24776367 0.00199407] [0.39624061 0.71341642 0.88110177] 0.7337841462337032
print(A@v1)
print(dot(A,v1))
[0.39624061 0.71341642 0.88110177] [0.39624061 0.71341642 0.88110177]
w=A*v1
sum(w,axis=1)
array([0.39624061, 0.71341642, 0.88110177])
Slightly less efficient, but nicer code can be obtained by matrix
clas
M = matrix(A)
v = matrix(v1).T # create a column vector
M*M # this is now matrix-matrix product
matrix([[0.90610321, 0.63242347, 0.70666229], [0.94625905, 0.43240315, 0.91515852], [1.35444983, 0.54499666, 1.16873531]])
M.T
matrix([[0.4719762 , 0.79098335, 0.83890735], [0.03089979, 0.25805484, 0.7764766 ], [0.78542697, 0.4396393 , 0.41044515]])
M*v # this is matrix*vector product
matrix([[0.39624061], [0.71341642], [0.88110177]])
v.T * M # vector*matrix product
matrix([[0.58466834, 0.09082266, 0.75379202]])
v.T * v # inner-product
matrix([[0.73378415]])
Array/Matrix transformations
.T
or transpose(M)
transposes matrix.H
hermitian conjugateconjugate(M)
conjugatesreal(M)
and imag(M)
takes real and imaginary part of the matrixconjugate(A.T)
array([[0.4719762 , 0.79098335, 0.83890735], [0.03089979, 0.25805484, 0.7764766 ], [0.78542697, 0.4396393 , 0.41044515]])
Library linalg
:
linalg.det(A)
linalg.inv(A)
or just M.I
linalg.eig
, linalg.eigvals
, linalg.eigh
linalg.svd()
linalg.solve()
linalg.cholesky()
from numpy import linalg
print( linalg.det(A) )
linalg.inv(A)
0.20259783890966102
array([[-1.16216576, 2.94762767, -0.93336865], [ 0.21797549, -2.29607639, 2.04227434], [ 1.96298224, -1.68094794, 0.48053093]])
M.I
matrix([[-1.16216576, 2.94762767, -0.93336865], [ 0.21797549, -2.29607639, 2.04227434], [ 1.96298224, -1.68094794, 0.48053093]])
The eigenvalue problem for a matrix $A$:
$\displaystyle A v_n = \lambda_n v_n$
where $v_n$ is the $n$th eigenvector and $\lambda_n$ is the $n$th eigenvalue.
To calculate eigenvalues of a matrix, use the eigvals
(symmetric/hermitian eigvalsh
) and for calculating both eigenvalues and eigenvectors, use the function eig
(or eigh
):
linalg.eigvals(A)
array([ 1.59750194+0.j , -0.22851287+0.27313645j, -0.22851287-0.27313645j])
linalg.eigvalsh(A)
array([-0.45257226, -0.39695136, 1.98999982])
A = array([[1,2,3], [4,5,6], [7,8,9]])
b = array([1,2,3])
x = linalg.solve(A,b) # A*x==b
print(x)
dot(A,x)-b
[-0.33333333 0.66666667 0. ]
array([ 0.00000000e+00, -2.22044605e-16, 0.00000000e+00])
v1
array([0.81999594, 0.24776367, 0.00199407])
print( sum(v1) )
print( cumsum(v1) )
print( trace(A) )
print( diag(A) )
print( sum(diag(A)) )
1.06975367109378 [0.81999594 1.0677596 1.06975367] 15 [1 5 9] 15
print(A.shape)
Ag = reshape(A, (9,1)) # this is not new data
print(Ag.shape)
(3, 3) (9, 1)
Ag[0]=10
A # we change A when we change Ag
array([[10, 2, 3], [ 4, 5, 6], [ 7, 8, 9]])
Ax = A.flatten() # flatten creates 1D array of all data, but creates a copy
Ax[8]=100 # changing a copy
print(A)
[[10 2 3] [ 4 5 6] [ 7 8 9]]
Ay=A.ravel()
Ay[8]=100
print(A)
[[ 10 2 3] [ 4 5 6] [ 7 8 100]]
Every function written in Python is very slow. However numpy type operations are fast, because they are written in fortran/C
Temp = data[:,3]
Temp**2 # this is fast, written in C
array([ 37.21, 237.16, 225. , ..., 24.01, 0.36, 6.76])
array([Temp[i]**2 for i in range(len(Temp))]) # This is slow, written in Python
array([ 37.21, 237.16, 225. , ..., 24.01, 0.36, 6.76])
What if we have a function that can not simply work on arrays?
For example, theta function?
def Theta(x):
if x>=0:
return 1
else:
return 0
# Does not work on array
Theta(Temp)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[76], line 2 1 # Does not work on array ----> 2 Theta(Temp) Cell In[75], line 2, in Theta(x) 1 def Theta(x): ----> 2 if x>=0: 3 return 1 4 else: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
We can vectorize Theta, to make it applicable to arrays.
This is simply achieved by call to numpy function vectorize
, which will create low-level routine from your function
Theta_vec = vectorize(Theta)
# This is very fast now, and creates 0 or ones
positive_temperatures=Theta_vec(Temp)
positive_temperatures
array([0, 0, 0, ..., 1, 1, 0])
How to calculate number of days in a year with positive temperatures?
# Boolean array to select data with positive temperatures
positives = array(positive_temperatures, dtype=bool)
# keeps data with positive temperatures only
kept = data[positives,0]
# now we just need to check how many of these data are in each year
years = list(range(1800,2013,1))
hist = histogram(kept, bins=years)
plt.plot(hist[1][:-1], hist[0]);
negative = array(1-Theta_vec(Temp), dtype=bool)
kept2 = data[negative,0]
hist2 = histogram(kept2, bins=years)
plt.plot(hist2[1][:-1], hist[0]);