#!/usr/bin/env python
# coding: utf-8
# # Introduction to scientific computing with Python
# *Maxime Sangnier*
#
# September, 2024
#
# ## Part 3: Linear algebra and scientific computing
# # Table of contents
# 1. [Numpy: numerical arrays in Python](#part1)
# - [What is Numpy](#part1sec1)
# - [Numpy array](#part1sec2)
# - [Operations on arrays](#part1sec3)
# 1. [Scipy: scientific computing in Python](#part2)
# - [What is Scipy](#part2sec1)
# - [Special functions](#part2sec2)
# - [Linear algebra](#part2sec3)
# - [Optimization](#part2sec4)
# - [Interpolation](#part2sec5)
# - [Numerical integration](#part2sec6)
# - [Others](#part2sec7)
# 1. [Loading and saving data](#part3)
# - [Basics](#part3sec1)
# - [Text files](#part3sec2)
# - [Images](#part3sec3)
# - [Numpy format](#part3sec4)
# - [Matlab format](#part3sec5)
# - [JSON](#part3sec6)
# - [Pickle](#part3sec7)
# 1. [Exercises](#part4)
# - [Exercise 1](#part4sec1)
# - [Exercise 2](#part4sec2)
# - [Exercise 3](#part4sec3)
# - [Exercise 4](#part4sec4)
# - [Exercise 5](#part4sec5)
# - [Exercise 6](#part4sec6)
# 1. [References](#part5)
#
# # Numpy: numerical arrays in Python
# ## What is Numpy
# Numpy is a fundamental extension of Python for scientific computing.
# This package provides:
# - a container, called `ndarray`, for handling numerical arrays;
# - [routines](http://docs.scipy.org/doc/numpy/reference/routines.html#routines) for fast operations on arrays (including mathematical and statistical operations);
# - universal mathematical functions (cos, log, exp…).
#
# The main asset of Numpy is the `ndarray` object.
# Roughly speaking, a Numpy array is a list (or a list of lists of… for n-dimensional arrays) that comes with many methods for scientific computing implemented in fast compiled routines.
# Despite the fact that Numpy arrays and lists serve as containers for a collection of items, they diverge on several points:
# - an array is a collection of homogeneous objects (same data types);
# - an array has a fixed size (definition at creation) while a list can grow dynamically;
# - methods for arrays are scientific oriented and efficiently implemented.
#
# The following sections provide a basic introduction to Numpy virtues.
#
# ## Numpy array
# ### Creation
# The main contribution of the numpy package is its multidimensional array.
# Since it shares some features with Python lists, an array can be legitimately created from a list.
# Let us begin with the following 2-dimensional list:
# In[1]:
l = [[18., 17., 16.],
[14., 19., 18.]]
print(l)
# The corresponding array can be created thanks to the `array` function:
# In[2]:
import numpy as np
a = np.array(l)
print(a)
# Let us have a look to the more import attributes of the array object:
# In[3]:
print("Rank (number of dimensions, also called axes) of a: ", a.ndim)
# In[4]:
print("Dimensions (or shape) of a:", a.shape) # The result is a tuple
# In[5]:
print("Total number of items in a:", a.size)
# In[6]:
print("Types of items:", a.dtype)
# Note that the type of items is automatically inferred at creation.
# In the following example, an array of integers is created.
# In[7]:
print(np.array([1, 2, 3]).dtype)
# The type can be changed *a posteriori* with `asarray`.
# In[8]:
np.asarray(a, dtype='int64')
# The type can also be forced at creation:
# In[9]:
b = np.array([1, 2, 3], dtype="complex")
print(b)
print(b.dtype)
# ### Arrays from Numpy routines
# Appart from the `array` function, a Numpy array can be created in several other ways (see the documentation for further details on the parameters):
# In[10]:
np.arange(15)
# In[11]:
np.zeros((3, 4))
# In[12]:
np.ones((2, 10))
# In[13]:
np.eye(3)
# In[14]:
np.diag([1., 2, 3])
# In[15]:
np.linspace(0, 1, num=6)
# **Question**
#
# Consult the help of `arange` and produce the object `array([ 2, 5, 8, 11, 14])`.
# In[ ]:
# Answer
# ### Shape modification
# The shape of an array can be modified with many routines.
# The main one is `resize`.
#
# In Python, the norm for resizing a vector is *the rightmost index changes first*.
# This means that a 2D array `m` is filled in the following order: m[0, 0], m[0, 1], m[0, 2], …, m[1, 0]…
# **This is different from R and Matlab**.
# In[17]:
m = np.arange(12)
m.resize(4, 3) # In-place operation
print(m)
# This is equivalent to modifying the `shape` attribute of the array (yet I advise you not to use it):
# In[18]:
m.shape = (3, 4)
print(m)
# To go back to a one-dimensional array, one can use the `ravel` method:
# In[19]:
m.ravel() # Not in-place
# Note that this is **not an in-place** method, meaning that it does not alter the original array.
# About this, two other methods are similar to `resize` and `ravel`:
# - `reshape` creates a new **view** of the object while `resize` runs **in-place**;
# - `flatten` creates a **copy** of the object while `ravel` returns a new **view**.
# The difference between copy and view is explained below.
# In[20]:
print(m.reshape(2, 6))
# In[21]:
print(m.flatten())
# In[22]:
print(m)
# Note that with the `reshape` method and the `shape` attribute, one dimension can be automatically calculated by giving `-1`:
# In[23]:
m.shape = (2, -1)
print(m)
# In[24]:
m.reshape(-1, 2)
# **Remark :**
# I advise you to use `reshape` and `ravel`, which are two convenient methods returning a new view of the object.
# Finally, a new dimension can be added to an array with the object `newaxis`:
# In[25]:
c = np.ones(3)
print(c)
# In[26]:
c = c[:, np.newaxis]
print(c)
# This technique is equivalent to reshaping with dimensions `(-1, 1)`:
# In[27]:
np.ones(3).reshape(-1, 1)
# ### Repetition and concatenation
# Besides modifying their shapes, arrays can also be repeated and concatenated:
# In[28]:
m = np.arange(4).reshape(2, -1)
print(m)
# In[29]:
np.tile(m, (2, 3)) # 2 repetitions in line, 3 in column
# In[30]:
p = np.linspace(-2, -0.5, num=4).reshape(2, -1)
print(p)
# In[31]:
np.concatenate((m, p))
# In[32]:
np.concatenate((m, p), axis=1)
# For 2D-arrays, the last two operations can also be done by stacking arrays vertically or horizontally:
# In[33]:
np.vstack((m, p))
# In[34]:
np.hstack((m, p))
# Finally, stacking objects of different kinds may also be a way to create arrays.
# This is the purpose of the routines `r_` (for *row*) and `c_` (for *column*).
# The first one is used for concatenating horizontally 1D-arrays while the second one serves for 2D-arrays.
# Beware of the notation. It is quite unusual:
# In[35]:
np.r_[0:6:2, -1:1:6j, [0]*3, 5, 6, np.array([1, 3, 5])]
# Here, we used the slice notation of `0:6:2`, which produces the same as `np.arange(0, 6, step=2)`, that is `[0, 2, 4]` (no 6!).
# However, when the step is an imaginary number, the slice notation is equivalent to a linspace.
# In other words, `-1:1:6j` produces the same as `np.linspace(-1, 1, num=6)`.
# In[36]:
np.c_[m, [[0], [0]], p] # Concatenation of 2D-arrays
# **Question**
#
# With `reshape` and `concatenate`, produce the array:
#
# [[ 0 -2 0 2]
# [-4 -6 4 6]].
# In[ ]:
# Answer
# ### Indexing and slicing
# A one-dimensional array can be indexed, sliced and iterated over the same way as a list:
# In[38]:
a = np.arange(4)
print(a)
a[1:3]
# In[39]:
a[[0, 2]]
# In[40]:
for it in np.linspace(0, 1, num=6):
print(it**2, end=" ")
# For a multi-dimensional array (here 2D), a single index is interpreted as accessing the first axis of the array:
# In[41]:
m = np.arange(12).reshape(4, 3)
print(m)
# Note that the first axis is understood from **top to bottom**.
# In[42]:
print(m[0])
# In[43]:
for row in m:
print(row)
# To iterate on each element of a multi-dimensional array, one can use the `flat` attribute, which is an iterator over all items:
# In[44]:
c = 0
for it in m.flat:
c += 1 - it%2
print("Number of even items:", c)
# Yet, one can access items of a multi-dimensional array with multi-dimensional indexes:
# In[45]:
m[0, 1]
# Note that this is equivalent to:
# In[46]:
m[0][1]
# Some parameters of the slice can be automatically inferred:
# In[47]:
m[:2, 1:] # Until Row 2 (excluded), from Column 1 (included)
# In[48]:
m[2:, :] # ':' means 'all'
# A feature of Numpy is to provide two methods of *fancy indexing* for arrays.
# The first one is with lists/arrays:
# In[49]:
print(m)
print("Extracted array:", m[[1, 1, 2], [0, 1, 2]])
# The second method is using masks:
# In[50]:
print(np.logical_and(3 < m, m < 9))
# In[51]:
print(m[np.logical_and(3 < m, m < 9)])
# These two methods are particularly useful for assignments:
# In[52]:
m[m%2 == 1] -= 1
print(m)
# **Question**
#
# Set all the negative items of the following array to 0.
# In[53]:
u = ((-1)**np.arange(12) * (np.arange(-6, 6))).reshape(3, -1)
print(u)
# In[ ]:
# Answer
# ### Copies and views
# As already stated in the first part of this tutorial, Python objects are rarely copied but often only bound.
# This is still true with Numpy arrays (they are mutable objects) and even a bit more complicated.
# There are three situations:
# #### No copy
# With the assignment operator and arrays as parameters of a function, there is no copy:
# In[55]:
p = m
p is m
# In[56]:
def f(x):
return id(x)
print(id(m), f(m))
# #### View
# With Numpy, two arrays can share the same data.
# The `view` method enables to create a new array that looks at the same data:
# In[57]:
p = m.view()
p is m # Different objects
# In[58]:
p.base is m.base # Same data
# Some attributes of a view can be modified without altering the data.
# For instance the shape:
# In[59]:
p.shape = (1, -1)
print(p)
# In[60]:
print(m)
# As an example, slicing and array returns a view of it:
# In[61]:
s = m[1, 1:]
print(s)
# In[62]:
s.base is m.base
# In[63]:
s[:] = -1
print(s)
# In[64]:
print(m)
# #### Copy
# The `copy` method makes a complete copy of an array (object and data):
# In[65]:
p = m.copy()
print(p is m, p.base is m.base)
# ## Operations on arrays
# ### Basic operations
# In Numpy, operations on arrays are elementwise.
# The result is returned in a new array.
# In[66]:
a = np.arange(4)
print("a =", a)
b = a * 10 # Product with a scalar
print("b =", b)
# In[67]:
b-a # Sum and differences
# In[68]:
a**2 # Power
# In[69]:
a * b # Product with another array
# In[70]:
b < 10.5 # Comparison
# Som operations, such as *+=* and **=* are inplace (no copy):
# In[71]:
a += 2
print(a)
# ### Broadcasting
# A major feature of array is operation broadcasting, that is the ability to operate on arrays with different shapes.
# Broadcasting is a way of vectorizing array operations with fast backward implementations.
#
# The main rule of broadcasting is that the shapes of two arrays are compatible when:
# - they are the same;
# - a shape is different but is equal to 1 (in this case, the array is identically repeated along this direction);
# - a dimension is missing (in this case, numpy creates a new dimension at the beginning and the array is identically repeated along this new first direction).
#
# A common example of this last case is operation between an array and a reduced version of it along the axis 0 (see below).
# In[72]:
x = np.arange(12).reshape(4, -1)
print(x)
# The mean along the first axis is:
# In[73]:
m = x.mean(axis=0)
print(m)
# Then, centering the array can be easily performed in the following way:
# In[74]:
c = x - m
print(c)
# In[75]:
print(c.mean(axis=0))
# **Question**
#
# What is the aim of the operation `u - ur.reshape(-1, 1)`?
# In[76]:
u = np.arange(16).reshape(4, -1)
ur = u.mean(axis=1)
# In[ ]:
# Answer
# ### Mathematical methods
# As stated in the introduction, a major feature of Numpy is to provide mathematical methods with fast implementations.
# For instance:
# In[78]:
print(a)
a.sum(), a.prod()
# In[79]:
a.cumsum()
# In[80]:
a.mean(), a.std()
# In[81]:
a.min(), a.max()
# In[82]:
a.argmin(), a.argmax()
# In[83]:
c = np.array([4, 3, 6, 5, 9])
c.sort() # In-place sorting
print(c)
# Note that for multidimensional arrays, the axis of operation can be specified:
# In[84]:
m = np.arange(12).reshape(4, -1)
print(m)
print(m.sum(axis=1))
# ### Universal functions
# As a mathematical package, Numpy provides elementary functions (called [*universal functions*](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations)) such as the following ones.
# These functions apply element-wise and return an array.
# In[85]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
t = np.linspace(0, 2*np.pi, num=50)
fig, ax = plt.subplots()
ax.plot(t, np.sin(t), label="sin")
ax.plot(t, np.cos(t), label="cos")
ax.plot(t, np.exp(-t), label="exp")
ax.legend();
# In[86]:
fig, ax = plt.subplots()
ax.plot(t, np.fmax(np.sin(t), np.cos(t)));
# **Question**
#
# Compute and plot on the last graphic the curve of $\left| \max(\sin(t), \cos(t)) - 0.25 \right|$.
# In[ ]:
# Answer
# ### Linear algebra
# A major application of Numpy is [linear algebra](http://docs.scipy.org/doc/numpy/reference/routines.linalg.html).
# For this purpose, Numpy provides a `matrix` object, yet it is hardly ever used in practice.
# Scientists generally prefer to handle traditional arrays with routines from linear algebra.
# Some examples are provided below.
# In[88]:
a = np.ones(5)
print(a)
b = np.arange(5)
print(b)
# In[89]:
print(a.dot(b)) # Dot product
# In[90]:
A = np.outer(a, b) # Outer product
print(A)
# In[91]:
B = np.diag([0.2, 0.7, -1, 2, 5])
print(B)
# In[92]:
print(A.dot(B)) # Matrix product (different from A*B!)
# In[93]:
print(A.T) # Transposed matrix
# In[94]:
C = np.random.randn(*A.shape)
C = C.dot(C.T) # Symmetric positive semidefinite matrix
np.set_printoptions(precision=2)
print(C)
# In[95]:
print(np.linalg.eigvalsh(C)) # Eigenvalues
# In[96]:
print(np.linalg.eigvalsh(C).sum(), C.trace()) # Trace of C
# In[97]:
invC = np.linalg.inv(C) # Inverse of C
pC = invC.dot(C)
pC[pC < 1e-10] = 0 # Erase numerical errors before printing
print(pC)
# **Question**
#
# Based on the array defined below, compute the matrix $(2^{-|i-j|})_{1 \le i, j \le 4}$.
# In[98]:
ind = np.outer(np.arange(4), np.ones(4))
print(ind)
# In[ ]:
# Answer
# ### Others
# In[100]:
print(np.any(C < -1, axis=0)) # Test if a column has at least one value less than -1
# In[101]:
print(np.all(C > -5, axis=1)) # Test if a row has all values greater than -5
# In[102]:
indx, indy = np.where(C < -1) # Indices where C is less than -1
# This is equivalent to np.nonzero(C < -1).
print(indx, indy)
# In[103]:
print(C[indx, indy])
# In[104]:
np.floor(C) # Floor of C
# # Scipy: scientific computing in Python
# ## What is Scipy
# Scipy is a collection of modules for various mathematical purposes.
# Leveraging the Numpy package, Scipy offers a broad and powerful environment for solving scientific problems.
# Roughly speaking, Scipy is to Python what toolboxes are to Matlab.
#
# In the forthcoming sections, a few modules of Scipy are skimmed.
# ## Special functions
# An important feature of the [scipy.special](http://docs.scipy.org/doc/scipy/reference/special.html#module-scipy.special) module is the definition of numerous mathematical functions, called *special functions*.
# Available functions include Airy, elliptic, Bessel, gamma, beta, hypergeometric, parabolic cylinder, Mathieu, spheroidal wave, Struve, and Kelvin.
# **Question**
#
# Plot the error function on the line $[-3, 3]$.
# In[ ]:
# Answer
# ## Linear algebra
# The [scipy.linalg](http://docs.scipy.org/doc/scipy/reference/linalg.html#module-scipy.linalg) module is crafted for linear algebra.
# It subsumes all the functions in numpy.linalg and adds other more advanced ones.
#
# Here is a non-exhaustive list of scipy.linalg's assets.
#
# ### Inverse of a matrix
# In[106]:
from scipy import linalg
print(C)
# In[107]:
pC_sci = linalg.inv(C).dot(C)
pC_sci[pC_sci < 1e-10] = 0 # Erase numerical errors before printing
print(pC)
# ### Solving a linear system
# Here we try to solve the linear system $Cx = b$ for $x$.
# In[108]:
b = np.random.randn(C.shape[0])
x = linalg.solve(C, b)
print(x)
# In[109]:
print(C.dot(x), b)
# **Remark:**
# It is numerically more efficient to compute $C^{-1}b$ with `solve` than `inv`.
# In[110]:
from timeit import timeit
setup = """
from scipy.linalg import solve, inv
from numpy.random import randn
n = 200
C = randn(n, n)
C = C.dot(C.T)
b = randn(n)
"""
print('solve: {0:0.2f}s'.format(timeit('x = solve(C, b);', setup=setup, number=100)))
print('inv: {0:0.2f}s'.format(timeit('x = inv(C).dot(b);', setup=setup, number=100)))
# ### Eigendecomposition
# The eigendecomposition is illustrated here, yet many others are available: singular value, LU, Cholesky, QR and Schur decompositions.
# In[111]:
print(linalg.eigvalsh(C))
# ### Special matrices
# An important feature of `scipy.linalg` over `numpy.linalg` is to provide several functions for creating special matrices such as block diagonal, circulant, Hadamard, Toeplitz…
# In[112]:
print(linalg.block_diag(A, B))
# In[113]:
print(linalg.circulant(np.arange(4)))
# ## Optimization
# The [scipy.optimize](http://docs.scipy.org/doc/scipy/reference/optimize.html#module-scipy.optimize) module offers several common but efficient routines for constrained and unconstrained optimization, as well as curve fitting and root finding.
# ### Local optimization
# The main interfaces for optimization offered by scipy.optimize are `minimize` (Nelder-Mead, BFGS, Newton, etc.) for local multivariate optimization, `minimize_scalar` for univariate optimization and `linprog` for linear programming (Simplex algorithm). In addition, other routines are available for global optimization and least-squares.
# In[114]:
from scipy import optimize
# import warnings
# warnings.filterwarnings('ignore')
def f(x):
return x**2 + 10*np.sin(x)
# In[115]:
x = np.arange(-10, 10, 0.1)
fig, ax = plt.subplots()
ax.plot(x, f(x), linewidth=2)
for x0 in np.linspace(-10, 10, 5):
sol = optimize.minimize(f, x0, method='BFGS')
ax.annotate("", xy=(sol.x[0], f(sol.x[0])), xytext=(x0, f(x0)), size=20,
arrowprops=dict(arrowstyle="-|>", connectionstyle="arc3,rad=-0.2", edgecolor="k", facecolor='w'))
ax.plot(x0, f(x0), 'go', markersize=8)
ax.plot(sol.x, f(sol.x), '*r', markersize=12)
# ### Curve fitting
# Given some points $\{x_i, y_i\}_{i=1}^n$ and a model $f(\cdot, \theta)$ parametrized by $\theta$, `scipy.optimize.curve_fit` finds a good parameter $\theta^\star$ that makes it possible for the model $f(\cdot, \theta^\star)$ to explain $y_i$ based on $x_i$.
# The solution $\theta^\star$ is obtained by solving non-linear least-squares.
# In[116]:
def f_model(x, theta1, theta2, theta3, theta4):
return theta1*x**2 + theta2*np.sin(x+np.pi/6)
x = np.linspace(-10, 10, num=10)
y = f(x)
popt, pcov = optimize.curve_fit(f_model, x, y)
x_plot = np.arange(-10, 10, 0.1)
fig, ax = plt.subplots()
ax.plot(x_plot, f(x_plot), linewidth=2, label="true")
ax.plot(x_plot, f_model(x_plot, *popt), linestyle='dashed', color="red", linewidth=2, label="fitted")
ax.legend();
# ### Root finding
# Several routines are available for finding roots and fixed points of a function (see [scipy.optimize](http://docs.scipy.org/doc/scipy/reference/optimize.html#module-scipy.optimize)).
# For instance, one can use `fsolve`:
# In[117]:
x = np.arange(-10, 10, 0.1)
fig, ax = plt.subplots()
ax.plot(x, f(x), linewidth=2)
for x0 in np.linspace(-4, 4, 4):
root = optimize.fsolve(f, x0)
ax.annotate("", xy=(root[0], f(root[0])), xytext=(x0, f(x0)), size=20,
arrowprops=dict(arrowstyle="-|>", connectionstyle="arc3,rad=-0.2", edgecolor="k", facecolor="w"))
ax.plot(x0, f(x0), 'go', markersize=8)
ax.plot(root, f(root), '*r', markersize=12)
# ## Interpolation
# [scipy.interpolate](http://docs.scipy.org/doc/scipy/reference/interpolate.html#module-scipy.interpolate) provides functions for uni and multivariate data interpolation.
# Here, we only focus on unidimensional interpolation.
# In[118]:
from scipy.interpolate import interp1d
x = np.linspace(-10, 10, num=10)
x_plot = np.arange(-10, 10, 0.1)
fig, ax = plt.subplots()
ax.plot(x_plot, f(x_plot), linewidth=2, label="true")
ax.plot(x, f(x), 'o', markersize=6, color="blue")
for kind in ['linear', 'cubic']:
f_inter = interp1d(x, f(x), kind=kind)
ax.plot(x_plot, f_inter(x_plot), label=kind, linewidth=2, linestyle="dashed")
ax.legend();
# ## Numerical integration
# In the [scipy.integrate](http://docs.scipy.org/doc/scipy/reference/integrate.html#module-scipy.integrate) module, one can find several methods (trapezoidal, Simpson, Romberg, etc) for integrating uni, bi and trivariate functions.
# Below is an example of univariate integration.
# In[119]:
from scipy.special import erf
from scipy.integrate import quad
f = lambda x: np.exp(-x**2)*2/np.sqrt(np.pi)
I, I_err = quad(f, 0, 1)
print(I, erf(1))
# ## Others
# Other possibilities of Scipy include:
# - Fourier transforms ([scipy.fftpack](http://docs.scipy.org/doc/scipy/reference/fftpack.html#module-scipy.fftpack));
# - sparse matrices ([scipy.sparse](docs.scipy.org/doc/scipy/reference/sparse.html));
# - signal processing ([scipy.signal](http://docs.scipy.org/doc/scipy/reference/signal.html#module-scipy.signal));
# - spatial structures ([scipy.spatial](http://docs.scipy.org/doc/scipy/reference/spatial.html#module-scipy.spatial));
# - multidimensional image processing ([scipy.ndimage](http://docs.scipy.org/doc/scipy/reference/ndimage.html#module-scipy.ndimage)).
#
# # Loading and saving data
# The core of Python as well as Numpy and Scipy provide several routines for loading and saving data.
# We will review some of them.
#
# ## Basics
# With Python only, files are handled thanks to a file object, initialized with `open` and that should be closed after operations completed.
# Roughly speaking, operations may be `write`, `read`, `readline` and `readlines`.
# Note that only strings can be saved thanks to those routines.
#
# One can open a file with different modes:
# - `r`: read only;
# - `w`: write only;
# - `a`: append;
# - `r+`: read and write;
# - `[rwa]b`: (read, write, append) binary.
# In[120]:
f = open('aux/an_example.txt', 'w')
f.write("""This file is an example.
It illustrates how to write in a file.
As well as how to read a file.""")
f.close()
# In[121]:
get_ipython().system('cat aux/an_example.txt')
# It is good practice for handling exceptions and being sure that the file is closed properly to replace the previous script by:
# In[122]:
with open('aux/an_example.txt', 'w') as f:
f.write("""This file is an example.
It illustrates how to write in a file.
As well as how to read a file.""")
print("File closed:", f.closed)
# In[123]:
with open('aux/an_example.txt', 'r') as f:
s = f.read()
print(s)
# The functions `readline` and `readlines` respectively read a single line and put all lines in a list.
# In[124]:
with open('aux/an_example.txt', 'r') as f:
print("First line:", f.readline())
print("Second line:", f.readline())
# In[125]:
with open('aux/an_example.txt', 'r') as f:
s = f.readlines()
print(s)
# One may also iterate over the lines of a file.
# In[126]:
with open('aux/an_example.txt', 'r') as f:
for it, line in enumerate(f):
print("Line", it, ":", line)
# ## Text files
# Numpy provides two routines for handling arrays saved in text files: `loadtxt` and `savetxt`.
# In[127]:
get_ipython().system('cat data/consumption.csv')
# In[128]:
data = np.loadtxt("data/consumption.csv", delimiter=";", skiprows=1)
print(data)
# In[129]:
# Standardization
data -= data.mean(axis=0)
data /= data.std(axis=0)
# Save the data
np.savetxt("data/consumption_std.csv", data)
# In[130]:
data_txt = np.loadtxt("data/consumption_std.csv")
print(data_txt)
# ## Images
# Images can be read and saved thanks to the functions *imread* and *imsave* from matplotlib.pyplot.
# In[131]:
import matplotlib.pyplot as plt
img = plt.imread("img/upmc.jpg")
fig, ax = plt.subplots()
ax.imshow(img);
ax.grid(False)
# In[132]:
plt.imsave("img/upmc3.jpg", img[:, :, 0], cmap=plt.cm.gray) # Save only one channel
# Saved image:
#
#
# ## Numpy format
# Numpy has its own binary format.
# Note that it does not promote interoperability.
# In[133]:
np.save('data/consumption.npy', data)
# In[134]:
data_npy = np.load('data/consumption.npy')
print(data_npy)
# ## Matlab format
# Scipy offers the possibility to save and read Matlab files with the functions *loadmat*, *whosmat* and *savemat*.
# In[135]:
import scipy.io as sio
sio.savemat('data/consumption.mat', {'data': data})
# In[136]:
sio.whosmat('data/consumption.mat')
# In[137]:
data_mat = sio.loadmat('data/consumption.mat')['data']
print(data_mat)
# ## JSON
# JSON (which stands for *JavaScript Object Notation*) is a format
# with which one can save complex data types such as dictionaries and nested lists by serialization.
# It is a secure data format that benefits from inter-operability (data exchange).
#
# However, Numpy arrays cannot be serialized with JSON and have to be converted to lists before.
# In[138]:
import json
with open('data/consumption.json', 'w') as f:
json.dump(data.tolist(), f)
# In[139]:
with open('data/consumption.json', 'r') as f:
data_json = np.array(json.load(f))
print(data_json)
# In a similar manner, dictionaries are stored as `ndarrays` of type *object*.
# In[140]:
with open('data/consumption.json', 'w') as f:
json.dump(dict(description="This is the consumption dataset", data=data.tolist()), f)
# In[141]:
with open('data/consumption.json', 'r') as f:
data_json = np.array(json.load(f))
print(data_json.item()['description'])
print(np.array(data_json.item()['data']))
# ## Pickle
# Akin to JSON, Pickle (and its C version cPickle for Python 2) is a module aimed at serializing (or pickling) and de-serializing (or unpickling) a Python object structure.
# However, Pickle is far more powerful than JSON: it can serialize arbitrarily complex Python objects.
# Thus, in practice, any Python object can be saved (and then re-loaded) with Pickle.
#
# However (this is the other side of the coin), **the Pickle module is not secure against erroneous or maliciously constructed data: deserializing pickle data can execute arbitrary harmful code.
# Never unpickle data received from an untrusted or unauthenticated source**.
# In[142]:
import pickle as pkl # import cPickle in Python 2, it is faster than pickle
with open('data/consumption.pkl', 'wb') as f:
pkl.dump(data, f)
# In[143]:
with open('data/consumption.pkl', 'rb') as f:
data_pkl = pkl.load(f)
print(data_pkl)
# # Exercises
# ## Exercise 1
# Produce the following array:
#
# [[ 0 4 8 12 16]
# [ 1 5 9 13 17]
# [ 2 6 10 14 18]
# [ 3 7 11 15 19]]
#
# Standardize this array, namely do such that the mean of each column is $0$ and the standard deviation is $1$.
# In[ ]:
# Answer
# ## Exercise 2
# Plot the [probability density functions](https://en.wikipedia.org/wiki/Chi-squared_distribution) of $\chi^2$ with parameters $k \in \{1, 2, \dots, 5\}$.
#
# In[ ]:
# Answer
# ## Exercise 3
# Let $A$ be a matrix and $b$ be a vector defined by:
#
# V = np.random.randn(5, 5)
# A = V + V.T
# b = np.random.rand(5)
#
# Solve the linear system $Ax = b$ with two different methods.
#
# In[ ]:
# Answer
# ## Exercise 4
# Build a symmetric positive semidefinite matrix $H$ of size 5 x 5, a vector $b$ of size 5 and a scalar value $c$.
# Compute a minimizer of $x \mapsto \frac 12 x^\top H x + b^\top x + c$.
# Compute the minimum of this function.
#
# In[ ]:
# Answer
# ## Exercise 5
# Build a symmetric positive semidefinite matrix $X$ of size 5 x 5.
# Implement [power iteration](https://en.wikipedia.org/wiki/Power_iteration).
# Compute the [Rayleigh quotient](https://en.wikipedia.org/wiki/Rayleigh_quotient) of the resulting vector regarding the matrix $X$.
# Rediscover the result with a single Numpy routine.
#
# In[ ]:
# Answer
# ## Exercise 6
# Dowload [this data file](https://perso.lpsm.paris/~msangnier/files/pythonM2/Gaussian_sample.mat).
# The sample in this file is assumed to be Gaussian iid.
# Compute the negative log-likelihood function associated to this sample.
# Minimize it with a Scipy routine to estimate $\mu$ and $\sigma$.
# In[ ]:
# Answer
# # References
# - [Official documentation](https://docs.python.org/3/tutorial/index.html).
# - [Numpy and Scipy documentation](http://docs.scipy.org/doc/).
# - [Scipy lecture notes](http://www.scipy-lectures.org/index.html).