# Turing Award Winner, 1989
#
#
# > I have a number in my head
# > Though I don't know why it's there
# > When numbers get serious
# > You see their shape everywhere
# >
# > Paul Simon
# One of the themes of this course will be shifting between mathematical and computational views of various concepts.
#
# Today we need to talk about why the answers we get from computers can be different from the answers we get mathematically
#
# -- for the same question!
# The root of the problem has to do with how __numbers__ are manipulated in a computer.
#
# In other words, how numbers are __represented.__
# ## Representations
# A number is a mathematical concept -- an abstract idea.
#
# > God made the integers,
# > all else is the work of man.
# >
# > Leopold Kronecker (1823 - 1891)
#
# In a computer we assign __bit patterns__ to correspond to certain numbers.
#
# We say the bit pattern is the number's _representation._
# For example, the number '3.14' might have the representation '01000000010010001111010111000011'.
#
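# As a quick sketch (the pattern above happens to be the 32-bit, single-precision encoding of 3.14), we can reproduce it with Python's built-in `struct` module:
# In[ ]:
import struct
# pack 3.14 as a 32-bit float, then view those same 4 bytes as an unsigned integer
bits = struct.unpack('>I', struct.pack('>f', 3.14))[0]
# print the integer as a zero-padded 32-character binary string
format(bits, '032b')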
# For reasons of efficiency, we use a fixed number of bits for these representations. In most computers nowadays we use __64 bits__ to represent a number.
# Let's look at some number representations and see what they imply about computations.
# ## Integers
# Kronecker believed that integers were the only 'true' numbers.
#
# And for the most part, using integers in a computer is not complicated.
#
# Integer representation is essentially the same as binary numerals.
#
# For example, in a 64-bit computer, the representation of the concept of 'seven' would be '0..0111' (with 61 zeros in the front).
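# We can see this directly -- a quick sketch using numpy's `binary_repr`:
# In[ ]:
import numpy as np
# the 64-bit pattern for the integer seven: sixty-one zeros followed by 111
np.binary_repr(7, width=64)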
#
# There is a size limit on the largest value that can be stored as an integer, but it's so big we don't need to concern ourselves with it in this course.
# So for our purposes, an integer can be stored exactly.
#
# In other words, there is a 1-1 correspondence between (computational) representations and the corresponding (mathematical) integers.
# So, what happens when we compute with integers?
#
# For (reasonably sized) integers, computation is __exact__ .... as long as it only involves __addition, subtraction, and multiplication.__
#
# In other words, there are no errors introduced when adding, subtracting or multiplying integers.
# However, it is a different story when we come to division, because the integers are not closed under division.
#
# For example, 2/3 is not an integer. ... It is, however, a __real__ number.
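# In Python, dividing two integers with `/` always produces a floating point number -- even when the quotient happens to be a whole number -- while `//` gives the integer quotient:
# In[ ]:
# true division produces a float; floor division stays within the integers
print(2 / 3)        # 0.6666666666666666 -- a float, not an integer
print(2 // 3)       # 0 -- the integer quotient
print(type(4 / 2))  # <class 'float'>, even though the answer is a whole number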
# ## Real Numbers and Floating-Point Representations
# Representing a real number in a computer is a __much__ more complicated matter.
#
# In fact, for many decades after electronic computers were developed, there was no accepted "best" way to do this!
#
# Eventually (in the 1980s) a widely accepted standard emerged, called IEEE-754. This is what almost all computers now use.
# The style of representation used is called __floating point.__
# Conceptually, it is similar to "scientific notation."
#
# $$123456 = \underbrace{1.23456}_{\text{significand}}\times {\underbrace{10}_{\text{base}}}^{\overbrace{5}^{\text{exponent}}}$$
# Except that it is encoded in binary:
#
# $$17 = \underbrace{1.0001}_{\text{significand}}\times {\underbrace{2}_{\text{base}}}^{\overbrace{4}^{\text{exponent}}}$$
# The sign, significand, and exponent are all contained within the 64 bits.
# ```{margin}
# By Codekaizen (Own work) [GFDL or CC BY-SA 4.0-3.0-2.5-2.0-1.0], via Wikimedia Commons
# ```
# In[2]:
display(Image("images/IEEE_754_Double_Floating_Point_Format.png", width=450))
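# Python can show us a float's significand and exponent directly. Using the built-in `float.hex`, the output below reads as $(1 + 1/16) \times 2^4$, which is exactly the binary $1.0001_2 \times 2^4$ from above:
# In[ ]:
# hex form of 17.0: '0x1.1000000000000p+4' means (1 + 1/16) x 2^4 = 1.0001_2 x 2^4 = 17
(17.0).hex()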
# Because only a fixed number of bits are used, __most real numbers cannot be represented exactly in a computer.__
#
# Another way of saying this is that, usually, a floating point number is an approximation of some particular real number.
# Generally when we try to store a real number in a computer, __what we wind up storing is the closest floating point number that the computer can represent.__
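# For example, here is the exact value that actually gets stored when we write `0.1` -- a quick sketch using the standard library's `decimal` module to print all of its digits:
# In[ ]:
from decimal import Decimal
# the double-precision number nearest to 0.1 -- close, but not equal, to one tenth
Decimal(0.1)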
# ### The Relative Error of a Real Number Stored in a Computer
# ````{margin}
# ```{note}
# You can experiment with floating point representations to see how errors arise using [this interactive tool](https://baseconvert.com/ieee-754-floating-point).
# ```
# ````
# The way to think about working with floating point (in fact, how the hardware actually does it) is:
#
# 1. Represent each input as the __nearest__ representable floating point number.
# 2. Compute the result exactly from the floating point representations.
# 3. Return the __nearest__ representable floating point number to the result.
# What does "__nearest__" mean? Long story short, it means "round to the nearest representable value."
#
# Let's say we have a particular real number $r$ and we represent it as a floating point value $f$.
#
# Then $r = f + \epsilon$ where $\epsilon$ is the amount that $r$ was rounded when represented as $f$.
# So $\epsilon$ is the difference between the value we want, and the value we get.
#
# How big can this difference be? Let's say $f$ is
#
# $$f = \underbrace{1.010...01}_\text{53 bits}\times 2^n$$
# Then $|\epsilon|$ must be smaller than
#
# $$|\epsilon| < \underbrace{0.000...01}_\text{53 bits}\times 2^n.$$
# So as a _relative error_,
#
# $$ \text{relative error} = \frac{|\epsilon|}{f} < \frac{{0.000...01}\times 2^n}{\underbrace{1.000...00}_\text{53 bits}\times 2^n} = 2^{-52} \approx 10^{-16}$$
# This value $10^{-16}$ is an important one to remember.
#
# It is approximately __the relative error that can be introduced any time a real number is stored in a computer.__
# Another way of thinking about this is that you __only have about 16 digits of accuracy__ in a floating point number.
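# `numpy` reports this value as the "machine epsilon" of the 64-bit floating point type:
# In[ ]:
# machine epsilon for 64-bit floats: 2**-52, about 2.2e-16
np.finfo(float).eps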
# ### Implications of Representation Error
# Problems arise when we work with floating point numbers and confuse them with real numbers.
#
# It is important to remember that most of the time we are not storing the real number exactly, but only a floating point number that is close to it.
# Let's look at some examples. First:
#
# $$ \left( \frac{1}{8} \cdot 8 \right) - 1 $$
# In[3]:
# ((1/8)*8)-1
a = 1/8
b = 8
c = 1
(a*b)-c
# It turns out that 1/8, 8, and 1 can all be stored exactly in IEEE-754 floating point format.
#
# So, we are
# * storing the inputs exactly (1/8, 8 and 1)
# * computing the results exactly (by definition of IEEE-754), yielding $(1/8) \cdot 8 = 1$
# * and representing the result exactly (zero)
# OK, here is another example:
#
# $$ \left( \frac{1}{7} \cdot 7 \right) - 1 $$
# In[4]:
# ((1/7)*7)-1
a = 1/7
b = 7
c = 1
a * b - c
# Here the situation is different.
#
# 1/7 can __not__ be stored exactly in IEEE-754 floating point format.
#
# In binary, 1/7 is $0.001\overline{001}$, an infinitely repeating pattern that obviously cannot be represented as a finite sequence of bits.
# Nonetheless, the computation $(1/7) \cdot 7$ still yields exactly 1.0.
#
# Why? Because the rounding of $0.001\overline{001}$ to its closest floating point representation, when multiplied by 7, yields a value whose closest floating point representation is 1.0.
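# We can see the rounded value that is actually stored for 1/7 (again using `decimal` to print all its digits). It is not exactly one seventh, yet multiplying it by 7 and rounding the result lands exactly on 1.0:
# In[ ]:
from decimal import Decimal
# the stored value of 1/7 -- a nearby binary fraction, not exactly one seventh
print(Decimal(1/7))
# ... and yet the product rounds to exactly 1.0
print((1/7) * 7 == 1.0)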
# Now, let's do something that seems very similar:
#
# $$ \left( \frac{1}{70} \cdot 7 \right) - 0.1 $$
# In[5]:
a = 1/70
b = 7
c = 0.1
a * b - c
# In this case, __neither__ 1/70 nor 0.1 can be stored exactly.
#
# More importantly, the process of rounding 1/70 to its closest floating point representation, then multiplying by 7, yields a number whose closest floating point representation is __not__ 0.1
# However, that floating point representation is very __close__ to 0.1.
#
# Let's look at the difference: -1.3877787807814457e-17.
#
# This is about $-1 \cdot 10^{-17}$.
# In other words, about -0.00000000000000001
#
# Compared to 0.1, this is a very small number. The relative error is about:
#
# $$ \frac{|-0.00000000000000001|}{0.1} $$
#
# which is about $10^{-16}.$
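# We can compute this relative error directly, using the `a`, `b`, `c` from the cell above:
# In[ ]:
# relative error of (1/70)*7 compared to the intended value 0.1
np.abs(a * b - c) / c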
# This suggests that when a floating point calculation is not exact, the error (in a relative sense) is usually very small.
# Notice also that in our example the size of the relative error is about $10^{-16}$.
#
# Recall that the significand in IEEE-754 uses 52 bits.
#
# Now, note that $2^{-52} \approx 10^{-16}$.
#
# There's our "sixteen digits of accuracy" principle again.
# ### Special Values
# There are three kinds of special values defined by IEEE-754:
# 1. NaN, which means "Not a Number"
# 2. Infinity -- both positive and negative
# 3. Zero -- both positive and negative.
# __NaN__ and __Inf__ behave about as you'd expect.
#
# If you get one of these values in a computation you should be able to reason about how it happened. Note that these are values, and can be assigned to variables.
# In[6]:
np.sqrt(-1)
# In[7]:
var = np.log(0)
var
# In[8]:
1/var
# As far as we are concerned, there is no difference between positive and negative zero. You can ignore the minus sign in front of a negative zero. (If you are curious why there is a negative zero, see the online notes.)
# ````{margin}
# ```{note}
#
# The reason for having a negative and positive zero is the following.
#
# Remember that, due to the limitations of floating point representation, we can only store the __nearest representable__ number to the one we'd like to store.
#
# So, let's say we try to store a number $x$ that is very close to zero. To be specific, suppose $|x| < 2.5 \times 10^{-324}$, which is below even the smallest nonzero value a 64-bit float can represent. Then the closest floating point representation is zero, so that is what is stored. This is known as "underflow".
#
# But ... the number $x$ that we were _trying_ to store could have been positive or negative. So the standard defines a positive and negative zero. The sign of zero tells us when underflow occurred, "which direction" the underflow came from.
#
# This can be useful in some numerical algorithms.
# ```
# ````
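# (As an aside -- if you ever do need to tell the two zeros apart, `np.copysign` reveals the sign; we will not rely on this in the course.)
# In[ ]:
# np.copysign(1, x) returns +1.0 or -1.0 according to the sign of x, even when x is zero
np.copysign(1, -0.0), np.copysign(1, 0.0)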
# In[9]:
var = np.nan
var + 7
# In[10]:
var = np.inf
var + 7
# ## _Mathematical_ Computation vs. _Mechanical_ Computation
# Working with (idealized) real numbers, it is always true that:
#
# If
#
# $$c = 1/a \qquad (a \neq 0)$$
#
# then
#
# $$abc = b.$$
#
# In other words,
#
# $$(ab)/a = b.$$
# Let's test whether this is always true in actual computation.
# In[11]:
a = 7
b = 1/10
c = 1/a
a*c*b
# In[12]:
b*c*a
# In[13]:
a*c*b == b*c*a
# Here is another example:
# In[14]:
0.1 + 0.1 + 0.1
# In[15]:
3 * (0.1) - 0.3
# __What does all this mean for us in practice?__
#
# I will now give you three principles to keep in mind when computing with floating point numbers.
# ### Principle 1: Do not compare floating point numbers for equality
# Two floating point computations that _should_ yield the same result mathematically, may not do so due to rounding error.
#
# However, if two numbers should be equal mathematically, the difference between their floating point values will generally be small relative to the values themselves.
#
# So, instead of asking whether two floating point numbers are equal, we should ask whether the relative error of their difference is small.
# In[16]:
r1 = a * b * c
r2 = b * c * a
np.abs(r1-r2)/r1
# In[17]:
np.finfo('float')
# In[18]:
print(r1 == r2)
# In[19]:
print(np.abs(r1 - r2)/np.max([r1, r2]) < np.finfo('float').resolution)
# This test is needed often enough that `numpy` has a function that implements it:
# In[20]:
np.isclose(r1, r2)
# So another way to state Principle 1 for our purposes is:
#
# ... __always__ use `np.isclose()` to compare floating point numbers for equality!
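# By default, `np.isclose(x, y)` tests `abs(x - y) <= atol + rtol * abs(y)` with `rtol=1e-05` and `atol=1e-08`; both tolerances can be adjusted via keyword arguments if your application needs a stricter or looser notion of "close":
# In[ ]:
# the same comparison with a much tighter relative tolerance and no absolute tolerance
np.isclose(r1, r2, rtol=1e-12, atol=0.0)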
# Next, we will generalize this idea a bit:
#
# beyond the fact that numbers that should be equal, may not be in practice,
#
# we can also observe that it can be hard to be accurate about the __difference__ between two numbers that are __nearly__ equal. This leads to the next two principles.
# ### Principle 2: Beware of ill-conditioned problems
# An __ill-conditioned__ problem is one in which the outputs depend in a very sensitive manner on the inputs.
#
# That is, a small change in the inputs can yield a very large change in the outputs.
#
# The simplest example is computing $1/(a-b)$.
# In[21]:
print(f'r1 is {r1}')
print(f'r2 is {r2}, very close to r1')
r3 = r1 + 0.0001
print(f'r3 is {r3}, slightly different from r1')
# Let's look at
#
# $$ \frac{1}{r1 - r2} \text{ versus } \frac{1}{r3-r2} $$
# In[22]:
print(f'1/(r1 - r2) = {1/(r1 - r2)}')
print(f'1/(r3 - r2) = {1/(r3 - r2)}')
# If $a$ is close to $b$, small changes in either make a big difference in the output.
# Because the inputs to your problem may not be exact, if the problem is ill-conditioned, the outputs may be wrong by a large amount.
# Later on we will see that the notion of ill-conditioning applies to matrix problems too, and in particular comes up when we solve certain problems involving matrices.
# ### Principle 3: Relative error can be magnified during subtractions
# Two numbers, each with small relative error, can yield a value with large relative error if subtracted.
# Let's say we represent a = 1.2345 as 1.2345002 -- a relative error of about 0.0000002 (since $a$ is close to 1, absolute and relative error are about the same here).
# Let's say we represent b = 1.234 as 1.2340001 -- a relative error of about 0.0000001.
# Now subtract: a - b gives 0.0005001, whereas the true difference is 0.0005.
# What is the relative error? (0.0005001 - 0.0005) / 0.0005 = 0.0002.
# The relative error of the result is 1000 times larger than the relative error of the inputs.
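# Here is that toy example carried out in code (the exact digits printed will differ slightly, since none of these decimal values is stored exactly, but the magnitudes tell the story):
# In[ ]:
# inputs with small relative error...
a_true, a_stored = 1.2345, 1.2345002
b_true, b_stored = 1.234, 1.2340001
print(abs(a_stored - a_true) / a_true)   # about 1.6e-07
print(abs(b_stored - b_true) / b_true)   # about 8.1e-08
# ...but their difference has a much larger relative error
print(abs((a_stored - b_stored) - (a_true - b_true)) / (a_true - b_true))   # about 2e-04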
# Here's an example in practice:
# In[23]:
a = 1.23456789
b = 1.2345678
print(0.00000009)
print(a-b)
# In[24]:
print(np.abs(a-b-0.00000009)/ 0.00000009)
# We know the relative error in the inputs is on the order of $10^{-16}$, but the relative error of the output is on the order of $10^{-9}$ -- about ten million times larger.
# ### Further Reading
# * Further information about how Python handles issues around floating point is at [Floating Point Arithmetic: Issues and Limitations](https://docs.python.org/3/tutorial/floatingpoint.html).
# * An excellent, in-depth introduction to floating point is [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html).