SO Plot with trend line

Based on https://stackoverflow.com/q/78168545/8508004

In [1]:

#based on https://stackoverflow.com/a/75562058/8508004
import numpy as np
import matplotlib.pyplot as plt

# two rows of 100 points spread over 0..9, each jittered by uniform noise in [0, 2)
x_data, y_data = np.repeat(np.linspace(0, 9, 100)[None,:], 2, axis=0) + np.random.rand(2, 100)*2
fig, axs = plt.subplots(1,2, figsize=(12,3))
axs[0].scatter(x_data, y_data)
# Not z = np.polynomial.polynomial.polyfit(x_data, y_data, 1) as the comment at
# https://stackoverflow.com/questions/26447191/how-to-add-trendline-to-a-scatter-plot#comment100202969_26447505 suggests:
# that gives a wrong line here, most likely because it returns coefficients lowest degree first,
# while np.poly1d expects highest degree first.
z = np.polyfit(x_data, y_data, 1)  # [slope, intercept]
p = np.poly1d(z)
axs[0].plot(x_data,p(x_data), '-', color= "red");
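The note about np.polynomial.polynomial.polyfit comes down to coefficient order. A minimal sketch on noise-free toy data (the line y = 2x + 1 is my own choice, not from the question) comparing the two NumPy fitting APIs:

```python
import numpy as np

# Noise-free points on y = 2x + 1, so the fitted coefficients are known exactly
x = np.linspace(0, 9, 100)
y = 2 * x + 1

# np.polyfit returns highest degree first: [slope, intercept]
z_old = np.polyfit(x, y, 1)

# np.polynomial.polynomial.polyfit returns lowest degree first: [intercept, slope]
z_new = np.polynomial.polynomial.polyfit(x, y, 1)

# One is the reverse of the other, so passing z_new straight to np.poly1d
# (which expects highest degree first) would draw the wrong line
print(np.allclose(z_new[::-1], z_old))  # True

# The newer API has its own evaluator that takes the lowest-first order directly
y_fit = np.polynomial.polynomial.polyval(x, z_new)

# Or use the Polynomial class; .convert() maps the fit back from its scaled domain
p = np.polynomial.Polynomial.fit(x, y, 1).convert()
```

Either API works as long as the fitting function and the evaluating function agree on the order.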

In [2]:

%pip install scipy

Requirement already satisfied: scipy in /srv/conda/envs/notebook/lib/python3.10/site-packages (1.9.3)
Requirement already satisfied: numpy<1.26.0,>=1.18.5 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from scipy) (1.23.5)
Note: you may need to restart the kernel to use updated packages.

In [3]:

# from https://stackoverflow.com/a/7187687/8508004
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0,4,20)
y = func(x, 5, 3, 4)
# generate noisy ydata
yn = y + 0.2 * y * np.random.normal(size=len(x))
# error on ydata: the standard deviation of the noise added above.
# The original answer used 0.2 * y * np.random.normal(size=len(x)) here, which can go negative;
# that is what made ax.errorbar raise `ValueError: 'yerr' must not contain negative values`.
y_sigma = 0.2 * y

popt, pcov = curve_fit(func, x, yn, sigma = y_sigma)

# plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.errorbar(x, yn, yerr = y_sigma, fmt = 'o')  # works now that y_sigma is non-negative
# popt is [a, b, c], highest power first, which is the order np.polyval expects
ax.plot(x, np.polyval(popt, x), '-', color= "red")
# place the labels in axes coordinates so they stay inside the plot whatever the data range
ax.text(0.05, 0.90, r"a = {0:.3f} +/- {1:.3f}".format(popt[0], pcov[0,0]**0.5), transform=ax.transAxes)
ax.text(0.05, 0.82, r"b = {0:.3f} +/- {1:.3f}".format(popt[1], pcov[1,1]**0.5), transform=ax.transAxes)
ax.text(0.05, 0.74, r"c = {0:.3f} +/- {1:.3f}".format(popt[2], pcov[2,2]**0.5), transform=ax.transAxes)
ax.grid()
plt.show()
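A sketch of the same fit written slightly differently, under my own assumptions: a seeded generator (np.random.default_rng) so reruns give the same noise, absolute_sigma=True because y_sigma here is the true noise level (the default, False, rescales pcov by the fit's residuals), the usual np.sqrt(np.diag(pcov)) for per-parameter standard errors, and func(x, *popt) to evaluate the model so the plotting code doesn't depend on np.polyval's coefficient order.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * x**2 + b * x + c

x = np.linspace(0, 4, 20)
y_true = func(x, 5, 3, 4)

# Per-point standard deviation: 20% of the true value, always positive here
y_sigma = 0.2 * y_true

rng = np.random.default_rng(0)  # seeded so reruns give the same noise
yn = y_true + y_sigma * rng.normal(size=x.size)

# absolute_sigma=True: treat y_sigma as real standard deviations, not relative weights
popt, pcov = curve_fit(func, x, yn, sigma=y_sigma, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))  # one standard error per parameter

# Evaluate the fitted model through func itself
y_fit = func(x, *popt)

for name, val, err in zip("abc", popt, perr):
    print(f"{name} = {val:.3f} +/- {err:.3f}")
```

The fitted values will differ from 5, 3 and 4 by roughly the printed uncertainties, since 20% noise is fairly large for 20 points.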

Adjusting after-the-fact

Imagine we made a scatter plot and now want to adjust it.

Starting point, based on the above:

In [4]:

#based on https://stackoverflow.com/a/75562058/8508004
import numpy as np
x_data, y_data = np.repeat(np.linspace(0, 9, 100)[None,:], 2, axis=0) + np.random.rand(2, 100)*2
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1,2, figsize=(12,3))
axs[0].scatter(x_data, y_data);

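Because the figure gets closed once the cell finishes running, there is no current figure left. We can see that none are open by running plt.gcf() and plt.gca() below (based on here). When nothing is open, these calls don't fail; they quietly create new, empty objects instead: plt.gcf() makes a new figure, and plt.gca() adds new axes (to a new figure if needed).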

In [5]:

plt.gcf()

Out[5]:

<Figure size 640x480 with 0 Axes>

<Figure size 640x480 with 0 Axes>

In [6]:

plt.gca()

Out[6]:

<AxesSubplot: >

That returns something, but not the axes we want to modify: these are brand-new, empty ones.
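A script-sized check of this behavior, assuming the notebook's inline backend closes figures at the end of each cell; plt.close("all") stands in for that step here, and the Agg backend line is only there so it runs outside Jupyter:

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; leave this line out inside Jupyter
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2, figsize=(12, 3))
plt.close("all")  # stand-in for what happens when a notebook cell finishes

open_before = plt.get_fignums()   # nothing is open any more
new_fig = plt.gcf()               # no current figure, so pyplot creates one
open_after = plt.get_fignums()

print(new_fig is fig)        # False
print(len(new_fig.axes))     # 0, like the "with 0 Axes" output above

new_ax = plt.gca()           # no current axes either, so new ones are added
print(len(new_fig.axes))     # 1
```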

So both the current figure and the current axes exist; they just aren't the ones we made. What can we do with this knowledge?

If we instead call plt.gca() in the same cell, the current axes are the last ones created by plt.subplots(), i.e. the right-hand subplot, not the one we scattered into.

In [7]:

#based on https://stackoverflow.com/a/75562058/8508004
import numpy as np
x_data, y_data = np.repeat(np.linspace(0, 9, 100)[None,:], 2, axis=0) + np.random.rand(2, 100)*2
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1,2, figsize=(12,3))
axs[0].scatter(x_data, y_data)
print(plt.gca());

AxesSubplot(0.547727,0.11;0.352273x0.77)
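To make this concrete: Axes methods like axs[0].scatter() draw on that Axes without making it current, while plt.sca() explicitly changes which Axes later plt.* calls target. A minimal sketch (the Agg backend and the toy data are assumptions for running it as a plain script):

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; leave this line out inside Jupyter
import numpy as np
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2, figsize=(12, 3))
axs[0].scatter(np.arange(10), np.arange(10))

# Drawing through axs[0]'s own method does not change pyplot's current axes
current = plt.gca()
print(current is axs[1])  # True: the last axes created by plt.subplots

# plt.sca() makes a chosen Axes current, so later plt.* calls target it
plt.sca(axs[0])
plt.plot([0, 9], [0, 9], color="red")  # lands on axs[0]
```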

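If we do the same with the figure, we get something different from above: this time it is the 1200x300 figure we just created, and the print() wrapper shows its str() form rather than the repr() form seen in Out[5].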

In [8]:

#based on https://stackoverflow.com/a/75562058/8508004
import numpy as np
x_data, y_data = np.repeat(np.linspace(0, 9, 100)[None,:], 2, axis=0) + np.random.rand(2, 100)*2
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1,2, figsize=(12,3))
axs[0].scatter(x_data, y_data)
print(plt.gcf());

Figure(1200x300)
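The two printed forms come from str() and repr(). A small sketch, with dpi=100 passed explicitly (an assumption matching the 640x480 default size seen above) so the pixel sizes are predictable:

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; leave this line out inside Jupyter
import matplotlib.pyplot as plt

# 12 x 3 inches at 100 dots per inch is 1200 x 300 pixels
fig, axs = plt.subplots(1, 2, figsize=(12, 3), dpi=100)

print(str(fig))   # Figure(1200x300), what print() shows
print(repr(fig))  # <Figure size 1200x300 with 2 Axes>, the plain-text form of a cell output
```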

However, both of these were assigned handles (fig and axs).
For example, we can still recall fig. (See here and here.)

In [9]:

fig

Out[9]:

So the figure object still exists. We'll come back to that, because evaluating fig to display its current state will be handy.
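This is the key point: closing a figure only removes it from pyplot's list of open figures, while the Figure object referenced by fig keeps its Axes and everything drawn on them. A sketch that closes the figure explicitly (standing in, by assumption, for what the inline backend does at the end of a cell) and then still renders it to an in-memory PNG:

```python
import io
import matplotlib
matplotlib.use("Agg")  # plain-script backend; leave this line out inside Jupyter
import numpy as np
import matplotlib.pyplot as plt

plt.close("all")  # start with no open figures

fig, axs = plt.subplots(1, 2, figsize=(12, 3))
axs[0].scatter(np.arange(10), np.arange(10))
plt.close(fig)  # pyplot forgets the figure...

print(plt.get_fignums())  # []

# ...but the object and its Axes are still there and can be drawn or saved
buf = io.BytesIO()
fig.savefig(buf, format="png")
print(len(fig.axes))  # 2
```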

The Axes exist too, because they were also assigned a handle, axs. Because we used plt.subplots(1,2), there's more than one:

In [10]:

for ax_obj in axs:
    print(ax_obj)

AxesSubplot(0.125,0.11;0.352273x0.77)
AxesSubplot(0.547727,0.11;0.352273x0.77)
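plt.subplots() returns those Axes as a NumPy array, and fig.axes holds the same objects, so either route reaches a subplot later. A short sketch (the Agg backend line is only so it runs as a script; the 2x2 grid is an extra illustration of my own):

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; leave this line out inside Jupyter
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2, figsize=(12, 3))

print(type(axs).__name__, axs.shape)  # ndarray (2,)
# fig.axes lists the same Axes objects, in creation order
print(all(a is b for a, b in zip(axs, fig.axes)))  # True

# With a 2-D grid, axs is 2-D; .flat iterates over every subplot
fig2, axs2 = plt.subplots(2, 2)
print(axs2.shape)  # (2, 2)
for i, ax in enumerate(axs2.flat):
    ax.set_title(f"subplot {i}")
```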

Modifying a specific subplot after-the-fact

Let's restore things to the starting point again: eventually we want to add a line showing the fit, but for now only a scatter plot has been produced.

In [11]:

#based on https://stackoverflow.com/a/75562058/8508004
import numpy as np
x_data, y_data = np.repeat(np.linspace(0, 9, 100)[None,:], 2, axis=0) + np.random.rand(2, 100)*2
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1,2, figsize=(12,3))
axs[0].scatter(x_data, y_data);

So we have a subplot without the fit line. Let's add the regression line after the fact and display the figure again by evaluating fig, as we did above; this time it should show the update.

We showed that because the axes have handles, we can access them after-the-fact. We can also modify them:

In [12]:

z = np.polyfit(x_data, y_data, 1)
p = np.poly1d(z)
axs[0].plot(x_data,p(x_data), '-', color= "red")
fig

Out[12]:
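If this is done often, the after-the-fact step can be wrapped in a function that takes the Axes handle. add_trendline below is a hypothetical helper of my own, not a Matplotlib API. It sorts x before drawing so higher-degree fits also render as a smooth curve, and it builds the jittered data with broadcasting instead of np.repeat (same (2, 100) shape before unpacking):

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; leave this line out inside Jupyter
import numpy as np
import matplotlib.pyplot as plt

def add_trendline(ax, x, y, deg=1, **plot_kwargs):
    """Hypothetical helper: fit a polynomial to (x, y) and draw it on an existing Axes."""
    coeffs = np.polyfit(x, y, deg)       # highest degree first
    xs = np.sort(np.asarray(x))          # sorted so the curve is drawn left to right
    ax.plot(xs, np.polyval(coeffs, xs), **plot_kwargs)
    return coeffs

rng = np.random.default_rng(0)
# (100,) + (2, 100) broadcasts to (2, 100); unpacking gives two rows
x_data, y_data = np.linspace(0, 9, 100) + rng.random((2, 100)) * 2

fig, axs = plt.subplots(1, 2, figsize=(12, 3))
axs[0].scatter(x_data, y_data)

# Later, possibly in another cell: the Axes handle is all we need
coeffs = add_trendline(axs[0], x_data, y_data, color="red")
```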

We could keep modifying and reusing these handles, but I'm not sure that's what the OP wants here, and it gets more complex.
What if we avoided getting into this situation in the first place?

An option discussed here is to use plt.ioff() so the figure isn't cleared after the cell runs. I don't think that's necessary given the handles, but I mention it in case the OP is interested. Also, I think using plt.ioff() can lead to complications (see here), so it's nice to have other options.
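For completeness, a sketch of the plt.ioff() route. In recent Matplotlib versions plt.ioff() can also be used as a context manager that turns interactive mode off only inside the with block and restores the previous state afterwards. Whether that stops a particular notebook backend from displaying or closing the figure depends on the backend and version, so treat this as something to try. The demo below runs with Agg and calls plt.ion() only to simulate an interactive session; it just shows the mode switching:

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; leave this line out inside Jupyter
import numpy as np
import matplotlib.pyplot as plt

plt.ion()  # simulate an interactive session for this demo

# Inside the block interactive mode is off; the previous state is restored on exit
with plt.ioff():
    inside = plt.isinteractive()
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.scatter(np.arange(10), np.arange(10))

after = plt.isinteractive()
print(inside, after)  # False True
```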

Enjoy!