(Work in progress. I haven't yet written an explanation I consider sufficiently clear and intuitive for this well-understood phenomenon.)
(Oh, now I have! But it's at the end. I think I'll try to write it up in a new notebook now, leaving this one to document the order I was thinking about things in.)
From a constant-voltage power supply, normally the power dissipated by a load of resistance $R$ is $P = VI = V\frac VR = \frac{V^2}R$. So, for a fixed power supply, the power is just inversely proportional to the resistance. Want twice the heat? Use half the resistance.
For example:
%pylab inline
import seaborn
style.use('seaborn-poster')
import sympy
sympy.init_printing()
rcParams['lines.linewidth'] = 1
Populating the interactive namespace from numpy and matplotlib
V = array([3.3, 5, 12, 120, 240])[:, None]
R = array([4.7e6, 4.7e3, 4700, 470])
I = V/R
P = V*I
P
array([[2.31702128e-06, 2.31702128e-03, 2.31702128e-03, 2.31702128e-02], [5.31914894e-06, 5.31914894e-03, 5.31914894e-03, 5.31914894e-02], [3.06382979e-05, 3.06382979e-02, 3.06382979e-02, 3.06382979e-01], [3.06382979e-03, 3.06382979e+00, 3.06382979e+00, 3.06382979e+01], [1.22553191e-02, 1.22553191e+01, 1.22553191e+01, 1.22553191e+02]])
Columns are resistances, rows are voltages. Here we go from 2.3 μW in the upper left-hand corner to 120 W in the lower right-hand corner. You can see that in each row the power is just inversely proportional to the resistance.
But real power supplies have some internal resistance in series with the load. So even if they're designed for constant voltage, their voltage starts to droop as you draw more current from them, because some of it is dropped across the internal resistance. If you try to get very high power out of them by using a very low load resistance, the voltage droops so much that the power delivered to the load starts to fall.
Suppose the internal resistance $R_I$ is 2.3Ω. What powers do we get for different load resistances $R_L$? What load resistance gives us the maximum load power $P_L$, or is there no such load resistance? That is, what is $\newcommand{argmax}[1]{\underset{#1}{\mathrm{argmax}\,}}\argmax{R_L}P_L$?
V = 12
Ri = 2.3 # internal
Rl = array([0.01, 0.1, 1, 10, 100])
R = Ri + Rl
I = V/R
Vl = I * Rl
Pl = Vl * I
Pl
array([ 0.26986001, 2.5 , 13.2231405 , 9.51814396, 1.37597716])
So you can see that for both low load resistances like 10 mΩ and high resistances like 100 Ω, the power delivered to the load is small, but intermediate resistances like 1Ω give us more power.
A first observation on this comes from dimensional analysis. Whatever the answer is, it is almost surely proportional to $R_I$, because the only parameters of the problem are $V$ and $R_I$, and $V$ has dimensions of voltage (energy per charge), while $R_I$ has dimensions of resistance (voltage per current, which is to say, energy times time, divided by charge squared). Of these parameters, only $R_I$ depends on the units we use to measure resistance, current, or time; if our milliammeter were reading 25% high and we adjusted our ohmmeter to match, we'd see the same measurements of voltage (such as $V$ and $V_L$), but we'd measure both $R_I$ and $R_L$ as being 20% lower ($\frac 1{1.25} - 1 = -0.2$). It would be very strange if recalibrating our milliammeter in this way caused $R_L$ to heat up or cool down, since our milliammeter is not part of the circuit.
By the same logic, the answer can't depend on $V$, but that's also easy to see algebraically; $P_L = V_L I = I^2 R_L = \frac{V^2}{R^2} R_L = V^2 \frac{R_L}{(R_I + R_L)^2}$, so the load power is just proportional to $V^2$, which is constant. So if $V$ is higher or lower, both the maximum power and the powers nearby will be scaled up or down by the same amount, but the maximum will be at the same value of $R_L$. (Unless $V$ is zero.)
Thus we have strong reason to believe that the answer is of the form $\argmax{R_L}P_L = kR_I$.
So the problem simplifies to finding the optimal ratio $k = \frac{R_L}{R_I}$, which is to say, $\frac{\argmax{R_L}P_L}{R_I}$.
Here's a more detailed visualization of the phenomenon:
V = 12
Ri = 2.3 # internal
Rl = linspace(0.001, 20, 1000) # load
R = Ri + Rl
I = V/R
Vl = I * Rl
Pl = Vl * I
opt = Rl[Pl.argmax()]
subplot(211); tick_params(labelbottom=False); plot(Rl, Pl); ylabel("$P_L$"); axvline(opt); axhline(Pl.max())
subplot(212, sharex=gca()); plot(Rl, I, 'r'); ylabel("$I$"); axhline(I.max()); xlabel("$R_L$")
gca().twinx(); plot(Rl, Vl); ylabel("$V_L$"); legend(['$V_L$'], loc=3)
opt
Empirically, you might speculate that maybe the actual max is when $R_L = R_I$, since it's within 0.1% of that in our numerical calculation there. We can also see that the $I$ and $V_L$ lines seem to cross there, although of course that is nonsense; they are on different $y$-axes. As it happens, though, they are scaled so that their maxima are quite close vertically, and when $R_L = R_I$ they are each at precisely half of those maxima, so that's pretty close to where the lines cross.
So is that it?
It's fairly straightforward to work this out symbolically; here we express the power in terms of the parameters $V$ and $R_I$ and the design variable $R_L$:
V, Rl, Ri = sympy.symbols('V R_L R_I')
R = Ri + Rl
I = V/R
Vl = I * Rl
Pl = Vl * I
Pl
Or, step by step, and with the concrete numbers above:
Pl_concrete = Pl.subs({V: 12, Ri: 2.3}) # P_L at the concrete values we used above
#sympy.plotting.plot(Pl_concrete, (Rl, 0, 20))
R, I, Vl, Pl, Pl_concrete
That's a pretty smooth function, differentiable everywhere the total resistance $R_I + R_L$ is nonzero, and the total resistance could only be zero if the load or internal resistance were negative; so we can find its extrema (and maybe other points) by finding where its derivative is zero with respect to the design variable $R_L$. That is, if $\argmax{R_L}P_L$ exists, then $\renewcommand{\d}[1]{\operatorname{d}\!{#1}} \frac{\d P_L}{\d R_L} = 0$ at that point.
Well, what is that derivative, symbolically?
Plp = Pl.diff(Rl) # P_L'
Plp, Plp.simplify(), Pl_concrete.diff().simplify()
Now, we can see that the derivative has a factor of $R_I - R_L$, so clearly it will be zero when $R_I = R_L$ as we saw empirically above; and nowhere else, unless of course $V = 0$. but it's not at all obvious where that comes from or what it means physically!
Algebraically, where it came from is that we multiplied the second term $\frac{V^2}{(R_I + R_L)^2}$ by $\frac{R_I + R_L}{R_I + R_L}$ in order to get the two rational expressions onto the common denominator $(R_I + R_L)^3$; and then $R_I + R_L - 2R_L$ gave us $R_I - R_L$ in the numerator. But intuitively where does this come from?
I think maybe an easier way to understand this is to think about $V_L'$ and $I'$ separately, and how they combine to make $P_L'$. Because $P_L = V_LI$,
$$P_L' = V_L'I + V_LI'$$Now, both $V_L$ and $I$ are positive and monotonic in $R_L$; $V_L$ monotonically increases, and $I$ monotonically decreases, as $R_L$ increases.
This means that $V_L'I$ is always positive, and $V_LI'$ is always negative. The point where they are perfectly balanced, then, is the zero of the derivative, and the maximum of the power. So we are looking for when $V_L'I = -V_LI'$, which implies not only that $V_L'I + V_LI' = 0$, but also that $\frac{V_L'I}{V_LI'} = -1$.
This immediately suggests a more felicitous phrasing for the derivative:
$$\frac{V_L'}{V_L} = -\frac{I'}I$$That is, the power reaches its maximum when the relative decline of the load voltage equals the relative growth of the current as $R_L$ increases. If we expand that out brute-force we get this:
c = sympy.Eq((Vl.diff(Rl)/Vl), -(I.diff(Rl)/I))
c
c.simplify()
Canceling the common term in the denominator, we get this:
sympy.Eq(c.lhs * R, c.rhs * R).simplify()
So again we have verified that $\argmax{R_L}P_L = R_I$. Still, though, that seems like a somewhat obscure derivation. Why again do the derivatives take that form?
Vl, Vl.diff(Rl).simplify()
I, I.diff(Rl).simplify()
There's a lot of $\frac{R_L}{R_I + R_L}$ in the above, which kind of screws with the simplicity of the equations, making it hard to see what's going on. What if we just define $m = \frac{R_L}{R_I + R_L}$ and re-express everything that way? $R_L$ should be monotonic in $m$, and valid values of $m$ are $[0, 1)$. This might give us a nicer way to tackle the problem.
To do this, we have to solve for $R_L$; I'm doing this in a very plodding way because I can't brain right now:
$$m = \frac{R_L}{R_I + R_L}$$$$R_L = m(R_I + R_L) = mR_I + mR_L$$$$mR_I = R_L - mR_L = (1 - m)R_L$$$$R_L = \frac{m}{1 - m}R_I$$V, Ri, m = sympy.symbols('V R_I m')
Rl = m*Ri / (1 - m)
(Rl / (Ri + Rl)).simplify()
Good! Now, let's see what our power looks like as a function of our new parameter $m$ instead of $R_L$!
R = Rl + Ri
I = V / R
Vl = I * Rl
Pl = Vl * I
Pl.simplify()
Well, as you might expect, $P_L = 0$ when $m = 0$ and approaches 0 when $m \to 1$, and is also 0 if $V = 0$. But the delightful thing is that this is just a nice little parabola! It's $\frac{V^2}{R_I}m(1-m)$. Now $m(1-m) = -m^2 + m$, a quadratic, so its extremum is halfway between its zeroes, when $m = \frac 12$.
Which is to say, $R_L = R_I$, because $R_L$ is half the total resistance.
A nicer derivation of the expression above is
$$P_L = V_L I$$$$V_L = m V$$(That is, the usual voltage-divider proportionality applies; $R_L = m R$, so $V = V_L + V_I = I R_L + I R_I = I m R + I (1-m) R = IR$, just like in any other series resistances.)
$$I = \frac{V}{R} = \frac V{R_I + R_L} = \frac V{R_I + \frac{m}{1-m}R_I} = \frac V{\left(1 + \frac{m}{1-m}\right)R_I}$$(That is, the current is smaller than the maximum current $\frac V{R_I}$ by a factor of $1-m$.)
Putting these together, we get
$$P_L = mV (1-m)\frac V{R_I} = \frac {V^2}{R_I} m (1-m)$$which is what we got above.