import Graphics.Rendering.Chart
import Graphics.Rendering.Chart.Plot.Histogram
import System.Random
import Control.Monad
import Control.Lens
import Data.Default.Class
histogram :: Int -> (Double,Double) -> [Double] -> Renderable ()
histogram n r events = toRenderable $ def & layout_plots .~ [
    -- n equal-width bins spanning the range r
    histToPlot $ defaultPlotHist & plot_hist_bins .~ n
                                 & plot_hist_values .~ events
                                 & plot_hist_range .~ Just r
  ]
Throw a pair of dice 1000 times:
diceThrows <- replicateM 1000 $
sum <$> replicateM 2 (randomRIO (1,6))
The individual results look like this:
take 30 diceThrows
[7,5,12,6,6,2,6,6,6,7,11,3,7,6,12,9,9,3,4,10,3,3,7,12,7,4,7,5,4,6]
Now if we plot the frequency of each result, we observe that seven turns up most often:
histogram 11 (2,12) $ map fromIntegral diceThrows
The reason is that the result seven can be achieved in many different ways (1+6, 2+5, 3+4, 4+3, 5+2 and 6+1) whereas e.g. two can only be obtained when both dice give exactly 1.
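The counting argument above can be checked by brute force. The following is a minimal sketch using only `base`; the names `waysToRoll` and `sumProbability` are made up for this illustration and do not appear elsewhere in the notebook.

```haskell
-- For each possible sum, count the ordered outcomes (a, b) of two
-- six-sided dice that produce it.
waysToRoll :: [(Int, Int)]
waysToRoll =
  [ (s, length [ () | a <- [1 .. 6], b <- [1 .. 6], a + b == s ])
  | s <- [2 .. 12] ]

-- Exact probability of a given sum: number of ways divided by 36.
sumProbability :: Int -> Double
sumProbability s = maybe 0 (\w -> fromIntegral w / 36) (lookup s waysToRoll)
```

Evaluating `waysToRoll` gives six ways for the sum 7 and only one each for 2 and 12, which is the triangular shape the histogram approximates.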
One might now expect that plotting the reciprocals of these results (ranging from $\tfrac1{12}$ to $\tfrac12$) would show a peak at $\tfrac17 \approx 0.143$. It turns out that this does not happen. With five bins, the peak sits at about $0.12$ instead:
histogram 5 (1/12, 0.2) $ map (recip . fromIntegral) diceThrows
The reason for this is binning. The reciprocals lie more densely at low values: the spacing between $\tfrac1{12}$ and $\tfrac1{11}$ is only about $0.0076$, while the spacing between $\tfrac13$ and $\tfrac12$ is about $0.17$. Hence the $0.12$ bin collects more distinct results (the sums 8 and 9) than the $0.14$ bin (only the sum 7).
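To see which results share a bin, one can compute the expected bin contents exactly. This is a sketch with exact `Rational` arithmetic, not the Chart library's own binning code; `ways`, `binIndex` and `expectedBins` are hypothetical helpers, and the rule that a value on the upper edge goes into the last bin is an assumption.

```haskell
import Data.Ratio ((%))

-- Number of ordered ways two six-sided dice give the sum s (out of 36).
ways :: Integer -> Integer
ways s = 6 - abs (s - 7)

-- Index of the bin containing x, for n equal-width bins on [lo, hi].
-- Assumption: a value exactly on the upper edge counts towards the last bin.
binIndex :: Integer -> (Rational, Rational) -> Rational -> Maybe Integer
binIndex n (lo, hi) x
  | x < lo || x > hi = Nothing
  | otherwise =
      Just (min (n - 1) (floor ((x - lo) * fromIntegral n / (hi - lo))))

-- Expected weight per bin, in units of 1/36, for the reciprocals 1/s
-- of all two-dice sums s that fall inside the range.
expectedBins :: Integer -> (Rational, Rational) -> [Integer]
expectedBins n range =
  [ sum [ ways s | s <- [2 .. 12], binIndex n range (1 % s) == Just i ]
  | i <- [0 .. n - 1] ]
```

With these definitions, `expectedBins 5 (1 % 12, 1 % 5)` evaluates to `[6,9,6,5,4]`: the second bin, centred near $0.12$, receives the sums 8 and 9 together ($\tfrac{9}{36}$), while the third bin holds only the sum 7 ($\tfrac{6}{36}$).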
Now, for this dice example we could remove the issue by using so many narrow bins that every possible result has a bin of its own:
histogram 50 (1/12, 0.2) $ map (recip . fromIntegral) diceThrows
In this view, the peak is at $0.14$. Most of the bins are empty.
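Whether a given bin count is fine enough can be tested by comparing the bin width with the smallest gap between neighbouring reciprocals. The sketch below states a sufficient condition only; the names `reciprocals`, `minGap`, `binWidth` and `separated` are introduced here for illustration.

```haskell
import Data.Ratio ((%))

-- Reciprocals of the two-dice sums that lie in the plotted range
-- [1/12, 1/5], in increasing order (sums 12 down to 5).
reciprocals :: [Rational]
reciprocals = [ 1 % s | s <- [12, 11 .. 5] ]

-- Smallest distance between two neighbouring reciprocals.
minGap :: Rational
minGap = minimum (zipWith (-) (drop 1 reciprocals) reciprocals)

-- Width of one bin when [lo, hi] is split into n equal bins.
binWidth :: Integer -> (Rational, Rational) -> Rational
binWidth n (lo, hi) = (hi - lo) / fromIntegral n

-- Sufficient condition for every result to get a bin of its own:
-- the bins are narrower than the smallest gap.
separated :: Integer -> Bool
separated n = binWidth n (1 % 12, 1 % 5) < minGap
```

Here `minGap` is $\tfrac1{132} \approx 0.0076$, so `separated 50` holds (bin width $\approx 0.0023$) while `separated 5` does not (bin width $\approx 0.023$).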
But this only works because the individual results are hard-quantised: you can't get two dice to sum to e.g. $8.734$. A physical distribution like the Planck spectrum is continuous, i.e. no matter how small we make the bins, we will always have “different events” in each bin. And bins that are equally spaced in frequency space give a different result from bins that are equally spaced in wavelength space.
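For the continuous case, the same effect can be sketched as a change of variables. The notation is assumed here rather than taken from the text above: $B_\nu$ is the spectral density per unit frequency, $B_\lambda$ the density per unit wavelength, and $c$ the speed of light. The energy in corresponding intervals must agree, so with $\nu = c/\lambda$:

$$
B_\lambda(\lambda)\,|d\lambda| = B_\nu(\nu)\,|d\nu|,
\qquad
\left|\frac{d\nu}{d\lambda}\right| = \frac{c}{\lambda^2}
\quad\Longrightarrow\quad
B_\lambda(\lambda) = \frac{c}{\lambda^2}\,B_\nu\!\left(\frac{c}{\lambda}\right).
$$

The factor $c/\lambda^2$ plays the role of the uneven spacing of the reciprocals in the dice example: it reweights the curve, so the maximum of $B_\lambda$ generally does not sit at the wavelength corresponding to the maximum of $B_\nu$.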