*This notebook originally appeared as a post on the blog Pythonic Perambulations.*

*Image Source: Wikipedia License CC-BY-SA 3.0*

If you, like me, frequently commute via public transit, you may be familiar with the following situation:

You arrive at the bus stop, ready to catch your bus: a line that advertises arrivals every 10 minutes. You glance at your watch and note the time... and when the bus finally comes 11 minutes later, you wonder why you always seem to be so unlucky.

Naïvely, you might expect that if buses are coming every 10 minutes and you arrive at a random time, your average wait would be something like 5 minutes. In reality, though, buses do not arrive exactly on schedule, and so you might wait longer. It turns out that under some reasonable assumptions, you can reach a startling conclusion:

**When waiting for a bus that comes on average every 10 minutes, your average waiting time will be 10 minutes.**

This is what is sometimes known as the *waiting time paradox*.

I've encountered this idea before, and always wondered whether it is actually true... how well do those "reasonable assumptions" match reality? This post will explore the waiting time paradox from the standpoint of both simulation and probabilistic arguments, and then take a look at some real bus arrival time data from the city of Seattle to (hopefully) settle the paradox once and for all.

If buses arrive exactly every ten minutes, it's true that your average wait time will be half that interval: 5 minutes. Qualitatively speaking, it's easy to convince yourself that adding some variation to those arrivals will make the average wait time somewhat longer, as we'll see here.

The waiting time paradox turns out to be a particular instance of a more general phenomenon, the *inspection paradox*, which is discussed at length in this enlightening post by Allen Downey: The Inspection Paradox Is Everywhere.

Briefly, the inspection paradox arises whenever the probability of observing a quantity is related to the quantity being observed.
Allen gives one example of surveying university students about the average size of their classes. Though the school may truthfully advertise an average of 30 students per class, the average class size *as experienced by students* can be (and generally will be) much larger. The reason is that there are (of course) more students in the larger classes, and so you oversample large classes when computing the average experience of students.

In the case of a nominally 10-minute bus line, sometimes the span between arrivals will be longer than 10 minutes, and sometimes shorter, and if you arrive at a random time, you have more opportunities to encounter a longer interval than to encounter a shorter interval. And so it makes sense that the average span of time *experienced by riders* will be longer than the average span of time between buses, because the longer spans are over-sampled.

But the waiting time paradox makes a stronger claim than this: when the average span between arrivals is $N$ minutes, the average span *experienced by riders* is $2N$ minutes.
Could this possibly be true?

In [1]:

```
import numpy as np
N = 1000000 # number of buses
tau = 10 # average minutes between arrivals
rand = np.random.RandomState(42) # universal random seed
bus_arrival_times = N * tau * np.sort(rand.rand(N))
```

In [2]:

```
intervals = np.diff(bus_arrival_times)
intervals.mean()
```

Out[2]:

9.9999879601518398

In [3]:

```
def simulate_wait_times(arrival_times,
rseed=8675309, # Jenny's random seed
n_passengers=1000000):
rand = np.random.RandomState(rseed)
arrival_times = np.asarray(arrival_times)
passenger_times = arrival_times.max() * rand.rand(n_passengers)
# find the index of the next bus for each simulated passenger
i = np.searchsorted(arrival_times, passenger_times, side='right')
return arrival_times[i] - passenger_times
```

We can then simulate some wait times and compute the average:

In [4]:

```
wait_times = simulate_wait_times(bus_arrival_times)
wait_times.mean()
```

Out[4]:

10.001584206227317

The average wait time is also close to 10 minutes, just as the waiting time paradox predicted.

How can we understand what's going on here?

Fundamentally, this is an instance of the inspection paradox, in which the probability of observing a value is related to the value itself. Let's denote by $p(T)$ the distribution of intervals $T$ between buses as they arrive at a bus stop. In this notation, the expectation value of the arrival times is $$ E[T] = \int_0^\infty T~p(T)~dT $$ In the above simulation, we had chosen $E[T] = \tau = 10$ minutes.

When a rider arrives at a bus stop at a random time, the probability of the time interval they experience will be affected by $p(T)$, but also by $T$ itself: the longer the interval, the larger the probability is that a passenger will experience it.

So we can write the distribution of arrival times experienced by passengers: $$ p_{exp}(T) \propto T~p(T) $$ The constant of proportionality comes from normalizing the distribution: $$ p_{exp}(T) = \frac{T~p(T)}{\int_0^\infty T~p(T)~dT} $$ Comparing to above we see this simplifies to $$ p_{exp}(T) = \frac{T~p(T)}{E[T]} $$ The expected wait time $E[W]$ will then be half of the expected interval experienced by passengers, so we can write $$ E[W] = \frac{1}{2}E_{exp}[T] = \frac{1}{2}\int_0^\infty T~p_{exp}(T)~dT $$ which can be rewritten in a more suggestive way: $$ E[W] = \frac{E[T^2]}{2E[T]} $$ and now all that remains is for us to choose a form for $p(T)$ and compute the integrals.

With this formalism worked out, what is a reasonable distribution to use for $p(T)$? We can get a picture of the $p(T)$ distribution within our simulated arrivals by plotting a histogram of the intervals between arrivals:

In [5]:

```
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn')
plt.hist(intervals, bins=np.arange(80), density=True)
plt.axvline(intervals.mean(), color='black', linestyle='dotted')
plt.xlabel('Interval between arrivals (minutes)')
plt.ylabel('Probability density');
```

The vertical dotted line here shows the mean interval of about 10 minutes. This looks very much like an exponential distribution, and that is no accident: our simulation of bus arrival times as uniform random numbers very closely approximates a Poisson process, and for such a process it can be shown that the distribution of intervals between arrivals is exponential.

(Side note: In our case this is only approximately exponential; in reality the intervals $T$ between $N$ uniformly sampled points within a timespan $N\tau$ follow the Beta distribution: $T/(N\tau) \sim \mathrm{Beta}[1, N]$, which in the large $N$ limit approaches $T \sim \mathrm{Exp}[1/\tau]$. See, e.g. this StackExchange post, or this twitter thread for more details).

An exponential distribution of intervals implies that the arrival times follow a Poisson process. To double-check this reasoning, we can confirm that it matches another property of a Poisson process: the number of arrivals within a fixed span of time will be Poisson-distributed. Let's check this by binning our simulated arrivals into hourly blocks:

In [6]:

```
from scipy.stats import poisson
# count the number of arrivals in 1-hour bins
binsize = 60
binned_arrivals = np.bincount((bus_arrival_times // binsize).astype(int))
x = np.arange(20)
# plot the results
plt.hist(binned_arrivals, bins=x - 0.5, density=True, alpha=0.5, label='simulation')
plt.plot(x, poisson(binsize / tau).pmf(x), 'ok', label='Poisson prediction')
plt.xlabel('Number of arrivals per hour')
plt.ylabel('frequency')
plt.legend();
```

*memoryless* process, meaning that the history of events has no bearing on the expected time to the next event.
So when you arrive at the bus stop, the average waiting time until the next bus is always the same: in our case, it is 10 minutes, and this is regardless of how long it has been since the previous bus!
Along the same lines, it does not matter how long you have been waiting already: the expected time to the next arrival is always exactly 10 minutes: for a Poisson process, you get no "credit" for time spent waiting.

The above is well and good if real-world bus arrivals are actually described by a Poisson process, but are they?

*Image Source: https://seattletransitmap.com/*

To determine whether the waiting time paradox describes reality, we can dig into some data, available for download here: arrival_times.csv (3MB CSV file). The dataset contains scheduled and actual arrival times for Seattle's Rapid Ride lines C, D, and E at the 3rd & Pike bus stop in downtown Seattle, recorded during the second quarter of 2016 (huge thanks to Mark Hallenbeck of the Washington State Transportation Center for providing this data!).

In [7]:

```
import pandas as pd
df = pd.read_csv('arrival_times.csv')
df = df.dropna(axis=0, how='any')
df.head()
```

Out[7]:

OPD_DATE | VEHICLE_ID | RTE | DIR | TRIP_ID | STOP_ID | STOP_NAME | SCH_STOP_TM | ACT_STOP_TM | |
---|---|---|---|---|---|---|---|---|---|

0 | 2016-03-26 | 6201 | 673 | S | 30908177 | 431 | 3RD AVE & PIKE ST (431) | 01:11:57 | 01:13:19 |

1 | 2016-03-26 | 6201 | 673 | S | 30908033 | 431 | 3RD AVE & PIKE ST (431) | 23:19:57 | 23:16:13 |

2 | 2016-03-26 | 6201 | 673 | S | 30908028 | 431 | 3RD AVE & PIKE ST (431) | 21:19:57 | 21:18:46 |

3 | 2016-03-26 | 6201 | 673 | S | 30908019 | 431 | 3RD AVE & PIKE ST (431) | 19:04:57 | 19:01:49 |

4 | 2016-03-26 | 6201 | 673 | S | 30908252 | 431 | 3RD AVE & PIKE ST (431) | 16:42:57 | 16:42:39 |

To start with, let's do a little bit of data cleanup to get it into a form that's easier to work with:

In [8]:

```
# combine date and time into a single timestamp
df['scheduled'] = pd.to_datetime(df['OPD_DATE'] + ' ' + df['SCH_STOP_TM'])
df['actual'] = pd.to_datetime(df['OPD_DATE'] + ' ' + df['ACT_STOP_TM'])
# if scheduled & actual span midnight, then the actual day needs to be adjusted
minute = np.timedelta64(1, 'm')
hour = 60 * minute
diff_hrs = (df['actual'] - df['scheduled']) / hour
df.loc[diff_hrs > 20, 'actual'] -= 24 * hour
df.loc[diff_hrs < -20, 'actual'] += 24 * hour
df['minutes_late'] = (df['actual'] - df['scheduled']) / minute
# map internal route codes to external route letters
df['route'] = df['RTE'].replace({673: 'C', 674: 'D', 675: 'E'}).astype('category')
df['direction'] = df['DIR'].replace({'N': 'northbound', 'S': 'southbound'}).astype('category')
# extract useful columns
df = df[['route', 'direction', 'scheduled', 'actual', 'minutes_late']].copy()
df.head()
```

Out[8]:

route | direction | scheduled | actual | minutes_late | |
---|---|---|---|---|---|

0 | C | southbound | 2016-03-26 01:11:57 | 2016-03-26 01:13:19 | 1.366667 |

1 | C | southbound | 2016-03-26 23:19:57 | 2016-03-26 23:16:13 | -3.733333 |

2 | C | southbound | 2016-03-26 21:19:57 | 2016-03-26 21:18:46 | -1.183333 |

3 | C | southbound | 2016-03-26 19:04:57 | 2016-03-26 19:01:49 | -3.133333 |

4 | C | southbound | 2016-03-26 16:42:57 | 2016-03-26 16:42:39 | -0.300000 |

In [9]:

```
import seaborn as sns
g = sns.FacetGrid(df, row="direction", col="route")
g.map(plt.hist, "minutes_late", bins=np.arange(-10, 20))
g.set_titles('{col_name} {row_name}')
g.set_axis_labels('minutes late', 'number of buses');
```

`groupby`

functionality to compute these intervals:

In [10]:

```
def compute_headway(scheduled):
minute = np.timedelta64(1, 'm')
return scheduled.sort_values().diff() / minute
grouped = df.groupby(['route', 'direction'])
df['actual_interval'] = grouped['actual'].transform(compute_headway)
df['scheduled_interval'] = grouped['scheduled'].transform(compute_headway)
```

In [11]:

```
g = sns.FacetGrid(df.dropna(), row="direction", col="route")
g.map(plt.hist, "actual_interval", bins=np.arange(50) + 0.5)
g.set_titles('{col_name} {row_name}')
g.set_axis_labels('actual interval (minutes)', 'number of buses');
```

It's already clear that these don't look much like the exponential distribution of our model, but that is not telling us much yet: the distributions may be affected by non-constant scheduled arrival intervals.

Let's repeat the above chart, examining the scheduled rather than observed arrival intervals:

In [12]:

```
g = sns.FacetGrid(df.dropna(), row="direction", col="route")
g.map(plt.hist, "scheduled_interval", bins=np.arange(20) - 0.5)
g.set_titles('{col_name} {row_name}')
g.set_axis_labels('scheduled interval (minutes)', 'frequency');
```

In [13]:

```
def stack_sequence(data):
# first, sort by scheduled time
data = data.sort_values('scheduled')
# re-stack data & recompute relevant quantities
data['scheduled'] = data['scheduled_interval'].cumsum()
data['actual'] = data['scheduled'] + data['minutes_late']
data['actual_interval'] = data['actual'].sort_values().diff()
return data
subset = df[df.scheduled_interval.isin([10, 12, 15])]
grouped = subset.groupby(['route', 'direction', 'scheduled_interval'])
sequenced = grouped.apply(stack_sequence).reset_index(drop=True)
sequenced.head()
```

Out[13]:

route | direction | scheduled | actual | minutes_late | actual_interval | scheduled_interval | |
---|---|---|---|---|---|---|---|

0 | C | northbound | 10.0 | 12.400000 | 2.400000 | NaN | 10.0 |

1 | C | northbound | 20.0 | 27.150000 | 7.150000 | 0.183333 | 10.0 |

2 | C | northbound | 30.0 | 26.966667 | -3.033333 | 14.566667 | 10.0 |

3 | C | northbound | 40.0 | 35.516667 | -4.483333 | 8.366667 | 10.0 |

4 | C | northbound | 50.0 | 53.583333 | 3.583333 | 18.066667 | 10.0 |

In [14]:

```
for route in ['C', 'D', 'E']:
g = sns.FacetGrid(sequenced.query(f"route == '{route}'"),
row="direction", col="scheduled_interval")
g.map(plt.hist, "actual_interval", bins=np.arange(40) + 0.5)
g.set_titles('{row_name} ({col_name:.0f} min)')
g.set_axis_labels('actual interval (min)', 'count')
g.fig.set_size_inches(8, 4)
g.fig.suptitle(f'{route} line', y=1.05, fontsize=14)
```

We see that for each line and schedule, the distribution of observed arrival intervals is nearly Gaussian, is peaked near the scheduled arrival interval, and has a standard deviation that is smaller near the beginning of the route (southbound for C, northbound for D/E) and larger near the end.
Even without a statistical test, it's clear by eye that the actual arrival intervals are definitely **not** exponentially distributed, which is the basic assumption on which the waiting time paradox rests.

We can make use of the wait time simulation function we used above in order to find the average wait time for each bus line, direction, and schedule:

In [15]:

```
grouped = sequenced.groupby(['route', 'direction', 'scheduled_interval'])
sims = grouped['actual'].apply(simulate_wait_times)
sims.apply(lambda times: "{0:.1f} +/- {1:.1f}".format(times.mean(), times.std()))
```

Out[15]:

route direction scheduled_interval C northbound 10.0 7.8 +/- 12.5 12.0 7.4 +/- 5.7 15.0 8.8 +/- 6.4 southbound 10.0 6.2 +/- 6.3 12.0 6.8 +/- 5.2 15.0 8.4 +/- 7.3 D northbound 10.0 6.1 +/- 7.1 12.0 6.5 +/- 4.6 15.0 7.9 +/- 5.3 southbound 10.0 6.7 +/- 5.3 12.0 7.5 +/- 5.9 15.0 8.8 +/- 6.5 E northbound 10.0 5.5 +/- 3.7 12.0 6.5 +/- 4.3 15.0 7.9 +/- 4.9 southbound 10.0 6.8 +/- 5.6 12.0 7.3 +/- 5.2 15.0 8.7 +/- 6.0 Name: actual, dtype: object

The waiting time paradox has been an interesting launching-point for a discussion that covered simulation, probability, and comparison of statistical assumptions with reality. Although we confirmed that real-world bus lines do follow some version of the inspection paradox, the above analysis shows pretty definitively that the core assumption behind the waiting time paradox — that the arrival of buses follows the statistics of a Poisson process — is not well-founded.

In retrospect, this is perhaps not all that surprising: a Poisson process is a memoryless process that assumes the probability of an arrival is entirely independent of the time since the previous arrival. In reality, a well-run bus system will have schedules deliberately structured to avoid this kind of behavior: buses don't begin their routes at random times throughout the day, but rather begin their routes on a schedule chosen to best serve the transit-riding public.

The larger lesson here is that you should be careful about the assumptions you bring to any data analysis task. A Poisson process is a good description for arrival time data — sometimes. But just because one type of data sounds like another type of data, it does not mean that assumptions valid for one are necessarily valid for the other. Often assumptions that seem correct on their face can lead to conclusions that don't match reality.