#!/usr/bin/env python
# coding: utf-8

# Peter Norvig, Oct 2017
# Last update: Jan 2024
#
# # Bicycling Statistics
#
# During a pandemic, bicycling is a great way to (1) spend some time, (2) get some exercise, and (3) stay outside and be safe. In this notebook I track [my cycling performance](https://www.strava.com/athletes/575579) against various goals:
# - **Distance**: I do about 6,000 miles a year.
# - **Climbing**: In 2022, I climbed to *space* (100 km of total elevation gain).
# - **Explorer Tiles**: In 2022, I started tracking the 1-mile-square [explorer tiles](https://rideeverytile.com/) I have visited.
# - **Wandering**: In 2020, I started using [Wandrer.earth](https://wandrer.earth/athletes/3534/) to track what new roads I have ridden.
# - **Eddington Number**: I've done 68 miles or more on 68 different days. So 68 is my Eddington Number.
# - **Speed**: I'm not going particularly fast, but I am interested in understanding how my speed varies with the steepness of the hill.
#
# This notebook is mostly for my own benefit, but if you're a cyclist you're welcome to adapt it to your own data, and if you're a data scientist, you might find it an interesting example of exploratory data analysis. The companion notebook [**BikeCode.ipynb**](BikeCode.ipynb) has the implementation details.

# # Yearly Totals
#
# Here are my overall stats for each year since I started keeping track in mid-2014. I have done 6,000 miles per year since 2016, except for 2020 when an injury kept me sidelined for two months. The columns keep track of the total **hours** on the bike, distance traveled in **miles**, and total **feet** climbed. Then there are some columns that are derived from these: **mph** is **miles / hours**; **vam** is vertical meters ascended per hour (or **feet × 0.3048 / hours**); **fpmi** is **feet / miles**; **pct** is the grade in percent (or **feet × 100 / miles / 5280**); and finally **kms** and **meters** are the metric equivalents of **miles** and **feet**.
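# As a minimal sketch of how the derived columns follow from the three measured ones, the formulas above can be written out directly. The function name `derived_stats` and the sample year are hypothetical; the actual implementation lives in BikeCode.ipynb and may differ.

```python
FEET_PER_MILE = 5280
METERS_PER_FOOT = 0.3048
KMS_PER_MILE = 1.609344

def derived_stats(hours, miles, feet):
    """Compute the derived columns from hours, miles, and feet (illustrative only)."""
    return {
        'mph': miles / hours,                         # average speed
        'vam': feet * METERS_PER_FOOT / hours,        # vertical meters per hour
        'fpmi': feet / miles,                         # feet climbed per mile
        'pct': feet * 100 / miles / FEET_PER_MILE,    # average grade in percent
        'kms': miles * KMS_PER_MILE,                  # metric distance
        'meters': feet * METERS_PER_FOOT,             # metric climbing
    }

# Hypothetical year (made-up numbers): 500 hours, 6,000 miles, 200,000 feet
stats = derived_stats(hours=500, miles=6000, feet=200_000)
for name, value in stats.items():
    print(f'{name:>6}: {value:,.2f}')
```

# Keeping the conversion constants in one place makes it easy to check that **pct** is just **fpmi** rescaled by 100 / 5280.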
#

# In[45]:


get_ipython().run_line_magic('run', 'BikeCode.ipynb')
yearly


# And here's the same data on a per day basis, assuming I ride 6 days a week:

# In[46]:


daily


# # Climbing
#
# In 2022 my friend [A. J. Jacobs](https://ajjacobs.com/) set a goal of **walking to space**: climbing a total elevation equal to the distance from the Earth's surface to the top of the atmosphere. [A group](https://www.facebook.com/groups/260966686136038) of about 40 of us joined the quest. The boundary of "space" is vague, but the [Kármán line](https://en.wikipedia.org/wiki/K%C3%A1rm%C3%A1n_line) is 100 kilometers; in 2022 I surpassed 100 kilometers of climbing (over 1,100 feet per day), but most years I'm closer to 60 kilometers (about 600 feet per day).

# # Explorer Tiles
#
# The [OpenStreetMap](https://www.openstreetmap.org/) world map is divided into **[explorer tiles](https://www.statshunters.com/faq-10-what-are-explorer-tiles)** of approximately 1 mile square. Sites like [Veloviewer](https://veloviewer.com), [Statshunters](https://www.statshunters.com/), [RideEveryTile](https://rideeverytile.com/), and [SquadRats](https://squadrats.com/map) challenge bicyclists and hikers to record which tiles they have passed through. The process is gamified to highlight the following statistics:
# - The largest **square** (an *n* × *n* array of visited tiles).
# - The maximum **cluster** (a set of contiguous interior visited tiles, where "interior" means surrounded by visited tiles).
# - The **total** number of visited tiles.
#
# Since I live on a peninsula, it is not easy for me to form a large square, and I sometimes have to work hard to connect different parts of my map into my main cluster (such as connecting San Francisco and Marin). I have a [separate page](???) documenting my explorations, but here are a few key points along the way:

# In[47]:


tiles


# # Wandering
#
# The website [**Wandrer.earth**](https://wandrer.earth) tracks the distinct roads a user has biked on.
# It provides a fun incentive to get out and explore new roads. The site is gamified so that there is a reward for first reaching 25% of the road-miles in each city, and further rewards for higher percentages. (You get no credit for repeating a road you've already been on.)
#
# The wandrer.earth site does a good job of showing my current status, but it requires clicking around a bit, so I summarize it all in one place here. Each line gives the percent of roads/trails that I have traveled on for each place (specified by **county** and city **name**), as well as the **total** miles of road in the place, the miles I have **done**, and the amount I need to hit the **next badge**.

# In[48]:


wandering(by='pct')


# As part of my wandering, in April 2022 I was able to get to 25% of every city that rings the San Francisco Bay and is below San Francisco or Oakland (see map [with](ring2.jpeg) or [without](ring1.jpeg) roads traveled; as soon as you get 25% of a city, it lights up with a color).
#
# I live at the border of Santa Clara County (SCC) and San Mateo County (SMC), so I ride in both. Wandrer.earth says that Jason Molenda is a whopping 1,700 miles ahead of me in SCC and Megan Gardner is 1,000 miles ahead of me in SMC. Barry Mann is the leader in total miles in the two counties, and Megan leads in average percent. Kudos to all of them! However, I do occupy a small section of the [Pareto front](https://en.wikipedia.org/wiki/Pareto_front) for the two counties together: no single rider on wandrer.earth has done more than me in *both* counties. Here are the leaders (as of December 2023), where the dotted line indicates the Pareto front.
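# For readers unfamiliar with the idea, here is a minimal sketch of finding who sits on a two-county Pareto front. The helper name `pareto_names` and the rider data are made up for illustration; this is not the notebook's `pareto_front` function (which also draws the plot) and not real wandrer.earth data.

```python
def pareto_names(miles):
    """Return the riders who are not dominated by any other rider.

    `miles` maps a rider name to a (scc_miles, smc_miles) pair. A rider is
    dominated if someone else has at least as many miles in both counties
    and a different (hence strictly better somewhere) pair of totals.
    """
    def dominated(name):
        x, y = miles[name]
        return any(x2 >= x and y2 >= y and (x2, y2) != (x, y)
                   for x2, y2 in miles.values())
    return [name for name in miles if not dominated(name)]

# Hypothetical riders: (miles in SCC, miles in SMC)
riders = {
    'A': (3000, 500),
    'B': (2000, 1500),
    'C': (1000, 2500),
    'D': (1500, 1000),   # dominated by B
    'E': (900, 2400),    # dominated by C
}
print(pareto_names(riders))
```

# The pairwise check is quadratic in the number of riders, which is fine for a leaderboard of a few dozen names.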
# In[49]:


pareto_front(leaders)


# # Eddington Number
#
# The physicist/bicyclist [Sir Arthur Eddington](https://en.wikipedia.org/wiki/Arthur_Eddington), a contemporary of Einstein, defined the [**Eddington Number**](https://www.triathlete.com/2011/04/training/measuring-bike-miles-eddington-number_301789) as the largest integer **E** such that you have cycled at least **E** miles on at least **E** days.
#
# My Eddington number progress over the years, in both kilometers and miles:

# In[50]:


Ed_progress(rides)


# My current Eddington Number is **102** in kilometers and **68** in miles (I've ridden at least 68 miles on at least 68 days, but not 69 miles on 69 days). My number is above [the median for Strava](https://swinny.net/Cycling/-4687-Calculate-your-Eddington-Number), but not nearly as good as Eddington himself: his number was **84** (in miles) when he died at age 62, and his roads, weather, bicycles, and navigation aids were not nearly as nice as mine, so bravo zulu to him.

# How many more rides will I need to reach higher Eddington numbers? I call that the *Eddington Gap*:

# In[51]:


Ed_gaps(rides)


# I need 3 rides of 103 kms or 12 rides of 69 miles to increase my Eddington numbers.
#
# Here are some properties of Eddington numbers:
# - Your Eddington number is monotonic: it can never decrease over time.
# - To improve from an Eddington number of *n* to *n* + 1 can take as few as 1 ride, or as many as *n* + 1 rides.
#   + *Suppose you have done 9 rides, each of exactly 10 miles. Your Eddington number is 9.*
#   + *You would need 1 ride of 10 miles to improve from a number of 9 to 10.*
#   + *You would then need 11 rides of 11 miles to improve from a number 10 to 11.*
# - Your metric Eddington number will always be greater than or equal to your imperial Eddington number.
# - Your metric Eddington number will always be less than 1.609344 times one more than your imperial Eddington number. (The "one more" matters for small numbers: 100 rides of 1.9 miles each give an imperial number of 1 but a metric number of 3.)
# - Of two riders, it is possible that one has a higher metric number and the other a higher imperial number.
#
# *Note:* the definition of Eddington Number seems precise, but what exactly does ***day*** mean? The New Oxford dictionary has three senses:
#
# 1. *a period of 24 hours;*
# 2. *a unit of time, reckoned from one midnight to the next;*
# 3. *the part of a day when it is light.*
#
# I originally assumed sense 2, but I wanted to accept sense 1 for what [bikepackers](https://bikepacking.com/) call a [sub-24-hour overnight](https://oneofsevenproject.com/s24o-bikepacking-guide/) (S24O): a ride to a camping site in the afternoon, pitching a tent for the night, and returning home the next morning. And then COVID struck and the camping sites closed, so why not allow an S24O where I sleep in my own home? I realize Eddington had a lot more hardships than we have (World War I, the 1918 pandemic, and World War II, for example), but I hope he would approve of this modest accommodation on my part.

# # Hill-Index: Speed versus Grade on Short Climbs
#
# The Eddington number reminds me of the [**h-index**](https://en.wikipedia.org/wiki/H-index) metric for scientific publications. I invented another metric:
#
# > *Your **hill-index** is the maximum integer **h** where you can regularly climb an **h** percent grade at **h** miles per hour.*
#
# I'll plot grade versus speed for segments (not rides) with two best-fit curves: a blue quadratic and an orange cubic. I'll also superimpose a red dotted line where grade = speed.

# In[52]:


show('pct', 'mph', segments[segments.pct > 2], 'Miles per hour versus segment grade in percent')
plt.plot((2, 6, 7), (2, 6, 7), 'ro:');


# Both best-fit curves are above the red circle at 6% and below the red circle for 7%, so **my hill-index is 6**. We also see that I can cruise at 14 mph on a 2% grade, but only about 7 mph at 6% grade, and around 5.5 mph on 8% grades.
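# Both the Eddington number and the hill-index are "largest *n* such that ..." metrics, so they can be sketched in a few lines. The function names below are my own for illustration and are not the notebook's `Ed_progress` or `Ed_gaps`. The piecewise-linear speed curve is only a stand-in for the best-fit curves in the plot, built from the three speeds quoted above.

```python
def eddington(distances):
    """Largest E such that at least E of the distances are >= E."""
    E = 0
    for days, d in enumerate(sorted(distances, reverse=True), start=1):
        if d >= days:
            E = days      # the `days` longest rides are all at least `days` long
        else:
            break
    return E

def eddington_gap(distances, target):
    """How many more rides of at least `target` are needed to reach `target`."""
    done = sum(1 for d in distances if d >= target)
    return max(0, target - done)

def interpolate(points):
    """Piecewise-linear function through sorted (x, y) points, clamped at both ends."""
    def f(x):
        if x <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return points[-1][1]
    return f

def hill_index(speed_at_grade, max_grade=30):
    """Largest integer h such that every grade from 1 to h can be climbed at >= that many mph."""
    h = 0
    while h < max_grade and speed_at_grade(h + 1) >= h + 1:
        h += 1
    return h

# The nine-rides-of-10-miles example from the list of properties
rides_example = [10] * 9
print(eddington(rides_example), eddington(rides_example + [10]))
print(eddington_gap(rides_example + [10], 11))

# Assumed speed curve: 14 mph at 2%, 7 mph at 6%, 5.5 mph at 8%
speed = interpolate([(2, 14), (6, 7), (8, 5.5)])
print(hill_index(speed))
```

# Sorting the rides from longest to shortest turns the Eddington definition into a single pass: the answer is the last position whose ride is at least as long as its rank.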
# # Speed versus Grade on Long Rides
#
# The plot above tells me how fast I should expect to climb a particular hill, but what about average speed on longer rides? Here's a plot of my speed versus steepness (measured in feet climbed per mile rather than in percent).

# In[53]:


show('fpmi', 'mph', rides, 'Speed (miles per hour) versus Ride Grade (feet per mile)')


# So, I average a little under 14 mph when the overall route is fairly flat, with a lot of variability, depending more on my level of effort (and maybe the wind) than on the grade of the road. But when the grade is steeper than 50 ft/mile, my speed falls off quickly: down to 12 mph at 80 ft/mile; 11 mph at 100 ft/mile; and around 10 mph at 120 ft/mile. Note that 120 ft/mile is only 2.3% grade, but if you figure a typical route is 1/3 up, 1/3 down, and 1/3 flat, then that's 7% average grade on the up part.
#
# I can use this to predict the time of a ride. For example, if I'm in La Honda and want to get to Pescadero, which way is faster: the [coast route](https://www.google.com/maps/dir/La+Honda,+California/Pescadero,+California/@37.2905834,-122.3896683,12z/data=!4m19!4m18!1m10!1m1!1s0x808faed4dc6265bd:0x51a109d3306a7219!2m2!1d-122.274227!2d37.3190255!3m4!1m2!1d-122.4039496!2d37.3116594!3s0x808f062b7d7585e7:0x942480c22f110b74!1m5!1m1!1s0x808f00b4b613c4c1:0x43c609077878b77!2m2!1d-122.3830152!2d37.2551636!3e1) (15.7 miles, 361 ft climb), or the [creek route](https://www.google.com/maps/dir/La+Honda,+California/Pescadero,+California/@37.2905834,-122.3896683,12z/data=!4m19!4m18!1m10!1m1!1s0x808faed4dc6265bd:0x51a109d3306a7219!2m2!1d-122.274227!2d37.3190255!3m4!1m2!1d-122.3658887!2d37.2538867!3s0x808f00acf265bd43:0xb7e2a0c9ee355c3a!1m5!1m1!1s0x808f00b4b613c4c1:0x43c609077878b77!2m2!1d-122.3830152!2d37.2551636!3e1) (13.5 miles, 853 ft climb)? We can estimate:

# In[54]:


f'Coast: {estimate(15.7, 361)} min, Creek: {estimate(13.5, 853)} min.'
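# The `estimate` function itself is defined in BikeCode.ipynb. As a rough stand-in, one way to get a similar estimate is to interpolate linearly between the speeds quoted above. The table, the 13.8 mph figure for flat routes, and the name `estimate_minutes` are all assumptions for illustration, not the notebook's actual method.

```python
# (feet per mile, mph) pairs read off the discussion above; 13.8 mph is an
# assumed stand-in for "a little under 14 mph" on fairly flat routes.
SPEED_TABLE = [(0, 13.8), (50, 13.8), (80, 12.0), (100, 11.0), (120, 10.0)]

def speed_mph(fpmi, table=SPEED_TABLE):
    """Interpolate an expected speed for a route with `fpmi` feet of climbing per mile."""
    if fpmi <= table[0][0]:
        return table[0][1]
    for (x0, y0), (x1, y1) in zip(table, table[1:]):
        if fpmi <= x1:
            return y0 + (y1 - y0) * (fpmi - x0) / (x1 - x0)
    return table[-1][1]   # steeper than the table covers: clamp to the last speed

def estimate_minutes(miles, feet):
    """Estimated ride time in minutes for a route of `miles` with `feet` of climbing."""
    return 60 * miles / speed_mph(feet / miles)

coast = estimate_minutes(15.7, 361)
creek = estimate_minutes(13.5, 853)
print(f'Coast: {coast:.0f} min, Creek: {creek:.0f} min.')
```

# Clamping beyond 120 ft/mile is a simplification; a fitted curve would keep falling for steeper routes.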
# This predicts the shorter but steeper creek route would be about 6 minutes faster (whereas Google Maps predicts the creek route would be 80 minutes, 2 more than the coast route; I guess Google lacks confidence in my climbing ability). This is all good to know, but other factors (like the scenery and whether I want to stop at the San Gregorio store) are probably more important in making the choice.

# # VAM
#
# Climbing speed is measured by [VAM](https://en.wikipedia.org/wiki/VAM_%28bicycling%29), which stands for *velocità ascensionale media* (for native Campagnolo speakers) or *vertical ascent in meters per hour* (for SRAM) or 平均上昇率 (for Shimano), or *Vm/h* (for physicists). The theory is that for fairly steep climbs, most of your power is going into lifting against gravity, so your VAM should be about constant no matter what the grade. (For flattish segments power is spent on wind and rolling resistance, and for the very steepest of climbs, in my experience, power goes largely to cursing *sotto voce*, as they say in Italian.)
#
# Here's a plot of my VAM versus grade over short segments:

# In[55]:


show('pct', 'vam', segments, 'VAM (vertical meters per hour) versus segment grade in percent')


# Champion cyclists can do over 1800 meters/hour over a 10 km climb, and can sustain [1400 meters/hour for 7 hours](https://www.strava.com/activities/4996833865). My VAM numbers range mostly from 400 to 800 meters/hour, and I can sustain the higher numbers for only a couple of minutes:

# In[56]:


top(segments, 'vam')


# On segments that are at least a kilometer long my VAM tops out at about 800 meters/hour:

# In[57]:


top(segments[segments.kms >= 1], 'vam', n=30)


# I can also look at VAM numbers for complete rides. I would expect the ride VAM to be half the segment VAM (or less) since most of my rides are circuits where I return to the start, and thus no more than half the ride is climbing.
# Sure enough, the best I can do is about 400 meters/hour:

# In[58]:


top(rides, 'vam')


# # Exploring the Data
#
# Some more ways to look at the data, both rides and segments.

# In[59]:


rides.describe() # Summary statistics for the rides


# In[60]:


segments.describe() # Summary statistics for the segments


# In[61]:


top(rides, 'mph') # Fastest rides (of more than 20 miles, that I sampled into the database)


# In[62]:


top(segments, 'mph') # Fastest segments (there are no descent segments in the database)


# In[63]:


top(segments, 'feet') # Biggest climbing segments


# In[64]:


top(segments, 'pct') # Steepest climbs


# In[65]:


top(rides, 'miles') # Longest rides