A friend found a nice table of beer alcohol by volume (ABV) and calories, so I thought I'd try something obvious. Maybe alcohol isn't all that drinking beer is about, but calories definitely come along for the ride. They don't call it a beer belly for nothing. So maybe alcohol per calorie (APC) should be a thing? If you've ever wondered how alcohol and calories play together in your favorite brew, you may find the interactive Plotly charts at the bottom of this notebook interesting.
pip install plotly
Requirement already satisfied: plotly in c:\program files\python\python 3.7.8\lib\site-packages (5.8.2)
Requirement already satisfied: tenacity>=6.2.0 in c:\program files\python\python 3.7.8\lib\site-packages (from plotly) (8.0.1)
Note: you may need to restart the kernel to use updated packages.
import pandas as pd
import lxml  # not called directly; pandas.read_html uses it as its default HTML parser
import plotly.express as px
Thank you, homebrewacademy.com, for the simple-to-import table!
beer = pd.read_html('https://homebrewacademy.com/beer-alcohol-content-list/')
print(f'Total tables: {len(beer)}')
Total tables: 2
# Need the first table; strip the trailing '%' from ABV and convert it to float
df = beer[0]
df['ABV'] = df['ABV'].str.rstrip('%').astype('float')
df.head()
|   | Brand | Calories | ABV |
|---|---|---|---|
| 0 | Abita Amber | 128 | 4.5 |
| 1 | Abita Golden | 125 | 4.2 |
| 2 | Abita Jockamo IPA | 190 | 6.5 |
| 3 | Abita Light | 118 | 4.0 |
| 4 | Abita Purple Haze | 128 | 4.2 |
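The ABV cleanup above assumes every cell ends in a '%' sign. As a hedged sketch of a more defensive variant, the snippet below strips the sign only when it is present and turns anything unparseable into NaN. The `sample` frame is hypothetical: two rows are copied from the output above and the third ("Made-up Brand") is invented purely to show the fallback.

```python
import pandas as pd

# Hypothetical sample shaped like the raw scraped table.
# The "Made-up Brand" row is invented to demonstrate a malformed value.
sample = pd.DataFrame({
    "Brand": ["Abita Amber", "Abita Golden", "Made-up Brand"],
    "ABV": ["4.5%", "4.2%", "n/a"],
})

# rstrip removes a trailing '%' only if it is there;
# errors="coerce" converts anything that still isn't a number into NaN
sample["ABV"] = pd.to_numeric(sample["ABV"].str.rstrip("%"), errors="coerce")

print(sample)
print(sample["ABV"].isna().sum(), "row(s) could not be parsed")
```

Rows that come back as NaN can then be inspected or dropped before plotting, rather than failing the whole conversion.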
# Start with a simple plotly express plot
fig = px.scatter(df, x="Calories", y="ABV", hover_data=['Brand'],
width=400, height=400)
fig.show()
# But the residual (distance from a trend line) isn't really what's important. How about a ratio: ABV per calorie?
df['apc'] = df['ABV']/df['Calories']
df = df.sort_values(['apc'],ascending=False)
# So which beers give you the most alcohol per calorie?
df.head(10)
|   | Brand | Calories | ABV | apc |
|---|---|---|---|---|
| 136 | Michelob Ultra Pure Gold | 85 | 4.3 | 0.050588 |
| 153 | Natural Ice | 130 | 5.9 | 0.045385 |
| 149 | Molson Canadian 67 | 67 | 3.0 | 0.044776 |
| 33 | Bud Ice | 123 | 5.5 | 0.044715 |
| 193 | Rolling Rock Green Light | 83 | 3.7 | 0.044578 |
| 52 | Corona Premier | 90 | 4.0 | 0.044444 |
| 154 | Natural Light | 95 | 4.2 | 0.044211 |
| 133 | Michelob Ultra | 95 | 4.2 | 0.044211 |
| 37 | Bud Light Platinum | 137 | 6.0 | 0.043796 |
| 142 | Miller Lite | 96 | 4.2 | 0.043750 |
So it's mostly "light" and "ice" beers winning the "bang for the buck" test, but the highest-ABV beer also makes the top 30 (the table above is truncated to 10 rows for ease of display). Way to go, Dogfish Head.
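Checking where the strongest beer lands in the APC ranking does not require scrolling through the sorted table. Here is a minimal sketch, assuming the same column names as above; the `rows` frame holds only three rows copied from the top-10 output, so the rank it prints applies to this tiny subset, not to the full dataset.

```python
import pandas as pd

# Three rows copied from the top-10 table above (a subset for illustration only)
rows = pd.DataFrame({
    "Brand": ["Michelob Ultra Pure Gold", "Natural Ice", "Molson Canadian 67"],
    "Calories": [85, 130, 67],
    "ABV": [4.3, 5.9, 3.0],
})

# Alcohol per calorie: percent ABV divided by calories
rows["apc"] = rows["ABV"] / rows["Calories"]

# Rank 1 is the highest APC
rows["apc_rank"] = rows["apc"].rank(ascending=False, method="min").astype(int)

# Look up the row with the highest ABV and report its APC rank
strongest = rows.loc[rows["ABV"].idxmax()]
print(strongest["Brand"], "has APC rank", strongest["apc_rank"])
```

Run against the full `df`, the same two lines (`rank` and `idxmax`) would give the position of the highest-ABV beer directly.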
Maybe try something a little fancier...
I like quintiles for graphics because the middle bin contains the median, and then you get a high bin, a low bin, and the two extremes. Quartiles are for boxplots.
pd.qcut(df["apc"], q=5, labels=False)
df['Quintile'] = pd.qcut(df["apc"], q=5, labels=False)
df['Quintile'] = df['Quintile'] + 1 # add one because zero index not helpful here
df.head(10)
|   | Brand | Calories | ABV | apc | Quintile |
|---|---|---|---|---|---|
| 136 | Michelob Ultra Pure Gold | 85 | 4.3 | 0.050588 | 5 |
| 153 | Natural Ice | 130 | 5.9 | 0.045385 | 5 |
| 149 | Molson Canadian 67 | 67 | 3.0 | 0.044776 | 5 |
| 33 | Bud Ice | 123 | 5.5 | 0.044715 | 5 |
| 193 | Rolling Rock Green Light | 83 | 3.7 | 0.044578 | 5 |
| 52 | Corona Premier | 90 | 4.0 | 0.044444 | 5 |
| 154 | Natural Light | 95 | 4.2 | 0.044211 | 5 |
| 133 | Michelob Ultra | 95 | 4.2 | 0.044211 | 5 |
| 37 | Bud Light Platinum | 137 | 6.0 | 0.043796 | 5 |
| 142 | Miller Lite | 96 | 4.2 | 0.043750 | 5 |
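To see what the quintile step does, here is a small sketch on made-up numbers (the `values` series is invented and is not taken from the beer table). It passes `labels=[1, 2, 3, 4, 5]` to `pd.qcut`, which is one alternative to using `labels=False` and adding one afterwards, and it uses `retbins=True` to check that the median really does fall inside the middle bin.

```python
import pandas as pd

# Ten made-up, evenly spaced ratios (illustration only)
values = pd.Series([0.010, 0.020, 0.030, 0.040, 0.050,
                    0.060, 0.070, 0.080, 0.090, 0.100])

# labels=[1..5] gives 1-based quintile labels directly;
# retbins=True also returns the six bin edges
quintile, edges = pd.qcut(values, q=5, labels=[1, 2, 3, 4, 5], retbins=True)

# Each quintile should hold the same number of observations
counts = quintile.value_counts().sort_index()
print(counts)

# The middle bin spans edges[2]..edges[3]; the median should sit inside it
median = values.median()
print("median in middle bin:", edges[2] < median <= edges[3])
```

Note that with explicit labels the result is a categorical column rather than integers, so it would still need `.astype("string")` (or similar) if string keys are wanted for a color map.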
Plotly's default continuous color ramp from blue to yellow isn't the best choice here, because the largest quintile gets assigned yellow, which is difficult to see. Try a discrete rainbow color map instead, after converting the quintiles to strings so Plotly treats them as categories.
df['Quintile'] = df['Quintile'].astype("string")
df = df.sort_values(['apc'],ascending=True)
# Now color data points by quintiles of ABV/Calories
fig = px.scatter(df, x="Calories", y="ABV", hover_data=['Brand'], color="Quintile"
,color_discrete_map={
"1" : "blue"
,"2" : "green"
,"3" : "yellow"
,"4" : "orange"
,"5" : "red"
}
,width=700, height=700)
fig.update_layout(title_text="Alcohol per calorie by ABV and calories", title_x=0.5)
fig.show()
No surprise that the highest-APC quintile sits above where the regression line would be, but that "Dogfish Head 120 Minute IPA" red dot at 18% is interesting. With that exception, the mass of the highest quintile is concentrated in the lower calorie range. So maybe there is something to light beers after all, just not taste!
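The notebook never actually draws the regression line, so here is a rough sketch of what "above the line" means, using `numpy.polyfit` for an ordinary least-squares fit of ABV on calories. The `subset` frame contains only eight rows copied from the tables above, so the slope and residuals are illustrative and will differ from a fit on the full dataset. (Plotly Express can also draw the line itself via `trendline="ols"`, which requires statsmodels to be installed.)

```python
import numpy as np
import pandas as pd

# Eight rows copied from the tables above (a subset, for illustration only)
subset = pd.DataFrame({
    "Brand": ["Abita Amber", "Abita Golden", "Abita Jockamo IPA", "Abita Light",
              "Abita Purple Haze", "Michelob Ultra Pure Gold", "Natural Ice",
              "Molson Canadian 67"],
    "Calories": [128, 125, 190, 118, 128, 85, 130, 67],
    "ABV": [4.5, 4.2, 6.5, 4.0, 4.2, 4.3, 5.9, 3.0],
})

# Least-squares line: ABV ~ intercept + slope * Calories
slope, intercept = np.polyfit(subset["Calories"].to_numpy(),
                              subset["ABV"].to_numpy(), deg=1)

# Positive residual = more alcohol than the line predicts for those calories
subset["fitted"] = intercept + slope * subset["Calories"]
subset["residual"] = subset["ABV"] - subset["fitted"]
subset["apc"] = subset["ABV"] / subset["Calories"]

print(subset.sort_values("residual", ascending=False)[["Brand", "residual", "apc"]])
```

Comparing the `residual` and `apc` columns side by side shows how the two measures relate: the residual measures vertical distance from the fitted line, while APC measures the slope of a line from the origin to each point.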