We've only been looking at the largest cities by population, but we have a whole bunch of other data we can use.
import pandas
df = pandas.read_csv('US_cities.csv', index_col="AccentCity")
df[:5]
id | Country | City | Region | Population | Latitude | Longitude | |
---|---|---|---|---|---|---|---|
AccentCity | |||||||
Alabaster | 2907755 | us | alabaster | AL | 26738.0 | 33.244167 | -86.816389 |
Albertville | 2907759 | us | albertville | AL | 18368.0 | 34.267500 | -86.208889 |
Alexander City | 2907765 | us | alexander city | AL | 14993.0 | 32.943889 | -85.953889 |
Anniston | 2907804 | us | anniston | AL | 23423.0 | 33.659722 | -85.831667 |
Athens | 2907848 | us | athens | AL | 20470.0 | 34.802778 | -86.971667 |
We have over 4000 cities and towns in the database, definitely too many for a bar chart. But we could look at the population data at the state level. First, let's try just a simple groupby in pandas.
states_pop = df[["Population", "Region"]].groupby(by=["Region"]).sum()
states_pop[:5]
Region Population AK 425708.0 AL 1995225.0 AR 1212898.0 AZ 4838335.0 CA 30890865.0 dtype: float64
states_pop.plot.bar()
plt.show()
Let's make a stacked bar chart that shows us how much the three largest cities contribute to population compared to the rest of the state. First, we'll need to get a separate list of the three top cities. To do that, we'll be using the pandas groupby method.
largest_cities_idx = df.groupby(by=["Region"], sort=False)["Population"].transform(max) == df["Population"]
df[largest_cities_idx][:5]
id | Country | City | Region | Population | Latitude | Longitude | |
---|---|---|---|---|---|---|---|
AccentCity | |||||||
Birmingham | 2908046 | us | birmingham | AL | 231621.0 | 33.520556 | -86.802500 |
Anchorage | 2912001 | us | anchorage | AK | 276263.0 | 61.218056 | -149.900278 |
Phoenix | 2913986 | us | phoenix | AZ | 1428509.0 | 33.448333 | -112.073333 |
Little Rock | 2916351 | us | little rock | AR | 184217.0 | 34.746389 | -92.289444 |
Los Angeles | 2920652 | us | los angeles | CA | 3877129.0 | 34.052222 | -118.242778 |
def max2(cities):
return nth_largest(cities, 2)
def max3(cities):
return nth_largest(cities, 3)
def nth_largest(cities, n):
nlargest = cities.nlargest(n)
if len(nlargest) < n:
return None
return nlargest[n-1]
second_largest_idx = df.groupby(by=["Region"], sort=False)["Population"].transform(max2) == df["Population"]
third_largest_idx = df.groupby(by=["Region"], sort=False)["Population"].transform(max3) == df["Population"]
df[second_largest_idx][:5]
id | Country | City | Region | Population | Latitude | Longitude | |
---|---|---|---|---|---|---|---|
AccentCity | |||||||
Montgomery | 2910236 | us | montgomery | AL | 198325.0 | 32.366667 | -86.300000 |
Juneau | 2912251 | us | juneau | AK | 31796.0 | 58.301944 | -134.419722 |
Tucson | 2914501 | us | tucson | AZ | 518907.0 | 32.221667 | -110.925833 |
Fort Smith | 2915717 | us | fort smith | AR | 81985.0 | 35.385833 | -94.398333 |
San Diego | 2922103 | us | san diego | CA | 1287050.0 | 32.715278 | -117.156389 |
Now we just need to sum the rest of the cities that we haven't counted yet.
smaller_cities = df[~largest_cities_idx & ~second_largest_idx & ~third_largest_idx].copy()
smaller_cities[["Population", "Region"]].groupby(by=["Region"]).sum().plot.bar(stacked=True)
df[largest_cities_idx][["Population", "Region"]].groupby(by=["Region"]).sum().plot.bar(stacked=True)
# df[second_largest_idx].plot.bar()
# df[third_largest_idx].plot.bar()
plt.show()
df
id | Country | City | Region | Population | Latitude | Longitude | |
---|---|---|---|---|---|---|---|
AccentCity | |||||||
Alabaster | 2907755 | us | alabaster | AL | 26738.0 | 33.244167 | -86.816389 |
Albertville | 2907759 | us | albertville | AL | 18368.0 | 34.267500 | -86.208889 |
Alexander City | 2907765 | us | alexander city | AL | 14993.0 | 32.943889 | -85.953889 |
Anniston | 2907804 | us | anniston | AL | 23423.0 | 33.659722 | -85.831667 |
Athens | 2907848 | us | athens | AL | 20470.0 | 34.802778 | -86.971667 |
Auburn | 2907855 | us | auburn | AL | 49833.0 | 32.609722 | -85.480833 |
Bessemer | 2908015 | us | bessemer | AL | 28628.0 | 33.401667 | -86.954444 |
Birmingham | 2908046 | us | birmingham | AL | 231621.0 | 33.520556 | -86.802500 |
Cullman | 2908668 | us | cullman | AL | 14796.0 | 34.174722 | -86.843611 |
Daphne | 2908701 | us | daphne | AL | 18581.0 | 30.603333 | -87.903611 |
Decatur | 2908730 | us | decatur | AL | 54621.0 | 34.605833 | -86.983333 |
Dothan | 2908787 | us | dothan | AL | 61741.0 | 31.223056 | -85.390556 |
Enterprise | 2908939 | us | enterprise | AL | 22572.0 | 31.315000 | -85.855278 |
Eufaula | 2908954 | us | eufaula | AL | 13453.0 | 31.891111 | -85.145556 |
Fairfield | 2908982 | us | fairfield | AL | 11641.0 | 33.485833 | -86.911944 |
Fairhope | 2908986 | us | fairhope | AL | 15419.0 | 30.522778 | -87.903333 |
Florence | 2909053 | us | florence | AL | 35733.0 | 34.799722 | -87.677222 |
Fort Payne | 2909087 | us | fort payne | AL | 13586.0 | 34.444167 | -85.719722 |
Gadsden | 2909142 | us | gadsden | AL | 36821.0 | 34.014167 | -86.006667 |
Gardendale | 2909161 | us | gardendale | AL | 12370.0 | 33.660000 | -86.812778 |
Hartselle | 2909380 | us | hartselle | AL | 13089.0 | 34.443333 | -86.935278 |
Helena | 2909416 | us | helena | AL | 13119.0 | 33.296111 | -86.843611 |
Homewood | 2909519 | us | homewood | AL | 23815.0 | 33.471667 | -86.800833 |
Hoover | 2909526 | us | hoover | AL | 66752.0 | 33.405278 | -86.811389 |
Hueytown | 2909557 | us | hueytown | AL | 15308.0 | 33.451111 | -86.996667 |
Huntsville | 2909571 | us | huntsville | AL | 167528.0 | 34.730278 | -86.586111 |
Irondale | 2909609 | us | irondale | AL | 9652.0 | 33.538056 | -86.707222 |
Jacksonville | 2909632 | us | jacksonville | AL | 8603.0 | 33.813611 | -85.761389 |
Jasper | 2909638 | us | jasper | AL | 13831.0 | 33.831111 | -87.277500 |
Leeds | 2909843 | us | leeds | AL | 11167.0 | 33.548056 | -86.544444 |
... | ... | ... | ... | ... | ... | ... | ... |
Verona | 3048989 | us | verona | WI | 10649.0 | 42.990833 | -89.533056 |
Watertown | 3049019 | us | watertown | WI | 23323.0 | 43.194722 | -88.728889 |
Waukesha | 3049024 | us | waukesha | WI | 68112.0 | 43.011667 | -88.231389 |
Waunakee | 3049026 | us | waunakee | WI | 10366.0 | 43.191944 | -89.455556 |
Waupun | 3049028 | us | waupun | WI | 10480.0 | 43.633333 | -88.729444 |
Wausau | 3049029 | us | wausau | WI | 36779.0 | 44.959167 | -89.630000 |
Wauwatosa | 3049033 | us | wauwatosa | WI | 45453.0 | 43.049444 | -88.007500 |
West Allis | 3049050 | us | west allis | WI | 59363.0 | 43.016667 | -88.006944 |
West Bend | 3049054 | us | west bend | WI | 29431.0 | 43.425278 | -88.183333 |
Weston | 3049071 | us | weston | WI | 13080.0 | 44.890833 | -89.547500 |
Whitefish Bay | 3049086 | us | whitefish bay | WI | 13611.0 | 43.113333 | -87.900000 |
Whitewater | 3049089 | us | whitewater | WI | 14536.0 | 42.833611 | -88.732222 |
Wisconsin Rapids | 3049120 | us | wisconsin rapids | WI | 17724.0 | 44.383611 | -89.817222 |
Casper | 3049260 | us | casper | WY | 51507.0 | 42.866667 | -106.312500 |
Cheyenne | 3049264 | us | cheyenne | WY | 55443.0 | 41.140000 | -104.819722 |
Cody | 3049273 | us | cody | WY | 9161.0 | 44.526389 | -109.055833 |
Douglas | 3049305 | us | douglas | WY | 5378.0 | 42.759722 | -105.381667 |
Evanston | 3049335 | us | evanston | WY | 11258.0 | 41.268333 | -110.962500 |
Gillette | 3049358 | us | gillette | WY | 23101.0 | 44.291111 | -105.501667 |
Green River | 3049368 | us | green river | WY | 11358.0 | 41.528611 | -109.465556 |
Jackson | 3049409 | us | jackson | WY | 8989.0 | 43.480000 | -110.761667 |
Lander | 3049439 | us | lander | WY | 6821.0 | 42.833056 | -108.730000 |
Laramie | 3049440 | us | laramie | WY | 26934.0 | 41.311389 | -105.590556 |
Powell | 3049544 | us | powell | WY | 5213.0 | 44.753889 | -108.756667 |
Rawlins | 3049552 | us | rawlins | WY | 8533.0 | 41.791111 | -107.238056 |
Riverton | 3049568 | us | riverton | WY | 9192.0 | 43.025000 | -108.379444 |
Rock Springs | 3049572 | us | rock springs | WY | 18226.0 | 41.587500 | -109.202222 |
Sheridan | 3049595 | us | sheridan | WY | 16154.0 | 44.797222 | -106.955556 |
Torrington | 3049648 | us | torrington | WY | 5483.0 | 42.065000 | -104.181111 |
Worland | 3049696 | us | worland | WY | 4729.0 | 44.016944 | -107.954722 |
4175 rows × 7 columns