from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
Last class we discussed the following table methods, each of which returns a new Table as output:
tb.select(label): constructs a new table with just the specified columns
tb.drop(label): constructs a new table in which the specified columns are omitted
tb.sort(label): constructs a new table with rows sorted by the specified column
tb.where(label, condition): constructs a new table with just the rows that match the condition
There are a number of properties we can extract from a Table, including:
num_rows: the number of rows in a Table
num_columns: the number of columns in a Table
There are also a number of additional methods for Tables, including:
relabel('column_name', 'new_name'): returns a table where the column 'column_name' is now called 'new_name'
take(row_numbers): returns a Table with just the rows at the given row numbers
# Load the ice cream data. Each row represents one ice cream cone.
cones = Table.read_table('cones.csv')
cones
# select only the chocolate cones using the `where` method as we did last class
# print the number of rows and columns
# relabel a column
# extract a row
We can extract columns from a Table as either:
a Table with fewer columns, using tb.select()
an ndarray, using tb.column()
# select() returns a table
# column() returns an array
Lists are one of the most widely used data structures in Python. They look like ndarrays, but they can hold data of different types.
We can access the element at index 2 (the third element) of a list my_list using my_list[2].
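The sketch below contrasts a list with an ndarray built from the same values. The values are arbitrary; the point is that NumPy converts mixed inputs to a single type (here, strings), while the list keeps each element's own type.

```python
import numpy as np

# A list can hold values of different types side by side
my_list = [7, 'chocolate', 4.75]
print(my_list[2])  # 4.75 (indexing starts at 0, so this is the third element)
print(type(my_list[2]).__name__)  # float

# An ndarray holds one type, so NumPy converts every element to a string here
my_array = np.array([7, 'chocolate', 4.75])
print(my_array.dtype.kind)  # U (Unicode string)
```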
We have created tables by loading data from comma-separated values files (.csv files). We can also create Tables from scratch by using:
Table(): constructs an empty Table
tb.with_columns("Name", array): returns a new Table with the named column added
tb.with_row(list): returns a new Table with one row added, where the list gives one value per column
Let's try creating a table that says how many blocks away different streets are from our classroom (now that we are back in person!).
# create an array of street names
# create a Table with street names and the distance from our classroom
# add another row to the Table
# add another column to the Table saying whether a street is one-way or two-way
The US government conducts a census every 10 years. We can examine the census data to see interesting patterns in the population of people in the United States.
# As of August 2021, this census file is online here:
data = 'http://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/nc-est2019-agesex-res.csv'
# A local copy can be accessed here in case census.gov moves the file:
# data = path_data + 'nc-est2019-agesex-res.csv'
full = Table.read_table(data)
full
# get a reduced set of columns that we want to analyze further
# rename the columns to make them easier to work with
# let's examine the data a little more
# let's remove the totals (value of 999 in the AGE column)
# let's split the data into male, female and everyone
# let's see which ages have the most people
# let's create a Table with age, males, and females
# let's add a percent female column to our Table
A useful way to visualize data as a function of time is a line plot. We can do this using the tb.plot('x_col_name', 'y_col_name') method.
# plot Percent Female as a function of Age
# plot Males and Females
# see which ages have had the biggest changes between 2014 and 2019
# Let's look at the percent change between 2014 and 2019 for each age
# plot percent change - any ideas why larger increases around age 72 and late 90s?