# Lecture 6

In [ ]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')


## Review and continuation of Table operations

Last class we discussed the following table methods which return new Tables as output:

1. tb.select(label): constructs a new table with just the specified columns
2. tb.drop(label): constructs a new table in which the specified columns are omitted
3. tb.sort(label): constructs a new table with rows sorted by the specified column
4. tb.where(label, condition): constructs a new table with just the rows that match the condition

There are a number of properties we can extract from a Table including:

• num_rows: returns the number of rows in a Table
• num_columns: returns the number of columns in a Table

There are also a number of additional methods for Tables including:

• relabel('column_name', 'new_name'): renames the column 'column_name' to 'new_name'. Note that relabel changes the Table in place, while relabeled('column_name', 'new_name') returns a renamed copy
• take(row_numbers): returns a Table with the selected row numbers
In [ ]:
# Load the ice cream data. Each row represents one ice cream cone.
cones

In [ ]:
# select only the chocolate cones using the where method as we did last class

In [ ]:
# print the number of rows and columns

In [ ]:
# relabel a column

In [ ]:
# extract a row


## Columns of Tables are Arrays

We can extract columns from a Table as either:

• A new Table with fewer columns using tb.select()
• An ndarray using tb.column()
In [ ]:
# select() returns a table

In [ ]:


In [ ]:
# column() returns an array

In [ ]:



## Lists

Lists are one of the most widely used data structures in Python. They look like ndarrays, but they can hold heterogeneous types of data.

• We construct lists using square brackets [], where the elements in the list are separated by commas.
• We can access the third item in a list called my_list using my_list[2], because indexing starts at 0.
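
As a small illustration of both points, the sketch below builds a list with mixed types and reads its third item. The values in `my_list` are made up for the example.

```python
# A list can mix types: here a string, an integer, a float, and a boolean.
my_list = ['chocolate', 3, 4.75, True]

# Indexing starts at 0, so index 2 is the third item.
third_item = my_list[2]
print(third_item)

# len() gives the number of items in the list.
print(len(my_list))
```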
In [ ]:



## Constructing Tables

We have created tables by loading data from comma-separated values files (.csv files). We can also create Tables from scratch by using:

• Table(): constructs an empty Table
• tb.with_columns("Name", array): returns a new Table with the added column(s)
• tb.with_row(list): returns a new Table with one added row, given as a list of values in column order

Let's try creating a table that says how many blocks away different streets are from our classroom (now that we are back in person!).

In [ ]:
# create an array of street names

In [ ]:
# create a Table with street names and the distance from our classroom

In [ ]:


In [ ]:
# add another row to the Table

In [ ]:
# add another column to the Table saying whether a street is one-way or two-way

In [ ]:



## Example: Census data

The US government conducts a census every 10 years. We can examine the census data to see interesting patterns in the population of people in the United States.

In [ ]:
# As of August 2021, this census file is online here:
data = 'http://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/nc-est2019-agesex-res.csv'

# A local copy can be accessed here in case census.gov moves the file:
# data = path_data + 'nc-est2019-agesex-res.csv'

full

In [ ]:
# get a reduced set of columns that we want to analyze further

In [ ]:
# rename the columns to make them easier to work with

In [ ]:
# let's examine the data a little more

In [ ]:
# let's remove the totals (value of 999 in the AGE column)

In [ ]:
# let's split the data into male, female and everyone

In [ ]:


In [ ]:
# let's see which ages have the most people

In [ ]:


In [ ]:
# let's create a Table with age, males, and females

In [ ]:


In [ ]:
# let's add a percent female column to our Table
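
The percent female at each age is the female count divided by the total count, times 100. A minimal sketch of that arithmetic on arrays, using made-up counts rather than census values (the names `males` and `females` are assumptions for illustration):

```python
import numpy as np

# Made-up counts for three ages; these are not census values.
males = np.array([10, 20, 30])
females = np.array([10, 30, 90])

# Array arithmetic is element-wise: each entry uses the matching entries.
percent_female = 100 * females / (males + females)
print(percent_female)
```

An array like `percent_female` has one entry per row, so it can be attached to a Table as a new column with `with_columns`.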

In [ ]:



## Line Graphs

A line plot is a useful way to visualize data as a function of an ordered variable such as time or, as in this example, age. We can create one using the tb.plot('x_col_name', 'y_col_name') method.

In [ ]:
# plot Percent Female as a function of Age

In [ ]:


In [ ]:
# plot Males and Females

In [ ]:
# see which ages have had the biggest changes between 2014 and 2019

In [ ]:


In [ ]:
# Let's look at the percent change between 2014 and 2019 for each age
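
Percent change is the change divided by the starting value, times 100. A minimal sketch of the calculation on arrays, with made-up populations rather than census values (the names `pop_2014` and `pop_2019` are assumptions for illustration):

```python
import numpy as np

# Made-up populations for three ages; these are not census values.
pop_2014 = np.array([100.0, 200.0, 400.0])
pop_2019 = np.array([110.0, 190.0, 400.0])

# Absolute change, then change relative to the 2014 starting value.
change = pop_2019 - pop_2014
percent_change = 100 * change / pop_2014
print(percent_change)
```

Dividing by the 2014 value puts ages with very different population sizes on a comparable scale, which the absolute change alone does not.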

In [ ]:
# plot percent change - any ideas why there are larger increases around age 72 and the late 90s?

In [ ]: