In [1]:

from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

Arrays

An array is a data structure that holds a sequence of values of the same type: for example, a sequence of numbers or a sequence of strings.

We can use the make_array function from the datascience package to create arrays. These arrays are ndarray objects, which are implemented by the NumPy package, so a wide range of operations can be performed on them very efficiently.

In [ ]:

Ranges

Range functions allow one to create arrays of ordered sequences of numbers. We can use the np.arange() function to create NumPy ndarrays.
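A short sketch of the three common ways to call np.arange follows. The variable names are illustrative. Note that the stop value is never included in the result.

```python
import numpy as np

# One argument: start at 0 and stop before 5
first_five = np.arange(5)
print(first_five)

# Two arguments: start at 2 and stop before 7
two_to_six = np.arange(2, 7)
print(two_to_six)

# Three arguments: start, stop (excluded), and step size
evens = np.arange(0, 10, 2)
print(evens)
```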

In [ ]:

Tables

Tables store structured data. We can use the datascience package to create Table objects and perform data manipulation operations on them (the Table object is a simplified version of a pandas DataFrame).

Some methods we can perform on Table objects are:

tb.show(k): show the first k rows of the table
tb.select('col1', 'col2'): select col1 and col2 from the table
tb.drop('col'): remove col from the table
tb.sort('col'): sort the rows in the table based on the values in col
tb.where('col', value): reduce the table to rows where col is equal to value

Except for tb.show(), which only displays rows, each of these methods returns a new Table object reflecting the operation; the original table is not modified.

Let's look at data on ice cream cones that is described in the class textbook.

In [2]:

# Load the ice cream data. Each row represents one ice cream cone.
cones = Table.read_table('cones.csv')
cones

Out[2]:

Flavor	Color	Price
strawberry	pink	3.55
chocolate	light brown	4.75
chocolate	dark brown	5.25
strawberry	pink	5.25
chocolate	dark brown	5.25
bubblegum	pink	4.75

In [ ]:

In [3]:

# Show the first 2 rows of the data

In [4]:

# select only the Flavor column

In [5]:

# the original cones Table is not modified

In [6]:

# select the Flavor and Price columns

In [7]:

# remove the Color column

In [8]:

# sort by price

In [9]:

# sort by price highest to lowest

In [10]:

# select only the chocolate cones

In [11]:

# We can combine multiple method calls. Let's drop the color and then sort by price

Example: NBA Salaries

Let's look at basketball (NBA) salaries from the 2015-2016 season. The data is originally from https://www.statcrunch.com/app/index.php?dataid=1843341

In [12]:

# NBA players, 2015-2016 season
nba = Table.read_table('nba_salaries.csv').relabeled(3, 'SALARY')

nba

Out[12]:

PLAYER	POSITION	TEAM	SALARY
Paul Millsap	PF	Atlanta Hawks	18.6717
Al Horford	C	Atlanta Hawks	12
Tiago Splitter	C	Atlanta Hawks	9.75625
Jeff Teague	PG	Atlanta Hawks	8
Kyle Korver	SG	Atlanta Hawks	5.74648
Thabo Sefolosha	SF	Atlanta Hawks	4
Mike Scott	PF	Atlanta Hawks	3.33333
Kent Bazemore	SF	Atlanta Hawks	2
Dennis Schroder	PG	Atlanta Hawks	1.7634
Tim Hardaway Jr.	SG	Atlanta Hawks	1.30452

... (407 rows omitted)

In [13]:

# Let's get Stephen Curry's data

In [14]:

# Let's get data from the New York Knicks

Columns of Tables are Arrays

We can extract columns from a Table as either:

A new Table with fewer columns using tb.select()
An ndarray using tb.column()

In [15]:

# extract a column from a Table as a Table

In [ ]:

In [16]:

# extract a column from a Table as an ndarray

In [ ]:

Creating a Table from Scratch

We can also create tables from scratch by calling Table() to make an empty table and then adding columns with the tb.with_column("col_name", ndarray) method.

In [ ]: