#!/usr/bin/env python
# coding: utf-8

# [Home](Home.ipynb)
# 
# # Tabular Python (numpy + pandas)
# 
# Although numpy's n-dimensional arrays may have as many axes as we like, we're typically working with rows and columns, like a spreadsheet.  We call this a two-dimensional array.
# 
# Numpy does the number crunching on these arrays.  Pandas provides a DataFrame, much like a picture frame around a canvas.  You get to label and reorder rows and columns, while mixing columns (called Series type objects) of different types.

# In[4]:


import numpy as np
import pandas as pd
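

# To make the picture-frame idea from the introduction concrete, here is a minimal sketch.  The data and the names (frame_demo, reordered, by_count) are invented for illustration only.  It shows row and column labels, reordering by label or by value, and columns that each keep their own type.

# In[ ]:


import pandas as pd

# Hypothetical data, made up for this sketch
frame_demo = pd.DataFrame(
    {"name": ["a", "b", "c"], "count": [3, 1, 2], "ratio": [0.5, 0.25, 1.0]},
    index=["row1", "row2", "row3"],  # row labels
)

reordered = frame_demo[["ratio", "name", "count"]]  # reorder columns by label
by_count = frame_demo.sort_values("count")          # reorder rows by a column's values

frame_demo.dtypes  # each column (a Series) reports its own dtype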


# In[20]:


"numpy ver: {}; pandas ver: {}".format(np.__version__, pd.__version__)


# Python's native range type is all fine and good when you want consecutive integers. Thanks to list comprehension syntax, defining floating point inputs at some constant interval is likewise doable.

# In[3]:


[i/10 for i in range(1, 11)] # range has no native floating point ability


# However, numpy introduces two new domain-makers that make life easier yet, in that both return np.ndarray type objects, equipped with all manner of methods.
# 
# ```arange``` takes a start and an up-to-but-not-including stop, followed by the discrete interval between adjacent values.  With a floating point interval, rounding can occasionally shift the number of values by one.

# In[4]:


np.arange(0.1, 1.1, 0.1) # start, up to, interval


# ```linspace``` is similar, but the third argument specifies how many evenly spaced points to generate, counting both the start and stop values.  The stop value is inclusive in this case.

# In[5]:


np.linspace(0.1, 1.0, 10) # start, stop, how many
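

# A quick side-by-side of the two domain-makers above, as a hedged sketch.  The variable names are made up for this cell.  Since ```arange``` with a floating point interval is subject to rounding, two alternatives are shown: ```linspace``` with an explicit point count (```retstep=True``` also hands back the spacing), and an integer ```arange``` scaled afterwards.

# In[ ]:


import numpy as np

arange_pts = np.arange(0.1, 1.1, 0.1)                     # length depends on float rounding
lin_pts, step = np.linspace(0.1, 1.0, 10, retstep=True)   # exactly 10 points, stop included
scaled_pts = np.arange(1, 11) / 10                        # integer range first, then scale

np.allclose(lin_pts, scaled_pts), step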


# ### Parabolas
# 
# The cells below may be used as a worksheet.  Try different polynomials with varied domains, tabulating and plotting the results.

# In[21]:


x = np.arange(-3, 3.1, 0.1)


# In[22]:


y = x**2 - 1


# A DataFrame may be initialized in several ways, the most straightforward being as shown below, with a dict.  The dict keys are column headers, whereas the values should be numpy arrays with the corresponding column data.

# In[23]:


points = pd.DataFrame({"X":x, "Y":y})
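

# As a hedged illustration of "several ways", the sketch below builds the same small table three times: from a dict of arrays, from a two-dimensional array plus column names, and from a list of row dicts.  The names (xs, ys, from_dict, from_2d, from_rows) are placeholders for this cell, not part of the worksheet.

# In[ ]:


import numpy as np
import pandas as pd

xs = np.linspace(-1, 1, 5)
ys = xs**2 - 1

from_dict = pd.DataFrame({"X": xs, "Y": ys})                            # dict of columns
from_2d = pd.DataFrame(np.column_stack([xs, ys]), columns=["X", "Y"])   # 2D array + headers
from_rows = pd.DataFrame([{"X": a, "Y": b} for a, b in zip(xs, ys)])    # one dict per row

from_dict.equals(from_2d)  # compare two of the constructions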


# Notice how pretty the formatting is, with the middle rows left out.

# In[24]:


points


# In[27]:


points.tail(10)


# Once we have a DataFrame, we may invoke its plot method directly, passing the column names for each axis and a title as named arguments.

# In[11]:


points.plot(x="X", y="Y", title="Parabola");  # column names are case-sensitive
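

# One possible worksheet variation, following the tabulate-then-plot pattern above.  The cubic and the names (coeffs, domain, cubic) are arbitrary choices for illustration.  ```np.polyval``` takes coefficients from the highest power down, which makes it easy to swap in other polynomials.

# In[ ]:


import numpy as np
import pandas as pd

coeffs = [1, 0, -4, 0]             # x**3 - 4*x, highest power first
domain = np.linspace(-3, 3, 61)    # 61 points from -3 to 3, both ends included
cubic = pd.DataFrame({"X": domain, "Y": np.polyval(coeffs, domain)})

# To plot, uncomment: cubic.plot(x="X", y="Y", title="Cubic");
cubic.head()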


# # pandas & SQL
# 
# Where else have we seen tabular data?  In spreadsheets, certainly.  And in SQL databases, where we have CRUD powers (Create, Retrieve, Update, Delete). We want all these powers, and many more, over our DataFrames as well.
# 
# Remember [our context manager studies](DatabaseFun.ipynb)?  That's where [context1](context1.py) gets more context.

# In[2]:


from context1 import DB


# In[7]:


with DB("roller_coasters.db") as db:
    # read_sql_query already returns a DataFrame, so no extra wrapping is needed
    coasters = pd.read_sql_query("SELECT Name, Park, State FROM Coasters WHERE speed >= 80 ORDER BY State", db.conn)
    
coasters.head(20)
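

# For readers without roller_coasters.db or context1.py at hand, here is a self-contained sketch of the same query pattern.  Everything in it is an assumption for illustration: the in-memory SQLite database, the table layout, and the three placeholder rows are invented, not taken from the real file.  It also passes the speed threshold through ```params``` rather than writing it into the SQL string.

# In[ ]:


import sqlite3
import pandas as pd

# Stand-in database with made-up rows (not the real Coasters data)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE Coasters (Name TEXT, Park TEXT, State TEXT, Speed REAL)")
demo_conn.executemany(
    "INSERT INTO Coasters VALUES (?, ?, ?, ?)",
    [("Demo A", "Park 1", "OH", 93.0),
     ("Demo B", "Park 2", "CA", 85.0),
     ("Demo C", "Park 3", "TX", 60.0)],
)

# The ? placeholder is filled in from params by the sqlite3 driver
fast_demo = pd.read_sql_query(
    "SELECT Name, Park, State FROM Coasters WHERE Speed >= ? ORDER BY State",
    demo_conn,
    params=(80,),
)
demo_conn.close()

fast_demo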


# In[17]:


with DB("airports.db") as db:
    # read_sql_query already returns a DataFrame, so no extra wrapping is needed
    airports = pd.read_sql_query("SELECT * FROM Airports", db.conn)
    
airports.head()
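

# The CRUD powers mentioned earlier carry over to DataFrames.  The sketch below pairs each SQL verb with one common pandas idiom; other idioms exist, and the tiny table and its names (crud_demo, fast_rows) are invented for illustration.

# In[ ]:


import pandas as pd

# Create: build a small frame from made-up rows
crud_demo = pd.DataFrame({"Name": ["A", "B", "C"], "Speed": [93, 85, 60]})

# Retrieve: a boolean mask plays the role of a WHERE clause
fast_rows = crud_demo[crud_demo["Speed"] >= 80]

# Update: assign through .loc, like UPDATE ... SET ... WHERE
crud_demo.loc[crud_demo["Name"] == "C", "Speed"] = 82

# Delete: keep only the rows that do not match, like DELETE ... WHERE
crud_demo = crud_demo[crud_demo["Name"] != "A"].reset_index(drop=True)

crud_demo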


# For further reading:
# 
# * [Help with DELETE on Quora](https://qr.ae/pGNaYy)