HW 1: Solutions

Total: 25 pts

Start date: Tuesday Sept. 3
Due date: Tuesday Sept. 10

If you don't already have Anaconda installed, start by downloading and installing it. When working on the exercises below, keep in mind that extensive Python documentation is available online. Don't hesitate to check the documentation and examples for the functions you want to use.

1. (4pts) Numerical Linear Algebra: Numpy

  • Start by building a 10 by 10 matrix of random Gaussian entries. Then compute the two largest eigenvalues of the matrix.
  • Reshape the matrix built above first into a 2 by 50 array (call it $v$), then into a single vector (call it $w$). Return the vector obtained by sorting the elements of $w$ in descending order.
  • Generate two random vectors (you can choose the distribution used to generate the entries). Let us call these vectors $v1$ and $v2$. Stack them vertically, then horizontally, and store the respective results in two matrices $A$ and $B$.
  • Do the same with two random arrays $C_1 \in \mathbb{R}^{n\times n}$ and $C_2 \in \mathbb{R}^{n\times n}$. Store the results in the variables $Cv$ and $Ch$.
In [ ]:
import numpy as np
from numpy import linalg as LA

M = np.random.normal(0, 1, (10, 10))
w, v = LA.eig(M)


# Since the exercise does not specify how to order complex eigenvalues, you can return
# the eigenvalues with the largest real part, the largest imaginary part, let numpy choose
# by applying sort to the vector of complex eigenvalues, or, as I do here, return the
# eigenvalues with the largest modulus.

indices = np.argsort(np.absolute(w))
indices = indices[::-1]
w = w[indices]


print("first eigenvalue:", w[0])
print("second eigenvalue:", w[1])

M_reshape_v = np.reshape(M, (2, 50))
M_reshape_w = np.reshape(M, 100)  # a single (1-D) vector of length 100
M_reshape_w_sorted = np.sort(M_reshape_w)[::-1]  # elements in descending order


# For variety, I generate random integers between 0 and 9 (the upper bound 10 is excluded)
v1 = np.random.randint(10, size=10)
v2 = np.random.randint(10, size=10)

A = np.vstack((v1, v2))  # vertical stack, shape (2, 10)
B = np.hstack((v1, v2))  # horizontal stack, shape (20,)

C1 = np.random.randint(10, size=(10, 10))
C2 = np.random.randint(10, size=(10, 10))

Cv = np.vstack((C1, C2))  # shape (20, 10)
Ch = np.hstack((C1, C2))  # shape (10, 20)
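A compact sketch of the first exercise, assuming NumPy's newer np.random.default_rng generator (not used in the solution above) and a helper named largest_eigenvalues that is our own, hypothetical name. Because a general real matrix can have complex eigenvalues, "largest" needs a criterion; the helper uses the modulus, as the solution does. The symmetric variant is not part of the exercise: for a symmetric matrix, np.linalg.eigvalsh returns real eigenvalues in ascending order, so reversing the array puts the largest first. The last lines show the shapes produced by stacking two 1-D vectors.

```python
import numpy as np

def largest_eigenvalues(M, k=2):
    """Return the k eigenvalues of M with the largest modulus, largest first."""
    eigvals = np.linalg.eigvals(M)
    order = np.argsort(np.abs(eigvals))[::-1]
    return eigvals[order[:k]]

rng = np.random.default_rng(0)
M = rng.normal(0.0, 1.0, size=(10, 10))
top2 = largest_eigenvalues(M)
print("two largest eigenvalues (by modulus):", top2)

# Variant (not asked in the exercise): a symmetric matrix has real eigenvalues,
# and eigvalsh returns them in ascending order
S = (M + M.T) / 2
top2_sym = np.linalg.eigvalsh(S)[::-1][:2]

# Stacking two 1-D vectors of length 10
v1 = rng.integers(0, 10, size=10)
v2 = rng.integers(0, 10, size=10)
A = np.vstack((v1, v2))  # one vector per row: shape (2, 10)
B = np.hstack((v1, v2))  # concatenation: shape (20,)
```

With 1-D inputs, hstack concatenates rather than producing a 2-row matrix; np.column_stack would instead place the vectors side by side as columns.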

2. (2pts) Towards multiclass classification: one-hot encoding

  • Generate a vector (let us call it $v$) of integers taking values between 0 and 9.
  • Then build the one-hot encoding of each entry in $v$. A one-hot encoding represents each categorical value (here, the digits 0 to 9 in $v$) by a binary vector in which only one entry, the one corresponding to the encoded digit, is nonzero.
In [ ]:
# you were free to choose the size you wanted

vv = np.random.randint(10, size=(10,))  # integers from 0 to 9 (upper bound excluded)

# There are many valid approaches. One of them is to build a matrix of zeros (one row per
# entry of vv, one column per digit) and then put a one at the column index given by
# the corresponding number in vv
print(vv)
oneHotEncoding_vv = np.zeros((vv.size, 10))

oneHotEncoding_vv[np.arange(vv.size), vv] = 1
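An alternative sketch, assuming the same 10 classes: indexing the rows of the identity matrix np.eye(10) with the label vector selects one one-hot row per label. The helper name one_hot is hypothetical. Taking argmax along each row recovers the original labels, which gives a quick check of the encoding.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Map integer labels in [0, num_classes) to rows of the identity matrix."""
    labels = np.asarray(labels)
    return np.eye(num_classes, dtype=int)[labels]

vv = np.array([3, 0, 9, 3])
encoded = one_hot(vv)            # shape (4, 10), a single 1 per row
decoded = encoded.argmax(axis=1) # position of the 1 in each row
print(encoded)
print(decoded)
```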

3. (6pts) Towards regression: sampling and matplotlib

3a. (2pts) One-dimensional case. In this exercise, we successively generate points according to a function, sample pairs $(t, f)$ from it, and plot the results.

  • Using the 'linspace' function from numpy, generate $1000$ pairs $(t, f(t) = \frac{1}{1+e^{-t}})$ for values of $t$ between $-6$ and $6$. What does the function look like?
  • Generate 100 random pairs $(t_i, f_i)$ from the curve. Then plot the points $(t_i, f_i)$ on top of the line $(t, f(t))$ using matplotlib (you can choose how you randomly generate the points).
In [28]:
# first generate the pairs

t = np.linspace(-6.0, 6.0, num=1000)
f = np.reciprocal(1+np.exp(-t))

# Then plot the function using pyplot

import matplotlib.pyplot as plt
plt.plot(t, f)
plt.show()

# We sample 100 values of t from the grid without replacement
# (np.random.choice samples with replacement unless replace=False)

sampleData = np.random.choice(t, 100, replace=False)
f_sampleData = np.reciprocal(1+np.exp(-sampleData))

# Then we use scatter to plot the samples on top of the curve

plt.scatter(sampleData, f_sampleData, c='red')
plt.plot(t, f)
plt.show()
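On the question "What does the function look like?": f is the logistic sigmoid, an S-shaped curve that increases from about 0.0025 at t = -6 to about 0.9975 at t = 6 and equals 1/2 at t = 0. Since f(-t) = 1/(1 + e^t) = 1 - f(t), the curve is point-symmetric about (0, 1/2). The short check below verifies these properties numerically on the same grid; the helper name sigmoid is our own.

```python
import numpy as np

def sigmoid(t):
    """Logistic function f(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

t = np.linspace(-6.0, 6.0, num=1000)
f = sigmoid(t)

# f is strictly increasing on the grid
increasing = bool(np.all(np.diff(f) > 0))
# f(-t) = 1 - f(t): the curve is point-symmetric about (0, 1/2)
symmetric = bool(np.allclose(sigmoid(-t), 1.0 - f))
print(f.min(), f.max(), sigmoid(0.0), increasing, symmetric)
```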

3b. (4pts) The two-dimensional hyperplane

  • As an extension of the previous case, we now want to generate triples $(x, y, t)$ lying on the following plane:
$$t \equiv\pi(x, y) = x + y +1$$

Using Axes3D, matplotlib and pyplot, as well as the meshgrid() and arange() functions from numpy and the plot_surface() and scatter() methods of the 3D axes,

  • Generate a regular grid of points $(x, y)$ covering the domain $[-20,20]\times [-20,20]$, for instance 200 by 200 points.
  • As in the 1D case, we now want to generate noisy samples that lie on the plane on average. Start by generating $50\times 50$ triples $(x,y,\pi(x,y))$ covering the domain $[-20,20]\times [-20,20]$.
  • Perturb the $50\times 50$ triples by adding Gaussian noise of amplitude (standard deviation) no larger than $0.1$.
  • Finally using the scatter( ) function from pyplot, plot the noisy samples on top of the plane.
In [40]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib versions

x = np.linspace(-20, 20, 200)
y = np.linspace(-20, 20, 200)
xv, yv = np.meshgrid(x, y)

# the plane itself, without noise
tv = xv + yv + 1

fig = plt.figure()
# fig.gca() no longer accepts a projection argument in recent matplotlib versions
ax = fig.add_subplot(projection='3d')

# plot the plane
ax.plot_surface(xv, yv, tv, alpha=0.2)

# Although the exercise asks for 50 x 50 samples, I downsample to 20 x 20 points to get a clearer picture

x = np.linspace(-20, 20, 20)
y = np.linspace(-20, 20, 20)
xv2, yv2 = np.meshgrid(x, y)


# I also increased the noise standard deviation to 1 to emphasize the noise
tv2 = xv2 + yv2 + 1 + np.random.normal(0, 1, (20, 20))


ax.scatter(xv2.flatten(), yv2.flatten(), tv2.flatten(), color='green')

plt.show()
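The cell above deliberately departs from the exercise (20 x 20 samples, noise standard deviation 1). A sketch that follows the statement literally, with 50 x 50 samples and standard deviation 0.1, is given below; it assumes np.random.default_rng instead of np.random.normal and omits plotting. As a preview of the regression theme, it then fits t ~ a*x + b*y + c by least squares with np.linalg.lstsq, which should recover coefficients close to (1, 1, 1).

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 x 50 grid covering [-20, 20] x [-20, 20]
x = np.linspace(-20, 20, 50)
y = np.linspace(-20, 20, 50)
xs, ys = np.meshgrid(x, y)

# Points on the plane t = x + y + 1, plus Gaussian noise with standard deviation 0.1
ts_noisy = xs + ys + 1 + rng.normal(0.0, 0.1, size=xs.shape)

# Least-squares fit of t ~ a*x + b*y + c on the flattened samples
design = np.column_stack((xs.ravel(), ys.ravel(), np.ones(xs.size)))
coeffs, *_ = np.linalg.lstsq(design, ts_noisy.ravel(), rcond=None)
print("estimated (a, b, c):", coeffs)
```

The noisy triples can be drawn with ax.scatter(xs.ravel(), ys.ravel(), ts_noisy.ravel()) on the same 3D axes as the plane.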

4. (3pts) Getting started with Pandas and Kaggle datasets

4a. Download the car dataset from Kaggle and open it with pandas.

  • Display a couple (5-10) of rows from the pandas data frame.
  • Find the brand that has the highest average price across its cars.
  • Sort the cars according to their horsepower and return the corresponding pandas data frame. Display the first 10 lines of the frame.
In [73]:
# There are several ways to answer the question. I give one.

import numpy as np
import pandas as pd
data = pd.read_csv("Automobile_data 3.csv")

# printing the first 10 rows
print(data.head(10))

# Missing values are encoded as '?'. I remove every row that contains one,
# which in particular drops the rows that do not have a price
data = data[(data.astype(str) != '?').all(axis=1)]


# the brands
Brand_names = data.make.unique()

# average price of each brand
averages = np.zeros((Brand_names.size,))
for k, brand in enumerate(Brand_names):
    prices = data.loc[data['make'] == brand, 'price'].astype(int)
    averages[k] = np.mean(prices)

averageData = pd.DataFrame(data=np.array([averages]), index=["averages"], columns=Brand_names)

# brand with the highest average price
print("brand with the highest average price:", averageData.loc["averages"].idxmax())

averageData

# The last question (sorting by horsepower) takes one line; see the next cell
Out[73]:
audi bmw chevrolet dodge honda jaguar mazda mercedes-benz mitsubishi nissan peugot plymouth porsche saab subaru toyota volkswagen volvo
averages 18246.25 18857.5 6007.0 7790.125 8184.692308 32250.0 9080.0 29726.4 7813.0 10415.666667 15758.571429 7163.333333 22018.0 15223.333333 8541.25 9696.645161 8738.125 18063.181818
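The loop above can also be written with groupby. A minimal sketch, assuming a few hypothetical rows in the same format as the Kaggle file (the real file is not reproduced here): passing na_values="?" to pd.read_csv turns the '?' entries into NaN, so the price column is parsed as numbers directly, and dropna() removes incomplete rows, mirroring the filtering above. idxmax then returns the brand with the highest average.

```python
import io

import pandas as pd

# Hypothetical rows in the same format as the Kaggle file ('?' marks a missing value)
csv_text = """make,horsepower,price
audi,102,13950
audi,115,17450
bmw,101,16430
bmw,?,?
volvo,114,12940
"""

data = pd.read_csv(io.StringIO(csv_text), na_values="?")
data = data.dropna()  # drop rows with any missing entry

averages = data.groupby("make")["price"].mean()  # average price per brand
best_brand = averages.idxmax()                   # brand with the highest average
print(averages)
print("brand with the highest average price:", best_brand)
```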
In [37]:
# For the last point, you need to turn the horsepower strings into numbers to get the
# proper (numeric) sort. Once you are done with that part, the answer is literally one line

import pandas as pd
import numpy as np
data = pd.read_csv("Automobile_data 3.csv")
# printing the first 10 rows
print(data.head(10))
# remove the rows that contain a missing value ('?')
data = data[(data.astype(str) != '?').all(axis=1)]

# sort the rows by increasing horsepower (np.float was removed from recent numpy; use float)
data.iloc[np.argsort(data['horsepower'].values.astype(float))]
Out[37]:
symboling normalized-losses make fuel-type aspiration num-of-doors body-style drive-wheels engine-location wheel-base ... engine-size fuel-system bore stroke compression-ratio horsepower peak-rpm city-mpg highway-mpg price
18 2 121 chevrolet gas std two hatchback fwd front 88.4 ... 61 2bbl 2.91 3.03 9.5 48 5100 47 53 5151
184 2 94 volkswagen diesel std four sedan fwd front 97.3 ... 97 idi 3.01 3.4 23.0 52 4800 37 46 7995
182 2 122 volkswagen diesel std two sedan fwd front 97.3 ... 97 idi 3.01 3.4 23.0 52 4800 37 46 7775
90 1 128 nissan diesel std two sedan fwd front 94.5 ... 103 idi 2.99 3.47 21.9 55 4800 45 50 7099
159 0 91 toyota diesel std four hatchback fwd front 95.7 ... 110 idi 3.27 3.35 22.5 56 4500 38 47 7788
158 0 91 toyota diesel std four sedan fwd front 95.7 ... 110 idi 3.27 3.35 22.5 56 4500 34 36 7898
30 2 137 honda gas std two hatchback fwd front 86.6 ... 92 1bbl 2.91 3.41 9.6 58 4800 49 54 6479
32 1 101 honda gas std two hatchback fwd front 93.7 ... 79 1bbl 2.91 3.07 10.1 60 5500 38 42 5399
153 0 77 toyota gas std four wagon fwd front 95.7 ... 92 2bbl 3.05 3.03 9.0 62 4800 31 37 6918
151 1 87 toyota gas std two hatchback fwd front 95.7 ... 92 2bbl 3.05 3.03 9.0 62 4800 31 38 6338
155 0 91 toyota gas std four wagon 4wd front 95.7 ... 92 2bbl 3.05 3.03 9.0 62 4800 27 32 8778
154 0 81 toyota gas std four wagon 4wd front 95.7 ... 92 2bbl 3.05 3.03 9.0 62 4800 27 32 7898
150 1 87 toyota gas std two hatchback fwd front 95.7 ... 92 2bbl 3.05 3.03 9.0 62 4800 35 39 5348
152 1 74 toyota gas std four hatchback fwd front 95.7 ... 92 2bbl 3.05 3.03 9.0 62 4800 31 38 6488
118 1 119 plymouth gas std two hatchback fwd front 93.7 ... 90 2bbl 2.97 3.23 9.4 68 5500 37 41 5572
52 1 104 mazda gas std two hatchback fwd front 93.1 ... 91 2bbl 3.03 3.15 9.0 68 5000 31 38 6795
51 1 104 mazda gas std two hatchback fwd front 93.1 ... 91 2bbl 3.03 3.15 9.0 68 5000 31 38 6095
120 1 154 plymouth gas std four hatchback fwd front 93.7 ... 90 2bbl 2.97 3.23 9.4 68 5500 31 38 6229
53 1 113 mazda gas std four sedan fwd front 93.1 ... 91 2bbl 3.03 3.15 9.0 68 5000 31 38 6695
54 1 113 mazda gas std four sedan fwd front 93.1 ... 91 2bbl 3.08 3.15 9.0 68 5000 31 38 7395
76 2 161 mitsubishi gas std two hatchback fwd front 93.7 ... 92 2bbl 2.97 3.23 9.4 68 5500 37 41 5389
78 2 161 mitsubishi gas std two hatchback fwd front 93.7 ... 92 2bbl 2.97 3.23 9.4 68 5500 31 38 6669
50 1 104 mazda gas std two hatchback fwd front 93.1 ... 91 2bbl 3.03 3.15 9.0 68 5000 30 31 5195
121 1 154 plymouth gas std four sedan fwd front 93.7 ... 90 2bbl 2.97 3.23 9.4 68 5500 31 38 6692
77 2 161 mitsubishi gas std two hatchback fwd front 93.7 ... 92 2bbl 2.97 3.23 9.4 68 5500 31 38 6189
187 2 94 volkswagen diesel turbo four sedan fwd front 97.3 ... 97 idi 3.01 3.4 23.0 68 4500 37 42 9495
122 1 154 plymouth gas std four sedan fwd front 93.7 ... 98 2bbl 2.97 3.23 9.4 68 5500 31 38 7609
26 1 148 dodge gas std four sedan fwd front 93.7 ... 90 2bbl 2.97 3.23 9.4 68 5500 31 38 7609
25 1 148 dodge gas std four sedan fwd front 93.7 ... 90 2bbl 2.97 3.23 9.4 68 5500 31 38 6692
24 1 148 dodge gas std four hatchback fwd front 93.7 ... 90 2bbl 2.97 3.23 9.4 68 5500 31 38 6229
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
88 -1 137 mitsubishi gas std four sedan fwd front 96.3 ... 110 spdi 3.17 3.46 7.5 116 5500 23 30 9279
167 2 134 toyota gas std two hardtop rwd front 98.4 ... 146 mpfi 3.62 3.5 9.3 116 4800 24 30 8449
65 0 118 mazda gas std four sedan rwd front 104.9 ... 140 mpfi 3.76 3.16 8.0 120 5000 19 27 18280
12 0 188 bmw gas std two sedan rwd front 101.2 ... 164 mpfi 3.31 3.19 9.0 121 4250 21 28 20970
13 0 188 bmw gas std four sedan rwd front 101.2 ... 164 mpfi 3.31 3.19 9.0 121 4250 21 28 21105
70 -1 93 mercedes-benz diesel turbo four sedan rwd front 115.6 ... 183 idi 3.58 3.64 21.5 123 4350 22 25 31600
69 0 93 mercedes-benz diesel turbo two hardtop rwd front 106.7 ... 183 idi 3.58 3.64 21.5 123 4350 22 25 28176
68 -1 93 mercedes-benz diesel turbo four wagon rwd front 110.0 ... 183 idi 3.58 3.64 21.5 123 4350 22 25 28248
67 -1 93 mercedes-benz diesel turbo four sedan rwd front 110.0 ... 183 idi 3.58 3.64 21.5 123 4350 22 25 25552
202 -1 95 volvo gas std four sedan rwd front 109.1 ... 173 mpfi 3.58 2.87 8.8 134 5500 18 23 21485
8 1 158 audi gas turbo four sedan fwd front 105.8 ... 131 mpfi 3.13 3.4 8.3 140 5500 17 20 23875
117 0 161 peugot gas turbo four sedan rwd front 108.0 ... 134 mpfi 3.61 3.21 7.0 142 5600 18 24 18150
125 3 186 porsche gas std two hatchback rwd front 94.5 ... 151 mpfi 3.94 3.11 9.5 143 5500 19 27 22018
29 3 145 dodge gas turbo two hatchback fwd front 95.9 ... 156 mfi 3.6 3.9 7.0 145 5000 19 24 12964
102 0 108 nissan gas std four wagon fwd front 100.4 ... 181 mpfi 3.43 3.27 9.0 152 5200 17 22 14399
103 0 108 nissan gas std four sedan fwd front 100.4 ... 181 mpfi 3.43 3.27 9.0 152 5200 19 25 13499
101 0 128 nissan gas std four sedan fwd front 100.4 ... 181 mpfi 3.43 3.27 9.0 152 5200 17 22 13499
72 3 142 mercedes-benz gas std two convertible rwd front 96.6 ... 234 mpfi 3.46 3.1 8.3 155 4750 16 18 35056
180 -1 90 toyota gas std four sedan rwd front 104.5 ... 171 mpfi 3.27 3.35 9.2 156 5200 20 24 15690
104 3 194 nissan gas std two hatchback rwd front 91.3 ... 181 mpfi 3.43 3.27 9.0 160 5200 19 25 17199
106 1 231 nissan gas std two hatchback rwd front 99.2 ... 181 mpfi 3.43 3.27 9.0 160 5200 19 25 18399
201 -1 95 volvo gas turbo four sedan rwd front 109.1 ... 141 mpfi 3.78 3.15 8.7 160 5300 19 25 19045
137 2 104 saab gas turbo four sedan fwd front 99.1 ... 121 mpfi 3.54 3.07 9.0 160 5500 19 26 18620
136 3 150 saab gas turbo two hatchback fwd front 99.1 ... 121 mpfi 3.54 3.07 9.0 160 5500 19 26 18150
179 3 197 toyota gas std two hatchback rwd front 102.9 ... 171 mpfi 3.27 3.35 9.3 161 5200 19 24 15998
178 3 197 toyota gas std two hatchback rwd front 102.9 ... 171 mpfi 3.27 3.35 9.3 161 5200 20 24 16558
198 -2 103 volvo gas turbo four sedan rwd front 104.3 ... 130 mpfi 3.62 3.15 7.5 162 5100 17 22 18420
199 -1 74 volvo gas turbo four wagon rwd front 104.3 ... 130 mpfi 3.62 3.15 7.5 162 5100 17 22 18950
47 0 145 jaguar gas std four sedan rwd front 113.0 ... 258 mpfi 3.63 4.17 8.1 176 4750 15 19 32250
105 3 194 nissan gas turbo two hatchback rwd front 91.3 ... 181 mpfi 3.43 3.27 7.8 200 5200 17 23 19699

159 rows × 26 columns
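Why the conversion to numbers matters, and a pandas-only version of the sort. The sketch uses a few hypothetical rows rather than the real file. Strings compare character by character, so "100" sorts before "48". With na_values="?" the horsepower column is parsed as numeric, and sort_values with kind="mergesort" (a stable algorithm, so tied rows keep their original order) followed by head(10) returns the first 10 rows asked for in the exercise.

```python
import io

import pandas as pd

# Strings compare character by character, so "100" sorts before "48"
hp_strings = pd.Series(["48", "100", "52"])
lexicographic = hp_strings.sort_values().tolist()
numeric = pd.to_numeric(hp_strings).sort_values().tolist()

# Hypothetical rows in the Kaggle file's format ('?' marks a missing value)
csv_text = """make,horsepower,price
toyota,62,6918
nissan,200,19699
chevrolet,48,5151
honda,?,?
mazda,68,6795
"""
data = pd.read_csv(io.StringIO(csv_text), na_values="?").dropna()

# A stable sort keeps tied rows in their original order
by_power = data.sort_values("horsepower", kind="mergesort")
first_rows = by_power.head(10)
print(first_rows)
```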