%matplotlib inline
import pandas as pd
data = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/dataTrain_carListings.zip')
data.head()
| | Price | Year | Mileage | State | Make | Model |
|---|---|---|---|---|---|---|
| 0 | 21490 | 2014 | 31909 | MD | Nissan | MuranoAWD |
| 1 | 21250 | 2016 | 25741 | KY | Chevrolet | CamaroCoupe |
| 2 | 20925 | 2016 | 24633 | SC | Hyundai | Santa |
| 3 | 14500 | 2012 | 84026 | OK | Jeep | Grand |
| 4 | 32488 | 2013 | 22816 | TN | Jeep | Wrangler |
data.shape
(500000, 6)
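With 500,000 labeled listings there is enough data to hold out a validation slice and score models locally before using Kaggle submissions. A minimal sketch, assuming a random 20% holdout and RMSE as the score; `train_valid_split` and `rmse` are hypothetical helpers, the fraction and seed are arbitrary, and the metric actually used by the competition should be confirmed on its Evaluation page.

```python
import numpy as np
import pandas as pd


def train_valid_split(df, valid_frac=0.2, seed=42):
    """Randomly hold out about `valid_frac` of the rows for local validation."""
    rng = np.random.default_rng(seed)
    is_valid = rng.random(len(df)) < valid_frac
    return df[~is_valid], df[is_valid]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    # Toy frame standing in for `data`; only the Price column is needed here
    toy = pd.DataFrame({"Price": np.arange(1000, dtype=float)})
    train, valid = train_valid_split(toy)
    # Score a constant "predict the training mean" baseline on the held-out rows
    print(rmse(valid["Price"], np.full(len(valid), train["Price"].mean())))
```

On the real data this would be `train, valid = train_valid_split(data)`, fitting on `train` and reporting `rmse(valid['Price'], predictions)`.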
data.Price.describe()
| | Price |
|---|---|
| count | 500000.000000 |
| mean | 21144.186304 |
| std | 10753.259704 |
| min | 5001.000000 |
| 25% | 13499.000000 |
| 50% | 18450.000000 |
| 75% | 26998.000000 |
| max | 79999.000000 |

Name: Price, dtype: float64
data.plot(kind='scatter', y='Price', x='Year')
[Figure: scatter plot of Price (y-axis) against Year (x-axis)]
data.plot(kind='scatter', y='Price', x='Mileage')
[Figure: scatter plot of Price (y-axis) against Mileage (x-axis)]
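The scatter plots are easier to read with a few numbers alongside them. A minimal sketch using pandas only: the hypothetical helper `median_price_by` returns the median Price per Year, or per Mileage bin when bin edges are passed. The median is used rather than the mean because the summary statistics (mean 21144 vs. median 18450) suggest a right-skewed price distribution; the bin edges in the example are arbitrary.

```python
import pandas as pd


def median_price_by(df, column, bins=None):
    """Median Price for each value of `column`, or for each bin if `bins` is given.

    `bins` is passed straight to pd.cut (an int or a list of edges), so a
    continuous column such as Mileage can be grouped into ranges.
    """
    key = pd.cut(df[column], bins=bins) if bins is not None else df[column]
    # observed=True keeps only bins that actually contain rows
    return df.groupby(key, observed=True)["Price"].median()
```

For example, `median_price_by(data, 'Year')` gives one median per model year, and `median_price_by(data, 'Mileage', bins=[0, 25000, 50000, 100000, 500000])` gives one per mileage range.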
data.columns
Index(['Price', 'Year', 'Mileage', 'State', 'Make', 'Model'], dtype='object')
Develop a machine learning model that predicts the price of a car using the features ['Year', 'Mileage', 'State', 'Make', 'Model'] as input.
Submit the predictions for the test set to Kaggle: https://www.kaggle.com/c/miia4200-20191-p1-usedcarpriceprediction
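A reasonable first model, and one much stronger than random guessing, is a lookup of the median price for each (Make, Model) pair, falling back to the overall median for pairs never seen in training. This is a swapped-in baseline for illustration, not the assignment's intended solution; `fit_group_median` and `predict_group_median` are hypothetical names.

```python
import pandas as pd


def fit_group_median(train, keys=("Make", "Model")):
    """Median Price per group, plus the global median as a fallback."""
    keys = list(keys)
    table = (
        train.groupby(keys)["Price"]
        .median()
        .rename("Price_pred")
        .reset_index()
    )
    return table, float(train["Price"].median())


def predict_group_median(table, fallback, df, keys=("Make", "Model")):
    """Look up each row's group median; unseen groups get the global median."""
    keys = list(keys)
    # A left merge keeps the rows of `df` in their original order
    merged = df[keys].merge(table, on=keys, how="left")
    # merge() returns a fresh RangeIndex, so restore the caller's index
    merged.index = df.index
    return merged["Price_pred"].fillna(fallback).rename("Price")
```

Usage would be `table, fallback = fit_group_median(data)` followed by `predict_group_median(table, fallback, data_test)`, which returns a Series indexed like `data_test` and named `Price`. Adding Year and Mileage on top of this lookup is the natural next step.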
data_test = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/dataTest_carListings.zip', index_col=0)
data_test.head()
| ID | Year | Mileage | State | Make | Model |
|---|---|---|---|---|---|
| 0 | 2015 | 23388 | OH | Ford | EscapeFWD |
| 1 | 2014 | 45061 | PA | Ford | EscapeSE |
| 2 | 2007 | 101033 | WI | Toyota | Camry4dr |
| 3 | 2015 | 13590 | HI | Jeep | Wrangler |
| 4 | 2009 | 118916 | CO | Dodge | Charger4dr |
data_test.shape
(250000, 5)
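If State, Make and Model are one-hot encoded (one common choice; it is an assumption here, not something the notebook prescribes), encoding the training and test sets separately can yield different column sets whenever a category appears in only one of them. A minimal sketch with `pd.get_dummies` and `DataFrame.reindex` that forces the test matrix onto the training layout; `one_hot_aligned` is a hypothetical helper.

```python
import pandas as pd

CATEGORICAL = ("State", "Make", "Model")


def one_hot_aligned(train_X, test_X, categorical=CATEGORICAL):
    """One-hot encode train and test, then give test exactly the training columns."""
    cols = list(categorical)
    train_enc = pd.get_dummies(train_X, columns=cols, dtype=float)
    test_enc = pd.get_dummies(test_X, columns=cols, dtype=float)
    # Dummies for categories seen only in the test set are dropped;
    # training dummies absent from the test set are added as all-zero columns.
    test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0.0)
    return train_enc, test_enc
```

With this notebook's frames the call would be `one_hot_aligned(data.drop(columns='Price'), data_test)`. A test row whose Model never occurs in training simply gets zeros in every Model column.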
import numpy as np
np.random.seed(42)
y_pred = pd.DataFrame(np.random.rand(data_test.shape[0]) * 75000 + 5000, index=data_test.index, columns=['Price'])
y_pred.to_csv('test_submission.csv', index_label='ID')
y_pred.head()
| ID | Price |
|---|---|
| 0 | 33090.508914 |
| 1 | 76303.572981 |
| 2 | 59899.545636 |
| 3 | 49899.386315 |
| 4 | 16701.398033 |
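Before uploading, it helps to read the written file back and confirm it has the `ID,Price` header produced by `to_csv(index_label='ID')`, one row per test ID and no missing prices. A minimal sketch; `random_submission` and `check_submission` are hypothetical helpers. `np.random.RandomState(42)` draws the same legacy Mersenne Twister sequence as `np.random.seed(42)` followed by `np.random.rand`, so it reproduces the random baseline above.

```python
import io

import numpy as np
import pandas as pd


def random_submission(index, low=5000, high=80000, seed=42):
    """Uniform random prices in [low, high), one per row of `index`."""
    rng = np.random.RandomState(seed)
    prices = rng.rand(len(index)) * (high - low) + low
    return pd.DataFrame({"Price": prices}, index=index)


def check_submission(csv_text, expected_ids):
    """Parse submission CSV text and validate its layout; returns the parsed frame."""
    sub = pd.read_csv(io.StringIO(csv_text))
    if list(sub.columns) != ["ID", "Price"]:
        raise ValueError(f"unexpected columns: {list(sub.columns)}")
    if sorted(sub["ID"].tolist()) != sorted(expected_ids):
        raise ValueError("IDs do not match the test set")
    if sub["Price"].isna().any():
        raise ValueError("missing Price values")
    return sub
```

For the file written above: `check_submission(open('test_submission.csv').read(), data_test.index.tolist())`. With `seed=42`, the first draw gives 33090.508914, matching ID 0 in the output above.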