%matplotlib inline
import pandas as pd
data = pd.read_csv('../datasets/dataTrain_carListings.zip')
data.head()
| | Price | Year | Mileage | State | Make | Model |
---|---|---|---|---|---|---|
0 | 21490 | 2014 | 31909 | MD | Nissan | MuranoAWD |
1 | 21250 | 2016 | 25741 | KY | Chevrolet | CamaroCoupe |
2 | 20925 | 2016 | 24633 | SC | Hyundai | Santa |
3 | 14500 | 2012 | 84026 | OK | Jeep | Grand |
4 | 32488 | 2013 | 22816 | TN | Jeep | Wrangler |
data.shape
(500000, 6)
data.Price.describe()
count    500000.000000
mean      21144.186304
std       10753.259704
min        5001.000000
25%       13499.000000
50%       18450.000000
75%       26998.000000
max       79999.000000
Name: Price, dtype: float64
data.plot(kind='scatter', y='Price', x='Year')
<matplotlib.axes._subplots.AxesSubplot at 0x1a3b24a5ef0>
data.plot(kind='scatter', y='Price', x='Mileage')
<matplotlib.axes._subplots.AxesSubplot at 0x1a3b2d3cd68>
data.columns
Index(['Price', 'Year', 'Mileage', 'State', 'Make', 'Model'], dtype='object')
Develop a machine learning model that predicts the price of a car, using ['Year', 'Mileage', 'State', 'Make', 'Model'] as input features.
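One possible starting point is sketched below. It is not a prescribed solution: the choice of scikit-learn, one-hot encoding for the three text columns, and a random forest regressor are all assumptions made for illustration, and the names `build_price_model`, `FEATURES`, `CATEGORICAL`, and `sample` are hypothetical. To keep the cell runnable without the dataset file, it fits on only the five rows shown by `data.head()` above; in practice the same pipeline would be fitted on `data[FEATURES]` and `data['Price']`, ideally with a held-out validation split.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Input columns named in the task statement; the last three hold text values.
FEATURES = ['Year', 'Mileage', 'State', 'Make', 'Model']
CATEGORICAL = ['State', 'Make', 'Model']


def build_price_model(random_state=0):
    """Return an unfitted pipeline: one-hot encoding followed by a random forest.

    The model choice is an assumption for illustration, not part of the task.
    """
    encode = ColumnTransformer(
        # Categories unseen during fitting are encoded as all zeros.
        [('cat', OneHotEncoder(handle_unknown='ignore'), CATEGORICAL)],
        remainder='passthrough',  # Year and Mileage pass through unchanged
    )
    forest = RandomForestRegressor(n_estimators=50, random_state=random_state)
    return Pipeline([('encode', encode), ('forest', forest)])


# The five rows printed by data.head(), used only so the sketch runs standalone.
sample = pd.DataFrame({
    'Price': [21490, 21250, 20925, 14500, 32488],
    'Year': [2014, 2016, 2016, 2012, 2013],
    'Mileage': [31909, 25741, 24633, 84026, 22816],
    'State': ['MD', 'KY', 'SC', 'OK', 'TN'],
    'Make': ['Nissan', 'Chevrolet', 'Hyundai', 'Jeep', 'Jeep'],
    'Model': ['MuranoAWD', 'CamaroCoupe', 'Santa', 'Grand', 'Wrangler'],
})

model = build_price_model()
model.fit(sample[FEATURES], sample['Price'])
preds = model.predict(sample[FEATURES])  # in-sample predictions, one per row
```

Keeping the encoder and the regressor in a single pipeline means the same preprocessing is applied at fit and predict time, so a different regressor can be swapped in without touching the encoding step.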