eBay Car Sales Data

We will work with a dataset of used cars from eBay Kleinanzeigen, the classifieds section of the German eBay website. The aim of the project is to clean the data and analyze the included used car listings.

The data dictionary provided with the data is as follows:

  • dateCrawled - When this ad was first crawled. All field-values are taken from this date.
  • name - Name of the car.
  • seller - Whether the seller is private or a dealer.
  • offerType - The type of listing.
  • price - The price on the ad to sell the car.
  • abtest - Whether the listing is included in an A/B test.
  • vehicleType - The vehicle type.
  • yearOfRegistration - The year in which the car was first registered.
  • gearbox - The transmission type.
  • powerPS - The power of the car in PS.
  • model - The car model name.
  • kilometer - How many kilometers the car has driven.
  • monthOfRegistration - The month in which the car was first registered.
  • fuelType - What type of fuel the car uses.
  • brand - The brand of the car.
  • notRepairedDamage - Whether the car has damage that has not yet been repaired.
  • dateCreated - The date on which the eBay listing was created.
  • nrOfPictures - The number of pictures in the ad.
  • postalCode - The postal code for the location of the vehicle.
  • lastSeenOnline - When the crawler saw this ad last online.

1. Importing and Reading Data

In [80]:
import pandas as pd
import numpy as np
In [81]:
autos = pd.read_csv('autos.csv', encoding='Latin-1')
In [82]:
autos
Out[82]:
dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox powerPS model odometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon privat Angebot $24,900 control limousine 2011 automatik 239 q5 100,000km 1 diesel audi nein 2016-03-27 00:00:00 0 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... privat Angebot $1,980 control cabrio 1996 manuell 75 astra 150,000km 5 benzin opel nein 2016-03-28 00:00:00 0 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge privat Angebot $13,200 test cabrio 2014 automatik 69 500 5,000km 11 benzin fiat nein 2016-04-02 00:00:00 0 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition privat Angebot $22,900 control kombi 2013 manuell 150 a3 40,000km 11 diesel audi nein 2016-03-08 00:00:00 0 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V privat Angebot $1,250 control limousine 1996 manuell 101 vectra 150,000km 1 benzin opel nein 2016-03-13 00:00:00 0 45897 2016-04-06 21:18:48

50000 rows × 20 columns

In [83]:
autos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   dateCrawled          50000 non-null  object
 1   name                 50000 non-null  object
 2   seller               50000 non-null  object
 3   offerType            50000 non-null  object
 4   price                50000 non-null  object
 5   abtest               50000 non-null  object
 6   vehicleType          44905 non-null  object
 7   yearOfRegistration   50000 non-null  int64 
 8   gearbox              47320 non-null  object
 9   powerPS              50000 non-null  int64 
 10  model                47242 non-null  object
 11  odometer             50000 non-null  object
 12  monthOfRegistration  50000 non-null  int64 
 13  fuelType             45518 non-null  object
 14  brand                50000 non-null  object
 15  notRepairedDamage    40171 non-null  object
 16  dateCreated          50000 non-null  object
 17  nrOfPictures         50000 non-null  int64 
 18  postalCode           50000 non-null  int64 
 19  lastSeen             50000 non-null  object
dtypes: int64(5), object(15)
memory usage: 7.6+ MB
In [84]:
#Check the missing values 
missing_values = autos.isnull().sum()
missing_values.sort_values(ascending = False)

#Find percentage of missing values 
percentage_missing = round((missing_values/len(autos)) * 100)
percentage_missing.sort_values(ascending = False)
Out[84]:
notRepairedDamage      20.0
vehicleType            10.0
fuelType                9.0
model                   6.0
gearbox                 5.0
lastSeen                0.0
yearOfRegistration      0.0
name                    0.0
seller                  0.0
offerType               0.0
price                   0.0
abtest                  0.0
powerPS                 0.0
postalCode              0.0
odometer                0.0
monthOfRegistration     0.0
brand                   0.0
dateCreated             0.0
nrOfPictures            0.0
dateCrawled             0.0
dtype: float64
In [85]:
autos.head()
Out[85]:
dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox powerPS model odometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50

Observations

  • 20 columns and 50,000 rows in total
  • Most columns are stored as strings (object dtype); five are integers
  • The columns vehicleType, gearbox, model, fuelType and notRepairedDamage contain null values (at most ~20% per column)
  • Column names use camelCase instead of Python's preferred snake_case
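
The share of missing values per column can also be read off directly with isnull().mean(), because the mean of a boolean column is the fraction of True values. A minimal sketch on a hypothetical toy frame (the sample values are made up for illustration and are not taken from autos.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the calculation
sample = pd.DataFrame({
    'vehicleType': ['bus', np.nan, 'kombi', 'cabrio'],
    'gearbox': ['manuell', 'automatik', np.nan, np.nan],
    'brand': ['peugeot', 'bmw', 'ford', 'opel'],
})

# The mean of the boolean mask is the fraction of missing values per column
missing_pct = (sample.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

Applied to autos, the same expression should give the unrounded version of the percentages shown above.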
In [86]:
autos.columns
Out[86]:
Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')
In [87]:
#Rename the columns to snake_case
autos.columns = ['date_crawled', 'name', 'seller', 'offer_type',
                 'price', 'abtest', 'vehicle_type', 'registeration_year',
                 'gearbox', 'power_ps', 'model', 'odometer',
                 'registeration_month', 'fuel_type', 'brand',
                 'unrepaired_damage', 'ad_created', 'nr_of_pictures',
                 'postal_code', 'last_seen']
In [88]:
autos.columns
Out[88]:
Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registeration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registeration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen'],
      dtype='object')
In [89]:
autos.head()
Out[89]:
date_crawled name seller offer_type price abtest vehicle_type registeration_year gearbox power_ps model odometer registeration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50
  • I changed the column names from camelCase to snake_case to follow Python naming conventions and make them easier to work with.
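
A purely mechanical camelCase to snake_case conversion could be sketched with a regular expression. The helper name camel_to_snake is hypothetical, and this is not the method used above: the manual list also rewords some names (for example yearOfRegistration became registeration_year), which a generic converter would not do.

```python
import re

def camel_to_snake(name):
    # Insert an underscore at every boundary between a lowercase letter
    # (or digit) and an uppercase letter, then lowercase the result
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name).lower()

original = ['dateCrawled', 'offerType', 'powerPS', 'nrOfPictures', 'abtest']
converted = [camel_to_snake(col) for col in original]
print(converted)
```

With a helper like this, the assignment could be written as autos.columns = [camel_to_snake(c) for c in autos.columns], followed by a rename for any column that needs different wording.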

2. Initial Exploration and Cleaning

What to look out for

  • Text columns in which almost all values are the same should be dropped, because they provide no useful information for analysis.
  • Numeric data stored as text needs to be cleaned and converted to a numeric type.
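
One way to spot columns dominated by a single value is to compute, for each column, the share of rows taken by its most frequent value. The sketch below uses a hypothetical helper name (dominant_share) and a made-up toy frame; what counts as "almost all the same" is a judgment call and is left to the reader.

```python
import pandas as pd

def dominant_share(df):
    # For each column, return the fraction of rows holding the most common value
    # (value_counts sorts in descending order, so the first entry is the largest)
    return df.apply(
        lambda col: col.value_counts(normalize=True, dropna=False).iloc[0]
    )

# Hypothetical toy data for illustration only
sample = pd.DataFrame({
    'seller': ['privat', 'privat', 'privat', 'gewerblich'],
    'brand': ['bmw', 'audi', 'opel', 'ford'],
})

share = dominant_share(sample)
print(share.sort_values(ascending=False))
```

Columns with a share close to 1.0 are candidates for dropping.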
In [90]:
autos.describe(include = 'all')
Out[90]:
date_crawled name seller offer_type price abtest vehicle_type registeration_year gearbox power_ps model odometer registeration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
count 50000 50000 50000 50000 50000 50000 44905 50000.000000 47320 50000.000000 47242 50000 50000.000000 45518 50000 40171 50000 50000.0 50000.000000 50000
unique 48213 38754 2 2 2357 2 8 NaN 2 NaN 245 13 NaN 7 40 2 76 NaN NaN 39481
top 2016-03-09 11:54:38 Ford_Fiesta privat Angebot $0 test limousine NaN manuell NaN golf 150,000km NaN benzin volkswagen nein 2016-04-03 00:00:00 NaN NaN 2016-04-07 06:17:27
freq 3 78 49999 49999 1421 25756 12859 NaN 36993 NaN 4024 32424 NaN 30107 10687 35232 1946 NaN NaN 8
mean NaN NaN NaN NaN NaN NaN NaN 2005.073280 NaN 116.355920 NaN NaN 5.723360 NaN NaN NaN NaN 0.0 50813.627300 NaN
std NaN NaN NaN NaN NaN NaN NaN 105.712813 NaN 209.216627 NaN NaN 3.711984 NaN NaN NaN NaN 0.0 25779.747957 NaN
min NaN NaN NaN NaN NaN NaN NaN 1000.000000 NaN 0.000000 NaN NaN 0.000000 NaN NaN NaN NaN 0.0 1067.000000 NaN
25% NaN NaN NaN NaN NaN NaN NaN 1999.000000 NaN 70.000000 NaN NaN 3.000000 NaN NaN NaN NaN 0.0 30451.000000 NaN
50% NaN NaN NaN NaN NaN NaN NaN 2003.000000 NaN 105.000000 NaN NaN 6.000000 NaN NaN NaN NaN 0.0 49577.000000 NaN
75% NaN NaN NaN NaN NaN NaN NaN 2008.000000 NaN 150.000000 NaN NaN 9.000000 NaN NaN NaN NaN 0.0 71540.000000 NaN
max NaN NaN NaN NaN NaN NaN NaN 9999.000000 NaN 17700.000000 NaN NaN 12.000000 NaN NaN NaN NaN 0.0 99998.000000 NaN
In [91]:
autos['price'].value_counts()
Out[91]:
$0         1421
$500        781
$1,500      734
$2,500      643
$1,200      639
           ... 
$14,321       1
$5,475        1
$33,777       1
$4,222        1
$889          1
Name: price, Length: 2357, dtype: int64
In [92]:
autos['odometer'].value_counts()
Out[92]:
150,000km    32424
125,000km     5170
100,000km     2169
90,000km      1757
80,000km      1436
70,000km      1230
60,000km      1164
50,000km      1027
5,000km        967
40,000km       819
30,000km       789
20,000km       784
10,000km       264
Name: odometer, dtype: int64
In [93]:
autos['price'].head()
Out[93]:
0    $5,000
1    $8,500
2    $8,990
3    $4,350
4    $1,350
Name: price, dtype: object
In [94]:
reg_year = autos['registeration_year'].unique()
np.sort(reg_year)
Out[94]:
array([1000, 1001, 1111, 1500, 1800, 1910, 1927, 1929, 1931, 1934, 1937,
       1938, 1939, 1941, 1943, 1948, 1950, 1951, 1952, 1953, 1954, 1955,
       1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966,
       1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977,
       1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988,
       1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
       2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
       2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2800, 4100,
       4500, 4800, 5000, 5911, 6200, 8888, 9000, 9996, 9999])
In [95]:
autos['nr_of_pictures'].value_counts()
Out[95]:
0    50000
Name: nr_of_pictures, dtype: int64

Summary of Descriptive Stats

  • The price and odometer columns need to be converted from text to numeric types
  • registeration_year contains invalid values: it ranges from 1000 to 9999
  • nr_of_pictures can be dropped because its only value is 0
  • seller and offer_type each hold a single value in 49,999 of the 50,000 rows, so they carry almost no information
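
Dropping the uninformative columns named above could look like the following. This is a sketch on a small stand-in frame (the rows are illustrative), not a step that has been run on autos in this notebook.

```python
import pandas as pd

# Small stand-in frame with the same column names as the cleaned dataset
sample = pd.DataFrame({
    'name': ['Ford_Fiesta', 'Opel_Vectra_1.6_16V'],
    'seller': ['privat', 'privat'],
    'offer_type': ['Angebot', 'Angebot'],
    'nr_of_pictures': [0, 0],
})

# Columns flagged in the summary as carrying little or no information
cols_to_drop = ['seller', 'offer_type', 'nr_of_pictures']

# drop returns a new DataFrame, so the result has to be assigned
trimmed = sample.drop(columns=cols_to_drop)
print(trimmed.columns.tolist())
```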
In [96]:
#Remove any non-numeric characters in the data
#regex=False treats '$' as a literal character rather than a regex anchor
autos['price'] = autos['price'].str.replace('$', '', regex=False)
autos['price'] = autos['price'].str.replace(',', '', regex=False)

autos['odometer'] = autos['odometer'].str.replace('km', '', regex=False)
autos['odometer'] = autos['odometer'].str.replace(',', '', regex=False)

#Convert columns to numeric dtype
autos['price'] = autos['price'].astype(float)
autos['odometer'] = autos['odometer'].astype(float)
In [97]:
autos
Out[97]:
date_crawled name seller offer_type price abtest vehicle_type registeration_year gearbox power_ps model odometer registeration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot 5000.0 control bus 2004 manuell 158 andere 150000.0 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot 8500.0 control limousine 1997 automatik 286 7er 150000.0 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot 8990.0 test limousine 2009 manuell 102 golf 70000.0 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot 4350.0 control kleinwagen 2007 automatik 71 fortwo 70000.0 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot 1350.0 test kombi 2003 manuell 0 focus 150000.0 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon privat Angebot 24900.0 control limousine 2011 automatik 239 q5 100000.0 1 diesel audi nein 2016-03-27 00:00:00 0 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... privat Angebot 1980.0 control cabrio 1996 manuell 75 astra 150000.0 5 benzin opel nein 2016-03-28 00:00:00 0 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge privat Angebot 13200.0 test cabrio 2014 automatik 69 500 5000.0 11 benzin fiat nein 2016-04-02 00:00:00 0 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition privat Angebot 22900.0 control kombi 2013 manuell 150 a3 40000.0 11 diesel audi nein 2016-03-08 00:00:00 0 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V privat Angebot 1250.0 control limousine 1996 manuell 101 vectra 150000.0 1 benzin opel nein 2016-03-13 00:00:00 0 45897 2016-04-06 21:18:48

50000 rows × 20 columns
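
The cleaning step above repeats the same pattern for both columns, so it could be wrapped in a small helper. The sketch below assumes a hypothetical function name (to_number) and uses a few values copied from the output above. It passes regex=False explicitly because '$' is a regex anchor, and the default value of the regex parameter of Series.str.replace differs between pandas versions.

```python
import pandas as pd

def to_number(series, tokens):
    # Strip each literal token from the strings, then convert to float
    cleaned = series
    for token in tokens:
        # regex=False: the token is matched as plain text, not as a pattern
        cleaned = cleaned.str.replace(token, '', regex=False)
    return cleaned.astype(float)

prices = pd.Series(['$5,000', '$8,500', '$0'])
odometer = pd.Series(['150,000km', '70,000km'])

clean_prices = to_number(prices, ['$', ','])
clean_odometer = to_number(odometer, ['km', ','])
print(clean_prices.tolist())
print(clean_odometer.tolist())
```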

In [98]:
#Rename column name - odometer to odometer_km 
autos.rename({'odometer': 'odometer_km'}, axis=1, inplace=True)
In [99]:
autos.columns
Out[99]:
Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registeration_year', 'gearbox', 'power_ps', 'model',
       'odometer_km', 'registeration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen'],
      dtype='object')
In [100]:
autos['seller'].value_counts()
Out[100]:
privat        49999
gewerblich        1
Name: seller, dtype: int64
In [101]:
autos['offer_type'].value_counts()
Out[101]:
Angebot    49999
Gesuch         1
Name: offer_type, dtype: int64

Exploring the Odometer and Price Columns

  • The odometer_km and price columns may contain unrealistic values, so we check them for outliers
  • These two columns are critical for our analysis
In [102]:
autos['odometer_km'].unique().shape
Out[102]:
(13,)
In [103]:
autos['odometer_km'].describe()
Out[103]:
count     50000.000000
mean     125732.700000
std       40042.211706
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: float64
In [104]:
autos['odometer_km'].value_counts().sort_index(ascending=False)
Out[104]:
150000.0    32424
125000.0     5170
100000.0     2169
90000.0      1757
80000.0      1436
70000.0      1230
60000.0      1164
50000.0      1027
40000.0       819
30000.0       789
20000.0       784
10000.0       264
5000.0        967
Name: odometer_km, dtype: int64

Based on this analysis, there seem to be no major outliers in the odometer_km column. The values fall into 13 rounded bands between 5,000 and 150,000 km.

In [105]:
autos['price'].unique().shape
Out[105]:
(2357,)
In [106]:
autos['price'].describe()
Out[106]:
count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64
In [107]:
autos['price'].value_counts().sort_index(ascending=False)
Out[107]:
99999999.0       1
27322222.0       1
12345678.0       3
11111111.0       2
10000000.0       1
              ... 
5.0              2
3.0              1
2.0              3
1.0            156
0.0           1421
Name: price, Length: 2357, dtype: int64
In [108]:
autos['price'].min(), autos['price'].max(), autos['price'].median(), autos['price'].mean()
Out[108]:
(0.0, 99999999.0, 2950.0, 9840.04376)

In the price column, we need to remove the values between 0 and 1 and any values above 10 million, because they are outliers. We can use a boolean filter and the index of the filtered rows to remove them. The cells below start with the low end of the range, using a slightly wider band of $0 to $5.
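
The boolean-filter approach can be sketched on a toy frame. The bounds follow the description above, but treating the lower bound as exclusive and the upper bound as inclusive is an assumption, and the sample prices are illustrative.

```python
import pandas as pd

# Illustrative prices: two at the low end, two plausible, one extreme
sample = pd.DataFrame({'price': [0.0, 1.0, 1350.0, 8990.0, 99999999.0]})

# Assumed bounds: keep prices above 1 and up to 10 million
lower, upper = 1, 10_000_000

# Boolean mask that is True for the rows to keep
in_range = (sample['price'] > lower) & (sample['price'] <= upper)
filtered = sample[in_range]
print(filtered['price'].tolist())
```

Selecting with the mask keeps the original index labels, so the surviving rows can still be matched back to the unfiltered data.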

In [125]:
#Find all listings priced between $0 and $5 (inclusive)
price_0to5_mask = autos["price"].between(0, 5)
price_0to5 = autos[price_0to5_mask]
price_0to5.index.name = 'index_0to5'
price_0to5
Out[125]:
date_crawled name seller offer_type price abtest vehicle_type registeration_year gearbox power_ps model odometer_km registeration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
index_0to5
27 2016-03-27 18:45:01 Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE privat Angebot 0.0 control NaN 2005 NaN 0 NaN 150000.0 0 NaN ford NaN 2016-03-27 00:00:00 0 66701 2016-03-27 18:45:01
55 2016-03-07 02:47:54 Mercedes_E320_AMG_zu_Tauschen! privat Angebot 1.0 test NaN 2017 automatik 224 e_klasse 125000.0 7 benzin mercedes_benz nein 2016-03-06 00:00:00 0 22111 2016-03-08 05:45:44
71 2016-03-28 19:39:35 Suche_Opel_Astra_F__Corsa_oder_Kadett_E_mit_Re... privat Angebot 0.0 control NaN 1990 manuell 0 NaN 5000.0 0 benzin opel NaN 2016-03-28 00:00:00 0 4552 2016-04-07 01:45:48
80 2016-03-09 15:57:57 Nissan_Primera_Hatchback_1_6_16v_73_Kw___99Ps_... privat Angebot 0.0 control coupe 1999 manuell 99 primera 150000.0 3 benzin nissan ja 2016-03-09 00:00:00 0 66903 2016-03-09 16:43:50
87 2016-03-29 23:37:22 Bmw_520_e39_zum_ausschlachten privat Angebot 0.0 control NaN 2000 NaN 0 5er 150000.0 0 NaN bmw NaN 2016-03-29 00:00:00 0 82256 2016-04-06 21:18:15
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49884 2016-03-11 13:55:30 Audi_a6_2.5l__Schnaeppchen_nur_heute privat Angebot 0.0 test kombi 1999 manuell 150 a6 150000.0 11 diesel audi NaN 2016-03-11 00:00:00 0 27711 2016-03-12 03:17:08
49943 2016-03-16 20:46:08 Opel_astra privat Angebot 0.0 control NaN 2016 manuell 101 astra 150000.0 8 benzin opel NaN 2016-03-16 00:00:00 0 89134 2016-03-17 19:44:20
49960 2016-03-25 22:51:55 Ford_KA_zu_verschenken_***Reserviert*** privat Angebot 0.0 control kleinwagen 1999 manuell 60 ka 150000.0 6 benzin ford NaN 2016-03-25 00:00:00 0 34355 2016-03-25 22:51:55
49974 2016-03-20 10:52:31 Golf_1_Cabrio_Tuev_Neu_viele_Extras_alles_eing... privat Angebot 0.0 control cabrio 1983 manuell 70 golf 150000.0 2 benzin volkswagen nein 2016-03-20 00:00:00 0 8209 2016-03-27 19:48:16
49984 2016-03-31 22:48:48 Student_sucht_ein__Anfaengerauto___ab_2000_BJ_... privat Angebot 0.0 test NaN 2000 NaN 0 NaN 150000.0 0 NaN sonstige_autos NaN 2016-03-31 00:00:00 0 12103 2016-04-02 19:44:53

1583 rows × 20 columns

There are 1,583 listings with a price between $0 and $5.
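
One way to remove such rows is to collect their index labels and pass them to DataFrame.drop. A minimal sketch on hypothetical toy data (the variable names are illustrative):

```python
import pandas as pd

# Toy prices: three fall inside the 0 to 5 band, two do not
sample = pd.DataFrame({'price': [0.0, 5000.0, 1.0, 8500.0, 5.0]})

# between() is inclusive on both ends by default
low_price_index = sample[sample['price'].between(0, 5)].index

# drop returns a new DataFrame; the original is left unchanged
cleaned = sample.drop(index=low_price_index)
print(len(low_price_index), len(cleaned))
```

Because drop does not modify the frame in place by default, the result has to be assigned back for the removal to persist.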

In [129]:
#Remove the rows with prices between $0 and $5, because they are outliers
autos = autos.drop(index=autos[autos['price'].between(0, 5)].index)
autos
Out[129]:
date_crawled name seller offer_type price abtest vehicle_type registeration_year gearbox power_ps model odometer_km registeration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot 5000.0 control bus 2004 manuell 158 andere 150000.0 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot 8500.0 control limousine 1997 automatik 286 7er 150000.0 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot 8990.0 test limousine 2009 manuell 102 golf 70000.0 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot 4350.0 control kleinwagen 2007 automatik 71 fortwo 70000.0 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot 1350.0 test kombi 2003 manuell 0 focus 150000.0 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon privat Angebot 24900.0 control limousine 2011 automatik 239 q5 100000.0 1 diesel audi nein 2016-03-27 00:00:00 0 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... privat Angebot 1980.0 control cabrio 1996 manuell 75 astra 150000.0 5 benzin opel nein 2016-03-28 00:00:00 0 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge privat Angebot 13200.0 test cabrio 2014 automatik 69 500 5000.0 11 benzin fiat nein 2016-04-02 00:00:00 0 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition privat Angebot 22900.0 control kombi 2013 manuell 150 a3 40000.0 11 diesel audi nein 2016-03-08 00:00:00 0 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V privat Angebot 1250.0 control limousine 1996 manuell 101 vectra 150000.0 1 benzin opel nein 2016-03-13 00:00:00 0 45897 2016-04-06 21:18:48

48417 rows × 20 columns
