Analyzing the Trend of the Used Car Deals – Exploring eBay Car Sales Data

Introduction

eBay Kleinanzeigen is the classifieds section of the German eBay website. In this project, we use the used-car dataset from eBay Kleinanzeigen to analyze used-car ads created in Germany between 11th June 2015 and 7th April 2016.

The original dataset uploaded to Kaggle by user orgesleka is no longer available on Kaggle, but it is still accessible on data.world.

In this project, we use the modified version of the original dataset, which was prepared by Dataquest. The modifications are as follows:

  • 50,000 data points were sampled from the full dataset, which contained 370,000 data points.
  • The dataset was dirtied slightly to make it resemble a scraped dataset before data cleaning (the original dataset on Kaggle had been cleaned).

The explanations for the columns of the autos.csv dataset are as follows:

  • dateCrawled: The date when this ad was first crawled
  • name: Name of the car
  • seller: A private seller or a dealer
  • offerType: The listing type
  • price: The asking price for selling the car
  • abtest: Whether the listing is included in an A/B test
  • vehicleType: The vehicle type
  • yearOfRegistration: The year in which the car was first registered
  • gearbox: The transmission type
  • powerPS: The power of the car in PS
  • model: The car model name
  • odometer: How many kilometres the car has driven
  • monthOfRegistration: The month in which the car was first registered
  • fuelType: What type of fuel the car uses
  • brand: The brand of the car
  • notRepairedDamage: Whether the car has damage which still needs to be repaired
  • dateCreated: The eBay listing creation date
  • nrOfPictures: The number of pictures in the ad
  • postalCode: The postal code for the location of the vehicle
  • lastSeen: When the crawler last saw this ad online

The content of the dataset is in German, as the data was originally scraped from the German eBay website.

The Goal of the Project

The goal of this project is to clean the data and analyze the trend of the used car deals.

Summary of Results

We cleaned the dataset and observed that March and the beginning of April 2016 were the busiest period for ad creation. We also found a lack of correlation between mean price and mean mileage by brand. The distributions of date_crawled and last_seen are fairly consistent throughout the whole data sampling period.

Reading the Data and Cleaning the Column Names

We read the autos.csv file into pandas and assign it to the variable autos. The file can be read with encoding='Latin-1' instead of the default UTF-8 encoding.

In [1]:
import pandas as pd
import numpy as np
In [2]:
autos = pd.read_csv('autos.csv', encoding='Latin-1')
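
To illustrate why the choice of encoding matters, the sketch below encodes a listing name containing an umlaut as Latin-1 bytes and then tries both decodings. The sample string is modelled on a name from this dataset; the byte round-trip is only an illustration and assumes, as stated above, that the file is stored as Latin-1.

```python
# A name containing a non-ASCII character (Ü)
name = "Ford_Focus_1_6_Benzin_TÜV_neu"

# Simulate how the text would be stored in a Latin-1 encoded file
raw = name.encode("latin-1")

# Decoding those bytes as UTF-8 fails: the Latin-1 byte for Ü (0xDC)
# is not followed by a valid UTF-8 continuation byte
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as Latin-1 recovers the original text
decoded = raw.decode("latin-1")
print(utf8_ok, decoded)
```

The same mismatch is what an explicit encoding argument to pd.read_csv is meant to avoid.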

Let's briefly examine the data:

In [3]:
autos
Out[3]:
dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox powerPS model odometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50
5 2016-03-21 13:47:45 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... privat Angebot $7,900 test bus 2006 automatik 150 voyager 150,000km 4 diesel chrysler NaN 2016-03-21 00:00:00 0 22962 2016-04-06 09:45:21
6 2016-03-20 17:55:21 VW_Golf_III_GT_Special_Electronic_Green_Metall... privat Angebot $300 test limousine 1995 manuell 90 golf 150,000km 8 benzin volkswagen NaN 2016-03-20 00:00:00 0 31535 2016-03-23 02:48:59
7 2016-03-16 18:55:19 Golf_IV_1.9_TDI_90PS privat Angebot $1,990 control limousine 1998 manuell 90 golf 150,000km 12 diesel volkswagen nein 2016-03-16 00:00:00 0 53474 2016-04-07 03:17:32
8 2016-03-22 16:51:34 Seat_Arosa privat Angebot $250 test NaN 2000 manuell 0 arosa 150,000km 10 NaN seat nein 2016-03-22 00:00:00 0 7426 2016-03-26 18:18:10
9 2016-03-16 13:47:02 Renault_Megane_Scenic_1.6e_RT_Klimaanlage privat Angebot $590 control bus 1997 manuell 90 megane 150,000km 7 benzin renault nein 2016-03-16 00:00:00 0 15749 2016-04-06 10:46:35
10 2016-03-15 01:41:36 VW_Golf_Tuning_in_siber/grau privat Angebot $999 test NaN 2017 manuell 90 NaN 150,000km 4 benzin volkswagen nein 2016-03-14 00:00:00 0 86157 2016-04-07 03:16:21
11 2016-03-16 18:45:34 Mercedes_A140_Motorschaden privat Angebot $350 control NaN 2000 NaN 0 NaN 150,000km 0 benzin mercedes_benz NaN 2016-03-16 00:00:00 0 17498 2016-03-16 18:45:34
12 2016-03-31 19:48:22 Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... privat Angebot $5,299 control kleinwagen 2010 automatik 71 fortwo 50,000km 9 benzin smart nein 2016-03-31 00:00:00 0 34590 2016-04-06 14:17:52
13 2016-03-23 10:48:32 Audi_A3_1.6_tuning privat Angebot $1,350 control limousine 1999 manuell 101 a3 150,000km 11 benzin audi nein 2016-03-23 00:00:00 0 12043 2016-04-01 14:17:13
14 2016-03-23 11:50:46 Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... privat Angebot $3,999 test kleinwagen 2007 manuell 75 clio 150,000km 9 benzin renault NaN 2016-03-23 00:00:00 0 81737 2016-04-01 15:46:47
15 2016-04-01 12:06:20 Corvette_C3_Coupe_T_Top_Crossfire_Injection privat Angebot $18,900 test coupe 1982 automatik 203 NaN 80,000km 6 benzin sonstige_autos nein 2016-04-01 00:00:00 0 61276 2016-04-02 21:10:48
16 2016-03-16 14:59:02 Opel_Vectra_B_Kombi privat Angebot $350 test kombi 1999 manuell 101 vectra 150,000km 5 benzin opel nein 2016-03-16 00:00:00 0 57299 2016-03-18 05:29:37
17 2016-03-29 11:46:22 Volkswagen_Scirocco_2_G60 privat Angebot $5,500 test coupe 1990 manuell 205 scirocco 150,000km 6 benzin volkswagen nein 2016-03-29 00:00:00 0 74821 2016-04-05 20:46:26
18 2016-03-26 19:57:44 Verkaufen_mein_bmw_e36_320_i_touring privat Angebot $300 control bus 1995 manuell 150 3er 150,000km 0 benzin bmw NaN 2016-03-26 00:00:00 0 54329 2016-04-02 12:16:41
19 2016-03-17 13:36:21 mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 privat Angebot $4,150 control suv 2004 manuell 124 andere 150,000km 2 lpg mazda nein 2016-03-17 00:00:00 0 40878 2016-03-17 14:45:58
20 2016-03-05 19:57:31 Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... privat Angebot $3,500 test kombi 2003 manuell 131 a4 150,000km 5 diesel audi NaN 2016-03-05 00:00:00 0 53913 2016-03-07 05:46:46
21 2016-03-06 19:07:10 Porsche_911_Carrera_4S_Cabrio privat Angebot $41,500 test cabrio 2004 manuell 320 911 150,000km 4 benzin porsche nein 2016-03-06 00:00:00 0 65428 2016-04-05 23:46:19
22 2016-03-28 20:50:54 MINI_Cooper_S_Cabrio privat Angebot $25,450 control cabrio 2015 manuell 184 cooper 10,000km 1 benzin mini nein 2016-03-28 00:00:00 0 44789 2016-04-01 06:45:30
23 2016-03-10 19:55:34 Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima privat Angebot $7,999 control bus 2010 manuell 120 NaN 150,000km 2 diesel peugeot nein 2016-03-10 00:00:00 0 30900 2016-03-17 08:45:17
24 2016-04-03 11:57:02 BMW_535i_xDrive_Sport_Aut. privat Angebot $48,500 control limousine 2014 automatik 306 5er 30,000km 12 benzin bmw nein 2016-04-03 00:00:00 0 22547 2016-04-07 13:16:50
25 2016-03-21 21:56:18 Ford_escort_kombi_an_bastler_mit_ghia_ausstattung privat Angebot $90 control kombi 1996 manuell 116 NaN 150,000km 4 benzin ford ja 2016-03-21 00:00:00 0 27574 2016-04-01 05:16:49
26 2016-04-03 22:46:28 Volkswagen_Polo_Fox privat Angebot $777 control kleinwagen 1992 manuell 54 polo 125,000km 2 benzin volkswagen nein 2016-04-03 00:00:00 0 38110 2016-04-05 23:46:48
27 2016-03-27 18:45:01 Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE privat Angebot $0 control NaN 2005 NaN 0 NaN 150,000km 0 NaN ford NaN 2016-03-27 00:00:00 0 66701 2016-03-27 18:45:01
28 2016-03-19 21:56:19 MINI_Cooper_D privat Angebot $5,250 control kleinwagen 2007 manuell 110 cooper 150,000km 7 diesel mini ja 2016-03-19 00:00:00 0 15745 2016-04-07 14:58:48
29 2016-04-02 12:45:44 Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... privat Angebot $4,999 test kombi 2004 automatik 204 e_klasse 150,000km 10 diesel mercedes_benz nein 2016-04-02 00:00:00 0 47638 2016-04-02 12:45:44
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49970 2016-03-21 22:47:37 c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... privat Angebot $15,800 control bus 2010 automatik 136 c4 60,000km 4 diesel citroen nein 2016-03-21 00:00:00 0 14947 2016-04-07 04:17:34
49971 2016-03-29 14:54:12 W.Lupo_1.0 privat Angebot $950 test kleinwagen 2001 manuell 50 lupo 150,000km 4 benzin volkswagen nein 2016-03-29 00:00:00 0 65197 2016-03-29 20:41:51
49972 2016-03-26 22:25:23 Mercedes_Benz_Vito_115_CDI_Extralang_Aut. privat Angebot $3,300 control bus 2004 automatik 150 vito 150,000km 10 diesel mercedes_benz ja 2016-03-26 00:00:00 0 65326 2016-03-28 11:28:18
49973 2016-03-27 05:32:39 Mercedes_Benz_SLK_200_Kompressor privat Angebot $6,000 control cabrio 2004 manuell 163 slk 150,000km 11 benzin mercedes_benz nein 2016-03-27 00:00:00 0 53567 2016-03-27 08:25:24
49974 2016-03-20 10:52:31 Golf_1_Cabrio_Tuev_Neu_viele_Extras_alles_eing... privat Angebot $0 control cabrio 1983 manuell 70 golf 150,000km 2 benzin volkswagen nein 2016-03-20 00:00:00 0 8209 2016-03-27 19:48:16
49975 2016-03-27 20:51:39 Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort privat Angebot $9,700 control kleinwagen 2012 automatik 88 jazz 100,000km 11 hybrid honda nein 2016-03-27 00:00:00 0 84385 2016-04-05 19:45:34
49976 2016-03-19 18:56:05 Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... privat Angebot $5,900 test kombi 1992 automatik 150 80 150,000km 12 benzin audi nein 2016-03-19 00:00:00 0 36100 2016-04-07 06:16:44
49977 2016-03-31 18:37:18 Mercedes_Benz_C200_Cdi_W203 privat Angebot $5,500 control limousine 2003 manuell 116 c_klasse 150,000km 2 diesel mercedes_benz nein 2016-03-31 00:00:00 0 33739 2016-04-06 12:16:11
49978 2016-04-04 10:37:14 Mercedes_Benz_E_200_Classic privat Angebot $900 control limousine 1996 automatik 136 e_klasse 150,000km 9 benzin mercedes_benz ja 2016-04-04 00:00:00 0 24405 2016-04-06 12:44:20
49979 2016-03-20 18:38:40 Volkswagen_Polo_1.6_TDI_Style privat Angebot $11,000 test kleinwagen 2011 manuell 90 polo 70,000km 11 diesel volkswagen nein 2016-03-20 00:00:00 0 48455 2016-04-07 01:45:12
49980 2016-03-12 10:55:54 Ford_Escort_Turnier_16V privat Angebot $400 control kombi 1995 manuell 105 escort 125,000km 3 benzin ford NaN 2016-03-12 00:00:00 0 56218 2016-04-06 17:16:49
49981 2016-03-15 09:38:21 Opel_Astra_Kombi_mit_Anhaengerkupplung privat Angebot $2,000 control kombi 1998 manuell 115 astra 150,000km 12 benzin opel nein 2016-03-15 00:00:00 0 86859 2016-04-05 17:21:46
49982 2016-03-29 18:51:08 Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm privat Angebot $1,950 control kleinwagen 2004 manuell 0 fabia 90,000km 7 benzin skoda NaN 2016-03-29 00:00:00 0 45884 2016-03-29 18:51:08
49983 2016-03-06 12:43:04 Ford_focus_99 privat Angebot $600 test kleinwagen 1999 manuell 101 focus 150,000km 4 benzin ford NaN 2016-03-06 00:00:00 0 52477 2016-03-09 06:16:08
49984 2016-03-31 22:48:48 Student_sucht_ein__Anfaengerauto___ab_2000_BJ_... privat Angebot $0 test NaN 2000 NaN 0 NaN 150,000km 0 NaN sonstige_autos NaN 2016-03-31 00:00:00 0 12103 2016-04-02 19:44:53
49985 2016-04-02 16:38:23 Verkaufe_meinen_vw_vento! privat Angebot $1,000 control NaN 1995 automatik 0 NaN 150,000km 0 benzin volkswagen NaN 2016-04-02 00:00:00 0 30900 2016-04-06 15:17:52
49986 2016-04-04 20:46:02 Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... privat Angebot $15,900 control limousine 2010 automatik 218 300c 125,000km 11 diesel chrysler nein 2016-04-04 00:00:00 0 73527 2016-04-06 23:16:00
49987 2016-03-22 20:47:27 Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... privat Angebot $21,990 control limousine 2013 manuell 150 a3 50,000km 11 diesel audi nein 2016-03-22 00:00:00 0 94362 2016-03-26 22:46:06
49988 2016-03-28 19:49:51 BMW_330_Ci privat Angebot $9,550 control coupe 2001 manuell 231 3er 150,000km 10 benzin bmw nein 2016-03-28 00:00:00 0 83646 2016-04-07 02:17:40
49989 2016-03-11 19:50:37 VW_Polo_zum_Ausschlachten_oder_Wiederaufbau privat Angebot $150 test kleinwagen 1997 manuell 0 polo 150,000km 5 benzin volkswagen ja 2016-03-11 00:00:00 0 21244 2016-03-12 10:17:55
49990 2016-03-21 19:54:19 Mercedes_Benz_A_200__BlueEFFICIENCY__Urban privat Angebot $17,500 test limousine 2012 manuell 156 a_klasse 30,000km 12 benzin mercedes_benz nein 2016-03-21 00:00:00 0 58239 2016-04-06 22:46:57
49991 2016-03-06 15:25:19 Kleinwagen privat Angebot $500 control NaN 2016 manuell 0 twingo 150,000km 0 benzin renault NaN 2016-03-06 00:00:00 0 61350 2016-03-06 18:24:19
49992 2016-03-10 19:37:38 Fiat_Grande_Punto_1.4_T_Jet_16V_Sport privat Angebot $4,800 control kleinwagen 2009 manuell 120 andere 125,000km 9 lpg fiat nein 2016-03-10 00:00:00 0 68642 2016-03-13 01:44:51
49993 2016-03-15 18:47:35 Audi_A3__1_8l__Silber;_schoenes_Fahrzeug privat Angebot $1,650 control kleinwagen 1997 manuell 0 NaN 150,000km 7 benzin audi NaN 2016-03-15 00:00:00 0 65203 2016-04-06 19:46:53
49994 2016-03-22 17:36:42 Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... privat Angebot $5,000 control kombi 2001 automatik 299 a6 150,000km 1 benzin audi nein 2016-03-22 00:00:00 0 46537 2016-04-06 08:16:39
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon privat Angebot $24,900 control limousine 2011 automatik 239 q5 100,000km 1 diesel audi nein 2016-03-27 00:00:00 0 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... privat Angebot $1,980 control cabrio 1996 manuell 75 astra 150,000km 5 benzin opel nein 2016-03-28 00:00:00 0 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge privat Angebot $13,200 test cabrio 2014 automatik 69 500 5,000km 11 benzin fiat nein 2016-04-02 00:00:00 0 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition privat Angebot $22,900 control kombi 2013 manuell 150 a3 40,000km 11 diesel audi nein 2016-03-08 00:00:00 0 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V privat Angebot $1,250 control limousine 1996 manuell 101 vectra 150,000km 1 benzin opel nein 2016-03-13 00:00:00 0 45897 2016-04-06 21:18:48

50000 rows × 20 columns

In [4]:
autos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled            50000 non-null object
name                   50000 non-null object
seller                 50000 non-null object
offerType              50000 non-null object
price                  50000 non-null object
abtest                 50000 non-null object
vehicleType            44905 non-null object
yearOfRegistration     50000 non-null int64
gearbox                47320 non-null object
powerPS                50000 non-null int64
model                  47242 non-null object
odometer               50000 non-null object
monthOfRegistration    50000 non-null int64
fuelType               45518 non-null object
brand                  50000 non-null object
notRepairedDamage      40171 non-null object
dateCreated            50000 non-null object
nrOfPictures           50000 non-null int64
postalCode             50000 non-null int64
lastSeen               50000 non-null object
dtypes: int64(5), object(15)
memory usage: 7.6+ MB

Here is what we observe:

  • The autos dataset contains 50,000 rows and 20 columns.
  • Most of the columns hold string objects (15), whereas only five hold integers.
  • Five columns have null values; the share of nulls ranges from roughly 5% to 20% per column.
  • The column names use camelCase instead of snake_case, which is the preferred style in Python.
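
The share of missing values per column can be worked out from the non-null counts in the autos.info() output. The following is a minimal sketch using those printed counts directly; the dictionary is typed in by hand for illustration and is not read from the dataframe.

```python
total_rows = 50000

# Non-null counts copied from the autos.info() output
non_null = {
    "vehicleType": 44905,
    "gearbox": 47320,
    "model": 47242,
    "fuelType": 45518,
    "notRepairedDamage": 40171,
}

# Share of missing values per column, in percent
null_pct = {
    col: round(100 * (total_rows - count) / total_rows, 1)
    for col, count in non_null.items()
}

for col, pct in null_pct.items():
    print(f"{col}: {pct}% null")
```

On the dataframe itself, autos.isnull().mean() * 100 should give the same percentages without copying numbers by hand.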

We will transform the column names from camelCase to the preferred snake_case and modify some of the column names to make them more descriptive.

The original column names are as follows:

In [5]:
# Show the original column names
autos.columns
Out[5]:
Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')

We copy the list of column names, edit it, and assign the modified names back to autos.columns:

In [6]:
# Edit the column names
autos.columns = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen']
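
For the names that only change case, the conversion could also be done programmatically. The helper camel_to_snake below is a hypothetical function written for this illustration, not part of pandas; it is a sketch of one possible regular-expression approach.

```python
import re

def camel_to_snake(name):
    # Insert an underscore wherever a lowercase letter or digit
    # is followed by an uppercase letter, then lowercase everything
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

# Columns whose new names differ from the originals only in case style
original = ["dateCrawled", "offerType", "vehicleType", "powerPS",
            "fuelType", "nrOfPictures", "postalCode", "lastSeen"]

converted = [camel_to_snake(col) for col in original]
print(converted)
```

Names that were reworded as well (for example yearOfRegistration to registration_year, or notRepairedDamage to unrepaired_damage) would still need a manual mapping.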

To verify whether the column names have been modified, we call autos.head() to examine the column names of the autos dataframe. Alternatively, we can also verify the column names by running autos.columns.

In [7]:
# Display the data of the first 5 rows
autos.head()
Out[7]:
date_crawled name seller offer_type price abtest vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50

The output shows that the column names have been converted to snake_case and several names have been reworded.

Initial Exploration and Cleaning

Now, we will explore the data a little bit more to determine the necessary basic data cleaning tasks. We use autos.describe() with include='all' to examine the descriptive statistics for both categorical and numeric columns.

In [8]:
# Display the descriptive statistics
autos.describe(include='all')
Out[8]:
date_crawled name seller offer_type price abtest vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
count 50000 50000 50000 50000 50000 50000 44905 50000.000000 47320 50000.000000 47242 50000 50000.000000 45518 50000 40171 50000 50000.0 50000.000000 50000
unique 48213 38754 2 2 2357 2 8 NaN 2 NaN 245 13 NaN 7 40 2 76 NaN NaN 39481
top 2016-03-29 23:42:13 Ford_Fiesta privat Angebot $0 test limousine NaN manuell NaN golf 150,000km NaN benzin volkswagen nein 2016-04-03 00:00:00 NaN NaN 2016-04-07 06:17:27
freq 3 78 49999 49999 1421 25756 12859 NaN 36993 NaN 4024 32424 NaN 30107 10687 35232 1946 NaN NaN 8
mean NaN NaN NaN NaN NaN NaN NaN 2005.073280 NaN 116.355920 NaN NaN 5.723360 NaN NaN NaN NaN 0.0 50813.627300 NaN
std NaN NaN NaN NaN NaN NaN NaN 105.712813 NaN 209.216627 NaN NaN 3.711984 NaN NaN NaN NaN 0.0 25779.747957 NaN
min NaN NaN NaN NaN NaN NaN NaN 1000.000000 NaN 0.000000 NaN NaN 0.000000 NaN NaN NaN NaN 0.0 1067.000000 NaN
25% NaN NaN NaN NaN NaN NaN NaN 1999.000000 NaN 70.000000 NaN NaN 3.000000 NaN NaN NaN NaN 0.0 30451.000000 NaN
50% NaN NaN NaN NaN NaN NaN NaN 2003.000000 NaN 105.000000 NaN NaN 6.000000 NaN NaN NaN NaN 0.0 49577.000000 NaN
75% NaN NaN NaN NaN NaN NaN NaN 2008.000000 NaN 150.000000 NaN NaN 9.000000 NaN NaN NaN NaN 0.0 71540.000000 NaN
max NaN NaN NaN NaN NaN NaN NaN 9999.000000 NaN 17700.000000 NaN NaN 12.000000 NaN NaN NaN NaN 0.0 99998.000000 NaN

Cleaning the Non-Value-Added Columns

Based on the output, we notice that the nr_of_pictures column has only one value, which is 0.

In [9]:
# Show the unique value
autos['nr_of_pictures'].unique()
Out[9]:
array([0])

Since a single-value column would not add additional value to our data analysis, we will remove nr_of_pictures from our dataset.

In [10]:
# Remove `nr_of_pictures` from the dataset
autos.drop(columns=['nr_of_pictures'], inplace=True)
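
Instead of spotting constant columns by eye in the describe() output, they could also be detected programmatically. The sketch below uses a small made-up dataframe named df as a stand-in for autos; the values are invented for illustration.

```python
import pandas as pd

# Toy frame standing in for `autos`; values are made up for illustration
df = pd.DataFrame({
    "brand": ["bmw", "audi", "ford"],
    "nr_of_pictures": [0, 0, 0],
    "seller": ["privat", "privat", "gewerblich"],
})

# A column with exactly one distinct value carries no information
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) == 1]
print(constant_cols)

# Drop the constant columns, returning a new frame
df_clean = df.drop(columns=constant_cols)
print(df_clean.columns.tolist())
```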

To verify that the nr_of_pictures column has been removed, we do the following:

  • Run autos.shape: The number of columns has been reduced from 20 to 19.
  • Run autos.columns: The nr_of_pictures column has disappeared.

These results confirm that the nr_of_pictures column has been removed from the dataset.

In [11]:
# Show the number of rows and columns
autos.shape
Out[11]:
(50000, 19)
In [12]:
# Show the column names
autos.columns
Out[12]:
Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'postal_code', 'last_seen'],
      dtype='object')

We also observe that only one of the 50,000 entries in the seller column is gewerblich (commercial in English). The remaining entries are privat (private in English). We can delete the gewerblich entry to focus on the private sellers.

Similarly, the offer_type column shows 49,999 entries for Angebot (Offer in English) and only one entry for Gesuch (Request in English). Since we focus on Angebot (Offer), we will delete the Gesuch entry so that it does not distort our data analysis.

First, we isolate the rows for the gewerblich (commercial) seller and Gesuch (Request) offer_type:

In [13]:
# Isolate `commercial` seller
seller_commercial = autos[autos['seller'] == 'gewerblich']

seller_commercial
Out[13]:
date_crawled name seller offer_type price abtest vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created postal_code last_seen
7738 2016-03-15 18:06:22 Verkaufe_mehrere_Fahrzeuge_zum_Verschrotten gewerblich Angebot $100 control kombi 2000 manuell 0 megane 150,000km 8 benzin renault NaN 2016-03-15 00:00:00 65232 2016-04-06 17:15:37
In [14]:
# Isolate `Request` offer_type
offer_type_request = autos[autos['offer_type'] == 'Gesuch']

offer_type_request
Out[14]:
date_crawled name seller offer_type price abtest vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created postal_code last_seen
17541 2016-04-03 15:48:33 Suche_VW_T5_Multivan privat Gesuch $0 test bus 2005 NaN 0 transporter 150,000km 0 NaN volkswagen NaN 2016-04-03 00:00:00 29690 2016-04-05 15:16:06

Next, we remove these two rows by using autos.drop and the indexes of these rows:

In [15]:
# Remove the rows of `commercial` seller and `Request` offer_type from our dataset
autos.drop(index=[7738, 17541], inplace=True)
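
An alternative that avoids hard-coding index labels is to keep only the rows matching the wanted categories with a boolean mask. The sketch below uses a small made-up dataframe named df (not the real data) whose index labels mimic the two rows in question.

```python
import pandas as pd

# Toy frame; the index labels imitate those of the two unwanted rows
df = pd.DataFrame(
    {
        "seller": ["privat", "gewerblich", "privat", "privat"],
        "offer_type": ["Angebot", "Angebot", "Gesuch", "Angebot"],
    },
    index=[0, 7738, 17541, 49999],
)

# Keep only private sellers that are offering a car
keep = (df["seller"] == "privat") & (df["offer_type"] == "Angebot")
filtered = df[keep]
print(filtered.index.tolist())
```

With this approach the filter keeps working even if the row labels differ, for example after re-sampling the data.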

The result from autos.shape shows that the number of rows has been reduced from 50,000 to 49,998. This indicates that two rows have been deleted from the dataset.

In [16]:
# Show the number of rows and columns
autos.shape
Out[16]:
(49998, 19)

To check that the two deleted rows are the ones we wanted to delete, we look up integer positions 7738 and 17541 with autos.iloc. The rows now at those positions carry the index labels 7739 and 17543 (see the Name field in the output), which shows that the later rows have shifted up and that the two unwanted rows are gone.

In [17]:
# Position 7738 previously held the `commercial` seller row
autos.iloc[7738]
Out[17]:
date_crawled                                        2016-03-17 14:47:29
name                  ***_SMART_forTwo_cabrio_softouch_passion___sup...
seller                                                           privat
offer_type                                                      Angebot
price                                                           $10,999
abtest                                                          control
vehicle_type                                                     cabrio
registration_year                                                  2014
gearbox                                                       automatik
power_ps                                                              0
model                                                            fortwo
odometer                                                       20,000km
registration_month                                                    2
fuel_type                                                        benzin
brand                                                             smart
unrepaired_damage                                                  nein
ad_created                                          2016-03-17 00:00:00
postal_code                                                       67117
last_seen                                           2016-04-06 21:17:26
Name: 7739, dtype: object
In [18]:
# Position 17541 previously held the `Request` offer_type row
autos.iloc[17541]
Out[18]:
date_crawled                            2016-03-11 16:45:06
name                  Seat_Ibiza_1.9_TDI_Sport_SELTEN_!_TÜV
seller                                               privat
offer_type                                          Angebot
price                                                $1,950
abtest                                                 test
vehicle_type                                     kleinwagen
registration_year                                      2002
gearbox                                             manuell
power_ps                                                110
model                                                 ibiza
odometer                                          150,000km
registration_month                                        1
fuel_type                                            diesel
brand                                                  seat
unrepaired_damage                                      nein
ad_created                              2016-03-11 00:00:00
postal_code                                           46284
last_seen                               2016-03-27 18:17:02
Name: 17543, dtype: object
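
Because iloc selects by position rather than by label, a more direct check is to test whether a dropped label is still present in the index. The sketch below shows the difference on a small made-up dataframe named df; it is an illustration and was not run on the real data.

```python
import pandas as pd

# Toy frame with default integer labels 0..3
df = pd.DataFrame({"seller": ["a", "b", "c", "d"]})

# Drop the row labelled 1
df = df.drop(index=[1])

# Label-based check: the dropped label is no longer in the index
label_present = 1 in df.index

# Position-based lookup: position 1 now holds the row labelled 2
label_at_position_1 = df.iloc[1].name
print(label_present, label_at_position_1)
```

On the real dataframe, the equivalent checks would be 7738 in autos.index and 17541 in autos.index, both of which are expected to be False after the drop.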

Converting Numeric Data Stored as String Object to Integer Datatype

We notice that the following numeric columns are stored as string objects, and we need to convert them to integers for analysis:

  • price
  • odometer
In [19]:
autos.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49998 entries, 0 to 49999
Data columns (total 19 columns):
date_crawled          49998 non-null object
name                  49998 non-null object
seller                49998 non-null object
offer_type            49998 non-null object
price                 49998 non-null object
abtest                49998 non-null object
vehicle_type          44903 non-null object
registration_year     49998 non-null int64
gearbox               47319 non-null object
power_ps              49998 non-null int64
model                 47240 non-null object
odometer              49998 non-null object
registration_month    49998 non-null int64
fuel_type             45517 non-null object
brand                 49998 non-null object
unrepaired_damage     40171 non-null object
ad_created            49998 non-null object
postal_code           49998 non-null int64
last_seen             49998 non-null object
dtypes: int64(4), object(15)
memory usage: 7.6+ MB

We build a string_to_integer function to convert columns whose numeric data is stored as strings to the integer datatype. It also removes the unit and the thousands separator (,) from the values.

In [20]:
# `column` is the name of the column; `unit` is the non-digit character to be removed from the value
def string_to_integer(column, unit):
    autos[column] = autos[column].str.replace(unit, '', regex=False).str.replace(',', '', regex=False).astype(int)
    return autos[column]
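
The same cleaning logic can be tried on single strings with plain Python. The helper parse_int below is a hypothetical name used only for this sketch. The second half shows why it matters whether the unit is treated as a literal or as a regular expression: as a pattern, $ is an end-of-string anchor rather than a dollar sign. Whether Series.str.replace interprets its pattern as a regular expression depends on its regex argument, so stating it explicitly is the safer choice.

```python
import re

def parse_int(text, unit):
    # Plain string replacement: `unit` is always treated literally
    return int(text.replace(unit, "").replace(",", ""))

price = parse_int("$5,000", "$")
odometer = parse_int("150,000km", "km")
print(price, odometer)

# As a regular expression, "$" matches the end of the string,
# so this substitution leaves the dollar sign in place
as_regex = re.sub("$", "", "$5,000")

# Escaping the pattern makes it match the literal character
escaped = re.sub(re.escape("$"), "", "$5,000")
print(as_regex, escaped)
```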

Next, we use string_to_integer to convert price and odometer to integer and verify the conversion by printing their datatypes.

In [21]:
# Convert the `price` from string object to integer
string_to_integer('price', '$')

# Convert the `odometer` from string object to integer
string_to_integer('odometer', 'km')

# Check their dtypes after conversion
# Alternatively, we can also use `autos.info()`
print('The dtype for:')
print('    - price:      ', autos['price'].dtype)
print('    - odometer:   ', autos['odometer'].dtype)
The dtype for:
    - price:       int64
    - odometer:    int64

We also update the column names by including their corresponding units — an important piece of information to keep.

In [22]:
# Rename the columns for `price` and `odometer`
autos.rename({'price': 'price_$', 'odometer' : 'odometer_km'}, axis=1, inplace=True)

Next, we confirm that all the changes have been applied by examining the autos dataframe.

In [23]:
# Display the data
autos
Out[23]:
date_crawled name seller offer_type price_$ abtest vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot 5000 control bus 2004 manuell 158 andere 150000 3 lpg peugeot nein 2016-03-26 00:00:00 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot 8500 control limousine 1997 automatik 286 7er 150000 6 benzin bmw nein 2016-04-04 00:00:00 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot 8990 test limousine 2009 manuell 102 golf 70000 7 benzin volkswagen nein 2016-03-26 00:00:00 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot 4350 control kleinwagen 2007 automatik 71 fortwo 70000 6 benzin smart nein 2016-03-12 00:00:00 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot 1350 test kombi 2003 manuell 0 focus 150000 7 benzin ford nein 2016-04-01 00:00:00 39218 2016-04-01 14:38:50
5 2016-03-21 13:47:45 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... privat Angebot 7900 test bus 2006 automatik 150 voyager 150000 4 diesel chrysler NaN 2016-03-21 00:00:00 22962 2016-04-06 09:45:21
6 2016-03-20 17:55:21 VW_Golf_III_GT_Special_Electronic_Green_Metall... privat Angebot 300 test limousine 1995 manuell 90 golf 150000 8 benzin volkswagen NaN 2016-03-20 00:00:00 31535 2016-03-23 02:48:59
7 2016-03-16 18:55:19 Golf_IV_1.9_TDI_90PS privat Angebot 1990 control limousine 1998 manuell 90 golf 150000 12 diesel volkswagen nein 2016-03-16 00:00:00 53474 2016-04-07 03:17:32
8 2016-03-22 16:51:34 Seat_Arosa privat Angebot 250 test NaN 2000 manuell 0 arosa 150000 10 NaN seat nein 2016-03-22 00:00:00 7426 2016-03-26 18:18:10
9 2016-03-16 13:47:02 Renault_Megane_Scenic_1.6e_RT_Klimaanlage privat Angebot 590 control bus 1997 manuell 90 megane 150000 7 benzin renault nein 2016-03-16 00:00:00 15749 2016-04-06 10:46:35
10 2016-03-15 01:41:36 VW_Golf_Tuning_in_siber/grau privat Angebot 999 test NaN 2017 manuell 90 NaN 150000 4 benzin volkswagen nein 2016-03-14 00:00:00 86157 2016-04-07 03:16:21
11 2016-03-16 18:45:34 Mercedes_A140_Motorschaden privat Angebot 350 control NaN 2000 NaN 0 NaN 150000 0 benzin mercedes_benz NaN 2016-03-16 00:00:00 17498 2016-03-16 18:45:34
12 2016-03-31 19:48:22 Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... privat Angebot 5299 control kleinwagen 2010 automatik 71 fortwo 50000 9 benzin smart nein 2016-03-31 00:00:00 34590 2016-04-06 14:17:52
13 2016-03-23 10:48:32 Audi_A3_1.6_tuning privat Angebot 1350 control limousine 1999 manuell 101 a3 150000 11 benzin audi nein 2016-03-23 00:00:00 12043 2016-04-01 14:17:13
14 2016-03-23 11:50:46 Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... privat Angebot 3999 test kleinwagen 2007 manuell 75 clio 150000 9 benzin renault NaN 2016-03-23 00:00:00 81737 2016-04-01 15:46:47
15 2016-04-01 12:06:20 Corvette_C3_Coupe_T_Top_Crossfire_Injection privat Angebot 18900 test coupe 1982 automatik 203 NaN 80000 6 benzin sonstige_autos nein 2016-04-01 00:00:00 61276 2016-04-02 21:10:48
16 2016-03-16 14:59:02 Opel_Vectra_B_Kombi privat Angebot 350 test kombi 1999 manuell 101 vectra 150000 5 benzin opel nein 2016-03-16 00:00:00 57299 2016-03-18 05:29:37
17 2016-03-29 11:46:22 Volkswagen_Scirocco_2_G60 privat Angebot 5500 test coupe 1990 manuell 205 scirocco 150000 6 benzin volkswagen nein 2016-03-29 00:00:00 74821 2016-04-05 20:46:26
18 2016-03-26 19:57:44 Verkaufen_mein_bmw_e36_320_i_touring privat Angebot 300 control bus 1995 manuell 150 3er 150000 0 benzin bmw NaN 2016-03-26 00:00:00 54329 2016-04-02 12:16:41
19 2016-03-17 13:36:21 mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 privat Angebot 4150 control suv 2004 manuell 124 andere 150000 2 lpg mazda nein 2016-03-17 00:00:00 40878 2016-03-17 14:45:58
20 2016-03-05 19:57:31 Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... privat Angebot 3500 test kombi 2003 manuell 131 a4 150000 5 diesel audi NaN 2016-03-05 00:00:00 53913 2016-03-07 05:46:46
21 2016-03-06 19:07:10 Porsche_911_Carrera_4S_Cabrio privat Angebot 41500 test cabrio 2004 manuell 320 911 150000 4 benzin porsche nein 2016-03-06 00:00:00 65428 2016-04-05 23:46:19
22 2016-03-28 20:50:54 MINI_Cooper_S_Cabrio privat Angebot 25450 control cabrio 2015 manuell 184 cooper 10000 1 benzin mini nein 2016-03-28 00:00:00 44789 2016-04-01 06:45:30
23 2016-03-10 19:55:34 Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima privat Angebot 7999 control bus 2010 manuell 120 NaN 150000 2 diesel peugeot nein 2016-03-10 00:00:00 30900 2016-03-17 08:45:17
24 2016-04-03 11:57:02 BMW_535i_xDrive_Sport_Aut. privat Angebot 48500 control limousine 2014 automatik 306 5er 30000 12 benzin bmw nein 2016-04-03 00:00:00 22547 2016-04-07 13:16:50
25 2016-03-21 21:56:18 Ford_escort_kombi_an_bastler_mit_ghia_ausstattung privat Angebot 90 control kombi 1996 manuell 116 NaN 150000 4 benzin ford ja 2016-03-21 00:00:00 27574 2016-04-01 05:16:49
26 2016-04-03 22:46:28 Volkswagen_Polo_Fox privat Angebot 777 control kleinwagen 1992 manuell 54 polo 125000 2 benzin volkswagen nein 2016-04-03 00:00:00 38110 2016-04-05 23:46:48
27 2016-03-27 18:45:01 Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE privat Angebot 0 control NaN 2005 NaN 0 NaN 150000 0 NaN ford NaN 2016-03-27 00:00:00 66701 2016-03-27 18:45:01
28 2016-03-19 21:56:19 MINI_Cooper_D privat Angebot 5250 control kleinwagen 2007 manuell 110 cooper 150000 7 diesel mini ja 2016-03-19 00:00:00 15745 2016-04-07 14:58:48
29 2016-04-02 12:45:44 Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... privat Angebot 4999 test kombi 2004 automatik 204 e_klasse 150000 10 diesel mercedes_benz nein 2016-04-02 00:00:00 47638 2016-04-02 12:45:44
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49970 2016-03-21 22:47:37 c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... privat Angebot 15800 control bus 2010 automatik 136 c4 60000 4 diesel citroen nein 2016-03-21 00:00:00 14947 2016-04-07 04:17:34
49971 2016-03-29 14:54:12 W.Lupo_1.0 privat Angebot 950 test kleinwagen 2001 manuell 50 lupo 150000 4 benzin volkswagen nein 2016-03-29 00:00:00 65197 2016-03-29 20:41:51
49972 2016-03-26 22:25:23 Mercedes_Benz_Vito_115_CDI_Extralang_Aut. privat Angebot 3300 control bus 2004 automatik 150 vito 150000 10 diesel mercedes_benz ja 2016-03-26 00:00:00 65326 2016-03-28 11:28:18
49973 2016-03-27 05:32:39 Mercedes_Benz_SLK_200_Kompressor privat Angebot 6000 control cabrio 2004 manuell 163 slk 150000 11 benzin mercedes_benz nein 2016-03-27 00:00:00 53567 2016-03-27 08:25:24
49974 2016-03-20 10:52:31 Golf_1_Cabrio_Tuev_Neu_viele_Extras_alles_eing... privat Angebot 0 control cabrio 1983 manuell 70 golf 150000 2 benzin volkswagen nein 2016-03-20 00:00:00 8209 2016-03-27 19:48:16
49975 2016-03-27 20:51:39 Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort privat Angebot 9700 control kleinwagen 2012 automatik 88 jazz 100000 11 hybrid honda nein 2016-03-27 00:00:00 84385 2016-04-05 19:45:34
49976 2016-03-19 18:56:05 Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... privat Angebot 5900 test kombi 1992 automatik 150 80 150000 12 benzin audi nein 2016-03-19 00:00:00 36100 2016-04-07 06:16:44
49977 2016-03-31 18:37:18 Mercedes_Benz_C200_Cdi_W203 privat Angebot 5500 control limousine 2003 manuell 116 c_klasse 150000 2 diesel mercedes_benz nein 2016-03-31 00:00:00 33739 2016-04-06 12:16:11
49978 2016-04-04 10:37:14 Mercedes_Benz_E_200_Classic privat Angebot 900 control limousine 1996 automatik 136 e_klasse 150000 9 benzin mercedes_benz ja 2016-04-04 00:00:00 24405 2016-04-06 12:44:20
49979 2016-03-20 18:38:40 Volkswagen_Polo_1.6_TDI_Style privat Angebot 11000 test kleinwagen 2011 manuell 90 polo 70000 11 diesel volkswagen nein 2016-03-20 00:00:00 48455 2016-04-07 01:45:12
49980 2016-03-12 10:55:54 Ford_Escort_Turnier_16V privat Angebot 400 control kombi 1995 manuell 105 escort 125000 3 benzin ford NaN 2016-03-12 00:00:00 56218 2016-04-06 17:16:49
49981 2016-03-15 09:38:21 Opel_Astra_Kombi_mit_Anhaengerkupplung privat Angebot 2000 control kombi 1998 manuell 115 astra 150000 12 benzin opel nein 2016-03-15 00:00:00 86859 2016-04-05 17:21:46
49982 2016-03-29 18:51:08 Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm privat Angebot 1950 control kleinwagen 2004 manuell 0 fabia 90000 7 benzin skoda NaN 2016-03-29 00:00:00 45884 2016-03-29 18:51:08
49983 2016-03-06 12:43:04 Ford_focus_99 privat Angebot 600 test kleinwagen 1999 manuell 101 focus 150000 4 benzin ford NaN 2016-03-06 00:00:00 52477 2016-03-09 06:16:08
49984 2016-03-31 22:48:48 Student_sucht_ein__Anfaengerauto___ab_2000_BJ_... privat Angebot 0 test NaN 2000 NaN 0 NaN 150000 0 NaN sonstige_autos NaN 2016-03-31 00:00:00 12103 2016-04-02 19:44:53
49985 2016-04-02 16:38:23 Verkaufe_meinen_vw_vento! privat Angebot 1000 control NaN 1995 automatik 0 NaN 150000 0 benzin volkswagen NaN 2016-04-02 00:00:00 30900 2016-04-06 15:17:52
49986 2016-04-04 20:46:02 Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... privat Angebot 15900 control limousine 2010 automatik 218 300c 125000 11 diesel chrysler nein 2016-04-04 00:00:00 73527 2016-04-06 23:16:00
49987 2016-03-22 20:47:27 Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... privat Angebot 21990 control limousine 2013 manuell 150 a3 50000 11 diesel audi nein 2016-03-22 00:00:00 94362 2016-03-26 22:46:06
49988 2016-03-28 19:49:51 BMW_330_Ci privat Angebot 9550 control coupe 2001 manuell 231 3er 150000 10 benzin bmw nein 2016-03-28 00:00:00 83646 2016-04-07 02:17:40
49989 2016-03-11 19:50:37 VW_Polo_zum_Ausschlachten_oder_Wiederaufbau privat Angebot 150 test kleinwagen 1997 manuell 0 polo 150000 5 benzin volkswagen ja 2016-03-11 00:00:00 21244 2016-03-12 10:17:55
49990 2016-03-21 19:54:19 Mercedes_Benz_A_200__BlueEFFICIENCY__Urban privat Angebot 17500 test limousine 2012 manuell 156 a_klasse 30000 12 benzin mercedes_benz nein 2016-03-21 00:00:00 58239 2016-04-06 22:46:57
49991 2016-03-06 15:25:19 Kleinwagen privat Angebot 500 control NaN 2016 manuell 0 twingo 150000 0 benzin renault NaN 2016-03-06 00:00:00 61350 2016-03-06 18:24:19
49992 2016-03-10 19:37:38 Fiat_Grande_Punto_1.4_T_Jet_16V_Sport privat Angebot 4800 control kleinwagen 2009 manuell 120 andere 125000 9 lpg fiat nein 2016-03-10 00:00:00 68642 2016-03-13 01:44:51
49993 2016-03-15 18:47:35 Audi_A3__1_8l__Silber;_schoenes_Fahrzeug privat Angebot 1650 control kleinwagen 1997 manuell 0 NaN 150000 7 benzin audi NaN 2016-03-15 00:00:00 65203 2016-04-06 19:46:53
49994 2016-03-22 17:36:42 Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... privat Angebot 5000 control kombi 2001 automatik 299 a6 150000 1 benzin audi nein 2016-03-22 00:00:00 46537 2016-04-06 08:16:39
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon privat Angebot 24900 control limousine 2011 automatik 239 q5 100000 1 diesel audi nein 2016-03-27 00:00:00 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... privat Angebot 1980 control cabrio 1996 manuell 75 astra 150000 5 benzin opel nein 2016-03-28 00:00:00 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge privat Angebot 13200 test cabrio 2014 automatik 69 500 5000 11 benzin fiat nein 2016-04-02 00:00:00 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition privat Angebot 22900 control kombi 2013 manuell 150 a3 40000 11 diesel audi nein 2016-03-08 00:00:00 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V privat Angebot 1250 control limousine 1996 manuell 101 vectra 150000 1 benzin opel nein 2016-03-13 00:00:00 45897 2016-04-06 21:18:48

49998 rows × 19 columns

Exploring the Price Column

After the initial data cleaning, we explore the data in more detail, looking specifically for values that are illogical. We start by analyzing the price_$ column.

In [24]:
# Display the descriptive statistics
autos['price_$'].describe()
Out[24]:
count    4.999800e+04
mean     9.840435e+03
std      4.811140e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price_$, dtype: float64

The statistics above are displayed in scientific notation, which is hard to read for prices. Therefore, we change the display format to floats with one decimal place and a thousands separator (,) by setting the pd.options.display.float_format option.

In [25]:
# Change the format to show floats with 1 decimal place and include a thousand separator
pd.options.display.float_format = '{:,.1f}'.format

# Use the following to reset the format
# pd.reset_option('^display.', silent=True)
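The effect of this format string can also be checked without changing the global setting. The following is a minimal sketch that reuses three values from the output above; the variable names are assumptions made for this illustration, and pd.option_context is used here only to limit the format change to a single block.

```python
import pandas as pd

# The format string on its own: one decimal place, comma as thousands separator
fmt = '{:,.1f}'.format
print(fmt(99999999.0))   # 99,999,999.0
print(fmt(9840.435))     # 9,840.4

# Three statistics taken from the describe() output (illustrative subset)
stats = pd.Series([49998.0, 9840.435, 99999999.0], index=['count', 'mean', 'max'])

# Apply the display format only inside this block; the previous
# setting is restored automatically when the block ends
with pd.option_context('display.float_format', fmt):
    formatted = stats.to_string()

print(formatted)
```

Scoping the option this way may be convenient when only one output needs the friendlier format and the rest of the notebook should keep the default.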

After adjusting the format, we reprint the descriptive statistics:

In [26]:
# Display the descriptive statistics with updated format
autos['price_$'].describe()
Out[26]:
count       49,998.0
mean         9,840.4
std        481,114.0
min              0.0
25%          1,100.0
50%          2,950.0
75%          7,200.0
max     99,999,999.0
Name: price_$, dtype: float64
In [27]:
print('The number of unique values:   ', autos['price_$'].unique().shape[0])
The number of unique values:    2357

Outliers in the Minimum Value

A minimum price of 0 is unreasonable for a listed car, as it would mean the car is given away for free.

Surprisingly, the value 0 occurs very often (1,420 entries), so these do not look like isolated entry errors.

In [28]:
# Examine the value counts of `price_$`, sorted by price in ascending order
autos['price_$'].value_counts().sort_index().head(10)
Out[28]:
0     1420
1      156
2        3
3        1
5        2
8        1
9        1
10       7
11       2
12       3
Name: price_$, dtype: int64
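As a small illustration of how the ordering works, the sketch below uses made-up prices (not taken from autos.csv). value_counts() orders by frequency, while sort_index() reorders the result by the price itself, ascending by default.

```python
import pandas as pd

# Toy prices for illustration only
prices = pd.Series([0, 0, 0, 1, 1, 12, 5000, 5000, 99999999])

counts = prices.value_counts()                 # ordered by frequency
lowest = counts.sort_index()                   # ordered by price, ascending
highest = counts.sort_index(ascending=False)   # ordered by price, descending

print(lowest.head(3))
print(highest.head(3))
```

The ascending view is useful for inspecting the cheapest listings, and the descending view for the most expensive ones.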

We extract these entries and assign them to a variable named min_price for further investigation.

In [29]:
min_price = autos[autos['price_$'] == 0]

min_price
Out[29]:
date_crawled name seller offer_type price_$ abtest vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created postal_code last_seen
27 2016-03-27 18:45:01 Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE privat Angebot 0 control NaN 2005 NaN 0 NaN 150000 0 NaN ford NaN 2016-03-27 00:00:00 66701 2016-03-27 18:45:01
71 2016-03-28 19:39:35 Suche_Opel_Astra_F__Corsa_oder_Kadett_E_mit_Re... privat Angebot 0 control NaN 1990 manuell 0 NaN 5000 0 benzin opel NaN 2016-03-28 00:00:00 4552 2016-04-07 01:45:48
80 2016-03-09 15:57:57 Nissan_Primera_Hatchback_1_6_16v_73_Kw___99Ps_... privat Angebot 0 control coupe 1999 manuell 99 primera 150000 3 benzin nissan ja 2016-03-09 00:00:00 66903 2016-03-09 16:43:50
87 2016-03-29 23:37:22 Bmw_520_e39_zum_ausschlachten privat Angebot 0 control NaN 2000 NaN 0 5er 150000 0 NaN bmw NaN 2016-03-29 00:00:00 82256 2016-04-06 21:18:15
99 2016-04-05 09:48:54 Peugeot_207_CC___Cabrio_Bj_2011 privat Angebot 0 control cabrio 2011 manuell 0 2_reihe 60000 7 diesel peugeot nein 2016-04-05 00:00:00 99735 2016-04-07 12:17:34
118 2016-03-12 05:03:00 VW_Sharan_V6_204_PS_Karosse_Rohkarosse_mit_Pap... privat Angebot 0 control bus 2001 manuell 204 sharan 150000 7 benzin volkswagen ja 2016-03-12 00:00:00 15370 2016-03-12 21:44:23
146 2016-03-22 23:59:28 Ford_Fiesta_rot privat Angebot 0 test kleinwagen 1996 manuell 75 fiesta 20000 8 benzin ford NaN 2016-03-22 00:00:00 63069 2016-04-01 20:16:38
167 2016-04-02 19:43:45 Suche_VW_Multivan_Innenausstattung_Set_oder_TE... privat Angebot 0 control NaN 2011 NaN 0 transporter 5000 0 NaN volkswagen NaN 2016-04-02 00:00:00 64739 2016-04-06 19:45:08
180 2016-03-19 10:50:25 Zu_verkaufen privat Angebot 0 test NaN 2016 manuell 98 3_reihe 150000 12 benzin mazda ja 2016-03-19 00:00:00 30966 2016-03-24 03:17:21
226 2016-03-25 23:52:12 Porsche_911_S_Targa__67er_SWB privat Angebot 0 control cabrio 1967 manuell 160 911 5000 12 benzin porsche nein 2016-03-25 00:00:00 44575 2016-04-05 14:46:39
234 2016-03-05 16:56:22 Fiat_Punto_Sport privat Angebot 0 control NaN 2016 NaN 0 punto 150000 0 NaN fiat NaN 2016-03-05 00:00:00 44789 2016-03-17 22:47:09
248 2016-03-08 22:57:53 VW_Passat_zu_verkaufen privat Angebot 0 test kombi 1998 manuell 0 passat 150000 10 benzin volkswagen ja 2016-03-08 00:00:00 34369 2016-03-29 16:16:18
259 2016-04-03 23:49:58 guenstiges_Auto_/_auch_defekt privat Angebot 0 control NaN 2000 NaN 0 NaN 5000 6 NaN sonstige_autos NaN 2016-04-03 00:00:00 89269 2016-04-06 07:16:22
301 2016-03-08 20:37:59 Kaufe_alle_Autos_bietet_an privat Angebot 0 control NaN 1990 NaN 0 NaN 5000 0 NaN sonstige_autos NaN 2016-03-08 00:00:00 13589 2016-04-05 18:44:43
323 2016-03-15 16:54:59 W220_Unfallfahrzeug privat Angebot 0 control limousine 2001 automatik 224 s_klasse 150000 8 NaN mercedes_benz NaN 2016-03-15 00:00:00 45475 2016-04-06 12:44:20
327 2016-04-03 13:49:21 BMW_Cabrio_318i_/_BMW_318_tds_Touring privat Angebot 0 control NaN 2016 manuell 90 3er 150000 2 diesel bmw NaN 2016-04-03 00:00:00 39112 2016-04-05 12:46:23
347 2016-04-02 11:48:02 Volvo_TP21__Sugga privat Angebot 0 test suv 1956 NaN 0 andere 100000 0 NaN volvo NaN 2016-04-02 00:00:00 34414 2016-04-04 09:15:24
366 2016-03-21 20:36:25 Ford_Escort_Express_75__LKW_Zulussung___Diesel... privat Angebot 0 control NaN 2000 NaN 0 andere 90000 0 diesel ford NaN 2016-03-21 00:00:00 3130 2016-03-28 19:16:45
389 2016-03-08 11:38:52 Vw_polo_classic privat Angebot 0 control limousine 1999 manuell 75 NaN 150000 0 NaN volkswagen NaN 2016-03-08 00:00:00 14476 2016-03-14 09:16:30
418 2016-03-29 14:43:24 Fiat_SCUDO_8_Sitzer_Bus__Diesel_JTD__80_KW privat Angebot 0 test bus 2003 manuell 80 andere 5000 5 diesel fiat nein 2016-03-29 00:00:00 35315 2016-04-05 23:47:14
427 2016-03-12 11:44:47 TOP_ZUSTAND_GrunePlakette_Euro4!!! privat Angebot 0 test bus 2005 manuell 101 vivaro 80000 8 diesel opel nein 2016-03-12 00:00:00 93173 2016-03-12 11:44:47
430 2016-03-18 23:52:40 Winterraeder_FORD privat Angebot 0 test NaN 2007 NaN 0 focus 5000 0 NaN ford NaN 2016-03-18 00:00:00 40549 2016-03-19 06:47:06
449 2016-04-03 20:59:59 Tausche_gegen_ein_smart. privat Angebot 0 test limousine 2001 manuell 120 golf 150000 8 benzin volkswagen nein 2016-04-03 00:00:00 60439 2016-04-04 10:31:06
487 2016-03-14 21:41:36 MAG_MICH_KEINER_HABEN_Opel_Omega_B_Caravan_2.0... privat Angebot 0 test kombi 1998 manuell 136 omega 150000 8 benzin opel ja 2016-03-14 00:00:00 59320 2016-03-17 11:48:08
502 2016-03-25 23:55:24 C32_AMG_C55_C43_w203 privat Angebot 0 control kombi 2001 NaN 354 c_klasse 150000 0 benzin mercedes_benz NaN 2016-03-25 00:00:00 79112 2016-03-26 08:40:35
519 2016-03-28 08:53:58 Verkaufe/Tausche_Seat_Inca__VW_Caddy__aus_2002... privat Angebot 0 test andere 2002 manuell 64 andere 150000 2 diesel seat nein 2016-03-28 00:00:00 12627 2016-03-28 10:41:01
565 2016-03-20 20:49:36 Tausche_Audi_S3_8p_gegen_Golf_VI_Gti_mit_xenon privat Angebot 0 test NaN 2016 manuell 330 NaN 150000 9 NaN audi nein 2016-03-20 00:00:00 97488 2016-03-22 00:44:53
593 2016-04-01 21:44:15 CLK_200__BRABUS_Optik privat Angebot 0 test coupe 1998 manuell 136 clk 150000 1 benzin mercedes_benz NaN 2016-04-01 00:00:00 49191 2016-04-05 19:16:21
660 2016-03-29 21:44:40 Opel_Corsa_B privat Angebot 0 test kleinwagen 1998 manuell 54 corsa 125000 11 benzin opel ja 2016-03-29 00:00:00 24214 2016-04-06 07:46:43
661 2016-03-18 22:49:48 Renault_Megane_Scenic_1.6_Tuev_12/17 privat Angebot 0 test bus 1999 manuell 0 megane 150000 8 benzin renault NaN 2016-03-18 00:00:00 66346 2016-03-29 16:47:09
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
48882 2016-03-14 23:58:47 SUCHE_AUTO privat Angebot 0 control NaN 2000 NaN 0 NaN 150000 0 NaN sonstige_autos NaN 2016-03-14 00:00:00 97753 2016-03-30 23:47:43
48889 2016-04-01 10:50:52 Audi_A4_____TueV_09.2016 privat Angebot 0 control limousine 1995 manuell 101 a4 150000 2 benzin audi nein 2016-04-01 00:00:00 86869 2016-04-07 07:46:02
48891 2016-03-20 22:50:14 MB_E_220_CDI__TAUSCHEN_nur_SUV/GELÄNDEWAGEN privat Angebot 0 test kombi 2001 automatik 143 e_klasse 150000 12 diesel mercedes_benz ja 2016-03-20 00:00:00 38442 2016-03-24 18:16:00
48982 2016-03-08 12:48:33 Bmw_525D_Sport_Pro..Navi privat Angebot 0 test limousine 2010 automatik 204 5er 150000 10 diesel bmw NaN 2016-03-08 00:00:00 47805 2016-03-11 16:15:39
48985 2016-03-27 08:55:25 Stockcar_Autos_Peugeot privat Angebot 0 test NaN 2007 NaN 60 NaN 5000 0 NaN peugeot NaN 2016-03-27 00:00:00 17153 2016-04-07 05:17:18
49043 2016-03-07 11:48:55 Opel_Vectra_B_Caravan privat Angebot 0 test kombi 1997 manuell 101 NaN 150000 0 benzin opel NaN 2016-03-07 00:00:00 48429 2016-03-17 21:20:07
49082 2016-03-13 17:06:58 Passat_35i_b4_in_candy_weiss_original_lack privat Angebot 0 test limousine 1996 manuell 90 NaN 150000 6 NaN volkswagen nein 2016-03-13 00:00:00 35799 2016-04-04 06:45:30
49254 2016-03-21 10:49:57 Corrado_G60_mit_2e_Motor privat Angebot 0 test NaN 1995 manuell 0 NaN 150000 0 benzin volkswagen NaN 2016-03-21 00:00:00 53639 2016-03-22 15:18:35
49301 2016-03-20 20:52:40 Golf_3_Variant_Kombi_1.8 privat Angebot 0 test NaN 2000 NaN 0 NaN 150000 0 NaN volkswagen NaN 2016-03-20 00:00:00 4924 2016-03-30 15:17:15
49342 2016-03-09 20:47:01 Audi_80_b4_S_Color_und_BMW_318ti_compact_Tausc... privat Angebot 0 test NaN 2017 manuell 90 80 150000 10 benzin audi ja 2016-03-09 00:00:00 25853 2016-03-11 17:45:36
49350 2016-03-19 23:57:15 Golf_II_Diesel privat Angebot 0 control kleinwagen 1991 manuell 54 golf 150000 3 diesel volkswagen nein 2016-03-19 00:00:00 29664 2016-04-06 03:15:23
49404 2016-04-04 03:03:16 Golf_2_BBS_LEDER_KW privat Angebot 0 control limousine 1989 manuell 90 NaN 150000 3 NaN volkswagen nein 2016-04-04 00:00:00 52062 2016-04-06 08:45:25
49408 2016-03-30 20:57:06 OPEL_ASTRA privat Angebot 0 control NaN 1995 manuell 60 astra 150000 0 benzin opel ja 2016-03-30 00:00:00 27711 2016-04-05 11:46:06
49479 2016-04-01 23:55:01 Tausch_mit_frischen_TÜV privat Angebot 0 control limousine 1995 manuell 101 a4 150000 5 benzin audi nein 2016-04-01 00:00:00 27753 2016-04-06 05:16:07
49496 2016-03-26 13:55:28 Bmw_e39_520 privat Angebot 0 control limousine 1998 manuell 0 NaN 5000 0 NaN bmw NaN 2016-03-26 00:00:00 26188 2016-03-26 13:55:28
49504 2016-03-13 19:42:57 Meriva_Lpg_1_6_16v privat Angebot 0 control bus 2004 manuell 101 meriva 150000 10 lpg opel ja 2016-03-13 00:00:00 27432 2016-03-19 13:16:57
49507 2016-03-16 17:38:32 BMW_528i_E39_Schaltgetriebe_Unfall privat Angebot 0 control limousine 1998 manuell 193 5er 150000 0 NaN bmw ja 2016-03-16 00:00:00 25541 2016-03-30 08:47:38
49525 2016-03-09 17:59:02 Bmw_e36_318_i_Limo_Ringtool_PS_Tuning privat Angebot 0 test NaN 2000 NaN 160 3er 150000 0 NaN bmw NaN 2016-03-09 00:00:00 66822 2016-03-15 04:45:26
49538 2016-03-14 10:55:27 Golf_3_mit_Tuev_.....PREISWERT!!!! privat Angebot 0 control limousine 1998 manuell 60 golf 150000 0 benzin volkswagen nein 2016-03-14 00:00:00 15345 2016-03-14 10:55:27
49551 2016-03-16 23:40:18 Golf_IV_an_Bastler_zuverkaufen__reserviert_ privat Angebot 0 test NaN 2016 manuell 0 NaN 150000 6 diesel volkswagen nein 2016-03-16 00:00:00 4275 2016-03-16 23:40:18
49614 2016-03-09 17:49:39 Verkauf_meine__Audi__A3_.Gegen_tauschen_mit_Di... privat Angebot 0 test NaN 2016 manuell 101 a3 150000 0 benzin audi nein 2016-03-09 00:00:00 47051 2016-03-10 14:15:53
49739 2016-03-17 15:47:34 Reparaturfaelligen_twingo privat Angebot 0 test NaN 2016 manuell 58 twingo 150000 0 benzin renault nein 2016-03-17 00:00:00 27412 2016-03-17 15:47:34
49755 2016-04-02 13:46:23 Mercedes_Benz_w202_unfaller privat Angebot 0 test limousine 1998 manuell 136 c_klasse 150000 7 benzin mercedes_benz NaN 2016-04-02 00:00:00 99441 2016-04-02 13:46:23
49793 2016-03-05 14:58:49 verkaufe_BMW_E_39 privat Angebot 0 control kombi 2000 automatik 183 5er 150000 1 benzin bmw NaN 2016-03-05 00:00:00 13125 2016-03-05 18:47:14
49880 2016-03-30 08:52:57 E39_528i_an_Bastler privat Angebot 0 control NaN 2017 manuell 193 5er 150000 4 NaN bmw ja 2016-03-30 00:00:00 65468 2016-04-07 01:15:27
49884 2016-03-11 13:55:30 Audi_a6_2.5l__Schnaeppchen_nur_heute privat Angebot 0 test kombi 1999 manuell 150 a6 150000 11 diesel audi NaN 2016-03-11 00:00:00 27711 2016-03-12 03:17:08
49943 2016-03-16 20:46:08 Opel_astra privat Angebot 0 control NaN 2016 manuell 101 astra 150000 8 benzin opel NaN 2016-03-16 00:00:00 89134 2016-03-17 19:44:20
49960 2016-03-25 22:51:55 Ford_KA_zu_verschenken_***Reserviert*** privat Angebot 0 control kleinwagen 1999 manuell 60 ka 150000 6 benzin ford NaN 2016-03-25 00:00:00 34355 2016-03-25 22:51:55
49974 2016-03-20 10:52:31 Golf_1_Cabrio_Tuev_Neu_viele_Extras_alles_eing... privat Angebot 0 control cabrio 1983 manuell 70 golf 150000 2 benzin volkswagen nein 2016-03-20 00:00:00 8209 2016-03-27 19:48:16
49984 2016-03-31 22:48:48 Student_sucht_ein__Anfaengerauto___ab_2000_BJ_... privat Angebot 0 test NaN 2000 NaN 0 NaN 150000 0 NaN sonstige_autos NaN 2016-03-31 00:00:00 12103 2016-04-02 19:44:53

1420 rows × 19 columns

When we look at the value counts of registration_year for the cars in min_price, we find that some of the cars are very new; for example, 96 of them were registered in 2016. An asking price of 0 is therefore unreasonable for them.

Interestingly, there are also 60 cars registered in 2017, which is illogical, as the data was crawled in 2016. We therefore need to delete the entries with a registration year after 2016.

In [30]:
min_price['registration_year'].value_counts().head(10)
Out[30]:
2000    198
1999    101
2016     96
1998     90
1995     85
2005     78
1997     77
1996     71
2001     66
2017     60
Name: registration_year, dtype: int64
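A filter for the registration-year issue noted above could look like the following sketch. The toy DataFrame and the CRAWL_YEAR constant are assumptions made for this illustration; only the upper bound is handled here, and a sensible lower bound would need a separate decision.

```python
import pandas as pd

CRAWL_YEAR = 2016  # the ads were crawled in 2016

# Toy listings for illustration only
toy = pd.DataFrame({
    'name': ['car_a', 'car_b', 'car_c', 'car_d'],
    'registration_year': [1999, 2016, 2017, 2018],
})

# Keep only cars that could already have been registered when the ad was crawled
plausible = toy[toy['registration_year'] <= CRAWL_YEAR]

print(plausible)
```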

To investigate whether those 'free' cars belong to cheaper or unknown brands, we analyze their brands.

In [31]:
min_price['brand'].value_counts().head(10)
Out[31]:
volkswagen        347
opel              183
bmw               154
audi              115
ford               95
mercedes_benz      81
renault            79
sonstige_autos     72
fiat               45
peugeot            26
Name: brand, dtype: int64

Given that the top 10 brands of the 'free' cars include several luxury brands (e.g. BMW, Audi and Mercedes-Benz) as well as other well-known brands (e.g. Volkswagen, Ford and Renault), it is odd that their asking price is 0.

eBay is an auction site, which means that, in theory, a seller could set the lowest asking price at 0. However, based on the points just mentioned, 0 is still an unrealistically low asking price, especially for new (or not too old) cars from well-known brands. Therefore, we will remove these entries from our dataset.
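To gauge how much data this removal discards, the share of zero-price listings can be worked out from the counts reported above. This is a quick arithmetic sketch; the variable names are chosen for this illustration.

```python
zero_price_rows = 1420   # listings with a price of 0 (from the value counts)
total_rows = 49998       # total listings (from describe())

share = zero_price_rows / total_rows
print(f"{share:.2%}")    # 2.84%
```

Removing these rows therefore costs less than 3% of the dataset.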

Outliers in the Maximum Value

In [32]:
autos['price_$'].describe()
Out[32]:
count       49,998.0
mean         9,840.4
std        481,114.0
min              0.0
25%          1,100.0
50%          2,950.0
75%          7,200.0
max     99,999,999.0
Name: price_$, dtype: float64

Based on the descriptive statistics for price_$, the maximum value is 99,999,999, which far exceeds the third quartile (Q3, the 75th percentile) of 7,200. The listing behind it is a car registered in 1999, so an asking price this high is illogical. This entry is clearly an outlier that needs to be removed.

In [33]:
autos[autos['price_$'] == 99999999]
Out[33]:
date_crawled name seller offer_type price_$ abtest vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created postal_code last_seen
39705 2016-03-22 14:58:27 Tausch_gegen_gleichwertiges privat Angebot 99999999 control limousine 1999 automatik 224 s_klasse 150000 9 benzin mercedes_benz NaN 2016-03-22 00:00:00 73525 2016-04-06 05:15:30
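To put the distance between the maximum and Q3 into numbers, one common rule of thumb is Tukey's fence at Q3 + 1.5 × IQR. This is not the cut-off used in this project; the sketch below only applies that rule to the quartiles reported by describe() to show how far the maximum lies beyond it.

```python
q1 = 1100.0           # 25% quantile from describe()
q3 = 7200.0           # 75% quantile from describe()
max_price_value = 99999999.0

iqr = q3 - q1                      # interquartile range
upper_fence = q3 + 1.5 * iqr       # Tukey's upper fence

print(upper_fence)                       # 16350.0
print(max_price_value > upper_fence)     # True
```

A fence of this kind would also flag many genuine listings of expensive cars, which is one reason to inspect the top entries manually before choosing a threshold.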

To find out more about other potential outliers at the upper end of the price range, we investigate the 20 highest price values and the listings behind them.

In [34]:
autos['price_$'].value_counts().sort_index(ascending=False).head(20)
Out[34]:
99999999    1
27322222    1
12345678    3
11111111    2
10000000    1
3890000     1
1300000     1
1234566     1
999999      2
999990      1
350000      1
345000      1
299000      1
295000      1
265000      1
259000      1
250000      1
220000      1
198000      1
197000      1
Name: price_$, dtype: int64
In [35]:
# Based on the sorting result, the 20th highest distinct price is $197,000
# Select all entries priced at or above it (several prices occur more than once)
max_price = autos[autos['price_$'] >= 197000].sort_values('price_$', ascending=False)

max_price
Out[35]:
date_crawled name seller offer_type price_$ abtest vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created postal_code last_seen
39705 2016-03-22 14:58:27 Tausch_gegen_gleichwertiges privat Angebot 99999999 control limousine 1999 automatik 224 s_klasse 150000 9 benzin mercedes_benz NaN 2016-03-22 00:00:00 73525 2016-04-06 05:15:30
42221 2016-03-08 20:39:05 Leasinguebernahme privat Angebot 27322222 control limousine 2014 manuell 163 c4 40000 2 diesel citroen NaN 2016-03-08 00:00:00 76532 2016-03-08 20:39:05
47598 2016-03-31 18:56:54 Opel_Vectra_B_1_6i_16V_Facelift_Tuning_Showcar... privat Angebot 12345678 control limousine 2001 manuell 101 vectra 150000 3 benzin opel nein 2016-03-31 00:00:00 4356 2016-03-31 18:56:54
27371 2016-03-09 15:45:47 Fiat_Punto privat Angebot 12345678 control NaN 2017 NaN 95 punto 150000 0 NaN fiat NaN 2016-03-09 00:00:00 96110 2016-03-09 15:45:47
39377 2016-03-08 23:53:51 Tausche_volvo_v40_gegen_van privat Angebot 12345678 control NaN 2018 manuell 95 v40 150000 6 NaN volvo nein 2016-03-08 00:00:00 14542 2016-04-06 23:17:31
2897 2016-03-12 21:50:57 Escort_MK_1_Hundeknochen_zum_umbauen_auf_RS_2000 privat Angebot 11111111 test limousine 1973 manuell 48 escort 50000 3 benzin ford nein 2016-03-12 00:00:00 94469 2016-03-12 22:45:27
24384 2016-03-21 13:57:51 Schlachte_Golf_3_gt_tdi privat Angebot 11111111 test NaN 1995 NaN 0 NaN 150000 0 NaN volkswagen NaN 2016-03-21 00:00:00 18519 2016-03-21 14:40:18
11137 2016-03-29 23:52:57 suche_maserati_3200_gt_Zustand_unwichtig_laufe... privat Angebot 10000000 control coupe 1960 manuell 368 NaN 100000 1 benzin sonstige_autos nein 2016-03-29 00:00:00 73033 2016-04-06 21:18:11
47634 2016-04-04 21:25:21 Ferrari_FXX privat Angebot 3890000 test coupe 2006 NaN 799 NaN 5000 7 NaN sonstige_autos nein 2016-04-04 00:00:00 60313 2016-04-05 12:07:37
7814 2016-04-04 11:53:31 Ferrari_F40 privat Angebot 1300000 control coupe 1992 NaN 0 NaN 50000 12 NaN sonstige_autos nein 2016-04-04 00:00:00 60598 2016-04-05 11:34:11
22947 2016-03-22 12:54:19 Bmw_530d_zum_ausschlachten privat Angebot 1234566 control kombi 1999 automatik 190 NaN 150000 2 diesel bmw NaN 2016-03-22 00:00:00 17454 2016-04-02 03:17:32
43049 2016-03-21 19:53:52 2_VW_Busse_T3 privat Angebot 999999 test bus 1981 manuell 70 transporter 150000 1 benzin volkswagen NaN 2016-03-21 00:00:00 99880 2016-03-28 17:18:28
514 2016-03-17 09:53:08 Ford_Focus_Turnier_1.6_16V_Style privat Angebot 999999 test kombi 2009 manuell 101 focus 125000 4 benzin ford nein 2016-03-17 00:00:00 12205 2016-04-06 07:17:35
37585 2016-03-29 11:38:54 Volkswagen_Jetta_GT privat Angebot 999990 test limousine 1985 manuell 111 jetta 150000 12 benzin volkswagen ja 2016-03-29 00:00:00 50997 2016-03-29 11:38:54
36818 2016-03-27 18:37:37 Porsche_991 privat Angebot 350000 control coupe 2016 manuell 500 911 5000 3 benzin porsche nein 2016-03-27 00:00:00 70499 2016-03-27 18:37:37
14715 2016-03-30 08:37:24 Rolls_Royce_Phantom_Drophead_Coupe privat Angebot 345000 control cabrio 2012 automatik 460 NaN 20000 8 benzin sonstige_autos nein 2016-03-30 00:00:00 73525 2016-04-07 00:16:26
34723 2016-03-23 16:37:29 Porsche_Porsche_911/930_Turbo_3.0__deutsche_Au... privat Angebot 299000 test coupe 1977 manuell 260 911 100000 7 benzin porsche nein 2016-03-23 00:00:00 61462 2016-04-06 16:44:50
35923 2016-04-03 07:56:23 Porsche_911_Targa_Exclusive_Edition__1_von_15_... privat Angebot 295000 test cabrio 2015 automatik 400 911 5000 6 benzin porsche nein 2016-04-03 00:00:00 74078 2016-04-03 08:56:20
12682 2016-03-28 22:48:01 Porsche_GT3_RS__PCCB__Lift___grosser_Exklusiv_... privat Angebot 265000 control coupe 2016 automatik 500 911 5000 3 benzin porsche nein 2016-03-28 00:00:00 70193 2016-04-05 03:44:51
47337 2016-04-05 10:25:38 BMW_Z8_roadster privat Angebot 259000 test cabrio 2001 manuell 400 z_reihe 20000 6 benzin bmw nein 2016-04-05 00:00:00 61462 2016-04-05 12:07:32
38299 2016-03-28 22:25:25 Glas_BMW_mit_Wasser privat Angebot 250000 test NaN 2015 NaN 0 x_reihe 5000 0 NaN bmw NaN 2016-03-28 00:00:00 60489 2016-03-28 22:25:25
37840 2016-03-21 10:50:12 Porsche_997 privat Angebot 220000 test coupe 2008 manuell 415 911 30000 7 benzin porsche nein 2016-03-21 00:00:00 69198 2016-04-06 04:46:14
40918 2016-03-20 18:40:05 Porsche_911_991_GT3_RS privat Angebot 198000 test coupe 2015 automatik 500 911 5000 6 benzin porsche nein 2016-03-20 00:00:00 51491 2016-03-21 21:46:36
43668 2016-03-16 18:47:26 Porsche_993/911_Turbo_WLS_II_Exclusive_S_deuts... privat Angebot 197000 control coupe 1998 manuell 450 911 150000 3 NaN porsche nein 2016-03-16 00:00:00 46147 2016-04-07 02:44:47
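Note that the 20 values listed by value_counts() are distinct price levels, so the selection at or above the 20th level contains more than 20 rows (24 here, since some prices are repeated). The sketch below shows the same effect on made-up prices, with hypothetical variable names.

```python
import pandas as pd

# Toy prices for illustration only; two price levels are repeated
toy = pd.DataFrame({
    'price_$': [12345678, 12345678, 12345678, 999999, 999999, 350000, 500],
})

n_levels = 3
# Lowest of the n highest distinct prices
threshold = toy['price_$'].drop_duplicates().nlargest(n_levels).min()

# All rows at or above that price level
rows_at_or_above = toy[toy['price_$'] >= threshold]

print(threshold, len(rows_at_or_above))
```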

Based on the output, there is a huge price gap between USD 350,000 (Porsche 991, index: 36818) and USD 999,990 (Volkswagen Jetta GT, index: 37585). Most of the cars priced at or above USD 999,990 are fairly old (registered between 1960 and 2009), with several exceptions:

  • 2014, Citroen C4, index: 42221 — USD 27,322,222
  • 2017, Fiat Punto, index: 27371 — USD 12,345,678
  • 2018, Volvo v40, index: 39377 — USD 12,345,678

The asking price for the Citroen C4 does not make sense, as its market price in 2016 was only between USD 9,314 and 12,454 (GBP 6,899 to 9,225) (Ref: Citroen C4 2016). The entries with registration years 2017 and 2018 are questionable, as the data was crawled in 2016 and the price (12,345,678) looks like an invented number.

Meanwhile, a quick internet search shows the prices of the most expensive cars in 2016:

If a brand-new luxury car in 2016 costs above one million, then used cars listed at close to or above one million in the dataset are clearly unrealistically priced. Therefore, we decided to keep the entries priced between USD 1 and USD 350,000, i.e. to exclude entries at USD 0 and those above USD 350,000.

A better filter would involve manually checking whether each asking price is realistic, as we just did for the entries at or above USD 999,990. However, that is impractical with thousands of entries in the dataset. Our filter is therefore not perfect, but it is fairly effective.
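
As a rough sanity check on a price filter like this, one could count how many rows fall inside and outside the chosen range. The sketch below is only illustrative: the `toy` DataFrame and its values are made up for the example, and only the `price_$` column name and the bounds of 1 and 350,000 come from this analysis.

```python
import pandas as pd

# Hypothetical toy data standing in for the `autos` DataFrame
toy = pd.DataFrame({'price_$': [0, 1, 5000, 350000, 999990, 12345678]})

# Series.between() includes both bounds by default
in_range = toy['price_$'].between(1, 350000)

n_kept = int(in_range.sum())
n_dropped = int((~in_range).sum())
print('kept:', n_kept, '| dropped:', n_dropped)
```

Because both bounds are inclusive, entries priced at exactly 1 or exactly 350,000 are kept.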

In [36]:
# Keep the rows with prices between 1 and 350,000, while discarding the rows with other values
autos = autos[autos['price_$'].between(1,350000)]

autos
Out[36]:
date_crawled name seller offer_type price_$ abtest vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot 5000 control bus 2004 manuell 158 andere 150000 3 lpg peugeot nein 2016-03-26 00:00:00 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot 8500 control limousine 1997 automatik 286 7er 150000 6 benzin bmw nein 2016-04-04 00:00:00 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot 8990 test limousine 2009 manuell 102 golf 70000 7 benzin volkswagen nein 2016-03-26 00:00:00 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot 4350 control kleinwagen 2007 automatik 71 fortwo 70000 6 benzin smart nein 2016-03-12 00:00:00 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot 1350 test kombi 2003 manuell 0 focus 150000 7 benzin ford nein 2016-04-01 00:00:00 39218 2016-04-01 14:38:50
5 2016-03-21 13:47:45 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... privat Angebot 7900 test bus 2006 automatik 150 voyager 150000 4 diesel chrysler NaN 2016-03-21 00:00:00 22962 2016-04-06 09:45:21
6 2016-03-20 17:55:21 VW_Golf_III_GT_Special_Electronic_Green_Metall... privat Angebot 300 test limousine 1995 manuell 90 golf 150000 8 benzin volkswagen NaN 2016-03-20 00:00:00 31535 2016-03-23 02:48:59
7 2016-03-16 18:55:19 Golf_IV_1.9_TDI_90PS privat Angebot 1990 control limousine 1998 manuell 90 golf 150000 12 diesel volkswagen nein 2016-03-16 00:00:00 53474 2016-04-07 03:17:32
8 2016-03-22 16:51:34 Seat_Arosa privat Angebot 250 test NaN 2000 manuell 0 arosa 150000 10 NaN seat nein 2016-03-22 00:00:00 7426 2016-03-26 18:18:10
9 2016-03-16 13:47:02 Renault_Megane_Scenic_1.6e_RT_Klimaanlage privat Angebot 590 control bus 1997 manuell 90 megane 150000 7 benzin renault nein 2016-03-16 00:00:00 15749 2016-04-06 10:46:35
10 2016-03-15 01:41:36 VW_Golf_Tuning_in_siber/grau privat Angebot 999 test NaN 2017 manuell 90 NaN 150000 4 benzin volkswagen nein 2016-03-14 00:00:00 86157 2016-04-07 03:16:21
11 2016-03-16 18:45:34 Mercedes_A140_Motorschaden privat Angebot 350 control NaN 2000 NaN 0 NaN 150000 0 benzin mercedes_benz NaN 2016-03-16 00:00:00 17498 2016-03-16 18:45:34
12 2016-03-31 19:48:22 Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... privat Angebot 5299 control kleinwagen 2010 automatik 71 fortwo 50000 9 benzin smart nein 2016-03-31 00:00:00 34590 2016-04-06 14:17:52
13 2016-03-23 10:48:32 Audi_A3_1.6_tuning privat Angebot 1350 control limousine 1999 manuell 101 a3 150000 11 benzin audi nein 2016-03-23 00:00:00 12043 2016-04-01 14:17:13
14 2016-03-23 11:50:46 Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... privat Angebot 3999 test kleinwagen 2007 manuell 75 clio 150000 9 benzin renault NaN 2016-03-23 00:00:00 81737 2016-04-01 15:46:47
15 2016-04-01 12:06:20 Corvette_C3_Coupe_T_Top_Crossfire_Injection privat Angebot 18900 test coupe 1982 automatik 203 NaN 80000 6 benzin sonstige_autos nein 2016-04-01 00:00:00 61276 2016-04-02 21:10:48
16 2016-03-16 14:59:02 Opel_Vectra_B_Kombi privat Angebot 350 test kombi 1999 manuell 101 vectra 150000 5 benzin opel nein 2016-03-16 00:00:00 57299 2016-03-18 05:29:37
17 2016-03-29 11:46:22 Volkswagen_Scirocco_2_G60 privat Angebot 5500 test coupe 1990 manuell 205 scirocco 150000 6 benzin volkswagen nein 2016-03-29 00:00:00 74821 2016-04-05 20:46:26
18 2016-03-26 19:57:44 Verkaufen_mein_bmw_e36_320_i_touring privat Angebot 300 control bus 1995 manuell 150 3er 150000 0 benzin bmw NaN 2016-03-26 00:00:00 54329 2016-04-02 12:16:41
19 2016-03-17 13:36:21 mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 privat Angebot 4150 control suv 2004 manuell 124 andere 150000 2 lpg mazda nein 2016-03-17 00:00:00 40878 2016-03-17 14:45:58
20 2016-03-05 19:57:31 Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... privat Angebot 3500 test kombi 2003 manuell 131 a4 150000 5 diesel audi NaN 2016-03-05 00:00:00 53913 2016-03-07 05:46:46
21 2016-03-06 19:07:10 Porsche_911_Carrera_4S_Cabrio privat Angebot 41500 test cabrio 2004 manuell 320 911 150000 4 benzin porsche nein 2016-03-06 00:00:00 65428 2016-04-05 23:46:19
22 2016-03-28 20:50:54 MINI_Cooper_S_Cabrio privat Angebot 25450 control cabrio 2015 manuell 184 cooper 10000 1 benzin mini nein 2016-03-28 00:00:00 44789 2016-04-01 06:45:30
23 2016-03-10 19:55:34 Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima privat Angebot 7999 control bus 2010 manuell 120 NaN 150000 2 diesel peugeot nein 2016-03-10 00:00:00 30900 2016-03-17 08:45:17
24 2016-04-03 11:57:02 BMW_535i_xDrive_Sport_Aut. privat Angebot 48500 control limousine 2014 automatik 306 5er 30000 12 benzin bmw nein 2016-04-03 00:00:00 22547 2016-04-07 13:16:50
25 2016-03-21 21:56:18 Ford_escort_kombi_an_bastler_mit_ghia_ausstattung privat Angebot 90 control kombi 1996 manuell 116 NaN 150000 4 benzin ford ja 2016-03-21 00:00:00 27574 2016-04-01 05:16:49
26 2016-04-03 22:46:28 Volkswagen_Polo_Fox privat Angebot 777 control kleinwagen 1992 manuell 54 polo 125000 2 benzin volkswagen nein 2016-04-03 00:00:00 38110 2016-04-05 23:46:48
28 2016-03-19 21:56:19 MINI_Cooper_D privat Angebot 5250 control kleinwagen 2007 manuell 110 cooper 150000 7 diesel mini ja 2016-03-19 00:00:00 15745 2016-04-07 14:58:48
29 2016-04-02 12:45:44 Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... privat Angebot 4999 test kombi 2004 automatik 204 e_klasse 150000 10 diesel mercedes_benz nein 2016-04-02 00:00:00 47638 2016-04-02 12:45:44
30 2016-03-14 11:47:31 Peugeot_206_Unfallfahrzeug privat Angebot 80 test kleinwagen 2002 manuell 60 2_reihe 150000 6 benzin peugeot ja 2016-03-14 00:00:00 57076 2016-03-14 11:47:31
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49968 2016-04-01 17:49:15 Mercedes_Benz_190_D_2.5_Automatik privat Angebot 2100 test limousine 1986 automatik 90 andere 150000 9 diesel mercedes_benz nein 2016-04-01 00:00:00 40227 2016-04-05 13:16:35
49969 2016-03-17 18:49:02 Nissan_X_Trail_2.2_dCi_4x4_Sport_m.AHZ privat Angebot 4500 control suv 2005 manuell 136 x_trail 150000 5 diesel nissan nein 2016-03-17 00:00:00 17379 2016-03-25 23:18:15
49970 2016-03-21 22:47:37 c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... privat Angebot 15800 control bus 2010 automatik 136 c4 60000 4 diesel citroen nein 2016-03-21 00:00:00 14947 2016-04-07 04:17:34
49971 2016-03-29 14:54:12 W.Lupo_1.0 privat Angebot 950 test kleinwagen 2001 manuell 50 lupo 150000 4 benzin volkswagen nein 2016-03-29 00:00:00 65197 2016-03-29 20:41:51
49972 2016-03-26 22:25:23 Mercedes_Benz_Vito_115_CDI_Extralang_Aut. privat Angebot 3300 control bus 2004 automatik 150 vito 150000 10 diesel mercedes_benz ja 2016-03-26 00:00:00 65326 2016-03-28 11:28:18
49973 2016-03-27 05:32:39 Mercedes_Benz_SLK_200_Kompressor privat Angebot 6000 control cabrio 2004 manuell 163 slk 150000 11 benzin mercedes_benz nein 2016-03-27 00:00:00 53567 2016-03-27 08:25:24
49975 2016-03-27 20:51:39 Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort privat Angebot 9700 control kleinwagen 2012 automatik 88 jazz 100000 11 hybrid honda nein 2016-03-27 00:00:00 84385 2016-04-05 19:45:34
49976 2016-03-19 18:56:05 Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... privat Angebot 5900 test kombi 1992 automatik 150 80 150000 12 benzin audi nein 2016-03-19 00:00:00 36100 2016-04-07 06:16:44
49977 2016-03-31 18:37:18 Mercedes_Benz_C200_Cdi_W203 privat Angebot 5500 control limousine 2003 manuell 116 c_klasse 150000 2 diesel mercedes_benz nein 2016-03-31 00:00:00 33739 2016-04-06 12:16:11
49978 2016-04-04 10:37:14 Mercedes_Benz_E_200_Classic privat Angebot 900 control limousine 1996 automatik 136 e_klasse 150000 9 benzin mercedes_benz ja 2016-04-04 00:00:00 24405 2016-04-06 12:44:20
49979 2016-03-20 18:38:40 Volkswagen_Polo_1.6_TDI_Style privat Angebot 11000 test kleinwagen 2011 manuell 90 polo 70000 11 diesel volkswagen nein 2016-03-20 00:00:00 48455 2016-04-07 01:45:12
49980 2016-03-12 10:55:54 Ford_Escort_Turnier_16V privat Angebot 400 control kombi 1995 manuell 105 escort 125000 3 benzin ford NaN 2016-03-12 00:00:00 56218 2016-04-06 17:16:49
49981 2016-03-15 09:38:21 Opel_Astra_Kombi_mit_Anhaengerkupplung privat Angebot 2000 control kombi 1998 manuell 115 astra 150000 12 benzin opel nein 2016-03-15 00:00:00 86859 2016-04-05 17:21:46
49982 2016-03-29 18:51:08 Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm privat Angebot 1950 control kleinwagen 2004 manuell 0 fabia 90000 7 benzin skoda NaN 2016-03-29 00:00:00 45884 2016-03-29 18:51:08
49983 2016-03-06 12:43:04 Ford_focus_99 privat Angebot 600 test kleinwagen 1999 manuell 101 focus 150000 4 benzin ford NaN 2016-03-06 00:00:00 52477 2016-03-09 06:16:08
49985 2016-04-02 16:38:23 Verkaufe_meinen_vw_vento! privat Angebot 1000 control NaN 1995 automatik 0 NaN 150000 0 benzin volkswagen NaN 2016-04-02 00:00:00 30900 2016-04-06 15:17:52
49986 2016-04-04 20:46:02 Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... privat Angebot 15900 control limousine 2010 automatik 218 300c 125000 11 diesel chrysler nein 2016-04-04 00:00:00 73527 2016-04-06 23:16:00
49987 2016-03-22 20:47:27 Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... privat Angebot 21990 control limousine 2013 manuell 150 a3 50000 11 diesel audi nein 2016-03-22 00:00:00 94362 2016-03-26 22:46:06
49988 2016-03-28 19:49:51 BMW_330_Ci privat Angebot 9550 control coupe 2001 manuell 231 3er 150000 10 benzin bmw nein 2016-03-28 00:00:00 83646 2016-04-07 02:17:40
49989 2016-03-11 19:50:37 VW_Polo_zum_Ausschlachten_oder_Wiederaufbau privat Angebot 150 test kleinwagen 1997 manuell 0 polo 150000 5 benzin volkswagen ja 2016-03-11 00:00:00 21244 2016-03-12 10:17:55
49990 2016-03-21 19:54:19 Mercedes_Benz_A_200__BlueEFFICIENCY__Urban privat Angebot 17500 test limousine 2012 manuell 156 a_klasse 30000 12 benzin mercedes_benz nein 2016-03-21 00:00:00 58239 2016-04-06 22:46:57
49991 2016-03-06 15:25:19 Kleinwagen privat Angebot 500 control NaN 2016 manuell 0 twingo 150000 0 benzin renault NaN 2016-03-06 00:00:00 61350 2016-03-06 18:24:19
49992 2016-03-10 19:37:38 Fiat_Grande_Punto_1.4_T_Jet_16V_Sport privat Angebot 4800 control kleinwagen 2009 manuell 120 andere 125000 9 lpg fiat nein 2016-03-10 00:00:00 68642 2016-03-13 01:44:51
49993 2016-03-15 18:47:35 Audi_A3__1_8l__Silber;_schoenes_Fahrzeug privat Angebot 1650 control kleinwagen 1997 manuell 0 NaN 150000 7 benzin audi NaN 2016-03-15 00:00:00 65203 2016-04-06 19:46:53
49994 2016-03-22 17:36:42 Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... privat Angebot 5000 control kombi 2001 automatik 299 a6 150000 1 benzin audi nein 2016-03-22 00:00:00 46537 2016-04-06 08:16:39
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon privat Angebot 24900 control limousine 2011 automatik 239 q5 100000 1 diesel audi nein 2016-03-27 00:00:00 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... privat Angebot 1980 control cabrio 1996 manuell 75 astra 150000 5 benzin opel nein 2016-03-28 00:00:00 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge privat Angebot 13200 test cabrio 2014 automatik 69 500 5000 11 benzin fiat nein 2016-04-02 00:00:00 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition privat Angebot 22900 control kombi 2013 manuell 150 a3 40000 11 diesel audi nein 2016-03-08 00:00:00 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V privat Angebot 1250 control limousine 1996 manuell 101 vectra 150000 1 benzin opel nein 2016-03-13 00:00:00 45897 2016-04-06 21:18:48

48564 rows × 19 columns

In [37]:
# Display the descriptive statistics
autos.describe()
Out[37]:
price_$ registration_year power_ps odometer_km registration_month postal_code
count 48,564.0 48,564.0 48,564.0 48,564.0 48,564.0 48,564.0
mean 5,889.1 2,004.8 117.2 125,769.6 5.8 50,975.5
std 9,059.9 88.6 200.7 39,788.9 3.7 25,747.2
min 1.0 1,000.0 0.0 5,000.0 0.0 1,067.0
25% 1,200.0 1,999.0 71.0 125,000.0 3.0 30,657.0
50% 3,000.0 2,004.0 107.0 150,000.0 6.0 49,716.0
75% 7,490.0 2,008.0 150.0 150,000.0 9.0 71,665.0
max 350,000.0 9,999.0 17,700.0 150,000.0 12.0 99,998.0
In [38]:
# Display the five lowest prices in ascending order
autos['price_$'].value_counts().sort_index().head()
Out[38]:
1    156
2      3
3      1
5      2
8      1
Name: price_$, dtype: int64
In [39]:
# Display the five highest prices in descending order
autos['price_$'].value_counts().sort_index(ascending=False).head()
Out[39]:
350000    1
345000    1
299000    1
295000    1
265000    1
Name: price_$, dtype: int64

Based on the cells above, we confirm that the entries at price 0 and above 350,000 have been removed.

Exploring the Odometer Column

Next, we analyze the odometer_km column.

In [40]:
autos['odometer_km'].describe()
Out[40]:
count    48,564.0
mean    125,769.6
std      39,788.9
min       5,000.0
25%     125,000.0
50%     150,000.0
75%     150,000.0
max     150,000.0
Name: odometer_km, dtype: float64
In [41]:
print('The number of unique values:   ', autos['odometer_km'].unique().shape[0])
The number of unique values:    13
In [42]:
autos['odometer_km'].value_counts().sort_index(ascending=True)
Out[42]:
5000        836
10000       253
20000       762
30000       780
40000       815
50000      1012
60000      1155
70000      1217
80000      1415
90000      1734
100000     2115
125000     5057
150000    31413
Name: odometer_km, dtype: int64

There are 13 unique values in the odometer_km column.

The value_counts result shows no outliers. The counts are skewed towards the higher end: the most frequent value is the maximum, 150,000 km (31,413 of 48,564 entries), followed by 125,000 km and 100,000 km, which are the second- and third-highest values, respectively.

Overall, the data looks good.
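
To put a number on the skew, the counts can be converted to percentages. The sketch below re-enters the counts printed in the value_counts output above by hand; with the real DataFrame, `value_counts(normalize=True)` on odometer_km would be the more direct route.

```python
import pandas as pd

# Counts copied from the value_counts output (odometer_km -> number of entries)
counts = pd.Series({
    5000: 836, 10000: 253, 20000: 762, 30000: 780, 40000: 815,
    50000: 1012, 60000: 1155, 70000: 1217, 80000: 1415, 90000: 1734,
    100000: 2115, 125000: 5057, 150000: 31413,
})

# Share of each odometer value in percent
share = counts / counts.sum() * 100

# Share of cars listed with at least 125,000 km
high_mileage_share = share.loc[[125000, 150000]].sum()
print(share.round(2))
print(round(high_mileage_share, 2))
```

Roughly two thirds of the entries sit at the 150,000 km maximum, which is why the 50% and 75% quartiles in the descriptive statistics both equal 150,000.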

Exploring the Date Columns

Now, we will explore the date columns, which are:

  • date_crawled
  • ad_created
  • last_seen
  • registration_year
  • registration_month

Let's look at the format of the three timestamp columns (date_crawled, ad_created and last_seen):

In [43]:
# Display the first five rows of the three date columns
autos[['date_crawled', 'ad_created', 'last_seen']][0:5]
Out[43]:
date_crawled ad_created last_seen
0 2016-03-26 17:47:46 2016-03-26 00:00:00 2016-04-06 06:45:54
1 2016-04-04 13:38:56 2016-04-04 00:00:00 2016-04-06 14:45:08
2 2016-03-26 18:57:24 2016-03-26 00:00:00 2016-04-06 20:15:37
3 2016-03-12 16:58:10 2016-03-12 00:00:00 2016-03-15 03:16:28
4 2016-04-01 14:38:50 2016-04-01 00:00:00 2016-04-01 14:38:50

The output shows that the values in these columns are stored in %Y-%m-%d %H:%M:%S format, where the first 10 characters represent the date (e.g. 2016-03-26).

To investigate the distribution of the date values in these three columns as percentages, we chain the following methods:

  • str[:10]: To extract the date
  • value_counts(normalize=True, dropna=False): To get the relative frequency of the unique values and include the missing values
  • sort_index(): To sort by date in ascending order (default)
  • *100: To convert the relative frequency to the percentage
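
The chain described above can be tried out on a few values first. This is a minimal sketch: the `stamps` Series is hypothetical data in the same string format as the date columns, not rows from the dataset.

```python
import pandas as pd

# Hypothetical timestamps in the same string format as the date columns
stamps = pd.Series([
    '2016-03-26 17:47:46',
    '2016-03-26 18:57:24',
    '2016-04-04 13:38:56',
])

daily_pct = (
    stamps
    .str[:10]                                    # keep only the date part
    .value_counts(normalize=True, dropna=False)  # relative frequency per date
    .sort_index()                                # chronological order
    * 100                                        # fraction -> percentage
)
print(daily_pct)
```

Sorting the index works chronologically here only because the dates are zero-padded strings in year-month-day order.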

In the earlier cells, we changed the output format of the descriptive statistics from scientific notation to floats with 1 decimal place and a thousands separator. Now we revert that setting so that the percentages are displayed with their full precision.
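
For reference, a float format like the one described could be set and reverted as sketched below. The exact call used in the earlier cells is not shown in this section, so the format string `'{:,.1f}'` is an assumption based on the description (1 decimal place, thousands separator).

```python
import pandas as pd

# Assumed earlier setting: 1 decimal place with a thousands separator
pd.set_option('display.float_format', '{:,.1f}'.format)
formatted = pd.get_option('display.float_format')(125769.6)
print(formatted)

# Reset only the float format and leave the other display options untouched
pd.reset_option('display.float_format')
after_reset = pd.get_option('display.float_format')
print(after_reset)
```

Resetting the single option is a narrower alternative to resetting every option whose name matches a pattern.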

In [44]:
# Reset all display options, including the float format, to their defaults
pd.reset_option('^display.', silent=True)
In [45]:
# 'date_crawled' — To get the distribution of date values as percentages
autos['date_crawled'].str[:10].value_counts(normalize=True, dropna=False).sort_index()*100
Out[45]:
2016-03-05    2.532740
2016-03-06    1.404332
2016-03-07    3.601433
2016-03-08    3.329627
2016-03-09    3.309035
2016-03-10    3.218433
2016-03-11    3.257557
2016-03-12    3.692035
2016-03-13    1.567004
2016-03-14    3.654971
2016-03-15    3.426406
2016-03-16    2.961041
2016-03-17    3.162837
2016-03-18    1.291080
2016-03-19    3.477885
2016-03-20    3.788815
2016-03-21    3.737336
2016-03-22    3.298740
2016-03-23    3.222552
2016-03-24    2.934272
2016-03-25    3.160778
2016-03-26    3.220493
2016-03-27    3.109299
2016-03-28    3.486121
2016-03-29    3.409933
2016-03-30    3.368751
2016-03-31    3.183428
2016-04-01    3.368751
2016-04-02    3.547896
2016-04-03    3.860885
2016-04-04    3.648793
2016-04-05    1.309612
2016-04-06    0.317107
2016-04-07    0.140021
Name: date_crawled, dtype: float64

The distribution of date_crawled is fairly uniform throughout the crawling period: it stays at around 3% per day on most days, with several exceptions. The lower date_crawled percentages from 2016-04-05 to 2016-04-07 coincide with lower ad_created percentages on the same days (see the next cell), which suggests that the number of ads created on a day drives the number of ads crawled on that day. Overall, the crawler appears to have collected ads at a steady rate.
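
One rough way to check how closely the two columns move together is to correlate their daily shares. The sketch below uses the percentages printed in the date_crawled and ad_created outputs for 2016-04-03 to 2016-04-07; restricting the check to these five days is a simplification made to keep the example short.

```python
import pandas as pd

days = ['2016-04-03', '2016-04-04', '2016-04-05', '2016-04-06', '2016-04-07']

# Daily shares in percent, copied from the two printed outputs
crawled_pct = pd.Series([3.860885, 3.648793, 1.309612, 0.317107, 0.140021], index=days)
created_pct = pd.Series([3.885594, 3.685858, 1.181945, 0.325344, 0.125607], index=days)

# Pearson correlation between the two daily distributions
corr = crawled_pct.corr(created_pct)
print(round(corr, 4))
```

A correlation close to 1 is consistent with most ads being crawled on the day they were created, but on its own it does not show which quantity drives the other.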

In [46]:
# 'ad_created' — To get the distribution of date values as percentages
autos['ad_created'].str[:10].value_counts(normalize=True,dropna=False).sort_index()*100
Out[46]:
2015-06-11    0.002059
2015-08-10    0.002059
2015-09-09    0.002059
2015-11-10    0.002059
2015-12-05    0.002059
2015-12-30    0.002059
2016-01-03    0.002059
2016-01-07    0.002059
2016-01-10    0.004118
2016-01-13    0.002059
2016-01-14    0.002059
2016-01-16    0.002059
2016-01-22    0.002059
2016-01-27    0.006177
2016-01-29    0.002059
2016-02-01    0.002059
2016-02-02    0.004118
2016-02-05    0.004118
2016-02-07    0.002059
2016-02-08    0.002059
2016-02-09    0.002059
2016-02-11    0.002059
2016-02-12    0.004118
2016-02-14    0.004118
2016-02-16    0.002059
2016-02-17    0.002059
2016-02-18    0.004118
2016-02-19    0.006177
2016-02-20    0.004118
2016-02-21    0.006177
                ...   
2016-03-09    3.315213
2016-03-10    3.189605
2016-03-11    3.290503
2016-03-12    3.675562
2016-03-13    1.700848
2016-03-14    3.519068
2016-03-15    3.399638
2016-03-16    3.012520
2016-03-17    3.127831
2016-03-18    1.359031
2016-03-19    3.368751
2016-03-20    3.794992
2016-03-21    3.757928
2016-03-22    3.280208
2016-03-23    3.206079
2016-03-24    2.928095
2016-03-25    3.175191
2016-03-26    3.226670
2016-03-27    3.099003
2016-03-28    3.498476
2016-03-29    3.403756
2016-03-30    3.350218
2016-03-31    3.187546
2016-04-01    3.368751
2016-04-02    3.514949
2016-04-03    3.885594
2016-04-04    3.685858
2016-04-05    1.181945
2016-04-06    0.325344
2016-04-07    0.125607
Name: ad_created, Length: 76, dtype: float64

March 2016 and the beginning of April 2016 were the busiest period for used car ad creation, with roughly 3% to 4% of all ads created on most days. The daily share then dropped sharply, from 1.181945% on 2016-04-05 to 0.125607% on 2016-04-07.

In comparison, very few ads were created between June 2015 and February 2016: only about 0.002059% to 0.006177% of all ads on any given day, and not on every day. In 2015, there was a single day with new ads (0.002059% each) in June, August, September and November, two such days in December, and none in July or October.

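Based on these observations, we propose that March, or early spring, is generally a good time for a buyer to search for a used car on eBay. Note, however, that the crawl ran only from 2016-03-05 to 2016-04-07, so the concentration of ads in this period partly reflects the crawling window.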

In [47]:
# 'last_seen' — To get the distribution of date values as percentages
autos['last_seen'].str[:10].value_counts(normalize=True,dropna=False).sort_index()*100
Out[47]:
2016-03-05     0.107075
2016-03-06     0.432419
2016-03-07     0.539494
2016-03-08     0.741290
2016-03-09     0.959559
2016-03-10     1.066634
2016-03-11     1.237542
2016-03-12     2.378305
2016-03-13     0.889548
2016-03-14     1.260193
2016-03-15     1.587596
2016-03-16     1.645252
2016-03-17     2.808665
2016-03-18     0.735112
2016-03-19     1.583477
2016-03-20     2.065316
2016-03-21     2.063257
2016-03-22     2.137386
2016-03-23     1.853225
2016-03-24     1.976773
2016-03-25     1.921176
2016-03-26     1.680257
2016-03-27     1.564945
2016-03-28     2.085907
2016-03-29     2.234165
2016-03-30     2.477144
2016-03-31     2.378305
2016-04-01     2.279466
2016-04-02     2.491558
2016-04-03     2.520385
2016-04-04     2.448316
2016-04-05    12.476320
2016-04-06    22.178980
2016-04-07    13.194959
Name: last_seen, dtype: float64

The last_seen percentage is relatively low for most of the timeframe, ranging from 0.107075% to 2.808665% per day. It is much higher on the last three days (12.476320% to 22.178980%, about 47.9% in total). This is expected: last_seen records when the crawler last saw the ad online, so ads that were still online when the crawl ended are stamped with one of its final days.

Dealing with Incorrect Registration Year Data

Now, let's analyze the registration year data!

In [48]:
autos['registration_year'].describe()
Out[48]:
count    48564.000000
mean      2004.755518
std         88.644797
min       1000.000000
25%       1999.000000
50%       2004.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

The descriptive statistics of registration_year reveal two impossible values:

  • The minimum registration year is 1000, long before cars were invented.
  • The maximum registration year is 9999, far in the future.

We investigate the registration_year further to examine whether there are more unreasonable entries.

In [49]:
# To get the distribution of `registration_year` as percentages and sort the years in an ascending manner
autos['registration_year'].value_counts(normalize=True,dropna=False).sort_index()*100
Out[49]:
1000    0.002059
1001    0.002059
1111    0.002059
1800    0.004118
1910    0.010296
1927    0.002059
1929    0.002059
1931    0.002059
1934    0.004118
1937    0.008237
1938    0.002059
1939    0.002059
1941    0.004118
1943    0.002059
1948    0.002059
1950    0.006177
1951    0.004118
1952    0.002059
1953    0.002059
1954    0.004118
1955    0.004118
1956    0.008237
1957    0.004118
1958    0.008237
1959    0.012355
1960    0.047360
1961    0.012355
1962    0.008237
1963    0.016473
1964    0.024710
          ...   
2000    6.496582
2001    5.427889
2002    5.119018
2003    5.557615
2004    5.565851
2005    6.045631
2006    5.497900
2007    4.688658
2008    4.560992
2009    4.293304
2010    3.271971
2011    3.341982
2012    2.697471
2013    1.653488
2014    1.365209
2015    0.807182
2016    2.512149
2017    2.866321
2018    0.967795
2019    0.004118
2800    0.002059
4100    0.002059
4500    0.002059
4800    0.002059
5000    0.008237
5911    0.002059
6200    0.002059
8888    0.002059
9000    0.002059
9999    0.006177
Name: registration_year, Length: 95, dtype: float64

In addition to the year 1000, the unreasonably early registration years include 1001, 1111 and 1800. The year 1886 is regarded as the birth year of the modern car (Ref: modern car), so these four values cannot be valid. The earliest plausible year in the dataset is 1910, so we remove any entries before 1910.

Additionally, any entries after 2016 (the crawling year), i.e. from 2017 to 9999, should be removed as well, since a car cannot have been first registered after its ad was crawled.
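
Instead of hard-coding the upper bound, the latest crawl year could be derived from date_crawled. The sketch below is one possible way to do that, not the approach used in this analysis; the `toy` DataFrame and its rows are hypothetical, while the column names and the lower bound of 1910 follow the text above.

```python
import pandas as pd

# Hypothetical toy data standing in for the `autos` DataFrame
toy = pd.DataFrame({
    'date_crawled': ['2016-03-26 17:47:46', '2016-04-04 13:38:56',
                     '2016-03-15 01:41:36', '2016-03-05 19:57:31'],
    'registration_year': [1000, 2004, 2017, 1910],
})

# Latest year in which any ad was crawled
latest_year = int(pd.to_datetime(toy['date_crawled']).dt.year.max())

# Keep registration years from 1910 up to the latest crawl year (inclusive)
valid = toy['registration_year'].between(1910, latest_year)
cleaned = toy[valid]
print(latest_year, cleaned['registration_year'].tolist())
```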

In [50]:
# Remove the entries before the year 1910 and after the year 2016
autos = autos[autos['registration_year'].between(1910,2016)]

From the cell below, we can see that we have successfully removed the illogical years. Now, the minimum year is 1910, whereas the maximum year is 2016.

In [51]:
autos['registration_year'].describe()
Out[51]:
count    46680.000000
mean      2002.910818
std          7.185168
min       1910.000000
25%       1999.000000
50%       2003.000000
75%       2008.000000
max       2016.000000
Name: registration_year, dtype: float64
In [52]:
# The 10 most common registration years by share of entries (in %), in descending order
autos['registration_year'].value_counts(normalize=True).sort_values(ascending=False).head(10)*100
Out[52]:
2000    6.758783
2005    6.289632
1999    6.206084
2004    5.790488
2003    5.781919
2006    5.719794
2001    5.646958
2002    5.325621
1998    5.062125
2007    4.877892
Name: registration_year, dtype: float64
In [53]:
# The 30 least common registration years by share of entries (in %), in ascending order
autos['registration_year'].value_counts(normalize=True).sort_values().head(30)*100
Out[53]:
1952    0.002142
1953    0.002142
1943    0.002142
1929    0.002142
1931    0.002142
1938    0.002142
1948    0.002142
1927    0.002142
1939    0.002142
1955    0.004284
1957    0.004284
1934    0.004284
1951    0.004284
1941    0.004284
1954    0.004284
1950    0.006427
1962    0.008569
1937    0.008569
1958    0.008569
1956    0.008569
1910    0.010711
1959    0.012853
1961    0.012853
1963    0.017138
1964    0.025707
1965    0.036418
1975    0.038560
1969    0.040703
1976    0.044987
1977    0.047129
Name: registration_year, dtype: float64

Based on the distribution, the 10 most common registration years range from 1998 to 2007 (4.88% to 6.76% each). A small percentage of sellers also listed antique cars on eBay.

Dealing with Incorrect Registration Month Data

Next, we will analyze the registration month data.

In [54]:
autos['registration_month'].describe()
Out[54]:
count    46680.000000
mean         5.827078
std          3.670325
min          0.000000
25%          3.000000
50%          6.000000
75%          9.000000
max         12.000000
Name: registration_month, dtype: float64
In [55]:
# To get the distribution of `registration_month` as percentages, sorted by percentage in descending order
autos['registration_month'].value_counts(normalize=True).sort_values(ascending=False)*100
Out[55]:
3     10.364182
6      8.823907
0      8.624679
4      8.341902
5      8.305484
7      7.973436
10     7.487147
12     6.988003
9      6.947301
11     6.917309
1      6.651671
8      6.469580
2      6.105398
Name: registration_month, dtype: float64

Surprisingly, 8.62% of the entries have a registration month of 0, which does not exist. Since the registration month usually matters far less than the registration year in a marketplace, sellers might have entered month 0 when they did not remember the registration month. We keep the month-0 entries as they are: the registration month has little impact on our analysis, and we do not want to lose the other data in those rows.

Based on the data, the most common registration months fall in spring and early summer (March to June), at 8.31% to 10.36% each and about 36% of all entries combined, presumably in preparation for summer holidays or spring/summer outdoor activities. Spring and early summer therefore appear to be the best time to change cars, especially March, which is also the peak time for ad creation.
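
The monthly shares can also be summarized numerically. The sketch below re-enters the percentages printed in the registration_month output above, adds up March to June, and rescales the shares after setting month 0 aside. The rescaling is an optional extra view, not a step taken in this analysis.

```python
import pandas as pd

# Percentages copied from the registration_month output (month -> % of entries)
month_pct = pd.Series({
    3: 10.364182, 6: 8.823907, 0: 8.624679, 4: 8.341902, 5: 8.305484,
    7: 7.973436, 10: 7.487147, 12: 6.988003, 9: 6.947301, 11: 6.917309,
    1: 6.651671, 8: 6.469580, 2: 6.105398,
})

# Combined share of March to June
mar_to_jun = month_pct.loc[[3, 4, 5, 6]].sum()

# Shares among entries with a known month (month 0 excluded), rescaled to 100%
known = month_pct.drop(0)
known_pct = (known / known.sum() * 100).sort_index()

print(round(mar_to_jun, 2))
print(known_pct.round(2))
```

Excluding month 0 raises every month's share slightly but leaves the ranking unchanged, with March still on top.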

To have an overview of the number of entries, we print it out:

In [56]:
n_before_removal = autos.shape[0]
print('The number of entries before removal of illogical registration months:   ', n_before_removal)
The number of entries before removal of illogical registration months:    46680

Now, let's examine whether there are any entries with illogical registration months. For example, it is illogical if an ad was created and crawled in March 2016, but the car was registered in August 2016.

The last ad creation date is 2016-04-07. The chance that someone buys and registers a new car and then sells it within the same month is extremely low. Therefore, we build a combined Boolean filter for cars registered after March 2016, autos[(autos['registration_year'] == 2016) & (autos['registration_month'] > 3)], and remove the matching entries from autos with the autos.drop() method.
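
The combined Boolean filter can be checked on a small example before it is applied to the full data. In the sketch below, the `toy` DataFrame is hypothetical. It also shows that negating the mask with `~` gives the same result as drop(), which is an equivalent alternative rather than the method used in this analysis.

```python
import pandas as pd

# Hypothetical toy data with the two registration columns
toy = pd.DataFrame({
    'registration_year': [2016, 2016, 2015, 2016],
    'registration_month': [8, 3, 12, 0],
})

# True for cars registered after March 2016
too_late = (toy['registration_year'] == 2016) & (toy['registration_month'] > 3)

# Two equivalent ways to remove the flagged rows
via_drop = toy.drop(toy[too_late].index)
via_mask = toy[~too_late]

n_removed = len(toy) - len(via_mask)
print('removed:', n_removed)
```

Note that rows with year 2016 and month 0 pass this filter, because 0 is not greater than 3.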

In [57]:
# Remove the entries of car registration after March 2016
autos = autos.drop(autos[(autos['registration_year'] == 2016) & (autos['registration_month'] > 3)].index)

autos
Out[57]:
date_crawled name seller offer_type price_$ abtest vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot 5000 control bus 2004 manuell 158 andere 150000 3 lpg peugeot nein 2016-03-26 00:00:00 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot 8500 control limousine 1997 automatik 286 7er 150000 6 benzin bmw nein 2016-04-04 00:00:00 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot 8990 test limousine 2009 manuell 102 golf 70000 7 benzin volkswagen nein 2016-03-26 00:00:00 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot 4350 control kleinwagen 2007 automatik 71 fortwo 70000 6 benzin smart nein 2016-03-12 00:00:00 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot 1350 test kombi 2003 manuell 0 focus 150000 7 benzin ford nein 2016-04-01 00:00:00 39218 2016-04-01 14:38:50
5 2016-03-21 13:47:45 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... privat Angebot 7900 test bus 2006 automatik 150 voyager 150000 4 diesel chrysler NaN 2016-03-21 00:00:00 22962 2016-04-06 09:45:21
6 2016-03-20 17:55:21 VW_Golf_III_GT_Special_Electronic_Green_Metall... privat Angebot 300 test limousine 1995 manuell 90 golf 150000 8 benzin volkswagen NaN 2016-03-20 00:00:00 31535 2016-03-23 02:48:59
7 2016-03-16 18:55:19 Golf_IV_1.9_TDI_90PS privat Angebot 1990 control limousine 1998 manuell 90 golf 150000 12 diesel volkswagen nein 2016-03-16 00:00:00 53474 2016-04-07 03:17:32
8 2016-03-22 16:51:34 Seat_Arosa privat Angebot 250 test NaN 2000 manuell 0 arosa 150000 10 NaN seat nein 2016-03-22 00:00:00 7426 2016-03-26 18:18:10
9 2016-03-16 13:47:02 Renault_Megane_Scenic_1.6e_RT_Klimaanlage privat Angebot 590 control bus 1997 manuell 90 megane 150000 7 benzin renault nein 2016-03-16 00:00:00 15749 2016-04-06 10:46:35
11 2016-03-16 18:45:34 Mercedes_A140_Motorschaden privat Angebot 350 control NaN 2000 NaN 0 NaN 150000 0 benzin mercedes_benz NaN 2016-03-16 00:00:00 17498 2016-03-16 18:45:34
12 2016-03-31 19:48:22 Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... privat Angebot 5299 control kleinwagen 2010 automatik 71 fortwo 50000 9 benzin smart nein 2016-03-31 00:00:00 34590 2016-04-06 14:17:52
13 2016-03-23 10:48:32 Audi_A3_1.6_tuning privat Angebot 1350 control limousine 1999 manuell 101 a3 150000 11 benzin audi nein 2016-03-23 00:00:00 12043 2016-04-01 14:17:13
14 2016-03-23 11:50:46 Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... privat Angebot 3999 test kleinwagen 2007 manuell 75 clio 150000 9 benzin renault NaN 2016-03-23 00:00:00 81737 2016-04-01 15:46:47
15 2016-04-01 12:06:20 Corvette_C3_Coupe_T_Top_Crossfire_Injection privat Angebot 18900 test coupe 1982 automatik 203 NaN 80000 6 benzin sonstige_autos nein 2016-04-01 00:00:00 61276 2016-04-02 21:10:48
16 2016-03-16 14:59:02 Opel_Vectra_B_Kombi privat Angebot 350 test kombi 1999 manuell 101 vectra 150000 5 benzin opel nein 2016-03-16 00:00:00 57299 2016-03-18 05:29:37
17 2016-03-29 11:46:22 Volkswagen_Scirocco_2_G60 privat Angebot 5500 test coupe 1990 manuell 205 scirocco 150000 6 benzin volkswagen nein 2016-03-29 00:00:00 74821 2016-04-05 20:46:26
18 2016-03-26 19:57:44 Verkaufen_mein_bmw_e36_320_i_touring privat Angebot 300 control bus 1995 manuell 150 3er 150000 0 benzin bmw NaN 2016-03-26 00:00:00 54329 2016-04-02 12:16:41
19 2016-03-17 13:36:21 mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 privat Angebot 4150 control suv 2004 manuell 124 andere 150000 2 lpg mazda nein 2016-03-17 00:00:00 40878 2016-03-17 14:45:58
20 2016-03-05 19:57:31 Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... privat Angebot 3500 test kombi 2003 manuell 131 a4 150000 5 diesel audi NaN 2016-03-05 00:00:00 53913 2016-03-07 05:46:46
21 2016-03-06 19:07:10 Porsche_911_Carrera_4S_Cabrio privat Angebot 41500 test cabrio 2004 manuell 320 911 150000 4 benzin porsche nein 2016-03-06 00:00:00 65428 2016-04-05 23:46:19
22 2016-03-28 20:50:54 MINI_Cooper_S_Cabrio privat Angebot 25450 control cabrio 2015 manuell 184 cooper 10000 1 benzin mini nein 2016-03-28 00:00:00 44789 2016-04-01 06:45:30
23 2016-03-10 19:55:34 Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima privat Angebot 7999 control bus 2010 manuell 120 NaN 150000 2 diesel peugeot nein 2016-03-10 00:00:00 30900 2016-03-17 08:45:17
24 2016-04-03 11:57:02 BMW_535i_xDrive_Sport_Aut. privat Angebot 48500 control limousine 2014 automatik 306 5er 30000 12 benzin bmw nein 2016-04-03 00:00:00 22547 2016-04-07 13:16:50
25 2016-03-21 21:56:18 Ford_escort_kombi_an_bastler_mit_ghia_ausstattung privat Angebot 90 control kombi 1996 manuell 116 NaN 150000 4 benzin ford ja 2016-03-21 00:00:00 27574 2016-04-01 05:16:49
26 2016-04-03 22:46:28 Volkswagen_Polo_Fox privat Angebot 777 control kleinwagen 1992 manuell 54 polo 125000 2 benzin volkswagen nein 2016-04-03 00:00:00 38110 2016-04-05 23:46:48
28 2016-03-19 21:56:19 MINI_Cooper_D privat Angebot 5250 control kleinwagen 2007 manuell 110 cooper 150000 7 diesel mini ja 2016-03-19 00:00:00 15745 2016-04-07 14:58:48
29 2016-04-02 12:45:44 Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... privat Angebot 4999 test kombi 2004 automatik 204 e_klasse 150000 10 diesel mercedes_benz nein 2016-04-02 00:00:00 47638 2016-04-02 12:45:44
30 2016-03-14 11:47:31 Peugeot_206_Unfallfahrzeug privat Angebot 80 test kleinwagen 2002 manuell 60 2_reihe 150000 6 benzin peugeot ja 2016-03-14 00:00:00 57076 2016-03-14 11:47:31
31 2016-03-14 16:53:09 Noch_gut_erhaltenen_C_320 privat Angebot 2850 test kombi 2002 automatik 218 c_klasse 150000 7 benzin mercedes_benz nein 2016-03-14 00:00:00 41065 2016-03-16 07:19:04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49968 2016-04-01 17:49:15 Mercedes_Benz_190_D_2.5_Automatik privat Angebot 2100 test limousine 1986 automatik 90 andere 150000 9 diesel mercedes_benz nein 2016-04-01 00:00:00 40227 2016-04-05 13:16:35
49969 2016-03-17 18:49:02 Nissan_X_Trail_2.2_dCi_4x4_Sport_m.AHZ privat Angebot 4500 control suv 2005 manuell 136 x_trail 150000 5 diesel nissan nein 2016-03-17 00:00:00 17379 2016-03-25 23:18:15
49970 2016-03-21 22:47:37 c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... privat Angebot 15800 control bus 2010 automatik 136 c4 60000 4 diesel citroen nein 2016-03-21 00:00:00 14947 2016-04-07 04:17:34
49971 2016-03-29 14:54:12 W.Lupo_1.0 privat Angebot 950 test kleinwagen 2001 manuell 50 lupo 150000 4 benzin volkswagen nein 2016-03-29 00:00:00 65197 2016-03-29 20:41:51
49972 2016-03-26 22:25:23 Mercedes_Benz_Vito_115_CDI_Extralang_Aut. privat Angebot 3300 control bus 2004 automatik 150 vito 150000 10 diesel mercedes_benz ja 2016-03-26 00:00:00 65326 2016-03-28 11:28:18
49973 2016-03-27 05:32:39 Mercedes_Benz_SLK_200_Kompressor privat Angebot 6000 control cabrio 2004 manuell 163 slk 150000 11 benzin mercedes_benz nein 2016-03-27 00:00:00 53567 2016-03-27 08:25:24
49975 2016-03-27 20:51:39 Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort privat Angebot 9700 control kleinwagen 2012 automatik 88 jazz 100000 11 hybrid honda nein 2016-03-27 00:00:00 84385 2016-04-05 19:45:34
49976 2016-03-19 18:56:05 Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... privat Angebot 5900 test kombi 1992 automatik 150 80 150000 12 benzin audi nein 2016-03-19 00:00:00 36100 2016-04-07 06:16:44
49977 2016-03-31 18:37:18 Mercedes_Benz_C200_Cdi_W203 privat Angebot 5500 control limousine 2003 manuell 116 c_klasse 150000 2 diesel mercedes_benz nein 2016-03-31 00:00:00 33739 2016-04-06 12:16:11
49978 2016-04-04 10:37:14 Mercedes_Benz_E_200_Classic privat Angebot 900 control limousine 1996 automatik 136 e_klasse 150000 9 benzin mercedes_benz ja 2016-04-04 00:00:00 24405 2016-04-06 12:44:20
49979 2016-03-20 18:38:40 Volkswagen_Polo_1.6_TDI_Style privat Angebot 11000 test kleinwagen 2011 manuell 90 polo 70000 11 diesel volkswagen nein 2016-03-20 00:00:00 48455 2016-04-07 01:45:12
49980 2016-03-12 10:55:54 Ford_Escort_Turnier_16V privat Angebot 400 control kombi 1995 manuell 105 escort 125000 3 benzin ford NaN 2016-03-12 00:00:00 56218 2016-04-06 17:16:49
49981 2016-03-15 09:38:21 Opel_Astra_Kombi_mit_Anhaengerkupplung privat Angebot 2000 control kombi 1998 manuell 115 astra 150000 12 benzin opel nein 2016-03-15 00:00:00 86859 2016-04-05 17:21:46
49982 2016-03-29 18:51:08 Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm privat Angebot 1950 control kleinwagen 2004 manuell 0 fabia 90000 7 benzin skoda NaN 2016-03-29 00:00:00 45884 2016-03-29 18:51:08
49983 2016-03-06 12:43:04 Ford_focus_99 privat Angebot 600 test kleinwagen 1999 manuell 101 focus 150000 4 benzin ford NaN 2016-03-06 00:00:00 52477 2016-03-09 06:16:08
49985 2016-04-02 16:38:23 Verkaufe_meinen_vw_vento! privat Angebot 1000 control NaN 1995 automatik 0 NaN 150000 0 benzin volkswagen NaN 2016-04-02 00:00:00 30900 2016-04-06 15:17:52
49986 2016-04-04 20:46:02 Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... privat Angebot 15900 control limousine 2010 automatik 218 300c 125000 11 diesel chrysler nein 2016-04-04 00:00:00 73527 2016-04-06 23:16:00
49987 2016-03-22 20:47:27 Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... privat Angebot 21990 control limousine 2013 manuell 150 a3 50000 11 diesel audi nein 2016-03-22 00:00:00 94362 2016-03-26 22:46:06
49988 2016-03-28 19:49:51 BMW_330_Ci privat Angebot 9550 control coupe 2001 manuell 231 3er 150000 10 benzin bmw nein 2016-03-28 00:00:00 83646 2016-04-07 02:17:40
49989 2016-03-11 19:50:37 VW_Polo_zum_Ausschlachten_oder_Wiederaufbau privat Angebot 150 test kleinwagen 1997 manuell 0 polo 150000 5 benzin volkswagen ja 2016-03-11 00:00:00 21244 2016-03-12 10:17:55
49990 2016-03-21 19:54:19 Mercedes_Benz_A_200__BlueEFFICIENCY__Urban privat Angebot 17500 test limousine 2012 manuell 156 a_klasse 30000 12 benzin mercedes_benz nein 2016-03-21 00:00:00 58239 2016-04-06 22:46:57
49991 2016-03-06 15:25:19 Kleinwagen privat Angebot 500 control NaN 2016 manuell 0 twingo 150000 0 benzin renault NaN 2016-03-06 00:00:00 61350 2016-03-06 18:24:19
49992 2016-03-10 19:37:38 Fiat_Grande_Punto_1.4_T_Jet_16V_Sport privat Angebot 4800 control kleinwagen 2009 manuell 120 andere 125000 9 lpg fiat nein 2016-03-10 00:00:00 68642 2016-03-13 01:44:51
49993 2016-03-15 18:47:35 Audi_A3__1_8l__Silber;_schoenes_Fahrzeug privat Angebot 1650 control kleinwagen 1997 manuell 0 NaN 150000 7 benzin audi NaN 2016-03-15 00:00:00 65203 2016-04-06 19:46:53
49994 2016-03-22 17:36:42 Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... privat Angebot 5000 control kombi 2001 automatik 299 a6 150000 1 benzin audi nein 2016-03-22 00:00:00 46537 2016-04-06 08:16:39
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon privat Angebot 24900 control limousine 2011 automatik 239 q5 100000 1 diesel audi nein 2016-03-27 00:00:00 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... privat Angebot 1980 control cabrio 1996 manuell 75 astra 150000 5 benzin opel nein 2016-03-28 00:00:00 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge privat Angebot 13200 test cabrio 2014 automatik 69 500 5000 11 benzin fiat nein 2016-04-02 00:00:00 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition privat Angebot 22900 control kombi 2013 manuell 150 a3 40000 11 diesel audi nein 2016-03-08 00:00:00 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V privat Angebot 1250 control limousine 1996 manuell 101 vectra 150000 1 benzin opel nein 2016-03-13 00:00:00 45897 2016-04-06 21:18:48

45979 rows × 19 columns

As the cell below shows, the number of entries has been reduced from 46680 to 45979 after removing 701 illogical entries.

In [58]:
n_after_removal = autos.shape[0]
print('The number of entries after removal of illogical registration months:   ', n_after_removal)
print('The number of deleted entries:    ', n_before_removal - n_after_removal)
The number of entries after removal of illogical registration months:    45979
The number of deleted entries:     701

When we check the entries for registration year 2016, we obtain only entries with registration months 0 to 3. This confirms that the entries with illogical registration months have been removed.

In [59]:
# Verify the deletion
autos[autos['registration_year'] == 2016]
Out[59]:
date_crawled name seller offer_type price_$ abtest vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created postal_code last_seen
59 2016-03-17 17:50:54 Mercedes_A_Klasse_W_168__A_140_gruen privat Angebot 700 control NaN 2016 manuell 0 a_klasse 150000 0 benzin mercedes_benz NaN 2016-03-17 00:00:00 95356 2016-03-19 17:46:47
76 2016-03-22 14:52:57 BMW_318i_neustes_Model_0Km privat Angebot 31999 control limousine 2016 manuell 136 3er 5000 2 benzin bmw NaN 2016-03-22 00:00:00 45149 2016-04-06 05:15:42
101 2016-03-22 17:51:49 Schnaepchen_in_einem_Jahr_OLDTEIMER_KENNZEICHEN privat Angebot 600 control NaN 2016 NaN 0 corolla 150000 0 NaN toyota NaN 2016-03-22 00:00:00 32825 2016-03-22 17:51:49
227 2016-04-04 17:54:21 Als_Bastler_Fahrzeug privat Angebot 300 test NaN 2016 NaN 0 seicento 125000 0 benzin fiat NaN 2016-04-04 00:00:00 25924 2016-04-06 19:48:31
343 2016-03-08 21:45:54 mitsubishi_colt privat Angebot 600 control NaN 2016 manuell 75 colt 150000 0 benzin mitsubishi NaN 2016-03-08 00:00:00 42929 2016-03-19 07:15:57
374 2016-03-15 18:42:52 Polo_9N_1_9_Tdi_Gti_Optik privat Angebot 3850 control NaN 2016 manuell 101 polo 150000 0 diesel volkswagen NaN 2016-03-15 00:00:00 21762 2016-03-20 15:17:50
452 2016-03-08 10:52:14 Blauer_Golf_III_schoenes_Anfaenger_Auto privat Angebot 800 test NaN 2016 manuell 75 golf 150000 3 benzin volkswagen NaN 2016-03-08 00:00:00 29640 2016-03-10 23:17:57
479 2016-03-08 07:55:41 Renault_R4_TL_Savanne privat Angebot 6450 test NaN 2016 NaN 0 andere 70000 0 NaN renault NaN 2016-03-08 00:00:00 57462 2016-04-06 06:46:24
696 2016-03-08 03:55:14 Audi_a6_avant_2.5_tdi_v6 privat Angebot 2500 control NaN 2016 NaN 0 a6 40000 3 diesel audi nein 2016-03-08 00:00:00 49577 2016-03-12 22:17:43
1012 2016-03-05 21:49:36 Opel_Agila_1.0 privat Angebot 290 test NaN 2016 manuell 0 agila 150000 0 benzin opel NaN 2016-03-05 00:00:00 50765 2016-03-06 03:45:36
1124 2016-03-28 23:39:33 opel_corsa_A privat Angebot 300 test NaN 2016 NaN 45 corsa 40000 0 NaN opel NaN 2016-03-28 00:00:00 16321 2016-03-30 16:45:27
1133 2016-03-14 16:47:42 Fiat_Panda_HP_100_Klimaautomatic!! privat Angebot 4790 test NaN 2016 manuell 101 panda 80000 0 NaN fiat NaN 2016-03-14 00:00:00 94113 2016-03-23 17:16:40
1135 2016-03-19 12:45:25 Renault_twingo_1.3 privat Angebot 150 test NaN 2016 manuell 54 twingo 150000 0 NaN renault NaN 2016-03-19 00:00:00 6642 2016-04-06 16:44:58
1306 2016-04-01 20:37:43 Peugeot_607_HDI_2.2_mit_gruener_Plakette privat Angebot 2199 test NaN 2016 automatik 133 NaN 150000 0 diesel peugeot NaN 2016-04-01 00:00:00 44147 2016-04-03 17:18:48
1530 2016-03-25 15:50:38 Passat_35i_1_9_td privat Angebot 850 test NaN 2016 NaN 75 passat 150000 2 diesel volkswagen NaN 2016-03-25 00:00:00 37318 2016-04-06 20:16:09
1555 2016-03-17 01:00:21 Audi_Cabriolet____A5___NAVI_.XEN_._ALCANTARA.S... privat Angebot 37450 test cabrio 2016 manuell 177 andere 5000 3 benzin audi nein 2016-03-17 00:00:00 95444 2016-03-17 07:38:03
1562 2016-03-19 20:54:16 RENAULT_TWINGO___ERST_3_MONATE_ALT_UND_2000_KM... privat Angebot 6900 control kleinwagen 2016 manuell 71 twingo 5000 1 benzin renault NaN 2016-03-19 00:00:00 41238 2016-03-19 21:42:49
1632 2016-03-29 23:48:06 Fiat_Idea_Mini_Van privat Angebot 1990 control NaN 2016 manuell 80 andere 150000 2 NaN fiat nein 2016-03-29 00:00:00 27612 2016-04-06 13:16:55
1649 2016-03-25 13:43:29 Opel_Corsa_SPORT_GSi_16V_106_PS_mit_Ganzleder_... privat Angebot 380 test NaN 2016 manuell 106 corsa 150000 0 NaN opel NaN 2016-03-25 00:00:00 56812 2016-03-28 12:45:58
1866 2016-03-07 17:42:12 MICROCAR_M.GO_Dynamic_Mopedauto_Leichtkraftfah... privat Angebot 11000 test kleinwagen 2016 automatik 5 NaN 5000 3 diesel sonstige_autos nein 2016-03-07 00:00:00 19288 2016-03-29 19:18:13
1956 2016-03-07 15:46:32 Volkswagen_Passat_1_9_tdi privat Angebot 2700 control NaN 2016 NaN 101 passat 5000 0 diesel volkswagen nein 2016-03-07 00:00:00 37581 2016-03-29 05:18:08
1988 2016-03-17 20:36:21 Polo_9n3_1.6l_105PS_Automatik privat Angebot 6500 control NaN 2016 automatik 105 polo 125000 0 benzin volkswagen NaN 2016-03-17 00:00:00 37197 2016-03-19 20:47:48
2278 2016-03-15 18:53:59 Bmw_318_Coupe_M_Original privat Angebot 2600 control NaN 2016 automatik 0 3er 150000 0 benzin bmw nein 2016-03-15 00:00:00 12049 2016-03-15 18:53:59
2280 2016-03-31 17:51:37 Ford_fiesta privat Angebot 2350 control NaN 2016 manuell 60 NaN 150000 3 NaN ford NaN 2016-03-31 00:00:00 92360 2016-04-06 11:44:27
2516 2016-03-15 09:57:39 Gutes_Auto privat Angebot 269 control NaN 2016 NaN 0 1_reihe 20000 3 NaN peugeot nein 2016-03-15 00:00:00 71522 2016-03-17 03:44:32
2534 2016-03-07 09:57:07 Golf_1_Cabrio privat Angebot 1900 control NaN 2016 manuell 98 golf 150000 1 NaN volkswagen NaN 2016-03-07 00:00:00 84048 2016-03-11 10:18:05
2583 2016-03-16 19:55:27 BMW_E46_Touring_320d_150ps_Leder_Tempomat privat Angebot 4500 test NaN 2016 manuell 0 3er 150000 0 diesel bmw NaN 2016-03-16 00:00:00 17379 2016-04-07 05:45:50
2596 2016-03-10 18:49:15 Peugeot_206_75_Klima__8__fach_bereift_Scheckhe... privat Angebot 1599 test NaN 2016 manuell 75 2_reihe 125000 1 NaN peugeot NaN 2016-03-10 00:00:00 71126 2016-03-15 12:44:36
2815 2016-03-31 10:53:50 Volkswagen_Golf_4_unfall privat Angebot 600 control NaN 2016 manuell 0 NaN 150000 0 benzin volkswagen NaN 2016-03-31 00:00:00 49565 2016-03-31 11:40:58
3121 2016-03-08 10:52:51 Ford_Focus_Zweitwagen_Anfaengerauto_VW privat Angebot 1999 test NaN 2016 manuell 101 focus 150000 2 benzin ford nein 2016-03-08 00:00:00 8107 2016-03-26 00:44:35
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
46586 2016-03-21 16:54:39 VW_Golf_VI_zu_verkaufen!!! privat Angebot 950 control NaN 2016 NaN 74 golf 150000 0 benzin volkswagen NaN 2016-03-21 00:00:00 28309 2016-03-21 16:54:39
46664 2016-03-12 07:36:59 BMW_323_170ps_Leder_Automatik privat Angebot 3100 control NaN 2016 automatik 170 NaN 150000 0 NaN bmw nein 2016-03-12 00:00:00 49393 2016-03-16 06:17:14
46782 2016-03-23 16:47:47 Audi_A4_Avant_S_line_S4_Optik_S_tronic_LED_Matrex privat Angebot 49900 test kombi 2016 automatik 190 a4 5000 3 diesel audi nein 2016-03-23 00:00:00 34355 2016-04-05 20:17:49
46977 2016-03-09 14:50:31 Auto_zu_verkaufen privat Angebot 1200 control NaN 2016 manuell 120 NaN 150000 0 NaN peugeot NaN 2016-03-09 00:00:00 63225 2016-03-11 21:15:15
46994 2016-04-03 15:50:56 opel_corsa_B_1.2 privat Angebot 400 test NaN 2016 NaN 0 corsa 150000 0 benzin opel NaN 2016-04-03 00:00:00 26897 2016-04-05 15:16:17
46998 2016-03-11 08:37:34 MINI_One_Clubman privat Angebot 19999 test kombi 2016 manuell 102 clubman 5000 3 benzin mini nein 2016-03-11 00:00:00 63594 2016-03-12 11:45:22
47249 2016-03-12 23:57:17 Opel_corsa privat Angebot 1800 control NaN 2016 NaN 0 corsa 150000 0 benzin opel NaN 2016-03-12 00:00:00 89340 2016-03-16 14:45:56
47270 2016-04-02 14:51:55 Golf_4_Gti_1.8t privat Angebot 16000 test NaN 2016 manuell 337 golf 125000 0 benzin volkswagen nein 2016-04-02 00:00:00 59505 2016-04-06 13:17:44
47414 2016-03-15 03:03:07 Verkaufe_Honda_Civic privat Angebot 5600 test NaN 2016 manuell 0 civic 150000 2 benzin honda nein 2016-03-15 00:00:00 75378 2016-03-18 01:15:49
47539 2016-03-14 18:25:17 1_Hand_Civic_mit_Rest_Tuev privat Angebot 666 control NaN 2016 manuell 90 civic 125000 0 NaN honda NaN 2016-03-14 00:00:00 60386 2016-03-14 18:46:25
47671 2016-03-11 13:37:40 Opel_corsa_b_1.0 privat Angebot 300 test NaN 2016 manuell 0 corsa 150000 0 NaN opel NaN 2016-03-11 00:00:00 97070 2016-03-22 02:44:22
47832 2016-03-23 14:49:36 Opel_Vivaro_1.6_BiTurbo_2xSchiebetuer_KeyLess_... privat Angebot 31799 test bus 2016 manuell 145 vivaro 5000 2 diesel opel nein 2016-03-23 00:00:00 1328 2016-04-05 16:47:40
48254 2016-03-17 15:51:12 Ford_Focus_2_Tdci privat Angebot 6500 test NaN 2016 manuell 136 focus 90000 3 diesel ford nein 2016-03-17 00:00:00 39418 2016-03-23 02:15:28
48275 2016-03-08 19:37:00 FIAT_PUNTO_188_..._60_PS privat Angebot 1800 test NaN 2016 manuell 60 punto 125000 1 NaN fiat nein 2016-03-08 00:00:00 45149 2016-04-05 13:17:31
48437 2016-03-16 21:55:59 Golf_4_EDITION_1.4 privat Angebot 500 control NaN 2016 manuell 75 golf 150000 1 benzin volkswagen ja 2016-03-16 00:00:00 50169 2016-03-26 04:16:51
48559 2016-03-14 08:53:15 Audi_A3_Automatik privat Angebot 500 test NaN 2016 automatik 0 a3 150000 0 benzin audi ja 2016-03-14 00:00:00 54329 2016-03-15 23:15:30
48656 2016-03-28 01:54:01 Vectra_b_Bastler_Fahrzeug___auch_Tausch_ privat Angebot 250 test NaN 2016 manuell 101 NaN 150000 0 NaN opel nein 2016-03-27 00:00:00 49584 2016-04-06 09:17:39
48768 2016-04-03 18:54:34 Golf_4_mit_Anhaengerkupplung privat Angebot 1150 control NaN 2016 manuell 75 golf 150000 2 NaN volkswagen nein 2016-04-03 00:00:00 13593 2016-04-05 19:19:19
48923 2016-03-21 15:55:27 Suche_ein_golf_4_1.6 privat Angebot 1000 control NaN 2016 manuell 0 golf 150000 0 benzin volkswagen NaN 2016-03-21 00:00:00 32120 2016-03-22 20:47:39
49055 2016-04-03 11:38:57 VW_polo_1.0_1999_bj._Motor__schaden!! privat Angebot 250 control NaN 2016 manuell 50 polo 150000 0 benzin volkswagen NaN 2016-04-03 00:00:00 57439 2016-04-07 12:44:45
49058 2016-03-18 08:37:34 Polo_6n_1.4_167Tsd._KM_TÜV_Bis_April privat Angebot 320 test NaN 2016 NaN 0 polo 150000 0 benzin volkswagen NaN 2016-03-18 00:00:00 33129 2016-04-05 22:15:50
49189 2016-03-24 17:37:23 Skoda_Fabia_Combi_1.2_TSI_Ambition privat Angebot 11995 control kombi 2016 manuell 90 fabia 5000 3 benzin skoda nein 2016-03-24 00:00:00 82229 2016-04-07 10:17:21
49324 2016-04-01 16:36:34 Mercedes_Benz_Vito_111_BlueTEC_Tourer_Kompakt_PRO privat Angebot 29500 control kombi 2016 manuell 114 vito 5000 2 diesel mercedes_benz nein 2016-04-01 00:00:00 35037 2016-04-07 12:44:36
49340 2016-03-05 19:52:28 Nagelneuer_Adam__viele_Extras__aus_Gewinn._NIE... privat Angebot 12990 test kleinwagen 2016 manuell 87 andere 5000 3 benzin opel nein 2016-03-03 00:00:00 48249 2016-03-22 19:18:46
49477 2016-03-25 14:48:00 Opel_Omega_B_Caravan privat Angebot 450 control NaN 2016 automatik 136 omega 150000 0 NaN opel NaN 2016-03-25 00:00:00 15713 2016-04-06 17:47:06
49686 2016-03-21 18:43:24 Ford_Escort_Combi_Bastler privat Angebot 700 test NaN 2016 manuell 90 escort 150000 0 NaN ford ja 2016-03-21 00:00:00 12351 2016-03-22 23:44:27
49725 2016-03-26 12:58:05 VW_Passat_2.8l_V6_Syncro_AHK privat Angebot 999 control NaN 2016 automatik 193 passat 150000 0 NaN volkswagen ja 2016-03-26 00:00:00 82343 2016-03-26 13:42:13
49728 2016-03-25 16:00:11 Ford_Focus_1.4_16V_Bastlerauto privat Angebot 1 test NaN 2016 manuell 0 focus 150000 0 benzin ford ja 2016-03-25 00:00:00 34587 2016-03-25 16:40:21
49750 2016-03-29 22:37:59 Volkswagen_Polo_6n privat Angebot 220 test NaN 2016 NaN 60 NaN 150000 0 NaN volkswagen ja 2016-03-29 00:00:00 67433 2016-04-02 12:47:26
49991 2016-03-06 15:25:19 Kleinwagen privat Angebot 500 control NaN 2016 manuell 0 twingo 150000 0 benzin renault NaN 2016-03-06 00:00:00 61350 2016-03-06 18:24:19

519 rows × 19 columns
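A more compact way to run the same check is to count the registration months that remain for 2016 instead of scanning the full table. The sketch below uses a small hypothetical frame (`sample`) whose column names match `autos`; the rows are made up for illustration only.

```python
import pandas as pd

# Hypothetical rows; only the column names match the cleaned `autos` frame.
sample = pd.DataFrame({
    'registration_year':  [2016, 2016, 2016, 2015, 2016],
    'registration_month': [0, 2, 3, 11, 0],
})

# Count how often each registration month occurs among 2016 registrations.
months_2016 = (
    sample.loc[sample['registration_year'] == 2016, 'registration_month']
    .value_counts()
    .sort_index()
)
print(months_2016)

# The largest month still present for 2016 should not exceed 3.
print(months_2016.index.max())
```

Replacing `sample` with `autos` gives one line per remaining month, which is easier to read than 519 rows.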

Exploring Price and Mileage by Brand

Now, we are going to analyze the price and mileage of cars by brand.

First, we list the unique values in the brand column, then rank the brands by their share of listings using value_counts():

In [60]:
print('Unique brand:')
print(autos['brand'].unique())
print('\n')
print('The number of unique brand:   ', len(autos['brand'].unique()))
Unique brand:
['peugeot' 'bmw' 'volkswagen' 'smart' 'ford' 'chrysler' 'seat' 'renault'
 'mercedes_benz' 'audi' 'sonstige_autos' 'opel' 'mazda' 'porsche' 'mini'
 'toyota' 'dacia' 'nissan' 'jeep' 'saab' 'volvo' 'mitsubishi' 'jaguar'
 'fiat' 'skoda' 'subaru' 'kia' 'citroen' 'chevrolet' 'hyundai' 'honda'
 'daewoo' 'suzuki' 'trabant' 'land_rover' 'alfa_romeo' 'lada' 'rover'
 'daihatsu' 'lancia']


The number of unique brand:    40
In [61]:
# Display the percentage of values counts of all car brands
autos['brand'].value_counts(normalize=True).sort_values(ascending=False)*100
Out[61]:
volkswagen        21.066139
bmw               11.065921
opel              10.694012
mercedes_benz      9.687031
audi               8.699624
ford               6.972748
renault            4.708671
peugeot            2.973096
fiat               2.562039
seat               1.813872
skoda              1.650754
nissan             1.522434
mazda              1.513735
smart              1.409339
citroen            1.394115
toyota             1.272320
hyundai            1.000457
sonstige_autos     0.989582
volvo              0.919985
mini               0.880837
mitsubishi         0.822114
honda              0.782966
kia                0.715544
alfa_romeo         0.663346
porsche            0.617673
suzuki             0.591574
chevrolet          0.572000
chrysler           0.350160
dacia              0.263164
daihatsu           0.252289
jeep               0.228365
subaru             0.213141
land_rover         0.210966
saab               0.167468
jaguar             0.158768
daewoo             0.152243
trabant            0.141369
rover              0.132669
lancia             0.108745
lada               0.058722
Name: brand, dtype: float64

Out of the 40 unique brands, we decided to aggregate the top 10 brands for our analysis. Their individual shares range from 1.81% to 21.07%, and together they account for about 80% of the listings. The brands that we have chosen are shown below:

In [62]:
# Display the percentage of values counts of the top 10 car brands
autos['brand'].value_counts(normalize=True).sort_values(ascending=False).head(10)*100
Out[62]:
volkswagen       21.066139
bmw              11.065921
opel             10.694012
mercedes_benz     9.687031
audi              8.699624
ford              6.972748
renault           4.708671
peugeot           2.973096
fiat              2.562039
seat              1.813872
Name: brand, dtype: float64
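The combined share of these ten brands can be checked by summing the percentages above. This is a minimal sketch that copies the values from the output into a new series (`top_10_share` is a name introduced here for illustration).

```python
import pandas as pd

# Shares (in %) copied from the output above.
top_10_share = pd.Series({
    'volkswagen': 21.066139,
    'bmw': 11.065921,
    'opel': 10.694012,
    'mercedes_benz': 9.687031,
    'audi': 8.699624,
    'ford': 6.972748,
    'renault': 4.708671,
    'peugeot': 2.973096,
    'fiat': 2.562039,
    'seat': 1.813872,
})

# Combined share of the top 10 brands, in percent.
total_share = top_10_share.sum()
print(round(total_share, 2))  # 80.24
```

On the full dataframe, the same figure should follow from `autos['brand'].value_counts(normalize=True).head(10).sum() * 100`.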

We use aggregation to understand the brand column. We start by storing the names of the top 10 brands in the variable top_10_brands:

In [63]:
# Group the top 10 unique brands
top_10_brands = autos['brand'].value_counts(normalize=True).sort_values(ascending=False).head(10).index

print(top_10_brands)
Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford', 'renault',
       'peugeot', 'fiat', 'seat'],
      dtype='object')

Next, we loop over the brand names in top_10_brands and select the rows of the dataframe that belong to each brand. For each brand, we compute brand_mean_prices and brand_mean_mileage and add them to the mean_prices_dict and mean_mileage_dict dictionaries, respectively.

In [64]:
# Dictionaries to hold the aggregate data
mean_prices_dict = {}
mean_mileage_dict = {}

# Loop over the top 10 unique brands
for brand_name in top_10_brands:
    brand_groups = autos[autos['brand'] == brand_name]    # Select the rows of the current brand

    # Mean price of the brand, truncated to an integer
    brand_mean_prices = int(brand_groups['price_$'].mean())
    mean_prices_dict[brand_name] = brand_mean_prices    # key = brand_name; value = brand_mean_prices

    # Mean mileage of the brand, truncated to an integer
    brand_mean_mileage = int(brand_groups['odometer_km'].mean())
    mean_mileage_dict[brand_name] = brand_mean_mileage    # key = brand_name; value = brand_mean_mileage

print('Mean price dictionary:', '\n', mean_prices_dict)
print('\n')
print('Mean mileage dictionary:', '\n', mean_mileage_dict)
Mean price dictionary: 
 {'volkswagen': 5450, 'bmw': 8378, 'opel': 2995, 'mercedes_benz': 8688, 'audi': 9390, 'ford': 3795, 'renault': 2496, 'peugeot': 3109, 'fiat': 2833, 'seat': 4447}


Mean mileage dictionary: 
 {'volkswagen': 128526, 'bmw': 132466, 'opel': 129221, 'mercedes_benz': 130638, 'audi': 129041, 'ford': 124042, 'renault': 127886, 'peugeot': 126850, 'fiat': 116943, 'seat': 120803}
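The same aggregation can also be written without an explicit loop by combining `isin()` with `groupby()`. The sketch below is an alternative, not the approach used in this project; it runs on a small hypothetical frame (`sample`) with made-up rows, and `top_brands` stands in for `top_10_brands`.

```python
import pandas as pd

# Hypothetical listings; column names follow the cleaned `autos` frame.
sample = pd.DataFrame({
    'brand':       ['bmw', 'bmw', 'opel', 'opel', 'lada'],
    'price_$':     [8000, 9000, 3000, 2500, 1000],
    'odometer_km': [150000, 125000, 150000, 100000, 50000],
})
top_brands = ['bmw', 'opel']  # stands in for `top_10_brands`

brand_means = (
    sample[sample['brand'].isin(top_brands)]    # keep only the selected brands
    .groupby('brand')[['price_$', 'odometer_km']]
    .mean()
    .astype(int)                                # truncate, like int() in the loop
)
print(brand_means)
```

This yields both means in one dataframe indexed by brand, so the intermediate dictionaries are not needed.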

Now, we will investigate whether there is any correlation between the mean price and the mean mileage by brand. To do so, we will:

  • Construct two series objects.
  • Combine the data from both series objects into a single dataframe and display the dataframe directly.
In [65]:
# Construct mean price series
mean_price_series = pd.Series(mean_prices_dict)

# Construct mean mileage series
mean_mileage_series = pd.Series(mean_mileage_dict)

print('** Mean price series **')
print(mean_price_series)
print('\n')

print('** Mean mileage series **')
print(mean_mileage_series)
** Mean price series **
volkswagen       5450
bmw              8378
opel             2995
mercedes_benz    8688
audi             9390
ford             3795
renault          2496
peugeot          3109
fiat             2833
seat             4447
dtype: int64


** Mean mileage series **
volkswagen       128526
bmw              132466
opel             129221
mercedes_benz    130638
audi             129041
ford             124042
renault          127886
peugeot          126850
fiat             116943
seat             120803
dtype: int64
In [66]:
# Construct a dataframe from `mean_price_series`
mean_price_mileage_df = pd.DataFrame(mean_price_series, columns=['mean_price_$'])

# Assign `mean_mileage_series` as a new column in this dataframe
mean_price_mileage_df['mean_mileage_km'] = mean_mileage_series

# Sort the mean price in a descending manner
mean_price_mileage_df.sort_values(['mean_price_$'], ascending=False)
Out[66]:
mean_price_$ mean_mileage_km
audi 9390 129041
mercedes_benz 8688 130638
bmw 8378 132466
volkswagen 5450 128526
seat 4447 120803
ford 3795 124042
peugeot 3109 126850
opel 2995 129221
fiat 2833 116943
renault 2496 127886
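The two series can also be combined in a single step by passing a dictionary of series to `pd.DataFrame`, which aligns them on their shared index (the brand names). A minimal sketch, using only the first three brands from the series printed above:

```python
import pandas as pd

# Means copied from the series printed above (first three brands only).
mean_price_series = pd.Series({'volkswagen': 5450, 'bmw': 8378, 'opel': 2995})
mean_mileage_series = pd.Series({'volkswagen': 128526, 'bmw': 132466, 'opel': 129221})

# Each dictionary key becomes a column name; rows are aligned on the index.
combined = pd.DataFrame({
    'mean_price_$': mean_price_series,
    'mean_mileage_km': mean_mileage_series,
})
print(combined.sort_values('mean_price_$', ascending=False))
```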
In [67]:
# Sort the mean mileage in a descending manner
mean_price_mileage_df.sort_values(['mean_mileage_km'], ascending=False)
Out[67]:
mean_price_$ mean_mileage_km
bmw 8378 132466
mercedes_benz 8688 130638
opel 2995 129221
audi 9390 129041
volkswagen 5450 128526
renault 2496 127886
peugeot 3109 126850
ford 3795 124042
seat 4447 120803
fiat 2833 116943
In [68]:
max_mileage = mean_price_mileage_df['mean_mileage_km'].max()
min_mileage = mean_price_mileage_df['mean_mileage_km'].min()

difference_mileage_percent = round((max_mileage - min_mileage) * 100 / max_mileage, 1) 

print('The difference between the maximum and minimum mileage:   ', difference_mileage_percent, '%')
The difference between the maximum and minimum mileage:    11.7 %

We notice that the top 10 brands fall into three distinct price tiers:

  • More expensive (> 8000): Audi, Mercedes-Benz, BMW
  • Moderately expensive (4000 - 7999): Volkswagen, Seat
  • Less expensive (< 4000): Ford, Peugeot, Opel, Fiat, Renault
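The tiers above can be assigned programmatically with `pd.cut`. This is a sketch under the assumption that the tier boundaries are 4000 and 8000; the label strings are chosen here for illustration, and a mean price of exactly 8000 would land in the top tier.

```python
import numpy as np
import pandas as pd

# Mean prices copied from the series printed earlier.
mean_price_series = pd.Series({
    'volkswagen': 5450, 'bmw': 8378, 'opel': 2995, 'mercedes_benz': 8688,
    'audi': 9390, 'ford': 3795, 'renault': 2496, 'peugeot': 3109,
    'fiat': 2833, 'seat': 4447,
})

# Bin each brand into a tier; right=False gives the intervals
# [0, 4000), [4000, 8000) and [8000, inf).
tiers = pd.cut(
    mean_price_series,
    bins=[0, 4000, 8000, np.inf],
    right=False,
    labels=['less expensive', 'moderately expensive', 'more expensive'],
)
print(tiers)
```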

The difference between the highest and the lowest mean mileage among the top 10 brands is small: only 11.7% of the highest value.

Based on this comparison, we do not observe any clear correlation between the mean price and the mean mileage by brand. In fact, the mean price of a brand is possibly determined by a more complex combination of factors, such as the credibility of the brand, marketing strategy, popularity in certain countries/regions, car specifications, etc.
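One way to put a number on this comparison is the Pearson correlation coefficient between the two columns, which `Series.corr` computes by default. The sketch below rebuilds the table from the values shown above (`brand_means` is a name introduced here). With only ten brand-level averages, the coefficient is a rough indicator and says nothing about the relationship between price and mileage for individual listings.

```python
import pandas as pd

# Brand-level means copied from the dataframe shown above.
brand_means = pd.DataFrame(
    {
        'mean_price_$': [9390, 8688, 8378, 5450, 4447,
                         3795, 3109, 2995, 2833, 2496],
        'mean_mileage_km': [129041, 130638, 132466, 128526, 120803,
                            124042, 126850, 129221, 116943, 127886],
    },
    index=['audi', 'mercedes_benz', 'bmw', 'volkswagen', 'seat',
           'ford', 'peugeot', 'opel', 'fiat', 'renault'],
)

# Pearson correlation between mean price and mean mileage (default method).
r = brand_means['mean_price_$'].corr(brand_means['mean_mileage_km'])
print(round(r, 2))
```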

Conclusion

In this project, we cleaned the used cars dataset from eBay Kleinanzeigen by removing the columns that add no value, the outliers in the price column, and the illogical registration year and month data. We then analyzed the car listings and found:

  • The most frequent odometer value is 150,000 km
  • The distribution of date_crawled is fairly consistent throughout the whole crawling period
  • The peak time for ad_created is March 2016 and the beginning of April 2016
  • The distribution of last_seen percentage is rather consistent throughout the timeframe
  • No correlation between mean price and mean mileage by brand