eBay Used Car Listing Analysis

The aim of this project is to clean data taken from eBay Kleinanzeigen, the section of the German eBay website related used cars. The goal of which will be a detailed analysis of the used car listings in the data set. The data is provided in a CSV file, so the first order of business will be to read the file into the analysis software.

In [1]:
import numpy as np
import pandas as pd

autos = pd.read_csv('autos.csv', encoding='Latin-1')
In [2]:
autos
Out[2]:
dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox powerPS model odometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50
5 2016-03-21 13:47:45 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... privat Angebot $7,900 test bus 2006 automatik 150 voyager 150,000km 4 diesel chrysler NaN 2016-03-21 00:00:00 0 22962 2016-04-06 09:45:21
6 2016-03-20 17:55:21 VW_Golf_III_GT_Special_Electronic_Green_Metall... privat Angebot $300 test limousine 1995 manuell 90 golf 150,000km 8 benzin volkswagen NaN 2016-03-20 00:00:00 0 31535 2016-03-23 02:48:59
7 2016-03-16 18:55:19 Golf_IV_1.9_TDI_90PS privat Angebot $1,990 control limousine 1998 manuell 90 golf 150,000km 12 diesel volkswagen nein 2016-03-16 00:00:00 0 53474 2016-04-07 03:17:32
8 2016-03-22 16:51:34 Seat_Arosa privat Angebot $250 test NaN 2000 manuell 0 arosa 150,000km 10 NaN seat nein 2016-03-22 00:00:00 0 7426 2016-03-26 18:18:10
9 2016-03-16 13:47:02 Renault_Megane_Scenic_1.6e_RT_Klimaanlage privat Angebot $590 control bus 1997 manuell 90 megane 150,000km 7 benzin renault nein 2016-03-16 00:00:00 0 15749 2016-04-06 10:46:35
10 2016-03-15 01:41:36 VW_Golf_Tuning_in_siber/grau privat Angebot $999 test NaN 2017 manuell 90 NaN 150,000km 4 benzin volkswagen nein 2016-03-14 00:00:00 0 86157 2016-04-07 03:16:21
11 2016-03-16 18:45:34 Mercedes_A140_Motorschaden privat Angebot $350 control NaN 2000 NaN 0 NaN 150,000km 0 benzin mercedes_benz NaN 2016-03-16 00:00:00 0 17498 2016-03-16 18:45:34
12 2016-03-31 19:48:22 Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... privat Angebot $5,299 control kleinwagen 2010 automatik 71 fortwo 50,000km 9 benzin smart nein 2016-03-31 00:00:00 0 34590 2016-04-06 14:17:52
13 2016-03-23 10:48:32 Audi_A3_1.6_tuning privat Angebot $1,350 control limousine 1999 manuell 101 a3 150,000km 11 benzin audi nein 2016-03-23 00:00:00 0 12043 2016-04-01 14:17:13
14 2016-03-23 11:50:46 Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... privat Angebot $3,999 test kleinwagen 2007 manuell 75 clio 150,000km 9 benzin renault NaN 2016-03-23 00:00:00 0 81737 2016-04-01 15:46:47
15 2016-04-01 12:06:20 Corvette_C3_Coupe_T_Top_Crossfire_Injection privat Angebot $18,900 test coupe 1982 automatik 203 NaN 80,000km 6 benzin sonstige_autos nein 2016-04-01 00:00:00 0 61276 2016-04-02 21:10:48
16 2016-03-16 14:59:02 Opel_Vectra_B_Kombi privat Angebot $350 test kombi 1999 manuell 101 vectra 150,000km 5 benzin opel nein 2016-03-16 00:00:00 0 57299 2016-03-18 05:29:37
17 2016-03-29 11:46:22 Volkswagen_Scirocco_2_G60 privat Angebot $5,500 test coupe 1990 manuell 205 scirocco 150,000km 6 benzin volkswagen nein 2016-03-29 00:00:00 0 74821 2016-04-05 20:46:26
18 2016-03-26 19:57:44 Verkaufen_mein_bmw_e36_320_i_touring privat Angebot $300 control bus 1995 manuell 150 3er 150,000km 0 benzin bmw NaN 2016-03-26 00:00:00 0 54329 2016-04-02 12:16:41
19 2016-03-17 13:36:21 mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 privat Angebot $4,150 control suv 2004 manuell 124 andere 150,000km 2 lpg mazda nein 2016-03-17 00:00:00 0 40878 2016-03-17 14:45:58
20 2016-03-05 19:57:31 Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... privat Angebot $3,500 test kombi 2003 manuell 131 a4 150,000km 5 diesel audi NaN 2016-03-05 00:00:00 0 53913 2016-03-07 05:46:46
21 2016-03-06 19:07:10 Porsche_911_Carrera_4S_Cabrio privat Angebot $41,500 test cabrio 2004 manuell 320 911 150,000km 4 benzin porsche nein 2016-03-06 00:00:00 0 65428 2016-04-05 23:46:19
22 2016-03-28 20:50:54 MINI_Cooper_S_Cabrio privat Angebot $25,450 control cabrio 2015 manuell 184 cooper 10,000km 1 benzin mini nein 2016-03-28 00:00:00 0 44789 2016-04-01 06:45:30
23 2016-03-10 19:55:34 Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima privat Angebot $7,999 control bus 2010 manuell 120 NaN 150,000km 2 diesel peugeot nein 2016-03-10 00:00:00 0 30900 2016-03-17 08:45:17
24 2016-04-03 11:57:02 BMW_535i_xDrive_Sport_Aut. privat Angebot $48,500 control limousine 2014 automatik 306 5er 30,000km 12 benzin bmw nein 2016-04-03 00:00:00 0 22547 2016-04-07 13:16:50
25 2016-03-21 21:56:18 Ford_escort_kombi_an_bastler_mit_ghia_ausstattung privat Angebot $90 control kombi 1996 manuell 116 NaN 150,000km 4 benzin ford ja 2016-03-21 00:00:00 0 27574 2016-04-01 05:16:49
26 2016-04-03 22:46:28 Volkswagen_Polo_Fox privat Angebot $777 control kleinwagen 1992 manuell 54 polo 125,000km 2 benzin volkswagen nein 2016-04-03 00:00:00 0 38110 2016-04-05 23:46:48
27 2016-03-27 18:45:01 Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE privat Angebot $0 control NaN 2005 NaN 0 NaN 150,000km 0 NaN ford NaN 2016-03-27 00:00:00 0 66701 2016-03-27 18:45:01
28 2016-03-19 21:56:19 MINI_Cooper_D privat Angebot $5,250 control kleinwagen 2007 manuell 110 cooper 150,000km 7 diesel mini ja 2016-03-19 00:00:00 0 15745 2016-04-07 14:58:48
29 2016-04-02 12:45:44 Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... privat Angebot $4,999 test kombi 2004 automatik 204 e_klasse 150,000km 10 diesel mercedes_benz nein 2016-04-02 00:00:00 0 47638 2016-04-02 12:45:44
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49970 2016-03-21 22:47:37 c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... privat Angebot $15,800 control bus 2010 automatik 136 c4 60,000km 4 diesel citroen nein 2016-03-21 00:00:00 0 14947 2016-04-07 04:17:34
49971 2016-03-29 14:54:12 W.Lupo_1.0 privat Angebot $950 test kleinwagen 2001 manuell 50 lupo 150,000km 4 benzin volkswagen nein 2016-03-29 00:00:00 0 65197 2016-03-29 20:41:51
49972 2016-03-26 22:25:23 Mercedes_Benz_Vito_115_CDI_Extralang_Aut. privat Angebot $3,300 control bus 2004 automatik 150 vito 150,000km 10 diesel mercedes_benz ja 2016-03-26 00:00:00 0 65326 2016-03-28 11:28:18
49973 2016-03-27 05:32:39 Mercedes_Benz_SLK_200_Kompressor privat Angebot $6,000 control cabrio 2004 manuell 163 slk 150,000km 11 benzin mercedes_benz nein 2016-03-27 00:00:00 0 53567 2016-03-27 08:25:24
49974 2016-03-20 10:52:31 Golf_1_Cabrio_Tuev_Neu_viele_Extras_alles_eing... privat Angebot $0 control cabrio 1983 manuell 70 golf 150,000km 2 benzin volkswagen nein 2016-03-20 00:00:00 0 8209 2016-03-27 19:48:16
49975 2016-03-27 20:51:39 Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort privat Angebot $9,700 control kleinwagen 2012 automatik 88 jazz 100,000km 11 hybrid honda nein 2016-03-27 00:00:00 0 84385 2016-04-05 19:45:34
49976 2016-03-19 18:56:05 Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... privat Angebot $5,900 test kombi 1992 automatik 150 80 150,000km 12 benzin audi nein 2016-03-19 00:00:00 0 36100 2016-04-07 06:16:44
49977 2016-03-31 18:37:18 Mercedes_Benz_C200_Cdi_W203 privat Angebot $5,500 control limousine 2003 manuell 116 c_klasse 150,000km 2 diesel mercedes_benz nein 2016-03-31 00:00:00 0 33739 2016-04-06 12:16:11
49978 2016-04-04 10:37:14 Mercedes_Benz_E_200_Classic privat Angebot $900 control limousine 1996 automatik 136 e_klasse 150,000km 9 benzin mercedes_benz ja 2016-04-04 00:00:00 0 24405 2016-04-06 12:44:20
49979 2016-03-20 18:38:40 Volkswagen_Polo_1.6_TDI_Style privat Angebot $11,000 test kleinwagen 2011 manuell 90 polo 70,000km 11 diesel volkswagen nein 2016-03-20 00:00:00 0 48455 2016-04-07 01:45:12
49980 2016-03-12 10:55:54 Ford_Escort_Turnier_16V privat Angebot $400 control kombi 1995 manuell 105 escort 125,000km 3 benzin ford NaN 2016-03-12 00:00:00 0 56218 2016-04-06 17:16:49
49981 2016-03-15 09:38:21 Opel_Astra_Kombi_mit_Anhaengerkupplung privat Angebot $2,000 control kombi 1998 manuell 115 astra 150,000km 12 benzin opel nein 2016-03-15 00:00:00 0 86859 2016-04-05 17:21:46
49982 2016-03-29 18:51:08 Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm privat Angebot $1,950 control kleinwagen 2004 manuell 0 fabia 90,000km 7 benzin skoda NaN 2016-03-29 00:00:00 0 45884 2016-03-29 18:51:08
49983 2016-03-06 12:43:04 Ford_focus_99 privat Angebot $600 test kleinwagen 1999 manuell 101 focus 150,000km 4 benzin ford NaN 2016-03-06 00:00:00 0 52477 2016-03-09 06:16:08
49984 2016-03-31 22:48:48 Student_sucht_ein__Anfaengerauto___ab_2000_BJ_... privat Angebot $0 test NaN 2000 NaN 0 NaN 150,000km 0 NaN sonstige_autos NaN 2016-03-31 00:00:00 0 12103 2016-04-02 19:44:53
49985 2016-04-02 16:38:23 Verkaufe_meinen_vw_vento! privat Angebot $1,000 control NaN 1995 automatik 0 NaN 150,000km 0 benzin volkswagen NaN 2016-04-02 00:00:00 0 30900 2016-04-06 15:17:52
49986 2016-04-04 20:46:02 Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... privat Angebot $15,900 control limousine 2010 automatik 218 300c 125,000km 11 diesel chrysler nein 2016-04-04 00:00:00 0 73527 2016-04-06 23:16:00
49987 2016-03-22 20:47:27 Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... privat Angebot $21,990 control limousine 2013 manuell 150 a3 50,000km 11 diesel audi nein 2016-03-22 00:00:00 0 94362 2016-03-26 22:46:06
49988 2016-03-28 19:49:51 BMW_330_Ci privat Angebot $9,550 control coupe 2001 manuell 231 3er 150,000km 10 benzin bmw nein 2016-03-28 00:00:00 0 83646 2016-04-07 02:17:40
49989 2016-03-11 19:50:37 VW_Polo_zum_Ausschlachten_oder_Wiederaufbau privat Angebot $150 test kleinwagen 1997 manuell 0 polo 150,000km 5 benzin volkswagen ja 2016-03-11 00:00:00 0 21244 2016-03-12 10:17:55
49990 2016-03-21 19:54:19 Mercedes_Benz_A_200__BlueEFFICIENCY__Urban privat Angebot $17,500 test limousine 2012 manuell 156 a_klasse 30,000km 12 benzin mercedes_benz nein 2016-03-21 00:00:00 0 58239 2016-04-06 22:46:57
49991 2016-03-06 15:25:19 Kleinwagen privat Angebot $500 control NaN 2016 manuell 0 twingo 150,000km 0 benzin renault NaN 2016-03-06 00:00:00 0 61350 2016-03-06 18:24:19
49992 2016-03-10 19:37:38 Fiat_Grande_Punto_1.4_T_Jet_16V_Sport privat Angebot $4,800 control kleinwagen 2009 manuell 120 andere 125,000km 9 lpg fiat nein 2016-03-10 00:00:00 0 68642 2016-03-13 01:44:51
49993 2016-03-15 18:47:35 Audi_A3__1_8l__Silber;_schoenes_Fahrzeug privat Angebot $1,650 control kleinwagen 1997 manuell 0 NaN 150,000km 7 benzin audi NaN 2016-03-15 00:00:00 0 65203 2016-04-06 19:46:53
49994 2016-03-22 17:36:42 Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... privat Angebot $5,000 control kombi 2001 automatik 299 a6 150,000km 1 benzin audi nein 2016-03-22 00:00:00 0 46537 2016-04-06 08:16:39
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon privat Angebot $24,900 control limousine 2011 automatik 239 q5 100,000km 1 diesel audi nein 2016-03-27 00:00:00 0 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... privat Angebot $1,980 control cabrio 1996 manuell 75 astra 150,000km 5 benzin opel nein 2016-03-28 00:00:00 0 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge privat Angebot $13,200 test cabrio 2014 automatik 69 500 5,000km 11 benzin fiat nein 2016-04-02 00:00:00 0 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition privat Angebot $22,900 control kombi 2013 manuell 150 a3 40,000km 11 diesel audi nein 2016-03-08 00:00:00 0 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V privat Angebot $1,250 control limousine 1996 manuell 101 vectra 150,000km 1 benzin opel nein 2016-03-13 00:00:00 0 45897 2016-04-06 21:18:48

50000 rows × 20 columns

In [3]:
autos.info()
autos.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled            50000 non-null object
name                   50000 non-null object
seller                 50000 non-null object
offerType              50000 non-null object
price                  50000 non-null object
abtest                 50000 non-null object
vehicleType            44905 non-null object
yearOfRegistration     50000 non-null int64
gearbox                47320 non-null object
powerPS                50000 non-null int64
model                  47242 non-null object
odometer               50000 non-null object
monthOfRegistration    50000 non-null int64
fuelType               45518 non-null object
brand                  50000 non-null object
notRepairedDamage      40171 non-null object
dateCreated            50000 non-null object
nrOfPictures           50000 non-null int64
postalCode             50000 non-null int64
lastSeen               50000 non-null object
dtypes: int64(5), object(15)
memory usage: 7.6+ MB
Out[3]:
dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox powerPS model odometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50

From a quick scan of the data, there are 20 columns, mostly strings. It can be seen that the column headings are formatted to camelcase instead of the preferred snakecase. The other key note is that the headers are in English while the data is in German, it was expected the entire data set including headers would be in German. There is also some numerical row data that is stored as an object, but could be converted to integer or float data with the removal of special characters.

The first step will be to clean up the column headings.

In [4]:
autos.columns #displays current column headings
Out[4]:
Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')
In [5]:
columns = autos.columns

mapping_dict = {
'dateCrawled': 'date_crawled',
'name': 'name',
'seller': 'seller',
'offerType': 'offer_type',
'price': 'price',
'abtest': 'ab_test',
'vehicleType': 'vehicle_type',
'yearOfRegistration': 'registration_year',
'gearbox': 'gearbox',
'powerPS': 'power_ps',
'model': 'model',
'odometer': 'odometer',
'monthOfRegistration': 'registration_month',
'fuelType': 'fuel_type',
'brand': 'brand',
'notRepairedDamage': 'unrepaired_damage',
'dateCreated': 'ad_created',
'nrOfPictures': 'nr_of_pictures',
'postalCode': 'postal_code',
'lastSeen': 'last_seen'
}
autos.columns = pd.Series(columns).map(mapping_dict)
autos.head()
Out[5]:
date_crawled name seller offer_type price ab_test vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50

The code above was utilized to change the column headers to the snakecase format and make certain header edits. This will make further coding easier, and provides column headers that are in a more logical presentation.

The next step will be to explore the data further to see if any columns can be excluded due to all values being the same, or for numeric data stored as strings which need converting.

In [6]:
autos.describe(include='all') #provide descriptive statistics for the DataFrame
Out[6]:
date_crawled name seller offer_type price ab_test vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
count 50000 50000 50000 50000 50000 50000 44905 50000.000000 47320 50000.000000 47242 50000 50000.000000 45518 50000 40171 50000 50000.0 50000.000000 50000
unique 48213 38754 2 2 2357 2 8 NaN 2 NaN 245 13 NaN 7 40 2 76 NaN NaN 39481
top 2016-03-21 16:37:21 Ford_Fiesta privat Angebot $0 test limousine NaN manuell NaN golf 150,000km NaN benzin volkswagen nein 2016-04-03 00:00:00 NaN NaN 2016-04-07 06:17:27
freq 3 78 49999 49999 1421 25756 12859 NaN 36993 NaN 4024 32424 NaN 30107 10687 35232 1946 NaN NaN 8
mean NaN NaN NaN NaN NaN NaN NaN 2005.073280 NaN 116.355920 NaN NaN 5.723360 NaN NaN NaN NaN 0.0 50813.627300 NaN
std NaN NaN NaN NaN NaN NaN NaN 105.712813 NaN 209.216627 NaN NaN 3.711984 NaN NaN NaN NaN 0.0 25779.747957 NaN
min NaN NaN NaN NaN NaN NaN NaN 1000.000000 NaN 0.000000 NaN NaN 0.000000 NaN NaN NaN NaN 0.0 1067.000000 NaN
25% NaN NaN NaN NaN NaN NaN NaN 1999.000000 NaN 70.000000 NaN NaN 3.000000 NaN NaN NaN NaN 0.0 30451.000000 NaN
50% NaN NaN NaN NaN NaN NaN NaN 2003.000000 NaN 105.000000 NaN NaN 6.000000 NaN NaN NaN NaN 0.0 49577.000000 NaN
75% NaN NaN NaN NaN NaN NaN NaN 2008.000000 NaN 150.000000 NaN NaN 9.000000 NaN NaN NaN NaN 0.0 71540.000000 NaN
max NaN NaN NaN NaN NaN NaN NaN 9999.000000 NaN 17700.000000 NaN NaN 12.000000 NaN NaN NaN NaN 0.0 99998.000000 NaN

There are multiple columns that contain just two unique inputs. The majority of these inputs will affect the price of the vehicle and should, therefore, be used in the analysis. There are two, however, that will not really change the analysis at all. The two columns that should be considered to drop would be "seller" and "offer_type".

Furthermore, "price" and "odometer" are numerical inputs stored as strings. These will be cleaned below.

In [7]:
autos['price'] = autos['price'].str.replace('$', '').str.replace(',', '').astype(int) #removing special characters to convert to integers
autos['odometer'] = autos['odometer'].str.replace('km', '').str.replace(',', '').astype(int) #see note above

autos.rename(columns={'odometer': 'odometer_km'}, inplace=True) # rename 'odometer' to convey unit of measure
In [8]:
# checking for outliers in 'price'
autos['price'].unique().shape # number of unique values
Out[8]:
(2357,)
In [9]:
autos['price'].describe() # min/max/mean/median values
Out[9]:
count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64
In [10]:
highest_prices = autos['price'].value_counts().sort_index(ascending=True).head()
lowest_prices = autos['price'].value_counts().sort_index(ascending=False).head()

print(highest_prices)
print(lowest_prices)
0    1421
1     156
2       3
3       1
5       2
Name: price, dtype: int64
99999999    1
27322222    1
12345678    3
11111111    2
10000000    1
Name: price, dtype: int64

From the above it can be seen that there are several cars with a price below '100'. These cars would be worth more if sold for scrap in these cases. For the purposes of this analysis these cars should not be counted in car sales, so these will be excluded. Conversely, there are a few cars listed well above '1,000,000'. While it is possible that there are cars worth this much, anything above this level will be excluded as well.

In [11]:
autos = autos.loc[autos['price'].between(100,1000000), :]  # exclude highest and lowest prices
print(autos)
              date_crawled                                               name  \
0      2016-03-26 17:47:46                   Peugeot_807_160_NAVTECH_ON_BOARD   
1      2016-04-04 13:38:56         BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik   
2      2016-03-26 18:57:24                         Volkswagen_Golf_1.6_United   
3      2016-03-12 16:58:10  Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...   
4      2016-04-01 14:38:50  Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...   
5      2016-03-21 13:47:45  Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto...   
6      2016-03-20 17:55:21  VW_Golf_III_GT_Special_Electronic_Green_Metall...   
7      2016-03-16 18:55:19                               Golf_IV_1.9_TDI_90PS   
8      2016-03-22 16:51:34                                         Seat_Arosa   
9      2016-03-16 13:47:02          Renault_Megane_Scenic_1.6e_RT_Klimaanlage   
10     2016-03-15 01:41:36                       VW_Golf_Tuning_in_siber/grau   
11     2016-03-16 18:45:34                         Mercedes_A140_Motorschaden   
12     2016-03-31 19:48:22  Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan...   
13     2016-03-23 10:48:32                                 Audi_A3_1.6_tuning   
14     2016-03-23 11:50:46  Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver...   
15     2016-04-01 12:06:20        Corvette_C3_Coupe_T_Top_Crossfire_Injection   
16     2016-03-16 14:59:02                                Opel_Vectra_B_Kombi   
17     2016-03-29 11:46:22                          Volkswagen_Scirocco_2_G60   
18     2016-03-26 19:57:44               Verkaufen_mein_bmw_e36_320_i_touring   
19     2016-03-17 13:36:21        mazda_tribute_2.0_mit_gas_und_tuev_neu_2018   
20     2016-03-05 19:57:31  Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*...   
21     2016-03-06 19:07:10                      Porsche_911_Carrera_4S_Cabrio   
22     2016-03-28 20:50:54                               MINI_Cooper_S_Cabrio   
23     2016-03-10 19:55:34   Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima   
24     2016-04-03 11:57:02                         BMW_535i_xDrive_Sport_Aut.   
26     2016-04-03 22:46:28                                Volkswagen_Polo_Fox   
28     2016-03-19 21:56:19                                      MINI_Cooper_D   
29     2016-04-02 12:45:44  Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz...   
31     2016-03-14 16:53:09                          Noch_gut_erhaltenen_C_320   
32     2016-03-20 05:03:03                               Corsa_mit_TÜV_5.2016   
...                    ...                                                ...   
49968  2016-04-01 17:49:15                  Mercedes_Benz_190_D_2.5_Automatik   
49969  2016-03-17 18:49:02             Nissan_X_Trail_2.2_dCi_4x4_Sport_m.AHZ   
49970  2016-03-21 22:47:37  c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp...   
49971  2016-03-29 14:54:12                                         W.Lupo_1.0   
49972  2016-03-26 22:25:23          Mercedes_Benz_Vito_115_CDI_Extralang_Aut.   
49973  2016-03-27 05:32:39                   Mercedes_Benz_SLK_200_Kompressor   
49975  2016-03-27 20:51:39          Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort   
49976  2016-03-19 18:56:05  Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga...   
49977  2016-03-31 18:37:18                        Mercedes_Benz_C200_Cdi_W203   
49978  2016-04-04 10:37:14                        Mercedes_Benz_E_200_Classic   
49979  2016-03-20 18:38:40                      Volkswagen_Polo_1.6_TDI_Style   
49980  2016-03-12 10:55:54                            Ford_Escort_Turnier_16V   
49981  2016-03-15 09:38:21             Opel_Astra_Kombi_mit_Anhaengerkupplung   
49982  2016-03-29 18:51:08            Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm   
49983  2016-03-06 12:43:04                                      Ford_focus_99   
49985  2016-04-02 16:38:23                          Verkaufe_meinen_vw_vento!   
49986  2016-04-04 20:46:02  Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst...   
49987  2016-03-22 20:47:27  Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__...   
49988  2016-03-28 19:49:51                                         BMW_330_Ci   
49989  2016-03-11 19:50:37        VW_Polo_zum_Ausschlachten_oder_Wiederaufbau   
49990  2016-03-21 19:54:19         Mercedes_Benz_A_200__BlueEFFICIENCY__Urban   
49991  2016-03-06 15:25:19                                         Kleinwagen   
49992  2016-03-10 19:37:38              Fiat_Grande_Punto_1.4_T_Jet_16V_Sport   
49993  2016-03-15 18:47:35           Audi_A3__1_8l__Silber;_schoenes_Fahrzeug   
49994  2016-03-22 17:36:42  Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc...   
49995  2016-03-27 14:38:19   Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon   
49996  2016-03-28 10:50:25  Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+...   
49997  2016-04-02 14:44:48                     Fiat_500_C_1.2_Dualogic_Lounge   
49998  2016-03-08 19:25:42                 Audi_A3_2.0_TDI_Sportback_Ambition   
49999  2016-03-14 00:42:12                                Opel_Vectra_1.6_16V   

       seller offer_type  price  ab_test vehicle_type  registration_year  \
0      privat    Angebot   5000  control          bus               2004   
1      privat    Angebot   8500  control    limousine               1997   
2      privat    Angebot   8990     test    limousine               2009   
3      privat    Angebot   4350  control   kleinwagen               2007   
4      privat    Angebot   1350     test        kombi               2003   
5      privat    Angebot   7900     test          bus               2006   
6      privat    Angebot    300     test    limousine               1995   
7      privat    Angebot   1990  control    limousine               1998   
8      privat    Angebot    250     test          NaN               2000   
9      privat    Angebot    590  control          bus               1997   
10     privat    Angebot    999     test          NaN               2017   
11     privat    Angebot    350  control          NaN               2000   
12     privat    Angebot   5299  control   kleinwagen               2010   
13     privat    Angebot   1350  control    limousine               1999   
14     privat    Angebot   3999     test   kleinwagen               2007   
15     privat    Angebot  18900     test        coupe               1982   
16     privat    Angebot    350     test        kombi               1999   
17     privat    Angebot   5500     test        coupe               1990   
18     privat    Angebot    300  control          bus               1995   
19     privat    Angebot   4150  control          suv               2004   
20     privat    Angebot   3500     test        kombi               2003   
21     privat    Angebot  41500     test       cabrio               2004   
22     privat    Angebot  25450  control       cabrio               2015   
23     privat    Angebot   7999  control          bus               2010   
24     privat    Angebot  48500  control    limousine               2014   
26     privat    Angebot    777  control   kleinwagen               1992   
28     privat    Angebot   5250  control   kleinwagen               2007   
29     privat    Angebot   4999     test        kombi               2004   
31     privat    Angebot   2850     test        kombi               2002   
32     privat    Angebot    350  control   kleinwagen               1999   
...       ...        ...    ...      ...          ...                ...   
49968  privat    Angebot   2100     test    limousine               1986   
49969  privat    Angebot   4500  control          suv               2005   
49970  privat    Angebot  15800  control          bus               2010   
49971  privat    Angebot    950     test   kleinwagen               2001   
49972  privat    Angebot   3300  control          bus               2004   
49973  privat    Angebot   6000  control       cabrio               2004   
49975  privat    Angebot   9700  control   kleinwagen               2012   
49976  privat    Angebot   5900     test        kombi               1992   
49977  privat    Angebot   5500  control    limousine               2003   
49978  privat    Angebot    900  control    limousine               1996   
49979  privat    Angebot  11000     test   kleinwagen               2011   
49980  privat    Angebot    400  control        kombi               1995   
49981  privat    Angebot   2000  control        kombi               1998   
49982  privat    Angebot   1950  control   kleinwagen               2004   
49983  privat    Angebot    600     test   kleinwagen               1999   
49985  privat    Angebot   1000  control          NaN               1995   
49986  privat    Angebot  15900  control    limousine               2010   
49987  privat    Angebot  21990  control    limousine               2013   
49988  privat    Angebot   9550  control        coupe               2001   
49989  privat    Angebot    150     test   kleinwagen               1997   
49990  privat    Angebot  17500     test    limousine               2012   
49991  privat    Angebot    500  control          NaN               2016   
49992  privat    Angebot   4800  control   kleinwagen               2009   
49993  privat    Angebot   1650  control   kleinwagen               1997   
49994  privat    Angebot   5000  control        kombi               2001   
49995  privat    Angebot  24900  control    limousine               2011   
49996  privat    Angebot   1980  control       cabrio               1996   
49997  privat    Angebot  13200     test       cabrio               2014   
49998  privat    Angebot  22900  control        kombi               2013   
49999  privat    Angebot   1250  control    limousine               1996   

         gearbox  power_ps     model  odometer_km  registration_month  \
0        manuell       158    andere       150000                   3   
1      automatik       286       7er       150000                   6   
2        manuell       102      golf        70000                   7   
3      automatik        71    fortwo        70000                   6   
4        manuell         0     focus       150000                   7   
5      automatik       150   voyager       150000                   4   
6        manuell        90      golf       150000                   8   
7        manuell        90      golf       150000                  12   
8        manuell         0     arosa       150000                  10   
9        manuell        90    megane       150000                   7   
10       manuell        90       NaN       150000                   4   
11           NaN         0       NaN       150000                   0   
12     automatik        71    fortwo        50000                   9   
13       manuell       101        a3       150000                  11   
14       manuell        75      clio       150000                   9   
15     automatik       203       NaN        80000                   6   
16       manuell       101    vectra       150000                   5   
17       manuell       205  scirocco       150000                   6   
18       manuell       150       3er       150000                   0   
19       manuell       124    andere       150000                   2   
20       manuell       131        a4       150000                   5   
21       manuell       320       911       150000                   4   
22       manuell       184    cooper        10000                   1   
23       manuell       120       NaN       150000                   2   
24     automatik       306       5er        30000                  12   
26       manuell        54      polo       125000                   2   
28       manuell       110    cooper       150000                   7   
29     automatik       204  e_klasse       150000                  10   
31     automatik       218  c_klasse       150000                   7   
32       manuell         0     corsa       150000                   7   
...          ...       ...       ...          ...                 ...   
49968  automatik        90    andere       150000                   9   
49969    manuell       136   x_trail       150000                   5   
49970  automatik       136        c4        60000                   4   
49971    manuell        50      lupo       150000                   4   
49972  automatik       150      vito       150000                  10   
49973    manuell       163       slk       150000                  11   
49975  automatik        88      jazz       100000                  11   
49976  automatik       150        80       150000                  12   
49977    manuell       116  c_klasse       150000                   2   
49978  automatik       136  e_klasse       150000                   9   
49979    manuell        90      polo        70000                  11   
49980    manuell       105    escort       125000                   3   
49981    manuell       115     astra       150000                  12   
49982    manuell         0     fabia        90000                   7   
49983    manuell       101     focus       150000                   4   
49985  automatik         0       NaN       150000                   0   
49986  automatik       218      300c       125000                  11   
49987    manuell       150        a3        50000                  11   
49988    manuell       231       3er       150000                  10   
49989    manuell         0      polo       150000                   5   
49990    manuell       156  a_klasse        30000                  12   
49991    manuell         0    twingo       150000                   0   
49992    manuell       120    andere       125000                   9   
49993    manuell         0       NaN       150000                   7   
49994  automatik       299        a6       150000                   1   
49995  automatik       239        q5       100000                   1   
49996    manuell        75     astra       150000                   5   
49997  automatik        69       500         5000                  11   
49998    manuell       150        a3        40000                  11   
49999    manuell       101    vectra       150000                   1   

      fuel_type           brand unrepaired_damage           ad_created  \
0           lpg         peugeot              nein  2016-03-26 00:00:00   
1        benzin             bmw              nein  2016-04-04 00:00:00   
2        benzin      volkswagen              nein  2016-03-26 00:00:00   
3        benzin           smart              nein  2016-03-12 00:00:00   
4        benzin            ford              nein  2016-04-01 00:00:00   
5        diesel        chrysler               NaN  2016-03-21 00:00:00   
6        benzin      volkswagen               NaN  2016-03-20 00:00:00   
7        diesel      volkswagen              nein  2016-03-16 00:00:00   
8           NaN            seat              nein  2016-03-22 00:00:00   
9        benzin         renault              nein  2016-03-16 00:00:00   
10       benzin      volkswagen              nein  2016-03-14 00:00:00   
11       benzin   mercedes_benz               NaN  2016-03-16 00:00:00   
12       benzin           smart              nein  2016-03-31 00:00:00   
13       benzin            audi              nein  2016-03-23 00:00:00   
14       benzin         renault               NaN  2016-03-23 00:00:00   
15       benzin  sonstige_autos              nein  2016-04-01 00:00:00   
16       benzin            opel              nein  2016-03-16 00:00:00   
17       benzin      volkswagen              nein  2016-03-29 00:00:00   
18       benzin             bmw               NaN  2016-03-26 00:00:00   
19          lpg           mazda              nein  2016-03-17 00:00:00   
20       diesel            audi               NaN  2016-03-05 00:00:00   
21       benzin         porsche              nein  2016-03-06 00:00:00   
22       benzin            mini              nein  2016-03-28 00:00:00   
23       diesel         peugeot              nein  2016-03-10 00:00:00   
24       benzin             bmw              nein  2016-04-03 00:00:00   
26       benzin      volkswagen              nein  2016-04-03 00:00:00   
28       diesel            mini                ja  2016-03-19 00:00:00   
29       diesel   mercedes_benz              nein  2016-04-02 00:00:00   
31       benzin   mercedes_benz              nein  2016-03-14 00:00:00   
32       benzin            opel               NaN  2016-03-20 00:00:00   
...         ...             ...               ...                  ...   
49968    diesel   mercedes_benz              nein  2016-04-01 00:00:00   
49969    diesel          nissan              nein  2016-03-17 00:00:00   
49970    diesel         citroen              nein  2016-03-21 00:00:00   
49971    benzin      volkswagen              nein  2016-03-29 00:00:00   
49972    diesel   mercedes_benz                ja  2016-03-26 00:00:00   
49973    benzin   mercedes_benz              nein  2016-03-27 00:00:00   
49975    hybrid           honda              nein  2016-03-27 00:00:00   
49976    benzin            audi              nein  2016-03-19 00:00:00   
49977    diesel   mercedes_benz              nein  2016-03-31 00:00:00   
49978    benzin   mercedes_benz                ja  2016-04-04 00:00:00   
49979    diesel      volkswagen              nein  2016-03-20 00:00:00   
49980    benzin            ford               NaN  2016-03-12 00:00:00   
49981    benzin            opel              nein  2016-03-15 00:00:00   
49982    benzin           skoda               NaN  2016-03-29 00:00:00   
49983    benzin            ford               NaN  2016-03-06 00:00:00   
49985    benzin      volkswagen               NaN  2016-04-02 00:00:00   
49986    diesel        chrysler              nein  2016-04-04 00:00:00   
49987    diesel            audi              nein  2016-03-22 00:00:00   
49988    benzin             bmw              nein  2016-03-28 00:00:00   
49989    benzin      volkswagen                ja  2016-03-11 00:00:00   
49990    benzin   mercedes_benz              nein  2016-03-21 00:00:00   
49991    benzin         renault               NaN  2016-03-06 00:00:00   
49992       lpg            fiat              nein  2016-03-10 00:00:00   
49993    benzin            audi               NaN  2016-03-15 00:00:00   
49994    benzin            audi              nein  2016-03-22 00:00:00   
49995    diesel            audi              nein  2016-03-27 00:00:00   
49996    benzin            opel              nein  2016-03-28 00:00:00   
49997    benzin            fiat              nein  2016-04-02 00:00:00   
49998    diesel            audi              nein  2016-03-08 00:00:00   
49999    benzin            opel              nein  2016-03-13 00:00:00   

       nr_of_pictures  postal_code            last_seen  
0                   0        79588  2016-04-06 06:45:54  
1                   0        71034  2016-04-06 14:45:08  
2                   0        35394  2016-04-06 20:15:37  
3                   0        33729  2016-03-15 03:16:28  
4                   0        39218  2016-04-01 14:38:50  
5                   0        22962  2016-04-06 09:45:21  
6                   0        31535  2016-03-23 02:48:59  
7                   0        53474  2016-04-07 03:17:32  
8                   0         7426  2016-03-26 18:18:10  
9                   0        15749  2016-04-06 10:46:35  
10                  0        86157  2016-04-07 03:16:21  
11                  0        17498  2016-03-16 18:45:34  
12                  0        34590  2016-04-06 14:17:52  
13                  0        12043  2016-04-01 14:17:13  
14                  0        81737  2016-04-01 15:46:47  
15                  0        61276  2016-04-02 21:10:48  
16                  0        57299  2016-03-18 05:29:37  
17                  0        74821  2016-04-05 20:46:26  
18                  0        54329  2016-04-02 12:16:41  
19                  0        40878  2016-03-17 14:45:58  
20                  0        53913  2016-03-07 05:46:46  
21                  0        65428  2016-04-05 23:46:19  
22                  0        44789  2016-04-01 06:45:30  
23                  0        30900  2016-03-17 08:45:17  
24                  0        22547  2016-04-07 13:16:50  
26                  0        38110  2016-04-05 23:46:48  
28                  0        15745  2016-04-07 14:58:48  
29                  0        47638  2016-04-02 12:45:44  
31                  0        41065  2016-03-16 07:19:04  
32                  0        27619  2016-04-06 03:15:20  
...               ...          ...                  ...  
49968               0        40227  2016-04-05 13:16:35  
49969               0        17379  2016-03-25 23:18:15  
49970               0        14947  2016-04-07 04:17:34  
49971               0        65197  2016-03-29 20:41:51  
49972               0        65326  2016-03-28 11:28:18  
49973               0        53567  2016-03-27 08:25:24  
49975               0        84385  2016-04-05 19:45:34  
49976               0        36100  2016-04-07 06:16:44  
49977               0        33739  2016-04-06 12:16:11  
49978               0        24405  2016-04-06 12:44:20  
49979               0        48455  2016-04-07 01:45:12  
49980               0        56218  2016-04-06 17:16:49  
49981               0        86859  2016-04-05 17:21:46  
49982               0        45884  2016-03-29 18:51:08  
49983               0        52477  2016-03-09 06:16:08  
49985               0        30900  2016-04-06 15:17:52  
49986               0        73527  2016-04-06 23:16:00  
49987               0        94362  2016-03-26 22:46:06  
49988               0        83646  2016-04-07 02:17:40  
49989               0        21244  2016-03-12 10:17:55  
49990               0        58239  2016-04-06 22:46:57  
49991               0        61350  2016-03-06 18:24:19  
49992               0        68642  2016-03-13 01:44:51  
49993               0        65203  2016-04-06 19:46:53  
49994               0        46537  2016-04-06 08:16:39  
49995               0        82131  2016-04-01 13:47:40  
49996               0        44807  2016-04-02 14:18:02  
49997               0        73430  2016-04-04 11:47:27  
49998               0        35683  2016-04-05 16:45:07  
49999               0        45897  2016-04-06 21:18:48  

[48227 rows x 20 columns]

Now the same analysis will be performed for the 'odometer_km'.

In [12]:
# checking for outliers in 'odometer_km'
autos['odometer_km'].unique().shape # number of unique values
Out[12]:
(13,)
In [13]:
autos['odometer_km'].describe() # min/max/mean/median values
Out[13]:
count     48227.000000
mean     125920.127729
std       39542.413981
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: float64
In [14]:
highest_odometer = autos['odometer_km'].value_counts().sort_index(ascending=True).head()
lowest_odometer = autos['odometer_km'].value_counts().sort_index(ascending=False).head()

print(highest_odometer)
print(lowest_odometer)
5000     760
10000    245
20000    757
30000    777
40000    814
Name: odometer_km, dtype: int64
150000    31214
125000     5038
100000     2101
90000      1733
80000      1412
Name: odometer_km, dtype: int64

The odomoeter readings in the data set are not as diverse and all seem reasonable, therefore no values will be removed.

The next step will be to get a breakdown of the values in 'date_crawled', 'ad-created', and 'last_seen. Followed by an understanding of 'registration_year'.

In [15]:
autos['date_crawled'].str[:10].value_counts(normalize=True, dropna=False).sort_index()
Out[15]:
2016-03-05    0.025359
2016-03-06    0.014038
2016-03-07    0.036059
2016-03-08    0.033176
2016-03-09    0.033011
2016-03-10    0.032285
2016-03-11    0.032596
2016-03-12    0.036909
2016-03-13    0.015676
2016-03-14    0.036660
2016-03-15    0.034317
2016-03-16    0.029465
2016-03-17    0.031518
2016-03-18    0.012897
2016-03-19    0.034732
2016-03-20    0.037800
2016-03-21    0.037220
2016-03-22    0.032886
2016-03-23    0.032285
2016-03-24    0.029444
2016-03-25    0.031497
2016-03-26    0.032306
2016-03-27    0.031124
2016-03-28    0.034960
2016-03-29    0.034130
2016-03-30    0.033736
2016-03-31    0.031849
2016-04-01    0.033695
2016-04-02    0.035602
2016-04-03    0.038609
2016-04-04    0.036536
2016-04-05    0.013063
2016-04-06    0.003172
2016-04-07    0.001389
Name: date_crawled, dtype: float64

The 'date_crawled' data shows that everything has been crawled in the last ~30 days, with an equal distribution amongst all dates.

In [16]:
autos['ad_created'].str[:10].value_counts(normalize=True, dropna=False).sort_index()
Out[16]:
2015-06-11    0.000021
2015-08-10    0.000021
2015-09-09    0.000021
2015-11-10    0.000021
2015-12-05    0.000021
2015-12-30    0.000021
2016-01-03    0.000021
2016-01-07    0.000021
2016-01-10    0.000041
2016-01-13    0.000021
2016-01-14    0.000021
2016-01-16    0.000021
2016-01-22    0.000021
2016-01-27    0.000062
2016-01-29    0.000021
2016-02-01    0.000021
2016-02-02    0.000041
2016-02-05    0.000041
2016-02-07    0.000021
2016-02-08    0.000021
2016-02-09    0.000021
2016-02-11    0.000021
2016-02-12    0.000041
2016-02-14    0.000041
2016-02-16    0.000021
2016-02-17    0.000021
2016-02-18    0.000041
2016-02-19    0.000062
2016-02-20    0.000041
2016-02-21    0.000062
                ...   
2016-03-09    0.033093
2016-03-10    0.031995
2016-03-11    0.032907
2016-03-12    0.036743
2016-03-13    0.017044
2016-03-14    0.035291
2016-03-15    0.034047
2016-03-16    0.029962
2016-03-17    0.031186
2016-03-18    0.013582
2016-03-19    0.033612
2016-03-20    0.037863
2016-03-21    0.037448
2016-03-22    0.032700
2016-03-23    0.032119
2016-03-24    0.029382
2016-03-25    0.031621
2016-03-26    0.032368
2016-03-27    0.031041
2016-03-28    0.035063
2016-03-29    0.034089
2016-03-30    0.033550
2016-03-31    0.031891
2016-04-01    0.033674
2016-04-02    0.035291
2016-04-03    0.038858
2016-04-04    0.036888
2016-04-05    0.011798
2016-04-06    0.003255
2016-04-07    0.001244
Name: ad_created, Length: 76, dtype: float64

The 'ad_created' column shows the same distribution amongst the 'web_crawled' dates, but there are several entries dated before this. This are probably erroneous or old data.

In [17]:
autos['last_seen'].str[:10].value_counts(normalize=True, dropna=False).sort_index()
Out[17]:
2016-03-05    0.001078
2016-03-06    0.004313
2016-03-07    0.005433
2016-03-08    0.007320
2016-03-09    0.009580
2016-03-10    0.010637
2016-03-11    0.012400
2016-03-12    0.023783
2016-03-13    0.008875
2016-03-14    0.012628
2016-03-15    0.015862
2016-03-16    0.016443
2016-03-17    0.028096
2016-03-18    0.007320
2016-03-19    0.015759
2016-03-20    0.020652
2016-03-21    0.020549
2016-03-22    0.021357
2016-03-23    0.018579
2016-03-24    0.019761
2016-03-25    0.019097
2016-03-26    0.016671
2016-03-27    0.015551
2016-03-28    0.020860
2016-03-29    0.022311
2016-03-30    0.024696
2016-03-31    0.023825
2016-04-01    0.022850
2016-04-02    0.024882
2016-04-03    0.025131
2016-04-04    0.024530
2016-04-05    0.125054
2016-04-06    0.221971
2016-04-07    0.132146
Name: last_seen, dtype: float64

The 'last_seen' column trends as expected. The older the post the less likely it is to be seen more recently. This is most likely due to being 'bumped' down the site by newer ads and postings.

In [18]:
autos['registration_year'].describe()
Out[18]:
count    48227.000000
mean      2004.730151
std         87.894768
min       1000.000000
25%       1999.000000
50%       2004.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

The mean of the 'registration_year' column shows a fairly current year (16 years old at time of report), but a scan of the min (1000) and max(9999) show that there is some erroneous data to be excluded here as well.

Removing the bad data from the 'registration_year' is the next logical step. Clearly, the max can not be above 2016, which is the year this data was aggregated. Given that the first mass produced vehicles did not begin production until roughly the 1920's, a minimum of 1920 will be used.

In [19]:
autos = autos[autos["registration_year"].between(1920, 2016)]
autos['registration_year'].value_counts(normalize=True).sort_index(ascending=True)
Out[19]:
1927    0.000022
1929    0.000022
1931    0.000022
1934    0.000043
1937    0.000086
1938    0.000022
1939    0.000022
1941    0.000043
1943    0.000022
1948    0.000022
1950    0.000022
1951    0.000043
1952    0.000022
1953    0.000022
1954    0.000043
1955    0.000043
1956    0.000086
1957    0.000043
1958    0.000086
1959    0.000129
1960    0.000475
1961    0.000129
1962    0.000086
1963    0.000173
1964    0.000259
1965    0.000367
1966    0.000475
1967    0.000561
1968    0.000561
1969    0.000410
          ...   
1987    0.001553
1988    0.002869
1989    0.003689
1990    0.007162
1991    0.007292
1992    0.007918
1993    0.009061
1994    0.013505
1995    0.025737
1996    0.029232
1997    0.041529
1998    0.050482
1999    0.062110
2000    0.066964
2001    0.056717
2002    0.053438
2003    0.058098
2004    0.058227
2005    0.062801
2006    0.057558
2007    0.049037
2008    0.047678
2009    0.044895
2010    0.034237
2011    0.034906
2012    0.028218
2013    0.017280
2014    0.014282
2015    0.008198
2016    0.025931
Name: registration_year, Length: 77, dtype: float64

The results of this data cull trend as expected, because as registration year increases the percentage of registered cars increases. There is also a noticeable dip in the five most recent years given that most car owners keep a car for about that long.

Now it should be determined what the most popular cars are. The top ten vehicle brands and mean price will be aggregated.

In [35]:
top_brand_mean_price = {}
brands = autos["brand"].value_counts().head(10).index
for b in brands:
    selected_rows = autos[autos["brand"] == b]
    b_mean = selected_rows["price"].mean()
    top_brand_mean_price[b] = b_mean

print(top_brand_mean_price)
   
{'audi': 9380.718547986076, 'mercedes_benz': 8672.654241071428, 'ford': 4086.93421865349, 'bmw': 8381.677305658899, 'seat': 4433.419621749409, 'opel': 3006.0002012072437, 'volkswagen': 5639.900316294255, 'fiat': 2836.8736310025274, 'peugeot': 3113.860549132948, 'renault': 2496.940394314535}

Looking at the top 10 brands, it isn't a surprise that the top two are both German car companies. It is surprising that number three is Ford from the USA. With used cars, the expectation would be that price would play a key part in popularity. This does not really corelate though, as three of the top four (also all German manufacturers) brands are nearly double the mean price of the other seven brands. It appears that Germans just prefer to purchase German vehicles.

In [36]:
top_brand_mean_odometer = {}
brands = autos["brand"].value_counts().head(10).index
for o in brands:
    selected_rows = autos[autos["brand"] == o]
    o_mean = selected_rows["odometer_km"].mean()
    top_brand_mean_odometer[o] = o_mean

print(top_brand_mean_odometer)
{'audi': 129245.40029835903, 'mercedes_benz': 131025.66964285714, 'ford': 124277.33168622607, 'bmw': 132695.32014881534, 'seat': 121536.64302600473, 'opel': 129380.28169014085, 'volkswagen': 128804.2036526885, 'fiat': 116950.29486099411, 'peugeot': 127127.8901734104, 'renault': 128337.91838606144}
In [41]:
bmp_series = pd.Series(top_brand_mean_price)
bmm_series = pd.Series(top_brand_mean_odometer)

df = pd.DataFrame(bmp_series, columns=['mean_price'])
df.insert(1, 'mean_odometer', bmm_series)

print(df.to_string())
                mean_price  mean_odometer
audi           9380.718548  129245.400298
bmw            8381.677306  132695.320149
fiat           2836.873631  116950.294861
ford           4086.934219  124277.331686
mercedes_benz  8672.654241  131025.669643
opel           3006.000201  129380.281690
peugeot        3113.860549  127127.890173
renault        2496.940394  128337.918386
seat           4433.419622  121536.643026
volkswagen     5639.900316  128804.203653
In [ ]: