Cleaning and Analyzing the Data

We are working with a dataset of used car listings from eBay Kleinanzeigen, the classifieds section of the German eBay website. The dataset contains 50,000 records. The goal of this project is to clean the data and analyze the included used car listings.

In [1]:
import pandas as pd 
import numpy as np

autos = pd.read_csv("autos.csv", encoding='Latin-1')
In [2]:
autos
Out[2]:
dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox powerPS model odometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50
5 2016-03-21 13:47:45 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... privat Angebot $7,900 test bus 2006 automatik 150 voyager 150,000km 4 diesel chrysler NaN 2016-03-21 00:00:00 0 22962 2016-04-06 09:45:21
6 2016-03-20 17:55:21 VW_Golf_III_GT_Special_Electronic_Green_Metall... privat Angebot $300 test limousine 1995 manuell 90 golf 150,000km 8 benzin volkswagen NaN 2016-03-20 00:00:00 0 31535 2016-03-23 02:48:59
7 2016-03-16 18:55:19 Golf_IV_1.9_TDI_90PS privat Angebot $1,990 control limousine 1998 manuell 90 golf 150,000km 12 diesel volkswagen nein 2016-03-16 00:00:00 0 53474 2016-04-07 03:17:32
8 2016-03-22 16:51:34 Seat_Arosa privat Angebot $250 test NaN 2000 manuell 0 arosa 150,000km 10 NaN seat nein 2016-03-22 00:00:00 0 7426 2016-03-26 18:18:10
9 2016-03-16 13:47:02 Renault_Megane_Scenic_1.6e_RT_Klimaanlage privat Angebot $590 control bus 1997 manuell 90 megane 150,000km 7 benzin renault nein 2016-03-16 00:00:00 0 15749 2016-04-06 10:46:35
10 2016-03-15 01:41:36 VW_Golf_Tuning_in_siber/grau privat Angebot $999 test NaN 2017 manuell 90 NaN 150,000km 4 benzin volkswagen nein 2016-03-14 00:00:00 0 86157 2016-04-07 03:16:21
11 2016-03-16 18:45:34 Mercedes_A140_Motorschaden privat Angebot $350 control NaN 2000 NaN 0 NaN 150,000km 0 benzin mercedes_benz NaN 2016-03-16 00:00:00 0 17498 2016-03-16 18:45:34
12 2016-03-31 19:48:22 Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... privat Angebot $5,299 control kleinwagen 2010 automatik 71 fortwo 50,000km 9 benzin smart nein 2016-03-31 00:00:00 0 34590 2016-04-06 14:17:52
13 2016-03-23 10:48:32 Audi_A3_1.6_tuning privat Angebot $1,350 control limousine 1999 manuell 101 a3 150,000km 11 benzin audi nein 2016-03-23 00:00:00 0 12043 2016-04-01 14:17:13
14 2016-03-23 11:50:46 Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... privat Angebot $3,999 test kleinwagen 2007 manuell 75 clio 150,000km 9 benzin renault NaN 2016-03-23 00:00:00 0 81737 2016-04-01 15:46:47
15 2016-04-01 12:06:20 Corvette_C3_Coupe_T_Top_Crossfire_Injection privat Angebot $18,900 test coupe 1982 automatik 203 NaN 80,000km 6 benzin sonstige_autos nein 2016-04-01 00:00:00 0 61276 2016-04-02 21:10:48
16 2016-03-16 14:59:02 Opel_Vectra_B_Kombi privat Angebot $350 test kombi 1999 manuell 101 vectra 150,000km 5 benzin opel nein 2016-03-16 00:00:00 0 57299 2016-03-18 05:29:37
17 2016-03-29 11:46:22 Volkswagen_Scirocco_2_G60 privat Angebot $5,500 test coupe 1990 manuell 205 scirocco 150,000km 6 benzin volkswagen nein 2016-03-29 00:00:00 0 74821 2016-04-05 20:46:26
18 2016-03-26 19:57:44 Verkaufen_mein_bmw_e36_320_i_touring privat Angebot $300 control bus 1995 manuell 150 3er 150,000km 0 benzin bmw NaN 2016-03-26 00:00:00 0 54329 2016-04-02 12:16:41
19 2016-03-17 13:36:21 mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 privat Angebot $4,150 control suv 2004 manuell 124 andere 150,000km 2 lpg mazda nein 2016-03-17 00:00:00 0 40878 2016-03-17 14:45:58
20 2016-03-05 19:57:31 Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... privat Angebot $3,500 test kombi 2003 manuell 131 a4 150,000km 5 diesel audi NaN 2016-03-05 00:00:00 0 53913 2016-03-07 05:46:46
21 2016-03-06 19:07:10 Porsche_911_Carrera_4S_Cabrio privat Angebot $41,500 test cabrio 2004 manuell 320 911 150,000km 4 benzin porsche nein 2016-03-06 00:00:00 0 65428 2016-04-05 23:46:19
22 2016-03-28 20:50:54 MINI_Cooper_S_Cabrio privat Angebot $25,450 control cabrio 2015 manuell 184 cooper 10,000km 1 benzin mini nein 2016-03-28 00:00:00 0 44789 2016-04-01 06:45:30
23 2016-03-10 19:55:34 Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima privat Angebot $7,999 control bus 2010 manuell 120 NaN 150,000km 2 diesel peugeot nein 2016-03-10 00:00:00 0 30900 2016-03-17 08:45:17
24 2016-04-03 11:57:02 BMW_535i_xDrive_Sport_Aut. privat Angebot $48,500 control limousine 2014 automatik 306 5er 30,000km 12 benzin bmw nein 2016-04-03 00:00:00 0 22547 2016-04-07 13:16:50
25 2016-03-21 21:56:18 Ford_escort_kombi_an_bastler_mit_ghia_ausstattung privat Angebot $90 control kombi 1996 manuell 116 NaN 150,000km 4 benzin ford ja 2016-03-21 00:00:00 0 27574 2016-04-01 05:16:49
26 2016-04-03 22:46:28 Volkswagen_Polo_Fox privat Angebot $777 control kleinwagen 1992 manuell 54 polo 125,000km 2 benzin volkswagen nein 2016-04-03 00:00:00 0 38110 2016-04-05 23:46:48
27 2016-03-27 18:45:01 Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE privat Angebot $0 control NaN 2005 NaN 0 NaN 150,000km 0 NaN ford NaN 2016-03-27 00:00:00 0 66701 2016-03-27 18:45:01
28 2016-03-19 21:56:19 MINI_Cooper_D privat Angebot $5,250 control kleinwagen 2007 manuell 110 cooper 150,000km 7 diesel mini ja 2016-03-19 00:00:00 0 15745 2016-04-07 14:58:48
29 2016-04-02 12:45:44 Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... privat Angebot $4,999 test kombi 2004 automatik 204 e_klasse 150,000km 10 diesel mercedes_benz nein 2016-04-02 00:00:00 0 47638 2016-04-02 12:45:44
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49970 2016-03-21 22:47:37 c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... privat Angebot $15,800 control bus 2010 automatik 136 c4 60,000km 4 diesel citroen nein 2016-03-21 00:00:00 0 14947 2016-04-07 04:17:34
49971 2016-03-29 14:54:12 W.Lupo_1.0 privat Angebot $950 test kleinwagen 2001 manuell 50 lupo 150,000km 4 benzin volkswagen nein 2016-03-29 00:00:00 0 65197 2016-03-29 20:41:51
49972 2016-03-26 22:25:23 Mercedes_Benz_Vito_115_CDI_Extralang_Aut. privat Angebot $3,300 control bus 2004 automatik 150 vito 150,000km 10 diesel mercedes_benz ja 2016-03-26 00:00:00 0 65326 2016-03-28 11:28:18
49973 2016-03-27 05:32:39 Mercedes_Benz_SLK_200_Kompressor privat Angebot $6,000 control cabrio 2004 manuell 163 slk 150,000km 11 benzin mercedes_benz nein 2016-03-27 00:00:00 0 53567 2016-03-27 08:25:24
49974 2016-03-20 10:52:31 Golf_1_Cabrio_Tuev_Neu_viele_Extras_alles_eing... privat Angebot $0 control cabrio 1983 manuell 70 golf 150,000km 2 benzin volkswagen nein 2016-03-20 00:00:00 0 8209 2016-03-27 19:48:16
49975 2016-03-27 20:51:39 Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort privat Angebot $9,700 control kleinwagen 2012 automatik 88 jazz 100,000km 11 hybrid honda nein 2016-03-27 00:00:00 0 84385 2016-04-05 19:45:34
49976 2016-03-19 18:56:05 Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... privat Angebot $5,900 test kombi 1992 automatik 150 80 150,000km 12 benzin audi nein 2016-03-19 00:00:00 0 36100 2016-04-07 06:16:44
49977 2016-03-31 18:37:18 Mercedes_Benz_C200_Cdi_W203 privat Angebot $5,500 control limousine 2003 manuell 116 c_klasse 150,000km 2 diesel mercedes_benz nein 2016-03-31 00:00:00 0 33739 2016-04-06 12:16:11
49978 2016-04-04 10:37:14 Mercedes_Benz_E_200_Classic privat Angebot $900 control limousine 1996 automatik 136 e_klasse 150,000km 9 benzin mercedes_benz ja 2016-04-04 00:00:00 0 24405 2016-04-06 12:44:20
49979 2016-03-20 18:38:40 Volkswagen_Polo_1.6_TDI_Style privat Angebot $11,000 test kleinwagen 2011 manuell 90 polo 70,000km 11 diesel volkswagen nein 2016-03-20 00:00:00 0 48455 2016-04-07 01:45:12
49980 2016-03-12 10:55:54 Ford_Escort_Turnier_16V privat Angebot $400 control kombi 1995 manuell 105 escort 125,000km 3 benzin ford NaN 2016-03-12 00:00:00 0 56218 2016-04-06 17:16:49
49981 2016-03-15 09:38:21 Opel_Astra_Kombi_mit_Anhaengerkupplung privat Angebot $2,000 control kombi 1998 manuell 115 astra 150,000km 12 benzin opel nein 2016-03-15 00:00:00 0 86859 2016-04-05 17:21:46
49982 2016-03-29 18:51:08 Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm privat Angebot $1,950 control kleinwagen 2004 manuell 0 fabia 90,000km 7 benzin skoda NaN 2016-03-29 00:00:00 0 45884 2016-03-29 18:51:08
49983 2016-03-06 12:43:04 Ford_focus_99 privat Angebot $600 test kleinwagen 1999 manuell 101 focus 150,000km 4 benzin ford NaN 2016-03-06 00:00:00 0 52477 2016-03-09 06:16:08
49984 2016-03-31 22:48:48 Student_sucht_ein__Anfaengerauto___ab_2000_BJ_... privat Angebot $0 test NaN 2000 NaN 0 NaN 150,000km 0 NaN sonstige_autos NaN 2016-03-31 00:00:00 0 12103 2016-04-02 19:44:53
49985 2016-04-02 16:38:23 Verkaufe_meinen_vw_vento! privat Angebot $1,000 control NaN 1995 automatik 0 NaN 150,000km 0 benzin volkswagen NaN 2016-04-02 00:00:00 0 30900 2016-04-06 15:17:52
49986 2016-04-04 20:46:02 Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... privat Angebot $15,900 control limousine 2010 automatik 218 300c 125,000km 11 diesel chrysler nein 2016-04-04 00:00:00 0 73527 2016-04-06 23:16:00
49987 2016-03-22 20:47:27 Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... privat Angebot $21,990 control limousine 2013 manuell 150 a3 50,000km 11 diesel audi nein 2016-03-22 00:00:00 0 94362 2016-03-26 22:46:06
49988 2016-03-28 19:49:51 BMW_330_Ci privat Angebot $9,550 control coupe 2001 manuell 231 3er 150,000km 10 benzin bmw nein 2016-03-28 00:00:00 0 83646 2016-04-07 02:17:40
49989 2016-03-11 19:50:37 VW_Polo_zum_Ausschlachten_oder_Wiederaufbau privat Angebot $150 test kleinwagen 1997 manuell 0 polo 150,000km 5 benzin volkswagen ja 2016-03-11 00:00:00 0 21244 2016-03-12 10:17:55
49990 2016-03-21 19:54:19 Mercedes_Benz_A_200__BlueEFFICIENCY__Urban privat Angebot $17,500 test limousine 2012 manuell 156 a_klasse 30,000km 12 benzin mercedes_benz nein 2016-03-21 00:00:00 0 58239 2016-04-06 22:46:57
49991 2016-03-06 15:25:19 Kleinwagen privat Angebot $500 control NaN 2016 manuell 0 twingo 150,000km 0 benzin renault NaN 2016-03-06 00:00:00 0 61350 2016-03-06 18:24:19
49992 2016-03-10 19:37:38 Fiat_Grande_Punto_1.4_T_Jet_16V_Sport privat Angebot $4,800 control kleinwagen 2009 manuell 120 andere 125,000km 9 lpg fiat nein 2016-03-10 00:00:00 0 68642 2016-03-13 01:44:51
49993 2016-03-15 18:47:35 Audi_A3__1_8l__Silber;_schoenes_Fahrzeug privat Angebot $1,650 control kleinwagen 1997 manuell 0 NaN 150,000km 7 benzin audi NaN 2016-03-15 00:00:00 0 65203 2016-04-06 19:46:53
49994 2016-03-22 17:36:42 Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... privat Angebot $5,000 control kombi 2001 automatik 299 a6 150,000km 1 benzin audi nein 2016-03-22 00:00:00 0 46537 2016-04-06 08:16:39
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon privat Angebot $24,900 control limousine 2011 automatik 239 q5 100,000km 1 diesel audi nein 2016-03-27 00:00:00 0 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... privat Angebot $1,980 control cabrio 1996 manuell 75 astra 150,000km 5 benzin opel nein 2016-03-28 00:00:00 0 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge privat Angebot $13,200 test cabrio 2014 automatik 69 500 5,000km 11 benzin fiat nein 2016-04-02 00:00:00 0 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition privat Angebot $22,900 control kombi 2013 manuell 150 a3 40,000km 11 diesel audi nein 2016-03-08 00:00:00 0 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V privat Angebot $1,250 control limousine 1996 manuell 101 vectra 150,000km 1 benzin opel nein 2016-03-13 00:00:00 0 45897 2016-04-06 21:18:48

50000 rows × 20 columns

In [3]:
autos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled            50000 non-null object
name                   50000 non-null object
seller                 50000 non-null object
offerType              50000 non-null object
price                  50000 non-null object
abtest                 50000 non-null object
vehicleType            44905 non-null object
yearOfRegistration     50000 non-null int64
gearbox                47320 non-null object
powerPS                50000 non-null int64
model                  47242 non-null object
odometer               50000 non-null object
monthOfRegistration    50000 non-null int64
fuelType               45518 non-null object
brand                  50000 non-null object
notRepairedDamage      40171 non-null object
dateCreated            50000 non-null object
nrOfPictures           50000 non-null int64
postalCode             50000 non-null int64
lastSeen               50000 non-null object
dtypes: int64(5), object(15)
memory usage: 7.6+ MB
In [4]:
autos.head(5)
Out[4]:
dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox powerPS model odometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50

The data has 20 columns: 15 are of object type (mostly strings) and 5 are int64. Five columns (vehicleType, gearbox, model, fuelType and notRepairedDamage) contain null values, with notRepairedDamage missing in roughly 20% of rows.
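
As a rough sketch of how the null values could be counted per column, the snippet below uses a small hypothetical sample rather than the real autos.csv; the `sample` frame and its values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the listings table; the values are made up
# for illustration and are not read from autos.csv.
sample = pd.DataFrame({
    "vehicleType": ["bus", np.nan, "kombi", "limousine"],
    "gearbox": ["manuell", "automatik", np.nan, np.nan],
    "brand": ["peugeot", "bmw", "ford", "opel"],
})

# Count missing values per column and keep only the columns that have any.
null_counts = sample.isnull().sum()
null_counts = null_counts[null_counts > 0].sort_values(ascending=False)
print(null_counts)
```

Running the same two lines on `autos` should list the five columns with missing values, most affected first.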

In [5]:
# Rename the columns from camelCase to snake_case
#print ('before : ',autos.columns)

autos.columns = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen']
#print('after :',autos.columns)

autos.head()
Out[5]:
date_crawled name seller offer_type price abtest vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50

The column names were in camelCase, but snake_case is the preferred Python convention. The cell above converts the column names to snake_case and renames some of them to make them more self-explanatory.
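
The case conversion on its own could also be automated instead of typing the list by hand. The helper below is a minimal sketch; `camel_to_snake` is a hypothetical name, and it only changes the case style, so descriptive renames such as yearOfRegistration to registration_year would still need an explicit mapping.

```python
import re

def camel_to_snake(name):
    """Convert a camelCase identifier to snake_case."""
    # Insert an underscore at every boundary where an uppercase letter
    # follows a lowercase letter or digit, then lowercase everything.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

original = ["dateCrawled", "offerType", "powerPS", "nrOfPictures"]
converted = [camel_to_snake(c) for c in original]
print(converted)
```

Applied to the DataFrame, this would look like `autos.columns = [camel_to_snake(c) for c in autos.columns]`.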

Let's dig deeper and explore the data to see whether any columns need cleaning to make the analysis easier.

In [6]:
autos.describe(include='all')
Out[6]:
date_crawled name seller offer_type price abtest vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postal_code last_seen
count 50000 50000 50000 50000 50000 50000 44905 50000.000000 47320 50000.000000 47242 50000 50000.000000 45518 50000 40171 50000 50000.0 50000.000000 50000
unique 48213 38754 2 2 2357 2 8 NaN 2 NaN 245 13 NaN 7 40 2 76 NaN NaN 39481
top 2016-03-11 22:38:16 Ford_Fiesta privat Angebot $0 test limousine NaN manuell NaN golf 150,000km NaN benzin volkswagen nein 2016-04-03 00:00:00 NaN NaN 2016-04-07 06:17:27
freq 3 78 49999 49999 1421 25756 12859 NaN 36993 NaN 4024 32424 NaN 30107 10687 35232 1946 NaN NaN 8
mean NaN NaN NaN NaN NaN NaN NaN 2005.073280 NaN 116.355920 NaN NaN 5.723360 NaN NaN NaN NaN 0.0 50813.627300 NaN
std NaN NaN NaN NaN NaN NaN NaN 105.712813 NaN 209.216627 NaN NaN 3.711984 NaN NaN NaN NaN 0.0 25779.747957 NaN
min NaN NaN NaN NaN NaN NaN NaN 1000.000000 NaN 0.000000 NaN NaN 0.000000 NaN NaN NaN NaN 0.0 1067.000000 NaN
25% NaN NaN NaN NaN NaN NaN NaN 1999.000000 NaN 70.000000 NaN NaN 3.000000 NaN NaN NaN NaN 0.0 30451.000000 NaN
50% NaN NaN NaN NaN NaN NaN NaN 2003.000000 NaN 105.000000 NaN NaN 6.000000 NaN NaN NaN NaN 0.0 49577.000000 NaN
75% NaN NaN NaN NaN NaN NaN NaN 2008.000000 NaN 150.000000 NaN NaN 9.000000 NaN NaN NaN NaN 0.0 71540.000000 NaN
max NaN NaN NaN NaN NaN NaN NaN 9999.000000 NaN 17700.000000 NaN NaN 12.000000 NaN NaN NaN NaN 0.0 99998.000000 NaN

From the above we can see that the odometer and price columns are stored as strings because they contain non-numeric characters ("$", "," and "km"). We need to clean them before using them in the analysis. Also, the "nr_of_pictures" column has the same value (0) in every row, and "seller" and "offer_type" each have a single value in 49,999 of the 50,000 rows, so we can drop them.
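
Rather than reading low-information columns off the describe() table by eye, one option is to compute the share of rows taken by each column's most frequent value. This is only a sketch on a hypothetical sample; the `sample` frame, the "other_value" entry and the 75% threshold are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample; the values are illustrative only.
sample = pd.DataFrame({
    "seller": ["privat", "privat", "privat", "other_value"],
    "nr_of_pictures": [0, 0, 0, 0],
    "brand": ["bmw", "ford", "opel", "audi"],
})

# Share of rows taken by the most frequent value in each column.
top_share = sample.apply(
    lambda col: col.value_counts(normalize=True, dropna=False).iloc[0]
)

# Columns where one value covers at least 75% of rows (arbitrary threshold).
low_information = top_share[top_share >= 0.75].index.tolist()
print(low_information)
```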

In [7]:
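# Clean the price column: strip the currency symbol and thousands separator.
# regex=False makes "$" a literal character instead of a regex anchor.
autos['price'] = autos['price'].str.replace("$", "", regex=False)
autos['price'] = autos['price'].str.replace(",", "", regex=False)

# Clean the odometer column: strip the unit and thousands separator
autos['odometer'] = autos['odometer'].str.replace("km", "", regex=False)
autos['odometer'] = autos['odometer'].str.replace(",", "", regex=False)

# Rename the column so the unit stays visible in its name
autos.rename(columns={'odometer': 'odometer_km'}, inplace=True)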

In [8]:
# Drop the columns that carry almost no information: nr_of_pictures is 0 in every
# row, and seller and offer_type each have one value in 49,999 of 50,000 rows
autos = autos.drop(["nr_of_pictures", "seller", "offer_type"], axis=1)
In [9]:
# Convert the cleaned string columns to numeric types
autos['odometer_km'] = autos['odometer_km'].astype('int')
In [10]:
autos['price'] = autos['price'].astype('float')
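
The strip-and-convert steps in the cells above could also be collected into one helper. The sketch below is not the notebook's method; `to_number` is a hypothetical function, and it uses `pd.to_numeric` in place of `astype`, so the result comes back as integers rather than floats.

```python
import pandas as pd

def to_number(series, tokens):
    """Strip each literal token from a string Series, then convert to numbers."""
    cleaned = series
    for token in tokens:
        # regex=False treats "$" as a literal character, not a regex anchor.
        cleaned = cleaned.str.replace(token, "", regex=False)
    return pd.to_numeric(cleaned)

# Sample strings in the same format as the raw price and odometer columns.
prices = pd.Series(["$5,000", "$8,500", "$0"])
odometer = pd.Series(["150,000km", "70,000km", "5,000km"])

print(to_number(prices, ["$", ","]).tolist())
print(to_number(odometer, ["km", ","]).tolist())
```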

Let's explore the odometer and price columns.

In [11]:
autos["odometer_km"].value_counts()
Out[11]:
150000    32424
125000     5170
100000     2169
90000      1757
80000      1436
70000      1230
60000      1164
50000      1027
5000        967
40000       819
30000       789
20000       784
10000       264
Name: odometer_km, dtype: int64

Most of the cars have high mileage: 32,424 of the 50,000 listings (about 65%) show 150,000 km. Let's explore the price data further.

In [12]:
print('Unique Car Prices:', autos["price"].unique().shape)
print('Stats about car prices\n', autos["price"].describe())
print('Max Car Price', autos['price'].max())
print('Min Car Price', autos['price'].min())
Unique Car Prices: (2357,)
Stats about car prices
 count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64
Max Car Price 99999999.0
Min Car Price 0.0

A look at the car prices reveals outliers at both ends: the maximum price is 99,999,999 and the minimum price is 0. Let's explore further to see how many listings fall at these extremes.

In [13]:
#Sorted Ascending
autos['price'].value_counts().sort_index(ascending=True).head(20)
Out[13]:
0.0     1421
1.0      156
2.0        3
3.0        1
5.0        2
8.0        1
9.0        1
10.0       7
11.0       2
12.0       3
13.0       2
14.0       1
15.0       2
17.0       3
18.0       1
20.0       4
25.0       5
29.0       1
30.0       7
35.0       1
Name: price, dtype: int64
In [14]:
#Sorted Descending
autos['price'].value_counts().sort_index(ascending=False).head(20)
Out[14]:
99999999.0    1
27322222.0    1
12345678.0    3
11111111.0    2
10000000.0    1
3890000.0     1
1300000.0     1
1234566.0     1
999999.0      2
999990.0      1
350000.0      1
345000.0      1
299000.0      1
295000.0      1
265000.0      1
259000.0      1
250000.0      1
220000.0      1
198000.0      1
197000.0      1
Name: price, dtype: int64

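Looking at the 20 lowest and 20 highest prices, we can see that 1,421 cars are priced at 0, and several of the highest prices (for example 12345678 and 11111111) look like placeholder values. For the purpose of this analysis I am going to remove listings where the price is less than 3 or greater than 10,000,000. This removes 1,587 listings and leaves 48,413.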

In [15]:
autos = autos[autos["price"].between(3, 10000000)]
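
Note that `Series.between` includes both endpoints by default, so listings priced at exactly 3 or exactly 10,000,000 are kept. The short sketch below illustrates this on a hypothetical handful of prices; the `prices` values are chosen for illustration only.

```python
import pandas as pd

# Hypothetical prices covering both boundaries of the filter.
prices = pd.Series([0.0, 1.0, 3.0, 5000.0, 10000000.0, 99999999.0])

# between() is inclusive on both ends unless told otherwise.
mask = prices.between(3, 10000000)
kept = prices[mask]

print(kept.tolist())
print("removed:", int((~mask).sum()))
```
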
In [16]:
autos
Out[16]:
date_crawled name price abtest vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD 5000.0 control bus 2004 manuell 158 andere 150000 3 lpg peugeot nein 2016-03-26 00:00:00 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik 8500.0 control limousine 1997 automatik 286 7er 150000 6 benzin bmw nein 2016-04-04 00:00:00 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United 8990.0 test limousine 2009 manuell 102 golf 70000 7 benzin volkswagen nein 2016-03-26 00:00:00 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... 4350.0 control kleinwagen 2007 automatik 71 fortwo 70000 6 benzin smart nein 2016-03-12 00:00:00 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... 1350.0 test kombi 2003 manuell 0 focus 150000 7 benzin ford nein 2016-04-01 00:00:00 39218 2016-04-01 14:38:50
5 2016-03-21 13:47:45 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... 7900.0 test bus 2006 automatik 150 voyager 150000 4 diesel chrysler NaN 2016-03-21 00:00:00 22962 2016-04-06 09:45:21
6 2016-03-20 17:55:21 VW_Golf_III_GT_Special_Electronic_Green_Metall... 300.0 test limousine 1995 manuell 90 golf 150000 8 benzin volkswagen NaN 2016-03-20 00:00:00 31535 2016-03-23 02:48:59
7 2016-03-16 18:55:19 Golf_IV_1.9_TDI_90PS 1990.0 control limousine 1998 manuell 90 golf 150000 12 diesel volkswagen nein 2016-03-16 00:00:00 53474 2016-04-07 03:17:32
8 2016-03-22 16:51:34 Seat_Arosa 250.0 test NaN 2000 manuell 0 arosa 150000 10 NaN seat nein 2016-03-22 00:00:00 7426 2016-03-26 18:18:10
9 2016-03-16 13:47:02 Renault_Megane_Scenic_1.6e_RT_Klimaanlage 590.0 control bus 1997 manuell 90 megane 150000 7 benzin renault nein 2016-03-16 00:00:00 15749 2016-04-06 10:46:35
10 2016-03-15 01:41:36 VW_Golf_Tuning_in_siber/grau 999.0 test NaN 2017 manuell 90 NaN 150000 4 benzin volkswagen nein 2016-03-14 00:00:00 86157 2016-04-07 03:16:21
11 2016-03-16 18:45:34 Mercedes_A140_Motorschaden 350.0 control NaN 2000 NaN 0 NaN 150000 0 benzin mercedes_benz NaN 2016-03-16 00:00:00 17498 2016-03-16 18:45:34
12 2016-03-31 19:48:22 Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... 5299.0 control kleinwagen 2010 automatik 71 fortwo 50000 9 benzin smart nein 2016-03-31 00:00:00 34590 2016-04-06 14:17:52
13 2016-03-23 10:48:32 Audi_A3_1.6_tuning 1350.0 control limousine 1999 manuell 101 a3 150000 11 benzin audi nein 2016-03-23 00:00:00 12043 2016-04-01 14:17:13
14 2016-03-23 11:50:46 Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... 3999.0 test kleinwagen 2007 manuell 75 clio 150000 9 benzin renault NaN 2016-03-23 00:00:00 81737 2016-04-01 15:46:47
15 2016-04-01 12:06:20 Corvette_C3_Coupe_T_Top_Crossfire_Injection 18900.0 test coupe 1982 automatik 203 NaN 80000 6 benzin sonstige_autos nein 2016-04-01 00:00:00 61276 2016-04-02 21:10:48
16 2016-03-16 14:59:02 Opel_Vectra_B_Kombi 350.0 test kombi 1999 manuell 101 vectra 150000 5 benzin opel nein 2016-03-16 00:00:00 57299 2016-03-18 05:29:37
17 2016-03-29 11:46:22 Volkswagen_Scirocco_2_G60 5500.0 test coupe 1990 manuell 205 scirocco 150000 6 benzin volkswagen nein 2016-03-29 00:00:00 74821 2016-04-05 20:46:26
18 2016-03-26 19:57:44 Verkaufen_mein_bmw_e36_320_i_touring 300.0 control bus 1995 manuell 150 3er 150000 0 benzin bmw NaN 2016-03-26 00:00:00 54329 2016-04-02 12:16:41
19 2016-03-17 13:36:21 mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 4150.0 control suv 2004 manuell 124 andere 150000 2 lpg mazda nein 2016-03-17 00:00:00 40878 2016-03-17 14:45:58
20 2016-03-05 19:57:31 Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... 3500.0 test kombi 2003 manuell 131 a4 150000 5 diesel audi NaN 2016-03-05 00:00:00 53913 2016-03-07 05:46:46
21 2016-03-06 19:07:10 Porsche_911_Carrera_4S_Cabrio 41500.0 test cabrio 2004 manuell 320 911 150000 4 benzin porsche nein 2016-03-06 00:00:00 65428 2016-04-05 23:46:19
22 2016-03-28 20:50:54 MINI_Cooper_S_Cabrio 25450.0 control cabrio 2015 manuell 184 cooper 10000 1 benzin mini nein 2016-03-28 00:00:00 44789 2016-04-01 06:45:30
23 2016-03-10 19:55:34 Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima 7999.0 control bus 2010 manuell 120 NaN 150000 2 diesel peugeot nein 2016-03-10 00:00:00 30900 2016-03-17 08:45:17
24 2016-04-03 11:57:02 BMW_535i_xDrive_Sport_Aut. 48500.0 control limousine 2014 automatik 306 5er 30000 12 benzin bmw nein 2016-04-03 00:00:00 22547 2016-04-07 13:16:50
25 2016-03-21 21:56:18 Ford_escort_kombi_an_bastler_mit_ghia_ausstattung 90.0 control kombi 1996 manuell 116 NaN 150000 4 benzin ford ja 2016-03-21 00:00:00 27574 2016-04-01 05:16:49
26 2016-04-03 22:46:28 Volkswagen_Polo_Fox 777.0 control kleinwagen 1992 manuell 54 polo 125000 2 benzin volkswagen nein 2016-04-03 00:00:00 38110 2016-04-05 23:46:48
28 2016-03-19 21:56:19 MINI_Cooper_D 5250.0 control kleinwagen 2007 manuell 110 cooper 150000 7 diesel mini ja 2016-03-19 00:00:00 15745 2016-04-07 14:58:48
29 2016-04-02 12:45:44 Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... 4999.0 test kombi 2004 automatik 204 e_klasse 150000 10 diesel mercedes_benz nein 2016-04-02 00:00:00 47638 2016-04-02 12:45:44
30 2016-03-14 11:47:31 Peugeot_206_Unfallfahrzeug 80.0 test kleinwagen 2002 manuell 60 2_reihe 150000 6 benzin peugeot ja 2016-03-14 00:00:00 57076 2016-03-14 11:47:31
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49968 2016-04-01 17:49:15 Mercedes_Benz_190_D_2.5_Automatik 2100.0 test limousine 1986 automatik 90 andere 150000 9 diesel mercedes_benz nein 2016-04-01 00:00:00 40227 2016-04-05 13:16:35
49969 2016-03-17 18:49:02 Nissan_X_Trail_2.2_dCi_4x4_Sport_m.AHZ 4500.0 control suv 2005 manuell 136 x_trail 150000 5 diesel nissan nein 2016-03-17 00:00:00 17379 2016-03-25 23:18:15
49970 2016-03-21 22:47:37 c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... 15800.0 control bus 2010 automatik 136 c4 60000 4 diesel citroen nein 2016-03-21 00:00:00 14947 2016-04-07 04:17:34
49971 2016-03-29 14:54:12 W.Lupo_1.0 950.0 test kleinwagen 2001 manuell 50 lupo 150000 4 benzin volkswagen nein 2016-03-29 00:00:00 65197 2016-03-29 20:41:51
49972 2016-03-26 22:25:23 Mercedes_Benz_Vito_115_CDI_Extralang_Aut. 3300.0 control bus 2004 automatik 150 vito 150000 10 diesel mercedes_benz ja 2016-03-26 00:00:00 65326 2016-03-28 11:28:18
49973 2016-03-27 05:32:39 Mercedes_Benz_SLK_200_Kompressor 6000.0 control cabrio 2004 manuell 163 slk 150000 11 benzin mercedes_benz nein 2016-03-27 00:00:00 53567 2016-03-27 08:25:24
49975 2016-03-27 20:51:39 Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort 9700.0 control kleinwagen 2012 automatik 88 jazz 100000 11 hybrid honda nein 2016-03-27 00:00:00 84385 2016-04-05 19:45:34
49976 2016-03-19 18:56:05 Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... 5900.0 test kombi 1992 automatik 150 80 150000 12 benzin audi nein 2016-03-19 00:00:00 36100 2016-04-07 06:16:44
49977 2016-03-31 18:37:18 Mercedes_Benz_C200_Cdi_W203 5500.0 control limousine 2003 manuell 116 c_klasse 150000 2 diesel mercedes_benz nein 2016-03-31 00:00:00 33739 2016-04-06 12:16:11
49978 2016-04-04 10:37:14 Mercedes_Benz_E_200_Classic 900.0 control limousine 1996 automatik 136 e_klasse 150000 9 benzin mercedes_benz ja 2016-04-04 00:00:00 24405 2016-04-06 12:44:20
49979 2016-03-20 18:38:40 Volkswagen_Polo_1.6_TDI_Style 11000.0 test kleinwagen 2011 manuell 90 polo 70000 11 diesel volkswagen nein 2016-03-20 00:00:00 48455 2016-04-07 01:45:12
49980 2016-03-12 10:55:54 Ford_Escort_Turnier_16V 400.0 control kombi 1995 manuell 105 escort 125000 3 benzin ford NaN 2016-03-12 00:00:00 56218 2016-04-06 17:16:49
49981 2016-03-15 09:38:21 Opel_Astra_Kombi_mit_Anhaengerkupplung 2000.0 control kombi 1998 manuell 115 astra 150000 12 benzin opel nein 2016-03-15 00:00:00 86859 2016-04-05 17:21:46
49982 2016-03-29 18:51:08 Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm 1950.0 control kleinwagen 2004 manuell 0 fabia 90000 7 benzin skoda NaN 2016-03-29 00:00:00 45884 2016-03-29 18:51:08
49983 2016-03-06 12:43:04 Ford_focus_99 600.0 test kleinwagen 1999 manuell 101 focus 150000 4 benzin ford NaN 2016-03-06 00:00:00 52477 2016-03-09 06:16:08
49985 2016-04-02 16:38:23 Verkaufe_meinen_vw_vento! 1000.0 control NaN 1995 automatik 0 NaN 150000 0 benzin volkswagen NaN 2016-04-02 00:00:00 30900 2016-04-06 15:17:52
49986 2016-04-04 20:46:02 Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... 15900.0 control limousine 2010 automatik 218 300c 125000 11 diesel chrysler nein 2016-04-04 00:00:00 73527 2016-04-06 23:16:00
49987 2016-03-22 20:47:27 Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... 21990.0 control limousine 2013 manuell 150 a3 50000 11 diesel audi nein 2016-03-22 00:00:00 94362 2016-03-26 22:46:06
49988 2016-03-28 19:49:51 BMW_330_Ci 9550.0 control coupe 2001 manuell 231 3er 150000 10 benzin bmw nein 2016-03-28 00:00:00 83646 2016-04-07 02:17:40
49989 2016-03-11 19:50:37 VW_Polo_zum_Ausschlachten_oder_Wiederaufbau 150.0 test kleinwagen 1997 manuell 0 polo 150000 5 benzin volkswagen ja 2016-03-11 00:00:00 21244 2016-03-12 10:17:55
49990 2016-03-21 19:54:19 Mercedes_Benz_A_200__BlueEFFICIENCY__Urban 17500.0 test limousine 2012 manuell 156 a_klasse 30000 12 benzin mercedes_benz nein 2016-03-21 00:00:00 58239 2016-04-06 22:46:57
49991 2016-03-06 15:25:19 Kleinwagen 500.0 control NaN 2016 manuell 0 twingo 150000 0 benzin renault NaN 2016-03-06 00:00:00 61350 2016-03-06 18:24:19
49992 2016-03-10 19:37:38 Fiat_Grande_Punto_1.4_T_Jet_16V_Sport 4800.0 control kleinwagen 2009 manuell 120 andere 125000 9 lpg fiat nein 2016-03-10 00:00:00 68642 2016-03-13 01:44:51
49993 2016-03-15 18:47:35 Audi_A3__1_8l__Silber;_schoenes_Fahrzeug 1650.0 control kleinwagen 1997 manuell 0 NaN 150000 7 benzin audi NaN 2016-03-15 00:00:00 65203 2016-04-06 19:46:53
49994 2016-03-22 17:36:42 Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... 5000.0 control kombi 2001 automatik 299 a6 150000 1 benzin audi nein 2016-03-22 00:00:00 46537 2016-04-06 08:16:39
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon 24900.0 control limousine 2011 automatik 239 q5 100000 1 diesel audi nein 2016-03-27 00:00:00 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... 1980.0 control cabrio 1996 manuell 75 astra 150000 5 benzin opel nein 2016-03-28 00:00:00 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge 13200.0 test cabrio 2014 automatik 69 500 5000 11 benzin fiat nein 2016-04-02 00:00:00 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition 22900.0 control kombi 2013 manuell 150 a3 40000 11 diesel audi nein 2016-03-08 00:00:00 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V 1250.0 control limousine 1996 manuell 101 vectra 150000 1 benzin opel nein 2016-03-13 00:00:00 45897 2016-04-06 21:18:48

48413 rows × 17 columns

In [17]:
#Frequency for Date Crawled
autos['date_crawled'].str[:10].value_counts(normalize=True, dropna=False).sort_index(ascending=True)
Out[17]:
2016-03-05    0.025365
2016-03-06    0.014066
2016-03-07    0.036044
2016-03-08    0.033276
2016-03-09    0.033049
2016-03-10    0.032202
2016-03-11    0.032595
2016-03-12    0.036953
2016-03-13    0.015657
2016-03-14    0.036622
2016-03-15    0.034268
2016-03-16    0.029517
2016-03-17    0.031644
2016-03-18    0.012910
2016-03-19    0.034743
2016-03-20    0.037800
2016-03-21    0.037325
2016-03-22    0.032946
2016-03-23    0.032264
2016-03-24    0.029393
2016-03-25    0.031562
2016-03-26    0.032243
2016-03-27    0.031107
2016-03-28    0.034846
2016-03-29    0.034144
2016-03-30    0.033731
2016-03-31    0.031810
2016-04-01    0.033731
2016-04-02    0.035486
2016-04-03    0.038564
2016-04-04    0.036519
2016-04-05    0.013075
2016-04-06    0.003160
2016-04-07    0.001384
Name: date_crawled, dtype: float64

The date_crawled column shows that the crawl ran for about a month, from 2016-03-05 to 2016-04-07, with a roughly even share of listings (around 3%) collected on most days.
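The length of the crawl window can also be computed rather than read off the table. The sketch below is a minimal example on a hypothetical three-row sample (the timestamps in `sample` are illustrative, not taken from the data set); on the real data the same calls would be applied to autos['date_crawled'].

```python
import pandas as pd

# Hypothetical sample standing in for autos['date_crawled'] (strings, as in the raw data).
sample = pd.DataFrame({
    "date_crawled": [
        "2016-03-05 14:06:30",
        "2016-03-26 17:47:46",
        "2016-04-07 14:36:44",
    ]
})

# Parse the strings into timestamps, then take the first and last crawl day.
crawled = pd.to_datetime(sample["date_crawled"], format="%Y-%m-%d %H:%M:%S")
first_day = crawled.min().normalize()
last_day = crawled.max().normalize()

# Inclusive number of calendar days covered by the crawl.
crawl_span_days = (last_day - first_day).days + 1

print(first_day.date(), last_day.date(), crawl_span_days)
```

With 2016-03-05 and 2016-04-07 as the end points this gives 34 days, which matches the 34 rows in Out[17].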

In [18]:
#Frequency for Ad Created
autos['ad_created'].str[:10].value_counts(normalize=True, dropna=False).sort_index(ascending=True)
Out[18]:
2015-06-11    0.000021
2015-08-10    0.000021
2015-09-09    0.000021
2015-11-10    0.000021
2015-12-05    0.000021
2015-12-30    0.000021
2016-01-03    0.000021
2016-01-07    0.000021
2016-01-10    0.000041
2016-01-13    0.000021
2016-01-14    0.000021
2016-01-16    0.000021
2016-01-22    0.000021
2016-01-27    0.000062
2016-01-29    0.000021
2016-02-01    0.000021
2016-02-02    0.000041
2016-02-05    0.000041
2016-02-07    0.000021
2016-02-08    0.000021
2016-02-09    0.000021
2016-02-11    0.000021
2016-02-12    0.000041
2016-02-14    0.000041
2016-02-16    0.000021
2016-02-17    0.000021
2016-02-18    0.000041
2016-02-19    0.000062
2016-02-20    0.000041
2016-02-21    0.000062
                ...   
2016-03-09    0.033111
2016-03-10    0.031913
2016-03-11    0.032925
2016-03-12    0.036767
2016-03-13    0.017020
2016-03-14    0.035259
2016-03-15    0.033999
2016-03-16    0.030033
2016-03-17    0.031293
2016-03-18    0.013591
2016-03-19    0.033627
2016-03-20    0.037862
2016-03-21    0.037552
2016-03-22    0.032760
2016-03-23    0.032099
2016-03-24    0.029331
2016-03-25    0.031686
2016-03-26    0.032305
2016-03-27    0.031025
2016-03-28    0.034970
2016-03-29    0.034082
2016-03-30    0.033545
2016-03-31    0.031851
2016-04-01    0.033710
2016-04-02    0.035177
2016-04-03    0.038812
2016-04-04    0.036891
2016-04-05    0.011794
2016-04-06    0.003243
2016-04-07    0.001239
Name: ad_created, Length: 76, dtype: float64

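Most of the ads were created in March and early April 2016, that is, during the crawl period. Only a handful are older, with the oldest dating back to June 2015.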

In [19]:
#Frequency for Last Seen
autos['last_seen'].str[:10].value_counts(normalize=True, dropna=False).sort_index(ascending=True)
Out[19]:
2016-03-05    0.001074
2016-03-06    0.004338
2016-03-07    0.005412
2016-03-08    0.007374
2016-03-09    0.009626
2016-03-10    0.010617
2016-03-11    0.012373
2016-03-12    0.023795
2016-03-13    0.008861
2016-03-14    0.012621
2016-03-15    0.015864
2016-03-16    0.016442
2016-03-17    0.028071
2016-03-18    0.007333
2016-03-19    0.015822
2016-03-20    0.020635
2016-03-21    0.020614
2016-03-22    0.021379
2016-03-23    0.018590
2016-03-24    0.019747
2016-03-25    0.019189
2016-03-26    0.016814
2016-03-27    0.015616
2016-03-28    0.020883
2016-03-29    0.022349
2016-03-30    0.024745
2016-03-31    0.023837
2016-04-01    0.022866
2016-04-02    0.024869
2016-04-03    0.025179
2016-04-04    0.024498
2016-04-05    0.124966
2016-04-06    0.221593
2016-04-07    0.132010
Name: last_seen, dtype: float64

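Nearly half of the listings (about 48%) were last seen during the final three days of the crawl (2016-04-05 to 2016-04-07). This spike most likely reflects the end of the crawling period rather than a sudden wave of sales: these ads were probably still online when the crawler stopped.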
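As a quick check of how concentrated last_seen is at the end of the crawl, the sketch below sums the daily shares for the final three days. The values are copied from the last rows of Out[19]; the Series name `last_seen_share` is only an illustration, and earlier days are left out for brevity.

```python
import pandas as pd

# Daily shares copied from the tail of Out[19]; earlier days omitted.
last_seen_share = pd.Series({
    "2016-04-04": 0.024498,
    "2016-04-05": 0.124966,
    "2016-04-06": 0.221593,
    "2016-04-07": 0.132010,
})

# Label-based slicing on the sorted date strings is inclusive on both ends.
final_three_days = last_seen_share.loc["2016-04-05":"2016-04-07"].sum()

print(round(final_three_days, 3))
```

The three days together account for just under half of all listings.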

In [20]:
autos['registration_year'].describe()
Out[20]:
count    48413.000000
mean      2004.772871
std         88.779599
min       1000.000000
25%       1999.000000
50%       2004.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

Profiling the registration year shows that 75% of the cars were registered in or before 2008. There are also some data discrepancies: the minimum and maximum values are 1000 and 9999 respectively, neither of which is a plausible year.

The first car accessible to the masses appeared in 1908, and a car cannot be registered before it is built, nor after its listing was crawled in 2016. Below, I therefore check the share of cars in the data set whose registration year lies between 1910 and 2016.

In [21]:
autos['registration_year'].between(1910,2016).value_counts(normalize=True)
Out[21]:
True     0.961229
False    0.038771
Name: registration_year, dtype: float64

Only about 4% of the records (3.9%) lie outside the selected range, so we can ignore those values.
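If we want to drop those records rather than just ignore them, the same between() mask can be used as a filter. The sketch below runs on a hypothetical five-row sample (`autos_sample` and `autos_clean` are illustrative names); the notebook itself does not apply this step.

```python
import pandas as pd

# Hypothetical mini-sample; the real notebook would use the full `autos` frame.
autos_sample = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "registration_year": [1000, 1995, 2008, 2016, 9999],
})

# Boolean mask of plausible registration years (both bounds are inclusive).
valid_years = autos_sample["registration_year"].between(1910, 2016)

# Keep only the plausible rows; .copy() avoids chained-assignment warnings later on.
autos_clean = autos_sample[valid_years].copy()

print(autos_clean["registration_year"].tolist())  # [1995, 2008, 2016]
```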

Let us explore the brands of the cars listed for sale.

In [22]:
brand_composition = autos['brand'].value_counts(normalize=True)
print (brand_composition)
volkswagen        0.212959
bmw               0.108504
opel              0.108442
mercedes_benz     0.095759
audi              0.085824
ford              0.069733
renault           0.047859
peugeot           0.029455
fiat              0.026026
seat              0.018921
skoda             0.016070
nissan            0.015285
mazda             0.015264
smart             0.014335
citroen           0.014128
toyota            0.012621
hyundai           0.009956
sonstige_autos    0.009584
volvo             0.009027
mini              0.008634
mitsubishi        0.008180
honda             0.007994
kia               0.007126
alfa_romeo        0.006610
porsche           0.005887
suzuki            0.005887
chevrolet         0.005660
chrysler          0.003491
dacia             0.002665
daihatsu          0.002520
jeep              0.002231
subaru            0.002107
land_rover        0.002045
saab              0.001632
daewoo            0.001570
jaguar            0.001529
trabant           0.001405
rover             0.001343
lancia            0.001136
lada              0.000599
Name: brand, dtype: float64

The list above shows the share of ads for each brand. Let us concentrate on the brands that each account for more than 5% of all ads.

In [23]:
#isolating the brands that account for more than 5% of the ads
selected_brands = brand_composition[brand_composition>0.05].index
print (selected_brands)

#Now creating a dictionary and aggregating the mean price of the six brands that we have chosen
#in the previous step
mean_car_price ={}

for brands in selected_brands :
    print (brands)
    car_brand = autos[autos['brand']==brands]
    mean_car_p = car_brand['price'].mean()
    mean_car_price[brands]= int(mean_car_p)

mean_car_price
Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford'], dtype='object')
volkswagen
bmw
opel
mercedes_benz
audi
ford
Out[23]:
{'audi': 9241,
 'bmw': 8529,
 'ford': 4031,
 'mercedes_benz': 8565,
 'opel': 2959,
 'volkswagen': 5539}
Above are the mean prices for the selected brands. There is a clear gap between the premium brands and the others: Audi, BMW and Mercedes-Benz average roughly 8,500 to 9,200, Ford and Opel roughly 3,000 to 4,000, and Volkswagen sits in between at about 5,500.
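The same per-brand aggregation can be written without an explicit loop by using groupby. This is only a sketch on a hypothetical six-row sample (`sample` and `selected` are made-up stand-ins for autos and selected_brands), not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical listings; prices are illustrative, not taken from the data set.
sample = pd.DataFrame({
    "brand": ["audi", "audi", "opel", "opel", "ford", "lada"],
    "price": [9000.0, 9500.0, 3000.0, 2900.0, 4000.0, 1500.0],
})
selected = ["audi", "opel", "ford"]

# Filter to the selected brands, average the price per brand,
# truncate to int (as the loop version does) and sort for readability.
mean_price = (
    sample[sample["brand"].isin(selected)]
    .groupby("brand")["price"]
    .mean()
    .astype(int)
    .sort_values(ascending=False)
)

print(mean_price.to_dict())
```

The result is a Series indexed by brand, so it can go straight into a DataFrame without the intermediate dictionary.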
In [24]:
#Exploring Mileage of the cars and calculating the mean mileage
mean_car_mileage ={}

for brands in selected_brands :
    print (brands)
    car_brand = autos[autos['brand']==brands]
    mean_car_m = car_brand['odometer_km'].mean()
    mean_car_mileage[brands]= int(mean_car_m)
mean_car_mileage
volkswagen
bmw
opel
mercedes_benz
audi
ford
Out[24]:
{'audi': 129567,
 'bmw': 132689,
 'ford': 124296,
 'mercedes_benz': 130846,
 'opel': 129391,
 'volkswagen': 128902}
In [25]:
#converting the dictionaries to pandas Series
#this step is done so that we can have a single DataFrame
#where we can compare the price to the mileage of the car
mileage_series = pd.Series (mean_car_mileage)
price_series =  pd.Series (mean_car_price)
In [26]:
#Finding out relationship between Mileage and Price
df_price_mileage = pd.DataFrame (mileage_series, columns=['mean_mileage'])
df_price_mileage
df_price_mileage['mean_car_price']=price_series
df_price_mileage.sort_values(by=['mean_mileage', 'mean_car_price'],ascending=False)
Out[26]:
mean_mileage mean_car_price
bmw 132689 8529
mercedes_benz 130846 8565
audi 129567 9241
opel 129391 2959
volkswagen 128902 5539
ford 124296 4031

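The table does not show a clear inverse relationship between mean mileage and mean price. Mean mileage varies little across these brands (about 124,000 to 133,000 km), and the most expensive brands (BMW, Mercedes-Benz and Audi) actually have the highest mean mileage, while Ford has both the lowest mean mileage and a low mean price. At this level of aggregation, brand seems to matter more than mileage; whether higher mileage lowers the price within a brand would need a per-listing comparison.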
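As a rough check on the direction of the relationship, the sketch below computes the Pearson correlation between the two columns, using the values copied from Out[26]. With only six brands this is an indication at best, not a statistical result.

```python
import pandas as pd

# Per-brand means copied from Out[26].
df_check = pd.DataFrame(
    {
        "mean_mileage": [132689, 130846, 129567, 129391, 128902, 124296],
        "mean_car_price": [8529, 8565, 9241, 2959, 5539, 4031],
    },
    index=["bmw", "mercedes_benz", "audi", "opel", "volkswagen", "ford"],
)

# Pearson correlation: negative would support "higher mileage, lower price".
corr = df_check["mean_mileage"].corr(df_check["mean_car_price"])

print(round(corr, 2))
```

The coefficient comes out positive, so across these six brand averages higher mileage goes together with higher price, not lower.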

Extra Cleaning Steps

In [27]:
autos.columns
# df.groupby(['key1', 'key2']).size()
# autos.groupby(['brand','model']).size()
# autos[['brand','model']].apply(pd.Series.value_counts)

# converting date_crawled into YYYYMMDD format and changing it to int data type
date_crawled_list = autos["date_crawled"].str.slice(0,10).str.split('-')

for lista in date_crawled_list:
    autos['date_crawled_1']= int(lista[0]+lista[1]+lista[2])

#converting last_seen into YYYYMMDD format and changing it into int data type
last_seen_list = autos["last_seen"].str.slice(0,10).str.split('-')    
    

for lista in last_seen_list:
    autos['last_seen_1']= int(lista[0]+lista[1]+lista[2])

#converting ad_created into YYYYMMDD format and changing it into int data type
ad_created_list = autos['ad_created'].str.slice(0,10).str.split('-')


for lista in ad_created_list:
    autos['ad_created_1']= int(lista[0]+lista[1]+lista[2])
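One caveat about the loops in In [27]: assigning a scalar to autos['date_crawled_1'] inside the loop overwrites the whole column on every iteration, so after the loop every row holds the date of the last listing. This is why the output of In [29] shows the same three values (20160314, 20160406, 20160313) in every row. A vectorized conversion avoids this. The sketch below uses a hypothetical three-row sample and would be applied in the same way to last_seen and ad_created.

```python
import pandas as pd

# Hypothetical sample standing in for the `autos` frame.
sample = pd.DataFrame({
    "date_crawled": [
        "2016-03-26 17:47:46",
        "2016-04-04 13:38:56",
        "2016-03-14 00:42:12",
    ],
})

# Keep the 'YYYY-MM-DD' part, remove the dashes, and cast to int.
# Each row is converted from its own value, so nothing is overwritten.
sample["date_crawled_1"] = (
    sample["date_crawled"]
    .str[:10]
    .str.replace("-", "", regex=False)
    .astype(int)
)

print(sample["date_crawled_1"].tolist())  # [20160326, 20160404, 20160314]
```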
In [28]:
#Dropping the original last_seen, date_crawled and ad_created columns
autos = autos.drop(['last_seen','date_crawled','ad_created'], axis=1)
autos.columns
Out[28]:
Index(['name', 'price', 'abtest', 'vehicle_type', 'registration_year',
       'gearbox', 'power_ps', 'model', 'odometer_km', 'registration_month',
       'fuel_type', 'brand', 'unrepaired_damage', 'postal_code',
       'date_crawled_1', 'last_seen_1', 'ad_created_1'],
      dtype='object')
In [29]:
#Renaming the columns
autos = autos.rename(columns={'last_seen_1':'last_seen','date_crawled_1':'date_crawled' ,'ad_created_1':'ad_created'})

autos
Out[29]:
name price abtest vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage postal_code date_crawled last_seen ad_created
0 Peugeot_807_160_NAVTECH_ON_BOARD 5000.0 control bus 2004 manuell 158 andere 150000 3 lpg peugeot nein 79588 20160314 20160406 20160313
1 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik 8500.0 control limousine 1997 automatik 286 7er 150000 6 benzin bmw nein 71034 20160314 20160406 20160313
2 Volkswagen_Golf_1.6_United 8990.0 test limousine 2009 manuell 102 golf 70000 7 benzin volkswagen nein 35394 20160314 20160406 20160313
3 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... 4350.0 control kleinwagen 2007 automatik 71 fortwo 70000 6 benzin smart nein 33729 20160314 20160406 20160313
4 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... 1350.0 test kombi 2003 manuell 0 focus 150000 7 benzin ford nein 39218 20160314 20160406 20160313
5 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... 7900.0 test bus 2006 automatik 150 voyager 150000 4 diesel chrysler NaN 22962 20160314 20160406 20160313
6 VW_Golf_III_GT_Special_Electronic_Green_Metall... 300.0 test limousine 1995 manuell 90 golf 150000 8 benzin volkswagen NaN 31535 20160314 20160406 20160313
7 Golf_IV_1.9_TDI_90PS 1990.0 control limousine 1998 manuell 90 golf 150000 12 diesel volkswagen nein 53474 20160314 20160406 20160313
8 Seat_Arosa 250.0 test NaN 2000 manuell 0 arosa 150000 10 NaN seat nein 7426 20160314 20160406 20160313
9 Renault_Megane_Scenic_1.6e_RT_Klimaanlage 590.0 control bus 1997 manuell 90 megane 150000 7 benzin renault nein 15749 20160314 20160406 20160313
10 VW_Golf_Tuning_in_siber/grau 999.0 test NaN 2017 manuell 90 NaN 150000 4 benzin volkswagen nein 86157 20160314 20160406 20160313
11 Mercedes_A140_Motorschaden 350.0 control NaN 2000 NaN 0 NaN 150000 0 benzin mercedes_benz NaN 17498 20160314 20160406 20160313
12 Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... 5299.0 control kleinwagen 2010 automatik 71 fortwo 50000 9 benzin smart nein 34590 20160314 20160406 20160313
13 Audi_A3_1.6_tuning 1350.0 control limousine 1999 manuell 101 a3 150000 11 benzin audi nein 12043 20160314 20160406 20160313
14 Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... 3999.0 test kleinwagen 2007 manuell 75 clio 150000 9 benzin renault NaN 81737 20160314 20160406 20160313
15 Corvette_C3_Coupe_T_Top_Crossfire_Injection 18900.0 test coupe 1982 automatik 203 NaN 80000 6 benzin sonstige_autos nein 61276 20160314 20160406 20160313
16 Opel_Vectra_B_Kombi 350.0 test kombi 1999 manuell 101 vectra 150000 5 benzin opel nein 57299 20160314 20160406 20160313
17 Volkswagen_Scirocco_2_G60 5500.0 test coupe 1990 manuell 205 scirocco 150000 6 benzin volkswagen nein 74821 20160314 20160406 20160313
18 Verkaufen_mein_bmw_e36_320_i_touring 300.0 control bus 1995 manuell 150 3er 150000 0 benzin bmw NaN 54329 20160314 20160406 20160313
19 mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 4150.0 control suv 2004 manuell 124 andere 150000 2 lpg mazda nein 40878 20160314 20160406 20160313
20 Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... 3500.0 test kombi 2003 manuell 131 a4 150000 5 diesel audi NaN 53913 20160314 20160406 20160313
21 Porsche_911_Carrera_4S_Cabrio 41500.0 test cabrio 2004 manuell 320 911 150000 4 benzin porsche nein 65428 20160314 20160406 20160313
22 MINI_Cooper_S_Cabrio 25450.0 control cabrio 2015 manuell 184 cooper 10000 1 benzin mini nein 44789 20160314 20160406 20160313
23 Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima 7999.0 control bus 2010 manuell 120 NaN 150000 2 diesel peugeot nein 30900 20160314 20160406 20160313
24 BMW_535i_xDrive_Sport_Aut. 48500.0 control limousine 2014 automatik 306 5er 30000 12 benzin bmw nein 22547 20160314 20160406 20160313
25 Ford_escort_kombi_an_bastler_mit_ghia_ausstattung 90.0 control kombi 1996 manuell 116 NaN 150000 4 benzin ford ja 27574 20160314 20160406 20160313
26 Volkswagen_Polo_Fox 777.0 control kleinwagen 1992 manuell 54 polo 125000 2 benzin volkswagen nein 38110 20160314 20160406 20160313
28 MINI_Cooper_D 5250.0 control kleinwagen 2007 manuell 110 cooper 150000 7 diesel mini ja 15745 20160314 20160406 20160313
29 Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... 4999.0 test kombi 2004 automatik 204 e_klasse 150000 10 diesel mercedes_benz nein 47638 20160314 20160406 20160313
30 Peugeot_206_Unfallfahrzeug 80.0 test kleinwagen 2002 manuell 60 2_reihe 150000 6 benzin peugeot ja 57076 20160314 20160406 20160313
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49968 Mercedes_Benz_190_D_2.5_Automatik 2100.0 test limousine 1986 automatik 90 andere 150000 9 diesel mercedes_benz nein 40227 20160314 20160406 20160313
49969 Nissan_X_Trail_2.2_dCi_4x4_Sport_m.AHZ 4500.0 control suv 2005 manuell 136 x_trail 150000 5 diesel nissan nein 17379 20160314 20160406 20160313
49970 c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... 15800.0 control bus 2010 automatik 136 c4 60000 4 diesel citroen nein 14947 20160314 20160406 20160313
49971 W.Lupo_1.0 950.0 test kleinwagen 2001 manuell 50 lupo 150000 4 benzin volkswagen nein 65197 20160314 20160406 20160313
49972 Mercedes_Benz_Vito_115_CDI_Extralang_Aut. 3300.0 control bus 2004 automatik 150 vito 150000 10 diesel mercedes_benz ja 65326 20160314 20160406 20160313
49973 Mercedes_Benz_SLK_200_Kompressor 6000.0 control cabrio 2004 manuell 163 slk 150000 11 benzin mercedes_benz nein 53567 20160314 20160406 20160313
49975 Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort 9700.0 control kleinwagen 2012 automatik 88 jazz 100000 11 hybrid honda nein 84385 20160314 20160406 20160313
49976 Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... 5900.0 test kombi 1992 automatik 150 80 150000 12 benzin audi nein 36100 20160314 20160406 20160313
49977 Mercedes_Benz_C200_Cdi_W203 5500.0 control limousine 2003 manuell 116 c_klasse 150000 2 diesel mercedes_benz nein 33739 20160314 20160406 20160313
49978 Mercedes_Benz_E_200_Classic 900.0 control limousine 1996 automatik 136 e_klasse 150000 9 benzin mercedes_benz ja 24405 20160314 20160406 20160313
49979 Volkswagen_Polo_1.6_TDI_Style 11000.0 test kleinwagen 2011 manuell 90 polo 70000 11 diesel volkswagen nein 48455 20160314 20160406 20160313
49980 Ford_Escort_Turnier_16V 400.0 control kombi 1995 manuell 105 escort 125000 3 benzin ford NaN 56218 20160314 20160406 20160313
49981 Opel_Astra_Kombi_mit_Anhaengerkupplung 2000.0 control kombi 1998 manuell 115 astra 150000 12 benzin opel nein 86859 20160314 20160406 20160313
49982 Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm 1950.0 control kleinwagen 2004 manuell 0 fabia 90000 7 benzin skoda NaN 45884 20160314 20160406 20160313
49983 Ford_focus_99 600.0 test kleinwagen 1999 manuell 101 focus 150000 4 benzin ford NaN 52477 20160314 20160406 20160313
49985 Verkaufe_meinen_vw_vento! 1000.0 control NaN 1995 automatik 0 NaN 150000 0 benzin volkswagen NaN 30900 20160314 20160406 20160313
49986 Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... 15900.0 control limousine 2010 automatik 218 300c 125000 11 diesel chrysler nein 73527 20160314 20160406 20160313
49987 Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... 21990.0 control limousine 2013 manuell 150 a3 50000 11 diesel audi nein 94362 20160314 20160406 20160313
49988 BMW_330_Ci 9550.0 control coupe 2001 manuell 231 3er 150000 10 benzin bmw nein 83646 20160314 20160406 20160313
49989 VW_Polo_zum_Ausschlachten_oder_Wiederaufbau 150.0 test kleinwagen 1997 manuell 0 polo 150000 5 benzin volkswagen ja 21244 20160314 20160406 20160313
49990 Mercedes_Benz_A_200__BlueEFFICIENCY__Urban 17500.0 test limousine 2012 manuell 156 a_klasse 30000 12 benzin mercedes_benz nein 58239 20160314 20160406 20160313
49991 Kleinwagen 500.0 control NaN 2016 manuell 0 twingo 150000 0 benzin renault NaN 61350 20160314 20160406 20160313
49992 Fiat_Grande_Punto_1.4_T_Jet_16V_Sport 4800.0 control kleinwagen 2009 manuell 120 andere 125000 9 lpg fiat nein 68642 20160314 20160406 20160313
49993 Audi_A3__1_8l__Silber;_schoenes_Fahrzeug 1650.0 control kleinwagen 1997 manuell 0 NaN 150000 7 benzin audi NaN 65203 20160314 20160406 20160313
49994 Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... 5000.0 control kombi 2001 automatik 299 a6 150000 1 benzin audi nein 46537 20160314 20160406 20160313
49995 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon 24900.0 control limousine 2011 automatik 239 q5 100000 1 diesel audi nein 82131 20160314 20160406 20160313
49996 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... 1980.0 control cabrio 1996 manuell 75 astra 150000 5 benzin opel nein 44807 20160314 20160406 20160313
49997 Fiat_500_C_1.2_Dualogic_Lounge 13200.0 test cabrio 2014 automatik 69 500 5000 11 benzin fiat nein 73430 20160314 20160406 20160313
49998 Audi_A3_2.0_TDI_Sportback_Ambition 22900.0 control kombi 2013 manuell 150 a3 40000 11 diesel audi nein 35683 20160314 20160406 20160313
49999 Opel_Vectra_1.6_16V 1250.0 control limousine 1996 manuell 101 vectra 150000 1 benzin opel nein 45897 20160314 20160406 20160313

48413 rows × 17 columns