Exploring Ebay Car Sales Data

In this project we will work with a modified data set from a classifieds section of the German Ebay website. The goal of this project is to clean and analyze the data.

In [1]:
import pandas as pd
import numpy as np
In [2]:
autos = pd.read_csv('autos.csv', encoding='Latin-1')
In [3]:
autos
Out[3]:
dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox powerPS model odometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon privat Angebot $24,900 control limousine 2011 automatik 239 q5 100,000km 1 diesel audi nein 2016-03-27 00:00:00 0 82131 2016-04-01 13:47:40
49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... privat Angebot $1,980 control cabrio 1996 manuell 75 astra 150,000km 5 benzin opel nein 2016-03-28 00:00:00 0 44807 2016-04-02 14:18:02
49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge privat Angebot $13,200 test cabrio 2014 automatik 69 500 5,000km 11 benzin fiat nein 2016-04-02 00:00:00 0 73430 2016-04-04 11:47:27
49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition privat Angebot $22,900 control kombi 2013 manuell 150 a3 40,000km 11 diesel audi nein 2016-03-08 00:00:00 0 35683 2016-04-05 16:45:07
49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V privat Angebot $1,250 control limousine 1996 manuell 101 vectra 150,000km 1 benzin opel nein 2016-03-13 00:00:00 0 45897 2016-04-06 21:18:48

50000 rows × 20 columns

In [4]:
autos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   dateCrawled          50000 non-null  object
 1   name                 50000 non-null  object
 2   seller               50000 non-null  object
 3   offerType            50000 non-null  object
 4   price                50000 non-null  object
 5   abtest               50000 non-null  object
 6   vehicleType          44905 non-null  object
 7   yearOfRegistration   50000 non-null  int64 
 8   gearbox              47320 non-null  object
 9   powerPS              50000 non-null  int64 
 10  model                47242 non-null  object
 11  odometer             50000 non-null  object
 12  monthOfRegistration  50000 non-null  int64 
 13  fuelType             45518 non-null  object
 14  brand                50000 non-null  object
 15  notRepairedDamage    40171 non-null  object
 16  dateCreated          50000 non-null  object
 17  nrOfPictures         50000 non-null  int64 
 18  postalCode           50000 non-null  int64 
 19  lastSeen             50000 non-null  object
dtypes: int64(5), object(15)
memory usage: 7.6+ MB
In [5]:
autos.head()
Out[5]:
dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox powerPS model odometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50

Initial impressions of the data:

'dateCrawled' and 'lastSeenOnline' - might not be relevent to our ananlysis so might delete 'price' - convert to float 'model' - put this next to the 'brand' column for ease of reading 'odometer' - convert to int 'monthOfRegistration' - maybe convert to the full name of the month 'dateCreated' - convert to date

Some of the columns have null values that we will have to investigate.
Column titles we can clean, add spaces, and use lowercase.

First let's convert the column names from camelcase to snakecase and re-word some of them to make it more clear:

In [6]:
autos.columns
Out[6]:
Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')
In [7]:
mapping_dictionary = {'dateCrawled':'date_crawled', 'offerType':'offer_type', 'abtest':'ab_test','vehicleType':'vehicle_type',
'yearOfRegistration':'registration_year', 'powerPS':'power_ps', 'monthOfRegistration':'registration_month',
'fuelType':'fuel_type', 'notRepairedDamage':'unrepaired_damage',
'dateCreated':'ad_created', 'nrOfPictures':'nr_of_pictures', 'postaCode':'postal_code', 'lastSeen':'last_seen'}
In [8]:
autos.rename(mapper=mapping_dictionary, axis=1, inplace=True)
In [9]:
autos.head()
Out[9]:
date_crawled name seller offer_type price ab_test vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postalCode last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50
In [10]:
autos.describe(include='all')
Out[10]:
date_crawled name seller offer_type price ab_test vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postalCode last_seen
count 50000 50000 50000 50000 50000 50000 44905 50000.000000 47320 50000.000000 47242 50000 50000.000000 45518 50000 40171 50000 50000.0 50000.000000 50000
unique 48213 38754 2 2 2357 2 8 NaN 2 NaN 245 13 NaN 7 40 2 76 NaN NaN 39481
top 2016-03-21 20:37:19 Ford_Fiesta privat Angebot $0 test limousine NaN manuell NaN golf 150,000km NaN benzin volkswagen nein 2016-04-03 00:00:00 NaN NaN 2016-04-07 06:17:27
freq 3 78 49999 49999 1421 25756 12859 NaN 36993 NaN 4024 32424 NaN 30107 10687 35232 1946 NaN NaN 8
mean NaN NaN NaN NaN NaN NaN NaN 2005.073280 NaN 116.355920 NaN NaN 5.723360 NaN NaN NaN NaN 0.0 50813.627300 NaN
std NaN NaN NaN NaN NaN NaN NaN 105.712813 NaN 209.216627 NaN NaN 3.711984 NaN NaN NaN NaN 0.0 25779.747957 NaN
min NaN NaN NaN NaN NaN NaN NaN 1000.000000 NaN 0.000000 NaN NaN 0.000000 NaN NaN NaN NaN 0.0 1067.000000 NaN
25% NaN NaN NaN NaN NaN NaN NaN 1999.000000 NaN 70.000000 NaN NaN 3.000000 NaN NaN NaN NaN 0.0 30451.000000 NaN
50% NaN NaN NaN NaN NaN NaN NaN 2003.000000 NaN 105.000000 NaN NaN 6.000000 NaN NaN NaN NaN 0.0 49577.000000 NaN
75% NaN NaN NaN NaN NaN NaN NaN 2008.000000 NaN 150.000000 NaN NaN 9.000000 NaN NaN NaN NaN 0.0 71540.000000 NaN
max NaN NaN NaN NaN NaN NaN NaN 9999.000000 NaN 17700.000000 NaN NaN 12.000000 NaN NaN NaN NaN 0.0 99998.000000 NaN

'seller' and 'offer_type' can be removed becasue amost all of the values are the same 'price', 'odometer' needs to be converted to int 'date_craweled', 'ad_created', and 'last_seen' can be converted to date format.

In [11]:
autos['price'] = autos['price'].str.replace('$','').str.replace(',','').astype(int)
In [12]:
autos['odometer'] = autos['odometer'].str.replace('km','').str.replace(',','').astype(int)
In [13]:
autos.rename({'odometer':'odometer_km'}, axis=1, inplace=True)

Let's continue exploring the data, specifically looking for data that doesn't look right. We'll start by analyzing the odometer_km and price columns.

In [14]:
autos['price'].describe()
Out[14]:
count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64
In [15]:
autos['price'].nlargest(20)
Out[15]:
39705    99999999
42221    27322222
27371    12345678
39377    12345678
47598    12345678
2897     11111111
24384    11111111
11137    10000000
47634     3890000
7814      1300000
22947     1234566
514        999999
43049      999999
37585      999990
36818      350000
14715      345000
34723      299000
35923      295000
12682      265000
47337      259000
Name: price, dtype: int64
In [16]:
price_bool = autos['price'] < 100
price_bool.value_counts()
Out[16]:
False    48238
True      1762
Name: price, dtype: int64

We will remove the entries with prices 999,990 and above since those are either errors or extrememly expense that they may skew the results. There are over 1700 entries with a price less than 100 so we will exclude these as well.

In [17]:
autos = autos.loc[autos['price'].between(100, 350000), :]
In [18]:
autos['odometer_km'].min()
Out[18]:
5000

We don't need decimals for the 'odometer_km' column so we will convert it to int. 150,000 km is reasonable for some used cars so there is no need to remove any on this basis. The lowest is 5000 km which is also reasonable.

In [19]:
autos['odometer_km'].describe()
Out[19]:
count     48224.000000
mean     125919.148142
std       39543.339640
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: float64
In [20]:
autos['price'].describe()
Out[20]:
count     48224.000000
mean       5930.371433
std        9078.372762
min         100.000000
25%        1250.000000
50%        3000.000000
75%        7499.000000
max      350000.000000
Name: price, dtype: float64

Observations about the remaining values for 'price' and 'odometer_km':

The mean milage is over 125000 km so most of our entries are vehicles with high km. 75% of the vehicles have 125000 kilometers or more.

The average price is 5930.

Exploring the Columns Containing Dates

In [21]:
autos[['date_crawled','ad_created','last_seen']][0:5]
Out[21]:
date_crawled ad_created last_seen
0 2016-03-26 17:47:46 2016-03-26 00:00:00 2016-04-06 06:45:54
1 2016-04-04 13:38:56 2016-04-04 00:00:00 2016-04-06 14:45:08
2 2016-03-26 18:57:24 2016-03-26 00:00:00 2016-04-06 20:15:37
3 2016-03-12 16:58:10 2016-03-12 00:00:00 2016-03-15 03:16:28
4 2016-04-01 14:38:50 2016-04-01 00:00:00 2016-04-01 14:38:50
In [22]:
autos['date_crawled'].str[:10].value_counts(normalize=True, dropna=False).sort_index()
Out[22]:
2016-03-05    0.025361
2016-03-06    0.014039
2016-03-07    0.036061
2016-03-08    0.033179
2016-03-09    0.033013
2016-03-10    0.032287
2016-03-11    0.032598
2016-03-12    0.036911
2016-03-13    0.015677
2016-03-14    0.036662
2016-03-15    0.034319
2016-03-16    0.029467
2016-03-17    0.031499
2016-03-18    0.012898
2016-03-19    0.034734
2016-03-20    0.037803
2016-03-21    0.037201
2016-03-22    0.032888
2016-03-23    0.032287
2016-03-24    0.029446
2016-03-25    0.031499
2016-03-26    0.032308
2016-03-27    0.031126
2016-03-28    0.034962
2016-03-29    0.034112
2016-03-30    0.033738
2016-03-31    0.031851
2016-04-01    0.033697
2016-04-02    0.035605
2016-04-03    0.038611
2016-04-04    0.036538
2016-04-05    0.013064
2016-04-06    0.003173
2016-04-07    0.001389
Name: date_crawled, dtype: float64

All of the entries were crawled in either March or April of 2016 with most of them being craweled in March

In [23]:
autos['ad_created'].str[:7].value_counts(normalize=True, dropna=False)
Out[23]:
2016-03    0.837342
2016-04    0.161019
2016-02    0.001265
2016-01    0.000249
2015-12    0.000041
2015-08    0.000021
2015-06    0.000021
2015-11    0.000021
2015-09    0.000021
Name: ad_created, dtype: float64

Over 83% of the ad were created in March of 2016. The total range for all ad creation is from June 2015 through April 2016.

In [24]:
autos['last_seen'].str[:10].value_counts(normalize=True, dropna=False).sort_index()
Out[24]:
2016-03-05    0.001078
2016-03-06    0.004313
2016-03-07    0.005433
2016-03-08    0.007320
2016-03-09    0.009580
2016-03-10    0.010638
2016-03-11    0.012400
2016-03-12    0.023785
2016-03-13    0.008875
2016-03-14    0.012629
2016-03-15    0.015863
2016-03-16    0.016444
2016-03-17    0.028098
2016-03-18    0.007320
2016-03-19    0.015760
2016-03-20    0.020654
2016-03-21    0.020550
2016-03-22    0.021359
2016-03-23    0.018580
2016-03-24    0.019762
2016-03-25    0.019098
2016-03-26    0.016672
2016-03-27    0.015552
2016-03-28    0.020840
2016-03-29    0.022292
2016-03-30    0.024697
2016-03-31    0.023826
2016-04-01    0.022852
2016-04-02    0.024884
2016-04-03    0.025133
2016-04-04    0.024531
2016-04-05    0.125062
2016-04-06    0.221964
2016-04-07    0.132154
Name: last_seen, dtype: float64

The date range for when the ad was last seen is from March 5th, 2016 to April 7th, 2016

Dealing With Incorrect Registration Year Data

In [25]:
autos['registration_year'].describe()
Out[25]:
count    48224.000000
mean      2004.730964
std         87.897388
min       1000.000000
25%       1999.000000
50%       2004.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

Some things stand out from the 'registration_year' column:

the minimum vallue is 1000 and the maximum value is 9999. The registration year can't be after the listing was seen so any entry after 2016 is inaccurate. The earliest possible registration date would be in the early 1900's when cars were invented. Let's take a look at some of the data to determine if we need to fix anything that may be unrealistic.

In [26]:
autos[autos['registration_year'] < 1940]
Out[26]:
date_crawled name seller offer_type price ab_test vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postalCode last_seen
2221 2016-03-15 14:57:07 Sehr_seltener_Oldtimer_Opel_1210_zum_Restaurieren privat Angebot 3350 control andere 1934 manuell 0 andere 5000 0 benzin opel ja 2016-03-15 00:00:00 0 49828 2016-04-06 06:17:51
2573 2016-03-19 22:51:25 Hanomag_rekord_15k_Suche_ersatz_teile privat Angebot 3000 test andere 1934 NaN 0 NaN 90000 1 benzin sonstige_autos nein 2016-03-19 00:00:00 0 90489 2016-03-19 22:51:25
10556 2016-04-01 06:02:10 UNFAL_Auto privat Angebot 450 control NaN 1800 NaN 1800 NaN 5000 2 NaN mitsubishi nein 2016-04-01 00:00:00 0 63322 2016-04-01 09:42:30
11246 2016-03-26 19:49:59 Ford_Model_A_Roadster_Deluxe_1931 privat Angebot 27500 control cabrio 1931 manuell 39 andere 10000 7 benzin ford nein 2016-03-26 00:00:00 0 9322 2016-04-06 09:46:59
21416 2016-03-12 08:36:21 Essex_super_six__Ford_A privat Angebot 16500 control cabrio 1927 manuell 40 andere 5000 5 benzin ford NaN 2016-03-12 00:00:00 0 74821 2016-03-15 12:45:12
21421 2016-03-05 17:45:32 Ford_Business_Coupe_Hotrod_Projekt.1937 privat Angebot 7000 test coupe 1937 manuell 85 andere 5000 8 benzin ford ja 2016-03-05 00:00:00 0 8359 2016-04-07 10:44:39
22101 2016-03-09 16:51:17 BMW_Andere privat Angebot 11500 test cabrio 1929 manuell 15 andere 5000 1 NaN bmw ja 2016-03-09 00:00:00 0 70569 2016-04-07 06:17:11
22316 2016-03-29 16:56:41 VW_Kaefer.__Zwei_zum_Preis_von_einem. privat Angebot 1500 control NaN 1000 manuell 0 kaefer 5000 0 benzin volkswagen NaN 2016-03-29 00:00:00 0 48324 2016-03-31 10:15:28
22659 2016-03-14 08:51:18 Opel_Corsa_B privat Angebot 500 test NaN 1910 NaN 0 corsa 150000 0 NaN opel NaN 2016-03-14 00:00:00 0 52393 2016-04-03 07:53:55
23804 2016-03-30 22:42:50 Mercedes_Benz_170V privat Angebot 4800 control limousine 1937 manuell 38 andere 5000 0 benzin mercedes_benz NaN 2016-03-30 00:00:00 0 23936 2016-03-30 22:42:50
24511 2016-03-17 19:45:11 Trabant__wartburg__Ostalgie privat Angebot 490 control NaN 1111 NaN 0 NaN 5000 0 NaN trabant NaN 2016-03-17 00:00:00 0 16818 2016-04-07 07:17:29
24855 2016-03-31 13:57:18 Andere_DKW_F8_Cabrio_Limosine privat Angebot 26600 control cabrio 1939 NaN 0 NaN 5000 7 NaN sonstige_autos ja 2016-03-31 00:00:00 0 55278 2016-04-06 07:15:57
26103 2016-03-16 12:37:38 Oldtimer_Sport___Cabrio_Georges_Irat privat Angebot 8500 test cabrio 1938 manuell 0 NaN 5000 6 benzin sonstige_autos ja 2016-03-16 00:00:00 0 61118 2016-03-16 19:43:33
26607 2016-03-21 14:45:26 MG_Andere privat Angebot 25500 control cabrio 1937 manuell 50 NaN 50000 5 benzin sonstige_autos nein 2016-03-21 00:00:00 0 53804 2016-04-06 11:17:17
28693 2016-03-22 17:48:41 Renault_Twingo privat Angebot 599 control kleinwagen 1910 manuell 0 NaN 5000 0 benzin renault NaN 2016-03-22 00:00:00 0 70376 2016-04-06 09:16:59
32585 2016-04-02 16:56:39 UNFAL_Auto privat Angebot 450 control NaN 1800 NaN 1800 NaN 5000 2 NaN mitsubishi nein 2016-04-02 00:00:00 0 63322 2016-04-04 14:46:21
39725 2016-03-25 22:54:31 Dodge_1937_Hotrod_Flathead_Scheunenfund_Oldtim... privat Angebot 8990 control limousine 1937 NaN 0 NaN 150000 7 benzin sonstige_autos NaN 2016-03-25 00:00:00 0 3042 2016-04-05 15:47:14
49283 2016-03-15 18:38:53 Citroen_HY privat Angebot 7750 control NaN 1001 NaN 0 andere 5000 0 NaN citroen NaN 2016-03-15 00:00:00 0 66706 2016-04-06 18:47:20

The entires with registration years 1800, 1000, 1111, 1800, 1001 are inaccurate. The entry of 1910 for an Opel Corsa B is inaccurate because Opel Corsas were not manufactured until 1982. The entry of 1910 for a Renault Twingo is inaccurate because the Twingo was not manufactured until 1992. The rest of the entries before 1940 are vintage cars which appear to be accurate so we will leave our analysis there.

We will remove entries with registration date before 1911 and after 2016

In [27]:
autos = autos[autos["registration_year"].between(1911, 2016)]
autos['registration_year'].value_counts(normalize=True)
Out[27]:
2000    0.066969
2005    0.062805
1999    0.062114
2004    0.058231
2003    0.058101
          ...   
1927    0.000022
1929    0.000022
1950    0.000022
1948    0.000022
1952    0.000022
Name: registration_year, Length: 77, dtype: float64

Exploring Price by Brand

We will aggregate by price using brands that make up 5% or greater of total entries. Over 20% of all entries are Volkswagens and the top 4 brands make up over 50% of the data.

In [28]:
autos['brand'].value_counts(normalize=False).head(10)
Out[28]:
volkswagen       9799
bmw              5107
opel             4970
mercedes_benz    4480
audi             4022
ford             3237
renault          2181
peugeot          1384
fiat             1187
seat              846
Name: brand, dtype: int64
In [29]:
brands = autos['brand'].value_counts(normalize=True)
significant_brands = brands[brands > 0.05].index
print(significant_brands)

avg_price_by_brand = {}
for b in significant_brands:
    selected_brand = autos[autos['brand'] == b] 
    price_mean = selected_brand['price'].mean() 
    avg_price_by_brand[b] = int(price_mean)
print(avg_price_by_brand)
Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford'], dtype='object')
{'volkswagen': 5436, 'bmw': 8381, 'opel': 3006, 'mercedes_benz': 8672, 'audi': 9380, 'ford': 3779}

The most common car brand in our dataset, Volkswagen, has an average list price of 5,436. Audi is the most expensive at 9,380. The least expenive are Ford and Opel.

For the top 6 brands, let's use aggregation to understand the average mileage for those cars and if there's any visible link with mean price.

In [30]:
avg_milage_by_brand = {}
for b in significant_brands:
    selected_brand = autos[autos['brand'] == b] 
    milage_mean = selected_brand['odometer_km'].mean()
    avg_milage_by_brand[b] = int(milage_mean)

print(avg_milage_by_brand)
{'volkswagen': 128799, 'bmw': 132695, 'opel': 129380, 'mercedes_benz': 131025, 'audi': 129245, 'ford': 124277}

Convert both dictionaries to series objects:

In [31]:
apb_series = pd.Series(avg_price_by_brand)
print(apb_series)
amb_series = pd.Series(avg_milage_by_brand)
print(amb_series)
volkswagen       5436
bmw              8381
opel             3006
mercedes_benz    8672
audi             9380
ford             3779
dtype: int64
volkswagen       128799
bmw              132695
opel             129380
mercedes_benz    131025
audi             129245
ford             124277
dtype: int64
In [32]:
price_milage_df = pd.DataFrame(apb_series, columns=['price_mean'])
price_milage_df
Out[32]:
price_mean
volkswagen 5436
bmw 8381
opel 3006
mercedes_benz 8672
audi 9380
ford 3779
In [33]:
price_milage_df['milage_mean'] = amb_series
print(price_milage_df)
               price_mean  milage_mean
volkswagen           5436       128799
bmw                  8381       132695
opel                 3006       129380
mercedes_benz        8672       131025
audi                 9380       129245
ford                 3779       124277

The two least expensive brands have the lowest milage readings while the 3 most expensive have the most milage.

Identify data that uses German Words, translate them, and map the value to English

In [34]:
autos.head(50)
Out[34]:
date_crawled name seller offer_type price ab_test vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postalCode last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot 5000 control bus 2004 manuell 158 andere 150000 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot 8500 control limousine 1997 automatik 286 7er 150000 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot 8990 test limousine 2009 manuell 102 golf 70000 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot 4350 control kleinwagen 2007 automatik 71 fortwo 70000 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot 1350 test kombi 2003 manuell 0 focus 150000 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50
5 2016-03-21 13:47:45 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... privat Angebot 7900 test bus 2006 automatik 150 voyager 150000 4 diesel chrysler NaN 2016-03-21 00:00:00 0 22962 2016-04-06 09:45:21
6 2016-03-20 17:55:21 VW_Golf_III_GT_Special_Electronic_Green_Metall... privat Angebot 300 test limousine 1995 manuell 90 golf 150000 8 benzin volkswagen NaN 2016-03-20 00:00:00 0 31535 2016-03-23 02:48:59
7 2016-03-16 18:55:19 Golf_IV_1.9_TDI_90PS privat Angebot 1990 control limousine 1998 manuell 90 golf 150000 12 diesel volkswagen nein 2016-03-16 00:00:00 0 53474 2016-04-07 03:17:32
8 2016-03-22 16:51:34 Seat_Arosa privat Angebot 250 test NaN 2000 manuell 0 arosa 150000 10 NaN seat nein 2016-03-22 00:00:00 0 7426 2016-03-26 18:18:10
9 2016-03-16 13:47:02 Renault_Megane_Scenic_1.6e_RT_Klimaanlage privat Angebot 590 control bus 1997 manuell 90 megane 150000 7 benzin renault nein 2016-03-16 00:00:00 0 15749 2016-04-06 10:46:35
11 2016-03-16 18:45:34 Mercedes_A140_Motorschaden privat Angebot 350 control NaN 2000 NaN 0 NaN 150000 0 benzin mercedes_benz NaN 2016-03-16 00:00:00 0 17498 2016-03-16 18:45:34
12 2016-03-31 19:48:22 Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... privat Angebot 5299 control kleinwagen 2010 automatik 71 fortwo 50000 9 benzin smart nein 2016-03-31 00:00:00 0 34590 2016-04-06 14:17:52
13 2016-03-23 10:48:32 Audi_A3_1.6_tuning privat Angebot 1350 control limousine 1999 manuell 101 a3 150000 11 benzin audi nein 2016-03-23 00:00:00 0 12043 2016-04-01 14:17:13
14 2016-03-23 11:50:46 Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... privat Angebot 3999 test kleinwagen 2007 manuell 75 clio 150000 9 benzin renault NaN 2016-03-23 00:00:00 0 81737 2016-04-01 15:46:47
15 2016-04-01 12:06:20 Corvette_C3_Coupe_T_Top_Crossfire_Injection privat Angebot 18900 test coupe 1982 automatik 203 NaN 80000 6 benzin sonstige_autos nein 2016-04-01 00:00:00 0 61276 2016-04-02 21:10:48
16 2016-03-16 14:59:02 Opel_Vectra_B_Kombi privat Angebot 350 test kombi 1999 manuell 101 vectra 150000 5 benzin opel nein 2016-03-16 00:00:00 0 57299 2016-03-18 05:29:37
17 2016-03-29 11:46:22 Volkswagen_Scirocco_2_G60 privat Angebot 5500 test coupe 1990 manuell 205 scirocco 150000 6 benzin volkswagen nein 2016-03-29 00:00:00 0 74821 2016-04-05 20:46:26
18 2016-03-26 19:57:44 Verkaufen_mein_bmw_e36_320_i_touring privat Angebot 300 control bus 1995 manuell 150 3er 150000 0 benzin bmw NaN 2016-03-26 00:00:00 0 54329 2016-04-02 12:16:41
19 2016-03-17 13:36:21 mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 privat Angebot 4150 control suv 2004 manuell 124 andere 150000 2 lpg mazda nein 2016-03-17 00:00:00 0 40878 2016-03-17 14:45:58
20 2016-03-05 19:57:31 Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... privat Angebot 3500 test kombi 2003 manuell 131 a4 150000 5 diesel audi NaN 2016-03-05 00:00:00 0 53913 2016-03-07 05:46:46
21 2016-03-06 19:07:10 Porsche_911_Carrera_4S_Cabrio privat Angebot 41500 test cabrio 2004 manuell 320 911 150000 4 benzin porsche nein 2016-03-06 00:00:00 0 65428 2016-04-05 23:46:19
22 2016-03-28 20:50:54 MINI_Cooper_S_Cabrio privat Angebot 25450 control cabrio 2015 manuell 184 cooper 10000 1 benzin mini nein 2016-03-28 00:00:00 0 44789 2016-04-01 06:45:30
23 2016-03-10 19:55:34 Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima privat Angebot 7999 control bus 2010 manuell 120 NaN 150000 2 diesel peugeot nein 2016-03-10 00:00:00 0 30900 2016-03-17 08:45:17
24 2016-04-03 11:57:02 BMW_535i_xDrive_Sport_Aut. privat Angebot 48500 control limousine 2014 automatik 306 5er 30000 12 benzin bmw nein 2016-04-03 00:00:00 0 22547 2016-04-07 13:16:50
26 2016-04-03 22:46:28 Volkswagen_Polo_Fox privat Angebot 777 control kleinwagen 1992 manuell 54 polo 125000 2 benzin volkswagen nein 2016-04-03 00:00:00 0 38110 2016-04-05 23:46:48
28 2016-03-19 21:56:19 MINI_Cooper_D privat Angebot 5250 control kleinwagen 2007 manuell 110 cooper 150000 7 diesel mini ja 2016-03-19 00:00:00 0 15745 2016-04-07 14:58:48
29 2016-04-02 12:45:44 Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... privat Angebot 4999 test kombi 2004 automatik 204 e_klasse 150000 10 diesel mercedes_benz nein 2016-04-02 00:00:00 0 47638 2016-04-02 12:45:44
31 2016-03-14 16:53:09 Noch_gut_erhaltenen_C_320 privat Angebot 2850 test kombi 2002 automatik 218 c_klasse 150000 7 benzin mercedes_benz nein 2016-03-14 00:00:00 0 41065 2016-03-16 07:19:04
32 2016-03-20 05:03:03 Corsa_mit_TÜV_5.2016 privat Angebot 350 control kleinwagen 1999 manuell 0 corsa 150000 7 benzin opel NaN 2016-03-20 00:00:00 0 27619 2016-04-06 03:15:20
33 2016-03-15 12:25:26 Audi_A3_2.0_TDI_Sportback_DPF_Kupplung_defekt_... privat Angebot 4800 test kombi 2006 manuell 140 a3 150000 8 diesel audi nein 2016-03-15 00:00:00 0 61169 2016-04-07 13:15:31
34 2016-04-05 07:55:07 Bmw_530d__navi_tuv03/18 privat Angebot 2600 test kombi 2000 automatik 193 5er 150000 6 diesel bmw NaN 2016-04-05 00:00:00 0 95111 2016-04-07 12:15:47
35 2016-03-29 21:52:56 Ford_Mondeo privat Angebot 1200 test kombi 1998 manuell 0 mondeo 150000 12 benzin ford nein 2016-03-29 00:00:00 0 52428 2016-04-06 08:45:35
36 2016-03-16 23:46:31 Ford_Mondeo_2.0_Turnier_TDCi_DPF_Titanium privat Angebot 7500 test kombi 2007 manuell 131 mondeo 150000 11 diesel ford nein 2016-03-16 00:00:00 0 34246 2016-04-05 16:21:16
37 2016-03-23 16:47:46 Seat_Altea_XL_2.0_TDI_CR_DPF_Style privat Angebot 13800 test bus 2012 manuell 140 altea 90000 2 diesel seat nein 2016-03-23 00:00:00 0 85113 2016-04-05 20:17:53
38 2016-03-21 15:51:10 Volkswagen_Golf_1.4_Special privat Angebot 2850 control limousine 2002 manuell 75 golf 125000 2 benzin volkswagen nein 2016-03-21 00:00:00 0 63674 2016-03-28 12:16:06
39 2016-03-11 13:58:38 Mercedes_C_180_Kombi__zu_verkaufen privat Angebot 1450 test kombi 1999 automatik 122 c_klasse 150000 6 benzin mercedes_benz nein 2016-03-11 00:00:00 0 12683 2016-04-06 15:17:52
40 2016-03-07 14:50:03 VW_Golf__4_Cabrio_2.0_GTI_16V___Leder_MFA_Alus... privat Angebot 3500 control cabrio 1999 manuell 150 golf 150000 1 benzin volkswagen nein 2016-03-07 00:00:00 0 6780 2016-03-12 02:15:52
41 2016-03-10 10:46:08 Passat_3b_1.9_TDI_Highline__angemeldet_mit_tuv... privat Angebot 3200 test kombi 2003 manuell 131 NaN 150000 7 NaN volkswagen nein 2016-03-10 00:00:00 0 28259 2016-04-06 20:19:08
42 2016-03-22 21:37:03 Vw_Polo_l.0_/60_PS_Blue_Motion_Technologie_Son... privat Angebot 11900 control kleinwagen 2014 manuell 60 polo 20000 7 benzin volkswagen NaN 2016-03-22 00:00:00 0 26629 2016-04-06 18:44:39
43 2016-03-08 16:51:22 Audi_A1_1.2_TFSI_Attraction_mit_Garantie privat Angebot 11500 test kleinwagen 2011 manuell 86 a1 60000 9 benzin audi nein 2016-03-08 00:00:00 0 85049 2016-04-07 03:16:05
44 2016-03-12 10:38:06 Zu_verkaufen privat Angebot 900 control kleinwagen 1998 manuell 43 twingo 150000 7 benzin renault nein 2016-03-12 00:00:00 0 79725 2016-04-06 01:17:27
45 2016-03-12 17:39:55 Toyota_Starlet_1.3 privat Angebot 1000 test kleinwagen 1995 manuell 75 andere 150000 3 benzin toyota nein 2016-03-12 00:00:00 0 74597 2016-04-07 08:45:19
46 2016-03-31 10:53:28 BMW_mit__Lpg privat Angebot 9000 control NaN 2005 automatik 0 NaN 150000 0 NaN bmw NaN 2016-03-31 00:00:00 0 12351 2016-04-06 03:44:41
47 2016-03-28 19:51:51 Mercedes_Benz_A_150 privat Angebot 3499 control limousine 2008 manuell 95 a_klasse 150000 8 benzin mercedes_benz nein 2016-03-28 00:00:00 0 59557 2016-04-01 05:17:51
48 2016-03-20 12:41:50 Mercedes_C180_Sport_1_8l privat Angebot 1099 test limousine 1998 manuell 122 c_klasse 150000 2 benzin mercedes_benz nein 2016-03-20 00:00:00 0 83024 2016-03-24 03:17:53
49 2016-03-27 15:36:42 Mercedes_Benz_CL_500__BlueEFFICIENCY__7G_TRONIC privat Angebot 56000 control coupe 2012 automatik 435 cl 60000 2 benzin mercedes_benz nein 2016-03-27 00:00:00 0 34474 2016-04-05 11:47:52
50 2016-03-09 16:48:05 •??_????????_??•Top!!!_Peugeot_308_sw_Sport_•?... privat Angebot 5999 test kombi 2008 manuell 150 3_reihe 125000 8 benzin peugeot nein 2016-03-09 00:00:00 0 55569 2016-04-07 06:17:09
51 2016-03-22 12:57:38 Mercedes_Benz_S_320_CDI privat Angebot 6000 control limousine 2005 automatik 204 s_klasse 150000 7 diesel mercedes_benz ja 2016-03-22 00:00:00 0 49492 2016-04-04 00:17:51
52 2016-03-25 18:50:03 Senator_A_3.0E_Karosserie_restauriert_m._viele... privat Angebot 3500 test limousine 1985 NaN 0 andere 5000 0 benzin opel nein 2016-03-25 00:00:00 0 63500 2016-04-07 00:46:00
53 2016-04-02 17:45:01 Dacia_Sandero_Stepway_II_MPI_85__top_gepflegt privat Angebot 9000 control limousine 2012 manuell 84 sandero 30000 7 benzin dacia nein 2016-04-02 00:00:00 0 53639 2016-04-06 16:46:50

The following columns have German words: 'seller','offer_type', 'vehicle_type', 'gearbox', 'fuel_type', 'unrepaired_damage'.

Let's use the value_counts() method to find out all the words that are contained within each column:

In [35]:
autos['seller'].value_counts()
Out[35]:
privat        46349
gewerblich        1
Name: seller, dtype: int64
In [36]:
autos['offer_type'].value_counts()
Out[36]:
Angebot    46350
Name: offer_type, dtype: int64
In [37]:
autos['vehicle_type'].value_counts()
Out[37]:
limousine     12558
kleinwagen    10497
kombi          8906
bus            4023
cabrio         3008
coupe          2456
suv            1962
andere          388
Name: vehicle_type, dtype: int64
In [38]:
autos['gearbox'].value_counts()
Out[38]:
manuell      34546
automatik     9831
Name: gearbox, dtype: int64
In [39]:
autos['fuel_type'].value_counts()
Out[39]:
benzin     28380
diesel     14013
lpg          648
cng           70
hybrid        37
elektro       18
andere        14
Name: fuel_type, dtype: int64
In [40]:
autos['unrepaired_damage'].value_counts()
Out[40]:
nein    33772
ja       4482
Name: unrepaired_damage, dtype: int64

Here is the common english translation for these words in the context of vehicles:

'seller' privat = private, gewerblich = trade

'offer_type' Agnebot = asking price

'vehicle_type' limousine = sedan, kleinwagen = subcompact, kombi = sport_wagen, bus = van, cabrio = convertible, coupe = couple, suv = suv, andere = other

'gearbox' manuell = manual, automatik = automatic

'fuel_type' benzin = gas, diesel = diesel, lpg = lpg (liquified petroleum gas), cng = cng (compressed natural gas), hybrid = hybrid, elektro = electric, andere = other

'unrepaired_damage' nein = no, ja = yes

Now that we have our translations, lets replace the German words with the English words. To do this we will use the series.map method:

In [41]:
autos['seller'] = autos['seller'].map({'privat':'private', 'gewerblich':'trade'})
autos['offer_type'] = autos['offer_type'].map({'Angebot':'asking price'})
autos['vehicle_type'] = autos['vehicle_type'].map({'limousine':'sedan', 'bus':'bus', 'coupe':'coupe', 'suv':'suv','kleinwagen':'subcompact', 'kombi':'sport wagen', 'bus':'van', 'cabrio':'convertible', 'andere':'other'})
autos['gearbox'] = autos['gearbox'].map({'manuell':'manual', 'automatik':'automatic'})
autos['fuel_type'] = autos['fuel_type'].map({'benzin':'gas', 'diesel':'diesel', 'lpg':'lpg', 'cng':'cng', 'hybrid':'hybrid', 'elektro':'electric', 'andere':'other'})
autos['unrepaired_damage'] = autos['unrepaired_damage'].map({'nein':'no', 'ja':'yes'})
In [42]:
autos.head()
Out[42]:
date_crawled name seller offer_type price ab_test vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postalCode last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD private asking price 5000 control van 2004 manual 158 andere 150000 3 lpg peugeot no 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik private asking price 8500 control sedan 1997 automatic 286 7er 150000 6 gas bmw no 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United private asking price 8990 test sedan 2009 manual 102 golf 70000 7 gas volkswagen no 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... private asking price 4350 control subcompact 2007 automatic 71 fortwo 70000 6 gas smart no 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... private asking price 1350 test sport wagen 2003 manual 0 focus 150000 7 gas ford no 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50

Convert the dates to be unifor numeric data so '2016-01-01' becomes integer 20160101

In [43]:
autos['date_crawled'] = autos['date_crawled'].str.replace('-', '').str[:9].astype(int)
autos['ad_created'] = autos['ad_created'].str.replace('-', '').str[:9].astype(int)
autos['last_seen'] = autos['last_seen'].str.replace('-', '').str[:9].astype(int)
In [44]:
autos.head()
Out[44]:
date_crawled name seller offer_type price ab_test vehicle_type registration_year gearbox power_ps model odometer_km registration_month fuel_type brand unrepaired_damage ad_created nr_of_pictures postalCode last_seen
0 20160326 Peugeot_807_160_NAVTECH_ON_BOARD private asking price 5000 control van 2004 manual 158 andere 150000 3 lpg peugeot no 20160326 0 79588 20160406
1 20160404 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik private asking price 8500 control sedan 1997 automatic 286 7er 150000 6 gas bmw no 20160404 0 71034 20160406
2 20160326 Volkswagen_Golf_1.6_United private asking price 8990 test sedan 2009 manual 102 golf 70000 7 gas volkswagen no 20160326 0 35394 20160406
3 20160312 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... private asking price 4350 control subcompact 2007 automatic 71 fortwo 70000 6 gas smart no 20160312 0 33729 20160315
4 20160401 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... private asking price 1350 test sport wagen 2003 manual 0 focus 150000 7 gas ford no 20160401 0 39218 20160401

Furthur Analysis

To answer this we will filter by brand and select just the 'model' column. Then we will create a dictionary frequency table, iterate over our column, and add the counts to our dictionary.

In [45]:
import operator
vw = autos['brand'] == 'volkswagen'
vw_models = autos[vw]['model']

model_freq_vw = {}

for m in vw_models:
    if m not in model_freq_vw:
        model_freq_vw[m] = 1
    else:
        model_freq_vw[m] += 1
sorted_vw = sorted(model_freq_vw.items(), key=operator.itemgetter(1), reverse=True)
sorted_vw[0]
Out[45]:
('golf', 3684)

From here we could use this same code over and over for each brand but to speed up the process, we will make a custom function:

In [46]:
import operator  #This will let us sort our dictionary by value so that we can get the dictionary sorted by most common models
def brand_model_freq(b):
    
    brand_filter = autos['brand'] == b
    brand_models = autos[brand_filter]['model']
    
    model_freq = {}
    
    for m in brand_models:
        if m not in model_freq:
            model_freq[m] = 1
        else:
            model_freq[m] += 1
    
    sorted_brand = sorted(model_freq.items(), key=operator.itemgetter(1), reverse=True)
    
    return sorted_brand[0]  
In [47]:
brand_model_freq('volkswagen')
Out[47]:
('golf', 3684)
In [48]:
brand_model_freq('bmw')
Out[48]:
('3er', 2602)
In [49]:
brand_model_freq('opel')
Out[49]:
('corsa', 1567)
In [50]:
brand_model_freq('mercedes_benz')
Out[50]:
('c_klasse', 1135)
In [51]:
brand_model_freq('audi')
Out[51]:
('a4', 1226)
In [52]:
brand_model_freq('ford')
Out[52]:
('focus', 757)

How does milage affect price?

To answer this we will create milage ranges that make sense and then assign our entries into those respective groups.

autos['odometer_km'].value_counts()

In [53]:
group_less_than_20 = autos.loc[autos["odometer_km"]<20000,'price']
group_between_20_and_50 = autos.loc[autos["odometer_km"].between(20000,50000),'price']
group_between_50_and_80 = autos.loc[autos["odometer_km"].between(50000,80000),'price']
group_between_80_and_100 = autos.loc[autos["odometer_km"].between(80000,100000),'price']
group_between_100_and_125 = autos.loc[autos["odometer_km"].between(100000,125000),'price']
group_between_125_and_150 = autos.loc[autos["odometer_km"].between(125000,150000),'price']
group_150_and_over = autos.loc[autos["odometer_km"] == 150000,'price']  
In [54]:
group_less_than_20.mean()
Out[54]:
12635.730646871687
In [55]:
group_between_20_and_50.mean()
Out[55]:
15959.703839122487
In [56]:
group_between_50_and_80.mean()
Out[56]:
11563.292594178081
In [57]:
group_between_80_and_100.mean()
Out[57]:
8700.130110062893
In [58]:
group_between_100_and_125.mean()
Out[58]:
6817.424295263005
In [59]:
group_between_125_and_150.mean()
Out[59]:
4133.174100843724
In [60]:
group_150_and_over.mean()
Out[60]:
3792.4303924520727

In general, we see the price decreasing as milage increases which is expected.

How Much Cheaper Are Cars with Damage Than Those Without?

To answer this we will use boolean indexing to filter our dataframe by damage, then find the mean:

In [61]:
with_damage = autos[autos['unrepaired_damage'] == 'yes']
with_damage['price'].mean()
Out[61]:
2269.5798750557788
In [62]:
no_damage = autos[autos['unrepaired_damage'] == 'no']
no_damage['price'].mean()
Out[62]:
7177.137984128864

On average, a vehicle with damage is one third the price of a vehicle with no damage.