This rich data set provides a comprehensive window into Germany's used automobile market, spanning over 400,000 online sales listings from 2014-2016.
The contents of the data was in German, so some transalations and modifications (replacements of words) were made on the relevant fields.
Data Source is from Data World. However, the data was scrapped from eBay
Data was visualised using the Matplotlib library.
This data was from 2014 - 2016
Business problems intended to be analysed and identified from tyhis data were:
understanding the most used brands and their relationship with the vehicle types
undersatnding how prices affect the brands and vehicle types
understand consuer's preference for fuel type
Understand the sdvertisement trend on ebay (for cars)
identiffy the seller category with the most advertised cars on ebay.
Comparisms between high-end luxury vehicles versus their affordability
other metrics like pricing, brands, models, mileage, repairs and modifications were uncovered; revealling an intriguing demand patterns and shifts within this vast market.
A few major dimensions stand out as offering meaningful insights. These analyses revealed how the ebay platform performed and insights to help them understand conusmers' preferences.
Hopefully, this informs listings their sales team would be more on the lookout for (cosidering their increasing customer demands).
Data fields:
import pandas as pd
df = pd.read_csv(r"C:\Users\Teni\Desktop\Git-Github\Online Datasets\autos.csv")
df
# The original dataset
datecrawled | name | seller | offertype | price | abtest | vehicletype | yearofregistration | gearbox | powerps | model | kilometer | monthofregistration | fueltype | brand | notrepaireddamage | datecreated | nrofpictures | postalcode | lastseen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24T11:52:17 | Golf_3_1.6 | privat | Angebot | 480 | test | NaN | 1993 | manuell | 0 | golf | 150000 | 0 | benzin | volkswagen | NaN | 2016-03-24T00:00:00 | False | 70435 | 2016-04-07T03:16:57 |
1 | 2016-03-24T10:58:45 | A5_Sportback_2.7_Tdi | privat | Angebot | 18300 | test | coupe | 2011 | manuell | 190 | NaN | 125000 | 5 | diesel | audi | ja | 2016-03-24T00:00:00 | False | 66954 | 2016-04-07T01:46:50 |
2 | 2016-03-14T12:52:21 | Jeep_Grand_Cherokee_"Overland" | privat | Angebot | 9800 | test | suv | 2004 | automatik | 163 | grand | 125000 | 8 | diesel | jeep | NaN | 2016-03-14T00:00:00 | False | 90480 | 2016-04-05T12:47:46 |
3 | 2016-03-17T16:54:04 | GOLF_4_1_4__3TÜRER | privat | Angebot | 1500 | test | kleinwagen | 2001 | manuell | 75 | golf | 150000 | 6 | benzin | volkswagen | nein | 2016-03-17T00:00:00 | False | 91074 | 2016-03-17T17:40:17 |
4 | 2016-03-31T17:25:20 | Skoda_Fabia_1.4_TDI_PD_Classic | privat | Angebot | 3600 | test | kleinwagen | 2008 | manuell | 69 | fabia | 90000 | 7 | diesel | skoda | nein | 2016-03-31T00:00:00 | False | 60437 | 2016-04-06T10:17:21 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
371523 | 2016-03-14T17:48:27 | Suche_t4___vito_ab_6_sitze | privat | Angebot | 2200 | test | NaN | 2005 | NaN | 0 | NaN | 20000 | 1 | NaN | sonstige_autos | NaN | 2016-03-14T00:00:00 | False | 39576 | 2016-04-06T00:46:52 |
371524 | 2016-03-05T19:56:21 | Smart_smart_leistungssteigerung_100ps | privat | Angebot | 1199 | test | cabrio | 2000 | automatik | 101 | fortwo | 125000 | 3 | benzin | smart | nein | 2016-03-05T00:00:00 | False | 26135 | 2016-03-11T18:17:12 |
371525 | 2016-03-19T18:57:12 | Volkswagen_Multivan_T4_TDI_7DC_UY2 | privat | Angebot | 9200 | test | bus | 1996 | manuell | 102 | transporter | 150000 | 3 | diesel | volkswagen | nein | 2016-03-19T00:00:00 | False | 87439 | 2016-04-07T07:15:26 |
371526 | 2016-03-20T19:41:08 | VW_Golf_Kombi_1_9l_TDI | privat | Angebot | 3400 | test | kombi | 2002 | manuell | 100 | golf | 150000 | 6 | diesel | volkswagen | NaN | 2016-03-20T00:00:00 | False | 40764 | 2016-03-24T12:45:21 |
371527 | 2016-03-07T19:39:19 | BMW_M135i_vollausgestattet_NP_52.720____Euro | privat | Angebot | 28990 | control | limousine | 2013 | manuell | 320 | m_reihe | 50000 | 8 | benzin | bmw | nein | 2016-03-07T00:00:00 | False | 73326 | 2016-03-22T03:17:10 |
371528 rows × 20 columns
# To get the summary, and datatype, of the fields.
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 371528 entries, 0 to 371527 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 datecrawled 371528 non-null object 1 name 371528 non-null object 2 seller 371528 non-null object 3 offertype 371528 non-null object 4 price 371528 non-null int64 5 abtest 371528 non-null object 6 vehicletype 333659 non-null object 7 yearofregistration 371528 non-null int64 8 gearbox 351319 non-null object 9 powerps 371528 non-null int64 10 model 351044 non-null object 11 kilometer 371528 non-null int64 12 monthofregistration 371528 non-null int64 13 fueltype 338142 non-null object 14 brand 371528 non-null object 15 notrepaireddamage 299468 non-null object 16 datecreated 371528 non-null object 17 nrofpictures 371528 non-null bool 18 postalcode 371528 non-null int64 19 lastseen 371528 non-null object dtypes: bool(1), int64(6), object(13) memory usage: 54.2+ MB
To solve the business questions on:
The needed fields are: seller, price, vehicletype, gearbox, powerups, kilometer, datecreated, fueltype
Translate needed columns to the English Language
# summarizing the categorical data uder offertype. This is in German language; so the next code replaces the field values
df['offertype'].value_counts()
Angebot 371516 Gesuch 12 Name: offertype, dtype: int64
df['offertype'] = df['offertype'].replace({'Angebot': 'Offer', 'Gesuch': 'Request'})
# updated field value replacement
df['offertype'].value_counts()
Offer 371516 Request 12 Name: offertype, dtype: int64
# summarizing the categorical data uder fueltype. This is in German language; so the next code replaces the field values
df['fueltype'].value_counts()
benzin 223857 diesel 107746 lpg 5378 cng 571 hybrid 278 andere 208 elektro 104 Name: fueltype, dtype: int64
df['fueltype']= df['fueltype'].replace({'benzin':'Petrol', 'diesel':'Diesel','lpg':'Liquefied Petroleum Gas', 'cng':'Compressed Natural Gas', 'hybrid': 'Hybrid', 'andere': 'Others', 'elektro':'Electric'})
# updated field replacement
df['fueltype'].value_counts()
Petrol 223857 Diesel 107746 Liquefied Petroleum Gas 5378 Compressed Natural Gas 571 Hybrid 278 Others 208 Electric 104 Name: fueltype, dtype: int64
df['gearbox']= df['gearbox'].replace({'manuell': 'Manual', 'automatik': 'Automatic'})
# updated field replacement f the gearbox attribute of the data
df['gearbox'].value_counts()
Manual 274214 Automatic 77105 Name: gearbox, dtype: int64
df['abtest'].value_counts()
test 192585 control 178943 Name: abtest, dtype: int64
df['vehicletype'].value_counts()
limousine 95894 kleinwagen 80023 kombi 67564 bus 30201 cabrio 22898 coupe 19015 suv 14707 andere 3357 Name: vehicletype, dtype: int64
df['vehicletype']=df['vehicletype'].replace({'kleinwagen': 'small car', 'kombi': 'station wagon', 'cabrio': 'convertible', 'andere': 'other'})
df['seller'].value_counts()
privat 371525 gewerblich 3 Name: seller, dtype: int64
df['seller'] = df['seller'].replace({'privat':'private', 'gewerblich':'dealer'})
df['seller'].value_counts()
private 371525 dealer 3 Name: seller, dtype: int64
df['notrepaireddamage'].value_counts()
nein 263182 ja 36286 Name: notrepaireddamage, dtype: int64
df['notrepaireddamage'] = df['notrepaireddamage'].replace({'nein': 'No', 'ja': 'Yes'})
df['model'].head(100)
0 golf 1 NaN 2 grand 3 golf 4 fabia ... 95 a1 96 insignia 97 passat 98 3er 99 transporter Name: model, Length: 100, dtype: object
df['model']= df['model'].replace({'andere': 'other', })
df
datecrawled | name | seller | offertype | price | abtest | vehicletype | yearofregistration | gearbox | powerps | model | kilometer | monthofregistration | fueltype | brand | notrepaireddamage | datecreated | nrofpictures | postalcode | lastseen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24T11:52:17 | Golf_3_1.6 | private | Offer | 480 | test | NaN | 1993 | Manual | 0 | golf | 150000 | 0 | Petrol | volkswagen | NaN | 2016-03-24T00:00:00 | False | 70435 | 2016-04-07T03:16:57 |
1 | 2016-03-24T10:58:45 | A5_Sportback_2.7_Tdi | private | Offer | 18300 | test | coupe | 2011 | Manual | 190 | NaN | 125000 | 5 | Diesel | audi | Yes | 2016-03-24T00:00:00 | False | 66954 | 2016-04-07T01:46:50 |
2 | 2016-03-14T12:52:21 | Jeep_Grand_Cherokee_"Overland" | private | Offer | 9800 | test | suv | 2004 | Automatic | 163 | grand | 125000 | 8 | Diesel | jeep | NaN | 2016-03-14T00:00:00 | False | 90480 | 2016-04-05T12:47:46 |
3 | 2016-03-17T16:54:04 | GOLF_4_1_4__3TÜRER | private | Offer | 1500 | test | small car | 2001 | Manual | 75 | golf | 150000 | 6 | Petrol | volkswagen | No | 2016-03-17T00:00:00 | False | 91074 | 2016-03-17T17:40:17 |
4 | 2016-03-31T17:25:20 | Skoda_Fabia_1.4_TDI_PD_Classic | private | Offer | 3600 | test | small car | 2008 | Manual | 69 | fabia | 90000 | 7 | Diesel | skoda | No | 2016-03-31T00:00:00 | False | 60437 | 2016-04-06T10:17:21 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
371523 | 2016-03-14T17:48:27 | Suche_t4___vito_ab_6_sitze | private | Offer | 2200 | test | NaN | 2005 | NaN | 0 | NaN | 20000 | 1 | NaN | sonstige_autos | NaN | 2016-03-14T00:00:00 | False | 39576 | 2016-04-06T00:46:52 |
371524 | 2016-03-05T19:56:21 | Smart_smart_leistungssteigerung_100ps | private | Offer | 1199 | test | convertible | 2000 | Automatic | 101 | fortwo | 125000 | 3 | Petrol | smart | No | 2016-03-05T00:00:00 | False | 26135 | 2016-03-11T18:17:12 |
371525 | 2016-03-19T18:57:12 | Volkswagen_Multivan_T4_TDI_7DC_UY2 | private | Offer | 9200 | test | bus | 1996 | Manual | 102 | transporter | 150000 | 3 | Diesel | volkswagen | No | 2016-03-19T00:00:00 | False | 87439 | 2016-04-07T07:15:26 |
371526 | 2016-03-20T19:41:08 | VW_Golf_Kombi_1_9l_TDI | private | Offer | 3400 | test | station wagon | 2002 | Manual | 100 | golf | 150000 | 6 | Diesel | volkswagen | NaN | 2016-03-20T00:00:00 | False | 40764 | 2016-03-24T12:45:21 |
371527 | 2016-03-07T19:39:19 | BMW_M135i_vollausgestattet_NP_52.720____Euro | private | Offer | 28990 | control | limousine | 2013 | Manual | 320 | m_reihe | 50000 | 8 | Petrol | bmw | No | 2016-03-07T00:00:00 | False | 73326 | 2016-03-22T03:17:10 |
371528 rows × 20 columns
Drop irrelevant columns
Some attributes are not needed in solving the business problem, so they'd all be dropped
df['nrofpictures'].value_counts()
False 371528 Name: nrofpictures, dtype: int64
df = df.drop(['nrofpictures', 'lastseen', 'postalcode'], axis=1)
df
datecrawled | name | seller | offertype | price | abtest | vehicletype | yearofregistration | gearbox | powerps | model | kilometer | monthofregistration | fueltype | brand | notrepaireddamage | datecreated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24T11:52:17 | Golf_3_1.6 | private | Offer | 480 | test | NaN | 1993 | Manual | 0 | golf | 150000 | 0 | Petrol | volkswagen | NaN | 2016-03-24T00:00:00 |
1 | 2016-03-24T10:58:45 | A5_Sportback_2.7_Tdi | private | Offer | 18300 | test | coupe | 2011 | Manual | 190 | NaN | 125000 | 5 | Diesel | audi | Yes | 2016-03-24T00:00:00 |
2 | 2016-03-14T12:52:21 | Jeep_Grand_Cherokee_"Overland" | private | Offer | 9800 | test | suv | 2004 | Automatic | 163 | grand | 125000 | 8 | Diesel | jeep | NaN | 2016-03-14T00:00:00 |
3 | 2016-03-17T16:54:04 | GOLF_4_1_4__3TÜRER | private | Offer | 1500 | test | small car | 2001 | Manual | 75 | golf | 150000 | 6 | Petrol | volkswagen | No | 2016-03-17T00:00:00 |
4 | 2016-03-31T17:25:20 | Skoda_Fabia_1.4_TDI_PD_Classic | private | Offer | 3600 | test | small car | 2008 | Manual | 69 | fabia | 90000 | 7 | Diesel | skoda | No | 2016-03-31T00:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
371523 | 2016-03-14T17:48:27 | Suche_t4___vito_ab_6_sitze | private | Offer | 2200 | test | NaN | 2005 | NaN | 0 | NaN | 20000 | 1 | NaN | sonstige_autos | NaN | 2016-03-14T00:00:00 |
371524 | 2016-03-05T19:56:21 | Smart_smart_leistungssteigerung_100ps | private | Offer | 1199 | test | convertible | 2000 | Automatic | 101 | fortwo | 125000 | 3 | Petrol | smart | No | 2016-03-05T00:00:00 |
371525 | 2016-03-19T18:57:12 | Volkswagen_Multivan_T4_TDI_7DC_UY2 | private | Offer | 9200 | test | bus | 1996 | Manual | 102 | transporter | 150000 | 3 | Diesel | volkswagen | No | 2016-03-19T00:00:00 |
371526 | 2016-03-20T19:41:08 | VW_Golf_Kombi_1_9l_TDI | private | Offer | 3400 | test | station wagon | 2002 | Manual | 100 | golf | 150000 | 6 | Diesel | volkswagen | NaN | 2016-03-20T00:00:00 |
371527 | 2016-03-07T19:39:19 | BMW_M135i_vollausgestattet_NP_52.720____Euro | private | Offer | 28990 | control | limousine | 2013 | Manual | 320 | m_reihe | 50000 | 8 | Petrol | bmw | No | 2016-03-07T00:00:00 |
371528 rows × 17 columns
Pricing Analysis: Investigate the price distribution for different vehicle types, brands, or models to understand pricing trends and identify outliers.
Trend Analysis over Time: Analyze the trends in the number of listings or sales over different months or years to understand seasonal patterns in the automotive market.
Brand Comparison: Compare the popularity or sales of different car brands to understand which brands are more preferred in the market and how they perform in terms of pricing or demand.
Vehicle Condition and Price Correlation: Examine how the condition of a vehicle (repaired damage or not) correlates with its pricing. Determine if repaired damage affects the selling price significantly.
Fuel Type Preference: Analyze the preference for different fuel types among different vehicle types or brands. Understand if there's a shift in preference towards more eco-friendly options like hybrid or electric vehicles.
Feature Importance: Analyze the impact of specific vehicle features (like power, gearbox type, vehicle type, etc.) on the pricing to identify which features drive higher prices or are more sought after by buyers.
Pricing Analysis: Investigate the price distribution for different vehicle types, brands, or models to understand pricing trends and identify outliers.
df
datecrawled | name | seller | offertype | price | abtest | vehicletype | yearofregistration | gearbox | powerps | model | kilometer | monthofregistration | fueltype | brand | notrepaireddamage | datecreated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24T11:52:17 | Golf_3_1.6 | private | Offer | 480 | test | NaN | 1993 | Manual | 0 | golf | 150000 | 0 | Petrol | volkswagen | NaN | 2016-03-24T00:00:00 |
1 | 2016-03-24T10:58:45 | A5_Sportback_2.7_Tdi | private | Offer | 18300 | test | coupe | 2011 | Manual | 190 | NaN | 125000 | 5 | Diesel | audi | Yes | 2016-03-24T00:00:00 |
2 | 2016-03-14T12:52:21 | Jeep_Grand_Cherokee_"Overland" | private | Offer | 9800 | test | suv | 2004 | Automatic | 163 | grand | 125000 | 8 | Diesel | jeep | NaN | 2016-03-14T00:00:00 |
3 | 2016-03-17T16:54:04 | GOLF_4_1_4__3TÜRER | private | Offer | 1500 | test | small car | 2001 | Manual | 75 | golf | 150000 | 6 | Petrol | volkswagen | No | 2016-03-17T00:00:00 |
4 | 2016-03-31T17:25:20 | Skoda_Fabia_1.4_TDI_PD_Classic | private | Offer | 3600 | test | small car | 2008 | Manual | 69 | fabia | 90000 | 7 | Diesel | skoda | No | 2016-03-31T00:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
371523 | 2016-03-14T17:48:27 | Suche_t4___vito_ab_6_sitze | private | Offer | 2200 | test | NaN | 2005 | NaN | 0 | NaN | 20000 | 1 | NaN | sonstige_autos | NaN | 2016-03-14T00:00:00 |
371524 | 2016-03-05T19:56:21 | Smart_smart_leistungssteigerung_100ps | private | Offer | 1199 | test | convertible | 2000 | Automatic | 101 | fortwo | 125000 | 3 | Petrol | smart | No | 2016-03-05T00:00:00 |
371525 | 2016-03-19T18:57:12 | Volkswagen_Multivan_T4_TDI_7DC_UY2 | private | Offer | 9200 | test | bus | 1996 | Manual | 102 | transporter | 150000 | 3 | Diesel | volkswagen | No | 2016-03-19T00:00:00 |
371526 | 2016-03-20T19:41:08 | VW_Golf_Kombi_1_9l_TDI | private | Offer | 3400 | test | station wagon | 2002 | Manual | 100 | golf | 150000 | 6 | Diesel | volkswagen | NaN | 2016-03-20T00:00:00 |
371527 | 2016-03-07T19:39:19 | BMW_M135i_vollausgestattet_NP_52.720____Euro | private | Offer | 28990 | control | limousine | 2013 | Manual | 320 | m_reihe | 50000 | 8 | Petrol | bmw | No | 2016-03-07T00:00:00 |
371528 rows × 17 columns
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 371528 entries, 0 to 371527 Data columns (total 17 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 datecrawled 371528 non-null object 1 name 371528 non-null object 2 seller 371528 non-null object 3 offertype 371528 non-null object 4 price 371528 non-null int64 5 abtest 371528 non-null object 6 vehicletype 333659 non-null object 7 yearofregistration 371528 non-null int64 8 gearbox 351319 non-null object 9 powerps 371528 non-null int64 10 model 351044 non-null object 11 kilometer 371528 non-null int64 12 monthofregistration 371528 non-null int64 13 fueltype 338142 non-null object 14 brand 371528 non-null object 15 notrepaireddamage 299468 non-null object 16 datecreated 371528 non-null object dtypes: int64(5), object(12) memory usage: 48.2+ MB
df['price'].isnull()
0 False 1 False 2 False 3 False 4 False ... 371523 False 371524 False 371525 False 371526 False 371527 False Name: price, Length: 371528, dtype: bool
df = df.fillna('None')
# Since there's no Null value in Price, it's safe to fill all null values in the dataframe with the string 'None' than the integer '0'
df
datecrawled | name | seller | offertype | price | abtest | vehicletype | yearofregistration | gearbox | powerps | model | kilometer | monthofregistration | fueltype | brand | notrepaireddamage | datecreated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24T11:52:17 | Golf_3_1.6 | private | Offer | 480 | test | None | 1993 | Manual | 0 | golf | 150000 | 0 | Petrol | volkswagen | None | 2016-03-24T00:00:00 |
1 | 2016-03-24T10:58:45 | A5_Sportback_2.7_Tdi | private | Offer | 18300 | test | coupe | 2011 | Manual | 190 | None | 125000 | 5 | Diesel | audi | Yes | 2016-03-24T00:00:00 |
2 | 2016-03-14T12:52:21 | Jeep_Grand_Cherokee_"Overland" | private | Offer | 9800 | test | suv | 2004 | Automatic | 163 | grand | 125000 | 8 | Diesel | jeep | None | 2016-03-14T00:00:00 |
3 | 2016-03-17T16:54:04 | GOLF_4_1_4__3TÜRER | private | Offer | 1500 | test | small car | 2001 | Manual | 75 | golf | 150000 | 6 | Petrol | volkswagen | No | 2016-03-17T00:00:00 |
4 | 2016-03-31T17:25:20 | Skoda_Fabia_1.4_TDI_PD_Classic | private | Offer | 3600 | test | small car | 2008 | Manual | 69 | fabia | 90000 | 7 | Diesel | skoda | No | 2016-03-31T00:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
371523 | 2016-03-14T17:48:27 | Suche_t4___vito_ab_6_sitze | private | Offer | 2200 | test | None | 2005 | None | 0 | None | 20000 | 1 | None | sonstige_autos | None | 2016-03-14T00:00:00 |
371524 | 2016-03-05T19:56:21 | Smart_smart_leistungssteigerung_100ps | private | Offer | 1199 | test | convertible | 2000 | Automatic | 101 | fortwo | 125000 | 3 | Petrol | smart | No | 2016-03-05T00:00:00 |
371525 | 2016-03-19T18:57:12 | Volkswagen_Multivan_T4_TDI_7DC_UY2 | private | Offer | 9200 | test | bus | 1996 | Manual | 102 | transporter | 150000 | 3 | Diesel | volkswagen | No | 2016-03-19T00:00:00 |
371526 | 2016-03-20T19:41:08 | VW_Golf_Kombi_1_9l_TDI | private | Offer | 3400 | test | station wagon | 2002 | Manual | 100 | golf | 150000 | 6 | Diesel | volkswagen | None | 2016-03-20T00:00:00 |
371527 | 2016-03-07T19:39:19 | BMW_M135i_vollausgestattet_NP_52.720____Euro | private | Offer | 28990 | control | limousine | 2013 | Manual | 320 | m_reihe | 50000 | 8 | Petrol | bmw | No | 2016-03-07T00:00:00 |
371528 rows × 17 columns
import matplotlib.pyplot as plt
import pandas as pd
# Calculate the average price for each brand (top 10)
avg_prices = df.groupby('brand')['price'].mean().nlargest(10)
# Get the number of brands for each category (top 10)
brand_count = df['brand'].value_counts()[:10]
# Using the add_axes because I'd like to display 2 related information in one plot
fig = plt.figure(figsize=(22, 15)) # Adjust the figure size if needed
ax1 = fig.add_axes([0, 0, 1, 1])
# Plotting the Main graph for the Avg_prices for the top 10 brands
ax1.bar(avg_prices.index,avg_prices, color='blue')
# Defining the title of the main plot
ax1.set_title('Prices of the top 10 Brands advertised on the platform', fontsize=20, fontweight='bold',color='blue')
# Defining the x and y labels
ax1.set_xlabel('Brand', fontsize=20,fontweight='bold')
ax1.set_ylabel('Average Price', fontsize=20, fontweight='bold')
# # Adjusting the x and y items
ax1.tick_params(axis='x', rotation=45, labelsize=16)
ax1.tick_params(axis='y', rotation=45, labelsize=16)
# Plotting the inner graph for the count of the top 10 Brands
ax2 = fig.add_axes([0.5, 0.5, 0.25, 0.25])
ax2.bar(brand_count.index,brand_count, color='skyblue')
# Defining the title of the main plot
ax2.set_title('Top 10 advertised brands on the platform ', fontsize=20, fontweight='bold',color='skyblue')
# Defining the x and y labels
ax2.set_xlabel('Brand', fontsize=14,fontweight='bold')
ax2.set_ylabel('Count', fontsize=14, fontweight='bold')
# Adjusting the x and y items
ax2.tick_params(axis='x', rotation=45, labelsize=15)
ax2.tick_params(axis='y', labelsize=15)
# Display the plot
plt.show()
df['vehicletype'].value_counts()
limousine 95894 small car 80023 station wagon 67564 None 37869 bus 30201 convertible 22898 coupe 19015 suv 14707 other 3357 Name: vehicletype, dtype: int64
import matplotlib.pyplot as plt
import pandas as pd
# Assuming df is your DataFrame
# Filter out rows where vehicletype is 'None' or 'Other'
filtered_df = df[~df['vehicletype'].isin(['None', 'other'])]
# Updated vehicle type and price data after filtering
vehicletype = filtered_df['vehicletype'].value_counts()
price_type = filtered_df.groupby('vehicletype')['price'].mean().sort_values(ascending=False)
# Create a figure and two subplots
fig, (axes1, axes2) = plt.subplots(1, 2, figsize=(12, 5))
# First subplot - Average prices based on vehicle types
axes1.bar(price_type.index, price_type, color='purple')
axes1.set_xlabel('Vehicle Types')
axes1.set_ylabel('Prices')
axes1.set_title('Top 7 Vehices per Price', fontweight='bold')
axes1.tick_params(axis='x', rotation=45, labelsize=10) # Rotate x-axis labels for better reada6bility
# Second subplot - Count of vehicles based on types
axes2.bar(vehicletype.index, vehicletype, color='blue')
axes2.set_xlabel('Vehicle Type')
axes2.set_ylabel('Count')
axes2.set_title('Top 7 Vehicles', fontweight='bold')
axes2.tick_params(axis='x', rotation=45, labelsize=10) # Rotate x-axis labels for better readability
plt.tight_layout()
plt.show()
df
datecrawled | name | seller | offertype | price | abtest | vehicletype | yearofregistration | gearbox | powerps | model | kilometer | monthofregistration | fueltype | brand | notrepaireddamage | datecreated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24T11:52:17 | Golf_3_1.6 | private | Offer | 480 | test | None | 1993 | Manual | 0 | golf | 150000 | 0 | Petrol | volkswagen | None | 2016-03-24T00:00:00 |
1 | 2016-03-24T10:58:45 | A5_Sportback_2.7_Tdi | private | Offer | 18300 | test | coupe | 2011 | Manual | 190 | None | 125000 | 5 | Diesel | audi | Yes | 2016-03-24T00:00:00 |
2 | 2016-03-14T12:52:21 | Jeep_Grand_Cherokee_"Overland" | private | Offer | 9800 | test | suv | 2004 | Automatic | 163 | grand | 125000 | 8 | Diesel | jeep | None | 2016-03-14T00:00:00 |
3 | 2016-03-17T16:54:04 | GOLF_4_1_4__3TÜRER | private | Offer | 1500 | test | small car | 2001 | Manual | 75 | golf | 150000 | 6 | Petrol | volkswagen | No | 2016-03-17T00:00:00 |
4 | 2016-03-31T17:25:20 | Skoda_Fabia_1.4_TDI_PD_Classic | private | Offer | 3600 | test | small car | 2008 | Manual | 69 | fabia | 90000 | 7 | Diesel | skoda | No | 2016-03-31T00:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
371523 | 2016-03-14T17:48:27 | Suche_t4___vito_ab_6_sitze | private | Offer | 2200 | test | None | 2005 | None | 0 | None | 20000 | 1 | None | sonstige_autos | None | 2016-03-14T00:00:00 |
371524 | 2016-03-05T19:56:21 | Smart_smart_leistungssteigerung_100ps | private | Offer | 1199 | test | convertible | 2000 | Automatic | 101 | fortwo | 125000 | 3 | Petrol | smart | No | 2016-03-05T00:00:00 |
371525 | 2016-03-19T18:57:12 | Volkswagen_Multivan_T4_TDI_7DC_UY2 | private | Offer | 9200 | test | bus | 1996 | Manual | 102 | transporter | 150000 | 3 | Diesel | volkswagen | No | 2016-03-19T00:00:00 |
371526 | 2016-03-20T19:41:08 | VW_Golf_Kombi_1_9l_TDI | private | Offer | 3400 | test | station wagon | 2002 | Manual | 100 | golf | 150000 | 6 | Diesel | volkswagen | None | 2016-03-20T00:00:00 |
371527 | 2016-03-07T19:39:19 | BMW_M135i_vollausgestattet_NP_52.720____Euro | private | Offer | 28990 | control | limousine | 2013 | Manual | 320 | m_reihe | 50000 | 8 | Petrol | bmw | No | 2016-03-07T00:00:00 |
371528 rows × 17 columns
df['datecreated'] = pd.to_datetime(df['datecreated'])
year = df['datecreated'].dt.year
year_count = year.value_counts()
year_count
2016 371498 2015 29 2014 1 Name: datecreated, dtype: int64
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as ticker
df['datecreated'] = pd.to_datetime(df['datecreated'])
year_created = df['datecreated'].dt.year
year_count = year_created.value_counts()
fig = plt.figure()
axes1 = fig.add_axes([0, 0, 1, 1])
axes1.bar(year_count.index, year_count, color='blue')
axes1.set_xlabel('Year')
axes1.set_ylabel('Count')
axes1.set_title('Sales Year Trend', fontweight='bold')
axes1.xaxis.set_major_locator(ticker.MaxNLocator(integer=True)) # Ensure integer values on x-axis
axes2 = fig.add_axes([0.25, 0.25, 0.25, 0.25])
axes2.bar(year_count.index,year_count, color='purple')
axes2.set_xlim(2014, 2015)
axes2.set_ylim(0, 35)
axes2.set_xlabel('Year', fontsize=8)
axes2.set_ylabel('Count', fontsize=8)
axes2.set_title('Zoomed_In View for years 2014 and 2015', fontsize=10)
axes2.xaxis.set_major_locator(ticker.MaxNLocator(integer=True)) # Ensure integer values on x-axis
plt.show()
plt.tight_layout()
<Figure size 640x480 with 0 Axes>
More Ads were put out in 2016 than ever advertised. What was the cause for this?
Firstly, most of our top advertised brands released new models in 2016. As a matter of fact, they topped the list of top 25 bestmodels and car sales in year 2016
The most cars were sold in 2016, than in previous year. This was a general market trend.
A speculation could be due to the. From this report, 2016 increased by 4.51% to almost 3.4 million cars sold; the best since 2009.
This general rise were attributed to environmental bonus (which must have been promoted since 2016) but was introducted in 2020.
import matplotlib.pyplot as plt
sellers = df['seller'].value_counts()
fig = plt.figure()
ax1 = fig.add_axes([0, 0, 1, 1])
ax1.bar(sellers.index, sellers)
ax1.set_title("Sellers' Index", fontweight='bold')
ax2=fig.add_axes([0.5, 0.5, 0.25, 0.25])
ax2.bar(sellers.index, sellers)
ax2.set_ylim(0, 30)
ax2.set_title('Zoomed_In view', size=10)
Text(0.5, 1.0, 'Zoomed_In view')
Firstly, the introduction of the environmental bonus tilted to favour individual ownership of vehicles. It's safe to assume most of the private sellers took the most of this bonus by plunging fulltime in car businesses.
Other questions revealed from thsi question is;
df
datecrawled | name | seller | offertype | price | abtest | vehicletype | yearofregistration | gearbox | powerps | model | kilometer | monthofregistration | fueltype | brand | notrepaireddamage | datecreated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24T11:52:17 | Golf_3_1.6 | private | Offer | 480 | test | None | 1993 | Manual | 0 | golf | 150000 | 0 | Petrol | volkswagen | None | 2016-03-24 |
1 | 2016-03-24T10:58:45 | A5_Sportback_2.7_Tdi | private | Offer | 18300 | test | coupe | 2011 | Manual | 190 | None | 125000 | 5 | Diesel | audi | Yes | 2016-03-24 |
2 | 2016-03-14T12:52:21 | Jeep_Grand_Cherokee_"Overland" | private | Offer | 9800 | test | suv | 2004 | Automatic | 163 | grand | 125000 | 8 | Diesel | jeep | None | 2016-03-14 |
3 | 2016-03-17T16:54:04 | GOLF_4_1_4__3TÜRER | private | Offer | 1500 | test | small car | 2001 | Manual | 75 | golf | 150000 | 6 | Petrol | volkswagen | No | 2016-03-17 |
4 | 2016-03-31T17:25:20 | Skoda_Fabia_1.4_TDI_PD_Classic | private | Offer | 3600 | test | small car | 2008 | Manual | 69 | fabia | 90000 | 7 | Diesel | skoda | No | 2016-03-31 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
371523 | 2016-03-14T17:48:27 | Suche_t4___vito_ab_6_sitze | private | Offer | 2200 | test | None | 2005 | None | 0 | None | 20000 | 1 | None | sonstige_autos | None | 2016-03-14 |
371524 | 2016-03-05T19:56:21 | Smart_smart_leistungssteigerung_100ps | private | Offer | 1199 | test | convertible | 2000 | Automatic | 101 | fortwo | 125000 | 3 | Petrol | smart | No | 2016-03-05 |
371525 | 2016-03-19T18:57:12 | Volkswagen_Multivan_T4_TDI_7DC_UY2 | private | Offer | 9200 | test | bus | 1996 | Manual | 102 | transporter | 150000 | 3 | Diesel | volkswagen | No | 2016-03-19 |
371526 | 2016-03-20T19:41:08 | VW_Golf_Kombi_1_9l_TDI | private | Offer | 3400 | test | station wagon | 2002 | Manual | 100 | golf | 150000 | 6 | Diesel | volkswagen | None | 2016-03-20 |
371527 | 2016-03-07T19:39:19 | BMW_M135i_vollausgestattet_NP_52.720____Euro | private | Offer | 28990 | control | limousine | 2013 | Manual | 320 | m_reihe | 50000 | 8 | Petrol | bmw | No | 2016-03-07 |
371528 rows × 17 columns
df['brand']
0 volkswagen 1 audi 2 jeep 3 volkswagen 4 skoda ... 371523 sonstige_autos 371524 smart 371525 volkswagen 371526 volkswagen 371527 bmw Name: brand, Length: 371528, dtype: object
import matplotlib.pyplot as plt
import pandas as pd
filtered_df = df[~df['vehicletype'].isin(['None', 'other'])]
vehicle_type = filtered_df['vehicletype'].value_counts()
brand = df['brand'].value_counts().nlargest(15)
fig = plt.figure(figsize=(24, 16))
axes1 = fig.add_axes([0, 0, 1, 1])
axes1.bar(brand.index, brand, color='#50ABC7')
axes1.set_title('Brand', fontweight='bold', fontsize=20)
axes1.set_xlabel('brands')
axes1.set_ylabel('brand count')
axes1.tick_params(axis='x', rotation=45, labelsize=13)
axes2 = fig.add_axes([0.5, 0.5, 0.25, 0.25])
axes2.bar(vehicle_type.index,vehicle_type)
axes2.set_title('Types of Vehicle', fontweight='bold')
# axes2.set_xlabel('vehicle types')
axes2.set_ylabel('count')
axes2.tick_params(axis='x', rotation=45, labelsize=10)
plt.show()
Firstly, the introduction of the environmental bonus tilted to favour individual ownership of vehicles. It's safe to assume most of the private sellers took the most of this bonus by plunging fulltime in car businesses.
Other questions revealed from thsi question is;
df
datecrawled | name | seller | offertype | price | abtest | vehicletype | yearofregistration | gearbox | powerps | model | kilometer | monthofregistration | fueltype | brand | notrepaireddamage | datecreated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24T11:52:17 | Golf_3_1.6 | private | Offer | 480 | test | None | 1993 | Manual | 0 | golf | 150000 | 0 | Petrol | volkswagen | None | 2016-03-24 |
1 | 2016-03-24T10:58:45 | A5_Sportback_2.7_Tdi | private | Offer | 18300 | test | coupe | 2011 | Manual | 190 | None | 125000 | 5 | Diesel | audi | Yes | 2016-03-24 |
2 | 2016-03-14T12:52:21 | Jeep_Grand_Cherokee_"Overland" | private | Offer | 9800 | test | suv | 2004 | Automatic | 163 | grand | 125000 | 8 | Diesel | jeep | None | 2016-03-14 |
3 | 2016-03-17T16:54:04 | GOLF_4_1_4__3TÜRER | private | Offer | 1500 | test | small car | 2001 | Manual | 75 | golf | 150000 | 6 | Petrol | volkswagen | No | 2016-03-17 |
4 | 2016-03-31T17:25:20 | Skoda_Fabia_1.4_TDI_PD_Classic | private | Offer | 3600 | test | small car | 2008 | Manual | 69 | fabia | 90000 | 7 | Diesel | skoda | No | 2016-03-31 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
371523 | 2016-03-14T17:48:27 | Suche_t4___vito_ab_6_sitze | private | Offer | 2200 | test | None | 2005 | None | 0 | None | 20000 | 1 | None | sonstige_autos | None | 2016-03-14 |
371524 | 2016-03-05T19:56:21 | Smart_smart_leistungssteigerung_100ps | private | Offer | 1199 | test | convertible | 2000 | Automatic | 101 | fortwo | 125000 | 3 | Petrol | smart | No | 2016-03-05 |
371525 | 2016-03-19T18:57:12 | Volkswagen_Multivan_T4_TDI_7DC_UY2 | private | Offer | 9200 | test | bus | 1996 | Manual | 102 | transporter | 150000 | 3 | Diesel | volkswagen | No | 2016-03-19 |
371526 | 2016-03-20T19:41:08 | VW_Golf_Kombi_1_9l_TDI | private | Offer | 3400 | test | station wagon | 2002 | Manual | 100 | golf | 150000 | 6 | Diesel | volkswagen | None | 2016-03-20 |
371527 | 2016-03-07T19:39:19 | BMW_M135i_vollausgestattet_NP_52.720____Euro | private | Offer | 28990 | control | limousine | 2013 | Manual | 320 | m_reihe | 50000 | 8 | Petrol | bmw | No | 2016-03-07 |
371528 rows × 17 columns
df['notrepaireddamage'] = df['notrepaireddamage'].replace({'Yes': 'repaired', 'No': 'never_repaired'})
filtered_damages = df['notrepaireddamage'][~df['notrepaireddamage'].isin(['None'])].value_counts()
filtered_damages
never_repaired 263182 repaired 36286 Name: notrepaireddamage, dtype: int64
gearbox_counts = df[df['gearbox'] !='None']['gearbox'].value_counts()
filtered_df = df[df['gearbox'] != 'None']
gear_box = round(filtered_df.groupby('gearbox')['kilometer'].mean(), 2)
gear_box
gearbox Automatic 122592.96 Manual 127043.68 Name: kilometer, dtype: float64
import matplotlib.pyplot as plt
import pandas as pd
filtered_damages = df['notrepaireddamage'][df['notrepaireddamage'].isin(['Yes', 'No'])].value_counts()
filtered_damages = df[~df['notrepaireddamage'].isin(['None'])]['notrepaireddamage'].value_counts()
prices = df['price'].value_counts()
gearbox_counts = df[df['gearbox'] !='None']['gearbox'].value_counts()
filtered_df = df[df['gearbox'] != 'None']
gear_box = round(filtered_df.groupby('gearbox')['kilometer'].mean(), 2)
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(38, 15))
ax1.bar(gearbox_counts.index, gearbox_counts, label='Gearbox')
ax1.set_xlabel('Gear_Box', size=22, weight='bold')
ax1.set_ylabel('Count', size=22, weight='bold')
ax1.tick_params(axis='x', labelsize=20)
ax1.tick_params(axis='y', labelsize=20)
ax2.bar(filtered_damages.index, filtered_damages, label='Repair Status', color='skyblue')
ax2.set_xlabel('Repair Status', size=22, weight='bold')
ax2.set_ylabel('Count', size=22, weight='bold')
ax2.tick_params(axis='x', labelsize=20)
ax2.tick_params(axis='y', labelsize=20)
ax3.bar(gear_box.index, gear_box, label='Avg kilomenter per Gear_Type')
ax3.set_xlabel('Gear_Type', size=22, weight='bold')
ax3.set_ylabel('Kilometer', size=22, weight='bold')
ax3.tick_params(axis='x', labelsize=20)
ax3.tick_params(axis='y', labelsize=20)
ax1.legend(fontsize=26)
ax2.legend(loc='center right', fontsize=26)
ax3.legend(loc='lower left', fontsize=26)
# plt.tight_layout()
plt.show()
df
datecrawled | name | seller | offertype | price | abtest | vehicletype | yearofregistration | gearbox | powerps | model | kilometer | monthofregistration | fueltype | brand | notrepaireddamage | datecreated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24T11:52:17 | Golf_3_1.6 | private | Offer | 480 | test | None | 1993 | Manual | 0 | golf | 150000 | 0 | Petrol | volkswagen | None | 2016-03-24 |
1 | 2016-03-24T10:58:45 | A5_Sportback_2.7_Tdi | private | Offer | 18300 | test | coupe | 2011 | Manual | 190 | None | 125000 | 5 | Diesel | audi | repaired | 2016-03-24 |
2 | 2016-03-14T12:52:21 | Jeep_Grand_Cherokee_"Overland" | private | Offer | 9800 | test | suv | 2004 | Automatic | 163 | grand | 125000 | 8 | Diesel | jeep | None | 2016-03-14 |
3 | 2016-03-17T16:54:04 | GOLF_4_1_4__3TÜRER | private | Offer | 1500 | test | small car | 2001 | Manual | 75 | golf | 150000 | 6 | Petrol | volkswagen | never_repaired | 2016-03-17 |
4 | 2016-03-31T17:25:20 | Skoda_Fabia_1.4_TDI_PD_Classic | private | Offer | 3600 | test | small car | 2008 | Manual | 69 | fabia | 90000 | 7 | Diesel | skoda | never_repaired | 2016-03-31 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
371523 | 2016-03-14T17:48:27 | Suche_t4___vito_ab_6_sitze | private | Offer | 2200 | test | None | 2005 | None | 0 | None | 20000 | 1 | None | sonstige_autos | None | 2016-03-14 |
371524 | 2016-03-05T19:56:21 | Smart_smart_leistungssteigerung_100ps | private | Offer | 1199 | test | convertible | 2000 | Automatic | 101 | fortwo | 125000 | 3 | Petrol | smart | never_repaired | 2016-03-05 |
371525 | 2016-03-19T18:57:12 | Volkswagen_Multivan_T4_TDI_7DC_UY2 | private | Offer | 9200 | test | bus | 1996 | Manual | 102 | transporter | 150000 | 3 | Diesel | volkswagen | never_repaired | 2016-03-19 |
371526 | 2016-03-20T19:41:08 | VW_Golf_Kombi_1_9l_TDI | private | Offer | 3400 | test | station wagon | 2002 | Manual | 100 | golf | 150000 | 6 | Diesel | volkswagen | None | 2016-03-20 |
371527 | 2016-03-07T19:39:19 | BMW_M135i_vollausgestattet_NP_52.720____Euro | private | Offer | 28990 | control | limousine | 2013 | Manual | 320 | m_reihe | 50000 | 8 | Petrol | bmw | never_repaired | 2016-03-07 |
371528 rows × 17 columns
import matplotlib.pyplot as plt
fuel_type = df['fueltype'][~df['fueltype'].isin(['None', 'Others'])].value_counts()
plt.figure(figsize=(12, 8))
plt.bar(fuel_type.index, fuel_type, label='Fuel Type Preference', color='#C79F50')
plt.tick_params(axis='x', rotation = 45, labelsize=8)
plt.tick_params(axis='y', labelsize=8)
plt.legend();
df
datecrawled | name | seller | offertype | price | abtest | vehicletype | yearofregistration | gearbox | powerps | model | kilometer | monthofregistration | fueltype | brand | notrepaireddamage | datecreated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24T11:52:17 | Golf_3_1.6 | private | Offer | 480 | test | None | 1993 | Manual | 0 | golf | 150000 | 0 | Petrol | volkswagen | None | 2016-03-24 |
1 | 2016-03-24T10:58:45 | A5_Sportback_2.7_Tdi | private | Offer | 18300 | test | coupe | 2011 | Manual | 190 | None | 125000 | 5 | Diesel | audi | repaired | 2016-03-24 |
2 | 2016-03-14T12:52:21 | Jeep_Grand_Cherokee_"Overland" | private | Offer | 9800 | test | suv | 2004 | Automatic | 163 | grand | 125000 | 8 | Diesel | jeep | None | 2016-03-14 |
3 | 2016-03-17T16:54:04 | GOLF_4_1_4__3TÜRER | private | Offer | 1500 | test | small car | 2001 | Manual | 75 | golf | 150000 | 6 | Petrol | volkswagen | never_repaired | 2016-03-17 |
4 | 2016-03-31T17:25:20 | Skoda_Fabia_1.4_TDI_PD_Classic | private | Offer | 3600 | test | small car | 2008 | Manual | 69 | fabia | 90000 | 7 | Diesel | skoda | never_repaired | 2016-03-31 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
371523 | 2016-03-14T17:48:27 | Suche_t4___vito_ab_6_sitze | private | Offer | 2200 | test | None | 2005 | None | 0 | None | 20000 | 1 | None | sonstige_autos | None | 2016-03-14 |
371524 | 2016-03-05T19:56:21 | Smart_smart_leistungssteigerung_100ps | private | Offer | 1199 | test | convertible | 2000 | Automatic | 101 | fortwo | 125000 | 3 | Petrol | smart | never_repaired | 2016-03-05 |
371525 | 2016-03-19T18:57:12 | Volkswagen_Multivan_T4_TDI_7DC_UY2 | private | Offer | 9200 | test | bus | 1996 | Manual | 102 | transporter | 150000 | 3 | Diesel | volkswagen | never_repaired | 2016-03-19 |
371526 | 2016-03-20T19:41:08 | VW_Golf_Kombi_1_9l_TDI | private | Offer | 3400 | test | station wagon | 2002 | Manual | 100 | golf | 150000 | 6 | Diesel | volkswagen | None | 2016-03-20 |
371527 | 2016-03-07T19:39:19 | BMW_M135i_vollausgestattet_NP_52.720____Euro | private | Offer | 28990 | control | limousine | 2013 | Manual | 320 | m_reihe | 50000 | 8 | Petrol | bmw | never_repaired | 2016-03-07 |
371528 rows × 17 columns
import matplotlib.pyplot as plt
powerps = df['powerps'].value_counts()
# plt.bar(powerps.index, powerps)
powerps
# power, gearbox type, vehicle type,
0 40820 75 24035 60 15907 150 15442 140 13585 ... 1339 1 780 1 6920 1 1659 1 564 1 Name: powerps, Length: 794, dtype: int64
filtered_gearbox = df[~df['gearbox'].isin(['None'])]
avg_power_gear = round(filtered_gearbox.groupby('gearbox')['powerps'].mean(), 2)
filtered_vehicles = df[df['vehicletype']!='None']
avg_power_veh = round(filtered_vehicles.groupby('vehicletype')['powerps'].mean())
avg_power_veh
vehicletype bus 114.0 convertible 145.0 coupe 173.0 limousine 132.0 other 102.0 small car 69.0 station wagon 136.0 suv 166.0 Name: powerps, dtype: float64
import matplotlib.pyplot as plt
filtered_gearbox = df[~df['gearbox'].isin(['None'])]
avg_power_gear = round(filtered_gearbox.groupby('gearbox')['powerps'].mean(), 2)
filtered_vehicles = df[df['vehicletype']!='None']
avg_power_veh = round(filtered_vehicles.groupby('vehicletype')['powerps'].mean())
fig, (axis1, axis2) = plt.subplots(1, 2, figsize=(20, 14), gridspec_kw={'width_ratios': [1, 2]})
axis1.bar(avg_power_gear.index, avg_power_gear)
axis1.set_xlabel('Gearbox', fontweight='bold')
axis1.set_ylabel('Powerps', fontweight='bold')
axis1.set_title('Avg Powerups per GearBox', fontweight='bold', fontsize=10)
# Axis2
axis2.bar(avg_power_veh.index, avg_power_veh)
axis2.set_title('Avg Powerups per Vehicle Type', fontweight='bold', fontsize=10)
plt.show()
Since majority of the cars sold on the platform are manual cars, this would help us understand customer preference