- In this project, we analyse data for companies that build Android and iOS mobile apps.
- The aim of this project is to help our developers understand what types of apps are likely to attract more users in each market.
# iOS mobile apps
from csv import reader

opened_file = open("AppleStore.csv", encoding="utf8")  # app names contain non-ASCII characters
reader_file = reader(opened_file)
apps_data = list(reader_file)
opened_file.close()
print(apps_data[:5])  # print the first few rows
# Android mobile apps
from csv import reader

opened_file = open("googleplaystore.csv", encoding="utf8")
reader_file = reader(opened_file)
android = list(reader_file)
opened_file.close()
print(android[:5])  # print the first few rows
[['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'],
 ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'],
 ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'],
 ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'],
 ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']]

[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'],
 ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'],
 ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
 ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'],
 ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]
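The two loading cells above repeat the same steps, so they could be wrapped in one helper. The sketch below is only an illustration: `load_csv` is a hypothetical name, UTF-8 encoding is an assumption about the files, and the demo writes a throwaway file so that it runs without the real data sets.

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf8"):
    """Read a CSV file into a list of rows; the file is closed automatically."""
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Demonstration on a temporary file (made-up rows, not the real data)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", encoding="utf8", newline="") as f:
        csv.writer(f).writerows([["App", "Price"], ["Docs To Go™", "0"]])
    demo_rows = load_csv(demo_path)

print(demo_rows)
```

With the real files this would be called as `load_csv("AppleStore.csv")` and `load_csv("googleplaystore.csv")`.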
# Explore the data: display the number of rows and columns of a dataset
def explore_data(dataset):
    print("\nnumber of rows ", len(dataset))
    print("number of columns ", len(dataset[0]))
    print("\n")
    print(dataset[0])

explore_data(apps_data)
explore_data(android)
number of rows 7198
number of columns 16

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

number of rows 10842
number of columns 13

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
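In the Google Play data set there is a row with incorrect data: it has only 12 values instead of 13 (the category is missing), so every later value is shifted one column to the left and the Rating column reads 19, although a Google Play rating cannot exceed 5. The row is displayed in the cell below.
print(android[10473])  # row with incorrect data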
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
# A Google Play rating has a maximum of 5, not 19 as shown for the app above.
del android[10473]  # remove the incorrect row (run this cell only once)
print(len(android))
10841
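The incorrect row was located by its index, but rows like it can also be found by comparing each row's length with the header's. The sketch below is a minimal illustration on made-up rows; `find_malformed_rows` is a hypothetical helper, not part of the original notebook.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Tiny illustrative dataset (made-up rows, not the real file)
sample = [
    ["App", "Category", "Rating"],
    ["Good App", "TOOLS", "4.2"],
    ["Shifted App", "19"],  # one value is missing, so the row is too short
]

bad_rows = find_malformed_rows(sample)
print(bad_rows)
```

Run on `android` before the deletion, a check of this kind should flag the row at index 10473, since it has 12 values against a 13-column header.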
- The Google Play data contains duplicate apps. For example, the code below shows that Instagram has been entered more than once. We therefore have to remove the extra rows so that each app appears only once.
for apps in android:
    duplicate = apps[0]
    if duplicate == "Instagram":
        print(apps)
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
- In total, there are 1,181 cases where an app occurs more than once, as counted in the cell below.
# Count the duplicate apps
duplicate_apps = []
unique_apps = []
for apps in android:
    duplicate = apps[0]
    if duplicate in unique_apps:
        duplicate_apps.append(duplicate)
    else:
        unique_apps.append(duplicate)
print("Number of duplicate apps: ", len(duplicate_apps))
Number of duplicate apps: 1181
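The count above tests membership in a list, which gets slower as the list grows. A possible variant, sketched here on made-up rows, tracks the names already seen in a set instead; `count_duplicates` and its `name_index` parameter are hypothetical names introduced for illustration.

```python
def count_duplicates(dataset, name_index=0):
    """Return (duplicates, seen): repeated names and the set of distinct names."""
    seen = set()
    duplicates = []
    for row in dataset:
        name = row[name_index]
        if name in seen:  # set lookup does not scan every earlier name
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates, seen


# Made-up rows for illustration
sample = [["Instagram", "1"], ["Gmail", "2"], ["Instagram", "3"], ["Instagram", "4"]]
dups, uniques = count_duplicates(sample)
print("Number of duplicate apps: ", len(dups))
print("Number of unique apps: ", len(uniques))
```

The logic is the same as in the cell above; only the container used for the membership test changes.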
- We won't remove the duplicate rows at random. If you check the Instagram rows printed above (In[6]), you'll notice that the only difference is in the fourth position (Reviews), which shows that the data was collected at different times.
- We can therefore use this to build a criterion for keeping rows: create a dictionary where each key is a unique app name and the value is the highest number of reviews recorded for that app.
reviews_max = {}
for app in android[1:]:
    name = app[0]
    n_reviews = float(app[3])
    # keep the highest review count seen so far for each app
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print(len(reviews_max))
9659
- To confirm the expected length, subtract the number of duplicates from the length of android (excluding the header row), as demonstrated in the cell below:
print(len(android[1:])-len(duplicate_apps))
9659
# Use the dictionary to remove the duplicate rows
android_clean = []
already_added = []
for apps in android[1:]:
    name = apps[0]
    n_reviews = float(apps[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(apps)
        already_added.append(name)
print(len(android_clean))
9659
- In the above cell, we check whether n_reviews equals the maximum number of reviews stored for that app in our dictionary (reviews_max) and, at the same time, that the app's name has not already been added. If both conditions hold, we append the entire row to our empty list (android_clean) and append the name to the already_added list. The second check matters because two duplicate rows can share the same review count, and without it both would be kept.
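The two steps (build the dictionary, then filter) could also be combined into one pass that stores the best row per app directly. This is only a sketch on made-up rows, assuming Python 3.7+ so that dictionaries keep insertion order; `keep_most_reviewed` is a hypothetical helper.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # a strict '>' keeps the first row when two rows tie on reviews
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up rows: name, category, rating, reviews
sample = [
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["Gmail", "COMMUNICATION", "4.3", "100"],
    ["Instagram", "SOCIAL", "4.5", "66577446"],
]
clean = keep_most_reviewed(sample)
print(len(clean))
print(clean)
```

Because the dictionary has one entry per name, ties are handled without a separate already_added list.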
In both the App Store and Google Play data sets, some apps have non-English names:
# Check whether an app name is English
def english(string):
    for letter in string:
        if ord(letter) > 127:  # too strict: a single non-ASCII character fails
            return False
    return True

print(english('Instagram'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))
True
False
False
False
- If you check the cell above, you'll find that the function fails to identify certain English app names, such as 'Docs To Go™ Free Office Suite' and 'Instachat 😜', because emojis and characters like ™ fall outside the ASCII range and have code points above 127.
- If we used this function, we would lose some English apps. To avoid this, we create a new function that treats a name as non-English only if it contains more than three characters outside the ASCII range (emojis, ™ and so on); otherwise the name is treated as English.
# Final check for English and non-English app names
def english_1(string):
    non = 0
    for character in string:
        if ord(character) > 127:
            non += 1  # count the characters outside the ASCII range
    if non > 3:
        return False
    return True

print(english_1('Instagram'))
print(english_1('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_1('Docs To Go™ Free Office Suite'))
print(english_1('Instachat 😜'))
True
False
True
True
- In the cell below, we remove the apps with non-English names from both data sets.
android_english = []
ios_english = []
for apps in android_clean:
    app = apps[0]
    if english_1(app):
        android_english.append(apps)
# apps_data still contains its header row, so ios_english[0] is the header
for apps in apps_data:
    app = apps[1]
    if english_1(app):
        ios_english.append(apps)
print("android english data point: ", len(android_english))
print("ios english data point: ", len(ios_english))
print(android_english[0])
print(ios_english[0])
android english data point: 9614
ios english data point: 6184
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
- So far, we have removed inaccurate data, duplicate entries and non-English apps. Our last step, in the cell below, is to isolate the free apps in both data sets.
android_final = []
ios_final = []
for apps in android_english:
    app = apps[7]  # price is at index 7
    if app == '0':
        android_final.append(apps)
for apps in ios_english[1:]:  # skip the header row
    app = apps[4]  # price is at index 4
    if app == '0.0':
        ios_final.append(apps)
print("android length: ", len(android_final))
print("ios length: ", len(ios_final))
android length: 8862
ios length: 3222
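The filter above compares price strings exactly ('0' for Google Play, '0.0' for the App Store), so it depends on how each file writes a zero price. A more tolerant check could compare the numeric value instead. The sketch below is an assumption-laden illustration: `is_free` is a hypothetical helper, and the leading "$" on paid prices is assumed, since no paid price appears in the output shown here.

```python
def is_free(price_text):
    """Return True when a price string represents zero.

    Assumes the price is a plain number, optionally prefixed with '$'.
    An empty or non-numeric string raises ValueError.
    """
    cleaned = price_text.strip().lstrip("$")
    return float(cleaned) == 0.0


for price in ["0", "0.0", "$4.99", "2.99"]:
    print(price, "->", is_free(price))
```

With a helper like this, both loops could use the same condition regardless of which column format the store uses.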
Since our revenue is influenced by the number of people using our apps, we need to find app profiles (we will use genre for both data sets) that are successful in both markets.
Our validation strategy for an app idea comprises three steps:
1. Build a minimal Android version of the app and add it to Google Play.
2. If the app gets a good response from users, develop it further.
3. If the app is profitable after six months, also build an iOS version of the app and add it to the App Store.
print(android[0])
print(apps_data[0])
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
# This function generates frequency tables that show percentages
def freq_table(dataset, index):
    table = {}
    for apps in dataset:
        app = apps[index]
        if app in table:
            table[app] += 1
        else:
            table[app] = 1
    table_percentage = {}  # convert the counts to percentages
    for key in table:
        percentage = (table[key] / len(dataset)) * 100
        table_percentage[key] = percentage
    return table_percentage

def display_table(dataset, index):  # print the table in descending order
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

display_table(ios_final, 11)  # prime_genre for the App Store
print("\n")
display_table(android_final, 9)  # Genres for Google Play apps
print("\n")
display_table(android_final, 1)  # Category for Google Play apps
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665

Tools : 8.429248476641842
Entertainment : 6.070864364703228
Education : 5.348679756262695
Business : 4.581358609794629
Productivity : 3.8930264048747465
Lifestyle : 3.8930264048747465
Finance : 3.7011961182577298
Medical : 3.5319341006544795
Sports : 3.4642292936131795
Personalization : 3.3175355450236967
Communication : 3.2498307379823967
Action : 3.1031369893929135
Health & Fitness : 3.0692845858722633
Photography : 2.945159106296547
News & Magazines : 2.798465357707064
Social : 2.663055743624464
Travel & Local : 2.324531708417964
Shopping : 2.2455427668697814
Books & Reference : 2.143985556307831
Simulation : 2.0424283457458814
Dating : 1.8618821936357481
Arcade : 1.8618821936357481
Video Players & Editors : 1.782893252087565
Casual : 1.7490408485669149
Maps & Navigation : 1.399232678853532
Food & Drink : 1.2412547957571656
Puzzle : 1.128413450688332
Racing : 0.9930038366057323
Role Playing : 0.9365831640713158
Libraries & Demo : 0.9365831640713158
Auto & Vehicles : 0.9252990295644324
Strategy : 0.9140148950575491
House & Home : 0.8350259535093659
Weather : 0.8011735499887158
Events : 0.7109004739336493
Adventure : 0.6770480704129994
Comics : 0.6093432633716994
Beauty : 0.598059128864816
Art & Design : 0.598059128864816
Parenting : 0.49650191830286616
Card : 0.4400812457684496
Casino : 0.4287971112615662
Trivia : 0.41751297675468296
Educational;Education : 0.3949447077409162
Educational : 0.3723764387271496
Board : 0.3723764387271496
Education;Education : 0.3385240352064997
Word : 0.2595350936583164
Casual;Pretend Play : 0.23696682464454977
Music : 0.2031144211238998
Racing;Action & Adventure : 0.16926201760324985
Puzzle;Brain Games : 0.16926201760324985
Entertainment;Music & Video : 0.16926201760324985
Casual;Brain Games : 0.13540961408259986
Casual;Action & Adventure : 0.13540961408259986
Arcade;Action & Adventure : 0.12412547957571654
Action;Action & Adventure : 0.1015572105619499
Educational;Pretend Play : 0.09027307605506657
Board;Brain Games : 0.09027307605506657
Simulation;Action & Adventure : 0.07898894154818326
Parenting;Education : 0.07898894154818326
Entertainment;Brain Games : 0.07898894154818326
Parenting;Music & Video : 0.06770480704129993
Educational;Brain Games : 0.06770480704129993
Casual;Creativity : 0.06770480704129993
Art & Design;Creativity : 0.06770480704129993
Education;Pretend Play : 0.056420672534416606
Role Playing;Pretend Play : 0.045136538027533285
Education;Creativity : 0.045136538027533285
Role Playing;Action & Adventure : 0.033852403520649964
Puzzle;Action & Adventure : 0.033852403520649964
Entertainment;Creativity : 0.033852403520649964
Entertainment;Action & Adventure : 0.033852403520649964
Educational;Creativity : 0.033852403520649964
Educational;Action & Adventure : 0.033852403520649964
Education;Music & Video : 0.033852403520649964
Education;Brain Games : 0.033852403520649964
Education;Action & Adventure : 0.033852403520649964
Adventure;Action & Adventure : 0.033852403520649964
Video Players & Editors;Music & Video : 0.022568269013766643
Sports;Action & Adventure : 0.022568269013766643
Simulation;Pretend Play : 0.022568269013766643
Puzzle;Creativity : 0.022568269013766643
Music;Music & Video : 0.022568269013766643
Entertainment;Pretend Play : 0.022568269013766643
Casual;Education : 0.022568269013766643
Board;Action & Adventure : 0.022568269013766643
Trivia;Education : 0.011284134506883321
Travel & Local;Action & Adventure : 0.011284134506883321
Tools;Education : 0.011284134506883321
Strategy;Education : 0.011284134506883321
Strategy;Creativity : 0.011284134506883321
Strategy;Action & Adventure : 0.011284134506883321
Simulation;Education : 0.011284134506883321
Role Playing;Brain Games : 0.011284134506883321
Racing;Pretend Play : 0.011284134506883321
Puzzle;Education : 0.011284134506883321
Parenting;Brain Games : 0.011284134506883321
Music & Audio;Music & Video : 0.011284134506883321
Lifestyle;Pretend Play : 0.011284134506883321
Lifestyle;Education : 0.011284134506883321
Health & Fitness;Education : 0.011284134506883321
Health & Fitness;Action & Adventure : 0.011284134506883321
Entertainment;Education : 0.011284134506883321
Communication;Creativity : 0.011284134506883321
Comics;Creativity : 0.011284134506883321
Casual;Music & Video : 0.011284134506883321
Card;Brain Games : 0.011284134506883321
Card;Action & Adventure : 0.011284134506883321
Books & Reference;Education : 0.011284134506883321
Art & Design;Pretend Play : 0.011284134506883321
Art & Design;Action & Adventure : 0.011284134506883321
Arcade;Pretend Play : 0.011284134506883321
Adventure;Education : 0.011284134506883321

FAMILY : 18.788083953960733
GAME : 9.636650868878357
TOOLS : 8.440532611148726
BUSINESS : 4.581358609794629
LIFESTYLE : 3.9043105393816293
PRODUCTIVITY : 3.8930264048747465
FINANCE : 3.7011961182577298
MEDICAL : 3.5319341006544795
SPORTS : 3.419092755585647
PERSONALIZATION : 3.3175355450236967
COMMUNICATION : 3.2498307379823967
HEALTH_AND_FITNESS : 3.0692845858722633
PHOTOGRAPHY : 2.945159106296547
NEWS_AND_MAGAZINES : 2.798465357707064
SOCIAL : 2.663055743624464
TRAVEL_AND_LOCAL : 2.335815842924848
SHOPPING : 2.2455427668697814
BOOKS_AND_REFERENCE : 2.143985556307831
DATING : 1.8618821936357481
VIDEO_PLAYERS : 1.782893252087565
MAPS_AND_NAVIGATION : 1.399232678853532
EDUCATION : 1.2525389302640486
FOOD_AND_DRINK : 1.2412547957571656
ENTERTAINMENT : 1.0381403746332656
LIBRARIES_AND_DEMO : 0.9365831640713158
AUTO_AND_VEHICLES : 0.9252990295644324
HOUSE_AND_HOME : 0.8350259535093659
WEATHER : 0.8011735499887158
EVENTS : 0.7109004739336493
ART_AND_DESIGN : 0.6770480704129994
PARENTING : 0.6544798013992327
COMICS : 0.6206273978785828
BEAUTY : 0.598059128864816
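The frequency tables above are built with hand-written counting and sorting. As a possible alternative, the standard library's `collections.Counter` does the counting and `most_common()` does the descending sort. This is a sketch on made-up rows; `freq_table_sorted` is a hypothetical name, and unlike display_table it orders tied values by first appearance rather than by name.

```python
from collections import Counter


def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Made-up rows: app name, genre
sample = [["a", "Games"], ["b", "Games"], ["c", "Music"], ["d", "Games"]]
table = freq_table_sorted(sample, 1)
for genre, percentage in table:
    print(genre, ":", percentage)
```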
print(ios_final[:8])
print("\n")
print(android_final[:8])
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'],
 ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'],
 ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'],
 ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'],
 ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1'],
 ['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1'],
 ['282935706', 'Bible', '92774400', 'USD', '0.0', '985920', '5320', '4.5', '5.0', '7.5.1', '4+', 'Reference', '37', '5', '45', '1'],
 ['553834731', 'Candy Crush Saga', '222846976', 'USD', '0.0', '961794', '2453', '4.5', '4.5', '1.101.0', '4+', 'Games', '43', '5', '24', '1']]

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'],
 ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
 ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'],
 ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'],
 ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'],
 ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up'],
 ['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'April 26, 2018', '1.1', '4.0.3 and up'],
 ['Infinite Painter', 'ART_AND_DESIGN', '4.1', '36815', '29M', '1,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'June 14, 2018', '6.1.61.1', '4.2 and up']]
In the App Store (prime_genre), Games is clearly the most common genre, with about 58% of the free English apps.
Catalogs is the least common genre in the App Store.
Most apps in the App Store are designed for entertainment (games, photo and video, sports) rather than for practical purposes (education, shopping, utilities, productivity).
In Google Play, the most common genre is Tools.
There is a slight difference between the Category and Genres columns: Genres is more granular and has many more distinct values than Category.
In the App Store, more apps are for entertainment (games, sports), whereas Google Play has more practical apps (tools, education).
In the cells below, we want to find out which genres are the most popular, i.e. have the most users.
To do this, we would ideally calculate the average number of installs for each app genre.
Since the App Store data set has no installs column, we use the total number of user ratings (rating_count_tot) as a proxy and go through the following steps:
1. Isolate the apps of each genre.
2. Sum the number of user ratings for the apps of that genre.
3. Divide the sum by the number of apps belonging to that genre (not by the total number of apps).
unique_genre = freq_table(ios_final, 11)
for genre in unique_genre:
    total = 0
    len_genre = 0
    for apps in ios_final:
        genre_app = apps[11]
        if genre_app == genre:
            user_rating = float(apps[5])  # rating_count_tot: number of user ratings
            total += user_rating
            len_genre += 1
    aver_user_rating = total / len_genre
    print(genre, " : ", aver_user_rating)
Shopping : 26919.690476190477
Sports : 23008.898550724636
Food & Drink : 33333.92307692308
Education : 7003.983050847458
Productivity : 21028.410714285714
Finance : 31467.944444444445
Catalogs : 4004.0
Photo & Video : 28441.54375
Social Networking : 71548.34905660378
Health & Fitness : 23298.015384615384
Utilities : 18684.456790123455
Book : 39758.5
Games : 22788.6696905016
Travel : 28243.8
Lifestyle : 16485.764705882353
Weather : 52279.892857142855
Reference : 74942.11111111111
Business : 7491.117647058823
Entertainment : 14029.830708661417
Medical : 612.0
Navigation : 86090.33333333333
Music : 57326.530303030304
News : 21248.023255813954
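The nested loop above scans the whole data set once per genre. The three steps (isolate, sum, divide) could also be done in a single pass by keeping a running total and a count for each genre. The sketch below uses made-up rows; `average_by_group` and its parameters are hypothetical names.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Average a numeric column per group in one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    # divide by the number of rows in each group, not by the total row count
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows: genre, number of user ratings
sample = [["Games", "100"], ["Games", "300"], ["Medical", "50"]]
averages = average_by_group(sample, 0, 1)
for genre, value in averages.items():
    print(genre, " : ", value)
```

For the App Store data this would correspond to `average_by_group(ios_final, 11, 5)`.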
The most popular genre in the App Store data is Navigation, with an average of over 86,000 user ratings per app, the highest of any genre.
Genres such as Reference and Social Networking are also popular, each averaging over 70,000 user ratings per app.
Although fun apps such as Games and Entertainment dominate the App Store by number of apps, they are less popular by this measure, i.e. they have fewer user ratings per app on average.
Note also that Medical apps are the least popular in the App Store data set, averaging only 612 user ratings per app.
unique_genre = freq_table(android_final, 1)
for category in unique_genre:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    ave_num_install = total / len_category  # average number of installs
    print(category, " : ", ave_num_install)
HOUSE_AND_HOME : 1313681.9054054054
COMMUNICATION : 38326063.197916664
EDUCATION : 3057207.207207207
PERSONALIZATION : 5201482.6122448975
LIFESTYLE : 1437816.2687861272
SOCIAL : 23253652.127118643
BUSINESS : 1704192.3399014778
AUTO_AND_VEHICLES : 647317.8170731707
NEWS_AND_MAGAZINES : 9549178.467741935
BEAUTY : 513151.88679245283
PARENTING : 542603.6206896552
LIBRARIES_AND_DEMO : 638503.734939759
GAME : 13006872.892271662
HEALTH_AND_FITNESS : 4167457.3602941176
WEATHER : 5074486.197183099
BOOKS_AND_REFERENCE : 8767811.894736841
TRAVEL_AND_LOCAL : 13984077.710144928
DATING : 854028.8303030303
FAMILY : 4371709.123123123
MAPS_AND_NAVIGATION : 4056941.7741935486
EVENTS : 253542.22222222222
PRODUCTIVITY : 16772838.591304347
ENTERTAINMENT : 19428913.04347826
PHOTOGRAPHY : 17805627.643678162
FINANCE : 1387692.475609756
ART_AND_DESIGN : 1905351.6666666667
SHOPPING : 7036877.311557789
FOOD_AND_DRINK : 1924897.7363636363
TOOLS : 10695245.286096256
COMICS : 817657.2727272727
VIDEO_PLAYERS : 24790074.17721519
SPORTS : 4274688.722772277
MEDICAL : 107167.23322683707
On average, Communication apps have the most installs: 38,326,063. This number is heavily skewed upwards by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail and Hangouts), and by a few others with over 100 million and 500 million installs.
The cell below displays the apps with over one billion installs:
for app in android_final:
    if app[1] == 'COMMUNICATION' and app[5] == '1,000,000,000+':
        print(app[0], " : ", app[5])
Messenger – Text and Video Chat for Free : 1,000,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Skype - free IM & video calls : 1,000,000,000+
WhatsApp Messenger : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
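To see how much a single billion-install app can pull an average upwards, it helps to compare the mean with the median, which is far less sensitive to extreme values. The sketch below uses made-up install strings in the same "1,000+" format as the Installs column; `parse_installs` is a hypothetical helper, and using the median here is a suggestion, not part of the original analysis.

```python
from statistics import mean, median


def parse_installs(text):
    """Convert an install string such as '10,000+' to an integer."""
    return int(text.replace(",", "").replace("+", ""))


# Made-up category: one giant app and four small ones
sample_installs = ["1,000,000,000+", "10,000+", "50,000+", "100,000+", "5,000+"]
values = [parse_installs(text) for text in sample_installs]

print("mean  :", mean(values))
print("median:", median(values))
```

In this toy example the mean is in the hundreds of millions while the median stays at 50,000, which is the same kind of skew described for the Communication category.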
In this project, we analyzed data about App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable in both markets.
Based on the data above, a company developing for the App Store would do well to choose the Navigation genre, since these apps have the largest number of users on average and are therefore likely to be profitable. A company building for Google Play should make the Communication category its first choice, since these apps have the most installs on average, keeping in mind that this average is inflated by a few very large apps.