DATA ANALYSIS IN ANDROID AND IOS MOBILE APPS

-In this project,we are analysing data for companies that build android and ios mobile apps.

_The aim of this project is to help our developers understand what type of apps are likely to attract more users in the market.

In [1]:
#ios mobile apps
opened_file=open("AppleStore.csv")

from csv import reader
reader_file=reader(opened_file)
apps_data=list(reader_file)
print(apps_data[:5])#priint the first few rows

#android mobile apps
opened_file=open("googleplaystore.csv")

from csv import reader
reader_file=reader(opened_file)
android=list(reader_file)
print(android[:5])
[['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'], ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']]
[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'], ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]
In [2]:
#exploring the data,displaying the raws and columns of the dataset
def explore_data(dataset):
    print("\nnumber of rows ",len(dataset))
    print("number of columns ",len(dataset[0]))
    print("\n")
    print(dataset[0])
    
explore_data(apps_data)
explore_data(android)

    

    
number of rows  7198
number of columns  16


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

number of rows  10842
number of columns  13


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

Removing incorrect data

In Google Play Data set there is incorrect data(number of rating in google play does'nt exceed 5,and is displayed in the cell below.

In [3]:
print(android[10473])#wrong row/incorrect data
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
In [4]:
#google play app has a maximum of 5 ratings not 19 has indicated in the above app. 
del android[10473]#removing wrong row
print(len(android))
10841

Removing duplicate data

-Google play data has got duplicates apps,for example in the code below,you'll find the Instagram has been entered more than once.and by this we have to remove all the apps that appear more than once.

In [5]:
for apps in android:
    duplicate=apps[0]
    if duplicate=="Instagram":
        print(apps)
    
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

-in total, there 1,181 cases where app occurs more than once,below is the display.

In [6]:
#working on duplicate apps
duplicate_apps=[]
unique_apps=[]
for apps in android:
    duplicate=apps[0]
    if duplicate in unique_apps:
        duplicate_apps.append(duplicate)
    else:
        unique_apps.append(duplicate)
print("Number of duplicate apps: ",len(duplicate_apps))
Number of duplicate apps:  1181

-We won't removes the duplicate apps randomly,but if you check the rows of the above cell(In[6]) you notice that the only difference is in fourth position(reviews),showing that data was collected at differnce times.

-we can therefore use this to build a criterion for keeping rows,by creating a dictionary where each keys will be the unque app name and the value is the highest number of reviews of that app.

In [7]:
reviews_max={}

for app in android[1:]:
    name=app[0]
    n_reviews=float(app[3])
    if name in reviews_max and reviews_max[name]>n_reviews:
        reviews_max[name]=n_reviews
    if name not in reviews_max:
        reviews_max[name]=n_reviews
print(len(reviews_max))
9659

-To confirm the actual length,subtract the lenth of duplicate from the length of android, as demonstrated in the cell below:

In [8]:
print(len(android[1:])-len(duplicate_apps))
9659
In [9]:
#using the dictionary to remove the duplicate data
android_clean=[]
already_added=[]

for apps in android[1:]:
    name=apps[0]
    n_reviews=float(apps[3])
    if (n_reviews==reviews_max[name]) and(name not in already_added):
        android_clean.append(apps)
        already_added.append(name)

print(len(android_clean))
    
    
    
9659

-In the above cell,we check if n_reviews(reviews) is equal to the maximum review of the value in our dictionary(reviews_max) and at the same time we ensure that the name of the app has not been added.If this is the case, we append the entire row(where each raw consist of data point) to our empty list(android_clean).we also append at the same time, the names to already_added list.

Removing non_english apps

In both Apps data and Google play Data set, there are some apps which are non_english:

In [10]:
#checking for english app name
def english(string):
    for letter in string:
        if ord (letter) >127:#wrong approach
            return False
    return True
print(english('Instagram'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english('Docs To Go™ Free Office Suite')) 
print(english('Instachat 😜'))

        
        
        
True
False
False
False

-if you check the cell above,you'll find that the funtion couldn't correctly identify certain english app names like 'Docs To Go™ Free Office Suite' and 'Instachat 😜',just because emojis and characters like ™ fall outside the ASCII range and have corresponding numbers over 127

-if we use this function, then we will miss some english app names.To avoid this,we wil create a new function, in which we'll conclude that if there are more than 3 emojis and charcters like (™) then these are non english app names,otherwise english app name.

In [11]:
#final eglish and non english app names
def english_1(string):
    non=0
    for character in string:
        if ord(character)>127:
            non+=1  #this isolate the non english app name
    if non>3:
        return False
    
    return True
print(english_1('Instagram'))
print(english_1('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_1('Docs To Go™ Free Office Suite'))
print(english_1('Instachat 😜'))
    
    
    
    
    
True
False
True
True

-in te cell below,we are going to remove the non english app name in both the data sets.

In [12]:
android_english=[]
ios_english=[]

for apps in android_clean:
    app=apps[0]
    if english_1(app):
        android_english.append(apps)

for apps in apps_data:
    app=apps[1]
    if english_1(app):
        ios_english.append(apps)
print("android english data point: ",len(android_english))
print("ios english data point: ",len(ios_english))
print(android_english[0])
print(ios_english[0])
android english data point:  9614
ios english data point:  6184
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

-So far,we have removed inaccurate data,duplicate data and non_english apps.so our last step is to isolate the free apps in both data sets.

Isolating free apps

In the cell below,our aim is to isolate only the free apps.

In [13]:
android_final=[]
ios_final=[]

for apps in android_english:
    app=apps[7]# price is in index 7
    if app=='0':
        android_final.append(apps)
for apps in ios_english[1:]:
    app=apps[4] #price is in indx 4
    if app=='0.0':
        ios_final.append(apps)
        
print("android length: ",len(android_final))
print("ios length: ",len(ios_final))
        
    
    
android length:  8862
ios length:  3222

Since our revenue is influenced by the number of people using our apps,we will need to find app profile(we will use genre for both data set) that are succefull on both markets.

Our validation strategy for an app idea comprimised of 3 step;

1.Build minimal Android version of he app and add it to Google play

2.If the app has good response from users,we then develop it further.

3.If the app is profitable after 6 months,we also build an IOS version of the app and add it to App Store.

In [14]:
print(android[0])
print(apps_data[0])
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
In [15]:
#thi function will generate tables that show percantages
def freq_table(dataset,index):
    table={}
    for apps in dataset:
        app=apps[index]
        if app in table:
            table[app]+=1
        else:
            table[app]=1
    
    table_percentage={} #we use this dctionary to convert our values to percantage
    for key in table:
        percentage=(table[key]/len(dataset))*100
        table_percentage[key]=percentage
    return table_percentage

def display_table(dataset, index):#this function helps in arranging in descending  order
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

        
In [16]:
display_table(ios_final,11)#prime genre for app store
print("\n")
display_table(android_final,9)#genre for google google apps
print("\n")
display_table(android_final,1)#category for google play apps
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Tools : 8.429248476641842
Entertainment : 6.070864364703228
Education : 5.348679756262695
Business : 4.581358609794629
Productivity : 3.8930264048747465
Lifestyle : 3.8930264048747465
Finance : 3.7011961182577298
Medical : 3.5319341006544795
Sports : 3.4642292936131795
Personalization : 3.3175355450236967
Communication : 3.2498307379823967
Action : 3.1031369893929135
Health & Fitness : 3.0692845858722633
Photography : 2.945159106296547
News & Magazines : 2.798465357707064
Social : 2.663055743624464
Travel & Local : 2.324531708417964
Shopping : 2.2455427668697814
Books & Reference : 2.143985556307831
Simulation : 2.0424283457458814
Dating : 1.8618821936357481
Arcade : 1.8618821936357481
Video Players & Editors : 1.782893252087565
Casual : 1.7490408485669149
Maps & Navigation : 1.399232678853532
Food & Drink : 1.2412547957571656
Puzzle : 1.128413450688332
Racing : 0.9930038366057323
Role Playing : 0.9365831640713158
Libraries & Demo : 0.9365831640713158
Auto & Vehicles : 0.9252990295644324
Strategy : 0.9140148950575491
House & Home : 0.8350259535093659
Weather : 0.8011735499887158
Events : 0.7109004739336493
Adventure : 0.6770480704129994
Comics : 0.6093432633716994
Beauty : 0.598059128864816
Art & Design : 0.598059128864816
Parenting : 0.49650191830286616
Card : 0.4400812457684496
Casino : 0.4287971112615662
Trivia : 0.41751297675468296
Educational;Education : 0.3949447077409162
Educational : 0.3723764387271496
Board : 0.3723764387271496
Education;Education : 0.3385240352064997
Word : 0.2595350936583164
Casual;Pretend Play : 0.23696682464454977
Music : 0.2031144211238998
Racing;Action & Adventure : 0.16926201760324985
Puzzle;Brain Games : 0.16926201760324985
Entertainment;Music & Video : 0.16926201760324985
Casual;Brain Games : 0.13540961408259986
Casual;Action & Adventure : 0.13540961408259986
Arcade;Action & Adventure : 0.12412547957571654
Action;Action & Adventure : 0.1015572105619499
Educational;Pretend Play : 0.09027307605506657
Board;Brain Games : 0.09027307605506657
Simulation;Action & Adventure : 0.07898894154818326
Parenting;Education : 0.07898894154818326
Entertainment;Brain Games : 0.07898894154818326
Parenting;Music & Video : 0.06770480704129993
Educational;Brain Games : 0.06770480704129993
Casual;Creativity : 0.06770480704129993
Art & Design;Creativity : 0.06770480704129993
Education;Pretend Play : 0.056420672534416606
Role Playing;Pretend Play : 0.045136538027533285
Education;Creativity : 0.045136538027533285
Role Playing;Action & Adventure : 0.033852403520649964
Puzzle;Action & Adventure : 0.033852403520649964
Entertainment;Creativity : 0.033852403520649964
Entertainment;Action & Adventure : 0.033852403520649964
Educational;Creativity : 0.033852403520649964
Educational;Action & Adventure : 0.033852403520649964
Education;Music & Video : 0.033852403520649964
Education;Brain Games : 0.033852403520649964
Education;Action & Adventure : 0.033852403520649964
Adventure;Action & Adventure : 0.033852403520649964
Video Players & Editors;Music & Video : 0.022568269013766643
Sports;Action & Adventure : 0.022568269013766643
Simulation;Pretend Play : 0.022568269013766643
Puzzle;Creativity : 0.022568269013766643
Music;Music & Video : 0.022568269013766643
Entertainment;Pretend Play : 0.022568269013766643
Casual;Education : 0.022568269013766643
Board;Action & Adventure : 0.022568269013766643
Trivia;Education : 0.011284134506883321
Travel & Local;Action & Adventure : 0.011284134506883321
Tools;Education : 0.011284134506883321
Strategy;Education : 0.011284134506883321
Strategy;Creativity : 0.011284134506883321
Strategy;Action & Adventure : 0.011284134506883321
Simulation;Education : 0.011284134506883321
Role Playing;Brain Games : 0.011284134506883321
Racing;Pretend Play : 0.011284134506883321
Puzzle;Education : 0.011284134506883321
Parenting;Brain Games : 0.011284134506883321
Music & Audio;Music & Video : 0.011284134506883321
Lifestyle;Pretend Play : 0.011284134506883321
Lifestyle;Education : 0.011284134506883321
Health & Fitness;Education : 0.011284134506883321
Health & Fitness;Action & Adventure : 0.011284134506883321
Entertainment;Education : 0.011284134506883321
Communication;Creativity : 0.011284134506883321
Comics;Creativity : 0.011284134506883321
Casual;Music & Video : 0.011284134506883321
Card;Brain Games : 0.011284134506883321
Card;Action & Adventure : 0.011284134506883321
Books & Reference;Education : 0.011284134506883321
Art & Design;Pretend Play : 0.011284134506883321
Art & Design;Action & Adventure : 0.011284134506883321
Arcade;Pretend Play : 0.011284134506883321
Adventure;Education : 0.011284134506883321


FAMILY : 18.788083953960733
GAME : 9.636650868878357
TOOLS : 8.440532611148726
BUSINESS : 4.581358609794629
LIFESTYLE : 3.9043105393816293
PRODUCTIVITY : 3.8930264048747465
FINANCE : 3.7011961182577298
MEDICAL : 3.5319341006544795
SPORTS : 3.419092755585647
PERSONALIZATION : 3.3175355450236967
COMMUNICATION : 3.2498307379823967
HEALTH_AND_FITNESS : 3.0692845858722633
PHOTOGRAPHY : 2.945159106296547
NEWS_AND_MAGAZINES : 2.798465357707064
SOCIAL : 2.663055743624464
TRAVEL_AND_LOCAL : 2.335815842924848
SHOPPING : 2.2455427668697814
BOOKS_AND_REFERENCE : 2.143985556307831
DATING : 1.8618821936357481
VIDEO_PLAYERS : 1.782893252087565
MAPS_AND_NAVIGATION : 1.399232678853532
EDUCATION : 1.2525389302640486
FOOD_AND_DRINK : 1.2412547957571656
ENTERTAINMENT : 1.0381403746332656
LIBRARIES_AND_DEMO : 0.9365831640713158
AUTO_AND_VEHICLES : 0.9252990295644324
HOUSE_AND_HOME : 0.8350259535093659
WEATHER : 0.8011735499887158
EVENTS : 0.7109004739336493
ART_AND_DESIGN : 0.6770480704129994
PARENTING : 0.6544798013992327
COMICS : 0.6206273978785828
BEAUTY : 0.598059128864816
In [17]:
print(ios_final[:8])
print("\n")
print(android_final[:8])
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1'], ['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1'], ['282935706', 'Bible', '92774400', 'USD', '0.0', '985920', '5320', '4.5', '5.0', '7.5.1', '4+', 'Reference', '37', '5', '45', '1'], ['553834731', 'Candy Crush Saga', '222846976', 'USD', '0.0', '961794', '2453', '4.5', '4.5', '1.101.0', '4+', 'Games', '43', '5', '24', '1']]


[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up'], ['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'April 26, 2018', '1.1', '4.0.3 and up'], ['Infinite Painter', 'ART_AND_DESIGN', '4.1', '36815', '29M', '1,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'June 14, 2018', '6.1.61.1', '4.2 and up']]

Analysis of Data set just by using genres

1.Apps Store

In App Store(prime_genre),it's clear that games are the common genre:

It's also clear that catalog is the least common genre in App Store.

Most of the app in App Store are designed more for entertainment(games,photo and video,sports) compared to pratical(purposes(education,shopping,utilities,productivity)

2.Google play

The most common genre are tools.

The is a slight difference between catalog and genre i.e genre has more apps compared to catalog.

comparing Apps data and Google Play

In Apps store it's clear that more apps are for entatainement(games,sports) compared to Google Play which are more practical(tools,education)

-

checking genre app with most users

a.Apps Store

In the cell below,we want to check what genres are the most popular in app:

To do this,we need to calculate the average number of install for each app genre.

But sincw App store is missing install column,we have to go through all this process;

1.isolate the apps of each genre

2.sum up the user ratings for the apps of that genre:

3.Divide the sum by the number of apps belonging to that genre(not by the total number of apps)

In [18]:
unique_genre=freq_table(ios_final,11)

for genre in unique_genre:
    total=0
    len_genre=0
    for apps in ios_final:
        genre_app=apps[11]
        if genre_app==genre:
            user_rating=float(apps[5])
            total+=user_rating
            len_genre+=1
    aver_user_rating=total/len_genre
    print(genre, " : ",aver_user_rating )
            

        
Shopping  :  26919.690476190477
Sports  :  23008.898550724636
Food & Drink  :  33333.92307692308
Education  :  7003.983050847458
Productivity  :  21028.410714285714
Finance  :  31467.944444444445
Catalogs  :  4004.0
Photo & Video  :  28441.54375
Social Networking  :  71548.34905660378
Health & Fitness  :  23298.015384615384
Utilities  :  18684.456790123455
Book  :  39758.5
Games  :  22788.6696905016
Travel  :  28243.8
Lifestyle  :  16485.764705882353
Weather  :  52279.892857142855
Reference  :  74942.11111111111
Business  :  7491.117647058823
Entertainment  :  14029.830708661417
Medical  :  612.0
Navigation  :  86090.33333333333
Music  :  57326.530303030304
News  :  21248.023255813954

Analysing the Apps Store using above cell

The most popular app genre(being used by many) in the Apps data is navigation with over eighty six thousands of users.They have highest number of user reviews.

Some app genre like Refference and Social NEtworking are also popular with over seventy thousnd of users each.

It's also clear that apps for fun like games and entatainemnt though domintes In App Store, but they are less popuar i.e have got less users.

It should also be into our attention that medical apps are the least popular(have very small number of users) in the Apps Store Data Set.

In [19]:
unique_genre=freq_table(android_final,1)

for category in unique_genre:
    total=0
    len_category=0
    for app in android_final:
        category_app=app[1]
        if category_app==category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    
    ave_num_install=total/len_category# avarage number of install 
    print(category," : ",ave_num_install)
            
            
            
            
HOUSE_AND_HOME  :  1313681.9054054054
COMMUNICATION  :  38326063.197916664
EDUCATION  :  3057207.207207207
PERSONALIZATION  :  5201482.6122448975
LIFESTYLE  :  1437816.2687861272
SOCIAL  :  23253652.127118643
BUSINESS  :  1704192.3399014778
AUTO_AND_VEHICLES  :  647317.8170731707
NEWS_AND_MAGAZINES  :  9549178.467741935
BEAUTY  :  513151.88679245283
PARENTING  :  542603.6206896552
LIBRARIES_AND_DEMO  :  638503.734939759
GAME  :  13006872.892271662
HEALTH_AND_FITNESS  :  4167457.3602941176
WEATHER  :  5074486.197183099
BOOKS_AND_REFERENCE  :  8767811.894736841
TRAVEL_AND_LOCAL  :  13984077.710144928
DATING  :  854028.8303030303
FAMILY  :  4371709.123123123
MAPS_AND_NAVIGATION  :  4056941.7741935486
EVENTS  :  253542.22222222222
PRODUCTIVITY  :  16772838.591304347
ENTERTAINMENT  :  19428913.04347826
PHOTOGRAPHY  :  17805627.643678162
FINANCE  :  1387692.475609756
ART_AND_DESIGN  :  1905351.6666666667
SHOPPING  :  7036877.311557789
FOOD_AND_DRINK  :  1924897.7363636363
TOOLS  :  10695245.286096256
COMICS  :  817657.2727272727
VIDEO_PLAYERS  :  24790074.17721519
SPORTS  :  4274688.722772277
MEDICAL  :  107167.23322683707

Analysis of Google Play

On average, Communication Apps ave the most installs of 38,326,063. This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs:

Below is the display of such apps:

In [21]:
for app in android_final:
    if app[1]=='COMMUNICATION' and app[5]=='1,000,000,000+':
        print(app[0]," : ",app[5])
    
    
Messenger – Text and Video Chat for Free  :  1,000,000,000+
Gmail  :  1,000,000,000+
Hangouts  :  1,000,000,000+
Skype - free IM & video calls  :  1,000,000,000+
WhatsApp Messenger  :  1,000,000,000+
Google Chrome: Fast & Secure  :  1,000,000,000+

Conclusion

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

From the above data,it will be good for a company which is devoloping Apps Store, to go for Navigation app genre since they have large number of users hence profitable.To the Company building Google Play,should have Communication app genre as the first choice since they have most Installs and profitable.

In [ ]: