
In this project, I take on the role of a data analyst working for an app-building company. Our apps are available on the Apple Store and the Google Play Store.

Our business model consists of delivering viral free apps to the general public. Our revenue comes from in-app ads and depends heavily on the popularity of our apps and the number of people who use them.

Goal:

Our purpose here is to analyze our datasets and help our developers understand which types of apps are the most likely to attract users.

Datasets

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over 4 million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first see whether we can find any relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our goals:

A data set containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from this link.

A data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this link.

We'll start by opening and exploring these two data sets.

In [81]:
from csv import reader

# Apple Store data set
# The encoding is stated explicitly because app names contain non-ASCII characters.
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

# Google Play data set
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

Google Play Store data set

In [82]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(android_header)
print('\n')
explore_data(android, 0, 3, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13

The Google Play Store dataset has 10841 rows and 13 columns. From the information we have, the most useful columns for our analysis are likely to be 'App', 'Category', 'Rating', 'Installs', 'Price' and 'Genres'. This list may evolve as the analysis progresses.

Apple Store data set

In [83]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16

The Apple Store dataset has 7197 rows and 16 columns. The most useful columns are likely to be 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'.

Data Cleaning

Let's first check that the data is correct and accurate. This process is called data cleaning. For example, it allows us to remove apps with non-English names. Also, since we are a company that only develops free apps, it lets us remove non-free apps from our analysis.

Inaccurate data for the Google Play Store dataset?

In [84]:
for row in android: 
    if len(row) != len(android_header):
        print(row)
        print(android.index(row))
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472

It seems that the row at index 10472 is missing its Category value, so every following value is shifted one column to the left (the rating '1.9' sits in the Category position). Since the row is corrupted, we will remove it.

In [85]:
del android[10472]
In [86]:
for row in android: 
    if len(row) != len(android_header):
        print(row)
        print(android.index(row))

The malformed row at index 10472 has been removed from the data set: the check no longer prints anything.
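
Deleting by index works, but running that cell a second time would silently remove a different, valid row. A minimal sketch of an alternative, assuming we are happy to drop every row whose length differs from the header. The function name drop_malformed and the sample rows are made up for illustration and are not part of the original notebook:

```python
# Hypothetical miniature data set: a header plus three rows, one of them short.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # malformed: the Category value is missing
    ['App C', 'GAME', '4.5'],
]

def drop_malformed(rows, header):
    """Return only the rows that have exactly as many fields as the header."""
    return [row for row in rows if len(row) == len(header)]

cleaned = drop_malformed(sample_rows, sample_header)
print(len(cleaned))
```

Because the filter builds a new list instead of mutating the old one, applying it twice gives the same result as applying it once.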

Inaccurate data for the Apple Store dataset?

In [87]:
for row in ios: 
    if len(row) != len(ios_header):
        print(row)
        print(ios.index(row))

There seems to be no corrupted data in the Apple Store data set, so we can continue our analysis.

Duplicates

From the discussion of the data sets on Kaggle, it appears that the Google Play Store dataset contains duplicate entries. For example, the app Instagram has 4 different entries:

In [88]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

Let's find out how many duplicates the dataset contains.

In [89]:
unique_apps_android = []
duplicate_apps_android = []

for app in android:
    name = app[0]
    if name in unique_apps_android:
        duplicate_apps_android.append(name)
    else:
        unique_apps_android.append(name)

print(len(duplicate_apps_android))
print('\n')
print('Example of duplicates:', duplicate_apps_android[:5])
1181


Example of duplicates: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']

There are a total of 1181 duplicates in the Google Play Store dataset. A few examples are shown in the output.

It is worth running the same check on the Apple Store dataset.

In [90]:
unique_apps_ios = []
duplicate_apps_ios = []

for app in ios:
    app_id = app[0]  # column 0 is 'id'; the app name is in column 1
    if app_id in unique_apps_ios:
        duplicate_apps_ios.append(app_id)
    else:
        unique_apps_ios.append(app_id)

print(len(duplicate_apps_ios))
0

Luckily, the Apple Store dataset does not contain any duplicates: every value in its first column ('id') is unique.

How to deal with those duplicates?

This is the part where things get interesting in the data cleaning process. About 11% of the rows in the Google Play Store dataset (1181 of 10840) are duplicates. We will have to remove them if we want our analysis to be correct. Removing duplicates implies keeping the most accurate entry for each app. To do this, we need a criterion for judging which entry is the most accurate.

In our case, the number of reviews makes the difference. The four Instagram entries printed earlier are identical except for their review counts (66577313, 66577446, 66577313 and 66509917).

The entry with the highest number of reviews is most likely the one that was collected most recently, and hence the most reliable, so that is the one we will keep.

Removing the duplicates

In [91]:
print('Expected length:', len(android) - 1181)
Expected length: 9659

Removing the duplicates will give us a data set of 9659 rows.

In order to remove the duplicates:

  • We first build a dictionary, reviews_max, that maps each app name to the highest number of reviews found among its entries.
  • We then initialize two empty lists, android_clean and already_added.
  • We loop through the android data set, and for every iteration:
      • We isolate the name of the app and the number of reviews.
      • We add the current row (app) to the android_clean list, and the app name (name) to the already_added list, if both of the following hold:
          • The number of reviews of the current app matches the number of reviews of that app as recorded in the reviews_max dictionary.
          • The name of the app is not already in the already_added list.

We need the second condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for reviews_max[name] == n_reviews, we'll still end up with duplicate entries for some apps.

In [92]:
reviews_max = {}

for row in android:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
In [93]:
print(len(reviews_max))
9659
In [94]:
android_clean = []
already_added = []

for app in android: 
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

print(len(android_clean))
        
9659

The dataset is now cleaned up!
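
As a cross-check, the same de-duplication can be sketched in a single pass, keeping for each name the row with the most reviews. This is an alternative formulation, not the method used above: the function name dedupe_by_reviews is hypothetical, and the Box review count in the sample is invented for illustration.

```python
# Hypothetical rows in the layout [name, category, rating, reviews].
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],   # tie on the review count
]

def dedupe_by_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the one with the most reviews.

    On a tie the first row encountered is kept, because the comparison is strict.
    """
    best = {}
    for row in rows:
        name = row[name_idx]
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

deduped = dedupe_by_reviews(sample)
print(len(deduped))
```

The dictionary lookup replaces the membership test on the already_added list, which becomes slow as that list grows. The rows may come out in a different order than in android_clean, but the number of rows should be the same.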

Non-English apps

Since our company only develops apps in English, we also need to remove apps whose names are not in English.

In [95]:
def english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True
In [96]:
print(english('Instagram'))
print(english('Docs To Go™ Free Office Suite'))
print(english('中国語'))
True
False
False

Here we created a function that checks whether an app name contains only characters from the American Standard Code for Information Interchange (ASCII). ASCII code points range from 0 to 127, so a character whose code point (as returned by ord()) is greater than 127 is not a standard English character.

The issue is that the function also rejects English app names that contain a special character, such as 'Docs To Go™ Free Office Suite'. We would lose a lot of valuable data, which is not what we want.
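
To see why such names are rejected, it can help to look at the code points themselves. A small sketch, where the helper name non_ascii_chars is hypothetical and not part of the notebook:

```python
# The three names used in the test above.
names = ['Instagram', 'Docs To Go™ Free Office Suite', '中国語']

def non_ascii_chars(name):
    """Return (character, code point) pairs for characters outside ASCII."""
    return [(ch, ord(ch)) for ch in name if ord(ch) > 127]

for name in names:
    print(name, '->', non_ascii_chars(name))
```

The trademark sign has code point 8482, so a single such character is enough to make the strict version of the function return False.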

In [97]:
def english(string):
    non_ascii = 0  # number of characters outside the ASCII range
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    # tolerate up to three non-ASCII characters (emoji, trademark signs, etc.)
    return non_ascii <= 3

print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))
True
True
In [98]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 5, True)
print('\n')
explore_data(ios_english, 0, 5, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 6183
Number of columns: 16

To minimize data loss, we only remove an app if its name contains more than three characters whose code points fall outside the ASCII range. This means that English apps with up to three emoji or other special characters are still labeled as English.

Only free apps

We mentioned it earlier but as a reminder, we only produce free apps. We need to isolate the free from the non-free apps.

In [99]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print("Android:", len(android_final))
print("iOs:", len(ios_final))
Android: 8864
iOs: 3222

This was the final step of our data cleaning process. We are left with 8864 Android apps and 3222 iOS apps, and we can start our analysis.
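
The filter above compares price strings literally ('0' for Android, '0.0' for iOS), which would miss a free app whose price is written in another form. A hedged sketch of a more tolerant check, assuming prices are numeric strings that may carry a leading dollar sign. The helper is_free is hypothetical:

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' means free.

    Assumes the string is a number, optionally prefixed with '$'.
    An empty or non-numeric string raises ValueError.
    """
    return float(price.strip().lstrip('$')) == 0.0

print(is_free('0'), is_free('0.0'), is_free('$4.99'))
```

With such a helper the same condition could be used for both stores instead of two different string literals.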

Analysis

Our strategy

As mentioned in the introduction, our company builds small apps and publishes them for free. Our business model is based on advertisements that generate revenue. Each app is first released on the Google Play Store. If the app becomes popular, we develop it further. If it is profitable after a 6-month trial, it is released on the Apple Store as well.

This is why this benchmark is important to us. It allows us to identify which apps are best suited to the current market. We will look at the genre, the category, the number of installs, the ratings, and so on.

Our marketing team can then come up with a strategy for apps that can be released on the Google Play Store and, if successful, on the Apple Store.

We will be focusing mostly on the Google Play Store.

Google Play Store

In [100]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages
In [101]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
In [102]:
display_table(android_final, 1)  # Category column; display_table prints and returns None
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0.6430505415162455
COMICS : 0.6204873646209386
BEAUTY : 0.5979241877256317

Family is the largest category, with about 19% of the free English apps. By checking the Google Play Store website, we can see that these apps target everyone, from 4+ to 17+ audiences. This category also gathers a lot of quick games for children and adults. Basically, it appears to be a broad category regrouping different genres for a general audience.

Games come second (about 10%) and tools third (about 8%).

This tells us which kinds of apps are the most numerous on the store, not which ones are the most used. It suggests that apps aimed at a general audience may be downloaded more than others, but we will need the install counts to confirm this later in our analysis.

In [103]:
display_table(android_final, -4)  # Genres column; display_table prints and returns None
Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075812
Strategy : 0.9138086642599278
House & Home : 0.8235559566787004
Weather : 0.8009927797833934
Events : 0.7107400722021661
Adventure : 0.6768953068592057
Comics : 0.6092057761732852
Beauty : 0.5979241877256317
Art & Design : 0.5979241877256317
Parenting : 0.4963898916967509
Card : 0.45126353790613716
Casino : 0.42870036101083037
Trivia : 0.41741877256317694
Educational;Education : 0.39485559566787
Board : 0.3835740072202166
Educational : 0.3722924187725632
Education;Education : 0.33844765342960287
Word : 0.2594765342960289
Casual;Pretend Play : 0.236913357400722
Music : 0.2030685920577617
Racing;Action & Adventure : 0.16922382671480143
Puzzle;Brain Games : 0.16922382671480143
Entertainment;Music & Video : 0.16922382671480143
Casual;Brain Games : 0.13537906137184114
Casual;Action & Adventure : 0.13537906137184114
Arcade;Action & Adventure : 0.12409747292418773
Action;Action & Adventure : 0.10153429602888085
Educational;Pretend Play : 0.09025270758122744
Simulation;Action & Adventure : 0.078971119133574
Parenting;Education : 0.078971119133574
Entertainment;Brain Games : 0.078971119133574
Board;Brain Games : 0.078971119133574
Parenting;Music & Video : 0.06768953068592057
Educational;Brain Games : 0.06768953068592057
Casual;Creativity : 0.06768953068592057
Art & Design;Creativity : 0.06768953068592057
Education;Pretend Play : 0.056407942238267145
Role Playing;Pretend Play : 0.04512635379061372
Education;Creativity : 0.04512635379061372
Role Playing;Action & Adventure : 0.033844765342960284
Puzzle;Action & Adventure : 0.033844765342960284
Entertainment;Creativity : 0.033844765342960284
Entertainment;Action & Adventure : 0.033844765342960284
Educational;Creativity : 0.033844765342960284
Educational;Action & Adventure : 0.033844765342960284
Education;Music & Video : 0.033844765342960284
Education;Brain Games : 0.033844765342960284
Education;Action & Adventure : 0.033844765342960284
Adventure;Action & Adventure : 0.033844765342960284
Video Players & Editors;Music & Video : 0.02256317689530686
Sports;Action & Adventure : 0.02256317689530686
Simulation;Pretend Play : 0.02256317689530686
Puzzle;Creativity : 0.02256317689530686
Music;Music & Video : 0.02256317689530686
Entertainment;Pretend Play : 0.02256317689530686
Casual;Education : 0.02256317689530686
Board;Action & Adventure : 0.02256317689530686
Video Players & Editors;Creativity : 0.01128158844765343
Trivia;Education : 0.01128158844765343
Travel & Local;Action & Adventure : 0.01128158844765343
Tools;Education : 0.01128158844765343
Strategy;Education : 0.01128158844765343
Strategy;Creativity : 0.01128158844765343
Strategy;Action & Adventure : 0.01128158844765343
Simulation;Education : 0.01128158844765343
Role Playing;Brain Games : 0.01128158844765343
Racing;Pretend Play : 0.01128158844765343
Puzzle;Education : 0.01128158844765343
Parenting;Brain Games : 0.01128158844765343
Music & Audio;Music & Video : 0.01128158844765343
Lifestyle;Pretend Play : 0.01128158844765343
Lifestyle;Education : 0.01128158844765343
Health & Fitness;Education : 0.01128158844765343
Health & Fitness;Action & Adventure : 0.01128158844765343
Entertainment;Education : 0.01128158844765343
Communication;Creativity : 0.01128158844765343
Comics;Creativity : 0.01128158844765343
Casual;Music & Video : 0.01128158844765343
Card;Action & Adventure : 0.01128158844765343
Books & Reference;Education : 0.01128158844765343
Art & Design;Pretend Play : 0.01128158844765343
Art & Design;Action & Adventure : 0.01128158844765343
Arcade;Pretend Play : 0.01128158844765343
Adventure;Education : 0.01128158844765343

In terms of genres, tool apps are the most common on the Play Store (about 8%), closely followed by entertainment (about 6%) and education (about 5%). Again, apps of all of these genres can be found in the family category.
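
Many rows of the table above are combined labels such as 'Art & Design;Creativity', which spreads one genre over several lines. A possible refinement, sketched here on made-up values, is to split each label on ';' and count every individual genre. The function name genre_counts is hypothetical:

```python
from collections import Counter

# Hypothetical Genres values in the same 'A;B' format as the data set.
sample_genres = ['Tools', 'Art & Design;Creativity', 'Education;Creativity', 'Tools']

def genre_counts(genre_values):
    """Count every individual genre, splitting combined 'A;B' labels."""
    counts = Counter()
    for value in genre_values:
        for genre in value.split(';'):
            counts[genre.strip()] += 1
    return counts

for genre, count in genre_counts(sample_genres).most_common():
    print(genre, ':', count)
```

Note that an app with two genres is counted twice, so percentages computed from these counts would add up to more than 100.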

Apple Store

In [104]:
display_table(ios_final, -5)  # prime_genre column; display_table prints and returns None
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665

The Apple Store categorizes its apps differently. Games are by far the most represented genre, with about 58% of the free English apps.

The most popular apps

One way to find the most popular apps would be to rank them by rating. However, since our company lives on ad revenue, we will focus on the number of times an app has been installed. This gives us a clearer indication of which apps are the most popular on each store and of the state of the market.

Google Play Store

In [105]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
BUSINESS : 1712290.1474201474
PRODUCTIVITY : 16787331.344927534
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
PHOTOGRAPHY : 17840110.40229885
HOUSE_AND_HOME : 1331540.5616438356
MEDICAL : 120550.61980830671
COMMUNICATION : 38456119.167247385
PARENTING : 542603.6206896552
TRAVEL_AND_LOCAL : 13984077.710144928
ART_AND_DESIGN : 1986335.0877192982
ENTERTAINMENT : 11640705.88235294
AUTO_AND_VEHICLES : 647317.8170731707
MAPS_AND_NAVIGATION : 4056941.7741935486
DATING : 854028.8303030303
SOCIAL : 23253652.127118643
SPORTS : 3638640.1428571427
LIFESTYLE : 1437816.2687861272
COMICS : 817657.2727272727
HEALTH_AND_FITNESS : 4188821.9853479853
VIDEO_PLAYERS : 24727872.452830188
EVENTS : 253542.22222222222
PERSONALIZATION : 5201482.6122448975
EDUCATION : 1833495.145631068
LIBRARIES_AND_DEMO : 638503.734939759
SHOPPING : 7036877.311557789
FAMILY : 3695641.8198090694
NEWS_AND_MAGAZINES : 9549178.467741935
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
WEATHER : 5074486.197183099
GAME : 15588015.603248259
TOOLS : 10801391.298666667

Communication apps have the highest average number of installs (38,456,119). However, if we take a look at the Google Play Store, we notice that a few apps like WhatsApp or Messenger are downloaded extremely often, which can inflate the average. Let's check it out.

In [106]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
BBM - Free Calls & Messages : 100,000,000+

The communication category is heavily dominated by a few giants of the instant-messaging industry. It might not be a good idea to dive into this category, as our apps would be drowned out.
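
A few giants can pull a category average far above what a typical app achieves. One way to see this is to compare the mean with the median. The sketch below uses invented install labels, and parse_installs is a hypothetical helper that mirrors the replace() calls used earlier:

```python
from statistics import mean, median

def parse_installs(text):
    """Convert an install label such as '1,000,000+' to an integer lower bound."""
    return int(text.replace(',', '').replace('+', ''))

# Hypothetical install labels for one category: one giant and four small apps.
sample_installs = ['1,000,000,000+', '10,000+', '50,000+', '100,000+', '5,000+']
values = [parse_installs(label) for label in sample_installs]

print('mean  :', mean(values))
print('median:', median(values))
```

In this made-up sample the mean is about 200 million while the median is 50,000, so the median is arguably a better description of a typical app in a category dominated by a handful of very large ones.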

The game category seems to be pretty popular as well, but our previous results suggest that it is also somewhat saturated. If we really want our apps to become popular quickly, we need to find another pattern. Let's now check the situation for the Apple Store so we can gather more information.

Apple Store

In [107]:
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
Games : 22788.6696905016
Medical : 612.0
Catalogs : 4004.0
Social Networking : 71548.34905660378
Utilities : 18684.456790123455
Weather : 52279.892857142855
Health & Fitness : 23298.015384615384
Finance : 31467.944444444445
Music : 57326.530303030304
Food & Drink : 33333.92307692308
Shopping : 26919.690476190477
Navigation : 86090.33333333333
Travel : 28243.8
Entertainment : 14029.830708661417
Lifestyle : 16485.764705882353
Business : 7491.117647058823
Sports : 23008.898550724636
Reference : 74942.11111111111
Book : 39758.5
Photo & Video : 28441.54375
Productivity : 21028.410714285714
Education : 7003.983050847458
News : 21248.023255813954

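Surprisingly, the genre with the highest average number of user ratings is Navigation (about 86,090). The App Store data set has no install count, so we use the total number of ratings as a proxy. This figure is probably driven by a few very popular apps such as Waze or Google Maps.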
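
To check whether a genre average is driven by a handful of apps, we could list the apps with the most ratings in that genre. A minimal sketch on placeholder rows, assuming the App Store column layout shown earlier (index 1 = track_name, index 5 = rating_count_tot, index 11 = prime_genre). The names top_apps and make_row and the sample values are hypothetical:

```python
def top_apps(rows, genre, n=3, name_idx=1, ratings_idx=5, genre_idx=11):
    """Return the n (name, rating count) pairs with the most ratings in a genre."""
    in_genre = [(row[name_idx], int(row[ratings_idx]))
                for row in rows if row[genre_idx] == genre]
    return sorted(in_genre, key=lambda pair: pair[1], reverse=True)[:n]

def make_row(name, ratings, genre):
    """Build a 16-field placeholder row with only the three fields we need."""
    row = [''] * 16
    row[1], row[5], row[11] = name, str(ratings), genre
    return row

# Invented apps, for illustration only.
sample = [
    make_row('Nav One', 300000, 'Navigation'),
    make_row('Nav Two', 1200, 'Navigation'),
    make_row('Some Game', 50000, 'Games'),
    make_row('Nav Three', 40, 'Navigation'),
]

for name, ratings in top_apps(sample, 'Navigation'):
    print(name, ':', ratings)
```

Calling top_apps(ios_final, 'Navigation') on the cleaned data set would show how much of the average comes from the largest apps.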

Social Networking also ranks high (about 71,548 ratings on average), and Games remain by far the most numerous genre. But as we previously saw, such genres are quickly dominated by established competitors.

Conclusion

From the data we gathered, it will not be easy to stand out from the competition. One thing seems likely: apps aimed at a general audience will be more popular than apps dedicated to a specific audience (such as 17+).

Social Networking (including communication apps), Games and Navigation apps are the most popular among the installed apps. As our aim is to gain users quickly so that we can develop our apps further, I suggest we create a hybrid.

Use Case: Hybrid app

We have been able to distinguish a few criteria:

  • navigating
  • social sharing
  • gaming

An app that encompasses these three criteria would be present in all the categories above and could be highly addictive. The app should immerse the user through an immersive navigation system (as Pokémon Go does, for example), let users interact socially with friends, family or other users (I suggest building the app with social media APIs), and finally be a game.

I strongly suggest digging deeper into other metrics and finding a currently popular topic on which to base the storytelling of the app.