
Profitable App Profiles for the App Store and Google Play Markets

Introduction

The aim of this project is to identify profitable Android (Google Play) and iOS (the App Store) mobile apps.

The apps under consideration are free to download and install, and the main source of the company's revenue is in-app ads. This means the revenue for any given app is mostly influenced by the number of its users: the more users that see and engage with the ads, the better. Hence we need to analyze the available data to understand what types of apps are likely to attract more users on both Google Play and the App Store.

1. Data Collection and Exploration

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over 4 million apps requires a significant amount of time and money, so we'll first try to analyze a sample of the data instead, using relevant existing data that is available at no cost. For this purpose, two data sets are available in the form of CSV files:

The Android apps data set contains data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018.
The iOS apps data set contains data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017.

To explore these two data sets once they are opened, we define a function, explore_data():

In [1]:

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:

# Opening the data sets and saving both as lists of lists
from csv import reader

# Google Play data set; 'with' closes the file once it has been read
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

# App Store data set
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

In [3]:

explore_data(android, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13

In [4]:

explore_data(ios, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16

In [5]:

# Android data set columns
print(android_header)

print('\n')

# iOS data set columns
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

The Google Play data set (Android apps) contains 10,841 apps and 13 columns. The most informative columns for us seem to be the following: 'App', 'Category', 'Rating', 'Reviews', 'Installs', 'Type', 'Price', 'Content Rating' and 'Genres'.

The App Store data set (iOS apps) contains 7,197 apps and 16 columns. The columns potentially useful for our data analysis might be the following: 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'cont_rating' and 'prime_genre'.

For further details about both data sets and the meaning of each column, consult the corresponding data set documentation: Android apps data set and iOS apps data set.

2. Data Cleaning

2.1. Deleting Wrong Data

Both data sets have dedicated discussion sections: one for Google Play and one for the App Store. One topic in the Google Play discussion section reports a wrong value in row 10,472 (a missing 'Rating' and a column shift for the following columns).

In [6]:

print(android_header)
print('\n')
print(android[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

Inspecting the reported row, we can see that the missing value is actually not 'Rating' but 'Category', and that 'Genres' has no value either. For comparison, let's check another row of this data set:

In [7]:

print(android_header)
print('\n')
print(android[5])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']

Hence row 10,472 indeed has a missing value for 'Category' and an empty cell for 'Genres', and all the values in between are shifted to the left. This row has to be removed from the data set:

In [8]:

del android[10472]
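
A shifted row like this one has fewer fields than the header (12 instead of 13), so similar rows can be found by comparing lengths. The following is only a minimal sketch on a tiny made-up table; find_malformed_rows is a hypothetical helper that is not part of the original notebook.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose field count differs from the header's."""
    malformed = []
    for index, row in enumerate(dataset):
        if len(row) != len(header):
            malformed.append((index, row))
    return malformed


# Toy data for illustration only (not taken from the real CSV files)
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # 'Category' is missing, values shifted left
    ['App C', 'GAME', '4.5'],
]

bad_rows = find_malformed_rows(sample_rows, sample_header)
print(bad_rows)
```

Collecting the indices first and deleting afterwards makes the clean-up safer to re-run than a hard-coded del, which removes a different row if the cell is executed twice.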

2.2. Deleting Duplicates

Exploring the Google Play data set reveals that some apps have duplicate entries. For instance, Instagram has 4 entries:

In [9]:

for app in android:
    name = app[0]
    if  name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

In total, there are 1,181 cases where an app occurs more than once:

In [10]:

# Creating the lists of duplicate apps and unique apps
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
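
The same count can be obtained in a single pass with collections.Counter, which avoids scanning the unique_apps list for every row. This is a sketch on made-up rows, assuming (as above) that the app name is in column 0 and that every occurrence after the first counts as a duplicate.

```python
from collections import Counter

# Toy rows for illustration only; the app name is in column 0
sample_rows = [
    ['Box', 'BUSINESS'],
    ['Slack', 'BUSINESS'],
    ['Box', 'BUSINESS'],
    ['Box', 'BUSINESS'],
]

name_counts = Counter(row[0] for row in sample_rows)

# Every occurrence after the first counts as a duplicate entry
n_duplicates = sum(count - 1 for count in name_counts.values())
repeated_names = [name for name, count in name_counts.items() if count > 1]

print('Number of duplicate entries:', n_duplicates)
print('Apps with more than one entry:', repeated_names)
```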

We need to remove the duplicate entries and keep only one entry per app. One thing we could do is remove the duplicate rows randomly, but we could probably find a better way.

Returning to the rows we printed for the Instagram app, the main difference is in the 4th position of each row, which corresponds to the number of reviews. The different numbers suggest the data was collected at different times:

In [11]:

for app in android:
    name = app[0]
    if  name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

We can use this information to build a criterion for removing the duplicates. The higher the number of reviews, the more recent the data should be. Rather than removing duplicates randomly, we'll only keep the row with the highest number of reviews and remove the other entries for any given app.

In [12]:

# Creating a dictionary with the highest number of reviews for each app
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (name in reviews_max and reviews_max[name] < n_reviews) or name not in reviews_max:
        reviews_max[name] = n_reviews     

Given that 1,181 duplicates were detected in the Google Play data set, we should be left with 9,659 rows after removing them. We also expect the length of the dictionary to be equal to 9,659:

In [13]:

print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659

In [14]:

#  Creating a new data set without duplicates (one entry per app)
android_clean = []
already_added = []
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
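
The two steps above (finding the maximum review count, then filtering) can also be folded into one pass by storing the best row per app name in a dictionary. The sketch below uses made-up rows; keep_most_reviewed and its parameters are hypothetical names, and the result is ordered by first appearance of each name, which may differ from the row order produced by the cell above.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best_rows = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie
        if name not in best_rows or n_reviews > float(best_rows[name][reviews_index]):
            best_rows[name] = row
    return list(best_rows.values())


# Toy rows (name, category, rating, reviews), for illustration only
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
]

deduplicated = keep_most_reviewed(sample_rows)
print(deduplicated)
```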

Checking the length of the resulting data set (again, expected value is 9,659):

In [15]:

print(len(android_clean))

In [16]:

print(android_clean[:5])

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]

2.3. Deleting Non-English Apps

Since our company uses only English to develop its apps, we'd like to analyze only the apps that are directed toward an English-speaking audience.

Inspecting both data sets shows that they also contain apps with non-English names, that is, names containing symbols unusual for English text (anything other than English letters, the digits 0-9, punctuation marks, and special symbols). These apps have to be removed.

In [17]:

print(ios[813][1])
print(ios[6731][1])
print('\n')
print(android_clean[442][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


iPair-Meet, Chat, Dating
لعبة تقدر تربح DZ

In the ASCII system, the numbers corresponding to the characters commonly used in English text all fall in the range 0-127. We can therefore write a function that checks whether every symbol of an app name belongs to this range. If any symbol doesn't, the app is excluded from further analysis and removed from the data set.
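
As a quick illustration of the 0-127 range, the built-in ord() returns the number (code point) of a single character. The sample characters below were picked for illustration only.

```python
# Code points of a few sample characters
sample_symbols = ['a', 'Z', '7', '&', '™', '爱', '😜']
code_points = {symbol: ord(symbol) for symbol in sample_symbols}

for symbol, code_point in code_points.items():
    label = 'ASCII' if code_point <= 127 else 'non-ASCII'
    print(repr(symbol), code_point, label)
```

Letters, digits and common punctuation stay at or below 127, while the trademark sign, the Chinese character and the emoji are far above it.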

In [18]:

def english_apps(string):
    for symbol in string:
        if ord(symbol) > 127:
            return False
    return True

Let's check this function on some apps:

In [19]:

print(english_apps('Instagram'))
print(english_apps('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_apps('Docs To Go™ Free Office Suite'))
print(english_apps('Instachat 😜'))

True
False
False
False

As a result, the function sometimes misclassifies English app names that contain emojis or special characters (such as ™) outside the ASCII range. Removing such apps would mean losing valuable data.

To minimize the impact of data loss, we'll only remove an app if its name has more than 3 characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to 3 such symbols will still be labeled as English.

In [20]:

# Editing the previous function
def english_apps(string):
    non_ascii = 0  # number of characters outside the ASCII range
    for symbol in string:
        if ord(symbol) > 127:
            non_ascii += 1
        if non_ascii > 3:
            return False
    return True

In [21]:

# Checking the updated function
print(english_apps('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_apps('Docs To Go™ Free Office Suite'))
print(english_apps('Instachat 😜'))

False
True
True
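
One limitation of the fixed threshold is worth keeping in mind: a very short name with three or fewer non-ASCII characters passes the check even if it contains no English letters at all. The sketch below contrasts the rule with a hypothetical alternative, mostly_ascii, which limits the share of non-ASCII characters instead; the 25% cut-off is an arbitrary assumption, not a value from this project.

```python
def english_apps(string):
    # Same rule as the updated function: allow at most 3 non-ASCII characters
    non_ascii = 0
    for symbol in string:
        if ord(symbol) > 127:
            non_ascii += 1
        if non_ascii > 3:
            return False
    return True


def mostly_ascii(string, max_share=0.25):
    # Hypothetical alternative: limit the share of non-ASCII characters
    if not string:
        return True
    non_ascii = sum(1 for symbol in string if ord(symbol) > 127)
    return non_ascii / len(string) <= max_share


for name in ['豆瓣', 'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    print(name, '|', english_apps(name), '|', mostly_ascii(name))
```

The two-character Chinese name is accepted by the count-based rule but rejected by the share-based one, while both English names pass either check.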

Now we will filter out non-English apps from both data sets:

In [22]:

android_cleaned_filtered = []
ios_filtered = []

for app in android_clean:
    if english_apps(app[0]):  # app name is in column 0
        android_cleaned_filtered.append(app)

for app in ios:
    if english_apps(app[1]):  # app name (track_name) is in column 1
        ios_filtered.append(app)

In [23]:

explore_data(android_cleaned_filtered, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13

In [24]:

explore_data(ios_filtered, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16

After filtering, the Android data set has 9,614 rows and the iOS data set has 6,183 rows.

2.4. Deleting Non-Free Apps

The company specializes in building only free apps. Hence, before proceeding to the data analysis step, we have to remove all non-free apps from both data sets.

In [25]:

android_final = []
ios_final = []

for app in android_cleaned_filtered:    
    if app[7] == '0':
        android_final.append(app)
        
for app in ios_filtered:
    if app[4] == '0.0':
        ios_final.append(app)

In [26]:

print('Final number of android apps:', len(android_final))
print('Final number of iOS apps:', len(ios_final))

Final number of android apps: 8864
Final number of iOS apps: 3222
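
The comparisons app[7] == '0' and app[4] == '0.0' depend on the exact text used for a zero price in each file. A more tolerant check could parse the price as a number first. The sketch below is only an illustration: is_free is a hypothetical helper, and the '$' prefix for paid apps is an assumption about the price format rather than something shown in the rows above.

```python
def is_free(price_text):
    """Return True when a price string represents zero."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free in this sketch
        return False


price_checks = {price: is_free(price) for price in ['0', '0.0', '0.00', '$4.99', 'Everyone']}
for price, free in price_checks.items():
    print(repr(price), free)
```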

Now we have 8,864 Android apps and 3,222 iOS apps for further data analysis.

3. Data Analysis

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users, because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea consists of 3 steps:

Build a minimal Android version of the app, and add it to Google Play.
If the app has a good response from users, we develop it further.
If the app is profitable after 6 months, we build an iOS version of the app and add it to the App Store.

Because our final goal is to add the app to both Google Play and the App Store, we need to find app profiles that are successful in both markets.

3.1. Finding The Most Common Genres

Let's begin the analysis by getting a sense of the most common genres in each market. In the Google Play data set, app genres are described in the 'Category' and 'Genres' columns; in the App Store data set, they are described in the 'prime_genre' column.

We'll build two functions we can use to analyze the frequency tables:

To generate frequency tables that show percentages.
To display the percentages in a frequency table in a descending order.

In [27]:

def freq_table(dataset, index):
    dictionary = {}
    number_apps = 0
    for row in dataset:
        number_apps += 1
        dictionary[row[index]] = dictionary.get(row[index], 0) + 1
            
    dictionary_percent = {}
    for key in dictionary:
        dictionary_percent[key] = (dictionary[key] / number_apps) * 100
        
    return dictionary_percent

In [28]:

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
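
To see what these two functions produce, here is a small self-contained sketch on made-up rows. It uses collections.Counter as a compact stand-in for the counting loop; freq_table_percent is a hypothetical name, not one of the functions defined above.

```python
from collections import Counter


def freq_table_percent(dataset, index):
    """Percentage of rows for each distinct value found in column `index`."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}


# Toy rows for illustration only; the genre is in column 1
sample_rows = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]

table = freq_table_percent(sample_rows, 1)

# Sort by percentage, largest first
for genre, percent in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', percent)
```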

In [29]:

# Prime_genre column
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665

Among free English iOS apps, the most common genre is Games (58%), followed at a large distance by Entertainment (7.9%). The general impression is that apps designed for entertainment (games, photo and video, social networking, sports, music) significantly outnumber apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle) on the App Store.

Judging only by the frequency table, we still cannot recommend an app profile for the App Store market, because a large number of apps for a particular genre does not necessarily imply that apps of that genre have a large number of users.

In [30]:

# Category column
display_table(android_final, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0.6430505415162455
COMICS : 0.6204873646209386
BEAUTY : 0.5979241877256317

Among free English Android apps, the most common categories are also entertainment-oriented: FAMILY (18.9%) and GAME (9.7%). However, the percentages for the other categories are spread more evenly than for iOS apps, and in general the landscape is more balanced between practical and fun apps. The number of categories is comparable to the number of iOS app genres.

If we look at the 'Genres' column for Android apps, we see that it is much more granular, and the number of genres is no longer comparable to the number of iOS app genres:

In [31]:

# Genres column
display_table(android_final, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075812
Strategy : 0.9138086642599278
House & Home : 0.8235559566787004
Weather : 0.8009927797833934
Events : 0.7107400722021661
Adventure : 0.6768953068592057
Comics : 0.6092057761732852
Beauty : 0.5979241877256317
Art & Design : 0.5979241877256317
Parenting : 0.4963898916967509
Card : 0.45126353790613716
Casino : 0.42870036101083037
Trivia : 0.41741877256317694
Educational;Education : 0.39485559566787
Board : 0.3835740072202166
Educational : 0.3722924187725632
Education;Education : 0.33844765342960287
Word : 0.2594765342960289
Casual;Pretend Play : 0.236913357400722
Music : 0.2030685920577617
Racing;Action & Adventure : 0.16922382671480143
Puzzle;Brain Games : 0.16922382671480143
Entertainment;Music & Video : 0.16922382671480143
Casual;Brain Games : 0.13537906137184114
Casual;Action & Adventure : 0.13537906137184114
Arcade;Action & Adventure : 0.12409747292418773
Action;Action & Adventure : 0.10153429602888085
Educational;Pretend Play : 0.09025270758122744
Simulation;Action & Adventure : 0.078971119133574
Parenting;Education : 0.078971119133574
Entertainment;Brain Games : 0.078971119133574
Board;Brain Games : 0.078971119133574
Parenting;Music & Video : 0.06768953068592057
Educational;Brain Games : 0.06768953068592057
Casual;Creativity : 0.06768953068592057
Art & Design;Creativity : 0.06768953068592057
Education;Pretend Play : 0.056407942238267145
Role Playing;Pretend Play : 0.04512635379061372
Education;Creativity : 0.04512635379061372
Role Playing;Action & Adventure : 0.033844765342960284
Puzzle;Action & Adventure : 0.033844765342960284
Entertainment;Creativity : 0.033844765342960284
Entertainment;Action & Adventure : 0.033844765342960284
Educational;Creativity : 0.033844765342960284
Educational;Action & Adventure : 0.033844765342960284
Education;Music & Video : 0.033844765342960284
Education;Brain Games : 0.033844765342960284
Education;Action & Adventure : 0.033844765342960284
Adventure;Action & Adventure : 0.033844765342960284
Video Players & Editors;Music & Video : 0.02256317689530686
Sports;Action & Adventure : 0.02256317689530686
Simulation;Pretend Play : 0.02256317689530686
Puzzle;Creativity : 0.02256317689530686
Music;Music & Video : 0.02256317689530686
Entertainment;Pretend Play : 0.02256317689530686
Casual;Education : 0.02256317689530686
Board;Action & Adventure : 0.02256317689530686
Video Players & Editors;Creativity : 0.01128158844765343
Trivia;Education : 0.01128158844765343
Travel & Local;Action & Adventure : 0.01128158844765343
Tools;Education : 0.01128158844765343
Strategy;Education : 0.01128158844765343
Strategy;Creativity : 0.01128158844765343
Strategy;Action & Adventure : 0.01128158844765343
Simulation;Education : 0.01128158844765343
Role Playing;Brain Games : 0.01128158844765343
Racing;Pretend Play : 0.01128158844765343
Puzzle;Education : 0.01128158844765343
Parenting;Brain Games : 0.01128158844765343
Music & Audio;Music & Video : 0.01128158844765343
Lifestyle;Pretend Play : 0.01128158844765343
Lifestyle;Education : 0.01128158844765343
Health & Fitness;Education : 0.01128158844765343
Health & Fitness;Action & Adventure : 0.01128158844765343
Entertainment;Education : 0.01128158844765343
Communication;Creativity : 0.01128158844765343
Comics;Creativity : 0.01128158844765343
Casual;Music & Video : 0.01128158844765343
Card;Action & Adventure : 0.01128158844765343
Books & Reference;Education : 0.01128158844765343
Art & Design;Pretend Play : 0.01128158844765343
Art & Design;Action & Adventure : 0.01128158844765343
Arcade;Pretend Play : 0.01128158844765343
Adventure;Education : 0.01128158844765343

As in the previous case, from these frequency tables alone we cannot deduce anything about the genres (categories) with the most users, and so we cannot recommend an app profile for Google Play.
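
Many entries in the 'Genres' table combine two labels separated by a semicolon (for example 'Education;Pretend Play'), which is part of why the table is so long. If a coarser view is wanted, one option is to keep only the part before the semicolon. The sketch below shows the idea on a few made-up values; treating the first label as the main genre is an assumption.

```python
from collections import Counter

# Toy 'Genres' values in the same 'Main;Sub' style as the table above
sample_genres = [
    'Education',
    'Education;Pretend Play',
    'Casual;Brain Games',
    'Education;Creativity',
    'Casual',
]

# Keep only the part before the first ';' as the main genre
main_genres = [value.split(';')[0] for value in sample_genres]
main_counts = Counter(main_genres)

for genre, count in main_counts.most_common():
    print(genre, ':', count / len(sample_genres) * 100)
```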

3.2. Finding The Most Popular Genres

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.
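
Note that the Installs values are stored as text such as '10,000+', so they need to be converted before any average can be computed. A minimal sketch of such a conversion follows; parse_installs is a hypothetical helper, and reading '10,000+' as exactly 10,000 (a lower bound) is a simplifying assumption.

```python
def parse_installs(installs_text):
    """Convert an 'Installs' string such as '10,000+' to a float."""
    cleaned = installs_text.replace(',', '').replace('+', '')
    return float(cleaned)


parsed = [parse_installs(text) for text in ['10,000+', '500,000+', '1,000,000,000+']]
print(parsed)
```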

In [32]:

# Calculating the average number of user ratings per app genre on the App Store:
prime_genre = freq_table(ios_final, -5)
for genre in prime_genre:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            number_rating = float(app[5])
            total += number_rating
            len_genre += 1
    average_number_rating = total / len_genre
    print(genre, ':', average_number_rating)   

Catalogs : 4004.0
Food & Drink : 33333.92307692308
Travel : 28243.8
Business : 7491.117647058823
Games : 22788.6696905016
Weather : 52279.892857142855
Utilities : 18684.456790123455
Health & Fitness : 23298.015384615384
Navigation : 86090.33333333333
Shopping : 26919.690476190477
Medical : 612.0
Finance : 31467.944444444445
News : 21248.023255813954
Reference : 74942.11111111111
Productivity : 21028.410714285714
Education : 7003.983050847458
Sports : 23008.898550724636
Music : 57326.530303030304
Book : 39758.5
Photo & Video : 28441.54375
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Social Networking : 71548.34905660378
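
The nested loop above walks through the whole data set once per genre. The same averages can be accumulated in a single pass with two dictionaries, as in the sketch below. It runs on made-up rows with a simplified column layout (name, genre, rating count), so the indices differ from those of the real iOS data set.

```python
from collections import defaultdict

# Toy rows (name, genre, rating count) for illustration only
sample_rows = [
    ['App A', 'Navigation', '300'],
    ['App B', 'Navigation', '100'],
    ['App C', 'Weather', '50'],
]

totals = defaultdict(float)   # sum of rating counts per genre
counts = defaultdict(int)     # number of apps per genre

for row in sample_rows:
    genre = row[1]
    totals[genre] += float(row[2])
    counts[genre] += 1

averages = {genre: totals[genre] / counts[genre] for genre in totals}

# Print the genres from the highest average to the lowest
for genre, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', average)
```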

Looking at the results, a preliminary conclusion is that the most popular app genres (based on the average number of user ratings) are the following:

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5

Let's look at each of them in more detail, in particular at the apps they contain:

In [33]:

genres_to_inspect = ['Navigation', 'Reference', 'Social Networking',
                     'Music', 'Weather', 'Book']

for position, genre in enumerate(genres_to_inspect):
    if position > 0:
        print('\n')  # blank lines between genre blocks
    print(genre)
    for app in ios_final:
        if app[-5] == genre:
            # app name : total number of user ratings
            print(app[1], ':', app[5])

Navigation
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Reference
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Social Networking
Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 23965
SimSimi : 23530
Grindr - Gay and same sex guys chat, meet and date : 23201
Wishbone - Compare Anything : 20649
imo video calls and chat : 18841
After School - Funny Anonymous School News : 18482
Quick Reposter - Repost, Regram and Reshare Photos : 17694
Weibo HD : 16772
Repost for Instagram : 15185
Live.me – Live Video Chat & Make Friends Nearby : 14724
Nextdoor : 14402
Followers Analytics for Instagram - InstaReport : 13914
YouNow: Live Stream Video Chat : 12079
FollowMeter for Instagram - Followers Tracking : 11976
LINE : 11437
eHarmony™ Dating App - Meet Singles : 11124
Discord - Chat for Gamers : 9152
QQ : 9109
Telegram Messenger : 7573
Weibo : 7265
Periscope - Live Video Streaming Around the World : 6062
Chat for Whatsapp - iPad Version : 5060
QQ HD : 5058
Followers Analysis Tool For Instagram App Free : 4253
live.ly - live video streaming : 4145
Houseparty - Group Video Chat : 3991
SOMA Messenger : 3232
Monkey : 3060
Down To Lunch : 2535
Flinch - Video Chat Staring Contest : 2134
Highrise - Your Avatar Community : 2011
LOVOO - Dating Chat : 1985
PlayStation®Messages : 1918
BOO! - Video chat camera with filters & stickers : 1805
Qzone : 1649
Chatous - Chat with new people : 1609
Kiwi - Q&A : 1538
GhostCodes - a discovery app for Snapchat : 1313
Jodel : 1193
FireChat : 1037
Google Duo - simple video calling : 1033
Fiesta by Tango - Chat & Meet New People : 885
Google Allo — smart messaging : 862
Peach — share vividly : 727
Hey! VINA - Where Women Meet New Friends : 719
Battlefield™ Companion : 689
All Devices for WhatsApp - Messenger for iPad : 682
Chat for Pokemon Go - GoChat : 500
IAmNaughty – Dating App to Meet New People Online : 463
Qzone HD : 458
Zenly - Locate your friends in realtime : 427
League of Legends Friends : 420
豆瓣 : 407
Candid - Speak Your Mind Freely : 398
知乎 : 397
Selfeo : 366
Fake-A-Location Free ™ : 354
Popcorn Buzz - Free Group Calls : 281
Fam — Group video calling for iMessage : 279
QQ International : 274
Ameba : 269
SoundCloud Pulse: for creators : 240
Tantan : 235
Cougar Dating & Life Style App for Mature Women : 213
Rawr Messenger - Dab your chat : 180
WhenToPost: Best Time to Post Photos for Instagram : 158
Inke—Broadcast an amazing life : 147
Mustknow - anonymous video Q&A : 53
CTFxCmoji : 39
Lobi : 36
Chain: Collaborate On MyVideo Story/Group Video : 35
botman - Real time video chat : 7
BestieBox : 0
MATCH ON LINE chat : 0
niconico ch : 0
LINE BLOG : 0
bit-tube - Live Stream Video Chat : 0


Music
Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Music : 7109
Nicki Minaj: The Empire : 5196
Sounds app - Music And Friends : 5126
SongFlip - Free Music Streamer : 5004
Simple Radio - Live AM & FM Radio Stations : 4787
Deezer - Listen to your Favorite Music & Playlists : 4677
Ringtones for iPhone with Ringtone Maker : 4013
Bose SoundTouch : 3687
Amazon Alexa : 3018
DatPiff : 2815
Trebel Music - Unlimited Music Downloader : 2570
Free Music Play - Mp3 Streamer & Player : 2496
Acapella from PicPlayPost : 2487
Coach Guitar - Lessons & Easy Tabs For Beginners : 2416
Musicloud - MP3 and FLAC Music Player for Cloud Platforms. : 2211
Piano - Play Keyboard Music Games with Magic Tiles : 1636
Boom: Best Equalizer & Magical Surround Sound : 1375
Music Freedom - Unlimited Free MP3 Music Streaming : 1246
AmpMe - A Portable Social Party Music Speaker : 1047
Medly - Music Maker : 933
Bose Connect : 915
Music Memos : 909
UE BOOM : 612
LiveMixtapes : 555
NOISE : 355
MP3 Music Player & Streamer for Clouds : 329
Musical Video Maker - Create Music clips lip sync : 320
Cloud Music Player - Downloader & Playlist Manager : 319
Remixlive - Remix loops with pads : 288
QQ音乐HD : 224
Blocs Wave - Make & Record Music : 158
PlayGround • Music At Your Fingertips : 150
Music and Chill : 135
The Singing Machine Mobile Karaoke App : 130
radio.de - Der Radioplayer : 64
Free Music -  Player & Streamer  for Dropbox, OneDrive & Google Drive : 46
NRJ Radio : 38
Smart Music: Streaming Videos and Radio : 17
BOSS Tuner : 13
PetitLyrics : 0


Weather
The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir : 0
wetter.com : 0
WarnWetter : 0


Book
Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0

Navigation. The average number of ratings seems to be heavily influenced by 2 giant apps, Waze and Google Maps, while all the other apps show quite low numbers. Because the distribution is heavily skewed, the average is not a good metric in this case. If we exclude these 2 dominant apps, the average becomes very low, which makes this genre less interesting for our purposes.
Reference. A similar situation is observed: the average is influenced mostly by the Bible and dictionary apps, with a big gap between them and all the other apps of this genre.
Social Networking. Even though Facebook and Pinterest clearly dominate here as well, there are many other social networks for different categories of people and interests, with rather high numbers of ratings. Recall also that entertainment apps are very popular on the App Store, and that social networks are not the most popular among them. Hence this market niche is probably not oversaturated, while the demand is still very high (since the need for communication is always present). Also, people usually spend a lot of time in apps of this genre. They use them for leisure, when they are relaxed and open to new information, so there is more chance for an in-app ad to attract their attention.
Music. As with the first 2 genres, the distribution is skewed by several very popular apps. The rest of the apps mostly have quite low numbers of ratings.
Weather. This genre doesn't seem particularly interesting for our purposes, since people generally don't spend much time in-app, so the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require connecting the app to non-free APIs.
Book. Here we have practically only 5 apps, not counting a few others with extremely low numbers of ratings. This niche is obviously undersaturated, which doesn't necessarily mean low demand. By finding some granular directions in this sphere (e.g. personal growth, psychology, business) and developing high-quality content, we can probably attract a not very large but genuinely interested audience, thereby increasing the chances of the app becoming profitable.

Thus, the most promising iOS app profiles seem to be Social Networking and Book.
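
The point made above about skewed genres, where the average says little about a typical app, can be checked with the median. The sketch below uses the rating counts of the Book genre as printed in the listing above and relies on the standard `statistics` module; `book_ratings` is an illustrative name that does not appear in the notebook.

```python
from statistics import mean, median

# Rating counts of the Book genre, copied from the listing above
book_ratings = [252076, 105274, 84062, 65450, 47829,
                879, 451, 392, 197, 9, 0, 0, 0, 0]

book_mean = mean(book_ratings)
book_median = median(book_ratings)

print('Mean:  ', book_mean)
print('Median:', book_median)
```

The mean comes out near 39,758 ratings while the median is only 421.5, which suggests that a handful of apps carries the whole genre.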

Our next step is to provide an app profile recommendation for the Google Play market. We have data about the number of installs, so we should be able to get a clearer picture of genre popularity. However, the install numbers are not precise: most values are open-ended (100+, 1,000+, 5,000+, etc.). We will use these data anyway after some cleaning: we take each number at face value (100+ is treated as 100), remove the commas and the plus characters, and convert the result to float.
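
As a minimal sketch of the cleaning just described, the helper below strips the commas and the plus sign and converts what is left to float. The name `parse_installs` is hypothetical and does not appear in the notebook; the example strings follow the open-ended format mentioned above.

```python
def parse_installs(installs):
    """Convert an open-ended install count such as '1,000+' to a float.

    The plus sign is simply dropped, so the result is a lower bound on
    the real number of installs rather than an exact figure.
    """
    cleaned = installs.replace(',', '').replace('+', '')
    return float(cleaned)


for raw in ['100+', '1,000+', '5,000+']:
    print(raw, '->', parse_installs(raw))
```

Since every value is a lower bound, averages computed from these numbers are better read as a rough ranking tool than as precise install counts.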

In [34]:

# Calculating the average number of installs per app genre on Google Play
categories = freq_table(android_final, 1)
for category in categories:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            number_installs = app[5]
            number_installs = number_installs.replace('+', '')
            number_installs = number_installs.replace(',', '')
            total += float(number_installs)
            len_category += 1
    average_number_installs = total / len_category
    print(category, ':', average_number_installs)   

WEATHER : 5074486.197183099
PHOTOGRAPHY : 17840110.40229885
ENTERTAINMENT : 11640705.88235294
VIDEO_PLAYERS : 24727872.452830188
SHOPPING : 7036877.311557789
PARENTING : 542603.6206896552
MEDICAL : 120550.61980830671
PERSONALIZATION : 5201482.6122448975
EVENTS : 253542.22222222222
GAME : 15588015.603248259
BOOKS_AND_REFERENCE : 8767811.894736841
TRAVEL_AND_LOCAL : 13984077.710144928
FINANCE : 1387692.475609756
BEAUTY : 513151.88679245283
FOOD_AND_DRINK : 1924897.7363636363
COMICS : 817657.2727272727
NEWS_AND_MAGAZINES : 9549178.467741935
SOCIAL : 23253652.127118643
TOOLS : 10801391.298666667
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
HEALTH_AND_FITNESS : 4188821.9853479853
BUSINESS : 1712290.1474201474
LIBRARIES_AND_DEMO : 638503.734939759
ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
PRODUCTIVITY : 16787331.344927534
DATING : 854028.8303030303
HOUSE_AND_HOME : 1331540.5616438356
LIFESTYLE : 1437816.2687861272
COMMUNICATION : 38456119.167247385
SPORTS : 3638640.1428571427
EDUCATION : 1833495.145631068

We see that the most popular app genres (based on the average number of installs) are the following:

COMMUNICATION : 38456119
VIDEO_PLAYERS : 24727872
SOCIAL : 23253652
PHOTOGRAPHY : 17840110
PRODUCTIVITY : 16787331
GAME : 15588015
TRAVEL_AND_LOCAL : 13984077
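
The shortlist above can be obtained by sorting the averages instead of picking them out by eye. A minimal sketch, assuming the averages are stored in a dictionary (here called `average_installs`, a hypothetical name) whose values are copied from the printed output and truncated to whole numbers:

```python
# Average installs per category, copied from the output above (truncated)
average_installs = {
    'WEATHER': 5074486, 'PHOTOGRAPHY': 17840110, 'ENTERTAINMENT': 11640705,
    'VIDEO_PLAYERS': 24727872, 'SHOPPING': 7036877, 'PARENTING': 542603,
    'MEDICAL': 120550, 'PERSONALIZATION': 5201482, 'EVENTS': 253542,
    'GAME': 15588015, 'BOOKS_AND_REFERENCE': 8767811,
    'TRAVEL_AND_LOCAL': 13984077, 'FINANCE': 1387692, 'BEAUTY': 513151,
    'FOOD_AND_DRINK': 1924897, 'COMICS': 817657,
    'NEWS_AND_MAGAZINES': 9549178, 'SOCIAL': 23253652, 'TOOLS': 10801391,
    'MAPS_AND_NAVIGATION': 4056941, 'FAMILY': 3695641,
    'HEALTH_AND_FITNESS': 4188821, 'BUSINESS': 1712290,
    'LIBRARIES_AND_DEMO': 638503, 'ART_AND_DESIGN': 1986335,
    'AUTO_AND_VEHICLES': 647317, 'PRODUCTIVITY': 16787331, 'DATING': 854028,
    'HOUSE_AND_HOME': 1331540, 'LIFESTYLE': 1437816,
    'COMMUNICATION': 38456119, 'SPORTS': 3638640, 'EDUCATION': 1833495,
}

# Sort the (category, average) pairs from the largest average to the smallest
ranked = sorted(average_installs.items(), key=lambda item: item[1], reverse=True)

# Print the seven categories with the highest averages
for category, average in ranked[:7]:
    print(category, ':', average)

# Position of every category in the ranking, counting from 1
rank_of = {category: position
           for position, (category, _) in enumerate(ranked, start=1)}
```

The `rank_of` dictionary makes it easy to look up where any single category stands among the 33.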

Let's investigate the contents of these genres in more detail. First, it seems that these seemingly popular genres are dominated by some giant apps with 100 million installs or more. Such values certainly result in heavily biased averages.

In [35]:

# Listing the COMMUNICATION apps with 100 million installs or more
for app in android_final:
    if app[1] == 'COMMUNICATION':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
BBM - Free Calls & Messages : 100,000,000+

If we exclude these numerous giant apps of the COMMUNICATION genre from consideration, the average drops roughly 10 times:

In [36]:

# Install counts of the COMMUNICATION apps below the 100-million threshold
under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))

# Average recomputed without the giant apps
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('COMMUNICATION')
print('Before:  38456119')
print('After:  ', average_number_installs)

COMMUNICATION
Before:  38456119
After:   3603485.3884615386
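
The two cells above (listing the giants, then re-averaging without them) follow one pattern that can be folded into a single helper. The sketch below is one possible way to do it, not the notebook's own code: `split_by_threshold` is a hypothetical name, and the `toy` rows are invented for illustration, filling in only the columns the notebook reads (name at index 0, category at index 1, installs at index 5).

```python
def split_by_threshold(dataset, category, threshold=100000000):
    """Return the names of the giant apps and the average of the rest.

    An app counts as a giant when its cleaned install number is at or
    above the threshold. The average is None if no other apps remain.
    """
    giants = []
    others = []
    for app in dataset:
        if app[1] != category:
            continue
        n_installs = float(app[5].replace(',', '').replace('+', ''))
        if n_installs >= threshold:
            giants.append(app[0])
        else:
            others.append(n_installs)
    average = sum(others) / len(others) if others else None
    return giants, average


# Toy rows (hypothetical, not taken from the data set)
toy = [
    ['Giant Chat', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small Chat', 'COMMUNICATION', '', '', '', '1,000+'],
    ['Tiny Chat', 'COMMUNICATION', '', '', '', '5,000+'],
    ['Some Game', 'GAME', '', '', '', '10,000+'],
]

giants, average = split_by_threshold(toy, 'COMMUNICATION')
print('Giant apps:', giants)
print('Average without giants:', average)
```

Called with `android_final` and a category name, the same function would cover each of the genres examined here without repeating the loop.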

The same tendency can be observed for all the other genres that look the most popular:

In [37]:

for app in android_final:
    if app[1] == 'VIDEO_PLAYERS':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])

YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Motorola FM Radio : 100,000,000+

In [38]:

under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'VIDEO_PLAYERS') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('VIDEO_PLAYERS')
print('Before:  24727872')
print('After:  ', average_number_installs)   

VIDEO_PLAYERS
Before:  24727872
After:   5544878.133333334

In [39]:

for app in android_final:
    if app[1] == 'SOCIAL':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+

In [40]:

under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'SOCIAL') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('SOCIAL')
print('Before:  23253652')
print('After:  ', average_number_installs)  

SOCIAL
Before:  23253652
After:   3084582.5201793723

In [41]:

for app in android_final:
    if app[1] == 'PHOTOGRAPHY':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])

B612 - Beauty & Filter Camera : 100,000,000+
YouCam Makeup - Magic Selfie Makeovers : 100,000,000+
Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+
Google Photos : 1,000,000,000+
Retrica : 100,000,000+
Photo Editor Pro : 100,000,000+
BeautyPlus - Easy Photo Editor & Selfie Camera : 100,000,000+
PicsArt Photo Studio: Collage Maker & Pic Editor : 100,000,000+
Photo Collage Editor : 100,000,000+
Z Camera - Photo Editor, Beauty Selfie, Collage : 100,000,000+
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100,000,000+
Candy Camera - selfie, beauty camera, photo editor : 100,000,000+
YouCam Perfect - Selfie Photo Editor : 100,000,000+
Camera360: Selfie Photo Editor with Funny Sticker : 100,000,000+
S Photo Editor - Collage Maker , Photo Collage : 100,000,000+
AR effect : 100,000,000+
Cymera Camera- Photo Editor, Filter,Collage,Layout : 100,000,000+
LINE Camera - Photo editor : 100,000,000+
Photo Editor Collage Maker Pro : 100,000,000+

In [42]:

under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'PHOTOGRAPHY') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('PHOTOGRAPHY')
print('Before:  17840110')
print('After:  ', average_number_installs)  

PHOTOGRAPHY
Before:  17840110
After:   7670532.29338843

In [43]:

for app in android_final:
    if app[1] == 'PRODUCTIVITY':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])

Microsoft Word : 500,000,000+
Microsoft Outlook : 100,000,000+
Microsoft OneDrive : 100,000,000+
Microsoft OneNote : 100,000,000+
Google Keep : 100,000,000+
ES File Explorer File Manager : 100,000,000+
Dropbox : 500,000,000+
Google Docs : 100,000,000+
Microsoft PowerPoint : 100,000,000+
Samsung Notes : 100,000,000+
SwiftKey Keyboard : 100,000,000+
Google Drive : 1,000,000,000+
Adobe Acrobat Reader : 100,000,000+
Google Sheets : 100,000,000+
Microsoft Excel : 100,000,000+
WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100,000,000+
Google Slides : 100,000,000+
ColorNote Notepad Notes : 100,000,000+
Evernote – Organizer, Planner for Notes & Memos : 100,000,000+
Google Calendar : 500,000,000+
Cloud Print : 500,000,000+
CamScanner - Phone PDF Creator : 100,000,000+

In [44]:

under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'PRODUCTIVITY') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('PRODUCTIVITY')
print('Before:  16787331')
print('After:  ', average_number_installs)  

PRODUCTIVITY
Before:  16787331
After:   3379657.318885449

In [45]:

for app in android_final:
    if app[1] == 'GAME':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])

Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Piano Tiles 2™ : 100,000,000+
Pokémon GO : 100,000,000+
Extreme Car Driving Simulator : 100,000,000+
Trivia Crack : 100,000,000+
Angry Birds 2 : 100,000,000+
Candy Crush Saga : 500,000,000+
8 Ball Pool : 100,000,000+
Subway Surfers : 1,000,000,000+
Candy Crush Soda Saga : 100,000,000+
Clash Royale : 100,000,000+
Clash of Clans : 100,000,000+
Plants vs. Zombies FREE : 100,000,000+
Pou : 500,000,000+
Flow Free : 100,000,000+
My Talking Angela : 100,000,000+
slither.io : 100,000,000+
Cooking Fever : 100,000,000+
Yes day : 100,000,000+
Score! Hero : 100,000,000+
Dream League Soccer 2018 : 100,000,000+
My Talking Tom : 500,000,000+
Sniper 3D Gun Shooter: Free Shooting Games - FPS : 100,000,000+
Zombie Tsunami : 100,000,000+
Helix Jump : 100,000,000+
Crossy Road : 100,000,000+
Temple Run 2 : 500,000,000+
Talking Tom Gold Run : 100,000,000+
Agar.io : 100,000,000+
Bus Rush: Subway Edition : 100,000,000+
Traffic Racer : 100,000,000+
Hill Climb Racing : 100,000,000+
Angry Birds Rio : 100,000,000+
Cut the Rope FULL FREE : 100,000,000+
Hungry Shark Evolution : 100,000,000+
Angry Birds Classic : 100,000,000+
Hill Climb Racing 2 : 100,000,000+
Jetpack Joyride : 100,000,000+
Super Mario Run : 100,000,000+
Glow Hockey : 100,000,000+
Asphalt 8: Airborne : 100,000,000+
Lep's World 2 🍀🍀 : 100,000,000+
Fruit Ninja® : 100,000,000+
Vector : 100,000,000+
Dr. Driving : 100,000,000+
Bike Race Free - Top Motorcycle Racing Games : 100,000,000+
Smash Hit : 100,000,000+
Temple Run : 100,000,000+
Geometry Dash Lite : 100,000,000+
Ant Smasher by Best Cool & Fun Games : 100,000,000+
Angry Birds Star Wars : 100,000,000+
Mobile Legends: Bang Bang : 100,000,000+
Banana Kong : 100,000,000+
Skater Boy : 100,000,000+
Shadow Fight 2 : 100,000,000+
Modern Combat 5: eSports FPS : 100,000,000+
Garena Free Fire : 100,000,000+

In [46]:

under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'GAME') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('GAME')
print('Before:  15588015')
print('After:  ', average_number_installs)  

GAME
Before:  15588015
After:   6272564.694894147

In [47]:

for app in android_final:
    if app[1] == 'TRAVEL_AND_LOCAL':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])

Booking.com Travel Deals : 100,000,000+
TripAdvisor Hotels Flights Restaurants Attractions : 100,000,000+
Maps - Navigate & Explore : 1,000,000,000+
Google Street View : 1,000,000,000+
Google Earth : 100,000,000+

In [48]:

under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'TRAVEL_AND_LOCAL') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('TRAVEL_AND_LOCAL')
print('Before:  13984077')
print('After:  ', average_number_installs) 

TRAVEL_AND_LOCAL
Before:  13984077
After:   2944079.6336633665
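
To compare how strongly the giant apps inflate each average, the "Before" and "After" values printed above can be put side by side. The sketch below copies those figures (the "After" values rounded to whole numbers) and computes the ratio between them; the variable names are illustrative only.

```python
# (before, after) average installs, copied from the outputs above;
# "after" excludes apps with 100,000,000+ installs and is rounded
averages = {
    'COMMUNICATION': (38456119, 3603485),
    'VIDEO_PLAYERS': (24727872, 5544878),
    'SOCIAL': (23253652, 3084583),
    'PHOTOGRAPHY': (17840110, 7670532),
    'PRODUCTIVITY': (16787331, 3379657),
    'GAME': (15588015, 6272565),
    'TRAVEL_AND_LOCAL': (13984077, 2944080),
}

# How many times the average shrinks once the giants are removed
ratios = {genre: before / after for genre, (before, after) in averages.items()}

# Print the genres from the least to the most inflated average
for genre, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f'{genre}: average drops {ratio:.1f} times')
```

The ratio is smallest for PHOTOGRAPHY and GAME (between 2 and 3) and largest for COMMUNICATION (above 10), so the first two genres keep comparatively high averages without their giants.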

This investigation reveals some insights for each of the most popular genres.

COMMUNICATION is oversaturated with numerous giant apps, with a big gap between them and all the others. This implies huge competition in this sphere, with very limited chances for newcomers. Apart from that, this category has no clear counterpart among the genres of the App Store, and our aim is to recommend an app profile that shows potential for being profitable on both the App Store and Google Play.
VIDEO_PLAYERS, again, is oversaturated with some giant apps (even though fewer than in the COMMUNICATION genre), with a big gap between them and all the others. Also, this category has no clear counterpart among the genres of the App Store; it appears only as a part of the Photo & Video genre, so it's impossible to compare the two directly. In addition, the Photo & Video genre was not one of the most popular on the App Store.
SOCIAL. This category is also dominated by a few famous apps like Facebook and Pinterest. However, some of these apps are quite specialized (LinkedIn is used for work, Badoo for dating, etc.), so their functions differ and the apps are not interchangeable. Coming up with an interesting thematic social network could be a good idea for our apps, also because the average number of installs is still quite high even after excluding the giant apps. On the App Store we also found the corresponding genre, Social Networking, to have high potential. Hence this sphere can become profitable on both markets, which is our ultimate goal.
PHOTOGRAPHY shows a high average number of installs even after removing the giant apps. Despite this, and despite being one of the most common genres on both markets, it was not identified as one of the most popular genres on the App Store. In addition, this category is represented mostly by photo editors, presumably used mainly for work by professional photographers. This work requires high concentration, so in-app ads will probably not find a suitable target audience here.
PRODUCTIVITY is heavily dominated by famous apps. In addition, it does not turn out to be very popular on the App Store, despite being among the most common genres there.
GAME. This category is the most common on the App Store and the second most common on Google Play. On Google Play it also shows a high number of installs after removing the numerous giant apps. On the App Store, however, it is not particularly popular. Still, given the potential of this genre, we can consider creating a game-based social network (since we have already identified social networking as possibly profitable). It could be, for instance, an online quiz, a quest, or some other online game that requires several teams (or several participants) to be online at the same time. In this case, to capture attention effectively, the in-app ads should appear between games, while people are waiting for the results and can take in new information without being distracted from the game.
TRAVEL_AND_LOCAL has very few dominant apps (which nevertheless strongly influence the average number of installs); they are mostly about maps and hotels. According to our investigation of the most common genres, this category is in the middle in both data sets (the App Store and Google Play). Among the iOS apps it is not particularly popular, so it doesn't fit our purposes. However, we can use this idea for our apps by creating a social network related to travelling. For example, it could be an app for finding co-travellers or a group to travel with, discussing itineraries, what to visit in a certain place, etc.

When investigating the app genres of the App Store, we also identified the Book profile as having potential. On Google Play, the corresponding category (BOOKS_AND_REFERENCE) is not one of the most popular: it ranks 11th among the 33 categories. It could also be difficult to derive ideas for a social networking app from it. Hence, for creating apps profitable on both markets, books don't seem to be the best choice.

Conclusions

All in all, after a thorough analysis of the most common and the most popular app genres in both data sets, we suggest the SOCIAL NETWORKING profile as the most interesting for our purposes, i.e. creating profitable free English apps with revenue based on in-app ads for both the App Store and Google Play. To stand out among the existing apps of this kind and to overcome the competition, the right theme has to be selected. Possible ideas include an online quiz, a quest, or some other online game involving many people or teams, or a social networking app dedicated to finding co-travellers and discussing itineraries and places to visit.