Profitable App Profiles for App Store and Google Play Markets

In this project we analyze profiles of free apps on the App Store and Google Play. The main source of revenue from these apps is in-app advertisements. We use the number of reviews as a proxy for the number of users who download an app and interact with its advertisements.

The goal of the project is to determine the features of an app that attract a large number of users who view and engage with advertisements in the app.

We are using two data sets:

  • A data set containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018.
  • A data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017.
In [0]:
#open csv file for each dataset, read using reader function imported from csv module, store each in variable as list of lists
from csv import reader

opened_file_google = open('/content/drive/My Drive/Datasets/googleplaystore.csv')
opened_file_apple = open('/content/drive/My Drive/Datasets/AppleStore.csv')
#separate the header row from the data rows of each dataset
google_data = list(reader(opened_file_google))
google_data_header = google_data[0]
google_data = google_data[1:]
apple_data = list(reader(opened_file_apple))
apple_data_header = apple_data[0]
apple_data = apple_data[1:]
#close the files once the data has been read into memory
opened_file_google.close()
opened_file_apple.close()
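
The loading step above can also be written with a context manager, so that each file is closed automatically. The following is a minimal sketch, not the notebook's own code: `load_csv` is a hypothetical helper name, and `encoding='utf8'` is an assumption about how the CSV files are encoded.

```python
from csv import reader

def load_csv(path):
    """Return (header, rows) for the CSV file at `path`.

    Assumes the file has a header row and is UTF-8 encoded (an assumption,
    not something stated in the dataset documentation).
    """
    # The with-block closes the file even if reading raises an error.
    with open(path, encoding='utf8', newline='') as csv_file:
        rows = list(reader(csv_file))
    return rows[0], rows[1:]

# Possible usage with the paths from the cell above:
# google_data_header, google_data = load_csv('/content/drive/My Drive/Datasets/googleplaystore.csv')
# apple_data_header, apple_data = load_csv('/content/drive/My Drive/Datasets/AppleStore.csv')
```

Returning the header separately mirrors the way the notebook keeps `google_data_header` apart from `google_data`.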
In [0]:
def explore_data(dataset, start, end, rows_and_columns=False):
  '''Prints the rows of dataset (a list of lists) from index start up to, but not including, index end.
  If rows_and_columns is True, also prints the number of rows and the number of columns in dataset.'''
  dataset_slice = dataset[start:end]    
  for row in dataset_slice:
    print(row)
    print('\n') # adds a new (empty) line after each row

  if rows_and_columns:
    print('Number of rows:', len(dataset))
    print('Number of columns:', len(dataset[0]))
In [3]:
print(google_data_header)
print('\n')
explore_data(google_data, 0, 3, True)
print('\n'*3)
print(apple_data_header)
print('\n')
explore_data(apple_data, 0, 3, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13




['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16

Column headings for googleplaystore.csv data:

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

Column headings for AppleStore.csv data:

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

In [4]:
#Delete the row at index 10472, which is missing its 'Category' value (error identified in the discussion forum of the dataset documentation)
print(len(google_data))
print(google_data[10472])
del google_data[10472]
print(len(google_data))
10841
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10840

Removing duplicate entries

There are duplicate entries for apps in the Google Play dataset. For example, the app named 'Coloring book moana' has two separate entries in the dataset, each with a different value in the 'Reviews' column:

In [5]:
name = 'Coloring book moana'

for index, app in enumerate(google_data):
  if app[0] == name:
    print(app)
    print(index)
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
1
['Coloring book moana', 'FAMILY', '3.9', '974', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
2033
In [6]:
duplicate_entries = []
unique_entries = []

for app in google_data:
  name = app[0]
  if name in unique_entries:
    duplicate_entries.append(name)
  else:
    unique_entries.append(name)

print('Number of duplicate apps: ', len(duplicate_entries))    
print('Examples of duplicate apps: ', duplicate_entries[:10])
Number of duplicate apps:  1181
Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']

There are 1181 duplicate entries in the dataset. We will remove duplicates by keeping only the entry with the highest number of reviews and removing the other entries with the same name: the entry with the highest number of reviews is likely the most recent, so it should provide the most accurate rating.

To remove the duplicates, we will:

  1. Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
  2. Use the information stored in the dictionary and create a new data set, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).
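
The two steps above can also be combined into a single pass by storing the whole row, rather than only the review count, as the dictionary value. This is a minimal sketch under stated assumptions: `keep_most_reviewed` is a hypothetical helper, the sample rows are shortened to four columns, and the 'Example App' row is made up for illustration.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Return one row per app name: the row with the most reviews."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_idx]
        # Keep the first row seen for a name; replace it only when a later
        # row has strictly more reviews, so ties keep the earlier row.
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

sample = [
    ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967'],
    ['Example App', 'TOOLS', '4.0', '12'],  # made-up row for illustration
    ['Coloring book moana', 'FAMILY', '3.9', '974'],
]
deduped = keep_most_reviewed(sample)
print(len(deduped))
print(deduped[0])
```

Unlike the two-step approach, the result is ordered by the first appearance of each app name rather than by the position of the retained row, which does not matter for the frequency analysis that follows.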
In [7]:
#initialize empty dictionary reviews_max
#loop over apps in google_data: if the app name is already in reviews_max, update its value when this entry has more reviews. Else (app name is not yet in reviews_max) add the key, value pair to reviews_max
reviews_max = {}
for app in google_data:
  name = app[0]
  n_reviews = float(app[3])
  if name in reviews_max:
    if n_reviews > reviews_max[name]:
      reviews_max[name] = n_reviews
  else:
    reviews_max[name] = n_reviews

#print lengths of container variables to check loop has worked correctly
print('Length of google_data minus length of duplicate entries: ', len(google_data) - len(duplicate_entries))
print('Length of unique_entries: ', len(unique_entries))
print('Length of reviews_max: ', len(reviews_max))

                    
Length of google_data minus length of duplicate entries:  9659
Length of unique_entries:  9659
Length of reviews_max:  9659

As expected, the dataset minus the number of duplicate entries, the unique_entries list and the reviews_max dictionary all have the same length.

In [8]:
#create two empty lists: one to store the cleaned dataset and one to store the names of apps already added to the cleaned dataset
#loop through apps in original dataset and store name and number of reviews
#if number of reviews is equal to the max number of reviews for apps of the same name AND the name of the app is not in the list of names of apps already added, then append app to cleaned dataset
#note: some rows in original dataset have duplicate entries with the same number of reviews, hence 'name not in already_added' is required to prevent duplicates of these rows in cleaned data

google_cleaned = []
already_added = []

for app in google_data:
  name = app[0]
  n_reviews = float(app[3])
  if n_reviews == reviews_max[name] and name not in already_added:
    google_cleaned.append(app)
    already_added.append(name)

#explore the cleaned dataset
explore_data(google_cleaned, 0, 3, True)   
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13

Removing Non-English Apps

Our datasets contain apps designed for non-English users.

The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system.
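
As a quick illustration of this range, the built-in `ord()` function returns the number for a single character. The sample characters below are chosen for illustration only; the third column shows whether each one falls inside the ASCII range.

```python
samples = ['A', 'z', '~', '™', '爱', '😜']

for character in samples:
    code_point = ord(character)
    # ASCII covers the code points 0 to 127 inclusive.
    print(repr(character), code_point, code_point <= 127)
```

Letters, digits and common punctuation fall inside the range, while symbols such as ™, Chinese characters and emojis do not.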

In the code box below is a function which takes a string as an argument and determines if the language of the string is English:

In [0]:
def is_english(s):
  '''is_english returns True if the string only contains characters 
  with an output from ord() function in the range 0 to 127 and False if the 
  string contains one or more characters outside that range'''
  for character in s:
    if ord(character) > 127:
      return False
  return True
In [10]:
#test is_english function on some strings
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
True
False
False
False

Notice the function couldn't correctly identify certain English app names like 'Docs To Go™ Free Office Suite' and 'Instachat 😜'. This is because emojis and characters like ™ fall outside the ASCII range and have corresponding numbers over 127. This means that the function will incorrectly identify many apps as non-English and we will lose useful data.

To minimize data loss, we can modify the is_english function to return False only if the string contains more than 3 characters outside the ASCII range (0 to 127):

In [0]:
def is_english(s):
  '''is_english returns True if the string contains 3 or fewer characters 
  with an output from ord() function outside the range 0 to 127 and False if the 
  string contains 4 or more characters outside that range'''
  count = 0
  for character in s:
    if ord(character) > 127:
      count += 1
      if count == 4:
        return False
  return True
In [12]:
#test modified is_english function on same strings
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
True
False
True
True

The function is still not perfect, and a few non-English apps might get past our filter, but this seems good enough at this point in our analysis. We shouldn't spend too much time on optimization yet.
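
If the cutoff of 3 ever needs tuning, the same rule can be expressed with the threshold as a parameter. This is a sketch of an equivalent formulation, not a replacement for the function above: `is_mostly_ascii` and `max_non_ascii` are hypothetical names.

```python
def is_mostly_ascii(s, max_non_ascii=3):
    """Return True if s has at most max_non_ascii characters outside 0 to 127."""
    # Count every character whose code point lies outside the ASCII range.
    non_ascii = sum(1 for character in s if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_mostly_ascii('Instagram'))
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_mostly_ascii('Docs To Go™ Free Office Suite'))
print(is_mostly_ascii('Instachat 😜'))
print(is_mostly_ascii('Instachat 😜', max_non_ascii=0))
```

With the default threshold this gives the same results as the modified is_english on the four test strings. It always scans the whole string instead of stopping at the fourth non-ASCII character, which is a negligible cost for short app names.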

Now, use the new function to filter out non-English apps from both data sets. Loop through each data set. If an app name is identified as English, append the whole row to a new list:

In [13]:
english_google_cleaned = []
english_apple_data = []

for app in google_cleaned:
  name = app[0]
  if is_english(name):
    english_google_cleaned.append(app)

for app in apple_data:
  name = app[1] #name is in second column (index=1) of apple_data dataset
  if is_english(name):
    english_apple_data.append(app)   

explore_data(english_google_cleaned, 0, 3, True)
print('\n')
explore_data(english_apple_data, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16

Isolating Free Apps

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis. We will do so with the following process:

  1. Loop through each data set to isolate the free apps in separate lists.

  2. After isolating the free apps, check the length of each data set to see how many apps are remaining.
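
Comparing price strings directly works only if every free app uses exactly the same text for a zero price. A slightly more defensive check converts the text to a number first. The sketch below is an illustration under assumptions: `is_free` is a hypothetical helper, and the '$4.99' format for paid apps is assumed rather than taken from the rows shown so far.

```python
def is_free(price_text):
    """Return True if price_text represents a price of zero."""
    # Drop surrounding whitespace and a leading dollar sign, if present.
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Text that is not a number (for example an empty string)
        # is treated as not free.
        return False

print(is_free('0'))      # price format seen in the Google Play rows
print(is_free('0.0'))    # price format seen in the App Store rows
print(is_free('$4.99'))  # assumed format for a paid app
```

Because '0' and '0.0' both convert to zero, the same helper could be applied to the price column of either data set.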

In [14]:
free_english_google_cleaned = []
free_english_apple_data = []

for app in english_google_cleaned:
  if app[6] == 'Free' or app[7] == '0':
    free_english_google_cleaned.append(app)

for app in english_apple_data:
  if app[4] == '0.0':
    free_english_apple_data.append(app)

explore_data(free_english_google_cleaned, 0, 3, True)
print('\n')
explore_data(free_english_apple_data, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16

We're left with 8864 Google Play apps and 3222 Apple Store apps, which should be enough for our analysis.

Most Common App Genres

We seek to identify the features of an app profile that is successful in both the Google Play and Apple Store markets. We aim to determine the kinds of apps that are likely to be successful in both markets since our revenue from an app is mainly a function of the number of users of the app.

The validation process (to minimize risks and overheads) for a new app is as follows:

  1. Build a minimal Android version of the app, and add it to Google Play store.
  2. If the app has a good response from users, we develop it further.
  3. If the app is profitable after six months, we build an iOS version of the app and add it to the Apple Store.

We will begin by inspecting both datasets and determining the columns we could use to generate frequency tables to find the most common app genres in each market.

In [15]:
print(google_data_header)
print('\n')
print(apple_data_header)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
  • For the Google Play dataset we will use the columns 'Category' and 'Genres'.
  • For the Apple Store dataset we will use the column 'prime_genre'.

We'll build two functions we can use to analyze the frequency tables:

  • One function to generate frequency tables that show percentages
  • Another function we can use to display the percentages in a descending order
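
As an aside, both steps can be sketched in one function with `collections.Counter` from the standard library, whose `most_common()` method returns the counts in descending order. This is an alternative illustration, not the implementation used in this notebook: `freq_table_pct` is a hypothetical name and the sample rows are made up.

```python
from collections import Counter

def freq_table_pct(rows, index):
    """Return (value, percentage) pairs for one column, most frequent first."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    # most_common() yields (value, count) pairs sorted by descending count.
    return [(value, count / total * 100) for value, count in counts.most_common()]

sample_rows = [['Games'], ['Games'], ['Music'], ['Games']]  # made-up rows
genre_percentages = freq_table_pct(sample_rows, 0)
for value, percentage in genre_percentages:
    print(value, ':', percentage)
```

One difference from sorting (percentage, value) tuples is tie handling: `most_common()` keeps tied values in the order they were first seen, whereas sorting the tuples in reverse breaks ties by reverse alphabetical order of the value.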
In [16]:
def freq_table(dataset, index):
  '''dataset is expected to be a list of lists and index is expected to be an integer
  freq_table returns the relative frequency table (as a dictionary) for any column we want.
  '''
  table = {}
  total = len(dataset)

  for app in dataset:
    key = app[index]
    if key in table:
      table[key] += 1
    else:
      table[key] = 1
  
  for key in table:
    table[key] = (table[key] / total) * 100
  
  return table

print(freq_table(free_english_google_cleaned, 1))
{'ART_AND_DESIGN': 0.6430505415162455, 'AUTO_AND_VEHICLES': 0.9250902527075812, 'BEAUTY': 0.5979241877256317, 'BOOKS_AND_REFERENCE': 2.1435018050541514, 'BUSINESS': 4.591606498194946, 'COMICS': 0.6204873646209386, 'COMMUNICATION': 3.2378158844765346, 'DATING': 1.861462093862816, 'EDUCATION': 1.1620036101083033, 'ENTERTAINMENT': 0.9589350180505415, 'EVENTS': 0.7107400722021661, 'FINANCE': 3.7003610108303246, 'FOOD_AND_DRINK': 1.2409747292418771, 'HEALTH_AND_FITNESS': 3.0798736462093865, 'HOUSE_AND_HOME': 0.8235559566787004, 'LIBRARIES_AND_DEMO': 0.9363718411552346, 'LIFESTYLE': 3.9034296028880866, 'GAME': 9.724729241877256, 'FAMILY': 18.907942238267147, 'MEDICAL': 3.531137184115524, 'SOCIAL': 2.6624548736462095, 'SHOPPING': 2.2450361010830324, 'PHOTOGRAPHY': 2.944494584837545, 'SPORTS': 3.395758122743682, 'TRAVEL_AND_LOCAL': 2.33528880866426, 'TOOLS': 8.461191335740072, 'PERSONALIZATION': 3.3167870036101084, 'PRODUCTIVITY': 3.892148014440433, 'PARENTING': 0.6543321299638989, 'WEATHER': 0.8009927797833934, 'VIDEO_PLAYERS': 1.7937725631768955, 'NEWS_AND_MAGAZINES': 2.7978339350180503, 'MAPS_AND_NAVIGATION': 1.3989169675090252}
In [0]:
def display_table(dataset, index):
  '''Takes in two parameters: dataset and index. dataset is expected to be a list of lists, and index is expected to be an integer.
  Generates a frequency table using the freq_table() function.
  Transforms the frequency table into a list of tuples (value, key), then sorts the list in a descending order using sorted() function.
  Prints the entries of the frequency table.
  Does not return anything.
  '''
  table = freq_table(dataset, index)
  table_display = []
  for key in table:
      key_val_as_tuple = (table[key], key)
      table_display.append(key_val_as_tuple)

  table_sorted = sorted(table_display, reverse = True)
  for entry in table_sorted:
      print(entry[1], ':', entry[0])
In [18]:
#use the display_table function to display the frequency table of the prime_genre column from free_english_apple_data

display_table(free_english_apple_data, 11)
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665

For the prime_genre column of the App Store data set:

By far the most common genre is 'Games', with about 58% relative frequency. The second most common genre is 'Entertainment', with around 8%. Together, approximately 66% of the free English apps from the App Store in this dataset have a prime genre of 'Games' or 'Entertainment'. 'Photo & Video' and 'Education' together comprise approximately 9%. Most apps are therefore designed for entertainment rather than for practical purposes. If we assume that designers of iOS apps are responding to market demand, and that demand for a genre can be inferred from the number of apps offered, then an app profile oriented towards entertainment might attract a large number of users. However, the number of apps in a genre does not tell us the number of users: it could be that a small number of practical apps have a large number of users, for example because one or two very effective and popular apps have been released.

Let's continue by examining the Category and Genres columns of the Google Play data set (two columns which seem to be related).

In [19]:
print('Categories frequency table: ')
print('\n')
display_table(free_english_google_cleaned, 1)
print('\n')
print('Genres frequency table: ')
print('\n')
display_table(free_english_google_cleaned, 9)
Categories frequency table: 


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0.6430505415162455
COMICS : 0.6204873646209386
BEAUTY : 0.5979241877256317


Genres frequency table: 


Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075812
Strategy : 0.9138086642599278
House & Home : 0.8235559566787004
Weather : 0.8009927797833934
Events : 0.7107400722021661
Adventure : 0.6768953068592057
Comics : 0.6092057761732852
Beauty : 0.5979241877256317
Art & Design : 0.5979241877256317
Parenting : 0.4963898916967509
Card : 0.45126353790613716
Casino : 0.42870036101083037
Trivia : 0.41741877256317694
Educational;Education : 0.39485559566787
Board : 0.3835740072202166
Educational : 0.3722924187725632
Education;Education : 0.33844765342960287
Word : 0.2594765342960289
Casual;Pretend Play : 0.236913357400722
Music : 0.2030685920577617
Racing;Action & Adventure : 0.16922382671480143
Puzzle;Brain Games : 0.16922382671480143
Entertainment;Music & Video : 0.16922382671480143
Casual;Brain Games : 0.13537906137184114
Casual;Action & Adventure : 0.13537906137184114
Arcade;Action & Adventure : 0.12409747292418773
Action;Action & Adventure : 0.10153429602888085
Educational;Pretend Play : 0.09025270758122744
Simulation;Action & Adventure : 0.078971119133574
Parenting;Education : 0.078971119133574
Entertainment;Brain Games : 0.078971119133574
Board;Brain Games : 0.078971119133574
Parenting;Music & Video : 0.06768953068592057
Educational;Brain Games : 0.06768953068592057
Casual;Creativity : 0.06768953068592057
Art & Design;Creativity : 0.06768953068592057
Education;Pretend Play : 0.056407942238267145
Role Playing;Pretend Play : 0.04512635379061372
Education;Creativity : 0.04512635379061372
Role Playing;Action & Adventure : 0.033844765342960284
Puzzle;Action & Adventure : 0.033844765342960284
Entertainment;Creativity : 0.033844765342960284
Entertainment;Action & Adventure : 0.033844765342960284
Educational;Creativity : 0.033844765342960284
Educational;Action & Adventure : 0.033844765342960284
Education;Music & Video : 0.033844765342960284
Education;Brain Games : 0.033844765342960284
Education;Action & Adventure : 0.033844765342960284
Adventure;Action & Adventure : 0.033844765342960284
Video Players & Editors;Music & Video : 0.02256317689530686
Sports;Action & Adventure : 0.02256317689530686
Simulation;Pretend Play : 0.02256317689530686
Puzzle;Creativity : 0.02256317689530686
Music;Music & Video : 0.02256317689530686
Entertainment;Pretend Play : 0.02256317689530686
Casual;Education : 0.02256317689530686
Board;Action & Adventure : 0.02256317689530686
Video Players & Editors;Creativity : 0.01128158844765343
Trivia;Education : 0.01128158844765343
Travel & Local;Action & Adventure : 0.01128158844765343
Tools;Education : 0.01128158844765343
Strategy;Education : 0.01128158844765343
Strategy;Creativity : 0.01128158844765343
Strategy;Action & Adventure : 0.01128158844765343
Simulation;Education : 0.01128158844765343
Role Playing;Brain Games : 0.01128158844765343
Racing;Pretend Play : 0.01128158844765343
Puzzle;Education : 0.01128158844765343
Parenting;Brain Games : 0.01128158844765343
Music & Audio;Music & Video : 0.01128158844765343
Lifestyle;Pretend Play : 0.01128158844765343
Lifestyle;Education : 0.01128158844765343
Health & Fitness;Education : 0.01128158844765343
Health & Fitness;Action & Adventure : 0.01128158844765343
Entertainment;Education : 0.01128158844765343
Communication;Creativity : 0.01128158844765343
Comics;Creativity : 0.01128158844765343
Casual;Music & Video : 0.01128158844765343
Card;Action & Adventure : 0.01128158844765343
Books & Reference;Education : 0.01128158844765343
Art & Design;Pretend Play : 0.01128158844765343
Art & Design;Action & Adventure : 0.01128158844765343
Arcade;Pretend Play : 0.01128158844765343
Adventure;Education : 0.01128158844765343
  • For the Category column of the Google Play dataset:

Around 19% of free English apps on the Google Play store are categorized as 'FAMILY', around 10% as 'GAME' and around 8% as 'TOOLS'. All other categories have a relative frequency below 5%. On first inspection, there seem to be significantly fewer entertainment apps on the Google Play store than on the App Store. However, if we examine the 'FAMILY' category on the Google Play store, we can see that this category (which accounts for almost 19% of the apps) consists mostly of games for kids. Even taking this into account, entertainment apps have a much lower relative frequency on the Google Play store than on the App Store, and the market is more balanced between entertainment and practical apps.

  • For the Genres column of the Google Play dataset:

Around 8% have the genre 'Tools', 6% 'Entertainment' and 5% 'Education'. All other genres have a relative frequency below 5%. Many of the genres with lower relative frequencies in this table are subcategories, and it is likely that the relative frequency of the top-level categories has been affected by this more granular grouping of app genres. It is difficult to compare the groupings in the Category and Genres columns, and since we are looking for a holistic overview, we will work only with the Category column moving forward.
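
To see how the granular grouping affects the counts, the composite values in the Genres column (such as 'Art & Design;Pretend Play') can be split on the semicolon and each tag counted separately. This is a minimal sketch that is not used in the rest of the analysis: `genre_tag_counts` is a hypothetical helper, and the sample rows hold only a single Genres value each.

```python
from collections import Counter

def genre_tag_counts(rows, genres_idx=9):
    """Count how many apps carry each individual genre tag."""
    counts = Counter()
    for row in rows:
        # A set ensures that a repeated tag, as in 'Education;Education',
        # is counted only once per app.
        tags = {tag.strip() for tag in row[genres_idx].split(';')}
        counts.update(tags)
    return counts

sample_rows = [
    ['Art & Design'],
    ['Art & Design;Pretend Play'],
    ['Education;Education'],
]
tag_counts = genre_tag_counts(sample_rows, genres_idx=0)
print(tag_counts['Art & Design'], tag_counts['Pretend Play'], tag_counts['Education'])
```

Because an app with two tags contributes to two counts, percentages built from these counts would no longer add up to 100.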

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.

Most Popular Apps by Genre on Apple Store

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.
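
The Installs values are stored as text such as '10,000+' or '5,000,000+', so they need to be converted before they can be averaged. A possible conversion is sketched below; `parse_installs` is a hypothetical helper, and the result is only a lower bound because the '+' means "at least this many".

```python
def parse_installs(text):
    """Convert an Installs string such as '5,000,000+' to an integer."""
    # Remove the thousands separators and the trailing plus sign.
    return int(text.replace(',', '').replace('+', ''))

print(parse_installs('10,000+'))
print(parse_installs('5,000,000+'))
```

A value that is not numeric after cleaning raises a ValueError, which makes malformed rows visible instead of silently skipping them.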

Start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to:

  1. Isolate the apps of each genre.
  2. Sum up the user ratings for the apps of that genre.
  3. Divide the sum by the number of apps belonging to that genre (not by the total number of apps).
  4. Display each genre with its average number of ratings, in descending order of that average.
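
The steps above can also be sketched as a single pass over the data that accumulates a running total and a count per genre, instead of looping over the whole data set once for each genre. `mean_ratings_by_genre` is a hypothetical helper, and the sample rows are made up with only two columns.

```python
from collections import defaultdict

def mean_ratings_by_genre(rows, genre_idx=11, ratings_idx=5):
    """Return (genre, mean number of ratings) pairs, highest mean first."""
    totals = defaultdict(float)  # genre -> sum of rating counts
    counts = defaultdict(int)    # genre -> number of apps
    for row in rows:
        genre = row[genre_idx]
        totals[genre] += float(row[ratings_idx])
        counts[genre] += 1
    # Divide by the number of apps in each genre, not by the total number of apps.
    means = {genre: totals[genre] / counts[genre] for genre in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

sample_rows = [['A', '10'], ['A', '30'], ['B', '5']]  # made-up rows
genre_means = mean_ratings_by_genre(sample_rows, genre_idx=0, ratings_idx=1)
for genre, mean_ratings in genre_means:
    print(genre, ':', mean_ratings)
```

The default indexes 11 and 5 correspond to the prime_genre and rating_count_tot columns of the App Store data set.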
In [20]:
prime_genre_table = freq_table(free_english_apple_data, 11)

genre_dict = {}

for genre in prime_genre_table:
  total_num_ratings = 0
  len_genre = 0
  for app in free_english_apple_data:
    genre_app = app[11]
    if genre_app == genre:
      app_num_ratings = float(app[5])
      total_num_ratings += app_num_ratings
      len_genre += 1
  mean_num_ratings = total_num_ratings / len_genre
  genre_dict[genre] = mean_num_ratings

genre_list = []

for key in genre_dict:
    key_val_as_tuple = (genre_dict[key], key)
    genre_list.append(key_val_as_tuple)

genre_list_sorted = sorted(genre_list, reverse=True)
for entry in genre_list_sorted:
    print(entry[1], ':', entry[0])
Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0

Navigation apps have the highest average number of user ratings (86k), followed by Reference apps (75k), Social Networking apps (72k), Music apps (57k), Weather apps (52k) and Book apps (40k).

Even though approx. 58% of apps are in the 'Games' genre, reference and practical apps have a much higher average number of user ratings per app.

Let us look at the names and number of ratings of apps with a prime genre of 'Navigation':

In [21]:
for app in free_english_apple_data:
  if app[11] == 'Navigation':
    print(app[1], ':', app[5])
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5

We can see that the majority of user ratings for apps with the 'Navigation' prime genre are for 'Waze' and 'Google Maps'. It would seem that this genre is dominated by a small number of popular apps and other apps do not have much traffic.
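
One way to quantify this dominance is to compare the mean with the median and to compute the share of ratings held by the two largest apps. The sketch below uses the six Navigation rating counts printed above; the variable names are illustrative.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

# Share of all Navigation ratings that belong to the two largest apps.
top_two = sorted(navigation_ratings, reverse=True)[:2]
top_two_share = sum(top_two) / sum(navigation_ratings)

print('Mean:', round(mean(navigation_ratings), 2))
print('Median:', median(navigation_ratings))
print('Top two share (%):', round(top_two_share * 100, 1))
```

The mean is about 86,090 while the median is 8,196.5, and the two largest apps account for roughly 97% of all ratings in the genre, so the genre average says little about a typical Navigation app.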

In [22]:
for app in free_english_apple_data:
  if app[11] == 'Reference':
    print(app[1], ':', app[5])
print()
for app in free_english_apple_data:
  if app[11] == 'Book':
    print(app[1], ':', app[5])
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0

The 'Reference' and 'Book' genres both appear to be dominated by apps created for popular publications or a small number of popular apps. The Reference apps with the highest number of user reviews are the 'Bible' app and Dictionary apps. The 'Book' apps with the highest number of user reviews are mostly eBook readers or audio book hosting apps.

In [23]:
for app in free_english_apple_data:
  if app[11] == 'Social Networking':
    print(app[1], ':', app[5])
Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 23965
SimSimi : 23530
Grindr - Gay and same sex guys chat, meet and date : 23201
Wishbone - Compare Anything : 20649
imo video calls and chat : 18841
After School - Funny Anonymous School News : 18482
Quick Reposter - Repost, Regram and Reshare Photos : 17694
Weibo HD : 16772
Repost for Instagram : 15185
Live.me – Live Video Chat & Make Friends Nearby : 14724
Nextdoor : 14402
Followers Analytics for Instagram - InstaReport : 13914
YouNow: Live Stream Video Chat : 12079
FollowMeter for Instagram - Followers Tracking : 11976
LINE : 11437
eHarmony™ Dating App - Meet Singles : 11124
Discord - Chat for Gamers : 9152
QQ : 9109
Telegram Messenger : 7573
Weibo : 7265
Periscope - Live Video Streaming Around the World : 6062
Chat for Whatsapp - iPad Version : 5060
QQ HD : 5058
Followers Analysis Tool For Instagram App Free : 4253
live.ly - live video streaming : 4145
Houseparty - Group Video Chat : 3991
SOMA Messenger : 3232
Monkey : 3060
Down To Lunch : 2535
Flinch - Video Chat Staring Contest : 2134
Highrise - Your Avatar Community : 2011
LOVOO - Dating Chat : 1985
PlayStation®Messages : 1918
BOO! - Video chat camera with filters & stickers : 1805
Qzone : 1649
Chatous - Chat with new people : 1609
Kiwi - Q&A : 1538
GhostCodes - a discovery app for Snapchat : 1313
Jodel : 1193
FireChat : 1037
Google Duo - simple video calling : 1033
Fiesta by Tango - Chat & Meet New People : 885
Google Allo — smart messaging : 862
Peach — share vividly : 727
Hey! VINA - Where Women Meet New Friends : 719
Battlefield™ Companion : 689
All Devices for WhatsApp - Messenger for iPad : 682
Chat for Pokemon Go - GoChat : 500
IAmNaughty – Dating App to Meet New People Online : 463
Qzone HD : 458
Zenly - Locate your friends in realtime : 427
League of Legends Friends : 420
豆瓣 : 407
Candid - Speak Your Mind Freely : 398
知乎 : 397
Selfeo : 366
Fake-A-Location Free ™ : 354
Popcorn Buzz - Free Group Calls : 281
Fam — Group video calling for iMessage : 279
QQ International : 274
Ameba : 269
SoundCloud Pulse: for creators : 240
Tantan : 235
Cougar Dating & Life Style App for Mature Women : 213
Rawr Messenger - Dab your chat : 180
WhenToPost: Best Time to Post Photos for Instagram : 158
Inke—Broadcast an amazing life : 147
Mustknow - anonymous video Q&A : 53
CTFxCmoji : 39
Lobi : 36
Chain: Collaborate On MyVideo Story/Group Video : 35
botman - Real time video chat : 7
BestieBox : 0
MATCH ON LINE chat : 0
niconico ch : 0
LINE BLOG : 0
bit-tube - Live Stream Video Chat : 0

The 'Social Networking' genre contains more apps than the other genres with a high average number of ratings. Although it also appears to be dominated by well-known, popular apps like 'Facebook', 'Pinterest' and 'Skype', a larger number of its apps have an appreciable number of user ratings, which suggests that a new app has a better chance of attracting a decent number of users.

Since all these popular genres are heavily skewed by a few apps with a large number of ratings, we can make our analysis more relevant to new apps by removing these very popular apps (those with at least 20% of the total number of ratings in their genre) from the dataset and then recalculating the average number of ratings per app:

In [24]:
genre_dict = {}

for genre in prime_genre_table:
  total_num_ratings = 0
  len_genre = 0
  for app in free_english_apple_data:
    genre_app = app[11]
    if genre_app == genre: 
      app_num_ratings = float(app[5])
      total_num_ratings += app_num_ratings
      
  # second loop: leave out of the average any app holding at least 20% of the genre's total ratings
  new_total_num_ratings = total_num_ratings
  for app in free_english_apple_data:
    genre_app = app[11]
    if genre_app == genre:
      app_num_ratings = float(app[5])   
      if app_num_ratings >= 0.2*total_num_ratings:
        new_total_num_ratings -= app_num_ratings
      else:
        len_genre += 1
     
  mean_num_ratings = new_total_num_ratings / len_genre
  genre_dict[genre] = mean_num_ratings


genre_list = []

for key in genre_dict:
    key_val_as_tuple = (genre_dict[key], key)
    genre_list.append(key_val_as_tuple)

genre_list_sorted = sorted(genre_list, reverse=True)
for entry in genre_list_sorted:
    print(entry[1], ':', entry[0])
Social Networking : 43899.514285714286
Weather : 35859.666666666664
Music : 27782.953125
Shopping : 26919.690476190477
Book : 23426.384615384617
Sports : 23008.898550724636
Games : 22788.6696905016
Reference : 21355.176470588234
Productivity : 21028.410714285714
Finance : 19606.941176470587
Travel : 17527.358974358973
Photo & Video : 15025.716981132075
Entertainment : 14029.830708661417
News : 13323.97619047619
Utilities : 12925.0125
Food & Drink : 12675.083333333334
Health & Fitness : 10044.920634920634
Lifestyle : 9956.1
Education : 7003.983050847458
Business : 5541.75
Navigation : 4146.25
Catalogs : 890.3333333333334
Medical : 9.666666666666666

With the most popular, market-dominating apps removed from each genre, we can compare the average number of ratings for apps that each hold less than 20% of their genre's total ratings.
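As a worked check of the 20% rule, the following sketch applies it to the six 'Navigation' rating counts printed earlier. The function name `mean_without_dominant` is a hypothetical helper, and the guard for an empty result is an addition that the cell above does not have.

```python
def mean_without_dominant(ratings, threshold=0.2):
    """Mean of the ratings after dropping apps with >= threshold of the total.

    Returns None when every app is dropped, instead of dividing by zero.
    """
    total = sum(ratings)
    kept = [r for r in ratings if r < threshold * total]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Rating counts copied from the 'Navigation' output earlier in this section.
navigation = [345046, 154911, 12811, 3582, 187, 5]

print(sum(navigation) / len(navigation))   # plain mean, about 86090.33
print(mean_without_dominant(navigation))   # 4146.25 once Waze and Google Maps are dropped
```

Both figures agree with the 'Navigation' rows in the two ranked outputs, which suggests the cell above behaves as described for this genre.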

The following 5 genres have the highest average number of ratings for smaller apps:

  1. Social Networking
  2. Weather
  3. Music
  4. Shopping
  5. Book

Of these, a new social networking app looks like a practical candidate: it is likely to be cheaper to build because it would not require paying for an external specialized API.

Most Popular Apps by Genre on Google Play

For the Google Play market we have data about the number of installs, so we should be able to get a clearer picture of genre popularity. Using the 'Installs' column of the Google Play dataset, we can look at the relative frequency of each install interval. The intervals are open-ended and therefore imprecise (for example, '100,000+' could mean anything from 100,000 installs upwards), but since we do not need high precision we will take the lower limit of each interval as the value for every app in it. The interval labels contain the non-numeric characters ',' and '+', which we can strip with the str.replace(old, new) method before converting to float.
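A minimal sketch of that conversion, assuming the labels only ever contain digits, ',' and '+' as in the frequency table for this column. The helper name `installs_to_float` is illustrative and does not appear in the original notebook.

```python
def installs_to_float(installs):
    """Convert an install label such as '1,000,000+' to its lower limit as a float."""
    cleaned = installs.replace('+', '').replace(',', '')
    return float(cleaned)


for label in ['1,000,000+', '500+', '0']:
    print(label, '->', installs_to_float(label))
```

Any label outside that format (for instance a stray text value in a malformed row) would raise a ValueError here, so this relies on the data having been cleaned beforehand.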

In [25]:
display_table(free_english_google_cleaned, 5) #'Installs' column is column indexed 5
1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343

We start by generating a frequency table for the Category column of the Google Play data set with the freq_table() function to get the unique app categories, and then calculate the average number of installs per app in each category:

In [26]:
categories_table = freq_table(free_english_google_cleaned, 1)

category_dict = {}

for category in categories_table:
  total_installs = 0
  len_category = 0
  for app in free_english_google_cleaned:
    category_app = app[1]
    if category_app == category:
      n_installs = app[5]
      n_installs = float(n_installs.replace('+', '').replace(',', ''))
      total_installs += n_installs
      len_category += 1
  
  mean_installs = total_installs / len_category
  category_dict[category] = mean_installs


category_list = []

for key in category_dict:
    key_val_as_tuple = (category_dict[key], key)
    category_list.append(key_val_as_tuple)

category_list_sorted = sorted(category_list, reverse=True)
for entry in category_list_sorted:
    print(entry[1], ':', entry[0])
COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 513151.88679245283
EVENTS : 253542.22222222222
MEDICAL : 120550.61980830671

On average, apps in the COMMUNICATION category have the highest number of installs (approx. 38,000,000). However, as with the average number of ratings per genre in the App Store dataset, isolating the very popular apps shows that the mean is heavily skewed upwards by a few apps with 1 billion installs (such as WhatsApp Messenger, Messenger, Skype and Google Chrome) and others with 500 million installs:

In [31]:
for app in free_english_google_cleaned:
  if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
    print(app[0], ':', app[5])
B612 - Beauty & Filter Camera : 100,000,000+
YouCam Makeup - Magic Selfie Makeovers : 100,000,000+
Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+
Google Photos : 1,000,000,000+
Retrica : 100,000,000+
Photo Editor Pro : 100,000,000+
BeautyPlus - Easy Photo Editor & Selfie Camera : 100,000,000+
PicsArt Photo Studio: Collage Maker & Pic Editor : 100,000,000+
Photo Collage Editor : 100,000,000+
Z Camera - Photo Editor, Beauty Selfie, Collage : 100,000,000+
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100,000,000+
Candy Camera - selfie, beauty camera, photo editor : 100,000,000+
YouCam Perfect - Selfie Photo Editor : 100,000,000+
Camera360: Selfie Photo Editor with Funny Sticker : 100,000,000+
S Photo Editor - Collage Maker , Photo Collage : 100,000,000+
AR effect : 100,000,000+
Cymera Camera- Photo Editor, Filter,Collage,Layout : 100,000,000+
LINE Camera - Photo editor : 100,000,000+
Photo Editor Collage Maker Pro : 100,000,000+
WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
BBM - Free Calls & Messages : 100,000,000+

If we also look at the names and install counts of apps in the categories with the second and third highest average number of installs (VIDEO_PLAYERS and SOCIAL), we see that the VIDEO_PLAYERS category is dominated by YouTube, Google Play Movies & TV and MX Player, and the SOCIAL category is skewed by Facebook, Facebook Lite, Google+, Instagram and Snapchat.

In [28]:
for category in ['VIDEO_PLAYERS', 'SOCIAL']:
  print(category)
  for app in free_english_google_cleaned:
    if app[1] == category and app[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
      print(app[0], ':', app[5])
  print()
VIDEO_PLAYERS
YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Motorola FM Radio : 100,000,000+

SOCIAL
Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+

As with the App Store data, the main concern is that these app categories might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants that are hard to compete against.

Let us repeat what we did for the App Store data and exclude the few very popular apps that skew the average for each category. Here we remove apps with 100 million installs or more and recalculate the average number of installs in each category without these market-dominating apps:

In [29]:
category_dict = {}

for category in categories_table:
  total_installs = 0
  len_category = 0
  for app in free_english_google_cleaned:
    category_app = app[1]
    if category_app == category:
      n_installs = app[5]
      n_installs = float(n_installs.replace('+', '').replace(',', ''))
      total_installs += n_installs
          
  # second loop: leave out of each category's average any app with at least 100,000,000 installs
  new_total_installs = total_installs
  for app in free_english_google_cleaned:
    category_app = app[1]
    if category_app == category:
      n_installs = app[5]
      n_installs = float(n_installs.replace('+', '').replace(',', ''))   
      if n_installs >= 100000000:
        new_total_installs -= n_installs
      else:
        len_category += 1
     
  mean_installs = new_total_installs / len_category
  category_dict[category] = mean_installs

category_list = []

for key in category_dict:
    key_val_as_tuple = (category_dict[key], key)
    category_list.append(key_val_as_tuple)

category_list_sorted = sorted(category_list, reverse=True)
for entry in category_list_sorted:
    print(entry[1], ':', entry[0])
PHOTOGRAPHY : 7670532.29338843
GAME : 6272564.694894147
ENTERTAINMENT : 6118250.0
VIDEO_PLAYERS : 5544878.133333334
WEATHER : 5074486.197183099
SHOPPING : 4640920.541237113
COMMUNICATION : 3603485.3884615386
PRODUCTIVITY : 3379657.318885449
TOOLS : 3191461.128987517
SOCIAL : 3084582.5201793723
SPORTS : 2994082.551839465
TRAVEL_AND_LOCAL : 2944079.6336633665
PERSONALIZATION : 2549775.832167832
MAPS_AND_NAVIGATION : 2484104.7540983604
FAMILY : 2342897.527075812
HEALTH_AND_FITNESS : 2005713.6605166052
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
NEWS_AND_MAGAZINES : 1502841.8775510204
BOOKS_AND_REFERENCE : 1437212.2162162163
HOUSE_AND_HOME : 1331540.5616438356
BUSINESS : 1226918.7407407407
LIFESTYLE : 1152128.779710145
FINANCE : 1086125.7859327218
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 513151.88679245283
EVENTS : 253542.22222222222
MEDICAL : 120550.61980830671

The following 5 categories have the highest average number of installs for 'smaller' apps:

  1. PHOTOGRAPHY
  2. GAME
  3. ENTERTAINMENT
  4. VIDEO_PLAYERS
  5. WEATHER
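A ranking like this can also be read straight from a dictionary of averages with `sorted()` and a key function, without first building a list of tuples. The sketch below uses a subset of the averages printed above, rounded to two decimals; `top_n` is a hypothetical helper name.

```python
# Subset of the recalculated averages printed above, rounded to two decimals.
mean_installs = {
    'SHOPPING': 4640920.54,
    'WEATHER': 5074486.20,
    'GAME': 6272564.69,
    'COMMUNICATION': 3603485.39,
    'PHOTOGRAPHY': 7670532.29,
    'VIDEO_PLAYERS': 5544878.13,
    'ENTERTAINMENT': 6118250.00,
}


def top_n(averages, n):
    """Return the n (category, average) pairs with the highest averages."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


for category, average in top_n(mean_installs, 5):
    print(category, ':', average)
```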

Let's look at the PHOTOGRAPHY category in greater detail:

In [0]:
for app in free_english_google_cleaned:
  if app[1] == 'PHOTOGRAPHY' and app[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
    print(app[0], ':', app[5])
print('\n'*2) 
for app in free_english_google_cleaned:
    if app[1] == 'PHOTOGRAPHY':
        print(app[0], ':', app[5])

Apart from Google Photos (1,000,000,000+ installs), no app in this category has over 500,000,000 installs, while 18 apps fall in the 100,000,000+ interval. The number of installs therefore appears to be more evenly distributed than in other categories.

We will now look at the names of apps which are 'moderately' popular in this category (between 1,000,000 and 100,000,000 installs):

In [0]:
for app in free_english_google_cleaned:
    if app[1] == 'PHOTOGRAPHY' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])
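The same 'moderately popular' filter can be written as a numeric range check instead of listing every interval label. This is a sketch under assumptions: the helper names are illustrative, the column indexes (1 for Category, 5 for Installs) follow the cells above, and the sample rows are made up purely for demonstration.

```python
def installs_to_float(installs):
    """Convert an install label such as '5,000,000+' to its lower limit as a float."""
    return float(installs.replace('+', '').replace(',', ''))


def in_install_range(app, category, low, high, category_idx=1, installs_idx=5):
    """True if the app is in the category and low <= installs < high."""
    n_installs = installs_to_float(app[installs_idx])
    return app[category_idx] == category and low <= n_installs < high


# Made-up rows in the same column layout (name, category, ..., installs at index 5).
sample_apps = [
    ['Hypothetical Camera A', 'PHOTOGRAPHY', '', '', '', '5,000,000+'],
    ['Hypothetical Camera B', 'PHOTOGRAPHY', '', '', '', '100,000,000+'],
    ['Hypothetical Editor C', 'PHOTOGRAPHY', '', '', '', '500,000+'],
    ['Hypothetical Chat D', 'SOCIAL', '', '', '', '10,000,000+'],
    ['Hypothetical Editor E', 'PHOTOGRAPHY', '', '', '', '50,000,000+'],
]

for app in sample_apps:
    if in_install_range(app, 'PHOTOGRAPHY', 1_000_000, 100_000_000):
        print(app[0], ':', app[5])
```

With a lower bound of 1,000,000 and an exclusive upper bound of 100,000,000, this selects the same four labels as the cell above ('1,000,000+', '5,000,000+', '10,000,000+' and '50,000,000+').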

This list consists mainly of camera apps and photo editing / organizer apps, with a few photo sharing (social) apps. It may not be a good idea to build an editing app, since there would be significant competition. We might have success with an app that combines photo editing and social media, but giants like Instagram and Snapchat already dominate this market and would be very difficult to compete with.

The game genre is second on this list, but previously we found out this part of the market seems very saturated on the Apple Store, so we'd like to come up with a different app recommendation if possible as we are looking to recommend an app which has the potential to be successful in both the Apple and the Android markets.

The 'COMMUNICATION' and 'SOCIAL' categories remain relatively high up this list of average installs with the larger apps removed. Some popular 'SOCIAL' apps could equally have been classified as 'COMMUNICATION', and 'Social Networking' was a potentially promising genre in the App Store market. We will therefore look at the 'SOCIAL' category in more detail by again isolating 'moderately' popular apps, alongside the apps categorized as 'COMMUNICATION' that have a very large number of installs:

In [39]:
for app in free_english_google_cleaned:
  if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
    print(app[0], ':', app[5])

print('\n')

for app in free_english_google_cleaned:
    if (app[1] == 'SOCIAL') and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])
WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
BBM - Free Calls & Messages : 100,000,000+


TextNow - free text + calls : 10,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, stickers and GIF : 1,000,000+
HTC Social Plugin - Facebook : 10,000,000+
Quora : 10,000,000+
Kate Mobile for VK : 10,000,000+
Family GPS tracker KidControl + GPS by SMS Locator : 1,000,000+
Moment : 1,000,000+
Text Me: Text Free, Call Free, Second Phone Number : 10,000,000+
Text Free: WiFi Calling App : 5,000,000+
Text free - Free Text + Call : 10,000,000+
ooVoo Video Calls, Messaging & Stories : 50,000,000+
Whisper : 5,000,000+
Blogger : 5,000,000+
TwitCasting Live : 1,000,000+
YouNow: Live Stream Video Chat : 10,000,000+
Banjo : 1,000,000+
We Heart It : 10,000,000+
MeetMe: Chat & Meet New People : 50,000,000+
Timehop : 5,000,000+
Frontback - Social Photos : 1,000,000+
Path : 10,000,000+
SayHi Chat, Meet New People : 10,000,000+
Tapatalk - 100,000+ Forums : 10,000,000+
Couple - Relationship App : 1,000,000+
Nextdoor - Local neighborhood news & classifieds : 5,000,000+
LOVOO : 10,000,000+
Jaumo Dating, Flirt & Live Video : 10,000,000+
Zello PTT Walkie Talkie : 50,000,000+
textPlus: Free Text & Calls : 10,000,000+
magicApp Calling & Messaging : 10,000,000+
Dating App, Flirt & Chat : W-Match : 10,000,000+
Meetup : 5,000,000+
POF Free Dating App : 50,000,000+
Tagged - Meet, Chat & Dating : 10,000,000+
SKOUT - Meet, Chat, Go Live : 50,000,000+
Mico- Stranger Chat Random video Chat, Live, Meet : 10,000,000+
Waplog - Free Chat, Dating App, Meet Singles : 10,000,000+
B-Messenger Video Chat : 1,000,000+
Instachat 😜 : 5,000,000+
Fame Boom for Real Followers, Likes : 5,000,000+
FollowMeter for Instagram : 1,000,000+
pixiv : 1,000,000+
U LIVE – Video Chat & Stream : 1,000,000+
VMate Lite - Funny Short Videos Social Network : 1,000,000+
Legend - Animate Text in Video : 10,000,000+
GUYZ - Gay Chat & Gay Dating : 1,000,000+
Snaappy – 3D fun AR core communication platform : 1,000,000+
Find My Friends : 10,000,000+
Grindr - Gay chat : 10,000,000+
Lesbian Chat & Dating - SPICY : 1,000,000+
BOO! - Next Generation Messenger : 1,000,000+
Wishbone - Compare Anything : 1,000,000+
Fiesta by Tango - Find, Meet and Make New Friends : 1,000,000+
Periscope - Live Video : 10,000,000+
Free phone calls, free texting SMS on free number : 10,000,000+
Phone Tracker : Family Locator : 10,000,000+
HOLLA Live: Meet New People via Random Video Chat : 5,000,000+
+Download 4 Instagram Twitter : 1,000,000+
Hornet - Gay Social Network : 5,000,000+
Amino: Communities and Chats : 10,000,000+
EZ Video Download for Facebook : 1,000,000+
Messages, Text and Video Chat for Messenger : 10,000,000+
All Social Networks : 1,000,000+
Messenger Messenger : 10,000,000+
Facebook Creator : 1,000,000+
Friendly for Facebook : 1,000,000+
Faster for Facebook Lite : 1,000,000+
Messenger : 10,000,000+
Who Viewed My Facebook Profile - Stalkers Visitors : 5,000,000+
Stickers for Facebook : 1,000,000+
FunForMobile Ringtones & Chat : 5,000,000+
Frim: get new friends on local chat rooms : 5,000,000+

This list contains a variety of dating apps, as well as apps made as 'add-ons' or enhancements to the well-known, very popular social apps. We can also see that a number of very successful apps in the 'COMMUNICATION' category could equally well be categorized as 'SOCIAL'.

There is competition in the middle market, but there may be potential in both markets for a new dating app that stands out somehow. We have seen that the Games genre is saturated on the App Store, but perhaps there is a niche for a 'gamified' dating app. The Book genre also ranks quite high for average ratings of 'smaller' apps on the App Store, so we could create a dating app that connects users based on shared literature preferences. Another option is a dating app aimed at a religious demographic, as an offshoot of the very popular existing reference apps for the Bible and the Quran on the App Store.

Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that creating a dating app with a unique selling point or specific focus could be profitable for both the Google Play and the App Store markets. The markets are already full of dating apps, so we need to add some special features or target the app at people of a certain demographic or with certain shared interests. This may limit our market size, however, so alternatively we could create a 'gamified' dating app that leverages the high popularity of both the game and the social genres in both markets.