In this project, I took on the role of a data analyst working for an app-building company. Our apps are available on the Apple App Store and the Google Play Store.
Our business model consists of delivering viral free apps for the general public. Our revenue comes from in-app ads, so it depends heavily on the popularity of our apps and the number of people who use them.
Goal:
Our purpose here is to analyze our data sets and help our developers understand which types of apps are most likely to attract users.
As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.
Collecting data for over 4 million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first see if we can find any relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our goals:
A data set containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from this link.
A data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this link.
We'll start by opening and exploring these two data sets.
from csv import reader

# Apple Store data set
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
opened_file.close()
ios_header = ios[0]  # first row holds the column names
ios = ios[1:]

# Google Play data set
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
opened_file.close()
android_header = android[0]  # first row holds the column names
android = android[1:]
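The two loading steps above follow the same pattern, so they could be wrapped in a small helper. This is only a minimal sketch, not part of the original notebook: the name `load_csv` and the `utf8` default are assumptions, and the function simply returns the header and the data rows separately.

```python
from csv import reader


def load_csv(path, encoding="utf8"):
    """Return (header, rows) for the CSV file at `path` (hypothetical helper)."""
    # `with` closes the file even if reading fails part-way through.
    with open(path, encoding=encoding, newline="") as opened_file:
        rows = list(reader(opened_file))
    # The first row holds the column names, the rest is data.
    return rows[0], rows[1:]
```

With this helper, the loading would read `ios_header, ios = load_csv('AppleStore.csv')` and `android_header, android = load_csv('googleplaystore.csv')`.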
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
print(android_header)
print('\n')
explore_data(android, 0, 3, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']

Number of rows: 10841
Number of columns: 13
The Google Play Store data set has 13 columns and 10,841 rows. From what we can see, the most useful columns for our analysis are likely to be 'App', 'Category', 'Rating', 'Price', 'Installs' and 'Genres'. This list may evolve as the analysis progresses.
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']

Number of rows: 7197
Number of columns: 16
The Apple Store data set has 7,197 rows and 16 columns. The most useful columns should be 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'.
Let's first check that the data is correct and accurate. This process is called data cleaning. It lets us remove, for example, apps with non-English names. Also, since our company only develops free apps, we will remove non-free apps from our analysis.
for row in android:
    if len(row) != len(android_header):
        print(row)
        print(android.index(row))
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472
Row 10472 is missing its value in the Category column, which shifts every following value one column to the left. Since the row is corrupted, we will remove it. The deletion below should be run only once, otherwise a valid row would be removed as well.
del android[10472]
for row in android:
    if len(row) != len(android_header):
        print(row)
        print(android.index(row))
The second check prints nothing, so the corrupted row (index 10472) has been removed from the data set.
for row in ios:
    if len(row) != len(ios_header):
        print(row)
        print(ios.index(row))
The Apple Store data set has no rows of the wrong length, so we can continue our analysis.
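The same row-length check was written twice above, once per data set. A minimal sketch of a reusable version follows; the name `find_malformed_rows` is an assumption, not part of the original notebook. It uses `enumerate` to get each row's position, because `list.index(row)` returns the position of the first equal row, which could be misleading if a malformed row were duplicated.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    # enumerate gives the true position of each row, even for repeated rows.
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]
```

Calling `find_malformed_rows(android, android_header)` before the deletion should report the same row, 10472, that the loop above printed.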
According to the discussion of these data sets on Kaggle, the Google Play Store data set contains duplicate entries. For example, Instagram has four entries:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
Let's find out how many duplicates the data set contains.
unique_apps_android = []
duplicate_apps_android = []

for app in android:
    name = app[0]
    if name in unique_apps_android:
        duplicate_apps_android.append(name)
    else:
        unique_apps_android.append(name)

print(len(duplicate_apps_android))
print('\n')
print('Example of duplicates:', duplicate_apps_android[:5])
1181

Example of duplicates: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
There are a total of 1181 duplicates in the Google Play Store dataset. You can also see a few examples in the output.
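The list-based loop above searches `unique_apps_android` for every row, which gets slow as the list grows. As a hedged alternative, the same count can be obtained with `collections.Counter`; the function name `count_duplicates` and its `name_index` parameter are assumptions introduced for this sketch.

```python
from collections import Counter


def count_duplicates(dataset, name_index=0):
    """Count rows whose name already appeared earlier in the data set."""
    counts = Counter(row[name_index] for row in dataset)
    # A name seen n times contributes n - 1 duplicate rows.
    return sum(n - 1 for n in counts.values())
```

On the cleaned `android` list this should give the same figure as `len(duplicate_apps_android)`.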
It is worth checking the Apple Store data set as well.
unique_apps_ios = []
duplicate_apps_ios = []

for app in ios:
    name = app[0]  # column 0 of the iOS data is the app id, so this compares ids
    if name in unique_apps_ios:
        duplicate_apps_ios.append(name)
    else:
        unique_apps_ios.append(name)

print(len(duplicate_apps_ios))
0
Luckily, the Apple Store data set does not contain any duplicates: no app id appears twice.
This is where the data cleaning gets interesting. Roughly 10% of the rows in the Google Play Store data set are duplicates, and we have to remove them for our analysis to be correct. Removing duplicates implies keeping the most reliable entry for each app, so we need a criterion for deciding which entry that is.
In our case, the number of reviews makes the difference. In the Instagram entries printed earlier, the only column that varies between the rows is the fourth one, the number of reviews.
The entry with the highest number of reviews should be the most recently collected one and hence the most reliable.
print('Expected length:', len(android) - 1181)
Expected length: 9659
Removing the duplicates should leave us with a data set of 9,659 rows.
To remove the duplicates, we first build a dictionary, reviews_max, that maps each app name to its highest number of reviews. We then loop over the data set again and keep a row only if both of the following conditions hold:
The number of reviews of the current app matches the number of reviews of that app as recorded in the reviews_max dictionary; and
The name of the app is not already in the already_added list. We need this extra condition for cases where several entries share the highest number of reviews (for example, the Box app has three entries with the same number of reviews). If we only checked reviews_max[name] == n_reviews, we would still end up with duplicate entries for some apps.
reviews_max = {}

for row in android:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max))
9659
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

print(len(android_clean))
9659
The dataset is now cleaned up!
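To see the two-pass logic at work on something small, here is a minimal sketch that packs both loops into one function and runs it on four toy rows. The function name `dedupe_by_max_reviews` is an assumption, the rows are shortened to four columns, and the Box review counts are made up for illustration (only the Instagram counts come from the output shown earlier).

```python
def dedupe_by_max_reviews(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row holding the highest review count."""
    # Pass 1: highest review count seen for each name.
    reviews_max = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Pass 2: keep the first row that matches the maximum for its name.
    clean = []
    already_added = set()
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if n_reviews == reviews_max[name] and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean


# Toy rows; the Box review counts are invented for this example.
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Box', 'BUSINESS', '4.2', '100'],
]
clean = dedupe_by_max_reviews(sample)
print(clean)
```

The two Box rows tie on the review count, so without the `already_added` check both would be kept.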
Since our company only develops apps in English, we also need to remove apps whose names are not in English.
def english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

print(english('Instagram'))
print(english('Docs To Go™ Free Office Suite'))
print(english('中国語'))
True
False
False
Here we created a function that tells us whether an app name uses only characters from the American Standard Code for Information Interchange (ASCII). In a nutshell, ord() returns a character's code point, and any character whose code point exceeds 127 lies outside the ASCII range that covers common English text.
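To make the 127 threshold concrete, the short sketch below prints the code points of a few characters. The sample characters are chosen only for illustration; the variable names `samples` and `code_points` are not part of the original notebook.

```python
# Code points for a few sample characters; anything above 127 is outside ASCII.
samples = ['a', 'Z', '~', '™', '中', '😜']
code_points = {character: ord(character) for character in samples}

for character, code_point in code_points.items():
    label = 'ASCII' if code_point <= 127 else 'non-ASCII'
    print(repr(character), code_point, label)
```

Letters and common punctuation stay below 128, while the trademark sign, the Chinese character and the emoji all land far above it.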
The issue is that the function also rejects apps whose names contain special characters, such as Docs To Go™ Free Office Suite. We would lose a lot of valuable data, which is not what we want.
def english(string):
    non_ascii = 0  # number of characters outside the ASCII range

    for character in string:
        if ord(character) > 127:
            non_ascii += 1

    if non_ascii > 3:
        return False
    else:
        return True

print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))
True
True
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if english(name):
        android_english.append(app)

for app in ios:
    name = app[1]
    if english(name):
        ios_english.append(app)

explore_data(android_english, 0, 5, True)
print('\n')
explore_data(ios_english, 0, 5, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']

['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']

['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']

Number of rows: 9614
Number of columns: 13

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']

['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']

['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']

Number of rows: 6183
Number of columns: 16
To minimize data loss, we only remove an app if its name has more than three characters whose code points fall outside the ASCII range. This means English apps with up to three emoji or other special characters are still labeled as English.
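The threshold rule can also be written with the limit as a parameter, which makes it easy to experiment with stricter or looser settings. This is a sketch under assumptions: the name `is_probably_english` and the `max_non_ascii` parameter are hypothetical, and the longer Japanese string is an invented example rather than an app from the data set.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Docs To Go™ Free Office Suite'))  # one non-ASCII character
print(is_probably_english('Instachat 😜'))                    # one non-ASCII character
print(is_probably_english('中国語'))                           # exactly three
print(is_probably_english('中国語アプリ'))                      # six
```

One limitation worth knowing: a short non-English name such as '中国語' has exactly three non-ASCII characters, so it still passes with the limit at three. The filter is a heuristic, not a language detector.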
As mentioned earlier, we only produce free apps, so we need to separate the free apps from the non-free ones.
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)

for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)

print("Android:", len(android_final))
print("iOs:", len(ios_final))
Android: 8864
iOs: 3222
This is the final step of our data cleaning process and we can start our analysis.
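The free-app filter above compares price strings exactly ('0' for Android, '0.0' for iOS). A more tolerant variant could parse the price as a number instead. The sketch below is only an illustration: the name `is_free` is hypothetical, and the assumption that paid prices may carry a leading dollar sign is not confirmed by the rows shown in this notebook.

```python
def is_free(price):
    """Return True if a price string represents zero (hypothetical helper)."""
    # Assumption: paid prices may look like '$4.99'; strip the sign before parsing.
    cleaned = price.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Anything that is not a number is treated as not free.
        return False
```

With this helper the same test, `is_free(price)`, would work for both data sets.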
As mentioned in the introduction, our company builds small apps and publishes them for free. Our business model is based on advertisements that generate revenue. We first release an app on the Google Play Store. If the app becomes popular, we develop it further. If it is profitable after a six-month trial, we release it on the Apple Store as well.
This is why this benchmark is important to us. It shows which kinds of apps are best suited to the current market. We are interested in the genre, the category, the number of installs, the ratings, and so on.
Our marketing team can then come up with a strategy for apps to release on the Google Play Store and, if successful, on the Apple Store.
We will be focusing mostly on the Google Play Store.
def freq_table(dataset, index):
    table = {}
    total = 0

    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage

    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
display_table(android_final, 1)  # Category column; the function prints and returns nothing
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0.6430505415162455
COMICS : 0.6204873646209386
BEAUTY : 0.5979241877256317
Apps in the FAMILY category are the most common among the free English apps on the store (about 19%). Browsing the Google Play Store website shows that these apps target everyone, with content ratings from 4+ to 17+. The category also gathers many quick games for children and adults. In short, it appears to be the largest category, grouping several genres aimed at a broad general audience.
Games form the second-largest category, and tools come third.
This gives a first indication of which audience is best represented on the store. It suggests that apps for a general audience may be downloaded more than others, but we will check this later in the analysis.
display_table(android_final, -4)  # Genres column; the function prints and returns nothing
Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075812
Strategy : 0.9138086642599278
House & Home : 0.8235559566787004
Weather : 0.8009927797833934
Events : 0.7107400722021661
Adventure : 0.6768953068592057
Comics : 0.6092057761732852
Beauty : 0.5979241877256317
Art & Design : 0.5979241877256317
Parenting : 0.4963898916967509
Card : 0.45126353790613716
Casino : 0.42870036101083037
Trivia : 0.41741877256317694
Educational;Education : 0.39485559566787
Board : 0.3835740072202166
Educational : 0.3722924187725632
Education;Education : 0.33844765342960287
Word : 0.2594765342960289
Casual;Pretend Play : 0.236913357400722
Music : 0.2030685920577617
Racing;Action & Adventure : 0.16922382671480143
Puzzle;Brain Games : 0.16922382671480143
Entertainment;Music & Video : 0.16922382671480143
Casual;Brain Games : 0.13537906137184114
Casual;Action & Adventure : 0.13537906137184114
Arcade;Action & Adventure : 0.12409747292418773
Action;Action & Adventure : 0.10153429602888085
Educational;Pretend Play : 0.09025270758122744
Simulation;Action & Adventure : 0.078971119133574
Parenting;Education : 0.078971119133574
Entertainment;Brain Games : 0.078971119133574
Board;Brain Games : 0.078971119133574
Parenting;Music & Video : 0.06768953068592057
Educational;Brain Games : 0.06768953068592057
Casual;Creativity : 0.06768953068592057
Art & Design;Creativity : 0.06768953068592057
Education;Pretend Play : 0.056407942238267145
Role Playing;Pretend Play : 0.04512635379061372
Education;Creativity : 0.04512635379061372
Role Playing;Action & Adventure : 0.033844765342960284
Puzzle;Action & Adventure : 0.033844765342960284
Entertainment;Creativity : 0.033844765342960284
Entertainment;Action & Adventure : 0.033844765342960284
Educational;Creativity : 0.033844765342960284
Educational;Action & Adventure : 0.033844765342960284
Education;Music & Video : 0.033844765342960284
Education;Brain Games : 0.033844765342960284
Education;Action & Adventure : 0.033844765342960284
Adventure;Action & Adventure : 0.033844765342960284
Video Players & Editors;Music & Video : 0.02256317689530686
Sports;Action & Adventure : 0.02256317689530686
Simulation;Pretend Play : 0.02256317689530686
Puzzle;Creativity : 0.02256317689530686
Music;Music & Video : 0.02256317689530686
Entertainment;Pretend Play : 0.02256317689530686
Casual;Education : 0.02256317689530686
Board;Action & Adventure : 0.02256317689530686
Video Players & Editors;Creativity : 0.01128158844765343
Trivia;Education : 0.01128158844765343
Travel & Local;Action & Adventure : 0.01128158844765343
Tools;Education : 0.01128158844765343
Strategy;Education : 0.01128158844765343
Strategy;Creativity : 0.01128158844765343
Strategy;Action & Adventure : 0.01128158844765343
Simulation;Education : 0.01128158844765343
Role Playing;Brain Games : 0.01128158844765343
Racing;Pretend Play : 0.01128158844765343
Puzzle;Education : 0.01128158844765343
Parenting;Brain Games : 0.01128158844765343
Music & Audio;Music & Video : 0.01128158844765343
Lifestyle;Pretend Play : 0.01128158844765343
Lifestyle;Education : 0.01128158844765343
Health & Fitness;Education : 0.01128158844765343
Health & Fitness;Action & Adventure : 0.01128158844765343
Entertainment;Education : 0.01128158844765343
Communication;Creativity : 0.01128158844765343
Comics;Creativity : 0.01128158844765343
Casual;Music & Video : 0.01128158844765343
Card;Action & Adventure : 0.01128158844765343
Books & Reference;Education : 0.01128158844765343
Art & Design;Pretend Play : 0.01128158844765343
Art & Design;Action & Adventure : 0.01128158844765343
Arcade;Pretend Play : 0.01128158844765343
Adventure;Education : 0.01128158844765343
In terms of genres, tools are the most common on the Play Store, followed by entertainment and education. Again, apps of all these genres can be found in the FAMILY category.
display_table(ios_final, -5)  # prime_genre column; the function prints and returns nothing
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665
The Apple Store categorizes its apps differently. Games is by far the most represented genre, at about 58% of the free English apps.
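The frequency tables in this section are built by hand with a dictionary. As a hedged alternative, `collections.Counter` from the standard library can do the counting and the sorting. The function name `freq_table_counter` is an assumption; it returns percentages keyed by value, ordered from most to least common.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage of rows per distinct value in column `index`, most common first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs in descending order of count.
    return {value: count / total * 100 for value, count in counts.most_common()}
```

Because dictionaries keep insertion order, printing the result item by item gives a sorted table without a separate display function.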
One way to find the most popular apps would be to rank them by rating. However, since our company lives on ad revenue, we will focus on the number of times an app has been installed. This gives a clear indication of which apps are the most popular on the stores and of the state of the market.
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
BUSINESS : 1712290.1474201474
PRODUCTIVITY : 16787331.344927534
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
PHOTOGRAPHY : 17840110.40229885
HOUSE_AND_HOME : 1331540.5616438356
MEDICAL : 120550.61980830671
COMMUNICATION : 38456119.167247385
PARENTING : 542603.6206896552
TRAVEL_AND_LOCAL : 13984077.710144928
ART_AND_DESIGN : 1986335.0877192982
ENTERTAINMENT : 11640705.88235294
AUTO_AND_VEHICLES : 647317.8170731707
MAPS_AND_NAVIGATION : 4056941.7741935486
DATING : 854028.8303030303
SOCIAL : 23253652.127118643
SPORTS : 3638640.1428571427
LIFESTYLE : 1437816.2687861272
COMICS : 817657.2727272727
HEALTH_AND_FITNESS : 4188821.9853479853
VIDEO_PLAYERS : 24727872.452830188
EVENTS : 253542.22222222222
PERSONALIZATION : 5201482.6122448975
EDUCATION : 1833495.145631068
LIBRARIES_AND_DEMO : 638503.734939759
SHOPPING : 7036877.311557789
FAMILY : 3695641.8198090694
NEWS_AND_MAGAZINES : 9549178.467741935
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
WEATHER : 5074486.197183099
GAME : 15588015.603248259
TOOLS : 10801391.298666667
Communication apps have the highest average number of installs (38,456,119). However, a look at the Google Play Store shows that a few apps such as WhatsApp or Messenger are very heavily downloaded, which could inflate this average. Let's check.
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
BBM - Free Calls & Messages : 100,000,000+
The communication category is heavily dominated by a few industry giants in messaging. It might not be a good idea to enter this category, as our apps would be drowned out.
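One way to gauge how much the giants skew a category's average is to recompute it without them. The sketch below is an illustration only: the function names, the `cap` parameter and the three toy rows are all assumptions, and the install figures are invented rather than taken from the data set.

```python
def parse_installs(installs):
    """Turn a string such as '1,000,000+' into the integer 1000000."""
    return int(installs.replace(',', '').replace('+', ''))


def average_installs(rows, category, cap=None, category_index=1, installs_index=5):
    """Average installs for one category, optionally ignoring apps at or above `cap`."""
    values = [parse_installs(row[installs_index])
              for row in rows if row[category_index] == category]
    if cap is not None:
        values = [value for value in values if value < cap]
    return sum(values) / len(values) if values else 0.0


# Toy rows laid out like the Google Play data (only columns 1 and 5 matter here).
toy_rows = [
    ['Giant chat', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small chat', 'COMMUNICATION', '', '', '', '1,000+'],
    ['Tiny mail', 'COMMUNICATION', '', '', '', '10,000+'],
]
print(average_installs(toy_rows, 'COMMUNICATION'))
print(average_installs(toy_rows, 'COMMUNICATION', cap=100_000_000))
```

In the toy data a single giant lifts the average from a few thousand to hundreds of millions; running the same comparison on `android_final` would show how strong the effect is in the real category.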
The game category seems to be quite popular as well, but our earlier frequency tables suggest that it is also somewhat saturated. If we want our apps to become popular quickly, we need to find another pattern. Let's now check the situation on the Apple Store to gather more information.
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
Games : 22788.6696905016
Medical : 612.0
Catalogs : 4004.0
Social Networking : 71548.34905660378
Utilities : 18684.456790123455
Weather : 52279.892857142855
Health & Fitness : 23298.015384615384
Finance : 31467.944444444445
Music : 57326.530303030304
Food & Drink : 33333.92307692308
Shopping : 26919.690476190477
Navigation : 86090.33333333333
Travel : 28243.8
Entertainment : 14029.830708661417
Lifestyle : 16485.764705882353
Business : 7491.117647058823
Sports : 23008.898550724636
Reference : 74942.11111111111
Book : 39758.5
Photo & Video : 28441.54375
Productivity : 21028.410714285714
Education : 7003.983050847458
News : 21248.023255813954
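Surprisingly, the genre with the highest average is Navigation. Note that the Apple Store data set has no installs column, so the code above averages the total number of user ratings (rating_count_tot) as a proxy for installs. The result is plausible, since apps such as Waze or Google Maps are hugely popular.
Social Networking also ranks near the top, and Games, while closer to the middle in average ratings, is by far the largest genre. But as we saw previously, such markets are quickly dominated by the competition.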
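Both averaging loops in this analysis rescan the whole data set once per category or genre. A single-pass version is sketched below; the name `average_by_group` and its parameters are assumptions, and the sketch expects the value column to hold plain numbers as strings (as rating_count_tot does), not install strings such as '1,000+'.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Average of a numeric column per distinct value of a grouping column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # One pass: accumulate the sum and the row count for each group.
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}
```

For the iOS data the call would be `average_by_group(ios_final, -5, 5)`, which should reproduce the per-genre averages printed above.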
From the data we gathered, it will not be easy to set ourselves apart from the competition. One thing seems likely: apps for a general audience will be more popular than apps aimed at a restricted audience (such as 17+).
Social Networking (including communication apps), Games and Navigation apps are among the most popular. As our aim is to gain users quickly so that we can develop our apps further, I suggest we create a hybrid.
We have been able to distinguish three criteria: navigation, social interaction, and gameplay.
An app that meets all three would be present in all the categories above and could be highly addictive. It should immerse the user through navigation features (as Pokémon Go does, for example), let users interact socially with friends, family or other users (I suggest building the app on social media APIs), and, finally, be a game.
I strongly suggest digging deeper into other metrics and finding a currently popular topic on which to base the app's storytelling.