The goal of this project is to analyze data to understand what types of apps are likely to attract more users on Google Play and the App Store. To do this, I'll need to collect, explore and analyze data about mobile apps available on these platforms.
As of September 2018, there were approximately 2 million iOS apps available on the App Store and 2.1 million Android apps on Google Play. The datasets used in my analysis were found on Kaggle (here and here) and were scraped at roughly the same time, in 2018.
The first step is to open the CSV files 'AppleStore.csv' and 'googleplaystore.csv'. For that task I created a function read_csv().
from csv import reader
# create a function to read csv files
def read_csv(path):
    # the with-block closes the file once all rows are read
    with open(path, encoding='utf8') as opened_csv:
        dataset = list(reader(opened_csv))
    return dataset
# read Apple Store
ios = read_csv('AppleStore.csv')
header_ios = ios[0]
ios = ios[1:]
# read Google Play Store
android = read_csv('googleplaystore.csv')
header_android = android[0]
android = android[1:]
To make it easier to explore the datasets, I used a function named explore_data() that prints the number of rows and columns, the column names, and a slice of rows.
Next, that function was applied to both datasets.
# define the function
def explore_data(dataset, start, end, header=None):
    print('Number of rows in dataset: {}.'.format(len(dataset)))
    print('Number of columns in dataset: {}.'.format(len(dataset[0])))
    if header is None:
        print('Column names are:', dataset[0])
    else:
        print('Column names are', header)
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print() # adds a new (empty) line after each row
# explore ios dataset
explore_data(ios, 0, 3, header_ios)
# explore android dataset
explore_data(android, 0, 3, header_android)
Number of rows in dataset: 7197.
Number of columns in dataset: 16.
Column names are ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']

Number of rows in dataset: 10841.
Number of columns in dataset: 13.
Column names are ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
# find free apps in Apple Store dataset
ios_free = []
for row in ios:
    if row[4] == '0.0':
        ios_free.append(row)
print('Apple Store free apps dataset includes', len(ios_free), 'rows', '\n')
# find free apps in Google Play Store dataset
android_free = []
for row in android:
    if row[6] == 'Free':
        android_free.append(row)
print('Google Play Store free apps dataset includes', len(android_free), 'rows', '\n')
Apple Store free apps dataset includes 4056 rows
Google Play Store free apps dataset includes 10039 rows
As a first step, I check both datasets for duplicates and print some of them to see how closely they match each other.
# check for duplicates in Apple Store dataset
unique_names_ios = []
duplicate_names_ios = []
for row in ios_free:
    name = row[1]
    if name in unique_names_ios:
        duplicate_names_ios.append(name)
    else:
        unique_names_ios.append(name)
print('There are', len(duplicate_names_ios), 'duplicates in the Apple Store dataset.', '\n')
for row in ios_free:
    name = row[1]
    if name in duplicate_names_ios[:1]:
        print(row)
There are 2 duplicates in the Apple Store dataset.
['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']
# check for duplicates in Google Play Store dataset
unique_names_android = []
duplicate_names_android = []
for row in android_free:
    name = row[0]
    if name in unique_names_android:
        duplicate_names_android.append(name)
    else:
        unique_names_android.append(name)
print('There are', len(duplicate_names_android), 'duplicates in the Google Play Store dataset.', '\n')
print('Here are duplicate rows for one application. We can see how they differ from each other.', '\n')
for row in android_free:
    name = row[0]
    if name in duplicate_names_android[:1]: # print only duplicates for one app
        print(row)
There are 1135 duplicates in the Google Play Store dataset.
Here are duplicate rows for one application. We can see how they differ from each other.
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
The main difference between duplicate rows is in the number of user ratings or reviews (column 6 in the Apple Store dataset and column 4 in the Google Play Store dataset). The different numbers show that the data was collected at different times.
The next step in my data cleaning process is to keep, for each app, only the row with the maximum number of reviews or ratings and add it to the cleaned dataset.
Finally, I check the number of rows in the cleaned datasets, testing whether two different counting methods give the same number.
# select only rows with maximum number of ratings in Apple Store dataset
clean_data_ios_dict = {}
for row in ios_free:
    name = row[1]
    n_ratings = float(row[5])
    if name not in clean_data_ios_dict or n_ratings > float(clean_data_ios_dict[name][5]):
        clean_data_ios_dict[name] = row
# convert dictionary with Apple Store data to list of lists
clean_data_ios = list(clean_data_ios_dict.values())
# check the result
print('Expected length of cleaned Apple Store dataset is', len(ios_free)-len(duplicate_names_ios))
print('Length of final clean Apple Store dataset is:', len(clean_data_ios), '\n')
Expected length of cleaned Apple Store dataset is 4054
Length of final clean Apple Store dataset is: 4054
# select only rows with maximum number of reviews in Google Play Store dataset
clean_data_android_dict = {}
for row in android_free:
    name = row[0]
    n_reviews = float(row[3])
    if name not in clean_data_android_dict or n_reviews > float(clean_data_android_dict[name][3]):
        clean_data_android_dict[name] = row
# convert dictionary with Google Play Store data to list of lists
clean_data_android = list(clean_data_android_dict.values())
# check the result
print('Expected length of cleaned Google Play Store dataset is', len(android_free)-len(duplicate_names_android))
print('Length of final clean Google Play Store dataset is:', len(clean_data_android), '\n')
Expected length of cleaned Google Play Store dataset is 8904
Length of final clean Google Play Store dataset is: 8904
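The de-duplication rule and the length check can also be tried on a handful of rows. The sketch below is not the code used in this notebook: the helper names count_duplicates() and keep_most_reviewed() are hypothetical, the rows are trimmed to two columns, and the last row is made up for illustration. It uses collections.Counter, which avoids the repeated "name in list" lookups.

```python
from collections import Counter

# Toy rows: [name, reviews]. The first three mirror the duplicate rows printed
# earlier; the last row is made up for illustration.
rows = [
    ['Quick PDF Scanner + OCR FREE', '80805'],
    ['Quick PDF Scanner + OCR FREE', '80805'],
    ['Quick PDF Scanner + OCR FREE', '80804'],
    ['Hypothetical App', '10'],
]

def count_duplicates(rows, name_index):
    """Count rows whose name has already appeared in an earlier row."""
    counts = Counter(row[name_index] for row in rows)
    return sum(n - 1 for n in counts.values())

def keep_most_reviewed(rows, name_index, reviews_index):
    """Keep one row per name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        if name not in best or float(row[reviews_index]) > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

duplicates = count_duplicates(rows, 0)
cleaned = keep_most_reviewed(rows, 0, 1)
unique_names = {row[0] for row in rows}

# Two independent counts of the expected length should agree.
print(len(rows) - duplicates, len(unique_names), len(cleaned))
```

Both ways of counting (total rows minus duplicates, and the number of distinct names) should match the length of the cleaned list.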
Since my target markets are English-speaking, I'd like to remove applications with non-English names from both datasets. To do that, I write a function is_english() that treats a name as English if it contains at most three characters outside the ASCII range, and apply it to the Apple Store and Google Play Store datasets.
# define a function to check the name
def is_english(app):
    number_of_false = 0
    for letter in app:
        if ord(letter) > 127:
            number_of_false += 1
    # allow up to three non-ASCII characters (emoji, trademark signs, etc.)
    return number_of_false < 4
# iterate over Apple Store dataset
apple_store = []
for row in clean_data_ios:
    name = row[1]
    if is_english(name):
        apple_store.append(row)
print("Final list of Apple Store apps has {} rows".format(len(apple_store)))
# iterate over Google Play Store
google_play_store = []
for row in clean_data_android:
    name = row[0]
    if is_english(name):
        google_play_store.append(row)
print("Final list of Google Play Store apps has {} rows".format(len(google_play_store)))
Final list of Apple Store apps has 3220 rows
Final list of Google Play Store apps has 8863 rows
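To see how the non-ASCII threshold behaves, here is a small sketch with a few sample names. The helpers count_non_ascii() and looks_english() are hypothetical names that restate the same rule; two of the names appear in the outputs of this notebook, and the all-Japanese one is made up.

```python
def count_non_ascii(name):
    """Count characters whose code point is outside the ASCII range (> 127)."""
    return sum(1 for ch in name if ord(ch) > 127)

def looks_english(name, max_non_ascii=3):
    """Same rule as is_english(): tolerate up to three non-ASCII characters."""
    return count_non_ascii(name) <= max_non_ascii

samples = [
    'Instagram',
    'Wattpad 📖 Free Books',   # one emoji
    '知乎',                     # short non-English name
    'アプリ名テスト',            # made-up name, all non-ASCII
]

for name in samples:
    print(name, '->', count_non_ascii(name), looks_english(name))
```

The threshold keeps names with an emoji or a trademark sign, but it also lets very short non-English names through (for example 知乎, which has only two characters), so the filter is an approximation rather than a real language check.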
The goal of the analysis is to find an idea for an application that could be successful in both markets, the Apple Store and Google Play Store. According to the assignment, the app will first be developed for Google Play Store and, if it is successful, rolled out to the Apple Store.
My first step is to identify the most common genres. I will use the column prime_genre for the Apple Store (12th position) and the column Genres for Google Play Store (10th position) to count the most common genres.
# display once more names of the columns
print('Column names for Apple Store dataset are: ', header_ios, '\n')
print('Column names for Google Play Store are: ', header_android, '\n')
# define function to create a frequency table
def freq_table(dataset, index):
    counts = {}  # renamed so it no longer shadows the function name
    total = len(dataset)
    for row in dataset:
        token = row[index]
        if token in counts:
            counts[token] += 1
        else:
            counts[token] = 1
    # calculate percentages
    freq_percentages = {}
    for key in counts:
        percentage = (counts[key]/total)*100
        freq_percentages[key] = round(percentage, 2)
    # print result in descending order
    for key in sorted(freq_percentages, key=freq_percentages.get, reverse=True):
        print(key,':', freq_percentages[key])
    return freq_percentages
Column names for Apple Store dataset are: ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
Column names for Google Play Store are: ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
# iterate over Apple Store dataset
prime_genres = freq_table(apple_store, 11)
print('The column prime genre includes {} genres total.'.format(len(prime_genres)), '\n')
Games : 58.14
Entertainment : 7.89
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.52
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.34
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12
The column prime genre includes 23 genres total.
# iterate over Google Play Store dataset
# review column 'Genres'
genres = freq_table(google_play_store, 9)
print('The column Genres includes {} genres total.'.format(len(genres)), '\n')
# review column 'Category'
category = freq_table(google_play_store, 1)
print('The column Category includes {} genres total.'.format(len(category)))
Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Lifestyle : 3.89
Productivity : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Libraries & Demo : 0.94
Role Playing : 0.94
Auto & Vehicles : 0.93
Strategy : 0.9
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Art & Design : 0.6
Beauty : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Entertainment;Music & Video : 0.17
Puzzle;Brain Games : 0.17
Racing;Action & Adventure : 0.17
Casual;Action & Adventure : 0.14
Casual;Brain Games : 0.14
Arcade;Action & Adventure : 0.12
Action;Action & Adventure : 0.1
Educational;Pretend Play : 0.09
Entertainment;Brain Games : 0.08
Simulation;Action & Adventure : 0.08
Board;Brain Games : 0.08
Parenting;Education : 0.08
Art & Design;Creativity : 0.07
Casual;Creativity : 0.07
Educational;Brain Games : 0.07
Parenting;Music & Video : 0.07
Education;Pretend Play : 0.06
Education;Creativity : 0.05
Role Playing;Pretend Play : 0.05
Education;Music & Video : 0.03
Education;Action & Adventure : 0.03
Education;Brain Games : 0.03
Entertainment;Creativity : 0.03
Adventure;Action & Adventure : 0.03
Educational;Creativity : 0.03
Role Playing;Action & Adventure : 0.03
Educational;Action & Adventure : 0.03
Entertainment;Action & Adventure : 0.03
Puzzle;Action & Adventure : 0.03
Casual;Education : 0.02
Music;Music & Video : 0.02
Simulation;Pretend Play : 0.02
Puzzle;Creativity : 0.02
Sports;Action & Adventure : 0.02
Board;Action & Adventure : 0.02
Entertainment;Pretend Play : 0.02
Video Players & Editors;Music & Video : 0.02
Art & Design;Pretend Play : 0.01
Art & Design;Action & Adventure : 0.01
Comics;Creativity : 0.01
Lifestyle;Pretend Play : 0.01
Entertainment;Education : 0.01
Arcade;Pretend Play : 0.01
Strategy;Action & Adventure : 0.01
Music & Audio;Music & Video : 0.01
Health & Fitness;Education : 0.01
Adventure;Education : 0.01
Casual;Music & Video : 0.01
Video Players & Editors;Creativity : 0.01
Travel & Local;Action & Adventure : 0.01
Tools;Education : 0.01
Parenting;Brain Games : 0.01
Health & Fitness;Action & Adventure : 0.01
Trivia;Education : 0.01
Lifestyle;Education : 0.01
Card;Action & Adventure : 0.01
Books & Reference;Education : 0.01
Simulation;Education : 0.01
Puzzle;Education : 0.01
Role Playing;Brain Games : 0.01
Strategy;Education : 0.01
Racing;Pretend Play : 0.01
Communication;Creativity : 0.01
Strategy;Creativity : 0.01
The column Genres includes 114 genres total.

FAMILY : 18.9
GAME : 9.73
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6
The column Category includes 33 genres total.
Apple Store dataset
Google Play Store
Comparison
One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play Store dataset, I can find this information in the Installs column, but it is missing from the App Store dataset. As a workaround, I'll take the total number of user ratings as a proxy, which can be found in the rating_count_tot column.
# collect the number of user ratings for every genre in Apple Store dataset
ratings_dict = {}
for row in apple_store:
    genre = row[11]
    rating = float(row[5])
    ratings_dict.setdefault(genre, []).append(rating)
# calculate average
for genre, rating in ratings_dict.items():
    avg_rating = round(sum(rating)/len(rating))
    ratings_dict[genre] = avg_rating
# sort and print resulting dictionary
for genre in sorted(ratings_dict, key=ratings_dict.get, reverse=True):
    print(genre, ratings_dict[genre])
Navigation 86090
Reference 74942
Social Networking 71548
Music 57327
Weather 52280
Book 39758
Food & Drink 33334
Finance 31468
Photo & Video 28442
Travel 28244
Shopping 26920
Health & Fitness 23298
Sports 23009
Games 22813
News 21248
Productivity 21028
Utilities 18684
Lifestyle 16486
Entertainment 14030
Business 7491
Education 7004
Catalogs 4004
Medical 612
To use the column Installs from the Google Play Store dataset, I first need to explore how that data is formatted and represented.
After printing a few values from the column, I realise that they are strings with comma separators and a trailing plus sign (for example 10,000+), so they are open-ended lower bounds rather than exact counts.
I will clean the values and convert them to integers, and use these numbers as an approximation of the real number of installs.
# check the format of the column Installs
for row in google_play_store[:5]:
    print(row[5])
# convert the column Installs from Google Play Store dataset to integers
for row in google_play_store:
    row[5] = int(row[5].replace('+', '').replace(',', ''))
10,000+
500,000+
5,000,000+
50,000,000+
100,000+
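Because the conversion above rewrites the column in place, running the cell a second time would call .replace() on an integer and fail. A minimal sketch of a helper that tolerates already-converted values follows; the name parse_installs() is hypothetical and not part of the original notebook.

```python
def parse_installs(value):
    """Convert an Installs cell such as '10,000+' to an int lower bound.

    Values that are already integers (e.g. after the cell has been run once)
    are returned unchanged.
    """
    if isinstance(value, int):
        return value
    return int(value.replace(',', '').replace('+', ''))

# String input in the dataset's format and an already-converted value.
print(parse_installs('10,000+'))
print(parse_installs(5000000))
```

With such a helper, the conversion loop could be written as row[5] = parse_installs(row[5]) and re-run safely.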
# collect the number of installs for every category in Google Play Store
installs_dict = {}
for row in google_play_store:
    category = row[1]
    installs = row[5]
    installs_dict.setdefault(category, []).append(installs)
# calculate average
for category, installs in installs_dict.items():
    avg_installs = round(sum(installs)/len(installs))
    installs_dict[category] = avg_installs
# sort and print resulting dictionary
for category in sorted(installs_dict, key=installs_dict.get, reverse=True):
    print(category, installs_dict[category])
COMMUNICATION 38456119
VIDEO_PLAYERS 24727872
SOCIAL 23253652
PHOTOGRAPHY 17840110
PRODUCTIVITY 16787331
GAME 15588016
TRAVEL_AND_LOCAL 13984078
ENTERTAINMENT 11640706
TOOLS 10801391
NEWS_AND_MAGAZINES 9549178
BOOKS_AND_REFERENCE 8767812
SHOPPING 7036877
PERSONALIZATION 5201483
WEATHER 5074486
HEALTH_AND_FITNESS 4188822
MAPS_AND_NAVIGATION 4056942
FAMILY 3697848
SPORTS 3638640
ART_AND_DESIGN 1986335
FOOD_AND_DRINK 1924898
EDUCATION 1833495
BUSINESS 1712290
LIFESTYLE 1437816
FINANCE 1387692
HOUSE_AND_HOME 1331541
DATING 854029
COMICS 817657
AUTO_AND_VEHICLES 647318
LIBRARIES_AND_DEMO 638504
PARENTING 542604
BEAUTY 513152
EVENTS 253542
MEDICAL 120551
For the Apple Store dataset, the top-3 most popular genres are Navigation, Reference and Social Networking. For Google Play Store, the top-3 categories are Communication, Video Players and Social. But it is quite likely that each average is driven by a few leaders in the category, e.g. Facebook with millions of users in the Social category.
To check that, I'll first examine the most-installed apps (more than 100 million installs) in each of the top-3 categories from the Google Play Store dataset.
# check the most installed apps at Google Play Store in top-3 categories
# communication
print('The most popular apps in communication category are:', '\n')
for row in google_play_store:
    if row[1] == 'COMMUNICATION' and row[5] > 100000000:
        print(row[0], ':', row[5])
print('\n')
# video players
print('The most popular apps in video players category are:', '\n')
for row in google_play_store:
    if row[1] == 'VIDEO_PLAYERS' and row[5] > 100000000:
        print(row[0], ':', row[5])
print('\n')
# social
print('The most popular apps in social category are:', '\n')
for row in google_play_store:
    if row[1] == 'SOCIAL' and row[5] > 100000000:
        print(row[0], ':', row[5])
The most popular apps in communication category are:
Messenger – Text and Video Chat for Free : 1000000000
WhatsApp Messenger : 1000000000
Google Chrome: Fast & Secure : 1000000000
Gmail : 1000000000
Hangouts : 1000000000
Viber Messenger : 500000000
imo free video calls and chat : 500000000
Google Duo - High Quality Video Calls : 500000000
UC Browser - Fast Download Private & Secure : 500000000
Skype - free IM & video calls : 1000000000
LINE: Free Calls & Messages : 500000000

The most popular apps in video players category are:
YouTube : 1000000000
Google Play Movies & TV : 1000000000
MX Player : 500000000

The most popular apps in social category are:
Facebook : 1000000000
Instagram : 1000000000
Facebook Lite : 500000000
Snapchat : 500000000
Google+ : 1000000000
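Another way to check whether a few giants drive a category average is to compare the mean with the median. The sketch below is only an illustration: the five large values are the install counts of the social apps listed above, while the six small values are made up to stand in for the long tail of smaller apps, and summarize() is a hypothetical helper.

```python
from statistics import mean, median

# Five large values: install counts of the social apps listed above.
# Six small values: made-up numbers standing in for the long tail.
social_installs = [
    1000000000, 1000000000, 500000000, 500000000, 1000000000,
    500000, 100000, 50000, 10000, 5000, 1000,
]

def summarize(installs):
    """Return the rounded mean and the median of a list of install counts."""
    return {'mean': round(mean(installs)), 'median': median(installs)}

summary = summarize(social_installs)
print(summary)
```

When the mean is orders of magnitude above the median, as in this toy sample, the average mostly reflects the few leaders rather than a typical app in the category.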
Indeed, the top-3 categories are dominated by a few apps attracting millions of users. If we would like to compete with them successfully, the only way is to offer fundamentally new approaches or functionality, which is certainly not an easy task. A better approach is probably to choose categories that are in the middle of the list. To find them, I first calculate the mean of the per-category averages.
# calculate average total number of installs for Google Play Store
total = 0
for category in installs_dict:
    total += int(installs_dict[category])
mean_installs = total/len(installs_dict)
print(round(mean_installs))
7281600
The mean of the category averages is about 7,281,600 installs. Two categories in our list are in that range: Books_and_reference (8,767,812) and Shopping (7,036,877). Next I will explore them.
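The choice of categories near the mean can also be made programmatically. The following is a sketch under stated assumptions: closest_to_target() is a hypothetical helper, and the dictionary holds only a subset of the per-category averages printed earlier.

```python
def closest_to_target(averages, target, k=2):
    """Return the k category names whose average is nearest to target."""
    return sorted(averages, key=lambda name: abs(averages[name] - target))[:k]

# Subset of the per-category averages printed earlier in this notebook.
averages = {
    'TOOLS': 10801391,
    'NEWS_AND_MAGAZINES': 9549178,
    'BOOKS_AND_REFERENCE': 8767812,
    'SHOPPING': 7036877,
    'PERSONALIZATION': 5201483,
}

nearest = closest_to_target(averages, 7281600)
print(nearest)  # ['SHOPPING', 'BOOKS_AND_REFERENCE']
```

Ranking by absolute distance from the mean gives the same two categories that were picked by eye.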
# sort apps by number of installs, largest first
# (named sorted_apps so the built-in sorted() is not overwritten)
sorted_apps = sorted(google_play_store, key=lambda x: x[5], reverse=True)
# books_and_reference
print('The most popular apps in Books_and_reference category are:', '\n')
for row in sorted_apps:
    if row[1] == 'BOOKS_AND_REFERENCE' and row[5] > 10000000:
        print(row[0], ':', row[5])
print('\n')
# shopping
print('The most popular apps in Shopping category are:', '\n')
for row in sorted_apps:
    if row[1] == 'SHOPPING' and row[5] > 10000000:
        print(row[0], ':', row[5])
The most popular apps in Books_and_reference category are:
Google Play Books : 1000000000
Wattpad 📖 Free Books : 100000000
Amazon Kindle : 100000000
Bible : 100000000
Audiobooks from Audible : 100000000

The most popular apps in Shopping category are:
Wish - Shopping Made Fun : 100000000
AliExpress - Smarter Shopping, Better Living : 100000000
eBay: Buy & Sell this Summer - Discover Deals Now! : 100000000
Amazon Shopping : 100000000
Flipkart Online Shopping App : 100000000
letgo: Buy & Sell Used Stuff, Cars & Real Estate : 50000000
Lazada - Online Shopping & Deals : 50000000
OLX - Buy and Sell : 50000000
The birth : 50000000
Mercado Libre: Find your favorite brands : 50000000
Myntra Online Shopping App : 50000000
Groupon - Shop Deals, Discounts & Coupons : 50000000
The category Books_and_reference looks more interesting, mostly because, apart from a few leaders like Google Play Books or Amazon Kindle that sell all sorts of books, it contains an app with free books (Wattpad) as well as a religious text (Bible). I'd like to check which apps are in the middle range in terms of total number of installs.
# check middle range
print('Apps of the middle range in Books_and_reference category:', '\n')
for row in sorted_apps:
    if row[1] == 'BOOKS_AND_REFERENCE' and 4000000 < row[5] < 10000000:
        print(row[0], ':', row[5])
print('\n')
Apps of the middle range in Books_and_reference category:
AlReader -any text book reader : 5000000
Ebook Reader : 5000000
Read books online : 5000000
Ancestry : 5000000
Dictionary - WordWeb : 5000000
50000 Free eBooks & Free AudioBooks : 5000000
Al Quran : EAlim - Translations & MP3 Offline : 5000000
Bible KJV : 5000000
English to Hindi Dictionary : 5000000
Apps in the middle range also include religious texts, not only book wholesalers or dictionaries. The next step is to find out whether the same genres are popular among Apple Store users. Let's check the top-3 genres in the Apple Store dataset.
# check apps mostly rated at Apple Store
# navigation
print('There are few apps at Navigation genre:')
for row in apple_store:
    if row[11] == 'Navigation':
        print(row[1], ':', row[5])
print('\n')
# reference
print('There are more apps at Reference genre dominated by dictionaries and religious texts:')
for row in apple_store:
    if row[11] == 'Reference':
        print(row[1], ':', row[5])
print('\n')
# social networking
print('There are plenty apps at Social Networking genre but dominated by few of them:')
for row in apple_store:
    if row[11] == 'Social Networking':
        print(row[1], ':', row[5])
There are few apps at Navigation genre:
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5

There are more apps at Reference genre dominated by dictionaries and religious texts:
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0

There are plenty apps at Social Networking genre but dominated by few of them:
Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 23965
SimSimi : 23530
Grindr - Gay and same sex guys chat, meet and date : 23201
Wishbone - Compare Anything : 20649
imo video calls and chat : 18841
After School - Funny Anonymous School News : 18482
Quick Reposter - Repost, Regram and Reshare Photos : 17694
Weibo HD : 16772
Repost for Instagram : 15185
Live.me – Live Video Chat & Make Friends Nearby : 14724
Nextdoor : 14402
Followers Analytics for Instagram - InstaReport : 13914
YouNow: Live Stream Video Chat : 12079
FollowMeter for Instagram - Followers Tracking : 11976
LINE : 11437
eHarmony™ Dating App - Meet Singles : 11124
Discord - Chat for Gamers : 9152
QQ : 9109
Telegram Messenger : 7573
Weibo : 7265
Periscope - Live Video Streaming Around the World : 6062
Chat for Whatsapp - iPad Version : 5060
QQ HD : 5058
Followers Analysis Tool For Instagram App Free : 4253
live.ly - live video streaming : 4145
Houseparty - Group Video Chat : 3991
SOMA Messenger : 3232
Monkey : 3060
Down To Lunch : 2535
Flinch - Video Chat Staring Contest : 2134
Highrise - Your Avatar Community : 2011
LOVOO - Dating Chat : 1985
PlayStation®Messages : 1918
BOO! - Video chat camera with filters & stickers : 1805
Qzone : 1649
Chatous - Chat with new people : 1609
Kiwi - Q&A : 1538
GhostCodes - a discovery app for Snapchat : 1313
Jodel : 1193
FireChat : 1037
Google Duo - simple video calling : 1033
Fiesta by Tango - Chat & Meet New People : 885
Google Allo — smart messaging : 862
Peach — share vividly : 727
Hey! VINA - Where Women Meet New Friends : 719
Battlefield™ Companion : 689
All Devices for WhatsApp - Messenger for iPad : 682
Chat for Pokemon Go - GoChat : 500
IAmNaughty – Dating App to Meet New People Online : 463
Qzone HD : 458
Zenly - Locate your friends in realtime : 427
League of Legends Friends : 420
豆瓣 : 407
Candid - Speak Your Mind Freely : 398
知乎 : 397
Selfeo : 366
Fake-A-Location Free ™ : 354
Popcorn Buzz - Free Group Calls : 281
Fam — Group video calling for iMessage : 279
QQ International : 274
Ameba : 269
SoundCloud Pulse: for creators : 240
Tantan : 235
Cougar Dating & Life Style App for Mature Women : 213
Rawr Messenger - Dab your chat : 180
WhenToPost: Best Time to Post Photos for Instagram : 158
Inke—Broadcast an amazing life : 147
Mustknow - anonymous video Q&A : 53
CTFxCmoji : 39
Lobi : 36
Chain: Collaborate On MyVideo Story/Group Video : 35
botman - Real time video chat : 7
BestieBox : 0
MATCH ON LINE chat : 0
niconico ch : 0
LINE BLOG : 0
bit-tube - Live Stream Video Chat : 0
After exploring the Apple Store dataset, I see that the Reference genre is quite diverse: dictionaries and religious texts are prominent, but it is not dominated by a few leaders, so building an app in this genre could be an option.
In this project, I analyzed datasets from the Apple Store and Google Play Store with the goal of recommending an app genre that can be profitable in both markets. My final recommendation is to build an application based around a niche but commonly used text. A perfect candidate would be some sort of religious or quasi-religious cult text (Marie Kondo, Harry Potter or the Satanic Bible?). User experience can be enhanced with audio versions or even gamification. Of course, that recommendation should be backed by further study.