#!/usr/bin/env python
# coding: utf-8

# # Profitable App Profiles for the App Store and Google Play Markets
# ## Introduction
# The aim of this project is to identify profiles of mobile apps that are likely to be profitable on both Google Play (Android) and the App Store (iOS).
# 
# The apps under consideration are free to download and install, and the company's main source of revenue is in-app ads. This means that the revenue for any given app depends mostly on its number of users: the more users see and engage with the ads, the better. Our goal is therefore to analyze the available data and understand which types of apps are likely to attract more users on both Google Play and the App Store.
# ## 1. Data Collection and Exploration
# As of [September 2018](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/), there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.
# 
# Collecting data for over 4 million apps would require a significant amount of time and money, so we'll first analyze a sample of the data instead, using existing data sets that are available at no cost. Two such data sets are available as CSV files:
# 
# - [Android apps data set](https://www.kaggle.com/lava18/google-play-store-apps) contains data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018.
# - [iOS apps data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) contains data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017.
# 
# To explore these two data sets, we define a helper function `explore_data()`, which prints a slice of rows and, optionally, the number of rows and columns:

# In[1]:


def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')  # adds blank lines after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


# In[2]:


# Opening the data sets and saving both as lists of lists
from csv import reader

# The files contain non-ASCII app names, so the encoding is set explicitly;
# 'with' closes each file once it has been read
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]


# In[3]:


explore_data(android, 0, 3, True)


# In[4]:


explore_data(ios, 0, 3, True)


# In[5]:


# Android data set columns
print(android_header)

print('\n')

# iOS data set columns
print(ios_header)


# The Google Play data set (Android apps) contains 10,841 apps and 13 columns. The most informative columns for us seem to be the following: `'App'`, `'Category'`, `'Rating'`, `'Reviews'`, `'Installs'`, `'Type'`, `'Price'`, `'Content Rating'` and `'Genres'`.
# 
# The App Store data set (iOS apps) contains 7,197 apps and 16 columns. The columns most likely to be useful for our analysis are the following: `'track_name'`, `'currency'`, `'price'`, `'rating_count_tot'`, `'rating_count_ver'`, `'user_rating'`, `'user_rating_ver'`, `'cont_rating'` and `'prime_genre'`.
# 
# For further details about both data sets and the meaning of each column, see the documentation of the
# [Android apps data set](https://www.kaggle.com/lava18/google-play-store-apps) and the [iOS apps data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).
# ## 2. Data Cleaning
# ### 2.1. Deleting Wrong Data
# Both data sets have discussion sections: [for Google Play](https://www.kaggle.com/lava18/google-play-store-apps/discussion) and [for the App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion). In [one of the topics](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) of the **Google Play data set** discussion, an error was reported in row 10,472: a missing `'Rating'` and a shift of the following columns.

# In[6]:


print(android_header)
print('\n')
print(android[10472])


# Inspecting the reported row, we can see that the missing value is actually `'Category'`, not `'Rating'`, and that `'Genres'` has no value either. For comparison, let's print another row of this data set:

# In[7]:


print(android_header)
print('\n')
print(android[5])


# Row 10,472 indeed lacks a `'Category'` value and has an empty cell for `'Genres'`, and all the values in between are shifted one position to the left. This row has to be removed from the data set:

# In[8]:


# Run this cell only once: each run deletes whichever row is currently at index 10472
del android[10472]
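# A hard-coded index is fragile if the cell is run more than once. As an alternative sketch (not the approach used above), rows whose number of fields differs from the header can be located programmatically. `find_malformed_rows()` is a hypothetical helper introduced here for illustration, and the example uses made-up rows:

```python
def find_malformed_rows(header, rows):
    """Return the indices of rows whose length differs from the header's.

    A shifted row such as the one reported in the discussion has one field
    fewer than the header, so comparing lengths is enough to spot it.
    """
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Small synthetic example (not taken from the real data set)
header = ['App', 'Category', 'Rating']
rows = [['A', 'TOOLS', '4.1'], ['B', '4.5'], ['C', 'GAME', '3.9']]
print(find_malformed_rows(header, rows))  # → [1]
```

# Applied to `android_header` and `android` before the deletion, the result would be expected to include index 10472, since that row is missing a field.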


# ### 2.2. Deleting Duplicates
# Exploring the **Google Play data set**, we discovered that some apps have duplicate entries. For instance, Instagram has 4 entries:

# In[9]:


for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)


# In total, there are 1,181 duplicate entries, i.e. rows whose app name already appeared in an earlier row:

# In[10]:


# Creating the lists of duplicate apps and unique apps
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])
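# The loop above checks membership in a list, which gets slower as the list grows. A sketch of the same count using `collections.Counter` from the standard library; `count_duplicate_entries()` is a hypothetical helper, and the sample rows are made up:

```python
from collections import Counter


def count_duplicate_entries(rows, name_index=0):
    """Count rows whose app name already appeared in an earlier row.

    This matches the definition used above: an app listed k times
    contributes k - 1 duplicate entries.
    """
    counts = Counter(row[name_index] for row in rows)
    return sum(n - 1 for n in counts.values())


sample = [['Instagram'], ['Instagram'], ['Waze'], ['Instagram']]
print(count_duplicate_entries(sample))  # → 2
```

# On the real data, `count_duplicate_entries(android)` should agree with `len(duplicate_apps)`.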


# We need to remove the duplicate entries and keep only one entry per app. We could remove the duplicate rows randomly, but there is a better criterion.
# 
# Returning to the rows we printed for the Instagram app, the main difference is in the fourth position of each row (index 3), which holds the number of reviews. The different numbers show that the data was collected at different times:

# In[11]:


for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)


# We can use this information to build a criterion for removing the duplicates. The higher the number of reviews, the more recent the data should be. Rather than removing duplicates randomly, we'll only keep the row with the highest number of reviews and remove the other entries for any given app.

# In[12]:


# Creating a dictionary with the highest number of reviews for each app
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews


# Since 1,181 duplicate entries were detected in the Google Play data set, removing them should leave 10,840 - 1,181 = 9,659 rows (10,840 being the number of rows after deleting the wrong one). The dictionary should likewise contain 9,659 keys, one per unique app:

# In[13]:


print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))


# In[14]:


# Creating a new data set without duplicates (one entry per app)
android_clean = []
already_added = []  # guards against duplicates that share the same maximum review count
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
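# The two-step procedure (first find the maximum, then keep the first row reaching it) can also be written as a single pass. A minimal sketch, assuming the review count sits at index 3 as in the Google Play data set; `keep_most_reviewed()` is a hypothetical helper and the sample rows are made up:

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


sample = [
    ['Instagram', 'SOCIAL', '4.5', '100'],
    ['Waze', 'MAPS', '4.6', '50'],
    ['Instagram', 'SOCIAL', '4.5', '250'],
]
for row in keep_most_reviewed(sample):
    print(row)
```

# One design difference: the result is ordered by the first appearance of each app name, not by the position of the kept row, so positional lookups such as `android_clean[442]` may return a different app if this version is used instead.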


# Checking the length of the resulting data set (again, the expected value is 9,659):

# In[15]:


print(len(android_clean))


# In[16]:


print(android_clean[:5])


# ### 2.3. Deleting Non-English Apps
# Since our company develops apps only in English, we'd like to analyze only the apps that are directed toward an English-speaking audience.
# 
# Inspecting both data sets, we found that both also contain apps with non-English names, i.e. names containing characters that are unusual in English text (anything other than English letters, the digits 0-9, punctuation marks and common special symbols). These apps have to be removed.

# In[17]:


print(ios[813][1])
print(ios[6731][1])
print('\n')
print(android_clean[442][0])
print(android_clean[7940][0])


# In the ASCII standard, the characters commonly used in English text correspond to the numbers 0 to 127, and Python's built-in `ord()` function returns the number (code point) of a character. We can therefore write a function that checks whether every character of an app name falls within this range. If any character doesn't, the app is treated as non-English and removed from the data set.
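# For instance, `ord()` gives values up to 127 for plain English letters and digits, while symbols such as `™` or emojis map to much larger numbers:

```python
for symbol in ['a', 'Z', '9', '™', '😜']:
    print(repr(symbol), ord(symbol))
# 'a' 97, 'Z' 90, '9' 57, '™' 8482, '😜' 128540
```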

# In[18]:


def english_apps(string):
    for symbol in string:
        if ord(symbol) > 127:
            return False
    return True


# Let's check this function on some apps:

# In[19]:


print(english_apps('Instagram'))
print(english_apps('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_apps('Docs To Go™ Free Office Suite'))
print(english_apps('Instachat 😜'))


# The results show that the function misclassifies some English app names containing emojis or special characters such as `™`, because these characters fall outside the ASCII range. Removing such apps would mean losing valuable data.
# 
# To minimize data loss, we'll only remove an app if its name contains more than 3 characters outside the ASCII range. This means that English apps with up to 3 such characters will still be labeled as English. The heuristic is not perfect: a non-English name with 3 or fewer non-ASCII characters would also pass.

# In[20]:


# Editing the previous function: tolerate up to 3 non-ASCII characters
def english_apps(string):
    non_ascii = 0
    for symbol in string:
        if ord(symbol) > 127:
            non_ascii += 1
        if non_ascii > 3:
            return False
    return True
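# An equivalent, more compact sketch of the same heuristic. It assumes Python 3.7 or later, where `str.isascii()` is available; `is_english()` is a hypothetical name, and `max_non_ascii` mirrors the threshold of 3 chosen above:

```python
def is_english(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for symbol in name if not symbol.isascii())
    return non_ascii <= max_non_ascii


print(is_english('Docs To Go™ Free Office Suite'))  # → True
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # → False
```

# Counting all characters before deciding is slightly slower than the early `return False` above, but the result is the same for every input.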


# In[21]:


# Checking the updated function
print(english_apps('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_apps('Docs To Go™ Free Office Suite'))
print(english_apps('Instachat 😜'))


# Now we will filter out non-English apps from both data sets:

# In[22]:


android_cleaned_filtered = []
ios_filtered = []

for app in android_clean:
    if english_apps(app[0]):  # 'App' column
        android_cleaned_filtered.append(app)

for app in ios:
    if english_apps(app[1]):  # 'track_name' column
        ios_filtered.append(app)


# In[23]:


explore_data(android_cleaned_filtered, 0, 3, True)


# In[24]:


explore_data(ios_filtered, 0, 3, True)


# After filtering, the Android data set has 9,614 rows and the iOS data set has 6,183 rows.
# ### 2.4. Deleting Non-Free Apps
# The company builds only free apps. Hence, before proceeding to the data analysis step, we have to remove all non-free apps from both data sets.

# In[25]:


android_final = []
ios_final = []

for app in android_cleaned_filtered:
    if app[7] == '0':  # 'Price' column of the Google Play data set
        android_final.append(app)

for app in ios_filtered:
    if app[4] == '0.0':  # 'price' column of the App Store data set
        ios_final.append(app)
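# The comparisons above rely on exact string matches (`'0'` for Google Play, `'0.0'` for the App Store). A slightly more defensive sketch converts the price to a number first, so that variants such as `'0.00'` or `'$0'` would also count as free. The helper `is_free()` and the `'$'` handling are assumptions for illustration, not something the analysis above requires:

```python
def is_free(price):
    """Return True if a price string represents zero, e.g. '0', '0.0' or '$0'."""
    try:
        return float(price.replace('$', '').strip()) == 0.0
    except ValueError:
        # Non-numeric values (e.g. from a shifted row) are not treated as free
        return False


# Possible usage on the real data:
# android_final = [app for app in android_cleaned_filtered if is_free(app[7])]
# ios_final = [app for app in ios_filtered if is_free(app[4])]
```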


# In[26]:


print('Final number of android apps:', len(android_final))
print('Final number of iOS apps:', len(ios_final))


# Now we have 8,864 Android apps and 3,222 iOS apps for further data analysis.
# ## 3. Data Analysis
# As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users, because our revenue is highly influenced by the number of people using our apps.
# 
# To minimize risks and overhead, our validation strategy for an app idea consists of three steps:
# 
# - Build a minimal Android version of the app, and add it to Google Play.
# - If the app has a good response from users, we develop it further.
# - If the app is profitable after 6 months, we build an iOS version of the app and add it to the App Store.
# 
# Because our final goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets.
# ### 3.1. Finding The Most Common Genres
# Let's begin the analysis by getting a sense of the most common genres in each market. In the Google Play data set, app genres are described by the `'Category'` and `'Genres'` columns; in the App Store data set, by the `'prime_genre'` column.
# 
# We'll build two functions to analyze the frequency tables:
# 
# - `freq_table()`, to generate frequency tables that show percentages;
# - `display_table()`, to display the percentages of a frequency table in descending order.

# In[27]:


def freq_table(dataset, index):
    dictionary = {}
    number_apps = 0
    for row in dataset:
        number_apps += 1
        dictionary[row[index]] = dictionary.get(row[index], 0) + 1
            
    dictionary_percent = {}
    for key in dictionary:
        dictionary_percent[key] = (dictionary[key] / number_apps) * 100
        
    return dictionary_percent
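# A sketch of the same percentage frequency table built with `collections.Counter`; `freq_table_counter()` is a hypothetical name chosen so that it does not shadow `freq_table()`, and the sample rows are made up:

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.items()}


rows = [['Games'], ['Games'], ['Music'], ['Games']]
print(freq_table_counter(rows, 0))  # → {'Games': 75.0, 'Music': 25.0}
```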


# In[28]:


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
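# Instead of building `(value, key)` tuples, the table can also be sorted directly on its values with a `key` function. A sketch that returns the ranked entries rather than printing them; `top_entries()` is a hypothetical helper and the table values are made up:

```python
def top_entries(table, n=None):
    """Return (key, percentage) pairs sorted by percentage, highest first."""
    ranked = sorted(table.items(), key=lambda item: item[1], reverse=True)
    return ranked if n is None else ranked[:n]


table = {'A': 50.0, 'B': 30.0, 'C': 20.0}
print(top_entries(table, 2))  # → [('A', 50.0), ('B', 30.0)]
```

# One behavioral difference: on ties, `sorted()` is stable and keeps the dictionary's order, whereas sorting `(value, key)` tuples in reverse breaks ties by key in reverse alphabetical order.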


# In[29]:


# Prime_genre column
display_table(ios_final, -5)


# Among **iOS English free apps**, the most common genre is *Games* (58%), followed at a large distance by *Entertainment* (7.9%). The general impression is that apps designed for entertainment (games, photo and video, social networking, sports, music) clearly dominate the App Store compared with apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle).
# 
# Judging only by the frequency table, we still cannot recommend an app profile for the App Store market, because a large number of apps in a particular genre does not necessarily imply that apps of that genre have a large number of users.

# In[30]:


# Category column
display_table(android_final, 1)


# Among **Android English free apps**, the most common categories are also entertainment-oriented: *FAMILY* (18.9%) and *GAME* (9.7%). However, the percentages of the other categories are much less dispersed than for iOS apps, and overall the landscape is more balanced between practical and entertainment apps. The number of categories is comparable to the number of iOS app genres.
# 
# If we look at the `'Genres'` column for Android apps, we will see that it is much more detailed, so it is no longer comparable to the number of iOS app genres:

# In[31]:


# Genres column
display_table(android_final, 9)


# As in the previous case, these frequency tables alone tell us nothing about which genres (categories) have the most users, so we cannot yet recommend an app profile for Google Play.
# ### 3.2. Finding The Most Popular Genres
# One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but this information is missing from **the App Store data set**. As a workaround, we'll use the total number of user ratings, found in the `rating_count_tot` column, as a proxy.

# In[32]:


# Calculating the average number of user ratings per app genre on the App Store:
prime_genre = freq_table(ios_final, -5)
for genre in prime_genre:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            number_rating = float(app[5])
            total += number_rating
            len_genre += 1
    average_number_rating = total / len_genre
    print(genre, ':', average_number_rating)   
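# The cell above scans the whole data set once per genre. A sketch that computes all averages in a single pass, with `collections.defaultdict` accumulating a running total and count per group; `average_by_group()` is a hypothetical helper and the sample rows are made up:

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Average the numeric column `value_index` within each `group_index` value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


rows = [['Book', '10'], ['Book', '30'], ['Music', '5']]
print(average_by_group(rows, 0, 1))  # → {'Book': 20.0, 'Music': 5.0}
```

# On the real data, `average_by_group(ios_final, -5, 5)` should reproduce the per-genre averages printed above.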


# Looking at the results, a preliminary conclusion is that the most popular app genres (based on the average number of user ratings) are the following:
# 
# - *Navigation* : 86090.33333333333
# - *Reference* : 74942.11111111111
# - *Social Networking* : 71548.34905660378
# - *Music* : 57326.530303030304
# - *Weather* : 52279.892857142855
# - *Book* : 39758.5
# 
# Let's look at each of them in more detail, in particular at the apps they contain:

# In[33]:


for genre in ['Navigation', 'Reference', 'Social Networking', 'Music', 'Weather', 'Book']:
    print(genre)
    for app in ios_final:
        if app[-5] == genre:  # 'prime_genre' column
            print(app[1], ':', app[5])  # app name : total number of user ratings
    print('\n')


# - *Navigation*: the average number of ratings seems to be heavily influenced by 2 giant apps, Waze and Google Maps, while all the other apps show quite low numbers. Since the distribution is heavily skewed, the average is not a good metric here. If we exclude these 2 dominant apps, the average becomes very low, which makes this genre less interesting for our purposes.
# - *Reference*: a similar situation is observed. The average is driven mostly by the Bible and dictionary apps, which are far ahead of all the other apps in this genre.
# - *Social Networking*: even though Facebook and Pinterest clearly dominate here too, there are many other social networks for different groups of people and interests, with rather high numbers of ratings. Recall also that entertainment apps are very common on the App Store, yet social networks are not the most common among them. Hence this niche is probably not oversaturated, while demand is still very high (the need to communicate is always present). Also, people usually spend a lot of time on this genre of apps. They use them for leisure, relaxed and open to new information, so an in-app ad has a better chance of attracting their attention.
# - *Music*: as for the first 2 genres, the distribution is skewed by several very popular apps. Most of the other apps have quite low numbers of ratings.
# - *Weather*: this genre doesn't seem particularly interesting for our purposes, since people generally don't spend much time in weather apps, so the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require connecting the app to paid APIs.
# - *Book*: apart from a few apps with extremely low numbers of ratings, there are practically only 5 apps in this genre. This niche is clearly undersaturated, which doesn't necessarily mean low demand. By focusing on specific topics in this area (e.g. personal growth, psychology, business) and developing high-quality content, we could probably attract a smaller but highly interested audience, increasing the app's chances of becoming profitable.
# 
# Thus, the most promising iOS app profiles seem to be **Social Networking** and **Book**.
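# The skew described for *Navigation*, *Reference* and *Music* can be made visible by comparing the mean with the median, which is far less sensitive to a handful of giant apps. A sketch using the standard `statistics` module on made-up rating counts (not the real App Store values):

```python
from statistics import mean, median

# Hypothetical rating counts: two giant apps and several small ones
rating_counts = [345046, 154911, 12, 40, 187, 230, 1500]

print('Mean:  ', round(mean(rating_counts)))  # → Mean:   71704
print('Median:', median(rating_counts))       # → Median: 230
```

# Computed per genre on `ios_final` (using `float(app[5])` as the value), a large gap between mean and median would flag genres whose average is driven by a few apps.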
# 
# Our next step is to provide an app profile recommendation for the **Google Play** market. We have data about the number of installs, so we should be able to get a clearer picture of genre popularity. However, the install numbers are not very precise: most values are open-ended (100+, 1,000+, 5,000+, etc.). We'll use these data anyway, after some cleaning: we keep each number as it is (i.e. as a lower bound), remove the commas and the plus characters, and convert the result to the `float` type.
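# The cleaning step can be wrapped in a small helper. A sketch with `parse_installs()` as a hypothetical name; like the existing loops in this notebook, it treats each open-ended value as its lower bound:

```python
def parse_installs(installs):
    """Convert an 'Installs' string such as '1,000,000+' to a float lower bound."""
    return float(installs.replace(',', '').replace('+', ''))


print(parse_installs('10,000+'))  # → 10000.0
```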

# In[34]:


# Calculating the average number of installs per app genre on Google Play
categories = freq_table(android_final, 1)
for category in categories:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            number_installs = app[5]
            number_installs = number_installs.replace('+', '')
            number_installs = number_installs.replace(',', '')
            total += float(number_installs)
            len_category += 1
    average_number_installs = total / len_category
    print(category, ':', average_number_installs)   


# We see that the most popular app genres (based on the average number of installs) are the following:
# 
# - *COMMUNICATION* : 38456119
# - *VIDEO_PLAYERS* : 24727872
# - *SOCIAL* : 23253652
# - *PHOTOGRAPHY* : 17840110
# - *PRODUCTIVITY* : 16787331
# - *GAME* : 15588015
# - *TRAVEL_AND_LOCAL* : 13984077
# 
# Let's look at the apps in these genres in more detail. These seemingly popular genres appear to be dominated by a few giant apps with 100 million or more installs, and such values inevitably bias the averages heavily.

# In[35]:


for app in android_final:
    if app[1] == 'COMMUNICATION':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])


# If we exclude these numerous giant apps from the *COMMUNICATION* genre, the average is reduced roughly 10 times:

# In[36]:


under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('COMMUNICATION')
print('Before:  38456119')
print('After:  ', average_number_installs)                                                     
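# The same computation is repeated below for several categories. A sketch of a reusable function, assuming the 100-million threshold used here and the `'Category'` and `'Installs'` columns at indices 1 and 5; `average_installs_below()` is a hypothetical helper and the sample rows are made up:

```python
def average_installs_below(dataset, category, threshold=100_000_000,
                           category_index=1, installs_index=5):
    """Average install count of apps in `category` with fewer than `threshold` installs."""
    values = []
    for row in dataset:
        if row[category_index] != category:
            continue
        installs = float(row[installs_index].replace(',', '').replace('+', ''))
        if installs < threshold:
            values.append(installs)
    # Return None if every app in the category exceeds the threshold
    return sum(values) / len(values) if values else None


rows = [
    ['A', 'SOCIAL', '4.0', '10', '5M', '500,000,000+'],
    ['B', 'SOCIAL', '4.1', '20', '3M', '1,000+'],
    ['C', 'SOCIAL', '3.9', '30', '2M', '3,000+'],
    ['D', 'GAME', '4.2', '40', '9M', '10+'],
]
print(average_installs_below(rows, 'SOCIAL'))  # → 2000.0

# Possible usage on the real data:
# for category in ['COMMUNICATION', 'VIDEO_PLAYERS', 'SOCIAL']:
#     print(category, average_installs_below(android_final, category))
```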


# The same pattern holds for all the other seemingly most popular genres:

# In[37]:


for app in android_final:
    if app[1] == 'VIDEO_PLAYERS':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])


# In[38]:


under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'VIDEO_PLAYERS') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('VIDEO_PLAYERS')
print('Before:  24727872')
print('After:  ', average_number_installs)   


# In[39]:


for app in android_final:
    if app[1] == 'SOCIAL':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])


# In[40]:


under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'SOCIAL') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('SOCIAL')
print('Before:  23253652')
print('After:  ', average_number_installs)  


# In[41]:


for app in android_final:
    if app[1] == 'PHOTOGRAPHY':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])


# In[42]:


under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'PHOTOGRAPHY') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('PHOTOGRAPHY')
print('Before:  17840110')
print('After:  ', average_number_installs)  


# In[43]:


for app in android_final:
    if app[1] == 'PRODUCTIVITY':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])


# In[44]:


under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'PRODUCTIVITY') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('PRODUCTIVITY')
print('Before:  16787331')
print('After:  ', average_number_installs)  


# In[45]:


for app in android_final:
    if app[1] == 'GAME':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])


# In[46]:


under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'GAME') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('GAME')
print('Before:  15588015')
print('After:  ', average_number_installs)  


# In[47]:


for app in android_final:
    if app[1] == 'TRAVEL_AND_LOCAL':
        number_installs = app[5]
        number_installs = number_installs.replace('+', '')
        number_installs = number_installs.replace(',', '')
        if float(number_installs) >= 100000000:
            print(app[0], ':', app[5])


# In[48]:


under_100_millions = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'TRAVEL_AND_LOCAL') and (float(n_installs) < 100000000):
        under_100_millions.append(float(n_installs))
        
average_number_installs = sum(under_100_millions) / len(under_100_millions)
print('TRAVEL_AND_LOCAL')
print('Before:  13984077')
print('After:  ', average_number_installs) 


# This investigation reveals some insights for each of the most popular genres.
# - *COMMUNICATION* is oversaturated by numerous giant apps, far ahead of all the others. This implies fierce competition in this area, with very limited chances for newcomers. Apart from that, this category has no clear counterpart among the App Store genres, while our aim is to recommend an app profile that shows potential for being profitable on both the App Store and Google Play.
# - *VIDEO_PLAYERS* is, again, oversaturated by some giant apps (even though fewer than in the *COMMUNICATION* genre), far ahead of all the others. Also, this category doesn't appear clearly among the App Store genres, only as part of the *Photo & Video* genre, so the two cannot be compared directly. In addition, the *Photo & Video* genre was not one of the most popular on the App Store.
# - *SOCIAL*: this category is also dominated by a few famous apps like Facebook and Pinterest. However, some of these apps are quite specialized (LinkedIn for work, Badoo for dating, etc.), so their functions differ and the apps are not interchangeable. Coming up with an interesting thematic social network could be a good idea for our apps, also because the average number of installs is still quite high even after excluding the giant apps. On the App Store, we also identified the corresponding *Social Networking* genre as highly promising. Hence this area can become profitable on both markets, which is our ultimate goal.
# - *PHOTOGRAPHY* shows a high average number of installs even after removing the giant apps. Despite this, and despite being one of the most common genres on both markets, it was not identified as one of the most popular genres on the App Store. In addition, this category consists mostly of photo editors, presumably used mainly for work by professional photographers. This work requires high concentration, so in-app ads are unlikely to reach a receptive audience here.
# - *PRODUCTIVITY* is heavily dominated by famous apps. In addition, it does not turn out to be very popular on the App Store, even though it is among the most common genres there.
# - *GAME*: this category is the most common on the App Store and the second most common on Google Play. On Google Play it also shows a high number of installs after removing the numerous giant apps. However, on the App Store it is not particularly popular. Still, given the potential of this genre, we can consider creating a game-based social network (since we have already identified social networking as possibly profitable). It could be, for instance, an online quiz, a quest, or another online game that requires several teams (or participants) who are currently online. In this case, to capture attention effectively, the in-app ads should appear between games, while people are waiting for the results and can take in new information without being distracted from the game.
# - *TRAVEL_AND_LOCAL* has very few dominant apps, mostly related to maps and hotels, but they strongly influence the average number of installs. According to our investigation of the most common genres, this category sits in the middle of both data sets (the App Store and Google Play). It is not particularly popular among iOS apps, so it doesn't fit our purposes. However, we can borrow the idea for our apps: creating a social network related to travelling. For example, it could be an app for finding co-travellers or companions to travel with, discussing itineraries, what to visit in a certain place, etc.
# 
# When investigating the App Store genres, we also identified the *Book* profile as promising. On Google Play, however, the corresponding category (*BOOKS_AND_REFERENCE*) is not among the most popular: it ranks only 11th among the 33 categories. It could also be difficult to derive ideas for a social networking app from it. Hence, for apps that should be profitable on both markets, books don't seem to be the best choice.
# ## Conclusions
# All in all, after a thorough analysis of the most common and the most popular app genres in both data sets, we suggest the **SOCIAL NETWORKING** profile as the most interesting for our purposes, i.e. creating profitable free English apps for both the App Store and Google Play, with revenue based on in-app ads. To stand out among existing apps of this kind and overcome the competition, the right theme has to be selected. Possible ideas include an online quiz, a quest, or another online game involving many people or teams, or a social networking app for finding co-travellers and discussing itineraries and places to visit.