Analisys of Apps market

In this project we will analyze data to help our developers understand what type of apps are likely to attract more users. We only build apps that are free to download, and our main source of revenue consists of in-app ads. This study will help the company develop successful apps for App Store and Google Play.

Open the two data sets, and save both as lists of lists.

In [1]:
def dataset_to_list(dataset):
    open_dataset = open(dataset)
    from csv import reader
    read_dataset = reader(open_dataset)
    convert_to_list = list(read_dataset)
    return convert_to_list

app_store = dataset_to_list('AppleStore.csv')
google_play = dataset_to_list('googleplaystore.csv')

Explore both data sets using the explore_data() function

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset)-1)
        print('Number of columns:', len(dataset[0]))
        
print(explore_data(app_store, 0, 5, rows_and_columns=True))
print(explore_data(google_play, 0, 5, rows_and_columns=True))
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16
None
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13
None

Print the column names and try to identify the columns that could help us with our analysis.

In [3]:
app_store_headers = app_store[0]
google_play_headers = google_play[0]
print(app_store_headers)
print(google_play_headers)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

Here we are going to delete a row that has wrong data.

In [4]:
print(google_play[0])
print(google_play[10472])
print(google_play[10473])
del google_play[10473]
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

Check if dataset has duplicate entries

In [5]:
unique_rows = []
non_unique_rows = []

for row in google_play[1:]:
    name = row[0]
    if name in unique_rows:
        non_unique_rows.append(name)
    else:
        unique_rows.append(name)
print('Number of duplicate rows: ', len(non_unique_rows))
Number of duplicate rows:  1181

The above cell shows that there are 1181 duplicate entries. We are going to remove all duplicates and keep only the row with the highest number of ratings. The reason being that the higher number of rating implies the entry is more recent. After we remove the duplicates, we will be left with the below number of entries:

In [6]:
print(len(google_play[1:]) - len(non_unique_rows))
9659

Here we are going to create an empty dictionary, where each key is a unique app name and the corresponding value is the highest number of reviews.

In [7]:
reviews_max = {}

for row in google_play[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max))
9659

Here we will use the reviews_max dictionary created above to remove duplicate rows. We will create two empty lists: android_clean and already_added. We will use a for loop to add each row from the google_play dataset into the android_clean list, provided it meets the below two requirements:

  1. name is not in already_added
  2. the number of reviews is equal to the number of reviews associated to that app name in the reviews_max dictionary.

If both conditions are met, we also append the name to the already_added list.

In [8]:
android_clean = []
already_added = []

for row in google_play[1:]:
    entry = row
    name = row[0]
    n_reviews = float(row[3])
    if name not in already_added and n_reviews == reviews_max[name]:
        android_clean.append(entry)
        already_added.append(name)
        
print(len(android_clean))
9659

Define a function that returns True if all characters in a string belong to the set of common English characters (ASCII 0-127) and False otherwise.

In [9]:
def english_character(a_string):
    for character in a_string:
        if ord(character) > 127:
            return False
    return True
    
english_character('Instachat 😜')
Out[9]:
False

This function labels as non-English some apps that are indeed English but contain some special characters (like emojis) which have a ASCII code greater than 127. We are going to rewrite the function to allow up to 3 characters that fall outside the ASCII range.

In [10]:
def english_character_v2(a_string):
    non_english_count = 0
    for character in a_string:
        if ord(character) > 127:
            non_english_count += 1
    if non_english_count > 3:
        return False
    else:
        return True
    
english_character_v2('》Docs To Go™ Free Office S电视剧热播uite')
Out[10]:
False

Now we are going to use the english_character_v2 function to filter out non-English apps from both data sets.

In [11]:
def english_only_dataset(index, a_dataset):
    a_string = []
    for row in a_dataset:
        entry = row
        name = row[index]
        if english_character_v2(name) == True:
            a_string.append(entry)
    return a_string

app_store_english_only = english_only_dataset(0, app_store[1:])
print(len(app_store_english_only))
print(app_store_english_only[:5])

google_play_english_only = english_only_dataset(0, android_clean)
print(len(google_play_english_only))
print(google_play_english_only[:5])
7197
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']]
9614
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]

So far in the data cleaning process, we have:

  • removed inaccurate data
  • removed duplicate app entries
  • removed non-English apps

Our next step in the data cleaning process is to isolate only the free apps for our analysis. This will be our last step in the data cleaning process.

In [12]:
app_store_clean = []

for row in app_store_english_only:
    price = float(row[4])
    if price == 0.0:
        app_store_clean.append(row)
print(len(app_store_clean))
print(app_store_clean[:5])
print('\n')


google_play_clean =[]

for row in google_play_english_only:
    price = row[7]
    if price == '0':
        google_play_clean.append(row)
print(len(google_play_clean))
print(google_play_clean[:5])

### Here I am not creating a function to be used for both datasets as their column with prices has different data types ###
4056
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']]


8864
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]

Now that the data cleaning process is finished, we want to find out what app profiles are succcessful on both App Store and Google Play. To do that we will create some frequency tables (for example for the genre).

In [13]:
def freq_table(index, a_list):
    a_dictionary = {}
    for row in a_list:
        genre = row[index]
        if genre in a_dictionary:
            a_dictionary[genre] += 1
        else:
            a_dictionary[genre] = 1
    return a_dictionary

app_store_genre_frequency = freq_table(11, app_store_clean)
print('App Store genre frequency')
print(app_store_genre_frequency)
print('\n')
google_play_category_frequency = freq_table(1, google_play_clean)
print('Google Play category frequency')
print(google_play_category_frequency)
print('\n')
google_play_genre_frequency = freq_table(9, google_play_clean)
print('Google Play genre frequency')
print(google_play_genre_frequency)
App Store genre frequency
{'Travel': 56, 'Reference': 20, 'Music': 67, 'Entertainment': 334, 'Education': 132, 'Productivity': 62, 'Catalogs': 9, 'Book': 66, 'Utilities': 109, 'Lifestyle': 94, 'Food & Drink': 43, 'Photo & Video': 167, 'Weather': 31, 'News': 58, 'Sports': 79, 'Social Networking': 143, 'Health & Fitness': 76, 'Medical': 8, 'Navigation': 20, 'Shopping': 121, 'Finance': 84, 'Business': 20, 'Games': 2257}


Google Play category frequency
{'EVENTS': 63, 'COMICS': 55, 'BUSINESS': 407, 'TRAVEL_AND_LOCAL': 207, 'PARENTING': 58, 'BEAUTY': 53, 'FINANCE': 328, 'FAMILY': 1676, 'NEWS_AND_MAGAZINES': 248, 'WEATHER': 71, 'EDUCATION': 103, 'FOOD_AND_DRINK': 110, 'TOOLS': 750, 'ENTERTAINMENT': 85, 'MAPS_AND_NAVIGATION': 124, 'SOCIAL': 236, 'VIDEO_PLAYERS': 159, 'AUTO_AND_VEHICLES': 82, 'LIFESTYLE': 346, 'BOOKS_AND_REFERENCE': 190, 'HEALTH_AND_FITNESS': 273, 'DATING': 165, 'PRODUCTIVITY': 345, 'SHOPPING': 199, 'SPORTS': 301, 'ART_AND_DESIGN': 57, 'PERSONALIZATION': 294, 'GAME': 862, 'PHOTOGRAPHY': 261, 'LIBRARIES_AND_DEMO': 83, 'COMMUNICATION': 287, 'HOUSE_AND_HOME': 73, 'MEDICAL': 313}


Google Play genre frequency
{'Educational;Brain Games': 6, 'Puzzle': 100, 'Education': 474, 'Casual': 156, 'News & Magazines': 248, 'Communication;Creativity': 1, 'Casual;Action & Adventure': 12, 'Puzzle;Brain Games': 15, 'Strategy': 81, 'Health & Fitness;Action & Adventure': 1, 'Beauty': 53, 'Role Playing': 83, 'Libraries & Demo': 83, 'Board;Action & Adventure': 2, 'Entertainment;Action & Adventure': 3, 'Education;Action & Adventure': 3, 'Word': 23, 'Art & Design;Action & Adventure': 1, 'Simulation;Education': 1, 'Puzzle;Creativity': 2, 'Health & Fitness': 273, 'Education;Pretend Play': 5, 'Card;Action & Adventure': 1, 'Business': 407, 'Music': 18, 'Art & Design': 53, 'Travel & Local;Action & Adventure': 1, 'Tools': 749, 'Racing;Pretend Play': 1, 'Parenting;Education': 7, 'Auto & Vehicles': 82, 'Productivity': 345, 'Travel & Local': 206, 'Casual;Pretend Play': 21, 'Strategy;Creativity': 1, 'Social': 236, 'Education;Brain Games': 3, 'Board;Brain Games': 7, 'Role Playing;Pretend Play': 4, 'Parenting;Brain Games': 1, 'Communication': 287, 'Weather': 71, 'Casual;Creativity': 6, 'Entertainment;Music & Video': 15, 'Maps & Navigation': 124, 'Simulation;Action & Adventure': 7, 'Role Playing;Action & Adventure': 3, 'Arcade;Pretend Play': 1, 'Card': 40, 'Adventure': 60, 'Personalization': 294, 'Simulation;Pretend Play': 2, 'Entertainment;Education': 1, 'Finance': 328, 'Casual;Brain Games': 12, 'Educational': 33, 'Arcade': 164, 'Entertainment;Creativity': 3, 'Parenting;Music & Video': 6, 'Lifestyle': 345, 'Strategy;Education': 1, 'Video Players & Editors;Music & Video': 2, 'Educational;Pretend Play': 8, 'Racing;Action & Adventure': 15, 'Comics;Creativity': 1, 'Entertainment;Brain Games': 7, 'Casual;Education': 2, 'Trivia': 37, 'Education;Creativity': 4, 'Food & Drink': 110, 'Music & Audio;Music & Video': 1, 'Art & Design;Creativity': 6, 'Trivia;Education': 1, 'Education;Music & Video': 3, 'Role Playing;Brain Games': 1, 'Video Players & Editors;Creativity': 1, 'Books & Reference': 190, 'Tools;Education': 1, 'Educational;Education': 35, 'Medical': 313, 'Action;Action & Adventure': 9, 'House & Home': 73, 'Entertainment;Pretend Play': 2, 'Puzzle;Education': 1, 'Events': 63, 'Lifestyle;Education': 1, 'Casino': 38, 'Parenting': 44, 'Board': 34, 'Casual;Music & Video': 1, 'Entertainment': 538, 'Health & Fitness;Education': 1, 'Educational;Creativity': 3, 'Puzzle;Action & Adventure': 3, 'Racing': 88, 'Comics': 54, 'Educational;Action & Adventure': 3, 'Dating': 165, 'Action': 275, 'Adventure;Action & Adventure': 3, 'Arcade;Action & Adventure': 11, 'Art & Design;Pretend Play': 1, 'Simulation': 181, 'Sports;Action & Adventure': 2, 'Lifestyle;Pretend Play': 1, 'Books & Reference;Education': 1, 'Sports': 307, 'Strategy;Action & Adventure': 1, 'Adventure;Education': 1, 'Photography': 261, 'Music;Music & Video': 2, 'Shopping': 199, 'Education;Education': 30, 'Video Players & Editors': 157}

Now we will build two functions to analyze the above frequency tables:

  • one function to generate frequency tables that show percentages
  • another function we can use to display the percentages in a descending order
In [14]:
### Here we create a function to generate frequency tables that show percentages ###

def freq_table_percentages(dataset, index):
    sum_freq_table_values = len(dataset[1:])
        
    freq_table = {}
    for row in dataset[1:]:
        key = row[index]
        if key in freq_table:
            freq_table[key] += 1
        else: 
            freq_table[key] = 1
    
    freq_table_percent = {}
    for key in freq_table:
        freq_table_percent[key] = (freq_table[key] / sum_freq_table_values *100)
    return freq_table_percent

app_store_genre_frequency_percent = freq_table_percentages(app_store_clean, 11)
print('% of apps per genre in App Store')
print(app_store_genre_frequency_percent)
print('\n')
google_play_category_frequency_percent = freq_table_percentages(google_play_clean, 1)
print('% of apps per category in Google Play')
print(google_play_category_frequency_percent)
print('\n')
google_play_genre_frequency_percent = freq_table_percentages(google_play_clean, 9)
print('% of apps per genre in Google Play')
print(google_play_genre_frequency_percent)
% of apps per genre in App Store
{'Travel': 1.381011097410604, 'Reference': 0.4932182490752158, 'Entertainment': 8.236744759556105, 'Education': 3.255240443896424, 'Productivity': 1.528976572133169, 'Catalogs': 0.22194821208384713, 'Navigation': 0.4932182490752158, 'Utilities': 2.688039457459926, 'Lifestyle': 2.318125770653514, 'Music': 1.6522811344019728, 'Photo & Video': 4.1183723797780525, 'Weather': 0.7644882860665845, 'News': 1.4303329223181258, 'Games': 55.659679408138096, 'Sports': 1.9482120838471024, 'Social Networking': 3.501849568434032, 'Health & Fitness': 1.8742293464858202, 'Medical': 0.19728729963008632, 'Book': 1.627620221948212, 'Shopping': 2.9839704069050557, 'Finance': 2.0715166461159065, 'Business': 0.4932182490752158, 'Food & Drink': 1.060419235511714}


% of apps per category in Google Play
{'EVENTS': 0.7108202640189552, 'COMICS': 0.6205573733498815, 'BUSINESS': 4.592124562789123, 'PARENTING': 0.6544059573507841, 'BEAUTY': 0.5979916506826132, 'FINANCE': 3.7007785174320205, 'FAMILY': 18.910075595170937, 'NEWS_AND_MAGAZINES': 2.798149610741284, 'WEATHER': 0.8010831546880289, 'BOOKS_AND_REFERENCE': 2.1437436533904997, 'FOOD_AND_DRINK': 1.241114746699763, 'COMMUNICATION': 3.2381812027530184, 'ENTERTAINMENT': 0.9590432133589079, 'MAPS_AND_NAVIGATION': 1.399074805370642, 'VIDEO_PLAYERS': 1.7939749520478394, 'AUTO_AND_VEHICLES': 0.9251946293580051, 'LIFESTYLE': 3.9038700214374367, 'EDUCATION': 1.1621347173643235, 'HEALTH_AND_FITNESS': 3.0802211440821394, 'DATING': 1.8616721200496444, 'PRODUCTIVITY': 3.8925871601038025, 'HOUSE_AND_HOME': 0.8236488773552973, 'MEDICAL': 3.5315355974275078, 'PHOTOGRAPHY': 2.944826808078529, 'SPORTS': 3.396141261423897, 'ART_AND_DESIGN': 0.6318402346835158, 'PERSONALIZATION': 3.317161232088458, 'GAME': 9.725826469592688, 'TOOLS': 8.462146000225657, 'LIBRARIES_AND_DEMO': 0.9364774906916393, 'SOCIAL': 2.6627552747376737, 'SHOPPING': 2.245289405393208, 'TRAVEL_AND_LOCAL': 2.335552296062281}


% of apps per genre in Google Play
{'Educational;Brain Games': 0.06769716800180525, 'Puzzle': 1.128286133363421, 'Education': 5.348076272142616, 'Comics': 0.6092745120162473, 'Racing;Pretend Play': 0.011282861333634209, 'Communication;Creativity': 0.011282861333634209, 'Casual;Action & Adventure': 0.1353943360036105, 'Strategy;Education': 0.011282861333634209, 'Strategy': 0.9139117680243709, 'Beauty': 0.5979916506826132, 'Role Playing': 0.9364774906916393, 'Libraries & Demo': 0.9364774906916393, 'Board;Action & Adventure': 0.022565722667268417, 'Entertainment;Action & Adventure': 0.033848584000902626, 'Education;Action & Adventure': 0.033848584000902626, 'Word': 0.25950581067358686, 'Art & Design;Action & Adventure': 0.011282861333634209, 'Puzzle;Creativity': 0.022565722667268417, 'Health & Fitness': 3.0802211440821394, 'Tools;Education': 0.011282861333634209, 'Educational;Pretend Play': 0.09026289066907367, 'Parenting;Brain Games': 0.011282861333634209, 'Art & Design': 0.5867087893489789, 'Entertainment;Pretend Play': 0.022565722667268417, 'Tools': 8.450863138892023, 'Weather': 0.8010831546880289, 'Parenting;Education': 0.07898002933543948, 'Auto & Vehicles': 0.9251946293580051, 'Productivity': 3.8925871601038025, 'Travel & Local': 2.324269434728647, 'Casual;Pretend Play': 0.2369400880063184, 'Lifestyle;Pretend Play': 0.011282861333634209, 'Social': 2.6627552747376737, 'Education;Brain Games': 0.033848584000902626, 'Board;Brain Games': 0.07898002933543948, 'Role Playing;Pretend Play': 0.045131445334536835, 'Business': 4.592124562789123, 'Communication': 3.2381812027530184, 'Education;Pretend Play': 0.05641430666817105, 'Casual;Creativity': 0.06769716800180525, 'Entertainment;Music & Video': 0.16924292000451313, 'Books & Reference;Education': 0.011282861333634209, 'Simulation;Action & Adventure': 0.07898002933543948, 'Medical': 3.5315355974275078, 'Arcade;Pretend Play': 0.011282861333634209, 'Card': 0.4513144533453684, 'Adventure': 0.6769716800180525, 'Simulation;Pretend Play': 0.022565722667268417, 'Entertainment;Education': 0.011282861333634209, 'Finance': 3.7007785174320205, 'Sports': 3.463838429425702, 'Educational': 0.37233442400992894, 'Travel & Local;Action & Adventure': 0.011282861333634209, 'Arcade': 1.8503892587160102, 'Art & Design;Pretend Play': 0.011282861333634209, 'Entertainment;Creativity': 0.033848584000902626, 'Puzzle;Brain Games': 0.16924292000451313, 'Video Players & Editors;Music & Video': 0.022565722667268417, 'Card;Action & Adventure': 0.011282861333634209, 'Racing;Action & Adventure': 0.16924292000451313, 'Casual;Music & Video': 0.011282861333634209, 'Simulation;Education': 0.011282861333634209, 'Casual;Education': 0.022565722667268417, 'Trivia': 0.4174658693444658, 'Education;Creativity': 0.045131445334536835, 'Lifestyle': 3.8925871601038025, 'Food & Drink': 1.241114746699763, 'Music & Audio;Music & Video': 0.011282861333634209, 'Art & Design;Creativity': 0.06769716800180525, 'Trivia;Education': 0.011282861333634209, 'Education;Music & Video': 0.033848584000902626, 'Role Playing;Brain Games': 0.011282861333634209, 'Video Players & Editors;Creativity': 0.011282861333634209, 'Books & Reference': 2.1437436533904997, 'Health & Fitness;Action & Adventure': 0.011282861333634209, 'Educational;Education': 0.3949001466771973, 'Personalization': 3.317161232088458, 'Action;Action & Adventure': 0.10154575200270789, 'House & Home': 0.8236488773552973, 'Parenting;Music & Video': 0.06769716800180525, 'Puzzle;Education': 0.011282861333634209, 'Events': 0.7108202640189552, 'Lifestyle;Education': 0.011282861333634209, 'Casino': 0.42874873067809993, 'Parenting': 0.4964458986799052, 'Board': 0.38361728534356315, 'Comics;Creativity': 0.011282861333634209, 'Maps & Navigation': 1.399074805370642, 'Health & Fitness;Education': 0.011282861333634209, 'Educational;Creativity': 0.033848584000902626, 'News & Magazines': 2.798149610741284, 'Puzzle;Action & Adventure': 0.033848584000902626, 'Racing': 0.9928917973598104, 'Casual': 1.7601263680469368, 'Educational;Action & Adventure': 0.033848584000902626, 'Dating': 1.8616721200496444, 'Action': 3.102786866749408, 'Adventure;Action & Adventure': 0.033848584000902626, 'Arcade;Action & Adventure': 0.1241114746699763, 'Music': 0.20309150400541578, 'Simulation': 2.042197901387792, 'Sports;Action & Adventure': 0.022565722667268417, 'Strategy;Creativity': 0.011282861333634209, 'Role Playing;Action & Adventure': 0.033848584000902626, 'Entertainment': 6.070179397495204, 'Casual;Brain Games': 0.1353943360036105, 'Strategy;Action & Adventure': 0.011282861333634209, 'Adventure;Education': 0.011282861333634209, 'Photography': 2.944826808078529, 'Music;Music & Video': 0.022565722667268417, 'Shopping': 2.245289405393208, 'Education;Education': 0.33848584000902626, 'Entertainment;Brain Games': 0.07898002933543948, 'Video Players & Editors': 1.771409229380571}
In [15]:
### Here we build a function to display the most common app genres in descending order ### 

def display_table(dataset, index):
    table = freq_table_percentages(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
print('% of apps per genre in App Store, sorted')
print(display_table(app_store_clean, 11))
print('\n')
print('% of apps per category in Google Play, sorted')
print(display_table(google_play_clean, 1))
print('\n')
print('% of apps per genre in Google Play, sorted')
print(display_table(google_play_clean, 9))
% of apps per genre in App Store, sorted
Games : 55.659679408138096
Entertainment : 8.236744759556105
Photo & Video : 4.1183723797780525
Social Networking : 3.501849568434032
Education : 3.255240443896424
Shopping : 2.9839704069050557
Utilities : 2.688039457459926
Lifestyle : 2.318125770653514
Finance : 2.0715166461159065
Sports : 1.9482120838471024
Health & Fitness : 1.8742293464858202
Music : 1.6522811344019728
Book : 1.627620221948212
Productivity : 1.528976572133169
News : 1.4303329223181258
Travel : 1.381011097410604
Food & Drink : 1.060419235511714
Weather : 0.7644882860665845
Reference : 0.4932182490752158
Navigation : 0.4932182490752158
Business : 0.4932182490752158
Catalogs : 0.22194821208384713
Medical : 0.19728729963008632
None


% of apps per category in Google Play, sorted
FAMILY : 18.910075595170937
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0.6318402346835158
COMICS : 0.6205573733498815
BEAUTY : 0.5979916506826132
None


% of apps per genre in Google Play, sorted
Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
Strategy : 0.9139117680243709
House & Home : 0.8236488773552973
Weather : 0.8010831546880289
Events : 0.7108202640189552
Adventure : 0.6769716800180525
Comics : 0.6092745120162473
Beauty : 0.5979916506826132
Art & Design : 0.5867087893489789
Parenting : 0.4964458986799052
Card : 0.4513144533453684
Casino : 0.42874873067809993
Trivia : 0.4174658693444658
Educational;Education : 0.3949001466771973
Board : 0.38361728534356315
Educational : 0.37233442400992894
Education;Education : 0.33848584000902626
Word : 0.25950581067358686
Casual;Pretend Play : 0.2369400880063184
Music : 0.20309150400541578
Racing;Action & Adventure : 0.16924292000451313
Puzzle;Brain Games : 0.16924292000451313
Entertainment;Music & Video : 0.16924292000451313
Casual;Brain Games : 0.1353943360036105
Casual;Action & Adventure : 0.1353943360036105
Arcade;Action & Adventure : 0.1241114746699763
Action;Action & Adventure : 0.10154575200270789
Educational;Pretend Play : 0.09026289066907367
Simulation;Action & Adventure : 0.07898002933543948
Parenting;Education : 0.07898002933543948
Entertainment;Brain Games : 0.07898002933543948
Board;Brain Games : 0.07898002933543948
Parenting;Music & Video : 0.06769716800180525
Educational;Brain Games : 0.06769716800180525
Casual;Creativity : 0.06769716800180525
Art & Design;Creativity : 0.06769716800180525
Education;Pretend Play : 0.05641430666817105
Role Playing;Pretend Play : 0.045131445334536835
Education;Creativity : 0.045131445334536835
Role Playing;Action & Adventure : 0.033848584000902626
Puzzle;Action & Adventure : 0.033848584000902626
Entertainment;Creativity : 0.033848584000902626
Entertainment;Action & Adventure : 0.033848584000902626
Educational;Creativity : 0.033848584000902626
Educational;Action & Adventure : 0.033848584000902626
Education;Music & Video : 0.033848584000902626
Education;Brain Games : 0.033848584000902626
Education;Action & Adventure : 0.033848584000902626
Adventure;Action & Adventure : 0.033848584000902626
Video Players & Editors;Music & Video : 0.022565722667268417
Sports;Action & Adventure : 0.022565722667268417
Simulation;Pretend Play : 0.022565722667268417
Puzzle;Creativity : 0.022565722667268417
Music;Music & Video : 0.022565722667268417
Entertainment;Pretend Play : 0.022565722667268417
Casual;Education : 0.022565722667268417
Board;Action & Adventure : 0.022565722667268417
Video Players & Editors;Creativity : 0.011282861333634209
Trivia;Education : 0.011282861333634209
Travel & Local;Action & Adventure : 0.011282861333634209
Tools;Education : 0.011282861333634209
Strategy;Education : 0.011282861333634209
Strategy;Creativity : 0.011282861333634209
Strategy;Action & Adventure : 0.011282861333634209
Simulation;Education : 0.011282861333634209
Role Playing;Brain Games : 0.011282861333634209
Racing;Pretend Play : 0.011282861333634209
Puzzle;Education : 0.011282861333634209
Parenting;Brain Games : 0.011282861333634209
Music & Audio;Music & Video : 0.011282861333634209
Lifestyle;Pretend Play : 0.011282861333634209
Lifestyle;Education : 0.011282861333634209
Health & Fitness;Education : 0.011282861333634209
Health & Fitness;Action & Adventure : 0.011282861333634209
Entertainment;Education : 0.011282861333634209
Communication;Creativity : 0.011282861333634209
Comics;Creativity : 0.011282861333634209
Casual;Music & Video : 0.011282861333634209
Card;Action & Adventure : 0.011282861333634209
Books & Reference;Education : 0.011282861333634209
Art & Design;Pretend Play : 0.011282861333634209
Art & Design;Action & Adventure : 0.011282861333634209
Arcade;Pretend Play : 0.011282861333634209
Adventure;Education : 0.011282861333634209
None

The most common app genre in the App Store is "Games" (55.7%), followed by "Entertainment" (8.2%). There is a total of 23 app genres and the top five account for 75% of total apps. The top four genres are all leisure related: Games, Entertainment, Photo & Video, Social Networking. From the above frequency tables we can't say which apps have the most users but we can only see what genres have the most number of apps.

The most common category in Google Play is "Family" (18.9%), followed by "Game" (9.7%).

The most common genre in Google Play is "Tools" (8.5%), followed by "Entertainment" (6.1%). Four out of five of the top genres in Google Play are geared towards practical uses rather than leisure: Tools, Education, Business, Productivity. This is in contrast to what we see in the App store where top genres are all leisure related.

Now we want to know what the average number of users per app by genre is. To do that:

  1. generate a frequency table for the app genres
  2. loop over the unique genres and save the sum of user ratings by genre in a variable.
In [16]:
### Average # of user ratings per app by genre - App Store ###
print('Average # of user ratings per app by genre - App Store')
for genre in app_store_genre_frequency:
    total = 0
    len_genre = 0
    for row in app_store_clean:
        genre_app = row[11]
        if genre_app == genre:
            app_ratings = float(row[5])
            total += app_ratings
            len_genre += 1
    avg_user_ratings = total / len_genre
    print(genre,": ", avg_user_ratings)

print('\n')

### Average # of user reviews per app by category - Google Play ###
print('Average # of user reviews per app by category - Google Play')
for category in google_play_category_frequency:
    total = 0
    len_category = 0
    for row in google_play_clean:
        category_app = row[1]
        if category_app == category:
            app_reviews = float(row[3])
            total += app_reviews
            len_category += 1
    avg_user_reviews = total / (len_category + 1)
    print(category, ": ", avg_user_reviews)
  
print('\n')

### Average # of user reviews per app by genre - Google Play ###
print('Average # of user reviews per app by genre - Google Play')
for genre in google_play_genre_frequency:
    total = 0
    len_genre = 0
    for row in google_play_clean:
        genre_app = row[9]
        if genre_app == genre:
            app_reviews = float(row[3])
            total += app_reviews
            len_genre += 1
    avg_user_reviews = total / (len_genre + 1)
    print(genre, ": ", avg_user_reviews)

    
Average # of user ratings per app by genre - App Store
Travel :  20216.01785714286
Reference :  67447.9
Music :  56482.02985074627
Entertainment :  10822.961077844311
Education :  6266.333333333333
Productivity :  19053.887096774193
Catalogs :  1779.5555555555557
Book :  8498.333333333334
Utilities :  14010.100917431193
Lifestyle :  8978.308510638299
Food & Drink :  20179.093023255813
Photo & Video :  27249.892215568863
Weather :  47220.93548387097
News :  15892.724137931034
Sports :  20128.974683544304
Social Networking :  53078.195804195806
Health & Fitness :  19952.315789473683
Medical :  459.75
Navigation :  25972.05
Shopping :  18746.677685950413
Finance :  13522.261904761905
Business :  6367.8
Games :  18924.68896765618


Average # of user reviews per app by category - Google Play
EVENTS :  2515.90625
COMICS :  41825.16071428572
BUSINESS :  24180.316176470587
TRAVEL_AND_LOCAL :  128861.90384615384
PARENTING :  16101.101694915254
BEAUTY :  7337.777777777777
FINANCE :  38418.76899696048
FAMILY :  113075.53070960048
NEWS_AND_MAGAZINES :  92714.18473895582
WEATHER :  168872.29166666666
EDUCATION :  55751.817307692305
FOOD_AND_DRINK :  56960.963963963964
TOOLS :  305325.79627163784
ENTERTAINMENT :  298243.5
MAPS_AND_NAVIGATION :  141717.168
SOCIAL :  961755.7510548523
VIDEO_PLAYERS :  422691.64375
AUTO_AND_VEHICLES :  13969.915662650603
LIFESTYLE :  33824.06628242075
BOOKS_AND_REFERENCE :  87534.36125654451
HEALTH_AND_FITNESS :  77809.95255474452
DATING :  21821.024096385543
PRODUCTIVITY :  160170.2803468208
SHOPPING :  222767.91
SPORTS :  116551.40066225166
ART_AND_DESIGN :  24273.568965517243
PERSONALIZATION :  180508.34237288137
GAME :  682731.8122827347
PHOTOGRAPHY :  402539.0801526718
LIBRARIES_AND_DEMO :  10795.738095238095
COMMUNICATION :  992151.4895833334
HOUSE_AND_HOME :  26078.22972972973
MEDICAL :  3718.2738853503183


Average # of user reviews per app by genre - Google Play
Educational;Brain Games :  17901.714285714286
Puzzle :  213527.28712871287
Education :  16177.254736842106
Casual :  832370.2993630574
News & Magazines :  92714.18473895582
Communication;Creativity :  1739.0
Casual;Action & Adventure :  870209.0
Puzzle;Brain Games :  147605.625
Strategy :  1236575.4512195121
Health & Fitness;Action & Adventure :  15530.5
Beauty :  7337.777777777777
Role Playing :  246289.4880952381
Libraries & Demo :  10795.738095238095
Board;Action & Adventure :  24855.333333333332
Entertainment;Action & Adventure :  34312.5
Education;Action & Adventure :  3882.25
Word :  218760.70833333334
Art & Design;Action & Adventure :  32.5
Simulation;Education :  8.0
Puzzle;Creativity :  25746.333333333332
Health & Fitness :  77809.95255474452
Education;Pretend Play :  20959.5
Card;Action & Adventure :  460285.5
Business :  24180.316176470587
Music :  205064.0
Art & Design :  25635.425925925927
Travel & Local;Action & Adventure :  445.0
Tools :  305276.44933333335
Racing;Pretend Play :  1100.0
Parenting;Education :  1921.125
Auto & Vehicles :  13969.915662650603
Productivity :  160170.2803468208
Travel & Local :  129480.12560386474
Casual;Pretend Play :  100632.77272727272
Strategy;Creativity :  64771.0
Social :  961755.7510548523
Education;Brain Games :  144443.25
Board;Brain Games :  1968.0
Role Playing;Pretend Play :  44638.4
Parenting;Brain Games :  1807.0
Communication :  992151.4895833334
Weather :  168872.29166666666
Casual;Creativity :  75864.71428571429
Entertainment;Music & Video :  74699.5625
Maps & Navigation :  141717.168
Simulation;Action & Adventure :  153957.875
Role Playing;Action & Adventure :  241731.25
Arcade;Pretend Play :  11835.5
Card :  162278.0243902439
Adventure :  297260.59016393445
Personalization :  180508.34237288137
Simulation;Pretend Play :  25109.666666666668
Entertainment;Education :  3660.0
Finance :  38418.76899696048
Casual;Brain Games :  9671.076923076924
Educational :  6865.588235294118
Arcade :  704533.896969697
Entertainment;Creativity :  107669.5
Parenting;Music & Video :  3746.8571428571427
Lifestyle :  33514.32369942196
Strategy;Education :  1031.0
Video Players & Editors;Music & Video :  52993.666666666664
Educational;Pretend Play :  190862.33333333334
Racing;Action & Adventure :  190495.1875
Comics;Creativity :  258.0
Entertainment;Brain Games :  69216.25
Casual;Education :  9211.666666666666
Trivia :  188835.8947368421
Education;Creativity :  5107.4
Food & Drink :  56960.963963963964
Music & Audio;Music & Video :  684.5
Art & Design;Creativity :  4866.714285714285
Trivia;Education :  4.0
Education;Music & Video :  11209.0
Role Playing;Brain Games :  75687.0
Video Players & Editors;Creativity :  79811.0
Books & Reference :  87534.36125654451
Tools;Education :  171168.0
Educational;Education :  11676.611111111111
Medical :  3718.2738853503183
Action;Action & Adventure :  116087.5
House & Home :  26078.22972972973
Entertainment;Pretend Play :  35770.0
Puzzle;Education :  417.0
Events :  2515.90625
Lifestyle;Education :  1573.0
Casino :  130870.61538461539
Parenting :  20105.644444444446
Board :  118080.0
Casual;Music & Video :  19010.5
Entertainment :  103197.43042671614
Health & Fitness;Education :  4928.0
Educational;Creativity :  11121.75
Puzzle;Action & Adventure :  400421.5
Racing :  591278.0898876404
Comics :  42576.236363636366
Educational;Action & Adventure :  169351.75
Dating :  21821.024096385543
Action :  543001.0724637681
Adventure;Action & Adventure :  1134951.75
Arcade;Action & Adventure :  92402.33333333333
Art & Design;Pretend Play :  487.0
Simulation :  142065.32967032967
Sports;Action & Adventure :  486676.0
Lifestyle;Pretend Play :  70497.5
Books & Reference;Education :  21.0
Sports :  212745.4025974026
Strategy;Action & Adventure :  9585.0
Adventure;Education :  144303.0
Photography :  402539.0801526718
Music;Music & Video :  16982.666666666668
Shopping :  222767.91
Education;Education :  226998.25806451612
Video Players & Editors :  426277.46202531643

The app genre with the highest average number of user ratings per app is "Weather". Weather apps only account for 0.76% of total in the App Store, or 31 apps. Such a high avg number of ratings combined with the relatively low number of apps in this genre may tell us that there is one or a few successful Weather app with a large number of ratings, skewing the genre average up. Creating a weather app is not recommended as it is probably a winner takes most genre.

Our recommendation is to create an app in a genre that's popular having a large number of apps and a high average number of user ratings.

The most popular genres (in terms of # of apps) are: Games, Entertainment, Photo & Video, Social Networking. Based on the average number of ratings we will make our recommendation.

Genre % of total apps Avg user ratings
Games 55.7% 18925
Entertainment 8.2% 10823
Photo & Video 4.1% 27250
Social Networking 3.5% 53078

While Social Networking and Photo & Video tend to be genres with a handful of very successful apps, we believe that in the Games genres there is more room for new entrants to be successful. This is confirmed by the fact that the Games genre has a relatively high average number of ratings despite having by far the highest number of apps.

Given the above reasoning, and aware that a more in depth analysis would help to obtain a more detailed recommendation, we suggest the Games genre.

The Google Play dataset contains a column with number of installs. Unfortunately, the data is stored as a string, so before we can calculate the average we need to convert the data into float. We can remove the commas and + sign from the "Installs" column by using the str.replace(old, new) method.

In [21]:
### Here we are looking at the average number of app installs by category in Google Play ###

for category in google_play_category_frequency:
    total = 0
    len_category = 0
    for row in google_play_clean:
        category_app = row[1]
        installs = row[5]
        if category_app == category:
            installs = installs.replace('+', '')
            installs = installs.replace(',', '')
            total += float(installs)
            len_category += 1
    avg_installs_category = total / len_category
    print(category,': ', avg_installs_category)
EVENTS :  253542.22222222222
COMICS :  817657.2727272727
BUSINESS :  1712290.1474201474
TRAVEL_AND_LOCAL :  13984077.710144928
PARENTING :  542603.6206896552
BEAUTY :  513151.88679245283
FINANCE :  1387692.475609756
FAMILY :  3695641.8198090694
NEWS_AND_MAGAZINES :  9549178.467741935
WEATHER :  5074486.197183099
EDUCATION :  1833495.145631068
FOOD_AND_DRINK :  1924897.7363636363
TOOLS :  10801391.298666667
ENTERTAINMENT :  11640705.88235294
MAPS_AND_NAVIGATION :  4056941.7741935486
SOCIAL :  23253652.127118643
VIDEO_PLAYERS :  24727872.452830188
AUTO_AND_VEHICLES :  647317.8170731707
LIFESTYLE :  1437816.2687861272
BOOKS_AND_REFERENCE :  8767811.894736841
HEALTH_AND_FITNESS :  4188821.9853479853
DATING :  854028.8303030303
PRODUCTIVITY :  16787331.344927534
SHOPPING :  7036877.311557789
SPORTS :  3638640.1428571427
ART_AND_DESIGN :  1986335.0877192982
PERSONALIZATION :  5201482.6122448975
GAME :  15588015.603248259
PHOTOGRAPHY :  17840110.40229885
LIBRARIES_AND_DEMO :  638503.734939759
COMMUNICATION :  38456119.167247385
HOUSE_AND_HOME :  1331540.5616438356
MEDICAL :  120550.61980830671

The app category with the highest average number of installs is Communication with 38.5M downloads, followed by Video_players (24.7M) and Social (23.2M). These are also the categories with the highest average number of user reviews. In terms of % of the total apps in Google Play, these categories are not the ones with the highest number of apps: Communication has 3.2%, Social has 2.7% and Video_players has 1.8%. Combining this information with the fact that these categories have the highest number of average downloads, make us deduct that there may bey a few very successful apps in these categories. The Family category has a pretty high average number of installs (3.7M) and it's the largest category in terms of number of apps. The Games category has a pretty high average number of installs (15.6M) and it makes up 9.7% of total apps (second largest category). For this reason we think that the Games category would be the best to develope a new app.