Profitable App Profiles for the App Store and Google Play Markets

In this project, I am assuming the role of a Data Analyst for a company that builds Android and iOS mobile apps.

Company Background: The company sells free apps that are downloaded and installed from the Google Play store or the iOS App store. The company's main source of revenue is in-app ads. If the company can attract more users who engage with the ads, the company may be able to increase ad revenue.

My Goal: My goal is to analyze historical data from the Google Play store and the iOS App Store and provide the developers with a profitable app profile that is likely to attract more users. The analysis follows the six (6) steps of the data analysis process:

  • 1. Ask Question
  • 2. Get Data
  • 3. Explore Data
  • 4. Clean Data
  • 5. Analyze Data
  • 6. Recommendation

1. Ask Question

What apps are more likely to attract users?

Based on 2018 data from Statista, we can see in Chart 1 below that Google Play and the Apple App Store are the two (2) dominant platforms.

Chart 1

[image.png: chart image not included in this text export]

2. Get Data

We will use third-party data collected in August 2018 for approximately 10,000 Android apps from Google Play, and third-party data collected in July 2017 for approximately 7,000 iOS apps from the App Store.

In [32]:
# Open Google Play dataset
from csv import reader

# "with" closes the file automatically once the rows have been read
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    google_data = list(reader(opened_file))  # header row is at index 0

# Open App Store dataset
with open('AppleStore.csv', encoding='utf8') as opened_file:
    app_store_data = list(reader(opened_file))  # header row is at index 0

3. Explore Data

Data Dictionary (Google Play):

google_data columns - category, rating, installs, type, content rating, and genres columns could all be useful for analysis purposes

  • App - name of app
  • Category - group of similar apps
  • Rating - satisfaction score given by user
  • Reviews - number of reviews from users
  • Size - app size
  • Installs - number of installs
  • Type - free or paid
  • Price - cost to purchase app in USD currency
  • Content Rating - who should use the app
  • Genres - groups or categories of similar apps
  • Last Updated - date last updated
  • Current Ver - app version
  • Android Ver - android version

Data Dictionary (App Store dataset):

app_store_data columns - rating_count_tot, user_rating, cont_rating, and prime_genre columns could all be useful for analysis purposes

  • id - app id
  • track_name - app name
  • size_bytes - size of app
  • currency - denomination used to purchase app
  • price - cost of app
  • rating_count_tot - number of ratings
  • rating_count_ver - number of ratings for the current version
  • user_rating - satisfaction score given by the user
  • user_rating_ver - satisfaction score given by users for the current version
  • ver - app version
  • cont_rating - age to use the app
  • prime_genre - group or categories of similar apps
  • sup_devices.num - number of supported devices
  • ipadSc_urls.num - number of screenshots shown for display
  • lang.num - number of supported languages
  • vpp_lic - VPP device-based licensing enabled
In [33]:
# define explore_data() function
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
In [34]:
# print the header row and the first 3 data rows using explore_data()
explore_data(google_data,0,4,True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13
In [35]:
# print the header row and the first 3 data rows using explore_data()
explore_data(app_store_data,0,4,True)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7198
Number of columns: 16

4. Clean Data

Even though the data has been downloaded and explored, it may still contain errors, duplicates, and information that is not relevant to the analysis. Since the company only offers free, English-language apps, it would not make sense to include non-English or paid apps. The following data was removed:

  • 1.) Removed 1 row of inaccurate data at row 10473 from google_data
  • 2.) Removed 1181 duplicate rows from google_data
  • 3.) Removed 45 rows of non-English apps from google_data and 1014 rows from app_store_data
  • 4.) Removed 750 rows of paid apps from google_data and 2962 rows from app_store_data
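
As a quick consistency check, the row counts reported in this notebook can be chained together. The sketch below only repeats the arithmetic using the figures quoted in the notebook; the variable names are illustrative and are not used elsewhere in the project.

```python
# Row counts as reported in this notebook (illustrative variable names).
google_rows_read = 10842                        # includes the header row
google_after_bad_row = google_rows_read - 1     # malformed row 10473 deleted
google_data_rows = google_after_bad_row - 1     # header skipped via google_data[1:]
google_unique = google_data_rows - 1181         # duplicate rows removed
google_english = google_unique - 45             # non-English apps removed
google_free = google_english - 750              # paid apps removed

app_store_rows_read = 7198                      # includes the header row
app_store_english = app_store_rows_read - 1014  # non-English apps removed
app_store_free = app_store_english - 2962       # rows dropped by the free-app filter

print(google_unique, google_english, google_free)  # 9659 9614 8864
print(app_store_english, app_store_free)           # 6184 3222
```

The results match the lengths printed by the cleaning cells (9659, 9614, 8864 for Google Play and 6184, 3222 for the App Store).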

1.) Removed 1 row of inaccurate data at row 10473

In [36]:
# Count the rows that do not have 13 columns
column_count_13 = 0
column_count_other = 0
for row in google_data:
    # len(row) gives the number of columns in the row
    if len(row) == 13:
        column_count_13 += 1
    else:
        column_count_other += 1
        print("This row does not have [13] columns:", "\n","\n",row)

print("\n")      
print(column_count_13, 'rows out of 10842 rows have [13] columns')    
print(column_count_other, 'row(s) does not have [13] columns')
This row does not have [13] columns: 
 
 ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


10841 rows out of 10842 rows have [13] columns
1 row(s) does not have [13] columns
In [37]:
print(google_data[10473])

# len() returns the number of columns in the row
print(len(google_data[10473]))

# row 10473 has only 12 columns instead of 13 like all the other rows (its Category value is missing)
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
12
In [38]:
# delete row (run this cell only once; re-running it would delete another row)
del google_data[10473]
In [39]:
#confirm row deleted
print(" 'Life Made WIFI Touchscreen Photo Frame' was removed","\n","osmino Wi-Fi:free Wifi is currently at row [10473]","\n",google_data[10473])
 'Life Made WIFI Touchscreen Photo Frame' was removed 
 osmino Wi-Fi:free Wifi is currently at row [10473] 
 ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']

2.) Removed 1181 duplicate rows

  • 1.) find duplicate app names
  • 2.) confirm duplicates
  • 3.) create dictionary with app name and max number of reviews
  • 4.) remove duplicates
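
The four steps above can be sketched on a tiny, made-up list of rows. This is only an illustration of the idea (keep, for each app name, the row with the highest review count); the sample rows and the use of a set for `already_added` are assumptions and differ slightly from the cells that follow.

```python
# Toy rows: [app name, number of reviews as a string]; the values are made up.
rows = [
    ["Box", "159872"],
    ["Slack", "51507"],
    ["Box", "159900"],
    ["Slack", "51507"],
]

# Step 3: highest review count seen for each app name.
reviews_max = {}
for name, reviews in rows:
    n_reviews = int(reviews)
    if n_reviews > reviews_max.get(name, -1):
        reviews_max[name] = n_reviews

# Step 4: keep one row per app, the first one matching the highest count.
deduplicated = []
already_added = set()  # a set keeps the membership test fast
for name, reviews in rows:
    if int(reviews) == reviews_max[name] and name not in already_added:
        deduplicated.append([name, reviews])
        already_added.add(name)

print(len(rows) - len(deduplicated))  # number of duplicate rows removed: 2
print(deduplicated)
```

The `already_added` check matters when two duplicate rows share the same highest review count, as the two "Slack" rows do here; without it both would be kept.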
In [40]:
# 1.)  find duplicate app names 
unique_app_names = []
duplicate_app_names = []
duplicate_rows = []
for row in google_data:
    app_name = str(row[0])
    if app_name in unique_app_names:
        duplicate_app_names.append(app_name)
        duplicate_rows.append(row)
    else:
        unique_app_names.append(app_name)
print(duplicate_rows[0:3])
print("\n")
print("Number of duplicate app names: ", len(duplicate_app_names))
print("\n")
print("Duplicate app names: ","\n","\n",duplicate_app_names[:15] )
[['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up'], ['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device'], ['Google My Business', 'BUSINESS', '4.4', '70991', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 24, 2018', '2.19.0.204537701', '4.4 and up']]


Number of duplicate app names:  1181


Duplicate app names:  
 
 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
In [41]:
# 2.)  Confirm Duplicates

app_names_printed = ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']

print('Quick PDF Scanner + OCR FREE' in duplicate_app_names)
print("Box" in duplicate_app_names)
print('ZOOM Cloud Meetings' in duplicate_app_names)
True
True
True
In [42]:
# 3) Create dictionary reviews_max{} with app name as the key and max review number as the value
 
reviews_max = {}
for row in google_data[1:]:
    name = row[0]
    # reviews are stored in column[3] 
    n_reviews = float(row[3]) # since there are 1181 duplicate rows, select which row to keep by selecting row with most reviews
    if (name in reviews_max) and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print("reviews_max count (expected 9659):", len(reviews_max))  
reviews_max count (expected 9659): 9659
In [43]:
# 4.) remove duplicates
android_clean = []
already_added = []
count = 0
for row in google_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
for row in android_clean:
    count += 1
print("confirm length of android_clean list",count)
print("length of android_clean list:", len(android_clean))
confirm length of android_clean list 9659
length of android_clean list: 9659
In [44]:
# Removed 0 rows of inaccurate data for app_store data
# All rows should have 16 columns.  Count the rows that do not have 16 columns

column_count_16 = 0
column_count_other = 0
for row in app_store_data:
    column_count = 0
    for column in row:
        column_count += 1
    if column_count == 16:
        column_count_16 += 1
    else:
        column_count_other += 1
        print("This row does not have [16] columns:", "\n","\n")

print("\n")      
print(column_count_16, 'rows out of 7198 rows have [16] columns')    
print(column_count_other, 'row(s) does not have [16] columns')

7198 rows out of 7198 rows have [16] columns
0 row(s) does not have [16] columns
In [45]:
# Removed 0 duplicate rows of app_store data
# Note: duplicates are detected here by the unique app id in column 0 (row[0]), not by track_name
unique_app_names_app_store = []
duplicate_app_names_app_store = []
duplicate_rows_app_store = []
for row in app_store_data:
    app_name = str(row[0])
    if app_name in unique_app_names_app_store:
        duplicate_app_names_app_store.append(app_name)
        duplicate_rows_app_store.append(row)
    else:
        unique_app_names_app_store.append(app_name)
print(duplicate_rows_app_store[0:3])
print("\n")
print("Number of duplicate app names: ", len(duplicate_app_names_app_store))
print("\n")
print("Duplicate app names: ","\n","\n",duplicate_app_names_app_store[:15] )
[]


Number of duplicate app names:  0


Duplicate app names:  
 
 []

3.) Removed 45 rows from google_data and 1014 rows from app_store_data of non-English apps

In [46]:
# create function to determine if an app name is English:
# names with more than 3 characters outside the ASCII range (code point > 127) are treated as non-English
def is_English_app(a_string):
    char_over_127 = []
    for char in a_string:
        if ord(char) > 127:
            char_over_127.append(char)
    if len(char_over_127) > 3:
        return False # not English app
    return True # is an English app

# test if function can detect English or non-English app (results are stored, not printed)
instagram = is_English_app("Instagram")  # True
pps = is_English_app('爱奇艺PPS -《欢乐颂2》电视剧热播')  # False
docs_to_go = is_English_app('Docs To Go™ Free Office Suite')  # True, only 1 non-ASCII character
instachat = is_English_app('Instachat 😜')  # True, only 1 non-ASCII character
In [47]:
# filter out non-English apps from android_clean (the de-duplicated google_data)
android_eng_ver = []
for row in android_clean:
    name = row[0]
    if is_English_app(name):
        android_eng_ver.append(row)
        
# filter out non-English apps from app_store_data, removed 1014 (the header row is still in this list and is kept)
app_store_eng_ver = []
for row in app_store_data:
    name_2 = row[1]
    if is_English_app(name_2):
        app_store_eng_ver.append(row)
        
    
print('android english app versions: ', len(android_eng_ver), ", removed ", 9659-len(android_eng_ver) )
print('app store english app versions: ', len(app_store_eng_ver), ", removed", 7198 - len(app_store_eng_ver))
android english app versions:  9614 , removed  45
app store english app versions:  6184 , removed 1014
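
To make the behaviour of the filter concrete, here is a small standalone sketch of the same heuristic. The function name `looks_english` and the `max_non_ascii` parameter are illustrative assumptions; the rule itself (allow up to 3 characters with a code point above 127) is the one used by is_English_app() in this step.

```python
def looks_english(name, max_non_ascii=3):
    """Return True if name has at most max_non_ascii characters outside ASCII."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii

print(looks_english("Instagram"))                      # True
print(looks_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))    # False
print(looks_english("Docs To Go™ Free Office Suite"))  # True
print(looks_english("Instachat 😜"))                   # True
```

The tolerance of 3 characters keeps English names that contain a trademark symbol or an emoji. It is a heuristic, so some non-English names with few non-ASCII characters may still pass.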

4.) Removed 750 rows from google_data and 2962 rows from app_store_data of paid apps

In [48]:
# isolate free apps in android_eng_ver list
google_free_apps=[]
for row in android_eng_ver:
    price = (row[7])
    if price == '0':
        google_free_apps.append(row)
        
    
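# isolate free apps in app_store_eng_ver list
# (the header row, whose price field is 'price', is also dropped here and is counted among the removed rows)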
app_store_free_apps = []   
for row in app_store_eng_ver:
    price = row[4]
    if price == '0.0':
        app_store_free_apps.append(row)

print("google free apps: ",len(google_free_apps), ", removed ", len(android_eng_ver)-len(google_free_apps))
print("app store free apps:", len(app_store_free_apps),", removed", len(app_store_eng_ver)-len(app_store_free_apps))
      
google free apps:  8864 , removed  750
app store free apps: 3222 , removed 2962

5. Analyze Data - What apps are more likely to attract users?

Remember, this project's goal is to provide the developers with a profitable app profile for the Android and iOS app stores that will likely attract more users.

How does the company validate a new app?

  • The company builds a minimal first app for Google Play
  • If the app receives good user responses, the company further develops the app
  • If the app is profitable after six months, the company then builds an app for the App Store

Most Common Genres vs Most Popular Apps

The prime_genre column of the App Store dataset and the Genres and Category columns of the Google Play dataset are the most useful variables for this analysis. The frequency tables in this section show that Games and Entertainment are the most common genres on the App Store, while Family, Game, and Tools are the most common categories on Google Play. Does this mean these are the most downloaded or most popular apps?

We will begin to answer this question by defining functions that build and display a frequency table for any column:

In [49]:
# define function that converts the frequency table into (value, key) tuples,
# so that sorted() orders the entries by frequency instead of by key
def display_table(dataset, index):
    table = freq_table(dataset, index)
    
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
              
#define function to create frequency table of a column
def freq_table(dataset, index_int):
    freq_table_dict = {} # frequency table (value -> count) for the chosen column
    for row in dataset:
        column = row[index_int]
        if column in freq_table_dict:
            freq_table_dict[column] += 1
        else:
            freq_table_dict[column] = 1
    #convert dictionary values as percentage
    freq_table_perc = {}
    for k_column,value in freq_table_dict.items():
        freq_table_perc[k_column] = round((value/len(dataset))*100,2)
    
    return freq_table_perc

#display frequency table for genres column from Google Play
display_table(google_free_apps, 9)  # display_table() prints the table and returns None, so nothing is assigned
Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;Brain Games : 0.14
Casual;Action & Adventure : 0.14
Arcade;Action & Adventure : 0.12
Action;Action & Adventure : 0.1
Educational;Pretend Play : 0.09
Simulation;Action & Adventure : 0.08
Parenting;Education : 0.08
Entertainment;Brain Games : 0.08
Board;Brain Games : 0.08
Parenting;Music & Video : 0.07
Educational;Brain Games : 0.07
Casual;Creativity : 0.07
Art & Design;Creativity : 0.07
Education;Pretend Play : 0.06
Role Playing;Pretend Play : 0.05
Education;Creativity : 0.05
Role Playing;Action & Adventure : 0.03
Puzzle;Action & Adventure : 0.03
Entertainment;Creativity : 0.03
Entertainment;Action & Adventure : 0.03
Educational;Creativity : 0.03
Educational;Action & Adventure : 0.03
Education;Music & Video : 0.03
Education;Brain Games : 0.03
Education;Action & Adventure : 0.03
Adventure;Action & Adventure : 0.03
Video Players & Editors;Music & Video : 0.02
Sports;Action & Adventure : 0.02
Simulation;Pretend Play : 0.02
Puzzle;Creativity : 0.02
Music;Music & Video : 0.02
Entertainment;Pretend Play : 0.02
Casual;Education : 0.02
Board;Action & Adventure : 0.02
Video Players & Editors;Creativity : 0.01
Trivia;Education : 0.01
Travel & Local;Action & Adventure : 0.01
Tools;Education : 0.01
Strategy;Education : 0.01
Strategy;Creativity : 0.01
Strategy;Action & Adventure : 0.01
Simulation;Education : 0.01
Role Playing;Brain Games : 0.01
Racing;Pretend Play : 0.01
Puzzle;Education : 0.01
Parenting;Brain Games : 0.01
Music & Audio;Music & Video : 0.01
Lifestyle;Pretend Play : 0.01
Lifestyle;Education : 0.01
Health & Fitness;Education : 0.01
Health & Fitness;Action & Adventure : 0.01
Entertainment;Education : 0.01
Communication;Creativity : 0.01
Comics;Creativity : 0.01
Casual;Music & Video : 0.01
Card;Action & Adventure : 0.01
Books & Reference;Education : 0.01
Art & Design;Pretend Play : 0.01
Art & Design;Action & Adventure : 0.01
Arcade;Pretend Play : 0.01
Adventure;Education : 0.01
In [50]:
#display frequency table for categories columns from Google Play
display_table(google_free_apps, 1)  # display_table() prints the table and returns None, so nothing is assigned
FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6
In [51]:
#display frequency table for prime_genre column from App Store
display_table(app_store_free_apps, 11)  # display_table() prints the table and returns None, so nothing is assigned
Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12
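
For comparison, the same kind of percentage frequency table can be built with collections.Counter from the standard library. This is a minimal sketch on made-up rows; `freq_table_percent` and `sample_rows` are hypothetical names, not part of the notebook.

```python
from collections import Counter

def freq_table_percent(rows, index):
    """Return {value: percentage of rows} for the column at position index."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    return {value: round(count / total * 100, 2) for value, count in counts.items()}

# Toy rows: [app name, genre]; the values are made up.
sample_rows = [["a", "Games"], ["b", "Games"], ["c", "Music"], ["d", "Games"]]
table = freq_table_percent(sample_rows, 1)

# Sort by percentage, highest first, without building (value, key) tuples.
for genre, percent in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ":", percent)
# Games : 75.0
# Music : 25.0
```

Passing a `key` function to sorted() avoids the (value, key) tuple trick used in display_table(), at the cost of being a little less explicit for readers new to Python.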
In [52]:
# calculate the average number of user ratings per app genre on App Store
freq_table_prime_genre_app_store = freq_table(app_store_free_apps, 11)

avg_ratings_by_genre = {} # created once, outside the loop, so every genre is kept
for genre in freq_table_prime_genre_app_store:
    total = 0 # sum of rating counts (rating_count_tot)
    len_genre = 0 # number of apps in this genre
    for row in app_store_free_apps:
        genre_app = row[11]
        # isolate the apps of each genre 
        if genre_app == genre:
            n_ratings = float(row[5])
            # add up the rating counts for the apps of that genre
            total += n_ratings
            len_genre += 1
    # divide the sum of rating counts by the number of apps belonging to that genre 
    avg_ratings_by_genre[genre] = total/len_genre
    # results print in the order the genres first appear in the data (not sorted)
    print(genre, ':', avg_ratings_by_genre[genre])
Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
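
The averages above are easier to compare when ranked. Below is a minimal sketch of how a per-genre average could be computed in one pass and then sorted in descending order. The sample rows are made up and the names (`sample_rows`, `ranked`) are illustrative only.

```python
from collections import defaultdict

# Toy rows: (genre, number of ratings as a string); the values are made up.
sample_rows = [
    ("Navigation", "90000"),
    ("Games", "20000"),
    ("Games", "30000"),
    ("Medical", "600"),
]

totals = defaultdict(float)  # sum of rating counts per genre
counts = defaultdict(int)    # number of apps per genre
for genre, n_ratings in sample_rows:
    totals[genre] += float(n_ratings)
    counts[genre] += 1

averages = {genre: totals[genre] / counts[genre] for genre in totals}

# Rank genres by average rating count, highest first.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
for genre, average in ranked:
    print(genre, ":", average)
# Navigation : 90000.0
# Games : 25000.0
# Medical : 600.0
```

Keeping the result dictionary outside the loop and sorting once at the end is what makes the final listing ordered.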

6. Recommendation

 > App Store Recommendation

Based on the frequency table for the prime_genre column, around 58% of the free English apps in the App Store dataset are in the Games genre.

However, game apps average only around 22,800 user ratings per app. Navigation apps average the most, with around 86,000 user ratings per app. Medical apps average the fewest, with only 612 user ratings per app.

Even though Games is the leading genre, navigation apps receive the most user engagement. Since the company's revenue depends on user engagement and interaction, I would recommend a weather navigation app with a social media aspect, where users would receive weather notifications or alerts and could interact with the app by:

  • posting weather pictures in "friends or family" groups
  • sending travel reminders to friends
  • creating weather maps that include friends and family
  • receiving notifications or alerts if bad weather will impact their areas

I know that after any bad weather, my family and friends always ask everyone to check in. The user could also use this app if they have any of the following:

  • Important outdoor events
  • Upcoming road trips
  • Outdoor work events to schedule
In [53]:
#generate frequency table for "Category" column

google_category_freq_table = freq_table(google_free_apps,1)

avg_install_by_category = {} # created once, outside the loop, so every category is kept
for category in google_category_freq_table:
    total_installs = 0 # sum of installs
    len_category = 0 # number of apps in this category
    for row in google_free_apps:
        category_app = row[1]
        # isolate the apps of each category 
        if category_app == category:
            installs = row[5]
            # strip "+" and "," so that e.g. "10,000+" becomes "10000"
            installs_replaced = installs.replace("+","")
            installs_replaced = installs_replaced.replace(",","")
            total_installs += int(installs_replaced)
            len_category += 1
    # divide the sum of installs by the number of apps belonging to that category 
    avg_install_by_category[category] = total_installs/len_category
    # results print in the order the categories first appear in the data (not sorted)
    print(category, ':', avg_install_by_category[category])
ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_MAGAZINES : 9549178.467741935
MAPS_AND_NAVIGATION : 4056941.7741935486
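
The Installs column stores open-ended bins such as '10,000+' rather than exact counts, so the averages above are approximate. The conversion step can be sketched as a small helper; `parse_installs` is a hypothetical name used only for this illustration.

```python
def parse_installs(text):
    """Convert an Installs string such as '10,000+' to an integer lower bound."""
    return int(text.replace(",", "").replace("+", ""))

print(parse_installs("10,000+"))     # 10000
print(parse_installs("5,000,000+"))  # 5000000
```

Because '10,000+' means at least 10,000 installs, each parsed value is a lower bound for the true install count.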

Analyze the number of Android installs on Google Play

Based on the frequency table for the Category column, the top category is Family at around 19%.

However, Family apps average only around 3.7 million installs per app. Communication apps average the most installs, with around 38,000,000 installs per app. Medical apps again average the fewest, with only around 120,550 installs per app.

Even though Family apps are the leading category, communication apps receive the most user engagement. Since the company's revenue depends on user engagement and interaction, I would recommend a Sports app with a social media aspect, where users would receive sports alerts and could interact with the app by:

  • posting sports pictures with "friends or family" groups
  • sending sports reminders to friends
  • creating sports groups that include friends and family
  • creating sports watch parties

I know that during "March Madness" and the NBA playoffs, we talked on the phone while watching the game, so a watch party would have been great.

Next Steps for Additional Analysis: Analyze the frequency table for the Genres column of the Google Play dataset and look for useful patterns. Assume the company could also earn revenue from in-app purchases and subscriptions, and determine which genres users seem to like the most. Refine the project using a data science project style guide.