The company builds only free apps, so its main source of revenue is in-app ads: the more users an app attracts, the more money it generates. The company does not produce non-English or paid apps, so these need to be removed from the app data gathered from the Apple App Store and the Google Play Store. From the remaining apps we will determine the most popular app genres based on the average number of user ratings (App Store) and installs (Google Play) per genre.
The code below creates a function that opens a CSV file containing app data, reads it, and turns it into a list of lists. The function is named open_dataset; it takes a file name and returns its contents as a list. The function is used to assign the data (including headers) from 'AppleStore.csv' and 'googleplaystore.csv' to apps_data_wheader and play_store_data_wheader.
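import csv

apple_file = 'AppleStore.csv'  # File name for Apple apps data
play_store_file = 'googleplaystore.csv'  # File name for Play Store apps data

def open_dataset(file_name):
    # Read the whole CSV file into a list of rows (each row is a list of strings);
    # the with block closes the file once it has been read
    with open(file_name, encoding='utf8') as open_file:
        read_file = csv.reader(open_file)
        data_list = list(read_file)
    return data_list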
# assigning Apple Apps Data list w/ header to apps_data_wheader:
apps_data_wheader = open_dataset(apple_file)
# assigning Play Store Apps Data list w/ header to play_store_data_wheader:
play_store_data_wheader = open_dataset(play_store_file)
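To see what the loading step produces without needing the real files, here is a minimal sketch that writes a tiny CSV to a temporary file and reads it back the same way. The sample contents and the name read_csv_rows are made up for illustration and are not part of the project data.

```python
import csv
import os
import tempfile

def read_csv_rows(file_name):
    """Return every row of a CSV file as a list of lists of strings."""
    # newline='' is the setting the csv module documentation recommends
    with open(file_name, newline='', encoding='utf8') as open_file:
        return list(csv.reader(open_file))

# Hypothetical two-row dataset, used only for this demonstration
sample_text = 'App,Price\n"Maps, Offline",0\nNotes,2.99\n'

with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='', encoding='utf8') as tmp:
    tmp.write(sample_text)
    tmp_path = tmp.name

rows = read_csv_rows(tmp_path)
os.remove(tmp_path)  # clean up the temporary file

header, data = rows[0], rows[1:]
print(header)
print(data)
```

Note that every value comes back as a string, and that the quoted name containing a comma stays in one field. This is why later steps convert values with float() before doing arithmetic.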
Separate the headers from the data in the two lists created above.
Make separate lists for the headers of the Apple apps and the Play Store apps: Apple Apps Header ==> apps_data_header, Play Store Header ==> play_store_data_header
Make separate lists for the data of the Apple apps and the Play Store apps: Apple Apps Data ==> apps_data, Play Store Data ==> play_store_data
apps_data_header = apps_data_wheader[0]
play_store_data_header = play_store_data_wheader[0]
apps_data = apps_data_wheader[1:]
play_store_data = play_store_data_wheader[1:]
The function explore_data takes a list (dataset) and prints a slice of it (from start to end). It can also print the number of rows and columns in the list, but by default it does not.
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
explore_data(apps_data, 0, 5)
explore_data(play_store_data, 0, 5)
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']
['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']
explore_data(apps_data, 0, 5, True)
explore_data(play_store_data, 0, 5,True)
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']
['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']
Number of rows: 7197
Number of columns: 16

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']
Number of rows: 10841
Number of columns: 13
Printing column names for both lists (Headers)
print(apps_data_header)
print(play_store_data_header)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
Apple Store Column Headings:
Column Name | Description |
---|---|
id | App ID |
track_name | App Name |
size_bytes | Size (in Bytes) |
currency | Currency Type |
price | Price amount |
rating_count_tot | User Rating counts (for all versions) |
rating_count_ver | User Rating counts (for current version) |
user_rating | Average User Rating value (for all versions) |
user_rating_ver | Average User Rating value (for current version) |
ver | Latest version code |
cont_rating | Content Rating |
prime_genre | Primary Genre |
sup_devices.num | Number of supporting devices |
ipadSc_urls.num | Number of screenshots shown for display |
lang.num | Number of supported languages |
vpp_lic | Vpp Device Based Licensing Enabled |
Google Play Store Column Headings:
Column Name | Description |
---|---|
App | Application name |
Category | Category the app belongs to |
Rating | Overall user rating of the app (as when scraped) |
Reviews | Number of user reviews for the app (as when scraped) |
Size | Size of the app (as when scraped) |
Installs | Number of user downloads/installs for the app (as when scraped) |
Type | Paid or Free |
Price | Price of the app (as when scraped) |
Content Rating | Age group the app is targeted at - Children / Mature 21+ / Adult |
Genres | An app can belong to multiple genres (apart from its main category). |
Last Updated | Date when the app was last updated on Play Store (as when scraped) |
Current Ver | Current version of the app available on Play Store (as when scraped) |
Android Ver | Min required Android version (as when scraped) |
A row with a missing value was initially reported in a discussion on Kaggle; it is verified below.
print(play_store_data[10472])
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
Row 10472 needs to be removed from the list because its category is missing, which leaves it with 12 values instead of 13 and shifts the remaining values one column to the left.
del play_store_data[10472]
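The row above was found through a Kaggle discussion, but a row with a missing value can also be detected directly, because it has fewer values than the header. The sketch below shows that idea on made-up rows; the function name find_malformed_rows and the sample data are illustrative assumptions, not part of the original analysis.

```python
def find_malformed_rows(header, rows):
    """Return (index, length) for every row whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]

# Hypothetical sample: the second row is missing its Category value
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Notes', 'TOOLS', '4.1'],
    ['Photo Frame', '1.9'],
    ['Chess', 'GAME', '4.5'],
]

bad = find_malformed_rows(sample_header, sample_rows)
print(bad)

# Delete from the end so that earlier indices stay valid while deleting
for index, _ in reversed(bad):
    del sample_rows[index]
print(len(sample_rows))
```

Deleting in reverse index order matters when more than one row is removed: running del on an early index first would shift every later row down by one.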
The apps_data list does not appear to have any missing data, based on the discussion on Kaggle.
We know from the discussions on Kaggle that there are duplicate entries for the same apps in the data sets. They need to be identified and removed.
def dupe_check(data_set, name_col_num):
    unique_apps = []  # names seen for the first time
    duplicate_apps = []  # names seen again after their first occurrence
    for app in data_set:
        name = app[name_col_num]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    return unique_apps, duplicate_apps
apple_uniq_apps, apple_dupe_apps = dupe_check(apps_data, 1)
google_uniq_apps, google_dupe_apps = dupe_check(play_store_data, 0)
Check that the function works, and check how many duplicate apps there are in each list.
print(apple_dupe_apps)
print(len(apple_dupe_apps))
print('\n')
print(google_dupe_apps[:5])
print(len(google_dupe_apps))
['Mannequin Challenge', 'VR Roller Coaster']
2

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
1181
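Checking membership in a list gets slower as the list grows, so for larger data sets the same count can be obtained with collections.Counter from the standard library. This is only a sketch of an alternative on made-up rows; count_duplicates and the sample data are hypothetical names and values.

```python
from collections import Counter

def count_duplicates(rows, name_index):
    """Map each repeated name to the number of times it appears."""
    counts = Counter(row[name_index] for row in rows)
    return {name: n for name, n in counts.items() if n > 1}

# Hypothetical sample rows: [name, number of reviews]
sample = [
    ['Box', '10'],
    ['Slack', '7'],
    ['Box', '12'],
    ['Box', '12'],
    ['Zoom', '3'],
]

dupes = count_duplicates(sample, 0)
# Rows that would be dropped if one row per name is kept
extra_rows = sum(n - 1 for n in dupes.values())
print(dupes)
print(extra_rows)
```

Here extra_rows corresponds to the length of the duplicate list returned by dupe_check, since each repeated name contributes one entry per extra occurrence.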
Make a dictionary named reviews_max. In it we will store the maximum number of reviews for each app (the numbers of reviews are keyed by app name).
reviews_max = {}
for app in play_store_data:
    name = app[0]
    n_reviews = float(app[3])
    # keep the largest review count seen so far for each app name
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
Check that the above code works. Once we remove the duplicates from the Google Play Store data set, we should be left with only 9659 entries.
len(reviews_max)
9659
Now we make a list of lists holding the rows from the Google Play Store data (play_store_data) without duplicate entries for apps.
Check the number of elements in android_clean. If everything is right, it should match the number of entries in the dictionary reviews_max (9659).
android_clean = []  # List of apps without duplicates (with the data for the apps)
already_added = []  # List of app names already inside android_clean (names only)
for row in play_store_data:
    name = row[0]
    n_reviews = float(row[3])
    # already_added guards against duplicates that share the same maximum review count
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(row)
        already_added.append(name)
len(android_clean)
9659
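The two steps above (find the maximum, then keep the matching row) can also be combined into one pass by storing the best row seen so far for each name. The following is a rough sketch on made-up data, assuming Python 3.7+ where dictionaries keep insertion order; keep_most_reviewed is a hypothetical helper, not the code used for the results above.

```python
def keep_most_reviewed(rows, name_index, reviews_index):
    """Keep one row per name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # A strictly greater count replaces the stored row; ties keep the earlier row
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Hypothetical rows: [name, category, rating, reviews]
sample = [
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Zoom', 'BUSINESS', '4.4', '50'],
    ['Box', 'BUSINESS', '4.2', '180'],
    ['Box', 'BUSINESS', '4.2', '180'],
]

clean = keep_most_reviewed(sample, 0, 3)
print(clean)
```

Keeping the row with the most reviews is a way of keeping the most recent snapshot of an app, on the assumption that review counts only grow over time.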
Function to check whether a string is written in English. It will be used to filter non-English apps out of the data by checking whether each app name is written in English.
The characters commonly used in English text (the ASCII set) have code points from 0 to 127; the code point is the number that ord() returns for a character. To find out whether an app name is English, we check the code point of each character. However, some English app names include emojis or other special characters whose code points are above 127. So we make an exception: a name may contain up to 3 characters with a code point above 127 and still be considered English.
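def eng_check(string):
    non_eng_count = 0
    for character in string:
        # ord() returns the Unicode code point; ASCII characters are 0 to 127
        if ord(character) > 127:
            non_eng_count += 1
        # test the limit after every character, so that a name ending in
        # non-ASCII characters is rejected as well
        if non_eng_count > 3:
            return False
    return True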
Check that the function eng_check works
eng_check('Instagram')
True
eng_check('爱奇艺PPS -《欢乐颂2》电视剧热播')
False
eng_check('Docs To Go™ Free Office Suite')
True
eng_check('Instachat 😜')
True
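The same rule can also be phrased as "count the characters above code point 127 and compare the count with a limit". The sketch below is an alternative formulation for illustration only; the name is_mostly_ascii and the max_non_ascii parameter are assumptions, not part of the original notebook.

```python
def is_mostly_ascii(text, max_non_ascii=3):
    """True if text contains at most max_non_ascii characters outside ASCII."""
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii <= max_non_ascii

examples = [
    'Instagram',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '中文应用商店',  # made-up name written entirely in non-ASCII characters
]

for name in examples:
    print(name, '->', is_mostly_ascii(name))
```

This is still a heuristic: names in other languages that use the Latin alphabet with few accented letters will pass, and an English name with more than three emojis will be rejected.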
Use the function to go through both data sets and add the English apps to separate lists.
eng_apps_android = []
for row in android_clean:
    name = row[0]  # App (the app name) is column 0 in the Play Store data
    if eng_check(name):
        eng_apps_android.append(row)
print(len(eng_apps_android))
print(len(android_clean))
9625 9659
eng_apps_apple = []
for row in apps_data:
    name = row[1]  # track_name (the app name) is column 1; column 0 is the numeric id
    if eng_check(name):
        eng_apps_apple.append(row)
print(len(eng_apps_apple))
print(len(apps_data))
7197 7197
Isolate the free apps from eng_apps_android in free_apps_android.
free_apps_android = []
for row in eng_apps_android:
    price = row[7]  # Price column in the Play Store data
    if price == '0' or price == '0.0':
        free_apps_android.append(row)
print(len(free_apps_android))
print(len(eng_apps_android))
8875 9625
Isolate the free apps from eng_apps_apple in free_apps_apple.
free_apps_apple = []
for row in eng_apps_apple:
    price = row[4]  # price column in the Apple data
    if price == '0' or price == '0.0':
        free_apps_apple.append(row)
print(len(free_apps_apple))
print(len(eng_apps_apple))
4056 7197
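Comparing the price against the strings '0' and '0.0' works for the values seen here, but it depends on the exact spelling of a zero price. A slightly more tolerant check converts the text to a number first. The sketch below assumes paid prices may carry a leading dollar sign (for example '$4.99'); that format and the helper name is_free are assumptions for illustration.

```python
def is_free(price_text):
    """True if the price string represents zero, e.g. '0', '0.0' or '$0.00'."""
    cleaned = price_text.strip().lstrip('$')  # drop spaces and a leading dollar sign
    return float(cleaned) == 0.0

for text in ['0', '0.0', '0.00', '$4.99', '2.99']:
    print(text, '->', is_free(text))
```

A value that is not a number at all would raise a ValueError here, which is arguably useful at this stage because it points at rows that still need cleaning.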
The end goal for the company is to add an app to both Google Play and the App Store, and for that app to be popular on both platforms. To have the best chance of succeeding, we need to know which app genres are popular and common on both Google Play and the App Store. Genres whose apps have large numbers of reviews are likely to be popular, because many reviews imply many users. Genres with a large number of apps are also of interest, because that suggests a large potential market and therefore more potential users for the app we wish to create.
Columns of Interest:
Apple: rating_count_tot (row[5]), prime_genre (row[11])
rating_count_tot: User Rating counts (for all versions)
prime_genre: Primary Genre
Android: Reviews (row[3]), Category (row[1]), Genres (row[9])
Reviews: Number of user reviews for the app (as when scraped)
Category: Category the app belongs to
Genres: An app can belong to multiple genres (apart from its main category).
Function to make a frequency table out of one column in a dataset. It returns the table as a dictionary that maps each value to its share of the rows, as a percentage.
def freq_table(dataset, index):
    generes_count = {}
    total_number_of_apps = len(dataset)
    # count how many rows have each value in the chosen column
    for row in dataset:
        genere = row[index]
        if genere in generes_count:
            generes_count[genere] += 1
        else:
            generes_count[genere] = 1
    # convert the counts to percentages of all rows
    for entry in generes_count:
        generes_count[entry] = (generes_count[entry] / total_number_of_apps) * 100
    return generes_count
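For comparison, the counting part of a frequency table is what collections.Counter from the standard library does. The sketch below builds the same kind of percentage table on made-up rows; percentage_table and the sample data are hypothetical and only meant to make the calculation concrete.

```python
from collections import Counter

def percentage_table(rows, index):
    """Map each value in column `index` to its share of the rows, in percent."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    return {value: count / total * 100 for value, count in counts.items()}

# Hypothetical rows: [name, genre]
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]

table = percentage_table(sample, 1)
# Sort by percentage, largest first, and print one line per genre
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(f'{genre}: {pct:.1f}%')
```

With three of the four sample rows in Games, the table gives Games 75% and Music 25%, and the percentages add up to 100.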
Function to print/display frequency tables.
Note: it converts the table into (value, key) tuples so that the entries can be sorted, then prints them from the largest value to the smallest. It does not return anything.
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
apple_geners_freq_table = freq_table(free_apps_apple, 11)
android_category_freq_table = freq_table(free_apps_android, 1)
android_geners_freq_table = freq_table(free_apps_android, 9)
print('Apple Geners Frequency Table')
display_table(free_apps_apple, 11)
print('\n')
print('Android_Categories_Frequency_Table')
display_table(free_apps_android, 1)
print('\n')
print('Android_Generes_Frequency_Table')
display_table(free_apps_android, 9)
Apple Geners Frequency Table
Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032

Android_Categories_Frequency_Table
FAMILY : 18.940845070422537
GAME : 9.712676056338028
TOOLS : 8.450704225352112
BUSINESS : 4.585915492957747
LIFESTYLE : 3.92112676056338
PRODUCTIVITY : 3.898591549295775
FINANCE : 3.695774647887324
MEDICAL : 3.5267605633802814
SPORTS : 3.391549295774648
PERSONALIZATION : 3.3126760563380286
COMMUNICATION : 3.2338028169014086
HEALTH_AND_FITNESS : 3.076056338028169
PHOTOGRAPHY : 2.952112676056338
NEWS_AND_MAGAZINES : 2.8056338028169012
SOCIAL : 2.659154929577465
TRAVEL_AND_LOCAL : 2.332394366197183
SHOPPING : 2.2422535211267602
BOOKS_AND_REFERENCE : 2.1521126760563383
DATING : 1.8591549295774648
VIDEO_PLAYERS : 1.7915492957746477
MAPS_AND_NAVIGATION : 1.3971830985915492
FOOD_AND_DRINK : 1.2394366197183098
EDUCATION : 1.1605633802816901
ENTERTAINMENT : 0.9577464788732395
LIBRARIES_AND_DEMO : 0.9352112676056339
AUTO_AND_VEHICLES : 0.923943661971831
HOUSE_AND_HOME : 0.8225352112676056
WEATHER : 0.8
EVENTS : 0.7098591549295774
PARENTING : 0.6535211267605634
ART_AND_DESIGN : 0.6422535211267606
COMICS : 0.6197183098591549
BEAUTY : 0.5971830985915494

Android_Generes_Frequency_Table
Tools : 8.439436619718311
Entertainment : 6.073239436619718
Education : 5.352112676056338
Business : 4.585915492957747
Lifestyle : 3.9098591549295776
Productivity : 3.898591549295775
Finance : 3.695774647887324
Medical : 3.5267605633802814
Sports : 3.4591549295774646
Personalization : 3.3126760563380286
Communication : 3.2338028169014086
Action : 3.0985915492957745
Health & Fitness : 3.076056338028169
Photography : 2.952112676056338
News & Magazines : 2.8056338028169012
Social : 2.659154929577465
Travel & Local : 2.3211267605633803
Shopping : 2.2422535211267602
Books & Reference : 2.1521126760563383
Simulation : 2.0619718309859154
Dating : 1.8591549295774648
Arcade : 1.847887323943662
Video Players & Editors : 1.7690140845070423
Casual : 1.7577464788732393
Maps & Navigation : 1.3971830985915492
Food & Drink : 1.2394366197183098
Puzzle : 1.1267605633802817
Racing : 0.9915492957746479
Role Playing : 0.9352112676056339
Libraries & Demo : 0.9352112676056339
Auto & Vehicles : 0.923943661971831
Strategy : 0.9126760563380281
House & Home : 0.8225352112676056
Weather : 0.8
Events : 0.7098591549295774
Adventure : 0.676056338028169
Comics : 0.6084507042253521
Beauty : 0.5971830985915494
Art & Design : 0.5971830985915494
Parenting : 0.49577464788732395
Card : 0.4507042253521127
Casino : 0.428169014084507
Trivia : 0.4169014084507042
Educational;Education : 0.39436619718309857
Board : 0.38309859154929576
Educational : 0.37183098591549296
Education;Education : 0.3492957746478873
Word : 0.2591549295774648
Casual;Pretend Play : 0.23661971830985915
Music : 0.2028169014084507
Racing;Action & Adventure : 0.16901408450704225
Puzzle;Brain Games : 0.16901408450704225
Entertainment;Music & Video : 0.16901408450704225
Casual;Brain Games : 0.1352112676056338
Casual;Action & Adventure : 0.1352112676056338
Arcade;Action & Adventure : 0.12394366197183099
Action;Action & Adventure : 0.10140845070422536
Educational;Pretend Play : 0.09014084507042254
Simulation;Action & Adventure : 0.07887323943661971
Parenting;Education : 0.07887323943661971
Entertainment;Brain Games : 0.07887323943661971
Board;Brain Games : 0.07887323943661971
Parenting;Music & Video : 0.0676056338028169
Educational;Brain Games : 0.0676056338028169
Casual;Creativity : 0.0676056338028169
Art & Design;Creativity : 0.0676056338028169
Education;Pretend Play : 0.056338028169014086
Role Playing;Pretend Play : 0.04507042253521127
Education;Creativity : 0.04507042253521127
Role Playing;Action & Adventure : 0.03380281690140845
Puzzle;Action & Adventure : 0.03380281690140845
Entertainment;Creativity : 0.03380281690140845
Entertainment;Action & Adventure : 0.03380281690140845
Educational;Creativity : 0.03380281690140845
Educational;Action & Adventure : 0.03380281690140845
Education;Music & Video : 0.03380281690140845
Education;Brain Games : 0.03380281690140845
Education;Action & Adventure : 0.03380281690140845
Adventure;Action & Adventure : 0.03380281690140845
Video Players & Editors;Music & Video : 0.022535211267605635
Sports;Action & Adventure : 0.022535211267605635
Simulation;Pretend Play : 0.022535211267605635
Puzzle;Creativity : 0.022535211267605635
Music;Music & Video : 0.022535211267605635
Entertainment;Pretend Play : 0.022535211267605635
Casual;Education : 0.022535211267605635
Board;Action & Adventure : 0.022535211267605635
Video Players & Editors;Creativity : 0.011267605633802818
Trivia;Education : 0.011267605633802818
Travel & Local;Action & Adventure : 0.011267605633802818
Tools;Education : 0.011267605633802818
Strategy;Education : 0.011267605633802818
Strategy;Creativity : 0.011267605633802818
Strategy;Action & Adventure : 0.011267605633802818
Simulation;Education : 0.011267605633802818
Role Playing;Brain Games : 0.011267605633802818
Racing;Pretend Play : 0.011267605633802818
Puzzle;Education : 0.011267605633802818
Parenting;Brain Games : 0.011267605633802818
Music & Audio;Music & Video : 0.011267605633802818
Lifestyle;Pretend Play : 0.011267605633802818
Lifestyle;Education : 0.011267605633802818
Health & Fitness;Education : 0.011267605633802818
Health & Fitness;Action & Adventure : 0.011267605633802818
Entertainment;Education : 0.011267605633802818
Communication;Creativity : 0.011267605633802818
Comics;Creativity : 0.011267605633802818
Casual;Music & Video : 0.011267605633802818
Card;Action & Adventure : 0.011267605633802818
Books & Reference;Education : 0.011267605633802818
Art & Design;Pretend Play : 0.011267605633802818
Art & Design;Action & Adventure : 0.011267605633802818
Arcade;Pretend Play : 0.011267605633802818
Adventure;Education : 0.011267605633802818
The most common genre on the Apple App Store is Games (55.6%), followed by Entertainment (8.2%). The least common genres are Medical (0.2%) and Catalogs (0.2%). Most apps are designed for entertainment rather than practical purposes: the Games genre alone makes up more than half of the apps in the store. Based on this table I recommend that the company focus on apps intended for entertainment, particularly in the Games genre. However, this table only shows the percentage of apps in each genre, not the amount of user traffic or the number of users. It does not imply that gaming apps have a large number of users or that other genres have few. Further study should look at the number of reviews/ratings for each genre, which can be used to judge the number of users per genre.
For the Google Play Store, the most common app categories are, in order, Family (18.9%), Games (9.7%), and Tools (8.5%), with Family slightly larger than the next two combined. The least common app categories are Beauty (0.6%) and Comics (0.6%). In the Genres column, the three most common genres are Tools (8.4%), Entertainment (6.1%), and Education (5.4%), and the least common are Arcade;Pretend Play (0.01%) and Adventure;Education (0.01%). It should be noted, however, that many entries in the Genres column (which holds the secondary genres that accompany the primary genre in the Category column) overlap, unlike those in the Category column.
On the Google Play Store, categories related to productivity appear to have a larger share of apps than on the Apple App Store, but categories related to entertainment still have a great number of apps. I would still recommend that the company focus on apps in the Games category, even though there are fewer Games apps than Family apps on the Google Play Store. I base this on the extremely large share of Games apps on the App Store. However, the frequency tables for the Google Play Store again do not show that the categories have a large number of users, only that a large number of apps have been made for them. We can use that to guess whether the number of users is large or small, but it remains a guess until further analysis is made.
# The variable apple_geners_freq_table contains the frequency table for the
# prime_genre column of the Apple App Store data.
# The variable free_apps_apple contains the rows (with data) for the free
# English apps.
apple_genre_num_ratings = {}  # Maps each genre to its average number of ratings

# Finding the average number of ratings for each genre:
for genre in apple_geners_freq_table:
    total = 0  # Sum of user ratings for apps in this genre
    len_genre = 0  # Number of apps in this genre
    for row in free_apps_apple:
        genre_app = row[11]
        if genre_app == genre:
            num_user_ratings = float(row[5])
            total += num_user_ratings
            len_genre += 1
    avg_num_user_ratings = total / len_genre
    apple_genre_num_ratings[genre] = avg_num_user_ratings
# Function to print the above dictionary in a readable format for analysis,
# sorted from the largest value to the smallest
def display_table_for_dictionary(dictionary):
    table = dictionary
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
display_table_for_dictionary(apple_genre_num_ratings)
Reference : 67447.9
Music : 56482.02985074627
Social Networking : 53078.195804195806
Weather : 47220.93548387097
Photo & Video : 27249.892215568863
Navigation : 25972.05
Travel : 20216.01785714286
Food & Drink : 20179.093023255813
Sports : 20128.974683544304
Health & Fitness : 19952.315789473683
Productivity : 19053.887096774193
Games : 18924.68896765618
Shopping : 18746.677685950413
News : 15892.724137931034
Utilities : 14010.100917431193
Finance : 13522.261904761905
Entertainment : 10822.961077844311
Lifestyle : 8978.308510638299
Book : 8498.333333333334
Business : 6367.8
Education : 6266.333333333333
Catalogs : 1779.5555555555557
Medical : 459.75
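The averages above are computed with a nested loop that scans every app once per genre. The same numbers can be gathered in a single pass by keeping a running total and a running count for each genre. The sketch below shows this on made-up rows; average_by_group and the sample values are illustrative assumptions.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Average the numeric column `value_index` for each value of `group_index`."""
    totals = defaultdict(float)  # running sum per group
    counts = defaultdict(int)    # number of rows per group
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical rows: [genre, number of ratings]
sample = [['Games', '100'], ['Games', '300'], ['Music', '50']]

averages = average_by_group(sample, 0, 1)
print(averages)
```

Keep in mind that an average can be pulled up by a few very large apps, so a genre with a high average is not necessarily one where a typical app has many ratings.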
Based on the above table of the average number of user ratings in each genre, I can conclude that the two app genres that receive the most user ratings are Reference and Music. The two app genres that receive the fewest user ratings are Medical and Catalogs. Based on the table above, I recommend that the company focus on making apps for the Apple App Store in the Reference genre, as that genre appears to receive the most user traffic.
# The variable android_category_freq_table contains the frequency table for the
# Category column of the Google Play Store data.
# The variable free_apps_android contains the rows (with data) for the free
# English apps, without duplicates.
# The number of installs is in row[5].
android_num_instals = {}  # Maps each category to its average number of installs

for category in android_category_freq_table:
    total = 0  # Sum of installs for apps in this category
    len_category = 0  # Number of apps in this category
    for row in free_apps_android:
        category_app = row[1]
        if category_app == category:
            # Installs look like '10,000+': strip '+' and ',' before converting
            num_install = row[5]
            num_install = num_install.replace('+', '')
            num_install = num_install.replace(',', '')
            num_install = float(num_install)
            total += num_install
            len_category += 1
    avg_num_installs = total / len_category
    android_num_instals[category] = avg_num_installs
display_table_for_dictionary(android_num_instals)
COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17772018.759541985
PRODUCTIVITY : 16738957.554913295
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9514844.417670682
BOOKS_AND_REFERENCE : 8721959.47643979
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3684783.277810827
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1429725.3706896552
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 513151.88679245283
EVENTS : 253542.22222222222
MEDICAL : 120550.61980830671
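The Installs values are text such as '10,000+', so they have to be cleaned before they can be averaged. The small sketch below isolates that conversion in one helper; parse_installs is a hypothetical name, and the inputs are example strings in the same format as the Installs column shown earlier.

```python
def parse_installs(text):
    """Convert an install bucket such as '10,000+' to the integer 10000."""
    return int(text.replace(',', '').replace('+', ''))

for raw in ['10,000+', '5,000,000+', '0']:
    print(raw, '->', parse_installs(raw))
```

Because each value is the lower bound of a bucket ('10,000+' means at least 10,000), the averages above are approximations rather than exact install counts.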
The three categories with the most installs are Communication, Video Players, and Social. The two categories with the fewest installs are Medical and Events. Based on the company strategy, I recommend that the company focus on making apps in the categories Social, Video Players, and Photography. We can assume a large amount of user traffic for apps in those categories, because they have some of the largest average numbers of installs. If the apps made in those categories are successful, they can be transferred to the Apple App Store, where those categories also have a large amount of user traffic. Note that the Google Play categories VIDEO_PLAYERS and PHOTOGRAPHY appear to be combined on the Apple App Store as Photo & Video.