In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_piece = dataset[start:end]
    for row in dataset_piece:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print("Number of rows: ", len(dataset))
        print('Number of columns: ', len(dataset[0]))
In [4]:
def print_column_names(dataset):
    for item in dataset:
        print(item)
        

The cell below loads both data sets and displays the first few rows of each, including the header row that holds the column names.

The Google Play Store column name descriptions can be found here --> Google Play Column Names.

The Apple Store column name descriptions can be found here --> Apple Column Names.

In [5]:
from csv import reader

print('Apple Store File...\n')
# csv.reader (rather than splitting each line on ',') is needed because some fields
# contain commas inside quotes, e.g. Google Play installs such as '10,000+'
with open('AppleStore.csv', encoding='utf8') as apple_store_file:
    apple_store_list = list(reader(apple_store_file))
explore_data(apple_store_list, 0, 3, True)

print('\nAndroid File....\n')
with open('googleplaystore.csv', encoding='utf8') as android_file:
    android_list = list(reader(android_file))
explore_data(android_list, 0, 4, True)
Apple Store File...

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows:  7198
Number of columns:  16

Android File....

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows:  10842
Number of columns:  13
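The comment in the loading cell says csv.reader must be used. The reason is quoting: a field like "500,000+" contains a comma, so naive splitting on ',' breaks it into two columns. A small illustration, using a made-up in-memory snippet (not the real files) in the Google Play format:

```python
import csv
import io

# Hypothetical two-line CSV snippet: the Installs field contains a comma,
# so it is wrapped in quotes.
sample = 'App,Installs,Type\n"Coloring book moana","500,000+",Free\n'

# Naive splitting breaks the quoted field into two pieces.
naive_rows = [line.split(',') for line in sample.splitlines()]

# csv.reader respects the quotes and keeps "500,000+" as one field.
parsed_rows = list(csv.reader(io.StringIO(sample)))

print(naive_rows[1])   # ['"Coloring book moana"', '"500', '000+"', 'Free']
print(parsed_rows[1])  # ['Coloring book moana', '500,000+', 'Free']
```

With str.split the row has four fields instead of three, which would shift every later column index.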
In [29]:
print(len(apple_store_list[0]))
# number the Apple column names starting from 1
for i, item in enumerate(apple_store_list[0], start=1):
    print(i, ':', item)
16
1 : id
2 : track_name
3 : size_bytes
4 : currency
5 : price
6 : rating_count_tot
7 : rating_count_ver
8 : user_rating
9 : user_rating_ver
10 : ver
11 : cont_rating
12 : prime_genre
13 : sup_devices.num
14 : ipadSc_urls.num
15 : lang.num
16 : vpp_lic

Check For Bad Data

Below is a check to determine whether any row in the Android data set has fewer than the expected 13 columns.

In [48]:
# print every Android row with fewer than 13 columns, along with its index
for i, item in enumerate(android_list):
    if len(item) < 13:
        print(item)
        print('Number of columns: ', len(item))
        print('Index of item: ', i)
        print('\n')

Below is a function to locate any rows with missing columns.

In [40]:
def missing_columns(dataset):
    column_size = len(dataset[0])
    for row in dataset[1:]:
        if len(row) < column_size:
            print('Missing Column Info: ', row)
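missing_columns prints each short row but not its position, and deleting a row needs the index. A minimal variant, under the hypothetical name find_malformed_rows, returns (index, column_count) pairs instead, and uses != so rows with too many columns are caught as well:

```python
def find_malformed_rows(dataset):
    """Return (index, column_count) for every data row whose length differs from the header."""
    expected = len(dataset[0])  # the header row defines the expected column count
    malformed = []
    # start=1 so the reported index matches the position in dataset itself
    for index, row in enumerate(dataset[1:], start=1):
        if len(row) != expected:
            malformed.append((index, len(row)))
    return malformed

# Toy data standing in for android_list: row 2 is missing its Category column.
toy = [
    ['App', 'Category', 'Rating'],
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]
print(find_malformed_rows(toy))  # [(2, 2)]
```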

Remove Row Function

Below is a function to remove a bad row of data in a given data set.

In [8]:
def remove_row(dataset, index):
    del dataset[index]

    
    
In [9]:
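# Row 10473 (found above) has only 12 columns: its Category value is missing.
# Note: re-running this cell would delete the next, valid row as well.
print(android_list[10473])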
print(len(android_list))
print('\n')
remove_row(android_list, 10473)
print('\n')
print(android_list[10473])
print(len(android_list))
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10842




['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
10841
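Because remove_row deletes by position, running the cell above a second time would silently delete the 'osmino Wi-Fi' row, which is valid. A sketch of a guarded alternative (the name remove_row_if_malformed is hypothetical) only deletes the row if it really has the wrong number of columns, so re-running it is harmless:

```python
def remove_row_if_malformed(dataset, index):
    """Delete dataset[index] only if its column count differs from the header's.

    Returns True if a row was removed. Safe to re-run: once the bad row is gone,
    the row now at that index has the right length and is left alone.
    """
    if index < len(dataset) and len(dataset[index]) != len(dataset[0]):
        del dataset[index]
        return True
    return False

toy = [
    ['App', 'Category', 'Rating'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
    ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2'],
]
print(remove_row_if_malformed(toy, 1))  # True
print(remove_row_if_malformed(toy, 1))  # False: the osmino row is intact
print(len(toy))                         # 2
```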

The Data Has Duplicates

The Google Play data set contains duplicate rows: some apps are listed more than once, although each app should appear only once. For example:

In [52]:
duplicate_examples = []
unique_apps = []
for row in android_list:
    app_name = row[0]
    if app_name in unique_apps:
        duplicate_examples.append(app_name)
    else:
        unique_apps.append(app_name)
        
print("Number of duplicates: ", len(duplicate_examples))
print("First Few Duplicates: ", duplicate_examples[0:4])
        
Number of duplicates:  1181
First Few Duplicates:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings']
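The check `app_name in unique_apps` scans the whole list for every row, so with roughly 10,000 rows the loop does a quadratic amount of work. A sketch of the same idea with a set, whose membership test takes constant time on average (find_duplicates is a hypothetical name; unlike the cells here, it skips the header row):

```python
def find_duplicates(dataset, name_index):
    """Return names seen more than once, in the order their repeats appear."""
    seen = set()          # set membership is O(1) on average, unlike a list search
    duplicates = []
    for row in dataset[1:]:  # skip the header row
        name = row[name_index]
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates

toy = [['App'], ['Box'], ['Slack'], ['Box'], ['Box']]
print(find_duplicates(toy, 0))  # ['Box', 'Box']
```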

Rather than removing duplicates at random, we keep the row with the highest number of reviews, since a higher review count indicates more recent data. The cell below collects the names of the duplicate apps; the number of reviews is then used to decide which copy of each app to keep.

In [54]:
android_duplicates = []
android_unique = []
for app in android_list:
    name = app[0]
    if name in android_unique:
        android_duplicates.append(name)
    else:
        android_unique.append(name)
        
print(len(android_duplicates))
1181
In [60]:
apple_duplicates = []
apple_unique = []
for app in apple_store_list:
    name = app[1]
    if name in apple_unique:
        apple_duplicates.append(name)
    else:
        apple_unique.append(name)    

print('Apple Unique Length: ', len(apple_unique))
print('Apple Duplicate Length: ', len(apple_duplicates))
Apple Unique Length:  7196
Apple Duplicate Length:  2

To decide whether an app is already in the dictionary ('reviews_max'), the loop uses two 'if' statements, and the second one tests 'not in' rather than falling through to a bare 'else'. The reason is that duplicates can appear in any order of review counts. Suppose the first listing of an app has 1 review, the second 100, and a third 50. The first listing is added to the dictionary with 1. For the second listing the first 'if' statement is 'True' (1 < 100), so the stored value becomes 100. For the third listing the first 'if' statement is 'False' (100 is not less than 50); a bare 'else' would then run and overwrite 100 with 50, so the dictionary would no longer hold the highest review count.

With the 'not in' operator, the third listing (the one with 50 reviews) makes both conditions 'False': the first because 100 is not less than 50, the second because the app already exists as a key in the dictionary. Nothing changes, which is correct, since the dictionary already holds the highest review count. (An 'elif name not in reviews_max:' would behave identically.)
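The 1 / 100 / 50 scenario can be checked directly. The two hypothetical functions below differ only in their last branch:

```python
# Review counts for three hypothetical listings of the same app, in file order.
listings = [('Box', 1.0), ('Box', 100.0), ('Box', 50.0)]

def max_with_else(rows):
    best = {}
    for name, n_reviews in rows:
        if name in best and best[name] < n_reviews:
            best[name] = n_reviews
        else:  # also runs when the stored count is already larger
            best[name] = n_reviews
    return best

def max_with_not_in(rows):
    best = {}
    for name, n_reviews in rows:
        if name in best and best[name] < n_reviews:
            best[name] = n_reviews
        elif name not in best:  # only runs the first time a name is seen
            best[name] = n_reviews
    return best

print(max_with_else(listings))    # {'Box': 50.0}  (the maximum was overwritten)
print(max_with_not_in(listings))  # {'Box': 100.0}
```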

In [12]:
reviews_max = {}
for row in android_list[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    # Do NOT use a bare 'else' here; see the note above for why.
    if name not in reviews_max:
        reviews_max[name] = n_reviews
 
print('Max Length: ', len(reviews_max))

android_clean = []
added = []

for row in android_list[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if (n_reviews == reviews_max[name]) and (name not in added):
        android_clean.append(row)
        added.append(name)

print('Clean: ', len(android_clean))
Max Length:  9659
Clean:  9659
In [13]:
def check_string(string):
    # return False if the string contains any non-ASCII character (code point above 127)
    for char in string:
        if ord(char) > 127:
            return False
    return True
In [14]:
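def check_string(string):
    # Tolerate up to 3 non-ASCII characters (e.g. emoji or '™') before treating
    # the name as non-English; this definition replaces the stricter one above.
    c = 0
    for char in string:
        if ord(char) > 127:  # ord() is never negative, so no lower-bound check is needed
            c += 1
            if c > 3:
                return False
    return True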
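The same rule can be written as "count the non-ASCII characters, then compare with the limit". This sketch (is_mostly_ascii and its max_non_ascii parameter are hypothetical names) gives the same answers as the second check_string, without the early exit, and shows why a threshold of 3 is used: a single emoji or trademark sign should not mark an English name as foreign.

```python
def is_mostly_ascii(text, max_non_ascii=3):
    """Return True if text has at most max_non_ascii characters above code point 127."""
    non_ascii = sum(1 for char in text if ord(char) > 127)
    return non_ascii <= max_non_ascii

# Sample strings for illustration.
print(is_mostly_ascii('Instagram'))                       # True
print(is_mostly_ascii('Docs To Go™ Free Office Suite'))   # True: one non-ASCII character
print(is_mostly_ascii('Instachat 😜'))                     # True: one emoji
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False
```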
In [61]:
def app_char_check(dataset, index_position):
    # Split dataset into rows whose app name has at most 3 non-ASCII characters
    # (cleaned_list) and the rest (dirty_list).
    # index_position is the index of the app name: Android is 0, Apple is 1.
    cleaned_list = []
    dirty_list = []
    for row in dataset:
        app_name = row[index_position]
        c = 0  # non-ASCII characters seen so far in this name
        for char in app_name:
            if ord(char) > 127:
                c += 1
                if c > 3:
                    dirty_list.append(row)
                    break
        if c <= 3:
            cleaned_list.append(row)
    return cleaned_list, dirty_list


apple_char_cleaned, apple_char_foreign = app_char_check(apple_store_list, 1)
android_char_cleaned, android_char_foreign = app_char_check(android_clean, 0)
print('Apple Cleaned Length: ', len(apple_char_cleaned))
print('Apple Original File Length: ', len(apple_store_list))
print('Apple Dirty Length: ', len(apple_char_foreign))
#print('Apple Dirty: ', apple_char_foreign)

print('Length of New Android: ', len(android_char_cleaned))
print('Length of Dirty Android: ', len(android_char_foreign))
#print('Android Foreign: ', android_char_foreign)
Apple Cleaned Length:  6184
Apple Original File Length:  7198
Apple Dirty Length:  1014
Length of New Android:  9614
Length of Dirty Android:  45

Below is a function that collects the free apps into a new list.

In [87]:
# apple index position of price is 4
# android index position of price is 7

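def get_free_apps(dataset, index):
    free_apps = []
    # dataset[1:] skips a header row. apple_char_cleaned still starts with its header,
    # but android_char_cleaned does not (android_clean was built from android_list[1:]),
    # so for Android this also skips the first real app.
    for row in dataset[1:]:
        price = row[index]
        if '$' in price:
            # paid Google Play prices look like '$4.99'
            price = float(price.replace('$', ''))
        else:
            price = float(price)
        if price == 0:
            free_apps.append(row)
    return free_apps
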
apple_free_apps = get_free_apps(apple_char_cleaned, 4)
android_free_apps = get_free_apps(android_char_cleaned, 7)

print("Apple Free Apps", len(apple_free_apps))
print("Android Free Apps", len(android_free_apps))
Apple Free Apps 3222
Android Free Apps 8863
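Because str.replace returns the string unchanged when '$' is absent, the if/else in get_free_apps can collapse into one conversion. The sketch below (parse_price and get_free are hypothetical names) also leaves header removal to the caller, which sidesteps the question of whether a given list still has a header row:

```python
def parse_price(price):
    """Convert '0', '0.0' or '$4.99' style strings to float."""
    return float(price.replace('$', ''))  # replace is a no-op when there is no '$'

def get_free(rows, price_index):
    """Return the rows whose price is zero. rows must NOT include a header row."""
    return [row for row in rows if parse_price(row[price_index]) == 0]

toy_android = [['App A', '0'], ['App B', '$4.99'], ['App C', '0']]
toy_apple = [['App D', '0.0'], ['App E', '1.99']]
print(len(get_free(toy_android, 1)))  # 2
print(len(get_free(toy_apple, 1)))    # 1
```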
In [16]:
example = '爱奇艺艺'
# four non-ASCII characters exceed the limit of 3, so check_string returns False
results = check_string(example)
print(results)
False

We are looking for the most popular app genres in both the Apple Store and Android Store data sets, because we want to build an app for the audience with the best chance of success on BOTH platforms. We will analyze the genre columns of both data sets to see which genres offer the best chance of success, measured by adoption and usage.

In [93]:
for genre in apple_free_apps[1:2]:
    print(genre[11])
Photo & Video
In [94]:
for genre in android_free_apps[1:2]:
    print(genre[9])
Art & Design

Below is a function that creates a frequency table, expressed as percentages, for a column such as app genre.

In [168]:
def freq_table(dataset, index_value):
    freq_dictionary = {}
    for row in dataset:
        name = row[index_value]
        if name in freq_dictionary:
            freq_dictionary[name] += 1
        else:
            freq_dictionary[name] = 1
            
    dataset_length = len(dataset)  
    for key, value in freq_dictionary.items():
        freq_dictionary[key] = (value/dataset_length * 100)
    return freq_dictionary
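A worked example of the same calculation on a four-row toy data set (freq_table_percent is a hypothetical name; it uses dict.get in place of the if/else counter), followed by the descending sort applied to the real tables in the next cell:

```python
def freq_table_percent(rows, index):
    """Map each value in column `index` to its share of rows, in percent."""
    counts = {}
    for row in rows:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    total = len(rows)
    return {value: count / total * 100 for value, count in counts.items()}

toy = [['a', 'Games'], ['b', 'Games'], ['c', 'Games'], ['d', 'News']]
table = freq_table_percent(toy, 1)
print(table)  # {'Games': 75.0, 'News': 25.0}

# Sort by percentage, largest first.
print(sorted(table.items(), key=lambda item: item[1], reverse=True))
# [('Games', 75.0), ('News', 25.0)]
```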
In [170]:
apple_freq_result = freq_table(apple_free_apps, 11)
android_freq_CATEGORY_result = freq_table(android_free_apps, 1)
android_freq_GENRE_result = freq_table(android_free_apps, 9)

sorted_apple_freq_result = sorted(apple_freq_result.items(), key=lambda x: x[1], reverse=True)
sorted_android_freq_CATEGORY_result = sorted(android_freq_CATEGORY_result.items(), key=lambda x: x[1], reverse=True)
sorted_android_freq_GENRE_result = sorted(android_freq_GENRE_result.items(), key=lambda x: x[1], reverse=True)
print('Apple genres: ', sorted_apple_freq_result, '\n')
print('Android Categories: ', sorted_android_freq_CATEGORY_result, '\n')
print('Android Genres: ', sorted_android_freq_GENRE_result)
#print("Apple Genre Below...")
#for row in sorted_apple_freq_result:
#    print(row[0], ':', row[1])
#print('Android Category Below...')
#for row in sorted_android_freq_CATEGORY_result:
#    print(row[0], ':', row[1])
#print('Android Genre Below...')
#for row in sorted_android_freq_GENRE_result:
#    print(row[0], ':', row[1])
Apple genres:  [('Games', 58.16263190564867), ('Entertainment', 7.883302296710118), ('Photo & Video', 4.9658597144630665), ('Education', 3.662321539416512), ('Social Networking', 3.2898820608317814), ('Shopping', 2.60707635009311), ('Utilities', 2.5139664804469275), ('Sports', 2.1415270018621975), ('Music', 2.0484171322160147), ('Health & Fitness', 2.0173805090006205), ('Productivity', 1.7380509000620732), ('Lifestyle', 1.5828677839851024), ('News', 1.3345747982619491), ('Travel', 1.2414649286157666), ('Finance', 1.1173184357541899), ('Weather', 0.8690254500310366), ('Food & Drink', 0.8069522036002483), ('Reference', 0.5586592178770949), ('Business', 0.5276225946617008), ('Book', 0.4345127250155183), ('Navigation', 0.186219739292365), ('Medical', 0.186219739292365), ('Catalogs', 0.12414649286157665)] 

Android Categories:  [('FAMILY', 18.910075595170937), ('GAME', 9.725826469592688), ('TOOLS', 8.462146000225657), ('BUSINESS', 4.592124562789123), ('LIFESTYLE', 3.9038700214374367), ('PRODUCTIVITY', 3.8925871601038025), ('FINANCE', 3.7007785174320205), ('MEDICAL', 3.5315355974275078), ('SPORTS', 3.396141261423897), ('PERSONALIZATION', 3.317161232088458), ('COMMUNICATION', 3.2381812027530184), ('HEALTH_AND_FITNESS', 3.0802211440821394), ('PHOTOGRAPHY', 2.944826808078529), ('NEWS_AND_MAGAZINES', 2.798149610741284), ('SOCIAL', 2.6627552747376737), ('TRAVEL_AND_LOCAL', 2.335552296062281), ('SHOPPING', 2.245289405393208), ('BOOKS_AND_REFERENCE', 2.1437436533904997), ('DATING', 1.8616721200496444), ('VIDEO_PLAYERS', 1.7939749520478394), ('MAPS_AND_NAVIGATION', 1.399074805370642), ('FOOD_AND_DRINK', 1.241114746699763), ('EDUCATION', 1.1621347173643235), ('ENTERTAINMENT', 0.9590432133589079), ('LIBRARIES_AND_DEMO', 0.9364774906916393), ('AUTO_AND_VEHICLES', 0.9251946293580051), ('HOUSE_AND_HOME', 0.8236488773552973), ('WEATHER', 0.8010831546880289), ('EVENTS', 0.7108202640189552), ('PARENTING', 0.6544059573507841), ('ART_AND_DESIGN', 0.6318402346835158), ('COMICS', 0.6205573733498815), ('BEAUTY', 0.5979916506826132)] 

Android Genres:  [('Tools', 8.450863138892023), ('Entertainment', 6.070179397495204), ('Education', 5.348076272142616), ('Business', 4.592124562789123), ('Lifestyle', 3.8925871601038025), ('Productivity', 3.8925871601038025), ('Finance', 3.7007785174320205), ('Medical', 3.5315355974275078), ('Sports', 3.463838429425702), ('Personalization', 3.317161232088458), ('Communication', 3.2381812027530184), ('Action', 3.102786866749408), ('Health & Fitness', 3.0802211440821394), ('Photography', 2.944826808078529), ('News & Magazines', 2.798149610741284), ('Social', 2.6627552747376737), ('Travel & Local', 2.324269434728647), ('Shopping', 2.245289405393208), ('Books & Reference', 2.1437436533904997), ('Simulation', 2.042197901387792), ('Dating', 1.8616721200496444), ('Arcade', 1.8503892587160102), ('Video Players & Editors', 1.771409229380571), ('Casual', 1.7601263680469368), ('Maps & Navigation', 1.399074805370642), ('Food & Drink', 1.241114746699763), ('Puzzle', 1.128286133363421), ('Racing', 0.9928917973598104), ('Libraries & Demo', 0.9364774906916393), ('Role Playing', 0.9364774906916393), ('Auto & Vehicles', 0.9251946293580051), ('Strategy', 0.9139117680243709), ('House & Home', 0.8236488773552973), ('Weather', 0.8010831546880289), ('Events', 0.7108202640189552), ('Adventure', 0.6769716800180525), ('Comics', 0.6092745120162473), ('Beauty', 0.5979916506826132), ('Art & Design', 0.5867087893489789), ('Parenting', 0.4964458986799052), ('Card', 0.4513144533453684), ('Casino', 0.42874873067809993), ('Trivia', 0.4174658693444658), ('Educational;Education', 0.3949001466771973), ('Board', 0.38361728534356315), ('Educational', 0.37233442400992894), ('Education;Education', 0.33848584000902626), ('Word', 0.25950581067358686), ('Casual;Pretend Play', 0.2369400880063184), ('Music', 0.20309150400541578), ('Entertainment;Music & Video', 0.16924292000451313), ('Puzzle;Brain Games', 0.16924292000451313), ('Racing;Action & Adventure', 0.16924292000451313), ('Casual;Brain Games', 0.1353943360036105), ('Casual;Action & Adventure', 0.1353943360036105), ('Arcade;Action & Adventure', 0.1241114746699763), ('Action;Action & Adventure', 0.10154575200270789), ('Educational;Pretend Play', 0.09026289066907367), ('Entertainment;Brain Games', 0.07898002933543948), ('Simulation;Action & Adventure', 0.07898002933543948), ('Board;Brain Games', 0.07898002933543948), ('Parenting;Education', 0.07898002933543948), ('Art & Design;Creativity', 0.06769716800180525), ('Educational;Brain Games', 0.06769716800180525), ('Casual;Creativity', 0.06769716800180525), ('Parenting;Music & Video', 0.06769716800180525), ('Education;Pretend Play', 0.05641430666817105), ('Education;Creativity', 0.045131445334536835), ('Role Playing;Pretend Play', 0.045131445334536835), ('Education;Brain Games', 0.033848584000902626), ('Entertainment;Creativity', 0.033848584000902626), ('Educational;Creativity', 0.033848584000902626), ('Adventure;Action & Adventure', 0.033848584000902626), ('Role Playing;Action & Adventure', 0.033848584000902626), ('Educational;Action & Adventure', 0.033848584000902626), ('Entertainment;Action & Adventure', 0.033848584000902626), ('Puzzle;Action & Adventure', 0.033848584000902626), ('Education;Action & Adventure', 0.033848584000902626), ('Education;Music & Video', 0.033848584000902626), ('Casual;Education', 0.022565722667268417), ('Music;Music & Video', 0.022565722667268417), ('Simulation;Pretend Play', 0.022565722667268417), ('Puzzle;Creativity', 0.022565722667268417), ('Sports;Action & Adventure', 0.022565722667268417), ('Board;Action & Adventure', 0.022565722667268417), ('Entertainment;Pretend Play', 0.022565722667268417), ('Video Players & Editors;Music & Video', 0.022565722667268417), ('Comics;Creativity', 0.011282861333634209), ('Lifestyle;Pretend Play', 0.011282861333634209), ('Art & Design;Pretend Play', 0.011282861333634209), ('Entertainment;Education', 0.011282861333634209), ('Arcade;Pretend Play', 0.011282861333634209), ('Art & Design;Action & Adventure', 0.011282861333634209), ('Strategy;Action & Adventure', 0.011282861333634209), ('Music & Audio;Music & Video', 0.011282861333634209), ('Health & Fitness;Education', 0.011282861333634209), ('Casual;Music & Video', 0.011282861333634209), ('Travel & Local;Action & Adventure', 0.011282861333634209), ('Tools;Education', 0.011282861333634209), ('Parenting;Brain Games', 0.011282861333634209), ('Video Players & Editors;Creativity', 0.011282861333634209), ('Health & Fitness;Action & Adventure', 0.011282861333634209), ('Trivia;Education', 0.011282861333634209), ('Lifestyle;Education', 0.011282861333634209), ('Card;Action & Adventure', 0.011282861333634209), ('Books & Reference;Education', 0.011282861333634209), ('Simulation;Education', 0.011282861333634209), ('Puzzle;Education', 0.011282861333634209), ('Adventure;Education', 0.011282861333634209), ('Role Playing;Brain Games', 0.011282861333634209), ('Strategy;Education', 0.011282861333634209), ('Racing;Pretend Play', 0.011282861333634209), ('Communication;Creativity', 0.011282861333634209), ('Strategy;Creativity', 0.011282861333634209)]

The function below builds on the frequency table: for each value of one column (for example, genre) it sums the values of another column (for example, number of installs), so it returns the total installs for each genre.

In [122]:
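def apps_most_used(dataset, genre_index, install_index):
    # Sum column install_index for each distinct value of column genre_index,
    # e.g. total installs per genre. '+' and ',' are stripped so Google Play
    # strings such as '10,000+' convert to numbers.
    totals = {}
    for row in dataset:
        name = row[genre_index]
        usage = float(row[install_index].replace('+', '').replace(',', ''))
        totals[name] = totals.get(name, 0) + usage
    return totals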
In [171]:
# For each Apple genre, print the average number of user ratings (rating_count_tot)
def apple_store_genre_usage(dataset):
    genre_dictionary = {}
    for row in dataset:
        name = row[11]
        if name in genre_dictionary:
            genre_dictionary[name] += 1
        else:
            # first time this genre is seen: scan the data set for all apps in it
            genre_dictionary[name] = 1
            n_apps = 0
            total_ratings = 0
            # note: dataset[1:] skips the first row, which is a real app here
            # (apple_free_apps has no header), so that app is left out of its genre's average
            for genre in dataset[1:]:
                if genre[11] == name:
                    total_ratings += float(genre[5])
                    n_apps += 1
            avg_num_user_ratings = total_ratings/n_apps
            print(name, ':', avg_num_user_ratings)
    #return genre_dictionary

# Use the apple_free_apps list
# the index for 'rating_count_tot' is: 5
# the index for the genre (i.e. 'prime_genre') is: 11
apple_store_genre_usage(apple_free_apps)
Social Networking : 43899.514285714286
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
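apple_store_genre_usage rescans the whole data set each time it meets a new genre. A sketch of a single-pass alternative (average_by_group is a hypothetical name) keeps a running sum and count per group and divides at the end; it also includes every row, with no [1:] slice:

```python
def average_by_group(rows, group_index, value_index):
    """Average of column value_index for each distinct value of column group_index."""
    sums = {}
    counts = {}
    for row in rows:
        group = row[group_index]
        sums[group] = sums.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: sums[group] / counts[group] for group in sums}

toy = [['Games', '100'], ['News', '30'], ['Games', '300']]
print(average_by_group(toy, 0, 1))  # {'Games': 200.0, 'News': 30.0}
```

Called as average_by_group(apple_free_apps, 11, 5), it would compute per-genre average rating counts in one pass over the list.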
In [205]:
# android_free_apps is the dataset
# The Category name has an index of 1
# Installs has an index of 5

freq_dictionary = {}
for row in android_free_apps:
    name = row[1]
    if name in freq_dictionary:
        freq_dictionary[name] += 1
    else:
        # first time this category is seen: average the installs of every app in it
        freq_dictionary[name] = 1
        total = 0  # total number of installs for the category
        count = 0  # the number of apps in the category
        for category in android_free_apps:
            if category[1] == name:
                # '10,000+' -> '10000'
                install_number = category[5].replace('+', '').replace(',', '')
                total += float(install_number)
                count += 1
        avg_installs = total/count
        print(name, ':', avg_installs)

ART_AND_DESIGN : 2021626.7857142857
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_MAGAZINES : 9549178.467741935
MAPS_AND_NAVIGATION : 4056941.7741935486
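The Installs column stores bucketed strings such as '10,000+', and the cells here strip '+' and ',' inline. A small helper (installs_to_float is a hypothetical name) makes that conversion explicit. Because '+' marks an open-ended bucket, the result is a lower bound, so the averages above are averages of bucket floors and are best read as rough comparisons:

```python
def installs_to_float(installs):
    """Convert a Google Play Installs string such as '10,000+' to 10000.0.

    The '+' marks an open-ended bucket, so the result is a lower bound,
    not an exact install count.
    """
    return float(installs.replace('+', '').replace(',', ''))

print(installs_to_float('10,000+'))     # 10000.0
print(installs_to_float('5,000,000+'))  # 5000000.0
```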
In [214]:
# The GENRE has an index of 9
# The INSTALLS has an index of 5
freq_dictionary = {}
for row in android_free_apps:
    name = row[9]
    if name in freq_dictionary:
        freq_dictionary[name] += 1
    else:
        freq_dictionary[name] = 1
        total = 0
        install_count = 0
        count = 0
        for genre in android_free_apps:
            if genre[9] == name:
                install_count = genre[5].replace('+', '') 
                install_count = float(install_count.replace(',', ''))
                count += 1
                total += install_count
        print(name, ':', total/count)

    
Art & Design : 2163482.6923076925
Art & Design;Creativity : 285000.0
Auto & Vehicles : 647317.8170731707
Beauty : 513151.88679245283
Books & Reference : 8767811.894736841
Business : 1712290.1474201474
Comics : 831873.1481481482
Comics;Creativity : 50000.0
Communication : 38456119.167247385
Dating : 854028.8303030303
Education : 550185.4430379746
Education;Creativity : 2875000.0
Education;Education : 4759517.0
Education;Pretend Play : 1800000.0
Education;Brain Games : 5333333.333333333
Entertainment : 5602792.775092937
Entertainment;Brain Games : 3314285.714285714
Entertainment;Creativity : 4000000.0
Entertainment;Music & Video : 6413333.333333333
Events : 253542.22222222222
Finance : 1387692.475609756
Food & Drink : 1924897.7363636363
Health & Fitness : 4188821.9853479853
House & Home : 1331540.5616438356
Libraries & Demo : 638503.734939759
Lifestyle : 1412998.3449275363
Lifestyle;Pretend Play : 10000000.0
Card : 3815462.5
Arcade : 22888365.48780488
Puzzle : 8302861.91
Racing : 15910645.681818182
Sports : 4596842.615635179
Casual : 19569221.602564104
Simulation : 3475484.08839779
Adventure : 4922785.333333333
Trivia : 3475712.7027027025
Action : 12603588.872727273
Word : 9094458.695652174
Role Playing : 3965645.421686747
Strategy : 11199902.530864198
Board : 4759209.117647059
Music : 9445583.333333334
Action;Action & Adventure : 5888888.888888889
Casual;Brain Games : 1425916.6666666667
Educational;Creativity : 2333333.3333333335
Puzzle;Brain Games : 9280666.666666666
Educational;Education : 1737143.142857143
Casual;Pretend Play : 6957142.857142857
Educational;Brain Games : 4433333.333333333
Art & Design;Pretend Play : 500000.0
Educational;Pretend Play : 9375000.0
Entertainment;Education : 1000000.0
Casual;Education : 1000000.0
Casual;Creativity : 5333333.333333333
Casual;Action & Adventure : 12916666.666666666
Music;Music & Video : 5050000.0
Arcade;Pretend Play : 1000000.0
Adventure;Action & Adventure : 35333333.333333336
Role Playing;Action & Adventure : 7000000.0
Simulation;Pretend Play : 550000.0
Puzzle;Creativity : 750000.0
Simulation;Action & Adventure : 4857142.857142857
Racing;Action & Adventure : 8816666.666666666
Sports;Action & Adventure : 5050000.0
Educational;Action & Adventure : 17016666.666666668
Arcade;Action & Adventure : 3190909.1818181816
Entertainment;Action & Adventure : 2333333.3333333335
Art & Design;Action & Adventure : 100000.0
Puzzle;Action & Adventure : 18366666.666666668
Education;Action & Adventure : 1000000.0
Strategy;Action & Adventure : 1000000.0
Music & Audio;Music & Video : 500000.0
Health & Fitness;Education : 100000.0
Board;Action & Adventure : 3000000.0
Board;Brain Games : 407142.85714285716
Casual;Music & Video : 10000000.0
Education;Music & Video : 2033333.3333333333
Role Playing;Pretend Play : 5275000.0
Entertainment;Pretend Play : 3000000.0
Medical : 120550.61980830671
Social : 23253652.127118643
Shopping : 7036877.311557789
Photography : 17840110.40229885
Travel & Local : 14051476.145631067
Travel & Local;Action & Adventure : 100000.0
Tools : 10802461.246995995
Tools;Education : 10000000.0
Personalization : 5201482.6122448975
Productivity : 16787331.344927534
Parenting : 467977.5
Parenting;Music & Video : 1118333.3333333333
Parenting;Education : 452857.14285714284
Parenting;Brain Games : 1000000.0
Weather : 5074486.197183099
Video Players & Editors : 24947335.796178345
Video Players & Editors;Music & Video : 7500000.0
Video Players & Editors;Creativity : 5000000.0
News & Magazines : 9549178.467741935
Maps & Navigation : 4056941.7741935486
Health & Fitness;Action & Adventure : 1000000.0
Educational : 411184.8484848485
Casino : 3427910.5263157897
Trivia;Education : 100.0
Lifestyle;Education : 100000.0
Card;Action & Adventure : 10000000.0
Books & Reference;Education : 1000.0
Simulation;Education : 500.0
Puzzle;Education : 100000.0
Adventure;Education : 10000000.0
Role Playing;Brain Games : 10000000.0
Strategy;Education : 500000.0
Racing;Pretend Play : 1000000.0
Communication;Creativity : 500000.0
Strategy;Creativity : 1000000.0
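The averages above are printed in the order each category or genre first appears, which makes the most-installed ones hard to spot. Assuming the averages were collected into a dictionary instead of printed, a sketch that ranks them (top_n is a hypothetical name; the sample values are copied from the genre output above):

```python
def top_n(averages, n):
    """Return the n (name, value) pairs with the largest values."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]

# A few values copied from the genre output above, for illustration.
sample = {'Communication': 38456119.167247385,
          'Video Players & Editors': 24947335.796178345,
          'Social': 23253652.127118643,
          'Medical': 120550.61980830671}
print(top_n(sample, 2))
# [('Communication', 38456119.167247385), ('Video Players & Editors', 24947335.796178345)]
```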