Profitable App Profiles for the App Store and Google Play Markets

The aim of this project is to find mobile app profiles that could be profitable in both the App Store and Google Play markets. We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that the number of users determines our revenue for any given app: the more users who see and engage with the ads, the better. Our goal for this project is to analyze data to help our developers understand what types of apps are likely to attract more users.

Collect And Analyze Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Source: Statista (image: py1m8_statista.png)

Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first try to see if we can find any relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our goals:

  1. A dataset containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from this link.
  2. A dataset containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this link.

Define the function open_data() for opening a CSV file.

Separate the header row from the rest of the data.

In [1]:
from csv import reader

def open_data(file_name):
    # Read a CSV file and return its rows as a list of lists
    with open(file_name, encoding="utf8") as open_file:
        return list(reader(open_file))

# Read each file once, then separate the header row from the data rows
apps_all = open_data('AppleStore.csv')
apps_header, apps_data = apps_all[0], apps_all[1:]
google_all = open_data('googleplaystore.csv')
google_header, google_data = google_all[0], google_all[1:]

Define the function explore_data() to print a slice of a data set in a readable way.

An optional argument also prints the number of rows and columns of the data.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
In [3]:
print(apps_header)
print('\n')
explore_data(apps_data, 1, 3, True)
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 7197
Number of columns: 17

The columns that might be useful for this analysis are track_name, currency, price, rating_count_tot, user_rating, and prime_genre. Definitions of the columns can be found in the data set documentation.

In [4]:
print(google_header)
print('\n')
explore_data(google_data, 1, 3, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13

The columns that might be useful for the purpose of this analysis are: App, Category, Rating, Reviews, Installs, Type, Price, Content Rating, Genres

Data Cleaning

The Google Play dataset has a dedicated discussion section, and we can see that one of the discussions describes an error for row 10472.

Print the row at that index to check if it's incorrect by comparing it to a correct row.

In [5]:
print(google_header)
print('\n')
error_row = google_data[10472]
print(error_row)
print('\n')
right_row = google_data[0]
print(right_row)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

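The Category value is missing in row 10472, which shifts every following value one column to the left: the Rating column holds 19, and the row has only 12 values instead of 13.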
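
A shifted row like this one is shorter than the header, so a simple length check can flag it. The following is a minimal sketch on a made-up three-column data set; find_malformed_rows is a hypothetical helper and not part of the original notebook.

```python
# Hypothetical miniature data set; the notebook itself uses google_header and google_data.
header = ['App', 'Category', 'Rating']
rows = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Photo Frame', '1.9'],  # Category missing, so the row is one column short
    ['Sketch', 'ART_AND_DESIGN', '4.5'],
]


def find_malformed_rows(header, rows):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(header)
    return [index for index, row in enumerate(rows) if len(row) != expected]


bad_indexes = find_malformed_rows(header, rows)
print(bad_indexes)  # [1]
```

A check like this only catches rows with a missing or extra value; a row with the right number of values but wrong content would still need a manual look.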

We remove the row with the del statement and print the length of the list before and after, to make sure the del statement runs only once.

In [6]:
print(len(google_data))
del google_data[10472]
print(len(google_data))
10841
10840

Data Cleaning: Removing Duplicates

If you explore the Google Play data set long enough or look at the discussions section, you'll notice that some apps have duplicate entries.

In [7]:
for row in google_data:
    name = row[0]
    if name == 'Instagram':
        print(row) 
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

For instance, Instagram appears four times in the list.

To identify the duplicate apps, we create two empty lists, one for duplicate apps and one for unique apps. We then loop through the Google Play data and add an app's name to the duplicate list if it is already in the unique list; otherwise, we add it to the unique list.

In [8]:
duplicate_apps = []
unique_apps = []

for row in google_data:
    name = row[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Example of duplicate apps:','\n', duplicate_apps[:10])
Number of duplicate apps: 1181


Example of duplicate apps: 
 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
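
The membership test `name in unique_apps` scans a list on every iteration, which gets slower as the list grows. As an alternative sketch, collections.Counter from the standard library tallies the names by hashing instead. The names list below is made up for illustration and stands in for the first column of google_data.

```python
from collections import Counter

# Hypothetical app names standing in for the first column of google_data.
names = ['Box', 'Slack', 'Box', 'Instagram', 'Box', 'Slack']

counts = Counter(names)

# Every occurrence after the first counts as a duplicate entry.
n_duplicates = sum(count - 1 for count in counts.values())
duplicated_names = sorted(name for name, count in counts.items() if count > 1)

print(n_duplicates)      # 3
print(duplicated_names)  # ['Box', 'Slack']
```

The duplicate count follows the same definition as the loop above: an app that appears three times contributes two duplicates.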

Since we don't want to count any app more than once, we have to remove the duplicate entries and keep only one entry per app. If we look at the fourth column, 'Reviews', we see that the review counts differ between duplicate entries, which suggests that the data was collected at different times. Rather than removing duplicates at random, we keep the entry with the highest number of reviews, since it is likely the most recent.

To remove the duplicates, we create a dictionary in which each key is a unique app name and the corresponding value is the highest number of reviews for that app. We then use the information stored in the dictionary to create a new data set with only one entry per app.

In [9]:
reviews_max = {}

for row in google_data:
    name = row[0]
    n_reviews = float(row[3])
    
    if name in reviews_max and reviews_max[name]<n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews    
In [10]:
print(len(reviews_max))
9659

The length of reviews_max matches the expected length (10840 - 1181 = 9659).

In [11]:
android_clean = []
already_added = []

for row in google_data:
    name = row[0]
    n_reviews = float(row[3])
    
    if (reviews_max[name] == n_reviews ) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
        

print(len(android_clean))  # double-check the length of the list
9659

To confirm, explore the new data

In [12]:
explore_data(android_clean, 0, 5, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13
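
The two-pass approach above (first build reviews_max, then filter the rows) could also be sketched as a single pass that stores the best row per app directly. This is an illustrative variant on made-up rows, not the notebook's method; keep_most_reviewed is a hypothetical name.

```python
# Hypothetical rows shaped like google_data: name first, review count in column 3.
rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
]


def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earlier row when two rows tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


deduplicated = keep_most_reviewed(rows)
for row in deduplicated:
    print(row)
```

One difference to keep in mind: the result is ordered by the first appearance of each app name, so the row order can differ from android_clean even though the same rows are kept.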

Data Cleaning: Removing Non-English Apps

We'd like to analyze only the apps that are designed for an English-speaking audience. However, if we explore the data long enough, we'll find that both data sets contain apps with names that suggest they are not designed for an English-speaking audience. We're not interested in keeping these apps, so we'll remove them. One way to do this is to remove each app with a name containing a symbol that isn't commonly used in English text.

In [13]:
print('examples of other text:')
print('\n')
print(apps_data[814][2])
print(apps_data[6734][2])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])
examples of other text:


搜狐新闻—新闻热点资讯掌上阅读软件
エレメンタル ファンタジー - 高精細3DアクションRPG


中国語 AQリスニング
لعبة تقدر تربح DZ

The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters or not. If the number is equal to or less than 127, then the character belongs to the set of common English characters.
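
As a quick illustration of these numbers, the built-in ord() function returns the code point of a single character. The sample characters below are chosen for illustration only.

```python
# ord() returns the Unicode code point of a single character.
sample_characters = ['a', 'A', '5', '+', '爱']
code_points = {character: ord(character) for character in sample_characters}

for character, code_point in code_points.items():
    # Code points up to 127 belong to the ASCII range.
    print(character, code_point, code_point <= 127)
```

The first four characters map to 97, 65, 53 and 43, all within the ASCII range, while '爱' maps to 29233 and falls outside it. Python 3.7 and later also provide str.isascii(), which applies the same range check to a whole string.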

In [14]:
def eng_text(string):
    for character in string:
        if ord(character)>127:
            return False
    return True

#double checking the function
print(eng_text('Instagram'))
print(eng_text('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(eng_text('Docs To Go™ Free Office Suite'))
print(eng_text('Instachat 😜'))
True
False
False
False

Some English app names use emojis or other symbols that fall outside the ASCII range, as shown above.

In [15]:
print(ord('™'))
print(ord('😜'))
8482
128540

If we're going to use the function we've created, we'll lose useful data since many English apps will be incorrectly labeled as non-English. To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range.

In [16]:
def eng_text(string):
    wrong_character= 0
    for character in string:
        if ord(character) > 127:
            wrong_character +=1
    if wrong_character > 3:
        return False
    return True
In [17]:
print(eng_text('Docs To Go™ Free Office Suite'))
print(eng_text('Instachat 😜'))
print(eng_text('爱奇艺PPS -《欢乐颂2》电视剧热播'))
True
True
False

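The function now handles these test cases correctly, although a non-English name with three or fewer non-ASCII characters would still pass the filter.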

Data Cleaning: Free Apps

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our datasets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

Before isolating the free apps, we first use eng_text() to filter the non-English apps out of both data sets.

In [18]:
updated_app_clean = []
updated_android_clean = []

for row in apps_data:
    name = row[2]
    if eng_text(name):
        updated_app_clean.append(row)
        
for row in android_clean:
    name = row[0]
    if eng_text(name):
        updated_android_clean.append(row)
In [19]:
explore_data(updated_app_clean, 0, 3, True)
print('\n')
explore_data(updated_android_clean, 0, 3, True)
['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 6183
Number of columns: 17


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13

There are 6183 iOS apps and 9614 Android apps left in the updated list.

Isolating the free apps will be our last step in the data cleaning process.

In [20]:
free_app = []
free_android = []
for row in updated_app_clean:
    price = float(row[5])
    if price == 0:
        free_app.append(row)
        
for row in updated_android_clean:
    price = row[7]
    if price == '0':
        free_android.append(row)
        
In [21]:
print(len(free_app))
print(len(free_android))
3222
8864

After keeping only the free apps, we are left with 3222 iOS apps and 8864 Android apps.
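
The Android loop above compares the price to the string '0', while the iOS loop converts it to a float first. A single helper could treat both data sets the same way. The sketch below assumes that paid Google Play prices are written with a leading dollar sign (for example '$4.99'), which is not shown in the excerpt above; parse_price and is_free are hypothetical names.

```python
def parse_price(price_text):
    """Convert a price string such as '0', '3.99' or '$4.99' to a float.

    Assumption: the only non-numeric character is an optional leading '$'.
    """
    return float(price_text.strip().lstrip('$'))


def is_free(price_text):
    """Return True when the parsed price is exactly zero."""
    return parse_price(price_text) == 0.0


print(is_free('0'))      # True
print(is_free('0.00'))   # True
print(is_free('$4.99'))  # False
```

A value that is not a price at all would raise a ValueError here, which can be useful as an early warning about malformed rows.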

Analysis: Most Common Apps by Genre

We looked at our validation strategy for an app idea, and then we inspected the data sets to identify the columns that might be useful for determining the most common genres in each market. Our conclusion was that we'll need to build a frequency table for the prime_genre column of the App Store data set, and for the Genres and Category columns of the Google Play data set.

We'll build two functions we can use to analyze the frequency tables:

  1. One function to generate frequency tables that show percentages

  2. Another function we can use to display the percentages in a descending order

In [22]:
def freq_table(dataset, index):
    # Count how many rows hold each value in the given column
    table = {}
    total = 0
    for row in dataset:
        total += 1
        name = row[index]
        if name in table:
            table[name] += 1
        else:
            table[name] = 1

    # Convert the counts to percentages of the total number of rows
    table_percentage = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentage[key] = percentage

    return table_percentage


def display_table(dataset, index):
    # Print the frequency table in descending order of percentage
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
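
To make the percentage calculation concrete, here is a small worked example on four made-up rows. It is a compact sketch using collections.Counter rather than the functions above; freq_table_percent is a hypothetical name.

```python
from collections import Counter

# Hypothetical rows; only the genre column (index 1) matters here.
rows = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Weather'],
    ['App D', 'Games'],
]


def freq_table_percent(dataset, index):
    """Map each value in the given column to its share of rows, in percent."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


table = freq_table_percent(rows, 1)
# Sort by percentage, largest first, before printing.
for genre, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', percentage)
```

With three of the four rows in Games, the table holds 75.0 for Games and 25.0 for Weather.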

We will use display_table() to display the frequency table of the columns prime_genre, Genres, and Category.

App Store

In [23]:
display_table(free_app,12)
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665

Among the free English apps on the App Store, 58% are games, followed by Entertainment with 8% and Photo & Video with 5%. Catalogs has the fewest apps (0.1%), followed by Medical and Navigation (0.2% each). Overall, more apps are designed for entertainment than for practical purposes. These percentages only reflect the number of apps available, not the number of users.

Google Play

In [24]:
display_table(free_android,1)
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0.6430505415162455
COMICS : 0.6204873646209386
BEAUTY : 0.5979241877256317

Among the free Google Play apps, 19% fall into Family, followed by Game with 10% and Tools with 8%. Beauty and Comics have the fewest apps, both with 0.6%. Google Play shows a better balance across categories; only Family apps (largely apps for kids) stand out from the rest.

In [25]:
display_table(free_android,9)
Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075812
Strategy : 0.9138086642599278
House & Home : 0.8235559566787004
Weather : 0.8009927797833934
Events : 0.7107400722021661
Adventure : 0.6768953068592057
Comics : 0.6092057761732852
Beauty : 0.5979241877256317
Art & Design : 0.5979241877256317
Parenting : 0.4963898916967509
Card : 0.45126353790613716
Casino : 0.42870036101083037
Trivia : 0.41741877256317694
Educational;Education : 0.39485559566787
Board : 0.3835740072202166
Educational : 0.3722924187725632
Education;Education : 0.33844765342960287
Word : 0.2594765342960289
Casual;Pretend Play : 0.236913357400722
Music : 0.2030685920577617
Racing;Action & Adventure : 0.16922382671480143
Puzzle;Brain Games : 0.16922382671480143
Entertainment;Music & Video : 0.16922382671480143
Casual;Brain Games : 0.13537906137184114
Casual;Action & Adventure : 0.13537906137184114
Arcade;Action & Adventure : 0.12409747292418773
Action;Action & Adventure : 0.10153429602888085
Educational;Pretend Play : 0.09025270758122744
Simulation;Action & Adventure : 0.078971119133574
Parenting;Education : 0.078971119133574
Entertainment;Brain Games : 0.078971119133574
Board;Brain Games : 0.078971119133574
Parenting;Music & Video : 0.06768953068592057
Educational;Brain Games : 0.06768953068592057
Casual;Creativity : 0.06768953068592057
Art & Design;Creativity : 0.06768953068592057
Education;Pretend Play : 0.056407942238267145
Role Playing;Pretend Play : 0.04512635379061372
Education;Creativity : 0.04512635379061372
Role Playing;Action & Adventure : 0.033844765342960284
Puzzle;Action & Adventure : 0.033844765342960284
Entertainment;Creativity : 0.033844765342960284
Entertainment;Action & Adventure : 0.033844765342960284
Educational;Creativity : 0.033844765342960284
Educational;Action & Adventure : 0.033844765342960284
Education;Music & Video : 0.033844765342960284
Education;Brain Games : 0.033844765342960284
Education;Action & Adventure : 0.033844765342960284
Adventure;Action & Adventure : 0.033844765342960284
Video Players & Editors;Music & Video : 0.02256317689530686
Sports;Action & Adventure : 0.02256317689530686
Simulation;Pretend Play : 0.02256317689530686
Puzzle;Creativity : 0.02256317689530686
Music;Music & Video : 0.02256317689530686
Entertainment;Pretend Play : 0.02256317689530686
Casual;Education : 0.02256317689530686
Board;Action & Adventure : 0.02256317689530686
Video Players & Editors;Creativity : 0.01128158844765343
Trivia;Education : 0.01128158844765343
Travel & Local;Action & Adventure : 0.01128158844765343
Tools;Education : 0.01128158844765343
Strategy;Education : 0.01128158844765343
Strategy;Creativity : 0.01128158844765343
Strategy;Action & Adventure : 0.01128158844765343
Simulation;Education : 0.01128158844765343
Role Playing;Brain Games : 0.01128158844765343
Racing;Pretend Play : 0.01128158844765343
Puzzle;Education : 0.01128158844765343
Parenting;Brain Games : 0.01128158844765343
Music & Audio;Music & Video : 0.01128158844765343
Lifestyle;Pretend Play : 0.01128158844765343
Lifestyle;Education : 0.01128158844765343
Health & Fitness;Education : 0.01128158844765343
Health & Fitness;Action & Adventure : 0.01128158844765343
Entertainment;Education : 0.01128158844765343
Communication;Creativity : 0.01128158844765343
Comics;Creativity : 0.01128158844765343
Casual;Music & Video : 0.01128158844765343
Card;Action & Adventure : 0.01128158844765343
Books & Reference;Education : 0.01128158844765343
Art & Design;Pretend Play : 0.01128158844765343
Art & Design;Action & Adventure : 0.01128158844765343
Arcade;Pretend Play : 0.01128158844765343
Adventure;Education : 0.01128158844765343

There is no clear relationship between the Genres and Category columns; however, Genres is more granular and has many more distinct values.

In conclusion, most App Store apps are designed for fun, while Google Play shows a good balance between practical or educational apps and apps designed for fun.

App Store

The frequency tables we analyzed above showed us that apps designed for fun dominate the App Store, while Google Play shows a more balanced landscape of both practical and fun apps. Now, we'd like to determine the kind of apps with the most users.

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.
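
For the Google Play side, the Installs values are strings such as '500,000+' (see the sample rows printed earlier), so they would need to be converted to numbers before averaging. A minimal sketch, with parse_installs as a hypothetical helper name:

```python
def parse_installs(installs_text):
    """Convert an Installs string such as '1,000,000+' to a float."""
    return float(installs_text.replace(',', '').replace('+', ''))


print(parse_installs('500,000+'))        # 500000.0
print(parse_installs('1,000,000,000+'))  # 1000000000.0
```

Because the values are open-ended buckets, the converted number is only a lower bound on the real install count.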

Let's start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to do the following:

  1. Isolate the apps of each genre

  2. Add up the user ratings for the apps of that genre

  3. Divide the sum by the number of apps belonging to that genre (not by the total number of apps)

In [26]:
prime_genre = freq_table(free_app, 12)

for genre in prime_genre:
    total = 0       # sum of user ratings for this genre
    len_genre = 0   # number of apps in this genre
    for app in free_app:
        genre_app = app[12]
        if genre_app == genre:
            total += float(app[6])
            len_genre += 1

    # Average number of user ratings per app in this genre
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
Productivity : 21028.410714285714
Weather : 52279.892857142855
Shopping : 26919.690476190477
Reference : 74942.11111111111
Finance : 31467.944444444445
Music : 57326.530303030304
Utilities : 18684.456790123455
Travel : 28243.8
Social Networking : 71548.34905660378
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Games : 22788.6696905016
Food & Drink : 33333.92307692308
News : 21248.023255813954
Book : 39758.5
Photo & Video : 28441.54375
Entertainment : 14029.830708661417
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Navigation : 86090.33333333333
Medical : 612.0
Catalogs : 4004.0
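
The loop above rescans the whole data set once per genre and prints the genres in no particular order. As an alternative sketch, the sums and counts can be accumulated in a single pass and the averages printed in descending order. The rows below are a made-up subset, and average_by_group is a hypothetical name.

```python
# Hypothetical rows: genre at index 0 and total rating count at index 1.
# In the notebook these would be columns 12 and 6 of free_app.
rows = [
    ['Navigation', '345046'],
    ['Navigation', '154911'],
    ['Weather', '188583'],
    ['Productivity', '161065'],
]


def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the value column} using one pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


averages = average_by_group(rows, 0, 1)
# Sort by average, largest first, so the top genres appear at the top.
for genre, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', average)
```

Sorting the output makes it easier to see at a glance which genres lead.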

Navigation apps have the highest average number of user ratings.

In [27]:
for apps in free_app:
    if apps[12] == 'Navigation':
        print(apps[2],':',apps[6])
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Geocaching® : 12811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS – Car Navigation & Offline Maps : 3582
Google Maps - Navigation & Transit : 154911

Waze and Google Maps have about half a million user ratings between them, so the Navigation average is driven almost entirely by these two apps. It is hard to compete with established companies that already dominate the field. The next category that could be interesting is Reference.
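
To see how strongly two apps can pull an average, the Navigation figures printed above can be compared with and without them. The sketch uses the standard library's statistics module; the 100,000 cutoff is an arbitrary threshold chosen for illustration.

```python
from statistics import mean, median

# Rating counts copied from the Navigation listing above.
navigation_ratings = {
    'Waze - GPS Navigation, Maps & Real-time Traffic': 345046,
    'Geocaching®': 12811,
    'ImmobilienScout24: Real Estate Search in Germany': 187,
    'Railway Route Search': 5,
    'CoPilot GPS – Car Navigation & Offline Maps': 3582,
    'Google Maps - Navigation & Transit': 154911,
}

all_counts = list(navigation_ratings.values())
# Assumption: apps with 100,000 or more ratings are treated as dominant.
without_giants = [count for count in all_counts if count < 100000]

print(mean(all_counts))      # about 86090.33, matching the genre average above
print(median(all_counts))    # 8196.5
print(mean(without_giants))  # 4146.25
```

The median and the trimmed mean are both far below the genre average, which suggests that the typical Navigation app has far fewer ratings than the average implies.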

In [28]:
for apps in free_app:
    if apps[12] == 'Reference':
        print(apps[2],':',apps[6])
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8

This category is dominated by religious references and dictionaries, with the Bible app alone having almost 1,000,000 user ratings. That does not show much potential for us: it is hard to stand out as a library of religious references or as a translator.

In [29]:
for apps in free_app:
    if apps[12] == 'Photo & Video':
        print(apps[2],':',apps[6])
iSwap Faces LITE : 39722
Shutterfly: Prints, Photo Books, Cards Made Easy : 51427
Epson iPrint : 2838
Photo Transfer App - Easy backup of photos+videos : 15654
Instagram : 2161558
Splice - Video Editor + Movie Maker by GoPro : 28189
Meitu : 6478
Digital Domain : 102
Snapseed : 8683
Kwai - Share your video moments : 668
Photo Lab: Picture Editor, effects & fun face app : 34585
Camera360 - Selfie Filter Camera, Photo Editor : 16729
Snapchat : 323905
Pic Collage - Picture Editor & Photo Collage Maker : 123433
FotoRus -Camera & Photo Editor & Pic Collage Maker : 32558
Meitu HD : 2150
Perfect Image - Pic Collage Maker, Add Text to Photo, Cool Picture Editor : 1646
Visage makeup editor plus photo teeth whitener : 5767
NightShooting : 9
Pic Jointer – Photo Collage, Camera Effects Editor : 51330
Flipagram : 79905
Bazaart Photo Editor Pro and Picture Collage Maker : 4909
LINE Camera - Photo editor, Animated Stamp, Filter : 3978
You Doodle - draw on photos & pictures, add text : 8520
SuperPhoto - Photo Effects & Filters : 1952
FreePrints – Photos Delivered : 26060
PIP Camera-Selfie Cam&Pic Collage&Photo Editor : 8454
PopCam Photo : 160
Pixlr - Photo Collages, Effects, Overlays, Filters : 2099
Photo Editor by Aviary : 39501
Photo Collage Maker & Photo Editor - Live Collage : 93781
KODAK Kiosk Connect App : 3711
InstaBoard  for Instagram - photos & videos repost : 1571
Over— Edit Photos, Add Text & Captions to Pictures : 16221
Photo Grid - photo collage maker & photo editor : 40531
YouTube - Watch Videos, Music, and Live Streams : 278166
Photo Editor- : 9095
Cymera - Photo & Beauty Editor & Collage : 523
Capture - Control Your GoPro Camera - Share Video : 6542
Printicular Print Photos - 1 Hour Pickup : 3909
InstaSize: Photo Editor, Picture Effects & Collage : 15605
Retrica - Selfie Camera with Filter, Sticker & GIF : 11021
PicsArt Photo Studio: Collage Maker & Pic Editor : 29078
VSCO : 11174
BeautyCam - AR Carnie selfie : 2082
Vine Camera : 90355
InstaBeauty -Camera&Photo Editor&Pic Collage Maker : 4818
MOLDIV - Photo Editor, Collage & Beauty Camera : 39501
InstaMag - Free Pic and Photo Collage Maker : 16221
Prime Photos from Amazon : 10511
BeautyPlus - Selfie Camera for a Beautiful Image : 7503
Kanvas - Express Yourself : 2177
Canon PRINT Inkjet/SELPHY : 689
RealTimes: Video Maker : 1274
Kiosk Photo Transfer by Fujifilm : 58
Photo Quilt - Auto Collage Maker : 599
Squaready for Video - Convert Rectangle Movie Clip into Square Shape for Instagram : 778
SlideStory - Create a slideshow movie and a snap video : 220
Quik – GoPro Video Editor to edit clips with music : 28654
Pitu : 968
FACIE : 514
VivaVideo - Best Video Editor & Photo Movie Maker : 10618
MixChannel : 6
YouCam Perfect - Photo & Selfie Editor : 4293
FilmStory - For All Your Video Editing Needs : 66
Mixgram - Picture Collage Maker - Pic Photo Editor : 54282
Rookie Cam - Photo Editor & Filter Camera : 33921
Adobe Photoshop Lightroom for iPad : 2005
musical.ly - your video social network : 105429
Funimate video editor: add cool effects to videos : 123268
Meipai : 1190
in-capturing moments in life : 16
Ghost Lens+Scary Photo Video Edit&Collage Maker : 18316
Fyuse - 3D Photos : 4126
YouCam Makeup: Magic Makeup Selfie Cam : 14188
PHHHOTO - Look Alive : 4280
Photo Editing Effects & Collage Maker - Effectshop : 422
Adobe Photoshop Lightroom for iPhone : 1494
Candy Camera : 397
Lomotif Music Video Editor - Add Music & Effects! : 3507
Adobe Photoshop Mix - Cut out, combine, create : 5253
Pro Editor - Video Maker for FaceBook & Youtube : 3668
Instant X - Take instant-camera-like photo with double exposure and bulb mode : 0
Canva - Graphic Design & Photo Editing : 9114
EOPAN : 0
B612 - Trendy Filters, Selfiegenic Camera : 2275
Video & TV Cast for Chromecast: Best Browser to cast and stream webvideos and local videos on TV & Displays : 5676
Homido 360 VR player : 100
StageCameraHD : 0
Retouch Vogue - Facetune Wrinkles & Pimples Makeup : 2235
Felt: Birthday & Greeting Cards & Thank You Card : 1724
MeiCam -  Video Production Master : 0
Pic-it Collage - Photo Collage Maker and Editor : 415
Color Pop Effects - Photo Editor & Picture Editing : 45320
Lumyer - augmented reality camera effects : 3896
Polaroid Print App - ZIP : 631
Filterra – Photo Editor, Effects for Pictures : 14744
Cool Wallpapers for Pokemon : 3694
LOL Movie: Change your face + voice! : 849
Bestie-Beauty Camera 360 & Portrait Selfie Editor : 1035
Google Photos - unlimited photo and video storage : 88742
Layout from Instagram : 12616
MakeupPlus - Natural, Professional Makeup Looks : 3987
GIPHY. The GIF Search Engine for All the GIFs : 2069
Face Swap App- Funny Face Changer Photo Effects : 11977
Polarr Photo Editor - Photo Editing Tools for All : 2246
Moments - private albums with friends and family : 11955
Patternator Pattern Maker Backgrounds & Wallpapers : 2092
Triller - Music Video & Film Maker : 25683
InShot Video Editor Music, No Crop, Cut : 12779
FreeVRPlayer : 134
Video Smith - A Powerful video editing tool set : 1
SelfieCity : 252
A Color Story : 2436
C CHANNEL -Watch tips & tricks videos for girls : 21
SNOW - Selfie, Motion sticker, Fun camera : 1115
SW/NG - Living Photos. Memories that Swing. : 222
Camcorder - Record VHS Home Videos : 830
POTO - Photo Collage Maker : 1149
lollicam - photo, video, and selfie camera : 51
Boomerang from Instagram : 2373
MuseCam - Edit Photos & Manual Camera : 4267
Simple Camera - Fast Minimal Design : 3
Pictalive for Live Photos - Create from videos : 0
intoLive - Custom Live Photos wallpaper maker : 938
Microsoft Selfie : 375
MSQRD — Live Filters & Face Swap for Video Selfies : 12982
Video speed editor - VBooster : 1
Easy Save - Repost your Instagram Photos & Videos : 2159
Baby Story - Pregnancy Pics Baby Milestones Photo : 6700
Solo Selfie : 1799
BlurEffect-Blur Photo & Video, Hide Face : 0
Foodie - Delicious Camera for Food : 144
Collageable - Photo Collage Maker, Pic Grid Free : 5112
Best 9 for Instagram : 88
SwapperFace - Face Swap Free, Live Mask Effects : 2
Pikazo – AI art that YOU control : 56
Color Pop Free - Selective Color Splash Effects and Black & White Photography Editor : 352
Anime Power FX : 807
GIFYme - Create video loops and gifs with amazing filters for Whatsapp and Instagram : 65
April - Layouts, Photo Collage, and Poster Maker : 165
CelebrityDiagnosis! : 0
Prisma: Photo Editor, Art Filters Pic Effects : 15060
Microsoft Pix Camera : 678
InstaSave for Instagram - Download & Repost your own Videos & Photos for Free : 243
GoSnaps - Share Screenshots for Pokémon GO : 12
YouCam Fun - Live Selfie Video Filters : 2522
Confetti - Geofilter Design Maker for Snapchat : 120
Artisto – Video and Photo Editor with Art Filters : 12963
LOOKS - Real Makeup Camera : 25
VR Tube - Virtual Reality 360 Video Player : 4142
Philm-Video&Photo Editor,REAL-TIME Magic Filter : 103
Facetune 2 : 1009
LINE Moments - Capture Your Fun Moments : 1
Emojil - original emoji stamp, decoration camera : 0
PhotoScan - scanner by Google Photos : 1421
CATCHY Photos-Easter Bunny, Tooth Fairy and more.. : 228
VR Video World - Virtual Reality : 88
Everfilter - transform your photos into artworks : 15
camera for filter : 0

Photo & Video shows some potential even though it is not the most popular category. Unlike categories such as Food & Drink or Finance, Photo & Video does not require extra domain expertise. In addition, apart from YouTube and Instagram, the category is not dominated by a few big apps the way Navigation is. Apps in this category also tend to have high usage time. There is good potential if we build an app that provides an array of functions to scan, edit, draw on, and share pictures.

Google Play

We have data about the number of installs for the Google Play market, so we should be able to get a clearer picture of genre popularity. Let's start by looking at how the install values are distributed.

In [30]:
display_table(free_android,5)
1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343

The install numbers don't seem precise enough: we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.). We don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes; we only want to find out which app genres attract the most users. We'll therefore treat each value as its lower bound, so 100,000+ counts as 100,000 installs.
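
The conversion described above can be sketched as a small helper. This is only an illustration: `parse_installs` is a hypothetical name that does not appear in the notebook, and it assumes every value consists of digits, commas, and an optional trailing plus sign, as in the table above.

```python
def parse_installs(raw):
    """Return the lower bound of an install bucket such as '1,000+' as an int.

    Hypothetical helper (not part of the original notebook). Assumes the
    value contains only digits, commas, and an optional '+'.
    """
    cleaned = raw.replace(',', '').replace('+', '')
    return int(cleaned)


# Sample values in the same format as the install column
for bucket in ['100+', '1,000,000+', '0']:
    print(bucket, '->', parse_installs(bucket))
```

Because each bucket is mapped to its lower bound, any average computed from these numbers is a conservative estimate of the true install count.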

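Let's calculate the average number of installs per app genre for the Google Play dataset. We'll need a nested loop: the outer loop iterates over the categories, and the inner loop sums the installs of the apps that belong to each category.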

In [31]:
# Frequency table for the category column (index 1); we loop over its category names
categories_android = freq_table(free_android, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for row in free_android:
        category_app = row[1]
        if category_app == category:
            # Strip '+' and ',' so that '1,000+' becomes '1000'
            n_installs = row[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1

    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_MAGAZINES : 9549178.467741935
MAPS_AND_NAVIGATION : 4056941.7741935486

On average, the COMMUNICATION category has the most installs, at roughly 38.5 million per app.
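
One way to check this ranking without scanning the output by eye is to store the averages in a dictionary and sort them. The sketch below is an assumption-laden illustration: `average_installs_by_category` is a hypothetical function name, the sample rows are made up, and only the column positions (1 for category, 5 for installs) are taken from the code above. It also computes the averages in a single pass instead of a nested loop.

```python
def average_installs_by_category(rows, category_index=1, installs_index=5):
    """Return {category: average installs}, treating each bucket as its lower bound.

    Hypothetical helper, not part of the original notebook.
    """
    totals = {}
    counts = {}
    for row in rows:
        category = row[category_index]
        installs = float(row[installs_index].replace(',', '').replace('+', ''))
        totals[category] = totals.get(category, 0) + installs
        counts[category] = counts.get(category, 0) + 1
    return {category: totals[category] / counts[category] for category in totals}


# Made-up rows for illustration; only columns 1 and 5 are used
sample_rows = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '1,000+'],
    ['App C', 'MEDICAL', '', '', '', '10,000+'],
    ['App D', 'MEDICAL', '', '', '', '50,000+'],
]

averages = average_installs_by_category(sample_rows)

# Sort categories from highest to lowest average
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
for category, avg in ranked:
    print(category, ':', avg)
```

Calling the function on `free_android` instead of `sample_rows` should give the same averages as the nested loop, listed from most to least installed.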

In [32]:
for app in free_android:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
BBM - Free Calls & Messages : 100,000,000+

However, the category is dominated by apps such as WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts, each of which has over 1 billion installs. These few giants pull the category average up considerably.

Meanwhile, categories such as video players and social apps are also dominated by a handful of major apps, making those fields almost impossible to compete in.

In [33]:
for app in free_android:
    if app[1] == 'VIDEO_PLAYERS' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Motorola FM Radio : 100,000,000+
In [34]:
for app in free_android:
    if app[1] == 'SOCIAL' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+
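
The three cells above repeat the same filter with a different category each time. A minimal sketch of how that check could be factored into one function follows; `apps_at_or_above` is a hypothetical name, and the sample rows are made up for illustration. Comparing the parsed number against a threshold is equivalent to matching the three largest buckets by string, assuming the install column only contains the buckets shown earlier.

```python
def apps_at_or_above(rows, category, min_installs):
    """Return (name, installs) pairs for apps in `category` with at least
    `min_installs` installs (lower bound of the bucket).

    Hypothetical helper, not part of the original notebook.
    """
    matches = []
    for row in rows:
        installs = int(row[5].replace(',', '').replace('+', ''))
        if row[1] == category and installs >= min_installs:
            matches.append((row[0], row[5]))
    return matches


# Made-up rows for illustration; columns 0, 1 and 5 are used
sample_rows = [
    ['Big Chat', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small Chat', 'COMMUNICATION', '', '', '', '10,000+'],
    ['Big Video', 'VIDEO_PLAYERS', '', '', '', '500,000,000+'],
]

for name, installs in apps_at_or_above(sample_rows, 'COMMUNICATION', 100_000_000):
    print(name, ':', installs)
```

With the real data, the same call with `free_android` and 'SOCIAL' or 'VIDEO_PLAYERS' should list the same apps as the cells above.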

Other popular categories, such as books and reference, might require copyright licenses and are mostly filled with religious books. Categories such as finance, business, and education require expertise from other fields, which is outside the scope of our project.

In [35]:
for app in free_android:
    if app[1] == 'PHOTOGRAPHY' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
B612 - Beauty & Filter Camera : 100,000,000+
YouCam Makeup - Magic Selfie Makeovers : 100,000,000+
Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+
Google Photos : 1,000,000,000+
Retrica : 100,000,000+
Photo Editor Pro : 100,000,000+
BeautyPlus - Easy Photo Editor & Selfie Camera : 100,000,000+
PicsArt Photo Studio: Collage Maker & Pic Editor : 100,000,000+
Photo Collage Editor : 100,000,000+
Z Camera - Photo Editor, Beauty Selfie, Collage : 100,000,000+
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100,000,000+
Candy Camera - selfie, beauty camera, photo editor : 100,000,000+
YouCam Perfect - Selfie Photo Editor : 100,000,000+
Camera360: Selfie Photo Editor with Funny Sticker : 100,000,000+
S Photo Editor - Collage Maker , Photo Collage : 100,000,000+
AR effect : 100,000,000+
Cymera Camera- Photo Editor, Filter,Collage,Layout : 100,000,000+
LINE Camera - Photo editor : 100,000,000+
Photo Editor Collage Maker Pro : 100,000,000+
In [36]:
under_100_m = []

for app in free_android:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'PHOTOGRAPHY') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)
Out[36]:
7670532.29338843

Even after removing the apps with 100,000,000+ installs, the category still averages about 7.7 million installs per app. That is over 40% of the original category average, and it is still higher than the average for many other categories.

In [37]:
7670532.29338843/17840110.40229885*100
Out[37]:
42.99599117054801
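
The comparison in the two cells above can be wrapped in one reusable function so that the same check is easy to run for other categories. This is a sketch under stated assumptions: `average_installs` and its `cap` parameter are hypothetical names, and the sample rows are made up rather than taken from the dataset.

```python
def average_installs(rows, category, cap=None):
    """Average installs for `category`, optionally keeping only apps whose
    install count (lower bound of the bucket) is strictly below `cap`.

    Hypothetical helper, not part of the original notebook.
    """
    values = []
    for row in rows:
        if row[1] != category:
            continue
        installs = float(row[5].replace(',', '').replace('+', ''))
        if cap is None or installs < cap:
            values.append(installs)
    # Guard against an empty selection to avoid dividing by zero
    return sum(values) / len(values) if values else 0.0


# Made-up rows for illustration; only columns 1 and 5 are used
sample_rows = [
    ['Giant Photos', 'PHOTOGRAPHY', '', '', '', '1,000,000,000+'],
    ['Big Camera', 'PHOTOGRAPHY', '', '', '', '100,000,000+'],
    ['Mid Editor', 'PHOTOGRAPHY', '', '', '', '10,000,000+'],
    ['Small Collage', 'PHOTOGRAPHY', '', '', '', '1,000,000+'],
]

overall = average_installs(sample_rows, 'PHOTOGRAPHY')
capped = average_installs(sample_rows, 'PHOTOGRAPHY', cap=100_000_000)
print('Overall average:', overall)
print('Average under the cap:', capped)
print('Capped average as % of overall:', capped / overall * 100)
```

Note that the percentage compares two averages. It is not the share of total installs that the smaller apps account for.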

We also notice that some of the photography apps are default apps, such as Google Photos, or are one-dimensional. For instance, some apps can only make collages or grids and offer a minimal amount of photo editing. If we create an app that combines multiple functions, such as photo editing, filtering, grids, collages, or even drawing and designing, it could become a competitive and popular app.

Conclusion

In this project, we analyzed App Store and Google Play data by genre and number of installs to find out which types of apps are the most popular.

Since the market is filled with default apps and one-dimensional apps, we concluded that a photography/drawing app that combines multiple functions, such as photo editing, filtering, grids, collages, or even drawing and designing, could be profitable.