Educational Apps on the Google Play Store and Apple App Store

A data visualization project to benchmark a new product

By Jacqueline Tiefert

I am taking the position of a Product Manager at an EdTech company and I am conducting market research on educational apps offered on the Google Play Store and the Apple App Store over the years 2008 - 2021. I want to determine the optimal price for an educational app as well as configure product benchmark metrics for user ratings and number of installs by answering the following questions:

  • What are the most popular price points for educational apps?
  • Have the prices of educational apps increased or decreased over time?
  • Have educational apps increased in price during the Covid pandemic?
  • What are the most popular average user ratings for educational apps?
  • Have the average user ratings of educational apps increased or decreased over time?
  • Did the Covid pandemic years create an anomoly in the market in terms of higher or lower user ratings?
  • Does price point affect the average user rating?
  • How many installs can be expected within an educational app's release year? Does price affect this?

I. Filtering the Datasets

Part 1: Reading in the Datasets

In [1]:
import pandas as pd

ios = pd.read_csv("appleAppData.csv")
android = pd.read_csv("Google-Playstore.csv")

The code below reveals the header column names for the Google Play Store apps dataset. The data in the code will be referred to as "android." The first rows of data are given below that as samples from the dataset. The relevant columns for this project are the "Category", "Rating", "Installs", "Price", "Released", and "Content Rating" columns. The "Content Rating" columns' values determine the suitability of the app for an audience, and a value of "Everyone" indicates that the app has no age restrictions. The "Category" column gives the genre of the app, and we will filter over this column to select all apps with the value of "Education." The "Released" column tells the date the app was released, and later we will be examining data from both the year and months listed from the time stamp. The "Rating" column gives the average user rating for the app, and the "Installs" column indicates the number of user installs with a plus sign, such as 100+ or 50+ installs. The information also provides the number of rows or apps included in the dataset, and the number of columns. The number of columns indicates the number of unique data points collected on each app.

In [2]:
android.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2312944 entries, 0 to 2312943
Data columns (total 24 columns):
 #   Column             Dtype  
---  ------             -----  
 0   App Name           object 
 1   App Id             object 
 2   Category           object 
 3   Rating             float64
 4   Rating Count       float64
 5   Installs           object 
 6   Minimum Installs   float64
 7   Maximum Installs   int64  
 8   Free               bool   
 9   Price              float64
 10  Currency           object 
 11  Size               object 
 12  Minimum Android    object 
 13  Developer Id       object 
 14  Developer Website  object 
 15  Developer Email    object 
 16  Released           object 
 17  Last Updated       object 
 18  Content Rating     object 
 19  Privacy Policy     object 
 20  Ad Supported       bool   
 21  In App Purchases   bool   
 22  Editors Choice     bool   
 23  Scraped Time       object 
dtypes: bool(4), float64(4), int64(1), object(15)
memory usage: 361.8+ MB
In [3]:
android.head(3)
Out[3]:
App Name App Id Category Rating Rating Count Installs Minimum Installs Maximum Installs Free Price ... Developer Website Developer Email Released Last Updated Content Rating Privacy Policy Ad Supported In App Purchases Editors Choice Scraped Time
0 Gakondo com.ishakwe.gakondo Adventure 0.0 0.0 10+ 10.0 15 True 0.0 ... https://beniyizibyose.tk/#/ [email protected] Feb 26, 2020 Feb 26, 2020 Everyone https://beniyizibyose.tk/projects/ False False False 2021-06-15 20:19:35
1 Ampere Battery Info com.webserveis.batteryinfo Tools 4.4 64.0 5,000+ 5000.0 7662 True 0.0 ... https://webserveis.netlify.app/ [email protected] May 21, 2020 May 06, 2021 Everyone https://dev4phones.wordpress.com/licencia-de-uso/ True False False 2021-06-15 20:19:35
2 Vibook com.doantiepvien.crm Productivity 0.0 0.0 50+ 50.0 58 True 0.0 ... NaN [email protected] Aug 9, 2019 Aug 19, 2019 Everyone https://www.vietnamairlines.com/vn/en/terms-an... False False False 2021-06-15 20:19:35

3 rows × 24 columns

The code below reveals the the header columns for the Apple App stores dataset. The data in the code will be referred to as "ios." The first rows of data are given below that as samples from the dataset. The relevant columns for this project are "Primary_Genre", "Content_Rating", "Released", "Price", and "Average_User_Rating." The "Primary_Genre" column gives the categories of the apps, and the apps falling under the Education category will be relevant for this proejct. "Released" column gives the date the app was released. The "Price" column gives the price to buy the app, and the "Content_Rating" column allows the user to know for which audience the app is suitable. For iOS Apple apps, the content ratings are different than Android Google apps. For Apple apps, a content rating of 4+ means that there is no objectionable material and the app is suitable for everyone. Apple apps have other content ratings such as 9+ (unsuitable for children under age 9), 12+ (unsuitable for children under age 12), and 17+ (unsuitable for children under age 17). We will filter the dataset to select those apps graded 4+ in order to be consistent with Google's "Everyone" content rating.

In [4]:
ios.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1230376 entries, 0 to 1230375
Data columns (total 21 columns):
 #   Column                   Non-Null Count    Dtype  
---  ------                   --------------    -----  
 0   App_Id                   1230376 non-null  object 
 1   App_Name                 1230375 non-null  object 
 2   AppStore_Url             1230376 non-null  object 
 3   Primary_Genre            1230376 non-null  object 
 4   Content_Rating           1230376 non-null  object 
 5   Size_Bytes               1230152 non-null  float64
 6   Required_IOS_Version     1230376 non-null  object 
 7   Released                 1230373 non-null  object 
 8   Updated                  1230376 non-null  object 
 9   Version                  1230376 non-null  object 
 10  Price                    1229886 non-null  float64
 11  Currency                 1230376 non-null  object 
 12  Free                     1230376 non-null  bool   
 13  DeveloperId              1230376 non-null  int64  
 14  Developer                1230376 non-null  object 
 15  Developer_Url            1229267 non-null  object 
 16  Developer_Website        586388 non-null   object 
 17  Average_User_Rating      1230376 non-null  float64
 18  Reviews                  1230376 non-null  int64  
 19  Current_Version_Score    1230376 non-null  float64
 20  Current_Version_Reviews  1230376 non-null  int64  
dtypes: bool(1), float64(4), int64(3), object(13)
memory usage: 188.9+ MB
In [5]:
ios.head(3)
Out[5]:
App_Id App_Name AppStore_Url Primary_Genre Content_Rating Size_Bytes Required_IOS_Version Released Updated Version ... Currency Free DeveloperId Developer Developer_Url Developer_Website Average_User_Rating Reviews Current_Version_Score Current_Version_Reviews
0 com.hkbu.arc.apaper A+ Paper Guide https://apps.apple.com/us/app/a-paper-guide/id... Education 4+ 21993472.0 8.0 2017-09-28T03:02:41Z 2018-12-21T21:30:36Z 1.1.2 ... USD True 1375410542 HKBU ARC https://apps.apple.com/us/developer/hkbu-arc/i... NaN 0.0 0 0.0 0
1 com.dmitriev.abooks A-Books https://apps.apple.com/us/app/a-books/id103157... Book 4+ 13135872.0 10.0 2015-08-31T19:31:32Z 2019-07-23T20:31:09Z 1.3 ... USD True 1031572001 Roman Dmitriev https://apps.apple.com/us/developer/roman-dmit... NaN 5.0 1 5.0 1
2 no.terp.abooks A-books https://apps.apple.com/us/app/a-books/id145702... Book 4+ 21943296.0 9.0 2021-04-14T07:00:00Z 2021-05-30T21:08:54Z 1.3.1 ... USD True 1457024163 Terp AS https://apps.apple.com/us/developer/terp-as/id... NaN 0.0 0 0.0 0

3 rows × 21 columns

In [6]:
## drop some columns from the original database and make sure that it works
ios.drop(columns = ["App_Id"], axis=1, inplace = True)
ios.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1230376 entries, 0 to 1230375
Data columns (total 20 columns):
 #   Column                   Non-Null Count    Dtype  
---  ------                   --------------    -----  
 0   App_Name                 1230375 non-null  object 
 1   AppStore_Url             1230376 non-null  object 
 2   Primary_Genre            1230376 non-null  object 
 3   Content_Rating           1230376 non-null  object 
 4   Size_Bytes               1230152 non-null  float64
 5   Required_IOS_Version     1230376 non-null  object 
 6   Released                 1230373 non-null  object 
 7   Updated                  1230376 non-null  object 
 8   Version                  1230376 non-null  object 
 9   Price                    1229886 non-null  float64
 10  Currency                 1230376 non-null  object 
 11  Free                     1230376 non-null  bool   
 12  DeveloperId              1230376 non-null  int64  
 13  Developer                1230376 non-null  object 
 14  Developer_Url            1229267 non-null  object 
 15  Developer_Website        586388 non-null   object 
 16  Average_User_Rating      1230376 non-null  float64
 17  Reviews                  1230376 non-null  int64  
 18  Current_Version_Score    1230376 non-null  float64
 19  Current_Version_Reviews  1230376 non-null  int64  
dtypes: bool(1), float64(4), int64(3), object(12)
memory usage: 179.5+ MB
In [7]:
## dropping more columns to make the data more manageable
ios.drop(columns = ["Size_Bytes", "Required_IOS_Version", "DeveloperId", "Developer", "Developer_Url"], axis=1, inplace = True)
ios.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1230376 entries, 0 to 1230375
Data columns (total 15 columns):
 #   Column                   Non-Null Count    Dtype  
---  ------                   --------------    -----  
 0   App_Name                 1230375 non-null  object 
 1   AppStore_Url             1230376 non-null  object 
 2   Primary_Genre            1230376 non-null  object 
 3   Content_Rating           1230376 non-null  object 
 4   Released                 1230373 non-null  object 
 5   Updated                  1230376 non-null  object 
 6   Version                  1230376 non-null  object 
 7   Price                    1229886 non-null  float64
 8   Currency                 1230376 non-null  object 
 9   Free                     1230376 non-null  bool   
 10  Developer_Website        586388 non-null   object 
 11  Average_User_Rating      1230376 non-null  float64
 12  Reviews                  1230376 non-null  int64  
 13  Current_Version_Score    1230376 non-null  float64
 14  Current_Version_Reviews  1230376 non-null  int64  
dtypes: bool(1), float64(3), int64(2), object(9)
memory usage: 132.6+ MB

This effectively deleted some columns fromm the Google Playstore dataset so that it will be easier to work with and take less time to run. The files will not be as large.

Part 2: Filtering to select only the educational category

The code below loops through the datasets, selects only the apps that have the category name "Education," and isolates them. Then it filters through the datast and also only selects the apps with the content rating suitable for everyone. The first code snippet works on the Google Play Store dataset first. This reduces the number of apps in the dataset from 2.3 million to a little over 230 thousand. Selecting only educational apps also reduces the Apple apps dataset from 1.2 million to just a little over 100 thousand.

In [8]:
android_final = android[(android["Category"]=='Education')&(android["Content Rating"]=='Everyone')]
print("Number of rows in educational Google apps suitable for everyone:", len(android_final))
Number of rows in educational Google apps suitable for everyone: 232180
In [9]:
android_final.head(4)
Out[9]:
App Name App Id Category Rating Rating Count Installs Minimum Installs Maximum Installs Free Price ... Developer Website Developer Email Released Last Updated Content Rating Privacy Policy Ad Supported In App Purchases Editors Choice Scraped Time
37 Calculus Tutorial 1: Introduction com.RaySemiSoft.CalculusT1 Education 0.0 0.0 100+ 100.0 277 True 0.0 ... NaN [email protected] Jun 18, 2020 Jun 01, 2021 Everyone NaN False False False 2021-06-15 20:19:37
67 RACE ACADEMY co.davos.snqkw Education 0.0 0.0 100+ 100.0 186 True 0.0 ... NaN [email protected] Jan 9, 2021 May 18, 2021 Everyone https://bit.ly/2YDTip0 False False False 2021-06-15 20:19:39
72 Triple Point Academy co.varys.sinbd Education 5.0 5.0 10+ 10.0 18 True 0.0 ... NaN [email protected] Oct 15, 2020 Jun 13, 2021 Everyone https://bit.ly/33pSGFX False False False 2021-06-15 20:19:39
96 Духовно-нравственная культура (ДНК) appinventor.ai_moscluster_com.DNK Education 0.0 0.0 100+ 100.0 103 True 0.0 ... http://www.moscluster.com [email protected] Jul 5, 2017 Jul 05, 2017 Everyone https://www.moscluster.com/?page_id=463 False False False 2021-06-15 20:19:41

4 rows × 24 columns

In [10]:
ios_final = ios[(ios["Primary_Genre"]=='Education') & (ios["Content_Rating"]=='4+')]
print("Number of rows in educational Apple apps suitable for everyone:", len(ios_final))
Number of rows in educational Apple apps suitable for everyone: 106049
In [11]:
ios_final.head(3)
Out[11]:
App_Name AppStore_Url Primary_Genre Content_Rating Released Updated Version Price Currency Free Developer_Website Average_User_Rating Reviews Current_Version_Score Current_Version_Reviews
0 A+ Paper Guide https://apps.apple.com/us/app/a-paper-guide/id... Education 4+ 2017-09-28T03:02:41Z 2018-12-21T21:30:36Z 1.1.2 0.0 USD True NaN 0.0 0 0.0 0
28 AAB Mobile https://apps.apple.com/us/app/aab-mobile/id147... Education 4+ 2019-08-29T07:00:00Z 2019-12-28T20:58:19Z 1.3 0.0 USD True NaN 5.0 1 5.0 1
31 AAJ Year Book https://apps.apple.com/us/app/aaj-year-book/id... Education 4+ 2015-11-09T23:24:28Z 2015-12-08T00:22:26Z 1.01 0.0 USD True http://aaj.edu.jo 1.0 1 1.0 1

Part 3: Visualizing the final dataset selection

The nested pie chart shows that apps in the educational category and graded as suitable for anyone to download, make up less than 10% of either the Google Play Store or the Apple App Store datasets from years 2008-2021. Educational apps consist of 10% of all Google Play Store apps, and 8.62% of all Apply App Store apps. We will use these final filtered datasets for the rest of the data exploration and visualization in this project.

In [12]:
import seaborn as sns
import matplotlib.pyplot as plt

size = 0.3
facecolor = '#eaeaf2'
font_color = '#525252'
labels = ['Other Apps', 'Educational Apps']
vals = [106049, 1124327]
group_sum = [232180, 2080764]
group_name = ["Google",""]
subgroup_name = ["","Apple"]
names=["Edu. Apps", "Other Apps"]


fig, ax = plt.subplots(figsize=(15,10), facecolor=facecolor)
outer_colors = ["#FF0066", "#FFFFD1"]
inner_colors = ["#FF0066", "#FFFFD1"]


ax.pie(group_sum, radius = 1, colors=outer_colors, labels=group_name, textprops={'color':font_color}, wedgeprops = dict(width=size, edgecolor='black'))
ax.pie(vals, radius = 1-size, colors=inner_colors, labels=subgroup_name, wedgeprops=dict(width=size, edgecolor='black'))
ax.set_title("Google vs. Apple Educational Apps as % Total", fontsize=18, pad=15, color=font_color)
plt.legend(names, loc='lower left')
plt.show()

II. Pricing

Part 1: Most Common Prices 2008-2021

The following code reveals that while all Apple apps have prices ending in 99 cents, Google apps have more creative pricing features. I will select a new dataset of prices ending in 99 and grab all prices between 99 cents and 19.99, incrementing by a dollar. I then will create a bar graph to get a count of how many apps are offered at each price. The first bar graph shows just the Google Play Store dataset. It shows that the most commonly found prices are 4.99 and under, with a majority offered for just 99 cents. The second bar graph is of the Apple App Store dataset and it shows a similar trend, although it has more apps offered at 1.99 and 2.99 than the Google Play Store. Perhaps this is just a reflection of a larger dataset, though. Both bar graphs reveal that above 4.99, the most commonly found prices for apps are 6.99 or 9.99.

The following code snippet reveals all the unique price points in the datasets. Note that the Google Play Store dataset has data dating back to 2010 while the Apple app dataset has data dating back to 2008.

In [13]:
##Google apps prices and their frequency
android_final["Price"].value_counts()
Out[13]:
0.000000     225780
0.990000       1265
1.990000        735
2.990000        643
1.490000        489
              ...  
0.590378          1
12.000000         1
4.880000          1
10.970000         1
18.903596         1
Name: Price, Length: 321, dtype: int64

This revealed that Google apps in the educational category are mostly free. We don't want to look at the free apps for now, so we will isolated just those apps that have a price above zero. We are left wwith 6,400 apps.

In [14]:
## only get Google apps priced above zero
android_final=android_final.loc[android_final["Price"] > 0]
android_final["Price"].value_counts()
Out[14]:
0.990000     1265
1.990000      735
2.990000      643
1.490000      489
3.990000      370
             ... 
0.590378        1
12.000000       1
4.880000        1
10.970000       1
18.903596       1
Name: Price, Length: 320, dtype: int64
In [15]:
print("The number of rows left for Google apps:", len(android_final))
The number of rows left for Google apps: 6400

Let's now compare this with the number of educational apps in Apple that have a price.

In [16]:
#Apple apps' prices and their frequency
ios_final["Price"].value_counts()
Out[16]:
0.00      90092
0.99       3692
1.99       3370
2.99       2607
3.99       1508
          ...  
41.99         1
349.99        1
209.99        1
38.99         1
46.99         1
Name: Price, Length: 75, dtype: int64

The majority of educational Apple apps are also offered free.

In [17]:
## only get Apple apps priced above zero
ios_final=ios_final.loc[ios_final["Price"] > 0]
ios_final["Price"].value_counts()
Out[17]:
0.99      3692
1.99      3370
2.99      2607
3.99      1508
4.99      1401
          ... 
109.99       1
399.99       1
349.99       1
124.99       1
46.99        1
Name: Price, Length: 74, dtype: int64
In [18]:
print("The number of rows left for Apple apps:", len(ios_final))
The number of rows left for Apple apps: 15951
In [19]:
##Create a bar graph to visualize Google dataset prices
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(16,8))
popular_prices = [0.99, 1.99, 2.99, 3.99, 4.99, 5.99, 6.99, 7.99, 8.99, 9.99,11.99, 12.99, 13.99, 14.99, 15.99, 16.99, 17.99, 18.99, 19.99]
android_prices = android_final[android_final["Price"].isin(popular_prices)]
sns.set_style("whitegrid")
sns.countplot(data=android_prices, x="Price", orient="h", palette="pastel")
plt.xticks(rotation=45)
plt.title("Google Play Store's Educational Apps Priced Under $20", fontsize=14)

plt.show()
In [20]:
## Create a bar graph to visualize Apple Apps dataset prices
plt.figure(figsize=(16,8))
ios_prices = ios_final[ios_final["Price"].isin(popular_prices)]
sns.countplot(data=ios_prices, x="Price", orient="h", palette="pastel")
plt.title("Apple App Store's Educational Apps Priced Under $20 Dollars", fontsize=14)
plt.xticks(rotation=45)
plt.show()

Part 2: Price Changes Over Time

To look at price changes over time, I use the "Released" column of both datasets. The "Released" colummn gives the date that the app was released. I look at the price points when the apps were released, over the years 2008-2021. I also look at the change over months in each year, to search for any hidden patterns. First, I need to extract the month and year from the "Released" column. This is difficult to do, because the date is saved as a string in the Google Play Store dataset. I have to first create a pattern that gets matched and returns the year, then the month, from each string, and assigns them, respectively, to new columns titled "Year" and "Month." I then do the same thing for the Apple App Store dataset, which proves easier to do.

In [21]:
##Extract month and year and assign new columns with those values for easy access

pattern = r"([2][0-9]{3})"
years = android_prices["Released"].str.extract(pattern)
android_prices["Year"] = years

##Check the Google Play Store Set for months and that they are extracted
monthpattern = r"([A-Z][a-z]{2})"
month = android_prices["Released"].str.extract(monthpattern)
android_prices["Month"] = month
android_prices["Month"] = pd.Categorical(android_prices["Month"], ['Jan','Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
android_prices.sort_values("Month")
/var/folders/01/8qkcw5d910nfzr664dx7b5j80000gn/T/ipykernel_66540/2407542695.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  android_prices["Year"] = years
/var/folders/01/8qkcw5d910nfzr664dx7b5j80000gn/T/ipykernel_66540/2407542695.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  android_prices["Month"] = month
/var/folders/01/8qkcw5d910nfzr664dx7b5j80000gn/T/ipykernel_66540/2407542695.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  android_prices["Month"] = pd.Categorical(android_prices["Month"], ['Jan','Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
Out[21]:
App Name App Id Category Rating Rating Count Installs Minimum Installs Maximum Installs Free Price ... Released Last Updated Content Rating Privacy Policy Ad Supported In App Purchases Editors Choice Scraped Time Year Month
1275326 Learn French com.metalanguage.learnfrench Education 4.8 10.0 100+ 100.0 392 False 4.99 ... Jan 21, 2016 Dec 19, 2019 Everyone http://metalanguagepro.com/privacy-policy/ False False False 2021-06-15 21:49:03 2016 Jan
164134 [PRO] DPE RJ TECNICO MEDIO 2019 br.com.concursoprepara.defensoriapublicadoesta... Education 0.0 0.0 10+ 10.0 19 False 3.99 ... Jan 22, 2019 Jan 22, 2019 Everyone NaN False False False 2021-06-15 23:02:43 2019 Jan
1947303 Praxis II Business Education Exam Prep Flashcards com.smart.serious.software.app.learn.flashcard... Education 0.0 0.0 10+ 10.0 13 False 1.99 ... Jan 2, 2019 Jan 02, 2019 Everyone https://smart-apps.flycricket.io/privacy.html False False False 2021-06-16 07:39:52 2019 Jan
922263 おかねかぞえ nhiraiwa.kids.money Education 0.0 0.0 100+ 100.0 122 False 0.99 ... Jan 12, 2013 Jan 30, 2016 Everyone NaN False False False 2021-06-16 11:06:16 2013 Jan
921974 21 Courageous Prayers org.jeffmikels.courageousprayers Education 0.0 0.0 10+ 10.0 40 False 0.99 ... Jan 3, 2019 Jan 03, 2019 Everyone NaN False False False 2021-06-16 11:06:00 2019 Jan
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1725390 Mnemocon Cards - обучение английскому по карто... ru.mnemocon.cards.mnemoconcards Education 4.7 17.0 100+ 100.0 316 False 0.99 ... NaN Jun 02, 2018 Everyone https://sites.google.com/view/mnemocon False False False 2021-06-16 04:31:43 NaN NaN
1907502 Hiragana/Katakana Drill Pro org.muth.android.kana Education 4.3 70.0 1,000+ 1000.0 1268 False 2.99 ... NaN Nov 21, 2015 Everyone NaN False False False 2021-06-16 07:04:55 NaN NaN
1953197 German Verbs Pro org.muth.android.conjugator_pro_de Education 4.5 246.0 1,000+ 1000.0 4582 False 4.99 ... NaN Jul 17, 2019 Everyone NaN False False False 2021-06-16 07:45:03 NaN NaN
2050133 AnyMemo Pro: For Donation org.liberty.android.fantastischmemopro Education 4.5 151.0 1,000+ 1000.0 1982 False 1.99 ... NaN Aug 08, 2020 Everyone https://anymemo.org/privacy-policy-view False False False 2021-06-16 09:09:59 NaN NaN
2262468 CompTIA Server+ Exam Prep com.dblpartners.comptia_server Education 0.0 0.0 100+ 100.0 223 False 4.99 ... NaN Oct 28, 2019 Everyone http://dynamicpath.com/privacy False False False 2021-06-16 12:14:53 NaN NaN

4148 rows × 26 columns

In [22]:
##The years are stored as strings so I have to convert to float
android_prices['Year'] = android_prices['Year'].astype(float)
##Select years before 2015
android_priceyear = android_prices.loc[android_prices["Year"] < 2015] 
/var/folders/01/8qkcw5d910nfzr664dx7b5j80000gn/T/ipykernel_66540/1971590246.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  android_prices['Year'] = android_prices['Year'].astype(float)

The graphs below are analyzed for any patterns. Both the Google Play Store and the Apple App Stores' line graphs reveal one major pattern: both exhibit price peaks once or twice per year. Generally, the trend is that higher prices are offered in February-March and/or September-October. The prices are low in the summer and December-January. This makes sense, because teachers would be looking for educational apps during the back-to-school season in the fall, and right before the spring semester, in order to finish the academic year strong with fresh inspiration. Notice that in the Google Play Store's educational apps from years 2010-2014, many of the price peaks are between 6-8 dollars, which is higher than in years following.

In [23]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
sns.lineplot(data=android_priceyear, x="Month", y="Price", hue="Year", palette='bright')
plt.xticks(rotation=45)
plt.title("Google Play Store's Educational Apps' Price Trend over Yrs. 2010-2014", fontsize=14)
plt.show()
In [24]:
android_secpriceyearseg = android_prices.loc[(android_prices["Year"] < 2020) &(android_prices["Year"] > 2014)]
len(android_secpriceyearseg)
Out[24]:
2465
In [25]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
sns.lineplot(data=android_secpriceyearseg, x="Month", y="Price", hue="Year", palette='bright')
plt.xticks(rotation=45)
plt.title("Google Play Store's Educational Apps' Price Trend over Yrs. 2015-2019", fontsize=14)
plt.show()
In [26]:
android_pandemicyrs = android_prices.loc[android_prices["Year"] >= 2019] 
len(android_pandemicyrs)
Out[26]:
1143

The Covid years on the Google Play Store reveal a sharp peak in prices during March 2021, otherwise prices were fairly stable throughout 2020.

In [27]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
sns.lineplot(data=android_pandemicyrs, x="Month", y="Price", hue="Year", palette='bright')
plt.xticks(rotation=45)
plt.title("Google Play Store's Educational Apps' Price Trend over Covid Pandemic Yrs. 2019-2021", fontsize=14)
plt.show()

Below I start to examine the Apple App Store dataset. The Apple App Store dataset is a little different from the Google Play Store dataset, in that the data started two years earlier, in 2008. I isolate these early years in the first graph. We can see that the prices started out high in 2008 and immediately dropped in early 2009. By 2010, the prices stabilized.

In [28]:
## extract the year and month when app was released and add a column for month and year to dataframe
from datetime import datetime
ios_prices["Month"] =pd.DatetimeIndex(ios_prices['Released']).month
ios_prices["Year"]=pd.DatetimeIndex(ios_prices['Released']).year
/var/folders/01/8qkcw5d910nfzr664dx7b5j80000gn/T/ipykernel_66540/2148628083.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ios_prices["Month"] =pd.DatetimeIndex(ios_prices['Released']).month
/var/folders/01/8qkcw5d910nfzr664dx7b5j80000gn/T/ipykernel_66540/2148628083.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ios_prices["Year"]=pd.DatetimeIndex(ios_prices['Released']).year
In [29]:
## get Apple apps priced between 2008 and 2009
ios_priceyearseg1 = ios_prices.loc[ios_prices["Year"] < 2011] 
len(ios_priceyearseg1)
Out[29]:
714
In [30]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
sns.relplot(data=ios_priceyearseg1, x="Month", y="Price", hue="Year", height=7, aspect=2, kind="line", palette='bright')
plt.xticks(rotation=45)
plt.title("Apple App Store's Educational Apps' Price Trend over Yrs. 2008-2010", fontsize=14)
plt.show()
<Figure size 1152x576 with 0 Axes>

The Apple App store prices from 2010-2014 reveal that prices got higher than in 2010. Notice that the prices later in this period eventually got higher than most years prior, especially in 2014 the prices offered were generally higher than years prior. 2012 was when this trend began. Still, the general trend is that the Google Play Store's price peaks were at higher prices than found in the Apple App Store over the same time period. Most of the price peaks in the Google Play Store were between 6 and 8 dollars, whereas in the Apple App store, they are between 4 and 5 dollars.

In [31]:
ios_secpriceyearseg = ios_prices.loc[(ios_prices["Year"] < 2015) &(ios_prices["Year"] > 2009)]
In [32]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
sns.relplot(data=ios_secpriceyearseg, x="Month", y="Price", hue="Year", height=6, aspect=2, kind="line", palette='bright')
plt.xticks(rotation=45)
plt.title("Apple App Store's Educational Apps' Price Trend over Yrs. 2010-2014", fontsize=14)
plt.show()
<Figure size 1152x576 with 0 Axes>

The graph below showss that during the years 2015-2019, Apple price peaks were still between 4 and 5 dollars. However, during the same time period, Google price peaks were mostly between 5 andd 6 dollars, which came down from between 6-8 dollars offered during 2010-2014 in the Google Play Store. Thus, the Apple App Store prices prove to be more stable over time than the Google Play Store. Still, we can see some mvery high price peaks in the Apple App stoe, such as during October 2019, when prices swung towards 7 dollars.

In [33]:
ios_thirdpriceyearseg = ios_prices.loc[(ios_prices["Year"] < 2020) &(ios_prices["Year"] > 2014)]
In [34]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
sns.relplot(data=ios_thirdpriceyearseg, x="Month", y="Price", hue="Year", height=6, aspect=2, kind="line", palette='bright')
plt.title("Apple App Store's Educational Apps' Price Trend over Yrs. 2015-2019", fontsize=14)
plt.show()
<Figure size 1152x576 with 0 Axes>