Ryan West
For my final, I will be using my Instagram data. I have redownloaded the file to have more up-to-date information. My project looks at the posts I have liked to determine which types of posts I am most likely to like. I will be using the "liked_posts" file in the "likes" folder.
This file lists the name of the account that made each post, the URL of the post, and the timestamp. I will count how many posts from each account I have liked and use that to determine my top accounts.
I will then compare that to the list of ads I have viewed to determine if there is a correlation between my top liked accounts and the types of ads I have viewed. As you will see, I run into a snag that prevents me from giving as detailed a comparison as I wanted.
import pandas as pd
import json
from matplotlib import pyplot as plt
from scipy import stats
Here I load the data file containing the posts I have liked.
with open(r"C:\Users\ryno2\DATAINEMERGINGMEDIA\InstagramFinalData\likes\liked_posts.json") as p:
    dat = json.load(p)
dat.keys()
dict_keys(['likes_media_likes'])
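Since the same open, load, and wrap pattern is used for both export files, it could be folded into a small helper. This is only a sketch: `load_export_table` is a hypothetical name, not part of the original notebook, and the explicit `encoding="utf-8"` is an assumption about how the export is encoded.

```python
import json
from pathlib import Path

import pandas as pd


def load_export_table(path, key):
    """Load one top-level list from an export JSON file into a DataFrame.

    `path` is the JSON file and `key` is the top-level key whose value is
    a list of records (for example 'likes_media_likes').
    """
    # Assumption: the export is UTF-8 encoded.
    with open(Path(path), encoding="utf-8") as f:
        data = json.load(f)
    if key not in data:
        # Show the keys that do exist, mirroring the dat.keys() check.
        raise KeyError(f"{key!r} not found; available keys: {list(data)}")
    return pd.DataFrame(data[key])
```

With this helper, the liked posts table would be loaded with something like `load_export_table(path_to_liked_posts, 'likes_media_likes')`.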
df_datlikedposts = pd.DataFrame(dat)
df_datlikedposts
df_likesmedia = pd.DataFrame(dat['likes_media_likes'])
df_likesmedia
df_likesmedia.head()
| title | media_list_data | string_list_data |
---|---|---|---|
0 | theamiibros | [] | [{'href': 'https://www.instagram.com/p/Bkd9pIv... |
1 | deftones | [] | [{'href': 'https://www.instagram.com/p/BkduH0I... |
2 | thedakku | [] | [{'href': 'https://www.instagram.com/p/BkcE6k4... |
3 | mrewest | [] | [{'href': 'https://www.instagram.com/p/Bka_R4p... |
4 | mrewest | [] | [{'href': 'https://www.instagram.com/p/BkWRZqr... |
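The URL and timestamp for each like sit inside the nested `string_list_data` column, so they are not directly usable as columns. One way to pull them out is `pd.json_normalize`. The sketch below uses made-up records shaped like the rows above; the account names, URLs, and timestamp values are placeholders, and the `timestamp` key name and its unit (Unix seconds) are assumptions about the export.

```python
import pandas as pd

# Toy records shaped like the export rows. All values are placeholders,
# and the 'timestamp' key (Unix seconds) is an assumption.
records = [
    {"title": "account_a", "media_list_data": [],
     "string_list_data": [{"href": "https://www.instagram.com/p/EXAMPLE1/",
                           "timestamp": 1530000000}]},
    {"title": "account_b", "media_list_data": [],
     "string_list_data": [{"href": "https://www.instagram.com/p/EXAMPLE2/",
                           "timestamp": 1530003600}]},
    {"title": "account_a", "media_list_data": [],
     "string_list_data": [{"href": "https://www.instagram.com/p/EXAMPLE3/",
                           "timestamp": 1530007200}]},
]

# One output row per entry in string_list_data, keeping the account name.
flat = pd.json_normalize(records, record_path="string_list_data", meta=["title"])

# Convert the Unix-seconds timestamp into a readable datetime column.
flat["liked_at"] = pd.to_datetime(flat["timestamp"], unit="s")

print(flat[["title", "href", "liked_at"]])
```

With the real data, `records` would be `dat['likes_media_likes']`.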
df_likesmedia['title']
likescount = df_likesmedia.groupby('title').count().sort_values('media_list_data',ascending = False)
likescount.keys()
Index(['media_list_data', 'string_list_data'], dtype='object')
I have counted all of the accounts and sorted them from highest to lowest. I will be using the top 5 liked accounts.
likescount
likescount.head(5)
title | media_list_data | string_list_data |
---|---|---|
wannatradepants | 586 | 586 |
mrewest | 299 | 299 |
respawnedrecords | 209 | 209 |
banana_hoard_vinyl | 186 | 186 |
8bit_exasperation_disbursment | 178 | 178 |
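Because both count columns above carry the same number, the ranking can also be expressed as a single series with `value_counts`, which counts each distinct value and sorts from highest to lowest. A minimal sketch on placeholder account names (not the real data):

```python
import pandas as pd

# Toy stand-in for df_likesmedia['title']; the names are placeholders.
titles = pd.Series(
    ["account_a", "account_b", "account_a", "account_c", "account_a", "account_b"],
    name="title",
)

# value_counts() returns counts sorted in descending order,
# so head(n) gives the n most-liked accounts.
top_accounts = titles.value_counts().head(2)
print(top_accounts)
```

On the real table this would be `df_likesmedia['title'].value_counts().head(5)`.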
We will now take a look at the main page of each of these top 5 Instagram accounts.
I predict that all or most of them will be vinyl-related accounts.
wannatradepants: Posts a variety of video game music and related things, including vinyl. They post every day, so it makes sense they are the most liked.
mrewest: This is actually my oldest brother. He posts a variety of things, with vinyl being one of them. I like all of his posts because he is my brother, but I don't feel like he posts as often as other accounts, so I am surprised he is still number 2.
respawnedrecords: A video game vinyl record store. I've been following them since they first started their business, so it makes sense they would be up here.
banana_hoard_vinyl: They post frequently and I really like the images they take. I only really like photos if they are of something I know and/or like, so since they post video game related vinyl I tend to like their posts a lot.
8bit_exasperation_disbursment: Same as banana_hoard_vinyl.
Conclusion: As expected, all of these accounts relate to vinyl in some way.
Now I will be loading in the ads_viewed file to compare.
with open(r"C:\Users\ryno2\DATAINEMERGINGMEDIA\InstagramFinalData\ads_and_content\ads_viewed.json") as d:
    addat = json.load(d)
addat.keys()
dict_keys(['impressions_history_ads_seen'])
df_adsviewed = pd.DataFrame(addat)
df_adsviewed.head()
| impressions_history_ads_seen |
---|---|
0 | {'title': '', 'media_map_data': {}, 'string_ma... |
1 | {'title': '', 'media_map_data': {}, 'string_ma... |
2 | {'title': '', 'media_map_data': {}, 'string_ma... |
3 | {'title': '', 'media_map_data': {}, 'string_ma... |
4 | {'title': '', 'media_map_data': {}, 'string_ma... |
df_impressions = pd.DataFrame(addat['impressions_history_ads_seen'])
df_impressions.head()
| title | media_map_data | string_map_data |
---|---|---|---|
0 | | {} | {'Author': {'href': '', 'value': 'riotforge', ... |
1 | | {} | {'Author': {'href': '', 'value': 'stsphonoco',... |
2 | | {} | {'Author': {'href': '', 'value': 'first4figure... |
3 | | {} | {'Author': {'href': '', 'value': 'finalfantasy... |
4 | | {} | {'Author': {'href': '', 'value': 'gamechops', ... |
df_stringmap = pd.DataFrame({'Title': [x['string_map_data']['Author']['value'] for x in df_adsviewed['impressions_history_ads_seen']]})
I had to go step by step to get to the desired table. I had to look up how to rename column headers because originally the below table listed the "Title" column as just "0" which made grouping and counting difficult. Here is where I ran into snags in regard to using the ads_viewed file.
df_stringmap.head()
| Title |
---|---|
0 | riotforge |
1 | stsphonoco |
2 | first4figures |
3 | finalfantasyvii |
4 | gamechops |
adscount = df_stringmap.groupby('Title').count()
I was unable to figure out how to count these titles, but I discovered that each title is only listed once, so counting them would be pointless anyway. Because of this, I will look at the accounts myself and determine what they are. I will be counting vinyl accounts against non-vinyl accounts. Accounts that aren't exclusively vinyl but still contain vinyl will be counted as vinyl accounts.
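A likely reason the count came back without any numbers: `groupby('Title').count()` counts the non-null values in the *other* columns, and this table has no other columns, so the result keeps the titles as an index but has nothing to show. Counting rows per group with `.size()`, or calling `value_counts()` on the column, avoids that. The sketch below uses a toy table in which one title is deliberately repeated (unlike the real file) so the difference is visible.

```python
import pandas as pd

# Toy table with a single 'Title' column; 'riotforge' is repeated on
# purpose here, which is NOT the case in the real export.
df_titles = pd.DataFrame({"Title": ["riotforge", "gamechops", "riotforge"]})

# count() has no non-grouping columns to count, so the frame has 0 columns.
empty = df_titles.groupby("Title").count()
print(empty.shape)

# size() counts rows per group regardless of other columns.
per_account = df_titles.groupby("Title").size().sort_values(ascending=False)
print(per_account)

# value_counts() on the column gives the same counts, already sorted.
same_counts = df_titles["Title"].value_counts()
print(same_counts)
```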
Non-vinyl : 34 Vinyl : 5
adscount
Title |
---|
arcsystemworksu |
astronord_ |
bite |
builtbar |
cakeworthy |
compartes |
creepycompany |
dwhomecandles |
ffxiv |
finalfantasyvii |
first4figures |
fluanceaudio |
fragrant_jewels |
gamechops |
goosecreekco |
gruventertainment |
halo |
harajukustreetwear |
homesick |
itcosmetics |
kentstate |
lastnightinsohomovie |
meijerstores |
michaelsstores |
milacaresquad |
misfitsmarket |
paramountplus |
paypal |
qnlabs |
riotforge |
shopatsuko |
shoplctv |
skyboundent |
stsphonoco |
thecontainerstore |
thesoundofvinylus |
victrolaplayers |
yankeecandle |
zetsubou_p |
Non-vinyl : 34 Vinyl : 5
As you can see, the number of non-vinyl ads far exceeds the number of vinyl ads.
Given the number of vinyl-related posts I like, I fully expected there to be far more vinyl ads, and honestly I think there should be.
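To put numbers on the comparison, the sketch below computes the vinyl share from the manual tally (5 of 39) and checks whether any of the top 5 liked accounts appear by name among the ad authors listed above. An exact name match is a narrow test: it would miss an ad from a related account that I don't follow, so an empty overlap says little on its own.

```python
# Top 5 liked accounts and ad authors, copied from the tables above.
top_liked = {
    "wannatradepants", "mrewest", "respawnedrecords",
    "banana_hoard_vinyl", "8bit_exasperation_disbursment",
}
ad_authors = {
    "arcsystemworksu", "astronord_", "bite", "builtbar", "cakeworthy",
    "compartes", "creepycompany", "dwhomecandles", "ffxiv", "finalfantasyvii",
    "first4figures", "fluanceaudio", "fragrant_jewels", "gamechops",
    "goosecreekco", "gruventertainment", "halo", "harajukustreetwear",
    "homesick", "itcosmetics", "kentstate", "lastnightinsohomovie",
    "meijerstores", "michaelsstores", "milacaresquad", "misfitsmarket",
    "paramountplus", "paypal", "qnlabs", "riotforge", "shopatsuko",
    "shoplctv", "skyboundent", "stsphonoco", "thecontainerstore",
    "thesoundofvinylus", "victrolaplayers", "yankeecandle", "zetsubou_p",
}

# Manual tally from the notebook: 5 vinyl accounts, 34 non-vinyl accounts.
vinyl, non_vinyl = 5, 34
vinyl_share = vinyl / (vinyl + non_vinyl)
print(f"Vinyl share of ad accounts: {vinyl_share:.1%}")  # 12.8%

# Accounts that are both a top liked account and an ad author (exact match).
overlap = top_liked & ad_authors
print(f"Top liked accounts that also ran ads: {sorted(overlap)}")  # []
```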
The file only seems to contain about 50 ad instances when I have obviously viewed way more than that. What this data tells me is that my targeted ads aren't quite what I thought they would be. Obviously, I don't exclusively like vinyl posts, but since that's the majority of what I like (or at least I think it's what I like the most), I thought the share of vinyl ads would be much higher. I would imagine Instagram also uses other posts I've viewed and liked (such as on the explore page) or posts I have saved to determine the ads I get. I wonder which of those Instagram weighs more heavily in determining the ads you get.
I feel the data in the files Instagram provides is limited, which prevents a truer picture. Ideally, it would show every ad I have seen (because there is no way I have only seen that many). That way you could count how many ads came from each account. Just because I've only seen ads from 5 vinyl accounts doesn't mean I haven't seen multiple ads from those same accounts.
For ad improvement, I just discovered Instagram lets you view links you have clicked on in the app. However, I think there should also be a general history of posts or stories viewed, because sometimes I am scrolling quickly through people's stories and see an ad for something that interests me, but since I scrolled past so fast I missed it and can't go back to it.
In conclusion, the top liked accounts are what I expected, and the comparison to the ads seen cannot be made confidently. Taking this type of analysis further would require a lot more time, since the data Instagram allows you to download doesn't contain enough information. Getting a better understanding of what kind of posts someone likes (for example, the most common hashtags on posts a user has liked) would require manually going to each post in their data file and copying the hashtags from those posts into a file. A similar thing could be done for ads, but since Instagram doesn't seem to keep track of each ad, the user would have to save and keep track of each ad they have seen.
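If the captions of liked posts were collected by hand into a list, the hashtag count described above could be sketched as follows. The captions here are invented, since the export does not include them, and `top_hashtags` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches a '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(captions, n=3):
    """Return the n most common hashtags (lowercased) across captions."""
    counts = Counter(
        tag.lower() for caption in captions for tag in HASHTAG.findall(caption)
    )
    return counts.most_common(n)


# Invented captions standing in for manually collected ones.
captions = [
    "New arrival #vinyl #vgm",
    "Spinning tonight #Vinyl #nowspinning",
    "#vgm restock on #vinyl",
]

print(top_hashtags(captions, n=2))  # [('vinyl', 3), ('vgm', 2)]
```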