This notebook wrangles and analyzes the tweet archive of 'WeRateDogs', a popular dog rating account on Twitter. It walks through the three stages of the data wrangling process (gathering, assessing and cleaning the data), then analyzes the cleaned data and documents the resulting insights and visualizations.
The data used in this project comes from three sources, with the following columns.
The tweet archive (twitter-archive-enhanced.csv):
tweet_id: unique identifier of a particular Tweet.
in_reply_to_status_id: if the represented Tweet is a reply, this field will contain the integer representation of the original Tweet's ID.
in_reply_to_user_id: if the represented Tweet is a reply, this field will contain the integer representation of the original Tweet's author ID.
timestamp: the date and time at which the tweet was posted.
source: utility used to post the tweet, as an HTML-formatted string.
text: the actual UTF-8 text of the tweet.
retweeted_status_id: if the represented Tweet is a retweet, this field will contain the integer representation of the original Tweet's ID.
retweeted_status_user_id: if the represented Tweet is a retweet, this field will contain the integer representation of the original Tweet's author ID.
retweeted_status_timestamp: if the represented Tweet is a retweet, the date and time at which the original Tweet was posted.
expanded_urls: the full url link of the tweet.
rating_numerator: the numerator of the dog rating, as an integer (for example, 13 in 13/10).
rating_denominator: the denominator of the dog rating, as an integer (usually 10).
name: the name of the dog.
doggo: a big pupper, usually older.
floofer: label given to a dog that is excessively furry.
pupper: a small doggo, usually younger.
puppo: a transitional phase between pupper and doggo.
The image predictions (image_predictions.tsv):
tweet_id: unique identifier of a particular Tweet.
jpg_url: url link to the image associated with the given Tweet.
img_num: since a tweet can have multiple images, this indicates the number of the image corresponding to the most confident prediction.
p1: the algorithm's #1 prediction for the image in the tweet.
p1_conf: how confident the algorithm is in its #1 prediction.
p1_dog: whether or not the #1 prediction is a breed of dog.
p2: the algorithm's second most likely prediction.
p2_conf: how confident the algorithm is in its #2 prediction.
p2_dog: whether or not the #2 prediction is a breed of dog.
p3: the algorithm's third most likely prediction.
p3_conf: how confident the algorithm is in its #3 prediction.
p3_dog: whether or not the #3 prediction is a breed of dog.
The retweet and favorite counts gathered from the Twitter API (tweet_json.txt):
tweet_id: unique identifier of a particular Tweet.
retweet_count: the number of times a Tweet has been retweeted.
favorite_count: the number of times a Tweet has been favorited.
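The reply and retweet fields described above are only filled in for replies and retweets, so a missing value in both marks an original tweet. The following is a minimal sketch of that check, not part of the notebook's own pipeline. It assumes a made-up three-row DataFrame; only the column names come from the archive, and `toy`, `is_original` and `originals` are hypothetical names used for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; the real values come from the archive file.
toy = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "in_reply_to_status_id": [np.nan, 10.0, np.nan],
    "retweeted_status_id": [np.nan, np.nan, 20.0],
})

# A tweet counts as original when it is neither a reply nor a retweet.
toy["is_original"] = (
    toy["in_reply_to_status_id"].isnull() & toy["retweeted_status_id"].isnull()
)

# Keep only the IDs of the original tweets.
originals = toy.loc[toy["is_original"], "tweet_id"].tolist()
print(originals)
```

Here only the first toy row has both fields missing, so it is the only one kept.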
#installing tweepy into the environment
!pip install tweepy
Requirement already satisfied: tweepy in /opt/conda/lib/python3.6/site-packages (3.5.0)
Requirement already satisfied: requests>=2.4.3 in /opt/conda/lib/python3.6/site-packages (from tweepy) (2.18.4)
Requirement already satisfied: requests_oauthlib>=0.4.1 in /opt/conda/lib/python3.6/site-packages (from tweepy) (0.8.0)
Requirement already satisfied: six>=1.7.3 in /opt/conda/lib/python3.6/site-packages (from tweepy) (1.11.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests>=2.4.3->tweepy) (3.0.4)
Requirement already satisfied: idna<2.7,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests>=2.4.3->tweepy) (2.6)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests>=2.4.3->tweepy) (1.22)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests>=2.4.3->tweepy) (2019.11.28)
Requirement already satisfied: oauthlib>=0.6.2 in /opt/conda/lib/python3.6/site-packages (from requests_oauthlib>=0.4.1->tweepy) (2.0.6)
# importing all the packages that will be required.
import pandas as pd
import requests
import tweepy
import json
import numpy as np
import re
import functools
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# reading twitter-archive-enhanced.csv into a dataframe
tweet_archive = pd.read_csv('twitter-archive-enhanced.csv')
tweet_archive.head(3)
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892420643555336193 | NaN | NaN | 2017-08-01 16:23:56 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Phineas. He's a mystical boy. Only eve... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas | None | None | None | None |
1 | 892177421306343426 | NaN | NaN | 2017-08-01 00:17:27 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Tilly. She's just checking pup on you.... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892177421... | 13 | 10 | Tilly | None | None | None | None |
2 | 891815181378084864 | NaN | NaN | 2017-07-31 00:18:03 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Archie. He is a rare Norwegian Pouncin... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891815181... | 12 | 10 | Archie | None | None | None | None |
# programmatically downloading the image_predictions.tsv file
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(url)
# stop here if the download failed instead of saving an error page
response.raise_for_status()
# saving the contents to the computer
with open('image_predictions.tsv', mode='wb') as file:
    file.write(response.content)
image_df = pd.read_csv('image_predictions.tsv', sep = '\t')
image_df.head(3)
| | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | Welsh_springer_spaniel | 0.465074 | True | collie | 0.156665 | True | Shetland_sheepdog | 0.061428 | True |
1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | redbone | 0.506826 | True | miniature_pinscher | 0.074192 | True | Rhodesian_ridgeback | 0.072010 | True |
2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | German_shepherd | 0.596461 | True | malinois | 0.138584 | True | bloodhound | 0.116197 | True |
# Twitter API credentials (replace the placeholders with your own keys)
api_key = "YOUR API KEY HERE"
api_secrets = "YOUR API SECRET KEY HERE"
access_token = "YOUR ACCESS TOKEN KEY HERE"
access_secret = "YOUR ACCESS TOKEN SECRET HERE"
# Authenticate to Twitter
auth = tweepy.OAuthHandler(api_key,api_secrets)
auth.set_access_token(access_token,access_secret)
api = tweepy.API(auth,wait_on_rate_limit=True,wait_on_rate_limit_notify=True)
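The placeholders above have to be replaced with real keys before the cell runs. One way to avoid typing secrets into the notebook is to read them from environment variables. The following is a minimal sketch under that assumption: the four variable names and the `load_credentials` helper are hypothetical and not part of tweepy or of this notebook.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
REQUIRED = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)

def load_credentials(env=None):
    """Return the four credentials from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    # Collect every missing or empty name so the error reports all of them.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}

# Demonstration with placeholder values instead of real secrets.
demo = load_credentials({name: "placeholder" for name in REQUIRED})
print(sorted(demo))
```

With real values exported in the environment, `load_credentials()` could supply the four strings that are passed to `tweepy.OAuthHandler` and `set_access_token` above.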
# create a list of tweet ids (as strings) from the tweet_archive tweet_id column
tweet_ids = [str(tweet_id) for tweet_id in tweet_archive.tweet_id]
print(len(tweet_ids))
2356
# create empty list for available tweets
tweets = []
# create empty list for unavailable tweets
unavailable_tweets = []
# gather each tweet's json data by id
for tweet_id in tweet_ids:
    try:
        tweet = api.get_status(tweet_id)._json
        tweets.append({'tweet_id': tweet['id'],
                       'retweet_count': tweet['retweet_count'],
                       'favorite_count': tweet['favorite_count']})
    except Exception:
        # the tweet could not be retrieved, so keep its id for later inspection
        unavailable_tweets.append(tweet_id)
# write one tab-separated line per retrieved tweet
with open('tweet_json.txt', mode='w') as file:
    for record in tweets:
        file.write(json.dumps(record['tweet_id']))
        file.write('\t')
        file.write(json.dumps(record['retweet_count']))
        file.write('\t')
        file.write(json.dumps(record['favorite_count']))
        file.write('\n')
Rate limit reached. Sleeping for: 743
Rate limit reached. Sleeping for: 743
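The cell above stores three fields per tweet as tab-separated text. An alternative layout, shown here only as a sketch and not as the notebook's method, keeps each tweet as one JSON object per line, which preserves the field names in the file itself. The two records and the file name `tweet_json_lines.txt` are made up for illustration.

```python
import json
import os
import tempfile

# Made-up records standing in for the dictionaries collected in `tweets`.
records = [
    {"tweet_id": 1, "retweet_count": 5, "favorite_count": 9},
    {"tweet_id": 2, "retweet_count": 0, "favorite_count": 3},
]

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tweet_json_lines.txt")

# Write one JSON object per line.
with open(path, mode="w") as file:
    for record in records:
        file.write(json.dumps(record) + "\n")

# Read the file back, parsing each line independently.
with open(path) as file:
    restored = [json.loads(line) for line in file]

print(restored == records)
```

Because every line is parsed on its own, a partially written file can still be read up to the last complete line.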
# check for available attributes of the json data retrieved from the api
list(tweet.keys())
['created_at', 'id', 'id_str', 'text', 'truncated', 'entities', 'extended_entities', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'possibly_sensitive_appealable', 'lang']
#reading the 'tweet_json.txt' file into a dataframe
tweet_counts = pd.read_csv('tweet_json.txt', sep ='\t', header = None, names = ['tweet_id', 'retweet_count','favorite_count'])
tweet_counts.head()
| | tweet_id | retweet_count | favorite_count |
---|---|---|---|
0 | 892420643555336193 | 6979 | 33728 |
1 | 892177421306343426 | 5280 | 29255 |
2 | 891815181378084864 | 3466 | 21987 |
3 | 891689557279858688 | 7198 | 36823 |
4 | 891327558926688256 | 7723 | 35207 |
tweet_counts.shape
(2326, 3)
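The shape above shows 2326 rows, while 2356 IDs were requested, which suggests that about 30 tweets could not be retrieved. A set difference is one way to see which IDs are missing. The sketch below uses made-up IDs; in the notebook the two inputs would presumably be `tweet_ids` and `tweet_counts.tweet_id`, and `requested_ids`, `retrieved_ids` and `missing_ids` are hypothetical names.

```python
# Made-up IDs for illustration only.
requested_ids = ["101", "102", "103", "104"]  # strings, like tweet_ids
retrieved_ids = [101, 103]                    # integers, like a tweet_id column

# Normalize both sides to strings before taking the difference.
missing_ids = sorted(set(requested_ids) - {str(i) for i in retrieved_ids})
print(missing_ids)
```

Converting both sides to the same type first matters here, because the requested IDs are strings while the IDs read back from the file are integers.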
tweet_archive
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892420643555336193 | NaN | NaN | 2017-08-01 16:23:56 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Phineas. He's a mystical boy. Only eve... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas | None | None | None | None |
1 | 892177421306343426 | NaN | NaN | 2017-08-01 00:17:27 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Tilly. She's just checking pup on you.... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892177421... | 13 | 10 | Tilly | None | None | None | None |
2 | 891815181378084864 | NaN | NaN | 2017-07-31 00:18:03 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Archie. He is a rare Norwegian Pouncin... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891815181... | 12 | 10 | Archie | None | None | None | None |
3 | 891689557279858688 | NaN | NaN | 2017-07-30 15:58:51 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Darla. She commenced a snooze mid meal... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891689557... | 13 | 10 | Darla | None | None | None | None |
4 | 891327558926688256 | NaN | NaN | 2017-07-29 16:00:24 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Franklin. He would like you to stop ca... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891327558... | 12 | 10 | Franklin | None | None | None | None |
5 | 891087950875897856 | NaN | NaN | 2017-07-29 00:08:17 +0000 | <a href="http://twitter.com/download/iphone" r... | Here we have a majestic great white breaching ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891087950... | 13 | 10 | None | None | None | None | None |
6 | 890971913173991426 | NaN | NaN | 2017-07-28 16:27:12 +0000 | <a href="http://twitter.com/download/iphone" r... | Meet Jax. He enjoys ice cream so much he gets ... | NaN | NaN | NaN | https://gofundme.com/ydvmve-surgery-for-jax,ht... | 13 | 10 | Jax | None | None | None | None |
7 | 890729181411237888 | NaN | NaN | 2017-07-28 00:22:40 +0000 | <a href="http://twitter.com/download/iphone" r... | When you watch your owner call another dog a g... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890729181... | 13 | 10 | None | None | None | None | None |
8 | 890609185150312448 | NaN | NaN | 2017-07-27 16:25:51 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Zoey. She doesn't want to be one of th... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890609185... | 13 | 10 | Zoey | None | None | None | None |
9 | 890240255349198849 | NaN | NaN | 2017-07-26 15:59:51 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Cassie. She is a college pup. Studying... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890240255... | 14 | 10 | Cassie | doggo | None | None | None |
10 | 890006608113172480 | NaN | NaN | 2017-07-26 00:31:25 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Koda. He is a South Australian decksha... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890006608... | 13 | 10 | Koda | None | None | None | None |
11 | 889880896479866881 | NaN | NaN | 2017-07-25 16:11:53 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Bruno. He is a service shark. Only get... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889880896... | 13 | 10 | Bruno | None | None | None | None |
12 | 889665388333682689 | NaN | NaN | 2017-07-25 01:55:32 +0000 | <a href="http://twitter.com/download/iphone" r... | Here's a puppo that seems to be on the fence a... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889665388... | 13 | 10 | None | None | None | None | puppo |
13 | 889638837579907072 | NaN | NaN | 2017-07-25 00:10:02 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Ted. He does his best. Sometimes that'... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889638837... | 12 | 10 | Ted | None | None | None | None |
14 | 889531135344209921 | NaN | NaN | 2017-07-24 17:02:04 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Stuart. He's sporting his favorite fan... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889531135... | 13 | 10 | Stuart | None | None | None | puppo |
15 | 889278841981685760 | NaN | NaN | 2017-07-24 00:19:32 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Oliver. You're witnessing one of his m... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889278841... | 13 | 10 | Oliver | None | None | None | None |
16 | 888917238123831296 | NaN | NaN | 2017-07-23 00:22:39 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Jim. He found a fren. Taught him how t... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888917238... | 12 | 10 | Jim | None | None | None | None |
17 | 888804989199671297 | NaN | NaN | 2017-07-22 16:56:37 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Zeke. He has a new stick. Very proud o... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888804989... | 13 | 10 | Zeke | None | None | None | None |
18 | 888554962724278272 | NaN | NaN | 2017-07-22 00:23:06 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Ralphus. He's powering up. Attempting ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888554962... | 13 | 10 | Ralphus | None | None | None | None |
19 | 888202515573088257 | NaN | NaN | 2017-07-21 01:02:36 +0000 | <a href="http://twitter.com/download/iphone" r... | RT @dog_rates: This is Canela. She attempted s... | 8.874740e+17 | 4.196984e+09 | 2017-07-19 00:47:34 +0000 | https://twitter.com/dog_rates/status/887473957... | 13 | 10 | Canela | None | None | None | None |
20 | 888078434458587136 | NaN | NaN | 2017-07-20 16:49:33 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Gerald. He was just told he didn't get... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888078434... | 12 | 10 | Gerald | None | None | None | None |
21 | 887705289381826560 | NaN | NaN | 2017-07-19 16:06:48 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Jeffrey. He has a monopoly on the pool... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887705289... | 13 | 10 | Jeffrey | None | None | None | None |
22 | 887517139158093824 | NaN | NaN | 2017-07-19 03:39:09 +0000 | <a href="http://twitter.com/download/iphone" r... | I've yet to rate a Venezuelan Hover Wiener. Th... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887517139... | 14 | 10 | such | None | None | None | None |
23 | 887473957103951883 | NaN | NaN | 2017-07-19 00:47:34 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Canela. She attempted some fancy porch... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887473957... | 13 | 10 | Canela | None | None | None | None |
24 | 887343217045368832 | NaN | NaN | 2017-07-18 16:08:03 +0000 | <a href="http://twitter.com/download/iphone" r... | You may not have known you needed to see this ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887343217... | 13 | 10 | None | None | None | None | None |
25 | 887101392804085760 | NaN | NaN | 2017-07-18 00:07:08 +0000 | <a href="http://twitter.com/download/iphone" r... | This... is a Jubilant Antarctic House Bear. We... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887101392... | 12 | 10 | None | None | None | None | None |
26 | 886983233522544640 | NaN | NaN | 2017-07-17 16:17:36 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Maya. She's very shy. Rarely leaves he... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886983233... | 13 | 10 | Maya | None | None | None | None |
27 | 886736880519319552 | NaN | NaN | 2017-07-16 23:58:41 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Mingus. He's a wonderful father to his... | NaN | NaN | NaN | https://www.gofundme.com/mingusneedsus,https:/... | 13 | 10 | Mingus | None | None | None | None |
28 | 886680336477933568 | NaN | NaN | 2017-07-16 20:14:00 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Derek. He's late for a dog meeting. 13... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886680336... | 13 | 10 | Derek | None | None | None | None |
29 | 886366144734445568 | NaN | NaN | 2017-07-15 23:25:31 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Roscoe. Another pupper fallen victim t... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886366144... | 12 | 10 | Roscoe | None | None | pupper | None |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2326 | 666411507551481857 | NaN | NaN | 2015-11-17 00:24:19 +0000 | <a href="http://twitter.com/download/iphone" r... | This is quite the dog. Gets really excited whe... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666411507... | 2 | 10 | quite | None | None | None | None |
2327 | 666407126856765440 | NaN | NaN | 2015-11-17 00:06:54 +0000 | <a href="http://twitter.com/download/iphone" r... | This is a southern Vesuvius bumblegruff. Can d... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666407126... | 7 | 10 | a | None | None | None | None |
2328 | 666396247373291520 | NaN | NaN | 2015-11-16 23:23:41 +0000 | <a href="http://twitter.com/download/iphone" r... | Oh goodness. A super rare northeast Qdoba kang... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666396247... | 9 | 10 | None | None | None | None | None |
2329 | 666373753744588802 | NaN | NaN | 2015-11-16 21:54:18 +0000 | <a href="http://twitter.com/download/iphone" r... | Those are sunglasses and a jean jacket. 11/10 ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666373753... | 11 | 10 | None | None | None | None | None |
2330 | 666362758909284353 | NaN | NaN | 2015-11-16 21:10:36 +0000 | <a href="http://twitter.com/download/iphone" r... | Unique dog here. Very small. Lives in containe... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666362758... | 6 | 10 | None | None | None | None | None |
2331 | 666353288456101888 | NaN | NaN | 2015-11-16 20:32:58 +0000 | <a href="http://twitter.com/download/iphone" r... | Here we have a mixed Asiago from the Galápagos... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666353288... | 8 | 10 | None | None | None | None | None |
2332 | 666345417576210432 | NaN | NaN | 2015-11-16 20:01:42 +0000 | <a href="http://twitter.com/download/iphone" r... | Look at this jokester thinking seat belt laws ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666345417... | 10 | 10 | None | None | None | None | None |
2333 | 666337882303524864 | NaN | NaN | 2015-11-16 19:31:45 +0000 | <a href="http://twitter.com/download/iphone" r... | This is an extremely rare horned Parthenon. No... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666337882... | 9 | 10 | an | None | None | None | None |
2334 | 666293911632134144 | NaN | NaN | 2015-11-16 16:37:02 +0000 | <a href="http://twitter.com/download/iphone" r... | This is a funny dog. Weird toes. Won't come do... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666293911... | 3 | 10 | a | None | None | None | None |
2335 | 666287406224695296 | NaN | NaN | 2015-11-16 16:11:11 +0000 | <a href="http://twitter.com/download/iphone" r... | This is an Albanian 3 1/2 legged Episcopalian... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666287406... | 1 | 2 | an | None | None | None | None |
2336 | 666273097616637952 | NaN | NaN | 2015-11-16 15:14:19 +0000 | <a href="http://twitter.com/download/iphone" r... | Can take selfies 11/10 https://t.co/ws2AMaNwPW | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666273097... | 11 | 10 | None | None | None | None | None |
2337 | 666268910803644416 | NaN | NaN | 2015-11-16 14:57:41 +0000 | <a href="http://twitter.com/download/iphone" r... | Very concerned about fellow dog trapped in com... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666268910... | 10 | 10 | None | None | None | None | None |
2338 | 666104133288665088 | NaN | NaN | 2015-11-16 04:02:55 +0000 | <a href="http://twitter.com/download/iphone" r... | Not familiar with this breed. No tail (weird).... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666104133... | 1 | 10 | None | None | None | None | None |
2339 | 666102155909144576 | NaN | NaN | 2015-11-16 03:55:04 +0000 | <a href="http://twitter.com/download/iphone" r... | Oh my. Here you are seeing an Adobe Setter giv... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666102155... | 11 | 10 | None | None | None | None | None |
2340 | 666099513787052032 | NaN | NaN | 2015-11-16 03:44:34 +0000 | <a href="http://twitter.com/download/iphone" r... | Can stand on stump for what seems like a while... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666099513... | 8 | 10 | None | None | None | None | None |
2341 | 666094000022159362 | NaN | NaN | 2015-11-16 03:22:39 +0000 | <a href="http://twitter.com/download/iphone" r... | This appears to be a Mongolian Presbyterian mi... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666094000... | 9 | 10 | None | None | None | None | None |
2342 | 666082916733198337 | NaN | NaN | 2015-11-16 02:38:37 +0000 | <a href="http://twitter.com/download/iphone" r... | Here we have a well-established sunblockerspan... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666082916... | 6 | 10 | None | None | None | None | None |
2343 | 666073100786774016 | NaN | NaN | 2015-11-16 01:59:36 +0000 | <a href="http://twitter.com/download/iphone" r... | Let's hope this flight isn't Malaysian (lol). ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666073100... | 10 | 10 | None | None | None | None | None |
2344 | 666071193221509120 | NaN | NaN | 2015-11-16 01:52:02 +0000 | <a href="http://twitter.com/download/iphone" r... | Here we have a northern speckled Rhododendron.... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666071193... | 9 | 10 | None | None | None | None | None |
2345 | 666063827256086533 | NaN | NaN | 2015-11-16 01:22:45 +0000 | <a href="http://twitter.com/download/iphone" r... | This is the happiest dog you will ever see. Ve... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666063827... | 10 | 10 | the | None | None | None | None |
2346 | 666058600524156928 | NaN | NaN | 2015-11-16 01:01:59 +0000 | <a href="http://twitter.com/download/iphone" r... | Here is the Rand Paul of retrievers folks! He'... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666058600... | 8 | 10 | the | None | None | None | None |
2347 | 666057090499244032 | NaN | NaN | 2015-11-16 00:55:59 +0000 | <a href="http://twitter.com/download/iphone" r... | My oh my. This is a rare blond Canadian terrie... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666057090... | 9 | 10 | a | None | None | None | None |
2348 | 666055525042405380 | NaN | NaN | 2015-11-16 00:49:46 +0000 | <a href="http://twitter.com/download/iphone" r... | Here is a Siberian heavily armored polar bear ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666055525... | 10 | 10 | a | None | None | None | None |
2349 | 666051853826850816 | NaN | NaN | 2015-11-16 00:35:11 +0000 | <a href="http://twitter.com/download/iphone" r... | This is an odd dog. Hard on the outside but lo... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666051853... | 2 | 10 | an | None | None | None | None |
2350 | 666050758794694657 | NaN | NaN | 2015-11-16 00:30:50 +0000 | <a href="http://twitter.com/download/iphone" r... | This is a truly beautiful English Wilson Staff... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666050758... | 10 | 10 | a | None | None | None | None |
2351 | 666049248165822465 | NaN | NaN | 2015-11-16 00:24:50 +0000 | <a href="http://twitter.com/download/iphone" r... | Here we have a 1949 1st generation vulpix. Enj... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666049248... | 5 | 10 | None | None | None | None | None |
2352 | 666044226329800704 | NaN | NaN | 2015-11-16 00:04:52 +0000 | <a href="http://twitter.com/download/iphone" r... | This is a purebred Piers Morgan. Loves to Netf... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666044226... | 6 | 10 | a | None | None | None | None |
2353 | 666033412701032449 | NaN | NaN | 2015-11-15 23:21:54 +0000 | <a href="http://twitter.com/download/iphone" r... | Here is a very happy pup. Big fan of well-main... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666033412... | 9 | 10 | a | None | None | None | None |
2354 | 666029285002620928 | NaN | NaN | 2015-11-15 23:05:30 +0000 | <a href="http://twitter.com/download/iphone" r... | This is a western brown Mitsubishi terrier. Up... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666029285... | 7 | 10 | a | None | None | None | None |
2355 | 666020888022790149 | NaN | NaN | 2015-11-15 22:32:08 +0000 | <a href="http://twitter.com/download/iphone" r... | Here we have a Japanese Irish Setter. Lost eye... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666020888... | 8 | 10 | None | None | None | None | None |
2356 rows × 17 columns
tweet_archive.sample(10)
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1062 | 741099773336379392 | NaN | NaN | 2016-06-10 02:48:49 +0000 | <a href="http://vine.co" rel="nofollow">Vine -... | This is Ted. He's given up. 11/10 relatable af... | NaN | NaN | NaN | https://vine.co/v/ixHYvdxUx1L | 11 | 10 | Ted | None | None | None | None |
2347 | 666057090499244032 | NaN | NaN | 2015-11-16 00:55:59 +0000 | <a href="http://twitter.com/download/iphone" r... | My oh my. This is a rare blond Canadian terrie... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666057090... | 9 | 10 | a | None | None | None | None |
555 | 803692223237865472 | NaN | NaN | 2016-11-29 20:08:52 +0000 | <a href="http://twitter.com/download/iphone" r... | RT @dog_rates: I present to you... Dog Jesus. ... | 6.914169e+17 | 4.196984e+09 | 2016-01-25 00:26:41 +0000 | https://twitter.com/dog_rates/status/691416866... | 13 | 10 | None | None | None | None | None |
1305 | 707387676719185920 | NaN | NaN | 2016-03-09 02:08:59 +0000 | <a href="http://twitter.com/download/iphone" r... | Meet Clarkus. He's a Skinny Eastern Worcesters... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/707387676... | 10 | 10 | Clarkus | None | None | None | None |
1060 | 741438259667034112 | NaN | NaN | 2016-06-11 01:13:51 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Tucker. He's still figuring out couche... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/741438259... | 9 | 10 | Tucker | None | None | None | None |
149 | 863079547188785154 | 6.671522e+17 | 4.196984e+09 | 2017-05-12 17:12:53 +0000 | <a href="http://twitter.com/download/iphone" r... | Ladies and gentlemen... I found Pipsy. He may ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/863079547... | 14 | 10 | None | None | None | None | None |
1073 | 739932936087216128 | NaN | NaN | 2016-06-06 21:32:13 +0000 | <a href="http://twitter.com/download/iphone" r... | Say hello to Rorie. She's zen af. Just enjoyin... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/739932936... | 10 | 10 | Rorie | None | None | None | None |
1828 | 676263575653122048 | NaN | NaN | 2015-12-14 04:52:55 +0000 | <a href="http://twitter.com/download/iphone" r... | All this pupper wanted to do was go skiing. No... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/676263575... | 10 | 10 | None | None | None | pupper | None |
1938 | 673906403526995968 | NaN | NaN | 2015-12-07 16:46:21 +0000 | <a href="http://twitter.com/download/iphone" r... | Guys I'm getting real tired of this. We only r... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/673906403... | 3 | 10 | None | None | None | None | None |
1054 | 742423170473463808 | NaN | NaN | 2016-06-13 18:27:32 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Bell. She likes holding hands. 12/10 w... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/742423170... | 12 | 10 | Bell | None | None | None | None |
tweet_archive.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), object(10)
memory usage: 313.0+ KB
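The summary above lists `in_reply_to_status_id` and `retweeted_status_id` as float64, which is how pandas stores numeric columns that contain missing values. Tweet IDs of this size do not fit exactly in a 64-bit float, so the stored values may differ slightly from the real IDs. The sketch below illustrates the effect with the first `tweet_id` shown in the archive; `as_float` and `round_tripped` are names introduced only for this example.

```python
tweet_id = 892420643555336193  # first tweet_id shown in the archive

# Round-trip the ID through a 64-bit float, as happens when an ID column
# with missing values is read as float64.
as_float = float(tweet_id)
round_tripped = int(as_float)

# The ID is odd and larger than 2**53, so it cannot survive the round trip.
print(round_tripped == tweet_id)  # prints False
print(abs(round_tripped - tweet_id))
```

One possible remedy, offered as a suggestion rather than the notebook's approach, is to read those ID columns as strings through the `dtype` argument of `pd.read_csv`.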
tweet_archive.isnull()
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
1 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
3 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
4 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
5 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
6 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
7 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
8 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
9 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
10 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
11 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
12 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
13 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
14 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
15 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
16 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
17 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
18 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
19 | False | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
20 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
21 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
22 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
23 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
24 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
25 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
26 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
27 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
28 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
29 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2326 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2327 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2328 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2329 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2330 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2331 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2332 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2333 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2334 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2335 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2336 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2337 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2338 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2339 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2340 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2341 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2342 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2343 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2344 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2345 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2346 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2347 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2348 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2349 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2350 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2351 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2352 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2353 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2354 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2355 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | False | False | False |
2356 rows × 17 columns
tweet_archive.name
0 Phineas 1 Tilly 2 Archie 3 Darla 4 Franklin 5 None 6 Jax 7 None 8 Zoey 9 Cassie 10 Koda 11 Bruno 12 None 13 Ted 14 Stuart 15 Oliver 16 Jim 17 Zeke 18 Ralphus 19 Canela 20 Gerald 21 Jeffrey 22 such 23 Canela 24 None 25 None 26 Maya 27 Mingus 28 Derek 29 Roscoe ... 2326 quite 2327 a 2328 None 2329 None 2330 None 2331 None 2332 None 2333 an 2334 a 2335 an 2336 None 2337 None 2338 None 2339 None 2340 None 2341 None 2342 None 2343 None 2344 None 2345 the 2346 the 2347 a 2348 a 2349 an 2350 a 2351 None 2352 a 2353 a 2354 a 2355 None Name: name, Length: 2356, dtype: object
tweet_archive.name.value_counts()
None 745 a 55 Charlie 12 Lucy 11 Oliver 11 Cooper 11 Penny 10 Tucker 10 Lola 10 Winston 9 Bo 9 the 8 Sadie 8 an 7 Bailey 7 Buddy 7 Daisy 7 Toby 7 Scout 6 Dave 6 Jack 6 Oscar 6 Bella 6 Koda 6 Jax 6 Rusty 6 Milo 6 Leo 6 Stanley 6 Chester 5 ... Thor 1 Rueben 1 by 1 Jeremy 1 Bobble 1 Liam 1 Stella 1 General 1 Cheryl 1 Lilli 1 Travis 1 Berkeley 1 Banditt 1 Ralphie 1 Dixie 1 Grizzwald 1 Strudel 1 Orion 1 Kona 1 Lupe 1 Ace 1 Zuzu 1 Chloe 1 Remy 1 Rufio 1 Vinscent 1 Sprinkles 1 Eevee 1 Pancake 1 Mosby 1 Name: name, Length: 957, dtype: int64
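# collect 'name' values that start with a lowercase letter; these are ordinary words, not dog names
words = []
for n in tweet_archive.name:
    if n[0].islower():
        words.append(n)
other_words = list(np.unique(words))
print(other_words)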
['a', 'actually', 'all', 'an', 'by', 'getting', 'his', 'incredibly', 'infuriating', 'just', 'life', 'light', 'mad', 'my', 'not', 'officially', 'old', 'one', 'quite', 'space', 'such', 'the', 'this', 'unacceptable', 'very']
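The same check can be written without an explicit loop. The following is a minimal sketch on a toy Series (the values are made up for illustration, and `names` and `other_words_demo` are hypothetical names, not part of the notebook). It assumes that "starts with a lowercase ASCII letter" is an acceptable stand-in for `str.islower()` on the first character.

```python
import pandas as pd

# Toy stand-in for tweet_archive.name (illustrative values only)
names = pd.Series(['Phineas', 'None', 'a', 'such', 'Tilly', 'a', 'the'])

# str.match anchors at the start of each string, so this flags values
# whose first character is a lowercase ASCII letter
starts_lower = names.str.match(r'[a-z]')

# unique, sorted list of the flagged words
other_words_demo = sorted(names[starts_lower].unique())
print(other_words_demo)
```

Note that 'None' starts with an uppercase letter, so it is not flagged here and has to be handled separately.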
tweet_archive.pupper.value_counts()
None 2099 pupper 257 Name: pupper, dtype: int64
tweet_archive.doggo.value_counts()
None 2259 doggo 97 Name: doggo, dtype: int64
sum(tweet_archive.duplicated())
0
image_df
tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | Welsh_springer_spaniel | 0.465074 | True | collie | 0.156665 | True | Shetland_sheepdog | 0.061428 | True |
1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | redbone | 0.506826 | True | miniature_pinscher | 0.074192 | True | Rhodesian_ridgeback | 0.072010 | True |
2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | German_shepherd | 0.596461 | True | malinois | 0.138584 | True | bloodhound | 0.116197 | True |
3 | 666044226329800704 | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | 1 | Rhodesian_ridgeback | 0.408143 | True | redbone | 0.360687 | True | miniature_pinscher | 0.222752 | True |
4 | 666049248165822465 | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | 1 | miniature_pinscher | 0.560311 | True | Rottweiler | 0.243682 | True | Doberman | 0.154629 | True |
5 | 666050758794694657 | https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg | 1 | Bernese_mountain_dog | 0.651137 | True | English_springer | 0.263788 | True | Greater_Swiss_Mountain_dog | 0.016199 | True |
6 | 666051853826850816 | https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg | 1 | box_turtle | 0.933012 | False | mud_turtle | 0.045885 | False | terrapin | 0.017885 | False |
7 | 666055525042405380 | https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg | 1 | chow | 0.692517 | True | Tibetan_mastiff | 0.058279 | True | fur_coat | 0.054449 | False |
8 | 666057090499244032 | https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg | 1 | shopping_cart | 0.962465 | False | shopping_basket | 0.014594 | False | golden_retriever | 0.007959 | True |
9 | 666058600524156928 | https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg | 1 | miniature_poodle | 0.201493 | True | komondor | 0.192305 | True | soft-coated_wheaten_terrier | 0.082086 | True |
10 | 666063827256086533 | https://pbs.twimg.com/media/CT5Vg_wXIAAXfnj.jpg | 1 | golden_retriever | 0.775930 | True | Tibetan_mastiff | 0.093718 | True | Labrador_retriever | 0.072427 | True |
11 | 666071193221509120 | https://pbs.twimg.com/media/CT5cN_3WEAAlOoZ.jpg | 1 | Gordon_setter | 0.503672 | True | Yorkshire_terrier | 0.174201 | True | Pekinese | 0.109454 | True |
12 | 666073100786774016 | https://pbs.twimg.com/media/CT5d9DZXAAALcwe.jpg | 1 | Walker_hound | 0.260857 | True | English_foxhound | 0.175382 | True | Ibizan_hound | 0.097471 | True |
13 | 666082916733198337 | https://pbs.twimg.com/media/CT5m4VGWEAAtKc8.jpg | 1 | pug | 0.489814 | True | bull_mastiff | 0.404722 | True | French_bulldog | 0.048960 | True |
14 | 666094000022159362 | https://pbs.twimg.com/media/CT5w9gUW4AAsBNN.jpg | 1 | bloodhound | 0.195217 | True | German_shepherd | 0.078260 | True | malinois | 0.075628 | True |
15 | 666099513787052032 | https://pbs.twimg.com/media/CT51-JJUEAA6hV8.jpg | 1 | Lhasa | 0.582330 | True | Shih-Tzu | 0.166192 | True | Dandie_Dinmont | 0.089688 | True |
16 | 666102155909144576 | https://pbs.twimg.com/media/CT54YGiWUAEZnoK.jpg | 1 | English_setter | 0.298617 | True | Newfoundland | 0.149842 | True | borzoi | 0.133649 | True |
17 | 666104133288665088 | https://pbs.twimg.com/media/CT56LSZWoAAlJj2.jpg | 1 | hen | 0.965932 | False | cock | 0.033919 | False | partridge | 0.000052 | False |
18 | 666268910803644416 | https://pbs.twimg.com/media/CT8QCd1WEAADXws.jpg | 1 | desktop_computer | 0.086502 | False | desk | 0.085547 | False | bookcase | 0.079480 | False |
19 | 666273097616637952 | https://pbs.twimg.com/media/CT8T1mtUwAA3aqm.jpg | 1 | Italian_greyhound | 0.176053 | True | toy_terrier | 0.111884 | True | basenji | 0.111152 | True |
20 | 666287406224695296 | https://pbs.twimg.com/media/CT8g3BpUEAAuFjg.jpg | 1 | Maltese_dog | 0.857531 | True | toy_poodle | 0.063064 | True | miniature_poodle | 0.025581 | True |
21 | 666293911632134144 | https://pbs.twimg.com/media/CT8mx7KW4AEQu8N.jpg | 1 | three-toed_sloth | 0.914671 | False | otter | 0.015250 | False | great_grey_owl | 0.013207 | False |
22 | 666337882303524864 | https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg | 1 | ox | 0.416669 | False | Newfoundland | 0.278407 | True | groenendael | 0.102643 | True |
23 | 666345417576210432 | https://pbs.twimg.com/media/CT9Vn7PWoAA_ZCM.jpg | 1 | golden_retriever | 0.858744 | True | Chesapeake_Bay_retriever | 0.054787 | True | Labrador_retriever | 0.014241 | True |
24 | 666353288456101888 | https://pbs.twimg.com/media/CT9cx0tUEAAhNN_.jpg | 1 | malamute | 0.336874 | True | Siberian_husky | 0.147655 | True | Eskimo_dog | 0.093412 | True |
25 | 666362758909284353 | https://pbs.twimg.com/media/CT9lXGsUcAAyUFt.jpg | 1 | guinea_pig | 0.996496 | False | skunk | 0.002402 | False | hamster | 0.000461 | False |
26 | 666373753744588802 | https://pbs.twimg.com/media/CT9vZEYWUAAlZ05.jpg | 1 | soft-coated_wheaten_terrier | 0.326467 | True | Afghan_hound | 0.259551 | True | briard | 0.206803 | True |
27 | 666396247373291520 | https://pbs.twimg.com/media/CT-D2ZHWIAA3gK1.jpg | 1 | Chihuahua | 0.978108 | True | toy_terrier | 0.009397 | True | papillon | 0.004577 | True |
28 | 666407126856765440 | https://pbs.twimg.com/media/CT-NvwmW4AAugGZ.jpg | 1 | black-and-tan_coonhound | 0.529139 | True | bloodhound | 0.244220 | True | flat-coated_retriever | 0.173810 | True |
29 | 666411507551481857 | https://pbs.twimg.com/media/CT-RugiWIAELEaq.jpg | 1 | coho | 0.404640 | False | barracouta | 0.271485 | False | gar | 0.189945 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2045 | 886366144734445568 | https://pbs.twimg.com/media/DE0BTnQUwAApKEH.jpg | 1 | French_bulldog | 0.999201 | True | Chihuahua | 0.000361 | True | Boston_bull | 0.000076 | True |
2046 | 886680336477933568 | https://pbs.twimg.com/media/DE4fEDzWAAAyHMM.jpg | 1 | convertible | 0.738995 | False | sports_car | 0.139952 | False | car_wheel | 0.044173 | False |
2047 | 886736880519319552 | https://pbs.twimg.com/media/DE5Se8FXcAAJFx4.jpg | 1 | kuvasz | 0.309706 | True | Great_Pyrenees | 0.186136 | True | Dandie_Dinmont | 0.086346 | True |
2048 | 886983233522544640 | https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg | 2 | Chihuahua | 0.793469 | True | toy_terrier | 0.143528 | True | can_opener | 0.032253 | False |
2049 | 887101392804085760 | https://pbs.twimg.com/media/DE-eAq6UwAA-jaE.jpg | 1 | Samoyed | 0.733942 | True | Eskimo_dog | 0.035029 | True | Staffordshire_bullterrier | 0.029705 | True |
2050 | 887343217045368832 | https://pbs.twimg.com/ext_tw_video_thumb/88734... | 1 | Mexican_hairless | 0.330741 | True | sea_lion | 0.275645 | False | Weimaraner | 0.134203 | True |
2051 | 887473957103951883 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | 2 | Pembroke | 0.809197 | True | Rhodesian_ridgeback | 0.054950 | True | beagle | 0.038915 | True |
2052 | 887517139158093824 | https://pbs.twimg.com/ext_tw_video_thumb/88751... | 1 | limousine | 0.130432 | False | tow_truck | 0.029175 | False | shopping_cart | 0.026321 | False |
2053 | 887705289381826560 | https://pbs.twimg.com/media/DFHDQBbXgAEqY7t.jpg | 1 | basset | 0.821664 | True | redbone | 0.087582 | True | Weimaraner | 0.026236 | True |
2054 | 888078434458587136 | https://pbs.twimg.com/media/DFMWn56WsAAkA7B.jpg | 1 | French_bulldog | 0.995026 | True | pug | 0.000932 | True | bull_mastiff | 0.000903 | True |
2055 | 888202515573088257 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | 2 | Pembroke | 0.809197 | True | Rhodesian_ridgeback | 0.054950 | True | beagle | 0.038915 | True |
2056 | 888554962724278272 | https://pbs.twimg.com/media/DFTH_O-UQAACu20.jpg | 3 | Siberian_husky | 0.700377 | True | Eskimo_dog | 0.166511 | True | malamute | 0.111411 | True |
2057 | 888804989199671297 | https://pbs.twimg.com/media/DFWra-3VYAA2piG.jpg | 1 | golden_retriever | 0.469760 | True | Labrador_retriever | 0.184172 | True | English_setter | 0.073482 | True |
2058 | 888917238123831296 | https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg | 1 | golden_retriever | 0.714719 | True | Tibetan_mastiff | 0.120184 | True | Labrador_retriever | 0.105506 | True |
2059 | 889278841981685760 | https://pbs.twimg.com/ext_tw_video_thumb/88927... | 1 | whippet | 0.626152 | True | borzoi | 0.194742 | True | Saluki | 0.027351 | True |
2060 | 889531135344209921 | https://pbs.twimg.com/media/DFg_2PVW0AEHN3p.jpg | 1 | golden_retriever | 0.953442 | True | Labrador_retriever | 0.013834 | True | redbone | 0.007958 | True |
2061 | 889638837579907072 | https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg | 1 | French_bulldog | 0.991650 | True | boxer | 0.002129 | True | Staffordshire_bullterrier | 0.001498 | True |
2062 | 889665388333682689 | https://pbs.twimg.com/media/DFi579UWsAAatzw.jpg | 1 | Pembroke | 0.966327 | True | Cardigan | 0.027356 | True | basenji | 0.004633 | True |
2063 | 889880896479866881 | https://pbs.twimg.com/media/DFl99B1WsAITKsg.jpg | 1 | French_bulldog | 0.377417 | True | Labrador_retriever | 0.151317 | True | muzzle | 0.082981 | False |
2064 | 890006608113172480 | https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg | 1 | Samoyed | 0.957979 | True | Pomeranian | 0.013884 | True | chow | 0.008167 | True |
2065 | 890240255349198849 | https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg | 1 | Pembroke | 0.511319 | True | Cardigan | 0.451038 | True | Chihuahua | 0.029248 | True |
2066 | 890609185150312448 | https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg | 1 | Irish_terrier | 0.487574 | True | Irish_setter | 0.193054 | True | Chesapeake_Bay_retriever | 0.118184 | True |
2067 | 890729181411237888 | https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg | 2 | Pomeranian | 0.566142 | True | Eskimo_dog | 0.178406 | True | Pembroke | 0.076507 | True |
2068 | 890971913173991426 | https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg | 1 | Appenzeller | 0.341703 | True | Border_collie | 0.199287 | True | ice_lolly | 0.193548 | False |
2069 | 891087950875897856 | https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg | 1 | Chesapeake_Bay_retriever | 0.425595 | True | Irish_terrier | 0.116317 | True | Indian_elephant | 0.076902 | False |
2070 | 891327558926688256 | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | 2 | basset | 0.555712 | True | English_springer | 0.225770 | True | German_short-haired_pointer | 0.175219 | True |
2071 | 891689557279858688 | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | 1 | paper_towel | 0.170278 | False | Labrador_retriever | 0.168086 | True | spatula | 0.040836 | False |
2072 | 891815181378084864 | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | 1 | Chihuahua | 0.716012 | True | malamute | 0.078253 | True | kelpie | 0.031379 | True |
2073 | 892177421306343426 | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | 1 | Chihuahua | 0.323581 | True | Pekinese | 0.090647 | True | papillon | 0.068957 | True |
2074 | 892420643555336193 | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | 1 | orange | 0.097049 | False | bagel | 0.085851 | False | banana | 0.076110 | False |
2075 rows × 12 columns
image_df.sample(5)
tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1881 | 847116187444137987 | https://pbs.twimg.com/media/C8GPrNDW4AAkLde.jpg | 1 | white_wolf | 0.128935 | False | American_Staffordshire_terrier | 0.113434 | True | dingo | 0.081231 | False |
1837 | 837366284874571778 | https://pbs.twimg.com/media/C57sMJwXMAASBSx.jpg | 1 | American_Staffordshire_terrier | 0.660085 | True | Staffordshire_bullterrier | 0.334947 | True | dalmatian | 0.002697 | True |
995 | 708149363256774660 | https://pbs.twimg.com/media/CdPaEkHW8AA-Wom.jpg | 1 | Cardigan | 0.350993 | True | basset | 0.164555 | True | toy_terrier | 0.080484 | True |
481 | 675362609739206656 | https://pbs.twimg.com/media/CV9etctWUAAl5Hp.jpg | 1 | Labrador_retriever | 0.479008 | True | ice_bear | 0.218289 | False | kuvasz | 0.139911 | True |
1937 | 860276583193509888 | https://pbs.twimg.com/media/C_BQ_NlVwAAgYGD.jpg | 1 | lakeside | 0.312299 | False | dock | 0.159842 | False | canoe | 0.070795 | False |
sum(image_df.duplicated())
0
image_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2075 entries, 0 to 2074 Data columns (total 12 columns): tweet_id 2075 non-null int64 jpg_url 2075 non-null object img_num 2075 non-null int64 p1 2075 non-null object p1_conf 2075 non-null float64 p1_dog 2075 non-null bool p2 2075 non-null object p2_conf 2075 non-null float64 p2_dog 2075 non-null bool p3 2075 non-null object p3_conf 2075 non-null float64 p3_dog 2075 non-null bool dtypes: bool(3), float64(3), int64(2), object(4) memory usage: 152.1+ KB
#checking for non-original tweets' tweet ids in the image_df dataframe.
img_list = list(image_df.tweet_id)
#takes approximately 30 mins
#create empty list for unoriginal tweets (replies and retweets)
unoriginal_tweets = []
#create empty list for unavailable tweets
unavailable_tweets = []
#create empty list for original tweets
original_tweets = []
#gather each tweet's json data by id
for tweet_id in img_list:
    try:
        img_tweet = api.get_status(tweet_id)._json
    except Exception:
        #the tweet could not be retrieved
        unavailable_tweets.append(tweet_id)
        continue
    #a tweet is unoriginal if it is a reply or a retweet
    if pd.notna(img_tweet.get('in_reply_to_status_id')) or 'retweeted_status' in img_tweet:
        unoriginal_tweets.append(tweet_id)
    else:
        original_tweets.append(tweet_id)
Rate limit reached. Sleeping for: 741 Rate limit reached. Sleeping for: 742
print(unoriginal_tweets)
[667550882905632768, 667550904950915073, 669353438988365824, 671729906628341761, 674754018082705410, 674793399141146624, 674999807681908736, 675349384339542016, 675707330206547968, 675870721063669760, 684225744407494656, 684538444857667585, 692142790915014657, 694356675654983680, 695767669421768709, 703425003149250560, 704871453724954624, 705786532653883392, 711998809858043904, 729838605770891264, 746818907684614144, 746906459439529985, 752309394570878976, 754874841593970688, 757597904299253760, 757729163776290825, 759159934323924993, 761371037149827077, 761750502866649088, 766078092750233600, 770093767776997377, 771171053431250945, 772615324260794368, 775898661951791106, 776819012571455488, 777641927919427584, 778396591732486144, 780476555013349377, 780496263422808064, 782021823840026624, 783347506784731136, 786036967502913536, 788070120937619456, 790723298204217344, 791026214425268224, 793614319594401792, 794355576146903043, 794983741416415232, 796177847564038144, 798340744599797760, 798628517273620480, 798644042770751489, 798665375516884993, 798673117451325440, 798694562394996736, 798697898615730177, 799774291445383169, 800443802682937345, 802265048156610565, 802624713319034886, 803692223237865472, 804413760345620481, 805958939288408065, 806242860592926720, 807059379405148160, 808134635716833280, 809808892968534016, 813944609378369540, 816014286006976512, 816829038950027264, 817181837579653120, 818588835076603904, 819015331746349057, 819015337530290176, 820446719150292993, 821813639212650496, 822647212903690241, 823269594223824897, 824796380199809024, 829878982036299777, 832040443403784192, 832215726631055365, 832769181346996225, 838916489579200512, 839290600511926273, 841833993020538882, 844979544864018432, 847971574464610304, 856526610513747968, 860924035999428608, 863079547188785154, 867072653475098625, 877611172832227328, 885311592912609280]
print(unavailable_tweets)
[680055455951884288, 754011816964026368, 759566828574212096, 759923798737051648, 771004394259247104, 779123168116150273, 802247111496568832, 829374341691346946, 837012587749474308, 837366284874571778, 842892208864923648, 844704788403113984, 851861385021730816, 851953902622658560, 861769973181624320, 872261713294495745, 873697596434513921, 888202515573088257]
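The rule used to split the ids can be isolated in a small helper and checked without calling the API. This is only a sketch: `is_original` is a hypothetical helper, and the three dictionaries are hand-written stand-ins that contain just the two keys the rule looks at, not real API responses.

```python
def is_original(status_json):
    """Return True when the tweet is neither a reply nor a retweet."""
    is_reply = status_json.get('in_reply_to_status_id') is not None
    is_retweet = 'retweeted_status' in status_json
    return not (is_reply or is_retweet)

# Hand-written stand-ins for tweet JSON (illustrative only)
original = {'id': 1, 'in_reply_to_status_id': None}
reply = {'id': 2, 'in_reply_to_status_id': 12345}
retweet = {'id': 3, 'in_reply_to_status_id': None, 'retweeted_status': {'id': 1}}

flags = [is_original(s) for s in (original, reply, retweet)]
print(flags)
```

Keeping the rule in one function makes it easier to verify that replies and retweets are both treated as non-original.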
tweet_counts.head()
tweet_id | retweet_count | favorite_count | |
---|---|---|---|
0 | 892420643555336193 | 6979 | 33728 |
1 | 892177421306343426 | 5280 | 29255 |
2 | 891815181378084864 | 3466 | 21987 |
3 | 891689557279858688 | 7198 | 36823 |
4 | 891327558926688256 | 7723 | 35207 |
tweet_counts.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2326 entries, 0 to 2325 Data columns (total 3 columns): tweet_id 2326 non-null int64 retweet_count 2326 non-null int64 favorite_count 2326 non-null int64 dtypes: int64(3) memory usage: 54.6 KB
tweet_counts.describe()
tweet_id | retweet_count | favorite_count | |
---|---|---|---|
count | 2.326000e+03 | 2326.000000 | 2326.000000 |
mean | 7.417480e+17 | 2457.867154 | 7018.926913 |
std | 6.818802e+16 | 4166.197661 | 10908.789726 |
min | 6.660209e+17 | 1.000000 | 0.000000 |
25% | 6.780814e+17 | 492.250000 | 1219.500000 |
50% | 7.178159e+17 | 1145.500000 | 3040.500000 |
75% | 7.986402e+17 | 2843.500000 | 8562.750000 |
max | 8.924206e+17 | 70429.000000 | 144312.000000 |
sum(tweet_counts.duplicated())
0
all_columns = pd.Series(list(tweet_archive) + list(tweet_counts) + list(image_df))
all_columns[all_columns.duplicated()]
17 tweet_id 20 tweet_id dtype: object
tweet_archive table
'name', 'doggo', 'floofer', 'pupper' and 'puppo' columns have missing values misrepresented as strings ('None')
'name' column has non-valid words (adjectives, articles, adverbs) as values in some entries
'name', 'doggo', 'floofer', 'pupper' and 'puppo' columns have missing values.
presence of some records with non-null values in the 'retweeted_status_id' column; these are not original tweets
presence of some records with non-null values in the 'in_reply_to_status_id' column; these are not original tweets
records dating beyond 2017-08-01
'timestamp' column is an object datatype
image_df table
doggo, pupper, puppo and floofer are represented as different column headers instead of individual values of one column (tweet_archive table)
dog breeds are spread out in separate columns (image_df table)
data is in separate tables.
# Make copies of original pieces of data
tweetarch_clean = tweet_archive.copy()
img_clean = image_df.copy()
tweetcounts_clean = tweet_counts.copy()
Use the .replace() function to replace the values flagged in the assessment with null values. This fixes both issue 1 and issue 2.
other_words.append('None')
for word in other_words:
    tweetarch_clean.replace(to_replace = word , value = np.nan, inplace = True)
tweetarch_clean.isnull().sample(10)
tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
623 | False | True | True | False | False | False | True | True | True | False | False | False | False | True | True | True | True |
889 | False | True | True | False | False | False | True | True | True | False | False | False | False | False | True | False | True |
976 | False | True | True | False | False | False | True | True | True | False | False | False | False | True | True | True | True |
113 | False | False | False | False | False | False | True | True | True | True | False | False | True | True | True | True | True |
639 | False | True | True | False | False | False | True | True | True | False | False | False | True | True | True | True | True |
2036 | False | False | False | False | False | False | True | True | True | False | False | False | True | True | True | True | True |
574 | False | True | True | False | False | False | False | False | False | False | False | False | False | False | True | True | True |
1412 | False | True | True | False | False | False | True | True | True | False | False | False | False | True | True | True | True |
1756 | False | True | True | False | False | False | True | True | True | False | False | False | False | True | True | True | True |
1707 | False | True | True | False | False | False | True | True | True | False | False | False | True | True | True | True | True |
tweetarch_clean.name.value_counts()
Charlie 12 Cooper 11 Oliver 11 Lucy 11 Tucker 10 Penny 10 Lola 10 Bo 9 Winston 9 Sadie 8 Bailey 7 Toby 7 Buddy 7 Daisy 7 Stanley 6 Jax 6 Leo 6 Koda 6 Oscar 6 Bella 6 Dave 6 Rusty 6 Milo 6 Scout 6 Jack 6 Louis 5 George 5 Gus 5 Sammy 5 Phil 5 .. Blipson 1 Kara 1 Cuddles 1 Walker 1 Rorie 1 Lolo 1 Rontu 1 Asher 1 Gerbald 1 Dook 1 Todo 1 Brudge 1 Swagger 1 Rinna 1 Willy 1 Tupawc 1 Ember 1 Bradley 1 Eugene 1 Fido 1 Iggy 1 Pherb 1 Jeb 1 Monty 1 Tess 1 Chloe 1 Chuck 1 Tayzie 1 Aqua 1 Dug 1 Name: name, Length: 931, dtype: int64
tweetarch_clean.query(f'name == {other_words}')
tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
---|
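Calling .replace() on the whole DataFrame nulls any cell, in any column, that exactly equals one of the listed words. If that is a concern, the replacement can be limited to the affected columns. The sketch below uses a toy frame with made-up values; `df`, `placeholders` and `cols` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy frame: the second tweet's text happens to be exactly 'a'
df = pd.DataFrame({
    'text': ['This is a good boy', 'a', 'Meet Tilly'],
    'name': ['a', 'None', 'Tilly'],
    'doggo': ['None', 'doggo', 'None'],
})

placeholders = ['a', 'None']   # words to treat as missing
cols = ['name', 'doggo']       # only these columns are touched

# replace the placeholder strings with NaN in the selected columns only
df[cols] = df[cols].replace(placeholders, np.nan)
print(df)
```

The text column is left untouched, while 'name' and 'doggo' get proper null values.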
Use the values of the text column and regex methods to extract the required values for these columns
#create a function to find and replace null values in the dog stages
def fill_dog_stage(stage):
    '''Loops through all the values of the specified column and, for each null value,
    searches the corresponding value of the text column for the stage name and,
    if it is found, writes it into that position.'''
    for i in tweetarch_clean.index:
        if pd.isna(tweetarch_clean.loc[i, stage]):
            matches = re.findall(stage, tweetarch_clean.loc[i, 'text'])
            #leave the value null when the text does not mention the stage
            if matches:
                tweetarch_clean.loc[i, stage] = matches[0]

#create a list of the dog stages and loop through each one, passing it to the function
stages = ['pupper','puppo','doggo','floofer']
for stage in stages:
    fill_dog_stage(stage)
tweetarch_clean.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2356 entries, 0 to 2355 Data columns (total 17 columns): tweet_id 2356 non-null int64 in_reply_to_status_id 78 non-null float64 in_reply_to_user_id 78 non-null float64 timestamp 2356 non-null object source 2356 non-null object text 2356 non-null object retweeted_status_id 181 non-null float64 retweeted_status_user_id 181 non-null float64 retweeted_status_timestamp 181 non-null object expanded_urls 2297 non-null object rating_numerator 2356 non-null int64 rating_denominator 2356 non-null int64 name 1502 non-null object doggo 107 non-null object floofer 10 non-null object pupper 281 non-null object puppo 38 non-null object dtypes: float64(4), int64(3), object(10) memory usage: 313.0+ KB
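The same fill can be sketched with pandas string methods instead of a row-by-row loop. This is an illustrative alternative on a toy frame, not the code that produced the output above; `df` and its values are made up. Like the loop, it only matches the lowercase stage name.

```python
import pandas as pd

# Toy frame: stage columns are mostly empty, one value is already present
df = pd.DataFrame({
    'text': ['Here is a pupper named Bo', 'Such a floofer',
             'No stage here', 'This doggo is 13/10'],
    'pupper': [None, None, None, None],
    'puppo': [None, None, None, None],
    'doggo': [None, None, None, 'doggo'],
    'floofer': [None, None, None, None],
})

stages = ['pupper', 'puppo', 'doggo', 'floofer']
for stage in stages:
    # first occurrence of the stage name in each text, NaN when absent
    found = df['text'].str.extract(f'({stage})', expand=False)
    # keep existing values, fill only the nulls
    df[stage] = df[stage].combine_first(found)

print(df[stages])
```

Existing values are preserved and only the null cells are filled from the text.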
Create a new column for dog stages and loop through each of the four stage columns to find the appropriate value for each record, which unpivots the four columns into one. Drop the four columns when done.
dog_stages = [] #doggo,floofer,pupper,puppo
indices = list(range(len(tweetarch_clean)))
for i in indices:
    if not pd.isna(tweetarch_clean.floofer[i]):
        dog_stages.append('floofer')
    elif not pd.isna(tweetarch_clean.puppo[i]):
        dog_stages.append('puppo')
    elif not pd.isna(tweetarch_clean.pupper[i]):
        dog_stages.append('pupper')
    elif not pd.isna(tweetarch_clean.doggo[i]):
        dog_stages.append('doggo')
    else:
        dog_stages.append(np.nan)
tweetarch_clean['dog_stage'] = dog_stages
tweetarch_clean = tweetarch_clean.drop(['pupper','doggo','puppo','floofer'], axis = 1)
tweetarch_clean.dog_stage.value_counts()
pupper 281 doggo 92 puppo 38 floofer 10 Name: dog_stage, dtype: int64
Note that some records had values for two stages at once: doggo & pupper, doggo & puppo, or doggo & floofer. To keep a single stage per record, one had to be chosen over the other, hence the slight changes in some of the dog stage value counts. I chose the other stages over 'doggo', placing it last in the if/elif chain, because in my judgement this term is used more loosely than the rest.
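The priority rule described here can be sketched on a toy frame, together with a count of the records that carry more than one stage. The names `df`, `priority`, `pick_stage` and `multi_stage` are hypothetical and the values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame: row 0 is tagged both doggo and pupper
df = pd.DataFrame({
    'doggo': ['doggo', np.nan, 'doggo', np.nan],
    'floofer': [np.nan, np.nan, np.nan, np.nan],
    'pupper': ['pupper', 'pupper', np.nan, np.nan],
    'puppo': [np.nan, np.nan, np.nan, np.nan],
})

# stages in order of preference, with 'doggo' last
priority = ['floofer', 'puppo', 'pupper', 'doggo']

def pick_stage(row):
    """Return the first stage in priority order that is present in the row."""
    for stage in priority:
        if pd.notna(row[stage]):
            return stage
    return np.nan

df['dog_stage'] = df[priority].apply(pick_stage, axis=1)

# rows where the choice actually mattered
multi_stage = df[priority].notna().sum(axis=1) > 1
print(df['dog_stage'].tolist(), int(multi_stage.sum()))
```

Counting the multi-stage rows shows how many records were affected by the decision to rank 'doggo' last.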
tweetarch_clean.columns
Index(['tweet_id', 'in_reply_to_status_id', 'in_reply_to_user_id', 'timestamp', 'source', 'text', 'retweeted_status_id', 'retweeted_status_user_id', 'retweeted_status_timestamp', 'expanded_urls', 'rating_numerator', 'rating_denominator', 'name', 'dog_stage'], dtype='object')
Delete all the non-original tweets by checking each tweet_id against the original_tweets ids list and dropping the rows that are not in it.
for id in img_clean.tweet_id:
    if id not in original_tweets:
        img_clean.drop(img_clean[img_clean['tweet_id'] == id].index, inplace = True)
#reset the index
img_clean.reset_index(inplace = True)
#should evaluate to True
list(img_clean.tweet_id) == original_tweets
True
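A vectorized filter is one possible alternative to the row-by-row drop. This is a minimal sketch on a toy frame; `img_toy` and `original_ids` are hypothetical stand-ins for `img_clean` and `original_tweets`.

```python
import pandas as pd

# Toy stand-ins; the ids and urls are invented for illustration.
img_toy = pd.DataFrame({'tweet_id': [1, 2, 3, 4],
                        'jpg_url': ['a', 'b', 'c', 'd']})
original_ids = [1, 3, 4]

# Keep only rows whose tweet_id appears in the list, in one step.
img_filtered = img_toy[img_toy['tweet_id'].isin(original_ids)].reset_index(drop=True)
print(img_filtered)
```

Passing `drop=True` to `reset_index` discards the old index. Without it, as in the cell above, the old index is kept as an extra 'index' column, which is why that column shows up in the later column listings.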
Create a new dog_breed column and populate it from the p1, p2 and p3 columns, taking the first prediction whose matching p1_dog, p2_dog or p3_dog flag is true. Then drop all rows with null values in the dog breed column. This also takes care of issue #8.
Then drop the columns that will no longer be needed.
breeds = []
indices = list(range(len(img_clean.tweet_id)))
for i in indices:
    if img_clean.p1_dog[i]:
        breeds.append(img_clean.p1[i])
    elif img_clean.p2_dog[i]:
        breeds.append(img_clean.p2[i])
    elif img_clean.p3_dog[i]:
        breeds.append(img_clean.p3[i])
    else:
        breeds.append(np.nan)
img_clean['dog_breed'] = breeds
img_clean.dropna(inplace = True)
img_clean = img_clean.drop(['img_num','p1','p1_conf','p1_dog','p2','p2_conf','p2_dog','p3','p3_conf','p3_dog'], axis = 1)
img_clean.columns
Index(['index', 'tweet_id', 'jpg_url', 'dog_breed'], dtype='object')
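The "first prediction that is a dog" rule can also be written with `Series.where` and `Series.mask`. The sketch below uses an invented `preds` frame with the same column names as the image predictions table. It is an illustration of the idea, not the code used in this notebook.

```python
import pandas as pd

# Toy predictions; the breeds and flags are invented for illustration.
preds = pd.DataFrame({
    'p1': ['pug', 'banana', 'seat_belt'], 'p1_dog': [True, False, False],
    'p2': ['chow', 'beagle', 'sock'],     'p2_dog': [True, True, False],
    'p3': ['collie', 'basset', 'lamp'],   'p3_dog': [True, True, False],
})

# Start from p3 where it is a dog (NaN otherwise), then let p2 and
# finally p1 overwrite it, so the most confident dog prediction wins.
breed = preds['p3'].where(preds['p3_dog'])
breed = breed.mask(preds['p2_dog'], preds['p2'])
breed = breed.mask(preds['p1_dog'], preds['p1'])

preds['dog_breed'] = breed
print(preds['dog_breed'].tolist())
```

Rows where none of the three predictions is a dog stay null and can then be removed with `dropna(subset=['dog_breed'])`.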
Filter out these records using boolean indexing. Afterwards drop associated columns and any other columns that will not be further used.
tweetarch_clean = tweetarch_clean.loc[pd.isna(tweetarch_clean['in_reply_to_status_id'])]
tweetarch_clean = tweetarch_clean.loc[pd.isna(tweetarch_clean['retweeted_status_id'])]
tweetarch_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2097 entries, 0 to 2355
Data columns (total 14 columns):
tweet_id                      2097 non-null int64
in_reply_to_status_id         0 non-null float64
in_reply_to_user_id           0 non-null float64
timestamp                     2097 non-null object
source                        2097 non-null object
text                          2097 non-null object
retweeted_status_id           0 non-null float64
retweeted_status_user_id      0 non-null float64
retweeted_status_timestamp    0 non-null object
expanded_urls                 2094 non-null object
rating_numerator              2097 non-null int64
rating_denominator            2097 non-null int64
name                          1390 non-null object
dog_stage                     372 non-null object
dtypes: float64(4), int64(3), object(7)
memory usage: 245.7+ KB
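The two boolean filters can also be combined into a single mask. A minimal sketch on a toy frame (the ids are invented; only the column names come from the archive):

```python
import numpy as np
import pandas as pd

# Toy archive: tweet 2 is a reply, tweet 3 is a retweet.
toy = pd.DataFrame({
    'tweet_id': [1, 2, 3, 4],
    'in_reply_to_status_id': [np.nan, 9.0, np.nan, np.nan],
    'retweeted_status_id': [np.nan, np.nan, 8.0, np.nan],
})

# A tweet is original when both id fields are missing.
is_original = toy['in_reply_to_status_id'].isna() & toy['retweeted_status_id'].isna()
originals = toy.loc[is_original]
print(originals['tweet_id'].tolist())
```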
tweetarch_clean = tweetarch_clean.drop(['in_reply_to_status_id', 'in_reply_to_user_id', 'retweeted_status_id', 'retweeted_status_user_id', 'retweeted_status_timestamp', 'source', 'expanded_urls'], axis = 1)
tweetarch_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2097 entries, 0 to 2355
Data columns (total 7 columns):
tweet_id              2097 non-null int64
timestamp             2097 non-null object
text                  2097 non-null object
rating_numerator      2097 non-null int64
rating_denominator    2097 non-null int64
name                  1390 non-null object
dog_stage             372 non-null object
dtypes: int64(3), object(4)
memory usage: 131.1+ KB
Merge the three dataframes on the tweet_id column to make one master dataframe.
data_frames = [img_clean,tweetarch_clean,tweetcounts_clean]
master_df = functools.reduce(lambda left,right: pd.merge(left,right,on=['tweet_id'],
                                                         how='left'), data_frames)
master_df
| | tweet_id | jpg_url | dog_breed | timestamp | text | rating_numerator | rating_denominator | name | dog_stage | retweet_count | favorite_count |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | Welsh_springer_spaniel | 2015-11-15 22:32:08 +0000 | Here we have a Japanese Irish Setter. Lost eye... | 8.0 | 10.0 | NaN | NaN | 421.0 | 2285.0 |
1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | redbone | 2015-11-15 23:05:30 +0000 | This is a western brown Mitsubishi terrier. Up... | 7.0 | 10.0 | NaN | NaN | 39.0 | 112.0 |
2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | German_shepherd | 2015-11-15 23:21:54 +0000 | Here is a very happy pup. Big fan of well-main... | 9.0 | 10.0 | NaN | NaN | 36.0 | 100.0 |
3 | 666044226329800704 | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | Rhodesian_ridgeback | 2015-11-16 00:04:52 +0000 | This is a purebred Piers Morgan. Loves to Netf... | 6.0 | 10.0 | NaN | NaN | 115.0 | 245.0 |
4 | 666049248165822465 | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | miniature_pinscher | 2015-11-16 00:24:50 +0000 | Here we have a 1949 1st generation vulpix. Enj... | 5.0 | 10.0 | NaN | NaN | 36.0 | 88.0 |
5 | 666050758794694657 | https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg | Bernese_mountain_dog | 2015-11-16 00:30:50 +0000 | This is a truly beautiful English Wilson Staff... | 10.0 | 10.0 | NaN | NaN | 50.0 | 115.0 |
6 | 666055525042405380 | https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg | chow | 2015-11-16 00:49:46 +0000 | Here is a Siberian heavily armored polar bear ... | 10.0 | 10.0 | NaN | NaN | 196.0 | 367.0 |
7 | 666057090499244032 | https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg | golden_retriever | 2015-11-16 00:55:59 +0000 | My oh my. This is a rare blond Canadian terrie... | 9.0 | 10.0 | NaN | NaN | 112.0 | 247.0 |
8 | 666058600524156928 | https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg | miniature_poodle | 2015-11-16 01:01:59 +0000 | Here is the Rand Paul of retrievers folks! He'... | 8.0 | 10.0 | NaN | NaN | 47.0 | 99.0 |
9 | 666063827256086533 | https://pbs.twimg.com/media/CT5Vg_wXIAAXfnj.jpg | golden_retriever | 2015-11-16 01:22:45 +0000 | This is the happiest dog you will ever see. Ve... | 10.0 | 10.0 | NaN | NaN | 180.0 | 395.0 |
10 | 666071193221509120 | https://pbs.twimg.com/media/CT5cN_3WEAAlOoZ.jpg | Gordon_setter | 2015-11-16 01:52:02 +0000 | Here we have a northern speckled Rhododendron.... | 9.0 | 10.0 | NaN | NaN | 51.0 | 127.0 |
11 | 666073100786774016 | https://pbs.twimg.com/media/CT5d9DZXAAALcwe.jpg | Walker_hound | 2015-11-16 01:59:36 +0000 | Let's hope this flight isn't Malaysian (lol). ... | 10.0 | 10.0 | NaN | NaN | 130.0 | 274.0 |
12 | 666082916733198337 | https://pbs.twimg.com/media/CT5m4VGWEAAtKc8.jpg | pug | 2015-11-16 02:38:37 +0000 | Here we have a well-established sunblockerspan... | 6.0 | 10.0 | NaN | NaN | 37.0 | 92.0 |
13 | 666094000022159362 | https://pbs.twimg.com/media/CT5w9gUW4AAsBNN.jpg | bloodhound | 2015-11-16 03:22:39 +0000 | This appears to be a Mongolian Presbyterian mi... | 9.0 | 10.0 | NaN | NaN | 63.0 | 142.0 |
14 | 666099513787052032 | https://pbs.twimg.com/media/CT51-JJUEAA6hV8.jpg | Lhasa | 2015-11-16 03:44:34 +0000 | Can stand on stump for what seems like a while... | 8.0 | 10.0 | NaN | NaN | 53.0 | 133.0 |
15 | 666102155909144576 | https://pbs.twimg.com/media/CT54YGiWUAEZnoK.jpg | English_setter | 2015-11-16 03:55:04 +0000 | Oh my. Here you are seeing an Adobe Setter giv... | 11.0 | 10.0 | NaN | NaN | 11.0 | 66.0 |
16 | 666273097616637952 | https://pbs.twimg.com/media/CT8T1mtUwAA3aqm.jpg | Italian_greyhound | 2015-11-16 15:14:19 +0000 | Can take selfies 11/10 https://t.co/ws2AMaNwPW | 11.0 | 10.0 | NaN | NaN | 66.0 | 151.0 |
17 | 666287406224695296 | https://pbs.twimg.com/media/CT8g3BpUEAAuFjg.jpg | Maltese_dog | 2015-11-16 16:11:11 +0000 | This is an Albanian 3 1/2 legged Episcopalian... | 1.0 | 2.0 | NaN | NaN | 56.0 | 123.0 |
18 | 666337882303524864 | https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg | Newfoundland | 2015-11-16 19:31:45 +0000 | This is an extremely rare horned Parthenon. No... | 9.0 | 10.0 | NaN | NaN | 79.0 | 168.0 |
19 | 666345417576210432 | https://pbs.twimg.com/media/CT9Vn7PWoAA_ZCM.jpg | golden_retriever | 2015-11-16 20:01:42 +0000 | Look at this jokester thinking seat belt laws ... | 10.0 | 10.0 | NaN | NaN | 122.0 | 241.0 |
20 | 666353288456101888 | https://pbs.twimg.com/media/CT9cx0tUEAAhNN_.jpg | malamute | 2015-11-16 20:32:58 +0000 | Here we have a mixed Asiago from the Galápagos... | 8.0 | 10.0 | NaN | NaN | 56.0 | 179.0 |
21 | 666373753744588802 | https://pbs.twimg.com/media/CT9vZEYWUAAlZ05.jpg | soft-coated_wheaten_terrier | 2015-11-16 21:54:18 +0000 | Those are sunglasses and a jean jacket. 11/10 ... | 11.0 | 10.0 | NaN | NaN | 73.0 | 162.0 |
22 | 666396247373291520 | https://pbs.twimg.com/media/CT-D2ZHWIAA3gK1.jpg | Chihuahua | 2015-11-16 23:23:41 +0000 | Oh goodness. A super rare northeast Qdoba kang... | 9.0 | 10.0 | NaN | NaN | 68.0 | 147.0 |
23 | 666407126856765440 | https://pbs.twimg.com/media/CT-NvwmW4AAugGZ.jpg | black-and-tan_coonhound | 2015-11-17 00:06:54 +0000 | This is a southern Vesuvius bumblegruff. Can d... | 7.0 | 10.0 | NaN | NaN | 30.0 | 93.0 |
24 | 666418789513326592 | https://pbs.twimg.com/media/CT-YWb7U8AA7QnN.jpg | toy_terrier | 2015-11-17 00:53:15 +0000 | This is Walter. He is an Alaskan Terrapin. Lov... | 10.0 | 10.0 | Walter | NaN | 39.0 | 107.0 |
25 | 666421158376562688 | https://pbs.twimg.com/media/CT-aggCXAAIMfT3.jpg | Blenheim_spaniel | 2015-11-17 01:02:40 +0000 | *internally screaming* 12/10 https://t.co/YMcr... | 12.0 | 10.0 | NaN | NaN | 91.0 | 272.0 |
26 | 666428276349472768 | https://pbs.twimg.com/media/CT-g-0DUwAEQdSn.jpg | Pembroke | 2015-11-17 01:30:57 +0000 | Here we have an Austrian Pulitzer. Collectors ... | 7.0 | 10.0 | NaN | NaN | 67.0 | 139.0 |
27 | 666430724426358785 | https://pbs.twimg.com/media/CT-jNYqW4AAPi2M.jpg | Irish_terrier | 2015-11-17 01:40:41 +0000 | Oh boy what a pup! Sunglasses take this one to... | 6.0 | 10.0 | NaN | NaN | 160.0 | 276.0 |
28 | 666435652385423360 | https://pbs.twimg.com/media/CT-nsTQWEAEkyDn.jpg | Chesapeake_Bay_retriever | 2015-11-17 02:00:15 +0000 | "Can you behave? You're ruining my wedding day... | 10.0 | 10.0 | NaN | NaN | 42.0 | 137.0 |
29 | 666437273139982337 | https://pbs.twimg.com/media/CT-pKmRWIAAxUWj.jpg | Chihuahua | 2015-11-17 02:06:42 +0000 | Here we see a lone northeastern Cumberbatch. H... | 7.0 | 10.0 | NaN | NaN | 40.0 | 106.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1721 | 885528943205470208 | https://pbs.twimg.com/media/DEoH3yvXgAAzQtS.jpg | pug | 2017-07-13 15:58:47 +0000 | This is Maisey. She fell asleep mid-excavation... | 13.0 | 10.0 | Maisey | NaN | 5311.0 | 31522.0 |
1722 | 885984800019947520 | https://pbs.twimg.com/media/DEumeWWV0AA-Z61.jpg | Blenheim_spaniel | 2017-07-14 22:10:11 +0000 | Viewer discretion advised. This is Jimbo. He w... | 12.0 | 10.0 | Jimbo | NaN | 5592.0 | 28526.0 |
1723 | 886258384151887873 | https://pbs.twimg.com/media/DEyfTG4UMAE4aE9.jpg | pug | 2017-07-15 16:17:19 +0000 | This is Waffles. His doggles are pupside down.... | 13.0 | 10.0 | Waffles | NaN | 5262.0 | 24459.0 |
1724 | 886366144734445568 | https://pbs.twimg.com/media/DE0BTnQUwAApKEH.jpg | French_bulldog | 2017-07-15 23:25:31 +0000 | This is Roscoe. Another pupper fallen victim t... | 12.0 | 10.0 | Roscoe | pupper | 2614.0 | 18501.0 |
1725 | 886736880519319552 | https://pbs.twimg.com/media/DE5Se8FXcAAJFx4.jpg | kuvasz | 2017-07-16 23:58:41 +0000 | This is Mingus. He's a wonderful father to his... | 13.0 | 10.0 | Mingus | NaN | 2619.0 | 10466.0 |
1726 | 886983233522544640 | https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg | Chihuahua | 2017-07-17 16:17:36 +0000 | This is Maya. She's very shy. Rarely leaves he... | 13.0 | 10.0 | Maya | NaN | 6294.0 | 30275.0 |
1727 | 887101392804085760 | https://pbs.twimg.com/media/DE-eAq6UwAA-jaE.jpg | Samoyed | 2017-07-18 00:07:08 +0000 | This... is a Jubilant Antarctic House Bear. We... | 12.0 | 10.0 | NaN | NaN | 4975.0 | 26916.0 |
1728 | 887343217045368832 | https://pbs.twimg.com/ext_tw_video_thumb/88734... | Mexican_hairless | 2017-07-18 16:08:03 +0000 | You may not have known you needed to see this ... | 13.0 | 10.0 | NaN | NaN | 8788.0 | 29515.0 |
1729 | 887473957103951883 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | Pembroke | 2017-07-19 00:47:34 +0000 | This is Canela. She attempted some fancy porch... | 13.0 | 10.0 | Canela | NaN | 14973.0 | 60028.0 |
1730 | 887705289381826560 | https://pbs.twimg.com/media/DFHDQBbXgAEqY7t.jpg | basset | 2017-07-19 16:06:48 +0000 | This is Jeffrey. He has a monopoly on the pool... | 13.0 | 10.0 | Jeffrey | NaN | 4522.0 | 26553.0 |
1731 | 888078434458587136 | https://pbs.twimg.com/media/DFMWn56WsAAkA7B.jpg | French_bulldog | 2017-07-20 16:49:33 +0000 | This is Gerald. He was just told he didn't get... | 12.0 | 10.0 | Gerald | NaN | 2886.0 | 19115.0 |
1732 | 888202515573088257 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | Pembroke | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1733 | 888554962724278272 | https://pbs.twimg.com/media/DFTH_O-UQAACu20.jpg | Siberian_husky | 2017-07-22 00:23:06 +0000 | This is Ralphus. He's powering up. Attempting ... | 13.0 | 10.0 | Ralphus | NaN | 2868.0 | 17266.0 |
1734 | 888804989199671297 | https://pbs.twimg.com/media/DFWra-3VYAA2piG.jpg | golden_retriever | 2017-07-22 16:56:37 +0000 | This is Zeke. He has a new stick. Very proud o... | 13.0 | 10.0 | Zeke | NaN | 3518.0 | 22409.0 |
1735 | 888917238123831296 | https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg | golden_retriever | 2017-07-23 00:22:39 +0000 | This is Jim. He found a fren. Taught him how t... | 12.0 | 10.0 | Jim | NaN | 3747.0 | 25551.0 |
1736 | 889278841981685760 | https://pbs.twimg.com/ext_tw_video_thumb/88927... | whippet | 2017-07-24 00:19:32 +0000 | This is Oliver. You're witnessing one of his m... | 13.0 | 10.0 | Oliver | NaN | 4426.0 | 22052.0 |
1737 | 889531135344209921 | https://pbs.twimg.com/media/DFg_2PVW0AEHN3p.jpg | golden_retriever | 2017-07-24 17:02:04 +0000 | This is Stuart. He's sporting his favorite fan... | 13.0 | 10.0 | Stuart | puppo | 1874.0 | 13321.0 |
1738 | 889638837579907072 | https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg | French_bulldog | 2017-07-25 00:10:02 +0000 | This is Ted. He does his best. Sometimes that'... | 12.0 | 10.0 | Ted | NaN | 3702.0 | 23605.0 |
1739 | 889665388333682689 | https://pbs.twimg.com/media/DFi579UWsAAatzw.jpg | Pembroke | 2017-07-25 01:55:32 +0000 | Here's a puppo that seems to be on the fence a... | 13.0 | 10.0 | NaN | puppo | 8312.0 | 41910.0 |
1740 | 889880896479866881 | https://pbs.twimg.com/media/DFl99B1WsAITKsg.jpg | French_bulldog | 2017-07-25 16:11:53 +0000 | This is Bruno. He is a service shark. Only get... | 13.0 | 10.0 | Bruno | NaN | 4145.0 | 24506.0 |
1741 | 890006608113172480 | https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg | Samoyed | 2017-07-26 00:31:25 +0000 | This is Koda. He is a South Australian decksha... | 13.0 | 10.0 | Koda | NaN | 6120.0 | 26973.0 |
1742 | 890240255349198849 | https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg | Pembroke | 2017-07-26 15:59:51 +0000 | This is Cassie. She is a college pup. Studying... | 14.0 | 10.0 | Cassie | doggo | 6081.0 | 27878.0 |
1743 | 890609185150312448 | https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg | Irish_terrier | 2017-07-27 16:25:51 +0000 | This is Zoey. She doesn't want to be one of th... | 13.0 | 10.0 | Zoey | NaN | 3605.0 | 24455.0 |
1744 | 890729181411237888 | https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg | Pomeranian | 2017-07-28 00:22:40 +0000 | When you watch your owner call another dog a g... | 13.0 | 10.0 | NaN | NaN | 15695.0 | 56701.0 |
1745 | 890971913173991426 | https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg | Appenzeller | 2017-07-28 16:27:12 +0000 | Meet Jax. He enjoys ice cream so much he gets ... | 13.0 | 10.0 | Jax | NaN | 1649.0 | 10340.0 |
1746 | 891087950875897856 | https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg | Chesapeake_Bay_retriever | 2017-07-29 00:08:17 +0000 | Here we have a majestic great white breaching ... | 13.0 | 10.0 | NaN | NaN | 2590.0 | 17757.0 |
1747 | 891327558926688256 | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | basset | 2017-07-29 16:00:24 +0000 | This is Franklin. He would like you to stop ca... | 12.0 | 10.0 | Franklin | NaN | 7723.0 | 35207.0 |
1748 | 891689557279858688 | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | Labrador_retriever | 2017-07-30 15:58:51 +0000 | This is Darla. She commenced a snooze mid meal... | 13.0 | 10.0 | Darla | NaN | 7198.0 | 36823.0 |
1749 | 891815181378084864 | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | Chihuahua | 2017-07-31 00:18:03 +0000 | This is Archie. He is a rare Norwegian Pouncin... | 12.0 | 10.0 | Archie | NaN | 3466.0 | 21987.0 |
1750 | 892177421306343426 | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | Chihuahua | 2017-08-01 00:17:27 +0000 | This is Tilly. She's just checking pup on you.... | 13.0 | 10.0 | Tilly | NaN | 5280.0 | 29255.0 |
1751 rows × 11 columns
Merging the data sets filters out all records dated after 2017-08-01, fixing issue #6. This also means that issue #7 is now a non-issue: the timestamp column will be dropped, since the main reason for converting its data type would have been to filter out the records dated after the specified date.
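To make the effect of the chained merge easier to see, here is a small sketch with three invented frames (`images`, `archive` and `counts` are hypothetical stand-ins for the three cleaned dataframes). It contrasts the left merge used above with an inner merge.

```python
import functools
import pandas as pd

# Toy frames; tweet 3 exists only in the image table.
images = pd.DataFrame({'tweet_id': [1, 2, 3], 'dog_breed': ['pug', 'chow', 'basset']})
archive = pd.DataFrame({'tweet_id': [1, 2], 'rating_numerator': [12, 10]})
counts = pd.DataFrame({'tweet_id': [1, 2], 'retweet_count': [50, 7]})
frames = [images, archive, counts]

# Left merge: every row of the first frame is kept, unmatched cells become NaN.
left = functools.reduce(
    lambda l, r: pd.merge(l, r, on='tweet_id', how='left'), frames)

# Inner merge: only tweet ids present in all three frames survive.
inner = functools.reduce(
    lambda l, r: pd.merge(l, r, on='tweet_id', how='inner'), frames)

print(len(left), len(inner))
print(left['rating_numerator'].dtype)
```

With a left merge, an image row without a match keeps its id and breed but gets NaN everywhere else, as in row 1732 of the table above. The NaN values also turn the integer columns into floats, which is why the ratings and counts display as 8.0, 421.0 and so on.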
master_df = master_df.drop(['timestamp'], axis = 1)
master_df.columns
Index(['index', 'tweet_id', 'jpg_url', 'dog_breed', 'text', 'rating_numerator', 'rating_denominator', 'name', 'dog_stage', 'retweet_count', 'favorite_count'], dtype='object')
# storing the cleaned master dataframe in a csv file
master_df.to_csv('twitter_archive_master.csv',index=False)
# importing data into a dataframe
tweets = pd.read_csv('twitter_archive_master.csv')
# checking summary statistics for the data
tweets.describe()
| | index | tweet_id | rating_numerator | rating_denominator | retweet_count | favorite_count |
---|---|---|---|---|---|---|
count | 1658.000000 | 1.658000e+03 | 1658.000000 | 1658.000000 | 1657.000000 | 1657.000000 |
mean | 1048.162847 | 7.392385e+17 | 11.385404 | 10.471049 | 2283.746530 | 8004.847314 |
std | 594.020413 | 6.794971e+16 | 7.506534 | 6.359152 | 4158.936293 | 11787.806358 |
min | 0.000000 | 6.660209e+17 | 0.000000 | 2.000000 | 11.000000 | 66.000000 |
25% | 548.250000 | 6.773835e+17 | 10.000000 | 10.000000 | 514.000000 | 1806.000000 |
50% | 1049.500000 | 7.138309e+17 | 11.000000 | 10.000000 | 1131.000000 | 3723.000000 |
75% | 1552.750000 | 7.931619e+17 | 12.000000 | 10.000000 | 2587.000000 | 9904.000000 |
max | 2073.000000 | 8.921774e+17 | 165.000000 | 150.000000 | 70429.000000 | 144312.000000 |
#checking the denominator values
tweets.rating_denominator.value_counts()
10     1642
50        3
80        2
11        2
150       1
120       1
110       1
90        1
70        1
40        1
20        1
7         1
2         1
Name: rating_denominator, dtype: int64
Since the denominator values vary, create an additional percentage column to standardize the ratings for analysis.
#creating a percentage rating column
tweets['rating'] = (tweets.rating_numerator / tweets.rating_denominator)* 100
tweets.rating.min()
0.0
The lowest dog rating given on WeRateDogs is 0.
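As a quick worked example of the percentage rating, the sketch below applies the same formula to three made-up numerator and denominator pairs (the values are illustrative, not taken from the data set).

```python
import pandas as pd

# Hypothetical ratings with different denominators.
ratings = pd.DataFrame({'rating_numerator': [13, 84, 1],
                        'rating_denominator': [10, 70, 2]})

# Same formula as above: numerator over denominator, as a percentage.
ratings['rating'] = ratings['rating_numerator'] / ratings['rating_denominator'] * 100
print(ratings)
```

A 13/10 rating becomes 130%, while 84/70 becomes 120%, so ratings with different denominators can be compared on one scale.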
#tweet with the lowest dog rating
tweets.loc[tweets.rating == tweets.rating.min()]
| | index | tweet_id | jpg_url | dog_breed | text | rating_numerator | rating_denominator | name | dog_stage | retweet_count | favorite_count | rating |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1455 | 1824 | 835152434251116546 | https://pbs.twimg.com/media/C5cOtWVWMAEjO5p.jpg | American_Staffordshire_terrier | When you're so blinded by your systematic plag... | 0 | 10 | NaN | NaN | 2755.0 | 20923.0 | 0.0 |
# getting the tweet text
tweets.loc[tweets.rating == tweets.rating.min()].text[1455]
"When you're so blinded by your systematic plagiarism that you forget what day it is. 0/10 https://t.co/YbEJPkg4Ag"
#getting the tweet image url for download
tweets.loc[tweets.rating == tweets.rating.min()].jpg_url[1455]
'https://pbs.twimg.com/media/C5cOtWVWMAEjO5p.jpg'
The tweet with the lowest rating on WeRateDogs, at 0/10, is of an American Staffordshire terrier and reads:
"When you're so blinded by your systematic plagiarism that you forget what day it is. 0/10"
This tweet nevertheless has a favorite count of 20,923, which is more than double the average favorite count (about 8,005), and a retweet count of 2,755, which is somewhat higher than the average retweet count (about 2,284).
#tweet with the highest retweet count
tweets.loc[tweets.retweet_count == tweets.retweet_count.max()]
| | index | tweet_id | jpg_url | dog_breed | text | rating_numerator | rating_denominator | name | dog_stage | retweet_count | favorite_count | rating |
---|---|---|---|---|---|---|---|---|---|---|---|---|
978 | 1221 | 744234799360020481 | https://pbs.twimg.com/ext_tw_video_thumb/74423... | Labrador_retriever | Here's a doggo realizing you can stand in a po... | 13 | 10 | NaN | doggo | 70429.0 | 144312.0 | 130.0 |
#getting the tweet text
tweets.loc[tweets.retweet_count == tweets.retweet_count.max()].text[978]
"Here's a doggo realizing you can stand in a pool. 13/10 enlightened af (vid by Tina Conrad) https://t.co/7wE9LTEXC4"
#getting the tweet image url for download
tweets.loc[tweets.retweet_count == tweets.retweet_count.max()].jpg_url[978]
'https://pbs.twimg.com/ext_tw_video_thumb/744234667679821824/pu/img/1GaWmtJtdqzZV7jy.jpg'
The tweet with the highest retweet count, at 70,429 retweets, also has the highest favorite count, at 144,312 likes. It is of a Labrador retriever and reads:
"Here's a doggo realizing you can stand in a pool. 13/10 enlightened af"
tweets.name.value_counts()
Cooper       10
Charlie       9
Lucy          9
Tucker        9
Oliver        9
Penny         8
Sadie         7
Daisy         7
Winston       7
Toby          6
Jax           6
Lola          6
Koda          6
Stanley       5
Bella         5
Oscar         5
Leo           5
Bo            5
Rusty         5
Bear          4
Gus           4
Larry         4
Louis         4
Winnie        4
Alfie         4
Duke          4
Brody         4
Maggie        4
Bentley       4
Cassie        4
..
Tayzie        1
Lipton        1
Aqua          1
Rocco         1
Clybe         1
Carll         1
Humphrey      1
Brownie       1
Jay           1
Asher         1
Brat          1
Lili          1
Eve           1
Ed            1
Grizz         1
Travis        1
Cheesy        1
Sage          1
Jockson       1
Hero          1
Antony        1
Buddah        1
Jarvis        1
Snickers      1
Bonaparte     1
Klevin        1
Betty         1
Cora          1
Bruno         1
Dug           1
Name: name, Length: 830, dtype: int64
The most common dog names on WeRateDogs are Cooper, at the very top with 10 dogs, followed by Charlie, Lucy, Tucker and Oliver with 9 dogs each, and Penny with 8 dogs.
tweets.groupby('dog_breed').favorite_count.sum().sort_values(ascending = False)
dog_breed
golden_retriever             1639666.0
Labrador_retriever           1028020.0
Pembroke                      902671.0
Chihuahua                     664894.0
French_bulldog                524718.0
Samoyed                       480684.0
chow                          388436.0
cocker_spaniel                351165.0
pug                           324429.0
malamute                      303845.0
toy_poodle                    274019.0
Pomeranian                    273973.0
Chesapeake_Bay_retriever      265152.0
Eskimo_dog                    242658.0
Cardigan                      229040.0
German_shepherd               184672.0
Lakeland_terrier              182833.0
basset                        170964.0
miniature_pinscher            168423.0
Great_Pyrenees                157068.0
whippet                       139361.0
standard_poodle               131277.0
Shetland_sheepdog             130964.0
Bedlington_terrier            128905.0
Staffordshire_bullterrier     128875.0
English_springer              121099.0
Italian_greyhound             120516.0
Siberian_husky                119303.0
Rottweiler                    118892.0
flat-coated_retriever         115892.0
...
Dandie_Dinmont                 20572.0
Australian_terrier             19072.0
basenji                        18967.0
Gordon_setter                  18523.0
Welsh_springer_spaniel         17243.0
bluetick                       17055.0
keeshond                       16513.0
redbone                        16486.0
Bouvier_des_Flandres           15318.0
cairn                          15302.0
miniature_schnauzer            14438.0
wire-haired_fox_terrier        14355.0
Rhodesian_ridgeback            13811.0
Appenzeller                    12507.0
curly-coated_retriever         11830.0
Lhasa                          11146.0
Ibizan_hound                   10865.0
toy_terrier                     8100.0
Scottish_deerhound              7650.0
Sussex_spaniel                  6853.0
silky_terrier                   6222.0
Tibetan_terrier                 6218.0
clumber                         6184.0
Scotch_terrier                  3018.0
EntleBucher                     2246.0
Brabancon_griffon               2229.0
groenendael                     1949.0
standard_schnauzer              1686.0
Irish_wolfhound                 1285.0
Japanese_spaniel                1111.0
Name: favorite_count, Length: 113, dtype: float64
The top 5 most popular dog breeds on WeRateDogs by total favorite count, in descending order, are:
1. Golden retrievers, with 1,639,666 total favorites
2. Labrador retrievers, with 1,028,020 favorites
3. Pembrokes, with 902,671 favorites
4. Chihuahuas, with 664,894 favorites
5. French bulldogs, with 524,718 favorites
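A ranking by total favorites tends to favor breeds that appear in many tweets. One way to check this, sketched here on a made-up toy frame rather than the real data, is to aggregate the sum, the tweet count and the mean side by side.

```python
import pandas as pd

# Toy data: one breed with many modest tweets, one with a single big tweet.
toy = pd.DataFrame({
    'dog_breed': ['golden_retriever'] * 3 + ['pug'],
    'favorite_count': [100.0, 200.0, 300.0, 500.0],
})

# Total, number of tweets and average favorites per breed.
summary = (toy.groupby('dog_breed')['favorite_count']
              .agg(['sum', 'count', 'mean'])
              .sort_values('sum', ascending=False))
print(summary)
```

In this toy case the golden retriever leads on the total (600 versus 500) while the pug leads on the per-tweet mean (500 versus 200), so the two rankings can disagree.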
print(f'Total number of retweets: {tweets.retweet_count.sum()}')
print(f'Total number of likes: {tweets.favorite_count.sum()}')
print(f'Difference: {tweets.favorite_count.sum()-tweets.retweet_count.sum()}')
Total number of retweets: 3784168.0
Total number of likes: 13264032.0
Difference: 9479864.0
WeRateDogs tweets are more likely to be favorited than retweeted: in total they received about 9.5 million more favorites than retweets (13,264,032 versus 3,784,168), roughly 3.5 times as many.
#plotting to show correlation
tweets.plot(x = 'retweet_count',y = 'favorite_count', kind = 'scatter');
plt.title('Correlation between tweet retweet count and tweet favorite count');
There is a generally positive correlation between retweet count and favorite count: tweets with more favorites also tend to have more retweets.
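The visual impression from the scatter plot can be backed by a correlation coefficient. The sketch below uses invented counts to show the call; on the real data the same `Series.corr` call would be applied to `tweets.retweet_count` and `tweets.favorite_count`.

```python
import pandas as pd

# Toy counts, invented for illustration.
toy = pd.DataFrame({'retweet_count': [10, 20, 30, 40],
                    'favorite_count': [35, 80, 95, 150]})

# Pearson correlation coefficient (the default method of Series.corr).
r = toy['retweet_count'].corr(toy['favorite_count'])
print(round(r, 3))
```

A value close to 1 indicates a strong positive linear relationship, a value near 0 indicates none.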
tweets.plot(x = 'retweet_count',y = 'rating', kind = 'scatter');
plt.title('Correlation between tweet retweet count and dog rating');
The points in the scatter plot follow a roughly horizontal trend, indicating little to no correlation between dog rating and retweet count. The rating given to a dog therefore does not appear to affect the number of times that dog's tweet is retweeted.
tweets.plot(x = 'favorite_count',y = 'rating', kind = 'scatter');
plt.title('Correlation between tweet favorite count and dog rating');
Much like with retweet count, the points in this scatter plot also follow a roughly horizontal trend, indicating little to no correlation between dog rating and favorite count. The rating given to a dog therefore does not appear to affect the number of times that dog's tweet is favorited.
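The plots above do not draw a fitted line, so a "horizontal line of best fit" is a visual judgement. A least-squares fit makes it explicit. The following sketch uses made-up values chosen to have no linear trend; it illustrates the check and is not a result from the data set.

```python
import numpy as np
import pandas as pd

# Toy data with no linear relationship between the two columns.
toy = pd.DataFrame({'favorite_count': [100, 200, 300, 400, 500],
                    'rating': [120.0, 110.0, 130.0, 110.0, 120.0]})

# Degree-1 polynomial fit: returns slope and intercept of the best-fit line.
slope, intercept = np.polyfit(toy['favorite_count'], toy['rating'], deg=1)

# Pearson correlation for the same pair of columns.
r = toy['favorite_count'].corr(toy['rating'])
print(slope, intercept, r)
```

A slope and correlation close to zero would support the reading of the plots. A few extreme ratings can dominate such a fit, so it may be worth repeating the check with those rows excluded.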