▶️ First, run the code cell below to import unittest, a module used for 🧭 Check Your Work sections and the autograder.
import unittest
tc = unittest.TestCase()
Import the following packages.
pandas: Use alias pd.
numpy: Use alias np.
# YOUR CODE BEGINS
# YOUR CODE ENDS
# DO NOT CHANGE THE CODE BELOW
_test_case = 'import-packages'
_points = 1
import sys
tc.assertTrue("pd" in globals(), "Check whether you have correctly imported pandas with an alias.")
tc.assertTrue("np" in globals(), "Check whether you have correctly imported NumPy with an alias.")
vaderSentiment package
▶️ Run the code cell below to install vaderSentiment.
# install vaderSentiment
!pip install vaderSentiment
▶️ Run the code cell below to initialize vaderSentiment.SentimentIntensityAnalyzer.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
# create an analyzer instance
analyzer = SentimentIntensityAnalyzer()
▶️ Run the code cell below to display full tweet text in the notebook outputs.
pd.set_option('display.max_colwidth', 1000)
analyzer.polarity_scores() method
To analyze a piece of text using VADER, use the analyzer.polarity_scores() method. Here is an example using a string.
The compound score is the overall score, ranging from -1 (most extreme negative) to +1 (most extreme positive). pos, neg, and neu are the proportions of the text that fall into each category; these three values should add up to 1 (allowing for rounding). We are mainly interested in the compound score.
vader_score = analyzer.polarity_scores("Got my BeautyBase merch I am happy I got the first batch 🥰")
print(vader_score)
Here is another example.
sentence = "Beautyrite same day shipping is phenomenal its unbelievable i got my package in few hours"
vader_score = analyzer.polarity_scores(sentence)
print(vader_score)
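The sketch below shows one way to read a score dictionary like the ones printed above. It uses a hard-coded dictionary in the same shape that analyzer.polarity_scores() returns, so it runs without vaderSentiment. The helper label_sentiment and its 0.05 cutoff are assumptions made for illustration (the cutoff follows a commonly cited convention for VADER compound scores); neither is defined in this notebook.

```python
import math

# A score dictionary in the shape returned by analyzer.polarity_scores().
# The values are hard-coded here so the sketch runs without vaderSentiment.
sample_score = {'neg': 0.162, 'neu': 0.765, 'pos': 0.074, 'compound': -0.8565}

# neg, neu, and pos are proportions, so they sum to about 1.
# Each value is rounded to three decimals, so allow a small tolerance.
ratio_sum = sample_score['neg'] + sample_score['neu'] + sample_score['pos']
ratios_sum_to_one = math.isclose(ratio_sum, 1.0, abs_tol=0.01)


def label_sentiment(compound, threshold=0.05):
    """Hypothetical helper: map a compound score to a coarse label.

    The 0.05 cutoff is an assumed convention, not part of this notebook.
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


print(ratios_sum_to_one)
print(label_sentiment(sample_score['compound']))
```

A different cutoff may suit other datasets better; treat the threshold as a tunable parameter rather than a fixed rule.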
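Calculate the VADER polarity scores of the text below and store the result in vader_score1. The expected output is shown after the text.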
I have never been spoken to so terrible and lied to so many times from what should be professionals. I was trying to help my daughter rent a home and was lied to numerous times about their fair housing practices and was given different information from different individuals. This seems to be a very poorly ran company that I would NEVER trust to oversee my daughters safe housing experience! As a parent do not ever allow your adult child to rent from this company! They will not be taken care of!
{'neg': 0.162, 'neu': 0.765, 'pos': 0.074, 'compound': -0.8565}
# YOUR CODE BEGINS
# YOUR CODE ENDS
print(vader_score1)
# DO NOT CHANGE THE CODE IN THIS CELL
_test_case = 'calculate-vader-score-of-a-string'
_points = 1
_obfuscate = True
import base64 as _b64
_64 = _b64.b64decode('IyBETyBOT1QgQ0hBTkdFIFRIRSBDT0RFIEJFTE9XCgoKc190ZXN0ID0gIkkgaGF2ZSBuZXZlciBiZW\
VuIHNwb2tlbiB0byBzbyB0ZXJyaWJsZSBhbmQgbGllZCB0byBzbyBtYW55IHRpbWVzIGZyb20gd2hhdCBzaG91bGQgYmUgcHJvZm\
Vzc2lvbmFscy4gSSB3YXMgdHJ5aW5nIHRvIGhlbHAgbXkgZGF1Z2h0ZXIgcmVudCBhIGhvbWUgYW5kIHdhcyBsaWVkIHRvIG51bW\
Vyb3VzIHRpbWVzIGFib3V0IHRoZWlyIGZhaXIgaG91c2luZyBwcmFjdGljZXMgYW5kIHdhcyBnaXZlbiBkaWZmZXJlbnQgaW5mb3\
JtYXRpb24gZnJvbSBkaWZmZXJlbnQgaW5kaXZpZHVhbHMuIFRoaXMgc2VlbXMgdG8gYmUgYSB2ZXJ5IHBvb3JseSByYW4gY29tcG\
FueSB0aGF0IEkgd291bGQgTkVWRVIgdHJ1c3QgdG8gb3ZlcnNlZSBteSBkYXVnaHRlcnMgc2FmZSBob3VzaW5nIGV4cGVyaWVuY2\
UhIEFzIGEgcGFyZW50IGRvIG5vdCBldmVyIGFsbG93IHlvdXIgYWR1bHQgY2hpbGQgdG8gcmVudCBmcm9tIHRoaXMgY29tcGFueS\
EgVGhleSB3aWxsIG5vdCBiZSB0YWtlbiBjYXJlIG9mISIKCnZhZGVyX3Njb3JlMV9TT0wgPSBhbmFseXplci5wb2xhcml0eV9zY2\
9yZXMoc190ZXN0KQoKdGMuYXNzZXJ0RXF1YWwodmFkZXJfc2NvcmUxLCB2YWRlcl9zY29yZTFfU09MKQ==')
eval(compile(_64, '<string>', 'exec'))
analyzer.polarity_scores() on a DataFrame column
To apply VADER to a pandas Series of strings, pass analyzer.polarity_scores to Series.apply(). Below is an example that finds the polarity scores for each row of a DataFrame.
▶️ Run the code cell below to create a DataFrame with one string column.
sample_text_values = [
    "VADER is smart, handsome, and funny.", # positive sentence example
    "VADER is smart, handsome, and funny!", # punctuation emphasis handled correctly (sentiment intensity adjusted)
    "VADER is very smart, handsome, and funny.", # booster words handled correctly (sentiment intensity adjusted)
    "VADER is VERY SMART, handsome, and FUNNY.", # emphasis for ALLCAPS handled
    "VADER is VERY SMART, handsome, and FUNNY!!!", # combination of signals - VADER appropriately adjusts intensity
    "VADER is VERY SMART, uber handsome, and FRIGGIN FUNNY!!!", # booster words & punctuation make this close to ceiling for score
    "VADER is not smart, handsome, nor funny.", # negation sentence example
    "The book was good.", # positive sentence
    "At least it isn't a horrible book.", # negated negative sentence with contraction
    "The book was only kind of good.", # qualified positive sentence is handled correctly (intensity adjusted)
    "The plot was good, but the characters are uncompelling and the dialog is not great.", # mixed negation sentence
    "Today SUX!", # negative slang with capitalization emphasis
    "Today only kinda sux! But I'll get by, lol", # mixed sentiment example with slang and contrastive conjunction "but"
    "Make sure you :) or :D today!", # emoticons handled
    "Catch utf-8 emoji such as such as 💘 and 💋 and 😁", # emojis handled
    "Not bad at all" # Capitalized negation
]
df_sample = pd.DataFrame({
    'text': sample_text_values
})
df_sample
▶️ Run the code cell below to calculate the polarity scores for each row and convert the output into a DataFrame.
# for each row of the text column, run the polarity_scores() function
vader_scores = df_sample['text'].apply(analyzer.polarity_scores)
# convert the Series of dictionaries into a DataFrame
df_vader_scores = pd.DataFrame(
    vader_scores.tolist()
)
df_vader_scores.head()
▶️ Run the code cell below to concatenate df_sample and df_vader_scores horizontally.
df_merged = pd.concat([df_sample, df_vader_scores], axis=1)
df_merged
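The apply, tolist, and concat steps above work because df_sample carries the default 0, 1, 2, ... index, which matches the index pandas assigns to the new scores DataFrame. The sketch below illustrates what to watch for when the index is not the default, for example after filtering rows. toy_scores is a hypothetical stand-in for analyzer.polarity_scores() so that the example runs without vaderSentiment; its numbers are made up.

```python
import pandas as pd


def toy_scores(text):
    """Hypothetical stand-in for analyzer.polarity_scores().

    It returns a dictionary with the same four keys so that the sketch
    runs without vaderSentiment; the numbers carry no meaning.
    """
    compound = 0.5 if 'good' in text else -0.5
    return {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': compound}


df_toy = pd.DataFrame({'text': ['good book', 'dull plot', 'good cast']})

# Keep only some rows: the index is now [0, 2], not [0, 1].
df_subset = df_toy[df_toy['text'].str.contains('good')]

scores = df_subset['text'].apply(toy_scores)

# Reuse the original index so that pd.concat(axis=1) pairs each text
# with its own scores instead of aligning on mismatched labels.
df_scores = pd.DataFrame(scores.tolist(), index=df_subset.index)

df_result = pd.concat([df_subset, df_scores], axis=1)
print(df_result)
```

Without index=df_subset.index, the scores would be labeled 0 and 1, and the concatenation would align on those labels and leave missing values in the combined rows.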
df_tweets = pd.read_csv('https://github.com/bdi475/datasets/raw/main/fake-tweets-beautyrite.csv')
df_tweets_backup = df_tweets.copy()
df_tweets.head(10)
Using df_tweets, calculate the polarity scores of each row.
Concatenate the scores horizontally with df_tweets.
Store the result in df_merged.
| | text | neg | neu | pos | compound |
---|---|---|---|---|---|
0 | every time I go into Beautyrite everybody’s makeup look bad, what’s that about? | 0.226 | 0.774 | 0.000 | -0.5423 |
1 | Got in my car to go to the gym but it took me to Beautyrite. | 0.000 | 1.000 | 0.000 | 0.0000 |
2 | Why don’t airports have helpful stores like Beautyrite? Why do we need 5 different burger restaurants? | 0.000 | 0.712 | 0.288 | 0.6868 |
3 | Y’all go to Beautyrite’s IG and look at their awful spring lineup 😭😭😭😭😭😭 | 0.484 | 0.516 | 0.000 | -0.9666 |
4 | Today I learned that Acme Makeups, BeautyRite, Wonka Factory are all owned by a single company. The illusion of choice... | 0.000 | 1.000 | 0.000 | 0.0000 |
# YOUR CODE BEGINS
# YOUR CODE ENDS
df_merged.head(5)
# DO NOT CHANGE THE CODE IN THIS CELL
_test_case = 'calculate-vader-scores-of-a-text-column'
_points = 1
_obfuscate = True
import base64 as _b64
_64 = _b64.b64decode('IyBETyBOT1QgQ0hBTkdFIFRIRSBDT0RFIEJFTE9XCgoKdmFkZXJfc2NvcmVzX1NPTCA9IGRmX3R3ZW\
V0c19iYWNrdXBbJ3RleHQnXS5hcHBseShhbmFseXplci5wb2xhcml0eV9zY29yZXMpCgpkZl92YWRlcl9zY29yZXNfU09MID0gcG\
QuRGF0YUZyYW1lKAogICAgdmFkZXJfc2NvcmVzX1NPTC50b2xpc3QoKQopCgpkZl9tZXJnZWRfU09MID0gcGQuY29uY2F0KFtkZl\
90d2VldHNfYmFja3VwLCBkZl92YWRlcl9zY29yZXNfU09MXSwgYXhpcz0xKQpwZC50ZXN0aW5nLmFzc2VydF9mcmFtZV9lcXVhbC\
hkZl9tZXJnZWQuc29ydF92YWx1ZXMoZGZfbWVyZ2VkLmNvbHVtbnMudG9fbGlzdCgpKS5yZXNldF9pbmRleChkcm9wPVRydWUpLA\
ogICAgICAgICAgICAgICAgICAgICAgICAgICAgICBkZl9tZXJnZWRfU09MLnNvcnRfdmFsdWVzKGRmX21lcmdlZF9TT0wuY29sdW\
1ucy50b2xpc3QoKSkucmVzZXRfaW5kZXgoZHJvcD1UcnVlKSk=')
eval(compile(_64, '<string>', 'exec'))
Using df_merged, calculate the average compound score.
Store the result in average_compound_score.
# YOUR CODE BEGINS
# YOUR CODE ENDS
print(f"The average compound score is {round(average_compound_score, 4)}")
# DO NOT CHANGE THE CODE IN THIS CELL
_test_case = 'calculate-average-compound-score'
_points = 1
_obfuscate = True
import base64 as _b64
_64 = _b64.b64decode('IyBETyBOT1QgQ0hBTkdFIFRIRSBDT0RFIEJFTE9XCgoKYXZlcmFnZV9jb21wb3VuZF9zY29yZV9TT0\
wgPSBkZl9tZXJnZWRfU09MWydjb21wb3VuZCddLm1lYW4oKQoKdGMuYXNzZXJ0QWxtb3N0RXF1YWwoYXZlcmFnZV9jb21wb3VuZF\
9zY29yZSwgYXZlcmFnZV9jb21wb3VuZF9zY29yZV9TT0wp')
eval(compile(_64, '<string>', 'exec'))
Using df_merged, filter the top 10 positive tweets (ranked by the compound column).
Store the result in df_top10_positive.
Sort df_top10_positive by compound in descending order.
| | text | neg | neu | pos | compound |
---|---|---|---|---|---|
37 | My BFF gave me a Beautyrite gift card for my birthday 🥰 i love getting gift cards. | 0.000 | 0.406 | 0.594 | 0.9717 |
24 | IN LOVE WITH BEAUTY2LIPS 🤩🥰 | 0.000 | 0.364 | 0.636 | 0.9222 |
19 | Got my BeautyBase merch I am happy I got the first batch 🥰 | 0.000 | 0.56 | 0.44 | 0.9001 |
6 | Beautyrite is 20% off today (all purchases). 30% off Beautyrite collection.. if you buy 2 BeautyBase products you get a free bag! 🥰 | 0.000 | 0.688 | 0.312 | 0.8977 |
52 | I would like a girls winter date where we shop at Beautyrite. We have tea, read books together & excitingly share thoughts. Then we go home & watch Harry Potter in comfy clothes and blankets. Pls & ty | 0.000 | 0.742 | 0.258 | 0.8591 |
5 | My kid has entered the “how many chemicals and colors can I put on my face in one day” phase. Please help, like in the form of Beautyrite gift cards? | 0.000 | 0.714 | 0.286 | 0.8555 |
11 | Linda has 50 items in her Beautyrite cart😃 anyone wanna help a brotha out cuz ima be broke after the holidays😃 | 0.072 | 0.665 | 0.263 | 0.7579 |
55 | used my beautychai skincare today that i got via my favorite Beautyrite employee feeling rlly good abt beauty2lips | 0.000 | 0.67 | 0.33 | 0.7506 |
43 | So there's another 20% off Beautyrite... what was the point of the VIP sale?!?! | 0.000 | 0.723 | 0.277 | 0.7164 |
2 | Why don’t airports have helpful stores like Beautyrite? Why do we need 5 different burger restaurants? | 0.000 | 0.712 | 0.288 | 0.6868 |
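As a general illustration of ranking rows by a score column, the sketch below sorts a small toy DataFrame and keeps the first few rows. The data and the names df_toy, df_top2, and df_bottom2 are made up for this example and are not part of the tweet dataset.

```python
import pandas as pd

# Toy scores (made-up values, not taken from the tweet dataset).
df_toy = pd.DataFrame({
    'text': ['a', 'b', 'c', 'd'],
    'compound': [0.10, 0.90, -0.40, 0.55],
})

# Sort by compound from highest to lowest, then keep the first two rows.
df_top2 = df_toy.sort_values('compound', ascending=False).head(2)

# Flipping the sort order gives the lowest scores instead.
df_bottom2 = df_toy.sort_values('compound', ascending=True).head(2)

print(df_top2)
print(df_bottom2)
```

Sorting keeps each row's original index label, which is why a table of top-ranked rows can show non-consecutive row numbers.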
# YOUR CODE BEGINS
# YOUR CODE ENDS
df_top10_positive
# DO NOT CHANGE THE CODE IN THIS CELL
_test_case = 'find-top-10-positive-tweets'
_points = 1
_obfuscate = True
import base64 as _b64
_64 = _b64.b64decode('IyBETyBOT1QgQ0hBTkdFIFRIRSBDT0RFIEJFTE9XCgoKZGZfdG9wMTBfcG9zaXRpdmVfU09MID0gZG\
ZfbWVyZ2VkX1NPTC5zb3J0X3ZhbHVlcygnY29tcG91bmQnLCBhc2NlbmRpbmc9RmFsc2UpLmhlYWQoMTApCnBkLnRlc3RpbmcuYX\
NzZXJ0X2ZyYW1lX2VxdWFsKGRmX3RvcDEwX3Bvc2l0aXZlLnJlc2V0X2luZGV4KGRyb3A9VHJ1ZSksCiAgICAgICAgICAgICAgIC\
AgICAgICAgICAgICAgIGRmX3RvcDEwX3Bvc2l0aXZlX1NPTC5yZXNldF9pbmRleChkcm9wPVRydWUpKQ==')
eval(compile(_64, '<string>', 'exec'))
Using df_merged, filter the top 10 negative tweets (ranked by the compound column).
Store the result in df_top10_negative.
Sort df_top10_negative by compound in ascending order.
| | text | neg | neu | pos | compound |
---|---|---|---|---|---|
3 | Y’all go to Beautyrite’s IG and look at their awful spring lineup 😭😭😭😭😭😭 | 0.484 | 0.516 | 0.000 | -0.9666 |
58 | I feel so bad that my girls Beautyrite package got stolen. She worked so hard for that . . . fucking thieves | 0.422 | 0.578 | 0.000 | -0.9209 |
59 | 😭😭You can DoorDash Beautyrite now??😭😭 | 0.515 | 0.485 | 0.000 | -0.9146 |
28 | Ok but the Beautyrite birthday gifts are lowkey a curse cause why am I about to spend $100 for a face cream 😭 | 0.274 | 0.677 | 0.049 | -0.8519 |
46 | I BOUGHT THE WRONG BEAUTYRITE LIPSTICK SHADEEE AND I BASICALLY WASTED MONEY OML IN MY DEFENSE I WAS IN A RUSH THAT LIPSTICK MAKES ME LOOK PALE AF THIS IS WHY I ONLY BUY EYELINER AND MASCARA EVERY SINGLE TIME🙄 | 0.152 | 0.804 | 0.044 | -0.7603 |
35 | I have an interview at Beautyrite today tell my why I had a dream I didn't get the job😂😭😭😭 | 0.255 | 0.614 | 0.132 | -0.6597 |
14 | why are Beautyrite employees so rude like damn | 0.468 | 0.342 | 0.190 | -0.6189 |
13 | cried in the middle of Beautyrite because the workers wouldn’t leave me alone 🤩 | 0.345 | 0.655 | 0.000 | -0.5859 |
49 | I hate Beautyrite, I feel out of place | 0.346 | 0.654 | 0.000 | -0.5719 |
0 | every time I go into Beautyrite everybody’s makeup look bad, what’s that about? | 0.226 | 0.774 | 0.000 | -0.5423 |
# YOUR CODE BEGINS
# YOUR CODE ENDS
df_top10_negative
# DO NOT CHANGE THE CODE IN THIS CELL
_test_case = 'find-top-10-negative-tweets'
_points = 1
_obfuscate = True
import base64 as _b64
_64 = _b64.b64decode('IyBETyBOT1QgQ0hBTkdFIFRIRSBDT0RFIEJFTE9XCgoKZGZfdG9wMTBfbmVnYXRpdmVfU09MID0gZG\
ZfbWVyZ2VkX1NPTC5zb3J0X3ZhbHVlcygnY29tcG91bmQnKS5oZWFkKDEwKQpwZC50ZXN0aW5nLmFzc2VydF9mcmFtZV9lcXVhbC\
hkZl90b3AxMF9uZWdhdGl2ZS5yZXNldF9pbmRleChkcm9wPVRydWUpLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICBkZl\
90b3AxMF9uZWdhdGl2ZV9TT0wucmVzZXRfaW5kZXgoZHJvcD1UcnVlKSk=')
eval(compile(_64, '<string>', 'exec'))