wordcloud package
!pip install wordcloud
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/ Requirement already satisfied: wordcloud in /usr/local/lib/python3.7/dist-packages (1.5.0) Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (from wordcloud) (7.1.2) Requirement already satisfied: numpy>=1.6.1 in /usr/local/lib/python3.7/dist-packages (from wordcloud) (1.21.6)
!pip install --user requests
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/ Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (2.23.0) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests) (2022.6.15) Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests) (3.0.4) Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests) (2.10) Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests) (1.24.3)
Requests allows you to send HTTP/1.1 requests. You can add headers, form data, multi-part files, and parameters with simple Python dictionaries, and access the response data in the same way.
import requests  # to download files through a URL
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
type(res)
requests.models.Response
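The description above says headers, form data, and query parameters are passed as plain dictionaries. Here is a minimal offline sketch of that, assuming only the public `requests.Request(...).prepare()` API. It builds the request without sending it, so you can inspect the encoded URL, headers, and body. The URLs, parameter names, and header value are placeholders, not part of the original notebook.

```python
import requests

# Query-string parameters and headers as plain dicts (placeholder values)
get_req = requests.Request(
    "GET",
    "https://example.com/search",
    params={"q": "romeo", "page": 2},
    headers={"User-Agent": "wordcloud-notebook"},
).prepare()  # builds the request; nothing is sent over the network

# Form data as a dict is URL-encoded into the request body
post_req = requests.Request(
    "POST",
    "https://example.com/form",
    data={"name": "Romeo"},
).prepare()

print(get_req.url)                       # parameters appended to the URL
print(get_req.headers["User-Agent"])     # the custom header
print(post_req.body)                     # form-encoded body
print(post_req.headers["Content-Type"])  # set automatically for dict data
```

To actually send such a request, pass the same keyword arguments (`params=`, `headers=`, `data=`) to `requests.get` or `requests.post`.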
# check for errors: True means the server answered with 200 OK
res.status_code == requests.codes.ok
True
len(res.text) #178,000+ characters long
178978
print(res.text[:0])
print(res.text[:1])
T
print(res.text[0])
T
print(res.text[:2])
Th
print(res.text[:3])
The
print(res.text[:100])
The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare This eBook is for the use
print(res.text[:500])
The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org/license Title: Romeo and Juliet Author: William Shakespeare Posting Date: May 25, 2012 [EBook #1112] Release Date: November, 1997 [Etext #1112] Language: Eng
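The slices above follow standard Python rules. `text[:0]` is empty, which is why the first `print` outputs a blank line, and `text[0]` and `text[:1]` give the same character. The two differ on an empty string: slicing is safe there, but indexing raises `IndexError`. The short sketch below uses a sample string in place of `res.text` so it runs offline.

```python
text = "The Project Gutenberg EBook of Romeo and Juliet"

print(repr(text[:0]))     # an empty slice
print(text[0], text[:1])  # the same first character, by index and by slice

empty = ""
print(repr(empty[:1]))    # slicing past the end just returns ''

index_error = None
try:
    empty[0]              # indexing past the end fails
except IndexError as err:
    index_error = err
    print("IndexError:", err)
```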
res.raise_for_status()
# check for errors:
# raise_for_status() raises an HTTPError exception if the request failed (4xx or 5xx status); otherwise it returns None.
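`raise_for_status()` raises an exception instead of returning one, so the usual pattern is to wrap it in `try`/`except`. The offline sketch below uses a hypothetical helper, `fake_response`, that builds a `requests.models.Response` by hand by setting `status_code`, `reason`, and `url` directly. This shows both outcomes without a network call. In the notebook you would call `res.raise_for_status()` on the real response instead.

```python
import requests

def fake_response(status_code, reason):
    """Hypothetical helper: build a Response offline, without any network call."""
    resp = requests.models.Response()
    resp.status_code = status_code
    resp.reason = reason
    resp.url = "https://example.com/rj.txt"  # placeholder URL
    return resp

ok = fake_response(200, "OK")
print(ok.status_code == requests.codes.ok)  # requests.codes.ok is 200
ok_result = ok.raise_for_status()           # nothing raised; returns None

error_message = None
try:
    fake_response(404, "Not Found").raise_for_status()
except requests.exceptions.HTTPError as err:
    error_message = str(err)                # e.g. "404 Client Error: ..."
    print("HTTPError:", error_message)
```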
playFile = open('RomeoAndJuliet.txt', 'wb')
# 'wb' writes binary data instead of text data,
# which preserves the text's Unicode encoding (see Automate the Boring Stuff with Python)
# https://www.geeksforgeeks.org/response-iter_content-python-requests/
for chunk in res.iter_content(200000):
    playFile.write(chunk)  # write the response up to 200,000 bytes at a time
playFile.close()
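You can also write the same download with a `with` block, which closes the file even if a write fails. The sketch below uses a hypothetical helper, `save_chunks`. A short list of byte strings stands in for `res.iter_content(200000)` so the example runs offline.

```python
import os
import tempfile
from typing import Iterable

def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Hypothetical helper: write byte chunks to path in binary mode, return bytes written."""
    total = 0
    with open(path, "wb") as fh:  # closed automatically, even on error
        for chunk in chunks:
            if chunk:             # skip empty keep-alive chunks
                total += fh.write(chunk)
    return total

# Offline usage: this list stands in for res.iter_content(200000)
demo_path = os.path.join(tempfile.gettempdir(), "chunks_demo.txt")
written = save_chunks([b"O Romeo, ", b"Romeo! ", "wherefore art thou".encode("utf-8")], demo_path)
print(written)
```

With a real response, call `save_chunks(res.iter_content(200000), 'RomeoAndJuliet.txt')`.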
with open("RomeoAndJuliet.txt", 'r') as fh:
    filedata = fh.read()
print(type(filedata))
print(len(filedata))
print('---------------')
print(filedata[:500])
<class 'str'> 174126 --------------- The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org/license Title: Romeo and Juliet Author: William Shakespeare Posting Date: May 25, 2012 [EBook #1112] Release Date: November, 1997 [Etext #1112] Language: English *** STAR
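The file read back from disk has 174,126 characters, but `res.text` had 178,978. Line endings are one likely contributor, though this is an assumption that has not been checked against the downloaded file. If the file uses Windows-style `\r\n`, opening it in text mode translates each `\r\n` to `\n`, which drops one character per line. The self-contained sketch below shows that behavior with a temporary file. Passing `newline=""` to `open()` turns the translation off.

```python
import os
import tempfile

raw = b"Two households, both alike in dignity,\r\nIn fair Verona, where we lay our scene,\r\n"
path = os.path.join(tempfile.gettempdir(), "crlf_demo.txt")
with open(path, "wb") as fh:
    fh.write(raw)

# Default text mode uses universal newlines: each \r\n becomes \n
with open(path, "r", encoding="utf-8") as fh:
    text = fh.read()

# newline="" reads the line endings unchanged
with open(path, "r", encoding="utf-8", newline="") as fh:
    untranslated = fh.read()

print(len(raw), len(text), len(untranslated))
```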
# Library to build the word cloud
from wordcloud import WordCloud, STOPWORDS
stopwords = set(STOPWORDS)
# Library to plot the word cloud
import matplotlib.pyplot as plt
# Generate the word cloud, keeping at most 100 words
wordcloud = WordCloud(stopwords=stopwords, max_words=100).generate(filedata)
# Plot the word cloud
plt.figure(figsize=(10, 10))
plt.imshow(wordcloud)
# Hide the axis ticks and labels
plt.axis("off")
plt.show()
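Roughly speaking, `generate()` splits the text into words, drops stopwords, counts what remains, and keeps the `max_words` most frequent. The sketch below approximates that counting step using only the standard library. It is a simplified stand-in, not WordCloud's actual tokenizer, which also handles plurals and two-word collocations. `top_words` is a hypothetical helper. Its dict output could be passed to `WordCloud.generate_from_frequencies()` to draw a cloud from your own counts.

```python
import re
from collections import Counter

def top_words(text, stopwords, max_words=100):
    """Hypothetical helper: count lowercase words not in stopwords, keep the top max_words."""
    stop = {w.lower() for w in stopwords}
    words = re.findall(r"[a-z']+", text.lower())  # crude word split
    counts = Counter(w for w in words if w not in stop and len(w) > 1)
    return dict(counts.most_common(max_words))

sample = "Romeo, Romeo! wherefore art thou Romeo? Deny thy father and refuse thy name."
result = top_words(sample, {"art", "thou", "and"}, max_words=3)
print(result)
```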
# Add more words to ignore
stopwords.update(["many", "go", "want", "value", "will", "come", "give", "Nurse", "one", "now", "yet", "let"])
# Regenerate with the updated stopwords, still limited to 100 words, on a white background
wordcloud = WordCloud(stopwords=stopwords, max_words=100,
                      background_color="white").generate(filedata)
# Plot the word cloud
plt.figure(figsize=(10, 10))
plt.imshow(wordcloud)
# Hide the axis ticks and labels
plt.axis("off")
plt.show()
!pip install pillow
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/ Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (7.1.2)
pwd
'/content'
# Import required libraries
import numpy as np
from PIL import Image
# Use a circle as the mask (the commented-out line shows how to load an image file instead)
#char_mask = np.array(Image.open(""))
# Build the circle with NumPy: pixels farther than 130 from the center (150, 150)
# become 255 (masked out); pixels inside stay 0 (free for words)
x, y = np.ogrid[:300, :300]
mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
mask = 255 * mask.astype(int)
# Generate the word cloud inside the circle, outlined in yellow
wordcloud = WordCloud(background_color="black", contour_width=0.5, contour_color="yellow", mask=mask).generate(filedata)
# Plot the word cloud
plt.figure(figsize=(8, 8))
plt.imshow(wordcloud)
# Hide the axis ticks and labels
plt.axis("off")
plt.show()
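The mask follows the wordcloud convention that pure white (255) pixels are masked out and every other pixel is free for drawing. The sketch below wraps the same NumPy construction in a reusable function. `circle_mask` and its defaults (size 300 and radius 130, taken from the cell above) are illustrative. The `uint8` dtype is an assumption, chosen because it matches the 0 to 255 range of a mask loaded from an image.

```python
import numpy as np

def circle_mask(size=300, radius=130):
    """Hypothetical helper: 0 inside the circle (drawable), 255 outside (masked out)."""
    center = size // 2
    x, y = np.ogrid[:size, :size]  # column and row index grids that broadcast together
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)

mask = circle_mask()
print(mask.shape, mask[150, 150], mask[0, 0])
```

You would use it as `WordCloud(mask=circle_mask(), background_color="black").generate(filedata)`.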