Your Turn: Build A Pipeline
- Combine Your Knowledge of the Website, requests, and bs4
- Automate Your Scraping Process Across Multiple Pages
- Generalize Your Code For Varying Searches
- Target & Save Specific Information You Want
Your Tasks:
- Scrape the first 100 available search results
- Generalize your code to allow searching for different locations and job titles
- Pick out each job's URL, title, and location
- Save the results to a file
In [1]:
import requests
from bs4 import BeautifulSoup
Part 1: Inspect
- How do the URLs change when you navigate to the next results page?
- How do the URLs change when you use a different location and/or job title search?
- Which HTML elements contain the link, title, and location of each job?
In [ ]:
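A minimal sketch of what this inspection often reveals: many job boards encode the search terms and the page offset in URL query parameters. The base URL, the parameter names (`q`, `l`, `start`), and the step size of 10 below are assumptions for illustration only, not the actual site's interface; substitute whatever your own inspection turns up.

```python
from urllib.parse import urlencode

# Hypothetical URL pattern: the parameter names ("q", "l", "start")
# and the page offset step of 10 are assumptions for illustration only.
BASE_URL = "https://example-job-board.com/jobs"

for page in range(3):
    params = {"q": "python developer", "l": "new york", "start": page * 10}
    print(f"{BASE_URL}?{urlencode(params)}")
```

Printing the three constructed URLs side by side makes the pagination pattern visible: only the offset parameter changes from page to page.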
Part 2: Scrape
- Build the code to fetch the first 100 search results. This means you will need to automatically navigate through multiple results pages; see the sketch after the cell below.
- Write functions that let you specify the job title, location, and number of results as arguments.
In [ ]:
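One possible sketch, again assuming the hypothetical URL pattern from Part 1 (the base URL, parameter names, and 10-results-per-page step are assumptions, not the real site's interface):

```python
import requests

BASE_URL = "https://example-job-board.com/jobs"  # hypothetical
RESULTS_PER_PAGE = 10  # assumed page size


def fetch_results(job_title, location, num_results=100):
    """Fetch the raw HTML of enough results pages to cover num_results."""
    pages = []
    for start in range(0, num_results, RESULTS_PER_PAGE):
        response = requests.get(
            BASE_URL,
            params={"q": job_title, "l": location, "start": start},
        )
        response.raise_for_status()  # Fail loudly on HTTP errors
        pages.append(response.text)
    return pages
```

With these defaults, `fetch_results("python developer", "new york")` would request ten pages of ten results each.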
Part 3: Parse
- Sift through your HTML soup to pick out only each job's title, link, and location
- Structure the results in a readable format (e.g., JSON)
- Save the results to a file
In [ ]:
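A sketch of the parsing step, building on the `fetch_results()` sketch from Part 2. The CSS selectors (`div.result`, `h2.title a`, `span.location`) are placeholders: use whatever element names and class attributes your inspection in Part 1 revealed.

```python
import json

from bs4 import BeautifulSoup


def parse_jobs(html):
    """Extract title, link, and location from one results page.

    The selectors below ("div.result", "h2.title a", "span.location")
    are placeholders -- swap in the ones your own inspection revealed.
    """
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for result in soup.select("div.result"):
        link = result.select_one("h2.title a")
        location = result.select_one("span.location")
        if link is None or location is None:
            continue  # Skip results that do not match the expected layout
        jobs.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "location": location.get_text(strip=True),
        })
    return jobs


# Tie the pipeline together and save everything to a JSON file
all_jobs = []
for page in fetch_results("python developer", "new york"):
    all_jobs.extend(parse_jobs(page))

with open("jobs.json", "w", encoding="utf-8") as f:
    json.dump(all_jobs, f, indent=2)
```

JSON is a convenient target format here because each job is naturally a small dictionary, and `json.dump()` with `indent=2` keeps the output file human-readable.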