#!/usr/bin/env python
# coding: utf-8

# # Introduction to Jupyter Notebooks and Pandas
#
# ## What is a Jupyter Notebook? 📓
#
# - A [Jupyter](https://jupyter.org/) notebook is a document that can contain live code with results, visualizations, and rich text.
# - It is widely used in data science and analytics.
# - A Jupyter notebook has a `.ipynb` file extension (e.g., `my_notebook.ipynb`).
# - A Jupyter notebook is a list of cells.
#
# ### How do you run a Jupyter notebook?
#
# A Jupyter notebook can be run in one of the following Jupyter environments.
#
# 1. Jupyter Notebook - the original web application for creating and sharing computational documents
# 2. JupyterLab - a web-based development environment for notebooks (considered a newer version of Jupyter Notebook)
# 3. [Google Colab](https://colab.research.google.com/) - Google's cloud notebook platform built on top of the [Jupyter](https://jupyter.org/) environment
#
# The first two environments require installation on your local machine or a server. We will use Google Colab because it runs in the cloud, with nothing to install locally.
#
# ### Types of cells
#
# Every cell in a Jupyter notebook is of a specific type. The list of supported types varies by Jupyter environment.
#
# Google Colab supports two types of cells.
#
# 1. Code cell
# 2. Text cell (also known as a Markdown cell)
#
# The cell below is a *code* cell. It contains a block of executable code.
#
# Run the code by clicking on the cell below and then clicking the "Run" icon (▶).

# In[ ]:


print(10 + 20)


# ▶️ Run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder.

# In[2]:


import unittest
tc = unittest.TestCase()


# ---
#
# ### 🎯 Challenge 1: Find the sum of a list
#
# #### 👇 Tasks
#
# - ✔️ Complete the code cell below to find the sum of all values in `my_list`.
# - ✔️ Store the result in a new variable named `result`.

# In[3]:


my_list = [11, 20, 52, 91, 90, 75, 74, 20, 21, 10, 14]

### BEGIN SOLUTION
result = 0
for num in my_list:
    result = result + num
### END SOLUTION

print(result)


# #### 🧭 Check Your Work
#
# - Once you're done, run the code cell below to test correctness.
# - ✔️ If the code cell runs without an error, you're good to move on.
# - ❌ If the code cell throws an error, go back and fix any incorrect parts.

# In[4]:


# Challenge 1 Autograder
import unittest
tc = unittest.TestCase()

tc.assertEqual(result, 478)


# ---
#
# ## Introduction to Pandas
#
# Pandas is a Python *library* for data manipulation and analysis. Although it is now widely used in data-related programming, it was initially developed for financial analysis at [AQR Capital Management](https://www.aqr.com/).
#
# Note: In programming, a *library* is a collection of functions (and other data) that others have already written for you.
#
# Pandas is popular for many reasons:
#
# 1. 🏃🏿‍♀️ It's fast (in most cases where the dataset fits in your computer's memory).
# 2. 🪒 It supports most of the features required for data manipulation.
# 3. 💡 Write less code. Get more done.

# ---
#
# ### 🎯 Challenge 2: Import packages
#
# #### 👇 Tasks
#
# - ✔️ Import the following Python packages.
#     1. `pandas`: Use alias `pd`.
#     2. `numpy`: Use alias `np`.

# In[5]:


### BEGIN SOLUTION
import pandas as pd
import numpy as np
### END SOLUTION


# #### 🧭 Check Your Work
#
# - Once you're done, run the code cell below to test correctness.
# - ✔️ If the code cell runs without an error, you're good to move on.
# - ❌ If the code cell throws an error, go back and fix any incorrect parts.

# In[6]:


# Challenge 2 Autograder
import sys
tc.assertTrue("pd" in globals(), "Check whether you have correctly imported Pandas with an alias.")
tc.assertTrue("np" in globals(), "Check whether you have correctly imported NumPy with an alias.")


# ---
#
# ### It all starts with a `Series`...
#
# The basic building block of Pandas is a `Series`. A `Series` is like a list, but with many more features.
#
# You can create a `Series` by passing a list of values to `pd.Series()`.

# In[7]:


s = pd.Series([1, 2, 3, np.nan, 5, 6])
s


# ### A few things to note here
#
# 1. A `Series` looks similar to a Python `list`.
# 2. The last line of the printed output tells us the data type of the values in the `Series` (`dtype: float64`).
# - What the heck is `np.nan`?
#     - It is used to indicate a "missing value".
#     - `np.nan` is NOT the same as `0`.
#
# ### Differences between a list and a Series

# In[8]:


my_list = [1, 2, 3, 4]

print(type(my_list))
display(my_list * 2)


# In[9]:


my_series = pd.Series([1, 2, 3, 4])

print(type(my_series))
display(my_series * 2)


# What happens when you multiply a Python `list` by the number `2`? It repeats the elements.
#
# How about a `Series`? It multiplies each element by `2`!

# ---
#
# ### 🎯 Challenge 3: Create a new `Series`
#
# #### 👇 Tasks
#
# - ✔️ Create a new Pandas `Series` named `my_series` with the following three values: `10`, `20`, `30`.
#
# #### 🚀 Hint
#
# The code below creates a new Pandas `Series` with the values `1` and `2`.
#
# ```python
# my_new_series = pd.Series([1, 2])
# ```

# In[10]:


### BEGIN SOLUTION
my_series = pd.Series([10, 20, 30])
### END SOLUTION

my_series


# #### 🧭 Check Your Work
#
# - Once you're done, run the code cell below to test correctness.
# - ✔️ If the code cell runs without an error, you're good to move on.
# - ❌ If the code cell throws an error, go back and fix any incorrect parts.

# In[11]:


# Challenge 3 Autograder
pd.testing.assert_series_equal(my_series, pd.Series([1, 2, 3]) * 10)


# ---
#
# ### Using `Series` methods
#
# A Pandas `Series` is similar to a Python `list`. However, a `Series` provides many methods (functions that belong to the object) for you to use.
#
# As an example, `num_reviews.mean()` will return the average number of reviews.

# In[12]:


reviews_count = [12715, 2274, 2771, 3952, 528, 2766, 724]

num_reviews = pd.Series(reviews_count)

print(num_reviews)
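

# To see a few `Series` methods in action, here is a minimal sketch that calls `mean()`, `sum()`, `max()`, and `min()` on the same review counts. The variable names `average_reviews`, `total_reviews`, `most_reviews`, and `fewest_reviews` are illustrative only and are not part of the original notebook. The list is repeated so that the cell runs on its own.

```python
import pandas as pd

# Same review counts as in the cell above, repeated so this sketch is self-contained.
reviews_count = [12715, 2274, 2771, 3952, 528, 2766, 724]
num_reviews = pd.Series(reviews_count)

# Each method summarizes the whole Series as a single number.
# The variable names below are illustrative, not from the original notebook.
average_reviews = num_reviews.mean()  # arithmetic mean of the values
total_reviews = num_reviews.sum()     # sum of the values
most_reviews = num_reviews.max()      # largest value
fewest_reviews = num_reviews.min()    # smallest value

print("Average:", average_reviews)
print("Total:", total_reviews)
print("Max:", most_reviews)
print("Min:", fewest_reviews)
```

# A plain Python `list` has no `mean()` method, so the same calculation would need a loop (as in Challenge 1) or a separate helper function.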