Working with Columns¶

Add/rename/drop columns

▶️ First, run the code cell below to import unittest, a module used for 🧭 Check Your Work sections and the autograder.

In [1]:

import unittest
tc = unittest.TestCase()

👇 Tasks¶

✔️ Import the following Python packages.
1. pandas: Use alias pd.
2. numpy: Use alias np.

In [2]:

### BEGIN SOLUTION
import pandas as pd
import numpy as np
### END SOLUTION

🧭 Check your work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [3]:

import sys
tc.assertTrue("pd" in globals(), "Check whether you have correctly imported pandas with an alias.")
tc.assertTrue("np" in globals(), "Check whether you have correctly imported NumPy with an alias.")

📌 Load data¶

▶️ Run the code cell below to create a new DataFrame named df_emp.

In [4]:

# DO NOT CHANGE THE CODE IN THIS CELL
df_emp = pd.DataFrame({
    "emp_id": [30, 40, 10, 20],
    "name": ["Colby", "Adam", "Eli", "Dylan"],
    "dept": ["Sales", "Marketing", "Sales", "Marketing"],
    "office_phone": ["(217)123-4500", np.nan, np.nan, "(217)987-6543"],
    "start_date": ["2017-05-01", "2018-02-01", "2020-08-01", "2019-12-01"],
    "salary": [202000, 185000, 240000, 160500]
})

# Used for intermediate checks
df_emp_backup = df_emp.copy()

df_emp

Out[4]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

👉 Sorting by Column(s) Recap¶

You can sort a DataFrame using df.sort_values().

sort_values usage
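
As a minimal sketch of the recap above, the cell below sorts a small toy DataFrame. The names df_demo and df_sorted and the toy data are assumptions made for illustration and are not part of the exercise. It shows that sort_values() returns a new DataFrame, keeps the original index labels, and leaves the source DataFrame untouched.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df_demo = pd.DataFrame({"item": ["b", "c", "a"], "qty": [2, 3, 1]})

# sort_values() sorts in ascending order by default and returns a new DataFrame
df_sorted = df_demo.sort_values("qty")

print(df_sorted["item"].tolist())  # ['a', 'b', 'c']
print(df_sorted.index.tolist())    # [2, 0, 1] (original index labels are kept)
print(df_demo["item"].tolist())    # ['b', 'c', 'a'] (df_demo is unchanged)
```

Because the index labels travel with the rows, the Check Your Work cells call reset_index(drop=True) on both sides before comparing.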

🎯 Challenge 1: Sort by `emp_id` ascending¶

👇 Tasks¶

✔️ Sort df_emp by emp_id in ascending order.
- Store the result to a new variable named df_id_asc.
✔️ df_emp should remain unaltered after your code.

🚀 Hints¶

The code below sorts my_dataframe by some_column in ascending order and stores the sorted DataFrame to a new variable sorted_dataframe.

sorted_dataframe = my_dataframe.sort_values("some_column")

▶️ Run the code cell below to reset your df_emp.

In [5]:

df_emp = df_emp_backup.copy()
df_emp

Out[5]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [6]:

### BEGIN SOLUTION
df_id_asc = df_emp.sort_values("emp_id")
### END SOLUTION

df_id_asc

Out[6]:

	emp_id	name	dept	office_phone	start_date	salary
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [7]:

pd.testing.assert_frame_equal(df_id_asc.reset_index(drop=True),
                              df_emp_backup.sort_values("_".join(["EmP", "iD"]).lower()).reset_index(drop=True))

🎯 Challenge 2: Sort by `emp_id` descending¶

👇 Tasks¶

✔️ Sort df_emp by emp_id in descending order.
- Store the result to a new variable named df_id_desc.
✔️ df_emp should remain unaltered after your code.

🚀 Hints¶

The code below sorts my_dataframe by some_column in descending order and stores the sorted DataFrame to a new variable sorted_dataframe.

sorted_dataframe = my_dataframe.sort_values("some_column", ascending=False)

▶️ Run the code cell below to reset your df_emp.

In [8]:

df_emp = df_emp_backup.copy()
df_emp

Out[8]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [9]:

### BEGIN SOLUTION
df_id_desc = df_emp.sort_values("emp_id", ascending=False)
### END SOLUTION

df_id_desc

Out[9]:

	emp_id	name	dept	office_phone	start_date	salary
1	40	Adam	Marketing	NaN	2018-02-01	185000
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500
2	10	Eli	Sales	NaN	2020-08-01	240000

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [10]:

pd.testing.assert_frame_equal(df_id_desc.reset_index(drop=True),
                              df_emp_backup.sort_values("_".join(["EmP", "iD"]).lower(), ascending=bool(0)).reset_index(drop=True))

🎯 Challenge 3: Sort by `name` ascending¶

👇 Tasks¶

✔️ Sort df_emp by name in ascending order.
- Store the result to a new variable named df_name_asc.
✔️ df_emp should remain unaltered after your code.

🚀 Hints¶

The code below sorts my_dataframe by some_column in ascending order and stores the sorted DataFrame to a new variable sorted_dataframe.

sorted_dataframe = my_dataframe.sort_values("some_column")

▶️ Run the code cell below to reset your df_emp.

In [11]:

df_emp = df_emp_backup.copy()
df_emp

Out[11]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [12]:

### BEGIN SOLUTION
df_name_asc = df_emp.sort_values("name")
### END SOLUTION

df_name_asc

Out[12]:

	emp_id	name	dept	office_phone	start_date	salary
1	40	Adam	Marketing	NaN	2018-02-01	185000
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500
2	10	Eli	Sales	NaN	2020-08-01	240000

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [13]:

pd.testing.assert_frame_equal(df_name_asc.reset_index(drop=True),
                              df_emp_backup.sort_values("".join(["nA", "Me"]).lower()).reset_index(drop=True))

🎯 Challenge 4: Sort by `name` descending¶

👇 Tasks¶

✔️ Sort df_emp by name in descending order.
- Store the result to a new variable named df_name_desc.
✔️ df_emp should remain unaltered after your code.

🚀 Hints¶

The code below sorts my_dataframe by some_column in descending order and stores the sorted DataFrame to a new variable sorted_dataframe.

sorted_dataframe = my_dataframe.sort_values("some_column", ascending=False)

▶️ Run the code cell below to reset your df_emp.

In [14]:

df_emp = df_emp_backup.copy()
df_emp

Out[14]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [15]:

### BEGIN SOLUTION
df_name_desc = df_emp.sort_values("name", ascending=False)
### END SOLUTION

df_name_desc

Out[15]:

	emp_id	name	dept	office_phone	start_date	salary
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [16]:

pd.testing.assert_frame_equal(df_name_desc.reset_index(drop=True),
                              df_emp_backup.sort_values("".join(["nA", "Me"]).lower(), ascending=bool(0)).reset_index(drop=True))

🎯 Challenge 5: Sort by `dept` descending and then by `start_date` ascending¶

👇 Tasks¶

✔️ Sort df_emp by dept in descending order and then by start_date in ascending order for employees within the same department.
✔️ Store the sorted result to a new variable named df_dept_desc_date_asc.
✔️ df_emp should remain unaltered after your code.

🚀 Hints¶

The code below sorts my_dataframe by some_column in ascending order and then by another_column in descending order for rows with the same some_column value. It stores the sorted DataFrame to a new variable named sorted_dataframe.

sorted_dataframe = my_dataframe.sort_values(["some_column", "another_column"], ascending=[True, False])
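
To make the tie-breaking behavior concrete, here is a small sketch on toy data. The DataFrame df_demo and its team and score columns are hypothetical and unrelated to df_emp. The second column only decides the order among rows that share the same value in the first column.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df_demo = pd.DataFrame({
    "team": ["x", "y", "x", "y"],
    "score": [1, 4, 3, 2],
})

# Sort by team (ascending), then by score (descending) within each team
df_sorted = df_demo.sort_values(["team", "score"], ascending=[True, False])

print(df_sorted["team"].tolist())   # ['x', 'x', 'y', 'y']
print(df_sorted["score"].tolist())  # [3, 1, 4, 2]
```

The ascending list must have one entry per sort column, in the same order as the column list.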

▶️ Run the code cell below to reset your df_emp.

In [17]:

df_emp = df_emp_backup.copy()
df_emp

Out[17]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [18]:

### BEGIN SOLUTION
df_dept_desc_date_asc = df_emp.sort_values(["dept", "start_date"], ascending=[False, True])
### END SOLUTION

df_dept_desc_date_asc

Out[18]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
2	10	Eli	Sales	NaN	2020-08-01	240000
1	40	Adam	Marketing	NaN	2018-02-01	185000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [19]:

pd.testing.assert_frame_equal(df_dept_desc_date_asc.reset_index(drop=True),
                              df_emp_backup.sort_values(["".join(["dE", "Pt"]).lower(), "_".join(["sTarT", "dAtE"]).lower()], ascending=[bool(0), bool(1)]).reset_index(drop=True))

🎯 Challenge 6: Sort by `dept` ascending and then by `salary` descending¶

👇 Tasks¶

✔️ Sort df_emp by dept in ascending order and then by salary in descending order.
- Employees within the same department must be sorted by salary in descending order.
- Store the result to a new variable named df_dept_asc_salary_desc.
✔️ df_emp should remain unaltered after your code.

▶️ Run the code cell below to reset your df_emp.

In [20]:

df_emp = df_emp_backup.copy()
df_emp

Out[20]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [21]:

### BEGIN SOLUTION
df_dept_asc_salary_desc = df_emp.sort_values(["dept", "salary"], ascending=[True, False])
### END SOLUTION

df_dept_asc_salary_desc

Out[21]:

	emp_id	name	dept	office_phone	start_date	salary
1	40	Adam	Marketing	NaN	2018-02-01	185000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500
2	10	Eli	Sales	NaN	2020-08-01	240000
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [22]:

pd.testing.assert_frame_equal(df_dept_asc_salary_desc.reset_index(drop=True),
                              df_emp_backup.sort_values(["".join(["dE", "Pt"]).lower(), "sAlArY".lower()], ascending=[bool(1), bool(0)]).reset_index(drop=True))

🎯 Challenge 7: Sort by `salary` descending in-place¶

👇 Tasks¶

✔️ Sort df_emp by salary in descending order in-place.
- Directly update df_emp without creating a new variable.

🚀 Hints¶

The code below sorts my_dataframe by some_column in descending order in-place.

my_dataframe.sort_values("some_column", ascending=False, inplace=True)
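
One detail worth seeing in action: with inplace=True, sort_values() modifies the DataFrame and returns None, so assigning its result to a variable is a common mistake. The sketch below uses a hypothetical toy DataFrame named df_demo to illustrate this.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df_demo = pd.DataFrame({"qty": [2, 3, 1]})

# With inplace=True the DataFrame itself is reordered and the call returns None
result = df_demo.sort_values("qty", ascending=False, inplace=True)

print(result)                   # None
print(df_demo["qty"].tolist())  # [3, 2, 1]
```

Writing df_demo = df_demo.sort_values(..., inplace=True) would therefore replace the DataFrame with None.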

▶️ Run the code cell below to reset your df_emp.

In [23]:

df_emp = df_emp_backup.copy()
df_emp

Out[23]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [24]:

### BEGIN SOLUTION
df_emp.sort_values("salary", ascending=False, inplace=True)
### END SOLUTION

df_emp

Out[24]:

	emp_id	name	dept	office_phone	start_date	salary
2	10	Eli	Sales	NaN	2020-08-01	240000
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [25]:

pd.testing.assert_frame_equal(df_emp.reset_index(drop=True),
                              df_emp_backup.sort_values("".join(["sAl", "aRy"]).lower(), ascending=bool(0)).reset_index(drop=True))

🎯 Challenge 8: Sort by `dept` and `name` both descending in-place¶

👇 Tasks¶

✔️ Sort df_emp by dept and then by name for employees in the same department.
- Sort both columns in descending order, in-place.
- Directly update df_emp without creating a new variable.

🚀 Hints¶

The code below sorts my_dataframe by some_column and another_column in descending order in-place.

my_dataframe.sort_values(["some_column", "another_column"], ascending=[False, False], inplace=True)

▶️ Run the code cell below to reset your df_emp.

In [26]:

df_emp = df_emp_backup.copy()
df_emp

Out[26]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [27]:

### BEGIN SOLUTION
df_emp.sort_values(["dept", "name"], ascending=[False, False], inplace=True)
### END SOLUTION

df_emp

Out[27]:

	emp_id	name	dept	office_phone	start_date	salary
2	10	Eli	Sales	NaN	2020-08-01	240000
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500
1	40	Adam	Marketing	NaN	2018-02-01	185000

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [28]:

pd.testing.assert_frame_equal(df_emp.reset_index(drop=True),
                              df_emp_backup.sort_values(["".join(["dE", "Pt"]).lower(), ("nA" + "Me").lower()], ascending=[bool(0), bool(0)]).reset_index(drop=True))

👉 Renaming Column(s)¶

You can rename a column using df.rename(columns={"name_before": "name_after"}). Similar to many Pandas operations, you can perform a rename operation either in-place or out-of-place.

# Rename a column and return a new DataFrame without modifying the original
# df's column names will remain unchanged
df_renamed = df.rename(columns={"name_before": "name_after"})

# Rename a column and update the original DataFrame
df.rename(columns={"name_before": "name_after"}, inplace=True)
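
The sketch below runs an out-of-place rename on a hypothetical toy DataFrame (df_demo, with made-up column names). It also illustrates a behavior that can hide typos: by default, rename() silently ignores keys that do not match any column. Passing errors="raise" makes pandas raise a KeyError instead.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df_demo = pd.DataFrame({"a": [1], "b": [2]})

# "zzz" is not a column; with the default errors="ignore" it is skipped silently
df_renamed_demo = df_demo.rename(columns={"a": "alpha", "zzz": "omega"})
print(df_renamed_demo.columns.tolist())  # ['alpha', 'b']
print(df_demo.columns.tolist())          # ['a', 'b'] (original is unchanged)

# errors="raise" turns a missing column name into a KeyError
raised = False
try:
    df_demo.rename(columns={"zzz": "omega"}, errors="raise")
except KeyError:
    raised = True
print(raised)  # True
```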

🎯 Challenge 9: Rename `office_phone` to `phone_num`¶

👇 Tasks¶

✔️ Rename office_phone column to phone_num.
✔️ Store the result to a new variable named df_renamed.
✔️ Your original DataFrame (df_emp) should remain unaltered.

🚀 Hints¶

Use the following code to rename col_before column to col_after out-of-place.

renamed_dataframe = my_dataframe.rename(columns={"col_before": "col_after"})

▶️ Run the code cell below to reset your df_emp.

In [29]:

df_emp = df_emp_backup.copy()
df_emp

Out[29]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [30]:

### BEGIN SOLUTION
df_renamed = df_emp.rename(columns={"office_phone": "phone_num"})
### END SOLUTION

df_renamed

Out[30]:

	emp_id	name	dept	phone_num	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [31]:

tc.assertEqual(df_emp.columns.tolist(), df_emp_backup.columns.tolist(), "Did you rename the column in-place? The original DataFrame should not be modified.")
tc.assertEqual(df_renamed.columns.tolist(), ["emp_id", "name", "dept", "phone_num", "start_date", "salary"])

🎯 Challenge 10: Rename `office_phone` to `phone_num` in-place¶

👇 Tasks¶

✔️ Rename office_phone column to phone_num in-place.
- Directly update df_emp without creating a new variable.

🚀 Hints¶

Use the following code to rename col_before column to col_after in-place.

my_dataframe.rename(columns={"col_before": "col_after"}, inplace=True)

▶️ Run the code cell below to reset your df_emp.

In [32]:

df_emp = df_emp_backup.copy()
df_emp

Out[32]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [33]:

### BEGIN SOLUTION
df_emp.rename(columns={"office_phone": "phone_num"}, inplace=True)
### END SOLUTION

df_emp

Out[33]:

	emp_id	name	dept	phone_num	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [34]:

tc.assertEqual(df_emp.columns.tolist(), ["emp_id", "name", "dept", "phone_num", "start_date", "salary"])

🎯 Challenge 11: Rename `name` to `first_name` and `salary` to `base_salary` in-place¶

👇 Tasks¶

✔️ Rename name column to first_name and salary to base_salary in-place.
- Directly update df_emp without creating a new variable.

🚀 Hints¶

Use the following code as a reference.

my_dataframe.rename(columns={"col_before1": "col_after1", "col_before2": "col_after2"}, inplace=True)

▶️ Run the code cell below to reset your df_emp.

In [35]:

df_emp = df_emp_backup.copy()
df_emp

Out[35]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [36]:

### BEGIN SOLUTION
df_emp.rename(columns={"name": "first_name", "salary": "base_salary"}, inplace=True)
### END SOLUTION

df_emp

Out[36]:

	emp_id	first_name	dept	office_phone	start_date	base_salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [37]:

tc.assertEqual(df_emp.columns.tolist(), ["emp_id", "first_name", "dept", "office_phone", "start_date", "base_salary"])

👉 Dropping Column(s)¶

You can drop one or more columns using df.drop(columns=["col1", "col2"]). Again, you can perform this operation either in-place or out-of-place.

# Drop columns and return a new DataFrame without modifying the original
# df will remain unchanged
df_dropped = df.drop(columns=["col1", "col2"])

# Drop columns directly from df
df.drop(columns=["col1", "col2"], inplace=True)
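
As a runnable sketch of dropping columns out-of-place and in-place, the cell below uses a hypothetical toy DataFrame. The names df_demo and df_dropped_demo and the columns a, b, and c are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df_demo = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# Out-of-place: returns a new DataFrame, df_demo keeps all three columns
df_dropped_demo = df_demo.drop(columns=["b"])
print(df_dropped_demo.columns.tolist())  # ['a', 'c']
print(df_demo.columns.tolist())          # ['a', 'b', 'c']

# In-place: df_demo itself loses the listed columns
df_demo.drop(columns=["a", "c"], inplace=True)
print(df_demo.columns.tolist())          # ['b']
```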

🎯 Challenge 12: Drop `start_date` column¶

👇 Tasks¶

✔️ Drop start_date column from df_emp.
✔️ Store the result to a new variable named df_dropped.
✔️ Your df_emp should remain unaltered.

🚀 Hints¶

Use the following code as a reference.

dropped_dataframe = my_dataframe.drop(columns=["my_column1"])

▶️ Run the code cell below to reset your df_emp.

In [38]:

df_emp = df_emp_backup.copy()
df_emp

Out[38]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [39]:

### BEGIN SOLUTION
df_dropped = df_emp.drop(columns=["start_date"])
### END SOLUTION

df_dropped

Out[39]:

	emp_id	name	dept	office_phone	salary
0	30	Colby	Sales	(217)123-4500	202000
1	40	Adam	Marketing	NaN	185000
2	10	Eli	Sales	NaN	240000
3	20	Dylan	Marketing	(217)987-6543	160500

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [40]:

tc.assertEqual(df_emp.columns.tolist(), df_emp_backup.columns.tolist())
tc.assertEqual(df_dropped.columns.tolist(), ["emp_id", "name", "dept", "office_phone", "salary"])

🎯 Challenge 13: Drop `start_date` column in-place¶

👇 Tasks¶

✔️ Drop start_date column in-place.
- Directly update df_emp without creating a new variable.

🚀 Hints¶

Use the following code as a reference.

my_dataframe.drop(columns=["my_column1"], inplace=True)

▶️ Run the code cell below to reset your df_emp.

In [41]:

df_emp = df_emp_backup.copy()
df_emp

Out[41]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [42]:

### BEGIN SOLUTION
df_emp.drop(columns=["start_date"], inplace=True)
### END SOLUTION

df_emp

Out[42]:

	emp_id	name	dept	office_phone	salary
0	30	Colby	Sales	(217)123-4500	202000
1	40	Adam	Marketing	NaN	185000
2	10	Eli	Sales	NaN	240000
3	20	Dylan	Marketing	(217)987-6543	160500

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [43]:

tc.assertEqual(df_emp.columns.tolist(), ["emp_id", "name", "dept", "office_phone", "salary"])

🎯 Challenge 14: Drop `name` and `salary` columns in-place¶

👇 Tasks¶

✔️ Drop name and salary columns in-place.
- Directly update df_emp without creating a new variable.

🚀 Hints¶

Use the following code as a reference.

my_dataframe.drop(columns=["my_column1", "my_column2"], inplace=True)

▶️ Run the code cell below to reset your df_emp.

In [44]:

df_emp = df_emp_backup.copy()
df_emp

Out[44]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [45]:

### BEGIN SOLUTION
df_emp.drop(columns=["name", "salary"], inplace=True)
### END SOLUTION

df_emp

Out[45]:

	emp_id	dept	office_phone	start_date
0	30	Sales	(217)123-4500	2017-05-01
1	40	Marketing	NaN	2018-02-01
2	10	Sales	NaN	2020-08-01
3	20	Marketing	(217)987-6543	2019-12-01

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [46]:

tc.assertEqual(df_emp.columns.tolist(), ["emp_id", "dept", "office_phone", "start_date"])

👉 Selecting a Subset of Columns¶

You can select a subset of columns from a DataFrame using df[list_of_columns].

You can also use df.loc[:, list_of_columns].

df[["col1", "col2"]]

# is equivalent to

df.loc[:, ["col1", "col2"]]
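
The following sketch checks the equivalence stated above on a hypothetical toy DataFrame (df_demo and its columns are made up for illustration). It also shows two related details: the result follows the order of the list you pass, and a single column name without the inner list returns a Series rather than a DataFrame.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df_demo = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Both forms select the same columns, in the order given by the list
subset_brackets = df_demo[["c", "a"]]
subset_loc = df_demo.loc[:, ["c", "a"]]
print(subset_brackets.columns.tolist())    # ['c', 'a']
print(subset_brackets.equals(subset_loc))  # True

# A bare column name returns a Series; a one-element list returns a DataFrame
print(type(df_demo["a"]).__name__)    # Series
print(type(df_demo[["a"]]).__name__)  # DataFrame
```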

🎯 Challenge 15: `emp_id` and `name`¶

👇 Tasks¶

✔️ Select only emp_id and name columns from df_emp (in the same order).
✔️ Store the result to a new variable named df_id_name.
✔️ Your df_emp should remain unaltered.

🚀 Hints¶

Use the following code as a reference.

df_subset = df[["col1", "col2"]]

▶️ Run the code cell below to reset your df_emp.

In [47]:

df_emp = df_emp_backup.copy()
df_emp

Out[47]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [48]:

### BEGIN SOLUTION
df_id_name = df_emp[["emp_id", "name"]]
### END SOLUTION

df_id_name

Out[48]:

	emp_id	name
0	30	Colby
1	40	Adam
2	10	Eli
3	20	Dylan

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [49]:

pd.testing.assert_frame_equal(df_emp, df_emp_backup)
pd.testing.assert_frame_equal(df_emp_backup[["emp_id", "name"]], df_id_name, check_like=True)

🎯 Challenge 16: `emp_id`, `name`, `dept`, `salary`¶

👇 Tasks¶

✔️ Select emp_id, name, dept, and salary columns from df_emp (in the same order).
✔️ Store the result to a new variable named df_subset.
✔️ Your df_emp should remain unaltered.

🚀 Hints¶

Use the following code as a reference.

df_subset = df[["col1", "col2"]]

▶️ Run the code cell below to reset your df_emp.

In [50]:

df_emp = df_emp_backup.copy()
df_emp

Out[50]:

	emp_id	name	dept	office_phone	start_date	salary
0	30	Colby	Sales	(217)123-4500	2017-05-01	202000
1	40	Adam	Marketing	NaN	2018-02-01	185000
2	10	Eli	Sales	NaN	2020-08-01	240000
3	20	Dylan	Marketing	(217)987-6543	2019-12-01	160500

In [51]:

### BEGIN SOLUTION
df_subset = df_emp[["emp_id", "name", "dept", "salary"]]
### END SOLUTION

df_subset

Out[51]:

	emp_id	name	dept	salary
0	30	Colby	Sales	202000
1	40	Adam	Marketing	185000
2	10	Eli	Sales	240000
3	20	Dylan	Marketing	160500

🧭 Check Your Work¶

Once you're done, run the code cell below to test correctness.
✔️ If the code cell runs without an error, you're good to move on.
❌ If the code cell throws an error, go back and fix incorrect parts.

In [52]:

pd.testing.assert_frame_equal(df_emp, df_emp_backup)
pd.testing.assert_frame_equal(df_emp_backup[["emp_id", "name", "dept", "salary"]], df_subset)
