Add/rename/drop columns
▶️ First, run the code cell below to import unittest
, a module used for 🧭 Check Your Work sections and the autograder.
import unittest
tc = unittest.TestCase()
pandas
: Use alias pd
.numpy
: Use alias np
.### BEGIN SOLUTION
import pandas as pd
import numpy as np
### END SOLUTION
import sys
tc.assertTrue("pd" in globals(), "Check whether you have correctly import Pandas with an alias.")
tc.assertTrue("np" in globals(), "Check whether you have correctly import NumPy with an alias.")
▶️ Run the code cell below to create a new DataFrame
named df_emp
.
# DO NOT CHANGE THE CODE IN THIS CELL
df_emp = pd.DataFrame({
"emp_id": [30, 40, 10, 20],
"name": ["Colby", "Adam", "Eli", "Dylan"],
"dept": ["Sales", "Marketing", "Sales", "Marketing"],
"office_phone": ["(217)123-4500", np.nan, np.nan, "(217)987-6543"],
"start_date": ["2017-05-01", "2018-02-01", "2020-08-01", "2019-12-01"],
"salary": [202000, 185000, 240000, 160500]
})
# Used for intermediate checks
df_emp_backup = df_emp.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
emp_id
ascending¶df_emp
by emp_id
in ascending order.df_id_asc
.df_emp
should remain unaltered after your code.The code below sorts my_dataframe
by some_column
in ascending order and stores the sorted DataFrame
to a new variable sorted_dataframe
.
sorted_dataframe = my_dataframe.sort_values("some_column")
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_id_asc = df_emp.sort_values("emp_id")
### END SOLUTION
df_id_asc
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
pd.testing.assert_frame_equal(df_id_asc.reset_index(drop=True),
df_emp_backup.sort_values("_".join(["EmP", "iD"]).lower()).reset_index(drop=True))
emp_id
descending¶df_emp
by emp_id
in descending order.df_id_desc
.df_emp
should remain unaltered after your code.The code below sorts my_dataframe
by some_column
in descending order and stores the sorted DataFrame
to a new variable sorted_dataframe
.
sorted_dataframe = my_dataframe.sort_values("some_column", ascending=False)
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_id_desc = df_emp.sort_values("emp_id", ascending=False)
### END SOLUTION
df_id_desc
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
pd.testing.assert_frame_equal(df_id_desc.reset_index(drop=True),
df_emp_backup.sort_values("_".join(["EmP", "iD"]).lower(), ascending=bool(0)).reset_index(drop=True))
name
ascending¶df_emp
by name
in ascending order.df_name_asc
.df_emp
should remain unaltered after your code.The code below sorts my_dataframe
by some_column
in ascending order and stores the sorted DataFrame
to a new variable sorted_dataframe
.
sorted_dataframe = my_dataframe.sort_values("some_column")
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_name_asc = df_emp.sort_values("name")
### END SOLUTION
df_name_asc
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
pd.testing.assert_frame_equal(df_name_asc.reset_index(drop=True),
df_emp_backup.sort_values("".join(["nA", "Me"]).lower()).reset_index(drop=True))
name
descending¶df_emp
by name
in descending order.df_name_desc
.df_emp
should remain unaltered after your code.The code below sorts my_dataframe
by some_column
in ascending order and stores the sorted DataFrame
to a new variable sorted_dataframe
.
sorted_dataframe = my_dataframe.sort_values("some_column")
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_name_desc = df_emp.sort_values("name", ascending=False)
### END SOLUTION
df_name_desc
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
pd.testing.assert_frame_equal(df_name_desc.reset_index(drop=True),
df_emp_backup.sort_values("".join(["nA", "Me"]).lower(), ascending=bool(0)).reset_index(drop=True))
dept
descending and then by start_date
ascending¶df_emp
by dept
in descending order and then by start_date
in ascending order for people within same departments.df_dept_desc_date_asc
.df_emp
should remain unaltered after your code.The code below sorts my_dataframe
by some_column
in ascending order and then by another_column
in descending order for rows with same same_column
values. It stores the sorted DataFrame
to a new variable named sorted_dataframe
.
sorted_dataframe = my_dataframe.sort_values(["some_column", "another_column"], ascending=[True, False])
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_dept_desc_date_asc = df_emp.sort_values(["dept", "start_date"], ascending=[False, True])
### END SOLUTION
df_dept_desc_date_asc
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
pd.testing.assert_frame_equal(df_dept_desc_date_asc.reset_index(drop=True),
df_emp_backup.sort_values(["".join(["dE", "Pt"]).lower(), "_".join(["sTarT", "dAtE"]).lower()], ascending=[bool(0), bool(1)]).reset_index(drop=True))
dept
ascending and then by salary
descending¶df_emp
by dept
in ascending order and then by salary
in descending order.salary
in descending order.df_dept_asc_salary_desc
.df_emp
should remain unaltered after your code.▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_dept_asc_salary_desc = df_emp.sort_values(["dept", "salary"], ascending=[True, False])
### END SOLUTION
df_dept_asc_salary_desc
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
pd.testing.assert_frame_equal(df_dept_asc_salary_desc.reset_index(drop=True),
df_emp_backup.sort_values(["".join(["dE", "Pt"]).lower(), "sAlArY".lower()], ascending=[bool(1), bool(0)]).reset_index(drop=True))
salary
descending in-place¶df_emp
by salary
in descending order in-place.df_emp
without creating a new variable.The code below sorts my_dataframe
by some_column
in descending order in-place.
my_dataframe.sort_values("some_column", ascending=False, inplace=True)
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_emp.sort_values("salary", ascending=False, inplace=True)
### END SOLUTION
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
pd.testing.assert_frame_equal(df_emp.reset_index(drop=True),
df_emp_backup.sort_values("".join(["sAl", "aRy"]).lower(), ascending=bool(0)).reset_index(drop=True))
department
and name
both descending in-place¶df_emp
by dept
and then by name
for employees in the same department.df_emp
without creating a new variable.The code below sorts my_dataframe
by some_column
and another_column
in descending order in-place.
my_dataframe.sort_values(["some_column", "another_column"], ascending=[False, False], inplace=True)
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_emp.sort_values(["dept", "name"], ascending=[False, False], inplace=True)
### END SOLUTION
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
pd.testing.assert_frame_equal(df_emp.reset_index(drop=True),
df_emp_backup.sort_values(["".join(["dE", "Pt"]).lower(), ("nA" + "Me").lower()], ascending=[bool(0), bool(0)]).reset_index(drop=True))
You can rename a column using df.rename(columns={"name_before": "name_after"})
. Similar to many Pandas operations, you can perform a rename operation either in-place or out-of-place.
# Rename a column and return a new DataFrame without modifying the original
# df's column names will remain unchanged
df_renamed = df.rename(columns={"name_before": "name_after"})
# Rename a column and update the original DataFrame
df.rename(columns={"name_before": "name_after"}, inplace=True)
office_phone
to phone_num
¶office_phone
column to phone_num
.df_renamed
.df_emp
) should remain unaltered.Use the following code to rename col_before
column to col_after
out-of-place.
renamed_dataframe = my_dataframe.rename(columns={"col_before": "col_after"})
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_renamed = df_emp.rename(columns={"office_phone": "phone_num"})
### END SOLUTION
df_renamed
emp_id | name | dept | phone_num | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
tc.assertEqual(df_emp.columns.tolist(), df_emp_backup.columns.tolist(), "Did you rename the column in-place? The original DataFrame should not be modified.")
tc.assertEqual(df_renamed.columns.tolist(), ["emp_id", "name", "dept", "phone_num", "start_date", "salary"])
office_phone
to phone_num
in-place¶office_phone
column to phone_num
in-place.df_emp
without creating a new variable.Use the following code to rename col_before
column to col_after
in-place.
my_dataframe.rename(columns={"col_before": "col_after"}, inplace=True)
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_emp.rename(columns={"office_phone": "phone_num"}, inplace=True)
### END SOLUTION
df_emp
emp_id | name | dept | phone_num | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
tc.assertEqual(df_emp.columns.tolist(), ["emp_id", "name", "dept", "phone_num", "start_date", "salary"])
name
to first_name
and salary
to base_salary
in-place¶name
column to first_name
and salary
to base_salary
in-place.df_emp
without creating a new variable.Use the following code as a reference.
my_dataframe.rename(columns={"col_before1": "col_after1", "col_before2": "col_after2"}, inplace=True)
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_emp.rename(columns={"name": "first_name", "salary": "base_salary"}, inplace=True)
### END SOLUTION
df_emp
emp_id | first_name | dept | office_phone | start_date | base_salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
tc.assertEqual(df_emp.columns.tolist(), ["emp_id", "first_name", "dept", "office_phone", "start_date", "base_salary"])
You can rename a column using df.drop(columns=["col1", "col2"])
. Again, you can perform this operation either in-place or out-of-place.
# Copy df, drop columns
# df will remain unchaged
df_dropped = df.drop(columns=["col1", "col2"])
# Drop columns directly from df
df.rename(columns={"name_before": "name_after"}, inplace=True)
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_dropped = df_emp.drop(columns=["start_date"])
### END SOLUTION
df_dropped
emp_id | name | dept | office_phone | salary | |
---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 202000 |
1 | 40 | Adam | Marketing | NaN | 185000 |
2 | 10 | Eli | Sales | NaN | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 160500 |
tc.assertEqual(df_emp.columns.tolist(), df_emp_backup.columns.tolist())
tc.assertEqual(df_dropped.columns.tolist(), ["emp_id", "name", "dept", "office_phone", "salary"])
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_emp.drop(columns=["start_date"], inplace=True)
### END SOLUTION
df_emp
emp_id | name | dept | office_phone | salary | |
---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 202000 |
1 | 40 | Adam | Marketing | NaN | 185000 |
2 | 10 | Eli | Sales | NaN | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 160500 |
tc.assertEqual(df_emp.columns.tolist(), ["emp_id", "name", "dept", "office_phone", "salary"])
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_emp.drop(columns=["name", "salary"], inplace=True)
### END SOLUTION
df_emp
emp_id | dept | office_phone | start_date | |
---|---|---|---|---|
0 | 30 | Sales | (217)123-4500 | 2017-05-01 |
1 | 40 | Marketing | NaN | 2018-02-01 |
2 | 10 | Sales | NaN | 2020-08-01 |
3 | 20 | Marketing | (217)987-6543 | 2019-12-01 |
tc.assertEqual(df_emp.columns.tolist(), ["emp_id", "dept", "office_phone", "start_date"])
You can select a subset of columns from a DataFrame
using df[list_of_columns]
.
You can also use df.loc[:, list_of_columns]
.
df[["col1", "col2"]]
# is equivalent to
df.loc[:, ["col1", "col2"]]
emp_id
and name
¶▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_id_name = df_emp[["emp_id", "name"]]
### END SOLUTION
df_id_name
emp_id | name | |
---|---|---|
0 | 30 | Colby |
1 | 40 | Adam |
2 | 10 | Eli |
3 | 20 | Dylan |
pd.testing.assert_frame_equal(df_emp, df_emp_backup)
pd.testing.assert_frame_equal(df_emp_backup[["emp_id", "name"]], df_id_name, check_like=True)
emp_id
, name
, dept
, salary
¶emp_id
, name
, dept
, and salary
columns from df_emp
(in the same order).df_subset
.df_emp
should remain unaltered.Use the following code as a reference.
df_subset = df[["col1", "col2"]]
▶️ Run the code cell below to reset your df_emp
.
df_emp = df_emp_backup.copy()
df_emp
emp_id | name | dept | office_phone | start_date | salary | |
---|---|---|---|---|---|---|
0 | 30 | Colby | Sales | (217)123-4500 | 2017-05-01 | 202000 |
1 | 40 | Adam | Marketing | NaN | 2018-02-01 | 185000 |
2 | 10 | Eli | Sales | NaN | 2020-08-01 | 240000 |
3 | 20 | Dylan | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
### BEGIN SOLUTION
df_subset = df_emp[["emp_id", "name", "dept", "salary"]]
### END SOLUTION
df_subset
emp_id | name | dept | salary | |
---|---|---|---|---|
0 | 30 | Colby | Sales | 202000 |
1 | 40 | Adam | Marketing | 185000 |
2 | 10 | Eli | Sales | 240000 |
3 | 20 | Dylan | Marketing | 160500 |
pd.testing.assert_frame_equal(df_emp, df_emp_backup)
pd.testing.assert_frame_equal(df_emp_backup[["emp_id", "name", "dept", "salary"]], df_subset)