In this notebook we will cover:

- The df.isna() method
- Handling missing values in the math column using df.loc[mask, col] = value
- Handling missing values in the group column using df.loc[mask, col] = value
- The fillna() method
- Handling the session column
- The df.dropna() method
! cat datasets/group-marks.csv
# import the pandas library
import pandas as pd
df = pd.read_csv('datasets/group-marks.csv')
df.head()
df.shape
Whenever the `pd.read_csv()` method detects a missing value (nothing between two commas in a csv file, or an empty cell in Excel), it flags it with `NaN`. There can be many reasons for these `NaN` values; one can be that the data is gathered via a Google Form and this field was optional and skipped by some respondents.
The `df.isna()` method is recommended over `df.isnull()`. It returns a boolean same-sized object that indicates whether each element is an NA value: missing values get mapped to True, everything else gets mapped to False. Remember, values such as empty strings `''` or `numpy.inf` are not considered NA values.

Similarly, the `df.notna()` method is recommended over `df.notnull()`. It returns a boolean same-sized object in which non-missing values get mapped to True.
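A quick sketch of that gotcha on a toy Series (hypothetical data): only a real NaN counts as missing, while an empty string and numpy.inf do not.

```python
import numpy as np
import pandas as pd

# Toy Series mixing a real NaN with NA look-alikes (hypothetical data)
s = pd.Series([1.0, np.nan, '', np.inf])

print(s.isna().tolist())    # [False, True, False, False] -- only the real NaN is NA
print(s.notna().tolist())   # [True, False, True, True]
```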
df.isna().head()
df.notna().head()
# Now we can use sum() on this dataframe object of Boolean values (True is mapped to 1)
df.isna().sum()
# Similarly, we can use sum() on this dataframe object of Boolean values (True is mapped to 1)
df.notna().sum()
Handling the math Column

List the rows having Null/Missing values under the math column

The `df.isna()` method works equally well on Series objects:

mask = df.math.isna()
mask
df.loc[df.math.isna(), :]
# This will return only those rows of dataframe having null values under the math column
df[mask] # df[df.math.isna()]
df.loc[mask, :] # df.loc[df.math.isna(), :]
Computing the Mean of the math Column

df
# Compute the mean of math column
df.math.mean()
Looking at the error, it appears that the math column does not have the `int64` or `float64` dtype. Let us check this out.
# Check out the data type of math column
df['math'].dtypes
# We can also use the `df.info()` method to display the count of non-null values in each column,
# their datatypes, their names, and the memory usage of the dataframe.
df.info()
df['math']
# We can replace all such values using the `replace()` method
import numpy as np
# Preview the result first (no assignment, so df is unchanged)
df.math.replace('No Idea', np.nan).head()
df
# Note the marks of Saadia in math are changed from the string `No Idea` to `NaN`
# Since this seems to be working fine, let us set inplace=True to make these changes in the original dataframe
df.replace('No Idea', np.nan, inplace=True)
df.head()
# Let us check the data type of math column
df['math'].dtypes
# It is still Object, which is natural; however, we can change the datatype using the `astype()` method
df['math'] = df['math'].astype(float)
# Let us check the data type of math column
df['math'].dtypes
# Let us compute the average of math marks again
df.math.mean()
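As an aside, the replace-then-astype steps above can usually be collapsed into a single call with `pd.to_numeric` and `errors='coerce'`; a minimal sketch on toy data (not the course csv):

```python
import pandas as pd

# Toy marks column: numbers stored as strings plus a stray 'No Idea' entry
marks = pd.Series(['72', 'No Idea', '69'])

# errors='coerce' turns anything unparseable into NaN and yields a float64 Series
clean = pd.to_numeric(marks, errors='coerce')
print(clean.dtype)          # float64
print(clean.isna().sum())   # 1
```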
# List only those records under math column having Null values
mask = df.math.isna()
df.loc[mask, 'math']
df.math.mean()
# Let us replace these values with mean value of the math column
df.loc[(df.math.isna()),'math'] = df.math.mean()
# Confirm the result
df.isna().sum()
#df.info()
df.head()
Handling the group Column

The group column contains categorical values, i.e., values that can take on one of a limited, and usually fixed, number of possible values.

List the rows having Null/Missing values under the group column

df
mask = df.group.isna()
mask
df[mask] # df[df.group.isna()]
df.loc[mask, :] # df.loc[df.group.isna()]
Replacing the Missing Values in the group Column

# Use the value_counts() method, which returns a Series containing counts of unique values
# (in descending order, with the most frequently-occurring element first). It excludes NA values by default.
df.group.value_counts()
# Another way is to use the mode() method on the column
df.group.mode()
# List only those records under group column having Null values
mask = df.group.isna()
df.loc[mask, 'group'] # df.loc[(df.group.isna()), 'group']
# Let us replace these values with maximum occurring value in the `group` column
df.loc[(df.group.isna()),'group'] = 'group C'
# Confirm the result
df.isna().sum()
#df.info()
df.head()
Note that in the original dataframe Arifa's group information was missing; now it is `group C`.
Handling Missing Values using the fillna() Method

So far we have replaced missing values using the loc method. The fillna() method offers a more direct route:

object.fillna(value, method, inplace=True)

- value: the value with which we want to replace the missing values, OR the method to be used to replace them
- inplace=True: make the changes in the original object
# Let us read the dataset again with NA values under math column
import pandas as pd
df = pd.read_csv('datasets/group-marks.csv')
df.head()
- Before proceeding, let us this time handle the string value `No Idea` under the math column while reading the csv file, instead of doing it afterwards in the dataframe using the `replace()` method as we have done above.
- For this we will use the `na_values` argument of the `pd.read_csv()` method, to which you can pass a single value or a list of values to be replaced with NaN.
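A minimal sketch of `na_values` on an inline csv (hypothetical two-row data, not the course file):

```python
import io
import pandas as pd

# Tiny inline csv standing in for a file on disk (hypothetical data)
csv = io.StringIO("name,math\nSAADIA,No Idea\nJUMAIMA,69\n")

df_demo = pd.read_csv(csv, na_values='No Idea')
print(df_demo['math'].dtype)          # float64: the column now parses numerically
print(df_demo['math'].isna().sum())   # 1
```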
df = pd.read_csv('datasets/group-marks.csv', na_values='No Idea')
df.head()
df.isna().sum()
df.loc[df.math.isna()]
# This time, instead of loc, use the fillna() method with just two arguments
# The inplace=True parameter ensures that this happens in the original dataframe
df.math.fillna(value=df.math.mean(), inplace=True)
# Confirm the result
df.isna().sum()
#df.info()
df.head()
Handling Missing Values in the group Column using fillna()

# Let us read the dataset again with NA values
import pandas as pd
df = pd.read_csv('datasets/group-marks.csv', na_values='No Idea')
df.head()
df.isna().sum()
# Once again, instead of loc, let us use the fillna() method with just two arguments
df.group.fillna('group C', inplace=True)
# Confirm the result
df.isna().sum()
#df.info()
# Let us fill the math, english and scholarship columns as well again
df.math.fillna(df.math.mean(), inplace=True)
df.english.fillna(df.english.mean(), inplace=True)
df.scholarship.fillna(df.scholarship.mean(), inplace=True)
# Confirm the result
df.isna().sum()
Handling Missing Values in the math and group Columns using the ffill and bfill Arguments

Recall the signature of the fillna() method:

object.fillna(value, method, inplace=True)

Instead of a value, we can pass ffill or bfill as the method argument to the fillna() method. This will replace the null values with other values from the DataFrame:

- ffill (forward fill): fills a NaN value with the previous value in that column
- bfill (back fill): fills a NaN value with the next/upcoming value

(In recent pandas versions the method argument of fillna() is deprecated; df.ffill() and df.bfill() do the same job.)

# Let us read the dataset again with NA values
import pandas as pd
df = pd.read_csv('datasets/group-marks.csv', na_values='No Idea')
df.head(20)
df.isna().sum()
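On a toy Series, the two strategies look like this (a minimal sketch using the `ffill()`/`bfill()` convenience methods):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

print(s.ffill().tolist())   # [1.0, 1.0, 1.0, 4.0] -- carry the previous value forward
print(s.bfill().tolist())   # [1.0, 4.0, 4.0, 4.0] -- pull the next value back
```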
# Forward fill: using method='ffill', a NaN value is filled with the previous value in that column
df.fillna(method = 'ffill', inplace=True)
df.head()
Is it working fine? Not entirely: ffill cannot fill a NaN in the very first row, since there is no previous value to carry forward. Let us apply bfill as well to handle any remaining gaps.
df.fillna(method = 'bfill', inplace=True)
df.head()
# Confirm the result
df.isna().sum()
Handling the session Column

If you look at the session column, you can observe that it is a categorical column containing six different categories (as values):

- MORNING and MOR are the same
- AFTERNOON and AFT are the same
- EVENING and EVE are the same

So the session column has six different categories (as values) but should have only three.

import pandas as pd
df = pd.read_csv('datasets/group-marks.csv' )
df
df.session
# Let us check out the counts of unique values inside the session Column
df.session.value_counts()
df.session.unique()
Cleaning the session Column using map()

Let us convert the values of the session column to MOR, AFT and EVE using the map() method:

Series.map(mapping, na_action=None)

The map() method substitutes each value in a Series with another value, which may be derived from a dict. It returns a new Series after performing the mapping. You can pass 'ignore' as the na_action argument, which will propagate NaN values without passing them to the mapping correspondence.

# To do this, let us create a new mapping (dictionary)
dict1 = {
'MORNING' : 'MOR',
'MOR' : 'MOR',
'AFTERNOON' : 'AFT',
'AFT': 'AFT',
'EVENING' : 'EVE',
'EVE': 'EVE'
}
# map() returns a series with the same index as the caller; the original series remains unchanged
df.session.map(dict1)

# So we assign the resulting series back to the df.session column
df.session = df.session.map(dict1)
# Count of new categories in the column session
# Observe that the values inside the session column have been consolidated into three categories
df.session.value_counts()
# Let us verify the result
df.head()
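One gotcha worth remembering: map() turns any value that is missing from the dict into NaN, which is exactly why dict1 above also maps 'MOR', 'AFT' and 'EVE' to themselves. A minimal sketch:

```python
import pandas as pd

s = pd.Series(['MORNING', 'MOR', 'EVENING'])

# A dict that forgets the identity mapping for 'MOR'
partial = {'MORNING': 'MOR', 'EVENING': 'EVE'}
mapped = s.map(partial)
print(mapped.tolist())   # the unmapped 'MOR' becomes NaN
```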
import pandas as pd
df = pd.read_csv('datasets/group-marks.csv' )
df.head()
# Preview: convert the scholarship column using apply() with a lambda (dividing by a conversion rate of 170)
df.scholarship.apply(lambda x: x/170)
df['Scholarship_in_$'] = df.scholarship.apply(lambda x: x/170)
df.head()
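apply() with a lambda works here, but for simple arithmetic plain vectorised division gives the same result (a sketch using the same hypothetical rate):

```python
import pandas as pd

scholarship = pd.Series([2562, 2800, 3500])

via_apply = scholarship.apply(lambda x: x / 170)   # element-wise Python call
via_vector = scholarship / 170                     # single vectorised operation

print(via_apply.equals(via_vector))   # True
```

Vectorised operations are generally preferred over apply() for arithmetic, since they avoid a Python-level function call per element.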
The df.dropna() Method

df.dropna(axis, how, subset, inplace)
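Before running it on the dataset, a minimal sketch of the axis/how/subset arguments on a toy frame:

```python
import numpy as np
import pandas as pd

df_toy = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                       'b': [np.nan, np.nan, 6.0]})

print(df_toy.dropna().shape)              # (1, 2): only the last row is complete
print(df_toy.dropna(how='all').shape)     # (2, 2): only the all-NA middle row goes
print(df_toy.dropna(subset=['a']).shape)  # (2, 2): judge rows by column 'a' only
```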
import pandas as pd
df = pd.read_csv('datasets/group-marks.csv')
df.head()
df.shape
# You can use the dropna() method to drop all the rows, if a row has any NA value
df1 = df.dropna()
df1.shape
df1.head()
# Default Arguments to dropna()
df2 = df.dropna(axis=0, how='any')
df2.shape
# If we set how='all', it means drop a row only if all of its values are NA
df2 = df.dropna(axis=0, how='all')
df2.shape
# Use of subset argument and pass it a list of columns based on whose values you want to drop a row
df2 = df.dropna(axis=0, how='any', subset=['math'])
df2.shape
# Use of subset argument
df2 = df.dropna(axis=0, how='any', subset=['session'])
df2.shape
# With `how='all'` and `subset=[list of column names]`, a row is dropped
# only if all of the listed columns have a NA value in that row
df2 = df.dropna(axis=0, how='all', subset=['math', 'session'])
df2.shape
# If we set axis=1 and how='all', it means drop a column if all the values under it are NA
df2 = df.dropna(axis=1, how='all')
df2.shape
df.info()
# If we set the axis=1 and how=any, it means drop a column if any value under it is na
df2 = df.dropna(axis=1, how='any')
df2.shape
df2.head()
Creating Dummy Variables using pd.get_dummies()

Categorical columns can be converted into numeric form using pd.get_dummies(), which will create a binary column for each of the categories:

pd.get_dummies(data, drop_first=False)

It receives the data argument, which can be a dataframe or a series.

Note: Making a dummy variable will take all the K distinct values in one column and make K columns out of them.
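A minimal sketch on a toy gender series (K = 2 categories, so K dummy columns, or K - 1 with drop_first=True):

```python
import pandas as pd

s = pd.Series(['male', 'female', 'female'], name='gender')

dummies = pd.get_dummies(s)
print(list(dummies.columns))                      # ['female', 'male'] -- one column per category
print(pd.get_dummies(s, drop_first=True).shape)   # (3, 1): K - 1 columns
```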
import pandas as pd
df = pd.read_csv('datasets/group-marks.csv')
df.head()
# currently we have 10 columns in the data
df.shape
# Convert all categorical variables into dummy/indicator variables
df = pd.get_dummies(df)
# Let us view the dataframe; keep a note of the number of columns
df.head()
# The number of columns has gone up to 1605 now
df.shape
df
Creating Dummy Variables for the gender Column Only

This technique is also called One Hot Encoding. You will learn more encoding techniques in the data pre-processing module.

import pandas as pd
df1 = pd.read_csv('datasets/group-marks.csv')
df1.head()
rollno | name | gender | group | session | age | scholarship | math | english | urdu | |
---|---|---|---|---|---|---|---|---|---|---|
0 | MS01 | SAADIA | female | group B | MORNING | 28 | 2562 | No Idea | 72.0 | 74 |
1 | MS02 | JUMAIMA | female | group C | AFTERNOON | 33 | 2800 | 69 | 90.0 | 88 |
2 | MS03 | ARIFA | female | NaN | EVENING | 34 | 3500 | NaN | 95.0 | 93 |
3 | MS04 | SAADIA | female | group A | MOR | 44 | 2000 | 47 | 57.0 | 44 |
4 | MS05 | DANISH | male | group C | AFTERNOON | 54 | 2100 | 76 | 78.0 | 55 |
# Convert only gender variable into dummy/indicator variables
df2 = pd.get_dummies(df1[['gender']])
df2.head()
# Since we do not need two separate columns, simply use the `drop_first` argument of get_dummies to handle this
df2 = pd.get_dummies(df1[['gender']], drop_first=True)
df2.head()
# We will talk about join in the next session in detail.
df3 = df1.join(df2['gender_male'])
df3.head()
import pandas as pd
df = pd.read_csv('datasets/group-marks.csv')
df.head()
rollno | name | gender | group | session | age | scholarship | math | english | urdu | |
---|---|---|---|---|---|---|---|---|---|---|
0 | MS01 | SAADIA | female | group B | MORNING | 28 | 2562 | No Idea | 72.0 | 74 |
1 | MS02 | JUMAIMA | female | group C | AFTERNOON | 33 | 2800 | 69 | 90.0 | 88 |
2 | MS03 | ARIFA | female | NaN | EVENING | 34 | 3500 | NaN | 95.0 | 93 |
3 | MS04 | SAADIA | female | group A | MOR | 44 | 2000 | 47 | 57.0 | 44 |
4 | MS05 | DANISH | male | group C | AFTERNOON | 54 | 2100 | 76 | 78.0 | 55 |
df.session.value_counts()
# Unlike map(), replace() leaves values that are not in the dict unchanged,
# so we only need to list the values we want to change
dict1 = {
'MORNING' : 'MOR',
'AFTERNOON' : 'AFT',
}
#df.session = df.session.map(dict1)
df.session = df.session.replace(dict1)
df
rollno | name | gender | group | session | age | scholarship | math | english | urdu | |
---|---|---|---|---|---|---|---|---|---|---|
0 | MS01 | SAADIA | female | group B | MOR | 28 | 2562 | No Idea | 72.0 | 74 |
1 | MS02 | JUMAIMA | female | group C | AFT | 33 | 2800 | 69 | 90.0 | 88 |
2 | MS03 | ARIFA | female | NaN | EVENING | 34 | 3500 | NaN | 95.0 | 93 |
3 | MS04 | SAADIA | female | group A | MOR | 44 | 2000 | 47 | 57.0 | 44 |
4 | MS05 | DANISH | male | group C | AFT | 54 | 2100 | 76 | 78.0 | 55 |
5 | MS06 | SAFIA | female | group B | AFT | 23 | 3800 | NaN | 83.0 | 78 |
6 | MS07 | SARA | female | group B | EVENING | 47 | 3000 | 88 | 95.0 | 92 |
7 | MS08 | ABDULLAH | male | group B | EVE | 33 | 2000 | 40 | 43.0 | 39 |
8 | MS09 | KHAN | male | group D | MOR | 27 | 2500 | 64 | NaN | 67 |
9 | MS10 | HASEENA | female | group B | AFT | 33 | 2800 | 38 | 60.0 | 50 |
10 | MS11 | MUSTJAB | male | group C | MOR | 46 | 3000 | 58 | 54.0 | 52 |
11 | MS12 | ABRAR | male | group D | MOR | 53 | 3312 | 40 | 52.0 | 43 |
12 | MS13 | MAHOOR | female | NaN | MOR | 25 | 2345 | 65 | 81.0 | 73 |
13 | MS14 | USAMA | male | group A | AFT | 26 | 2654 | 78 | 72.0 | 70 |
14 | MS15 | NAVAIRA | female | group A | AFT | 25 | 2137 | 50 | 53.0 | 58 |
15 | MS16 | SAWAIRA | female | group C | EVENING | 29 | 2567 | 69 | 75.0 | 78 |
16 | MS17 | NOFIL | male | group C | MOR | 22 | 3500 | 88 | 89.0 | 86 |
17 | MS18 | SHUMAILA | female | group B | AFT | 31 | 2500 | 18 | NaN | 28 |
18 | MS19 | ABUBAKAR | male | group C | AFT | 33 | 3000 | 46 | 42.0 | 46 |
19 | MS20 | NADRA | female | group C | MOR | 36 | 2200 | 54 | 58.0 | 45 |
20 | MS21 | HASEEB | male | group D | AFT | 34 | 3500 | 66 | 69.0 | 63 |
21 | MS22 | HOORIA | female | group B | EVENING | 32 | 2000 | 65 | 75.0 | 70 |
22 | MS23 | RAUF | male | group D | MOR | 31 | 2500 | 44 | 54.0 | 53 |
23 | MS24 | LAIBA | female | group C | AFT | 37 | 3000 | NaN | 73.0 | 73 |
24 | MS25 | FATIMA | male | group D | AFT | 33 | 2500 | 74 | 71.0 | 80 |
25 | MS26 | IBRAR | male | group A | EVENING | 39 | 3500 | 73 | 74.0 | 72 |
26 | MS27 | USAMA | male | group B | EVE | 32 | 2500 | 69 | 54.0 | 55 |
27 | MS28 | NOOR | female | group C | MOR | 47 | 3000 | 67 | 69.0 | 75 |
28 | MS29 | UMAR | male | group C | AFT | 44 | 4000 | 70 | 70.0 | 65 |
29 | MS30 | ROSHAN | female | group D | MOR | 42 | 3500 | 62 | 70.0 | 75 |
30 | MS31 | UNAIZA | female | group D | EVENING | 41 | 2500 | 69 | 74.0 | 74 |
31 | MS32 | FATIMA | female | group B | EVE | 19 | 3000 | 63 | NaN | 61 |
32 | MS33 | SHAISTA | female | NaN | MOR | 29 | 3500 | 56 | 72.0 | 65 |
33 | MS34 | ALI | male | group D | EVENING | 34 | 3000 | 40 | 42.0 | 38 |
34 | MS35 | ABDULLAH | male | group E | MOR | 45 | 2500 | 97 | 87.0 | 82 |
35 | MS36 | OSAMA | male | group E | AFT | 31 | 3500 | 81 | 81.0 | 79 |
36 | MS37 | AMBREEN | female | group D | EVENING | 28 | 3000 | 74 | 81.0 | 62 |
37 | MS38 | NASEEM | female | group D | MOR | 26 | 2500 | 50 | 64.0 | 59 |
38 | MS39 | SAFIA | female | group D | AFT | 36 | 3500 | 75 | 90.0 | 88 |
39 | MS40 | NABEEL | male | group B | AFT | 33 | 3000 | 57 | 56.0 | 57 |
40 | MS41 | SIKANDER | male | group C | EVENING | 35 | 2500 | 55 | 61.0 | 54 |
41 | MS42 | ZAIN | female | group C | EVE | 27 | 3500 | 58 | 73.0 | 68 |
42 | MS43 | AYESHA | female | group B | MOR | 38 | 3000 | 53 | 58.0 | 65 |
43 | MS44 | HASSAN | male | group B | AFT | 31 | 2500 | 59 | 65.0 | 66 |
44 | MS45 | ZAINAB | female | group E | MOR | 28 | 3500 | NaN | 56.0 | 54 |
45 | MS46 | TALHA | male | group B | EVENING | 41 | 3000 | 65 | 54.0 | 57 |
46 | MS47 | SABA | female | group A | EVE | 36 | 2500 | 55 | 65.0 | 62 |
47 | MS48 | RAFIA | female | group C | MOR | 30 | 3500 | 66 | 71.0 | 76 |
48 | MS49 | FATIMA | female | group D | MOR | 40 | 2500 | 57 | 74.0 | 76 |
49 | MS50 | KAKAMANNA | male | group C | AFT | 37 | 3000 | 66 | 78.0 | 81 |
df1 = pd.get_dummies(df.session)
df1.head()
AFT | AFTERNOON | EVE | EVENING | MOR | MORNING | |
---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 1 |
1 | 0 | 1 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 1 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 1 | 0 |
4 | 0 | 1 | 0 | 0 | 0 | 0 |
df1 = pd.get_dummies(df.gender, drop_first=True)
df3 = df.join(df1['male'])
df3.head()
rollno | name | gender | group | session | age | scholarship | math | english | urdu | male | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | MS01 | SAADIA | female | group B | MORNING | 28 | 2562 | No Idea | 72.0 | 74 | 0 |
1 | MS02 | JUMAIMA | female | group C | AFTERNOON | 33 | 2800 | 69 | 90.0 | 88 | 0 |
2 | MS03 | ARIFA | female | NaN | EVENING | 34 | 3500 | NaN | 95.0 | 93 | 0 |
3 | MS04 | SAADIA | female | group A | MOR | 44 | 2000 | 47 | 57.0 | 44 | 0 |
4 | MS05 | DANISH | male | group C | AFTERNOON | 54 | 2100 | 76 | 78.0 | 55 | 1 |
df1 = pd.get_dummies(df[['gender']], drop_first=True)
df3 = df.join(df1['gender_male'])
df3.head()
rollno | name | gender | group | session | age | scholarship | math | english | urdu | gender_male | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | MS01 | SAADIA | female | group B | MORNING | 28 | 2562 | No Idea | 72.0 | 74 | 0 |
1 | MS02 | JUMAIMA | female | group C | AFTERNOON | 33 | 2800 | 69 | 90.0 | 88 | 0 |
2 | MS03 | ARIFA | female | NaN | EVENING | 34 | 3500 | NaN | 95.0 | 93 | 0 |
3 | MS04 | SAADIA | female | group A | MOR | 44 | 2000 | 47 | 57.0 | 44 | 0 |
4 | MS05 | DANISH | male | group C | AFTERNOON | 54 | 2100 | 76 | 78.0 | 55 | 1 |