import pandas as pd
df = pd.read_excel("..//data//sample.xlsx", header = [3,4])
df.head()
Roll no. | Name | Class | Math | English | |||||
---|---|---|---|---|---|---|---|---|---|
Unnamed: 0_level_1 | Unnamed: 1_level_1 | Unnamed: 2_level_1 | 1st | 2nd | 3rd | 1st | 2nd | 3rd | |
0 | 1 | A | 10 | 88 | 93 | 91 | 64 | 73 | 70 |
1 | 2 | B | 9 | 81 | 75 | 89 | 67 | 97 | 90 |
2 | 3 | C | 9 | 90 | 95 | 90 | 66 | 81 | 76 |
3 | 4 | D | 10 | 84 | 59 | 81 | 66 | 56 | 57 |
4 | 5 | E | 9 | 58 | 94 | 72 | 60 | 94 | 66 |
df.columns
MultiIndex([('Roll no.', 'Unnamed: 0_level_1'), ( 'Name', 'Unnamed: 1_level_1'), ( 'Class', 'Unnamed: 2_level_1'), ( 'Math', '1st'), ( 'Math', '2nd'), ( 'Math', '3rd'), ( 'English', '1st'), ( 'English', '2nd'), ( 'English', '3rd')], )
df = df.set_index([('Roll no.', 'Unnamed: 0_level_1'),
( 'Name', 'Unnamed: 1_level_1'),
( 'Class', 'Unnamed: 2_level_1')]).rename_axis(['Roll no.', 'Name', 'Class'])
df
Math | English | |||||||
---|---|---|---|---|---|---|---|---|
1st | 2nd | 3rd | 1st | 2nd | 3rd | |||
Roll no. | Name | Class | ||||||
1 | A | 10 | 88 | 93 | 91 | 64 | 73 | 70 |
2 | B | 9 | 81 | 75 | 89 | 67 | 97 | 90 |
3 | C | 9 | 90 | 95 | 90 | 66 | 81 | 76 |
4 | D | 10 | 84 | 59 | 81 | 66 | 56 | 57 |
5 | E | 9 | 58 | 94 | 72 | 60 | 94 | 66 |
6 | F | 10 | 63 | 97 | 96 | 76 | 66 | 65 |
7 | G | 10 | 97 | 77 | 59 | 80 | 71 | 59 |
8 | H | 9 | 94 | 55 | 91 | 63 | 84 | 62 |
9 | I | 9 | 99 | 92 | 99 | 78 | 86 | 85 |
10 | J | 10 | 73 | 87 | 82 | 57 | 62 | 88 |
.loc[-- , --]
¶df.loc[:,(['Math', 'English'], ['1st', '2nd' , '3rd'])].head()
Math | English | |||||||
---|---|---|---|---|---|---|---|---|
1st | 2nd | 3rd | 1st | 2nd | 3rd | |||
Roll no. | Name | Class | ||||||
1 | A | 10 | 88 | 93 | 91 | 64 | 73 | 70 |
2 | B | 9 | 81 | 75 | 89 | 67 | 97 | 90 |
3 | C | 9 | 90 | 95 | 90 | 66 | 81 | 76 |
4 | D | 10 | 84 | 59 | 81 | 66 | 56 | 57 |
5 | E | 9 | 58 | 94 | 72 | 60 | 94 | 66 |
Remember that []
and ()
have special meaning when dealing with a MultiIndex
object:
(...) a tuple is interpreted as one multi-level key
[...] a list is used to specify several keys [on the same level].
(...) a tuple of lists refer to several values within a level
When we write (['one', 'two'], ['a', 'b'])
, the first list inside the tuple specifies all the values we want from the 1st level of the MultiIndex
. The second list inside the tuple specifies all the values we want from the 2nd level of the MultiIndex
.
To specify that we want anything from the first level. And then specify which columns from the second level we want, we can use use pd.IndexSlice
, which helps slicing frames with more elaborate indices.
df.loc[:, pd.IndexSlice[:, ['1st', '3rd']]].head()
Math | English | Math | English | |||
---|---|---|---|---|---|---|
1st | 1st | 3rd | 3rd | |||
Roll no. | Name | Class | ||||
1 | A | 10 | 88 | 64 | 91 | 70 |
2 | B | 9 | 81 | 67 | 89 | 90 |
3 | C | 9 | 90 | 66 | 90 | 76 |
4 | D | 10 | 84 | 66 | 81 | 57 |
5 | E | 9 | 58 | 60 | 72 | 66 |