My data below background:
This is question&answer data
Each Section_ID have 3 respondents
I would like to divide each “Question” and “Answer” into 3 separate columns in order to see the consensus between 3
Under “Section_ID”, each respondents’ questions and answers are unique
Other:
Question’s data is equal to Answer’s data (Question=Answer)
Each Section_ID&Interview_date has the same value repeated
==================================================== What I would like to do:
Create new columns (divide each “Question” and “Answer” into 3 separate columns, original Section_ID and Interview_date can remain the same)
Assign Question&Answer’s values into different columns
I would like Section_ID and Interview_date aligned the longst value;
Add blank if the columns(Question&Answer) doesn’t have any more values
Thank you for reading!
import pandas as pd
dic = {'Question_ID': ["AD","AD","AD","HG","HG","JI","JI","JI","JI","OP","RT","ER","ER","ER","ER"],
'Answer_ID': ['BB', 'BB', 'BB', 'GG', 'GG', 'UI', 'UI', 'UI', 'UI', 'LL', 'TT', 'RR', 'RR', 'RR', 'RR'],
'Section_ID': ['X'] * 9 + ['Z'] * 6,
'Person_Incharge': ['Y'] * 9 + ['Q'] * 6,}
df = pd.DataFrame(dic)
df
# dfx = df.set_index(["Section_ID", "Person_Incharge"]).unstack(["Question_ID","Answer_ID"])
Question_ID | Answer_ID | Section_ID | Person_Incharge | |
---|---|---|---|---|
0 | AD | BB | X | Y |
1 | AD | BB | X | Y |
2 | AD | BB | X | Y |
3 | HG | GG | X | Y |
4 | HG | GG | X | Y |
5 | JI | UI | X | Y |
6 | JI | UI | X | Y |
7 | JI | UI | X | Y |
8 | JI | UI | X | Y |
9 | OP | LL | Z | Q |
10 | RT | TT | Z | Q |
11 | ER | RR | Z | Q |
12 | ER | RR | Z | Q |
13 | ER | RR | Z | Q |
14 | ER | RR | Z | Q |
Transform the above dataframe according to the following requirements:
We'll start by identifying the Q&A pair.
The format is {Question - Answer}.
Next, the pair between multiple pairs:
The format is {(Question - Answer) - (Question - Answer) - ...}
# use groupby to group questions together
# grouped_questions = [{x: y} for x, y in df.groupby('Question_ID', as_index=False)]
grouped_questions = {}
for x, y in df.groupby('Question_ID'):
grouped_questions[x] = y.reset_index(drop=True)
# example pair for question AD
grouped_questions['AD']
Question_ID | Answer_ID | Section_ID | Person_Incharge | |
---|---|---|---|---|
0 | AD | BB | X | Y |
1 | AD | BB | X | Y |
2 | AD | BB | X | Y |
# we can either concat or merge depending on our use case
# we start by identifying which pairs the question id and its question number
question_1 = ['AD', 'OP']
question_2 = ['HG', 'RT']
question_3 = ['JI', 'ER']
# next, iterate by the length of the any one of the question list
# in this case, the iteration count will be 2
final_df = pd.DataFrame()
for index in range(len(question_1)):
first_question = grouped_questions[question_1[index]]
second_question = grouped_questions[question_2[index]]
third_question = grouped_questions[question_3[index]]
# merge first question with second question
merged_df = first_question.merge(second_question, how='outer', left_index=True, right_index=True, suffixes=["_1", "_2"])
# merge second question with third question
merged_df = merged_df.merge(third_question, how='outer', left_index=True, right_index=True)
merged_df = merged_df.rename({'Question_ID': 'Question_ID_3', 'Answer_ID': 'Answer_ID_3'}, axis=1)
merged_df = merged_df.fillna("Blank")
if final_df.shape[0] == 0:
final_df = merged_df
else:
final_df = pd.concat([final_df, merged_df])
final_df.reset_index(drop=True)
Question_ID_1 | Answer_ID_1 | Section_ID_1 | Person_Incharge_1 | Question_ID_2 | Answer_ID_2 | Section_ID_2 | Person_Incharge_2 | Question_ID_3 | Answer_ID_3 | Section_ID | Person_Incharge | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AD | BB | X | Y | HG | GG | X | Y | JI | UI | X | Y |
1 | AD | BB | X | Y | HG | GG | X | Y | JI | UI | X | Y |
2 | AD | BB | X | Y | Blank | Blank | Blank | Blank | JI | UI | X | Y |
3 | Blank | Blank | Blank | Blank | Blank | Blank | Blank | Blank | JI | UI | X | Y |
4 | OP | LL | Z | Q | RT | TT | Z | Q | ER | RR | Z | Q |
5 | Blank | Blank | Blank | Blank | Blank | Blank | Blank | Blank | ER | RR | Z | Q |
6 | Blank | Blank | Blank | Blank | Blank | Blank | Blank | Blank | ER | RR | Z | Q |
7 | Blank | Blank | Blank | Blank | Blank | Blank | Blank | Blank | ER | RR | Z | Q |
for num in range(1, 3):
final_df = final_df.drop(["Person_Incharge_{}".format(num), "Section_ID_{}".format(num)],axis=1)
final_df
Question_ID_1 | Answer_ID_1 | Question_ID_2 | Answer_ID_2 | Question_ID_3 | Answer_ID_3 | Section_ID | Person_Incharge | |
---|---|---|---|---|---|---|---|---|
0 | AD | BB | HG | GG | JI | UI | X | Y |
1 | AD | BB | HG | GG | JI | UI | X | Y |
2 | AD | BB | Blank | Blank | JI | UI | X | Y |
3 | Blank | Blank | Blank | Blank | JI | UI | X | Y |
0 | OP | LL | RT | TT | ER | RR | Z | Q |
1 | Blank | Blank | Blank | Blank | ER | RR | Z | Q |
2 | Blank | Blank | Blank | Blank | ER | RR | Z | Q |
3 | Blank | Blank | Blank | Blank | ER | RR | Z | Q |
# to put section id and person incharge to the front
col = final_df.pop('Section_ID')
final_df.insert(0, col.name, col)
col = final_df.pop('Person_Incharge')
final_df.insert(1, col.name, col)
# also reset the index
final_df.reset_index(drop=True)
Section_ID | Person_Incharge | Question_ID_1 | Answer_ID_1 | Question_ID_2 | Answer_ID_2 | Question_ID_3 | Answer_ID_3 | |
---|---|---|---|---|---|---|---|---|
0 | X | Y | AD | BB | HG | GG | JI | UI |
1 | X | Y | AD | BB | HG | GG | JI | UI |
2 | X | Y | AD | BB | Blank | Blank | JI | UI |
3 | X | Y | Blank | Blank | Blank | Blank | JI | UI |
4 | Z | Q | OP | LL | RT | TT | ER | RR |
5 | Z | Q | Blank | Blank | Blank | Blank | ER | RR |
6 | Z | Q | Blank | Blank | Blank | Blank | ER | RR |
7 | Z | Q | Blank | Blank | Blank | Blank | ER | RR |
df_split_questions = df.copy()
question_ids = ['AD', 'HG', 'JI', 'OP', 'RT', 'ER']
for id in question_ids:
df_split_questions['Question_ID_' + id] = df['Question_ID']==id
df_split_questions
Question_ID | Answer_ID | Section_ID | Person_Incharge | Question_ID_AD | Question_ID_HG | Question_ID_JI | Question_ID_OP | Question_ID_RT | Question_ID_ER | |
---|---|---|---|---|---|---|---|---|---|---|
0 | AD | BB | X | Y | True | False | False | False | False | False |
1 | AD | BB | X | Y | True | False | False | False | False | False |
2 | AD | BB | X | Y | True | False | False | False | False | False |
3 | HG | GG | X | Y | False | True | False | False | False | False |
4 | HG | GG | X | Y | False | True | False | False | False | False |
5 | JI | UI | X | Y | False | False | True | False | False | False |
6 | JI | UI | X | Y | False | False | True | False | False | False |
7 | JI | UI | X | Y | False | False | True | False | False | False |
8 | JI | UI | X | Y | False | False | True | False | False | False |
9 | OP | LL | Z | Q | False | False | False | True | False | False |
10 | RT | TT | Z | Q | False | False | False | False | True | False |
11 | ER | RR | Z | Q | False | False | False | False | False | True |
12 | ER | RR | Z | Q | False | False | False | False | False | True |
13 | ER | RR | Z | Q | False | False | False | False | False | True |
14 | ER | RR | Z | Q | False | False | False | False | False | True |
df_split_answers = df_split_questions.copy()
answer_ids = ['BB', 'GG', 'UI', 'LL', 'RR']
for id in answer_ids:
df_split_answers['Answer_ID_' + id] = df['Answer_ID']==id
df_split_answers.drop(['Question_ID', 'Answer_ID'], axis=1)
Section_ID | Person_Incharge | Question_ID_AD | Question_ID_HG | Question_ID_JI | Question_ID_OP | Question_ID_RT | Question_ID_ER | Answer_ID_BB | Answer_ID_GG | Answer_ID_UI | Answer_ID_LL | Answer_ID_RR | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | X | Y | True | False | False | False | False | False | True | False | False | False | False |
1 | X | Y | True | False | False | False | False | False | True | False | False | False | False |
2 | X | Y | True | False | False | False | False | False | True | False | False | False | False |
3 | X | Y | False | True | False | False | False | False | False | True | False | False | False |
4 | X | Y | False | True | False | False | False | False | False | True | False | False | False |
5 | X | Y | False | False | True | False | False | False | False | False | True | False | False |
6 | X | Y | False | False | True | False | False | False | False | False | True | False | False |
7 | X | Y | False | False | True | False | False | False | False | False | True | False | False |
8 | X | Y | False | False | True | False | False | False | False | False | True | False | False |
9 | Z | Q | False | False | False | True | False | False | False | False | False | True | False |
10 | Z | Q | False | False | False | False | True | False | False | False | False | False | False |
11 | Z | Q | False | False | False | False | False | True | False | False | False | False | True |
12 | Z | Q | False | False | False | False | False | True | False | False | False | False | True |
13 | Z | Q | False | False | False | False | False | True | False | False | False | False | True |
14 | Z | Q | False | False | False | False | False | True | False | False | False | False | True |