In this notebook, we will learn to create a new DataFrame object from other data structures (e.g., NumPy arrays and dictionaries) and to convert a DataFrame to a NumPy array or a dictionary. The default signature of the pandas DataFrame constructor is
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
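As a minimal sketch of how these constructor arguments fit together, and of the reverse conversions mentioned above, the example below builds a small DataFrame from a NumPy array and converts it back. The array values, the row and column labels, and the variable names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Illustrative data: a 2x3 array of integers (values chosen arbitrarily).
arr = np.array([[1, 2, 3], [4, 5, 6]])

# data, index, columns, and dtype map directly onto the constructor arguments.
small = pd.DataFrame(data=arr, index=['r0', 'r1'],
                     columns=['a', 'b', 'c'], dtype=float)

# Convert back to a NumPy array and to dictionaries.
as_array = small.to_numpy()                    # 2-D array of the values
as_dict = small.to_dict()                      # {column: {index label: value}}
as_records = small.to_dict(orient='records')   # one dict per row

print(as_array.shape)
print(as_dict['a'])
print(as_records[0])
```

The `orient='records'` form is the inverse of building a DataFrame from a list of dictionaries, although it drops the index labels.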
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set()
DataFrame from NumPy array

Let's create a random array of size (100, 10) and a list of random column names. We will use this array and these column names to create the DataFrame in the next step.
import random
A = np.random.rand(100, 10)
letter = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'X']
def namer(n):
    # Build n random four-letter names from the letters above.
    col_names = [''.join(random.choice(letter) for _ in range(4))
                 for i in range(n)]
    return col_names
print(namer(A.shape[1]))
['HHEE', 'FEGD', 'BFHC', 'HXFC', 'CBDF', 'DEDH', 'CBCX', 'XGXB', 'GCBC', 'FDEE']
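# col_names only exists inside namer(), so generate the names before using them.
col_names = namer(A.shape[1])
df = pd.DataFrame(A, columns=col_names)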
df.head()
| | AAGF | XBGA | DXAC | XEDB | EDCG | ABDH | GFHX | FAAB | BBEC | DGDC |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.285058 | 0.456733 | 0.841292 | 0.397957 | 0.888982 | 0.970782 | 0.379687 | 0.565177 | 0.657939 | 0.461385 |
| 1 | 0.725005 | 0.432756 | 0.037801 | 0.020203 | 0.590901 | 0.590571 | 0.623529 | 0.166581 | 0.179392 | 0.454290 |
| 2 | 0.668743 | 0.352332 | 0.642905 | 0.427461 | 0.025124 | 0.365414 | 0.609983 | 0.568686 | 0.522738 | 0.525048 |
| 3 | 0.807165 | 0.827478 | 0.542088 | 0.628743 | 0.616745 | 0.386370 | 0.632225 | 0.387794 | 0.618686 | 0.786503 |
| 4 | 0.645945 | 0.136078 | 0.769546 | 0.721885 | 0.107891 | 0.128859 | 0.938451 | 0.875492 | 0.647702 | 0.148635 |
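Because the column names are drawn at random, two columns can occasionally receive the same name, which makes column selection ambiguous. One possible way to guard against this is sketched below; `unique_namer` is a hypothetical helper introduced here for illustration, and the names `arr` and `df_unique` are likewise assumptions rather than part of the original notebook.

```python
import random

import numpy as np
import pandas as pd

letter = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'X']

def unique_namer(n, length=4):
    """Hypothetical helper: draw random names until n distinct ones exist."""
    names = set()
    while len(names) < n:
        names.add(''.join(random.choice(letter) for _ in range(length)))
    return sorted(names)

arr = np.random.rand(100, 10)
df_unique = pd.DataFrame(arr, columns=unique_namer(arr.shape[1]))

print(df_unique.shape)
print(df_unique.columns.is_unique)
```

The loop only terminates when `n` does not exceed the number of possible names (9 to the power of `length` for this letter list), which is comfortably true for ten columns.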
Save the new DataFrame to a file:

df.to_csv('data/test.csv')
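Writing to a path such as 'data/test.csv' fails if the folder does not exist yet. The following sketch shows one way to create the folder first and then check the result by reading the file back. It uses a temporary directory in place of the notebook's data folder, and `df_demo`, `out_dir`, and `restored` are illustrative names.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.random.rand(5, 3), columns=['x', 'y', 'z'])

# A temporary directory stands in for the notebook's 'data' folder.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, 'data')
    os.makedirs(out_dir, exist_ok=True)   # to_csv does not create missing folders
    path = os.path.join(out_dir, 'test.csv')

    df_demo.to_csv(path)                        # the index becomes the first column
    restored = pd.read_csv(path, index_col=0)   # read that column back as the index

print(restored.shape)
print(np.allclose(df_demo.to_numpy(), restored.to_numpy()))
```

Passing `index=False` to `to_csv` would omit the index column, in which case `index_col=0` should be dropped when reading the file back.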
DataFrame from list of dictionaries

Here we will create a list of dictionaries, each holding keys and values. From this list of dictionaries, we will create another DataFrame. The keys of the dictionaries will serve as the column names.
LD = []
for i in range(100):
    # One record per player: a random name plus five random scores.
    LD.append({'Player': namer(1)[0],
               'game1': random.uniform(0, 1),
               'game2': random.uniform(0, 1),
               'game3': random.uniform(0, 1),
               'game4': random.uniform(0, 1),
               'game5': random.uniform(0, 1)})
LD[0]
{'Player': 'BGXB', 'game1': 0.2965944756471328, 'game2': 0.11334763879800447, 'game3': 0.028543866127768824, 'game4': 0.225405432495144, 'game5': 0.05423542200055986}
DF = pd.DataFrame(LD)
DF = DF.set_index("Player")
DF.head(10)
| Player | game1 | game2 | game3 | game4 | game5 |
|---|---|---|---|---|---|
| BGXB | 0.296594 | 0.113348 | 0.028544 | 0.225405 | 0.054235 |
| DBDB | 0.047226 | 0.107065 | 0.801571 | 0.816877 | 0.556934 |
| AXXH | 0.862611 | 0.439051 | 0.083341 | 0.389785 | 0.258748 |
| BXED | 0.643533 | 0.082176 | 0.167241 | 0.405304 | 0.088063 |
| FXGE | 0.279076 | 0.000998 | 0.949414 | 0.303408 | 0.009342 |
| AHDC | 0.617194 | 0.272401 | 0.252663 | 0.788798 | 0.130996 |
| CXGA | 0.104552 | 0.895106 | 0.414877 | 0.167643 | 0.454175 |
| BDBA | 0.045650 | 0.926742 | 0.454097 | 0.055006 | 0.939082 |
| HBDC | 0.069192 | 0.797224 | 0.943648 | 0.567334 | 0.044285 |
| EBBX | 0.383365 | 0.852788 | 0.679330 | 0.418570 | 0.817291 |
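Since the dictionary keys become the column names, it may help to see what happens when the dictionaries do not all share the same keys. In the small sketch below, the player names and scores are made-up values and `records` and `scores` are illustrative variable names. pandas takes the union of the keys as columns and fills the gaps with NaN.

```python
import pandas as pd

records = [
    {'Player': 'AAAA', 'game1': 0.5, 'game2': 0.25},
    {'Player': 'BBBB', 'game1': 0.75},   # 'game2' is missing in this record
]
scores = pd.DataFrame(records).set_index('Player')

print(scores)
print(scores['game2'].isna().tolist())
```

Checking `isna()` after construction is a quick way to spot records that lacked a key.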
DataFrame from a list

A = [random.uniform(0, 1) for i in range(10)]
B = [random.uniform(0, 1) for i in range(10)]
C = [random.uniform(0, 1) for i in range(10)]
D = [random.uniform(0, 1) for i in range(10)]
# Build the DataFrame in one step; each list becomes a column.
df = pd.DataFrame({'A': A, 'B': B, 'C': C, 'D': D})
df.head()
| | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.816389 | 0.212530 | 0.705185 | 0.225743 |
| 1 | 0.646496 | 0.876869 | 0.648350 | 0.788687 |
| 2 | 0.869173 | 0.832004 | 0.516009 | 0.917783 |
| 3 | 0.668866 | 0.778850 | 0.834528 | 0.243842 |
| 4 | 0.742311 | 0.984313 | 0.872512 | 0.451476 |
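The lists above each hold one column. When the lists hold rows instead, the same table can be built by passing the rows together with explicit column names. The sketch below compares the two approaches on small made-up values; `col_a`, `col_b`, `by_columns`, and `by_rows` are illustrative names.

```python
import pandas as pd

col_a = [1.0, 2.0, 3.0]
col_b = [4.0, 5.0, 6.0]

# Column-wise: each list is one column, keyed by its name.
by_columns = pd.DataFrame({'A': col_a, 'B': col_b})

# Row-wise: each tuple is one row; the column names are given separately.
rows = list(zip(col_a, col_b))
by_rows = pd.DataFrame(rows, columns=['A', 'B'])

print(by_columns.equals(by_rows))
```

Which form is more convenient depends on how the data arrives: the dictionary form suits per-variable lists, and the row form suits records read one at a time.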