DataFrame Creation

In this notebook, we will learn to create a new DataFrame object from other data structures (e.g., a NumPy array or a dictionary) and to convert a DataFrame back to a NumPy array or a dictionary. The default signature of the pandas DataFrame constructor is

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
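As a minimal sketch of how these constructor arguments fit together, the cell below passes `data`, `index`, `columns`, and `dtype` explicitly. The names `values` and `demo`, along with the row and column labels, are made up for illustration and are not used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 integer array used only for this illustration.
values = np.arange(6).reshape(3, 2)

demo = pd.DataFrame(
    data=values,               # the underlying values
    index=["r0", "r1", "r2"],  # row labels (defaults to 0..n-1 when omitted)
    columns=["x", "y"],        # column labels (defaults to 0..m-1 when omitted)
    dtype=float,               # force every column to float64
)

print(demo)
print(demo.dtypes)
```

Omitting `index` and `columns` falls back to integer labels, which is what the later cells rely on for the row index.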

In [1]:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set()

1. To create a new `DataFrame` from a NumPy array

Let's create a random array of shape (100, 10) and a helper that generates random column names. We will use the array and the column names to create the DataFrame in the next step.

In [7]:

import random

# 100 rows x 10 columns of uniform random numbers in [0, 1)
A = np.random.rand(100, 10)
letter = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'X']

def namer(n):
    """Return n random four-letter names drawn from `letter`."""
    return [''.join(random.choice(letter) for _ in range(4)) for _ in range(n)]

In [8]:

print(namer(A.shape[1]))

['HHEE', 'FEGD', 'BFHC', 'HXFC', 'CBDF', 'DEDH', 'CBCX', 'XGXB', 'GCBC', 'FDEE']

In [9]:

# Generate one random name per column of A, then build the DataFrame
col_names = namer(A.shape[1])
df = pd.DataFrame(A, columns=col_names)
df.head()

Out[9]:

	AAGF	XBGA	DXAC	XEDB	EDCG	ABDH	GFHX	FAAB	BBEC	DGDC
0	0.285058	0.456733	0.841292	0.397957	0.888982	0.970782	0.379687	0.565177	0.657939	0.461385
1	0.725005	0.432756	0.037801	0.020203	0.590901	0.590571	0.623529	0.166581	0.179392	0.454290
2	0.668743	0.352332	0.642905	0.427461	0.025124	0.365414	0.609983	0.568686	0.522738	0.525048
3	0.807165	0.827478	0.542088	0.628743	0.616745	0.386370	0.632225	0.387794	0.618686	0.786503
4	0.645945	0.136078	0.769546	0.721885	0.107891	0.128859	0.938451	0.875492	0.647702	0.148635
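The introduction also mentions converting a DataFrame back to a NumPy array. A minimal sketch of that round trip follows, using `DataFrame.to_numpy()`. The names `arr`, `names`, `frame`, and `back` are illustrative, and the column labels are placeholders, not the random names generated above.

```python
import numpy as np
import pandas as pd

# Seeded generator so the illustration is reproducible.
rng = np.random.default_rng(0)
arr = rng.random((100, 10))

# Placeholder column labels (hypothetical, chosen for readability).
names = [f"col{i}" for i in range(arr.shape[1])]
frame = pd.DataFrame(arr, columns=names)

# Convert the DataFrame back to a plain NumPy array.
back = frame.to_numpy()

print(back.shape)
print(np.array_equal(arr, back))
```

Column labels and the index are dropped by `to_numpy()`, so keep `frame.columns` around if the names are needed later.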

To save the new DataFrame to a CSV file (the `data/` directory must already exist):

In [19]:

df.to_csv('data/test.csv')
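By default `to_csv` also writes the row index as the first column, which matters when the file is read back. The sketch below shows one way to round-trip a small frame, assuming a temporary directory instead of the notebook's `data/` folder. The names `frame`, `out_dir`, `path`, and `restored` are illustrative.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small hypothetical frame standing in for the 100x10 one above.
frame = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["x", "y"])

with tempfile.TemporaryDirectory() as tmp:
    # to_csv does not create missing directories, so make the folder first.
    out_dir = os.path.join(tmp, "data")
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "test.csv")

    frame.to_csv(path)                          # index is written as the first column
    restored = pd.read_csv(path, index_col=0)   # read that column back as the index

print(restored)
```

Passing `index=False` to `to_csv` is an alternative when the row labels carry no information.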

2. To create a new `DataFrame` from a list of dictionaries

Here we will create a list of dictionaries, each holding the same keys with its own values. From this list we will create another DataFrame; the keys of the dictionaries serve as the column names.

In [18]:

LD = []
for i in range(100):
    LD.append({'Player': namer(1)[0],
               'game1': random.uniform(0, 1),
               'game2': random.uniform(0, 1),
               'game3': random.uniform(0, 1),
               'game4': random.uniform(0, 1),
               'game5': random.uniform(0, 1)})

In [19]:

LD[0]

Out[19]:

{'Player': 'BGXB',
 'game1': 0.2965944756471328,
 'game2': 0.11334763879800447,
 'game3': 0.028543866127768824,
 'game4': 0.225405432495144,
 'game5': 0.05423542200055986}

In [20]:

DF = pd.DataFrame(LD)
DF = DF.set_index("Player")

In [21]:

DF.head(10)

Out[21]:

	game1	game2	game3	game4	game5
Player
BGXB	0.296594	0.113348	0.028544	0.225405	0.054235
DBDB	0.047226	0.107065	0.801571	0.816877	0.556934
AXXH	0.862611	0.439051	0.083341	0.389785	0.258748
BXED	0.643533	0.082176	0.167241	0.405304	0.088063
FXGE	0.279076	0.000998	0.949414	0.303408	0.009342
AHDC	0.617194	0.272401	0.252663	0.788798	0.130996
CXGA	0.104552	0.895106	0.414877	0.167643	0.454175
BDBA	0.045650	0.926742	0.454097	0.055006	0.939082
HBDC	0.069192	0.797224	0.943648	0.567334	0.044285
EBBX	0.383365	0.852788	0.679330	0.418570	0.817291
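Two points about this construction may be worth a small illustration: a key missing from one dictionary becomes `NaN` in that row, and `DataFrame.to_dict` converts the frame back to dictionaries. The sketch below uses made-up records (`AAAA`, `BBBB`), not the random ones generated above.

```python
import pandas as pd

# Hypothetical records; the second one has no 'game2' key.
records = [
    {"Player": "AAAA", "game1": 0.25, "game2": 0.5},
    {"Player": "BBBB", "game1": 0.75},
]

scores = pd.DataFrame(records).set_index("Player")
print(scores)  # the missing 'game2' value shows up as NaN

# One dictionary per row, keyed by the index label.
by_player = scores.to_dict(orient="index")

# Back to a list of dictionaries, with 'Player' restored as a regular key.
as_records = scores.reset_index().to_dict(orient="records")

print(by_player["AAAA"])
print(as_records[0])
```

The `orient` argument controls the shape of the result; `"index"` and `"records"` are two of the options pandas provides.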

3. To create a `DataFrame` from lists

In [26]:

# Four lists of 10 random values each
A = [random.uniform(0, 1) for i in range(10)]
B = [random.uniform(0, 1) for i in range(10)]
C = [random.uniform(0, 1) for i in range(10)]
D = [random.uniform(0, 1) for i in range(10)]

# Start from an empty DataFrame and assign each list as a column
df = pd.DataFrame()
df['A'], df['B'], df['C'], df['D'] = A, B, C, D
df.head()

Out[26]:

	A	B	C	D
0	0.816389	0.212530	0.705185	0.225743
1	0.646496	0.876869	0.648350	0.788687
2	0.869173	0.832004	0.516009	0.917783
3	0.668866	0.778850	0.834528	0.243842
4	0.742311	0.984313	0.872512	0.451476
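The same result can be reached in a single constructor call by passing a dictionary that maps column names to lists. This is a sketch of that alternative, not the approach used in the cell above. The names `columns` and `table` are illustrative, and the seed is only there for reproducibility.

```python
import random

import pandas as pd

random.seed(0)  # fixed seed, chosen arbitrarily for this illustration

# Dictionary of column name -> list of 10 random values.
columns = {name: [random.uniform(0, 1) for _ in range(10)] for name in "ABCD"}

# Each key becomes a column; all lists must have the same length.
table = pd.DataFrame(columns)

print(table.shape)
print(list(table.columns))
```

Building the dictionary first keeps the column names and their data together, which can be easier to extend than assigning columns one at a time.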
