In [1]:

import sys
sys.executable

Out[1]:

'C:\\Users\\Karishma\\anaconda3\\envs\\ickle\\python.exe'

Ickle¶

Data Analysis Library for Python

This Jupyter Notebook serves as the documentation for Ickle.

You can contribute to Ickle here: https://github.com/karishmashuklaa/ickle

Table Of Contents¶

Getting Started
DataFrame and Visual Representation
Basic Properties
Selection of Subsets
Basic and Aggregation Methods
Non-Aggregation Methods
Other Methods
Arithmetic and Comparison Operators
String-Only Methods
Pivot Table
Read CSV
Read SQL
Read Excel

Getting Started¶

Installation¶

Ickle can be installed via pip.

pip install ickle

Import¶

import ickle as ick

DataFrame and Visual Representation¶

Go to table of contents

DataFrame¶

A DataFrame holds two dimensional heterogenous data. It accepts dictionary as input, with Numpy arrays as values and strings as column names.

Parameters:

data: A dictionary of strings mapped to Numpy arrays. The key will become the column name.

In [3]:

import numpy as np
import ickle as ick

In [3]:

name = np.array(['John', 'Sam', 'Tina', 'Josh', 'Jack', 'Jill'])
place = np.array(['Kolkata', 'Mumbai', 'Delhi', 'Mumbai', 'Mumbai', 'Mumbai'])
weight = np.array([57, 70, 54, 59, 62, 70])
married = np.array([True, False, True, False, False, False])

data = {'name': name, 'place': place, 'weight': weight, 'married': married}
df = ick.DataFrame(data)

Visual Representation¶

DataFrame can be displayed in the following manner

In [4]:

df

Out[4]:

	name	place	weight	married
0	John	Kolkata	57	True
1	Sam	Mumbai	70	False
2	Tina	Delhi	54	True
3	Josh	Mumbai	59	False
4	Jack	Mumbai	62	False
5	Jill	Mumbai	70	False

We will use the above DataFrame throughout the notebook

Go to table of contents

`len`¶

returns: the number of rows in the DataFrame

In [5]:

len(df)

Out[5]:

`columns`¶

returns: list of column names

In [6]:

df.columns

Out[6]:

['name', 'place', 'weight', 'married']

Modify exisiting column names¶

In [7]:

df.columns = ['NAME', 'PLACE', 'WEIGHT', 'MARRIED']
df

Out[7]:

	NAME	PLACE	WEIGHT	MARRIED
0	John	Kolkata	57	True
1	Sam	Mumbai	70	False
2	Tina	Delhi	54	True
3	Josh	Mumbai	59	False
4	Jack	Mumbai	62	False
5	Jill	Mumbai	70	False

`shape`¶

returns: two-item tuple of number of rows and columns in the DataFrame

In [8]:

df.shape

Out[8]:

(6, 4)

`values`¶

returns: a single 2D NumPy array of all the columns of data.

In [9]:

df.values

Out[9]:

array([['John', 'Kolkata', 57, True],
       ['Sam', 'Mumbai', 70, False],
       ['Tina', 'Delhi', 54, True],
       ['Josh', 'Mumbai', 59, False],
       ['Jack', 'Mumbai', 62, False],
       ['Jill', 'Mumbai', 70, False]], dtype=object)

`dtypes`¶

returns: a two-column DataFrame of column names in one column and their data type in the other

In [10]:

df.dtypes

Out[10]:

	Column Name	Data Type
0	NAME	string
1	PLACE	string
2	WEIGHT	int
3	MARRIED	bool

Selection of Subsets¶

Select a single column
Select multiple columns
Boolean selection
Simultaneuous selection of row and column
Add new / Overwrite existing columns

Go to table of contents

Select a single column¶

by passing the name of column as a string

In [11]:

df['NAME']

Out[11]:

	NAME
0	John
1	Sam
2	Tina
3	Josh
4	Jack
5	Jill

Select multiple columns¶

by passing column names as a list of strings

In [12]:

df[['NAME', 'PLACE']]

Out[12]:

	NAME	PLACE
0	John	Kolkata
1	Sam	Mumbai
2	Tina	Delhi
3	Josh	Mumbai
4	Jack	Mumbai
5	Jill	Mumbai

Boolean Selection¶

In [13]:

bool_sel = df['WEIGHT'] > 60

df[bool_sel]

Out[13]:

	NAME	PLACE	WEIGHT	MARRIED
0	Sam	Mumbai	70	False
1	Jack	Mumbai	62	False
2	Jill	Mumbai	70	False

Simultaneuous selection of row and column¶

df[row, col]

In [14]:

df[0,2]

Out[14]:

	WEIGHT
0	57

Select columns as strings¶

In [15]:

df[0, 'WEIGHT']

Out[15]:

	WEIGHT
0	57

Select rows as slices¶

In [16]:

df[:1, 'WEIGHT']

Out[16]:

	WEIGHT
0	57

Select rows as booleans and lists¶

In [17]:

bool_row = df['MARRIED']

In [18]:

df[bool_row, 'WEIGHT']

Out[18]:

	WEIGHT
0	57
1	54

Add new / Overwrite existing columns¶

In [19]:

df['AGE'] = np.array([21, 41, 22, 42, 32, 25])
df

Out[19]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	57	True	21
1	Sam	Mumbai	70	False	41
2	Tina	Delhi	54	True	22
3	Josh	Mumbai	59	False	42
4	Jack	Mumbai	62	False	32
5	Jill	Mumbai	70	False	25

Basic And Aggregation Methods¶

Basic Methods:

head()
tail()

Aggregation Methods:

min()
max()
mean()
median()
sum()
var()
std()
all()
any()
argmax()
argmin()

Go to table of contents

head¶

head(n)

returns: the first n rows. By default n=5

In [20]:

df.head()

Out[20]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	57	True	21
1	Sam	Mumbai	70	False	41
2	Tina	Delhi	54	True	22
3	Josh	Mumbai	59	False	42
4	Jack	Mumbai	62	False	32

tail¶

tail(n) return the last n rows.By default n=5

In [21]:

df.tail()

Out[21]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	Sam	Mumbai	70	False	41
1	Tina	Delhi	54	True	22
2	Josh	Mumbai	59	False	42
3	Jack	Mumbai	62	False	32
4	Jill	Mumbai	70	False	25

Aggregation Methods¶

All aggregation methods are applied only column-wise and not row-wise.

min()
max()
median()
mean()
sum()
var()
std()
all()
any()
argmax()
argmin()

min¶

min()

used to get a minimum value for each column

In [22]:

df.min()

Out[22]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	Jack	Delhi	54	False	21

max¶

max()

used to get a maximum value for each column

In [23]:

df.max()

Out[23]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	Tina	Mumbai	70	True	42

median¶

median()

computes median for each numeric column

In [24]:

df.median()

Out[24]:

	WEIGHT	MARRIED	AGE
0	60.500	0.000	28.500

mean¶

mean()

computes mean for each numeric column

In [25]:

df.mean()

Out[25]:

	WEIGHT	MARRIED	AGE
0	62.000	0.333	30.500

sum¶

sum()

returns the sum of all values for each column

In [26]:

df.sum()

Out[26]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	JohnSamTinaJoshJackJill	KolkataMumbaiDelhiMumbaiMumbaiMumbai	372	2	183

var¶

var()

computes variance for each numeric column

In [27]:

df.var()

Out[27]:

	WEIGHT	MARRIED	AGE
0	37.667	0.222	72.917

std¶

std()

computes standard deviation for each numeric column

In [28]:

df.std()

Out[28]:

	WEIGHT	MARRIED	AGE
0	6.137	0.471	8.539

all¶

all()

it tests whether all array elements along the columns evaluate to True

In [29]:

df.all()

Out[29]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	Jill	Mumbai	True	False	True

any¶

any()

it tests whether any array element along the columns evaluate to True

In [30]:

df.any()

Out[30]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	True	True	True

argmax¶

argmax()

returns the indices of the maximum values

In [31]:

df.argmax()

Out[31]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	2	1	1	0	3

argmin¶

argmin()

returns the indices of the minimum values

In [32]:

df.argmin()

Out[32]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	4	2	2	1	0

Other Methods¶

isna()
count()
unique()
nunique()
value_counts()
rename()
drop()
diff()
pct_change()
sort_values()
sample()

isna¶

isna()

Determines whether each value in the DataFrame is missing or not

In [33]:

df.isna()

Out[33]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	False	False	False	False	False
1	False	False	False	False	False
2	False	False	False	False	False
3	False	False	False	False	False
4	False	False	False	False	False
5	False	False	False	False	False

count¶

count()

Counts the number of non-missing values per column

In [107]:

df.count()

Out[107]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	6	6	6	6	6

unique¶

unique()

Finds the unique values of each column

In [35]:

dfs = df.unique()
dfs[3]

Out[35]:

	MARRIED
0	False
1	True

nunique¶

nunique()

Finds the number of unique values in each column.

Identical to unique()

In [36]:

df.nunique()

Out[36]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	6	3	5	2	6

value_counts¶

value_counts(normalize=False)

Returns the frequency of each unique value for each column.

Parameters

normalize: bool

If True, returns the relative frequencies(percent)

In [37]:

dfs = df.value_counts()
dfs[1]

Out[37]:

	PLACE	count
0	Mumbai	4
1	Delhi	1
2	Kolkata	1

rename¶

rename(columns)

Renames columns in the DataFrame

Parameters

columns: dict

A dictionary mapping the old column name to the new column name

In [38]:

df.rename({'WEIGHT': 'WEIGHT (kg)'})

Out[38]:

	NAME	PLACE	WEIGHT (kg)	MARRIED	AGE
0	John	Kolkata	57	True	21
1	Sam	Mumbai	70	False	41
2	Tina	Delhi	54	True	22
3	Josh	Mumbai	59	False	42
4	Jack	Mumbai	62	False	32
5	Jill	Mumbai	70	False	25

drop¶

drop(columns)

Drops one or more columns from a DataFrame

Parameters

columns: str or list of strings

In [39]:

df.drop('AGE')

Out[39]:

	NAME	PLACE	WEIGHT	MARRIED
0	John	Kolkata	57	True
1	Sam	Mumbai	70	False
2	Tina	Delhi	54	True
3	Josh	Mumbai	59	False
4	Jack	Mumbai	62	False
5	Jill	Mumbai	70	False

diff¶

diff(n=1)

Take the difference between the current value and the nth value above it

Parameters

n: int

In [111]:

df.diff(n=2)

Out[111]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	nan	nan	nan
1	Sam	Mumbai	nan	nan	nan
2	Tina	Delhi	-3.000	0.000	1.000
3	Josh	Mumbai	-11.000	0.000	1.000
4	Jack	Mumbai	8.000	-1.000	10.000
5	Jill	Mumbai	11.000	0.000	-17.000

pct_change¶

pct_change(n=1)

Take the percentage difference between the current value and the nth value above it

Parameters

n: int

In [104]:

df.pct_change(n=1)

Out[104]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	nan	nan	nan
1	Sam	Mumbai	0.228	-1.000	0.952
2	Tina	Delhi	-0.229	inf	-0.463
3	Josh	Mumbai	0.093	-1.000	0.909
4	Jack	Mumbai	0.051	nan	-0.238
5	Jill	Mumbai	0.129	nan	-0.219

sort_values¶

sort_values(by, asc=True)

Sort the DataFrame by one or more values

Parameters

by: str or list of column names

asc: boolean of sorting order

In [115]:

df.sort_values('AGE', asc=False)

Out[115]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	Josh	Mumbai	59	False	42
1	Sam	Mumbai	70	False	41
2	Jack	Mumbai	62	False	32
3	Jill	Mumbai	70	False	25
4	Tina	Delhi	54	True	22
5	John	Kolkata	57	True	21

sample¶

sample(n=None, frac=None, replace=False, seed=None)

Randomly samples rows of the DataFrame

Parameters

n: int

number of rows to return

frac: float

Proportion of the data to sample

replace: bool

Whether or not to sample with replacement

seed: int

Seed the random number generator

In [116]:

df.sample(n=4, frac=0.5, replace=True)

Out[116]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	Jack	Mumbai	62	False	32
1	Sam	Mumbai	70	False	41
2	Tina	Delhi	54	True	22

Non Aggregation Methods¶

abs()
cummin()
cummax()
cumsum()
clip()
round()
copy()

Go to table of contents

abs¶

abs()

Takes the absolute value of each value in the DataFrame

In [40]:

df.abs()

Out[40]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	57	True	21
1	Sam	Mumbai	70	False	41
2	Tina	Delhi	54	True	22
3	Josh	Mumbai	59	False	42
4	Jack	Mumbai	62	False	32
5	Jill	Mumbai	70	False	25

cummin¶

cummin()

Finds cumulative minimum by column

In [41]:

df.cummin()

Out[41]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	57	True	21
1	Sam	Mumbai	57	False	21
2	Tina	Delhi	54	False	21
3	Josh	Mumbai	54	False	21
4	Jack	Mumbai	54	False	21
5	Jill	Mumbai	54	False	21

cummax¶

cummax()

Finds cumulative maximum by column

In [42]:

df.cummax()

Out[42]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	57	True	21
1	Sam	Mumbai	70	True	41
2	Tina	Delhi	70	True	41
3	Josh	Mumbai	70	True	42
4	Jack	Mumbai	70	True	42
5	Jill	Mumbai	70	True	42

cumsum¶

cumsum()

Finds cumulative sum by column

In [43]:

df.cumsum()

Out[43]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	57	1	21
1	Sam	Mumbai	127	1	62
2	Tina	Delhi	181	2	84
3	Josh	Mumbai	240	2	126
4	Jack	Mumbai	302	2	158
5	Jill	Mumbai	372	2	183

clip¶

clip(lower=None, upper=None)

All values less than lower will be set to lower

All values greater than upper will be set to upper

Parameters

lower: number or None

upper: number or None

In [44]:

df.clip(lower=55, upper=60)

Out[44]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	57	55	55
1	Sam	Mumbai	60	55	55
2	Tina	Delhi	55	55	55
3	Josh	Mumbai	59	55	55
4	Jack	Mumbai	60	55	55
5	Jill	Mumbai	60	55	55

round¶

round(n)

Rounds values to the nearest n decimals

In [45]:

df.round(n=1)

Out[45]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	57	True	21
1	Sam	Mumbai	70	False	41
2	Tina	Delhi	54	True	22
3	Josh	Mumbai	59	False	42
4	Jack	Mumbai	62	False	32
5	Jill	Mumbai	70	False	25

copy¶

copy()

Copies the DataFrame

In [46]:

df.copy()

Out[46]:

	NAME	PLACE	WEIGHT	MARRIED	AGE
0	John	Kolkata	57	True	21
1	Sam	Mumbai	70	False	41
2	Tina	Delhi	54	True	22
3	Josh	Mumbai	59	False	42
4	Jack	Mumbai	62	False	32
5	Jill	Mumbai	70	False	25

Arithmetic And Comparison Operators¶

Addition
Subtraction
Multiplication
Division
Floor Division
Power
Greater than
Less than
Greater than equal to
Lesser than equal to
Not Equal
Equal

Arithmetic and Comparison Operators only work with numerical columns

Go to table of contents

In [110]:

df_op = df['WEIGHT']
df_op

Out[110]:

	WEIGHT
0	57
1	70
2	54
3	59
4	62
5	70

Addition¶

In [50]:

df_op + 2

Out[50]:

	WEIGHT
0	59
1	72
2	56
3	61
4	64
5	72

In [51]:

2 + df_op

Out[51]:

	WEIGHT
0	59
1	72
2	56
3	61
4	64
5	72

Subtraction¶

In [52]:

df_op - 2

Out[52]:

	WEIGHT
0	55
1	68
2	52
3	57
4	60
5	68

In [53]:

2 - df_op

Out[53]:

	WEIGHT
0	-55
1	-68
2	-52
3	-57
4	-60
5	-68

Multiplication¶

In [54]:

df_op * 2

Out[54]:

	WEIGHT
0	114
1	140
2	108
3	118
4	124
5	140

In [55]:

2 * df_op

Out[55]:

	WEIGHT
0	114
1	140
2	108
3	118
4	124
5	140

Division¶

In [56]:

df_op / 2

Out[56]:

	WEIGHT
0	28.500
1	35.000
2	27.000
3	29.500
4	31.000
5	35.000

In [57]:

2 / df_op 

Out[57]:

	WEIGHT
0	0.035
1	0.029
2	0.037
3	0.034
4	0.032
5	0.029

Floor Division¶

In [58]:

df_op // 2

Out[58]:

	WEIGHT
0	28
1	35
2	27
3	29
4	31
5	35

In [59]:

2 // df_op

Out[59]:

	WEIGHT
0	0
1	0
2	0
3	0
4	0
5	0

Power¶

In [60]:

df_op ** 3

Out[60]:

	WEIGHT
0	185193
1	343000
2	157464
3	205379
4	238328
5	343000

In [61]:

3 ** df_op

Out[61]:

	WEIGHT
0	886634019
1	-102221863
2	32838297
3	-610228421
4	703701817
5	-102221863

Greater than¶

In [62]:

df_op > 50

Out[62]:

	WEIGHT
0	True
1	True
2	True
3	True
4	True
5	True

Less than¶

In [63]:

df_op < 55

Out[63]:

	WEIGHT
0	False
1	False
2	True
3	False
4	False
5	False

Greater than equal¶

In [64]:

df_op >= 75

Out[64]:

	WEIGHT
0	False
1	False
2	False
3	False
4	False
5	False

Lesser than equal¶

In [65]:

df_op <= 55

Out[65]:

	WEIGHT
0	False
1	False
2	True
3	False
4	False
5	False

Not Equal¶

In [66]:

df_op != 55

Out[66]:

	WEIGHT
0	True
1	True
2	True
3	True
4	True
5	True

Equal¶

In [67]:

df_op == 70

Out[67]:

	WEIGHT
0	False
1	True
2	False
3	False
4	False
5	True

String Only Methods¶

All the strings behave in the same manner as built-in string functions in Python. These methods can be used only with columns

capitalize()
center()
count()
endswith()
startswith()
find()
len()
get()
index()
isalnum()
isalpha()
isdecimal()
islower()
isnumeric()
isspace()
istitle()
isupper()
lstrip()
rstrip()
strip()
replace()
swapcase()
title()
lower()
upper()
zfill()
encode()

Go to table of contents

capitalize¶

capitalize(col)

In [68]:

df.str.capitalize('NAME')

Out[68]:

	NAME
0	John
1	Sam
2	Tina
3	Josh
4	Jack
5	Jill

center¶

center(col, width, fillchar=None)

In [69]:

df.str.center('NAME', 10, 'a')

Out[69]:

	NAME
0	aaaJohnaaa
1	aaaSamaaaa
2	aaaTinaaaa
3	aaaJoshaaa
4	aaaJackaaa
5	aaaJillaaa

count¶

count(col, sub, start=None, stop=None)

In [70]:

df.str.count('PLACE', 'Mumbai')

Out[70]:

	PLACE
0	0
1	1
2	0
3	1
4	1
5	1

endswith¶

endswith(col, suffix, start=None, stop=None)

In [71]:

df.str.endswith('NAME', 'n')

Out[71]:

	NAME
0	True
1	False
2	False
3	False
4	False
5	False

startswith¶

startswith(col, suffix, start=None, stop=None)

In [72]:

df.str.startswith('NAME', 'J')

Out[72]:

	NAME
0	True
1	False
2	False
3	True
4	True
5	True

find¶

find(col, sub, start=None, stop=None)

In [73]:

df.str.find('NAME', 'Tina')

Out[73]:

	NAME
0	-1
1	-1
2	0
3	-1
4	-1
5	-1

len¶

len(col)

In [74]:

df.str.len('NAME')

Out[74]:

	NAME
0	4
1	3
2	4
3	4
4	4
5	4

get¶

get(col, item)

In [75]:

df.str.get('NAME', 0)

Out[75]:

	NAME
0	J
1	S
2	T
3	J
4	J
5	J

index¶

index(col, sub, start=None, stop=None)

In [76]:

df.str.index('NAME', '')

Out[76]:

	NAME
0	0
1	0
2	0
3	0
4	0
5	0

isalnum¶

isalnum(col)

In [77]:

df.str.isalnum('NAME')

Out[77]:

	NAME
0	True
1	True
2	True
3	True
4	True
5	True

isalpha¶

isalpha(col)

In [78]:

df.str.isalpha('NAME')

Out[78]:

	NAME
0	True
1	True
2	True
3	True
4	True
5	True

isdecimal¶

isdecimal(col)

In [79]:

df.str.isdecimal('NAME')

Out[79]:

	NAME
0	False
1	False
2	False
3	False
4	False
5	False

isnumeric¶

isnumeric(col)

In [80]:

df.str.isnumeric('NAME')

Out[80]:

	NAME
0	False
1	False
2	False
3	False
4	False
5	False

isspace¶

isspace(col)

In [81]:

df.str.isspace('NAME')

Out[81]:

	NAME
0	False
1	False
2	False
3	False
4	False
5	False

istitle¶

istitle(col)

In [82]:

df.str.istitle('NAME')

Out[82]:

	NAME
0	True
1	True
2	True
3	True
4	True
5	True

isupper¶

isupper(col)

In [84]:

df.str.isupper('NAME')

Out[84]:

	NAME
0	False
1	False
2	False
3	False
4	False
5	False

islower¶

islower(col)

In [83]:

df.str.islower('NAME')

Out[83]:

	NAME
0	False
1	False
2	False
3	False
4	False
5	False

lstrip¶

lstrip(col, chars)

In [85]:

df.str.lstrip('NAME', 'o')

Out[85]:

	NAME
0	John
1	Sam
2	Tina
3	Josh
4	Jack
5	Jill

rstrip¶

rstrip(col, chars)

In [86]:

df.str.rstrip('NAME', 'o')

Out[86]:

	NAME
0	John
1	Sam
2	Tina
3	Josh
4	Jack
5	Jill

strip¶

strip(col, chars)

In [87]:

df.str.strip('NAME', 'o')

Out[87]:

	NAME
0	John
1	Sam
2	Tina
3	Josh
4	Jack
5	Jill

replace¶

replace(col, old, new, count=None)

In [88]:

df.str.replace('NAME', 'John', 'Cena')

Out[88]:

	NAME
0	Cena
1	Sam
2	Tina
3	Josh
4	Jack
5	Jill

swapcase¶

swapcase(col)

In [89]:

df.str.swapcase('NAME')

Out[89]:

	NAME
0	jOHN
1	sAM
2	tINA
3	jOSH
4	jACK
5	jILL

title¶

title(col)

In [90]:

df.str.title('NAME')

Out[90]:

	NAME
0	John
1	Sam
2	Tina
3	Josh
4	Jack
5	Jill

lower¶

lower(col)

In [91]:

df.str.lower('NAME')

Out[91]:

	NAME
0	john
1	sam
2	tina
3	josh
4	jack
5	jill

upper¶

upper(col)

In [92]:

df.str.upper('NAME')

Out[92]:

	NAME
0	JOHN
1	SAM
2	TINA
3	JOSH
4	JACK
5	JILL

zfill¶

zfill(col, width)

In [93]:

df.str.zfill('NAME', 10)

Out[93]:

	NAME
0	000000John
1	0000000Sam
2	000000Tina
3	000000Josh
4	000000Jack
5	000000Jill

encode¶

encode(col, encoding='utf-8', errors='strict')

In [94]:

df.str.encode('NAME')

Out[94]:

	NAME
0	b'John'
1	b'Sam'
2	b'Tina'
3	b'Josh'
4	b'Jack'
5	b'Jill'

Pivot Table¶

Creates a pivot table from one or two 'grouping' columns

Parameters

rows: str of column name to group by (Optional)

columns: str of column name to group by (Optional)

values: str of column name to aggregate (Required)

aggfunc: str of aggregation function

Go to table of contents

In [95]:

df.pivot_table(rows='NAME', columns='PLACE', values='WEIGHT', aggfunc='mean')

Out[95]:

	NAME	Delhi	Kolkata	Mumbai
0	Jack	nan	nan	62.000
1	Jill	nan	nan	70.000
2	John	nan	57.000	nan
3	Josh	nan	nan	59.000
4	Sam	nan	nan	70.000
5	Tina	54.000	nan	nan

Read CSV¶

Reads a CSV file into a dataframe

Parameters

filelocation: str of the file location

header: int of the number of lines to skip in case of multi-header files, None in case of header-less files (Optional)

read_csv(file, header)

Go to table of contents

In [96]:

data = ick.read_csv('./dataset/employee.csv');
data.head()

Out[96]:

	dept	race	gender	salary
0	Houston Police Department-HPD	White	Male	45279
1	Houston Fire Department (HFD)	White	Male	63166
2	Houston Police Department-HPD	Black	Male	66614
3	Public Works & Engineering-PWE	Asian	Male	71680
4	Houston Airport System (HAS)	White	Male	42390

In [10]:

data = ick.read_csv('./dataset/employee.csv', header=None);
data.head()

Out[10]:

	0	1	2	3
0	Houston Police Department-HPD	White	Male	45279
1	Houston Fire Department (HFD)	White	Male	63166
2	Houston Police Department-HPD	Black	Male	66614
3	Public Works & Engineering-PWE	Asian	Male	71680
4	Houston Airport System (HAS)	White	Male	42390

Read SQL¶

Reads the data from SQL query into a dataframe

Parameters

sql: str of the sql statement to be executed

drivername: str of driver engine of the database

username: str of username to connect to the database

password: str of password to connect to the database

host: str of host to connect to the database

port: int of port to connect to the database

database: str of database name

read_sql(sql, drivername, username, password, host, port, database)

Go to table of contents

In [11]:

drivername="postgresql"
username="postgres"
password="test123" 
host="localhost"
port="5432"
database="spotify_trends"
sql="SELECT id, firstname, lastname, email FROM authors"

ick.read_sql(sql,drivername,username,password,host,port,database)

Out[11]:

	id	firstname	lastname	email
0	1	John	Doe	upchh@example.com
1	2	John	Doe	upchh@example.com
2	3	John	Doe	upchh@example.com
3	4	John	Doe	upchh@example.com
4	5	John	Doe	upchh@example.com
5	6	John	Doe	upchh@example.com
6	7	John	Doe	upchh@example.com
7	8	John	Doe	upchh@example.com
8	9	John	Doe	upchh@example.com

Read Excel¶

Reads the data from a simple Excel file into a dataframe

Parameters

file_path: str of the path to the Excel file that you want to read sheet_name: str of sheet name

Supported file extensions - xlsx, xlsm, xltx, xltm

read_excel(file_path, sheet_name)

Go to table of contents

In [4]:

ick.read_excel(file_path="./dataset/Book1.xlsx", sheet_name="Sheet1")

Out[4]:

	Name	Age	Country
0	John	54	USA
1	Sam	23	UK
2	Max	44	Pakistan

In [ ]:

Ickle¶

Table Of Contents¶

Getting Started¶

Installation¶

Import¶

DataFrame and Visual Representation¶

DataFrame¶

Visual Representation¶

Basic Properties¶

len¶

columns¶

Modify exisiting column names¶

shape¶

values¶

dtypes¶

Selection of Subsets¶

Select a single column¶

Select multiple columns¶

Boolean Selection¶

Simultaneuous selection of row and column¶

Select columns as strings¶

Select rows as slices¶

Select rows as booleans and lists¶

Add new / Overwrite existing columns¶

Basic And Aggregation Methods¶

head¶

tail¶

Aggregation Methods¶

min¶

max¶

median¶

mean¶

sum¶

var¶

std¶

all¶

any¶

argmax¶

argmin¶

Other Methods¶

isna¶

count¶

unique¶

nunique¶

value_counts¶

rename¶

drop¶

diff¶

pct_change¶

sort_values¶

sample¶

Non Aggregation Methods¶

abs¶

cummin¶

cummax¶

cumsum¶

clip¶

round¶

copy¶

Arithmetic And Comparison Operators¶

Addition¶

Subtraction¶

Multiplication¶

Division¶

Floor Division¶

Power¶

Greater than¶

Less than¶

Greater than equal¶

Lesser than equal¶

Not Equal¶

Equal¶

String Only Methods¶

capitalize¶

center¶

count¶

endswith¶

startswith¶

find¶

len¶

`len`¶

`columns`¶

`shape`¶

`values`¶

`dtypes`¶