import sys
sys.executable
'C:\\Users\\Karishma\\anaconda3\\envs\\ickle\\python.exe'
Data Analysis Library for Python
This Jupyter Notebook serves as the documentation for Ickle.
You can contribute to Ickle here: https://github.com/karishmashuklaa/ickle
import ickle as ick
A DataFrame
holds two dimensional heterogenous data. It accepts dictionary as input, with Numpy arrays as values and strings as column names.
Parameters:
data
: A dictionary of strings mapped to Numpy arrays. The key will become the column name.import numpy as np
import ickle as ick
name = np.array(['John', 'Sam', 'Tina', 'Josh', 'Jack', 'Jill'])
place = np.array(['Kolkata', 'Mumbai', 'Delhi', 'Mumbai', 'Mumbai', 'Mumbai'])
weight = np.array([57, 70, 54, 59, 62, 70])
married = np.array([True, False, True, False, False, False])
data = {'name': name, 'place': place, 'weight': weight, 'married': married}
df = ick.DataFrame(data)
DataFrame
can be displayed in the following manner
df
name | place | weight | married | |
---|---|---|---|---|
0 | John | Kolkata | 57 | True |
1 | Sam | Mumbai | 70 | False |
2 | Tina | Delhi | 54 | True |
3 | Josh | Mumbai | 59 | False |
4 | Jack | Mumbai | 62 | False |
5 | Jill | Mumbai | 70 | False |
We will use the above DataFrame
throughout the notebook
len
¶returns: the number of rows in the DataFrame
len(df)
6
columns
¶returns: list of column names
df.columns
['name', 'place', 'weight', 'married']
df.columns = ['NAME', 'PLACE', 'WEIGHT', 'MARRIED']
df
NAME | PLACE | WEIGHT | MARRIED | |
---|---|---|---|---|
0 | John | Kolkata | 57 | True |
1 | Sam | Mumbai | 70 | False |
2 | Tina | Delhi | 54 | True |
3 | Josh | Mumbai | 59 | False |
4 | Jack | Mumbai | 62 | False |
5 | Jill | Mumbai | 70 | False |
shape
¶returns: two-item tuple of number of rows and columns in the DataFrame
df.shape
(6, 4)
values
¶returns: a single 2D NumPy array of all the columns of data.
df.values
array([['John', 'Kolkata', 57, True], ['Sam', 'Mumbai', 70, False], ['Tina', 'Delhi', 54, True], ['Josh', 'Mumbai', 59, False], ['Jack', 'Mumbai', 62, False], ['Jill', 'Mumbai', 70, False]], dtype=object)
dtypes
¶returns: a two-column DataFrame
of column names in one column and their data type in the other
df.dtypes
Column Name | Data Type | |
---|---|---|
0 | NAME | string |
1 | PLACE | string |
2 | WEIGHT | int |
3 | MARRIED | bool |
by passing the name of column as a string
df['NAME']
NAME | |
---|---|
0 | John |
1 | Sam |
2 | Tina |
3 | Josh |
4 | Jack |
5 | Jill |
by passing column names as a list of strings
df[['NAME', 'PLACE']]
NAME | PLACE | |
---|---|---|
0 | John | Kolkata |
1 | Sam | Mumbai |
2 | Tina | Delhi |
3 | Josh | Mumbai |
4 | Jack | Mumbai |
5 | Jill | Mumbai |
bool_sel = df['WEIGHT'] > 60
df[bool_sel]
NAME | PLACE | WEIGHT | MARRIED | |
---|---|---|---|---|
0 | Sam | Mumbai | 70 | False |
1 | Jack | Mumbai | 62 | False |
2 | Jill | Mumbai | 70 | False |
df[row, col]
df[0,2]
WEIGHT | |
---|---|
0 | 57 |
df[0, 'WEIGHT']
WEIGHT | |
---|---|
0 | 57 |
df[:1, 'WEIGHT']
WEIGHT | |
---|---|
0 | 57 |
bool_row = df['MARRIED']
df[bool_row, 'WEIGHT']
WEIGHT | |
---|---|
0 | 57 |
1 | 54 |
df['AGE'] = np.array([21, 41, 22, 42, 32, 25])
df
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | 57 | True | 21 |
1 | Sam | Mumbai | 70 | False | 41 |
2 | Tina | Delhi | 54 | True | 22 |
3 | Josh | Mumbai | 59 | False | 42 |
4 | Jack | Mumbai | 62 | False | 32 |
5 | Jill | Mumbai | 70 | False | 25 |
df.head()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | 57 | True | 21 |
1 | Sam | Mumbai | 70 | False | 41 |
2 | Tina | Delhi | 54 | True | 22 |
3 | Josh | Mumbai | 59 | False | 42 |
4 | Jack | Mumbai | 62 | False | 32 |
tail(n) return the last n rows.By default n=5
df.tail()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | Sam | Mumbai | 70 | False | 41 |
1 | Tina | Delhi | 54 | True | 22 |
2 | Josh | Mumbai | 59 | False | 42 |
3 | Jack | Mumbai | 62 | False | 32 |
4 | Jill | Mumbai | 70 | False | 25 |
df.min()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | Jack | Delhi | 54 | False | 21 |
df.max()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | Tina | Mumbai | 70 | True | 42 |
df.median()
WEIGHT | MARRIED | AGE | |
---|---|---|---|
0 | 60.500 | 0.000 | 28.500 |
df.mean()
WEIGHT | MARRIED | AGE | |
---|---|---|---|
0 | 62.000 | 0.333 | 30.500 |
df.sum()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | JohnSamTinaJoshJackJill | KolkataMumbaiDelhiMumbaiMumbaiMumbai | 372 | 2 | 183 |
df.var()
WEIGHT | MARRIED | AGE | |
---|---|---|---|
0 | 37.667 | 0.222 | 72.917 |
df.std()
WEIGHT | MARRIED | AGE | |
---|---|---|---|
0 | 6.137 | 0.471 | 8.539 |
df.all()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | Jill | Mumbai | True | False | True |
df.any()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | True | True | True |
df.argmax()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | 2 | 1 | 1 | 0 | 3 |
df.argmin()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | 4 | 2 | 2 | 1 | 0 |
df.isna()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | False | False | False | False | False |
1 | False | False | False | False | False |
2 | False | False | False | False | False |
3 | False | False | False | False | False |
4 | False | False | False | False | False |
5 | False | False | False | False | False |
df.count()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | 6 | 6 | 6 | 6 | 6 |
dfs = df.unique()
dfs[3]
MARRIED | |
---|---|
0 | False |
1 | True |
df.nunique()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | 6 | 3 | 5 | 2 | 6 |
value_counts(normalize=False)
Returns the frequency of each unique value for each column.
Parameters
normalize
: bool
If True, returns the relative frequencies(percent)
dfs = df.value_counts()
dfs[1]
PLACE | count | |
---|---|---|
0 | Mumbai | 4 |
1 | Delhi | 1 |
2 | Kolkata | 1 |
rename(columns)
Renames columns in the DataFrame
Parameters
columns
: dict
A dictionary mapping the old column name to the new column name
df.rename({'WEIGHT': 'WEIGHT (kg)'})
NAME | PLACE | WEIGHT (kg) | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | 57 | True | 21 |
1 | Sam | Mumbai | 70 | False | 41 |
2 | Tina | Delhi | 54 | True | 22 |
3 | Josh | Mumbai | 59 | False | 42 |
4 | Jack | Mumbai | 62 | False | 32 |
5 | Jill | Mumbai | 70 | False | 25 |
drop(columns)
Drops one or more columns from a DataFrame
Parameters
columns
: str or list of strings
df.drop('AGE')
NAME | PLACE | WEIGHT | MARRIED | |
---|---|---|---|---|
0 | John | Kolkata | 57 | True |
1 | Sam | Mumbai | 70 | False |
2 | Tina | Delhi | 54 | True |
3 | Josh | Mumbai | 59 | False |
4 | Jack | Mumbai | 62 | False |
5 | Jill | Mumbai | 70 | False |
diff(n=1)
Take the difference between the current value and the nth value above it
Parameters
n
: int
df.diff(n=2)
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | nan | nan | nan |
1 | Sam | Mumbai | nan | nan | nan |
2 | Tina | Delhi | -3.000 | 0.000 | 1.000 |
3 | Josh | Mumbai | -11.000 | 0.000 | 1.000 |
4 | Jack | Mumbai | 8.000 | -1.000 | 10.000 |
5 | Jill | Mumbai | 11.000 | 0.000 | -17.000 |
pct_change(n=1)
Take the percentage difference between the current value and the nth value above it
Parameters
n
: int
df.pct_change(n=1)
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | nan | nan | nan |
1 | Sam | Mumbai | 0.228 | -1.000 | 0.952 |
2 | Tina | Delhi | -0.229 | inf | -0.463 |
3 | Josh | Mumbai | 0.093 | -1.000 | 0.909 |
4 | Jack | Mumbai | 0.051 | nan | -0.238 |
5 | Jill | Mumbai | 0.129 | nan | -0.219 |
sort_values(by, asc=True)
Sort the DataFrame by one or more values
Parameters
by
: str or list of column names
asc
: boolean of sorting order
df.sort_values('AGE', asc=False)
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | Josh | Mumbai | 59 | False | 42 |
1 | Sam | Mumbai | 70 | False | 41 |
2 | Jack | Mumbai | 62 | False | 32 |
3 | Jill | Mumbai | 70 | False | 25 |
4 | Tina | Delhi | 54 | True | 22 |
5 | John | Kolkata | 57 | True | 21 |
sample(n=None, frac=None, replace=False, seed=None)
Randomly samples rows of the DataFrame
Parameters
n
: int
number of rows to return
frac
: float
Proportion of the data to sample
replace
: bool
Whether or not to sample with replacement
seed
: int
Seed the random number generator
df.sample(n=4, frac=0.5, replace=True)
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | Jack | Mumbai | 62 | False | 32 |
1 | Sam | Mumbai | 70 | False | 41 |
2 | Tina | Delhi | 54 | True | 22 |
df.abs()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | 57 | True | 21 |
1 | Sam | Mumbai | 70 | False | 41 |
2 | Tina | Delhi | 54 | True | 22 |
3 | Josh | Mumbai | 59 | False | 42 |
4 | Jack | Mumbai | 62 | False | 32 |
5 | Jill | Mumbai | 70 | False | 25 |
df.cummin()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | 57 | True | 21 |
1 | Sam | Mumbai | 57 | False | 21 |
2 | Tina | Delhi | 54 | False | 21 |
3 | Josh | Mumbai | 54 | False | 21 |
4 | Jack | Mumbai | 54 | False | 21 |
5 | Jill | Mumbai | 54 | False | 21 |
df.cummax()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | 57 | True | 21 |
1 | Sam | Mumbai | 70 | True | 41 |
2 | Tina | Delhi | 70 | True | 41 |
3 | Josh | Mumbai | 70 | True | 42 |
4 | Jack | Mumbai | 70 | True | 42 |
5 | Jill | Mumbai | 70 | True | 42 |
df.cumsum()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | 57 | 1 | 21 |
1 | Sam | Mumbai | 127 | 1 | 62 |
2 | Tina | Delhi | 181 | 2 | 84 |
3 | Josh | Mumbai | 240 | 2 | 126 |
4 | Jack | Mumbai | 302 | 2 | 158 |
5 | Jill | Mumbai | 372 | 2 | 183 |
clip(lower=None, upper=None)
All values less than lower
will be set to lower
All values greater than upper
will be set to upper
Parameters
lower
: number or None
upper
: number or None
df.clip(lower=55, upper=60)
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | 57 | 55 | 55 |
1 | Sam | Mumbai | 60 | 55 | 55 |
2 | Tina | Delhi | 55 | 55 | 55 |
3 | Josh | Mumbai | 59 | 55 | 55 |
4 | Jack | Mumbai | 60 | 55 | 55 |
5 | Jill | Mumbai | 60 | 55 | 55 |
df.round(n=1)
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | 57 | True | 21 |
1 | Sam | Mumbai | 70 | False | 41 |
2 | Tina | Delhi | 54 | True | 22 |
3 | Josh | Mumbai | 59 | False | 42 |
4 | Jack | Mumbai | 62 | False | 32 |
5 | Jill | Mumbai | 70 | False | 25 |
df.copy()
NAME | PLACE | WEIGHT | MARRIED | AGE | |
---|---|---|---|---|---|
0 | John | Kolkata | 57 | True | 21 |
1 | Sam | Mumbai | 70 | False | 41 |
2 | Tina | Delhi | 54 | True | 22 |
3 | Josh | Mumbai | 59 | False | 42 |
4 | Jack | Mumbai | 62 | False | 32 |
5 | Jill | Mumbai | 70 | False | 25 |
Arithmetic and Comparison Operators only work with numerical columns
df_op = df['WEIGHT']
df_op
WEIGHT | |
---|---|
0 | 57 |
1 | 70 |
2 | 54 |
3 | 59 |
4 | 62 |
5 | 70 |
df_op + 2
WEIGHT | |
---|---|
0 | 59 |
1 | 72 |
2 | 56 |
3 | 61 |
4 | 64 |
5 | 72 |
2 + df_op
WEIGHT | |
---|---|
0 | 59 |
1 | 72 |
2 | 56 |
3 | 61 |
4 | 64 |
5 | 72 |
df_op - 2
WEIGHT | |
---|---|
0 | 55 |
1 | 68 |
2 | 52 |
3 | 57 |
4 | 60 |
5 | 68 |
2 - df_op
WEIGHT | |
---|---|
0 | -55 |
1 | -68 |
2 | -52 |
3 | -57 |
4 | -60 |
5 | -68 |
df_op * 2
WEIGHT | |
---|---|
0 | 114 |
1 | 140 |
2 | 108 |
3 | 118 |
4 | 124 |
5 | 140 |
2 * df_op
WEIGHT | |
---|---|
0 | 114 |
1 | 140 |
2 | 108 |
3 | 118 |
4 | 124 |
5 | 140 |
df_op / 2
WEIGHT | |
---|---|
0 | 28.500 |
1 | 35.000 |
2 | 27.000 |
3 | 29.500 |
4 | 31.000 |
5 | 35.000 |
2 / df_op
WEIGHT | |
---|---|
0 | 0.035 |
1 | 0.029 |
2 | 0.037 |
3 | 0.034 |
4 | 0.032 |
5 | 0.029 |
df_op // 2
WEIGHT | |
---|---|
0 | 28 |
1 | 35 |
2 | 27 |
3 | 29 |
4 | 31 |
5 | 35 |
2 // df_op
WEIGHT | |
---|---|
0 | 0 |
1 | 0 |
2 | 0 |
3 | 0 |
4 | 0 |
5 | 0 |
df_op ** 3
WEIGHT | |
---|---|
0 | 185193 |
1 | 343000 |
2 | 157464 |
3 | 205379 |
4 | 238328 |
5 | 343000 |
3 ** df_op
WEIGHT | |
---|---|
0 | 886634019 |
1 | -102221863 |
2 | 32838297 |
3 | -610228421 |
4 | 703701817 |
5 | -102221863 |
df_op > 50
WEIGHT | |
---|---|
0 | True |
1 | True |
2 | True |
3 | True |
4 | True |
5 | True |
df_op < 55
WEIGHT | |
---|---|
0 | False |
1 | False |
2 | True |
3 | False |
4 | False |
5 | False |
df_op >= 75
WEIGHT | |
---|---|
0 | False |
1 | False |
2 | False |
3 | False |
4 | False |
5 | False |
df_op <= 55
WEIGHT | |
---|---|
0 | False |
1 | False |
2 | True |
3 | False |
4 | False |
5 | False |
df_op != 55
WEIGHT | |
---|---|
0 | True |
1 | True |
2 | True |
3 | True |
4 | True |
5 | True |
df_op == 70
WEIGHT | |
---|---|
0 | False |
1 | True |
2 | False |
3 | False |
4 | False |
5 | True |
All the strings behave in the same manner as built-in string functions in Python.
These methods can be used only with columns
capitalize(col)
df.str.capitalize('NAME')
NAME | |
---|---|
0 | John |
1 | Sam |
2 | Tina |
3 | Josh |
4 | Jack |
5 | Jill |
center(col, width, fillchar=None)
df.str.center('NAME', 10, 'a')
NAME | |
---|---|
0 | aaaJohnaaa |
1 | aaaSamaaaa |
2 | aaaTinaaaa |
3 | aaaJoshaaa |
4 | aaaJackaaa |
5 | aaaJillaaa |
count(col, sub, start=None, stop=None)
df.str.count('PLACE', 'Mumbai')
PLACE | |
---|---|
0 | 0 |
1 | 1 |
2 | 0 |
3 | 1 |
4 | 1 |
5 | 1 |
endswith(col, suffix, start=None, stop=None)
df.str.endswith('NAME', 'n')
NAME | |
---|---|
0 | True |
1 | False |
2 | False |
3 | False |
4 | False |
5 | False |
startswith(col, suffix, start=None, stop=None)
df.str.startswith('NAME', 'J')
NAME | |
---|---|
0 | True |
1 | False |
2 | False |
3 | True |
4 | True |
5 | True |
find(col, sub, start=None, stop=None)
df.str.find('NAME', 'Tina')
NAME | |
---|---|
0 | -1 |
1 | -1 |
2 | 0 |
3 | -1 |
4 | -1 |
5 | -1 |
len(col)
df.str.len('NAME')
NAME | |
---|---|
0 | 4 |
1 | 3 |
2 | 4 |
3 | 4 |
4 | 4 |
5 | 4 |
get(col, item)
df.str.get('NAME', 0)
NAME | |
---|---|
0 | J |
1 | S |
2 | T |
3 | J |
4 | J |
5 | J |
index(col, sub, start=None, stop=None)
df.str.index('NAME', '')
NAME | |
---|---|
0 | 0 |
1 | 0 |
2 | 0 |
3 | 0 |
4 | 0 |
5 | 0 |
isalnum(col)
df.str.isalnum('NAME')
NAME | |
---|---|
0 | True |
1 | True |
2 | True |
3 | True |
4 | True |
5 | True |
isalpha(col)
df.str.isalpha('NAME')
NAME | |
---|---|
0 | True |
1 | True |
2 | True |
3 | True |
4 | True |
5 | True |
isdecimal(col)
df.str.isdecimal('NAME')
NAME | |
---|---|
0 | False |
1 | False |
2 | False |
3 | False |
4 | False |
5 | False |
isnumeric(col)
df.str.isnumeric('NAME')
NAME | |
---|---|
0 | False |
1 | False |
2 | False |
3 | False |
4 | False |
5 | False |
isspace(col)
df.str.isspace('NAME')
NAME | |
---|---|
0 | False |
1 | False |
2 | False |
3 | False |
4 | False |
5 | False |
istitle(col)
df.str.istitle('NAME')
NAME | |
---|---|
0 | True |
1 | True |
2 | True |
3 | True |
4 | True |
5 | True |
isupper(col)
df.str.isupper('NAME')
NAME | |
---|---|
0 | False |
1 | False |
2 | False |
3 | False |
4 | False |
5 | False |
islower(col)
df.str.islower('NAME')
NAME | |
---|---|
0 | False |
1 | False |
2 | False |
3 | False |
4 | False |
5 | False |
lstrip(col, chars)
df.str.lstrip('NAME', 'o')
NAME | |
---|---|
0 | John |
1 | Sam |
2 | Tina |
3 | Josh |
4 | Jack |
5 | Jill |
rstrip(col, chars)
df.str.rstrip('NAME', 'o')
NAME | |
---|---|
0 | John |
1 | Sam |
2 | Tina |
3 | Josh |
4 | Jack |
5 | Jill |
strip(col, chars)
df.str.strip('NAME', 'o')
NAME | |
---|---|
0 | John |
1 | Sam |
2 | Tina |
3 | Josh |
4 | Jack |
5 | Jill |
replace(col, old, new, count=None)
df.str.replace('NAME', 'John', 'Cena')
NAME | |
---|---|
0 | Cena |
1 | Sam |
2 | Tina |
3 | Josh |
4 | Jack |
5 | Jill |
swapcase(col)
df.str.swapcase('NAME')
NAME | |
---|---|
0 | jOHN |
1 | sAM |
2 | tINA |
3 | jOSH |
4 | jACK |
5 | jILL |
title(col)
df.str.title('NAME')
NAME | |
---|---|
0 | John |
1 | Sam |
2 | Tina |
3 | Josh |
4 | Jack |
5 | Jill |
lower(col)
df.str.lower('NAME')
NAME | |
---|---|
0 | john |
1 | sam |
2 | tina |
3 | josh |
4 | jack |
5 | jill |
upper(col)
df.str.upper('NAME')
NAME | |
---|---|
0 | JOHN |
1 | SAM |
2 | TINA |
3 | JOSH |
4 | JACK |
5 | JILL |
zfill(col, width)
df.str.zfill('NAME', 10)
NAME | |
---|---|
0 | 000000John |
1 | 0000000Sam |
2 | 000000Tina |
3 | 000000Josh |
4 | 000000Jack |
5 | 000000Jill |
encode(col, encoding='utf-8', errors='strict')
df.str.encode('NAME')
NAME | |
---|---|
0 | b'John' |
1 | b'Sam' |
2 | b'Tina' |
3 | b'Josh' |
4 | b'Jack' |
5 | b'Jill' |
Creates a pivot table from one or two 'grouping' columns
Parameters
rows
: str of column name to group by (Optional)
columns
: str of column name to group by (Optional)
values
: str of column name to aggregate (Required)
aggfunc
: str of aggregation function
df.pivot_table(rows='NAME', columns='PLACE', values='WEIGHT', aggfunc='mean')
NAME | Delhi | Kolkata | Mumbai | |
---|---|---|---|---|
0 | Jack | nan | nan | 62.000 |
1 | Jill | nan | nan | 70.000 |
2 | John | nan | 57.000 | nan |
3 | Josh | nan | nan | 59.000 |
4 | Sam | nan | nan | 70.000 |
5 | Tina | 54.000 | nan | nan |
Reads a CSV file into a dataframe
Parameters
filelocation
: str of the file location
header
: int of the number of lines to skip in case of multi-header files, None in case of header-less files (Optional)
read_csv(file, header)
data = ick.read_csv('./dataset/employee.csv');
data.head()
dept | race | gender | salary | |
---|---|---|---|---|
0 | Houston Police Department-HPD | White | Male | 45279 |
1 | Houston Fire Department (HFD) | White | Male | 63166 |
2 | Houston Police Department-HPD | Black | Male | 66614 |
3 | Public Works & Engineering-PWE | Asian | Male | 71680 |
4 | Houston Airport System (HAS) | White | Male | 42390 |
data = ick.read_csv('./dataset/employee.csv', header=None);
data.head()
0 | 1 | 2 | 3 | |
---|---|---|---|---|
0 | Houston Police Department-HPD | White | Male | 45279 |
1 | Houston Fire Department (HFD) | White | Male | 63166 |
2 | Houston Police Department-HPD | Black | Male | 66614 |
3 | Public Works & Engineering-PWE | Asian | Male | 71680 |
4 | Houston Airport System (HAS) | White | Male | 42390 |
Reads the data from SQL query into a dataframe
Parameters
sql
: str of the sql statement to be executed
drivername
: str of driver engine of the database
username
: str of username to connect to the database
password
: str of password to connect to the database
host
: str of host to connect to the database
port
: int of port to connect to the database
database
: str of database name
read_sql(sql, drivername, username, password, host, port, database)
drivername="postgresql"
username="postgres"
password="test123"
host="localhost"
port="5432"
database="spotify_trends"
sql="SELECT id, firstname, lastname, email FROM authors"
ick.read_sql(sql,drivername,username,password,host,port,database)
id | firstname | lastname | ||
---|---|---|---|---|
0 | 1 | John | Doe | upchh@example.com |
1 | 2 | John | Doe | upchh@example.com |
2 | 3 | John | Doe | upchh@example.com |
3 | 4 | John | Doe | upchh@example.com |
4 | 5 | John | Doe | upchh@example.com |
5 | 6 | John | Doe | upchh@example.com |
6 | 7 | John | Doe | upchh@example.com |
7 | 8 | John | Doe | upchh@example.com |
8 | 9 | John | Doe | upchh@example.com |
Reads the data from a simple Excel file into a dataframe
Parameters
file_path
: str of the path to the Excel file that you want to read
sheet_name
: str of sheet name
Supported file extensions - xlsx, xlsm, xltx, xltm
read_excel(file_path, sheet_name)
ick.read_excel(file_path="./dataset/Book1.xlsx", sheet_name="Sheet1")
Name | Age | Country | |
---|---|---|---|
0 | John | 54 | USA |
1 | Sam | 23 | UK |
2 | Max | 44 | Pakistan |