import addutils.toc ; addutils.toc.js(ipy_notebook=True)
from addutils import css_notebook
css_notebook()
See the pandas documentation for more information and examples. Run the code at the end of the notebook to generate the example files.
import numpy as np
import pandas as pd
from numpy import NaN
from addutils import side_by_side2
from addutils import read_txt
from IPython.core.display import HTML
from faker import Factory
Pandas reads and formats data from many different file formats: txt, csv, web, xls, mat. In this case we use read_csv to read two text data files.
First, have a look at the file in its original form:
read_txt('temp/p01_prices.txt')
Date,AAPL,GOOG,JNJ,XOM
2015-09-21,115.209999,635.440002,93.129997,73.389999
2015-09-22,113.400002,622.690002,93.239998,72.739998
2015-09-23,114.32,622.359985,92.989998,72.300003
2015-09-24,115.0,625.799988,92.480003,72.730003
This file can be read and parsed at the same time using read_csv. Let's read the two files p01_prices.txt and p01_volumes.txt:
prices = pd.read_csv('temp/p01_prices.txt', index_col=0, parse_dates=[0])
volumes = pd.read_csv('temp/p01_volumes.txt', index_col=0, parse_dates=[0])
HTML(side_by_side2(prices, volumes))
AAPL | GOOG | JNJ | XOM | |
---|---|---|---|---|
Date | ||||
2015-09-21 | 115.209999 | 635.440002 | 93.129997 | 73.389999 |
2015-09-22 | 113.400002 | 622.690002 | 93.239998 | 72.739998 |
2015-09-23 | 114.320000 | 622.359985 | 92.989998 | 72.300003 |
2015-09-24 | 115.000000 | 625.799988 | 92.480003 | 72.730003 |
2015-09-25 | 114.709999 | 611.969971 | 91.000000 | 73.230003 |
2015-09-28 | 112.440002 | 594.890015 | 91.370003 | 72.599998 |
AAPL | JNJ | XOM | |
---|---|---|---|
Date | |||
2015-09-21 | 46554300 | 6971400 | 10585500 |
2015-09-22 | 49809000 | 10607800 | 14104200 |
2015-09-23 | 35645700 | 5597300 | 13777500 |
2015-09-24 | 49810600 | 7178400 | 14283800 |
Both "prices" and "volumes" datasets are 2D DataFrame objects:
type(prices)
pandas.core.frame.DataFrame
fakeIT = Factory.create('it_IT')
data = {'Name' : [fakeIT.name() for i in range(5)],
'Company' : [fakeIT.company() for i in range(5)],
'City' : [fakeIT.city() for i in range(5)]}
df = pd.DataFrame(data, columns = ['Name','Company','City'])
df
Name | Company | City | |
---|---|---|---|
0 | Ruth Milani | Basile-Gentile SPA | Quarto Elda |
1 | Albino Marchetti | De luca SPA | Donati calabro |
2 | Vienna Cattaneo | Ferretti, Coppola e Colombo SPA | Gioacchino sardo |
3 | Elga Vitale | Bianchi, Battaglia e Fontana e figli | San Cecco umbro |
4 | Diana Greco | Lombardi s.r.l. | Borgo Sarita |
df = pd.DataFrame.from_items([('Name', [fakeIT.name() for i in range(5)]),
('Company', [fakeIT.company() for i in range(5)])])
df
Name | Company | |
---|---|---|
0 | Dott. Fatima Carbone | Caputo s.r.l. |
1 | Sig. Alighiero Sorrentino | De rosa Group |
2 | Sig. Pablo Monti | Cattaneo SPA |
3 | Demian Palmieri | Mazza-Martinelli SPA |
4 | Sig.ra Loretta Martino | Conte-Sartori Group |
df = pd.DataFrame(np.array([[2,5],[3,6]]).T, index=list('ab'), columns=['ONE','TWO'])
df
ONE | TWO | |
---|---|---|
a | 2 | 3 |
b | 5 | 6 |
np.asarray(df)
array([[2, 3],
       [5, 6]])
In pandas there are three main data structure types ("data container" objects): Series (1D), DataFrame (2D) and Panel (3D).
For simplicity, in this course we will describe only pandas Series and DataFrames.
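A minimal sketch of the two containers used in this course, on made-up toy data:

```python
import pandas as pd

# A Series is a 1D labeled array of homogeneous data
s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

# A DataFrame is a 2D labeled table whose columns are Series
df = pd.DataFrame({'x': s, 'y': s * 2})

print(type(s).__name__)   # Series
print(type(df).__name__)  # DataFrame
print(df.shape)           # (3, 2)
```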
As you can see, the dates have been interpreted correctly and used as the row index. Notice that the rows and columns in the two data files are misaligned: this is not a problem in pandas because of the Automatic Data Alignment feature, an operation involving the two datasets will simply use NaN for the undefined (misaligned) values
prices*volumes
AAPL | GOOG | JNJ | XOM | |
---|---|---|---|---|
Date | ||||
2015-09-21 | 5.363521e+09 | NaN | 6.492465e+08 | 7.768698e+08 |
2015-09-22 | 5.648341e+09 | NaN | 9.890713e+08 | 1.025939e+09 |
2015-09-23 | 4.075016e+09 | NaN | 5.204929e+08 | 9.961133e+08 |
2015-09-24 | 5.728219e+09 | NaN | 6.638585e+08 | 1.038861e+09 |
2015-09-25 | NaN | NaN | NaN | NaN |
2015-09-28 | NaN | NaN | NaN | NaN |
Which can be better formatted as a "1 decimal place float number" with comma as thousands separator (see Package Options):
pd.set_option('display.float_format', lambda x: '{:,.1f}'.format(x)) # formatting
(prices*volumes).fillna('-')                                         # replacing NaN with a dash for display
AAPL | GOOG | JNJ | XOM | |
---|---|---|---|---|
Date | ||||
2015-09-21 | 5,363,520,856.4 | - | 649,246,461.1 | 776,869,834.4 |
2015-09-22 | 5,648,340,699.6 | - | 989,071,250.8 | 1,025,939,479.8 |
2015-09-23 | 4,075,016,424.0 | - | 520,492,915.8 | 996,113,291.3 |
2015-09-24 | 5,728,219,000.0 | - | 663,858,453.5 | 1,038,860,816.9 |
2015-09-25 | - | - | - | - |
2015-09-28 | - | - | - | - |
Example: calculate the volume-weighted average price
vwap = (prices*volumes).sum()/volumes.sum()
vwap.dropna()
AAPL   114.5
JNJ     93.0
XOM     72.8
dtype: float64
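The same formula can be verified by hand on a toy example (the ticker name and values here are made up):

```python
import pandas as pd

prices = pd.DataFrame({'XYZ': [10.0, 20.0]})
volumes = pd.DataFrame({'XYZ': [100, 300]})

# volume-weighted average price: sum(price * volume) / sum(volume)
vwap = (prices * volumes).sum() / volumes.sum()
print(vwap['XYZ'])  # (10*100 + 20*300) / 400 = 17.5
```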
.loc is strictly label based and raises KeyError when the items are not found. Allowed inputs are: a single label; a list or array of labels; a slice with labels (note that, contrary to usual Python slices, both the start and the stop are included); a boolean array.
HTML(side_by_side2(prices, prices.loc['2015-09-22':'2015-09-24',['AAPL', 'GOOG']]))
AAPL | GOOG | JNJ | XOM | |
---|---|---|---|---|
Date | ||||
2015-09-21 | 115.2 | 635.4 | 93.1 | 73.4 |
2015-09-22 | 113.4 | 622.7 | 93.2 | 72.7 |
2015-09-23 | 114.3 | 622.4 | 93.0 | 72.3 |
2015-09-24 | 115.0 | 625.8 | 92.5 | 72.7 |
2015-09-25 | 114.7 | 612.0 | 91.0 | 73.2 |
2015-09-28 | 112.4 | 594.9 | 91.4 | 72.6 |
AAPL | GOOG | |
---|---|---|
Date | ||
2015-09-22 | 113.4 | 622.7 |
2015-09-23 | 114.3 | 622.4 |
2015-09-24 | 115.0 | 625.8 |
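A minimal sketch of the allowed .loc inputs on a toy DataFrame (labels and values are made up):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

print(df.loc['y', 'A'])                   # single labels: 2
print(df.loc[['x', 'z'], 'B'].tolist())   # list of labels: [4, 6]
print(df.loc['x':'y', 'A'].tolist())      # label slice, stop INCLUDED: [1, 2]
print(df.loc[df['A'] > 1, 'B'].tolist())  # boolean array: [5, 6]
```

Note the label slice: unlike a plain Python slice, 'x':'y' includes the 'y' row.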
Columns can be selected without specifying the index:
HTML(side_by_side2(prices, prices[['AAPL', 'GOOG']]))
AAPL | GOOG | JNJ | XOM | |
---|---|---|---|---|
Date | ||||
2015-09-21 | 115.2 | 635.4 | 93.1 | 73.4 |
2015-09-22 | 113.4 | 622.7 | 93.2 | 72.7 |
2015-09-23 | 114.3 | 622.4 | 93.0 | 72.3 |
2015-09-24 | 115.0 | 625.8 | 92.5 | 72.7 |
2015-09-25 | 114.7 | 612.0 | 91.0 | 73.2 |
2015-09-28 | 112.4 | 594.9 | 91.4 | 72.6 |
AAPL | GOOG | |
---|---|---|
Date | ||
2015-09-21 | 115.2 | 635.4 |
2015-09-22 | 113.4 | 622.7 |
2015-09-23 | 114.3 | 622.4 |
2015-09-24 | 115.0 | 625.8 |
2015-09-25 | 114.7 | 612.0 |
2015-09-28 | 112.4 | 594.9 |
.iloc is strictly position based and raises IndexError when a requested indexer is out-of-bounds (except slice indexers, which are clipped as in NumPy):
HTML(side_by_side2(prices, prices.iloc[1:5,[0, 1]]))
AAPL | GOOG | JNJ | XOM | |
---|---|---|---|---|
Date | ||||
2015-09-21 | 115.2 | 635.4 | 93.1 | 73.4 |
2015-09-22 | 113.4 | 622.7 | 93.2 | 72.7 |
2015-09-23 | 114.3 | 622.4 | 93.0 | 72.3 |
2015-09-24 | 115.0 | 625.8 | 92.5 | 72.7 |
2015-09-25 | 114.7 | 612.0 | 91.0 | 73.2 |
2015-09-28 | 112.4 | 594.9 | 91.4 | 72.6 |
AAPL | GOOG | |
---|---|---|
Date | ||
2015-09-22 | 113.4 | 622.7 |
2015-09-23 | 114.3 | 622.4 |
2015-09-24 | 115.0 | 625.8 |
2015-09-25 | 114.7 | 612.0 |
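The out-of-bounds behaviour can be checked on a toy frame (data here is made up):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

print(df.iloc[0, 0])        # first row, first column: 1
print(df.iloc[0:10].shape)  # slices are clipped, like in NumPy: (3, 1)

try:
    df.iloc[10, 0]          # a scalar indexer out of bounds raises
except IndexError:
    print('IndexError')
```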
3.3 Mixed Indexing
.ix will always try first to resolve the label-based index (like .loc), then fall back on positional indexing (like .iloc). Note that .ix is deprecated in recent versions of pandas in favor of the stricter .loc and .iloc.
Rows can be indexed using the ix method. Try by yourself:
prices.ix[1:4,0:2] # Position-based indexing
prices.ix[:'2015-09-23'] # Label-based indexing on index (rows)
prices.ix[:,[2,2,1]] # Duplicated values on columns
prices.ix[::2] # One value every two rows
prices.ix[::-1] # Reverse rows
prices.ix[prices['AAPL'] > 114] # Boolean indexing on index
prices.ix[:,[len(c)<4 for c in prices.columns]] # Boolean indexing on columns
prices.ix[:,[len(c)<4 for c in prices.columns]]
JNJ | XOM | |
---|---|---|
Date | ||
2015-09-21 | 93.1 | 73.4 |
2015-09-22 | 93.2 | 72.7 |
2015-09-23 | 93.0 | 72.3 |
2015-09-24 | 92.5 | 72.7 |
2015-09-25 | 91.0 | 73.2 |
2015-09-28 | 91.4 | 72.6 |
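Since .ix is deprecated, the same selections can be written with .loc and .iloc in recent pandas. A sketch on a synthetic prices frame (values here are made up, only the shape mirrors the example above):

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame(np.arange(12.0).reshape(3, 4),
                      index=pd.date_range('2015-09-21', periods=3),
                      columns=['AAPL', 'GOOG', 'JNJ', 'XOM'])

pos = prices.iloc[1:3, 0:2]      # position-based indexing
lab = prices.loc[:'2015-09-22']  # label-based indexing on the rows
boo = prices.loc[:, [len(c) < 4 for c in prices.columns]]  # boolean on columns

print(pos.shape, lab.shape, list(boo.columns))  # (2, 2) (2, 4) ['JNJ', 'XOM']
```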
data = np.array([[2,5,8,11],[3,6,9,12]])
d1 = pd.DataFrame(data.T, index=list('abce'), columns=['K','W'])
HTML(side_by_side2(d1, pd.DataFrame(d1, index=list('baez'), columns=['W','K','T'])) )
K | W | |
---|---|---|
a | 2 | 3 |
b | 5 | 6 |
c | 8 | 9 |
e | 11 | 12 |
W | K | T | |
---|---|---|---|
b | 6.0 | 5.0 | nan |
a | 3.0 | 2.0 | nan |
e | 12.0 | 11.0 | nan |
z | nan | nan | nan |
d1['Z'] = d1['W']-d1['K']
d1['B'] = d1['W']>4
d1
K | W | Z | B | |
---|---|---|---|---|
a | 2 | 3 | 1 | False |
b | 5 | 6 | 1 | True |
c | 8 | 9 | 1 | True |
e | 11 | 12 | 1 | True |
d1['SUM'] = d1.sum(axis=1)
HTML(side_by_side2(d1, d1.drop(['b', 'c'], axis=0), d1.drop(['Z', 'B'], axis=1)))
K | W | Z | B | SUM | |
---|---|---|---|---|---|
a | 2 | 3 | 1 | False | 6.0 |
b | 5 | 6 | 1 | True | 13.0 |
c | 8 | 9 | 1 | True | 19.0 |
e | 11 | 12 | 1 | True | 25.0 |
K | W | Z | B | SUM | |
---|---|---|---|---|---|
a | 2 | 3 | 1 | False | 6.0 |
e | 11 | 12 | 1 | True | 25.0 |
K | W | SUM | |
---|---|---|---|
a | 2 | 3 | 6.0 |
b | 5 | 6 | 13.0 |
c | 8 | 9 | 19.0 |
e | 11 | 12 | 25.0 |
d2 = d1.copy() # .copy() method is needed to create a new object.
d2.insert(1, 'Exp(W)', np.exp(d1['W']))
HTML(side_by_side2(d1, d2, space=10))
K | W | Z | B | SUM | |
---|---|---|---|---|---|
a | 2 | 3 | 1 | False | 6.0 |
b | 5 | 6 | 1 | True | 13.0 |
c | 8 | 9 | 1 | True | 19.0 |
e | 11 | 12 | 1 | True | 25.0 |
K | Exp(W) | W | Z | B | SUM | |
---|---|---|---|---|---|---|
a | 2 | 20.1 | 3 | 1 | False | 6.0 |
b | 5 | 403.4 | 6 | 1 | True | 13.0 |
c | 8 | 8,103.1 | 9 | 1 | True | 19.0 |
e | 11 | 162,754.8 | 12 | 1 | True | 25.0 |
Example: index rows to create a new column with missing values, then use forward fill padding to fill the gaps:
d1['part'] = d1['K'].ix[:2]
d1['part'] = d1['part'].fillna(method='ffill')
d1
K | W | Z | B | SUM | part | |
---|---|---|---|---|---|---|
a | 2 | 3 | 1 | False | 6.0 | 2.0 |
b | 5 | 6 | 1 | True | 13.0 | 5.0 |
c | 8 | 9 | 1 | True | 19.0 | 5.0 |
e | 11 | 12 | 1 | True | 25.0 | 5.0 |
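The fillna(method='ffill') call can also be written with the ffill method; a self-contained sketch on a toy Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 5.0, np.nan, np.nan], index=list('abce'))

# forward fill propagates the last valid observation into the gaps
print(s.ffill().tolist())  # [2.0, 5.0, 5.0, 5.0]
```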
# TODO: Upgrade side_by_side2 to include series
HTML(side_by_side2(d1, d1['K'].isin([3, 8])))
K | W | Z | B | SUM | part | |
---|---|---|---|---|---|---|
a | 2 | 3 | 1 | False | 6.0 | 2.0 |
b | 5 | 6 | 1 | True | 13.0 | 5.0 |
c | 8 | 9 | 1 | True | 19.0 | 5.0 |
e | 11 | 12 | 1 | True | 25.0 | 5.0 |
K | |
---|---|
a | False |
b | False |
c | True |
e | False |
HTML(side_by_side2(d1, d1.rename(columns={'K':'ONE','W':'TWO','Z':'THREE'})))
K | W | Z | B | SUM | part | |
---|---|---|---|---|---|---|
a | 2 | 3 | 1 | False | 6.0 | 2.0 |
b | 5 | 6 | 1 | True | 13.0 | 5.0 |
c | 8 | 9 | 1 | True | 19.0 | 5.0 |
e | 11 | 12 | 1 | True | 25.0 | 5.0 |
ONE | TWO | THREE | B | SUM | part | |
---|---|---|---|---|---|---|
a | 2 | 3 | 1 | False | 6.0 | 2.0 |
b | 5 | 6 | 1 | True | 13.0 | 5.0 |
c | 8 | 9 | 1 | True | 19.0 | 5.0 |
e | 11 | 12 | 1 | True | 25.0 | 5.0 |
iterrows
returns an iterator yielding each index value along with a Series containing the data in each row:
for row_index, row in d1.iterrows():
print(row_index, '**', ' - '.join([str(item) for item in row]))
a ** 2 - 3 - 1 - False - 6.0 - 2.0
b ** 5 - 6 - 1 - True - 13.0 - 5.0
c ** 8 - 9 - 1 - True - 19.0 - 5.0
e ** 11 - 12 - 1 - True - 25.0 - 5.0
itertuples
returns an iterator yielding a tuple for each row in the DataFrame. The first element of the tuple is the row’s corresponding index value, while the remaining elements are the row values:
for t in d1.itertuples():
print(t)
('a', 2, 3, 1, False, 6.0, 2.0)
('b', 5, 6, 1, True, 13.0, 5.0)
('c', 8, 9, 1, True, 19.0, 5.0)
('e', 11, 12, 1, True, 25.0, 5.0)
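In recent pandas versions itertuples yields namedtuples, so the fields can also be accessed by name (the first field is called Index); a small sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'K': [2, 5], 'W': [3, 6]}, index=['a', 'b'])

for t in df.itertuples():
    # t.Index is the row label, then one field per column
    print(t.Index, t.K, t.W)
```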
d3 = pd.read_csv('temp/p01_d2.csv', index_col=0)
d3['a dup'] = d3.duplicated(['a'])
d3['a+b dup'] = d3.duplicated(['a', 'b'])
d3['a+b dup - take last'] = d3.duplicated(['a', 'b'], keep='last')
d3
a | b | c | a dup | a+b dup | a+b dup - take last | |
---|---|---|---|---|---|---|
0 | one | x | -0.3 | False | False | False |
1 | one | y | -0.1 | True | False | False |
2 | two | y | 1.2 | False | False | True |
3 | three | x | -0.5 | False | False | False |
4 | two | y | 1.3 | True | True | False |
d3.drop_duplicates(['a', 'b'],keep='last')
a | b | c | a dup | a+b dup | a+b dup - take last | |
---|---|---|---|---|---|---|
0 | one | x | -0.3 | False | False | False |
1 | one | y | -0.1 | True | False | False |
3 | three | x | -0.5 | False | False | False |
4 | two | y | 1.3 | True | True | False |
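The effect of the keep argument can be seen on a minimal frame (data here is made up):

```python
import pandas as pd

df = pd.DataFrame({'a': ['one', 'one', 'two'], 'b': ['x', 'y', 'y']})

# keep='first' (the default) marks later repeats as duplicates;
# keep='last' marks the earlier occurrences instead
print(df.duplicated(['a']).tolist())                          # [False, True, False]
print(df.duplicated(['a'], keep='last').tolist())             # [True, False, False]
print(df.drop_duplicates(['a'], keep='last').index.tolist())  # [1, 2]
```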
Let's start by generating a DataFrame from a NumPy array. We'll see that there is no memory overhead on the DataFrame values:
rows, cols = 100, 100
np_array = np.array(np.random.randn(rows, cols), dtype=np.float64)
d4 = pd.DataFrame(np_array)
print ('Rows x Cols x 8: ', rows*cols*8)
print ('np Array Memory Occupation: ', np_array.nbytes)
print ('Dataframe Values Memory Occupation: ', d4.values.nbytes)
print ('Dataframe Index Memory Occupation: ', d4.index.nbytes)
print ('Dataframe Columns Memory Occupation: ', d4.columns.nbytes)
Rows x Cols x 8:  80000
np Array Memory Occupation:  80000
Dataframe Values Memory Occupation:  80000
Dataframe Index Memory Occupation:  800
Dataframe Columns Memory Occupation:  800
To reduce the memory occupation it is possible to change the values' dtype:
d4 = d4.astype(dtype=np.float16)
print ('Dataframe Values Memory Occupation: ', d4.values.nbytes)
Dataframe Values Memory Occupation: 20000
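The same saving can be reproduced in a self-contained sketch: float64 uses 8 bytes per value, float16 only 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 100))  # float64 by default
print(df.values.nbytes)                       # 100 * 100 * 8 = 80000

small = df.astype(np.float16)                 # 2 bytes per value
print(small.values.nbytes)                    # 100 * 100 * 2 = 20000
```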
If the data is sparse, the DataFrame can be sparsified as well to save further resources with the to_sparse() method:
d4.ix[2:,4:] = np.nan
print ('Dataframe Values Memory Occupation: ', d4.values.nbytes)
d4 = d4.to_sparse()
print ('Dataframe Values Memory Occupation: ', d4.values.nbytes)
Dataframe Values Memory Occupation:  20000
Dataframe Values Memory Occupation:  80000
In this case the row and column indexes are np.int64 arrays:
d4.columns
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99], dtype='int64')
Working with large arrays: in Excel it is difficult to explore arrays with thousands of rows and columns. Explore the pandas capabilities with the following code, which loads the whole compressed file and prints the column names; the commands below then let you inspect slices of it. Try to do the same in Excel for comparison.
d3 = pd.read_csv('example_data/p01_d3.csv.gz', compression='gzip')
for col in d3.columns:
print (col, end=' - ')
Istat - Comune - Provincia - Regione - Prefisso - CAP - CodFisco - Abitanti - Link -
Try by yourself:
d3.head()
d3[d3.columns[:3]].head()
d3[d3.columns[-4:-1]].tail()
d3.ix[1000:1010, :7]
d3.ix[:, 'Abitanti'].describe()
d3[d3.columns[-4:-1]].tail()
CAP | CodFisco | Abitanti | |
---|---|---|---|
8087 | 33020 | M200 | 607 |
8088 | 13848 | M201 | 1152 |
8089 | 87040 | M202 | 2413 |
8090 | 83030 | M203 | 1232 |
8091 | 89867 | M204 | 2055 |
It is possible to add multiple new columns to a DataFrame.
data = np.array([[3, 5, 7, 10, 13, 16, 56, 72],
[8, 16, 28, 37, 45, 57, 69, 90],
[3, 6, NaN, NaN, 15, 18, NaN, NaN],
[1, 2, 4, 7, 11, 16, 65, 88],
[NaN, NaN, NaN, NaN, 16, 19, 82, 91]])
d4 = pd.DataFrame(data.T, columns=['one', 'two', 'three', 'four', 'five'])
d4
one | two | three | four | five | |
---|---|---|---|---|---|
0 | 3.0 | 8.0 | 3.0 | 1.0 | nan |
1 | 5.0 | 16.0 | 6.0 | 2.0 | nan |
2 | 7.0 | 28.0 | nan | 4.0 | nan |
3 | 10.0 | 37.0 | nan | 7.0 | nan |
4 | 13.0 | 45.0 | 15.0 | 11.0 | 16.0 |
5 | 16.0 | 57.0 | 18.0 | 16.0 | 19.0 |
6 | 56.0 | 69.0 | nan | 65.0 | 82.0 |
7 | 72.0 | 90.0 | nan | 88.0 | 91.0 |
d4[['one ret','two ret']] = d4[['one','two']].pct_change()+1
d4
one | two | three | four | five | one ret | two ret | |
---|---|---|---|---|---|---|---|
0 | 3.0 | 8.0 | 3.0 | 1.0 | nan | nan | nan |
1 | 5.0 | 16.0 | 6.0 | 2.0 | nan | 1.7 | 2.0 |
2 | 7.0 | 28.0 | nan | 4.0 | nan | 1.4 | 1.8 |
3 | 10.0 | 37.0 | nan | 7.0 | nan | 1.4 | 1.3 |
4 | 13.0 | 45.0 | 15.0 | 11.0 | 16.0 | 1.3 | 1.2 |
5 | 16.0 | 57.0 | 18.0 | 16.0 | 19.0 | 1.2 | 1.3 |
6 | 56.0 | 69.0 | nan | 65.0 | 82.0 | 3.5 | 1.2 |
7 | 72.0 | 90.0 | nan | 88.0 | 91.0 | 1.3 | 1.3 |
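pct_change computes (x[t] - x[t-1]) / x[t-1]; adding 1 gives the gross return shown above, which can be verified by hand on a toy Series:

```python
import pandas as pd

s = pd.Series([3.0, 5.0, 7.0])
ret = s.pct_change() + 1

print(round(ret.iloc[1], 4))  # 5/3 -> 1.6667
print(round(ret.iloc[2], 4))  # 7/5 -> 1.4
```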
d4['four var'] = np.log(d4['four'] - d4['four'].shift())
d4
one | two | three | four | five | one ret | two ret | four var | |
---|---|---|---|---|---|---|---|---|
0 | 3.0 | 8.0 | 3.0 | 1.0 | nan | nan | nan | nan |
1 | 5.0 | 16.0 | 6.0 | 2.0 | nan | 1.7 | 2.0 | 0.0 |
2 | 7.0 | 28.0 | nan | 4.0 | nan | 1.4 | 1.8 | 0.7 |
3 | 10.0 | 37.0 | nan | 7.0 | nan | 1.4 | 1.3 | 1.1 |
4 | 13.0 | 45.0 | 15.0 | 11.0 | 16.0 | 1.3 | 1.2 | 1.4 |
5 | 16.0 | 57.0 | 18.0 | 16.0 | 19.0 | 1.2 | 1.3 | 1.6 |
6 | 56.0 | 69.0 | nan | 65.0 | 82.0 | 3.5 | 1.2 | 3.9 |
7 | 72.0 | 90.0 | nan | 88.0 | 91.0 | 1.3 | 1.3 | 3.1 |
The DataFrame.reindex method conforms a DataFrame to a new index, filling the cells that have no value with NaN. It is possible to use this method to rearrange columns.
d5 = d4.reindex(columns=['one','one ret','two','two ret','four','four var'])
d5
one | one ret | two | two ret | four | four var | |
---|---|---|---|---|---|---|
0 | 3.0 | nan | 8.0 | nan | 1.0 | nan |
1 | 5.0 | 1.7 | 16.0 | 2.0 | 2.0 | 0.0 |
2 | 7.0 | 1.4 | 28.0 | 1.8 | 4.0 | 0.7 |
3 | 10.0 | 1.4 | 37.0 | 1.3 | 7.0 | 1.1 |
4 | 13.0 | 1.3 | 45.0 | 1.2 | 11.0 | 1.4 |
5 | 16.0 | 1.2 | 57.0 | 1.3 | 16.0 | 1.6 |
6 | 56.0 | 3.5 | 69.0 | 1.2 | 65.0 | 3.9 |
7 | 72.0 | 1.3 | 90.0 | 1.3 | 88.0 | 3.1 |
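reindex also works on rows and accepts a fill_value for the missing cells; a minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0]}, index=['a', 'b'])

# reindex aligns the data to the new labels: missing cells get NaN
# by default, or fill_value when provided
r = df.reindex(index=['b', 'a', 'z'], fill_value=0.0)
print(r['one'].tolist())  # [2.0, 1.0, 0.0]
```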
Notice that DataFrame.reindex returns a new object, hence d4 isn't changed.
d4
one | two | three | four | five | one ret | two ret | four var | |
---|---|---|---|---|---|---|---|---|
0 | 3.0 | 8.0 | 3.0 | 1.0 | nan | nan | nan | nan |
1 | 5.0 | 16.0 | 6.0 | 2.0 | nan | 1.7 | 2.0 | 0.0 |
2 | 7.0 | 28.0 | nan | 4.0 | nan | 1.4 | 1.8 | 0.7 |
3 | 10.0 | 37.0 | nan | 7.0 | nan | 1.4 | 1.3 | 1.1 |
4 | 13.0 | 45.0 | 15.0 | 11.0 | 16.0 | 1.3 | 1.2 | 1.4 |
5 | 16.0 | 57.0 | 18.0 | 16.0 | 19.0 | 1.2 | 1.3 | 1.6 |
6 | 56.0 | 69.0 | nan | 65.0 | 82.0 | 3.5 | 1.2 | 3.9 |
7 | 72.0 | 90.0 | nan | 88.0 | 91.0 | 1.3 | 1.3 | 3.1 |
d6 = pd.read_csv('temp/p01_d4.csv', index_col=['Country',
'Number',
'Dir'])
d6 = d6.sortlevel()
d6
0 | 1 | 2 | |||
---|---|---|---|---|---|
Country | Number | Dir | |||
Fra | one | x | -0.1 | 0.3 | -0.2 |
two | y | 0.3 | -0.9 | 0.3 | |
z | 1.8 | 1.0 | -1.4 | ||
Ger | one | x | -0.7 | -0.5 | 0.4 |
Jap | one | x | 1.2 | 1.2 | -0.2 |
two | x | -0.4 | 0.5 | -0.6 | |
USA | one | y | -1.9 | -0.9 | 0.5 |
z | -0.1 | -1.0 | 0.7 |
Try by yourself:
d6.ix['Fra']
d6.ix['Fra', 'two']
d6.ix['Fra':'Ger']
d6.reorder_levels([2,1,0], axis=0).sortlevel(0)
d6.reset_index(level=1)
d6.ix['Fra']
0 | 1 | 2 | ||
---|---|---|---|---|
Number | Dir | |||
one | x | -0.1 | 0.3 | -0.2 |
two | y | 0.3 | -0.9 | 0.3 |
z | 1.8 | 1.0 | -1.4 |
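With recent pandas the same hierarchical selections are written with .loc; a sketch on a small MultiIndex frame (labels and values here are made up):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('Fra', 'one'), ('Fra', 'two'), ('Ger', 'one')],
    names=['Country', 'Number'])
df = pd.DataFrame({'v': [1, 2, 3]}, index=idx)

print(df.loc['Fra', 'v'].tolist())  # select the outer level: [1, 2]
print(df.loc[('Fra', 'two'), 'v'])  # select with the full key: 2
```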
The way DataFrames are displayed can be customized in many ways (see the documentation):
print ('Display Max Rows: \t', pd.get_option('display.max_rows'))
pd.describe_option('display.max_rows')
Display Max Rows: 	 60
display.max_rows : int
    If max_rows is exceeded, switch to truncate view. Depending on `large_repr`, objects are either centrally truncated or printed as a summary view. 'None' value means unlimited. In case python/IPython is running in a terminal and `large_repr` equals 'truncate' this can be set to 0 and pandas will auto-detect the height of the terminal and print a truncated object which fits the screen height. The IPython notebook, IPython qtconsole, or IDLE do not run in a terminal and hence it is not possible to do correct auto-detection.
    [default: 60] [currently: 60]
pd.set_option('display.max_rows', 10)
pd.get_option('display.max_rows')
10
pd.reset_option('display.max_rows')
pd.get_option('display.max_rows')
60
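When an option should apply only to a single cell, pd.option_context sets it temporarily and restores the previous value on exit:

```python
import pandas as pd

before = pd.get_option('display.max_rows')

# the option is changed only inside the with block
with pd.option_context('display.max_rows', 5):
    print(pd.get_option('display.max_rows'))        # 5

print(pd.get_option('display.max_rows') == before)  # True
```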
Try by yourself:
pd.describe_option('display.chop_threshold')
pd.describe_option('display.colheader_justify')
pd.describe_option('display.column_space')
pd.describe_option('display.date_dayfirst')
pd.describe_option('display.date_yearfirst')
pd.describe_option('display.encoding')
pd.describe_option('display.expand_frame_repr')
pd.describe_option('display.float_format')
pd.describe_option('display.max_columns')
pd.describe_option('display.max_colwidth')
pd.describe_option('display.max_rows')
pd.describe_option('display.notebook_repr_html')
pd.describe_option('display.precision')
# for more options see documentation...
pd.set_option('display.precision', 2)
pd.set_option('display.notebook_repr_html', False)
prices
             AAPL   GOOG   JNJ   XOM
Date
2015-09-21  115.2  635.4  93.1  73.4
2015-09-22  113.4  622.7  93.2  72.7
2015-09-23  114.3  622.4  93.0  72.3
2015-09-24  115.0  625.8  92.5  72.7
2015-09-25  114.7  612.0  91.0  73.2
2015-09-28  112.4  594.9  91.4  72.6
pd.set_option('display.notebook_repr_html', True)
prices
AAPL | GOOG | JNJ | XOM | |
---|---|---|---|---|
Date | ||||
2015-09-21 | 115.2 | 635.4 | 93.1 | 73.4 |
2015-09-22 | 113.4 | 622.7 | 93.2 | 72.7 |
2015-09-23 | 114.3 | 622.4 | 93.0 | 72.3 |
2015-09-24 | 115.0 | 625.8 | 92.5 | 72.7 |
2015-09-25 | 114.7 | 612.0 | 91.0 | 73.2 |
2015-09-28 | 112.4 | 594.9 | 91.4 | 72.6 |
Visit www.add-for.com for more tutorials and updates.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.