# Setup
import pandas as pd
test_user_data = {
'first_name': ['Craig', 'Treasure', 'Ashley', 'Guil'],
'last_name': ['Dennis', 'Porth', 'Boucher', 'Hernandez'],
'balance': [42.42, 25.00, 2.02, 87.00]
}
test_user_names = ['craigsdennis', 'treasure', 'lindsay2000', 'guil']
users = pd.DataFrame(test_user_data, index=test_user_names)
balances = users['balance']
balances
craigsdennis 42.42 treasure 25.00 lindsay2000 2.02 guil 87.00 Name: balance, dtype: float64
Note that the returned Series
has a property of name
set to the column label.
balances.name
'balance'
You can retrieve a row from a DataFrame
by using the loc
property and supply the label.
Note how the returned Series
is labelled by the existing column labels of the users
DataFrame
and the name
property is set to the label.
users.loc['guil']
first_name Guil last_name Hernandez balance 87 Name: guil, dtype: object
Normal list like indices are also available to get a specific row, by using the iloc
( i
ndex loc
ation) property and the appropriate index.
For instance the second row can be accessed by using the index 1
.
users.iloc[1]
first_name Treasure last_name Porth balance 25 Name: treasure, dtype: object
You can definitely chain lookups together, but there's a better way to do it!
Note that this is really running multiple separate lookups. You don't want to force this extra work to happen. However, I do want you to see that you can (and you will see this!)
# CAREFUL: This first retrieves the column `Series`
# and then uses the label.
users['first_name']['craigsdennis']
'Craig'
# CAREFUL: This retrieves the row `Series`
# and then does a lookup for first_name
users.loc['craigsdennis']['first_name']
'Craig'
The DataFrame allows access in a single operation to a specific value by specifying a row and a column. There are a few options here too.
DataFrame.loc
¶The loc
property on DataFrame
allows you to work on both axes, rows and columns, by indexing with a tuple (remember a comma creates a tuple). The first element of the tuple is the row access, second is the column.
Using a single label and single column name will retrieve that specific scalar value.
users.loc['craigsdennis', 'first_name']
'Craig'
DataFrame.at
¶To be even more explicit (💪🏾) that you are trying to just get at a single scalar value, you should use the DataFrame.at
property.
users.at['craigsdennis', 'first_name']
'Craig'
Using the loc
and iloc
properties you can slice an existing DataFrame
into a new one.
In the example below we use :
in the rows axis to select all rows, and we specify which columns we want back using a list in the columns axis, ala NumPy Fancy Indexing.
# All rows and the following ordered columns as a list (fancy, eh?)
users.loc[:, ['balance', 'last_name']]
balance | last_name | |
---|---|---|
craigsdennis | 42.42 | Dennis |
treasure | 25.00 | Porth |
lindsay2000 | 2.02 | Boucher |
guil | 87.00 | Hernandez |
When using a slice with loc
the results are inclusive, meaning they include the right side.
# All rows starting with 'treasure' up to and including 'lindsay2000'
# and all the columns
users.loc['treasure':'lindsay2000', :]
first_name | last_name | balance | |
---|---|---|---|
treasure | Treasure | Porth | 25.00 |
lindsay2000 | Ashley | Boucher | 2.02 |
When using a slice with iloc
the results are exclusive, just like standard Python list slices
# All rows starting with the 2nd up until but not including the 3rd
# and all columns starting at the second until the end
users.iloc[1:2, 1:]
last_name | balance | |
---|---|---|
treasure | Porth | 25.0 |