Introduction
In this section, we will discuss accessing the rows in a DataFrame by both label (the loc attribute) and position (the iloc attribute).
We first import the required libraries and modules.
import numpy as np
import pandas as pd
import random as rd
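The next step is to create a DataFrame from a dictionary of lists. The weights are drawn with the random module and the heights with NumPy, so the values will differ on each run.
a = ['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']
b = rd.choices(range(50, 85), k=5)              # five random integer weights from 50 to 84
c = np.random.uniform(1.6, 1.9, 5).round(2)     # five random heights, rounded to 2 decimals

df = pd.DataFrame({'Weight': b,
                   'Height': c}, index=a)
df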
| | Weight | Height |
---|---|---|
Tom | 58 | 1.65 |
Gary | 54 | 1.84 |
Lois | 57 | 1.70 |
Wendy | 79 | 1.73 |
Betty | 65 | 1.68 |
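The outputs in the rest of this section show a third column, Ratio, that the code above does not create. Its values are consistent with Weight divided by Height, rounded to one decimal place. The sketch below assumes that definition (it is not stated in the original) and uses the fixed values from the table above instead of random ones, so the results are reproducible.

```python
import pandas as pd

# Fixed values copied from the table above, so the output is reproducible.
names = ['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']
df = pd.DataFrame({'Weight': [58, 54, 57, 79, 65],
                   'Height': [1.65, 1.84, 1.70, 1.73, 1.68]}, index=names)

# Assumed definition (not given in the original text): weight divided by height.
df['Ratio'] = (df['Weight'] / df['Height']).round(1)
print(df)
```

With this assumption, the Ratio values match the ones shown in the later outputs (35.2 for Tom, 29.3 for Gary, and so on).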
Selection by the loc Attribute
Syntax
The loc attribute.
dataframe.loc[row_selection, column_selection]
where row_selection and column_selection are row labels and column labels, respectively. Each can be a single label, a list of labels such as ['label1', 'label2', 'label3'], or a slice of labels. Because loc supports the slice notation, a bare colon (:) selects all rows or all columns.
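As a rough illustration of the three kinds of selector, the sketch below applies each one to a two-column frame built from the values in the introduction table. The variable names are only for illustration, and the Ratio column is left out because its definition is not given.

```python
import pandas as pd

df = pd.DataFrame({'Weight': [58, 54, 57, 79, 65],
                   'Height': [1.65, 1.84, 1.70, 1.73, 1.68]},
                  index=['Tom', 'Gary', 'Lois', 'Wendy', 'Betty'])

# A bare colon for the rows and a single column label: one column as a Series.
one_column = df.loc[:, 'Height']

# Lists on both sides: always a DataFrame, even with a single column.
sub_frame = df.loc[['Tom', 'Lois'], ['Height']]

# A label slice for the rows and a bare colon for the columns.
row_block = df.loc['Lois':'Betty', :]

print(type(one_column).__name__, sub_frame.shape, row_block.shape)
```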
Selecting a Cell
We can select a cell value by specifying both row and column labels.
Example
Selecting a cell using the loc attribute.
df.loc['Tom','Height']
1.65
Selecting a Row
Select an entire row using its label. The colon (:) for column_selection indicates that all columns are selected.
Example
Selecting a row using the loc attribute.
df.loc['Tom',:]
Weight 58.00
Height 1.65
Ratio 35.20
Name: Tom, dtype: float64
The above returns a Series object.
type(df.loc['Tom',:])
pandas.core.series.Series
If the row label is passed inside a list, a DataFrame is returned instead.
Example
Selecting a row using the loc attribute - returns a DataFrame.
df.loc[['Tom'],:]
| | Weight | Height | Ratio |
---|---|---|---|
Tom | 58 | 1.65 | 35.2 |
type(df.loc[['Tom'],:])
pandas.core.frame.DataFrame
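The difference between the two return types also affects dtypes, which is why Weight is printed as 58.00 in the Series output above. The sketch below is a minimal check of this, using the values from the introduction table without the Ratio column (whose definition is not given).

```python
import pandas as pd

df = pd.DataFrame({'Weight': [58, 54, 57, 79, 65],
                   'Height': [1.65, 1.84, 1.70, 1.73, 1.68]},
                  index=['Tom', 'Gary', 'Lois', 'Wendy', 'Betty'])

row_as_series = df.loc['Tom', :]     # bare label: a 1-D Series
row_as_frame = df.loc[['Tom'], :]    # label in a list: a one-row DataFrame

# A Series has a single dtype, so the integer weight is upcast to float.
print(row_as_series.dtype)

# A DataFrame keeps one dtype per column, so Weight stays an integer.
print(row_as_frame.dtypes)
```

The list form is useful when later code expects a two-dimensional result regardless of how many rows were selected.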
It is also possible to select a row using the following shorter version by omitting the comma and colon.
Example
Selecting a row using the loc attribute - short version.
df.loc['Gary']
Weight 54.00
Height 1.84
Ratio 29.30
Name: Gary, dtype: float64
Selecting Multiple Rows
The labels of the rows we wish to select are inside a list. This will always return a DataFrame.
Example
Selecting multiple rows using the loc attribute.
df.loc[['Tom','Betty'],:]
| | Weight | Height | Ratio |
---|---|---|---|
Tom | 58 | 1.65 | 35.2 |
Betty | 65 | 1.68 | 38.7 |
We can also select multiple rows by omitting the comma and colon. The rows are returned in the order given in the list.
Example
Selecting multiple rows using the loc attribute - short version.
df.loc[['Gary','Tom']]
| | Weight | Height | Ratio |
---|---|---|---|
Gary | 54 | 1.84 | 29.3 |
Tom | 58 | 1.65 | 35.2 |
We can also select multiple contiguous rows (adjacent to each other with no gap) using the colon (:) notation.
Example
Contiguous row selection.
df.loc['Gary':'Wendy',:]
| | Weight | Height | Ratio |
---|---|---|---|
Gary | 54 | 1.84 | 29.3 |
Lois | 57 | 1.70 | 33.5 |
Wendy | 79 | 1.73 | 45.7 |
Example
Contiguous row selection - short version.
df.loc['Gary':'Betty']
| | Weight | Height | Ratio |
---|---|---|---|
Gary | 54 | 1.84 | 29.3 |
Lois | 57 | 1.70 | 33.5 |
Wendy | 79 | 1.73 | 45.7 |
Betty | 65 | 1.68 | 38.7 |
When using the colon (:) notation with labels, the end label is included in the result. This differs from standard Python slicing, where the end of the slice is excluded.
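To make the contrast concrete, the sketch below slices a plain Python list and a DataFrame over the same names. It uses the values from the introduction table without the Ratio column, and the variable names are illustrative only.

```python
import pandas as pd

names = ['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']
df = pd.DataFrame({'Weight': [58, 54, 57, 79, 65],
                   'Height': [1.65, 1.84, 1.70, 1.73, 1.68]}, index=names)

# Plain Python slicing is half-open: position 3 ('Wendy') is excluded.
python_slice = names[1:3]

# Label slicing with loc is closed: the end label 'Wendy' is included.
label_slice = df.loc['Gary':'Wendy']

print(python_slice)
print(list(label_slice.index))
```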
Summary
Row selection by the loc attribute.
Selection | Return Data Type | Example |
---|---|---|
Single value | Scalar | df.loc['Tom','Height'] |
Single row | Series | df.loc['Gary',:] |
Single row | DataFrame | df.loc[['Gary'],:] |
Multiple rows | DataFrame | df.loc[['Tom','Betty'],:] |
Contiguous rows | DataFrame | df.loc['Gary':'Wendy',:] |
When selecting rows using the loc attribute, it is possible to omit the comma and colon. For example, df.loc['Gary',:] becomes df.loc['Gary'].
Selection by the iloc Attribute
Syntax
The iloc attribute.
dataframe.iloc[row_selection, column_selection]
where row_selection and column_selection are integer positions of rows and columns, respectively, counted from 0. Each can be a single integer, a list of integers, or a slice. Because iloc supports the slice notation, a bare colon (:) selects all rows or all columns.
When selecting data with the iloc attribute, slices follow the standard Python half-open interval: the start position is included and the end position is excluded.
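A small sketch of the half-open rule, using the values from the introduction table (the Ratio column is left out, since its definition is not given). It also shows the label slice that selects the same rows, where the end label has to be the last row wanted.

```python
import pandas as pd

df = pd.DataFrame({'Weight': [58, 54, 57, 79, 65],
                   'Height': [1.65, 1.84, 1.70, 1.73, 1.68]},
                  index=['Tom', 'Gary', 'Lois', 'Wendy', 'Betty'])

# Positions 1 and 2 only: position 3 ('Wendy') is excluded.
by_position = df.iloc[1:3, :]

# The equivalent label slice names the last row explicitly, because loc is inclusive.
by_label = df.loc['Gary':'Lois', :]

print(list(by_position.index))
print(by_position.equals(by_label))
```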
Selecting a Cell
We can select a cell value by specifying both row and column indices.
Example
Selecting a cell using the iloc attribute.
df.iloc[0,0]
58
Selecting a Row
Select an entire row using its integer position. The colon (:) in column_selection indicates that all columns are selected.
Example
Selecting a row using the iloc attribute.
df.iloc[1,:]
Weight 54.00
Height 1.84
Ratio 29.30
Name: Gary, dtype: float64
The above returns a Series object.
type(df.iloc[1,:])
pandas.core.series.Series
If the row index is passed inside a list, a DataFrame is returned instead.
Example
Selecting a row using the iloc attribute - returns a DataFrame.
df.iloc[[1],:]
| | Weight | Height | Ratio |
---|---|---|---|
Gary | 54 | 1.84 | 29.3 |
type(df.iloc[[1],:])
pandas.core.frame.DataFrame
Selecting Multiple Rows
Selecting multiple rows (whose indices are inside a list) will always return a DataFrame.
Example
Selecting multiple rows using the iloc attribute.
df.iloc[[1,3],:]
| | Weight | Height | Ratio |
---|---|---|---|
Gary | 54 | 1.84 | 29.3 |
Wendy | 79 | 1.73 | 45.7 |
We can also select multiple contiguous rows (adjacent to each other with no gap) using the slice (colon :) notation.
Example
Contiguous row selection.
df.iloc[1:3,:]
| | Weight | Height | Ratio |
---|---|---|---|
Gary | 54 | 1.84 | 29.3 |
Lois | 57 | 1.70 | 33.5 |
Summary
Row selection by the iloc attribute.
Selection | Return Data Type | Example |
---|---|---|
Single value | Scalar | df.iloc[1,2] |
Single row | Series | df.iloc[2,:] |
Single row | DataFrame | df.iloc[[2],:] |
Multiple rows | DataFrame | df.iloc[[3,1],:] |
Contiguous rows | DataFrame | df.iloc[1:3,:] |
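Since loc and iloc address the same rows in two different ways, it can help to convert between a label and its position. The sketch below uses the pandas Index.get_loc method and plain indexing on df.index for this. It assumes the row labels are unique, as they are in this example, and it omits the Ratio column.

```python
import pandas as pd

df = pd.DataFrame({'Weight': [58, 54, 57, 79, 65],
                   'Height': [1.65, 1.84, 1.70, 1.73, 1.68]},
                  index=['Tom', 'Gary', 'Lois', 'Wendy', 'Betty'])

# Label -> position (an integer when the label is unique in the index).
pos = df.index.get_loc('Gary')

# Position -> label.
label = df.index[1]

# Both attributes return the same row once label and position correspond.
same_row = df.iloc[pos].equals(df.loc[label])
print(pos, label, same_row)
```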