Table of Contents
- Introduction
- Selecting Columns by Labels
- Selection by the
locAttribute - Selection by the
ilocAttribute
Introduction
We learned in the previous sections that a DataFrame behaves in many ways like a 2-D NumPy array or a structured array, and in other ways like a Python dictionary of Series objects sharing the same index.
These analogies turn out to be helpful as we study data selection in DataFrames. In this section, we will discuss accessing the columns in a DataFrame by label and by position.
Selecting Columns by Labels
We first import the required libraries and modules.
1import numpy as np
2import pandas as pd
3import random as rd
We now create a dataframe from a dictionary of lists using the random module to generate the required data.
Example
Create a DataFrame from a dictionary of Python lists.1a=['Tom','Gary','Lois','Wendy','Betty']
2b=rd.choices(range(50,85),k=5)
3c=np.random.uniform(1.6, 1.9, 5).round(2)
4
5df = pd.DataFrame({'Weight':b,
6 'Height':c}, index=a)
7df
| Weight | Height | |
|---|---|---|
| Tom | 58 | 1.65 |
| Gary | 54 | 1.84 |
| Lois | 57 | 1.70 |
| Wendy | 79 | 1.73 |
| Betty | 65 | 1.68 |
Selecting Single Column
We can access each column using its column label (which is the key of the original dictionary). The output is a Series.
Example
Access a column of a DataFrame using its label.1df['Weight']
Tom 58
Gary 54
Lois 57
Wendy 79
Betty 65
Name: Weight, dtype: int64
Alternatively, we can access each column using the column name as an attribute.
Example
Access a column of a DataFrame using using its label as an attribute.1df.Weight
Tom 58
Gary 54
Lois 57
Wendy 79
Betty 65
Name: Weight, dtype: int64
This second method of accessing a column is not recommended since it will not work if the column names are not strings or if the column names conflict with methods or attributes of the DataFrame.
For example, we create a column with ‘shape’ as its label.
Example
Create a DataFrame with theshape attribute as a column label.
1df1 = pd.DataFrame({'Weight':b,
2 'shape':c}, index=a)
3df1
| Weight | shape | |
|---|---|---|
| Tom | 58 | 1.65 |
| Gary | 54 | 1.84 |
| Lois | 57 | 1.70 |
| Wendy | 79 | 1.73 |
| Betty | 65 | 1.68 |
In this case, df1.shape will return the shape of the dataframe rather than the ‘shape’ column.
Example
Column label conflicts attribute of DataFrame.1df1.shape
(5, 2)
The dictionary-like syntax can be used to modify the dataframe object. For example, creating a new column.
Example
Creating a new column from existing columns.1df['Ratio'] = df['Weight']/df['Height']
2df
| Weight | Height | Ratio | |
|---|---|---|---|
| Tom | 58 | 1.65 | 35.151515 |
| Gary | 54 | 1.84 | 29.347826 |
| Lois | 57 | 1.70 | 33.529412 |
| Wendy | 79 | 1.73 | 45.664740 |
| Betty | 65 | 1.68 | 38.690476 |
We can also round off a certain column using the round() method.
Example
Rounding off a numerical column of values.1df['Ratio'] = df['Ratio'].round(1)
2df
| Weight | Height | Ratio | |
|---|---|---|---|
| Tom | 58 | 1.65 | 35.2 |
| Gary | 54 | 1.84 | 29.3 |
| Lois | 57 | 1.70 | 33.5 |
| Wendy | 79 | 1.73 | 45.7 |
| Betty | 65 | 1.68 | 38.7 |
Selecting Multiple Columns
We can put the labels of all columns we want in a list and feed into the dataframe.
Example
Selecting multiple columns using labels.1cols = ['Weight','Ratio']
2df[cols]
| Weight | Ratio | |
|---|---|---|
| Tom | 58 | 35.2 |
| Gary | 54 | 29.3 |
| Lois | 57 | 33.5 |
| Wendy | 79 | 45.7 |
| Betty | 65 | 38.7 |
Selection by the loc Attribute
The loc attribute is used to access a group of rows and columns by label(s) or a boolean array.
Syntax
Theloc attribute.
1dataframe.loc[row_selection, column_selection]
where row_selection and column_selection refer to a list of row labels and column labels, respectively, i.e. ['label1', 'label2', 'label3']. loc supports the slice notation and therefore accepts a colon(:) to select all rows or columns.
Selecting a Cell
We can select a cell value by specifying both row and column labels.
Example
Selecting a cell using theloc attribute.
1df.loc['Tom','Height']
1.65
Selecting Single Column
Select an entire column using its label. The colon (:) in the row position indicates all rows are selected.
Example
Selecting a column using theloc attribute.
1df.loc[:,"Weight"]
Tom 58
Gary 54
Lois 57
Wendy 79
Betty 65
Name: Weight, dtype: int64
The above returns a series object.
1type(df.loc[:,"Weight"])
pandas.core.series.Series
If the column label is in a list, then a DataFrame will be created instead.
1df.loc[:,["Weight"]]
| Weight | |
|---|---|
| Tom | 58 |
| Gary | 54 |
| Lois | 57 |
| Wendy | 79 |
| Betty | 65 |
1type(df.loc[:,["Weight"]])
pandas.core.frame.DataFrame
Selecting Multiple Columns
The labels of the columns we wish to select are inside a list. This will always return a DataFrame.
Example
Selecting multiple columns using theloc attribute.
1df.loc[:,['Weight','Ratio']]
| Weight | Ratio | |
|---|---|---|
| Tom | 58 | 35.2 |
| Gary | 54 | 29.3 |
| Lois | 57 | 33.5 |
| Wendy | 79 | 45.7 |
| Betty | 65 | 38.7 |
We can also select multiple contiguous columns (adjacent to each other with no gap) using the colon : notation.
Example
Contiguous column selection.1df.loc[:,'Weight':'Height']
| Weight | Height | |
|---|---|---|
| Tom | 58 | 1.65 |
| Gary | 54 | 1.84 |
| Lois | 57 | 1.70 |
| Wendy | 79 | 1.73 |
| Betty | 65 | 1.68 |
The usefulness of contiguous column selection may not be apparent in a DataFrame with only a few columns. However, imagine you have a large DataFrame and you need to select over a dozen contiguous columns. In this case, the colon (:) notation will do away with the need to write down the label of every single column included in the selection.
Summary
Column selection by theloc attribute.
| Selection | Return Data Type | Example |
|---|---|---|
| Single value | Scalar | df.loc['Tom','Height'] |
| Single column | Series | df.loc[:,"Weight"] |
| Single column | DataFrame | df.loc[:,["Weight"]] |
| Multiple columns | DataFrame | df.loc[:,['Weight','Ratio']] |
| Contiguous columns | DataFrame | df.loc[:,'Weight':'Ratio'] |
Selection by the iloc Attribute
The iloc attribute is a purely integer-location based indexing for selection by position.
Syntax
Theiloc attribute.
1dataframe.iloc[row_selection, column_selection]
where row_selection and column_selection refer to a list of row indices and column indices, respectively.iloc supports the slice notation and therefore accepts a colon(:) to select all rows or columns.
When performing data selection using the iloc attribute, we employ the standard half-open interval.
Selecting a Cell
We can select a cell value by specifying both row and column indices.
Example
Selecting a cell using theiloc attribute.
1df.iloc[0,0]
58
Selecting Single Column
We can select an entire column using its index. The colon (:) in row_selection indicates all rows are selected.
Example
Selecting a column using theiloc attribute.
1df.iloc[:,2]
Tom 35.2
Gary 29.3
Lois 33.5
Wendy 45.7
Betty 38.7
Name: Ratio, dtype: float64
The above returns a series object.
1type(df.iloc[:,2])
pandas.core.series.Series
If the column label is in a list, then a DataFrame will be created instead.
Example
Selecting a column using theiloc attribute - returns a DataFrame.
1df.iloc[:,[2]]
| Ratio | |
|---|---|
| Tom | 35.2 |
| Gary | 29.3 |
| Lois | 33.5 |
| Wendy | 45.7 |
| Betty | 38.7 |
1type(df.iloc[:,[2]])
pandas.core.frame.DataFrame
Selecting Multiple Columns
Selecting multiple columns (whose indices are inside a list) will always return a DataFrame.
Example
Selecting multiple columns using theiloc attribute.
1df.iloc[:,[0,2]]
| Weight | Ratio | |
|---|---|---|
| Tom | 58 | 35.2 |
| Gary | 54 | 29.3 |
| Lois | 57 | 33.5 |
| Wendy | 79 | 45.7 |
| Betty | 65 | 38.7 |
We can also select multiple contiguous columns (adjacent to each other with no gap) using the colon : notation.
Example
Contiguous column selection.1df.iloc[:,0:2]
| Weight | Height | |
|---|---|---|
| Tom | 58 | 1.65 |
| Gary | 54 | 1.84 |
| Lois | 57 | 1.70 |
| Wendy | 79 | 1.73 |
| Betty | 65 | 1.68 |
Summary
Column selection by theiloc attribute.
| Selection | Return Data Type | Example |
|---|---|---|
| Single value | Scalar | df.iloc[1,2] |
| Single column | Series | df.iloc[:,2] |
| Single column | DataFrame | df.iloc[:,[2]] |
| Multiple columns | DataFrame | df.iloc[:,[2,1]] |
| Contiguous columns | DataFrame | df.iloc[:, 1:4] |