Table of Contents



Introduction

We learned in the previous sections that a DataFrame behaves in many ways like a 2-D NumPy array or a structured array, and in other ways like a Python dictionary of Series objects sharing the same index.

These analogies turn out to be helpful as we study data selection in DataFrames. In this section, we will discuss accessing the columns in a DataFrame by label and by position.

Selecting Columns by Labels

We first import the required libraries and modules.

1import numpy as np
2import pandas as pd
3import random as rd

We now create a dataframe from a dictionary of lists using the random module to generate the required data.

Example

Create a DataFrame from a dictionary of Python lists.

1a=['Tom','Gary','Lois','Wendy','Betty']
2b=rd.choices(range(50,85),k=5)
3c=np.random.uniform(1.6, 1.9, 5).round(2)
4
5df = pd.DataFrame({'Weight':b,
6                   'Height':c}, index=a)
7df

Weight Height
Tom 58 1.65
Gary 54 1.84
Lois 57 1.70
Wendy 79 1.73
Betty 65 1.68

Selecting Single Column

We can access each column using its column label (which is the key of the original dictionary). The output is a Series.

Example

Access a column of a DataFrame using its label.

1df['Weight']
Tom      58
Gary     54
Lois     57
Wendy    79
Betty    65
Name: Weight, dtype: int64

Alternatively, we can access each column using the column name as an attribute.

Example

Access a column of a DataFrame using using its label as an attribute.

1df.Weight
Tom      58
Gary     54
Lois     57
Wendy    79
Betty    65
Name: Weight, dtype: int64
Caution

This second method of accessing a column is not recommended since it will not work if the column names are not strings or if the column names conflict with methods or attributes of the DataFrame.

For example, we create a column with ‘shape’ as its label.

Example

Create a DataFrame with the shape attribute as a column label.

1df1 = pd.DataFrame({'Weight':b,
2                   'shape':c}, index=a)
3df1

Weight shape
Tom 58 1.65
Gary 54 1.84
Lois 57 1.70
Wendy 79 1.73
Betty 65 1.68

In this case, df1.shape will return the shape of the dataframe rather than the ‘shape’ column.

Example

Column label conflicts attribute of DataFrame.

1df1.shape
(5, 2)

The dictionary-like syntax can be used to modify the dataframe object. For example, creating a new column.

Example

Creating a new column from existing columns.

1df['Ratio'] = df['Weight']/df['Height']
2df

Weight Height Ratio
Tom 58 1.65 35.151515
Gary 54 1.84 29.347826
Lois 57 1.70 33.529412
Wendy 79 1.73 45.664740
Betty 65 1.68 38.690476

We can also round off a certain column using the round() method.

Example

Rounding off a numerical column of values.

1df['Ratio'] = df['Ratio'].round(1)
2df

Weight Height Ratio
Tom 58 1.65 35.2
Gary 54 1.84 29.3
Lois 57 1.70 33.5
Wendy 79 1.73 45.7
Betty 65 1.68 38.7

Selecting Multiple Columns

We can put the labels of all columns we want in a list and feed into the dataframe.

Example

Selecting multiple columns using labels.

1cols = ['Weight','Ratio']
2df[cols]

Weight Ratio
Tom 58 35.2
Gary 54 29.3
Lois 57 33.5
Wendy 79 45.7
Betty 65 38.7

Selection by the loc Attribute

The loc attribute is used to access a group of rows and columns by label(s) or a boolean array.

Syntax

The loc attribute.

1dataframe.loc[row_selection, column_selection]

where row_selection and column_selection refer to a list of row labels and column labels, respectively, i.e. ['label1', 'label2', 'label3']. loc supports the slice notation and therefore accepts a colon(:) to select all rows or columns.

Selecting a Cell

We can select a cell value by specifying both row and column labels.

Example

Selecting a cell using the loc attribute.

1df.loc['Tom','Height']
1.65

Selecting Single Column

Select an entire column using its label. The colon (:) in the row position indicates all rows are selected.

Example

Selecting a column using the loc attribute.

1df.loc[:,"Weight"]
Tom      58
Gary     54
Lois     57
Wendy    79
Betty    65
Name: Weight, dtype: int64

The above returns a series object.

1type(df.loc[:,"Weight"])
pandas.core.series.Series

If the column label is in a list, then a DataFrame will be created instead.

1df.loc[:,["Weight"]]

Weight
Tom 58
Gary 54
Lois 57
Wendy 79
Betty 65
1type(df.loc[:,["Weight"]])
pandas.core.frame.DataFrame

Selecting Multiple Columns

The labels of the columns we wish to select are inside a list. This will always return a DataFrame.

Example

Selecting multiple columns using the loc attribute.

1df.loc[:,['Weight','Ratio']]

Weight Ratio
Tom 58 35.2
Gary 54 29.3
Lois 57 33.5
Wendy 79 45.7
Betty 65 38.7

We can also select multiple contiguous columns (adjacent to each other with no gap) using the colon : notation.

Example

Contiguous column selection.

1df.loc[:,'Weight':'Height']

Weight Height
Tom 58 1.65
Gary 54 1.84
Lois 57 1.70
Wendy 79 1.73
Betty 65 1.68

The usefulness of contiguous column selection may not be apparent in a DataFrame with only a few columns. However, imagine you have a large DataFrame and you need to select over a dozen contiguous columns. In this case, the colon (:) notation will do away with the need to write down the label of every single column included in the selection.


Summary

Column selection by the loc attribute.

Selection Return Data Type Example
Single value Scalar df.loc['Tom','Height']
Single column Series df.loc[:,"Weight"]
Single column DataFrame df.loc[:,["Weight"]]
Multiple columns DataFrame df.loc[:,['Weight','Ratio']]
Contiguous columns DataFrame df.loc[:,'Weight':'Ratio']

Selection by the iloc Attribute

The iloc attribute is a purely integer-location based indexing for selection by position.

Syntax

The iloc attribute.

1dataframe.iloc[row_selection, column_selection]

where row_selection and column_selection refer to a list of row indices and column indices, respectively.iloc supports the slice notation and therefore accepts a colon(:) to select all rows or columns.

Note

When performing data selection using the iloc attribute, we employ the standard half-open interval.

Selecting a Cell

We can select a cell value by specifying both row and column indices.

Example

Selecting a cell using the iloc attribute.

1df.iloc[0,0]
58

Selecting Single Column

We can select an entire column using its index. The colon (:) in row_selection indicates all rows are selected.

Example

Selecting a column using the iloc attribute.

1df.iloc[:,2]
Tom      35.2
Gary     29.3
Lois     33.5
Wendy    45.7
Betty    38.7
Name: Ratio, dtype: float64

The above returns a series object.

1type(df.iloc[:,2])
pandas.core.series.Series

If the column label is in a list, then a DataFrame will be created instead.

Example

Selecting a column using the iloc attribute - returns a DataFrame.

1df.iloc[:,[2]]

Ratio
Tom 35.2
Gary 29.3
Lois 33.5
Wendy 45.7
Betty 38.7
1type(df.iloc[:,[2]])
pandas.core.frame.DataFrame

Selecting Multiple Columns

Selecting multiple columns (whose indices are inside a list) will always return a DataFrame.

Example

Selecting multiple columns using the iloc attribute.

1df.iloc[:,[0,2]]

Weight Ratio
Tom 58 35.2
Gary 54 29.3
Lois 57 33.5
Wendy 79 45.7
Betty 65 38.7

We can also select multiple contiguous columns (adjacent to each other with no gap) using the colon : notation.

Example

Contiguous column selection.

1df.iloc[:,0:2]

Weight Height
Tom 58 1.65
Gary 54 1.84
Lois 57 1.70
Wendy 79 1.73
Betty 65 1.68

Summary

Column selection by the iloc attribute.

Selection Return Data Type Example
Single value Scalar df.iloc[1,2]
Single column Series df.iloc[:,2]
Single column DataFrame df.iloc[:,[2]]
Multiple columns DataFrame df.iloc[:,[2,1]]
Contiguous columns DataFrame df.iloc[:, 1:4]