Selecting DataFrame Columns

Introduction

We learned in the previous sections that a DataFrame behaves in many ways like a 2-D NumPy array or a structured array, and in other ways like a Python dictionary of Series objects sharing the same index.

These analogies turn out to be helpful as we study data selection in DataFrames. In this section, we will discuss accessing the columns in a DataFrame by label and by position.

Selecting Columns by Labels

We first import the required libraries and modules.

import numpy as np
import pandas as pd
import random as rd

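We now create a DataFrame from a dictionary of lists, using the random module and NumPy's random functions to generate the data. Because the values are random, your output will differ from the tables shown here.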

👀 Review

DataFrame from a dictionary of Python lists

Example

Create a DataFrame from a dictionary of Python lists.

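a = ['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']    # row labels
b = rd.choices(range(50, 85), k=5)               # five random integer weights from 50 to 84
c = np.random.uniform(1.6, 1.9, 5).round(2)      # five random heights, rounded to 2 decimals

df = pd.DataFrame({'Weight': b,
                   'Height': c}, index=a)
df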

	Weight	Height
Tom	58	1.65
Gary	54	1.84
Lois	57	1.70
Wendy	79	1.73
Betty	65	1.68
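To make the example repeatable, the random generators can be seeded. The following is a minimal sketch rather than part of the original example: the helper name make_people and the seed value 42 are assumptions chosen for illustration, and the seeded generators stand in for the module-level rd.choices and np.random.uniform calls used above.

```python
import random as rd

import numpy as np
import pandas as pd


def make_people(seed=42):
    """Build the Weight/Height frame from seeded generators (seed is an arbitrary assumption)."""
    names = ['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']
    py_rng = rd.Random(seed)               # seeded stand-in for rd.choices
    np_rng = np.random.default_rng(seed)   # seeded stand-in for np.random.uniform
    weights = py_rng.choices(range(50, 85), k=5)
    heights = np_rng.uniform(1.6, 1.9, 5).round(2)
    return pd.DataFrame({'Weight': weights, 'Height': heights}, index=names)


people = make_people()
print(people)
```

Calling make_people() twice with the same seed should give identical frames, which makes the printed tables easier to compare across runs.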

Selecting Single Column

We can access each column using its column label (which is the key of the original dictionary). The output is a Series.

Example

Access a column of a DataFrame using its label.

df['Weight']

Tom      58
Gary     54
Lois     57
Wendy    79
Betty    65
Name: Weight, dtype: int64

Alternatively, we can access each column using the column name as an attribute.

Example

Access a column of a DataFrame using its label as an attribute.

df.Weight

Tom      58
Gary     54
Lois     57
Wendy    79
Betty    65
Name: Weight, dtype: int64

Caution

This second method of accessing a column is not recommended since it will not work if the column names are not strings or if the column names conflict with methods or attributes of the DataFrame.

For example, we create a column with ‘shape’ as its label.

Example

Create a DataFrame with the shape attribute as a column label.

df1 = pd.DataFrame({'Weight': b,
                    'shape': c}, index=a)
df1

	Weight	shape
Tom	58	1.65
Gary	54	1.84
Lois	57	1.70
Wendy	79	1.73
Betty	65	1.68

In this case, df1.shape will return the shape of the dataframe rather than the ‘shape’ column.

Example

A column label conflicts with an attribute of the DataFrame.

df1.shape

(5, 2)
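The other case mentioned in the caution, column labels that are not strings, can be sketched in the same way. The frame below is hypothetical (the name sales and the integer year labels are assumptions for illustration); it shows that the bracket syntax still works while the attribute form does not.

```python
import pandas as pd

# Hypothetical frame whose column labels are integers (years), not strings.
sales = pd.DataFrame({2020: [10, 12], 2021: [11, 15]}, index=['Tom', 'Gary'])

by_bracket = sales[2020]      # bracket syntax works for any column label

try:
    getattr(sales, '2020')    # attribute form; writing sales.2020 is not even valid Python
    attribute_works = True
except AttributeError:
    attribute_works = False

print(by_bracket.tolist())    # [10, 12]
print(attribute_works)        # False
```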

The dictionary-like syntax can also be used to modify the DataFrame, for example to create a new column.

Example

Creating a new column from existing columns.

df['Ratio'] = df['Weight'] / df['Height']
df

	Weight	Height	Ratio
Tom	58	1.65	35.151515
Gary	54	1.84	29.347826
Lois	57	1.70	33.529412
Wendy	79	1.73	45.664740
Betty	65	1.68	38.690476
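Creating a column is one more place where the bracket syntax matters: assigning to a new attribute name does not add a column. The sketch below uses a small hypothetical two-row frame named people; pandas emits a warning for the attribute assignment, which is silenced here only to keep the output short.

```python
import warnings

import pandas as pd

people = pd.DataFrame({'Weight': [58, 54], 'Height': [1.65, 1.84]},
                      index=['Tom', 'Gary'])

with warnings.catch_warnings():
    warnings.simplefilter('ignore')    # pandas warns about this pattern
    people.Ratio = people['Weight'] / people['Height']   # sets a plain attribute only
created_by_attribute = 'Ratio' in people.columns

people['Ratio'] = people['Weight'] / people['Height']     # creates a real column
created_by_bracket = 'Ratio' in people.columns

print(created_by_attribute)   # False
print(created_by_bracket)     # True
```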

We can also round off a certain column using the round() method.

Example

Rounding off a numerical column of values.

df['Ratio'] = df['Ratio'].round(1)
df

	Weight	Height	Ratio
Tom	58	1.65	35.2
Gary	54	1.84	29.3
Lois	57	1.70	33.5
Wendy	79	1.73	45.7
Betty	65	1.68	38.7

Selecting Multiple Columns

We can put the labels of all the columns we want in a list and pass that list to the DataFrame inside square brackets.

Example

Selecting multiple columns using labels.

cols = ['Weight', 'Ratio']
df[cols]

	Weight	Ratio
Tom	58	35.2
Gary	54	29.3
Lois	57	33.5
Wendy	79	45.7
Betty	65	38.7
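Two details of list-based selection are worth checking by hand: the result follows the order of the list, and an unknown label raises an error. A minimal sketch, using a hypothetical three-row frame named people and a made-up label 'Age' that is not in the frame:

```python
import pandas as pd

people = pd.DataFrame(
    {'Weight': [58, 54, 57], 'Height': [1.65, 1.84, 1.70], 'Ratio': [35.2, 29.3, 33.5]},
    index=['Tom', 'Gary', 'Lois'],
)

# Columns come back in the order given in the list, not the order in the frame.
reordered = people[['Ratio', 'Weight']]
print(list(reordered.columns))    # ['Ratio', 'Weight']

# A label that does not exist ('Age' is a made-up name) raises KeyError.
try:
    people[['Weight', 'Age']]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True
print(missing_label_raises)       # True
```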

Selection by the loc Attribute

The loc attribute is used to access a group of rows and columns by label(s) or a boolean array.

Syntax

The loc attribute.

dataframe.loc[row_selection, column_selection]

where row_selection and column_selection are each a single label, a list of labels such as ['label1', 'label2', 'label3'], a label slice, or a boolean array. loc supports slice notation, so a bare colon (:) selects all rows or all columns.
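The different kinds of selector that loc accepts can be tried side by side. The following sketch assumes a hypothetical frame named people with the same values as the table shown earlier; the threshold of 60 is arbitrary.

```python
import pandas as pd

people = pd.DataFrame(
    {'Weight': [58, 54, 57, 79, 65], 'Height': [1.65, 1.84, 1.70, 1.73, 1.68]},
    index=['Tom', 'Gary', 'Lois', 'Wendy', 'Betty'],
)

one_row = people.loc['Tom', :]                       # single row label -> Series
two_rows = people.loc[['Tom', 'Lois'], ['Height']]   # lists of labels -> DataFrame
over_60 = people.loc[people['Weight'] > 60, :]       # boolean array keeps rows where the mask is True

print(type(one_row).__name__)    # Series
print(two_rows.shape)            # (2, 1)
print(over_60.index.tolist())    # ['Wendy', 'Betty']
```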

Selecting a Cell

We can select a cell value by specifying both row and column labels.

Example

Selecting a cell using the loc attribute.

df.loc['Tom','Height']

1.65

Selecting Single Column

Select an entire column using its label. The colon (:) in the row position indicates all rows are selected.

Example

Selecting a column using the loc attribute.

df.loc[:,"Weight"]

Tom      58
Gary     54
Lois     57
Wendy    79
Betty    65
Name: Weight, dtype: int64

The above returns a Series object.

type(df.loc[:,"Weight"])

pandas.core.series.Series

If the column label is in a list, then a DataFrame will be created instead.

df.loc[:,["Weight"]]

	Weight
Tom	58
Gary	54
Lois	57
Wendy	79
Betty	65

type(df.loc[:,["Weight"]])

pandas.core.frame.DataFrame

Selecting Multiple Columns

To select multiple columns, place their labels inside a list. This will always return a DataFrame.

Example

Selecting multiple columns using the loc attribute.

df.loc[:,['Weight','Ratio']]

	Weight	Ratio
Tom	58	35.2
Gary	54	29.3
Lois	57	33.5
Wendy	79	45.7
Betty	65	38.7

We can also select multiple contiguous columns (adjacent to each other with no gap) using the colon : notation.

Example

Contiguous column selection.

df.loc[:,'Weight':'Height']

	Weight	Height
Tom	58	1.65
Gary	54	1.84
Lois	57	1.70
Wendy	79	1.73
Betty	65	1.68

The usefulness of contiguous column selection may not be apparent in a DataFrame with only a few columns. However, imagine you have a large DataFrame and you need to select over a dozen contiguous columns. In this case, the colon (:) notation will do away with the need to write down the label of every single column included in the selection.
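One property of label slices is easy to miss: with loc, both the start and the stop label are included in the result. A short sketch, assuming a hypothetical frame named people with the same values as the tables above:

```python
import pandas as pd

people = pd.DataFrame(
    {'Weight': [58, 54, 57, 79, 65],
     'Height': [1.65, 1.84, 1.70, 1.73, 1.68],
     'Ratio': [35.2, 29.3, 33.5, 45.7, 38.7]},
    index=['Tom', 'Gary', 'Lois', 'Wendy', 'Betty'],
)

# The stop label 'Ratio' is part of the selection.
all_three = people.loc[:, 'Weight':'Ratio']
print(list(all_three.columns))      # ['Weight', 'Height', 'Ratio']

# The same holds for row labels: 'Wendy' is included.
middle_rows = people.loc['Gary':'Wendy', 'Height']
print(middle_rows.index.tolist())   # ['Gary', 'Lois', 'Wendy']
```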

Summary

Column selection by the loc attribute.

Selection	Return Data Type	Example
Single value	Scalar	`df.loc['Tom','Height']`
Single column	Series	`df.loc[:,"Weight"]`
Single column	DataFrame	`df.loc[:,["Weight"]]`
Multiple columns	DataFrame	`df.loc[:,['Weight','Ratio']]`
Contiguous columns	DataFrame	`df.loc[:,'Weight':'Ratio']`

Selection by the iloc Attribute

The iloc attribute provides purely integer-location-based indexing, that is, selection by position.

Syntax

The iloc attribute.

dataframe.iloc[row_selection, column_selection]

where row_selection and column_selection refer to lists of row positions and column positions (integer indices), respectively. iloc supports slice notation and therefore accepts a colon (:) to select all rows or columns.

Note

When slicing with the iloc attribute, the standard Python half-open interval applies: the start position is included and the stop position is excluded.
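The half-open convention can be checked directly. The sketch below assumes a hypothetical frame named people with the same values as the tables above, and also shows that negative positions count from the end, as in ordinary Python indexing.

```python
import pandas as pd

people = pd.DataFrame(
    {'Weight': [58, 54, 57, 79, 65],
     'Height': [1.65, 1.84, 1.70, 1.73, 1.68],
     'Ratio': [35.2, 29.3, 33.5, 45.7, 38.7]},
    index=['Tom', 'Gary', 'Lois', 'Wendy', 'Betty'],
)

first_two_cols = people.iloc[:, 0:2]   # positions 0 and 1; position 2 is excluded
rows_1_and_2 = people.iloc[1:3, :]     # positions 1 and 2; position 3 is excluded
last_col = people.iloc[:, -1]          # negative positions count from the end

print(list(first_two_cols.columns))    # ['Weight', 'Height']
print(rows_1_and_2.index.tolist())     # ['Gary', 'Lois']
print(last_col.name)                   # Ratio
```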

Selecting a Cell

We can select a cell value by specifying both row and column indices.

Example

Selecting a cell using the iloc attribute.

df.iloc[0,0]

58

Selecting Single Column

We can select an entire column using its index. The colon (:) in row_selection indicates all rows are selected.

Example

Selecting a column using the iloc attribute.

df.iloc[:,2]

Tom      35.2
Gary     29.3
Lois     33.5
Wendy    45.7
Betty    38.7
Name: Ratio, dtype: float64

The above returns a Series object.

type(df.iloc[:,2])

pandas.core.series.Series

If the column index is in a list, then a DataFrame will be created instead.

Example

Selecting a column using the iloc attribute - returns a DataFrame.

df.iloc[:,[2]]

	Ratio
Tom	35.2
Gary	29.3
Lois	33.5
Wendy	45.7
Betty	38.7

type(df.iloc[:,[2]])

pandas.core.frame.DataFrame

Selecting Multiple Columns

Selecting multiple columns (whose indices are inside a list) will always return a DataFrame.

Example

Selecting multiple columns using the iloc attribute.

df.iloc[:,[0,2]]

	Weight	Ratio
Tom	58	35.2
Gary	54	29.3
Lois	57	33.5
Wendy	79	45.7
Betty	65	38.7

We can also select multiple contiguous columns (adjacent to each other with no gap) using the colon : notation.

Example

Contiguous column selection.

df.iloc[:,0:2]

	Weight	Height
Tom	58	1.65
Gary	54	1.84
Lois	57	1.70
Wendy	79	1.73
Betty	65	1.68
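When only the labels are known but positional selection is wanted, the column index can translate labels into positions. This is a sketch under the assumption of a hypothetical frame named people with the same values as above; it uses the Index methods get_loc and get_indexer.

```python
import pandas as pd

people = pd.DataFrame(
    {'Weight': [58, 54, 57, 79, 65],
     'Height': [1.65, 1.84, 1.70, 1.73, 1.68],
     'Ratio': [35.2, 29.3, 33.5, 45.7, 38.7]},
    index=['Tom', 'Gary', 'Lois', 'Wendy', 'Betty'],
)

ratio_pos = people.columns.get_loc('Ratio')                   # position of one label
positions = people.columns.get_indexer(['Weight', 'Ratio'])   # positions of several labels

by_position = people.iloc[:, positions]    # same columns as people[['Weight', 'Ratio']]

print(ratio_pos)                   # 2
print(positions.tolist())          # [0, 2]
print(list(by_position.columns))   # ['Weight', 'Ratio']
```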

Summary

Column selection by the iloc attribute.

Selection	Return Data Type	Example
Single value	Scalar	`df.iloc[1,2]`
Single column	Series	`df.iloc[:,2]`
Single column	DataFrame	`df.iloc[:,[2]]`
Multiple columns	DataFrame	`df.iloc[:,[2,1]]`
Contiguous columns	DataFrame	`df.iloc[:, 1:4]`
