Filtering DataFrame Columns

Table of Contents

Introduction

Boolean masking refers to selecting subsets of a DataFrame with the help of Boolean expressions. In particular, it can be applied to filtering the columns of a DataFrame based on certain specified Boolean expressions.

In DataFrames, however, we cannot use the Python logical operators and, or and not. Instead, we need to use the symbols as shown in the table below.

Summary

Logical boolean operators in pandas.

Python Logical Operators	pandas Logical Operators
`and`	`&`
`or`	`\|`
`not`	`~`

We first import the required libraries and modules.

1import numpy as np
2import pandas as pd

Let’s create a more elaborate dataframe with 10 columns.

 1R = np.random.RandomState(10) # set a random seed
 2a='Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick Andrew'.split()
 3b=R.choice(range(50,85), size=10, replace=False)
 4c=R.uniform(1.6, 1.9, 10).round(2)
 5d=R.choice(['Germany', 'UK', 'USA', 'France', 'Canada'], size=10)
 6
 7df = pd.DataFrame({'Weight':b,
 8                   'Height':c,
 9                   'Country':d}, index=a)
10df = df.T
11df

	Tom	Gary	Lois	Wendy	Betty	Jacobs	Donald	Yasmine	Nick	Andrew
Weight	73	74	52	57	55	53	62	64	76	63
Height	1.78	1.65	1.85	1.63	1.69	1.62	1.62	1.83	1.6	1.78
Country	USA	Germany	UK	USA	Germany	Germany	France	UK	France	Canada

Filtering Columns Based on Row Values

Suppose we want to select persons whose weight is below 60.

Example

Filtering columns based on row ‘Weight’.

1df.loc[:, df.loc['Weight']<60]

	Lois	Jacobs
Weight	52	53
Height	1.85	1.62
Country	UK	Germany

The expression within the square brackets is a Boolean expression and it is actually a Series of Boolean values.

1df.loc['Weight']<60

Tom        False
Jack       False
Gary       False
Lois        True
Jacobs      True
Donald     False
Yasmine    False
Cindy      False
Name: Weight, dtype: bool

Or persons who come from ‘Germany’.

Example

Filtering columns based on row ‘Country’.

1df.loc[:,df.loc['Country']=='Germany']

	Gary	Jacobs
Weight	74	53
Height	1.65	1.62
Country	Germany	Germany

Suppose we want to select persons whose weight is above 60 and height is below 1.70

We first create a mask which is a Boolean expression containing the two conditions. Thereafter, we apply the mask to the DataFrame.

Example

Filtering columns based on multiple rows.

1mask = (df.loc['Weight']>60) & (df.loc['Height']<1.7)
2df.loc[:,mask]

	Gary	Donald
Weight	74	62
Height	1.65	1.62
Country	Germany	France

Tip

If you have more than one condition, make sure to put every Boolean expression in between parentheses so that you do not need to worry about operator precedence.

isin() Method

To select only columns such that the ‘Country’ is either ‘USA’ or ‘Germany’.

Example

Filtering columns using the isin() method.

1df.loc[:, df.loc['Country'].isin(["USA", "Germany"])]

	Tom	Gary	Wendy	Betty	Jacobs
Weight	73	74	57	55	53
Height	1.78	1.65	1.63	1.69	1.62
Country	USA	Germany	USA	Germany	Germany

Filtering Based on Column Labels

It is also possible to filter the columns of a DataFrame based on the column labels. Recall that df.columns allows us to see all the column labels.

1df.columns

Index(['Tom', 'Jack', 'Gary', 'Lois', 'Jacobs', 'Donald', 'Yasmine', 'Cindy'], dtype='object')

startswith() Method

To filter columns such that the label starts with a certain character, we use the str attribute followed by the startswith() method. The Boolean mask looks like this.

Example

Filtering columns such that label starts with ‘A’ or ‘B’.

1df.columns.str.startswith(('A','B'))

array([False, False, False, False,  True, False, False, False, False,
        True])

We then feed the above mask into df.loc.

1df.loc[:,df.columns.str.startswith(('A','B'))]

	Betty	Andrew
Weight	55	63
Height	1.69	1.78
Country	Germany	Canada

endswith() Method

To filter columns such that the label ends with certain characters, we use the str attribute followed by the endswith() method.

Example

Filtering columns such that label ends with ‘y’ or ’s’.

1df.loc[:,df.columns.str.endswith(('y','s'))]

	Gary	Lois	Wendy	Betty	Jacobs
Weight	74	52	57	55	53
Height	1.65	1.85	1.63	1.69	1.62
Country	Germany	UK	USA	Germany	Germany

contains() Method

To filter columns such that the label contains certain characters, we use the str attribute followed by the contains() method.

Example

Filtering columns such that label contains a specified string.

1df.loc[:,df.columns.str.contains('on')]

	Donald
Weight	62
Height	1.62
Country	France

Column labels containing one of multiple string patterns.

Example

Filtering columns such that label contains one of 2 specified strings.

1df.loc[:,df.columns.str.contains('on|as')]

	Donald	Yasmine
Weight	62	64
Height	1.62	1.83
Country	France	UK

Example

Filtering columns such that label contains one of 3 specified strings.

1df.loc[:,df.columns.str.contains('on|as|et')]

	Betty	Donald	Yasmine
Weight	55	62	64
Height	1.69	1.62	1.83
Country	Germany	France	UK

To grep column labels that contain BOTH “o” and “i”.

Example

Filtering columns such that label contains both ‘o’ and ‘i’.

1df.loc[:,(df.columns.str.contains('o')) & (df.columns.str.contains('i'))]

	Lois
Weight	52
Height	1.85
Country	UK

len() Method

To filter columns such that the label has more than a certain number of characters.

Example

Filtering columns such that the label is greater than a certain length.

1df.loc[:,df.columns.str.len()>5]

	Jacobs	Donald	Yasmine	Andrew
Weight	53	62	64	63
Height	1.62	1.62	1.83	1.78
Country	Germany	France	UK	Canada

Filtering DataFrame Columns

Introduction

Filtering Columns Based on Row Values

isin() Method

Filtering Based on Column Labels

startswith() Method

endswith() Method

contains() Method

len() Method

Share this Article