Table of Contents



Introduction

Boolean masking refers to selecting subsets of a DataFrame with the help of Boolean expressions. In particular, it can be applied to filtering the columns of a DataFrame based on certain specified Boolean expressions.

In DataFrames, however, we cannot use the Python logical operators and, or and not. Instead, we need to use the symbols as shown in the table below.

Summary

Logical boolean operators in pandas.

Python Logical Operators pandas Logical Operators
and &
or |
not ~

We first import the required libraries and modules.

1import numpy as np
2import pandas as pd

Let’s create a more elaborate dataframe with 10 columns.

 1R = np.random.RandomState(10) # set a random seed
 2a='Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick Andrew'.split()
 3b=R.choice(range(50,85), size=10, replace=False)
 4c=R.uniform(1.6, 1.9, 10).round(2)
 5d=R.choice(['Germany', 'UK', 'USA', 'France', 'Canada'], size=10)
 6
 7df = pd.DataFrame({'Weight':b,
 8                   'Height':c,
 9                   'Country':d}, index=a)
10df = df.T
11df

Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick Andrew
Weight 73 74 52 57 55 53 62 64 76 63
Height 1.78 1.65 1.85 1.63 1.69 1.62 1.62 1.83 1.6 1.78
Country USA Germany UK USA Germany Germany France UK France Canada

Filtering Columns Based on Row Values

Suppose we want to select persons whose weight is below 60.

Example

Filtering columns based on row ‘Weight’.

1df.loc[:, df.loc['Weight']<60]

Lois Jacobs
Weight 52 53
Height 1.85 1.62
Country UK Germany

The expression within the square brackets is a Boolean expression and it is actually a Series of Boolean values.

1df.loc['Weight']<60
Tom        False
Jack       False
Gary       False
Lois        True
Jacobs      True
Donald     False
Yasmine    False
Cindy      False
Name: Weight, dtype: bool

Or persons who come from ‘Germany’.

Example

Filtering columns based on row ‘Country’.

1df.loc[:,df.loc['Country']=='Germany']

Gary Jacobs
Weight 74 53
Height 1.65 1.62
Country Germany Germany

Suppose we want to select persons whose weight is above 60 and height is below 1.70

We first create a mask which is a Boolean expression containing the two conditions. Thereafter, we apply the mask to the DataFrame.

Example

Filtering columns based on multiple rows.

1mask = (df.loc['Weight']>60) & (df.loc['Height']<1.7)
2df.loc[:,mask]

Gary Donald
Weight 74 62
Height 1.65 1.62
Country Germany France
Tip

If you have more than one condition, make sure to put every Boolean expression in between parentheses so that you do not need to worry about operator precedence.

isin() Method

To select only columns such that the ‘Country’ is either ‘USA’ or ‘Germany’.

Example

Filtering columns using the isin() method.

1df.loc[:, df.loc['Country'].isin(["USA", "Germany"])]

Tom Gary Wendy Betty Jacobs
Weight 73 74 57 55 53
Height 1.78 1.65 1.63 1.69 1.62
Country USA Germany USA Germany Germany

Filtering Based on Column Labels

It is also possible to filter the columns of a DataFrame based on the column labels. Recall that df.columns allows us to see all the column labels.

1df.columns
Index(['Tom', 'Jack', 'Gary', 'Lois', 'Jacobs', 'Donald', 'Yasmine', 'Cindy'], dtype='object')

startswith() Method

To filter columns such that the label starts with a certain character, we use the str attribute followed by the startswith() method. The Boolean mask looks like this.

Example

Filtering columns such that label starts with ‘A’ or ‘B’.

1df.columns.str.startswith(('A','B'))
array([False, False, False, False,  True, False, False, False, False,
        True])

We then feed the above mask into df.loc.

1df.loc[:,df.columns.str.startswith(('A','B'))]

Betty Andrew
Weight 55 63
Height 1.69 1.78
Country Germany Canada

endswith() Method

To filter columns such that the label ends with certain characters, we use the str attribute followed by the endswith() method.

Example

Filtering columns such that label ends with ‘y’ or ’s’.

1df.loc[:,df.columns.str.endswith(('y','s'))]

Gary Lois Wendy Betty Jacobs
Weight 74 52 57 55 53
Height 1.65 1.85 1.63 1.69 1.62
Country Germany UK USA Germany Germany

contains() Method

To filter columns such that the label contains certain characters, we use the str attribute followed by the contains() method.

Example

Filtering columns such that label contains a specified string.

1df.loc[:,df.columns.str.contains('on')]

Donald
Weight 62
Height 1.62
Country France

Column labels containing one of multiple string patterns.

Example

Filtering columns such that label contains one of 2 specified strings.

1df.loc[:,df.columns.str.contains('on|as')]

Donald Yasmine
Weight 62 64
Height 1.62 1.83
Country France UK

Example

Filtering columns such that label contains one of 3 specified strings.

1df.loc[:,df.columns.str.contains('on|as|et')]

Betty Donald Yasmine
Weight 55 62 64
Height 1.69 1.62 1.83
Country Germany France UK

To grep column labels that contain BOTH “o” and “i”.

Example

Filtering columns such that label contains both ‘o’ and ‘i’.

1df.loc[:,(df.columns.str.contains('o')) & (df.columns.str.contains('i'))]

Lois
Weight 52
Height 1.85
Country UK

len() Method

To filter columns such that the label has more than a certain number of characters.

Example

Filtering columns such that the label is greater than a certain length.

1df.loc[:,df.columns.str.len()>5]

Jacobs Donald Yasmine Andrew
Weight 53 62 64 63
Height 1.62 1.62 1.83 1.78
Country Germany France UK Canada