Table of Contents



Introduction

Boolean masking refers to selecting subsets of a DataFrame with the help of Boolean expressions. In particular, it can be applied to filtering the rows of a DataFrame based on certain specified Boolean expressions.

In DataFrames, however, we cannot use the Python logical operators and, or and not. Instead, we need to use the symbols as shown in the table below.

Summary

Logical boolean operators in pandas.

Python Logical Operators pandas Logical Operators
and &
or |
not ~

We first import the required libraries and modules.

1import numpy as np
2import pandas as pd
3import random as rd

Let’s create a more elaborate dataframe for this section.

1a='Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick Andrew'.split()
2b=rd.choices(range(50,85),k=10)
3c=np.random.uniform(1.6, 1.9, 10).round(2)
4d = rd.choices(['Germany', 'UK', 'USA', 'France', 'Canada'], k=10)
5
6df = pd.DataFrame({'Weight':b,
7                   'Height':c,
8                   'Country':d}, index=a)
9df

Weight Height Country
Tom 59 1.85 France
Gary 68 1.64 Germany
Lois 81 1.85 Germany
Wendy 56 1.67 Canada
Betty 65 1.73 USA
Jacobs 59 1.69 France
Donald 72 1.81 UK
Yasmine 52 1.85 Canada
Nick 66 1.67 Germany
Andrew 69 1.60 UK

Filtering Based on Column Values

Suppose we want to select persons whose weight is below 60.

Example

Filtering rows based on column ‘Weight’.

1df[df['Weight']<60]

Weight Height Country
Tom 59 1.85 France
Wendy 56 1.67 Canada
Jacobs 59 1.69 France
Yasmine 52 1.85 Canada

The expression within the square brackets is a Boolean expression and it is actually a Series of Boolean values.

1df['Weight']<60
Tom         True
Gary       False
Lois       False
Wendy       True
Betty      False
Jacobs      True
Donald     False
Yasmine     True
Nick       False
Andrew     False
Name: Weight, dtype: bool

We can select only persons (rows) who come from ‘Germany’.

Example

Filtering rows based on column ‘Country’.

1df[df['Country']=='Germany']

Weight Height Country
Gary 68 1.64 Germany
Lois 81 1.85 Germany
Nick 66 1.67 Germany

Suppose we want to select persons whose weight is above 60 and height is below 1.70.

We first create a mask which is a Boolean expression containing the two conditions. Thereafter, we apply the mask to the DataFrame.

Example

Filtering rows based on multiple columns.

1mask = (df['Weight']>60) & (df['Height']<1.7)
2df[mask]

Weight Height Country
Gary 68 1.64 Germany
Nick 66 1.67 Germany
Andrew 69 1.60 UK
Tip

If you have more than one condition, it is advisable to put every Boolean expression within parentheses so that you do not need to worry about operator precedence.

isin() Method

To filter the DataFrame to include only participants from USA or Germany.

Example

Filtering rows using the isin() method.

1df.loc[df["Country"].isin(["USA", "Germany"]), :]

Weight Height Country
Gary 68 1.64 Germany
Lois 81 1.85 Germany
Betty 65 1.73 USA
Nick 66 1.67 Germany

Filtering Based on Index

It is also possible to filter the rows of a DataFrame based on the index column. Recall that df.index allows us to see all the row labels.

1df.index
Index(['Tom', 'Gary', 'Lois', 'Wendy', 'Betty', 'Jacobs', 'Donald', 'Yasmine',
       'Nick', 'Andrew'],
      dtype='object')

startswith() Method

To filter rows such that the index starts with a certain character, we use the str attribute followed by the startswith() method. The Boolean mask looks like this.

Example

Filtering rows based on the index using the startswith() method.

1df.index.str.startswith(('A','B'))
array([False, False, False, False,  True, False, False, False, False,
        True])

We then feed the above mask into df.loc.

1df.loc[df.index.str.startswith(('A','B'))]

Weight Height Country
Betty 65 1.73 USA
Andrew 69 1.60 UK

endswith() Method

To filter rows such that the index ends with certain characters, we use the str attribute followed by the endswith() method.

Example

Filtering rows based on the index using the endswith() method.

1df.loc[df.index.str.endswith(('y','s'))]

Weight Height Country
Gary 68 1.64 Germany
Lois 81 1.85 Germany
Wendy 56 1.67 Canada
Betty 65 1.73 USA
Jacobs 59 1.69 France

contains() Method

To filter rows such that the index contains certain characters, we use the str attribute followed by the contains() method.

Example

Filtering rows such that the index contains string ‘on’.

1df.loc[df.index.str.contains('on')]

Weight Height Country
Donald 72 1.81 UK

We can also filter the rows such that the index contains one of multiple specified strings.

Example

Filtering rows such that the index contains string ‘on’ or ‘as’.

1df.loc[df.index.str.contains('on|as')]

Weight Height Country
Donald 72 1.81 UK
Yasmine 52 1.85 Canada

Example

Filtering rows such that the index contains ‘on’, ‘as’ or ’et’.

1df.loc[df.index.str.contains('on|as|et')]

Weight Height Country
Betty 65 1.73 USA
Donald 72 1.81 UK
Yasmine 52 1.85 Canada

We can also grep index labels that contain all the specified strings.

Example

Filtering rows such that the index contains both ‘o’ and ‘i’.

1df.loc[(df.index.str.contains('o')) & (df.index.str.contains('i'))]

Weight Height Country
Lois 81 1.85 Germany

len() Method

To filter rows such that the index has more than a certain number of characters.

Example

Filtering rows based on the index using the len() method.

1df.loc[df.index.str.len()>5]

Weight Height Country
Jacobs 76 1.73 Canada
Donald 55 1.65 France
Yasmine 77 1.80 Germany
Andrew 59 1.68 Canada

Filtering Integer Index

We first convert the current name index into an integer range index.

Example

Resetting DataFrame index to integer range.

1df.reset_index(inplace=True)

The inplace=True argument indicates that the DataFrame is modified in place which means that original df is modified without making a further assignment.

1df

index Weight Height Country
0 Tom 54 1.84 Canada
1 Gary 57 1.77 USA
2 Lois 83 1.87 France
3 Wendy 52 1.84 Canada
4 Betty 67 1.61 USA
5 Jacobs 76 1.73 Canada
6 Donald 55 1.65 France
7 Yasmine 77 1.80 Germany
8 Nick 59 1.82 Germany
9 Andrew 59 1.68 Canada

We can select all rows whose indices lie between 3 and 8, exclusive.

Example

Filtering rows based on integer index.

1df.loc[(df.index > 3) & (df.index < 8)]

index Weight Height Country
4 Betty 67 1.61 USA
5 Jacobs 76 1.73 Canada
6 Donald 55 1.65 France
7 Yasmine 77 1.80 Germany