Filtering DataFrame Rows

Table of Contents

Introduction

Boolean masking refers to selecting subsets of a DataFrame with the help of Boolean expressions. In particular, it can be applied to filtering the rows of a DataFrame based on certain specified Boolean expressions.

In DataFrames, however, we cannot use the Python logical operators and, or and not. Instead, we need to use the symbols as shown in the table below.

Summary

Logical boolean operators in pandas.

Python Logical Operators	pandas Logical Operators
`and`	`&`
`or`	`\|`
`not`	`~`

We first import the required libraries and modules.

1import numpy as np
2import pandas as pd
3import random as rd

Let’s create a more elaborate dataframe for this section.

1a='Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick Andrew'.split()
2b=rd.choices(range(50,85),k=10)
3c=np.random.uniform(1.6, 1.9, 10).round(2)
4d = rd.choices(['Germany', 'UK', 'USA', 'France', 'Canada'], k=10)
5
6df = pd.DataFrame({'Weight':b,
7                   'Height':c,
8                   'Country':d}, index=a)
9df

	Weight	Height	Country
Tom	59	1.85	France
Gary	68	1.64	Germany
Lois	81	1.85	Germany
Wendy	56	1.67	Canada
Betty	65	1.73	USA
Jacobs	59	1.69	France
Donald	72	1.81	UK
Yasmine	52	1.85	Canada
Nick	66	1.67	Germany
Andrew	69	1.60	UK

Filtering Based on Column Values

Suppose we want to select persons whose weight is below 60.

Example

Filtering rows based on column ‘Weight’.

1df[df['Weight']<60]

	Weight	Height	Country
Tom	59	1.85	France
Wendy	56	1.67	Canada
Jacobs	59	1.69	France
Yasmine	52	1.85	Canada

The expression within the square brackets is a Boolean expression and it is actually a Series of Boolean values.

1df['Weight']<60

Tom         True
Gary       False
Lois       False
Wendy       True
Betty      False
Jacobs      True
Donald     False
Yasmine     True
Nick       False
Andrew     False
Name: Weight, dtype: bool

We can select only persons (rows) who come from ‘Germany’.

Example

Filtering rows based on column ‘Country’.

1df[df['Country']=='Germany']

	Weight	Height	Country
Gary	68	1.64	Germany
Lois	81	1.85	Germany
Nick	66	1.67	Germany

Suppose we want to select persons whose weight is above 60 and height is below 1.70.

We first create a mask which is a Boolean expression containing the two conditions. Thereafter, we apply the mask to the DataFrame.

Example

Filtering rows based on multiple columns.

1mask = (df['Weight']>60) & (df['Height']<1.7)
2df[mask]

	Weight	Height	Country
Gary	68	1.64	Germany
Nick	66	1.67	Germany
Andrew	69	1.60	UK

Tip

If you have more than one condition, it is advisable to put every Boolean expression within parentheses so that you do not need to worry about operator precedence.

isin() Method

To filter the DataFrame to include only participants from USA or Germany.

Example

Filtering rows using the isin() method.

1df.loc[df["Country"].isin(["USA", "Germany"]), :]

	Weight	Height	Country
Gary	68	1.64	Germany
Lois	81	1.85	Germany
Betty	65	1.73	USA
Nick	66	1.67	Germany

Filtering Based on Index

It is also possible to filter the rows of a DataFrame based on the index column. Recall that df.index allows us to see all the row labels.

1df.index

Index(['Tom', 'Gary', 'Lois', 'Wendy', 'Betty', 'Jacobs', 'Donald', 'Yasmine',
       'Nick', 'Andrew'],
      dtype='object')

startswith() Method

To filter rows such that the index starts with a certain character, we use the str attribute followed by the startswith() method. The Boolean mask looks like this.

Example

Filtering rows based on the index using the startswith() method.

1df.index.str.startswith(('A','B'))

array([False, False, False, False,  True, False, False, False, False,
        True])

We then feed the above mask into df.loc.

1df.loc[df.index.str.startswith(('A','B'))]

	Weight	Height	Country
Betty	65	1.73	USA
Andrew	69	1.60	UK

endswith() Method

To filter rows such that the index ends with certain characters, we use the str attribute followed by the endswith() method.

Example

Filtering rows based on the index using the endswith() method.

1df.loc[df.index.str.endswith(('y','s'))]

	Weight	Height	Country
Gary	68	1.64	Germany
Lois	81	1.85	Germany
Wendy	56	1.67	Canada
Betty	65	1.73	USA
Jacobs	59	1.69	France

contains() Method

To filter rows such that the index contains certain characters, we use the str attribute followed by the contains() method.

Example

Filtering rows such that the index contains string ‘on’.

1df.loc[df.index.str.contains('on')]

	Weight	Height	Country
Donald	72	1.81	UK

We can also filter the rows such that the index contains one of multiple specified strings.

Example

Filtering rows such that the index contains string ‘on’ or ‘as’.

1df.loc[df.index.str.contains('on|as')]

	Weight	Height	Country
Donald	72	1.81	UK
Yasmine	52	1.85	Canada

Example

Filtering rows such that the index contains ‘on’, ‘as’ or ’et’.

1df.loc[df.index.str.contains('on|as|et')]

	Weight	Height	Country
Betty	65	1.73	USA
Donald	72	1.81	UK
Yasmine	52	1.85	Canada

We can also grep index labels that contain all the specified strings.

Example

Filtering rows such that the index contains both ‘o’ and ‘i’.

1df.loc[(df.index.str.contains('o')) & (df.index.str.contains('i'))]

	Weight	Height	Country
Lois	81	1.85	Germany

len() Method

To filter rows such that the index has more than a certain number of characters.

Example

Filtering rows based on the index using the len() method.

1df.loc[df.index.str.len()>5]

	Weight	Height	Country
Jacobs	76	1.73	Canada
Donald	55	1.65	France
Yasmine	77	1.80	Germany
Andrew	59	1.68	Canada

Filtering Integer Index

We first convert the current name index into an integer range index.

Example

Resetting DataFrame index to integer range.

1df.reset_index(inplace=True)

The inplace=True argument indicates that the DataFrame is modified in place which means that original df is modified without making a further assignment.

1df

	index	Weight	Height	Country
0	Tom	54	1.84	Canada
1	Gary	57	1.77	USA
2	Lois	83	1.87	France
3	Wendy	52	1.84	Canada
4	Betty	67	1.61	USA
5	Jacobs	76	1.73	Canada
6	Donald	55	1.65	France
7	Yasmine	77	1.80	Germany
8	Nick	59	1.82	Germany
9	Andrew	59	1.68	Canada

We can select all rows whose indices lie between 3 and 8, exclusive.

Example

Filtering rows based on integer index.

1df.loc[(df.index > 3) & (df.index < 8)]

	index	Weight	Height	Country
4	Betty	67	1.61	USA
5	Jacobs	76	1.73	Canada
6	Donald	55	1.65	France
7	Yasmine	77	1.80	Germany

Filtering DataFrame Rows

Introduction

Filtering Based on Column Values

isin() Method

Filtering Based on Index

startswith() Method

endswith() Method

contains() Method

len() Method

Filtering Integer Index

Share this Article