Table of Contents



Introduction

In this article, we will discuss the following operations on the rows of a DataFrame:

  • sorting rows
  • rearranging rows

We first import the required libraries and modules.

1import numpy as np
2import pandas as pd

We then create a dataframe with the help of some random functions in NumPy.

 1R = np.random.RandomState(10) # set a random seed
 2a='Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick Andrew'.split()
 3b=R.choice(range(50,85), size=10, replace=False)
 4c=R.uniform(1.6, 1.9, 10).round(2)
 5d=R.choice(['Germany', 'UK', 'USA', 'France', 'Canada'], size=10)
 6
 7df = pd.DataFrame({'Weight':b,
 8                   'Height':c,
 9                   'Country':d}, index=a)
10df

Weight Height Country
Tom 73 1.78 USA
Gary 74 1.65 Germany
Lois 52 1.85 UK
Wendy 57 1.63 USA
Betty 55 1.69 Germany
Jacobs 53 1.62 Germany
Donald 62 1.62 France
Yasmine 64 1.83 UK
Nick 76 1.60 France
Andrew 63 1.78 Canada

Sorting Rows

We usually sort the rows of a DataFrame in three typical scenarios:

  • sorting rows based on the index.
  • sorting rows base on values of a single column.
  • sorting rows base on values of multiple columns.

Sorting Rows Based on Index

To sort the rows according to the first alphabet.

Example

Sorting rows according to the first alphabet of index.

1df.sort_index()

Weight Height Country
Andrew 63 1.78 Canada
Betty 55 1.69 Germany
Donald 62 1.62 France
Gary 74 1.65 Germany
Jacobs 53 1.62 Germany
Lois 52 1.85 UK
Nick 76 1.60 France
Tom 73 1.78 USA
Wendy 57 1.63 USA
Yasmine 64 1.83 UK

To rearrange the rows according to the first alphabet, in reverse order.

Example

Sorting rows according to the first alphabet of index (reverse order).

1df.sort_index(ascending=False)

Weight Height Country
Yasmine 64 1.83 UK
Wendy 57 1.63 USA
Tom 73 1.78 USA
Nick 76 1.60 France
Lois 52 1.85 UK
Jacobs 53 1.62 Germany
Gary 74 1.65 Germany
Donald 62 1.62 France
Betty 55 1.69 Germany
Andrew 63 1.78 Canada

A key function can be specified which is applied to the index before sorting.

For example, to sort the rows according to the length of the row labels.

Example

Sorting rows according to the length of the index.

1df.sort_index(key=lambda x: x.str.len())

Weight Height Country
Tom 73 1.78 USA
Gary 74 1.65 Germany
Lois 52 1.85 UK
Nick 76 1.60 France
Wendy 57 1.63 USA
Betty 55 1.69 Germany
Jacobs 53 1.62 Germany
Donald 62 1.62 France
Andrew 63 1.78 Canada
Yasmine 64 1.83 UK

Same as above but in descending order.

Example

Sorting rows according to the length of the index (descending).

1df.sort_index(key=lambda x: x.str.len(), ascending=False)

Weight Height Country
Yasmine 64 1.83 UK
Jacobs 53 1.62 Germany
Donald 62 1.62 France
Andrew 63 1.78 Canada
Wendy 57 1.63 USA
Betty 55 1.69 Germany
Gary 74 1.65 Germany
Lois 52 1.85 UK
Nick 76 1.60 France
Tom 73 1.78 USA
Note

The original DataFrame is not modified. However, we can sort the rows in-place by including the argument inplace=True. This will do away with the need to reassign the DataFrame.

Example

Sorting rows according to the length of the index (in-place).

1df.sort_index(key=lambda x: x.str.len(), inplace=True)
2df

Weight Height Country
Tom 73 1.78 USA
Gary 74 1.65 Germany
Lois 52 1.85 UK
Nick 76 1.60 France
Wendy 57 1.63 USA
Betty 55 1.69 Germany
Jacobs 53 1.62 Germany
Donald 62 1.62 France
Andrew 63 1.78 Canada
Yasmine 64 1.83 UK

Sorting Rows Based on Values of a Single Column

In this example, we will sort the rows in ascending order of ‘Weight’ using the sort_values() method.

Example

Sorting rows by the ‘Weight’ column (ascending).

1df.sort_values(by ='Weight', axis=0)

Weight Height Country
Lois 52 1.85 UK
Jacobs 53 1.62 Germany
Betty 55 1.69 Germany
Wendy 57 1.63 USA
Donald 62 1.62 France
Andrew 63 1.78 Canada
Yasmine 64 1.83 UK
Tom 73 1.78 USA
Gary 74 1.65 Germany
Nick 76 1.60 France

We can also sort the rows in descending order of ‘Weight’ by using the optional argument ascending=False.

Example

Sorting rows by the ‘Weight’ column (descending).

1df.sort_values(by ='Weight', axis=0, ascending=False)

Weight Height Country
Nick 76 1.60 France
Gary 74 1.65 Germany
Tom 73 1.78 USA
Yasmine 64 1.83 UK
Andrew 63 1.78 Canada
Donald 62 1.62 France
Wendy 57 1.63 USA
Betty 55 1.69 Germany
Jacobs 53 1.62 Germany
Lois 52 1.85 UK

Sorting Rows Based on Values of Multiple Columns

In this example, we will sort the rows by ‘Country’ and then by ‘Height’, both in ascending (default) order.

Example

Sorting rows by 2 columns.

1df.sort_values(by=['Country', 'Height'], axis=0)

Weight Height Country
Andrew 63 1.78 Canada
Nick 76 1.60 France
Donald 62 1.62 France
Jacobs 53 1.62 Germany
Gary 74 1.65 Germany
Betty 55 1.69 Germany
Yasmine 64 1.83 UK
Lois 52 1.85 UK
Wendy 57 1.63 USA
Tom 73 1.78 USA

We can also employ mixed sorting where different columns can be assigned either ascending or descending order.

Example

Sorting rows by ‘Country’ (alphabetical order) and then by ‘Height’ (descending order)

1df.sort_values(by=['Country', 'Height'], axis=0, ascending=[True,False])

Weight Height Country
Andrew 63 1.78 Canada
Donald 62 1.62 France
Nick 76 1.60 France
Betty 55 1.69 Germany
Gary 74 1.65 Germany
Jacobs 53 1.62 Germany
Lois 52 1.85 UK
Yasmine 64 1.83 UK
Tom 73 1.78 USA
Wendy 57 1.63 USA

Rearranging Rows

We discuss two methods of rearranging rows:

  • by manipulating row labels
  • by manipulating row integer indices

But first, let’s print out our current DataFrame.

1df

Weight Height Country
Tom 73 1.78 USA
Gary 74 1.65 Germany
Lois 52 1.85 UK
Nick 76 1.60 France
Wendy 57 1.63 USA
Betty 55 1.69 Germany
Jacobs 53 1.62 Germany
Donald 62 1.62 France
Andrew 63 1.78 Canada
Yasmine 64 1.83 UK

Manipulating Row Labels

To move any row to any position, we can manipulate the list of row labels.

Example

Moving a row to a specified position: insert ‘Wendy’ between ‘Donald’ and ‘Andrew’.

Step 1 : Get list of row labels.

1ind = df.index.tolist()
2ind
['Tom',
 'Gary',
 'Lois',
 'Nick',
 'Wendy',
 'Betty',
 'Jacobs',
 'Donald',
 'Andrew',
 'Yasmine']

We can use the insert() method to insert the desired row to its destination position.

Step 2 : Using insert() method to manipulate label positions.

1rem=ind.index("Wendy") # index of row to be moved
2ins=ind.index("Donald") # index of destination row
3
4ind.pop(rem) # remove "Wendy"
5ind.insert(ins,'Wendy') # insert "Wendy"
6
7ind
['Tom',
 'Gary',
 'Lois',
 'Nick',
 'Betty',
 'Jacobs',
 'Donald',
 'Wendy',
 'Andrew',
 'Yasmine']

We now feed the above modified list of row labels into our DataFrame using the loc attribute.

Step 3 : Feeding new label to DataFrame using loc attribute.

1df.loc[ind]

Weight Height Country
Tom 73 1.78 USA
Gary 74 1.65 Germany
Lois 52 1.85 UK
Nick 76 1.60 France
Betty 55 1.69 Germany
Jacobs 53 1.62 Germany
Donald 62 1.62 France
Wendy 57 1.63 USA
Andrew 63 1.78 Canada
Yasmine 64 1.83 UK

Manipulating Row Indices

We can also rearrange rows of a DataFrame by manipulating the integer index of the rows. Let’s first print out the current DataFrame.

1df

Weight Height Country
Tom 73 1.78 USA
Gary 74 1.65 Germany
Lois 52 1.85 UK
Nick 76 1.60 France
Wendy 57 1.63 USA
Betty 55 1.69 Germany
Jacobs 53 1.62 Germany
Donald 62 1.62 France
Andrew 63 1.78 Canada
Yasmine 64 1.83 UK

Example

Moving a row to a specified position: insert ‘Wendy’ between ‘Donald’ and ‘Andrew’.

The first step is to reset the current index to an integer index.

Step 1 : Reset to integer index using reset_index() method.

1df1 = df.reset_index()
2df1

index Weight Height Country
0 Tom 73 1.78 USA
1 Gary 74 1.65 Germany
2 Lois 52 1.85 UK
3 Nick 76 1.60 France
4 Wendy 57 1.63 USA
5 Betty 55 1.69 Germany
6 Jacobs 53 1.62 Germany
7 Donald 62 1.62 France
8 Andrew 63 1.78 Canada
9 Yasmine 64 1.83 UK

We then generate the list of integer row indices.

Step 2 : Get list of row indices using index.tolist().

1ind = df1.index.tolist()
2ind
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

We can then pick the relevant row indices (corresponding to both ‘Wendy’ and ‘Donald’) to be manipulated.

Step 3 : Using insert() and pop() methods to manipulate row indices.

1rem=4 # index of row to be moved
2ins=7 # index of destination row
3
4ind.pop(rem) # remove "Wendy"
5ind.insert(ins,rem) # insert "Wendy"
6ind
[0, 1, 2, 3, 5, 6, 7, 4, 8, 9]

We now feed the above modified list of row indices into our DataFrame using the loc attribute.

Step 4 : Feeding new index to DataFrame using loc attribute.

1df1 = df1.loc[ind]
2df1

index Weight Height Country
0 Tom 73 1.78 USA
1 Gary 74 1.65 Germany
2 Lois 52 1.85 UK
3 Nick 76 1.60 France
5 Betty 55 1.69 Germany
6 Jacobs 53 1.62 Germany
7 Donald 62 1.62 France
4 Wendy 57 1.63 USA
8 Andrew 63 1.78 Canada
9 Yasmine 64 1.83 UK

If desired, we may remove the numerical index and revert back to the row labels by setting the ‘index’ column as the new index.

Step 5 : Revert back to original row labels using set_index() method.

1df1.set_index('index').rename_axis(None, axis=0)

Weight Height Country
Tom 73 1.78 USA
Gary 74 1.65 Germany
Lois 52 1.85 UK
Nick 76 1.60 France
Betty 55 1.69 Germany
Jacobs 53 1.62 Germany
Donald 62 1.62 France
Wendy 57 1.63 USA
Andrew 63 1.78 Canada
Yasmine 64 1.83 UK

The rename_axis(None, axis=0) method ensures that the index column remains unnamed.