Table of Contents
Introduction
In this article, we will discuss the following operations on the rows of a DataFrame:
- sorting rows
- rearranging rows
We first import the required libraries and modules.
1import numpy as np
2import pandas as pd
We then create a dataframe with the help of some random functions in NumPy.
1R = np.random.RandomState(10) # set a random seed
2a='Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick Andrew'.split()
3b=R.choice(range(50,85), size=10, replace=False)
4c=R.uniform(1.6, 1.9, 10).round(2)
5d=R.choice(['Germany', 'UK', 'USA', 'France', 'Canada'], size=10)
6
7df = pd.DataFrame({'Weight':b,
8 'Height':c,
9 'Country':d}, index=a)
10df
Weight | Height | Country | |
---|---|---|---|
Tom | 73 | 1.78 | USA |
Gary | 74 | 1.65 | Germany |
Lois | 52 | 1.85 | UK |
Wendy | 57 | 1.63 | USA |
Betty | 55 | 1.69 | Germany |
Jacobs | 53 | 1.62 | Germany |
Donald | 62 | 1.62 | France |
Yasmine | 64 | 1.83 | UK |
Nick | 76 | 1.60 | France |
Andrew | 63 | 1.78 | Canada |
Sorting Rows
We usually sort the rows of a DataFrame in three typical scenarios:
- sorting rows based on the index.
- sorting rows base on values of a single column.
- sorting rows base on values of multiple columns.
Sorting Rows Based on Index
To sort the rows according to the first alphabet.
Example
Sorting rows according to the first alphabet of index.1df.sort_index()
Weight | Height | Country | |
---|---|---|---|
Andrew | 63 | 1.78 | Canada |
Betty | 55 | 1.69 | Germany |
Donald | 62 | 1.62 | France |
Gary | 74 | 1.65 | Germany |
Jacobs | 53 | 1.62 | Germany |
Lois | 52 | 1.85 | UK |
Nick | 76 | 1.60 | France |
Tom | 73 | 1.78 | USA |
Wendy | 57 | 1.63 | USA |
Yasmine | 64 | 1.83 | UK |
To rearrange the rows according to the first alphabet, in reverse order.
Example
Sorting rows according to the first alphabet of index (reverse order).1df.sort_index(ascending=False)
Weight | Height | Country | |
---|---|---|---|
Yasmine | 64 | 1.83 | UK |
Wendy | 57 | 1.63 | USA |
Tom | 73 | 1.78 | USA |
Nick | 76 | 1.60 | France |
Lois | 52 | 1.85 | UK |
Jacobs | 53 | 1.62 | Germany |
Gary | 74 | 1.65 | Germany |
Donald | 62 | 1.62 | France |
Betty | 55 | 1.69 | Germany |
Andrew | 63 | 1.78 | Canada |
A key
function can be specified which is applied to the index before sorting.
For example, to sort the rows according to the length of the row labels.
Example
Sorting rows according to the length of the index.1df.sort_index(key=lambda x: x.str.len())
Weight | Height | Country | |
---|---|---|---|
Tom | 73 | 1.78 | USA |
Gary | 74 | 1.65 | Germany |
Lois | 52 | 1.85 | UK |
Nick | 76 | 1.60 | France |
Wendy | 57 | 1.63 | USA |
Betty | 55 | 1.69 | Germany |
Jacobs | 53 | 1.62 | Germany |
Donald | 62 | 1.62 | France |
Andrew | 63 | 1.78 | Canada |
Yasmine | 64 | 1.83 | UK |
Same as above but in descending order.
Example
Sorting rows according to the length of the index (descending).1df.sort_index(key=lambda x: x.str.len(), ascending=False)
Weight | Height | Country | |
---|---|---|---|
Yasmine | 64 | 1.83 | UK |
Jacobs | 53 | 1.62 | Germany |
Donald | 62 | 1.62 | France |
Andrew | 63 | 1.78 | Canada |
Wendy | 57 | 1.63 | USA |
Betty | 55 | 1.69 | Germany |
Gary | 74 | 1.65 | Germany |
Lois | 52 | 1.85 | UK |
Nick | 76 | 1.60 | France |
Tom | 73 | 1.78 | USA |
The original DataFrame is not modified. However, we can sort the rows in-place by including the argument inplace=True
. This will do away with the need to reassign the DataFrame.
Example
Sorting rows according to the length of the index (in-place).1df.sort_index(key=lambda x: x.str.len(), inplace=True)
2df
Weight | Height | Country | |
---|---|---|---|
Tom | 73 | 1.78 | USA |
Gary | 74 | 1.65 | Germany |
Lois | 52 | 1.85 | UK |
Nick | 76 | 1.60 | France |
Wendy | 57 | 1.63 | USA |
Betty | 55 | 1.69 | Germany |
Jacobs | 53 | 1.62 | Germany |
Donald | 62 | 1.62 | France |
Andrew | 63 | 1.78 | Canada |
Yasmine | 64 | 1.83 | UK |
Sorting Rows Based on Values of a Single Column
In this example, we will sort the rows in ascending order of ‘Weight’ using the sort_values()
method.
Example
Sorting rows by the ‘Weight’ column (ascending).1df.sort_values(by ='Weight', axis=0)
Weight | Height | Country | |
---|---|---|---|
Lois | 52 | 1.85 | UK |
Jacobs | 53 | 1.62 | Germany |
Betty | 55 | 1.69 | Germany |
Wendy | 57 | 1.63 | USA |
Donald | 62 | 1.62 | France |
Andrew | 63 | 1.78 | Canada |
Yasmine | 64 | 1.83 | UK |
Tom | 73 | 1.78 | USA |
Gary | 74 | 1.65 | Germany |
Nick | 76 | 1.60 | France |
We can also sort the rows in descending order of ‘Weight’ by using the optional argument ascending=False
.
Example
Sorting rows by the ‘Weight’ column (descending).1df.sort_values(by ='Weight', axis=0, ascending=False)
Weight | Height | Country | |
---|---|---|---|
Nick | 76 | 1.60 | France |
Gary | 74 | 1.65 | Germany |
Tom | 73 | 1.78 | USA |
Yasmine | 64 | 1.83 | UK |
Andrew | 63 | 1.78 | Canada |
Donald | 62 | 1.62 | France |
Wendy | 57 | 1.63 | USA |
Betty | 55 | 1.69 | Germany |
Jacobs | 53 | 1.62 | Germany |
Lois | 52 | 1.85 | UK |
Sorting Rows Based on Values of Multiple Columns
In this example, we will sort the rows by ‘Country’ and then by ‘Height’, both in ascending (default) order.
Example
Sorting rows by 2 columns.1df.sort_values(by=['Country', 'Height'], axis=0)
Weight | Height | Country | |
---|---|---|---|
Andrew | 63 | 1.78 | Canada |
Nick | 76 | 1.60 | France |
Donald | 62 | 1.62 | France |
Jacobs | 53 | 1.62 | Germany |
Gary | 74 | 1.65 | Germany |
Betty | 55 | 1.69 | Germany |
Yasmine | 64 | 1.83 | UK |
Lois | 52 | 1.85 | UK |
Wendy | 57 | 1.63 | USA |
Tom | 73 | 1.78 | USA |
We can also employ mixed sorting where different columns can be assigned either ascending or descending order.
Example
Sorting rows by ‘Country’ (alphabetical order) and then by ‘Height’ (descending order)1df.sort_values(by=['Country', 'Height'], axis=0, ascending=[True,False])
Weight | Height | Country | |
---|---|---|---|
Andrew | 63 | 1.78 | Canada |
Donald | 62 | 1.62 | France |
Nick | 76 | 1.60 | France |
Betty | 55 | 1.69 | Germany |
Gary | 74 | 1.65 | Germany |
Jacobs | 53 | 1.62 | Germany |
Lois | 52 | 1.85 | UK |
Yasmine | 64 | 1.83 | UK |
Tom | 73 | 1.78 | USA |
Wendy | 57 | 1.63 | USA |
Rearranging Rows
We discuss two methods of rearranging rows:
- by manipulating row labels
- by manipulating row integer indices
But first, let’s print out our current DataFrame.
1df
Weight | Height | Country | |
---|---|---|---|
Tom | 73 | 1.78 | USA |
Gary | 74 | 1.65 | Germany |
Lois | 52 | 1.85 | UK |
Nick | 76 | 1.60 | France |
Wendy | 57 | 1.63 | USA |
Betty | 55 | 1.69 | Germany |
Jacobs | 53 | 1.62 | Germany |
Donald | 62 | 1.62 | France |
Andrew | 63 | 1.78 | Canada |
Yasmine | 64 | 1.83 | UK |
Manipulating Row Labels
To move any row to any position, we can manipulate the list of row labels.
Example
Moving a row to a specified position: insert ‘Wendy’ between ‘Donald’ and ‘Andrew’.Step 1 : Get list of row labels.
1ind = df.index.tolist()
2ind
['Tom',
'Gary',
'Lois',
'Nick',
'Wendy',
'Betty',
'Jacobs',
'Donald',
'Andrew',
'Yasmine']
We can use the insert()
method to insert the desired row to its destination position.
Step 2 : Using insert()
method to manipulate label positions.
1rem=ind.index("Wendy") # index of row to be moved
2ins=ind.index("Donald") # index of destination row
3
4ind.pop(rem) # remove "Wendy"
5ind.insert(ins,'Wendy') # insert "Wendy"
6
7ind
['Tom',
'Gary',
'Lois',
'Nick',
'Betty',
'Jacobs',
'Donald',
'Wendy',
'Andrew',
'Yasmine']
We now feed the above modified list of row labels into our DataFrame using the loc
attribute.
Step 3 : Feeding new label to DataFrame using loc
attribute.
1df.loc[ind]
Weight | Height | Country | |
---|---|---|---|
Tom | 73 | 1.78 | USA |
Gary | 74 | 1.65 | Germany |
Lois | 52 | 1.85 | UK |
Nick | 76 | 1.60 | France |
Betty | 55 | 1.69 | Germany |
Jacobs | 53 | 1.62 | Germany |
Donald | 62 | 1.62 | France |
Wendy | 57 | 1.63 | USA |
Andrew | 63 | 1.78 | Canada |
Yasmine | 64 | 1.83 | UK |
Manipulating Row Indices
We can also rearrange rows of a DataFrame by manipulating the integer index of the rows. Let’s first print out the current DataFrame.
1df
Weight | Height | Country | |
---|---|---|---|
Tom | 73 | 1.78 | USA |
Gary | 74 | 1.65 | Germany |
Lois | 52 | 1.85 | UK |
Nick | 76 | 1.60 | France |
Wendy | 57 | 1.63 | USA |
Betty | 55 | 1.69 | Germany |
Jacobs | 53 | 1.62 | Germany |
Donald | 62 | 1.62 | France |
Andrew | 63 | 1.78 | Canada |
Yasmine | 64 | 1.83 | UK |
Example
Moving a row to a specified position: insert ‘Wendy’ between ‘Donald’ and ‘Andrew’.The first step is to reset the current index to an integer index.
Step 1 : Reset to integer index using reset_index()
method.
1df1 = df.reset_index()
2df1
index | Weight | Height | Country | |
---|---|---|---|---|
0 | Tom | 73 | 1.78 | USA |
1 | Gary | 74 | 1.65 | Germany |
2 | Lois | 52 | 1.85 | UK |
3 | Nick | 76 | 1.60 | France |
4 | Wendy | 57 | 1.63 | USA |
5 | Betty | 55 | 1.69 | Germany |
6 | Jacobs | 53 | 1.62 | Germany |
7 | Donald | 62 | 1.62 | France |
8 | Andrew | 63 | 1.78 | Canada |
9 | Yasmine | 64 | 1.83 | UK |
We then generate the list of integer row indices.
Step 2 : Get list of row indices using index.tolist()
.
1ind = df1.index.tolist()
2ind
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
We can then pick the relevant row indices (corresponding to both ‘Wendy’ and ‘Donald’) to be manipulated.
Step 3 : Using insert()
and pop()
methods to manipulate row indices.
1rem=4 # index of row to be moved
2ins=7 # index of destination row
3
4ind.pop(rem) # remove "Wendy"
5ind.insert(ins,rem) # insert "Wendy"
6ind
[0, 1, 2, 3, 5, 6, 7, 4, 8, 9]
We now feed the above modified list of row indices into our DataFrame using the loc
attribute.
Step 4 : Feeding new index to DataFrame using loc
attribute.
1df1 = df1.loc[ind]
2df1
index | Weight | Height | Country | |
---|---|---|---|---|
0 | Tom | 73 | 1.78 | USA |
1 | Gary | 74 | 1.65 | Germany |
2 | Lois | 52 | 1.85 | UK |
3 | Nick | 76 | 1.60 | France |
5 | Betty | 55 | 1.69 | Germany |
6 | Jacobs | 53 | 1.62 | Germany |
7 | Donald | 62 | 1.62 | France |
4 | Wendy | 57 | 1.63 | USA |
8 | Andrew | 63 | 1.78 | Canada |
9 | Yasmine | 64 | 1.83 | UK |
If desired, we may remove the numerical index and revert back to the row labels by setting the ‘index’ column as the new index.
Step 5 : Revert back to original row labels using set_index()
method.
1df1.set_index('index').rename_axis(None, axis=0)
Weight | Height | Country | |
---|---|---|---|
Tom | 73 | 1.78 | USA |
Gary | 74 | 1.65 | Germany |
Lois | 52 | 1.85 | UK |
Nick | 76 | 1.60 | France |
Betty | 55 | 1.69 | Germany |
Jacobs | 53 | 1.62 | Germany |
Donald | 62 | 1.62 | France |
Wendy | 57 | 1.63 | USA |
Andrew | 63 | 1.78 | Canada |
Yasmine | 64 | 1.83 | UK |
The rename_axis(None, axis=0)
method ensures that the index column remains unnamed.