Table of Contents
Introduction
In this article, we will discuss the following operations on the columns of a DataFrame:
- sorting columns
- rearranging columns
- re-indexing columns
We first import the required libraries and modules.
1import numpy as np
2import pandas as pd
We then create a dataframe with the help of some random functions in NumPy.
1R = np.random.RandomState(10) # set a random seed
2a='Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick Andrew'.split()
3b=R.choice(range(50,85), size=10, replace=False)
4c=R.uniform(1.6, 1.9, 10).round(2)
5d=R.choice(['Germany', 'UK', 'USA', 'France', 'Canada'], size=10)
6
7df = pd.DataFrame({'Weight':b,
8 'Height':c,
9 'Country':d}, index=a)
10df = df.T
11df
Tom | Gary | Lois | Wendy | Betty | Jacobs | Donald | Yasmine | Nick | Andrew | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 73 | 74 | 52 | 57 | 55 | 53 | 62 | 64 | 76 | 63 |
Height | 1.78 | 1.65 | 1.85 | 1.63 | 1.69 | 1.62 | 1.62 | 1.83 | 1.6 | 1.78 |
Country | USA | Germany | UK | USA | Germany | Germany | France | UK | France | Canada |
Sorting Columns
Sorting Columns by First Letter
To sort the columns according to the first alphabet.
Example
Sorting columns by first alphabet of the labels.1df[sorted(df.columns)]
Andrew | Betty | Donald | Gary | Jacobs | Lois | Nick | Tom | Wendy | Yasmine | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 63 | 55 | 62 | 74 | 53 | 52 | 76 | 73 | 57 | 64 |
Height | 1.78 | 1.69 | 1.62 | 1.65 | 1.62 | 1.85 | 1.6 | 1.78 | 1.63 | 1.83 |
Country | Canada | Germany | France | Germany | Germany | UK | France | USA | USA | UK |
To sort the columns according to the first alphabet, in reverse order.
1df[sorted(df.columns, reverse=True)]
Yasmine | Wendy | Tom | Nick | Lois | Jacobs | Gary | Donald | Betty | Andrew | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 64 | 57 | 73 | 76 | 52 | 53 | 74 | 62 | 55 | 63 |
Height | 1.83 | 1.63 | 1.78 | 1.6 | 1.85 | 1.62 | 1.65 | 1.62 | 1.69 | 1.78 |
Country | UK | USA | USA | France | UK | Germany | Germany | France | Germany | Canada |
Example
Sorting columns by length of the labels.Recall that we can sort a Python list according to the length of the items using the key
argument which is a function used to decide the order.
In this case, we employ the len()
function as the key.
1sorted(df.columns, key=lambda x: len(x))
['Tom',
'Gary',
'Lois',
'Nick',
'Wendy',
'Betty',
'Jacobs',
'Donald',
'Andrew',
'Yasmine']
We can then sort the columns according to the length of the column labels.
1df[sorted(df.columns, key=lambda x: len(x))]
Tom | Gary | Lois | Nick | Wendy | Betty | Jacobs | Donald | Andrew | Yasmine | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 73 | 74 | 52 | 76 | 57 | 55 | 53 | 62 | 63 | 64 |
Height | 1.78 | 1.65 | 1.85 | 1.6 | 1.63 | 1.69 | 1.62 | 1.62 | 1.78 | 1.83 |
Country | USA | Germany | UK | France | USA | Germany | Germany | France | Canada | UK |
Example
Sorting columns by length of the labels (reverse order).We first sort the list of column labels in reverse order according to the length.
1sorted(df.columns, key=lambda x: len(x), reverse=True)
['Yasmine',
'Jacobs',
'Donald',
'Andrew',
'Wendy',
'Betty',
'Gary',
'Lois',
'Nick',
'Tom']
We then feed the sorted list into the DataFrame.
1df[sorted(df.columns, key=lambda x: len(x), reverse=True)]
Yasmine | Jacobs | Donald | Andrew | Wendy | Betty | Gary | Lois | Nick | Tom | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 64 | 53 | 62 | 63 | 57 | 55 | 74 | 52 | 76 | 73 |
Height | 1.83 | 1.62 | 1.62 | 1.78 | 1.63 | 1.69 | 1.65 | 1.85 | 1.6 | 1.78 |
Country | UK | Germany | France | Canada | USA | Germany | Germany | UK | France | USA |
Sort columns based on values of a single row.
Example
Sorting columns in ascending order of weight.1df.sort_values(by ='Weight', axis=1)
Lois | Jacobs | Betty | Wendy | Donald | Andrew | Yasmine | Tom | Gary | Nick | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 52 | 53 | 55 | 57 | 62 | 63 | 64 | 73 | 74 | 76 |
Height | 1.85 | 1.62 | 1.69 | 1.63 | 1.62 | 1.78 | 1.83 | 1.78 | 1.65 | 1.6 |
Country | UK | Germany | Germany | USA | France | Canada | UK | USA | Germany | France |
Example
Sorting columns in descending order of weight.We can include the optional argument ascending=False
to sort the columns in descending order of ‘Weight’.
1df.sort_values(by ='Weight', axis=1, ascending=False)
Nick | Gary | Tom | Yasmine | Andrew | Donald | Wendy | Betty | Jacobs | Lois | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 76 | 74 | 73 | 64 | 63 | 62 | 57 | 55 | 53 | 52 |
Height | 1.6 | 1.65 | 1.78 | 1.83 | 1.78 | 1.62 | 1.63 | 1.69 | 1.62 | 1.85 |
Country | France | Germany | USA | UK | Canada | France | USA | Germany | Germany | UK |
Sort columns based on values of multiple rows.
Example
Sorting columns by ‘Country’ (alphabetical order) and then by ‘Height’.1df.sort_values(by =['Country', 'Height'], axis=1)
Andrew | Nick | Donald | Jacobs | Gary | Betty | Yasmine | Lois | Wendy | Tom | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 63 | 76 | 62 | 53 | 74 | 55 | 64 | 52 | 57 | 73 |
Height | 1.78 | 1.6 | 1.62 | 1.62 | 1.65 | 1.69 | 1.83 | 1.85 | 1.63 | 1.78 |
Country | Canada | France | France | Germany | Germany | Germany | UK | UK | USA | USA |
Similar to above but in descending order.
Example
Sorting columns by ‘Country’ (reverse alphabetical order) and then by ‘Height’ (descending order).1df.sort_values(by =['Country', 'Height'], axis=1, ascending=False)
Tom | Wendy | Lois | Yasmine | Betty | Gary | Jacobs | Donald | Nick | Andrew | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 73 | 57 | 52 | 64 | 55 | 74 | 53 | 62 | 76 | 63 |
Height | 1.78 | 1.63 | 1.85 | 1.83 | 1.69 | 1.65 | 1.62 | 1.62 | 1.6 | 1.78 |
Country | USA | USA | UK | UK | Germany | Germany | Germany | France | France | Canada |
We can also employ mixed sorting where different rows can be assigned either ascending or descending order.
Example
Sorting columns by ‘Country’ (alphabetical order) and then by ‘Height’ (descending order).1df.sort_values(by =['Country', 'Height'], axis=1, ascending=[True,False])
Andrew | Donald | Nick | Betty | Gary | Jacobs | Lois | Yasmine | Tom | Wendy | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 63 | 62 | 76 | 55 | 74 | 53 | 52 | 64 | 73 | 57 |
Height | 1.78 | 1.62 | 1.6 | 1.69 | 1.65 | 1.62 | 1.85 | 1.83 | 1.78 | 1.63 |
Country | Canada | France | France | Germany | Germany | Germany | UK | UK | USA | USA |
Rearranging Columns
Our first example is to move the last column to become the first column (after index column).
Example
Rearranging columns by manipulating the list of column labels.The first step is to create a list of the column labels using the tolist()
method.
1cols = df.columns.tolist()
2cols
['Tom',
'Gary',
'Lois',
'Wendy',
'Betty',
'Jacobs',
'Donald',
'Yasmine',
'Nick',
'Andrew']
The following code will extract the last element as a list.
1cols[-1:]
['Andrew']
And the following code will slice the original list by excluding the last element.
1cols[:-1]
['Tom',
'Gary',
'Lois',
'Wendy',
'Betty',
'Jacobs',
'Donald',
'Yasmine',
'Nick']
We now concatenate (+
) the two lists to create the desired list of (rearranged) column labels.
1cols1 = cols[-1:] + cols[:-1]
2cols1
['Andrew',
'Tom',
'Gary',
'Lois',
'Wendy',
'Betty',
'Jacobs',
'Donald',
'Yasmine',
'Nick']
Finally, we feed the list of column labels into the DataFrame.
1df[cols1]
Andrew | Tom | Gary | Lois | Wendy | Betty | Jacobs | Donald | Yasmine | Nick | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 63 | 73 | 74 | 52 | 57 | 55 | 53 | 62 | 64 | 76 |
Height | 1.78 | 1.78 | 1.65 | 1.85 | 1.63 | 1.69 | 1.62 | 1.62 | 1.83 | 1.6 |
Country | Canada | USA | Germany | UK | USA | Germany | Germany | France | UK | France |
To move any column to any position, we can proceed in a likewise manner.
Example
Rearranging columns.For example, to move the column ‘Wendy’ to column ‘Yasmine’ position.
1rem=cols.index("Wendy") # index of column to be moved
2ins=cols.index("Yasmine") # index of destination column
3
4cols.pop(rem) # remove "Wendy"
5cols.insert(ins,'Wendy') # insert "Wendy"
6
7cols
['Tom',
'Gary',
'Lois',
'Betty',
'Jacobs',
'Donald',
'Yasmine',
'Wendy',
'Nick',
'Andrew']
We then feed the above list of rearranged columns to the DataFrame.
1df[cols]
Tom | Gary | Lois | Betty | Jacobs | Donald | Yasmine | Wendy | Nick | Andrew | |
---|---|---|---|---|---|---|---|---|---|---|
Weight | 73 | 74 | 52 | 55 | 53 | 62 | 64 | 57 | 76 | 63 |
Height | 1.78 | 1.65 | 1.85 | 1.69 | 1.62 | 1.62 | 1.83 | 1.63 | 1.6 | 1.78 |
Country | USA | Germany | UK | Germany | Germany | France | UK | USA | France | Canada |
Re-indexing the Columns
Re-indexing a DataFrame allows us to completely redefine the columns of a DataFrame by specifying the columns to keep or create.
We will work with the transposed version of the DataFrame and name it df1
.
1df1=df.T
2df1
Weight | Height | Country | |
---|---|---|---|
Tom | 73 | 1.78 | USA |
Gary | 74 | 1.65 | Germany |
Lois | 52 | 1.85 | UK |
Wendy | 57 | 1.63 | USA |
Betty | 55 | 1.69 | Germany |
Jacobs | 53 | 1.62 | Germany |
Donald | 62 | 1.62 | France |
Yasmine | 64 | 1.83 | UK |
Nick | 76 | 1.6 | France |
Andrew | 63 | 1.78 | Canada |
We now create a new column with label ‘Ratio’ which is the ratio of the ‘Weight’ and ‘Height’ columns.
Example
Creating a new column from two existing columns.1df1['Ratio'] = df1['Weight']/df1['Height']
2df1
Weight | Height | Country | Ratio | |
---|---|---|---|---|
Tom | 73 | 1.78 | USA | 41.011236 |
Gary | 74 | 1.65 | Germany | 44.848485 |
Lois | 52 | 1.85 | UK | 28.108108 |
Wendy | 57 | 1.63 | USA | 34.969325 |
Betty | 55 | 1.69 | Germany | 32.544379 |
Jacobs | 53 | 1.62 | Germany | 32.716049 |
Donald | 62 | 1.62 | France | 38.271605 |
Yasmine | 64 | 1.83 | UK | 34.972678 |
Nick | 76 | 1.6 | France | 47.5 |
Andrew | 63 | 1.78 | Canada | 35.393258 |
We now use the reindex
method to keep only the ‘Height’ and ‘Ratio’ columns while creating a new column with label ‘Placeholder’. The argument axis=1
tells pandas we are reindexing the column labels.
Example
Reindexing columns.1df1.reindex(['Height', 'Ratio', 'Placeholder'], axis=1)
Height | Ratio | Placeholder | |
---|---|---|---|
Tom | 1.78 | 41.011236 | NaN |
Gary | 1.65 | 44.848485 | NaN |
Lois | 1.85 | 28.108108 | NaN |
Wendy | 1.63 | 34.969325 | NaN |
Betty | 1.69 | 32.544379 | NaN |
Jacobs | 1.62 | 32.716049 | NaN |
Donald | 1.62 | 38.271605 | NaN |
Yasmine | 1.83 | 34.972678 | NaN |
Nick | 1.6 | 47.5 | NaN |
Andrew | 1.78 | 35.393258 | NaN |
By default, values in the new column that do not have corresponding records in the DataFrame are assigned NaN
.
We can fill in the missing values by passing a value to the parameter fill_value
.
Example
Reindexing columns withfill_value=0
.
1df1.reindex(['Height', 'Ratio', 'Placeholder'], axis=1, fill_value=0)
Height | Ratio | Placeholder | |
---|---|---|---|
Tom | 1.78 | 41.011236 | 0 |
Gary | 1.65 | 44.848485 | 0 |
Lois | 1.85 | 28.108108 | 0 |
Wendy | 1.63 | 34.969325 | 0 |
Betty | 1.69 | 32.544379 | 0 |
Jacobs | 1.62 | 32.716049 | 0 |
Donald | 1.62 | 38.271605 | 0 |
Yasmine | 1.83 | 34.972678 | 0 |
Nick | 1.6 | 47.5 | 0 |
Andrew | 1.78 | 35.393258 | 0 |