Table of Contents



Introduction

In this article, we will discuss the following operations on the columns of a DataFrame:

  • sorting columns
  • rearranging columns
  • re-indexing columns

We first import the required libraries and modules.

1import numpy as np
2import pandas as pd

We then create a dataframe with the help of some random functions in NumPy.

 1R = np.random.RandomState(10) # set a random seed
 2a='Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick Andrew'.split()
 3b=R.choice(range(50,85), size=10, replace=False)
 4c=R.uniform(1.6, 1.9, 10).round(2)
 5d=R.choice(['Germany', 'UK', 'USA', 'France', 'Canada'], size=10)
 6
 7df = pd.DataFrame({'Weight':b,
 8                   'Height':c,
 9                   'Country':d}, index=a)
10df = df.T
11df

Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick Andrew
Weight 73 74 52 57 55 53 62 64 76 63
Height 1.78 1.65 1.85 1.63 1.69 1.62 1.62 1.83 1.6 1.78
Country USA Germany UK USA Germany Germany France UK France Canada

Sorting Columns

Sorting Columns by First Letter

To sort the columns according to the first alphabet.

Example

Sorting columns by first alphabet of the labels.

1df[sorted(df.columns)]

Andrew Betty Donald Gary Jacobs Lois Nick Tom Wendy Yasmine
Weight 63 55 62 74 53 52 76 73 57 64
Height 1.78 1.69 1.62 1.65 1.62 1.85 1.6 1.78 1.63 1.83
Country Canada Germany France Germany Germany UK France USA USA UK

To sort the columns according to the first alphabet, in reverse order.

1df[sorted(df.columns, reverse=True)]

Yasmine Wendy Tom Nick Lois Jacobs Gary Donald Betty Andrew
Weight 64 57 73 76 52 53 74 62 55 63
Height 1.83 1.63 1.78 1.6 1.85 1.62 1.65 1.62 1.69 1.78
Country UK USA USA France UK Germany Germany France Germany Canada

Example

Sorting columns by length of the labels.

Recall that we can sort a Python list according to the length of the items using the key argument which is a function used to decide the order.

👀 Review

In this case, we employ the len() function as the key.

1sorted(df.columns, key=lambda x: len(x))
['Tom',
 'Gary',
 'Lois',
 'Nick',
 'Wendy',
 'Betty',
 'Jacobs',
 'Donald',
 'Andrew',
 'Yasmine']

We can then sort the columns according to the length of the column labels.

1df[sorted(df.columns, key=lambda x: len(x))]

Tom Gary Lois Nick Wendy Betty Jacobs Donald Andrew Yasmine
Weight 73 74 52 76 57 55 53 62 63 64
Height 1.78 1.65 1.85 1.6 1.63 1.69 1.62 1.62 1.78 1.83
Country USA Germany UK France USA Germany Germany France Canada UK

Example

Sorting columns by length of the labels (reverse order).

We first sort the list of column labels in reverse order according to the length.

1sorted(df.columns, key=lambda x: len(x), reverse=True)
['Yasmine',
 'Jacobs',
 'Donald',
 'Andrew',
 'Wendy',
 'Betty',
 'Gary',
 'Lois',
 'Nick',
 'Tom']

We then feed the sorted list into the DataFrame.

1df[sorted(df.columns, key=lambda x: len(x), reverse=True)]

Yasmine Jacobs Donald Andrew Wendy Betty Gary Lois Nick Tom
Weight 64 53 62 63 57 55 74 52 76 73
Height 1.83 1.62 1.62 1.78 1.63 1.69 1.65 1.85 1.6 1.78
Country UK Germany France Canada USA Germany Germany UK France USA

Sort columns based on values of a single row.

Example

Sorting columns in ascending order of weight.

1df.sort_values(by ='Weight', axis=1)

Lois Jacobs Betty Wendy Donald Andrew Yasmine Tom Gary Nick
Weight 52 53 55 57 62 63 64 73 74 76
Height 1.85 1.62 1.69 1.63 1.62 1.78 1.83 1.78 1.65 1.6
Country UK Germany Germany USA France Canada UK USA Germany France

Example

Sorting columns in descending order of weight.

We can include the optional argument ascending=False to sort the columns in descending order of ‘Weight’.

1df.sort_values(by ='Weight', axis=1, ascending=False)

Nick Gary Tom Yasmine Andrew Donald Wendy Betty Jacobs Lois
Weight 76 74 73 64 63 62 57 55 53 52
Height 1.6 1.65 1.78 1.83 1.78 1.62 1.63 1.69 1.62 1.85
Country France Germany USA UK Canada France USA Germany Germany UK

Sort columns based on values of multiple rows.

Example

Sorting columns by ‘Country’ (alphabetical order) and then by ‘Height’.

1df.sort_values(by =['Country', 'Height'], axis=1)

Andrew Nick Donald Jacobs Gary Betty Yasmine Lois Wendy Tom
Weight 63 76 62 53 74 55 64 52 57 73
Height 1.78 1.6 1.62 1.62 1.65 1.69 1.83 1.85 1.63 1.78
Country Canada France France Germany Germany Germany UK UK USA USA

Similar to above but in descending order.

Example

Sorting columns by ‘Country’ (reverse alphabetical order) and then by ‘Height’ (descending order).

1df.sort_values(by =['Country', 'Height'], axis=1, ascending=False)

Tom Wendy Lois Yasmine Betty Gary Jacobs Donald Nick Andrew
Weight 73 57 52 64 55 74 53 62 76 63
Height 1.78 1.63 1.85 1.83 1.69 1.65 1.62 1.62 1.6 1.78
Country USA USA UK UK Germany Germany Germany France France Canada

We can also employ mixed sorting where different rows can be assigned either ascending or descending order.

Example

Sorting columns by ‘Country’ (alphabetical order) and then by ‘Height’ (descending order).

1df.sort_values(by =['Country', 'Height'], axis=1, ascending=[True,False])

Andrew Donald Nick Betty Gary Jacobs Lois Yasmine Tom Wendy
Weight 63 62 76 55 74 53 52 64 73 57
Height 1.78 1.62 1.6 1.69 1.65 1.62 1.85 1.83 1.78 1.63
Country Canada France France Germany Germany Germany UK UK USA USA

Rearranging Columns

Our first example is to move the last column to become the first column (after index column).

Example

Rearranging columns by manipulating the list of column labels.

The first step is to create a list of the column labels using the tolist() method.

1cols = df.columns.tolist()
2cols
['Tom',
 'Gary',
 'Lois',
 'Wendy',
 'Betty',
 'Jacobs',
 'Donald',
 'Yasmine',
 'Nick',
 'Andrew']

The following code will extract the last element as a list.

1cols[-1:]
['Andrew']

And the following code will slice the original list by excluding the last element.

1cols[:-1]
['Tom',
 'Gary',
 'Lois',
 'Wendy',
 'Betty',
 'Jacobs',
 'Donald',
 'Yasmine',
 'Nick']

We now concatenate (+) the two lists to create the desired list of (rearranged) column labels.

1cols1 = cols[-1:] + cols[:-1]
2cols1
['Andrew',
 'Tom',
 'Gary',
 'Lois',
 'Wendy',
 'Betty',
 'Jacobs',
 'Donald',
 'Yasmine',
 'Nick']

Finally, we feed the list of column labels into the DataFrame.

1df[cols1]

Andrew Tom Gary Lois Wendy Betty Jacobs Donald Yasmine Nick
Weight 63 73 74 52 57 55 53 62 64 76
Height 1.78 1.78 1.65 1.85 1.63 1.69 1.62 1.62 1.83 1.6
Country Canada USA Germany UK USA Germany Germany France UK France

To move any column to any position, we can proceed in a likewise manner.

Example

Rearranging columns.

For example, to move the column ‘Wendy’ to column ‘Yasmine’ position.

1rem=cols.index("Wendy") # index of column to be moved
2ins=cols.index("Yasmine") # index of destination column
3
4cols.pop(rem) # remove "Wendy"
5cols.insert(ins,'Wendy') # insert "Wendy"
6
7cols
['Tom',
 'Gary',
 'Lois',
 'Betty',
 'Jacobs',
 'Donald',
 'Yasmine',
 'Wendy',
 'Nick',
 'Andrew']

We then feed the above list of rearranged columns to the DataFrame.

1df[cols]

Tom Gary Lois Betty Jacobs Donald Yasmine Wendy Nick Andrew
Weight 73 74 52 55 53 62 64 57 76 63
Height 1.78 1.65 1.85 1.69 1.62 1.62 1.83 1.63 1.6 1.78
Country USA Germany UK Germany Germany France UK USA France Canada

Re-indexing the Columns

Re-indexing a DataFrame allows us to completely redefine the columns of a DataFrame by specifying the columns to keep or create.

We will work with the transposed version of the DataFrame and name it df1.

1df1=df.T
2df1

Weight Height Country
Tom 73 1.78 USA
Gary 74 1.65 Germany
Lois 52 1.85 UK
Wendy 57 1.63 USA
Betty 55 1.69 Germany
Jacobs 53 1.62 Germany
Donald 62 1.62 France
Yasmine 64 1.83 UK
Nick 76 1.6 France
Andrew 63 1.78 Canada

We now create a new column with label ‘Ratio’ which is the ratio of the ‘Weight’ and ‘Height’ columns.

Example

Creating a new column from two existing columns.

1df1['Ratio'] = df1['Weight']/df1['Height']
2df1

Weight Height Country Ratio
Tom 73 1.78 USA 41.011236
Gary 74 1.65 Germany 44.848485
Lois 52 1.85 UK 28.108108
Wendy 57 1.63 USA 34.969325
Betty 55 1.69 Germany 32.544379
Jacobs 53 1.62 Germany 32.716049
Donald 62 1.62 France 38.271605
Yasmine 64 1.83 UK 34.972678
Nick 76 1.6 France 47.5
Andrew 63 1.78 Canada 35.393258

We now use the reindex method to keep only the ‘Height’ and ‘Ratio’ columns while creating a new column with label ‘Placeholder’. The argument axis=1 tells pandas we are reindexing the column labels.

Example

Reindexing columns.

1df1.reindex(['Height', 'Ratio', 'Placeholder'], axis=1)

Height Ratio Placeholder
Tom 1.78 41.011236 NaN
Gary 1.65 44.848485 NaN
Lois 1.85 28.108108 NaN
Wendy 1.63 34.969325 NaN
Betty 1.69 32.544379 NaN
Jacobs 1.62 32.716049 NaN
Donald 1.62 38.271605 NaN
Yasmine 1.83 34.972678 NaN
Nick 1.6 47.5 NaN
Andrew 1.78 35.393258 NaN

By default, values in the new column that do not have corresponding records in the DataFrame are assigned NaN.

We can fill in the missing values by passing a value to the parameter fill_value.

Example

Reindexing columns with fill_value=0.

1df1.reindex(['Height', 'Ratio', 'Placeholder'], axis=1, fill_value=0)

Height Ratio Placeholder
Tom 1.78 41.011236 0
Gary 1.65 44.848485 0
Lois 1.85 28.108108 0
Wendy 1.63 34.969325 0
Betty 1.69 32.544379 0
Jacobs 1.62 32.716049 0
Donald 1.62 38.271605 0
Yasmine 1.83 34.972678 0
Nick 1.6 47.5 0
Andrew 1.78 35.393258 0