
Introduction

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure. As mentioned, a DataFrame is analogous to an Excel spreadsheet with its rows and columns, while a Series is analogous to a single column of data.

Just like the Series object, which is an analog of a 1-D NumPy array with flexible indices, a DataFrame is an analog of a 2-D NumPy array with both flexible row indices and column labels.

A DataFrame can also be thought of as a dictionary-like container for Series objects: a sequence of aligned Series that share the same index.
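Both views can be checked directly on a small frame. A minimal sketch, assuming a hypothetical `df` built from made-up values: `to_numpy()` returns the underlying 2-D array, while selecting a column returns a Series that shares the DataFrame's row index.

```python
import pandas as pd

# Hypothetical example data (not taken from the original text)
df = pd.DataFrame({'Age': [32, 26, 22], 'Height': [1.75, 1.83, 1.69]},
                  index=['Tom', 'Gary', 'Lois'])

# 2-D array view: the values without their labels
arr = df.to_numpy()
print(arr.shape)  # (3, 2)

# Dictionary-like view: keys are column labels, values are Series
print(list(df.keys()))              # ['Age', 'Height']
age = df['Age']
print(type(age).__name__)           # Series
print(age.index.equals(df.index))   # True: the column shares the row index
```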

Syntax

A pandas DataFrame can be created using the following constructor.

Syntax

The pandas.DataFrame constructor.

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
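| Parameter | Required? | Default value | Description |
|---|---|---|---|
| data | ❌ No | None | ndarray (structured or homogeneous), Iterable, dict, or DataFrame. If omitted, an empty DataFrame is created. |
| index | ❌ No | None (RangeIndex) | Index used to label the rows of the resulting frame. Defaults to a RangeIndex if the input data carries no index information and no index is provided. |
| columns | ❌ No | None (RangeIndex) | Column labels for the resulting frame. Defaults to a RangeIndex if no column labels are provided. If the data already has column labels, columns selects from them instead. |
| dtype | ❌ No | None (inferred from data) | Data type to force. Only a single dtype is allowed. If None, it is inferred from the data. |
| copy | ❌ No | None | bool or None. Copy data from inputs. For dict data, the default of None behaves like copy=True. For DataFrame or 2-D ndarray input, the default of None behaves like copy=False. If data is a dict containing one or more Series (possibly of different dtypes), copy=False will ensure that these inputs are not copied. |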

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
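Because every parameter defaults to None, the constructor can even be called with no arguments. The sketch below shows this, along with the dtype parameter forcing a single type on all columns; the data values and column names are arbitrary.

```python
import pandas as pd

# With no arguments, the result is an empty DataFrame
empty = pd.DataFrame()
print(empty.shape)  # (0, 0)

# dtype forces one data type on every column (here: ints stored as float64)
df = pd.DataFrame([[1, 2], [3, 4]], columns=['x', 'y'], dtype=float)
print(df.dtypes)
```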

DataFrame from a Series Object

A DataFrame is a collection of Series objects, and a single-column DataFrame can be constructed from a single Series.

We first import the required libraries and modules.

import numpy as np
import pandas as pd
import random as rd

Example

Step 1: Creating a pandas Series from a dictionary.

age_dict = {'Tom': 32,
            'Gary': 26,
            'Lois': 22,
            'Wendy': 31,
            'Betty': 35}  # create dictionary

age = pd.Series(age_dict)  # create Series
age
Tom      32
Gary     26
Lois     22
Wendy    31
Betty    35
dtype: int64

We now create a single-column DataFrame from the above Series object.

Example

Step 2: Creating a pandas DataFrame from a pandas Series.

pd.DataFrame(age, columns=['Age'])  # create dataframe

       Age
Tom     32
Gary    26
Lois    22
Wendy   31
Betty   35
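Two other common ways to build the same single-column frame are the Series method to_frame() and a one-entry dictionary. A minimal sketch, reusing the age_dict values from above; all three results are expected to be identical.

```python
import pandas as pd

age = pd.Series({'Tom': 32, 'Gary': 26, 'Lois': 22, 'Wendy': 31, 'Betty': 35})

df1 = pd.DataFrame(age, columns=['Age'])  # approach used above
df2 = age.to_frame(name='Age')            # Series method; name sets the column label
df3 = pd.DataFrame({'Age': age})          # the dictionary key becomes the column label

print(df1.equals(df2) and df2.equals(df3))
```

The dictionary form generalizes directly to several columns, which is the approach taken in the next section.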

DataFrame from a Dictionary of Series Objects

It is also possible to create a DataFrame from a dictionary of Series objects.

We create another Series object that shares the same index as the age Series object (created above).

Example

Creating another pandas Series from a dictionary.

height_dict = {'Tom': 1.75,
               'Gary': 1.83,
               'Lois': 1.69,
               'Wendy': 1.67,
               'Betty': 1.62}  # create dictionary

height = pd.Series(height_dict)  # create Series
height
Tom      1.75
Gary     1.83
Lois     1.69
Wendy    1.67
Betty    1.62
dtype: float64

We then create a DataFrame from a dictionary of the two Series objects.

Example

Creating a pandas DataFrame from a dictionary of two Series.

pd.DataFrame({'age': age, 'height': height})  # create dataframe

       age  height
Tom     32    1.75
Gary    26    1.83
Lois    22    1.69
Wendy   31    1.67
Betty   35    1.62

Note that the column labels are simply the keys of the dictionary.
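The Series in the dictionary do not have to share an index: pandas aligns them on the union of their labels and fills the gaps with NaN. A small sketch with deliberately mismatched, made-up labels:

```python
import pandas as pd

# Hypothetical Series whose indexes only partly overlap
age = pd.Series({'Tom': 32, 'Gary': 26, 'Lois': 22})
height = pd.Series({'Tom': 1.75, 'Gary': 1.83, 'Wendy': 1.67})

df = pd.DataFrame({'age': age, 'height': height})
print(df)
# 'Lois' has no height and 'Wendy' has no age, so those cells are NaN;
# the NaN also turns the integer 'age' column into float64.
```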

DataFrame from a Dictionary of Python Lists

It is also possible to create a pandas DataFrame from a dictionary of Python lists.

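We first create the sequences that store the names, ages, and heights. For convenience, we use random.choices() (imported as rd) and np.random.uniform() to generate random ages and heights; note that c is therefore a NumPy array rather than a Python list.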

Example

Creating a pandas DataFrame from a dictionary of lists.

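a = ['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']  # names
b = rd.choices(range(30, 45), k=5)             # 5 random ages from 30 to 44
c = np.random.uniform(1.6, 1.9, 5).round(2)    # 5 random heights, as a NumPy array

# printing (the random values will differ on each run)
for x in (a, b, c):
    print(x)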
['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']
[32, 41, 39, 36, 40]
[1.87 1.81 1.67 1.78 1.74]

We now create a dictionary of the above lists and generate a DataFrame using pd.DataFrame(), while specifying list a to be the index of the DataFrame.

df = pd.DataFrame({'Age': b,
                   'Height': c}, index=a)
df

       Age  Height
Tom     32    1.87
Gary    41    1.81
Lois    39    1.67
Wendy   36    1.78
Betty   40    1.74
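All lists in the dictionary, and the index if one is given, must have the same length; otherwise the constructor raises a ValueError. A quick sketch with hypothetical values; raises_value_error is a small helper introduced only for this illustration.

```python
import pandas as pd

def raises_value_error(**kwargs):
    """Hypothetical helper: return True if pd.DataFrame(**kwargs) raises ValueError."""
    try:
        pd.DataFrame(**kwargs)
    except ValueError as err:
        print('ValueError:', err)
        return True
    return False

# 'Height' has one value too few
mismatched_columns = raises_value_error(
    data={'Age': [32, 41, 39], 'Height': [1.87, 1.81]})

# The columns have 2 values each, but the index has 3 labels
mismatched_index = raises_value_error(
    data={'Age': [32, 41], 'Height': [1.87, 1.81]},
    index=['Tom', 'Gary', 'Lois'])
```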

DataFrame from a List of Dictionaries

It is possible to convert a list of dictionaries into a DataFrame. In this case, each dictionary becomes one row of the DataFrame, and its keys become the column labels.

Example

Creating a pandas DataFrame from a list of dictionaries.

dict1 = {'Name': 'Tom',
         'Age': 30,
         'Height': 1.88}

dict2 = {'Name': 'Gary',
         'Age': 38,
         'Height': 1.68}

pd.DataFrame([dict1, dict2])

   Name  Age  Height
0   Tom   30    1.88
1  Gary   38    1.68
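The dictionaries do not all need the same keys. A minimal sketch with made-up records: the columns are the union of all keys, and a key missing from one dictionary yields NaN in that row.

```python
import pandas as pd

# Hypothetical records; the second one has no 'Height' key
rows = [{'Name': 'Tom', 'Age': 30, 'Height': 1.88},
        {'Name': 'Gary', 'Age': 38}]

df = pd.DataFrame(rows)
print(df)
```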

DataFrame from a Two-dimensional Array

Given a 2-D NumPy array of data, we can create a DataFrame with any specified column and index labels. If omitted, an integer RangeIndex will be used.

First, recall that calling the split() method with no arguments splits a string on whitespace and returns a list of the pieces; for a string of space-separated letters, that is a list of single characters.

Example

Creating a list of characters from a string.

'A B C D E F'.split()
['A', 'B', 'C', 'D', 'E', 'F']

We now create a DataFrame where the column and index labels result from using the split() method.

Example

Creating a DataFrame from a 2-D NumPy array.

pd.DataFrame(np.random.rand(6, 4),
             index='A B C D E F'.split(),
             columns='W X Y Z'.split())

          W         X         Y         Z
A  0.074901  0.696408  0.153493  0.098724
B  0.265094  0.659235  0.833043  0.985687
C  0.930414  0.512948  0.539358  0.541957
D  0.911736  0.975602  0.777425  0.922223
E  0.255050  0.830163  0.964033  0.693914
F  0.246925  0.060152  0.535843  0.622826

Note that np.random.rand(6, 4) creates a 2-D array of shape (6, 4) and populates it with random samples from a uniform distribution over $[0, 1)$.
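Because the values are random, the output above will differ on every run. For reproducible examples, one option (not used in the original) is NumPy's Generator API with a fixed seed; random_frame is a hypothetical helper and the seed 42 is arbitrary.

```python
import numpy as np
import pandas as pd

def random_frame(seed):
    """Hypothetical helper: a 6x4 DataFrame of uniform [0, 1) samples from a seeded Generator."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.random((6, 4)),
                        index='A B C D E F'.split(),
                        columns='W X Y Z'.split())

df1 = random_frame(42)
df2 = random_frame(42)
print(df1.equals(df2))  # True: same seed, same values
```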

DataFrame from a Nested List

Another way of creating a DataFrame is to provide the data as a nested list, along with labels for the columns and the index.

Example

Creating a pandas DataFrame from a nested list.

data = [['Tom', 32, 1.75],
        ['Gary', 26, 1.83],
        ['Lois', 22, 1.69],
        ['Wendy', 31, 1.67],
        ['Betty', 35, 1.62]]
pd.DataFrame(data, columns=['Name', 'Age', 'Height'])

    Name  Age  Height
0    Tom   32    1.75
1   Gary   26    1.83
2   Lois   22    1.69
3  Wendy   31    1.67
4  Betty   35    1.62
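The index can be supplied the same way, and if columns is omitted the columns get an integer RangeIndex. Each column also gets its own inferred dtype, which is what "potentially heterogeneous" means in practice. A short sketch using a subset of the data above, with made-up index labels:

```python
import pandas as pd

data = [['Tom', 32, 1.75],
        ['Gary', 26, 1.83]]

# Without column labels: the columns are labelled 0, 1, 2
df_default = pd.DataFrame(data)
print(list(df_default.columns))  # [0, 1, 2]

# With both column and index labels; each column keeps its own dtype
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height'], index=['a', 'b'])
print(df.dtypes)
```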

DataFrame from a NumPy Structured Array

A structured array is a stripped-down version of a pandas DataFrame, so it comes as no surprise that the latter can be created directly from the former.


We first create a structured array using the lists a, b and c defined earlier. The first step is to define the structured data type, which gives each field a name and a format.

Example

Creating a pandas DataFrame from a NumPy structured array.

dt = np.dtype({
    'names': ('Name', 'Age', 'Height'),
    'formats': ('U10', int, float)})
print(dt)
[('Name', '<U10'), ('Age', '<i4'), ('Height', '<f8')]

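The next step is to initialize the structured array. Since there are 5 entries, we create a 1-D array of 5 elements with np.empty(), using the dt data type defined above. Note that np.empty() does not initialize the memory, so the placeholder values printed below are not guaranteed; np.zeros(5, dt) would guarantee empty strings and zeros.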

data = np.empty(5, dt)
print(data)
[('', 0, 0.) ('', 0, 0.) ('', 0, 0.) ('', 0, 0.) ('', 0, 0.)]

The final step is to populate the empty structured array and convert it into a pandas DataFrame.

data['Name'] = a
data['Age'] = b
data['Height'] = c

pd.DataFrame(data)

    Name  Age  Height
0    Tom   32    1.87
1   Gary   41    1.81
2   Lois   39    1.67
3  Wendy   36    1.78
4  Betty   40    1.74
Tip

This method of constructing a pandas DataFrame is not recommended unless the structured array is already available in the first place.
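If the data starts out as separate sequences, one alternative (a sketch, not the method shown above) is to build the structured array in a single call from a list of tuples and then pass it to pd.DataFrame(). The lists below are fixed stand-ins for the random a, b and c used earlier.

```python
import numpy as np
import pandas as pd

# Fixed stand-ins for the lists a, b and c from earlier
a = ['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']
b = [32, 41, 39, 36, 40]
c = [1.87, 1.81, 1.67, 1.78, 1.74]

dt = np.dtype({'names': ['Name', 'Age', 'Height'],
               'formats': ['U10', int, float]})

# Each tuple becomes one record (row) of the structured array
data = np.array(list(zip(a, b, c)), dtype=dt)

# The field names become the column labels
df = pd.DataFrame(data)
print(df)
```

In practice, pd.DataFrame({'Name': a, 'Age': b, 'Height': c}) gives the same table without the intermediate array, which is why the structured-array route mainly makes sense when such an array already exists.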