The pandas DataFrame Object

Introduction

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure. As mentioned, a DataFrame is analogous to an Excel spreadsheet with its rows and columns, while a Series is analogous to a single column of data.

Just as a Series is an analog of a 1-D NumPy array with flexible indices, a DataFrame is an analog of a 2-D NumPy array with flexible row indices and column labels.

It can also be thought of as a dictionary-like container for Series objects. We can think of a DataFrame as a sequence of aligned Series objects, or in other words, Series that share the same index.
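The dictionary-like view described above can be checked with a minimal sketch. The names `people` and `age_column` and the two-row sample data are assumptions made for illustration only.

```python
import pandas as pd

# Two Series that share the same index (hypothetical sample data)
age = pd.Series({'Tom': 32, 'Gary': 26})
height = pd.Series({'Tom': 1.75, 'Gary': 1.83})

people = pd.DataFrame({'age': age, 'height': height})

# As with a dictionary, a column label maps to a value: here, a Series
age_column = people['age']
print(type(age_column).__name__)               # Series
print(age_column.index.equals(people.index))   # True: the columns share one index
print(list(people.keys()))                     # ['age', 'height']
```

Selecting a column returns a Series that carries the row index of the DataFrame, which is what "aligned Series objects" means in practice.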

Syntax

A pandas DataFrame can be created using the following constructor.

Syntax

The pandas.DataFrame constructor.

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Parameter	Required?	Default Value	Description
`data`	❌ No	`None`	`ndarray` (structured or homogeneous), Iterable, dict, or DataFrame. If omitted, an empty DataFrame is created.
`index`	❌ No	`RangeIndex`	Index used to label rows of resulting frame.
`columns`	❌ No	`RangeIndex`	Column labels used for resulting frame.
`dtype`	❌ No	Inferred from `data`	Data type to force. Only a single `dtype` is allowed. If `None`, it is inferred from `data`.
`copy`	❌ No	`None`	`bool`. Copy data from inputs. For dict data, the default of `None` behaves like `copy=True`. For DataFrame or 2-D ndarray input, the default of `None` behaves like `copy=False`. If data is a dict containing one or more Series (possibly of different dtypes), `copy=False` will ensure that these inputs are not copied.
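As a rough illustration of the `index`, `columns`, and `dtype` parameters listed above, the following sketch builds the same data twice, once with default labels and once with explicit ones. The array `values` and the labels `r1`, `r2`, `a`, and `b` are hypothetical.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4]])  # hypothetical sample data

# No labels given: rows and columns both receive a RangeIndex
default_labels = pd.DataFrame(values)

# Explicit row labels, column labels, and a single forced dtype
labelled = pd.DataFrame(values,
                        index=['r1', 'r2'],
                        columns=['a', 'b'],
                        dtype=float)

print(default_labels.index)  # RangeIndex(start=0, stop=2, step=1)
print(labelled.dtypes)       # both columns are float64
```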

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

DataFrame from a Series Object

A DataFrame is a collection of Series objects, and a single-column DataFrame can be constructed from a single Series.

We first import the required libraries and modules.

import numpy as np
import pandas as pd
import random as rd

Example

Step 1: Creating a pandas Series from a dictionary.

age_dict = {'Tom': 32,
            'Gary': 26,
            'Lois': 22,
            'Wendy': 31,
            'Betty': 35}  # create dictionary

age = pd.Series(age_dict)  # create Series
age

Tom      32
Gary     26
Lois     22
Wendy    31
Betty    35
dtype: int64

We now create a single-column DataFrame from the above Series object.

Example

Step 2: Creating a pandas DataFrame from a pandas Series.

pd.DataFrame(age, columns=['Age'])  # create DataFrame

	Age
Tom	32
Gary	26
Lois	22
Wendy	31
Betty	35
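An alternative worth knowing is the `Series.to_frame()` method, which performs the same conversion and names the column in one call. The sketch below rebuilds a shortened version of the `age` Series so that it runs on its own; the name `age_df` is an assumption for illustration.

```python
import pandas as pd

# Shortened copy of the age data, rebuilt so the example is self-contained
age = pd.Series({'Tom': 32, 'Gary': 26, 'Lois': 22})

# to_frame() wraps the Series in a one-column DataFrame;
# `name` sets the label of that column
age_df = age.to_frame(name='Age')

print(age_df.shape)    # (3, 1)
print(age_df.columns)  # Index(['Age'], dtype='object') on pandas 2.x; the dtype shown may differ by version
```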

DataFrame from a Dictionary of Series Objects

It is also possible to create a DataFrame from a dictionary of Series objects.

We create another Series object that shares the same index as the age Series object (created above).

Example

Creating another pandas Series from a dictionary.

height_dict = {'Tom': 1.75,
               'Gary': 1.83,
               'Lois': 1.69,
               'Wendy': 1.67,
               'Betty': 1.62}  # create dictionary

height = pd.Series(height_dict)  # create Series
height

Tom      1.75
Gary     1.83
Lois     1.69
Wendy    1.67
Betty    1.62
dtype: float64

We then create a DataFrame from a dictionary of the two Series objects.

Example

Creating a pandas DataFrame from a dictionary of two Series.

pd.DataFrame({'age': age, 'height': height})  # create DataFrame

	age	height
Tom	32	1.75
Gary	26	1.83
Lois	22	1.69
Wendy	31	1.67
Betty	35	1.62

Note that the column labels are just the ‘keys’ of the dictionary.
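The two Series above happen to share exactly the same index. When they do not, pandas aligns them on the union of their labels and fills the gaps with NaN. The following sketch uses deliberately mismatched, hypothetical data to show this.

```python
import pandas as pd

age = pd.Series({'Tom': 32, 'Gary': 26})
height = pd.Series({'Gary': 1.83, 'Lois': 1.69})  # no 'Tom', extra 'Lois'

# Rows are matched by label, not by position
people = pd.DataFrame({'age': age, 'height': height})
print(people)

# 'Tom' has no height and 'Lois' has no age, so both cells are NaN;
# the age column becomes float64 because NaN is a floating-point value
print(people['age'].dtype)  # float64
```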

DataFrame from a Dictionary of Python Lists

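It is also possible to create a pandas DataFrame from a dictionary of Python lists.

We first create the sequences that store the names, ages, and heights. For convenience, we use the random and numpy.random modules to generate the ages and heights, so the values differ from run to run. Note that c is a NumPy array rather than a list; the DataFrame constructor accepts both.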

Example

Creating a pandas DataFrame from a dictionary of lists.

a = ['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']  # names
b = rd.choices(range(30, 45), k=5)             # five random ages from 30 to 44
c = np.random.uniform(1.6, 1.9, 5).round(2)    # five random heights (NumPy array)

# printing
for x in (a, b, c):
    print(x)

['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']
[32, 41, 39, 36, 40]
[1.87 1.81 1.67 1.78 1.74]

We now create a dictionary of the above lists and generate a DataFrame using pd.DataFrame(), while specifying list a to be the index of the DataFrame.

df = pd.DataFrame({'Age': b,
                   'Height': c}, index=a)
df

	Age	Height
Tom	32	1.87
Gary	41	1.81
Lois	39	1.67
Wendy	36	1.78
Betty	40	1.74
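Because the data above is random, rerunning the code gives different numbers. If reproducible output is wanted, one option is to seed both random sources first. This is a sketch under that assumption; the seed value 0 and the descriptive variable names are arbitrary choices, not part of the original example.

```python
import random as rd

import numpy as np
import pandas as pd

names = ['Tom', 'Gary', 'Lois', 'Wendy', 'Betty']

rd.seed(0)                      # fix the state of Python's random module
rng = np.random.default_rng(0)  # seeded NumPy random generator

ages = rd.choices(range(30, 45), k=5)        # list of 5 integers from 30 to 44
heights = rng.uniform(1.6, 1.9, 5).round(2)  # NumPy array of 5 floats

# Every value in the dictionary must have the same length as the index
df = pd.DataFrame({'Age': ages, 'Height': heights}, index=names)
print(df)
```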

DataFrame from a List of Dictionaries

It is possible to convert a list of dictionaries into a DataFrame. In this case, each dictionary becomes one row of the DataFrame, and the dictionary keys become the column labels.

Example

Creating a pandas DataFrame from a list of dictionaries.

dict1 = {'Name': 'Tom',
         'Age': 30,
         'Height': 1.88}

dict2 = {'Name': 'Gary',
         'Age': 38,
         'Height': 1.68}

pd.DataFrame([dict1, dict2])

	Name	Age	Height
0	Tom	30	1.88
1	Gary	38	1.68
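The dictionaries in the list need not have identical keys. As a small sketch with hypothetical records, a key that is missing from one dictionary shows up as NaN in that row.

```python
import pandas as pd

# The first record has no 'Height'; the second has no 'Age'
records = [{'Name': 'Tom', 'Age': 30},
           {'Name': 'Gary', 'Height': 1.68}]

df = pd.DataFrame(records)
print(df)

# The columns are the union of all keys; absent values are NaN
print(df.isna().sum())
```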

DataFrame from a Two-dimensional Array

Given a 2-D NumPy array of data, we can create a DataFrame with any specified column and index labels. If either is omitted, an integer RangeIndex is used in its place.

First, let’s recall that the split() method, called on a string without arguments, splits the string at whitespace and returns a list of substrings. Here each substring is a single character.

Example

Creating a list of substrings from a string.

'A B C D E F'.split()

['A', 'B', 'C', 'D', 'E', 'F']

We now create a DataFrame where the column and index labels result from using the split() method.

Example

Creating a DataFrame from a 2-D NumPy array.

pd.DataFrame(np.random.rand(6, 4),
             index='A B C D E F'.split(),
             columns='W X Y Z'.split())

	W	X	Y	Z
A	0.074901	0.696408	0.153493	0.098724
B	0.265094	0.659235	0.833043	0.985687
C	0.930414	0.512948	0.539358	0.541957
D	0.911736	0.975602	0.777425	0.922223
E	0.255050	0.830163	0.964033	0.693914
F	0.246925	0.060152	0.535843	0.622826

Note that np.random.rand(6,4) creates a 2-D array of the shape (6,4) and populates it with random samples from a uniform distribution over $[0, 1)$.
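A variant of the same construction uses NumPy's newer Generator interface, which makes the random values repeatable when seeded. This is only a sketch; the seed 42 and the use of `list('ABCDEF')` in place of `split()` are choices made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded generator (the seed is arbitrary)

# rng.random((6, 4)) also samples uniformly from [0, 1)
df = pd.DataFrame(rng.random((6, 4)),
                  index=list('ABCDEF'),  # same labels as 'A B C D E F'.split()
                  columns=list('WXYZ'))
print(df.shape)  # (6, 4)
```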

DataFrame from a Nested List

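Another way of creating a DataFrame is to provide the data as a nested list, in which each inner list is one row, along with labels for the columns and, optionally, the index. When no index is given, an integer RangeIndex is used.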

Example

Creating a pandas DataFrame from a nested list.

data = [['Tom', 32, 1.75],
        ['Gary', 26, 1.83],
        ['Lois', 22, 1.69],
        ['Wendy', 31, 1.67],
        ['Betty', 35, 1.62]]
pd.DataFrame(data, columns=['Name', 'Age', 'Height'])

	Name	Age	Height
0	Tom	32	1.75
1	Gary	26	1.83
2	Lois	22	1.69
3	Wendy	31	1.67
4	Betty	35	1.62
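To label the rows by name instead of by position, two approaches are sketched below on a shortened, hypothetical copy of the data: passing `index` to the constructor, or promoting the Name column with `set_index()`. The variable names `by_index` and `by_set_index` are illustrative.

```python
import pandas as pd

data = [['Tom', 32, 1.75],
        ['Gary', 26, 1.83]]

# Option 1: split off the names and pass them as row labels
by_index = pd.DataFrame([row[1:] for row in data],
                        index=[row[0] for row in data],
                        columns=['Age', 'Height'])

# Option 2: build the frame first, then move 'Name' into the index
by_set_index = pd.DataFrame(data, columns=['Name', 'Age', 'Height']).set_index('Name')

print(by_index.loc['Gary', 'Age'])         # 26
print(by_set_index.loc['Tom', 'Height'])   # 1.75
```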

DataFrame from a NumPy Structured Array

A NumPy structured array, with its named and typed fields, can be viewed as a stripped-down version of a pandas DataFrame, so it comes as no surprise that the latter can be created directly from the former.

👀 Review

Structured Array

We first create a structured array using the sequences a, b and c defined earlier. The first step is to define the compound data type, giving each field a name and a format.

Example

Creating a pandas DataFrame from a NumPy structured array.

dt = np.dtype({
    'names': ('Name', 'Age', 'Height'),
    'formats': ('U10', int, float)})
print(dt)

[('Name', '<U10'), ('Age', '<i4'), ('Height', '<f8')]

The next step is to allocate the structured array. Since there are 5 entries, we create a 1-D array with 5 elements using np.empty() and the dt data type defined earlier. Note that np.empty() does not initialize its contents, so the values printed below are arbitrary and may differ between runs.

data = np.empty(5, dt)
print(data)

[('', 0, 0.) ('', 0, 0.) ('', 0, 0.) ('', 0, 0.) ('', 0, 0.)]

The final step is to populate the empty structured array and convert it into a pandas DataFrame.

data['Name'] = a
data['Age'] = b
data['Height'] = c

pd.DataFrame(data)

	Name	Age	Height
0	Tom	32	1.87
1	Gary	41	1.81
2	Lois	39	1.67
3	Wendy	36	1.78
4	Betty	40	1.74
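The allocate-then-fill steps above can also be collapsed into one call by passing a list of tuples to `np.array()` together with the compound data type. The sketch below assumes two hypothetical records and spells the field formats as `'i8'` and `'f8'` so that the integer width does not depend on the platform.

```python
import numpy as np
import pandas as pd

# Compound dtype written as a list of (name, format) pairs
dt = np.dtype([('Name', 'U10'), ('Age', 'i8'), ('Height', 'f8')])

# Each tuple is one record; NumPy fills the fields in order
records = [('Tom', 32, 1.87), ('Gary', 41, 1.81)]
data = np.array(records, dtype=dt)

# Each field of the structured array becomes a DataFrame column
df = pd.DataFrame(data)
print(df.dtypes)
```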

Tip

This method of constructing a pandas DataFrame is not recommended unless the structured array is already available in the first place.
