Pandas >> Data Combination(2): concat()

2022-05-05 Pandas

In this tutorial, we will explain how to combine (concatenate) multiple DataFrames, or DataFrames and Series, into one DataFrame using the concat() function.

Introduction

Unlike the merge() method, which joins data on key columns, the concat() function concatenates data along a particular axis (rows or columns).

pandas.concat
https://pandas.pydata.org/docs/reference/api/pandas.concat.html
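
To make the contrast concrete, here is a minimal sketch that applies both operations to the same two inputs. The small DataFrames `left` and `right` are made up for illustration and are not part of this tutorial's data.

```python
import pandas as pd

# Hypothetical inputs that share an "id" key column.
left = pd.DataFrame({"id": [1, 2], "name": ["John", "Sarah"]})
right = pd.DataFrame({"id": [2, 3], "height": [165, 180]})

# merge() matches rows by the values in the key column (inner join by default).
merged = pd.merge(left, right, on="id")
print(merged.shape)  # (1, 3)

# concat() stacks the inputs along an axis; here rows are appended and
# columns missing from one input are filled with NaN.
stacked = pd.concat([left, right])
print(stacked.shape)  # (4, 3)
```

Only the row with id 2 survives the merge, while concat() keeps all four rows without looking at the values in id.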

Data Preparation

Prepare the first DataFrame.

import pandas as pd

df1 = pd.DataFrame({
    "first_name": ["John", "Sarah", "Mike", "Tom", "Mary"],
    "last_name": ["Doe", "Smith", "Brown", "Davis", "Clark"],
    "gender": ["M", "F", "M", "M", "F"],
    "class": ["A", "A", "B", "A", "B"]
}, index=["1001", "1002", "1003", "1004", "1005"])
df1

The second DataFrame.

df2 = pd.DataFrame({
    "first_name": ["Mike", "Tom", "Mary", "Bob", "Kevin"],
    "last_name": ["Brown", "Davis", "Clark", "Lopez", "Wilson"],
    "gender": ["M", "M", "F", "M", "M"],
    "class": ["B", "A", "B", "B", "A"],
}, index=["1003", "1004", "1005", "1006", "1007"])
df2

The third DataFrame.

df3 = pd.DataFrame({
    "height": [172, 175, 180, 178, 182],
    "weight": [60, 65, 70, 68, 75],
}, index=["1003", "1004", "1005", "1006", "1007"])
df3

Concatenating data

Combine/Concatenate DataFrames in the vertical direction

We often need to stack several DataFrames that share the same columns into one. For example, suppose an online shop stores its order data once a month, so each month is read as a separate DataFrame. To analyze two months together, we first concatenate the two DataFrames into one. The concat() function is well suited to this task.
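
The monthly scenario could look roughly like the sketch below. The order data, the column names, and the added month column are all hypothetical; real data would normally be read from files.

```python
import pandas as pd

# Hypothetical order data for two months; names and values are made up.
orders_jan = pd.DataFrame({"order_id": [1, 2], "amount": [120.0, 80.5]})
orders_feb = pd.DataFrame({"order_id": [3, 4, 5], "amount": [42.0, 15.0, 99.9]})

# Tag each frame with its month so the origin of every row survives.
orders_jan["month"] = "2022-01"
orders_feb["month"] = "2022-02"

# Stack the two months row-wise; the columns match, so no NaN appears.
orders = pd.concat([orders_jan, orders_feb])
print(len(orders))  # 5
print(orders.groupby("month")["amount"].sum())
```

Adding the month column before concatenating is one possible design choice; it keeps the combined DataFrame easy to group or filter by month afterwards.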

Example Code
To concatenate df1 and df2 created above, pass them as a list. By default, axis is set to 0 (the vertical, row, or index direction).

pd.concat([df1, df2])
# OR
pd.concat([df1, df2], axis=0)

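The resulting index contains duplicate values (1003, 1004, 1005) because those labels appear in both df1 and df2. The rows themselves are also repeated, since concat() does not remove duplicate rows.
We can set ignore_index=True to replace the index with a new 0-based one.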

pd.concat([df1, df2], axis=0, ignore_index=True)
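
As a quick check of the behavior described above, the following sketch uses two small made-up DataFrames (`left` and `right`, not the tutorial's df1 and df2) to show how duplicate labels can be detected and what ignore_index=True does to them.

```python
import pandas as pd

# Hypothetical inputs; the label "1003" appears in both.
left = pd.DataFrame({"name": ["John", "Mike"]}, index=["1001", "1003"])
right = pd.DataFrame({"name": ["Mike", "Bob"]}, index=["1003", "1006"])

stacked = pd.concat([left, right])
# Index.is_unique is False because "1003" occurs twice.
print(stacked.index.is_unique)  # False

renumbered = pd.concat([left, right], ignore_index=True)
# The original labels are discarded and replaced by 0..n-1.
print(list(renumbered.index))  # [0, 1, 2, 3]
```

Note that ignore_index=True throws the original labels away, so it is a poor fit when the index carries meaning, such as an ID.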

Combine/Concatenate DataFrames in the horizontal direction

Sometimes we have multiple datasets that share the same key (the index) but hold different information (columns), and we need to concatenate them in the horizontal direction based on the index.

For example, DataFrame df1 has the columns first_name, last_name, gender, and class. Suppose we also have another DataFrame, df3, with the columns height and weight.

We can concatenate them into one DataFrame in the horizontal direction like this.

pd.concat([df1, df3], axis=1)
# OR
pd.concat([df1, df3], axis=1, join="outer")

The indexes of the two DataFrames df1 and df3 are combined (1001-1007).
The join parameter controls how the labels on the other axis (here, the index) are handled.
By default, join is set to "outer".
That means all index labels are kept, and cells for which a DataFrame has no row with that label are filled with NaN.
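
The NaN filling described above can be seen in a small sketch. The DataFrames `people` and `body` are hypothetical stand-ins with only one overlapping label.

```python
import pandas as pd

# Hypothetical inputs; only the label "1003" exists in both.
people = pd.DataFrame({"first_name": ["John", "Mike"]}, index=["1001", "1003"])
body = pd.DataFrame({"height": [172, 178]}, index=["1003", "1006"])

# join="outer" is the default, so every label from either input is kept.
combined = pd.concat([people, body], axis=1)
print(combined)

# "1001" has no height and "1006" has no first_name, so both cells are NaN.
print(combined.isna().sum())
```

Because NaN is a floating-point value, the height column is no longer an integer column in the result.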

If we only want to keep the rows whose index labels exist in both DataFrames, we can set the join parameter to "inner".

pd.concat([df1, df3], axis=1, join="inner")
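
One way to reason about join="inner" is that the result keeps the intersection of the input indexes. The sketch below, using hypothetical DataFrames `people` and `body`, compares the two.

```python
import pandas as pd

# Hypothetical inputs; only the label "1003" exists in both.
people = pd.DataFrame({"first_name": ["John", "Mike"]}, index=["1001", "1003"])
body = pd.DataFrame({"height": [172, 178]}, index=["1003", "1006"])

inner = pd.concat([people, body], axis=1, join="inner")
shared = people.index.intersection(body.index)

print(list(inner.index))  # ['1003']
print(list(shared))       # ['1003']
```

Since every remaining label exists in both inputs, the inner result contains no NaN introduced by the concatenation.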

Concatenate multiple (>2) DataFrames

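The concat() function accepts a list of any number of DataFrames, so we can concatenate more than two at once. Because df1 and df2 have the same column names, the result of the example here contains those column names twice.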

pd.concat([df1, df2, df3], axis=1, join="outer")
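
When the inputs share column names, a horizontal concat keeps every copy. The sketch below uses three tiny hypothetical DataFrames (`a`, `b`, `c`) to show the repeated names, and one possible way to tell them apart with the keys parameter of concat(), which adds an outer column level.

```python
import pandas as pd

# Hypothetical inputs; a and b both have a "class" column.
a = pd.DataFrame({"class": ["A", "B"]}, index=["1001", "1003"])
b = pd.DataFrame({"class": ["B", "B"]}, index=["1003", "1006"])
c = pd.DataFrame({"height": [172, 178]}, index=["1003", "1006"])

wide = pd.concat([a, b, c], axis=1)
print(list(wide.columns))  # ['class', 'class', 'height']

# keys labels each input, producing two-level (MultiIndex) columns.
labeled = pd.concat([a, b, c], axis=1, keys=["a", "b", "c"])
print(list(labeled.columns))
# [('a', 'class'), ('b', 'class'), ('c', 'height')]
```

Whether the extra column level is worth it depends on the use case; renaming the columns before concatenating is another option.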

Concatenate DataFrame and Series

We can also concatenate a DataFrame and a Series into one DataFrame.
If we have a list of additional values in the same order as the rows of the DataFrame, we can convert the list to a Series and concatenate it with the DataFrame.

For example, suppose we have a list of interests whose order matches the rows of df1. To add it as a new column on the right side of the DataFrame, we can do the following.

Convert the list to a Series object.

# value list
list_for_series = ["Cooking", "Travel", "Gaming", "Art", "History"]
# new column name
col_name_for_series = "interests"
# the index must be the same as df1's index so the values align row by row
s = pd.Series(list_for_series, name=col_name_for_series, index=df1.index)
s

Concatenate the DataFrame df1 and the Series s in the horizontal direction.

pd.concat([df1, s], axis=1)
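
The reason the Series needs the same index as the DataFrame is that concat() aligns by index label, not by position. The sketch below uses a small hypothetical DataFrame `df` and list `hobbies` to show what happens with and without a matching index.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"first_name": ["John", "Sarah"]}, index=["1001", "1002"])
hobbies = ["Cooking", "Travel"]

# Without index=..., the Series gets the default labels 0 and 1,
# which share nothing with df's labels.
unaligned = pd.Series(hobbies, name="interests")
bad = pd.concat([df, unaligned], axis=1)
print(bad.shape)  # (4, 2)

# With a matching index, each value lands on the intended row.
aligned = pd.Series(hobbies, name="interests", index=df.index)
good = pd.concat([df, aligned], axis=1)
print(good.shape)  # (2, 2)
```

In the unaligned case the result has four rows, each with a NaN in one of the two columns, instead of two complete rows.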

If you want to learn more about adding columns, you can refer to the article below.

Pandas » How to Add Columns to an Existing DataFrame
https://thats-it-code.com/pandas/pandas__how-to-add-columns-to-an-existing-dataframe/

Conclusion

The concat() function can stack multiple DataFrames with the same columns, such as time series data, in the vertical direction.
It can also combine multiple DataFrames, or DataFrames and Series, from different sources in the horizontal direction by aligning them on their indexes.
