Pandas >> Data Combination(2): concat()
In this tutorial, we will explain how to combine (concatenate) multiple DataFrames, or DataFrames and Series, into one DataFrame using the concat() function.
Introduction
Unlike merge(), which joins data on key columns, concat() concatenates data along a particular axis (rows or columns).
pandas.concat
https://pandas.pydata.org/docs/reference/api/pandas.concat.html
Data Preparation
Prepare the first DataFrame.
import pandas as pd
df1 = pd.DataFrame({
    "first_name": ["John", "Sarah", "Mike", "Tom", "Mary"],
    "last_name": ["Doe", "Smith", "Brown", "Davis", "Clark"],
    "gender": ["M", "F", "M", "M", "F"],
    "class": ["A", "A", "B", "A", "B"],
}, index=["1001", "1002", "1003", "1004", "1005"])
df1
Prepare the second DataFrame.
df2 = pd.DataFrame({
    "first_name": ["Mike", "Tom", "Mary", "Bob", "Kevin"],
    "last_name": ["Brown", "Davis", "Clark", "Lopez", "Wilson"],
    "gender": ["M", "M", "F", "M", "M"],
    "class": ["B", "A", "B", "B", "A"],
}, index=["1003", "1004", "1005", "1006", "1007"])
df2
Prepare the third DataFrame.
df3 = pd.DataFrame({
    "height": [172, 175, 180, 178, 182],
    "weight": [60, 65, 70, 68, 75],
}, index=["1003", "1004", "1005", "1006", "1007"])
df3
Concatenating data
Concatenate DataFrames in the vertical direction
Sometimes we need to concatenate multiple DataFrames that have the same columns into one DataFrame. For example, suppose the order data of an online shop is stored once a month, so each month is read as a separate DataFrame. To work with two months of data at once, we need to stack the two DataFrames into one. The concat() function is a good fit for this task.
Example Code
To concatenate the df1 and df2 created above, we can do it like this. By default, axis is set to 0 (the vertical/row/index direction).
pd.concat([df1, df2])
# OR
pd.concat([df1, df2], axis=0)
The result contains duplicate index labels (1003, 1004, 1005), because concat() keeps the original index of each DataFrame.
We can set ignore_index=True to reset it to a 0-based index.
pd.concat([df1, df2], axis=0, ignore_index=True)
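To make the effect of duplicate index labels concrete, here is a minimal sketch. It uses trimmed-down versions of df1 and df2 (only two columns, to keep the example short), and the variable names stacked, duplicated_labels, rows_for_1003, and renumbered are made up for illustration.

```python
import pandas as pd

# Trimmed-down versions of df1 and df2 (fewer columns, same index labels).
df1 = pd.DataFrame({
    "first_name": ["John", "Sarah", "Mike", "Tom", "Mary"],
    "class": ["A", "A", "B", "A", "B"],
}, index=["1001", "1002", "1003", "1004", "1005"])

df2 = pd.DataFrame({
    "first_name": ["Mike", "Tom", "Mary", "Bob", "Kevin"],
    "class": ["B", "A", "B", "B", "A"],
}, index=["1003", "1004", "1005", "1006", "1007"])

# Default behavior: each input keeps its own index labels.
stacked = pd.concat([df1, df2])

# Labels that occur more than once in the stacked index.
duplicated_labels = stacked.index[stacked.index.duplicated()].tolist()

# Selecting a duplicated label returns every matching row, not just one.
rows_for_1003 = stacked.loc[["1003"]]

# ignore_index=True discards the original labels and renumbers from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(duplicated_labels)          # ['1003', '1004', '1005']
print(len(rows_for_1003))         # 2
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note that ignore_index=True only renumbers the rows. The overlapping rows themselves (Mike, Tom, Mary) still appear twice in the result.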
Concatenate DataFrames in the horizontal direction
Sometimes we have multiple datasets that share the same key (the index) but hold different information (columns), and we need to concatenate them in the horizontal direction based on the index.
For example, the DataFrame df1 has the columns first_name, last_name, gender, and class. Suppose we also have another DataFrame, df3, which has the columns height and weight.
We can concatenate them into one in the horizontal direction like this.
pd.concat([df1, df3], axis=1)
# OR
pd.concat([df1, df3], axis=1, join="outer")
The indexes of df1 and df3 are combined into their union (1001-1007).
The join parameter controls how the indexes on the other axis are handled.
By default, join is set to "outer".
That means all index labels are kept, and cells for which a DataFrame has no matching label are filled with NaN.
If we only want to keep the rows whose index labels exist in every DataFrame, we can set join to "inner".
pd.concat([df1, df3], axis=1, join="inner")
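The difference between the two join modes can be checked by comparing the shapes of the results. The following is a minimal sketch that uses a one-column version of df1 to stay short. The names outer and inner are made up for illustration.

```python
import pandas as pd

# One-column version of df1, plus df3 as defined earlier.
df1 = pd.DataFrame(
    {"first_name": ["John", "Sarah", "Mike", "Tom", "Mary"]},
    index=["1001", "1002", "1003", "1004", "1005"],
)
df3 = pd.DataFrame(
    {"height": [172, 175, 180, 178, 182], "weight": [60, 65, 70, 68, 75]},
    index=["1003", "1004", "1005", "1006", "1007"],
)

# join="outer": union of the index labels, missing cells become NaN.
outer = pd.concat([df1, df3], axis=1, join="outer")

# join="inner": only labels present in both DataFrames survive.
inner = pd.concat([df1, df3], axis=1, join="inner")

print(outer.shape)                   # (7, 3)
print(inner.shape)                   # (3, 3)
print(inner.index.tolist())          # ['1003', '1004', '1005']
print(outer["height"].isna().sum())  # 2 (rows 1001 and 1002 have no height)
print(outer["height"].dtype)         # float64
```

A side effect of the outer join is visible in the last line: the integer height column is converted to float64, because NaN cannot be stored in a plain integer column.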
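Concatenate multiple (>2) DataFrames
We can concatenate more than two DataFrames at once by passing them all in one list to concat(). Note that df1 and df2 share the same column names, so the result of the horizontal concatenation below contains duplicate column names.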
pd.concat([df1, df2, df3], axis=1, join="outer")
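When several of the inputs share column names, a horizontal concatenation produces repeated column labels. One optional way to tell them apart is the keys parameter of concat(), which adds an outer level to the column labels. The sketch below uses small stand-in DataFrames rather than the full df1, df2, and df3, and the key labels "df1", "df2", and "df3" are arbitrary names chosen for illustration.

```python
import pandas as pd

# Small stand-ins for df1, df2, and df3 (fewer rows and columns).
df1 = pd.DataFrame({"first_name": ["John", "Sarah", "Mike"],
                    "class": ["A", "A", "B"]},
                   index=["1001", "1002", "1003"])
df2 = pd.DataFrame({"first_name": ["Mike", "Bob"],
                    "class": ["B", "B"]},
                   index=["1003", "1006"])
df3 = pd.DataFrame({"height": [172, 178]},
                   index=["1003", "1006"])

# Plain horizontal concat: column names from df1 and df2 repeat.
wide = pd.concat([df1, df2, df3], axis=1)
print(wide.columns.tolist())
# ['first_name', 'class', 'first_name', 'class', 'height']

# keys= labels each input, giving two-level column names.
labeled = pd.concat([df1, df2, df3], axis=1, keys=["df1", "df2", "df3"])
print(labeled.columns.tolist())
# [('df1', 'first_name'), ('df1', 'class'), ('df2', 'first_name'),
#  ('df2', 'class'), ('df3', 'height')]

# A specific source column can now be addressed unambiguously.
print(labeled.loc["1006", ("df2", "first_name")])  # Bob
```

Without keys, selecting wide["first_name"] returns a two-column DataFrame rather than a single Series, which is easy to overlook.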
Concatenate DataFrame and Series
We can also concatenate a DataFrame object and a Series object into one DataFrame.
If we have a list of additional information in the same order as the rows of the DataFrame, we can convert the list to a Series and concatenate it with the DataFrame.
For example, suppose we have a list of interests whose order matches the rows of the DataFrame df1. If we want to add it as a new column on the right side of the DataFrame, we can do it like this.
Convert the list to a Series object.
# value list
list_for_series = ["Cooking", "Travel", "Gaming", "Art", "History"]
# new column name
col_name_for_series = "interests"
# the index parameter needs to be set to the same index as df1
s = pd.Series(list_for_series, name=col_name_for_series, index=df1.index)
s
Concatenate the DataFrame df1 and the Series s in the horizontal direction.
pd.concat([df1, s], axis=1)
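The reason the Series needs the same index as df1 is that concat() aligns on index labels, not on position. The following minimal sketch compares a Series built with index=df1.index against one left with the default 0-based index. It uses a shortened df1, and the names aligned, unaligned, good, and bad are made up for illustration.

```python
import pandas as pd

# Shortened df1 with string index labels.
df1 = pd.DataFrame({"first_name": ["John", "Sarah", "Mike"]},
                   index=["1001", "1002", "1003"])
interests = ["Cooking", "Travel", "Gaming"]

# Series that reuses df1's index: values line up row by row.
aligned = pd.Series(interests, name="interests", index=df1.index)
good = pd.concat([df1, aligned], axis=1)

# Series with the default index (0, 1, 2): no labels match df1.
unaligned = pd.Series(interests, name="interests")
bad = pd.concat([df1, unaligned], axis=1)

print(good.shape)                     # (3, 2)
print(bad.shape)                      # (6, 2)
print(bad["interests"].isna().sum())  # 3
```

In the unaligned case, the outer join keeps all six distinct labels, so every row is half empty instead of the values being paired up.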
If you want to learn more about adding columns, you can refer to the article below.
Pandas » How to Add Columns to an Existing DataFrame
https://thats-it-code.com/pandas/pandas__how-to-add-columns-to-an-existing-dataframe/
Conclusion
We can concatenate multiple DataFrames with the same columns, such as time series data, in the vertical direction.
We can also concatenate multiple DataFrames, or DataFrames and Series, from different sources in the horizontal direction, where they are aligned on their indexes.