How to Split One Column to Multiple Columns in Pandas
Table of Contents
In this article, we will talk about how to split one column in which contents are concatenated with delimiter like comma to multiple columns. Or use regex expression to extract contents for multiple columns.
Let’s prepare data first.
Preparing data
import pandas as pd
df = pd.DataFrame({
"name": ["Kevin", "Jack", "Mary"],
"height-weight": ["180,75", "175,70", "165,60"],
"score": ["88,90,76", "85,88,80", "90,90,90"],
"Interests": ["Writing, Sports", "Sports, Traveling", "Violin, Reading"]
}, index=["K", "J", "M"])
df
Split column using str.split() method of Series
pandas.Series.str.split split strings around given delimiter.
str.split() method has pat
parameter which can be delimiter string (whitespace by default) or regular expression.
For example, we can split height and weight of every student using str.split() method.
df["height-weight"].str.split(",")
You will find the split result (list) is a Series (one column contains list value), yet different from our expected result.
So let’s add another option in split() method to achieve this goal. This option is expand, if we specified expand=True, two columns will be generated.
df["height-weight"].str.split(",", expand=True)
If we use two columns to accept this result, we can add height and weight column to df DataFrame.
df[["height", "weight"]] = df["height-weight"].str.split(",", expand=True)
df
If you want to concatenate new columns by splitting and some columns of original DataFrame to a new DataFrame,
you can use concat() method of Pandas. concat() accepts a list of DataFrame, axis=1 means concatenate DataFrames in horizontal direction.
Note: The new columns by splitting have no column names but 0,1.
df2 = pd.concat([df["name"], df["height-weight"].str.split(",", expand=True)], axis=1)
df2
We can use rename() method of DataFrame to change column names as below.
You can read my another article about how to rename column and index.
Pandas » How to Rename Column and Index
https://thats-it-code.com/pandas/pandas__how-to-rename-column-and-index/
df2.rename(columns={0: 'height', 1: 'weight'}, inplace=True)
df2
Extract columns from value of one column by str.extract()
pandas.Series.str.extract will extract contents as columns by regex. In regex pattern, capturing groups (enclosed in parentheses) must be specified.
For example, we can use str.extract() method to extract three scores from score column.
df["score"].str.extract("(\d+),(\d+),(\d+)", expand=True)
Also we can specify column names to accept new columns.
df[["Math", "Science", "Sports"]] = df["score"].str.extract("(\d+),(\d+),(\d+)", expand=True)
df
Conclusion
- We can use str.split() to split one column to multiple columns by specifying expand=True option.
- We can use str.extract() to exract multiple columns using regex expression in which multiple capturing groups are defined.