How to Split One Column to Multiple Columns in Pandas

2022-01-02 Pandas

Table of Contents

In this article, we will talk about how to split one column in which contents are concatenated with delimiter like comma to multiple columns. Or use regex expression to extract contents for multiple columns.

split column

Let’s prepare data first.

Preparing data

import pandas as pd

df = pd.DataFrame({
    "name": ["Kevin", "Jack", "Mary"],
    "height-weight": ["180,75", "175,70", "165,60"],
    "score": ["88,90,76", "85,88,80", "90,90,90"],
    "Interests": ["Writing, Sports", "Sports, Traveling", "Violin, Reading"]
}, index=["K", "J", "M"])

df

split column

Split column using str.split() method of Series

pandas.Series.str.split split strings around given delimiter.

str.split() method has pat parameter which can be delimiter string (whitespace by default) or regular expression.

For example, we can split height and weight of every student using str.split() method.

df["height-weight"].str.split(",")

split column

You will find the split result (list) is a Series (one column contains list value), yet different from our expected result.

So let’s add another option in split() method to achieve this goal. This option is expand, if we specified expand=True, two columns will be generated.

df["height-weight"].str.split(",", expand=True)

split column

If we use two columns to accept this result, we can add height and weight column to df DataFrame.

df[["height", "weight"]] = df["height-weight"].str.split(",", expand=True)
df

split column

If you want to concatenate new columns by splitting and some columns of original DataFrame to a new DataFrame, you can use concat() method of Pandas. concat() accepts a list of DataFrame, axis=1 means concatenate DataFrames in horizontal direction.
Note: The new columns by splitting have no column names but 0,1.

df2 = pd.concat([df["name"], df["height-weight"].str.split(",", expand=True)], axis=1)
df2

split column

We can use rename() method of DataFrame to change column names as below.
You can read my another article about how to rename column and index.

Pandas » How to Rename Column and Index
https://thats-it-code.com/pandas/pandas__how-to-rename-column-and-index/

df2.rename(columns={0: 'height', 1: 'weight'}, inplace=True)
df2

split column

Extract columns from value of one column by str.extract()

pandas.Series.str.extract will extract contents as columns by regex. In regex pattern, capturing groups (enclosed in parentheses) must be specified.
For example, we can use str.extract() method to extract three scores from score column.

df["score"].str.extract("(\d+),(\d+),(\d+)", expand=True)

split column

Also we can specify column names to accept new columns.

df[["Math", "Science", "Sports"]] = df["score"].str.extract("(\d+),(\d+),(\d+)", expand=True)
df

split column

Conclusion

  • We can use str.split() to split one column to multiple columns by specifying expand=True option.
  • We can use str.extract() to exract multiple columns using regex expression in which multiple capturing groups are defined.

Subscribe and be the FIRST reader of our latest articles

* indicates required

Contact us