How to Select Rows Containing Specified String

2022-01-01 Pandas

In this article, we will talk about how to extract rows containing specified string or satisfying some conditions.

[Pandas] How to filter data

Firstly, we prepare data.

Preparing data

import pandas as pd

df = pd.DataFrame({
    "name": ["Kevin", "Jack", "Mary", "Bob", "Robert", "Amy"],
    "score": [80, 90, 95, 93, 88, 81],
    "class": ["A", "B", "A", "A", "B", "B"],
    "Interests": ["Writing, Sports", "Sports, Traveling", "Violin, Reading", "Reading, Blogging","Learning, Reading, Writing","Volunteering, Piano"]
}, index=["K", "J", "M", "B", "R", "A"])

df

[Pandas] How to filter data

How to select rows from DataFrame

We can select rows from DataFrame by one condition (==,!=, >, <) or multiple conditions combination (&, |). You can read the article below to understand how to select rows from DataFrame in detail.

Pandas » How to Filter Data
https://thats-it-code.com/pandas/pandas__how-to-filter-data/

Select all students of class ‘A’

df[df["class"] == 'A']

[Pandas] How to filter data

Select all students with score greater than 80

df[df["score"] > 80]

[Pandas] How to filter data

Select all students of class ‘A’ with score greater than 80.

df[(df["class"] == 'A') & (df["score"] > 80)]

[Pandas] How to filter data

Select rows with specified column has exactly the same value (==)

We can use == to select data with exactly same value.
For example, we want to select persons whose interests are Reading, Blogging.

df[df["Interests"] == "Reading, Blogging"]

[Pandas] How to filter data

Select rows with those columns contain specified string (str.contains())

We can use str.contains() method in filter condition to select rows with those columns contain some string.
For Example, we want to select students who are interested in Reading.

df[df["Interests"].str.contains("Reading")]

[Pandas] How to filter data

When there are NaN, ValueError will occur when selecting.

Firstly, we add one row whose Interests is NaN.

import numpy as np
new_row = pd.DataFrame({"name": "Tom", "score": 85, "class": 'C', "Interests": np.nan}, index=["T"])
df2 = pd.concat([df, new_row], axis=0)
df2

[Pandas] How to filter data

If we execute code above, we will get ValueError.

df2[df2["Interests"].str.contains("Reading")]

[Pandas] How to filter data

So to avoid above error, we can firstly fill NaN with empty string and apply selecting operation.
You can read the article below to find how to fill NaN value in Pandas.

How to Process Null Values in Pandas
https://thats-it-code.com/pandas/how-to-process-null-values/

df2["Interests"] = df2["Interests"].fillna("")
df2[df2["Interests"].str.contains("Reading")]

[Pandas] How to filter data

Use case option in contains to specify case sensitive or case insensitive

For example, we can select above rows by using lower case reading.

df[df["Interests"].str.contains("reading", case=False)]

[Pandas] How to filter data

Use regex expression in contains

We can also use regex expression in contains() method.
For example, we can use .*V.* to select students who are interested in Voilin and Volunteering.

df[df["Interests"].str.contains(".*V.*")]

[Pandas] How to filter data

By default regex mode is open. If we want to turn it off, we can specify regex=False.

df[df["Interests"].str.contains(".*V.*", regex=False)]

Use str.startswith() to find rows whose column starts with a pattern.

For example, we can find students whose interests start with Learning as below.

df[df["Interests"].str.startswith("Learning")]

[Pandas] How to filter data

Use str.endswith() to find rows whose column ends with a pattern.

For example, we can find students whose interests end with ing as below.

df[df["Interests"].str.endswith("ing")]

[Pandas] How to filter data

Use str.match() to find rows whose column values match regex pattern.

For example, we can find students who have three interests as below.
[^,] means character except comma, + means at least occur one time.

df[df["Interests"].str.match("[^,]+,[^,]+,[^,]+")]

[Pandas] How to filter data

Conclusion

We can use == to select rows whose column value exactly equals specified value.
We can use str.contains() method to select rows whose column values contain some value or regex pattern.
We can also use str.startswith(), str.endswith(), str.match() method to select rows by partially matched some patterns.

That's it ! Code Snippets

How to Select Rows Containing Specified String

Table of Contents

Preparing data

How to select rows from DataFrame

Select rows with specified column has exactly the same value (==)

Select rows with those columns contain specified string (str.contains())

When there are NaN, ValueError will occur when selecting.

Use case option in contains to specify case sensitive or case insensitive

Use regex expression in contains

Use str.startswith() to find rows whose column starts with a pattern.

Use str.endswith() to find rows whose column ends with a pattern.

Use str.match() to find rows whose column values match regex pattern.

Conclusion

Contact us

How to Select Rows Containing Specified String

Table of Contents

Preparing data

How to select rows from DataFrame

Select rows with specified column has exactly the same value (==)

Select rows with those columns contain specified string (str.contains())

When there are NaN, ValueError will occur when selecting.

Use case option in contains to specify case sensitive or case insensitive

Use regex expression in contains

Use str.startswith() to find rows whose column starts with a pattern.

Use str.endswith() to find rows whose column ends with a pattern.

Use str.match() to find rows whose column values match regex pattern.

Conclusion

Subscribe and be the FIRST reader of our latest articles

Contact us