Pandas >> How to Filter Data

2021-10-31 Pandas

Table of Contents

In this article, we will talk about how to filter data in DataFrame. In other words, how to select rows from a DataFrame based on column values?

[Pandas] How to filter data

Firstly, we prepare data for demonstration.

Preparing data

import pandas as pd

df = pd.DataFrame({
    "name": ["Kevin", "Jack", "Mary", "Bob", "Robert", "Amy"],
    "score": [80, 90, 95, 93, 88, 81],
    "class": ["A", "B", "A", "A", "B", "B"]
}, index=["K", "J", "M", "B", "R", "A"])

df

Result

name score class
K Kevin 80 A
J Jack 90 B
M Mary 95 A
B Bob 93 A
R Robert 88 B
A Amy 81 B

Filter by one condition

We can specify one condition in brackets after DataFrame.
The condition can be defined as df["column name"] == "value".

Filter data with class are ‘A’

df[df["class"] == 'A']

Result

name_1 score_1 class_1
K Kevin 80 A
M Mary 95 A
B Bob 93 A

We can also use loc

df.loc[df["class"] == 'A']

Result

name_1 score_1 class_1
K Kevin 80 A
M Mary 95 A
B Bob 93 A

Filter data with class are not ‘A’

df[df["class"] ! = 'A']

Result

name_1 score_1 class_1
J Jack 90 B
R Robert 88 B
A Amy 81 B

Filter data with score greater than 85

df[df["score"] > 85]

# OR use query API
df.query(" score > 85 ")

Result

name_1 score_1 class_1
J Jack 90 B
M Mary 95 A
B Bob 93 A
R Robert 88 B

Filter by multiple conditions

We can use boolean operators between two conditions. You should enclose every condition in parentheses.
AND: &
OR: |
NOT: ~

Filter data with class are ‘A’ and score greater than 85

df[(df["class"] == 'A') & (df["score"] > 85)]

# If we don't use parentheses
df["class"] == 'A' & df["score"] > 85
will be parsed as 
df["class"] == ('A' & df["score"]) > 85
Series is ambiguous error will occur.

Result

name_1 score_1 class_1
M Mary 95 A
B Bob 93 A

Filter data with class are ‘A’ and score greater than 93 or lower than 90

df[(df["class"] == 'A') & ((df["score"] > 93) | (df["score"] < 90))]

Result

name_1 score_1 class_1
K Kevin 80 A
M Mary 95 A

Filter data not with class are ‘A’ and score greater than 85

df[~((df["class"] == 'A') & (df["score"] > 85))]

Result

name_1 score_1 class_1
K Kevin 80 A
J Jack 90 B
R Robert 88 B
A Amy 81 B

Select rows specified column value is in some values We can use isin method to get the boolean results of if the column value is in some values or not.

df.loc[df["name"].isin(["Kevin", "Bob"])]

# OR use query API
df.query(" name in ('Kevin', 'Bob') ")

Result

name score class
K Kevin 80 A
B Bob 93 A

Subscribe and be the FIRST reader of our latest articles

* indicates required

Contact us