Pandas >> How to Filter Data
Table of Contents
In this article, we will talk about how to filter data in DataFrame
. In other words, how to select rows from a DataFrame based on column values?
Firstly, we prepare data for demonstration.
Preparing data
import pandas as pd
df = pd.DataFrame({
"name": ["Kevin", "Jack", "Mary", "Bob", "Robert", "Amy"],
"score": [80, 90, 95, 93, 88, 81],
"class": ["A", "B", "A", "A", "B", "B"]
}, index=["K", "J", "M", "B", "R", "A"])
df
Result
name | score | class | |
---|---|---|---|
K | Kevin | 80 | A |
J | Jack | 90 | B |
M | Mary | 95 | A |
B | Bob | 93 | A |
R | Robert | 88 | B |
A | Amy | 81 | B |
Filter by one condition
We can specify one condition in brackets after DataFrame
.
The condition can be defined as df["column name"] == "value"
.
Filter data with class are ‘A’
df[df["class"] == 'A']
Result
name_1 | score_1 | class_1 | |
---|---|---|---|
K | Kevin | 80 | A |
M | Mary | 95 | A |
B | Bob | 93 | A |
We can also use loc
df.loc[df["class"] == 'A']
Result
name_1 | score_1 | class_1 | |
---|---|---|---|
K | Kevin | 80 | A |
M | Mary | 95 | A |
B | Bob | 93 | A |
Filter data with class are not ‘A’
df[df["class"] ! = 'A']
Result
name_1 | score_1 | class_1 | |
---|---|---|---|
J | Jack | 90 | B |
R | Robert | 88 | B |
A | Amy | 81 | B |
Filter data with score greater than 85
df[df["score"] > 85]
# OR use query API
df.query(" score > 85 ")
Result
name_1 | score_1 | class_1 | |
---|---|---|---|
J | Jack | 90 | B |
M | Mary | 95 | A |
B | Bob | 93 | A |
R | Robert | 88 | B |
Filter by multiple conditions
We can use boolean operators between two conditions. You should enclose every condition in parentheses.
AND: &
OR: |
NOT: ~
Filter data with class are ‘A’ and score greater than 85
df[(df["class"] == 'A') & (df["score"] > 85)]
# If we don't use parentheses
df["class"] == 'A' & df["score"] > 85
will be parsed as ⏬
df["class"] == ('A' & df["score"]) > 85
Series is ambiguous error will occur.
Result
name_1 | score_1 | class_1 | |
---|---|---|---|
M | Mary | 95 | A |
B | Bob | 93 | A |
Filter data with class are ‘A’ and score greater than 93 or lower than 90
df[(df["class"] == 'A') & ((df["score"] > 93) | (df["score"] < 90))]
Result
name_1 | score_1 | class_1 | |
---|---|---|---|
K | Kevin | 80 | A |
M | Mary | 95 | A |
Filter data not with class are ‘A’ and score greater than 85
df[~((df["class"] == 'A') & (df["score"] > 85))]
Result
name_1 | score_1 | class_1 | |
---|---|---|---|
K | Kevin | 80 | A |
J | Jack | 90 | B |
R | Robert | 88 | B |
A | Amy | 81 | B |
Select rows specified column value is in some values
We can use isin
method to get the boolean results of if the column value is in some values or not.
df.loc[df["name"].isin(["Kevin", "Bob"])]
# OR use query API
df.query(" name in ('Kevin', 'Bob') ")
Result
name | score | class | |
---|---|---|---|
K | Kevin | 80 | A |
B | Bob | 93 | A |