Pandas >> How to Filter Data
Table of Contents
In this article, we will talk about how to filter data in DataFrame. In other words, how to select rows from a DataFrame based on column values?
![[Pandas] How to filter data](/img/pandas_00008.png)
Firstly, we prepare data for demonstration.
Preparing data
import pandas as pd
df = pd.DataFrame({
"name": ["Kevin", "Jack", "Mary", "Bob", "Robert", "Amy"],
"score": [80, 90, 95, 93, 88, 81],
"class": ["A", "B", "A", "A", "B", "B"]
}, index=["K", "J", "M", "B", "R", "A"])
df
Result
| name | score | class | |
|---|---|---|---|
| K | Kevin | 80 | A |
| J | Jack | 90 | B |
| M | Mary | 95 | A |
| B | Bob | 93 | A |
| R | Robert | 88 | B |
| A | Amy | 81 | B |
Filter by one condition
We can specify one condition in brackets after DataFrame.
The condition can be defined as df["column name"] == "value".
Filter data with class are ‘A’
df[df["class"] == 'A']
Result
| name_1 | score_1 | class_1 | |
|---|---|---|---|
| K | Kevin | 80 | A |
| M | Mary | 95 | A |
| B | Bob | 93 | A |
We can also use loc
df.loc[df["class"] == 'A']
Result
| name_1 | score_1 | class_1 | |
|---|---|---|---|
| K | Kevin | 80 | A |
| M | Mary | 95 | A |
| B | Bob | 93 | A |
Filter data with class are not ‘A’
df[df["class"] ! = 'A']
Result
| name_1 | score_1 | class_1 | |
|---|---|---|---|
| J | Jack | 90 | B |
| R | Robert | 88 | B |
| A | Amy | 81 | B |
Filter data with score greater than 85
df[df["score"] > 85]
# OR use query API
df.query(" score > 85 ")
Result
| name_1 | score_1 | class_1 | |
|---|---|---|---|
| J | Jack | 90 | B |
| M | Mary | 95 | A |
| B | Bob | 93 | A |
| R | Robert | 88 | B |
Filter by multiple conditions
We can use boolean operators between two conditions. You should enclose every condition in parentheses.
AND: &
OR: |
NOT: ~
Filter data with class are ‘A’ and score greater than 85
df[(df["class"] == 'A') & (df["score"] > 85)]
# If we don't use parentheses
df["class"] == 'A' & df["score"] > 85
will be parsed as ⏬
df["class"] == ('A' & df["score"]) > 85
Series is ambiguous error will occur.
Result
| name_1 | score_1 | class_1 | |
|---|---|---|---|
| M | Mary | 95 | A |
| B | Bob | 93 | A |
Filter data with class are ‘A’ and score greater than 93 or lower than 90
df[(df["class"] == 'A') & ((df["score"] > 93) | (df["score"] < 90))]
Result
| name_1 | score_1 | class_1 | |
|---|---|---|---|
| K | Kevin | 80 | A |
| M | Mary | 95 | A |
Filter data not with class are ‘A’ and score greater than 85
df[~((df["class"] == 'A') & (df["score"] > 85))]
Result
| name_1 | score_1 | class_1 | |
|---|---|---|---|
| K | Kevin | 80 | A |
| J | Jack | 90 | B |
| R | Robert | 88 | B |
| A | Amy | 81 | B |
Select rows specified column value is in some values
We can use isin method to get the boolean results of if the column value is in some values or not.
df.loc[df["name"].isin(["Kevin", "Bob"])]
# OR use query API
df.query(" name in ('Kevin', 'Bob') ")
Result
| name | score | class | |
|---|---|---|---|
| K | Kevin | 80 | A |
| B | Bob | 93 | A |