How to Process Null Values in Pandas

2021-12-31 Pandas

Table of Contents

In this article, we will talk about how to process null/empty data in DataFrame. In real world, it’s impossible to have values in every row and column of the data. So when we encounter a data set containing null values, we have to decide whether to delete, replace, etc. to handle it according to our needs.

processing nan data

Firstly, we prepare a csv like below.

processing nan data

As you can see, there are some empty cells in score column. Let’s read into Pandas DataFrame by using read_csv method of pandas. About how to read data in Pandas, you can read another article about how to read data in Pandas.

How to Read various data in Pandas
https://thats-it-code.com/pandas/pandas__how-to-read-various-data/

import pandas as pd
df = pd.read_csv("data/school_report.csv", header=0, index_col=False)
df

processing nan data

Delete data containing null values by dropna()

If we want to drop all the data containing null values, we just use dropna() method of DataFrame.

pydata.org > pandas.DataFrame.dropna https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html

df.dropna()

You will find all rows containing NaN value in any column disappeared.

processing nan data

If you just want to drop columns containing NaN values, you can use axis option and specify “columns”.

df.dropna(axis="columns")

You will find column address and score disappeared.

processing nan data

If you only want to remove rows all values are null, you can use how=“all” option.

df.dropna(how="all")

If you only want to remove rows containing at least one null value, you can use how=any option.

df.dropna(how="any")

Also you can remove columns all values are null by adding axis=1 option.

df.dropna(how="all", axis=1)
df.dropna(how="any", axis=1)

If you only want to remove some rows based some columns whether contain null values or not. You can use subset option.
For example, if we want to remove all rows whose address is empty, we can achieve it using below code.

df.dropna(subset=["address"])

processing nan data

Series just use dropna()

df["score"].dropna()

processing nan data

Fill null value with specified value

You can use fillna() method of DataFrame to fill null values with other values.

pydata.org > pandas.DataFrame.fillna https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html

Replace all null values with same value

We can replace all null values with same value by specifying the new value in fillna() method.

df.fillna("N/A")

processing nan data

Replace each column with a different value

We can use a dict to specify new value for each column in fillna() method.

df.fillna({"address": "zzz", "score": 0})

processing nan data

Fill numeric column with mean or median value

We can use mean() method to get mean value of each numeric column and use median() method to get median value of each numeric column and use them to fill null values in each column.

In our data, we have only one numeric column with null values, so we will use Series instead.

df["score"].mean()
81.76923076923077
df["score"].fillna(df["score"].mean())

processing nan data

df["score"].median()
80.0
df["score"].fillna(df["score"].median())

processing nan data

Fill null values with previous value or next value

We can specify method=‘ffill’ to fill null values with previous value, and specify method=‘bfill’ to fill null vaues with next value. This is very useful in Time Series data.

fill null values with previous value

df.fillna(method='ffill')

processing nan data

fill null values with next value
If the null value is in the last row, It will keep NaN.

df.fillna(method='bfill')

processing nan data

Select rows or columns with null values

If we want to select all rows with empty address, we can use isnull() method. You can learn more about how to filter data in Pandas in article below.

Pandas » How to Filter Data
https://thats-it-code.com/pandas/pandas__how-to-filter-data/

Select all rows with empty address

df[df["address"].isnull()]

processing nan data

We must use isnull to check value in pandas Dataframe is null or not like below other than use general python syntax.

df[df["name"] == "Oliver"]["address"].isnull()
2    True
Name: address, dtype: bool

If we use general python syntax, we will encounter error and will not get expect result.

if df[df["name"] == "Oliver"]["address"]:
    print("not empty")
else:
    print("empty")
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

You can learn more about how to select data in Pandas in article below.

Pandas » How to Select and Modify Data
https://thats-it-code.com/pandas/pandas__how-to-select-and-modify-data/

Conclusion

  • We can remove rows or columns with null values using dropna()
  • We can fill specified value using fillna()
  • We can check and filter data with null values using isnull()

Subscribe and be the FIRST reader of our latest articles

* indicates required

Contact us