How to Read CSV Files in Pandas

2021-06-06 Pandas

In this article, we will talk about various ways of reading csv files in Pandas.

How to read a csv file

We can use the read_csv function of pandas to read a CSV file by passing the path of the file. The path can also be a URL; valid URL schemes include http, ftp, s3, gs, and file.

# import pandas module
import pandas as pd

# read_csv parses the file and returns a pandas DataFrame object
df = pd.read_csv("data/sample.csv")

# show first 5 rows
df.head(5)
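read_csv also accepts any file-like object, not only a path or URL. This is handy for trying options without a file on disk. The sketch below wraps the sample data used later in this article in io.StringIO; the variable name csv_text is only illustrative.

```python
import io

import pandas as pd

# Illustrative in-memory CSV text standing in for data/sample.csv
csv_text = """name,sex,score,birth
Kevin,Male,80,1992-02-02
Jack,Male,90,1995-09-30
Mary,Female,95,1998-12-11
"""

# read_csv accepts a file-like object in place of a path or URL
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (3, 4)
print(df.dtypes)
```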

How to read a csv file encoded in shift_jis

By default, pandas decodes csv files as UTF-8. To read a file saved in another encoding, we can use the encoding option. For example, to read a csv file encoded in Shift_JIS, we can specify encoding='shift_jis', or encoding='cp932' for Microsoft's extended variant of Shift_JIS, which is common for files created on Japanese Windows.

# import pandas module
import pandas as pd

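# use the encoding option to specify the csv file's encoding
# (cp932 is Microsoft's superset of Shift_JIS; encoding='shift_jis' also works for plain Shift_JIS files)
df = pd.read_csv("data/sample.csv", encoding='cp932')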

# show first 5 rows
df.head(5)
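To see the encoding option work end to end without a real Shift_JIS file, we can encode some illustrative Japanese text to Shift_JIS bytes ourselves and read it back from an in-memory buffer. The column names and values here are made up for the example.

```python
import io

import pandas as pd

# Illustrative Japanese CSV text, encoded as Shift_JIS bytes to mimic a file on disk
text = "名前,点数\n山田,80\n佐藤,95\n"
raw = text.encode("shift_jis")

# Decode with the matching codec; encoding="cp932" would also work for this text
df = pd.read_csv(io.BytesIO(raw), encoding="shift_jis")

print(df["名前"].tolist())  # ['山田', '佐藤']
```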

How to read all columns of csv files as strings

Sometimes we need to read all the contents of a csv file as strings, for example to keep the leading zeros of codes such as zip codes. In that case we can use the dtype option and specify dtype=str in read_csv.

# import pandas module
import pandas as pd

# dtype=str keeps every column as strings instead of letting pandas infer numbers
df = pd.read_csv("data/sample.csv", dtype=str)

# show first 5 rows
df.head(5)
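The difference shows up with values that look numeric but are really identifiers. In the sketch below (illustrative data), the default type inference turns the code column into integers and drops the leading zeros, while dtype=str keeps the text exactly. Passing a dict such as dtype={"code": str} limits the conversion to selected columns.

```python
import io

import pandas as pd

# Illustrative data: a code column whose leading zeros matter
csv_text = "code,score\n00123,80\n04567,90\n"

df_default = pd.read_csv(io.StringIO(csv_text))                   # code inferred as integers
df_str = pd.read_csv(io.StringIO(csv_text), dtype=str)            # every column kept as text
df_mixed = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})  # only code kept as text

print(df_default["code"].tolist())  # [123, 4567]
print(df_str["code"].tolist())      # ['00123', '04567']
```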

How to read date columns from csv files

Sometimes we need to read some columns of a csv file as dates so that we can work with them conveniently later. For example, suppose we have the csv file below and want to convert the birth column to a date type while reading it.

name,sex,score,birth
Kevin,Male,80,1992-02-02
Jack,Male,90,1995-09-30
Mary,Female,95,1998-12-11

# import pandas module
import pandas as pd

# parse_dates converts the listed columns to datetime while reading.
# date_format (pandas >= 2.0) states the exact format of the date strings;
# older versions used date_parser=lambda x: datetime.strptime(x, '%Y-%m-%d'),
# which is deprecated since pandas 2.0.
df = pd.read_csv("data/sample.csv", parse_dates=["birth"], date_format="%Y-%m-%d")

# We can check the type of the `birth` column
df["birth"].dtypes

Result

dtype('<M8[ns]')
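Because the dates in this sample are in ISO 8601 form (YYYY-MM-DD), parse_dates alone is usually enough for pandas to recognise them. The sketch below reads the same data from an in-memory string, uses the .dt accessor that datetime columns provide, and shows the alternative of converting after reading with pd.to_datetime and an explicit format.

```python
import io

import pandas as pd

# Same sample data as above, held in memory for illustration
csv_text = """name,sex,score,birth
Kevin,Male,80,1992-02-02
Jack,Male,90,1995-09-30
Mary,Female,95,1998-12-11
"""

# Convert while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["birth"])
print(df["birth"].dt.year.tolist())  # [1992, 1995, 1998]

# Alternative: read as text first, then convert with an explicit format
df2 = pd.read_csv(io.StringIO(csv_text))
df2["birth"] = pd.to_datetime(df2["birth"], format="%Y-%m-%d")
print(df2["birth"].dt.month.tolist())  # [2, 9, 12]
```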

How to read csv files without index

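If you don't want pandas to use the first column as the index, you can use the option index_col=False. This mainly matters for malformed files that have a delimiter at the end of every line; otherwise read_csv already creates a default integer index.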

# import pandas module
import pandas as pd

# if you don't want to use first column as index you can achieve it by specifying index_col=False
df = pd.read_csv("data/sample.csv", index_col=False)

# show first 5 rows
df.head(5)
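Here is the case where index_col=False actually changes the result. In the illustrative data below, every data row ends with a trailing comma, so it has one more field than the header. By default pandas treats the extra leading field as the index; with index_col=False it keeps the columns aligned with the header and creates a fresh integer index.

```python
import io

import pandas as pd

# Illustrative malformed data: each data row ends with a trailing delimiter
csv_text = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.index.tolist())  # [4, 8]  (first column became the index)

df_fixed = pd.read_csv(io.StringIO(csv_text), index_col=False)
print(df_fixed["a"].tolist())     # [4, 8]
print(df_fixed["b"].tolist())     # ['apple', 'orange']
```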

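How to read csv files by specifying separator

You can specify a custom separator with the sep parameter. The default is sep=",", so you only need it for files that use another delimiter, such as a semicolon or a tab.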

# import pandas module
import pandas as pd

# the sep parameter sets the separator between columns; "," is the default,
# so change it (e.g. sep=";" or sep="\t") for files with another delimiter
df = pd.read_csv("data/sample.csv", sep=",")

# show first 5 rows
df.head(5)
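A sketch with non-default separators, using illustrative in-memory data. The semicolon example also passes decimal=",", since files that use ";" as a separator often use "," as the decimal mark; drop that option if your file uses ".".

```python
import io

import pandas as pd

# Illustrative semicolon-delimited data with a comma as the decimal mark
semicolon_text = "name;score\nKevin;80,5\nMary;95,0\n"
df_semi = pd.read_csv(io.StringIO(semicolon_text), sep=";", decimal=",")
print(df_semi["score"].tolist())  # [80.5, 95.0]

# Illustrative tab-separated data
tsv_text = "name\tscore\nKevin\t80\nMary\t95\n"
df_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df_tab.columns.tolist())  # ['name', 'score']
```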

How to read csv files using first row as header

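You can use the first row as the header by specifying header=0. This is also what the default, header='infer', does when no column names are passed; specifying header=0 explicitly is useful together with the names option when you want to replace the existing column names.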

# import pandas module
import pandas as pd

# header=0 uses the first row (row 0) of the file as the column names
df = pd.read_csv("data/sample.csv", header=0)

# show first 5 rows
df.head(5)
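The sketch below, with illustrative in-memory data, shows why header=0 matters when combined with names. With header=0 the file's own header row is replaced by the new names; with names alone pandas behaves as if header=None, so the original header line is kept as an ordinary data row.

```python
import io

import pandas as pd

# Illustrative data with its own header row
csv_text = """name,sex,score,birth
Kevin,Male,80,1992-02-02
Jack,Male,90,1995-09-30
Mary,Female,95,1998-12-11
"""
new_names = ["Name", "Sex", "Score", "Birth"]

# header=0 + names: the first row is consumed and replaced by new_names
df = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)
print(len(df))  # 3

# names only: the original header line becomes the first data row
df_kept = pd.read_csv(io.StringIO(csv_text), names=new_names)
print(len(df_kept))            # 4
print(df_kept["Name"].iloc[0])  # name
```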

How to read csv files without header

You can specify header=None when the file has no header row. pandas then labels the columns with integers starting from 0.

# import pandas module
import pandas as pd

# header=None treats every row, including the first, as data
df = pd.read_csv("data/sample.csv", header=None)

# show first 5 rows
df.head(5)
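A minimal sketch with illustrative header-less data. Without names, the columns are labelled 0, 1, 2; passing names assigns readable labels at read time.

```python
import io

import pandas as pd

# Illustrative data with no header row
csv_text = "Kevin,Male,80\nJack,Male,90\n"

df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df.columns.tolist())  # [0, 1, 2]

# Supply column names while reading
df_named = pd.read_csv(io.StringIO(csv_text), header=None, names=["name", "sex", "score"])
print(df_named.columns.tolist())  # ['name', 'sex', 'score']
```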

How to read multiple csv files and concatenate them into one DataFrame

We can use the glob function of the glob module to find all csv files in a directory, read each one with read_csv, and then use the concat function of pandas to concatenate the resulting DataFrames into one.

# import pandas and glob module
import pandas as pd
import glob

# specify the directory that contains the csv files
path = r'path/to/csv files directory' # use your path
all_files = glob.glob(path + "/*.csv")

df_lst = []

for file_name in all_files:
    df = pd.read_csv(file_name, index_col=None, header=0)
    df_lst.append(df)

df = pd.concat(df_lst, axis=0, ignore_index=True)

# show first 5 rows
df.head(5)
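A self-contained variant of the same idea, assuming we create two small illustrative files in a temporary directory first. It sorts the file list (glob's order depends on the file system), passes a generator straight to pd.concat, and uses assign to record which file each row came from; the source column and file names are only for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as path:
    # Create two small illustrative csv files so the example runs anywhere
    pd.DataFrame({"name": ["Kevin", "Jack"], "score": [80, 90]}).to_csv(
        os.path.join(path, "part1.csv"), index=False)
    pd.DataFrame({"name": ["Mary"], "score": [95]}).to_csv(
        os.path.join(path, "part2.csv"), index=False)

    # sorted() makes the concatenation order deterministic
    all_files = sorted(glob.glob(os.path.join(path, "*.csv")))

    # Read each file, tag its rows with the file name, and stack them
    df = pd.concat(
        (pd.read_csv(f).assign(source=os.path.basename(f)) for f in all_files),
        ignore_index=True,
    )

print(df)
```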

How to read part of a large csv file

Sometimes a CSV file is so large that we run out of memory if we read it all at once. We can specify the nrows option in read_csv to read only the given number of rows from the start of the file.

# import pandas module
import pandas as pd

# read first 100 rows of csv file
df = pd.read_csv("data/sample.csv", nrows=100)

# how many rows in DataFrame
print(len(df))

Result

100
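When we need to process the whole file rather than just its first rows, the chunksize option makes read_csv return an iterator of smaller DataFrames, so only one chunk is in memory at a time. Combining skiprows with nrows reads a slice from the middle of the file. The sketch below uses a small illustrative 10-row CSV in place of a genuinely large file.

```python
import io

import pandas as pd

# Illustrative 10-row CSV standing in for a file too large to load at once
csv_text = "id,score\n" + "".join(f"{i},{i * 10}\n" for i in range(10))

total = 0
n_chunks = 0
# chunksize=4 yields DataFrames of at most 4 rows each (4, 4, then 2)
with pd.read_csv(io.StringIO(csv_text), chunksize=4) as reader:
    for chunk in reader:
        total += int(chunk["score"].sum())
        n_chunks += 1

print(n_chunks, total)  # 3 450

# Skip file lines 1-4 (keeping the header on line 0), then read the next 3 rows
middle = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 5), nrows=3)
print(middle["id"].tolist())  # [4, 5, 6]
```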

How to read online csv files by url

We can also read an online csv file by passing its URL to read_csv.

# import pandas module
import pandas as pd

# pass a URL instead of a local path; read_csv returns a pandas DataFrame object
df = pd.read_csv("https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/CSV_FILES/WPP2019_TotalPopulationBySex.csv")

# show first 5 rows
df.head(5)
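The same call works for the other URL schemes read_csv supports, including file. To try URL reading without network access, the sketch below writes an illustrative csv file to a temporary directory and reads it back through a file:// URL built with pathlib's Path.as_uri().

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a small illustrative csv file
    csv_path = Path(tmp_dir) / "sample.csv"
    csv_path.write_text("name,score\nKevin,80\nMary,95\n", encoding="utf-8")

    # as_uri() turns the absolute path into a file:// URL
    url = csv_path.as_uri()
    df = pd.read_csv(url)

print(url.startswith("file://"))  # True
print(df["score"].tolist())       # [80, 95]
```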

How to read csv files with quote characters

When the contents of a column include the delimiter, the value must be enclosed in quote characters. read_csv handles such files through the quotechar option, whose default is '"'; set it explicitly when the file uses a different quote character.

# import pandas module
import pandas as pd

# quotechar='"' is the default; pass another character (e.g. quotechar="'") if the file needs it
df = pd.read_csv("data/sample.csv", quotechar='"')

# show first 5 rows
df.head(5)
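A sketch with illustrative in-memory data: the comment field contains a comma, so it is quoted. With double quotes the default quotechar is enough; for a file that quotes with single quotes we pass quotechar="'", otherwise the comma inside the first comment would be treated as a separator.

```python
import io

import pandas as pd

# Illustrative data quoted with double quotes (the default quotechar)
csv_double = 'name,comment\nKevin,"Good, but late"\nMary,"Excellent"\n'
df = pd.read_csv(io.StringIO(csv_double))
print(df["comment"].tolist())  # ['Good, but late', 'Excellent']

# Illustrative data quoted with single quotes
csv_single = "name,comment\nKevin,'Good, but late'\nMary,'Excellent'\n"
df_single = pd.read_csv(io.StringIO(csv_single), quotechar="'")
print(df_single["comment"].tolist())  # ['Good, but late', 'Excellent']
```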
