How to Read CSV Files in Pandas
In this article, we will look at various ways of reading CSV files with pandas.
How to read a csv file
We can use the read_csv function of pandas to read a CSV file by passing its path.
The path can be a local file path or a URL; valid URL schemes include http, ftp, s3, gs, and file.
# import pandas module
import pandas as pd
# read_csv parses the file and returns a pandas DataFrame
df = pd.read_csv("data/sample.csv")
# show first 5 rows
df.head(5)
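Besides a path or URL, read_csv also accepts a file-like object. The minimal sketch below assumes an in-memory io.StringIO buffer holding a few rows borrowed from the sample table later in this article (not the actual data/sample.csv), which makes it easy to experiment without creating a file on disk.

```python
import io

import pandas as pd

# Illustrative CSV content held in memory instead of a file on disk
csv_text = """name,sex,score
Kevin,Male,80
Jack,Male,90
Mary,Female,95
"""

# read_csv accepts any object with a read() method, such as io.StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df.dtypes)
```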
How to read csv files encoded in Shift_JIS
By default, pandas reads CSV files as UTF-8. To read a file in another encoding, use the encoding option.
For example, to read a CSV file encoded in Shift_JIS, specify encoding='shift_jis', or encoding='cp932' (Microsoft's extended variant of Shift_JIS, common in files saved on Japanese Windows).
# import pandas module
import pandas as pd
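# use the encoding option to specify the file's encoding
# (cp932 is Microsoft's superset of Shift_JIS; encoding='shift_jis' also works for plain Shift_JIS files)
df = pd.read_csv("data/sample.csv", encoding='cp932')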
# show first 5 rows
df.head(5)
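To see the effect of the encoding option without a real Japanese file, the sketch below builds a small Shift_JIS-encoded CSV in memory. The column names and values are hypothetical, chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical Shift_JIS-encoded CSV built in memory
raw = "name,city\n山田,東京\n佐藤,大阪\n".encode("shift_jis")

# Decoding these bytes as UTF-8 (the default) would typically fail,
# so we tell pandas the actual encoding
df = pd.read_csv(io.BytesIO(raw), encoding="shift_jis")

print(df)
```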
How to read all columns of csv files as strings
Sometimes we need to read all the contents of a CSV file as strings, for example to keep the leading zeros of codes such as ZIP codes.
In that case, we can pass dtype=str to the read_csv function.
# import pandas module
import pandas as pd
# dtype=str reads every column as strings instead of inferring numeric types
df = pd.read_csv("data/sample.csv", dtype=str)
# show first 5 rows
df.head(5)
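The difference is easiest to see on codes with leading zeros. In this sketch (hypothetical data) the default type inference turns "01234" into the integer 1234, while dtype=str keeps the text as written. It also shows that empty cells still become NaN with dtype=str unless keep_default_na=False is passed as well.

```python
import io

import pandas as pd

# Hypothetical data: ZIP codes with leading zeros and one empty cell
csv_text = "zip,count\n01234,5\n00501,\n"

default_df = pd.read_csv(io.StringIO(csv_text))
str_df = pd.read_csv(io.StringIO(csv_text), dtype=str)
kept_df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

# Type inference drops the leading zeros
print(default_df["zip"].tolist())
# dtype=str preserves the original text
print(str_df["zip"].tolist())
# The empty cell is still NaN with dtype=str alone
print(str_df["count"].isna().tolist())
# With keep_default_na=False it stays an empty string
print(kept_df["count"].tolist())
```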
How to read date columns from csv files
Sometimes we need to parse some columns of a CSV file as dates so that they are convenient to work with later.
For example, suppose we have the CSV file below and want to convert the birth column to dates while reading it.
name | sex | score | birth |
---|---|---|---|
Kevin | Male | 80 | 1992-02-02 |
Jack | Male | 90 | 1995-09-30 |
Mary | Female | 95 | 1998-12-11 |
# import pandas module
import pandas as pd
# parse_dates lists the columns to convert to datetime64.
# date_format (pandas 2.0+) states the expected format explicitly;
# the older date_parser callable argument is deprecated in favor of it.
df = pd.read_csv("data/sample.csv", parse_dates=["birth"], date_format="%Y-%m-%d")
# We can check the type of `birth` column
df["birth"].dtypes
Result
dtype('<M8[ns]')
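If you need code that behaves the same on older pandas releases without date_format, one option is to read the column as text and convert it afterwards with pd.to_datetime. The sketch below assumes the three rows of the sample table are held in memory rather than in data/sample.csv.

```python
import io

import pandas as pd

# Rows from the sample table above, held in memory for illustration
csv_text = """name,sex,score,birth
Kevin,Male,80,1992-02-02
Jack,Male,90,1995-09-30
Mary,Female,95,1998-12-11
"""

# Read birth as plain text first, then convert it with an explicit format
df = pd.read_csv(io.StringIO(csv_text))
df["birth"] = pd.to_datetime(df["birth"], format="%Y-%m-%d")

print(df["birth"].dtype)
print(df["birth"].dt.year.tolist())
```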
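How to read csv files without using the first column as index
By default, read_csv does not use any column as the index; it creates a default integer index instead. The exception is a file whose data rows have one more field than the header (for example, because every line ends with a delimiter): pandas then treats the first field of each row as the index. In that case, use the option index_col=False to force pandas not to use the first column as the index.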
# import pandas module
import pandas as pd
# if you don't want to use first column as index you can achieve it by specifying index_col=False
df = pd.read_csv("data/sample.csv", index_col=False)
# show first 5 rows
df.head(5)
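A small in-memory example makes the difference visible. The hypothetical data below ends every row with a trailing comma, so each data row has one more field than the header.

```python
import io

import pandas as pd

# Each data row ends with a trailing comma: 4 fields per row, 3 header names
csv_text = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# Default: the extra leading field becomes the index
implicit = pd.read_csv(io.StringIO(csv_text))
# index_col=False: keep a default integer index and drop the trailing empty field
explicit = pd.read_csv(io.StringIO(csv_text), index_col=False)

print(implicit)
print(explicit)
```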
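How to read csv files by specifying a separator
You can specify the column separator with the sep parameter. The default is a comma, so sep="," only makes the default explicit; for other formats use, for example, sep=";" or sep="\t" for tab-separated files.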
# import pandas module
import pandas as pd
# you can specify the separator between columns with the sep parameter
df = pd.read_csv("data/sample.csv", sep=",")
# show first 5 rows
df.head(5)
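Here is a short sketch with two hypothetical in-memory files, one separated by semicolons and one by tabs. With the matching sep value, both produce the same DataFrame.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated and tab-separated versions of the same data
semicolon_text = "name;score\nKevin;80\nMary;95\n"
tab_text = "name\tscore\nKevin\t80\nMary\t95\n"

df_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Both files yield the same two-column DataFrame
print(df_semicolon.equals(df_tab))
```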
How to read csv files using first row as header
You can use the first row as the header by specifying header=0. This is also what read_csv does by default when no column names are passed.
# import pandas module
import pandas as pd
# header=0 uses the first row of the file as the column names
df = pd.read_csv("data/sample.csv", header=0)
# show first 5 rows
df.head(5)
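Passing header=0 explicitly is mainly useful together with the names parameter: the file's own header row is then discarded and replaced by your labels. The data in this sketch is hypothetical.

```python
import io

import pandas as pd

csv_text = "Name,Score\nKevin,80\nMary,95\n"

# header=0 marks the first row as the header; with names, that row
# is skipped and the new labels are used instead
df = pd.read_csv(io.StringIO(csv_text), header=0, names=["name", "score"])

print(df)
```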
How to read csv files without header
If the file has no header row, specify header=None. pandas then treats the first row as data and labels the columns 0, 1, 2, and so on.
# import pandas module
import pandas as pd
# header=None reads the first row as data instead of as column names
df = pd.read_csv("data/sample.csv", header=None)
# show first 5 rows
df.head(5)
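The sketch below reads a hypothetical header-less file twice: once with the default integer labels and once with column names supplied through the names parameter.

```python
import io

import pandas as pd

# Hypothetical file with no header row
csv_text = "Kevin,80\nMary,95\n"

# Without names, the columns are labeled 0, 1, ...
numbered = pd.read_csv(io.StringIO(csv_text), header=None)
# With names, we supply our own labels
named = pd.read_csv(io.StringIO(csv_text), header=None, names=["name", "score"])

print(numbered.columns.tolist())
print(named.columns.tolist())
```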
How to read multiple csv files and concatenate them into one DataFrame
We can use the glob function of the glob module to find all CSV files in a directory, and then use the concat function of pandas to concatenate the resulting DataFrames into one.
# import pandas and glob modules
import pandas as pd
import glob
# specify the directory that contains the csv files
path = r'path/to/csv files directory' # use your path
all_files = glob.glob(path + "/*.csv")
df_lst = []
for file_name in all_files:
    # read each file into its own DataFrame
    df = pd.read_csv(file_name, index_col=None, header=0)
    df_lst.append(df)
# stack the DataFrames vertically and renumber the index from 0
df = pd.concat(df_lst, axis=0, ignore_index=True)
# show first 5 rows
df.head(5)
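The following self-contained variant writes two small hypothetical files to a temporary directory so it can be run as-is. It also sorts the file list, since glob does not guarantee any order, and records which file each row came from in an extra source column (a name chosen here for illustration).

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as path:
    # Write two small hypothetical CSV files into a temporary directory
    with open(os.path.join(path, "part1.csv"), "w", encoding="utf-8") as f:
        f.write("name,score\nKevin,80\nJack,90\n")
    with open(os.path.join(path, "part2.csv"), "w", encoding="utf-8") as f:
        f.write("name,score\nMary,95\n")

    # sorted() fixes the file order, and therefore the row order
    all_files = sorted(glob.glob(os.path.join(path, "*.csv")))

    # Tag each row with the name of the file it came from
    frames = [
        pd.read_csv(file_name).assign(source=os.path.basename(file_name))
        for file_name in all_files
    ]
    df = pd.concat(frames, ignore_index=True)

print(df)
```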
How to read part of a large csv file
Sometimes a CSV file is so large that reading it all at once would run out of memory.
We can specify the nrows option in read_csv to read only the given number of rows from the start of the file.
# import pandas module
import pandas as pd
# read first 100 rows of csv file
df = pd.read_csv("data/sample.csv", nrows=100)
# how many rows in DataFrame
print(len(df))
Result
100
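To process the whole file in pieces rather than only its beginning, read_csv also accepts a chunksize option, which makes it return an iterator of DataFrames. This sketch assumes a hypothetical single-column file and sums it chunk by chunk, so only one chunk of rows needs to be in memory at a time.

```python
import io

import pandas as pd

# Hypothetical data: 10 rows with a numeric "value" column
csv_text = "value\n" + "\n".join(str(i) for i in range(10)) + "\n"

total = 0
row_count = 0
# With chunksize=4, read_csv yields DataFrames of 4, 4 and 2 rows
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()
    row_count += len(chunk)

print(row_count, total)
```

To read a slice from the middle of a file instead, nrows can be combined with skiprows; for example, skiprows=range(1, 101) together with nrows=100 skips the first 100 data rows while keeping the header line.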
How to read online csv files by URL
We can also read a CSV file from the web by passing its URL to read_csv.
# import pandas module
import pandas as pd
# pass the URL to read_csv, which downloads the file and returns a pandas DataFrame
df = pd.read_csv("https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/CSV_FILES/WPP2019_TotalPopulationBySex.csv")
# show first 5 rows
df.head(5)
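Because the example above needs network access, here is a sketch that exercises the same URL handling offline. It assumes a temporary local file and turns its path into a file:// URL, one of the URL schemes read_csv accepts.

```python
import pathlib
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Write a small hypothetical CSV file
    csv_path = pathlib.Path(tmp) / "sample.csv"
    csv_path.write_text("name,score\nKevin,80\nMary,95\n", encoding="utf-8")

    # as_uri() converts the absolute path into a file:// URL
    url = csv_path.as_uri()
    df = pd.read_csv(url)

print(url.startswith("file://"), df.shape)
```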
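How to read csv files with quote characters
When a field contains the delimiter, it must be enclosed in quote characters so that the delimiter is not treated as a column boundary. read_csv controls this with the quotechar option. Its default is the double quote ("), so you only need to set it when the file uses a different character.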
# import pandas module
import pandas as pd
# quotechar='"' is the default; change it if the file uses another quote character
df = pd.read_csv("data/sample.csv", quotechar='"')
# show first 5 rows
df.head(5)
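This sketch reads two hypothetical in-memory files in which the first field contains a comma: one quoted with double quotes (handled by the default) and one quoted with single quotes (which needs quotechar="'").

```python
import io

import pandas as pd

# The text field contains the delimiter, so it is wrapped in quotes
double_quoted = 'text,count\n"hello, world",1\n'
single_quoted = "text,count\n'hello, world',1\n"

# '"' is already the default quote character
df_double = pd.read_csv(io.StringIO(double_quoted))
# A file using single quotes needs quotechar set explicitly
df_single = pd.read_csv(io.StringIO(single_quoted), quotechar="'")

print(df_double.loc[0, "text"], df_single.loc[0, "text"])
```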