How to Read various data in Pandas

2021-10-11 Pandas

In this article, we will discuss how to read various type of data in Pandas including pickle, clipboard, excel, json, HTML, XML, parquet files.

[Pandas] How to read various data

How to read pickle file

We can use read_pickle method of pandas and specify the path of pickle file to read pickle file.

# import pandas module
import pandas as pd

# use read_pickle method, and return pandas DataFrame object
df = pd.read_pickle("path/to/sample.pkl")

How to read content from clipboard

We can use read_clipboard of pandas to read content from clipboard. We can use sep option to specify delimiter and the default setting is one or more whitespace characters.

# import pandas module
import pandas as pd

# use sep option to specifying delimiter
df = pd.read_clipboard(sep=';')

How to read excel file

We can use read_excel of pandas to read content from Excel.

# import pandas module
import pandas as pd

# We can specify sheet name by specifying `sheet_name` option.
# If we don't want to have the index, we can specify `index_col=None`.
# If we want to treat the first line as header, we can use `header=0`.
df = pd.read_excel('path/to/sample.xlsx', sheet_name='Sheet1', index_col=None, header=0)  

# If there is no header, we can use `header=None`
df = pd.read_excel('path/to/sample.xlsx', sheet_name='Sheet1', index_col=None, header=None)

How to read json file

json file can store unstructured data, we can use read_json to read json file into pandas.

orient option is used to indicate JSON string format:

orient='split': JSON string is dict like {index -> [index], columns -> [columns], data -> [values]}
orient='records': JSON string is list like [{column -> value}, ... , {column -> value}]
orient='index': JSON string is dict like {index -> {column -> value}}
orient='columns': JSON string is dict like {column -> {index -> value}}
orient='values': JSON string is just the values array

# import pandas module
import pandas as pd

df = pd.read_json('path/to/sample.json', orient='split')

How to read HTML file

HTML file can be read into pandas using read_html method of pandas. read_html will return a list of DataFrames.

# import pandas module
import pandas as pd

# `match` is used to find the expected table and read it
# default settings is `.+` (match any non-empty string)
# If we have a table containing "John", we can specify `match='John'` to find it.
list_of_df = pd.read_html('str, path object or file-like object', match='John')  

print(len(list_of_df))

How to read XML file

We can use read_xml of pandas to read content from XML file.

The syntax of xpath

# import pandas module
import pandas as pd

xml = '''<?xml version='1.0' encoding='utf-8'?>
<data>
  <row id="1" name="Kevin" score="80"/>
  <row id="2" name="John"/>
  <row id="3" name="Jack" score="90"/>
</data>'''

# We can specify xpath to find wanted nodes. default ‘./*’
df = pd.read_xml(xml, xpath=".//row")

Result

	id	name	score
0	1	Kevin	80
1	2	John	NaN
2	3	Jack	90

How to read parquet file

We can use read_parquet of pandas to read content from parquet file.

Apache Parquet is a free and open-source column-oriented data storage format of the Apache Hadoop ecosystem.

# import pandas module
import pandas as pd

# engine option is used to specify library of parquet.
# {‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’
df = pd.read_xml('path/to/sample.parquet', engine="pyarrow")

That's it ! Code Snippets

Pandas >> How to Read various data in Pandas

Table of Contents

How to read pickle file

How to read content from clipboard

How to read excel file

How to read json file

How to read HTML file

How to read XML file

How to read parquet file

Contact us

Pandas >> How to Read various data in Pandas

Table of Contents

How to read pickle file

How to read content from clipboard

How to read excel file

How to read json file

How to read HTML file

How to read XML file

How to read parquet file

Subscribe and be the FIRST reader of our latest articles

Contact us