Pandas >> Read Multiple CSV Files
This tutorial explains how to read multiple CSV files into a single Pandas data frame using three approaches: glob, os.walk(), and Dask.
Using glob
You can use glob to list the CSV files in a directory, read each one into its own data frame with a for loop, and then concatenate them into a single data frame. Here’s an example:
import pandas as pd
import glob
# List all CSV files in a directory
files = glob.glob('path/to/files/*.csv')
# Read each file into a separate data frame
dfs = []
for file in files:
    df = pd.read_csv(file)
    dfs.append(df)
# Concatenate all data frames into a single data frame
df_all = pd.concat(dfs, ignore_index=True)
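The loop above can be wrapped in a small helper that also records which file each row came from. This is a minimal sketch rather than a definitive implementation: the function name read_csv_folder and the source_file column are hypothetical names chosen for illustration, and the demo writes two throwaway files to a temporary directory so the block runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd


def read_csv_folder(folder):
    """Read every CSV file directly inside `folder` into one data frame."""
    # glob returns files in arbitrary order; sorting makes the row order reproducible
    files = sorted(glob.glob(os.path.join(folder, '*.csv')))
    if not files:
        # pd.concat raises ValueError on an empty list, so fail with a clearer message
        raise FileNotFoundError(f'No CSV files found in {folder}')
    # assign() adds a column recording the file each row was read from
    frames = [pd.read_csv(f).assign(source_file=os.path.basename(f)) for f in files]
    return pd.concat(frames, ignore_index=True)


# Demo with two small throwaway files
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({'id': [1, 2]}).to_csv(os.path.join(tmp, 'a.csv'), index=False)
    pd.DataFrame({'id': [3]}).to_csv(os.path.join(tmp, 'b.csv'), index=False)
    df_all = read_csv_folder(tmp)

print(df_all)
```

Keeping the file name in a column makes it easier to trace a bad row back to its source after the frames have been concatenated.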
Using os.walk()
You can use the os.walk() function to recursively search a directory for CSV files and read them into a single data frame. Here’s an example:
import pandas as pd
import os
# Walk through all directories and subdirectories
dfs = []
for root, dirs, files in os.walk('path/to/files'):
    for file in files:
        # Check if the file is a CSV file
        if file.endswith('.csv'):
            # Read the file into a data frame and append it to the list of data frames
            df = pd.read_csv(os.path.join(root, file))
            dfs.append(df)
# Concatenate all data frames into a single data frame
df_all = pd.concat(dfs, ignore_index=True)
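The recursive search can also be separated from the reading step. The sketch below assumes you want a fixed traversal order and want to match upper-case extensions such as .CSV as well; the generator name find_csv_files is hypothetical, and the nested temporary directory exists only to make the example runnable.

```python
import os
import tempfile

import pandas as pd


def find_csv_files(top):
    """Yield the path of every CSV file under `top`, including subdirectories."""
    for root, dirs, files in os.walk(top):
        # Sorting dirs in place makes os.walk visit subdirectories in a fixed order
        dirs.sort()
        for name in sorted(files):
            # lower() also matches extensions such as '.CSV'
            if name.lower().endswith('.csv'):
                yield os.path.join(root, name)


# Demo: one CSV at the top level, one in a subdirectory, and one non-CSV file
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    pd.DataFrame({'id': [1]}).to_csv(os.path.join(tmp, 'top.csv'), index=False)
    pd.DataFrame({'id': [2]}).to_csv(os.path.join(tmp, 'sub', 'nested.CSV'), index=False)
    with open(os.path.join(tmp, 'notes.txt'), 'w') as fh:
        fh.write('not a csv')

    paths = list(find_csv_files(tmp))
    df_all = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

print(df_all)
```

Collecting the paths first lets you inspect or filter the list before any file is read.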
Using Dask library
You can also use the Dask library.
Dask is a parallel computing library that can process datasets too large to fit into memory. Its read_csv() function accepts a wildcard pattern, so it can read all matching CSV files into a single Dask data frame.
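Install Dask (the dataframe extra also installs the dependencies that dask.dataframe needs, such as Pandas)
pip install "dask[dataframe]"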
Here’s an example:
import dask.dataframe as dd
# Read all CSV files matching a pattern into a Dask data frame
df_dask = dd.read_csv('path/to/files/*.csv')
# Convert the Dask data frame to a Pandas data frame (this loads all of the data into memory)
df_pandas = df_dask.compute()
df_pandas