Pandas >> Read Multiple CSV Files

2023-03-12 Pandas

This tutorial explains how to read multiple CSV files into a single Pandas data frame using glob, os.walk(), and Dask.

Pandas

Using glob

You can use glob to list the CSV files in a directory, read each one into its own data frame in a for loop, and then concatenate them into a single data frame. Here’s an example:

import pandas as pd
import glob

# List all CSV files in a directory (sorted, because glob returns them in arbitrary order)
files = sorted(glob.glob('path/to/files/*.csv'))

# Read each file into a separate data frame
dfs = []
for file in files:
    df = pd.read_csv(file)
    dfs.append(df)

# Concatenate all data frames into a single data frame
df_all = pd.concat(dfs, ignore_index=True)
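
Once the files are combined, it is often useful to know which file each row came from. The following is a minimal sketch of one way to do that. The helper name `read_csvs_with_source` and the column name `source_file` are hypothetical, chosen for illustration, and are not part of Pandas.

```python
import glob
import os

import pandas as pd


def read_csvs_with_source(pattern):
    """Read every CSV matching `pattern` and tag each row with its file name.

    The function name and the `source_file` column are illustrative only.
    """
    # sorted() makes the row order reproducible; glob gives no ordering guarantee
    files = sorted(glob.glob(pattern))
    if not files:
        # pd.concat raises ValueError on an empty list, so fail with a clearer message
        raise FileNotFoundError(f"No files match {pattern!r}")

    # Read each file and add a column holding the file's base name
    dfs = [
        pd.read_csv(path).assign(source_file=os.path.basename(path))
        for path in files
    ]
    return pd.concat(dfs, ignore_index=True)
```

Calling `read_csvs_with_source('path/to/files/*.csv')` would return the same combined data frame as above, with one extra column identifying the originating file.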

Using os.walk()

You can use the os.walk() function to recursively search a directory for CSV files and read them into a single data frame. Here’s an example:

import pandas as pd
import os

# Walk through all directories and subdirectories
dfs = []
for root, dirs, files in os.walk('path/to/files'):
    for file in files:
        # Check if the file is a CSV file (case-insensitive, so '.CSV' also matches)
        if file.lower().endswith('.csv'):
            # Read the file into a data frame and append it to the list of data frames
            df = pd.read_csv(os.path.join(root, file))
            dfs.append(df)

# Concatenate all data frames into a single data frame
df_all = pd.concat(dfs, ignore_index=True)
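
The same recursive search can also be written with pathlib from the standard library, which may read more compactly than nested loops. This is a sketch under stated assumptions: the helper name `read_csvs_recursive` is hypothetical, and returning an empty data frame when nothing is found is one possible design choice, not the only one.

```python
from pathlib import Path

import pandas as pd


def read_csvs_recursive(root):
    """Read every .csv file under `root`, including subdirectories.

    `read_csvs_recursive` is an illustrative name, not a Pandas function.
    """
    # rglob("*.csv") searches `root` and all of its subdirectories
    paths = sorted(p for p in Path(root).rglob("*.csv") if p.is_file())
    if not paths:
        # pd.concat raises ValueError on an empty list; return an empty frame instead
        return pd.DataFrame()

    dfs = [pd.read_csv(p) for p in paths]
    return pd.concat(dfs, ignore_index=True)
```

Sorting the paths keeps the row order stable between runs, since the order in which the file system yields entries is not guaranteed.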

Using Dask library

You can also use the Dask library.
Dask is a parallel computing library that can handle datasets too large to fit into memory. Its read_csv function accepts a wildcard pattern and reads all matching CSV files into a single Dask data frame.

Install Dask

pip install "dask[dataframe]"

Here’s an example:

import dask.dataframe as dd

# Read all CSV files matching a pattern into a Dask data frame
df_dask = dd.read_csv('path/to/files/*.csv')

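# Convert the Dask data frame to a Pandas data frame.
# compute() loads all of the data into memory, so the result must fit in RAM.
# The index restarts at 0 in each partition, so reset it to get a unique index.
df_pandas = df_dask.compute().reset_index(drop=True)
df_pandas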
