Combine CSV files in Python in 3 Easy Steps

Unfortunately, with the latest update to RStudio, RWordPress has not yet been updated. In the meantime, I am manually posting the code showing how to combine multiple CSV files in Python in a few easy steps:

  • Step 1:  Import the packages
    • Select the folder
  • Step 2:  Read each file and combine/concatenate them
    • Save the combined csv file
    • Clean up if needed, e.g. remove extra columns
  • Step 3:  Save the combined csv file

Step 1: Import the packages and select the folder

import pandas as pd

import os

import glob  # glob is part of the Python standard library, so no pip install is needed

folder = r'C:/Users/Chris/code/combine/'   # Note: make it easy by creating a subfolder used only for combining files (this avoids picking up unrelated files by accident)

all_files = glob.glob(os.path.join(folder, "*.csv"))

all_files  # display the list of files that were found
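Before reading anything, it can help to confirm that the pattern picks up only the files you expect. The following is a minimal sketch of that check, not part of the original steps: the helper name list_csv_files and the demo file names are made up for illustration, and a temporary folder stands in for the real one so the snippet runs anywhere.

```python
import glob
import os
import tempfile


def list_csv_files(folder):
    """Return the CSV files directly inside `folder`, sorted by name."""
    pattern = os.path.join(folder, "*.csv")
    # glob.glob returns matches in arbitrary order, so sort for a stable result
    return sorted(glob.glob(pattern))


# Demo on a throwaway folder (hypothetical file names)
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ("b.csv", "a.csv", "notes.txt"):
        with open(os.path.join(demo_folder, name), "w") as handle:
            handle.write("id,value\n1,2\n")
    found = [os.path.basename(path) for path in list_csv_files(demo_folder)]

print(found)  # ['a.csv', 'b.csv'] (notes.txt is skipped)
```

Sorting is optional, but it keeps the row order of the combined file the same from one run to the next.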

Step 2:  Read the individual files and combine/concatenate them

# Ensure the format is the same for each file

data_in_each_csv = (pd.read_csv(f, sep=',') for f in all_files)

df_combined = pd.concat(data_in_each_csv, ignore_index=True)

df_combined.to_csv("merged.csv")  # written to the current working directory

df_combined
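The read-and-concatenate step can also be wrapped in a small function. This is only a sketch under a few assumptions: the function name combine_csv_files is hypothetical, every file is assumed to share the same columns, and the demo writes two tiny made-up files to a temporary folder rather than using real data.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_csv_files(folder):
    """Read every CSV in `folder` and stack the rows into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        # pd.concat raises a less helpful error when given nothing to combine
        raise FileNotFoundError(f"No CSV files found in {folder}")
    frames = (pd.read_csv(path) for path in paths)
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)


# Demo with two small made-up files
with tempfile.TemporaryDirectory() as demo_folder:
    pd.DataFrame({"id": [1, 2], "value": [10, 20]}).to_csv(
        os.path.join(demo_folder, "part1.csv"), index=False)
    pd.DataFrame({"id": [3], "value": [30]}).to_csv(
        os.path.join(demo_folder, "part2.csv"), index=False)
    combined = combine_csv_files(demo_folder)

print(combined.shape)  # (3, 2)
```

The explicit check for an empty file list is a design choice: it points at the folder as the likely problem instead of leaving you to interpret an error from pd.concat.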

Step 2 (optional):  Clean the output as necessary

Drop duplicates:  df = df_combined.drop_duplicates(subset='column_name')

Drop column:  df2 = df.drop(columns=['Unnamed: 0'])
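To see what the two clean-up calls do, here is a small sketch on made-up data. The frame df_demo and its column names are illustrative only. errors="ignore" is an optional extra that lets the drop pass quietly when the column is not present.

```python
import pandas as pd

# Made-up data: a leftover index column and one repeated id
df_demo = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "id": [1, 1, 2],
    "value": [10, 10, 20],
})

# Drop the leftover column; errors="ignore" avoids a KeyError if it is absent
cleaned = df_demo.drop(columns=["Unnamed: 0"], errors="ignore")

# Keep the first row for each id, then renumber the remaining rows
cleaned = cleaned.drop_duplicates(subset="id").reset_index(drop=True)

print(cleaned)
```

After these calls the frame has two rows (ids 1 and 2) and only the id and value columns.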

Step 3:  Save the combined CSV file

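# pandas writes the file itself, so the csv module is not needed; index=False leaves out the row index column

df2.to_csv(r'C:/Users/chris/code/outputs/combined_files.csv', index=False)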

# Or if you prefer an Excel version

# df2.to_excel(r'C:/Users/Chris/code/outputs/combined_files.xlsx', index=False)
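The 'Unnamed: 0' column usually comes from saving a DataFrame with its row index and then reading the file back. The sketch below illustrates that round trip on made-up data, using in-memory buffers instead of real files so that nothing is written to disk.

```python
import io

import pandas as pd

df_demo = pd.DataFrame({"id": [1, 2], "value": [10, 20]})

# Default behaviour: the row index is written as an extra, unnamed first column
with_index = io.StringIO()
df_demo.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False writes only the data columns
without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'id', 'value']
print(cols_without_index)  # ['id', 'value']
```

Passing index=False when saving means there is no extra column to drop the next time the file is read.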
