Combine CSV files in Python in 3 Easy Steps
Unfortunately, RWordPress has not yet been updated to work with the latest RStudio release. In the meantime, I am manually posting the code for combining multiple CSV files in Python in a few easy steps:
- Step 1: Import the packages and select the folder
- Step 2: Read each file and combine/concatenate them
  - Save the merged output
  - Optional: clean up if needed, i.e. remove duplicates, extra columns, etc.
- Step 3: Save the combined CSV file
Step 1: Import the packages and select the folder
import pandas as pd
import os
import glob  # glob is part of Python's standard library, so there is nothing to pip install
# Tip: create a subfolder just for the files you want to combine, so no unrelated files get picked up by accident
folder = r'C:/Users/Chris/code/combine/'
all_files = glob.glob(os.path.join(folder, '*.csv'))
all_files
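The file-selection step can also be wrapped in a small function. This is only a sketch: the name `list_csv_files` and its `exclude` parameter are hypothetical, not part of the original post. It sorts the matches so the files are always combined in the same order, and it skips a `merged.csv` left over from an earlier run in case that file sits in the same folder.

```python
import glob
import os


def list_csv_files(folder, exclude=("merged.csv",)):
    """Return the sorted paths of all .csv files directly inside `folder`.

    `exclude` (a hypothetical helper parameter) holds file names to skip,
    for example an output file written by a previous run.
    """
    pattern = os.path.join(folder, "*.csv")
    return sorted(
        path
        for path in glob.glob(pattern)
        if os.path.basename(path) not in exclude
    )
```

Calling `list_csv_files(folder)` would then stand in for the `glob.glob(...)` line above.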
Step 2: Read the individual files and combine/concatenate them
# Make sure every file has the same format (same columns, same separator)
data_in_each_csv = (pd.read_csv(f, sep=',') for f in all_files)
df_combined = pd.concat(data_in_each_csv, ignore_index=True)
# Written to the current working directory; without index=False the row index is saved as an extra first column
df_combined.to_csv('merged.csv')
df_combined
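Since `pd.concat` does not complain when the columns differ (by default it keeps the union of all columns and fills the gaps with NaN), it can help to check the headers before combining. Below is a minimal sketch using only the standard library; the function names `read_header` and `files_with_different_header` are hypothetical, and UTF-8 encoding with a comma separator is assumed.

```python
import csv


def read_header(path):
    """Return the first row of a CSV file as a list of column names."""
    # Assumption: the files are UTF-8 encoded and comma separated.
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def files_with_different_header(paths):
    """Return the paths whose header differs from the first file's header."""
    if not paths:
        return []
    expected = read_header(paths[0])
    return [p for p in paths[1:] if read_header(p) != expected]
```

If `files_with_different_header(all_files)` returns an empty list, every file shares the header of the first one and the concatenation should line up column by column.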
Step 2 (optional): Clean the output as necessary
Drop duplicates: df = df_combined.drop_duplicates('column_name')
Drop column: df2 = df.drop(columns=['Unnamed: 0'])
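The two clean-up operations can be combined into one helper. This is a sketch under assumptions: `clean_combined` and `key_column` are hypothetical names, and it treats every column whose name starts with `Unnamed:` as a leftover index column, which may not hold for every dataset.

```python
import pandas as pd


def clean_combined(df, key_column=None):
    """Drop leftover index columns and duplicate rows from a combined frame."""
    # Columns named "Unnamed: 0", "Unnamed: 1", ... appear when a CSV that was
    # saved together with its index is read back in.
    leftover = [c for c in df.columns if str(c).startswith("Unnamed:")]
    cleaned = df.drop(columns=leftover)
    # With key_column=None, rows must match in every column to be duplicates;
    # otherwise only the values in key_column are compared.
    subset = [key_column] if key_column is not None else None
    return cleaned.drop_duplicates(subset=subset).reset_index(drop=True)
```

For example, `df2 = clean_combined(df_combined, key_column='column_name')` would cover both lines above in a single call.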
Step 3: Save the combined CSV file
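# pandas does the writing, so the csv module is not needed; index=False keeps the row index out of the file
df2.to_csv('C:/Users/chris/code/outputs/combined_files.csv', index=False)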
# Or if you prefer an Excel version
# df2.to_excel(r'C:/Users/Chris/code/outputs/combined_files.xlsx', index=False)
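To see why `index=False` matters when saving, here is a small round trip on made-up data (the frame and its values are purely illustrative). It writes to in-memory buffers instead of real files so it can be run anywhere.

```python
import io

import pandas as pd

# Illustrative data only, not taken from the files in this post.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Default behaviour: the row index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
columns_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = list(pd.read_csv(without_index).columns)

print(columns_with_index)     # ['Unnamed: 0', 'name', 'value']
print(columns_without_index)  # ['name', 'value']
```

Saving with the index is what produces the 'Unnamed: 0' column that the clean-up step drops, so writing with `index=False` avoids the problem the next time the file is read.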