Pandas Dataframes to CSV (on a loop) taking a long time

Category: python pandas dataframe (2 Views)

Please be aware that I installed Python 2 days ago and wrote my first two scripts by copying and pasting code I found, changing some things here and there.

I have a script that:

1. Creates a Pandas DataFrame
2. Loops over some data, creating new DataFrames that are "merged" (not sure this is the right word) into the DataFrame created in step 1

In my first script, step 1 created DataFrame df0, and in step 2, each iteration created a DataFrame df that was appended to df0 using:

df0=df0.append(df, ignore_index=True)

The final DataFrame was written to a CSV file. But that was taking a long time (around 3 hours), and Python warned me that pandas append is deprecated and will be removed soon.
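The usual fix for slow append-in-a-loop is to collect each DataFrame in a plain Python list and call pd.concat once at the end; append/concat inside the loop copies the whole accumulated frame on every iteration. A minimal sketch (the loop and column names here are stand-ins, not your actual data):

```python
import pandas as pd

chunks = []
for i in range(3):  # stand-in for the real data loop
    # each iteration produces one DataFrame, as in step 2
    df = pd.DataFrame({"a": [i], "b": [i * 2]})
    chunks.append(df)  # appending to a list is cheap

# one concatenation at the end instead of 10,000 copies
df0 = pd.concat(chunks, ignore_index=True)
df0.to_csv("file.csv", index=False)
```

This keeps the total work roughly linear in the number of rows, instead of quadratic.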

In my second script, step 1 creates a DataFrame and writes it directly to a CSV using:

df.to_csv('file.csv',mode='w',index=False,header=True)

and step 2 creates a new DataFrame on each iteration of the loop and appends it to the CSV file created in step 1 using:

df.to_csv('file.csv',mode='a',index=False,header=False)

This sped up the process of writing all the data to a CSV file (to around 20 minutes).
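One likely cost in the append-mode version is that mode='a' reopens the file on every iteration. to_csv also accepts an already-open file handle, so you can open the file once and stream every chunk into it. A hedged sketch, again with stand-in data:

```python
import pandas as pd

# Open the file once and keep the handle for the whole loop,
# instead of reopening it with mode='a' 10,000 times.
with open("file.csv", "w", newline="") as f:
    for i in range(3):  # stand-in for the real data loop
        df = pd.DataFrame({"a": [i], "b": [i * 2]})
        # write the header only on the first chunk
        df.to_csv(f, index=False, header=(i == 0))
```

Whether this beats concat-once depends on your data; with ~3 million rows total, both should be far under 20 minutes.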

But this is still a long time for me, since I have to do this a few times throughout the day.

There are around 10,000 DataFrames written to this CSV. They have only 5 columns, and the number of rows varies from 0 to 75,000. The final CSV has around 3 million rows, so not a big deal for Python, I guess.

Is there a smarter and faster way to do what I need to do?

Thanks!

🔴 No definitive solution yet

📌 Solution 1

I think it depends on your data, but using pd.concat (once, on a list of DataFrames, instead of appending in the loop) is not a bad idea. If that is still too slow, you may want to switch to something faster, for example polars.