Please be aware that I installed Python two days ago and wrote my first two programs by copying and pasting code I found, changing some things here and there.
I have a script that: 1) creates a pandas DataFrame, and 2) loops over some data, creating new DataFrames that are "merged" (not sure that's the right word) into the DataFrame created in step 1.
In my first version, step 1 created a DataFrame df0, and step 2 repeatedly created a DataFrame df that was appended to df0 using:
df0 = df0.append(df, ignore_index=True)
The final DataFrame was then written to a CSV file. But that took a long time (around 3 hours), and pandas warns me that DataFrame.append is deprecated and will be removed soon.
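To make the structure clearer, here is a minimal sketch of that first version (the loop and the data are placeholders for illustration; my real loop runs around 10,000 times):

import pandas as pd

df0 = pd.DataFrame()  # step 1: the base DataFrame
for i in range(3):  # placeholder loop standing in for my real data
    df = pd.DataFrame({'col1': [i], 'col2': [i * 2]})  # placeholder data
    df0 = df0.append(df, ignore_index=True)  # deprecated since pandas 1.4
df0.to_csv('file.csv', index=False)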
In my second version, step 1 creates a DataFrame and writes it directly to a CSV using:
df.to_csv('file.csv', mode='w', index=False, header=True)
and step 2 creates a new DataFrame on each iteration of the loop and appends it to the CSV file created in step 1 using:
df.to_csv('file.csv', mode='a', index=False, header=False)
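Putting it together, the second version looks roughly like this (again with placeholder data):

import pandas as pd

# step 1: write the first DataFrame with a header, overwriting any old file
df = pd.DataFrame({'col1': [0], 'col2': [0]})  # placeholder data
df.to_csv('file.csv', mode='w', index=False, header=True)

# step 2: append each new DataFrame without repeating the header
for i in range(1, 3):  # placeholder loop; the real one runs ~10,000 times
    df = pd.DataFrame({'col1': [i], 'col2': [i * 2]})
    df.to_csv('file.csv', mode='a', index=False, header=False)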
This sped up the process of writing all the data to a CSV file (to around 20 minutes).
But this is still a long time for me, since I have to do this a few times throughout the day.
There are around 10,000 DataFrames written to this CSV. They have only 5 columns, and the number of rows varies from 0 to 75,000. The final CSV has around 3 million rows, so not a big deal for Python, I guess.
Is there a smarter and faster way to do what I need to do?
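One idea I came across, but haven't tested, is to collect all the pieces in a list and call pd.concat once at the end, which is apparently the recommended replacement for the deprecated append. Something like:

import pandas as pd

frames = []
for i in range(3):  # placeholder loop; the real one runs ~10,000 times
    frames.append(pd.DataFrame({'col1': [i], 'col2': [i * 2]}))
# a single concat at the end, instead of repeated appends or disk writes
pd.concat(frames, ignore_index=True).to_csv('file.csv', index=False)

Would that be the right approach here, or is there something better?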
Thanks!