
I am creating test and training data for an algorithm. My data is spread across several CSV files, and I want to build the training and test data from them.

I have imported all the CSV files into pandas dataframes using

import pandas as pd
dfs = [pd.read_csv(file) for file in datafiles]

dfs[0] holds the first dataframe, dfs[1] the second, and so on.

I would like to assign each one to its own dataframe variable: Xtest1 for the first file, Xtest2 for the second, and so on up to the last file.

Can anyone help me do this with a loop, or suggest another approach?

P.Tillmann
  • Are you using sklearn? They have a function to do this for you. – Jack Moody Apr 01 '19 at 11:53
  • Possible duplicate of [How do I create test and train samples from one dataframe with pandas?](https://stackoverflow.com/questions/24147278/how-do-i-create-test-and-train-samples-from-one-dataframe-with-pandas) – Jack Moody Apr 01 '19 at 11:53
  • Why flood the global namespace with many similar objects instead of just using the **one** list you already have? – Parfait Apr 01 '19 at 12:57
  • See [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) – Georgy Apr 01 '19 at 13:40
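Following the suggestion in the comments to keep the one list, here is a minimal sketch: `enumerate(dfs, start=1)` supplies the 1-based number the question wanted to embed in names like `Xtest1`, without creating any new variables. The `io.StringIO` buffers are an assumption standing in for the real CSV files (`pd.read_csv` accepts file-like objects).

```python
import io

import pandas as pd

# Hypothetical stand-ins for the CSV files; pd.read_csv accepts file-like objects
datafiles = [io.StringIO("x\n1\n2\n"), io.StringIO("x\n3\n4\n5\n")]
dfs = [pd.read_csv(file) for file in datafiles]

# Keep the single list; enumerate(start=1) gives the 1-based number
# that names like Xtest1, Xtest2, ... would otherwise have encoded.
row_counts = {}
for number, df in enumerate(dfs, start=1):
    row_counts[number] = len(df)
    print(f"Xtest{number}: {len(df)} rows")
```

Here `dfs[0]` plays the role of `Xtest1`, so nothing is lost by not naming it separately.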

2 Answers


Rather than creating separately named variables, store the dataframes in a dictionary keyed by name. Try the following:

import pandas as pd
dfs = {'Xtest' + str(ind): pd.read_csv(file) for ind, file in enumerate(datafiles, start=1)}  # first file -> 'Xtest1'

Whenever you need a particular dataframe, access it by its key:

dfs['Xtest1']

To iterate over the dictionary, use:

for name, df in dfs.items():  # works for any number of files
    print(name, df, sep='\n')
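A self-contained sketch of the dictionary approach: `enumerate(..., start=1)` makes the keys line up with the question's naming (`Xtest1` is the first file), and iterating with `.items()` avoids hard-coding the number of files. The `io.StringIO` buffers are hypothetical stand-ins for the CSV files.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV files; pd.read_csv accepts file-like objects
datafiles = [
    io.StringIO("a,b\n1,2\n3,4\n"),
    io.StringIO("a,b\n5,6\n"),
    io.StringIO("a,b\n7,8\n9,10\n11,12\n"),
]

# start=1 so the first file gets the key 'Xtest1' rather than 'Xtest0'
dfs = {f"Xtest{ind}": pd.read_csv(file) for ind, file in enumerate(datafiles, start=1)}

# Dictionaries keep insertion order (Python 3.7+), so this follows file order
for name, df in dfs.items():
    print(name, df.shape)
```

Because the names are dictionary keys, `dfs['Xtest2']` gives the same convenience as a variable called `Xtest2` without touching the global namespace.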
Jeril
  • Hi Jeril, thanks a lot. I actually have to apply a train/test split function, and for that I need each dataframe as a separate variable like X_test1, X_test2. Is there any way to do this in a for loop, where the variable name Xtest# changes with the iteration number (1, 2, 3, ...)? – Abdullah Nisar Apr 01 '19 at 12:33
  • `dfs` is a dictionary of dataframes, and you can iterate over it in a for loop. Check my revised solution. – Jeril Apr 01 '19 at 12:46
  • Hi Jeril, thanks. What I actually need to do, in a loop, is the following: `train1, test1 = train_test_split(training_set1, test_size=0.2, shuffle=False)`, `train2, test2 = train_test_split(training_set2, test_size=0.4, shuffle=False)`, `train3, test3 = train_test_split(training_set3, test_size=0.2, shuffle=False)`, then `for i in range(10, len(train1)): X_train1.append(train1[i-10:i, 1:11]); y_train1.append(train1[i, 10])` followed by `X_train1, y_train1 = np.array(X_train1), np.array(y_train1)`, and the same for train2 and train3. – Abdullah Nisar Apr 01 '19 at 12:51
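A minimal sketch of what the last comment asks for, keeping the per-file data in dictionaries instead of numbered variables. It assumes NumPy arrays with 11 columns (random stand-ins for `training_set1` to `training_set3`) and reuses the comment's slicing (`[i-10:i, 1:11]` for features, column `10` for the target). `split_in_order` and `make_windows` are hypothetical helpers; `split_in_order` is meant to mimic scikit-learn's `train_test_split(..., shuffle=False)`, which you could call instead.

```python
import numpy as np


def split_in_order(data, test_size):
    """Hypothetical helper: keep row order and put the last ceil(test_size * n)
    rows in the test set, intended to mirror train_test_split(..., shuffle=False)."""
    n_test = int(np.ceil(test_size * len(data)))
    n_train = len(data) - n_test
    return data[:n_train], data[n_train:]


def make_windows(data, window=10, feature_cols=slice(1, 11), target_col=10):
    """Hypothetical helper reproducing the comment's loop: the previous `window`
    rows of the feature columns predict the target column of the current row."""
    X, y = [], []
    for i in range(window, len(data)):
        X.append(data[i - window:i, feature_cols])
        y.append(data[i, target_col])
    return np.array(X), np.array(y)


# Stand-in data (assumption): random arrays playing training_set1..training_set3
rng = np.random.default_rng(0)
training_sets = {1: rng.random((50, 11)), 2: rng.random((60, 11)), 3: rng.random((40, 11))}
test_sizes = {1: 0.2, 2: 0.4, 3: 0.2}  # per-dataset test sizes from the comment

# results[k] replaces the numbered variables train{k}, test{k}, X_train{k}, y_train{k}
results = {}
for k, data in training_sets.items():
    train, test = split_in_order(data, test_sizes[k])
    X_train, y_train = make_windows(train)
    results[k] = {"train": train, "test": test, "X_train": X_train, "y_train": y_train}
    print(k, train.shape, test.shape, X_train.shape, y_train.shape)
```

Putting the per-dataset settings in `test_sizes` keeps the loop body identical for every file, which is what makes numbered variables unnecessary.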

Do you mean automatically creating variables and assigning values to them?

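Try `globals()`. Writing to `locals()` only works for this at module level, where it is the same dictionary as `globals()`; inside a function, changes to `locals()` are not reliably reflected in the actual local variables.

For example: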

for a in range(10):
    globals()["var1_" + str(a)] = 1  # creates var1_0 ... var1_9
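Applied to the question, the `globals()` approach might look like this sketch. It assumes the code runs at module level, and the data is again hypothetical in-memory CSV text. Linters and IDEs cannot see names created this way, which is one reason the comments and the other answer suggest a list or dictionary instead.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the CSV files
datafiles = [io.StringIO("x\n1\n"), io.StringIO("x\n2\n3\n")]
dfs = [pd.read_csv(file) for file in datafiles]

# At module level globals() is the module namespace, so each assignment
# creates a real global name: Xtest1, Xtest2, ...
for number, df in enumerate(dfs, start=1):
    globals()[f"Xtest{number}"] = df

print(Xtest2)  # defined dynamically above; static checkers may flag it as undefined
```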
Pavel Kovtun