I am hoping to determine an efficient way to convert a single data frame into a list of data frames. Below is my reproducible MWE:
set.seed(1)
ABAge = runif(100)
ABPoints = rnorm(100)
ACAge = runif(100)
ACPoints = rnorm(100)
BCAge = runif(100)
BCPoints = rnorm(100)
A_B <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = ABAge, Points = ABPoints)
A_C <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = ACAge, Points = ACPoints)
B_C <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = BCAge, Points = BCPoints)
A_B$ID <- as.character(A_B$ID)
A_C$ID <- as.character(A_C$ID)
B_C$ID <- as.character(B_C$ID)
listFormat <- list("A_B" = A_B, "A_C" = A_C, "B_C" = B_C)
dfFormat <- data.frame(ID = as.character(paste0("ID", 1:100)), A_B.Age = ABAge, A_B.Points = ABPoints, A_C.Age = ACAge, A_C.Points = ACPoints, B_C.Age = BCAge, B_C.Points = BCPoints)
dfFormat$ID = as.character(dfFormat$ID)
This results in a data frame format (dfFormat) that looks like this:
'data.frame': 100 obs. of 7 variables:
$ ID : chr "ID1" "ID2" "ID3" "ID4" ...
$ A_B.Age : num 0.266 0.372 0.573 0.908 0.202 ...
$ A_B.Points: num 0.398 -0.612 0.341 -1.129 1.433 ...
$ A_C.Age : num 0.6737 0.0949 0.4926 0.4616 0.3752 ...
$ A_C.Points: num 0.409 1.689 1.587 -0.331 -2.285 ...
$ B_C.Age : num 0.814 0.929 0.147 0.75 0.976 ...
$ B_C.Points: num 1.474 0.677 0.38 -0.193 1.578 ...
and a list of data frames listFormat that looks like this:
List of 3
$ A_B:'data.frame': 100 obs. of 3 variables:
..$ ID : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
..$ Age : num [1:100] 0.266 0.372 0.573 0.908 0.202 ...
..$ Points: num [1:100] 0.398 -0.612 0.341 -1.129 1.433 ...
$ A_C:'data.frame': 100 obs. of 3 variables:
..$ ID : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
..$ Age : num [1:100] 0.6737 0.0949 0.4926 0.4616 0.3752 ...
..$ Points: num [1:100] 0.409 1.689 1.587 -0.331 -2.285 ...
$ B_C:'data.frame': 100 obs. of 3 variables:
..$ ID : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
..$ Age : num [1:100] 0.814 0.929 0.147 0.75 0.976 ...
..$ Points: num [1:100] 1.474 0.677 0.38 -0.193 1.578 ...
I am hoping to come up with an automated way to convert the dfFormat to listFormat. As can be seen in the above objects there are two main conditions:
The column
IDis always the first column indfFormatand it is always the first column in each sublist oflistFormat.The number of sublists is equal to the number of unique column names in
dfFormatbefore the underscore ('_'). In this case, that is three prefixes (ex. "A_B", "A_C", and "B_C"). These prefixes are also the names of the three sublists.Within each sublist, it contains the number of columns that had the associated prefix ("A_B"). For each sublist, this was two ("Age" and "Points"). These suffixes are the names of the columns.
I asked the reverse question here (i.e. how to go from listFormat to dfFormat) and got some helpful answers that I am learning from. I need to have code to reverse both directions and it seems the reverse direction may need new types of code. I put my attempt below to show how I am stuck!
conUnd <- which(sapply(colnames(dfFormat), function(x) grepl("_", x)))
listName <- sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][1])
uListName <- unique(sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][1]))
listCol <- sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][2])
listFormat = list()
for (i in 1:length(uListName)){
[Gets messy here trying to define column names based on string variables]
}
Any advice would be greatly appreciated. I know my code is not efficient.