aggregate is quite fast, but you can use an sapply solution to parallelize the code. On Windows this is easily done using snowfall:
require(snowfall)
sfInit(parallel=TRUE, cpus=2)   # start a local cluster with 2 cpus
sfExport("Data")                # copy Data to every cpu
ID <- unique(Data$ID)
CombNames <- sfSapply(ID, function(i){
  paste(Data$Names[Data$ID==i], collapse=" | ")
})
data.frame(ID, CombNames)
sfStop()                        # shut the cluster down again
The parallel version gives you an extra speedup, but the plain sapply solution is actually slower than aggregate. tapply is a bit faster, but cannot be parallelized using snowfall. On my computer:
n <- 3000
m <- 3
# benchmark data: every ID occurs m times, so each combines to "A | B | C"
Data <- data.frame(ID = rep(1:n, m),
                   Names = rep(LETTERS[1:m], each = n))
# using snowfall for a parallel sapply
system.time({
  ID <- unique(Data$ID)
  CombNames <- sfSapply(ID, function(i){
    paste(Data$Names[Data$ID==i], collapse=" | ")
  })
  data.frame(ID, CombNames)
})
   user  system elapsed
   0.02    0.00    0.33
# using tapply
system.time({
  CombNames <- tapply(Data$Names, Data$ID, paste, collapse=" | ")
  data.frame(ID=names(CombNames), CombNames)
})
   user  system elapsed
   0.44    0.00    0.44
# using aggregate
system.time(
  aggregate(Names ~ ID, data=Data, FUN=paste, collapse=" | ")
)
   user  system elapsed
   0.47    0.00    0.47
# using the normal sapply
system.time({
  ID <- unique(Data$ID)
  CombNames <- sapply(ID, function(i){
    paste(Data$Names[Data$ID==i], collapse=" | ")
  })
  data.frame(ID, CombNames)
})
   user  system elapsed
   0.75    0.00    0.75
Note:
For the record, the better sapply solution would be:
CombNames <- sapply(split(Data$Names, Data$ID), paste, collapse=" | ")
data.frame(ID=names(CombNames), CombNames)
which is equivalent to tapply. But parallelizing this one is actually slower, because you have to move more data around within sfSapply: each chunk of names has to be shipped to a worker, instead of just the IDs. The speed of the snowfall solution above comes from copying the whole dataset to every cpu once with sfExport. That is what you have to keep in mind when your dataset is huge: you pay for the speed with extra memory usage.
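To make that concrete, a parallel version of the split-based solution could look like the sketch below (assuming a cluster has been initialized with sfInit as above). No sfExport is needed here, but the split list itself, rather than just the IDs, now travels over the cluster connection:

# a minimal sketch of the parallelized split version
Chunks <- split(Data$Names, Data$ID)   # one chunk of names per ID
# every chunk is serialized and sent to a worker --
# this is the extra data movement mentioned above
CombNames <- sfSapply(Chunks, paste, collapse=" | ")
data.frame(ID=names(CombNames), CombNames)

On this problem the extra communication costs more than the parallelism saves, which is why the split version is best left serial.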