Here is an alternative approach which creates all unique combinations of TIME, TYPE, and duplicated GROUPs through a cross join and then computes the correlation of SCORE for the corresponding subsets of DATA:
library(data.table) # development version 1.14.3 required
setDT(DATA, key = c("GROUP", "TYPE", "TIME"))[
, CJ(time = TIME, type = TYPE, groupA = GROUP, groupB = GROUP, unique = TRUE)][
groupA < groupB][
, corType := paste0("G", groupA, "G", groupB)][
, corValue := cor(DATA[.(groupA, type, time), SCORE],
DATA[.(groupB, type, time), SCORE]),
by = .I][]
time type groupA groupB corType corValue
1: 100 1 1 2 G1G2 0.11523940
2: 100 1 1 3 G1G3 -0.05124326
3: 100 1 1 4 G1G4 -0.16943203
4: 100 1 2 3 G2G3 0.05475435
5: 100 1 2 4 G2G4 -0.10769738
6: 100 1 3 4 G3G4 0.01464146
7: 100 2 1 2 G1G2 NA
8: 100 2 1 3 G1G3 NA
9: 100 2 1 4 G1G4 NA
10: 100 2 2 3 G2G3 NA
11: 100 2 2 4 G2G4 NA
12: 100 2 3 4 G3G4 NA
13: 101 1 1 2 G1G2 NA
14: 101 1 1 3 G1G3 NA
15: 101 1 1 4 G1G4 NA
16: 101 1 2 3 G2G3 NA
17: 101 1 2 4 G2G4 NA
18: 101 1 3 4 G3G4 NA
19: 101 2 1 2 G1G2 -0.04997479
20: 101 2 1 3 G1G3 -0.02262932
21: 101 2 1 4 G1G4 -0.00331578
22: 101 2 2 3 G2G3 -0.01243952
23: 101 2 2 4 G2G4 0.16683223
24: 101 2 3 4 G3G4 -0.10556083
time type groupA groupB corType corValue
Explanation
- DATA is coerced to class data.table while setting a key on columns GROUP, TYPE, and TIME. Keying is required for fast subsetting later.
- The cross join CJ() creates all unique combinations of columns TIME, TYPE, GROUP, and GROUP (twice). The columns of the cross join are renamed to avoid name clashes later on.
- [groupA < groupB] ensures that equivalent combinations of groupA and groupB appear only once, e.g., G2G1 is dropped in favour of G1G2. So, this is a kind of data.table version of t(combn(unique(DATA$GROUP), 2)).
- A new column corType is appended by reference.
- Finally, the groupwise correlations are computed by stepping rowwise through the cross join table (using by = .I) and subsetting DATA by groupA, type, time and groupB, type, time, respectively, using fast subsetting through keys. Please see the vignette Keys and fast binary search based subset for more details.
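To make the keyed subsetting and the pair filter more concrete, here is a minimal sketch on a small made-up table. The name toy and its SCORE values are assumptions for illustration only; they are not the DATA from this answer.

```r
library(data.table)

# Hypothetical toy data (not the DATA from the post): 4 groups, one TYPE, one TIME
toy <- data.table(
  GROUP = rep(1:4, each = 3),
  TYPE  = 1L,
  TIME  = 100L,
  SCORE = c(1, 2, 3,  2, 4, 7,  5, 3, 1,  4, 4, 9)
)
setkey(toy, GROUP, TYPE, TIME)

# Keyed subset: the values in .() are matched against the key columns in key order
scoreA <- toy[.(1L, 1L, 100L), SCORE]   # SCOREs of group 1
scoreB <- toy[.(2L, 1L, 100L), SCORE]   # SCOREs of group 2
corAB  <- cor(scoreA, scoreB)

# Group pairs: cross join filtered to groupA < groupB, compared with combn()
groups     <- sort(unique(toy$GROUP))
pairsCJ    <- CJ(groupA = groups, groupB = groups)[groupA < groupB]
pairsCombn <- t(combn(groups, 2))
all(as.matrix(pairsCJ) == pairsCombn)   # both list the same pairs in the same order
```

The rowwise step in the answer does the same two keyed lookups for every row of the cross join, just with groupA, groupB, type, and time taken from that row.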
Note that by = .I is a new feature of data.table development version 1.14.3.
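If by = .I is not available in the installed version, one possible workaround (an assumption of mine, not part of the approach above) is to group by all identifying columns instead. Because every row of the cross join is a unique combination, each group then contains exactly one row. The table toy below is made up for illustration.

```r
library(data.table)

# Hypothetical toy data: 3 groups, one TYPE, one TIME
toy <- data.table(
  GROUP = rep(1:3, each = 5),
  TYPE  = 1L,
  TIME  = 100L,
  SCORE = c(1, 2, 3, 4, 5,  2, 1, 4, 3, 5,  5, 3, 4, 1, 2)
)
setkey(toy, GROUP, TYPE, TIME)

# Same cross join and pair filter as in the answer
pairs <- toy[, CJ(time = TIME, type = TYPE, groupA = GROUP, groupB = GROUP,
                  unique = TRUE)][groupA < groupB]

# Grouping by all four columns yields one-row groups, i.e. a rowwise computation
pairs[, corValue := cor(toy[.(groupA, type, time), SCORE],
                        toy[.(groupB, type, time), SCORE]),
      by = .(time, type, groupA, groupB)]
pairs
```

This relies on the combinations being unique; with duplicated rows the groups would no longer correspond to single rows.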
Combinations of time, type, and group which do not exist in DATA will appear in the result set but are marked by NA in column corValue.
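If the NA rows are not wanted, they can be filtered out afterwards. The following sketch uses a small made-up table toy in which, as in DATA, TYPE 1 only occurs at TIME 100 and TYPE 2 only at TIME 101. It groups by all identifying columns rather than by = .I so that it does not depend on the development version; both choices are assumptions for illustration.

```r
library(data.table)

# Hypothetical toy data: TYPE 1 only at TIME 100, TYPE 2 only at TIME 101
toy <- data.table(
  GROUP = rep(1:2, each = 6),
  TYPE  = rep(1:2, 6),
  TIME  = rep(100:101, 6),
  SCORE = c(1L, 5L, 2L, 3L, 4L, 6L,  3L, 2L, 1L, 6L, 5L, 4L)
)
setkey(toy, GROUP, TYPE, TIME)

res <- toy[, CJ(time = TIME, type = TYPE, groupA = GROUP, groupB = GROUP,
                unique = TRUE)][groupA < groupB]
res[, corValue := cor(toy[.(groupA, type, time), SCORE],
                      toy[.(groupB, type, time), SCORE]),
    by = .(time, type, groupA, groupB)]

# Keep only combinations that actually exist in the data
kept <- res[!is.na(corValue)]
kept
```

Note that this filter would also drop rows where the correlation is NA for other reasons, e.g., missing SCORE values.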
Data
set.seed(42) # required for reproducible data
DATA = data.frame("GROUP" = sort(rep(1:4, 200)),
"TYPE" = rep(1:2, 400),
"TIME" = rep(100:101, 400),
"SCORE" = sample(1:100, 800, replace = TRUE))