How to assign incremental values based on two columns in R?

Question

My data set looks like:

ID   VISIT_ID  DATE  DV
1001 112233    12-23 3
1001 112233    12-23 4
1001 112244    12-23 5
1001 112244    12-23 6
1001 112244    12-23 7
1001 112244    12-23 8
1002 112254    12-23 3
1002 112254    12-23 4
1002 112254    12-23 5
1002 112264    12-23 6
1002 112264    12-23 7
1002 112264    12-23 8

I want the results like below; it assign a incremental encounter value for each unique VISIT_ID. The sequence will restart from 1 for each ID. Helps will be much appreciated.

ID   VISIT_ID  DATE  DV  ENCOUNTER
1001 112233    12-23 3   1
1001 112233    12-23 4   1
1001 112244    12-23 5   2
1001 112244    12-23 6   2
1001 112244    12-23 7   2
1001 112244    12-23 8   2
1002 112254    12-23 3   1
1002 112254    12-23 4   1
1002 112254    12-23 5   1
1002 112264    12-23 6   2
1002 112264    12-23 7   2
1002 112264    12-23 8   2

See also http://stackoverflow.com/questions/37006427/group-values-by-unique-elements/37007036#37007036 in order to understand better the pros/cons of the below proposed solutions + some additional options. — David Arenburg, Jan 04 '17 at 07:46

akrun · Answer 1 · 2017-01-04T07:40:48.420

We can use match to find the index of unique 'VISIT_ID' after grouping by 'ID'

library(dplyr)
df1 %>% 
   group_by(ID) %>% 
   mutate(ENCOUNTER = match(VISIT_ID, unique(VISIT_ID))) 
#      ID VISIT_ID  DATE    DV ENCOUNTER
#    <int>    <int> <chr> <int>     <int>
#1   1001   112233 12-23     3         1
#2   1001   112233 12-23     4         1
#3   1001   112244 12-23     5         2
#4   1001   112244 12-23     6         2
#5   1001   112244 12-23     7         2
#6   1001   112244 12-23     8         2
#7   1002   112254 12-23     3         1
#8   1002   112254 12-23     4         1
#9   1002   112254 12-23     5         1
#10  1002   112264 12-23     6         2
#11  1002   112264 12-23     7         2
#12  1002   112264 12-23     8         2

Or another option is duplicated

df1 %>%
   group_by(ID) %>% 
   mutate(ENCOUNTER = cumsum(!duplicated(VISIT_ID)))

Or using data.table

library(data.table)
setDT(df1)[, ENCOUNTER := match(VISIT_ID, unique(VISIT_ID), by = ID]

Or with base R

with(df1, ave(VISIT_ID, ID, FUN = function(x) cumsum(!duplicated(x))))

`ave(df$VISIT_ID, df$ID,FUN = function(x) match(x, unique(x)))` — Ronak Shah, Jan 04 '17 at 07:40

Ronak Shah · Answer 2 · 2017-01-04T07:43:09.720

1

With base R ave we can convert the VISIT_ID to factor and then numeric to get unique number for every VISIT_ID of ID

df$ENCOUNTER <- ave(df$VISIT_ID, df$ID,FUN = function(x) as.numeric(as.factor(x)))
df

#     ID VISIT_ID  DATE DV ENCOUNTER
#1  1001   112233 12-23  3         1
#2  1001   112233 12-23  4         1
#3  1001   112244 12-23  5         2
#4  1001   112244 12-23  6         2
#5  1001   112244 12-23  7         2
#6  1001   112244 12-23  8         2
#7  1002   112254 12-23  3         1
#8  1002   112254 12-23  4         1
#9  1002   112254 12-23  5         1
#10 1002   112264 12-23  6         2
#11 1002   112264 12-23  7         2
#12 1002   112264 12-23  8         2

edited Jan 04 '17 at 07:43

answered Jan 04 '17 at 07:37

Ronak Shah

377,200
20
156
213

I would say this as problematic by the way you converted to `numeric` – akrun Jan 04 '17 at 07:42
not sure what you mean by that. In which case would it fail? – Ronak Shah Jan 04 '17 at 07:45
Suppose you have elements like `as.numeric(factor(c("B", "B", "A", "A")))` (though in the example showed, it could work) – akrun Jan 04 '17 at 07:47
Yep. In case if we have `VISIT_ID` as strings we can use the `match` or `cumsum` option. – Ronak Shah Jan 04 '17 at 07:57
Yes, but then it will be bordering on plagiarism... :-) – akrun Jan 04 '17 at 08:47
I can provide answers on data provided by OP. :-) – Ronak Shah Jan 04 '17 at 08:49

How to assign incremental values based on two columns in R?

2 Answers2

Linked

Related