I have a table:
id time
1 1
1 2
1 5
2 3
2 2
2 7
3 8
3 3
3 14
And I want to convert it to:
id first last
1 1 5
2 3 7
3 8 14
Please help!
I have a table:
id time
1 1
1 2
1 5
2 3
2 2
2 7
3 8
3 3
3 14
And I want to convert it to:
id first last
1 1 5
2 3 7
3 8 14
Please help!
We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'id', we get the first and last value of 'time'
library(data.table)
setDT(df1)[, list(firstocc = time[1L], lastocc = time[.N]),
by = id]
Or with dplyr, we use the same methodology.
library(dplyr)
df1 %>%
group_by(id) %>%
summarise(firstocc = first(time), lastocc = last(time))
Or with base R (no packages needed)
do.call(rbind, lapply(split(df1, df1$id),
function(x) data.frame(id = x$id[1],
firstocc = x$time[1], lastocc = x$time[nrow(x)])))
If we need to be based on the min and max values (not related to the expected output) , the data.table option is
setDT(df1)[, setNames(as.list(range(time)),
c('firstOcc', 'lastOcc')) ,id]
and dplyr is
df1 %>%
group_by(id) %>%
summarise(firstocc = min(time), lastocc = max(time))
There are many packages that can perform aggregation of this sort in R. We show how to do it without any packages and then show it with some packages.
1) Use aggregate. No packages needed.
ag <- aggregate(time ~ id, DF, function(x) c(first = min(x), last = max(x)))
giving:
> ag
id time.first time.last
1 1 1 5
2 2 2 7
3 3 3 14
ag is a two column data frame whose second column contains a two column matrix with columns named 'first' and 'last'. If you want to flatten it to a 3 column data frame use:
do.call("cbind", ag)
giving:
id first last
[1,] 1 1 5
[2,] 2 2 7
[3,] 3 3 14
1a) This variation of (1) is more compact at the expense of uglier column names.
aggregate(time ~ id, DF, range)
2) sqldf
library(sqldf)
sqldf("select id, min(time) first, max(time) last from DF group by id")
giving:
id first last
[1,] 1 1 5
[2,] 2 2 7
[3,] 3 3 14
3) summaryBy summaryBy in the doBy package is much like aggregate:
library(doBy)
summaryBy(time ~ id, data = DF, FUN = c(min, max))
giving:
id time.min time.max
1 1 1 5
2 2 2 7
3 3 3 14
Note: Here is the input DF in reproducible form:
Lines <- "id time
1 1
1 2
1 5
2 3
2 2
2 7
3 8
3 3
3 14"
DF <- read.table(text = Lines, header = TRUE)
Update: Added (1a), (2) and (3) and fixed (1).
You can remove duplicates and reshape it
dd <- read.table(header = TRUE, text = "id time
1 1
1 2
1 5
2 3
2 2
2 7
3 8
3 3
3 14")
d2 <- dd[!(duplicated(dd$id) & duplicated(dd$id, fromLast = TRUE)), ]
reshape(within(d2, tt <- c('first', 'last')), dir = 'wide', timevar = 'tt')
# id time.first time.last
# 1 1 1 5
# 4 2 3 7
# 7 3 8 14