I am trying to sample with replacement in Scala/Spark, specifying a sampling probability for each class.
This is how I would do it in R.
# Vector to sample from
x <- c("User1","User2","User3","User4","User5")
# Occurrences from which to obtain sampling probabilities
y <- c(2,4,4,3,2)
# Calculate sampling probabilities
p <- y / sum(y)
# Draw sample with replacement of size 10
s <- sample(x, 10, replace = TRUE, prob = p)
# Which yields (for example):
[1] "User5" "User1" "User1" "User5" "User2" "User4" "User4" "User2" "User1" "User3"
How can I do the same in Scala / Spark?
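For reference, when the data is small enough to sit on the driver, the R logic above might be sketched in plain Scala roughly as follows. This is only a minimal sketch under that assumption, not a Spark-native solution: the object name `WeightedSample` and the `sample` helper are hypothetical names chosen for illustration, and the only library used is `scala.util.Random`. It draws a uniform number over the total weight and picks the first element whose cumulative weight exceeds it.

```scala
import scala.util.Random

object WeightedSample {
  // Draw n items from xs with replacement; item i is picked with
  // probability weights(i) / weights.sum (same idea as R's prob argument).
  def sample[A](xs: IndexedSeq[A], weights: IndexedSeq[Double], n: Int, rng: Random): IndexedSeq[A] = {
    require(xs.nonEmpty && xs.length == weights.length, "xs and weights must have the same non-zero length")
    require(weights.forall(_ >= 0) && weights.sum > 0, "weights must be non-negative with a positive sum")
    // Running totals of the weights, e.g. 2, 6, 10, 13, 15 for the data below
    val cumulative = weights.scanLeft(0.0)(_ + _).tail
    val total = cumulative.last
    IndexedSeq.fill(n) {
      val u = rng.nextDouble() * total       // uniform in [0, total)
      val i = cumulative.indexWhere(_ > u)   // first bucket whose upper bound exceeds u
      xs(if (i >= 0) i else xs.length - 1)   // fallback for floating-point rounding
    }
  }

  def main(args: Array[String]): Unit = {
    val x = Vector("User1", "User2", "User3", "User4", "User5")
    val y = Vector(2.0, 4.0, 4.0, 3.0, 2.0) // occurrences, normalised inside sample
    val s = sample(x, y, 10, new Random())
    println(s.mkString(" "))
  }
}
```

The output is random, so it will differ from run to run. Doing the same over an RDD or DataFrame would need a different approach, which this sketch does not cover.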