
sample(): “Monkey’s Paw” style programming in R


By John Mount


The R functions base::sample and base::sample.int include extra “conveniences” that seem to have no purpose beyond encouraging grave errors. In this note we will outline the problem and a suggested workaround. Obviously the R developers are highly skilled people with good intent, and likely have no choice in these matters (due to the need for backwards compatibility). However, that doesn’t mean we can’t take steps to write safer and easier-to-debug code.

“The Monkey’s Paw” William Wymark Jacobs, 1902.

Suppose we were given data in the following form:

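set.seed(2562)
x <- ...  # definition truncated in the source; per the outputs below, x[1] is -17.442331 and x[2] is 7.361322
goodIndices <- which(x > 0)
print(goodIndices)
# [1] 2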

Further suppose our goal is to generate a sample of size 5 of the values of x from only the goodIndices positions; that is, a sample (with replacement) of the positive values from our vector x. I challenge any working R developer who has used base::sample or base::sample.int regularly to say they have never written at least one of the following errors at some time:

sample(x[goodIndices],size=5,replace=TRUE)
# [1] 5 6 1 3 2
x[sample(goodIndices,size=5,replace=TRUE)]
# [1]   7.361322 -17.442331   7.361322 -17.442331   7.361322

These samples are obviously wrong, but you will notice this only if you check. There is only one positive value in x (7.361322), so the only legitimate sample of 5 positive values with replacement is c(7.361322,7.361322,7.361322,7.361322,7.361322). Notice we never got this, and received no diagnostic. A bad sample like this can take a long time to find through its pernicious effects in downstream code.
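One way to "check" is to assert that every drawn value really belongs to the set you meant to sample from. The sketch below is only an illustration: the helper name sample_is_valid is hypothetical, and the vector x is a stand-in whose first two entries are taken from the outputs above while the last three are made-up negative placeholders.

```r
# Stand-in data: first two values are from the article's outputs;
# the remaining three are invented negative placeholders (an assumption).
x <- c(-17.442331, 7.361322, -10.5, -4.2, -1.5)
goodIndices <- which(x > 0)

# Hypothetical helper: TRUE only if every drawn value is one of the allowed values.
sample_is_valid <- function(drawn, allowed) {
  all(drawn %in% allowed)
}

# The first erroneous form from above.
bad1 <- sample(x[goodIndices], size = 5, replace = TRUE)

# FALSE: the draws are whole numbers, not the value 7.361322.
sample_is_valid(bad1, x[goodIndices])

# The only legitimate sample passes the check.
sample_is_valid(rep(7.361322, 5), x[goodIndices])
```

A check like this, wrapped in stopifnot(), turns a silent bad sample into an immediate error at the point where it is drawn.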

Notice the following code works (because it reliably prohibits triggering the horrid special case):

as.numeric(sample(as.list(x[goodIndices]),size=5,replace=TRUE))
# [1] 7.361322 7.361322 7.361322 7.361322 7.361322
x[as.numeric(sample(as.list(goodIndices),size=5,replace=TRUE))]
# [1] 7.361322 7.361322 7.361322 7.361322 7.361322
x[goodIndices[sample.int(length(goodIndices),size=5,replace=TRUE)]]
# [1] 7.361322 7.361322 7.361322 7.361322 7.361322

As always: this is a deliberately trivial example so you can see the problem clearly.
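The sample.int form above can be wrapped in a small helper so the safe pattern does not have to be retyped each time. This is a minimal sketch, not a definitive implementation: the name safe_sample is hypothetical, and x is again a stand-in vector (first two values from the outputs above, the rest invented). The idea is close in spirit to the resample example that, as far as I recall, appears in the ?sample help page, which also documents that a length-one numeric x >= 1 causes sampling from 1:x.

```r
# Hypothetical helper (name not from the article): always treat x as the
# collection of items to draw from, regardless of its length.
safe_sample <- function(x, size, replace = FALSE) {
  x[sample.int(length(x), size = size, replace = replace)]
}

# Stand-in data: first two values are from the article's outputs;
# the remaining three are invented negative placeholders (an assumption).
x <- c(-17.442331, 7.361322, -10.5, -4.2, -1.5)
goodIndices <- which(x > 0)

# Sample the values directly.
safe_sample(x[goodIndices], size = 5, replace = TRUE)
# [1] 7.361322 7.361322 7.361322 7.361322 7.361322

# Or sample the indices and then look up the values.
x[safe_sample(goodIndices, size = 5, replace = TRUE)]
# [1] 7.361322 7.361322 7.361322 7.361322 7.361322
```

Because the helper indexes by position via sample.int(length(x), ...), a length-one x is sampled as a single item rather than being reinterpreted as an upper bound.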

So what is going on? …read more

Source: r-bloggers.com

