By John Mount
A common complaint from new users of R is: the string processing notation is ugly.
- Using paste(,,sep='') to concatenate strings seems clumsy.
- You are never sure which regular expression dialect grep()/gsub() are really using.
- Remembering the difference between length() and nchar() is initially difficult.
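For readers who have not run into these yet, here is a small sketch of all three complaints in base R. The variable names are only illustrative; the functions and arguments (sep, fixed, perl) are standard base R.

```r
# Concatenation: paste() inserts a space unless sep = "" is given.
greeting <- paste("Hello", "World", sep = "")

# length() counts vector elements; nchar() counts characters in each element.
words <- c("apple", "fig")
n_elements <- length(words)
n_chars <- nchar(words)

# The regular expression dialect depends on the arguments passed.
s <- "a.b.c"
default_re <- gsub(".", "-", s)                    # "." is a wildcard here
fixed_re <- gsub(".", "-", s, fixed = TRUE)        # "." is a literal dot
perl_re <- gsub("\\.(?=c)", "-", s, perl = TRUE)   # lookahead, Perl-style syntax
```

Note how the same pattern "." gives very different results depending on whether fixed = TRUE is set.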
As always, things can be improved by using additional libraries (for example, stringr). But this evokes Python’s “There should be one– and preferably only one –obvious way to do it” or what I call the “rule 42” problem: “if it is the right way, why isn’t it the first way?”
From “Alice’s Adventures in Wonderland”:
Alice’s Adventures in Wonderland, drawn by John Tenniel.
At this moment the King, who had been for some time busily writing in his note-book, cackled out `Silence!’ and read out from his book, `Rule Forty-two. All persons more than a mile high to leave the court.’
Everybody looked at Alice.
`I’m not a mile high,’ said Alice.
`You are,’ said the King.
`Nearly two miles high,’ added the Queen.
`Well, I shan’t go, at any rate,’ said Alice: `besides, that’s not a regular rule: you invented it just now.’
`It’s the oldest rule in the book,’ said the King.
`Then it ought to be Number One,’ said Alice.
We will write a bit about evil ways, which you should never actually use, to weasel around the string concatenation notation issue in R.
If you read enough R documentation and resources, you would think you could use one of the object-oriented systems (S3 or S4) to override the behavior of plus for strings. A bit more digging shows you can’t override infix methods quite the same way you override named methods (R is a language where even the exceptions have exceptions), but it can be done using “Ops groups”.
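A minimal sketch of the idea, and not necessarily the approach the full article takes: it assumes a hypothetical S3 wrapper class named cstr, introduced here only for illustration. The wrapper is needed because base R does not dispatch S3 operator methods on plain character vectors, which carry no class attribute. The Ops group method then handles binary plus and refuses everything else.

```r
# Hypothetical constructor: tag a character vector with the S3 class "cstr".
cstr <- function(x) structure(as.character(x), class = "cstr")

# Group method for operators. .Generic holds the operator being called.
Ops.cstr <- function(e1, e2) {
  if (.Generic != "+" || missing(e2)) {
    stop("only binary + is defined for cstr in this sketch")
  }
  # Strip the class before pasting, then re-wrap so further + calls still dispatch.
  cstr(paste(unclass(e1), unclass(e2), sep = ""))
}

msg <- cstr("Hello, ") + "World"
print(unclass(msg))  # "Hello, World"
```

Only one operand needs to be a cstr for the method to be chosen, so "x" + cstr("y") also works. In keeping with the warning above, this is a curiosity and not something to use in real code.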
Source: win-vector.com