
Bend or break: strings in R


By John Mount


A common complaint from new users of R is: the string processing notation is ugly.

  • Using paste(,,sep='') to concatenate strings seems clumsy.
  • You are never sure which regular expression dialect grep()/gsub() are really using.
  • Remembering the difference between length() and nchar() is initially difficult.
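A small base-R sketch of the three complaints above. The variable names are made up for illustration; only functions from base R are used.

```r
first <- "data"
second <- "frame"

# paste() inserts a space by default; sep = "" (or paste0()) removes it.
joined_default <- paste(first, second)
joined_tight   <- paste(first, second, sep = "")
joined_paste0  <- paste0(first, second)

# length() counts vector elements; nchar() counts characters in each element.
words      <- c("bend", "or", "break")
n_elements <- length(words)
n_chars    <- nchar(words)

# gsub() treats the pattern as a regular expression unless told otherwise:
# perl = TRUE asks for Perl-compatible syntax, fixed = TRUE for a literal match.
dots_regex <- gsub(".", "-", "a.b")                # "." matches any character
dots_fixed <- gsub(".", "-", "a.b", fixed = TRUE)  # "." matches only a period
digits     <- gsub("\\d+", "#", "rule 42", perl = TRUE)
```

Here `joined_default` keeps the separating space while the other two do not, and `dots_regex` replaces every character while `dots_fixed` replaces only the period.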

As always, things can be improved by using additional libraries (for example, stringr). But this always evokes Python’s “There should be one-- and preferably only one --obvious way to do it” or what I call the “rule 42” problem: “if it is the right way, why isn’t it the first way?”

From “Alice’s Adventures in Wonderland”:

Alice’s Adventures in Wonderland, drawn by John Tenniel.

At this moment the King, who had been for some time busily writing in his note-book, cackled out `Silence!’ and read out from his book, `Rule Forty-two. All persons more than a mile high to leave the court.’

Everybody looked at Alice.

`I’m not a mile high,’ said Alice.

`You are,’ said the King.

`Nearly two miles high,’ added the Queen.

`Well, I shan’t go, at any rate,’ said Alice: `besides, that’s not a regular rule: you invented it just now.’

`It’s the oldest rule in the book,’ said the King.

`Then it ought to be Number One,’ said Alice.

We will write a bit about some evil techniques, which you should never actually use, for weaseling around the string concatenation notation issue in R.

If you read enough R documentation and resources, you would think you could use one of the object-oriented systems (S3 or S4) to override the behavior of plus for strings. A bit more digging shows you can’t override infix operators quite the same way you override named methods (R is a language where even the exceptions have exceptions), but it can be done using the “Ops” group generic.
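As a rough illustration of the “Ops” idea (not necessarily the approach the full article takes), here is a minimal S3 sketch. The class name `cstr` and its constructor are hypothetical, invented for this example. Plain character vectors carry no class attribute, so the operator is only rerouted for values that have been wrapped explicitly.

```r
# Hypothetical S3 class "cstr": a character vector with a class attribute,
# so that operators applied to it go through S3 dispatch.
cstr <- function(x) structure(as.character(x), class = "cstr")

# Group generic method: R calls this for any binary operator in the Ops
# group (+, -, ==, <, ...) when an operand has class "cstr".
# .Generic holds the name of the operator that was actually used.
Ops.cstr <- function(e1, e2) {
  if (.Generic == "+") {
    # Reinterpret "+" as concatenation and keep the result in the class.
    return(cstr(paste0(unclass(e1), unclass(e2))))
  }
  # Every other operator falls back to its usual behavior on the raw strings.
  get(.Generic)(unclass(e1), unclass(e2))
}

greeting <- cstr("Hello, ") + "world"
same     <- cstr("a") == "a"
```

Only one operand needs to be a `cstr` for the method to be picked up, which is why the right-hand side above can stay a plain string. An unwrapped `"a" + "b"` still fails with the usual non-numeric argument error.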

The …read more

Source: win-vector.com

