By hrbrmstr
(If you don’t know what XML is, you should probably read a primer before reading this post.)
When working with data, one inevitably comes across things encoded in XML. I’m in the “anti-XML” camp, but I deal with my fair share of XML in “cyber” and help out enough people who have to work with XML that I’ve become pretty proficient at slicing & dicing it.
R has two main packages to deal with XML: the original XML package and the more lightweight and modern xml2 package. If you really need all the power of libxml2 (the C library that powers both packages) or are creating XML from R, then you probably know your way around the XML package and are pretty self-sufficient.
Most folks can get by with the xml2 package if their goal is to work with XML data. By “work with” I mean reading in files or data from APIs that come in XML format and finding the nuggets of gold in between all those < and > tags. Doing so requires finding what you need, and that means using a query language called XPath to pinpoint the node(s) you are after. Working with XPath can be pretty daunting for those who went to school to ultimately cure diseases, build high-performing stock portfolios, target advertising to everyone or perform a host of other real work. Becoming an expert in XPath was probably not on the bucket list, but to work with XML you will need to be familiar with it.
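To make the idea of “pinpointing nodes” a bit more concrete, here is a minimal sketch of XPath queries run through xml2. The catalog document, its element names and the `id` attribute are all made up for illustration; only the xml2 functions (`read_xml`, `xml_find_all`, `xml_find_first`, `xml_text`, `xml_attr`) come from the package itself.

```r
library(xml2)

# Hypothetical document, invented purely for this example
xml_txt <- '<catalog>
  <book id="b1"><title>First</title><price>10.50</price></book>
  <book id="b2"><title>Second</title><price>7.25</price></book>
</catalog>'

doc <- read_xml(xml_txt)

# "//book/title" selects every <title> that is a direct child of a <book>
titles <- xml_text(xml_find_all(doc, "//book/title"))

# A predicate in square brackets filters nodes by attribute value
second_price <- as.numeric(xml_text(xml_find_first(doc, "//book[@id='b2']/price")))

# Attributes can also be read off the selected nodes directly
ids <- xml_attr(xml_find_all(doc, "//book"), "id")
```

The same three patterns (a path, a path with a predicate, and reading an attribute) cover a surprising share of everyday XML wrangling.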
The xmlview package provides a way to visually inspect XML and interactively test out XPath expressions. It’s as simple to use as:
    devtools::install_github("ramnathv/htmlwidgets") # we use some bleeding edge features
    devtools::install_github("hrbrmstr/xmlview")

    library(xml2)
    library(xmlview)

    # plain text XML
    xml_view("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>")

    # read-in XML document
    doc <- read_xml("http://www.npr.org/rss/rss.php?id=1001")
    xml_view(doc, add_filter=TRUE)
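Once an expression has been worked out interactively, it can be dropped into plain xml2 code. The sketch below assumes an RSS-shaped document; the inline feed fragment, its titles and its example.com links are invented stand-ins so the code runs without network access, and the `stories` data frame is just one possible way to collect the results.

```r
library(xml2)

# Made-up RSS fragment standing in for a real feed
rss_txt <- '<rss version="2.0"><channel><title>Example feed</title>
<item><title>Story one</title><link>http://example.com/1</link></item>
<item><title>Story two</title><link>http://example.com/2</link></item>
</channel></rss>'

doc <- read_xml(rss_txt)

# Select every <item>, then query relative to each item with "./"
items <- xml_find_all(doc, "//item")

stories <- data.frame(
  title = xml_text(xml_find_first(items, "./title")),
  link  = xml_text(xml_find_first(items, "./link")),
  stringsAsFactors = FALSE
)
```

Querying relative to each item keeps titles and links aligned row by row, since `xml_find_first` returns one result per input node.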
(There’s also an experimental …
Source: r-bloggers.com