After a long 12 months of pouring my soul into it, my book, Data Analysis with R, was finally published. After the requisite 2-4 day breather, I started thinking about how I was going to get back into the swing of regular blog posts and decided that the easier, softer way was to cannibalize and expand on an example from the book.
In the chapter “Sources of Data” I show how to consume web data in different formats in R. The motivating example is to build a simple recommendation system that uses user-supplied “tags” (genres/labels) submitted to Last.fm and MusicBrainz to quantify musical artist “similarity”. The example in the book stops at the construction and sorting of the similarity matrix but, in this post, we’re going to make a really fly D3 visualization of the musical similarity network and provide recommendations in the tooltips. The code I used for this post, including the JavaScript and HTML, was hastily thrown into a git repo and is available here. If you’re uninterested in the detailed methodology, I suggest you skip to the section labeled “Outcome”.
Methodology
Although the book uses tags from both Last.fm and MusicBrainz, we’ll just be using Last.fm here. (In further contrast to the book, the code here is, as you might imagine, substantially faster-paced.)
The first step is to make a character vector of all the artists that you’d like to include. If you were building a real system, you’d probably want all Last.fm artists. Since we’re not, I just used 70 of my most-played artists on Last.fm. Since I got the list straight from the source, I didn’t have to worry that any of the API requests would return “No Artist Found”.
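To make the idea concrete before any API calls, here is a toy sketch of the artist vector and a tag-based similarity matrix. Everything in it is an assumption for illustration: the artist names and tags are made up (the real list of 70 isn't reproduced here), and the Jaccard index is just one reasonable way to score tag overlap, not necessarily the measure used in the book.

```r
# Hypothetical artist names, for illustration only; the real vector would
# hold the artists pulled from a Last.fm profile.
artists <- c("Artist A", "Artist B", "Artist C")

# Made-up tags standing in for what Last.fm might return for each artist.
tags <- list(
  "Artist A" = c("rock", "indie", "alternative"),
  "Artist B" = c("rock", "indie", "folk"),
  "Artist C" = c("jazz", "fusion")
)

# Jaccard index: size of the tag intersection over size of the tag union.
# Assumed here as the similarity measure; other measures would also work.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Pairwise similarity matrix with artists as row and column names.
similarity <- sapply(artists, function(x) {
  sapply(artists, function(y) jaccard(tags[[x]], tags[[y]]))
})

# Artists most similar to "Artist A", excluding the artist itself.
sort(similarity["Artist A", artists != "Artist A"], decreasing = TRUE)
```

With real data, the `tags` list would be filled in from the API responses rather than typed by hand, and sorting a row of the matrix gives the recommendations for that artist.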
The following is a function that takes an artist and returns …
Source: r-bloggers.com