This week, we continue the parallel themes of deep learning and natural language processing. Last week I mentioned some papers that use deep learning for NLP. In deep learning, these tasks are modeled as prediction problems, which is why such an extensive training set is required. I think it’s important to remember this amid the flurry of sensationalist headlines around deep learning. While I don’t think anyone believes that these systems are actually “threatening” humans or are “self-aware”, it troubles me that such headlines can feed the paranoia of high-profile people warning against superintelligent AIs. Besides, Isaac Asimov solved this decades ago with a bit of computer science mischief and a Zeroth Law of Robotics.
Spark + H2O
Spark 1.6
The preview release of Spark 1.6 was announced a few weeks back. It appears that the Databricks model is to give its cloud clients access before the general public. For the technically minded, this isn’t a huge issue for two reasons: 1) if you’re using Spark seriously, it’s better for you that someone else is beta testing and working the kinks out; 2) you can build the release yourself from the source code. The Databricks crew is already onto the first release candidate, so you can probably get a fairly stable build at this point.
Judging from the source code, R users ain’t getting no love this time around. It seems that the R bindings for MLlib still only support generalized linear models (GLMs). Hence, the biggest strength of Spark for R users is around data collection and processing (aka cleaning, munging, wrangling) prior to conducting an analysis. To model large datasets, you apparently need to look elsewhere.
H2O / Sparkling Water
People anticipating the full release of Spark 1.6 can tickle their fancy with a different announcement on their blog: integration …
Source: r-bloggers.com