by Joseph Rickert
If there is anything that experienced machine learning practitioners are likely to agree on, it would be the importance of careful and thoughtful feature engineering. The judicious selection of which predictor variables to include in a model often has a more beneficial effect on overall classifier performance than the choice of the classification algorithm itself. This is one reason why classification algorithms that automatically include feature selection, such as glmnet, gbm, or random forests, top the list of “go to” algorithms for many practitioners.
There are occasions, however, when you find yourself, for one reason or another, committed to a classifier that doesn’t automatically narrow down the list of predictor variables, and some sort of automated feature selection might seem like a good idea. If you are an R user, the caret package offers a whole lot of machinery that might be helpful. Caret provides both filter methods and wrapper methods, including recursive feature elimination, genetic algorithms (GAs), and simulated annealing. In this post, we will have a look at a small experiment with caret’s GA option (a basic call is sketched after the background below). But first, a little background.
Performing feature selection with GAs requires conceptualizing the process of feature selection as an optimization problem and then mapping it to the genetic framework of random variation and natural selection. Individuals from a given generation of a population mate to produce offspring who inherit genes (chromosomes) from both parents. Random mutation alters a small part of a child’s genetic material. The genetically fittest children of this new generation go on to produce the next generation. In the feature selection context, individuals become candidate solutions to a prediction problem. Chromosomes (sequences of genes) are modeled as vectors of 1’s and 0’s, with a 1 indicating the presence of a feature and a 0 its absence; a toy illustration of this encoding appears below.
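To make the encoding concrete, the R snippet below is a toy illustration, not caret’s internal code. It assumes a few illustrative choices that are not from the original post: caret’s simulated twoClassSim() data, a plain logistic regression fit with train(), and 5-fold cross-validated accuracy as the fitness score. It simply shows how a 0/1 chromosome maps to a feature subset and a fitness value.

library(caret)

set.seed(42)
sim <- twoClassSim(200)                      # simulated two-class data shipped with caret
predictors <- setdiff(names(sim), "Class")   # the candidate features

# Fitness of one chromosome: cross-validated accuracy of a model
# fit on just the features whose genes are switched on (1).
fitness <- function(chromosome) {
  keep <- predictors[chromosome == 1]
  if (length(keep) == 0) return(0)           # an empty subset is maximally unfit
  fit <- train(sim[, keep, drop = FALSE], sim$Class,
               method = "glm",
               trControl = trainControl(method = "cv", number = 5))
  max(fit$results$Accuracy)
}

# One random individual: each gene switches a feature on (1) or off (0)
chromosome <- rbinom(length(predictors), size = 1, prob = 0.5)
fitness(chromosome)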
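And here is a minimal sketch of what a call to caret’s GA wrapper looks like. It is only a sketch under assumed settings, not the experiment from the post: simulated twoClassSim() data again, the built-in rfGA helper functions (random forests as the inner model), 3-fold external cross-validation, a small population, and 10 generations, all chosen to keep the example short rather than to recommend values.

library(caret)
library(randomForest)                        # rfGA fits random forests internally

set.seed(42)
sim <- twoClassSim(200)
x <- sim[, names(sim) != "Class"]
y <- sim$Class

ga_ctrl <- gafsControl(functions = rfGA,     # built-in random forest GA helpers
                       method = "cv",        # external resampling of the GA search
                       number = 3)

ga_fit <- gafs(x = x, y = y,
               iters = 10,                   # number of GA generations
               popSize = 20,                 # chromosomes per generation
               gafsControl = ga_ctrl)

ga_fit$optVariables                          # features in the best chromosome found

The external resampling requested in gafsControl() estimates how well the whole search generalizes, which helps guard against the GA overfitting its chosen feature subset to a single data split.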
Source: r-bloggers.com