Channel: r software hub

Reverse Engineering with Correlated Features


By Arthur Charpentier

In econometric modeling, I often run into problems with correlated features. A few weeks ago, I was discussing feature selection when features are correlated. This week, I was wondering about reverse engineering when features might be correlated (not to say very correlated). The way I see reverse engineering is the following:

  1. someone has a dataset, and a model was fitted on that dataset, but we cannot see how it works
  2. we can generate “fake data”, feed the model with those data, and get predictions
  3. based on those predictions, we hope to fit a model that is close to the ‘true’ model used
  4. one way to measure how good our model is, is to compare its predictions on the initial data with the original dataset (or with the initial ‘true’ values if we use generated datasets).
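
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model or data used in the post: the `black_box` function, the two levels "A" and "B", and the cell-mean surrogate are all hypothetical stand-ins chosen to keep the example small.

```python
import random
from collections import defaultdict

LEVELS = ["A", "B"]  # hypothetical factor levels

def black_box(x1, x2):
    """Step 1: stand-in for the hidden model (assumption). We may only call it."""
    effect = {"A": 1.0, "B": 3.0}
    return effect[x1] + 0.5 * effect[x2]

def generate_fake_data(n, rng):
    """Step 2: draw feature pairs at random, ignoring any correlation."""
    return [(rng.choice(LEVELS), rng.choice(LEVELS)) for _ in range(n)]

def fit_surrogate(rows, predictions):
    """Step 3: a deliberately simple surrogate, the mean prediction per cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row, y in zip(rows, predictions):
        sums[row] += y
        counts[row] += 1
    table = {row: sums[row] / counts[row] for row in sums}
    overall = sum(predictions) / len(predictions)
    # Fall back to the overall mean for a cell never seen in the fake data.
    return lambda x1, x2: table.get((x1, x2), overall)

def mean_abs_error(model, rows, targets):
    """Step 4: compare surrogate predictions with reference values."""
    return sum(abs(model(*r) - t) for r, t in zip(rows, targets)) / len(rows)

rng = random.Random(0)
fake = generate_fake_data(200, rng)
fake_pred = [black_box(*row) for row in fake]   # query the hidden model
surrogate = fit_surrogate(fake, fake_pred)

# "Initial" data in which the two factors always agree (assumption).
initial = [("A", "A"), ("B", "B"), ("A", "A"), ("B", "B")]
initial_true = [black_box(*row) for row in initial]
error = mean_abs_error(surrogate, initial, initial_true)
print(error)
```

Because the hypothetical black box is deterministic and has only four cells, the surrogate recovers it almost trivially here. With noisy predictions or many levels, the choice of fake data matters much more.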

My concern was about those “fake data” when features are correlated. My first concern was that if we generate those fake data randomly (without taking correlations into account), there might be a huge waste of time, since we will be fitting a model on data that might never be observed. For instance, if we have two factors that we assume to be extremely correlated, in the sense that only two of the four possible pairs of levels can be observed, we waste our time by generating all four possible pairs. My second concern was that those useless data might actually mislead our model. If half of the fake data cannot be obtained in practice, but we still get a prediction for them, those predictions might mislead our model, while we should focus on more valid data.
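
To put a rough number on the first concern, here is a minimal sketch of the two-factor example. It assumes hypothetical levels "A" and "B" where only the matching pairs can be observed. Resampling observed rows is just one possible way to respect the correlation, used here for illustration; it is not necessarily the approach taken in the post.

```python
import random

LEVELS = ["A", "B"]                      # hypothetical factor levels
OBSERVABLE = {("A", "A"), ("B", "B")}    # assumption: perfectly correlated factors

def independent_pairs(n, rng):
    """Fake data drawn factor by factor, ignoring the correlation."""
    return [(rng.choice(LEVELS), rng.choice(LEVELS)) for _ in range(n)]

def resampled_pairs(observed, n, rng):
    """Fake data drawn by resampling whole observed rows, keeping the correlation."""
    return [rng.choice(observed) for _ in range(n)]

def wasted_share(pairs):
    """Share of fake rows that could never occur in the real data."""
    return sum(p not in OBSERVABLE for p in pairs) / len(pairs)

rng = random.Random(1)
observed = [("A", "A")] * 50 + [("B", "B")] * 50
naive_waste = wasted_share(independent_pairs(10_000, rng))
aware_waste = wasted_share(resampled_pairs(observed, 10_000, rng))
print(naive_waste, aware_waste)
```

Under these assumptions, about half of the independently generated rows fall on pairs that cannot occur, while the resampled rows never do.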

In order to test those intuitions, consider a simple (simulated) dataset. We have two categorical variables (each of them has 26 levels), and …

Source: r-bloggers.com

