Improved vtreat documentation

Nina Zumel has donated some time to greatly improve the vtreat R package documentation (now available as pre-rendered HTML here).

vtreat is an R data.frame processor/conditioner package that helps prepare real-world data for predictive modeling in a statistically justifiable manner.

Even with modern machine learning techniques (random forests, support vector machines, neural nets, gradient boosted trees, and so on) or standard statistical methods (regression, generalized regression, generalized additive models), there are common data issues that can cause modeling to fail. vtreat deals with a number of these in a principled and automated fashion.

In particular, vtreat emphasizes a concept called “y-aware pre-processing” and implements:

Treatment of missing values through safe replacement plus an indicator column (a simple but very powerful method when combined with downstream machine learning algorithms).
Treatment of novel levels (new values of a categorical variable seen during test or application, but not seen during training) through sub-models (or impact/effects coding of pooled rare events).
Explicit coding of categorical variable levels as new indicator variables (with optional suppression of non-significant indicators).
Treatment of categorical variables with very large numbers of levels through sub-models (again impact/effects coding).
(optional) User-specified significance pruning on levels coded into effects/impact sub-models.
Correct treatment of nested models or sub-models through data split (see here) or through the generation of “cross validated” data frames (see here; these issues are similar to those that arise when building statistically efficient stacked models or super-learners).
Safe processing of “wide data” (data with very many variables, often driving common machine learning algorithms to over-fit) through out-of-sample per-variable significance estimates and user-controllable pruning (something we have lectured on previously here and here).
Collaring/Winsorizing of unexpected out-of-range numeric inputs.
(optional) Conversion of all variables into effects (or “y-scale”) units (through the optional scale argument to vtreat::prepare(), using some of the ideas discussed here). This allows …
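
Two of the ideas in the list above, missing-value replacement with an indicator column and impact coding of categorical levels (including novel levels), can be illustrated with a small sketch. This is not vtreat's implementation or API: it is plain Python written only to show the concepts, every function name is hypothetical, and it assumes the mean as the replacement value and a zero effect for levels not seen in training.

```python
from collections import defaultdict
from statistics import mean


def fit_numeric_treatment(values):
    """Learn a replacement value: the mean of the non-missing entries.

    Assumption for illustration only; None marks a missing value.
    """
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else 0.0


def apply_numeric_treatment(values, fill):
    """Return (cleaned values, is-missing indicator) for a numeric column."""
    cleaned = [fill if v is None else v for v in values]
    is_missing = [1 if v is None else 0 for v in values]
    return cleaned, is_missing


def fit_impact_code(levels, y):
    """Map each training level to (mean of y within the level) - (grand mean)."""
    grand = mean(y)
    by_level = defaultdict(list)
    for level, target in zip(levels, y):
        by_level[level].append(target)
    return {level: mean(ts) - grand for level, ts in by_level.items()}


def apply_impact_code(levels, code):
    """Encode levels; a level not seen during training gets 0.0 (no effect)."""
    return [code.get(level, 0.0) for level in levels]


# Fit on training data, then apply to new data.
fill = fit_numeric_treatment([1.0, None, 3.0])
cleaned, is_missing = apply_numeric_treatment([None, 5.0], fill)

code = fit_impact_code(["a", "a", "b"], [1.0, 3.0, 5.0])
encoded = apply_impact_code(["a", "c"], code)  # "c" is a novel level
```

Because the impact code is computed from the outcome y, it is itself a small sub-model. Fitting it and the downstream model on the same rows is the nested-model problem noted in the list, which is why the data split or “cross validated” data frame approaches matter.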
Source: r-bloggers.com
