Walmart Trip Type Classification was my first real foray into the world of Kaggle and I’m hooked. I previously dabbled in What’s Cooking but that was as part of a team and the team didn’t work out particularly well. As a learning experience the competition was second to none. My final entry put me at position 155 out of 1061 entries which, although not a stellar performance by any means, is just inside the top 15% and I’m pretty happy with that. Below are a few notes on the competition.
Before I get started, congratulations to the competitors at the top of the leaderboard! You guys killed it.
Data Preparation
Getting to my final model was quite a process, with many moments of frustration, disappointment, enlightenment and exhilaration along the way.
The first step was to clean up the data. Generally it appeared to be in pretty good shape, with nothing extraordinary jumping out at me. There were some minor issues though, for example, department labels for both “MENSWEAR” and “MENS WEAR”, which needed to be consolidated into a single category.
My initial submission was a simple decision tree. This was an important part of the process for me because it established that I had a working analysis pipeline, a submission file in valid format, and a local value of the evaluation metric which was consistent with that on the public leaderboard. Submissions were gauged according to log loss and my first model scored 2.79291, quite a bit better than the Random Forest & Department Description benchmark at 5.77216, but still nowhere near competitive.
I then did a few very elementary things with the data and applied an XGBoost model, which resulted in a significant bump in model performance. I hadn’t …read more
Source:: r-bloggers.com