This project was launched as a competition by Expedia on Kaggle. The challenge was to contextualize customer data and predict which of 100 different hotel groups a user is most likely to stay at, which makes it a large multi-class classification problem. Expedia provides two datasets for this project: the training data covers 2013-2014 and the test data covers 2015. In other words, we use data from previous years to make predictions for the future. In this project I attempt to build my own algorithm using collaborative-filtering ideas and the two datasets provided.
Techniques
I took a different approach and combined unsupervised and supervised learning. Drawing on collaborative-filtering concepts, I did not apply one trained model to all users. Instead, I divided users into 5 clusters based on user similarity using KMeans clustering, then wrote an algorithm that trains a separate KNN model for each user cluster. With this approach, accuracy improved significantly, reaching 36%.
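The cluster-then-classify idea described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual code: the class name `ClusteredKNN`, the choice of 5 neighbors, the global fallback model, and the random synthetic data standing in for the Expedia features are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier


class ClusteredKNN:
    """Hypothetical helper: cluster users with KMeans, then fit one KNN per cluster."""

    def __init__(self, n_clusters=5, n_neighbors=5, random_state=0):
        self.n_neighbors = n_neighbors
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        self.models = {}

    def _make_knn(self, X, y):
        # KNN cannot use more neighbors than it has training rows.
        k = min(self.n_neighbors, len(X))
        return KNeighborsClassifier(n_neighbors=k).fit(X, y)

    def fit(self, X, y):
        self.y_dtype = y.dtype
        # Unsupervised step: assign every training row to a user cluster.
        labels = self.kmeans.fit_predict(X)
        # Global model, used only if a cluster ends up without its own model.
        self.fallback = self._make_knn(X, y)
        # Supervised step: one KNN classifier per cluster.
        for c in np.unique(labels):
            mask = labels == c
            self.models[c] = self._make_knn(X[mask], y[mask])
        return self

    def predict(self, X):
        # Route each row to the model of its nearest cluster centroid.
        labels = self.kmeans.predict(X)
        preds = np.empty(len(X), dtype=self.y_dtype)
        for c in np.unique(labels):
            mask = labels == c
            model = self.models.get(c, self.fallback)
            preds[mask] = model.predict(X[mask])
        return preds


# Synthetic stand-in data: 500 rows, 4 numeric features, 100 hotel-group labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = rng.integers(0, 100, size=500)

model = ClusteredKNN(n_clusters=5).fit(X[:400], y[:400])
preds = model.predict(X[400:])
```

Because the labels here are random, the predictions carry no signal; the sketch only shows how rows are routed to per-cluster models. With real data, the features fed to KMeans would need to describe user similarity for the split to help.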
Duration
This project lasted approximately 1 week. Most of the time was spent researching collaborative-filtering recommendation and writing the recommendation algorithm.
Key Skills
Clustering Methods, Classification Models, Feature Engineering, Data Munging
Tools
Python, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn