Project 2 - Recommendation 1 - Taniya D. Adhikari


Skin-Care Recommendation

Knowledge-Based Recommendation Approach

Business Problem

Consumers of skincare products often face the challenge of choosing among a huge variety of options. There are more than ten thousand beauty products on the market, and new ones are added daily. With so many choices, finding the right product for a particular skin concern is time-consuming and difficult. Consumers usually look up reviews to find the best match, but the sheer volume of reviews online is overwhelming, and they must read through many of them before settling on a product.

This project focuses on building a knowledge-based recommendation system for skincare products based on user preferences. The goal is to make the shopping experience quick and easy by recommending top products from the wide variety available on the market. In this project, I attempted to develop a system that uses predictive analytics to recommend products by matching users to highly rated products and to similar users. The recommendation system is an interactive tool: it asks users for their preferences, then recommends products that are highly satisfying to customers, weighted by how closely the user matches customers who had similar skin issues.

Techniques

Data Preparation: Some of the techniques used for cleaning the textual data were tokenization and lemmatization, using the nltk (Natural Language Toolkit) package in Python.
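The two cleaning steps can be sketched as follows. This is a minimal illustration, not the project's code: it uses plain Python so it runs without downloading NLTK corpora, and the regex tokenizer, the tiny LEMMAS table, the STOP_WORDS set, and the clean_review function are all hypothetical stand-ins. In the NLTK version, nltk.word_tokenize and nltk.stem.WordNetLemmatizer().lemmatize would take the place of tokenize and lemmatize (both need NLTK data packages downloaded first).

```python
import re

# Hypothetical hand-written lemma table standing in for NLTK's WordNetLemmatizer,
# which looks words up in the WordNet dictionary instead.
LEMMAS = {
    "products": "product",
    "pores": "pore",
    "wrinkles": "wrinkle",
    "dried": "dry",
    "drying": "dry",
    "was": "be",
    "is": "be",
    "are": "be",
}

# Small illustrative stop-word list (assumption; NLTK ships a much fuller one).
STOP_WORDS = {"the", "a", "an", "and", "my", "it", "this", "to", "of"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, keeping contractions."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lemmatize(token: str) -> str:
    """Map a token to its dictionary form, falling back to the token itself."""
    return LEMMAS.get(token, token)


def clean_review(text: str) -> list[str]:
    """Tokenize a review, drop stop words, and lemmatize what remains."""
    return [lemmatize(t) for t in tokenize(text) if t not in STOP_WORDS]


if __name__ == "__main__":
    print(clean_review("This product dried my skin and the pores are bigger!"))
    # → ['product', 'dry', 'skin', 'pore', 'be', 'bigger']
```

Lemmatization (rather than stemming) keeps each token a real word, so "dried" and "drying" both collapse to "dry" and are counted together in later frequency and similarity steps.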

EDA: I used ggplot2 in R to analyze some of the variables. The two variables I was most interested in were product type and reviews.

I also performed an analysis to detect fake reviews. This was based on the assumption that fake reviews are usually copy-and-paste reviews that reuse the same words, so I looked for pairs of reviews with at least 80% similarity. To do this, I looked at the most common words in the reviews and compared reviews using term frequency-inverse document frequency (TF-IDF) and cosine similarity.
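A minimal sketch of this duplicate-review check is shown below, written in plain Python so the arithmetic is visible. The 0.8 threshold comes from the analysis described above; the function names, the whitespace tokenization, and the sample reviews are hypothetical. IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, which is scikit-learn's default; in practice sklearn's TfidfVectorizer together with sklearn.metrics.pairwise.cosine_similarity would do the same job.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document.

    TF is the raw term count; IDF is ln((1 + n) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def suspicious_pairs(
    reviews: list[str], threshold: float = 0.8
) -> list[tuple[int, int, float]]:
    """Flag review pairs whose TF-IDF cosine similarity is at least `threshold`."""
    docs = [r.lower().split() for r in reviews]
    vecs = tfidf_vectors(docs)
    flagged = []
    for i, j in combinations(range(len(vecs)), 2):
        sim = cosine(vecs[i], vecs[j])
        if sim >= threshold:
            flagged.append((i, j, round(sim, 3)))
    return flagged


if __name__ == "__main__":
    reviews = [
        "best moisturizer ever my skin feels so soft",
        "best moisturizer ever my skin feels so soft and smooth",
        "broke me out and made my oily skin worse",
    ]
    print(suspicious_pairs(reviews))
    # → [(0, 1, 0.852)]
```

Because IDF down-weights words that appear in every review (here "my" and "skin"), two reviews score as near-duplicates only when they share many of the less common words, which is the copy-and-paste pattern the check targets.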

The two main techniques I used for the recommendations were sentiment analysis and cosine similarity.
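How the two signals might be combined can be sketched as below. The write-up does not specify the scoring formula, so this is an assumption-laden illustration: a toy lexicon-based sentiment scorer stands in for whatever sentiment model was actually used, relevance is the cosine similarity between the user's stated preferences and a product's pooled review words, and the final score simply multiplies relevance by average sentiment rescaled to [0, 1]. The lexicons, the recommend function, and the product data are hypothetical.

```python
import math
from collections import Counter

# Toy sentiment lexicons (hypothetical, for illustration only).
POSITIVE = {"love", "great", "soft", "smooth", "gentle", "hydrating", "cleared"}
NEGATIVE = {"breakout", "irritated", "greasy", "burning", "worse", "stinging"}


def sentiment(review: str) -> float:
    """Score a review in [-1, 1] as (pos - neg) / (pos + neg); 0.0 if no lexicon hits."""
    words = review.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(
    preferences: str, products: dict[str, list[str]], top_n: int = 2
) -> list[tuple[str, float]]:
    """Rank products by how well their reviews match the user's preferences,
    scaled by how positive those reviews are on average."""
    user_vec = Counter(preferences.lower().split())
    scores = []
    for name, reviews in products.items():
        review_vec = Counter(w for r in reviews for w in r.lower().split())
        avg_sent = sum(sentiment(r) for r in reviews) / len(reviews)
        # Rescale sentiment from [-1, 1] to [0, 1] so poorly reviewed products sink.
        score = cosine(user_vec, review_vec) * (avg_sent + 1) / 2
        scores.append((name, round(score, 3)))
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]


if __name__ == "__main__":
    products = {
        "Cleanser A": ["gentle cleanser great for oily acne skin", "cleared my acne love it"],
        "Cream B": ["hydrating cream great for dry skin", "love how soft my dry skin feels"],
        "Serum C": ["acne serum gave me a breakout", "irritated my oily skin burning"],
    }
    print(recommend("oily skin acne", products))
    # → [('Cleanser A', 0.617), ('Cream B', 0.28)]
```

In this example "Serum C" matches the user's concerns almost as closely as "Cleanser A", but its uniformly negative reviews drive its score to zero. Multiplying the two signals, rather than adding them, means a product must be both relevant and well reviewed to rank highly.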

Future Applications

This model can be combined with other types of recommendation systems to address the “cold start” problem, which occurs when no user history is available. It could also be converted into a product rating tool, and a similar model could be applied to other kinds of products.

Project Duration

This project lasted approximately 4-6 weeks. Data collection and data preparation took considerable time because I had to scrape web data and create the dataset myself.

Key Skills

Textual Data Cleaning, Textual Data Transformation, Web Scraping, Data Visualization, Unsupervised Learning, Sentiment Analysis, Similarity Analysis, Natural Language Processing.

Tools

Python, Pandas, NumPy, Scikit-learn, ggplot2, R, Gensim, NLTK