
A recommender system that predicts your next order based on your previous purchases. It also discusses the associations between product purchases.


NouranHany/Instacart-Market-Basket-Analysis


Instacart-Market-Basket-Analysis 🛒


Project Description

Open directly from Kaggle Notebooks!

EDA and Interactive Dashboard by Power BI 📊

  • General analysis.
  • Analyzing user behaviour: users who always order the same products.
  • How does time affect customers' purchasing behaviour?
  • Analyzing products.
  • Analyzing organic products.
  • Purchasing behaviour across departments and aisles.
powerBI_dashboard.mp4

Next Order Recommender/Predictor 🍞 🍟 🍩

A predictive model that predicts which products will appear in a user's next order, based on that user's purchase history. The primary key is the user-product pair: for each pair, the model predicts whether the product will be in the next order or not.

An XGBoost classifier was used.

Features with the highest importance in the model:

  • up_orders_since_last_order: measures how long it has been since the user last bought a specific product.
  • up_order_rate_since_first_time: measures how much a user likes a product. It is the rate at which the user has bought the product since the first time he/she ordered it.
  • prod_reorder_ratio: measures how much customers in general like a product.
  • user_reorder_ratio: measures how likely this user is to buy something new (a low ratio means more first-time purchases).

Feature Importance
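The four features above can be illustrated on a toy order history. This is a minimal sketch and not the notebook's code: the tuple layout `(user_id, order_number, product_id, reordered)`, the helper name `build_features`, and the exact formulas (for example, dividing by the number of orders since the first purchase) are assumptions made for illustration.

```python
from collections import defaultdict

# Toy order lines: (user_id, order_number, product_id, reordered).
# The layout and the values are made up for this example.
ORDER_LINES = [
    (1, 1, "banana", 0),
    (1, 1, "milk", 0),
    (1, 2, "banana", 1),
    (1, 3, "banana", 1),
    (1, 3, "bread", 0),
    (1, 4, "milk", 1),
    (2, 1, "banana", 0),
    (2, 2, "bread", 0),
]


def build_features(order_lines):
    """Return {(user, product): feature dict} under the assumed definitions."""
    user_total_orders = defaultdict(int)   # highest order number per user
    up_order_numbers = defaultdict(list)   # orders containing each user-product pair
    prod_flags = defaultdict(list)         # reordered flags per product
    user_flags = defaultdict(list)         # reordered flags per user

    for user, order_no, product, reordered in order_lines:
        user_total_orders[user] = max(user_total_orders[user], order_no)
        up_order_numbers[(user, product)].append(order_no)
        prod_flags[product].append(reordered)
        user_flags[user].append(reordered)

    features = {}
    for (user, product), order_nos in up_order_numbers.items():
        total = user_total_orders[user]
        first, last = min(order_nos), max(order_nos)
        features[(user, product)] = {
            # Orders placed by the user after the last one containing the product.
            "up_orders_since_last_order": total - last,
            # Share of orders containing the product, counted from its first purchase.
            "up_order_rate_since_first_time": len(order_nos) / (total - first + 1),
            # Share of this product's order lines that are reorders (all users).
            "prod_reorder_ratio": sum(prod_flags[product]) / len(prod_flags[product]),
            # Share of this user's order lines that are reorders.
            "user_reorder_ratio": sum(user_flags[user]) / len(user_flags[user]),
        }
    return features


FEATURES = build_features(ORDER_LINES)
```

Each key of `FEATURES` is one user-product pair, matching the primary key described above. In this toy data, user 1 bought bananas in three of four orders, so the pair `(1, "banana")` gets a high order rate and a small gap since the last purchase.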

Association Rules 🍌 ➡️ 🍅

We used the Apriori algorithm to extract association rules from Instacart's data. The following functions are implemented to later serve as API calls in the deployed version:

Function documentation:

  • most_10_frequent_items: takes the cardinality of the itemset and returns the 10 most frequent itemsets of that cardinality.
  • all_items_with_at_least_support_and_len: returns the itemsets of a specific cardinality that satisfy a minimum support.
  • show_itemset_support: returns the support of a given itemset.
  • rules_with_specific_threshold: returns (filters) rules satisfying a given threshold. The threshold can be on confidence, support, lift, leverage, or conviction.
  • select_rules_with_antecedents_length: returns rules with a specific antecedent cardinality.
  • select_rules_with_antecedents_names: returns rules with a specific antecedent.
  • select_rules_with_consequents_names: returns rules with a specific consequent.
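To make the terms used above concrete (support, cardinality, confidence, lift, leverage, conviction), here is a small from-scratch sketch of level-wise Apriori and the rule metrics. It is not the notebook's implementation and does not reproduce the functions listed above: the baskets and the names `support`, `apriori`, and `rule_metrics` are hypothetical.

```python
from itertools import combinations

# Toy baskets, made up for this example.
BASKETS = [
    {"banana", "tomato", "milk"},
    {"banana", "tomato"},
    {"banana", "milk"},
    {"tomato", "bread"},
    {"banana", "tomato", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def apriori(baskets, min_support):
    """Return {frequent itemset: support}, built one cardinality at a time."""
    items = sorted({item for basket in baskets for item in basket})
    level = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that reach the minimum support.
        kept = {}
        for candidate in level:
            s = support(candidate, baskets)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(sub) in kept for sub in combinations(c, k - 1))
        ]
    return frequent


def rule_metrics(antecedent, consequent, baskets):
    """Standard metrics for the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    s_a, s_c, s_ac = support(a, baskets), support(c, baskets), support(a | c, baskets)
    confidence = s_ac / s_a
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": confidence / s_c,
        "leverage": s_ac - s_a * s_c,
        # Conviction is unbounded when the rule never fails.
        "conviction": float("inf") if confidence >= 1 else (1 - s_c) / (1 - confidence),
    }


FREQUENT = apriori(BASKETS, min_support=0.4)
BANANA_TOMATO = rule_metrics({"banana"}, {"tomato"}, BASKETS)
MILK_BANANA = rule_metrics({"milk"}, {"banana"}, BASKETS)
```

Filtering `FREQUENT` by `len(itemset)` gives the itemsets of one cardinality, and filtering rules on any key of the metrics dictionary corresponds to thresholding on that metric. On real data a library implementation would be the practical choice, since this sketch rescans every basket for each candidate.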

Deploying a user use-case

deploying_example.mp4

Project Challenges 🙀

Next Order Recommender

  • The data is sparse: there is a very large number of products, and a customer will have very few of them in his/her next order. The data is heavily skewed toward the negative class. Class distribution: 90% negative class, 10% positive class.
  • We first found that there were a lot of false negatives, so we changed the decision threshold to maximize recall while keeping precision above a minimum value [0.3].
  • In other words, we wanted to reduce the false negatives: the products the model says the user won't buy in the next order when he/she actually will. On the other hand, it's acceptable to allow some false positives, where the model recommends a product the user is less likely to buy in his/her next order.
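The threshold choice described above (maximize recall subject to a precision floor of 0.3) can be sketched as a scan over candidate thresholds. This is an illustrative sketch, not the notebook's code: the scores and labels are made up, and `precision_recall` and `pick_threshold` are hypothetical helper names.

```python
# Toy predicted probabilities and true labels (1 = product is in the next order).
SCORES = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
Y_TRUE = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]


def precision_recall(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(y_true, scores, min_precision=0.3):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least min_precision.
    Ties on recall are broken in favour of higher precision."""
    best = None
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall(y_true, scores, threshold)
        if precision < min_precision:
            continue
        if best is None or (recall, precision) > (best[2], best[1]):
            best = (threshold, precision, recall)
    return best  # None if no threshold meets the precision floor


DEFAULT_PR = precision_recall(Y_TRUE, SCORES, 0.5)
BEST = pick_threshold(Y_TRUE, SCORES, min_precision=0.3)
```

On this toy data the default threshold of 0.5 misses one of the three positives, while the selected threshold of 0.35 recovers all of them at the cost of extra false positives, which is the trade-off the bullets above describe.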

The PR-Curve

Project Organization

./
├── EDA
|     ├── eda-on-instacart-data.ipynb 
|     └── Instacart Power Bi dashboard.pdf            
├── Model
|     ├── Association Rules Using Apriori.ipynb                                              
|     └── predictive-analysis-model.ipynb 
├── Business Insights 
|     ├── Business Questions-Solution.pdf                                              
|     └── Project-Data Description.pdf
├── Example of Deployment
|     ├── model.html
|     ├── index.html
|     ├── rules.html
|     ├── js
|     └── css
├── Presentation
|     └── DS - Presentation.pptx 
└── images

Todos

  • Apply dynamic thresholding to each user's products according to his/her average basket size.
  • Continue developing the time-dependent features.
  • Apply oversampling and undersampling, and observe whether this helps with the class imbalance problem.
  • Deploy the full model.

Contributors

Toka Khaled
Noran Hany
