Feature Store implementation for storing product information and text/visual embeddings for further use in datascience projects.

mcagri/FeatureStore

Feature Store with MongoDB and SimpleEmbeddingService

Sample Dataset

The sample Home Depot product data was downloaded from https://data.world/

You can also check out my project on Data.World

Compose Files

MongoDB

In this tutorial, MongoDB stores the embeddings along with the rest of the product data.

sudo docker volume create mongodb_data

sudo docker-compose up -d

version: '3.7'
services:
  mongodb_container:
    image: mongo:latest
    environment:
      MONGO_INITDB_ROOT_USERNAME: username
      MONGO_INITDB_ROOT_PASSWORD: password
    ports:
      - 27017:27017
    volumes:
      - mongodb_data:/data/db

volumes:
  mongodb_data:
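
Once the container is up, a product record and its embedding can be written from Python. The following is a minimal sketch, not the repository's own code: it assumes the `pymongo` package is installed, and the database name `feature_store`, the collection name `products`, and the field names are hypothetical. The credentials and port are the ones from the compose file above.

```python
from urllib.parse import quote_plus


def mongo_uri(user: str, password: str, host: str = "localhost", port: int = 27017) -> str:
    """Build a connection URI; credentials are percent-encoded as MongoDB URIs require."""
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"


def product_document(product_id: str, name: str, embedding: list[float]) -> dict:
    """Shape one product record with its embedding stored next to the other fields."""
    # Field names here are illustrative assumptions, not the repository's schema.
    return {"_id": product_id, "name": name, "name_embedding": [float(x) for x in embedding]}


def upsert_product(doc: dict, uri: str) -> None:
    """Insert or replace one product; needs pymongo and the running container."""
    from pymongo import MongoClient  # imported lazily so the helpers above work without it

    with MongoClient(uri) as client:
        # "feature_store" and "products" are hypothetical names.
        client["feature_store"]["products"].replace_one({"_id": doc["_id"]}, doc, upsert=True)
```

With the container running, `upsert_product(product_document("sku-1", "Hammer", [0.1, 0.2]), mongo_uri("username", "password"))` would write a single document. Using the product id as `_id` keeps repeated loads idempotent.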

PostgreSQL

A PostgreSQL server with the pgvector extension is used. The vector storage and indexing capabilities of pgvector are not covered in this tutorial.

sudo docker volume create postgresql_data

sudo docker-compose up -d

services:
  db:
    hostname: db
    # PostgreSQL image with the pgvector extension preinstalled
    image: ankane/pgvector
    ports:
      - 5432:5432
    restart: always
    environment:
      - POSTGRES_DB=vectordb
      - POSTGRES_USER=testuser
      - POSTGRES_PASSWORD=testpwd
      # "trust" accepts connections without a password; local development only
      - POSTGRES_HOST_AUTH_METHOD=trust
    volumes:
      - postgresql_data:/var/lib/postgresql/data

volumes:
  postgresql_data:
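
To check that the server is reachable, a small Python sketch can connect and create a table. This is an illustration under stated assumptions: the `psycopg2` package is installed, and the `products` table with its two columns is a hypothetical schema, not the one used in the repository. The database name, user, password and port are taken from the compose file above.

```python
def postgres_url(user: str = "testuser", password: str = "testpwd",
                 host: str = "localhost", port: int = 5432, db: str = "vectordb") -> str:
    """Build a libpq-style connection URL from the compose file settings."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


# Hypothetical table layout, used only for this example.
CREATE_PRODUCTS_SQL = """
CREATE TABLE IF NOT EXISTS products (
    product_id TEXT PRIMARY KEY,
    name       TEXT NOT NULL
)
"""


def create_products_table(url: str) -> None:
    """Create the example table; needs psycopg2 and the running container."""
    import psycopg2  # imported lazily so postgres_url() works without it

    conn = psycopg2.connect(url)
    try:
        # `with conn` commits on success and rolls back on error.
        with conn, conn.cursor() as cur:
            cur.execute(CREATE_PRODUCTS_SQL)
    finally:
        conn.close()
```

Calling `create_products_table(postgres_url())` against the running container should be safe to repeat, since the statement uses `IF NOT EXISTS`.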

FileProcessor Notebook

This notebook performs the initial setup of the PostgreSQL data source and demonstrates how the embedding service is used in the ETL process that stores product data and embeddings. It generates embeddings for the product names.
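
The shape of such an ETL step can be sketched in a few lines of Python. This is not the notebook's code: `toy_embed` is a stand-in that hashes character trigrams into a small vector so the example runs without a model, and it is not the SimpleEmbeddingService. The column names `product_id` and `name` and the output field names are assumptions as well.

```python
import csv
import hashlib
import io
import math
from typing import Callable, Iterable, Iterator


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Placeholder embedding: hash character trigrams into `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        digest = hashlib.md5(padded[i:i + 3].encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def rows_to_documents(rows: Iterable[dict],
                      embed: Callable[[str], list[float]]) -> Iterator[dict]:
    """Transform step: skip rows without a name and attach an embedding to the rest."""
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue
        yield {"_id": row["product_id"], "name": name, "name_embedding": embed(name)}


# Extract step: a tiny in-memory CSV stands in for the downloaded dataset.
sample_csv = "product_id,name\n1,Cordless Drill\n2,\n3,Claw Hammer\n"
docs = list(rows_to_documents(csv.DictReader(io.StringIO(sample_csv)), toy_embed))
```

Passing the embedding function as a parameter means a real embedding service can replace `toy_embed` without touching the transform step. The resulting `docs` can then be loaded into the store of choice.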

Next Steps

  1. Initialize Databases - ✅
  2. Create embeddings for product names and create collection - ✅
  3. Extend SimpleEmbeddingService to handle images - ✅
  4. Create image embeddings for product images and update collection - ✅
  5. Automate feature store updates with Prefect - ✅
