A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.

marcomoldovan/multimodal-self-distillation

This repository contains the code for two distinct but closely related research projects that share much of the same codebase. The second project extends the first one to the multimodal domain.

General Representation Learning through Latent Space Masking and Prediction

PyTorch Lightning Config: Hydra Template

Description

We want to generalize the self-distillation learning paradigm so that it can be applied to any kind of unimodal or fused multimodal data without the need for modality-specific augmentation or masking strategies. To this end, we embed the input data into a universal input array and apply a single masking strategy in the latent space rather than in the data space. We test this generalized approach on a multitude of datasets containing text, image, audio, and video data.
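The idea of masking in the latent space can be illustrated with a small, framework-free toy. The sketch below is not the repository's implementation: `mask_latents`, `encode`, `ema_update`, and `masked_regression_loss` are hypothetical names, the element-wise "encoder" is a stand-in for a real network, and the mask ratio and decay are arbitrary. It also assumes the common self-distillation setup in which the teacher is an exponential moving average (EMA) of the student, which the description above does not spell out.

```python
import random


def mask_latents(latents, mask_ratio, mask_vector, rng):
    """Replace a random subset of latent positions with a mask vector."""
    n_masked = max(1, round(len(latents) * mask_ratio))
    masked_idx = sorted(rng.sample(range(len(latents)), n_masked))
    masked = [list(mask_vector) if i in masked_idx else list(vec)
              for i, vec in enumerate(latents)]
    return masked, masked_idx


def encode(weights, sequence):
    """Toy stand-in for an encoder: scales every latent vector element-wise."""
    return [[w * x for w, x in zip(weights, vec)] for vec in sequence]


def ema_update(teacher, student, decay):
    """Assumed teacher update: teacher = decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def masked_regression_loss(student_out, teacher_out, masked_idx):
    """Mean squared error computed only at the masked positions."""
    total, count = 0.0, 0
    for i in masked_idx:
        for s, t in zip(student_out[i], teacher_out[i]):
            total += (s - t) ** 2
            count += 1
    return total / count


rng = random.Random(0)
# The "universal input array": 8 latent vectors of width 2, whatever the modality.
latents = [[float(i), float(i) + 0.5] for i in range(8)]
mask_vector = [0.0, 0.0]
student_w, teacher_w = [1.0, 2.0], [0.5, 1.5]

masked, masked_idx = mask_latents(latents, 0.25, mask_vector, rng)
student_out = encode(student_w, masked)   # student sees the masked latents
teacher_out = encode(teacher_w, latents)  # teacher sees the full latents
loss = masked_regression_loss(student_out, teacher_out, masked_idx)
teacher_w = ema_update(teacher_w, student_w, decay=0.99)
```

Because the mask is applied after embedding, the same `mask_latents` step works for any modality that can be mapped into the shared latent array.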

How to run

Install dependencies

#TODO update this section to run with poetry

# clone project
git clone https://github.com/marcomoldovan/multimodal-self-distillation
cd multimodal-self-distillation

# install the correct python version
sudo apt-get install python3.10 # Linux, Python 3.7 or higher
brew install python@3.10 # macOS, Python 3.7 or higher
choco install python --version=3.9 # Windows, Python 3.7-3.9

# create python virtual environment and activate it
python3 -m venv myenv
source myenv/bin/activate

# if you have several versions of python, you can create a virtual environment with a specific version:
virtualenv --python=/usr/bin/<python3.x> myenv
# on Windows, activate the environment with:
myenv\Scripts\activate.bat

# [ALTERNATIVE] create conda environment
conda create -n myenv python=<3.x>
conda activate myenv

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Train model with default configuration

# train on CPU
python train.py trainer.gpus=0

# train on GPU
python train.py trainer.gpus=1

Train model with chosen experiment configuration from configs/experiment/unimodal

python train.py experiment=unimodal/experiment_name.yaml

You can override any parameter from the command line like this

python train.py trainer.max_epochs=20 datamodule.batch_size=64
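Conceptually, each `key=value` argument addresses one entry in the nested configuration by its dotted path. The sketch below mimics that behaviour with plain dictionaries; it is a simplified stand-in and not Hydra's actual parser, and `apply_overrides` as well as the default values shown are hypothetical.

```python
import ast
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        try:
            value = ast.literal_eval(raw)  # "20" -> 20, "0.1" -> 0.1
        except (ValueError, SyntaxError):
            value = raw                    # anything else stays a string
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# Hypothetical defaults, for illustration only.
defaults = {
    "trainer": {"max_epochs": 10, "gpus": 0},
    "datamodule": {"batch_size": 32},
}
cfg = apply_overrides(defaults, ["trainer.max_epochs=20", "datamodule.batch_size=64"])
```

Entries that are not overridden, such as `trainer.gpus` here, keep their default values.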

Self-Supervised Multimodal Alignment with Self-Distillation

Description

We view pairs of multimodal datapoints as augmentations of the same semantic concept and leverage this observation to apply the self-distillation paradigm to the multimodal setting in order to learn a coordinated multimodal representation space. We show that this approach learns a representation space that is more aligned than the one learned with a standard contrastive loss while avoiding the need for negative mining, a crucial weakness of the contrastive approach.
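To make the contrast with a contrastive objective concrete, the toy sketch below computes a loss from positive pairs only: each modality's student embedding is regressed onto the teacher embedding of the paired datapoint from the other modality. It rests on assumptions rather than on the repository's code: the symmetric mean-squared-error form and the names `mse` and `cross_modal_distillation_loss` are illustrative choices.

```python
def mse(a, b):
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_modal_distillation_loss(student_a, teacher_b, student_b, teacher_a):
    """Symmetric regression over paired embeddings; no negative samples involved.

    Row i of every argument belongs to the same pair, e.g. an image
    (modality A) and its caption (modality B).
    """
    pair_losses = [
        0.5 * (mse(sa, tb) + mse(sb, ta))
        for sa, tb, sb, ta in zip(student_a, teacher_b, student_b, teacher_a)
    ]
    return sum(pair_losses) / len(pair_losses)


# One pair whose embeddings already agree, one pair that is misaligned.
student_a = [[1.0, 0.0], [1.0, 0.0]]
teacher_b = [[1.0, 0.0], [0.0, 0.0]]
student_b = [[1.0, 0.0], [0.0, 0.0]]
teacher_a = [[1.0, 0.0], [0.0, 0.0]]
loss = cross_modal_distillation_loss(student_a, teacher_b, student_b, teacher_a)
```

Since every term compares a datapoint only with its own partner, the loss never needs other items in the batch as negatives.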

How to run

Install dependencies

# clone project
git clone https://github.com/marcomoldovan/multimodal-self-distillation
cd multimodal-self-distillation

# install the correct python version
sudo apt-get install python3.10 # Linux, Python 3.7 or higher
brew install python@3.10 # macOS, Python 3.7 or higher
choco install python --version=3.9 # Windows, Python 3.7-3.9

# create python virtual environment and activate it
python3 -m venv myenv
source myenv/bin/activate

# if you have several versions of python, you can create a virtual environment with a specific version:
virtualenv --python=/usr/bin/<python3.x> myenv
# on Windows, activate the environment with:
myenv\Scripts\activate.bat

# [ALTERNATIVE] create conda environment
conda create -n myenv python=<3.x>
conda activate myenv

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Train model with default configuration

# train on CPU
python train.py trainer.gpus=0

# train on GPU
python train.py trainer.gpus=1

Train model with chosen experiment configuration from configs/experiment/multimodal

python train.py experiment=multimodal/experiment_name.yaml

You can override any parameter from the command line like this

python train.py trainer.max_epochs=20 datamodule.batch_size=64