General Representation Learning through Latent Space Masking and Prediction

This repository contains the code for two closely related research projects that share most of their codebase. The second project extends the first one to the multimodal domain.

General Representation Learning through Latent Space Masking and Prediction

Description

We want to generalize the self-distillation learning paradigm so that it can be applied to any kind of unimodal or fused multimodal data without the need for modality-specific augmentation or masking strategies. Instead, we embed the input data into a universal input array and apply a single masking strategy in the latent space rather than in the data space. We test this generalized approach on a multitude of datasets containing text, image, audio and video data.
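As a rough illustration of masking in the latent space rather than the data space, the sketch below replaces a random subset of positions in a latent array with a shared mask vector. It is a minimal sketch under stated assumptions, not the repository's implementation: the function name `mask_latents`, the mask ratio, the all-zero mask vector and the use of plain Python lists are all hypothetical choices made for the example.

```python
import random


def mask_latents(latents, mask_ratio, mask_vector, rng):
    """Replace a random subset of latent positions with a shared mask vector.

    latents: list of latent vectors, one per position in the latent array.
    Returns a masked copy and the sorted indices that were masked.
    """
    num_masked = int(round(mask_ratio * len(latents)))
    masked_idx = sorted(rng.sample(range(len(latents)), num_masked))
    masked_set = set(masked_idx)
    masked = [
        list(mask_vector) if i in masked_set else list(vec)
        for i, vec in enumerate(latents)
    ]
    return masked, masked_idx


# Hypothetical latent array: 8 positions with 4 channels each.
# The same function applies regardless of which modality produced the latents.
rng = random.Random(0)
latents = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
mask_vector = [0.0] * 4  # assumption: a fixed vector; a learned one is equally plausible

masked, masked_idx = mask_latents(latents, 0.5, mask_vector, rng)
print("masked positions:", masked_idx)
```

Because the mask operates on latent positions, nothing in it depends on whether the input was text, an image, audio or video, which is the property the description aims for.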

How to run

Install dependencies

TODO: update this section to run with poetry

# clone project
git clone https://github.com/marcomoldovan/multimodal-self-distillation
cd multimodal-self-distillation

# install the correct python version
sudo apt-get install python3.10 # Linux, Python 3.7 or higher
brew install python@3.10 # macOS, Python 3.7 or higher
choco install python --version=3.9 # Windows, Python 3.7-3.9

# create python virtual environment and activate it
python3 -m venv myenv
source myenv/bin/activate

# if you have several versions of python you can create a virtual environment with a specific version:
virtualenv --python=/usr/bin/<python3.x> myenv
# on Windows, activate the environment with:
myenv\Scripts\activate.bat

# [ALTERNATIVE] create conda environment
conda create -n myenv python=<3.x>
conda activate myenv

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Train model with default configuration

# train on CPU
python train.py trainer.gpus=0

# train on GPU
python train.py trainer.gpus=1

Train model with chosen experiment configuration from configs/experiment/unimodal

python train.py experiment=unimodal/experiment_name.yaml

You can override any parameter from the command line like this:

python train.py trainer.max_epochs=20 datamodule.batch_size=64
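To make the override syntax concrete, the sketch below shows how a dotted `key=value` argument could map onto a nested configuration. It assumes the dotted keys address nested config groups, as in Hydra-style configuration; that is an assumption, since the configuration library is not named here. The helper `apply_override` and the default values in `config` are hypothetical and only illustrate the idea.

```python
def apply_override(config, override):
    """Apply one 'dotted.key=value' override to a nested dict, in place."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Walk down to the parent group, creating it if it does not exist.
        node = node.setdefault(name, {})
    # Interpret the value as an integer when possible, else keep the string.
    try:
        value = int(raw_value)
    except ValueError:
        value = raw_value
    node[leaf] = value
    return config


# Hypothetical defaults; the real values live in the configs directory.
config = {
    "trainer": {"max_epochs": 10, "gpus": 0},
    "datamodule": {"batch_size": 32},
}

for item in ["trainer.max_epochs=20", "datamodule.batch_size=64"]:
    apply_override(config, item)

print(config)
```

Keys that are not overridden, such as `trainer.gpus` in this example, keep their configured values.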

Self-Supervised Multimodal Alignment with Self-Distillation

Description

We view pairs of multimodal datapoints as augmentations of the same semantic concept and leverage this observation to apply the self-distillation paradigm to the multimodal setting in order to learn a coordinated multimodal representation space. We show that this approach is able to learn a representation space that is more aligned than the one learned by a standard contrastive loss while avoiding the need for negative mining, a crucial weakness of the contrastive approach.
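The idea of treating a paired datapoint as an augmentation can be sketched with a toy self-distillation step: a student encodes one modality, a teacher encodes the paired other modality, and the student regresses onto the teacher's output, so no negative examples are involved. This is a minimal sketch under stated assumptions, not the repository's training loop: the elementwise `encode` function, the mean squared error loss, the exponential moving average (EMA) teacher update, the decay of 0.9 and all numeric values are hypothetical.

```python
def encode(weights, features):
    """Toy elementwise 'encoder': scales each feature by one weight."""
    return [w * x for w, x in zip(weights, features)]


def mse(pred, target):
    """Mean squared error between two equally sized vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def ema_update(teacher, student, decay):
    """Move each teacher weight toward the student weight (EMA)."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Hypothetical weights and one hypothetical text-image pair.
student = [1.0, 1.0, 1.0]
teacher = [0.5, 0.5, 0.5]
text_features = [0.2, -0.4, 0.6]
image_features = [0.1, -0.3, 0.7]

# The teacher output on the paired modality serves as a fixed target;
# the loss uses only this positive pair, no negatives are mined.
target = encode(teacher, image_features)
pred = encode(student, text_features)
loss = mse(pred, target)

# In a real setup the student would be updated by gradient descent on the
# loss; here only the teacher's EMA update is shown.
teacher = ema_update(teacher, student, decay=0.9)
print("loss:", loss, "teacher:", teacher)
```

A contrastive loss would additionally need embeddings of non-matching pairs to push apart; the regression target above removes that requirement.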

How to run

Install dependencies

# clone project
git clone https://github.com/marcomoldovan/multimodal-self-distillation
cd multimodal-self-distillation

# install the correct python version
sudo apt-get install python3.10 # Linux, Python 3.7 or higher
brew install python@3.10 # macOS, Python 3.7 or higher
choco install python --version=3.9 # Windows, Python 3.7-3.9

# create python virtual environment and activate it
python3 -m venv myenv
source myenv/bin/activate

# if you have several versions of python you can create a virtual environment with a specific version:
virtualenv --python=/usr/bin/<python3.x> myenv
# on Windows, activate the environment with:
myenv\Scripts\activate.bat

# [ALTERNATIVE] create conda environment
conda create -n myenv python=<3.x>
conda activate myenv

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Train model with default configuration

# train on CPU
python train.py trainer.gpus=0

# train on GPU
python train.py trainer.gpus=1

Train model with chosen experiment configuration from configs/experiment/multimodal

python train.py experiment=multimodal/experiment_name.yaml

You can override any parameter from the command line like this:

python train.py trainer.max_epochs=20 datamodule.batch_size=64
