# VideoMAEv2 Model Zoo

## Model Weight Links

Please fill out the VideoMAE V2 Download Request Form. After you submit it, the download link for the VideoMAE V2 model weights will be shown. The form asks for some information about your organization and how you plan to use the model, so that we can better understand our users' needs and improve our future work.

The weights of the distilled models can be downloaded directly from the Distillation section.

## Pre-train

| Model | Config | Dataset | Encoder Masking | Decoder Masking | Epoch | #Frame |
|---|---|---|---|---|---|---|
| ViT-giant | vit_g_hybrid_pt_1200e | UnlabeledHybrid | tube (90%) | running cell (50%) | 1200 | 16 |
- We use different sampling intervals for videos from different sources in UnlabeledHybrid: 2 for SSv2 and 4 for all other datasets.
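As an illustration of the pre-training input described above, the sketch below samples 16 frames at the source-dependent stride (2 for SSv2, 4 otherwise) and builds a 90% tube mask, in which one random spatial mask is shared by every temporal position. This is a minimal stdlib sketch, not the repository's data pipeline. The function names, the source labels, the clamping of short videos, and the tubelet size and patch grid (2 and 14x14) are all assumptions. The 50% running-cell decoder mask is not shown.

```python
import random

# Strides from the note above: 2 for SSv2, 4 for every other source.
# The source keys ("ssv2", ...) are hypothetical labels.
SAMPLING_INTERVAL = {"ssv2": 2}
DEFAULT_INTERVAL = 4


def sample_frame_indices(video_len, source, num_frames=16, rng=random):
    """Pick `num_frames` frame indices at a fixed, source-dependent stride.

    The start offset is random so that the clip fits inside the video.
    Indices are clamped to the last frame when the video is too short
    (an assumption, not necessarily how the official loader pads).
    """
    interval = SAMPLING_INTERVAL.get(source, DEFAULT_INTERVAL)
    span = (num_frames - 1) * interval + 1
    start = rng.randint(0, max(video_len - span, 0))
    return [min(start + i * interval, video_len - 1) for i in range(num_frames)]


def tube_mask(num_frames=16, tubelet_size=2, grid=(14, 14), ratio=0.9, rng=random):
    """Tube masking: one random spatial mask repeated over every temporal slice.

    Returns token-level booleans (True = masked) in (time, height, width) order.
    `tubelet_size` and `grid` are illustrative assumptions.
    """
    num_spatial = grid[0] * grid[1]
    num_masked = int(ratio * num_spatial)
    masked = set(rng.sample(range(num_spatial), num_masked))
    frame_mask = [p in masked for p in range(num_spatial)]
    # Repeating the same spatial pattern along time forms the "tubes".
    return frame_mask * (num_frames // tubelet_size)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_frame_indices(300, "ssv2", rng=rng))  # 16 indices, stride 2
    mask = tube_mask(rng=rng)
    print(len(mask), sum(mask))  # 1568 1408 (8 slices x 196 patches, 176 masked each)
```

Sharing a single spatial mask across time means a masked patch cannot simply be copied from a neighbouring frame, which is the usual motivation for tube masking.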

## Fine-tune

| Model | Config | Dataset | Pre-train | Post-pre-train | #Frame | Top-1 | Top-5 |
|---|---|---|---|---|---|---|---|
| ViT-giant | vit_g_hybrid_pt_1200e_k710_ft | K710 | UnlabeledHybrid | None | 16x5x3 | 83.8 | 96.4 |
| ViT-giant | vit_g_hybrid_pt_1200e_k400_ft | K400 | UnlabeledHybrid | None | 16x5x3 | 87.2 | 97.4 |
| ViT-giant | vit_g_hybrid_pt_1200e_k710_it_k400_ft | K400 | UnlabeledHybrid | K710 | 16x5x3 | 88.4 | 98.0 |
| ViT-giant | vit_g_hybrid_pt_1200e_k710_it_k600_ft | K600 | UnlabeledHybrid | K710 | 16x5x3 | 88.8 | 98.2 |
| ViT-giant | vit_g_hybrid_pt_1200e_ssv2_ft | SSv2 | UnlabeledHybrid | None | 16x2x3 | 77.0 | 95.9 |
| ViT-giant | vit_g_hybrid_pt_1200e_k710_it_ucf101_ft | UCF101 | UnlabeledHybrid | K710 | 16x5x3 | 99.6 | 100.0 |
| ViT-giant | vit_g_hybrid_pt_1200e_k710_it_hmdb51_ft | HMDB51 | UnlabeledHybrid | K710 | 16x5x3 | 88.1 | 98.5 |
- We report fine-tuning accuracy with sparse sampling on SSv2 and with dense sampling on the other datasets.
- #Frame = #input_frame x #clip x #crop.
- The input resolution is $224^2$ for all models.
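The #Frame notation determines how many views each test video is scored on. For example, 16x5x3 means 5 temporal clips x 3 spatial crops = 15 views of 16 frames each, and 16x2x3 for SSv2 gives 6 views. The sketch below parses that notation and computes video-level Top-1/Top-5 accuracy. It assumes that per-view softmax scores are averaged into one prediction per video, which is a common convention but is not stated in this file. How the clips themselves are sampled (sparse vs. dense) is outside the sketch, and all function names are hypothetical.

```python
import math


def parse_frame_spec(spec):
    """Split a '#Frame' entry such as '16x5x3' into (frames, clips, crops)."""
    frames, clips, crops = (int(v) for v in spec.split("x"))
    return frames, clips, crops


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def video_scores(view_logits):
    """Average per-view softmax scores into one video-level score vector (assumed convention)."""
    probs = [softmax(v) for v in view_logits]
    n = len(probs)
    return [sum(col) / n for col in zip(*probs)]


def topk_correct(scores, label, k):
    """True if `label` is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]


def topk_accuracy(all_scores, labels, k):
    """Percentage of videos whose label is in the top-k predictions."""
    hits = sum(topk_correct(s, y, k) for s, y in zip(all_scores, labels))
    return 100.0 * hits / len(labels)


if __name__ == "__main__":
    frames, clips, crops = parse_frame_spec("16x5x3")
    print(frames, clips * crops)  # 16 15
    # Toy logits for one video: 15 identical views over 4 classes.
    scores = video_scores([[2.0, 1.0, 0.0, -1.0]] * (clips * crops))
    print(topk_accuracy([scores], [1], k=1), topk_accuracy([scores], [1], k=5))  # 0.0 100.0
```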

## Distillation

| Model | Dataset | Teacher Model | #Frame | K710 Top-1 | K400 Top-1 | K600 Top-1 | Checkpoint |
|---|---|---|---|---|---|---|---|
| ViT-small | K710 | vit_g_hybrid_pt_1200e_k710_ft | 16x5x3 | 77.6 | 83.7 | 83.1 | vit_s_k710_dl_from_giant.pth |
| fine-tuning accuracy | | | 16x7x3 | -- | 84.0 | 84.6 | -- |
| ViT-base | K710 | vit_g_hybrid_pt_1200e_k710_ft | 16x5x3 | 81.5 | 86.6 | 85.9 | vit_b_k710_dl_from_giant.pth |
| fine-tuning accuracy | | | 16x7x3 | -- | 87.1 | 87.4 | -- |
- We initialize the student model's parameters from the model obtained after the post-pre-train stage.
- Fine-tuning accuracy refers to the accuracy achieved by further fine-tuning the distilled model for several epochs on the specified dataset.
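This file does not specify the distillation objective. Purely as a point of reference, the sketch below implements a standard logit-distillation loss. It computes the KL divergence from the temperature-softened teacher distribution (here, the teacher would be vit_g_hybrid_pt_1200e_k710_ft) to the student's, then scales the result by T². Treat it as a generic, assumed formulation rather than the recipe used to produce vit_s_k710_dl_from_giant.pth or vit_b_k710_dl_from_giant.pth. The function names and the temperature value are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of temperature-scaled logits (log-sum-exp for stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Assumed generic distillation loss: KL(teacher || student) at temperature T.

    Multiplied by T**2, following the common Hinton-style convention; this is
    not confirmed to be the VideoMAE V2 distillation objective.
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.2]  # hypothetical teacher logits
    student = [2.0, 1.5, 0.5]  # hypothetical student logits
    print(kd_loss(student, teacher, temperature=2.0))  # positive: distributions differ
    print(kd_loss(teacher, teacher, temperature=2.0))  # 0.0
```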