
Still no 512 or 768 pre-trained model? #88

Open
FurkanGozukara opened this issue Sep 20, 2023 · 21 comments

Comments

@FurkanGozukara

256 is working, but the resolution is just too low.

relative_find_best_frame_true_square_aspect_ratio_vox.mp4
relative_find_best_frame_false_org_aspect_ratio_vox.mp4
@Qia98

Qia98 commented Sep 26, 2023

I'm trying to train at 512 or higher resolution, but I've run into some challenges getting 512 datasets.

@FurkanGozukara
Author

I'm trying to train at 512 or higher resolution, but I've run into some challenges getting 512 datasets.

Do you have a model I can test?

@EdgarMaucourant

Hey @nieweiqiang ,

You could use the Talking Head 1KH dataset; it is good for faces and has a lot of videos at 512 or 768 (along with many other resolutions, so you might want to use a script to resize the videos).

@FurkanGozukara
Author

Hey @nieweiqiang ,

You could use the Talking Head 1KH dataset; it is good for faces and has a lot of videos at 512 or 768 (along with many other resolutions, so you might want to use a script to resize the videos).

Do you have any pre-trained model I can test? Or any tutorial on how to train it ourselves? I can get GPU power.

@EdgarMaucourant

EdgarMaucourant commented Oct 2, 2023

I don't have a pre-trained model (yet) as it is still training, and I can't give you a full tutorial since I'm not the author and did not go into all the details, but I can describe what I did to get the training going.

First of all you need a dataset to train on, I used this one: https://github.com/tcwang0509/TalkingHead-1KH

The repo only includes the scripts, but it is quite easy to use. Just a few things to note:

  • You need around 2 TB of free disk space to download the full dataset. The scripts will scrape a bunch of videos from YouTube, then crop the videos to the face and extract the interesting parts into smaller clips. At the end you will have around 500K videos ranging from 10 to 800 frames.
  • The scripts in this repo are meant to be used in a Linux environment. I tried to convert the bash scripts into BAT scripts to use on Windows, but as I was short on time I abandoned that idea and ended up using WSL2 (Linux on Windows). So either write BAT scripts yourself, or install WSL2 and use your Windows partitions (automatically mounted into the Linux distro) as storage. For WSL2 see https://learn.microsoft.com/en-us/windows/wsl/install
  • The videos cropped from the originals don't all have the same dimensions (though they seem to be square), so you will have to resize them or exclude the resolutions you don't want to use. I trained for a 512x512 result, so I resized them to that size using the training script of Thin Plate (see below). You can resize them before training if you have a script/software to do it; I went for the easy path and used the training script instead (a minimal resize sketch is shown after this list).
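
If you prefer to resize the clips up front rather than letting the training script do it, a minimal sketch is below. This is not from the thread or the repo; it assumes ffmpeg is installed and the folder paths are placeholders you would adapt to the TalkingHead-1KH layout.

```python
# Hypothetical helper (not part of TalkingHead-1KH or Thin Plate): resize every
# cropped clip to 512x512 with ffmpeg before training. Assumes ffmpeg is on PATH.
import subprocess
from pathlib import Path

SRC = Path("train/cropped_clips")   # placeholder: folder with the cropped mp4 clips
DST = Path("train_512")             # placeholder: output folder for resized clips
DST.mkdir(parents=True, exist_ok=True)

for clip in SRC.glob("*.mp4"):
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(clip),
         "-vf", "scale=512:512",    # force 512x512; the cropped clips are roughly square
         str(DST / clip.name)],
        check=True,
    )
```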

Once you have your dataset, don't try to extract the frames from the videos. I tried that and you would need more than 10 TB of storage (over 50 million frames extracted). Even if you have the time and the storage, despite what the Thin Plate documentation says, the scripts are meant to ingest mp4 videos, not frames (some parts seem to handle frames, but the main script only looks for mp4 files).

The script expects a hierarchy of folders as input that is not the same as the Talking Head (TH) dataset. So you will have to create a new folder (call it whatever you like; this will be your source folder) with two subfolders: train and test. Copy (or move) the content of the train/cropped_clips folder from TH to the train folder, and the content of the val/cropped_clips folder from TH to the test folder.
Also, the scripts seem to generate a bunch of invalid videos that make the training fail, so I just removed all files under 20KB in size and that solved it (around 15,000 videos removed); a small cleanup sketch follows.
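
A minimal sketch of that cleanup step, not from the repo; the source folder path is a placeholder and 20KB is just the threshold mentioned above:

```python
# Hypothetical cleanup helper (not from the repo): delete the tiny/invalid clips
# that were making training fail. 20KB is the threshold used in the comment above.
from pathlib import Path

SOURCE = Path("my_dataset")   # placeholder: the source folder containing train/ and test/
MIN_BYTES = 20 * 1024

removed = 0
for clip in SOURCE.rglob("*.mp4"):
    if clip.stat().st_size < MIN_BYTES:
        clip.unlink()         # remove the invalid clip
        removed += 1
print(f"Removed {removed} clips under 20KB")
```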

The hardest part is knowing what to put in the YAML config file.
First of all, in the config folder, copy/paste one of the existing config files. I used vox-256.yaml as it was the closest to my dataset (talking faces). In the file I made the following changes (a sketch of the edited values is shown after the list):

  • In dataset_params:
    • Change root_dir to the path of the source folder you created before (make sure you use the source folder path, not train or test).
  • In train_params:
    • Change num_epochs to 2 (100 is large; you want to test on a small number of epochs first and raise the number if needed).
    • Change num_repeats to 2 (the dataset already has a very large number of videos as inputs). This setting repeats training on the same videos multiple times, in this case twice.
    • Change epoch_milestones to [1,2] because you only have 2 epochs.
    • Change batch_size to 5. This number is difficult to estimate and depends entirely on your GPU memory. If it is too large the training will fail quite quickly (within 2 to 5 minutes) and the message will clearly state that torch tried to allocate more memory than available; if that happens, lower the number until it passes. 5 works fine on my RTX 3090 with 24GB of VRAM.
    • Change dataloader_workers to 0. This should not be necessary; I lowered it while trying to solve the GPU memory issue above and forgot to set it back to 12, so feel free to keep it at 12.
    • Change checkpoint_freq to 1 (because you don't have many epochs).
    • Change bg_start to 1.
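
As an illustration only, here is what the edited sections of a copied vox-256.yaml could look like. The keys and values are just the edits listed above; the root_dir path is a placeholder, and any key not shown keeps its original value from the repo's config.

```yaml
dataset_params:
  root_dir: /path/to/my_dataset   # placeholder: source folder containing train/ and test/

train_params:
  num_epochs: 2
  num_repeats: 2
  epoch_milestones: [1, 2]
  batch_size: 5                   # lower this if torch reports an out-of-memory error
  dataloader_workers: 0           # 12 should also be fine
  checkpoint_freq: 1
  bg_start: 1
```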

The last change: you want to train at a specific size (512x512 in my case), so you have to make sure the videos are resized to that size. From what I can read in the script, you should be able to do that by setting the frame_shape setting in your YAML file (under the dataset_params section). However, I did not find the correct format for that setting; in the script it is defined as (256,256,3), but that value did not work when I put it in the YAML file.
So I went the easy way and hardcoded the value in the script directly. You can do this by replacing line 70 in frames_dataset.py from self.frame_shape = frame_shape to self.frame_shape = (512,512,3).
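
For clarity, the hardcoded workaround described above would look like this in frames_dataset.py (the exact line number may differ between versions of the repo):

```python
# frames_dataset.py, around line 70: hardcoded workaround from the comment above.
# Original line: self.frame_shape = frame_shape
self.frame_shape = (512, 512, 3)  # force frames to be loaded/resized at 512x512
```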

Then you should be good to go! Just run the run.py script in the folder, passing in your config file, and voilà!

Note that I'm not an expert, and I'm still trying to get a trained model, so I don't guarantee these are the best steps; they are only what I did to get it working so far.

@FurkanGozukara
Author

FurkanGozukara commented Oct 2, 2023

@EdgarMaucourant awesome, man, thanks for such a detailed explanation.

If you already have checkpoints, can you send me the latest one?

If you don't want to share it publicly, you can email me: furkangozukara@gmail.com

@EdgarMaucourant

Hey @FurkanGozukara,

I will share them when I have them, but for now it is still training... It will probably take several more days to train as the dataset is large.

@FurkanGozukara
Author

Hey @FurkanGozukara,

I will share them when I have them, but for now it is still training... It will probably take several more days to train as the dataset is large.

Awesome, looking forward to it. You are doing an amazing job.

@skyler14

skyler14 commented Oct 7, 2023

Anything peculiar come up while training at higher resolutions? I'm going to follow this.

@ak01user

ak01user commented Oct 7, 2023

@EdgarMaucourant How is the model training going? I trained for more than ten hours, 200 epochs, image size 384*384, but the results are not very good. I plan to continue training.

@EdgarMaucourant

Actually the training failed after 65 hours without any output :'(
I did not have time to relaunch it until now, so I started with a much smaller dataset to see how it goes.

@FurkanGozukara
Author

Actually the training failed after 65 hours without any output :'( I did not have time to relaunch it until now, so I started with a much smaller dataset to see how it goes.

sad

Looking forward to results

@Qia98

Qia98 commented Oct 8, 2023

(quoting EdgarMaucourant's training walkthrough above)

I changed the same things for 512 training. The dataset I used is VoxCeleb2. I resized the dataset to 512 and converted the mp4 videos to png frames; it takes about 11TB (and that is only a part of the dataset). If I use mp4 for training, it costs about 10 hours per epoch, but in png format it costs about 1 hour per epoch, so about 3 days in total (a sketch of the mp4-to-png step is shown below).
The config of my training is:
num_epochs: 100
num_repeats: 200 (the dataset is only a part, so I increased num_repeats)
batch_size: 8
Other parameters are the same as vox-256.
Also, in frames_dataset.py I changed the image size by hardcoding it.
But I didn't get a good checkpoint out of it.
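
For reference only, a minimal sketch of that mp4-to-png step is below. It is not from the thread or the repo; it assumes ffmpeg is installed, the paths are placeholders, and the per-frame file naming is just an example.

```python
# Hypothetical frame-extraction helper (not from the repo): dump each mp4 clip into a
# folder of png frames, already resized to 512x512. Assumes ffmpeg is on PATH.
import subprocess
from pathlib import Path

SRC = Path("train")        # placeholder: folder of mp4 clips
DST = Path("train_png")    # placeholder: output root, one subfolder per clip

for clip in SRC.glob("*.mp4"):
    out_dir = DST / clip.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(clip),
         "-vf", "scale=512:512",
         str(out_dir / "%07d.png")],   # one numbered png per frame
        check=True,
    )
```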

@Qia98

Qia98 commented Oct 8, 2023

Actually the training failed after 65 hours without any output :'( I did not have time to relaunch it until now, so I started with a much smaller dataset to see how it goes.

Can I see your log.txt? My training is normal; the loss is stable and converging:
from perceptual - 99.74809; equivariance_value - 0.39179; warp_loss - 5.25956; bg - 0.25512
to perceptual - 68.15993; equivariance_value - 0.15263; warp_loss - 0.67301; bg - 0.03551

@Qia98

Qia98 commented Oct 9, 2023

image-20231009 (attached screenshot): When training the 512 model, I noticed that the visualized picture appears to have been cropped.

Has anyone ever encountered this problem? I want to know whether there's something wrong with my frames_dataset.py or the dataset format.

@EdgarMaucourant

Hi @nieweiqiang ,

Probably the code that generates that visualization is hardcoded to 256x256; I did not look at the code, but I would suspect that.

On my end I'm giving up. Sorry guys, I was doing this in my spare time, and whatever I tried fails at some point because I'm short on memory or space on my computer (32 GB of RAM is not enough, I think, or maybe it is the GPU RAM). I tried reducing the number of repeats and the number of items in the dataset, but whatever I do it fails at some point, and I don't have the time to look into this any further.

I hope that what I shared above for the yaml file was insightful, and I wish you all the best in training a model!

@FurkanGozukara
Author


so sad to hear :(

@thhung

thhung commented Oct 12, 2023

@FurkanGozukara Do you plan to continue the work of @EdgarMaucourant ?

@FurkanGozukara
Author

@FurkanGozukara Do you plan to continue the work of @EdgarMaucourant ?

I have no idea right now how to prepare the dataset and start training.

@ak01user

image-20231009 (attached screenshot): When training the 512 model, I noticed that the visualized picture appears to have been cropped.
Has anyone ever encountered this problem? I want to know whether there's something wrong with my frames_dataset.py or the dataset format.

This phenomenon occurs when I interrupt the program during saving.

@huangxin168

Qia98 commented Oct 9, 2023:

Have you solved the problem? I also want to train a 512 model.
