
[Feature] Support any shape of Llava (from llava 1.6) #460

Open · wants to merge 5 commits into main

Conversation

@hhaAndroid (Collaborator)

No description provided.

@hhaAndroid hhaAndroid requested a review from LZHgrla March 11, 2024 07:52
@choyakawa

Not working with ZeRO-3: #432 (comment)

@hhaAndroid (Collaborator, Author)

> Not working with ZeRO-3: #432 (comment)

QLoRA does not currently support ZeRO-3.

@choyakawa

> > Not working with ZeRO-3: #432 (comment)
>
> QLoRA does not currently support ZeRO-3.

It is not an issue with 4-bit quantization. I used full fine-tuning with no LoRA, yet the 'newline' parameter is still somehow not compatible with ZeRO-3.
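A minimal sketch of the suspected failure mode, assuming DeepSpeed's standard ZeRO-3 behavior (the repro below is hypothetical and not from this PR; `image_newline` mirrors the parameter name in `anyshape_llava.py`):

```python
import torch
import torch.nn as nn
import deepspeed

# Under ZeRO-3, DeepSpeed partitions every parameter across ranks, so outside
# of a forward pass a parameter's local storage can have zero elements.
image_newline = nn.Parameter(torch.empty(4096))
# Local shard under ZeRO-3: image_newline.shape == torch.Size([0]), so
# image_newline[:, None, None].expand(4096, 32, 1) raises:
#   RuntimeError: The expanded size of the tensor (4096) must match the
#   existing size (0) at non-singleton dimension 0.

# DeepSpeed's documented way to touch a partitioned parameter outside the
# forward pass is to gather it temporarily:
with deepspeed.zero.GatheredParameters([image_newline], modifier_rank=None):
    col = image_newline[:, None, None].expand(4096, 32, 1)
```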

@awzhgw

awzhgw commented Apr 24, 2024

@hhaAndroid @tpoisonooo When will this PR be merged? I need it urgently.

@awzhgw

awzhgw commented Apr 24, 2024

@hhaAndroid

When I start pretraining, I get the following error. What is causing it?

RuntimeError: The expanded size of the tensor (4096) must match the existing size (0) at non-singleton dimension 0.  Target sizes: [4096, 32, 1].  Tensor sizes: [0, 1, 1]
    model = self.train_loop.run()  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/loops.py", line 270, in run
    self.runner.call_hook('before_train')
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/_flexible_runner.py", line 1271, in call_hook
    getattr(hook, fn_name)(self, **kwargs)
  File "/export/App/training_platform/PinoModel/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 221, in before_train
    self._generate_samples(runner, max_new_tokens=50)
  File "/export/App/training_platform/PinoModel/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 207, in _generate_samples
    self._eval_images(runner, model, device, max_new_tokens,
  File "/export/App/training_platform/PinoModel/xtuner/xtuner/engine/hooks/anyshape_evaluate_chat_hook.py", line 53, in _eval_images
    image_features = model.preprocess_for_pixel_values({
  File "/export/App/training_platform/PinoModel/xtuner/xtuner/model/anyshape_llava.py", line 109, in preprocess_for_pixel_values
    self.image_newline[:, None, None].expand(
RuntimeError: The expanded size of the tensor (4096) must match the existing size (0) at non-singleton dimension 0.  Target sizes: [4096, 32, 1].  Tensor sizes: [0, 1, 1]
[2024-04-24 13:52:01,685] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2444983) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
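For context, the failing line suggests `image_newline` has been partitioned away by ZeRO-3 (local shape `[0]` instead of `[4096]`), matching the incompatibility reported above. A rough sketch of what `preprocess_for_pixel_values` computes at that line when the parameter is intact, with shapes reconstructed from the error message (LLaVA 1.6's "anyres" scheme appends a learned newline embedding after each row of image patches; variable names here are illustrative):

```python
import torch

# Shapes taken from the error message: Target sizes [4096, 32, 1].
hidden_size, n_rows = 4096, 32
image_feature = torch.randn(hidden_size, n_rows, n_rows)  # (C, H, W) patch grid
image_newline = torch.randn(hidden_size)                  # learned newline embedding

# The failing expression: broadcast the newline embedding into one extra
# column per patch row, then append it so each row ends with a "newline".
newline_col = image_newline[:, None, None].expand(hidden_size, n_rows, 1)
image_feature = torch.cat([image_feature, newline_col], dim=-1)  # (4096, 32, 33)

# Under ZeRO-3 the local image_newline shard is empty (shape [0]), so the
# expand() above raises exactly the reported RuntimeError.
```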

@awzhgw

awzhgw commented Apr 24, 2024

@hhaAndroid Can this PR support Llama 3 8B?
