[DeviceMesh] Supported N groups in from_group
#126258
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126258. Note: links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure as of commit d757bd4 with merge base 91bf952.

NEW FAILURES - the following jobs have failed:

FLAKY - the following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 7b4fd9ebe36785533e60d1bbfea31fdc1b7479af Pull Request resolved: #126258
…group`"

- This PR mainly adds `mesh_shape` to `from_group()` so that the user can construct an ND (N > 1) device mesh from a process group. This is to unblock HSDP, where we can pass the overall data parallel process group to `from_group()` with `mesh_shape = (replicate_dim_size, shard_dim_size)` and `from_group()` will construct subgroups for the user.
- Constructing the 2D `DeviceMesh` from an existing shard process group and replicate process group is hard because we cannot easily recover the array of ranks in their parent group on each rank in general.
- This PR also adds `mesh_dims` to `from_group()` so that the user can name the mesh dimensions of the constructed device mesh.

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k

[ghstack-poisoned]
ghstack-source-id: 9b0f4ed648a6ffad4414c036ed05ea1099f362e4 Pull Request resolved: #126258
ghstack-source-id: d69f7f205f9c189409a8c76ae3d29b72aa11130d Pull Request resolved: #126258
…group`"

**Overview**

- This PR mainly adds `mesh_shape` to `from_group()` so that the user can construct an ND (N > 1) device mesh from a process group. This is to unblock HSDP, where we can pass the overall data parallel process group to `from_group()` with `mesh_shape = (replicate_dim_size, shard_dim_size)` and `from_group()` will construct subgroups for the user. (The user can then get the subgroups from the submeshes.)
- Constructing the 2D `DeviceMesh` from an existing shard process group and replicate process group is hard because we cannot easily recover the array of ranks in their parent group on each rank in general.
- This PR also adds `mesh_dim_names` to `from_group()` so that the user can name the mesh dimensions of the constructed device mesh.

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k

[ghstack-poisoned]
ghstack-source-id: 29f8816f6a80183c441b019784918b967fef187e Pull Request resolved: #126258
looks awesome!
ghstack-source-id: 1a9b841498bd94cc25498358777dd1a9b7cba0f3 Pull Request resolved: #126258
ghstack-source-id: c74b058293bd07de7d1150780e7a6d31388870a0 Pull Request resolved: #126258
```python
# Use the global PG as the parent group (in practice, this could be a
# subgroup of the global PG)
dp_group = dist.distributed_c10d._get_default_group()
```
The assumption is that the trainer is still able to construct this overall DP process group. Then, the `mesh` arg can be constructed from it as below.
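For illustration, here is a minimal plain-Python sketch (the helper name is hypothetical) of deriving the 2D `mesh` argument from the overall DP group's global ranks by a row-major reshape, where rows index the replicate dimension and columns index the shard dimension:

```python
# Hypothetical helper: build the 2D `mesh` argument for
# DeviceMesh.from_group() from the overall DP group's global ranks.
# Row-major reshape: rows are replicate groups, columns are shard groups.
def build_hsdp_mesh(dp_ranks, replicate_size, shard_size):
    assert len(dp_ranks) == replicate_size * shard_size
    return [
        dp_ranks[r * shard_size : (r + 1) * shard_size]
        for r in range(replicate_size)
    ]

# 8 DP ranks arranged as 2 replicas x 4 shards
mesh = build_hsdp_mesh(list(range(8)), 2, 4)
# mesh == [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In the real call the nested list (or an equivalent tensor) would be passed as the `mesh` argument alongside the process groups.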
…group`"

**Overview**

This PR supports constructing an ND mesh with `from_group()` by passing in `group: List[ProcessGroup]` and `mesh: Union[torch.Tensor, "ArrayLike"]` together. The `ndim` of the device mesh returned from `from_group()` is equal to the number of `ProcessGroup`s passed. If the `ndim` is greater than 1, then the `mesh` argument is required (since there is no simple way to recover the `mesh` tensor from the process groups otherwise).

This PR also adds `mesh_dim_names` as an argument to forward to the device mesh for convenience.

<details>
<summary>Old Approach</summary>

**Overview**

- This PR mainly adds `mesh_shape` to `from_group()` so that the user can construct an ND (N > 1) device mesh from a process group. This is to unblock HSDP, where we can pass the overall data parallel process group to `from_group()` with `mesh_shape = (replicate_dim_size, shard_dim_size)` and `from_group()` will construct subgroups for the user. (The user can then get the subgroups from the submeshes.)
- Constructing the 2D `DeviceMesh` from an existing shard process group and replicate process group is hard because we cannot easily recover the array of ranks in their parent group on each rank in general.
- This PR also adds `mesh_dim_names` to `from_group()` so that the user can name the mesh dimensions of the constructed device mesh.

</details>

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k

[ghstack-poisoned]
ghstack-source-id: 902775a48b7350058ea477652677f2a9e6154082 Pull Request resolved: #126258
The new approach sounds good to me!
periodic / linux-focal-rocm6.1-py3.8 / test (distributed, 1, 2, linux.rocm.gpu) failure is known and unrelated. #126380
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 2 checks: periodic / linux-focal-cuda11.8-py3.9-gcc9 / test (multigpu, 1, 1, linux.g5.12xlarge.nvidia.gpu), periodic / linux-focal-rocm6.1-py3.8 / test (distributed, 1, 2, linux.rocm.gpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Command
Details for Dev Infra team: raised by workflow job.
ghstack-source-id: bb94cd7d13d4c2c4fea0033710dacd51dbc341e4 Pull Request resolved: #126258
ghstack-source-id: e4587a5d259fd1be4af7f566e5ddc0cb6fd12e97 Pull Request resolved: #126258
linux-focal-cuda11.8-py3.9-gcc9 / test (multigpu, 1, 1, linux.g5.12xlarge.nvidia.gpu) failure unrelated. linux-focal-rocm6.1-py3.8 / build failure unrelated.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 2 checks: periodic / linux-focal-rocm6.1-py3.8 / build, periodic / linux-focal-cuda11.8-py3.9-gcc9 / test (multigpu, 1, 1, linux.g5.12xlarge.nvidia.gpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: trunk / linux-focal-rocm6.1-py3.8 / test (default, 2, 2, linux.rocm.gpu). Details for Dev Infra team: raised by workflow job.
```diff
 """
-Contstructs a :class:`DeviceMesh` with ``device_type`` from an
+Constructs a :class:`DeviceMesh` with ``device_type`` from an
 existing :class:`ProcessGroup`.

-The constructed device mesh is assumed to be 1D.
+The constructed device mesh has number of dimensions equal to the
```
Is there any requirement on the input process groups? The test case `test_2d_process_group_init` first builds a global PG, then builds `dp_shard_group` and `dp_replicate_group` based on the global PG. Is it required to always build the global PG first, or can these two process groups be created independently?
I think the two process groups can be constructed independently.
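To sketch why that is plausible (plain Python, hypothetical helper names; real code would pass these rank lists to `dist.new_group`), each rank can derive both rank lists from just its own rank, the world size, and the shard size, without materializing a parent group first:

```python
# Hypothetical helpers: derive the shard-group and replicate-group rank
# lists for HSDP directly from (rank, world_size, shard_size).
# Ranks are laid out row-major: consecutive ranks form a shard group;
# ranks a stride of shard_size apart form a replicate group.
def shard_group_ranks(rank, shard_size):
    start = (rank // shard_size) * shard_size
    return list(range(start, start + shard_size))

def replicate_group_ranks(rank, world_size, shard_size):
    return list(range(rank % shard_size, world_size, shard_size))

# world_size = 8, shard_size = 4: rank 5 shards with [4, 5, 6, 7]
# and replicates with [1, 5].
```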
trunk / linux-focal-rocm6.1-py3.8 / test (default, 2, 2, linux.rocm.gpu) failure unrelated.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 3 checks: trunk / linux-focal-rocm6.1-py3.8 / test (default, 2, 2, linux.rocm.gpu), periodic / linux-focal-rocm6.1-py3.8 / build, periodic / linux-focal-cuda11.8-py3.9-gcc9 / test (multigpu, 1, 1, linux.g5.12xlarge.nvidia.gpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#126258 Approved by: https://github.com/wanchaol
Stack from ghstack (oldest at bottom):

- #126258 [DeviceMesh] Supported N groups in `from_group`

**Overview**

This PR supports constructing an ND mesh with `from_group()` by passing in `group: List[ProcessGroup]` and `mesh: Union[torch.Tensor, "ArrayLike"]` together. The `ndim` of the device mesh returned from `from_group()` is equal to the number of `ProcessGroup`s passed. If the `ndim` is greater than 1, then the `mesh` argument is required (since there is no simple way to recover the `mesh` tensor from the process groups otherwise).

This PR also adds `mesh_dim_names` as an argument to forward to the device mesh for convenience.

<details>
<summary>Old Approach</summary>

**Overview**

- This PR mainly adds `mesh_shape` to `from_group()` so that the user can construct an ND (N > 1) device mesh from a process group. This is to unblock HSDP, where we can pass the overall data parallel process group to `from_group()` with `mesh_shape = (replicate_dim_size, shard_dim_size)` and `from_group()` will construct subgroups for the user. (The user can then get the subgroups from the submeshes.)
- Constructing the 2D `DeviceMesh` from an existing shard process group and replicate process group is hard because we cannot easily recover the array of ranks in their parent group on each rank in general.
- This PR also adds `mesh_dim_names` to `from_group()` so that the user can name the mesh dimensions of the constructed device mesh.

</details>

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k
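As a plain-Python illustration of that contract (hypothetical names; the actual check lives inside `DeviceMesh.from_group()`), one `ProcessGroup` is expected per mesh dimension, and the mesh is required once more than one group is passed:

```python
# Hypothetical, torch-free sketch of the argument contract described
# above: len(groups) must equal the mesh's ndim, and `mesh` is
# mandatory whenever more than one group is passed.
def mesh_ndim(mesh):
    # Number of dimensions of a nested-list mesh of ranks.
    ndim = 0
    while isinstance(mesh, list):
        ndim += 1
        mesh = mesh[0]
    return ndim

def check_from_group_args(num_groups, mesh=None):
    if num_groups > 1 and mesh is None:
        raise ValueError("mesh is required when passing more than one group")
    ndim = 1 if mesh is None else mesh_ndim(mesh)
    if ndim != num_groups:
        raise ValueError(f"expected {ndim} groups, got {num_groups}")
    return ndim

check_from_group_args(2, mesh=[[0, 1], [2, 3]])  # ok: 2 groups, 2D mesh
```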