Use torch generic workflow for CI, add ssh, artifacts #325

Merged
merged 15 commits into from
May 15, 2024

@wconstab (Contributor) commented May 14, 2024

Stack from ghstack (oldest at bottom):

This moves our CI over to the standard PyTorch CI job template (doc).

The general advantage is that we can more easily add features or options in a maintained way. A specific reason is that I was not able to SSH-debug on our old CI, and @seemethere mentioned that the 'generic workflow' is where the CI SSH support lives.

SSH
Use ssh just like pytorch/pytorch CI:
image

Artifacts Uploading
The job.dump_folder for each test is uniquely named, and the folders are bundled into an outputs.zip that can be downloaded from the GitHub Actions UI:
image

  • profiles
  • checkpoints
  • flight recorder comm_dumps (if any hang happened during CI)

To implement the artifacts upload, the following changes are made to test_runner.py:

  • job.dump_folder is set to a unique subfolder for each test, so test outputs don't overwrite each other
  • the root for job.dump_folder is artifacts-to-be-uploaded
  • the 'default' configuration gets tested explicitly with its own dump_dir specified
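
A minimal sketch of the per-test dump folder idea described above, not the actual test_runner.py code. The `IntegrationCase` class, the helper names, and the `--job.dump_folder=` command-line form are assumptions made for illustration; only the `artifacts-to-be-uploaded` root and the `job.dump_folder` setting come from this PR description.

```python
import os
from dataclasses import dataclass

# Root folder name taken from the PR description.
ARTIFACTS_ROOT = "artifacts-to-be-uploaded"


@dataclass(frozen=True)
class IntegrationCase:
    """Hypothetical description of one integration test."""
    name: str                    # unique test name, reused as the subfolder name
    extra_args: tuple = ()       # hypothetical extra command-line overrides


def dump_folder_for(test_name: str, root: str = ARTIFACTS_ROOT) -> str:
    """Return a dump folder unique to this test so outputs don't overwrite each other."""
    return os.path.join(root, test_name)


def build_overrides(case: IntegrationCase) -> list:
    """Build the override list for one test, always pinning its dump folder.

    The `--job.dump_folder=` flag spelling is an assumption.
    """
    return [f"--job.dump_folder={dump_folder_for(case.name)}", *case.extra_args]


if __name__ == "__main__":
    # The 'default' configuration is listed explicitly so it gets its own folder too.
    cases = [
        IntegrationCase("default"),
        IntegrationCase("with_flag", ("--hypothetical.flag",)),
    ]
    for case in cases:
        print(case.name, build_overrides(case))
```

Because every test writes under its own subfolder of one root, the CI job only needs to upload that single root directory to collect all outputs.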

wconstab added a commit that referenced this pull request May 14, 2024
ghstack-source-id: fc0b11e214ee065e9843a926b6610d58c78dc989
Pull Request resolved: #325
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 14, 2024
wconstab added a commit that referenced this pull request May 14, 2024
ghstack-source-id: 3a4a6a3d7e557386bb78e9ad629bd9af429cade2
Pull Request resolved: #325
wconstab added a commit that referenced this pull request May 14, 2024
ghstack-source-id: 6ba9a97f2012ff99ed334e4cbfc35804d7bb192b
Pull Request resolved: #325
wconstab added a commit that referenced this pull request May 14, 2024
ghstack-source-id: b26e91b0d7b92cdb8d55859186777c1ff6669503
Pull Request resolved: #325
wconstab added a commit that referenced this pull request May 14, 2024
ghstack-source-id: 008612abbfe85656a1cae9ef590710687744bfd6
Pull Request resolved: #325
wconstab added a commit that referenced this pull request May 14, 2024
ghstack-source-id: 9f410e6b6b7d616c7beb044ac9382be3179df812
Pull Request resolved: #325
wconstab added a commit that referenced this pull request May 14, 2024
ghstack-source-id: 121fe0b027a4d20b9ec49da1bbf17d5d1cc1928a
Pull Request resolved: #325
@wconstab wconstab mentioned this pull request May 15, 2024
@wconstab wconstab mentioned this pull request May 15, 2024
@pytorch pytorch deleted a comment from codecov-commenter May 15, 2024
@wconstab wconstab changed the title Use torch generic workflow for CI Use torch generic workflow for CI, add ssh, artifacts May 15, 2024
@wconstab wconstab requested a review from wanchaol May 15, 2024 19:31
@wanchaol (Contributor) left a comment

The debuggability of CIs looks awesome!

@wconstab wconstab merged commit 7b4c79a into gh/wconstab/14/base May 15, 2024
4 checks passed
wconstab added a commit that referenced this pull request May 15, 2024
ghstack-source-id: b1fa8d8c1778ecb532ed71792ead9f4dbb067cf4
Pull Request resolved: #325
@wconstab wconstab deleted the gh/wconstab/14/head branch May 15, 2024 22:14
tianyu-l added a commit that referenced this pull request May 22, 2024
…rchdata import failure"


1. Use the same generic torch CI workflow for the periodic integration test, as in #325 for the CPU/GPU unit tests.
2. `StatefulDataloader` is not in an official `torchdata` release yet. Print a helper message if the user doesn't have a recent nightly installed.
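
One way the helper message in item 2 could look, as a sketch only and not the code from that commit. The function name `import_with_hint` and the message wording are hypothetical; the pattern is simply to catch the failed import and re-raise it with installation advice attached.

```python
import importlib


def import_with_hint(module_name: str, attr_name: str, hint: str):
    """Import `attr_name` from `module_name`, adding `hint` to the error on failure."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    except (ImportError, AttributeError) as exc:
        # Re-raise as ImportError so callers see one error type with actionable advice.
        raise ImportError(
            f"Could not import {attr_name} from {module_name}. {hint}"
        ) from exc


# Possible usage (the module path is an assumption about where the class lives):
# StatefulDataLoader = import_with_hint(
#     "torchdata.stateful_dataloader",
#     "StatefulDataLoader",
#     "Please install a recent torchdata nightly.",
# )
```

Catching `AttributeError` as well covers the case where the package is installed but too old to contain the class.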

tianyu-l pushed a commit that referenced this pull request May 28, 2024
ghstack-source-id: b1fa8d8c1778ecb532ed71792ead9f4dbb067cf4
Pull Request resolved: #325