Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compat expansion; forge refactor #13302

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

brianolson
Copy link
Contributor

Description

Expand "compat" forge test to simultaneously do traffic generation, gather stats, and run a gradual upgrade of validator nodes. Ensure that TPS stays high enough during upgrade.

Lots of refactor to get there, replacing lots of &Foo and &mut Foo with Arc<Mutex<Foo>> to make things multithread capable.

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Other (specify)

How Has This Been Tested?

This is tests. This PR running tests in forge clusters is the test.

Key Areas to Review

Is this idiomatic Rust? Is there a better way?

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

@brianolson brianolson added CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR CICD:build-performance-images build performance docker image variants labels May 16, 2024
@brianolson brianolson self-assigned this May 16, 2024
Copy link

trunk-io bot commented May 16, 2024

⏱️ 4h 34m total CI duration on this PR
Job Cumulative Duration Recent Runs
forge-compat-test / forge 54m 🟥
rust-images-performance / rust-all 33m 🟥🟩
execution-performance / single-node-performance 31m 🟥🟥🟥🟥
rust-images / rust-all 22m 🟥🟩
forge-e2e-test / forge 20m 🟩
rust-smoke-tests 20m 🟥🟥
rust-targeted-unit-tests 14m 🟥🟥
execution-performance / test-target-determinator 13m 🟩🟩🟩🟩
rust-lints 9m 🟥🟥
run-tests-main-branch 8m 🟩🟩
check 8m 🟩🟩
rust-build-cached-packages 8m 🟩🟩
test-target-determinator 6m 🟩🟩
rust-move-tests 6m 🟩🟩
cli-e2e-tests / run-cli-tests 6m 🟩
check-dynamic-deps 5m 🟩🟩🟩🟩
general-lints 5m 🟩🟩
indexer-grpc-e2e-tests / test-indexer-grpc-docker-compose 2m 🟩
semgrep/ci 2m 🟩🟩🟩🟩
node-api-compatibility-tests / node-api-compatibility-tests 49s 🟩
file_change_determinator 23s 🟩🟩
file_change_determinator 19s 🟩🟩
file_change_determinator 18s 🟩🟩
permission-check 8s 🟩🟩
permission-check 8s 🟩🟩🟩
permission-check 6s 🟩🟩
permission-check 4s 🟩🟩
determine-docker-build-metadata 4s 🟩🟩
permission-check 4s 🟩🟩

🚨 4 jobs on the last run were significantly faster/slower than expected

Job Duration vs 7d avg Delta
forge-compat-test / forge 54m 14m +287%
rust-images / rust-all 18m 13m +40%
forge-e2e-test / forge 20m 15m +33%
rust-move-tests 3m 11m -74%

settingsfeedbackdocs ⋅ learn more about trunk.io

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Arc<std::sync::Mutex<LocalAccount>> -> Arc<LocalAccount>
  because it contains an atomic counter which hides mutability and is safe, no additional mutex needed or desired
Some back to just &LocalAccount
Add tokio Handle to NetworkContextSynchronizer and use it for async-ness inside NetworkTest run() implementations.
NetworkContextSynchronizer Arc<Mutex<NetworkContext<'t>>> -> Arc<tokio::sync::Mutex<NetworkContext<'t>>>
  because tokio contaminates and enblobifiies all

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

❌ Forge suite compat failure on a68e71c05caebf01504d4499110f3fba213fb53d ==> 0fafd8e51e2e17ae3405d929dd4d4a077082f3d3

Forge test runner terminated:
Trailing Log Lines:
             at ./testsuite/forge/src/runner.rs:600:30
  19: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:428:11
  20: forge::main
             at ./testsuite/forge-cli/src/main.rs:354:21
  21: core::ops::function::FnOnce::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:292"},"thread_name":"main","hostname":"forge-compat-pr-13302-1717189208-a68e71c05caebf01504d4499110f3f","timestamp":"2024-05-31T21:03:33.703193Z","message":"Deleting namespace forge-compat-pr-13302: Some(NamespaceStatus { conditions: None, phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:400"},"thread_name":"main","hostname":"forge-compat-pr-13302-1717189208-a68e71c05caebf01504d4499110f3f","timestamp":"2024-05-31T21:03:33.703222Z","message":"aptos-node resources for Forge removed in namespace: forge-compat-pr-13302"}
Debugging output:
NAME                                    READY   STATUS      RESTARTS   AGE
aptos-node-0-validator-0                1/1     Running     0          2m55s
aptos-node-1-validator-0                1/1     Running     0          2m55s
aptos-node-2-validator-0                1/1     Running     0          2m55s
aptos-node-3-validator-0                1/1     Running     0          2m55s
genesis-aptos-genesis-eforge191-gpv9f   0/1     Completed   0          3m6s

Copy link
Contributor

❌ Forge suite realistic_env_max_load failure on 0fafd8e51e2e17ae3405d929dd4d4a077082f3d3

Forge test runner terminated:
Trailing Log Lines:
             at ./testsuite/forge/src/runner.rs:600:30
  19: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:428:11
  20: forge::main
             at ./testsuite/forge-cli/src/main.rs:354:21
  21: core::ops::function::FnOnce::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:292"},"thread_name":"main","hostname":"forge-e2e-pr-13302-1717189218-0fafd8e51e2e17ae3405d929dd4d4a077","timestamp":"2024-05-31T21:04:33.652450Z","message":"Deleting namespace forge-e2e-pr-13302: Some(NamespaceStatus { conditions: None, phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:400"},"thread_name":"main","hostname":"forge-e2e-pr-13302-1717189218-0fafd8e51e2e17ae3405d929dd4d4a077","timestamp":"2024-05-31T21:04:33.652477Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-13302"}
Debugging output:
NAME                                    READY   STATUS      RESTARTS   AGE
aptos-node-0-fullnode-eforge207-0       1/1     Running     0          3m34s
aptos-node-0-validator-0                1/1     Running     0          3m34s
aptos-node-1-fullnode-eforge207-0       1/1     Running     0          3m35s
aptos-node-1-validator-0                1/1     Running     0          3m35s
aptos-node-2-fullnode-eforge207-0       1/1     Running     0          3m35s
aptos-node-2-validator-0                1/1     Running     0          3m35s
aptos-node-3-fullnode-eforge207-0       1/1     Running     0          3m35s
aptos-node-3-validator-0                1/1     Running     0          3m35s
aptos-node-4-fullnode-eforge207-0       1/1     Running     0          3m35s
aptos-node-4-validator-0                1/1     Running     0          3m35s
aptos-node-5-validator-0                1/1     Running     0          3m35s
aptos-node-6-validator-0                1/1     Running     0          3m35s
genesis-aptos-genesis-eforge207-zcsg9   0/1     Completed   0          3m57s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:build-performance-images build performance docker image variants CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant