[Backport 5.4] tools/scylla-sstable: add scylla sstable shard-of command #18681

@tchaikov tchaikov commented May 15, 2024

when migrating to the uuid-based identifiers, the mapping from the
integer-based generation to the shard-id is preserved. we used to have
"gen % smp_count" for calculating the shard which is responsible for
hosting a given sstable. although this is not a documented behavior, it
is handy when we try to correlate an sstable with a shard, typically when
looking at a performance issue.

in this change, a new subcommand is added to expose the connection
between the sstable and its "owner" shards.
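
For the integer-based generations mentioned above, the rule can be sketched as follows. This only illustrates the quoted "gen % smp_count" expression; the function name and the sample values are hypothetical, and it does not describe how the new subcommand determines the owners of an sstable.

```python
def shard_of_generation(generation: int, smp_count: int) -> int:
    """Shard that used to host an sstable with an integer-based generation."""
    if smp_count <= 0:
        raise ValueError("smp_count must be positive")
    return generation % smp_count


if __name__ == "__main__":
    # hypothetical node running with 8 shards
    for gen in (1, 8, 42):
        print(gen, "->", shard_of_generation(gen, smp_count=8))
```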

Fixes #16343
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #16345

(cherry picked from commit 273ee36)

Fixes #18381

  • need to backport, because we have needs in production to figure out the mapping from an sstable identifier to the shard which "owns" it.

tchaikov and others added 3 commits May 15, 2024 14:32
test_scylla_sstable_shard_of takes lots of time preparing the keys for a
certain shard. with the debug build, it takes 3 minutes to complete the
test.

so in order to test the "shard-of" subcommand in a more efficient way,
in this change, we improve the test in two ways:

1. cache the output of 'scylla types shardof', so we can avoid the
   overhead of repeatedly running a seastar application for the
   same keys.
2. reduce the number of partitions from 42 to 1. the number of
   partitions in an sstable does not matter when testing the
   output of the "shard-of" command for a given sstable, because
   the sstable is always generated by a certain shard.
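
The caching idea in item 1 can be sketched roughly like this. It is a minimal illustration, not the code from the test: `cached_output` is a hypothetical helper, and the command run here is a stand-in for the real tool invocation.

```python
import functools
import subprocess
import sys


@functools.lru_cache(maxsize=None)
def cached_output(argv: tuple) -> str:
    """Run argv once per distinct command line and remember its stdout."""
    return subprocess.check_output(argv, text=True)


if __name__ == "__main__":
    # stand-in command; a real test would pass the tool's command line
    cmd = (sys.executable, "-c", "print('shard 0')")
    first = cached_output(cmd)
    second = cached_output(cmd)  # served from the cache, no new process
    print(first.strip(), cached_output.cache_info().hits)
```

The argument list is passed as a tuple because `lru_cache` needs hashable arguments.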

before this change, with pytest-profiling:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/3    0.000    0.000  181.950   60.650 runner.py:219(call_and_report)
      4/3    0.000    0.000  181.948   60.649 runner.py:247(call_runtest_hook)
      4/3    0.000    0.000  181.948   60.649 runner.py:318(from_call)
      4/3    0.000    0.000  181.948   60.649 runner.py:262(<lambda>)
    44/11    0.000    0.000  181.935   16.540 _hooks.py:427(__call__)
    43/11    0.000    0.000  181.935   16.540 _manager.py:103(_hookexec)
    43/11    0.000    0.000  181.935   16.540 _callers.py:30(_multicall)
      361    0.001    0.000  181.531    0.503 contextlib.py:141(__exit__)
   782/81    0.001    0.000  177.578    2.192 {built-in method builtins.next}
     1044    0.006    0.000   92.452    0.089 base_events.py:1894(_run_once)
       11    0.000    0.000   91.129    8.284 fixtures.py:686(<lambda>)
    17/11    0.000    0.000   91.129    8.284 fixtures.py:1025(finish)
        4    0.000    0.000   91.128   22.782 fixtures.py:913(_teardown_yield_fixture)
      2/1    0.000    0.000   91.055   91.055 runner.py:111(pytest_runtest_protocol)
      2/1    0.000    0.000   91.055   91.055 runner.py:119(runtestprotocol)
        2    0.000    0.000   91.052   45.526 conftest.py:50(cql)
        2    0.000    0.000   91.040   45.520 util.py:161(cql_session)
        1    0.000    0.000   91.040   91.040 runner.py:180(pytest_runtest_teardown)
        1    0.000    0.000   91.040   91.040 runner.py:509(teardown_exact)
     1945    0.002    0.000   90.722    0.047 events.py:82(_run)
```

after this change:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/3    0.000    0.000    8.271    2.757 runner.py:219(call_and_report)
    44/11    0.000    0.000    8.270    0.752 _hooks.py:427(__call__)
    44/11    0.000    0.000    8.270    0.752 _manager.py:103(_hookexec)
    44/11    0.000    0.000    8.270    0.752 _callers.py:30(_multicall)
      4/3    0.000    0.000    8.269    2.756 runner.py:247(call_runtest_hook)
      4/3    0.000    0.000    8.269    2.756 runner.py:318(from_call)
      4/3    0.000    0.000    8.269    2.756 runner.py:262(<lambda>)
       48    0.000    0.000    8.269    0.172 {method 'send' of 'generator' objects}
       27    0.000    0.000    5.671    0.210 contextlib.py:141(__exit__)
       11    0.000    0.000    4.297    0.391 fixtures.py:686(<lambda>)
      2/1    0.000    0.000    4.228    4.228 runner.py:111(pytest_runtest_protocol)
      2/1    0.000    0.000    4.228    4.228 runner.py:119(runtestprotocol)
        2    0.000    0.000    4.213    2.106 capture.py:877(pytest_runtest_teardown)
        1    0.000    0.000    4.213    4.213 runner.py:180(pytest_runtest_teardown)
        1    0.000    0.000    4.213    4.213 runner.py:509(teardown_exact)
        2    0.000    0.000    3.628    1.814 capture.py:872(pytest_runtest_call)
        1    0.000    0.000    3.627    3.627 runner.py:160(pytest_runtest_call)
        1    0.000    0.000    3.627    3.627 python.py:1797(runtest)
   114/81    0.001    0.000    3.505    0.043 {built-in method builtins.next}
       15    0.784    0.052    3.183    0.212 subprocess.py:417(check_output)
```
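
The tables above use the column layout of Python's standard `pstats` report, sorted by cumulative time. A minimal sketch of producing a comparable table for a single function with `cProfile`, outside pytest; `busy` and `profile_report` are hypothetical names and are not part of the test suite:

```python
import cProfile
import io
import pstats


def busy(n: int) -> int:
    """Hypothetical workload standing in for a slow test body."""
    return sum(i * i for i in range(n))


def profile_report(func, *args, limit: int = 10) -> str:
    """Profile func(*args) and return a table sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(busy, 100_000))
```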

Fixes scylladb#16516
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb#16523

(cherry picked from commit 642652e)
…owners

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#18440

(cherry picked from commit d7a0159)
@tchaikov tchaikov requested a review from nyh as a code owner May 15, 2024 06:37
@tchaikov tchaikov changed the title tools/scylla-sstable: add scylla sstable shard-of command [Backport 5.4] tools/scylla-sstable: add scylla sstable shard-of command May 15, 2024
@tchaikov tchaikov added this to the 5.4.7 milestone May 15, 2024
@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - dtest
❌ - Unit Tests

Failed Tests (2/21537):

Build Details:

  • Duration: 5 hr 49 min
  • Builder: i-02aac6891b18b7043 (m5ad.12xlarge)

@tchaikov
Contributor Author

tchaikov commented May 15, 2024

🔴 CI State: FAILURE

✅ - Build ✅ - dtest ❌ - Unit Tests

Failed Tests (2/21537):

* [scylla-gdb.run.release.1](https://jenkins.scylladb.com//job/scylla-5.4/job/scylla-ci/102/testReport/junit/%28root%29/non-boost%20tests/Tests___Unit_Tests___scylla_gdb_run_release_1) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+scylla-gdb.run.release.1)

* [test_read_stats](https://jenkins.scylladb.com//job/scylla-5.4/job/scylla-ci/102/testReport/junit/%28root%29/test_misc/Tests___Unit_Tests___test_read_stats) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_read_stats)

Build Details:

* Duration: 5 hr 49 min

* Builder: i-02aac6891b18b7043 (m5ad.12xlarge)

    def scylla(gdb, cmd):
>       return gdb.execute('scylla ' + cmd, from_tty=False, to_string=True)
E       gdb.error: Error occurred in Python: There is no member named _raw.

test_misc.py:7: error
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
  File "/jenkins/workspace/scylla-5.4/scylla-ci/scylla/test/scylla-gdb/../../scylla-gdb.py", line 5456, in invoke
    scylla_read_stats.dump_reads_from_semaphore(semaphore)
  File "/jenkins/workspace/scylla-5.4/scylla-ci/scylla/test/scylla-gdb/../../scylla-gdb.py", line 5397, in dump_reads_from_semaphore
    raw_schema = schema.dereference()['_raw']
                 ~~~~~~~~~~~~~~~~~~~~^^^^^^^^
gdb.error: There is no member named _raw.

i don't have an explanation yet.. filed #18700

@scylladb-promoter
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 5 hr 43 min
  • Builder: i-04cf4187b3b58f17d (m5ad.12xlarge)

@raphaelsc
Member

🔴 CI State: FAILURE

i don't have an explanation yet.. filed #18700

really awkward :-)

@tchaikov
Contributor Author

🔴 CI State: FAILURE

i don't have an explanation yet.. filed #18700

really awkward :-)

indeed.

@scylladb-promoter scylladb-promoter merged commit 63d1c76 into scylladb:branch-5.4 May 17, 2024
4 checks passed
@tchaikov tchaikov deleted the branch-5.4-scylla-sstable-shard-of branch June 5, 2024 14:59