Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

service/storage_proxy: capture tr_state by copy in handle_paxos_accept() #18702

Closed

Conversation

tchaikov
Copy link
Contributor

@tchaikov tchaikov commented May 16, 2024

this change is inspired by following warning from clang-tidy

Warning: /home/runner/work/scylladb/scylladb/service/storage_proxy.cc:884:13: warning: 'tr_state' used after it was moved [bugprone-use-after-move]
  884 |         if (tr_state) {
      |             ^
/home/runner/work/scylladb/scylladb/service/storage_proxy.cc:872:139: note: move occurred here
  872 |         auto f = get_schema_for_read(proposal.update.schema_version(), src_addr, *timeout).then([&sp = _sp, &sys_ks = _sys_ks, tr_state = std::move(tr_state),
      |                                                                                                                                           ^

this is not a false positive. as tr_state is a captured by move for
constructing a variable in the captured list of a lambda which is in
turn passed to the expression evaluated to f.

even the expression itself is not evaluated yet when we reference
tr_state to check if it is empty after preparing the expression,
tr_state is already moved away into the captured variable. so
at that moment, the statement of f = f.finally(...) is never
evaluated, because tr_state is always empty by then.

so before this change, the trace message is never recorded.

in this change, we address this issue by capturing tr_state by
copying it. as tr_state is backed by a lw_shared_ptr, the overhead is
neglectable.

after this change, the tracing message is recorded.

please note, we could coroutinize this function to improve its
readability, but since this is a fix and should be backported,
let's start with a minimal fix, and worry about the readability
in a follow-up change.

Refs 548767f
Fixes #18725

  • the commit which originally introduced this issue was 7694f16, which dated back before 5.2, so we need to backport to 5.4 and 5.2 at least

@tchaikov tchaikov force-pushed the tracing-in-handle_paxos_accept branch from 1c147ee to 3a818bb Compare May 16, 2024 05:44
@tchaikov tchaikov requested a review from kbr-scylla May 16, 2024 05:45
@tchaikov tchaikov added the backport/none Backport is not required label May 16, 2024
@tchaikov
Copy link
Contributor Author

cc @raphaelsc

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Container Test
✅ - dtest with topology changes
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 5 hr 10 min
  • Builder: i-05514738f185e02bd (m5ad.12xlarge)

@avikivity
Copy link
Member

If it fixes a real bug, please first provide a minimal fix (with a related bug). Later we can evaluate if coroutinization has merits (likely it does).

@avikivity
Copy link
Member

Ah, I see it's for a trace message. Still, tracing is important and a missing message could be confusing.

f = f.finally([tr_state, src_ip] {
tracing::trace(tr_state, "paxos_accept: handling is done, sending a response to /{}", src_ip);
});
tracing::trace(tr_state, "paxos_accept: handling is done, sending a response to /{}", src_ip);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code used finally meaning it was trying to trace even in the case of an exception. This new code will not add a trace in case of an exception.

@tchaikov tchaikov force-pushed the tracing-in-handle_paxos_accept branch 2 times, most recently from 7643f53 to 902af1e Compare May 17, 2024 08:05
…ept()

this change is inspired by following warning from clang-tidy

```
Warning: /home/runner/work/scylladb/scylladb/service/storage_proxy.cc:884:13: warning: 'tr_state' used after it was moved [bugprone-use-after-move]
  884 |         if (tr_state) {
      |             ^
/home/runner/work/scylladb/scylladb/service/storage_proxy.cc:872:139: note: move occurred here
  872 |         auto f = get_schema_for_read(proposal.update.schema_version(), src_addr, *timeout).then([&sp = _sp, &sys_ks = _sys_ks, tr_state = std::move(tr_state),
      |                                                                                                                                           ^
```

this is not a false positive. as `tr_state` is a captured by move for
constructing a variable in the captured list of a lambda which is in
turn passed to the expression evaluated to `f`.

even the expression itself is not evaluated yet when we reference
`tr_state` to check if it is empty after preparing the expression,
`tr_state` is already moved away into the captured variable. so
at that moment, the statement of `f = f.finally(...)` is never
evaluated, because `tr_state` is always empty by then.

so before this change, the trace message is never recorded.

in this change, we address this issue by capturing `tr_state` by
copying it. as `tr_state` is backed by a `lw_shared_ptr`, the overhead is
neglectable.

after this change, the tracing message is recorded.

the change introduced this issue was 548767f.

please note, we could coroutinize this function to improve its
readability, but since this is a fix and should be backported,
let's start with a minimal fix, and worry about the readability
in a follow-up change.

Refs 548767f
Fixes scylladb#18725
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
@tchaikov tchaikov force-pushed the tracing-in-handle_paxos_accept branch from 902af1e to a429e7b Compare May 17, 2024 08:07
@tchaikov tchaikov requested review from gleb-cloudius and removed request for kbr-scylla May 17, 2024 08:11
@tchaikov
Copy link
Contributor Author

v2:

  • instead of coroutinizing this function, a minimal fix is used. so the comment on finally() is addressed as well.

@gleb-cloudius hi Gleb, thank you for your review. could you take another look?

@tchaikov tchaikov added backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed and removed backport/none Backport is not required labels May 17, 2024
@tchaikov tchaikov changed the title service/storage_proxy: coroutinize handle_paxos_accept() service/storage_proxy: capture tr_state by copy in handle_paxos_accept() May 17, 2024
@tchaikov
Copy link
Contributor Author

ERROR 2024-05-17 17:22:15,419 [shard 0:strm] init - Could not start Prometheus API server on 127.144.104.40:9180: std::system_error (error system:98, posix_listen failed for address 127.144.104.40:9180: Address already in use)

i think it's an infra issue.

@kbr-scylla
Copy link
Contributor

ERROR 2024-05-17 17:22:15,419 [shard 0:strm] init - Could not start Prometheus API server on 127.144.104.40:9180: std::system_error (error system:98, posix_listen failed for address 127.144.104.40:9180: Address already in use)

Damn

I suspect it could be related to recent change in test.py -- 734c5de

@temichus please take look, it could be that we're leaking IP addresses somewhere, perhaps releasing them too early after this change :(

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Container Test
✅ - dtest with topology changes
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 5 hr 7 min
  • Builder: i-05bfbc46eb79b2cd1 (m5ad.12xlarge)

@tchaikov tchaikov deleted the tracing-in-handle_paxos_accept branch May 20, 2024 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed promoted-to-master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

handle_paxos_accept() fails to record a trace message when done with handling
5 participants