Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make enable_compacting_data_for_streaming_and_repair truly live-update #18705

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

denesb
Copy link
Contributor

@denesb denesb commented May 16, 2024

This config item is propagated to the table object via table::config. Although the field in table::config, used to propagate the value, was utils::updateable_value<T>, it was assigned a constant and so the live-update chain was broken.
This series fixes this and adds a test which fails before the patch and passes after. The test needed new test infrastructure, around the failure injection api, namely the ability to exfiltrate the value of internal variable. This infrastructure is also added in this series.

Fixes: #18674

  • This patch has to be backported because it fixes broken functionality

@denesb denesb self-assigned this May 16, 2024
@denesb denesb added backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed backport/6.0 labels May 16, 2024
@avikivity
Copy link
Member

Could we just remove the configuration? it smells like we added it so it can be disabled if broken, not because there's a reason not to do it.

The changelog doesn't say the same thing, but not too far either.

commit 9e3987f
Author: Botond Dénes bdenes@scylladb.com
Date: Thu Jul 20 10:01:34 2023 -0400

db/config: add config item for enabling compaction for streaming and repair

Compacting can greatly reduce the amount of data to be processed by
streaming and repair, but with certain data shapes, its effectiveness
can be reduced and its CPU overhead might outweight the benefits. This
should very rarely be the case, but leave an off switch in case
this becomes a problem in a deployment.
Not wired yet.

@avikivity
Copy link
Member

Though I guess, if you went to the trouble of fixing it, probably you did want to disable it.

@denesb
Copy link
Contributor Author

denesb commented May 17, 2024

Though I guess, if you went to the trouble of fixing it, probably you did want to disable it.

We have hit a bug in the compacting reader, causing false-positive validation errors (see #18667) and to work around it field wanted to disable compacting during streaming, and that ended up needing a RR.

…g_and_repair

This config item is propagated to the table object via table::config.
Although the field in table::config, used to propagate the value, was
utils::updateable_value<T>, it was assigned a constant and so the
live-update chain was broken.
This patch fixes this.
Allow injection points save values into the "value" parameter, which
external code can then examine. This allows exfiltrating the values if
internal variables, to be examined by tests, without exposing these
variables via an "official" path.
Allow external code to obtain information about an error injection
point, including whether it is enabled, and importantly, what its
parameters are. Together with the `set_parameter()` added in the
previous patch, this allows tests to read out the values of internal
parameters, via a set_parameter() injection point.
The /v2/error_injection/{injection} endpoint now has a GET method too,
expose this.
…ng_and_repair live-update

Avoid this the live-update feature of this config item breaking
silently.
@denesb denesb force-pushed the streaming-compact-live-update branch from a8dbe99 to 8f617b3 Compare May 17, 2024 12:03
@denesb
Copy link
Contributor Author

denesb commented May 17, 2024

New in v2:

  • Rebased
  • Fix test: make sure to update config on all nodes, not just one a random one (test failed when config was updated on one node, then effect of change checked on another).

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 topology_custom/test_repair
✅ - Container Test
❌ - dtest with topology changes
❌ - dtest
✅ - Unit Tests

Failed Tests (2/31706):

Build Details:

  • Duration: 6 hr 18 min
  • Builder: i-08507b129501aef7b (r5ad.8xlarge)

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 topology_custom/test_repair
✅ - Container Test
❌ - dtest
✅ - dtest with topology changes
✅ - Unit Tests

Failed Tests (1/30873):

Build Details:

  • Duration: 3 hr 39 min
  • Builder: i-08e62b49e547bc395 (i4i.8xlarge)

@denesb
Copy link
Contributor Author

denesb commented May 21, 2024

test_repair_compacts_data dtest is broken by this PR, looks like it silently relied on live-update being broken. I need to fix the dtest first.

@avikivity
Copy link
Member

test_repair_compacts_data dtest is broken by this PR, looks like it silently relied on live-update being broken. I need to fix the dtest first.

haha

@denesb
Copy link
Contributor Author

denesb commented May 22, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed backport/6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enable_compacting_data_for_streaming_and_repair has to be LiveUpdate
3 participants