
[DTest]: Scale test with 100 nodes failed with Aborting due to allocation failure: failed to refill emergency reserve of 30 (have 12 free segments) #18669

Open
aleksbykov opened this issue May 14, 2024 · 29 comments

@aleksbykov
Contributor

Installation details
Scylla version (or git commit hash): Scylla version 5.5.0~dev-0.20240513.2ce643d06bf0 with build-id 2e2c89cbb469c1231861753c4af823040a31579e
Cluster size: up to 100 nodes

POC dtest [update_cluster_layout_tests.py::TestLargeScaleCluster::test_add_many_nodes_under_load_100_nodes](https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/master-dtest-with-raft/269/artifact/logs-full.release.000/1715679351559_update_cluster_layout_tests.py%3A%3ATestLargeScaleCluster%3A%3Atest_add_many_nodes_under_load_100_nodes/) failed upon adding the 72nd node.

With one hundred nodes, the test failed on node 72: its startup failed because node1 (the topology coordinator) crashed with a core dump:

INFO  2024-05-14 09:35:44,350 [shard 0:main] init - Shutting down sighup was successful
INFO  2024-05-14 09:35:44,350 [shard 0:main] init - Shutting down configurables
INFO  2024-05-14 09:35:44,350 [shard 0:main] init - Shutting down configurables was successful
ERROR 2024-05-14 09:35:44,350 [shard 0:main] init - Startup failed: std::runtime_error (Bootstrap failed. See earlier errors (Rolled back: Failed to commit cdc generation: std::runtime_error (raft topology: exec_global_command(barrier) failed with seastar::rpc::closed_error (connection is closed))))

Node1 reported:

INFO  2024-05-14 09:35:09,187 [shard 0:main] raft_group_registry - marking Raft server 653fc7f9-529e-4226-a3a2-c59480e630ef as alive for raft groups
WARN  2024-05-14 09:35:09,217 [shard 0:stmt] mutation_partition - Memory usage of unpaged query exceeds soft limit of 1048576 (configured via max_memory_for_unlimited_query_soft_limit)
WARN  2024-05-14 09:35:09,231 [shard 0:stmt] mutation_partition - Memory usage of unpaged query exceeds soft limit of 1048576 (configured via max_memory_for_unlimited_query_soft_limit)
INFO  2024-05-14 09:35:10,057 [shard 0:strm] boot_strapper - Get random bootstrap_tokens={1433426564769120107, -3024130438421483493, 6783300416989774946, -1913632903514385366, -1296153106641068349, 1007429126839380150, 7984639701427359074, -5786157125825623110, -7359984701809794846, 3522986124224876937, -3441361695749005185, -1206305428627841913, -3124027604652700064, 2725068477338432786, 5205155087332034779, 2433256482079879044, 3679214181567140968, -5816330808577396288, -118004484174318508, 9039937626987112835, -8986726699696615599, -2490997589493233577, -7403114531822865343, -5716614150496756632, -7493452086901961486, -2617472633652495306, -3959849680369634054, -6734145791635177944, 3700399166162257380, 4663913147842742309, -7037273237366318004, -3628651639882969965, -7687167558551732292, -4990535938613735299, 5358693883544366981, -4977344577775304537, -8589655038447844655, 7083891868698313787, -8863804598399540463, -8771150581826651932, -5767987021411659170, 5218986179965462847, 2563434673873028804, 7162972308007128904, -8252511960786747378, 6376868954473548859, -76401018025807680, 2710714891402271653, 1656538783571724615, 418032343716706861, 6697443819176173875, -3514937049858708671, -1863753423742103941, -6400004109999888752, -2692834300114109558, 1772721558144266394, -2575621367426696002, 383763399075815123, 214228935000835408, -6262633267021563247, 1690187744831784835, -3384833187351254953, 1891341548161072751, 2528595013806715080, -1284056852921209896, 5133445597581039797, 6019522473871943281, 7315350882739636578, 9051377965348576980, -1397375351519384899, -6205047198403160958, 6859895283439108025, 7954942467421122994, -7872251089852618669, 8571203620064791618, -3677958235254371874, -6859785428053165913, 2282647432688241035, 1800670181316544576, -1859478944350163913, -2250514226640233767, -7938529939655239330, 4206354305925898378, -2937263044357094893, -5008816272078468082, -5498669479214528962, -5958746016302597292, -8971552293989757935, -9106251332869317242, 5918305565532252436, -3841807879079826123, -7323785487124108681, -7735831486495017644, 7204087527042597046, -7232993587521638169, -4755186106873936807, -958779569968857377, 1916323627092394585, 7212456190278428228, 7157569241964554297, -6515071474984575560, -7911708490078297653, -8150780494158497303, -6323264798375069181, -5039969876466999202, 8169410260307209331, -8101573313025566123, 825092948703179705, 5122021485732998520, 724335777260500712, 1572630048126628947, 625943744704382258, -5729509194927664742, -945883982220334658, -3266957823789072687, 3200194487146988246, 4953675008171200170, 5956725420241501323, 4357968297482089038, -8113442556620021593, -5462829191312072664, 3523901485555516340, -2603353483898286824, 3932941005767061773, -6556989695979498498, 7757339169541007516, 8818407673633376845, 1896016089707419861, -2797834054124500170, -7156250960156119166, 632977869553601190, 4424575218781430862, 3758371095514587868, -4528231696225597070, -8077240885193817196, -676982939146095600, 32681540223590446, -7711062772471220431, -3152853350762551531, 7725397093181086598, -3161376335528435746, 7771306044333077868, -3821692663245420267, -6306188336484368604, -3094319437721657761, -802116362462267891, 9214007929849164490, 116433898980682754, 7697789486638277069, -4075465411742528477, -7326517739065625923, 2112368831788757319, -6170387378614800072, 2253725650358131408, 6700888590293025599, 2168865430477276424, 695649264342536306, -7971750163187343559, -3170733055132740834, 7861011416081269000, -2273925284435075797, 
5114868705041233173, 5564589071519102085, 6713810811342183230, -356363176651482907, 4275472870740204854, 5697228932344174734, 6691864265103870369, 625188329207482295, -868587851645524464, 7996393205484900109, -390104312926003599, 1328171356989826527, -974771486148213499, 6091405376432035272, -6433226178234882108, 7593238809332649617, -8109742698956844364, 4646230940499213604, -9117438798757406119, 3991923048465018930, -8227945738875681247, -1253549626841561311, -7875609559368782488, 2859925769821374569, -4086010592457615043, 4574271297010416851, 7018320147844907384, 3047475207172594410, -3154963074535217256, -3444347913241611466, 4042803308545958286, 2664649119029700058, 5091259864602170296, 6602591859512010379, 5185395581325197002, 1150375261101609956, -3501784990426085068, -1337623305353950550, -9122397680101266306, 5163435014056426504, -7303688106280964990, -8713058669476528687, -3505071432777038461, 3858149921735920, -3825907779282338016, 76516151967863266, -4836007009399991140, -4950983275267961235, -4336198082505719916, 6119017763310869985, 6741462379366766122, 1976938571152401547, -401356364890233257, 7350557784710934185, 1502329198412860627, -1376843198472012287, 6458266078616169864, 25452585543239930, -7562222136015234239, -3745334077680945772, 6721213754361735353, -6882100446714222226, 7611898378492396288, -7585761129002046485, 8559125686671297708, 5711293982266954434, -2619363999579083185, 3484361002832311692, 8019752824201374624, 1243571285313864769, 5317769688969194095, -3994366891095412569, -9125477489958615233, 5298270704832328366, 4728581015820436936, -3441737662948346413, 7216465223581840673, -7869446220183283232, 5500695658777323719, 5211148161944101124, 6445779647686489488, 6729456479580983997, -8760014131959324523, -7673231066596070984, 2136180900933206657, 660876118807753351, 4194089243637930954, 3434951662449236451, -7144918282434882076, -1629403148235721976, -2875386342848361916, 2601130810192064008, 5832775409262545507, 4633047589837472532, 8995914355180526706}
INFO  2024-05-14 09:35:10,211 [shard 0:strm] raft_topology - updating topology state: bootstrap: insert tokens and CDC generation data (UUID: 41bfd970-11d5-11ef-59b4-87d0a0ed676e)
ERROR 2024-05-14 09:35:10,267 [shard 0: gms] lsa - Aborting due to allocation failure: failed to refill emergency reserve of 30 (have 12 free segments)
Aborting on shard 0.
Backtrace:
  0x5eb6d48
  0x5eed5a1
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x3dbaf
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x8e883
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x3dafd
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x2687e
  0x213a667
  0x1d995bd
  0x1c4cf28
  0x1b19ad6
  0x1ba5fb7
  0x143784a
  0x5ec897f
  0x5ec9c67
  0x5ec8fc8
  0x5e56ed7
  0x5e5609c
  0x13c7cb8
  0x13c9700
  0x13c62cc
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x27b89
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x27c4a
  0x13c37e4
[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:68
 (inlined by) seastar::backtrace_buffer::append_backtrace() at ./build/release/seastar/./seastar/src/core/reactor.cc:825
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:855
seastar::print_with_backtrace(char const*, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:867
 (inlined by) seastar::sigabrt_action() at ./build/release/seastar/./seastar/src/core/reactor.cc:4071
 (inlined by) operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4047
 (inlined by) __invoke at ./build/release/seastar/./seastar/src/core/reactor.cc:4043
/data/scylla-s3-reloc.cache/by-build-id/2e2c89cbb469c1231861753c4af823040a31579e/extracted/scylla/libreloc/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=70e92bb237883be3065a6afc9f0696aef2d068bf, for GNU/Linux 3.2.0, not stripped

__GI___sigaction at :?
__pthread_kill_implementation at ??:?
__GI_raise at :?
__GI_abort at :?
logalloc::allocating_section::reserve(logalloc::tracker::impl&) at ./utils/logalloc.cc:2945
decltype(auto) logalloc::allocating_section::with_reserve<logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&)::{lambda()#1}>(logalloc::region, logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&)::{lambda()#1}) at ././utils/logalloc.hh:473
 (inlined by) decltype(auto) logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&) at ././utils/logalloc.hh:529
 (inlined by) operator() at ./replica/memtable.cc:800
 (inlined by) decltype(auto) with_allocator<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0>(allocation_strategy&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0&&) at ././utils/allocation_strategy.hh:318
 (inlined by) replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&) at ./replica/memtable.cc:799
void replica::table::do_apply<frozen_mutation const&, seastar::lw_shared_ptr<schema const>&>(replica::compaction_group&, db::rp_handle&&, frozen_mutation const&, seastar::lw_shared_ptr<schema const>&) at ./replica/table.cc:2816
 (inlined by) operator() at ./replica/table.cc:2844
 (inlined by) seastar::future<void> seastar::futurize<void>::invoke<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&>(replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&) at ././seastar/include/seastar/core/future.hh:2032
 (inlined by) auto seastar::futurize_invoke<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&>(replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&) at ././seastar/include/seastar/core/future.hh:2066
 (inlined by) seastar::futurize<std::result_of<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0 ()>::type>::type replica::dirty_memory_manager_logalloc::region_group::run_when_memory_available<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0>(replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at ./replica/dirty_memory_manager.hh:572
 (inlined by) replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at ./replica/table.cc:2843
replica::database::apply_in_memory(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at ./replica/database.cc:1833
replica::database::do_apply(seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>) at ./replica/database.cc:2053
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/coroutine:240
 (inlined by) seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:125
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2690
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3152
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3320
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3210
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:167
scylla_main(int, char**) at ./main.cc:682
std::function<int (int, char**)>::operator()(int, char**) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
main at ./main.cc:2172
__libc_start_call_main at ??:?
__libc_start_main_alias_2 at :?
_start at ??:?

Test is presented by this commit: https://github.com/aleksbykov/scylla-dtest/commit/e9be258e70810650bcf3bda46af06f4b5d720b00

@kbr-scylla
Contributor

My comments on the old issue:

Summarizing, we should check whether it's a regression or not. It could indeed be, as @aleksbykov suggested, that the instance is too small (running out of memory), because we have 72 nodes running on a single machine (this is a dtest).

If it happens on 5.4 too, there should be nothing to worry about.

@kbr-scylla
Contributor

Marking as release blocker for now, before checking the above, but there's a chance we will be able to take it off and/or close the issue quickly.

@kbr-scylla added this to the 6.0 milestone May 14, 2024
@kbr-scylla added the status/release blocker label May 14, 2024
@aleksbykov
Contributor Author

We could compare 3 runs:

  • master with --force-gossip-topology-changes
  • master with raft-topology (we already have this and know it breaks)
  • 5.4

@kbr-scylla
Contributor

From the other issue:

Also, how many shards is this using? Is it --smp 2 (IIRC the default with dtest)? Is it running with --overprovisioned?

yes:
smp, (positional) 2, memory, (positional) 1024M,

But I see no overprovisioned. Which means all nodes are competing for resources on the same two shards, IIUC.

@aleksbykov you could also check whether it keeps failing with --overprovisioned flag (passed to seastar, same as --smp 2). Only for master raft-topology mode.

@aleksbykov
Contributor Author

  • master with --force-gossip-topology-changes

Job https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/master-dtest-with-raft/272/ also failed, also around the 71st node. Not all logs have been collected yet, but there are also coredumps.

@aleksbykov
Contributor Author

From the other issue:

Also, how many shards is this using? Is it --smp 2 (IIRC the default with dtest)? Is it running with --overprovisioned?

yes:
smp, (positional) 2, memory, (positional) 1024M,

But I see no overprovisioned. Which means all nodes are competing for resources on the same two shards, IIUC.

@aleksbykov you could also check whether it keeps failing with --overprovisioned flag (passed to seastar, same as --smp 2). Only for master raft-topology mode.

@kbr-scylla, the test was run with the --overprovisioned option enabled, according to the logic in the ScyllaNode.start method, because cpuset was not passed in the parameters:

        # TODO add support for classes_log_level
        if '--cpuset' not in args:
            # no explicit cpuset was given, so ccm adds --overprovisioned to the node's args
            args += ['--overprovisioned']
        if '--prometheus-address' not in args:

@kbr-scylla
Contributor

Job https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/master-dtest-with-raft/272/ also failed, also around the 71st node. Not all logs have been collected yet, but there are also coredumps.

Ok, so there's a chance there's no regression. Let's confirm with 5.4 run

@kbr-scylla, the test was run with the --overprovisioned option enabled, according to the logic in the ScyllaNode.start method, because cpuset was not passed in the parameters:

Good

@aleksbykov
Contributor Author

Ok, so there's a chance there's no regression. Let's confirm with 5.4 run

it is running for now

@aleksbykov
Contributor Author

5.4
it is running for now

Failed by timeout; the configured timeout for the test was not enough. It was increased and the job has been rerun.
68 nodes were provisioned successfully.

@kbr-scylla
Contributor

It was increased and the job has been rerun. 68 nodes were provisioned successfully.

Are you saying it failed at node 69?

@aleksbykov
Contributor Author

@yaronkaikov, is it possible to run this custom test (master-dtest-with-raft/269, with 100 nodes) on a more powerful instance?

@kbr-scylla
Contributor

Issue reproduced with 5.4. It looks like a very similar abort:

So it's most likely as you said -- the instance is too weak to deal with that many nodes.

Let's see if the test passes on master (with default i.e. raft-topology mode) on a stronger instance and if so, I'll close this issue.

@yaronkaikov
Contributor

@yaronkaikov, is it possible to run this custom test (master-dtest-with-raft/269, with 100 nodes) on a more powerful instance?

How many instances do you need? We may need to create a new auto-scaling group with more powerful instances for testing.

BTW, you have a job that has been running for 5 days now: https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/master-dtest-with-raft/272/

@aleksbykov
Contributor Author

How many instances do you need? We may need to create a new auto-scaling group with more powerful instances for testing.

At the moment, I'm considering running these tests to verify a theory that the issue lies in the lack of resources on the instance to perform tests with 100 nodes. I want to do this through Jenkins because it will be more convenient to share the results and rerun the tests or reproduce the problem.

As far as I know, we currently have one test with such a large configuration.

The new auto-scaling groups sound great, and it should be easy to set them up in the current job parameters, if I'm not mistaken.

@mykaul modified the milestones: 6.0, 6.1 May 19, 2024
@mykaul
Contributor

mykaul commented May 19, 2024

Moving to 6.1, just to remove it from 6.0 blockers list for the time being.

@yaronkaikov
Contributor

How many instances do you need? We may need to create a new auto-scaling group with more powerful instances for testing.

At the moment, I'm considering running these tests to verify a theory that the issue lies in the lack of resources on the instance to perform tests with 100 nodes. I want to do this through Jenkins because it will be more convenient to share the results and rerun the tests or reproduce the problem.

As far as I know, we currently have one test with such a large configuration.

The new auto-scaling groups sound great, and it should be easy to set them up in the current job parameters, if I'm not mistaken.

@aleksbykov Done, you can use it via the label ec2-strong-dtest-asg-testing.
Let me know if any assistance is required

@mykaul added the P1 Urgent label May 20, 2024
@aleksbykov
Contributor Author

The latest job with the stronger instance failed by timeout (not enough time was configured for the test run). The timeout was increased and the job has been rerun:
https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/master-dtest-with-raft/276

@aleksbykov
Contributor Author

The job also failed on the large instance with raft topology. The topology coordinator failed with the same error:

WARN  2024-05-21 11:21:25,443 [shard 0:stmt] querier - Read 36096 live rows and 1495 tombstones for system.cdc_generations_v3 <partition-range-scan> (-inf, +inf) (see tombstone_warn_threshold)
INFO  2024-05-21 11:21:27,218 [shard 0:strm] boot_strapper - Get random bootstrap_tokens={-8148082370294179137, 4620014630779972309, 2424700765112090203, 5466524275082028577, -5819510943213408016, -6058870038039184230, -3241555648671600928, 365242135387759817, 344700842578459237, 9852199498991121, -7051368105332904415, 3645027930726209707, 3336758005247463872, 3151283819991638200, 6413107386812507710, -8415595805950628215, 160351155132405632, -6929708672024747366, -1298086245746467984, 6612158933228204982, 7125578591291886972, -3610989874596270173, 3489176897153492974, 2560917642279427575, -5352136660562995027, -3369999300053744040, -76676475761430343, -3632101459773386692, 4411085217939970496, -4335513586063779825, -6106066598250354729, -5189486550360641118, 8462762049898172610, 5021478606918039844, -201264105394937713, -3306142757747260037, 6412029416747550341, -7768211040909738263, -7077441556879510791, 8304821689512092746, 1273571948249123837, 2119863080085069643, 4687280360120818632, -9136809892799794339, -3338614652533959203, 7147738390767700898, 1908461327283427597, -2008754501187927089, 4105394895775541757, 911213472174970829, 6782978461574343927, -3431911020852380894, -3622773639082677625, -394559251076010123, -6162423987369491756, -9098694345347253717, 1850950852828471361, 7978162429888121207, 1921388730251161511, -8567699539280143237, 2252744047228501961, -2857301303074841759, 4821896340125357258, -6600331649644831782, -58680104134114103, -5242868753697015731, -6452535797929285944, 8507309621055389583, -7107446121823092960, -1340666642063605725, -3401923917234982068, -4521397087053464556, 4437299910591387488, -3499066501814971103, 4102749158037290879, 1016877770200575903, -5064464502860479572, 8209077802042232983, -3399662567045315019, 2017125442922007801, 5932932212593547490, -6678674888555589282, -6978952762247964257, 7594281966267746597, 4398652639879150001, 7453869979177745236, -3317121570343834981, 2068127202136093344, 4249406911822678995, -2477818714222436557, 6506617384180992935, -4629272698508198586, 3177129638637710223, 6643884996812542730, 8600404830858861505, -4429612224043985033, 8474182966265523017, -673460548772853838, 1103768140372311150, 9008326038061939511, 3444359836164395419, 4284761482367643047, 8125349991362818918, -6216015612802542476, 6447845345552351454, -4420561220456685378, -6394953367176667843, -4964902413272461850, 4636917662681572253, -6673354119025374541, -7157487177381974458, -5284322919171515834, -4301206186132276216, 7850326886209304523, -5958159561846355269, -8683256762003195675, 5033441410664747971, -8387731195143571846, 2623901874196853466, -7485026136944046076, -8140844621599473586, -1865444627593952243, 7414647625024164361, -1535205147557040105, -2551681060292695769, -385146284139840059, 8750040605127626801, 1579192154523762126, 413906829988706890, 7873724730433842438, 3486520900805561309, -7235569402645139367, 6192605429040670520, -6185156084978561529, 1196762009629406002, -5604133415447852330, 6916528604891467828, -3899157593798354426, 3564390682438912740, 3794445677863966919, 7970574733109596588, 4167476357157500990, -7472300876046469405, -31701128956443517, 1092321986579620281, 6536346001758865963, -6218170212892174542, -3137823890598769102, 964307931925631166, -1352906456188764034, -4026320546777759257, 7998151608920923687, 8947437356414172739, -3577966604773211933, 4087410578068285012, -1758739494081223076, 6725993246402721026, -3423746303587087475, 7149038341512000324, -3543291593566646525, 1011459796209873376, -5753173041309555, 
7333770743603801677, 3266080243434171810, -1526391202249763020, -9056477046259225569, -8396047059933248764, 5717855096518878327, -6094657385760456366, -3033135599316612957, 7245006528961674578, 6287426114655736977, 5558496196118483471, 8933380522828126217, -4136573185681528021, -3867988413589020891, -6671115575290535158, -6281009673214061891, -5355853129142193017, -136982392358733346, 8208784309597806026, 3584230947347786035, 7799991260160768662, -3215905248271767805, -6965466787969870072, -5675712017193737415, 4085471617089157294, 5713054318453385712, -5294655383524214585, -6092797742710560203, 5180461121356505137, -93307238965597135, -8127775389111821740, 7866415343271525260, -7446270076006360062, 1830469166041761055, -9214811591090972654, -7402597557596279682, -5963022797815006130, 6725014179471126572, -4905670800956842945, -4068997425995266188, 4245016001398678666, 5650579168698380707, -6486595560993535505, 5759641652113510576, -4306875110583448576, -2516207563320145030, -1045337515660436424, -9131480170340207847, -5566940313929962364, 2957197601716144215, 7714984177018917108, 2591249728959614939, 7138340952407306477, 6553305832899964110, -5378101852316764767, 5433361589529561577, 3531051422956681915, 3588741888014605893, -1970777295714585533, -8890864527901804596, -776856418447315581, 4421606580727532839, -3669067063683287650, -4614363199867221190, 999467127174365378, -5328984352950268126, 8706513111718155223, -3720878552975730432, 1761972805123737693, -7851560238669269810, 8962173972678665017, 6763336745967353200, -7250757404177364974, 7276013736096824235, 2036833116857911970, -185249593215815125, 929326221146836045, -5766852694492296916, 998932530559062399, -5298191210807773962, 1486585713675909357, -4560025680954262302, -5846980093959511276, -3614167833814323012, -8528077215314674759, -5748209719068550576, -3108027903784201817, 3064490542610400795, -4706179066340757480, -2186542488114433132, -5479046369551479833, -5761218520893471808, 4557334126294098760, -4584107441795617998}
INFO  2024-05-21 11:21:27,595 [shard 0:strm] raft_topology - updating topology state: bootstrap: insert tokens and CDC generation data (UUID: 43b9ae9c-1764-11ef-3121-4996f1fa57c1)
Reactor stalled for 68 ms on shard 0. Backtrace: 0x5eb620a 0x5eb5615 0x5eb69cf 0x3dbaf 0x15b284 0x15d77ab 0x15d63a7 0x15d49be 0x156c3c6 0x48abdaf 0x4b32663 0x4b312c9 0x4b4625f 0x143838a 0x5ec7eaf 0x5ec9197 0x5ec84f8 0x5e56407 0x5e555cc 0x13c87b8 0x13ca200 0x13c6dcc 0x27b89 0x27c4a 0x13c42e4
Reactor stalled for 128 ms on shard 0. Backtrace: 0x5eb620a 0x5eb5615 0x5eb69cf 0x3dbaf 0x15b284 0x15d77ab 0x15d63a7 0x15d49be 0x156c3c6 0x48abdaf 0x4b32663 0x4b312c9 0x4b4625f 0x143838a 0x5ec7eaf 0x5ec9197 0x5ec84f8 0x5e56407 0x5e555cc 0x13c87b8 0x13ca200 0x13c6dcc 0x27b89 0x27c4a 0x13c42e4
ERROR 2024-05-21 11:21:27,880 [shard 0: gms] lsa - Aborting due to allocation failure: failed to refill emergency reserve of 30 (have 24 free segments)
Aborting on shard 0.
Backtrace:
  0x5eb6278
  0x5eecad1
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x3dbaf
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x8e883
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x3dafd
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x2687e
  0x2132317
  0x1d987ed
  0x1c4e9b8
  0x1b1a7a6
  0x1ba6ce7
  0x143838a
  0x5ec7eaf
  0x5ec9197
  0x5ec84f8
  0x5e56407
  0x5e555cc
  0x13c87b8
  0x13ca200
  0x13c6dcc
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x27b89
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x27c4a
  0x13c42e4

Job logs: link

@aleksbykov
Contributor Author

latest run: m5ad.12xlarge 192.0 GiB 48 vCPUs 1800 GB (2 * 900 GB NVMe SSD) 10 Gigabit

previous run: c5ad.8xlarge 64.0 GiB 32 vCPUs 1200 GB (2 * 600 GB NVMe SSD) 10 Gigabit

@kbr-scylla
Contributor

@aleksbykov we should compare to 5.4 for that larger instance as well.

@kbr-scylla
Contributor

Moving to 6.1, just to remove it from 6.0 blockers list for the time being.

Can you explain this decision to me?

@mykaul if it's not a blocker for 6.0, then we should get rid of status/release blocker label.
If it is a blocker for 6.0, then moving it to 6.1 is just asking for trouble (like causing us to forget that there's a critical issue that must be solved and releasing a broken product).

How can we pretend that we solved release blockers if we didn't solve them -- just for the sake of reducing some metric (number of 6.0 release blockers)?

@mykaul
Contributor

mykaul commented May 22, 2024

I'll try to explain: this limitation is not one that we'll fix quickly (otherwise we would have fixed it by now), but one we could document and fix later, in 6.0.x, which means we need to begin by fixing it in 6.1 first. I moved it as is, with the 'release blocker' flag, so it'll be a higher priority to fix than other items in the 6.1 queue.

@aleksbykov
Contributor Author

@aleksbykov we should compare to 5.4 for that larger instance as well.

@kbr-scylla, I started it yesterday.
The job also failed with the same error: https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/branch-5.4-dtest-release/8/testReport/
node16 reported:
node16', ['ERROR 2024-05-22 18:34:12,265 [shard 0:main] lsa - Aborting due to allocation failure: failed to refill emergency reserve of 30 (have 16 free segments)', 'Aborting on shard 0.'])]
Other errors: [('node16', ['ERROR 2024-05-22 18:34:12,265 [shard 0:main] lsa - Aborting due to allocation failure: failed to refill emergency reserve of 30 (have 16 free segments)']), ('node87', ['ERROR 2024-05-22 18:34:17,547 [shard 0:stre] node_ops - bootstrap[07e0ae37-5504-4d34-a51c-4428505d0dfc]:

86 nodes were added; the 87th node failed to start because node16 aborted.

@kbr-scylla
Contributor

I don't think there is a real limitation. This is a dtest which is booting 100 nodes on a single machine, which is something that we don't officially support. And we know that it's also failing in 5.4, after booting a similar number of nodes as in 6.0. So if there is a problem, it was already there.

So next steps we should do are:

  • @aleksbykov I know there is a 100-node longevity test (which boots a separate machine for each node) because I ran it when testing CDC 4 years ago (for 4.6 I think?) -- is it part of our regular testing cycle for a release? We should run it at least once. If it passes then we're good.
  • @aleksbykov and regarding this dtest itself, do we know what was the last release in which this dtest succeeded, if any?

@aleksbykov
Contributor Author

  • @aleksbykov I know there is a 100-node longevity test (which boots a separate machine for each node) because I ran it when testing CDC 4 years ago (for 4.6 I think?) -- is it part of our regular testing cycle for a release? We should run it at least once. If it passes then we're good.

@kbr-scylla, I think it was in the plan. I will search for it and trigger it.

@aleksbykov and regarding this dtest itself, do we know what was the last release in which this dtest succeeded, if any?

I don't think it was ever run. We have a 40-node test which passes. I created the 100-node test just as a POC and a single-run check, as @mykaul requested.
Do we need such a test in dtest on a regular basis? Because for that, we will need to mark it in a special way so that this test requests a large instance.

@kbr-scylla
Contributor

Do we need such a test in dtest on a regular basis? Because for that, we will need to mark it in a special way so that this test requests a large instance.

Probably not. We don't need to support 100 nodes running on a single machine.


Also now I understand (after Avi pointed it out) that the problem is probably here:

smp, (positional) 2, memory, (positional) 1024M,

The larger instance you tried:

m5ad.12xlarge 192.0 GiB 48 vCPUs 1800 GB (2 * 900 GB NVMe SSD) 10 Gigabit

should be enough -- if each node takes 1GB of memory, and we boot 100 nodes, then 100GB of memory should be enough.

We're allocating 1GB for two shards, so 512 MB per shard -- and most likely what happens is that some topology-related metadata is trying to take over 512 MB of memory when we have 100 nodes. This metadata should not be that large. I would guess effective_replication_map is a significant contributor, but even with 256 vnodes per node, so 100 * 256 vnodes in total, and 3 replicas for every vnode, it should not take that much memory. (How many keyspaces are there in an empty cluster? -- For each we need another e_r_m)
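
For reference, a minimal back-of-the-envelope sketch of the numbers above; only the node/vnode/RF counts and the --smp 2 / --memory 1024M settings come from this thread, the per-entry byte cost is purely an illustrative assumption:

```python
# Rough sanity check of the reasoning above, not a measurement.
nodes, vnodes_per_node, rf = 100, 256, 3
total_vnodes = nodes * vnodes_per_node          # 25,600 token ranges
replica_entries = total_vnodes * rf             # 76,800 (range, replica) entries

memory_per_node_mb = 1024
shards = 2
per_shard_mb = memory_per_node_mb / shards      # 512 MB per shard

assumed_bytes_per_entry = 100                   # illustrative assumption only
erm_mb = replica_entries * assumed_bytes_per_entry / 1e6
print(f"{total_vnodes=} {replica_entries=} ~{erm_mb:.1f} MB per e_r_m, "
      f"{per_shard_mb:.0f} MB per shard")       # one e_r_m per keyspace multiplies this
```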

So @aleksbykov I have one more request: please run the test again on the larger instance, but this time use --memory 1536M instead of --memory 1024M, which is apparently what dtest uses for this test (maybe it's the default). Then it should pass.
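
A minimal sketch of what the per-node Seastar options for that rerun might look like; only the flag names (--smp, --memory, --overprovisioned) come from this thread, and how dtest/ccm actually assembles the argument list is an assumption here:

```python
# Hypothetical per-node argument list for the suggested rerun.
args = [
    '--smp', '2',            # same shard count as the failing runs
    '--memory', '1536M',     # was 1024M (512 MB/shard); 1536M gives 768 MB/shard
    '--overprovisioned',     # added by ccm when --cpuset is not passed (see snippet above)
]
```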

@aleksbykov
Contributor Author

Ok, I will check.

@kbr-scylla modified the milestones: 6.1, 6.0.1 May 28, 2024