
[DTest]: Scale test with 100 nodes failed with Aborting due to allocation failure: failed to refill emergency reserve of 30 (have 12 free segments) #18669

Open
aleksbykov opened this issue May 14, 2024 · 29 comments

@aleksbykov
Contributor

Installation details
Scylla version (or git commit hash): Scylla version 5.5.0~dev-0.20240513.2ce643d06bf0 with build-id 2e2c89cbb469c1231861753c4af823040a31579e
Cluster size: up to 100 nodes

POC dtest [update_cluster_layout_tests.py::TestLargeScaleCluster::test_add_many_nodes_under_load_100_nodes](https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/master-dtest-with-raft/269/artifact/logs-full.release.000/1715679351559_update_cluster_layout_tests.py%3A%3ATestLargeScaleCluster%3A%3Atest_add_many_nodes_under_load_100_nodes/) failed upon adding the 72nd node.

With one hundred nodes, the test failed on node 72: its startup failed because node1 (the topology coordinator) crashed with a core dump:

INFO  2024-05-14 09:35:44,350 [shard 0:main] init - Shutting down sighup was successful
INFO  2024-05-14 09:35:44,350 [shard 0:main] init - Shutting down configurables
INFO  2024-05-14 09:35:44,350 [shard 0:main] init - Shutting down configurables was successful
ERROR 2024-05-14 09:35:44,350 [shard 0:main] init - Startup failed: std::runtime_error (Bootstrap failed. See earlier errors (Rolled back: Failed to commit cdc generation: std::runtime_error (raft topology: exec_global_command(barrier) failed with seastar::rpc::closed_error (connection is closed))))

Node1 reported:

INFO  2024-05-14 09:35:09,187 [shard 0:main] raft_group_registry - marking Raft server 653fc7f9-529e-4226-a3a2-c59480e630ef as alive for raft groups
WARN  2024-05-14 09:35:09,217 [shard 0:stmt] mutation_partition - Memory usage of unpaged query exceeds soft limit of 1048576 (configured via max_memory_for_unlimited_query_soft_limit)
WARN  2024-05-14 09:35:09,231 [shard 0:stmt] mutation_partition - Memory usage of unpaged query exceeds soft limit of 1048576 (configured via max_memory_for_unlimited_query_soft_limit)
INFO  2024-05-14 09:35:10,057 [shard 0:strm] boot_strapper - Get random bootstrap_tokens={1433426564769120107, -3024130438421483493, 6783300416989774946, -1913632903514385366, -1296153106641068349, 1007429126839380150, 7984639701427359074, -5786157125825623110, -7359984701809794846, 3522986124224876937, -3441361695749005185, -1206305428627841913, -3124027604652700064, 2725068477338432786, 5205155087332034779, 2433256482079879044, 3679214181567140968, -5816330808577396288, -118004484174318508, 9039937626987112835, -8986726699696615599, -2490997589493233577, -7403114531822865343, -5716614150496756632, -7493452086901961486, -2617472633652495306, -3959849680369634054, -6734145791635177944, 3700399166162257380, 4663913147842742309, -7037273237366318004, -3628651639882969965, -7687167558551732292, -4990535938613735299, 5358693883544366981, -4977344577775304537, -8589655038447844655, 7083891868698313787, -8863804598399540463, -8771150581826651932, -5767987021411659170, 5218986179965462847, 2563434673873028804, 7162972308007128904, -8252511960786747378, 6376868954473548859, -76401018025807680, 2710714891402271653, 1656538783571724615, 418032343716706861, 6697443819176173875, -3514937049858708671, -1863753423742103941, -6400004109999888752, -2692834300114109558, 1772721558144266394, -2575621367426696002, 383763399075815123, 214228935000835408, -6262633267021563247, 1690187744831784835, -3384833187351254953, 1891341548161072751, 2528595013806715080, -1284056852921209896, 5133445597581039797, 6019522473871943281, 7315350882739636578, 9051377965348576980, -1397375351519384899, -6205047198403160958, 6859895283439108025, 7954942467421122994, -7872251089852618669, 8571203620064791618, -3677958235254371874, -6859785428053165913, 2282647432688241035, 1800670181316544576, -1859478944350163913, -2250514226640233767, -7938529939655239330, 4206354305925898378, -2937263044357094893, -5008816272078468082, -5498669479214528962, -5958746016302597292, -8971552293989757935, -9106251332869317242, 5918305565532252436, -3841807879079826123, -7323785487124108681, -7735831486495017644, 7204087527042597046, -7232993587521638169, -4755186106873936807, -958779569968857377, 1916323627092394585, 7212456190278428228, 7157569241964554297, -6515071474984575560, -7911708490078297653, -8150780494158497303, -6323264798375069181, -5039969876466999202, 8169410260307209331, -8101573313025566123, 825092948703179705, 5122021485732998520, 724335777260500712, 1572630048126628947, 625943744704382258, -5729509194927664742, -945883982220334658, -3266957823789072687, 3200194487146988246, 4953675008171200170, 5956725420241501323, 4357968297482089038, -8113442556620021593, -5462829191312072664, 3523901485555516340, -2603353483898286824, 3932941005767061773, -6556989695979498498, 7757339169541007516, 8818407673633376845, 1896016089707419861, -2797834054124500170, -7156250960156119166, 632977869553601190, 4424575218781430862, 3758371095514587868, -4528231696225597070, -8077240885193817196, -676982939146095600, 32681540223590446, -7711062772471220431, -3152853350762551531, 7725397093181086598, -3161376335528435746, 7771306044333077868, -3821692663245420267, -6306188336484368604, -3094319437721657761, -802116362462267891, 9214007929849164490, 116433898980682754, 7697789486638277069, -4075465411742528477, -7326517739065625923, 2112368831788757319, -6170387378614800072, 2253725650358131408, 6700888590293025599, 2168865430477276424, 695649264342536306, -7971750163187343559, -3170733055132740834, 7861011416081269000, -2273925284435075797, 
5114868705041233173, 5564589071519102085, 6713810811342183230, -356363176651482907, 4275472870740204854, 5697228932344174734, 6691864265103870369, 625188329207482295, -868587851645524464, 7996393205484900109, -390104312926003599, 1328171356989826527, -974771486148213499, 6091405376432035272, -6433226178234882108, 7593238809332649617, -8109742698956844364, 4646230940499213604, -9117438798757406119, 3991923048465018930, -8227945738875681247, -1253549626841561311, -7875609559368782488, 2859925769821374569, -4086010592457615043, 4574271297010416851, 7018320147844907384, 3047475207172594410, -3154963074535217256, -3444347913241611466, 4042803308545958286, 2664649119029700058, 5091259864602170296, 6602591859512010379, 5185395581325197002, 1150375261101609956, -3501784990426085068, -1337623305353950550, -9122397680101266306, 5163435014056426504, -7303688106280964990, -8713058669476528687, -3505071432777038461, 3858149921735920, -3825907779282338016, 76516151967863266, -4836007009399991140, -4950983275267961235, -4336198082505719916, 6119017763310869985, 6741462379366766122, 1976938571152401547, -401356364890233257, 7350557784710934185, 1502329198412860627, -1376843198472012287, 6458266078616169864, 25452585543239930, -7562222136015234239, -3745334077680945772, 6721213754361735353, -6882100446714222226, 7611898378492396288, -7585761129002046485, 8559125686671297708, 5711293982266954434, -2619363999579083185, 3484361002832311692, 8019752824201374624, 1243571285313864769, 5317769688969194095, -3994366891095412569, -9125477489958615233, 5298270704832328366, 4728581015820436936, -3441737662948346413, 7216465223581840673, -7869446220183283232, 5500695658777323719, 5211148161944101124, 6445779647686489488, 6729456479580983997, -8760014131959324523, -7673231066596070984, 2136180900933206657, 660876118807753351, 4194089243637930954, 3434951662449236451, -7144918282434882076, -1629403148235721976, -2875386342848361916, 2601130810192064008, 5832775409262545507, 4633047589837472532, 8995914355180526706}
INFO  2024-05-14 09:35:10,211 [shard 0:strm] raft_topology - updating topology state: bootstrap: insert tokens and CDC generation data (UUID: 41bfd970-11d5-11ef-59b4-87d0a0ed676e)
ERROR 2024-05-14 09:35:10,267 [shard 0: gms] lsa - Aborting due to allocation failure: failed to refill emergency reserve of 30 (have 12 free segments)
Aborting on shard 0.
Backtrace:
  0x5eb6d48
  0x5eed5a1
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x3dbaf
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x8e883
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x3dafd
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x2687e
  0x213a667
  0x1d995bd
  0x1c4cf28
  0x1b19ad6
  0x1ba5fb7
  0x143784a
  0x5ec897f
  0x5ec9c67
  0x5ec8fc8
  0x5e56ed7
  0x5e5609c
  0x13c7cb8
  0x13c9700
  0x13c62cc
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x27b89
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/2ce643d06bf04269661c1612b9b209a6ecc2a1b1/libreloc/libc.so.6+0x27c4a
  0x13c37e4
[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:68
 (inlined by) seastar::backtrace_buffer::append_backtrace() at ./build/release/seastar/./seastar/src/core/reactor.cc:825
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:855
seastar::print_with_backtrace(char const*, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:867
 (inlined by) seastar::sigabrt_action() at ./build/release/seastar/./seastar/src/core/reactor.cc:4071
 (inlined by) operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4047
 (inlined by) __invoke at ./build/release/seastar/./seastar/src/core/reactor.cc:4043
/data/scylla-s3-reloc.cache/by-build-id/2e2c89cbb469c1231861753c4af823040a31579e/extracted/scylla/libreloc/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=70e92bb237883be3065a6afc9f0696aef2d068bf, for GNU/Linux 3.2.0, not stripped

__GI___sigaction at :?
__pthread_kill_implementation at ??:?
__GI_raise at :?
__GI_abort at :?
logalloc::allocating_section::reserve(logalloc::tracker::impl&) at ./utils/logalloc.cc:2945
decltype(auto) logalloc::allocating_section::with_reserve<logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&)::{lambda()#1}>(logalloc::region, logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&)::{lambda()#1}) at ././utils/logalloc.hh:473
 (inlined by) decltype(auto) logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&) at ././utils/logalloc.hh:529
 (inlined by) operator() at ./replica/memtable.cc:800
 (inlined by) decltype(auto) with_allocator<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0>(allocation_strategy&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0&&) at ././utils/allocation_strategy.hh:318
 (inlined by) replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&) at ./replica/memtable.cc:799
void replica::table::do_apply<frozen_mutation const&, seastar::lw_shared_ptr<schema const>&>(replica::compaction_group&, db::rp_handle&&, frozen_mutation const&, seastar::lw_shared_ptr<schema const>&) at ./replica/table.cc:2816
 (inlined by) operator() at ./replica/table.cc:2844
 (inlined by) seastar::future<void> seastar::futurize<void>::invoke<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&>(replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&) at ././seastar/include/seastar/core/future.hh:2032
 (inlined by) auto seastar::futurize_invoke<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&>(replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&) at ././seastar/include/seastar/core/future.hh:2066
 (inlined by) seastar::futurize<std::result_of<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0 ()>::type>::type replica::dirty_memory_manager_logalloc::region_group::run_when_memory_available<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0>(replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at ./replica/dirty_memory_manager.hh:572
 (inlined by) replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at ./replica/table.cc:2843
replica::database::apply_in_memory(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at ./replica/database.cc:1833
replica::database::do_apply(seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>) at ./replica/database.cc:2053
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/coroutine:240
 (inlined by) seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:125
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2690
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3152
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3320
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3210
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:167
scylla_main(int, char**) at ./main.cc:682
std::function<int (int, char**)>::operator()(int, char**) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
main at ./main.cc:2172
__libc_start_call_main at ??:?
__libc_start_main_alias_2 at :?
_start at ??:?

Test is presented by this commit: https://github.com/aleksbykov/scylla-dtest/commit/e9be258e70810650bcf3bda46af06f4b5d720b00

@kbr-scylla
Contributor

My comments on the old issue:

Summarizing, we should check whether it's a regression or not. It could indeed be, as @aleksbykov suggested, that the instance is too small (running out of memory), because we have 72 nodes running on a single machine (this is a dtest).

If it happens on 5.4 too, there should be nothing to worry about.

@kbr-scylla
Contributor

Marking as release blocker for now, before checking the above, but there's a chance we will be able to take it off and/or close the issue quickly.

@kbr-scylla added this to the 6.0 milestone May 14, 2024
@kbr-scylla added the status/release blocker label May 14, 2024
@aleksbykov
Contributor Author

We could compare 3 runs:

  • master with --force-gossip-topology-changes
  • master with raft-topology (we already have this and know it breaks)
  • 5.4

@kbr-scylla
Contributor

From the other issue:

Also, how many shards is this using? Is it --smp 2 (IIRC the default with dtest)? Is it running with --overprovisioned?

yes:
smp, (positional) 2, memory, (positional) 1024M,

But I see no overprovisioned. Which means all nodes are competing for resources on the same two shards, IIUC.

@aleksbykov you could also check whether it keeps failing with --overprovisioned flag (passed to seastar, same as --smp 2). Only for master raft-topology mode.

@aleksbykov
Contributor Author

  • master with --force-gossip-topology-changes

Job https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/master-dtest-with-raft/272/ also failed, also around the 71st node. Not all logs have been collected yet, but there are also coredumps.

@aleksbykov
Contributor Author

From the other issue:

Also, how many shards is this using? Is it --smp 2 (IIRC the default with dtest)? Is it running with --overprovisioned?

yes:
smp, (positional) 2, memory, (positional) 1024M,

But I see no overprovisioned. Which means all nodes are competing for resources on the same two shards, IIUC.

@aleksbykov you could also check whether it keeps failing with --overprovisioned flag (passed to seastar, same as --smp 2). Only for master raft-topology mode.

@kbr-scylla, the test was run with the --overprovisioned option enabled, according to the logic in the ScyllaNode.start method, because cpuset was not passed in the parameters:

        # TODO add support for classes_log_level
        if '--cpuset' not in args:
            # no explicit cpuset was given, so ccm adds --overprovisioned to the node's args
            args += ['--overprovisioned']
        if '--prometheus-address' not in args:

@kbr-scylla
Contributor

Job https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/master-dtest-with-raft/272/ also failed, also around the 71st node. Not all logs have been collected yet, but there are also coredumps.

Ok, so there's a chance there's no regression. Let's confirm with 5.4 run

@kbr-scylla, the test was run with the --overprovisioned option enabled, according to the logic in the ScyllaNode.start method, because cpuset was not passed in the parameters:

Good

@aleksbykov
Contributor Author

Ok, so there's a chance there's no regression. Let's confirm with 5.4 run

it is running for now

@aleksbykov
Contributor Author

5.4
it is running for now

Failed by timeout; the configured timeout for the test was not enough. It was increased and the job has been rerun.
68 nodes were provisioned successfully.

@kbr-scylla
Contributor

It was increased and the job has been rerun. 68 nodes were provisioned successfully.

Are you saying it failed at node 69?

@aleksbykov
Contributor Author

@yaronkaikov, is it possible to run this custom test (master-dtest-with-raft/269, with 100 nodes) on a more powerful instance?

@kbr-scylla
Contributor

Issue reproduced with 5.4. It looks like a very similar abort:

So it's most likely as you said -- the instance is too weak to deal with that many nodes.

Let's see if the test passes on master (with default i.e. raft-topology mode) on a stronger instance and if so, I'll close this issue.

@yaronkaikov
Contributor

@yaronkaikov, is it possible to run this custom test (master-dtest-with-raft/269, with 100 nodes) on a more powerful instance?

How many instances do you need? We may need to create a new auto-scaling group with more powerful instances for testing.

BTW, you have a job that has been running for 5 days now: https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/master-dtest-with-raft/272/

@aleksbykov
Contributor Author

How many instances do you need? We may need to create a new auto-scaling group with more powerful instances for testing.

At the moment, I'm considering running these tests to verify a theory that the issue lies in the lack of resources on the instance to perform tests with 100 nodes. I want to do this through Jenkins because it will be more convenient to share the results and rerun the tests or reproduce the problem.

As far as I know, we currently have one test with such a large configuration.

The new auto-scaling groups sound great, and it should be easy to set them up in the current job parameters, if I'm not mistaken.

@mykaul modified the milestones: 6.0, 6.1 May 19, 2024
@mykaul
Contributor

mykaul commented May 19, 2024

Moving to 6.1, just to remove it from 6.0 blockers list for the time being.

@yaronkaikov
Contributor

How many instances do you need? We may need to create a new auto-scaling group with more powerful instances for testing.

At the moment, I'm considering running these tests to verify a theory that the issue lies in the lack of resources on the instance to perform tests with 100 nodes. I want to do this through Jenkins because it will be more convenient to share the results and rerun the tests or reproduce the problem.

As far as I know, we currently have one test with such a large configuration.

The new auto-scaling groups sound great, and it should be easy to set them up in the current job parameters, if I'm not mistaken.

@aleksbykov Done, you can use it via the label ec2-strong-dtest-asg-testing.
Let me know if any assistance is required

@mykaul added the P1 Urgent label May 20, 2024
@aleksbykov
Contributor Author

The latest job with the stronger instance failed by timeout (not enough time was configured for the test run). The timeout was increased and the job has been rerun:
https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/master-dtest-with-raft/276

@aleksbykov
Contributor Author

The job also failed on the large instance with raft topology. The topology coordinator failed with the same error:

WARN  2024-05-21 11:21:25,443 [shard 0:stmt] querier - Read 36096 live rows and 1495 tombstones for system.cdc_generations_v3 <partition-range-scan> (-inf, +inf) (see tombstone_warn_threshold)
INFO  2024-05-21 11:21:27,218 [shard 0:strm] boot_strapper - Get random bootstrap_tokens={-8148082370294179137, 4620014630779972309, 2424700765112090203, 5466524275082028577, -5819510943213408016, -6058870038039184230, -3241555648671600928, 365242135387759817, 344700842578459237, 9852199498991121, -7051368105332904415, 3645027930726209707, 3336758005247463872, 3151283819991638200, 6413107386812507710, -8415595805950628215, 160351155132405632, -6929708672024747366, -1298086245746467984, 6612158933228204982, 7125578591291886972, -3610989874596270173, 3489176897153492974, 2560917642279427575, -5352136660562995027, -3369999300053744040, -76676475761430343, -3632101459773386692, 4411085217939970496, -4335513586063779825, -6106066598250354729, -5189486550360641118, 8462762049898172610, 5021478606918039844, -201264105394937713, -3306142757747260037, 6412029416747550341, -7768211040909738263, -7077441556879510791, 8304821689512092746, 1273571948249123837, 2119863080085069643, 4687280360120818632, -9136809892799794339, -3338614652533959203, 7147738390767700898, 1908461327283427597, -2008754501187927089, 4105394895775541757, 911213472174970829, 6782978461574343927, -3431911020852380894, -3622773639082677625, -394559251076010123, -6162423987369491756, -9098694345347253717, 1850950852828471361, 7978162429888121207, 1921388730251161511, -8567699539280143237, 2252744047228501961, -2857301303074841759, 4821896340125357258, -6600331649644831782, -58680104134114103, -5242868753697015731, -6452535797929285944, 8507309621055389583, -7107446121823092960, -1340666642063605725, -3401923917234982068, -4521397087053464556, 4437299910591387488, -3499066501814971103, 4102749158037290879, 1016877770200575903, -5064464502860479572, 8209077802042232983, -3399662567045315019, 2017125442922007801, 5932932212593547490, -6678674888555589282, -6978952762247964257, 7594281966267746597, 4398652639879150001, 7453869979177745236, -3317121570343834981, 2068127202136093344, 4249406911822678995, -2477818714222436557, 6506617384180992935, -4629272698508198586, 3177129638637710223, 6643884996812542730, 8600404830858861505, -4429612224043985033, 8474182966265523017, -673460548772853838, 1103768140372311150, 9008326038061939511, 3444359836164395419, 4284761482367643047, 8125349991362818918, -6216015612802542476, 6447845345552351454, -4420561220456685378, -6394953367176667843, -4964902413272461850, 4636917662681572253, -6673354119025374541, -7157487177381974458, -5284322919171515834, -4301206186132276216, 7850326886209304523, -5958159561846355269, -8683256762003195675, 5033441410664747971, -8387731195143571846, 2623901874196853466, -7485026136944046076, -8140844621599473586, -1865444627593952243, 7414647625024164361, -1535205147557040105, -2551681060292695769, -385146284139840059, 8750040605127626801, 1579192154523762126, 413906829988706890, 7873724730433842438, 3486520900805561309, -7235569402645139367, 6192605429040670520, -6185156084978561529, 1196762009629406002, -5604133415447852330, 6916528604891467828, -3899157593798354426, 3564390682438912740, 3794445677863966919, 7970574733109596588, 4167476357157500990, -7472300876046469405, -31701128956443517, 1092321986579620281, 6536346001758865963, -6218170212892174542, -3137823890598769102, 964307931925631166, -1352906456188764034, -4026320546777759257, 7998151608920923687, 8947437356414172739, -3577966604773211933, 4087410578068285012, -1758739494081223076, 6725993246402721026, -3423746303587087475, 7149038341512000324, -3543291593566646525, 1011459796209873376, -5753173041309555, 
7333770743603801677, 3266080243434171810, -1526391202249763020, -9056477046259225569, -8396047059933248764, 5717855096518878327, -6094657385760456366, -3033135599316612957, 7245006528961674578, 6287426114655736977, 5558496196118483471, 8933380522828126217, -4136573185681528021, -3867988413589020891, -6671115575290535158, -6281009673214061891, -5355853129142193017, -136982392358733346, 8208784309597806026, 3584230947347786035, 7799991260160768662, -3215905248271767805, -6965466787969870072, -5675712017193737415, 4085471617089157294, 5713054318453385712, -5294655383524214585, -6092797742710560203, 5180461121356505137, -93307238965597135, -8127775389111821740, 7866415343271525260, -7446270076006360062, 1830469166041761055, -9214811591090972654, -7402597557596279682, -5963022797815006130, 6725014179471126572, -4905670800956842945, -4068997425995266188, 4245016001398678666, 5650579168698380707, -6486595560993535505, 5759641652113510576, -4306875110583448576, -2516207563320145030, -1045337515660436424, -9131480170340207847, -5566940313929962364, 2957197601716144215, 7714984177018917108, 2591249728959614939, 7138340952407306477, 6553305832899964110, -5378101852316764767, 5433361589529561577, 3531051422956681915, 3588741888014605893, -1970777295714585533, -8890864527901804596, -776856418447315581, 4421606580727532839, -3669067063683287650, -4614363199867221190, 999467127174365378, -5328984352950268126, 8706513111718155223, -3720878552975730432, 1761972805123737693, -7851560238669269810, 8962173972678665017, 6763336745967353200, -7250757404177364974, 7276013736096824235, 2036833116857911970, -185249593215815125, 929326221146836045, -5766852694492296916, 998932530559062399, -5298191210807773962, 1486585713675909357, -4560025680954262302, -5846980093959511276, -3614167833814323012, -8528077215314674759, -5748209719068550576, -3108027903784201817, 3064490542610400795, -4706179066340757480, -2186542488114433132, -5479046369551479833, -5761218520893471808, 4557334126294098760, -4584107441795617998}
INFO  2024-05-21 11:21:27,595 [shard 0:strm] raft_topology - updating topology state: bootstrap: insert tokens and CDC generation data (UUID: 43b9ae9c-1764-11ef-3121-4996f1fa57c1)
Reactor stalled for 68 ms on shard 0. Backtrace: 0x5eb620a 0x5eb5615 0x5eb69cf 0x3dbaf 0x15b284 0x15d77ab 0x15d63a7 0x15d49be 0x156c3c6 0x48abdaf 0x4b32663 0x4b312c9 0x4b4625f 0x143838a 0x5ec7eaf 0x5ec9197 0x5ec84f8 0x5e56407 0x5e555cc 0x13c87b8 0x13ca200 0x13c6dcc 0x27b89 0x27c4a 0x13c42e4
Reactor stalled for 128 ms on shard 0. Backtrace: 0x5eb620a 0x5eb5615 0x5eb69cf 0x3dbaf 0x15b284 0x15d77ab 0x15d63a7 0x15d49be 0x156c3c6 0x48abdaf 0x4b32663 0x4b312c9 0x4b4625f 0x143838a 0x5ec7eaf 0x5ec9197 0x5ec84f8 0x5e56407 0x5e555cc 0x13c87b8 0x13ca200 0x13c6dcc 0x27b89 0x27c4a 0x13c42e4
ERROR 2024-05-21 11:21:27,880 [shard 0: gms] lsa - Aborting due to allocation failure: failed to refill emergency reserve of 30 (have 24 free segments)
Aborting on shard 0.
Backtrace:
  0x5eb6278
  0x5eecad1
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x3dbaf
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x8e883
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x3dafd
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x2687e
  0x2132317
  0x1d987ed
  0x1c4e9b8
  0x1b1a7a6
  0x1ba6ce7
  0x143838a
  0x5ec7eaf
  0x5ec9197
  0x5ec84f8
  0x5e56407
  0x5e555cc
  0x13c87b8
  0x13ca200
  0x13c6dcc
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x27b89
  /jenkins/workspace/scylla-staging/abykov/Dtest/master-dtest-with-raft/scylla/.ccm/scylla-repository/a517fcf970ec3b55a43c2108a523e5903eb2c30a/libreloc/libc.so.6+0x27c4a
  0x13c42e4

Job logs: link

@aleksbykov
Contributor Author

latest run: m5ad.12xlarge 192.0 GiB 48 vCPUs 1800 GB (2 * 900 GB NVMe SSD) 10 Gigabit

previous run: c5ad.8xlarge 64.0 GiB 32 vCPUs 1200 GB (2 * 600 GB NVMe SSD) 10 Gigabit

@kbr-scylla
Contributor

@aleksbykov we should compare to 5.4 for that larger instance as well.

@kbr-scylla
Contributor

Moving to 6.1, just to remove it from 6.0 blockers list for the time being.

Can you explain this decision to me?

@mykaul if it's not a blocker for 6.0, then we should get rid of status/release blocker label.
If it is a blocker for 6.0, then moving it to 6.1 is just asking for trouble (like causing us to forget that there's a critical issue that must be solved and releasing a broken product).

How can we pretend that we solved release blockers if we didn't solve them -- just for the sake of reducing some metric (number of 6.0 release blockers)?

@mykaul
Contributor

mykaul commented May 22, 2024

I'll try to explain: this limitation is not one that we'll fix quickly (otherwise we would have fixed it by now), but one we could document and fix later, in 6.0.x, which means we need to begin by fixing it in 6.1 first. I moved it as is, with the 'release blocker' flag, so it'll be a higher priority to fix than other items in the 6.1 queue.

@aleksbykov
Contributor Author

@aleksbykov we should compare to 5.4 for that larger instance as well.

@kbr-scylla, I started it yesterday.
The job also failed with the same error: https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/Dtest/job/branch-5.4-dtest-release/8/testReport/
node16 reported:
node16', ['ERROR 2024-05-22 18:34:12,265 [shard 0:main] lsa - Aborting due to allocation failure: failed to refill emergency reserve of 30 (have 16 free segments)', 'Aborting on shard 0.'])]
Other errors: [('node16', ['ERROR 2024-05-22 18:34:12,265 [shard 0:main] lsa - Aborting due to allocation failure: failed to refill emergency reserve of 30 (have 16 free segments)']), ('node87', ['ERROR 2024-05-22 18:34:17,547 [shard 0:stre] node_ops - bootstrap[07e0ae37-5504-4d34-a51c-4428505d0dfc]:

86 nodes were added; the 87th node failed to start because node16 aborted.

@kbr-scylla
Contributor

I don't think there is a real limitation. This is a dtest which is booting 100 nodes on a single machine, which is something that we don't officially support. And we know that it's also failing in 5.4, after booting a similar number of nodes as in 6.0. So if there is a problem, it was already there.

So next steps we should do are:

  • @aleksbykov I know there is a 100-node longevity test (which boots a separate machine for each node) because I ran it when testing CDC 4 years ago (for 4.6 I think?) -- is it part of our regular testing cycle for a release? We should run it at least once. If it passes then we're good.
  • @aleksbykov and regarding this dtest itself, do we know what was the last release in which this dtest succeeded, if any?

@aleksbykov
Contributor Author

  • @aleksbykov I know there is a 100-node longevity test (which boots a separate machine for each node) because I ran it when testing CDC 4 years ago (for 4.6 I think?) -- is it part of our regular testing cycle for a release? We should run it at least once. If it passes then we're good.

@kbr-scylla, I think it was in the plan. I will search for it and trigger it.

@aleksbykov and regarding this dtest itself, do we know what was the last release in which this dtest succeeded, if any?

I don't think it was ever run. We have a 40-node test which passes. I created the 100-node test just as a POC and a single-run check, as @mykaul requested.
Do we need such a test in dtest on a regular basis? Because for that, we will need to mark it in a special way so that this test requests a large instance.

@kbr-scylla
Contributor

Do we need such a test in dtest on a regular basis? Because for that, we will need to mark it in a special way so that this test requests a large instance.

Probably not. We don't need to support 100 nodes running on a single machine.


Also now I understand (after Avi pointed it out) that the problem is probably here:

smp, (positional) 2, memory, (positional) 1024M,

The larger instance you tried:

m5ad.12xlarge 192.0 GiB 48 vCPUs 1800 GB (2 * 900 GB NVMe SSD) 10 Gigabit

should be enough -- if each node takes 1GB of memory, and we boot 100 nodes, then 100GB of memory should be enough.

We're allocating 1GB for two shards, so 512 MB per shard -- and most likely what happens is that some topology-related metadata is trying to take over 512 MB of memory when we have 100 nodes. This metadata should not be that large. I would guess effective_replication_map is a significant contributor, but even with 256 vnodes per node, so 100 * 256 vnodes in total, and 3 replicas for every vnode, it should not take that much memory. (How many keyspaces are there in an empty cluster? -- For each we need another e_r_m)
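
For reference, a minimal back-of-the-envelope sketch of the numbers above; only the node/vnode/RF counts and the --smp 2 / --memory 1024M settings come from this thread, the per-entry byte cost is purely an illustrative assumption:

```python
# Rough sanity check of the reasoning above, not a measurement.
nodes, vnodes_per_node, rf = 100, 256, 3
total_vnodes = nodes * vnodes_per_node          # 25,600 token ranges
replica_entries = total_vnodes * rf             # 76,800 (range, replica) entries

memory_per_node_mb = 1024
shards = 2
per_shard_mb = memory_per_node_mb / shards      # 512 MB per shard

assumed_bytes_per_entry = 100                   # illustrative assumption only
erm_mb = replica_entries * assumed_bytes_per_entry / 1e6
print(f"{total_vnodes=} {replica_entries=} ~{erm_mb:.1f} MB per e_r_m, "
      f"{per_shard_mb:.0f} MB per shard")       # one e_r_m per keyspace multiplies this
```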

So @aleksbykov I have one more request: please run the test again on the larger instance, but this time use --memory 1536M instead of --memory 1024M, which is apparently what dtest uses for this test (maybe it's the default). Then it should pass.
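
A minimal sketch of what the per-node Seastar options for that rerun might look like; only the flag names (--smp, --memory, --overprovisioned) come from this thread, and how dtest/ccm actually assembles the argument list is an assumption here:

```python
# Hypothetical per-node argument list for the suggested rerun.
args = [
    '--smp', '2',            # same shard count as the failing runs
    '--memory', '1536M',     # was 1024M (512 MB/shard); 1536M gives 768 MB/shard
    '--overprovisioned',     # added by ccm when --cpuset is not passed (see snippet above)
]
```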

@aleksbykov
Contributor Author

Ok, I will check.

@kbr-scylla modified the milestones: 6.1, 6.0.1 May 28, 2024