refactor client tracking, fix atomicity, squashing and multi/exec #2970

kostasrim · 2024-04-29T14:00:13Z

add partial support for CLIENT CACHING TRUE (only to be used with TRACKING OPTIN)
add OPTIN to CLIENT TRACKING command
refactor client tracking to respect transactional atomicity
fixed multi/exec and disabled squashing with client tracking
add tests

Resolves #2969, #2971, #2997, #2998

P.S. All tests in rueidis TestSingleClientIntegration pass except pub/sub, which we don't support yet; see #3001.

kostasrim · 2024-05-02T10:43:05Z

src/server/transaction.cc

@@ -838,13 +838,23 @@ OpStatus Transaction::ScheduleSingleHop(RunnableType cb) {

 // Runs in coordinator thread.
 void Transaction::Execute(RunnableType cb, bool conclude) {
+  auto tracking_wrap = [cb, this](Transaction* t, EngineShard* shard) -> RunnableResult {


@dranikpg this, together with the changes in InvokeCmd, seemed to be the least intrusive way (to comply with the requirements of the state machine).

I would like to understand why tracking requires transaction semantics (an example will be fine)

Why is cid_ not enough, so that we need invoke_cid_?

I would suggest doing it the way we handle blocking commands: once it has finished, and manually from RunSquashedCb

dragonfly/src/server/transaction.cc

Lines 638 to 640 in a95419b

if (auto* bcontroller = shard->blocking_controller(); bcontroller) {
  if (awaked_prerun || was_suspended) {
    bcontroller->FinalizeWatched(GetShardArgs(idx), this);

So it becomes if (concluding || (multi && multi_->concluding)) Track(this)

With that, you don't need invoke_cid etc. there either.

I would like to understand why tracking requires transaction semantics (an example will be fine)

Because invalidation messages must be sent before the transaction concludes. Otherwise, we might accidentally skip them. An example would be:

>> CLIENT TRACKING ON
>> GET FOO
>> SET FOO BAR
>> GET FOO
>> SET FOO BAR
>> GET FOO ---------> might miss the invalidation message

In a valid execution, the first SET sends an invalidation message as a separate transaction. Before that transaction even starts or concludes, the GET that follows can execute first, and it will itself issue a separate transaction to send an invalidation message. The problem is that once we send an invalidation message, we remove the key from the tracking map (we send an invalidation message only once until the key is read again). The second invalidation transaction then has no effect, because the key no longer exists in the map, and we never get that second invalidation message.
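The effect of the "send once, then forget the key" rule can be illustrated with a small simulation. This is a simplified sketch, not Dragonfly code: `TrackingMap`, `RunAtomic` and `RunDeferred` are hypothetical names, and the deferred variant models just one possible interleaving in which the tracking step of a read is not atomic with the read itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a tracking map: key -> ids of clients that read it.
struct TrackingMap {
  std::unordered_map<std::string, std::unordered_set<int>> clients_by_key;
  std::vector<std::string> sent;  // invalidation messages, in the order they were "sent"

  // Called for a tracked read: remember that `client` has `key` cached.
  void Track(const std::string& key, int client) {
    clients_by_key[key].insert(client);
  }

  // Called for a write: notify every tracking client once, then forget the key.
  void Invalidate(const std::string& key) {
    auto it = clients_by_key.find(key);
    if (it == clients_by_key.end())
      return;  // nobody tracks the key, so nothing is sent
    for (int client : it->second)
      sent.push_back("invalidate " + key + " -> client " + std::to_string(client));
    clients_by_key.erase(it);
  }
};

// Tracking is registered as part of each GET, so every SET finds the key in the map.
inline std::size_t RunAtomic() {
  TrackingMap m;
  m.Track("FOO", 1);    // GET FOO
  m.Invalidate("FOO");  // SET FOO BAR
  m.Track("FOO", 1);    // GET FOO
  m.Invalidate("FOO");  // SET FOO BAR
  return m.sent.size();
}

// Tracking for the second GET runs as a separate step, after the second SET.
inline std::size_t RunDeferred() {
  TrackingMap m;
  m.Track("FOO", 1);    // GET FOO
  m.Invalidate("FOO");  // SET FOO BAR
  // The second GET FOO returns here, but its Track() has not run yet.
  m.Invalidate("FOO");  // SET FOO BAR: the key is absent, so no message is sent
  m.Track("FOO", 1);    // late registration: the client now holds a stale value
  return m.sent.size();
}
```

In this model the atomic variant produces two invalidation messages, while the deferred variant produces only one, which is the kind of missed message described above.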

kostasrim · 2024-05-02T10:46:25Z

src/server/conn_context.cc

@@ -119,6 +119,13 @@ void ConnectionContext::ChangeMonitor(bool start) {
  EnableMonitoring(start);
 }

+ConnectionState::ClientTracking& ConnectionContext::ClientTrackingInfo() {
+  if (parent_cntx_) {
+    return parent_cntx_->conn_state.tracking_info_;


That's for squashing :)

If you access conn_state, don't make it a function on conn_cntx

Then you can just use conn_state; you can make it mutable or add a new member like conn

dragonfly/src/server/main_service.cc

Lines 214 to 215 in a95419b

if (cntx->conn_state.squashing_info)
  cntx = cntx->conn_state.squashing_info->owner;

romange · 2024-05-02T11:50:37Z

src/server/db_slice.cc

  }
+  auto& client_set = it->second;


Consider the client_tracking_map_.extract(key) function, which combines find and delete in one call.

It doesn't satisfy the extract requirements (for the value_type) and generates a compiler error.
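For reference, this is what the suggested pattern looks like with `std::unordered_map` (C++17 node handles). It is only a sketch: `ClientSet`, `StdTrackingMap` and `TakeClients` are hypothetical names, and, as the reply above notes, the container actually used in db_slice.cc did not compile with `extract`, so this does not apply there as-is.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

using ClientSet = std::unordered_set<int>;
using StdTrackingMap = std::unordered_map<std::string, ClientSet>;

// Removes `key` from the map and returns the clients that were tracking it.
// Returns an empty set if the key was not tracked.
inline ClientSet TakeClients(StdTrackingMap& map, const std::string& key) {
  auto node = map.extract(key);  // one lookup: unlinks the node if it exists
  if (node.empty())
    return {};
  return std::move(node.mapped());
}
```

With a standard container this replaces a `find` followed by an `erase` with a single lookup, and the extracted value can be moved out without copying.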

romange · 2024-05-02T11:54:07Z

src/server/main_service.cc

@@ -1206,6 +1186,7 @@ void Service::DispatchCommand(CmdArgList args, facade::ConnectionContext* cntx)
    if (stored_cmd.Cid()->IsWriteOnly()) {
      dfly_cntx->conn_state.exec_info.is_write = true;
    }
+    dfly_cntx->conn_state.tracking_info_.UpdatePrevAndLastCommand();


why do you need to call it here?

romange · 2024-05-02T11:56:51Z

src/server/conn_context.cc

+}
+
+void ConnectionState::ClientTracking::UpdatePrevAndLastCommand() {
+  if (prev_command_ && multi_) {


It seems that what you really want to know is whether you are in the middle of EXEC execution, not multi.

We store so much fragile info that needs to be updated everywhere... seqnums would solve all this

romange · 2024-05-02T12:04:29Z

src/server/conn_context.h

+    // Enable tracking on the client
+    void TrackClientCaching();
+
+    void UpdatePrevAndLastCommand();


nit: The name UpdatePrevAndLastCommand describes the implementation of this function. What it does is advance the state, so I think it's better to call it Tick, Advance or Update.

romange · 2024-05-02T12:05:46Z

src/server/conn_context.h

+    // true if the previous command invoked is CLIENT CACHING TRUE
+    bool prev_command_ = false;
+    // true if the currently executing command is CLIENT CACHING TRUE
+    bool executing_command_ = false;


rename: executing_command_ to track_next_cmd_

But track_next_cmd_ seems misleading, since it implies that it refers to the next command. executing_command_ is the command we currently execute in the InvokeCmd flow, and prev_command_ is the command before it. So:

>> GET FOO ----> prev_command
>> GET BAR ----> current_command

romange · 2024-05-02T12:06:34Z

src/server/conn_context.h

+    bool optin_ = false;
+    // remember if CLIENT CACHING TRUE was the last command
+    // true if the previous command invoked is CLIENT CACHING TRUE
+    bool prev_command_ = false;


rename prev_command_ to track_current_cmd_

dranikpg

It looks somewhat over-engineered to me 😅 We have similar semantics in blocking commands, only there we subscribe with specific commands and not with any.

Let's use squashing_info instead of adding a parent field to ConnectionContext, or let's use that field for everything. There should be one way of doing things, with proper comments, so nobody adds yet a third.
I'd still suggest adding numbers to commands, because UpdatePrevAndLastCommand() appears in many places and we update three whole fields: prev, executing, multi. The track command can just store its number and we don't have to update much more.
Track() should be called when we conclude or finish the current multi command; currently we call it for every hop. Not that there are multi-hop read commands, but I think it belongs with all the other management code. invoke_cid should also not be needed with that.
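The sequence-number idea can be sketched as follows. This is a hypothetical illustration of the suggestion, not the code in this PR: `OptinSeqTracker` and its methods are made-up names, and it assumes OPTIN mode, where only the command that immediately follows CLIENT CACHING YES is tracked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-connection state for OPTIN mode, based on command sequence numbers.
class OptinSeqTracker {
 public:
  // Call once for every command the connection starts; returns that command's number.
  std::uint64_t NextSeq() { return ++seq_; }

  // Call when CLIENT CACHING YES runs as command number `seq`.
  void OnCachingYes(std::uint64_t seq) { caching_seq_ = seq; }

  // True only for the command that immediately follows CLIENT CACHING YES.
  bool ShouldTrack(std::uint64_t seq) const {
    return caching_seq_ != 0 && seq == caching_seq_ + 1;
  }

 private:
  std::uint64_t seq_ = 0;          // number of the most recently started command
  std::uint64_t caching_seq_ = 0;  // 0 means CLIENT CACHING YES has not been seen
};
```

Compared with separate prev/executing/multi flags, only one value is written (when CLIENT CACHING YES runs), and the decision for any later command is a pure comparison.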

dranikpg · 2024-05-02T12:17:18Z

src/server/conn_context.h

+  ConnectionContext* parent_cntx_ = nullptr;
+
+  ConnectionState::ClientTracking& ClientTrackingInfo();


See previous comment on whether we can keep this in conn_state

dranikpg · 2024-05-02T12:50:18Z

there is a deeper problem with CLIENT TRACKING OPTIN

suppose you have

MULTI
CLIENT TRACKING OPTIN
GET A
GET B 
GET C
EXEC

Now, if you squash the last 3, you lose the order, because GET C can run first of those three
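A tiny model of why execution order matters here. It assumes, as a simplification, that the opt-in is a one-shot flag consumed by whichever command executes first after it; `TrackedKey` is a hypothetical helper, not a Dragonfly function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the key that ends up tracked when `gets` execute, in the given order,
// right after an opt-in command. Only the first command to run consumes the flag.
inline std::string TrackedKey(const std::vector<std::string>& gets) {
  bool track_next = true;  // set by the opt-in command
  std::string tracked;
  for (const auto& key : gets) {
    if (track_next) {
      tracked = key;
      track_next = false;
    }
  }
  return tracked;
}
```

With the submission order A, B, C the tracked key is A. If squashing lets GET C execute first, the same flag is consumed by C instead, so the client ends up tracking a different key than it asked for.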

romange · 2024-05-02T12:52:31Z

let's reject CLIENT TRACKING OPTIN in multi

romange · 2024-05-02T12:53:18Z

in fact, should we even allow CLIENT commands inside MULTI?

kostasrim · 2024-05-02T13:57:38Z

in fact, should we even allow CLIENT commands inside MULTI?

Only CLIENT CACHING YES as it has specific semantics

dranikpg

Some nits remaining, LGTM

dranikpg · 2024-05-09T06:41:52Z

src/server/conn_context.cc

+  if ((cid->opt_mask() & CO::READONLY) && cid->IsTransactional() && info.ShouldTrackKeys()) {
+    auto conn = cntx->parent_cntx_ ? cntx->parent_cntx_->conn()->Borrow() : cntx->conn()->Borrow();
+    auto cb = [&, conn](unsigned i, auto* pb) {
+      if (shards.find(i) != shards.end()) {


nit: There is IsActive() so you don't need GetActiveShards()

dranikpg · 2024-05-09T06:43:53Z

src/server/transaction.h

+  void SetConnectionContextAndInvokeCid(ConnectionContext* cntx) {
+    cntx_ = cntx;
+  }


nit: please rename it then

dranikpg · 2024-05-09T06:47:15Z

src/server/transaction.h

+  ConnectionContext* cntx_{nullptr};
+


I'll invent something in the future to get rid of this 😆

No need to wait for the future. I think transactions should not be aware of ConnectionContext. It's a design choice, and it is possible to preserve it. Let's introduce an on_track_cb that is passed to the transaction if tracking is needed, i.e. if the condition ((cid->opt_mask() & CO::READONLY) && cid->IsTransactional() && info.ShouldTrackKeys()) holds. The callback will call OpTrackKeys. This way ClientTracking::Track will disappear entirely.
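A minimal sketch of the callback approach, under stated assumptions: `MiniTransaction`, `SetTrackCb` and `RunGet` are hypothetical names that only illustrate the shape of the design, and the three booleans stand in for the READONLY, IsTransactional() and ShouldTrackKeys() checks quoted above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified transaction that knows nothing about connections.
class MiniTransaction {
 public:
  using TrackCb = std::function<void(const std::vector<std::string>& keys)>;

  explicit MiniTransaction(std::vector<std::string> keys) : keys_(std::move(keys)) {}

  // Installed only when the caller decided that this command's keys must be tracked.
  void SetTrackCb(TrackCb cb) { on_track_cb_ = std::move(cb); }

  // Runs the command body and, when concluding, the optional tracking callback.
  void Execute(const std::function<void()>& body, bool conclude) {
    body();
    if (conclude && on_track_cb_)
      on_track_cb_(keys_);
  }

 private:
  std::vector<std::string> keys_;
  TrackCb on_track_cb_;  // empty when no tracking is needed
};

// Caller side: evaluate the condition once and install the callback only if it holds.
inline std::vector<std::string> RunGet(bool read_only, bool transactional, bool should_track) {
  std::vector<std::string> tracked;
  MiniTransaction tx(std::vector<std::string>{"foo"});
  if (read_only && transactional && should_track)
    tx.SetTrackCb([&tracked](const std::vector<std::string>& keys) { tracked = keys; });
  tx.Execute([] {}, /*conclude=*/true);
  return tracked;
}
```

The transaction only stores and invokes an opaque callback, so the connection-specific logic stays with the caller that created it.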

dranikpg · 2024-05-09T06:49:35Z

src/server/conn_context.cc

+           << " with thread ID: " << conn_ref.Thread();
+
+  auto& db_slice = slice_args.shard->db_slice();
+  // TODO: There is a bug here that we track all arguments instead of tracking only keys.


ah, found it, yes that's also left 🙂

I'm 99.9% sure it does not affect us and is not a bug. I created #3034 for @romange to confirm, and I will push a quick fix.

dranikpg

I think it's correct now. Note there are some unresolved nits by Roman

romange · 2024-05-28T04:26:22Z

src/facade/dragonfly_connection.cc

@@ -1119,7 +1118,8 @@ void Connection::HandleMigrateRequest() {
    this->Migrate(dest);
  }

-  DCHECK(dispatch_q_.empty());
+  // This triggers on rueidis SingleIntegrationTest


I think this comment can be improved. Explain how it happens that DCHECK(dispatch_q_.empty()); fails after the migration. "rueidis SingleIntegrationTest" is irrelevant here.

romange · 2024-05-28T04:28:40Z

src/server/conn_context.cc

+  return OpStatus::OK;
+}
+
+void ConnectionState::ClientTracking::Track(ConnectionContext* cntx, const CommandId* cid) {


This function is now called within the shard. We have a convention of naming such functions xxxOnShard to stress it.

romange · 2024-05-28T04:32:53Z

src/server/conn_context.cc

+  auto& info = cntx->conn_state.tracking_info_;
+  if ((cid->opt_mask() & CO::READONLY) && cid->IsTransactional() && info.ShouldTrackKeys()) {


I believe the code can be moved to the coordinator thread, i.e. to the point where we prepare the transaction. The result can be kept as a boolean within the transaction. Nothing here is relevant to the shard or the transaction state. We have the Transaction::coordinator_state_ mask, which can be used for this.

Actually, see my other comment.

feat: client tracking optin

4c3ceff

kostasrim changed the title ~~feat: client tracking optin~~ feat: client tracking optin argument Apr 29, 2024

kostasrim self-assigned this Apr 29, 2024

fix CLIENT CACHING command wrong args

76444b3

kostasrim requested a review from dranikpg April 29, 2024 16:05

fix tests

7a54288

romange reviewed Apr 29, 2024

refactor client tracking, fix atomicity, squashing and multi/exec

8390b04

kostasrim changed the title ~~feat: client tracking optin argument~~ refactor client tracking, fix atomicity, squashing and multi/exec May 2, 2024

kostasrim commented May 2, 2024

remove unused code

db25d6d