Return re-connection failures immediately #159

bjosv · 2023-05-30T11:57:24Z

Includes a new testcase, in a separate commit, that demonstrates the legacy behavior.
The testcase simulates a temporary network problem that causes node reconnects to fail.

A new testcase is added that simulates a temporary network problem causing node reconnects to fail. clusterclient is updated to skip empty lines and comments, and maxretry is limited to 2 (default 5).

zuiderkwast · 2023-05-30T15:52:43Z

tests/clusterclient.c

+        redisReply *reply = (redisReply *)redisClusterCommand(cc, cmd);
        if (cc->err) {
-            fprintf(stderr, "redisClusterCommand error: %s\n", cc->errstr);
-            exit(101);
+            printf("error: %s\n", cc->errstr);
+        } else {
+            printf("%s\n", reply->str);
        }


So now we can continue to use the cluster context after a reconnect failure? Sending queries still works as long as it's not the failing node? What would happen if we'd do that without this change? Would it crash the program?

How does redisClusterSetOptionMaxRetry affect the behaviour?

I think the PR description lacks some details about behaviour before and after.
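
For illustration, the clusterclient change above can be read as: report the error and keep going, instead of exiting on the first failure. The sketch below models only that reporting step. `fake_ctx`, `fake_reply`, and `format_result` are hypothetical names, not hiredis-cluster API, and the NULL guards are an added assumption that the diff does not include.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a reply that may carry a string. */
typedef struct fake_reply {
    const char *str;
} fake_reply;

/* Hypothetical stand-in for the cluster context's error state. */
typedef struct fake_ctx {
    int err;
    char errstr[128];
} fake_ctx;

/* Writes the line the test client would print for one command.
 * Returns -1 when the context reports an error, 0 otherwise. */
static int format_result(const fake_ctx *cc, const fake_reply *reply,
                         char *out, size_t outlen) {
    if (cc->err) {
        /* Report the error; the caller decides whether to continue. */
        snprintf(out, outlen, "error: %s", cc->errstr);
        return -1;
    }
    if (reply == NULL || reply->str == NULL) {
        /* Assumption: guard against replies without a string payload. */
        snprintf(out, outlen, "(no string reply)");
        return 0;
    }
    snprintf(out, outlen, "%s", reply->str);
    return 0;
}
```

A caller would invoke `format_result` once per command and print `out`, continuing with the next command regardless of the return value.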

zuiderkwast · 2023-05-30T15:54:16Z

hircluster.c

-            redisReconnect(c);
+            if (redisReconnect(c) != REDIS_OK) {
+                __redisClusterSetError(cc, c->err, c->errstr);
+                return NULL;
+            }

            if (cc->ssl && cc->ssl_init_fn(c, cc->ssl) != REDIS_OK) {
                __redisClusterSetError(cc, c->err, c->errstr);


I saw this too but I thought ssl_init_fn is supposed to handle a connection which failed to reconnect.

It seems cleaner to return ASAP though, but I'd like to understand the consequences better.
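
To make the "return ASAP" control flow concrete, here is a minimal, self-contained sketch. It is not the hiredis-cluster source: `fake_conn`, `fake_cluster`, `fake_reconnect`, `set_cluster_error`, and `get_usable_conn` are hypothetical stand-ins. Only the shape mirrors the diff: when the reconnect fails, copy the connection error to the cluster context and return NULL before any further setup runs.

```c
#include <stddef.h>
#include <stdio.h>

#define FAKE_OK 0
#define FAKE_ERR -1

/* Hypothetical stand-in for a per-node connection. */
typedef struct fake_conn {
    int err;          /* non-zero when the connection is in an error state */
    char errstr[128];
    int reachable;    /* knob for the sketch: can a reconnect succeed? */
} fake_conn;

/* Hypothetical stand-in for the cluster context. */
typedef struct fake_cluster {
    int err;
    char errstr[128];
} fake_cluster;

/* Simulated reconnect: fails while the node is unreachable. */
static int fake_reconnect(fake_conn *c) {
    if (!c->reachable) {
        c->err = 1;
        snprintf(c->errstr, sizeof(c->errstr), "Connection refused");
        return FAKE_ERR;
    }
    c->err = 0;
    c->errstr[0] = '\0';
    return FAKE_OK;
}

/* Copies an error code and message into the cluster context. */
static void set_cluster_error(fake_cluster *cc, int err, const char *msg) {
    cc->err = err;
    snprintf(cc->errstr, sizeof(cc->errstr), "%s", msg);
}

/* Returns a usable connection, or NULL with cc->err set. */
static fake_conn *get_usable_conn(fake_cluster *cc, fake_conn *c) {
    if (c->err) {
        if (fake_reconnect(c) != FAKE_OK) {
            /* Early return: later setup steps never see a dead connection. */
            set_cluster_error(cc, c->err, c->errstr);
            return NULL;
        }
    }
    return c;
}
```

In this sketch, any later step (such as an SSL init callback) is reached only with a connection whose reconnect succeeded, which is the consequence the comment above asks about.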

zuiderkwast · 2023-05-30T16:01:00Z

tests/scripts/reconnect-failure-test.sh

-# A reconnect failure triggers a search for an available node
-EXPECT ["PING"]
-SEND +PONG
-EXPECT ["SET", "foo", "second"]
-SEND -MOVED 12182 127.0.0.1:7402


Ah, looking at the commits one by one explains it. :)

Legacy: After N failed reconnect attempts, it sends the command to a random node and gets a redirect (after PING), then slot map update? Then the same thing is repeated M times. I don't get it exactly. Is maxretry = N or maxretry = M?

bjosv added 2 commits May 30, 2023 13:51

Add a reconnect failure testcase

0691c8b

A new testcase is added that simulates a temporary network problem causing node reconnects to fail. clusterclient is updated to skip empty lines and comments, and maxretry is limited to 2 (default 5).

Return reconnection failures immediately

41d1039

bjosv marked this pull request as ready for review May 30, 2023 12:05

bjosv requested a review from zuiderkwast May 30, 2023 12:05

zuiderkwast reviewed May 30, 2023


bjosv marked this pull request as draft May 31, 2023 09:43
