Introduce shared query buffer for client reads #258

Open · uriyage wants to merge 8 commits into unstable from shared_qb

Conversation

@uriyage commented Apr 8, 2024

This PR optimizes client query buffer handling in Valkey by introducing a shared query buffer that is used by default for client reads. This reduces memory usage by ~20KB per client by avoiding allocations for most clients using short (<16KB) complete commands. For larger or partial commands, the client still gets its own private buffer.

The primary changes are:

  • Adding a shared query buffer shared_qb that clients use by default
  • Modifying client querybuf initialization and reset logic
  • Copying any partial query from shared to private buffer before command execution
  • Freeing idle client query buffers when empty to allow reuse of shared buffer
  • Master client query buffers are kept private, as their contents must be preserved for the replication stream

In addition to the memory savings, this change shows a 3% improvement in latency and throughput when running with 1000 active clients.

The memory reduction may also help reduce the need to evict clients when reaching the max memory limit, as the query buffer is the main per-client memory consumer.
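
To make the lifecycle concrete, here is a minimal, self-contained sketch of the pattern the PR describes. All names here (shared_qb, client_t, retain_partial) are hypothetical stand-ins; the actual implementation works on sds strings in src/networking.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SHARED_QB_SIZE (16 * 1024)      /* matches the <16KB fast path */

static char shared_qb[SHARED_QB_SIZE];  /* one buffer reused across clients */

typedef struct {
    char *querybuf;   /* NULL => no private buffer allocated */
    size_t querylen;
} client_t;

/* If a read into shared_qb ends mid-command, the leftover bytes are copied
 * into a private per-client buffer so shared_qb can be reused for the next
 * read. Short complete commands are parsed in place, so most clients never
 * allocate a private buffer at all. */
static void retain_partial(client_t *c, const char *data, size_t len) {
    c->querybuf = malloc(len);
    memcpy(c->querybuf, data, len);
    c->querylen = len;
}

int main(void) {
    client_t c = {0};
    const char *partial = "*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n"; /* truncated SET */
    size_t n = strlen(partial);

    memcpy(shared_qb, partial, n);    /* stands in for read(2) */
    retain_partial(&c, shared_qb, n); /* keep the leftover privately */
    printf("client kept %zu partial bytes privately\n", c.querylen);
    free(c.querybuf);
    return 0;
}
```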

@uriyage force-pushed the shared_qb branch 2 times, most recently from 6ce2ea4 to fc9ee81 on April 8, 2024 at 11:48
@madolson (Member) left a comment

Overall it looks good to me. Can you document the procedure you were using when you saw the 3% performance boost? Did we also test it for pipelined clients?

I would also ideally like some protection in freeClient() to make sure that if a client gets freed, it doesn't somehow also free the shared query buffer.
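
A guard of roughly this shape would do it. This is only a sketch under the hypothetical names from the example in the PR description, not the PR's actual code:

```c
#include <stdlib.h>
#include <string.h>

static char shared_qb[16 * 1024];  /* stand-in for the process-wide buffer */

typedef struct {
    char *querybuf;
    size_t querylen;
} client_t;

/* Free a client's query buffer only if the client actually owns it; the
 * shared buffer must survive any individual client's teardown. */
static void free_client_querybuf(client_t *c) {
    if (c->querybuf != NULL && c->querybuf != shared_qb)
        free(c->querybuf);
    c->querybuf = NULL;
    c->querylen = 0;
}

int main(void) {
    client_t a = {shared_qb, 0};           /* borrowing the shared buffer */
    client_t b = {strdup("PING\r\n"), 6};  /* owns a private copy */

    free_client_querybuf(&a);  /* must NOT free shared_qb */
    free_client_querybuf(&b);  /* frees the private copy */
    return 0;
}
```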

Review threads: src/networking.c (×3, outdated), src/server.c, tests/unit/querybuf.tcl, src/replication.c (outdated)
@uriyage (Author) commented May 5, 2024

Performance Benchmark:

Server Setup:

  • 3 million keys, each with 512 bytes

Benchmark Configuration:

  • 1,000 clients running SET commands
  • Server and benchmark running on separate ARM instances with 64 cores

Server Command:

./valkey-server --save

Benchmark Command:

./valkey-benchmark -t set -d 512 -r 3000000 -c 1000 --threads 50 -h <server_host_address> -n 50000000

Results (Average of 3 runs):

  1. Without shared query buffer:

    • Throughput: 208,361 operations per second
    • Average latency: 4.806 milliseconds
  2. With shared query buffer:

    • Throughput: 214,062 operations per second
    • Average latency: 4.692 milliseconds
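
That works out to roughly a 2.7% throughput gain ((214,062 - 208,361) / 208,361 ≈ 2.7%) and a 2.4% latency reduction, consistent with the ~3% figure in the PR description.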

@PingXie (Member) left a comment

This change LGTM overall. Thanks @uriyage

Review threads: src/server.c (×2), src/networking.c (×3, outdated)
@zuiderkwast (Contributor) left a comment

Generally LGTM

Review thread: src/networking.c (outdated)
uriyage and others added 5 commits on May 9, 2024 (all Signed-off-by: Uri Yagelnik <uriy@amazon.com>; one Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>)
@madolson (Member) left a comment

I'm good with it. @uriyage Can you do the merge to get it up to date?

@madolson added the release-notes (This issue should get a line item in the release notes) and to-be-merged (Almost ready to merge) labels on May 9, 2024
Review thread: src/networking.c (outdated)
codecov bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 96.22642%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 69.82%. Comparing base (4e18e32) to head (7f7f618).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #258      +/-   ##
============================================
+ Coverage     69.80%   69.82%   +0.01%     
============================================
  Files           109      109              
  Lines         61801    61839      +38     
============================================
+ Hits          43141    43178      +37     
- Misses        18660    18661       +1     
Files              Coverage                Δ
src/replication.c  86.72%  <100.00%>  (-0.05%) ⬇️
src/server.c       88.62%  <100.00%>  (+0.01%) ⬆️
src/networking.c   85.20%  <95.23%>   (+0.30%) ⬆️

... and 13 files with indirect coverage changes

@soloestoy (Member) commented May 13, 2024

@uriyage can you add more tests for the cases where the whole command or an argv exceeds PROTO_IOBUF_LEN?

uriyage added 2 commits (Signed-off-by: Uri Yagelnik <uriy@amazon.com>)
@uriyage (Author) commented May 13, 2024

> @uriyage can you add more tests for the cases where the whole command or an argv exceeds PROTO_IOBUF_LEN?

I checked with 24K values (since with jemalloc we actually allocate more than 16K).

Commands:

./valkey-server --protected-mode no --save

./valkey-benchmark -t set -d 24000 -r 1000 -c 500 --threads 4 -h 192.31.233.204 -n 10000000

I get 83,544 ops/sec without a shared query buffer and 83,238 ops/sec with it.

So, essentially the same, which is expected: with larger values we revert to the existing logic, where each client uses a private buffer.
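
For illustration, the fallback can be thought of as a predicate like the one below. This is a hypothetical sketch; the real decision logic in src/networking.c is more involved (it also handles partially read commands).

```c
#include <stdbool.h>
#include <stddef.h>

#define PROTO_IOBUF_LEN (16 * 1024)  /* generic read buffer size */

/* Hypothetical predicate: a read may target the shared buffer only when the
 * expected payload fits in one 16KB read and the client is not a master,
 * whose stream must be preserved for replication. */
static bool can_use_shared_qb(size_t expected_len, bool is_master) {
    if (is_master) return false;
    return expected_len < PROTO_IOBUF_LEN;
}

int main(void) {
    /* A 512-byte SET stays on the shared path; a 24KB value falls back
     * to a private per-client buffer. */
    return (can_use_shared_qb(512, false) &&
            !can_use_shared_qb(24000, false)) ? 0 : 1;
}
```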

@uriyage (Author) commented May 14, 2024

> I'm good with it. @uriyage Can you do the merge to get it up to date?

@madolson Done

@madolson (Member) commented
@soloestoy Do you have any further followup?

@@ -24,8 +24,24 @@ start_server {tags {"querybuf slow"}} {
# The test will run at least 2s to check if client query
# buffer will be resized when client idle 2s.
test "query buffer resized correctly" {
set rd [valkey_client]

A member commented on the diff above:

It would be better to add two more cases where commands with large arguments exceed 16KB (PROTO_IOBUF_LEN) and 32KB (PROTO_MBULK_BIG_ARG), so that we cover all the special cases.

@soloestoy (Member) commented

I'm wondering whether we can eliminate the size limit on the shared query buffer (and resize it in cron if needed); maybe we can get more benefits.
