reactor: io_uring: enable some optimization flags #2089

Member

@avikivity avikivity commented Feb 9, 2024

Enable some optimization flags in an attempt to improve performance with io_uring:

IORING_SETUP_COOP_TASKRUN - prevents a completion from interrupting the reactor if it is running. Requires that the reactor issue an io_uring_enter system call in a timely fashion, but thanks to the task quota timer, we do.

IORING_SETUP_TASKRUN_FLAG - sets up a flag that notifies the reactor that the kernel has pending completions that it did not process. This allows the reactor to issue an io_uring_enter even if it has no pending submission queue entries or completion queue entries (i.e. it indicates that a third queue, in the kernel, is not empty).

IORING_SETUP_SINGLE_ISSUER - elides some locking by guaranteeing that only a single thread plays with the ring; this happens to be true for us.

IORING_SETUP_DEFER_TASKRUN - batches up completion processing in an attempt to get some more performance.

These flags bump the dependencies up to Linux 6.1 and liburing 2.2. This seems worthwhile, as io_uring currently lags behind linux-aio (which processes completions from interrupt context and therefore doesn't need all these optimizations).

After this exercise, io_uring is still slower than linux-aio.

@tchaikov
Contributor

This flags bump up the dependencies to Linux 6.1 and liburing 2.2. This seems worthwhile as right now io-uring lags behind linux-aio (which processes completions from interrupt context and therefore doesn't need all these optimizations). However, I don't know how to specify the liburing version requirement.

lemme create a change to bump the required version to v2.2

@tchaikov
Contributor

following patch would do the trick:

diff --git a/cmake/SeastarDependencies.cmake b/cmake/SeastarDependencies.cmake
index 6c80d0fa..9aff230b 100644
--- a/cmake/SeastarDependencies.cmake
+++ b/cmake/SeastarDependencies.cmake
@@ -133,7 +133,7 @@ macro (seastar_find_dependencies)
   seastar_set_dep_args (Protobuf REQUIRED
     VERSION 2.5.0)
   seastar_set_dep_args (LibUring
-    VERSION 2.0
+    VERSION 2.2
     OPTION ${Seastar_IO_URING})
   seastar_set_dep_args (StdAtomic REQUIRED)
   seastar_set_dep_args (hwloc

@avikivity
Member Author

Thanks, I'll update my patch. Unfortunately I still see a 10% slowdown; I don't think my changes did anything.

@avikivity avikivity marked this pull request as draft February 11, 2024 14:47
@avikivity
Member Author

v2: applies patch from @tchaikov to bump the uring version dependency

@nyh
Contributor

nyh commented Feb 11, 2024

Enable some optimization flags in an attempt to improve performance with io_uring:
..
After this exercise, io_uring is still slower than linux-aio.

Can you share some example numbers on how much slower io_uring still is than linux-aio, and how much your patch improved things?

@avikivity
Member Author

Client:

ab -k -c 100 -n 1000000 http://localhost:10000/

Server:

./build/release/apps/httpd/httpd --smp 1 --cpuset 0 -m 1G --reactor-backend linux-aio

Requests per second: 110300.38 [#/sec] (mean)

Server:

./build/release/apps/httpd/httpd --smp 1 --cpuset 0 -m 1G --reactor-backend io_uring

Requests per second: 96563.98 [#/sec] (mean)

Usually linux-aio and io_uring are within 10% of each other. I'm not sure my patch improves anything; the results are noisy.

@avikivity
Member Author

The most I was able to see is a large difference in IPC, but I have no idea how that happens.

@avikivity
Member Author

I did find a difference:

[avi@avi seastar (uring-poll-first)]$ sudo perf stat -a -e irq_vectors:reschedule_entry  ./build/release/apps/httpd/httpd --smp 1 --cpuset 0 -m 1G --reactor-backend io_uring
INFO  2024-02-11 18:18:18,608 seastar - Reactor backend: io_uring
starting prometheus API server
Seastar HTTP server listening on port 10000 ...
^CStoppping HTTP server
Stoppping Prometheus server

 Performance counter stats for 'system wide':

           980,159      irq_vectors:reschedule_entry                                          

      14.433292048 seconds time elapsed

[avi@avi seastar (uring-poll-first)]$ sudo perf stat -a -e irq_vectors:reschedule_entry  ./build/release/apps/httpd/httpd --smp 1 --cpuset 0 -m 1G --reactor-backend linux-aio
INFO  2024-02-11 18:18:40,236 seastar - Reactor backend: linux-aio
starting prometheus API server
Seastar HTTP server listening on port 10000 ...
^CStoppping HTTP server
Stoppping Prometheus server

 Performance counter stats for 'system wide':

               496      irq_vectors:reschedule_entry                                          

      13.693591716 seconds time elapsed

@avikivity
Member Author

(I thought the patch addressed those self-interrupts, but maybe not.)

@avikivity
Member Author

Ah, that run was without the patch.

@avikivity
Member Author

Unfortunately the patch doesn't help sufficiently, it's still slower.

@avikivity
Member Author

Follow-up: #2092

@avikivity
Member Author

On ARM, these two patches change io_uring from ~ -12% to ~ +2%. So maybe there's hope.
