
QUIC: stalled connection rejection causing stalled connection reopening #2549

Open
Tristan971 opened this issue Apr 24, 2024 · 3 comments
Labels
status: needs-triage This issue needs to be triaged. type: bug This issue describes a bug.

Comments

@Tristan971
Member

Detailed Description of the Problem

This is a tricky one. It is extremely difficult to reproduce reliably or to be sure what exactly is happening. However, I (and some users) have occasionally noticed the following behaviour:

  • a browser starts a QUIC query
  • inexplicably, that query stalls for 4 seconds (±10 ms)
  • if you wait long enough with no interaction, it happens again

It shows up like so in a browser:
[screenshot: browser network timing for the stalled request]

So, on one side or the other, there is a 4 s timeout that triggers only intermittently.

Expected Behavior

Connections do not randomly stall for 4 s.

Steps to Reproduce the Behavior

Unsure. My best repro so far is to open a fresh browser instance, navigate to my site, and observe the random 4 s stall in roughly 50% of cases.

Do you have any idea what may have caused this?

Not sure, but I know it's a regression. One day I noticed it and just assumed it was a DNS stall or something on my computer. I only investigated much later and, unfortunately, don't know exactly when it started.

Do you have an idea how to solve the issue?

No response

What is your configuration?

# relevant portion

    timeout check           5s
    timeout client          5s
    timeout client-fin      5s
    timeout connect         1s
    timeout http-keep-alive 3s
    timeout http-request    5s
    timeout queue           5s
    timeout server          30s
    timeout server-fin      10s
    timeout tarpit          5s
    timeout tunnel          1h
    timeout http-request 1000
    timeout http-keep-alive 2000

Output of haproxy -vv

HAProxy version 3.0-dev4-7151076+mangadex-73c78ef 2024-03-02T09:44+00:00 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 5.15.126-1-pve #1 SMP PVE 5.15.126-1 (2023-10-03T17:24Z) x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -ggdb3 -gdwarf-4 -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wnull-dereference -fwrapv -Wno-unknown-warning-option -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment -DMAX_SESS_STKCTR=5
  OPTIONS = USE_LIBCRYPT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1 USE_QUIC=1 USE_PROMEX=1 USE_STATIC_PCRE2=1 USE_PCRE2=1 USE_PCRE2_JIT=1
  DEBUG   = -DDEBUG_MEMORY_POOLS -DDEBUG_STRICT

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -PTHREAD_EMULATION +QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE +STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=8).
Built with OpenSSL version : OpenSSL 1.1.1w+quic-mangadex-73c78ef  2 Mar 2024
Running on OpenSSL version : OpenSSL 1.1.1w+quic-mangadex-73c78ef  2 Mar 2024
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.4.6
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.42 2022-12-11
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with clang compiler version 17.0.6 (++20231208085813+6009708b4367-1~exp1~20231208085906.81)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=

Available services : prometheus-exporter
Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

No response

Additional Information

I will update my HAProxy version first to see whether anything changes, but I figured I should at least open the issue now.

@Tristan971 Tristan971 added status: needs-triage This issue needs to be triaged. type: bug This issue describes a bug. labels Apr 24, 2024
@Tristan971
Member Author

Since I can reproduce it somewhat reliably, I'll try to collect traces and other details on a mainline build.

@a-denoyelle
Contributor

I ran some tests on my side using Chrome, but I cannot reproduce it for now.

As a side note for us regarding your screenshot: the issue seems to happen when opening a new connection, since the timing graphs for requests that reuse a connection look different (no DNS / Initial connection / SSL entries).

@Tristan971
Member Author

I ran some tests on my side using Chrome, but I cannot reproduce it for now.

Yeah, it is quite tricky, alas, and it seems to require a combination of factors (OS, browser, ISP, etc.) that I just happen to have at home for some reason.

I've had very limited time for testing recently, but I won't forget to gather extra details.
