
"internal error" stream state when using gzip compression on 3.0-dev7 #2530

Open
felipewd opened this issue Apr 12, 2024 · 9 comments
Labels
2.0 This issue affects the HAProxy 2.0 stable branch. 2.2 This issue affects the HAProxy 2.2 stable branch. 2.4 This issue affects the HAProxy 2.4 stable branch. status: fixed This issue is a now-fixed bug. type: bug This issue describes a bug.

Comments

felipewd commented Apr 12, 2024

Detailed Description of the Problem

We have a backend that generates a huge, almost-40 MB m3u8 file for streaming, so we tried using the compression filter, which seems easy enough.

The response without compression seems ideal, with a Content-Length declared:

< HTTP/1.1 200 OK
< Server: nginx
< Date: Fri, 12 Apr 2024 20:25:23 GMT
< Content-Type: audio/mpegurl
< Content-Length: 41328301
< access-control-allow-origin: *
< content-description: File Transfer
< expires: 0
< Cache-Control: must-revalidate
< pragma: public
< content-disposition: attachment; filename="playlist_3917348_plus.m3u"

When using the compression filter, we get a truncated response:

< HTTP/1.1 200 OK
< Server: nginx
< Date: Fri, 12 Apr 2024 20:25:38 GMT
< Content-Type: audio/mpegurl
< access-control-allow-origin: *
< content-description: File Transfer
< expires: 0
< Cache-Control: must-revalidate
< pragma: public
< content-disposition: attachment; filename="playlist_3917348_plus.m3u"
< transfer-encoding: chunked
< vary: Accept-Encoding
< content-encoding: gzip
< 
{ [13906 bytes data]
* transfer closed with outstanding read data remaining
100 3947k    0 3947k    0     0  11.8M      0 --:--:-- --:--:-- --:--:-- 11.8M
* Closing connection
curl: (18) transfer closed with outstanding read data remaining

We were able to pull an httplog entry with a weird stream termination state:

front limitado/s01 0/0/0/56/242 200 41328601 - - ID-- 72/71/36/20/0 0/0   "GET /get.php?username=test&password=test&type=m3u_plus HTTP/1.1"

I'm a complete novice regarding this filter, but this ID-- termination state (the leading "I" indicates an internal error detected by HAProxy) seems to suggest something isn't right.

This backend uses stick-tables to prevent abusers from hitting this huge URL too many times; other than that, it is pretty straightforward.

Expected Behavior

Compression should deliver a full, valid response.

Steps to Reproduce the Behavior

Not sure...we're using a backend server that doesn't reply with a compressed response (kind of like when compression offload is used) and applying the compression filter on that response.
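For what it's worth, a request along these lines should exercise the same path (illustrative only; the hostname is a placeholder, the URL is taken from the httplog above, and any request that makes the backend return the large uncompressed m3u8 should do):

curl -v --compressed -o /dev/null \
     'http://frontend.example.com/get.php?username=test&password=test&type=m3u_plus'

The --compressed flag makes curl send an Accept-Encoding header (including gzip) and decode the response, which is what triggers the compression filter on the way back.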

Do you have any idea what may have caused this?

Nope

Do you have an idea how to solve the issue?

Not at all

What is your configuration?

global
        nbthread 24
        maxconn 900000
        ulimit-n 3150919

        tune.bufsize 1048576
        tune.pipesize 262144

        h1-case-adjust content-length Content-Length
        h1-case-adjust content-type Content-Type
        h1-case-adjust content-range Content-Range


...

defaults
        mode    http

<timeouts ...>

        option splice-auto
        option h1-case-adjust-bogus-client

backend limitado
    filter compression
    compression algo gzip 
    compression type audio/mpegurl

<stick-table and various ACLs for abuse detection>

server s01 xxx.xxx.xxx.xxx:80

Output of haproxy -vv

HAProxy version 3.0-dev7-0046922 2024/04/06 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 6.4.3-060403-generic #202307110536 SMP PREEMPT_DYNAMIC Tue Jul 11 05:43:58 UTC 2023 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_OPENSSL=1 USE_ZLIB=1 USE_NS=1 USE_PCRE=1 USE_PCRE_JIT=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT +PCRE -PCRE2 -PCRE2_JIT +PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN -SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL +ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=24).
Built with OpenSSL version : OpenSSL 3.0.2 15 Mar 2022
Running on OpenSSL version : OpenSSL 3.0.2 15 Mar 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : none

Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

No response

Additional Information

No response

@felipewd felipewd added status: needs-triage This issue needs to be triaged. type: bug This issue describes a bug. labels Apr 12, 2024
@felipewd felipewd changed the title "internal error" when using gzip compression on 3.0-dev7 "internal error" stream state when using gzip compression on 3.0-dev7 Apr 12, 2024
felipewd commented Apr 13, 2024

Just a quick note: I've tried recompiling haproxy with USE_SLZ=1 instead of USE_ZLIB=1...but it didn't change anything.

And for some reason, 2.9.7 is also affected by this.
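For reference, a rebuild along these lines is what I mean (a sketch based on the OPTIONS line from the haproxy -vv output above, with USE_ZLIB=1 swapped for USE_SLZ=1; TARGET and the remaining flags will vary per build environment):

make clean
make -j$(nproc) TARGET=linux-glibc USE_OPENSSL=1 USE_SLZ=1 USE_NS=1 USE_PCRE=1 USE_PCRE_JIT=1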

felipewd commented:

I isolated the same config on another machine and couldn't reproduce the issue. Something about our production environment seems to have special fairy dust for these kinds of things :-)

capflam commented Apr 15, 2024

Thanks @felipewd. This kind of error is most probably reported by the H1 multiplexer because the message chunking appears to be invalid. It is hard to know whether the issue is on the HTTP compression filter side or on the H1 multiplexer side.

You could enable traces on H1 to print only errors:

(echo "trace h1 sink buf0; trace h1 level error; trace h1 verbosity complete; trace h1 start now; show events buf0 -w" ; cat ) | socat -  {SOCKET}

Don't forget to stop the traces at the end, using echo "trace 0" | socat - {SOCKET}
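For example, assuming the stats socket is exposed at /var/run/haproxy.sock (a placeholder; use whatever path your "stats socket" directive points to), the full sequence would be:

(echo "trace h1 sink buf0; trace h1 level error; trace h1 verbosity complete; trace h1 start now; show events buf0 -w" ; cat ) | socat - /var/run/haproxy.sock

# once done, stop the traces:
echo "trace 0" | socat - /var/run/haproxy.sock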

You have observed the issue on 3.0-dev7 and 2.9.7. Do you know if 3.0-dev6 or 2.9.6 are also affected?

@capflam capflam added status: reviewed This issue was reviewed. A fix is required. and removed status: needs-triage This issue needs to be triaged. labels Apr 15, 2024
felipewd commented:

Hey @capflam, thanks for the quick reply.

We were able to collect these traces:

<0>2024-04-15T11:28:48.772952+00:00 [04|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21135738 .body_len=0) - h1c=0x7fb61473eae0(0x00040000) conn=0x7fb61c7374d0(0x00000300) h1s=0x7fb61472e050(0x00124010) sd=0x7fb65c7367e0(0x50404001) sc=0x7fb65c72fb40(0x000084a2)
<0>2024-04-15T11:28:48.772965+00:00 [04|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21135738 .body_len=0) - h1c=0x7fb61473eae0(0x00040000) conn=0x7fb61c7374d0(0x00000300) h1s=0x7fb61472e050(0x00125010) sd=0x7fb65c7367e0(0x50414001) sc=0x7fb65c72fb40(0x000084a2)
<0>2024-04-15T11:28:53.023800+00:00 [12|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20926691 .body_len=0) - h1c=0x7fb64c72b310(0x00040000) conn=0x7fb60c72fb90(0x00000300) h1s=0x7fb62c747740(0x00124010) sd=0x7fb62c72ebc0(0x50404001) sc=0x7fb62c72e290(0x000084a2)
<0>2024-04-15T11:28:53.023809+00:00 [12|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20926691 .body_len=0) - h1c=0x7fb64c72b310(0x00040000) conn=0x7fb60c72fb90(0x00000300) h1s=0x7fb62c747740(0x00125010) sd=0x7fb62c72ebc0(0x50414001) sc=0x7fb62c72e290(0x000084a2)
<0>2024-04-15T11:28:54.730778+00:00 [19|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20868768 .body_len=0) - h1c=0x7fb64873a0c0(0x00040000) conn=0x7fb64873a280(0x00000300) h1s=0x7fb64872c940(0x00124010) sd=0x7fb648829090(0x50404001) sc=0x7fb648728fb0(0x000084a2)
<0>2024-04-15T11:28:54.730786+00:00 [19|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20868768 .body_len=0) - h1c=0x7fb64873a0c0(0x00040000) conn=0x7fb64873a280(0x00000300) h1s=0x7fb64872c940(0x00125010) sd=0x7fb648829090(0x50414001) sc=0x7fb648728fb0(0x000084a2)
<0>2024-04-15T11:29:07.582478+00:00 [16|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21287031 .body_len=0) - h1c=0x7fb64c738d50(0x00040000) conn=0x7fb64472ad60(0x00000300) h1s=0x7fb6187295b0(0x00124010) sd=0x7fb62472bd80(0x50404001) sc=0x7fb62472acf0(0x000084a2)
<0>2024-04-15T11:29:07.582487+00:00 [16|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21287031 .body_len=0) - h1c=0x7fb64c738d50(0x00040000) conn=0x7fb64472ad60(0x00000300) h1s=0x7fb6187295b0(0x00125010) sd=0x7fb62472bd80(0x50414001) sc=0x7fb62472acf0(0x000084a2)
<0>2024-04-15T11:29:08.877755+00:00 [03|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20899561 .body_len=0) - h1c=0x7fb64c72b160(0x00040000) conn=0x7fb61872a160(0x00000300) h1s=0x7fb658731c20(0x00124010) sd=0x7fb65872c100(0x50404001) sc=0x7fb65872b7d0(0x000084a2)
<0>2024-04-15T11:29:08.877767+00:00 [03|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20899561 .body_len=0) - h1c=0x7fb64c72b160(0x00040000) conn=0x7fb61872a160(0x00000300) h1s=0x7fb658731c20(0x00125010) sd=0x7fb65872c100(0x50414001) sc=0x7fb65872b7d0(0x000084a2)
<0>2024-04-15T11:30:46.234229+00:00 [09|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21497716 .body_len=0) - h1c=0x7fb62c72d320(0x00040000) conn=0x7fb6347341e0(0x00000300) h1s=0x7fb61872c7f0(0x00124010) sd=0x7fb61872ab30(0x40404001) sc=0x7fb61872aba0(0x00008402)
<0>2024-04-15T11:30:46.234239+00:00 [09|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21497716 .body_len=0) - h1c=0x7fb62c72d320(0x00040000) conn=0x7fb6347341e0(0x00000300) h1s=0x7fb61872c7f0(0x00125010) sd=0x7fb61872ab30(0x40414001) sc=0x7fb61872aba0(0x00008402)

I'll test dev6 and 2.9.6 now and report back

felipewd commented:

Just a quick follow-up: dev6 and 2.9.6 are also affected by this symptom. It seems easy enough to trigger in production, so let me know if you'd like me to test something else.

I don't know how to collect something more precisely useful here. Even for those traces, I'm not 100% sure they belong to this specific request, since we have a ton of other requests on this server.

capflam commented Apr 15, 2024

Thanks @felipewd. So it is probably related to your issue, because these traces result in an ID-- termination state in the logs, possibly IL--. Here the error is reported because the end of the message was signaled while some data still had to be sent for the last chunk. At least, that is the H1 multiplexer's point of view. I'm digging now.

capflam commented Apr 16, 2024

I have no reproducer, but by reading the code I can see a way to produce this behavior. Your traces really helped me understand. Can you please try the attached patch, on top of 3.0-dev7?

0001-BUG-MEDIUM-stconn-Don-t-forward-channel-data-if-inpu.patch.txt

felipewd commented:

Hey @capflam, I can confirm this fixes the issue. Great and fast work, as usual :-)

Thanks!

capflam commented Apr 16, 2024

A bit too quick, the fix was not pushed yet :)

@capflam capflam reopened this Apr 16, 2024
haproxy-mirror pushed a commit that referenced this issue Apr 16, 2024
…filtered

Once data are received and placed in a channel buffer, if it is possible,
outgoing data are immediately forwarded. But we must take care to not do so
if there is also pending input data and a filter registered on the
channel. It is especially important for HTX streams because the HTX may be
altered, especially the extra field. And it is indeed an issue with the HTTP
compression filter and the H1 multiplexer. The wrong chunk size may be
announced leading to an internal error.

This patch should fix the issue #2530. It must be backported to all stable
versions.
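To illustrate the idea in that commit message (this is only a sketch, not the actual HAProxy patch; the struct, field, and helper names below are hypothetical stand-ins), the guard amounts to refusing the fast-forward whenever a filter still has unprocessed input in front of it:

/* Illustrative sketch only, not the real HAProxy code: the types and
 * helpers below are invented stand-ins for the idea described in the
 * commit message. */
#include <stdbool.h>
#include <stddef.h>

struct channel {
    size_t out;          /* bytes already scheduled for output               */
    size_t in_pending;   /* bytes received but not yet passed to analysers   */
    bool   has_filters;  /* at least one filter is registered on the channel */
};

/* Decide whether outgoing data may be forwarded right away. */
static bool may_fast_forward(const struct channel *chn)
{
    /* If a filter still has pending input to process, forwarding now could
     * expose a payload length that the filter will later change (e.g. the
     * HTX extra field rewritten by the HTTP compression filter), and the
     * H1 mux would then announce a wrong chunk size. Hold the data back
     * and let the analysers/filters run first. */
    if (chn->in_pending != 0 && chn->has_filters)
        return false;
    return chn->out != 0;
}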
@capflam capflam added status: fixed This issue is a now-fixed bug. 2.0 This issue affects the HAProxy 2.0 stable branch. 2.2 This issue affects the HAProxy 2.2 stable branch. 2.4 This issue affects the HAProxy 2.4 stable branch. 2.6 This issue affects the HAProxy 2.6 stable branch. 2.8 This issue affects the HAProxy 2.8 stable branch. 2.9 This issue affects the HAProxy 2.9 stable branch. and removed status: reviewed This issue was reviewed. A fix is required. labels Apr 16, 2024
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Apr 19, 2024
…filtered

(cherry picked from commit 50d8c18)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Apr 19, 2024
…filtered

(cherry picked from commit 50d8c18)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit dcc0a40)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
@capflam capflam removed 2.6 This issue affects the HAProxy 2.6 stable branch. 2.8 This issue affects the HAProxy 2.8 stable branch. 2.9 This issue affects the HAProxy 2.9 stable branch. labels Apr 19, 2024
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Apr 20, 2024
…filtered

(cherry picked from commit 50d8c18)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit dcc0a40)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 439a41b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>