
"internal error" stream state when using gzip compression on 3.0-dev7 #2530

Open
felipewd opened this issue Apr 12, 2024 · 9 comments
Labels
2.0 This issue affects the HAProxy 2.0 stable branch. 2.2 This issue affects the HAProxy 2.2 stable branch. 2.4 This issue affects the HAProxy 2.4 stable branch. status: fixed This issue is a now-fixed bug. type: bug This issue describes a bug.

Comments

felipewd commented Apr 12, 2024

Detailed Description of the Problem

We have a backend that generates a huge, almost-40 MB m3u8 file for streaming, so we tried using the compression filter, which seems easy enough.

The response without compression seems ideal, with a Content-Length declared:

< HTTP/1.1 200 OK
< Server: nginx
< Date: Fri, 12 Apr 2024 20:25:23 GMT
< Content-Type: audio/mpegurl
< Content-Length: 41328301
< access-control-allow-origin: *
< content-description: File Transfer
< expires: 0
< Cache-Control: must-revalidate
< pragma: public
< content-disposition: attachment; filename="playlist_3917348_plus.m3u"

When using the compression filter, we get a truncated response:

< HTTP/1.1 200 OK
< Server: nginx
< Date: Fri, 12 Apr 2024 20:25:38 GMT
< Content-Type: audio/mpegurl
< access-control-allow-origin: *
< content-description: File Transfer
< expires: 0
< Cache-Control: must-revalidate
< pragma: public
< content-disposition: attachment; filename="playlist_3917348_plus.m3u"
< transfer-encoding: chunked
< vary: Accept-Encoding
< content-encoding: gzip
< 
{ [13906 bytes data]
* transfer closed with outstanding read data remaining
100 3947k    0 3947k    0     0  11.8M      0 --:--:-- --:--:-- --:--:-- 11.8M
* Closing connection
curl: (18) transfer closed with outstanding read data remaining

We were able to pull an httplog entry with a weird stream termination state:

front limitado/s01 0/0/0/56/242 200 41328601 - - ID-- 72/71/36/20/0 0/0   "GET /get.php?username=test&password=test&type=m3u_plus HTTP/1.1"

I'm a complete novice regarding this filter, but this ID-- termination state (the leading "I" indicates an internal error detected by HAProxy) seems to suggest something isn't right.

This backend uses stick-tables to prevent abusers from hitting this huge URL too many times; other than that, it is pretty straightforward.

Expected Behavior

Compression should deliver a full, valid response.

Steps to Reproduce the Behavior

Not sure...we're using a backend server that doesn't reply with a compressed response (kind of like when compression offload is used) and applying the compression filter on that response.
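For what it's worth, a request along these lines should exercise the same path (illustrative only; the hostname is a placeholder, the URL is taken from the httplog above, and any request that makes the backend return the large uncompressed m3u8 should do):

curl -v --compressed -o /dev/null \
     'http://frontend.example.com/get.php?username=test&password=test&type=m3u_plus'

The --compressed flag makes curl send an Accept-Encoding header (including gzip) and decode the response, which is what triggers the compression filter on the way back.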

Do you have any idea what may have caused this?

Nope

Do you have an idea how to solve the issue?

Not at all

What is your configuration?

global
        nbthread 24
        maxconn 900000
        ulimit-n 3150919

        tune.bufsize 1048576
        tune.pipesize 262144

        h1-case-adjust content-length Content-Length
        h1-case-adjust content-type Content-Type
        h1-case-adjust content-range Content-Range


...

defaults
        mode    http

<timeouts ...>

        option splice-auto
        option h1-case-adjust-bogus-client

backend limitado
    filter compression
    compression algo gzip 
    compression type audio/mpegurl

<stick-table and various ACLs for abuse detection>

server s01 xxx.xxx.xxx.xxx:80

Output of haproxy -vv

HAProxy version 3.0-dev7-0046922 2024/04/06 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 6.4.3-060403-generic #202307110536 SMP PREEMPT_DYNAMIC Tue Jul 11 05:43:58 UTC 2023 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_OPENSSL=1 USE_ZLIB=1 USE_NS=1 USE_PCRE=1 USE_PCRE_JIT=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT +PCRE -PCRE2 -PCRE2_JIT +PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN -SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL +ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=24).
Built with OpenSSL version : OpenSSL 3.0.2 15 Mar 2022
Running on OpenSSL version : OpenSSL 3.0.2 15 Mar 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : none

Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

No response

Additional Information

No response

@felipewd felipewd added status: needs-triage This issue needs to be triaged. type: bug This issue describes a bug. labels Apr 12, 2024
@felipewd felipewd changed the title "internal error" when using gzip compression on 3.0-dev7 "internal error" stream state when using gzip compression on 3.0-dev7 Apr 12, 2024
felipewd commented Apr 13, 2024

Just a quick note: I've tried recompiling haproxy with USE_SLZ=1 instead of USE_ZLIB=1...but it didn't change anything.

And for some reason, 2.9.7 is also affected by this.
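For reference, a rebuild along these lines is what I mean (a sketch based on the OPTIONS line from the haproxy -vv output above, with USE_ZLIB=1 swapped for USE_SLZ=1; TARGET and the remaining flags will vary per build environment):

make clean
make -j$(nproc) TARGET=linux-glibc USE_OPENSSL=1 USE_SLZ=1 USE_NS=1 USE_PCRE=1 USE_PCRE_JIT=1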

felipewd commented:

I isolated the same config on another machine and couldn't reproduce the issue. Something about our production environment seems to have special fairy dust for these kinds of things :-)

capflam commented Apr 15, 2024

Thanks @felipewd. This kind of error is most probably reported by the H1 multiplexer because the message chunking appears to be invalid. It is hard to know whether the issue is on the HTTP compression filter side or on the H1 multiplexer side.

You could enable traces on H1 to print only errors:

(echo "trace h1 sink buf0; trace h1 level error; trace h1 verbosity complete; trace h1 start now; show events buf0 -w" ; cat ) | socat -  {SOCKET}

Don't forget to stop the traces at the end, using echo "trace 0" | socat - {SOCKET}
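For example, assuming the stats socket is exposed at /var/run/haproxy.sock (a placeholder; use whatever path your "stats socket" directive points to), the full sequence would be:

(echo "trace h1 sink buf0; trace h1 level error; trace h1 verbosity complete; trace h1 start now; show events buf0 -w" ; cat ) | socat - /var/run/haproxy.sock

# once done, stop the traces:
echo "trace 0" | socat - /var/run/haproxy.sock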

You have observed the issue on 3.0-dev7 and 2.9.7. Do you know if 3.0-dev6 or 2.9.6 are also affected?

@capflam capflam added status: reviewed This issue was reviewed. A fix is required. and removed status: needs-triage This issue needs to be triaged. labels Apr 15, 2024
felipewd commented:

Hey @capflam, thanks for the quick reply.

We were able to collect these traces:

<0>2024-04-15T11:28:48.772952+00:00 [04|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21135738 .body_len=0) - h1c=0x7fb61473eae0(0x00040000) conn=0x7fb61c7374d0(0x00000300) h1s=0x7fb61472e050(0x00124010) sd=0x7fb65c7367e0(0x50404001) sc=0x7fb65c72fb40(0x000084a2)
<0>2024-04-15T11:28:48.772965+00:00 [04|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21135738 .body_len=0) - h1c=0x7fb61473eae0(0x00040000) conn=0x7fb61c7374d0(0x00000300) h1s=0x7fb61472e050(0x00125010) sd=0x7fb65c7367e0(0x50414001) sc=0x7fb65c72fb40(0x000084a2)
<0>2024-04-15T11:28:53.023800+00:00 [12|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20926691 .body_len=0) - h1c=0x7fb64c72b310(0x00040000) conn=0x7fb60c72fb90(0x00000300) h1s=0x7fb62c747740(0x00124010) sd=0x7fb62c72ebc0(0x50404001) sc=0x7fb62c72e290(0x000084a2)
<0>2024-04-15T11:28:53.023809+00:00 [12|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20926691 .body_len=0) - h1c=0x7fb64c72b310(0x00040000) conn=0x7fb60c72fb90(0x00000300) h1s=0x7fb62c747740(0x00125010) sd=0x7fb62c72ebc0(0x50414001) sc=0x7fb62c72e290(0x000084a2)
<0>2024-04-15T11:28:54.730778+00:00 [19|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20868768 .body_len=0) - h1c=0x7fb64873a0c0(0x00040000) conn=0x7fb64873a280(0x00000300) h1s=0x7fb64872c940(0x00124010) sd=0x7fb648829090(0x50404001) sc=0x7fb648728fb0(0x000084a2)
<0>2024-04-15T11:28:54.730786+00:00 [19|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20868768 .body_len=0) - h1c=0x7fb64873a0c0(0x00040000) conn=0x7fb64873a280(0x00000300) h1s=0x7fb64872c940(0x00125010) sd=0x7fb648829090(0x50414001) sc=0x7fb648728fb0(0x000084a2)
<0>2024-04-15T11:29:07.582478+00:00 [16|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21287031 .body_len=0) - h1c=0x7fb64c738d50(0x00040000) conn=0x7fb64472ad60(0x00000300) h1s=0x7fb6187295b0(0x00124010) sd=0x7fb62472bd80(0x50404001) sc=0x7fb62472acf0(0x000084a2)
<0>2024-04-15T11:29:07.582487+00:00 [16|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21287031 .body_len=0) - h1c=0x7fb64c738d50(0x00040000) conn=0x7fb64472ad60(0x00000300) h1s=0x7fb6187295b0(0x00125010) sd=0x7fb62472bd80(0x50414001) sc=0x7fb62472acf0(0x000084a2)
<0>2024-04-15T11:29:08.877755+00:00 [03|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20899561 .body_len=0) - h1c=0x7fb64c72b160(0x00040000) conn=0x7fb61872a160(0x00000300) h1s=0x7fb658731c20(0x00124010) sd=0x7fb65872c100(0x50404001) sc=0x7fb65872b7d0(0x000084a2)
<0>2024-04-15T11:29:08.877767+00:00 [03|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=20899561 .body_len=0) - h1c=0x7fb64c72b160(0x00040000) conn=0x7fb61872a160(0x00000300) h1s=0x7fb658731c20(0x00125010) sd=0x7fb65872c100(0x50414001) sc=0x7fb65872b7d0(0x000084a2)
<0>2024-04-15T11:30:46.234229+00:00 [09|h1|0|mux_h1.c:2942] chunk smaller than announced : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21497716 .body_len=0) - h1c=0x7fb62c72d320(0x00040000) conn=0x7fb6347341e0(0x00000300) h1s=0x7fb61872c7f0(0x00124010) sd=0x7fb61872ab30(0x40404001) sc=0x7fb61872aba0(0x00008402)
<0>2024-04-15T11:30:46.234239+00:00 [09|h1|0|mux_h1.c:3002] processing error on message payload : [F,RUN] [MSG_DONE, MSG_DATA] - req=(.fl=0x00001510 .curr_len=0 .body_len=0)  res=(.fl=0x00011716 .curr_len=21497716 .body_len=0) - h1c=0x7fb62c72d320(0x00040000) conn=0x7fb6347341e0(0x00000300) h1s=0x7fb61872c7f0(0x00125010) sd=0x7fb61872ab30(0x40414001) sc=0x7fb61872aba0(0x00008402)

I'll test dev6 and 2.9.6 now and report back

felipewd commented:

Just a quick follow-up: dev6 and 2.9.6 are also affected by this symptom. It seems easy enough to trigger in production, so let me know if you'd like me to test something else.

I don't know how to collect something more precisely useful here. Even for those traces, I'm not 100% sure they belong to this specific request, since we have a ton of other requests on this server.

capflam commented Apr 15, 2024

Thanks @felipewd. So it is probably related to your issue, because these traces result in an ID-- termination state in the logs, possibly IL--. Here the error is reported because the end of the message was signaled while some data still had to be sent for the last chunk. At least, that is the H1 multiplexer's point of view. I'm digging now.

capflam commented Apr 16, 2024

I have no reproducer, but by reading the code I can see a way to produce this behavior. Your traces really helped me understand. Can you please try the attached patch, on top of 3.0-dev7?

0001-BUG-MEDIUM-stconn-Don-t-forward-channel-data-if-inpu.patch.txt

felipewd commented:

Hey @capflam, I can confirm this fixes the issue. Great and fast work, as usual :-)

Thanks!

capflam commented Apr 16, 2024

A bit too quick, the fix was not pushed yet :)

@capflam capflam reopened this Apr 16, 2024
haproxy-mirror pushed a commit that referenced this issue Apr 16, 2024
…filtered

Once data are received and placed in a channel buffer, if it is possible,
outgoing data are immediately forwarded. But we must take care to not do so
if there is also pending input data and a filter registered on the
channel. It is especially important for HTX streams because the HTX may be
altered, especially the extra field. And it is indeed an issue with the HTTP
compression filter and the H1 multiplexer. The wrong chunk size may be
announced leading to an internal error.

This patch should fix the issue #2530. It must be backported to all stable
versions.
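To illustrate the idea in that commit message (this is only a sketch, not the actual HAProxy patch; the struct, field, and helper names below are hypothetical stand-ins), the guard amounts to refusing the fast-forward whenever a filter still has unprocessed input in front of it:

/* Illustrative sketch only, not the real HAProxy code: the types and
 * helpers below are invented stand-ins for the idea described in the
 * commit message. */
#include <stdbool.h>
#include <stddef.h>

struct channel {
    size_t out;          /* bytes already scheduled for output               */
    size_t in_pending;   /* bytes received but not yet passed to analysers   */
    bool   has_filters;  /* at least one filter is registered on the channel */
};

/* Decide whether outgoing data may be forwarded right away. */
static bool may_fast_forward(const struct channel *chn)
{
    /* If a filter still has pending input to process, forwarding now could
     * expose a payload length that the filter will later change (e.g. the
     * HTX extra field rewritten by the HTTP compression filter), and the
     * H1 mux would then announce a wrong chunk size. Hold the data back
     * and let the analysers/filters run first. */
    if (chn->in_pending != 0 && chn->has_filters)
        return false;
    return chn->out != 0;
}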
@capflam capflam added status: fixed This issue is a now-fixed bug. 2.0 This issue affects the HAProxy 2.0 stable branch. 2.2 This issue affects the HAProxy 2.2 stable branch. 2.4 This issue affects the HAProxy 2.4 stable branch. 2.6 This issue affects the HAProxy 2.6 stable branch. 2.8 This issue affects the HAProxy 2.8 stable branch. 2.9 This issue affects the HAProxy 2.9 stable branch. and removed status: reviewed This issue was reviewed. A fix is required. labels Apr 16, 2024
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Apr 19, 2024
…filtered

(cherry picked from commit 50d8c18)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Apr 19, 2024
…filtered

(cherry picked from commit 50d8c18)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit dcc0a40)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
@capflam capflam removed 2.6 This issue affects the HAProxy 2.6 stable branch. 2.8 This issue affects the HAProxy 2.8 stable branch. 2.9 This issue affects the HAProxy 2.9 stable branch. labels Apr 19, 2024
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Apr 20, 2024
…filtered

(cherry picked from commit 50d8c18)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit dcc0a40)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 439a41b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>