
Query: adding stats to the remote engine #7361

Merged: 14 commits merged on May 24, 2024


@pedro-stanaka (Contributor) commented May 15, 2024

We are currently losing track of query stats because the remote engine does not transmit performance stats over gRPC.
In this PR I am adding fields to the Query API response so that it carries these stats.

  • I added a CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Adding new fields to the QueryAPI response to carry stats in gRPC calls.
  • Implementing the Stats() method of the Prometheus Query interface for remoteQuery, so that thanos/promql-engine can get stats from remote queries.
  • Changing promql-engine to properly consume stats from upstream queries (see "Tracking peak samples in engine", promql-engine#452).
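A minimal sketch of how a remote query's `Stats()` method could expose stats received over gRPC. In Prometheus, `Query.Stats()` returns a `*stats.Statistics`; here `QueryStats`, `Statistics`, `remoteQuery` and `recordStats` are hypothetical, simplified stand-ins, not the actual Thanos `querypb` or Prometheus `stats` definitions.

```go
package main

// QueryStats is a hypothetical stand-in for the stats fields added to the
// Query API response.
type QueryStats struct {
	SamplesTotal int64
	PeakSamples  int64
}

// Statistics is a hypothetical, simplified stand-in for the engine-side
// statistics returned by a query's Stats() method.
type Statistics struct {
	TotalSamples int64
	PeakSamples  int
}

// remoteQuery is a hypothetical remote query that remembers the stats
// payload received from the remote engine.
type remoteQuery struct {
	received QueryStats
}

// recordStats stores the stats payload received over gRPC.
func (q *remoteQuery) recordStats(s QueryStats) {
	q.received = s
}

// Stats converts the received wire stats into engine-side statistics.
func (q *remoteQuery) Stats() *Statistics {
	return &Statistics{
		TotalSamples: q.received.SamplesTotal,
		PeakSamples:  int(q.received.PeakSamples),
	}
}
```

Converting inside `Stats()` lets the engine read remote stats through the same method it already calls on local queries; before any payload arrives, the sketch simply reports zero values.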

Verification


@pedro-stanaka (Contributor, Author) commented May 16, 2024

One e2e test is failing, but I can't reproduce it locally (it also seems to be failing on main in CI):

=== RUN   TestCompactorIssue6775
09:07:58 msg started docker environment name c-issue6775
09:07:59 Starting minio
09:08:01 minio: RELEASE.2022-03-14T18-25-24Z: Pulling from minio/minio
09:08:01 minio: b9384ae307c6: Pulling fs layer
09:08:01 minio: 4a054ce2cd6f: Pulling fs layer
09:08:01 minio: 33f761cc4009: Pulling fs layer
09:08:01 minio: 450dea7c1ca4: Pulling fs layer
09:08:01 minio: 79e96d1a1e87: Pulling fs layer
09:08:01 minio: 450dea7c1ca4: Waiting
09:08:01 minio: 84a2d2ef81f8: Pulling fs layer
09:08:01 minio: 15cfa64ddf9a: Pulling fs layer
09:08:01 minio: 84a2d2ef81f8: Waiting
09:08:01 minio: 79e96d1a1e87: Waiting
09:08:01 minio: 15cfa64ddf9a: Waiting
09:08:02 minio: 4a054ce2cd6f: Download complete
09:08:03 minio: 33f761cc4009: Verifying Checksum
09:08:03 minio: 33f761cc4009: Download complete
09:08:03 minio: b9384ae307c6: Verifying Checksum
09:08:03 minio: b9384ae307c6: Download complete
09:08:04 minio: 79e96d1a1e87: Verifying Checksum
09:08:04 minio: 79e96d1a1e87: Download complete
09:08:04 minio: 450dea7c1ca4: Verifying Checksum
09:08:04 minio: 450dea7c1ca4: Download complete
09:08:04 minio: 84a2d2ef81f8: Verifying Checksum
09:08:04 minio: 84a2d2ef81f8: Download complete
09:08:05 minio: b9384ae307c6: Pull complete
09:08:05 minio: 4a054ce2cd6f: Pull complete
09:08:05 minio: 33f761cc4009: Pull complete
09:08:05 minio: 450dea7c1ca4: Pull complete
09:08:05 minio: 79e96d1a1e87: Pull complete
09:08:05 minio: 84a2d2ef81f8: Pull complete
09:08:06 minio: 15cfa64ddf9a: Verifying Checksum
09:08:06 minio: 15cfa64ddf9a: Download complete
09:08:06 minio: 15cfa64ddf9a: Pull complete
09:08:06 minio: Digest: sha256:99db16fdd4f3d3b6c242cf95ef510cb038d8290eb4f80c3d2d9d51a4b145639a
09:08:06 minio: Status: Downloaded newer image for minio/minio:RELEASE.2022-03-14T18-25-24Z
09:08:06 minio: docker.io/minio/minio:RELEASE.2022-03-14T18-25-24Z
09:08:07 Ports for container c-issue6775-minio >> Local ports: map[http:8090] Ports available from host: map[http:32768]
09:08:09 Starting downsampler-downsample
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.124668097Z caller=main.go:77 level=debug msg="maxprocs: Leaving GOMAXPROCS=[10]: CPU quota undefined"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.125051013Z caller=factory.go:53 level=info msg="loading bucket configuration"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.127239263Z caller=downsample.go:173 level=info msg="starting downsample node"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.127408429Z caller=intrumentation.go:56 level=info msg="changing probe status" status=ready
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.127462512Z caller=downsample.go:126 level=info msg="start first pass of downsampling"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.127566471Z caller=fetcher.go:476 level=debug component=block.BaseFetcher msg="fetching meta data" concurrency=32
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.127728179Z caller=intrumentation.go:75 level=info msg="changing probe status" status=healthy
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.127750137Z caller=http.go:73 level=info service=http/server component=downsample msg="listening for requests and metrics" address=:8080
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.128181595Z caller=tls_config.go:313 level=info service=http/server component=downsample msg="Listening on" address=[::]:8080
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.128208387Z caller=tls_config.go:316 level=info service=http/server component=downsample msg="TLS is disabled." http2=false address=[::]:8080
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.139906132Z caller=fetcher.go:626 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=12.420953ms duration_ms=12 cached=2 returned=2 partial=0
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.140341507Z caller=downsample.go:253 level=debug msg="downsampling bucket" concurrency=1
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.146716462Z caller=objstore.go:364 level=debug msg="not downloading again because a provided path matches this one" file=01HY03NM5BDQXQ7TE4TPVD5Z9Q/meta.json
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.149576669Z caller=downsample.go:364 level=info msg="downloaded block" id=01HY03NM5BDQXQ7TE4TPVD5Z9Q duration=9.190371ms duration_ms=9
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.229881008Z caller=streamed_block_writer.go:178 level=info msg="finalized downsampled block" mint=1710374400014 maxt=1711584000000 ulid=01HY03NN16FY7NGAVRG92WPGH1 resolution=300000
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.229934592Z caller=downsample.go:392 level=info msg="downsampled block" from=01HY03NM5BDQXQ7TE4TPVD5Z9Q to=01HY03NN16FY7NGAVRG92WPGH1 duration=80.058339ms duration_ms=80
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.235193298Z caller=objstore.go:291 level=debug msg="uploaded file" from=/Users/pedro/src/github.com/pedro-stanaka/thanos/test/e2e/e2e_3499631035/data/downsampler-downsample/01HY03NN16FY7NGAVRG92WPGH1/chunks/000001 dst=01HY03NN16FY7NGAVRG92WPGH1/chunks/000001 bucket="tracing: compact-test"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.236936922Z caller=objstore.go:291 level=debug msg="uploaded file" from=/Users/pedro/src/github.com/pedro-stanaka/thanos/test/e2e/e2e_3499631035/data/downsampler-downsample/01HY03NN16FY7NGAVRG92WPGH1/index dst=01HY03NN16FY7NGAVRG92WPGH1/index bucket="tracing: compact-test"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.238819088Z caller=downsample.go:426 level=info msg="uploaded block" id=01HY03NN16FY7NGAVRG92WPGH1 duration=7.794872ms duration_ms=7
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.244714377Z caller=objstore.go:364 level=debug msg="not downloading again because a provided path matches this one" file=01HY03NMFSPQ1FZY66RRR6JJ3R/meta.json
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.247007501Z caller=downsample.go:364 level=info msg="downloaded block" id=01HY03NMFSPQ1FZY66RRR6JJ3R duration=6.328039ms duration_ms=6
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.421881714Z caller=streamed_block_writer.go:178 level=info msg="finalized downsampled block" mint=1710374400014 maxt=1711584000000 ulid=01HY03NN47AHEX1HAK96CDCHQ0 resolution=300000
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.421918756Z caller=downsample.go:392 level=info msg="downsampled block" from=01HY03NMFSPQ1FZY66RRR6JJ3R to=01HY03NN47AHEX1HAK96CDCHQ0 duration=174.669463ms duration_ms=174
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.42750117Z caller=objstore.go:291 level=debug msg="uploaded file" from=/Users/pedro/src/github.com/pedro-stanaka/thanos/test/e2e/e2e_3499631035/data/downsampler-downsample/01HY03NN47AHEX1HAK96CDCHQ0/chunks/000001 dst=01HY03NN47AHEX1HAK96CDCHQ0/chunks/000001 bucket="tracing: compact-test"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.429493128Z caller=objstore.go:291 level=debug msg="uploaded file" from=/Users/pedro/src/github.com/pedro-stanaka/thanos/test/e2e/e2e_3499631035/data/downsampler-downsample/01HY03NN47AHEX1HAK96CDCHQ0/index dst=01HY03NN47AHEX1HAK96CDCHQ0/index bucket="tracing: compact-test"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.431264794Z caller=downsample.go:426 level=info msg="uploaded block" id=01HY03NN47AHEX1HAK96CDCHQ0 duration=7.953288ms duration_ms=7
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.433450918Z caller=downsample.go:141 level=info msg="start second pass of downsampling"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.433487709Z caller=fetcher.go:476 level=debug component=block.BaseFetcher msg="fetching meta data" concurrency=32
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.43955329Z caller=fetcher.go:626 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=6.084414ms duration_ms=6 cached=4 returned=4 partial=0
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.440099248Z caller=downsample.go:253 level=debug msg="downsampling bucket" concurrency=1
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.444001705Z caller=objstore.go:364 level=debug msg="not downloading again because a provided path matches this one" file=01HY03NN16FY7NGAVRG92WPGH1/meta.json
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.446442245Z caller=downsample.go:364 level=info msg="downloaded block" id=01HY03NN16FY7NGAVRG92WPGH1 duration=6.311872ms duration_ms=6
09:08:10 Ports for container c-issue6775-downsampler-downsample >> Local ports: map[http:8080] Ports available from host: map[http:32769]
level=error msg="function failed. Retrying in next tick" err="getting metrics: Get \"http://127.0.0.1:32769/metrics\": dial tcp 127.0.0.1:32769: connect: connection refused"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.537827663Z caller=streamed_block_writer.go:178 level=info msg="finalized downsampled block" mint=1710374400014 maxt=1711584000000 ulid=01HY03NNAFNNES84Z3NWMDM5F0 resolution=3600000
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.537862996Z caller=downsample.go:392 level=info msg="downsampled block" from=01HY03NN16FY7NGAVRG92WPGH1 to=01HY03NNAFNNES84Z3NWMDM5F0 duration=91.169126ms duration_ms=91
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.542280786Z caller=objstore.go:291 level=debug msg="uploaded file" from=/Users/pedro/src/github.com/pedro-stanaka/thanos/test/e2e/e2e_3499631035/data/downsampler-downsample/01HY03NNAFNNES84Z3NWMDM5F0/chunks/000001 dst=01HY03NNAFNNES84Z3NWMDM5F0/chunks/000001 bucket="tracing: compact-test"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.543893702Z caller=objstore.go:291 level=debug msg="uploaded file" from=/Users/pedro/src/github.com/pedro-stanaka/thanos/test/e2e/e2e_3499631035/data/downsampler-downsample/01HY03NNAFNNES84Z3NWMDM5F0/index dst=01HY03NNAFNNES84Z3NWMDM5F0/index bucket="tracing: compact-test"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.545633159Z caller=downsample.go:426 level=info msg="uploaded block" id=01HY03NNAFNNES84Z3NWMDM5F0 duration=6.661622ms duration_ms=6
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.551217698Z caller=objstore.go:364 level=debug msg="not downloading again because a provided path matches this one" file=01HY03NN47AHEX1HAK96CDCHQ0/meta.json
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.553617031Z caller=downsample.go:364 level=info msg="downloaded block" id=01HY03NN47AHEX1HAK96CDCHQ0 duration=6.141414ms duration_ms=6
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.717333041Z caller=streamed_block_writer.go:178 level=info msg="finalized downsampled block" mint=1710374400014 maxt=1711584000000 ulid=01HY03NNDTPE9GA0EM24Z68PPB resolution=3600000
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.717372583Z caller=downsample.go:392 level=info msg="downsampled block" from=01HY03NN47AHEX1HAK96CDCHQ0 to=01HY03NNDTPE9GA0EM24Z68PPB duration=163.508677ms duration_ms=163
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.722687872Z caller=objstore.go:291 level=debug msg="uploaded file" from=/Users/pedro/src/github.com/pedro-stanaka/thanos/test/e2e/e2e_3499631035/data/downsampler-downsample/01HY03NNDTPE9GA0EM24Z68PPB/chunks/000001 dst=01HY03NNDTPE9GA0EM24Z68PPB/chunks/000001 bucket="tracing: compact-test"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.724532371Z caller=objstore.go:291 level=debug msg="uploaded file" from=/Users/pedro/src/github.com/pedro-stanaka/thanos/test/e2e/e2e_3499631035/data/downsampler-downsample/01HY03NNDTPE9GA0EM24Z68PPB/index dst=01HY03NNDTPE9GA0EM24Z68PPB/index bucket="tracing: compact-test"
09:08:10 downsampler-downsample: ts=2024-05-16T07:08:10.726111329Z caller=downsample.go:426 level=info msg="uploaded block" id=01HY03NNDTPE9GA0EM24Z68PPB duration=7.12683ms duration_ms=7
level=error msg="function failed. Retrying in next tick" err="getting metrics: Get \"http://127.0.0.1:32769/metrics\": dial tcp 127.0.0.1:32769: connect: connection refused"
09:08:12 Stopping downsampler-downsample
09:08:12 downsampler-downsample: ts=2024-05-16T07:08:12.516356177Z caller=main.go:182 level=info msg="caught signal. Exiting." signal=terminated
09:08:12 downsampler-downsample: ts=2024-05-16T07:08:12.516429969Z caller=intrumentation.go:67 level=warn msg="changing probe status" status=not-ready reason=null
09:08:12 downsampler-downsample: ts=2024-05-16T07:08:12.516454469Z caller=http.go:91 level=info service=http/server component=downsample msg="internal server is shutting down" err=null
09:08:12 downsampler-downsample: ts=2024-05-16T07:08:12.516653844Z caller=http.go:110 level=info service=http/server component=downsample msg="internal server is shutdown gracefully" err=null
09:08:12 downsampler-downsample: ts=2024-05-16T07:08:12.516694969Z caller=intrumentation.go:81 level=info msg="changing probe status" status=not-healthy reason=null
09:08:12 downsampler-downsample: ts=2024-05-16T07:08:12.51672801Z caller=main.go:174 level=info msg=exiting
09:08:12 downsampler-downsample: ts=2024-05-16T07:08:12.516756177Z caller=main.go:77 level=debug msg="maxprocs: No GOMAXPROCS change to reset%!(EXTRA []interface {}=[])"
09:08:12 Starting compact-working
09:08:12 compact-working: ts=2024-05-16T07:08:12.968274721Z caller=factory.go:53 level=info name=compact-working msg="loading bucket configuration"
09:08:12 compact-working: ts=2024-05-16T07:08:12.970435053Z caller=compact.go:264 level=info name=compact-working msg="vertical compaction is enabled" compact.enable-vertical-compaction=true
09:08:12 compact-working: ts=2024-05-16T07:08:12.971615135Z caller=compact.go:687 level=info name=compact-working msg="starting compact node"
09:08:12 compact-working: ts=2024-05-16T07:08:12.971635969Z caller=intrumentation.go:56 level=info name=compact-working msg="changing probe status" status=ready
09:08:12 compact-working: ts=2024-05-16T07:08:12.97171726Z caller=intrumentation.go:75 level=info name=compact-working msg="changing probe status" status=healthy
09:08:12 compact-working: ts=2024-05-16T07:08:12.97174126Z caller=http.go:73 level=info name=compact-working service=http/server component=compact msg="listening for requests and metrics" address=:8080
09:08:12 compact-working: ts=2024-05-16T07:08:12.971793177Z caller=compact.go:1488 level=info name=compact-working msg="start sync of metas"
09:08:12 compact-working: ts=2024-05-16T07:08:12.971932968Z caller=tls_config.go:313 level=info name=compact-working service=http/server component=compact msg="Listening on" address=[::]:8080
09:08:12 compact-working: ts=2024-05-16T07:08:12.971951093Z caller=tls_config.go:316 level=info name=compact-working service=http/server component=compact msg="TLS is disabled." http2=false address=[::]:8080
09:08:13 compact-working: ts=2024-05-16T07:08:12.986095959Z caller=fetcher.go:626 level=info name=compact-working component=block.BaseFetcher msg="successfully synchronized block metadata" duration=14.377241ms duration_ms=14 cached=6 returned=6 partial=0
09:08:13 compact-working: ts=2024-05-16T07:08:12.988254958Z caller=fetcher.go:626 level=info name=compact-working component=block.BaseFetcher msg="successfully synchronized block metadata" duration=16.464281ms duration_ms=16 cached=6 returned=4 partial=0
09:08:13 compact-working: ts=2024-05-16T07:08:12.988277541Z caller=clean.go:34 level=info name=compact-working msg="started cleaning of aborted partial uploads"
09:08:13 compact-working: ts=2024-05-16T07:08:12.988286166Z caller=clean.go:61 level=info name=compact-working msg="cleaning of aborted partial uploads done"
09:08:13 compact-working: ts=2024-05-16T07:08:12.988288916Z caller=blocks_cleaner.go:44 level=info name=compact-working msg="started cleaning of blocks marked for deletion"
09:08:13 compact-working: ts=2024-05-16T07:08:12.988292333Z caller=blocks_cleaner.go:58 level=info name=compact-working msg="cleaning of blocks marked for deletion done"
09:08:13 compact-working: ts=2024-05-16T07:08:12.990274248Z caller=fetcher.go:626 level=info name=compact-working component=block.BaseFetcher msg="successfully synchronized block metadata" duration=4.135122ms duration_ms=4 cached=6 returned=6 partial=0
09:08:13 compact-working: ts=2024-05-16T07:08:12.992100122Z caller=fetcher.go:626 level=info name=compact-working component=block.BaseFetcher msg="successfully synchronized block metadata" duration=3.802039ms duration_ms=3 cached=6 returned=4 partial=0
09:08:13 compact-working: ts=2024-05-16T07:08:12.992119914Z caller=compact.go:1493 level=info name=compact-working msg="start of GC"
09:08:13 compact-working: ts=2024-05-16T07:08:12.99246608Z caller=compact.go:1516 level=info name=compact-working msg="start of compactions"
09:08:13 compact-working: ts=2024-05-16T07:08:12.993142038Z caller=compact.go:1136 level=info name=compact-working group="300000@{case=\"downsampled-block-with-overlap\"}" groupKey=300000@14846485652960182170 msg="compaction available and planned" plan="[01HY03NN47AHEX1HAK96CDCHQ0 (min time: 1710374400014, max time: 1711584000000) 01HY03NN16FY7NGAVRG92WPGH1 (min time: 1710374400014, max time: 1711584000000)]"
09:08:13 compact-working: ts=2024-05-16T07:08:12.993162538Z caller=compact.go:1145 level=info name=compact-working group="300000@{case=\"downsampled-block-with-overlap\"}" groupKey=300000@14846485652960182170 msg="finished running pre compaction callback; downloading blocks" duration=2.75µs duration_ms=0 plan="[01HY03NN47AHEX1HAK96CDCHQ0 (min time: 1710374400014, max time: 1711584000000) 01HY03NN16FY7NGAVRG92WPGH1 (min time: 1710374400014, max time: 1711584000000)]"
09:08:13 compact-working: ts=2024-05-16T07:08:12.999358534Z caller=fetcher.go:626 level=info name=compact-working component=block.BaseFetcher msg="successfully synchronized block metadata" duration=7.140621ms duration_ms=7 cached=6 returned=4 partial=0
09:08:13 compact-working: ts=2024-05-16T07:08:13.009822568Z caller=compact.go:1203 level=info name=compact-working group="300000@{case=\"downsampled-block-with-overlap\"}" groupKey=300000@14846485652960182170 msg="downloaded and verified blocks; compacting blocks" duration=16.645822ms duration_ms=16 plan="[/Users/pedro/src/github.com/pedro-stanaka/thanos/test/e2e/e2e_3499631035/data/compact-working/compact/300000@14846485652960182170/01HY03NN47AHEX1HAK96CDCHQ0 /Users/pedro/src/github.com/pedro-stanaka/thanos/test/e2e/e2e_3499631035/data/compact-working/compact/300000@14846485652960182170/01HY03NN16FY7NGAVRG92WPGH1]"
09:08:13 compact-working: ts=2024-05-16T07:08:13.015165523Z caller=compact.go:757 level=info name=compact-working msg="Found overlapping blocks during compaction" ulid=01HY03NQTKGEAC36QQV2ASFJ88
09:08:13 compact-working: ts=2024-05-16T07:08:13.018910353Z caller=intrumentation.go:67 level=warn name=compact-working msg="changing probe status" status=not-ready reason="error executing compaction: compaction: group 300000@14846485652960182170: paniced while compacting 01HY03NN16FY7NGAVRG92WPGH1,01HY03NN47AHEX1HAK96CDCHQ0: unexpected seriesToChunkEncoder lack of iterations"
09:08:13 compact-working: ts=2024-05-16T07:08:13.018935853Z caller=http.go:91 level=info name=compact-working service=http/server component=compact msg="internal server is shutting down" err="error executing compaction: compaction: group 300000@14846485652960182170: paniced while compacting 01HY03NN16FY7NGAVRG92WPGH1,01HY03NN47AHEX1HAK96CDCHQ0: unexpected seriesToChunkEncoder lack of iterations"
09:08:13 compact-working: ts=2024-05-16T07:08:13.018984187Z caller=http.go:110 level=info name=compact-working service=http/server component=compact msg="internal server is shutdown gracefully" err="error executing compaction: compaction: group 300000@14846485652960182170: paniced while compacting 01HY03NN16FY7NGAVRG92WPGH1,01HY03NN47AHEX1HAK96CDCHQ0: unexpected seriesToChunkEncoder lack of iterations"
09:08:13 compact-working: ts=2024-05-16T07:08:13.018997353Z caller=intrumentation.go:81 level=info name=compact-working msg="changing probe status" status=not-healthy reason="error executing compaction: compaction: group 300000@14846485652960182170: paniced while compacting 01HY03NN16FY7NGAVRG92WPGH1,01HY03NN47AHEX1HAK96CDCHQ0: unexpected seriesToChunkEncoder lack of iterations"
09:08:13 compact-working: ts=2024-05-16T07:08:13.01911052Z caller=main.go:171 level=error name=compact-working err="group 300000@14846485652960182170: paniced while compacting 01HY03NN16FY7NGAVRG92WPGH1,01HY03NN47AHEX1HAK96CDCHQ0: unexpected seriesToChunkEncoder lack of iterations\ncompaction\nmain.runCompact.func7\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/compact.go:440\nmain.runCompact.func8.1\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/compact.go:525\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/go/src/github.com/thanos-io/thanos/pkg/runutil/runutil.go:91\nmain.runCompact.func8\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/compact.go:524\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1197\nerror executing compaction\nmain.runCompact.func8.1\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/compact.go:552\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/go/src/github.com/thanos-io/thanos/pkg/runutil/runutil.go:91\nmain.runCompact.func8\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/compact.go:524\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1197\ncompact command failed\nmain.main\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:171\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1197"
09:08:35 
Error: No such object: c-issue6775-compact-working

09:08:35 Killing minio
--- PASS: TestCompactorIssue6775 (37.51s)
PASS

The pull-request-size bot added the size/L label and removed size/M on May 16, 2024
Resolved review thread (outdated): CHANGELOG.md
Resolved review thread: pkg/query/remote_engine.go
Resolved review thread: pkg/query/remote_engine.go
@fpetkovski previously approved these changes May 16, 2024
Resolved review thread (outdated): pkg/query/remote_engine.go
@fpetkovski previously approved these changes May 21, 2024
@pedro-stanaka (Contributor, Author) commented

@GiedriusS and/or @MichaHoffmann, would you mind taking a second look here, please? Thanks!


Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
@MichaHoffmann (Contributor) left a comment

lgtm

if s := msg.GetStats(); s != nil {
	qryStats = *s
	continue
}

ts := msg.GetTimeseries()
Contributor review comment:

Since old root queriers don't have the == nil check, this will probably break if leaf queriers are updated to send stats first. Root queriers should probably be updated first, and leaf queriers afterwards.

Contributor reply:

We can add this as a note to the CHANGELOG for the distributed engine use case. I don't think many people use this mode yet, but it is good to point out the rollout strategy.

Contributor reply:

Yeah! I don't think it's a big issue, I just wanted to point it out. We can add a small warning to the release notes later.
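The rollout concern in this thread can be illustrated with a minimal sketch. Everything here is hypothetical and simplified: a slice stands in for the gRPC stream, and `QueryResponse`, `Series`, `QueryStats`, `receive` and `legacyReceive` are illustrative names, not the actual Thanos code. `receive` follows the reviewed snippet (a stats message is recorded and skipped) and additionally skips messages with no series, which is an assumption beyond the snippet. `legacyReceive` models a root querier without the stats check, which turns a stats-only message into a nil series.

```go
package main

// QueryStats is a hypothetical stats payload sent by a leaf querier.
type QueryStats struct {
	SamplesTotal int64
	PeakSamples  int64
}

// Series is a hypothetical, simplified time series.
type Series struct {
	Labels map[string]string
}

// QueryResponse mimics a oneof: a message carries either a series or stats.
type QueryResponse struct {
	Timeseries *Series
	Stats      *QueryStats
}

// GetTimeseries is nil-safe, like protobuf-generated getters.
func (r *QueryResponse) GetTimeseries() *Series {
	if r == nil {
		return nil
	}
	return r.Timeseries
}

// GetStats is nil-safe, like protobuf-generated getters.
func (r *QueryResponse) GetStats() *QueryStats {
	if r == nil {
		return nil
	}
	return r.Stats
}

// receive records stats and skips them, so only real series reach the caller.
func receive(msgs []*QueryResponse) ([]*Series, QueryStats) {
	var (
		series   []*Series
		qryStats QueryStats
	)
	for _, msg := range msgs {
		if s := msg.GetStats(); s != nil {
			qryStats = *s
			continue
		}
		if ts := msg.GetTimeseries(); ts != nil {
			series = append(series, ts)
		}
	}
	return series, qryStats
}

// legacyReceive assumes every message is a series, so a stats-only message
// yields a nil entry that downstream code may dereference.
func legacyReceive(msgs []*QueryResponse) []*Series {
	var series []*Series
	for _, msg := range msgs {
		series = append(series, msg.GetTimeseries())
	}
	return series
}
```

This is why the suggested order matters: once root queriers understand stats messages, leaf queriers can start sending them without root queriers misreading them as series.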

@fpetkovski merged commit 1282e84 into thanos-io:main on May 24, 2024 (20 checks passed)