Event Hub - incorrect metrics values #5784

Open
Duri9292 opened this issue May 6, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@Duri9292

Duri9292 commented May 6, 2024

Report

I'm getting incorrect values from the external metric. Sometimes the external metrics provide the correct values but most of the time the current values are way off.

Metrics examples:
Here you can see that averageValue is 1040334m, which does not make sense and will trigger the maximum possible scaling.

currentMetrics:
  - external:
      current:
        averageValue: 1040334m
      metric:
        name: s0-azure-eventhub-onb
        selector:
          matchLabels:
            scaledobject.keda.sh/name: event-hub-scaler

From time to time the averageValue is more accurate and it looks more realistic.

currentMetrics:
  - external:
      current:
        averageValue: "814"
      metric:
        name: s0-azure-eventhub-onb
        selector:
          matchLabels:
            scaledobject.keda.sh/name: event-hub-scaler

Here are the incoming message metrics directly from Azure. As you can see, we usually have an average of 100 incoming messages per minute.
image

Expected Behavior

The average values should be more consistent and should reflect the real values.

Actual Behavior

The current values jump between 600 and 580334m, while the real average of incoming messages is usually around 100. We process approximately 22,000 messages per day, so an average value like 580334m does not make any sense.

Steps to Reproduce the Problem

  1. Configure the azure-eventhub trigger
  2. Monitor the HPA average values

Logs from KEDA operator

2024-05-04T18:45:09Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "eb42d4de-1c9f-4dce-b243-32901de7ce0e"}
2024-05-04T18:45:24Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "f30d4ff1-8640-4cf8-8bc1-414fe92bd72c"}
2024-05-04T18:45:40Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "5e95c24b-f0d5-4a0e-bf19-b437ea3b6d71"}
2024-05-04T18:45:55Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "3b326319-9811-441b-899c-c4c712d4451c"}
2024-05-04T18:50:44Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "9fd2fdf5-3de0-4f12-ba91-9733a41a2670"}
2024-05-04T18:51:00Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "a6bf465f-fe21-472c-9059-bc16ddf56617"}
2024-05-04T18:53:51Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "ec65a58d-8f68-418b-bca4-ee34a3a3f952"}
2024-05-04T18:55:08Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "35cb23fa-6803-459a-a459-dd09beafb8b1"}
2024-05-04T18:55:24Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "98d6080b-9c5e-4f14-a4b0-883f7325648c"}
2024-05-04T18:56:26Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "43613e30-d318-410e-8633-d46cec379c31"}
2024-05-04T18:56:42Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "f0ffdb33-b0a5-4bd1-82d7-1238e817f960"}
2024-05-04T18:56:57Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "45100633-32b2-42fb-bd93-0831d15b4ac9"}
2024-05-04T18:57:13Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "dec7edf4-7580-4c96-91ed-8c6766ea98c3"}
2024-05-04T18:57:28Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "6eec8aec-30e1-44e9-8aec-fca3dc955665"}

KEDA Version

2.11.2

Kubernetes Version

1.28

Platform

Microsoft Azure

Scaler Details

azure-eventhub

Anything else?

No response

@Duri9292 Duri9292 added the bug Something isn't working label May 6, 2024
@Duri9292 Duri9292 changed the title from "Event Hub - incorrect metrics al" to "Event Hub - incorrect metrics values" May 6, 2024
@JorTurFer
Member

Hello,
Which is the problem exactly? The value using m (1040334m)?
In the K8s context, 'm' means milli, and it's used when the value is a float number because k8s doesn't use float numbers. When you see 1040334m, it means 1040.334. In the same way, jumps between 600 and 580334m are quite normal, because it's jumping between 600 and 580.334.
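
To illustrate the milli notation described in this comment, here is a minimal Python sketch that converts a quantity string such as 1040334m into a decimal number. The helper name `parse_quantity` is hypothetical, and it only handles plain numbers and the `m` suffix, not the full Kubernetes quantity grammar.

```python
from decimal import Decimal


def parse_quantity(value: str) -> Decimal:
    """Convert a plain or milli-suffixed quantity string to a Decimal.

    Hypothetical helper: only the "m" (milli) suffix is supported here.
    """
    value = value.strip()
    if value.endswith("m"):
        # "m" means thousandths, so 1040334m is 1040334 / 1000.
        return Decimal(value[:-1]) / Decimal(1000)
    return Decimal(value)


if __name__ == "__main__":
    for raw in ("1040334m", "580334m", "814"):
        print(raw, "->", parse_quantity(raw))
```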

@Duri9292
Author

Duri9292 commented May 6, 2024

Hello @JorTurFer, thank you for your quick response. The issue is that once the number is reported in milli-units, the HPA always scales to the maximum possible number of replicas. When the average value is not a float, the scaler decreases the replicas or scales as expected.

e.g. current average number: 1741
replica:1

image

current average number: 580334m
replica:3
I configured the trigger value to 5000 and the scaler is always active, which should not be the case here.

scaled_to_max

@JorTurFer
Member

Are you scraping the Prometheus metrics generated by KEDA? I'm almost sure that you have a peak that justifies the scale-out because, as you said, you're under the threshold. The only other explanation for that behaviour without a peak is that you have changed the target value and the HPA controller is still in the scaling cooldown (300 seconds after the last scale-out).
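
As a rough check of the "under the threshold" point, the replica formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), can be sketched in Python. The function name is hypothetical, and the sketch deliberately ignores the HPA tolerance, the min/max replica bounds, and the stabilization window, so it is not a full model of the controller.

```python
import math


def desired_replicas(current_replicas: int,
                     current_average: float,
                     target_average: float) -> int:
    """Simplified HPA formula (assumption: no tolerance, bounds, or cooldown)."""
    return math.ceil(current_replicas * current_average / target_average)


if __name__ == "__main__":
    # Values reported in this thread, with the trigger value of 5000.
    print(desired_replicas(3, 580.334, 5000))
    print(desired_replicas(1, 1741, 5000))
```

Under these simplifications, both readings from the thread ask for a single replica, which suggests the metric value alone would not request three replicas.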

@Duri9292
Author

Duri9292 commented May 6, 2024

The thing is that we turned off Event Hub data ingestion for the last 24 hrs (we wanted to test scaling to 0), which means that we are getting 0 incoming messages, so there are no peaks. Even a value like 1741 does not make much sense, but if it is calculating the average value over the last few days, it could be relevant. I will monitor the behavior once we enable data ingestion again.

Below is a graph for incoming messages to Event Hub (past 48hrs)
Data granularity: 5 minutes
image

@JorTurFer
Member

No no, it doesn't use the average value at all. KEDA uses the current value, so if it's 0 in the eventhub and you don't see 0 in KEDA, it can be a misconfiguration or a bug. Do you see any value different from 0? You can manually query the metric value and check what KEDA returns: https://keda.sh/docs/2.14/operate/metrics-server/#querying-metrics-exposed-by-keda-metrics-server
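
As a sketch of the manual query mentioned here, the snippet below builds the external metrics API path in the form shown on the linked KEDA documentation page. The helper name `external_metric_path` is hypothetical; the namespace, metric name, and ScaledObject name are taken from this issue. The resulting path is meant to be passed to `kubectl get --raw`.

```python
from urllib.parse import quote


def external_metric_path(namespace: str, metric_name: str, scaled_object: str) -> str:
    """Build the external metrics API path for one KEDA metric (hypothetical helper)."""
    # The label selector must be URL-encoded: "/" becomes %2F and "=" becomes %3D.
    selector = quote(f"scaledobject.keda.sh/name={scaled_object}", safe="")
    return (
        "/apis/external.metrics.k8s.io/v1beta1"
        f"/namespaces/{namespace}/{metric_name}?labelSelector={selector}"
    )


if __name__ == "__main__":
    path = external_metric_path(
        "my-app-namespace", "s0-azure-eventhub-onb", "event-hub-scaler"
    )
    # Prints a command line that can be run by hand against the cluster.
    print(f'kubectl get --raw "{path}"')
```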

@Duri9292
Author

Hello @JorTurFer, to answer your question "Do you see any value different from 0?": yes, even when the Event Hub was turned off, the HPA always had some number in the metrics.

We enabled the Event Hub again and, for some reason, we stopped getting float values, and scaling is working as expected. At least I did not catch any float number during my observation; since there is no history of this value, I cannot confirm it. But it seems that once the float values stopped occurring, the scaling is OK.

No no, it doesn't use the average value at all.

The documentation mentions that these are average values, and we are using the default. If that is not true, then sorry, I must have missed it.

image

However, the values from metrics still do not match values from event hub metrics.
image

Event Hub (sum) for the past 30 min
image

Event Hub (avg) for past 24 hrs
image

@JorTurFer
Member

The documentation mentions that these are average values, and we are using the default. If that is not true, then sorry, I must have missed it.

My bad, I understood you to mean that KEDA retrieves the average value from the Event Hub. You are right: the k8s workload is scaled based on the average value, which is calculated from the instantaneous Event Hub value.

@JorTurFer
Member

We enabled the Event Hub again and, for some reason, we stopped getting float values, and scaling is working as expected. At least I did not catch any float number during my observation; since there is no history of this value, I cannot confirm it. But it seems that once the float values stopped occurring, the scaling is OK.

Float values are correct and they can happen: if the Event Hub returns 7 and you have 4 pods, you'll get a float value for the average.
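
The arithmetic in this comment can be sketched as follows. This is a minimal illustration, assuming the reported averageValue is the total metric value divided by the current replica count, rounded up to the nearest milli-unit; the function names are hypothetical.

```python
def average_value_milli(total: int, replicas: int) -> int:
    """Per-replica average in milli-units, rounded up (assumed rounding mode)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total * 1000 // replicas)


def format_quantity(milli: int) -> str:
    """Render whole numbers plainly and fractional values with the 'm' suffix."""
    return str(milli // 1000) if milli % 1000 == 0 else f"{milli}m"


if __name__ == "__main__":
    # The 7 messages / 4 pods example from the comment.
    print(format_quantity(average_value_milli(7, 4)))
    # A total of 1741 spread over 3 replicas, then over 1 replica.
    print(format_quantity(average_value_milli(1741, 3)))
    print(format_quantity(average_value_milli(1741, 1)))
```

Under these assumptions, 7 over 4 pods gives 1750m, and a total of 1741 gives 580334m with 3 replicas but 1741 with 1 replica, which would be consistent with the two readings reported earlier in the thread being the same underlying value.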
