This issue was moved to a discussion.

Airflow statsd stops sending metrics during maximum dagrun #39571

Closed
1 of 2 tasks
paramjeet01 opened this issue May 11, 2024 · 3 comments
Labels
area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet

Comments

@paramjeet01

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.8.3

What happened?

Statsd stopped sending metrics while we were running more than 200 tasks in parallel across multiple DAGs. Restarting the statsd pod resolved the issue, and metrics were exposed again. No logs were found in the statsd pod, and no spike in CPU or memory usage was observed in it.

What you think should happen instead?

Statsd should keep sending metrics even when more than 200 tasks run in parallel across multiple DAGs.

How to reproduce

Run more than 200 tasks in parallel across multiple DAGs.
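One way to check whether the statsd endpoint itself stops accepting data under load is to send it a burst of StatsD counter packets (format `<name>:<value>|c` over UDP) comparable in volume to 200+ parallel tasks. This is a minimal sketch under stated assumptions, not Airflow code: `statsd_counter_line`, `send_burst`, and `count_received` are hypothetical helpers, the metric name is made up, and the demo uses a local UDP listener in place of the real statsd pod.

```python
import socket


def statsd_counter_line(metric: str, value: int = 1) -> bytes:
    """Format a StatsD counter datagram, e.g. b"airflow.repro_burst:1|c"."""
    return f"{metric}:{value}|c".encode("ascii")


def send_burst(host: str, port: int, metric: str, count: int) -> int:
    """Send `count` counter datagrams over UDP and return how many sends were issued.

    UDP is fire-and-forget: a successful send does NOT mean the receiver
    got or processed the packet.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sock.sendto(statsd_counter_line(metric), (host, port))
            sent += 1
    return sent


def count_received(sock: socket.socket, timeout: float = 0.5) -> int:
    """Drain a bound UDP socket until no packet arrives within `timeout` seconds."""
    sock.settimeout(timeout)
    received = 0
    try:
        while True:
            sock.recvfrom(1024)
            received += 1
    except socket.timeout:
        pass  # no more packets within the timeout window
    return received


if __name__ == "__main__":
    # Hypothetical local listener standing in for the statsd service.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        port = listener.getsockname()[1]
        sent = send_burst("127.0.0.1", port, "airflow.repro_burst", 50)
        got = count_received(listener)
        print(f"sent={sent} received={got}")
```

Pointing `send_burst` at the actual statsd service address (host and port depend on your Helm values) while the problem is occurring, and then checking whether the test metric shows up downstream, could help distinguish packet loss from the statsd pod stalling. Since UDP gives no delivery guarantee, the number of sends issued says nothing about receipt on its own.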

Operating System

Amazon Linux 2

Versions of Apache Airflow Providers

pytest>=6.2.5
docker>=5.0.0
crypto>=1.4.1
cryptography>=3.4.7
pyOpenSSL>=20.0.1
ndg-httpsclient>=0.5.1
boto3>=1.34.0
sqlalchemy
redis>=3.5.3
requests>=2.26.0
pysftp>=0.2.9
werkzeug>=1.0.1
apache-airflow-providers-cncf-kubernetes==8.0.0
apache-airflow-providers-amazon>=8.13.0
psycopg2>=2.8.5
grpcio>=1.37.1
grpcio-tools>=1.37.1
protobuf>=3.15.8,<=3.21
python-dateutil>=2.8.2
jira>=3.1.1
confluent_kafka>=1.7.0
pyarrow>=10.0.1,<10.1.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

Official helm chart deployment.

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@paramjeet01 paramjeet01 added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels May 11, 2024
@rawwar
Collaborator

rawwar commented May 12, 2024

Can you confirm whether this happens consistently when you run 200+ tasks in parallel?

@paramjeet01
Author

@rawwar, yes, I can confirm that the issue occurs intermittently when we run 250+ tasks in parallel.

@Taragolis
Contributor

Statsd stopped sending metrics while we ran tasks more than 200 in parallel in multiple dags. Restarting the statsd pod solved the issue and the metrics exposing

The statsd should not stop sending metrics while we run tasks more than 200 in parallel in multiple dags.

All signs here point to this being an issue with statsd, not with Apache Airflow itself.

@apache apache locked and limited conversation to collaborators May 16, 2024
@Taragolis Taragolis converted this issue into discussion #39664 May 16, 2024

Projects
None yet
Development

No branches or pull requests

3 participants