We are using the Netdata Cloud Business plan in order to monitor several clusters of ours, including some clusters on AWS.
For one of these AWS clusters, we use an EFS in order to bring up persistent volumes.
After some time, we noticed a high number of read/write operations between the EFS and the cluster node on which the Netdata parent pod is running.
Specifically, using nfsiostat, we measured ~300 operations/sec on average, most of them reads and fewer of them writes. This increase in I/O operations caused a large increase in the EFS cost on AWS.
We then disabled persistence for parent.alarms, parent.database and k8sState and noticed that both the I/O operations and the EFS cost dropped significantly (almost $1000).
Based on this behavior, I believe the increased disk I/O traffic is sustained and not limited to the parent's startup process.
Finally, is Machine Learning enabled by default for the parent? We could disable it if you believe that the increased number of Disk IO operations comes from ML.
> Finally, is Machine Learning enabled by default for the parent?
Yes, ML is enabled everywhere by default.
> We could disable it if you believe that the increased number of Disk IO operations comes from ML.
ML is probably the culprit here because it needs to read historical data at regular intervals for every dimension that gets trained. You can disable it by updating the [ml] section of netdata.conf like this:
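A minimal sketch of that change, assuming the standard `enabled` option of the `[ml]` section (when deploying with the Helm chart, this would presumably go into whatever netdata.conf override the chart applies to the parent):

```ini
# netdata.conf on the parent
[ml]
    # Turn off ML training and anomaly detection on this node.
    # Assumption: "enabled" is the option name in your Netdata version.
    enabled = no
```

The agent needs a restart (or, on Kubernetes, a pod rollout) for the change to take effect.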
We disabled machine learning on the parent and upgraded to the latest version (3.7.89). We re-enabled persistence as well, so we'll have to wait for a few days in order to see if there will be an increase in traffic again.
Bug description
We are using the Netdata Cloud Business plan in order to monitor several clusters of ours, including some clusters on AWS.
For one of these AWS clusters, we use an EFS in order to bring up persistent volumes.
After some time, we noticed a high number of read/write operations between the EFS and the cluster node on which the Netdata parent pod is running.
Specifically, using nfsiostat, we measured ~300 operations/sec on average, most of them reads and fewer of them writes. This increase in I/O operations caused a large increase in the EFS cost on AWS.
We then disabled persistence for parent.alarms, parent.database and k8sState and noticed that both the I/O operations and the EFS cost dropped significantly (almost $1000).
Based on this behavior, I believe the increased disk I/O traffic is sustained and not limited to the parent's startup process.
Finally, is Machine Learning enabled by default for the parent? We could disable it if you believe that the increased number of Disk IO operations comes from ML.
In case you require any further information please let us know.
Expected behavior
Number of Read/Write operations should decrease if possible.
Steps to reproduce
...
Installation method
helmchart (kubernetes)
System info
Netdata build info
Additional info
No response