client-go/transport: fix memory leak when using rest.Config Dial function #124894
…tion with client.New and kubernetes.NewForConfig
Welcome @lizardruss!
Hi @lizardruss. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: lizardruss The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/ok-to-test
/assign @enj
@lizardruss: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
I didn't check in detail, but is this related to #117258? We added some metrics in #117295 to get more visibility into these problems. @lizardruss, just out of curiosity, do you know whether those metrics are useful in this case?
@@ -112,7 +112,7 @@ func (c *Config) TransportConfig() (*transport.Config, error) {
 	}

 	if c.Dial != nil {
-		conf.DialHolder = &transport.DialHolder{Dial: c.Dial}
+		conf.DialHolder = &transport.DialHolder{Dial: c.Dial, DisableCache: true}
I think the problem with all of these types of changes is that they make assumptions about how the caller was using this code. A caller of this method could be re-using the return value, meaning they would correctly be getting the TLS cache benefits, which this PR would now disable. We allow for infinite flexibility, so every change has the potential to break someone in a subtle way.
I agree with your perspective, and originally I wanted to expose the DialHolder on the rest.Config struct; however, that is a larger change. It would provide a more intentional way of re-using the DialHolder when the caching behavior is expected.
As things exist now, unless the caller takes care to re-use the returned DialHolder, it's not obvious that a memory leak occurs when the Dial function is configured. The places we most often encounter this issue are client.New() and kubernetes.NewForConfig(), which make re-using the DialHolder non-obvious.
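To make the trade-off in this thread concrete, here is a minimal, standard-library-only sketch. It is not the client-go implementation: `dialHolder`, `cacheKey`, and `entriesAfter` are hypothetical stand-ins, and the sketch assumes the cache key includes the holder's pointer identity, so a shared holder always maps to one entry while a fresh holder per client never hits an existing one.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// dialFunc has the same shape as a context-aware dial function.
type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// dialHolder is a hypothetical stand-in for transport.DialHolder.
type dialHolder struct {
	Dial dialFunc
}

// cacheKey is a simplified, assumed cache key. The holder pointer is part of
// the key, so two holders wrapping the same function produce different keys.
type cacheKey struct {
	serverName string
	dial       *dialHolder
}

// entriesAfter simulates building n clients and returns the number of
// entries left in the simulated cache.
func entriesAfter(n int, reuse bool) int {
	cache := map[cacheKey]struct{}{}
	dial := (&net.Dialer{}).DialContext
	shared := &dialHolder{Dial: dial}
	for i := 0; i < n; i++ {
		h := shared
		if !reuse {
			// Mimics wrapping the dial function in a new holder per client.
			h = &dialHolder{Dial: dial}
		}
		cache[cacheKey{serverName: "example.invalid", dial: h}] = struct{}{}
	}
	return len(cache)
}

func main() {
	fmt.Println("fresh holder per client:", entriesAfter(1000, false))
	fmt.Println("shared holder:", entriesAfter(1000, true))
}
```

In this model a caller that keeps and re-uses one holder gets the cache benefit, which is the case the review comment is worried about breaking; a caller that gets a new holder on every construction grows the cache by one entry per client.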
/triage accepted
What type of PR is this?
/kind bug
What this PR does / why we need it:
When using a rest.Config Dial function and constructing a client with client.New() or kubernetes.NewForConfig(), the tlsConfigCache is ineffective because the Dial function is wrapped in a new DialHolder instance every time. This leads to a memory leak since the tlsConfigCache grows unbounded with unique keys. This change allows skipping the tlsConfigCache for this case.
Which issue(s) this PR fixes:
Fixes #118703
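The cache-skipping behavior described under "What this PR does" can be sketched as follows. This is an illustrative model only, not the client-go code: `dialHolder`, `transportCache`, `get`, and `cachedAfter` are hypothetical names, and the only detail taken from the PR is a `DisableCache` flag on the holder. When the flag is set, the lookup returns a fresh transport and stores nothing.

```go
package main

import (
	"fmt"
	"net/http"
)

// dialHolder is a hypothetical stand-in for transport.DialHolder, carrying
// only the DisableCache flag proposed in this PR.
type dialHolder struct {
	DisableCache bool
}

// cacheKey is an assumed, simplified key that includes the holder pointer.
type cacheKey struct {
	serverName string
	dial       *dialHolder
}

// transportCache is a simplified model of a transport cache.
type transportCache struct {
	transports map[cacheKey]*http.Transport
}

func newTransportCache() *transportCache {
	return &transportCache{transports: map[cacheKey]*http.Transport{}}
}

// get returns a transport for the key. If the holder opts out of caching,
// a new transport is returned and the cache is left untouched.
func (c *transportCache) get(serverName string, h *dialHolder) *http.Transport {
	if h != nil && h.DisableCache {
		return &http.Transport{}
	}
	key := cacheKey{serverName: serverName, dial: h}
	if t, ok := c.transports[key]; ok {
		return t
	}
	t := &http.Transport{}
	c.transports[key] = t
	return t
}

// cachedAfter requests n transports, each with a brand-new holder, and
// returns how many entries the cache retained.
func cachedAfter(n int, disable bool) int {
	c := newTransportCache()
	for i := 0; i < n; i++ {
		c.get("example.invalid", &dialHolder{DisableCache: disable})
	}
	return len(c.transports)
}

func main() {
	fmt.Println("entries with caching:", cachedAfter(100, false))
	fmt.Println("entries with DisableCache:", cachedAfter(100, true))
}
```

The cost of this design is that holders with the flag set never share a transport, so a caller who does re-use a holder would lose cache hits, as discussed in the review thread.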
Special notes for your reviewer:
Here are graphs showing the change in memory usage before & after this fix. I have marked this as not having a user-facing change, since unless the user has taken care to reuse DialHolder instances when creating clients, they would not have seen the benefits of the tlsConfigCache.
before
after
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: