
Scheduler: Share frameworkImpl.waitingPods among profiles #122945

Closed
NoicFank opened this issue Jan 24, 2024 · 8 comments · Fixed by #122946 or #124926
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@NoicFank
Contributor

What would you like to be added?

I hope to share the waitingPods among multiple profiles, instead of creating a new waitingPods in each profile as the current implementation does.

In other words, waitingPods would be instantiated only once per scheduler app.

```go
f := &frameworkImpl{
	registry:             r,
	snapshotSharedLister: options.snapshotSharedLister,
	scorePluginWeight:    make(map[string]int),
	waitingPods:          newWaitingPodsMap(),
	clientSet:            options.clientSet,
	// ...
}
```
@Huang-Wei @ahg-g PTAL. Is there any negative impact of this change that I haven't considered? Thanks.

Why is this needed?

In some scenarios, I need to traverse all pods waiting in the permit stage within the current scheduler app, rather than only the pods that use the current profile for scheduling and are waiting in the permit stage.

For example, we want to use coscheduling to achieve all-or-nothing scheduling. We have abstracted the concept of a Cluster, where each cluster contains two deployments. Each deployment uses a different scheduler profile, and one cluster corresponds to one podGroup, as follows:
[diagram: one Cluster containing deployment A (podA1, podA2) and deployment B (podB1), each deployment using a different scheduler profile, with the whole cluster mapped to a single podGroup]

Then, we want the pods in one cluster (podA1, podA2, podB1) to succeed or fail scheduling together.

We assume that all pods can pass the filter plugins and that podA2 is the last to be scheduled. At this point, podA1 and podB1 are both waiting in the permit stage. Afterwards, we traverse waitingPods to complete the permit wait for podA1 and podB1. However, with the current implementation we can only see podA1 in waitingPods; podB1 cannot be seen, because podA2 and podB1 use different scheduling profiles. This causes podB1 to time out during the permit phase, even though all pods within the podGroup were successfully scheduled.
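
For context, here is a minimal sketch of what such a traversal looks like from a Permit plugin, using the scheduler framework's `IterateOverWaitingPods` handle method. The `podGroupLabel` key and the `allowWaitingPodsInGroup` helper are illustrative assumptions, not the coscheduling plugin's actual code:

```go
package coscheduling

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

const (
	pluginName    = "Coscheduling"
	podGroupLabel = "example.io/pod-group" // hypothetical label key
)

// allowWaitingPodsInGroup walks every pod parked in the permit stage and
// signals Allow for those in the same pod group. With per-profile
// waitingPods maps, this only sees pods scheduled through the *current*
// profile (podA1 but not podB1 in the example above); with a shared map,
// it would see them all.
func allowWaitingPodsInGroup(h framework.Handle, pod *v1.Pod) {
	group := pod.Labels[podGroupLabel]
	h.IterateOverWaitingPods(func(wp framework.WaitingPod) {
		if wp.GetPod().Labels[podGroupLabel] == group {
			wp.Allow(pluginName)
		}
	})
}
```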

@NoicFank NoicFank added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 24, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 24, 2024
@NoicFank
Contributor Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 24, 2024
@NoicFank
Contributor Author

/assign @Huang-Wei @ahg-g

@NoicFank
Contributor Author

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 25, 2024
@kerthcet
Member

So the Cluster is a cross-k8s-cluster object, and these two deployments are under different k8s clusters, right?

@NoicFank
Contributor Author

> So the Cluster is a cross-k8s-cluster object, and these two deployments are under different k8s clusters, right?

Not really; these two deployments are located in the same k8s cluster.

You can think of the Cluster in the example above as roughly equivalent to a namespace: the two deployments live in the same namespace, and we want all pods managed by both deployments under that namespace to be scheduled successfully or to fail simultaneously.

@kerthcet
Member

kerthcet commented Jan 26, 2024

Can you elaborate more on why these deployments should be scheduled together? I can imagine this might be useful; I'm just curious about the user story. Thanks.

@NoicFank
Contributor Author

NoicFank commented Jan 26, 2024

> Can you elaborate more on why these deployments should be scheduled together? I can imagine this might be useful; I'm just curious about the user story. Thanks.

Of course. Our specific usage scenario is stateful services (databases). For a DB instance (a Cluster, as mentioned above), there are many pods under each instance. We divide those pods into different components according to their function (mainly different images), and each component is managed by its own workload (sts/deploy/...). Overall, there are multiple workloads under each DB instance, and each workload manages its own pods.

For scheduling, we need all pods under the same instance to be scheduled successfully to ensure DB service availability: if only some pods under an instance are scheduled successfully, the DB service is still unavailable (even if all pods managed by one of the instance's workloads are scheduled successfully).

@NoicFank
Contributor Author

> Can you elaborate more on why these deployments should be scheduled together? I can imagine this might be useful; I'm just curious about the user story. Thanks.

The following is an open-source project for managing databases on K8s that you may find interesting:
https://github.com/apecloud/kubeblocks
