Not split nodes when searching for nodes but doing it all at once #67555
Conversation
/sig scheduling
/assign @bsalamat
I don't think we should try too hard to mimic the Parallelize API that is being replaced here. If there's a pattern that works better for us here, let's go with it.
// ParallelizeUntilFeasible is a very simple framework that allow for parallelizing
// N independent pieces of work until get feasible solution.
func ParallelizeUntilFeasible(workers, pieces int, doWorkPiece DoWorkPieceFunc, feasible *int32) {
The normal way to accomplish something like this in Go would be to use a context.Context for the work that you are dispatching, and then cancel the context when you want to abandon the rest of the work. It would also make sense to have the workers fan in their results to a goroutine that handles results and decides when to cancel.
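For illustration, here is a minimal, self-contained sketch of that pattern; the names toProcess, results, and enough are invented for the example and are not the PR's code. Workers select on the context and a result channel, and a single collector goroutine decides when to cancel.

package main

import (
	"context"
	"fmt"
	"sync"
)

func main() {
	const workers, pieces, enough = 4, 100, 10

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	toProcess := make(chan int, pieces)
	for i := 0; i < pieces; i++ {
		toProcess <- i
	}
	close(toProcess)

	results := make(chan int)
	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for piece := range toProcess {
				select {
				case <-ctx.Done():
					return // context canceled: abandon the remaining work
				case results <- piece * piece: // stand-in for a real piece of work
				}
			}
		}()
	}
	go func() {
		wg.Wait()
		close(results)
	}()

	// Fan in: a single goroutine handles results and decides when to cancel.
	collected := 0
	for range results {
		collected++
		if collected >= enough {
			cancel() // stop the workers; keep draining until they exit
		}
	}
	fmt.Println("collected", collected, "results before stopping")
}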
Indeed, I understand what you mean, maybe a callback function can do the same, just like bsalamat said. And using Context may be too complicated for a public function.
// ParallelizeUntilFeasible is a very simple framework that allow for parallelizing
// N independent pieces of work until get feasible solution.
func ParallelizeUntilFeasible(workers, pieces int, doWorkPiece DoWorkPieceFunc, feasible *int32) {
Given that this is in client-go, I think we should make this a more generic function. More specifically, I think we should:
- Rename it to ParallelizeUntil.
- Change feasible *int32 to a function that returns a bool and an error, similar to ConditionFunc. When the return value is true, it stops.
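For illustration, a rough sketch of the suggested shape; this mirrors the ConditionFunc idea above and is not necessarily the exact code that ended up in the PR.

package workqueue // hypothetical placement for this sketch

import "sync"

// DoWorkPieceFunc and ConditionFunc follow the shapes discussed above.
type DoWorkPieceFunc func(piece int)
type ConditionFunc func() (done bool, err error)

// ParallelizeUntil runs doWorkPiece over pieces with the given number of
// workers and stops handing out new pieces once condition returns true.
func ParallelizeUntil(workers, pieces int, doWorkPiece DoWorkPieceFunc, condition ConditionFunc) {
	toProcess := make(chan int, pieces)
	for i := 0; i < pieces; i++ {
		toProcess <- i
	}
	close(toProcess)

	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for piece := range toProcess {
				if done, err := condition(); err == nil && done {
					return // condition satisfied: abandon remaining pieces
				}
				doWorkPiece(piece)
			}
		}()
	}
	wg.Wait()
}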
Thanks for your kind review. :). I've updated it.
b2d0ce3 to 554b2d9
I don't think we should try too hard to mimic the Parallelize API that is being replaced here. If there's a pattern that works better for us here, let's go with it.
I apologize. I didn't notice that you were adding the util in client-go. Given that, it makes lots of sense to mimic the Parallelize API.
Indeed, I understand what you mean, maybe a callback function can do the same, just like bsalamat said. And using Context may be too complicated for a public function.
I don't believe that Context should be considered "too complicated" for a public function. In this case, it offers features that we don't need, so a stop channel would also work. (Passing a stop channel around is a very common pattern in k8s, but it's an old pattern that really ought to be replaced with Context.)
// ParallelizeUntil is a very simple framework that allow for parallelizing
// N independent pieces of work until the condition function return true.
func ParallelizeUntil(workers, pieces int, doWorkPiece DoWorkPieceFunc, condition ConditionFunc) {
I suggest replacing ConditionFunc with a context.Context or stop channel. Then each worker just selects on toProcess and the done/stop channel.
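As a sketch, the suggested worker loop might look roughly like this (the names stop, toProcess, and doWorkPiece are just illustrative, not the PR's final code):

// worker drains the work channel until it is closed or stop is closed,
// whichever comes first. A nil stop channel blocks forever, meaning
// "never stop early".
func worker(stop <-chan struct{}, toProcess <-chan int, doWorkPiece func(int)) {
	for {
		select {
		case <-stop:
			return // done/stop channel closed: abandon remaining work
		case piece, ok := <-toProcess:
			if !ok {
				return // work channel closed and fully drained
			}
			doWorkPiece(piece)
		}
	}
}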
// if the loop should be aborted.
type ConditionFunc func() (done bool, err error)

// ParallelizeUntil is a very simple framework that allow for parallelizing
s/allow/allows/, and I would appreciate it if you fixed the same typo on Parallelize.
No problem.
// ParallelizeUntil is a very simple framework that allow for parallelizing
// N independent pieces of work until the condition function return true.
func ParallelizeUntil(workers, pieces int, doWorkPiece DoWorkPieceFunc, condition ConditionFunc) {
Since there is a lot of duplicated code here, I suggest rewriting Parallelize to just call this function.
A good idea.
After this PR merges, I will submit a separate PR to replace all uses of Parallelize.
I think there was a slight misunderstanding. We should only replace the definition of Parallelize so that existing callers won't see any change. Parallelize would just call ParallelizeUntil.
I think that is a small change that should be included in this PR.
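For illustration, the delegation could be as small as the following sketch, assuming the ConditionFunc-based signature under discussion at this point; a condition that never returns true preserves the old behavior.

// Parallelize keeps its existing signature, so existing callers see no change;
// it simply delegates to ParallelizeUntil with a never-true condition.
func Parallelize(workers, pieces int, doWorkPiece DoWorkPieceFunc) {
	neverDone := func() (done bool, err error) { return false, nil }
	ParallelizeUntil(workers, pieces, doWorkPiece, neverDone)
}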
// Stops searching for more nodes once the configured number of feasible nodes are found(
// once the remainingNodesNumber is less than or equal to 0).
workqueue.ParallelizeUntil(16, int(allNodes), checkNode, condition)
I know a lot of this code was here before your change, but with this new ParallelizeUntil function, I think we can make this code much cleaner and easier to read.
Rather than having each worker write to an array and atomically increment an index variable, workers should send *v1.Node objects on a result channel. Then in this function, we can read from the result channel and build our filtered slice. Since only one goroutine can access filtered, we can remove the synchronization around that.
I have a question and need your guidance. If we pass a channel instead of using synchronization as you said, is that also needed for errs and failedPredicateMap?
And the size of the channel buffer is also a problem that needs to be determined.
// Stops searching for more nodes once the configured number of feasible nodes are found(
// once the remainingNodesNumber is less than or equal to 0).
workqueue.ParallelizeUntil(16, int(allNodes), checkNode, condition)
I have to say, it is a little weird that we are building a new util which passes integers to the worker funcs when our specific use case ignores those integers.
Since golang doesn't have generics, the more "canonical" way to solve that would be to have the worker funcs capture their work channel in a closure so that the utility doesn't have to manage that. What do you think?
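A rough, runnable sketch of the closure idea, with a string channel standing in for the scheduler's real work items (the names work and checkNode here are illustrative): the worker func captures its own work channel, so the integer piece index from the utility is simply ignored.

package main

import "fmt"

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}

	// The work channel is captured by the closure below; the utility that
	// invokes checkNode only needs to call it the right number of times.
	work := make(chan string, len(nodes))
	for _, n := range nodes {
		work <- n
	}
	close(work)

	checkNode := func(_ int) { // the piece index is ignored on purpose
		name, ok := <-work
		if !ok {
			return // nothing left to do
		}
		fmt.Println("checking", name) // stand-in for running the predicates
	}

	for i := 0; i < len(nodes); i++ {
		checkNode(i) // a parallel utility would call this from worker goroutines
	}
}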
Yes, I agree with you. We should pass in a channel, and then each worker function only needs the data it receives from it (here, one *v1.Node). But do we need to build this channel first? My doubt is that it may need to be a channel with a large size.
defer wg.Done()
for piece := range toProcess {
	// Return and abort if the condition is satisfied.
	if done, err := condition(); done && err == nil {
What happens if err is NOT nil?
defer wg.Done()
for piece := range toProcess {
	// Return and abort if the condition is satisfied.
	if done, err := condition(); done && err == nil {
btw, as a util, do we accept a nil conditionFunc?
Yes, we should handle it. I'll follow misterikkit's comment and update it later.
21aa2c1 to 6b26ab5
var (
	predicateResultLock  sync.Mutex
	filteredLen          int32
	remainingNodesNumber int32
Do we need remainingNodesNumber? I think we can use filteredLen and stop searching for more nodes once filteredLen >= numNodesToFind.
@@ -413,24 +419,23 @@ func (g *genericScheduler) findNodesThatFit(pod *v1.Pod, nodes []*v1.Node) ([]*v
}
if fits {
If we re-write this if statement as below, we can change line 377 to filtered = make([]*v1.Node, numNodesToFind). This saves us some memory.
if fits {
	len := atomic.AddInt32(&filteredLen, 1)
	if len > numNodesToFind {
		cancel()
	} else {
		filtered[len-1] = g.cachedNodeInfoMap[nodeName].Node()
	}
}
DONE.
6b26ab5 to ae4bf88
@@ -373,16 +374,19 @@ func (g *genericScheduler) findNodesThatFit(pod *v1.Pod, nodes []*v1.Node) ([]*v

// Create filtered list with enough space to avoid growing it
// and allow assigning.
filtered = make([]*v1.Node, 2*numNodesToFind)
filtered = make([]*v1.Node, numNodesToFind+16)
We will not add to filtered if the index is larger than numNodesToFind, so we should be fine with allocating only numNodesToFind entries. If you still insist, please create a const, like numWorkers, and use it here and in ParallelizeUntil. In a few years, nobody would know why this +16 is here.
I'm sorry, this is something I was testing and I forgot to delete it.
ae4bf88 to 65127ae
/retest
/test pull-kubernetes-kubemark-e2e-gce-big
65127ae to 7a48c9f
/test pull-kubernetes-bazel-build
c54f8ce to c6b7f76
func ParallelizeUntil(ctx context.Context, workers, pieces int, doWorkPiece DoWorkPieceFunc) {
	// If the passed ctx is nil, then the default context.TODO() is used.
	if ctx == nil {
		ctx = context.TODO()
Maybe change it to return Parallelize(workers, pieces, doWorkPiece), so that we don't need to check whether the context is done in the select{} for this "ctx == nil" case.
Per other comments, I think the plan is for Parallelize to call this func with a nil context to eliminate duplicate code.
context.TODO() implies that we need further code changes to get an appropriate context. Since your goal is to create a context that never cancels, it is more appropriate to use context.Background().
However, I think it would be cleaner to write it as,
var stop chan struct{}
if ctx != nil {
	stop = ctx.Done()
}
This is not a good idea; ParallelizeUntil should not rely on Parallelize, because ParallelizeUntil may replace Parallelize.
That's fair.
As a side note, it would be nice to rewrite the commit description and PR description to more clearly explain the change.
@@ -50,3 +51,37 @@ func Parallelize(workers, pieces int, doWorkPiece DoWorkPieceFunc) {
	}
	wg.Wait()
}

// ParallelizeUntil is a framework that allows for parallelizing N
// independent pieces of work until the context done or canceled.
Suggest rewording "until the context done or canceled" to "until done or the context is canceled"
}

// Stops searching for more nodes once the configured number of feasible nodes are found(
// once the remainingNodesNumber is less than or equal to 0 and cancel the ctx).
This comment needs updating, since remainingNodesNumber is no longer used.
Done, thanks.
c6b7f76 to 6d7ec85
@sttts @bsalamat @misterikkit
/assign @sttts for another round of review and approval
6d7ec85 to 4753ed6
toProcess := make(chan int, pieces)
for i := 0; i < pieces; i++ {
	toProcess <- i
}
close(toProcess)
nit: would leave these lines.
/lgtm
For client-go changes.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bsalamat, sttts, wgliang
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
4753ed6 to 6c63dcf
New changes are detected. LGTM label has been removed.
/retest
@wgliang: You must be a member of the kubernetes/kubernetes-milestone-maintainers github team to set the milestone. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ping @bsalamat
Automatic merge from submit-queue (batch tested with PRs 67555, 68196). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
What this PR does / why we need it:
Not split nodes when searching for nodes but doing it all at once.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #
Special notes for your reviewer:
@bsalamat
This is a follow-up PR of #66733.
#66733 (comment)
Release note: