[Question] How horovod ensures allreduce is finished before gradients get applied in tensorflow #3900
-
In TensorFlow, the main thread calls HorovodAllreduceOP::ComputeAsync, which puts allreduce requests into a queue through EnqueueTensorAllreduce; these requests are later consumed by the background thread, and the actual allreduce is performed by that background thread. If I got that right, here is the question that has puzzled me for a while: the TensorFlow graph runs in the main thread, while the allreduce operations happen in the background thread, so how are these two threads synchronized so that the allreduce is finished before the reduced gradients are applied?
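The producer/consumer hand-off described in the question can be sketched in plain Python. This is a hypothetical illustration, not Horovod's actual implementation: `background_loop`, the doubling "reduction", and the `on_done` callback are all made-up stand-ins for EnqueueTensorAllreduce, the real allreduce, and the completion signal.

```python
import queue
import threading

# Hypothetical sketch of the pattern in the question: the main thread
# enqueues allreduce requests, a background thread consumes them, performs
# the reduction, and then fires a completion callback.
request_queue = queue.Queue()

def background_loop():
    # Consume requests until a None sentinel arrives.
    while True:
        item = request_queue.get()
        if item is None:
            break
        tensor, callback = item
        # Stand-in for the real allreduce: "sum" across 2 simulated workers.
        reduced = [x * 2 for x in tensor]
        callback(reduced)  # Signal completion back to the caller.

worker = threading.Thread(target=background_loop)
worker.start()

done = threading.Event()
result = {}

def on_done(reduced):
    result["grads"] = reduced
    done.set()

# "Main thread" enqueues a request, then waits for the callback to fire
# before using the reduced gradients.
request_queue.put(([1.0, 2.0, 3.0], on_done))
done.wait()  # Gradients are only applied after this point.
request_queue.put(None)
worker.join()
print(result["grads"])  # -> [2.0, 4.0, 6.0]
```

The key point the sketch demonstrates is that the main thread does not poll the background thread; it blocks on a completion signal that the background thread raises.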
Replies: 1 comment 1 reply
-
Hi @zhaolianshuizls, good question, this is not entirely obvious! TensorFlow uses a pool with multiple threads of its own to run async op implementations like a Horovod allreduce, so they don't all run in the same main thread.
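This point can be illustrated with a small Python sketch, using Python's `concurrent.futures` pool as a hypothetical stand-in for TensorFlow's internal C++ thread pool:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Illustrative only: like TensorFlow's internal thread pool, the async op
# body submitted here executes on a pool worker thread, not the main thread.
pool = ThreadPoolExecutor(max_workers=4)

def async_op():
    # Report whether we are running outside the main thread.
    return threading.current_thread() is not threading.main_thread()

future = pool.submit(async_op)
print(future.result())  # -> True
pool.shutdown()
```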
In `HorovodAllreduceOP::ComputeAsync()` we also define a callback (to be called by the Horovod background thread) which informs TensorFlow that the operation is done, by calling the TF callback `done()`, with this lambda function: https://github.com/horovod/horovod/blob/master/horovod/tensorflow/mpi_ops.cc#L492-L505

Op implementations like `NCCLAllreduce::Execute()` (https://github.com/horovod/horovod/blob/master/horovod/common/ops/nccl_operations.cc#L185) make sure that this callback is called once the operation is complete, in this case via `FinalizeGPUQueue()`, https://github.com/horovod/horovod/co…
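Putting the pieces together, the contract can be sketched in Python. All names here (`FakeFramework`, `run_async_op`, `compute_async`) are illustrative; the real mechanism is TensorFlow's C++ `AsyncOpKernel::ComputeAsync` receiving a done callback, which Horovod invokes from its background thread.

```python
import threading

# Hypothetical sketch of the ComputeAsync/done() contract: the framework
# hands the kernel a `done` callback, the kernel hands the work off to a
# background thread, and the framework does not consider the op finished
# (or run downstream ops) until `done` is invoked.
class FakeFramework:
    def run_async_op(self, compute_async):
        finished = threading.Event()
        compute_async(done=finished.set)  # kernel must call done() eventually
        finished.wait()                   # downstream ops wait on completion
        return "gradients applied after allreduce"

def compute_async(done):
    def background_work():
        # ... perform the allreduce here (omitted) ...
        done()  # analogous to the completion callback firing in Horovod
    threading.Thread(target=background_work).start()

print(FakeFramework().run_async_op(compute_async))
# -> gradients applied after allreduce
```

So the synchronization is not between the two threads directly; it is mediated by TensorFlow's async-op machinery, which holds back the ops consuming the reduced gradients until the callback fires.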