block_in_place + block_on can hang on runtime shutdown / runtime drop #6463

Open
dpc opened this issue Apr 5, 2024 · 10 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-runtime Module: tokio/runtime

Comments

@dpc
Contributor

dpc commented Apr 5, 2024

Version
List the versions of all tokio crates you are using. The easiest way to get
this information is using cargo tree subcommand:

1.37.0

Platform
The output of uname -a (UNIX), or version and 32 or 64-bit (Windows)

Linux

Description

We use the block_in_place + block_on pattern described in #5843 a lot. It has many caveats, but it seems to work OK for us as an async drop workaround.

However, today I'm debugging a hang on shutdown. Basically, the Runtime is being dropped and the whole process hangs. When I attach with gdb, I can see that only a handful of worker threads remain, plus a timer thread. All worker threads seem to be inside a block_in_place + block_on section, parked, waiting for something to wake them up, but I don't think there's any thread left to actually poll the event loop anymore.

I don't know how well supported this pattern should be, and I might be wrong about the whole thing altogether, but it seems to me that the whole thing would just work if tokio did one of the following: reserved a single worker for polling events and shut it down last, somehow avoided getting all worker threads block_in_placed, or shut down the event polling thread last (if a dedicated thread is used).

@dpc dpc added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Apr 5, 2024
@Darksonn Darksonn added the M-runtime Module: tokio/runtime label Apr 5, 2024
@Darksonn
Contributor

Darksonn commented Apr 5, 2024

I would expect IO resources and timers to return errors once runtime shutdown starts, so it surprises me that it is hanging. Could you give more details?

@dpc
Contributor Author

dpc commented Apr 5, 2024

Hmmm...

Maybe it is specific to what I'm doing.

impl Drop for ProcessHandleInner {
    fn drop(&mut self) {
        let Some(child) = &mut self.child else {
            return;
        };
        let name = self.name.clone();
        block_in_place(move || {
            tokio::runtime::Handle::current().block_on(async move {
                debug!(
                    target: LOG_DEVIMINT,
                    "sending SIGKILL to {name} and waiting for it to exit"
                );
                send_sigkill(child);
                if let Err(e) = child.wait().await {
                    warn!(target: LOG_DEVIMINT, "failed to wait for {name}: {e:?}");
                }
            })
        })
    }
}

@Darksonn the child is a tokio::process::Child. Could it be that just this one particular case doesn't get handled?
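
One way to sidestep the runtime in a Drop like the one above is to do the kill-and-wait synchronously. The sketch below is not the code from this issue: it assumes a std::process::Child rather than a tokio::process::Child, and SyncProcessHandle, kill_and_reap, and the 5-second timeout are hypothetical names and values chosen for illustration. Because it never waits on the runtime's drivers, it cannot park on an event that nobody polls, but it does block the calling thread for up to the timeout.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical handle, standing in for `ProcessHandleInner`.
/// Assumption: it owns a `std::process::Child`, not a tokio one.
struct SyncProcessHandle {
    name: String,
    child: Option<Child>,
}

/// Kill the child, then poll `try_wait` until it exits or `timeout` elapses.
/// Returns `Ok(None)` if the child had not exited by the deadline.
fn kill_and_reap(child: &mut Child, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    // On Unix, `Child::kill` sends SIGKILL.
    child.kill()?;
    let deadline = Instant::now() + timeout;
    loop {
        // `try_wait` never blocks, so this loop does not depend on any
        // runtime thread being alive to deliver a wakeup.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}

impl Drop for SyncProcessHandle {
    fn drop(&mut self) {
        let Some(child) = &mut self.child else {
            return;
        };
        // The timeout is an arbitrary illustrative value.
        match kill_and_reap(child, Duration::from_secs(5)) {
            Ok(Some(_)) => {}
            Ok(None) => eprintln!("{} did not exit within the timeout", self.name),
            Err(e) => eprintln!("failed to wait for {}: {e:?}", self.name),
        }
    }
}
```

The trade-off is that the drop blocks its thread while polling, so on a runtime worker it would still be reasonable to call it from inside block_in_place.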

@dpc
Contributor Author

dpc commented Apr 5, 2024

Oh, shoot. Now I see it's not actually a send_sigkill and I'm not 100% sure if the process didn't hang. I don't think it did, because they all get killed on ctrl+c all the time reliably, but I'll try to verify.

Edit:
Nah, I changed to send_sigkill and I get the same result.

@dpc
Contributor Author

dpc commented Apr 5, 2024

I pasted the relevant part of the gdb session: https://pastebin.com/VzHF0B5T, including the list of threads and a stack trace that is mostly tokio functions, in case it's of any help.

@wtdcode

wtdcode commented May 28, 2024

@dpc I also ran into this issue once and got more debugging hints using this tool: https://github.com/tokio-rs/console

@Darksonn Based on my own experience last time, is it possible that tokio is running out of worker threads? I noticed that every time I spawned the 33rd task with tokio::spawn (my machine has 32 logical cores) and used block_in_place to escape to the sync world, the whole runtime would hang, and console showed all threads waiting for signals to wake up. Unfortunately I can't give an exact reproduction, because I have worked around it by wrapping the code in tokio::task::spawn_blocking.

@Darksonn
Contributor

Yes, it's possible to run out of worker threads when you use block_in_place, but the limit would be the number of blocking threads, which is 512 by default. The fact that you're already blocked at 32 tasks tells me that you're probably blocking the thread outside of block_in_place.

@wtdcode

wtdcode commented May 28, 2024

> Yes, it's possible to run out of worker threads when you use block_in_place, but the limit would be the number of blocking threads, which is 512 by default. The fact that you're already blocked at 32 tasks tells me that you're probably blocking the thread outside of block_in_place.

Cool, that indeed makes sense. My code base is mixing sync and async almost everywhere, though I know it's bad practice.

Is it possible to configure this limit or at least check if we are reaching the limit to avoid any hang?

@wtdcode

wtdcode commented May 28, 2024

Quick googling gives this which resolves my questions. =)

@dpc
Contributor Author

dpc commented May 28, 2024

block_in_place blocks the worker threads, not the blocking threads, no? The whole point of block_in_place is not having to Send anything, because the current worker thread is taken out of scheduling to avoid moving to a different thread, no?

@Darksonn
Contributor

When you call block_in_place, then the current thread ceases to be a worker thread, and a new worker thread is spawned using spawn_blocking.

3 participants