Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

websocket: ensure handler on_close() is called after open() #2966

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ploxiln
Copy link
Contributor

@ploxiln ploxiln commented Dec 21, 2020

attempt to fix #2958

@bdarnell
Copy link
Member

bdarnell commented Jan 9, 2021

Interesting, this isn't how I read #2958. I thought that issue was about on_close being called when open() was never called in the first place. This looks like it's a real issue, but it might be a separate one from the one in #2958 (or it might be the same thing and I'm misreading things).

I have some misgivings about delaying on_close until after an async open completes - does that make it too hard to respond to this closure by canceling pending operations? I guess given our lack of cancellation support throughout tornado it makes the most sense to delay on_close here (and there's always on_connection_close), but I do see an argument for documenting the behavior rather than changing it.

In any case, there's a lot of subtlety here and I think we need a test that recreates some of the problematic scenarios.

@ploxiln
Copy link
Contributor Author

ploxiln commented Jan 9, 2021

Yeah, I'm not confident about this solution. I left the job and codebase where I observed the problem over a month ago, so I'm not much help with testing and analysis now.

I do recall: when on_close() was called before open(), open() was always called after. And, we used a plain open() method (originally written before open() could be async, and paired with an async prepare()).

For this, just looking at the code, I didn't see an obvious yield point before await open_result, so I figured one possible case was that (with asyncio) we don't know when, during the next iteration of the ioloop, the async open() will actually start running (until its first yield point). So without this delay of on_close(), it seems impossible to guarantee that even the first line of an async open() will run before on_close().

But, thinking back now, as mentioned just above, we used a plain open() anyway ... so I guess the original problem involves _accept_connection() being called, and trying to write the 101 response, after on_close(), which seems odd. Maybe this can happen when the close happens during an async prepare()?

@m-sasha
Copy link
Contributor

m-sasha commented Apr 12, 2021

A colleague of the reporter of #2958 here.

The report was indeed about on_close() being called before open. We have not seen a case of open not being called at all.
If I understand correctly, the culprit(s) are the two awaits, in accept_connection:

try:
    await self._accept_connection(handler)
except asyncio.CancelledError:

and in _accept_connection:

open_result = handler.open(*handler.open_args, **handler.open_kwargs)
if open_result is not None:
    await open_result

Either of them can cause a context switch to the execution of on_connection_close, which then calls on_close directly.

As my colleague mentioned in the ticket, adding time.sleep(1) before the await in self.accept_connection causes the bug to reproduce every time, as it makes sure that the call to on_connection_close is ready to be executed, and is executed when the context switch occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

on_close called before open on short-lived WebSocket connections
3 participants