-
-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Add endpoint to disconnect all users for a given tenant #842
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
|
||
def disconnect_clients(%{assigns: %{tenant: tenant}} = conn, %{"tenant_id" => tenant_id}) do | ||
Tenants.disconnect_clients(tenant_id) | ||
PostgresCdc.stop_all(tenant, @stop_timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@filipecabaco this returns either :ok
or :error
. Shouldn't we include the correct resp code depending on the result of PostgresCdc.stop_all/2
? That way whatever consumes the endpoint will know if something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels inconsistent when taking into account the other places this is called:
-
:ok <- PostgresCdc.stop_all(tenant, stop_all_timeout), - If this returns
:error
then500
is returned.
- If this returns
-
PostgresCdc.stop_all(tenant, @stop_timeout) - Returns 204 no matter what.
I would think that for a disconnect upstream would want proper feedback that if failed would get 500
in response.
Tenants.disconnect_clients(tenant_id) | ||
PostgresCdc.stop_all(tenant, @stop_timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@filipecabaco are there any potential race conditions here?
For example, what would happen if a postgres changes subscribed client gets disconnected and immediately reconnects? This client has a record in realtime.subscriptions
again upon reconnection. Then when the call to stop PostgresCdc finally goes through would it wipe away the record? Does calling PostgresCdc.stop_all/2
kick out clients too?
Would it be better to first call PostgresCdc.stop_all/2
and if it returns then call :ok
Tenants.disconnect_clients/1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tricky one as we have no way of doing this without being fully async since we will need to kill potentially thounsands of processes in their own lifecycle.
We can actually suspend the user before hand, wait a couple of seconds (which will already kill socket connections) and then kill postgres cdc
This will prevent new users from joining, kill existing ones and then kill the remainder of the pipeline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah probably need to:
- suspend
- bust tenant cache
- stop_all (yeah before disconnect)
- disconnect
- unsuspend
Note: need to bust the tenant cache tho too because suspend flag is there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you disconnect and they can't reconnect, then yeah you could stop_all after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like suspend busts the tenant cache already 🥳
a0784a9
to
5cb1201
Compare
5cb1201
to
0d426ee
Compare
0d426ee
to
db43ea7
Compare
@@ -104,6 +104,7 @@ defmodule RealtimeWeb.Router do | |||
pipe_through([:open_cors, :tenant_api, :secure_tenant_api]) | |||
|
|||
post("/broadcast", BroadcastController, :broadcast) | |||
delete("/disconnect_clients", TenantController, :disconnect_clients) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have only a /suspend
and /unsuspend
endpoint where it disconnects clients and kills all the db connections including extensions?
I was having trouble this week figuring out how to restart the realtime_subscription_manager_pub
connections.
Having one endpoint where we could confidently disable Realtime for all the db and client conns for a tenant is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a really good , will implement that instead
What kind of change does this PR introduce?
Add endpoint to disconnect all users for a given tenant