High MTTR with many tenants #4634

etiennedi · 2024-04-09T18:34:50Z

How to reproduce this bug?

Have many tenants (hundreds of thousands)
Restart the node

What is the expected behavior?

Startup time should be more or less instant (with lazy shard loading)

What is the actual behavior?

We see one write operation per tenant during startup, which slows down the restart. It took about 8 minutes on a cluster with between 100k and 200k tenants.

Supporting information

No response

Server Version

1.24.x

Code of Conduct

I have read and agree to Weaviate's Contributor Guide and Code of Conduct

etiennedi · 2024-04-09T18:37:17Z

My guess is that the many writes come from writing the schema back after startup in case a migration happened. This could probably be optimized by first checking whether anything actually changed; if nothing did, don't store the schema back.
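A minimal sketch of the "only write back if something changed" idea, assuming the schema can be serialized deterministically (Go's `encoding/json` sorts map keys, so it can here). The `Schema` type, `migrateAndMaybePersist`, and the `migrate`/`persist` callbacks are hypothetical stand-ins, not Weaviate's actual types or functions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Schema is a hypothetical, simplified stand-in for the persisted schema.
type Schema struct {
	Classes map[string][]string `json:"classes"` // class name -> tenant names
	Version int                 `json:"version"`
}

// migrateAndMaybePersist snapshots the serialized schema, runs migrate, and
// calls persist only if the serialized form actually changed.
func migrateAndMaybePersist(s *Schema, migrate func(*Schema), persist func([]byte) error) (bool, error) {
	before, err := json.Marshal(s)
	if err != nil {
		return false, err
	}
	migrate(s)
	after, err := json.Marshal(s)
	if err != nil {
		return false, err
	}
	if bytes.Equal(before, after) {
		return false, nil // nothing changed: skip the write entirely
	}
	return true, persist(after)
}

func main() {
	writes := 0
	persist := func(b []byte) error { writes++; return nil }

	s := &Schema{Version: 2, Classes: map[string][]string{"Docs": {"tenantA", "tenantB"}}}

	// No-op migration: the schema is already current, so nothing is written.
	wrote, _ := migrateAndMaybePersist(s, func(*Schema) {}, persist)
	fmt.Println(wrote, writes) // false 0

	// Real migration: the version changes, so exactly one write happens.
	wrote, _ = migrateAndMaybePersist(s, func(sc *Schema) { sc.Version = 3 }, persist)
	fmt.Println(wrote, writes) // true 1
}
```

With this check, a restart where no migration applies performs zero schema writes instead of one per tenant. The same comparison could be applied per tenant if the schema is stored per tenant.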

Possible steps to reproduce:

Import 200k tenants. They need to be cold; otherwise you run out of file descriptors on a single node.
Restart the server.

etiennedi added the bug label Apr 9, 2024

parkerduckworth mentioned this issue Apr 9, 2024

Significantly improve MTTR when number of tenants is massive #4636

Merged

