batch delete (delete_many) fails with unclear error on timeout. #1030

Open
moonsift opened this issue Apr 29, 2024 · 0 comments

Following the documentation here:
https://weaviate.io/developers/weaviate/manage-data/delete

I have implemented a deletion setup as follows:

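from datetime import datetime
from weaviate.classes.query import Filter

# `client` is an already-connected Weaviate client (v4 Python API).
collection = client.collections.get(collection_name)
end_date = datetime.strptime("2024-04-09T20:16:10Z", "%Y-%m-%dT%H:%M:%SZ")
# Match every object whose scrape_date is on or before end_date.
where_clause = Filter.by_property("scrape_date").less_or_equal(end_date)
collection.data.delete_many(
    where=where_clause,
)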

Small deletions (fewer than roughly 3,000 matching objects, it seems) work fine.
But when many objects match the filter, I receive this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/weaviate/collections/batch/grpc_batch_delete.py", line 33, in batch_delete
    res, _ = self._connection.grpc_stub.BatchDelete.with_call(
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1193, in with_call
    return _end_unary_response_blocking(state, call, True, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "unavailable"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-04-29T15:18:19.331980027+00:00", grpc_status:14, grpc_message:"unavailable"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/weaviate/collections/data.py", line 162, in delete_many
    return self._batch_delete_grpc.batch_delete(
  File "/usr/local/lib/python3.10/site-packages/weaviate/collections/batch/grpc_batch_delete.py", line 67, in batch_delete
    raise WeaviateDeleteManyError(e.details())  # pyright: ignore
weaviate.exceptions.WeaviateDeleteManyError: Query call with protocol GRPC delete failed with message unavailable.
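
Because the second exception is raised while the first is being handled, the original gRPC error should still be reachable from the raised exception through Python's implicit exception chaining (`__context__`). The sketch below illustrates this with stand-in classes only: `FakeRpcError`, `FakeDeleteManyError`, `failing_delete`, and `exception_chain` are hypothetical names for illustration and are not part of grpc or the Weaviate client.

```python
class FakeRpcError(Exception):
    """Stand-in for the gRPC error in the first traceback (hypothetical)."""


class FakeDeleteManyError(Exception):
    """Stand-in for WeaviateDeleteManyError (hypothetical)."""


def failing_delete():
    try:
        raise FakeRpcError("StatusCode.UNAVAILABLE: unavailable")
    except FakeRpcError:
        # Raised inside the except block without "from", which is what
        # "During handling of the above exception" in the traceback indicates.
        # Python stores the original error on __context__.
        raise FakeDeleteManyError("delete failed with message unavailable.")


def exception_chain(exc):
    """Return exc followed by every exception it was raised from or during."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        chain.append(exc)
        seen.add(id(exc))
        exc = exc.__cause__ or exc.__context__
    return chain


try:
    failing_delete()
except FakeDeleteManyError as err:
    chain = exception_chain(err)
    for e in chain:
        print(type(e).__name__, "->", e)
```

With the real client, the same walk over `__context__` should expose the underlying gRPC status code, which the wrapped message alone does not show.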

This contradicts the linked documentation (https://weaviate.io/developers/weaviate/manage-data/delete), which states:

There is a configurable [maximum limit (QUERY_MAXIMUM_RESULTS)](https://weaviate.io/developers/weaviate/config-refs/env-vars#general) on the number of objects that can be deleted in a single query (default 10,000). To delete more objects than the limit, re-run the query.

Rerunning the query does not result in all the data being deleted: the total number of vectors in the collection is unchanged, and the query keeps failing rather than eventually succeeding once (presumably) no matching documents remain.
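
For reference, the re-run behaviour the documentation describes would look roughly like the loop below. This is a minimal sketch, not the client's own mechanism: `delete_in_rounds` and `fake_delete_once` are hypothetical helpers, the per-call limit is simulated, and the commented wiring assumes the object returned by `delete_many` exposes `matches` and `successful` counts. In the failing case reported here, the first call raises before any round completes, so a loop like this never makes progress.

```python
from typing import Callable, Tuple


def delete_in_rounds(delete_once: Callable[[], Tuple[int, int]],
                     max_rounds: int = 100) -> int:
    """Repeat delete_once() until it reports no matches; return total deleted.

    delete_once must return (matched, deleted) for a single call.
    """
    total = 0
    for _ in range(max_rounds):
        matched, deleted = delete_once()
        total += deleted
        if matched == 0:
            return total
        if deleted == 0:
            # Objects still match but nothing was removed: stop, don't spin.
            raise RuntimeError("objects matched but none were deleted")
    raise RuntimeError(f"still matching objects after {max_rounds} rounds")


# Assumed wiring for the real client (not verified here):
#   def delete_once():
#       result = collection.data.delete_many(where=where_clause)
#       return result.matches, result.successful

# Simulation: 25 matching objects, at most 10 removed per call.
store = {"remaining": 25}
LIMIT = 10


def fake_delete_once():
    matched = min(store["remaining"], LIMIT)
    store["remaining"] -= matched
    return matched, matched


total_deleted = delete_in_rounds(fake_delete_once)
print(total_deleted)
```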

The error consistently appears after about 60 seconds, so I presume a timeout is what causes the error and makes the batch deletion fail.
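
One way to check the 60-second observation is to time the call up to the point of failure. The helper below is a sketch using only the standard library; `timed_call` and `slow_failure` are hypothetical names, and the short sleep stands in for the long-running delete.

```python
import time


def timed_call(fn):
    """Run fn() and return (result, error, elapsed_seconds)."""
    start = time.monotonic()
    try:
        result = fn()
    except Exception as exc:
        # Keep the exception so the caller can inspect it next to the timing.
        return None, exc, time.monotonic() - start
    return result, None, time.monotonic() - start


def slow_failure():
    # Stand-in for a delete that fails after a fixed delay.
    time.sleep(0.05)
    raise TimeoutError("unavailable")


result, error, elapsed = timed_call(slow_failure)
print(f"failed after {elapsed:.2f}s: {error!r}")

# Assumed usage with the real client:
#   timed_call(lambda: collection.data.delete_many(where=where_clause))
```

If the elapsed time clusters tightly around 60 seconds across runs, that would support the timeout explanation, although it would not show whether the limit sits in the client, the server, or something in between.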

It would be helpful if the raised error made this clearer.

@moonsift changed the title from "batch_delete fails with unclear error on timeout." to "batch delete (delete_many) fails with unclear error on timeout." on Apr 29, 2024