v1.25.0 - RAFT-based Schema, Batch Vectorization, Hybrid Search Improvements, Implicit Tenant Creation, Dynamic Index Switching

@parkerduckworth parkerduckworth released this 10 May 02:33

Breaking Changes

none

New Features

RAFT-based Schema

We're excited to announce the release of our RAFT-based schema! With this, Weaviate now supports concurrent schema updates, eliminating bottlenecks and significantly improving performance in large-scale and dynamic settings.

  • Update schema manager interface to expose schema version (Part 1) by @reyreaud-l in #4659
  • Update schema manager interface to expose schema version (Part 2) by @reyreaud-l in #4660
  • Implement schema updates using RAFT consensus by @redouan-rhazouani in #3944
  • Rebuild GQL just once when reloading local db by @redouan-rhazouani in #4670
  • Update local_dev script RAFT_JOIN config to bootstrap cluster correctly by @moogacs in #4676
  • Refactor consensus dev script on nodes name weaviate-* by @moogacs in #4677
  • Update object interface to include version by @reyreaud-l in #4675
  • RAFT store Apply tests calling mocks expectations assertion by @moogacs in #4684
  • Save copy of the current RAFT schema into the old format by @moogacs in #4679
  • Trigger schema callbacks on AddProperty only on success by @moogacs in #4685
  • Check error on apply to avoid panics by @moogacs in #4687
  • Add schema version to batch write client and server by @reyreaud-l in #4681
  • Added store-specific statistics to existing RAFT statistics by @nathanwilk7 in #4689
  • Client rpc reuse the same conn by @moogacs in #4686
  • Update batch delete with schema version by @reyreaud-l in #4690
  • Schema v2 props upsert by @aliszka in #4680
  • Autoschema optimizations - cached class schema by @aliszka in #4662
  • RAFT store.Service.Ready is flaky in cluster/store/store_test.go, poll for 2s instead of 1s wait to improve robustness by @nathanwilk7 in #4706
  • Implement schema querying based on a specific version for enhanced version control by @redouan-rhazouani in #4693
  • Update put, merge and add ref with schema version by @reyreaud-l in #4704
  • Update UpdateShardStatus to use schema version by @reyreaud-l in #4713
  • Fix nil ptr panic when closing RAFT leader client by @reyreaud-l in #4723
  • RAFT GRPC: tweak the client service config by @moogacs in #4715
  • Add query shard tenant and update usages by @reyreaud-l in #4727
  • Enable cancelation for HTTP replication requests and reduce timeout for faster node failure detection by @redouan-rhazouani in #4730
  • Schema v2 idempotent tenants by @aliszka in #4699
  • Make batch operation retrieve and pass schema version to client by @reyreaud-l in #4738
  • Update auto schema to return the schema version by @reyreaud-l in #4742
  • Refactor usage of index.getOrInitLocalShard() to the minimum by @redouan-rhazouani in #4744
  • Schema V2 writes with schema version by @aliszka in #4745
  • Allow migration from RAFT to take as much time as necessary by @redouan-rhazouani in #4710
  • Add missing CloudService to ClusterService rename by @aliszka in #4747
  • Update RAFT subsystem to use the same log format by @redouan-rhazouani in #4746
  • Convert TenantExists to handle eventual consistency by @tsmith023 in #4750
  • Add an external gRPC method for getting tenant information by @tsmith023 in #4741
  • Update object endpoints to fetch and propagate schema version by @reyreaud-l in #4752
  • Replace slog with logrus for logs consistency by @moogacs in #4753
  • Add action logrus field to RAFT logs by @moogacs in #4755
  • Autoschema fix to correctly return max schemaVersion by @aliszka in #4757
  • Make default CLUSTER_HOSTNAME consistent between memberlist and RAFT by @moogacs in #4758
  • Cluster log formatting by @nathanwilk7 in #4759
  • Handle possible nil classes in autoSchema by @moogacs in #4761
  • Refresh class's schema when props added by autoschema by @aliszka in #4762
  • Specialize voter nodes for metadata storage by @redouan-rhazouani in #4734
  • Update the schema version parsing to not always return 0 by @reyreaud-l in #4767
  • Refactor leader error to distinguish between election and network issues by @moogacs in #4784
  • Update query shard tenant to get multiple shards at once by @moogacs in #4731
  • Avoid schema updating race with DB update by @reyreaud-l in #4769
  • Backup restore: return err class exists if it does by @moogacs in #4792
  • Make coordinator node wait for schema version if changed on validate step by @reyreaud-l in #4794
  • Add RAFT_GRPC_MESSAGE_MAX_SIZE Environment Variable to set maximum GRPC message size for RAFT by @nathanwilk7 in #4799
  • Only involve leader for tenant status when necessary by @etiennedi in #4803
  • Shift add tenant to the RAFT leader by @reyreaud-l in #4801
  • Fix RAFT shutdown, force it for dependencies and convert to enterrors.GoWrapper by @moogacs in #4810
  • Allow Leader QueryShardingState type by @moogacs in #4817
  • Update replica usecase to wait for EC on writes with schema version by @reyreaud-l in #4814
  • Schema v2 prevent shutdown by @aliszka in #4821
  • Aggregate querying the leader for classes with variadic by @moogacs in #4787
  • Observe durations of schema reads and writes for local and leader reads by @etiennedi in #4844
  • Reserve RAFT (all casing permutations) as a class name by @moogacs in #4874
  • Make bootstrap exit early on RAFT store reporting ready by @reyreaud-l in #4871
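
Many of the changes above pass a schema version along with reads and writes, so that a node can wait until its local copy of the schema has caught up before acting on a request. The following is a minimal sketch of that idea only, not Weaviate's implementation: the class `VersionedSchema` and its methods `apply` and `wait_for_version` are hypothetical names chosen for illustration.

```python
import threading


class VersionedSchema:
    """Toy stand-in for a node's local schema copy (hypothetical, not Weaviate code)."""

    def __init__(self) -> None:
        self._version = 0
        self._cond = threading.Condition()

    @property
    def version(self) -> int:
        with self._cond:
            return self._version

    def apply(self, version: int) -> None:
        """Record that a schema change with this version was applied locally."""
        with self._cond:
            # Versions only move forward; stale updates are ignored.
            if version > self._version:
                self._version = version
                self._cond.notify_all()

    def wait_for_version(self, version: int, timeout: float) -> bool:
        """Block until the local version is at least `version`.

        Returns True if the version was reached, False if the timeout expired.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: self._version >= version, timeout=timeout
            )
```

A writer that received version 3 from the leader would call `wait_for_version(3, timeout=...)` before touching local data, and fail the request if the call returns `False`.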

Batch Vectorization

Avoid vectorization API rate-limiting and enjoy faster insertion.
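
The general idea is to send many texts to the vectorization API in one request instead of one request per object, which reduces the number of calls that count against a rate limit. A rough sketch of the grouping step, assuming a hypothetical helper `chunked` and a caller-chosen `batch_size` (neither is part of Weaviate):

```python
from typing import Iterator, List, Sequence


def chunked(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` texts (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        # Each yielded list would be sent to the vectorizer as a single request.
        yield list(texts[start:start + batch_size])
```

With five texts and `batch_size=2`, this produces three groups, so three requests are made instead of five.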

Dynamic Index Switching

Automatically switch vector index types to achieve peak performance and efficiency.

  • Dynamic vector index type by @abdelr in #4350
  • Fix merge conflict on schema validation of dynamic index by @trengrj in #4829
  • Adding more tests around the dynamic index and fixing an existing bug by @abdelr in #4836
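
As an illustration of what switching index types by size can look like, here is a small sketch. It assumes, beyond what these notes state, that a collection starts on a flat index and moves to HNSW once it grows past a threshold; the function `pick_index_type` and its `threshold` parameter are hypothetical and do not mirror Weaviate's internals.

```python
def pick_index_type(object_count: int, threshold: int) -> str:
    """Return the index type a size-based policy would choose (illustrative only)."""
    if object_count < 0 or threshold < 0:
        raise ValueError("object_count and threshold must be non-negative")
    # Small collections: brute-force flat search is cheap and needs little memory.
    # Larger collections: a graph index such as HNSW keeps queries fast.
    return "flat" if object_count <= threshold else "hnsw"
```

The point of the sketch is only that the decision depends on collection size, so small collections avoid the memory cost of a graph index until they need it.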

Implicit Tenant Creation

Create tenants on the fly by simply including the tenant name during batch inserts.
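
A sketch of what such a batch request body might look like. The helper `build_batch_payload` is hypothetical, and the field names `objects`, `class`, `properties`, and `tenant` are assumptions about the REST batch format that should be checked against the API reference. The code only builds the payload and sends nothing.

```python
import json
from typing import Any, Dict, List


def build_batch_payload(
    class_name: str, rows: List[Dict[str, Any]], tenant: str
) -> Dict[str, Any]:
    """Build a batch insert body in which every object names its tenant."""
    return {
        "objects": [
            # The tenant name travels with each object (assumed field name).
            {"class": class_name, "properties": dict(row), "tenant": tenant}
            for row in rows
        ]
    }


payload = build_batch_payload("Article", [{"title": "Hello"}], "tenantA")
body = json.dumps(payload)  # JSON string suitable for an HTTP request body
```

If the tenant named in the payload does not exist yet, the feature described above would create it as part of the insert.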

Hybrid Search Improvements

  • Add nearvector and neartext to hybrid search by @donomii in #4462
  • Batch vectorization with custom rate limits by @dirkkul in #4546
  • Add groupby to hybrid search and bm25f, add moveto/movefrom etc to aggregate hybrid search by @donomii in #4477
  • Add tests, improve parameter extraction in hybrid aggregate by @donomii in #4809

Other

Module Improvements

Performance Improvements

Testing Improvements

Fixes

  • Thread-safe bucket creation and loading by @jeroiraz in #4390
  • Thread safety on runJobOnBuckets() by @moogacs in #4421
  • Delete existent precalculated tmp bloom filter by @jeroiraz in #4469
  • Thread safety for bucket creation and loading by @moogacs in #4422
  • Fix panic with named vectors and generative module by @dirkkul in #4577
  • Fix parsing of tenant in /schema/{className}/shards?tenant={tenant} by @tsmith023 in #4777
  • Errgroup wait correction on idx add property by @moogacs in #4768
  • Update schema when PQ is enabled for a named vector index by @parkerduckworth in #4779
  • Class info error not found instead of exists by @moogacs in #4788
  • Update error message for class already exists to include class name by @reyreaud-l in #4791
  • Change ListValue message to avoid gRPC N+1 problem by @tsmith023 in #4565
  • Fix backup restore with include & exclude by @tsmith023 in #4804
  • Fix node status indication in /v1/cluster/statistics response by @antas-marcin in #4811
  • Don't return shutdown error in case the shard was already shut down by @moogacs in #4839
  • Max length field validation during object marshalling by @jeroiraz in #4877

Docs & Chores

New Contributors

Full Changelog: v1.24.12...v1.25.0