Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split the field id map from the weight of each fields #4631

Conversation

irevoire
Copy link
Member

@irevoire irevoire commented May 13, 2024

Pull Request

Related issue

Fixes #4484
Fixes #4639

What does this PR do?

  • Make the (internal) searchable fields database always contain the searchable fields (instead of None when the user-defined searchable fields were not defined)
  • Introduce a new « fieldids_weights_map » that does the mapping between a fieldId and its Weight
  • Ensure that when two searchable fields are swapped, the field ID map doesn't change anymore (and thus, doesn't re-index)
  • Uses the weight instead of the order of the searchable fields in the attribute ranking rule at search time
  • When no searchable attributes are defined, make all their weights equal to zero
  • When a field is declared as searchable and contains nested fields, all its subfields share the same weight

Impact on relevancy

When no searchable attributes are declared

When no searchable attributes are declared, all the fields have the same importance instead of randomly giving more importance to the field we've encountered « the most early » in the life of the index.

This means that before this PR, send the following json:

[
  { "id": 0, "name": "kefir", "color": "white" },
  { "id": 1, "name": "white", "last name": "spirit" }
]

Would make the field name more important than the field color or last name.
This means that searching for white would make the document 1 automatically higher ranked than the document 0.

After this PR, all the fields have the same weight, and none are considered more important than others.

When a nested field is made searchable

The second behavior change that happened with this PR is in the case you're sending this document, for example:

{
  "id": 0,
  "name": "tamo",
  "doggo": {
    "name": "kefir",
    "surname": "le kef"
  },
  "catto": "gromez"
}

Previously, defining the searchable attributes as: ["tamo", "doggo", "catto"] was actually defining the « real » searchable attributes in the engine as: ["tamo", "doggo", "catto", "doggo.name", "doggo.surname"], which means that doggo.name and doggo.surname were NOT where the user expected them and had completely different weights than doggo.
In this PR all the weights have been unified, and the « real » searchable fields look like this:

[ "tamo", "doggo", "doggo.name", "doggo.surname", "catto"]
   ^^^^    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    ^^^^^
Weight 0                 Weight 1                  Weight 2

@irevoire irevoire force-pushed the introduce-the-horrifying-fields-ids-fieldids-weights-map-to-avoid-reindexing-the-searchable-when-only-updating-their-order branch from 8e6ea3a to 31ed3a2 Compare May 14, 2024 15:00
@irevoire irevoire added this to the v1.9.0 milestone May 14, 2024
@irevoire irevoire marked this pull request as ready for review May 14, 2024 15:22
@irevoire irevoire force-pushed the introduce-the-horrifying-fields-ids-fieldids-weights-map-to-avoid-reindexing-the-searchable-when-only-updating-their-order branch from c30371b to 9fffb8e Compare May 14, 2024 15:36
@irevoire irevoire requested a review from Kerollmops May 14, 2024 17:09
@irevoire irevoire changed the title Introduce the horrifying fields ids fieldids weights map to avoid reindexing the searchable when only updating their order Split the field id map from the weight of each fields May 14, 2024
meilisearch/src/search_queue.rs Outdated Show resolved Hide resolved
milli/src/fieldids_weights_map.rs Outdated Show resolved Hide resolved
milli/src/index.rs Outdated Show resolved Hide resolved
milli/src/index.rs Outdated Show resolved Hide resolved
milli/src/index.rs Outdated Show resolved Hide resolved
milli/src/update/settings.rs Outdated Show resolved Hide resolved
milli/src/update/settings.rs Outdated Show resolved Hide resolved
milli/src/update/settings.rs Outdated Show resolved Hide resolved
milli/src/search/new/db_cache.rs Outdated Show resolved Hide resolved
@irevoire irevoire requested a review from Kerollmops May 15, 2024 16:17
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks perfect to me! Can you also ensure that the field IDs have equal weights, please?

@irevoire irevoire added performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption search relevancy Related to the relevancy of the search results settings diff-indexing Issues related to settings diff-indexing indexing labels May 15, 2024
@irevoire irevoire requested a review from Kerollmops May 15, 2024 23:20
Kerollmops
Kerollmops previously approved these changes May 16, 2024
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge whenever we can 🙂
bors merge

meili-bors bot added a commit that referenced this pull request May 16, 2024
4631: Split the field id map from the weight of each fields r=Kerollmops a=irevoire

# Pull Request

## Related issue
Fixes #4484

## What does this PR do?
- Make the (internal) searchable fields database always contain the searchable fields (instead of None when the user-defined searchable fields were not defined)
- Introduce a new « fieldids_weights_map » that does the mapping between a fieldId and its Weight
- Ensure that when two searchable fields are swapped, the field ID map doesn't change anymore (and thus, doesn't re-index)
- Uses the weight instead of the order of the searchable fields in the attribute ranking rule at search time
- When no searchable attributes are defined, make all their weights equal to zero
- When a field is declared as searchable and contains nested fields, all its subfields share the same weight

## Impact on relevancy

### When no searchable attributes are declared

When no searchable attributes are declared, all the fields have the same importance instead of randomly giving more importance to the field we've encountered « the most early » in the life of the index.

This means that before this PR, send the following json:
```json
[
  { "id": 0, "name": "kefir", "color": "white" },
  { "id": 1, "name": "white", "last name": "spirit" }
]
```

Would make the field `name` more important than the field `color` or `last name`.
This means that searching for `white` would make the document `1` automatically higher ranked than the document `0`.

After this PR, all the fields have the same weight, and none are considered more important than others.

### When a nested field is made searchable

The second behavior change that happened with this PR is in the case you're sending this document, for example:

```json
{
  "id": 0,
  "name": "tamo",
  "doggo": {
    "name": "kefir",
    "surname": "le kef"
  },
  "catto": "gromez"
}
```

Previously, defining the searchable attributes as: `["tamo", "doggo", "catto"]` was actually defining the « real » searchable attributes in the engine as: `["tamo", "doggo", "catto", "doggo.name", "doggo.surname"]`, which means that `doggo.name` and `doggo.surname` were _NOT_ where the user expected them and had completely different weights than `doggo`.
In this PR all the weights have been unified, and the « real » searchable fields look like this:
```json
[ "tamo", "doggo", "doggo.name", "doggo.surname", "catto"]
   ^^^^    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    ^^^^^
Weight 0                 Weight 1                  Weight 2

Co-authored-by: Tamo <tamo@meilisearch.com>
@dureuill
Copy link
Contributor

The windows failure here is a bit disheartening

Copy link
Contributor

meili-bors bot commented May 16, 2024

Canceled.

Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors merge

Copy link
Contributor

meili-bors bot commented May 16, 2024

Build succeeded:

@meili-bors meili-bors bot merged commit 7c19c07 into main May 16, 2024
10 checks passed
@meili-bors meili-bors bot deleted the introduce-the-horrifying-fields-ids-fieldids-weights-map-to-avoid-reindexing-the-searchable-when-only-updating-their-order branch May 16, 2024 10:27
Kerollmops added a commit that referenced this pull request May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
indexing performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption search relevancy Related to the relevancy of the search results settings diff-indexing Issues related to settings diff-indexing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v1.9.0] Relevancy changes Swapping searchableAttributes shouldn't trigger a reindexing
3 participants