Different search result after persist and restore database index #695

Open
gdeak-monguz opened this issue Apr 13, 2024 · 2 comments

gdeak-monguz commented Apr 13, 2024

Describe the bug

I created a database and persisted it with the @orama/plugin-data-persistence plugin. After restoring the index from the JSON string, the search result was different from the result before persisting.

To Reproduce

The bug can be reproduced with the following code:

package.json

{
  "name": "orama-pilot",
  "private": true,
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "@orama/orama": "^2.0.15",
    "@orama/plugin-data-persistence": "^2.0.15",
    "@orama/stemmers": "^2.0.15",
    "@orama/stopwords": "^2.0.15"
  }
}

index.js

import { create, insert, search } from '@orama/orama';
import { persist, restore } from '@orama/plugin-data-persistence';
import { stopwords as hungarianStopwords } from '@orama/stopwords/hungarian';
import {
  stemmer,
  language as hungarianLanguage,
} from '@orama/stemmers/hungarian';

// Database
const originalDatabaseInstance = await create({
  schema: {
    type: 'string',
    name: 'string',
  },
  components: {
    tokenizer: {
      stopWords: hungarianStopwords,
      stemming: true,
      stemmerSkipProperties: ['type'],
      language: hungarianLanguage,
      stemmer,
    },
  },
});

// Insert record
await insert(originalDatabaseInstance, {
  type: 'infantry',
  name: 'Piski ütközet',
});

const searchOptions = { term: 'Piski' };

// Search from original database index
const searchResultFromOriginalDatabaseInstance = await search(
  originalDatabaseInstance,
  searchOptions
);
console.log('Count:', searchResultFromOriginalDatabaseInstance.count);  // Count: 1

// Persist database index
const databaseIndex = await persist(originalDatabaseInstance, 'json');
// Restore database index
const restoredDatabaseInstance = await restore('json', databaseIndex);

// Search from restored database index
const searchResultFromRestoredDatabaseInstance = await search(
  restoredDatabaseInstance,
  searchOptions
);
console.log('Count:', searchResultFromRestoredDatabaseInstance.count); // Count: 0

Expected behavior

After restoring the database, I expected the same search results as before persistence.

Environment Info

OS: Windows 11 Pro
Node: v20.2.0
@orama/orama: 2.0.15
@orama/plugin-data-persistence: 2.0.15
@orama/stemmers: 2.0.15
@orama/stopwords: 2.0.15

Affected areas

Search

Additional context

No response

@micheleriva
Member

Hi @gdeak-monguz,
I fear this is because when you persist the database, you lose the stemmer (functions can't be saved to disk). So I recommend creating a new database with the stemmer configured, then using it to restore the data.
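The point about functions not surviving serialization can be shown with plain JSON, independent of Orama. This is a minimal sketch: `tokenizerConfig` is a hypothetical stand-in object, not Orama's internal structure, and the `stemmer` here is a dummy function rather than the real Hungarian stemmer.

```javascript
// Hypothetical stand-in for a tokenizer configuration (not Orama's real shape).
const tokenizerConfig = {
  language: 'hungarian',
  stemming: true,
  // Dummy stemmer used only for illustration.
  stemmer: (word) => word.toLowerCase(),
};

// JSON.stringify omits properties whose value is a function,
// so the stemmer is gone after a round trip through a JSON string.
const roundTripped = JSON.parse(JSON.stringify(tokenizerConfig));

console.log(typeof tokenizerConfig.stemmer); // the original still has the function
console.log('stemmer' in roundTripped);      // the round-tripped copy does not
console.log(roundTripped.language);          // plain values survive
```

Any data that passes through a JSON string keeps its plain values but drops function-valued properties, which is consistent with the restored instance no longer tokenizing search terms the way the original one did.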

@gdeak-monguz
Author

> Hi @gdeak-monguz, I fear this is because when you persist the database, you lose the stemmer (functions can't be saved to disk). So I recommend creating a new database with the stemmer configured, then using it to restore the data.

How can I do this? I tried creating a new database instance with the same schema and components (tokenizer with stemmer and stopwords) and calling the insertMultiple function with this new instance and the persisted database index, but it still does not work.
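One possible approach, sketched under assumptions rather than confirmed by the maintainers in this thread: instead of `insertMultiple` (which expects documents, not a serialized index), create the database with the tokenizer first and then feed it the parsed index through the `load` function exported by `@orama/orama`. The sketch assumes that `persist(db, 'json')` produces a JSON string of the same raw data that `save` returns. The helper name `restoreWithTokenizer` is hypothetical, and `create` and `load` are passed in as arguments so the helper itself does not depend on an installed package version.

```javascript
// Sketch only. Assumption: the string returned by persist(db, 'json')
// parses into the raw data object that Orama's load() accepts.
async function restoreWithTokenizer({ create, load }, serializedIndex, databaseConfig) {
  // 1. Build a fresh instance that carries the stemmer and stop words.
  const db = await create(databaseConfig);

  // 2. Parse the persisted JSON string back into raw index data.
  const rawData = JSON.parse(serializedIndex);

  // 3. Load the raw data into the configured instance.
  await load(db, rawData);

  return db;
}

// Intended usage with the reproduction above (not executed here):
//   import { create, load, search } from '@orama/orama';
//   const restored = await restoreWithTokenizer(
//     { create, load },
//     databaseIndex,
//     databaseConfig // the same schema and components object given to the first create() call
//   );
//   const result = await search(restored, searchOptions);
```

Because the tokenizer comes from the new `create` call rather than from the persisted string, the search term should be stemmed the same way as the indexed tokens were.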
