Inconsistent output on Dolly NER #393

Open
nxitik opened this issue Dec 1, 2023 · 3 comments
Labels
feat/model Feature: models usage How to use `spacy-llm`

Comments

@nxitik

nxitik commented Dec 1, 2023

Here is the relevant part of example.yml:

- text: Jack and Jill went up the hill.
  spans:
    - text: Jack
      is_entity: true
      label: PERSON
      reason: is the name of a person
    - text: Jill
      is_entity: true
      label: PERSON
      reason: is the name of a person
    - text: went up
      is_entity: false
      label: ==NONE==
      reason: is a verb
    - text: hill
      is_entity: true
      label: LOCATION
      reason: is a location

The contents of fewshot.cfg:

[paths]


[nlp]
lang = "en"
pipeline = ["llm"]
batch_size = 128

[components]

[components.llm]
factory = "llm"

[components.llm.model]
@llm_models = "spacy.Dolly.v1"
name = "dolly-v2-3b"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = PERSON,ORGANISATION,LOCATION

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "example.yml"

[components.llm.task.normalizer]
@misc = "spacy.LowercaseNormalizer.v1"

The script that runs the pipeline:

from spacy_llm.util import assemble

nlp = assemble("fewshot.cfg")
doc = nlp("Jack and Jill went up the hill.")

print(f"Text: {doc.text}")
print(doc.ents)
print(f"Entities: {[(ent.text, ent.label_) for ent in doc.ents]}")

The output is inconsistent between runs of the same script. How do I resolve this?

python spacyllmtry.py
Text: Jack and Jill went up the hill.
(Jack, Jill, hill)
Entities: [('Jack', 'PERSON'), ('Jill', 'PERSON'), ('hill', 'LOCATION')]
python spacyllmtry.py
Text: Jack and Jill went up the hill.
()
Entities: []
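
To gauge how often the runs disagree, one option is to call the pipeline several times on the same text and tally the distinct entity sets. This is only a minimal sketch: `tally_outputs` is a hypothetical helper, not part of spacy-llm, and `extract` stands for any callable that returns (text, label) pairs.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Entity = Tuple[str, str]


def tally_outputs(
    extract: Callable[[str], Iterable[Entity]], text: str, runs: int = 5
) -> Counter:
    """Call `extract` on `text` `runs` times and count each distinct result."""
    counts: Counter = Counter()
    for _ in range(runs):
        # Freeze the result into a tuple so it can serve as a Counter key.
        counts[tuple(extract(text))] += 1
    return counts


# Usage with the script above (assumes `nlp` has already been assembled):
# tally = tally_outputs(
#     lambda t: [(ent.text, ent.label_) for ent in nlp(t).ents],
#     "Jack and Jill went up the hill.",
# )
# for entities, n in tally.most_common():
#     print(n, entities)
```

A tally with more than one key confirms that the model's answers vary between calls rather than the script itself.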
@rmitsch
Collaborator

rmitsch commented Dec 4, 2023

Hi @nxitik, it seems that Dolly doesn't return the correct output. You can further debug this by setting save_io = True in your config:

[components.llm]
factory = "llm"
save_io = True

I recommend a larger and newer model than dolly-v2-3b - smaller models often struggle with more complex tasks like this one.
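
With `save_io = True`, the prompt and the raw model response can then be read back from the processed doc. The sketch below assumes they are stored under `doc.user_data["llm_io"]`, keyed by component name, with `"prompt"` and `"response"` entries; that layout is my reading of the spacy-llm docs, so verify it against your installed version. `get_llm_io` is a hypothetical helper, not a spacy-llm function.

```python
from typing import Any, Dict, Mapping


def get_llm_io(user_data: Mapping[str, Any], component: str = "llm") -> Dict[str, str]:
    """Return the saved prompt and response for one component as strings."""
    # Assumed layout: user_data["llm_io"][component] = {"prompt": ..., "response": ...}
    io = user_data.get("llm_io", {}).get(component, {})
    return {
        "prompt": str(io.get("prompt", "")),
        "response": str(io.get("response", "")),
    }


# Usage (assumes `nlp` was assembled from a config with save_io = True):
# doc = nlp("Jack and Jill went up the hill.")
# io = get_llm_io(doc.user_data)
# print(io["prompt"])
# print(io["response"])
```

Comparing the response from a run that returns entities with one that returns none should show whether the model is deviating from the output format the task expects.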

@rmitsch rmitsch added feat/model Feature: models usage How to use `spacy-llm` labels Dec 4, 2023
@nxitik
Author

nxitik commented Dec 4, 2023

Does it support quantized models, e.g. those published by TheBloke?

@rafikoham

```ini
save_io = True
```

It still doesn't work for me, and I agree, it's better to use larger models.
