[Bug] Unable to run example notebook: pubmed-bm25.ipynb #340

Open
2 tasks done
paulz opened this issue May 4, 2024 · 1 comment
Labels
bug Something isn't working

paulz commented May 4, 2024

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

0%
 0/32 [00:00<?, ?it/s]
---------------------------------------------------------------------------
SparseValuesMissingKeysError              Traceback (most recent call last)
<ipython-input-22-8f2be8886c89> in <cell line: 5>()
     35     # new_vectors = { 'sparse_values': {'indices': indices, 'values': values}}
     36     # index.upsert(vectors=new_vectors)
---> 37     index.upsert(vectors=vectors)
     38 
     39 # show index description after uploading the documents

6 frames
/usr/local/lib/python3.10/dist-packages/pinecone/data/vector_factory.py in _dict_to_sparse_values(sparse_values_dict, check_type)
    104             raise SparseValuesDictionaryExpectedError(sparse_values_dict)
    105         if not {"indices", "values"}.issubset(sparse_values_dict):
--> 106             raise SparseValuesMissingKeysError(sparse_values_dict)
    107 
    108         indices = convert_to_list(sparse_values_dict.get("indices"))

SparseValuesMissingKeysError: Missing required keys in data in column `sparse_values`. Expected format is `'sparse_values': {'indices': List[int], 'values': List[float]}`. Found keys [16984, 3526, 2331, 1006, 7473, 2094, 1007, 2003, 1996, 12222, 1997, 4442, 2306, 2019, 15923, 1012, 12922, 3269, 9706, 17175, 18150, 2239, 11934, 27806, 7137, 2566, 29278, 10708, 1999, 2049, 3727, 2083, 8676, 1037, 17779, 6198, 20134, 1998, 18323, 9607, 4372, 20464, 18606, 2024, 29111, 5158, 2012, 2415, 2122, 22901, 15436, 2015, 1010, 7458, 3155, 2274, 2013, 12436, 28817,
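
The "Found keys" in the message look like token ids rather than the `indices` and `values` keys the client expects. A minimal sketch of one possible conversion follows, assuming the notebook's sparse vector is a plain `{token_id: weight}` mapping (that is an assumption based on the error text, not something confirmed from the notebook). The helper name `to_sparse_values` and the sample weights are hypothetical.

```python
from typing import Dict, List, Mapping, Union


def to_sparse_values(
    token_weights: Mapping[int, float],
) -> Dict[str, List[Union[int, float]]]:
    """Turn a {token_id: weight} mapping into {'indices': [...], 'values': [...]}."""
    # Sort by token id so the output order is deterministic.
    items = sorted(token_weights.items())
    return {
        "indices": [int(token_id) for token_id, _ in items],
        "values": [float(weight) for _, weight in items],
    }


# Hypothetical sample: token ids copied from the error message, weights invented.
sparse = to_sparse_values({16984: 1, 3526: 2, 2331: 1})
print(sparse)
# {'indices': [2331, 3526, 16984], 'values': [1.0, 2.0, 1.0]}
```

The result has the shape named in the error message, so it could be passed as the `'sparse_values'` entry of each vector, provided the assumption about the input shape holds.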

Expected Behavior

The example notebooks should run without errors.

Steps To Reproduce

  1. Run https://github.com/pinecone-io/examples/blob/master/learn/search/hybrid-search/fast-intro/pubmed-bm25.ipynb in Colab.
  2. Run the cells in order until the error occurs.



Environment

  • OS: Google Colab
  • Language version: Python (3.10, per the traceback path)
  • Pinecone client version: default

Additional Context

No response

@paulz paulz added the bug Something isn't working label May 4, 2024

paulz commented May 9, 2024

Tried again, this time in a local Python 3.11 virtual environment, and got the same error:

  0%|                                                                                                                                   | 0/32 [00:01<?, ?it/s]
---------------------------------------------------------------------------
SparseValuesMissingKeysError              Traceback (most recent call last)
Cell In[16], line 30
     22         vectors.append({
     23             'id': _id,
     24             'sparse_values': sparse,
     25             'values': dense,
     26             'metadata': metadata
     27         })
     29     # upload the documents to the new hybrid index
---> 30     index.upsert(vectors=vectors)
     32 # show index description after uploading the documents
     33 index.describe_index_stats()

File ~/workspace/third-party/pinecone/examples/venv/lib/python3.11/site-packages/pinecone/utils/error_handling.py:10, in validate_and_convert_errors.<locals>.inner_func(*args, **kwargs)
      7 @wraps(func)
      8 def inner_func(*args, **kwargs):
      9     try:
---> 10         return func(*args, **kwargs)
     11     except MaxRetryError as e:
     12         if isinstance(e.reason, ProtocolError):

File ~/workspace/third-party/pinecone/examples/venv/lib/python3.11/site-packages/pinecone/data/index.py:171, in Index.upsert(self, vectors, namespace, batch_size, show_progress, **kwargs)
    164     raise ValueError(
    165         "async_req is not supported when batch_size is provided."
    166         "To upsert in parallel, please follow: "
    167         "https://docs.pinecone.io/docs/insert-data#sending-upserts-in-parallel"
    168     )
    170 if batch_size is None:
--> 171     return self._upsert_batch(vectors, namespace, _check_type, **kwargs)
    173 if not isinstance(batch_size, int) or batch_size <= 0:
    174     raise ValueError("batch_size must be a positive integer")
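
To narrow down which records trigger the error before calling `index.upsert`, a pre-flight check can mirror the `{"indices", "values"}.issubset(...)` test shown in the `vector_factory.py` frame. This is only a sketch that runs without a Pinecone index. The function name `find_bad_sparse_values` and the sample vectors are hypothetical, and it assumes each vector is a dict like the ones built in the notebook cell.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = {"indices", "values"}


def find_bad_sparse_values(vectors: List[Dict[str, Any]]) -> List[Any]:
    """Return the ids of vectors whose 'sparse_values' lacks the required keys."""
    bad_ids = []
    for vec in vectors:
        sparse = vec.get("sparse_values")
        if sparse is None:
            continue  # no sparse part on this vector, nothing to check
        # Same condition as the client-side check in the traceback.
        if not isinstance(sparse, dict) or not REQUIRED_KEYS.issubset(sparse):
            bad_ids.append(vec.get("id"))
    return bad_ids


# Hypothetical sample data: "a" is well formed, "b" uses token ids as keys.
sample = [
    {"id": "a", "values": [0.1, 0.2],
     "sparse_values": {"indices": [1996], "values": [1.0]}},
    {"id": "b", "values": [0.3, 0.4],
     "sparse_values": {16984: 1, 3526: 2}},
]
print(find_bad_sparse_values(sample))
# ['b']
```

An empty result would suggest the vectors at least have the key layout the client expects, though it does not validate the element types.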
