pgvector missing in Docker image ghcr.io/postgresml/postgresml:2.8.2 #1346

Open
remote4me opened this issue Mar 2, 2024 · 2 comments

remote4me commented Mar 2, 2024

I am trying to use the Docker image. My environment: Ubuntu 22.04 with a GPU and Docker.

I got these errors:

ERROR: access method "ivfflat" does not exist (when creating index)
ERROR: type "vector" does not exist (when using ::vector in a SELECT statement)

What I did:

docker run --rm -it \
-v postgresml_data:/var/lib/postgresml \
-v postgresml_postgresdata:/var/lib/postgresql \
--gpus all \
-p 5499:5432 -p 8000:8000 \
ghcr.io/postgresml/postgresml:2.8.2 \
sudo -u postgresml psql -d postgresml
  1. Connected with SQL client to port 5499

  2. I want to reproduce steps described in "Vector database", see https://github.com/postgresml/postgresml/?tab=readme-ov-file#vector-database

  3. SELECT pgml.load_dataset('tweet_eval', 'sentiment');

  4. Created table with embeddings:

CREATE TABLE tweet_embeddings AS
SELECT text, pgml.embed('distilbert-base-uncased', text) AS embedding 
FROM pgml.tweet_eval;
  5. Creating index fails:
CREATE INDEX ON tweet_embeddings USING ivfflat (embedding vector_cosine_ops);
--
ERROR: access method "ivfflat" does not exist
1 statement failed.
  6. Using ::vector fails:
WITH query AS (
    SELECT pgml.embed('distilbert-base-uncased', 'Star Wars christmas special is on Disney')::vector AS embedding
)
SELECT * FROM items, query ORDER BY items.embedding <-> query.embedding LIMIT 5;
--
ERROR: type "vector" does not exist
  Position: 113

    SELECT pgml.embed('distilbert-base-uncased', 'Star Wars christmas special is on Disney')::vector AS embedding
                                                                                              ^
1 statement failed.
  7. Some additional info:
SELECT extname, extversion FROM pg_extension;
 extname | extversion 
---------+------------
 plpgsql | 1.0
 pgml    | 2.8.2
  8. More details:
SELECT pgml.version();
version
2.8.2 (dd7c74909bdf10cd5d39faf4429df8ba9748fd30)

The documentation (see https://postgresml.org/docs/product/vector-database) says literally this:

If you're using our Cloud or our Docker image, your database has pgvector installed already.

Well... I am using your latest Docker image, and...


lbialy commented Mar 20, 2024

CREATE EXTENSION vector;

This, however, leads to:

postgresml=# \d+ tweet_embeddings
                                      Table "public.tweet_embeddings"
  Column   |  Type  | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-----------+--------+-----------+----------+---------+----------+-------------+--------------+-------------
 text      | text   |           |          |         | extended |             |              |
 embedding | real[] |           |          |         | extended |             |              |
Access method: heap

postgresml=# select * from tweet_embeddings;
postgresml=# CREATE INDEX ON tweet_embeddings USING ivfflat (embedding vector_cosine_ops);
ERROR:  operator class "vector_cosine_ops" does not accept data type real[]

montanalow (Contributor) commented

You'll need to alter the column type to a vector to use pgvector indexes.

https://github.com/pgvector/pgvector
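
A minimal sketch of that change, using the table from the steps above. It assumes pgvector is available in the image (the first query checks this) and that distilbert-base-uncased returns 768-dimensional embeddings. If your model differs, adjust the dimension. Treat it as an illustration, not a verified fix for this image.

```sql
-- Check whether pgvector is shipped with this server; installed_version is
-- NULL until CREATE EXTENSION has been run in the current database.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'vector';

CREATE EXTENSION IF NOT EXISTS vector;

-- pgml.embed returns real[]; pgvector provides a cast from real[] to vector,
-- so existing rows can be converted in place.
-- Assumption: 768 is the embedding size of distilbert-base-uncased.
ALTER TABLE tweet_embeddings
    ALTER COLUMN embedding TYPE vector(768)
    USING embedding::vector(768);

-- With a vector column, the index from the original report should be accepted.
CREATE INDEX ON tweet_embeddings USING ivfflat (embedding vector_cosine_ops);

-- Query the same table; <=> is pgvector's cosine distance operator, which
-- matches the vector_cosine_ops operator class used by the index.
WITH query AS (
    SELECT pgml.embed('distilbert-base-uncased', 'Star Wars christmas special is on Disney')::vector AS embedding
)
SELECT t.text
FROM tweet_embeddings AS t, query
ORDER BY t.embedding <=> query.embedding
LIMIT 5;
```

Note that an index built with vector_cosine_ops is only used for `<=>` ordering. The `<->` operator in the original query is L2 distance and would need a vector_l2_ops index instead.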
