[Bug]: FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG #676

Open
1 task done
myoshimu opened this issue May 13, 2024 · 2 comments
@myoshimu
Member

myoshimu commented May 13, 2024

File Name

https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/intro_multimodal_rag.ipynb

What happened?

The following code failed with a "FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG" error:

# Extract text and image metadata from the PDF document
text_metadata_df, image_metadata_df = get_document_metadata(
    multimodal_model,  # we are passing gemini 1.0 pro vision model
    pdf_folder_path,
    image_save_dir="images",
    image_description_prompt=image_description_prompt,
    embedding_size=1408,
)

print("\n\n --- Completed processing. ---")
:



Processing page: 1
Processing page: 2
Processing page: 3
Processing page: 4

:

FzErrorArgument                           Traceback (most recent call last)
<ipython-input-8-96bfa690e8cb> in <cell line: 14>()
     12 
     13 # Extract text and image metadata from the PDF document
---> 14 text_metadata_df, image_metadata_df = get_document_metadata(
     15     multimodal_model,  # we are passing gemini 1.0 pro vision model
     16     pdf_folder_path,

4 frames
~/.local/lib/python3.10/site-packages/pymupdf/mupdf.py in fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
  47578         Write a pixmap as a JPEG.
  47579     """
> 47580     return _mupdf.fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
  47581 
  47582 def fz_write_pixmap_as_jpx(out, pix, quality):

FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG

Relevant log output

I think the get_image_for_gemini() function in
gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py should be modified as follows:

import os
from typing import Tuple

import fitz
from PIL import Image

def get_image_for_gemini(
    doc: fitz.Document,
    image: tuple,
    image_no: int,
    image_save_dir: str,
    file_name: str,
    page_num: int,
) -> Tuple[Image.Image, str]:
    """
    Extracts an image from a PDF document, converts it to a supported format (JPEG), saves it,
    and loads it as a PIL Image Object.
    """
    
    xref = image[0]
    pix = fitz.Pixmap(doc, xref)  
    
    # Convert to RGB if the colorspace is not one that JPEG supports
    if pix.colorspace not in (fitz.csGRAY, fitz.csRGB, fitz.csCMYK):
        pix = fitz.Pixmap(fitz.csRGB, pix)
        
    # Save as JPEG
    image_name = f"{image_save_dir}/{file_name}_image_{page_num}_{image_no}_{xref}.jpeg"
    os.makedirs(image_save_dir, exist_ok=True)
    pix.pil_save(image_name, format="JPEG") # Changed

    # Load as PIL Image object
    image_for_gemini = Image.open(image_name)

    # Release pixmap object
    pix = None 
    return image_for_gemini, image_name
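
The colorspace check in the snippet above compares `fitz.Colorspace` objects with `not in`, and how that comparison behaves may depend on the PyMuPDF version. A minimal sketch of a more explicit check follows, assuming the built-in colorspaces report the names `DeviceGray`, `DeviceRGB`, and `DeviceCMYK` through `Colorspace.name`. The helper `jpeg_prep_steps` and its step labels are hypothetical and are not part of PyMuPDF or the notebook.

```python
from typing import List

# Names PyMuPDF reports for its built-in colorspaces (e.g. fitz.csRGB.name).
JPEG_COLORSPACE_NAMES = {"DeviceGray", "DeviceRGB", "DeviceCMYK"}


def jpeg_prep_steps(colorspace_name: str, has_alpha: bool) -> List[str]:
    """Hypothetical helper: list what must change before a pixmap can be written as JPEG."""
    steps = []
    # JPEG output only accepts grayscale, RGB, or CMYK pixel data.
    if colorspace_name not in JPEG_COLORSPACE_NAMES:
        steps.append("convert_to_rgb")
    # JPEG has no alpha channel.
    if has_alpha:
        steps.append("drop_alpha")
    return steps


print(jpeg_prep_steps("DeviceRGB", False))  # []
print(jpeg_prep_steps("DeviceRGB", True))   # ['drop_alpha']
```

With PyMuPDF this could be called as `jpeg_prep_steps(pix.colorspace.name, bool(pix.alpha))`; pixmaps whose `colorspace` is `None` would need separate handling.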

Code of Conduct

  • I agree to follow this project's Code of Conduct
@krupalsmart97

Hey all, I tried the above code because I was facing the same issue, but it gives the following error:

Unexpected item type: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=270x184 at 0x7C60195D9CC0>.Only types that represent a single Content or a single Part are supported here.

Not sure if I am doing something wrong.
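
The error above suggests that a PIL image object reached the Vertex AI SDK, which only accepts its own content types. One possible cause, not confirmed in this thread, is that the proposed snippet loads the file with `PIL.Image.open`, whereas the Gemini model call expects a `vertexai.generative_models.Image`. A minimal sketch of loading the saved JPEG with the SDK's own loader follows. The function name `load_image_for_gemini` and the `loader` parameter are hypothetical, added only so the sketch can be exercised without the SDK installed.

```python
from typing import Any, Callable, Optional


def load_image_for_gemini(
    image_path: str, loader: Optional[Callable[[str], Any]] = None
) -> Any:
    """Hypothetical helper: load a saved image file as an object the Vertex AI SDK accepts."""
    if loader is None:
        # Imported lazily so this module can be loaded without the SDK installed.
        from vertexai.generative_models import Image as VertexImage

        loader = VertexImage.load_from_file
    return loader(image_path)
```

Under this assumption, the last step of `get_image_for_gemini()` would return `load_image_for_gemini(image_name)` instead of the PIL object.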

@rocpoc
Contributor

rocpoc commented May 16, 2024

@holtskinner +1, I am seeing this issue too.

I've also been hitting numerous quota issues despite adding:

add_sleep_after_page = True
sleep_time_after_page = 5
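
A fixed sleep between pages may not be enough when the limit is hit in bursts. A common alternative is to retry the failing call with exponential backoff. The sketch below is an assumption about how that could look, not what the notebook does; `call_with_backoff` and its parameters are hypothetical names, and in practice the caught exception would be narrowed to the SDK's quota error.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry func, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Give up once the last allowed attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay * 2**attempt seconds plus a little random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter defaults to `time.sleep` and exists only so the waiting can be replaced in tests.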
