Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Generate embeddings for images #892

Merged
merged 15 commits into from
May 15, 2024
Merged

Conversation

cecheta
Copy link
Collaborator

@cecheta cecheta commented May 13, 2024

Required by #748

Purpose

  • This PR fetches embeddings from Computer Vision when an image is uploaded and advanced image processing is enabled
  • Adds functional and unit tests
    • Note that the unit tests rely on pytest-httpserver
    • To enable this, the fixtures for enabling HTTPS for the mock server have been moved to the top-level conftest.py, so they are in the scope of all tests (unit + functional)
  • Created client to call computer vision to generate embeddings, as this is not supported yet by the official SDK
  • Add config validation before saving config file, so that advanced image processing cannot be enabled for non-image types (e.g. .txt)
  • Added computer vision key/endpoint to env vars

Does this introduce a breaking change?

  • Yes
  • No

Testing:

  • Deployed to Azure in keys + rbac
  • Uploaded images, verified embeddings in logs

adamdougal and others added 4 commits May 13, 2024 10:52
Co-authored-by: Adam Dougal <adamdougal@users.noreply.github.com>
Co-authored-by: Adam Dougal <adamdougal@users.noreply.github.com>
Copy link

github-actions bot commented May 13, 2024

Coverage

Coverage Report •
FileStmtsMissCoverMissing
code/backend/batch/utilities/helpers
   azure_computer_vision_client.py470100% 
   env_helper.py1331092%221, 226–227, 230–232, 244, 248–250
code/backend/batch/utilities/helpers/config
   config_helper.py1340100% 
code/backend/batch/utilities/helpers/embedders
   embedder_factory.py10370%12–13, 15
   push_embedder.py510100% 
TOTAL238367371% 

Tests Skipped Failures Errors Time
194 0 💤 0 ❌ 0 🔥 11.979s ⏱️

Copy link
Collaborator

@gaurarpit gaurarpit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.. some comments

@adamdougal adamdougal requested a review from gaurarpit May 13, 2024 16:15
gaurarpit
gaurarpit previously approved these changes May 14, 2024
Copy link
Collaborator

@gaurarpit gaurarpit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Nothing major, some minor ones. Ship-It.

                              ______________________________________________
                          .-'                     _                        '.
                        .'                       |-'                        |
                      .'                         |                          |
                   _.'               p         _\_/_         p              |
                _.'                  |       .'  |  '.       |              |
           __..'                     |      /    |    \      |              |
     ___..'                         .T\    ======+======    /T.             |
  ;;;\::::                        .' | \  /      |      \  / | '.           |
  ;;;|::::                      .'   |  \/       |       \/  |   '.         |
  ;;;/::::                    .'     |   \       |        \  |     '.       |
        ''.__               .'       |    \      |         \ |       '.     |
             ''._          <_________|_____>_____|__________>|_________>    |
                 '._     (___________|___________|___________|___________)  |
                    '.    \;;;o;;;;o;;;;;o;;;;;o;;;;;o;;;;;o;;;;;o;;;;/   |
                      '.~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~   |
                        '. ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~  |
                          '-.______________________________________________.'

@gaurarpit
Copy link
Collaborator

tests are failing.. looking into them

gaurarpit
gaurarpit previously approved these changes May 14, 2024
@gaurarpit
Copy link
Collaborator

tests are failing.. looking into them

fixed.

@superhindupur
Copy link
Collaborator

Regarding the use of a mock http server for unit tests:

I understand the point about not wanting to tie our mocks tightly to the underlying implementation (http requests vs sdk), but couldn't we resolve this issue by just using an interface and implementation? Mock the interface behavior in the tests, and the implementation details will remain hidden.
Also, mocks in general tend to be tied to underlying implementation. Why the special consideration for this case?

I would vote for keeping unit tests free of http servers - and reserve this type of testing for our functional tests. Consistency helps readers reason about the code base. I'd argue that in this case readability is more important than the possible rework of the test mocks if/when we switch the underlying calls from http to sdk.

@adamdougal
Copy link
Collaborator

Bhavana and I caught up on a call. Decided to keep the tests as they are, but added a comment to the test class explaining the rationale.

@adamdougal adamdougal enabled auto-merge May 14, 2024 14:33
@adamdougal adamdougal added this pull request to the merge queue May 15, 2024
Merged via the queue into main with commit a96bde6 May 15, 2024
11 checks passed
@adamdougal adamdougal deleted the feat/748/vectorize-image branch May 15, 2024 07:34
liammoat pushed a commit that referenced this pull request May 16, 2024
Co-authored-by: Adam Dougal <adamdougal@microsoft.com>
Co-authored-by: Adam Dougal <adamdougal@users.noreply.github.com>
Co-authored-by: Arpit Gaur <arpitgaur@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants