User Experience Improvement Summary #1315

Open
1 of 41 tasks
xzdandy opened this issue Oct 24, 2023 · 0 comments

xzdandy (Collaborator) commented Oct 24, 2023

Search before asking

  • I have searched the EvaDB issues and found no similar feature requests.

Description

This issue summarizes users' feedback on their experience with EvaDB:

SQL Statement related:

Data related:

  • Converting numeric strings (e.g., with comma separators or in scientific notation) into integers.
  • Loading different text file types (.pdf and .txt) into the same table is problematic.
  • Version incompatibility (tables and indexes created by an earlier version of EvaDB).
  • Some data types are not supported in EvaDB when reading from PostgreSQL, e.g., array types or uuid.
  • Most DB cursors sanitize user-provided data in queries automatically, but EvaDB does not seem to have this functionality.
  • All single apostrophes have to be escaped when inserting data into the database.
  • No string concatenation method.
  • Cannot directly insert dataframes.
  • On a Windows machine, storing file paths as strings in MySQL caused problems because backslashes were
    misinterpreted as escape sequences.
  • New data sources:
    • Google / Bing as a data source for web scraping (e.g., Serper)
    • Google Maps
    • GitHub issues
    • Reddit (PRAW)
    • Wiki
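
Several of the data items above (numeric strings, apostrophes, Windows paths) can be worked around on the client side before the data reaches EvaDB. The following is a minimal sketch using only the Python standard library; the helper names `parse_int`, `quote_sql_string`, and `portable_path` are hypothetical and are not part of EvaDB. It assumes "," is a thousands separator and that the backend accepts standard SQL doubled-apostrophe quoting.

```python
from decimal import Decimal, InvalidOperation
from pathlib import PureWindowsPath


def parse_int(text: str) -> int:
    """Parse strings such as "1,234" or "1.5e3" into an int.

    Assumes "," is a thousands separator (not a decimal mark) and
    rejects values that are not whole numbers.
    """
    cleaned = text.strip().replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a number: {text!r}") from exc
    if not value.is_finite() or value != value.to_integral_value():
        raise ValueError(f"not a whole number: {text!r}")
    return int(value)


def quote_sql_string(text: str) -> str:
    """Quote a value as a SQL string literal by doubling apostrophes.

    This does not handle backslashes, which some backends (e.g., MySQL
    in its default mode) also treat as escape characters.
    """
    return "'" + text.replace("'", "''") + "'"


def portable_path(windows_path: str) -> str:
    """Rewrite a Windows path with forward slashes so no backslash is stored."""
    return PureWindowsPath(windows_path).as_posix()
```

Where a driver offers parameterized queries, those are generally preferable to manual quoting; the quoting helper is only a fallback for query strings that have to be built by hand.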

User-defined function related:

  • The Sklearn integration currently supports only linear regression models; more machine learning models could be introduced and supported.
  • Functions only work on table elements, so a table with one tuple has to be created first.
  • For aggregation user-defined functions, we need a way to pass the complete table into the function instead of row by row or batch by batch.
  • The type definitions in a custom AI function's forward decorator are confusing.
    • The input/output format of a function (e.g., AWS Rekognition service) can be bytes, while EvaDB's forward function requires input in the form of a numpy array.
  • It is challenging to figure out which algorithm is best for a prediction task (e.g., layoffs).

LLM related:

  • Dealing with various constraints of the OpenAI API, such as token restrictions, rate limits, and other limitations.
  • ChatGPT deviates from 'yes' or 'no' responses quite often.
  • Convert the ChatGPT response, which in most cases is just some text
  • I was unable to find the EvaDB implementation of exactly how the OpenAI API is called, so when I ran into unsatisfactory
    results with certain prompts I could not debug them. Certain prompts resulted in a response indicating that no text had been submitted. I assumed this may have been due to certain characters in the string, but without the ability to inspect how the API request is made, I resorted to trying various prompts until I had satisfactory results.
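
Two of the LLM items above (rate limits and replies that drift from 'yes' or 'no') can be handled in the calling code. The following is a rough sketch, not EvaDB's implementation: `parse_yes_no` and `call_with_backoff` are hypothetical helpers, and the retry loop catches every exception only for brevity. Real code would catch the specific rate-limit error raised by whichever client library is in use.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def parse_yes_no(response: str) -> Optional[bool]:
    """Map a free-form reply to True/False, or None if it is ambiguous."""
    words = [w.strip(".,!?:;\"'()").lower() for w in response.split()]
    has_yes = "yes" in words
    has_no = "no" in words
    if has_yes == has_no:  # neither word present, or both
        return None
    return has_yes


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` with exponential backoff plus random jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("max_attempts must be at least 1")
```

A caller would wrap the model request in a zero-argument function, pass it to `call_with_backoff`, and then re-prompt or fall back to a default whenever `parse_yes_no` returns `None`.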

Index related:

  • Similarity function is not symmetric.
  • Creating an index does not work on empty tables or third-party tables.
  • Vector databases like Milvus and Pinecone require unavoidable setup effort from users.

Optimization related:

  • Web scraping and ChatGPT calls are expensive and are the bottleneck of the execution.
  • There is an optimization opportunity for caching, which is not fleshed out.
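
As an illustration of the caching opportunity, repeated calls with the same input can be memoized on the client side. This is a minimal sketch with the standard library's `functools.lru_cache`; `summarize` is a hypothetical stand-in for a web scrape or ChatGPT call, and the cache is in-memory and per-process, so it says nothing about how EvaDB itself caches function results.

```python
import functools


@functools.lru_cache(maxsize=1024)
def summarize(url: str) -> str:
    """Hypothetical stand-in for an expensive scrape or LLM request."""
    # Real code would fetch the page or call the model here.
    return f"summary of {url}"
```

The second call with the same argument is served from the cache, which `summarize.cache_info()` reports as a hit. Such a cache only pays off when identical inputs recur and the result is allowed to be reused.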

Installation related:

  • The EvaDB installation documentation doesn't work for Windows (i.e., copying the notebook code to run on Windows).
  • Installation of PyTesseract isn't as simple.
  • The Dockerfile for EvaDB has not been updated for a while.

Use case

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!