Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Knowledge API special chars encoding challenge #4008

Open
4 tasks done
sigvardt opened this issue Apr 30, 2024 · 2 comments
Open
4 tasks done

Knowledge API special chars encoding challenge #4008

sigvardt opened this issue Apr 30, 2024 · 2 comments
Assignees
Labels
🐞 bug Something isn't working

Comments

@sigvardt
Copy link

Self Checks

  • This is only for bug report, if you would like to ask a quesion, please head to Discussions.
  • I have searched for existing issues search for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (我已阅读并同意 Language Policy).
  • Pleas do not modify this template :) and fill in all the required fields.

Dify version

Version 0.6.5

Cloud or Self Hosted

Cloud

Steps to reproduce

  1. Use the http request node and send a json post to the Knowledge API endpoint
  2. Include special chars in the post request - example "ä" and the strings in the attached screenshots
  3. Issue seems to happen intermittently.
image image_2_10-5-47 image_10-5-10

✔️ Expected Behavior

String captured in Knowledge dataset reflect String submitted via API

❌ Actual Behavior

Special chars are captured as other (unrelated) special chars

Copy link

dosubot bot commented Apr 30, 2024

It appears the issue with special characters encoding in the Knowledge API might be influenced by several factors, including the indexing technique, the configuration of the embedding model, and how the model manager interacts with these configurations. Specifically, the indexing_technique set to 'high_quality' and the details of the embedding model (provider and model_type) are crucial areas to investigate. Errors in embedding model availability or provider token initialization could lead to encoding problems.

The Dify v0.6.5 codebase is designed to handle special characters like "ä" properly, as indicated by its use of UTF-8 encoding which supports these characters natively. This is further supported by the direct use of special characters in the source code, such as in the German translations. Therefore, the issue might not stem from the encoding of characters in the source code but rather from how these characters are processed or displayed in different parts of the application.

The headers set for JSON requests (Content-Type: application/json and Authorization) should not impact the encoding of special characters, as JSON inherently supports Unicode. Any encoding issues are more likely related to server-side handling or the JSON encoding process itself.

To better assist with resolving this issue, it would be beneficial to have more details about the specific conditions under which the encoding issue occurs, including the models, techniques, or parameters used. Additionally, any error messages or logs observed when the issue occurs could provide valuable insights into the underlying cause.

Details

Sources


About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

@dosubot dosubot bot added the 🐞 bug Something isn't working label Apr 30, 2024
@JohnJyong JohnJyong self-assigned this May 7, 2024
@rchlz
Copy link

rchlz commented May 10, 2024

i have the same problem, the only difference is api url

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐞 bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants