Issue: Jina reader fails to parse URLs containing Chinese characters
Description:
We have encountered an issue where the Jina reader fails to parse URLs that contain Chinese characters. This causes our application to throw errors and prevents us from extracting content from certain websites.
Steps to Reproduce:
Make a request to the Jina reader API with a URL containing Chinese characters (percent-encoded in this example), such as https://zh.wikipedia.org/wiki/%E5%91%A8%E6%9D%B0%E5%80%AB.
Observe that the Jina reader fails to parse the URL and returns an error.
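The steps above can be sketched as a small script. This is only a minimal reproduction sketch: the `https://r.jina.ai/` prefix is an assumption about how the hosted reader is called (the report does not state the endpoint), and `build_reader_url` and `fetch_via_reader` are hypothetical helper names.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# Assumption: the hosted reader is called by prefixing the target URL.
READER_PREFIX = "https://r.jina.ai/"


def build_reader_url(target_url: str) -> str:
    """Return the reader request URL for an absolute http(s) target URL."""
    parts = urlsplit(target_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {target_url!r}")
    return READER_PREFIX + target_url


def fetch_via_reader(target_url: str, timeout: float = 30.0) -> str:
    """Fetch a page through the reader.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    which is where a 500 like the one in this report would surface.
    """
    with urlopen(build_reader_url(target_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (performs a network call, so it is left commented out):
# print(fetch_via_reader("https://zh.wikipedia.org/wiki/%E5%91%A8%E6%9D%B0%E5%80%AB"))
```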
Expected Behavior:
We expect the Jina reader to properly handle URLs containing Chinese characters and successfully parse the corresponding web pages. The reader should be able to decode the URL, retrieve the web page content, and return it as expected.
Actual Behavior:
When a URL containing Chinese characters is passed to the Jina reader, the request fails with an error. The error message typically reports a TypeError: the reader cannot read properties of undefined, specifically the 'parentNode' property.
Example Error Message:
Failed to fetch https://zh.wikipedia.org/wiki/%E5%91%A8%E6%9D%B0%E5%80%AB: {"code":500,"status":50000,"message":"Cannot read properties of undefined (reading 'parentNode')","name":"TypeError"}
Impact:
This issue prevents our application from properly extracting content from websites that have URLs containing Chinese characters. It limits the functionality of our application and affects the user experience when dealing with such websites.
Potential Causes:
The Jina reader may not be properly decoding the URL before making the request, leading to an invalid URL being passed to the underlying parsing logic.
The parsing logic within the Jina reader may not be handling URLs with Chinese characters correctly, resulting in the "Cannot read properties of undefined" error.
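To help narrow down the first hypothesis, the sketch below checks what the example URL looks like before and after decoding. It only inspects the URL on the client side and says nothing about what the reader does internally; `decoded_path` and `reencoded_path` are hypothetical helper names.

```python
from urllib.parse import quote, unquote, urlsplit

ENCODED_URL = "https://zh.wikipedia.org/wiki/%E5%91%A8%E6%9D%B0%E5%80%AB"


def decoded_path(url: str) -> str:
    """Return the URL path with percent-escapes decoded as UTF-8."""
    return unquote(urlsplit(url).path)


def reencoded_path(url: str) -> str:
    """Decode the path and percent-encode it again, keeping '/' as-is."""
    return quote(decoded_path(url), safe="/")


# The URL as submitted is plain ASCII, so it is already a valid request URL.
print(ENCODED_URL.isascii())
# Decoding reveals the Chinese page title behind the escapes.
print(decoded_path(ENCODED_URL))
# Re-encoding gives back the original path, so the escapes are well-formed UTF-8.
print(reencoded_path(ENCODED_URL) == urlsplit(ENCODED_URL).path)
```

If all three checks pass, as they should for this URL, the URL itself is well-formed, which would point more toward the page-parsing step than toward URL decoding.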
Workaround:
As a temporary workaround, we have implemented a filtering mechanism in our application to skip URLs that contain Chinese characters. However, this is not an ideal solution as it limits the functionality and coverage of our application.
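A filter of this kind might look like the sketch below. It is not our exact code: the function names are illustrative, and treating "Chinese characters" as the CJK Unified Ideographs block plus Extension A is an assumption.

```python
import re
from urllib.parse import unquote

# Assumption: "Chinese characters" means CJK Unified Ideographs
# (U+4E00-U+9FFF) plus Extension A (U+3400-U+4DBF).
_CJK = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def contains_chinese(url: str) -> bool:
    """True if the URL has Chinese characters, raw or percent-encoded."""
    # Decode percent-escapes first so %E5%91%A8 is seen as a character.
    return bool(_CJK.search(unquote(url)))


def filter_urls(urls: list[str]) -> list[str]:
    """Keep only the URLs that do not contain Chinese characters."""
    return [url for url in urls if not contains_chinese(url)]
```

For example, `filter_urls` would drop https://zh.wikipedia.org/wiki/%E5%91%A8%E6%9D%B0%E5%80%AB and keep a URL such as https://en.wikipedia.org/wiki/Jay_Chou.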
Request:
We kindly request the Jina team to investigate this issue and provide a fix that allows the Jina reader to properly handle URLs containing Chinese characters. It would be greatly appreciated if you could provide an update on the progress and an estimated timeline for the resolution.
Additional Information:
We are currently using the official Jina reader API, not the open-source service.
We are in the process of setting up our own service, but we are unsure if the open-source service also has this issue.
Please let us know if you require any further information or if there are any specific details you need to investigate and resolve this issue.
Thank you for your attention to this matter. We look forward to your response and resolution.
Best regards,
Loki.W