fixes: "langchain.readthedocs.io" -> "python.langchain.com", else it only downloads a single index.html #56

Open
wants to merge 2 commits into
base: master

@Madrawn Madrawn commented Apr 18, 2023

The current ingest wget command only downloads a single index.html file. I noticed that "https://langchain.readthedocs.io/en/latest/" redirects to "https://python.langchain.com/en/latest/", and when I change the script to use the second URL it correctly downloads everything recursively. Is the wget command being used incorrectly, or did the documentation link change and leave the script outdated?

Anyway, it now scrapes the docs correctly.
Extra: added ingest.bat for us Windows scrubs.
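
For anyone following along, here is a rough sketch of the kind of recursive wget call being discussed, built from Python so the arguments are easy to inspect. The exact flags in the repository's ingest script are not quoted in this thread, so the `-r`, `-A` and `-P` options, the `docs` output directory, and the helper name `build_wget_command` are all assumptions for illustration. One plausible reason for the single index.html is that recursive wget stays on the starting host by default, so after the redirect the links point at a host it will not follow; starting from the redirected URL avoids that.

```python
import shlex
from typing import List

# Assumption: the redirected docs host mentioned in the comment above.
DOCS_URL = "https://python.langchain.com/en/latest/"


def build_wget_command(url: str, out_dir: str = "docs") -> List[str]:
    """Build (but do not run) a recursive, HTML-only wget invocation.

    Hypothetical helper: the flags are standard wget options, but they
    are not necessarily the ones used by the project's ingest script.
    """
    return [
        "wget",
        "-r",            # follow links recursively from the start page
        "-A", ".html",   # keep only files whose names end in .html
        "-P", out_dir,   # write the mirrored tree under out_dir
        url,
    ]


command = build_wget_command(DOCS_URL)

# Print the shell-equivalent command so it can be reviewed or pasted.
print(shlex.join(command))

# To actually download, pass the list to subprocess.run(command, check=True).
```

Building the argument list separately from running it makes it easy to swap the URL if the docs move again, and the same list can be reused from a shell script or a batch file.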

@francopiccolo

This also doesn't load all the data properly anymore. Does anyone know how to scrape the docs properly?


2 participants