Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect robots.txt and identify your system #4

Open
respatialized opened this issue Apr 14, 2024 · 0 comments
Open

Respect robots.txt and identify your system #4

respatialized opened this issue Apr 14, 2024 · 0 comments

Comments

@respatialized
Copy link

Recently, some AI companies have given website administrators the option of opting out of AI training by using configuration options in robots.txt.

While this project is for prompting and RAG rather than training, I still think you should provide an option for website users to prevent their websites from becoming ad-hoc databases for or components of AI systems. It seems like you have made your software default to evading detection by using puppeteer's stealth plugin; the user-agent configuration that would allow website owners to identify your project's bots is commented out.

I think this default is deceptive and irresponsible. You should make sure users of your project respect these preferences by incorporating them into the software's defaults. Web administrators may not be inclined to support the additional traffic generated by people using their websites as a component of AI systems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant