pip install scrapfly-sdk
You can also install extra dependencies:
- pip install "scrapfly-sdk[speedups]" for performance improvements
- pip install "scrapfly-sdk[concurrency]" for concurrency out of the box (asyncio / thread)
- pip install "scrapfly-sdk[scrapy]" for Scrapy integration
- pip install "scrapfly-sdk[all]" for everything
To use the built-in HTML parser (via the ScrapeApiResponse.selector
property), either parsel or scrapy must also be installed.
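A quick way to confirm that one of these optional parsers is installed before relying on the selector property is sketched below. The helper name `selector_backend` is hypothetical and not part of the SDK; it only uses the standard library to look for the parsel or scrapy packages.

```python
from importlib.util import find_spec
from typing import Optional


def selector_backend() -> Optional[str]:
    """Return the name of the first installed parser package, or None.

    Hypothetical helper for illustration; not part of scrapfly-sdk.
    """
    for name in ("parsel", "scrapy"):
        # find_spec returns None when a top-level package is not installed
        if find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    backend = selector_backend()
    if backend is None:
        print("Install parsel or scrapy to use ScrapeApiResponse.selector")
    else:
        print(f"HTML parsing available via {backend}")
```

Running the check at startup gives a clearer message than an import error raised later, when the selector is first accessed.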
For usage examples, see the /examples folder
in this repository.
You can create a free account on Scrapfly to get your API Key.
The asyncio-pool dependency has been dropped.
scrapfly.concurrent_scrape is now an async generator. If concurrency is None
or not set, the maximum concurrency allowed by
your current subscription is used.
async for result in scrapfly.concurrent_scrape(concurrency=10, scrape_configs=[ScrapeConfig(...), ...]):
    print(result)
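To illustrate what consuming an async generator with bounded concurrency looks like, here is a self-contained sketch that does not call Scrapfly at all. `fake_scrape`, `bounded_scrape` and `collect` are hypothetical stand-ins built on asyncio only; the SDK's actual implementation may differ.

```python
import asyncio
from typing import AsyncIterator, Iterable, List


async def fake_scrape(url: str) -> str:
    """Hypothetical stand-in for a network call."""
    await asyncio.sleep(0.01)
    return f"scraped {url}"


async def bounded_scrape(urls: Iterable[str], concurrency: int) -> AsyncIterator[str]:
    """Yield results as they complete, running at most `concurrency` at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url: str) -> str:
        # The semaphore caps how many fake_scrape calls run simultaneously
        async with semaphore:
            return await fake_scrape(url)

    tasks = [asyncio.create_task(worker(url)) for url in urls]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect(urls: Iterable[str], concurrency: int = 2) -> List[str]:
    """Drain the async generator into a list."""
    return [result async for result in bounded_scrape(urls, concurrency)]


if __name__ == "__main__":
    pages = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    for line in asyncio.run(collect(pages)):
        print(line)
```

In this sketch results arrive in completion order rather than input order, so callers should not match results to inputs by position.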
The brotli argument is deprecated and will be removed in the next minor release. In most cases it offers no size benefit over gzip and uses more CPU.
- Better error logs
- Async improvements for concurrent scraping with asyncio
- Scrapy media pipelines are now supported out of the box