scrapy-poet documentation
scrapy-poet allows you to use web-poet Page Objects with Scrapy.
web-poet defines a standard for writing reusable and portable extraction and crawling code; please check its docs to learn more.
By using scrapy-poet, you organize the spider code in a different
way, one that separates extraction and crawling logic from the I/O
and from the Scrapy implementation details.
This makes the code more testable and reusable. It also
opens the door to generic spider code that works across sites.
Integrating a new site into the spider is then just a matter of writing
a few Page Objects for it.
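The separation described above can be illustrated with a minimal, dependency-free sketch. It does not use the web-poet or scrapy-poet APIs: the names `BookPage` and `extract`, the example URL, and the regex-based parsing are all assumptions made for illustration. The point is only that a Page Object receives already-downloaded content and performs no I/O itself, which is why it can be tested with a literal string and why the glue code can stay generic.

```python
import re
from dataclasses import dataclass


@dataclass
class BookPage:
    """Hypothetical Page Object: holds already-downloaded HTML, does no I/O."""

    url: str
    html: str

    def to_item(self) -> dict:
        # Extraction logic only. A regex keeps this sketch dependency-free;
        # real Page Objects would normally use proper HTML selectors.
        match = re.search(r"<h1>(.*?)</h1>", self.html, re.S)
        title = match.group(1).strip() if match else None
        return {"url": self.url, "title": title}


def extract(page_cls, url: str, html: str) -> dict:
    """Generic glue (hypothetical): works with any class exposing to_item()."""
    return page_cls(url=url, html=html).to_item()


# No network access is needed, so the extraction can be exercised
# directly with a literal HTML string.
item = extract(
    BookPage,
    "https://example.com/book/1",
    "<h1> A Light in the Attic </h1>",
)
print(item)
```

In this sketch, supporting another site would mean writing another class with a `to_item()` method, while `extract` stays unchanged.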
scrapy-poet also provides a way to integrate third-party APIs
(like Splash and AutoExtract) with the spider, without losing
testability and reusability.
Concrete integrations are not provided by web-poet, but
scrapy-poet makes them possible.
To get started, see Installation and Scrapy Tutorial.
License is BSD 3-clause.