Settings

The settings below are configured the same way as any other Scrapy setting.

SCRAPY_POET_PROVIDERS

Default: {}

A dict whose keys are the providers available for your Scrapy project and whose values denote the priority of each provider.

See Providers for more information.
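As a minimal sketch of the dict shape described above, a settings.py entry might look like the following. The provider path myproject.providers.ScreenshotProvider and the priority value 500 are hypothetical, and using an import path string as the key (rather than the class itself) is an assumption; see Providers for how priority values are interpreted.

```python
# settings.py (sketch)
# Keys identify providers; values are their priorities.
# "myproject.providers.ScreenshotProvider" is a hypothetical provider,
# and 500 is an arbitrary illustrative priority.
SCRAPY_POET_PROVIDERS = {
    "myproject.providers.ScreenshotProvider": 500,
}
```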

SCRAPY_POET_OVERRIDES

Deprecated. Use SCRAPY_POET_RULES instead.

SCRAPY_POET_RULES

Default: web_poet.default_registry.get_rules()

Accepts a List[ApplyRule] which sets the rules to use.

Warning

Although SCRAPY_POET_RULES defaults to the return value of web_poet.default_registry.get_rules(), make sure to also set the SCRAPY_POET_DISCOVER setting below.

See the Scrapy Tutorial and Rules from web-poet sections for details.

SCRAPY_POET_DISCOVER

Default: []

A list of packages/modules (i.e. List[str]) in which scrapy-poet will look for page objects annotated with the web_poet.handle_urls() decorator. Each package/module is passed to web_poet.consume_modules, which recursively loads every module in a package.

This ensures that the default value of SCRAPY_POET_RULES, web_poet.default_registry.get_rules(), contains all the intended rules.

Note that it’s also possible for SCRAPY_POET_RULES to have rules not specified in SCRAPY_POET_DISCOVER (e.g. when the annotated page objects are inside your Scrapy project). However, it’s recommended to still use SCRAPY_POET_DISCOVER to ensure all the intended rules are properly loaded.
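To illustrate the setting described above, here is a minimal settings.py sketch. The package names myproject.pages and shared_page_objects are hypothetical; SCRAPY_POET_RULES is left at its default so that the rules come from the discovered page objects.

```python
# settings.py (sketch)
# Each entry is a package or module import path. Both names below are
# hypothetical placeholders for wherever your page objects live.
SCRAPY_POET_DISCOVER = [
    "myproject.pages",
    "shared_page_objects",
]
```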

SCRAPY_POET_CACHE

Default: None

The caching mechanism in the providers can be enabled by setting this to True, which places the cache in a .scrapy/ directory in your local Scrapy project.

Alternatively, set this to a str pointing to any path relative to your local Scrapy project.
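The two options above could be written in settings.py roughly as follows. The relative path in the commented-out alternative is a hypothetical example, not a recommended location.

```python
# settings.py (sketch)
# Option 1: enable the cache under the project's .scrapy/ directory.
SCRAPY_POET_CACHE = True

# Option 2: use a path relative to the project instead.
# "cache/poet" is a hypothetical directory name.
# SCRAPY_POET_CACHE = "cache/poet"
```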

SCRAPY_POET_CACHE_ERRORS

Default: False

When this is set to True, any error raised while retrieving dependencies from providers is cached. This can be useful during local development when you already know that retrieving a dependency will fail and choose to ignore it. Caching such errors reduces the waiting time when developing Page Objects.

It’s recommended to keep this set to False by default, since otherwise you might miss sporadic errors.
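One way to follow this recommendation is to keep error caching off unless it is explicitly requested for a development session. The sketch below assumes a hypothetical environment variable named POET_CACHE_ERRORS; it is not something scrapy-poet reads on its own.

```python
# settings.py (sketch)
import os

# Caching must be enabled for cached errors to have any effect.
SCRAPY_POET_CACHE = True

# Stays False unless the hypothetical POET_CACHE_ERRORS variable is set
# to "1", e.g. for a single local development run.
SCRAPY_POET_CACHE_ERRORS = os.environ.get("POET_CACHE_ERRORS") == "1"
```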

SCRAPY_POET_TESTS_DIR

Default: fixtures

Sets the location where the savefixture command creates tests.

More info at Tests for Page Objects.

SCRAPY_POET_TESTS_ADAPTER

Default: None

Sets the class, or its import path, that will be used as an adapter in the generated test fixtures.

More info at Configuring the item adapter.

SCRAPY_POET_REQUEST_FINGERPRINTER_BASE_CLASS

The default value is the default value of the REQUEST_FINGERPRINTER_CLASS setting for the version of Scrapy currently installed (e.g. "scrapy.utils.request.RequestFingerprinter").

Assign a request fingerprinter class to this setting to use a custom class for computing base request fingerprints.

This class is used to generate a base fingerprint for a request. If that request uses dependency injection, that fingerprint is then modified to account for requested dependencies. Otherwise, the fingerprint is used as is.
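As a rough sketch of what a custom base class could look like, the example below defines a fingerprinter that hashes only the request method and URL. The class name UrlOnlyRequestFingerprinter and its hashing scheme are hypothetical; the only thing assumed about Scrapy is that a request fingerprinter exposes a fingerprint(request) method returning bytes.

```python
# settings.py (sketch)
import hashlib


class UrlOnlyRequestFingerprinter:
    """Hypothetical base fingerprinter: hashes the method and URL only."""

    def fingerprint(self, request) -> bytes:
        # Ignores headers and body; only the method and URL contribute.
        data = f"{request.method} {request.url}".encode("utf-8")
        return hashlib.sha1(data).digest()


# The base fingerprint produced by this class is what gets adjusted for
# requests that use dependency injection.
SCRAPY_POET_REQUEST_FINGERPRINTER_BASE_CLASS = UrlOnlyRequestFingerprinter
```

Ignoring headers and body is a deliberate simplification here; a real project would pick whichever request attributes should distinguish one request from another.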

Note

Annotations of annotated dependencies are serialized with repr() for fingerprinting purposes. If you find a real-world scenario where this is a problem, please open an issue.