Settings
Configuring the settings denoted below would follow the usual methods used by Scrapy.
SCRAPY_POET_PROVIDERS
Default: {}
A dict
wherein the keys would be the providers available for your Scrapy
project while the values denotes the priority of the provider.
More info on this at this section: Providers.
SCRAPY_POET_OVERRIDES
Deprecated. Use SCRAPY_POET_RULES
instead.
SCRAPY_POET_RULES
Default: web_poet.default_registry.get_rules()
Accepts a List[ApplyRule]
which sets the rules to use.
Warning
Although SCRAPY_POET_RULES
already has values set from the return value of
web_poet.default_registry.get_rules()
,
make sure to also set the SCRAPY_POET_DISCOVER
setting below.
There are sections dedicated for this at Scrapy Tutorial and Rules from web-poet.
SCRAPY_POET_DISCOVER
Default: []
A list of packages/modules (i.e. List[str]
) which scrapy-poet will look for
page objects annotated with the web_poet.handle_urls()
decorator. Each
package/module is passed into
web_poet.consume_modules
where each
module from a package is recursively loaded.
This ensures that when using the default value of SCRAPY_POET_RULES
set to
web_poet.default_registry.get_rules()
,
it should contain all the intended rules.
Note that it’s also possible for SCRAPY_POET_RULES
to have rules not specified
in SCRAPY_POET_DISCOVER
(e.g. when the annotated page objects are inside your
Scrapy project). However, it’s recommended to still use SCRAPY_POET_DISCOVER
to ensure all the intended rules are properly loaded.
SCRAPY_POET_CACHE
Default: None
The caching mechanism in the providers can be enabled by either setting this
to True
which configures the file path of the cache into a .scrapy/
dir
in your local Scrapy project.
On the other hand, you can also set this as a str
pointing to any path relative
to your local Scrapy project.
SCRAPY_POET_CACHE_ERRORS
Default: False
When this is set to True
, any error that arises when retrieving dependencies from
providers would be cached. This could be useful in cases during local development
wherein you outright know that retrieving the dependency would fail and would
choose to ignore it. Caching such errors would reduce the waiting time when
developing Page Objects.
It’s recommended to set this off into False
by default since you might miss
out on sporadic errors.
SCRAPY_POET_TESTS_DIR
Default: fixtures
Sets the location where the savefixture
command creates tests.
More info at Tests for Page Objects.
SCRAPY_POET_TESTS_ADAPTER
Default: None
Sets the class, or its import path, that will be used as an adapter in the generated test fixtures.
More info at Configuring the item adapter.
SCRAPY_POET_REQUEST_FINGERPRINTER_BASE_CLASS
The default value is the default value of the REQUEST_FINGERPRINTER_CLASS
setting for the version of Scrapy currently installed (e.g.
"scrapy.utils.request.RequestFingerprinter"
).
You can assign a request fingerprinter class to this setting to configure a custom request fingerprinter class to use for requests.
This class is used to generate a base fingerprint for a request. If that request uses dependency injection, that fingerprint is then modified to account for requested dependencies. Otherwise, the fingerprint is used as is.
Note
Annotations of annotated dependencies are
serialized with repr()
for fingerprinting purposes. If you find a
real-world scenario where this is a problem, please open an issue.