usp.web_client.abstract_client

Abstract web client class.

class usp.web_client.abstract_client.AbstractWebClient

Abstract web client to be used by the sitemap fetcher.

abstractmethod get(url: str) → AbstractWebClientResponse

Fetch a URL and return a response.

This method should not raise exceptions on connection errors (including timeouts); instead, it should report such errors via the response object.

Parameters:

url – URL to fetch.

Returns:

Response object.
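As an illustration of this contract, here is a minimal sketch of a `get()` that reports connection failures through a response object rather than raising. `ErrorResponse`, `SketchClient` and the injected `fetch` callable are hypothetical stand-ins so the snippet runs on its own; a real client would presumably subclass `AbstractWebClient` and return `WebClientErrorResponse`. Which failures count as retryable is an assumption of this sketch, not something this page specifies.

```python
from typing import Callable


class ErrorResponse:
    """Hypothetical stand-in mirroring WebClientErrorResponse(message, retryable)."""

    def __init__(self, message: str, retryable: bool) -> None:
        self._message = message
        self._retryable = retryable

    def message(self) -> str:
        return self._message

    def retryable(self) -> bool:
        return self._retryable


class SketchClient:
    """Hypothetical client; `fetch` is an injected callable that performs the I/O."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch

    def get(self, url: str):
        try:
            # A real client would wrap the payload in a success response object.
            return self._fetch(url)
        except TimeoutError as ex:
            # Timeouts are reported, not raised (assumed retryable in this sketch).
            return ErrorResponse(f"Timeout while fetching {url}: {ex}", retryable=True)
        except OSError as ex:
            # Other connection-level failures (assumed non-retryable in this sketch).
            return ErrorResponse(f"Connection error for {url}: {ex}", retryable=False)
```

`TimeoutError` is caught before `OSError` because it is a subclass of `OSError`; reversing the order would make the timeout branch unreachable.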

abstractmethod set_max_response_data_length(max_response_data_length: int | None) → None

Set the maximum number of bytes that the web client will fetch.

Parameters:

max_response_data_length – Maximum number of bytes that the web client will fetch, or None to fetch all.
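The effect of the limit can be sketched with a small helper. `apply_length_limit` is a hypothetical name, not part of the library, and it truncates data that is already in memory; an actual client would more likely stop reading from the network once the limit is reached.

```python
from typing import Optional


def apply_length_limit(data: bytes, max_response_data_length: Optional[int]) -> bytes:
    """Return at most `max_response_data_length` bytes, or all of `data` if the limit is None."""
    if max_response_data_length is None:
        # None means "fetch all": no truncation.
        return data
    return data[:max_response_data_length]
```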

usp.web_client.abstract_client.RETRYABLE_HTTP_STATUS_CODES = {400, 408, 429, 499, 500, 502, 503, 504, 509, 520, 521, 522, 523, 524, 525, 526, 527, 530, 598}

HTTP status codes on which a request should be retried.
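One way a client might use this set is sketched below. The set literal is copied from the value shown above so the snippet runs without the library installed; `is_retryable_status` is a hypothetical helper name.

```python
# Copied from the documented constant; in real code, import it from
# usp.web_client.abstract_client instead of redefining it.
RETRYABLE_HTTP_STATUS_CODES = {
    400, 408, 429, 499, 500, 502, 503, 504, 509,
    520, 521, 522, 523, 524, 525, 526, 527, 530, 598,
}


def is_retryable_status(status_code: int) -> bool:
    """Return True if a response with this HTTP status code is worth retrying."""
    return status_code in RETRYABLE_HTTP_STATUS_CODES
```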

class usp.web_client.abstract_client.AbstractWebClientResponse

Abstract response.

class usp.web_client.abstract_client.AbstractWebClientSuccessResponse

Bases: AbstractWebClientResponse

Successful response.

abstractmethod header(case_insensitive_name: str) → str | None

Return the HTTP header value for a given case-insensitive name, or None if no such header was set.

Parameters:

case_insensitive_name – HTTP header’s name, e.g. “Content-Type”.

Returns:

HTTP header’s value, or None if it was unset.
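A case-insensitive lookup of this kind could be implemented roughly as follows. `HeaderStore` is a hypothetical class used only for illustration; it normalizes header names to lowercase once, at construction time.

```python
from typing import Dict, Optional


class HeaderStore:
    """Hypothetical holder of response headers with case-insensitive lookup."""

    def __init__(self, headers: Dict[str, str]) -> None:
        # Normalize names once so that lookups do not depend on the caller's casing.
        self._headers = {name.lower(): value for name, value in headers.items()}

    def header(self, case_insensitive_name: str) -> Optional[str]:
        """Return the header's value, or None if it was unset."""
        return self._headers.get(case_insensitive_name.lower())
```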

abstractmethod raw_data() → bytes

Return encoded raw data of the response.

Returns:

Encoded raw data of the response.

abstractmethod status_code() → int

Return HTTP status code of the response.

Returns:

HTTP status code of the response, e.g. 200.

abstractmethod status_message() → str

Return HTTP status message of the response.

Returns:

HTTP status message of the response, e.g. “OK”.

abstractmethod url() → str

Return the actual URL fetched, after any redirects.

Returns:

URL fetched.

class usp.web_client.abstract_client.WebClientErrorResponse

Bases: AbstractWebClientResponse

Error response.

__init__(message: str, retryable: bool)

Constructor.

Parameters:
  • message – Message describing what went wrong.

  • retryable – True if the request should be retried.

message() → str

Return message describing what went wrong.

Returns:

Message describing what went wrong.

retryable() → bool

Return True if request should be retried.

Returns:

True if request should be retried.
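A caller might act on `retryable()` along these lines. This is only a sketch: `ErrorResponse` is a stand-in mirroring the interface documented above, and `get_with_retries` and `max_attempts` are hypothetical names, not part of the library.

```python
from typing import Callable


class ErrorResponse:
    """Hypothetical stand-in mirroring WebClientErrorResponse."""

    def __init__(self, message: str, retryable: bool) -> None:
        self._message = message
        self._retryable = retryable

    def message(self) -> str:
        return self._message

    def retryable(self) -> bool:
        return self._retryable


def get_with_retries(get: Callable[[str], object], url: str, max_attempts: int = 3) -> object:
    """Call `get(url)` up to `max_attempts` times (assumed to be at least 1)."""
    response = None
    for _ in range(max_attempts):
        response = get(url)
        retry = isinstance(response, ErrorResponse) and response.retryable()
        if not retry:
            # Either a success or a non-retryable error: stop here.
            break
    return response
```

If every attempt yields a retryable error, the last error response is returned, so the caller can still inspect `message()`.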

class usp.web_client.abstract_client.LocalWebClient

Bases: AbstractWebClient

Dummy web client which is a valid implementation but errors if called.

Used for local parsing.

get(url: str) → AbstractWebClientResponse

Fetch a URL and return a response.

This method should not raise exceptions on connection errors (including timeouts); instead, it should report such errors via the response object.

Parameters:

url – URL to fetch.

Returns:

Response object.

set_max_response_data_length(max_response_data_length: int | None) → None

Set the maximum number of bytes that the web client will fetch.

Parameters:

max_response_data_length – Maximum number of bytes that the web client will fetch, or None to fetch all.

class usp.web_client.abstract_client.NoWebClientException

Bases: Exception

Error indicating this web client cannot fetch pages.

__init__(*args, **kwargs)
classmethod __new__(*args, **kwargs)