usp.objects.sitemap

Objects that represent one of the found sitemaps.

Inheritance diagram of AbstractSitemap, InvalidSitemap, AbstractIndexSitemap, IndexWebsiteSitemap, IndexXMLSitemap, IndexRobotsTxtSitemap, AbstractPagesSitemap, PagesXMLSitemap, PagesTextSitemap, PagesRSSSitemap, PagesAtomSitemap

class usp.objects.sitemap.AbstractSitemap

Bases: object

Abstract sitemap.

__init__(url: str)

Initialize a new sitemap.

Parameters:

url – Sitemap URL.

all_pages() → Iterator[SitemapPage]

Return iterator which yields all pages of this sitemap and linked sitemaps (if any).

Returns:

Iterator which yields all pages of this sitemap and linked sitemaps (if any).

all_sitemaps() → Iterator[AbstractSitemap]

Return iterator which yields all sub-sitemaps descended from this sitemap.

Returns:

Iterator which yields all sub-sitemaps descended from this sitemap.

to_dict(with_pages=True) → dict

Return a dictionary representation of the sitemap, including its child sitemaps and, optionally, its pages.

Parameters:

with_pages – Include pages in the representation of this sitemap or descendants.

Returns:

Dictionary representation of the sitemap.
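The effect of with_pages can be illustrated with a minimal sketch. The Node class below is a hypothetical stand-in, not the library's class, and the key names ("url", "sub_sitemaps", "pages") are assumptions chosen for illustration. The actual keys returned by to_dict() are not specified in this reference.

```python
import json
from typing import Any, Dict, List, Optional


class Node:
    """Hypothetical stand-in for a sitemap; not the library's class."""

    def __init__(
        self,
        url: str,
        pages: Optional[List[str]] = None,
        sub_sitemaps: Optional[List["Node"]] = None,
    ):
        self.url = url
        self.pages = pages or []  # page URLs kept as plain strings here
        self.sub_sitemaps = sub_sitemaps or []

    def to_dict(self, with_pages: bool = True) -> Dict[str, Any]:
        # Key names are assumptions made for this sketch.
        result: Dict[str, Any] = {
            "url": self.url,
            # The flag is passed down so descendants follow the same rule.
            "sub_sitemaps": [
                child.to_dict(with_pages=with_pages) for child in self.sub_sitemaps
            ],
        }
        if with_pages:
            result["pages"] = list(self.pages)
        return result


leaf = Node("https://example.com/pages.xml", pages=["https://example.com/about"])
root = Node("https://example.com/sitemap.xml", sub_sitemaps=[leaf])

full = root.to_dict()
compact = root.to_dict(with_pages=False)
as_json = json.dumps(compact)  # plain dicts and lists serialize directly
```

Passing with_pages=False is useful when only the sitemap hierarchy is of interest, since page lists can be very large.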

abstract property pages: List[SitemapPage]

Return a list of pages found in a sitemap (if any).

Should return an empty list if this sitemap cannot have sub-pages, to allow traversal with a consistent interface.

Returns:

the list of pages, or an empty list.

abstract property sub_sitemaps: List[AbstractSitemap]

Return a list of sub-sitemaps of this sitemap (if any).

Should return an empty list if this sitemap cannot have sub-sitemaps, to allow traversal with a consistent interface.

Returns:

the list of sub-sitemaps, or an empty list.
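The reason both properties return an empty list rather than raising or returning None is that a caller can walk the tree without checking what kind of sitemap it holds. A minimal sketch of that contract follows. Page and Node are hypothetical stand-ins that mimic only the pages and sub_sitemaps properties, and walk_pages is an illustrative helper, not necessarily how all_pages() is implemented.

```python
from typing import Iterator, Sequence


class Page:
    """Hypothetical stand-in for SitemapPage: only carries a URL."""

    def __init__(self, url: str):
        self.url = url


class Node:
    """Hypothetical stand-in exposing the same two properties as a sitemap."""

    def __init__(
        self,
        url: str,
        pages: Sequence[Page] = (),
        sub_sitemaps: Sequence["Node"] = (),
    ):
        self.url = url
        self.pages = list(pages)  # empty list when there are no pages
        self.sub_sitemaps = list(sub_sitemaps)  # empty list when there are no children


def walk_pages(sitemap: Node) -> Iterator[Page]:
    """Yield this node's pages, then the pages of every descendant, depth-first."""
    yield from sitemap.pages
    for child in sitemap.sub_sitemaps:
        yield from walk_pages(child)


leaf_a = Node("https://example.com/a.xml", pages=[Page("https://example.com/1")])
leaf_b = Node("https://example.com/b.xml", pages=[Page("https://example.com/2")])
root = Node("https://example.com/sitemap.xml", sub_sitemaps=[leaf_a, leaf_b])

page_urls = [page.url for page in walk_pages(root)]
```

Because every node answers both properties with a list, walk_pages needs no type checks or special cases for index, pages, or invalid sitemaps.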

property url: str

Return sitemap URL.

Returns:

Sitemap URL.

class usp.objects.sitemap.InvalidSitemap

Bases: AbstractSitemap

Invalid sitemap, e.g. one that can’t be parsed.

__init__(url: str, reason: str)

Initialize a new invalid sitemap.

Parameters:
  • url – Sitemap URL.

  • reason – Reason why the sitemap is deemed invalid.

to_dict(with_pages=True) → dict

Return a dictionary representation of the sitemap, including its child sitemaps and, optionally, its pages.

Parameters:

with_pages – Include pages in the representation of this sitemap or descendants.

Returns:

Dictionary representation of the sitemap.

property pages: List[SitemapPage]

Return an empty list of pages, as invalid sitemaps have no pages.

Returns:

Empty list of pages.

property reason: str

Return reason why the sitemap is deemed invalid.

Returns:

Reason why the sitemap is deemed invalid.

property sub_sitemaps: List[AbstractSitemap]

Return an empty list of sub-sitemaps, as invalid sitemaps have no sub-sitemaps.

Returns:

Empty list of sub-sitemaps.
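Since an invalid sitemap sits in the tree like any other node, a caller can collect failures after the fact. The sketch below gathers the reason for each invalid descendant. Node and InvalidNode are hypothetical stand-ins for the library's classes, and the reason string is made up for illustration.

```python
from typing import Dict, Iterator, List, Optional


class Node:
    """Hypothetical stand-in for a sitemap with a URL and child sitemaps."""

    def __init__(self, url: str, sub_sitemaps: Optional[List["Node"]] = None):
        self.url = url
        self.sub_sitemaps = sub_sitemaps or []


class InvalidNode(Node):
    """Hypothetical stand-in for InvalidSitemap: no children, but a reason."""

    def __init__(self, url: str, reason: str):
        super().__init__(url)
        self.reason = reason


def descendants(sitemap: Node) -> Iterator[Node]:
    """Yield every sitemap below the given one, depth-first."""
    for child in sitemap.sub_sitemaps:
        yield child
        yield from descendants(child)


def invalid_reasons(root: Node) -> Dict[str, str]:
    """Map the URL of each invalid descendant to its reason."""
    return {
        node.url: node.reason
        for node in descendants(root)
        if isinstance(node, InvalidNode)
    }


broken = InvalidNode("https://example.com/broken.xml", "Unable to parse XML")
healthy = Node("https://example.com/pages.xml")
root = Node("https://example.com/robots.txt", sub_sitemaps=[healthy, broken])

report = invalid_reasons(root)
```

With the real classes, the same idea would presumably be an isinstance check against InvalidSitemap while iterating over all_sitemaps(), then reading the reason property.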

Index Sitemaps

class usp.objects.sitemap.AbstractIndexSitemap

Bases: AbstractSitemap

Abstract sitemap with URLs to other sitemaps.

__init__(url: str, sub_sitemaps: List[AbstractSitemap])

Initialize index sitemap.

Parameters:
  • url – Sitemap URL.

  • sub_sitemaps – Sub-sitemaps that are linked to from this sitemap.

all_pages() → Iterator[SitemapPage]

Return iterator which yields all pages of this sitemap and linked sitemaps (if any).

Returns:

Iterator which yields all pages of this sitemap and linked sitemaps (if any).

all_sitemaps() → Iterator[AbstractSitemap]

Return iterator which yields all sub-sitemaps of this sitemap.

Returns:

Iterator which yields all sub-sitemaps of this sitemap.

to_dict(with_pages=True) → dict

Return a dictionary representation of the sitemap, including its child sitemaps and, optionally, its pages.

Parameters:

with_pages – Include pages in the representation of this sitemap or descendants.

Returns:

Dictionary representation of the sitemap.

property pages: List[SitemapPage]

Return an empty list of pages, as index sitemaps have no pages.

Returns:

Empty list of pages.

property sub_sitemaps: List[AbstractSitemap]

Return a list of sub-sitemaps of this sitemap (if any).

Should return an empty list if this sitemap cannot have sub-sitemaps, to allow traversal with a consistent interface.

Returns:

the list of sub-sitemaps, or an empty list.

property url: str

Return sitemap URL.

Returns:

Sitemap URL.

class usp.objects.sitemap.IndexWebsiteSitemap

Bases: AbstractIndexSitemap

Website’s root sitemaps, including robots.txt and extra ones.

class usp.objects.sitemap.IndexXMLSitemap

Bases: AbstractIndexSitemap

XML sitemap with URLs to other sitemaps.

class usp.objects.sitemap.IndexRobotsTxtSitemap

Bases: AbstractIndexSitemap

robots.txt sitemap with URLs to other sitemaps.

Page Sitemaps

class usp.objects.sitemap.AbstractPagesSitemap

Bases: AbstractSitemap

Abstract sitemap that contains URLs to pages.

__init__(url: str, pages: List[SitemapPage])

Initialize new pages sitemap.

Parameters:
  • url – Sitemap URL.

  • pages – List of pages found in a sitemap.

all_pages() → Iterator[SitemapPage]

Return iterator which yields all pages of this sitemap and linked sitemaps (if any).

Returns:

Iterator which yields all pages of this sitemap and linked sitemaps (if any).

all_sitemaps() → Iterator[AbstractSitemap]

Return iterator which yields all sub-sitemaps descended from this sitemap.

Returns:

Iterator which yields all sub-sitemaps descended from this sitemap.

to_dict(with_pages=True) → dict

Return a dictionary representation of the sitemap, including its child sitemaps and, optionally, its pages.

Parameters:

with_pages – Include pages in the representation of this sitemap or descendants.

Returns:

Dictionary representation of the sitemap.

property pages: List[SitemapPage]

Load the pages from the on-disk swap file and return them.

Returns:

List of pages found in the sitemap.
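Keeping pages in a swap file means the list does not stay in memory between accesses. The sketch below shows one way such a pattern can work, using pickle and a temporary file. SwappedPages is a hypothetical helper, and the storage format and file handling are assumptions. This reference does not state how the library actually stores its pages.

```python
import os
import pickle
import tempfile
from typing import List


class SwappedPages:
    """Hypothetical helper: keeps a list on disk instead of in memory."""

    def __init__(self, pages: List[str]):
        # Write the list to a temporary file and keep only its path.
        fd, self._path = tempfile.mkstemp(suffix=".pickle")
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(pages, handle, protocol=pickle.HIGHEST_PROTOCOL)

    @property
    def pages(self) -> List[str]:
        # Re-read the file on every access; nothing is cached in memory.
        with open(self._path, "rb") as handle:
            return pickle.load(handle)

    def close(self) -> None:
        # Remove the temporary file once the pages are no longer needed.
        os.unlink(self._path)


store = SwappedPages(["https://example.com/1", "https://example.com/2"])
first_read = store.pages
second_read = store.pages  # a fresh list, loaded from disk again
store.close()
```

Under a design like this, each access to pages costs a disk read, so when looping over the pages more than once it is cheaper to assign the property to a local variable first.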

property sub_sitemaps: List[AbstractSitemap]

Return an empty list of sub-sitemaps, as pages sitemaps have no sub-sitemaps.

Returns:

Empty list of sub-sitemaps.

property url: str

Return sitemap URL.

Returns:

Sitemap URL.

class usp.objects.sitemap.PagesXMLSitemap

Bases: AbstractPagesSitemap

XML sitemap that contains URLs to pages.

class usp.objects.sitemap.PagesTextSitemap

Bases: AbstractPagesSitemap

Plain text sitemap that contains URLs to pages.

class usp.objects.sitemap.PagesRSSSitemap

Bases: AbstractPagesSitemap

RSS 2.0 sitemap that contains URLs to pages.

class usp.objects.sitemap.PagesAtomSitemap

Bases: AbstractPagesSitemap

Atom 0.3 / 1.0 sitemap that contains URLs to pages.