Get Started
Ultimate Sitemap Parser can be installed from PyPI or conda-forge:
$ pip install ultimate-sitemap-parser
$ conda install -c conda-forge ultimate-sitemap-parser
Traversing a website’s sitemaps and retrieving all webpages requires just a single line of code:
from usp.tree import sitemap_tree_for_homepage
tree = sitemap_tree_for_homepage('https://example.org/')
This returns a tree representing the structure of the website's sitemaps. To iterate through the pages, use tree.all_pages():
for page in tree.all_pages():
    print(page.url)
This prints the URL of every page in the tree. The parsed sitemaps are loaded lazily as iteration proceeds, which reduces memory usage for very large sitemaps.
Each page is an instance of SitemapPage, which always has at least a URL and a priority, and may have other attributes (such as last_modified) if the sitemap provides them.
Local Parsing
USP is primarily designed to fetch live sitemaps from the web, but it also supports parsing sitemap content you already have locally:
from usp.tree import sitemap_from_str

# Pass the sitemap content in as a string
parsed_sitemap = sitemap_from_str("...")

for page in parsed_sitemap.all_pages():
    print(page.url)
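Since sitemap_from_str() takes the sitemap content as a string, a sitemap saved on disk has to be read first, and such files are often gzip-compressed (for example sitemap.xml.gz). A minimal sketch, assuming the file is UTF-8 encoded, is the hypothetical read_sitemap_text() helper below, which detects gzip by its magic bytes and decompresses before decoding. Whether sitemap_from_str() can accept compressed data directly is not covered here.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_sitemap_text(path) -> str:
    """Read a local sitemap file and return its text (hypothetical helper).

    Gzip-compressed files are detected by their magic bytes rather than
    by file extension, then decompressed. UTF-8 encoding is assumed.
    """
    raw = Path(path).read_bytes()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # "utf-8-sig" also strips a leading byte order mark if one is present
    return raw.decode("utf-8-sig")
```

The result can then be handed straight to the parser, e.g. sitemap_from_str(read_sitemap_text("sitemap.xml.gz")).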
The object returned by sitemap_from_str() will be the appropriate subclass of AbstractSitemap. Page sitemaps contain their pages as above, but in an index sitemap each sub-sitemap will be an InvalidSitemap, because USP cannot make requests to fetch the sub-sitemaps when parsing locally.
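When parsing an index sitemap locally, the sub-sitemap URLs are still the useful part, since each referenced file can be obtained separately and parsed on its own. As a stdlib-only sketch (not USP's own parser), the hypothetical index_sitemap_locs() below pulls the <loc> entries out of a sitemap index with xml.etree.ElementTree, assuming the standard sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol (version 0.9)
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def index_sitemap_locs(xml_text: str) -> List[str]:
    """Return the sub-sitemap URLs listed in a sitemap index (hypothetical helper).

    Only <loc> elements that are direct children of <sitemap> entries are
    collected; other child elements such as <lastmod> are ignored.
    """
    root = ET.fromstring(xml_text)
    if root.tag != f"{SITEMAP_NS}sitemapindex":
        raise ValueError(f"not a sitemap index: root element is {root.tag!r}")
    locs = []
    for entry in root.findall(f"{SITEMAP_NS}sitemap"):
        loc = entry.find(f"{SITEMAP_NS}loc")
        if loc is not None and loc.text:
            locs.append(loc.text.strip())
    return locs
```

Each returned URL can then be downloaded or located on disk and passed to sitemap_from_str() in turn. For untrusted input, the Python documentation recommends a hardened XML parser such as defusedxml.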