
Sitemap Generator

sitemap_gen is a Python program that crawls a web site and outputs an XML sitemap.

It has been created by Vladimir Toncar and documented in

This version is a port of his fine program to Python 3.
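
For orientation, the sketch below shows roughly what an XML sitemap looks like when built with Python's standard library. It is not taken from sitemap_gen itself: the function name `build_sitemap` and its parameters are hypothetical, and the exact set of elements the program writes may differ. Only the namespace and the `urlset`, `url`, `loc`, `changefreq` and `priority` element names come from the public sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, changefreq=None, priority=None):
    """Return a sitemap document as a string (hypothetical helper).

    The same changefreq and priority are applied to every entry,
    mirroring how the -c and -p options are described in this README.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address
        if changefreq is not None:
            ET.SubElement(url, "changefreq").text = changefreq
        if priority is not None:
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(["https://example.com/"], "weekly", 0.5))
```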




python3 -b doc -b bmp -o test_sitemap.xml

Command Line Arguments

python3 <options> <starting URL>

Available options:
-h         --help                Print this text and exit

-b <ext>   --block <ext>         Exclude URLs with the given extension;
                                 <ext> must be without the leading dot.
                                 The comparison is case insensitive, so
                                 for example DOC and doc are treated
                                 the same. You can use this option several
                                 times to block several extensions.

-c <value> --changefreq <value>  Set the change frequency. The given value
                                 is used in all sitemap entries (maybe a
                                 future version of this script will change
                                 that). The allowed values are: always,
                                 hourly, daily, weekly, monthly, yearly,
                                 never.

-p <prio>  --priority <prio>     Set the priority. The value must be from
                                 the interval between 0.0 and 1.0. The value
                                 will be used in all sitemap entries.

-m <value> --max-urls <value>    Set the maximum number of URLs to be crawled.
                                 The default value is 1000 and the largest
                                 value that you can set is 50000 (the script
                                 generates only a single sitemap file).

-r <value> --ratelimit <value>   Set a crawl rate limit [requests / second],
                                 zero (the default) results in no crawl rate
                                 limit.

-o <file>  --output-file <file>  Set the name of the generated sitemap file.
                                 The default file name is sitemap.xml.
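
The option rules above can be illustrated with a small Python sketch. This is not the program's actual code: the helper names (`is_blocked`, `check_priority`, `check_max_urls`, `min_interval`) are hypothetical, and the lower bound of 1 for the URL count is an assumption, since the text only states the upper limit.

```python
from urllib.parse import urlparse

MAX_URLS_LIMIT = 50000  # largest value accepted by -m, per the text above


def is_blocked(url, blocked_exts):
    """True if the URL path ends in a blocked extension (case insensitive)."""
    path = urlparse(url).path
    _, dot, ext = path.rpartition(".")
    # No dot at all, or the last dot belongs to an earlier path segment.
    if not dot or "/" in ext:
        return False
    return ext.lower() in {e.lower() for e in blocked_exts}


def check_priority(value):
    """Parse a -p value and require it to lie in [0.0, 1.0]."""
    prio = float(value)
    if not 0.0 <= prio <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    return prio


def check_max_urls(value):
    """Parse a -m value; the lower bound of 1 is an assumption."""
    count = int(value)
    if not 1 <= count <= MAX_URLS_LIMIT:
        raise ValueError(f"max-urls must be between 1 and {MAX_URLS_LIMIT}")
    return count


def min_interval(rate_limit):
    """Seconds to wait between requests; zero means no rate limit."""
    return 1.0 / rate_limit if rate_limit > 0 else 0.0
```

With these helpers, `is_blocked("http://example.com/a/Report.DOC", ["doc", "bmp"])` is true because the comparison ignores case, and a rate limit of 2 requests per second corresponds to a pause of 0.5 seconds between requests.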