Only generate sitemaps?
Is there a way to have Zoom scan all pages but only create a sitemap? My site has 100K pages and, while I normally exclude 70K of those pages in terms of indexing content for our site search function, I want to have our sitemaps include all 100K pages...but the resources to truly index that many pages would be enormous, as would the time to execute it.
So I am wondering whether Zoom can function just as a sitemap generator and only follow/record the links. Thanks very much.
For a typical site the bulk of the indexing time is spent downloading each file (the actual indexing of the words is quick once the file is downloaded). This assumes you are using spider mode.
In offline mode the download step is not required and the indexing of the words is a greater percentage of the overall scan time.
To create a sitemap, every page has to be downloaded, because any page may contain links to new pages that also need to appear in the sitemap.
So the extra overhead of indexing the words on the page isn't very high if you are already downloading each page. If you do want to limit the index size, however, you could set a low limit for the number of words to index per file (on the Limits tab of the Zoom configuration window).
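To illustrate why the download step can't be skipped, here is a minimal sketch of link discovery in Python. It is not how Zoom works internally; the `discover_urls` function, the `fetch` callable, and the tiny in-memory `SITE` are all made up for the example. Notice that c.html is only found because a.html was fetched and parsed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_urls(start_url, fetch):
    """Breadth-first crawl from start_url.

    `fetch` is a hypothetical callable mapping a URL to its HTML text
    (or None if the page is unavailable). Returns a sorted list of
    every URL found under start_url.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)  # the unavoidable download step
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute = urldefrag(urljoin(url, href)).url
            if absolute.startswith(start_url) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)


# Tiny in-memory "site" standing in for real HTTP requests (assumption).
SITE = {
    "http://example.com/": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://example.com/a.html": '<a href="c.html">C</a>',
    "http://example.com/b.html": '<a href="/">home</a>',
    "http://example.com/c.html": "no links here",
}

urls = discover_urls("http://example.com/", SITE.get)
for u in urls:
    print(u)
```

In a real crawl `fetch` would make an HTTP request, and that request is where most of the time goes, whether or not the words on the page are indexed afterwards.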
Note that if you are trying to index more than 65,000 pages, you need to select the CGI option in Zoom.
The indexing speed can vary dramatically, but if you can get 10 pages per second, your 100,000 pages will be done in about 2.8 hours. In offline mode it might only take 20 minutes.
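The estimate is just pages divided by throughput. A quick sketch of the arithmetic, using a hypothetical helper `scan_hours` so you can plug in your own measured rate:

```python
def scan_hours(pages, pages_per_second):
    """Estimated scan time in hours for a given throughput."""
    return pages / pages_per_second / 3600


hours = scan_hours(100_000, 10)
print(f"{hours:.2f} hours")  # prints "2.78 hours"
```

By the same formula, finishing 100,000 pages in 20 minutes would need a rate of roughly 83 pages per second.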