

Stopped handling Multiple Threads



    I've been using Zoom Search for several years without a hitch, but it looks like it is no longer handling multiple threads at the same time: it's running at about 1/5 the speed it had the last time I ran it. No Zoom configuration changes have been made, and the site hosting is the same as when it last worked fine (about 30 days ago). I can't think of anything that is different. I've even removed the AV/firewall and updated from v6.0.1020 to 6.0.1025, but no change.

    I've confirmed the configuration options are set for multiple threads (5) and the spider is set to no delay between pages. When running, the status window shows the 5 thread slots, but instead of all running at the same time, only one is active at a time, randomly switching between slots while the other four show N/A. It's taking a little over 1 second per page. The site has more than 40,000 pages, so it will take about a day to process without multi-threading working. The last time I ran it, it took about 4-5 hours.

    Any ideas where to look? Is there any way to confirm whether it's a local PC issue or a server issue blocking multi-threading?

  • #2
    Can you tell us the URL for the site? We can try a few pages from here to see if it is the site causing the slowdown or your PC.

    My guess, without looking at the site, is that it might be some throttling on the site, or the site might be overloaded.

    Also, are you sure you don't have a robots.txt file on your site with a spider-throttling command in it? Like this:
    Crawl-delay: 10
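    If you want to verify this from your end, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents and the example.com URLs below are placeholders; in practice you would point the parser at your own site's /robots.txt (via set_url() plus read()) instead of feeding it a string.

```python
# Sketch: parse a robots.txt and report any Crawl-delay it advertises.
# The file contents below are a stand-in for a real site's robots.txt.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# crawl_delay() returns the delay in seconds for the given user agent,
# or None when robots.txt sets no Crawl-delay for that agent.
delay = rp.crawl_delay("*")

# can_fetch() applies the Disallow rules to a candidate URL.
allowed = rp.can_fetch("*", "https://www.example.com/public/page.html")
blocked = rp.can_fetch("*", "https://www.example.com/private/page.html")
print(delay, allowed, blocked)  # prints: 10 True False
```

    If crawl_delay("*") comes back non-None for your site, the spider slowdown is coming from the server's robots.txt rather than your PC.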



    • #3
      Thanks for the fast reply.

      I was wrong: it is slow, but it took about the same time as the last run (about 6 hours).

      Good catch on the robots.txt file. I do have Crawl-delay: 1, and that explains the 1 second per page. It was introduced sometime in the last few months.

      Is there an option to ignore just the Crawl-delay robots setting (but still use the rest of robots.txt)? If not, I may just build an alternative robots.txt file to use while I run the Zoom Indexer, as I know it will run a lot faster than 1 page per second!
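      Such a temporary alternative file would just drop the delay line and keep the exclusions. A minimal sketch, assuming the original file only contains a delay plus some Disallow rules (the paths below are placeholders, not the actual site's rules):

```text
# Temporary robots.txt for the Zoom Indexer run:
# Crawl-delay line removed; Disallow rules kept unchanged.
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
```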



      • #4
        Originally posted by vcor:
        Is there an option to ignore just the Crawl-delay robots setting (but still use the rest of robots.txt)? If not, I may just build an alternative robots.txt file to use while I run the Zoom Indexer, as I know it will run a lot faster than 1 page per second!
        You can disable obeying the robots.txt file in Zoom by unchecking the option under "Configure"->"Spider options"->"Enable 'robots.txt' support".

        But you can't ask it to obey the rest of the robots.txt file while ignoring only the "Crawl-delay" setting.

        Note that you can replicate most, if not all, of the robots.txt settings within Zoom via the Skip Options (to control what pages to exclude).
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
