Zoom not picking up all files/desc files - correlation with number of threads

  • Zoom not picking up all files/desc files - correlation with number of threads

    It seems like there's a positive correlation between the number of threads Zoom uses and the number of "Invalid URLs" and missing .desc files in the search results. The higher the number of threads, the more Invalid URLs and missing .desc files there are. Is this something webserver related, like the webserver can't keep up with all the opening and closing of connections/requests? We're trying to index a large number of PDF files and their respective .desc files.

    The missing PDF/.desc files are different on every run, so the failures look random. All the files exist and open normally in a web browser; Zoom only fails to download them while it is indexing.

  • #2
    Originally posted by cchan
    The higher the number of threads, the more Invalid URLs and missing .desc files there are. Is this something webserver related, like the webserver can't keep up with all the opening and closing of connections/requests?
    That's correct, and it's quite likely the case if your server is overloaded (typically in cheaper shared hosting, where your host puts your site on a server with thousands of other sites). Servers start denying requests once they hit a certain load. Note, however, that Zoom can at most simulate the load of 10 simultaneous users (with 10 download threads) on your server. If your server struggles with that, it may be an issue to raise with your web host. You could also, of course, lower the number of threads Zoom uses and so reduce the load put on the server during indexing.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine


    • #3
      I was able to get rid of the problem by enabling the KeepAlive feature in Apache on our webserver. This allows persistent connections, so in Zoom's case there's no overhead from the 3-way TCP handshake and from opening and closing a connection for every request while it indexes the site. The downside is that connections held open can keep other connections waiting to be served. This can be minimized by setting the KeepAlive timeout to a small value (in our case I set it to 2 seconds instead of the default 15). Another benefit of enabling KeepAlive is that it shaved 7 minutes off our total indexing time.

      So if you don't already suggest it to users, I'd say it's a good setting to know about, especially if the website is large and traffic volume is manageable.
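
      For reference, the change described above might look roughly like the following in the Apache configuration. This is only a sketch: the file location varies by install, the timeout of 2 seconds is the value reported in this post, and the MaxKeepAliveRequests line is an assumption added for completeness rather than something the poster mentioned.

      ```apache
      # Sketch for the main Apache config (e.g. httpd.conf; location varies by install).
      # Allow persistent connections so a client can reuse one TCP connection
      # for several requests instead of reconnecting each time.
      KeepAlive On

      # Seconds an idle persistent connection is held open waiting for the
      # next request. 2 is the value reported in the post above; keeping it
      # small limits how long idle clients tie up a worker.
      KeepAliveTimeout 2

      # Assumption, not from the post: cap on requests served per persistent
      # connection. 100 is Apache's documented default.
      MaxKeepAliveRequests 100
      ```

      Apache needs to be restarted or reloaded for the change to take effect. Whether a short timeout is the right trade-off depends on how many other visitors the server has to serve at the same time.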
