Performance Question


  • Performance Question

    Hi there --

    I had a few performance questions about Zoom. I've seen a few things here about the index being loaded into RAM, and was wondering if you could provide more details -- are we talking about the Web server or about the computer the site is being indexed from?

    I'm considering using Zoom as the site search engine for our newspaper's Web site and also (more importantly) to search through PDF archives of the newspaper. About 20 PDF files would go into it per day, and indexing at night isn't a problem. I'm just worried about performance and whether Zoom would be able to handle this. I'm less concerned about indexing on a desktop than about search performance and the load on the server.

    Any details would be great.

  • #2
    Ooops -- I had another question, too.

    Is it possible to include other PHP scripts in the Zoom search results output page? For example:

    SEARCH RESULTS
    Results in Articles and Archives (Zoom)

    Results in Community (Integrate from a forum search)

    On the Web (Integrate with Google)



    • #3
      In answer to the first post.

      During indexing, part of the index is temporarily stored in RAM and part of the index is stored directly into files on the hard disk. This split is done for performance reasons and to minimise the RAM usage. The data structures that change a lot during indexing are held in RAM. Data that doesn't change (after a particular page is indexed) is written to disk.

      This allows large web sites to be indexed at high speed. With enough RAM in your PC you can get to 150,000+ pages in the index.

      Here are details about what you can expect in terms of indexing speed.
      http://www.wrensoft.com/forum/viewtopic.php?t=525

      The index files are compressed and have been stripped of extraneous data. Typically the index files will be 70% smaller than the HTML pages that were indexed. PDF files can compress even more: you might see reductions of around 90%, depending on how your PDFs were made.

      So the index files are normally much smaller than the source files.
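As a rough illustration of what those percentages mean for an archive growing by 20 PDFs a day, here is a minimal sketch. The 20 per day figure is from the original question and the 70% and 90% reductions are the ones quoted above; the average PDF size of 5 MB and the one-year horizon are purely assumptions for the example, so substitute your own numbers.

```python
def estimated_index_mb(source_mb: float, reduction: float) -> float:
    """Return the index size when it is `reduction` (0..1) smaller than the source."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return source_mb * (1.0 - reduction)


PDFS_PER_DAY = 20          # from the original question
ASSUMED_MB_PER_PDF = 5.0   # assumption, not stated anywhere in this thread
DAYS = 365                 # assumption: one year of archives

# Total size of the source PDFs accumulated over the period.
source_mb = PDFS_PER_DAY * ASSUMED_MB_PER_PDF * DAYS

# 70% is the typical figure quoted for HTML, 90% the optimistic figure for PDFs.
for reduction in (0.70, 0.90):
    index_mb = estimated_index_mb(source_mb, reduction)
    print(f"{reduction:.0%} reduction: {source_mb:.0f} MB of PDFs -> about {index_mb:.0f} MB of index")
```

This is only back-of-the-envelope arithmetic. The real ratio depends on how the PDFs were produced, so it is worth indexing a sample week of files and measuring the result.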

      During searching, only part of the index is loaded into RAM on the server. Again there is a trade-off between speed and RAM usage: we could lower the RAM usage, but search times would increase. Hopefully we have struck a good balance.

      We have a benchmark page here that gives some typical search times:
      http://www.wrensoft.com/zoom/benchmarks.html

      You will see that typical searches take only a second or two (with the CGI version, search times are normally under 1 second). So the RAM usage on the server is very brief. A search might use 30MB on the server for 2 seconds, and the RAM is then released back to the free memory pool.

      As far as I am aware there have been no complaints from any users about the scripts using too much RAM on the server. A typical web server now has around a GB of RAM in any case.

      So I can't imagine it would be a problem for your server unless your server has less than 128MB of RAM.
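To put the "30MB for 2 seconds" figure in context, the average RAM held by searches can be estimated as (searches per second) x (seconds per search) x (MB per search). The sketch below does that arithmetic. The 30MB and 2 second values are the illustrative ones from above; the search rates are assumptions chosen for the example, not measurements from any real site.

```python
def average_search_ram_mb(searches_per_second: float,
                          seconds_per_search: float,
                          mb_per_search: float) -> float:
    """Estimate the average RAM held by in-flight searches.

    The average number of searches running at once is the arrival rate
    multiplied by how long each search lasts (Little's law).
    """
    concurrent_searches = searches_per_second * seconds_per_search
    return concurrent_searches * mb_per_search


MB_PER_SEARCH = 30.0       # illustrative figure quoted above
SECONDS_PER_SEARCH = 2.0   # illustrative figure quoted above

# Hypothetical traffic levels: one search every 10 seconds, one every
# 2 seconds, and 5 searches per second.
for rate in (0.1, 0.5, 5.0):
    ram = average_search_ram_mb(rate, SECONDS_PER_SEARCH, MB_PER_SEARCH)
    print(f"{rate} searches/s -> about {ram:.0f} MB in use on average")
```

This gives an average, not a worst case: bursts of simultaneous searches will briefly use more, so leave some headroom when sizing the server.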

      -----
      David



      • #4
        In answer to the second post.

        You can wrap our PHP script in your own PHP script to combine the Zoom output with the output of other scripts. This can be useful for adding dynamic page headers and footers, as well as for uses like the ones you outlined above.

        Details are here,
        http://www.wrensoft.com/zoom/support/faq_ssi.html

        -----
        David
