How does the auto index work?


  • How does the auto index work?

    I have set the scheduler to run every 5 minutes, but I'm curious how the indexing works.

    Does it re-index the whole folder each time?
    OR
    Does it index only the files newly added to the folder, plus previously indexed files that have been modified since the last index?

    Ultimately, I would like to know whether a scheduled index would take a very long time to finish.

    Thanks.

  • #2
    I don't know anything about your web site, but every 5 minutes would be far too often for most applications. Once a week or once a month would be more typical, or perhaps once per day, during the night, for sites that are updated a lot.

    You also want to avoid having a new indexing session start before the previous one is complete, which is a risk with intervals of only 5 minutes.
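One way to guard against overlapping sessions is to have the scheduler call a small wrapper that refuses to start while a previous run is still active. The following is only a sketch of that idea using a lock file; the lock location and the command passed in are placeholders, not anything specific to the indexer.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical lock location; any path writable by the scheduled task would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "indexer.lock")


def run_exclusively(command, lock_path=LOCK_PATH):
    """Run `command` unless another run holds the lock.

    Returns the command's exit code, or None if the run was skipped.
    """
    try:
        # O_EXCL makes creation fail if the file already exists,
        # so only one process can acquire the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None  # a previous session is still running; skip this one
    try:
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))
        return subprocess.call(command)
    finally:
        # Release the lock even if the command fails.
        os.remove(lock_path)
```

The scheduled task would then call something like `run_exclusively([...])` with whatever command starts the indexing session. Note that if the wrapper is killed abruptly, the lock file stays behind and has to be removed by hand.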

    The default behaviour is to re-index everything. This can be altered with the incremental indexing options.

    The amount of time required for indexing depends on the size of your site, how remote the site is from you, the options you select and the hardware being used. There are some example V4 indexing benchmarks here.



    • #3
      I'm actually indexing local documents. New text files are being generated on an ongoing basis. I tried exploring the incremental indexing options, but it seems to me they apply to web sites only.

      Did I do it the wrong way, or are there other options for indexing local documents?



      • #4
        The "Updating an existing index" option is only available in Spider Mode. It is not available in Offline Mode. Offline Mode indexing does not use any Internet traffic and is many times quicker than Spider Mode, so an incremental update should not be necessary, and we would recommend a full re-index for Offline Mode users.

        Even though you can't do an incremental update, it is however possible to add new documents to an existing offline index if you know which documents are new.

        How many documents are you indexing?
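If the list of new documents is not known in advance, it can be worked out from file modification times. The sketch below is one possible approach, not a feature of the indexer: it walks a folder and returns the files changed after a given timestamp. The function name and the idea of recording the time of the last index run are assumptions made for illustration.

```python
import os


def files_modified_since(root, since_epoch):
    """Return a sorted list of files under `root` modified after `since_epoch`.

    `since_epoch` is a Unix timestamp, e.g. the time the last index run started.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > since_epoch:
                    changed.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(changed)
```

The resulting list could then be used to decide which documents to add to the existing offline index.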



        • #5
          If I start indexing now, it would be about 300,000+ documents, and this figure is still increasing.

          Given this figure, is there a rough time gauge for how long I can expect to wait?



          • #6
            In our tests, we got to around 100 files per second in offline mode. But that was with small HTML files (~10KB each) and reasonable hardware.

            I think with cutting edge hardware you might get up to around 150 files per sec. With bleeding edge hardware maybe 200 files per second.

            If you can match our rate (on reasonable hardware), you can index your 300,000 files in about 50 min (300,000 / 100 = 3,000 seconds).

            If you are on the bleeding edge, it might be as fast as 25 min (300,000 / 200 = 1,500 seconds).
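For other document counts or rates, the estimate is simply the file count divided by the throughput. A minimal sketch of that arithmetic, using the rates quoted in this post (the function name is made up for illustration, and real throughput will depend on file size and hardware):

```python
def estimated_minutes(file_count, files_per_second):
    """Rough indexing time in minutes at a constant throughput."""
    return file_count / files_per_second / 60


for rate in (100, 150, 200):
    print(f"{rate} files/s: {estimated_minutes(300_000, rate):.1f} min")
# prints:
# 100 files/s: 50.0 min
# 150 files/s: 33.3 min
# 200 files/s: 25.0 min
```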



            • #7
              Hmm... thinking it through again, I may only need to index about 10,000 documents regularly. The other 290,000 documents would only need to be indexed once, as they are old documents with no new updates.

              Is there any way I can specify which folders not to index? That way, I would be able to index the old documents once and re-index the "newer" folders more frequently.



              • #8
                You can create two sets of index files, e.g. "search archived material" and "search recent material".

                You can skip entire folders during indexing by adding the folder path to the skip list.

                You can also append new offline folders to an existing index.

                But you can't (within a single set of index files in offline mode) schedule some files to be updated more often than others.
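To decide which folders belong in the "archived" set and which in the "recent" set, one option is to look at the newest file in each top-level folder. The following is a sketch under that assumption; the function names and the age threshold are hypothetical, and the indexer itself is not involved.

```python
import os
import time


def newest_mtime(folder):
    """Return the latest modification time of any file under `folder` (0.0 if none)."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            try:
                newest = max(newest, os.path.getmtime(os.path.join(dirpath, name)))
            except OSError:
                continue  # file vanished or is unreadable
    return newest


def split_folders(root, max_age_days, now=None):
    """Split the top-level folders of `root` into (recent, archived) path lists.

    A folder counts as recent if any file in it changed within `max_age_days`.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    recent, archived = [], []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if not os.path.isdir(path):
            continue
        (recent if newest_mtime(path) >= cutoff else archived).append(path)
    return recent, archived
```

The `archived` paths are candidates for the skip list of the frequently rebuilt index, while the `recent` paths would be skipped in the one-off archive index.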

