Indexing halts on CRC match

  • Indexing halts on CRC match

    I have noticed that my indexer stops indexing when a CRC match of a second document is found.

    Anyone seen this?

  • #2
    Can you describe how it "stops indexing"? E.g. does it finish and create the index files, or does it just sit there as if it's still looking for files (and do so for a very long time)?

    Can you also check that you are using the latest build of Zoom available (Version 4.0 Build 1016):
    http://www.wrensoft.com/zoom/whatsnew.html

    If this continues to happen, save the index log after indexing ("File"->"Save index log to file"), make sure you have Verbose mode on, and e-mail the log as well as your ZCFG file to us via zoom [at] wrensoft [dot] com.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine


    • #3
      It just halts.

      No files are saved; it just sits there.

      I'll test it with 1016.


      • #4
        The problem still exists.

        Testing with the latest build results in the same problem.

        It halts (hangs).

        If I click Stop, it completes the indexing and uploads the new indexes.

        See the line near the end:
        "15:09:11 - [SKIPPED] Skipping http://www.wdf.org/ (Identical page found: CRC-32 signature matched)"

        Please see the log:
        http://www.conway.com/search/cdi-indexes-log.txt


        • #5
          Think we've found the problem. It's actually when you have a start point which fails a CRC-32 check. Zoom then doesn't realize it should move on to the next start point (or stop indexing).

          We'll fix this bug in Version 4.1. In the meantime, you can remove this start point (http://www.wdf.org/) or disable CRC-32.

          The reason it fails, by the way, is because of http://www.conway.com/wdf/ which was indexed earlier and is the same page.
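
The behaviour described above can be illustrated with a minimal sketch. This is not Zoom's actual code; `index_start_points` and the `fetch` callable are hypothetical names, and only the start pages themselves are checked (no link crawling). It shows the intended handling: when a start point's CRC-32 matches a page indexed earlier, it is skipped and the loop moves on to the next start point.

```python
import zlib

def index_start_points(start_points, fetch):
    """Index each start point, skipping pages whose CRC-32 was already seen.

    `fetch` is a hypothetical callable that maps a URL to the page bytes.
    """
    seen = {}      # CRC-32 value -> first URL indexed with that content
    indexed = []   # URLs that were indexed
    skipped = []   # (skipped URL, URL of the identical page indexed earlier)
    for url in start_points:
        crc = zlib.crc32(fetch(url))
        if crc in seen:
            # Identical page found: record the skip and continue with the
            # next start point rather than waiting on this one.
            skipped.append((url, seen[crc]))
            continue
        seen[crc] = url
        indexed.append(url)
    return indexed, skipped

# Two URLs from the thread, with made-up identical page content.
pages = {
    "http://www.conway.com/wdf/": b"<html>WDF home</html>",
    "http://www.wdf.org/": b"<html>WDF home</html>",
}
indexed, skipped = index_start_points(list(pages), pages.__getitem__)
print(indexed)
print(skipped)
```

Here the second start point is reported as skipped and the function still returns, which is the outcome the 4.1 fix is meant to produce.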
          --Ray


          • #6
            Excellent work guys!

            Would it work out better to tell it to ignore the directory http://www.conway.com/wdf/?

            Then, when it reached the wdf.org domain, would it work?

            I'll ask before looking, but how do I tell it to ignore the http://www.conway.com/wdf/ path?


            • #7
              I added
              http://www.conway.com/wdf/
              to the skip list.

              It still hung at the same place.

              I also looked in the logs. The earlier log shows that it found and indexed the http://www.conway.com/wdf/ path, and the new log with the skip instruction shows no evidence that the string "http://www.conway.com/wdf/" was read.

              So it skipped that path and still hung.


              • #8
                Perhaps another URL somewhere else on your site (or one of the ones you're indexing) has another copy of the same page?

                What you should try is rearranging your start points so that "http://www.wdf.org/" is first. Because the order of the start points changes the order in which pages are indexed, this would allow that site to be indexed first, and the other "copies" of this page would then be skipped accordingly.
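
To see why the order matters, here is a small sketch, assuming the duplicate check simply keeps the first page seen for each CRC-32 value. `kept_urls` is a hypothetical helper, and the page content is made up.

```python
import zlib

def kept_urls(ordered_urls, content_by_url):
    """Return the URLs that survive a first-seen-wins CRC-32 duplicate check."""
    kept = {}  # CRC-32 value -> first URL seen with that content
    for url in ordered_urls:
        # setdefault only stores the URL if this CRC-32 is new.
        kept.setdefault(zlib.crc32(content_by_url[url]), url)
    return list(kept.values())

content = {
    "http://www.conway.com/wdf/": b"<html>WDF home</html>",
    "http://www.wdf.org/": b"<html>WDF home</html>",
}
original_order = ["http://www.conway.com/wdf/", "http://www.wdf.org/"]
reordered = ["http://www.wdf.org/", "http://www.conway.com/wdf/"]
print(kept_urls(original_order, content))
print(kept_urls(reordered, content))
```

With the original order the conway.com copy is kept; with "http://www.wdf.org/" first, that copy is kept and the conway.com one becomes the duplicate.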
                --Ray


                • #9
                  This is strange, because I re-ran the index after removing the "*" from the end of the path in question (I thought it was required), and now it flies by the start point with CRC-32 enabled.

                  Good call!


                  • #10
                    Glad to hear that you've got it working.

                    Just to clarify - the skip pages list does not support wildcards (e.g. "*"); it works more like a list of keywords that are matched against the URL. There are some examples in the Help file if you need more info. Hope that helps.
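
A rough model of that matching, assuming each skip entry is treated as a plain substring of the URL (the exact rules, such as case handling, are in the Help file and are not modelled here). `should_skip` is a hypothetical name, not part of Zoom.

```python
def should_skip(url, skip_list):
    """Return True if any skip-list entry occurs literally in the URL."""
    # Each entry is a literal keyword; "*" has no special meaning here.
    return any(entry in url for entry in skip_list)

url = "http://www.conway.com/wdf/index.html"

# The plain path occurs inside the URL, so the page is skipped.
print(should_skip(url, ["http://www.conway.com/wdf/"]))

# With a trailing "*" the entry is looked for literally, asterisk included,
# so it never matches a real URL.
print(should_skip(url, ["http://www.conway.com/wdf/*"]))
```

Under this model, an entry ending in "*" silently matches nothing, which is consistent with the skip instruction appearing to have no effect.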
                    --Ray
