Missing results when searching for exact phrases


  • Missing results when searching for exact phrases

    I've come across an oddity where a search reports no results even though there are indexed files containing the exact text.

    It occurs most frequently on larger files (300 KB+). I was able to whittle a file down to 2 KB, though, and still reproduce the issue. I have narrowed the file down to a point where pretty much any change at the top of the file affects an exact phrase match at the bottom of the file. It seems to be related to how many times the words in the exact phrase occur in the document before the exact match.

    It seems to occur once the words in the phrase appear around 290 times or more before the exact match. The threshold doesn't seem to be exact, though, and appears to depend on the combined count of both words in the phrase that occur before the match. Maybe there is some kind of seek/search limit for performance (I tried changing the optimization setting, though)?

    This is kinda a big issue for us. We have many large pages that are long lists of system definitions. It's not good that the search sometimes fails to return these pages when one of them may be the only file containing the exact system phrase. And who knows how many files it actually affects; it may affect 30 KB files that contain a table with a word repeated many times before other text...
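To get a rough idea of which pages might be at risk, something like the following could count how often the words of a phrase occur before its first exact match. This is only a sketch: the tag stripping and tokenization here are simplifying assumptions and will not match the indexer's own word splitting, and `phrase_words_before_match` is a made-up helper name, not part of Zoom.

```python
import re
from typing import List, Optional


def tokenize(html: str) -> List[str]:
    """Crudely strip tags and split into lowercase word tokens (an assumption,
    not the indexer's real tokenizer)."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_words_before_match(html: str, phrase: str) -> Optional[int]:
    """Count tokens that belong to the phrase and occur before the first
    exact match of the phrase. Returns None if the phrase never occurs."""
    tokens = tokenize(html)
    words = phrase.lower().split()
    vocab = set(words)
    n = len(words)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == words:
            # Only tokens strictly before the match start are counted.
            return sum(1 for t in tokens[:i] if t in vocab)
    return None
```

Running this over a site export and sorting by the returned count would give a shortlist of pages worth re-testing by hand.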

    Example file:
    Code:
    <html>
    <head><title>Test Doc</title></head>
    <body id="topic">
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    <p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
    
    <p>day two testing</p>
    
    </body>
    </html>
    Search: "day two"
    Result: No results found.

    Now remove, edit, or change any of the repeated paragraphs at the top of the test file.
    After re-indexing and searching again, bam, the exact phrase in the last paragraph ("day two") is correctly matched and returned as a result.
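For anyone who wants to probe where the threshold sits, a small generator for variants of the test file may help. It is a sketch only: `build_test_doc` and its parameters are hypothetical names, and the defaults are meant to approximate the example file above rather than reproduce it byte for byte.

```python
def build_test_doc(paragraphs: int = 12, days_per_paragraph: int = 23) -> str:
    """Build an HTML test page with repeated filler paragraphs followed by
    one paragraph containing the exact phrase "day two"."""
    filler = " ".join(["day"] * days_per_paragraph) + " text two"
    lines = [
        "<html>",
        "<head><title>Test Doc</title></head>",
        '<body id="topic">',
    ]
    # Near-miss paragraphs: both phrase words appear, but never adjacent
    # in the right order within a paragraph.
    lines += [f"<p>{filler}</p>" for _ in range(paragraphs)]
    lines += ["", "<p>day two testing</p>", "", "</body>", "</html>"]
    return "\n".join(lines) + "\n"
```

Writing the output of, say, `build_test_doc(paragraphs=n)` for a few values of `n` to separate files, re-indexing, and searching for "day two" should show at which count the match disappears.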

    Why would the number of occurrences of individual words from an exact phrase query cause a non-match when the exact phrase actually appears further down in the document? I tried changing the optimization setting between Fast, Default, and Slow, with the same results.

    Indexed with Zoom Search Engine Indexer 7.1 build 1002
    Core Engine: Version 7.1 (Build: 1002) on Windows 7
    ASP.NET Server Control 32-bit build 1002b. Assembly Version: 7.0.5962.22541

    Sorry for the long-winded post; the issue is a bit tricky to reproduce and somewhat random.
    Last edited by sanGeoff; May-03-2016, 06:14 PM.

  • #2
    I must say you have a thorough test team -- yes, we were able to confirm this bug.

    This was the result of an optimization in our original search algorithm that depended on certain index data. We've since changed that data (for other benefits), so the optimization is no longer applicable.

    We will remove this optimization, but as a result, exact phrase matching will be slower in some edge cases such as the one you're testing (where the document contains many occurrences that ALMOST match the exact phrase).
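The trade-off can be illustrated with a generic positional-index phrase matcher. To be clear, this is not Zoom's code, and the actual optimization is not described in this thread; the `max_candidates` cap below is a purely hypothetical stand-in for any shortcut that stops examining candidate positions early.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_positions(tokens: List[str]) -> Dict[str, List[int]]:
    """Map each word to the sorted list of positions where it occurs."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return positions


def phrase_match(positions: Dict[str, List[int]], phrase: List[str],
                 max_candidates: Optional[int] = None) -> bool:
    """Return True if the phrase words occur at consecutive positions.

    max_candidates is a hypothetical cap on how many start positions are
    examined; None means every candidate is checked.
    """
    if not phrase:
        return False
    starts = positions.get(phrase[0], [])
    if max_candidates is not None:
        starts = starts[:max_candidates]  # the shortcut: give up early
    following = [set(positions.get(w, [])) for w in phrase[1:]]
    for start in starts:
        # Each later word must sit exactly k+1 positions after the start.
        if all(start + k + 1 in s for k, s in enumerate(following)):
            return True
    return False
```

With many near-misses ahead of the real match, a capped scan can give up before reaching it, while the exhaustive scan has to walk every occurrence of the first word. That extra walking is the kind of cost an exhaustive check pays on documents like the test file.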

    If you are interested, you can contact us by e-mail and we can send private test/patch builds to you directly. Address them to Ray and the tickets will be forwarded to me.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      There's a new build up with the aforementioned fix here.
      --Ray



      • #4
        Awesome, looks great, thanks again. You guys are awesome fast.

        I will let you know if our users find anything else. I may try email in the future now that we have upgraded to 7.

        FYI, just in case you missed it, there is a new title highlighting issue I posted about on the other thread. My JavaScript workaround for that works for now, though.
