Exact phrase search & garbled text


  • Exact phrase search & garbled text

    We are currently using Zoom Search 6, but I tested this on the free version of 7.
    Three people in our department have already pointed this issue out to me.

    When a user searches for a phrase, the result title or context text often comes back garbled. It looks as if the engine is trying to highlight the matched words in the phrase but ends up slightly offset in the text stream. It seems to happen in the context text too, but that is harder to reproduce than in the title.

    For example, in the test below the title of the matched document is "ABC The quick Fox" and the search is "ABC Search", but Zoom Search displays "The quick Fox Fox".


    Test File:
    Code:
    <html xml:lang="en-us" lang="en-us">
    <head>
    <title>ABC The quick Fox</title>
    </head>
    <body id="topic_1">
     Text test content. ABC Search Test.
    </body>
    </html>
    Search query:
    Code:
    "ABC Search"
    Results:
    Code:
    1 result found. 
    The quick Fox Fox
    ... ABC Search Test. ...
    
    Search took 0.004 seconds
    Setup:
    Zoom Search Indexer: 7.1 (1001)
    ASP.NET Server Control: Zoom Search 32-bit (7.1 Build 1000)

    Is this a known issue? What can be done about it?

    Thanks.

  • #2
    You can even get odd results on the Zoom search on this site:

    Example: http://www.wrensoft.com/search.php?z...&zoom_cat[]=-1

    Highlighted text is duplicated, slightly different issue.
    Last edited by sanGeoff; Apr-13-2016, 09:48 PM.


    • #3
      This was a bug that should have been fixed as of V7.0 build 1024.

      The live wrensoft.com search you referenced was using an earlier build. We have just updated the CGI to the latest build, and it is now fixed.

      I also just tested the latest ASP.NET Server Control and confirmed the bug has been fixed.

      Can you double-check that you are using the latest V7.1 build 1000 version of the ASP.NET Server Control? Note that you have to uninstall the existing server control to install the new one. It is possible you are actually still loading the old one, especially if you have multiple paths.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine


      • #4
        Thanks for the info.

        I just checked and tested, and I am pretty sure I am using the latest version. I switched between the 32-bit and 64-bit ASP.NET Server Controls and know the change took effect because I had to adjust the application pool's 32-bit support.

        Are the versions available on the site the latest?
        http://www.wrensoft.com/zoom/aspdotnet.html
        The footer says (Version 7.1 Build 1000).
        The timestamp on the DLL files says 1/12/2016.

        I added <% Response.Write((typeof(ASPXSearch).Assembly.GetName().Version)); %> to the search page.
        It reports: 7.0.5855.30451

        Is that the latest that should be fixed?
        It seems to occur only when the partial phrase word match is the first word in the title.
        It just strips that word out of the text and then repeats text at the end (sometimes even merging words).
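A note on the version number reported above: if the control is built with a wildcard AssemblyVersion such as 7.0.* (an assumption; nothing in this thread confirms it), the .NET compiler sets the third field to the number of days since 1 January 2000 and the fourth to the number of seconds since local midnight divided by two. Under that assumption, a small JavaScript sketch can turn the number into a build date to compare with the DLL timestamp. decodeAssemblyVersion is a hypothetical helper name, not part of Zoom.

```javascript
// Hypothetical helper: decode an auto-generated .NET assembly version.
// Assumes the "major.minor.*" wildcard scheme, where
//   build    = days since 2000-01-01
//   revision = seconds since local midnight, divided by 2
function decodeAssemblyVersion(version) {
  const [major, minor, build, revision] = version.split(".").map(Number);
  const date = new Date(Date.UTC(2000, 0, 1) + build * 86400000);
  const seconds = revision * 2;
  const pad = (n) => String(n).padStart(2, "0");
  const hh = pad(Math.floor(seconds / 3600));
  const mm = pad(Math.floor((seconds % 3600) / 60));
  const ss = pad(seconds % 60);
  return {
    product: `${major}.${minor}`,
    date: date.toISOString().slice(0, 10), // build date, YYYY-MM-DD
    time: `${hh}:${mm}:${ss}`,             // local time on the build machine
  };
}

console.log(decodeAssemblyVersion("7.0.5855.30451"));
// { product: '7.0', date: '2016-01-12', time: '16:55:02' }
```

For 7.0.5855.30451 this gives 2016-01-12, which agrees with the 1/12/2016 timestamp on the DLL files, so the number looks like a build date rather than a separate "7.0" release.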


        Complete page results:
        Code:
        Search this site
        
        Enter one or more keywords to search for using the Zoom Search Engine.
         Note that '*' and '?' wildcards are supported. 
        
            Search  
        
        Search results for: "abc search"
        1 result found. 
        
        1. The quick Fox Fox
        ... ABC Search Test. ...
        
        Search powered by Zoom Search Engine
        
        Search took 0.002 seconds
        7.0.5855.30451

        Indexed page:
        Code:
        <html xml:lang="en-us" lang="en-us">
        <head>
        <title>ABC The quick Fox</title>
        </head>
        <body id="topic_1">
         Text test content. ABC Search Test.
        </body>
        </html>
        Thanks.
        Last edited by sanGeoff; Apr-14-2016, 04:54 PM.


        • #5
          It might be a different scenario. The situation I tested was the one you observed on our website. I'll try to reproduce the "ABC The Quick Fox" page you quoted above and see if I can replicate it and report back.
          --Ray


          • #6
            I tested with a vanilla index without anything else changed other than the scan directory and platform (ASP.NET).
            Here is a better example that displays the problem:

            Page Results:
            Code:
            Search results for: "configuring search"
            1 result found. 
            1. Server Settingrver Setting
            ... Configuring search system. ...
            Terms matched: 1  -  Score: 61  -  19 Apr 2016  -  URL: http://www.example.com/test.html
            Search powered by Zoom Search Engine
            Indexed page:
            Code:
            <html xml:lang="en-us" lang="en-us">
            <head>
            <title>Configuring Server Setting</title>
            </head>
            <body id="topic_1">
             Text test content. Configuring search system.
            </body>
            </html>
            I was able to reproduce it on this site search as well:

            https://www.wrensoft.com/search.php?...m_cat%5B%5D=-1

            Correct title:
            Google Chrome Benchmarks - JavaScript engines compared (Chrome vs Firefox vs IE)

            Result Text:
            Benchmarks - JavaScript engines compared (Chrome vs Firefox vs IE)fox vs IE)


            Here is another example:
            http://www.wrensoft.com/search.php?z...&zoom_cat[]=-1
            Last edited by sanGeoff; Apr-19-2016, 07:00 PM.


            • #7
              We've confirmed the bug. It is indeed a different one from the earlier one. We'll have it fixed for the next build (V7.1 build 1002), and there will also be a new ASP.NET Server Control for it; both should be out in the next week or so.

              Thanks for bringing it to our attention.
              --Ray


              • #8
                Wow, looks like build 1002 is already available (4/22).
                Thanks for looking into the issue and fixing it amazingly fast.

                I did notice one quick thing you might want to check into.
                It has to do with the generated files from the Zoom Indexer.
                The version string in the settings.zdat around line 90 changed a bit.

                In the previous version (build 1001) the settings.zdat file says:
                Version = "Version 7.1 (1001)"

                In build 1002, which I just downloaded, settings.zdat says:
                Version = "Version 7.0 (Debug build)"

                Not sure if that makes any difference, but it might indicate an incorrect build.

                Thanks again.


                • #9
                  Well, looks like it was almost fixed. Someone pointed out to me today a duplicated word in a title.
                  This time it's not at the beginning, and it seems to be just the partial-match word that is duplicated.

                  Page Results:
                  Code:
                  Search results for: "server configuration"
                  1 result found. 
                  Test Server Server Setting
                  ... Server configuration is cool. ...
                  Search took 0.002 seconds
                  Indexed page:
                  Code:
                  <html xml:lang="en-us" lang="en-us">
                  <head>
                  <title>Test Server Setting</title>
                  </head>
                  <body id="topic_1">
                   Text test content. Test search system. Server configuration is cool.
                  </body>
                  </html>
                  Indexed with Zoom Indexer 7.1 build 1002, using the build 1002 32-bit Zoom ASP.NET control.
                  Assembly version: 7.0.5956.30654

                  Also, the What's New page: http://www.wrensoft.com/zoom/whatsnew.html#windows
                  says "Fixed CGI bug with garbled titles...". Was it in the CGI as well? What about ASP.NET?

                  Thanks again, sorry for being a stickler.


                  • #10
                    You can download the new ASP.NET Server Control and let us know if you still have any problems.

                    Yes, the CGI and the ASP.NET control share the same code base, so the news item was supposed to refer to both. This has now been corrected, as has the Debug version text.

                    Thanks for bringing these things to our attention.
                    --Ray


                    • #11
                      Excellent, looks like the new ASP.NET Server Control (1002b) fixes the secondary double word issue.
                      Thanks again for the extremely fast response and fix. Looks good now.


                      • #12
                        Hate to be the bearer of bad news, but it looks like the second fix, the one for the double word issue, caused a regression in my first issue. If you go back and test my earlier example in this thread again, it's now excluding the second word in the title: instead of the correct title Configuring Server Setting, it just displays Configuring Setting. I suspect the duplicate-word fix, which targeted matches that were not at the beginning, also removes the single correct word when the phrase match is at the beginning.


                        Also, as a minor issue, when there is an actual phrase match in the title, the highlighted text is missing the space after it. Ex: searching my example above for "configuring server" returns a title of Configuring ServerSetting

                        I can simply fix that issue, though, with a little CSS or JavaScript, for example:
                        .result_title .highlight:after { content: " "}


                        Indexed with Zoom Search Engine Indexer 7.1 build 1002
                        Core Engine: Version 7.1 (Build: 1002) on Windows 7
                        ASP.NET Server Control 32-bit build 1002b. Assembly Version: 7.0.5962.22541
                        Last edited by sanGeoff; Apr-29-2016, 06:33 PM.


                        • #13
                          Clearly the highlight algorithm was more flawed than we first anticipated. It involved a bit more of a rewrite to address the various situations, and we clearly need a better way to keep track of test scenarios.

                          Hopefully third time's the charm. Here's another patch release of the ASP.NET Server Control:
                          http://www.wrensoft.com/zoom/aspdotnet.html
                          --Ray


                          • #14
                            Thanks again for the quick reply and fix.
                            Looks good to me, even the space issue is fixed.

                            Hopefully our phrase searchers in our group don't find anything else.


                            • #15
                              Aw, darn, less than 24 hours and two people in our group still found an issue.
                              I think they are desperate to find a reason to go back to CHM format and search.
                              I hate to have to continue this thread and report another issue.

                              The new issue seems to be punctuation characters being repeated. The example below uses the '(' character, but I also noticed the same issue with a comma in the title.

                              Example file:
                              Code:
                              <html>
                              <head><title>Zoom Search (ZS)</title></head>
                              <body id="topic">
                              <p>search test.</p>
                              </body>
                              </html>
                              Search: "search test"

                              Below is a table of the result title displayed by each of the recent 32-bit ASP.NET Control versions.
                              Only the first one is correct; the rest all have extra characters.

                              Build        Assembly Version   Result Title Text
                              build 1001   7.0.5855.30451     Zoom Search (ZS)
                              build 1002   7.0.5956.30654     Zoom Search (Search (ZS)
                              build 1002b  7.0.5962.22541     Zoom Search ((ZS)
                              build 1002c  7.0.5966.26855     Zoom Search ( (ZS)
                              All indexed with Zoom Search Engine Indexer 7.1 build 1002
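The table above could also be kept as a small automated check, so each new build of the control can be compared against the same scenario. The following is only a sketch of that idea: visibleText and titleIsIntact are hypothetical helper names, and the tag stripping assumes that highlighting only wraps words in markup and never changes the visible text.

```javascript
// Indexed <title> of the example page in this post.
const indexedTitle = "Zoom Search (ZS)";

// Result titles from the table above, one per ASP.NET control build.
const shownByBuild = [
  ["1001", "Zoom Search (ZS)"],
  ["1002", "Zoom Search (Search (ZS)"],
  ["1002b", "Zoom Search ((ZS)"],
  ["1002c", "Zoom Search ( (ZS)"],
];

// Hypothetical helper: drop any highlight markup and normalise whitespace so
// that only the text a reader would see is compared.
function visibleText(html) {
  return html.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
}

// Highlighting may add tags, but it should never change the visible title.
function titleIsIntact(title, shownHtml) {
  return visibleText(shownHtml) === visibleText(title);
}

// Builds whose displayed title differs from the indexed title.
const brokenBuilds = shownByBuild
  .filter(([, shown]) => !titleIsIntact(indexedTitle, shown))
  .map(([build]) => build);

console.log(brokenBuilds);
```

The same comparison would also flag the earlier cases in this thread, such as the dropped word and the missing space after a highlighted phrase, because each of them changes the visible title text.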

                              Since the duplicated text is pretty distinctive and probably never occurs normally, I was able to create the following temporary JavaScript fix for us:

                              Code:
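                              //tempFix: collapse punctuation that the result title repeats
                              var resultTitles = document.getElementsByClassName("result_title");
                              var tmpTxt;
                              for (var i = 0; i < resultTitles.length; i++) {
                                  tmpTxt = resultTitles[i].innerHTML;
                                  tmpTxt = tmpTxt.replace(/\( \(/g, '(');   // "( (" becomes "("
                                  tmpTxt = tmpTxt.replace(/, ,/g, ',');     // ", ," becomes ","
                                  // innerHTML serializes "&" as "&amp;"; allow one or more spaces between the repeats
                                  tmpTxt = tmpTxt.replace(/&amp; +&amp;/g, '&amp;');
                                  resultTitles[i].innerHTML = tmpTxt;
                              }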
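If more punctuation marks turn up, the three separate replacements could be folded into one pattern. This is a rough sketch rather than a tested fix: collapseRepeatedMarks is a hypothetical name, and the list of characters in the pattern is an assumption that extends the cases seen so far. It would also collapse a title that legitimately contains something like ", ," so it is only suitable as a stopgap.

```javascript
// One mark (an HTML entity such as &amp;, or a punctuation character)
// followed by one or more whitespace-separated repeats of the same mark.
// The character list is an assumption; extend it as new cases turn up.
const REPEATED_MARK = /(&[a-z]+;|[(),;:\[\]])(?:\s+\1)+/g;

// Hypothetical helper: keep only the first copy of each repeated mark.
function collapseRepeatedMarks(html) {
  return html.replace(REPEATED_MARK, "$1");
}

// In the browser, apply it to every result title; skipped when no DOM exists.
if (typeof document !== "undefined") {
  Array.from(document.getElementsByClassName("result_title")).forEach(function (el) {
    el.innerHTML = collapseRepeatedMarks(el.innerHTML);
  });
}
```

Because the pattern requires whitespace between the repeats, it handles the build 1002c output "( (" but deliberately leaves "((" alone, since doubled brackets without a space are more likely to be genuine text.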
                              Last edited by sanGeoff; May-03-2016, 05:35 PM.
