Match and highlight words w/spec. chars

  • Match and highlight words w/spec. chars

    Hi Ray and David, hello everyone!

    It's been a long time since my last post, but that doesn't mean we have stopped using ZOOM. Quite the opposite: our intranet site has meanwhile grown to nearly 300,000 HTML pages, and ZOOM-CGI does a very good job.

    I have been testing the highlighting of search words on result pages, and I don't have any problem with the setup itself. Using my favourite HTML editor, I modified all the files, because I had to put the style and script tags in the head section and call the highlight function in the body's onLoad event. All of that is OK.

    The function works fine, but it doesn't highlight any word that contains a special character. For example, when I search for "član" (Croatian for "article"), ZOOM passes the parameter zoom_highlight=%E8lan to the highlight function, which finds nothing, because there is no such word on the page (I mean "%E8lan").

    Any idea how to avoid this problem?

    Thanks in advance. Looking forward to ZOOM 4.1.
    Regards,
    Nenad
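The mismatch described above comes down to bytes: in windows-1250, "č" is the single byte 0xE8, so "član" is percent-encoded as %E8lan, and the highlighter can only find the word if it decodes those bytes using the page's charset. The minimal sketch below is written in Python only because its standard library can decode percent-escapes in any charset; the helper name decode_param is illustrative. For comparison, in browser JavaScript decodeURIComponent() assumes UTF-8 and throws a URIError on "%E8lan", while the older unescape() maps each byte to the Latin-1 character with the same code, producing "èlan".

```python
from urllib.parse import quote, unquote

# Query value from the first post: "član" percent-encoded as windows-1250.
PARAM = "%E8lan"


def decode_param(value: str, charset: str) -> str:
    """Percent-decode `value`, interpreting the resulting bytes in `charset`."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return unquote(value, encoding=charset, errors="replace")


as_cp1250 = decode_param(PARAM, "cp1250")   # byte 0xE8 is "č" in windows-1250
as_latin1 = decode_param(PARAM, "latin-1")  # the same byte is "è" in Latin-1
as_utf8 = decode_param(PARAM, "utf-8")      # a lone 0xE8 is invalid UTF-8

print(as_cp1250, as_latin1, ascii(as_utf8))

# Encoding the word as UTF-8 instead gives a two-byte escape for "č".
print(quote("član", encoding="cp1250"), quote("član", encoding="utf-8"))
```

Only the windows-1250 interpretation recovers "član"; the other two produce strings that cannot match the word on the page.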

  • #2
    Hi Nenad,

    What encoding (or charset) does your webpage use? Can you give us a URL to these pages so that we can take a look?
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine

    • #3
      Hi Ray,

      unfortunately, the pages belong to our intranet site, so they are not accessible from the Internet. The character set we use is "windows-1250".

      Do you see any problem with it?

      By the way, ZOOM works very well with the windows-1250 encoding.
      Regards,
      Nenad

      • #4
        It would depend on the way it is entered on the page.

        For example, if the "č" character was actually entered as a HTML entity (ie: "č"), it would fail to find it. The highlighting script does not try to find the HTML entity equivalent of the search text (although perhaps we can make it do this in the next revision).

        However, if the character is stored in the document as it is and not as a HTML entity (which is possible with the windows-1250 charset), then the search script appears to have no trouble highlighting it in our testing.
        --Ray

        • #5
          Hi Ray & David,

          I'm still having a problem with our characters and the highlight function. The function works nicely, but it only highlights words without those characters.

          To be honest, I don't quite understand your explanation about how the character is entered into the HTML file. I tried a lot and experimented with different charsets, but so far with no result.

          Originally posted by Ray
          It would depend on the way it is entered on the page.

          For example, if the "č" character was actually entered as a HTML entity (ie: "č"), it would fail to find it. The highlighting script does not try to find the HTML entity equivalent of the search text (although perhaps we can make it do this in the next revision).

          However, if the character is stored in the document as it is and not as a HTML entity (which is possible with the windows-1250 charset), then the search script appears to have no trouble highlighting it in our testing.
          Any idea about what I should do next would be very much appreciated. Thanks in advance.
          Regards,
          Nenad

          • #6
            Oops, I just noticed that my explanation was confusing because the HTML entity example I gave you got interpreted by the browser and it didn't show properly.

            What it should say is:

            For example, if the "č" character was actually entered as an HTML entity (i.e. as a character reference such as "&#269;" in the source code of the original web page), it would fail to find it. The highlighting script does not try to find the HTML entity equivalent of the search text (although perhaps we can make it do this in the next revision).

            However, if the character is stored in the document as it is (eg. with a keystroke) and not as a HTML entity (which is possible with the windows-1250 charset), then the search script appears to have no trouble highlighting it in our testing.
            Perhaps you could give us a URL to the page in question so that we can have a look, or e-mail us some examples?
            --Ray
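Assuming the highlighting script matches the search word against the raw page source, a character written as "&#269;" will never equal "č". One way a highlighter could look for the entity equivalents Ray mentions is to let each non-ASCII character also match its decimal or hexadecimal numeric character reference and, where HTML 4 defines one, its named entity such as &scaron;. The Python sketch below illustrates that idea; entity_tolerant_pattern is a hypothetical helper, not part of Zoom's script, and it makes no attempt to skip matches inside tags or attributes.

```python
from __future__ import annotations

import re
from html.entities import codepoint2name


def entity_tolerant_pattern(word: str) -> re.Pattern[str]:
    """Hypothetical helper: compile a regex matching `word` in raw HTML source,
    where each non-ASCII character may appear literally, as a decimal or hex
    numeric character reference, or as an HTML 4 named entity."""
    parts = []
    for ch in word:
        cp = ord(ch)
        if cp < 128:
            parts.append(re.escape(ch))
            continue
        alternatives = [
            re.escape(ch),                  # literal character, e.g. "č"
            f"&#0*{cp};",                   # decimal reference, e.g. "&#269;"
            f"&#[xX]0*(?:{cp:x}|{cp:X});",  # hex reference, e.g. "&#x10d;"
        ]
        name = codepoint2name.get(cp)       # e.g. 353 -> "scaron"; none for "č"
        if name is not None:
            alternatives.append(f"&{name};")
        parts.append("(?:" + "|".join(alternatives) + ")")
    return re.compile("".join(parts))


source = "<p>Zakon, &#269;lan 5. i &scaron;uma</p>"
print(entity_tolerant_pattern("član").sub(lambda m: f"<b>{m.group(0)}</b>", source))
print(bool(entity_tolerant_pattern("šuma").search(source)))
```

Wrapping the whole match keeps the entity intact in the output, so the browser still renders "član" inside the highlight.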

            • #7
              Thanks for the quick response and for the additional explanation.

              I entered the characters the right way (as they are in the 1250 code page, I mean not as &scaron;), but the highlight function doesn't recognize them.

              I also did some tests with German and French characters. Some of them work fine, but some don't (for example, the same thing happens with the French character é).

              Because the files are not online, I'll e-mail the example (a test file) to you.

              I hope you will find the reason why the highlight function doesn't like some characters.

              Thanks in advance
              Regards,
              Nenad

              • #8
                We have looked into this further, and we did find a bug which was related to some words not being highlighted. This fix will be available in the next public release.

                However, the problem persists for some Croatian characters in the windows-1250 charset. The problem is that JavaScript has no built-in way to decode these parameters from their windows-1250 URI encoding. We will continue to keep an eye out for a solution to this, but in the meantime we will update our FAQ and documentation with this as a known issue.
                --Ray
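Until a proper fix is available, one possible workaround is for the page's script to decode the zoom_highlight value itself, using its own table of windows-1250 byte values, instead of relying on JavaScript's built-in decoders (which assume UTF-8 or Latin-1). The sketch below shows that table-lookup idea in Python for brevity; decode_cp1250_param and CP1250_CROATIAN are hypothetical names, the table covers only the Croatian letters outside ASCII, and any other byte above 0x7F is replaced with U+FFFD rather than guessed.

```python
from __future__ import annotations

import re

# Hypothetical table: windows-1250 byte values of the Croatian letters
# outside ASCII. A browser-side script would need to embed something similar.
CP1250_CROATIAN = {
    0x8A: "Š", 0x9A: "š",
    0x8E: "Ž", 0x9E: "ž",
    0xC6: "Ć", 0xE6: "ć",
    0xC8: "Č", 0xE8: "č",
    0xD0: "Đ", 0xF0: "đ",
}


def decode_cp1250_param(value: str) -> str:
    """Percent-decode a query value that was encoded as windows-1250.

    ASCII bytes map to themselves, Croatian letters come from the table,
    and any other byte above 0x7F becomes U+FFFD."""
    value = value.replace("+", " ")  # form-encoded spaces arrive as '+'

    def byte_to_char(match: re.Match[str]) -> str:
        byte = int(match.group(1), 16)
        if byte < 0x80:
            return chr(byte)
        return CP1250_CROATIAN.get(byte, "\ufffd")

    return re.sub(r"%([0-9A-Fa-f]{2})", byte_to_char, value)


print(decode_cp1250_param("%E8lan"))           # the word from the first post
print(decode_cp1250_param("%9Euta+%9Aljiva"))  # two words, '+' as the space
```

For example, decode_cp1250_param("%E8lan") returns "član". Current browsers also support TextDecoder with the "windows-1250" label, which may make a hand-written table unnecessary there.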
