Match and highlight words w/spec. chars



Nenad
05-04-2005, 07:47 AM
Hi Ray and David, hello everyone!

It's been a long time since my last post, but that doesn't mean we are not using ZOOM any more. Just the opposite: our intranet site has meanwhile grown to nearly 300,000 HTML pages, and ZOOM-CGI does a very good job.

I did some testing with highlighting words in the result pages, and in general I don't have any problem with it. I modified all the files with my favourite HTML editor, because I had to put the style and script tags into the head section and call the highlight function in the body onLoad event. That all works.

The function works fine, but it doesn't highlight any word that contains a special character. For example, when I search for "član" (in Croatian it means "article"), ZOOM passes the parameter zoom_highlight=%E8lan to the highlight function, which highlights nothing, because there is no such word on the page (I mean %E8lan).
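
To illustrate where %E8lan comes from: in windows-1250 the letter č is the single byte 0xE8, and that byte is percent-encoded when the search term travels in the URL. The following is only a sketch of that encoding step. The helper name encodeCp1250 and the small table are hypothetical and not part of ZOOM, and the table covers only the Croatian letters outside ASCII.

```javascript
// Hypothetical illustration, not part of ZOOM.
// windows-1250 byte values for the Croatian letters outside ASCII.
const CP1250_BYTES = {
  "Š": 0x8a, "š": 0x9a, "Ž": 0x8e, "ž": 0x9e,
  "Č": 0xc8, "č": 0xe8, "Ć": 0xc6, "ć": 0xe6,
  "Đ": 0xd0, "đ": 0xf0,
};

// Percent-encode a word as a windows-1250 page would send it in a URL.
function encodeCp1250(word) {
  let out = "";
  for (const ch of word) {
    const code = ch.charCodeAt(0);
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch; // unreserved ASCII passes through unchanged
    } else if (code < 0x80) {
      out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    } else if (ch in CP1250_BYTES) {
      out += "%" + CP1250_BYTES[ch].toString(16).toUpperCase();
    } else {
      throw new Error("Character not in this sketch's table: " + ch);
    }
  }
  return out;
}

console.log(encodeCp1250("član")); // prints "%E8lan"
```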

Any idea how to avoid this problem?

Thanks in advance. Looking forward to ZOOM 4.1 :D

Ray
05-06-2005, 12:44 AM
Hi Nenad,

What encoding (or charset) does your webpage use? Can you give us a URL to these pages so that we can take a look?

Nenad
05-06-2005, 12:58 PM
Hi Ray,

Unfortunately, the pages belong to our intranet site, so they are not accessible through the Internet. The character set we use is "windows-1250".

Do you see any problem with it?

By the way, ZOOM works very well with the 1250 encoding.

Ray
05-09-2005, 12:16 AM
It would depend on the way it is entered on the page.

For example, if the "č" character was actually entered as an HTML entity (i.e. "č"), it would fail to find it. The highlighting script does not try to find the HTML entity equivalent of the search text (although perhaps we can make it do this in the next revision).

However, if the character is stored in the document as it is and not as an HTML entity (which is possible with the windows-1250 charset), then the search script appears to have no trouble highlighting it in our testing.

Nenad
07-27-2005, 01:02 PM
Hi Ray & David,

I'm still having a problem with our characters and the highlight function. The function works nicely, but it highlights only words without those characters.

To be honest, I don't quite understand your explanation about how the character is entered into the HTML file. I tried a lot and experimented with different charsets, but so far no result.

Any idea about what I should do next is very much appreciated. Thanks in advance.

Ray
07-28-2005, 12:56 AM
Oops, I just noticed that my explanation was confusing because the HTML entity example I gave you got interpreted by the browser and it didn't show properly.

What it should say is:


For example, if the "č" character was actually entered as an HTML entity (i.e. "&#269;" in the source code of the original web page), it would fail to find it. The highlighting script does not try to find the HTML entity equivalent of the search text (although perhaps we can make it do this in the next revision).

However, if the character is stored in the document as it is (e.g. with a keystroke) and not as an HTML entity (which is possible with the windows-1250 charset), then the search script appears to have no trouble highlighting it in our testing.
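
As a rough sketch of what "finding the HTML entity equivalent" could look like, the snippet below builds the numeric-entity form of a search term so that a highlighter could look for both spellings. The function names toNumericEntities and searchForms are hypothetical and this is not how ZOOM's script is known to work. It also handles only decimal numeric entities, not named ones.

```javascript
// Hypothetical illustration, not ZOOM's highlighting code.
// Replace every non-ASCII character with its decimal numeric entity.
function toNumericEntities(term) {
  let out = "";
  for (const ch of term) {
    const codePoint = ch.codePointAt(0);
    out += codePoint < 0x80 ? ch : "&#" + codePoint + ";";
  }
  return out;
}

// Return the spellings a highlighter would need to look for in the page
// source: the literal term and, if it differs, its entity form.
function searchForms(term) {
  const entityForm = toNumericEntities(term);
  return entityForm === term ? [term] : [term, entityForm];
}

console.log(searchForms("član")); // prints [ 'član', '&#269;lan' ]
```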

Perhaps you can give us a URL to the page in question so we can have a look, or e-mail us some examples?

Nenad
07-28-2005, 07:41 AM
Thanks for the quick response and the additional explanation.

I entered the characters the right way (as they are in the 1250 codepage, I mean not as &scaron;), but the highlight function doesn't recognize them.

I did some testing with some German and French characters. Some of them work fine, but some don't (for example, the same thing happens with the French character é).

Because we don't have the files online, I'll mail an example (test file) to you.

I hope you will find the reason why the highlight function doesn't like some characters :lol:

Thanks in advance

Ray
08-04-2005, 12:37 AM
We have looked into this further, and we did find a bug that caused some words not to be highlighted. The fix will be available in the next public release.

However, the problem persists for some Croatian characters in the windows-1250 charset. The problem is that JavaScript lacks the ability to decode these parameters from the URI encoding. We will continue to keep an eye out for a solution to this, but in the meantime we will update our FAQ and documentation with this as a known issue.
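
For context on the decoding limitation: decodeURIComponent expects UTF-8 and throws a URIError on "%E8lan", while the older unescape treats %E8 as Latin-1 and yields "èlan" instead of "član". One possible workaround, shown here only as a sketch, is a hand-written lookup table for the affected bytes. The function decodeCp1250Param and its table are hypothetical, cover only the Croatian letters, and are not the fix ZOOM shipped.

```javascript
// Hypothetical workaround sketch, not ZOOM's code.
// Unicode characters for the windows-1250 bytes used by Croatian letters.
const CP1250_TO_UNICODE = {
  0x8a: "Š", 0x9a: "š", 0x8e: "Ž", 0x9e: "ž",
  0xc8: "Č", 0xe8: "č", 0xc6: "Ć", 0xe6: "ć",
  0xd0: "Đ", 0xf0: "đ",
};

// Decode a percent-encoded query value that was encoded as windows-1250.
function decodeCp1250Param(value) {
  return value
    .replace(/\+/g, " ") // "+" stands for a space in query strings
    .replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
      const byte = parseInt(hex, 16);
      if (byte < 0x80) return String.fromCharCode(byte); // plain ASCII
      // Bytes missing from the table are left encoded rather than guessed.
      return CP1250_TO_UNICODE[byte] || match;
    });
}

console.log(decodeCp1250Param("%E8lan")); // prints "član"
```

A complete version would need the full windows-1250 table; leaving unknown bytes untouched is a deliberate choice here so that the sketch never produces a wrong character silently.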