
Characters are not displayed correctly


Philipp
06-20-2005, 10:11 AM
Hi!

All my HTML pages use the UTF-8 charset, so in the Zoom settings I selected "Use Unicode (UTF-8 Encoding)". But on the results page, special characters aren't displayed properly. In addition, when I set the character encoding in Zoom to UTF-8, I always get an alert after indexing: "The existing template file in the output folder does not use the same encoding or character set specified (iso-8859-2)". But the template uses UTF-8.

Thanx, Philipp

Ray
06-21-2005, 12:57 AM
Do all accented characters display incorrectly on the search results page, or only in the search box and the heading "Search results for: ..."? The latter is a known bug in the ASP version (from build 4.0.1016) and will be fixed in the 4.1 release coming out this week.

As for the warning message, make sure the search_template.html file in your output directory has the correct charset specified. This requires a meta tag specifying the charset within the <head> section of the page. If you have already checked this and still get the warning, send us a URL to your search page or, if that isn't possible, e-mail us your template file.
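As a rough illustration of the kind of check involved (a sketch only; this is not how Zoom itself inspects the template), the Python snippet below looks for a charset declaration in a <meta> tag. The names `declared_charset` and `template_matches` are hypothetical, and the regular expression only handles the two common forms, `<meta charset="...">` and `<meta http-equiv="Content-Type" content="text/html; charset=...">`.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical helper: matches <meta charset="utf-8"> as well as
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def declared_charset(html_bytes: bytes) -> Optional[str]:
    """Return the charset named in the first matching <meta> tag (lower-cased), or None."""
    match = _META_CHARSET.search(html_bytes)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


def template_matches(path: str, expected: str = "utf-8") -> bool:
    """True if the template file at `path` declares the `expected` charset."""
    return declared_charset(Path(path).read_bytes()) == expected.lower()


# Example (file name from the thread; point it at your own output folder):
# print(template_matches("search_template.html"))
```

Note that a template with no <meta> tag at all yields None, so it cannot match any expected charset.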

Philipp
07-06-2005, 03:15 PM
Hi!

Sorry for the late reply, here's my answer:

- All my HTML pages are generated automatically by a content management server named Red Dot. The pages are generated with the UTF-8 charset, and all pages also have a UTF-8 charset meta tag.

- In my case, the search_template.html file is empty (it contains only the comment), so I can't define a meta tag in this template. Of course, I defined a meta tag in the surrounding template, which is http://195.170.79.226/at/dt/index_at_dt_zoom.asp.
- So the problem is, as I mentioned before, that special characters don't display correctly (everywhere, not only in the heading "Search results for...").

You'll see when you test the search at: http://195.170.79.226/at/dt/index_at_dt_zoom.asp

Most of the special characters are replaced with HTML or Unicode entities, e.g. &uuml; for ü.
I also tried leaving the characters unescaped, but that didn't solve the problem.
When PDF documents are found, their special characters aren't displayed correctly either. You can't see this on this server, because it's only running the free version...
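For reference, a named entity such as &uuml; and the numeric reference &#252; both stand for the same character ü, and once text is entity-escaped its bytes are plain ASCII, so they read the same under any ASCII-compatible charset. The sketch below uses only Python's standard `html` module and the built-in `xmlcharrefreplace` error handler; `to_ascii_references` is a hypothetical helper name, not part of Red Dot or Zoom.

```python
import html


def to_ascii_references(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference (ü -> &#252;)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = to_ascii_references("Grüße")
print(escaped)                  # Gr&#252;&#223;e
print(html.unescape(escaped))   # Grüße
print(html.unescape("&uuml;"))  # ü
```

This is consistent with escaping not helping here: if the characters are mangled later, when the results page is assembled or served, changing how the source pages are written would not be expected to make a difference.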

The problem occurred on both Windows Server 2003 Standard Edition and Windows Server 2003 Web Edition, using Zoom Search Engine 4.1.


Thanks Philipp

Ray
07-07-2005, 01:13 AM
We would need to see the source code for "index_at_dt_zoom.asp". It appears that you have either modified the search script, or you are post-processing the search results, and the problem is probably caused by this.

We made some educated guesses and found the default ASP search page here:
http://195.170.79.226/at/dt/search.asp

We presume this is being called by your "index_at_dt_zoom.asp" file in a manner similar to what is described here:
http://www.wrensoft.com/zoom/support/faq_ssi.html

If this is the case, then we noted that the results returned by the default script do not have any problems showing the accented UTF-8 characters:
http://195.170.79.226/at/dt/search.asp?zoom_query=investkredit

Let us know if this is not the case. Otherwise, if you find that the problem occurs even with the default script and the provided instructions from our FAQ, then zip up your search files (including "index_at_dt_zoom.asp") and email them to us (see our Contact Us page) so we can take a closer look.

Philipp
07-07-2005, 12:59 PM
My whole search script functions as follows:

- I placed a search box on every page; when the user types in a word and submits the search form, the template index_at_dt_zoom.asp is called.
- Inside index_at_dt_zoom.asp I placed the #include line for search.asp.
- I modified search.asp, but I only changed some lines of HTML output; the changes in search.asp are marked with the comment 'DMC.

You're right, the output of http://195.170.79.226/at/dt/search.asp?zoom_query=investkredit is OK.
So the problem must be caused by my template index_at_dt_zoom.asp.

I found out what could be the reason:

As I mentioned before, a CMS called Red Dot automatically generates all the templates (including index_at_dt_zoom.asp).
The files are saved in UTF-8 format.
After generation, I opened the template index_at_dt_zoom.asp in WordPad and saved it as a plain text document (the other options are RTF document, text document in MS-DOS format, and Unicode text document).
Suddenly it worked: all characters were displayed correctly. Try it, the changes are still active. Unfortunately, this can't be my solution, because other parts of the page then stop working, such as the DHTML navigation at the top.
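One possible explanation worth ruling out (an assumption; nobody in the thread confirms it) is that the generated file begins with a UTF-8 byte order mark (the bytes EF BB BF), or that WordPad's plain-text save converted it to a legacy Windows code page. Comparing the raw bytes of the file before and after re-saving would show which of these happened. A minimal Python sketch, with `describe_encoding` as a hypothetical helper name:

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Roughly classify a file's bytes; a heuristic, not a full charset detector."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # A byte such as 0xFC (ü in windows-1252) is not valid UTF-8 on its own.
        return "not valid utf-8 (possibly a legacy code page such as windows-1252)"
    return "utf-8 without BOM (or plain ASCII)"


# Example: run once on the CMS output and once on the copy re-saved in WordPad.
# with open("index_at_dt_zoom.asp", "rb") as f:
#     print(describe_encoding(f.read()))
```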

I placed a zip file on the server (http://195.170.79.226/at/at_dt.zip) that includes search.asp, settings.asp, search_template.html, index_at_dt_zoom.asp and at_dt.zcfg. I hope it helps.

Thanks!

Ray
07-08-2005, 01:32 AM
It seems to be working fine now at:
http://195.170.79.226/at/dt/index_at_dt_zoom.asp?zoom_query=investkredit

Is the "index_at_dt_zoom.asp" file you sent us (in the zip) the version from before or after you re-saved it in WordPad? We cannot see any problems in that file or reproduce the error (however, we could not replicate the scenario completely, since you did not include the .zdat files in your zip). My guess is that your server is attempting to re-convert the output to UTF-8, even though it is already UTF-8.
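To illustrate what re-converting output that is already UTF-8 looks like, the sketch below assumes (for illustration only; the server's actual code page is not known from the thread) that the UTF-8 bytes are misread as ISO-8859-1 before being encoded again. Each accented character then turns into two characters, which is the typical symptom. The helper names are hypothetical.

```python
def double_encode(text: str) -> str:
    """Simulate UTF-8 bytes being misread as ISO-8859-1.

    Encoding the returned string to UTF-8 again gives the doubly-encoded
    output a browser would then display.
    """
    utf8_bytes = text.encode("utf-8")    # the bytes the page really contains
    return utf8_bytes.decode("latin-1")  # each byte misread as one Latin-1 character


def undo_double_encode(garbled: str) -> str:
    """Reverse the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = double_encode("für")
print(garbled)                      # fÃ¼r
print(undo_double_encode(garbled))  # für
```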

We did notice however, that your "index_at_dt_zoom.asp" template does not actually contain any ASP scripting. This makes us wonder if your current approach is actually the best way to achieve what you are doing.

For example, it should be possible to have your CMS software generate only the file "search_template.html". This would simply be your page layout, with the Zoom search comment (the one already in your current template) placed where you want the search results to appear. No #include is necessary. You can then access the search page through "search.asp" (and even let it generate the search form again).

You would not need "index_at_dt_zoom.asp" in this case, or any #includes. However, we are not familiar with Red Dot, so perhaps there are certain limitations with your CMS software that prevent this approach. If possible however, we would recommend this as a cleaner solution, and would probably avoid the current UTF-8/character issues you are having (or at the least, narrow down the possible points of error).