Error on non-English character in URL


  • Error on non-English character in URL

    I am running Zoom V7 on Mac in spider mode. One page on my server has the word "schrӧdinger" in its URL. While indexing, Zoom gives the warning 'Could not download file: http....../schr%D3%A7dinger.....php (File not found)'. It also reports a broken link error for the page that links to this URL.

    All the pages use the UTF-8 charset, and the same charset is set on the Languages tab of the Zoom configuration. I tried enabling and disabling 'Percent-encode URLs in UTF-8' on the Advanced tab, but it had no effect.

  • #2
    Accented characters are not valid characters in a URL. See the URL standard, RFC 1738, <http://www.ietf.org/rfc/rfc1738.txt>.

    Zoom will encode the illegal characters in the URL (as % hex hex). Whether the request then succeeds depends on your web server and how it is set up to handle the encoded URL string.
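
    As a rough illustration of the "% hex hex" form, the Python sketch below percent-encodes a single path segment as UTF-8 with the standard library's `urllib.parse.quote`. It is not Zoom's own code, and the helper name `encode_segment` is made up for this example. It also shows that the `%D3%A7` sequence in the reported warning is the UTF-8 encoding of U+04E7 (the Cyrillic letter "ӧ"), whereas the Latin "ö" (U+00F6) would encode as `%C3%B6`.

```python
from urllib.parse import quote, unquote


def encode_segment(segment: str) -> str:
    """Percent-encode one URL path segment as UTF-8 (hypothetical helper)."""
    # safe="" escapes "/" as well, because this is a single segment, not a path.
    return quote(segment, safe="", encoding="utf-8")


# Escapes are used so the exact code points are unambiguous.
cyrillic = encode_segment("schr\u04e7dinger")  # U+04E7, Cyrillic o with diaeresis
latin = encode_segment("schr\u00f6dinger")     # U+00F6, Latin o with diaeresis

print(cyrillic)  # schr%D3%A7dinger
print(latin)     # schr%C3%B6dinger

# Decoding the encoded form gives back the original segment.
print(unquote(cyrillic) == "schr\u04e7dinger")  # True
```

    Comparing the two outputs against the encoded form in a crawler log is a quick way to tell which of the two look-alike characters a link actually contains.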



    • #3
      Thanks for the tip about the web server. I checked, and my server handles percent-encoded URLs correctly and serves the page.

      Looking more carefully, there seems to be a mix-up when Zoom encounters the accented character in a link. Here is the error:

      10:59:07 - Broken link found on page: https://www.xxxxxx.com/xxxxxx/
      10:59:07 - (Broken link URL is: https://www.xxxxxx.com/xxxxxx/schr%D3%A7dinger-equation-+-operators.php-equations-in-one-place.php )

      If you look closely at the link above, it ends with a fragment, "-equations-in-one-place.php", taken from a different link that appears a few lines earlier on the page. The correct URL should have been: "https://www.xxxxxx.com/xxxxxx/schr%D3%A7dinger-equation-+-operators.php".

      There are three links on the page containing "schrӧdinger", and all of them end with a fragment from another link. There seems to be no pattern in which fragment appears or how long it is. Here are the other two errors:

      10:59:08 - (Broken link URL is: https://www.xxxxxx.com/xxxxxx/schr%D3%A7dinger-equation-as-an-eigenvalue-problem.phps-in-one-place.php )
      10:59:09 - (Broken link URL is: https://www.xxxxxx.com/xxxxxx/the-plausibility-of-the-schr%D3%A7dinger-equation.phpm.php )
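
      The stray text in these three log lines can be pulled out mechanically. The Python sketch below splits each logged URL at its first `.php` and collects whatever follows. The helper name `split_after_extension` is made up for this example, and it assumes that every real page URL on the site ends at its first `.php`.

```python
def split_after_extension(url: str, ext: str = ".php") -> tuple[str, str]:
    """Return (URL up to and including the first ext, trailing text)."""
    idx = url.find(ext)
    if idx == -1:
        # No extension found: nothing to split off.
        return url, ""
    end = idx + len(ext)
    return url[:end], url[end:]


# The three broken-link URLs exactly as they appear in the log above.
logged = [
    "https://www.xxxxxx.com/xxxxxx/schr%D3%A7dinger-equation-+-operators.php-equations-in-one-place.php",
    "https://www.xxxxxx.com/xxxxxx/schr%D3%A7dinger-equation-as-an-eigenvalue-problem.phps-in-one-place.php",
    "https://www.xxxxxx.com/xxxxxx/the-plausibility-of-the-schr%D3%A7dinger-equation.phpm.php",
]

trailing = [split_after_extension(url)[1] for url in logged]
for url, extra in zip(logged, trailing):
    expected, _ = split_after_extension(url)
    print(f"expected: {expected}")
    print(f"trailing: {extra!r} ({len(extra)} characters)")
```

      Run against the three URLs, this isolates the trailing pieces "-equations-in-one-place.php", "s-in-one-place.php" and "m.php".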

      Please advise.



      • #4
        Accented characters are not valid characters in a URL.
        The server doesn't "deal with them": they are not legal in a URL, and you can't use them as-is.
        If you want to use them, you need to percent-encode them yourself.

        If you still think the links are valid, we need to see the page in question so we can check the HTML itself rather than the Zoom log.



        • #5
          Please also check that you are using the latest build of Zoom For Mac (V7.1 build 1000 at time of posting):
          http://www.wrensoft.com/zoom/whatsnew.html

          If you still have a problem, please give us the URL of the page in question so we can look at the HTML and confirm whether there is a bug.
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine
