Thread: V5 development progress - Charset by page

  1. #1
    Join Date
    Dec 2004
    Location
    Sydney
    Posts
    4,156

    V5 development progress - Charset by page

    This is another short update on one aspect of the V5 development process for Zoom: a feature rather cryptically known as 'Charset by page'.

    V5 of the indexer will use the charset specified in the page's meta tag or in the HTTP header sent by the server when indexing files. Previously, the indexer always expected content to be delivered in the single charset specified in the Zoom configuration window (one charset per session).
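
    As a rough illustration of the idea (not Zoom's actual code), per-page charset detection could be sketched in Python as below. The function names, the fallback default, and the order of precedence (HTTP header first, then meta tag) are all assumptions made for the example; this post does not say which source wins when both are present.

```python
import re
from typing import Optional

# Finds "charset=..." in a Content-Type header value.
_HEADER_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)

# Finds a charset declared inside a <meta ...> tag, covering both
# <meta charset="..."> and <meta http-equiv="Content-Type" content="...; charset=...">.
_META_RE = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Return the charset named in a Content-Type header, or None."""
    if not content_type:
        return None
    match = _HEADER_RE.search(content_type)
    return match.group(1).lower() if match else None


def charset_from_meta(raw: bytes) -> Optional[str]:
    """Return the charset named in a meta tag near the top of the page, or None."""
    match = _META_RE.search(raw[:4096])  # only scan the start of the document
    return match.group(1).decode("ascii").lower() if match else None


def pick_charset(raw: bytes, content_type: Optional[str] = None,
                 default: str = "windows-1252") -> str:
    """Assumed precedence: HTTP header, then meta tag, then a configured default."""
    return charset_from_header(content_type) or charset_from_meta(raw) or default
```

    For files read from disk there is no HTTP header, so in this sketch only the meta tag (or the default) would apply.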

    This means that you can now index web pages (or whole websites) that use different charsets or encodings. The indexed content is then converted to the encoding selected in the configuration window, and your search page uses that same encoding.

    In V4.2 it was possible to have a set of index files that spanned multiple languages, but only if all the web sites used UTF-8 or all shared the same character set. In V5 it will be possible to index, for example, some pages in UTF-8, some pages in English (Windows-1252), and some pages in ISO-8859-5 Cyrillic, and have them all combined into the same set of index files.
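
    To make the mixed-charset case concrete, here is a minimal Python sketch of converting pages in different source charsets into one target encoding. The helper name `to_index_encoding`, the sample strings, and the choice of UTF-8 as the target are assumptions for illustration only; Zoom's internal conversion is not described in this post.

```python
def to_index_encoding(raw: bytes, source_charset: str, target: str = "utf-8") -> bytes:
    """Decode a page from its own charset, then re-encode it in the index's charset.

    Undecodable bytes and characters the target cannot represent are
    replaced rather than raising an error.
    """
    text = raw.decode(source_charset, errors="replace")
    return text.encode(target, errors="replace")


# Three hypothetical pages, each delivered in a different charset.
pages = [
    ("привет".encode("iso-8859-5"), "iso-8859-5"),
    ("café".encode("windows-1252"), "windows-1252"),
    ("日本語".encode("utf-8"), "utf-8"),
]

# After conversion, every page is in the same encoding and can share one index.
converted = [to_index_encoding(raw, charset) for raw, charset in pages]
```

    One design point the sketch exposes: a Unicode target such as UTF-8 can hold all three pages without loss, whereas a single-byte target such as Windows-1252 cannot represent the Cyrillic text, and those characters would come out as replacement marks.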

    So this is a significant enhancement in multi-language web site support.

    Note: We offer free upgrades for 6 months after a purchase, so if you purchase V4 now, there will be a free upgrade to V5 when it becomes available.

    -----
    David

  2. #2
    Join Date
    Jun 2006
    Posts
    13

    Excellent.

  3. #3
    JCF1976 Guest

    Re: V5 development progress - Charset by page

    This all sounds excellent! This all would apply to crawling the pages offline too, I would assume, correct? You said, "V5 of the indexer will use the charset specified in the page's meta tag or the HTTP header sent by the server when indexing files." Will this feature be dependent on crawling the site off the server?

  4. #4
    Join Date
    Dec 2004
    Location
    Sydney, Australia
    Posts
    3,572

    This will apply to files scanned in Offline Mode as well. So yes, you will be able to scan pages of varying charset/encoding (as specified by their meta tags) in offline mode, and have the content correctly indexed and searchable.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
