Advice on multi-language file indexing!


  • Advice on multi-language file indexing!

    First, thanks for any help!

    I want to create a virtual coffee library of PDF, DOC, TXT and other files, in French, English and Spanish. I would like to give users the option to search, for example:

    1. All PDFs
    2. PDFs in Spanish
    3. PDFs in English
    4. PDFs in French
    If I repeat these options for DOC, I already have 8 possible categories.

    To complicate things, all coffee docs will have to be in 2 groups:
    1) one for Costa Rica coffee documentation
    2) one for International coffee docs (world minus Costa Rica)

    What would be the simplest way to set up this many options? Maybe it is too much to ask, or can Zoom handle that?

    In other words, how would you implement that? Any advice is more than welcome!

    Thanks

    Roger Pilon, Editor
    Costa Rica Coffee Directory

  • #2
    Do you want to search all three document types separately as well as by language? It would be easier to just have the languages selectable from the categories (as a drop-down) and to search all document types.



    • #3
      It might be possible to use categories, if you are careful about how you organise the documents into folders.

      You could have 12 categories, using a directory structure something like,
      /international/PDF/Spanish
      /international/PDF/English
      /international/PDF/French
      /international/DOC/Spanish
      /international/DOC/English
      /international/DOC/French
      /CostaRica/PDF/Spanish
      /CostaRica/PDF/English
      /CostaRica/PDF/French
      /CostaRica/DOC/Spanish
      /CostaRica/DOC/English
      /CostaRica/DOC/French

      Maybe try this with just 10 to 20 documents to start with, then add the rest once you think it is all working as you want.

      And you'll also want to use the UTF-8 character set option if you need to support multiple languages.
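      The folder layout suggested above is simply every combination of two regions, two file types and three languages. As a minimal sketch (plain Python, nothing Zoom-specific; the function name is made up for illustration), the list can be generated rather than typed by hand, which also makes it easy to see how the count grows if another file type such as TXT is added:

```python
from itertools import product

# Folder names taken from the directory list in the post above.
REGIONS = ["international", "CostaRica"]
FILE_TYPES = ["PDF", "DOC"]
LANGUAGES = ["Spanish", "English", "French"]


def category_folders(regions=REGIONS, file_types=FILE_TYPES, languages=LANGUAGES):
    """Return one folder path per region / file type / language combination."""
    return [
        f"/{region}/{file_type}/{language}"
        for region, file_type, language in product(regions, file_types, languages)
    ]


folders = category_folders()
for folder in folders:
    print(folder)
print(len(folders), "folders")
```

      Each extra file type adds six more folders (two regions times three languages), which is one reason the later replies look for ways to reduce the number of categories.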



      • #4
        You should be able to do it without splitting the file types into directories too. That is, if you simply have:

        /international/spanish/myfile.pdf
        /international/spanish/myotherfile.doc
        /international/english/file3.pdf
        ...etc.

        You can have category patterns like:

        Category name: All PDFs
        Match pattern: *.pdf

        Category name: PDFs in Spanish
        Match pattern: /spanish/*.pdf

        Category name: DOCs in Spanish
        Match pattern: /spanish/*.doc

        Category name: Costa Rica
        Match pattern: /CostaRica/

        So I think it is possible, but you may end up with a lot of categories.
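        To get a feel for which files would land in which category, the patterns above can be tried out offline. This is only a rough sketch using Python's `fnmatch` module: it assumes a pattern matches when it occurs anywhere in the file path and that matching is case-insensitive, which may not be exactly how Zoom evaluates category patterns. The `/CostaRica/spanish/report.doc` path is a made-up example.

```python
from fnmatch import fnmatchcase

# Category names and match patterns copied from the examples above.
CATEGORIES = {
    "All PDFs": "*.pdf",
    "PDFs in Spanish": "/spanish/*.pdf",
    "DOCs in Spanish": "/spanish/*.doc",
    "Costa Rica": "/CostaRica/",
}


def categories_for(path):
    """Return the names of all categories whose pattern occurs in the path.

    Assumption: substring-style, case-insensitive matching. Zoom's own
    matching rules may differ, so treat the result as an approximation.
    """
    lowered = path.lower()
    return [
        name
        for name, pattern in CATEGORIES.items()
        if fnmatchcase(lowered, f"*{pattern.lower()}*")
    ]


for path in [
    "/international/spanish/myfile.pdf",
    "/international/english/file3.pdf",
    "/CostaRica/spanish/report.doc",  # hypothetical file
]:
    print(path, "->", categories_for(path))
```

        A file can belong to several categories at once, which is what makes the overlapping "All PDFs" and "PDFs in Spanish" categories workable.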

        And when you say that all documents will either be International or Costa Rica, does that also mean you might want to search for "Either International or Costa Rica, PDF files in Spanish"? In which case, you might want to look at the "Allow searching in multiple categories" option which turns the categories into a list of checkboxes.

        Perhaps you don't need categories per file type. If you have filenames enabled for indexing, and if you provide some search tips to your users, people can restrict a search to PDF files by adding the criterion "*.pdf" to their search query, e.g. searching for:

        cats *.pdf

        ... will return all PDF files containing the word "cats" (for the selected category) - assuming you also have "match all search words" selected on the search form (&zoom_and=1). This would lower the number of categories necessary.
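        As a rough illustration of the search tip above, such a query could be turned into a link or a pre-filled search URL. The host name and script location below are placeholders, and the `zoom_query` parameter name is an assumption (only `zoom_and=1` is mentioned above), so check both against your own search form.

```python
from urllib.parse import urlencode

# Placeholder address; replace with the real location of your search page.
SEARCH_PAGE = "https://example.com/search.php"


def build_search_url(words, extension=None, match_all=True):
    """Build a search URL, optionally restricted to one file extension."""
    terms = list(words)
    if extension:
        # Same trick as in the post: add "*.pdf" (or similar) as a search term.
        terms.append(f"*.{extension}")
    params = {"zoom_query": " ".join(terms)}  # assumed parameter name
    if match_all:
        params["zoom_and"] = 1  # "match all search words", as noted above
    return f"{SEARCH_PAGE}?{urlencode(params)}"


url = build_search_url(["cats"], extension="pdf")
print(url)
```

        Without "match all search words", the "*.pdf" term would be only one of several optional terms, so the file type restriction relies on that setting.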
        Last edited by Ray; Jun-28-2007, 01:22 AM.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine



        • #5
          There are 3 variables:

          The idea (maybe unrealistic, I admit) is this:



          Option One:
          1. Just the coffee docs about Costa Rica?
          2. International docs except Costa Rica?
          3. Do you want to consult all the docs about the coffee industry from anywhere?


          Then:
          1. Spanish Only
          2. French Only
          3. English Only
          4. Any Language


          Then:
          1. PDF
          2. DOC
          3. TXT
          4. etc
          5. Any kind of documents (all extensions)

          How about using wildcard characters ("*" and "?") in a category pattern? That way I could create a pattern such as "/costa/french_*.pdf", which would match all PDF files inside a main costa folder whose filenames start with "french_" (and so on for the other languages).

          On top of that, I could use the Icon option to tag the file types, even if they end up mixed together on the results page!?

          Bottom line: forget grouping PDFs together and DOCs together, and group them by language with the pattern feature instead. In that case I would need just 2 folders: one for "International" and one for Costa Rica.

          Better strategy or not??

          Again, maybe it is too much to ask! How would you do that...if possible at all?

          Thanks for the support,

          Roger Pilon, Editor
          Costa Rica Coffee Directory



          • #6
            Sorry Ray, I somehow missed your post!

            Thanks! I will sleep on it tonight!

            Roger



            • #7
              Originally posted by quebecostarica View Post
              How about using wildcard characters ("*" and "?") in a category pattern? That way I could create a pattern such as "/costa/french_*.pdf", which would match all PDF files inside a main costa folder whose filenames start with "french_" (and so on for the other languages).
              Yep, that's pretty much exactly what I was trying to illustrate with the example in my last post.

              Originally posted by quebecostarica View Post
              On top of that, I could use the Icon option to tag the file types, even if they end up mixed together on the results page!?
              Sure, that's another option that Zoom offers you. See the Thumbnails and Icons chapter in the Zoom Users Guide.

              Originally posted by quebecostarica View Post
              Bottom line: forget grouping PDFs together and DOCs together, and group them by language with the pattern feature instead. In that case I would need just 2 folders: one for "International" and one for Costa Rica.

              Better strategy or not??
              Sounds like a better strategy to me.

              To take it one step further, I would reconsider whether you really want multiple languages within the same set of index files. A user very rarely needs to search for something that could exist across the different language sections. Usually they are interested in only one particular language.

              It often makes more sense to have separate sets of index files for each language on a multi-lingual site. That is, you would have 3 different search functions, one for each language. And you would have 3 different ZCFG configuration files, each configured to index one particular language and to output the search files to a different folder (e.g. "/search/english/search.php", "/search/french/search.php", etc.)

              See the previous discussion on indexing a multi-region website and creating multiple search functions for each region/language.
              --Ray



              • #8
                Originally posted by Ray View Post

                To take it one step further, I would reconsider whether you really want multiple languages within the same set of index files. A user very rarely needs to search for something that could exist across the different language sections. Usually they are interested in only one particular language.

                It often makes more sense to have separate sets of index files for each language on a multi-lingual site. That is, you would have 3 different search functions, one for each language. And you would have 3 different ZCFG configuration files, each configured to index one particular language and to output the search files to a different folder (e.g. "/search/english/search.php", "/search/french/search.php", etc.)


                You are right! Better to have 3 different ZCFG configurations, one for each folder:
                1. /search/english/search.php
                2. /search/french/search.php
                3. /search/spanish/search.php
                I speak all these languages every day, and for me it is just like speaking one language! I am not bragging; it is just that with time, you tend to completely forget about it!



                Question one: just to be on the safe side, does this mean creating 3 folders:
                • french
                • english
                • spanish
                and putting inside each of them all the documents (PDF, DOC, TXT, etc.) in that language?

                Question two:



                The patterns for countries this time would/could be:
                1. "/french/costarica_*.pdf"
                2. "/french/international_*.pdf"
                3. "/english/costarica_*.pdf"
                4. "/english/international_*.pdf"
                and so on!

                Am I on the right track here??

                Thanks again for any feedback!

                Roger Pilon, Editor
                Costa Rica Coffee Directory



                • #9
                  Yes, you will still need to have the files organized neatly by language (either in filename or different folders as you suggested) so that they can be easily distinguished from each other.

                  You should not need to specify the language in your category patterns anymore, because you would set up each ZCFG configuration to index ONLY the files for a specific language/folder.

                  For example, for your french configuration (mysite_french.zcfg), you may have a Skip Pages list of:

                  /english/
                  /spanish/

                  This ensures that no English or Spanish files are indexed, and only the French files are indexed. Then your categories can simply be for "costarica_*.pdf" and so forth.

                  The only difference between your ZCFG configurations would be the Skip Pages list. Their categories can otherwise be the same. I hope that makes sense! I think there are more details in the previous thread I pointed to, so check back there for more ideas.
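                  The per-language setup described above can be summarised in a small planning sketch. It only computes the values you would enter into each configuration by hand (it does not read or write ZCFG files, whose format is not assumed here). The config names follow the "mysite_french.zcfg" example, the search page paths follow the "/search/french/search.php" example from earlier in the thread, and the function name is made up for illustration.

```python
# Language folder names as used in the thread.
LANGUAGES = ["english", "french", "spanish"]


def language_setups(languages=LANGUAGES):
    """Return, per language, the config name, Skip Pages entries and search page."""
    setups = {}
    for lang in languages:
        setups[lang] = {
            # Naming follows the "mysite_french.zcfg" example above.
            "config": f"mysite_{lang}.zcfg",
            # Skip every language folder except the one being indexed.
            "skip_pages": [f"/{other}/" for other in languages if other != lang],
            # Output location, following the "/search/french/search.php" example.
            "search_page": f"/search/{lang}/search.php",
        }
    return setups


setups = language_setups()
for lang, setup in setups.items():
    print(lang, setup)
```

                  Since the categories stay the same across the three configurations, only the values printed here would change from one configuration to the next.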
                  --Ray

