Search results for grouped words should be higher

  • Search results for grouped words should be higher

    Hello, I am still working on improving search results for the various PDF files that I have on my server. A quick recap: I have PDF files that are appearing above real pages in my search results. I have added meta tags and Zoom meta tags, and tried page boosting, including meta tag page boosting on certain files, and the PDF files still appear higher than the real content. I am aware that I can make .DESC files for the PDFs to turn their page boost down if needed. The reason I don't want to do this is that I would have to create a DESC file for each PDF (probably over 100), and if I don't, it will set off my custom 404 error page and send me a bunch of e-mails when Zoom indexes.

    So, what I was wondering....
    1) My biggest problem is when people type in "contact city hall" without quotes. With quotes, the page comes up #1 in the result list and everything is fine. BUT many people don't know they can do that. Without quotes it is in about 10th position. I have put the exact phrase in meta tags, description tags, etc., but other files that have the 3 words scattered throughout still come up higher, even though they don't match the exact search phrase. Is there a way to make pages containing the exact phrase rank higher even when the search is typed without quotes?

    2) Is there a way that I can 'flip a switch' to make all PDFs appear lower? I don't want to turn PDF searching off because there is a lot of useful information contained in them, but I might have to if I can't get this to work out.

    Hopefully I haven't completely confused you! I am also running the most recent version of Zoom Pro (downloaded the update today).

  • #2
    I too found that PDF files were returning very high scores compared to the HTML page version of the same document.

    I assumed that this was due to the HTML being broken into several pages for different parts of the document, so in a search for "xxxx", the single PDF would contain several hits whilst the individual HTML pages would have those same hits scattered between the pages.

    The only way I could resolve it was to use categories for HTML and PDF searches, but I have the luxury of knowing that nothing exists in the PDFs that isn't also somewhere in the HTML pages.
    Mark Gallagher



    • #3
      Yes, this is something we've been looking at and plan to address in Version 4.1. The problem is, as sizbut said, that a PDF file can contain many pages and so has a much higher chance of a word appearing in it than a single HTML file does. We're planning to add a feature in Version 4.1 which will automatically scale the importance of a page based on the size of the document (or the number of words). This should hopefully eliminate the issue. An alternative is to allow users to specify a ZOOMPAGEBOOST value that applies to all files of a certain file type, but we're still considering whether that would be as useful as the first option.
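
To make the idea of scaling by document size concrete, here is a minimal sketch in Python. It is not how Zoom implements (or will implement) the feature; the function name `scaled_score`, the `reference_words` parameter, and the simple division are all assumptions made for illustration only.

```python
def scaled_score(raw_hits: int, word_count: int, reference_words: int = 500) -> float:
    """Damp a raw hit count according to how long the document is.

    Hypothetical sketch: `reference_words` is an assumed "typical page"
    size, not a value taken from Zoom.
    """
    if word_count <= 0:
        return 0.0
    # Documents longer than the reference size are scaled down in
    # proportion to their length; shorter documents are left unchanged
    # (the factor never drops below 1, so nothing is boosted).
    length_factor = max(1.0, word_count / reference_words)
    return raw_hits / length_factor
```

Under these assumptions, an HTML page with 3 hits in 400 words keeps a score of 3.0, while a PDF with 12 hits spread over 20,000 words drops to about 0.3, so the short, focused page ranks first.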

      In the meantime, however, you do have to provide a DESC file for each PDF whose importance you wish to lower. Note that they can all have the same content (unless you also want to specify a different title or description, etc.), so you just need to create one and make multiple copies.
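
Making the copies by hand for 100+ PDFs is tedious, so the "create one and make multiple copies" step could be scripted. The following is a rough sketch, not an official tool. The function name `copy_desc_for_pdfs` is hypothetical, and the naming rule used here (the PDF's file name with `.desc` appended) is an assumption; check the Zoom documentation for the exact name it expects and adjust `suffix` accordingly.

```python
import shutil
from pathlib import Path


def copy_desc_for_pdfs(root: Path, template: Path, suffix: str = ".desc") -> list[Path]:
    """Copy one hand-made DESC file next to every PDF found under `root`.

    Assumption: the DESC file for "report.pdf" is named "report.pdf.desc".
    Existing DESC files are left untouched so custom ones are not overwritten.
    """
    created = []
    # Note: the "*.pdf" pattern is case-sensitive on most non-Windows systems.
    for pdf in sorted(root.rglob("*.pdf")):
        target = pdf.with_name(pdf.name + suffix)
        if target.exists():
            continue
        shutil.copyfile(template, target)
        created.append(target)
    return created
```

A call such as `copy_desc_for_pdfs(Path("docs"), Path("template.desc"))` (both paths are placeholders) returns the list of files it created, which makes it easy to review what changed before re-indexing.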
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine



      • #4
        What about the first problem? I have the same issue. As an example, when the user types "text box" (without the quotes) into the search engine, I would like pages with the phrase "text box" to come up first. But they are getting drowned out by pages that contain the words "text" and "box" separately.

        Thanks.



        • #5
          The upcoming Version 5.0 will have a new feature called "Recommended links", which will let you specify keywords or phrases and enter specific results/URLs for each of them. These results will then appear before the rest of the search results. Recommended links are matched against the full search query, so a search for "text box", with or without quotation marks, will return the recommended links entered for the phrase "text box".
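
The matching behaviour described above can be illustrated with a small Python sketch. This only models the idea (match the whole query, ignore quotation marks, list the recommended results first); it is not Zoom's code, and the names `normalize_query` and `with_recommended` and the example URLs are hypothetical.

```python
def normalize_query(query: str) -> str:
    """Drop double quotes, lowercase, and collapse runs of whitespace."""
    return " ".join(query.replace('"', " ").lower().split())


def with_recommended(query: str, recommended: dict[str, list[str]], results: list[str]) -> list[str]:
    """Put the recommended URLs for the full query ahead of normal results.

    `recommended` maps a normalized phrase to a list of URLs.
    Assumption: a recommended URL that is also a normal result is shown once.
    """
    picks = list(recommended.get(normalize_query(query), []))
    rest = [url for url in results if url not in picks]
    return picks + rest
```

For example, with `recommended = {"text box": ["/help/text-box.html"]}`, both `text box` and `"Text Box"` normalize to the same key, so the recommended page is listed first in either case.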

          Alternatively, we recommend adding some search tips before the search box to explain the use of quotation marks when searching for an exact phrase.
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine



          • #6
            Very cool! What is the target date for version 5?



            • #7
              We're currently aiming for July, or at least sometime in the next few months. As always, we want to make sure that the significant engine overhaul, improvements, and new features we've worked on are stable and ready, rather than rushing anything out the door.
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine

