"Did you mean" Feature Request: Don't Suggest Searches for which there are 0 Results



    Hi there,

    I am using Zoom Search version 7.0 (build 1017). I've noticed that Zoom will make "Did you mean" suggestions for words that don't exist anywhere on my site, which seems rather unhelpful and a bit confusing for users. It would be really nice if the "Did you mean" functionality automatically filtered out any words, phrases, etc. for which there are going to be no results (and thus didn't display a "Did you mean" at all if it will not actually lead the user to something useful).

    An additional (but probably much more complicated to implement) improvement would be to not make a particular "Did you mean" suggestion if it is only going to return a single result which is already being displayed on the current search results page.

    The first one is the most important, by far. It is weird and confusing when Zoom makes suggestions that have nothing to do with my actual site content.

    Thanks for your consideration!
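
The two filters requested above can be sketched in a few lines of Python. This is only an illustration of the logic, not how Zoom works internally: `filter_suggestions`, the `search` callback, and the toy index are all hypothetical names made up for the example.

```python
from typing import Callable, Iterable, List, Sequence


def filter_suggestions(
    candidates: Iterable[str],
    search: Callable[[str], Sequence[str]],
    displayed: Sequence[str] = (),
) -> List[str]:
    """Keep only suggestions that would lead the user to something new."""
    shown = set(displayed)
    useful = []
    for term in candidates:
        results = search(term)
        if not results:
            continue  # first request: the suggestion has zero results
        if len(results) == 1 and results[0] in shown:
            continue  # second request: its only hit is already on the page
        useful.append(term)
    return useful


# Toy index standing in for a real search backend (hypothetical data).
toy_index = {
    "zebra": ["/animals.html"],
    "zoom": ["/zoom.html", "/faq.html"],
}


def toy_search(term: str) -> List[str]:
    return toy_index.get(term, [])


print(filter_suggestions(["zebra", "zoom", "zzyzx"], toy_search, ["/animals.html"]))
# prints ['zoom']
```

Note that the second filter needs one extra lookup per candidate, which is part of why it is likely the more expensive of the two requests.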


  • #2
    Zoom should already only make suggestions for words which were found on your site.

    A few possibilities why this might be occurring:
    1) You have not updated all the files after re-indexing (e.g. you are using a "zoom_spellings.zdat" file from a previous index session), so Zoom is making suggestions based on a previous index. Note: It is important to upload all files listed at the end of indexing in the "Required files" window, otherwise the index will be corrupted.
    2) The suggested queries contain some special cases (e.g. punctuation marks or word join characters) which are/were not handled correctly.
    3) You are using an older version and build, so it is possible that bugs which caused the above have since been fixed.

    As a first recourse, check if it's possible some of your index files are from different sessions. The date and time of the files should be the most telling.

    Then try a re-index, and make sure to upload ALL the files listed at the end of indexing in the "Required Files" window.
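
One way to run the date-and-time check described above is a short Python script that flags any file much older than the newest one in the set. This is a rough sketch under assumptions: `stale_index_files` and the one-hour tolerance are made up for the example, and apart from "zoom_spellings.zdat" the file name used in the demo is a placeholder, so substitute the files from your own "Required files" list.

```python
import os
import tempfile
from typing import Dict, List


def stale_index_files(paths: List[str], tolerance_seconds: float = 3600.0) -> List[str]:
    """Return files whose modification time lags the newest file by more than the tolerance."""
    mtimes: Dict[str, float] = {p: os.path.getmtime(p) for p in paths}
    if not mtimes:
        return []
    newest = max(mtimes.values())
    return sorted(p for p, t in mtimes.items() if newest - t > tolerance_seconds)


# Demo with throwaway files; one is back-dated by a day to mimic an old session.
with tempfile.TemporaryDirectory() as folder:
    fresh = os.path.join(folder, "other_index_file.zdat")  # placeholder name
    old = os.path.join(folder, "zoom_spellings.zdat")
    for path in (fresh, old):
        with open(path, "w") as handle:
            handle.write("placeholder")
    now = os.path.getmtime(fresh)
    os.utime(old, (now - 86400, now - 86400))
    print([os.path.basename(p) for p in stale_index_files([fresh, old])])
    # prints ['zoom_spellings.zdat']
```

Keep in mind that some upload tools reset modification times on the server, so the check is most reliable when run against the local output folder.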

    Next, if you still have a problem, update to the latest build from here:
    http://www.wrensoft.com/zoom/whatsnew.html

    Repeat the above and see if the behaviour changes.

    If you are still having this problem, email us the link to the page and the queries in question and we can take a closer look. If the website is not accessible, ZIP up your search files and email them to us.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      My apologies for not replying! For some reason I never received notification of your reply to my post (which is weird, because I'm Subscribed, and my settings are to email me...). I will have a look through your suggestions and also try upgrading. I will clarify though that I believe some of the words/phrases I was seeing were unusual enough that I can say with certainty they had never existed on my site, so that would rule out it being an old indexing file. That is part of what was/is weird about it -- there have been some bizarre words and phrases suggested at times.

      On a somewhat related note, is there a way I can know via PHP if there are 0 results, so that I could then set a 404 HTTP Header programmatically?



      • #4
        The words and phrases have to come from somewhere. Zoom isn't capable of generating words out of nowhere (e.g. there is no built-in dictionary).

        Quite often, people don't realize the content that might be lurking within their files. For example, PDF files contain a visible layer (which is shown when you open the file in Acrobat Reader) and a hidden "text layer" (which you can only see if you select the text and copy and paste it into another program, such as Notepad). For a document which was scanned in from paper, this means the visible layer is an image (e.g. a photo) of the scan, and the text layer is usually created by OCR, a programmatic attempt at recognizing the text on the paper.

        Sometimes this can generate text that is not at all what you expect, due to anything from a failure to recognize handwriting or poor scan quality to a simple fault in the OCR algorithm.

        [Having said the above, there was a recent bug which caused unusual numbers to be indexed out of Office 2007 files (e.g. docx, pptx) due to embedded drawing data. Note that this affected only numbers, not words or phrases. This has since been fixed in the latest build, V7.1 build 1011.]

        Another possibility is that you have some file formats configured to be indexed incorrectly. For example, you added the scan extension ".doc" to be indexed as an "Acrobat document" (instead of a Microsoft Word document). Or you added some binary files (e.g. ".exe") and asked for them to be indexed as "HTML/text". This would also cause Zoom to index a lot of binary garbage as text. This would look more like completely random combinations of letters and numbers, like "bbzAaaaZZzztt" etc.
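
To track down files of this kind before they are indexed as "HTML/text", a quick heuristic check on the first bytes of each file can help. The Python sketch below is an assumption-laden illustration, not part of Zoom: `looks_binary` and its 30% threshold are invented for the example, and the rule it applies (NUL bytes or a high share of control characters) is a common rule of thumb rather than a definitive test.

```python
def looks_binary(sample: bytes, threshold: float = 0.30) -> bool:
    """Heuristic: NUL bytes or a high share of control characters suggest binary data."""
    if not sample:
        return False
    if b"\x00" in sample:
        return True
    text_controls = {9, 10, 13}  # tab, line feed, carriage return
    control = sum(1 for byte in sample if byte < 32 and byte not in text_controls)
    return control / len(sample) > threshold


print(looks_binary(b"<html><body>Hello</body></html>"))  # prints False
print(looks_binary(b"MZ\x90\x00\x03\x00\x00\x00"))        # prints True
```

In practice you would pass in the first few kilobytes of each file, e.g. `open(path, "rb").read(4096)`. Text saved as UTF-16 contains NUL bytes and would be flagged too, so treat a positive result as a prompt to inspect the file rather than as proof.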

        So if we were to investigate further, we should look at where these "bizarre words" are coming from. If you can e-mail us your ZCFG configuration file, your index log text file, and a few of the files in question, we can take a look. It might be best to create a smaller test case with a smaller set of files (e.g. just 2 or 3) and reproduce the problem, before doing this.

        Regarding returning a 404 error when there are zero results: this is not advised. A "404" means the requested page was not found, which usually points to a broken link or some other problem with your web site, whereas "zero results" just means that a user has searched for something that isn't on your site. The latter is a legitimate situation, not an error. Returning 404 headers would lead to all sorts of confusion for users and bots alike.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine



        • #5
          Thanks much for the thorough explanation -- very helpful! You raise some great points that I hadn't considered, which will help me as I try to figure out where these anomalies are coming from. I haven't had a chance to assemble some more current data yet, but wanted to give a quick reply.

          Regarding the 404 thing: I totally agree with you. The only reason I'm looking into it is that Google Webmaster Tools keeps complaining periodically about "Soft 404" errors. I have some links on my site that initiate searches (within my site) based on keywords relevant to that particular content. Some of those search results will have "Did you mean" suggestions that lead to search pages with no content (i.e. no results). This chain of events results in those 0-results pages showing up in Google Webmaster Tools, which interprets them as Soft 404s. So I was exploring how I might handle those instances differently to keep Google happy. But I do agree with you -- returning a 404 is not ideal. I'll explore other ways of addressing Google's complaint.
