Certain search queries causing 500 Internal Server Error

  • Certain search queries causing 500 Internal Server Error

    Hi,

    I noticed that more and more core dump files were being created in our cgi-bin folder. After a bit of research I found that there were certain zoom queries that were causing 500 Internal Server Errors.

    An example being - 'strange dream at night'.
    After searching each individual word in the query, I found that the word 'dream' was causing the error.

    I've read all the documentation on the 500 Server Error and can see the error being logged as "Premature end of response header", but I don't understand why this would only happen for certain words.

    Is anyone able to shed some light on what might be causing this issue?

    Thanks in advance

  • #2
    What version of Zoom are you using? If you aren't already on it, can you upgrade to the final V6 release or the current V7?

    How big is your set of index files? Can you zip them up and make them available for download?

    If the CGI has crashed, creating a dump file, then it is probably normal that the web server throws a Premature end of response header error.


    • #3
      Hi,

      We are on Version 6.0 ( Build: 1028 ) - Enterprise Edition. We are looking to upgrade shortly.

      Our zoom_xxx files total just over 1 GB, and we currently get about 170,000 pages indexed, so I'm not sure if a download would be possible.

      Would it be normal that specific words could cause the CGI to crash? Would there be TOO many results? Or a particular result associated with that word that has issues?

      Thanks
      Last edited by bhtech; 07-11-2014, 02:02 AM. Reason: Typo


      • #4
        No, a crash is not normal, regardless of the search word. But different search words will use different sections of the index and maybe different code. For example, some search words will trigger stemming and spelling suggestions, while others won't.

        • Could be a bug in V6.
        • Could be index corruption
          • Using ASCII mode when doing an FTP upload causes a lot of grief.
          • Having a mix of index files from multiple indexing sessions leads to corruption

        • Could be hardware failure (e.g. bad RAM in the machine).
        • Could be corruption of the files on your server (e.g. hard disk corruption).
        • It could be settings on your server (e.g. limits on CPU and RAM usage that force the process to be killed prematurely).
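
One way to rule out a damaged upload (for example, an ASCII-mode transfer that rewrote line endings) is to download the index files back from the server and compare them with the local originals. The following is only a minimal Python sketch: the directory arguments and the `zoom_*` file pattern are assumptions for illustration, not anything Zoom itself provides.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_index_dirs(local_dir: str, server_copy_dir: str,
                       pattern: str = "zoom_*") -> list[str]:
    """List index files that are missing or differ between two directories.

    local_dir holds the files as written by the indexer; server_copy_dir
    holds the same files downloaded back from the web server (hypothetical
    layout, adjust to your own setup).
    """
    problems = []
    server_copy = Path(server_copy_dir)
    for local_file in sorted(Path(local_dir).glob(pattern)):
        remote_file = server_copy / local_file.name
        if not remote_file.is_file():
            problems.append(f"{local_file.name}: missing from server copy")
        elif local_file.stat().st_size != remote_file.stat().st_size:
            # ASCII-mode transfers typically change the size of binary files.
            problems.append(f"{local_file.name}: size differs")
        elif file_digest(local_file) != file_digest(remote_file):
            problems.append(f"{local_file.name}: checksum differs")
    return problems
```

An empty list from `compare_index_dirs(...)` suggests the upload itself is not the cause, though it says nothing about whether the index was already corrupt when it was built.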


        A 1GB download is no problem for us, if you can tar/gzip the files on your server. Loading up your files on our server can eliminate some of the possibilities.
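
If a shell `tar` is not convenient on the server, the same bundle can be produced with Python's standard `tarfile` module. This is a rough sketch only; the function name, the `zoom_*` pattern, and the paths are assumptions for illustration.

```python
import tarfile
from pathlib import Path


def bundle_index_files(index_dir: str, archive_path: str,
                       pattern: str = "zoom_*") -> list[str]:
    """Pack matching index files into a gzip-compressed tar archive.

    Returns the names of the files that were added. The archive should be
    written outside index_dir (or under a name that does not match the
    pattern) so that it does not try to include itself.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(Path(index_dir).glob(pattern)):
            if path.is_file():
                # Store only the file name, not the full server path.
                archive.add(str(path), arcname=path.name)
                added.append(path.name)
    return added
```

Opening the archive in binary-safe `w:gz` mode and transferring it as a single file also sidesteps the ASCII-mode upload problem for the return trip.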


        • #5
          Hi,

          I found another strange query that caused the same error. I can search 'me' fine but if I search 'me?' I get the error. I found that one very strange.

          I have uploaded the search.cgi and all zoom_xxx files since having the issue and that didn't seem to make any difference - and I did make sure everything was uploaded in binary mode.

          Although I do understand that it could be hardware/settings issue, I can't understand why only certain queries would bring on hardware/settings issues. For the 'me' query - the search page returns 114,752 results, which I imagine would be at the higher end of search results returned.

          I'm leaning more towards your mention of different search words using different sections of the index. This would make the most sense to me - there could be certain parts of the index corrupted.

          I will attempt to tar/gzip the files and get you a link. Do you just want the zoom_xxx files? (Ours are .zdat)

          Thank you very much for your prompt replies and thorough support, I really do appreciate it.

          Thanks


          • #6
            Hi,

            After having the site re-indexed yesterday, I uploaded the files to our server.

            I can now search 'dream' and 'me?' and there is no error - but I am now seeing the error if I search 'organiser' or 'children' (there may be more, these were the first I found in the logs).

            This would have to indicate a corruption in the .zdat files, wouldn't it? Something that is particular to those results that are only called when searched.

            If that is the problem, where would I start looking to fix this issue?

            Thanks


            • #7
              The searches for 'me' and 'me?' are different because ? (like the * character) acts as a wildcard, so 'me?' is run as a wildcard search.

              Again, if you want us to take a look, zip up the files.
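
To illustrate why a wildcard term touches more of the index than a plain word, here is a small Python sketch. It assumes conventional glob semantics (`?` matches exactly one character, `*` matches any run of characters); Zoom's own matching rules may differ, and the vocabulary list and function name are made up for the example.

```python
from fnmatch import fnmatchcase


def expand_query(term: str, vocabulary: list[str]) -> list[str]:
    """Return the indexed words a search term would match under glob rules."""
    term = term.lower()
    if "?" in term or "*" in term:
        # Wildcard search: every word in the vocabulary has to be tested.
        return [word for word in vocabulary if fnmatchcase(word, term)]
    # Plain search: only the exact word is looked up.
    return [word for word in vocabulary if word == term]


vocabulary = ["me", "men", "met", "meal", "media"]  # hypothetical index words
print(expand_query("me", vocabulary))   # ['me']
print(expand_query("me?", vocabulary))  # ['men', 'met']
print(expand_query("me*", vocabulary))  # ['me', 'men', 'met', 'meal', 'media']
```

Under these assumptions 'me?' pulls in entries for several different words, so it can reach a damaged part of the index that a plain 'me' search never reads.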


              • #8
                Is it best to email a link to you?

                Can any harm come from posting the link here?

                UPDATE: I have sent you a PM with the link.
                Last edited by bhtech; 07-14-2014, 02:24 AM. Reason: Update


                • #9
                  Got the file and having a look at it now.


                  • #10
                    Awesome!

                    I have gone through the logs and found the queries that are causing the server errors.

                    They are: organise, children, print and encourage. (including the plurals, etc. of those words)

                    These look to be the only words causing the issue - I'm not sure if that will help with finding any sort of corruption.

                    Thanks again for your excellent support.


                    • #11
                      The index data does in fact look corrupt. We can get a similar crash on our server. In your index files there are internal file pointers that point past the end of the files, which should never happen. It might be the result of mixed files on the server, or a bug that caused the index to be corrupt when it was made.

                      Were you using incremental indexing when you built this recent index, or was it a full reindex?
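
The kind of check involved can be sketched in a few lines of Python. The layout of the .zdat files is not described in this thread, so the sketch assumes the byte offsets have already been extracted by some other means; the function name and arguments are hypothetical.

```python
import os


def offsets_past_eof(offsets: list[int], target_path: str) -> list[int]:
    """Return the offsets that cannot be valid positions inside target_path.

    offsets is a list of byte positions read from an index file (how they
    are parsed is format-specific and not shown here). Any offset that is
    negative or at/after the end of the target file is reported.
    """
    size = os.path.getsize(target_path)
    return [offset for offset in offsets if offset < 0 or offset >= size]
```

A non-empty result means a reader that trusts those pointers will seek or read outside the file, which is consistent with a crash that only appears for the words whose entries hold the bad offsets.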

                      Are you using offline mode or spider mode when indexing your site?

                      When you uploaded the files, are you sure you uploaded all of them, and not just some? If not, some files might be from an older, smaller indexing session and some from a larger index. This would nicely explain the behaviour. In particular, the zoom_wordmap.zdat file looks too small. We would have expected it to be around 193MB, rather than 144MB.
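
A quick way to look for mixed sessions is to list each index file's size and modification time on the machine that ran the indexer and see how far apart the timestamps are. This is a minimal Python sketch under assumptions: the `zoom_*` pattern and function names are illustrative, and what counts as a suspicious gap depends on how long an indexing run takes.

```python
from pathlib import Path


def index_manifest(index_dir: str, pattern: str = "zoom_*") -> list[tuple[str, int, float]]:
    """Return (name, size in bytes, mtime as epoch seconds) per index file."""
    rows = []
    for path in sorted(Path(index_dir).glob(pattern)):
        if path.is_file():
            info = path.stat()
            rows.append((path.name, info.st_size, info.st_mtime))
    return rows


def mtime_spread_seconds(rows: list[tuple[str, int, float]]) -> float:
    """Return the gap between the oldest and newest file, in seconds."""
    times = [mtime for _, _, mtime in rows]
    return max(times) - min(times) if times else 0.0
```

Note that many FTP clients reset modification times on upload, so matching dates on the server do not prove the files came from the same session; the comparison is more meaningful on the original output folder.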

                      The index you sent us has ~161K pages. But you mentioned about 170K pages were indexed. Do you have the actual log from the indexing session by any chance?


                      • #12
                        Hi,

                        I believe it was a full re-index - there hasn't been any setting changed to make it incremental.

                        We index the site in spider mode.

                        The zip file I linked you to contains all the files that I uploaded - they all had the same date on them - so as far as I know they are all from the same index.

                        I'm not sure about the zoom_wordmap.zdat file, I know we have around 25 words in our Word Skip List (small words) - could that explain that?

                        In terms of the number of pages indexed, I'm not sure why that number would be so different, I may have miscalculated the original amount. Unfortunately the logs were never set up for the zoom search - however I have set them up now to log to file.

                        There is nothing in my 'Status' tab though. Is that because the indexer is not running? It always shows 0 for everything.

                        Thanks


                        • #13
                          Also, our Optimization settings were set to the Fastest Search for that index - that may be limiting the number of pages indexed. I have bumped that down one increment to see if it makes any difference for the next index.

                          I also found that our sitemap lists a URL - but when I search for that page in search.cgi (using words that I know are on the page and in the title), it says there are no results.
                          Is the sitemap a list of pages indexed - or just links found?

                          Just trying to narrow down anything that might be helpful.

                          Thanks


                          • #14
                            Rather than spending any more time debugging V6, I am going to send you a trial V7 key to see if the problem happens in V7.
                            I'll also get a copy of your configuration file, so we can see if we can produce a corrupted index here.


                            • #15
                              Fair enough.

                              I will do the upgrade to V7(trial) and start an index of the site tomorrow morning.

                              Unfortunately we are running a newsletter script on the server today, so it would be too server intensive to run both.

                              I will post back with how I go.

                              Thanks again!
