File permissions and Indexer crash


  • File permissions and Indexer crash

    We recently upgraded to V6 and our website manager set this up successfully, but now the following error appears on our website:

    Pageinfo file and PageInfoSize does not match NumPages specified.Unable to open zoom_pageinfo.zdat
    Check file permissions and that file exists

    When we try to start indexing, the indexer crashes after a few seconds.

    Our website manager may know what the issue is but he is uncontactable for the next month! Any ideas?

  • #2
    Pageinfo file and PageInfoSize does not match NumPages...
    This will be because the index is corrupted. This can happen if you only upload half the files, upload a mixed set of files from V5 and V6, or do something like use the V5 search script with the V6 set of index files.
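A mixed or partial upload of this sort can often be caught by checking that every search file came from the same indexing run. The sketch below (an illustration, not part of Zoom; file names are examples) flags files whose modification times differ too much from the newest one:

```python
import os

def check_same_run(folder, max_spread_seconds=3600):
    """Report index files whose modification times differ from the newest
    by more than max_spread_seconds, which can indicate a mixed V5/V6 or
    partial upload. File-name patterns are illustrative."""
    names = [f for f in os.listdir(folder)
             if f.endswith(".zdat") or f.endswith(".cgi")]
    if not names:
        return True, []
    mtimes = {n: os.path.getmtime(os.path.join(folder, n)) for n in names}
    newest = max(mtimes.values())
    stale = [n for n, t in mtimes.items() if newest - t > max_spread_seconds]
    return not stale, stale
```

Running this against the folder you upload from (or, via a shell on the server, against the hosted folder) will name any file left over from a previous build.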

    the indexer crashes after a few seconds...
    Are you using the latest release of V6? That is currently Version 6.0.1027 (22 Aug 2011).

    What is the error message? Or can you send us a screenshot?

    If the indexing actually starts before the crash, can you go into the "Index log" configuration, turn on logging to a file, and enable "debug mode"? Then send us the log.



    • #3
      Hi Wrensoft,

      I am having the same error.
      'Pageinfo file does not match PageInfoSize specified.Unable to open zoom_pageinfo.zdat
      Check file permissions and that file exists'

      I reuploaded the files and still get the same error.
      So then I reindexed, and it uploaded fine but I still get the same error.
      I checked the file permissions and set them to 777 just to be sure, and still no luck.

      Another thing which might be related: all these 'core' files (core.1365, core.2542, core.2711, etc.) are being created over the period when the index is running. Each one is between 65 and 70 MB, and one is created about every 5 to 15 minutes. They all have a permission of 600.
      I've never seen these files before and am not sure where they are from. Are they a product of Zoom?
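To see whether those core files line up with the indexing window, a small script like this (a sketch; it only assumes the `core.<pid>` naming described above) can list them with sizes and timestamps:

```python
import glob
import os

def list_core_dumps(folder):
    """Return (name, size_bytes, mtime) for files matching core.*,
    sorted oldest first, so their creation times can be compared
    against the period when the indexer was running."""
    dumps = []
    for path in glob.glob(os.path.join(folder, "core.*")):
        st = os.stat(path)
        dumps.append((os.path.basename(path), st.st_size, st.st_mtime))
    dumps.sort(key=lambda d: d[2])
    return dumps
```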



      • #4
        Please check the version and build of Zoom you are using (click "Help"->"About" in the Indexer).

        Check that you have uploaded the "settings.zdat" and "search.cgi" files created with the latest index. If you omit the newly created files, you would be using an older version of these files (from a previous build) with the latest .zdat index files.

        Also make sure you are uploading to the correct folder hosting the files. We have often seen users upload to the wrong folder, so the files they are looking at never actually change.

        Also, clarify whether you are using Zoom to upload the files or another program.

        If you are still experiencing problems, then ZIP up your search files (all the .zdat files and .cgi file you are uploading) and e-mail them to us or make them available for download from your site. We can then take a closer look.

        Zoom does not create "core" files. These are likely core dumps from your machine. Can you clarify which computer you are seeing these files on? You say they are "created over the time period when the index is running", which seems to imply it's the computer running the Indexer. However, they appear to be Unix-based core dump files, and the "600 permission" is also Unix-based. The Indexer only runs on a Windows computer, so this doesn't make much sense.

        If these are files that appear on your Linux web server, and the time period happens to coincide with the indexing (being performed on your Windows computer), then this has nothing to do with the indexing. Something else on your web server may be crashing regularly and causing these core dumps.

        The Zoom Indexer running on your Windows computer has no open file connection to write to your web server during the indexing process. It only has this ability at the end when it does the FTP upload (if you have configured Zoom to automatically upload for you).

        If these core dumps are created in the same folder during the FTP upload, it might be that the search is being executed while the files are being uploaded, and it behaves unexpectedly because the index files are incomplete. In this case, go to "Configure"->"FTP" and check the option to "Upload with .tmp filenames and rename when completed". This will eliminate the offline time of the search function while the index files are being uploaded.
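That upload option is an instance of the standard write-to-temporary-then-rename pattern: the new file only replaces the old one in a single rename step, so the search script never reads a half-written index. A minimal local sketch of the same idea (file names are hypothetical):

```python
import os

def atomic_replace(folder, name, data):
    """Write data to name.tmp, then rename over name, so a reader
    never sees a partially written file."""
    final = os.path.join(folder, name)
    tmp = final + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, final)  # atomic when source and target share a filesystem
```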
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine



        • #5
          Hi Ray,

          Thanks for the excellent and quick response.

          Our version is 6.0 Build 1025

          The 'core' files are probably a core dump like you said. They are being produced on the Linux web server while our Windows server runs the Zoom Indexer. At the end of the indexing process, the .zdat and related files get uploaded by Zoom.

          I had "Upload with .tmp filenames and rename when completed" ticked.

          I am running the indexer again to see if more core dump files are produced and to see if it works this time around.

          Thanks again for your help, I will post how it goes.



          • #6
            The indexer completed. And I don't get the 'Pageinfo file and PageInfoSize does not match ... ' error.

            I had Limits > 'max. files to index' set to 100,000.
            Usually I have this set to 1,000,000.

            Core dump files were produced like before.

            I think the web server may have filled up with core dump files and run out of disk space, so when the large .zdat files were uploaded, they could only transfer partially.
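If disk space is the suspect, a quick check on the server before the upload can confirm there is room for the .zdat files. This is a generic sketch (the threshold is whatever your index set actually needs), not a Zoom feature:

```python
import shutil

def enough_space(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least
    needed_bytes free, e.g. before uploading large .zdat files."""
    usage = shutil.disk_usage(path)
    return usage.free >= needed_bytes
```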

            Our web server does return a few 503 errors while the indexer is running; maybe those are connected to the core dump files.

            A couple of weeks ago we implemented some iframes (for displaying ads) on every page. Could these iframes affect things? We don't use frames or iframes elsewhere, so I would be happy to stop the indexer from following into iframes.



            • #7
              Regarding the iframes, I have put the URL that is loaded in the iframe into the skip list. It is a page from our domain.
              Hopefully this will stop the indexer going to it, and maybe stop the 503 errors and core dumps.



              • #8
                IFRAMEs can't cause crashes or core dumps. An IFRAME is just a presentation markup instruction telling the browser to show a page within a frame.

                What other scripts or CGI do you have running on your web site? My guess is that there is a script that gets invoked via certain URLs. That script/CGI crashes, so when the spider follows certain links on the website and requests that "page", something crashes on your web server.
                --Ray
                Wrensoft Web Software
                Sydney, Australia
                Zoom Search Engine



                • #9
                  There is a CGI-based article system on our website, and we have JavaScript running on every page. We also call JavaScript from external sites, and we've got plenty of PHP/MySQL pages.
                  I've been instructed to run the indexer over the weekend with the iframe files skipped, in order to rule this out. Then I'll have a look at something else, maybe the articles.



                  • #10
                    The indexing worked and no core dump files were created.
                    I've raised the indexer's file limit from 100,000 to 200,000 and I'll see how it goes next weekend. With the extra 100,000 pages it will take longer to index, but it will probably be fine. Each time it successfully indexes, I will double the number of pages indexed, until I'm up to 1.6 million.
                    It's possible that when it's running for a couple of days it requests pages while the database or the web server is doing a backup, and that might produce the core dumps. I'll try to keep this thread updated with my results.
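For the record, the doubling plan described above works out to four more successful runs; a trivial sketch of the schedule:

```python
def doubling_schedule(start, target):
    """List the file-limit settings from start, doubling until target is reached."""
    limits = [start]
    while limits[-1] < target:
        limits.append(limits[-1] * 2)
    return limits

# 100,000 -> 200,000 -> 400,000 -> 800,000 -> 1,600,000
```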



                    • #11
                      Indexing worked fine on the weekend. I've now upped the file limit from 200k to 400k for next Saturday.
                      I've also turned off all server-based web analytics (Webalizer, AWStats, etc.), which were creating large domlog files. This way I can be sure that Zoom will have enough space on the server to upload the large *.zdat files.
