Size limits for Excel .xls files - file size limit exceeded


  • Size limits for Excel .xls files - file size limit exceeded

    My Zoom installation is not indexing .xls files larger than about 300 KB although my file size limit is set to 1000 KB.

    For instance:
    It scanned a 277,504 byte .xls file (as it should)
    It reports that it skipped a 366,080 byte .xls file
    It scanned a 839,367 byte .doc file (as it should)

    Any ideas what is happening, and how to correct it?

  • #2
    Maybe the file is being skipped for another reason and not because of its size. For example, the link to the file may be on another domain, or the file name may be in your skip list.

    Can you save the log file from the File menu in Zoom and post the section of the log that shows the file being skipped? Or e-mail us your log file and your Zoom configuration file, xxxx.zcfg.

    ----------
    David


    • #3
      Here is a typical example:

      [ERROR] Maximum file size limit exceeded by http://mywebsite.com/67-75/submissions/dr_67.00800.xls (check configuration settings)

      Note that the file size is 457,216 kb and that smaller .xls files in that folder ARE being indexed.

      From zoom.zcfg: MAXFILESIZE_LIMIT:1000000


      • #4
        You state that your file size is 457216 kb (457MB).
        The limit you set in Zoom is 1000000 bytes (1MB).

        So the file is bigger than the limit you have set, and the behaviour would be normal in this case.
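
The unit mix-up above can be checked with a few lines of arithmetic. This is only a sketch: `exceeds_limit` is a hypothetical helper, not part of Zoom, and it assumes that the limit is compared in bytes and that 1 KB is 1024 bytes (the conclusion is the same with 1000).

```python
# Limit taken from the zoom.zcfg line quoted in the thread (assumed to be in bytes).
MAXFILESIZE_LIMIT = 1_000_000

def exceeds_limit(size_bytes: int, limit_bytes: int = MAXFILESIZE_LIMIT) -> bool:
    """Hypothetical check: True if the size is over the configured limit."""
    return size_bytes > limit_bytes

reported = 457_216

# Reading the reported figure as kilobytes gives a file far over the limit.
size_if_kilobytes = reported * 1024
# Reading it as bytes gives a file well under the limit.
size_if_bytes = reported

print(exceeds_limit(size_if_kilobytes))  # True
print(exceeds_limit(size_if_bytes))      # False
```

If the figure is really in bytes, the raw file size alone does not explain the error, which is why the next step is to look at the size after conversion.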

        If you have made a typo and the file size is only 457216 bytes (rather than kilobytes), then another possible problem is that the file increases in size when it is converted to HTML for indexing. The same Zoom limit is applied to the file after conversion into text for indexing.

        Most XLS, DOC and PDF files get smaller after they are converted to plain text or HTML because images and formatting details are stripped out. But maybe your files are getting bigger after conversion for some reason (e.g. because lots of space padding was inserted).

        To confirm this you could:
        1/ E-mail us one of the example XLS files, or give us the URL
        2/ Increase the limit to 2MB instead of 1MB and see if more files are indexed
        3/ Post more of your indexing log. You should be able to see whether the error occurred before or after conversion to text.

        --------
        David


        • #5
          > You state your file size is, 457216 kb (457MB) <
          D'oh!

          > Increase the limit to 2MB instead of 1MB and see if more files are indexed <

          I have increased the limit to 3MB, and it is indeed indexing more files, including one, for instance, that it reported was over 800 kb.

          Seeing that there is really no downside in increasing the file size limit, that fix will work fine for me. I believe that your analysis about the files swelling in size upon conversion is right on. There is so much garbage in some of these files that I wouldn't be surprised!

          > E-mail us one of the example XLS files, or give us the URL <
          I'll do that so you can see what I mean.

          You can consider this case closed. It works.

          Thanks


          • #6
            Firstly, there *is* a downside to increasing the file size limit: it will result in increased RAM usage.

            We have examined the Excel file that you sent as an example.

            The size of the example file is 358KB. After we converted it to HTML it turned into a massive 32MB.

            Looking into the Excel file provides the explanation. The last cell in your Excel sheet is cell IV7742 (row 7742, column IV). You can jump directly to the last cell using CTRL-END on the keyboard.

            This means you have roughly 2 million cells in use in your spreadsheet (7742 rows x 256 columns), even though most of them are empty. The HTML conversion process turns each empty cell that is in use into
            <TD></TD>
            which explains the 32MB HTML file that is produced as output, and why it is above the limit set in Zoom.
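
As a rough cross-check of the figures above, the used range and a lower bound on the HTML size can be worked out in a few lines. This is a back-of-the-envelope sketch, not the converter's actual logic: `column_number` is a hypothetical helper, and the estimate counts only one bare `<TD></TD>` per cell, ignoring row tags, attributes and any real content.

```python
def column_number(letters: str) -> int:
    """Convert an Excel column label such as 'IV' to its 1-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

rows = 7742                  # last used row reported in the thread
cols = column_number("IV")   # last used column reported in the thread
cells = rows * cols

empty_cell_markup = "<TD></TD>"
min_html_bytes = cells * len(empty_cell_markup)

print(cols)            # 256
print(cells)           # 1981952
print(min_html_bytes)  # 17837568
```

Even this bare minimum comes to nearly 18 million bytes for empty cells alone, so any extra markup per row or cell would plausibly push the converted file into the tens of megabytes.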

            So the solution is to remove the unused cells from the spreadsheet, which is not as easy as it sounds despite being a common problem.

            Have a look at the Microsoft procedures here:
            http://support.microsoft.com/?kbid=244435

            Microsoft method 1 of deleting unused rows and columns didn't work for your spreadsheet. I don't know why. Maybe your sheet was imported from Lotus 1-2-3 or elsewhere?

            Method 2 using the XSFormatCleaner.xla add-in did a better job.

            It reduced the size of your binary Excel file from 358KB to 36KB and reduced the HTML file size from 32MB to 140KB.

            So you need to clean up your Excel files and this should solve your problem.
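
One way to spot sheets that need cleaning is to compare the reported last cell with the last cell that actually holds a value. The sketch below works on a plain dictionary of cells so that it stays self-contained; reading real .xls files would need a spreadsheet library, which is not shown here. Both function names are hypothetical.

```python
def last_data_cell(cells):
    """Return (row, col) bounding all non-empty cells, or (0, 0) if none.

    `cells` maps 1-based (row, col) tuples to cell values.
    """
    used = [(r, c) for (r, c), v in cells.items() if v not in (None, "")]
    if not used:
        return (0, 0)
    return (max(r for r, _ in used), max(c for _, c in used))

def bloat_ratio(reported_last, cells):
    """Ratio of the reported used range to the range that really holds data."""
    data_rows, data_cols = last_data_cell(cells)
    reported_area = reported_last[0] * reported_last[1]
    data_area = data_rows * data_cols
    return reported_area / data_area if data_area else float("inf")

# Made-up example: data only in a 20 x 5 block, but the sheet reports IV7742.
sheet = {(r, c): "x" for r in range(1, 21) for c in range(1, 6)}
print(last_data_cell(sheet))             # (20, 5)
print(bloat_ratio((7742, 256), sheet))   # 19819.52
```

A ratio far above 1 suggests the sheet carries formatting well beyond its data and is a candidate for the cleanup described above.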

            --------
            David
            Wrensoft


            • #7
              MS Excel Excess Format Cleaner

              Wow! I wish that I had known about the excess format cleaner earlier. I thought that these files seemed large for their content. I have read the Microsoft Knowledge Base article and have installed the cleaner.

              I applied it to a file that was 462,848 bytes. It reduced it to 128,000 bytes. And I suspect that the difference when converted to HTML will be even more pronounced.

              I then downloaded JD TouchPro to restore the file's original modification date. I'll have to do this to all of our .xls files. (Well, at least it'll keep me off of the streets!) :P
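
For anyone who would rather script the date restoration than use a separate tool, the idea can be sketched with the Python standard library: record the timestamps before the file is changed and put them back afterwards. `run_preserving_mtime` and the `modify` callback are hypothetical names; the cleaning step itself (done in Excel with the add-in) is not automated here.

```python
import os

def run_preserving_mtime(path, modify):
    """Call modify(path), then restore the file's original access and
    modification times so the edit does not change its apparent date."""
    before = os.stat(path)
    modify(path)
    # os.utime with ns= takes (access_time_ns, modification_time_ns).
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))

# Usage (hypothetical): wrap whatever step rewrites the file.
# run_preserving_mtime("dr_67.00800.xls", clean_workbook)
```

This only restores timestamps on the same file path; if the cleaned workbook is saved under a new name, the times would need to be copied from the old file to the new one instead.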

              Thanks for all of your help. You have demonstrated that Zoom has excellent support.


              • #8
                Follow up

                I am posting this follow up just in case there is anyone who is unconvinced that excess formatting can make a significant difference in file size. I have just applied the excess formatting tool on a 1,035,776 byte file. It reduced the size of the file down to a slender 47,616 bytes.
