Announcement

**Ray** · Oct-31-2008, 07:45 AM

More new features:

Category results summary (x results found for category A, y results for category B, etc)
Automatic login for cookie-based authentication: Zoom can attempt to log in to cookie-based login pages (like PHP pages). It will attempt to mimic the form parameters and send an HTTP POST with your login details. This means it can now log in to websites which are not protected with HTTP authentication (which Zoom already supported).
ZIP file indexing: You can now index the content of ZIP files. Zoom will actually extract the files within ZIP archives and index each one individually.
Wildcard support for Skip Pages and Disallow: entries in robots.txt files.
Wildcards for Recommended Links (so you can specify one recommended link that will match multiple search words and queries)
Option to truncate URL displayed in search results
More Custom Meta Field options including: "Money" and "Multi-select" data types, and "Partial text matching" search method.
New weighting option: "Body content" - allows you to give (or lower) preference to content found within the <body> ... </body> tags of a page. This means you can decrease the weight of the main content of a page, and effectively increase weighting for text found in headings, titles, etc. more than the current weight settings allow.
Improved "Content Density" weighting to exclude markup code, which should make this more effective.
A new PHP script to generate web site search statistics on the server in real time. (Previously stats reports needed to be generated offline)
Significantly improved the "Additional start URL" window in handling a large number of additional start points. For those of you who are indexing 10,000+ domains, this should be a godsend.
New Status window features Progress for each thread, and also System Information such as CPU load, memory load, and physical and virtual memory information. Screenshot below:

... and of course, more improvements, bug fixes, and performance optimizations than we can fit here.
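To illustrate the cookie-based login item above, here is a minimal Python sketch of what "mimicking the form parameters" could look like. It is not Zoom's actual implementation: the class and function names (`LoginFormParser`, `build_login_post`) and the field names in the usage example are hypothetical, and it only handles the first form on a page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class LoginFormParser(HTMLParser):
    """Collects the action URL and named <input> fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Keep hidden fields and default values so the POST mimics the form.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_post(page_html, credentials):
    """Return (action_url, url-encoded POST body) for the page's login form."""
    parser = LoginFormParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    fields.update(credentials)  # overwrite the empty user/password inputs
    return parser.action, urlencode(fields)
```

For a form with a hidden `token` field plus `user` and `pass` inputs, `build_login_post(html, {"user": "ray", "pass": "s3cret"})` returns the form's action and a body ready to POST. To actually keep the session cookie, the body could be sent through `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(...))` from the standard library.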

**Ray** · Nov-14-2008, 06:27 AM

And a few more in the latest beta release...

JavaScript capacity increased!
A big change is in the JavaScript version, which utilizes a new method to get around a significant limitation in the scripting engine of browsers such as Internet Explorer. This new implementation allows you to index and search a much larger number of files using JavaScript than ever before. We have tested the new version with indexes of more than 10,000 pages and 100,000 unique words. However, performance varies on Internet Explorer, which is the slowest performing browser (in terms of JavaScript execution). Firefox and Chrome fare much better with such an intensive JavaScript application.

PHP and ASP capacity
We have also increased the maximum unique words limit for PHP and ASP from 300,000 to 500,000. This was made possible by the optimizations and improvements we have made for these two scripts in V6.

Search Statistics PHP Script
There is a new script which you can use to generate live statistics on your web server. This script is only available for PHP, and does not generate graphical charts like the "Statistics Report" tool in the Indexer. It will, however, provide concise, up-to-date statistics on your server without needing to download the log file. See the Help file for more information (under the "Advanced Options" chapter).

Other new improvements include:

Faster offline indexing with large folders
Improved error message reporting
Option to index "param" tags in the form of:
<param name="Proprietary.Data" value="Serial#12344451">
Index ZIP files found within other ZIP files (and the files contained within the recursive ZIP files).
Maximum plugin password length increased from 20 to 40 characters
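As an illustration of the nested ZIP item in the list above, here is a small Python sketch of walking a ZIP archive and descending into ZIP files found inside it. This is not how Zoom implements it; the function name `iter_zip_documents`, the `.zip` extension check, and the `max_depth` guard are assumptions made for the example.

```python
import io
import zipfile


def iter_zip_documents(data, prefix="", max_depth=5):
    """Yield (path, content) for each file in a ZIP, descending into nested ZIPs.

    `data` is the raw bytes of the archive. `max_depth` is a hypothetical
    safety limit so that deeply nested archives cannot recurse forever.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            content = archive.read(info)
            if info.filename.lower().endswith(".zip") and max_depth > 0:
                # Treat the inner archive as a folder and index its files too.
                yield from iter_zip_documents(content, path + "/", max_depth - 1)
            else:
                yield path, content
```

Each yielded pair can then be handed to whatever indexes a single document, which matches the idea of extracting the files and indexing each one individually.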

**David** · Dec-17-2008, 11:33 AM

And one more feature before the final release.

Native 64bit version of the indexer & higher CGI capacity
The Zoom indexer will now be available in both native 32bit and 64bit executables. The 32bit software is fully compatible with, and could always run on, 64bit versions of Windows. But it was limited to using only 2GB of RAM, regardless of how much RAM was actually installed in the machine. This was a Windows 32bit limitation. The native 64bit release of Zoom in V6 allows an almost unlimited amount of RAM to be used, if the RAM is physically available in the machine and you are running a 64bit O/S.

There is no functional difference between the 32bit and 64bit releases. They have an identical set of features. The only difference is in how much RAM they can use. Being able to access more RAM means a higher potential capacity. But as the capacity of the 32bit release was around a million pages, very few people will need to use the 64bit release. So the 64bit release will be made available as part of the Enterprise edition.

Removing the RAM bottleneck in the indexer doesn't, by itself, allow for significantly improved capacity, however. There were other limits in the CGI script and in the index file format, which was effectively 32bit in nature. For example, the index had 32bit file pointers, and on old versions of Linux it was not possible to handle index files larger than 2GB because of operating system limits.
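The 2GB figure follows from simple arithmetic. Here is a short Python sketch, assuming the file pointers were stored as signed 32bit integers (the post only says "32bit", so the signedness and the little-endian `struct` formats used here are assumptions, not the real index layout):

```python
import struct

MAX_INT32 = 2**31 - 1  # 2,147,483,647 bytes, just under 2 GiB


def pack_offset(offset, fmt):
    """Pack a file offset with the given struct format, or return None if it does not fit."""
    try:
        return struct.pack(fmt, offset)
    except struct.error:
        return None


three_gib = 3 * 1024**3

# A signed 32bit field ("<i") cannot address a byte at the 3 GiB mark...
fits_32 = pack_offset(three_gib, "<i") is not None
# ...while a signed 64bit field ("<q") can, with plenty of room to spare.
fits_64 = pack_offset(three_gib, "<q") is not None
```

Under these assumptions `fits_32` is false and `fits_64` is true, which is why widening the pointers in the file format matters as much as having a 64bit indexer.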

So we have systematically restructured the index file format and it should now support files of around a terabyte in size, at least in theory. (In practice things start to get impractical once you get into tens of gigabytes.)

Long-time users might be wondering about the CGIs themselves. Are they also 64bit? The answer is no. The CGIs remain as 32bit executables. They don't use enough RAM to justify making a 64bit release, and the 32bit CGIs remain compatible with 64bit Windows and Linux. So there is no benefit to changing them.

So what does this mean for capacity? Capacity in the V4 and V5 releases was effectively limited by O/S and file system limits. Now with V6 the limits are related to how much RAM you have installed and how fast your hardware is. Faster hardware allows a larger capacity, as large data sets can still be indexed and searched in a reasonable time frame. Over the next few weeks we will be posting some benchmarks, but initial in-house testing has shown that search times of a few seconds are still possible with data sets of over 2 million pages on a single machine.

**David** · Dec-17-2008, 11:34 AM

V6.0 is now available, 17/Dec/2008.

So this thread can now be closed. But please feel free to open up new V6 threads as issues arise.

New features in V6
