I'm hoping that this will be added in the future. The site I'm indexing has many redundant pages. They're technically different (by CRC check), but the difference is really only the page title, for instance, or maybe a thumbnail picture. This is because the site lets the user reach the same resources from different navigation sections.
So the problem is that for the more common searches, 400 results are returned and 380 of them are different versions of the same page.
I completely understand that this is NOT the fault of the Zoom indexer. It's doing what it's told, but it would be great if I could specify exclusions based on query string arguments, or even use regular expressions (a la ISAPI Rewrite).
You can already skip pages based on a query string argument if it is passed via the URL (as an HTTP GET parameter). For example, you may have links which sort a page of listings in descending order, like so:
In which case, you can skip all pages with "&sort=desc" in the URL. You can do this from the Configuration window, under the "Skip Options" tab, by entering "&sort=desc" or just "sort=" in the "Page and folder skip list".
Similarly, other "alternate views" of pages with the same content can be excluded, such as printer-friendly pages, etc.
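To make the effect of such skip entries concrete, here is a minimal sketch in Python of substring-based URL skipping. This is not Zoom's actual implementation; the function name, the example.com URLs, and the "print=1" entry are hypothetical and only illustrate the idea of dropping any URL that contains a skip-list entry.

```python
# Hypothetical skip list: plain substrings, in the spirit of the
# "sort=" entry suggested above. "print=1" is an assumed example
# of a printer-friendly page parameter.
SKIP_SUBSTRINGS = ["sort=", "print=1"]


def should_skip(url, skip_substrings=SKIP_SUBSTRINGS):
    """Return True if any skip entry occurs anywhere in the URL."""
    return any(entry in url for entry in skip_substrings)


# Hypothetical URLs: one base page and two alternate views of it.
urls = [
    "http://example.com/listings.php?cat=5",
    "http://example.com/listings.php?cat=5&sort=desc",
    "http://example.com/listings.php?cat=5&print=1",
]

# Only URLs that match no skip entry would be indexed.
kept = [u for u in urls if not should_skip(u)]
print(kept)
```

In this sketch a short entry such as "sort=" is a broad match: it would also exclude a URL containing "resort=1". The longer form "&sort=desc" is narrower, so it is worth choosing the entry with that trade-off in mind.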
Wrensoft Web Software
Zoom Search Engine