V5 development progress - Content filtering

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • V5 development progress - Content filtering

    This is an article giving an overview of a new feature coming in V5 of Zoom called Content Filtering.

    V5 will be a free upgrade if you purchase V4 now, or if you purchased V4 in the 6 months prior to the V5 release.

    The new Content Filtering feature will allow you to filter out an entire page based on words found within the page's content. A list of filter words can be entered, each prefixed with a "+" or a "-". You can specify positive filters (keywords beginning with a "+" character), meaning that only pages containing these words will be indexed, or negative filters (keywords beginning with a "-" character), meaning that pages containing these words will NOT be indexed.
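    To make the "+"/"-" semantics concrete, here is a minimal Python sketch of how such a filter list could be parsed and applied to a page's text. This is not Zoom's actual implementation. The function names `parse_filter` and `should_index` are hypothetical, and the matching rules are assumptions: words are matched case-insensitively as whole words, any negative word excludes the page, and when positive words are given a page needs at least one of them (which fits the pets example, where a page about dogs need not mention hamsters).

```python
import re


def parse_filter(filter_text: str) -> tuple[set[str], set[str]]:
    """Split a filter list like "+dog +cat -casino" into positive and negative word sets.

    Hypothetical helper; the real filter syntax may differ in details.
    """
    positive, negative = set(), set()
    for token in filter_text.split():
        if token.startswith("+") and len(token) > 1:
            positive.add(token[1:].lower())
        elif token.startswith("-") and len(token) > 1:
            negative.add(token[1:].lower())
        # Tokens without a +/- prefix are ignored in this sketch.
    return positive, negative


def should_index(page_text: str, positive: set[str], negative: set[str]) -> bool:
    """Decide whether a downloaded page passes the content filter (assumed rules)."""
    # Assumption: case-insensitive whole-word matching.
    words = set(re.findall(r"\w+", page_text.lower()))
    # Any negative word excludes the page outright.
    if words & negative:
        return False
    # If positive words were given, at least one of them must appear.
    if positive and not (words & positive):
        return False
    return True


pos, neg = parse_filter("+dog +cat +pet -casino")
print(should_index("Caring for your new cat", pos, neg))               # True
print(should_index("Online casino bonuses for dog lovers", pos, neg))  # False
print(should_index("Stock market news", pos, neg))                     # False
```

    Letting a negative word veto a page even when positive words are present is a design choice in this sketch. A real engine might instead weigh the two, or treat the positive list as "all of" rather than "any of".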

    This can be useful for two reasons:
    1) It helps if you want to create a specialised 'vertical' search engine. For example, you could create a search engine about pets. In this case the word filter might look like: +dog +cat +bird +mouse +hamster +pet, etc.

    2) You might want to avoid indexing some types of content. For example, if you were building a religious search engine or a search engine for children, you might want to use negative filters such as: -adult -casino -sexual -intercourse -porn, etc.

    In V4 of the software you can already filter pages based on their URL. This works well and will still be available in V5. But to filter a page by its URL you need to know in advance both the page's URL and that the page has unwanted content, and this is not always possible when indexing 3rd party sites.

    However, content-based page filtering will be less efficient than URL-based filtering, because a page must be downloaded before it can be filtered. With URL-based filtering the page can be discarded before it is downloaded, which speeds up the indexing process. So URL filtering should still be used when possible. Nevertheless, Content Filtering will be a powerful new feature for people indexing 3rd party web sites.
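    The efficiency difference comes down to where each filter sits in the indexing loop. The Python sketch below is an illustration under stated assumptions, not Zoom's code: `crawl` and its callback parameters are hypothetical names, the example.com URLs are made up, and an in-memory dictionary stands in for downloading pages over HTTP. The URL filter runs before the fetch, so a blocked page costs nothing. The content filter can only run after the fetch, so a rejected page has already been downloaded.

```python
from typing import Callable, Iterable


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    url_blocked: Callable[[str], bool],
    content_ok: Callable[[str], bool],
) -> tuple[list[str], int]:
    """Return (indexed URLs, number of downloads). Hypothetical sketch."""
    indexed: list[str] = []
    downloads = 0
    for url in urls:
        # URL filter: cheap, runs before the page is downloaded.
        if url_blocked(url):
            continue
        text = fetch(url)
        downloads += 1
        # Content filter: needs the page body, so it can only run after the download.
        if content_ok(text):
            indexed.append(url)
    return indexed, downloads


# An in-memory "site" standing in for real HTTP downloads (made-up URLs and text).
pages = {
    "http://example.com/pets/dogs.html": "Training your dog",
    "http://example.com/casino/index.html": "Casino bonuses",
    "http://example.com/misc/ads.html": "Win big at our casino",
}

indexed, downloads = crawl(
    pages,
    fetch=pages.__getitem__,
    url_blocked=lambda u: "/casino/" in u,             # known-bad URL pattern
    content_ok=lambda t: "casino" not in t.lower(),    # negative word filter
)
print(indexed, downloads)
```

    Here the `/casino/` page is skipped without being fetched. The ads page is caught only by its content, so it still costs a download. This is why URL filtering remains the better choice whenever the unwanted URLs are known in advance.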

    ---
    David