Zoom Search Engine FAQ - Indexing message boards, forums, etc.

Q. How should I index my site if it features a message board, forum, or calendar and other similarly complex scripts?

If your site features a message board, a discussion forum (e.g. phpBB, UltimateBB, or vBulletin), a dynamically generated calendar, or other similarly complex scripts, please read the following.

Because of how these web scripts work, Zoom must be configured carefully if you wish to index them as part of your search engine. Spider mode can index these features, but without careful configuration it will also index the large number of content-irrelevant pages that these scripts generate. Indexing those pages reduces the effectiveness of your search results (by returning too many pages that a user would not find useful), significantly extends the time required to index your site, and wastes bandwidth and disk space.

This configuration is required because spider mode is designed to follow every legitimately different link on a web page. These scripts, however, often produce many useless pages that are simply user options (e.g. login procedures, sorting options, various display modes of the same page), and in some cases they can even create an infinite number of pages. For example, a calendar script may show the days of the month and allow you to browse "next" and "previous" months indefinitely. The spider could follow these links and index every day of every month that the web script allows (most have no limit, so you would end up indexing all the months from the year 0 up to the year 99999 and onwards).

In most cases, it makes sense to avoid indexing these sections of the site. Most forums feature their own search facility, so you can usually exclude the entire forum from the main site's search engine. You can do this easily by using the "Skip pages and folder list" in the Configuration window, under the "Skip Options" tab. For example, if your forum is hosted at "http://www.mysite.com/forum/", you can skip the entire forum with a skip page entry of "/forum/". Similarly, you can skip calendars and alternate views by specifying the folder, filename, or any recognizable parameter in the URL (e.g. "&month="). See the Users Guide for more information on skipping pages.
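
As a rough illustration of how such entries behave, the sketch below models a skip list as a plain substring test against the full URL. This is an assumption made for illustration only, not a description of Zoom's internal matching (see the Users Guide for the exact rules); the function name is_skipped and the sample URLs are hypothetical.

```python
def is_skipped(url, skip_list):
    """Return True if any skip entry occurs anywhere in the URL.

    Assumption: each entry is treated as a plain, case-sensitive substring.
    """
    return any(entry in url for entry in skip_list)


# Entries taken from the examples in the text above.
skip_list = ["/forum/", "&month="]

# Hypothetical URLs a spider might encounter.
urls = [
    "http://www.mysite.com/index.html",
    "http://www.mysite.com/forum/viewtopic.php?t=12",
    "http://www.mysite.com/calendar.php?view=grid&month=5",
]

for url in urls:
    label = "SKIP " if is_skipped(url, skip_list) else "INDEX"
    print(label, url)
```

Under this model, a single short entry such as "/forum/" removes every URL beneath that folder, which is why one entry is enough to exclude a whole forum.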

If you actually want to index your forum/script as part of your site's search engine, then you will need to carefully consider which links should be omitted and which need to be indexed.

Below are some example skip lists for some known forum packages. Note that the same concepts apply to other scripts such as Invision, Ikonboard, Gallery, Calendar, etc. If you wish to include them as part of your site's search engine, take the time to run some tests and determine all the different pages that you should skip so that you don't run the risk of over-indexing your website.

Indexing phpBB forums

As an example, the phpBB script contains many pages which you may want to exclude, such as the members list, profile pages, private message pages, login pages, etc. The following is a small example skip list of pages and parameters that you would most likely want to skip when indexing a phpBB website:

/forum/faq.php
/forum/search.php
/forum/profile.php
/forum/privmsg.php
/forum/groupcp.php
/forum/viewonline.php
/forum/memberlist.php
/forum/login.php
/forum/index.php?mark=forums
/forum/posting.php
/forum/modcp.php
/forum/viewtopic.php?p=
&view=previous
&view=next
&watch=topic
&mark=topics
&start=0&postdays=0

This should allow Zoom to index all the topic pages (as opposed to individual posts) and assumes you have the forums in a sub-directory named "forum". Note that this is not necessarily a comprehensive list, and you may wish to add/remove pages to suit your site. Run some test indexing sessions and analyse the log window to determine if there are other pages indexed which you do not need.
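
One way to review the results of a test indexing session is to count how many indexed URLs belong to each script, so that unexpectedly large groups stand out as skip list candidates. The sketch below assumes you have copied the indexed URLs out of the log into a list; the function name count_by_script and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_script(urls):
    """Count how many URLs share each path (the script name, without parameters)."""
    return Counter(urlsplit(url).path for url in urls)


# Hypothetical URLs copied from the log of a test indexing session.
indexed = [
    "http://www.mysite.com/forum/viewtopic.php?t=1",
    "http://www.mysite.com/forum/viewtopic.php?t=2",
    "http://www.mysite.com/forum/viewforum.php?f=3",
    "http://www.mysite.com/forum/posting.php?mode=reply&t=1",
    "http://www.mysite.com/forum/posting.php?mode=quote&p=7",
    "http://www.mysite.com/forum/posting.php?mode=newtopic&f=3",
]

# Print the busiest scripts first.
for path, count in count_by_script(indexed).most_common():
    print(f"{count:4d}  {path}")
```

In this made-up session, posting.php accounts for half of the indexed pages, which would suggest it is missing from the skip list.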

Indexing Ultimate Bulletin Board (UBB) forums

Here is an example skip page list for indexing UBB sites. Please see the above explanation for more information on why this is necessary and what this achieves.

?ubb=private_message
?ubb=edit_post
?ubb=send_topic
?ubb=report_a_post
?ubb=reply
?ubb=get_ip
/ubb/get_profile/
?ubb=get_daily
?ubb=next_topic
?ubb=delete_topic
?ubb=print_topic
?ubb=close_topic
?ubb=stick_topic
?/ubb/my_profile.html
?/ubb/directory.html
?/ubb/search.html
?/ubb/logoff.html
ubb=poll
ubb=transfer

Note that this is not necessarily a comprehensive list, and you may wish to add/remove pages to suit your site. Run some test indexing sessions and analyse the log window to determine if there are other pages indexed which you do not need.

Indexing vBulletin forums

Here is an example skip page list for spidering vBulletin forums. Please see the above explanation for more information on why this is necessary and what this achieves. This assumes your forum is installed in a directory called /forum/.

/forum/private.php
/forum/usercp.php
/forum/faq.php
/forum/memberlist.php
/forum/calendar.php
/forum/search.php
/forum/forumdisplay.php?do=markread
/forum/login.php
/forum/modcp/
/forum/member.php
/forum/showthread.php?goto=newpost
/forum/newthread.php
&daysprune=-1&order=
/forum/showthread.php?p=
/forum/showthread.php?mode=hybrid
/forum/showpost.php
/forum/editpost.php
/forum/newreply.php
/forum/online.php
/forum/profile.php
/forum/report.php
/forum/postings.php
/forum/misc.php
/forum/subscription.php
/forum/poll.php
/forum/sendmessage.php
/forum/printthread.php
&goto=nextnewest
&goto=nextoldest
/forum/infraction.php
/forum/archive/
/forum/viewtopic.php
/forum/showgroups.php
/forum/cron.php
/forum/admincp/

Note that this is not necessarily a comprehensive list, and you may wish to add/remove pages to suit your site and your version of vB. Run some test indexing sessions and analyse the log window to determine if there are other pages indexed which you do not need.

Indexing Mambo or Joomla! websites

Joomla! (which began as a fork of Mambo) is an open source Content Management System (CMS). As with many large, complicated CMSs, a number of issues can arise when spidering these websites, because they are rarely search engine friendly. In addition, since they are usually heavily configurable and customizable, it is difficult for us to give specific instructions: every install can vary greatly depending on the components you choose to use. However, it is possible to index Joomla! sites with Zoom, and we provide some general advice below based on our experience.

For the majority of Joomla!-based websites, a well-defined skip list may be all that's needed. Please see the above explanation for more information on why this is necessary and what this achieves. For example:

/task,calendar/
/task,register/
/task,lostPassword/
/task,userProfile/
/option,com_submissions/

The above will skip the calendar component, register/login pages, profiles and user submissions. On some sites, however, there may be pages with variable components on the side which change based on an "Itemid" parameter. This causes the existence of many distinct URLs which actually point to the same content page, but with a slightly different sidebar (e.g. a "Who's online" box, Events box, etc.). These pages usually look something like this:

http://mysite.com/component/option,com_news/Itemid,100/

Here, the number following "Itemid," changes the page sidebar or other components. In such cases, it may be possible to index only one of these pages by skipping the other Itemid values. For example, you could skip all variations such as "/Itemid,101", "/Itemid,102", etc. so that only "100" is indexed. Alternatively, if you have a link to the page without the Itemid attribute at all, you can skip all variations with a single skip list entry for "Itemid,".
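
To find out which Itemid values lead to the same content page, it can help to group the URLs from a test indexing session by what remains once the Itemid part is removed. The sketch below assumes URLs of the form shown above; the function name group_itemid_variants and the sample URLs are hypothetical.

```python
import re
from collections import defaultdict

# Matches the "/Itemid,<number>" part of a URL such as
# http://mysite.com/component/option,com_news/Itemid,100/
ITEMID = re.compile(r"/Itemid,(\d+)")


def group_itemid_variants(urls):
    """Map each URL with its Itemid part removed to the Itemid values seen for it."""
    groups = defaultdict(list)
    for url in urls:
        match = ITEMID.search(url)
        if match:
            groups[ITEMID.sub("", url)].append(match.group(1))
    return dict(groups)


# Hypothetical URLs copied from the log of a test indexing session.
urls = [
    "http://mysite.com/component/option,com_news/Itemid,100/",
    "http://mysite.com/component/option,com_news/Itemid,101/",
    "http://mysite.com/component/option,com_news/Itemid,102/",
    "http://mysite.com/component/option,com_contact/Itemid,101/",
]

for page, values in group_itemid_variants(urls).items():
    print(page, "->", ", ".join(values))
```

Any page listed with more than one value is a candidate for skipping the extra values. Check first that a value you plan to skip is not the only one under which some other page is reachable (as with the hypothetical com_contact page above). If skip entries are matched as substrings, an entry such as "/Itemid,10" would also match "/Itemid,100", so including the trailing slash may be safer.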

This becomes tricky, ironically enough, if you have some "search engine friendly" URL settings enabled in Joomla!. Some of these settings rewrite URLs so that they look like this:

http://mysite.com/content/view/5/100/

Such URLs make the CMS parameters appear as if they are merely subfolders in the path of the URL. Some people believe this makes the URLs more "attractive" to search engines (in that it forces the search engine to index more pages from the site). It can in fact have a negative effect, because it becomes impossible to recognise the parameters should we want to skip or ignore certain pages intentionally. This means you may end up indexing multiple pages of similar content, which can also count against you with something like Google if it decides that too many pages of your website look the same and believes your site is spamming.

In such cases, it may still be possible to filter out pages using the "Content filtering" option in Zoom (on the "Content filter" tab of the Configuration window). Here you can specify keywords, and any page containing one of them will be filtered out. You can also specify HTML in this list. For example, a content filter list like the following would skip all pages containing the "Who's Online" information box or the "noindex" meta robots tag:

-Who's online
-<meta name="robots" content="noindex">
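
The effect of such a list can be pictured with the following sketch, which treats each entry beginning with "-" as text that excludes any page containing it. The exact matching rules (case sensitivity, whitespace handling) are assumptions here and should be checked against the Users Guide; the function name is_filtered_out and the sample pages are hypothetical.

```python
def is_filtered_out(page_html, filter_list):
    """Return True if the page contains the text of any '-' prefixed entry.

    Assumption: entries are matched as plain, case-sensitive substrings
    of the raw page source.
    """
    for entry in filter_list:
        if entry.startswith("-") and entry[1:] in page_html:
            return True
    return False


# The filter list from the text above.
filter_list = [
    "-Who's online",
    '-<meta name="robots" content="noindex">',
]

# Hypothetical page sources.
plain_page = "<html><body><h1>News</h1><p>Article text.</p></body></html>"
sidebar_page = "<html><body><h1>News</h1><div>Who's online</div></body></html>"

print(is_filtered_out(plain_page, filter_list))
print(is_filtered_out(sidebar_page, filter_list))
```

Because the filter looks at page content rather than the URL, it still works when rewritten URLs hide the parameters that a skip list would otherwise match.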

For more information on using Content Filtering and the other indexing options of Zoom, please refer to the Users Guide.
