How can a page be scanned but not indexed?
ivanlam
11-25-2008, 01:58 AM
The page index_main.html contains many URLs linking to other informational web pages. I would like the latter to be scanned and indexed, while I don't want the former to be scanned but not indexed.
I tried putting index_main.html in the "Page and folder skip list", but it seems that the informational pages linked from index_main.html were then not scanned or indexed either. Could anyone tell me how to achieve what I described in the first paragraph?
I'm using "Zoom Search Engine Version 5.1 (Build: 1017) Free Edition" for evaluation.
Regards.
Ivan
ivanlam
11-25-2008, 02:38 AM
Sorry, the sentence in my first paragraph should read: "I would like the latter to be scanned and indexed, while I want the former to be scanned but not indexed."
Specify that page as an Additional Start Point. Click on the "More" button to do so. Here, you can change the Spider Option for that start point (click "Edit") to "Follow links only".
An alternative is to use a robots meta tag set to "noindex" (make sure robots.txt support is enabled in Zoom).
Another alternative is to wrap the content of the page that you want to exclude from indexing in ZOOMSTOP and ZOOMRESTART tags, while still letting the spider follow the links.
Look these features up in the Users Guide (http://www.wrensoft.com/zoom/usersguide.html) for more information.
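To make the ZOOMSTOP/ZOOMRESTART option concrete, here is a toy Python model of what a spider does with such a page. It is not Zoom's code. It assumes the markers are written as HTML comments (<!--ZOOMSTOP--> ... <!--ZOOMRESTART-->), that a ZOOMSTOP without a matching ZOOMRESTART runs to the end of the page, and that links inside the block are still followed, as described above. The names scan_page, collect and PageCollector are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: Zoom's markers are HTML comments, and a ZOOMSTOP without a
# matching ZOOMRESTART excludes everything up to the end of the page.
ZOOMSTOP_BLOCK = re.compile(
    r"<!--\s*ZOOMSTOP\s*-->.*?(?:<!--\s*ZOOMRESTART\s*-->|\Z)",
    re.DOTALL | re.IGNORECASE,
)


class PageCollector(HTMLParser):
    """Collects <a href> targets and visible text fragments from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def collect(html):
    parser = PageCollector()
    parser.feed(html)
    parser.close()
    return parser


def scan_page(html):
    """Toy spider step: links come from the whole page (so they are still
    followed), but indexed words come only from outside ZOOMSTOP blocks."""
    links = collect(html).links
    words = " ".join(collect(ZOOMSTOP_BLOCK.sub(" ", html)).text).split()
    return links, words
```

For a page whose link list sits inside a ZOOMSTOP block, scan_page returns both links but none of the link text ("Help", "FAQ") among the indexed words.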
ivanlam
11-25-2008, 08:03 AM
Hi, Raymond,
Thanks for the reply. In fact, there are many pages that I would like to be scanned but not indexed, and they all share one property: their filenames contain "index", e.g. "index_category.html", "help_index.html" or "abc_index_xyz.html".
Is there a way to specify filename patterns such as "index_*.html", "*_index.html" or "*_index_*.html" so that the matching files are only scanned, not indexed?
Regards.
Ivan
wrensoft
11-25-2008, 08:47 AM
You can also add this tag to each of the pages in question.
<meta name="robots" content="noindex">
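If there are many such pages, the tag does not have to be added by hand. As a rough sketch, assuming the pages are static .html files on disk that you can edit before running the indexer, the Python script below adds the tag to every file whose name matches the patterns Ivan listed. This is a preprocessing step outside Zoom, not a Zoom feature; the function names and the pattern list are hypothetical.

```python
import fnmatch
import re
from pathlib import Path

# Hypothetical pattern list, taken from the filenames in the question above.
NOINDEX_PATTERNS = ["index_*.html", "*_index.html", "*_index_*.html"]
ROBOTS_TAG = '<meta name="robots" content="noindex">'
# Matches <head> or <head ...> but not <header>.
HEAD_OPEN = re.compile(r"<head(?:\s[^>]*)?>", re.IGNORECASE)


def wants_noindex(filename):
    """True if the (lower-cased) file name matches one of the patterns."""
    name = filename.lower()
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in NOINDEX_PATTERNS)


def add_noindex_tag(html):
    """Insert ROBOTS_TAG right after the opening <head> tag.

    Crude check: pages that already contain name="robots" are left alone."""
    if 'name="robots"' in html.lower():
        return html
    match = HEAD_OPEN.search(html)
    if match is None:
        return ROBOTS_TAG + "\n" + html  # no <head>: prepend as a fallback
    return html[:match.end()] + "\n" + ROBOTS_TAG + html[match.end():]


def tag_site(root):
    """Rewrite every matching .html file under root; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        if not wants_noindex(path.name):
            continue
        original = path.read_text(encoding="utf-8")
        updated = add_noindex_tag(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling tag_site("path/to/site") once would tag pages such as index_main.html and help_index.html while leaving a plain index.html untouched. The files are rewritten in place, so keep a backup.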
ivanlam
11-26-2008, 01:10 AM
Will the PDF files linked from those pages be scanned and indexed? Thanks.
"noindex" means to not index any content on the page, but continue to follow any links found on the page. "nofollow" specifies not to follow or look for any links on the page.
So if your PDF files are linked from this HTML page containing a "noindex" tag, they will be found and indexed.
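The difference between the two directives can be sketched as the decision a spider makes after reading a page. This is a simplified Python model, not Zoom's implementation: it only looks at <meta name="robots"> tags and ignores robots.txt. RobotsMetaParser and robots_policy are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = {key: (value or "") for key, value in attrs}
        if values.get("name", "").lower() == "robots":
            for token in values.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def robots_policy(html):
    """Return (index_page, follow_links) for a page, based on its robots meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    index_page = "noindex" not in parser.directives
    follow_links = "nofollow" not in parser.directives
    return index_page, follow_links
```

A page carrying only "noindex", such as the index pages in this thread, gives (False, True): its own words stay out of the index, but the PDFs it links to are still fetched.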
ivanlam
11-27-2008, 05:35 AM
So, can the PDF content itself (not just the title or filename) be scanned? For example, if the word "Help" appears inside "customerInfo.pdf", will that file show up as a search result when the search keyword is "Help"? Thanks.
Please check the FAQ:
Q. Does Zoom (with plugins) index all the words inside the PDF and DOC documents? (http://www.wrensoft.com/zoom/support/faq_plugins.html#pluginsindexing)
You need one of the registered editions (Standard, Pro or Enterprise) to index PDF files.