
Version 3.1 Only Grabbing Page Titles



barkeep
09-06-2005, 04:40 AM
What would cause the indexer to skip all the content in all of my pages and only do the page titles?

Are there certain tags or known content parsing issues?

barkeep
09-06-2005, 04:35 PM
Here is the web site I am indexing. This is a prototype site that we plan to release soon.

http://www.rossgroupinc.com:8001/

wrensoft
09-06-2005, 11:43 PM
I had a look at your site.

You seem to be using a search script called readweb.asp which is not our script. I couldn't find our search.asp script on your site.

We don't know anything about your readweb.asp script and so can't really comment on its behaviour.

I tried indexing a few pages on your site and the content is being indexed. So I don't think there is a problem with the pages that prevents the content from being indexed. Did you turn off content indexing from the Zoom config window?

------
David

barkeep
09-07-2005, 07:56 PM
I do not see that as an option.

After you run the indexer against our site, look at the output files it creates. It misses words such as "Redundant", which appears on the following page: http://www.rossgroupinc.com:8001/readweb.asp?wid=2947

I can see that it indexed that page, but it appears to have indexed only the words in the page title.

wrensoft
09-07-2005, 09:09 PM
Using V4.2 of Zoom, I indexed the page you suggested, then had a look in the dictionary file (zoom_dictionary.zdat) to check which words were indexed.


pro-active 820
order 830
HA 840
clients 850
Redundant 860
connections 870
Internet 880
backbone 890

The word 'Redundant' appears in the file. So I don't think there is any problem with V4.2 of Zoom.

I repeated the test with V3.1 build 1033 of Zoom. The contents of the dictionary file were:


pro-active 840
order 850
HA 860
clients 870
Redundant 880
connections 890
Internet 900
backbone 910

So even with the old version of Zoom there doesn't seem to be a problem on the indexing side.
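If you want to repeat this check on your own output files, here is a minimal sketch. It assumes the dictionary file is plain text with one word followed by a number on each line, as in the extracts above; the real .zdat layout may differ between versions. The names parse_dictionary_text and is_indexed are made up for illustration and are not part of Zoom.

```python
from pathlib import Path


def parse_dictionary_text(text):
    """Collect the first field of every 'word number' line, lowercased."""
    words = set()
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: a word followed by a numeric value, as in the
        # extracts above. Lines that do not fit this shape are skipped.
        if len(fields) >= 2 and fields[-1].isdigit():
            words.add(fields[0].lower())
    return words


def is_indexed(dictionary_path, word, encoding="latin-1"):
    """Return True if `word` appears in the dictionary file (case-insensitive)."""
    # latin-1 is an assumption; it never fails to decode, which is
    # convenient for a quick check on a file of unknown encoding.
    text = Path(dictionary_path).read_text(encoding=encoding, errors="replace")
    return word.lower() in parse_dictionary_text(text)
```

Calling is_indexed("zoom_dictionary.zdat", "Redundant") from the folder holding the index files would then tell you whether the word made it into the dictionary, under the assumptions above.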


Quote: "I do not see that as an option."

Yes, sorry. The option to turn off content indexing is only available in V4 of the software, not V3.

-----
David

barkeep
09-08-2005, 06:30 PM
We are using build 1001 so I am assuming the problem was fixed in the build you are using.

If I upgrade to that build, will the asp file that we modified still work with the .dat files output by the new build? Or are the .dat files in a different format?

wrensoft
09-08-2005, 09:10 PM
I don't know what you have done in your custom script, so I can't promise your script will work.

Even if you were just using our default script from V3.1 build 1001 with index files generated by V3.1 build 1033, I think there might be some problems. In general, you can't take index files from version X and assume they will work with script version Y. The script version and indexer version should match; we don't test any other combination.

Between builds 1001 and 1033 there were a lot of changes. See
http://www.wrensoft.com/zoom/whatsnew.html
and
http://www.wrensoft.com/zoom/oldversionhistory.html

---------
David

barkeep
09-13-2005, 03:26 PM
I upgraded to the current version and the indexer is now properly indexing all parts of the page.

Thank you