A couple of feature requests for an excellent search engine.


  • A couple of feature requests for an excellent search engine.

    I recently purchased the professional edition and I am very impressed with the feature set. You guys have thought of almost everything. I do have two minor suggestions.

    First, it would be nice if you added the @ symbol to the indexing word rules. That way email addresses could be considered one word. It isn't too difficult to put a space in place of the @, but it would be nice not to have to explain this to less savvy users. (We use Zoom as a search engine for our tech support system, and searching for a user's email address is very helpful.)
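    In the meantime, a one-line rewrite of the query works around it. A minimal sketch, assuming a Python wrapper sitting in front of the search script (our real front end may differ):

        # Normalize an email-address query so it matches the current
        # word rules, which split on the @ symbol.
        def normalize_query(query):
            # "jsmith@example.com" -> "jsmith example.com"
            return query.replace("@", " ")

        print(normalize_query("jsmith@example.com"))  # jsmith example.com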

    Second, I am using the command line spider mode. It's really nice to be able to update an index. I currently have about 120,000 pages in a database. I created a script which creates a URL list of what's been updated over the last day. This works great. However, what I really need is a way to tell the spider to do a complete reindex instead of just an update. Maybe I am missing something, but what I would like to do is import a full list of 120,000 URLs in the form
    http://www.mydomain.com/issue.php?issueid=116291, INDEX_ONLY
    I can apparently do this if I import a list manually into a cfg file or update an existing index. However, this list changes, so what I would like to do is use the command line mode once a month to reindex everything (keeping the index fast) by automatically importing a text file of the current full list of URLs. Then I would use the spider update mode for the rest of the month, updating just what's changed, and rebuild the entire index again at the beginning of each month. I have about 300-400 pages changing daily.
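    For reference, here is roughly what the daily script does, sketched in Python. The database file, table, and column names are placeholders from our setup; only the "url, INDEX_ONLY" line format comes from Zoom:

        # Write a URL list of the issues updated in the last day, one
        # "url, INDEX_ONLY" line per issue, for the spider's update mode.
        import sqlite3

        BASE = "http://www.mydomain.com/issue.php?issueid="

        conn = sqlite3.connect("support.db")  # placeholder database
        rows = conn.execute(
            "SELECT issueid FROM issues "
            "WHERE updated_at >= datetime('now', '-1 day')"
        )
        with open("updated_urls.txt", "w") as f:
            for (issueid,) in rows:
                f.write("%s%d, INDEX_ONLY\n" % (BASE, issueid))
        conn.close()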

    So I guess my question/feature request is: is there a good way to automatically reindex a large list of URLs, loading the URLs from a log file? I could create a page which contains all of the links, but I would prefer to spider only certain links, so the INDEX_ONLY feature is really nice for this.
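    To make the monthly rebuild concrete, the scheduling logic I have in mind looks something like this. The dump_urls() helper is hypothetical, and I have left out the actual indexer invocation since I don't know the exact command line options:

        # Decide between a monthly full rebuild and a daily update.
        import datetime

        def dump_urls(path, since_days=None):
            """Hypothetical helper: write 'url, INDEX_ONLY' lines for the
            full set, or only for pages changed in the last N days."""
            ...

        today = datetime.date.today()
        if today.day == 1:
            # Start of month: full list (~120,000 URLs) for a complete rebuild.
            dump_urls("all_urls.txt")
        else:
            # Rest of the month: just the 300-400 pages that change daily,
            # fed to the spider's update mode.
            dump_urls("updated_urls.txt", since_days=1)
        # ...then launch the Zoom indexer in the matching mode (exact
        # command line per the Zoom documentation).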

    Thanks for an incredible product. I am very impressed with this search engine.

  • #2
    The process we envisaged that people would follow would be to do a full index of their site periodically, then do minor incremental updates in between the full indexing sessions.

    The full indexing session (which is the default operation) would download all pages and completely replace the existing index.

    For a full index you would not normally need a list of URLs to index. You would normally allow the spider to discover the pages on your site by following the links it finds (starting at your home page, for example).

    Importing a list of 120,000 start points (which are all part of the same domain) is less efficient than allowing the spider to find the 120,000 URLs. This is because Zoom processes a single start point at a time, in a serial fashion.

    Is there a reason why you think spider mode wouldn't work? And if that is the case, how did you build the initial index?
