[ELCOLOMBIANO] - Question in the Incremental indexing.


  • [ELCOLOMBIANO] - Question in the Incremental indexing.

    We have Wrensoft Zoom Search Enterprise (order number: WS73NF3137, date: 22/Feb/2013).

    Because of our needs, we have to set up several start points, each indexing part of the site. Keep in mind that our site, http://www.elcolombiano.com, does not have a home page that links to the full site; instead, each article lives at its own URL. That is why we create a set of files that list the links to index, and we need incremental indexing because the site is large and these files change regularly.

    I have put samples at http://www.elcolombiano.com/bancomedios/z1/t1.html http://www.elcolombiano.com/bancomedios/z1/t2.html http://www.elcolombiano.com/bancomedios/z1/t3.html and http://www.elcolombiano.com/bancomedios/z1/t4.html

    For your review, I am attaching the text of the configuration file we use for these steps at the bottom of this message.

    These are the steps I performed; let me know if I am doing something wrong.
    NOTE: We use HTML and ASP extensions only, and we are setting up spider mode.
    1. Open the Zoom Search Engine.
    2. Under the start options, press MORE and enter the following in the dialog:
    Spider URL: http://www.elcolombiano.com/bancomedios/z1/t1.html (viewed in source mode, this file contains only a list of links in <A> HTML tags)
    Base URL: http://www.elcolombiano.com/
    Spidering Options: check "Follow all links on this page only"
    3. Save the configuration.
    4. Run the indexer.
    5. When it is done, do not upload anywhere.
    6. From this point on we are having trouble.
    7. Select Index > Incremental Indexing > Add start points (or domains) to existing index (which I think is the option we need for this setup).
    8. Add the following start point with these settings:
    Spider URL: http://www.elcolombiano.com/bancomedios/z1/t2.html
    Base URL: http://www.elcolombiano.com
    Spidering Options: check "Follow all links on this page only"
    9. Click Proceed and voilà, here is the problem: no indexing occurs even though the file is correct.
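
To sanity-check that a start-point file such as t2.html really exposes links for the spider, the page can be inspected independently of Zoom. The following is a minimal Python sketch, not Zoom's actual spidering logic: the helper names (`LinkCollector`, `links_under_base`), the sample HTML, and the prefix test against the Base URL are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <A> and <a> both arrive as "a".
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.page_url, value))


def links_under_base(html, page_url, base_url):
    """Return the links on the page whose absolute URL starts with base_url."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return [url for url in collector.links if url.startswith(base_url)]


# Hypothetical page content standing in for a start-point file.
sample = """<html><body>
<A href="/noticia1.asp">uno</A>
<a href="http://www.elcolombiano.com/noticia2.html">dos</a>
<a href="http://example.org/otro.html">externo</a>
</body></html>"""

result = links_under_base(
    sample,
    "http://www.elcolombiano.com/bancomedios/z1/t2.html",
    "http://www.elcolombiano.com/",
)
for url in result:
    print(url)
```

If a check like this lists the expected article URLs, the start-point file itself is probably fine and the cause lies elsewhere.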


    Note: When we used both start points without the incremental option, the indexing occurred correctly.

    The question is: am I doing something wrong?
    Can you help me fix it?

    We still have some doubts about incremental indexing. Let's say we use one file as a start point, but this file changes during the day to include more pages to index. Can I run the indexer several times over that one file incrementally? How can I do this?

    Thank you. We hope we were clear; let us know if you have any questions.

    ///////////////////////zoom.cfg content////////////////////////////
    __6_0
    #STARTDIR:
    #SPIDERURL:http://www.elcolombiano.com/bancomedios/zoom/t1.html
    #BASEURL:http://www.elcolombiano.com/
    #OUTDIR:\SitiosWeb\Sitio\buscador
    #SPIDERURLTYPE:5
    #SPIDERURLUSELIMIT:0
    #SPIDERURLLIMIT:0
    #SPIDERURLBOOST:0
    #USE-CRC:1
    #CURRENTMODE:1
    #DLTHREADS:10
    #NOCACHE:1
    #BEEP-ON-FINISH:0
    #THROTTLEDELAY:200
    #OUTPUT:ASP
    #OUTPUT_OS:0
    #ISDOTNET:0
    #VERBOSE:0
    #LOGMODE:1
    #LOGOPTIONS:INDEXED|SKIPPED|FILTERED|INIT|DOWNLOAD|UPLOAD|FILEIO|PLUGIN|INFO|ERROR|WARNING|QUEUE|SUMMARY|THREAD|BROKEN|
    #LOGWRITETOFILE:0
    #LOGWRITETOFILENAME:C:\Documents and Settings\All Users\Application Data\Wrensoft\Zoom Search Engine Indexer\temp\indexlog.txt
    #LOGAPPENDDATETIME:1
    #LOGDEBUGMODE:0
    #LOGHTMLERRORS:1
    #SCAN_NOEXTENSION:0
    #SCAN_FILELINKS:0
    #SCAN_USELOCALDESCPATH:0
    #SCAN_LOCALDESCPATH:
    #SCAN_ROBOTSTXT:1
    #SCAN_CHECKTHUMBS:0
    #PARSEJSLINKS:1
    #REWRITELINKS:0
    #REWRITEFIND:
    #REWRITEWITH:
    #INDEXOPTIONS:METADESC|CONTENT|TITLE|
    #RESULTOPTIONS:TITLE|METADESC|CONTEXT|DATE|
    #USE-UTF8:0
    #CODEPAGE:28591
    #USESTEMMING:0
    #STEMALGO:2
    #DIGRAPHS:0
    #ZLANGFILE:Spanish.zlang
    #SKIPUNDERSCORE:1
    #MINWORDLEN:2
    #FORMFORMAT:2
    #HIGHLIGHTING:1
    #GOTOHIGHLIGHT:0
    #USEXML:0
    #XMLTITLE:
    #XMLDESC:
    #XMLURL:
    #XMLXSLTURL:
    #XML_OPENSEARCH_DESCURL:
    #XMLHIGHLIGHT:0
    #LOGGING:0
    #LOGGING_FILE:./logs/searchwords.log
    #TIMING:0
    #NOCHARSET:0
    #DEFAULT_TO_AND:0
    #CONTEXTSIZE:30
    #EXACTPHRASE:0
    #SEARCHASSUBSTRING:0
    #STRIPDIACRITICS:0
    #NO_TOLOWER:0
    #ZOOMINFO:0
    #USEDATETIME:1
    #WORDJOINCHARS:.-_'
    #ZOOMIMAGE:0
    #SPELLING:1
    #SPELLINGWHENLESSTHAN:5
    #WIZARD_UPLOADREQD:0
    #REPORTUSEDATES:0
    #WORDWEIGHT_TITLE:3
    #WORDWEIGHT_DESC:0
    #WORDWEIGHT_KEYWORDS:1
    #WORDWEIGHT_FILENAME:0
    #WORDWEIGHT_HEADINGS:2
    #WORDWEIGHT_LINKTEXT:0
    #WORDWEIGHT_CONTENT:-1
    #WORDWEIGHT_DENSITY:1
    #WORDWEIGHT_SHORTURLS:1
    #WORDWEIGHT_PROXIMITY:1
    #USE-AUTH:0
    #USE-COOKIES:1
    #USE-COOKIELOGIN:0
    #BINUSEDESC:0
    #PLUGIN_DESCFILES:
    #PLUGIN_USEMETA:PDF|DOC|PPT|RTF|SWF|WPD|XLS|DJVU|IMAGE|MP3|DWF|OFFICE|
    #PLUGIN_USETECHNICAL:MP3|IMAGE|DWF|
    #PLUGIN_TEXTONLY:
    #PLUGIN_PDF_METHOD:0
    #PLUGIN_PDF_HIGHLIGHT:1
    #PLUGIN_IMG_MINFILESIZE:5
    #PLUGIN_ZIP_EXTRACT:1
    #MAXPAGES_LIMIT:65500
    #MAXWORDS_LIMIT:500000
    #MAXFILESIZE_LIMIT:4194304
    #DESCLENGTH_LIMIT:300
    #OPTIMIZE_SETTING:6
    #EXTENSIONS_START
    .html|FILETYPE:0
    .asp|FILETYPE:0
    #EXTENSIONS_END
    #SKIPPAGES_START
    ventanas
    #SKIPPAGES_END
    #SKIPWORDS_START
    #SKIPWORDS_END
    #USECATS:0
    #USEDEFCATNAME:0
    #SEARCHMULTICATS:0
    #DISPLAYCATSUMMARY:1
    #RECOMMENDED_MAX:3
    #USEFILTER:0
    #FILTER_START
    #FILTER_END
    #SITEMAP_TXT:0
    #SITEMAP_XML:0
    #SITEMAP_UPLOAD:0
    #SITEMAP_UPLOADPATH:
    #SITEMAP_USEPAGEBOOST:1
    #SITEMAP_USEBASEURL:1
    #SITEMAP_BASEURL:
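
When comparing a configuration like the one above against the steps described in a post, it can help to pull out the start-point settings programmatically. This is a rough Python sketch that assumes only what is visible in the pasted text: `#KEY:value` lines plus `#NAME_START` / `#NAME_END` blocks. The function name `parse_zoom_cfg` is hypothetical, and this is not an official description of the zoom.cfg format.

```python
def parse_zoom_cfg(text):
    """Split zoom.cfg-style text into (settings, blocks).

    settings: dict of KEY -> value for lines shaped like "#KEY:value".
    blocks:   dict of NAME -> list of lines found between
              "#NAME_START" and "#NAME_END".
    Lines matching neither shape (e.g. the leading "__6_0") are ignored.
    """
    settings = {}
    blocks = {}
    current_block = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#") and line.endswith("_START"):
            current_block = line[1:-len("_START")]
            blocks[current_block] = []
        elif line.startswith("#") and line.endswith("_END"):
            current_block = None
        elif current_block is not None:
            blocks[current_block].append(line)
        elif line.startswith("#") and ":" in line:
            # Split on the first colon only, so URL values stay intact.
            key, _, value = line[1:].partition(":")
            settings[key] = value
    return settings, blocks


# A few lines copied from the configuration above.
sample_cfg = """__6_0
#STARTDIR:
#SPIDERURL:http://www.elcolombiano.com/bancomedios/zoom/t1.html
#BASEURL:http://www.elcolombiano.com/
#SPIDERURLTYPE:5
#EXTENSIONS_START
.html|FILETYPE:0
.asp|FILETYPE:0
#EXTENSIONS_END
#SKIPPAGES_START
ventanas
#SKIPPAGES_END
"""

settings, blocks = parse_zoom_cfg(sample_cfg)
print(settings["SPIDERURL"])
print(settings["BASEURL"])
print(blocks["EXTENSIONS"])
```

Reading the values this way makes small differences easy to spot, for example that the SPIDERURL in the pasted file points to /bancomedios/zoom/t1.html while the steps in the post use /bancomedios/z1/t1.html.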

  • #2
    [Note that I have also responded to your e-mails]


    It turns out to be a rather obscure scenario that reproduces the problem.


    The problem only occurs IF you use the "Incremental Add Start Points" feature immediately after a full indexing with a start point that has the "Follow All Links On This Page Only" spider option.


    If you close Zoom after the first full index, then open Zoom again, before invoking the "Incremental add start points" feature, then it will not be a problem. As you can imagine, typically the incremental add option is only used in a different session (and not immediately after the full indexing).


    We will fix this behaviour in the next release.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Solved, yes, this is a problem in V6.0.1028.
