website based on java portal ... doesn't work?

  • website based on java portal ... doesn't work?

    Hi team,

    First of all, I wanted to say thanks for the great software. I found it today and I am planning to use it both privately and for business.

    My company wants to use Zoom Search Engine and is actually thinking about a Pro edition. But for some reason I can't index our website.

    We work with a Java portal (Liferay) which contains mostly JSP files. I added .jsp to the Zoom Search configuration, but the result is the same: no errors, but also nothing found.

    So when I start indexing, no files are indexed and I get no search results back. What can I do to make this work?

    Our website is located under ...
    www.bnwtravel.com

    Cheers Marcel

  • #2
    The first problem is that the bnwtravel site uses frames and is somewhat broken as a result. So you should fix the site first, preferably by removing frames as they are not search engine friendly (or even human friendly for that matter).

    The next serious problem is your home page. Here is the entire code for your home page, index.html.
    Code:
    <html><head>
    <title></title>
    <meta content="0; url=/c" http-equiv="refresh">
    </head>
    <body onLoad="javascript:location.replace('/c')">
    </body></html>
    This is not a useful home page for a human or for a search engine: it contains no content and immediately redirects to /c. So you'll need to fix this as well, preferably by having a real home page.
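
To check other pages for the same pattern, a small script can look for a meta refresh tag. This is only a rough sketch using Python's standard library, not anything Zoom provides; the names `RefreshFinder` and `find_meta_refresh` are made up for illustration, and the sample input is the index.html quoted above.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Collects the target of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=/c": a delay, then the target.
        _, _, rest = attributes.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.targets.append(value.strip())


def find_meta_refresh(html):
    """Return the list of redirect targets declared via meta refresh."""
    finder = RefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.targets


HOME_PAGE = """<html><head>
<title></title>
<meta content="0; url=/c" http-equiv="refresh">
</head>
<body onLoad="javascript:location.replace('/c')">
</body></html>"""

if __name__ == "__main__":
    print(find_meta_refresh(HOME_PAGE))
```

A page that returns a non-empty list here and has no body text of its own is a redirect stub rather than a real home page.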

    Then have a read of these FAQ questions.

    Using Zoom with Frames:
    http://www.wrensoft.com/zoom/support...to.html#frames

    Links not being followed and pages not being indexed.
    http://www.wrensoft.com/zoom/support...s.html#skipped
    http://www.wrensoft.com/zoom/support...avascriptmenus
    http://www.wrensoft.com/zoom/support...spider_finding

    Indexing JSP pages is not a problem. So I think the problem is the issues above.



    • #3
      File from a JAVA portal not being indexed

      I have a similar problem with files stored in a portal which uses JAVA to display them in the browser (the iGrafx ProcessCentral (R) Repository).
      The Zoom indexer is set to spider mode and in the log I can see that it starts indexing. There is no error message, but the content is not indexed properly.
      Then I launch the search but my keywords are not found. When I launch the search with "**", there is one hit in the index and in the description text it says something like
      "...A Sun® Java™ Runtime is required. 32-bit browsers require version 1.3 or later. 64-bit..."
      Yet, the JRE is installed on my PC (on which the indexer runs as well), because when I launch the page in my browser, it displays the page properly using JAVA.
      Unfortunately, I cannot publish the link here, because this is an Intranet page.

      Here comes the log:
      10:09:32 - Start indexing (spider mode) at Thu Apr 08 10:09:32 2010
      10:09:32 - Maximum number of words: 100000
      10:09:32 - Maximum number of files: 25000
      10:09:32 - Will scan files with extensions
      10:09:32 - .htm
      10:09:32 - .html
      10:09:32 - .doc
      10:09:32 - .pdf
      10:09:32 - .xls
      10:09:32 - .ppt
      10:09:32 - .rtf
      10:09:32 - .dot
      10:09:32 - .txt
      10:09:32 - .php
      10:09:32 - Spider from: http://dnde_igrafx/webcentral/BMS_approved/?objid=1047
      10:09:32 - Web site URL: http://dnde_igrafx/webcentral/BMS_approved/
      10:09:32 - Estimated RAM required during index process: 295340 KB
      10:09:32 - Initiating HTTP session (thread #1) ...
      10:09:32 - DL Thread #1, got URL (http://dnde_igrafx/webcentral/BMS_approved/?objid=1047) off queue
      10:09:32 - [DOWNLOAD] Downloading file http://dnde_igrafx/webcentral/BMS_approved/?objid=1047
      10:09:32 - [DOWNLOAD] URL redirected to: http://dnde_igrafx/webcentral/BMS_approved/EU%20BMS/DE/Processes/ZZZ%2DTEST_PROCESSES/Linking_to_documents.igx/Schraubenzieher?objid=1519 [thread #1]
      10:09:32 - [QUEUED] Queued URL: http://dnde_igrafx/webcentral/BMS_approved/EU%20BMS/DE/Processes/ZZZ%2DTEST_PROCESSES/Linking_to_documents.igx/Schraubenzieher?objid=1519
      10:09:32 - Initiating HTTP session (thread #3) ...
      10:09:32 - Initiating HTTP session (thread #6) ...
      10:09:32 - Initiating HTTP session (thread #2) ...
      10:09:32 - Initiating HTTP session (thread #4) ...
      10:09:32 - Initiating HTTP session (thread #5) ...
      10:09:32 - DL Thread #4, got URL (http://dnde_igrafx/webcentral/BMS_approved/EU%20BMS/DE/Processes/ZZZ%2DTEST_PROCESSES/Linking_to_documents.igx/Schraubenzieher?objid=1519) off queue
      10:09:32 - [DOWNLOAD] Downloading file http://dnde_igrafx/webcentral/BMS_approved/EU%20BMS/DE/Processes/ZZZ%2DTEST_PROCESSES/Linking_to_documents.igx/Schraubenzieher?objid=1519
      10:09:33 - Index Thread got ready buffer for http://dnde_igrafx/webcentral/BMS_approved/EU%20BMS/DE/Processes/ZZZ%2DTEST_PROCESSES/Linking_to_documents.igx/Schraubenzieher?objid=1519 (Content-type: HTML text)
      10:09:33 - [INDEXED] Indexing http://dnde_igrafx/webcentral/BMS_approved/EU%20BMS/DE/Processes/ZZZ%2DTEST_PROCESSES/Linking_to_documents.igx/Schraubenzieher?objid=1519
      10:09:33 - [FILEIO] All index files will be written to: X:\search_V6\webcentral
      10:09:33 - [FILEIO] Writing index data for PHP search... (Please wait)
      10:09:33 - [FILEIO] Created pagedata data file (zoom_pagedata.zdat)
      10:09:33 - [FILEIO] Created pagetext data file (zoom_pagetext.zdat)
      10:09:33 - [FILEIO] Created pageinfo data file (zoom_pageinfo.zdat)
      10:09:33 - [FILEIO] Created spelling data file (zoom_spelling.zdat)
      10:09:33 - [FILEIO] Created dictionary data file (zoom_dictionary.zdat)
      10:09:33 - [FILEIO] Created wordmap data file (zoom_wordmap.zdat)
      10:09:33 - [FILEIO] Created script settings file (settings.php)
      10:09:33 - Indexing completed at Thu Apr 08 10:09:33 2010
      10:09:33 - INDEX SUMMARY
      10:09:33 - Files indexed: 1
      10:09:33 - Files skipped: 0
      10:09:33 - Files filtered: 0
      10:09:33 - Files downloaded: 1
      10:09:33 - Unique words found: 256
      10:09:33 - Variant words found: 25
      10:09:33 - Total words found: 266
      10:09:33 - Avg. unique words per page: 256.00
      10:09:33 - Avg. words per page: 266
      10:09:33 - Start index time: 10:09:32 (2010/04/0
      10:09:33 - Elapsed index time: 00:00:01
      10:09:33 - Peak physical memory used: 140 MB
      10:09:33 - Peak virtual memory used: 246 MB
      10:09:33 - Errors: 0
      10:09:33 - URLs visited by spider: 2
      10:09:33 - URLs in spider queue: 0
      10:09:33 - Total bytes scanned/downloaded: 8376
      10:09:33 - File extensions:
      10:09:33 - .htm indexed: 0
      10:09:33 - .html indexed: 0
      10:09:33 - .doc indexed: 0
      10:09:33 - .pdf indexed: 0
      10:09:33 - .xls indexed: 0
      10:09:33 - .ppt indexed: 0
      10:09:33 - .rtf indexed: 0
      10:09:33 - .dot indexed: 0
      10:09:33 - .txt indexed: 0
      10:09:33 - .php indexed: 0
      10:09:33 - No extensions indexed: 1
      10:09:33 - Cleaning up memory used for index data... please wait.
      10:09:34 - Finished cleaning up memory.
      10:09:34 - [FILEIO] Copied search script to: X:\search_V6\webcentral\search.php
      Last edited by forestgreen; Apr-08-2010, 08:12 AM. Reason: added index log



      • #4
        Originally posted by forestgreen
        Then I launch the search but my keywords are not found. When I launch the search with "**", there is one hit in the index and in the description text it says something like

        "...A Sun® Java™ Runtime is required. 32-bit browsers require version 1.3 or later. 64-bit..."

        Yet, the JRE is installed on my PC (on which the indexer runs as well), because when I launch the page in my browser, it displays the page properly using JAVA.
        This is NOT a message from Zoom.

        The description text is given because that is what Zoom indexed. When Zoom requested the page to be indexed, it was just given this error message by your web server, as a web page. There is nothing to indicate this is not the actual web page content, so Zoom indexes it accordingly.
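
One way to see what a spider is given is to request the page without a browser and look at the plain text in the response. The following is a rough sketch using only Python's standard library; it is not how Zoom works internally. The `SAMPLE_PAGE` markup is a hypothetical stand-in for a page that embeds a Java applet with a fallback message (the real intranet page is not available here), and the names `TextExtractor`, `visible_text` and `fetch` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Gathers the text nodes of a page, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def visible_text(html):
    """Return the text a non-browser client would find in the HTML."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.parts)


def fetch(url):
    """Download a page the way a simple HTTP client does (no Java, no JavaScript)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical markup: an applet whose only text is the fallback message.
SAMPLE_PAGE = """<html><body>
<applet code="Viewer.class" width="800" height="600">
A Java Runtime is required.
</applet>
</body></html>"""

if __name__ == "__main__":
    print(visible_text(SAMPLE_PAGE))
```

Running `visible_text(fetch(url))` against your own Spider URL should show roughly the text that is available for indexing. If all you get back is the Java Runtime message, that is all any spider can index.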

        What this most likely means is that the Spider URL you specified goes to a page which contains nothing but a Java application and the above message. Without seeing your site, we can only presume that your whole website content is stored within this Java application.

        There is no way that a spider or any search engine we know of, would index content within a Java application. Please note that a Java application (which is essentially a program that runs within the Java environment on your computer) is very different from a JavaScript website, or a JSP website (which uses Java that executes on the server-side).

        You can't index a Java application because there is no standard method of interaction. It might require the user to click on some buttons or move the mouse before a panel of text appears. There is no way a "spider" can run through every permutation of user input to retrieve the text to be indexed.

        So to put it simply, an application like that is really not indexable and cannot be searched using an external search engine. Google will also not be able to find anything within it. If there is an alternative interface that does not require Java, then you might have a chance.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine



        • #5
          What configuration is required for indexing a Struts 2 Java portal?

          Team,

          Can you please help me configure the Zoom Search Engine to index a Struts 2 Java portal? Almost all URLs will look like "http://localhost/site/my.action?param=1".

          I tried adding .jsp and .action to the configuration, but no luck.

          Please guide me to a solution.

          Thanks,
          Prathap Puppala



          • #6
            Such URLs should not be any problem but we've not had any experience with Struts before. You would most likely need to add ".action" as a file extension to index (under "Configure"->"Scan options").
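
To illustrate why ".action" is the extension to add for such URLs: the extension belongs to the path part of the URL, and the query string ("?param=1") is not part of it. The sketch below uses Python's standard library and assumes the indexer matches on the path extension in this way; that is an assumption for illustration, not a description of Zoom's internals. The helper name `url_extension` is made up.

```python
import posixpath
from urllib.parse import urlparse


def url_extension(url):
    """Return the lower-cased file extension of the URL path, ignoring the query string."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower()


if __name__ == "__main__":
    print(url_extension("http://localhost/site/my.action?param=1"))
    print(repr(url_extension("http://localhost/site/")))
```

A URL that ends in a directory, such as the second one, has no extension at all, so it would not match any entry in an extension list.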

            If you can show us the website in question, or elaborate on the problem you are seeing, we may be able to help further. "No chance" doesn't tell us much; an error message, or what the "Log" tab in the Indexer reports, is more useful.
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine

