
Seeking developer comment


  • Seeking developer comment

    Hello folks,

    I wonder if you can comment on the use of Zoom Search to provide OpenSearch results. I have been really excited about this aspect of your program, but would like a response from you guys to a post by a prominent biologist on his blog:

    http://ispecies.blogspot.com/2006/12...ic-spider.html

  • #2
    2 ideas...

    Was just thinking about that biologist's suggestion and offer two of my own:

    1. expose Zoom Search's RSS/XML output to PHP instead of just CGI
    2. develop "plug-ins" of sorts, so that tags added by the page developer, something like:

    <html>
    <zoom_search type: .... />
    <zoom_search subject: ... />
    etc.
    </html>

    could be lifted by the spider to create RSS 1.0 (RDF), Dublin Core or other standardized elements.

    Dave
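A minimal sketch of the second idea, for illustration only. The `<zoom_search ...>` tags in the post are placeholders, so this sketch swaps in the existing convention of embedding Dublin Core in HTML as `<meta name="DC.xxx" content="...">` tags. The class name `DublinCoreMetaParser` and the sample page are hypothetical, and nothing here reflects how the Zoom spider actually works.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> tags from a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Keep only tags that follow the "DC." naming convention.
        if name.lower().startswith("dc.") and content is not None:
            self.fields[name[3:].lower()] = content


# Hypothetical page; "type" and "subject" mirror the fields named in the post.
PAGE = """<html><head>
<meta name="DC.type" content="Text">
<meta name="DC.subject" content="spiders">
<meta name="keywords" content="ignored">
</head><body>...</body></html>"""

parser = DublinCoreMetaParser()
parser.feed(PAGE)
dc_fields = parser.fields
print(dc_fields)
```

A spider that gathered fields this way could then write them out as Dublin Core elements in whatever feed format is chosen.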



    • #3
      The thrust of the blogger's thread was that RSS 1.0 with RDF was a better choice of protocol than RSS 2.0, and that his wrapper of Yahoo Image Search provided more information and provided it in a more structured way.

      I posted a comment on the blogger's thread. Here is a copy.

      -=-=-=-=-=-=-=-

      A few points regarding your comments about the Zoom Search Engine XML output.

      1) The Zoom output format was selected to be compatible with the main existing Opensearch aggregator, A9.
      From my reading it seems that A9.com supports both RSS 2.0 and Atom 1.0. (Not RSS1.0 / RDF as you suggest as a preferred format)

      2) In the example you gave, there really doesn't appear to be a great deal of additional information in your RSS 1.0 example. There is a title, subject and link. The additional thumbnail and source information only applies to image searches, but you didn't search for an image in Zoom. Plus, Zoom returned additional information: terms matched, score and page context. So the claim of very little information being returned is incorrect. For the purposes of providing search results, Zoom produces much more useful information.

      3) It isn't only a protocol format issue. Zoom can only return the information stored in its search index. For example, Zoom doesn't store a 'subject', but does store a 'description'.

      4) I don't see any reason why RSS 2.0 would be harder to 'consume' than RSS 1.0. In fact I would suggest the simpler RSS 2.0 format is easier to deal with.

      5) The XML output was not designed to be directly read by humans. Zoom has HTML output for that. It was designed to be easy to parse (not scrape) by scripting languages and aggregators.
      -=-=-=-=-=-=-=-
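To illustrate the "easy to parse" claim in point 5, here is a minimal sketch of reading an RSS 2.0 result feed with Python's standard library. The sample feed is hypothetical and uses only the standard RSS 2.0 `title`, `link` and `description` elements; it is not a copy of Zoom's actual output, and the function name `parse_results` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 search-result feed (not Zoom's real output).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Search results for: spider</title>
    <link>http://example.com/search</link>
    <description>Example results</description>
    <item>
      <title>Orb-weaver spiders</title>
      <link>http://example.com/orb.html</link>
      <description>...context around the matched term...</description>
    </item>
    <item>
      <title>Jumping spiders</title>
      <link>http://example.com/jumping.html</link>
      <description>...another snippet...</description>
    </item>
  </channel>
</rss>"""


def parse_results(feed_xml):
    """Return one dict per <item>, with empty strings for missing fields."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        results.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return results


results = parse_results(SAMPLE_FEED)
for r in results:
    print(r["title"], "->", r["link"])
```

No scraping of HTML is involved: each result is reached by element name, which is the distinction between parsing and scraping drawn in point 5.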



      • #4
        Thanks for addressing the blogger's discussion. He posted another message to praise your work and to clarify exactly what he's after with RSS/XML outputs. I copy it here for easier access, but it is available in its original form here: http://www.blogger.com/comment.g?blo...07514354753306

        Let me respond by saying that it's not my intention to criticise Zoom. It's a cool product. My point was that -- if the ultimate goal is data integration -- it's not enough to provide an easy means to search, or even to return results in a standard format. By way of background, my goal in this area is to advocate using tools from the Semantic Web community. For background see my Semant blog.

        Now, point by point:

        1) Yes, A9 doesn't support RSS 1.0, which in my opinion is an unfortunate choice on their part.

        2) I disagree. The extra information you refer to relates to the search (e.g., score) which, while useful, isn't what I'm after. What my small example gave was a description of an image using an established vocabulary (FOAF) so that others with images described in the same way can aggregate that information. If I wanted, I could extend it further by extracting metadata from the image itself. Why is this useful? Well again it avoids having a human do this (see Copyright on images for more on this).

        4) Depends what the goal is. My goal is to aggregate information from diverse sources and query it. To do this I need several things, such as consistent identifiers (Globally Unique Identifiers or GUIDs), consistent vocabularies (for example, for basic metadata something like Dublin Core, for people FOAF, for publications PRISM, etc.), and tools for storing and querying this information (such as triple stores and languages such as SPARQL). In other words, the Semantic Web.

        One way to think about this is to ask "who is going to make use of your feed?" If the answer is "people can view it in a feed reader, or add it as a source to A9 and look at the results" then RSS 2.0/Atom is fine. But if you want computers to consume the feed and be able to merge it with other feeds and make inferences, then I suggest we need RSS 1.0 and RDF (at the very least, this makes things a lot easier).

        5) Yes, XML is not designed to be directly read by humans, but that doesn't necessarily make it easier for computers to handle it. For example, if two different XML sources use different vocabularies to describe an image, how are we supposed to merge those two documents?

        I hope this clarifies why I made the comments I did. Zoom provides a nice tool for searching a web site and providing results in a standard, accessible form. It's just that I want more than that, and by adding just a little more (i.e., RDF), the potential payoff becomes so much greater. Now, the Semantic Web may well be outside the area that you guys want to get into, and given the chasm between the hype and the current reality, that may be sensible. But, I think there's great potential there. Imagine a product that makes it easy for users to aggregate search results from different sites. Not just displaying them like A9, but integrating them. Some people seem to think there's money in this (see Tales of a Semantic Web Consultancy).
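The merging question raised in point 5 can be made concrete with a small sketch. Both XML sources, their element names, and the mapping tables below are invented for illustration; the only real identifiers are the Dublin Core namespace and its `title`, `identifier` and `creator` terms. The sketch uses plain dictionaries rather than an RDF library, so it shows the vocabulary problem, not a Semantic Web toolchain.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Two hypothetical sources describing images with different element names.
SOURCE_A = (
    "<images><image><name>Orb-weaver</name>"
    "<url>http://example.com/a.jpg</url><by>A. Smith</by></image></images>"
)
SOURCE_B = (
    "<photos><photo><caption>Jumping spider</caption>"
    "<href>http://example.org/b.jpg</href>"
    "<photographer>B. Jones</photographer></photo></photos>"
)

# Hand-written mappings from each source's vocabulary to Dublin Core terms.
MAP_A = {"record": "image",
         "fields": {"name": "title", "url": "identifier", "by": "creator"}}
MAP_B = {"record": "photo",
         "fields": {"caption": "title", "href": "identifier",
                    "photographer": "creator"}}


def normalise(xml_text, mapping):
    """Re-key each record of one source under full Dublin Core term URIs."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(mapping["record"]):
        out = {}
        for local_name, dc_term in mapping["fields"].items():
            value = rec.findtext(local_name)
            if value is not None:
                out[DC + dc_term] = value
        records.append(out)
    return records


# Once both sources share one vocabulary, merging is list concatenation.
merged = normalise(SOURCE_A, MAP_A) + normalise(SOURCE_B, MAP_B)
for record in merged:
    print(record[DC + "title"], "by", record[DC + "creator"])
```

Each new source needs its own hand-written mapping here. Publishing in a shared vocabulary from the start would remove that step for the consumer, at the cost of pushing the mapping work onto each publisher.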



        • #5
          In my opinion the Semantic Web is just an academic pipe dream, and not a practical proposition at this time.

          The majority of information on the web is unstructured: HTML pages, PDF files and Word documents, most of the time without any useful metadata.

          At the present time software is just not sophisticated enough to take arbitrary text and interpret the meaning. Current software would be lucky to even correctly extract the author's name from a Word document, let alone get any deeper meaning and structure.

          So in the absence of software that can extract meaning from text, creating highly structured output is only possible with structured input, which we don't have.

          I still remember being told 20 years ago that (EDI / X12) would replace all other methods of exchanging data. It has since died a slow death except in very narrow market segments. I have no reason to think that RDF will do any better.

          We build what people ask for. Lots of people ask for XML. Quite a few ask for OpenSearch compatibility. Until now no one has been asking for RSS 1.0.

          "Imagine a product that makes it easy for users to aggregate search results from different sites."
          Yes, we did. We built and released the product a couple of weeks ago. You can download a demo of Zoom MasterNode here.

