Wordpress. Including only posts


  • Wordpress. Including only posts

    Hi,

    When I use Zoom Pro, it indexes the WordPress blog, but it also tries to index categories and archives. Is there a way to tell it to -only- index the posts (i.e. URIs like /blog/?p=1234) and skip /blog/?c=Songs or /blog/?m=2011010?

    You get the idea, right?

    TIA,

    ---JC

  • #2
    Perhaps you can add something like these to your skip page list ("Configure"->"Skip options"):

    ?c=Songs
    &c=Songs
    ?m=
    &m=

    Note that parameters in the URL (known as HTTP GET parameters) appear as either ?name=value or &name=value, depending on whether they are the first parameter or not.
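
To illustrate why both the ?name= and &name= forms are listed, here is a minimal Python sketch. It assumes a skip-list entry behaves like a plain substring match against the URL (an assumption for illustration, not a statement about Zoom's internals); the function names and example URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def has_query_param(url: str, name: str) -> bool:
    """Return True if `name` is a query parameter anywhere in `url`."""
    query = urlsplit(url).query
    # keep_blank_values=True so that "?m=" with an empty value still counts.
    return name in parse_qs(query, keep_blank_values=True)


def matches_skip_list(url: str, skip_words: list[str]) -> bool:
    """Assumed skip-list behaviour: skip the URL if any entry is a substring of it."""
    return any(word in url for word in skip_words)


# "m" is the first parameter here, so it is written as "?m=".
first = "/blog/?m=201101"
# "m" is the second parameter here, so it is written as "&m=".
second = "/blog/?paged=2&m=201101"

print(matches_skip_list(first, ["?m="]))           # matched by "?m="
print(matches_skip_list(second, ["?m="]))          # missed: only "&m=" occurs
print(matches_skip_list(second, ["?m=", "&m="]))   # matched once "&m=" is listed
```

Listing only "?m=" would miss the second URL, which is why each parameter gets two entries.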

    Does that help you skip the pages you want?

    There's no specific way of saying "index only pages with URL in this format". You can however specify a list of individual pages to index (by clicking "More" and adding start points one at a time with "Index single page only" as the spider option). But I think it'd be easier to work out a way to skip the unnecessary pages.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Hi again,

      I tried

      ?cat=
      ?m=

      ...the problem is that if you do that, the content is not properly spidered. IOW: Zoom needs to have those ?cat= statements in order to fetch every post. (Which makes sense. It needs -some- 'sitemap' which contains a list of every post.)

      So I get all the posts in the index (good), but each post shows up multiple times in the results: once for the actual post and then once for each category it belongs to.

      I tried using that option to remove duplicates (CRC check?) but that didn't help.

      Any ideas? Is there a way to give ZI a 'sitemap' to pull from and say 'just index these specific URLs'? What I want is

      Search for every post. Which is...

      http://jchmusic.com/wordpress/?=....

      TIA,

      ---JC



      • #4
        Looking at your blog, I'm not seeing why it would need the ?cat= URLs to find every post. It seems to me that it should only need to follow the "Next page" link (e.g. "?paged=2").

        But this will probably still mean that you'll get the page listing your posts chronologically in addition to the actual individual post page, similar to your category and actual post problem.

        One way to avoid this is to add a meta robots tag which specifies "noindex" on those "listing" pages. Zoom will still follow the links on them. Or you can wrap them with the HTML comment tags <!--ZOOMSTOPFOLLOW--> and <!--ZOOMRESTARTFOLLOW-->, which would have a similar effect.
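
As a rough sketch of the meta robots idea, the Python below decides from a URL whether a page is a "listing" page and, if so, returns the tag to emit in its head. The parameter names treated as listing markers (cat, m, paged) are taken from this thread and are an assumption about this particular blog; the function name is hypothetical, and in practice the tag would be emitted by the WordPress theme or a plugin rather than by a standalone script.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed markers of "listing" pages on this blog: category, month archive, paging.
LISTING_PARAMS = {"cat", "m", "paged"}

# Standard robots meta tag: keep the page out of the index, still follow its links.
NOINDEX_FOLLOW = '<meta name="robots" content="noindex,follow">'


def robots_meta_for(url: str) -> str:
    """Return the noindex tag for listing pages, or "" for everything else."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if "p" in params:
        # A single post (?p=1234) should stay indexable.
        return ""
    if LISTING_PARAMS.intersection(params):
        return NOINDEX_FOLLOW
    return ""


for url in ("http://example.com/blog/?p=1234",
            "http://example.com/blog/?cat=7",
            "http://example.com/blog/?paged=2"):
    print(url, "->", robots_meta_for(url) or "(no tag)")
```

The blog's front page carries no query parameters, so this sketch leaves it untagged; it would need its own rule if it should be excluded as well.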

        Ideally a "listing" of all the posts would be best, yes. I think there are WordPress plugins for that.

        It doesn't really work to just "search" for every post; it would have to guess every number in a range. And if not, it's still finding every link from something like the categories page, which means the above problem would still persist.
        --Ray



        • #5
          Thanks. I created a Google sitemap of the blog posts, thinking I could use -that- as the 'Starting Point' for ZI, but it didn't work.

          http://www.jchmusic.com/wordpress/sitemap.xml

          Why not? Shouldn't that give ZI a valid place to spider from? Or do I need a 'real' HTML page?

          ---JC



          • #6
            Zoom does not currently get links out of an XML sitemap (although it does provide the option to generate one). We're adding that into V7.

            But yes, in the current version you need an HTML page sitemap.
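
One possible workaround, sketched in Python under stated assumptions: convert the existing XML sitemap into a plain HTML page of links that could then be used as a start point. The sketch assumes the sitemap follows the standard sitemaps.org schema (urlset/url/loc); the function name and page title are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace used by standard sitemaps.org XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_xml_to_html(xml_text: str, title: str = "Post index") -> str:
    """Build a minimal HTML page with one link per <loc> entry in the sitemap."""
    root = ET.fromstring(xml_text)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # Escape each URL so characters such as "&" remain valid inside the HTML.
    items = "\n".join(
        f'<li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>' for u in urls
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/blog/?p=1</loc></url>
  <url><loc>http://example.com/blog/?p=2</loc></url>
</urlset>"""

print(sitemap_xml_to_html(sample))
```

The generated page contains only links to individual posts, so spidering from it should not pull in the category or archive listings, provided the sitemap itself lists only posts.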
            --Ray
