Searching for hyphenated words


  • Searching for hyphenated words

    I'm trying to understand a search issue I'm having with Zoom.

    Our pdfs include paper numbers like "IPC1996-1800".

    I'm doing a fielded search using .desc files to populate a "paper number" field. That search appears to work fine; i.e., a search for "IPC1996" returns all pdfs with paper numbers starting with "IPC1996", while a search for "IPC1996-1800" returns just that pdf. I know I *cannot* search for "IPC1996*" here because we're using the JavaScript platform and it's a simple string search/match.

    If I search in full text, a search for "IPC1996-1800" returns that paper. If I search for "IPC1996*", I get all pdfs with paper numbers starting with "IPC1996". However, if I search for just "IPC1996", like I do in the paper number field, I get 0 hits.

    I've extracted the text from all of the pdfs, and the term "IPC1996-1800" does exist in the extracted text as well as in the .desc files.

    In my cfg indexing word rules, I have hyphens checked under "Allow the following characters to join words".

    Does that FORCE "IPC1996-1800" to be treated as a complete word so that you cannot search for individual parts without the wild card?

  • #2
    Originally posted by freezeb:
    I know I *cannot* search for "IPC1996*" here because we're using the JavaScript platform and it's a simple string search/match.
    The JavaScript platform actually does support wildcards. Have you tried this and found that it didn't work?

    It's exact phrases that JS doesn't support.

    Unless you're referring to the Custom Meta Field search here, in which case none of the platforms do wildcard matching.

    The match options available for a Custom Meta Field (of data type "Text") include "Exact match" and "Partial (substring) text match". So if you want the behaviour above, you can select the latter when configuring your custom meta fields.
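To make the difference between the two match options concrete, here is a minimal Python sketch. It is not Zoom's code: the function name `field_matches`, the mode names, and the case-insensitive comparison are all assumptions made for illustration.

```python
def field_matches(field_value: str, query: str, mode: str = "exact") -> bool:
    """Illustrative comparison of a meta field value against a query.

    Assumption: comparison is case-insensitive. This only models the idea
    of "exact" versus "partial (substring)" matching, not Zoom itself.
    """
    value = field_value.casefold()
    needle = query.casefold()
    if mode == "exact":
        # The whole field value must equal the query.
        return value == needle
    if mode == "partial":
        # The query only needs to appear somewhere inside the field value.
        return needle in value
    raise ValueError(f"unknown match mode: {mode!r}")


paper_number = "IPC1996-1800"
print(field_matches(paper_number, "IPC1996", mode="exact"))
print(field_matches(paper_number, "IPC1996", mode="partial"))
```

Under these assumptions, "IPC1996" fails the exact comparison against "IPC1996-1800" but passes the substring one, which is the behaviour described for the paper number field.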

    Originally posted by freezeb:
    In my cfg indexing word rules, I have hyphens checked under "Allow the following characters to join words".

    Does that FORCE "IPC1996-1800" to be treated as a complete word so that you cannot search for individual parts without the wild card?
    Yes. That's the purpose of word joining.

    If you disable the word join, then it will be considered two separate words, "IPC1996" and "1800", and searching for either will return results. In this case, searching for "IPC1996-1800" will also match, BUT it may be matching other occurrences of "1800" and "IPC1996". Selecting "match all search words" instead of "any search words" will help narrow it.
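The effect of word joining described above can be sketched in Python. This is a toy model, not Zoom's indexer: `tokenize`, `word_search`, the letters-and-digits word definition, and the lowercasing are assumptions chosen to mirror the behaviour in this thread.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text: str, join_chars: str = "") -> list[str]:
    """Split text into lowercase words.

    Characters in join_chars are treated as part of a word, so with
    join_chars="-" the string "IPC1996-1800" stays a single word.
    """
    pattern = "[A-Za-z0-9" + re.escape(join_chars) + "]+"
    words = (w.strip(join_chars).lower() for w in re.findall(pattern, text))
    return [w for w in words if w]  # drop tokens made only of join characters


def word_search(words: list[str], term: str) -> bool:
    """Whole-word match, or wildcard match if the term contains * or ?."""
    term = term.lower()
    if "*" in term or "?" in term:
        return any(fnmatchcase(w, term) for w in words)
    return term in words


text = "Paper IPC1996-1800 was presented in Calgary"
joined = tokenize(text, join_chars="-")  # hyphen joins words
split = tokenize(text)                   # hyphen separates words

print(word_search(joined, "IPC1996"))       # whole word differs when joined
print(word_search(joined, "IPC1996*"))      # wildcard covers the suffix
print(word_search(joined, "IPC1996-1800"))  # the full joined word
print(word_search(split, "IPC1996"))        # a separate word when not joined
```

In this model the joined index holds only the single word "ipc1996-1800", so a plain search for "IPC1996" finds nothing while "IPC1996*" does, matching the full text results in the first post.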

    Given that you are already using .desc files to add meta fields, you can add a tag like:

    <meta name="ZOOMWORDS" content="IPC1996" />

    And this will be indexed as part of the searchable full text content of the page.
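If there are many papers, the tag could be generated rather than typed by hand. The sketch below only builds the tag text in the form shown above; the helper name `zoomwords_tag` and the rule of taking everything before the first hyphen are assumptions, and writing the result into each .desc file is left out.

```python
from html import escape


def zoomwords_tag(paper_number: str) -> str:
    """Build a ZOOMWORDS meta tag holding the prefix of a paper number.

    Assumption: the searchable prefix is the part before the first hyphen,
    e.g. "IPC1996" for "IPC1996-1800".
    """
    prefix = paper_number.split("-", 1)[0]
    # Escape the value so it is safe inside a double-quoted attribute.
    return f'<meta name="ZOOMWORDS" content="{escape(prefix, quote=True)}" />'


for number in ["IPC1996-1800", "IPC1998-2045"]:
    print(zoomwords_tag(number))
```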

    Hope that helps.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      That explains it. And, yes, we are using custom meta fields. Thanks for the info and the tip on how to make it work.
