Help

Quick Help - Problems finding something? Read this first...

1) Check your spelling. If you are not sure of the spelling, you can use the Fuzzy Logic and/or Phonetic search options.

2) You can expand your search by switching on the Stemming and / or the Synonyms options.

3) All noise words are excluded from the search, e.g. and, is, or, the.

4) Some sites cannot be indexed because of the way they are designed. They may require registration or a login, use a Flash animation as an entry point, or rely on other scripting techniques that prevent a site from being indexed.

5) Not all websites allow themselves to be indexed. The Scannery obeys the instructions in each website's robots.txt. For example, IBM (www.ibm.com) and Procter & Gamble (www.pg.com), amongst others, do not allow their websites to be indexed.

6) Not all companies have websites, or we have not found them yet ...!
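
Point 5 above refers to robots.txt. As a rough illustration of how a crawler can honor those instructions, the sketch below uses Python's standard urllib.robotparser module. The robots.txt contents, the crawler name "ExampleBot" and the example.com URLs are made up for this example; they are not The Scannery's actual crawler settings.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    "ExampleBot" is a hypothetical crawler name used only for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A site that asks all crawlers to stay away entirely.
block_all = "User-agent: *\nDisallow: /\n"

# A site that only hides one directory.
block_private = "User-agent: *\nDisallow: /private/\n"

print(is_allowed(block_all, "http://www.example.com/products.html"))
print(is_allowed(block_private, "http://www.example.com/index.html"))
print(is_allowed(block_private, "http://www.example.com/private/a.html"))
```

A site configured like the first example would never appear in the index, however relevant its content.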



Search Requests Overview

The Scannery supports two types of search requests:

An "any words" or "natural language" search is any sequence of text, like a sentence or a question. In an "any words" search you can optionally put + in front of any word or phrase that is required and - in front of a word or phrase to exclude it. Examples:

banana pear "apple pie"

"apple pie" -salad +"ice cream"

A boolean search request consists of a group of words or phrases linked by connectors, such as AND and OR, that indicate the relationship between them. Examples:

apple and pear

Both words must be present

apple or pear

Either word can be present

apple w/5 pear

Apple must occur within 5 words of pear

apple not w/5 pear

Apple must not occur within 5 words of pear

apple and not pear

Apple must be present and pear must not be present

name contains smith

The field name must contain smith
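
The w/5 connector in the examples above can be pictured as a comparison of word positions. The following is a minimal sketch of that idea, not The Scannery's implementation: the function name `within` is hypothetical, and it assumes every word (including noise words) counts as one position, which the help text does not specify.

```python
def within(words, first, second, distance):
    """Return True if `first` occurs within `distance` words of `second`."""
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(
        abs(i - j) <= distance
        for i in first_positions
        for j in second_positions
    )


text = "an apple and a ripe yellow pear were on the table"
words = text.lower().split()

# apple is word 1 and pear is word 6, so they are 5 positions apart.
print(within(words, "apple", "pear", 5))      # like: apple w/5 pear
print(not within(words, "apple", "pear", 5))  # like: apple not w/5 pear
```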

If you use more than one connector, you should use parentheses to indicate precisely what you want to search for. For example, apple and pear or orange could mean (apple and pear) or orange, or it could mean apple and (pear or orange).
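
To see why the parentheses matter, the sketch below evaluates both groupings against one small document using ordinary Python boolean logic. It only illustrates the ambiguity; the helper names are hypothetical and this is not how The Scannery parses requests.

```python
def grouping_a(doc_words):
    # (apple and pear) or orange
    return ("apple" in doc_words and "pear" in doc_words) or "orange" in doc_words


def grouping_b(doc_words):
    # apple and (pear or orange)
    return "apple" in doc_words and ("pear" in doc_words or "orange" in doc_words)


# A document that mentions orange but neither apple nor pear.
doc = {"fresh", "orange", "juice"}

print(grouping_a(doc))  # True: the "or orange" branch is satisfied
print(grouping_b(doc))  # False: apple is required and missing
```

The same document is retrieved by one reading and rejected by the other, so writing the parentheses explicitly removes the guesswork.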

Noise words, such as if and the, are ignored in searches.

Search terms may include the following special characters:

?

Matches any single character. Example: appl? matches apply or apple.

*

Matches any number of characters. Example: appl* matches application

~

Stemming. Example: apply~ matches apply, applies, applied.

%

Fuzzy search. Example: ba%nana matches banana, bananna.

#

Phonic search. Example: #smith matches smith, smythe.

&

Synonym searching. Example: fast& matches fast, quick, speed.

:

Variable term weighting. Example: apple:4 w/5 pear:1



Words and Phrases

Use quotation marks to search for the exact phrase.  You can use a phrase anywhere in a search request. Example:

apple "fruit salad"

If a phrase contains a noise word, The Scannery will skip over the noise word when searching for it. For example, a search for statue of liberty would retrieve any document containing the word statue, any intervening word, and the word liberty.

Punctuation inside of a search word is treated as a space. Thus, can't would be treated as a phrase consisting of two words: can and t. 1843(c)(8)(ii) would become 1843 c 8 ii (four words).
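
The punctuation rule above can be sketched in a few lines of Python. This is only an approximation: it assumes that anything other than an ASCII letter or digit counts as punctuation, which the help text does not state, and the function name is hypothetical.

```python
import re


def split_search_word(word):
    """Treat every non-alphanumeric character as a space, then split into words."""
    return re.sub(r"[^A-Za-z0-9]", " ", word).split()


print(split_search_word("can't"))           # ['can', 't']
print(split_search_word("1843(c)(8)(ii)"))  # ['1843', 'c', '8', 'ii']
```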



Wildcards (* and ?)

A search word can contain the wildcard characters * and ?. A ? in a word matches any single character, and a * matches any number of characters. The wildcard characters can be in any position in a word. For example:

appl* would match apple, application, etc.
*cipl* would match principle, participle, etc.
appl? would match apply and apple but not apples.
ap*ed would match applied, approved, etc.

Use of the * wildcard character near the beginning of a word will slow searches somewhat.
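
One way to picture the wildcard rules is to translate a search word into a regular expression. The sketch below does that in Python. It assumes that * may also match zero characters and that matching ignores case; neither detail is stated above, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a search word containing * and ? into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any number of characters (assumed: including none)
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern, word):
    """Return True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


for pattern, word in [
    ("appl*", "application"),
    ("*cipl*", "participle"),
    ("appl?", "apple"),
    ("appl?", "apples"),
    ("ap*ed", "approved"),
]:
    print(pattern, word, matches(pattern, word))
```

A pattern that starts with * gives the search no fixed prefix to narrow down the candidate words, which is consistent with the note that a leading wildcard slows searches.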



Natural Language Searching

A natural language search request is any combination of words, phrases, or sentences. After a natural language search The Scannery sorts retrieved documents by their relevance (sort by SCORE) to your search request. Weighting of retrieved documents takes into account: the number of documents each word in your search request appears in (the more documents a word appears in, the less useful it is in distinguishing relevant from irrelevant documents); the number of times each word in the request appears in the documents; and the density of hits in each document. Noise words and search connectors like NOT and OR are ignored.



Synonym Searching

Synonym searching finds synonyms of a word in a search request. For example, a search for fast would also find quick. You can enable synonym searching for all words in a request or you can enable synonym searching selectively by adding the & character after certain words in a request. Example: fast& w/5 search.



Fuzzy Searching

Fuzzy searching will find a word even if it is misspelled. For example, a fuzzy search for apple will find appple. Fuzzy searching can be useful when you are searching text that may contain typographical errors, or for text that has been scanned using optical character recognition (OCR). There are two ways to add fuzziness to searches:

  1. Check the "Fuzzy searching" box to enable fuzziness for all of the words in your search request.
  2. You can also add fuzziness selectively using the % character. The number of % characters you add determines the number of differences The Scannery will ignore when searching for a word. The position of the % characters determines how many letters at the start of the word have to match exactly. Examples:
    • ba%nana Word must begin with ba and have at most one difference between it and banana.
    • b%%anana Word must begin with b and have at most two differences between it and banana.
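
The % rules above can be sketched as follows. This sketch assumes that a "difference" means one inserted, deleted or replaced letter (Levenshtein edit distance); the help text does not say how The Scannery actually counts differences, and the function names are hypothetical.

```python
def parse_fuzzy(term):
    """Split a term such as 'ba%nana' into (word, exact_prefix_length, max_differences)."""
    if "%" not in term:
        return term, len(term), 0
    return term.replace("%", ""), term.index("%"), term.count("%")


def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed measure of 'differences')."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete a letter
                           cur[j - 1] + 1,              # insert a letter
                           prev[j - 1] + (ca != cb)))   # replace a letter
        prev = cur
    return prev[-1]


def fuzzy_match(term, candidate):
    """Return True if `candidate` satisfies the fuzzy search term."""
    word, prefix_len, max_diff = parse_fuzzy(term)
    if candidate[:prefix_len] != word[:prefix_len]:
        return False  # the letters before the first % must match exactly
    return edit_distance(word, candidate) <= max_diff


print(fuzzy_match("ba%nana", "bananna"))   # one extra letter
print(fuzzy_match("ba%nana", "bonana"))    # does not begin with "ba"
print(fuzzy_match("b%%anana", "bonanna"))  # two differences, begins with "b"
```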


Phonic Searching

Phonic searching looks for a word that sounds like the word you are searching for and begins with the same letter. For example, a phonic search for Smith will also find Smithe and Smythe.

To ask The Scannery to search for a word phonically, put a # in front of the word in your search request. Examples: #smith, #johnson

You can also check the Phonic searching box in the search form to enable phonic searching for all words in your search request. Phonic searching is somewhat slower than other types of searching and tends to make searches over-inclusive, so it is usually better to use the # symbol to do phonic searches selectively.
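
The help text does not say which sound-alike algorithm The Scannery uses. To give a feel for how phonic matching can work in general, the sketch below uses the classic Soundex code, chosen here purely as an illustration. Soundex keeps the first letter and encodes the consonants that follow, so words that sound alike and begin with the same letter receive the same code.

```python
# Soundex digit for each consonant; vowels, h, w and y have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character Soundex code of a word (illustrative only)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


for name in ("Smith", "Smithe", "Smythe", "Johnson", "Jonson"):
    print(name, soundex(name))
```

Because many different spellings collapse to one short code, this style of matching also explains why phonic searches tend to be over-inclusive.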



Stemming

Stemming extends a search to cover grammatical variations on a word. For example, a search for fish would also find fishing. A search for applied would also find applying, applies, and apply. There are two ways to add stemming to your searches:

    1. Check the Stemming box in the search form to enable stemming for all of the words in your search request. Stemming does not slow searches noticeably and is almost always helpful in making sure you find what you want.
    2. If you want to add stemming selectively, add a ~ at the end of words that you want stemmed in a search. Example: apply~


Variable Term Weighting - Sorting by SCORE

When The Scannery sorts search results, by default all words in a request count equally when counting hits. However, you can change this by specifying the relative weight of each term in your search request, like this:

apple:5 and pear:1

This request would retrieve the same documents as apple and pear but The Scannery would weight apple five times as heavily as pear when sorting the results.

In a natural language search, The Scannery automatically weights terms based on an analysis of their distribution in your documents. If you provide specific term weights in a natural language search, these weights will override the weights The Scannery would otherwise assign.
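
As a rough picture of what term weights do to the ordering of results, the sketch below parses the weights out of a request and adds them up once per hit. The real SCORE calculation is not documented here, so this weighted hit count is an assumption for illustration only, and the function names are hypothetical.

```python
def parse_weighted_terms(request):
    """Map each search word to its weight; words without ':n' get weight 1."""
    weights = {}
    for token in request.lower().split():
        if token in ("and", "or", "not"):
            continue  # skip connectors
        word, _, weight = token.partition(":")
        weights[word] = int(weight) if weight else 1
    return weights


def weighted_hits(document, weights):
    """Sum the weight of every hit in the document (assumed scoring rule)."""
    return sum(weights.get(word, 0) for word in document.lower().split())


weights = parse_weighted_terms("apple:5 and pear:1")
print(weights)                                      # {'apple': 5, 'pear': 1}
print(weighted_hits("apple pear pear", weights))    # 7
print(weighted_hits("apple apple pear", weights))   # 11
```

Both documents contain three hits, but the one that mentions apple twice sorts higher because apple carries five times the weight of pear.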



AND Connector

Use the AND connector in a search request to connect two expressions, both of which must be found in any document retrieved. For example:

apple pie and poached pear would retrieve any document that contained both phrases.

(apple or banana) and (pear w/5 grape) would retrieve any document that (1) contained either apple OR banana, AND (2) contained pear within 5 words of grape.



OR Connector

Use the OR connector in a search request to connect two expressions, at least one of which must be found in any document retrieved. For example, apple pie or poached pear would retrieve any document that contained apple pie, poached pear, or both.



Integrating The Scannery into your website

You can integrate this search engine into your own website or application by passing your search request to our search engine via this URL:

http://www.thescannery.com/cgi-bin/vfpweb.dll/thescannery/search_url?userid=X&words=Y&country=Z&consolidate=N

Where:

1) userid=X - a user id unique to your website or application. Simply send us an email and we will create one for you.

2) words=Y - the word or phrase you want to search for. You would create a textbox on your site where your users could enter the search words required.

3) country=Z - a country or group code, e.g. ZA, UK, US, JP, SP 500, etc. We will send you the complete list when we create a user id for you.

4) consolidate=N - Y/N or y/n. This is optional and defaults to "n". Use this to specify whether the search results should be consolidated by company website so that all links which point to the same website are grouped together.

Send us an email and we will create a user id for you and send you a few samples of how you can incorporate The Scannery into your own website and applications.
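
In the meantime, here is a minimal sketch of how such a URL could be assembled in Python. The user id "demo-user" is a placeholder, not a real account, and the function name is hypothetical. The sketch also assumes the service accepts standard form-style URL encoding, in which spaces become + and quotation marks become %22.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.thescannery.com/cgi-bin/vfpweb.dll/thescannery/search_url"


def build_search_url(userid, words, country, consolidate="n"):
    """Build a search URL from the four documented parameters."""
    params = {
        "userid": userid,            # issued by The Scannery on request
        "words": words,              # text typed by your user
        "country": country,          # country or group code, e.g. ZA
        "consolidate": consolidate,  # optional, "y" or "n"
    }
    return BASE_URL + "?" + urlencode(params)


# "demo-user" is a placeholder user id used only for illustration.
url = build_search_url("demo-user", '"apple pie" -salad', "ZA", consolidate="y")
print(url)
```

Encoding the search words matters because characters such as spaces, quotation marks, & and + have special meanings in a URL and would otherwise corrupt the request.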