Searching with Tokens

The search system indexes tokens instead of whole property values.

Tokens are individual chunks of a property value that allow the system to locate files based on pieces of information. Searching by token returns results more quickly and reduces the load on the system when searching vaults that contain thousands of property values.

Knowing how property values are broken down into tokens allows users to search for specific tokens and refine their searches, which reduces the load on the system when a search returns multiple results. For example, when a user searches for DES, only the files whose author is DES are returned. The results do not include the Design00.idw file because its first token is Design, whereas the author's initials, DES, are a complete token.

How Property Values are Broken into Tokens

All adjacent characters of like type are grouped into a single token. The types are alphabetic (A, B, C, ... Z), numeric (0, 1, 2, ... 9), and special punctuation (-, _, @, ... $).

Only six punctuation characters are searchable:

All other punctuation characters are not searchable and are not included in tokens.

This table shows how three different file names would be broken down into tokens.

File Name Tokens

A-055401-321.ipt

  • A
  • -
  • 055401
  • -
  • 321
  • ipt

Great White Shark.doc

  • Great
  • White
  • Shark
  • doc

Gr8work.xls

  • Gr
  • 8
  • work
  • xls
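The grouping rule above can be sketched in a few lines of Python. This is an illustrative approximation, not the search system's actual implementation. The names `tokenize`, `TOKEN_PATTERN`, and `SEARCHABLE_PUNCTUATION` are hypothetical, and the punctuation set is an assumption: it contains only the four characters this section names as examples (-, _, @, $), not the full list of six searchable characters.

```python
import re
from typing import List

# Assumption: only the punctuation characters named as examples in this
# section. The real system recognizes six; the full list is not given here.
SEARCHABLE_PUNCTUATION = "-_@$"

# A token is a run of letters, a run of digits, or a run of searchable
# punctuation. Anything else (spaces, periods, ...) only separates tokens.
TOKEN_PATTERN = re.compile(
    r"[A-Za-z]+|[0-9]+|[" + re.escape(SEARCHABLE_PUNCTUATION) + r"]+"
)


def tokenize(value: str) -> List[str]:
    """Split a property value into tokens of like-typed adjacent characters."""
    return TOKEN_PATTERN.findall(value)


if __name__ == "__main__":
    for name in ("A-055401-321.ipt", "Great White Shark.doc", "Gr8work.xls"):
        print(name, tokenize(name))
```

Under these assumptions, the three file names above produce the same tokens as the table: the period and the spaces separate tokens but never appear in one.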

Searching with Tokens

Specifying search values as tokens allows more latitude when constructing searches. A user can append a wildcard to a token for a broader search or specify a complete token for more refined results.
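The difference between a complete-token search and a wildcard search can be illustrated with a small Python sketch. It rests on several assumptions that this section does not confirm: the wildcard character is `*`, matching is case-insensitive, and a file matches when any one of its tokens matches the search term. The index contents, the file Bracket.ipt, the author initials JS, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical index: each file maps to the tokens from its properties.
# Design00.idw has the file-name tokens Design, 00, idw and an author JS.
# Bracket.ipt is an invented file whose author initials are DES.
INDEX: Dict[str, List[str]] = {
    "Design00.idw": ["Design", "00", "idw", "JS"],
    "Bracket.ipt": ["Bracket", "ipt", "DES"],
}


def token_matches(term: str, tokens: List[str]) -> bool:
    """Return True if any token matches the term (assumed case-insensitive).

    Without a wildcard the term must equal a complete token; with a
    trailing "*" it matches any token that starts with the given text.
    """
    pattern = term.lower()
    return any(fnmatchcase(token.lower(), pattern) for token in tokens)


def search(term: str) -> List[str]:
    """Return the names of all indexed files with a matching token."""
    return sorted(name for name, tokens in INDEX.items()
                  if token_matches(term, tokens))


if __name__ == "__main__":
    print(search("DES"))   # complete token only
    print(search("DES*"))  # wildcard broadens the search
```

In this sketch, searching for DES returns only the file whose author token is DES, while DES* also returns Design00.idw because its token Design begins with those letters.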