Here's how the PRS Document Indexer manages search text:
Capitalization
The Document Indexer doesn't distinguish between uppercase and lowercase letters. A search for HoLiDay will return all documents that contain the word in any capitalization, such as holiday, Holiday, or HOLIDAY.
Words and Punctuation
The Indexer treats every document as a sequence of terms. A term in this context is any string of letters and digits delimited by punctuation, other non-alphanumeric characters, or white space (spaces, tabs, ends of lines).
To be a word, a string does not have to be spelled correctly or be included in any dictionary. All that is required is that someone typed it as a single word in a document. Thus, the following are words if they appear delimited in a document: 300ZX, 602e21, WWW, HTTP.
In some common constructs, non-alphanumeric characters are included in the term. The following examples are treated as single terms:
prshq.com
support@prshq.com
U.S.A
AT&T
25.4
Leading and trailing punctuation is always stripped, so C++ and .NET are stored as c and net.
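The rules above can be approximated with a short sketch. This is not the Indexer's actual implementation: the function name tokenize and the pattern are hypothetical, and the set of characters allowed inside a term (a single '.', '@' or '&' between alphanumeric runs) is an assumption inferred only from the examples listed above. The sketch also handles ASCII letters and digits only.

```python
import re

# Assumption: a term is a run of ASCII letters/digits, optionally joined by a
# single internal '.', '@' or '&'. This set is inferred from the examples in
# this section; the real Indexer may recognize other constructs.
TERM_PATTERN = re.compile(r"[0-9A-Za-z]+(?:[.@&][0-9A-Za-z]+)*")


def tokenize(text):
    """Split text into lowercase terms (hypothetical helper).

    Leading and trailing punctuation never matches the pattern, so it is
    dropped, and lowercasing makes the terms case-insensitive.
    """
    return [match.group(0).lower() for match in TERM_PATTERN.finditer(text)]


if __name__ == "__main__":
    for sample in ["HoLiDay", "support@prshq.com", "AT&T", "25.4", "C++", ".NET"]:
        print(sample, "->", tokenize(sample))
```

Under these assumptions, tokenize("C++ and .NET") yields the terms c, and, net, while support@prshq.com and 25.4 each stay a single term.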
Phrases
A phrase is a string of words that are contiguous in a document, although they may be separated by any amount of white space or punctuation. A phrase does not have to make sense grammatically; it just has to occur in a document as a contiguous sequence of words. For example:
President of the U.S.A. (4-word phrase)
http://www.election.digital.com (2-word phrase)
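As a rough illustration of phrase matching, the sketch below splits both the document and the phrase into terms and then looks for the phrase's terms as a contiguous run. The helper names to_terms and contains_phrase are hypothetical, and the term pattern is a simplified assumption based on the examples in this section (ASCII alphanumeric runs, optionally joined by a single '.', '@' or '&'); the Indexer's real behavior may differ.

```python
import re

# Assumed, simplified term pattern; not taken from the Indexer itself.
TERM_PATTERN = re.compile(r"[0-9A-Za-z]+(?:[.@&][0-9A-Za-z]+)*")


def to_terms(text):
    """Return the lowercase terms found in text (hypothetical helper)."""
    return [match.group(0).lower() for match in TERM_PATTERN.finditer(text)]


def contains_phrase(document, phrase):
    """Return True if the phrase's terms occur contiguously in the document.

    White space and punctuation between terms are ignored, because both
    inputs are reduced to term sequences before comparison.
    """
    doc_terms = to_terms(document)
    phrase_terms = to_terms(phrase)
    length = len(phrase_terms)
    if length == 0:
        return False
    return any(
        doc_terms[start:start + length] == phrase_terms
        for start in range(len(doc_terms) - length + 1)
    )


if __name__ == "__main__":
    print(to_terms("President of the U.S.A."))
    print(to_terms("http://www.election.digital.com"))
    print(contains_phrase("The President,  of the U.S.A. spoke.", "president of the u.s.a"))
```

With this pattern, President of the U.S.A. splits into four terms and http://www.election.digital.com into two, which agrees with the word counts given in the examples above.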