ACTION DESIGNER SEARCHING FEATURES

As of 4/20/92

The regular expression notation used by the searching routine is very similar to the standard notation defined by the UNIX editor ed. A regular expression (RE) specifies a set of character strings. A substring of the searched string is said to be matched by the RE if the substring is one of the character strings allowed by the RE.

If you have never used regular expressions before, the notation is a bit arcane. You'll learn to love it. I won't give up my Brief editor, even under Windows, until I find an editor with comparable RE capability.

SIMPLE REGULAR EXPRESSIONS

ORDINARY CHARACTERS

One kind of ordinary character is a one-character RE that matches itself. The second kind is a two-character expression indicating that what would otherwise be a meta-character -- one of the special characters described below -- is to be treated as an ordinary character. The backslash is used as the first character in this sequence.

BACKSLASH (\)

A backslash (\) followed by a meta-character is an RE that makes the meta-character into an ordinary character.

PERIOD

A period (.) is a one-character RE that matches any character.

CHARACTER CLASS

A non-empty string of characters enclosed in square brackets ([]) is an RE that matches any one character in that string. [STV] will match either an S or a T or a V. The following characters have special meanings within the square brackets.

^   If the first character of the string is a circumflex (^), the RE matches any character except what the RE would otherwise match. The ^ has this special meaning only if it occurs first in the string.

-   The minus (-) may be used to indicate a range of consecutive ASCII characters. For example, [0-9] is equivalent to [0123456789]. The - loses this special meaning in the following cases:

    if the - occurs first in the string.
    if the - occurs after an initial ^.
    if the - occurs last in the string.
    if the - is the first character after a range.
    For example, [0-9-a] would be matched by any of the digits from 0 through 9, by a dash, or by an a.
    if the - is the terminating character of a range. For example, [+--a-z] would be matched by any of the characters in the range + through - or in the range a through z.

]   The right square bracket (]) does not terminate a string when it is the first character within it or the first character after an initial ^. E.g., []a-f] matches either a right square bracket (]) or one of the letters a through f.

THE POSITIONAL REGULAR EXPRESSION

The positional regular expression is used to indicate where in a line of text an operation is to occur. It is indicated by angle brackets <> enclosing one or more numbers. For example:

    <0> is an RE that matches the null string at position 0, the beginning of the string.
    <0,5,10> is an RE that matches the null string at position 0, or the null string at position 5, or the null string at position 10.

~   End Of Line Specification: If the position is preceded by a tilde (~), then the position is measured from the end of the string.

    <~0> matches the null string at the end of the string.
    <~4> matches the null string at position 4 counting from the end of the string.

-   Range Specification: If two positions are separated by a minus (-), a range of positions is used.

    <0-5> matches any of the null strings at positions 0 through 5.
    <5-~5> matches any null string from position 5 counting from the beginning to position 5 counting from the end.

    In a range specification, the second position specified must not occur before the first position specified. <5-~5> will always fail to match in a string of 9 characters or fewer, since 5 positions from the beginning then falls after 5 positions from the end. <~0-~5> always fails. <~5-~0> is correct.

COMPLEX REGULAR EXPRESSIONS

The following rules may be used to construct REs from other REs:

*   An RE followed by an asterisk (*) is an RE that matches zero or more occurrences of the RE.
    ab(ba)*cb searches for all occurrences of ab followed by zero or more occurrences of ba followed by cb. The patterns abbacb, abbabacb, abbabababacb, and abcb would all be treated as matching this RE.

    ab(ba)* searches for all occurrences of ab followed by zero or more occurrences of ba. If more than one sequence of ba follows an ab in the text, the match will be made to the entire sequence.

    (ba)* will always match at the beginning of the string, since zero occurrences of ba match the null string there.

+   An RE followed by a plus (+) is an RE that matches one or more occurrences of the RE.

    ab(ba)+ searches for all occurrences of ab followed by one or more occurrences of ba. If more than one sequence of ba follows an ab in the text, e.g., abbababa, the match will be made to the entire sequence.

{}  Replication Counts: An RE followed by {m}, {m,}, {,n} or {m,n} is an RE that matches a range of occurrences of the RE. The values of m and n must be non-negative integers.

    {m} indicates exactly m occurrences of the RE.

    {m,n} If m is less than n, then {m,n} indicates at least m occurrences of the RE and no more than n occurrences. In cases where the RE occurs more than the minimum number of times specified by m, the match will be made to the minimum number.

    ab(ba){2,4}: if abbabababa is found, the match will be made to abbaba.

    If m is greater than or equal to n, then {m,n} indicates at least n occurrences of the RE and no more than m occurrences. In cases where the RE occurs more than the minimum number of times specified by n, the match will be made to the longest sequence up to and including the maximum number specified by m.

    ab(ba){4,2}: if abbabababa is found, the match will be made to the entire sequence.

    {m,} is equivalent to {m,infinity} and {,n} is equivalent to {infinity,n}. Consequently, * and + are equivalent to {,0} and {,1} respectively.

$   Assignment: An RE followed by $c where c is a letter matches whatever the RE alone would match. (Upper and lower case are equivalent.)
    The expression where c is a letter is an RE which matches whatever value is assigned to the character c. If no previous assignment has been made, then it matches the null string in any position.

|   Alternation: REs separated by a vertical bar (|) form an RE that is matched by any string in the text that matches one of the REs making up the complex RE.

    (s|x|z) will be matched by either an s, an x, or a z.

()  Grouping: An RE enclosed within parentheses matches whatever the same RE without the parentheses would match.

CONCATENATION

REs may be concatenated to form a single RE that will be matched by the concatenation of the strings that match the separate REs.

PRECEDENCE

The suffix operators *, +, {}, and $ have the highest precedence. Concatenation has the next highest precedence. Alternation, |, has the lowest precedence. The order of operation may be modified by grouping with parentheses.
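
The operators and precedence rules above can be tried out with a short sketch. It uses Python's re module purely as a stand-in, which is an assumption: re is a different RE dialect from the Action Designer searching routine, it has no $ assignment or positional <> expression, and its replication counts behave differently. In particular, the minimum-count behavior described for {2,4} appears to correspond to Python's lazy form {2,4}?, while the longest-match behavior described for {4,2} appears to correspond to Python's ordinary {2,4}.

```python
import re

# Assumption: Python's re module stands in for the Action Designer search
# routine. It is a different RE dialect, used here only for illustration.

# * : ab followed by zero or more occurrences of ba, followed by cb.
star = re.compile(r"ab(ba)*cb")
star_ok = all(
    star.fullmatch(s) for s in ["abbacb", "abbabacb", "abbabababacb", "abcb"]
)

# + : one or more occurrences; the match covers the entire run of ba.
plus_match = re.search(r"ab(ba)+", "abbababa").group(0)

# Replication counts. The text's {2,4} stops at the minimum count, which
# is written as the lazy form {2,4}? in Python. The text's {4,2} takes the
# longest run up to the maximum, which is Python's ordinary {2,4}.
text = "abbabababa"
minimal = re.search(r"ab(ba){2,4}?", text).group(0)
longest = re.search(r"ab(ba){2,4}", text).group(0)

# Alternation: any one of s, x, or z.
alternation = re.findall(r"(s|x|z)", "six zebras")

# Precedence: suffix operators bind tighter than concatenation, which binds
# tighter than alternation, so ab|cd* reads as (ab)|(c(d*)).
samples = ["ab", "cddd", "abddd"]
ungrouped = [bool(re.fullmatch(r"ab|cd*", s)) for s in samples]
grouped = [bool(re.fullmatch(r"(ab|c)d*", s)) for s in samples]

print(star_ok)      # True
print(plus_match)   # abbababa
print(minimal)      # abbaba
print(longest)      # abbabababa
print(alternation)  # ['s', 'x', 'z', 's']
print(ungrouped)    # [True, True, False]
print(grouped)      # [True, True, True]
```

The last two lines show the effect of grouping: without parentheses, abddd is rejected because d* attaches only to the c alternative, while (ab|c)d* accepts it.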