As a disclaimer, keep in mind that this post is covering regex-directed engines, not text-directed engines.
Regex-directed engines are "eager" in that they will return the first possible regex match any given regex can afford the engine to return. In other words, the engine is "eager" to provide us with the first possible match that a regex provided. The way this works out in practice, is that the left-most tokens in a regex that constitute a valid match will be returned before any other tokens of that regex are considered. Take for example the regex
Ok|Okaythat is matched to the String "Okay". The returned match will be "Ok", even though it would appear that the second token of the regex, Okay, would have been a better fit. The engine was eager to match the regex to the provided String, therefore, moving left-to-right, as soon as the engine found a proper match of the regex token Ok with the first two letters of "Okay", the engine returned a match!
The above example is one of "alternation", or the use of the "|" character, informing the regex engine that one alternative or the other is an acceptable match for the regex. In the case of alternation, there are two ways to protect against the engine being "overly eager" with your matches.
- Order your alternatives in such a way that defeats the engines eager nature
- Okay|Ok will evaluate "Okay" before ever evaluating "Ok"
- Utilize the "optional item" metacharacter "?", with grouping as necessary
- Ok(ay)? will first evaluate "Okay", and if it is not found, evaluate a match for "Ok" next.
With eagerness covered, in my next post, I will turn my attention to another personification of regex-directed regular expression engines: greediness.
No comments:
Post a Comment