A Portable C++ Regular Expression Facility - Regex(C++)

More on matching

Sometimes it is not enough to know whether a match succeeded; we must also know the location and length of the matching substring. Getting this information is easy.

The value returned by match is actually not a simple boolean, but instead an instance of the following class:

       class Substrinfo {
       public:
           int i;       // index of matching substring
           size_t len;  // length of matching substring
           // ...
       };


       Regex r("(frob)|(baz)");
       Substrinfo ss = r.match("foobazbar");
           // ss.i = 3, ss.len = 3

We could earlier treat match as returning simply true or false because Substrinfo defines a conversion to void* and an operator!. Testing the return value of match therefore converts the returned Substrinfo to the appropriate boolean value automatically. In the following code, the two if-tests are equivalent:

       Regex r("(frob)|(baz)");
       Substrinfo ss = r.match("foobazbar");
       if (ss) ...  // true
       if (r.match("foobazbar")) ...  // true
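
The conversion idiom itself can be sketched in isolation. The struct below is a hypothetical stand-in (SubstrinfoSketch and the helper succeeded are names invented here, not the library's definition), and it assumes that a failed match is marked by a negative index. It converts to const void* rather than void* so that no cast is needed.

```cpp
#include <cstddef>

// Hypothetical stand-in for illustration only; not the library's class.
struct SubstrinfoSketch {
    int i;            // index of matching substring; negative on failure (assumed)
    std::size_t len;  // length of matching substring

    // A non-null pointer tests as true, a null pointer as false.
    operator const void*() const { return i >= 0 ? this : nullptr; }

    // Lets callers write `if (!ss)`.
    bool operator!() const { return i < 0; }
};

// Returns what an `if (ss)` test would see.
inline bool succeeded(const SubstrinfoSketch& ss) {
    return ss ? true : false;
}
```

A pointer conversion was the usual choice in older C++ because, unlike a conversion to bool, it does not let the object take part in arithmetic by accident. Code written for C++11 or later would normally use an explicit operator bool instead.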

If a match is unsuccessful, the index and length of the returned Substrinfo are -1 and 0, respectively.

       Regex r("(frob)|(baz)");
       Substrinfo ss = r.match("blech");
           // ss.i = -1, ss.len = 0
       if (ss) ...  // false

If the target string has more than one matching substring, the one "found" by match is the leftmost longest one.

       Regex r("(baz)+");
       Substrinfo ss = r.match("foobazbazbar");
           // ss.i = 3, ss.len = 6
       ss = r.match("bazfoobazbaz");
           // ss.i = 0, ss.len = 3
       ss = r.match("bazbazfoobaz");
           // ss.i = 0, ss.len = 6
           // ss.i = 0, ss.len = 6

Henceforth when we speak of "the matching substring," we will mean "the leftmost longest matching substring." If the user wishes to find all the matching substrings in the target, the Regexiter class, described below, can be used.
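
To make the leftmost longest rule concrete, here is a minimal brute-force sketch. It is written against the standard <regex> header rather than the Regex class, and MatchSpan and leftmost_longest are hypothetical names introduced only for this illustration. It says nothing about how the library finds its match, and it makes the simplifying assumption that empty matches are ignored.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type mirroring the index and length fields.
struct MatchSpan {
    int i;            // start index, or -1 if nothing matched
    std::size_t len;  // length of the match, or 0 if nothing matched
};

// Brute-force leftmost longest search: try each start position from the
// left, and at each position try the longest candidate substring first.
// The first candidate that matches in full is therefore the leftmost
// longest one. Empty matches are ignored (simplifying assumption).
inline MatchSpan leftmost_longest(const std::regex& re,
                                  const std::string& target) {
    const std::size_t n = target.size();
    for (std::size_t start = 0; start < n; ++start) {
        for (std::size_t len = n - start; len >= 1; --len) {
            if (std::regex_match(target.begin() + start,
                                 target.begin() + start + len, re)) {
                return MatchSpan{static_cast<int>(start), len};
            }
        }
    }
    return MatchSpan{-1, 0};
}
```

The nested loops call the matcher a quadratic number of times, so this is useful only as a specification of the rule, not as a practical search.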

Once we have the position and length of the matching substring, we can, if desired, construct the matching substring itself.

       Regex r("(baz)+");
       const char *target = "foobazbazbar";
       Substrinfo ss = r.match(target);
           // ss.i = 3, ss.len = 6
       // matching substring
       String mss(target+ss.i, ss.len);  // "bazbaz"

However, since this operation is so common, the library provides a way of doing it automatically. Specifying an optional String argument in the call to match will, if the match is successful, cause that String to be assigned the matching substring.

       Regex r("(baz)+");
       String mss;
       r.match("foobazbazbar", mss);  // mss = "bazbaz"

The library does not always provide the matching substring (in the returned Substrinfo) in order to obey the Locality of Cost principle: programs that do not want the matching substring should not have to pay to construct it.
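
The same pay-only-on-request shape can be sketched with standard C++ alone. In the sketch below, FindResult and find_literal are hypothetical names, a plain substring search stands in for a regular expression match, and a defaulted pointer parameter stands in for the library's optional String argument; none of this is the library's interface.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: index and length only, cheap to return.
struct FindResult {
    int i;            // start index, or -1 if not found
    std::size_t len;  // length of the found substring, or 0
};

// Searches `target` for the literal `needle`. The substring is copied
// into `*out` only when the caller supplies `out` and the search
// succeeds; callers that pass nothing never pay for a string copy.
inline FindResult find_literal(const std::string& target,
                               const std::string& needle,
                               std::string* out = nullptr) {
    const std::size_t pos = target.find(needle);
    if (pos == std::string::npos) {
        return FindResult{-1, 0};  // *out is left untouched on failure
    }
    if (out) {
        out->assign(target, pos, needle.size());
    }
    return FindResult{static_cast<int>(pos), needle.size()};
}
```

Calling find_literal("foobazbar", "baz") returns only the index and length, while passing the address of a std::string as the third argument also fills in the substring.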

In fact, the above example illustrates a general convention of the regular expression library:

   Any function which returns a Substrinfo also accepts
   an optional final String argument which, if present,
   will be assigned the substring denoted by the returned
   Substrinfo.

Keep this convention in mind in the examples below.


© 2005 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 02 June 2005