regcmp(S-osr5)
regcmp, regex --
compiles and executes regular expressions
Syntax
cc . . . -lc
#include <libgen.h>
char *regcmp (string1 [, string2, . . .], (char *)0)
char *string1, *string2, . . .
char *regex (re, subject [, ret0, . . .])
char *re, *subject, *ret0, . . .
extern char *__loc1;
Description
The regcmp routine compiles a regular expression
(consisting of the concatenated arguments)
and returns a pointer to the compiled form.
Space for the compiled form is allocated with
malloc(S-osr5);
it is the user's responsibility to release it with
free(S-osr5)
when it is no longer needed.
A NULL return from regcmp indicates an incorrect argument.
The regcmp(CP) command, which compiles regular expressions ahead of time,
generally removes the need to call this routine at execution time.
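regcmp takes the pattern as several strings that are joined in order, with the null pointer (char *)0 ending the list. The cast matters: a bare 0 passed to a variadic function is an int, not a pointer. The following C sketch illustrates only this calling convention using <stdarg.h>. The function concat_pattern is a hypothetical helper, not part of libc or libgen, and it joins strings without compiling anything. Like regcmp, it returns storage obtained from malloc that the caller must free.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libc or libgen): joins a list of
 * strings ending with (char *)0 into one string allocated with malloc,
 * the way regcmp treats its arguments as a single pattern. Returns
 * NULL if allocation fails; the caller must free the result.
 */
char *concat_pattern(const char *first, ...)
{
    va_list ap;
    const char *s;
    size_t len = 0;
    char *out, *p;

    assert(first != NULL);

    /* First pass: add up the lengths of all arguments. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        len += strlen(s);
    va_end(ap);

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy each argument in order. */
    p = out;
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *)) {
        size_t n = strlen(s);
        memcpy(p, s, n);
        p += n;
    }
    va_end(ap);
    *p = '\0';
    return out;
}
```

For example, concat_pattern("[a-z]", "[0-9]*", (char *)0) yields the single string "[a-z][0-9]*", the pattern regcmp would compile from those two arguments.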
The regex routine executes a compiled pattern against the subject string.
Any additional arguments point to character arrays that receive the values
of subexpressions marked with ( ... )$n, as described below.
On failure, regex returns NULL;
on success, it returns a pointer to the next unmatched character,
that is, the first character after the match.
The global character pointer
__loc1
points to where the match began.
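regcmp and regex are not available on every system (see Standards conformance below). As a rough analogue only, the following C sketch uses the POSIX regcomp and regexec functions to produce the same two results regex reports: the start of the match, which regex leaves in __loc1, and a pointer to the first character after the match, which regex returns. The helper name match_span is hypothetical. The pattern is written in POSIX extended syntax, which is not identical to regcmp syntax, and POSIX leftmost-longest matching may not agree with regex in every case.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Stand-in for regex() built on POSIX regcomp/regexec, not on
 * regcmp/regex. On a match, stores the start of the match in *start
 * (the role __loc1 plays) and returns a pointer to the first character
 * after the match (the role of regex's return value). Returns NULL if
 * there is no match or the pattern fails to compile.
 */
const char *match_span(const char *pattern, const char *subject,
                       const char **start)
{
    regex_t re;
    regmatch_t m;
    const char *end = NULL;

    assert(pattern != NULL && subject != NULL && start != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, subject, 1, &m, 0) == 0) {
        *start = subject + m.rm_so;  /* where the match began */
        end = subject + m.rm_eo;     /* next unmatched character */
    }
    regfree(&re);
    return end;
}
```

With the subject of Example 2 and its pattern minus the $0 capture, the match starts at the "T" of "012Testing345" and the returned pointer addresses the "4".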
The regex and regcmp routines were
borrowed from the editor,
ed(C);
however, the syntax and semantics have been changed slightly.
The following are the symbols understood by regex
and regcmp, and their meanings.
[ ] * . ^
These symbols retain their meaning in
ed(C).
$
Matches the end of the string; \n matches a new-line.
-
Within brackets the minus means through.
For example, [a-z] is equivalent to [abcd...xyz].
The ``-'' can appear as itself only if used as the
first or last character.
For example, the character class expression []-]
matches the characters ] and -.
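The rules for - inside brackets can be illustrated with a small membership test. The C sketch below is a hypothetical helper, bracket_match, not part of regcmp: it takes a bracket expression such as "[a-z]" or "[]-]" and reports whether a character belongs to it. Following the ed(C) convention, a ] in the first position is literal, and a - in the first or last position is literal. The sketch assumes plain byte-value ranges as in the C locale and does not handle ^ negation, multi-character collating elements, or the [= =] and [: :] forms used in Example 4.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: reports whether character c is a member of the
 * bracket expression that starts at expr (which must point at '[').
 * Returns 1 if c is in the set, 0 if not, and -1 if the closing ']'
 * is missing. Ranges compare byte values, as in the C locale; ^
 * negation and [= =] / [: :] forms are not handled.
 */
int bracket_match(const char *expr, int c)
{
    const char *p;
    int found = 0;
    int first = 1;              /* a ']' in the first position is literal */

    assert(expr != NULL && expr[0] == '[');
    p = expr + 1;
    while (*p != '\0' && (*p != ']' || first)) {
        int lo = (unsigned char)*p;

        /* "x-y" is a range unless the '-' is last (next char is ']'). */
        if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
            int hi = (unsigned char)p[2];
            if (c >= lo && c <= hi)
                found = 1;
            p += 3;
        } else {
            /* Single literal character, including a leading or trailing '-'. */
            if (c == lo)
                found = 1;
            p += 1;
        }
        first = 0;
    }
    return *p == ']' ? found : -1;
}
```

For instance, bracket_match("[]-]", ']') and bracket_match("[]-]", '-') both return 1, while bracket_match("[]-]", 'a') returns 0.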
+
A regular expression followed by ``+'' means
one or more times.
For example, [0-9]+ is equivalent to [0-9][0-9]*.
{m} {m,} {m,u}
Integer values enclosed in ``{}'' indicate the
number of times the preceding regular expression is to be applied.
The value m is the minimum number and u
is a number, less than 256, which is the maximum.
If only m is present (for example, {m}),
it indicates the exact number of times the regular
expression is to be applied.
The value {m,} is analogous to {m,infinity}.
The plus (``+'') and star (``*'') operations are
equivalent to {1,} and {0,} respectively.
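The repetition forms differ only in their minimum and maximum counts. The following hypothetical C helper, repeat_span, shows greedy bounded repetition of a single-character atom: it consumes as many matching characters as it can, up to u, and fails if fewer than m match. REPEAT_UNBOUNDED is an invented constant standing in for the missing upper bound of {m,}, and is_ascii_digit is an example predicate for [0-9]. A real matcher also backtracks so that the rest of the pattern can still match; this sketch only counts.

```c
#include <assert.h>
#include <stddef.h>

/* Invented marker meaning "no upper bound", as in {m,}. */
#define REPEAT_UNBOUNDED (-1)

/* Example predicate for the atom [0-9]. */
int is_ascii_digit(int c)
{
    return c >= '0' && c <= '9';
}

/*
 * Hypothetical helper: greedily matches atom{m,u}, where the atom is a
 * single character accepted by pred. Returns the number of characters
 * consumed, or -1 if fewer than m characters match. Use u ==
 * REPEAT_UNBOUNDED for {m,}; {m} is {m,m}, + is {1,}, and * is {0,}.
 * It does not backtrack.
 */
int repeat_span(const char *s, int (*pred)(int), int m, int u)
{
    int n = 0;

    assert(s != NULL && pred != NULL && m >= 0);
    /* This page limits the maximum u to less than 256. */
    assert(u == REPEAT_UNBOUNDED || (u >= m && u < 256));
    while (s[n] != '\0' && (u == REPEAT_UNBOUNDED || n < u)
           && pred((unsigned char)s[n]))
        n++;
    return n >= m ? n : -1;
}
```

Under these conventions, [0-9]{2,3} applied to "12345x" consumes "123", while [0-9]+ consumes all five digits.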
( ... )$n
The value of the enclosed regular expression is to be returned.
The value is stored in the (n+1)th
argument following the subject argument.
At most ten enclosed regular expressions are allowed.
regex makes its assignments unconditionally.
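For comparison, the following C sketch copies a parenthesized subexpression into a caller-supplied array, much as regex stores a ( ... )$n value into an argument after subject. It again uses POSIX regcomp and regexec as a stand-in, so the numbering differs: POSIX numbers every parenthesized group from 1 in order of its opening parenthesis, whereas regcmp stores only ( ... )$n groups, placed by the n given. The helper capture_group is hypothetical, and its buffer-size check is an addition of this sketch; when calling regex, size each return array for the longest value it can receive, as Example 2 does with ret0[9].

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Stand-in for ( ... )$n built on POSIX regcomp/regexec. Copies
 * parenthesized group number `group` (POSIX numbers groups from 1 in
 * order of their opening parenthesis) into buf, NUL-terminated.
 * Returns 0 on success, or -1 on no match, compile failure, a group
 * that did not participate, or a buffer that is too small.
 */
int capture_group(const char *pattern, const char *subject,
                  size_t group, char *buf, size_t bufsize)
{
    regex_t re;
    regmatch_t m[10];           /* whole match plus up to nine groups */
    int rc = -1;

    assert(pattern != NULL && subject != NULL);
    assert(group < 10 && buf != NULL && bufsize > 0);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* Unused or non-participating groups have rm_so == -1. */
    if (regexec(&re, subject, 10, m, 0) == 0 && m[group].rm_so != -1) {
        size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (len < bufsize) {
            memcpy(buf, subject + m[group].rm_so, len);
            buf[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

Applied to the subject "012Testing345" of Example 2 with group 1, it fills the buffer with "Testing3".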
( ... )
Parentheses are used for grouping.
An operator, for example,
``*'', ``+'', ``{}'',
can work on a single character or a regular
expression enclosed in parentheses.
For example, (a*(cb+)*)$0.
By necessity, all the above defined symbols are special.
They must, therefore, be escaped with a \ (backslash)
to be used as themselves.
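Because every symbol listed above is special, a literal string must be escaped before it is used as a pattern. The C sketch below is a hypothetical helper, escape_pattern, that prefixes each of [ ] * . ^ $ - + { } ( ) with a backslash. It also escapes the backslash itself, which is an assumption about regcmp rather than something this page states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies `in` to `out`, prefixing each character
 * listed as special on this page (and, by assumption, backslash itself)
 * with '\' so that it is taken literally. Returns 0 on success, or -1
 * if the result plus its terminating NUL would not fit in outsize bytes.
 */
int escape_pattern(const char *in, char *out, size_t outsize)
{
    static const char special[] = "[]*.^$-+{}()\\";
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        int esc = strchr(special, *in) != NULL;

        /* Leave room for the optional '\', the character, and a NUL. */
        if (o + (size_t)esc + 1 >= outsize)
            return -1;
        if (esc)
            out[o++] = '\\';
        out[o++] = *in;
    }
    if (o >= outsize)
        return -1;
    out[o] = '\0';
    return 0;
}
```

For example, the literal text (1.5) becomes \(1\.5\).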
Notes
The user program may run out of memory if regcmp
is called repeatedly without freeing the compiled patterns
that are no longer required.
See also
ed(C),
free(S-osr5),
malloc(S-osr5),
re_comp(S-osr5)
Standards conformance
regcmp, regex and __loc1
are not part of any currently supported standard;
they are an extension of AT&T System V provided by the Santa Cruz Operation.
Examples
Example 1:
char *cursor, *newcursor, *ptr;
...
newcursor = regex((ptr = regcmp("^\n", (char *)0)), cursor);
free(ptr);
This example matches a leading new-line in the subject string
pointed at by cursor.
Example 2:
char ret0[9];
char *newcursor, *name;
...
name = regcmp("([A-Za-z][A-Za-z0-9]{0,7})$0", (char *)0);
newcursor = regex(name, "012Testing345", ret0);
This example matches through the string ``Testing3'' and returns
the address of the character after the last matched character (the ``4'').
The string ``Testing3'' is copied to the character array ret0.
Example 3:
#include "file.i"
char *string, *newcursor;
...
newcursor = regex(name, string);
This example applies name, a regular expression precompiled into file.i,
against string.
See
regcmp(CP).
Example 4:
char *ptr, *newcursor;
ptr = regcmp("[a-[=i=][:digit:]]*",(char*)0);
newcursor = regex(ptr, "123CHICO321");
It is assumed in this example that the current locale's collation
rules specify the following sequence:
A, a, B, b, C, c, CH, Ch, ch, D, d, E, e, F, f, G, g, H, h, I, i, ...
The characters I and i are also both in the same
``primary'' collation group.
The following characters are all members of the digit ctype
class:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
This example matches through the string ``123CHIC'' and returns the
address of the character ``O'' in the string.
© 2005 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 02 June 2005