#include <regex.h>
int regcomp(regex_t *preg, const char *pattern, int cflags);
int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
void regfree(regex_t *preg);
The structure type regex_t contains at least the following publicly available member:
Member Type | Member Name | Description |
---|---|---|
size_t | re_nsub | Number of parenthesized subexpressions |
The structure type regmatch_t contains the following member:
Member Type | Member Name | Description |
---|---|---|
regoff_t | rm_so | Byte offset from start of string to start of substring. |
regoff_t | rm_eo | Byte offset from start of string of the first character after the end of substring. |
The regcomp function will compile the regular expression contained in the string pointed to by the pattern argument and place the results in the structure pointed to by preg. The cflags argument is the bitwise inclusive OR of zero or more of the following flags, which are defined in the header <regex.h>:
The default regular expression type for pattern is a basic regular expression. The application can specify extended regular expressions using the REG_EXTENDED cflags flag.
On successful completion, regcomp returns zero; otherwise, it returns non-zero, and the content of preg is undefined.
If the REG_NOSUB flag was not set in cflags, then regcomp will set re_nsub to the number of parenthesized subexpressions (delimited by \( \) in basic regular expressions or ( ) in extended regular expressions) found in pattern.
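As a minimal sketch of the behaviour described above, the following helper compiles a pattern without REG_NOSUB and reads back re_nsub. The name count_subexpressions is hypothetical and not part of <regex.h>; the choice of REG_EXTENDED is an assumption made only so that plain ( ) delimit subexpressions.

```c
#include <regex.h>
#include <stddef.h>

/*
 * count_subexpressions is a hypothetical helper, not part of <regex.h>.
 * It compiles pattern as an extended regular expression and stores
 * re_nsub in *nsub. Returns 0 on success, or the non-zero regcomp
 * error code on failure.
 */
int count_subexpressions(const char *pattern, size_t *nsub)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc != 0)
        return rc;        /* content of re is undefined, so it is not freed */
    *nsub = re.re_nsub;   /* REG_NOSUB was not set, so re_nsub has been set */
    regfree(&re);
    return 0;
}
```

Called with the pattern "(a)(b(c))", the helper should report three subexpressions, because nested parentheses are counted as well.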
The regexec function compares the null-terminated string specified by string with the compiled regular expression preg initialized by a previous call to regcomp. If it finds a match, regexec returns zero; otherwise, it returns non-zero indicating either no match or an error. The eflags argument is the bitwise inclusive OR of zero or more of the following flags, which are defined in the header <regex.h>.
If nmatch is zero or REG_NOSUB was set in the cflags argument to regcomp, then regexec will ignore the pmatch argument. Otherwise, the pmatch argument must point to an array with at least nmatch elements, and regexec will fill in the elements of that array with offsets of the substrings of string that correspond to the parenthesized subexpressions of pattern: pmatch[i].rm_so will be the byte offset of the beginning and pmatch[i].rm_eo will be one greater than the byte offset of the end of substring i. (Subexpression i begins at the ith matched open parenthesis, counting from 1.) Offsets in pmatch[0] identify the substring that corresponds to the entire regular expression. Unused elements of pmatch up to pmatch[nmatch-1] will be filled with -1. If there are more than nmatch subexpressions in pattern (pattern itself counts as a subexpression), then regexec will still do the match, but will record only the first nmatch substrings.
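The offsets in pmatch can be turned into substrings as in the sketch below. The helper name copy_group, the fixed array of 10 elements, and the use of REG_EXTENDED are assumptions chosen for illustration, not anything mandated by the interface.

```c
#include <regex.h>
#include <string.h>

/*
 * copy_group is a hypothetical helper. It matches string against the
 * extended regular expression pattern and copies the text of
 * subexpression `group` (0 means the whole match) into out, a buffer
 * of outsz bytes. Returns 0 on success and -1 on any failure.
 */
int copy_group(const char *pattern, const char *string, size_t group,
               char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[10];   /* whole match plus up to 9 subexpressions */
    int rc = -1;

    if (group >= 10 || outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, string, 10, pm, 0) == 0 && pm[group].rm_so != -1) {
        /* rm_eo is one past the last byte, so the length is rm_eo - rm_so */
        size_t len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        if (len < outsz) {
            memcpy(out, string + pm[group].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

Because rm_so and rm_eo are byte offsets from the start of string, the substring is not null-terminated in place; the helper copies it out rather than modifying the input.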
When matching a basic or extended regular expression, any given parenthesized subexpression of pattern might participate in the match of several different substrings of string, or it might not match any substring even though the pattern as a whole did match. The following rules are used to determine which substrings to report in pmatch when matching regular expressions:
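If subexpression i did not participate in the match, the byte offsets in pmatch[i] will be -1. A subexpression does not participate in the match when either of the following holds:
The operator * or \{ \} appears immediately after the subexpression in a basic regular expression, or *, ?, or { } appears immediately after the subexpression in an extended regular expression, and the subexpression did not match (matched zero times).
The operator | is used in an extended regular expression to select this subexpression or another, and the other subexpression matched.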
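A subexpression that takes no part in a successful match is reported with offsets of -1. The sketch below checks for that case; group_participated is a hypothetical helper, and the three-element pmatch array is an assumption sized for patterns with at most two subexpressions.

```c
#include <regex.h>
#include <stddef.h>

/*
 * group_participated is a hypothetical helper. It returns 1 if
 * subexpression `group` took part in the match, 0 if its offsets are -1,
 * and -1 if the pattern failed to compile or the string did not match.
 */
int group_participated(const char *pattern, const char *string, size_t group)
{
    regex_t re;
    regmatch_t pm[3];    /* pm[0] is the whole match; pm[1], pm[2] are groups */
    int result = -1;

    if (group >= 3 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, string, 3, pm, 0) == 0)
        result = (pm[group].rm_so != -1 && pm[group].rm_eo != -1) ? 1 : 0;
    regfree(&re);
    return result;
}
```

For example, matching "(a)|(b)" against "b" succeeds through the second alternative, so group 1 is reported as not participating while group 2 is.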
If, when regexec is called, the locale is different from when the regular expression was compiled, the result is undefined.
If REG_NEWLINE is not set in cflags, then a newline character in pattern or string will be treated as an ordinary character. If REG_NEWLINE is set, then newline will be treated as an ordinary character except as follows:
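The effect of REG_NEWLINE can be observed with a small experiment. The sketch below relies on two of the exceptions POSIX defines for this flag: a newline in string is not matched by a period, and ^ matches the empty string immediately after a newline. The helper matches_with_flags is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * matches_with_flags is a hypothetical helper. It returns 1 if string
 * matches the basic regular expression pattern compiled with cflags,
 * 0 if it does not match, and -1 if compilation fails.
 */
int matches_with_flags(const char *pattern, const char *string, int cflags)
{
    regex_t re;
    int status;

    /* REG_NOSUB is added because only the match/no-match result is needed */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    status = regexec(&re, string, 0, NULL, 0);
    regfree(&re);
    return status == 0 ? 1 : 0;
}
```

With the string "one\ntwo", the pattern "^two" should match only when REG_NEWLINE is passed, while "one.two" should match only when it is not.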
The regfree function frees any memory allocated by regcomp associated with preg.
On successful completion, the regexec function returns zero. Otherwise, it returns REG_NOMATCH to indicate no match, or REG_ENOSYS to indicate that the function is not supported.
Upon successful completion, the regerror function returns the number of bytes needed to hold the entire generated string. Otherwise, it returns zero to indicate that the function is not implemented.
The regfree function returns no value.
The regerror function provides a mapping from error codes returned by regcomp and regexec to unspecified printable strings. It generates a string corresponding to the value of the errcode argument, which must be the last non-zero value returned by regcomp or regexec with the given value of preg. If errcode is not such a value, the content of the generated string is unspecified.
If preg is a null pointer, but errcode is a value returned by a previous call to regexec or regcomp, regerror still generates an error string corresponding to the value of errcode.
If the errbuf_size argument is not zero, regerror will place the generated string into the buffer of size errbuf_size bytes pointed to by errbuf. If the string (including the terminating null) cannot fit in the buffer, regerror will truncate the string and null-terminate the result.
If errbuf_size is zero, regerror ignores the errbuf argument, and returns the size of the buffer needed to hold the generated string.
If the preg argument to regexec or regfree is not a compiled regular expression returned by regcomp, the result is undefined. A preg is no longer treated as a compiled regular expression after it is given to regfree.
The following demonstrates how the REG_NOTBOL flag could be used with regexec to find all substrings in a line that match a pattern supplied by a user. (For simplicity of the example, very little error checking is done.)

    (void) regcomp (&re, pattern, 0);
    p = buffer;     /* p is a char * cursor into buffer */
    /* this call to regexec() finds the first match on the line */
    error = regexec (&re, p, 1, &pm, 0);
    while (error == 0) {    /* while matches found */
        /* substring found between p + pm.rm_so and p + pm.rm_eo */
        p += pm.rm_eo;      /* offsets are relative to the string passed in */
        /* This call to regexec() finds the next match */
        error = regexec (&re, p, 1, &pm, REG_NOTBOL);
    }

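A more complete version of this loop might look like the sketch below. It is one possible arrangement, not the only one: count_matches is a hypothetical helper, and the handling of empty matches (stepping one byte forward) is an assumption added so that the loop always terminates. Offsets returned by regexec are relative to the pointer passed to that call, so the sketch advances a cursor rather than indexing from the start of the line.

```c
#include <regex.h>
#include <stddef.h>

/*
 * count_matches is a hypothetical helper. It counts the non-overlapping
 * matches of the basic regular expression pattern in line.
 * Returns the count, or -1 if the pattern does not compile.
 */
int count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t pm;
    const char *p = line;   /* start of the part not yet searched */
    int eflags = 0;         /* first call: p is the true start of the line */
    int count = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        count++;
        p += pm.rm_eo;                  /* offsets are relative to p */
        if (pm.rm_so == pm.rm_eo) {     /* empty match: force progress */
            if (*p == '\0')
                break;
            p++;
        }
        eflags = REG_NOTBOL;            /* p is no longer the line start */
    }
    regfree(&re);
    return count;
}
```

With REG_NOTBOL set on the later calls, a pattern anchored with ^ can match only once per line, even though each call receives a pointer that looks like the start of a string.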
An application could call regerror(code, preg, (char *)NULL, (size_t)0) to find out how big a buffer is needed for the generated string, malloc a buffer to hold the string, and then call regerror again to get the string. Alternatively, it could allocate a fixed, static buffer that is big enough to hold most strings, and then use malloc to allocate a larger buffer if it finds that this is too small.
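The two-call approach can be sketched as follows. The helper regerror_alloc is hypothetical; it assumes the caller is responsible for freeing the returned string.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * regerror_alloc is a hypothetical helper. It returns a malloc'd string
 * holding the message for errcode, or NULL on failure. The caller must
 * free the result.
 */
char *regerror_alloc(int errcode, const regex_t *preg)
{
    /* First call: errbuf_size of 0 only reports the size needed,
       including the terminating null. */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *msg;

    if (need == 0)
        return NULL;
    msg = malloc(need);
    if (msg != NULL)
        (void) regerror(errcode, preg, msg, need);   /* second call fills msg */
    return msg;
}
```

A typical use is to pass the non-zero value returned by regcomp, print the message, and then free it. The text of the message is unspecified, so it should be shown to a user rather than parsed.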

    #include <regex.h>
    #include <stddef.h>

    /*
     * Match string against the extended regular expression
     * in pattern, treating errors as no match.
     *
     * return 1 for match, 0 for no match
     */
    int match(const char *string, char *pattern)
    {
        int status;
        regex_t re;

        if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
            return(0);      /* report error */
        }
        status = regexec(&re, string, (size_t) 0, NULL, 0);
        regfree(&re);
        if (status != 0) {
            return(0);      /* report error */
        }
        return(1);
    }