There are several regular expression APIs for Java, but there is no standard interface shared by them (JSR-051 is one ongoing standardization effort in this area). The pnuts::regex module defines a common interface for using these regular expression APIs at the script level.
To select a particular regex API, load the script file corresponding to that API after this module has been registered with the use() function. ("pnuts::regex" is already registered when you start the interpreter with the pnuts command.)
e.g.
use("pnuts::regex")           // usually not needed
load("pnuts/regex/regex4j")   // pick IBM regex4j
...
Currently, the following APIs can be used with this module.
API                       Script name
Apache jakarta-oro        pnuts/regex/jakarta-oro
Apache jakarta regexp     pnuts/regex/jakarta-regexp
IBM regex for Java        pnuts/regex/regex4j
GNU regexp                pnuts/regex/gnu-regexp
When a standard interface for regular expression APIs becomes available in the future, the way to choose a particular implementation may change.
match() checks whether the string input contains a substring that matches the regular expression regex, and returns the result (true or false). The syntax of the regular expression depends on the selected regex API.
e.g.
match("a*b", "aaaaabbb") => true
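For comparison, the "contains" semantics of match() correspond to find() rather than whole-string matching in Java's standard java.util.regex package. The sketch below uses that package only as an illustration; it is not the pnuts::regex implementation, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchSketch {
    // Hypothetical counterpart of match(regex, input): true if any
    // substring of input matches regex (find), not only the whole string.
    static boolean match(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(match("a*b", "aaaaabbb"));   // true
        // Whole-string matching fails because of the trailing "bb".
        System.out.println("aaaaabbb".matches("a*b")); // false
    }
}
```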
getMatch(0) returns the entire string matched by the most recent match. getMatch(n) returns the substring that matched the nth parenthesized group of regex.
e.g.
match("(a*)b(c*)", "aaabcc") => true
getMatch(0) => "aaabcc"
getMatch(1) => "aaa"
getMatch(2) => "cc"
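The group numbering used by getMatch() follows the usual convention (group 0 is the whole match), which can be sketched with Matcher.group(int) from java.util.regex. The helper groups() is hypothetical, and the sketch assumes, as with match(), that the first occurrence of regex in the input is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSketch {
    // Hypothetical helper: element 0 is the whole match (like getMatch(0)),
    // element n is the text of the nth parenthesized group (like getMatch(n)).
    // Returns null if regex does not occur in input.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("(a*)b(c*)", "aaabcc");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " => \"" + g[i] + "\"");
        }
    }
}
```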
getMatchStart(0) returns the start index of the most recently matched string. getMatchStart(n) returns the start index of the substring that matched the nth group.
e.g.
match("(a*)b(c*)", "aaabcc") => true
getMatchStart(0) => 0
getMatchStart(1) => 0
getMatchStart(2) => 4
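A similar sketch for start indices uses Matcher.start(int) from java.util.regex. startIndices() is a hypothetical helper for illustration, not part of pnuts::regex.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartSketch {
    // Hypothetical helper: start index of the whole match (element 0) and of
    // each group, like getMatchStart(0..n); returns null if there is no match.
    static int[] startIndices(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        int[] starts = new int[m.groupCount() + 1];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = m.start(i);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(startIndices("(a*)b(c*)", "aaabcc"))); // [0, 0, 4]
    }
}
```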
getMatchEnd(0) returns the end index of the most recently matched string, and getMatchEnd(n) returns the end index of the substring that matched the nth group. The end index is exclusive: it is the index just past the last matched character.
e.g.
match("(a*)b(c*)", "aaabcc") => true
getMatchEnd(0) => 6
getMatchEnd(1) => 3
getMatchEnd(2) => 6
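Because end indices are exclusive, taking the substring from a group's start index to its end index recovers that group's text. The sketch below checks this with Matcher.end(int) from java.util.regex; endIndices() is a hypothetical helper used only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndSketch {
    // Hypothetical helper: exclusive end index of the whole match and of
    // each group, like getMatchEnd(0..n); returns null if there is no match.
    static int[] endIndices(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        int[] ends = new int[m.groupCount() + 1];
        for (int i = 0; i < ends.length; i++) {
            ends[i] = m.end(i);
        }
        return ends;
    }

    public static void main(String[] args) {
        String input = "aaabcc";
        System.out.println(Arrays.toString(endIndices("(a*)b(c*)", input))); // [6, 3, 6]
        Matcher m = Pattern.compile("(a*)b(c*)").matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                // end() is exclusive, so substring(start, end) is exactly the group text.
                System.out.println(i + ": \"" + input.substring(m.start(i), m.end(i)) + "\"");
            }
        }
    }
}
```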
getNumberOfGroups() returns the number of groups available after a successful match: the number of parenthesized subexpressions plus one for the entire match itself (group 0). For example, after match("(a*)b(c*)", "aaabcc") it returns 3.
The result of this function is undefined if it is called after an unsuccessful match.
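In java.util.regex, Matcher.groupCount() does not count group 0, so a value in the style of getNumberOfGroups() is groupCount() + 1. The hypothetical numberOfGroups() below throws on a failed match; that is just one possible choice for the case the text leaves undefined, not pnuts::regex behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountSketch {
    // Hypothetical counterpart of getNumberOfGroups(): parenthesized groups
    // plus one for the entire match, since groupCount() excludes group 0.
    static int numberOfGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            // Left undefined by the text; this sketch chooses to fail loudly.
            throw new IllegalStateException("no match for " + regex);
        }
        return m.groupCount() + 1;
    }

    public static void main(String[] args) {
        System.out.println(numberOfGroups("(a*)b(c*)", "aaabcc")); // 3
        System.out.println(numberOfGroups("a*b", "aaaaabbb"));     // 1
    }
}
```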
When a string is given as the second parameter, substitute() replaces each substring that matches regex with the string replacement.
When a zero-argument function is given as the second parameter, substitute() replaces each match with the result of calling that function. Inside the function, getMatch(0) returns the text of the current match.
e.g.
substitute("[a-z]+", function () getMatch(0).toUpperCase(), "aBcDe") => "ABCDE"
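The function form of substitute() can be approximated in Java with a per-match callback and Matcher.appendReplacement()/appendTail() from java.util.regex. This is a sketch with hypothetical names, not the pnuts::regex implementation. It treats a string replacement literally (via Matcher.quoteReplacement), which may differ from how a given regex API handles characters such as $ in replacement strings.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubstituteSketch {
    // Hypothetical: replaces every match of regex with a literal string.
    static String substitute(String regex, String replacement, String input) {
        return substitute(regex, r -> replacement, input);
    }

    // Hypothetical: replaces every match with a value computed from that match,
    // similar to passing a function that calls getMatch(0).
    static String substitute(String regex, Function<MatchResult, String> fn, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement stops '$' and '\' in the result being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(fn.apply(m)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("[a-z]+", r -> r.group().toUpperCase(), "aBcDe")); // ABCDE
        System.out.println(substitute("b+", "-", "abbcbd"));                            // a-c-d
    }
}
```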
split() tokenizes a string with the regular expression regex as the delimiter. It returns an array of the resulting tokens.
e.g.
split(`\.`, "a.b.c") => ["a", "b", "c"]
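Java's Pattern.split() also treats its delimiter as a regular expression, so the dot has to be escaped just as in the example above. The helper below is hypothetical. Note that Java drops trailing empty tokens by default; whether split() in pnuts::regex does the same is not stated here, so treat this only as an illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitSketch {
    // Hypothetical counterpart of split(regex, input) using Pattern.split,
    // which removes trailing empty tokens.
    static String[] split(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        // The dot is escaped because an unescaped "." matches any character.
        System.out.println(Arrays.toString(split("\\.", "a.b.c"))); // [a, b, c]
        // Unescaped, every character is a delimiter and all tokens are empty.
        System.out.println(split(".", "a.b.c").length);             // 0
    }
}
```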