The pnuts::regex Module

Several regular expression APIs are available, but there is no standard interface among them (JSR-051 is one ongoing standardization effort). The pnuts::regex module defines a common interface for using regular expression APIs at the script level.

To select a particular regex API, load the script file that corresponds to that API after this module has been registered with the use() function. ("pnuts::regex" is already registered when you start the interpreter with the pnuts command.)

e.g.
use("pnuts::regex")         // usually not needed
load("pnuts/regex/regex4j") // select IBM regex4j
...

Currently, the following APIs can be used with this module.

API                      Script name
Apache jakarta-oro       pnuts/regex/jakarta-oro
Apache jakarta regexp    pnuts/regex/jakarta-regexp
IBM regex for Java       pnuts/regex/regex4j
GNU regexp               pnuts/regex/gnu-regexp

If a standard regular expression API becomes available in the future, the way to choose a particular implementation may change.

match (String regex, String input)

match() checks whether the String input contains a match for the regular expression regex and returns true or false accordingly. The regular expression syntax depends on the regex API that has been selected.

e.g.
match("a*b", "aaaaabbb")  => true
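Note that the example succeeds even though the trailing "bb" is not part of the match: the pattern only has to occur somewhere in the input. As a rough analogy only, the Java sketch below uses java.util.regex (the API that came out of JSR-051, not one of the back ends listed above) to contrast that behaviour with whole-string matching. The class and helper names are hypothetical and nothing here is part of pnuts::regex.

```java
import java.util.regex.Pattern;

final class ContainsSketch {
    // Hypothetical helper: true if some substring of input matches regex.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Hypothetical helper: true only if the entire input matches regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(contains("a*b", "aaaaabbb"));     // true
        System.out.println(matchesWhole("a*b", "aaaaabbb")); // false
    }
}
```

The first call corresponds to the "includes" wording used for match(); the second shows what a stricter whole-input test would return for the same arguments.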

getMatch (int index)

getMatch(0) returns the string matched by the most recent match. getMatch(n) returns the substring matched by the nth regex group.

e.g.
match("(a*)b(c*)", "aaabcc")  => true

getMatch(0)  => "aaabcc"
getMatch(1)  => "aaa"
getMatch(2)  => "cc"

getMatchStart (int index)

getMatchStart(0) returns the start index of the most recently matched string. getMatchStart(n) returns the start index of the substring matched by the nth regex group.

e.g.
match("(a*)b(c*)", "aaabcc")  => true

getMatchStart(0)  => 0
getMatchStart(1)  => 0
getMatchStart(2)  => 4

getMatchEnd (int index)

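getMatchEnd(0) returns the end index of the most recently matched string. getMatchEnd(n) returns the end index of the substring matched by the nth regex group. As the example shows, the end index is exclusive: it is the position just after the last matched character.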

e.g.
match("(a*)b(c*)", "aaabcc")  => true

getMatchEnd(0)  => 6
getMatchEnd(1)  => 3
getMatchEnd(2)  => 6
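
The start and end values in these examples form half-open ranges, so the text of a group is the slice of the input between them. The Java sketch below illustrates the same relationship with java.util.regex; it is an analogy under that assumption, not the pnuts::regex implementation, and the class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupOffsetsSketch {
    // Hypothetical helper: {start, end} of the given group for the first match,
    // or null when the pattern does not occur in the input.
    static int[] span(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String input = "aaabcc";
        for (int g = 0; g <= 2; g++) {
            int[] s = span("(a*)b(c*)", input, g);
            // substring(start, end) reproduces the text of group g
            System.out.println(g + ": [" + s[0] + ", " + s[1] + ") "
                    + input.substring(s[0], s[1]));
        }
    }
}
```

For the input used above this prints the ranges [0, 6), [0, 3) and [4, 6), which agree with the getMatchStart() and getMatchEnd() results listed in the examples.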

getNumberOfGroups ()

getNumberOfGroups() returns the number of groups available after a successful match: the parenthesized subgroups plus one for the entire match itself.

The result of this function is undefined if it is called after an unsuccessful match.
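
To make the counting rule concrete, here is a hedged Java sketch using java.util.regex as a stand-in (it is not one of the back ends this module selects). In that API groupCount() excludes the whole match, so one is added to follow the convention described above. The class and helper names are hypothetical, and throwing on a failed match is a choice made for this sketch only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupCountSketch {
    // Hypothetical helper: parenthesized groups plus one for the whole match.
    static int numberOfGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            // The text above leaves this case undefined; the sketch rejects it.
            throw new IllegalStateException("no match");
        }
        return m.groupCount() + 1; // groupCount() does not include group 0
    }

    public static void main(String[] args) {
        System.out.println(numberOfGroups("(a*)b(c*)", "aaabcc")); // 3
        System.out.println(numberOfGroups("a*b", "aaaaabbb"));     // 1
    }
}
```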

substitute (String regex, PnutsFunction func, String input) or
substitute (String regex, String replacement, String input)

When a string is given as the second parameter, substitute() replaces the text that matches regex with the string replacement.

When a zero-argument function is given as the second parameter, substitute() replaces the text that matches regex with the result of calling the function. Inside the function, getMatch() can be used to inspect the current match.

e.g.
substitute("[a-z]+", function () getMatch(0).toUpperCase(), "aBcDe")  => "ABCDE"
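
In the example each lowercase run ("a", "c", "e") is replaced in turn while the unmatched characters are copied through. The Java sketch below shows one way such a callback-driven replacement can be written with java.util.regex; it is an illustration under that assumption, not the module's actual code, and the class name, helper name and Function-based signature are hypothetical.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubstituteSketch {
    // Hypothetical helper: replace every match with the callback's result.
    static String substitute(String regex, Function<MatchResult, String> func, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the callback result literal
            m.appendReplacement(out, Matcher.quoteReplacement(func.apply(m)));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // Callback form, mirroring the example above
        System.out.println(substitute("[a-z]+", r -> r.group().toUpperCase(), "aBcDe")); // ABCDE
        // A constant callback behaves like a fixed replacement string
        System.out.println(substitute("[0-9]+", r -> "#", "a1b22c")); // a#b#c
    }
}
```

Passing the match to the callback plays the role that getMatch() plays inside the zero-argument Pnuts function.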

split (String regex, String input)

split() tokenizes a string with the regular expression regex as the delimiter. It returns an array of the resulting tokens.

e.g.
split(`\.`, "a.b.c") => ["a", "b", "c"]
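
The dot is escaped in the example because an unescaped "." matches any character in most regex syntaxes. The Java sketch below demonstrates the difference with java.util.regex, used here only as a familiar stand-in for the back ends listed earlier; the class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitSketch {
    // Hypothetical helper: tokenize input using regex as the delimiter.
    static String[] split(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        // Escaped dot: only literal '.' characters act as delimiters
        System.out.println(Arrays.toString(split("\\.", "a.b.c"))); // [a, b, c]
        // Unescaped dot: every character is a delimiter, so no tokens remain
        System.out.println(split(".", "a.b.c").length); // 0
    }
}
```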

