Regular Expressions
Regular expressions are a powerful tool for pattern matching on strings of text. They are built into the core of languages like Perl, Ruby, and JavaScript. Perl and Ruby are particularly renowned for adroitly handling regular expressions. So why aren't they part of the D core language? Read on and see how they're done in D compared with Ruby.
This article explains how to use regular expressions in D. It doesn't explain regular expressions themselves; after all, people have written entire books on that topic. D's specific implementation of regular expressions is entirely contained in the Phobos library module std.regexp. For a more advanced treatment of using regular expressions in conjunction with template metaprogramming, see Templates Revisited.
In Ruby a regular expression can be created as a special literal:
r = /pattern/
s = /p[1-5]\s*/
D doesn't have special literals for them, but they can be created:
r = RegExp("pattern");
s = RegExp(r"p[1-5]\s*");
If the pattern contains backslash characters (\), use a wysiwyg string literal, which is written with an r prefix, so that the backslashes reach the regular expression unchanged. r and s are of type RegExp, but we can use type inference to declare and assign them automatically:
auto r = RegExp("pattern");
auto s = RegExp(r"p[1-5]\s*");
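For comparison, Ruby can also build a regular expression from an ordinary string, which is closer to what D does. The sketch below (the helper name build_pattern is made up for illustration) shows why an unescaped string form matters: in a Ruby double-quoted string, \s is an escape for a space character, so the backslash never reaches the regular expression engine, whereas a single-quoted string passes it through much like a D wysiwyg literal.

```ruby
# Hypothetical helper: builds a Regexp from a plain string,
# in the spirit of D's RegExp("...") constructor call.
def build_pattern(source)
  Regexp.new(source)
end

literal = /p[1-5]\s*/
raw     = build_pattern('p[1-5]\s*')  # single quotes keep the backslash
cooked  = build_pattern("p[1-5]\s*")  # double quotes turn \s into a space

puts raw == literal   # compares pattern source and options with the literal
puts cooked.source    # the backslash is gone from this pattern
```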
To check for a match of a string s with a regular expression in Ruby, use the =~ operator, which returns the index of the first match:
s = "abcabcabab"
s =~ /b/ # match, returns 1
s =~ /f/ # no match, returns nil
In D this looks like:
auto s = "abcabcabab";
std.regexp.find(s, "b"); /* match, returns 1 */
std.regexp.find(s, "f"); /* no match, returns -1 */
Note the equivalence to std.string.find, which searches for substring matches rather than regular expression matches.
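The same split exists on the Ruby side, where String#index accepts either a plain substring or a regular expression. A minimal sketch follows; the helper find_index is a made-up name, and mapping Ruby's nil to -1 is only there to mirror the D return convention shown above.

```ruby
# Hypothetical helper: index of the first match, or -1 if there is none,
# mirroring the return convention of the D find functions.
def find_index(text, pattern)
  text.index(pattern) || -1
end

s = "abcabcabab"
puts find_index(s, "b")      # substring search
puts find_index(s, /[bc]a/)  # regular expression search
puts find_index(s, /f/)      # no match
```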
The Ruby =~ operator sets some implicitly defined variables based on the result:
s = "abcdef"
if s =~ /c/
  "#{$`}[#{$&}]#{$'}" # generates string ab[c]def
end
In D, the function std.regexp.search() returns a RegExp object describing the match (or null if there is no match), which provides the same information explicitly:
auto m = std.regexp.search("abcdef", "c");
if (m)
    writefln("%s[%s]%s", m.pre, m.match(0), m.post);
Or even more concisely as:
if (auto m = std.regexp.search("abcdef", "c"))
    writefln("%s[%s]%s", m.pre, m.match(0), m.post); // writes ab[c]def
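Ruby offers the same explicit style through Regexp#match, which returns a MatchData object (or nil) instead of relying on the implicit variables. A small sketch, with bracket_match as an illustrative name rather than anything from the article:

```ruby
# Illustrative helper: wraps the first match in brackets,
# or returns nil when the pattern does not match.
def bracket_match(text, pattern)
  m = pattern.match(text)
  return nil unless m
  "#{m.pre_match}[#{m[0]}]#{m.post_match}"
end

puts bracket_match("abcdef", /c/)
```

Here pre_match, m[0], and post_match play the roles of m.pre, m.match(0), and m.post in the D version.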
Search and Replace
Search and replace gets more interesting. To replace occurrences of "a" with "ZZ" in Ruby, first only the first occurrence and then all of them:
s = "Strap a rocket engine on a chicken."
s.sub(/a/, "ZZ") # result: StrZZp a rocket engine on a chicken.
s.gsub(/a/, "ZZ") # result: StrZZp ZZ rocket engine on ZZ chicken.
In D:
s = "Strap a rocket engine on a chicken.";
sub(s, "a", "ZZ"); // result: StrZZp a rocket engine on a chicken.
sub(s, "a", "ZZ", "g"); // result: StrZZp ZZ rocket engine on ZZ chicken.
The replacement string can reference the matches using the $&, $$, $', $`, and $1 .. $9 notation:
sub(s, "[ar]", "[$&]", "g"); // result: St[r][a]p [a] [r]ocket engine on [a] chicken.
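For comparison, Ruby's replacement strings use backslash references rather than $ ones: \0 stands for the whole match and \1 through \9 for capture groups. The sketch below assumes single-quoted replacement strings so that the backslashes survive; both method names are illustrative only.

```ruby
# Wraps every "a" or "r" in brackets; \0 refers to the whole match.
def bracket_matches(text)
  text.gsub(/[ar]/, '[\0]')
end

# Swaps the first two space-separated words; \1 and \2 are capture groups.
def swap_first_two_words(text)
  text.sub(/(\w+) (\w+)/, '\2 \1')
end

puts bracket_matches("Strap a rocket engine on a chicken.")
puts swap_first_two_words("rocket engine")
```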
Or the replacement string can be provided by a delegate:
sub(s, "[ar]",
    (RegExp m) { return toupper(m.match(0)); },
    "g"); // result: StRAp A Rocket engine on A chicken.
(toupper() comes from std.string.)
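The Ruby counterpart of the delegate form is a block passed to gsub, which receives each matched substring and returns its replacement. A sketch (the method name upcase_matches is invented here):

```ruby
# Replaces every match of pattern with its upper-case form.
def upcase_matches(text, pattern)
  text.gsub(pattern) { |matched| matched.upcase }
end

puts upcase_matches("Strap a rocket engine on a chicken.", /[ar]/)
```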
Looping
It's possible to iterate over all matches within a string:
import std.stdio;
import std.regexp;

void main()
{
    foreach(m; RegExp("ab").search("abcabcabab"))
    {
        writefln("%s[%s]%s", m.pre, m.match(0), m.post);
    }
}
// Prints:
// [ab]cabcabab
// abc[ab]cabab
// abcabc[ab]ab
// abcabcab[ab]
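A Ruby version of the same loop can be sketched with String#scan, reading the MatchData for each match from Regexp.last_match inside the block. The helper name bracket_each_match is an assumption made for this example.

```ruby
# Illustrative helper: returns one line per match, with the match in brackets.
def bracket_each_match(text, pattern)
  lines = []
  text.scan(pattern) do
    m = Regexp.last_match  # MatchData for the match currently being yielded
    lines << "#{m.pre_match}[#{m[0]}]#{m.post_match}"
  end
  lines
end

puts bracket_each_match("abcabcabab", /ab/)
```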
Conclusion
D's regular expression handling is as powerful as Ruby's, but its syntax isn't as concise, because D deliberately leaves out two Ruby features:
- Regular expression literal syntax - supporting it would make it impossible to perform lexical analysis without also doing syntactic or semantic analysis.
- Implicit naming of match variables - this causes problems with name collisions, and just doesn't fit with the rest of the way D works.
But it is just as powerful.