Saturday, December 13, 2008

Perl Regular Expressions

In Perl, you write regular expressions between / delimiters (or you can change the delimiter if you wish), and you add modifiers after the closing /. To match the contents of a variable against a regular expression, use the =~ operator. Regular expressions are also used by Perl built-in functions such as grep and split, and by the s operator.

Perl uses a very full set of elements within its regular expressions, most of which are terse and so hard for the newcomer to follow when maintaining code. It predates, so does not follow, the POSIX standard.

Perl 6, currently under development, will support grammars and rules rather than regular expressions. Grammars and rules will take pattern matching to a whole new level, and tools will be available to convert code; in other words, rules and grammars will do everything that the old regular expressions did, and more.
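As a minimal sketch of the =~ operator, a modifier and a saved group as described above (the sample string and variable names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $line = "Order 42 shipped to Bath";

# =~ matches the variable on the left against the regex on the right.
my $has_number = ($line =~ /\d+/) ? 1 : 0;

# A modifier goes after the closing /; here i ignores case.
my $mentions_bath = ($line =~ /bath/i) ? 1 : 0;

# Parentheses save what they matched; in list context the match
# returns the saved groups.
my ($order) = $line =~ /Order (\d+)/;

print "has number: $has_number\n";
print "mentions bath: $mentions_bath\n";
print "order: $order\n";
```

The individual elements used here (\d, +, the i modifier and the grouping parentheses) are all listed in the tables that follow.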
Operator Type
  Examples            Description

Literal characters (match a character exactly)
  a A y 6 % @         Letters, digits and many special characters match exactly
  \$ \^ \+ \\ \?      Precede other special characters with a \ to cancel their regex special meaning
  \n \t \r            Literal new line, tab, return
  \cJ \cG             Control codes
  \xa3                Hex codes for any character

Anchors and assertions
  ^                   Starts with
  $                   Ends with
  \b \B               On a word boundary, NOT on a word boundary

Character groups (any 1 character from the group)
  [aAeEiou]           Any character listed from [ to ]
  [^aAeEiou]          Any character except a, A, e, E, i, o or u
  [a-fA-F0-9]         Any hex character (0 to 9 or a to f, in either case)
  .                   Any character at all (not new line in some circumstances)
  \s                  Any space character (space, \n, \r or \t)
  \w                  Any word character (letter, digit or _)
  \d                  Any digit (0 through 9)
  \S \W \D            Any character that is NOT a space, word character or digit

Counts (apply to previous element)
  +                   1 or more ("some")
  *                   0 or more ("perhaps some")
  ?                   0 or 1 ("perhaps a")
  {4}                 Exactly 4
  {4,}                4 or more
  {4,8}               Between 4 and 8
  Add a ? after any count to make it non-greedy (match as few as possible) rather than have it default to greedy.

Alternation
  |                   Either, or

Grouping
  ( )                 Group for count and save to variable
  (?: )               Group for count but do not save

Variables
  $xyz                Insert contents of $xyz into regular expression
  \1 \2               Back reference to 1st, 2nd etc. matched groups

After the closing / of your regular expression, you can add one or more modifiers to change its behaviour.
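Before moving on to the modifiers, here is a minimal sketch of several elements from the table above working together: greedy and non-greedy counts, a back reference, a character group with a count, and alternation with an inserted variable. The sample strings and variable names are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample strings, for illustration only.
my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ takes as much as it can, so the match runs to the last >.
my ($greedy) = $html =~ /<(.+)>/;

# Non-greedy: .+? stops at the first > it can.
my ($lazy) = $html =~ /<(.+?)>/;

# Back reference: \1 must repeat whatever group 1 matched.
my $text = "this is is a test";
my ($doubled) = $text =~ /\b(\w+) \1\b/;

# Character group plus count: exactly 6 hex characters,
# anchored at both ends.
my $colour = "ff00A3";
my $is_hex = ($colour =~ /^[a-fA-F0-9]{6}$/) ? 1 : 0;

# Alternation inside a non-saving group, with a variable inserted.
my $unit = "cm";
my $size = "12 cm";
my $has_unit = ($size =~ /^\d+\s(?:mm|$unit)$/) ? 1 : 0;

print "greedy:  $greedy\n";
print "lazy:    $lazy\n";
print "doubled: $doubled\n";
```

The greedy match captures everything up to the final >, while the non-greedy one captures only the b of the first tag. The modifiers themselves are listed next.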
Modifier
  Description

  i   Ignore case in matching
  g   Global match. Return a list of all matches (list context) or return the next match (scalar context)
  x   White space in the pattern is ignored and # starts a comment (otherwise white space matches exactly)
  s   . matches everything including new line (otherwise it matches everything except new line)
  m   ^ and $ also match at embedded new lines
  o   Tell the compiler that the regular expression doesn't change even if it includes a variable reference
  e   s operator only. Evaluate the replacement as Perl code, then substitute in the result

The following Perl functions and operators use regular expressions:
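A short sketch of these functions and operators in use, together with the g and e modifiers, may help before the table describes each in turn. The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $csv = "apple, Banana ,cherry,  date";

# split: divide a scalar into a list wherever the regex matches.
my @fruit = split /\s*,\s*/, $csv;

# grep: keep only the list members that match (i ignores case).
my @with_b = grep { /b/i } @fruit;

# s: replace the matched text; g repeats it for every match.
(my $dashed = $csv) =~ s/\s*,\s*/-/g;

# s with e: the replacement is evaluated as Perl code first.
my $prices = "tea 2 cake 3";
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

# g in list context returns all the matches at once.
my @numbers = $prices =~ /\d+/g;

# No operator at all: the regex is matched against $_.
my $count = 0;
for (@fruit) {
    $count++ if /an/;
}

print join("|", @fruit), "\n";
print "$dashed\n";
print "$doubled\n";
```

Copying into a new variable before applying s, as in (my $dashed = $csv) =~ s/.../.../, is just one way to leave the original string untouched.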
Function / Operator
  Use

  (none)   If you write a regular expression without an operator, it is matched against the contents of the $_ variable
  =~       Match the regular expression to the right against the variable to the left
  s        Substitute the text matched by the regular expression with a replacement string
  grep     Filter a list for all member scalars that match the regular expression
  split    Split a scalar into a list, dividing the elements at the regular expression

The above lists show the most commonly used elements of Perl regular expressions, and are not exhaustive.

In Perl, you can change the / regular expression delimiter to almost any other special character if you precede it with the letter m (for match); if you change to ( { or [, the balancing end character becomes ) } or ].
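Changing the delimiter is most useful when the pattern itself contains slashes. A minimal sketch, assuming a made-up file path; the last line goes slightly beyond the text above by showing that the s operator takes alternative delimiters in the same way:

```perl
use strict;
use warnings;

# Hypothetical path, for illustration only.
my $path = "/usr/local/bin/perl";

# With the default / delimiter every / in the pattern must be escaped.
my ($dir_a) = $path =~ /^\/usr\/(\w+)\//;

# With m and a different delimiter, the slashes need no escaping.
my ($dir_b) = $path =~ m#^/usr/(\w+)/#;

# Bracketing delimiters close with their balancing partner.
my ($dir_c) = $path =~ m{^/usr/(\w+)/};

# The s operator accepts alternative delimiters in the same way.
(my $moved = $path) =~ s{^/usr/local}{/opt};

print "$dir_a $dir_b $dir_c\n";
print "$moved\n";
```

All three matches capture the same directory name; only the readability of the pattern differs.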
