For any pattern:
? - A regex followed by ? matches zero or one occurrence
* - A regex followed by * matches zero or more occurrences (useful for optional, repeatable parts)
+ - A regex followed by + matches one or more occurrences
{x,y} - A regex followed by {x,y} repeats between x and y times, inclusive
{x,} - A regex with only a lower bound repeats at least x times (>= x)
{,y} - A regex with only an upper bound repeats at most y times (<= y)
{x} - A regex followed by {x} repeats exactly x times
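The quantifiers listed above can be tried out with a short sketch like the one below. The helper name whole_match and all sample patterns and strings are made up for illustration; they are not part of the original notes.

```python
import re

def whole_match(pattern, text):
    """Return True if the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# Each entry: (pattern, text, expected result). All samples are illustrative.
samples = [
    (r"colou?r", "color", True),     # ?     : zero or one "u"
    (r"colou?r", "colouur", False),  # ?     : two "u" is too many
    (r"ab*", "a", True),             # *     : zero "b" is allowed
    (r"ab+", "a", False),            # +     : needs at least one "b"
    (r"\d{2,4}", "123", True),       # {x,y} : between 2 and 4 digits
    (r"\d{2,}", "1", False),         # {x,}  : at least 2 digits
    (r"\d{,3}", "1234", False),      # {,y}  : at most 3 digits
    (r"\d{3}", "123", True),         # {x}   : exactly 3 digits
]

for pattern, text, expected in samples:
    print(pattern, repr(text), whole_match(pattern, text))
```

Using re.fullmatch here keeps each check strict: the quantifier has to account for the whole string, so the upper and lower bounds are easy to see.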
In VERBOSE mode, an unescaped # starts a comment that runs to the end of the line.
Using \# in a regexp matches the literal character #.
. : Matches any character except a newline (\n)
\. : Matches only a literal dot
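A small sketch of the difference between ., \. and \# follows. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

# "." matches any single character except a newline.
dot_hits = re.findall(r"a.c", "abc a-c a\nc")
print(dot_hits)      # ['abc', 'a-c']

# "\." matches only a literal dot.
literal_hits = re.findall(r"a\.c", "abc a.c")
print(literal_hits)  # ['a.c']

# "\#" matches the literal character "#".
hash_hits = re.findall(r"\#\d+", "issue #42")
print(hash_hits)     # ['#42']
```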
Special case in escape character:
For the regexp [-a-zA-Z]: a hyphen (-) placed first inside the brackets is treated as a literal character and is matched as such, rather than marking a range.
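As a quick check of this special case, the sketch below applies [-a-zA-Z] to a made-up sample string; the variable names and the text are assumptions for illustration only.

```python
import re

# Hyphen first in the class: a literal "-", plus the ranges a-z and A-Z.
name_chars = re.compile(r"[-a-zA-Z]+")

words = name_chars.findall("Jean-Luc Picard 1701")
print(words)  # ['Jean-Luc', 'Picard']
```

The hyphenated name stays in one piece because - is part of the class, while the space and the digits are not.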
With Python:
r with regular expressions in Python:
Prefixing the string with r, as in r'regexp', makes it a raw string, so Python does not process the backslashes in the expression before the regex engine sees them.
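The effect of the r prefix can be seen in a minimal sketch like this one; the pattern and the sample text are illustrative assumptions.

```python
import re

plain = "\\d+\\.\\d+"  # normal string: every backslash has to be doubled
raw = r"\d+\.\d+"      # raw string: backslashes are kept as typed

print(plain == raw)           # True, both hold the same regex text
print(len("\n"), len(r"\n"))  # 1 2, raw keeps the backslash and the "n"

numbers = re.findall(raw, "pi is 3.14")
print(numbers)                # ['3.14']
```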
re library:
To work with regular expressions in Python, import the re library into the code as below:
import re
If a regular expression looks like this:
pattern = r'^[A-Z][a-z]{2}\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'
it can be difficult to revisit and debug. Python eases this with VERBOSE mode, which lets the expression be written across multiple lines: whitespace inside the pattern is ignored and # starts a comment.
pattern = r'''
^
[A-Z][a-z]{2}
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
$
'''
exp = re.compile(pattern, re.VERBOSE)
or
exp = re.compile(pattern, re.X)
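To check that the single-line and VERBOSE forms of this expression behave the same, a sketch along these lines can be used. The names compact, verbose and results, and the sample strings, are assumptions made for illustration.

```python
import re

# Single-line form of the expression.
compact = re.compile(r"^[A-Z][a-z]{2}\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$")

# Same expression in VERBOSE mode, with whitespace and comments ignored.
verbose = re.compile(r"""
    ^
    [A-Z][a-z]{2}                        # one capital, then two lowercase letters
    \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}   # four dot-separated groups of 1-3 digits
    $
""", re.VERBOSE)

# Hypothetical inputs: one valid, one with a lowercase start, one too short.
results = {}
for sample in ["Abc192.168.1.10", "abc192.168.1.10", "Abc192.168.1"]:
    results[sample] = (bool(compact.search(sample)), bool(verbose.search(sample)))
    print(sample, results[sample])
```

Both compiled objects should agree on every input, since VERBOSE only changes how the pattern is written, not what it matches.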
Simple example code for regular expressions:
import re

# Named groups let the matched parts be read back by name.
address_pattern = r'''
^
(?P<Address1>P\.*O\.*\s*(BOX|Box|box)\s\d{1,5})   # address1 field
(?P<City>\s*\w*\W*\s*\w*\W*\s*\w*\W*\s*)          # text between address and zip code
'''
address_reg_exp = re.compile(address_pattern, re.VERBOSE)
text = "PO Box 1055 Jefferson City, MO 65102"
match = address_reg_exp.search(text)
# search() returns None when nothing matches, so check before reading groups.
if match:
    g = match.groups()
    print(match.group('Address1'))
    print(match.group('City'))