This post is about getting started with regular expressions and writing programs that use them, with Python as the programming language.
There are plenty of theoretical introductions explaining why we use regular expressions. Put simply, regexes are used to find and extract patterns in text, for example in search engines or web scraping.
Regular expressions: The following line of code shows the basic syntax for using a regex in Python.
match = re.search(pattern, text)
Before getting into the lines of code let's have a glance at the patterns.
Quick patterns:
Pattern      Meaning
[abc]        A single character of: a, b, or c
[^abc]       Any single character except: a, b, or c
[a-z]        Any single character in the range a-z
[a-zA-Z]     Any single character in the range a-z or A-Z
^            Start of line
$            End of line
\A           Start of string
\Z           End of string
.            Any single character (except a newline by default)
\s           Any whitespace character
\S           Any non-whitespace character
\d           Any digit
\D           Any non-digit
\w           Any word character (letter, number, underscore)
\W           Any non-word character
\b           A word boundary
(…)          Capture everything enclosed
(a|b)        a or b
a?           Zero or one of a
a*           Zero or more of a
a+           One or more of a
a{3}         Exactly 3 of a
a{3,}        3 or more of a
a{3,6}       Between 3 and 6 of a
a{,6}        Not more than 6 of a
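To see a few of these patterns in action, here is a small sketch using re.findall and re.search. The sample string and the variable names are made up for illustration; only the patterns come from the table above.

```python
import re

# Hypothetical sample text, chosen only to exercise the patterns.
sample = "Order 66 shipped to bay_7 on 2024-05-01"

# \d+ finds runs of digits anywhere in the string.
digits = re.findall(r"\d+", sample)

# [A-Z] matches a single uppercase letter.
capitals = re.findall(r"[A-Z]", sample)

# \b\w+\b picks out whole words (letters, digits, underscore).
words = re.findall(r"\b\w+\b", sample)

# ^ anchors the pattern at the start, $ at the end.
starts_with_order = re.search(r"^Order", sample) is not None
ends_with_digit = re.search(r"\d$", sample) is not None

# (a|b) is alternation: either branch may match.
verbs = re.findall(r"(shipped|returned)", sample)

print(digits, capitals, words, starts_with_order, ends_with_digit, verbs)
```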
For any pattern:
? - A regex followed by ? matches zero or one occurrence
* - A regex followed by * matches zero or more occurrences (useful for optional parts)
+ - A regex followed by + matches one or more occurrences
{x,y} - A regex with both bounds repeats between x and y times (no spaces inside the braces)
{x,} - A regex with only a lower bound repeats at least x times
{,y} - A regex with only an upper bound repeats at most y times
{x} - A regex repeated exactly x times
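The quantifiers above can be checked quickly with re.fullmatch, which only succeeds when the whole string fits the pattern. This is a minimal sketch; the helper name full is an invention for this example.

```python
import re

def full(pattern, text):
    """Return True when the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

checks = {
    "a? on ''": full(r"a?", ""),                # zero occurrences is fine
    "a? on 'aa'": full(r"a?", "aa"),            # two is too many
    "a* on ''": full(r"a*", ""),                # zero or more
    "a+ on ''": full(r"a+", ""),                # needs at least one
    "a{3} on 'aaa'": full(r"a{3}", "aaa"),      # exactly three
    "a{3,} on 'aaaaa'": full(r"a{3,}", "aaaaa"),        # three or more
    "a{3,6} on 7 a's": full(r"a{3,6}", "a" * 7),        # above the upper bound
    "a{,6} on 6 a's": full(r"a{,6}", "a" * 6),          # at most six
}

for label, result in checks.items():
    print(label, "->", result)
```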
In verbose mode (re.VERBOSE), an unescaped # starts a comment.
Using \# in a regexp matches the character # itself.
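Here is a short sketch of the difference between # and \#. It assumes the pattern is compiled with the re.VERBOSE flag, since that is the mode in which # acts as a comment marker; the sample text and variable names are made up.

```python
import re

# In verbose mode, everything after an unescaped # is ignored as a comment.
number = re.compile(r"""
    \d+      # one or more digits
    """, re.VERBOSE)

# \# matches a literal "#" character, even in verbose mode.
issue_ref = re.compile(r"""
    \#       # a literal hash sign
    \d+      # the issue number
    """, re.VERBOSE)

text = "Fixed in #42"
number_found = number.search(text).group()
issue_found = issue_ref.search(text).group()
print(number_found, issue_found)
```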
. : Matches any character except \n
\. : Matches only a literal dot
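The difference between . and \. is easy to see on a tiny input. This is only a sketch with a made-up string; the re.DOTALL flag in the last line is not mentioned above and is included just to show how the newline exception can be lifted.

```python
import re

# Hypothetical two-line text.
text = "v1.2\nv1x2"

# Unescaped dot: any character except a newline, so "1.2" and "1x2" both match.
any_char = re.findall(r"1.2", text)

# Escaped dot: only a literal "." matches.
literal_dot = re.findall(r"1\.2", text)

# The dot does not match the newline between the two lines...
no_newline = re.search(r"2.v", text)
# ...unless the re.DOTALL flag is passed.
with_dotall = re.search(r"2.v", text, re.DOTALL)

print(any_char, literal_dot, no_newline, with_dotall is not None)
```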
Special case in escape characters:
In the regexp [-a-zA-Z], the hyphen (-) comes first inside the brackets, so it is treated as a literal character to look for rather than as a range operator.
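A quick sketch of the leading hyphen, using an invented sentence. The second pattern escapes the hyphen instead of placing it first, which should behave the same way.

```python
import re

# Hypothetical sample sentence.
sentence = "Jean-Luc met Anne-Marie in 2024"

# Hyphen first in the class: a literal "-" plus the ranges a-z and A-Z.
leading_hyphen = re.findall(r"[-a-zA-Z]+", sentence)

# Same idea with the hyphen escaped instead of placed first.
escaped_hyphen = re.findall(r"[a-zA-Z\-]+", sentence)

print(leading_hyphen)
print(escaped_hyphen)
```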
With Python:
r with regular expressions in Python:
Writing the pattern as r'regexp' (a raw string) turns off Python's own backslash processing inside the expression, so the backslashes reach the regex engine unchanged.
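To show why the r prefix matters, here is a small sketch with \b. In a normal Python string "\b" is the backspace character, while in a raw string it stays as a backslash followed by b, which the regex engine reads as a word boundary. The sample text is made up.

```python
import re

# Normal string literal: "\b" becomes the backspace character (ASCII 8).
plain = "\bcat\b"
# Raw string literal: the backslashes are kept, so the regex sees \b (word boundary).
raw = r"\bcat\b"

text = "a cat sat"
plain_match = re.search(plain, text)   # looks for backspace + "cat" + backspace
raw_match = re.search(raw, text)       # looks for the whole word "cat"

print(len(plain), len(raw))
print(plain_match, raw_match)
```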
re library:
To work with regular expressions in Python, we need to import the 're' library into our code, as below:
import re
If a regular expression looks like this:
pattern = r'^[A-Z][a-z]{2}\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'
it might be difficult to revisit the code and debug it. Python avoids this by allowing multiline expressions in VERBOSE mode.
pattern = r'''
^
[A-Z][a-z]{2}
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
$
'''
exp = re.compile(pattern, re.VERBOSE)
or
exp = re.compile(pattern, re.X)
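As a sanity check, the one-line and the verbose forms of the pattern above should accept and reject the same strings. The sketch below compares them on a few sample inputs; the inputs and the comments inside the verbose pattern are my own additions.

```python
import re

compact = re.compile(r"^[A-Z][a-z]{2}\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$")

verbose = re.compile(r"""
    ^
    [A-Z][a-z]{2}                        # one capital letter plus two lowercase letters
    \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}   # four dot-separated groups of 1 to 3 digits
    $
    """, re.VERBOSE)

# Hypothetical inputs: one valid, one with a lowercase first letter, one with a group missing.
samples = ["Srv192.168.0.1", "srv192.168.0.1", "Srv192.168.0"]
results = [(bool(compact.search(s)), bool(verbose.search(s))) for s in samples]

for sample, (c, v) in zip(samples, results):
    print(sample, c, v)
```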
Simple example code for regular expressions:
import re

address_pattern = r'''
^
(?P<Address1>P\.*O\.*\s*(BOX|Box|box)\s\d{1,5})   # address1 field
(?P<City>\s*\w*\W*\s*\w*\W*\s*\w*\W*\s*)          # text between address and zip code
'''
address_reg_exp = re.compile(address_pattern, re.VERBOSE)
text = "PO Box 1055 Jefferson City, MO 65102"
match = address_reg_exp.search(text)
if match:  # search() returns None when nothing matches
    g = match.groups()  # all captured groups as a tuple
    print(match.group('Address1'))
    print(match.group('City'))
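Building on the example above, one possible way to package it is a small function that returns the named groups as a dictionary and copes with text that does not match. This is only a sketch: the function name parse_address and the whitespace stripping are my own assumptions, not part of the original example.

```python
import re

address_reg_exp = re.compile(r"""
    ^
    (?P<Address1>P\.*O\.*\s*(BOX|Box|box)\s\d{1,5})   # address1 field
    (?P<City>\s*\w*\W*\s*\w*\W*\s*\w*\W*\s*)          # text between address and zip code
    """, re.VERBOSE)

def parse_address(text):
    """Return the named groups as a dict, or None when the text does not match.

    Hypothetical helper; surrounding whitespace is stripped from each value.
    """
    match = address_reg_exp.search(text)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}

parsed = parse_address("PO Box 1055 Jefferson City, MO 65102")
missing = parse_address("12 Main Street")
print(parsed)
print(missing)
```

Returning None for a failed match keeps the caller from hitting an AttributeError on match.group().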