Smart Tech: Regular expressions


Friday, March 8, 2019

Using regular expressions in Sublime Text editor

Sublime Text provides an easy way to replace text using regular expressions.

Let's take a simple regex example and replace the matched text with the value we want.

Sample text:

"CreateTime" : ISODate("2018-12-21T15:10:30.145Z")

The ISODate(...) wrapper has to be replaced by the quoted value it contains, leaving "CreateTime" : "2018-12-21T15:10:30.145Z".

We can do this in any programming language by capturing the quoted value in a group and replacing the whole match with that group.
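For comparison, here is a minimal Python sketch of that grouping approach. It assumes the goal is to keep the quotes around the timestamp; the pattern is one possible choice, not the only one.

```python
import re

# Sample line from the post
line = '"CreateTime" : ISODate("2018-12-21T15:10:30.145Z")'

# Group 1 captures the quoted timestamp (quotes included);
# the surrounding ISODate( ... ) wrapper is matched but not kept
pattern = r'ISODate\(("[^"]*")\)'
result = re.sub(pattern, r'\1', line)
print(result)  # "CreateTime" : "2018-12-21T15:10:30.145Z"
```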

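The same can be done in Sublime Text as well:

Open the Replace panel with Find > Replace...

Turn on regular expression mode (the .* button in the panel).

In Find, enter ISODate\(("[^"]*")\) and in Replace, enter $1.

Click Replace All to do the job.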

Monday, November 21, 2016

Remove line numbers from source code using Notepad++

Sample code with line numbers:

01<!DOCTYPE html>
02<html>
03<head>
04 <meta http-equiv="X-UA-Compatible" content="IE=edge">
05 <meta charset="utf-8">
06 <title>Hello App!</title>
07 <script>

To remove the number at the start of each line, do the following:

Copy the code into Notepad++.
Press Ctrl+H.
Under Search Mode, choose Regular expression.
In Find what, enter ^\d+ (one or more digits at the start of a line).
Leave Replace with blank.
Click Replace All.

Now the code looks like:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta charset="utf-8">
<title>Hello App!</title>
<script>
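The same ^\d+ clean-up can be sketched in Python, assuming the numbered code is already in a string (the sample lines come from the post). In Python, ^ matches only at the very start of the string unless re.MULTILINE is set, so the flag is what makes this work line by line.

```python
import re

# A few of the numbered sample lines from the post
lines = ['01<!DOCTYPE html>', '02<html>', '03<head>', '06 <title>Hello App!</title>']
code = '\n'.join(lines)

# With re.MULTILINE, ^ anchors at the start of every line
cleaned = re.sub(r'^\d+', '', code, flags=re.MULTILINE)

# Without the flag, only the number on the first line is removed
first_only = re.sub(r'^\d+', '', code)

print(cleaned)
```

Note that ^\d+ keeps the space that followed the number on line 06; a pattern such as ^\d+ ? would remove one such space as well.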

Wednesday, June 17, 2015

Remove punctuation from text - Python

import re
import string

def removePunctuation(text):
    # Convert to lowercase
    text1 = text.lower()

    # Replace each punctuation character with a space.
    # string.punctuation is !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    # re.escape() makes it safe to place inside a [...] character class.
    text2 = re.sub('[%s]' % re.escape(string.punctuation), ' ', text1)

    # Remove leading and trailing spaces
    text3 = text2.strip()
    return text3

print(removePunctuation("'The best investments today,'"))
print(removePunctuation("Misery in Paradise;"))

Output:

the best investments today
misery in paradise
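One side effect of replacing each punctuation mark with a space is that punctuation in the middle of a sentence leaves a double space: "Hello, world!" becomes "hello  world". The variant below is a hypothetical extension (the name remove_punctuation_collapsed is illustrative, not from the original post) that also collapses whitespace runs with \s+.

```python
import re
import string

# Hypothetical variant: replace each run of punctuation with a space,
# then collapse any run of whitespace into a single space
def remove_punctuation_collapsed(text):
    text = re.sub('[%s]+' % re.escape(string.punctuation), ' ', text.lower())
    return re.sub(r'\s+', ' ', text).strip()

print(remove_punctuation_collapsed("Hello, world!"))  # hello world
```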

Tuesday, October 7, 2014

Regular expressions with Python

This post covers writing programs with regular expressions, using Python as the programming language.

There are many theoretical introductions to why we use regular expressions. Put simply, regexes are used to search for and extract data from text, for example in search engines or in web scraping.

Regular expressions: the basic syntax of a regex search in Python is

match = re.search(pattern, text)

or, with a pattern compiled once for reuse:

regexp = re.compile(pattern)
match = regexp.search(text)
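A small runnable version of the search syntax follows; the date pattern and sample strings are illustrative only. re.search scans the whole string and returns a match object for the first match, or None when nothing matches.

```python
import re

text = "Posted on 2014-10-07"

# Module-level function: pattern and text passed directly
match = re.search(r'\d{4}-\d{2}-\d{2}', text)
found = match.group() if match else None
print(found)  # 2014-10-07

# Compiled pattern: handy when the same regex is used many times
date_re = re.compile(r'\d{4}-\d{2}-\d{2}')
missing = date_re.search("no date here")
print(missing)  # None
```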

Before getting into the code, let's take a quick look at the patterns.

Quick patterns:

[abc]      A single character of: a, b, or c
[^abc]     Any single character except: a, b, or c
[a-z]      Any single character in the range a-z
[a-zA-Z]   Any single character in the range a-z or A-Z
^          Start of line (with re.MULTILINE; otherwise start of string)
$          End of line (with re.MULTILINE; otherwise end of string)
\A         Start of string
\Z         End of string (written \z in some other regex flavors)
.          Any single character except newline
\s         Any whitespace character
\S         Any non-whitespace character
\d         Any digit
\D         Any non-digit
\w         Any word character (letter, number, underscore)
\W         Any non-word character
\b         Any word boundary

(...)      Capture everything enclosed
(a|b)      a or b
a?         Zero or one of a
a*         Zero or more of a
a+         One or more of a
a{3}       Exactly 3 of a
a{3,}      3 or more of a
a{3,6}     Between 3 and 6 of a
a{,6}      Not more than 6 of a
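A quick way to try a few of the quick patterns is re.findall, which returns every non-overlapping match (or the captured group when the pattern has one group), and re.fullmatch, which requires the whole string to match. The sample strings are made up for illustration.

```python
import re

classes = re.findall(r'[a-c]', 'abcdef')                  # ['a', 'b', 'c']
at_least_3 = re.findall(r'\d{3,}', 'a1 b22 c333 d4444')   # ['333', '4444']
whole_word = re.findall(r'\bcat\b', 'cat concat cat.')    # ['cat', 'cat']
either = re.findall(r'(cat|dog)', 'cat and dog')          # ['cat', 'dog']
upto_6 = re.fullmatch(r'a{,6}', 'aaaa') is not None       # True

# ^ matches at the start of every line only with re.MULTILINE
starts = re.findall(r'^\w+', 'one two\nthree four', flags=re.MULTILINE)  # ['one', 'three']

print(classes, at_least_3, whole_word, either, upto_6, starts)
```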

For any pattern:

?        A regex followed by ? matches zero or one occurrence (used to make it optional)
*        A regex followed by * matches zero or more occurrences
+        A regex followed by + matches one or more occurrences
{x,y}    A regex followed by {x,y} repeats between x and y times
{x,}     A regex with only a lower bound repeats x or more times
{,y}     A regex with only an upper bound repeats at most y times
{x}      A regex repeats exactly x times

In VERBOSE mode (see below), # starts a comment; \# matches the literal character #.
.        Matches any character except \n
\.       Matches only a literal dot

Special case for the hyphen: in the character class [-a-zA-Z], a hyphen placed first is treated as a literal character, so the class matches a hyphen or any ASCII letter.

With Python:

Raw strings: writing a pattern as r'regexp' turns off Python's own backslash processing, so sequences such as \d and \b reach the regex engine unchanged.

The re library: to work with regular expressions in Python, import the re module:

import re

A regular expression like the following is hard to revisit and debug:

pattern = r'^[A-Z][a-z]{2}\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'

Python avoids this with VERBOSE mode, which ignores whitespace in the pattern (outside character classes) so it can be split across lines:

pattern = r'''
    ^
    [A-Z][a-z]{2}
    \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
    $
'''
exp = re.compile(pattern, re.VERBOSE)   # or, equivalently: re.compile(pattern, re.X)

Simple example code for regular expressions (named groups in VERBOSE mode):

import re

address_pattern = r'''
    ^
    (?P<Address1>P\.?O\.?\s(?:BOX|Box|box)\s\d{1,5})   # address1 field; dots optional, so "P.O. Box" and "PO Box" both match
    \s+
    (?P<City>[A-Za-z\s]+?),\s*                         # city name, up to the comma
    (?P<State>[A-Z]{2})\s+                             # two-letter state code
    (?P<Zip>\d{5})                                     # five-digit zip code
    $
'''
address_reg_exp = re.compile(address_pattern, re.VERBOSE)

text = "PO Box 1055 Jefferson City, MO 65102"
match = address_reg_exp.search(text)
if match:  # search() returns None when there is no match
    print(match.group('Address1'))  # PO Box 1055
    print(match.group('City'))      # Jefferson City
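The two Python points above, raw strings and VERBOSE mode, can be checked directly. This sketch shows that without the r prefix '\b' becomes a single backspace character, and that the compact and VERBOSE forms of the pattern from the post accept and reject the same inputs. The sample strings are hypothetical, chosen only to exercise the pattern.

```python
import re

# Without r, Python turns '\b' into a backspace before re ever sees it
print(len('\b'), len(r'\b'))  # 1 2

compact = r'^[A-Z][a-z]{2}\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'
verbose = r'''
    ^
    [A-Z][a-z]{2}                        # a capital letter, then two lowercase letters
    \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}   # four dot-separated groups of 1 to 3 digits
    $
'''

# Hypothetical inputs: one valid, one with a lowercase first letter, one too short
samples = ['Abc192.168.0.1', 'abc192.168.0.1', 'Abc1.2.3']
results = {}
for s in samples:
    results[s] = (re.search(compact, s) is not None,
                  re.search(verbose, s, re.VERBOSE) is not None)
    print(s, results[s])
```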