a second glance at regular expressions

I remember learning all about regular expressions, grammars and the like in college, but it was only last week, when I was looking for a quick method for email parsing and validation, that I realised their practical use.


Using a good regex engine and a well-written regular expression, one can perform all kinds of text-manipulation tasks. Regular expressions can be used to identify certain conditions or character sequences in a text file or data stream.
The most common places you'd find regular expressions are email address validation and search-and-replace functions. A search for "email validation using regular expressions" on Google would prove my point.
So what do you need to start using regular expressions? Nothing you don't already have. Regular expressions are supported by most languages and tools in use.
I've used the java.util.regex API in Java.
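
Since search and replace came up, here is a minimal sketch of it with java.util.regex. The class name, the sample sentence and the "#" replacement are all made up for illustration; only Pattern, Matcher and replaceAll come from the API.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
public class SearchReplaceDemo {
    // One or more digits in a row.
    static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Replaces every run of digits in the text with a single '#'.
    static String maskDigits(String text) {
        return DIGITS.matcher(text).replaceAll("#");
    }

    public static void main(String[] args) {
        // prints: order # shipped on day #
        System.out.println(maskDigits("order 42 shipped on day 7"));
    }
}
```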

So here's a simple example in Java for email validation that should give you an idea of how regular expressions can be used.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is only a wrapper so the example compiles on its own.
public class EmailValidation {

    public static void main(String[] args) {
        String email = "svwaingankar@gmail.com";
        email = email.trim();

        // Email address validation: letters, digits, '@', letters, '.', letters
        Pattern p = Pattern.compile("[a-zA-Z]*[0-9]*@[a-zA-Z]*\\.[a-zA-Z]*");

        /* If you need a more detailed validation
        Pattern p = Pattern.compile("^[a-zA-Z][\\w\\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\\w\\.-]*[a-zA-Z0-9]\\.[a-zA-Z][a-zA-Z\\.]*[a-zA-Z]$");
        */
        Matcher m = p.matcher(email);

        // matches() is true only if the whole string fits the pattern
        boolean result = m.matches();

        if (result)
            System.out.println(email + " is a VALID email address");
        else
            System.out.println(email + " is an INVALID email address");
    }
}

What it means
[a-zA-Z]*[0-9]*@[a-zA-Z]*\\.[a-zA-Z]*

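[a-zA-Z] --- any character from the union of a to z and A to Z
[a-zA-Z]* --- the * means zero or more occurrences
similarly for [0-9]*
\\. --- a literal dot ( \ is the regex escape character; it is written twice because \ must itself be escaped inside a Java string literal )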
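
One thing worth noticing: since every part of the simple pattern uses *, each part may match nothing at all, so even the string "@." passes, while an address with a dot before the @ fails. A quick sketch to see this for yourself (the class name, the helper and the sample addresses other than the first are made up for illustration):

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
public class SimplePatternCheck {
    // The simple pattern from the example, compiled once and reused.
    static final Pattern SIMPLE =
        Pattern.compile("[a-zA-Z]*[0-9]*@[a-zA-Z]*\\.[a-zA-Z]*");

    // Returns true only if the whole input matches the pattern.
    static boolean matchesSimple(String input) {
        return SIMPLE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "svwaingankar@gmail.com", // letters @ letters . letters -> true
            "user123@example.com",    // letters followed by digits  -> true
            "@.",                     // every * matches nothing     -> true
            "john.doe@example.com"    // dot before the @            -> false
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + matchesSimple(s));
        }
    }
}
```

So the simple pattern is fine for getting the idea, but for anything real you'd want something closer to the more detailed pattern.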

some more examples of character classes ( anything in [] )
[^x] - any character except x
[a-z&&[x-z]] - x, y, or z, i.e. intersection
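
A small sketch to try negation and intersection in Java. The class name and the helper are made up for illustration. The intersection is written without spaces because, unless the COMMENTS flag is set, a space inside a pattern counts as a literal character.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
public class CharClassDemo {
    // Pattern.matches compiles the regex and tests the whole input in one call.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[^x]", "a"));         // true: a is not x
        System.out.println(matches("[^x]", "x"));         // false: x is excluded
        System.out.println(matches("[a-z&&[x-z]]", "y")); // true: y is in both ranges
        System.out.println(matches("[a-z&&[x-z]]", "w")); // false: w is not in x-z
    }
}
```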

predefined character classes
\d --- any digit
\w --- a word character ( i.e. [a-zA-Z_0-9] )
. --- any character ( except line terminators, unless DOTALL mode is set )
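
The predefined classes can be tried the same way. This is only a sketch, with a made-up class name and helper. Note that in Java source \d and \w have to be written as "\\d" and "\\w".

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
public class PredefinedClassDemo {
    // Tests whether the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d", "7"));  // true: a digit
        System.out.println(matches("\\d", "a"));  // false: not a digit
        System.out.println(matches("\\w", "_"));  // true: underscore is a word character
        System.out.println(matches("\\w", "-"));  // false: hyphen is not
        System.out.println(matches(".", "#"));    // true: any character
        System.out.println(matches(".", "\n"));   // false: newline is excluded by default
    }
}
```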

Cons: Regular expressions are easier to write than they are to read, so use them only if there aren't too many people apart from you maintaining the code.
