a second glance at regular expressions
Posted by shashank (shantaram)
I remember learning all about regular expressions, grammars and the like in college, but it was only last week, when I was looking for a quick method for email parsing and validation, that I realised their practical use.
Using a good regex engine and a well-written regular expression, one can perform all kinds of text-manipulation tasks. Regular expressions can be used to identify certain conditions or character sequences in a text file or data stream.
The most common places you'd find regular expressions are email address validation and search-and-replace functions. A search for "email validation using regular expressions" on Google would prove my point.
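To give a feel for the search-and-replace side, here is a minimal sketch in Java. The class name SearchReplaceDemo, the method maskDigits and the sample text are made up for illustration; String.replaceAll is the standard library call that takes a regular expression. The pattern \d means "any digit".

```java
class SearchReplaceDemo {
    // Replace every digit in the input with '#'.
    // "\\d" is the Java string form of the regex \d (any digit).
    static String maskDigits(String text) {
        return text.replaceAll("\\d", "#");
    }

    public static void main(String[] args) {
        // Hypothetical sample text, only for illustration
        System.out.println(maskDigits("Order 12345 shipped")); // prints: Order ##### shipped
    }
}
```

Only the characters that match the pattern are touched; everything else is left alone.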
So what do you need to start using regular expressions? Nothing you don't already have. Regular expressions are supported by most languages and tools in use.
I've used the java.util.regex API in Java, so here's a simple example in Java for email validation that should give you an idea of how regular expressions can be used.
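import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailValidation {
    public static void main(String[] args) {
        // Take the address from the command line, or fall back to a sample value
        String email = (args.length > 0 ? args[0] : "someone@example.com").trim();
        // Email address validation (deliberately simple)
        Pattern p = Pattern.compile("[a-zA-Z]*[0-9]*@[a-zA-Z]*\\.[a-zA-Z]*");
        /* If you need a more detailed validation
        Pattern p = Pattern.compile("^[a-zA-Z][\\w\\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\\w\\.-]*[a-zA-Z0-9]\\.[a-zA-Z][a-zA-Z\\.]*[a-zA-Z]$");
        */
        Matcher m = p.matcher(email);
        // matches() succeeds only if the whole string fits the pattern
        boolean result = m.matches();
        if (result)
            System.out.println(email + " is a VALID email address");
        else
            System.out.println(email + " is an INVALID email address");
    }
}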
What it means
[a-zA-Z]*[0-9]*@[a-zA-Z]*\\.[a-zA-Z]*
[a-zA-Z] --- any character from the union of a to z and A to Z
[a-zA-Z]* --- the * means zero or more occurrences
similarly for [0-9]*
\\. --- a literal dot ( \ is the regex escape character; it is written twice because the pattern sits inside a Java string literal )
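Since * means "zero or more", every part of the simple pattern is optional except the @ and the dot. A quick way to see what that implies is to run a few strings through it, next to the more detailed pattern from the commented-out line. This is only a sketch: the class name EmailPatternDemo, the helper names and the sample addresses are my own, and the detailed pattern is used with the stray space in "a-zA- Z" removed.

```java
import java.util.regex.Pattern;

class EmailPatternDemo {
    // The simple pattern explained above.
    static final Pattern SIMPLE =
        Pattern.compile("[a-zA-Z]*[0-9]*@[a-zA-Z]*\\.[a-zA-Z]*");

    // The more detailed pattern from the commented-out line of the example.
    static final Pattern DETAILED = Pattern.compile(
        "^[a-zA-Z][\\w\\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\\w\\.-]*[a-zA-Z0-9]\\.[a-zA-Z][a-zA-Z\\.]*[a-zA-Z]$");

    // matches() requires the whole string to fit the pattern.
    static boolean simple(String s) {
        return SIMPLE.matcher(s).matches();
    }

    static boolean detailed(String s) {
        return DETAILED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs, only for illustration
        String[] samples = {"john123@example.com", "john.doe@mail.example.com", "@."};
        for (String s : samples) {
            System.out.println(s + " simple=" + simple(s) + " detailed=" + detailed(s));
        }
        // john123@example.com       simple=true  detailed=true
        // john.doe@mail.example.com simple=false detailed=true
        // @.                        simple=true  detailed=false
    }
}
```

The simple pattern accepts "@." because every starred part may match nothing, and it rejects dots in the name or more than one dot in the domain. The detailed pattern handles both cases, at the cost of being much harder to read.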
some more examples of character classes ( anything in [] )
[^x] --- any character except x
[a-z&&[x-z]] --- x, y or z, i.e. intersection
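Negation and intersection are easy to try out with Pattern.matches, which compiles a pattern and tests a whole string in one call. A small sketch (the class name CharacterClassDemo and the helper are made up for illustration; in Java the intersection is written without spaces, as [a-z&&[x-z]]):

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Negation: anything except x
        System.out.println(matches("[^x]", "y"));          // true
        System.out.println(matches("[^x]", "x"));          // false

        // Intersection: letters that are in both a-z and x-z
        System.out.println(matches("[a-z&&[x-z]]", "y"));  // true
        System.out.println(matches("[a-z&&[x-z]]", "a"));  // false
    }
}
```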
predefined character classes
\d --- any digit
\w --- a word character ( i.e. [a-zA-Z_0-9] )
. --- any character ( line terminators excluded by default )
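The predefined classes can be checked the same way. Again just a sketch with made-up names (PredefinedClassDemo, matches); remember that each backslash is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d", "7"));   // true: 7 is a digit
        System.out.println(matches("\\d", "a"));   // false: a is not a digit
        System.out.println(matches("\\w", "_"));   // true: underscore is a word character
        System.out.println(matches("\\w", "-"));   // false: hyphen is not a word character
        System.out.println(matches(".", "#"));     // true: the dot matches any character
        System.out.println(matches(".", "\n"));    // false: by default the dot skips line terminators
    }
}
```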
For a detailed tutorial, check out http://java.sun.com/docs/books/tutorial/essential/regex/index.html
Cons: Regular expressions are easier to write than they are to read, so use them only if there aren't too many people apart from you maintaining the code.
Friday, August 01, 2008