a second glance at regular expressions
Posted by
shashank ( shantaram )
I remember learning all about regular expressions, grammars and the like in college, but it was only last week, when I was looking for a quick method for email parsing and validation, that I realised their practical use.
Using a good regex engine and a well-written regular expression, one can perform all kinds of text-manipulation tasks. Regular expressions can be used to identify certain conditions or character sequences in a text file or data stream.
The most common places you'd find regular expressions are email address validation and search-and-replace functions. A search for "email validation using regular expressions" on Google would prove my point.
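To make the search-and-replace use concrete, here is a minimal sketch in Java. The class name and the two helper methods are made up for illustration; only String.replaceAll and the java.util.regex classes are standard.

```java
import java.util.regex.Pattern;

class SearchReplaceSketch {
    // Hypothetical helper: collapse every run of whitespace into a single space
    static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ");
    }

    // Hypothetical helper: replace every run of digits with a single '#'
    static String maskDigits(String text) {
        return Pattern.compile("[0-9]+").matcher(text).replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("a   second \t glance")); // a second glance
        System.out.println(maskDigits("order 66 shipped in 3 days"));   // order # shipped in # days
    }
}
```

String.replaceAll compiles the pattern on every call, so for a pattern that is reused a lot it is probably worth compiling it once with Pattern.compile, as the second helper does.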
So what do you need to start using regular expressions? Nothing you don't already have. Regular expressions are supported by most languages and tools in use.
I've used the java.util.regex API in Java, so here's a simple example in Java for email validation that should give you an idea of how regular expressions can be used.
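 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 // The class name and the sample address are placeholders for illustration
 public class EmailCheck {
  public static void main(String[] args){
   // Take the address from the command line, or fall back to a sample value
   String email=(args.length>0) ? args[0] : "user@example.com";
   email=email.trim();
   //  Email Address validation: letters, digits, @, letters, dot, letters
   Pattern p=Pattern.compile("[a-zA-Z]*[0-9]*@[a-zA-Z]*\\.[a-zA-Z]*");
   /* If you need a more detailed validation
   Pattern p=Pattern.compile("^[a-zA-Z][\\w\\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\\w\\.-]*[a-zA-Z0-9]\\.[a-zA-Z][a-zA-Z\\.]*[a-zA-Z]$");
   */
   Matcher m=p.matcher(email);
   // matches() succeeds only if the whole string fits the pattern
   boolean result=m.matches();
   if (result)
    System.out.println( email + "  is a VALID email address");
   else
    System.out.println( email + "  is an INVALID email address");
  }
 }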
What it means 
[a-zA-Z]*[0-9]*@[a-zA-Z]*\\.[a-zA-Z]*
[a-zA-Z]  --- any character from the union of a to z and A to Z
[a-zA-Z]*  --- the * means zero or more occurrences
similarly for [0-9]*
\\.  --- a literal dot ( \ is the regex escape character, doubled to \\ inside a Java string )
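Since every piece of the simple pattern is followed by *, each piece may match nothing at all, so the pattern is quite permissive. The sketch below runs a few sample addresses (invented for illustration) through both the simple pattern and the more detailed one; the class name and the accepts helper are hypothetical.

```java
import java.util.regex.Pattern;

class PatternComparison {
    // The simple pattern: letters, digits, @, letters, dot, letters
    static final Pattern SIMPLE =
        Pattern.compile("[a-zA-Z]*[0-9]*@[a-zA-Z]*\\.[a-zA-Z]*");

    // The more detailed pattern
    static final Pattern DETAILED = Pattern.compile(
        "^[a-zA-Z][\\w\\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\\w\\.-]*[a-zA-Z0-9]"
        + "\\.[a-zA-Z][a-zA-Z\\.]*[a-zA-Z]$");

    // Hypothetical helper: true if the whole trimmed string fits the pattern
    static boolean accepts(Pattern p, String email) {
        return p.matcher(email.trim()).matches();
    }

    public static void main(String[] args) {
        // Sample inputs made up for this sketch
        String[] samples = {"user42@mail.org", "@.", "first.last@example.com"};
        for (String s : samples) {
            System.out.println(s + "  simple=" + accepts(SIMPLE, s)
                + "  detailed=" + accepts(DETAILED, s));
        }
    }
}
```

The simple pattern accepts "@." and rejects "first.last@example.com", while the detailed pattern does the opposite on both, which is a reasonable argument for using the detailed one for anything beyond a demo.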
some more examples of character classes ( anything in [] )
[^x] - any character except x
[a-z&&[x-z]] - x, y, or z ie- intersection
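A quick way to try these character classes is Pattern.matches, which checks a whole string against a regex in one call. This is only a sketch; the class and helper names are invented for the example.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true if the entire input matches the regex
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Negation: any single character except x
        System.out.println(matchesWhole("[^x]", "a")); // true
        System.out.println(matchesWhole("[^x]", "x")); // false

        // Intersection: characters that are in both a-z and x-z
        System.out.println(matchesWhole("[a-z&&[x-z]]", "y")); // true
        System.out.println(matchesWhole("[a-z&&[x-z]]", "w")); // false
    }
}
```

Note that the intersection is written without spaces here; a space inside [] would normally be treated as one more character in the class.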
predefined character classes
\d   --- any digit
\w  --- a word character ( ie [a-zA-Z_0-9])
.  --- any character ( except line terminators, unless the DOTALL flag is set )
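The predefined classes are handy with Matcher.find(), which looks for a match anywhere in the text rather than requiring the whole string to fit. A small sketch follows, with a made-up helper and sample strings; remember that each backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Hypothetical helper: return the first substring matching the regex, or null
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \d+ : the first run of digits
        System.out.println(firstMatch("\\d+", "room 101, floor 3"));  // 101
        // \w+ : the first run of word characters (letters, digits, underscore)
        System.out.println(firstMatch("\\w+", "  hello_world42 !"));  // hello_world42
        // .   : skips the leading newline, since . does not match it by default
        System.out.println(firstMatch(".", "\nab"));                  // a
    }
}
```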
For a detailed tutorial, check out http://java.sun.com/docs/books/tutorial/essential/regex/index.html
Cons : Regular expressions are easier to write than they are to read, so use them only if there aren't too many people apart from you maintaining the code.