Aug 10 2012
 

To solve problems in my work, I frequently need to extract information from structured text.  Often it’s in a report or log file generated by software I don’t control.  Other times it’s inside source code in one of the many languages I deal with (some of which are domain-specific languages without readily available grammar definitions).  Sometimes I even need it to answer complex questions about the structure or behavior of code I maintain.

If I’m lucky, I can tackle this with some simple regular expression matching.  That kind of text matching has been built into many languages (e.g. Perl, Python, Ruby) for a long time.  (C++ has even recently joined the crowd.)  Unfortunately, I’m not always so lucky.
