To solve problems at work, I frequently need to extract information from structured text. Often it’s in a report or log file generated by software I don’t control. Other times it’s inside source code written in one of the many languages I have to deal with (some of which are domain-specific languages without readily available grammar definitions). Sometimes I even need it to answer complex questions about the structure or behavior of code I maintain.
If I’m lucky, I can tackle this with some simple regular expression matching. That kind of text matching has been built into many languages (e.g., Perl, Python, Ruby) for a long time. (C++ has even joined the crowd, with `std::regex` arriving in C++11.) Unfortunately, I’m not always so lucky.
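For a sense of what the "lucky" case looks like, here is a minimal Python sketch using the standard `re` module. The log format, the `LOG_LINE` pattern, and the `extract_errors` helper are all hypothetical, invented for illustration; the point is only that when every record fits on one line with a predictable shape, a single pattern with named groups does the job.

```python
import re

# Hypothetical log format (an assumption for illustration only):
#   "2023-04-01 12:34:56 ERROR [worker-3] Disk quota exceeded"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \[(?P<source>[^\]]+)\] (?P<message>.*)$"
)

def extract_errors(lines):
    """Return (source, message) pairs for ERROR lines matching LOG_LINE.

    Lines that do not fit the expected shape are silently skipped.
    """
    results = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") == "ERROR":
            results.append((m.group("source"), m.group("message")))
    return results

sample = [
    "2023-04-01 12:34:56 ERROR [worker-3] Disk quota exceeded",
    "2023-04-01 12:35:02 INFO [worker-1] Job finished",
    "garbage line that does not match",
]

print(extract_errors(sample))  # [('worker-3', 'Disk quota exceeded')]
```

This approach tends to break down once the text has nested or multi-line structure: a regular expression cannot, on its own, match arbitrarily deep balanced constructs such as nested parentheses or blocks.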