At some point flex added start conditions. I'm curious to know whether this feature extends flex's theoretical ability to match tokens or if it's just a pragmatic solution that tends to make the set of rules (patterns and actions) shorter and easier to read.

There's some ambiguity here because it seems to me that start conditions could be simulated at the C level by clever use of flag variables; if true, then flex without start conditions is equal in power to flex with start conditions. Let's pretend we cannot extend our scanner this way, and all the scanner can do is match tokens via the patterns and echo back the names of the tokens. In this case, can the flex scanner WITH start conditions tokenize more languages than the scanner WITHOUT start conditions? Or can I always write a set of rules without start conditions that does the same thing as a set of rules with start conditions?

Difficult question to word clearly but I hope I made it precise and clear enough.



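Start conditions effectively allow you to run multiple DFAs within the same scanner, with the active start condition selecting which rules apply. Another way to look at it is that they add a form of context sensitivity: the same input text can be tokenized differently depending on the start condition in effect.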

It's done in Cobol, for example, where the lexical rules for PICTURE strings are completely different from those of the rest of the language, so you have a separate start condition whose rules apply only while a PICTURE string is being scanned.
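
A minimal sketch of that arrangement, written in Python rather than flex so it can be run directly. Everything here is an assumption made for illustration: the token names, the two rule tables, and the heavily simplified Cobol patterns (uppercase only, no continuation lines). Like flex, it picks the longest match among the rules of the current start condition and breaks ties in favour of the earliest rule. The regex lookahead is a Python convenience; flex's nearest analogue would be trailing context.

```python
import re

# Hypothetical rule tables, one per start condition. Each rule is
# (pattern, token name or None to skip, start condition to enter or None).
RULES = {
    "INITIAL": [
        (re.compile(r"\s+"), None, None),
        (re.compile(r"PIC(?:TURE)?"), "PICTURE_KW", "PICTURE"),
        (re.compile(r"[A-Za-z][A-Za-z0-9-]*"), "WORD", None),
        (re.compile(r"[0-9]+"), "NUMBER", None),
        (re.compile(r"\."), "PERIOD", None),
    ],
    "PICTURE": [
        (re.compile(r"\s+"), None, None),
        (re.compile(r"IS"), "IS", None),
        # A run of non-blank characters, leaving a terminating period
        # (one followed by whitespace or end of input) for INITIAL.
        (re.compile(r"\S+?(?=\.?(?:\s|$))"), "PICTURE_STRING", "INITIAL"),
    ],
}


def scan(text):
    """Return (token, lexeme) pairs for text, starting in INITIAL."""
    state, pos, tokens = "INITIAL", 0, []
    while pos < len(text):
        best = None
        for pattern, token, next_state in RULES[state]:
            m = pattern.match(text, pos)
            # Strictly longer matches win, so ties go to the earliest rule.
            if m and m.end() > pos and (best is None or m.end() > best[0].end()):
                best = (m, token, next_state)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos} in {state}")
        m, token, next_state = best
        if token is not None:
            tokens.append((token, m.group()))
        if next_state is not None:
            state = next_state  # plays the role of BEGIN(next_state)
        pos = m.end()
    return tokens
```

With these tables, `scan("X PIC X.")` reports the first `X` as a `WORD` and the second as a `PICTURE_STRING`, which is the context dependence described above. Scanning `9(5)V99` outside a PICTURE clause raises an error, because the `INITIAL` rules have no pattern for parentheses.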



