Do start conditions in a lexer (i.e., scanner) increase the ability to recognize tokens? Or is it just a convenient thing to have?



Tags: flex-lexer, lex, lexing

Lex and flex support start conditions (flex also added exclusive ones, declared with %x). I'm curious whether this feature extends flex's theoretical ability to match tokens, or whether it's just a pragmatic solution that tends to make the set of rules (patterns and actions) shorter and easier to read.

There's some ambiguity here, because it seems to me that start conditions could be simulated at the C level by clever use of flag variables; if so, flex without start conditions is equal in power to flex with start conditions. So let's pretend we cannot extend our scanner this way: all the scanner can do is match tokens via the patterns and echo back the names of the tokens. In that case, can a flex scanner WITH start conditions tokenize more languages than one WITHOUT start conditions? Or can I always write a set of rules without start conditions that does the same thing as a set of rules with start conditions?

It's a difficult question to word clearly, but I hope I have made it precise enough.



They effectively allow you to run multiple DFAs with the same code. Another way to look at it is that they add context-sensitivity.

It's done in COBOL, for example, where the lexical rules for PICTURE strings are completely different from those of the rest of the language, so you have a separate set of rules (a start condition) just for them.


