FAQ Database Discussion Community


Lexical analyzer with decimal numbers

java,regex,text-parsing,lexical-analysis
I am trying to write a very basic text processor. The goal is to read from a file char by char and decide what type of token each one is. Basically, what I have done is load all the chars into a queue and use peek() and read() to check...
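
A minimal sketch of the peek/read approach, written in Python for brevity. A `collections.deque` stands in for the queue: indexing plays the role of peek() and `popleft()` plays the role of read(). The token names (NUMBER, IDENT, OP) are illustrative assumptions. The part that matters for decimals is the two-character lookahead at the dot. It makes `3.14` a single NUMBER, while `5.` stops before the dot.

```python
from collections import deque

def tokenize(text):
    """Classify characters taken from a queue into NUMBER, IDENT and OP tokens."""
    queue = deque(text)
    tokens = []

    def peek(offset=0):
        # Look ahead without consuming; None signals end of input.
        return queue[offset] if offset < len(queue) else None

    while queue:
        ch = peek()
        if ch.isspace():
            queue.popleft()  # skip whitespace
        elif ch.isdigit():
            lexeme = ""
            while peek() is not None and peek().isdigit():
                lexeme += queue.popleft()
            # Consume the '.' only if a digit follows it (two characters of lookahead).
            if peek() == "." and peek(1) is not None and peek(1).isdigit():
                lexeme += queue.popleft()
                while peek() is not None and peek().isdigit():
                    lexeme += queue.popleft()
            tokens.append(("NUMBER", lexeme))
        elif ch.isalpha() or ch == "_":
            lexeme = ""
            while peek() is not None and (peek().isalnum() or peek() == "_"):
                lexeme += queue.popleft()
            tokens.append(("IDENT", lexeme))
        else:
            tokens.append(("OP", queue.popleft()))  # any other single character
    return tokens

print(tokenize("x = 3.14 + 42"))
```

With this design a trailing dot, as in `5.`, comes out as NUMBER followed by OP. The caller can then reject or report it.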

How to print comments in lex?

comments,lex,lexical-analysis,lexical-scanner
So the title might be a little misleading, but I can't think of a better way to phrase it. Basically, I'm writing a lexical scanner using cygwin/lex. Part of the code reads a token /* . It then goes into a predefined state C_COMMENT, and ends when C_COMMENT"/*". Below...

How to find multiline comments in a Java file?

java,string,file,comments,lexical-analysis
I've read my Java source file and stored its contents in String s. However, I'm having difficulty finding the multiline comments in the file. My task is to find multiline comments like these: /* i am helpful i am great */ and display them...
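
A regex is enough for a first pass. The sketch below is in Python. In Java the same pattern would be `Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL)`, looped over with a `Matcher`. It assumes the file contains no `/*` inside string or character literals. Code such as `String s = "/*";` would be misreported, and handling that case needs a scanner that tracks string state.

```python
import re

# Non-greedy ".*?" stops at the first "*/"; DOTALL lets "." cross newlines.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def find_block_comments(source):
    """Return every /* ... */ comment (Javadoc included) in source order."""
    return BLOCK_COMMENT.findall(source)

java_source = "int a = 1; /* i am helpful\n   i am great */\nint b = 2; /** doc */"
for comment in find_block_comments(java_source):
    print(comment)
```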

Ruby REPL: determine whether a line is part of a valid expression

ruby,expression,read-eval-print-loop,lexical-analysis
I'm writing a Ruby REPL (just a hobby; it won't be big and professional like pry). I wrote a very simple REPL that works fine if the input is just a single valid line of Ruby: loop do print "ruby> "; input = gets; puts "=> #{eval(input)}" end I want to...
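
The usual technique is to keep appending lines to a buffer until the parser reports that the input is complete. The check has to tell "incomplete" (needs more lines) apart from "invalid". The sketch below shows that loop in Python, where the standard `codeop.compile_command` already makes the distinction by returning `None` for incomplete input. It is an analogy, not Ruby code: a Ruby REPL would need its own equivalent parse-and-classify step.

```python
import codeop

def check(source):
    """Return ("complete", code), ("incomplete", None) or ("invalid", None)."""
    try:
        code = codeop.compile_command(source, "<input>", "single")
    except (SyntaxError, ValueError, OverflowError):
        return "invalid", None
    # None means the text so far is a valid prefix that needs more lines.
    return ("incomplete", None) if code is None else ("complete", code)

def repl():
    namespace = {}
    buffer = []
    while True:
        try:
            buffer.append(input("... " if buffer else "py> "))
        except EOFError:
            break
        status, code = check("\n".join(buffer))
        if status == "incomplete":
            continue  # keep collecting continuation lines
        if status == "complete":
            try:
                exec(code, namespace)  # "single" mode echoes expression values
            except Exception as exc:
                print(f"{type(exc).__name__}: {exc}")
        else:
            print("SyntaxError")
        buffer.clear()
```

Calling `repl()` starts the loop. Typing `if True:` switches to the `...` prompt until a blank line ends the block.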

How to define a regex in StandardTokenParsers to identify a path?

regex,scala,parsing,lexical-analysis
I am writing a parser for arithmetic expressions such as /hdfs://xxx.xx.xx.x:xxxx/path1/file1.jpg+1 which I want to parse, convert from infix to postfix, and evaluate. I also used part of the code from another discussion: class InfixToPostfix extends StandardTokenParsers { import lexical._ def...
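
Here is a language-neutral sketch in Python of the two steps: a tokenizer with a dedicated PATH rule, followed by the shunting-yard infix-to-postfix conversion. It does not use `StandardTokenParsers`, and porting the PATH rule into its lexical component is not shown. The sketch assumes that a path starts with `/` and runs until whitespace, `+`, `-`, `*` or a parenthesis. As a result, paths containing `-` are not supported, and `/` cannot double as a division operator. The concrete host and port in the usage line are placeholders.

```python
import re

# Assumption: PATH = "/" followed by anything except whitespace, + - * ( ).
TOKEN = re.compile(
    r"(?P<WS>\s+)"
    r"|(?P<PATH>/[^\s+\-*()]+)"
    r"|(?P<NUM>\d+(?:\.\d+)?)"
    r"|(?P<OP>[+\-*])"
    r"|(?P<LPAREN>\()"
    r"|(?P<RPAREN>\))"
)
PRECEDENCE = {"+": 1, "-": 1, "*": 2}

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def to_postfix(tokens):
    """Shunting-yard conversion for left-associative binary operators."""
    output, stack = [], []
    for kind, text in tokens:
        if kind in ("PATH", "NUM"):
            output.append(text)
        elif kind == "OP":
            # Pop operators of higher or equal precedence (left associativity).
            while stack and stack[-1] in PRECEDENCE and PRECEDENCE[stack[-1]] >= PRECEDENCE[text]:
                output.append(stack.pop())
            stack.append(text)
        elif kind == "LPAREN":
            stack.append(text)
        else:  # RPAREN
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise SyntaxError("unbalanced ')'")
            stack.pop()  # discard the matching "("
    while stack:
        op = stack.pop()
        if op == "(":
            raise SyntaxError("unbalanced '('")
        output.append(op)
    return output

print(to_postfix(tokenize("/hdfs://10.0.0.1:9000/path1/file1.jpg+1")))
```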

Processing complex lexicals in Rascal

lexical-analysis,rascal
What's the best practice for dealing with complex literals in Rascal? Two examples from JavaScript (my DSL has similar cases): strings with \ escapes, which have to be unescaped into their actual value, and regular expression literals, which need their own sub-AST. implode refuses to map lexicals to abstract trees; they are...
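
Whatever the Rascal-side mechanics, the unescaping itself is a small function over the literal's raw text (quotes already stripped). It would be applied when converting the parse tree to the AST. Below is a Python sketch covering a subset of JavaScript's escapes. Code-point escapes (`\u{...}`), legacy octal escapes and line continuations are deliberately left out.

```python
import string

SIMPLE_ESCAPES = {
    "n": "\n", "t": "\t", "r": "\r", "b": "\b",
    "f": "\f", "v": "\v", "0": "\0",  # simplified: "\0" followed by a digit is not special-cased
    "\\": "\\", "'": "'", '"': '"',
}

def _hex_char(body, start, width):
    digits = body[start:start + width]
    if len(digits) != width or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"bad hex escape at offset {start}")
    return chr(int(digits, 16))

def unescape(body):
    """Turn the raw text between the quotes of a string literal into its value."""
    out, i = [], 0
    while i < len(body):
        c = body[i]
        if c != "\\":
            out.append(c)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        e = body[i + 1]
        if e == "u":
            out.append(_hex_char(body, i + 2, 4))  # \uXXXX
            i += 6
        elif e == "x":
            out.append(_hex_char(body, i + 2, 2))  # \xXX
            i += 4
        else:
            # Unknown escapes such as \q yield the character itself,
            # mirroring JavaScript's non-strict behaviour.
            out.append(SIMPLE_ESCAPES.get(e, e))
            i += 2
    return "".join(out)
```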

Regular expression for HTML tags

html,regex,lex,lexical-analysis
I am working on a lexical analyzer. I have an HTML file, and I want to convert every letter in the file, except whatever is written within an HTML tag, into a capital letter. Example: <html> <body> StackOverFlow </body> </html> This will be converted to the following: <html> <body> STACKOVERFLOW </body> </html> I just want...
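
One way to do this, sketched in Python, is to split the text on tags while keeping them, then upper-case only the pieces between tags. In lex terms this corresponds to one rule that matches a whole tag and echoes it unchanged, plus a catch-all rule that upper-cases every other character. The sketch assumes well-formed tags with no `>` inside attribute values. It would also upper-case the contents of `<script>` and `<style>` blocks.

```python
import re

# The capturing group makes re.split keep each tag, so the pieces alternate
# text, tag, text, tag, ..., text: even indices are text, odd indices are tags.
TAG = re.compile(r"(<[^>]*>)")

def upper_outside_tags(html):
    parts = TAG.split(html)
    return "".join(part if i % 2 else part.upper() for i, part in enumerate(parts))

print(upper_outside_tags("<html> <body> StackOverFlow </body> </html>"))
```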

Conversion to ASCII

ascii,lexical-analysis
#include <iostream> #include<string.h> #include<cstring> #include<ctype.h> using namespace std; char *Data1[100]; char *operators[20]; char *identifiers[20][20]; int ascii[100] = {0}; int ascii2[100] = {0}; unsigned int Tcount = 0; unsigned int i; int main(void) { char *text = (char*)malloc ( 100 *sizeof( char)); cout << "Enter the first arrangement of data." <<...

How to analyze a String in Java in order to tell if it is a word or total gibberish?

java,string,spam,spam-prevention,lexical-analysis
I need to analyze a String in Java in order to tell if it contains gibberish. For example: "asdasx123ax" - gibberish "dsjkklcq" - gibberish "12das" - gibberish "samarta" - not gibberish (note that it doesn't have to be a real word from the dictionary in order to be considered "not...
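
Without a dictionary, a cheap heuristic is to look at the shape of the string: digits, too few vowels, or long consonant runs. The Python sketch below uses two thresholds: a minimum vowel ratio of 0.25 and at most 3 consonants in a row. Both are assumptions tuned only to the four examples in the question. The heuristic will misjudge some real words (for example, "strengths"). A more robust approach is to score character n-grams against statistics from a word list.

```python
VOWELS = set("aeiou")

def is_gibberish(word, min_vowel_ratio=0.25, max_consonant_run=3):
    """Heuristic only: flags digits, too few vowels, or long consonant runs."""
    w = word.lower()
    if not w.isalpha():  # digits, punctuation, or the empty string
        return True
    vowels = sum(c in VOWELS for c in w)
    if vowels / len(w) < min_vowel_ratio:
        return True
    run = 0
    for c in w:
        run = 0 if c in VOWELS else run + 1
        if run > max_consonant_run:
            return True
    return False

for s in ["asdasx123ax", "dsjkklcq", "12das", "samarta"]:
    print(s, is_gibberish(s))
```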

Checking wrong identifier patterns in flex

compiler-errors,flex-lexer,lex,lexical-analysis,lexical-scanner
I am just trying to learn flex, and here is some sample flex code to detect identifiers and numbers. I want to improve it by identifying malformed identifier and number patterns (for example: 1var, 12.2.2, 5., etc.). How can I detect them? Which changes do I have to make in...
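
In flex, the usual approach is to add rules for the malformed patterns next to the good ones and report an error in their actions. Flex picks the longest match and, on a tie, the rule listed first. So `1var` goes to the bad-identifier rule instead of being split into a number followed by an identifier. The Python sketch below imitates that rule selection. The regexes use syntax flex also accepts, but the token names are illustrative.

```python
import re

# Rules in priority order. The longest match wins; on equal length the
# earlier rule wins, which mirrors how flex chooses between rules.
RULES = [
    ("ID", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("NUM", re.compile(r"[0-9]+(\.[0-9]+)?")),
    ("BAD_ID", re.compile(r"[0-9]+[A-Za-z_][A-Za-z0-9_]*")),  # e.g. 1var
    ("BAD_NUM", re.compile(r"[0-9]+(\.[0-9]*)+")),             # e.g. 12.2.2, 5.
    ("WS", re.compile(r"\s+")),
]

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:  # strictly longer, so ties keep the earlier rule
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            best_kind, best_end = "UNKNOWN", pos + 1
        if best_kind != "WS":
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens

print(tokenize("x1 1var 12.2.2 5. 3.14"))
```

`3.14` matches both NUM and BAD_NUM with the same length, so the earlier NUM rule wins.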

Unicode escaped comments in Python

python,unicode,escaping,comments,lexical-analysis
I made a Java program that uses Unicode-escaped characters to break out of a multiline comment and hide some functionality. The program below prints "Hello Cruel World". I'm wondering if this is possible to do in Python (any version). If it is not possible, how is this prevented in the language?...
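
The Java trick works because Java translates `\uXXXX` escapes early in lexical translation, before comments are recognized (Java Language Specification §3.3). With default source decoding, Python has no such step. It interprets `\u` escapes only inside string literals, so a `\u000a` written in a comment stays part of the comment. A small Python check:

```python
def run(source):
    """Execute source text and return the namespace it populated."""
    namespace = {}
    exec(source, namespace)
    return namespace

# Line 2 holds a literal backslash-u000a inside a comment. Java would turn
# it into a line break before tokenizing, making "x = 2" live code;
# Python's tokenizer just sees an ordinary comment.
tricky = "x = 1\n# \\u000a x = 2\n"
print(run(tricky)["x"])  # 1

# Inside a string literal, by contrast, the escape is decoded.
print(run('s = "\\u0041"\n')["s"])  # A
```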

Checking for unfinished comments in flex

c,flex-lexer,lex,lexical-analysis,lexical-scanner
I am new to flex. I have just written some sample code to detect multiline comments using a flex program. Now I want to improve the code: I want to detect unfinished and ill-formed comments. For example: a comment beginning with /* without an...
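
In flex, an `<<EOF>>` rule can be attached to a start condition, for example `<C_COMMENT><<EOF>>`. That rule can report a comment still open at end of input. The Python sketch below performs the same two checks, an unterminated `/*` and a stray `*/`, with line numbers. As a simplification it ignores string literals and `//` comments, so a `/*` inside a string would be treated as a comment opener.

```python
def check_comments(text):
    """Return (comments, errors) for C-style /* ... */ comments."""
    comments, errors = [], []
    i, line = 0, 1
    while i < len(text):
        if text.startswith("/*", i):
            end = text.find("*/", i + 2)  # "/*/" does not close the comment
            if end == -1:
                errors.append(f"line {line}: unterminated comment")
                break  # the open comment swallows the rest of the input
            comment = text[i:end + 2]
            comments.append(comment)
            line += comment.count("\n")
            i = end + 2
        elif text.startswith("*/", i):
            errors.append(f"line {line}: '*/' without matching '/*'")
            i += 2
        else:
            if text[i] == "\n":
                line += 1
            i += 1
    return comments, errors

print(check_comments("int a; /* ok */\nint b; /* open\nmore"))
```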

Annotating a treebank with lexical information (head words) in Java

java,nlp,stanford-nlp,lexical-analysis
I have a treebank with syntactic parse tree for each sentence as given below: (S (NP (DT The) (NN government)) (VP (VBZ charges) (SBAR (IN that) (S (PP (IN between) (NP (NNP July) (CD 1971)) (CC and) (NP (NNP July) (CD 1992))) (, ,) (NP (NNP Rostenkowski)) (VP (VBD placed)...
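
Head-word annotation comes down to a head-finding table: for each phrase label, a search direction and a priority list of child labels. The Python sketch below uses a hypothetical, heavily simplified table in that spirit. It is not the full Collins rule set. In Java, the Stanford parser distribution includes ready-made head finders such as `CollinsHeadFinder`, which are far more complete than this.

```python
import re

# Hypothetical, simplified head table: label -> (search direction, label priority).
HEAD_RULES = {
    "S": ("left", ["VP", "S", "SBAR"]),
    "SBAR": ("left", ["IN", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VBP", "VB", "VBN", "VBG", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NNPS", "NP", "CD"]),
    "PP": ("left", ["IN", "TO"]),
}

def parse(text):
    """Parse a bracketed tree into (label, children); words are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)

    def node(i):
        label, i = tokens[i + 1], i + 2  # skip "(" and read the label
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = node(i)
            else:
                child, i = tokens[i], i + 1
            children.append(child)
        return (label, children), i + 1

    tree, _ = node(0)
    return tree

def is_preterminal(tree):
    return len(tree[1]) == 1 and isinstance(tree[1][0], str)

def head_word(tree):
    if is_preterminal(tree):
        return tree[1][0]  # the word itself is the head
    label, children = tree
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    ordered = children if direction == "left" else list(reversed(children))
    for wanted in priorities:
        for child in ordered:
            if child[0] == wanted:
                return head_word(child)
    return head_word(ordered[0])  # fallback: first child in search direction

def annotate(tree):
    """Render the tree with each phrase label suffixed by its head word."""
    if is_preterminal(tree):
        return f"({tree[0]} {tree[1][0]})"
    inner = " ".join(annotate(c) for c in tree[1])
    return f"({tree[0]}[{head_word(tree)}] {inner})"

tree = parse("(S (NP (DT The) (NN government)) (VP (VBZ charges)))")
print(annotate(tree))
```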

What compiler should I use as a case study for self-studying compiler principles and techniques? [closed]

c,compiler-construction,code-generation,abstract-syntax-tree,lexical-analysis
I decided to start studying compiler theory, but the problem is that I want a compiler for some language so I can trace the output of each stage: lexical analyzer output, syntax tree, intermediate representation, and code generation. I don't care about optimization right now. I am aware of some questions similar to mine...

Grammar for parsing a PHP-like language so that it can handle the PHP begin and end tokens in the grammar

parsing,grammar,context-free-grammar,lexical-analysis
I am trying to understand how one can define a PHP-like grammar. In PHP, one can get out of PHP mode into HTML mode and then back into PHP mode. For the sake of asking this question, I am defining my PHP-like language to be ridiculously simple. This language will...
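
A common way to handle the begin and end tokens is in the lexer rather than the grammar. A mode-switching lexer emits raw text as HTML tokens until it sees the begin token, then lexes code until the end token, so the grammar only sees ordinary terminals. ANTLR 4 supports this directly with lexer modes (`pushMode`/`popMode`). The Python sketch below assumes real PHP's `<?php` and `?>` as the two tokens and hard-codes two modes. The token set inside PHP mode is an illustrative assumption.

```python
import re

# Illustrative PHP-mode tokens; "?>" switches back to HTML mode.
PHP_TOKEN = re.compile(
    r"(?P<WS>\s+)"
    r"|(?P<CLOSE_TAG>\?>)"
    r"|(?P<VAR>\$\w+)"
    r"|(?P<NUM>\d+)"
    r'|(?P<STR>"[^"]*")'
    r"|(?P<ID>[A-Za-z_]\w*)"
    r"|(?P<PUNCT>[;=+\-*/().,])"
)

def lex(src):
    tokens, i, mode = [], 0, "HTML"
    while i < len(src):
        if mode == "HTML":
            j = src.find("<?php", i)
            if j == -1:  # no more code: the rest is plain HTML
                tokens.append(("HTML", src[i:]))
                break
            if j > i:
                tokens.append(("HTML", src[i:j]))
            tokens.append(("OPEN_TAG", "<?php"))
            i, mode = j + len("<?php"), "PHP"
        else:
            m = PHP_TOKEN.match(src, i)
            if m is None:
                raise SyntaxError(f"unexpected {src[i]!r} at offset {i}")
            i = m.end()
            if m.lastgroup == "WS":
                continue
            tokens.append((m.lastgroup, m.group()))
            if m.lastgroup == "CLOSE_TAG":
                mode = "HTML"
    return tokens

print(lex('<p>Hi</p><?php echo $x; ?><br>'))
```

Because a string literal is matched as a whole STR token, a `?>` inside quotes does not end PHP mode.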

Write parsed tokens and yytext of a flex lexical analyser to a file

compiler-construction,flex-lexer,lexical-analysis
I need to write each token, and the text matched for that token, to a file from my flex analyser. Basically, I want to store each parsed token in an output file. Does anyone have an idea?...

How to create a lexical analyzer in ANTLR 4 that can catch different types of lexical errors

java,compiler-construction,antlr,antlr4,lexical-analysis
I am using ANTLR 4 to create my lexer, but I don't know how to create a lexical analyzer that catches different types of lexical errors. For example, if I have an unrecognized symbol like ^, the lexical analyzer should report an error like "Unrecognized symbol "^"". If...
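
A common pattern, in ANTLR 4 as elsewhere, is to give every kind of lexical error its own rule so the lexer never gets stuck. Typical examples are a rule for an unterminated string and a last-resort rule such as `ERROR_CHAR : . ;` placed at the end of the lexer grammar. The Python sketch below shows the same idea outside ANTLR and collects error messages instead of stopping at the first one. The rule set and message texts are assumptions modelled on the question.

```python
import re

# Rules are tried in order. BAD_STRING only matches when STRING failed,
# i.e. an opening quote with no closing quote on the same line.
RULES = [
    ("WS", r"\s+"),
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("STRING", r'"[^"\n]*"'),
    ("BAD_STRING", r'"[^"\n]*'),
    ("OP", r"[+\-*/=;()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))

def lex(source):
    """Return (tokens, errors); lexing continues after each error."""
    tokens, errors = [], []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # Catch-all: report the offending character and skip it.
            errors.append(f'Unrecognized symbol "{source[pos]}" at offset {pos}')
            pos += 1
            continue
        kind = m.lastgroup
        if kind == "BAD_STRING":
            errors.append(f"Unterminated string starting at offset {pos}")
        elif kind != "WS":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens, errors

print(lex("x = 2 ^ 3"))
```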

Writing a syntax analyser for the C language using an AFD (DFA)

lexical-analysis,finite-automata,deterministic,lexical-scanner
I have been given a task to write a C language analyser using an AFD (deterministic finite automaton, DFA). I can choose whichever language I want, so I think I will go for Ruby. However, this task is a little overwhelming to grasp at first. The problem I stumble across is: How...
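
The core of an AFD-based analyser is a transition table plus a loop that follows it. The Python sketch below encodes a tiny DFA for C identifiers and decimal integer literals, and the same structure carries over to Ruby. A full C lexer needs many more states. It would also run the DFA with longest-match over the input instead of classifying one isolated lexeme.

```python
import string

# States of a tiny DFA for C identifiers and decimal integer literals.
START, IDENT, INT = 0, 1, 2
ACCEPTING = {IDENT: "IDENT", INT: "INT"}

# Transition table keyed by (state, character class); a missing entry means reject.
TRANSITIONS = {
    (START, "letter"): IDENT,
    (IDENT, "letter"): IDENT,
    (IDENT, "digit"): IDENT,
    (START, "digit"): INT,
    (INT, "digit"): INT,
}

def char_class(c):
    if c in string.ascii_letters or c == "_":
        return "letter"
    if c in string.digits:
        return "digit"
    return "other"

def classify(lexeme):
    """Run the DFA over the whole lexeme; return its token type or None."""
    state = START
    for c in lexeme:
        state = TRANSITIONS.get((state, char_class(c)))
        if state is None:
            return None
    return ACCEPTING.get(state)

for word in ["_foo1", "123", "1a"]:
    print(word, classify(word))
```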

Symbol table content after Lexical analysis

compiler-construction,lexical-analysis
Say I have a C source code file with the following content: int i = 21 + 10; int blah(){ int i = 21; return i + 10; } main(){ int i; i += i + 10; } At the end of the lexical analysis phase, what will be the content of...
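
The answer depends on the compiler's design. In one common textbook arrangement, the lexer installs each distinct identifier name only once and hands the parser an index into the table. Types and scopes are attached later, so the three different `i` variables share one entry at this stage. Here is a Python sketch of that arrangement, assuming `int` and `return` are the only keywords needed for this input:

```python
import re

KEYWORDS = {"int", "return"}
TOKEN = re.compile(
    r"(?P<WS>\s+)|(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>\+=|[+=;(){}])"
)

def lex(source):
    """Return (tokens, symbol_table); identifier tokens carry a table index."""
    tokens, symtab, index = [], [], {}
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected {source[pos]!r} at offset {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "WS":
            continue
        if kind == "ID" and text in KEYWORDS:
            tokens.append(("KW", text))
        elif kind == "ID":
            if text not in index:  # first sighting: install the name
                index[text] = len(symtab)
                symtab.append(text)
            tokens.append(("ID", index[text]))
        else:
            tokens.append((kind, text))
    return tokens, symtab

SOURCE = "int i = 21 + 10; int blah(){ int i = 21; return i + 10; } main(){ int i; i += i + 10; }"
tokens, symtab = lex(SOURCE)
print(symtab)  # ['i', 'blah', 'main']
```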