Wednesday, April 17, 2013

what is the difference between lex and yacc

1 vote


I have worked with lex for executing some code whenever a regular expression is found. Can yacc do something more than that? If yes, what?
lex yacc

asked Jul 27 at 18:04
Manik Mahajan
11●1

5 Answers
7 votes


Yes, YACC is a parser, Lex is a lexical analyzer. They are typically used together: you Lex the string input, and YACC the tokenized input provided by Lex.

Now, a regular expression can only represent regular languages. One of the constraints of a regular language is the lack of "memory". You cannot define the rules for acceptance further down the string based on what has come before.

This is most clearly seen in the case of parentheses. A regular language can't match nested parentheses to the correct level, or any other such structure. The grammars of (most) computer languages can and do, and because of that they can't be parsed with a lexer or a regular expression. That's where YACC comes in.

One can reverse the question as well. If YACC can do more, why not use it for the lexical analysis too? Well, it so happens that you can match input against a regular expression very efficiently, which is not the case for general grammars -- not to the same level. Still, YACC can do basic lexical analysis, if the lexical rules of the language are simple enough.
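
To make that concrete, here is a minimal sketch (hypothetical, not from the answer) of a yacc grammar that accepts arbitrarily deep nested parentheses -- exactly the "memory" a single Lex regular expression lacks. The character-level yylex() and the error/main boilerplate are assumptions added to keep it self-contained:

    %{
    #include <stdio.h>
    int yylex(void);
    void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
    %}
    %%
    s : /* empty */
      | '(' s ')' s     /* nests to any depth -- beyond any regular language */
      ;
    %%
    /* character-level lexer: each '(' or ')' is its own token */
    int yylex(void) {
        int c = getchar();
        if (c == '(' || c == ')') return c;
        return c == EOF ? 0 : yylex();
    }
    int main(void) { return yyparse(); }

Fed "(()())" this parser accepts; fed "(()" it reports a syntax error -- something no single regular expression can decide for arbitrary nesting depth.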

edited Jul 29 at 20:58
laalto

answered Jul 27 at 18:11
Daniel
10.2k●1●15●40

+1 for explaining the difference between regular expressions and CFG's... – Polaris878 Jul 27 at 18:29
another, probably more important reason why yacc is not usually used for lexical analysis is because that is really pretty cumbersome. For instance a production rule to recognize a floating point number in Lex regular expressions is 1 line, about 15 characters. The equivalent Yacc rule would be about 10 lines, maybe 150 characters. – TokenMacGuy Jul 28 at 3:34
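
To put the comment's numbers in view (the exact rules are illustrative sketches, not the ones TokenMacGuy had in mind): in Lex, a floating-point literal is a single pattern line,

    [0-9]+\.[0-9]+    { return FLOAT; }

while a character-by-character Yacc equivalent needs a small grammar of its own:

    fnum   : digits '.' digits ;
    digits : digit
           | digits digit
           ;
    digit  : '0' | '1' | '2' | '3' | '4'
           | '5' | '6' | '7' | '8' | '9'
           ;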

2 votes


lex is for tokenizing input. That is, separating out your input into the lowest-level objects that your grammar defines. For example, you use lex to identify keywords, identifiers, strings, comments, whitespace, and so on.

yacc is for parsing your grammar. A grammar is a description of your language, typically defined in EBNF or some other context-free grammar notation. Once you describe your grammar to yacc, you can use it to run the actions of your tool when elements of the language are recognized. This might be, for example, constructing syntax trees for expression solving, defining scope objects, recording variable definitions, and so forth.

They are complementary products.
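
As a rough illustration of that split (the token names KEYWORD, IDENTIFIER and STRING are placeholders that would normally be declared with %token on the yacc side), the lex half of such a pair might look like:

    %%
    "if"|"while"|"return"     { return KEYWORD;    }
    [A-Za-z_][A-Za-z0-9_]*    { return IDENTIFIER; }
    \"[^"\n]*\"               { return STRING;     }
    "//".*                    { /* skip comment */ }
    [ \t\n]+                  { /* skip whitespace */ }
    %%

Each rule pairs one of the low-level objects described above with the action that reports it to the parser.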

answered Jul 27 at 18:13
Christopher
3,299●11

+1 nice and succinct – skaffman Jul 29 at 21:02
1 vote


lex and yacc are normally used together. This is how you usually construct an application using both:

Input Stream (characters) -> Lex (tokens) -> Yacc (Abstract Syntax Tree) -> Your Application

More generally, what Lex will do is read a source file, from the beginning, and try to match a number of regular expressions (lex has its own, special syntax for this, which is a bit different from perl or sed regular expressions), and will then hand each token it recognizes to the consuming code (classically, the yacc-generated parser that called it). Tokens may either be just a plain enumerated value, like for a keyword or operator, or they may have some metadata attached, like for a literal value.
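
In a lex specification, those two kinds of tokens might look like this (the token names and the yylval field are illustrative assumptions):

    "+"       { return PLUS; }               /* plain enumerated value */
    [0-9]+    { yylval.ival = atoi(yytext);  /* metadata: the literal's value */
                return NUMBER; }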

Lex is usually (though not necessarily) used to feed Yacc. Yacc uses an LALR parser algorithm, which, roughly speaking, works by pushing each token onto a stack. If the stack has a sequence of tokens that it recognizes, it will pop all of those tokens, perform an action, and push another token back onto the stack.

The proper vocabulary for what Yacc works on is actually terminals and non-terminals. A terminal is a token that it got from the invoking program (usually Lex), and a non-terminal is the result of matching a sequence on its stack.

Usually the actions taken by each Yacc rule are either to evaluate the result of a calculation that the rule corresponds with, or to produce an intermediate representation, like a syntax tree, for another application layer to process.
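
A sketch of the first style, direct evaluation (NUMBER is an assumed token carrying an integer value; a real file would declare it with %token and give yylval a type):

    %token NUMBER
    %%
    expr : expr '+' term    { $$ = $1 + $3; }   /* evaluate as the rule reduces */
         | term
         ;
    term : NUMBER
         ;

When the parser reduces "expr '+' term", the action runs with $1 and $3 bound to the values of the matched pieces, so the result is computed bottom-up as parsing proceeds.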

Yacc, like lex, can be used separately from the other. For instance, you could use Yacc by passing it individual characters from the source text, and use Yacc rules to recognize each kind of token. However, Yacc is not designed to be very easy to use that way, and so the resulting lexer will be much more complex than an equivalent lexer written in Lex. A more typical split is to hand-code the lexer instead, for reasons of either performance or because you need a smarter lexer. A common example of the second case arises in C-like languages, where the lexer has to know about previous uses of identifiers to tell whether they describe types or variables.

answered Jul 27 at 18:24
TokenMacGuy
8,530●4●19

1 vote


lex is a lexical analyzer. It splits text up into tokens. Its power is roughly equivalent to regular expression matching. yacc is a parser generator. It takes a sequence of tokens (say, from lex) and interprets them as a series of statements. Its power is roughly equivalent to context-free grammars.

A typical application of lex and yacc is for implementing programming languages. lex tokenizes the input, breaking it up into keywords, constants, punctuation, etc. yacc then implements the actual computer language: recognizing a for statement, for instance, or a function definition.
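
The yacc rule for such a for statement might be sketched like this, where FOR is an assumed token from lex and mknode is a hypothetical constructor for a syntax-tree node:

    stmt : FOR '(' expr ';' expr ';' expr ')' stmt
             { $$ = mknode("for", $3, $5, $7, $9); }   /* init, test, step, body */
         ;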

In a practical sense, you often use lex to process input text into chunks. Then you use yacc to string those chunks together and process them into some larger meaning.

edited Jul 27 at 23:22


answered Jul 27 at 18:12
Nelson
1,313●10

You mean "It takes a sequence of tokens (say, from lex) and..." don't you? – Chris Lutz Jul 27 at 19:16
thanks, corrected. – Nelson Jul 27 at 23:22
0 votes


Lex is a tool for building lexical analyzers, which can do fairly simple lexical work (like finding keywords). Yacc is a parser generator that can create parsers for real computer languages. Its analysis is normally based upon the output of lex (which is a stream of tokens), and from this it can create your parse tree of the programming language -- something that is more than lex does.
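
The wiring between the two generated halves is small; a sketch of the classic driver (yyparse and yylex are the names yacc and lex conventionally emit):

    int yyparse(void);   /* generated by yacc; calls yylex() for each token */

    int main(void) {
        return yyparse();   /* 0 on success, 1 on syntax error */
    }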

Traditionally, compiler builders distinguish between lexical and syntactic analysis -- two important steps in a compiler (further steps follow, e.g. code generation and optimization).

