TOM gp - Synthesized attributes

Go to the first, previous, next, last section, table of contents.

Synthesized attributes

So far, rules were like procedures: you invoke them and after some time they return. However, GP employs an attribute grammar, meaning that rules can be passed attributes and also return them. In this section, we'll look at an infix-expression parser, in which each rule that is responsible for parsing a subexpression will return the result of evaluating that part of the expression.

The calculator repeatedly reads an expression and prints the result of evaluating the expression. The top rule of the grammar will look like this:

%class ex5

%token NUMBER

grammar
                {
                  int result;
                }
        :
          [expr (result) ';'
                {
                  [[[stdio out] print result] nl];
                }
          ]*
 ;

The non-terminal expr, which has not been defined yet, is used as in

expr (result)

indicating that the expr rule employs one attribute. We need to look at the definition of expr, to see what kind of attribute this actually is. The expr rule starts like this:

expr (<int v)
  ...

This indicates that expr has one attribute, `v', of type int. The prefix `<' means that it is a synthesized attribute, i.e., a value for the attribute is returned to the invoking rule.

In the output of gp, the expr rule will be implemented by the following method:

int (v)
  expr;

Thus, the expr method returns an int corresponding to the local variable `v'. Setting the variable v within the rule will affect the value that is returned. Similarly, a rule with multiple synthesized attributes is implemented with multiple return values.

Also in the code generated by gp, the use of the expr rule becomes:

result = [self expr];

The full expr rule looks like this:

expr (<int v)
                {
                  int right;
                }
        :
          term (v)
          ['+' term (right)
                {
                  v += right;
                }
          |'-' term (right)
                {
                  v -= right;
                }
          ]*
        ;

In words: an expr starts with a term, followed by the addition or subtraction of zero or more terms. Similar to an expr, a term has one synthesized attribute. The value it returns is stored in the local variable v, which happens to be the name of the value that the expr rule returns: if the term is not followed by an addition or subtraction, the value returned by the term will be returned by the expr.

When expr is to parse the expression `1 + 2', the invocation of term will digest the `1' and return 1. This 1 will be stored in v. Since it is followed by a `+', the first branch of the alternative that follows will be taken. The term will be invoked again to parse what remains, i.e. `2'. The value that it returns is stored in the variable right, and the action that follows will be executed, which will add right to v, making v equal to 3. The end of the input has been reached and the expr rule will return the value of v: it has successfully parsed the input and correctly returned the result: 1 + 2 is 3.

The parser is still not complete, with the non-terminal term not yet defined. What an expr does with addition and subtraction, a term does with multiplication and division. Because the operators are handled by two rules and because the expr depends on term, the operator priorities are handled automatically and `1 + 2 * 3' will produce 7 and not 9.

term (< int value)
                {
                  int left;
                }
        : atom (left)
          ['*' atom (value)
                {
                  left = left * value;
                }
          |'/' atom (value)
                {
                  left = left / value;
                }
          ]*
                {
                  value = left;
                }
        ;

Note how term uses a different approach than expr to accumulate the result and hence must set the correct return value after it has finished matching the input.

The definition of the atom completes the parser. An atom is a number, a parenthesized expression, or a negated atom.

atom (< int value)
        :
          NUMBER
                {
                  value = [lex number_value];
                }
        | '-' atom (value)
                {
                  value = -value;
                }
        | '(' expr (value) ')'
        ;

Go to the first, previous, next, last section, table of contents.