Models

Building Models

Naming elements in grammar rules makes the parser discard uninteresting parts of the input, like punctuation, to produce an Abstract Syntax Tree (AST) that reflects the semantic structure of what was parsed. But an AST doesn’t carry information about the rule that generated it, so navigating the trees may be difficult.

TatSu defines the tatsu.model.ModelBuilderSemantics semantics class, which helps construct object models from abstract syntax trees:

from tatsu.model import ModelBuilderSemantics

# MyParser stands for the parser class generated from your grammar
parser = MyParser(semantics=ModelBuilderSemantics())

Then add the desired node type as the first parameter of each grammar rule:

addition::AddOperator = left:mulexpre '+' right:addition ;

ModelBuilderSemantics will synthesize a class AddOperator(Node) and use it to construct the node. The synthesized class will have one attribute for each named element in the rule, with the same name as the element (here, left and right).
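As an illustration only, the synthesis step can be pictured as creating a class at run time with Python's three-argument type(). The Node base class and the build_node() helper below are hypothetical stand-ins, not TatSu's implementation; they only show a class named after the rule's node type whose attributes mirror the rule's named elements.

```python
class Node:
    """Hypothetical minimal base class standing in for TatSu's Node."""

    def __init__(self, **attrs):
        # One attribute per named element in the rule
        for name, value in attrs.items():
            setattr(self, name, value)


_classes: dict = {}


def build_node(typename: str, ast: dict) -> Node:
    """Create (or reuse) a Node subclass called `typename` and instantiate it."""
    cls = _classes.get(typename)
    if cls is None:
        # type(name, bases, namespace) creates a new class at run time
        cls = type(typename, (Node,), {})
        _classes[typename] = cls
    return cls(**ast)


# The named elements of `addition` become attributes of the node
node = build_node('AddOperator', {'left': '1', 'right': '2'})
print(type(node).__name__, node.left, node.right)  # AddOperator 1 2
```

Caching the class by name means every AddOperator node shares one class, which is what makes dispatching on the node type possible later.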

You can also use Python’s built-in types as node types, and ModelBuilderSemantics will do the right thing:

integer::int = /[0-9]+/ ;
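A minimal sketch of the idea, assuming the builder resolves the type name against Python's builtins before synthesizing a new class. The resolve_type() and build() helpers are hypothetical, not part of TatSu's API.

```python
import builtins


class Node:
    """Hypothetical stand-in for a synthesized node base class."""

    def __init__(self, value):
        self.value = value


def resolve_type(typename: str) -> type:
    """Return a built-in type if `typename` names one, else a new Node subclass."""
    candidate = getattr(builtins, typename, None)
    if isinstance(candidate, type):
        return candidate
    return type(typename, (Node,), {})


def build(typename: str, matched_text: str):
    # For integer::int the matched text '42' becomes int('42') == 42
    return resolve_type(typename)(matched_text)


print(build('int', '42') + 1)                 # 43
print(type(build('Number', '42')).__name__)   # Number
```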

ModelBuilderSemantics acts like any other semantics class, so its default behavior can be overridden by defining a method to handle the result of any particular grammar rule.
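In TatSu the override is a method on the semantics class named after the rule. The toy dispatcher below only mimics that lookup so it can run without TatSu; ToyModelBuilder, apply(), and the default (rule, ast) wrapping are assumptions for illustration.

```python
class ToyModelBuilder:
    """Hypothetical stand-in for a model-building semantics class."""

    def _default(self, rule: str, ast):
        # Default behavior: wrap the AST in a (rule, ast) tuple
        return (rule, ast)

    def apply(self, rule: str, ast):
        # A method named after the rule, if present, overrides the default
        handler = getattr(self, rule, None)
        if callable(handler):
            return handler(ast)
        return self._default(rule, ast)


class MySemantics(ToyModelBuilder):
    def integer(self, ast):
        # Only the 'integer' rule gets custom handling
        return int(ast)


sem = MySemantics()
print(sem.apply('integer', '42') + 1)  # 43
print(sem.apply('identifier', 'x'))    # ('identifier', 'x')
```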

Viewing Models as JSON

Models generated by 竜 TatSu can be viewed by converting them to a JSON-compatible structure with tatsu.util.asjson(). The protocol tries to provide the best representation for common types, and falls back to repr() for any other type. Structures with back-references are handled, so the conversion does not recurse infinitely.

import json

from tatsu.util import asjson

print(json.dumps(asjson(model), indent=2))

The model, with richer semantics, remains unaltered.
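To make the back-reference handling concrete, here is a hypothetical asjson()-like converter written without TatSu. It is a sketch, not TatSu's implementation: the cycle marker text and the '__class__' key are assumptions.

```python
import json


def to_jsonable(obj, seen=None):
    """Hypothetical asjson()-like converter; not TatSu's implementation."""
    if seen is None:
        seen = set()
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if id(obj) in seen:
        # Back-reference to an ancestor: emit a marker instead of recursing
        return f'<cycle {type(obj).__name__}>'
    seen.add(id(obj))
    try:
        if isinstance(obj, dict):
            return {str(k): to_jsonable(v, seen) for k, v in obj.items()}
        if isinstance(obj, (list, tuple, set)):
            return [to_jsonable(v, seen) for v in obj]
        if hasattr(obj, '__dict__'):
            result = {'__class__': type(obj).__name__}
            result.update({k: to_jsonable(v, seen) for k, v in vars(obj).items()})
            return result
        return repr(obj)  # anything else falls back to repr()
    finally:
        # Track only the current path, so shared non-cyclic objects still expand
        seen.discard(id(obj))


class TreeNode:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []


root = TreeNode('root')
child = TreeNode('child')
child.parent = root  # back-reference
root.children.append(child)
print(json.dumps(to_jsonable(root), indent=2))
```

The child's parent field comes out as the marker string rather than a second copy of root.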

Conversion to a JSON-compatible structure relies on the protocol defined by tatsu.util.AsJSONMixin. The mixin defines a __json__(seen=None) method that allows classes to define their best translation. You can use AsJSONMixin as a base class in your own models to take advantage of asjson(), and you can specialize the conversion by overriding AsJSONMixin.__json__().
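The sketch below imitates that protocol without depending on TatSu. AsJSONSketch and convert() are hypothetical names, the default translation (class name plus public attributes) is an assumption, and, to keep it short, the converter does not recurse into nested values.

```python
class AsJSONSketch:
    """Hypothetical mixin mimicking the __json__(seen=None) protocol."""

    def __json__(self, seen=None):
        # Default translation: class name plus public attributes
        return {
            '__class__': type(self).__name__,
            **{k: v for k, v in vars(self).items() if not k.startswith('_')},
        }


class Point(AsJSONSketch):
    def __init__(self, x, y):
        self.x, self.y = x, y
        self._cache = None  # private, left out by the default __json__


class Color(AsJSONSketch):
    def __init__(self, r, g, b):
        self.r, self.g, self.b = r, g, b

    def __json__(self, seen=None):
        # Specialized translation: a compact hex string
        return f'#{self.r:02x}{self.g:02x}{self.b:02x}'


def convert(obj):
    """Use __json__() when the object provides it, otherwise return obj as-is."""
    method = getattr(obj, '__json__', None)
    return method() if callable(method) else obj


print(convert(Point(1, 2)))         # {'__class__': 'Point', 'x': 1, 'y': 2}
print(convert(Color(255, 128, 0)))  # #ff8000
```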

You can also write your own version of asjson() to handle special cases that are recurrent in your context.
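One way such a custom converter might look, assuming the recurring special cases in your context are dates, decimals, and paths. my_asjson() is a hypothetical name; a real version could fall back to asjson() for everything else instead of returning the object unchanged.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from pathlib import PurePath


def my_asjson(obj):
    """Hypothetical project-specific converter for recurring special cases."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)  # keep full precision as text
    if isinstance(obj, PurePath):
        return obj.as_posix()
    if isinstance(obj, dict):
        return {str(k): my_asjson(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [my_asjson(v) for v in obj]
    return obj


record = {'when': date(2024, 1, 31), 'price': Decimal('9.99'), 'src': PurePath('a/b.txt')}
print(json.dumps(my_asjson(record)))
# {"when": "2024-01-31", "price": "9.99", "src": "a/b.txt"}
```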

Walking Models

The class tatsu.model.NodeWalker allows for easy traversal (walking) of a model constructed with a ModelBuilderSemantics instance:

from tatsu.model import ModelBuilderSemantics, NodeWalker

class MyNodeWalker(NodeWalker):

    def walk_AddOperator(self, node):
        # walk the operands first, then report them
        left = self.walk(node.left)
        right = self.walk(node.right)

        print('ADDED', left, right)

# MyParser and text stand for your generated parser and its input
model = MyParser(semantics=ModelBuilderSemantics()).parse(text)

walker = MyNodeWalker()
walker.walk(model)

When a method with a name like walk_AddOperator() is defined, it will be called when a node of that type is walked. The pythonic version of the class name may also be used for the walk method: walk__add_operator() (note the double underscore).
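The sketch below tries both spellings for a node's class name. ToyWalker and SumWalker are hypothetical, and the CamelCase-to-snake_case regex is an assumption that covers simple names like AddOperator; it may differ from TatSu's own conversion.

```python
import re


def pythonic(name: str) -> str:
    # 'AddOperator' -> 'add_operator' (insert '_' before inner capitals)
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


class ToyWalker:
    """Hypothetical walker showing the two method-name spellings."""

    def walk(self, node):
        name = type(node).__name__
        for attr in (f'walk_{name}', f'walk__{pythonic(name)}'):
            method = getattr(self, attr, None)
            if callable(method):
                return method(node)
        return None  # no matching method


class AddOperator:
    def __init__(self, left, right):
        self.left, self.right = left, right


class SumWalker(ToyWalker):
    def walk__add_operator(self, node):  # pythonic spelling, double underscore
        return node.left + node.right


print(SumWalker().walk(AddOperator(2, 3)))  # 5
```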

If no walk method is found for a node's class, the walker searches for a method for one of the class's bases instead, so it is possible to write catch-all methods such as:

class MyNodeWalker(NodeWalker):

    def walk_Node(self, node):
        print('Reached Node', node)

    def walk_str(self, s):
        return s

    def walk_object(self, o):
        raise Exception(f'Unexpected type {type(o).__name__} walked')
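A minimal sketch of that fallback, assuming the search follows Python's method resolution order (type(node).__mro__). MroWalker and the small class hierarchy are hypothetical; the three catch-all methods are the ones shown above.

```python
class Node:
    pass


class Operator(Node):
    pass


class AddOperator(Operator):
    pass


class MroWalker:
    """Hypothetical walker that falls back along the class's MRO."""

    def walk(self, node):
        # type(node).__mro__ lists the class, then its bases, ending with object
        for cls in type(node).__mro__:
            method = getattr(self, f'walk_{cls.__name__}', None)
            if callable(method):
                return method(node)
        return None

    def walk_Node(self, node):
        return f'Reached Node {type(node).__name__}'

    def walk_str(self, s):
        return s

    def walk_object(self, o):
        raise Exception(f'Unexpected type {type(o).__name__} walked')


w = MroWalker()
print(w.walk(AddOperator()))  # Reached Node AddOperator
print(w.walk('text'))         # text
```

Because object ends every MRO, walk_object acts as the last resort, which is why it is a natural place to report unexpected types.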

Which nodes get walked is up to the NodeWalker implementation. Some strategies for walking all or most nodes are implemented as classes in tatsu.walkers, such as PreOrderWalker and DepthFirstWalker.
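The difference between the two orders, in the usual sense of the terms, can be seen on a toy tree. This sketch does not claim to reproduce TatSu's PreOrderWalker or DepthFirstWalker exactly; Tree, pre_order(), and post_order() are hypothetical.

```python
class Tree:
    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)


def pre_order(node):
    # Visit the node before its children
    yield node.label
    for child in node.children:
        yield from pre_order(child)


def post_order(node):
    # Visit the children before the node (depth-first, bottom-up)
    for child in node.children:
        yield from post_order(child)
    yield node.label


# Tree for 1 + 2 * 3
tree = Tree('+', Tree('1'), Tree('*', Tree('2'), Tree('3')))
print(list(pre_order(tree)))   # ['+', '1', '*', '2', '3']
print(list(post_order(tree)))  # ['1', '2', '3', '*', '+']
```

Bottom-up order suits evaluation, where a node needs its operands' results first; top-down order suits tasks like printing an outline.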

Sometimes nodes must be walked more than once for the purpose at hand, and it’s up to the walker how and when to do that.
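A common reason for a second pass is forward references: a rule may mention another rule defined later, so names must be collected before references can be checked. The sketch below uses plain data classes instead of TatSu models; Rule and check_references() are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    refs: list  # names of rules referenced in the body


def check_references(rules):
    """Two passes: collect every definition, then validate each reference."""
    # Pass 1: gather all rule names, including ones defined later
    defined = {rule.name for rule in rules}
    # Pass 2: walk the same nodes again, now that all names are known
    return [
        (rule.name, ref)
        for rule in rules
        for ref in rule.refs
        if ref not in defined
    ]


grammar = [
    Rule('start', ['expression']),
    Rule('expression', ['term', 'factor']),
    Rule('term', ['number']),
]
print(check_references(grammar))
# [('expression', 'factor'), ('term', 'number')]
```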

Take a look at tatsu.ngcodegen.PythonCodeGenerator for the walker that generates a parser in Python from the model of a parsed grammar.

Model Class Hierarchies

It is possible to specify a base class for generated model nodes:

additive
    =
    | addition
    | subtraction
    ;

addition::AddOperator::Operator
    =
    left:mulexpre op:'+' right:additive
    ;

subtraction::SubtractOperator::Operator
    =
    left:mulexpre op:'-' right:additive
    ;

TatSu will generate the base class if it’s not already known.
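The "generate it if not already known" behavior can be pictured as a class registry. This is a sketch under that assumption, not TatSu's implementation; node_class(), the registry, and MulOperator are hypothetical.

```python
from __future__ import annotations

_registry: dict[str, type] = {}


class Node:
    """Hypothetical root class for synthesized nodes."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def node_class(typename: str, basename: str | None = None) -> type:
    """Return the class for `typename`, creating it (and its base) if unknown."""
    if typename in _registry:
        return _registry[typename]
    base = node_class(basename) if basename else Node
    cls = type(typename, (base,), {})
    _registry[typename] = cls
    return cls


Add = node_class('AddOperator', 'Operator')   # creates Operator as well
Mul = node_class('MulOperator', 'Operator')   # reuses the known Operator
print([c.__name__ for c in Add.__mro__])  # ['AddOperator', 'Operator', 'Node', 'object']
print(Add.__bases__[0] is Mul.__bases__[0])  # True
```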

Base classes can be used as the target class in walkers and in code generators:

from tatsu.codegen import ModelRenderer
from tatsu.model import NodeWalker

class MyNodeWalker(NodeWalker):
    # called for every subclass of Operator
    def walk_Operator(self, node):
        left = self.walk(node.left)
        right = self.walk(node.right)
        op = self.walk(node.op)

        print(type(node).__name__, op, left, right)


class Operator(ModelRenderer):
    template = '{left} {op} {right}'
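The benefit of targeting the base class can be shown without TatSu: one method handles every Operator subclass. This sketch does not reproduce ModelRenderer's template machinery; it simply applies the same template string with str.format(). RenderWalker and MulOperator are hypothetical.

```python
class Operator:
    """Hypothetical stand-in for the generated Operator base class."""

    template = '{left} {op} {right}'

    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right


class AddOperator(Operator):
    pass


class MulOperator(Operator):
    pass


class RenderWalker:
    """Dispatches on the first class in the MRO that has a walk_ method."""

    def walk(self, node):
        for cls in type(node).__mro__:
            method = getattr(self, f'walk_{cls.__name__}', None)
            if callable(method):
                return method(node)
        return str(node)  # leaves such as numbers

    def walk_Operator(self, node):
        # One method serves every Operator subclass
        left = self.walk(node.left)
        right = self.walk(node.right)
        return node.template.format(left=left, op=node.op, right=right)


tree = AddOperator(1, '+', MulOperator(2, '*', 3))
print(RenderWalker().walk(tree))  # 1 + 2 * 3
```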