Building Models

Naming elements in grammar rules makes the parser discard uninteresting parts of the input, like punctuation, to produce an Abstract Syntax Tree (AST) that reflects the semantic structure of what was parsed. But an AST doesn’t carry information about the rule that generated it, so navigating the trees may be difficult.

TatSu defines the tatsu.model.ModelBuilderSemantics semantics class which helps construct object models from abtract syntax trees:

from tatsu.model import ModelBuilderSemantics

parser = MyParser(semantics=ModelBuilderSemantics())

Then you add the desired node type as first parameter to each grammar rule:

addition::AddOperator = left:mulexpre '+' right:addition ;

ModelBuilderSemantics will synthesize a class AddOperator(Node): class and use it to construct the node. The synthesized class will have one attribute with the same name as the named elements in the rule.

You can also use Python’s built-in types as node types, and ModelBuilderSemantics will do the right thing:

integer::int = /[0-9]+/ ;

ModelBuilderSemantics acts as any other semantics class, so its default behavior can be overidden by defining a method to handle the result of any particular grammar rule.

Walking Models

The class tatsu.model.NodeWalker allows for the easy traversal (walk) a model constructed with a ModelBuilderSemantics instance:

from tatsu.model import NodeWalker

class MyNodeWalker(NodeWalker):

    def walk_AddOperator(self, node):
        left = self.walk(node.left)
        right = self.walk(node.right)

        print('ADDED', left, right)

model = MyParser(semantics=ModelBuilderSemantics()).parse(input)

walker = MyNodeWalker()

When a method with a name like walk_AddOperator() is defined, it will be called when a node of that type is walked. The pythonic version of the class name may also be used for the walk method: walk__add_operator() (note the double underscore).

If a walk method for a node class is not found, then a method for the class’s bases is searched, so it is possible to write catch-all methods such as:

def walk_Node(self, node):
    print('Reached Node', node)

def walk_str(self, s):
    return s

def walk_object(self, o):
    raise Exception('Unexpected tyle %s walked', type(o).__name__)

Predeclared classes can be passed to ModelBuilderSemantics instances through the types= parameter:

from mymodel import AddOperator, MulOperator

semantics=ModelBuilderSemantics(types=[AddOperator, MulOperator])

ModelBuilderSemantics assumes nothing about types=, so any constructor (a function, or a partial function) can be used.

Model Class Hierarchies

It is possible to specify a a base class for generated model nodes:

    | addition
    | substraction

    left:mulexpre op:'+' right:additive

    left:mulexpre op:'-' right:additive

TatSu will generate the base class if it’s not already known.

Base classes can be used as the target class in walkers, and in code generators:

class MyNodeWalker(NodeWalker):
    def walk_Operator(self, node):
        left = self.walk(node.left)
        right = self.walk(node.right)
        op = self.walk(node.op)

        print(type(node).__name__, op, left, right)

class Operator(ModelRenderer):
    template = '{left} {op} {right}'

Templates and Translation

As of 竜 TatSu 3.2.0, code generation is separated from grammar models through tatsu.codegen.CodeGenerator as to allow for code generation targets different from Python. Still, the use of inline templates and rendering.Renderer hasn’t changed. See the regex example for merged modeling and code generation.

TatSu doesn’t impose a way to create translators with it, but it exposes the facilities it uses to generate the Python source code for parsers.

Translation in 竜 TatSu is template-based, but instead of defining or using a complex templating engine (yet another language), it relies on the simple but powerful string.Formatter of the Python standard library. The templates are simple strings that, in 竜 TatSu’s style, are inlined with the code.

To generate a parser, 竜 TatSu constructs an object model of the parsed grammar. A tatsu.codegen.CodeGenerator instance matches model objects to classes that descend from tatsu.codegen.ModelRenderer and implement the translation and rendering using string templates. Templates are left-trimmed on whitespace, like Python doc-comments are. This is an example taken from 竜 TatSu’s source code:

class Lookahead(ModelRenderer):
    template = '''\
                with self._if():

Every attribute of the object that doesn’t start with an underscore (_) may be used as a template field, and fields can be added or modified by overriding the render_fields(fields) method. Fields themselves are lazily rendered before being expanded by the template, so a field may be an instance of a ModelRenderer descendant.

The rendering module defines a Formatter enhanced to support the rendering of items in an iterable one by one. The syntax to achieve that is:


All of ind, sep, and fmt are optional, but the three colons are not. A field specified that way will be rendered using:

indent(sep.join(fmt % render(v) for v in value), ind)

The extended format can also be used with non-iterables, in which case the rendering will be:

indent(fmt % render(value), ind)

The default multiplier for ind is 4, but that can be overridden using n*m (for example 3*1) in the format.

Using a newline character (\n) as separator will interfere with left trimming and indentation of templates. To use a newline as separator, specify it as \\n, and the renderer will understand the intention.