Translation¶

Translation is one of the most common tasks in language processing. Analysis often sumarizes the parsed input, and walkers are good for that.

竜 TatSu doesn’t impose a way to create translators, but it exposes the facilities it uses to generate the Python source code for parsers.

Print Translation¶

Translation in 竜 TatSu is based on subclasses of NodeWalker. Print-based translation relies on classes that inherit from IndentPrintMixin, a strategy copied from the new PEG parser in Python (see PEP 617).

IndentPrintMixin provides an indent() method, which is a context manager, and should be used thus:

class MyTranslationWalker(NodeWalker, IndentPrintMixin):

    def walk_SomeNodeType(self, node: NodeType):
        self.print('some preamble')
        with self.indent():
            # continue walking the tree
            self.print('something else')

The self.print() method takes note of the current level of indentation, so output will be indented by the indent passed to the IndentPrintMixin constructor, or to the indent(amount: int) method. The mixin keeps as stack of the indent ammounts so it can go back to where it was after each with indent(amount=n): statement:

def walk_SomeNodeType(self, node: NodeType):
    with self.indent(amount=2):
        self.print(node.exp)

The printed code can be retrieved using the printed_text() method, but other posibilities are available by assigning a stream-like object to self.output_stream in the __init__() method.

A good example of how to do code generation with a NodeWalker and IndentPrintMixin is 竜 TatSu’s own code generator, which can be found in tatsu/ngcodegen/python.py, or the model generation found in tatsu/ngcodegen/objectomdel.py.

Declarative Translation (deprecated)¶

竜 TatSu provides support for template-based code generation (“translation”, see below) in the tatsu.codegen module. Code generation works by defining a translation class for each class in the model specified by the grammar.

Nowadays the preferred code generation strategy is to walk down the AST and print() the desired output, with the help of the NodWalker class, and the IndentPrintMixin mixin. That’s the strategy used by pegen, the precursor to the new PEG parser in Python. Please take a lookt at the mini-tutorial for an example.

Basically, the code generation strategy changed from declarative with library support, to procedural, breadth or depth first, using only standard Python. The procedural code must know the AST structure to navigate it, although other strategies are available with PreOrderWalker, DepthFirstWalker, and ContextWalker.

竜 TatSu doesn’t impose a way to create translators with it, but it exposes the facilities it uses to generate the Python source code for parsers.

Translation in 竜 TatSu was template-based, but instead of defining or using a complex templating engine (yet another language), it relies on the simple but powerful string.Formatter of the Python standard library. The templates are simple strings that, in 竜 TatSu’s style, are inlined with the code.

To generate a parser, 竜 TatSu constructs an object model of the parsed grammar. A tatsu.codegen.CodeGenerator instance matches model objects to classes that descend from tatsu.codegen.ModelRenderer and implement the translation and rendering using string templates. Templates are left-trimmed on whitespace, like Python doc-comments are. This is an example taken from 竜 TatSu’s source code:

class Lookahead(ModelRenderer):
    template = '''\
                with self._if():
                {exp:1::}\
                '''

Every attribute of the object that doesn’t start with an underscore (_) may be used as a template field, and fields can be added or modified by overriding the render_fields(fields) method. Fields themselves are lazily rendered before being expanded by the template, so a field may be an instance of a ModelRenderer descendant.

The rendering module defines a Formatter enhanced to support the rendering of items in an iterable one by one. The syntax to achieve that is:

'''
{fieldname:ind:sep:fmt}
'''

All of ind, sep, and fmt are optional, but the three colons are not. A field specified that way will be rendered using:

indent(sep.join(fmt % render(v) for v in value), ind)

The extended format can also be used with non-iterables, in which case the rendering will be:

indent(fmt % render(value), ind)

The default multiplier for ind is 4, but that can be overridden using n*m (for example 3*1) in the format.

note: Using a newline character (\n) as separator will interfere with left trimming and indentation of templates. To use a newline as separator, specify it as \\n, and the renderer will understand the intention.

Translation¶

Print Translation¶

Declarative Translation (deprecated)¶

Table of Contents

Related Topics

This Page