Translation¶
Translation is one of the most common tasks in language processing. Analysis often sumarizes the parsed input, and walkers are good for that.
竜 TatSu doesn’t impose a way to create translators, but it exposes the facilities it uses to generate the Python source code for parsers.
Print Translation¶
Translation in 竜 TatSu is based on subclasses of NodeWalker
. Print-based translation
relies on classes that inherit from IndentPrintMixin
, a strategy copied from
the new PEG parser in Python (see PEP 617).
IndentPrintMixin
provides an indent()
method, which is a context manager,
and should be used thus:
class MyTranslationWalker(NodeWalker, IndentPrintMixin):
def walk_SomeNodeType(self, node: NodeType):
self.print('some preamble')
with self.indent():
# continue walking the tree
self.print('something else')
The self.print()
method takes note of the current level of indentation, so
output will be indented by the indent passed to
the IndentPrintMixin
constructor, or to the indent(amount: int)
method.
The mixin keeps as stack of the indent ammounts so it can go back to where it
was after each with indent(amount=n):
statement:
def walk_SomeNodeType(self, node: NodeType):
with self.indent(amount=2):
self.print(node.exp)
The printed code can be retrieved using the printed_text()
method, but other
posibilities are available by assigning a stream-like object to
self.output_stream
in the __init__()
method.
A good example of how to do code generation with a NodeWalker
and IndentPrintMixin
is 竜 TatSu’s own code generator, which can be
found in tatsu/ngcodegen/python.py
, or the model
generation found in tatsu/ngcodegen/objectomdel.py
.
Declarative Translation (deprecated)¶
竜 TatSu provides support for template-based code generation (“translation”, see below)
in the tatsu.codegen
module.
Code generation works by defining a translation class for each class in the model specified by the grammar.
Nowadays the preferred code generation strategy is to walk down the AST and print() the desired output,
with the help of the NodWalker
class, and the IndentPrintMixin
mixin. That’s the strategy used
by pegen, the precursor to the new PEG parser in Python. Please take a lookt at the
mini-tutorial for an example.
Basically, the code generation strategy changed from declarative with library support, to procedural,
breadth or depth first, using only standard Python. The procedural code must know the AST structure
to navigate it, although other strategies are available with PreOrderWalker
, DepthFirstWalker
,
and ContextWalker
.
竜 TatSu doesn’t impose a way to create translators with it, but it exposes the facilities it uses to generate the Python source code for parsers.
Translation in 竜 TatSu was template-based, but instead of defining or
using a complex templating engine (yet another language), it relies on
the simple but powerful string.Formatter
of the Python standard
library. The templates are simple strings that, in 竜 TatSu’s style,
are inlined with the code.
To generate a parser, 竜 TatSu constructs an object model of the parsed
grammar. A tatsu.codegen.CodeGenerator
instance matches model
objects to classes that descend from tatsu.codegen.ModelRenderer
and
implement the translation and rendering using string templates.
Templates are left-trimmed on whitespace, like Python doc-comments
are. This is an example taken from 竜 TatSu’s source code:
class Lookahead(ModelRenderer):
template = '''\
with self._if():
{exp:1::}\
'''
Every attribute of the object that doesn’t start with an underscore
(_
) may be used as a template field, and fields can be added or
modified by overriding the render_fields(fields)
method. Fields
themselves are lazily rendered before being expanded by the template,
so a field may be an instance of a ModelRenderer
descendant.
The rendering
module defines a Formatter
enhanced to support the
rendering of items in an iterable one by one. The syntax to achieve
that is:
'''
{fieldname:ind:sep:fmt}
'''
All of ind
, sep
, and fmt
are optional, but the three
colons are not. A field specified that way will be rendered using:
indent(sep.join(fmt % render(v) for v in value), ind)
The extended format can also be used with non-iterables, in which case the rendering will be:
indent(fmt % render(value), ind)
The default multiplier for ind
is 4
, but that can be overridden
using n*m
(for example 3*1
) in the format.
- note
Using a newline character (
\n
) as separator will interfere with left trimming and indentation of templates. To use a newline as separator, specify it as\\n
, and the renderer will understand the intention.