Declarative Translation¶
Translation is one of the most common tasks in language processing. Analysis often sumarizes the parsed input, and walkers are good for that. In translation, the output can often be as verbose as the input, so a systematic approach that avoids bookkeeping as much as possible is convenient.
竜 TatSu provides support for template-based code generation (“translation”, see below)
in the tatsu.codegen
module.
Code generation works by defining a translation class for each class in the model specified by the grammar.
Nowadays the preferred code generation strategy is to walk down the AST and print() the desired output,
with the help of the NodWalker
class, and the IndentPrintMixin
mixin. That’s the strategy used
by pegen, the precursor to the new PEG parser in Python. Please take a lookt at the
mini-tutorial for an example.
Basically, the code generation strategy changed from declarative with library support, to procedural,
breadth or depth first, using only standard Python. The procedural code must know the AST structure
to navigate it, although other strategies are available with PreOrderWalker
, DepthFirstWalker
,
and ContextWalker
.
deprecated
竜 TatSu doesn’t impose a way to create translators with it, but it exposes the facilities it uses to generate the Python source code for parsers.
Translation in 竜 TatSu was template-based, but instead of defining or
using a complex templating engine (yet another language), it relies on
the simple but powerful string.Formatter
of the Python standard
library. The templates are simple strings that, in 竜 TatSu’s style,
are inlined with the code.
To generate a parser, 竜 TatSu constructs an object model of the parsed
grammar. A tatsu.codegen.CodeGenerator
instance matches model
objects to classes that descend from tatsu.codegen.ModelRenderer
and
implement the translation and rendering using string templates.
Templates are left-trimmed on whitespace, like Python doc-comments
are. This is an example taken from 竜 TatSu’s source code:
class Lookahead(ModelRenderer):
template = '''\
with self._if():
{exp:1::}\
'''
Every attribute of the object that doesn’t start with an underscore
(_
) may be used as a template field, and fields can be added or
modified by overriding the render_fields(fields)
method. Fields
themselves are lazily rendered before being expanded by the template,
so a field may be an instance of a ModelRenderer
descendant.
The rendering
module defines a Formatter
enhanced to support the
rendering of items in an iterable one by one. The syntax to achieve
that is:
'''
{fieldname:ind:sep:fmt}
'''
All of ind
, sep
, and fmt
are optional, but the three
colons are not. A field specified that way will be rendered using:
indent(sep.join(fmt % render(v) for v in value), ind)
The extended format can also be used with non-iterables, in which case the rendering will be:
indent(fmt % render(value), ind)
The default multiplier for ind
is 4
, but that can be overridden
using n*m
(for example 3*1
) in the format.
- note
Using a newline character (
\n
) as separator will interfere with left trimming and indentation of templates. To use a newline as separator, specify it as\\n
, and the renderer will understand the intention.