.. Copyright (c) 2017-2026 Juancarlo Añez (apalala@gmail.com)
.. SPDX-License-Identifier: BSD-4-Clause

.. include:: links.rst

Models
------

Building Models
~~~~~~~~~~~~~~~

Naming elements in grammar rules makes the parser discard the uninteresting
parts of the input, such as punctuation, from the output. With naming,
|TatSu| produces an *Abstract Syntax Tree* (`AST`_) that reflects the
semantic structure of what was parsed. But an `AST`_ doesn't carry
information about the rule that generated it, so navigating the trees may be
difficult.

|TatSu| defines the ``tatsu.semantics.ModelBuilderSemantics`` semantics
class, which helps construct object models from abstract syntax trees:

.. code:: python

    from tatsu.semantics import ModelBuilderSemantics

    parser = MyParser(semantics=ModelBuilderSemantics())

Builder semantics can also be enabled by passing ``asmodel=True`` to the
``tatsu.compile()`` or ``tatsu.parse()`` functions.

Then add the desired node type as the first parameter of each grammar rule:

.. code:: ebnf
    :force:

    addition[AddOperator]: left:mulexpre '+' right:addition

``ModelBuilderSemantics`` will synthesize a ``class AddOperator(Node):``
class and use it to construct the node. The synthesized class will have one
attribute with the same name as each of the named elements in the rule.

You can also use `Python`_'s built-in types as node types, and
``ModelBuilderSemantics`` will do the right thing:

.. code:: ebnf
    :force:

    integer[int]: /[0-9]+/

``ModelBuilderSemantics`` acts as any other semantics class, so its default
behavior can be overridden by defining a method to handle the result of any
particular grammar rule.

Generating Models
~~~~~~~~~~~~~~~~~

To see what the classes for a grammar look like, use the ``tatsu``
command-line tool to generate a module definition with the required classes:

.. code:: bash

    $ tatsu --object-model mygrammar.tatsu

You can capture the output, or specify the module filename with the
``--object-model-outfile`` option to ``tatsu``.

.. code:: bash

    $ tatsu --object-model-outfile mymodel.py mygrammar.tatsu

|TatSu| will generate a ``mymodel.MyModelBuilderSemantics`` class. An
instance of it can be passed as semantics to the ``parse()`` function to
make it generate objects from the model according to the rule declarations:

.. code:: python

    import tatsu
    import mymodel  # the module generated by the command above

    model = tatsu.parse(
        mygrammar_str,
        text,
        semantics=mymodel.MyModelBuilderSemantics(),
    )

Defining Custom Models
~~~~~~~~~~~~~~~~~~~~~~

|TatSu| allows any definition of model classes:

.. code:: python

    class Expression:
        ...

    class Addition(Expression):
        ...

Model classes that are not subclasses of ``objectmodel.Node`` lose
functionality (no ``node.children()``, ``node.parseinfo``, ``node.parent``,
and so on). For complete functionality it's better if custom model classes
inherit from ``objectmodel.Node`` and are defined as ``@tatsudataclass`` so
they are configured the |TatSu| way:

.. code:: python

    from tatsu.objectmodel import Node, tatsudataclass

    @tatsudataclass
    class Expression(Node):
        ...

    @tatsudataclass
    class Addition(Expression):
        ...

Once the custom model classes are defined, |TatSu|'s entry points need to
know about them, and there are flexible ways to do that:

.. code:: python

    import tatsu
    from . import model

    # Map each node type name used in the grammar to its class
    ct = {
        'Expression': model.Expression,
        'Addition': model.Addition,
    }
    result = tatsu.parse(grammar_str, text, constructors=ct)

.. code:: python

    import tatsu
    from tatsu.builder import types_defined_in

    # Collect the types defined in the current module's namespace
    ct = types_defined_in(globals())
    result = tatsu.parse(grammar_str, text, constructors=ct)

.. code:: python

    import tatsu
    from tatsu.builder import types_defined_in
    from . import model

    # Collect the types defined in another module
    ct = types_defined_in(model)
    result = tatsu.parse(grammar_str, text, constructors=ct)

.. code:: python

    import tatsu
    from . import model

    # Pass the module that defines the model classes
    result = tatsu.parse(grammar_str, text, typedefs=model)

.. code:: python

    import tatsu
    from . import model

    # Compile the grammar once, then parse with the compiled model
    grammar_model = tatsu.compile(grammar_str, typedefs=model)
    result = grammar_model.parse(text)

Passing ``constructors=`` or ``typedefs=`` to the |TatSu| API implies that a
model instead of an `AST`_ is being requested (``asmodel=True``).

To see what ``@tatsudataclass`` does, look at
``objectmodel.TatSuDataclassParams`` for the ``dataclass`` parameters it
uses.

Viewing Models as JSON
~~~~~~~~~~~~~~~~~~~~~~

Models generated by |TatSu| can be viewed by converting them to a
JSON-compatible structure with the help of ``tatsu.util.asjson()``. The
protocol tries to provide the best representation for common types, and can
handle any type using ``repr()``. Back references are handled to prevent
infinite recursion.

.. code:: python

    import json

    from tatsu.util import asjson

    print(json.dumps(asjson(model), indent=2))

The ``model``, with richer semantics, remains unaltered.

Conversion to a JSON-compatible structure relies on the protocol defined by
``tatsu.utils.asjson.AsJSONMixin``. The mixin defines a ``__json__()``
method that allows classes to define their best translation. You can use
``AsJSONMixin`` as a base class in your own models to take advantage of
``asjson()``, and you can specialize the conversion by overriding
``AsJSONMixin.__json__()``.

.. code:: python

    def __json__(self, seen: set[int] | None = None) -> Any:
        return None  # should not be rendered as JSON

The ``AsJSONMixin`` implementation of ``__json__()`` decides what goes into
the JSON representation by calling the ``__pub__()`` method. The default
implementation of ``__pub__()`` returns the contents of ``vars(self)``,
filtering out ``(name, value)`` items when:

* ``name`` starts with an underscore
* ``value`` is a method that is not also a ``property``

An easy way to restrict what goes into the JSON output is to override the
``__pub__()`` method in classes that inherit from ``AsJSONMixin``.

.. code:: python

    def __pub__(self) -> dict[str, Any]:
        # Drop the entries whose name starts with an uppercase letter
        return {
            name: value
            for name, value in super().__pub__().items()
            if not name[0].isupper()
        }

You can also write your own version of ``asjson()`` to handle special cases
that are recurrent in your context.

Walking Models
~~~~~~~~~~~~~~

The class ``tatsu.walkers.NodeWalker`` allows for the easy traversal
(*walk*) of a model constructed with a ``ModelBuilderSemantics`` instance:

.. code:: python

    from tatsu.semantics import ModelBuilderSemantics
    from tatsu.walkers import NodeWalker

    class MyNodeWalker(NodeWalker):

        def walk_AddOperator(self, node):
            # Walk the operands before reporting the operation
            left = self.walk(node.left)
            right = self.walk(node.right)

            print('ADDED', left, right)

    model = MyParser(semantics=ModelBuilderSemantics()).parse(text)

    walker = MyNodeWalker()
    walker.walk(model)

When a method with a name like ``walk_AddOperator()`` is defined, it will be
called when a node of that type is *walked*. The *pythonic* version of the
class name may also be used for the *walk* method:
``walk__add_operator()`` (note the double underscore).

If a *walk* method for a node class is not found, then a method for the
class's bases is searched for. That makes it possible to write *catch-all*
methods such as:

.. code:: python

    def walk_Node(self, node):
        print('Reached Node', node)

    def walk_str(self, s):
        return s

    def walk_object(self, o):
        raise Exception(f'Unexpected type {type(o).__name__} walked')

Which nodes get *walked* is up to the ``NodeWalker`` implementation. Some
strategies for walking *all* or *most* nodes are implemented as classes in
``tatsu.walkers``, such as ``PreOrderWalker`` and ``DepthFirstWalker``.

Sometimes nodes must be walked more than once for the purpose at hand, and
it's up to the walker how and when to do that.

Take a look at ``tatsu.ngcodegen.PythonParserGenerator`` for the walker that
generates a parser in Python from the model of a parsed grammar.

Model Class Hierarchies
~~~~~~~~~~~~~~~~~~~~~~~

It's possible to specify a base class for generated model nodes:

.. code:: ebnf
    :force:

    additive:
        | addition
        | substraction

    addition::AddOperator::Operator:
        left:mulexpre op:'+' right:additive

    substraction::SubstractOperator::Operator:
        left:mulexpre op:'-' right:additive

|TatSu| will generate the base class if it's not already known.

Base classes can be used as the target class in *walkers*, and in *code
generators*:

.. code:: python

    from tatsu.walkers import NodeWalker

    class MyNodeWalker(NodeWalker):
        def walk_Operator(self, node):
            # Called for AddOperator and SubstractOperator nodes alike
            left = self.walk(node.left)
            right = self.walk(node.right)
            op = self.walk(node.op)

            print(type(node).__name__, op, left, right)
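
The lookup of a *walk* method through a node's base classes can be
illustrated without |TatSu|. The sketch below is a simplified stand-in and
not the library's implementation: ``MiniWalker``, ``Evaluator``, and the
node classes are hypothetical, and the lookup is assumed to follow
`Python`_'s method resolution order.

.. code:: python

    # Hypothetical stand-ins for generated model classes (not TatSu's own).
    class Node:
        pass

    class Operator(Node):
        def __init__(self, left, op, right):
            self.left = left
            self.op = op
            self.right = right

    class AddOperator(Operator):
        pass

    class SubstractOperator(Operator):
        pass

    class MiniWalker:
        """Assumed dispatch: try walk_<ClassName> for the class, then its bases."""

        def walk(self, node):
            for cls in type(node).__mro__:
                method = getattr(self, f'walk_{cls.__name__}', None)
                if method is not None:
                    return method(node)
            return node  # no walk method found: return the value unchanged

    class Evaluator(MiniWalker):
        def walk_Operator(self, node):
            # One method handles every subclass of Operator
            left = self.walk(node.left)
            right = self.walk(node.right)
            return left + right if node.op == '+' else left - right

    # 10 - (2 + 3), nested to the right like the ``additive`` rule
    tree = SubstractOperator(10, '-', AddOperator(2, '+', 3))
    result = Evaluator().walk(tree)

Neither ``walk_AddOperator()`` nor ``walk_SubstractOperator()`` is defined
here, so both node types fall through to ``walk_Operator()``. Plain
integers match no *walk* method and are returned as they are.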