Models#
Building Models#
Naming elements in grammar rules makes the parser discard the uninteresting parts of the input, such as punctuation, from the output. With naming, 竜 TatSu produces an Abstract Syntax Tree (AST) that reflects the semantic structure of what was parsed. But an AST doesn’t carry information about the rule that generated it, so navigating the tree may be difficult.
竜 TatSu defines the tatsu.semantics.ModelBuilderSemantics semantics
class which helps construct object models from abstract syntax trees:
from tatsu.semantics import ModelBuilderSemantics
parser = MyParser(semantics=ModelBuilderSemantics())
Builder semantics are enabled by passing asmodel=True to the
tatsu.compile() or tatsu.parse() functions.
Then you add the desired node type as the first parameter to each grammar rule:
addition[AddOperator]: left:mulexpre '+' right:addition
ModelBuilderSemantics will synthesize an AddOperator(Node)
class and use it to construct the node. The synthesized class will have
one attribute with the same name as each of the named elements in the rule.
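As a rough illustration of what synthesizing a node class involves, the sketch below builds a class from a type name and a list of named elements using only the standard library. It is not 竜 TatSu's implementation: the Node class here is a stand-in, and synthesize_node_class is a hypothetical helper introduced for this example.

```python
from dataclasses import field, make_dataclass


class Node:
    """Stand-in base class for this sketch (not tatsu.objectmodel.Node)."""


def synthesize_node_class(name, element_names, base=Node):
    """Hypothetical helper: build a class with one attribute per named element."""
    # Every attribute defaults to None, so elements that did not match are allowed.
    fields = [(element, object, field(default=None)) for element in element_names]
    return make_dataclass(name, fields, bases=(base,))


# Mirrors the rule: addition[AddOperator]: left:mulexpre '+' right:addition
AddOperator = synthesize_node_class('AddOperator', ['left', 'right'])
node = AddOperator(left=1, right=2)
print(type(node).__name__, node.left, node.right)
```

The point of the sketch is only that the class name comes from the rule's parameter and the attribute names come from the named elements.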
You can also use Python’s built-in types as node types, and
ModelBuilderSemantics will do the right thing:
integer[int]: /[0-9]+/
ModelBuilderSemantics acts as any other semantics class, so its
default behavior can be overridden by defining a method to handle the
result of any particular grammar rule.
Generating Models#
To see what the classes for the grammar look like, use the tatsu command-line
tool, which will generate a module definition with the required classes:
$ tatsu --object-model mygrammar.tatsu
You can capture the output, or specify the module filename with the
--object-model-outfile option to tatsu:
$ tatsu --object-model-outfile mymodel.py mygrammar.tatsu
竜 TatSu will generate a mymodel.MyModelBuilderSemantics that can be
passed as semantics to the parse() function to make it generate objects
from the model according to rule declarations:
model = tatsu.parse(
    mygrammar_str,
    text,
    semantics=mymodel.MyModelBuilderSemantics(),
)
Defining Custom Models#
竜 TatSu allows any definition of model classes:
class Expression:
    ...

class Addition(Expression):
    ...
There’s loss of functionality if model classes are not subclasses of
objectmodel.Node (no node.children(), node.parseinfo,
node.parent, ...). For complete functionality it’s better if custom
model classes inherit from objectmodel.Node and are defined as
@tatsudataclass so they are configured the 竜 TatSu way:
from dataclasses import dataclass
from tatsu.objectmodel import Node, tatsudataclass

@tatsudataclass
class Expression(Node):
    ...

@tatsudataclass
class Addition(Expression):
    ...
Once the custom model classes are defined, 竜 TatSu’s entry points need to know about them, and there are flexible ways to do that:
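import tatsu

# Option 1: an explicit mapping from type names to classes
from . import model
ct = {
    'Expression': model.Expression,
    'Addition': model.Addition,
}
result = tatsu.parse(grammar_str, text, constructors=ct)

# Option 2: collect the types found in the current namespace
from tatsu.builder import types_defined_in
ct = types_defined_in(globals())
result = tatsu.parse(grammar_str, text, constructors=ct)

# Option 3: collect the types defined in another module
from tatsu.builder import types_defined_in
from . import model
ct = types_defined_in(model)
result = tatsu.parse(grammar_str, text, constructors=ct)

# Option 4: pass the module with the type definitions directly
from . import model
result = tatsu.parse(grammar_str, text, typedefs=model)

# Option 5: compile the grammar once with the type definitions
from . import model
grammar_model = tatsu.compile(grammar_str, typedefs=model)
result = grammar_model.parse(text)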
Passing constructors= or typedefs= to the 竜 TatSu API implies that
a model instead of an AST is being requested (asmodel=True).
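The constructors argument is a mapping from type names to classes. As a minimal sketch of how such a mapping could be gathered from a namespace, the example below uses a hypothetical collect_types helper written with the standard library only. It is an assumption about the general idea and does not reproduce what tatsu.builder.types_defined_in actually does.

```python
import types


def collect_types(namespace):
    """Hypothetical helper: map class names to the classes found in a namespace."""
    if isinstance(namespace, types.ModuleType):
        namespace = vars(namespace)  # accept a module as well as a dict
    return {
        name: obj
        for name, obj in namespace.items()
        if isinstance(obj, type) and not name.startswith('_')
    }


class Expression:
    ...


class Addition(Expression):
    ...


# Only the classes are kept; the integer entry is ignored.
ct = collect_types({'Expression': Expression, 'Addition': Addition, 'limit': 10})
print(sorted(ct))
```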
To know what @tatsudataclass means, you can take a look at
objectmodel.TatSuDataclassParams for the used dataclass parameters.
Viewing Models as JSON#
Models generated by 竜 TatSu can be viewed by converting them to a
JSON-compatible structure with the help of tatsu.util.asjson().
The protocol tries to provide the best representation for common types,
and can handle any type using repr(). Back references are handled to
prevent infinite recursion.
import json
print(json.dumps(asjson(model), indent=2))
The model, with richer semantics, remains unaltered.
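To make the handling of back references concrete, here is a simplified sketch of a converter that tracks the objects on the current path and stops when it meets one again. The to_jsonable function and the Tree class are hypothetical stand-ins written for this example, and the placeholder string used for a back reference is an arbitrary choice, not the output format of asjson().

```python
import json


def to_jsonable(obj, seen=None):
    """Simplified stand-in: convert obj to JSON-compatible data, guarding against cycles."""
    seen = set() if seen is None else seen
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if id(obj) in seen:
        # Already on the current path: emit a marker instead of recursing forever.
        return f'<{type(obj).__name__}@back-reference>'
    seen = seen | {id(obj)}
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v, seen) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_jsonable(v, seen) for v in obj]
    if hasattr(obj, '__dict__'):
        return {
            k: to_jsonable(v, seen)
            for k, v in vars(obj).items()
            if not k.startswith('_')
        }
    return repr(obj)  # fallback for any other type


class Tree:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []


root = Tree('root')
leaf = Tree('leaf')
leaf.parent = root  # back reference that would otherwise recurse without end
root.children.append(leaf)

print(json.dumps(to_jsonable(root), indent=2))
```

The original objects are not modified: the converter only reads them and builds new dictionaries and lists.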
Conversion to a JSON-compatible structure relies on the protocol defined by
tatsu.utils.asjson.AsJSONMixin. The mixin defines a __json__()
method that allows classes to define their best translation.
You can use AsJSONMixin as a base class in your own models to take advantage
of asjson(), and you can specialize the conversion by overriding AsJSONMixin.__json__().
def __json__(self, seen: set[int] | None = None) -> Any:
    return None  # should not be rendered as JSON
The AsJSONMixin implementation of __json__() decides what goes into
the JSON representation by calling the __pub__() method. The default
implementation of __pub__() returns the contents of vars(self),
filtering out (name, value) items when:
- name starts with an underscore
- value is a method that is not also a property
An easy way to restrict what goes into the JSON output is to override
the __pub__() method in classes that inherit from AsJSONMixin.
def __pub__(self) -> dict[str, Any]:
    # __pub__() returns a dict, so iterate over its items
    return {
        name: value for name, value in super().__pub__().items()
        if not name[0].isupper()
    }
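The filtering can be tried out without 竜 TatSu using a small stand-in. In the sketch below, PubMixin and Token are hypothetical classes, and the default rule is approximated by dropping underscore-prefixed names and callable values, which is a simplification of the "method that is not also a property" condition.

```python
from typing import Any


class PubMixin:
    """Simplified stand-in for this sketch (not TatSu's AsJSONMixin)."""

    def __pub__(self) -> dict[str, Any]:
        # Default rule: drop private names and callable values.
        return {
            name: value
            for name, value in vars(self).items()
            if not name.startswith('_') and not callable(value)
        }


class Token(PubMixin):
    def __init__(self, text: str, line: int) -> None:
        self.text = text
        self.line = line
        self._cache = {}       # private: dropped by the default rule
        self.on_error = print  # callable: dropped by the default rule
        self.Source = 'demo'   # capitalized: dropped by the override below

    def __pub__(self) -> dict[str, Any]:
        # Further restrict the default selection to lowercase-initial names.
        return {
            name: value
            for name, value in super().__pub__().items()
            if not name[0].isupper()
        }


token = Token('+', 3)
print(token.__pub__())
```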
You can also write your own version of asjson() to handle special cases that are recurrent
in your context.
Walking Models#
The class tatsu.walkers.NodeWalker allows for the easy traversal
(walk) of a model constructed with a ModelBuilderSemantics instance:
from tatsu.semantics import ModelBuilderSemantics
from tatsu.walkers import NodeWalker

class MyNodeWalker(NodeWalker):
    def walk_AddOperator(self, node):
        left = self.walk(node.left)
        right = self.walk(node.right)
        print('ADDED', left, right)

# MyParser is the generated parser class; text holds the input to parse
model = MyParser(semantics=ModelBuilderSemantics()).parse(text)
walker = MyNodeWalker()
walker.walk(model)
When a method with a name like walk_AddOperator() is defined, it
will be called when a node of that type is walked. The pythonic
version of the class name may also be used for the walk method:
walk__add_operator() (note the double underscore).
If a walk method for a node class is not found, then a method for one of the class’s base classes is searched for. That makes it possible to write catch-all methods such as:
def walk_Node(self, node):
    print('Reached Node', node)

def walk_str(self, s):
    return s

def walk_object(self, o):
    raise Exception(f'Unexpected type {type(o).__name__} walked')
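The lookup described above, by class name, by pythonic name, and then through the base classes, can be sketched with the standard library alone. TinyWalker, pythonic() and the node classes below are hypothetical names for this illustration. This is a simplified model of the dispatch, not the code in tatsu.walkers.

```python
import re


class Node:
    """Stand-in base class for this sketch."""


class Operator(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right


class AddOperator(Operator):
    pass


def pythonic(name):
    # 'AddOperator' -> 'add_operator'
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


class TinyWalker:
    def walk(self, node):
        # Try the node's class first, then each base class in MRO order.
        for cls in type(node).__mro__:
            for method_name in (
                f'walk_{cls.__name__}',
                f'walk__{pythonic(cls.__name__)}',
            ):
                method = getattr(self, method_name, None)
                if callable(method):
                    return method(node)
        return None


class Evaluator(TinyWalker):
    def walk__add_operator(self, node):  # pythonic name, double underscore
        return self.walk(node.left) + self.walk(node.right)

    def walk_int(self, n):
        return n

    def walk_object(self, o):  # catch-all reached through the MRO
        raise TypeError(f'Unexpected type {type(o).__name__} walked')


print(Evaluator().walk(AddOperator(1, AddOperator(2, 3))))
```

Because object is the last entry of every MRO, walk_object acts as the final fallback for any value without a more specific method.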
Which nodes get walked is up to the NodeWalker implementation. Some
strategies for walking all or most nodes are implemented as classes
in tatsu.walkers, such as PreOrderWalker and DepthFirstWalker.
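The difference between traversal strategies comes down to when a node is visited relative to its children. The sketch below shows the two orders on a hypothetical Tree class. Mapping them to PreOrderWalker (node first) and DepthFirstWalker (children first) is an assumption based on the class names, so check the tatsu.walkers source for the exact behavior.

```python
class Tree:
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def parent_first(node, visit):
    # Visit the node, then descend into its children.
    visit(node)
    for child in node.children:
        parent_first(child, visit)


def children_first(node, visit):
    # Descend into the children, then visit the node.
    for child in node.children:
        children_first(child, visit)
    visit(node)


tree = Tree('a', Tree('b', Tree('c')), Tree('d'))

pre, post = [], []
parent_first(tree, lambda n: pre.append(n.name))
children_first(tree, lambda n: post.append(n.name))

print(pre)   # ['a', 'b', 'c', 'd']
print(post)  # ['c', 'b', 'd', 'a']
```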
Sometimes nodes must be walked more than once for the purpose at hand, and it’s up to the walker how and when to do that.
Take a look at tatsu.ngcodegen.PythonParserGenerator for the walker that
generates a parser in Python from the model of a parsed grammar.
Model Class Hierarchies#
It’s possible to specify a base class for generated model nodes:
additive:
    | addition
    | substraction

addition::AddOperator::Operator:
    left:mulexpre op:'+' right:additive

substraction::SubstractOperator::Operator:
    left:mulexpre op:'-' right:additive
竜 TatSu will generate the base class if it’s not already known.
Base classes can be used as the target class in walkers, and in code generators:
class MyNodeWalker(NodeWalker):
    def walk_Operator(self, node):
        left = self.walk(node.left)
        right = self.walk(node.right)
        op = self.walk(node.op)
        print(type(node).__name__, op, left, right)