Using the Tool¶
As a Library¶
竜 TatSu can be used as a library, much like Python’s re
, by embedding grammars as strings and generating grammar models instead of generating Python code.
tatsu.compile(grammar, name=None, **kwargs)
Compiles the grammar and generates a model that can subsequently be used for parsing input with.
tatsu.parse(grammar, input, start=None, **kwargs)
Compiles the grammar and parses the given input producing an AST as result. The result is equivalent to calling:
model = compile(grammar) ast = model.parse(input)
Compiled grammars are cached for efficiency.
tatsu.to_python_sourcecode(grammar, name=None, filename=None, **kwargs)
Compiles the grammar to the Python sourcecode that implements the parser.
to_python_model(grammar, name=None, filename=None, **kwargs)
Compiles the grammar and generates the Python sourcecode that implements the object model defined by rule annotations.
This is an example of how to use Tatsu as a library:
GRAMMAR = '''
@@grammar::Calc
start = expression $ ;
expression
=
| term '+' ~ expression
| term '-' ~ expression
| term
;
term
=
| factor '*' ~ term
| factor '/' ~ term
| factor
;
factor
=
| '(' ~ @:expression ')'
| number
;
number = /\d+/ ;
'''
def main():
import pprint
import json
from tatsu import parse
from tatsu.util import asjson
ast = parse(GRAMMAR, '3 + 5 * ( 10 - 20 )')
print('PPRINT')
pprint.pprint(ast, indent=2, width=20)
print()
print('JSON')
print(json.dumps(asjson(ast), indent=2))
print()
if __name__ == '__main__':
main()
And this is the output:
PPRINT
[ '3',
'+',
[ '5',
'*',
[ '10',
'-',
'20']]]
JSON
[
"3",
"+",
[
"5",
"*",
[
"10",
"-",
"20"
]
]
]
Compiling grammars to Python¶
Tatsu can be run from the command line:
$ python -m tatsu
Or:
$ scripts/tatsu
Or just:
$ tatsu
if Tatsu was installed using easy_install or pip.
The -h and –help parameters provide full usage information:
$ python -m tatsu -h
usage: tatsu [--generate-parser | --draw | --object-model | --pretty]
[--color] [--trace] [--no-left-recursion] [--name NAME]
[--no-nameguard] [--outfile FILE] [--object-model-outfile FILE]
[--whitespace CHARACTERS] [--help] [--version]
GRAMMAR
TatSu takes a grammar in a variation of EBNF as input, and outputs a memoizing
PEG/Packrat parser in Python.
positional arguments:
GRAMMAR the filename of the Tatsu grammar to parse
optional arguments:
--generate-parser generate parser code from the grammar (default)
--draw, -d generate a diagram of the grammar (requires --outfile)
--object-model, -g generate object model from the class names given as
rule arguments
--pretty, -p generate a prettified version of the input grammar
parse-time options:
--color, -c use color in traces (requires the colorama library)
--trace, -t produce verbose parsing output
generation options:
--no-left-recursion, -l
turns left-recursion support off
--name NAME, -m NAME Name for the grammar (defaults to GRAMMAR base name)
--no-nameguard, -n allow tokens that are prefixes of others
--outfile FILE, --output FILE, -o FILE
output file (default is stdout)
--object-model-outfile FILE, -G FILE
generate object model and save to FILE
--whitespace CHARACTERS, -w CHARACTERS
characters to skip during parsing (use "" to disable)
common options:
--help, -h show this help message and exit
--version, -v provide version information and exit
$
The Generated Parsers¶
A Tatsu generated parser consists of the following classes:
A
MyLanguageBuffer
class derived fromtatsu.buffering.Buffer
that handles the grammar definitions for whitespace, comments, and case significance.A
MyLanguageParser
class derived fromtatsu.parsing.Parser
which uses aMyLanguageBuffer
for traversing input text, and implements the parser using one method for each grammar rule:
def _somerulename_(self):
...
A
MyLanguageSemantics
class with one semantic method per grammar rule. Each method receives as its single parameter the Abstract Syntax Tree (AST) built from the rule invocation:
def somerulename(self, ast):
return ast
A
if __name__ == '__main__':
definition, so the generated parser can be executed as a Python script.
The methods in the delegate class return the same AST received as parameter, but custom semantic classes can override the methods to have them return anything (for example, a Semantic Graph). The semantics class can be used as a template for the final semantics implementation, which can omit methods for the rules that do not need semantic treatment.
If present, a _default()
method will be called in the semantics
class when no method matched the rule name:
def _default(self, ast):
...
return ast
If present, a _postproc()
method will be called in the semantics
class after each rule (including the semantics) is processed. This
method will receive the current parsing context as parameter:
def _postproc(self, context, ast):
...
Using the Generated Parser¶
To use the generated parser, just subclass the base or the abstract
parser, create an instance of it, and invoke its parse()
method
passing the grammar to parse and the starting rule’s name as parameter:
from tatsu.util import asjson
from myparser import MyParser
parser = MyParser()
ast = parser.parse('text to parse', start='start')
print(ast)
print(json.dumps(asjson(ast), indent=2))
The generated parsers’ constructors accept named arguments to specify whitespace characters, the regular expression for comments, case sensitivity, verbosity, and more (see below).
To add semantic actions, just pass a semantic delegate to the parse method:
model = parser.parse(text, start='start', semantics=MySemantics())
If special lexical treatment is required (as in 80 column languages),
then a descendant of tatsu.tokenizing.Tokenizer
can be passed instead of
the text:
class MySpecialTokenizer(Tokenizer):
...
tokenizer = MySpecialTokenizer(text)
model = parser.parse(tokenizer, start='start', semantics=MySemantics())
The generated parser’s module can also be invoked as a script:
$ python myparser.py inputfile startrule
As a script, the generated parser’s module accepts several options:
$ python myparser.py -h
usage: myparser.py [-h] [-c] [-l] [-n] [-t] [-w WHITESPACE] FILE [STARTRULE]
Simple parser for DBD.
positional arguments:
FILE the input file to parse
STARTRULE the start rule for parsing
optional arguments:
-h, --help show this help message and exit
-c, --color use color in traces (requires the colorama library)
-l, --list list all rules and exit
-n, --no-nameguard disable the 'nameguard' feature
-t, --trace output trace information
-w WHITESPACE, --whitespace WHITESPACE
whitespace specification