Using the Tool
As a Library
竜 TatSu can be used as a library, much like Python’s re, by embedding grammars as strings and generating grammar models instead of generating Python code.
tatsu.compile(grammar, name=None, **settings)
    Compiles the grammar and generates a model that can subsequently be used to parse input.
tatsu.parse(grammar, input, start=None, **settings)
    Compiles the grammar and parses the given input, producing an AST as the result. The result is equivalent to calling:
        model = compile(grammar)
        ast = model.parse(input)
    Compiled grammars are cached for efficiency.
tatsu.to_python_model(grammar, name=None, filename=None, **settings)
    Compiles the grammar and generates the Python source code that implements the object model defined by rule annotations.
tatsu.to_parsermodel_sourcecode(grammar, name=None, filename=None, **settings)
    Compiles the grammar to the Python source code for a recursive-descent implementation of the parser.
tatsu.to_python_sourcecode(grammar, name=None, filename=None, **settings)
    Compiles the grammar to the Python source code for a recursive-descent implementation of the parser.
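As a sketch of the compile-once pattern, assuming 竜 TatSu is installed: the small grammar below is illustrative (it is not from this page) and uses TatSu's classic rule = ... ; syntax. It is compiled once with tatsu.compile() and the resulting model is reused for several inputs, while tatsu.parse() produces the same AST in a single call.

```python
import tatsu

# Illustrative grammar (an assumption, not from the text above),
# written in TatSu's classic "rule = ... ;" syntax.
GRAMMAR = r'''
    start = addition $ ;
    addition = number '+' number ;
    number = /\d+/ ;
'''

# Compile once, then reuse the model for as many inputs as needed.
model = tatsu.compile(GRAMMAR)
first = model.parse('1 + 2')
second = model.parse('30 + 12')

# tatsu.parse() compiles (using its grammar cache) and parses in one call.
same = tatsu.parse(GRAMMAR, '1 + 2')

print(first)
print(second)
print(same)
```

Reusing the compiled model avoids recompiling the grammar for every input, which matters most when many inputs are parsed.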
This is an example of how to use TatSu as a library:
GRAMMAR = '''
@@grammar[Calc]
start: expression $
expression:
| term '+' ~ expression
| term '-' ~ expression
| term
term:
| factor '*' ~ term
| factor '/' ~ term
| factor
factor:
| '(' ~ @:expression ')'
| number
number = /\d+/
'''
def main():
    import pprint
    import json
    from tatsu import parse
    from tatsu.util import asjson
    ast = parse(GRAMMAR, '3 + 5 * ( 10 - 20 )')
    print('PPRINT')
    pprint.pprint(ast, indent=2, width=20)
    print()
    print('JSON')
    print(json.dumps(asjson(ast), indent=2))
    print()
if __name__ == '__main__':
    main()
And this is the output:
PPRINT
[ '3',
  '+',
  [ '5',
    '*',
    [ '10',
      '-',
      '20']]]
JSON
[
  "3",
  "+",
  [
    "5",
    "*",
    [
      "10",
      "-",
      "20"
    ]
  ]
]
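The nested lists above are ordinary Python data, so they can be processed directly. A minimal sketch, using only the standard library and the AST printed above, evaluates the tree by recursion; using Fraction for exact division is a choice made here, not something the grammar requires. Because the expression and term rules recurse on their right-hand side, a chain such as 10 - 2 - 3 nests as 10 - (2 - 3), and the sketch follows the tree exactly as given.

```python
import operator
from fractions import Fraction

# The AST printed above for '3 + 5 * ( 10 - 20 )'.
AST = ['3', '+', ['5', '*', ['10', '-', '20']]]

# Map each operator token of the Calc grammar to a Python function.
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,  # exact division on Fraction operands
}


def evaluate(node):
    """Recursively evaluate number strings and [left, op, right] triples."""
    if isinstance(node, str):
        return Fraction(node)  # a `number` leaf such as '10'
    left, op, right = node  # an `expression` or `term` node
    return OPS[op](evaluate(left), evaluate(right))


print(evaluate(AST))  # prints -47
```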
Compiling grammars to Python
TatSu can be run from the command line:
$ python -m tatsu
Or:
$ scripts/tatsu
Or just:
$ tatsu
if TatSu was installed using easy_install or pip.
The -h and --help parameters provide full usage information:
$ tatsu --help
usage: tatsu [--generate-parser | --draw | --railroad | --object-model |
             --pretty | --pretty-lean] [--color] [--trace] [--left-recursion]
             [--name NAME] [--nameguard] [--outfile FILE]
             [--object-model-outfile FILE] [--whitespace CHARACTERS]
             [--base-type CLASSPATH] [--help] [--version]
             GRAMMAR
竜TatSu takes a grammar in extended EBNF as input, and outputs a memoizing
PEG/Packrat parser in Python.
positional arguments:
  GRAMMAR               the filename of the TatSu grammar to parse
options:
  --generate-parser     generate parser code from the grammar (default)
  --draw, -d            generate a diagram of the grammar (.svg, .png, .jpeg,
                        .dot, ... / requres --outfile)
  --railroad, -r        output a railroad diagram of the grammar in ASCII/Text
                        Art
  --object-model, -g    generate object model from the class names given as
                        rule arguments
  --pretty, -p          generate a prettified version of the input grammar
  --pretty-lean         like --pretty, but without name: or [Parameter]
                        annotations
parse-time options:
  --color, -c           use color in traces (requires the colorama library)
  --trace, -t           produce verbose parsing output
generation options:
  --left-recursion, -l  turns left-recursion support on
  --name, -m NAME       Name for the grammar (defaults to GRAMMAR base name)
  --nameguard, -n       allow tokens that are prefixes of others
  --outfile, --output, -o FILE
                        output file (default is stdout)
  --object-model-outfile, -G FILE
                        generate object model and save to FILE
  --whitespace, -w CHARACTERS
                        characters to skip during parsing (use "" to disable)
  --base-type CLASSPATH
                        class to use as base type for the object model, for
                        example "mymodule.MyNode"
common options:
  --help, -h            show this help message and exit
  --version, -V         provide version information and exit
$
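The default --generate-parser mode has a library counterpart in tatsu.to_python_sourcecode(), listed earlier. The sketch below assumes 竜 TatSu is installed and uses a tiny illustrative grammar; it generates the parser source as a string and uses Python's built-in compile() (not tatsu.compile()) only to check that the result is valid Python. That the output matches the command-line tool's byte for byte is an assumption not verified here.

```python
import tatsu

# Tiny illustrative grammar (an assumption, not from the text above).
GRAMMAR = r'''
    start = number $ ;
    number = /\d+/ ;
'''

# Generate the recursive-descent parser as Python source text.
source = tatsu.to_python_sourcecode(GRAMMAR, name='Calc')

# Built-in compile(): a syntax check only; the generated module is not run.
code = compile(source, 'calcparser.py', 'exec')

# To keep the parser, write it to a module file, for example:
# pathlib.Path('calcparser.py').write_text(source)
```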
The Generated Parsers
A TatSu-generated parser consists of the following components:
- A MyLanguageBuffer class, derived from tatsu.buffering.Buffer, that handles the grammar definitions for whitespace, comments, and case significance.
- A MyLanguageParser class, derived from tatsu.parsing.Parser, which uses a MyLanguageBuffer for traversing input text and implements the parser using one method for each grammar rule:
      def _somerulename_(self):
          ...
- A MyLanguageSemantics class with one semantic method per grammar rule. Each method receives as its single parameter the Abstract Syntax Tree (AST) built from the rule invocation:
      def somerulename(self, ast):
          return ast
- An if __name__ == '__main__': block, so the generated parser can be executed as a Python script.
The methods in the generated semantics (delegate) class return the same AST they receive as a parameter. Custom semantics classes can override these methods to return anything (for example, a semantic graph). The generated class can serve as a template for the final semantics implementation, which may omit methods for rules that need no semantic treatment.
If present, a _default() method will be called in the semantics
class when no method matches the rule name:
def _default(self, ast):
    ...
    return ast
If present, a _postproc() method will be called in the semantics
class after each rule (including its semantics) has been processed.
This method receives the current parsing context and the AST as parameters:
def _postproc(self, context, ast):
    ...
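As an illustration of the semantics hooks described above, the sketch below computes values instead of returning the AST. It assumes 竜 TatSu is installed and restates the Calc grammar from this page in TatSu's classic rule = ... ; syntax (an assumption: the same rules in a different notation). CalcSemantics is a hypothetical class: only number gets its own method, and every other rule falls through to _default(), which folds [left, op, right] triples whose operands have already been evaluated.

```python
import operator
from fractions import Fraction

import tatsu

# The Calc grammar from this page, restated in TatSu's classic syntax.
GRAMMAR = r'''
    start = expression $ ;
    expression = term '+' ~ expression | term '-' ~ expression | term ;
    term = factor '*' ~ term | factor '/' ~ term | factor ;
    factor = '(' ~ @:expression ')' | number ;
    number = /\d+/ ;
'''

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}


class CalcSemantics:
    """Hypothetical semantics class that returns numbers instead of ASTs."""

    def number(self, ast):
        # A `number` match arrives as the matched digits (a string).
        return Fraction(ast)

    def _default(self, ast, *args, **kwargs):
        # Called for rules without their own method (start, expression,
        # term, factor). Binary nodes are [left, op, right]; anything else
        # (an already evaluated value) is passed through unchanged.
        if (isinstance(ast, (list, tuple)) and len(ast) == 3
                and isinstance(ast[1], str) and ast[1] in OPS):
            left, op, right = ast
            return OPS[op](left, right)
        return ast


value = tatsu.parse(GRAMMAR, '3 + 5 * ( 10 - 20 )', semantics=CalcSemantics())
print(value)
```

Handling expression and term in one _default() works here only because both rules produce the same [left, op, right] shape; rules with different shapes would need their own methods.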
Using the Generated Parser
To use the generated parser, create an instance of the generated parser
class (or of a subclass of it), then invoke its parse() method, passing
the text to parse and the name of the start rule as parameters:
import json
from tatsu.util import asjson
from myparser import MyParser
parser = MyParser()
ast = parser.parse('text to parse', start='start')
print(ast)
print(json.dumps(asjson(ast), indent=2))
The generated parsers’ constructors accept named arguments to specify whitespace characters, the regular expression for comments, case sensitivity, verbosity, and more (see below).
To add semantic actions, just pass a semantic delegate to the parse method:
model = parser.parse(text, start='start', semantics=MySemantics())
If special lexical treatment is required (as in 80-column languages),
an implementation of tatsu.input.Text can be passed instead of
the text:
from tatsu.input.text import Text
class MySpecialInput(Text):
    ...
special_input = MySpecialInput(text)
model = parser.parse(special_input, start='start', semantics=MySemantics())
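When the special treatment is purely positional, a simpler alternative to a custom Text class may be to preprocess the text before parsing. The helper below is hypothetical (not part of 竜 TatSu): it keeps only the first 72 columns of each line, since fixed-format sources such as FORTRAN 77 and COBOL ignore columns 73 to 80, which often hold sequence numbers. Line endings are normalized to newline characters as a side effect.

```python
def strip_sequence_area(text: str, width: int = 72) -> str:
    """Hypothetical helper: drop everything past column `width` on each line."""
    lines = text.splitlines()
    # Slicing never fails on short lines; they are kept unchanged.
    stripped = '\n'.join(line[:width] for line in lines)
    # Preserve a trailing newline if the input had one.
    if text.endswith(('\n', '\r')):
        stripped += '\n'
    return stripped


# Usage with a generated parser (MyParser as in the examples above):
# ast = MyParser().parse(strip_sequence_area(text), start='start')
```

A subclass of Text remains the better fit when the lexical rules depend on more than column positions, for example continuation markers or comment columns.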
The generated parser’s module can also be invoked as a script:
$ python myparser.py inputfile startrule
As a script, the generated parser’s module accepts some options:
$ python myparser.py -h
usage: myparser.py [-h] [-c] [-l] [-n] [-t] [-w WHITESPACE] FILE [STARTRULE]
Simple parser for DBD.
positional arguments:
  FILE                  the input file to parse
  STARTRULE             the start rule for parsing
optional arguments:
  -h, --help            show this help message and exit
  -c, --color           use color in traces (requires the colorama library)
  -l, --list            list all rules and exit
  -n, --no-nameguard    disable the 'nameguard' feature
  -t, --trace           output trace information
  -w WHITESPACE, --whitespace WHITESPACE
                        whitespace specification