.. Copyright (c) 2017-2026 Juancarlo Añez (apalala@gmail.com) .. SPDX-License-Identifier: BSD-4-Clause .. include:: links.rst Using the Tool -------------- As a Library ~~~~~~~~~~~~ |TatSu| can be used as a library, much like `Python`_'s ``re``, by embedding grammars as strings and generating grammar models instead of generating Python_ code. - ``tatsu.compile(grammar, name=None, **settings)`` Compiles the grammar and generates a *model* that can subsequently be used for parsing input with. - ``tatsu.parse(grammar, input, start=None, **settings)`` Compiles the grammar and parses the given input producing an AST_ as result. The result is equivalent to calling:: model = compile(grammar) ast = model.parse(input) Compiled grammars are cached for efficiency. - ``to_python_model(grammar, name=None, filename=None, **settings)`` Compiles the grammar and generates the `Python`_ source code that implements the object model defined by rule annotations. - ``tatsu.to_parsermodel_sourcecode(grammar, name=None, filename=None, **settings)`` Compiles the grammar to the `Python`_ source code that for a recursive-descent implementation of the parser. - ``tatsu.to_python_sourcecode(grammar, name=None, filename=None, **settings)`` Compiles the grammar to the `Python`_ source code that for a recursive-descent implementation of the parser. This is an example of how to use **TatSu** as a library: .. code:: python GRAMMAR = ''' @@grammar[Calc] start: expression $ expression: | term '+' ~ expression | term '-' ~ expression | term term: | factor '*' ~ term | factor '/' ~ term | factor factor: | '(' ~ @:expression ')' | number number = /\d+/ ''' def main(): import pprint import json from tatsu import parse from tatsu.util import asjson ast = parse(GRAMMAR, '3 + 5 * ( 10 - 20 )') print('PPRINT') pprint.pprint(ast, indent=2, width=20) print() print('JSON') print(json.dumps(asjson(ast), indent=2)) print() if __name__ == '__main__': main() And this is the output: .. code:: bash PPRINT [ '3', '+', [ '5', '*', [ '10', '-', '20']]] JSON [ "3", "+", [ "5", "*", [ "10", "-", "20" ] ] ] Compiling grammars to Python ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **TatSu** can be run from the command line: .. code:: bash $ python -m tatsu Or: .. code:: bash $ scripts/tatsu Or just: .. code:: bash $ tatsu if **TatSu** was installed using *easy\_install* or *pip*. The *-h* and *--help* parameters provide full usage information: .. code:: console $ tatsu --help usage: tatsu [--generate-parser | --draw | --railroad | --object-model | --pretty | --pretty-lean] [--color] [--trace] [--left-recursion] [--name NAME] [--nameguard] [--outfile FILE] [--object-model-outfile FILE] [--whitespace CHARACTERS] [--base-type CLASSPATH] [--help] [--version] GRAMMAR 竜TatSu takes a grammar in extended EBNF as input, and outputs a memoizing PEG/Packrat parser in Python. positional arguments: GRAMMAR the filename of the TatSu grammar to parse options: --generate-parser generate parser code from the grammar (default) --draw, -d generate a diagram of the grammar (.svg, .png, .jpeg, .dot, ... / requres --outfile) --railroad, -r output a railroad diagram of the grammar in ASCII/Text Art --object-model, -g generate object model from the class names given as rule arguments --pretty, -p generate a prettified version of the input grammar --pretty-lean like --pretty, but without name: or [Parameter] annotations parse-time options: --color, -c use color in traces (requires the colorama library) --trace, -t produce verbose parsing output generation options: --left-recursion, -l turns left-recursion support on --name, -m NAME Name for the grammar (defaults to GRAMMAR base name) --nameguard, -n allow tokens that are prefixes of others --outfile, --output, -o FILE output file (default is stdout) --object-model-outfile, -G FILE generate object model and save to FILE --whitespace, -w CHARACTERS characters to skip during parsing (use "" to disable) --base-type CLASSPATH class to use as base type for the object model, for example "mymodule.MyNode" common options: --help, -h show this help message and exit --version, -V provide version information and exit $ The Generated Parsers ~~~~~~~~~~~~~~~~~~~~~ A **TatSu** generated parser consists of the following classes: - A ``MyLanguageBuffer`` class derived from ``tatsu.buffering.Buffer`` that handles the grammar definitions for *whitespace*, *comments*, and *case significance*. - A ``MyLanguageParser`` class derived from ``tatsu.parsing.Parser`` which uses a ``MyLanguageBuffer`` for traversing input text, and implements the parser using one method for each grammar rule: .. code:: python def _somerulename_(self): ... - A ``MyLanguageSemantics`` class with one semantic method per grammar rule. Each method receives as its single parameter the `Abstract Syntax Tree`_ (`AST`_) built from the rule invocation: .. code:: python def somerulename(self, ast): return ast - A ``if __name__ == '__main__':`` definition, so the generated parser can be executed as a `Python`_ script. The methods in the delegate class return the same `AST`_ received as parameter. Custom semantic classes can override the methods to have them return anything (for example, a `Semantic Graph`_). The semantics class can be used as a template for the final semantics implementation, which can omit methods for the rules that do not need semantic treatment. If present, a ``_default()`` method will be called in the semantics class when no method matched the rule name: .. code:: python def _default(self, ast): ... return ast If present, a ``_postproc()`` method will be called in the semantics class after each rule (including the semantics) is processed. This method will receive the current parsing context as parameter: .. code:: python def _postproc(self, context, ast): ... Using the Generated Parser ~~~~~~~~~~~~~~~~~~~~~~~~~~ To use the generated parser, just subclass the base or the abstract parser. Then create an instance of it. Then invoke its ``parse()`` method, passing the grammar to parse and the starting rule's name as parameters: .. code:: python from tatsu.util import asjson from myparser import MyParser parser = MyParser() ast = parser.parse('text to parse', start='start') print(ast) print(json.dumps(asjson(ast), indent=2)) The generated parsers' constructors accept named arguments to specify whitespace characters, the regular expression for comments, case sensitivity, verbosity, and more (see below). To add semantic actions, just pass a semantic delegate to the parse method: .. code:: python model = parser.parse(text, start='start', semantics=MySemantics()) If special lexical treatment is required (as in *80 column* languages), then an implementation of ``tatsu.input.Text`` can be passed instead of the text: .. code:: python from tatsu.input.text import Text class MySpecialInput(Text): ... input = MySpecialInput(text) model = parser.parse(input, start='start', semantics=MySemantics()) The generated parser's module can also be invoked as a script: .. code:: bash $ python myparser.py inputfile startrule As a script, the generated parser's module accepts some options: .. code:: bash $ python myparser.py -h usage: myparser.py [-h] [-c] [-l] [-n] [-t] [-w WHITESPACE] FILE [STARTRULE] Simple parser for DBD. positional arguments: FILE the input file to parse STARTRULE the start rule for parsing optional arguments: -h, --help show this help message and exit -c, --color use color in traces (requires the colorama library) -l, --list list all rules and exit -n, --no-nameguard disable the 'nameguard' feature -t, --trace output trace information -w WHITESPACE, --whitespace WHITESPACE whitespace specification