
Using the Tool

As a Library

TatSu can be used as a library, much like Python's re module, by embedding grammars as strings and generating grammar models instead of Python code.

  • tatsu.compile(grammar, name=None, **settings)

    Compiles the grammar and generates a model that can subsequently be used to parse input.

  • tatsu.parse(grammar, input, start=None, **settings)

    Compiles the grammar and parses the given input, producing an AST as the result. The result is equivalent to calling:

    model = tatsu.compile(grammar)
    ast = model.parse(input)
    

    Compiled grammars are cached for efficiency.

  • tatsu.to_python_model(grammar, name=None, filename=None, **settings)

    Compiles the grammar and generates the Python source code that implements the object model defined by rule annotations.

  • tatsu.to_parsermodel_sourcecode(grammar, name=None, filename=None, **settings)

    Compiles the grammar to Python source code for a recursive-descent implementation of the parser.

  • tatsu.to_python_sourcecode(grammar, name=None, filename=None, **settings)

    Compiles the grammar to Python source code for a recursive-descent implementation of the parser.

This is an example of how to use TatSu as a library:

GRAMMAR = r'''
    @@grammar :: Calc

    start: expression $

    expression:
        | term '+' ~ expression
        | term '-' ~ expression
        | term

    term:
        | factor '*' ~ term
        | factor '/' ~ term
        | factor

    factor:
        | '(' ~ @:expression ')'
        | number

    number: /\d+/
'''


def main():
    import pprint
    import json
    from tatsu import parse
    from tatsu.util import asjson

    ast = parse(GRAMMAR, '3 + 5 * ( 10 - 20 )')
    print('PPRINT')
    pprint.pprint(ast, indent=2, width=20)
    print()

    print('JSON')
    print(json.dumps(asjson(ast), indent=2))
    print()


if __name__ == '__main__':
    main()

And this is the output:

PPRINT
[ '3',
  '+',
  [ '5',
    '*',
    [ '10',
      '-',
      '20']]]

JSON
[
  "3",
  "+",
  [
    "5",
    "*",
    [
      "10",
      "-",
      "20"
    ]
  ]
]

Compiling grammars to Python

TatSu can be run from the command line:

$ python -m tatsu

Or:

$ scripts/tatsu

Or just:

$ tatsu

The last form is available when TatSu has been installed with pip.

The -h and --help options provide full usage information:

$ tatsu --help
usage: tatsu [--generate-parser | --draw | --railroad | --object-model |
             --pretty | --pretty-lean] [--color] [--trace] [--left-recursion]
             [--name NAME] [--nameguard] [--outfile FILE]
             [--object-model-outfile FILE] [--whitespace CHARACTERS]
             [--base-type CLASSPATH] [--help] [--version]
             GRAMMAR

竜TatSu takes a grammar in extended EBNF as input, and outputs a memoizing
PEG/Packrat parser in Python.

positional arguments:
  GRAMMAR               the filename of the TatSu grammar to parse

options:
  --generate-parser     generate parser code from the grammar (default)
  --draw, -d            generate a diagram of the grammar (.svg, .png, .jpeg,
                        .dot, ... / requres --outfile)
  --railroad, -r        output a railroad diagram of the grammar in ASCII/Text
                        Art
  --object-model, -g    generate object model from the class names given as
                        rule arguments
  --pretty, -p          generate a prettified version of the input grammar
  --pretty-lean         like --pretty, but without name: or [Parameter]
                        annotations

parse-time options:
  --color, -c           use color in traces (requires the colorama library)
  --trace, -t           produce verbose parsing output

generation options:
  --left-recursion, -l  turns left-recursion support on
  --name, -m NAME       Name for the grammar (defaults to GRAMMAR base name)
  --nameguard, -n       allow tokens that are prefixes of others
  --outfile, --output, -o FILE
                        output file (default is stdout)
  --object-model-outfile, -G FILE
                        generate object model and save to FILE
  --whitespace, -w CHARACTERS
                        characters to skip during parsing (use "" to disable)
  --base-type CLASSPATH
                        class to use as base type for the object model, for
                        example "mymodule.MyNode"

common options:
  --help, -h            show this help message and exit
  --version, -V         provide version information and exit
$
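Parser generation can also be scripted, for example from a build step. The sketch below assumes TatSu is installed in the running interpreter and drives the documented python -m tatsu entry point with the --outfile option from the help text above; the grammar and the file names greeting.ebnf and greeting_parser.py are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical grammar file contents, used only for this illustration.
GRAMMAR = r'''
    @@grammar :: Greeting

    start: 'hello' who $

    who: /\w+/
'''

with tempfile.TemporaryDirectory() as tmp:
    grammar_file = Path(tmp) / 'greeting.ebnf'
    parser_file = Path(tmp) / 'greeting_parser.py'
    grammar_file.write_text(GRAMMAR)

    # Same as running: python -m tatsu greeting.ebnf --outfile greeting_parser.py
    subprocess.run(
        [sys.executable, '-m', 'tatsu', str(grammar_file),
         '--outfile', str(parser_file)],
        check=True,  # raise CalledProcessError if generation fails
    )
    generated = parser_file.read_text()

print(f'generated {len(generated.splitlines())} lines of parser code')
```

Using sys.executable ensures the TatSu installed for the current interpreter is the one that generates the parser.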

The Generated Parsers

A TatSu generated parser consists of the following classes:

  • A MyLanguageBuffer class derived from tatsu.buffering.Buffer that handles the grammar definitions for whitespace, comments, and case significance.

  • A MyLanguageParser class derived from tatsu.parsing.Parser that uses a MyLanguageBuffer to traverse the input text and implements the parser with one method per grammar rule:

    def _somerulename_(self):
        ...

  • A MyLanguageSemantics class with one semantic method per grammar rule. Each method receives as its single parameter the Abstract Syntax Tree (AST) built from the rule invocation:

    def somerulename(self, ast):
        return ast

  • An if __name__ == '__main__': block, so the generated parser can be executed as a Python script.

The methods in the generated semantics class return the AST they receive, unchanged. Custom semantics classes can override these methods to return anything (for example, a semantic graph). The generated class can serve as a template for the final semantics implementation, which may omit methods for rules that need no semantic treatment.
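As a sketch of a custom semantics class, the following evaluates the Calc grammar from the library example instead of returning its AST. The class name CalcSemantics, the OPERATORS table, and the _evaluate helper are hypothetical; the sketch assumes each method receives the AST of its rule with sub-rule results already transformed, and that rules without a method (start, factor) pass their AST through unchanged.

```python
import operator

# Maps the operator tokens of the Calc grammar to Python functions.
OPERATORS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}


class CalcSemantics:
    """Evaluate Calc expressions instead of building a nested AST."""

    def number(self, ast):
        # number matches /\d+/, so its AST is a string of digits.
        return int(ast)

    def expression(self, ast):
        return self._evaluate(ast)

    def term(self, ast):
        return self._evaluate(ast)

    def _evaluate(self, ast):
        # Binary alternatives arrive as [left, op, right] with operands
        # already evaluated; the single-operand alternative is a plain value.
        if isinstance(ast, (list, tuple)):
            left, op, right = ast
            return OPERATORS[op](left, right)
        return ast
```

Passing semantics=CalcSemantics() to tatsu.parse() together with the Calc grammar should then evaluate '3 + 5 * ( 10 - 20 )' to -47 rather than producing the nested list shown earlier.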

If present, a _default() method will be called in the semantics class when no method matches the rule name:

def _default(self, ast):
    ...
    return ast
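For example, a semantics class can handle a few rules explicitly and use _default() as a catch-all. In this sketch the class name TupleSemantics and the choice to freeze list ASTs into tuples are illustrative assumptions.

```python
class TupleSemantics:
    """Handle number explicitly; turn every other list AST into a tuple."""

    def number(self, ast):
        # A rule with its own method does not go through _default().
        return int(ast)

    def _default(self, ast):
        # Called for any rule that has no method of its own.
        if isinstance(ast, list):
            return tuple(ast)
        return ast
```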

If present, a _postproc() method will be called in the semantics class after each rule (including its semantic action) has been processed. This method receives the current parsing context and the rule's AST as parameters:

def _postproc(self, context, ast):
    ...
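A _postproc() hook is a convenient place for cross-cutting concerns such as tracing. This hypothetical TracingSemantics class records every AST it sees and ignores the context argument, since this sketch makes no assumptions about the context object's attributes; like the stub above, it returns nothing.

```python
class TracingSemantics:
    """Record the AST of every processed rule, in processing order."""

    def __init__(self):
        self.seen = []

    def _postproc(self, context, ast):
        # context is the current parsing context; unused in this sketch.
        self.seen.append(ast)
```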

Using the Generated Parser

To use the generated parser, create an instance of it (or of a subclass of it), then invoke its parse() method, passing the text to parse and the name of the start rule as parameters:

import json

from tatsu.util import asjson
from myparser import MyParser

parser = MyParser()
ast = parser.parse('text to parse', start='start')
print(ast)
print(json.dumps(asjson(ast), indent=2))

The generated parsers’ constructors accept named arguments to specify whitespace characters, the regular expression for comments, case sensitivity, verbosity, and more (see below).

To add semantic actions, just pass a semantic delegate to the parse method:

model = parser.parse(text, start='start', semantics=MySemantics())

If special lexical treatment is required (as in 80-column languages), an implementation of tatsu.input.Text can be passed instead of the text:

from tatsu.input.text import Text

class MySpecialInput(Text):
    ...

special_input = MySpecialInput(text)  # avoid shadowing the built-in input()
model = parser.parse(special_input, start='start', semantics=MySemantics())

The generated parser’s module can also be invoked as a script:

$ python myparser.py inputfile startrule

As a script, the generated parser’s module accepts some options:

$ python myparser.py -h
usage: myparser.py [-h] [-c] [-l] [-n] [-t] [-w WHITESPACE] FILE [STARTRULE]

Simple parser for DBD.

positional arguments:
    FILE                  the input file to parse
    STARTRULE             the start rule for parsing

optional arguments:
    -h, --help            show this help message and exit
    -c, --color           use color in traces (requires the colorama library)
    -l, --list            list all rules and exit
    -n, --no-nameguard    disable the 'nameguard' feature
    -t, --trace           output trace information
    -w WHITESPACE, --whitespace WHITESPACE
                        whitespace specification