Introduction

TatSu is different from other PEG parser generators:

  • Generated parsers use Python’s very efficient exception-handling system to backtrack. 竜 TatSu-generated parsers simply assert what must be parsed. There are no complicated if-then-else sequences for decision making or backtracking. Memoization lets a parser revisit the same input position several times while keeping overall parsing time linear.

  • Positive and negative lookaheads and the cut element (which also clears the memoization cache) allow for additional, hand-crafted optimizations at the grammar level.

  • Delegation to Python’s re module for lexemes allows for (Perl-like) powerful and efficient lexical analysis.

  • The use of Python’s context managers considerably reduces the size of the generated parsers, which improves code clarity and CPU-cache hit rates.

  • Include files, rule inheritance, and rule inclusion give 竜 TatSu grammars considerable expressive power.

  • Automatic generation of Abstract Syntax Trees and Object Models, along with Model Walkers and Code Generators, makes analysis and translation approachable.
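
The first, third, and fourth points above can be illustrated with a small, self-contained sketch. It is not TatSu's actual runtime: the names MiniParser, FailedParse, OptionSucceeded, choice, and option are hypothetical, the toy grammar (sums of integers with parentheses) is invented for the example, and memoization is left out. It only shows how rules can assert what must come next, backtrack by catching an exception, delegate lexemes to the re module, and use context managers in place of if-then-else chains.

```python
import re
from contextlib import contextmanager


class FailedParse(Exception):
    """Raised when the input does not match what a rule asserts."""


class OptionSucceeded(Exception):
    """Raised to leave a choice block as soon as one option has matched."""


class MiniParser:
    # Hypothetical illustration only; not the TatSu runtime.
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def token(self, literal):
        # Assert that the literal comes next; failure is an exception.
        if not self.text.startswith(literal, self.pos):
            raise FailedParse(f"expected {literal!r} at {self.pos}")
        self.pos += len(literal)
        return literal

    def pattern(self, regex):
        # Lexemes are delegated to the re module.
        match = re.compile(regex).match(self.text, self.pos)
        if match is None:
            raise FailedParse(f"expected /{regex}/ at {self.pos}")
        self.pos = match.end()
        return match.group()

    @contextmanager
    def choice(self):
        # A choice ends quietly when one of its options succeeds.
        try:
            yield
        except OptionSucceeded:
            pass

    @contextmanager
    def option(self):
        start = self.pos
        try:
            yield
        except FailedParse:
            self.pos = start  # backtrack and let the next option run
        else:
            raise OptionSucceeded()

    def term(self):
        # term = /\d+/ | '(' expr ')'
        value = None
        with self.choice():
            with self.option():
                value = int(self.pattern(r"\d+"))
            with self.option():
                self.token("(")
                value = self.expr()
                self.token(")")
            raise FailedParse(f"expected a term at {self.pos}")
        return value

    def expr(self):
        # expr = term {'+' term}
        total = self.term()
        while True:
            start = self.pos
            try:
                self.token("+")
                total += self.term()
            except FailedParse:
                self.pos = start  # undo the partial repetition
                break
        return total


parser = MiniParser("1+(2+3)")
result = parser.expr()
```

Each rule body reads as a straight-line list of assertions; the control flow for alternatives and backtracking lives entirely in the two context managers and the exceptions they exchange.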

The parser generator, the run-time support, and the generated parsers have measurably low cyclomatic complexity. At around 5 KLOC of Python, the entire source code can be studied in a single session.

The only hard dependency is the Python standard library. The regex library is used if it is installed, and colorama is used for trace output if it is available. pygraphviz is required for generating diagrams.
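
Optional dependencies of this kind are commonly handled by attempting the import and falling back when it fails. The following is a minimal sketch of that general pattern, not a copy of TatSu's own code; the helper names load_regex_engine and load_colorama are hypothetical.

```python
import importlib


def load_regex_engine():
    """Return the third-party regex module if installed, else the stdlib re."""
    try:
        return importlib.import_module("regex")
    except ImportError:
        return importlib.import_module("re")


def load_colorama():
    """Return the colorama module if installed, else None (no colored output)."""
    try:
        return importlib.import_module("colorama")
    except ImportError:
        return None


engine = load_regex_engine()
colorama = load_colorama()
number = engine.compile(r"\d+").match("42").group()
```

Because regex is designed to be compatible with the re interface for common calls such as compile and match, the rest of the code can use whichever module was loaded without further checks.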

TatSu is feature-complete and currently being used with complex grammars to parse, analyze, and translate hundreds of thousands of lines of input text, including source code in several programming languages.