v5.17.1 The Overdue Major Refactoring
The Refactoring
Maintenance and contributions to TatSu have been more difficult than necessary because of the way the code evolved through its lifetime.
Long modules and classes that try to do too much
Algorithms difficult to understand or with incorrect semantics
Basic features missing, because the above made them hard to implement
This release is a major refactoring of the code in TatSu.
Complex modules were partitioned into sub-modules and classes with well-defined purpose
Several algorithms were rewritten to make their semantics clear and evident, and their implementation more efficient
Many unit tests were added to assert the semantics of complex algorithms
Several user-facing features were added as they became easier to implement
For the details about the many changes please take a look at the commit log.
repo: neogeny/TatSu
Every effort has been made to preserve backwards compatibility by keeping most unit tests intact and testing with projects with large grammars and complex processing. If something escaped those tests, there will be a bugfix release with the fixes soon enough.
User-Facing Changes
The TatSu documentation has been improved and expanded, and it has a better look and feel with improved navigation.
TatSu doesn’t care about file names, but the default extension used in unit tests, examples, and documentation for grammars is now .tatsu
EBNF, both ISO and the classic variations, is fully supported as grammar input format
Now tatsu.parse(...., asmodel=True) produces a model that matches the ::Type declarations in the grammar (see the models documentation for a thorough review of the features).
walkers.NodeWalker now handles all known types of input. Also:
    DepthFirstWalker was reimplemented to ensure DFS semantics
    PostOrderDepthFirstWalker walks children before parents
    PreOrderWalker was broken and crazy. It was rewritten as a BreadthFirstWalker with the correct semantics
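To make the difference between the visiting orders named above concrete, here is a minimal sketch over a tiny tree. It does not use TatSu's walker classes: the `Tree` type and the three generator functions are hypothetical names, chosen only to illustrate pre-order depth-first, post-order depth-first (children before parents), and breadth-first traversal.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Tree:
    # Hypothetical stand-in for a parsed node; not a TatSu class.
    name: str
    children: list = field(default_factory=list)


def preorder(node):
    """Depth-first: visit the parent, then each subtree in turn."""
    yield node.name
    for child in node.children:
        yield from preorder(child)


def postorder(node):
    """Depth-first: visit every subtree before the parent."""
    for child in node.children:
        yield from postorder(child)
    yield node.name


def breadth_first(root):
    """Visit nodes level by level using a FIFO queue."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node.name
        queue.extend(node.children)


tree = Tree("a", [Tree("b", [Tree("d")]), Tree("c")])
```

On this tree the three orders differ: pre-order yields a, b, d, c; post-order yields d, b, c, a; breadth-first yields a, b, c, d.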
Constant expressions in a grammar are now evaluated deeply with multiple passes of eval() so as to produce results that are intuitively correct:

    from tatsu import parse

    def test_constant_math():
        grammar = r"""
            start = a:`7` b:`2` @:```{a} / {b}``` $ ;
        """
        result = parse(grammar, '', trace=True)
        assert result == 3.5
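The idea of evaluating in several passes can be sketched without TatSu. The function below is an assumption about the general technique, not the engine's implementation: `deep_eval` is a hypothetical name, and it uses the built-in `eval()` with empty builtins rather than TatSu's `safe_eval()`, so it should not be given untrusted input.

```python
def deep_eval(expression, context, max_passes=8):
    """Evaluate `expression` repeatedly until the result is no longer a string."""
    value = expression
    for _ in range(max_passes):
        if not isinstance(value, str):
            break
        # Substitute named values such as {a} and {b}, then evaluate the text.
        source = value.format(**context)
        try:
            value = eval(source, {"__builtins__": {}}, {})
        except Exception:
            # Not a valid expression: keep the substituted text as the result.
            return source
    return value


quotient = deep_eval("{a} / {b}", {"a": 7, "b": 2})
nested = deep_eval("'{a} + {b}'", {"a": 7, "b": 2})
plain = deep_eval("hello world", {})
```

Here `quotient` needs one pass, `nested` needs two (the first pass only unwraps the quoted text), and `plain` is returned unchanged because it is not an expression.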
Evaluation of Python expressions by the parsing engine now uses safe_eval(), a hardened firewall against most security attacks targeting eval() (see the safeeval module for details)
Because None is a valid initial value for attributes and a frequent return value for callables, the required logic for undefined values was moved to the notnone module, which declares Undefined as an alias for notnone.NotNone

    In [1]: from tatsu.util.undefined import Undefined
    In [2]: u = Undefined
    In [3]: u is None
    Out[3]: False
    In [4]: u is Undefined
    Out[4]: True
    In [5]: Undefined is None
    Out[5]: False
    In [6]: d = u or 'OK'
    In [7]: d
    Out[7]: 'OK'
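The behavior shown in the session above (distinct from None, falsy, a single shared object) is what a sentinel singleton provides. The following is a minimal sketch of that pattern, not TatSu's code: `UndefinedType`, `lookup`, and `settings` are hypothetical names used for illustration.

```python
class UndefinedType:
    """Hypothetical falsy singleton that is distinct from None."""

    _instance = None

    def __new__(cls):
        # Always hand back the same object so `is` comparisons work.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __bool__(self):
        return False

    def __repr__(self):
        return "Undefined"


Undefined = UndefinedType()


def lookup(mapping, key):
    """Return Undefined for a missing key, so a stored None stays meaningful."""
    return mapping.get(key, Undefined)


settings = {"timeout": None}
```

With this, `lookup(settings, "timeout")` returns the stored None, while `lookup(settings, "retries")` returns Undefined, so the caller can tell "explicitly set to None" apart from "never set".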
objectmodel.Node was rewritten to give it clear semantics and efficiency
New attributes to Node after initialization generate a warning if the name of a method is being shadowed. This change avoids confusing @dataclass, which is used in generated object models.
Node equality is explicitly defined as object identity. No attempts are made at comparing Node structurally.
Node.children() has the expected semantics, and is much more efficient.
Node.parseinfo is now honored by the parsing engine (previously, only results of type AST could have a parseinfo). Generation of parseinfo is disabled by default, and is enabled by passing parseinfo=True to the API entry points.

    import tatsu

    def test_node_parseinfo(self):
        grammar = """
            @@grammar :: Test
            start::Test = true | false ;
            true = "test" @:`True` $;
            false = "test" @:`False` $;
        """
        text = 'test'
        node = tatsu.parse(
            grammar,
            text,
            asmodel=True,
            parseinfo=True,
        )
        assert type(node).__name__ == 'Test'
        assert node.ast is True
        assert node.parseinfo is not None
        assert node.parseinfo.pos == 0
        assert node.parseinfo.endpos == len(text)
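Two of the Node rules described above, equality as object identity and a warning when a new attribute shadows a method, can be illustrated with a small stand-alone class. This is a sketch under assumptions, not TatSu's `objectmodel.Node`: `SketchNode` and its warning message are hypothetical.

```python
import warnings


class SketchNode:
    """Hypothetical node class; not TatSu's objectmodel.Node."""

    def __init__(self, **attributes):
        for name, value in attributes.items():
            setattr(self, name, value)

    def children(self):
        # Direct children are the attribute values that are nodes themselves.
        return [v for v in vars(self).values() if isinstance(v, SketchNode)]

    def __setattr__(self, name, value):
        # Warn when an instance attribute would hide a method of the class.
        if callable(getattr(type(self), name, None)):
            warnings.warn(f"attribute {name!r} shadows a method", stacklevel=2)
        super().__setattr__(name, value)

    def __eq__(self, other):
        # Equality is identity: no structural comparison is attempted.
        return self is other

    def __hash__(self):
        return id(self)


left = SketchNode(value=1)
right = SketchNode(value=1)
parent = SketchNode(first=left, second=right, label="pair")
```

Although `left` and `right` hold the same data, they compare unequal because they are different objects, and assigning to `parent.children` would emit a warning because it hides the method of that name.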
Synthetic classes created by synth.synthetize() during parsing with ModelBuilderSemantics behave more consistently, and now have a base class of class SynthNode(BaseNode)
Now ast.AST has consistent semantics of a dict that allows access to contents using the attribute interface
asjson() and friends now cover all known cases with improved consistency and efficiency, so there are fewer demands on clients of the API
Entry points no longer list a large subset of the configuration options defined in ParserConfig, but still accept them through **settings keyword arguments. Now ParserConfig verifies that the settings passed to it are valid, eliminating the frustration of passing an incorrect setting name (a typo) and hoping it has the intended effect.
TatSu still has no library dependencies for its core functionality, but several libraries are used during its development and testing. The TatSu development configuration uses uv and hatch. Several requirements-xyz.txt files are generated for the benefit of those using pip with pyenv, virtualenvwrapper, or virtualenv
All attempts at recovering comments from parsed input were removed. It never worked, so it had no use. Comment recovery may be attempted in the future.
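The settings validation described above for ParserConfig follows a common pattern: accept `**settings`, compare the names against the known fields, and fail loudly on anything else. The sketch below shows that pattern only; `SketchConfig`, its three options, and this `parse` function are hypothetical and do not reflect the real ParserConfig fields or TatSu's entry points.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class SketchConfig:
    # Hypothetical options; the real ParserConfig defines its own set.
    trace: bool = False
    parseinfo: bool = False
    start: str = "start"

    def with_settings(self, **settings):
        """Return a copy with `settings` applied, rejecting unknown names."""
        known = {f.name for f in fields(self)}
        unknown = sorted(set(settings) - known)
        if unknown:
            raise TypeError(f"unknown settings: {', '.join(unknown)}")
        return replace(self, **settings)


def parse(text, **settings):
    """Hypothetical entry point that forwards **settings to the config."""
    config = SketchConfig().with_settings(**settings)
    return text, config


_, config = parse("input", parseinfo=True)
```

A misspelled name such as `parse("input", pareseinfo=True)` raises `TypeError` here instead of being silently ignored, which is the kind of failure the release notes describe.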
All pre-existing grammars are compatible with this version of TatSu.
Previously generated Python parsers and models work with this version of TatSu, yet you should consider generating them anew to take advantage of the improved speed, layout, and features.
CAVEAT: Several functions, methods, and argument names were deprecated. They can still be used, but warnings will be issued at runtime.
CAVEAT: If there are invalid strings or regex patterns in your grammars YOU MUST fix them because now the grammar parser validates strings and patterns.
Many of the functions that TatSu defines for its own use are useful in other contexts. Some examples are:
    from tatsu.safeeval import is_eval_safe
    from tatsu.safeeval import hashable
    from tatsu.safeeval import make_hashable
    from tatsu.util import safe_name
    from tatsu.util.misc import find_from_rematch
    from tatsu.util.misc import topsort
    from tatsu.util.undefined import Undefined
    # ...
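The signatures and behavior of the helpers listed above are not described in these notes. As a rough illustration of the kind of check a function like `is_eval_safe` might perform (an assumption, not TatSu's implementation), the sketch below vets an expression with the standard `ast` module before it would be passed to `eval()`. The name `looks_safe` and the allow-list are hypothetical, and the check is deliberately incomplete: for example, it does not limit expensive arithmetic.

```python
import ast

# Node types accepted by this sketch: literals, names, and basic operators.
ALLOWED_NODES = (
    ast.Expression, ast.Constant, ast.Name, ast.Load,
    ast.BinOp, ast.UnaryOp, ast.operator, ast.unaryop,
    ast.Compare, ast.cmpop, ast.BoolOp, ast.boolop,
)


def looks_safe(source, names=()):
    """Return True if `source` uses only allow-listed syntax and known names."""
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            # Calls, attribute access, subscripts, lambdas, etc. are rejected.
            return False
        if isinstance(node, ast.Name) and node.id not in names:
            return False
    return True
```

For instance, `looks_safe("7 / 2")` and `looks_safe("a + 1", names=("a",))` pass, while `looks_safe("__import__('os')")` and `looks_safe("a.b", names=("a",))` are rejected because they contain a call and an attribute access.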