These pages resulted from continuous attempts to create an imperative programming language for the implementation of logic. Implementing logic is special because logical algorithms work on tree structures (formulas, types and proofs) that can have many different forms. For example, formulas can have different logic operators on top. Only when one knows the top-operator of a formula, one knows how many subtrees it has, and which types they have. This makes static type checking of logical algorithms difficult. Static type checking is only possible if classication and extraction of subformulas happens simultaneously. There are two approaches that can do this: Matching and use of derived classes. As soon as there is a program state between classifying the formula, and accessing the subformulas, one needs refinement types in this intermediate program state. This would happen if one uses a switch (or an if) to determine the operator, and after that try to access the subformulas.
Subtypes are closely related to union types and partial types. In case of a a union type, each of the members of the union type, is a subtype of the union type. A partial type can be viewed as a union between a defined and an undefined type. The root problem is that we don't know how to statically type check type unions. One can view the general type of formulas as a union of subtypes, each defined by the operator on top. This problem shows up in logic, but it is present in nearly all application domains.
Current approaches for handling union or partial types (union, std::variant, type casts, std::optional, std::expected, use of nullptr, use of special error-value like std::string::npos) do not allow static type checking. Some of the are dynamically type checked, which means that an error will be generated at run time, when a wrong subtype is accessed. This is the case for std::variant, C++ style casts, std::optional, and std::expected. The other approaches result in UB when a wrong subtype is accessed. Dynamic type checking is better than UB, but static type checking would stil be better because it guarantees that never a wrong subtype will be accessed at run time.
Static type checking for union types is out of reach of current type checking algorithms, but I am still hoping to solve the theoretical problems, and obtain a practical approach.
Despite the unsolved theoretical problems, the programming language project resulted in tools that make it really easy to implement logic in C++. I use the tools in logic research (see below). Moreover, the tools can be used for building compilers. AST has a tree-structure similar to logic formulas.
The parser generator works similar to Yacc, but fully supports C++ with RAII. It creates LALR parsers, and in addition allows extending the language at run time, which is needed for example by Prolog, where operators can be created a run time. The parser generator generates C++ code for a working parser, as well as an implementation of a symbol class.
The tokenizer generator does not create code for a complete tokenizer, but only a classifier . The classifier reads a part from the input and determines to which type of symbol it belongs. Generating C++ code for a full tokenizer would result in tokenizers that are too rigid to be useful. It is important that it is possible to extend the generated classifier by hand. One needs this if one wants to include tokens that are not regular, or create a tokenizer that takes indentation into account.
The tokenizer and parser generator together are called Maphoon. I have used Maphoon many times, in implementation of logic, in implementation of experimental compilers, and also in teaching. It just works. I recommend that you use it. Maphoon can be obtained from here.
The tree-class generator reads specifications of tree-like (algebraic, recursive) datatypes, and creates C++ implementations of these datatypes. It is called TreeGen. The specifications are as concise as in a functional language (if not more concise), while at the same time compiled into efficient C++ code. The generated tree classes have an API that is pleasant to use. In particular, they have value semantics, and use RAII. TreeGen by itself is written in C++. It can be obtained from this page.
The same remark about source code generation, also applies to TreeGen.
The prolog parser
contains only a parser. There is no interpreter.
In order to add operators, type
op( unsigned, type, names ),
where unsigned is the priority, type
is one of fx,fy,xfx,xfy,yfx,xf,yf,
and names is either a single
operator, or a list of operators.
In order to exit the parser, type halt. or #.
It is probably possible to compile without
Maphoon and TreeGen, but you still need
the lexer,
TVM
and these utilities.
I gave a presentation about TreeGen at C++Now 2023. The title was 'Trees for Logic and Parsing'. The slides are here, and this is a recording of the presentation. The implementation described in the presentation is outdated. The latest version is here.
I became interested in compiler construction because I had problems implementing Partial Higher-Order Logic with Interfaces (PHOLI). This is a higher-order, 3-valued logic for verification, that I want to develop and implement. Because the logic is continously changing, it has to be reimplemented all the time. I tried to do this in C++, Haskell and Prolog, but I didn't like any of these languages. In 2019, I decided to develop a programming language specially for logic, but I ended up creating the C++-tools described on this page. The main page of the PHOLI implementation is here, and the logic is described here. Iterating through implementations of PHOLI would be impossible without Maphoon and TreeGen.
Thanks to Danel Batyrbek, Witold Charatonik, Aleksandra Kireeva, Tatyana Korotkova, Akhmetzhan Kussainov, Dina Muktubayeva, Cláudia Nalon, Zhandarbek Saginov, Olzhas Zhangeldinov.
Special thanks for Zoltan Teki for extensively testing Maphoon, and requesting additional features. He found a bug in the tokenizer generator, and thanks to him Maphoon now supports move-only attributes.Parts of this research were sponsored by Nazarbayev University through the Faculty Development Competitive Research Grant Program (FDCRGP), under grant number 021220FD1651.