Tools for Compilers

Introduction

These pages resulted from continuous attempts to create an imperative programming language for the implementation of logic. Implementing logic is special because logical algorithms work on tree structures (formulas, types and proofs) that can have many different forms. For example, formulas can have different logic operators on top. Only when one knows the top-operator of a formula, one knows how many subtrees it has, and which types they have. This makes static type checking of logical algorithms difficult. Static type checking is only possible if classication and extraction of subformulas happens simultaneously. There are two approaches that can do this: Matching and use of derived classes. As soon as there is a program state between classifying the formula, and accessing the subformulas, one needs refinement types in this intermediate program state. This would happen if one uses a switch (or an if) to determine the operator, and after that try to access the subformulas.

Subtypes are closely related to union types and partial types. In case of a a union type, each of the members of the union type, is a subtype of the union type. A partial type can be viewed as a union between a defined and an undefined type. The root problem is that we don't know how to statically type check type unions. One can view the general type of formulas as a union of subtypes, each defined by the operator on top. This problem shows up in logic, but it is present in nearly all application domains.

Current approaches for handling union or partial types (union, std::variant, type casts, std::optional, std::expected, use of nullptr, use of special error-value like std::string::npos) do not allow static type checking. Some of the are dynamically type checked, which means that an error will be generated at run time, when a wrong subtype is accessed. This is the case for std::variant, C++ style casts, std::optional, and std::expected. The other approaches result in UB when a wrong subtype is accessed. Dynamic type checking is better than UB, but static type checking would stil be better because it guarantees that never a wrong subtype will be accessed at run time.

Static type checking for union types is out of reach of current type checking algorithms, but I am still hoping to solve the theoretical problems, and obtain a practical approach.

Despite the unsolved theoretical problems, the programming language project resulted in tools that make it really easy to implement logic in C++. I use the tools in logic research (see below). Moreover, the tools can be used for building compilers. AST has a tree-structure similar to logic formulas.

Parser and Tokenizer Generator

The parser generator works similar to Yacc, but fully supports C++ with RAII. It creates LALR parsers, and in addition allows extending the language at run time, which is needed for example by Prolog, where operators can be created a run time. The parser generator generates C++ code for a working parser, as well as an implementation of a symbol class.

The tokenizer generator does not create code for a complete tokenizer, but only a classifier . The classifier reads a part from the input and determines to which type of symbol it belongs. Generating C++ code for a full tokenizer would result in tokenizers that are too rigid to be useful. It is important that it is possible to extend the generated classifier by hand. One needs this if one wants to include tokens that are not regular, or create a tokenizer that takes indentation into account.

The tokenizer and parser generator together are called Maphoon. I have used Maphoon many times, in implementation of logic, in implementation of experimental compilers, and also in teaching. It just works. I recommend that you use it. Maphoon can be obtained from here.

Remark About Transforming Source Code

With the advent of reflection in C++, transformers that generate source code, will become outdated. I will start looking into this as soon as I have access to a compiler that supports reflection.

Generator for Tree-Like Types

The tree-class generator reads specifications of tree-like (algebraic, recursive) datatypes, and creates C++ implementations of these datatypes. It is called TreeGen. The specifications are as concise as in a functional language (if not more concise), while at the same time compiled into efficient C++ code. The generated tree classes have an API that is pleasant to use. In particular, they have value semantics, and use RAII. TreeGen by itself is written in C++. It can be obtained from this page.

The same remark about source code generation, also applies to TreeGen.

Examples

Below are small examples using Maphoon, TreeGen, or both.

Calculator

The obligatory calculator example. It supports operator precendences. It uses Maphoon for tokenizing and parsing.

Turing Machine Simulator

The Turing machine simulator is faster than web-based simulators, easy to use, and it can handle non-deterministic Turing machines by exploring all possible computations. It is on a separate page. It uses Maphoon for parsing input.

Multimodal Logic Formulas

This example uses the tree generator and Maphoon. It should be possible to compile it without them. You will still need the lexer, and TVM. It parses multimodal formulas from input, and prints them back. For the rest it does nothing. The sources are here.

Prolog Parser

The prolog parser contains only a parser. There is no interpreter. In order to add operators, type op( unsigned, type, names ), where unsigned is the priority, type is one of fx,fy,xfx,xfy,yfx,xf,yf, and names is either a single operator, or a list of operators.

In order to exit the parser, type halt. or #. It is probably possible to compile without Maphoon and TreeGen, but you still need the lexer, TVM and these utilities.

Presentations at C++-Conferences

C++ Now (2023)

I gave a presentation about TreeGen at C++Now 2023. The title was 'Trees for Logic and Parsing'. The slides are here, and this is a recording of the presentation. The implementation described in the presentation is outdated. The latest version is here.

C++-now (2022)

I gave a lecture at C++Now 2022 about parsing, tokenizing and Maphoon. The slides and the code fragments that were shown in the lecture are here, and this is a recording of the presentation.

Proof Checking with 3-Valued Logic

I became interested in compiler construction because I had problems implementing Partial Higher-Order Logic with Interfaces (PHOLI). This is a higher-order, 3-valued logic for verification, that I want to develop and implement. Because the logic is continously changing, it has to be reimplemented all the time. I tried to do this in C++, Haskell and Prolog, but I didn't like any of these languages. In 2019, I decided to develop a programming language specially for logic, but I ended up creating the C++-tools described on this page. The main page of the PHOLI implementation is here, and the logic is described here. Iterating through implementations of PHOLI would be impossible without Maphoon and TreeGen.

Slides on Compiler Construction

My page with teaching materials contains slides that may be helpful when creating your own compiler. The slides contain explanations about tokenizing, parsing, type checking and intermediate code generation. The slides also contain implementation hints.

Slides on Lambda Calculus

In Fall 2023, I taught lambda calculus to master students. I created slides about basic lambda calculus, inductive data types, and the Hindley-Milner type checking algorithm. I think that the natural way to present the HM algorithm is by using a clause based approach. Slides are here. I write this, because lambda calculus is related to compilers.

Author

This page was created by Hans de Nivelle.

Acknowledgements

Thanks to Danel Batyrbek, Witold Charatonik, Aleksandra Kireeva, Tatyana Korotkova, Akhmetzhan Kussainov, Dina Muktubayeva, Cláudia Nalon, Zhandarbek Saginov, Olzhas Zhangeldinov.

Special thanks for Zoltan Teki for extensively testing Maphoon, and requesting additional features. He found a bug in the tokenizer generator, and thanks to him Maphoon now supports move-only attributes.

Parts of this research were sponsored by Nazarbayev University through the Faculty Development Competitive Research Grant Program (FDCRGP), under grant number 021220FD1651.