Maphoon
Maphoon is a C++ based parser generator. It is written in
C++, and it creates parsers in C++.
It consists of two parts, a tokenizer generator and
a parser generator.
The parser generator works like Yacc, but supports
C++ and RAII. It creates LALR parsers, and in addition
allows extending the language at run time, which
is needed for example by Prolog.
The tokenizer generator does not create a complete tokenizer,
but something called a classifier.
The classifier reads part of the input, and determines
to which symbol type it belongs.
Generating a full tokenizer would result in a tokenizer that
is too rigid to be useful. The generated classifier
can be extended by hand.
This is useful for non-regular
tokens, or for tokenizers that need to consider indentation.
Design Goals of the Parser Generator
The parser generator has the following design goals:
- The constructed parser is bottom-up.
Bottom-up parsing is theoretically and practically superior to
top-down parsing. There is usually no need to adapt the grammar,
and attribute computation can be specified directly with the grammar
rules.
- It is possible to use C++17 in action code.
Concretely, it is possible to use attribute types with resource
invariants (copy constructor, copying assignment, destructors)
without effort. (Think of STL-containers, smart pointers, etc.)
If all attribute types are movable, the parser will take
advantage of this. It is possible to use
move-only attributes, i.e. attributes that cannot be copied.
- The parser generator and the constructed parsers use
portable C++ without need for additional libraries.
-
It is easy to maintain source information in symbols for the
generation of error messages.
-
It is possible to define operators at run time.
This is useful if one wants to implement a parser for
Prolog.
Even if one does not want to extend the language at runtime,
using the mechanism often results in simpler
parsers that can be more easily modified.
The same mechanism can be used for defining context-sensitive
key words.
-
It is possible to define meaningful error messages.
Defining error messages is based on regular expressions
that trigger expectations. The error messages
explain what was expected, and what was obtained instead.
While designing Maphoon, we took
CUP as main example
of a nice parser generator. We wanted the same in C++ and added a few
extra features.
Design Goals of the Tokenizer Generator
The tokenizer generator has the following design goals:
-
The constructed tokenizer is flexible.
Our experience with previous tokenizer
generators is that they are borderline useful. Main problem is that
the constructed tokenizers are too rigid.
Often there are a few missing features that one wants to add but cannot,
the language may have a few non-regular symbols, and the user has
no control over the way source information is stored in tokens.
In order to allow flexibility, we don't generate a complete tokenizer,
but only a function that reads input into a buffer and reports
the symbol type that it belongs too. The user
can add other functions that read non-regular symbols into the buffer,
or
define a function that reads whitespace without storing it in the buffer.
-
It is easy to start using it.
In order to obtain this, pairs of regular expressions and
their classifications are initially written in code.
A collection of such pairs is stored in a container
called classifier.
Using the classifier, a function
called readandclassify( ) is called, which matches
the input to the regular expressions. Function readandclassify( )
returns a pair consisting of the number of read characters
together with the classification. After that, one can do
a case analysis on the classification.
-
It is possible to construct efficient tokenizers.
Building a classifier in code is easy,
but not efficient,
because regular expressions are translated into automata every time
the classifier is constructed.
In order to obtain efficient tokenizers, it is possible to print
the classifier as C++ code, and use
the C++ code. This can be done when the
classifier is finished. The resulting tokenizer is efficient.
The idea of directly generating code instead of tables
was taken from re2c.
Download
Both the tokenizer and the parser generator
are published under the
Aladdin Free Public License
The sources of the lexer generator are
here, and the manual
is here.
The sources of the parser generator are here, and
the manual is here.
In order to compile the parser generator,
edit the Makefile and set Lexing to the
directory containing directory lexing2023.
Set Maphoon to the directory containing maphoon2024.
If you want to run the multimodal logic
example, you will also need need TVM and
utilities.
In order to run, edit the Makefiles to set the directories
and type 'make'.
The zip-file contains the following examples:
-
The obligatory calculator.
-
An example which parses modal logic.
This example uses the tree generator, but it can be compiled without.
You will still need TVM.
-
A prolog parser.
There is no interpreter, only a parser.
In order to add operators, type op( unsigned, type, names ),
where unsigned is the priority, type
is in fx,fy,xfx,xfy,yfx,xf,yf,
and names is either a single operator, or a list of operators.
In order to quit, type halt. or #.