Open Source Projects

Mouse: from Parsing Expressions to a practical parser

Parsing Expression Grammar (PEG) is a new way to specify recursive-descent parsers with limited backtracking. The use of backtracking lifts the LL(1) restriction usually imposed by top-down parsers. In addition, PEG can define parsers with integrated lexing.
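
As an illustration of how backtracking lifts the LL(1) restriction, here is a minimal hand-written sketch in Java. It is not output of Mouse, and the grammar and all names in it (TinyPeg, stmt, assign, call) are hypothetical. Both alternatives of Stmt begin with Ident, so one character of lookahead cannot choose between them; the parser tries the first alternative and, if it fails, restores the input position and tries the second. The backtracking is limited in the sense that once an alternative succeeds, the remaining ones are never tried. Identifiers and numbers are recognized by the same procedures, with no separate lexer.

```java
// Hand-written sketch, not generated by Mouse. Grammar in PEG notation:
//   Stmt   = Assign / Call
//   Assign = Ident "=" Number
//   Call   = Ident "(" ")"
//   Ident  = [a-z]+
//   Number = [0-9]+
class TinyPeg {
    private final String in;  // input text
    private int pos = 0;      // current position in the input

    TinyPeg(String in) { this.in = in; }

    // Returns "assign" or "call", or null if the whole input is not a Stmt.
    static String parse(String text) {
        TinyPeg p = new TinyPeg(text);
        String kind = p.stmt();
        return (kind != null && p.pos == text.length()) ? kind : null;
    }

    // Ordered choice: try Assign first; on failure, backtrack and try Call.
    private String stmt() {
        int start = pos;
        if (assign()) return "assign";
        pos = start;                    // backtrack to where Stmt began
        if (call()) return "call";
        pos = start;
        return null;
    }

    private boolean assign() { return ident() && lit('=') && number(); }

    private boolean call() { return ident() && lit('(') && lit(')'); }

    private boolean ident() { return run('a', 'z'); }

    private boolean number() { return run('0', '9'); }

    // Consumes one or more characters in the range lo..hi.
    private boolean run(char lo, char hi) {
        int start = pos;
        while (pos < in.length() && in.charAt(pos) >= lo && in.charAt(pos) <= hi) {
            pos++;
        }
        return pos > start;
    }

    // Consumes the single character c if it is next in the input.
    private boolean lit(char c) {
        if (pos < in.length() && in.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }
}
```

With this sketch, TinyPeg.parse("x=42") takes the first alternative, while TinyPeg.parse("f()") fails in Assign at the "=" and succeeds after backtracking into Call.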

Mouse is a tool to transcribe PEG into an executable parser written in Java. Unlike some existing PEG generators (e.g., Rats!), Mouse does not produce a storage-hungry "packrat parser", but a collection of transparent recursive procedures. An integral feature of Mouse is the mechanism for specifying semantics (also in Java). This makes Mouse a convenient tool if one needs an ad-hoc language processor. Being written in Java, the processor is operating-system independent.

Version 2.0 of Mouse introduces support for left recursion using an experimental method of recursive ascent. Its principle is explained in a separate document.

Included in the package is PEG Explorer, an interactive tool to investigate the effects of limited backtracking in PEG.

Project page (opens in a separate window).
Explorer page (opens in a separate window).
Sample grammars for Java and C.

Download user's manual / tutorial (PDF file).
Download the entire package (gzipped TAR file).


Computing with Units

Units is a program for computations on values expressed in terms of different measurement units. It is an advanced calculator that takes care of the units.

Project page (opens in a separate window).

Download Units JAR file.
Download the complete package (gzipped TAR file).

What is new in version 1.89.J01

This version mimics the new version of GNU Units by offering three new features:

  1. Conversion to mixed units, for example, meters to feet and inches, or time to hours, minutes and seconds, like these:
    2 m = 6 ft + 6 in + 6|8in (rounded up to nearest 1|8in)
    1.1 * (2 hours + 5 min) = 2 hours + 17 min + 30 sec
    To save typing, patterns used for such conversions may be defined in the units data file as 'unit list aliases'.
  2. Unicode support. Java works internally with 16-bit (UTF-16) Unicode characters. The units data file must now use the UTF-8 encoding, so you can use Unicode characters in unit names. However, access to them is restricted by the GUI font, by the encoding used by the operating system for command-line input and output, and by what you can enter from the keyboard. You can change the font and encoding using properties and command-line options.
  3. Unit names may end with a digit other than zero, if the digit is preceded by an underscore, e.g. 'NO_2'.
Internally, a generated parser is now used to read the units data file, and a lot of code clean-up has been done.
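
The arithmetic behind the mixed-unit conversion in item 1 can be sketched in Java as follows. This is an illustrative sketch only, not the code used in Units: the class MixedUnits and the method split are hypothetical names, and the unit sizes are passed in by hand rather than read from the units data file. It takes whole multiples of each unit from largest to smallest and rounds the remainder to the nearest multiple of the last unit. It does not handle the case where that final rounding reaches a full next-larger unit, which a real implementation would have to carry over.

```java
class MixedUnits {
    // Splits 'value' over 'unitSizes', both expressed in the same base unit,
    // with the unit sizes listed from largest to smallest.
    // All counts are whole multiples; the last one is rounded to nearest.
    static long[] split(double value, double[] unitSizes) {
        long[] counts = new long[unitSizes.length];
        double rest = value;
        for (int i = 0; i < unitSizes.length; i++) {
            boolean last = (i == unitSizes.length - 1);
            double quotient = rest / unitSizes[i];
            counts[i] = last ? Math.round(quotient) : (long) Math.floor(quotient);
            rest -= counts[i] * unitSizes[i];
        }
        return counts;
    }

    public static void main(String[] args) {
        // 2 m in feet, inches and eighths of an inch (sizes in meters).
        double[] ftInEighth = {0.3048, 0.0254, 0.0254 / 8};
        System.out.println(java.util.Arrays.toString(split(2.0, ftInEighth)));
        // prints [6, 6, 6], that is 6 ft + 6 in + 6|8 in

        // 1.1 * (2 hours + 5 min) in hours, minutes and seconds (sizes in seconds).
        double[] hourMinSec = {3600, 60, 1};
        double seconds = 1.1 * (2 * 3600 + 5 * 60);
        System.out.println(java.util.Arrays.toString(split(seconds, hourMinSec)));
        // prints [2, 17, 30], that is 2 hours + 17 min + 30 sec
    }
}
```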


Latest change 2017-09-04