Publications
- Trying to understand PEG
Submitted for publication in Fundamenta Informaticae.
Preliminary version appeared in Proceedings of CS&P'2016 (2016) 1-12.
Abstract:
Parsing Expression Grammar (PEG) encodes
a recursive-descent parser with limited backtracking.
Its properties are useful in many applications.
In its appearance, PEG is almost identical to a grammar in
the Extended Backus-Naur Form (EBNF),
but may define a different language.
Moreover, PEG has a feature that is useful but alien to EBNF,
namely syntactic predicate.
Recent research found the condition under which
PEG without predicates defines the same language as its EBNF look-alike.
The condition is that
the limited backtracking of PEG
is efficient as replacement for full backtracking.
No EBNF equivalence can be defined for PEG with predicates,
but we conjecture that satisfying the same condition makes the grammar well-behaved.
There is, in general, no mechanical way to check the condition,
but it can be often checked by inspection.
The paper outlines an experimental tool to facilitate such inspection.
PDF file
Slides to presentation at CS&P'2016
- Cut points in PEG
Fundamenta Informaticae 143, 1-2 (2015) 141-149.
Abstract:
Parsing Expression Grammar (PEG) encodes
a recursive-descent parser with limited backtracking.
It has been recently noticed that in the situation when the parser
is to explore several alternatives one after another,
no further alternatives need to be explored after the parser
reached certain "cut point".
This fact can be used to save both processing time and storage.
The subject of the paper is identification of cut points,
which can also help in producing better diagnostics.
PDF file
- More about converting BNF to PEG
Fundamenta Informaticae 133, 2-3 (2014) 257-270.
Abstract:
A sequel to the paper "From EBNF to PEG".
An improved presentation of results from that paper and
further extensions to the results of Medeiros.
PDF file
- From EBNF to PEG
Fundamenta Informaticae 128, 1-2 (2013) 177-191.
Abstract:
Parsing Expression Grammar (PEG) encodes
a recursive-descent parser with limited backtracking.
The parser has many useful properties,
and with the use of memoization, it works in a linear time.
In its appearance, PEG is almost identical to a grammar in Extended Backus-Naur Form (EBNF),
but usually defines a different language.
However, in some cases only minor typographical changes are sufficient
to convert an EBNF grammar into its PEG parser.
As recently shown by Medeiros, this is, in particular,
true for LL(1) grammars.
But this is also true for many non-LL(1) grammars,
which is interesting because the backtracking of PEG
is often a convenient way to circumvent just the LL(1) restriction.
We formulate a number of conditions for EBNF grammar to become
its own PEG parser, and arrive at a condition that we call LL(1p),
meaning that a top-down parser can choose its next action by looking
at the input within the reach of one parsing procedure (rather than
by looking at the next letter).
An extension to LL(kp) for k>1 seems possible.
PDF file
Slides to presentation at CS&P'2012
- An improved construction of deterministic omega-automaton
using derivatives
Fundamenta Informaticae 119, 3-4 (2012) 393-406.
Abstract:
In an earlier paper, the author used derivatives
to construct a deterministic automaton recognizing
the language defined by an omega-regular expression.
The construction was related to a determinization method
invented by Safra.
This paper describes a new construction,
inspired by Piterman's improvement to Safra's method.
It produces an automaton with fewer states.
In addition, the presentation and proofs are simplified
by going via a nondeterministic automaton
having derivatives as states.
PDF file
Slides to presentation at CS&P'2011
- BITES instead of FIRST for Parsing Expression Grammar
Fundamenta Informaticae 109, 3 (2011) 323-337.
Abstract:
In an earlier paper,
the properties FIRST and FOLLOW used in
the construction of predictive top-down parsers were adapted
to Parsing Expression Grammars (PEGs).
The purpose was to obtain warnings for possible "language hiding".
It turned out that FIRST does not work well with lookahead expressions.
To repair this, it is replaced here by a property named BITES:
a set of input strings that an expression may "bite".
PDF file
Slides to presentation at CS&P'2010
- Infinite product of traces represented by projections
Fundamenta Informaticae 101, 1-2 (2010) 105-113.
Abstract:
The construction of an associative omega-product of traces from
an earlier paper is revisited using projection representation of traces.
Using projections instead of trace prefixes gives simpler and
more intuitive definitions and proofs.
PDF file
- Mouse: from Parsing Expressions to a practical parser
Proceedings of the CS&P'2009 Workshop 2,
Ed. L.Czaja and M.Szczuka,
Warsaw University (2009) 514-525.
Abstract:
Parsing Expression Grammar (PEG) is a new way to specify
recursive-descent parsers with limited backtracking.
The use of backtracking lifts the LL(1) restriction usually imposed
by top-down parsers.
In addition, PEG can directly define the structures that usually
require a separate "lexer" or "scanner".
This paper outlines a tool, named Mouse, to transcribe PEG
into an executable parser written in Java.
An integral feature of Mouse is the mechanism for specifying
semantics (also in Java).
This makes Mouse a convenient tool if one needs an ad-hoc language processor.
The name "Mouse" sets this tool apart from another parser generator
named "Rats!" that produces a storage-hungry "packrat parser".
PDF file
- Applying classical concepts to Parsing Expression Grammar
Fundamenta Informaticae 93, 1-3 (2009) 325-336.
Abstract:
The paper is an attempt to see how much we can learn about a given
Parsing Expression Grammar with the help of classical concepts used
in the construction of predictive top-down parsers.
PDF file
- Some aspects of Parsing Expression Grammar
Fundamenta Informaticae 85, 1-4 (2008) 441-454.
Abstract:
Parsing Expression Grammar (PEG) is a new way to specify syntax,
by means of a top-down process with limited backtracking.
It can be directly transcribed into a recursive-descent parser.
The parser does not require a separate lexer,
and backtracking removes the usual LL(1) constraint.
This is convenient for many applications, but there are
two problems: PEG is not well understood as a language
specification tool, and backtracking may result
in exponential processing time.
The paper consists of two parts that address these problems.
The first part is an attempt to find out the language actually defined
by a given parsing expression.
The second part reports measurements of backtracking activity
in a PEG-derived parser for the programming language C.
PDF file
- Parsing Expression Grammar as a primitive recursive-descent parser with backtracking
Fundamenta Informaticae 79, 3-4 (2007) 513-524.
Abstract:
Two recent developments in the field of formal languages
are: Parsing Expression Grammar (PEG) and packrat parsing.
The PEG formalism is similar to BNF,
but defines syntax in terms of recognizing strings,
rather than constructing them.
It is, in fact, precise specification of a backtracking
recursive-descent parser.
Packrat parsing is a general method to handle backtracking
in recursive-descent parsers.
It ensures linear working time, at a huge memory cost.
The paper reports an experiment that consisted
of defining the syntax of Java 1.5 in PEG formalism,
and constructing a parser from it by literally transcribing
the PEG definitions into parsing procedures (accidentally, also in Java).
The resulting primitive parser shows an acceptable behavior,
indicating that packrat parsing might be an overkill
for practical languages.
The exercise with defining the Java syntax suggests
that more work is needed on PEG as
a language specification tool.
PDF file
- Associative omega-product of processes
Fundamenta Informaticae 72, 1-3 (2006) 333-345.
Abstract:
The notion of associative infinite product is applied to processes.
Processes are one of the ways to represent
behavior of Petri nets.
They have been studied for some years as an alternative
to traces and dependence graphs.
One advantage of processes, as compared to traces,
is a very simple way to define infinite concatenation.
We take a closer look at this operation, and show that
it is a free associative infinite product of finite processes.
Its associativity simplifies some arguments about
infinite concatenation,
as illustrated by the proof of interleaving theorem.
PDF file
- Associative omega-products of traces
Fundamenta Informaticae 67, 1-3 (2005) 175-185.
Abstract:
The notion of associative infinite product is applied to traces,
resulting in an alternative approach to introducing infinite traces.
Four different versions of the product are explored, two of them
identical to known definitions of infinite trace.
PDF file
- Asynchronous circuits, communicating processes, and Muller automaton
Fundamenta Informaticae 61, 1 (2004) 47-59.
Abstract:
A finite-state model of communicating processes is presented alongside
with the Muller-Bartky model of asynchronous circuits.
The purpose of such presentation is to expose a close similarity of these models,
at a level of abstraction much lower than the Muller automaton.
PDF file
- On associative omega-products
Fundamenta Informaticae 60, 1-4 (2004) 333-350.
Abstract:
In recent years, a number of classical results
connecting rational languages with finite semigroups
have been extended to infinite-word languages using the notion of an omega-semigroup:
a semigroup augmented with an associative infinite product.
This paper takes a closer look at the associative infinite product itself.
It suggests some improvements and presents a couple of new facts.
PDF file
- Construction of a deterministic omega-automaton using derivatives
Theoretical Informatics and Applications 33, 2 (1999) 133-158.
Abstract:
A deterministic omega-automaton recognizing a given omega-regular language
is constructed from an omega-regular expression with the help of derivatives.
The construction is related to Safra's algorithm, in about the same way as
the classical derivative method is related to the subset construction.
It provides an alternative way for determinization of an omega-automaton,
starting with the automaton's language, rather than the automaton itself.
PDF file
- Schützenberger-like products in non-free monoids
Informatique Théorique et Applications 29, 3 (1995) 209-226.
Abstract:
The Schützenberger product has been introduced for studying languages that are
subsets of a free monoid.
However, it often retains its useful properties when the underlying monoid is not free.
The paper investigates when this is the case.
It uses for this purpose an alternative proof for the Schützenberger product,
based on properties of a certain partition of a monoid.
The same approach is then used for a similar investigation of the diamond product for traces.
PDF file
- Adding an infinite product to a semigroup
Abstract in Automata Theory: Infinite Computations.
Dagstuhl-Seminar-Report 28, eds. K. Compton, J.-E. Pin, W. Thomas.
Internationales Begegungs- und Forschungszentrum für Informatik Schloss Dagstuhl (1992) 9.
Slides to presentation (PDF file)
- Infinite-word languages and continuous mappings
Theoretical Computer Science 43, (1986) 59-79.
Abstract:
Languages with infinite words have often been studied as subsets of a topological space.
The topology used for this purpose was always the natural topology of sequences.
This paper suggests that another topology may be more useful.
It is obtained by taking Lim(W) as the derived set of W.
The resulting topological space is shown to be completely regular, strongly zero-dimensional,
and not normal.
The languages appearing in the theory of omega-automata are shown to be closely related to
functionally closed sets (continuous inverse images of the set containing 0 as the only element).
As a consequence, a number of known results can be expressed and proved in terms of continuous mappings.
PDF file
- HSK: an assembler generator
(together with Olov Hansson).
Technical Report TR 18.222, IBM Nordic Laboratory, Lidingö, Sweden (1975).
- The theory of general events and its application to parallel programming
Technical Paper TP 18.220, IBM Nordic Laboratory, Lidingö, Sweden (1972).
- On arithmetic expressions and trees
Communications of the ACM 12, 2 (1969) 81-84.
Abstract:
A description is given of how a tree representing the evaluation of an arithmetic expression
can be drawn in such a way that the number of accumulators needed for the computation
can be represented in a straightforward manner.
This representation reduces the choice of the best order of computation to a specific problem
under the theory of graphs.
An algorithm to solve this problem is presented.
PDF file
- On arithmetic expressions and trees
In NordDATA-69, part 1, Stockholm (1969) 207-213.
Abstract:
Conference presentation of the paper from Comm. ACM, plus an application of similar
technique to find the optimal order of evaluation on one-address machine with a single accumulator.
PDF file
Latest change 2017-01-11