This document guides you through the usage of the parsers library, a parser combinator library for Dart. It assumes some basic knowledge of Dart.
The parsers library helps you build functions that take a string, check that it belongs to some language and compute something out of it.
For instance, it helps you write a function that accepts strings made of digits only and compute the integer they represent. But it can also help you building more involved functions, like one that accepts valid Java programs only and returns their AST (Abstract Syntax Tree).
It is called a parser combinator library because it is made of functions which build complex parsers out of simpler parsers. For instance, by feeding the sepBy
function with a parser of integers and a parser of commas, we obtain a parser of integers separated by commas. These parsers are themselves a combination of simple parsers, etc. Ultimately, the library exposes a very small set of primitive parsers and every other function it defines is built on top of them.
It is a powerful concept because it allows you to define domain specific combinators and parameterized parsers. For instance, the LanguageParser
class that ships with the library defines a set of parsers specialized in parsing programming languages. It is parameterized by things like the keywords of the programming language, or the syntax of its comments, which gives you control over the behaviour of these parsers while sparing you the gory details.
Now that you’re convinced parser combinators are the greatest thing since sliced bread, let’s get started!
If you know how to set up a Dart project with dependencies on third-party libraries, you can safely skip this section. Simply create a project depending on the latest version of the parsers library.
Create a directory called tutorial
with the following layout.
tutorial/
pubspec.yaml
bin/
tutorial.dart
In pubspec.yaml
, define a project named tutorial
depending the latest version of the parsers library.
name: tutorial
dependencies:
parsers: any
In bin/tutorial.dart
, import the parsers library and define the main function as below.
import 'package:parsers/parsers.dart';
main() {
final p = string('Hello World');
print(p.parse('Hello World'));
}
Install the dependencies by running pub install
in the tutorial
directory.
$ pub install
Resolving dependencies...
Downloading parsers 0.9.1...
Downloading unittest 0.2.8+2...
Downloading args 0.2.8+2...
Dependencies installed!
And finally run your program.
$ dart bin/tutorial.dart
Hello World
You’re all Set!
The main class exposed by the library is Parser<A>
. What we refer to as “parsers” in this document are instances of this class.
An instance of Parser<A>
is an object which – via the parse
method – consumes a string and either computes a value of type A
or fails.
Parser<Foo> p = ...;
p.parse("some string"); // returns a Foo or throws a parse error
You should never have to call the constructor of Parser
. That’s what primitive parsers combinators do for you. We introduce them below.
One of the most primitive and simple parser combinators is string
. It is a function that takes a string and returns a Parser<String>
.
Parser<String> p = string('foo');
Here p
expects the string 'foo'
and will return 'foo'
on success.
p.parse('foo'); // returns 'foo'
p.parse('bar'); // parse error
One thing to be aware of is that p
accepts any string starting with 'foo'
and not only the 3-character long 'foo'
string.
p.parse('foobar'); // returns 'foo'
It simply consumes the three first characters and leaves the rest to whichever parser it is chained with, as we will see later.
The Parser
class overrides the |
operator which is the first “real” parser combinator we’ll encounter: it combines two Parser
s to create a new one.
Given two parsers a
and b
, the parser a | b
(pronounced “a or b”) is the parser that returns a
’s result if a
succeeds, b
’s result otherwise.
final p = string('foo') | string('bar');
p.parse('foo'); // returns 'foo'
p.parse('bar'); // returns 'bar'
Of course, b
can fail too, in which case a | b
fails as a whole.
p.parse('qux'); // fails
What is the type of p
? In this case it is Parser<String>
because both string('foo')
and string('bar')
are parsers of type Parser<String>
. But if they were computing values of different types that would be their closest common ancestor (usually Parser<Object>
). So far however we have only encountered parsers that compute strings. Let us see how to create parsers computing something else.
The ^
operator of Parser
transforms the result of a successful parse. It leaves a failure unchanged.
Parser<int> p = string('123') ^ ((s) => int.parse(s) + 1);
p.parse('123'); // returns 124
p.parse('foo'); // parse error
This is how we obtain parsers which compute something useful (like an AST) instead of simply echoing their input. It should be stressed that ^
doesn’t change which input the parser it applies to accepts, only its result.
As any other combinator, ^
applies to any parser, even to composite ones.
Parser<int> p = (string('1') | string('2') | string('3')) ^ int.parse;
p.parse('1'); // returns 1
p.parse('2'); // returns 2
p.parse('3'); // returns 3
It can as well apply to individual branches of a disjunction, to unify their types.
Parser<String> one = string('one');
Parser<int> two = string('2') ^ int.parse;
Parser<int> oneOrTwo = one ^ ((_) => 1) | two;
These examples however are not very exciting, because we might as well write string('2') ^ ((_) => 2)
instead of string('a') ^ int.parse
since string('2')
is always going to return '2'
on success anyway. In order to build more interesting parsers we need to chain them an undetermined number of times, which is what the next two sections are about.
It would have been natural to introduce the sequencing of parsers earlier in this guide. However, the way it is exposed in the parsers library requires some understanding of the ^
operator, which is why we only tackle it now.
Sequencing is achieved via the +
operator.
final protoParser = a + b;
Given two parsers a
and b
, a + b
parses a
then b
. If one of them fails, it fails altogether. However a + b
is not quite completely a parser, hence the name protoParser
. The reason why it is not a parser is because it is not obvious what it should compute. Parser a
computes some value, parser b
computes some other value, but what should a + b
compute?
The only available way to turn a proto parser into a real one is by calling its ^
operator. It takes as many arguments as there are elements in the sequence.
combine(x, y, z) => ['$z$y', x];
final p = string('foo') + string('bar') + string('baz') ^ combine;
p.parse('foobarbaz'); // returns ['bazbar', 'foo]
Since + has a higher precedence than ^ in Dart, combine
applies to the whole sequence. The operators in parsers are chosen so that they usually “do the right thing”, but precedence can be tricky. When in doubt, use parenthesis.
Thanks to +
and ^
we can now define our very first combinator: a combinator that takes a parser p
and returns a parser of p
parenthesized.
Parser parens(Parser p) {
return string('(') + p + string(')') ^ (left, x, right) => x;
}
final p = parens(string('foo'));
p.parse('(foo)'); // returns 'foo'
That’s the beauty of parser combinator libraries: we piggyback on the host language’s abstraction mechanisms (here Dart functions) to define reusable parsing behaviours. Most of the combinators of the parsers library are defined this way.