|
||||||||||||||||||||||
|
||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
IntroductionThe Spart library an object oriented recursive descent parser generator framework implemented in C#. In fact, it is a partial port of the excellent Spirit library, which is written in C++ and uses meta-programming. The Spart framework enables a target grammar to be written exclusively in C#. An EBNF grammar can be closely match using C# code. In retrospect, conventional compiler-compilers or parser-generators have to perform an additional translation step from the source EBNF code to C or C++ code. I have takened the liberty to use the structure (and some sentence) of the Spirit documentation. Along the article, some notes are added regarding some issues about the port to C#: Spirit-2-Spart Notes (SSN). As always, this article presents an overview of the library. For deeper details, please refer to the NDoc documentation. The library also comes with a battery of NUnit tests. Quick StartSpart is designed to bring you parser capabilites quickly directly into your code. While it is not suited for creating parsers for entire language like C,C++, it is very effective for building micro-grammars in your code. When you need to build a new parser, there are existing solution: a combination of ....Parse (like So, as for Spirit, one of the main objective of Spart is to let you build easily grammar in C#. To fix this ideas, a few simple grammars illustrate Spart usage: Trivial example #1:Create a parser that will parse a digit: Prims.Digit
(This is a trivial case, SSN: This parser is equivalent to Trivial example #2:Create a parser that will parse a sequence of two digits Ops.Seq( Prims.Digit, Prims.Digit )
Here you see the familiar
Note: when we combine parsers, we end up with a "bigger" parser, But it's still a parser. Parsers can get bigger and bigger, nesting more and more, but whenever you glue two parsers together, you end up with one bigger parser. This is an important concept. SSN: The operator Trivial Example #3Create a parser that will accept an arbitrary number of digit. (Arbitrary means anything from zero to infinity). Ops.Star(Prims.Digit) This is like the regular expression Kleene Star. SSN: * cannot be an unary operator in C#. Less trivial example #4Create a parser that parse a sequence of comma separated digits and record them in a collection (note this can easily be done using // spirit: num_p >> *( ch_p(',') >> num_p)
Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))
Note that Ops.Seq(Prims.Ch(','), Prims.Digit)
which matches a sequence of comma and digit. Note: the Using the parserOnce we have built our parser, we want to use it. A parser can be used directly as is: // string to parser
String s = "1,2,3,4";
// creating the parser as above
Parser p = Ops.Seq(...);
// creating a scanner of the string
StringScanner scan = new StringScanner(s);
// parsing the string through the scanner
ParserMatch m = p.Parse(s);
Notes:
Now that we have parsed the text, the ParserMatch object can help answer questions like: was the match successful, what was the match value, etc... : if (m.Success)
Console.Write("successfull match!");
Semantic ActionsOur parser above is nothing but a recognizer, it does no take any actions. It answers "did our data match the grammar?" but it does not record anything. Remember that we wanted to record the digits into a collection. For example, whenever we parse a digit, we wish to store the parsed number after a successful match. We now wish to extract information from the parser. Semantic actions do this. Semantic actions may be attached to any point in the grammar specification. They through events and event handler. The // A digit recorder actor
public class MyActor
{
ArrayList digits; // digits collection
...
public void RecordDigit(Object sender, ActionEventArgs args)
{
// record the digit into the digits collection
digits.Add( args.TypedValue );
}
}
Then, we add a digit recorder handler to each digit parser: MyActor a = new MyActor();
// digit parser
Parser d = Prims.Digit;
// register actor
d.Act += new ActionEventHandler( a.RecordDigit );
// create parser
Parser p = Ops.Seq( d, Ops.Start( Ops.Seq(',', d)));
This is the same parser as above but now, Basic conceptsSpart follows the concepts of Spirit. There are a few fundamental concepts that need to be understood well: 1) The Parser, 2) the Match, 3) The Scanner, and 4) Semantic Actions. These basic concepts interact with each other, and the functionalities of each interweave throughout the entire framework to make it one coherent whole. I will go quickly over those concepts since they are very well explained in the Spirit documentation and I recommend you take a look there first. The parserCentral to the framework is the parser. The parser does the actual work of recognizing a linear input stream of data read sequentially from start to end by the scanner. The parser attempts to match the input following a well-defined set of specifications in the form of grammar rules. The parser reports the success or failure to its client through a match object. When successful, the parser calls a client-supplied semantic action. Finally, the strategically-placed semantic action extracts structural information depending on the data passed to it by the parser and the heirarchical context of the parser it is attached to. Parsers come in a lot of flavors and usually you don't need to write your own parser. Spart has a collection of built-in parsers that you can combine to create your grammars. The built-in parsers come in two (main) flavors: PrimitivesPrimitive parsers can be used to match characters, string, lower case character, digits, etc... The CombinationCombination parsers can be used to combine parsers, like sequence and star in the example. The The matchThe The ScannerLike the parser, the scanner is also an abstract concept, represented by the Semantic actionsA composite parser forms a hierarchy. Parsing proceeds from the topmost parent parser which delegates and apportions the parsing task to its children recursively to its childeren's children and so on until a primitive is reached. By attaching semantic actions to various points in this hierachy, in effect we can transform the flat linear input stream into a structured representation. This is essentially what parsers do. The RuleThe The Classic Calculator ExampleThere is still a lot to say about Spirit and Spart but I will cut to a final example. A better documentation should appear in a near future as this library is totally new! The favorite grammar example in the Spirit documentation is a calculator grammar:
Let us now show how to build this grammar with Spart (you can find in the demo applciation). Passing aside some initialization details, the C# code would look as follows: group.Parser = Ops.Seq('(',Ops.Seq(expression,')'));
factor.Parser = integer | group;
term.Parser = Ops.Seq( factor, Ops.Klenee(
Ops.Seq('*',factor) | Ops.Seq('/',factor) ));
expression.Parser = Ops.Seq(term,Ops.Klenee(Ops.Seq('+',term) |
Ops.Seq('-',term) ))
In the above, // declaration
Rule group;
...
group = new Rule();
Note also, that the DebuggingIt can be usefull to trace the scanner state and the parser matches. Similarly to Spirit, you can attach a tracer that will output a lot of interresting info: // declaration in the Calculator class
Debugger debug;
...
// in the constructor
// create a debugger that outputs to the console
debug = new Debugger( Console.Out );
// setting rules name for debugging purpose
group.ID = "group";
...
// tracing rules
debug += group;
debug += expression;
...
The output is formated as follows:
For example, when launching the parser on the string expression: (5+2)*(4*2)
term: (5+2)*(4*2)
factor: (5+2)*(4*2)
group: (5+2)*(4*2)
expression: 5+2)*(4*2)
term: 5+2)*(4*2)
factor: 5+2)*(4*2)
group: 5+2)*(4*2
#group: 5+2)*(4*
integer: 5+2)*(4
/integer: +2)*(4
/factor: +2)*(4*2)
/term: +2)*(4*2)
term: 2)*(4*2)
factor: 2)*(4*2)
group: 2)*(4*2)
#group: 2)*(4*2)
integer: 2)*(4*2
/integer: )*(4*2
/factor: )*(4*2)
/term: )*(4*2)
/expression: )*(4*2)
/group: *(4*2)
/factor: *(4*2)
factor: (4*2)
group: (4*2)
expression: 4*2)
term: 4*2)
factor: 4*2)
group: 4*2)
#group: 4*2)
integer: 4*2)
/integer: *2)
/factor: *2)
factor: 2)
group: 2)
#group: 2)
integer: 2)
/integer: )
/factor: )
/term: )
/expression: )
/group:
/factor:
/term:
/expression:
ToDo ListThere are still a lot of features from Spirit that are not implemented in Spart:
References
| |||||||||||||||||||||