|  | Home | Libraries | People | FAQ | More | 
This section is not for the faint of heart. In here, are distilled the inner workings of Spirit.Qi parsers, using real code from the Spirit library as examples. On the other hand, here is no reason to fear reading on, though. We tried to explain things step by step while highlighting the important insights.
          The Parser
template <typename Derived> struct parser { struct parser_id; typedef Derived derived_type; typedef qi::domain domain; // Requirement: p.parse(f, l, context, skip, attr) -> bool // // p: a parser // f, l: first/last iterator pair // context: enclosing rule context (can be unused_type) // skip: skipper (can be unused_type) // attr: attribute (can be unused_type) // Requirement: p.what(context) -> info // // p: a parser // context: enclosing rule context (can be unused_type) // Requirement: P::template attribute<Ctx, Iter>::type // // P: a parser type // Ctx: A context type (can be unused_type) // Iter: An iterator type (can be unused_type) Derived const& derived() const { return *static_cast<Derived const*>(this); } };
          The ParserDerived to do the actual
          parsing. This technique is known as the "Curiously Recurring Template
          Pattern" in template meta-programming circles. This inheritance strategy
          gives us the power of polymorphism without the virtual function overhead.
          In essence this is a way to implement compile time polymorphism.
        
          The Derived parsers, PrimitiveParserUnaryParserBinaryParserNaryParser
Derived parsers must support the following:
bool parse(f, l, context, skip, attr)
f, lfirst/last iterator pair
contextenclosing rule context (can be unused_type)
skipskipper (can be unused_type)
attrattribute (can be unused_type)
          The parse is the main parser entry point. skipper
          can be an unused_type.
          It's a type used every where in Spirit
          to signify "don't-care". There is an overload for skip
          for unused_type that is
          simply a no-op. That way, we do not have to write multiple parse functions
          for phrase and character level parsing.
        
Here are the basic rules for parsing:
true
              if successful, false otherwise.
            first
              is incremented N number of times, where N is the number of characters
              parsed. N can be zero --an empty (epsilon) match.
            first
              is reset to its position before entering the parser function. attr
              is untouched.
            void what(context)
context
                enclosing rule context (can be unused_type)
              
The what function should be obvious. It provides some information about “what” the parser is. It is used as a debugging aid, for example.
P::template attribute<context>::type
Pa parser type
contextA context type (can be unused_type)
The attribute metafunction returns the expected attribute type of the parser. In some cases, this is context dependent.
In this section, we will dissect two parser types:
Parsers
PrimitiveParserA parser for primitive data (e.g. integer parsing).
UnaryParserA parser that has single subject (e.g. kleene star).
          For our dissection study, we will use a Spirit
          primitive, the any_int_parser
          in the boost::spirit::qi namespace.
        
[primitive_parsers_any_int_parser]
          The any_int_parser is derived
          from a PrimitiveParser<Derived>parser<Derived>. Therefore, it supports the following
          requirements:
        
parse member function
            what member function
            attribute
              metafunction
            parse is the main entry point. For primitive parsers, our first thing to do is call:
qi::skip(first, last, skipper);
          to do a pre-skip. After pre-skipping, the parser proceeds to do its thing.
          The actual parsing code is placed in extract_int<T, Radix, MinDigits, MaxDigits>::call(first, last, attr);
        
This simple no-frills protocol is one of the reasons why Spirit is fast. If you know the internals of Spirit.Classic and perhaps even wrote some parsers with it, this simple Spirit mechanism is a joy to work with. There are no scanners and all that crap.
The what function just tells us that it is an integer parser. Simple.
          The attribute metafunction returns the T template
          parameter. We associate the any_int_parser
          to some placeholders for short_,
          int_, long_
          and long_long types. But,
          first, we enable these placeholders in namespace boost::spirit:
        
template <> // enables short_ struct use_terminal<qi::domain, tag::short_> : mpl::true_ {};
template <> // enables int_ struct use_terminal<qi::domain, tag::int_> : mpl::true_ {};
template <> // enables long_ struct use_terminal<qi::domain, tag::long_> : mpl::true_ {};
template <> // enables long_long struct use_terminal<qi::domain, tag::long_long> : mpl::true_ {};
          Notice that any_int_parser
          is placed in the namespace boost::spirit::qi while these enablers
          are in namespace boost::spirit. The reason is that these placeholders are
          shared by other Spirit domains.
          Spirit.Qi, the parser is one domain. Spirit.Karma,
          the generator is another domain. Other parser technologies may be developed
          and placed in yet another domain. Yet, all these can potentially share
          the same placeholders for interoperability. The interpretation of these
          placeholders is domain-specific.
        
Now that we enabled the placeholders, we have to write generators for them. The make_xxx stuff (in boost::spirit::qi namespace):
template < typename T , unsigned Radix = 10 , unsigned MinDigits = 1 , int MaxDigits = -1> struct make_int { typedef any_int_parser<T, Radix, MinDigits, MaxDigits> result_type; result_type operator()(unused_type, unused_type) const { return result_type(); } };
This one above is our main generator. It's a simple function object with 2 (unused) arguments. These arguments are
no_case[p]
              to pass information to inner parser nodes. We'll see how that works
              later.
            Now:
template <typename Modifiers> struct make_primitive<tag::short_, Modifiers> : make_int<short> {};
template <typename Modifiers> struct make_primitive<tag::int_, Modifiers> : make_int<int> {};
template <typename Modifiers> struct make_primitive<tag::long_, Modifiers> : make_int<long> {};
template <typename Modifiers> struct make_primitive<tag::long_long, Modifiers> : make_int<boost::long_long_type> {};
          These, specialize qi:make_primitive for specific tags. They
          all inherit from make_int
          which does the actual work.
        
Let me present the kleene star (also in namespace spirit::qi):
template <typename Subject> struct kleene : unary_parser<kleene<Subject> > { typedef Subject subject_type; template <typename Context, typename Iterator> struct attribute { // Build a std::vector from the subject's attribute. Note // that build_std_vector may return unused_type if the // subject's attribute is an unused_type. typedef typename traits::build_std_vector< typename traits:: attribute_of<Subject, Context, Iterator>::type >::type type; }; kleene(Subject const& subject_) : subject(subject_) {} template <typename F> bool parse_container(F f) const { while (!f (subject)) ; return true; } template <typename Iterator, typename Context , typename Skipper, typename Attribute> bool parse(Iterator& first, Iterator const& last , Context& context, Skipper const& skipper , Attribute& attr_) const { // ensure the attribute is actually a container type traits::make_container(attr_); typedef detail::fail_function<Iterator, Context, Skipper> fail_function; Iterator iter = first; fail_function f(iter, last, context, skipper); parse_container(detail::make_pass_container(f, attr_)); first = f.first; return true; } template <typename Context> info what(Context& context) const { return info("kleene", subject.what(context)); } Subject subject; };
          Looks similar in form to its primitive cousin, the int_parser.
          And, again, it has the same basic ingredients required by Derived.
        
          kleene is a composite parser. It is a parser that composes another parser,
          its “subject”. It is a UnaryParserPrimitiveParserUnaryParser<Derived>parser<Derived>.
        
unary_parser<Derived>, has these expression requirements on Derived:
UnaryParser parser.)
            UnaryParser type.)
            
          parse is the main parser entry point. Since this is
          not a primitive parser, we do not need to call qi::skip(first, last, skipper). The subject, if
          it is a primitive, will do the pre-skip. If if it is another composite
          parser, it will eventually call a primitive parser somewhere down the line
          which will do the pre-skip. This makes it a lot more efficient than Spirit.Classic.
          Spirit.Classic
          puts the skipping business into the so-called "scanner" which
          blindly attempts a pre-skip every time we increment the iterator.
        
          What is the attribute of the kleene? In general, it
          is a std::vector<T>
          where T is the attribute
          of the subject. There is a special case though. If T
          is an unused_type, then
          the attribute of kleene is also unused_type.
          traits::build_std_vector takes care of that minor
          detail.
        
So, let's parse. First, we need to provide a local attribute of for the subject:
typename traits::attribute_of<Subject, Context>::type val;
          traits::attribute_of<Subject, Context>
          simply calls the subject's struct
          attribute<Context>
          nested metafunction.
        
val starts out default initialized. This val is the one we'll pass to the subject's parse function.
          The kleene repeats indefinitely while the subject parser is successful.
          On each successful parse, we push_back
          the parsed attribute to the kleene's attribute, which is expected to be,
          at the very least, compatible with a std::vector.
          In other words, although we say that we want our attribute to be a std::vector, we try to be more lenient than
          that. The caller of kleene's parse may pass a different attribute type.
          For as long as it is also a conforming STL container with push_back, we are ok. Here is the kleene
          loop:
        
while (subject.parse(first, last, context, skipper, val)) { // push the parsed value into our attribute traits::push_back(attr, val); traits::clear(val); } return true;
Take note that we didn't call attr.push_back(val). Instead, we called a Spirit provided function:
traits::push_back(attr, val);
          This is a recurring pattern. The reason why we do it this way is because
          attr can be unused_type.
          traits::push_back takes care of that detail.
          The overload for unused_type is a no-op. Now, you can imagine why Spirit is fast! The parsers are so
          simple and the generated code is as efficient as a hand rolled loop. All
          these parser compositions and recursive parse invocations are extensively
          inlined by a modern C++ compiler. In the end, you get a tight loop when
          you use the kleene. No more excess baggage. If the attribute is unused,
          then there is no code generated for that. That's how Spirit
          is designed.
        
The what function simply wraps the output of the subject in a "kleene“... "”".
          Ok, now, like the int_parser,
          we have to hook our parser to the qi
          engine. Here's how we do it:
        
First, we enable the prefix star operator. In proto, it's called the "dereference":
template <> struct use_operator<qi::domain, proto::tag::dereference> // enables *p : mpl::true_ {};
          This is done in namespace boost::spirit
          like its friend, the use_terminal
          specialization for our int_parser.
          Obviously, we use use_operator to enable the dereference
          for the qi::domain.
        
Then, we need to write our generator (in namespace qi):
template <typename Elements, typename Modifiers> struct make_composite<proto::tag::dereference, Elements, Modifiers> : make_unary_composite<Elements, kleene> {};
          This essentially says; for all expressions of the form: *p, to build a kleene parser. Elements
          is a Boost.Fusion
          sequence. For the kleene, which is a unary operator, expect only one element
          in the sequence. That element is the subject of the kleene.
        
We still don't care about the Modifiers. We'll see how the modifiers is all about when we get to deep directives.