

10.2 Parser Combinators

The combinators are the basic elements exported by the module (logic guile-log parser). With these we define elements that do parsing in a “functional” way, together with some utilities to memoize sections and produce tokens. Tokens are closer to interned Scheme data than the standard way of matching, which is lightweight and just represents sequences of the incoming characters. They are also cheap to backtrack over but put a demand on the supply of Prolog variables.

10.2.1 Higher order elements

In the arguments to the combinators below a string will be automatically converted to (f-tag "string").

(f-and f ...), higher order function, matches if all f ... match and returns the values from the last one. Similarly f-and! only succeeds once, and for f-and!! each f ... only succeeds once.

(f-or f ...), higher order function, matches if at least one of f ... matches. Similarly f-or! matches only once.

(f-seq f ...), higher order function, matches if f ... match in sequence. Similarly f-seq! only matches once, and for f-seq!! each f only matches once.

(f-not f), higher order function, matches one character if f does not match. Similarly f-not! stores the matched character in the standard way and returns it, and f-not-pr prints the character on standard output.

(f-not* f), consumes no characters and fails if f matches.

(f* f), higher order function, matches f 0 or more times.

(f+ f), higher order function, matches f 1 or more times.

(fn f n m), matches f n to m times.
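
The difference between a combinator and its "matches only once" variant is easiest to see in a small model. The following is a minimal sketch in plain Guile, not guile-log's implementation: a matcher is modelled as a function that returns the list of positions where a match may end, and all m- names are hypothetical stand-ins invented for this illustration. It assumes that "matches only once" means that a matcher commits to its first way of matching.

```scheme
(use-modules (srfi srfi-1))            ; fold, append-map

;; Model: a matcher maps (string, start) to the list of positions where
;; a match may end; the empty list means failure.
(define (m-tag tag)
  (lambda (str pos)
    (let ((end (+ pos (string-length tag))))
      (if (and (<= end (string-length str))
               (string=? tag (substring str pos end)))
          (list end)
          '()))))

;; Sequence: feed every end position of one matcher into the next.
(define (m-seq . ms)
  (lambda (str pos)
    (fold (lambda (m positions)
            (append-map (lambda (p) (m str p)) positions))
          (list pos)
          ms)))

;; Alternation: collect the results of every alternative, in order.
(define (m-or . ms)
  (lambda (str pos)
    (append-map (lambda (m) (m str pos)) ms)))

;; Zero or more repetitions, longest match first.  The (> p pos) guard
;; stops the recursion when m matches without consuming anything.
(define (m* m)
  (lambda (str pos)
    (append (append-map (lambda (p)
                          (if (> p pos) ((m* m) str p) '()))
                        (m str pos))
            (list pos))))

;; "Once" variant: keep only the first way of matching.
(define (m-once m)
  (lambda (str pos)
    (let ((ends (m str pos)))
      (if (null? ends) '() (list (car ends))))))

(define a* (m* (m-tag "a")))
(define ab (m-tag "ab"))

((m-seq a* ab) "aaab" 0)           ; => (4), a* gives back one "a"
((m-seq (m-once a*) ab) "aaab" 0)  ; => (), a* keeps all three "a"
```

In this model the plain sequence succeeds because the repetition can be backtracked into, while the committed version fails. That is the trade-off to keep in mind when choosing between the plain and the ! variants.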

10.2.2 Basic parser functions

(f-tag str), produces a matcher of the tag str. f-tag! stores that match in the standard way and f-tag-pr prints the match on standard output.

(f-reg char-regexp), produces a matcher that will match one character according to char-regexp. Similarly we have f-reg! to store the match in the standard way and

f-true, f-false, matchers that represent a successful match and a failed match, respectively. These are mute constructs and do not consume any data.

f-eof, matches end of file.

f-nl, matches a newline. Do not use newlines with e.g. f-reg, it will not work; use f-nl instead. There are also f-nl! and f-nl-pr versions.

tok-ws*, tok-ws+, these are tokens that do not generate a token value, i.e. they return the input state of the out stream. They do tokenize, however, by issuing a freeze, so the result is memoized and whitespace will not be reparsed. The * version matches zero or more whitespace characters and the + version one or more. Whitespace is here defined as ’space’, ’tab’ and ’newline’.

10.2.3 Constructor combinators

Keyword arguments to these functions are output automatically, i.e. a keyword argument #:keyword is treated as (f-out #:keyword) in its place.

(f-cons f1 f2), as (f-seq f1 f2), but the output is consed.

(f-list f ...), as (f-seq f ...), but the output is combined in a list.

(f-cons* f ...), similar to f-list above.

(f-out tag), will produce a matcher that conses tag to the output at the end of a matching.

(ff* f), as (f* f), but makes a list of the output.

(ff* f tag), as above but prepends the list with tag.

(ff+ f), (ff+ f tag), as for ff* above but matches one or more elements.

(ff? f), matches f once or, if there are zero matches, outputs #f.

(ff? f default), matches f once or, if there are zero matches, outputs default.

10.2.4 Utilities

(p-freeze tok f mk), freezes the interpretation of the coming characters, as parsed by the parser function f, as the token tok. With backtracking one can end up in the unfortunate situation of reparsing the same input over and over again. With p-freeze the result is memoized, and the second time we try to parse the token at the same position we just use the memoized data. This is how we tokenize. On the one hand it gives the guile-log parsing system a slower tokenizer than traditional tokenizers, but on the other hand it allows far more advanced tokens to be built, e.g. tokenizing a parenthesized expression in C. mk is a function whose purpose is to construct the value that should be memoized; it has the signature

(mk s in-value out-value), where s is the state value needed to translate any Prolog variables, in-value is the value that goes into the resulting parser function, and out-value is the value that comes out of the parser function f above. A typical use is (define (mk s cin cout) cout).
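
The memoization idea behind p-freeze can be sketched independently of guile-log. The following plain Guile model is an illustration under stated assumptions, not the library's code: freeze, counted and m-digits are hypothetical names, a matcher is modelled as a function returning (end . value) or #f, the cache is keyed on the start position only (so it assumes a single input string), and the role of mk is left out.

```scheme
;; Model: a matcher takes a string and a start position and returns
;; (end . value) on success or #f on failure.
(define (m-digits str pos)
  (let loop ((end pos))
    (if (and (< end (string-length str))
             (char-numeric? (string-ref str end)))
        (loop (+ end 1))
        (if (> end pos)
            (cons end (substring str pos end))
            #f))))

;; Count how often the underlying matcher really runs (demo only).
(define calls 0)
(define (counted m)
  (lambda (str pos)
    (set! calls (+ calls 1))
    (m str pos)))

;; Hypothetical stand-in for p-freeze: remember the result per start
;; position.  Failures (#f) are cached too, hence the handle lookup.
(define (freeze m)
  (let ((cache (make-hash-table)))
    (lambda (str pos)
      (let ((hit (hashv-get-handle cache pos)))
        (if hit
            (cdr hit)
            (let ((result (m str pos)))
              (hashv-set! cache pos result)
              result))))))

(define number (freeze (counted m-digits)))

(number "123+4" 0)   ; => (3 . "123"), parsed for real
(number "123+4" 0)   ; => (3 . "123"), served from the cache
calls                ; => 1
```

A backtracking parser that returns to position 0 and tries the token again pays only for a table lookup the second time, which is the effect described above.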

(mk-token f), this is the lower level tokenizer that just produces a string token out of all the characters matched by the parser function f.

(Ds f), will delay the evaluation of the parser matcher f. This function is needed in recursive definitions in order to avoid infinite recursion.
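
Why a delay is needed in recursive definitions can be shown with a small model in plain Guile. This is a sketch under assumptions, not how Ds is implemented: delayed is a hypothetical macro standing in for Ds, and the m- matchers are hypothetical functions that return the list of positions where a match may end.

```scheme
(use-modules (srfi srfi-1))            ; fold, append-map

;; Model: a matcher maps (string, start) to a list of end positions.
(define (m-tag tag)
  (lambda (str pos)
    (let ((end (+ pos (string-length tag))))
      (if (and (<= end (string-length str))
               (string=? tag (substring str pos end)))
          (list end)
          '()))))

(define (m-seq . ms)
  (lambda (str pos)
    (fold (lambda (m positions)
            (append-map (lambda (p) (m str p)) positions))
          (list pos)
          ms)))

(define (m-or . ms)
  (lambda (str pos)
    (append-map (lambda (m) (m str pos)) ms)))

;; Hypothetical stand-in for Ds: wrap the matcher expression in a lambda
;; so that it is evaluated when the matcher runs, not when the
;; surrounding combinator is built.
(define-syntax-rule (delayed m)
  (lambda (str pos) (m str pos)))

;; parens := "(" parens ")" | ""
;; Without (delayed ...) the reference to parens would be evaluated
;; while parens is still being defined.
(define parens
  (m-or (m-seq (m-tag "(") (delayed parens) (m-tag ")"))
        (m-tag "")))

(parens "(())" 0)    ; => (4 0), 4 means the whole input matched
(parens "(()" 0)     ; => (0), only the empty alternative matches
```

In this model the undelayed definition fails because parens is referenced before it exists; when a grammar is built by calling functions, the same mistake shows up as recursion that never terminates while the matcher is being constructed.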

(parse f), (parse string f), the first version reads from standard input and produces the output returned by the parser function f. The second version uses a string as input to the parser, for convenience.

