Toy - A Small Compiler for the Amiga
			 Part 3 - Parsing

	This third article in the series on the Toy compiler describes the
    parsing routines. These are divided up into three modules. The first
    deals with procedure headers, along with the entry and exit code used,
    and with variable declarations. The second deals with expressions, and
    the third deals with statements. As has been mentioned previously, the
    Toy compiler does not build an internal structure to repreresent a
    parsed program. Thus, the parsing routines call on the code generation
    routines directly. I have tried to keep the parser relatively indepen-
    dent of the target processor, but since I haven't tried to port Toy to
    a different CPU, I don't really know how successful I have been.
    Perhaps an ambitious MS-DOS programmer will see these articles and
    build an 8086 version!

	The parsing technique used is that of "recursive descent". This
    means, quite literally, that the parser is recursive, and it descends
    into the structure of the program. The file handling code calls the
    "top" levels of the parser, which handle things like global variables
    and procedures. They in turn decend into lower levels, like statements
    and expressions. Places in which the grammar loops (a right-hand-side
    refers back, either directly or indirectly, to its own left-hand-side)
    correspond to a loop or recursion in the parsing routines.

	The reader may wish to quickly glance back at the previous article,
    to see the interfaces defined for the symbol table and tokenizer code.


    Declarations and Procedures

	Listing 1 shows file "decl.d". It contains only two functions,
    'declareVariables' and 'defineProc'. The first is used to define a
    group of variables, either global to the program, or local to a
    procedure. The second is used to define a procedure, and to call the
    code generation routines at the right places to cause the procedure's
    entry and exit code to be built. This second is one of the larger
    procedures in the compiler. For a more complex language, both of these
    would be split up into more pieces. In particular, there would be a
    function to parse an arbitrary type and return some kind of structure
    defining it. Also, most languages treat two descriptions of the same
    type as equivalent, so there would be facilities for either merging
    multiple descriptions into one, or of comparing two descriptions. To
    illustrate this, consider the following Draco style declarations:

	    [10] * char fred;
	    [10] * char joe;

    Here, both 'fred' and 'joe' are "the same type", which is an array of
    10 pointers to character, thus we can assign one to the other, or
    compare the two. In order to allow this, the compiler has to either use
    the same structure (typically an array node, which has an element-type
    which is a pointer node, which in turn has a subtype which is a built-
    in type node) for both, or to provide a routine which traverses these
    structures and decides they are equivalent. Avoiding this whole area is
    the main reason I kept Toy down to just 'int' and 'bool' as types.

	'declareVariables' is given one parameter, which is the kind of
    variables to be declared. This is either 'sk_localVariable' for
    variables local to a function, or 'sk_globalVariable' for variables
    global to the entire program. This kind is stored in the symbol table
    entries created for the variables, and is also used by the code
    generation routines to assign storage to the variables. The structure
    of 'declareVariables' is that of two nested 'while' loops. The inner
    one loops through a declaration of a number of variables of the type,
    and the outer one loops past several of the inner sets. E.g. in

	    int a, b, c, d;
	    bool fred, joe, sam, harry;
	    bool alice, mary, sue, evangeline;

    there would be three traversals of the outer loop, each consisting of
    four traversals of the inner loop.

	When we enter the function, the current token will be the keyword
    for the type, which is either 'int' or 'bool'. We thus arrange for the
    outer loop to continue while that situation is true. This means that
    the body of that loop must consume all legal tokens up to and including
    the semicolon that separates the declaration sets. We remember the type
    in local variable 'typ' ('type' is a reserved word in Draco, so I can't
    use that as the variable name), then move past the token. At that point
    we expect to see a list of undeclared identifiers, separated by commas,
    followed by a semicolon. We arrange the inner loop so that it always
    starts looking at the new identifier. 'getToken' is called instead of
    'getSimpleToken', since we need the symbol pointer in order to define
    the symbol. If the identifier is already defined, we complain about it
    and go on. Otherwise, we give it the appropriate kind and type, then
    call 'allocateVariable' in the code generation portion of the compiler
    to let it figure out where the variable will be. It will use the
    symbol's 'se_value' field to store any needed information.

	After skipping the newly defined identifier token, we run head on
    into one of the uglier parts of most compilers - that of error
    recovery. The Toy grammar expects a comma or a semicolon after the
    identifier. What happens if there isn't one, but there is, say, a '<'
    token instead? Well, we have to put out an error message, and then try
    to continue meaningfully. Good error recovery is a careful balance
    between skipping too much input (which could result here in some
    variables not getting declared, which would cause extra error messages
    when those variables were used) and skipping too little input (which
    could result in extra error messages here, and perhaps the compiler
    getting into an infinite loop if it ends up in the same place again). I
    won't try to justify the recovery that Toy does - it is a result of my
    experience with tinkering with compilers.

	For correct input, the inner loop would have skipped a comma, and
    would now be looking at the next identifier, or would have found a
    semicolon and done nothing about it, thus leaving it for the outer
    loop. The outer loop expects the semicolon between declaration sets,
    and thus has a very similar error-recovery situation. Here the recovery
    is dependent on the kind of the variable - if we are inside a
    procedure, we should stop at anything that looks like an executeable
    statement, but if we are not, we should stop at anything that looks
    like a new declaration or a procedure definition.

	Declaration processing is one area that is more complicated in a C-
    type language. In Draco, even the most complicated type is handled by
    the general type parser, and variable declaration is very much as is
    shown here. In C, however, the type is a combination of the basic type
    given at the beginning, and the modifiers given around the identifier
    ('*' before it for pointer, parentheses and bounds lists after it for
    function types and arrays). Because surrounding parentheses are
    allowed, the identifier is in fact not seen until the bottom of a
    recursive routine! Parts of the type structure are built bottom-up, and
    other parts are built top-down. In my ANSI-C compiler, I found that
    good error recovery required some finely tuned code with arbitrary
    token lookahead.

	'defineProc' is responsible for all of the work of parsing a
    procedure definition. This includes handling its parameter list, its
    result type, any local declarations, its body, and any work needed to
    help the code generation routines produce correct entry and exit code.
    When it is called (by the system interface code), the current token
    will be the 'proc' token. The first step then, is to skip that token.
    After the 'proc' should be an undeclared identifier - the name of the
    new procedure. Here we see another aspect of error recovery - that of
    preventing the compiler from crashing when given invalid input. Later
    portions of 'defineProc' modify the procedure's symbol table entry. If
    no identifier was given, we don't have a symbol table entry. Thus,
    either those places must all check for this case, or we should build a
    fake one to use. The latter option is used here.

	After the procedure name should be a left parenthesis. Next are the
    procedure's parameters. These are handled very similarly to the
    declaration of variables. The error recovery is slightly different yet
    again, and the symbol kind used here is 'sk_parameter'. We also
    restrict the parameters to be of type 'int', so that the symbol table
    entry for the procedure need contain only the number of parameters. In
    general, it would contain something like a linked list of parameters
    and their associated types. After any parameter (there may be none)
    should be the closing right parenthesis.

	It is easy to miss the very important calls to the code generator
    here. 'procStartParameters' is called before declaring any parameters;
    'procEndParameters' is called after the parameters but before any local
    variables; and 'procEndLocals' is called after the local variables.
    'declareVariable' is called for each parameter and local variable. This
    interface should be adequate for any target CPU, and could even allow
    the code generator to decide to keep some variables in registers.

	In Toy, procedure 'main' is special - it is called by the startup
    code to start the user's program. Thus, it has no parameters, and also
    returns no result. C's 'main' is more complex - it is passed a
    description of the command-line arguments to the program, and returns a
    status result. Draco handles this with library calls, so presumeably an
    expanded Toy would do the same. After giving the procedure symbol its
    proper kind and parameter type, we look for a procedure result type.
    That is also put into its symbol table entry. We then expect to see a
    colon, marking the end of the procedure header. After getting the local
    variables, we can call upon the code generation routines to get ready
    to generate code for the procedure's body. It is on this call (to
    'codeInit') that the procedure's entry code would be produced. This
    entry code must save any registers used by the procedure, allocate
    stack space for local variables, and make the local variables and
    parameters accessible to the body code.

	The part of defining a procedure that seems most important to a
    programmer, that of handling the body of the procedure, is surprisingly
    simple - it just calls on the statement parser to handle a list of
    semicolon separated statements. Error recovery here consists of cont-
    inually calling 'parseStatementList' until a 'corp' or the end of the
    input file is found. By convention, 'parseStatementList' returns the
    type of the last statement or expression in the list. Error checking
    here is making sure that the type returned matches what is expected
    based on the result type of the procedure.

	'procResult' is yet another special entry in the code generator
    which does whatever is necessary to make the last used expression the
    result of a current procedure. 'codeTerm' then does the procedure exit
    code and any other cleanup needed. 'symbolPurge' is then called to
    remove all local definitions from the symbol table. This will include
    the local variables and parameters of the procedure, but not the
    procedure itself. The procedure ends with a 'corp' and an optional
    semicolon.


    Expressions

	Listing 2 shows "expression.d", the Toy expression parser. In this
    file we will start with the last function and work backwards to the
    front. 'parseExpression' is the external interface to the expression
    parser. It is used by the statement parser in several places. It does
    nothing other than call 'parseComparison', which is the top level of
    the recursive descent expression parser. This level of indirection
    allows for the addition of more levels of expressions, such as boolean
    operators, which would have lower precedence than the comparison
    operators.

	'parseComparison' is typical of the expression handling routines.
    It first calls the next lower level of the expression parsing chain, in
    this case 'parseAddSub'. It then examines the next input token. While
    that token is a comparison operator, it loops, parsing and generating
    code for a comparison. When an appropriate operator is found, the
    previously obtained sub-expression, which is to be the left-hand
    operand for the operator, is checked for an appropriate type. The
    operator token is then skipped (skipping the operator after the type
    test makes an error message be associated with the left-hand operand,
    rather than with the operator). Next, the lower level routine,
    'parseAddSub', is then called to get the right-hand operand. That
    operand is also checked for appropriateness. For a more complex
    language, the check would have to compare the types of the left and
    right-hand operands for comparison compatibility. When all is well, the
    code generator is called on to generate a comparison ('compareValues').
    It is passed a saved copy of the operator token, so that it knows which
    type of comparison to use. Note that there are no other calls on the
    code generator - the operands are set up by the lower level parts of
    the expression parser. The result of the function, like all of these
    functions, is the type of the expression. If a comparison was done, the
    type is 'bool', otherwise the type is that of the subexpression that
    would have been the left-hand side if a comparison was present.

	The next routine (moving backwards), 'parseAddSub' is entirely
    analagous. Beyond is 'parseMulDiv', again quite similar. Note that
    these last two share the same code generation routine. Next comes
    'parseUnary', which is a little different - it uses an 'if' instead of
    a 'while', and calls itself recursively if it finds a unary operator.
    At this point it is useful to go through an example parse, to see how
    all of the cases fit together. We will parse the expression

	1 + 2 * -+3 - +++++4 < 5 / (6 + 7)

    When 'parseExpression' is called, the current token will be the '1'.
    The chain of calls will immediately go down through 'parseComparison',
    'parseAddSub', 'parseMulDiv', and 'parseUnary', landing in 'parseUnit',
    which will handle the number and return. The current token will then be
    the '+', which will allow the return to go back up to 'parsePlusMinus',
    which will enter its 'while' loop. It will check the type of what is
    now its left-hand operand, which is 'int', therefore OK. Skipping over
    the '+', it will dive back down the chain again. 'parseUnit' will again
    handle the number (2), and return back up, this time to 'parseMulDiv'.
    'parseMulDiv' will move past the '*' operator, and call 'parseUnary'
    for its right-hand operand. This time, 'parseUnary' sees one of the
    operators it is looking for (the unary '-'), so it goes into its larger
    alternative. The operator is saved and skipped, and 'parseUnary' is
    called recursively. This time it sees the unary '+ ', which is handled
    the same way. The next call to it sees no unary operator, so
    'parseUnit' is called to handle the number 3. We now return to the
    lower of two recursive 'parseUnary' calls and check the operand type.
    All is well, but nothing is done, since the operator was a unary '+',
    which is a do-nothing in Toy. So, we return to the upper of the
    recursive 'parseUnary' calls. This time the operator was unary '-', and
    we call the code generation routine 'unaryOp', giving it the '-' token.
    This is the first call to the code generator, other than calls done by
    'parseUnit', even though we have processed 7 expression tokens. Other
    operations are remembered in the local states of the parsing routines.
    The recursion in 'parseUnary' is done so that the code generation is
    done in the proper order. For unary operators, the execution is done
    right-to-left, whereas the operators are seen left-to-right. Using
    recursion allows us to hold an arbitrary number of them pending.

	The return from the uppermost 'parseUnary' call goes back to the
    second call of 'parseUnary' inside 'parseMulDiv'. After a type check,
    it will finally call 'binaryOp' to do the multiply of 2 and -3. It's
    'while' loop will now exit, and we return to 'parseAddSub', which will
    be in the same situation - it will call 'binaryOp' to do the add. The
    'while' loop is again satisfied by the '-', so we go down for another
    operand. Note what is happening here - the binary operators are being
    evaluated left-to-right, as they should. 'parseComparison' does the
    same thing, even though, e.g. "a < b ~= c", is not valid. This is done
    for error recovery purposes.

	Continuing with our parse, 'parseUnary' will recurse its way
    through the string of unary '+'s, doing nothing with them. After the
    usual handling of the '4', we return back to 'parseAddSub' which can
    complete the handling of the subtraction. We finally return to 'parse-
    Comparison', which sees a comparison operator and dives back down for
    its right-hand operand. The difference here is the parenthesized sub-
    expression. We get down to 'parseUnit' as usual, but here something
    special happens. It sees the '(', skips it and then calls 'parse-
    Expression', which is the top of the chain. This call is the reason
    for the "recursive" in the name "recursive descent". Parsing of the
    subexpression "6 + 7" is standard until we come to the ')'. It matches
    none of the operator tests, so the calls return all the way out of
    'parseExpression', and we end up back in 'parseUnit'. Now we check for
    and skip the ')', and return, thus treating the parenthesized sub-
    expression just like any other bottom-level unit.

	Some observations about the "recursive descent" parsing method:

	    - no temporary storage is needed - everything is kept in local
		variables in the parsing routines
	    - there is no limit to the size or complexity of the expression
		parsed, other than the size of the execution stack for the
		compiler
	    - operator precedence or priority is determined by the position
		of the handler for that operator in the chain of descending
		routines. Here the precedence goes

		    unary >> '*'/'/' >> '+'/'-' >> comparison

	    - new operators are added by augmenting an existing routine if
		they are at a previously used precedence, or by inserting a
		new routine in the call chain
	    - although Toy doesn't have any, postfix operators are added
		much like prefix (unary) operators

	We have seen most of 'parseUnit' except for a couple of details.
    The default in its if-else chain is where the compiler ends up if it
    gets confused. This case should consume at least one token, unless we
    are sure that we won't get back here without somebody else consuming
    one. Handling of identifiers is a bit special. When we see one at the
    beginning of a statement or expression, we don't know if it is being
    used as a value, or we have an assignment statement or a procedure
    call. Thus, we must skip the identifier and make that decision based on
    the next token. Even then, a procedure call could be a void procedure,
    which is a statement, or a valued procedure, which is part of an
    expression. Thus, the compiler treats statements and expressions much
    the same, except that statements must be of type 'void'. 'parseUnit'
    uses a 'pushXXX' code generation call to tell the code generator about
    new bottom-level operands. The model is that of a stack machine - push
    an operand, push another operand, perform the operation. It is entirely
    up to the code generator as to just how it implements this, however.
    'checkUndef' is a utility routine which complains if the identifier is
    not defined, and sets its type to 'rt_error'.

	'parseProcCall' should by now be fairly straightforward. It does
    some checking to avoid excess error messages, and just calls
    'pushParameter' for each parameter to the routine being called. Note
    that the parameters would already be "stacked", but this call is
    telling the code generator that they are to be used as parameters.
    Finally, 'callProc' is called to generate the actual call (which would
    be a 'jsr' or 'bsr' for a 68000 CPU).

	A useful feature of many compilers is that of constant expression
    evaluation, also called constant folding. Here, any operator for which
    both operands are constants is performed directly by the compiler,
    instead of generating code to do the operation. It is of course done
    iteratively/recursively so that e.g., the above expression would end up
    as just "-9 < 0" or "true". This evaluation could be added to Toy as
    part of the parser, or as part of the code generator. In Draco, it is
    done during parsing - each routine of the chain returns not only the
    type of the expression, but also a description of where it is, which
    can be "a constant whose value is XXX". The parsing routines thus check
    directly for two constant operands, and do the operation if needed.
    Doing it that way unfortunately blurs the separation between the parser
    and the code generator. A better way might be to insert a layer between
    the parser and the code generator which is just responsible for
    constant folding.


    Statements

	Listing 3 shows "statement.d", the statement parser for Toy. After
    going through the expression parsing, most of it should be relatively
    straightforward. Once again, it is better to start at the end of the
    file and work back to the front. Recall that a procedure body is done
    by calling 'parseStatementList', which returns the type of the last
    statement/expression in the semi-colon separated list. A semicolon
    after anything means that that thing must have been a statement. Toy,
    like Draco, will not accept an expression as a statement.

	'parseStatementList' is a simple loop which calls 'parseStatement'
    repeatedly, until it encounters a token which indicates the end of a
    statement list. The condition here is a negative one - "loop until you
    find one of these", whereas the loops in the expression parser were
    positive ones - "loop while you see one of these". This choice is made
    for error recovery reasons. 'parseStatement' simply calls one of the
    handlers of specific statement types. Recall that assignment statements
    are handled by calling 'parseAssignmentStatement' from inside the
    bottom level of the expression parser.

	An assignment statement in Toy consists of a variable name, ':=',
    and an expression. 'parseAssignmentStatement' calls 'checkAssign' to
    verify that the given symbol is a variable that can be assigned to.
    'popVariable' tells the code generator to store the latest expression
    value into the indicated variable. The error checking done here simply
    insures that the expression has the same type as the variable. No error
    message is given if either is 'rt_error', which indicates that there
    has been an error previously, and so we don't want an extra one now. In
    a more complex language, assignment compatibility would be considerably
    more complex.

	A 'write' statement in Toy consists of 'write', '(', a list of
    expressions or string constants, and ')'. This is in fact the only
    place in the compiler where the value of a string constant is actually
    used. 'isStatement' is a utility routine which returns 'true' if the
    current token can start a statement. 'parseWriteStatement' loops
    through strings or expressions, calling special code generation
    routines to arrange for them to be printed. Here we see the first use
    of the type 'RelocCode_t', to tell 'callSpecial' which of the special
    run-time routines is to be called.

	'parseReadlnStatement' is very similar, except that reads can only
    be done into assignable variables. The 'checkUndef', needed here, is
    not needed in 'parseAssignmentStatement' since the check was already
    done in the expression parser. The semantics is that a new input line
    is read ("callSpecial(rc_readLine)"), and then the required number of
    integers are taken from that line ("callSpecial(rc_readInt)").

	'parseWhileStatement' is the first routine we have seen which needs
    to generate branches in the output code. The code generator interface
    to do this consists of four routines. 'getPos' returns a value which
    can be used as the target of a later backwards branch. 'branchTo'
    generates an unconditional backwards branch to a previously determined
    target. 'branchForward' generates a conditional or unconditional branch
    to some forward location. It returns an identifier that can be used to
    fix up the branch when the target is known. 'patchBranch' patches such
    a branch to branch to the current code position. This simple interface
    is adequate for Toy, but may not allow for the generation of good code.
    The positions and identifiers used in Toy are actually just offsets in
    the code for the current procedure.

	Given the above, 'parseWhileStatement' is quite simple - skip past
    the 'while', get a statement list for the condition, skip the 'do', get
    a statement list for the body, and skip the 'od'. An unconditional
    branch is generated to go from the end of the body to the beginning of
    the condition statement list, and a conditional branch, which branches
    on a 'false' condition, is made from the end of the condition to just
    past the looping branch. If one wanted to cripple the 'while' statement
    in Toy to requiring just a simple expression as the condition (thus
    disallowing center and end-exit loops), just replace the first call to
    'parseStatementList' with a call to 'parseExpression'.

	'parseIfStatement' is more complicated. It allows the use of the
    "elif" form in "if" statements. This means that it must keep track of a
    number of places where the ends of the alternatives have to all branch
    to after the entire "if statement". In Draco, these are all chained
    together in the code buffer (which also helps the forward-branch-
    shortening code to work), but I wanted a cleaner interface in Toy, so
    a simple fixed-size array is used. This limits the Toy compiler to
    handling 'ELIF_MAX' "elif"s in a single "if" statement.

	The generated code for an "if" statement with "elif"s looks
    something like this:

		    +--------------+--------------+------+
		    ^		   ^		  ^	 |
		    |		   |		  |	 v
	-> C -> A ->+ +-> C -> A ->+ +-> C -> A ->+ A -> next
	   |	      ^   |	     ^	 |	    ^
	   v	      |   v	     |	 v	    |
	   +----------+   +----------+	 +----------+

    where the 'C's are conditions, the 'A's are alternatives, and 'next'
    is the statement that comes after the "if" statement. Each condition
    either drops into its alternative (the condition is 'true'), or
    branches to the next condition. The last condition branches to the
    "else" part if there is one. Each alternative branches to the next
    statement. Array 'offsets' contains the positions of each of the
    forward branches from the ends of alteratives to the next statement.
    The other branches are maintained by simple state variables, since
    there is only one of them active at a time.

	The loop in 'parseIfStatement' loops over all of the conditional
    tests and alternatives. The first is an "if", and the remainder (if
    any) are "elif"s. The loop is a bottom-exit loop since we want to stop
    looping when we have fully processed an alternative, and the next token
    is not "elif". The loop first skips the "if"/"elif". On all except the
    first time around, it then handles the termination of the previous
    alternative. This involves generating an unconditional forward branch
    (to the next statement), and patching the previous condition's branch
    to come "here". Next, it gets the expression to be used as the
    condition for this alternative. This call to 'parseExpression' could
    be replaced with a call to 'parseStatementList', but that could lead
    to some very ugly Toy code. Next comes a check for the "then" token,
    followed by a call to 'parseStatementList' for the body of the
    alternative. Finally we have the loop condition, along with some error
    checking. After the loop, we allow for a single "else" part, which is
    very similar to the previous alternatives, except that it doesn't have
    a condition. After any "else", all that is left is patching all of the
    end-of-alternative branches to come "here".

	The interested reader might want to trace through the parsing of
    the example given in the first article, and reproduced here:

	int N, Sum;

	proc main()void:

	    Sum := 0;
	    readln(N);
	    while N ~= 0 do
		Sum := Sum + N;
		readln(N);
	    od;
	    write("The sum is: ", Sum, "\n");
	corp;

    What sequence of calls to the code generator is produced?


    Miscellaneous

	Not listed here, but included on disk, is file "parseUtil.d", which
    includes some utilities used during parsing. 'checkUndef' complains
    about an undeclared identifier, and sets it up as 'sk_undefined', with
    type 'rt_error'. 'checkAssign' complains if an identifier isn't a local
    or global variable or a parameter. 'isStatement' returns 'true' if the
    current token can appear after a semicolon, i.e. is one that can start
    a new statement or is a separating token in a compound statement like
    "if" or "while". 'findStatement' skips tokens until it finds such a
    token. It is used in error recovery. 'findClose', another error
    recovery routine, skips tokens until it finds a right parenthesis, a
    comma, or something that can start a statement.


    Adding New Language Constructs

	Since it is the parser which defines the language compiled, it is
    in the parser that changes are needed to add new language constructs.
    Adding new types, such as arrays, structures, pointers, etc. would be
    done either in "decl.d" or in a new file which just parses types. As an
    example, parsing a Draco-style array type consists of something like:

	skip the '['
	dimensions := 0;
	while
	    parseExpression();
	    verify that expression is a compile-time integral constant
	    dimensionTable[dimensions] := expression value;
	    dimensions := dimensions + 1;
	    token = ','
	do
	    skip the ','
	od;
	verify and skip a ']'
	call 'parseType' to get the element type

    As was mentioned previously, adding compound types to Toy would require
    the addition of type structures, better type checking, etc. Also
    required for array types would be array indexing in the expression
    parser. This would also have to be available for the left-hand-side of
    assignment statements. Also needed are the two new tokens, '[' and ']'.

	A new construct, such as a 'case' statement, would be done by
    adding a new routine to 'statement.d', along with a test for "case" and
    a call to the routine in 'parseStatement'. A first decision here would
    be what the code for a case statement looks like. Most compilers use
    two (or more) variants. If the case indexes are closely spaced, a table
    of code offsets is generated, and the code produced uses the case
    indexing expression value to index into that table to yield an offset
    to branch to. This may require setting up a "default" case. If the case
    indexes are sparse, some sort of lookup is needed. Lattice C uses a
    linear search done directly inline. Draco calls a run-time-system
    routine to do a binary search of a sorted table. Other compilers might
    code a binary decision tree directly. Whatever methods are used, it is
    likely that additional entry points into the code generator would be
    needed. And of course, additional token types would be needed as well.


    Next Month

	Next month's article will discuss the code generator. We have seen
    most of its entry points in this article, but nothing of just what it
    does. We will go into the details of storage allocation and the
    construction of MC68000 instructions. Some trivial optimizations will
    be shown, and more details of the system interface will be discussed.