Toy - A Small Compiler for the Amiga


	To most people, compilers are black boxes. They read an input text
    file containing source and magically write out an object file
    containing equivalent machine instructions. In actuality, there is
    nothing magical about a compiler. A compiler doesn't even have to be
    very complex or very large. Since programmers are quite demanding,
    however, the compiler writer (being a demanding programmer), is not
    likely to be satisfied with a compiler that is less than perfect. Thus,
    the typical compiler soons ceases to be simple and small. Such minor
    details as complex data types, good code, good error recovery, etc.,
    can add thousands of lines to a compiler. Such an embellished compiler
    is likely not fully understood by its author, let alone anyone else.
    So, when people have requested the source to my Draco compiler, so they
    can learn from it and add some stuff they want, I have been somewhat
    reluctant. To be honest, it is a mess inside. This situation led me to
    write a new compiler, one designed to be simple and understandable, but
    nevertheless a real compiler for the Amiga.

	The result of this is the Toy compiler, which I will describe in
    this series of articles. It is quite simple in operation, and uses the
    same techniques that the Draco compiler itself uses. It is written in
    Draco, which is freely available, so people can tinker with it without
    having to buy a commercial compiler. In spite of its simplicity, it is
    a true compiler. It goes from a source file to an executable object
    file containing MC68000 instructions. It does not use an interpreter,
    or threaded code, or any other such technique for execution - it emits
    straightforward (and not overly efficient) instructions to accomplish
    the steps specified by the input source program statements.

	In this first article of the series, I will describe the language
    that is being compiled, giving some example programs written in Toy.
    I will also outline the general structure of compilers, and the Toy
    compiler in particular. Techniques used in commercial and experimental
    compilers will be mentioned, as will many of the problems that compiler
    writers must contend with, and the decisions they must make.


    The Toy Language

	One of the first decisions to be made when designing a compiler is
    that of deciding what language is to be compiled. To keep the compiler
    simple, we want a simple language. This pretty well rules out a
    complete version of an existing language. Most Amiga programmers would
    prefer a C - like language, since that is what they use. I don't feel
    that is a good choice for a number of reasons. C is normally used with
    a preprocessor (either a separate program or built-in to the compiler),
    and having one would increase the complexity of the project. C's syntax
    is ambiguous (e.g. "fred();" is either a declaration or a call,
    depending on where you see it), thus complicating any error recovery.
    C's inside-out declarations are a pain to handle (it took me a couple
    of tries in the C compiler I did at work). Lastly, C's automatic
    declarations and unusual ideas about scope (anything declared as
    'extern' is global, regardless of where you declare it) add unneeded
    hazards. I didn't see any gain in inventing a whole new language,
    however, so I went with what I am most familiar with - Draco. The Toy
    language is a proper subset of Draco. All of the Toy programs I have
    written compile and run just fine under the Draco compiler.

	For those not familiar with Draco, I will describe the Toy language
    here, using a simple grammar and some examples. A Toy program consists
    of any number of procedures, which may or may not return a result. Each
    procedure can have named, typed parameters. Variables global to the
    file and local to procedures exist, and have similar types. Executable
    statements include the assignment statement, the 'if' statement, the
    'while' statement, and some simple input/output statements. Operators
    usable in expressions include '+', '-', '*' and '/'. Available in
    conditions are the standard comparison operators. In the following
    grammar, the '::=' symbol means that the term on the left can be any of
    the '|' - separated sequences on the right. Items in apostrophes must
    appear exactly as given. Items in angle brackets ('<' and '>') must be
    replaced by one of their right-hand-sides in order to make an
    acceptable program. A following '*' means that the term can appear any
    number of times (including none).

	<program> ::= <variables>* <proc>*
	<variables> ::= <type> <id-list> ';'
	<empty> ::=
	<type> ::= 'int' | 'bool'
	<id-list> ::= <id> | <id-list> ',' <id>
	<id> ::= Any sequence of letters, digits and underscores, starting
		 with a letter or an underscore
	<proc> ::= 'proc' <id> '(' <parameters> ')' <result-type> ':'
		    <variables>* <statement-list> <proc-result> 'corp' ';'
	<parameters> ::= <empty> | <par-seqs>
	<par-seqs> ::= <par-seq> | <par-seqs> ';' <par-seq>
	<par-seq> ::= 'int' <id-list>
	<result-type> ::= <type> | 'void'
	<proc-result> ::= <empty> | <expression>
	<statement-list> ::= <statement> | <statement-list> ';' <statement>
	<statement> ::= <if-statement> | <while-statement> |
			<readln-statement> | <write-statement> |
			<assignment-statement> | <empty>
	<if-statement> ::= 'if' <expression> 'then' <statement-list>
			   <elif>* <else> 'fi'
	<elif> ::= 'elif' <expression> 'then' <statement-list>
	<else> ::= <empty> | 'else' <statement-list>
	<while-statement> ::= 'while' <expression> 'do' <statement-list> 'od'
	<readln-statement> ::= 'readln' '(' <id-list> ')'
	<write-statement> ::= 'write' '(' <write-list> ')'
	<write-list> ::= <write-unit> | <write-list> ',' <write-unit>
	<write-unit> ::= <string> | <expression>
	<assignment-statement> ::= <id> ':=' <expression>
	<expression> ::= <comparison>
	<comparison> ::= <add-sub> | <add-sub> <comp-op> <add-sub>
	<comp-op> ::= '=' | '~=' | '<' | '<=' | '>' | '>='
	<add-sub> ::= <mul-div> | <mul-div> <add-sub-op> <mul-div>
	<add-sub-op> ::= '+' | '-'
	<mul-div> ::= <unary> | <unary> <mul-div-op> <unary>
	<mul-div-op> ::= '*' | '/'
	<unary> ::= <unit> | '-' <unary> | '+' <unary>
	<unit> ::= <id> | <proc-call> | <number> | '(' <expression> ')' |
		   'true' | 'false'
	<proc-call> ::= <id> '(' <expr-list> ')'
	<expr-list> ::= <expression> | <expr-list> ',' <expression>
	<number> ::= Any sequence of decimal digits

    Many parts of the Toy compiler's parser will follow this grammar quite
    closely. Note in particular that the definition of <expression>
    eventually cycles around to mentioning <expression> on the right hand
    side. This recursive definition is echoed in the recursive descent
    parser used in Toy.

	The compiler places many limitations on acceptable programs which
    are not described by the grammar. Types of grammars exist which can
    contain all of the semantic information needed, but building a correct
    one of those is more work than building a compiler, so I have not done
    so. Those interested in such things should read the Algol68 report.
    (I'm being sarcastic here - the report is, in my opinion, unreadable
    without a great deal of study.) There are two main types of limita-
    tions. The first type are due to the specific implementation of the
    compiler. For example, each 'if' statement is limited, by an array in
    the parsing code, to 10 'elif' parts. This could be handled by using a
    dynamically allocated linked list, but is not a problem in a toy
    compiler. The other type of limitation imposed by the compiler is that
    of semantic correctness - the compiler will reject things that don't
    make sense, even if they are correct according to the above grammar.
    E.g.

	    bool flag;
	
	    flag := 6;

    is flagged as an error, even though it fits the grammar just fine. This
    is because Toy is a strongly typed language. The compiler also checks
    such things as parameter counts and types on procedure calls.

	The alert reader will already have noticed that Toy isn't a very
    impressive language. You cannot write the world's best Amiga program
    using it. It is pretty difficult to write ANYTHING useful in it. It has
    only two data types - 'int' and 'bool'. It has no arrays, no
    structures, no pointers, no named constants, and a pretty wimpy I/O
    system. Adding any of these things would have significantly increased
    the size and complexity of the Toy compiler, making it less suitable
    as a tutorial compiler. Interested programmers can add many features to
    Toy themselves. I will be giving hints on how to go about this as I
    discuss the various portions of the compiler.

	In the following discussions of the Toy language and compiler, I
    will assume that the reader is familiar with at least one programming
    language, and will be able to understand the general outline of what
    the various programs are doing without me explaining what all of the
    various language constructs mean. A detailed understanding is not
    necessary to get the general outline of how the compiler works.


    Sample Programs

	A tradition I've held on to for my compilers is that the first
    program I try to write is one that prints out "Hello there world!".
    Here is the Toy version:

	proc main()void:

	    write("Hello there world!\n");
	corp;

    As the grammar indicates, a Toy program can have a series of procedure
    bodies. This program has one procedure, called 'main', which has no
    parameters (there is nothing between the parentheses after the
    procedure name), and returns no result - the special result-word 'void'
    means that no result is returned. 'write', in this case, causes the
    given string to be written out. The pair of characters '\n' represent
    the newline character, and indicate that the output line should be
    ended at that point. 'corp' is simply 'proc' spelled backwards -
    program constructs in Toy are usually ended by a backwards word which
    matches a beginning word for that construct. This style is strongly
    disliked by several vocal people. I first thought that way, but changed
    my mind after using it for a while. It makes it quite clear what
    construct is being closed, something which C's braces or Pascal's BEGIN
    - END pairs fail to do.

	My second program is traditionally a recursive solution to the
    "Towers of Hanoi" problem. In the unlikely event that you haven't heard
    of it, I'll describe it briefly. The problem set-up consists of three
    vertical pegs, on which are placed several round disks. The disks all
    start on the first peg, and must be moved to the third peg, with the
    second peg available for intermediate operations. The disks are of
    varying sizes, and start out in order, with the largest on the bottom,
    and the smallest on the top. They must end up the same way on the third
    peg. The only valid move is to move a single disk from one peg to
    another. At no time may a disk rest on another which is smaller. The
    story says that there are some priests on a mountaintop who started
    doing the problem when the world was created, and when they are done,
    the world will end. It turns out that the traditional 64 disks would
    take so long to move according to the rules, that we need have no fear
    of the world ending.

	The standard solution is a recursive one consisting of three steps,
    dealing with N disks:

	1) Recursively using this method, move the top N - 1 disks from the
	    starting peg to the spare peg.
	2) Move the N'th disk from its starting peg to its target peg.
	3) Recursively using this method, move the top N - 1 disks from the
	    spare peg to the target peg.

    There is a special case - if N = 0, don't do anything. Those not
    familiar with recursion may want to try this out using stacks of small
    pieces of paper. Remember that the meanings of 'start peg', 'spare peg'
    and 'target peg' change as you recurse down.

	Listing 1 shows the Toy implementation of this solution to the
    puzzle. I don't recommend that you try it with 64 disks - your Amiga
    will turn to dust long before it is done! Four or five disks makes a
    nice test. This program is already showing some weaknesses in Toy - a
    separate routine 'printPeg' is used to print out the name of a peg.
    Proper languages would allow string variables, which could be printed
    out directly. Procedure 'hanoi' implements the above solution. Moving a
    disk consists of printing out a message saying where to move it.
    Procedure 'main' (all Toy programs must have a procedure called 'main'
    - this is where execution starts) asks for and reads in the count of
    disks to be used, then does the top level call to 'hanoi'.

	Listing 2 shows 'shoot.t', a program which actually manages to
    implement a little game in Toy. It uses the cursor and colour control
    of the Amiga's console device to provide very primitive graphics and
    animation. The program draws a simple scene consisting of the ground, a
    "gun" (a square box sitting on the ground) and a "target" (a different-
    coloured diamond shape, also sitting on the ground). Shells are shot
    from the gun at a forty-five degree angle at a speed specified by the
    player. Their course is shown on the screen and the game is over when
    one hits the target dead on. The game maintains a score which is the
    number of shells shot, and maintains a message area for things like
    error comments.

	"shoot" is sufficiently complicated that I added some comments. For
    the non-C and non-Draco programmers out there, comments in Toy (not
    mentioned at all in the above grammar!) consist of any text between an
    opening /* and a closing */. The program starts off by declaring a
    bunch of global variables. I have capitalized their names to make them
    look like constants, but the Toy compiler takes no notice of that and
    would be quite happy to let me change their values. They are setup once
    in the main program, then not changed throughout the game. Following
    some global variables, the code starts. 'random' is the first Toy
    procedure we've seen that returns a result. The result type of the
    procedure is specified as 'int', by the 'int' after the parameters in
    the procedure header. The actual value returned is given by the
    expression before the 'corp'. This point also is not mentioned by the
    grammar. The 7 procedures after 'random' send escape sequences to the
    console window to accomplish various actions, such as moving the
    cursor, specifying boldface or normal type, and changing the fore-
    ground colour of the text. Consult the Amiga technical manuals for
    details on these escape sequences.

	Next are several routines which use these special control routines
    to do higher-level game-related actions, such as drawing the ground,
    gun and target. 'hitTarget' is one of the largest procedures in the
    program - it plots the course of the thrown shell, drawing it as an
    asterisk, and checking to see if it goes off the edge of the screen,
    hits the target, or just hits the ground. A 'BOOOM' is left wherever
    the shell hits the ground. Note the final 'if' statement in the
    routine - in a more complete language, the condition would have used a
    logical 'and' operation, rather than an extra 'if' statement.

	The final routine, 'main' is, as usual, where execution starts.
    Since Toy has no way to pre-declare procedures, 'main' must be the last
    procedure in the source file, since every other procedure is either
    called by 'main' or by something that 'main' calls, etc. Note that this
    restriction means that mutually recursive procedures cannot be used in
    Toy. The program requests the screen size so that it knows how to do
    the cursor addressing for correct positioning. The size should be 23
    for a normal non-interlaced screen, and 48 for an interlaced one. The
    console device can tell the program how big the window is, but since
    Toy can only read integer values, it would not be possible to
    understand the string that the console device returns.


    General Outline of the Toy Compiler

	The Toy compiler is completely stand-alone. It does not need any
    other programs or libraries to do its job. In particular, it does not
    make use of any compiler-construction tools, such as lexer generators
    or parser generators. The parser generator Yacc (Yet Another Compiler-
    Compiler) has been discussed by Eric Giguere in his series of articles
    in "Transactor for the Amiga". Yacc and its relatives are often used on
    UNIX systems to generate parsers of various kinds. Along with it, UNIX
    offers 'Lex', a program to generate lexical scanners. These, and other
    similar tools, can simplify some aspects of compiler writing, but I
    prefer the flexibility of custom code, so have not used any for either
    Draco or Toy.

	Like any other program which reads input, writes output, parses
    command lines and prints error messages, a compiler requires an
    interface to the operating system. In the case of a compiler, the
    interface is made more complex by the strict structure required of the
    output machine-code file. In the case of AmigaDOS, the structure of an
    executable file is described in the "AmigaDOS Technical Reference
    Manual". Note that there are two variants of an object file. The first
    variant, normally produced by compilers and assemblers, is an unlinked
    one. This form is not directly executable by the operating system since
    it makes references to support and library routines which are provided
    by the compiler libraries and/or operating system. Such a file is
    linked with similar files containing libraries and interfaces to the
    operating system to produce a final, fully linked object file. This
    latter type of file IS directly executable by the operating system. I
    chose to keep Toy as simple and complete as possible, so it directly
    produces the final linked version. This means that the user does not
    need access to a linker (e.g. either ALink or BLink) in order to
    compile and run Toy programs. This implies that the Toy compiler itself
    must supply any support routines that are needed. For Toy, these
    consist of the ability to read and write integers and to write strings.
    Note that the two Toy source files dealing with the system interface
    and generation of the run-time routines are the only ones which are
    system dependent. The remainder of the compiler does not care where it
    is running, or where its output is going - it is only concerned with
    the input source and the generated MC68000 instructions.

	The system interface supplies input data in a raw form - as a
    sequence of ASCII characters. The first step in compilation is to break
    this sequence up into larger, more meaningful pieces. These pieces are
    such things as identifiers, reserved words such as "proc", "if", etc.,
    numbers, punctuation marks, string constants, etc. This process is
    called "lexical analysis", "lexical scanning" or "tokenization". The
    output from this phase of a compiler is a sequence of "tokens" which
    represent the input program at a higher level than that of a sequence
    of characters. The tokenizer is also responsible for skipping over
    comments, blank lines, spaces, etc. Many compilers do not actually keep
    the tokenized form of the program around - they simply call a routine
    to get the next token when it is needed, and then discard it when they
    go on to the one after that. Toy uses this technique. There are not
    many error messages associated with tokenization - about the only thing
    that can be complained about are characters not in the set of
    characters that the language uses.

	For a C compiler, the sequence of events described here is somewhat
    different. First, the input text is very roughly tokenized - just
    enough for the preprocessor to spot its directives. The resulting
    "preprocessor tokens" are handled by the preprocessor to do macro
    replacement and conditional compilation. The output from the
    preprocessor is another text stream. The compiler proper must then
    tokenize this stream yet again for its own use. It is almost possible
    to do away with the double tokenization, but the strict definition of
    the C language does not really allow this. I tried it for the ANSI C
    compiler I wrote at work, and it is that area of the compiler that most
    needs redoing.

	The sequence of tokens generated by the tokenizer must be parsed
    according to the grammar for the language. In most compilers, the
    parser also issues syntax-related error messages, attempts error
    recovery, and often does semantic (meaning) checking as well. These
    activities are all done by the Toy parsing routines. Parsing the input
    consists of identifying the grammar rules which fit the input, and
    interpreting that input according to those rules. E.g. the only grammar
    rule that the sequence 'if a < b then write("Hello\n"); fi;' fits is
    that for an 'if' statement. The condition <expression> is the
    subsequence 'a < b', which itself breaks down through the expression
    grammar into <id> <comp-op> <id>. There are no <elifs>, and the body
    <statement-list> has only one <statement>, which is a <write-
    statement>.

	At this point the compiler writer has another option. Should an
    internal data structure be built which describes the parsed program?
    Such an internal form can make certain kinds of optimizations easier.
    For example, common sub-expression analysis, which attempts to avoid
    recalculating an expression whose value was just recently computed,
    usually operates by scanning such an internal structure (usually in a
    tree-form) for similar sub-trees. Also, such a representation can be
    used to isolate the target CPU dependencies and the input language
    dependencies into separate sections of the compiler. Newer compiler
    systems are able to use this technique to provide a single "back-end"
    (optimization and code generation) to several different "front-ends"
    (tokenization and parsing), thus drastically reducing the work needed
    to create an entire family of compilers. Lattice C appears to use a
    technique like this - the "quad" files are a quad (four element)
    representation of the parsed program. This theoretically allows them to
    use the same "front-end" for the Amiga compiler as, say, for the MSDOS
    compiler. The Toy compiler, which is designed to be small and easy to
    understand, does not do this. Instead, the parsing code makes direct
    calls to code-generation routines, which generate the code for the
    constructs being parsed. This, unfortunately, prevents a clean
    separation between the parsing and the code generation.

	As you would expect, the Toy compiler has little in the way of
    optimization. What it does have is there mainly for illustrative
    purposes and is simply some checks for special cases which can easily
    be done better.

	The final stage of compilation is that of code generation. This
    stage typically does a lot of bit fiddling to construct the target
    machine instructions. This is where the compiler writer curses the CPU
    designers for not making the machine any where near as orthogonal as
    they claim to have. In a global sense, the MC68000 is reasonably
    orthogonal - the form and coding of addressing modes is consistent, and
    the positioning of most instruction fields is consistent. There are a
    number of nuisances, however. A minor, but easily spotted, example is
    the encoding of the operand length. This is sometimes in the form of a
    two-bit field, but other times in the interpretation of a three-bit
    field. The MOVE instruction uses a two-bit field, but uses a different
    encoding than all of the other uses. The EOR instruction appears to
    follow the pattern of the ADD, SUB, AND, etc. instructions, but it is
    actually missing many of the modes - the encodings for those are used
    for other, totally different instructions. One of the winners on the
    MC68000 has to be the shift/rotate instruction. There are two forms of
    it, one operating on registers, and one operating on memory operands.
    The memory operand form is distinguished by a size field of binary 11.
    The register form has a bit which specifies whether another field is an
    explicit shift count or is the number of a D-reg which contains the
    shift count. Another field, in different places in the two forms, says
    what kind of shift is being done - logical, arithmetic, rotate, or
    rotate with extend.

	The code generation stage of a compiler thus has to contend with
    whatever idiosyncrasies the target CPU has. It also has to contend with
    the limitations of the CPU. These include such things as the number of
    registers available, the special-purpose use of some registers (the
    dreaded Intel 8086 family is notorious for this), and the size of
    various instruction fields (e.g. a 16 bit signed displacement on some
    modes, but an 8 bit signed displacement on others).

	Again, there are several approaches to code generation. Many of the
    newer compilers are table driven - they have a table of the
    instructions and forms that the target CPU has, along with some kind of
    encoded description of what they do. The compiler then attempts to
    match the requirements of the structure it is trying to generate code
    for against the table of instructions, to find one that matches. For
    any but the most simple structures, there will be no match, so the
    compiler will resort to heuristics like loading operands into
    registers, then trying again. For the simpler cases, there might be
    more than one match, so the compiler tries to decide which is
    "cheaper". Once more, the Toy compiler doesn't do any of this - it has
    simple, special purpose code for each thing it tries do. The form of
    these routines will be discussed in full in the article on Toy's code
    generation.


    The Final Example

	As a final example of what the Toy compiler does, I will show the
    stages of compilation for a small program. This will be referenced by
    the later articles in order to illustrate details of the compilation
    process. Here is the initial program source:

	int N, Sum;

	proc main()void:

	    Sum := 0;
	    readln(N);
	    while N ~= 0 do
		Sum := Sum + N;
		readln(N);
	    od;
	    write("The sum is: ", Sum, "\n");
	corp;

    The tokenizer reduces the program to the following sequence of tokens:

	'int' <id "N"> ',' <id "Sum"> ';' 'proc' <id "main"> '(' ')' 'void'
	':' <id "Sum"> ':=' <number 0> ';' 'readln' '(' <id "N"> ')' ';'
	'while' <id "N"> '~=' <number 0> 'do' <id "Sum"> ':=' <id "Sum">
	'+' <id "N"> ';' 'readln' '(' <id "N"> ')' ';' 'od' ';' 'write' '('
	<string "The sum is: "> ',' <id "Sum"> ',' <string "\n"> ')' ';'
	'corp' ';'

    Note that the tokenizer has totally discarded all of the pretty spacing
    and indentation. There are a very large number of intermediate parsings
    that could be shown. One somewhere in the middle is:

	<type> <id-list> ';'
	'proc' <id> '(' <empty> ')' <result-type> ':'
	    <empty>
	
	    <assignment-statement> ';'
	    <readln-statement> ';'
	    'while' <exression> 'do'
		<assignment-statement> ';'
		<readln-statement> ';' <empty>
	    'od' ';'
	    <write-statement> ';' <empty>
	'corp' ';'

    Note that two <empty>'s are needed to allow a semicolon after the last
    statement in a <statement-sequence> (before the 'od' and 'corp'). The
    assembler code generated for this program is:

	0000:	link.w	  fp,#0
	0004:	movem.l   <d1,d2,d3,d4,d5,d6,d7>,-(sp)
	0008:	clr.w	  d7
	000a:	move.w	  d7,Sum
	0010:	jsr	  toy_readLine
	0016:	jsr	  toy_readInt
	001c:	move.w	  d0,d7
	001e:	move.w	  d7,N
	    LOOP:
	0024:	move.w	  N,d7
	002a:	clr.w	  d6
	002c:	cmp.w	  d6,d7
	002e:	sne	  d7
	0030:	andi.w	  #1,d7
	0034:	tst.w	  d7
	0036:	beq	  LOOPEXIT
	003a:	move.w	  Sum,d7
	0040:	move.w	  N,d6
	0046:	add.w	  d6,d7
	0048:	move.w	  d7,Sum
	004e:	jsr	  toy_readLine
	0054:	jsr	  toy_readInt
	005a:	move.w	  d0,d7
	005c:	move.w	  d7,N
	0062:	bra	  LOOP
	    LOOPEXIT:
	0064:	jsr	  16(pc)
	0068:	"The sum is: "
	0076:	jsr	  toy_printString
	007c:	move.w	  Sum,d7
	0082:	move.w	  d7,-(sp)
	0084:	jsr	  toy_printInt
	008a:	jsr	  4(pc)
	008e:	"\n"
	0090:	jsr	  toy_printString
	0096:	movem.l   (sp)+,<d7,d6,d5,d4,d3,d2,d1>
	009a:	unlk	  fp
	009c:	rts

    Finally, the AmigaDOS object module hunks produced are:

	hunk_code:
	0000: 4e56 0000 48e7 7f00 4247 33c7 0000 0116 *NV..H...BG3.....*
	0010: 4eb9 0000 0000 4eb9 0000 0000 3e00 33c7 *N.....N.....>.3.*
	0020: 0000 0114 3e39 0000 0114 4246 be46 56c7 *....>9....BF.FV.*
	0030: 0247 0001 4a47 6700 002c 3e39 0000 0116 *.G..JGg..,>9....*
	0040: 3c39 0000 0114 de46 33c7 0000 0116 4eb9 *<9.....F3.....N.*
	0050: 0000 0000 4eb9 0000 0000 3e00 33c7 0000 *....N.....>.3...*
	0060: 0114 60c0 4eba 0010 5468 6520 7375 6d20 *..`.N...The sum *
	0070: 6973 3a20 0000 4eb9 0000 0000 3e39 0000 *is: ..N.....>9..*
	0080: 0116 3f07 4eb9 0000 0000 4eba 0004 0a00 *..?.N.....N.....*
	0090: 4eb9 0000 0000 4cdf 00fe 4e5e 4e75 0000 *N.....L...N^Nu..*
	hunk_reloc32:
	    hunk 3: 134 
	    hunk 2: 146 120
	    hunk 5: 86 24
	    hunk 4: 80 18
	    hunk 1: 126 94 74 66 60 38 32 12
	hunk_end

    The actual executable object file created contains the Toy startup and
    run-time library before these hunks, but they are fixed for all Toy
    programs, so they are not shown here. About the only part of the
    original program that is still recognizable to a human reader is the
    string "The sum is: ".


    Next Month

	The next article in this series will examine the data structures
    used by the Toy compiler and will present the tokenizer and the symbol
    table handler in detail.