The Zed Programming Language

Those wishing to run Zed code should refer to "97.2 Current System Status" for information on how to use the compiler, etc.

Many programmers need only read sections 1 - 8 to get a good understanding of how to use the language. Section 9 describes capsules (Zed's "classes") and interfaces. Section 10 describes generics. There is no section 11. Section 12 describes measures and units (e.g. Length, feet, kilometers).

"1 Introduction"

"2.4 Integral Literals"

"2.5 Floating Point Literals"

"2.6 Character and String Literals"

"2.7 Whitespace and Comments"

"2.8 Names"

"3 Overall Structure"

"3.1 Packages"

"3.2 "use" and "import" in Packages"

"3.3 Paths During Compilation"

"4.4.2 "@" Storage Flags"

"4.4.3 General "@" Use"

"4.4.4 Non-"package" "@" Values"

"4.4.5 "package" "@" Values"

"4.5 Array Types"

"4.6 Matrix Types"

"4.7 Enumeration Types"

"4.8 Oneof Types"

"4.9 Struct Types"

"4.10 Record Types"

"4.10.1 Variant Records"

"4.16 Type Equivalence"

"4.17 Type Values"

"5 Declarations"

"5.1 Scopes"

"5.2 Long-form Variable Declarations"

"5.3 "con" and "var" Declarations"

"5.4 Long-form Constant Declarations"

"5.5 "def" Declarations"

"5.6 Array and Struct Initializers"

"6 Statements, Expressions and Constructs"

"6.1 Assignment Statement"

"6.2 Operator Precedence"

"6.3 Unary Operators"

"6.4 Binary Operators"

"6.4.1 Bit Operators"

"6.4.2 Shift and Rotate Operators"

"6.4.3 "relate""

"6.4.4 Exponentiation"

"6.4.5 Multiplication"

"6.6.3 "_IfBytecode_""

"6.7 Assert Statement"

"6.8 Abort Statement"

"6.9 While Statement"

"6.10 Case Construct"

"6.11 For Statement"

"6.12 Return Statement"

"6.13 Eval Statement"

"7 String and Matrix Operations"

"7.1 String Operations"

"7.2 One Dimensional char Arrays"

"7.3 Matrix Operations"

"8 Miscellaneous Expression and Statement Elements"

"8.1 "getBound""

"8.2 "toUint" and "fromUint""

"8.3 "unit""

"8.4 "nonNil""

"8.5 "byteSwap", "evenParity", "onesCount", "lowOneIndex" and "highOneIndex""

"8.6 "flt", "round" and "trunc""

"9 Interfaces and Capsules"

"9.1 Introduction"

"9.2 Interfaces"

"9.2.1 Interface Syntax"

"9.2.3 Using "partial" Interfaces"

"9.2.4 Using Interface "extends" Clauses"

"9.3 Capsules"

"9.3.1 Capsule Basics"

"9.3.2 Capsule Syntax"

"9.3.3 Capsule "procs" Sections"

"9.3.4 Capsule Constructors"

"9.3.5 Static and Dynamic Method Selection"

"10 Generics"

"10.1 Introduction"

"10.2 Syntax"

"10.3 Rules for Generics"

"10.4 A "Lists" Example"

"10.5 A Generic Array Multiplication Example"

"10.6 Generic Type Parameter Interfaces"

"10.7 Multiple Generic Parameters"

"10.8 Using One Generic Inside Another"

"10.9 When Generic Procs Must be Instantiated"

"10.10 _instantiate_"

"12 Measures and Units"

"12.1 Introduction"

"12.2 Defining Measures"

"12.3 Defining Units"

"12.4 Using Units"

"12.5 The "unit" Construct"

"12.6 Units in Expressions"

"12.7 Units in Proc Calls"

"13 "#" Operators"

"13.1 Introduction"

"13.2 "#" Unary Operators"

"13.3 "#" Binary Operators"

"13.4 "#" Assignment"

"13.5 "#" Selection and Selection-Assignment"

"13.6 "#" Indexing"

"13.7 "#" Parentheses"

"13.8 "#" Calls"

"13.9 "#" Braces"

"13.10 "#" Example"

"14 Privileged Versus Non-privileged Code"

"18 Compile Time Execution"

"18.1 Introduction to Compile Time Execution"

"18.2 Package-Level "eval""

"18.3 Zed Compiler Internals"

"18.4 Examples"

"18.5 Library "ctProc" Procs"

"18.6 "varProc" Procs"

"18.7 "ioProc" Procs"

"18.8 "cTime" Procs"

"18.9 Templates"

"18.9.1 Template Types"

"18.9.2 Template Introduction"

"18.9.3 Template Blocks"

"18.9.4 Template Expressions"

"18.9.5 Con and Var Template Declarations"

"18.9.6 Template Implementation"

"18.9.7 Varargs Examples"

"18.9.8 Templates for Inlining"

"18.9.9 Scope Blocks"

"18.9.10 "private" "package" Example"

"18.9.11 Template Pitfalls and Notes"

"18.10 "ctSafe""

"18.11 "##" Accesses"

"19 "cliProc" Procs"

"19.1 Introduction"

"19.2 "cliProc" Proc Specification"

"19.3 Command Arguments"

"19.4 Calling Cli Procs From Normal Zed Code"

"19.5 Redirection, Pipes, and Backgrounding"

"19.6 Miscellaneous"

"20 Dynamic Typing"

"21 Data Copy Facility"

"22 Persistence, Run Time Paths and Databases"

"22.1 Persistent Variables"

"22.2 Persistent Vectors"

"22.3 Run Time Paths"

"22.3.1 Current Working Package"

"22.3.2 Path Types"

"22.3.3 Path Constructors"

"22.3.4 Paths to Packages"

"22.3.5 Paths to Persistent Variables"

"22.4 Databases"

"22.4.1 Introduction to Zed Databases"

"22.4.2 Details of Zed Databases"

"22.5 Considerations for Persistence"

"23 Programmer Defined Constructs"

"23.1 Introduction"

"23.2 Simple Example"

"23.3 ActiveConstruct_t"

"23.4 ConstructStart"

"23.5 ConstructFixed"

"23.6 ConstructInt"

"23.7 ConstructFloat"

"23.8 ConstructName"

"23.9 ConstructWord"

"23.10 ConstructExpr"

"23.11 ConstructBlock"

"23.12 ConstructTypeSf"

"23.13 ConstructSequenceStart and ConstructSequenceEnd"

"23.14 ConstructAltStart, ConstructAltEnd and ConstructAltTrigger"

"23.15 ConstructList"

"23.16 Example Construct Syntaxes"

"23.17 Construct Semantics"

"29 Special Names"

"30 Warnings and Information"

"31 Conclusion"

"96 Philosophy"

"97 History and Status"

"97.1 Overall Philosophy"

"97.2 Current System Status"

"98 Issues and Future Work"

"98.1 Constructors Versus Initializers"

"98.2 Operations on Compound Values"

"98.3 Threading and Parallelism"

"99 Some Decisions Made"

"99.1 "nilOk""

"99.2 Struct Parameters and Results"

"99.3 "in", "out" and "inout" Parameters"

"99.4 Multiple Proc Results"

"99.5 Counting "for" Variations"

"99.6 Template Types"

"99.7 Strings and Char Vectors"

"99.8 Specification Issues"

"99.9 Historical Notes"

1 Introduction

This document is neither a specification of the Zed language nor a tutorial for it. Instead, this document is a description of the Zed language, and so will include informal specifications as well as examples. There currently is no formal specification of the Zed language - this document provides the syntactic description, and much of the semantic description, but does not include a formal specification. The current implementation of the language, which is freely available, is the closest to a current specification. See "99.8 Specification Issues" for more discussion.

Within this document, some material will be enclosed inside square brackets. That material need not be read to get a full description of the language. Such material might be descriptions of alternatives, reasons why Zed does something in a given way, consequences of those decisions, etc.

Formal tokens of the language will be shown in a fixed-width font and will be enclosed in apostrophes, as in '#:=' and 'nonNil'. Some tokens, however, are also normal English words or technical terms. In those cases, the token tends to be emphasized only in definitional situations.

A glance at the above table of contents shows that Zed is a very large programming language. Jokes are made that large programming languages "contain everything but the kitchen sink". Well, Zed has several bathtubs, not to mention kitchen sinks! Everything in this document is implemented - things are not added to the document until after they are implemented, tested and deemed useful. Basic use of Zed for programming does not require any knowledge of most of the larger features of the language. Writing simple programs is quite straightforward. The other features will mostly go unnoticed; their main visible effect is the large number of reserved words.

Many of Zed's features can be thought of as experimental. The concepts involved are usually not new, but implementing them in a fairly standard programming language may be. Having them all in one programming language has been a big experiment, and I believe the result is clear: yes, it can be done, and it can be done in such a way that they all play well together.

1.1 General

See "96 Philosophy" for some programming language philosophy.

The Zed language is a compiled procedural programming language with strong static typing. However, several features exist which can make it seem like a "higher-level", or more dynamic, language. These include:

introspection: a programmer can examine the internal representation of types, executable code, packages, etc.
explicit run-time type checks: programmers can choose to use values which can be references to any "tracked" values, and then distinguish those values at run time.
compile-time execution: this feature lets programmers extend the Zed programming language in some ways. User code can run during compilation, accepting whatever kinds and types of arguments it wants. Such code can replace itself with other code, insert code into the calling context, create new types, etc.
dynamic code creation: the Zed compiler itself is a standard part of the run-time environment. Programmers can use the defined calls to create new code and execute it as they wish. Such code can use the full facilities of the language and run-time system, and can be fully optimized to native code.
physical unit analysis: variables and constants can be given physical (and other) units, as in "23(m)" for 23 metres. The compiler analyzes the uses and checks that the units match. When explicitly asked to, it will correctly convert between units and apply scaling factors. The standard text output library, "Fmt", will include units in output, and can simplify unit expressions.
programmer defined operators: programmers can define operators which work with their own types (including in combination with other types). Many people find such use more readable than equivalent code which uses a lot of proc calls. Operators can be prefix, postfix or infix (binary). Also available are programmer defined bracketing, indexing, field selection and list construction. Note, however, that programmers cannot redefine existing meanings, and that programmer defined operators are visually distinct (they start with "#").
programmer defined language constructs: programmers can define their own language constructs which can be used by themselves or other programmers. Since Zed has sufficient constructs for nearly all purposes, adding more constructs is often done for convenience or conciseness. Also, such constructs can introduce language features which are similar to those in other programming languages or environments, to hopefully simplify the task of translating from those languages or environments into Zed.
persistent variables and databases: package-level variables can be made persistent by using a "$" in front of their name. These can be simple scalar values, strings, structs, arrays and vectors (one-dimensional matrixes), which will autoexpand as needed. The latter can have an element presence bitmap associated with them, creating simple "relational" databases. There are additional tools and syntaxes associated with that use.

All Zed procs (procedures) exist within an encapsulation feature called "packages". These are not identical to similar concepts in other languages.

Zed is not advertised as an object-oriented programming language, but it includes "capsules" which are much like Java classes. It also includes "interfaces" which are much like Java interfaces.

The system uses 64 bit values for integral and floating point values, as well as for pointers. Strings are immutable basic entities in the system. All non-floating-point operations in the system are fully checked at either compile time or run time. This includes checks for tracked reference operations, '@' (pointer-like) operations, arithmetic operations and array subscripting.
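To make "fully checked" concrete, here is a small Python model (not Zed, and not how the compiler implements its checks) of 64-bit 'uint' arithmetic and array subscripting that fail instead of silently wrapping or reading outside the array. The exception name CheckFailure and the function names are hypothetical, and treating a negative 'uint' subtraction result as a failure is an assumption of this sketch.

```python
UINT_MAX = 2**64 - 1  # 'uint' is 64 bits wide, as stated above


class CheckFailure(ArithmeticError):
    """Hypothetical stand-in for a Zed run-time check failure."""


def uint_add(a: int, b: int) -> int:
    """Add two uint values; fail rather than wrap on overflow."""
    result = a + b
    if result > UINT_MAX:
        raise CheckFailure(f"uint overflow in {a} + {b}")
    return result


def uint_sub(a: int, b: int) -> int:
    """Subtract two uint values; fail if the result would be negative (assumed)."""
    if b > a:
        raise CheckFailure(f"uint underflow in {a} - {b}")
    return a - b


def index(array: list, i: int):
    """Subscript an array, failing instead of reading outside it."""
    if not 0 <= i < len(array):
        raise CheckFailure(f"subscript {i} outside bounds of length {len(array)}")
    return array[i]
```

In Zed some of these checks are done at compile time when the operands are known; the model only shows the run-time case.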

The language includes both low level facilities and high level facilities. Privileged programmers can use type casts, pointer arithmetic, unions, etc. All programmers have access to struct types, 'bits' types, types of specified size, direct fixed-sized arrays, "safe" unions, etc. Higher level facilities include easy string operations, dynamically created vectors and matrixes, user-defined operators, functions which appear to have varying argument lists, list constructors, user-defined formatting procs, etc.

There are facilities in the language which are optional, in the sense that the programmer can choose to use them, or can mostly ignore them. One main feature of this kind is 'nonNil'. This is an attribute which can be attached to tracked references, '@' values and pointers. When present, the compiler will ensure it is always met. This can be used to clarify the requirements in proc calls, etc. The presence of 'nonNil' on a tracked reference, '@' or pointer means that the compiler does not need to arrange to check such values against 'nil' at run time, and so code can execute faster.

Here is a variant of the standard "hello world" program, in Zed:

    package /Hello;

    import /Fmt;

    export proc
    Run()void:
        Fmt("Hello there world.");
    corp;


    package /;

    export proc
    main()void:
        Hello/Run();
    corp;

[In the current implementation, execution starts with proc "main" in the root package ("/"). In the future this will not be the case - execution will start with a proc which is a defined command entry point found using a normal path-searching facility. "package" specifications will not be needed.]

The 'import' line is similar to "import" in Java or "#include" in C - it specifies a place where other names can be found. Package "Fmt" provides formatted text output facilities along the lines of "printf" in C. The form "Hello/Run" is like a file path in an operating system - it says to look in package "Hello" for name "Run". Proc "Fmt" displays (prints) the values it is given, and a newline. It is a special kind of proc in Zed which executes at compile-time and accepts arbitrary parameters with optional attached formatting specifications. It replaces the call to itself with appropriate calls to do the required formatting and output. Programmers can write their own procs to do similar things.

Zed supports the use of physical units, like "second", with scaling factors like "kilo". This facility can provide additional checking of calculations involving such units. There are also utility facilities, not actually part of the language, which can make the use of such units "friendlier". Here is an example program fragment:

    proc
    BttF(bool useLightning)void:
        var ampVec := getCurrentCurrents(), totalCurrent := 0.(A);
        for i from 0 upto getBound(ampVec) - 1 do
            totalCurrent := totalCurrent + ampVec[i];
        od;
        float(V) voltage := if useLightning then 1_000_000.(V) else 12.(V) fi;
        /* *How* many jigga-watts?!!?! */
        con power := totalCurrent * voltage;
        Fmt("Power = ", power, " [", power :: gUs, "] {", power :: gUsn, "}");
    corp;

and the output from it:

    Power = +0.121006e+010(A*V) [1.21(GW)] {1.21(gigawatt)}
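As a rough model of the unit checking in the example above, the following Python sketch tags each number with unit exponents. Adding values with different units fails, and multiplying combines the units, so a current times a voltage yields A*V, as in the first field of the output. It deliberately omits scaling factors and the simplification to "GW" that "Fmt" performs; the class Quantity and the helper q are hypothetical names, not part of Zed.

```python
from collections import Counter


class Quantity:
    """A value tagged with unit exponents, e.g. {"A": 1, "V": 1} for A*V."""

    def __init__(self, value: float, units: dict):
        self.value = value
        # Drop zero exponents so that, for example, m/m becomes dimensionless.
        self.units = {u: e for u, e in units.items() if e != 0}

    def __add__(self, other: "Quantity") -> "Quantity":
        # Sums only make sense when both sides carry identical units.
        if self.units != other.units:
            raise TypeError(f"unit mismatch: {self.units} + {other.units}")
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other: "Quantity") -> "Quantity":
        # Products add exponents, so amperes times volts gives A*V.
        combined = Counter(self.units)
        combined.update(other.units)
        return Quantity(self.value * other.value, dict(combined))


def q(value: float, unit: str) -> Quantity:
    """Hypothetical helper mirroring Zed's 23(m) notation."""
    return Quantity(value, {unit: 1})


# Usage, echoing BttF: sum some currents, then multiply by a voltage.
total = q(0.0, "A")
for amps in (400.0, 810.0):
    total = total + q(amps, "A")
power = total * q(12.0, "V")
```

In Zed the mismatch is reported by the compiler rather than at run time; the sketch only shows which combinations are legal.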

Those examples are fairly "low level". Zed can also go "high level":

    proc
    VarTest()void:
        Var_t vec := #{1, 3.14, Complex_t(1.1, 2.2), Range(3), #{"hello", "there"#}#};
        Fmt(vec);
        for i from 0 upto 1 do
            vec#[i#] #:= vec#[i#] #+ 1;
        od;
        vec#[3#]#[1#] #:= vec#[3#]#[1#] #* 10;
        vec#[3#]#[2#] #:= vec#[3#]#[2#] #* 100.001;
        Fmt(vec);

        Fmt(Map(Lambda(v) v #+ v end, vec));
    corp;

With the right 'import'ed packages the above example compiles and runs, producing:

    #{1, 3.14, Complex_t(1.1, 2.2), #{0, 1, 2#}, #{hello, there#}#}
    #{2, 4.14, Complex_t(1.1, 2.2), #{0, 10, 200.002#}, #{hello, there#}#}
    #{4, 8.28, Complex_t(2.2, 4.4), #{0, 10, 200.002, 0, 10, 200.002#},
      #{hello, there, hello, there#}#}

There is lots of "#" syntax in this example. That syntax is for user-defined operations. Here, a user-defined package is providing the "Var_t" facility, which does run-time type testing, and provides higher-level operations like vector operations, mapping procs, etc. Another package provides a complex number type, using a record type. Neither of those two packages knows about the other - they are only combined in the test program.

The Zed language is a "safe" language in the sense that programmers cannot cause "memory stomping", "array overflows", etc. The language guarantees that checks are made, either at compile time or at run time, to prevent such things. The exception to this rule is that privileged programmers can do things like working directly with pointers, "unsafe" unions, type casts, etc. They can thus cause all of those nasty problems. It is expected that very few Zed programmers will be privileged. This document does not describe how to do it, although it is currently trivial.

As mentioned above, the semantics of the Zed language are mostly defined by the Zed compiler. Code generators will be intensely scrutinized to ensure that they map those semantics correctly. The semantic checking and construction of internal representations is done by Zed code, and any other implementations of that work will not be accepted without intense scrutiny. If accepted, such an implementation will replace the original - there cannot be more than one accepted implementation. This is the choice that has been made. There can be multiple code generators for various target CPUs.

[A note to C/C++/Java programmers: the normal integral type in Zed is 'uint', which is unsigned. Signed integers ('sint') should only be used when negative values are required. Avoid them unless your application really needs them, since the expectations you are used to might not hold. For example, integral literals are of type 'uint'. This is normally not an issue, however, since the compiler will silently convert an in-range uint constant to a sint constant if needed, and vice versa.]
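The silent constant conversion amounts to a range check. This Python sketch assumes the 64-bit representation described in "1.1 General": a 'uint' constant can become a 'sint' constant only if it is at most 2^63 - 1, and a 'sint' constant can become a 'uint' constant only if it is non-negative. The function names are hypothetical, and the sketch models only the rule, not the compiler's error reporting.

```python
SINT_MIN = -(2**63)
SINT_MAX = 2**63 - 1
UINT_MAX = 2**64 - 1


def uint_constant_as_sint(value: int) -> int:
    """Convert a uint constant to sint, as the compiler does when it is in range."""
    if not 0 <= value <= UINT_MAX:
        raise ValueError(f"{value} is not a uint constant")
    if value > SINT_MAX:
        raise ValueError(f"uint constant {value} is out of sint range")
    return value


def sint_constant_as_uint(value: int) -> int:
    """Convert a sint constant to uint; only non-negative values qualify."""
    if not SINT_MIN <= value <= SINT_MAX:
        raise ValueError(f"{value} is not a sint constant")
    if value < 0:
        raise ValueError(f"sint constant {value} is negative")
    return value
```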

[The Zed language is, in some places, heavily context sensitive. In other words, the meaning of things, and what you can do with them, varies a lot from item to item. One cause of this is the fact that types are usable as values. A consequence of this nature is that the Zed compiler's recovery from some syntax errors is not very good - a single actual error can sometimes result in several errors being reported by the compiler. More work can address some of these issues, but it has not been a high priority thing. For now, if you get several error messages from one or two lines of source, fix the first problem, then skip ahead 2 or 3 lines before looking for another error to fix in that round of error-fixing. A common error which can cause multiple complaints in the current version of the Zed system is using a C-style "->" instead of Zed's '@.' when dealing with '@' values.]

[The Zed language has a *lot* of reserved words. This is a sharp difference from the C family of languages, which have not wanted to add many reserved words over what the original C language had, so as to not break more old programs than necessary. Since Zed is starting afresh, it can have reserved words which actually tell you what is going on.]

[Zed can be a verbose programming language. In some situations it is in fact less verbose than older languages, but it can certainly be verbose in some situations. For example, this can be a variable declaration:

    path nonNil [] uint con nonNil pth := ...

It declares "pth" to be a non-changing, guaranteed-to-be-not-nil reference to a "path", which is guaranteed to reference a persistent vector of 'uint's. Given the right initializer, that declaration could also be written:

    con pth := ...

The verbose declaration may be needed to describe exactly the nature of the variable being declared, when 'con'/'var' doesn't yield what is wanted. Similar long declarations can happen in C, with "const", "volatile" and pointer types.]

[One thing that many languages do is to introduce a lot of punctuation marks instead of reserved words. That can make source in those languages smaller. To the non-expert reader, it is not necessarily clearer, however. Zed in general uses reserved words and language constructs instead of special punctuation marks.]

2 Lexical Elements

The lexical elements or "tokens" in a programming language are those input things that make up the programs in the language. Examples typically include numeric literals ("137"), punctuation marks (":="), reserved words ("if") and names ("Counter").

2.1 Character Set

All Zed reserved words use only characters from the ASCII set. All names usable in the Zed programming language use the same set. Characters inside string literals, comments, and Cli mode arguments can be any 8 bit value other than most ASCII control characters. Names in the Zed persistent store are also not limited to the lower end of the ASCII character set; they can also contain spaces, minus signs, etc. Such names, however, cannot be directly used in Zed programs - they must be accessed indirectly, usually through 'path's.

[There is nothing stopping someone from externally renaming what is originally a valid name for the Zed programming language into something that is not so valid. Such a name will be displayed in some form or other if code using such a name is displayed. However, changes cannot be made to such code since it will not "parse" correctly. The name in question must be renamed back to something valid first.]

2.2 Reserved Words

Zed has reserved words, not keywords - that means that they always have their special meaning, regardless of where they appear. [Some programming languages use keywords which are only recognized in certain positions, so they can allow, e.g. "IF IF = THEN THEN THEN = IF".] The following shows the reserved words in Zed, with brief notes as to what they are used for. Note: many of these are esoteric, and will not be explained until much later.

    'proc' - start a proc definition
    'corp' - end a proc definition
    'ctSafe' - flag a proc as safe for quick compile-time execution
    'ctProc' - flag a proc as being called at compile time
    'construct' - not used
    'varProc' - flag a proc as taking variable arguments
    'ioProc' - flag a proc as being a variable argument input/output proc
    'cTime' - flag a proc as being for compile time only
    'cliProc' - flag a proc as being a Cli command proc
    'if' - start an 'if' statement or expression
    'then', 'elif', 'else' - used within 'if's
    'fi' - end an 'if' statement or expression
    'and', 'or', 'not' - logical (boolean) operators
    'while', 'do', 'od' - parts of 'while' loops
    'for', 'from', 'upto', 'downto' - parts of 'for' loops
    'case', 'incase', 'default', 'esac' - parts of 'case' statements or expressions
    'matrix' - dynamic vector/matrix creation
    'getBound' - retrieve a matrix bound
    'nil' - unique tracked and pointer value
    'true', 'false' - boolean constants
    'throw', 'catch', 'require', 'atomic' - not used
    'assert' - run-time programmer checks
    'abort' - explicit run-time programmer abort; also proc flag
    'eval' - discard result of expression; run proc at package compile time
    'return' - return from within a proc
    'begin', 'end' - delimit body of explicit scope
    'strict' - restrict optimizations
    'package' - declare containing package
    'subpackage' - declare containing subpackage
    'use' - allow references to names from an external package
    'type' - declare a named type
    'local' - mark item as visible to entire package
    'export' - mark item as exported to other packages
    'import' - make the exported names in a package directly usable
    'var' - short-form variable declarations; mark formal as writeable
    'con' - short-form variable declarations with implicit 'con'
    'def' - short-form constant declarations
    'template' - introduce template types and constructs
    'generic' - introduce a generic set of types and procs
    'instance' - create an instance of (instantiate) a generic
    'extends' - relate new capsule/interface to existing one
    'interface' - introduce an interface specification
    'capsule' - introduce a capsule (class)
    'implements' - specify interfaces implemented by a capsule
    'final' - prevent overriding of capsule method
    'procs' - introduce a section of methods within a capsule
    'table' - not used
    'baseCall' - call corresponding method in contained capsule
    'partial' - allow interface implementors to skip some parts
    'external' - not used
    'public' - allow arbitrary use of record/capsule
    'private' - restrict use of record/capsule/bits or fields
    'inline' - absorb a struct into another struct/record/capsule
    'noInit' - mark field as not to be included in constructor
    'con' - mark variable/field as never changing
    'ro' - mark variable/field as not writeable
    'volatile' - force compiler to sequence accesses and stores
    'ref' - not used (previous form of '@')
    'nonNil' - restrict values to those other than 'nil'
    'nilOk' - allow an '@' value to be 'nil'
    'struct' - declare a 'struct' type (direct multivalued)
    'record' - declare a 'record' type (indirect multivalued)
    'union' - declare a 'union' type (overlayed)
    'bits' - declare a 'bits' type (named, sized bitfields)
    'oneof' - declare a 'oneof' type (named specific values)
    'enum' - declare an 'enum' type (named value sequence)
    'path' - declare a 'path' type; the 'path' construct
    'DbType' - declare a database type
    'measure' - declare a (physical) unit category
    'unit' - declare unit of measure; convert units
    'exception' - not used
    'void' - a pseudo-type representing "no value"
    'poly' - a pseudo-type used with interfaces and capsules
    'bool' - boolean built-in type
    'char' - character built-in type
    'ichar' - not used (International character - unicode)
    'achar' - not used (ASCII character)
    'uint' - unsigned integer built-in type
    'sint' - signed integer built-in type
    'float' - floating point built-in type
    'string' - string built-in type
    'istring' - not used
    'astring' - not used
    'any' - pseudo-type that accepts any tracked value
    'autoAny' - like 'any' but with automatic conversions
    'bits8' - 8-bit built-in type
    'bits16' - 16-bit built-in type
    'bits32' - 32-bit built-in type
    'bits64' - 64-bit built-in type
    'bits128' - not used
    'bits256' - not used
    'toUint' - convert various values to 'uint'
    'fromUint' - convert from 'uint' to some other types
    'byteSwap' - swap bytes in a value
    'evenParity' - compute bit parity of a value
    'onesCount' - count the "1" bits in a value
    'lowOneIndex' - find the least significant "1" bit in a value
    'highOneIndex' - find the most significant "1" bit in a value
    'flt' - convert from integral to floating point
    'round' - convert from floating point to 'sint'
    'trunc' - convert from floating point to 'sint'
    'typeof' - not used (not needed in Zed)
    'sizeof' - alternate way to find the byte-size of a type
    'pretend' - low level type conversion ("type-cast")
    'assign' - run time 'nonNil' check or type check
    'select' - run time selection of variant record alternative
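Several of the reserved words above name bit-level builtins that are described in section 8.5. The Python sketch below shows one plausible reading of each for 64-bit values. The details are assumptions, not statements of Zed's behaviour: in particular, 'evenParity' is modelled as returning the bit that would make the total count of 1 bits even, and 'lowOneIndex'/'highOneIndex' are modelled as failing on a zero argument.

```python
MASK64 = (1 << 64) - 1


def byte_swap(x: int) -> int:
    """Reverse the order of the 8 bytes in a 64-bit value."""
    return int.from_bytes((x & MASK64).to_bytes(8, "little"), "big")


def ones_count(x: int) -> int:
    """Count the 1 bits in a 64-bit value."""
    return bin(x & MASK64).count("1")


def even_parity(x: int) -> int:
    """Assumed meaning: the bit (0 or 1) that makes the total count of 1 bits even."""
    return ones_count(x) & 1


def low_one_index(x: int) -> int:
    """Index (0 = least significant) of the lowest 1 bit."""
    x &= MASK64
    if x == 0:
        raise ValueError("value has no 1 bits")  # assumed behaviour for 0
    return (x & -x).bit_length() - 1


def high_one_index(x: int) -> int:
    """Index (0 = least significant) of the highest 1 bit."""
    x &= MASK64
    if x == 0:
        raise ValueError("value has no 1 bits")  # assumed behaviour for 0
    return x.bit_length() - 1
```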

2.3 Punctuation

The punctuation marks in Zed are:

    '=' - test for simple equality
    '~=' - test for simple non-equality
    '<' - test if left-hand-side less than right-hand-side
    '>' - test if left-hand-side greater than right-hand-side
    '<=' - test if left-hand-side less than or equal to right-hand-side
    '>=' - test if left-hand-side greater than or equal to right-hand-side
    '==' - test for string identity (same string)
    '~==' - test for string non-identity
    ':=' - assignment
    '+' - integral, float, etc. addition, string concatenation, unary identity
    '-' - integral, float, etc. subtraction, unary negation
    '*' - integral, float, etc. multiplication, flex array bound
    '/' - integral, float, etc. division, path component
    '%' - integral, float, etc. remainder
    '~' - bitwise negation
    '&' - bitwise AND
    '|' - bitwise inclusive OR
    '^' - float exponentiation
    '><' - bitwise exclusive OR
    '<>' - "relate" comparison operator
    '<<' - bit shift left
    '>>' - bit shift right
    '<~' - bit rotate left
    '>~' - bit rotate right
    '#=' - user operator equality
    '#~=' - user operator non-equality
    '#<' - user operator less than
    '#>' - user operator greater than
    '#<=' - user operator less than or equal
    '#>=' - user operator greater than or equal
    '#==' - user operator other equality
    '#~==' - user operator other non-equality
    '#:=' - user operator assignment
    '#+' - user operator addition, unary identity
    '#-' - user operator subtraction, unary negation
    '#*' - user operator multiplication
    '#/' - user operator division
    '#%' - user operator remainder
    '#~' - user operator bitwise negation
    '#&' - user operator bitwise AND
    '#|' - user operator bitwise inclusive OR
    '#^' - user operator exponentiation
    '#><' - user operator bitwise exclusive OR
    '#<>' - user operator "relate"
    '#<<' - user operator shift left
    '#>>' - user operator shift right
    '#.' - user operator field select
    '#->' - user operator field select in records and capsules
    '@' - safe pointer creation or follow
    ',' - value separator
    ';' - statement separator
    ':' - in case constructs, 'ioProc' size, precision, etc.
    '(' - left parenthesis (evaluation order)
    ')' - right parenthesis
    '[' - left square bracket (array, string indexing)
    ']' - right square bracket
    '{.}' - run-time current working package
    '{' - left brace (type declarations, indirect calling)
    '}' - right brace
    '.' - period - field selection, path component
    '->' - arrow - field select with indirection
    '::' - 'ioProc' output format
    '..' - parent in path
    '#@' - user operator enref or deref
    '#(' - user operator left parenthesis
    '#)' - user operator right parenthesis
    '#[' - user operator left square bracket (usually indexing)
    '#]' - user operator right square bracket
    '#{' - user operator left brace (usually lists)
    '#}' - user operator right brace
    '#' - not currently used
    '##' - reference to export from type
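The rotate operators '<~' and '>~' differ from the shifts in that bits leaving one end of the value re-enter at the other. Here is a Python sketch of that behaviour, assuming 64-bit operands and a rotate count taken modulo 64 (the modulo rule is an assumption of this sketch). How the shift operators treat bits shifted out is not modelled.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1


def rotate_left(x: int, n: int) -> int:
    """Model of '<~': bits leaving the top re-enter at the bottom."""
    x &= MASK
    n %= WIDTH  # assumption: counts are taken modulo the width
    return ((x << n) | (x >> (WIDTH - n))) & MASK


def rotate_right(x: int, n: int) -> int:
    """Model of '>~': a right rotation is a left rotation by the complement."""
    return rotate_left(x, WIDTH - n % WIDTH)
```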

2.4 Integral Literals

Integral literals can be represented in the usual decimal (base 10) notation, or in binary (base 2), octal (base 8) or hexadecimal (base 16). Decimal is used by default - the other bases are chosen by prefixing the number with "0b", "0o" or "0x" respectively. The prefixes can be in lower case or upper case, as can hexadecimal digits. There can be underscores ("_") within the digits of the number - they are ignored but can be useful to humans reading the values. Examples:

    3 192475340785 0b011 0x1234_5678_9abc_def0 0o074
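The rules for integral literals are simple enough to express as a small evaluator. This Python sketch is a model of the rules above, not the Zed scanner: it handles the optional base prefix, case-insensitive prefixes and digits, and ignored underscores. Rejecting values that do not fit in 64 bits is an assumption, based on 'uint' being 64 bits wide.

```python
UINT_MAX = 2**64 - 1  # assumption: literals must fit a 64-bit 'uint'
_PREFIX_BASES = {"0b": 2, "0o": 8, "0x": 16}
_DIGITS = "0123456789abcdef"


def parse_integral_literal(text: str) -> int:
    """Evaluate an integral literal such as 0x1234_5678 or 0o074."""
    if not text[:1].isdigit():
        raise ValueError(f"literal must start with a digit: {text!r}")
    base, digits = 10, text
    prefix = text[:2].lower()
    if prefix in _PREFIX_BASES:
        base, digits = _PREFIX_BASES[prefix], text[2:]
    # Underscores are ignored; they only help human readers.
    digits = digits.replace("_", "").lower()
    valid = _DIGITS[:base]
    if not digits or any(c not in valid for c in digits):
        raise ValueError(f"bad digits for base {base}: {text!r}")
    value = int(digits, base)
    if value > UINT_MAX:
        raise ValueError(f"literal does not fit in 64 bits: {text!r}")
    return value
```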

2.5 Floating Point Literals

Floating point literals are similar to those in other programming languages. They can include an exponent or not, as needed. An exponent is introduced by "e" or "E" and can have a sign. Prefix "0f" can be used to indicate that a number is a floating point literal rather than an integral literal. If the character after that prefix is not a digit or a decimal point, then the entire number is treated as a special input form. Special form "0fNaN" represents the IEEE non-signalling NaN 0fx7ff8000000000000. Special form "0fInf" represents the IEEE infinity 0fx7ff0000000000000. Arbitrary forms can be specified in hexadecimal after prefix "0fx" - in this form there must be exactly 16 digits. As usual, both lower and upper case work in all of these forms. Examples:

    12. 39846.03918 0.394374e27 7783378.3974e-53 0f23 0fNaN 0fx123456789ABCDEF0

Note that the treatment of "infinities", "NaNs" and "denorms" might vary from CPU to CPU. Thus, any program which relies on any specific behaviour is probably non-portable. These facilities are in the language for the use of those who know what they are doing. Note also that formatted output facilities in Zed might use "NaN" or "Inf" for all such values if hexadecimal output is not selected.
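The special floating point forms are names for particular IEEE 754 bit patterns. This Python sketch models the "0f" rules described above, not the Zed compiler's code. It decodes "0fNaN", "0fInf" and the 16-digit "0fx" form into doubles, and hands ordinary forms such as "0f23" to a normal float conversion. The function names are hypothetical.

```python
import string
import struct

# Bit patterns named in the text above.
QUIET_NAN_BITS = 0x7FF8000000000000
POS_INF_BITS = 0x7FF0000000000000


def float_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]


def parse_0f_literal(text: str) -> float:
    """Evaluate a literal that starts with the "0f" prefix (any case)."""
    if text[:2].lower() != "0f":
        raise ValueError(f"not a 0f literal: {text!r}")
    rest = text[2:]
    if rest[:1].isdigit() or rest[:1] == ".":
        # Ordinary form such as 0f23: the prefix only forces floating point.
        return float(rest)
    lowered = rest.lower()
    if lowered == "nan":
        return float_from_bits(QUIET_NAN_BITS)
    if lowered == "inf":
        return float_from_bits(POS_INF_BITS)
    if lowered.startswith("x"):
        digits = rest[1:]
        # The 0fx form requires exactly 16 hexadecimal digits.
        if len(digits) != 16 or not all(c in string.hexdigits for c in digits):
            raise ValueError(f"0fx needs exactly 16 hex digits: {text!r}")
        return float_from_bits(int(digits, 16))
    raise ValueError(f"unknown special float form: {text!r}")
```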

[By now, mathematicians and numerical analysts will be sighing in disgust at my dismissal of important issues in floating point arithmetic, and my blind assumption that the entire world is IEEE floating point. First, I am neither a mathematician nor a numerical analyst - in fact, a simple Mandelbrot Set program is usually the only thing I use floating point numbers for. I would have to start with a new brain and a new career path in order to do much in the way of properly defining floating point. Second, Zed is intended to run on existing and near future processors. Thus, I really don't have much choice - I use what is available. Chances are I could do better with what is available - I welcome detailed information that I can understand.

Something solid I could use is advice on exactly how to define the few floating point things that Zed has, along with detailed information on how to effectively implement them. I am not averse to a system where there are two (more?) modes of compilation, one that compiles for speed, and one that compiles for floating point correctness, with all needed compile-time and run-time checks. See the 'strict' constructs for something relevant.]

2.6 Character and String Literals

In the current form of Zed, only full quotation marks (") can be used to indicate character and string literals. A literal with only one character in it is a character constant, but it can be used in nearly all contexts which require a string. This relates to the fact that Zed will silently convert a character value into a string value when needed. [Yes, this actually works. I wasn't sure it would, but I implemented it and it all worked out.] [Zed might switch to an apostrophe (') in the future, when I add support for international characters (e.g. Unicode), and need a way to represent literals of that kind. For now, character/string literals represent ASCII characters.] String literals can contain any character given to them, but the quote character and the backslash ("\") have special meaning. A quote character ends the literal unless followed, after whitespace, by another quote character. A backslash character introduces an escape sequence, which can be one of:

    \\ - a single backslash character
    \b - an ASCII backspace character
    \n - an ASCII newline character
    \r - an ASCII carriage-return character
    \t - an ASCII tab character
    \<two uppercase hex digits> - ASCII value <= 0xFF described by hexadecimal digits
    anything else - the backslash is ignored

The digits in a hexadecimal escape must be uppercase so as to not be ambiguous with the "\b" escape.
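The escape rules can be modelled as a small decoder over the body of a literal, with the quotes already removed. The following Python sketch follows only the list above, and `decode_escapes` is a hypothetical name. Hexadecimal escapes are checked first. This is safe because none of the escape letters is an uppercase hex digit, which is exactly why the uppercase rule removes the ambiguity with "\b".

```python
# A Python model of the escape rules above; not the Zed compiler's own code.
_SIMPLE_ESCAPES = {"\\": "\\", "b": "\b", "n": "\n", "r": "\r", "t": "\t"}
_UPPER_HEX = "0123456789ABCDEF"


def decode_escapes(body: str) -> str:
    """Decode escapes in a literal's body (the surrounding quotes already removed)."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        pair = body[i + 1:i + 3]
        if len(pair) == 2 and all(c in _UPPER_HEX for c in pair):
            out.append(chr(int(pair, 16)))      # \XX with two uppercase hex digits
            i += 3
        elif pair[:1] in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[pair[:1]])
            i += 2
        else:
            out.append(pair[:1])                # anything else: the backslash is ignored
            i += 2
    return "".join(out)
```

Under these rules, "\F7" yields the character 0xF7. The lowercase "\f7" is not a hexadecimal escape, so its backslash is simply ignored and the result is "f7". Likewise "\bA" is a backspace followed by "A", not the character 0xBA.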

Zed supports the concept of "string breaks", where multiple string literals appear one after another, separated by whitespace. The multiple strings are handled as a single string literal. This allows string literals longer than the input lines to be represented. This is similar to, but more readable than, the C convention of allowing a string to continue on multiple lines if the earlier pieces end in a backslash.

Examples:

    "p" "hello there world!" "spread\tme\tout" "wild: \f7"

    "This is the first part of a long string literal which is longer than the "
    "input lines, and so I've used \"string breaks\" to represent it. Note "
    "that you must put quotation marks on both ends of each piece."

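String breaks can be modelled as a reader that, after each closing quote, skips whitespace and continues if another quote follows. The following Python sketch is an assumption-laden illustration. It only skips plain whitespace characters and does not handle comments between the pieces, and it returns the body with escapes still undecoded. The function name is hypothetical.

```python
# A Python model of "string breaks"; not the Zed compiler's own code.
# Assumption: only plain whitespace (space, tab, newline, carriage return) is
# skipped between pieces; comments between pieces are not handled here.
WHITESPACE = " \t\n\r"


def read_string_literal(src: str, pos: int) -> tuple:
    """Read the literal starting at src[pos], joining any following pieces.

    Returns (raw_body, end), where raw_body still contains escape sequences and
    end is the index just past the final closing quote.
    """
    pieces = []
    while True:
        if pos >= len(src) or src[pos] != '"':
            raise ValueError("expected a quotation mark")
        pos += 1
        start = pos
        while pos < len(src) and src[pos] != '"':
            pos += 2 if src[pos] == "\\" else 1    # a backslash escapes the next character
        if pos >= len(src):
            raise ValueError("unterminated string literal")
        pieces.append(src[start:pos])
        pos += 1                                   # step past the closing quote
        look = pos
        while look < len(src) and src[look] in WHITESPACE:
            look += 1
        if look < len(src) and src[look] == '"':
            pos = look                             # another piece follows: a string break
        else:
            return "".join(pieces), pos
```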
2.7 Whitespace and Comments

"Whitespace" in computer programs is the material that has no importance to a compiler for the language. This includes the traditional "white" on paper documents: blank spaces and empty lines. The term is also extended to mean comments, which technically aren't white. In Zed, normal whitespace consists of the ASCII characters space, tab, newline and carriage return.

Zed has two forms of comments. The first form starts with "/*" and ends with "*/". Such comments can nest, meaning that one can be entirely within another. The second form of comment starts with "//" and goes to the end of the line containing the "//". A "//" inside a "/*"/"*/" comment is not recognized. Similarly, a "/*" or "*/" within a "//" comment is not recognized. Beware that if you accidentally end a "//" comment with a "*/", and then later comment out the entire section with "/*"/"*/", the stray "*/" might become significant and can prematurely end a comment section.

Any whitespace or comment will separate one language token from another. Whitespace and comments are not recognized inside string literals. Examples:

    /* This is a simple comment. */

    // This goes to the end of the line

    /*
        if a < b then
            return a;   // Found what we want!
        else
            /* Accumulate the ones we don't want. */
            b := b + a;
        fi;
    */
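The comment rules, including nesting and the hazard of a stray "*/" inside a "//" comment, can be modelled with a depth counter. The following Python sketch replaces each block comment with a space and drops "//" comments up to the newline. It assumes that quotation marks inside a block comment are not treated specially, which the text above does not specify. It is an illustration, not the Zed scanner.

```python
# A Python model of the comment rules above; not the Zed compiler's own code.
# Assumption: quotation marks inside a block comment are not treated specially.


def strip_comments(src: str) -> str:
    """Replace each comment with whitespace, honouring nested /* */ comments."""
    out = []
    i, n = 0, len(src)
    while i < n:
        two = src[i:i + 2]
        if two == "/*":
            depth, i = 1, i + 2
            while depth > 0:
                if i >= n:
                    raise ValueError("unterminated /* comment")
                two = src[i:i + 2]
                if two == "/*":            # a nested comment starts
                    depth, i = depth + 1, i + 2
                elif two == "*/":          # the innermost open comment ends
                    depth, i = depth - 1, i + 2
                else:                      # anything else, including "//", is skipped
                    i += 1
            out.append(" ")                # a comment still separates tokens
        elif two == "//":
            while i < n and src[i] != "\n":
                i += 1                     # "/*" and "*/" are not recognized here
        elif src[i] == '"':
            j = i + 1
            while j < n and src[j] != '"':
                j += 2 if src[j] == "\\" else 1
            if j >= n:
                raise ValueError("unterminated string literal")
            out.append(src[i:j + 1])       # comments are not recognized inside strings
            i = j + 1
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

Running it on "/*\nx; // stray */\ny;\n*/" shows the hazard described above. The "*/" at the end of the "//" comment closes the block comment early, so "y;" becomes live code and the final "*/" is left dangling.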

2.8 Names

Names in Zed are programmer defined symbols or identifiers. They consist of letters, digits and underscores, and must start with a letter or underscore. Names cannot be longer than the arbitrarily chosen limit of 32 characters. None of the above reserved words can be used as a name. Examples:

    i Count MAXIMUM_OVERDRIVE j1k32 _Fred_ IdCounter
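The name rules can be checked mechanically. The following Python sketch assumes ASCII letters and digits and the 32 character limit stated above. Its reserved-word set is a hypothetical, incomplete sample, drawn from words this document uses as keywords or predefined type names. The real list is the one given earlier in this document.

```python
import re

# A Python model of the naming rules above; not the Zed compiler's own code.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_NAME_LENGTH = 32
# Hypothetical, incomplete sample of reserved words, for illustration only.
RESERVED_SAMPLE = {"proc", "corp", "if", "then", "else", "fi",
                   "bool", "uint", "sint", "string", "nil"}


def is_valid_name(text: str) -> bool:
    """Letters, digits and underscores, not starting with a digit, at most 32 long."""
    return (_NAME.fullmatch(text) is not None
            and len(text) <= MAX_NAME_LENGTH
            and text not in RESERVED_SAMPLE)


def is_system_reserved(name: str) -> bool:
    """Names starting with '_' are reserved for system libraries."""
    return name.startswith("_")
```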

Programmers should not use names starting with underscores other than as described in this document - system libraries reserve all such names for their own use.

When directly referencing persistent variables, a "$" is used in front of the name of the variable. Although not technically correct, it should not cause problems if the "$" is thought of as being part of the variable name. The values of persistent variables are maintained between runs of a Zed "program". Persistent variables can only be declared at the package level. They are discussed in "22.1 Persistent Variables".

3 Overall Structure

3.1 Packages

Much like modern operating systems operate within the context of a hierarchical tree of "directories" or "folders", the Zed system operates within a hierarchical tree of "packages". Packages combine the functionality of directories or folders with that of packages as seen in programming languages such as Java.

Packages can contain other packages, and can contain subpackages. Packages and subpackages can also contain programming elements such as variables, procs, types, etc. They can also contain persistent variables, which fulfill some of the roles played by files in traditional systems. It is also intended, but not yet implemented, that they contain data elements that can be more flexible than persistent variables, and so play other file-like roles. Some of the elements in packages are somewhat artificial, only used internally. For completeness, all are listed here.

Packages can contain:

blank line - as needed by the input source
line comment - "//" style
block comment - "/*"/"*/" style
'use' statement - make some other package available for use
'import' statement - directly import symbols from some other package
long form constant(s) declaration
'def' declaration
constant(s) initialization (array, struct)
long form variable(s) declaration
'con'/'var' declaration
type predeclaration - forward declare struct/record/capsule types
type declaration
measure - define a (physical) measure (e.g. "Distance")
unit - define a unit of a measure (e.g. "metre")
proc predeclaration - predeclare a proc
proc definition - an actual proc with body
generic - an uninstantiated generic with types and procs
generic instance - a specific instance of a generic
interface predeclaration - predeclare an interface
interface - an interface specification
capsule predeclaration - predeclare a capsule type
capsule - a full capsule definition with record and proc sections
'eval' - a proc call to execute at compile time
'assert' - an assert to do at compile time
package - another package within this one
subpackage - a contained subpackage of this package
source file path - temporary artifact in this implementation

Non-"whitespace" package elements are separated by semicolons.

The order of definitions of programming elements within packages is important. For example, package variables (which correspond to file variables in C and C++) are initialized in the order in which they are declared. This matters when an initialization expression depends on the values of previous variables.

Type predeclarations are used when types circularly refer to one another. Capsule predeclarations are used when capsules refer to one another. Proc predeclarations are used when procs, either directly or indirectly, call each other, in a mutual recursion relationship. Constant definitions (and variable initializations) must appear after any other constant definitions that they depend on. Unit definitions must appear after their measure definitions.

Subpackages in Zed are a way to address the problem of large complex packages of types and code becoming too long in terms of the lines of code within them. The subpackage concept allows the large amount of material to be split up into as many pieces as are needed, but without having to expose types and interfaces outside of the package. See the 'local' attribute defined below.

Package elements which are declarations or definitions of programming items can have a prefix which controls the visibility of the item. Consider a simple constant declaration:

    uint MAX_THINGS = 100;

If the declaration appears at the package level just like that, then the name "MAX_THINGS" is visible to anything within the one package or subpackage containing the declaration. The definition does not extend to any contained packages or subpackages. If the declaration has prefix 'local' then the name is visible within the package and all of its subpackages, regardless of whether the definition is in the main package or in a subpackage. If the package has no subpackages, then there is no difference. Note that a 'local' name is not visible within packages (non-subpackages) contained within the package containing the name - packages do not work like scopes in programming languages.

If the declaration is prefixed with a simple 'export' then the definition becomes universally visible - it can be used anywhere, so long as that place can give a path to the package containing the name. Again, note that packages contained within a defining package do not automatically see symbols exported from the first package. They can set up a 'use' or 'import' to do so. The containing package is referenced as '..'. The 'export' prefix can be followed by a parenthesized, comma-separated list of paths to other packages. In that case, only those other packages (and the current package and any subpackages) can reference the name. This includes any subpackages of the allowed packages, but not any full packages within the allowed packages.

A specific example of this is from system package "Types":

    export(../../Package, ../../Proc, ../../Exec) proc
    SaveStorageFlags(ByteBuffer/Buffer_t nonNil buf; StorageFlags_t sf)void:
        ...
    corp;

Proc "SaveStorageFlags" (used for saving a storage flags collection to a bytestream) is exported only to the three listed packages (as well as its own package, "Types"). The other packages are accessed with "../../XXX" because "SaveStorageFlags" is defined in subpackage "ByteBufferIO" of package "Types".
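The visibility rules can be summarized as a single predicate. The following Python sketch assumes that paths have already been resolved to absolute form. It also assumes that "Types" sits directly under the root, so that "../../Package" from subpackage "ByteBufferIO" means "/Package". The names `Place` and `can_see`, and the subpackage "Helpers" used in the test, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# A Python model of the visibility rules above; not the Zed compiler's own code.
# Paths are assumed to be already resolved to absolute form.


@dataclass(frozen=True)
class Place:
    package: str                 # absolute path of the full package, e.g. "/Types"
    subpackage: Optional[str]    # name of a subpackage of it, or None


def can_see(decl: Place, visibility: str, use: Place,
            export_to: Optional[FrozenSet[str]] = None) -> bool:
    """Is a name declared at decl, with the given visibility, visible from use?

    visibility is "private" (no prefix), "local" or "export"; export_to holds the
    absolute paths from an 'export(...)' list, or None for a plain 'export'.
    """
    if visibility == "private":
        return use == decl                      # only the declaring package or subpackage
    if visibility == "local":
        return use.package == decl.package      # the package and all of its subpackages
    if visibility == "export":
        if export_to is None:
            return True                         # universally visible
        # The declaring package and its subpackages, plus each listed package and its
        # subpackages; full packages nested inside a listed package are not included.
        return use.package == decl.package or use.package in export_to
    raise ValueError(f"unknown visibility {visibility!r}")
```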

The 'local'/'export' specification or implicit private specification is called a "visibility specification".

3.2 "use" and "import" in Packages

Most code that programmers write in Zed will be dependent on code found elsewhere, whether that is other code by the same programmer, code by other programmers, or system libraries. The Zed compiler is told where it can search for other code via 'use' and 'import' directives. These both go at the package level, and are usually near the beginning of the package.

Syntactically both consist of:

'use' or 'import'
path to a package

With 'import', all names defined in the imported package which are either fully exported or exported to this package become directly available to code in this package. 'import' is usually not used much, since it brings with it the possibility of conflicting names. E.g. if a name defined in the current package is the same as a name that would otherwise be available from an imported package, what should happen? Technically, the outcome in this situation is undefined in Zed. However, an error report will be produced. In the current implementation, the order in which the names are created or found governs the result. That usually means that if a name is available from an imported package, it cannot be defined in the importing package. If names within multiple imported packages conflict, then the first-encountered name will "win". If a new 'import' is placed after the definitions of package names, then a conflicting name from the package being imported will "lose". It is suggested that the use of 'import' be carefully considered. One package that it can make sense to 'import' is "Fmt": it provides often-needed functionality, and most of the names it exports start with "fmt", so conflicts are easily avoided.
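The "first one found wins" behaviour of the current implementation can be modelled as an ordered walk over package elements. The following Python sketch is only a model of that described behaviour. The language itself leaves the outcome undefined, so code should not rely on it. The package paths "/A" and "/B" and the names in the test are hypothetical.

```python
# A Python model of how the current implementation is described as handling name
# conflicts: names are processed in order and the first one found wins. The
# language itself leaves the outcome undefined. Package paths are hypothetical.


def build_scope(elements):
    """elements: ordered ("define", name) or ("import", package_path, [exported names])."""
    scope = {}      # name -> where that name came from
    errors = []     # the error reports that would be produced
    for element in elements:
        if element[0] == "define":
            found = [(element[1], "this package")]
        else:
            _, package, names = element
            found = [(name, package) for name in names]
        for name, origin in found:
            if name in scope:
                errors.append(f"{name!r} from {origin} conflicts with {name!r} from {scope[name]}")
            else:
                scope[name] = origin
    return scope, errors
```

Placing an 'import' before a definition makes the imported name win. Placing it after the definition makes the imported name lose. Either way, an error report is recorded.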

If 'use' instead of 'import' is used, then only the name of the package being 'use'ed is subject to the above conflict handling. When a package has been 'use'ed, then all of the names in it which are exported universally or to this package can be referenced by placing the name of the 'use'ed package, and a separating '/', before them in code. For example, if symbol "DoGoodThings" is universally exported from package "GoodStuff", and package "GoodStuff" is 'use'ed by the current package, then "GoodStuff/DoGoodThings" is a valid name reference in the current package. This looks like a division operation, and can be confusing for a few minutes. However, the compiler is not at all confused since the name before the '/' is a package name, and those cannot be divided. Also, the automatic formatting within the Zed system will never put spaces around '/'s in a path, but will always do so around a division '/'.

It is also possible to use a full path as a reference, but that is rare. Typically, use of more than two names in a path would only be needed if there is more than one package of a given name that needs to be 'use'ed within a single other package. Also, this technique, though somewhat cumbersome, can be used to avoid the conflicting-name situations described above. For example:

    use ComplexStuff/PartOne;
    use ComplexStuff/PartTwo;

    PartOne/SubpartTwo/Init(...);
    ...
    PartTwo/SubpartSeven/Init(...);

When 'use' is used, the final component of the 'use'-ed path becomes a name which can be used directly in code. Thus, for example, that final name can be used by itself to refer to the 'use'-ed package. When 'import' is used, the final name in the path is not made directly available, only appropriate names within the 'import'-ed package are. So, the imported package name cannot be used directly, based on that 'import'.

3.3 Paths During Compilation

Note the header of this section: "Paths During Compilation". Direct access to elements of compilation paths is only possible during compilation. More specifically, it is available when running under the bytecode engine. If a program is compiled to native code, it does not have access to these compilation-time features - the values they represent are long gone. This same limitation also applies to types, when used as 'Types/Type_t' values and to procs when used as 'Proc/Proc_t' values. Packages referenced as values are of type 'Package/Package_t'.

Paths to items in the hierarchy of packages are part of the Zed programming language. They can be used in code just like references to package-local items can be. As mentioned above, packages can 'use' other packages, which adds the name of the package being used to the set of names that the using package has access to. Typically, that name is followed by '/' and the name of some item in the used package. This is one form of path in Zed.

Just like in operating systems, paths can start with the root of the package hierarchy (represented as '/' in Zed). Paths can also start with '.', representing the current package or subpackage, or '..', representing the parent package of the current package or subpackage. '..' can occur anywhere in a path. These paths are resolved at compile time, so, for example, path '.' will refer to the package or subpackage containing the proc being compiled. [Most operating systems use a forward slash ('/') as the delimiter in paths. Windows is, I believe, the only one which uses a backward slash ('\'). Web page URLs also use forward slashes, although many browsers will silently accept backward slashes as well. Zed only accepts forward slashes - like in most programming languages, the backward slash character is used as an "escape" in strings.]
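Compile-time path resolution can be modelled like file-system path normalization. The following Python sketch handles paths starting with '/', '.', '..' or a 'use'd name. It assumes a hypothetical table mapping 'use'd names to absolute paths, and it does not model other ways a bare name might be found. The sample paths in the test are also hypothetical.

```python
from typing import Dict

# A Python model of compile-time path resolution; not the Zed compiler's own code.


def resolve_path(current: str, path: str, uses: Dict[str, str]) -> str:
    """Resolve a path to an absolute one.

    current: absolute path of the package/subpackage being compiled, e.g. "/App/Main".
    uses:    hypothetical table mapping 'use'd names to absolute paths.
    """
    components = path.split("/")
    if path.startswith("/"):
        parts = []                                      # start at the root
        components = components[1:]
    elif components[0] in (".", ".."):
        parts = [p for p in current.split("/") if p]    # start at the current package
    elif components[0] in uses:
        parts = [p for p in uses[components[0]].split("/") if p]
        components = components[1:]
    else:
        raise KeyError(f"{components[0]!r} is not '/', '.', '..' or a 'use'd name")
    for comp in components:
        if comp == "..":
            if not parts:
                raise ValueError("'..' goes above the root")
            parts.pop()                                 # up to the containing package
        elif comp not in (".", ""):
            parts.append(comp)                          # down into a named item
    return "/" + "/".join(parts)
```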

Some examples of path uses (note that system type "Package/Package_t" is the type of packages and subpackages):

    use /Package;
    use /Math;
    use ../MyUtils;
    ...
    proc
    checkPackage(Package/Package_t nonNil pk)void:
        ...
    corp;
    ...
    proc
    test2()void:
        checkPackage(.);
        checkPackage(..);
        checkPackage(/Fmt);
        checkPackage(myChildPackage);
        checkPackage(MyUtils/Strings);
        string str := MyUtils/Strings/CreateMyString();
        Package/Package_t nonNil selfPk := .;
        checkPackage(selfPk);
        MyUtils/MyVars/DebugKey := 1282;
        ../../FredStuff/Setup(true);
    corp;

When a name is encountered which is not yet part of a path (i.e. could be the start of a path, or used by itself), putting things after it is safe, since the name will have been interpreted in normal programming language mode (as opposed to "CLI mode", described in "19 "cliProc" Procs"). However, if an actual path is encountered, the Zed compiler will switch to evaluating input as path components. Within a path, the '/' character represents the path-component separator. This means that comments are to be avoided in that situation. Both comment styles start with the '/' character, and that can be confused with the path separator '/'. If a comment is absolutely required, start it on the next line. This handling is consistent with the typical handling of paths, e.g. when opening "files" - you cannot put comments between path components.

3.4 Procs

Procs are the central element of computation and algorithm in Zed. They can take zero or more input values (parameters) and can produce a result. Some parameters to them might be safe pointers ('@' to values), and so can be input, output or input/output parameters. The named parameters specified in a proc definition are the "formal parameters" of the proc. When the proc is called, those formal parameters receive values which are the "actual parameters" for that call.

The full syntax of a proc definition has several options which are not fully explained until later - they are mentioned here for completeness. The syntax of a proc definition is:

optional visibility specification
optional forced proc type and ':'
'proc'
optional 'abort', 'ctSafe', 'final', 'ctProc', 'varProc', 'ioProc', 'cTime', 'cliProc'
name of proc
proc header
':'
body of proc
'corp'

A proc header consists of:

'('
formal parameter list
')'
optional 'nonNil'
result type (often 'void')

A "forced proc type" is a proc type (see "4.14 Proc Types") which is used as the type of the proc if that proc is used as a value. Such type forcing is needed when writing procs to use with "#" operations, discussed much later, as well as for other interactions with the Zed compiler. Essentially, the "forced proc type" concept is used to state that a proc is intended to be used for that purpose. This differs from other languages where a proc can be used for a given purpose if its signature (full parameter and result specifications) happens to match a given situation. In Zed, proc types are equivalent only if the parameter names are the same. Using a "forced proc type" on a proc allows it to use different names for its parameters, which can make more sense than the standardized names in the forced type. [If a given proc is useful for multiple purposes, wrapper procs with forced proc types can be used to make it fit those purposes. The Zed compiler can be relied on to directly use the base proc in actual code, unless optimizations are disabled.]

The formal parameter list is zero or more sets of formal parameter declarations, separated by semicolons. Each set of formal parameter declarations consists of:

type specification
optional storage flags ('var', 'ro', 'volatile', 'nonNil')
zero or more formal parameter names, separated by commas

[Unlike in C and C++, all of the type is within the "type specification" - none is attached directly to a given parameter name. Also unlike C and C++, groups of parameters of a given type are separated by semicolons, not commas, and a parameter type and storage flags can apply to multiple parameters.]

See "4.3 Storage Flags" for general information on storage flags. For proc formal parameters, the flags have the following meanings:

'var' - within the proc, the formal parameter can be changed, thus assignment to it is allowed. Without the 'var', the formal parameter is 'con', and cannot change. 'con' should not be given explicitly.
'ro' - the formal parameter can be changed within the body of the proc, but if '@' of it is taken, that '@' will be an 'ro' '@'. The consequence of this is that the value of the formal parameter cannot be changed by any code not within the proc body.
'volatile' - see "4.3 Storage Flags"
'nonNil' - see "4.3 Storage Flags" - for formal parameters of tracked, '@' and pointer types, the value can never be 'nil'

Note that, by default, proc formal parameters are not writeable, i.e. they cannot be assigned to. An explicit 'var' storage flag is used to make them writeable. Most formal parameters are never written, so the 'var' serves as an indicator that a given parameter is assigned to or otherwise (possibly) modified. There is a corresponding warning when a 'var' formal is never modified.

The body of a proc is a sequence of statements, with a semicolon after each one. If the proc has a result (result type is not 'void') then an expression giving that result must appear after the final (if any) statement in the body. ['return' statements can also be used, but if the proc is not 'void', the result expression must appear.]

The body of a proc is a scope. See "5.1 Scopes".

Since the material which can appear before the name of the proc can be fairly long, it is suggested that that material, including the 'proc', appear on one input line, and that the proc header, starting with the name, appear on the next. The simplest such line is just:

    proc

but it can be considerably longer, e.g.:

    export(../GUI, ../Logic) Utils/MacroType_t: proc ctProc

This convention can reduce the annoyance of nicely formatting a long proc header.

As an example, here are traditional example procs which return no result:

    proc
    hanoi(uint n; string nonNil fromStr, usingStr, toStr)void:
        if n ~= 0 then
            hanoi(n - 1, fromStr, toStr, usingStr);
            Fmt("  Move disk from ", fromStr, " to ", toStr);
            hanoi(n - 1, usingStr, fromStr, toStr);
        fi;
    corp;

    proc
    doHanoi(uint n)void:
        Fmt("\nTowers of Hanoi for ", n, " disk",
            if n = 1 then "" else "s" fi, ":");
        hanoi(n, "left peg", "middle peg", "right peg");
    corp;

Here is an example (also recursive!) that does return a result:

    proc
    factorial(uint n)uint:
        if n < 2 then
            1
        else
            n * factorial(n - 1)
        fi
    corp;

In "factorial" there are no statements before the result expression, and that expression is itself a compound - an 'if' expression.

Procs can be predeclared, i.e. their header can be specified before their body is given. This allows procs to be called before they are defined, which is needed when procs call each other in mutually recursive fashion. A proc predeclaration consists of just the header portion of a proc declaration, with the ':', proc body and 'corp' not present. For example, a predeclaration of the above "hanoi" is:

    proc
    hanoi(uint n; string nonNil fromStr, usingStr, toStr)void;

The proc header of a predeclaration must be compatible with the proc header in the later proc definition. This includes the types and names of the parameters, the result type, and any 'nonNil' or 'nilOk' storage flags on parameters and the result.

[In the current Zed implementation, other storage flags, such as 'var' or 'volatile', which are given in the proc definition, will also appear in the proc predeclaration.]

The various flags which can be put between 'proc' and the name of the proc are discussed later. A quick summary:

'abort' - the proc does not return (often a custom abort proc)
'ctSafe' - the proc is safe to call at compile time
'final' - capsule/interface proc cannot be overridden
'ctProc' - proc runs at compile time, but is otherwise normal
'varProc' - proc takes variable arguments at compile time
'ioProc' - proc takes variable arguments with formatting info
'cTime' - proc is intended for compile-time execution
'cliProc' - proc defines a command

A proc which has 'abort' in its header must never return. All flows of execution through it must end in an 'abort' (see "6.8 Abort Statement"), or a call to some other proc with 'abort' in its header. Alternatively, such a flow can end in an infinite 'while' loop, i.e. one where the condition can be determined at compile time to be 'true'.

4 Types

Types in a programming language are the way in which meaning is given to the raw bit patterns in the running program. For example, a given bit pattern means one thing when interpreted as an unsigned integer, but means something else quite different when interpreted as a floating point number. Some types represent addresses of memory within the computer, rather than simple values.

Zed categorizes types (very roughly) into four categories:

simple types - these types are types like integers, floating point values, boolean (true/false) values, enumeration values, etc. They represent themselves and do not refer to anything else.
tracked types - these types, at the implementation level, are pointers to regions of storage in the computer. The pointers are "safe" in the sense that they will always be either 'nil' or a pointer to something valid and of the correct type. They are called "tracked" because the internal techniques of reference counting and garbage collection are used with them. Tracked types include strings, allocated vectors/matrixes, record types, interface types, capsule types, path types and 'DbType' types.
pointer types - these types are also pointers to chunks of memory, but they are not automatically managed by the system. '@' types are usually fairly short-lived and, because of the language rules governing them, they are always safe to use. Zed also allows programmers to directly use standard pointer types. These are not safe to use, and so full use of them is restricted to privileged users in Zed.
compound types - these are types representing multiple values. They are struct types, array types (which are statically sized) and union types. Because 'bits' types must fit within single "words", they are not considered to be compound types even though they have multiple sub-fields.

4.1 Basic Types

Zed has several predefined types. Their names are reserved words. They are:

'bool' - logical truth value. Allowed values are 'true' and 'false'
'char' - single character. Currently these are 8 bit values holding ASCII characters.
'uint' - unsigned integer. These are currently 64 bit values. The system checks for overflow, underflow and division by zero. [An optimizer is allowed to use a different size if it can determine that no non-privileged code can tell the difference. So, when dealing with external size specifications, always use the appropriate "bitsXX" type.]
'sint' - signed integer. These are also currently 64 bit, and the system again checks for overflow, underflow and division errors.
'float' - floating point. These are 64 bit values also. The system does not do any checking on operations using these, but it is possible to construct "signalling NaN"s and so cause exceptions on some systems.
'string' - string values are tracked values and support operations including concatenation and substring. In some situations, vectors of 'char' can be used as 'string' values - see "4.6 Matrix Types". String values contain a length indicator - they do not terminate with a NUL (0-valued) character.
'bits8', 'bits16', 'bits32', 'bits64' - these are low-level types that occupy the indicated amount of storage. They are unsigned integral values, but operations using them are not checked. These types are generally used in low-level operations such as matching Zed types to types in other languages, to protocol specifications or to hardware descriptions. These types are collectively referred to as "bitsXX" types. The behaviour of operations on values of these types follows that defined by the target processor. For example, if that processor would wrap on overflow, then that is what will happen. [Zed does not specifically disallow representations other than 2's complement, but no consideration has been given to behaviour on such a system.]
'void' - no value. This pseudo-type is used to indicate that a proc does not return a result.
'nil' - this pseudo-type is only mentioned for completeness - it is the type of 'nil' and has no other use. The reserved word 'nil' cannot be used as a type.
'poly' - this pseudo-type is used in the headers for interface and capsule method declarations. It means that a capsule that implements the method must replace the 'poly' with the capsule type. "poly" is short for "polymorphic".
'any' - this type is somewhat like '* void' (which in turn is very similar to C's "void *") except that it is limited to "tracked" types. 'any' does not require privilege to use. In order to use a value of type 'any', the true type of the referenced storage must be determined at run time and applied, e.g. by using an 'assign' construct. Note that if 'any' is renamed so that names (e.g. "#" operations) can be exported from it, no actual value will have that naming type as its run-time type. Run-time types always come from the actual allocation of values.

'autoAny' - this type is the same as 'any' except that types 'bool', 'char', 'uint', 'sint' and 'float' are compatible with it. Values of those types will be automatically wrapped in one of the following record types exported from package "Basic":

    export record Bool_t public {
        bool theBool;
    };

    export record Char_t public {
        char theChar;
    };

    export record Uint_t public {
        uint theUint;
    };

    export record Sint_t public {
        sint theSint;
    };

    export record Float_t public {
        float theFloat;
    };

[There is no 'bits128' type, even though the name is a reserved word in Zed. This is more a matter of "I don't want to do that yet" than any language problem with defining it. This type would be useful, for example, for dealing with IPv6 addresses, etc. The main difficulty is, I believe, in the native code implementation of it. Zed does not pass any multi-value items directly on proc calls, and that would be needed in order to implement 'bits128' on a 64 bit CPU. The Zed native code generator has no concept of "register pairs", and the X86-64 calling convention (the AMD and Windows forms differ) adds extra complexity in dealing with them.]
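The difference between the checked 'uint'/'sint' types and the unchecked "bitsXX" types can be modelled with Python's unbounded integers. The following sketch assumes the 64 bit sizes stated above, and it assumes a target CPU that wraps on overflow for 'bits64'. `CheckFailure` is a hypothetical stand-in for whatever run-time error Zed actually produces. The checked "factorial" mirrors the recursive proc shown earlier.

```python
# A Python model of checked 'uint'/'sint' arithmetic versus unchecked 'bits64'
# arithmetic, assuming 64 bit sizes and a target CPU that wraps on overflow.
UINT_MAX = 2**64 - 1
SINT_MIN, SINT_MAX = -(2**63), 2**63 - 1
BITS64_MASK = 2**64 - 1


class CheckFailure(ArithmeticError):
    """Hypothetical stand-in for the run-time error a failed check produces."""


def uint_checked(value: int) -> int:
    if value < 0:
        raise CheckFailure("uint underflow")
    if value > UINT_MAX:
        raise CheckFailure("uint overflow")
    return value


def sint_checked(value: int) -> int:
    if not SINT_MIN <= value <= SINT_MAX:
        raise CheckFailure("sint overflow/underflow")
    return value


def uint_div(a: int, b: int) -> int:
    if b == 0:
        raise CheckFailure("division by zero")
    return a // b


def bits64_add(a: int, b: int) -> int:
    return (a + b) & BITS64_MASK        # unchecked: wraps like the assumed target CPU


def factorial(n: int) -> int:
    """The recursive 'factorial' proc from earlier, with each multiply checked."""
    return 1 if n < 2 else uint_checked(n * factorial(n - 1))
```

Under these assumptions, factorial(20) still fits in 64 bits, but factorial(21) fails the overflow check. By contrast, bits64_add(UINT_MAX, 1) silently wraps to 0.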

4.2 Naming Types

Constructed types in Zed allow the user to create new types, such as array types, enumeration types and proc types. Some of these types will automatically have a type name, due to the syntax of defining them. Others can be given a name using the syntax:

optional visibility specification
'type'
name for new type
'='
type to be named

This can only be done at the package level, i.e. type names cannot be defined inside procs. Type names defined this way follow the usual rules for their use in other packages - it is controlled by any visibility specification given.

A named version of a type is not equivalent to the unnamed version. The two are assignment compatible, either way, and the properties of the unnamed version apply to the named version. However, if two new names are given to the same type, those two new named types are not compatible. If a named type is given another new name, the newly named type loses the properties of the original type - it becomes an abstract type which has no properties. The doubly-named type is only compatible with itself and the type it renames. An exception to this is when a name is given to a type in an instance of a generic - that named type is completely equivalent to the longer form of selecting the type from the instance (see later on instances). Only that one initial renaming is treated this way, however.

It is conventional within the Zed system to have the last two characters of any type name be "_t". This is done whether the name is introduced by this naming syntax, or is the name of an enumeration, struct, record, bits, union, capsule or interface type. It is done to make type names more visible. In Zed, types are also values, and can be used in situations where types are perhaps not expected, so having the "_t" suffix can help programmers be aware of what is going on. The predefined types have reserved words for them, and do not have the trailing "_t".

Only one level of name can be given to 'template' types - things get too complicated if multiple levels are allowed.

4.3 Storage Flags

The concept of "storage flags" or "storage attributes" in Zed is used in several places, including in some types. The meaning of the various flags changes slightly from situation to situation, but the general intent of a given flag is consistent. The storage flags are:

'inline' - this flag is usable when a struct type is used as an element of a struct, record or capsule type. The flag is applied to the field, and it means that the fields of the struct become direct fields of the enclosing compound type. For example, if the struct type has fields "str_tag" and "str_value", then the enclosing compound type will get fields of those exact names and types. This allows them to be accessed without needing the extra level of field selection. The struct field will still exist within the enclosing compound, however, and is usable as normal. The struct field remains a complete struct, i.e. it meets all of the alignment and padding requirements for the struct. Programmers should not mix references that use the direct names with references that use the two-level names - doing so is confusing to readers. This flag can be useful when structs or records are defined inside generics.
'private' - this flag says that the described field, etc. is only usable by code within the package which defined the type. Code in other packages cannot read or write the field.
'noInit' - this flag is only valid for the fields of struct, record and capsule types. For those, it means that no value is given for the field in a struct or record constructor or implicit capsule constructor. The field will be initialized to a default value or values (see below). Note that direct compound fields in records and capsules, such as fields of array or struct types, are implicitly 'noInit'. 'noInit' cannot be combined with 'nonNil'.
'con' - this flag is like the following 'ro' flag, but is stronger. It says that, once initialized during variable initialization, record construction, etc. this value can never change. This can be used to prevent all assignments to the variable, etc., and can also allow an optimizer to be sure that cached values remain valid, regardless of proc calls, etc. 'con' cannot be combined with 'volatile'. In order for a path object with 'con' to successfully reference a persistent variable, that persistent variable must be stored in read-only storage.
'var' - this flag is only usable with proc formals. It indicates that the formal is not the default 'con', but is changeable.
'ro' - this flag makes a struct, record or capsule field, or a matrix element, writeable only within the package which defines the type. Code in other packages can only read the value. When used on a package variable, this flag makes the variable writeable only by code within the same package. When used on a proc formal parameter or local variable, this flag causes any '@' of the variable to be an 'ro' '@'. This means that the formal or local is only writeable within the proc. Note that it makes no sense to have this flag together with the 'con' flag.
'volatile' - the 'volatile' flag is similar to the one in C. It means that the compiler is forced to preserve the full sequence of reads and writes to this field, variable, etc., and to preserve that order with respect to all other 'volatile' entities accessed in the thread of execution. 'volatile' is typically used on variables or fields which are shared between threads or processes, or are part of hardware entities. Note that this flag alone might not be sufficient for the desired correct operation - specific hardware situations might require other constraints. 'volatile' cannot be combined with 'con'.
'nonNil' - variables or fields which are declared as 'nonNil' cannot be assigned the value 'nil'. This flag is only relevant to tracked types, pointer types and '@' types. The flag guarantees that the value is never 'nil', and so code generators can avoid run-time checks for 'nil'. 'nonNil' cannot be combined with 'noInit' or 'nilOk'. A path object with 'nonNil' in its type is one which actually references a persistent variable or package, as opposed to just containing a path to a potential target.
'nilOk' - variables or struct fields of '@' types declared with 'nilOk' can have the value 'nil'. This is the opposite of 'nonNil', and is only needed with variables and fields of '@' types, and with variables initialized to 'nonNil' expressions, since both of these default to 'nonNil'.
'package' - this flag is only usable with '@' types. It specifies that the type is only compatible with statically allocated storage, i.e. variables at the package level.

Fields marked as 'noInit' are initialized as follows:

    bool - false
    char - "\00"
    uint - 0
    sint - 0
    float - 0.0
    bitsXX - 0
    tracked types - nil
    pointers - unspecified

Types containing storage flags, e.g. '@' and pointer types, have restrictions on how values can be assigned, based on the nature of the storage flag. This situation sometimes also applies with actual values, rather than references to values. While the details can vary, the general rules are:

in most cases the effect of 'con' is that it prevents writes and it allows optimizations by the compiler. Because of the latter, 'con' cannot be freely added. However, it is safe to assign a 'con' value to a destination that specifies only 'ro' because the writes are still prevented - the optimizations are no longer done.
assigning a non-'ro' value to an 'ro' destination is freely allowed, because no restriction is lost - one is gained. This can be useful if the programmer wants to ensure that a reference is not used to do a write. The reverse is not allowed, since that would grant write access.
assigning a non-'volatile' value to a 'volatile' destination is freely allowed since this simply disables optimizations. The reverse is not true, since that would remove the required access restrictions.
assigning a 'nonNil' value to a destination that is not marked as 'nonNil' is allowed, since that only discards the knowledge that 'nil' checks are not needed - this simply creates less efficient code. The reverse is not allowed since the compiler will not take the programmer's word that a reference cannot be nil.

For example, a tracked value that is known to be 'nonNil' can be assigned to a destination that is not marked as 'nonNil', but the reverse is not true. An '@' value whose type does not contain the 'ro' storage flag can be assigned to a variable whose type does contain 'ro', but the reverse is not true. Etc.
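C programmers may recognize the same one-way pattern in C's pointer qualifiers. C lets a plain pointer convert implicitly to a 'const'- or 'volatile'-qualified pointer. Converting back without a cast is a constraint violation that the compiler must diagnose. This is only an analogy: C's 'const' is closer to Zed's 'ro' than to 'con', and standard C has no counterpart to 'nonNil'. The function names below are illustrative.

```c
/* Like an 'ro' '@': the callee can read but not write through p. */
static unsigned read_only_view(const unsigned *p) {
    return *p;                          /* "*p = 1;" here would not compile */
}

/* Like a 'volatile' '@': every access through p is performed as written. */
static unsigned read_volatile(volatile unsigned *p) {
    return *p;
}

static unsigned qualifier_demo(void) {
    unsigned x = 7;
    unsigned *plain = &x;
    const unsigned *ro = plain;         /* allowed: a restriction is gained */
    volatile unsigned *vol = plain;     /* allowed: only disables optimizations */
    /* unsigned *back = ro;  -- diagnosed: would grant write access */
    return read_only_view(ro) + read_volatile(vol);
}
```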

4.4 "@" Types

4.4.1 "@" General

'@' types are "safe pointers" - they are constrained pointers to other values. '@' values are created by putting '@' in front of the expression that is to be referenced. To get back to that value, put '@' after an expression of '@' type.

[Note that de-'@'-ing is explicit in Zed, rather than implicit. This is quite deliberate - a main goal of Zed is that of encouraging correct programs. By making the de-'@' explicit, the programmer is kept aware that the value being dealt with actually exists somewhere else, so it might change unexpectedly, and assigning to it might have unexpected consequences. Also, the explicit de-'@' makes it clearer that there is a cost involved. This is different from the use of C++ reference values, where the de-referencing is implicit.]

Syntactically, an '@' type is declared as:

'@'
optional storage flags ('con', 'ro', 'volatile', 'nonNil' and 'package')
the type to be '@'-ed

There are two categories of '@' types. If an '@' type has the 'package' storage flag, then values of that type can only reference variables at the package level. Because package variables are set up at the start of program execution and are never destroyed (ignore for now the possibility of executing code calling Zed internal code to add variables to a package at run time), references to them are always safe. Thus, operations on 'package' '@' values are mostly unrestricted. This provision is also extended to the values of package-level matrix, record and capsule variables which are 'con', since they must be initialized at package setup time and the value they hold can never be changed.

The other category of '@' types is those without 'package'. These values can reference proc formals, local variables, fields of dynamically allocated entities, matrix elements, etc.

One easy use of such local '@' variables is to provide a sort-of "alias" for other values. In that situation, an indirect value contains multiple values within it, and several operations (perhaps including passing to a proc call) are being done on parts of it. It is cumbersome (and, if there are side effects, perhaps incorrect) to continually access the internal elements via the original expression. Defining an '@' to point at an internal element, and then using that local variable multiple times, can be both clearer and more efficient. This is especially true if the internal element is itself a struct or array and contains multiple elements which are accessed repeatedly. This use is similar to the Pascal "with" statement. See the "useUser" example proc in "4.4.3 General "@" Use".

If an '@' value is of a variable local to a proc, or of something within a dynamically allocated entity, restrictions and extra code are needed to keep all operations safe. Local variables have a "scope level" (see "5.1 Scopes"), and the rules for '@' of local variables prevent the '@' from being used after the target has gone out of scope. When '@' of something within a dynamically allocated entity is taken, the Zed compiler generates internal code which references the entity, preventing automatic storage management such as reference counting or garbage collection from freeing the entity. This makes sure that '@' values can be freely used with no other hidden costs. The limitation on this automatic protection is that the compiler will not allow an '@' into a dynamically created entity to be assigned to an '@' variable that is not directly within the current scope. [The hidden protection variable is created within the current scope, and so would disappear on exit of the scope. It would sometimes be possible to force a protection variable to be in the same scope as the variable being assigned to, but this has not been done. Note that it would be messy to describe when it is possible and when it is not.]

A 'package' '@' value can be freely used as a non-'package' '@' value (e.g. be assigned to a non-'package' '@' variable), but the reverse is not true.

Any type which can be stored in variables, fields, etc. can have an '@' type created from it, other than an '@' type which is not 'package'. In other words, you can have a double '@' type (like a double pointer type in C), so long as the inner '@' type is a 'package' '@' type.

[The syntax "@ package ..." is fairly bulky. Zed uses a prefix '@' to create an '@' value, and a postfix '@' to dereference it. However, Zed uses a prefix '&' to create a standard pointer, and a postfix '*' to dereference it. This latter is expected to be somewhat more familiar to C programmers. It would be possible to use either '&' or '*' for both pointer operations, thus freeing up the other for 'package' '@' types. Since Zed is already a context sensitive language, the resulting ambiguity of '&' after an expression can be resolved based on the type of the expression, just as is done for '*'. Doing this could create a lot of confusion for C programmers, however. It would also mask the strong relationship between 'package' '@' types and other '@' types. Advice and/or suggestions are welcome.]

Type '@ void' plays a role similar to C's "void *" (and Zed's '* void') - it is a universal '@'. Any '@' value with compatible storage flags can be assigned to an '@ void' destination. Such a value cannot then be dereferenced, however. Instead, the value is typically used by privileged code, where it is type cast (using 'pretend') into some kind of pointer and then dereferenced. Type '@ package void' is the same, except that compatible '@' values must have the 'package' storage flag. '@ ro void' can accept any '@' value at all. '@ void' can only accept values which are not 'ro'. '@ package void' can only accept values which are 'package' and not 'ro'. '@ ro package void' accepts only values which are 'package'.

Two '@' types are assignment compatible, subject to their storage flags being compatible (discussed in "6.1 Assignment Statement"), if the referenced types are the same, or one is a direct rename of the other.

4.4.2 "@" Storage Flags

The storage flags in an '@' type restrict the values which variables, etc. of the '@' type can reference. Restrictions on '@' variables, etc. themselves are done using storage flags between the full type and the name(s) of the variables, fields, etc. [A similar situation occurs in C, with the "const" and "volatile" attributes applied to pointer types.]

If an '@' type has storage flag 'con', then only '@' values of 'con' locations are valid values for it. A 'con' value is one which can never change, at least for the lifetime of the '@' value in this case.

If an '@' type has storage flag 'ro', then any otherwise acceptable '@' values are valid values for it. Values referenced through an 'ro' '@' cannot be changed through the '@', but it is possible that other code can change the value. Note that when a proc formal or local has storage flag 'ro', the proc itself is allowed to change the formal or local. When an '@' value is created from such a formal or local, the 'ro' storage flag is carried over to the '@' value and so the formal or local value cannot be changed via the '@' value, even within the proc itself.

If an '@' type has storage flag 'volatile', then fetches and stores through such an '@' value will have their order preserved with respect to all other 'volatile' references in the active thread of execution. The storage flag constrains the optimizations that a compiler is allowed to do, but might not be enough, by itself, to guarantee atomicity or sequencing.

If an '@' type has storage flag 'nonNil', then it is referencing "tracked", pointer or other '@' values, and those values will never be 'nil'. Note that this does not say anything about whether or not the '@' value itself can be 'nil' (i.e. whether it references anything or not).

Using 'nilOk' as a storage flag within an '@' type is not useful, since that is equivalent, in that context, to not specifying 'nonNil'.

The 'package' storage flag for '@' types is discussed above.

When an '@' value is de-'@'-ed, the storage flags of the resulting value or destination are those from the '@' type. When an '@' value is created, the storage flags for the resulting '@' value are taken from the value being '@'-ed.

4.4.3 General "@" Use

By default, '@' variables, fields, etc. are assumed to be 'nonNil'. This is the normal situation for many uses, and is a "safe" default. If an '@' variable, field, etc. needs to accept the value 'nil', then storage flag 'nilOk' must be applied to it. This is the reverse of tracked and pointer types which are allowed to be 'nil' by default, and are constrained if storage flag 'nonNil' is added to their declaration.

To clarify, declaration

    @ nonNil string nilOk atString1;

declares a variable "atString1" which can take on the value 'nil', and if it does not have that value, it must reference a location which always contains a non-'nil' string reference.

Declaration

    @ string atString2 := <init-value>;

declares a variable "atString2" which can never be 'nil', and so always references some location, but that location can contain 'nil', or can contain a string reference.

'@' variables, fields, etc. which are not 'nilOk' must always be initialized to some valid non-'nil' value.

'@' fields in structs must be declared 'nilOk' since struct fields are not always initialized. '@' fields in records, capsules and matrixes must be 'package' '@' types, since otherwise they could allow '@' of proc local values to live beyond the scope of the local variables they reference.

Proc formals of '@' types are fairly common - they provide a way to pass compound values (array and struct types) to procs. Zed does not allow such values to be passed directly. Similarly, '@' parameters allow procs to return multiple or compound results, by assigning through them (unless the '@' type contains 'ro' or 'con').

A proc can return a 'package' '@', but not a non-'package' '@'. The proc's result 'nonNil' attribute is used to say whether or not the proc can return 'nil' - the default of 'nonNil' for '@' variables, fields, etc. does not apply in this situation.
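For readers coming from C, '@' formals used for results correspond to out-parameters passed by address. The hypothetical `div_mod` below returns two results by assigning through its pointer arguments. In Zed, the corresponding '@' formals would default to 'nonNil'. C performs no such check, so the caller must pass valid addresses.

```c
#include <stdbool.h>

/* Returns quotient and remainder through the pointer arguments.
   Refusing b == 0 is an illustrative choice. */
static bool div_mod(unsigned a, unsigned b, unsigned *quot, unsigned *rem) {
    if (b == 0)
        return false;
    *quot = a / b;
    *rem = a % b;
    return true;
}
```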

A small '@' example:

    struct Pair_t {
        uint p_a, p_b;
    };

    record User_t {
        ...
        Pair_t us_pair;
    };

    ...

    proc
    fixPair(@ Pair_t aP)void:
        if aP@.p_a = 0 and aP@.p_b = 0 then
            ...
        fi;
    corp;

    proc
    useUser(User_t nonNil us)void:
        fixPair(@us->us_pair);
        @ Pair_t aP := @us->us_pair;
        ... several uses of "aP@" instead of "us->us_pair".
    corp;

'@' types and values are not a replacement for pointer types and values - Zed also has pointers. '@' types and values are an alternative to pointer types and values. Since '@' operations are guaranteed safe by the language and have no extra overhead (except for 'nil' checks when 'nilOk' '@' types are used), they should in general be used instead of pointers whenever they can be.

Sometimes the programmer wants to have a proc which optionally returns a value. This can be done using a 'nilOk' '@' parameter, as described above. However, it can be inconvenient to check it for 'nil' on each use, so it is useful to have a local '@' variable that is always 'nonNil', and is either a copy of the parameter if that is not 'nil', or '@' of a local variable if it is. That can be achieved this way:

    proc
    hasOptionalResult(@ uint nilOk aUParam)void:
        uint u;
        con aU :=
            if aUParam ~= nil then
                nonNil(aUParam)
            else
                @u
            fi;
        ...
        aU@ := 10;
    corp;

There are two 'nil' checks on the parameter value, but code generators can spot this case and only use one. The "more obvious" methods run into '@' scope issues.
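The same idiom carries over directly to C. In this sketch, a nullable out-parameter plays the role of the 'nilOk' '@' formal. A local pointer aims either at the caller's storage or at a local scratch variable, so the rest of the body can write through it without repeated NULL checks. The function name and the returned value are illustrative additions; the value is returned only so the behavior can be observed.

```c
#include <stddef.h>

static unsigned has_optional_result(unsigned *u_param) {
    unsigned u = 0;                                   /* scratch when caller passes NULL */
    unsigned *a_u = (u_param != NULL) ? u_param : &u; /* never NULL from here on */
    /* ... */
    *a_u = 10;
    return *a_u;
}
```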

4.4.4 Non-"package" "@" Values

Non-'package' '@' struct fields and variables can be assigned new values, but there are additional checks that the Zed compiler performs on these assignments. Variables inside procs have an associated "scope level", which shows how "deep" they are in the proc's code. Variables at the package level ("static" variables) have scope level 1. Formal parameters have scope level 2. Variables declared directly in the outermost parts of the body of a proc also have scope level 2. Variables declared in a scope inside that top level have a scope level of 3, and so on. An '@' field or variable can never have as value the '@' of something that has a higher scope level than itself. As a matter of style, not using storage flag 'var' with such formal parameters can be a valuable hint that the parameter always references the same storage.

For example:

    proc
    test(@ uint aUp)void:
        uint u1;
        @ uint aU1 := @u1;      // allowed
        @ uint aU2 := aUp;      // allowed
        if aUp@ = 0 then
            uint u2;
            @ uint aU3 := @u2;  // allowed
            @ uint aU4 := aU2;  // allowed
            @ uint aU5 := @u1;  // allowed
            aU1 := @u2;         // not allowed
            aU2 := aU3;         // not allowed
            aU1 := aU5;         // not allowed
            aU5 := aUp;         // allowed
        fi;
    corp;
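The rule in this example reduces to a comparison of two scope levels. The following C sketch models the rule as stated above; it is not taken from the Zed compiler. Each '@' value is given the scope level it may refer to. For "@x", that is the scope level of x. For a copy of an '@' variable, it is that variable's own scope level, whatever it currently holds, as the rejected "aU1 := aU5" line suggests. The table encodes the assignments and initializations from "test".

```c
#include <stdbool.h>
#include <stddef.h>

/* A destination must not outlive anything the value may reference. */
static bool at_assign_allowed(int dest_level, int value_level) {
    return value_level <= dest_level;
}

typedef struct {
    const char *text;   /* the assignment from the example */
    int dest_level;     /* scope level of the '@' variable assigned to */
    int value_level;    /* scope level the assigned '@' value may reference */
    bool expected;      /* what the example says */
} Case;

static const Case cases[] = {
    { "aU1 := @u1", 2, 2, true  },
    { "aU2 := aUp", 2, 2, true  },
    { "aU3 := @u2", 3, 3, true  },
    { "aU4 := aU2", 3, 2, true  },
    { "aU5 := @u1", 3, 2, true  },
    { "aU1 := @u2", 2, 3, false },
    { "aU2 := aU3", 2, 3, false },
    { "aU1 := aU5", 2, 3, false },  /* aU5 holds @u1, but is itself level 3 */
    { "aU5 := aUp", 3, 2, true  },
};

/* Number of example lines the model gets wrong. */
static int count_mismatches(void) {
    int bad = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++)
        if (at_assign_allowed(cases[i].dest_level, cases[i].value_level) != cases[i].expected)
            bad++;
    return bad;
}
```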

If '@' of a struct type, which itself has an '@' field, is passed to a proc, the proc is not allowed to assign anything other than 'nil' to that field. This is because it is in general not possible to know the scope level of the referenced struct.

The following example shows a use of '@' values inside a struct:

    struct StackList_t {
        @ StackList_t nilOk sl_next;
        uint sl_this;
    };

    proc
    getThisValue()uint:
        ...
    corp;

    proc
    handleOneItem(@ StackList_t sl)void:
        ...
    corp;

    proc
    processNew(@ StackList_t nilOk slCaller)void:
        StackList_t slNew := {slCaller, getThisValue()};
        if slNew.sl_this = 0 then
            for sl from @slNew then sl@.sl_next do
                handleOneItem(sl);
            od;
        else
            processNew(@slNew);
        fi;
    corp;

In this example, a recursive proc, "processNew" is building a linked list of "Stack" state frames as it does its work. If the new "sl_this" value happens to be 0, then it will loop up the stack of state frames, performing "handleOneItem" on each one. This type of linked list could be maintained using a record type, but it is more efficient to use '@' pointers instead of full tracked pointers. This is because no additional allocation/deallocation is required, and in a situation where reference counting and/or garbage collection are needed, the '@' values do not need to be considered.
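For comparison, the same structure can be written in C. In this sketch, each recursive call keeps its node in its own stack frame, linked to the caller's node, with no heap allocation. In C, nothing stops such a pointer from escaping its frame; Zed's scope-level rules are what make the '@' version safe. The stand-ins for "getThisValue" and "handleOneItem" (a fixed input sequence and a running sum) are hypothetical.

```c
#include <stddef.h>

typedef struct StackList {
    const struct StackList *sl_next;  /* caller's node, or NULL in the outermost call */
    unsigned sl_this;
} StackList;

static const unsigned *next_input;   /* hypothetical input for get_this_value */
static unsigned handled_sum;         /* hypothetical effect of handle_one_item */

static unsigned get_this_value(void) { return *next_input++; }
static void handle_one_item(const StackList *sl) { handled_sum += sl->sl_this; }

static void process_new(const StackList *sl_caller) {
    StackList sl_new = { sl_caller, get_this_value() };  /* lives in this frame */
    if (sl_new.sl_this == 0) {
        /* Walk back through the node of every active call. */
        for (const StackList *sl = &sl_new; sl != NULL; sl = sl->sl_next)
            handle_one_item(sl);
    } else {
        process_new(&sl_new);
    }
}
```

With input 3, 5, 7, 0, the innermost call visits all four nodes, so the sum is 15.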

It is possible to take the '@' of a struct/array which contains fields or elements which are not fully accessible in the current context. Such fields could be 'private' or just 'ro'. This operation is allowed, and the storage flags associated with the new '@' value will as usual represent the storage flags of the struct/array as a whole. However, because of the type of the struct/array, access via the new '@' might not be allowed. This property is used in the Zed compiler, to have private structs that it can safely use for its ongoing internal state, even though the structs themselves are local variables in arbitrary calling code.

4.4.5 "package" "@" Values

This example shows the use of 'package' '@' values to maintain a sorted singly-linked list:

    package Test;

    use /Sys;
    import /Fmt;

    struct ValueList_t {
        @ package ValueList_t nilOk vl_next;
        uint vl_val;
    };

    uint COUNT = 10;

    [COUNT] ValueList_t ValueLists;

    @ package ValueList_t nilOk FreeListHead, Sorted;

    proc
    initValues()void:
        for i from 0 upto COUNT - 2 do
            ValueLists[i].vl_next := @ValueLists[i + 1];
        od;
        ValueLists[COUNT - 1].vl_next := nil;
        FreeListHead := @ValueLists[0];
        Sorted := nil;
    corp;

    proc
    alloc()nonNil @ package ValueList_t:
        if assign @ package ValueList_t vl := FreeListHead then
            FreeListHead := vl@.vl_next;
            vl
        else
            abort "No more free ValueList_t slots";
        fi
    corp;

    proc
    free(@ package ValueList_t nonNil vl)void:
        vl@.vl_next := FreeListHead;
        FreeListHead := vl;
    corp;

    proc
    insert(uint val)void:
        con vlNew := alloc();
        /* Note that "aPrev" itself is not 'nilOk' - it is never 'nil'. */
        @ nilOk @ package ValueList_t aPrev := @Sorted;
        while
            if assign @ package ValueList_t vl := aPrev@ then
                if vl@.vl_val < val then
                    aPrev := @vl@.vl_next;
                    true
                else
                    false
                fi
            else
                false
            fi
        do
        od;
        vlNew@.vl_next := aPrev@;
        vlNew@.vl_val := val;
        aPrev@ := vlNew;
    corp;

    proc
    delete(uint val)bool:
        @ nilOk @ package ValueList_t aPrev := @Sorted;
        while
            if assign @ package ValueList_t vl := aPrev@ then
                if vl@.vl_val = val then
                    aPrev@ := vl@.vl_next;
                    free(vl);
                    return true;
                fi;
                aPrev := @vl@.vl_next;
                true
            else
                false
            fi
        do
        od;
        false
    corp;

    proc
    doDelete(uint val)void:
        if delete(val) then
            Fmt("Deleted '", val, "'");
        else
            Fmt("Could not delete '", val, "'");
        fi;
    corp;

    proc
    showValues(string nonNil header)void:
        Fmt("\n", header, ":");
        for vl from Sorted then vl@.vl_next do
            Fmt(vl@.vl_val);
        od;
    corp;

    export proc
    testValues()void:
        initValues();
        for i from 1 upto COUNT do
            uint val := i * 10 % 3 + 1;
            insert(val * 1000000 + val * 1000 + val);
        od;
        showValues("Before");
        doDelete(1001001);
        doDelete(1001001);
        doDelete(1001001);
        doDelete(1001001);
        doDelete(3003003);
        doDelete(3003003);
        doDelete(3003003);
        doDelete(3003003);
        showValues("After");
    corp;

Aside from some syntactic differences, this is essentially the same code as could be used in a C program doing the same thing. The run-time cost should be very close as well, since use of 'package' '@' values does not incur any memory management overhead. In this version, constant "COUNT" sets the size of the array of structures. As mentioned earlier, the same package-level scope is granted to the values of package-level 'con' matrix variables. Using such a matrix instead of the array would allow the size of the set of structures to be determined at run time, subject to the limitation that the size must be computed before the package is initialized.

In the above example, explicit typed declarations have been used. This is done to show the exact type needed in this example. Most programmers would use 'con' or 'var' declarations to avoid having to figure out the exact type needed for "aPrev" in "insert" and "delete". Note also that the 'else' branch in "alloc" does not yield a value. This is because of the 'abort' statement - the compiler knows that nothing directly after an 'abort' can ever be executed, and so does not require a value. In fact, putting a value there will provoke an "unreachable" warning.
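For comparison with the C program mentioned above, here is a rough C translation of the "Test" package. The array, free list and sorted list are all static, matching the package-level storage that 'package' '@' values must reference. Here, abort() stands in for Zed's 'abort' statement. The pointer-to-pointer "a_prev" plays the role of the '@ nilOk @ package' variable "aPrev". Unlike the Zed version, nothing in C checks that these pointers only reference static storage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define COUNT 10

typedef struct ValueList {
    struct ValueList *vl_next;
    unsigned vl_val;
} ValueList;

static ValueList value_lists[COUNT];
static ValueList *free_list_head, *sorted;

static void init_values(void) {
    for (size_t i = 0; i + 1 < COUNT; i++)
        value_lists[i].vl_next = &value_lists[i + 1];
    value_lists[COUNT - 1].vl_next = NULL;
    free_list_head = &value_lists[0];
    sorted = NULL;
}

static ValueList *alloc_value(void) {
    ValueList *vl = free_list_head;
    if (vl == NULL) {
        fputs("No more free ValueList slots\n", stderr);
        abort();
    }
    free_list_head = vl->vl_next;
    return vl;
}

static void free_value(ValueList *vl) {
    vl->vl_next = free_list_head;
    free_list_head = vl;
}

/* a_prev points at the link to update: first the head, then each vl_next. */
static void insert(unsigned val) {
    ValueList *vl_new = alloc_value();
    ValueList **a_prev = &sorted;
    while (*a_prev != NULL && (*a_prev)->vl_val < val)
        a_prev = &(*a_prev)->vl_next;
    vl_new->vl_next = *a_prev;
    vl_new->vl_val = val;
    *a_prev = vl_new;
}

static bool delete_value(unsigned val) {
    for (ValueList **a_prev = &sorted; *a_prev != NULL; a_prev = &(*a_prev)->vl_next) {
        ValueList *vl = *a_prev;
        if (vl->vl_val == val) {
            *a_prev = vl->vl_next;   /* unlink, then return the slot */
            free_value(vl);
            return true;
        }
    }
    return false;
}
```

As in "testValues", the loop inserts 1001001 three times, 2002002 four times and 3003003 three times. The fourth delete of each of 1001001 and 3003003 therefore fails.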

4.5 Array Types

Arrays in Zed are similar to those in many other programming languages. The bounds of an array are fixed - see matrix types for an alternative. Array types can have up to an arbitrarily chosen limit of 256 dimensions. The valid index values for a given array bound range from 0 up to 1 less than the specified bound - Zed does not support arrays with offset indexes.

The syntax for an array type is:

'['
one or more bounds, separated by commas
']'
the element type, i.e. the type of each array element

An array bound can be:

a constant expression of type 'uint', 'char', 'bits8', 'bits16', 'bits32' or 'bits64'
a constant expression of an 'enum', 'oneof' or record selector type
'bool' or 'char'
the name of an 'enum', 'oneof' or record selector type
'*'

If the array bound is given as 'bool', then the numeric bound is 2. If the array bound is given as 'char', then the numeric bound is 256. If the array bound is given as an 'enum' or record selector type, then the numeric bound is the number of elements in the enumeration. If the array bound is given as a 'oneof' type, then the numeric bound is one greater than the largest value in the 'oneof'.

If a '*' bound is used, it must be the only bound. This can only be used with an initialized array, where the actual bound value will be the number of initializers given. That actual bound can then be determined using "8.1 "getBound"".

Example array types:

    [3] uint

    [8, 8] ChessPiece_t

    [char] float

    [MAX_CLUSTER][ROW_SIZE + 2][2] MyThingType_t

The array syntax is applied 3 times in this last array type. This type can be read as "array of MAX_CLUSTER arrays of ROW_SIZE + 2 arrays of 2 MyThingType_t's".

Bound expressions in an array type must be known at compile time. In Zed, this is more permissive than in many other languages. Zed allows literals, named constants, enumeration, 'oneof' and variant record tags, the results of compile time proc calls, expressions involving those, and conditional and case expressions whose condition/index is determinable at compile time.

Indexing is done using square brackets and the index expressions after the array expression. The indexes are given in the same order as the bounds appear in the array type. Arrays in Zed are normally stored in row-major order, meaning that as you look through the memory used for an array, it is the last index that varies most quickly. An optimizer is allowed to change this. [It seems unlikely that any such optimizer will be created. What I am doing here is giving programmers who wish to do their own optimization the information they need to do so, while at the same time not disallowing an optimizer. Wording it this way allows the compiler to ignore what privileged programmers might try to do with pointers into an array.]
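The normal row-major layout can be expressed as an offset formula. This C sketch relies on C arrays also being row-major. It computes the flat element offset of element [i, j, k] in an array with bounds [D0, D1, D2]; the first bound, D0, does not appear in the formula. Since Zed permits an optimizer to change the layout, the formula describes only the normal case. The function name is hypothetical.

```c
#include <stddef.h>

/* Offset of element [i, j, k], counted in elements from the start. */
static size_t row_major_offset3(size_t d1, size_t d2, size_t i, size_t j, size_t k) {
    return (i * d1 + j) * d2 + k;   /* k, the last index, varies fastest */
}
```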

Index expressions are checked against the bounds, either at compile time or at run time. Such expressions can be of any of the types listed for bound expressions. When an array bound is given as a type, indexing expressions for that bound must be of that type. For example, if an array bound is given as 'bool', only 'false', 'true' or expressions of type 'bool' can be used to index the array in that bound.

'false' has value 0 as an index. 'true' has value 1 as an index. 'enum' elements have index values of 0 for the first element in the 'enum' element list, and increasing by 1 for each subsequent element. Record selector elements work the same as 'enum' elements. 'char' values when used as indexes have values corresponding to the ASCII value of the character. For example, the character "a" has value 97 (0x61).
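In C terms, a '[char] float' array and an array with a 'bool' bound behave roughly like the arrays in this sketch. A 'char' bound gives 256 elements indexed by the character code, and a 'bool' bound gives 2 elements, with false at index 0 and true at index 1. The cast to unsigned char keeps codes above 127 non-negative in C, and the names are illustrative.

```c
#include <stdbool.h>

static float weights[256];   /* like '[char] float' */
static unsigned seen[2];     /* like an array with a 'bool' bound */

static void record_char(char c, bool flag) {
    weights[(unsigned char)c] += 1.0f;   /* 'a' lands at index 97 in ASCII */
    seen[flag]++;                        /* false -> 0, true -> 1 */
}
```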

Here is a simple example of array use:

    uint M = 10, N = 5, P = 7;

    proc
    arrayMul()void:
        [M, N] float a;
        [N, P] float b;
        [M, P] float res;

        /* Initialize "a" and "b". */
        ...

        /* Now multiply them, yielding "res". */
        for i from 0 upto M - 1 do
            for j from 0 upto P - 1 do
                float sum := 0.0;
                for k from 0 upto N - 1 do
                    sum := sum + a[i, k] * b[k, j];
                od;
                res[i, j] := sum;
            od;
        od;
    corp;

Two array types are only equivalent if they have the same dimensionality (number of bounds), they have the same (or the same through one rename) element type and all of their bound expressions/types are the same. Having the same value of the bound is not enough - the expressions must be equivalent. [At the moment, the compiler only accepts literals, named constants and generic 'uint' parameters when comparing bound expressions - anything more complex makes the bounds not equivalent. This is done so that bounds are never treated as equivalent when they might not be. For example, if one expression depends on a named constant imported from some other package, changing the value in that other package will change the array bound, making it different. This kind of thing can be hard to deal with in a system which tries not to recompile everything when minor changes occur.]

Zed supports array assignment - the compiler will internally generate appropriate copying code as needed. [The programmer is warned that array assignment can be expensive in terms of needed CPU resources. In particular, if the array elements are or contain tracked values, the compiler might need to generate nested loops and individual assignments to properly handle them.]

Array element types cannot be 'void', 'nil' or 'poly', and cannot contain an '@' generic type parameter. Note also that there is no provision for storage flags on array element types, but the storage flags of the array variable or field itself extend to the elements of the array.

If an array type has been given a name, "constructors" of that type can be used. Array constructors are actually just hidden array variables which are initialized with the elements given in the "constructor". The elements can be arbitrary expressions whose type is compatible with the array element type. Array constructors can only be used inside procs. Array initializers, which can be used outside of procs, are described in "5.6 Array and Struct Initializers". The syntax of an array constructor is:

name of array type (might be a path)
'('
list of values, separated by commas
')'

The syntax of non-scalar individual array element values is shown later, in "5.6 Array and Struct Initializers". An array constructor yields a 'con' '@' of the array type. This means that assignment of an array constructor to an array field or variable needs to put an '@' after the constructor, which looks strange. However, doing it this way makes it easier to have conditional "array values". 'con' is valid here since there is no non-privileged way in Zed to change any of the values in the hidden constructed array. Since an 'ro' '@' can receive a 'con' '@', many programmers will use an 'ro' '@', not wanting to get into the semantics of 'con' values.

The order of evaluation of expressions used as array bounds, as array indexes and in array constructors is not specified. If the evaluation order matters, assign the final values to constants/variables before the declaration/indexing/constructor, and then use those names to represent the evaluated values. When indexing into an array of arrays, the order of evaluation of all of the index expressions is not defined. [In all of these cases, if a code generator is trying to generate efficient code, the actual order may result from simple algorithmic choices.]
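
C has the same property: in "grid[next_index()][next_index()]" the order of the two calls is unspecified. The following C sketch (with a hypothetical side-effecting "next_index") shows the recommended workaround of assigning the values to named locals first, which fixes the order.

```c
#include <assert.h>

static int next_value = 0;

/* Hypothetical index source with a side effect: each call yields a new value. */
static int next_index(void)
{
    return next_value++;
}

/* Writing grid[next_index()][next_index()] would leave the order unspecified.
   Evaluating into named locals first makes the order explicit. */
static void store_in_order(int grid[2][2], int value)
{
    int row = next_index();   /* evaluated first */
    int col = next_index();   /* evaluated second */
    assert(row < 2 && col < 2);
    grid[row][col] = value;
}
```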

The "8.1 "getBound"" construct can be used with array values. This is typically only needed when '*' bounds are used - again, see "5.6 Array and Struct Initializers" for details.

4.6 Matrix Types

Matrix types in Zed work much like array types. The difference is that all matrix values must be dynamically allocated at run time (or initialization time) using the 'matrix' construct described in "7.3 Matrix Operations". Matrix types include nothing for actual bound values, since the bounds are determined when an actual matrix is created. Using matrix values is syntactically the same as using array values. Matrix values are tracked values. Note that assigning matrix values does not copy the matrix contents - only the matrix reference is copied. Matrix variables, fields, etc. can have value 'nil'. Matrix types with only one dimension are often called "vectors".

The syntax for a matrix type is:

'['
zero or more commas, indicating the number of dimensions
']'
optional storage flags ('private', 'ro', 'volatile')
the element type, i.e. the type of each matrix element

The 'getBound' construct is used with matrix values to determine the actual bounds that the values were created with. 'getBound' yields a 'uint' value, unless it is applied to a persistent vector (see "22.2 Persistent Vectors"). This allows code to handle any size matrix without having to be told the size.

Example matrix types:

    [] uint

    [,,] volatile Cell_t

    [] ro [,] Descriptor_t

    [,] [10] DataObject_t

The last matrix type above has length 10 arrays as its elements.

As with array operations, all matrix operations are checked. An additional check with matrix values is a check against 'nil'. The array multiplication example above is more useful when used on matrixes:

    proc
    matrixMul([,] float nonNil a, b)nonNil [,] float:
        if getBound(a, 1) ~= getBound(b, 0) then
            abort "matrixMul: bounds not compatible";
        fi;
        [,] float con nonNil res := matrix([getBound(a, 0), getBound(b, 1)] float);
        for i from 0 upto getBound(a, 0) - 1 do
            for j from 0 upto getBound(b, 1) - 1 do
                float sum := 0.0;
                for k from 0 upto getBound(a, 1) - 1 do
                    sum := sum + a[i, k] * b[k, j];
                od;
                res[i, j] := sum;
            od;
        od;
        res
    corp;

[With code like this which has straightforward use of index values derived from 'getBound', it is expected that the Zed compiler will be smart enough to tell code generators that run time bounds checks are not needed. Such smarts have not yet been implemented, however.]
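
For readers more familiar with C, the following sketch models what a Zed "[,] float" matrix provides: a heap allocation that carries its own bounds, a "get_bound" analogue of 'getBound', and checked element access. It is an illustration under stated assumptions, not how the Zed runtime represents matrixes. The names are hypothetical, Zed's 'abort' is modelled with the C "abort" function, and because C has no tracked values the matrixes must be freed explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* A 2-D matrix that carries its own bounds, like a Zed "[,] float". */
typedef struct {
    size_t bound[2];   /* bound[0] = rows, bound[1] = columns */
    double *elems;     /* row-major element storage */
} Matrix;

/* Analogue of getBound: includes the nil check Zed also performs. */
static size_t get_bound(const Matrix *m, unsigned dim)
{
    assert(m != NULL && dim < 2);
    return m->bound[dim];
}

/* Checked element access, standing in for Zed's automatic bounds checks. */
static double *elem(const Matrix *m, size_t i, size_t j)
{
    assert(m != NULL && i < m->bound[0] && j < m->bound[1]);
    return &m->elems[i * m->bound[1] + j];
}

static Matrix *matrix_new(size_t rows, size_t cols)
{
    Matrix *m = malloc(sizeof *m);
    if (m == NULL) abort();
    m->bound[0] = rows;
    m->bound[1] = cols;
    m->elems = calloc(rows * cols ? rows * cols : 1, sizeof *m->elems);
    if (m->elems == NULL) abort();
    return m;
}

static void matrix_free(Matrix *m)
{
    if (m != NULL) {
        free(m->elems);
        free(m);
    }
}

/* Same algorithm as matrixMul above. */
static Matrix *matrix_mul(const Matrix *a, const Matrix *b)
{
    if (get_bound(a, 1) != get_bound(b, 0)) {
        fprintf(stderr, "matrixMul: bounds not compatible\n");
        abort();
    }
    Matrix *res = matrix_new(get_bound(a, 0), get_bound(b, 1));
    for (size_t i = 0; i < get_bound(a, 0); i++) {
        for (size_t j = 0; j < get_bound(b, 1); j++) {
            double sum = 0.0;
            for (size_t k = 0; k < get_bound(a, 1); k++) {
                sum += *elem(a, i, k) * *elem(b, k, j);
            }
            *elem(res, i, j) = sum;
        }
    }
    return res;
}
```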

Matrix element types cannot be 'void', 'nil' or 'poly', cannot be of a non-'package' '@' type and cannot contain an '@' generic parameter type. [This is slightly more restrictive than with array element types, because matrixes are dynamically allocated, and so are not constrained by scopes. It might be possible to have the compiler track the scope level of matrixes with non-'package' '@' values like it does '@' values themselves, but I have not looked into this.]

Storage flags 'private', 'ro' and 'volatile' can be used with matrix types. Note that they refer to the elements of the matrix, and not to the matrix reference itself - storage flags for that appear in declarations of matrix variables, fields, parameters, etc. 'volatile' has the usual meaning, but the others are unusual. Flag 'ro' means that the matrix elements cannot be changed through this reference to the matrix. This is true regardless of which package the code in question is in. Flag 'private' operates more like 'ro' usually operates - the elements of the matrix can only be changed by code within the same package as that in which the matrix type was created.

Normally, matrix types are compatible if they have the same (or the same through one rename) element type and the same number of dimensions. In Zed, neither the 'volatile' flag nor the 'ro' flag can be removed via assignment, parameter passing, etc. For example, a value of type "[,] uint" can be assigned to a variable of type "[,] ro uint", but not vice versa. However, if a variable, field, etc. of type "[,] private uint" is declared and used in one package, it is a different type than such a matrix type used in another package. This is true even though matrix types do not need to be named.

A 'private' matrix is compatible with an 'ro' matrix type, even in the same package. The reverse is not true, however. Also, a 'private' matrix type is compatible with a matrix type without either 'private' or 'ro', so long as the compatibility being tested is within the same package as the 'private' matrix type was created in.

In other words, a package can define its own data structures and use 'private' with its matrix types. It can read and write the matrix elements freely. Once such a matrix leaves code in that package, however, it can only be referred to via an 'ro' matrix type, and thus code outside of the package cannot modify the matrix elements. Any package can remove write access to matrix elements by assigning the matrix reference to a variable of an 'ro' matrix type, and referencing via that variable.

Zed allows one small bit of extra flexibility when dealing with matrix (and pointer) types - differences in storage flags can be handled at the same time as one type name is skipped. Normally, if one had a value of, for example, type "[] Stuff_t", and wanted to assign it to a variable of type "StuffVec_t" defined as "type StuffVec_t = [] private Stuff_t", it would take two steps - one to handle the unnamed-to-named change, and one to handle adding the 'private' storage flag. For convenience, Zed allows these two steps to be done together. [The net result of this is typically that you can declare the required temporary variable using 'con' instead of explicitly specifying the matrix type (with the storage flag).]

One other special compatibility rule involves matrixes. A value of type "[] char" (vector of 'char') can be used in situations where a 'string' is required - as assignment sources, actual parameters on proc calls, substring subjects, etc. In such uses, the data in the vector is copied to create the needed string - later modification of the contents of the vector will not affect the string. 'char' vectors can also be used as arguments to concatenation, with the result being of type 'string'. This can be useful when a 'string' value needs to be constructed one character at a time (or in a lot of chunks). For example:

    con lower := matrix([26] char), upper := matrix([26] char);
    for i from 0 upto 26 - 1 do
        lower[i] := i + "a";
        upper[i] := i + "A";
    od;
    con alpha := lower + upper;

"alpha" now contains a lower-case alphabet followed by an upper-case alphabet, and is of type 'string'. Note that Zed strings can contain the NUL character and other unprintable characters, so strings created from char vectors can contain such characters without there being an explicit escape in a string literal for them.
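
A rough C equivalent of the alphabet example builds two character buffers one element at a time and then copies them into a newly allocated string, mirroring the copy Zed makes when a char vector is used as a 'string'. This sketch assumes ASCII, where the letters are contiguous, and the function name is hypothetical. Unlike Zed strings, C strings are NUL-terminated, so they cannot hold an embedded NUL the way a Zed string can.

```c
#include <stdlib.h>
#include <string.h>

/* Build "a".."z" then "A".."Z" one character at a time (assumes ASCII),
   then copy both buffers into a new string, as Zed copies char vectors. */
static char *make_alphabet(void)
{
    char lower[26], upper[26];
    for (int i = 0; i < 26; i++) {
        lower[i] = (char)('a' + i);
        upper[i] = (char)('A' + i);
    }
    char *alpha = malloc(26 + 26 + 1);   /* +1 for C's terminating NUL */
    if (alpha == NULL) abort();
    memcpy(alpha, lower, 26);
    memcpy(alpha + 26, upper, 26);
    alpha[52] = '\0';
    return alpha;
}
```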

Matrix types cannot have either 'nonNil' or 'nilOk' storage flags. This is because there is no way in Zed to initialize the matrix elements on matrix creation, other than via a vector constructor, as described below, and thus they are inherently 'nilOk'.

[When adding 'nonNil' to code, it is annoying that array and matrix elements cannot be declared 'nonNil'. Both array and vector constructors can be used, but semantically they consist of the declaration and creation followed by a series of assignment statements. Because of that, it is currently not allowed to use 'nonNil' with either vector types or array variables, etc.

One possibility for matrixes is to provide an optional initializer of some kind in the matrix constructor. That could either be a value that is given to all elements before the new matrix is available, or perhaps could be a proc which yields the required values, based on index parameters.

If matrix constructors had a way to initialize the matrix elements, what would it look like? It could be a single fixed value. It could be a single proc which is called repeatedly with all possible index combinations, and yields values to store at those positions. It could also be a capsule value of a capsule which implements an interface specified for this purpose. This latter would allow additional data to be passed in to the proc which yields the value for the elements.]

Named vector types (one-dimensional matrix types) can have constructors. Such a constructor consists of:

name of vector type (might be a path)
'('
zero or more expressions separated by commas
')'

The value created is 'nonNil', and can be used like any other matrix value of that type. Vector constructor example:

    type UintVec_t = [] uint;

    proc
    test()void:
        UintVec_t nonNil uiv1 := UintVec_t(1, 2, 3, 4, 5, 6, 7, 8, 9);
        showUintVec(uiv1);
        showUintVec(UintVec_t(99, 98, 97));
    corp;

As with array bounds, indexing and constructors, the order of evaluation of expressions used in matrix constructors, matrix indexing and vector constructors is not specified.

4.7 Enumeration Types

An enumeration type lists a set of names which are the elements of the enumeration. The first name in the list is given value 0, and successive names are given values 1 greater than the previous name. Unlike in C, enumeration elements cannot be given explicit values - see "4.8 Oneof Types" for that ability. The syntax of an enumeration type definition is:

optional visibility specification
'enum'
name for new enumeration type
'{'
list of names separated by commas
'}'

As with other symbols defined at the package level, enumeration type definitions can be preceded by a visibility specification. If the enumeration type is available in a given context, then all of its names are available.

Example enumeration types and uses:

    enum Monochrome_t {
        mono_black,
        mono_white
    };

    export enum TVColour_t {
        tvc_red,
        tvc_green,
        tvc_blue,
    };

    proc
    doSomething()void:
        for tvc from tvc_red upto tvc_blue do
            ...
        od;
        TVColour_t tvc := ...
        if tvc ~= tvc_red then
            ...
        fi;
        uint distance := tvc - tvc_red;
        if tvc < tvc_blue then
            tvc := tvc + 1;
        fi;
    corp;

Enumeration values will always be one of the elements of the enumeration. If they are not explicitly initialized, they will be implicitly initialized to the first one (value 0). Addition and subtraction of enumeration values is allowed, but only within a single enumeration type. Such arithmetic is checked, at compile time or run time, for possible underflow or overflow conditions. Enumeration values can be used to index arrays and matrixes, can be used as counting 'for' loop init and limit, and can be used as a selector in 'case' constructs. Enumeration types cannot be used as 'bits' fields, since entire 'bits' variables can be assigned from 'uint' values, and that could yield an out of range enumeration value. Since 'bits' types are typically used in situations where execution efficiency is important, a run-time test of the resulting value would not be appropriate.

The memory and alignment needed for an enumeration value depends on the number of names in the enumeration type. Enumerations with 256 or fewer names fit in one byte, those with 65536 or fewer fit in 2 bytes (and so require 2-byte alignment), and others fit in 4 bytes and require 4-byte alignment. [The Zed system supports 64 bit enumeration types, but parsing such a type would take so long that they are not going to happen, at least not until the Zed compiler generates efficient native code, and CPUs manage to become a lot faster than they are now. Note that the Zed compiler does not attempt to parallelize the parsing of an enumeration type, so multiple threads will not help. It would also take a lot of memory to hold the more than 4 billion unique names.]
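
The size rule can be written as a small function. This is a C sketch of the rule as stated above (with a hypothetical function name), not code from the Zed compiler; the alignment equals the returned size.

```c
#include <stdint.h>

/* Storage bytes (and alignment) for a Zed enumeration with the given
   number of names, per the rule above. */
static unsigned enum_storage_bytes(uint64_t name_count)
{
    if (name_count <= 256u)   return 1;
    if (name_count <= 65536u) return 2;
    return 4;
}
```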

In Zed code, it is conventional to have a prefix on the front of the names of the enumeration elements, which relates to the name of the enumeration type. This is an aid to readers and maintainers of Zed code - it is a reminder of what the tag is part of. Some tag names can logically be part of several enumeration types within a program, so having a "taglet" on the front of them can help the reader recognize where a particular tag is coming from. If code maintainers do not fully understand the code they are maintaining, they are more likely to introduce bugs than if they do understand the bit of code they are working with. This is not as bad a problem as in C, where enum tags are not strongly typed. The same convention of taglets is also used with oneof, struct, record, bits, union and capsule types. Note that it is not always possible to have unique short taglets for all types in use - in those cases it is hoped that context is enough for the reader. The Zed compiler requires that all enumeration and 'oneof' tags be unique within a package, since they can be used on their own, without context.

4.8 Oneof Types

'oneof' types in Zed are similar to enumeration types in that they consist of a number of names of constants of that type. However, in 'oneof' types, names can be given explicit values by the programmer. No two names are allowed to be given the same value, but it is possible to define a constant of a 'oneof' type, which has the same effect. The syntax for a 'oneof' type declaration is:

optional visibility specification
'oneof'
name for new 'oneof' type
'{'
list of <name> definitions separated by commas
'}'

A "<name> definition" must include the name to be defined. It can also include '=' and a compile-time expression giving the value for the new name. If no '=' and expression is given, then the value given to the name will be 1 greater than the value of the previous defined name. If the name is the first in the 'oneof', it is an error to not give a value.

The size and alignment of 'oneof' values depends on the size of the largest value given to any name. A maximum value of 255 (0xff) fits in 1 byte, a maximum value of 65535 (0xffff) fits in 2 bytes with 2-byte alignment, a maximum value of 4294967295 (0xffff_ffff) fits in 4 bytes with 4-byte alignment, and larger values require 8 bytes and 8-byte alignment.
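
Unlike enumerations, 'oneof' sizing depends on the largest value, not on the number of names. A C sketch of the stated rule follows. The function name is hypothetical. For example, "MC68000_Mode_t" below (largest value 0b110) needs 1 byte and "ScaleFactor_t" (largest value 1000) needs 2 bytes.

```c
#include <stdint.h>

/* Storage bytes (and alignment) for a Zed 'oneof' type, based on the
   largest value given to any of its names. */
static unsigned oneof_storage_bytes(uint64_t max_value)
{
    if (max_value <= UINT8_MAX)  return 1;
    if (max_value <= UINT16_MAX) return 2;
    if (max_value <= UINT32_MAX) return 4;
    return 8;
}
```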

'oneof' values are assignment compatible (both ways) with 'uint' and the various "bitsXX" types. Note that value truncation can occur. 'oneof' values can be compared for size, but arithmetic on them is not allowed. 'oneof' values can be used as a selector in 'case' constructs and can be used as array or matrix bounds or indexes. They can also be used as counting 'for' loop init and limit. Since 'oneof' variables can have any value which fits in them, regardless of whether or not there is a name for that value, they can be used as fields of 'bits' types.

Example 'oneof' types and use:

    local oneof MC68000_Mode_t {
        m_dDir  = 0b000,
        m_aDir  = 0b001,
        m_indir = 0b010,
        m_inc   = 0b011,
        m_dec   = 0b100,
        m_disp  = 0b101,
        m_index = 0b110,
    };

    local oneof MC68000_MoveOp_t {
        mv_movb = 0b01,
        mv_movl = 0b10,
        mv_movw = 0b11,
    };

    export oneof ScaleFactor_t {
        sf_one = 1,
        sf_ten = 10,
        sf_hundred = 100,
        sf_thousand = 1000
    };

    proc
    getScale(uint n)ScaleFactor_t:
        if n >= 1000 then
            sf_thousand
        elif n >= 100 then
            sf_hundred
        elif n >= 10 then
            sf_ten
        else
            sf_one
        fi
    corp;

    proc
    handleValue(uint n)void:
        ScaleFactor_t sf;
        if n = 0 then
            sf := 0;
        else
            sf := getScale(n);
        fi;
        case sf
        incase sf_one:
            ...
        incase sf_ten:
            ...
        incase sf_hundred:
            ...
        incase sf_thousand:
            ...
        default:
            ...
        esac;
    corp;

    oneof Grouping_t {
        g1_zero = 0,
        g1_one,
        g1_two,
        g2_zero = 10,
        g2_one,
        g2_two
    };
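
The implicit numbering in "Grouping_t" follows the same "one greater than the previous name" rule as C enumerators, so a C enumeration with the same names gets the same values. Two differences are worth keeping in mind: C lets the first name default to 0 and allows two names to share a value, whereas Zed requires an explicit first value and forbids duplicate values.

```c
/* Same names and values as the Zed Grouping_t example. */
enum Grouping {
    g1_zero = 0,
    g1_one,       /* 1 */
    g1_two,       /* 2 */
    g2_zero = 10,
    g2_one,       /* 11 */
    g2_two        /* 12 */
};
```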

[Historically, Zed didn't have variant records. Instead, it had non-variant records and a "case-oneof" type that was just the variant part of a variant record. Then, what is now called 'oneof' was called a "set-oneof". This got cleaned up when "case-oneof" and records were merged.]

Because 'oneof' types are compatible with numeric types, there is no guarantee that a 'oneof' value is one of the named values. When arbitrary values are used as values for 'oneof' destinations, the value may be truncated (high bits removed) before the value is stored. When a 'oneof' value is a field of a 'bits' type, any values stored to the field, or used for that field in a 'bits' constructor, are checked at run time to make sure they fit in the space available. Note that because it is possible to name a constant of a 'oneof' type, the set of named values of the type can vary from place to place. Its bits size cannot change, however, since there may have already been uses of the 'oneof' type as a field of a 'bits' type.
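
The two storage behaviours can be contrasted in a C sketch. It assumes a 2-byte 'oneof' destination such as "ScaleFactor_t" for truncation, and uses a hypothetical "fits_in_field" predicate for the run-time check applied to a 'oneof' field of a 'bits' type. It only reports whether the check would pass and does not model what Zed does when it fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain 'oneof' destination: high bits are removed. Conversion to an
   unsigned 16-bit type keeps the low 16 bits. */
static uint16_t store_truncating(uint64_t value)
{
    return (uint16_t)value;
}

/* 'oneof' field of a 'bits' type: the value must fit in the field's width.
   Returns false where Zed's run-time check would fail. */
static bool fits_in_field(uint64_t value, unsigned width)
{
    assert(width >= 1 && width <= 64);
    return width == 64 || value < ((uint64_t)1 << width);
}
```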

4.9 Struct Types

A struct is simply a bunch of values which are bound together, to make handling of them easier. 'struct' types in Zed are very similar to C "structs". The main difference is that in Zed, except as described below, struct elements do not have to appear in the exact order given in the declaration - the compiler is free to move them around to conserve space. The Zed language guarantees that if all of the fields of a struct have the same alignment requirement, then the compiler will not re-order the fields, and it will not insert any padding. For example, if all fields are 'uint', pointer, 'bits64' or 'bits' types with a size of 64 bits, then no re-ordering will be done and no padding will be inserted. In Zed, the 'bits' types are intended for overlaying hardware registers, external protocols, etc, but the above rule allows carefully designed struct types to be used to map externally specified structures larger than can be handled by a single 'bits' type (i.e. larger than 64 bits in total).

The alignment requirement of a struct type is the maximum of the alignment requirements of its fields. Struct fields are allocated offsets in the struct that satisfy their alignment requirements, based on assuming the beginning of the struct is aligned to at least that alignment in actual use. See the example at the end of "4.17 Type Values".
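
The two layout rules can be checked in C, with one caveat: C never reorders fields, while Zed only promises not to when all fields share the same alignment. In the sketch below the struct names are hypothetical, and the size and offset figures assume a typical 64-bit ABI. "struct Header" has equal alignments and so no padding; "struct Mixed" takes the alignment of its most strictly aligned field, and C pads after "m_tag" where Zed might instead reorder the fields.

```c
#include <stddef.h>
#include <stdint.h>

/* All fields have the same (8-byte) alignment: no padding, order kept. */
struct Header {
    uint64_t h_magic;
    uint64_t h_length;
    uint64_t h_flags;
};

/* Mixed alignments: the struct is aligned like its strictest field.
   C inserts padding after m_tag; Zed is free to reorder instead. */
struct Mixed {
    uint8_t  m_tag;
    uint64_t m_value;
};
```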

A 'struct' type declaration consists of:

optional visibility specification
'struct'
name for new struct type
optional 'private' or 'ro'
'{'
one or more struct fields declarations, separated by semicolons
'}'

Each struct fields declaration consists of:

type of fields
optional storage flags ('inline', 'private', 'noInit', 'ro', 'volatile', 'nilOk')
list of field names separated by commas

By default, struct types are "public". If they are marked 'ro', then all of their fields are only writeable within the package which defines the type. If they are marked 'private', then none of their fields are accessible outside of the defining package. Storage flags on all fields could also accomplish both of these, but it is easier and clearer to mark the struct all at once when that is what is desired. As examples, see the various temporary structs that the Zed compiler uses for construction of types, "Exec_t"'s and procs.

Note that it is legal to take the '@' of a struct that is exported as 'private' from some other package. The resulting '@' value is not an 'ro' '@', and so if such a value is passed back into the defining package, the fields are writeable by code within that package.

Storage flags valid for 'struct' fields are:

'inline' - fields of a struct type which is used as a struct field will be inlined into the greater type. This means they can be directly named as fields of the outer struct.
'private' - if a struct field is 'private' then it cannot be accessed by code outside of the defining package
'noInit' - in a struct initializer or constructor, no value should be given for fields marked 'noInit' - such fields have default values until explicitly set.
'ro' - if a struct field is 'ro', then it cannot be written by code outside of the defining package
'volatile' - 'volatile' has the usual meaning for struct fields
'nilOk' - struct fields of '@' types must be 'nilOk'. Since by default '@' values are 'nonNil', the 'nilOk' must be given to override this.

Struct fields cannot be 'con' or 'nonNil' since they must be initialized after the declaration/creation of the variable containing the struct. For struct fields, 'ro' is pointless if 'private' is given. If 'ro' is given for the struct as a whole, then all fields are essentially 'ro', and any 'ro' given for individual fields is redundant. Similarly, if 'private' is given for the struct as a whole, any 'private' on individual fields is redundant, and any 'ro' is pointless.

Inlining one struct inside another struct is a bit like the C++ notion of inheriting the inner struct by the outer struct. Only the data fields are thus "inherited", however, and there cannot be multiple inheritance because of name clashes. The other usage model that is supported by struct inlining is that from old C code where macros were used to rename nested fields with shorter names. For example (this is C code, not Zed code):

    struct inner {
        int in_iSize;
        char in_iData[4];
    };

    struct outer {
        struct inner out_inMain;
        ...
        struct inner out_inOther;
    };

    #define out_iSize out_inMain.in_iSize
    #define out_iData out_inMain.in_iData

That practice of name-shortening was questionable in C and is similarly questionable in Zed. However, see the system "List" generics for a situation where 'inline' is useful.
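
Modern C offers a closer analogue to 'inline' than the macro trick: a C11 anonymous struct member, whose fields are named directly through the outer struct. This is a sketch for comparison only. Standard C requires the anonymous member to be untagged, so unlike Zed's 'inline' (which inlines a named struct type) the inner type here cannot be reused elsewhere.

```c
/* C11 anonymous struct member: in_iSize and in_iData are accessed
   directly through struct Outer, much like a Zed 'inline' field. */
struct Outer {
    struct {
        int  in_iSize;
        char in_iData[4];
    };                 /* no member name: fields are promoted */
    int out_count;
};
```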

If the part of the struct declaration including and within the braces is not given, then the declaration becomes a pre-declaration. Such a struct type is known to be a struct from that point on, but its size and alignment are not known, and none of its fields can be accessed. This means that variables, fields, etc. of the struct type cannot be declared. What can be done with the struct type is to declare procs, types, fields, etc. that use pointers to the struct or '@''s of the struct. Struct pre-declarations are useful when struct types reference each other via pointers or '@'s.
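
C forward declarations behave the same way: until a struct's definition appears, only pointers to it can be declared. In this sketch two mutually referencing struct types, with hypothetical names, are tied together through pointers.

```c
#include <stddef.h>

/* Pre-declarations: only pointers to these types are usable until
   their definitions appear. */
struct Employee;
struct Department;

struct Department {
    const char      *d_name;
    struct Employee *d_manager;   /* points to a not-yet-defined struct */
};

struct Employee {
    const char        *e_name;
    struct Department *e_dept;
};
```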

The field names of a struct type are not independently visible - they are only visible when attempting to access a field of the struct. Because of this, the field names of struct types do not conflict, i.e. multiple struct types can have fields of the same name. All names within a given struct type must be unique, however, and this includes when inner struct types are made 'inline'.

Each field of a struct is like a variable, in that it can be used as a value, and can be assigned to. The fields in a struct are accessed by using an expression which yields a struct value followed by a '.' and the name of the field to be selected. This syntax is used in all situations, i.e. both when the value in a field is to be extracted, and when a new value is to be assigned to a field.

Zed does not support struct comparison - programmers must compare individual fields as required. Similarly, struct types cannot directly be proc parameters and cannot be returned from a proc. Passing a struct to a proc can be done indirectly using an '@' parameter.

Struct types can be assigned - the Zed compiler will use an appropriate combination of bulk data copy and individual assignments. If the struct is 'private' or 'ro' or has any fields which are 'private' or 'ro', then code outside of the defining package cannot assign values of the struct type. [Struct assignments can be expensive in terms of CPU time. For example, if a struct contains a lot of tracked values, and the system is using reference counting to handle those values, individual assignments might be needed to keep the reference counts correct.]

Example 'struct' types and use:

    struct Complex_t {
        float cplx_real, cplx_imag;
    };

    export(../MyFriend) struct MainData_t {
        [KEY_SIZE] bits8 md_key;
        uint md_usedCount;
        string md_tag;
        [] bool md_flagVec;
    };

    proc
    setKey(@ MainData_t aMd; @ [KEY_SIZE] bits8 aKey)void:
        /* Example only - could just use array assignment here. */
        for i from 0 upto KEY_SIZE - 1 do
            aMd@.md_key[i] := aKey@[i];
        od;
        aMd@.md_usedCount := aMd@.md_usedCount + 1;
    corp;

    proc
    initStuff(string tag; @ [KEY_SIZE] bits8 aKey; uint flagCount)void:
        MainData_t md;
        md.md_usedCount := 0;
        setKey(@md, aKey);
        md.md_tag := tag;
        if flagCount ~= 0 then
            md.md_flagVec := matrix([flagCount] bool);
        else
            md.md_flagVec := nil;
        fi;
    corp;

Struct types can have "constructors". Syntactically, a struct constructor is identical to an array "constructor":

name of struct type (might be a path)
'('
list of values, separated by commas
')'

Again, the individual elements are as described in "5.6 Array and Struct Initializers". The elements must appear in the same order as the fields are declared in the struct declaration. Struct fields which are marked 'noInit' must not be given values in a struct constructor. If a struct field is an 'inline' struct, then the fields of that inlined struct must be directly included in the constructor (and in initializers), as directed by their flags and types. Non-inlined multivalued elements which are not 'noInit' must be included as described in "5.6 Array and Struct Initializers". Similar to array constructors, a struct constructor yields a 'con' '@' of the struct type. See "4.7 Enumeration Types" for the rule for enumeration fields.

Zed does not specify the order of evaluation of the expressions in a struct constructor.

4.10 Record Types

'record' types in Zed are very similar to 'struct' types. The main difference is that record values are tracked values - they are pointers to allocated memory containing the record fields. As such, record values can be compared, assigned, passed to procs, returned from procs, etc. Note, however, that it is just the tracked values which are being manipulated, not the contents of the records.
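
The reference semantics of record values can be sketched in C by modelling a record value as a pointer to heap memory. Copying the value copies only the pointer, so both names see the same fields, and comparing two values compares references, not contents. The names are hypothetical, and because C has no tracked values the memory must be freed by hand, whereas Zed manages it automatically.

```c
#include <stdlib.h>

/* A record value modelled as a pointer to heap-allocated fields. */
typedef struct ComplexRec {
    double cmplx_real, cmplx_imag;
} *Complex;

/* Hypothetical constructor: initial values in field declaration order. */
static Complex complex_new(double real, double imag)
{
    Complex c = malloc(sizeof *c);
    if (c == NULL) abort();
    c->cmplx_real = real;
    c->cmplx_imag = imag;
    return c;
}
```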

A 'record' declaration is very similar to a 'struct' declaration:

optional visibility specification
'record'
name for new record type
optional 'public'
'{'
one or more record fields declarations separated by semicolons
optional variant field set
'}'

Each record fields declaration consists of:

type of fields
optional storage flags ('inline', 'private', 'noInit', 'con', 'ro', 'volatile', 'nonNil', 'nilOk')
list of field names separated by commas

Variant records will be discussed later.

By default, record types are essentially "ro". That means that they cannot be constructed outside of the defining package, and their fields cannot be written by code outside of the defining package. Their fields can be read by any code which can see the type, unless the field itself is marked 'private'. By making the record type 'public', code outside of the defining package can do both of those things. There is no 'private' attribute for records - if that effect is desired, then a renaming of the record type can be done, as in:

    export type Renamed_t = Record_t;

[Doing the above creates an anonymous type, "Renamed_t", which is known to be a tracked type, but which is otherwise unknown. If that type is exported but the type it renames is not, then only the anonymous form can be used outside of the exporting package. This technique does not work for struct types because similar uses of struct types usually involve the '@' of struct values, and '@' values must match exactly to be compatible - a renaming will not be accepted. It would certainly be possible to add a 'private' attribute to record types, but this has not been done.]

Record types can be predeclared, just like struct types. However, since all record values are just pointers, it is possible to declare variables, fields, parameters, etc. of record types which are only predeclared. As with struct types, however, the names of the record fields (including any variant fields) are not yet known and so cannot be used. Record predeclarations are similarly used when record types reference each other, either directly or via some longer chain of types.

A new record value is created using a record constructor, which consists of:

the record type name (might be a path)
'('
zero or more initial value expressions separated by commas
')'

The initial value expressions are for all of the fields of the record which are not marked 'noInit' and are not of direct compound types (arrays, and structs which are not marked as 'inline'). The values must be given in the order the fields of the record are declared - Zed has no syntax to indicate other orders. The record constructor yields a new 'nonNil' value of the record type.

As mentioned in the sections on array and struct types, "constructors" for those types are actually initializers and so use the rules for initializers, rather than the rules here for actual constructors. In particular, record (and capsule) constructors do not include values for non-inlined struct fields or for array fields, whereas array and struct initializers and "constructors" do. This reflects the expected run-time frequency of occurrence of the various constructors.

Zed does not specify the order of evaluation of parameters to a record constructor. Thus, if there are side effects in the initial value expressions, the programmer should compute the values before the constructor, saving them in local variables, and then use those local variables as values in the constructor.

Since record values are only created in an initialized state, it makes sense for their fields to be 'con' and/or 'nonNil'. Flag 'noInit' was mentioned above. Storage flags 'inline', 'private', 'ro' and 'volatile' work the same as for struct fields.

Fields of types other than enumeration types which are not initialized explicitly in the constructor have undefined values. Reference to such a field before assigning a value to it yields undefined results. The compiler may attempt to warn of such situations, but it is not required to do so. The Zed system will protect its own integrity in these cases, but it is not defined how it does that.

Record fields are accessed similarly to struct fields - an expression yielding a record reference, a '->', and the field name. [My thinking is that '->' distinguishes them from struct, union, and bits field references, and is a continual reminder of the indirection that is happening.]

Example record types and use:

    export record Complex_t public {
        float cmplx_real, cmplx_imag;
    };

    proc
    useComplex(Complex_t nonNil cmplx)nonNil Complex_t:
        cmplx->cmplx_imag := cmplx->cmplx_imag + 1.0;
        if cmplx->cmplx_real < 0.0 then
            Complex_t(- cmplx->cmplx_real, - cmplx->cmplx_imag)
        else
            cmplx
        fi
    corp;

    export record Board_t {
        Board_t noInit bd_prev, bd_next;
        [8, 8] Square_t bd_squares;
        uint bd_historyCount;
        [] History_t nonNil bd_history;
        uint volatile bd_lock;
    };

    proc
    createBoard()nonNil Board_t:
        Board_t nonNil bd := Board_t(0, matrix([0] History_t), 0);
        for i from 0 upto 7 do
            for j from 0 upto 7 do
                initSquare(@bd->bd_squares[i, j], i, j);
            od;
        od;
        bd
    corp;

In the constructor in "createBoard", fields "bd_prev" and "bd_next" are not explicitly initialized because they are declared as 'noInit'. The elements of field "bd_squares" are initialized manually because, as described above, array fields and non-inline struct fields are not given values in record constructors.

4.10.1 Variant Records

See "6.10 Case Construct" for general information on the 'case' construct.

A variant field set consists of:

'case'
name for record variant selector type
name for variant selector field in record
one or more variant field declarations, each ending in ';'
'esac'

A variant field declaration consists of:

'incase'
name (tag) for new variant field
':'
type for new variant field
optional storage flags for new variant field ('private', 'con', 'ro', 'volatile', 'nonNil')
name for new variant field
';'

A record variant, if present, must be the last field in a record type. A record variant essentially consists of a restricted 'any' value (restricted to the set of types present in the variant fields), along with an enumeration type which indicates which of the variants is currently present. Multiple variant fields can have the same field type, but each must have a unique variant field tag. Because of those tags, using record variants does not require a run-time type test - it only requires a 'case' or 'select' on the tag value. Variant fields cannot be 'noInit' - they must always have a value. Variant fields must always be of a tracked type, so 'inline' is not allowed.

The new "record variant selector type" is essentially a new enumeration type defined in the package along with the new record type. The rules associated with it are identical to those for enumeration types. The "variant selector field" cannot be written - it is set when the record is constructed and never changes.
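The C version of the Zed internals represents variant records with unions, as noted in the section on union types. The sketch below shows the idea using "Variant_t" from the example below. It rests on several assumptions: the layout, the helper names "make_pattern" and "pattern_or_null", and returning a struct by value instead of allocating a record. It is not the actual Zed internal representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of "Variant_t". The enum plays the role of the
   selector type "VariantKind_t"; the union holds the single variant field. */
typedef enum { vrk_vo, vrk_pattern } VariantKind_t;

struct VariantOnly;                     /* left incomplete in this sketch */

typedef struct Variant {
    struct Variant *vr_next;
    unsigned vr_count;
    int vr_hasError;
    VariantKind_t vr_kind;              /* set once, never written again */
    union {
        struct VariantOnly *vr_vo;
        const char *vr_pattern;
    } u;
} Variant_t;

/* Analogue of "Variant_t.vrk_pattern(next, count, hasError, pattern)".
   The assert stands in for the 'nonNil' flag on "vr_pattern". */
static Variant_t make_pattern(Variant_t *next, unsigned count, int hasError,
                              const char *pattern) {
    assert(pattern != NULL);
    Variant_t v;
    v.vr_next = next;
    v.vr_count = count;
    v.vr_hasError = hasError;
    v.vr_kind = vrk_pattern;
    v.u.vr_pattern = pattern;
    return v;
}

/* Analogue of a 'case' on the selector: the variant field is read only in
   the alternative whose tag matches. */
static const char *pattern_or_null(const Variant_t *v) {
    switch (v->vr_kind) {
    case vrk_pattern:
        return v->u.vr_pattern;
    case vrk_vo:
    default:
        return NULL;
    }
}
```

C enforces none of these rules. Nothing stops code from reading "u.vr_vo" when the tag says otherwise, or from rewriting "vr_kind". That is exactly what the Zed compiler's 'case' and 'select' checks prevent.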

Variant records are the central types for some Zed system internals:

"Names/Info_t" - this variant record is the "symbol table entry" used within the Zed system
"Types/Type_t" - this variant record is used to represent types within the Zed system
"Exec/Exec_t" - this variant record is used to represent all of the executable (code) elements within Zed, including proc bodies
"Package/PackageElement_t" - this variant record is used to represent all of the elements which can be in a package

Record constructors for variant records are the same as for non-variant records, except that the desired variant tag is appended to the record name with a separating '.', and the value for the variant field is added as a new last parameter to the constructor.

A given variant record reference can be of any of its variants, and this can usually only be determined at run-time. The Zed compiler does not assume the programmer is correct in knowing which variant a given value has - the programmer must use Zed language constructs which test the value appropriately. These constructs are the 'case' construct and the 'select' construct.

The 'case' construct is fairly standard, and is usually used as a statement. The variant record it is to work with must be referenced by a proc formal or a local variable. For the duration of such a 'case', the formal or variable cannot be assigned to - this prevents changing which record is referenced after the 'case' has determined which variant is present. The index expressions on the 'case' must be the variant field tags defined in the variant record definition. Multiple tags cannot be used on a given case alternative. Within an alternative, the variant field associated with the tag can be read, and written unless it is 'con'. Outside of that alternative, the variant field cannot be accessed. If the 'case' has a default, no variant fields can be accessed within the code of that default. The non-variant fields of the record are always available, subject to the usual record field rules.

The 'select' construct, described later in "6.6.2 "select"", can be used with an 'if' construct to test for a specific variant, or with an 'assert' construct to require a specific variant.

An example using variant records:

    record VariantOnly_t {
        case VariantOnlyKind_t vo_kind
        incase vok_id:
            Basic/Uint_t nonNil vo_id;
        incase vok_name:
            string vo_name;
        incase vok_flagVec:
            [] bool nonNil vo_flagVec;
        esac;
    };

    export(../OtherPackage) record Variant_t {
        Variant_t vr_next;
        uint vr_count;
        bool vr_hasError;
        case VariantKind_t vr_kind
        incase vrk_vo:
            VariantOnly_t nonNil vr_vo;
        incase vrk_pattern:
            string nonNil vr_pattern;
        esac;
    };

    proc
    insertByCount(@ Variant_t aHead; Variant_t nonNil vrNew)void:
        if assign Variant_t nonNil vrThis := aHead@ then
            if vrThis->vr_count > vrNew->vr_count then
                /* Insert before "vrThis". */
                vrNew->vr_next := vrThis;
                aHead@ := vrNew;
            else
                insertByCount(@vrThis->vr_next, vrNew);
            fi;
        else
            aHead@ := vrNew;
        fi;
    corp;

    proc
    createWithPattern(@ Variant_t aHead; uint count; string nonNil pattern)void:
        Variant_t nonNil vr := Variant_t.vrk_pattern(nil, count, false, pattern);
        insertByCount(aHead, vr);
    corp;

    proc
    createWithVO(@ Variant_t aHead; uint count; VariantOnly_t nonNil vo)void:
        con vr := Variant_t.vrk_vo(nil, count, true, vo);
        insertByCount(aHead, vr);
    corp;

    proc
    showVariant(Variant_t nonNil vr)void:
        FmtN("(", if vr->vr_next ~= nil then "..." else "<nil>" fi, ", ",
             vr->vr_count, ", ", if vr->vr_hasError then "BAD" else "GOOD" fi,
             ", ");
        case vr->vr_kind
        incase vrk_vo:
            FmtN("vo-");
            VariantOnly_t nonNil vo := vr->vr_vo;
            case vo->vo_kind
            incase vok_id:
                FmtN("id=", vo->vo_id->theUint);
            incase vok_name:
                string name := vo->vo_name;
                if name ~= nil then
                    FmtN("name=\"", name, "\"");
                else
                    FmtN("unnamed");
                fi;
            incase vok_flagVec:
                FmtN("flags:");
                [] bool nonNil vec := vo->vo_flagVec;
                for i from 1 upto getBound(vec) do
                    if i ~= 1 then
                        FmtN(",");
                    fi;
                    FmtN(vec[i - 1]);
                od;
            esac;
            Fmt(")");
        incase vrk_pattern:
            Fmt("pattern=\"", vr->vr_pattern, "\")");
        esac;
    corp;

    proc
    showAll(@ Variant_t aHead)void:
        for vr from aHead@ then vr->vr_next do
            showVariant(vr);
        od;
    corp;

The above example shows a pair of variant record types, and code for creating, manipulating and displaying them. Note the variant record constructors in "createWithPattern" and "createWithVO". Proc "showVariant" uses a pair of nested 'case' constructs to display the contents of both variant record types. Note the use of the list-iterating 'for' loop in proc "showAll" - it is a way in Zed to loop down the elements of a linked list, using a variable that is known to be 'nonNil', so 'nil' checks are not needed with it. That construct is described later.

Storage flags 'nonNil' and 'con' are common with variant fields. 'nonNil' requires that the field never be 'nil', and thus the base record and that variant can be treated as a single larger record. 'con' states that the variant value, once set during record construction, can never be changed. That base record and the variant are then an inseparable pair. There are situations in which some or all variant fields need to allow 'nil' values or need to be changeable, and that is done by not specifying the relevant storage flag for the variant.

One restriction is currently imposed on variant record types - they cannot be defined inside generics. See "10 Generics" for discussion.

[This form of variant records is one of several possible choices. As mentioned previously, it is possible to have the variant part as a separate type. I decided against that for efficiency reasons - the representations within Zed (and presumably lots of other situations) would have required an extra chunk of memory for each such record. There seemed to be no actual benefit from that restriction. One current restriction is that there is only one variant part allowed per record type. With the current form of the variant part, there could be multiple of them, and they would not need to be the last field. This idea causes syntactic difficulties with the record constructor - how are the choices for multiple variants specified? (The obvious choice is multiple "'.' <tag>" pairs on the type.) A requirement I had was to not make the constructor for non-variant records more complex.

Another current restriction is that the variant fields are all required to be tracked values. This is not strictly necessary, but it does simplify some things, and has run-time advantages. If the variant fields vary in size, then the allocated size of the record values also varies. This can lead to more memory fragmentation. It also makes it more difficult for reference counting and garbage collection - they now must examine the variant selector field, and use that with the record type description in order to know where all of the tracked values are. Currently, such code knows that the variant part is tracked, and deals with it directly as that. Early on, I had not yet decided that all tracked values contain a pointer to their type definition, and so examining the variant selector field might have been necessary. Having all tracked values contain a pointer to their type is what enables the use of things like 'any' and 'autoAny'.

There are also possible issues during record construction. Garbage collection can run at nearly any time (constraining it too much risks running out of memory when there is actually a lot available). If it runs during record construction, how does it deal with a partially-constructed record? This can be easier, and not require pre-zeroing of the allocated space, if the variant field is at a single fixed offset.]

4.11 Bits Types

'bits' types in Zed are related to struct types, in that they consist of a set of fields within a larger value. The differences relate to how 'bits' types can be used. Fields in 'bits' types are packed at the individual bit level, rather than at a byte or larger level. This means that there are never any padding fields inserted. Also, 'bits' types are restricted in the type of their fields, and are restricted to no more than 64 bits in total. 'bits' types are generally used to conserve memory by packing information more tightly, and are also used to map Zed types over top of external protocol specifications, hardware registers, etc. Fields in 'bits' types are not given any alignment - they are packed one after another, bit by bit.

Syntactically, a 'bits' type is defined as:

optional visibility specification
'bits'
name for new bits type
optional 'private' or 'ro'
'{'
one or more bits field declarations or fixed bits definitions, separated by semicolons
'}'

Each bits field declaration consists of:

optional bits width specification and ':'
type of field
optional storage flags ('private', 'con' or 'ro')
field name

Each fixed bits definition consists of:

optional bits width specification and ':'
constant bits value

"Fixed bits definitions" within the 'bits' type allow the programmer to specify sets of bits within the larger unit which have fixed values. Such values are usually all 0 bits, but can be other values, depending on the application. The "constant bits value" must be a value of type 'uint' which the Zed compiler can determine at compile time. If no bits width specification is given with such a value, then the compiler will use the smallest number of bits which can contain the value. Note that specifying leading 0's on a literal fixed value does not change the number of bits needed. When bits constructors are used, the specified fixed bits fields will automatically be included in the final value.
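The width rule for a fixed value given without an explicit width can be sketched in C as below. The function name is hypothetical. The text does not say how many bits a fixed value of 0 receives, so the sketch assumes 1.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits that can hold 'value'. Only the numeric value is
   examined, so leading zeros written in a literal make no difference.
   Assumption: a value of 0 is given 1 bit. */
static unsigned min_bits(uint64_t value) {
    unsigned bits = 1;
    while (bits < 64 && (value >> bits) != 0) {
        bits++;
    }
    return bits;
}
```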

The 'private' and 'ro' flags on 'bits' types have the same effect as with struct types.

Note that "bits field declarations" do not allow multiple fields of the same type to be declared together - multiple "bits field declarations" must be used to define multiple similar fields.

The only types allowed for bits fields are:

'bool' - size of 1 bit cannot be changed
'char' - size of 8 bits cannot be changed
'uint' - defaults to 1 bit, but can be increased
other 'bits' types - specific size cannot be changed
'oneof' types - default to the size needed for the largest value, but can be increased

Renamed versions of such types can also be used, which allows custom formatting procs to be given to the renamed types and thus used when 'bits' values are output. When a 'oneof' type is used as a 'bits' field, the maximum value of the 'oneof' values determines the bit size needed - it is not rounded up to a power of two as it would be if the 'oneof' were a separate variable or a field of a 'struct', etc.

The actual amount of space in bytes used by a 'bits' type is rounded up to a power of two when the 'bits' type is not used within another 'bits' type. Thus the size of such a final 'bits' type is:

1 - 8 bits => 1 byte
9 - 16 bits => 2 bytes, with 2 byte alignment
17 - 32 bits => 4 bytes, with 4 byte alignment
33 - 64 bits => 8 bytes, with 8 byte alignment

Padding bits are added to the end of the 'bits' type as necessary.
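The size table and the padding rule can be restated as a small C sketch. The function names are hypothetical. The sketch applies only to an outermost 'bits' type, because nested 'bits' types are not padded.

```c
#include <assert.h>

/* Bytes occupied by an outermost 'bits' type holding 'nbits' bits, per the
   table above; the alignment is the same as the size. */
static unsigned bits_storage_bytes(unsigned nbits) {
    assert(nbits >= 1 && nbits <= 64);   /* 'bits' types hold 1..64 bits */
    if (nbits <= 8)  return 1;
    if (nbits <= 16) return 2;
    if (nbits <= 32) return 4;
    return 8;
}

/* Number of padding bits appended at the low-order end. */
static unsigned bits_padding(unsigned nbits) {
    return bits_storage_bytes(nbits) * 8 - nbits;
}
```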

When outer 'bits' types in memory are accessed, the size of the access, in terms of native CPU instructions, is the size as given above. Some CPUs do not have such instructions, and so accesses will need to be of larger size, with subsequent operations to extract or insert field values. Note that this access rule holds even if the field being accessed is of a size and position that could be fully referenced by a smaller access. For example, if a 'bits' type consists of a pair of 8-bit values, any access to those fields will be done with a full 16 bit memory fetch. If the programmer wishes to access the fields using individual 8 bit fetches and stores, then they should not use a 16 bit 'bits' type to represent them. Instead a 'struct' type could be used.

Accessing of and storing to individual 'bits' fields is done using the same '.' syntax as for 'struct' fields. Note that 'bits' fields can be marked as 'con' (and so can never change) since entire 'bits' values are created by 'bits' constructors and can be assigned atomically. Such fields cannot be changed individually - new values must be created using constructors.

When a 'bits' type is used inside another 'bits' type, the padding described above is not applied - only the actual bits within the included 'bits' type are added to the enclosing 'bits' type. Any further fields in the enclosing 'bits' type will thus come immediately after the fields of the enclosed type.

The bit positions within a 'bits' type start with the high order bit. Thus, if the first field of a 'bits' type is of type 'bool', it will occupy the high-order bit of the enclosing value (unless that enclosing value is itself a 'bits' type, and the enclosed type is not the first field). Programmers can use this to increase the efficiency of their programs at a very low level, by knowing that CPUs often have quicker ways to test the high-order (sign) bit than to test other bits.

'bits' constructors are similar to 'struct' constructors. There are no separate 'bits' initializers, since they are not needed ('con' or 'var' declarations do not need to specify the type of the new variable when it is initialized from a 'bits' constructor). If the 'bits' type is 'private' or 'ro', or contains any fields that are 'private' or 'ro', then constructors for that type can only be used in code within the defining package.

An example, using the 'oneof' types "Mode_t" and "MoveOp_t" from the section on 'oneof' types:

    bits EA_t {
        3 : Mode_t ea_mode;
        3 : uint ea_reg;
    };

    bits Move_t {
        2 : 0b00;
        MoveOp_t mv_op;
        EA_t mv_dst;
        EA_t mv_src;
    };

The result allows us to define the Motorola M68000 "move" opcode encoding. Note that in "EA_t" we specify the bits size of the "ea_mode" field to be 3 bits, but that isn't necessary because that would be the default size of the "Mode_t" 'oneof' type. The explicit bits size is needed on field "ea_reg" since it is of type 'uint', which defaults to a size of one bit. The size is not specified on the "mv_op" field of "Move_t", and it will default to the correct 2 bits.

Another example set of 'bits' types:

    bits Inner_t {
        3 : uint in_top;
        3 : uint in_bottom;
    };

    bits Outer_t {
        bool out_f1;
        bool out_f2;
        2 : uint out_count;
        Inner_t out_first;
        Inner_t out_last;
    };

    bits Bigger_t {
        Move_t b_mv;
        char b_ch;
        8 : uint b_n;
    };

Here are the bit positions of the fields of "Outer_t", numbering the fields starting with the high order (sign) bit as bit 0:

out_f1: 0
out_f2: 1
out_count: 2 - 3
out_first.in_top: 4 - 6
out_first.in_bottom: 7 - 9
out_last.in_top: 10 - 12
out_last.in_bottom: 13 - 15

This completely fills a 16 bit (2-byte) containing value. If field "out_f2" were not present, all subsequent bit positions would decrease by 1, and there would be a single (low-order) bit left unused.
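The high-order-first layout can be checked with a C sketch that packs "Outer_t" into an integer, shifting each field in after the previous ones. It models only the resulting value, not the code a Zed compiler generates, and the helper names are hypothetical. It reproduces the 0xa3ec value given in the comment in proc "test3".

```c
#include <assert.h>
#include <stdint.h>

/* Appends 'width' bits of 'value' after the bits already placed, filling
   from the high-order end as Zed lays out 'bits' fields. This sketch
   supports field widths of 1..63 bits. */
static void put_bits(uint64_t *acc, unsigned *used, unsigned width,
                     uint64_t value) {
    assert(width >= 1 && width < 64 && *used + width <= 64);
    assert(value < ((uint64_t)1 << width));      /* range check */
    *acc = (*acc << width) | value;
    *used += width;
}

/* Packs "Outer_t" (exactly 16 bits, so no padding). */
static uint16_t pack_outer(int f1, int f2, unsigned count,
                           unsigned firstTop, unsigned firstBottom,
                           unsigned lastTop, unsigned lastBottom) {
    uint64_t acc = 0;
    unsigned used = 0;
    put_bits(&acc, &used, 1, f1 ? 1 : 0);   /* out_f1: bit 0 */
    put_bits(&acc, &used, 1, f2 ? 1 : 0);   /* out_f2: bit 1 */
    put_bits(&acc, &used, 2, count);        /* out_count: bits 2 - 3 */
    put_bits(&acc, &used, 3, firstTop);     /* out_first.in_top */
    put_bits(&acc, &used, 3, firstBottom);  /* out_first.in_bottom */
    put_bits(&acc, &used, 3, lastTop);      /* out_last.in_top */
    put_bits(&acc, &used, 3, lastBottom);   /* out_last.in_bottom */
    assert(used == 16);
    return (uint16_t)acc;
}

/* A standalone "Inner_t" uses 6 bits, so 2 padding bits go at the low end. */
static uint8_t pack_inner(unsigned top, unsigned bottom) {
    uint64_t acc = 0;
    unsigned used = 0;
    put_bits(&acc, &used, 3, top);
    put_bits(&acc, &used, 3, bottom);
    return (uint8_t)(acc << (8 - used));
}
```

For example, "pack_outer(1, 0, 2, 1, 7, 5, 4)" corresponds to "Outer_t(true, false, 2, Inner_t(1, 7), Inner_t(5, 4))".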

The order of the bits in multi-bit fields is the native order for the CPU involved. Thus no explicit bit shuffling is needed when accessing or storing multi-bit fields, unless an externally specified protocol or hardware that the 'bits' type is to be used with uses another ordering. In such a situation, the programmer is responsible for the bit shuffling - there is no facility for doing that in Zed (but see the 'byteSwap' construct).

Since 'bits' types must have no more bits than 'uint', there are 'bits' constructors which work the same as record constructors, and yield values of the 'bits' type (directly, with no indirection).

Accessing bits fields is done using the same '.' syntax as accessing the fields of structs. When values are being assigned to 'bits' fields of 'uint' or 'oneof' types, the size of the values must be checked, either at compile time or at run time, to make sure that they fit within the defined bits space. This happens whether the values are provided in a 'bits' constructor or in individual assignments. On many CPUs, assigning values to individual 'bits' fields can require fetching the old full value from memory, masking out the space for the new value, then or-ing in the new value. The Zed compiler generates such code as needed.
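The fetch, mask and or-in sequence, together with the range check, can be sketched in C for a 16-bit 'bits' value. The function name and the boolean failure result are assumptions. What Zed does when a run-time range check fails is not covered in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stores 'value' into the field of 'width' bits starting at high-order
   position 'pos' (bit 0 = sign bit) of a 16-bit 'bits' word. Returns false,
   leaving *word unchanged, if the value does not fit in the field. */
static bool store_field16(uint16_t *word, unsigned pos, unsigned width,
                          unsigned value) {
    assert(width >= 1 && pos + width <= 16);
    if (value >= (1u << width)) {
        return false;                              /* range check failed */
    }
    unsigned shift = 16 - pos - width;             /* distance from low end */
    uint16_t mask = (uint16_t)(((1u << width) - 1u) << shift);
    uint16_t old = *word;                          /* fetch full old value */
    *word = (uint16_t)((old & ~mask) | (value << shift));  /* mask, or in */
    return true;
}
```

Applied to "Outer_t", setting "out_count" (positions 2 - 3) to 3 turns 0xa3ec into 0xb3ec, while a value of 4 is rejected.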

Some example code using the above 'bits' types:

    proc
    test3()void:
        /* This gets 0xa3ec  0b1010001111101100 */
        Outer_t out := Outer_t(true, false, 2, Inner_t(1, 7), Inner_t(5, 4));
        Outer_t OUT = Outer_t(true, false, 2, Inner_t(1, 7), Inner_t(5, 4));

        bool f1 := false, f2 := true;
        uint n1 := 0, n2 := 1, n3 := 2, n4 := 3, n5 := 4;
        out := Outer_t(f1, f2, n1, Inner_t(n2, n3), Inner_t(n4, n5));
        out := Outer_t(false, false, 0, Inner_t(n2, 0), Inner_t(0, 0));
        out := Outer_t(false, false, 0, Inner_t(n1, 0), Inner_t(n2, 0));
    corp;

    proc
    test4()void:
        Move_t mv;
        MoveOp_t mOp;
        Mode_t m1, m2;
        Bigger_t b;
        EA_t ea;
        char ch;
        uint n;

        /* This gets 0x110f  0b0001000100001111 */
        mv := Move_t(mv_movb, EA_t(m_dDir, 4), EA_t(m_aDir, 7));

        /* This has fixed part 0xc4  0b0000000011000100 */
        mv := Move_t(mOp, EA_t(m1, 3), EA_t(m2, 4));

        ea := EA_t(m_index, 4); /* 0xd0  0b11010000 */

        /* This has fixed part 0x3000  0b0011000000000000 */
        mv := Move_t(mv_movw, ea, ea);

        /* This gets 0x2b6461f0  0b00101011011001000110000111110000 */
        b := Bigger_t(Move_t(mv_movl, EA_t(m_disp, 5), EA_t(m_dec, 4)), "a", 0xf0);

        b := Bigger_t(mv, ch, n);

        /* This gets a fixed part 0x00420000  0b00000000010000100000000000000000 */
        b := Bigger_t(Move_t(mOp, EA_t(m1, 1), EA_t(m2, 2)), ch, n);

        /* Lots of range checks generated here. */
        mv := Move_t(n, EA_t(n, n), EA_t(n, n));
        /* One here as well. */
        mv.mv_op := n;
    corp;

Zed has an alternate 'bits' constructor syntax which is useful when the 'bits' type has a lot of 'bool' fields. For 'bool' fields, the field name is given instead of a 'bool' expression. Also, 'bool' fields can be skipped in the constructor, and those fields will get value 'false'. Non-'bool' fields cannot be skipped - they must be given explicit values. This alternate constructor uses brace brackets ('{'/'}') instead of parentheses.

An example type from the Zed system showing this in use:

    export bits StorageFlags_t {
        bool sf_inline;             // storage is inline
        bool sf_private;            // storage is private
        bool sf_noInit;             // storage not in constructors
        bool sf_con;                // storage is unchanging
        bool sf_var;                // value can be written - overrides 'con'
        bool sf_ro;                 // storage is changeable only where declared
        bool sf_volatile;           // storage is volatile
        bool sf_nonNil;             // stored value is never nil
        bool sf_nilOk;              // value can be 'nil' - overrides 'nonNil'
        bool sf_package;            // '@' referent must be at package level
        bool sf_inheritPrivate;     // inherited as private to other package
        bool sf_inheritRo;          // inherited as ro to other package
        bool sf_inheritNonPublic;   // inherited from non-public to public
    };

    var sf := Types/StorageFlags_t{};
    ...
    sf := Types/StorageFlags_t{sf_ro};
    ...
    sf := Types/StorageFlags_t{sf_private, sf_con, sf_nonNil};
    ...
    if sf ~= Types/StorageFlags_t{} and sf ~= Types/StorageFlags_t{sf_nonNil} then
        ...
    fi;

That type has only 'bool' fields. Another example:

    bits Mixed_t {
        bool mx_flag1;
        3 : uint mx_counter;
        Mode_t mx_mode;
        bool mx_flag2;
        bool mx_flag3;
    };

    proc
    languageEx()void:
        Mixed_t mx1 := Mixed_t{3, m_indir};
        mx1 := Mixed_t{0o7, m_aDir, mx_flag2, mx_flag3};
        Mixed_t mx2 := Mixed_t{mx_flag1, 0b101, mx1.mx_mode, mx_flag3};
    corp;

Note that the values given, including 'bool' field names, must appear in the same order as their fields appear in the 'bits' type definition. It is not allowed to include an expression of type 'bool' in this form of constructor, since it is often not obvious which field it should be used for.
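In C, the nearest equivalent of the brace constructor for an all-'bool' 'bits' type is OR-ing together one mask per named field. The sketch below models "StorageFlags_t" that way. It assumes the high-order-first layout described earlier, with 13 flags in a 16-bit value and 3 padding bits at the low end. The constant and helper names are hypothetical. Unlike the Zed constructor, C's '|' does not enforce field order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One mask per 'bool' field, numbered from the high-order bit. */
enum {
    SF_INLINE             = 1u << 15,
    SF_PRIVATE            = 1u << 14,
    SF_NO_INIT            = 1u << 13,
    SF_CON                = 1u << 12,
    SF_VAR                = 1u << 11,
    SF_RO                 = 1u << 10,
    SF_VOLATILE           = 1u << 9,
    SF_NON_NIL            = 1u << 8,
    SF_NIL_OK             = 1u << 7,
    SF_PACKAGE            = 1u << 6,
    SF_INHERIT_PRIVATE    = 1u << 5,
    SF_INHERIT_RO         = 1u << 4,
    SF_INHERIT_NON_PUBLIC = 1u << 3
};

typedef uint16_t StorageFlags_t;

/* Reads one 'bool' field; fields never named in the OR remain false. */
static bool sf_has(StorageFlags_t sf, unsigned mask) {
    return (sf & mask) != 0;
}
```

"SF_PRIVATE | SF_CON | SF_NON_NIL" then plays the part of "Types/StorageFlags_t{sf_private, sf_con, sf_nonNil}", and 0 plays the part of "Types/StorageFlags_t{}".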

[My initial thoughts for 'bits' fields went further. I wanted to have some kind of "oneOf" within them, so that the structure of some bits could vary depending on other bits. My hope was that this could allow the Zed run-time system to come close to being a disassembler for some machine instructions. This turned out to be impractical.]

The order of evaluation of the expressions used in a 'bits' constructor is not specified. As usual, using local variables to hold pre-computed problematic values is the suggested way to resolve required ordering issues.

'bits' types are compatible with 'uint', 'bits8', 'bits16', 'bits32' and 'bits64'. No checking is done when a 'bits' value is assigned to or from one of the other types. This is the reason that 'bits' types cannot contain fields of enumeration types - Zed enforces and assumes that all enumeration values are in range, and I didn't want to require range checks on those assignments.

'bits' values can be directly assigned, and, as described above, numeric values can be assigned to 'bits' destinations. Note that this might result in values for 'oneof' fields which are not one of the named 'oneof' values. It can also result in values for "fixed bits definition" fields which differ from those defined in the type. Since 'bits' values can be handled atomically by typical CPUs, Zed also allows equality comparison of 'bits' values. These comparisons use simple comparisons of the entire 'bits' values - they do not do a field-by-field comparison. Because of this, stray bits introduced by assigning numeric values to entire 'bits' variables can cause bits values to differ even when no individual named fields differ.
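The difference between whole-value and field-by-field comparison shows up in a small C sketch. It assumes a hypothetical 15-bit 'bits' type held in 16 bits, such as "Outer_t" without "out_f2", so the lowest bit is padding that only a numeric assignment can set. The names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Named fields occupy the high-order 15 bits; bit 0x0001 is padding. */
#define NAMED_MASK ((uint16_t)0xfffe)

/* Whole-value comparison, as Zed compares 'bits' values. */
static bool bits_equal(uint16_t a, uint16_t b) {
    return a == b;
}

/* Field-by-field view: only the named fields are compared. */
static bool fields_equal(uint16_t a, uint16_t b) {
    return (a & NAMED_MASK) == (b & NAMED_MASK);
}
```

With "a" = 0xa3ec and "b" = 0xa3ed, every named field matches but the whole values differ, so the Zed-style comparison reports them unequal.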

If a 'bits' type is 'private' or 'ro', or has any fields which are 'private' or 'ro', then assignment of entire bits values can only be done by code within the package which defines the 'bits' type. Comparison of 'bits' values is not restricted in that way.

If enough care is used, 'bits' types can be used in system level code to directly map hardware resources like control registers. The rule stating that references to 'bits' values are atomic in the entire value is key to this. As described above, it means that each reference to a field within a 'bits' value will fetch the entire 'bits' value, even if the field being referenced happens to be such that a smaller access would work. The same holds for stores - all stores will be of the size of the entire 'bits' type, which, as described above, is a power of two bytes.

In such a situation, the reference to a hardware resource will nearly always be declared with the 'volatile' storage flag. This ensures that each reference to that resource will be done with a separate fetch from it. If a field within such a 'bits' value is updated directly, it will be done with a fetch of the old value and a store of the updated value. If some kind of hardware atomic fetch-and-update operation is needed, programmers will have to do that with lower level facilities, since Zed does not define any.

If a local variable, or other location within "normal" memory, contains a 'bits' value, and that value is assigned to a 'volatile' 'bits' location, then no fetch of an old value is performed - the new value is replacing the entire old value, and so only a single store is performed. This is the case, for example, when the 'bits' value is coming from either form of 'bits' constructor.
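A C sketch of the same access pattern uses a 'volatile' 16-bit location. A C compiler must perform each volatile access as written. A field update is therefore one full-width fetch followed by one full-width store, and a whole-value assignment is a single store. The "register" here is an ordinary variable standing in for hardware, and the field position (that of "out_count" in "Outer_t") and the names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a memory-mapped 16-bit register; real hardware would live
   at a fixed address rather than in ordinary memory. */
static volatile uint16_t fake_reg;

/* Field update: one 16-bit fetch, then one 16-bit store. This sketch masks
   'count' to 2 bits instead of range-checking it as Zed would. */
static void reg_set_count(volatile uint16_t *reg, unsigned count) {
    uint16_t old = *reg;                                   /* single fetch */
    *reg = (uint16_t)((old & ~0x3000u) | ((count & 0x3u) << 12));  /* store */
}

/* Whole-value assignment (as from a constructor): no fetch, one store. */
static void reg_set_all(volatile uint16_t *reg, uint16_t value) {
    *reg = value;
}
```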

Programmers are warned to beware of bit numbering. The bit numbering used in the "Outer_t" example above is arbitrary - the Zed language does not define any one specific bit numbering. This is because different CPUs, buses, protocols and hardware devices all have their own bit numbering systems, and so attempting to consistently use any one numbering scheme will not work well much of the time.

Programmers should also beware of making 'bits' types larger than the unit the hardware actually treats as a whole. If a hardware interface is defined such that parts of it can be fetched and stored either separately or together, then it is often best to not try to define a single 'bits' type that encompasses the entire interface. Instead, define a 'struct' type which includes fields (of 'bits' or other appropriate types) for each of the independent sub-values. If the hardware interface sometimes requires separate references/stores and sometimes requires combined ones (e.g. to make an atomic store), then the programmer will have to do this manually, perhaps by having multiple references to the hardware using different types.

There are other possible problems that programmers must beware of:

fetches from hardware resources can have side effects
register fields can be cleared by storing a '0' to them
register fields can be cleared by storing a '1' to them
register fields can be cleared by fetching from them
fetching from or storing to a field can affect other fields

These do not even take into account issues of atomicity, parallel accesses, etc. Reasons like these are why direct access to hardware resources is best left to those with experience doing it, working from full and correct documentation of the hardware.

4.12 Union Types

'union' types in Zed are very similar to union types in C. They are much like struct types except that all of the fields are overlaid at the beginning of the space for the union. Thus, they cannot hold values for more than one of their fields at the same time. The fields in a union are typically called "members", rather than "fields". Traditionally, union types have been used to accomplish things like "type cheating", but in a visually cleaner way than using pointer kludging. The C version of the Zed internals uses union types to represent Zed variant records, since C doesn't have anything that directly corresponds. There is no checking associated with union types - the programmer must keep track of which of its alternatives a union variable currently holds, based on information outside of the union variable itself.

If a union contains only "numeric" types, which include integrals, floating point numbers, 'bits' types and 'oneof' types, along with arrays of such types, then it is considered to be a "safe" union. Safe unions can be declared and used by all programmers. Unsafe unions, containing pointers of any kind, or enumeration types, can only be declared and used by privileged programmers.
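For comparison, the classic C form of "type cheating" with a safe, numeric-only union appears below. It assumes a 32-bit IEEE 754 'float', which is typical but not guaranteed by C. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A "safe" union in the sense above: numeric members only. Writing one
   member and reading another reinterprets the same bytes; nothing records
   which member was written last. */
typedef union {
    float f;
    uint32_t u;
} FloatBits_u;

static uint32_t float_bits(float x) {
    FloatBits_u fb;
    fb.f = x;
    return fb.u;      /* read back through the other member */
}
```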

The inclusion of enumeration types (and the similar variant record kind values) is not allowed because the Zed language guarantees and assumes that such values are always in range. Union types are generally used where efficiency is needed, and so having to do range checks on modification of a union value containing an enumeration value is undesirable. It is always possible to use an integral type to represent the enumeration in a union, converting to it by subtracting the first element of the enumeration and back by adding that element - the needed check will then be done when adding the first enumeration element to the integral value.
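The workaround can be sketched in C. The enumeration is stored in the union as an integer offset from its first element, and the range check happens only on the way back. The enumeration "Color", the union "Slot_u" and the helpers are hypothetical. C itself does not keep enumeration values in range, so the sketch checks on entry as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { COLOR_RED = 3, COLOR_GREEN, COLOR_BLUE } Color;

/* Numeric members only, so the union stays "safe". */
typedef union {
    uint8_t colorOffset;   /* a Color, stored as Color - COLOR_RED */
    int16_t reading;       /* some other numeric member */
} Slot_u;

static Slot_u slot_from_color(Color c) {
    assert(c >= COLOR_RED && c <= COLOR_BLUE);
    Slot_u s = {0};
    s.colorOffset = (uint8_t)(c - COLOR_RED);   /* subtract first element */
    return s;
}

/* Adding the first element back is where the range check is done. */
static bool slot_to_color(Slot_u s, Color *out) {
    if (s.colorOffset > COLOR_BLUE - COLOR_RED) {
        return false;
    }
    *out = (Color)(COLOR_RED + s.colorOffset);
    return true;
}
```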

Note that non-privileged programmers can declare variables, fields, etc. of unsafe union types. It is the selection of a member that is not allowed. They are also allowed to assign unsafe union values. Together, this means that all programmers can save and pass around unsafe union values - they just can't try to otherwise use or change such values.

Union members cannot be 'void', any '@' type, 'poly', or any incomplete (predeclared only) struct type.

A 'union' type declaration consists of:

optional visibility specification
'union'
name for new union type
'{'
one or more union member declarations, separated by semicolons
'}'

Each union member declaration consists of:

type of members
list of member names separated by commas

Union members are referenced using the '.' syntax. The alignment requirement of a union type is the maximum of the alignment requirements of its members. The size of a union type is the maximum of the sizes of its members. Note that unions do not allow storage flags for their members, and do not have any overall flags on the union itself.

Union values can be assigned - the operation will be a simple copy of the bytes of the data. Any further semantics are up to the programmer.
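The size, alignment and assignment rules above can be illustrated with a C union, since C follows essentially the same model. This is a sketch with hypothetical names. One difference to keep in mind is that C also rounds a union's size up to a multiple of its alignment. With the members chosen here, that rounding changes nothing.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Numeric members only, so this would be a "safe" union in Zed terms. */
typedef union {
    uint64_t word;
    double   real;
    uint32_t halves[2];
    uint8_t  bytes[8];
} Overlay;

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Size rule: the size of the largest member. */
static size_t overlay_max_member_size(void) {
    size_t m = sizeof(uint64_t);
    m = MAX2(m, sizeof(double));
    m = MAX2(m, sizeof(uint32_t[2]));
    m = MAX2(m, sizeof(uint8_t[8]));
    return m;
}

/* Alignment rule: the strictest member alignment. */
static size_t overlay_max_member_align(void) {
    size_t a = alignof(uint64_t);
    a = MAX2(a, alignof(double));
    a = MAX2(a, alignof(uint32_t));
    a = MAX2(a, alignof(uint8_t));
    return a;
}

/* Assignment is a plain byte copy. Nothing records which member was
   last written, so the interpretation is up to the programmer. */
static void overlay_assign(Overlay *dst, const Overlay *src) {
    memcpy(dst, src, sizeof *dst);  /* same effect as *dst = *src */
}
```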

[I could probably have inline unions, which would work like inline structs. I don't think it is worth it, however. In fact, I personally don't like inline structs - they are really only there from early attempts at simpler inheritance methods. The language would be simpler without them. It could be possible to allow unprivileged programmers to assign union values that have addresses in them, but it would require the entire union becoming "private", i.e. preventing unprivileged programmers from storing to any member or reading address members. Again, I don't see much point.]

4.13 Pointer Types

Pointer types in Zed are very similar to pointer types in C. Pointer values are usually the address of some storage in use in the running program. Pointer values can be modified by adding 'uint' values to them or subtracting 'uint' values from them. The validity of such a modified pointer value is unknown. For this reason, only privileged Zed programmers are allowed to create or modify pointers.

Syntactically, pointer types consist of:

'*'
optional storage flags ('con', 'ro', 'volatile', 'nonNil', 'nilOk')
pointed-to type

The 'nonNil' and 'nilOk' attributes are only valid if the pointed-to type is a tracked or pointer type - they refer to the pointed-to value, and not to the pointer itself. A pointer type with the 'ro' attribute does not allow the pointed-to value to be changed. This differs slightly from the usual meaning of 'ro' for fields. Pointers are created by putting '&' in front of some parameter, variable or field reference. The attributes of that parameter, variable or field reference determine the attributes of the pointer value. Attributes must be consistent when assigning pointer values. Pointers can be "followed" by putting '*' after a pointer expression. The attributes of the resulting reference come from those in the pointer type. Such a reference can be fetched from or assigned to.

Since pointer creation is a privileged operation in Zed, the language makes them somewhat harder to use than in other languages, like C. The Zed compiler and code generators are free to assume that pointers never alias, that is, that stores through a pointer will not affect loads or stores through any other active pointer, tracked value or '@' in the program. Since only experienced programmers will have their pointer-creating code accepted for widespread use, and such use is expected to be quite constrained, it should be possible for this to work without making Zed unusable for those low-level operations which absolutely require pointer operations. Programmers can guarantee the expected order of loads and stores through pointers which might be aliased by making the pointers 'volatile'.
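C has no blanket no-alias rule. However, its 'restrict' qualifier lets a programmer promise the same thing for particular pointers, so it is a reasonable model of what a Zed compiler is allowed to assume. The sketch below (hypothetical function names) also shows why the assumption matters. When two pointers do alias, the result of even a simple loop depends on the compiler reloading values after each store.

```c
#include <assert.h>
#include <stddef.h>

/* 'restrict' promises that dst and src do not overlap during the call,
   so the compiler may keep values in registers and reorder accesses. */
static void scale_into(double *restrict dst, const double *restrict src,
                       size_t n, double factor) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * factor;
    }
}

/* No 'restrict': the compiler must assume total and count may alias,
   and reload *count after every store to *total. */
static void accumulate(long *total, const long *count, int times) {
    for (int i = 0; i < times; i++) {
        *total += *count;
    }
}
```

Calling accumulate(&x, &x, 3) with x initially 5 leaves x at 40, not 20, because each store through total changes the value that count points at. Had those parameters been declared 'restrict', that call would be undefined behaviour in C. In Zed, as the text above notes, pointers that might be aliased should be made 'volatile' instead.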

Since non-privileged programmers have no way to make tracked references (records, capsules, matrixes, etc.) point to overlapping storage, those are clearly non-aliased. '@' values can be aliased, for example if one is the '@' of a struct and another is the '@' of a field within that struct. In such situations, if the compiler cannot guarantee that no aliasing can happen, it might generate less optimal code than the equivalent situation using pointers. This reflects that the primary goal of Zed is that of correct code, with efficiency of execution being secondary to that.

Non-privileged code can use pointers provided to them by privileged code. See "14 Privileged Versus Non-privileged Code" for more information on the use of pointers. If a pointer value is not 'nonNil', then a "nil check" is done when it is "followed" by non-privileged code, but not when it is "followed" by privileged code.
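A nil check of this kind amounts to a test before the dereference. The C sketch below uses hypothetical helper names and reports failure to the caller. This section does not describe how Zed actually reports a failed nil check. Privileged code, like the second helper, simply skips the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the check performed before "following" a pointer
   that is not known to be nonNil. Failure is returned, not trapped. */
static bool follow_checked(const int *p, int *out) {
    if (p == NULL) {
        return false;   /* nil check failed: do not dereference */
    }
    *out = *p;          /* p is known to be non-nil here */
    return true;
}

/* A nonNil pointer (or privileged code) needs no check. */
static int follow_unchecked(const int *p) {
    return *p;
}
```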

Type '* void' is much like C's "void *" - it is a "universal" pointer type. Values of any pointer, tracked or '@' type can be assigned to destinations of type '* void'. '* void' values cannot be dereferenced, however.

As with '@' types, subject to the rules in "6.1 Assignment Statement" concerning storage flags, pointer types are compatible if the pointed-to types are the same or one is a direct rename of the other.

4.14 Proc Types

The syntax of proc types is that of 'proc' followed by a "proc header", described above in the description of procs as package elements. Proc types in Zed are used for variables, fields, etc. whose values are references to actual procs. Some example proc types:

proc()void
proc(uint var a, b)uint
proc(string nonNil s1)nonNil string
proc(@ nilOk [10] bool aFlags; uint pos; string tag)bool

There are two kinds of package-level items which can start with 'proc'. These are a proc predeclaration or definition, and the declaration of a package variable of a proc type. The standard Zed parser needs to be able to decide which case is present when it sees 'proc'. It does this by peeking ahead at the next character, to see if it is a '(' or not. When it does this, it does it in a mode where whitespace does not continue onto other lines. A consequence of this is that, at the package level, a proc type cannot be broken across lines after the 'proc' token. It is not expected that this issue will affect anyone, and it is described here only for completeness.

See "6.5 Proc Calls" for information on calling procs.

Proc values come from actual procs - a proc is a value of the proc type matching its proc header, unless it was defined with an overriding proc type, in which case its type is that overriding type. Overriding proc types are generally used to indicate that the proc is intended for use as a proc of that type. This differs from a proc just happening to have a matching type - the explicit overriding type is a deliberate indication of the proc's intended use.

For example:

    type BinaryOperator_t = proc(float a, b)float;

    BinaryOperator_t: proc
    sum(float a, b)float:
        a + b
    corp;

    BinaryOperator_t: proc
    difference(float left, right)float:
        left - right
    corp;
    ...
    BinaryOperator_t CurrentOp := difference;
    ...
    CurrentOp := ../PartnerPackage/PartnerOperator;

The use of the overriding proc type "BinaryOperator_t" on "sum" and "difference" shows that those procs are specifically intended to be used for that purpose. Note that "difference" has changed the parameter names, to clarify which is which in its body.

Only storage flags 'nonNil' and 'nilOk' are relevant for the formal parameters in proc types. Storage flags such as 'var' and 'volatile' matter in an actual proc, since they affect how the proc deals with its formal parameters, but they are not part of the proc's type.

Proc type compatibility is strict - the number, types, storage flags and names of the formal parameters must match, as must the result type and any result 'nonNil' indication. The use of an overriding proc type can allow the formal parameters to be renamed to something more helpful in the proc body. The grouping of formal parameters is not significant, i.e. a proc header with "bool a, b;" is equivalent to one with "bool a; bool b;", etc.

Proc values can only be compared against 'nil' - you cannot compare explicit proc values. You can compare "Proc/Proc_t" values against explicit proc names. Proc values can be called, but "Proc/Proc_t" values cannot since their proc type is not known at compile time (see "6.6.1 "assign""). Also, values of type "Proc/Proc_t" typically only exist at compile time, so calling such a value at runtime makes no sense. However, as mentioned previously, the Zed compiler internals are part of the Zed runtime, so it is possible to create "Proc/Proc_t" values at run time. Calling a proc value is done by placing an expression yielding the proc value inside braces, and then following it by the actual parenthesized parameter list. Building on the previous example:

    proc
    runOperator(float f1, f2)void:
        float result := {CurrentOp}(f1, f2);
    corp;

The braces alert the reader that indirection to the actual proc is taking place. This same syntax is used when calling methods from interfaces and capsules.
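Readers coming from C may find it useful to see the same example with C function pointers. This is an analogue, not a translation. Zed 'float' is assumed to map to C double, and runOperator returns its result so that the result can be checked. Note that C ignores parameter names in function types and allows two function pointers to be compared with each other. Both of these behaviours differ from the Zed rules above.

```c
#include <assert.h>
#include <stddef.h>

/* Analogue of: type BinaryOperator_t = proc(float a, b)float; */
typedef double (*BinaryOperator_t)(double a, double b);

static double sum(double a, double b) { return a + b; }

/* The different parameter names are irrelevant to the C type. */
static double difference(double left, double right) { return left - right; }

/* Corresponds to: BinaryOperator_t CurrentOp := difference; */
static BinaryOperator_t CurrentOp = difference;

static double runOperator(double f1, double f2) {
    /* (*CurrentOp) makes the indirection visible, much as Zed's
       {CurrentOp} braces do. C would also accept CurrentOp(f1, f2). */
    return (*CurrentOp)(f1, f2);
}
```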

4.15 Other Types

The remaining kinds of types in Zed are 'path's, 'capsule's, 'interface's, types created by instantiating generics, and 'template' types. Path, capsule, interface and template types are tracked types. All of these types are explained in later sections. Syntactically, 'path' and 'template' types are like '@' types. See "22.3.2 Path Types" and "18.9.1 Template Types".

4.16 Type Equivalence

Most types in Zed have names. As has been mentioned previously, no two such types are equivalent. However, if type A is just a rename of type B, then A and B are assignment compatible. Types that don't inherently have names include matrix and array types, proc types and pointer and '@' types. As usual, if a named type is created that names such a type, then that named type is not equivalent to some other named type that names the identical type.

The unnamed types themselves can be equivalent, subject to restrictions shown below.

'@' types are equivalent if the '@'-ed types are equivalent and the storage flags are the same.

Pointer types are equivalent if the pointed-to types are equivalent and the storage flags are the same.

Template types are equivalent if the templated types are equivalent and the storage flags are the same.

Path types are equivalent if the referenced types are equivalent and the storage flags are the same.

Array types are equivalent if they have equivalent element types, have the same number of dimensions, and each bound expression or type is equivalent from one type to the other. Bound expressions are equivalent if they are literal constants with the same value, or if they are references to the same named constant (usually of type 'uint'). The paths to the named constant do not have to be the same, but the final target must be the same. Slight extensions are implemented to allow generics to deal with array types. [It would be possible to allow more complex expressions, doing a recursive comparison of the entire expression, but this is not deemed worthwhile. Currently, a minor annoyance is that uses of "Types/Range" are not considered for equivalence.]
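The bound-comparison rule can be expressed as a small check. The following C model is a hypothetical illustration only. It ignores element types and type-valued bounds, and it represents a named constant by the address of its resolved definition, so that different paths to the same constant compare equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a bound expression: only the two forms the
   equivalence rule accepts, plus "anything else". */
typedef enum { BOUND_LITERAL, BOUND_NAMED_CONST, BOUND_OTHER } BoundKind;

typedef struct {
    BoundKind   kind;
    uint64_t    literal;  /* valid when kind == BOUND_LITERAL */
    const void *target;   /* resolved constant, when BOUND_NAMED_CONST */
} Bound;

static bool bounds_equivalent(const Bound *x, const Bound *y) {
    if (x->kind == BOUND_LITERAL && y->kind == BOUND_LITERAL)
        return x->literal == y->literal;
    if (x->kind == BOUND_NAMED_CONST && y->kind == BOUND_NAMED_CONST)
        return x->target == y->target;  /* same final target, any path */
    return false;  /* mixed forms and complex expressions never match */
}

/* Same number of dimensions, and each bound equivalent in turn. */
static bool array_bounds_equivalent(const Bound *xs, size_t nx,
                                    const Bound *ys, size_t ny) {
    if (nx != ny) return false;
    for (size_t i = 0; i < nx; i++)
        if (!bounds_equivalent(&xs[i], &ys[i])) return false;
    return true;
}
```

Under this rule, a literal 10 and a named constant whose value is 10 are not equivalent, because they are of different forms.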

Matrix types are equivalent if the element types are equivalent, the numbers of dimensions are the same, the element storage flags are the same and the defining packages are the same (since storage flag 'private' is linked to the defining package).

Proc types are equivalent if:

the result types are equivalent
the result 'nonNil' flags match
the formal parameter counts are the same
corresponding formal parameter types are equivalent
corresponding formal parameter storage flags are the same
corresponding formal parameter names are the same

Note the requirement for formal parameter names matching. This prevents most accidental matches - i.e. procs being used for purposes other than what they were created for. Procs can be explicitly given a named proc type using an overriding proc type, as described earlier. This also allows the proc to have different formal parameter names than are used in the named proc type.
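As a sketch of the rule list above, the following C model compares two proc types, including formal parameter names. Types are reduced to integer identifiers, and parameter grouping (such as "bool a, b") is assumed to have been expanded already. None of these names come from the Zed compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model: equal type ids mean equivalent types. */
typedef struct {
    int         type_id;
    unsigned    flags;   /* in Zed only nonNil/nilOk would matter */
    const char *name;
} Formal;

typedef struct {
    int           result_type_id;
    bool          result_non_nil;
    size_t        n_formals;
    const Formal *formals;
} ProcType;

static bool proc_types_equivalent(const ProcType *x, const ProcType *y) {
    if (x->result_type_id != y->result_type_id) return false;
    if (x->result_non_nil != y->result_non_nil) return false;
    if (x->n_formals != y->n_formals) return false;
    for (size_t i = 0; i < x->n_formals; i++) {
        const Formal *a = &x->formals[i];
        const Formal *b = &y->formals[i];
        if (a->type_id != b->type_id || a->flags != b->flags) return false;
        if (strcmp(a->name, b->name) != 0) return false;  /* names count */
    }
    return true;
}
```

With this model, proc(float a, b)float and proc(float left, right)float are not equivalent. This is why "difference" in the earlier example needs the overriding proc type.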

Type equivalence is not the same thing as assignment, etc. compatibility. If a source and destination in an assignment have equivalent types, then the assignment, etc. is allowed. However, such assignments can be allowed where the source and destination are not of equivalent types. For example, 'uint' and the "bitsXX" types are compatible, even though they are not the same types. "6.1 Assignment Statement" gives additional rules.

4.17 Type Values

In Zed, types are "first class entities". This means that they can be used as values. When used that way, they have type "Types/Type_t", which is the Zed compiler's internal record type for types. The basic built-in types can be directly used as values, as can type expressions (types constructed from other types).

Note that types are entities whose representation exists at compile time in the compiler itself. When running under the "zed" command, all of the compilation data is still present when user code runs under the bytecode engine. Thus, the kinds of activities shown below, and more, will work.

However, when running native code binaries created by the "zedc" compiler, no such representations exist at runtime. Since the compiler internals are part of the Zed libraries, user programmers can create types, procs, etc. at runtime, and deal with those as values. But much of what is shown below cannot be done in native runtime code, since the required information about the types involved may not be present.

Some examples:

    Types/Type_t nonNil myType := uint;
    doSomethingWithType(@myType);
    if myType = sint then
        doSignedStuff();
    elif myType = string then
        doStringStuff();
    fi;
    myType := [,] float;
    ...
    checkTypes(myType, [,] float);
    ...
    myType := MyRecord_t;
    myType := MyCapsule_t;
    myType := MyInterface_t;
    myType := MyInstantiation.GenericRecord_t;
    myType := proc(string nonNil s1, s2; uint len)nonNil string;
    myType := * [3, VEC_LEN + 7] float;

Constant expressions of type "Types/Type_t" can be used as types in declarations, and in other places where direct types are normally used. Some examples:

    bool DO_SIGNED = true;
    ...
    type MyType_t = if DO_SIGNED then sint else uint fi;
    MyType_t MyVar1 := 123456;
    if not DO_SIGNED then uint else sint fi MyVar2 := 0x1122334455667788;
    ...
    proc ctProc
    GetMetresType(bool signed)nonNil Types/Type_t:
        if signed then sint(m) else uint(m) fi
    corp;
    ...
    GetMetresType(true) NewVar := 107(m);

These kinds of uses can become hard to follow and hard to format, so programmers should restrict them to cases where they are truly necessary. It can also be difficult to spot possible problems - for example, if "DO_SIGNED" were 'false', the initialization of "MyVar1" above would fail if it were given a negative value.

The following example shows a legitimate use for access to types and to procs which are normally internal parts of the compiler. As such, fully understanding it requires some familiarity with the workings of the Zed compiler, but a rough understanding is straightforward. This example also uses capsules, which are explained in "9 Interfaces and Capsules".

    capsule Inner_t {
        record {
            bool in_flag;
            uint in_n;
        };
    };

    capsule Middle_t extends Inner_t {
        record {
            bool mid_flag;
            uint mid_n;
        };
    };

    capsule Outer_t extends Middle_t {
        record {
            bool out_flag;
            uint out_n;
        };
    };


    proc
    showOffsets()void:
        Types/Type_t nonNil t := Outer_t;
        t := Types/SkipNameAndExec(t);
        assert select cap := t->t_capsule;
        Fmt("Bytesize of Outer_t: ", cap->cap_byteSize);
        assert assign [] ro Types/Field_t nonNil vec := cap->cap_initVec;
        for i from 0 upto getBound(vec) - 1 do
            assert assign Types/Field_t nonNil fld := vec[i];
            Fmt(fld->fld_name, ": ", fld->fld_offset);
        od;
    corp;

Output from running this code is:

    Bytesize of Outer_t: 56
    in_flag: 24
    in_n: 32
    mid_flag: 25
    mid_n: 40
    out_flag: 26
    out_n: 48

Zed is using a 64 bit model, so things like 'uint' are 8 byte values, and require 8 byte alignment. This example shows how Zed can pack fields tightly - the three 'bool' fields are all within the same 8 byte unit. Note that in this example, long-form declarations have been used, to show the actual types involved.
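This section does not say which placement strategy produces these offsets. A simple first-fit strategy reproduces them exactly. It places each field, in declaration order, at the lowest suitably aligned offset whose bytes are still free. The reproduction also assumes a 24 byte header ahead of the first field (inferred only from the first offset above), 1 byte 'bool' fields, and 8 byte, 8-aligned 'uint' fields. The C sketch below implements those assumptions. It is not the Zed compiler's actual algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LAYOUT_MAX 256  /* enough bytes for this small illustration */

typedef struct {
    bool   used[LAYOUT_MAX];
    size_t end;  /* one past the highest byte in use */
} Layout;

/* Reserve an assumed header region at the start of the object. */
static void layout_init(Layout *l, size_t header_bytes) {
    memset(l->used, 0, sizeof l->used);
    for (size_t i = 0; i < header_bytes; i++) l->used[i] = true;
    l->end = header_bytes;
}

/* First fit: lowest offset with the required alignment whose bytes are
   all free, so small fields can fill padding holes left earlier. */
static size_t layout_place(Layout *l, size_t size, size_t align) {
    for (size_t off = 0; off + size <= LAYOUT_MAX; off += align) {
        bool free_run = true;
        for (size_t i = off; i < off + size; i++) {
            if (l->used[i]) { free_run = false; break; }
        }
        if (free_run) {
            for (size_t i = off; i < off + size; i++) l->used[i] = true;
            if (off + size > l->end) l->end = off + size;
            return off;
        }
    }
    return (size_t)-1;  /* out of room in this sketch */
}

/* Total size, rounded up to the object's alignment. */
static size_t layout_size(const Layout *l, size_t align) {
    return (l->end + align - 1) / align * align;
}
```

Placing the six fields in the order Inner_t, Middle_t, Outer_t gives the offsets 24, 32, 25, 40, 26 and 48, and a total size of 56. These values match the output listed above.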

5 Declarations

Declarations of some compound types have been described above, and others are described later. The most common declarations are those for variables and constants. Both of these can appear at the package level or within procs, at the local level (not to be confused with 'local' visibility on package-level declarations).

5.1 Scopes

A "scope" in a programming language is usually a portion of the text of the program within which some set of names is defined. In Zed, a proc body is a scope, and several constructs in the Zed language introduce additional scopes. These include 'if', 'while', 'for' and 'case'. The range of validity of a particular local name is from its point of declaration until the end of the scope it is declared in.

Consider the following example:

    proc
    scopes(uint n; float f)uint:
        uint sum := 0;
        string nonNil s := "default";
        while
            uint tempN := func(n, f, sum);
            sum := sum + tempN;
            tempN < 1_000_000
        do
            float tempF := otherFunc(f, s);
            f := f - flt(sum) + tempF;
        od;
        sum
    corp;

Variables "sum" and "s" are declared at the top of the proc, and so they are available until the end of the proc body. Proc formals are similarly available from the beginning of the body until its end, and so "n" and "f" are considered to be in the same "scope". Variable "tempN" is declared within the 'while', so it is available only until the end of the 'while'. Variable "tempF" is in the same scope as "tempN", but it does not appear until later in that scope - it cannot be used before its declaration.

[Implementation note: tracked variables require special handling in Zed when Zed is using reference counting or garbage collection. This handling is needed on scope exit and sometimes on scope entry. 'return'ing out of a scope containing tracked variables requires additional handling.]

In Zed, new local names are not allowed to override or "hide" names which are visible within the scope that a new name would be declared in. For a proc local, this means that a new name is disallowed if that same name already exists in the current scope or an outer (surrounding) scope, is a formal parameter to the proc, or exists at the package level. This differs from many other programming languages, where such overriding or "hiding" is allowed.

Note that if a name is declared in an outer scope after an inner scope has ended, that name cannot cause a conflict with the same name in that already-ended inner scope. Names in an outer scope before an inner scope name is declared do cause a conflict, but names in that same outer scope after the inner scope has ended do not cause a conflict. This might seem inconsistent, but it also seems "natural", in that a name that has not yet been defined does not cause a conflict.

The same rule is used for package-level names. A proc cannot declare, as a formal or as a local, a name that is the same as a package-level name that appears before the proc. However, the exact same name declared in the package after that proc does not cause a conflict.

Note that this rule extends to names 'import'-ed into the package, and to the names of packages which are 'use'-ed by the current package.

[Many language designers and programmers will not like this "order matters" choice. However, I've seen too many bugs caused by having a local name silently override a more global name, and that overriding stop at the end of a scope. Requiring the compiler to go back and insert errors in declarations long after a declared name has gone out of scope because the same name is later declared in an outer scope is also confusing.]
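A compiler can enforce these rules with a stack of scopes in which ending a scope discards its names. The C sketch below uses hypothetical names and fixed capacities. It treats package-level names as the outermost scope, and formals and locals share the proc's scope, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 64
#define MAX_DEPTH 16

/* Hypothetical checker: names visible so far, plus where each scope starts. */
typedef struct {
    const char *names[MAX_NAMES];
    size_t      n_names;
    size_t      scope_start[MAX_DEPTH];
    size_t      depth;
} ScopeStack;

static void scopes_init(ScopeStack *s) { s->n_names = 0; s->depth = 0; }

static void scope_enter(ScopeStack *s) {
    assert(s->depth < MAX_DEPTH);
    s->scope_start[s->depth++] = s->n_names;
}

/* Leaving a scope forgets its names, so a later declaration of the same
   name in an outer scope does not conflict with them. */
static void scope_exit(ScopeStack *s) {
    assert(s->depth > 0);
    s->n_names = s->scope_start[--s->depth];
}

/* Reject a name already visible in this scope or any enclosing one:
   no hiding is allowed. Names not yet declared cannot conflict. */
static bool declare(ScopeStack *s, const char *name) {
    for (size_t i = 0; i < s->n_names; i++) {
        if (strcmp(s->names[i], name) == 0) return false;
    }
    if (s->n_names == MAX_NAMES) return false;  /* sketch capacity limit */
    s->names[s->n_names++] = name;
    return true;
}
```

Because the check only looks at names already on the stack, the "order matters" behaviour follows directly. A package-level name declared after a proc has ended never sees that proc's locals.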

5.2 Long-form Variable Declarations

Syntactically, long-form variable declarations consist of:

optional visibility specification when declaring package variables
type for variables
optional storage flags for variables
list of individual variable declarations, separated by commas

Each individual variable declaration consists of:

name for the new variable
optional ':=' and initial value expression for the variable

Only package variables can have a visibility specification - variables declared inside procs are valid from their declaration up to the end of the scope they are contained within.

Storage flags 'con', 'ro', 'volatile', 'nonNil' and 'nilOk' are relevant to variable declarations.

Variables which are declared 'con' must be initialized in their declaration, and can never be changed. No other assignment to such a variable is allowed. This property can allow optimizers to know that the value of such a variable (or field, etc.), once fetched, need never be fetched again.

Package variables which are marked 'ro' can only be assigned to by code within the same package - they are only readable by other code (assuming they have been exported so that other code can see them). If a proc formal or local variable is marked 'ro', then any '@' of the formal or local will be an 'ro' '@'. This means that they cannot be changed by any code not directly within the proc.

The 'volatile' attribute on variables has the usual meaning - all fetches from the variable and all stores to the variable must be performed, and they must be performed in the same order in which they occur during normal program execution. Additionally, all directly visible variables, fields, etc. with the 'volatile' storage flag are treated as a group, and all fetches and stores to any of them must be done in the same relative order. When run-time proc or method calls are present, all of the fetches and stores must be complete before any call. One way of looking at these rules is that completely unoptimized code, which literally does all operations as given, is correct, but optimized code is restricted in how it can move things around - it cannot change the order of any of the fetches from or stores to 'volatile' entities.

All values assigned to a 'nonNil' variable must themselves be 'nonNil', and that fact must be known at compile time. 'nonNil' variables must be initialized in their declaration. This means that 'nonNil' variables always contain a valid 'nonNil' value.

Tracked and '@' variables which are initialized to a 'nonNil' value in their declaration default to being 'nonNil'. If they must be able to hold 'nil' (so that 'nil' can be assigned to them at some point), then storage flag 'nilOk' must be given explicitly. Having 'nonNil' in the declaration storage flags is a way to require that all variables in the set be initialized and that all initialization values be 'nonNil'.

Note that there is only one set of storage flags for possibly multiple variable declarations. This, in combination with the rule about initialization with a 'nonNil' value in the absence of an explicit 'nonNil' or 'nilOk', means that a set of variables declared together might end up with different values for the 'nonNil' attribute.

Variable initializations are performed in the order in which they appear. The expression used to initialize a variable cannot use a name that is defined after that expression in the containing package or scope. The exact timing of package variable initializations is not specified, but it is guaranteed that they will be done before any procs in the package which reference package variables can run. It is an error to attempt to use a proc that has not yet been defined (i.e. that has only been pre-declared) in an expression to initialize a package variable (or constant).

Variables of non-'package' '@' types cannot be declared at the package level - they can only be local variables. Variables of '@' types not marked 'nilOk' must always be initialized in their declaration.

Variables, of types other than 'enum' types, which are declared but not initialized have undefined values. The value of such a variable, when used, is undefined. The compiler may attempt to warn of such situations, but it is not required to do so. The Zed system will do whatever it wishes in order to preserve its own integrity in such situations, but that behaviour is also not defined.

Example long-form variable declarations:

    [] bool con FlagVec := matrix([100] bool);
    uint FlagsUsed := 0;
    float StandardValue := computeStandardValue(),
        UpperLimit := StandardValue + VALID_RANGE,
        LowerLimit := StandardValue - VALID_RANGE,
        WorkingVar1, WorkingVar2, WorkingVar3;
    ...
    proc
    doExtensiveComputation(float f1, f2, f3, f4, f5, f6)float:
        float temp1 := f1 * f2 + 1.0, temp2 := (f5 + f6) / (f3 - f4);
        ...
        float temp3 := if temp1 < temp2 then temp1 else 0.0 fi;
        ...
    corp;

5.3 "con" and "var" Declarations

There are situations in many programs where the declaration of variables is one of the main contributors to the visual size of the code. For example:

    uint cnt := f(7), i := LIMIT - 1;
    bool completed := false;
    float average := 0.0;

That's three lines of code to declare four variables. Another situation which can be bulky is the common idiom of creating a new instance of a record or capsule, and assigning it to a local variable. This can be especially bad in Zed if the package exporting the type has a long name. For example:

    PackageInUse/NameOfRecord_t nilOk nor :=
        PackageInUse/NameOfRecord_t(nil, myKey, cnt + 3, false);

The 'con' and 'var' style of variable declaration is of help in these situations. In 'con' and 'var' declarations, no type is given. Instead, it is taken from the initialization value, which must always be present. Also, since the type comes from the initialization value, variables of differing types can all be declared in one declaration. The above examples can be done as:

    var cnt := f(7), i := LIMIT - 1, completed := false, average := 0.0,
        nor nilOk := PackageInUse/NameOfRecord_t(nil, myKey, cnt + 3, false);

Whether this is more or less readable will be a personal thing.

'con' declarations are the same as 'var' declarations except that storage flag 'con' is implicitly specified for all of the variables being declared. Such variables are given their initial value when they are created at run-time and can never again be changed. A surprisingly large proportion of variables can be of this kind.

For the programmer, or a program reader, using 'con' means that they can be sure that the variable will always have the value given in the declaration - there is no need to search for assignments to the variable to see if it is being changed. For the compiler, the 'con' storage flag helps as described above in "5.2 Long-form Variable Declarations".

There are two formats of 'con'/'var' declarations. The second form, con/var template variable declarations, is described in "18.9.5 Con and Var Template Declarations". Otherwise, 'con' and 'var' declarations consist of:

'con' or 'var'
list of individual variable declarations, separated by commas

Each individual variable declaration consists of:

name for the new variable
optional storage flags for the variable
':='
initialization value for the variable

Storage flags have the same meaning with 'con' and 'var' declarations as they do with long-form variable declarations. Note that there is a separate set of storage flags with each variable, so if a set of variables with consistent type and non-default storage flags is to be declared, it can be shorter to use the "long-form" of variable declarations. Also, if an initialization value for a pointer, tracked or '@' variable is 'nonNil', then that variable will default to 'nonNil', and an explicit 'nilOk' must be given to override it if needed. With 'con' declarations, using storage flag 'volatile' is an error, since that conflicts with storage flag 'con'.

Note that the storage flags come after the variable name. This is because storage flags placed before the first name in a 'con' or 'var' could easily be misinterpreted as applying to all variables in the set.

If variable declarations are done at the package level, they can be preceded by the usual visibility specification, which will apply to all of the variables declared in the 'con' or 'var' set.

The type of the initialization expression must be directly determinable, and must be a valid type for variables. For example, you cannot use 'nil' as an initializer in a 'con' or 'var' declaration, since it does not provide a proper type for the variable. If the initializer is a complex expression, the programmer must be sure they know the type rules for Zed expressions, so that they get the final type they are intending. If the rules involved are not fairly simple, it might be better to use a long-form declaration, so that readers can also know what type will be used.

If 'con' and 'var' are used as much as possible, an additional benefit of readability appears: if 'con' is used, the reader knows that the variable will not be changing, but if 'var' is used (or a long-form declaration is used without 'con'), the reader knows it *will* be changing. See "97.2 Current System Status" for warnings which can be useful in doing this.

One particular situation can be confusing. As mentioned earlier, a struct or array constructor actually yields the '@' of the constructed value. So, a variable declared in a 'con' or 'var' set using such a constructor will be of that '@' type, rather than directly of the struct or array type. Note also that you cannot put a struct or array initializer in a 'con' or 'var' declaration. This is because the compiler needs to know the expected type before it can properly parse such an initializer. Examining one attempted use of a struct initializer:

    con structVar := {1};

the syntax of Zed will attempt to treat "1" as a proc expression to call!

5.4 Long-form Constant Declarations

Constant declarations are very similar to variable declarations. [Currently, they can appear as local names, but I might want to remove that ability - only variables and constants can be local names currently, and it might clean things up if it was only variables.] They use '=' instead of ':=' to specify a value, which they must always have, and they cannot have any storage flags.

Syntactically, constant declarations consist of:

optional visibility specification when declaring package constants
type for constants
list of individual constant declarations, separated by commas

Each individual constant declaration consists of:

name for the new constant
'=' and value expression for the constant

The value expressions for constants must be constant expressions, i.e. something which the compiler can evaluate at compile time. This can include expression elements, 'if' and 'case' constructs which can be evaluated at compile time and calls to compile-time procs. The values must be assignment compatible with the specified constant type. The value expression for a constant must reference only symbols defined previous to it in its or surrounding contexts.

Constants can be of the following types only: 'bool', 'char', 'uint', "bitsXX", 'sint', 'float', 'string', 'enum' types, 'record' selector types, 'bits' types, 'oneof' types and types with units. Array and struct types are discussed below. Renames of these types are not allowed in this context.

Example constant declarations:

    uint COUNT = 10, COUNT2 = COUNT * 2;
    bool DO_CHECKS = true, DO_LOGGING = false;
    string INITIAL_PATH = ".";
    export float
        PI      = 3.141592653589793238,
        PI_DBL  = 6.283185307179586477,
        PI_HALF = 1.570796326794896619,
        PI_QUAR = 0.7853981633974483096;
    ...
    bits B1_t {
        bool b1_flag;
        uint b1_count;
        3 : uint b1_3bits;
        6 : 0b010100;
        Oo1_t b1_oo1;
    };
    B1_t B1_CONST1 = B1_t(false, 1, 0o7, oo1_two);
    ...
    bits Flags_t {
        bool fl_a;
        bool fl_b;
        bool fl_c;
        bool fl_d;
        bool fl_e;
        bool fl_f;
        bool fl_g;
        bool fl_h;
    };
    export(../BrotherPackage) Flags_t C_AND_H = Flags_t{fl_c, fl_h};

5.5 "def" Declarations

Short-form variable declarations with 'con' and 'var' were described in "5.3 "con" and "var" Declarations". A similar short form exists for constant declarations. Syntactically, they consist of:

optional visibility specification when declaring package constants
'def'
list of individual constant declarations, separated by commas

Each individual constant declaration consists of:

name for the new constant
'=' and value expression for the constant

These are identical to the individual constant declarations in the long form. The difference is that with 'def' declarations, no type needs to be given, and constants of different types can be given in the same 'def' set.

'def' declarations can be done at the proc and package levels.

Again, whether this form is more or less readable than the long form will vary from programmer to programmer.

The type of a constant given with a 'def' declaration is taken from its value expression. Note that there are no literals of the "bitsXX" types, so long-form constant declarations are still needed to declare constants of those types. A constant defined from an integer literal can be given type 'sint', rather than 'uint', by putting a unary '+' or '-' in front of the literal, as appropriate.

Some example 'def' declarations:

    def COUNT = 100;
    local def SIZE = 13.7;
    def AREA = 14(m) * 23(m);
    export(../Pk1, ../Pk2) def MIN = -10, MAX = +10, ZERO = +0;

5.6 Array and Struct Initializers

As mentioned previously, named array types and struct types allow multi-valued initializers. These are allowed for variables and for constants, at the package or proc level. These initializers appear in the "initial value" position in variable or constant declarations. Syntactically, an array initializer consists of:

'['
list of values for array elements, separated by commas
']'

Similarly, a struct initializer consists of:

'{'
list of values for struct fields, separated by commas
'}'

In both cases, the values can be simple scalar values, or can be array or struct initializers for inner arrays or structs. When providing initializers for constants, all elements must be appropriate constants. When providing initializers for variables, elements can be arbitrary run-time expressions. Constant initializers are 'con' - they can never change. [This allows them to be placed in physically read-only storage.] Initializers for structs must not include values for 'noInit' fields. This can result in valid struct initializers with no values, which is equivalent, at the top level, to not initializing a variable. If a struct is inlined into an outer struct, an initializer for the outer struct must also contain values for the fields of the inner struct, and those values are not in another level of braces.

As usual, the values in initializers must only use names defined previous to them in their and surrounding contexts. The order of evaluation of complex expressions containing side effects is not specified. If the order matters, then the programmer should use local names of element or field types, defined and initialized before the initializer containing the complex expressions, and then reference those local names instead of directly using the complex expressions.

Example array and struct initializers:

    [3] uint UIA1 = [1, 2, 3];

    struct Str1_t {
        uint str1_n;
        bool str1_flag;
        float str1_size;
        [4] char str1_ch;
    };

    Str1_t Str1 := {789, true, Math/PI, [".", ",", ":", ";"]};

    [3] Str1_t Str1s := [
        {111, false, Math/Sin(1.23), ["a", "b", "c", "d"]},
        {222, BoolVar, 4.56, ["A", "B", "C", "0" + UintVar]},
        {333, false, 7.89, ["0", "1", "2", "9"]}
    ];

Inside procs, named arrays and structs can also be initialized using an array or struct constructor. This is not recommended however, as it can be considerably less efficient than using an initializer.

If a one-dimensional array is being declared and initialized, the bound of the array can be given as '*' instead of an actual value. In this case, the resulting bound of the array type will be the number of initializing values that were given. The 'getBound' construct can be used with such an array to determine the bound value. In such a use, 'getBound' is a constant expression.

For example:

    [*] string StrSet := ["hello", "there", "world", "-", "how", "are", "you?"];

    proc
    showStrings()void:
        uint COUNT = getBound(StrSet);
        Fmt("There were ", COUNT, " strings");
        FmtN("[");
        for i from 0 upto getBound(StrSet) - 1 do
            if i ~= 0 then
                FmtN(", ");
            fi;
            FmtN("\"", StrSet[i], "\"");
        od;
        Fmt("]");
    corp;

Sometimes there can be a lot of initial values. For example, in a device driver for a complex device, such an initialized array could be the byte image of the firmware for the device, which is downloaded to the device by the driver. Note that declarations with initializers for arrays with '*' bound can only have one variable/constant in the declaration list, even though the syntax allows multiple.

[At the moment, there is no mechanism in Zed to declare the size of an initialized array and provide only an initial subset of the values for the array. Similarly, Zed has no mechanism to allow only a small number of selected values to be provided. The latter can be handled for writeable variables by initializing the array in the package init code.]

6 Statements, Expressions and Constructs

Many of the statements and constructs in Zed have been seen in previous examples. Information in this section is more specific and complete.

[Zed tends to use a reversed reserved word style for its compound statements, mixed with braces for many type declarations. This makes long sequences of compound statement ends more readable, but many programmers, used to C's braces, dislike it. Zed actually only has four reversed reserved words: 'fi', 'od', 'esac' and 'corp'. Except for 'fi', all are easy to pronounce. I pronounce 'fi' as if it were spelled "fie", as in "fee fie foe fum...".]

Statements in Zed are always executed in the order in which they appear. The compiler is not free to extract common subexpressions from statements and only execute them once unless it is certain that such an optimization cannot affect anything other than the run-time and memory reference sequence of the affected statements. If the programmer sees such common code and wants to only evaluate it once, they should create a local variable to hold the value, and then use that local variable in place of the repeated expression. The memory reference sequence can be important for things like device drivers using memory-mapped I/O or for memory shared between multiple execution contexts. In such situations the programmer should mark the memory as 'volatile', either directly or indirectly, so that the memory reference sequence is strictly determinable. If duplicate code is present because its effect on the execution time of the code is wanted, then it should be placed within a 'strict' section. See "8.9 "strict"".

6.1 Assignment Statement

Assignment in Zed uses the ':=' assignment operator. [I believe that historically a left-pointing arrow character was used for the assignment operator, but that generally doesn't exist on standard keyboards, so the character pair ":=" replaced it. The early 5-bit teletype encodings (ITA2 or Baudot) did not have '<'.] Zed uses '=' (and '==') for comparison operators. This is consistent with Algol-derived languages.

The assignment operator is also used when initializing variables. The meaning is the same, with the difference that it is possible to initialize a 'con' variable in the declaration, but not to assign to it later.

The basic assignment of, e.g. 'uint's, 'float's, tracked values, etc. is quite simple - the value is copied to the destination. However, there are a lot of rules for more obscure cases:

procs that run at compile time cannot be used as Proc/Proc_t values
uninstantiated generic procs cannot be used as values outside of their defining generic
'nil' can be assigned to any tracked or pointer type
any tracked value can be assigned to types 'any' and 'autoAny'
any '@' value can be assigned to type '@ void'
any '@ package' value can be assigned to type '@ package void'
any pointer, tracked or '@' value can be assigned to type '* void'
any template value can be assigned to type Exec/Exec_t
values of types 'bool', 'char', 'uint', 'sint' or 'float' will automatically be wrapped in "Basic/" wrappers and so can be assigned to type 'autoAny' as needed
values of a capsule type which implements a given interface can be assigned to destinations of the interface type
'char' values can be used as 'string' values - a new string is allocated containing just that one character
"[] char" values can be used as 'string' values - a new string is allocated, containing the characters from the vector
the specific values '0' and '0.0' (without units) can be used as values of any similar type with units
a string literal of length 1 can be used as a 'char' value
a 'uint' constant not larger than SINT_MAX can be used as a 'sint'
any explicit reference to a proc can be used as a Proc/Proc_t value
types 'uint', 'bits8', 'bits16', 'bits32' and 'bits64' are mutually compatible - no range checking is done
'bits' types are compatible with 'uint' and the above "bitsXX" types, and vice-versa; however two different 'bits' types are not compatible
'oneof' types are compatible with 'uint' and the above "bitsXX" types, and vice-versa; however two different 'oneof' types are not compatible
any path value can be assigned to type 'path void'
subtypes for 'template', '@', pointer types, path types, and element types for array and matrix types, must match exactly, or one must be a direct rename of the other
storage flags for 'template', '@', pointer types, path types, and element types for matrix types, must match, with exceptions described later
matrix types must have the same number of dimensions
a value of a capsule type can be assigned to a destination whose type is a capsule type which is extended by the value type
a value of an interface type can be assigned to a destination whose type is an interface type which is extended by the value type
if the type of the destination is a direct rename of the type of the value, or vice versa, the assignment is allowed, so long as the types are not proc types - named proc types must match exactly
if '@', pointer, template, path or matrix types match by these rules, then one, but not both of them, can be named and the types will match
if both value and destination types are renames of a common type, the assignment is not allowed
if the types involved result from instantiating the same generic type, they match only if they come from the same instance, even though they have the same names
values of generic type parameters can be assigned; however, for '@' generic parameter types, the assignment triggers instantiation of the containing proc

Storage flags on 'template', '@', and pointer types, and element types for matrix types must in general match for assignment compatibility. However, the rules are relaxed somewhat:

a value with a type without 'volatile' can be assigned to a destination of a type with 'volatile'
a value with a type without 'ro' can be assigned to a destination of a type with 'ro'
a value with a type with 'nonNil' can be assigned to a destination of a type without 'nonNil' but with 'ro'
a value with a type with 'con' can be assigned to a destination of a type without 'con' but with 'ro'
storage flags are ignored with 'template void'
since matrix elements must be initialized after matrix creation, matrix types cannot have 'nonNil'
a matrix value of a matrix type with 'private' can only be assigned to a destination of a matrix type with 'private' if the two types are defined in the same package
a matrix value of a matrix type with 'private' can be assigned to a destination of a matrix type without 'private' only in the same package as the value matrix type is defined in
a matrix value of a matrix type with 'private' can be assigned to a destination of a matrix type without 'private' but with 'ro'

Rules for 'path' types are given in "22.3.2 Path Types" and "22.3.5 Paths to Persistent Variables".

The above rules governing assignments and storage flags are only applicable to the top level storage flags in a template, '@' or pointer type. If the storage flags differ "further down" in the type, the rules are not applied and the storage flags must match exactly. For example, the following are allowed (showing just the types):

    * ro uint := * uint;
    * volatile char := * char;
    * ro * ro float := * con * ro float;

but the following are not:

    * * ro uint := * * uint;
    * * volatile float := * * float;
    * * ro char := * * con char;

Note that assigning a tracked or '@' value to a '* void' destination is considered to be creating a pointer, and as such is privileged. The distinction between '@ void' and '@ package void' destinations is made to allow code to maintain the 'package' property while still having a "universal" '@' type.

As with binary operators, the evaluation order of the destination and source expressions is not defined in Zed. This only matters if they contain side effects. See the section on "6.4 Binary Operators" for ways to prevent reordering.

There is at least one "assignment compatibility" error that requires additional explanation. That error is "'nonNil' '@' cannot be received by writeable plain '@'", and the similar errors for pointer and template types. The 'nonNil' storage flag on an '@' type indicates that the '@'-ed value cannot be 'nil'. Assigning one such value to a destination results in the two referencing the same storage. If the destination permits 'nil' and is writeable, it could be used to store 'nil' into storage that the source still treats as 'nonNil', so in general the 'nonNil' storage flags must match. If the destination '@' is 'ro' or 'con', i.e. is not writeable, then that reference cannot be used to change the value, and so cannot violate the 'nonNil' status of the source value; such an assignment is allowed. Referencing through that destination will incur run-time checks for 'nil' when they aren't needed, but there is no semantic problem. The reverse situation, where a value without the 'nonNil' storage flag is assigned to a destination with the 'nonNil' storage flag, is prevented with the error message "Plain '@' cannot be received by 'nonNil' '@'".

6.2 Operator Precedence

Not all operators in Zed have the same precedence. In other words, how tightly they "bind" to their operands varies. The relative precedence of the operators is shown in the table that follows. Parentheses can be used to change the way the operators apply. Note that "#" versions of operators have the same precedence as the corresponding non-"#" operators.

When parentheses are used in expressions, their only function is to allow the overriding of operator precedence. They do not have any other semantic meaning. The compiler is free to ignore parentheses that do not affect the evaluation of expressions due to operator precedence. For example, the expression:

    a + (b - c)

is considered to be equivalent to:

    a + b - c

If the programmer wishes to have the "b - c" subexpression evaluated first, then they must compute it separately, or use 'strict'. See "8.9 "strict"". This might be desired, for example, if it is possible that "a + b" will overflow, but the pre-subtraction of "c" from "b" prevents that.

Highest precedence
postfix operators, indexing, '##', field selections, constructors, proc calls, etc.
prefix '@' and '&'
prefix unary '~', '-' and '+'
binary '&', '><', '<<', '>>', '>~' and '<~'
binary '|'
binary '<>'
binary '^'
binary '*', '/' and '%'
binary '+' and '-'
comparisons: '=', '~=', '<', '>', '<=', '>=', '==' and '~=='
prefix unary 'not'
binary 'and'
binary 'or'
assignment operator ':='
Lowest precedence

C programmers should note that the boolean operators are not at the same precedence level as they are in C. The consequence is mostly that fewer parentheses are needed in some cases.
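To make the table concrete, here is a small precedence-climbing parser, written in Python as a model rather than Zed code, that fully parenthesizes an expression according to the binary levels in the table and the position of prefix 'not'. It assumes every binary operator is left-associative (the table does not state associativity), requires tokens to be separated by spaces, and ignores unary, postfix and assignment operators. The names are hypothetical.

```python
# Binary precedence levels from the 6.2 table; a higher number binds tighter.
# Assumption: every binary operator is left-associative.
BINARY_PREC = {
    "&": 9, "><": 9, "<<": 9, ">>": 9, ">~": 9, "<~": 9,
    "|": 8,
    "<>": 7,
    "^": 6,
    "*": 5, "/": 5, "%": 5,
    "+": 4, "-": 4,
    "=": 3, "~=": 3, "<": 3, ">": 3, "<=": 3, ">=": 3, "==": 3, "~==": 3,
    "and": 1,
    "or": 0,
}
NOT_PREC = 2  # prefix 'not' sits between the comparisons and 'and'


def parenthesize(expr: str) -> str:
    """Fully parenthesize a space-separated expression of names and binary operators."""
    tokens = expr.split()
    pos = 0

    def operand() -> str:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "not":
            # 'not' applies to everything that binds more tightly than it does.
            return f"(not {binary(NOT_PREC + 1)})"
        return tok

    def binary(min_prec: int) -> str:
        nonlocal pos
        left = operand()
        # Non-operator tokens map to -1, which always ends the loop.
        while pos < len(tokens) and BINARY_PREC.get(tokens[pos], -1) >= min_prec:
            op = tokens[pos]
            pos += 1
            right = binary(BINARY_PREC[op] + 1)  # +1 gives left associativity
            left = f"({left} {op} {right})"
        return left

    return binary(0)
```

For example, "a + b & c" groups as "(a + (b & c))" here, whereas C groups the same text as "(a + b) & c".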

6.3 Unary Operators

In general, the result type of a unary operator in Zed is the same type as the operand. The unary operators in Zed are:

prefix '~' - this is the "bits not" operator, which flips the bits in its argument. It accepts values of types 'uint', 'bits8', 'bits16', 'bits32' and 'bits64', or single renames of them, and yields the same type it is given.
prefix '-' - this is the negation operator, which negates signed values. It operates on values of types 'sint', 'float', 'bits8', 'bits16', 'bits32' and 'bits64', along with 'sint' and 'float' types with units, or single renames of any of these. The result will be of the same type as the operand. Negation also accepts 'uint' constants which are small enough to be 'sint' values, and yields a 'sint' (perhaps with units) result. When negating the "bitsXX" types, no checks for overflow/underflow are done. With 'sint' values, it is possible to get an overflow because the smallest negative value cannot be negated in 2's complement representation. Utility routine Basic/SintNeg is available for situations where explicit error detection is required.
prefix '+' - the prefix '+' operator is essentially a no-op. It operates on 'sint' and 'float' values as well as those types with units, or single renames of any of those. If given an in-range 'uint' constant, it will return that constant as a 'sint' (perhaps with units). Its main use is to allow the programmer to format literals using '+' in the same way as formatting literals using '-'. Utility routine Basic/UintToSint is available for explicit error detection.
prefix 'not' - this operator works only with 'bool' values, and simply reverses the condition.
prefix '@' - this is the "enref" operator, which takes the address of its operand, yielding a value of an '@' type. Storage flags for the resulting type are taken from those of the value.
postfix '@' - this is the "deref" operator, which takes a value of an '@' type and accesses the referenced value.
prefix '&' - this is the "address-of" operator, which takes the address of its argument, yielding a pointer type. Storage flags come from those on the value. This is a privileged operation.
postfix '*' - this is the address dereference operator, which takes a pointer value and accesses the pointed-to value. C programmers note that Zed has this as a postfix operator, not a prefix operator.
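The fixed-width behaviour of '~' and '-' can be modelled in Python as follows. This is a sketch, not Zed code: the function names are hypothetical, "bitsN" values are plain non-negative integers masked to N bits, and 'sint' is assumed to be 64-bit two's complement.

```python
def bits_not(value: int, width: int) -> int:
    """'~' on a bitsN value: flip every bit within the type's width."""
    return ~value & ((1 << width) - 1)


def bits_neg(value: int, width: int) -> int:
    """'-' on a bitsN value: two's complement negation, no overflow check."""
    return -value & ((1 << width) - 1)


def sint_neg(value: int, width: int = 64) -> int:
    """'-' on a sint (assumed 64-bit): the most negative value cannot be negated."""
    if value == -(1 << (width - 1)):
        raise OverflowError("cannot negate the smallest sint value")
    return -value
```

Note that '~' on the same value gives different results for different widths: bits_not(0x0F, 8) is 0xF0, but bits_not(0x0F, 16) is 0xFFF0.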

If the operand to '~', '-', '+' or 'not' is a compile-time constant, then the result will also be a compile-time constant.

There are constructs in Zed, described later, which are operator-like in their operation but do not have the simple operator syntax. They are: 'toUint', 'fromUint', 'unit', 'byteSwap', 'evenParity', 'onesCount', 'lowOneIndex', 'highOneIndex', 'flt', 'round', 'trunc' and 'pretend'.

See also section "13.2 "#" Unary Operators" for information on user-defined versions of the unary operators.

6.4 Binary Operators

The binary operators in Zed generally do what most programmers would expect them to do. However, there are some special cases that must be detailed. In general, all binary operators will be evaluated at compile time when both operands are known at compile time. Some situations are slightly more permissive with constant operands. For example, 'uint' or "bitsXX" constants which are in-range are allowed to operate with 'sint' values. Similarly, positive 'sint' constants can be used as unsigned values. In cases where at least one of a pair of constant operands has type 'sint' the operation is signed, but the result, if it is positive, might be usable as an unsigned constant. Without special cases like this, the simplest 'sint' operations would be disallowed.

When operands are constants, the Zed compiler may issue warnings or remove unneeded operations. See "97.2 Current System Status" for more information. Operations on "bitsXX" values are not checked for overflow, etc. at run time, but the compiler may do such checks at compile time. Warnings produced by these extra checks are not enabled by default. Again, see "97.2 Current System Status".

Usually, when an operator accepts 'uint' operands, it also accepts operands of the "bitsXX" types. In such situations, non-constant 'uint' values are treated a lot like "bits65", in that they are allowed, and they win the "size" contest for result type. If both operands are actually "bitsXX", the result type is usually the larger of the operand types. Values of smaller "bitsXX" types are expanded with high-order 0 bits to the size of the larger operand before the operation is done. If 'uint' constants are used with "bitsXX" types, they are treated as if they had the smallest "bitsXX" type that is large enough to contain their value.

One level of type names is stripped from the types of operands to binary operators. Thus, if both operands are renames of one type, both renames are effectively removed, and the operation will be allowed. This differs from the assignment operation, parameter passing, etc. where having both types be renames is an error. With those situations, the two types play different roles, whereas with binary operators the two play the same role. [Arguments can be made that this is not the way to do this. Disallowing it, however, might make the use of, say, renames of integral types, less useful.]

Operands with units are handled specially by those operators that allow them. If operand types are renames of allowed types, that renaming will always be stripped off when choosing the type of the expression result. This will very rarely matter to programmers, and if it does, the programmer can use intermediate variables to more precisely control operand types. In extreme cases, programmers might wish to define their own artificial measures and units to force more checking.

Examples:

    bits8 FLAG_MASK = 0x7c;

    "FLAG_MASK + 50" - result type is bits8

    "FLAG_MASK + 200" - result type is bits8, but overflow has happened

    "FLAG_MASK + 300" - result type is bits16

    bits8 runtimeVar := FLAG_MASK;
    ...
    "runtimeVar + 200" - overflow not detected, result has type bits8

[I expect that the current implementation differs in details between the bytecode engine and the early X86-64 code, in terms of behaviour with mixed sizes and/or overflows. Does that actually matter - are such things best left as specifically undefined? I would like operations involving the "bitsXX" types to be efficient, since those are low-level operations, so I would like to avoid a lot of extra masking.]
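The "size contest" in the examples above can be modelled as follows. This is a hypothetical Python sketch of the rules as described, not the compiler's implementation. It wraps silently, as unchecked "bitsXX" arithmetic does at run time, ignores any compile-time warnings for constant operands, and does not model non-constant 'uint' operands (which would win the contest like a "bits65"). As the note above says, real implementations may differ in detail.

```python
WIDTHS = (8, 16, 32, 64)  # the bitsXX sizes


def const_width(value: int) -> int:
    """Smallest bitsXX width that can hold a non-negative 'uint' constant."""
    return next(w for w in WIDTHS if value < (1 << w))


def bits_add(a: int, a_width: int, b: int, b_width: int) -> tuple[int, int]:
    """Add two bitsXX values: the result has the larger width and wraps silently."""
    width = max(a_width, b_width)
    return (a + b) & ((1 << width) - 1), width


FLAG_MASK = 0x7C  # a bits8 constant, as in the example above
```

Here FLAG_MASK + 200 stays 8 bits wide and wraps to 68, while FLAG_MASK + 300 is widened to 16 bits because 300 does not fit in 8.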

See also "13.3 "#" Binary Operators" for information on user-defined versions of the binary operators.

The order of evaluation of the two operands in a binary expression is not defined. If the order matters, e.g. due to hidden side effects, then the programmer should assign both operands to local variables, and then do the binary operation using those local variables. A comment describing why that is necessary is usually a good idea.

[The issue of operand re-ordering occurs in several situations. One of the commonly important situations is in proc calls. If actual parameters are themselves complex expressions, then on many current CPU architectures, several machine registers can be needed to evaluate the expressions. If many machine registers are already "in use" because of other proc parameters or pending expressions, the generated machine code might have to temporarily save register values (typically by pushing them on to the stack) before the registers can be re-used for the current expression. This can slow down execution of the code. If the compiler is allowed to re-order the evaluation of parameter expressions, then it can evaluate the complex ones first, so that only one machine register per parameter is used as the full set of parameters is evaluated.

A risk with the compiler silently doing this is that the evaluation of parameter expressions might involve calls to procs that have side effects. Thus, changing the order of parameter evaluation can change the value of some parameters passed to the proc. This can result in difficult-to-diagnose problems. The argument can be made, however, that if the code is such that this can happen, then the affected parameter expressions should all be evaluated separately, before the proc call, with their values being assigned to new local variables, which are then passed to the call instead of the expressions. Doing this avoids any risk of the compiler re-ordering the evaluation, and also makes the hidden dependency clearer to code maintainers, especially if there is an accompanying comment.

So, the decision made in Zed is that in these situations, the compiler is allowed to re-order evaluation. The above technique can be used to avoid problems, and to make the situation clearer.]

6.4.1 Bit Operators

The bit operators '&', '|' and '><' are the bitwise "AND", "INCLUSIVE-OR" and "EXCLUSIVE-OR" operators. Both operands must be 'uint' or one of the 4 "bitsXX" types. The result type is that of the larger (most bits) operand.

6.4.2 Shift and Rotate Operators

The bit shift operators are '<<' and '>>'. The bit rotate operators are '<~' and '>~'. Both operands must be 'uint' or one of the 4 "bitsXX" types. Because of this, negative shift/rotate amounts are not supported, and arithmetic (signed) shifts are not directly supported. The result type is the type of the left operand. Bits "lost" by a shift operation are discarded. Note that the rotate operators can get quite different results depending on the size of the operand being rotated.

Utility routines Basic/ShiftLeft and Basic/ShiftRight are available for when a too-large shift amount (for 'uint') must be detected.

[The bytecode engine exits with an error if the shift amount is >= 64. The initial X86-64 code generator produces code that will abort if the shift amount is >= the size of the left operand. Some argue that the shift should allow an amount one greater than the size of the operand. However, since real CPUs vary in their behaviour with a shift amount greater than the bit size, allowing the larger shift would require extra code on such a CPU. Shifts are considered low-level operations in Zed, so extra cost is not desired. Since some native CPUs have shifts in various sizes, the Zed bytecode engine perhaps should as well, so that the behaviour is consistent. However, other than cases where explicit type casts are done by a privileged programmer, it's not clear that any differences would be visible. The differing shift amount limits can easily be seen, however. The bytecode engine does have different rotate opcodes for the different sizes, since those get different results. The opcodes get run-time exceptions if the rotate amount is greater than or equal to the bit size of the left operand.]
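The effect of operand size on rotates, and the discarding of shifted-out bits, can be seen in this Python model. It is a sketch with hypothetical names, not Zed code. Values are non-negative integers assumed to fit in the given width, and any amount greater than or equal to the width raises an error, matching the X86-64 code generator's aborts (the bytecode engine's shift limit is 64 regardless of operand size).

```python
def _mask_for(amount: int, width: int) -> int:
    """Reject out-of-range amounts and return the all-ones mask for width."""
    if not 0 <= amount < width:
        raise ValueError(f"amount {amount} out of range for {width}-bit operand")
    return (1 << width) - 1


def shl(value: int, amount: int, width: int) -> int:
    # Bits shifted out of the top are discarded.
    return (value << amount) & _mask_for(amount, width)


def shr(value: int, amount: int, width: int) -> int:
    _mask_for(amount, width)
    return value >> amount  # logical shift: no sign bit is copied in


def rotl(value: int, amount: int, width: int) -> int:
    mask = _mask_for(amount, width)
    return ((value << amount) | (value >> (width - amount))) & mask


def rotr(value: int, amount: int, width: int) -> int:
    mask = _mask_for(amount, width)
    return ((value >> amount) | (value << (width - amount))) & mask
```

Rotating 0x81 left by 1 gives 0x03 as an 8-bit value but 0x102 as a 16-bit value, which is why the bytecode engine needs separate rotate opcodes per size.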

6.4.3 "relate"

The "relate" binary operator, written '<>', is a kind of comparison operator - it takes a pair of comparable values and yields a 'sint' value of -1, 0 or +1, depending on whether its left-hand operand is less than, equal to, or greater than, its right-hand operand. This operator is typically used when instantiating generic sorting code. Values of types 'uint', 'sint', 'char', 'float', 'string', enumeration values, variant record selector values and 'oneof' values can be compared using the "relate" operator. In addition, "bitsXX" values can be compared, and compared with 'uint' values. See "6.4.10 Comparison" for the standard comparison operators.
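Since "relate" is a three-way comparison, it plugs directly into comparison-based sorting. The following Python sketch models it with a hypothetical relate function and uses it with the standard functools.cmp_to_key; Python's own ordering of strings and numbers stands in for Zed's.

```python
from functools import cmp_to_key


def relate(left, right) -> int:
    """Model of '<>': -1, 0 or +1 as left is less than, equal to, or greater than right."""
    return (left > right) - (left < right)


words = ["pear", "apple", "fig"]
ordered = sorted(words, key=cmp_to_key(relate))  # ["apple", "fig", "pear"]
```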

6.4.4 Exponentiation

The '^' operator is the exponentiation (power) operator. It operates only on 'float' values.

[Arguments could be made for extending this to allow 'uint' and perhaps 'sint' operands. I should also create Basic/ routines to allow explicit error detection.]

6.4.5 Multiplication

The '*' operator is the multiplication operator. Pairs of 'float' operands yield a 'float' result. Pairs of 'sint' operands yield a 'sint' result. Pairs of 'uint' or "bitsXX" operands yield a result whose size is that of the largest operand. As mentioned previously, the operand "size" of a plain 'uint' constant is determined by the value of the constant. Similarly, 'uint' constants can participate in 'sint' and "bitsXX" multiplications. Overflow is detected for 'uint' and 'sint' multiplication, but not for "bitsXX" multiplication. 'float' multiplication error semantics are not specified - Zed will use standard target CPU multiplication instructions, and will not generate any additional code for checking purposes. Multiplication of 'float' constants will directly use host machine semantics with no checking.
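The difference between checked 'uint'/'sint' multiplication and unchecked "bitsXX" multiplication is sketched below in Python. The names are hypothetical, this is not how Basic/UintMul or Basic/SintMul are implemented, and 'uint' and 'sint' are assumed to be 64 bits wide.

```python
# Assumption: 'uint' and 'sint' are 64-bit types.
UINT_MAX = (1 << 64) - 1
SINT_MIN, SINT_MAX = -(1 << 63), (1 << 63) - 1


def uint_mul(a: int, b: int) -> int:
    """Checked 'uint' multiplication: overflow is an error."""
    product = a * b
    if product > UINT_MAX:
        raise OverflowError("uint multiplication overflow")
    return product


def sint_mul(a: int, b: int) -> int:
    """Checked 'sint' multiplication: overflow in either direction is an error."""
    product = a * b
    if not SINT_MIN <= product <= SINT_MAX:
        raise OverflowError("sint multiplication overflow")
    return product


def bits_mul(a: int, b: int, width: int) -> int:
    """Unchecked bitsXX multiplication: high-order bits are silently discarded."""
    return (a * b) & ((1 << width) - 1)
```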

If one or two unit types are involved (on 'float', 'uint' or 'sint' operands), then the product type will have a unit type which is the product of those operand unit types.

Routines Basic/UintMul and Basic/SintMul are available for explicit error detection.

6.4.6 Division

The '/' operator is the division operator. Usually, the type of the result is the same as the type of the left hand operand. Floating point values can be divided by other floating point values. 'sint' values can be divided by other 'sint' values, and in-range integral constants can be either operand with 'sint' division. 'uint' and "bitsXX" values can be divided by 'uint' or "bitsXX" values or by positive 'sint' constants. Division by a constant 0 or 0.0 is a compile error. Division by a constant 1 or 1.0 may generate a warning, depending on the warning level. Division by 0 is also detected at run time.

If both operands have a unit type, then the result will have a unit type which is the left operand's unit type divided by the right operand's unit type. If the left operand has a unit type, but the right operand does not, then the result will have the left operand's unit type. If the right operand has a unit type, but the left operand does not, then the result will have the inverse of the right operand's unit type.
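The unit rules for multiplication and division amount to adding and subtracting unit exponents. This Python sketch uses a hypothetical Quantity class with unit names as plain strings; it is not Zed's measures-and-units machinery, and it uses Python's true division throughout rather than Zed's integral division rules. The first use mirrors the "def AREA = 14(m) * 23(m);" example from "5.5 "def" Declarations".

```python
from collections import Counter


class Quantity:
    """A value plus unit exponents, e.g. {"m": 2} for square metres."""

    def __init__(self, value, units=None):
        self.value = value
        # Drop units whose exponent has cancelled to zero.
        self.units = {u: e for u, e in (units or {}).items() if e != 0}

    def __mul__(self, other):
        units = Counter(self.units)
        units.update(other.units)  # product: exponents add
        return Quantity(self.value * other.value, units)

    def __truediv__(self, other):
        units = Counter(self.units)
        units.subtract(other.units)  # quotient: exponents subtract
        return Quantity(self.value / other.value, units)


area = Quantity(14, {"m": 1}) * Quantity(23, {"m": 1})       # 322, {"m": 2}
speed = Quantity(100.0, {"m": 1}) / Quantity(8.0, {"s": 1})  # 12.5, {"m": 1, "s": -1}
rate = Quantity(1.0) / Quantity(4.0, {"s": 1})               # 0.25, {"s": -1}
```

The last line shows the inverse rule: a unitless left operand divided by a unit-typed right operand yields the inverse of the right operand's unit.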

Routines Basic/UintDiv and Basic/SintDiv are available for explicit error detection.

6.4.7 Remainder

The '%' operator is the remainder or modulo operator. Usually, the type of the result is the same as the type of the left hand operand. Floating point values can be remaindered by other floating point values. 'sint' values can be remaindered by other 'sint' values, and in-range integral constants can be either operand with 'sint' remainder. 'uint' and "bitsXX" values can be remaindered by 'uint' or "bitsXX" values or by positive 'sint' constants. The integral forms of this operation truncate non-integral results towards 0. Remainder by a constant 0 or 0.0 is a compile error. Remainder by a constant 1 or 1.0 may generate a warning, depending on the warning level. Remainder by 0 or 0.0 is also detected at run time.

Semantically, the remainder operation is viewed as a repeated subtract, rather than as a divide. So, if both operands have a unit type, those unit types must match. Any unit type on the left operand appears on the result type. Remaindering a non-unit type left operand by a unit-type right operand is not allowed.
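Python's '%' rounds its quotient towards negative infinity, so it does not match the rule above for negative operands. Reading "truncate towards 0" as the C-style rule (the result takes the sign of the left operand, which also agrees with the repeated-subtraction view), a hypothetical Python model is:

```python
import math


def sint_rem(a: int, b: int) -> int:
    """Integral remainder computed from a quotient truncated towards 0."""
    if b == 0:
        raise ZeroDivisionError("remainder by 0")
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return a - b * quotient


def float_rem(a: float, b: float) -> float:
    """math.fmod also truncates towards 0, unlike Python's '%'."""
    return math.fmod(a, b)
```

For example, sint_rem(-7, 2) is -1, whereas Python's own -7 % 2 is 1.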

Routines Basic/UintRem and Basic/SintRem are available for explicit error detection.

6.4.8 Addition

The addition operator, '+', has more allowed combinations than most other binary operators. Warnings may be produced for 0 or 0.0 operands. As usual, in-range 'uint', 'sint' and "bitsXX" constants can be used with any of those types. Addition is commutative, i.e. the operands can be reversed and the result will be the same, so in the following list, only one ordering is shown. When a pointer, enumeration or record selector type is the result type, it is the same type as the pointer, enumeration or record selector operand. The type combinations allowed and the result types for addition are:

'uint' '+' 'uint' => 'uint' (or "bitsXX" types)
'sint' '+' 'sint' => 'sint'
'float' '+' 'float' => 'float'
'string' '+' 'string' => 'string'
'char' '+' 'char' => 'string'
'string' '+' 'char' => 'string'
'char' '+' 'uint' => 'char'
pointer '+' 'uint' => pointer
'*' 'void' '+' 'uint' => '*' 'void'
enumeration '+' 'uint' => enumeration
record selector '+' 'uint' => record selector
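Because only one ordering of each pair is listed, a checker has to try both orders. The following Python sketch encodes the table above as a lookup. It is an illustration, not the compiler's code. The placeholder names 'ptr', 'enum' and 'recsel' are hypothetical stand-ins for "any pointer / enumeration / record selector type". "bitsXX" types and the in-range constant rules are omitted.

```python
# The combinations from the addition table; 'ptr', 'enum' and 'recsel' are
# placeholders for any pointer, enumeration or record selector type.
ADD_RULES = {
    ("uint", "uint"): "uint",
    ("sint", "sint"): "sint",
    ("float", "float"): "float",
    ("string", "string"): "string",
    ("char", "char"): "string",
    ("string", "char"): "string",
    ("char", "uint"): "char",
    ("ptr", "uint"): "ptr",
    ("*void", "uint"): "*void",
    ("enum", "uint"): "enum",
    ("recsel", "uint"): "recsel",
}


def add_result(left: str, right: str):
    """Result type of `left + right`, or None if the combination is not allowed.
    Addition is commutative, so the reversed ordering is also tried."""
    return ADD_RULES.get((left, right)) or ADD_RULES.get((right, left))
```

Note that 'sint' '+' 'uint' yields None here, since mixed signed/unsigned addition is not in the table.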

Overflow/underflow is detected at run time for 'uint' and 'sint' addition. Note, however, that an incorrect result may be stored before the overflow/underflow is detected. This can happen when the compiler uses a single machine instruction that fetches a value, adds to it, and stores the result back. Range checks are done at run time for addition to 'char', enumeration and record selector values. Pointer arithmetic is privileged and is not checked.
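The "stored before detected" caveat is easier to see in a model. The Python sketch below assumes 64-bit 'uint' and 'sint' values, since their widths are not given in this section. It also assumes a hypothetical OverflowTrap exception standing in for Zed's run-time detection. It contrasts a checked add with a fused fetch/add/store that writes the wrapped value before reporting the overflow.

```python
# Assumption: 64-bit 'uint'/'sint'; this section does not give the widths.
BITS = 64
UINT_MAX = (1 << BITS) - 1
SINT_MIN = -(1 << (BITS - 1))
SINT_MAX = (1 << (BITS - 1)) - 1


class OverflowTrap(Exception):
    """Hypothetical stand-in for Zed's run-time overflow/underflow detection."""


def uint_add(a: int, b: int) -> int:
    r = a + b
    if r > UINT_MAX:
        raise OverflowTrap("uint overflow")
    return r


def sint_add(a: int, b: int) -> int:
    r = a + b
    if not SINT_MIN <= r <= SINT_MAX:
        raise OverflowTrap("sint overflow/underflow")
    return r


def fused_uint_add(cells: list, i: int, b: int) -> bool:
    """Model of a single fetch/add/store instruction: the wrapped sum is
    stored first, and only then is the overflow reported (True)."""
    exact = cells[i] + b
    cells[i] = exact & UINT_MAX  # the incorrect, wrapped value is already stored
    return exact > UINT_MAX
```

After fused_uint_add([UINT_MAX], 0, 1), the cell already holds 0 when the overflow is reported.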

"Addition" involving strings is string concatenation. Values of type "[] char" can be concatenated as if they were of type 'string', and the result of the concatenation will be of type 'string'. This flexibility is not extended to other 'string' operations.

If one operand has a unit type, then the other operand must have a matching unit type, and the result will have that unit type.

Utility routines Basic/UintAdd and Basic/SintAdd are available for explicit error checks.

Note that addition (and subtraction) involving pointers does not scale by the byte size of the pointed-to type. This differs from C and C++.

6.4.9 Subtraction

The subtraction operator, '-', has more allowed combinations than any other binary operator. Warnings may be produced for 0 or 0.0 operands. As usual, in-range 'uint', 'sint' and "bitsXX" constants can be used with any of those types. When pointer, enumeration or record selector types are involved in subtraction, either the operand types must match (and the result is 'uint'), or the right operand must be 'uint' or "bitsXX", in which case the result type is the same as that of the left operand. The type combinations allowed and the result types for subtraction are:

'uint' '-' 'uint' => 'uint' (or "bitsXX" types)
'sint' '-' 'sint' => 'sint'
'float' '-' 'float' => 'float'
'char' '-' 'uint' => 'char'
'char' '-' 'char' => 'uint'
pointer '-' 'uint' => pointer
pointer '-' pointer => 'uint'
'*' 'void' '-' 'uint' => '*' 'void'
'*' 'void' '-' '*' 'void' => 'uint'
enumeration '-' 'uint' => enumeration
enumeration '-' enumeration => 'uint'
record selector '-' 'uint' => record selector
record selector '-' record selector => 'uint'

Underflow is detected at run time for 'uint', 'sint', 'char', enumeration and record selector values. Note, however, that an incorrect result may be stored before the underflow is detected. This can happen when the compiler uses a single machine instruction that fetches a value, subtracts from it, and stores the result back. Pointer arithmetic is privileged and is not checked.
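The 'char' rows of the subtraction table, together with the underflow check, can be sketched in Python. This is an illustration only. It assumes that 'char' values are modelled by Unicode code points, and it uses a hypothetical UnderflowTrap exception in place of Zed's run-time error.

```python
class UnderflowTrap(Exception):
    """Hypothetical stand-in for Zed's run-time underflow detection."""


def uint_sub(a: int, b: int) -> int:
    if b > a:
        raise UnderflowTrap("uint underflow")
    return a - b


# Assumption: 'char' values are modelled by their Unicode code points.
def char_minus_char(a: str, b: str) -> int:
    """'char' '-' 'char' => 'uint'"""
    return uint_sub(ord(a), ord(b))


def char_minus_uint(c: str, n: int) -> str:
    """'char' '-' 'uint' => 'char'"""
    return chr(uint_sub(ord(c), n))
```

Because the result of 'char' '-' 'char' is a 'uint', subtracting a later character from an earlier one underflows rather than producing a negative value.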

If both operands are 'uint', 'sint' or 'float' values with unit types, then those unit types must be the same, and the result will have that unit type. 'uint' operands mixed with other types must not have a unit type.

Utility routines Basic/UintSub and Basic/SintSub are available for explicit error checks.

6.4.10 Comparison

The comparison operators are '<', '<=', '=', '~=', '>=', '>', '==' and '~=='. Operators '==' and '~==' can only be used with strings, where they are comparing the identity of the strings ("Is this the same string entity?", rather than "Is this a string with the same sequence of characters?").

Operators '==' and '~==' can use 'nil' as one operand. If 'nil' is used against a string with any other comparison operator, a level-0 warning is produced. If only one operand has a unit type, then the other operand must be a constant 0 or 0.0. Pointer, tracked and 'nilOk' '@' operands can be compared against 'nil'. Tracked operands can only be compared using '=' and '~='.

Explicit procs cannot participate in comparisons. For example, if "doSomething" is a proc, then a comparison of "pPointer = doSomething", where "pPointer" is a variable (or expression) of the appropriate proc type, is not allowed. However, comparisons of values of type "Proc/Proc_t" are allowed, as are comparisons of non-explicit values of proc types.

[The intent here is to allow comparisons versus "Proc/Proc_t" to use the internal representation of a proc in the comparison, while allowing comparisons of actual proc values to use code addresses. The net result is that the internal representations are often not needed at run time. It might be possible to recognize which situation is present, and do the appropriate thing, but it would then be necessary to choose which form is presented in situations where that is not possible - e.g. when an explicit proc name is passed to a compile-time proc.]

If 'capsule' values are being compared, the type of one operand may be a capsule type that 'extend's the other type. If 'interface' values are being compared, the type of one operand may be an interface type that 'extend's the other type.

If a 'char' value is compared against a 'string' value, the 'char' value is converted to a 'string' for the comparison.

In all other cases, the operand types must be the same. Values of all types other than struct, array, union and '@' generic type parameters can be compared. Note that 'uint' and 'sint' values cannot be compared against each other. This is because there are possible values that each type can have that the other type cannot represent, and proper comparisons would need to test the absolute size of the values as well as comparing the bits themselves. Most programmers would not expect that extra cost in such a "basic" operation. [There could be many cases in C code where signed and unsigned values are accidentally compared, occasionally yielding incorrect results.]
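The extra cost that rules out 'uint' versus 'sint' comparisons can be illustrated in Python. The sketch assumes 64-bit values. The naive version mimics what C effectively does when a signed 'long' is compared with an 'unsigned long': the signed operand is converted to unsigned first. The correct version needs an additional sign test on every such comparison.

```python
# Assumption: 64-bit values, as in a typical C 'long' / 'unsigned long' pair.
BITS = 64
MASK = (1 << BITS) - 1


def naive_less(s: int, u: int) -> bool:
    """C-style `s < u`: the signed operand's bits are reinterpreted as
    unsigned before comparing."""
    return (s & MASK) < u


def correct_less(s: int, u: int) -> bool:
    """A correct mixed comparison needs an extra sign test on the signed operand."""
    return s < 0 or (s & MASK) < u
```

With s = -1 and u = 1, naive_less returns False because -1 reinterprets as the largest unsigned value, while correct_less returns True.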

The result of a comparison is a 'bool' value.

6.4.11 Boolean AND

The 'and' "operator" can only be applied to 'bool' values, and produces a 'bool' result. If the left-hand operand evaluates to 'false', then the right-hand operand will not be evaluated at all. This is often called "short-circuit evaluation".

6.4.12 Boolean OR

The 'or' "operator" can only be applied to 'bool' values, and produces a 'bool' result. If the left-hand operand evaluates to 'true', then the right-hand operand will not be evaluated at all. This is the other case of "short-circuit evaluation".
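Python's 'and' and 'or' short-circuit in the same way as Zed's 'and' and 'or', so the behaviour can be observed with a logging helper. The helper name `check` is hypothetical and used only for this illustration.

```python
calls = []


def check(name: str, value: bool) -> bool:
    """Record that this operand was evaluated, then return its value."""
    calls.append(name)
    return value


r1 = check("left", False) and check("right", True)   # "right" is never evaluated
r2 = check("left2", True) or check("right2", False)  # "right2" is never evaluated
```

Afterwards, `calls` contains only "left" and "left2".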

6.5 Proc Calls

Calls to procs in Zed have the traditional form:

the proc to call
'('
zero or more parameter expressions, separated by commas
')'

If the proc to call is named directly in the call, then the name, or a path to the name, is simply written explicitly. However, if the proc to call is determined at run-time, then the expression that yields the proc is put inside curly braces ('{' and '}'). The braces remind the reader that something a little special is happening. Note that, unlike many other programming languages with object-oriented features, Zed also requires this special marking when a method is being called. Sometimes the compiler can know the target method at compile time, but the programmer cannot assume that, so braces are always required.

Zed has only one parameter passing mechanism - call by value. That simply means that, other than an optional single result from a proc, values only flow into the proc, and not out. To allow procs to yield multiple result values, or to have values which are passed both in and out of calls, parameters can be of '@' types, and '@'s of variables (and other storage) in the calling context can be passed as actual parameters.

Zed uses the name "formal parameter" to refer to a proc parameter as it is seen in the proc header, or from within the body of the proc. The name "actual parameter" is used for the expression given in an actual proc call, or when referring to that value within the context of the code of the proc. When there is unlikely to be confusion, just "parameter" may be used.

Zed does not specify the order in which proc parameters are evaluated. If the evaluation order is important, then the programmer should do the evaluations outside of the call, assign the results to local variables, and then pass the local variables on the call. [The Zed compiler itself sometimes has to partially do this, in order to avoid the possibility of the proc creating dangling references involving parameters. Because of that, programmers might see the evaluation order of the actual parameters change as they change the proc header and calls to it.]

The Zed language includes the concept of compile-time execution of procs. This affects the kinds of things that proc calls can do, but for standard 'ctProc' procs this does not affect the syntax of proc calls. Other kinds of compile-time procs, discussed later, can have modified calling syntax, but the basic structure of a parenthesized list of comma-separated items is still used for the calls themselves. Also, on calls to some compile-time procs, the first parameter is a special compilation context parameter, and is passed implicitly by the compiler - the programmer should not try to pass an explicit value.

A special form of proc reference is 'baseCall'. This can only be used inside procs which implement methods in capsules. When used inside a proc that implements a method, it refers to the proc available in the capsule that the current capsule extends - it is an error if there is no such proc. The proc can be for a capsule method or an interface method.

Some example proc calls:

    Fmt("The final result is: ", result);

    x := Math/Sin(y * 3.0);

    uint a, b;
    computeResults(@a, @b, getValue(1), getValue(2), 13, false);

    {myProcVar}("Fred");

    string name := {myCapsuleVar->xx_getName}();
    name := {myInterfaceVar->yy_getName}(NAME_TYPE_PRETTYNAME);

    ... inside a capsule method ...
    name := baseCall(thisVar);

When proc parameters are of '@' types, such as struct and array parameters, '@'s of the same actual storage may be provided more than once in a single proc call. This is called "aliasing", and can result in incorrect operation if the proc was written to assume that all storage it is using is independent. For example, it might be reading a pair of arrays and writing a third. If the output array is given as one of the inputs, incorrect results can be produced, since intermediate stores to the output array can overwrite input values that have not yet been read. The Zed compiler attempts to detect such situations and issue a warning. However, it cannot detect all such situations because the aliases might be created by run time actions. These aliasing situations have undefined behaviour in Zed.
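Python lists are passed by reference, so the aliasing hazard described above can be reproduced directly. The proc `add_reversed` is hypothetical. It reads two input arrays and writes a third, assuming that the output does not overlap either input.

```python
def add_reversed(a: list, b: list, out: list) -> None:
    """out[i] = a[i] + b[n-1-i]; written assuming `out` does not overlap `a` or `b`."""
    n = len(out)
    for i in range(n):
        out[i] = a[i] + b[n - 1 - i]


clean = [0, 0, 0]
add_reversed([1, 2, 3], [1, 2, 3], clean)  # independent storage: [4, 4, 4]

x = [1, 2, 3]
add_reversed(x, x, x)  # aliased: the store to x[0] is later read back as b[0]
```

With independent storage the result is [4, 4, 4]. With all three parameters aliased, the final element reads the already-overwritten x[0], giving [4, 4, 7].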

A situation arises with user-defined ("#") operators where '@' is needed on an actual proc parameter, but it is confusing for the programmer to supply it. The mechanism Zed uses to avoid this is based on the ability in Zed to put arbitrary flags (and in fact much more) on types. Package Exec exports symbol "AUTO_AT", which provides a name which can be put on types using proc "Types/ExportBoolAdd". For example:

    struct MyData_t {
        ...
        struct fields
        ...
    };
    ...
    eval Types/ExportBoolAdd(MyData_t, Exec/AUTO_AT);

"ExportBoolAdd" is a 'ctProc' proc, so the call is done at compile-time, meaning that the flag is added at compile time. The 'eval' package-level syntax is discussed later in "18.2 Package-Level "eval"". Thus, any code compiled after the execution of this "ExportBoolAdd" will be done with the "AUTO_AT" feature active on type "MyData_t". With that feature active, if a proc has a formal parameter of type "@ MyData_t" (with optional storage flags), and the actual parameter given is of type "MyData_t", then the Zed compiler will automatically add a hidden '@' to that parameter. [I hate implicit things like this, but it was badly wanted for use with compile-time "#" stuff, where it is often not easy to tell if the '@' is needed or not.]

If a proc being directly called has flag 'abort' in its header, code after the call cannot be executed. A warning will be issued about such code.

6.6 Conditionals

The 'if' construct ('if' statement or 'if' expression) is the main conditional construct in Zed. The conditions in 'if's are usually expressions that yield a 'bool' result. Conditions can also be 'assign' or 'select' clauses. Syntactically, an 'if' construct consists of:

'if'
condition
'then'
body sequence
zero or more 'elif' clauses
optional 'else' clause
'fi'

Each condition consists of one of:

boolean expression (expression yielding a 'bool' result)
'assign' clause
'select' clause

A body sequence is a sequence of statements, separated by semicolons. If the 'if' is an 'if' expression, then the last element in each body sequence must not be followed by a semicolon, and yields the value for this branch or alternative of the 'if'.

Each 'elif' clause consists of:

'elif'
condition
'then'
body sequence

An 'else' clause consists of:

'else'
body sequence

If an 'if' construct is an 'if' expression, then it must have an 'else' clause, since it must always yield a value.

Each body sequence in an 'if' is a scope.

The type of an 'if' expression is determined by the types of the various body sequences within it in a somewhat complicated way. The first body sequence usually defines the result type. However, as usual, Zed has rules which allow desirable cases to work without requiring extra work by the programmer. For example, if the first body sequence is a simple 'uint' constant, but the second is an expression of type 'sint', then the result of the 'if' will be 'sint', and the 'uint' constant must be in range. Similarly, a 0 or 0.0 with no unit type will work with a later body sequence which has a unit type. Also present is the usual rule that 'nil' can be used with a later tracked type or a later pointer type. A further special case is that 'char' values can be mixed with 'string' values, and they will be converted to 'string's. Tracked values are compatible with a later 'any' or 'autoAny' result type.

One special case requires more explanation. In Zed, a value of a 'capsule' type which implements a given 'interface' is a valid value of that 'interface' type. The 'interface' type need not be the type of the first alternative - Zed will accept it as a later alternative. Similarly, a value of a 'capsule' or 'interface' type is a valid value of any 'capsule' or 'interface' type that its own type 'extend's. These rules apply to the branches of an 'if' expression, provided that the proper final type is present on the first alternative.

If any of the alternatives of an 'if' expression is not 'nonNil', then the result of the 'if' expression is not 'nonNil'. If the final type of the 'if' expression is going to be "Proc/Proc_t", then explicit proc names can be used as the values of alternatives. If explicit proc names are used otherwise, then they must all be of the same proc type.

These various rules sound complex, but most programmers will not need to worry about them. The rules allow the Zed compiler to determine the result type of a conditional without complex and time-consuming processing. They also allow the compiler to issue error messages which are nearly always at the point of the error, rather than at the end of the condition.

[I have tests for all of this, I think, but what I have written above should be checked.]

Some example conditionals:

    if a < b then
        Fmt("a is smaller");
    fi;

    if getName(@name, namedThing) and name = "Fred" then
        processFredThing(namedThing, true);
    elif namedThing.nt_copiedName then
        processOtherThing(namedThing, false);
    else
        appendErrorNote(namedThing, EN_WRONG_NAME);
    fi;

    float(m) len := if Stuff/Defined(a) then Stuff/Len(a) else 10.0(m) fi;

    sint range :=
        if val < -100 then
            -2
        elif val < 0 then
            -1
        elif val = 0 then
            0
        elif val <= 100 then
            +1
        else
            +2
        fi;

6.6.1 "assign"

An 'assign' clause consists of:

'assign'
type or 'con' or 'var'
optional storage flags
name for new variable
':='
initial value expression for variable

The variable defined in an 'if' 'assign' clause is defined until the end of the alternative body. The variable defined in an 'assert' 'assign' clause is defined until the end of the current scope. If 'con' or 'var' is given instead of an explicit type, then the type is taken from the initial value expression, and the 'assign' is a "nonNil assign", i.e. one that is not a run-time type test, but is simply a 'nil' test. Such a 'nil' check can also be done with the "8.4 "nonNil"" construct. If a type is given, then it must be one for which a value of 'nil' is valid. The variants of the 'assign' clause are:

expression is of type 'any' or 'autoAny' and the type is a tracked type other than 'any' or 'autoAny' - the body is executed if the value is of the specified type. This is an explicit run-time type check. The type cannot be a proc type, a template type or an uninstantiated generic type. Note that if type 'any' (or 'autoAny') is renamed so that type exports can be added to it, no value will have that naming type as its run-time type. A value's run-time type is always the type used when the tracked value was allocated. In general, the types must match exactly. However, if the given type is from an instantiation of a generic, then the run-time comparison will skip through a naming of a type on either or both of the types (the specified type and the allocation type). This check requires full run-time type information and so is not (yet) supported with Zed native code.
expression is of type "Proc/Proc_t" and the type is some specific proc type - the body is executed if the signature of the "Proc_t" value exactly matches the specified type. This test is often used to allow calling a newly-created (at run-time) proc. If this form of 'assign' is used at compile-time, it is possible that it will fail even though the types are correct - this can happen because the "Proc/Proc_t" is referencing a proc which does not yet have its run-time representation. This check requires full run-time type information and so is not (yet) supported with Zed native code.
expression is of an 'interface' type, and the explicit type is of a 'capsule' type which 'implement's the 'interface' type - the body is executed if the value is actually of that 'capsule' type. This test can be used to differentiate among the implementing 'capsule's of an 'interface'. This check requires full run-time type information and so is not (yet) supported with Zed native code.
expression is of a 'capsule' type and the explicit type is also of a 'capsule' type. The explicit 'capsule' type must be one which 'extend's, either directly or indirectly, the 'capsule' type of the expression. A run-time check is done to verify that the provided value is actually of that explicit type. This test is similar to "downcasting" in other programming languages. Note that the value cannot be of a 'capsule' type which extends the explicit 'capsule' type - this differs from typical "downcasting". In that situation, the value can simply be assigned to a variable of the explicit capsule type - that check can be done at compile time.
expression is of an 'interface' type and the explicit type is also of an 'interface' type. The explicit 'interface' type must be one which 'extend's, either directly or indirectly, the 'interface' type of the expression. A run-time check is done to verify that the provided value is actually of that explicit type. See immediately above for notes.
expression is of type "Exec/Exec_t" and the type is some specific 'template' type - the body is executed if the value is an "Exec_t" whose "ex_type" is of the templated type. This test can be used in compile-time code using templates, to check that an "Exec_t" is usable in some context. Storage flag testing matches that described for assignment statements. However, beware that some entities that can be used as an expression are inherently 'con'; this doesn't matter in normal code, but can in a 'template' 'assign' - you will often need 'ro' (or 'con') after 'template' in the type. This check requires full run-time type information and so is not (yet) supported with Zed native code.
expression and variable are of path types - if the path subtypes are the same and the variable type contains 'nonNil' then this is a run-time lookup of a path for either a package or a persistent variable. If the expression is of type 'path void' and the variable has a specific path type, then this is a run-time check of the type and storage flags of the found persistent variable versus the declared path variable type. Persistence and paths are not yet defined for Zed native code.
otherwise, the expression must be assignment compatible with the specified type (or is implicitly compatible via 'con' or 'var'), and not known to be 'nonNil' - the body is executed if the expression value is 'nonNil'. This test is used to get a 'nonNil' value using a test in one place, so that further uses of the value do not need to test for 'nil' at run-time. This variant of 'assign' is by far the most common in typical code, and is supported for Zed native code.

Note that all tests except the last will get a run-time error if the value they are testing is 'nil'. The last form can be used to test for 'nil' safely.
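Two of the 'assign' variants above have approximate Python counterparts, which may help readers coming from other languages. The following is a rough analogue, not a translation. The function names are hypothetical. An exact `type(...) is` test stands in for the run-time type check, and an assignment expression followed by an `is not None` test stands in for the "nonNil assign". Python lists carry no element type, so the vector case inspects the elements instead. A 'nil' value makes the type-test form fail loudly, mirroring the note above.

```python
from typing import Any, Dict


def describe(value: Any) -> str:
    """Rough analogue of 'if assign string s := a ... elif assign [] string sV := a':
    exact run-time type tests, which fail on nil."""
    if value is None:
        raise ValueError("run-time type test applied to nil")
    if type(value) is str:  # exact match, like Zed; isinstance would accept subclasses
        return "string: " + value
    # Python lists carry no element type, so the elements are inspected instead.
    if type(value) is list and all(type(e) is str for e in value):
        return "string vector of %d" % len(value)
    return "unknown"


def lookup_name(names: Dict[str, str], key: str) -> str:
    """Rough analogue of the nonNil form 'if assign con n := ... then'."""
    if (n := names.get(key)) is not None:
        return n  # inside this branch n is known not to be None
    return "<none>"
```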

The run-time type checking done when 'assign'ing from an 'any' or 'autoAny' checks for exact type match only - no naming will be skipped, unlike in assignment statements, etc. [This choice means that the implementation of this check requires only a simple comparison at run time, and does not require the presence of data structures needed to represent type naming.]

Similarly, the checking done when 'assign'ing from a "Proc/Proc_t" requires an exact match - no type renaming is allowed for, and formal parameter names must match. In addition, if the proc value is marked as in error, or as being from a capsule or generic, or is a compile-time proc of some kind, the 'assign' will report it as not matching.

If the storage flags for the new 'assign' variable do not include 'nilOk', then 'nonNil' will automatically be added. This is because the variable will always start off with a value which is not 'nil'. If the variable is declared using 'con', then storage flag 'con' is implicitly added.

Some example 'if's with 'assign' clauses:

    any nonNil a := getValueOfUnknownTrackedType();
    if assign string s := a then
        processStringValue(s);
    elif assign [] string sV := a then
        processStringVector(sV);
    else
        processUnknownValue(a);
    fi;

    if assign con pr := Proc/DefineProcEnd(@tdp) then
        if assign proc(uint a, b)uint nonNil p := pr then
            uint result := {p}(13, 0xff);
        else
            /* Proc is of wrong signature */
            /* No check for nil needed on "pr" here. */
            Fmt("Proc ", pr->pr_name, " has wrong signature");
        fi;
    else
        /* Proc creation failed */
    fi;

[It is not uncommon for the expression in a "nonNil" 'assign' to be a simple variable or formal parameter. In that case, the construct can look like:

    if assign thingNN := thing then
        ... use "thingNN"
    fi;

It is tempting to change the language to make that usage nicer. One thought is to use something like "if nonNil thing then", and then in the body of the 'if' "thing" is 'nonNil'. Does the new "thing" occupy different storage than the old one? If so, then assignments to it won't affect the "outer" one, unless the semantics is that the value is implicitly copied back after the 'if'. If not, then what if there is an existing '@' of "thing", created before the "if nonNil" construct, which could change the value to 'nil'? Also, the syntax above *requires* a proc local or formal parameter, and those are less common than expressions. So, this idea has not been pursued further.]

Run-time tests using an 'assign' from "Exec/Exec_t" can be tricky. One of the common issues is that literals are 'con', so an "Exec_t" for a literal cannot be assigned to a template variable that does not have that needed 'con'. This is because a template value without 'con' (or 'ro') is something that can be used as an assignment destination, and assigning to 'con' and 'ro' things cannot be allowed. In other words, storage flags representing the nature of the "Exec_t" must be compatible with the storage flags of the 'template' type.

Template 'assign's also do not allow implicit conversion like regular assignments do. This is because template 'assign's do not make copies of the value - they only make a copy of the reference to the compiler representation of the value.

This can be a nuisance when dealing with the various integral types, which are usually assignment compatible. One simple way to deal with this is to wrap a call to "Exec/ToUintNew" around the "Exec_t", if the type of the "Exec_t" is not already 'uint'. For example:

    assert assign template uint tU :=
        if exSrc->ex_type = uint then
            exSrc
        else
            Exec/ToUintNew(pctx, exSrc)
        fi;

The explicit 'toUint' here does the same thing as the implicit one active for assignment statements. Similar handling is needed when mixing 'char' and 'string'.

6.6.2 "select"

A 'select' clause consists of:

'select'
optional storage flags
name for new variable
':='
name of proc formal or local of a variant record type
'->'
field name of a variant field in the variant record type

Note that 'select' clauses do not include a specified type for the new variable. This is because the type is taken from the variant being selected. Note also that it is a field name that is selected, not a variant selector.

When values of variant record types are being used, 'case' statements based on the variant tag are often used to deal with the various alternatives in the variant record. However, sometimes only one or two variants out of many need to be handled in some situation. The 'select' clause can be used for that purpose.

'select' clauses do not require additional runtime support like the runtime type tests possible with 'assign' clauses. They operate using only the variant tags explicitly stored in variant records, and thus can be considered "cheap" and "free".

The body of the alternative is executed, with the new variable defined and initialized, when the active variant of the expression is of the alternative named in the 'select'. Note that assignments to the 'select' variable do not affect the variant in the record itself - the 'select' variable is a copy of the variant value.
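The behaviour of 'select' can be modelled with a tagged record in Python. This is only a sketch. `MyVar`, `FIELD_TAG` and `select` are hypothetical names loosely based on the "MyVar_t" example that follows. The model consults only the stored tag, as 'select' does, and rebinding the selected variable leaves the record unchanged.

```python
from dataclasses import dataclass
from typing import Any, Tuple

# Hypothetical model of a variant record: each variant field name belongs to
# exactly one tag value, and only the stored tag is consulted by 'select'.
FIELD_TAG = {"mv_tag": "mvk_tag", "mv_loop": "mvk_loop"}


@dataclass
class MyVar:
    mv_kind: str  # stored variant tag
    payload: Any  # value of the active variant


def select(rec: MyVar, field: str) -> Tuple[bool, Any]:
    """Return (True, value) if `field` belongs to the active variant, else (False, None)."""
    if FIELD_TAG[field] != rec.mv_kind:
        return False, None
    return True, rec.payload


rec = MyVar("mvk_tag", "hello")
ok, s = select(rec, "mv_tag")
s = "changed"  # assigning to the selected copy does not touch the record
```

A boolean flag is returned alongside the value so that a variant whose value is legitimately 'nil' is not confused with a non-matching variant.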

Some example 'if' 'select' clauses:

    record MyVar_t {
        case MyVarKind_t mv_kind
        incase mvk_tag:
            string mv_tag;
        incase mvk_loop:
            MyVar_t nonNil mv_loop;
        incase mvk_other1:
            ...
        incase mvk_other2:
            ...
        incase mvk_other3:
            ...
        incase mvk_other4:
            ...
        esac;
    };
    ...
    MyVar_t nonNil mv := yieldMyVariant();
    if select s := mv->mv_tag then
        displayStringNicely(s);
    elif select con nilOk mv2 := mv->mv_loop then
        processOtherMyVar(mv2);
    else
        processAllOtherVariants(mv);
    fi;

If no explicit storage flags are specified for the new variable, then it will get the 'con', 'ro', 'volatile' and 'nonNil' flags from the variant field. Usually only an explicit 'nilOk' is specified, if anything. Note that you cannot add 'nonNil' to the variable since if the variant itself is not 'nonNil', a separate test is required for that, and it should be explicit. In the unlikely event that the variant is specified as 'volatile', and explicit storage flags are given, the 'volatile' must be present.

Note that the syntax of a 'select' clause specifies that the variant record reference can only be a proc formal or local. This is required because otherwise it sometimes is not clear whether a '->' is part of a 'select' clause or part of a record reference expression. Since a 'select' clause declares a new variable to hold the selected variant, and only references the base variant record reference once, it is not necessary to restrict assignments to that base reference within the scope of the 'select' clause.

[It would be possible to generalize the use of 'assign' and 'select' clauses, so that they can be considered as boolean expressions, and so that their use in 'if' and 'assert' is not a special situation. For example, there are situations in which 'assign' would be useful as part of the condition in a 'while' loop. Some situations could benefit from the 'and' of multiple 'assign's. Note, however, that the declared variables must become invalid (go out of scope) at the end of the current branch of an 'or', since their branch of the 'or' might not be evaluated, and so their variables would not be defined. Describing such situations fully would be difficult. Implementing this would also be difficult, as it would require either the creation of many extra scopes (most of which would not be needed), or some mechanism to modify the contents of a scope at the various 'or' merge points. Perhaps it would be enough to simply allow an 'and' sequence of 'assign'/'select', and to handle such in a 'while'.]

The order of evaluation of 'if' conditionals and clauses is the order in which they appear - the compiler is not allowed to re-order the tests and clauses unless it is certain that no possible execution situation can notice the re-ordering. The compiler can, however, extract common portions of conditions to before the entire 'if', so that they are not needlessly re-evaluated. If this is a problem because of side effects inside those conditions, then the conditions must be put inside a 'strict'.

Note that 'con' and 'var' (and 'nonNil', but it isn't as confusing) are both storage flags and parts of language constructs. This can lead to some confusion. For example, the statements:

    assert assign con spel := spelParam;
    assert select con spelVec := spel->spel_list;

look to have identical use of 'con' but do not. In the first, the 'con' is part of the 'assign' clause. In the second, the 'con' is a storage flag applied to "spelVec".

6.6.3 "_IfBytecode_"

This construct is mentioned here for completeness - it is not expected that any programmers will have a use for it. Syntactically it is: '_IfBytecode_' 'then' <body> <optional 'else' and <else-body>> 'fi'. <body> and <else-body> (if present) must yield 'void'. When this construct is compiled for bytecode execution only the <body> has code generated for it - any <else-body> is ignored. When this construct is compiled for native code execution, <body> is ignored, and any <else-body> has code generated for it as normal. This construct is used for situations in the Zed compiler that are dictated by the differing execution environments and needs of bytecode execution versus native code execution.

6.7 Assert Statement

The basic 'assert' statement in Zed consists of 'assert' followed by an expression of type 'bool'. The expression is evaluated, and if it yields 'false' then program execution is terminated. Programmers can use 'assert's to check their assumptions about the data in their programs, and to communicate their understanding of those assumptions to other readers.

If the Zed compiler can determine that the condition of an 'assert' is always 'true', it is not required to produce a run time representation of the 'assert', i.e. it can skip it. 'assert's can assist the compiler in optimizing program execution. The compiler knows that immediately after an 'assert', the condition of the 'assert' is 'true'. In loops, it is even possible that the compiler can use that knowledge to determine that the 'assert' is in fact not needed.

[I hope in future to implement 'assert's involving the fields of records, structs, etc. so that carefully chosen assertions about them can allow the compiler to eliminate run-time checks.]

There are 'assert' versions of 'assign' and 'select' clauses. Syntactically, they consist of 'assert' in front of the clause instead of 'if' or 'elif', and nothing after the initialization value. In these forms, the variables declared are added to the existing scope, and are valid until the end of that scope. Program execution is terminated if the 'assign' or 'select' assignment fails, in the same ways as described under "6.6 Conditionals".

Example 'assert's:

    assert count ~= 0;

    assert assign string nonNil s := getName();

    assert select mv2 := mv->mv_loop;

'assert' 'select' is a useful short form of an 'if' 'select' or a 'case', where all variants other than the chosen one are errors. It is shorter, and does not require an additional level of indentation. If the 'assert' 'select' above were done as a 'case' it would be:

    case mv->mv_kind
    incase mvk_loop:
        MyVar_t nonNil mv2 := mv->mv_loop;
        ... code using mv2 ...
    default:
        abort "mv kind is not mvk_loop!";
    esac;

A common use of 'assert' 'assign' is to get a 'nonNil' value from one which cannot be marked 'nonNil' even though it is guaranteed to not be 'nil' by design of the program. This can be when the source of the value is a struct, array or matrix. Values marked as 'nonNil' are often needed as parameters to procs, for example. For an even shorter method of doing this see "8.4 "nonNil"".

It is also possible to put 'assert's with 'bool' conditions directly at the package level. In that situation they are evaluated at compile time, and are useful for making sure that assumptions among constant values hold. The expression given must be evaluable at compile time. One example is that of making sure that a string literal contains the same number of characters as an enumeration has elements, thus matching one-for-one.

6.8 Abort Statement

The 'abort' statement in Zed consists of 'abort' followed by an expression of type 'string'. Program execution is unconditionally terminated. The string is typically displayed as part of the termination indication. Depending on context and implementation details, an execution traceback might be produced, or a debugger might be invoked.

Example 'abort's:

    abort "name is invalid";

    abort "length " + FmtS(len) + " is too large, limit " + FmtS(LIMIT);

Any code directly following an 'abort' is unreachable, and the compiler will warn about it.

6.9 While Statement

The 'while' statement is one of two looping constructs in Zed. It evaluates its condition and, if the condition yields 'true', executes its body, then goes back and evaluates the condition again. In Zed, the 'while' condition can be a multi-statement sequence, where the last element in the sequence is the condition. Declarations made in the condition sequence extend through the body - the entire 'while' statement is a scope. Because the condition can be a multi-statement sequence, some 'while' loops can have an empty body.

Syntactically, a 'while' loop consists of:

'while'
optional statements followed by semicolon
condition expression
'do'
optional body statements
'od'

Example 'while' statements:

    sint i := -5;
    while i < +5 do
        doSomethingWith(i);
        i := i + 1;
    od;

    while
        <declare variables>
        <processing>
        conditionIsTrue
    do
        <more processing, variables still active>
    od;

    while
        MyUnit_t mu := getNextUnit();
        processUnit(mu);
        not lastUnit(mu)
    do
    od;
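A 'while' whose condition is a statement sequence and whose body is empty behaves like a loop with its exit test at the end. For readers coming from languages without multi-statement conditions, the following Python sketch models the last example above; it is not Zed code, and the nested helpers are hypothetical stand-ins for "getNextUnit", "processUnit" and "lastUnit".

```python
from typing import List, Sequence


def run_units(units: Sequence[str]) -> List[str]:
    """Model of: while mu := getNextUnit(); processUnit(mu); not lastUnit(mu) do od

    The nested helpers are hypothetical stand-ins for the procs in the Zed
    example. At least one unit is assumed to be available, just as the Zed
    loop assumes getNextUnit() always has a unit to return.
    """
    source = iter(units)
    remaining = len(units)
    processed: List[str] = []

    def get_next_unit() -> str:
        nonlocal remaining
        remaining -= 1
        return next(source)

    def process_unit(mu: str) -> None:
        processed.append(mu)

    def last_unit(mu: str) -> bool:
        return remaining == 0

    while True:
        # The statements of the condition sequence run on every pass...
        mu = get_next_unit()
        process_unit(mu)
        # ...and the final expression decides whether to run the body and repeat.
        keep_going = not last_unit(mu)
        if not keep_going:
            break
        # The Zed loop body is empty, so nothing else happens here.
    return processed
```

Note that the condition sequence always runs at least once, which is what makes this form a substitute for a "do ... while" loop.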

If the condition for a 'while' statement can be determined at compile time to be equivalent to 'true', then any code after the 'while' statement is not reachable, and the compiler will warn about it.

A note for writers of code generators, and programmers trying to understand disassemblies of code generated by Zed: the system-level initialization and freeing of variables declared inside loops (and this includes 'for' loops) can be done outside of the loop itself, so that they only get executed once, even though the loop body, and thus any user initialization, might be executed many times.

6.10 Case Construct

Like the 'if' construct, the 'case' construct can have multiple alternate branches, and can be either a statement or an expression. The determination of the result type of a 'case' expression is done the same as that for an 'if' expression.

[The 'case' construct is not written using "switch", as in C, in order to make it clearer that 'case' does not fall through from one case to the next by default, as C's "switch" does. With the "switch" semantics, the name reflects switching into a single big chunk of code, with no implicit switch back out. The use of 'case' and 'incase' in Zed tries to make the difference harder to miss. My guess is that C used "switch" for a similar reason - to make it clearer that fall-through is the default, unlike with earlier "case" constructs in Algol languages.]

Where the 'if' construct chooses an alternative based on one or more independent conditions, the 'case' construct chooses one of its alternatives based on multiple possible values of a selector value. The selector value can be 'uint', "bitsXX" or of an enumeration, oneof or variant record selector type. In the latter cases, if not all elements of the enumeration, oneof or record selector type are accounted for in the 'case', and the 'case' does not have a 'default', level 1 warnings will be produced for each such element. If the 'case' is an expression and the selector is 'uint', "bitsXX" or 'oneof', then it must have a 'default', since arbitrary values are possible with those types of selector. Similarly, a 'default' is required on a 'case' expression if not all enumeration or record selector tags are present.

Syntactically, a 'case' construct consists of:

'case'
selector expression
zero or more alternative branches
optional 'default' part
'esac'

Each alternative branch consists of:

one or more index value specifications
alternative body

Each index value specification consists of:

'incase'
index value - constant or tag
optional '..' and range end index value
':'

A 'default' part consists of:

'default'
':'
alternative body

All index values must be compatible with the type of the selector expression.

Each 'case' alternative body is a scope.

When the selector expression is a record variant selector taken directly from a variant record which is a proc formal or proc local, special rules allow access to the variant fields of that record. See "4.10.1 Variant Records".

If an index range is given with an alternative, then that alternative body is executed if the selector value is anywhere within the index range. In such a range, the range end value must be strictly greater than the range beginning value. As usual, the values must be type compatible with the selector expression. Note that if two or more index value specifications are given in a row, they share the same alternative body. If an empty alternative body is desired in a 'case' statement, then it must be done explicitly, using a semicolon.

Example 'case' constructs (for examples that select on variant records, see "4.10.1 Variant Records"):

    case getUint()
    incase 0:
        Fmt("Got 0 - stopping here");
    incase 1:
        Fmt("Got 1 - proceeding");
        handleOne();
    incase 2:
        Fmt("Got 2 - recursing");
        recursiveCall();
    esac;

    enum MyEnum_t {
        me_first,
        me_second,
        me_third
    };
    ...
    float size :=
        case myEnumExpr
        incase me_first:
            0.0
        incase me_second:
            computeSeconder()
        incase me_third:
            1000.0
        esac;

    case getNextChar(...)
    incase "A" .. "Z":
    incase "a" .. "z":
        handleLetter(...);
    incase "0" .. "9":
        handleDigit(...);
    default:
        handlePunctuation(...);
    esac;
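The selection rules for the last example (inclusive index ranges, several 'incase' lines sharing one body, no fall-through, and a 'default' for everything else) can be modelled in Python as below. This is a sketch, not Zed code; the names "case_select" and "classify_char" are hypothetical, and the model simply takes the first matching alternative without attempting to model how Zed treats overlapping index values.

```python
from typing import List, Sequence, Tuple

# Each alternative: (its 'incase' index ranges, the result its body yields).
# A single value such as "incase 0:" would be written as the range (0, 0).
Alternative = Tuple[Sequence[Tuple[str, str]], str]

CHAR_CASE: List[Alternative] = [
    ([("A", "Z"), ("a", "z")], "letter"),  # two 'incase' lines sharing one body
    ([("0", "9")], "digit"),
]


def case_select(selector: str, alternatives: List[Alternative], default: str) -> str:
    """Pick the alternative whose inclusive range contains the selector."""
    for ranges, result in alternatives:
        for low, high in ranges:
            if low <= selector <= high:  # both range ends are included
                return result            # no fall-through into later alternatives
    return default                       # the 'default' part


def classify_char(ch: str) -> str:
    return case_select(ch, CHAR_CASE, "punctuation")
```

For example, "[" lies between "Z" and "a" in ASCII, so it reaches the 'default' part: the two letter ranges share a body but are not merged into one range.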

6.11 For Statement

The 'for' statement is the other looping construct in Zed, the first being the 'while' statement. There are three variants of the 'for' loop with related syntax. The first variant loops a variable over a set of unsigned integral values - these are called "counting 'for' loops". The second variant loops a variable through a sequence of 'nonNil' values - these are called "nonNil 'for' loops". The third variant contains a 'while' clause to explicitly control when the loop terminates - these are called "general 'for' loops". General 'for' loops can be either counting loops (with signed values allowed) or 'nonNil' loops.

The syntax of a counting 'for' loop is:

'for'
name for iterator variable
'from'
initial value expression for iterator variable
'upto' or 'downto'
limit expression
'do'
optional loop body sequence
'od'

The syntax of a nonNil 'for' loop is:

'for'
name for iterator variable
'from'
initial value expression for iterator variable
'then'
'*' or expression yielding next value
'do'
optional loop body sequence
'od'

The syntax of a general 'for' loop is:

'for'
name for iterator variable
'from'
initial value expression for iterator variable
'while'
termination test expression
'then'
'*' or expression yielding next value
'do'
optional loop body sequence
'od'

The type of the initial and limit values for a counting 'for' must be 'uint', "bitsXX", 'char', an enumeration or record variant selector type, or a 'uint' with units type. The type for a 'nonNil' 'for' must be one for which 'nil' is a valid value, i.e. record types, pointer types, proc types, matrix types, template types, interface types, capsule types, path types, non-'@' generic parameter types, 'nilOk' '@' types, 'string', 'any', 'autoAny' or 'poly'. Any scalar type can be used with a general 'for' loop, i.e. types other than array and struct types.

The iterator variable defined in a 'for' loop is valid throughout the 'for' construct, except in the initialization expression and, for counting 'for' loops, in the limit expression. Its type is the type of the initial value. Its storage flags are 'con' for counting and general loops, and 'con' and 'nonNil' for 'nonNil' loops. The value cannot be changed by the programmer, and will always hold a 'nonNil' value in a 'nonNil' loop. The 'con' attribute should be noted, as it affects the type of '@' of the iterator variable, as well as its type when used as a parameter to a compile-time proc with a 'template' formal parameter.

The 'for' loop body sequence is a scope, with the special considerations for the 'for' iterator variable just given.

A counting 'for' loop using 'upto' will sequentially assign the values from the initialization value up to and including the limit value to the iterator variable, and execute the loop body for each value. A counting 'for' loop using 'downto' does the same, counting down to and including the limit value. The order of evaluation of the initialization and limit expressions is not defined, but each will be evaluated exactly once.
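The following Python sketch models a counting 'for' loop; it is not Zed code. The expressions are passed as zero-argument callables so the model can show that each is evaluated only once. Two points are assumptions of the model rather than statements from this section: the body runs zero times when the initial value is already past the limit, and the initial expression happens to be evaluated first (Zed leaves that order undefined).

```python
from typing import Callable, Iterator


def counting_for(initial: Callable[[], int], direction: str,
                 limit: Callable[[], int]) -> Iterator[int]:
    """Model of  for i from <initial> upto|downto <limit> do ... od.

    Each expression is evaluated exactly once. Zed leaves their relative
    order undefined; this model happens to evaluate `initial` first.
    """
    if direction not in ("upto", "downto"):
        raise ValueError("direction must be 'upto' or 'downto'")
    value = initial()
    stop = limit()  # evaluated once, not on every iteration
    step = 1 if direction == "upto" else -1
    # Both ends are inclusive. Python ints cannot wrap; a real 'uint'
    # implementation of 'downto 0' must test before decrementing past 0.
    while (value <= stop) if step == 1 else (value >= stop):
        yield value
        value += step
```

Evaluating the limit once is what lets even a simple compiler keep it in a register instead of recomputing it on each iteration.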

A 'nonNil' 'for' loop assigns the initial value to the iterator variable and, if that value's type is 'nonNil' or the value is not 'nil', executes the body. After that initial body execution, the 'then' expression is evaluated, and if its value is not 'nil', it is assigned to the iterator variable and the loop body is executed again. Execution continues until the 'then' expression yields 'nil'. It is usual, but not required, for the 'then' expression to reference the iterator variable. If the 'then' expression is given as '*', then the 'from' initialization expression is re-used.
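As a sketch of these rules, the Python model below uses None in the role of 'nil'; it is not Zed code. "MyList" is a stand-in for the "MyList_t" record used in the examples, and passing no 'then' function models 'then *', where the 'from' expression is re-evaluated on each step (as with a generator-style proc that removes and returns the head of a list).

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


@dataclass
class MyList:
    """Stand-in for the MyList_t record used in the examples."""
    ml_name: str
    ml_next: Optional["MyList"] = None


def nonnil_for(from_expr: Callable[[], Optional[T]],
               then_expr: Optional[Callable[[T], Optional[T]]] = None) -> Iterator[T]:
    """Model of  for x from <from_expr> then <then_expr> do ... od.

    None plays the role of 'nil'. Passing then_expr=None models 'then *',
    which re-evaluates the 'from' expression on every step.
    """
    value = from_expr()
    while value is not None:
        yield value  # the body only ever sees a non-nil value
        value = from_expr() if then_expr is None else then_expr(value)
```

Because the loop stops as soon as a 'nil' appears, the body never needs its own 'nil' check.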

In general 'for' loops, the initial value is evaluated and assigned to the 'for' variable. Next, the 'while' test is evaluated, and if it yields 'true', the body of the loop is executed. After the body is executed, the 'then' step expression is evaluated and that new value is assigned to the 'for' variable. Execution continues with the 'while' test being checked again, and so on. With 'nonNil' general loops, tests for 'nonNil' or 'nil' are done as described in the preceding paragraph. As with 'nonNil' loops, if the 'then' expression is given as '*', then the 'from' expression is re-used.
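A Python model of the general 'for' loop follows; it is a sketch, not Zed code. The "nil_checked" flag models a 'nonNil' general loop. The model assumes the 'nil' check happens before the 'while' test, so that a test such as "l->l_this < 7" in the examples never sees a 'nil' value; that ordering is an assumption, not something this section states.

```python
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def general_for(from_expr: Callable[[], Optional[T]],
                while_test: Callable[[T], bool],
                then_expr: Optional[Callable[[T], Optional[T]]] = None,
                nil_checked: bool = False) -> Iterator[T]:
    """Model of  for x from <from_expr> while <while_test> then <then_expr> do ... od.

    then_expr=None models 'then *'. nil_checked=True models a 'nonNil' general
    loop: a None ('nil') value ends the loop before the 'while' test sees it.
    """
    value = from_expr()
    while True:
        if nil_checked and value is None:
            break
        if not while_test(value):
            break
        yield value
        value = from_expr() if then_expr is None else then_expr(value)
```

With this model, the "s from +27 while s >= 0 then s - 3" example yields 27, 24, ..., 3, 0, and a list walk stops either at the first element failing the test or at the end of the list, whichever comes first.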

Example 'for' loops:

    for i from 1 upto 10 do
        Fmt("i = ", i);
    od;

    for ch from "a" upto "z" do
        showNamesStartingWith(ch);
    od;

    for index from MAX_VALUE - 100 downto 0 do
        Fmt("index = ", index);
    od;

    record MyList_t {
        MyList_t ml_next;
        string ml_name;
    };
    ...
    MyList_t list := getList();
    for ml from list then ml->ml_next do
        /* "ml" is nonNil, so check for that is not needed. */
        Fmt("Got name \"", ml->ml_name, "\"");
    od;

    [,] float mat := ...
    for i from 0 upto getBound(mat, 0) - 1 do
        for j from 0 upto getBound(mat, 1) - 1 do
            mat[i, j] := mat[i, j] * 2.;
        od;
    od;

    for size from 1.0 while size < 1.e10 then size * 2. do
        tryFit(size);
    od;

    for s from +27 while s >= 0 then s - 3 do
        Fmt(" s = ", s);
    od;

    for l from head while l->l_this < 7 then l->l_next do
        Fmt("Value is ", l->l_this);
    od;

    for l from MyAtList.SLGetHead(@head) then * do
        Fmt("List element is ", l@.l_this);
    od;

The third-last example shows how to count by something other than 1, in this case counting downwards by 3. That example also shows one way to force the iterator variable to be of type 'sint'. Note, in the last example, that the "Lists" generic proc "SLGetHead" removes the yielded element from the list.

Since Zed 'for' loops make the iterator variable 'con' within the loop body, the iterator variable cannot be changed in the loop body. This means that only the header portion of the 'for' loop needs to be examined in order to determine what values the iterator variable can take. The nature of the 'nonNil' 'for' loop (either the simple form or the 'nonNil' form of a general for loop) allows the compiler to make the iterator variable 'nonNil'. If there were no 'nonNil' loops, determining that the iterator variable can be 'nonNil' would require the compiler to examine the semantics of a general 'for' loop, which could be difficult and time-consuming. The fact that a counting 'for' evaluates the limit expression only once means that even a simple non-optimizing compiler can generate good code for it.

Note that the expression in the 'while' clause of a general 'for' loop is just a simple expression - the syntax does not allow statements before the final expression. If that rare situation arises, an explicit 'begin'/'end' scope block (see "18.9.9 Scope Blocks") can be used.

When the 'then' expression is given as '*', the 'from' expression is re-used as the 'then' expression. Typically this means that code generators will arrange things so that the code generated for that expression is re-used by having it be part of the code executed each time through the loop, but that is not required. This usage can be useful when using "generators", which are designed to yield a sequence of values on successive calls.

6.12 Return Statement

A proc which returns a value must have that value as the last element in the sequence of statements that is the proc body. Procs which do not return a value (the return type is given as 'void') just drop off the end of their body. Both forms sometimes need to return from the proc from somewhere other than the end of their body. This can be done using the 'return' statement. When a 'return' is executed, the containing proc is immediately exited, and if the proc needs a result, the value on the 'return' statement is used.

For 'void' procs, the 'return' is a complete statement by itself. For procs which return a value, the 'return' must be followed by an expression which gives the value to be returned from the proc. That value must be compatible with the specified return type of the proc.

Example 'return' statements:

    proc
    doSomething(string a, b, c)void:
        while a ~= b do
            if a = "" then
                /* Something is wrong. */
                return;
            fi;
            a := newString(a, c);
        od;
    corp;

    proc
    computeAnswer([,,,,] float input)float:
        ...
        if cannotComputeAnswer then
            return 0.0;
        fi;
        float theAnswer;
        ...
        theAnswer
    corp;

Code after a 'return' statement is unreachable. This yields an awkward situation - consider the following code:

    proc
    yieldSomething(uint a, b, c)uint:
        ...
        uint newVar :=
            if <condition> then
                return c;
            else
                c + 4
            fi;
        newVar + b
    corp;

Note that there is no value for the 'if' expression in the 'true' branch. In principle this could work, because that branch contains a 'return', and code after a 'return' is not reachable. However, the Zed compiler does not allow this situation, complaining that you cannot return from inside a partial expression.

6.13 Eval Statement

The 'eval' statement is a simple "throw away this value" statement. It is most often used with proc calls, where the proc returns some value, but the calling context does not need that value - it is calling the proc for its side effects. For example, a proc might return a value indicating an error, but the calling context is already handling a failure situation and does not care about any further error - it is just doing its best to clean up. The 'eval' statement consists of 'eval' followed by an expression which is to be evaluated as normal, but whose result value is to be discarded. Eval is similar to the "(void)" cast in C.

Example 'eval' statements:

    proc
    CloseWindow(Window_t w)bool:
        if w ~= nil then
            ...
            true
        else
            reportError("nil window passed to CloseWindow");
            false
        fi
    corp;
    ...
        eval CloseWindow(myWindow);

        /* Strange way to do a sequence of conditionals: */
        eval doThing1() and doThing2() and doThing3();

The other use of the 'eval' reserved word is in a package-level 'eval', which calls a proc at compile time. These are discussed later, under "18.2 Package-Level "eval"".

7 String and Matrix Operations

Several aspects of strings and string operations have already been discussed. This section brings all of the information relevant to strings together in one place. Also discussed here is the 'matrix' construct.

7.1 String Operations

Long string constants can be written on multiple lines - end each segment with a closing quotation mark and start the next segment with an opening one, and the compiler will build the long string constant out of the adjacent segments. Comments are allowed between such segments of string literals.

Strings can be concatenated using the '+' operator. Strings and character values can be concatenated in the same way, and pairs of character values can be concatenated to give strings. String concatenation allows either or both operands to be of type "[] char".

When comparing strings, the usual comparison operators compare the sequences of characters within the strings. The additional comparison operators for strings are '==' and '~=='. These operators compare the internal string pointers, to see if the values are the very same string, internally. With these operators, it is possible to compare against 'nil', which is needed because strings, like other tracked values, can be 'nil'. It is possible for "a = b" to be true when "a == b" is false, but it is not possible for "a = b" to be false when "a == b" is true.

Zed will automatically convert 'char' values to strings when needed. This includes in parameter passing, assignment and concatenation. The only situation in which a programmer is likely to notice that a single character literal is actually of type 'char' is when the programmer is writing compile-time code. Such code might need to be aware of this, and handle 'char' values where 'string' values would otherwise be expected.

Strings can be indexed like vectors (one-dimensional matrixes) to yield individual characters. For example, if variable "s1" has value "abcdef", then the expression "s1[3]" will yield "d", which will have type 'char'. Note that string indexing starts with index 0, just like matrix indexing.

Values of type "[] char" can be used as strings, including in assignments, parameter passing and constructors - they will be copied into a new string value for that use. However, strings cannot be used as values of type "[] char". This is because the Zed language treats strings as immutable - they cannot be changed. Assignment to the characters in a string is not allowed, but assignment to the 'char' elements of a vector of characters is allowed. When a character is indexed within a string constant, by a constant, the result is a non-assignable value of type 'char'. Thus you cannot take '@' of a character in a string constant if the index is a constant expression. Otherwise, '@' of a character in a string results in a value of type "@ ro char". Converting a "[] char" to 'string' can be done by taking a substring of it consisting of the entire string (range "0 upto '*'").

Two variations of the indexing syntax are used to take substrings of strings. The first syntax consists of:

string expression
'['
start expression
'upto'
end expression
']'

With this form of substringing, the substring will start at the position given by the start expression (again, 0 origin) and end at the position given by the end expression. The start and end values can be the same, in which case the substring will be of length one (it will still be of type 'string', however). It is an error if the end expression is less than the start expression or if either expression is beyond the length of the string. The end expression can be given as '*', which means "the last position".

The second substringing syntax is:

string expression
'['
start expression
'for'
count expression
']'

In this form, the length of the desired substring is specified. A length of 0 is allowed - that yields an empty string. It is an error if the start expression is beyond the length of the string or if the start expression plus the count expression is beyond the length of the string. The count expression can be given as '*', which means "whatever count is needed to get to the end of the string".
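The two substring forms can be modelled in Python as below; this is a sketch, not Zed code, and "STAR" stands in for Zed's '*'. The text says only that positions must not be "beyond the length of the string"; the model reads that as "start and end must be valid positions" for the 'upto' form and "start plus count must not exceed the length" for the 'for' form, so that "[len for 0]" is allowed. Those readings are assumptions.

```python
from typing import Union

STAR = "*"  # stands in for Zed's '*' end/count marker
Bound = Union[int, str]


def substr_upto(s: str, start: int, end: Bound) -> str:
    """Model of  s[start upto end]: 0-origin and inclusive at both ends."""
    last = len(s) - 1
    if end == STAR:
        end = last  # '*' means "the last position"
    if start < 0 or start > last or end > last or end < start:
        raise IndexError(f"[{start} upto {end}] is invalid for length {len(s)}")
    return s[start:end + 1]


def substr_for(s: str, start: int, count: Bound) -> str:
    """Model of  s[start for count]: a count of 0 yields the empty string."""
    if count == STAR:
        count = len(s) - start  # '*' means "whatever reaches the end"
    if start < 0 or count < 0 or start > len(s) or start + count > len(s):
        raise IndexError(f"[{start} for {count}] is invalid for length {len(s)}")
    return s[start:start + count]
```

Applied to the example that follows, "GOODBYE[8 upto 12]" selects "cruel" and the first 17 characters of "evenLonger" are "Hello there world", so "evenLonger[6 for 5]" selects "there".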

Vectors (1-dimensional matrixes) of 'char' can have substring operations applied to them. The result will be of type 'string'.

Some example string manipulations:

    string HELLO = "Hello there world", GOODBYE = "Goodbye cruel world!",
        LONG =
    "Now is the time for all good men to come to the aid of their party. The "
    "quick red fox jumps over the lazy brown dog. Anyone who took old-style "
    "typing exercises will recognize those sentences.";
    ...
    string nonNil evenLonger := HELLO + LONG + GOODBYE;
    string nonNil str := "";
    for i from 1 upto 10 do
        str := str + (i + "a");
    od;
    string thereCruel := evenLonger[6 for 5] + GOODBYE[8 upto 12];
    string s := getString();
    if s ~== nil then
        s := s + str;
    else
        s := thereCruel[5 upto *];
    fi;

There are several standard library routines in Zed that deal with strings.

7.2 One Dimensional char Arrays

Strings are very convenient to use, but they require more resources, in terms of both CPU usage and memory usage, than one-dimensional arrays of 'char'. Note: this discussion is about arrays, whose size is fixed at compile time, not about vectors (matrixes), whose size is determined at run time. This advantage of char arrays shrinks if the array size chosen for storing string-like values is large and the typical value stored is small. The traditional way of storing string-like values (e.g. names, parts of addresses, etc.) in databases is to use arrays of characters, padded to the full size of their array with spaces. It is possible for databases to store actual string values using other representations. They can also store a "used length" so that large arrays do not actually require that the padding be stored.

To simplify discussion, one-dimensional arrays of 'char' will be called "char arrays". Zed databases ("22.4 Databases") allow strings to be stored directly. However, the Zed language also makes it easy to use space padded char arrays in string-like ways. In assignments, constructors, initializers, etc. string constants (whether literals, defined constants or constant expressions) can be used as the value for char arrays. At compile-time, the Zed compiler will truncate or pad the string value as needed, in order to produce the needed array constant. By default, a warning is issued if the string value is longer than the declared size of the destination array.

There are situations where an explicit initializer with a list of 'char' literals is actually preferred. For example:

    [26] char Alpha1 = [
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
        "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"];
    [26] char Alpha2 = "abcdefghijklmnopqrstuwxyz";

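The second form is certainly easier to read. But did you notice that "v" is missing? The compiler silently pads the 25-character string with a space, so the elements from "v" onward are each shifted by one position and the last element is a space; indexing those positions will silently yield incorrect results. In the first form, the compiler will complain if there are too few or too many initializing characters given.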

Run-time conversions between string values and char arrays will automatically be created as needed. With these run-time conversions, no warning is possible, and no run-time error will occur if the string must be truncated. When converting from a char array to a string, any trailing spaces (padding) will be left out. Note that because these are run-time operations, it is not possible to use a non-constant string expression to initialize a char array in a constant initializer (where the initialization is specified with '=' rather than with ':=').
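The run-time conversion rules just described can be modelled in Python as below, representing a char array as a fixed-length list of one-character strings. This is a sketch of the rules, not Zed's implementation, and the function names are hypothetical.

```python
from typing import List


def string_to_char_array(value: str, size: int) -> List[str]:
    """Model of run-time assignment of a string to a [size] char array.

    Longer values are silently truncated (no warning, no error); shorter
    values are padded with spaces to fill the array.
    """
    return list(value[:size].ljust(size))


def char_array_to_string(chars: List[str]) -> str:
    """Model of converting a char array to a string: trailing spaces are dropped."""
    return "".join(chars).rstrip(" ")
```

One consequence worth noting: a string that genuinely ends in spaces does not survive a round trip through a char array, because the conversion back cannot distinguish those spaces from padding.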

Indexing a char array will produce a value of type 'char'. Note that no checking is done to see if the index value is into any space padding - if that is desired, then assign the char array to a 'string' destination first. Explicit conversion to 'string' is also required to take a substring of a char array. Zed does not allow direct comparison of array (or struct) values, so the explicit conversion to a 'string' is also required when comparing char arrays.

Example char array manipulations:

    [50] char name;
    [3][50] char address := ["<House>", "<Street>", "<City>"];
    [100] char description;

    name := "Fred Flintstone";
    string a1 := address[1];
    if a1 = "<Street>" then
        address[1] := "Groat Road";
    fi;

    description := "|" + description + "|";
    description := description + getNewDescString();

7.3 Matrix Operations

Matrix values are created dynamically at run time using the matrix constructor construct. Its syntax is:

'matrix'
'('
'['
list of bound expressions separated by commas
']'
element type for matrix
')'

Matrix types with only one dimension are referred to as vectors. As described in "4.6 Matrix Types", named vector types can have constructors.

The newly allocated and initialized matrix is returned from the construct. The returned value is 'nonNil'. The type of the value returned from a matrix constructor does not have any storage flags. See "4.6 Matrix Types" for information on matrix type compatibility. See "8.1 "getBound"" for information on retrieving the bounds of matrix values.

8 Miscellaneous Expression and Statement Elements

8.1 "getBound"

The 'getBound' construct is used to get the bound (element count) of one dimension of an array or matrix value or to get the character count of a 'string' value. Syntactically, it consists of:

'getBound'
'('
array, matrix or string expression, or persistent vector variable name
optional ',' and bound selector
')'

If the array or matrix is one dimensional (has only one bound), then the bound selector should not be given. Otherwise, the bound selector is a 'uint' expression which selects which bound of the array or matrix is desired. The bound selection value is 0 for the first bound, 1 for the second, etc.

Getting a bound of an array is mostly useful when the array was declared with '*' for its (only) bound. In that situation, the actual value for the bound is determined by the number of initializers that were given in the array variable declaration. Note that if a '*' bound is used, it must be the only bound. A bound from an array is a constant expression, known at compile time. Correspondingly, the bound selector for an array must be a constant expression.

When applied to a persistent vector, the current number of elements in that persistent vector is returned.

When applied to a matrix value, the corresponding bound that was given when the matrix value was created is returned.

When applied to a string value, the returned value is the length in characters of the string. The length of a string literal or string constant expression is a constant expression, known at compile time.

The result of a 'getBound' is of type 'uint', unless it is of a persistent vector, in which case it is of type 'bits64'.

8.2 "toUint" and "fromUint"

The 'toUint' and 'fromUint' constructs are used to convert values to and from type 'uint'. The 'toUint' construct consists of:

'toUint'
'('
value to be converted
')'

Any value except compound values (struct, union and array values) can be converted to 'uint'. If the value is 'float', then the bit pattern of the floating point value is re-interpreted as a 'uint'. This differs from C, where a cast from float to int converts the numeric value. See "8.6 "flt", "round" and "trunc"" for that kind of conversion in Zed. Similarly, 'sint' values are converted with no change in the bit pattern. Values which are pointers, tracked values or '@' values are converted with no change to the bit pattern, just like a cast of a pointer to an integer in C. All other values (scalar values) are converted with no change, although in some cases high-order 0 bits might need to be added to produce a value that is large enough for 'uint'.

'toUint' can be used to convert pointers, etc. for printout. Note that 'toUint' on a 'float' does not yield a value that is meaningful to most programmers - use the "Fmt" package to convert floating point values to human-readable forms. If the value to be converted has a unit type, then that unit information is passed through to the 'uint' result. Explicitly converting a bitsXX value to 'uint' is useful only in rare situations, typically involving templates.
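To make the "bit pattern, not value" distinction concrete, the Python sketch below models 'toUint' on 'float' and 'sint' values. It is not Zed code, and it assumes (this section does not say so) that 'uint', 'sint' and 'float' are all 64 bits wide, with 'float' in IEEE 754 binary64 format and 'sint' in two's complement. The last function shows the C-style value conversion for contrast only.

```python
import struct

UINT_BITS = 64  # assumption: 'uint', 'sint' and 'float' are 64 bits wide


def to_uint_float(x: float) -> int:
    """Model of toUint on a 'float': the IEEE 754 bits, read as unsigned."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits


def to_uint_sint(n: int) -> int:
    """Model of toUint on a 'sint': same two's-complement bits, read as unsigned."""
    return n & ((1 << UINT_BITS) - 1)


def c_style_value_conversion(x: float) -> int:
    """For contrast only: a C cast from float to int, which truncates toward zero."""
    return int(x)
```

Under these assumptions, 1.0 converts to 0x3FF0000000000000 rather than to 1, which is why the result is rarely meaningful for printing floating point values.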

The 'fromUint' construct is more constrained. It consists of:

'fromUint'
'('
value to be converted
','
type to convert to
')'

'fromUint' can only convert to the following types (or a single rename of one of them):

'char'
'sint'
'float'
'bits8'
'bits16'
'bits32'
'bits64'

As with 'toUint' the bit patterns are not changed. This is useful to the compiler itself, when it constructs 'float' values from input strings. The expression given to 'fromUint' must be of type 'uint', of a "bitsXX" type, or of type 'uint' with units, or a single rename of such a type. Note that no range checking is performed, not even for conversion to 'char' - any extra high order bits are silently discarded.

Any unit type on the argument value is discarded - the result type is exactly the specified type. Explicitly converting from 'uint' to a bitsXX type is only useful in certain rare situations, typically involving templates.
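The following Python sketch models 'fromUint' for the "bitsXX", 'sint' and 'float' targets: the bits are kept and any high-order bits that do not fit are silently discarded. It is not Zed code. It assumes 64-bit 'sint' and 'float' (IEEE 754 binary64) values, which this section does not state, and it leaves out 'char' because the width of 'char' is not given here.

```python
import struct


def from_uint_to_bits(value: int, width: int) -> int:
    """Model of fromUint(value, bitsN): keep the low `width` bits, no range check."""
    return value & ((1 << width) - 1)


def from_uint_to_sint(value: int, width: int = 64) -> int:
    """Model of fromUint(value, sint): the same bits read as two's complement."""
    value = from_uint_to_bits(value, width)
    return value - (1 << width) if value >> (width - 1) else value


def from_uint_to_float(value: int) -> float:
    """Model of fromUint(value, float): the same 64 bits read as an IEEE 754 double."""
    (x,) = struct.unpack("<d", struct.pack("<Q", from_uint_to_bits(value, 64)))
    return x
```

For instance, converting 0x1234 to 'bits8' keeps only 0x34, with no error or warning.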

8.3 "unit"

See "12.5 The "unit" Construct"

8.4 "nonNil"

The 'nonNil' construct is a convenient way to assert that a value is not 'nil'. It consists of:

'nonNil'
'('
value to be constrained to be not nil
')'

The 'nonNil' construct checks, at run time, whether or not its argument is 'nil', aborting if it is. If the argument is not 'nil', it is yielded from the construct and nothing else is done. Using:

    nonNil(<xxx>)

is thus similar to:

    begin
        assert assign con __temp := <xxx>;
        __temp
    end

One difference is that with the 'nonNil' construct it is less obvious that an assert can happen. However, the 'nonNil' construct can be used directly inside an expression and as an argument to a proc call or constructor. It can also be more efficient, since it does not introduce a new variable, and so does not have to pay the costs of initializing and terminating a tracked variable. 'nonNil' makes the most sense where the value is only going to be used once in the current context, and it is known that the value cannot be 'nil', but the compiler has no way to be sure of that.

As an example, consider using a simple vector of strings:

    proc
    acceptString(string nonNil str)bool:
        ...
    corp;

    [] string con nonNil vec := yieldVectorOfStrings();
    var countOfAccepted := 0;
    for i from 0 upto getBound(vec) - 1 do
        if acceptString(nonNil(vec[i])) then
            countOfAccepted := countOfAccepted + 1;
        fi;
    od;

In such situations it can be known that "yieldVectorOfStrings" fills the returned vector with non-'nil' strings, but there is no direct way to tell that to the compiler. Note that if the compiler's optimizations can prove that the strings will not be 'nil', it is free to remove the check done by 'nonNil'.
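The run-time behaviour of 'nonNil', and its use in the vector loop above, can be modeled in Python. In this model 'None' stands in for 'nil' and an exception stands in for the abort. "acceptString" is replaced by a hypothetical stand-in that accepts non-empty strings. "getBound" is assumed to yield the element count, as the loop above implies.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")

def non_nil(value: Optional[T]) -> T:
    """Model of Zed's nonNil: abort if the value is nil (None here), otherwise yield it unchanged."""
    if value is None:
        raise AssertionError("nonNil: value is nil")
    return value

def accept_string(s: str) -> bool:
    # Hypothetical stand-in for acceptString: accepts non-empty strings.
    return len(s) > 0

def count_accepted(vec: List[Optional[str]]) -> int:
    """Mirror of the Zed loop: each element is used once and checked via non_nil."""
    count = 0
    for i in range(len(vec)):
        if accept_string(non_nil(vec[i])):
            count += 1
    return count
```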

Keep in mind, however, that the Zed language specifies that all tracked values that it does not know are 'nonNil' must be checked at run-time before they can be dereferenced. For example, in:

    record MyRecord_t {
        uint mr_count;
        ...
    };

    ...

    MyRecord_t mr := getPossiblyNilMr(...);
    total := total + mr->mr_count;

it is known that the access of "mr->mr_count" requires checking that "mr" is not 'nil'. Adding a 'nonNil' in such a situation does not change the semantics, and will likely result in exactly the same code being generated. Whether a 'nonNil' construct is used in such situations is up to the programmer. The suggestion is this: if the check for 'nil' is unusual or important, use a full "assert assign ..." so that it stands out; otherwise, use nothing.

[It would also be possible to add similar constructs to convert between 'uint' and 'sint', checking for in-range values and asserting if the values are out of range. I chose not to add these, however, since I don't think they would be used as much. Also, ignoring the possibility of units on the types, they can be done by simple procs.]

The 'nonNil' construct introduces a parsing issue to the Zed language which requires lookahead of one additional token. In an expression context, the sequence '@' 'nonNil' ... could start the '@' of something produced via a 'nonNil' construct, or it could start a directly specified '@' type value which has at least the 'nonNil' storage flag. E.g.

    @ uint au := @nonNil(uintVector)[which];
    Types/Type_t t := @ nonNil uint;

The standard Zed parser skips past the 'nonNil' and checks for the '('. A consequence of this is that you can't put the subtype of an '@' in this context inside parentheses. In a context where it is known that a type is needed, e.g. in a proc header, putting the subtype in parentheses is allowed. Since the system pretty-printer will remove such unneeded parentheses, this situation will not be further addressed.

Note that, as the pseudocode equivalent of 'nonNil' shown above makes clear, the construct yields a *value*. Thus a 'nonNil' construct cannot have its address taken. For example, given a vector of strings, one can '@' an element of the vector, as in:

    @ string aStr := @vec[i];

Here, '@' value "aStr" directly references the vector element, and so can be assigned through, thus modifying the vector element. But, if "aStr" is desired to be "@ nonNil string", the following will not work:

    @ nonNil string aNStr := @nonNil(vec[i]);

because "nonNil(vec[i])" is a value, not a location.

8.5 "byteSwap", "evenParity", "onesCount", "lowOneIndex" and "highOneIndex"

The constructs described here are low-level "bit fiddling" operations. They specify computations that CPUs often have native instructions for. It can be difficult, however, for a programmer to be sure that a compiler will actually use those instructions when desired. Having these operations as language constructs makes the programmer's intent clear.

All of the constructs in this group have syntax:

operation name
'('
value to be operated on
')'

All of these operations take 'uint' or "bitsXX" values as arguments, or a single rename of those types. 'byteSwap' does not accept 'bits8' arguments. 'evenParity' yields a 'bool' result. All others yield a result which is the same type as their argument. All operations support compile-time evaluation.

'byteSwap' swaps the bytes ('bits8' chunks) of its argument, i.e. the sequence of 'bits8' chunks within the value is yielded reversed. 'evenParity' yields 'true' if its argument contains an even number of "1" bits, else yields 'false'. 'onesCount' yields the number of "1" bits in its argument. 'lowOneIndex' and 'highOneIndex' search for the least significant or most significant "1" bit in their argument, respectively. If there is no "1" bit (the value is 0), then the value yielded is undefined. [Yes, this was done to allow use of the X86 BSR and BSF instructions.] Bit indexes are 0 for the least significant bit, 1 for the next bit, and so on. Thus, the maximum possible returned value depends on the size of the argument (e.g. 31 for a 'bits32' argument).

For this code:

    bits8 b8 := 0;
    bits16 b16 := 0x1234;
    bits32 b32 := 0x12345678;
    bits64 b64 := 0x1234567890abcdef;

    Fmt(byteSwap(b16), "  ", byteSwap(b32), "  ", byteSwap(b64));
    Fmt(evenParity(b8), "  ", evenParity(b32));
    Fmt(onesCount(b8) :: d, "  ", onesCount(b32) :: d);
    Fmt(lowOneIndex(b8) :: d, "  ", lowOneIndex(b32) :: d);
    Fmt(highOneIndex(b8) :: d, "  ", highOneIndex(b32) :: d);

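the output is as follows. Since "b8" is 0, the values shown for 'lowOneIndex(b8)' and 'highOneIndex(b8)' are undefined results, not guaranteed ones: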

    0x3412  0x78563412  0xefcdab9078563412
    true  false
    0  13
    0  3
    0  28
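The following Python model reproduces the example output above. It takes the operand width as an explicit parameter, because in Zed the width comes from the argument's type. Zed leaves 'lowOneIndex' and 'highOneIndex' undefined for 0, so the model raises an error in that case rather than returning an arbitrary value. The function names are hypothetical.

```python
def byte_swap(value: int, width: int) -> int:
    """Reverse the order of the 8-bit chunks in a width-bit value (width a multiple of 8, at least 16)."""
    nbytes = width // 8
    return int.from_bytes(value.to_bytes(nbytes, "big"), "little")

def even_parity(value: int) -> bool:
    """True if the value contains an even number of 1 bits."""
    return bin(value).count("1") % 2 == 0

def ones_count(value: int) -> int:
    """Number of 1 bits in the value."""
    return bin(value).count("1")

def low_one_index(value: int) -> int:
    """Index of the least significant 1 bit (bit 0 is the LSB). Undefined in Zed for 0, so rejected here."""
    if value == 0:
        raise ValueError("no 1 bit: result is undefined in Zed")
    return (value & -value).bit_length() - 1

def high_one_index(value: int) -> int:
    """Index of the most significant 1 bit. Undefined in Zed for 0, so rejected here."""
    if value == 0:
        raise ValueError("no 1 bit: result is undefined in Zed")
    return value.bit_length() - 1

b16, b32, b64 = 0x1234, 0x12345678, 0x1234567890ABCDEF
print(hex(byte_swap(b16, 16)), hex(byte_swap(b32, 32)), hex(byte_swap(b64, 64)))
# 0x3412 0x78563412 0xefcdab9078563412
print(even_parity(b32), ones_count(b32), low_one_index(b32), high_one_index(b32))
# False 13 3 28
```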

The details of the bit operation, such as which bytes are swapped, are based on the type of the argument. Since the type of an integral literal in Zed is 'uint', using these bit operations on such a literal will usually do the wrong thing. For example, a TCP or UDP port number, as stored in an IPv4 socket address, is a network-order (big-endian) 16-bit value. On a little-endian machine (e.g. X86), the value must therefore be made into a 'bits16' constant or put into a variable before 'byteSwap' is applied to it, in order to get the proper value. E.g.

    bits16 SERVER_PORT = 12345;
    ...
    if I_AM_LITTLE_ENDIAN then
        sin.sin_port := byteSwap(SERVER_PORT);
    else
        sin.sin_port := SERVER_PORT;
    fi;
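The port example can be checked with a small Python model. The model assumes that Zed's 'uint' is 64 bits wide, which this section does not state. Under that assumption, 'byteSwap' applied to the plain literal would move the port's bytes to the top of a 64-bit word instead of swapping the two bytes of a 16-bit value. Python's 'socket.htons' is used only as a cross-check of the byte order. The other names are hypothetical.

```python
import socket
import sys

def byte_swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value (model of byteSwap on bits16)."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)

def byte_swap64(value: int) -> int:
    """Reverse the eight bytes of a 64-bit value (model of byteSwap on an assumed 64-bit uint)."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")

SERVER_PORT = 12345
I_AM_LITTLE_ENDIAN = sys.byteorder == "little"

port_field = byte_swap16(SERVER_PORT) if I_AM_LITTLE_ENDIAN else SERVER_PORT

# The result agrees with the host-to-network conversion in Python's socket module.
assert port_field == socket.htons(SERVER_PORT)

# Swapping the literal as a 64-bit value moves its bytes to the top of the word instead.
print(hex(byte_swap16(SERVER_PORT)), hex(byte_swap64(SERVER_PORT)))
# 0x3930 0x3930000000000000
```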

8.6 "flt", "round" and "trunc"

These constructs are used to convert between integral types and 'float'. 'flt' takes a 'uint', 'sint' or "bitsXX" value and yields a 'float' value that is the closest approximation to the given value. Any unit information on the value type is passed through to the 'float' result.

The 'round' and 'trunc' constructs take 'float' values and convert them to 'sint'. Any unit information on the input types is preserved.
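A Python sketch of the three conversions follows. It uses only inputs that are not exactly halfway between two integers, because this section does not say how 'round' handles ties. Python's built-in 'round' rounds ties to even, which may or may not match Zed. The function names are hypothetical.

```python
import math

def flt(value: int) -> float:
    """Model of flt: the closest float approximation to an integral value."""
    return float(value)

def trunc(value: float) -> int:
    """Model of trunc: drop the fractional part, rounding toward zero."""
    return math.trunc(value)

def round_nearest(value: float) -> int:
    """Model of round, valid for non-tie inputs only (Zed's tie rule is not stated here)."""
    return round(value)

for x in (2.7, -2.7):
    print(x, trunc(x), round_nearest(x))   # 2.7 2 3, then -2.7 -2 -3
print(flt(12345))                          # 12345.0
```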

[These constructs currently do not check for any errors, and there is no Basic/ proc to check with. My current thinking is that there should be two modes of compilation with respect to floating point operations. One mode would do no checking not explicitly done via Basic/ calls, and the other mode would do all reasonable checking on all floating point operations. The speed difference would be quite large.]

8.7 "pretend"

The 'pretend' construct is Zed's privileged, low-level construct for doing things like pointer casting, converting numbers to pointers, etc. It will convert values of most scalar types to other scalar types. It will not, however, do conversions which can be done with 'toUint' and 'fromUint' - the programmer is forced to use those non-privileged constructs to show specifically that the operation is not privileged. 'pretend' will also not operate on 'void', 'nil', 'poly', array, struct, union or generic parameter values. 'pretend' will not convert to 'void', 'nil', 'poly', '@ void', '* void', 'uint', 'any', 'autoAny' or to an array, struct, union or generic type parameter.

Syntactically, a 'pretend' construct consists of:

'pretend'
'('
value to have its type changed
','
type to yield for the value
')'

Any 'nonNil' attribute of the provided value is copied to the result, where that makes sense.

8.8 "sizeof"

The 'sizeof' construct takes a type argument and yields the size in bytes of values or variables of that type. This isn't really needed in Zed, since type field "t_byteSize" provides the same information - in fact the 'sizeof' construct simply accesses that value.

Care must be taken when using 'sizeof' inside of generics. If the type whose size is being determined is a parameter to the generic or is a type declared within the generic which depends on a generic parameter, then the value yielded by 'sizeof' can depend on the instantiating types and values. If the 'sizeof' is inside code running at compile time, the values yielded might not be meaningful. Values can differ from instantiation to instantiation.

Syntactically, a 'sizeof' construct consists of:

'sizeof'
'('
type whose bytesize is to be yielded
')'

8.9 "strict"

The 'strict' construct can be either an expression:

'strict'
'('
expression to be strictly evaluated
')'

or a statement block:

'strict'
'begin'
statement sequence to be strictly executed
'end'

Within a 'strict' expression or block, statements and expressions are evaluated exactly as they are written. This is much like turning off optimization for such code. However, some optimizations (such as the use of registers for holding values instead of memory) are still valid. Optimizations such as expression re-arrangement and the extraction and use of common subexpressions are disabled. Some programs require this sort of explicit control over evaluation in order to achieve their desired results.

"Constant folding", the compile-time evaluation of expressions involving only constants, is turned off by 'strict', and so things like defining named constants whose value expression involves constant folding will not work. The evaluation of constant conditions (e.g. comparison of a named constant against another named constant) happens regardless of 'strict' - this is needed for the "conditional compilation" required by this language description. If part of such a condition requires constant folding, then it will fail if 'strict' is active.

'strict' sections cannot be nested. There are no "unstrict" expressions or blocks.

[Help - I need better, or more detailed examples!]

One example is in dealing with floating point values which can be of very different magnitudes. The careful programmer can specify the operations in an order that does not lose precision, but an aggressive optimizer in a compiler might rearrange the order of evaluation (e.g. to do more constant folding) and thus break the code. Another example is when dealing with integral expressions that might overflow. E.g. in:

    uint a, b, c;
    ...
    c := a + BIG - DELTA_FACTOR + b + BIG;

If "BIG" and "DELTA_FACTOR" are very large (e.g. near or over half of the largest 'uint' value), a re-arrangement of the code to:

    c := a + b + (BIG + BIG - DELTA_FACTOR);

could result in a compile-time overflow when adding the two "BIG" values. In the original form, the programmer has explicitly subtracted "DELTA_FACTOR" before the second "BIG" is added, to avoid that.

Writing this as:

    c := strict(a + BIG - DELTA_FACTOR) + b + BIG;

forces the evaluation of "a + BIG - DELTA_FACTOR" to happen as written.
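The overflow hazard can be demonstrated with a Python model. The model treats 'uint' as 64 bits and fails on overflow instead of wrapping, matching the compile-time overflow described above. The 64-bit width and the values chosen for "BIG", "DELTA_FACTOR", "a" and "b" are illustrative assumptions.

```python
UINT_MAX = 2**64 - 1  # assumption: Zed 'uint' is 64 bits

def checked_add(x: int, y: int) -> int:
    """64-bit unsigned addition that fails instead of wrapping."""
    result = x + y
    if result > UINT_MAX:
        raise OverflowError(f"{x:#x} + {y:#x} overflows uint")
    return result

def checked_sub(x: int, y: int) -> int:
    """64-bit unsigned subtraction that fails on underflow."""
    if y > x:
        raise OverflowError(f"{x:#x} - {y:#x} underflows uint")
    return x - y

BIG = 2**63 + 2**60          # more than half of UINT_MAX
DELTA_FACTOR = 2**63
a, b = 5, 7

def as_written() -> int:
    # c := a + BIG - DELTA_FACTOR + b + BIG, evaluated strictly left to right
    return checked_add(checked_add(checked_sub(checked_add(a, BIG), DELTA_FACTOR), b), BIG)

def rearranged() -> int:
    # c := a + b + (BIG + BIG - DELTA_FACTOR), folding the constants first
    folded = checked_sub(checked_add(BIG, BIG), DELTA_FACTOR)  # BIG + BIG overflows here
    return checked_add(checked_add(a, b), folded)
```

Both forms denote the same mathematical value. Only the left-to-right order keeps every intermediate result within range.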

Note that 'strict' does not affect the behaviour of procs called within the range of the 'strict' - it only affects code that is directly within its range. Code added within such a range by the use of compile time execution (e.g. template expansion) is affected, because such code is compiled within the range.

'strict' blocks are scopes, so names declared within them are not available outside of their range.

'strict' is like the proverbial "big stick" - it can disable a lot of things. Because of that, it is usually not the best answer. However, if a piece of code is not working as expected, and the programmer suspects that some optimization is the problem, putting the code inside 'strict' can quickly determine whether or not that is possible. By then using 'strict' around smaller and smaller parts of that code, the exact problem or problems can be pinpointed. Then other, explicit fixes, such as using temporary variables to hold values that must be evaluated in a strict fashion, can be applied as a permanent solution.

There can be unusual situations where the Zed compiler cannot maintain the ordering required by 'strict'. These occur when possible side effects in a later operand, actual parameter, etc. would invalidate an earlier one. In that situation, an error is issued, and the programmer must manually use local variables to hold the operands, etc., and then use those locals instead of the original expressions.

8.10 "private"

Like the 'strict' construct, the 'private' construct can be either an expression:

'private'
optional 'package'
'('
expression to be hidden
')'

or a statement block:

'private'
optional 'package'
'begin'
statement sequence to be hidden
'end'

Code within a 'private' section cannot be seen by code in any other package (other than Exec). This means that the system pretty-printer cannot show it, so IDE tools will not be able to show or edit such code unless the access rights of the tool include access to the package that contains the 'private' section. This can be used to hide "secret" values or code sequences while still having most of the code visible. For example, code for an AI in a game might wish to hide some details of its operation, while still having its overall nature public. Programmers should take note of the saying "security by obscurity doesn't work" and not attempt to use 'private' for any real security situations.

The optional 'package' flag on 'private' is only useful with "18 Compile Time Execution". Advanced programmers should consult the header comment for "Exec/PrivateBlockStart" and the example in "18.9.10 "private" "package" Example".

9 Interfaces and Capsules

9.1 Introduction

Interfaces and capsules in Zed correspond to interfaces and classes in Java. They are one of Zed's mechanisms for polymorphism - the ability of code to operate on multiple kinds of data without requiring detailed knowledge of the nature of the data elements it is working with. Capsules also provide object-oriented programming.

Zed capsules do not support multiple inheritance. Using interface and capsule methods in Zed is more explicit than in most other languages. The selection of a method and its call is shown directly in the syntax, and the "self" formal parameter in methods is shown explicitly. However, the "self" parameter is not explicitly passed to method calls, since the correctness of method calls requires that it must be the same value as that from which the method was obtained. Zed does allow some minor additions to how these concepts can be used. The way in which capsules are defined in Zed is also much more explicit than in languages like C++.

Interfaces and capsules defined in Zed are types, and so should be named like other types are named, usually with a trailing "_t".

9.2 Interfaces

9.2.1 Interface Syntax

Syntactically, a global interface declaration, which can only appear at the package level, consists of:

optional visibility specification
'interface'
name for interface
zero or more of 'partial' and 'final'
optional interface 'extends' clause
'{'
one or more method specifications, separated by semicolons
'}'

If the braces and method specifications are not present, then this is an interface predeclaration. A predeclared interface must be fully defined later, and in that definition the name, the visibility specification, the 'partial' and 'final' flags, and any 'extends' clause must be the same as in the predeclaration.

Predeclared interfaces (and capsules) cannot be selected from since their contents are not yet known. However, their nature is known, so variables, fields, parameters, etc. of such types can be used. Also, the assignment compatibility rules based on 'extends' clauses are in effect.

An interface 'extends' clause consists of:

'extends'
path to interface being extended

In many situations, the interface being extended is in the same package as the new one being defined, so the path to it is just its name.

A method specification is just a proc header. These proc headers must have an initial formal parameter of type 'poly' - all actual method calls will pass in the value from which the actual proc was obtained. The method declarations are basically proc types, and so only storage flags 'nonNil' and 'nilOk' are meaningful for formal parameters. The initial 'poly' formal parameter is implicitly 'nonNil'. A 'nonNil' on a result type is relevant.

Every value of an interface type will have attached to it a vector of procs matching the full interface method set. If the interface is 'partial' some of the slots in that vector can contain 'nil'. 'poly' is a tracked type. Capsules which implement an interface will provide implementation procs for the methods of the interface.

All interface values are derived from capsule values, of capsules which implement the interface, either directly or through inheritance.

Here are two example interfaces from a GUI package:

    export interface ValueRange_t {
        proc SetValue(poly vr; uint newValue)void;
        proc SetMin(poly vr; uint newMin)void;
        proc SetMax(poly vr; uint newMax)void;
    };

    export interface Container_t {
        proc HasChild(poly cont; Widget_t wid)bool;
        proc TakeChild(poly cont; Widget_t wid)void;
        proc LetChild(poly cont; Widget_t wid)void;

        /* ChildWantsResize - return 'true' if a resize event will be coming,
           else return 'false'. */
        proc ChildWantsResize(poly cont; Widget_t wid)bool;

        proc ActivateTree(poly cont)void;
        proc ShowAll(poly cont)void;

        /* Things that implement Container_t are presumed to inherit from
           Window_t. */
        proc GetWindow(poly cont)Window_t;
    };

As mentioned in "6.5 Proc Calls", when the proc to be called might not be known until run time, the expression yielding the proc is placed inside braces. This is done with method calls, since the actual method implementation must be found based on the origin of the interface (or capsule) value that is being used. This is how polymorphism is accomplished with interfaces and capsules. The method is chosen using the '->' notation for field selection. So, if "ifcVar" is a variable of some interface type and "Method1" is the name of a method that the interface type contains, then a call to that method is done via:

    {ifcVar->Method1}(... other actual parameters ...)

Note that "ifcVar" is not passed as an explicit parameter to the method - the semantic safety of method calls requires that it not be possible to pass a different interface value to the method call than the method implementation was found from. Zed accomplishes this by passing that value implicitly. This "self" parameter is always the first formal parameter of any proc which implements a method. It is the one which has appeared with 'poly' type in the example interfaces. Because it is declared explicitly, it can have storage flags like 'volatile', and can be named as appropriate, rather than always being "this". [See later - this "self" parameter is implicitly 'nonNil' and not 'var'.]

9.2.3 Using "partial" Interfaces

If an interface is marked as 'partial' then capsules which 'implement' the interface are not required to provide implementing procs for all methods in the interface. If an interface is not marked 'partial', then all implementing capsules must provide implementations for all of the methods. The 'partial' flag only applies to methods which are defined directly in the new interface - methods inherited via 'extends' maintain the status they had when they were defined.

In Zed, it is possible to determine, at run-time, whether or not an actual interface value is from a capsule which implements a given method. In other words, a program, given a value of an 'interface' type, can determine if a given method is present. This is needed because implementing capsules of 'partial' interfaces do not need to provide all methods of the interface. The same syntax of the interface value expression, a '->' and the name of the method, can be used to reference the method implementation as a value of type '* void', which can then be compared against 'nil' to see if the value's capsule has implemented that method. When the syntax is used directly to call a method, special handling in the compiler knows the actual type of the method, and so allows the method call to proceed. If the interface in which the method is defined is not 'partial' then such a '* void' value will never be 'nil' and so is flagged as 'nonNil'. Values of methods defined in 'partial' interfaces are not 'nonNil'.

For example:

    export interface Int1_t partial {
        proc UseValue1(poly int1; uint i; float f)void;
        proc UseValue2(poly int1; string s)uint;
    };
    ...
    Int1_t nonNil int1 := getAnInt1FromSomewhere();
    if int1->UseValue1 ~= nil then
        {int1->UseValue1}(123, 3.14);
    else
        Fmt("Value does not implement UseValue1");
    fi;

[Unfortunately this two-step usage pattern can result in an information message about an implicit 'nil' check on the method call, when the Zed compiler's information level enables such checks. Hopefully the compiler can be improved to do enough value flow analysis to avoid the messages.]
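The dispatch rules above can be modeled in Python. Two rules matter here: the method value may be 'nil' for a 'partial' interface, and the value the method came from is passed implicitly as the "self" parameter. This is a sketch of the semantics only, not of how the Zed compiler represents interface values. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

@dataclass
class InterfaceValue:
    """Hypothetical model of an interface value: the object plus its method table."""
    obj: Any
    methods: Dict[str, Optional[Callable[..., Any]]]

    def method(self, name: str) -> Optional[Callable[..., Any]]:
        # Like 'int1->UseValue1' used as a value: the slot may be None for a 'partial' interface.
        return self.methods[name]

    def call(self, name: str, *args: Any) -> Any:
        # Like '{int1->UseValue1}(...)': the value the method came from is passed implicitly
        # as the first ("self") argument, so the caller cannot pass a different one.
        impl = self.methods[name]
        if impl is None:
            raise AssertionError(f"method {name} is not implemented (nil)")
        return impl(self.obj, *args)

log = []

def use_value1(int1: Any, i: int, f: float) -> None:
    log.append((int1["name"], i, f))

# A capsule implementing the partial Int1_t that provides only UseValue1.
int1 = InterfaceValue({"name": "cap"}, {"UseValue1": use_value1, "UseValue2": None})

if int1.method("UseValue1") is not None:
    int1.call("UseValue1", 123, 3.14)
print(int1.method("UseValue2") is None, log)   # True [('cap', 123, 3.14)]
```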

If an interface is marked as 'final' then it cannot be extended. In reality, this aspect has little use other than as a note to the programmer and to readers.

9.2.4 Using Interface "extends" Clauses

The interfaces seen above are all independent of any other interfaces. However, an interface can be declared to "extend" some other interface. In that case the new interface "inherits" all the methods of the interface that it 'extends', and will then add new ones. A value of an interface type is a valid value of an interface type that the value's type 'extends', either directly or via a chain. This is because all of the extending interfaces will always include all of the methods of the extended interface (if there is a 'partial' present then some of the methods might not actually be present on some interface values). The reverse is not true, however - an interface that has been extended does not contain all of the methods of an interface that 'extends' it, and so is not a valid value of such a type. An 'assign' construct can be used to do a run-time check to go from an interface value to a value of an extending interface.

Building on the above example:

    interface Int2_t extends Int1_t {
        proc UseValue3(poly int2; char ch)void;
        proc UseValue4(poly int2; bool f1, f2)void;
    };
    ...
    Int2_t nonNil int2 := getInt2();
    if int2->UseValue2 ~= nil then
        {int2->UseValue2}("hello");
    else
        Fmt("Value does not implement UseValue2");
    fi;
    {int2->UseValue3}("Z");
    {int2->UseValue4}(true, false);

An interface can only extend one other interface. Multiple interfaces can extend a given smaller interface. Interface inheritance chains can be as long as needed.

There are no constructors for interfaces, nor any other way of directly producing an interface value. Values which can be used as interface values must originate as capsule values, of capsules that implement the interface.

A note on naming: an interface "B" which extends interface "A" will have one or more additional methods. [You get a warning if not.] Thus, "B" has more capabilities than "A", which is why Zed uses the term "extends".

9.3 Capsules

9.3.1 Capsule Basics

Capsules in Zed correspond to classes in other programming languages. A capsule combines three things: the fields of a record; an optional internal interface; and the actual procs that implement that internal interface, any internal interface inherited via a capsule 'extends', and any other (external) interfaces that the capsule 'implements'.

In terms of low-level implementation, objects of a capsule type consist of a dynamically allocated record which contains the capsule data fields, along with one or more references to small structures that contain references to the capsule's actual procs which implement all of the interfaces involved. Since each object of a given capsule type is associated with the same method implementation procs, these small structures of proc references are shared by all such objects.

A simple picture can clarify this arrangement:

    +-------------+  +---------------------------+
    | object A ref==>| v | fld1-A | fld2-A | ... |
    +-------------+  +-v-------------------------+
                       v
                       v  +-------------+
                       ==>| v | v | ... |
                       ^  +-v---v-------+
                       ^    v   v
                       ^    v   ===>method implementation proc #2
                       ^    v
                       ^    v======>method implementation proc #1
                       ^
                       ^
    +-------------+  +-^-------------------------+
    | object B ref==>| ^ | fld1-B | fld2-B | ... |
    +-------------+  +---------------------------+

Each capsule object contains space for all of its data fields, and also contains all needed references to proc reference structures (which are sometimes called "vtables"). Thus, all of the data, and access to the procs that operate on that data, are available within one allocated unit - they are encapsulated.
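The sharing shown in the picture can be modeled in Python. Each object holds its own data fields plus a reference to one method table that is built once per capsule. The class and field names are hypothetical, and the model ignores the additional tables needed for multiple interfaces.

```python
from typing import Any, Callable, Dict, List

class CapsuleObject:
    """Hypothetical model: per-object data fields plus a reference to a shared method table."""
    __slots__ = ("vtable", "fields")

    def __init__(self, vtable: Dict[str, Callable[..., Any]], **fields: Any) -> None:
        self.vtable = vtable      # shared by every object of the same capsule
        self.fields = fields      # private to this object

def describe(self_obj: "CapsuleObject") -> str:
    return f"fld1={self_obj.fields['fld1']}"

# One method table per capsule type, built once.
CAP_VTABLE = {"Describe": describe}

objs: List[CapsuleObject] = [CapsuleObject(CAP_VTABLE, fld1="A"), CapsuleObject(CAP_VTABLE, fld1="B")]
print([o.vtable["Describe"](o) for o in objs])   # ['fld1=A', 'fld1=B']
print(objs[0].vtable is objs[1].vtable)          # True: the table is shared, not copied
```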

In programming languages like Java and C++, the procs within a class are matched to the methods that the class needs to implement strictly by name, from the single namespace within the class. Zed does it a bit differently - the interface that a given implementation proc is intended for is explicitly indicated. Thus, it is possible for a capsule to provide different implementations for methods that have the same name but are in different interfaces. Unlike many other languages, Zed does not use name or operator overloading.

9.3.2 Capsule Syntax

Syntactically, a capsule declaration, which can only appear at the package level, consists of:

optional visibility specification
'capsule'
name for capsule
zero or more of 'public' and 'final'
optional capsule 'extends' clause
optional capsule definition

If a capsule declaration does not have a capsule definition, then it is a pre-declaration of the capsule. In that case, the capsule must be fully defined later and its name, visibility, 'public' and 'final' attributes and any 'extends' clause must be the same.

If a capsule is marked as 'final', then it cannot be extended. This allows the programmer and readers to know that they do not need to be aware of any extending, and that the method procs they have seen in the visible 'extends' list are the only ones which can possibly be invoked via method calls. This, or 'final' on the method proc itself, can also allow the compiler to know what method proc will be used, at compile time. It can then use a faster version of proc calling.

A capsule 'extends' clause consists of:

'extends'
path to capsule being extended

A capsule definition consists of:

optional 'implements' clause
'{'
optional capsule 'record' part
optional capsule 'interface' part
zero or more capsule 'procs' sections (see "9.3.3 Capsule "procs" Sections")
'}'

An 'implements' clause consists of:

'implements'
list of paths to interfaces, separated by commas

A capsule 'record' part consists of:

'record'
'{'
one or more record field declarations, separated by semicolons
'}'
';'

See "4.10 Record Types" for the form and rules of field declarations. The data fields in capsules work the same as non-variant fields in records. Zed does not allow variant capsules - there would be issues with explicit constructors.

A capsule 'interface' part consists of:

'interface'
optional 'partial'
'{'
one or more method specifications, separated by semicolons
'}'
';'

A capsule interface specification is the same as an interface specification outside of a capsule. The difference is that the capsule interface is the basic one that is always associated with that capsule, whereas external interfaces can be implemented by multiple capsules. As with interface methods in external interfaces, the first formal parameter of a method in an internal interface must be of type 'poly', and is implicitly 'nonNil'.

A capsule which does not 'extend' some other capsule must have either a capsule 'record' part or a capsule 'interface' part, otherwise it is incomplete and cannot be used.

If one capsule 'extends' another capsule, then the extending capsule's data will include all of the data fields of the capsule it 'extends', and its internal interface will include all of the methods that the internal interface of the capsule being extended has.

The 'partial' property on a capsule's internal interface affects only the methods declared within that interface. It does not affect methods in external interfaces or methods inherited from a capsule being extended. Similarly, it does not affect methods in a capsule which 'extends' the one being defined.

When one capsule 'extends' another, the one being extended is sometimes called the "base" capsule for the extending one. If there is a chain of these, then the other capsules can be called the "bases" of the later capsules. This name can be seen in the 'baseCall' construct, described later.

If capsule "Internal_t" is not 'public' in its package, but is, in that same package, extended by a capsule "External_t" that is 'public', then the fields of "Internal_t" are not writeable outside of the package, even if the fields added by "External_t" are writeable. However, "External_t" can be constructed in other packages. If the wrapping capsule is not 'public', then it cannot be constructed outside of its package, and none of its fields can be written, but its methods can be used. To hide the methods as well, i.e. to make an entity that is an "Internal_t" but which has no visible fields or methods, export a renaming of "Internal_t" as an anonymous (but still known to be "tracked") type. I.e.

    export type Anonymous_t = Internal_t;

[I have thought about restricting the 'extends' capability of both interfaces and capsules, so that you can only extend one that is in the same package as the extending one. The idea is to minimize the amount of recompilation that is needed when a base capsule or interface is modified. However, I have not yet decided to actually do this. One issue is that it doesn't solve the problem of access to capsule fields, which can still be done from any package that has access to those fields. There could be a whole set of "setter/getter" methods, but that is clumsy for general use.]

Capsule compatibility works the same as interface compatibility. If capsule "Cap2_t" 'extends' capsule "Cap1_t", directly or through an 'extends' chain, then values of type "Cap2_t" are valid as values of type "Cap1_t", since they will contain at least all of the fields of "Cap1_t", and will also contain at least all of the methods of "Cap1_t" and all interfaces it implements. Again, the reverse is not true, both because "Cap1_t" might not contain all of the methods that "Cap2_t" does, and because "Cap1_t" might not contain all of the data fields that "Cap2_t" does.

If capsule "Cap_t" 'implements' interface "Int_t" then values of type "Cap_t", and of any capsules that extend it, are valid values of type "Int_t". This is because "Cap_t" provides all of the needed method procs for "Int_t". This conversion is implicit, as needed. Note that the conversion is subject to a run-time 'nil' check, which fails when a 'nil' capsule value is used as an interface value. [There is a very small run-time cost in this compatibility - essentially a single indirection and field selection is needed to access a small stub structure that references the "vtable" within "Cap_t" for "Int_t", and a copy of the pointer to the main object data.] One place where this check is easy to overlook is a conditional expression, e.g. an 'if' expression:

    InterfaceType_t inft :=
        if <some-condition> then
            nonnil-capsule-value
        else
            nil
        fi;

This example fails a run-time 'nil' check when "<some-condition>" yields 'false'. In that case the 'if' expression yields a 'nil' capsule value, and attempting to get the stub structure from that value fails the 'nil' check. For that specific situation, the simplest solution is:

    InterfaceType_t inft := nil;
    if <some-condition> then
        inft := nonnil-capsule-value;
    fi;
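As a rough illustration of why the 'if' expression fails while the 'if' statement does not, the Python sketch below models an interface value as a small stub holding a per-interface method table plus a copy of the object reference, following the bracketed note above. All names in it (CapsuleValue, InterfaceStub, to_interface, NilCheckError) are hypothetical, and it is an analogy, not Zed's actual run-time representation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


class NilCheckError(RuntimeError):
    """Hypothetical stand-in for a failed Zed run-time 'nil' check."""


@dataclass
class CapsuleValue:
    """Hypothetical capsule object: its data plus, for each interface it
    implements, a table ("vtable") of method procs."""
    data: Dict[str, Any]
    interface_tables: Dict[str, Dict[str, Callable[..., Any]]]


@dataclass
class InterfaceStub:
    """Hypothetical interface value: the vtable for one interface plus a
    copy of the reference to the main object data."""
    vtable: Dict[str, Callable[..., Any]]
    obj: CapsuleValue

    def call(self, name: str, *args: Any) -> Any:
        # One indirection to find the proc, then pass the object as "self".
        return self.vtable[name](self.obj, *args)


def to_interface(cap: Optional[CapsuleValue], iface: str) -> InterfaceStub:
    # Finding the interface's vtable means dereferencing the capsule value,
    # so a nil (None) capsule fails here.
    if cap is None:
        raise NilCheckError("nil capsule value used as an interface value")
    return InterfaceStub(cap.interface_tables[iface], cap)


def via_if_expression(cond: bool, cap: CapsuleValue) -> InterfaceStub:
    # Like the 'if' expression: the branches yield a capsule value (possibly
    # nil), and only then is the result converted to the interface type.
    chosen = cap if cond else None
    return to_interface(chosen, "InterfaceType_t")


def via_if_statement(cond: bool, cap: CapsuleValue) -> Optional[InterfaceStub]:
    # Like the suggested fix: start with a nil interface value and convert
    # only a non-nil capsule value.
    inft: Optional[InterfaceStub] = None
    if cond:
        inft = to_interface(cap, "InterfaceType_t")
    return inft
```

In the sketch, a nil interface value (None) is fine by itself; only converting a nil capsule value to an interface value needs the dereference that fails.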

9.3.3 Capsule "procs" Sections

A capsule 'procs' section consists of:

'procs'
optional name of this capsule, or path to implemented interface
'{'
one or more full proc definitions separated by semicolons
'}'
';'

If a 'procs' section has no name or path, then the procs within it are "utility capsule procs". Such procs have access to the data fields of the capsule, and can call any of the methods of the capsule or interfaces it implements. Utility procs can also call utility procs defined earlier in the capsule. Utility procs like this are cleaner than having a proc that is predeclared before the capsule definition, then defined after the capsule definition. Calling a utility proc is just a regular proc call, not a method call, so it uses the normal proc call syntax.

If the 'procs' section is named by the name of the capsule, then the procs in that section must implement the methods in the capsule's internal interface. The procs are matched to the methods by name. Such a section can also contain a proc whose name is the capsule name; that proc is an explicit capsule constructor, described in "9.3.4 Capsule Constructors".

If a 'procs' section is named by one of the interfaces that the capsule implements, then the procs in that section must implement the methods in that interface.

A capsule can provide an implementation of an interface method even though a capsule that it 'extends' already provides an implementation - the new proc overrides the old one for objects of this new capsule type and any other capsules which extend this one (unless those capsules override the method further). Note that we don't talk about overriding the explicit constructor of a capsule which is being extended - the new capsule is providing an explicit constructor proc for itself. The signature (number and type of formal parameters) of explicit constructors is not constrained other than the initial parameter.

There can be multiple 'procs' sections with the same name or path - the procs in them accumulate. This allows utility procs to be defined close to method procs that use them.

Procs in capsules that implement methods can have property 'final' given to them (by putting 'final' before the proc name, among the other proc kind modifiers). Such a proc cannot be overridden by a capsule which 'extends' the one which includes the 'final' method proc.

A proc which is implementing a method must match that method in terms of the number, types and storage flags of the formal parameters and the result type. Such a proc must have an initial parameter which is of the capsule type - the "self" parameter is always the first parameter in a method proc. This corresponds to the 'poly' parameter in the method. This parameter is implicitly made 'nonNil' (since the method cannot be called unless it is first retrieved from an actual object reference) and not 'var'.

If a capsule does not have an explicit constructor proc, then values of the capsule type are constructed in the same way in which values of record types are constructed. The 'noInit' concept applies similarly. If a capsule type extends another capsule type, then an implicit constructor for the extending capsule type must include values for all of the fields in the extended capsule type, followed by all of the new fields in the extending capsule type.
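Python dataclasses happen to follow a similar ordering rule, which makes them a convenient analogy for the implicit constructor of an extending capsule: inherited fields come first, then the new ones, and a field declared with init=False (playing the role of a 'noInit' field here) is left out of the constructor's parameters. The class and field names below are hypothetical.

```python
import inspect
from dataclasses import dataclass, field, fields


@dataclass
class Cap1:
    """Stands in for an extended capsule with no explicit constructor."""
    c1_n: int
    c1_tag: str
    c1_flag: bool = field(init=False, default=False)  # like a 'noInit' field


@dataclass
class Cap2(Cap1):
    """Stands in for a capsule that 'extends' Cap1 and adds one field."""
    c2_ch: str


# Constructor parameters: inherited fields first, then the new field,
# with the 'noInit'-like field omitted.
params = list(inspect.signature(Cap2).parameters)
# All fields, in layout order, including the one not given to the constructor.
all_fields = [f.name for f in fields(Cap2)]
```

Here Cap2(1, "t", "x") supplies c1_n, c1_tag and c2_ch, while c1_flag takes its default.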

An example of the concepts discussed so far:

    capsule Inner_t {
        record {
            string nonNil in_name;
        };

        interface {
            proc Display(poly in)void;
            proc ToUint(poly in)uint;
        };

        procs Inner_t {
            proc
            Display(Inner_t in)void:
                FmtN("\"", in->in_name, "\"");
            corp;

            proc
            ToUint(Inner_t in)uint:
                Basic/StringHash(in->in_name)
            corp;
        };
    };

    [100] Inner_t Inners;
    uint InnerCount := 0;

    proc
    Append(Inner_t in)void:
        Inners[InnerCount] := in;
        InnerCount := InnerCount + 1;
    corp;

    proc
    Dump(string header)void:
        FmtN(header, ":");
        if InnerCount = 0 then
            Fmt(" empty");
        else
            Fmt();
            for i from 0 upto InnerCount - 1 do
                Inner_t in := Inners[i];
                FmtN("  ", i : 2, "/", {in->ToUint}() % 1000000 :: d0 : 6, ": ");
                {in->Display}();
                Fmt();
            od;
        fi;
    corp;


    uint COUNT = 5;

    capsule Vector_t extends Inner_t {
        record {
            [COUNT] uint vec_data;
        };

        interface {
            proc Sum(poly vec)uint;
        };

        procs {
            proc
            showSum(Vector_t nonNil vec)void:
                uint s := Sum(vec);
                Fmt("The sum is ", s);
            corp;
        };

        procs Vector_t {
            proc
            Display(Vector_t vec)void:
                FmtN("\"", vec->in_name, "\": [");
                for i from 0 upto COUNT - 1 do
                    if i ~= 0 then
                        FmtN(", ");
                    fi;
                    FmtN(vec->vec_data[i]);
                od;
                FmtN("]");
            corp;

            proc
            Sum(Vector_t vec)uint:
                uint sum := 0;
                for i from 0 upto COUNT - 1 do
                    sum := sum + vec->vec_data[i];
                od;
                sum
            corp;

            proc
            ToUint(Vector_t vec)uint:
                Basic/StringHash(vec->in_name) + {vec->Sum}()
            corp;
        };
    };


    capsule Triple_t extends Inner_t {
        record {
            uint trip_left;
            char trip_op;
            uint trip_right;
        };

        procs Triple_t {
            proc
            Display(Triple_t trip)void:
                FmtN("\"", trip->in_name, "\": {", trip->trip_left, " \"",
                     trip->trip_op, "\" ", trip->trip_right, "}");
            corp;
        };
    };


    export proc
    run()void:
        Dump("At start");
        Append(Inner_t("simple #1"));
        Append(Triple_t("triple #1", 123, "+", 345));
        for i from 1 upto 3 do
            Vector_t vec := Vector_t(FmtS("vector #", i));
            for j from 0 upto COUNT - 1 do
                vec->vec_data[j] := i * 100 + j * j;
            od;
            Append(vec);
        od;
        Append(Triple_t("triple #2", 987, "-", 654));
        Append(Inner_t("simple #2"));
        Dump("At end");
    corp;

Output from this example:

    At start: empty
    At end:
       0/490289: "simple #1"
       1/293425: "triple #1": {123 "+" 345}
       2/485699: "vector #1": [100, 101, 104, 109, 116]
       3/486200: "vector #2": [200, 201, 204, 209, 216]
       4/486701: "vector #3": [300, 301, 304, 309, 316]
       5/293426: "triple #2": {987 "-" 654}
       6/490290: "simple #2"

Lower-level capsule "Inner_t" has an interface which defines a "Display" method to print a representation of a value, along with a "ToUint" method to provide a 'uint' summary of a value. Associated with it is code to maintain an array of "Inner_t" values, and to be able to dump out that array.

Capsule "Vector_t" 'extends' "Inner_t" with an array of "COUNT" (5) 'uint's. It adds a method "Sum", which here is called only from "Vector_t"'s "ToUint" method and from the otherwise unused utility proc "showSum". Note that the call to method "Sum" in utility proc "showSum" uses normal proc-call syntax rather than method-call syntax. This causes it to use "static" method selection, which is resolved at compile time, rather than "dynamic" method selection, which is resolved at run time. See "9.3.5 Static and Dynamic Method Selection".

Capsule "Triple_t" 'extends' "Inner_t" with a binary-operator-like set of three fields.

Since both "Vector_t" and "Triple_t" extend "Inner_t", values of those types can be passed to the "Append" proc associated with "Inner_t", and will have their methods used when "Dump" is called.

Proc "run" creates a first "Inner_t", a first "Triple_t", three "Vector_t"'s, a second "Triple_t" and a second "Inner_t". All are passed to "Append", so the array "Inners" will contain 7 objects which are all valid "Inner_t" values. In this way, procs "Append" and "Dump" are operating polymorphically.
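For readers coming from class-based languages, here is a rough Python analogue of the example, showing the same polymorphic use of "display" over a single list. It is a sketch, not a translation of Zed semantics: the to_uint hash is a hypothetical stand-in (it is not Basic/StringHash), so only the display strings and the vector sums correspond to the Zed output.

```python
COUNT = 5


class Inner:
    """Stands in for Inner_t: a name plus display/to_uint methods."""
    def __init__(self, name: str) -> None:
        self.in_name = name

    def display(self) -> str:
        return f'"{self.in_name}"'

    def to_uint(self) -> int:
        # Hypothetical stand-in for Basic/StringHash, not Zed's hash.
        return sum(map(ord, self.in_name))


class Vector(Inner):
    """Stands in for Vector_t: adds an array and a vec_sum method."""
    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.vec_data = [0] * COUNT

    def vec_sum(self) -> int:
        return sum(self.vec_data)

    def display(self) -> str:
        return f'"{self.in_name}": [' + ", ".join(map(str, self.vec_data)) + "]"

    def to_uint(self) -> int:
        return super().to_uint() + self.vec_sum()


class Triple(Inner):
    """Stands in for Triple_t: overrides only display."""
    def __init__(self, name: str, left: int, op: str, right: int) -> None:
        super().__init__(name)
        self.trip_left, self.trip_op, self.trip_right = left, op, right

    def display(self) -> str:
        return f'"{self.in_name}": {{{self.trip_left} "{self.trip_op}" {self.trip_right}}}'


def run() -> list:
    inners = [Inner("simple #1"), Triple("triple #1", 123, "+", 345)]
    for i in range(1, 4):
        vec = Vector(f"vector #{i}")
        for j in range(COUNT):
            vec.vec_data[j] = i * 100 + j * j
        inners.append(vec)
    inners += [Triple("triple #2", 987, "-", 654), Inner("simple #2")]
    # Like "Dump": each element uses its own class's display method.
    return [x.display() for x in inners]
```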

There is a situation which might be confusing to programmers familiar with other languages offering object-oriented features. If capsule "Cap_t" implements one or more interfaces, one of which is "Int_t", then "Cap_t" values must be explicitly converted into values of type "Int_t" before the relevant interface methods can be used. The method-calling syntax, when used with a capsule value, can only access methods in the capsule's internal interface. The requirement of an explicit conversion is Zed's way of avoiding ambiguity over which method among several with the same name is being called. Other programming languages match methods based on the details of their parameters and results, or have other rules for choosing implicitly.

As an example of this, assume capsule "Cap_t" has method "Meth1" in its internal interface. Assume "Cap_t" implements interfaces "Int1_t" and "Int2_t" both of which also contain method "Meth1". Then:

    con cap := Cap_t(...);
    {cap->Meth1}(...);          // calls Cap_t's Meth1

    Int1_t int1 := cap;
    {int1->Meth1}(...);         // calls Cap_t's implementation of Int1_t's Meth1

    Int2_t int2 := cap;
    {int2->Meth1}(...);         // calls Cap_t's implementation of Int2_t's Meth1
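One way to picture why the explicit conversion removes the ambiguity: each interface a capsule implements gets its own method table, and converting to an interface type simply picks which table later calls go through. The Python sketch below models that idea; the per-interface tables and the InterfaceValue class are hypothetical illustrations, not Zed's implementation.

```python
from typing import Any, Callable, Dict


class Cap:
    """Stands in for Cap_t, whose internal interface has Meth1."""
    def __init__(self, n: int) -> None:
        self.n = n

    def meth1(self) -> str:
        return f"Cap_t's Meth1 ({self.n})"


# Hypothetical per-interface tables, as 'procs Int1_t' and 'procs Int2_t'
# sections of Cap_t would supply. Both contain a "Meth1".
INT1_TABLE: Dict[str, Callable[..., Any]] = {
    "Meth1": lambda cap: f"Cap_t's Int1_t Meth1 ({cap.n})",
}
INT2_TABLE: Dict[str, Callable[..., Any]] = {
    "Meth1": lambda cap: f"Cap_t's Int2_t Meth1 ({cap.n})",
}


class InterfaceValue:
    """An interface value: the chosen table plus the object reference.
    Choosing the table at conversion time makes the name unambiguous."""
    def __init__(self, table: Dict[str, Callable[..., Any]], obj: Cap) -> None:
        self.table = table
        self.obj = obj

    def call(self, name: str, *args: Any) -> Any:
        return self.table[name](self.obj, *args)


cap = Cap(1)
int1 = InterfaceValue(INT1_TABLE, cap)   # like: Int1_t int1 := cap;
int2 = InterfaceValue(INT2_TABLE, cap)   # like: Int2_t int2 := cap;
```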

[At one point, Zed had a syntax that enabled the choosing of an interface and calling one of its methods all in one place, without the use of a variable of the interface type. This ended up being more awkward in many cases, so was removed.]

9.3.4 Capsule Constructors

A capsule can have an explicit constructor proc, as mentioned above. The explicit constructor must be declared to yield 'void' (have no result) and must have an initial formal parameter which is of the capsule type. That formal will be forced to be not 'var' (value not modifiable within the constructor proc) and 'nonNil' (a valid tracked value is always passed in). With this kind of constructor, the formal parameters (other than the first) are whatever values the programmer needs in order to construct an object of this capsule type.

If the capsule does not have an explicit constructor, then it will have an implicit constructor. Implicit constructors work exactly like record constructors - their parameters must be the various non-compound, non-'noInit' fields of the capsule, including all inherited fields and 'inline' fields.

If a capsule has an explicit constructor, then any capsule which 'extends' it must also have an explicit constructor (and so on), since the Zed compiler cannot know how to implicitly call an inner explicit constructor from within an outer implicit constructor.

Capsules with explicit constructors cannot have any fields added at that level which are declared as 'nonNil'. This is because there is no such value available to be assigned to those fields before the code of the explicit constructor runs. If the current capsule extends another, then either that inner capsule also has no 'nonNil' fields, or it does not have an explicit constructor, so the new explicit constructor must use the inner capsule's implicit constructor, and pass appropriate 'nonNil' values to it.

If a capsule has an explicit constructor, and also 'extends' another capsule which has an explicit constructor, then the first statement in the body of the new explicit constructor must be a call to the explicit constructor of the capsule which has been extended. In this situation, the first parameter of the new explicit constructor must be passed explicitly as the first parameter of the constructor of the capsule that has been extended. If such an extended capsule does not have an explicit constructor, then the first line of the extending capsule's explicit constructor must be a call to the inner capsule's implicit constructor. In this special situation, that implicit constructor acts like it "returns 'void'", even though it normally "returns a new object". The capsule reference parameter to this explicit outer constructor is passed implicitly to the inner implicit constructor, so it is not written in the call. The inner constructor call serves to provide values for the fields of the capsule which was extended.

As with other constructors and initializers, Zed does not specify the order of evaluation of expressions passed in to capsule constructors, whether implicit or explicit.

The Zed compiler processes capsule definitions linearly. If it encounters a use of a capsule constructor for a capsule that is currently being defined (i.e. inside one of the procs of the capsule), then there cannot later be an explicit constructor for the capsule.

Examples of capsule construction:

    capsule InnerExp_t {
        record {
            uint inex_n;
            bool inex_new;
        };

        procs InnerExp_t {
            proc
            InnerExp_t(InnerExp_t inex; uint n)void:
                inex->inex_n := n;
                inex->inex_new := true;
            corp;
        };
    };

    capsule OuterExp_t extends InnerExp_t {
        record {
            float outex_f;
            float outex_totalSeen;
        };

        procs OuterExp_t {
            proc
            OuterExp_t(OuterExp_t outex; uint n; float f)void:
                InnerExp_t(outex, n);
                outex->outex_f := f;
                outex->outex_totalSeen := 0.0;
            corp;
        };
    };
    eval FmtAdd(OuterExp_t, nil);

    capsule InnerImp_t {
        record {
            uint inim_n;
            string inim_tag;
            bool noInit inim_flag;
        };
    };

    capsule OuterImp_t extends InnerImp_t {
        record {
            char outim_ch;
        };

        procs OuterImp_t {
            proc
            OuterImp_t(OuterImp_t outim; uint n; string tag; char ch)void:
                InnerImp_t(n, tag);
                outim->outim_ch := ch;
            corp;
        };
    };
    eval FmtAdd(OuterImp_t, nil);

    capsule AllImp_t {
        record {
            uint aim_n;
            char aim_ch;
        };
    };
    eval FmtAdd(AllImp_t, nil);

    proc
    test1()void:
        OuterExp_t outex := OuterExp_t(11, 22.33);
        OuterImp_t outim := OuterImp_t(44, "imp", "A");
        AllImp_t aim := AllImp_t(55, "b");
        Fmt(outex);
        Fmt(outim);
        Fmt(aim);
    corp;

Output from running this is:

    OuterExp_t(11, T, 22.33, 0.)
    OuterImp_t(44, "imp", F, "A")
    AllImp_t(55, "b")

Here we have a capsule "InnerExp_t" that has an explicit constructor proc. Besides the capsule reference, it takes only one value, and it initializes field "inex_new" to a default value. Capsule "OuterExp_t" 'extends' "InnerExp_t" and also has an explicit constructor proc, which sets field "outex_totalSeen" to a default value. Note that the first line of that constructor is a call to the constructor of "InnerExp_t", as required.

Capsule "InnerImp_t" has no explicit constructor proc, so it will have an implicit constructor which works like record constructors. It has one field marked 'noInit', so no value for that field is given when using the (implicit) constructor. Capsule "OuterImp_t" extends "InnerImp_t" and has an explicit constructor proc, the first line of which is the special form call to the implicit constructor for "InnerImp_t".

Final capsule "AllImp_t" has no explicit constructor, and so will have an implicit one. It has no 'noInit' or compound fields, so its constructor needs values for all of its fields.

This test uses "FmtAdd" (at compile-time) to add default formatters to the three top-level capsules, so that it can simply use "Fmt" to output values of those types. Proc "test1" creates objects of those three capsule types. It then uses "Fmt" to display the created values. Note from the output shown that the display procs display all fields, including those from the inner capsules and those which are not given to an implicit constructor.
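A loose Python analogy for the two constructor combinations in this example is a subclass __init__ whose first step is super().__init__(...). Two differences are worth noting: Python does not enforce the "must be the first statement" rule, and the object reference (self) is passed implicitly rather than as an explicit first argument. In the sketch, a dataclass stands in for a capsule with only an implicit constructor; the class names simply mirror the Zed example.

```python
from dataclasses import dataclass, field


class InnerExp:
    """Like InnerExp_t: has an explicit constructor."""
    def __init__(self, n: int) -> None:
        self.inex_n = n
        self.inex_new = True              # default set by the constructor


class OuterExp(InnerExp):
    """Like OuterExp_t: its explicit constructor first runs the explicit
    constructor of the capsule it extends."""
    def __init__(self, n: int, f: float) -> None:
        super().__init__(n)               # corresponds to InnerExp_t(outex, n)
        self.outex_f = f
        self.outex_totalSeen = 0.0


@dataclass
class InnerImp:
    """Like InnerImp_t: no explicit constructor, so the generated one takes
    the initializable fields in order."""
    inim_n: int
    inim_tag: str
    inim_flag: bool = field(init=False, default=False)  # like 'noInit'


class OuterImp(InnerImp):
    """Like OuterImp_t: an explicit constructor whose first step calls the
    implicit constructor of the extended capsule."""
    def __init__(self, n: int, tag: str, ch: str) -> None:
        super().__init__(n, tag)          # corresponds to InnerImp_t(n, tag)
        self.outim_ch = ch
```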

A capsule value of a capsule type with an explicit constructor is considered to be fully initialized only when that explicit constructor completes. If references to fields of the value are possible before that point, e.g. from another thread, the values seen, and any effect on the execution of the constructor, are not defined by the Zed language. The Zed implementation will do whatever is necessary to protect its own integrity, but need not do more than the minimum needed.

9.3.5 Static and Dynamic Method Selection

A situation which can arise in object oriented programming is one where a capsule needs to override a method that exists in a capsule which it is extending, but also wants to use the functionality provided by the method that it is overriding. If the programmer has control over the package being extended, they can provide another way to get at that functionality, so that overriding the method doesn't make the functionality inaccessible. A cleaner technique is desirable, however. What is needed is a way to say "I want the version of this method from the capsule that I'm extending". The way to do this in Zed is with 'baseCall'. 'baseCall' can be used as a proc name in a proc call, within a proc which implements a method, and it will refer to the implementing proc of the same name available in the capsule that the current one 'extends'. The call is syntactically like a normal proc call, not like a method call. The actual proc need not be in the capsule being extended - it can be further back in the 'extends' chain.

The reverse situation is different. If a capsule can be extended, which version of a method does one of its method procs get when it calls another method? Will it get its own version of the method, or some proc defined in a capsule that extends it (possibly in another package)? In Zed, this is controlled by the syntax used for the call. If the indirect proc call syntax (with braces) is used, then the method is looked up at run-time, and the method for the actual type of the capsule object is used. If normal proc call syntax is used, then the method called is the one defined in this capsule, or, if none is defined here, the one defined in the capsule that this one extends, following the 'extends' chain as needed.

Both 'baseCall' and the standard proc call form of calling methods are only valid inside procs inside of capsules, i.e. in method procs and capsule utility procs.

This example illustrates these rules:

    capsule Lower_t {
        record {
            uint low_n;
        };

        interface {
            proc display(poly low)string;
            proc show(poly low)string;
        };

        procs Lower_t {
            proc
            display(Lower_t low)string:
                FmtS("Lower_t/display, n = ", low->low_n)
            corp;

            proc
            show(Lower_t low)string:
                FmtS("Lower_t/show, n = ", low->low_n)
            corp;
        };
    };

    capsule Middle_t extends Lower_t {
        interface {
            proc print(poly mid)void;
        };

        procs Middle_t {
            proc
            display(Middle_t mid)string:
                "Middle_t/display, " + baseCall(mid)
            corp;

            proc
            print(Middle_t mid)void:
                Fmt("Middle_t/print #1: ", {mid->display}());
                Fmt("Middle_t/print #2: ", display(mid));
                Fmt("Middle_t/print #3: ", {mid->show}());
                Fmt("Middle_t/print #4: ", show(mid));
            corp;
        };
    };

    capsule Upper_t extends Middle_t {
        record {
            bool up_flag;
        };

        procs Upper_t {
            proc
            display(Upper_t up)string:
                "Upper_t/display, " + baseCall(up)
            corp;

            proc
            show(Upper_t up)string:
                "Upper_t/show, " + baseCall(up)
            corp;
        };
    };

    export proc
    main()void:
        Lower_t low := Lower_t(123);
        Fmt({low->display}());
        Middle_t mid := Middle_t(456);
        {mid->print}();
        Upper_t up := Upper_t(789, true);
        {up->print}();
    corp;

Output from this example is:

    Lower_t/display, n = 123
    Middle_t/print #1: Middle_t/display, Lower_t/display, n = 456
    Middle_t/print #2: Middle_t/display, Lower_t/display, n = 456
    Middle_t/print #3: Lower_t/show, n = 456
    Middle_t/print #4: Lower_t/show, n = 456
    Middle_t/print #1: Upper_t/display, Middle_t/display, Lower_t/display, n = 789
    Middle_t/print #2: Middle_t/display, Lower_t/display, n = 789
    Middle_t/print #3: Upper_t/show, Lower_t/show, n = 789
    Middle_t/print #4: Lower_t/show, n = 789

Capsule "Lower_t" implements "display" and "show" methods that show the "low_n" value from the passed object. Capsule "Middle_t" does not provide a "show" method, so calls to it for a "Middle_t" object will use "Lower_t"'s "show". "Middle_t"'s "display" uses 'baseCall' to access "Lower_t"'s "display", appending that to its own description string. Capsule "Upper_t" provides both "display" and "show", so dynamic method lookup on objects of type "Upper_t" will find those implementations.

The demonstration is in proc "Middle_t/print". Its "#1" output comes from the object's actual "display" method. Since all three capsules provide a "display" method, the output from this line identifies the object's capsule. There are only two calls to "print" in "main", since "Lower_t" does not implement "print". Line "#2" calls "display" using the normal proc call syntax, so it gets the "display" that is active for "Middle_t", which is the one defined in "Middle_t". That is seen in both "#2" output lines. Line "#3" uses the "show" method from dynamic lookup on the object. But only "Lower_t" and "Upper_t" implement "show", so the "#3" output lines indicate they come from "Lower_t" and "Upper_t", even though the first of them is for a "Middle_t" value. Line "#4" uses normal proc call syntax for "show", so it gets the "show" that is active for "Middle_t". Since "Middle_t" does not provide a "show", the one from "Lower_t" is used.
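For comparison, Python expresses the same three kinds of call with different spellings, and the sketch below reproduces the output of the Zed example under that mapping: self.display() is dynamic selection (like {mid->display}()), Middle.display(self) is selection relative to a fixed class (like display(mid) inside "Middle_t"), and super().display() is close to 'baseCall'. This is only an analogy. Python's super() follows the method resolution order, which coincides with the 'extends' chain only for single inheritance as used here.

```python
class Lower:
    def __init__(self, n: int) -> None:
        self.low_n = n

    def display(self) -> str:
        return f"Lower_t/display, n = {self.low_n}"

    def show(self) -> str:
        return f"Lower_t/show, n = {self.low_n}"


class Middle(Lower):
    def display(self) -> str:
        # Like baseCall(mid): the implementation further back in the chain.
        return "Middle_t/display, " + super().display()

    def print_lines(self) -> list:
        return [
            # Dynamic selection: uses the object's actual class.
            "Middle_t/print #1: " + self.display(),
            # Static selection: the "display" active for Middle.
            "Middle_t/print #2: " + Middle.display(self),
            "Middle_t/print #3: " + self.show(),
            # Middle defines no "show", so lookup continues to Lower's.
            "Middle_t/print #4: " + Middle.show(self),
        ]


class Upper(Middle):
    def __init__(self, n: int, flag: bool) -> None:
        super().__init__(n)
        self.up_flag = flag

    def display(self) -> str:
        return "Upper_t/display, " + super().display()

    def show(self) -> str:
        # Middle has no "show", so this reaches Lower's.
        return "Upper_t/show, " + super().show()


def main() -> list:
    out = [Lower(123).display()]
    out += Middle(456).print_lines()
    out += Upper(789, True).print_lines()
    return out
```

The nine strings returned by main() match the nine output lines of the Zed example. The difference is that Zed chooses the kind of selection by call syntax (braces or not), while Python chooses it by how the method is named at the call site.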

A note on naming: a capsule "B" which extends capsule "A" will have one or more additional record fields or methods in its internal interface. Thus, "B" has more capabilities than "A", and so Zed says that "B" "extends" "A", rather than using the forms "B inherits from A", "A is the parent of B", or "B is a child of A". Another aspect of the reasoning here is that "B" can be larger than "A" in terms of its in-memory size, so having it be the "child" of "A" is peculiar.

10 Generics

10.1 Introduction

Interfaces and capsules allow polymorphic code to be used with multiple data types. With capsules, however, a relationship must exist among the types involved for the polymorphism to work. With both capsules and interfaces, only one thing can vary - the capsule or interface type itself. Sometimes other requirements on code and data structures make the interface and capsule techniques inappropriate.

As an example, consider simple linked lists. If the "pointer to next element" fields are at the beginning of the linked list nodes, then the operations required to traverse lists, and to insert and delete elements relative to other elements, are identical regardless of what other data is in the linked list nodes. The concept of generic code is something that programming languages have used to handle this situation. A simple linked list generic is shown below.

Another use for generics is sized containers. In that situation, the generic will have a 'uint' parameter which says how large the container is. Within the generic, the value is not yet known - it is supplied when the generic is instantiated. With such a generic, it is likely that the code in the generic must be instantiated (copied, with modifications), since the size will be a constant within the instantiated version.

Generics with 'uint' sizing parameters can also be used to create code dealing with arrays. Code dealing with matrixes can obtain the sizes involved using 'getBound' and so can handle different sizes of matrixes. That isn't possible with arrays, since the bounds of an array are part of its type, and passing '@' of an array to a proc requires that the bounds all match. This can be handled using a generic which accepts array element types along with array bounds. The "bubblesort" example below ("10.6 Generic Type Parameter Interfaces") shows this. Again, most code in the generic must be instantiated, since the sizes appear as constants within that code.

[One might ask why someone would bother with the complexity of a generic for something like array multiplication when a proc taking matrix parameters will handle all sizes of matrixes. One answer is that in some areas of computing, the extra overhead involved in allocating and indexing within a matrix can be significant. Also, because the array bounds are constants within the instantiated code, it might be possible for the compiler to optimize the array indexing.]
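The idea of a size that becomes a constant inside each instance can be loosely imitated in Python with a factory that closes over the size. The sketch below is a hypothetical illustration only: instantiate_bubblesort is not a Zed facility, and Python gains none of the compile-time indexing benefits mentioned above. It shows only that each "instance" is fixed to one length and rejects others, much as array bounds are part of an array's type.

```python
from typing import Callable, List


def instantiate_bubblesort(n: int) -> Callable[[List[int]], None]:
    """Hypothetical analogue of instantiating a generic with a 'uint' size
    parameter: n is fixed when the "instance" is created."""
    def bubblesort(a: List[int]) -> None:
        if len(a) != n:
            raise ValueError(f"expected {n} elements, got {len(a)}")
        # n is a constant within this "instance".
        for last in range(n - 1, 0, -1):
            for i in range(last):
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]
    return bubblesort


sort5 = instantiate_bubblesort(5)
data = [5, 3, 1, 4, 2]
sort5(data)   # sorts in place
```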

Here follows a very simple linked list 'generic' in Zed. Note that the Zed libraries include a much more capable set of linked list generics - the one here is used to illustrate various 'generic' concepts.

    generic LinkedList(type dataType) {
        export record ListNode_t {
            ListNode_t ln_next;
            dataType ln_this;
        };

        export proc
        ListNext(ListNode_t nonNil ln)ListNode_t:
            ln->ln_next
        corp;

        export proc
        ListInsert(@ ListNode_t aHead; dataType data)void:
            aHead@ := ListNode_t(aHead@, data);
        corp;
    };

"LinkedList" is the name of the generic. "dataType" is the name of this generic's single generic type parameter. In this example, only simple types (like 'uint', 'string', record and capsule references, etc.) can be used as that generic type parameter. Proc "ListNext" simply moves on to the next element in a list, thus hiding the use of the "ln_next" field. Proc "ListInsert" is passed the '@' of a list node reference, and inserts a new element before that element. It can be used to insert at the head of the list as well.

Note that the type and the two procs in the generic are marked 'export'. This does not mean that they are exported universally from the package that contains the generic. Instead, it means that they are exported from the generic that contains them. If the generic is exported from its containing package, then everything that is exported from the generic is similarly exported from the package. Large generics can contain internal procs and types that need not be exported.

The procs and types in generics cannot be used directly outside of the generic. The generic must first be "instantiated", by supplying types or values for all of the generic's parameters. This is done with the 'instance' construct. An example instance of the above generic is:

    instance StringList = LinkedList(string);

With that instance created, the following procs can be defined:

    proc
    printStringList(StringList.ListNode_t head)void:
        Fmt("List contains:");
        for ln from head then StringList.ListNext(ln) do
            Fmt("  \"", ln->ln_this, "\"");
        od;
    corp;

    export proc
    test1()void:
        StringList.ListNode_t strList := nil;
        printStringList(strList);
        StringList.ListInsert(@strList, "world");
        StringList.ListInsert(@strList, "there");
        StringList.ListInsert(@strList, "hello");
        printStringList(strList);
    corp;

This simple code provides a proc to scan down a list of strings, printing the string values; sets up a short list of strings; and displays it. The types and procs used are selected using the '.' separator from the instance name.
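For comparison, here is a rough Python counterpart of the "LinkedList" generic using typing.Generic. Since Python has no '@' references, the hypothetical list_insert returns the new head for the caller to store, instead of updating the caller's variable in place. Unlike a Zed 'instance', subscripting such as ListNode[str] creates no separate copy of the code, because Python's type parameters are not used at run time. The sketch also exercises the "VecList" use shown later in this section.

```python
from dataclasses import dataclass
from typing import Generic, Iterator, Optional, TypeVar

T = TypeVar("T")  # plays the role of the generic parameter "dataType"


@dataclass
class ListNode(Generic[T]):
    ln_next: Optional["ListNode[T]"]
    ln_this: T


def list_next(ln: ListNode[T]) -> Optional[ListNode[T]]:
    return ln.ln_next


def list_insert(head: Optional[ListNode[T]], data: T) -> ListNode[T]:
    # Inserts before the current head; the caller stores the result.
    return ListNode(head, data)


def values(head: Optional[ListNode[T]]) -> Iterator[T]:
    ln = head
    while ln is not None:
        yield ln.ln_this
        ln = list_next(ln)


str_list: Optional[ListNode[str]] = None
for s in ["world", "there", "hello"]:
    str_list = list_insert(str_list, s)

vec_list: Optional[ListNode[list]] = None
vec_list = list_insert(vec_list, [float(i * i) for i in range(1, 11)])
vec_list = list_insert(vec_list, [float(i * i) for i in range(1, 6)])
```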

It is possible to provide a shorter name for a type from an instance, as in:

    type StringNode_t = StringList.ListNode_t;

Normally a type definition like this would produce a new named type that is a rename of the provided type, and so is not exactly equivalent, although the two would be assignment compatible both ways. However, Zed makes an exception in this situation - the new name for the type is exactly equivalent to the longer form. Any further renaming does not have this special meaning. In Zed, this special situation is called "aliasing" of an instantiated type.

Generics wouldn't be useful if there could be only one 'instance' of them. The following re-uses "LinkedList" for a linked list of vectors of floating point values:

    instance VecList = LinkedList([] float);
    type VecNode_t = VecList.ListNode_t;

    proc
    createVec(uint n)nonNil [] float:
        [] float nonNil vec := matrix([n] float);
        for i from 1 upto n do
            vec[i - 1] := flt(i * i);
        od;
        vec
    corp;

    proc
    showSum(VecNode_t nonNil vn)void:
        float sum := 0.0;
        for i from 0 upto getBound(vn->ln_this) - 1 do
            sum := sum + vn->ln_this[i];
        od;
        Fmt("Sum = ", sum);
    corp;

    export proc
    test2()void:
        VecNode_t vecList := nil;
        VecList.ListInsert(@vecList, createVec(10));
        VecList.ListInsert(@vecList, createVec(5));
        for vn from vecList then VecList.ListNext(vn) do
            showSum(vn);
        od;
    corp;

This part of the example shows using a short-form name for the new instance's node type. Unfortunately, there is no handy way to provide a shorter name for a proc, other than writing a short wrapper proc. That alternative might be appropriate if there are a lot of uses of the proc.

As mentioned above, renames of instance types are called "aliases" of those types. If you think of a rename of a type as "pointing to" the type, then those renames lose the special properties of the type they are "pointing to". In an alias of a type from an instance, however, both the type as directly selected from the instance and the type that aliases it, can be thought of as "pointing to" the actual instantiated type.

For most uses, this distinction does not matter and the alias is simply a convenience. However, when attaching "exports" to named types, it can matter. When the alias is made, the new aliasing type is made to share the type exports table of the aliased type. If that table does not yet exist, there is nothing to share, and if type exports are subsequently added, the two names for the type will get separate exports tables. If, however, something is added to the type exports table before an alias is made, then the two will share the single exports table.
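The timing rule can be modelled in a few lines of Python. In this hypothetical sketch, not the compiler's actual data structures, a named type creates its exports table lazily, and making an alias copies only the reference to whatever table exists at that moment.

```python
from typing import Dict, Optional


class TypeName:
    """Hypothetical named type with a lazily created type exports table."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.exports: Optional[Dict[str, str]] = None

    def add_export(self, key: str, value: str) -> None:
        if self.exports is None:
            self.exports = {}          # table created on first export
        self.exports[key] = value


def make_alias(name: str, target: TypeName) -> TypeName:
    alias = TypeName(name)
    alias.exports = target.exports     # shares the table only if it exists
    return alias


# Alias made before any exports: the two names end up with separate tables.
a = TypeName("StringList.ListNode_t")
early = make_alias("StringNode_t", a)
a.add_export("fmt", "f1")

# Export added before the alias: both names share one table.
b = TypeName("VecList.ListNode_t")
b.add_export("fmt", "f2")
late = make_alias("VecNode_t", b)
late.add_export("cmp", "c2")
```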

Many languages that provide a facility similar to Zed's generics implement it by essentially always copying the types and procs when the generic is instantiated. Such implementations can be misused, leading to "code bloat". In Zed, procs are not always copied - a single binary version of a generic proc can be created and then used directly by some instances. The conditions and limitations for this are discussed below.

The example above showed only a single type parameter to a generic, but generics can have multiple type parameters and 'uint' parameters.

10.2 Syntax

The syntax of a generic definition, which can only occur at the package level, is:

optional visibility specification
'generic'
name for generic
'('
one or more generic parameter sets, separated by semicolons
')'
'{'
one or more generic elements, separated by semicolons
'}'

Each generic parameter set consists of:

generic parameter header
one or more new generic parameter names, separated by commas

Each generic parameter header consists of one of:

'uint'
'type'
'any' 'type'
'@' 'type'

The significance of the different forms is described below.

Each generic element, other than instances and interface specifications, can be preceded by 'export' to indicate that it is available for use outside of the generic. Without the 'export', the element is private to the generic, and can only be used by types and code within the generic. Generic elements can be:

an interface specification for a generic type parameter
a proc definition, with an optional forced proc type specification and proc kind indicators
a 'struct' declaration or predeclaration
a 'record' declaration or predeclaration
a 'union' declaration or predeclaration
a 'type' declaration
an 'instance' declaration

[It could be useful to allow other kinds of types (e.g. enum, oneof) to be declared inside generics, but this is currently not supported. One reason for this is that there is nothing generic about such types - they can make no use of the parameters to the generic, and they never need to be instantiated, so there is little gained semantically by declaring them inside the generic. For enum and oneof types, using them outside the generic would be clumsy, since an <instance>.<name> form would be needed. Their semantics could also be unclear, since if they are not instantiated, then they are equivalent across instances. Another alternative would be to allow the names to be referenced as <generic>.<name>, but, although semantically a bit clearer, it is still just as verbose.]

See "10.6 Generic Type Parameter Interfaces" for information on the syntax and use of interface specifications for generic type parameters.

Note that regular interfaces and capsules cannot be declared inside a generic, and a generic cannot be declared inside an interface or capsule. Instances created inside generics are useful when one generic uses another as part of its implementation. For example, a generic that implements a higher level container might want to use a "Lists" instance as its underlying representation.

The syntax of an instance definition, which can occur at the package level or inside a generic, is:

optional visibility specification
'instance'
name for instance
'='
path to generic
'('
list of type and 'uint' constant expressions, separated by commas
')'
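As a sketch of both forms of syntax, the hypothetical generic "Holder" below uses each of the four parameter headers. It is then instantiated with a 'uint' constant expression and three types. All names here are invented for illustration. The choice of types follows the rules in the next section: a "plain" parameter takes a single-value type, an 'any' 'type' parameter takes a tracked type such as a record, and an '@' 'type' parameter can take a struct type.

    export generic Holder(uint N; type plainT; any type trackedT; @ type anyT) {
        export struct Holder_t {
            [N] plainT h_plain;     /* a 'uint' parameter as an array bound */
            trackedT h_tracked;
        };

        /* '@' 'type' values are normally handled through '@'. */
        export proc
        Copy(@ anyT aDst; @ ro anyT aSrc)void:
            aDst@ := aSrc@;
        corp;
    };

    record Node_t {
        uint n_value;
    };

    struct Point_t {
        float pt_x, pt_y;
    };

    uint COUNT = 4;

    instance MyHolder = Holder(COUNT * 2, float, Node_t, Point_t);
    type MyHolder_t = MyHolder.Holder_t;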

10.3 Rules for Generics

Procs inside generics must be regular procs, not compile-time procs of any kind. The reason for this is that compile-time procs can add code into procs that they are invoked from, and we cannot allow them to add uses of generic types, since that results in use of those generic types outside of the generic within which they are valid. [%%% No, it doesn't. Even if a proc in a generic is compile-time, when it runs it will always be with respect to some instance (no, it could just be running as part of compiling some generic proc), so I don't currently see the need for this restriction. I need to think about it some more. Perhaps there just needs to be a rule that such procs cannot be exported from the generic.]

Generics act somewhat like packages with regard to any 'struct', 'union' and 'record' types defined within them. If such a record type is not marked 'public', then it can only be constructed by code within the generic. If fields of any such type are marked 'private', then they can only be accessed by code within the generic. And if fields are marked 'ro', then they can only be modified by code within the generic. Similarly, if a matrix type is defined and named inside the generic, and its elements are marked 'private', then those elements can only be modified by code inside the generic. If such a type has not been named, then the usual restriction applies: the matrix elements are writeable by any code in the containing package. [This could be fixed by adding "md_containingGeneric" and "md_containingInstantiation" to matrix descriptors (they currently have md_containingPackage), but is it worth it?]

'uint' parameters to generics can be used to create a generic which implements a container of a static size determined by such parameters. Those containers could then be of array types instead of dynamically allocated matrix types. Alternatively, the semantics of the context might require a strict limit on the number of contained elements, and such a generic automatically enforces that.

Another use of generics with 'uint' parameters is for array operations. A matrix multiply proc, using dynamically allocated matrixes, must check the bounds of its arguments and result at run-time, to make sure they are compatible for the multiplication. A generic which is instantiated with the bounds of its array parameters has the bounds checked at compile time, when a call to the generic multiply proc is compiled. The procs in such a generic will see those bounds as constants, and thus the compiler can create more efficient machine code for them. There are also situations where the dynamic allocation associated with the use of matrixes is not wanted - using statically allocated arrays then allows similar operations, and generics allow shared source code to be used.

As has been described before, 'uint' generic parameters are constants within the instantiation of the generic. Thus, they can be used as the bound for array types. Constant expressions involving them can be used when instantiating an "inner" generic within an "outer" generic. But, constant expressions involving 'uint' generic parameters *cannot* be used as array bounds.
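The following sketch combines the points above. It shows a hypothetical fixed-capacity stack generic, "FixedStack". Its 'uint' parameter "N" sets the size of a statically allocated array and also enforces a hard limit on the number of elements. Only the bare parameter "N" is used as an array bound. The commented-out field shows the form that would be rejected. The generic, proc and variable names are invented for this example.

    export generic FixedStack(@ type elem; uint N) {
        export struct Stack_t {
            [N] elem st_items;          /* allowed: "N" alone as a bound */
            /* [N + 1] elem st_spare;      rejected: an expression as a bound */
            uint st_count;
        };

        export proc
        Clear(@ Stack_t aS)void:
            aS@.st_count := 0;
        corp;

        /* Returns false, storing nothing, when the stack is full. */
        export proc
        Push(@ Stack_t aS; @ ro elem aV)bool:
            if aS@.st_count = N then
                false
            else
                aS@.st_items[aS@.st_count] := aV@;
                aS@.st_count := aS@.st_count + 1;
                true
            fi
        corp;

        /* Returns false when the stack is empty. */
        export proc
        Pop(@ Stack_t aS; @ elem aV)bool:
            if aS@.st_count = 0 then
                false
            else
                aS@.st_count := aS@.st_count - 1;
                aV@ := aS@.st_items[aS@.st_count];
                true
            fi
        corp;
    };

    uint DEPTH = 8;

    instance FloatStack = FixedStack(float, DEPTH);
    type FloatStack_t = FloatStack.Stack_t;

    FloatStack_t Stack;

    proc
    testStack()void:
        float value := 1.5;
        FloatStack.Clear(@Stack);
        if not FloatStack.Push(@Stack, @value) then
            Fmt("stack is full");
        fi;
    corp;

The capacity check in "Push" is the run-time part of the limit. The storage itself is a fixed [DEPTH] array inside "Stack", so no dynamic allocation is involved.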

[Technically, 'uint' generic parameters are considered to be constant during instantiation, but not during the definition of the contents of the generic. (Internal proc "IsUintConstantExpr".) Their use as array bounds is an explicit special case. I believe it would be possible to make them more generally be constants, but it would require work and perhaps quite a bit of code. It would require a treewalk to determine if using 'getBound' on such a bound should trigger the need to instantiate the containing proc. Similar issues would arise when comparing array types for assignment compatibility. (Internal proc "BoundsEquiv".) My current view is that this is not actually needed.]

If a generic type parameter has no 'any' or '@' in front of it, then it is called a "plain" generic type parameter. When the generic is instantiated, any single-value type can be used to instantiate such a parameter, so 'array', 'struct' and 'union' types are not allowed. Within the generic, the single-value nature of the type parameter is known, and so that generic type parameter can be used as a proc parameter type and a proc result type. Record fields of the type require initializers in record constructors, unless the field definition includes a 'noInit' storage flag.

Because the types available for "plain" generic type parameters include 'uint', 'float', 'bits8', 'string', etc., procs involving those parameters must be instantiated, since the generated machine code will likely have to use different machine registers and instructions to deal with the values. Currently, the Zed compiler will instantiate all procs within a 'generic' if that generic has one or more "plain" generic type parameters. [This is likely overkill, but work and care would be needed to avoid it.]
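A small sketch of a "plain" type parameter follows. The hypothetical generic "Choice" uses its parameter as a proc parameter type, as a result type, and in an equality comparison. Under the rule just described, "UintChoice" and "FloatChoice" would each get their own compiled copies of both procs. The names are invented for this illustration.

    export generic Choice(type T) {
        /* A "plain" parameter can be a proc parameter and result type. */
        export proc
        Choose(bool useFirst; T a, b)T:
            if useFirst then a else b fi
        corp;

        /* Equality comparison is allowed for "plain" parameters. */
        export proc
        Same(T a, b)bool:
            a = b
        corp;
    };

    instance UintChoice = Choice(uint);
    instance FloatChoice = Choice(float);

    proc
    testChoice()void:
        Fmt(UintChoice.Choose(false, 3, 4));
        if FloatChoice.Same(1.0, 2.0) then
            Fmt("same");
        fi;
    corp;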

A generic type parameter with 'any' in front of the 'type' can accept any type that is compatible with 'any', i.e. any tracked type. Such a type parameter cannot be instantiated with, say, 'uint', or with a struct, array or union type. Depending on what its code does, a generic proc that uses 'any' 'type' generic parameters might not need to be instantiated. See "10.9 When Generic Procs Must be Instantiated".

The Zed compiler considers 'any' 'type' generic parameters to be tracked types. This means that 'assign' tests for 'nil' and for run-time type checking can be used with them.

If a generic type parameter has '@' in front of the 'type', then it can accept nearly any type in its instantiation. The main types not allowed are '@' types, since that results in a double '@' which is only permissible in certain circumstances. Although it is possible to use such a type directly within generic procs, e.g. declaring local variables, assigning values, etc., it is expected that most uses will put '@' in front of the values. '@' 'type' generic parameters cannot be used as proc parameters or results, since it is not known whether they are multi-valued or not. Essentially, the Zed compiler treats such values as multi-valued when determining what can be done with them.

With this kind of type parameter it is again possible that procs within the generic need not be instantiated. As an example, the "Lists" standard library package uses '@' 'type' generic parameters, and none of its procs need to be instantiated. In a large project with many linked lists of different types in use, this can result in considerable code savings.
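As a sketch of an '@' 'type' parameter, the hypothetical generic "Swapper" below exchanges two values. The values are passed as '@' of the type, since the type itself cannot be a proc parameter type. It can be instantiated with a single-value type such as 'uint' and also with an array type, which a "plain" parameter would not allow. "Vec3_t", the instance names and the test proc are invented for the example.

    export generic Swapper(@ type T) {
        /* Only '@' of the parameter type appears in the signature. */
        export proc
        Swap(@ T aX; @ T aY)void:
            con temp := aX@;
            aX@ := aY@;
            aY@ := temp;
        corp;
    };

    type Vec3_t = [3] float;

    instance UintSwapper = Swapper(uint);
    instance VecSwapper = Swapper(Vec3_t);

    Vec3_t V1;
    Vec3_t V2;

    proc
    testSwap()void:
        uint x := 1;
        uint y := 2;
        UintSwapper.Swap(@x, @y);
        VecSwapper.Swap(@V1, @V2);
    corp;

Whether a proc like "Swap" can be shared between instances or must be instantiated for each one depends on the rules in "10.9 When Generic Procs Must be Instantiated".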

Not many operations are available for values of generic type parameters. This is because very little is known about the nature of the actual types that will instantiate the generic. All generic type parameters allow assignment. "Plain" type and 'any' 'type' parameters allow equality comparison.

A type can be given a new name inside a generic. When this is done, the effect is the same as similar renaming elsewhere - a new type is created which is assignment compatible with the original, but not with any other renames of that original. You cannot rename a type from outside of the generic inside the generic.

Zed does not currently allow record types with variant parts to be defined inside generics. The problems are mostly implementation issues, relating to how to handle the variant selector type and the variant selector tags. If this feature becomes a requirement it could be added.

[There is nothing about the variant tags or selector type that depends on the generic parameters - they are just like an 'enum' type. When variant records are declared outside of a generic, those names are defined in the containing package, with the same visibility specification as the overall record type. If a variant record defined inside a generic is declared as 'export', then defining the names in the containing package could make sense - it allows the names to be referenced without having to use the dot notation with an instance. But, if the variant record type is not 'export', then the names must be restricted to within the generic. There is currently no mechanism to do that, nor are there mechanisms to represent references to such names. Also, the tags and the type name, when used outside of the generic, are uses of an uninstantiated generic type outside of the generic. Because of that, special checks would be needed in various places in the Zed compiler to allow those uses. See the aside above about enum, etc. types inside generics - the issues are basically the same.]

10.4 A "Lists" Example

The following small piece of example code is from the "Lists" package:

    export generic DList(@ type gen) {
        export record DList_t {
            DList_t noInit ro dl_next;
            DList_t noInit ro dl_prev;
            gen inline dl_this;
        };

        export struct DHead_t {
            DList_t noInit ro dlh_head;
            DList_t noInit ro dlh_tail;
        };
        ...

Here the single generic type parameter, "gen", is marked as '@'. That means that it can be instantiated with nearly any type.

The linked list pointers are marked as 'ro'. That means they can only be modified by code inside the generic "DList". The same applies to the fields of the "DHead_t" list head structure. This choice means that the generic retains complete control over the structure of the list - external callers can examine the various references, but cannot change them. This allows the generic code to know that all lists it deals with are correctly formed.

The linked list pointers are also marked as 'noInit'. This means that when external code creates a list node it does not supply values for the list pointers. This is consistent with the generic maintaining complete control of those pointers.

Here is a short example using the above generic:

    struct Pair_t {
        uint p_left, p_right;
    };

    instance PairDList = Lists/DList(Pair_t);
    type PairDListHead_t = PairDList.DHead_t;
    type PairDList_t = PairDList.DList_t;

    PairDListHead_t Pdlh;

    proc
    test1()void:
        PairDList.DLInit(@Pdlh);
        PairDList.DLInsertHead(@Pdlh, PairDList_t(1, 100));
        PairDList.DLInsertTail(@Pdlh, PairDList_t(2, 200));
        ...
    corp;

We instantiate "DList" with a small structure, and define local names for the two types that "DList" exports.

In proc "test1", note that the constructors do not include values for the list next and prev pointers - only for the fields of "Pair_t". This is as described above.

10.5 A Generic Array Multiplication Example

The example here has 3 'uint' parameters to the generic. They control the bounds of the arrays which can be worked with, and will be set when the generic is instantiated. This generic contains an array multiplication proc:

    export generic ArrayMul(uint L, M, N) {
        export proc
        Mul(@ [L, N] float aDst; @ [L, M] float aA; @ [M, N] float aB)void:
            for i from 0 upto L - 1 do
                for j from 0 upto N - 1 do
                    float sum := 0.0;
                    for k from 0 upto M - 1 do
                        sum := sum + aA@[i, k] * aB@[k, j];
                    od;
                    aDst@[i, j] := sum;
                od;
            od;
        corp;

        Package/InstantiationCompleter_t: proc
        _instantiate_(Package/PContext_t nonNil pctx;
                      Package/Instance_t nonNil inst)void:
            con l := inst->inst_params[0].insp_value,
                m := inst->inst_params[1].insp_value,
                n := inst->inst_params[2].insp_value;
            Fmt("ArrayMul instantiated, L=", l, ", M=", m, ", N=", n);
        corp;
    };

The parameters to "Mul" are '@' of arrays of float. In Zed this "address of array" is explicit, whereas in other languages it tends to be implicit.

Ignore the details of proc "_instantiate_". It is enough to know that it runs at compile time and shows the values of the three 'uint' parameters that "ArrayMul" is being instantiated with.

Here is a use of that generic:

    uint A = 10, B = 20, C = 15;

    instance MyArrayMul = ArrayMul(A, B, C);

    [A, C] float Arr1;
    [A, B] float Arr2;
    [B, C] float Arr3;

    proc
    test()void:
        MyArrayMul.Mul(@Arr1, @Arr2, @Arr3);
    corp;

The 'Mul' proc in this instance will be compiled with L=10, M=20 and N=15. Those values will be 'uint' constants within 'Mul'. If we instead pass arrays whose bounds do not match:

    [A, 14] float Arr4;
    [11, B] float Arr5;

    proc
    bad()void:
        MyArrayMul.Mul(@Arr4, @Arr2, @Arr3);
        MyArrayMul.Mul(@Arr1, @Arr5, @Arr3);
    corp;

we get error messages like:

    // Array bound mismatch
    // '@'-ed types do not match
    // Value is not compatible with parameter "aDst"

    // Array bound mismatch
    // '@'-ed types do not match
    // Value is not compatible with parameter "aA"

because the passed arrays do not match the instantiated parameters of "Mul".
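Each distinct set of bounds needs its own instance. As a sketch continuing the example above, multiplying square [A, A] arrays requires a second instantiation of "ArrayMul". The names "SquareMul", "Sq1" to "Sq3" and "testSquare" are invented here. Following the description above, this instance's "Mul" would be compiled with L, M and N all equal to 10, and its "_instantiate_" proc would report those values at compile time.

    instance SquareMul = ArrayMul(A, A, A);

    [A, A] float Sq1;
    [A, A] float Sq2;
    [A, A] float Sq3;

    proc
    testSquare()void:
        /* Sq1 := Sq2 * Sq3, using the 10 x 10 instantiation. */
        SquareMul.Mul(@Sq1, @Sq2, @Sq3);
        /* The 10 x 20 x 15 instance still handles the original arrays. */
        MyArrayMul.Mul(@Arr1, @Arr2, @Arr3);
    corp;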

Because Zed provides full access to compile time execution, compiler internals, etc., it should be possible to write things like an array multiply proc which runs at compile time, examines the array bounds, and creates an actual proc with the loops re-arranged for the best use of processor cache, array indexing, etc. [It might also be possible to create, at compile time, custom FFT procs based on the size of the FFT. I recall that Myrias had an "FFT-gen" program, but I don't know any details of it, or whether what it did is still relevant.]

10.6 Generic Type Parameter Interfaces

When generics are initially processed, little is known about the nature of the types of the generic parameters. For "plain" generic type parameters, it is known that the values will be non-multiple, and so can be assigned, compared for equality, and passed to and from procs. For 'any' generic type parameters, it is known that the values will be "tracked" values, and so can be compared and assigned. For '@' generic type parameters, the only direct operation is assignment. This lack of knowledge greatly limits what can be done with values of such types, or types built up from those type parameters.

To enable more operations, Zed allows a "generic type parameter interface" to be defined for each type parameter. That interface specifies procs and operators which can be used with values of the type. When the generic is instantiated, the compiler checks the types used in the instantiation, to verify that the procs and operators in the interface are available and of the proper signature. Generic procs and types are then instantiated, and will use the actual procs attached to the actual types used in the instantiation.

Some languages call generic parameters with such an interface "constrained type parameters" because only types which provide the procs in that interface can be provided in an instantiation. Zed looks on the positive side of things, and considers the presence of such an interface to be adding capabilities to the code in the generic.

An interface specification for a generic type looks similar to a normal interface specification ("9.2.1 Interface Syntax"), except there can be operator lists, there is no visibility specification, there is no 'partial' or 'final', there can be no 'extends' clause, and instead of an interface name there must be the name of one of the type parameters to the generic.

Additional syntax within a type parameter interface specification is available to specify which "#" operators are available to the code within the generic. Any instantiation of the generic must provide those operators on the actual types which are used in the instantiation. In an internal instantiation, the instantiating generic must provide all procs and operators needed by the instantiated generic.

There is one small, but important, exception to the above rule. If the type given in an instantiation is one of the "4.1 Basic Types", then the needed operators will be provided by that basic type. If the type does not support a needed operator, errors will be indicated during the instantiation of generic procs which require the operator. Basic types cannot provide named procs specified in a generic type parameter interface.

This exception allows things like sorting and searching generics to operate with types like 'uint', 'float', 'string', etc.

The syntax for operator specification clauses within the interface is:

"binary" or "prefix"
list of allowed "#" operators, separated by commas
';'

Note that "binary" and "prefix" are string literals. The allowed operators after "binary" are: '#=', '#~=', '#<', '#>', '#<=', '#>=', '#==', '#~==', '#+', '#-', '#*', '#/', '#%', '#&', '#|', '#^', '#><', '#<>', '#<<', '#>>', '#<~', '#>~' and '#:='. The allowed operators after "prefix" are '#+', '#-' and '#~'.
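The following sketch shows operator specification clauses using both forms. The hypothetical generic "Limits" requests '#<' and '#>' as binary operators and '#-' as a prefix operator. Because 'float' is one of the basic types, instantiating "Limits" with it should supply those operators without any exported procs, per the exception described above. Writing the prefix operator as "#-v" is an assumption about how a prefix "#" operator is applied. See "13 "#" Operators" for the actual rules.

    generic Limits(type valType) {
        interface valType {
            "binary" #<, #>;
            "prefix" #-;
        };

        /* Restricts "v" to the range lo .. hi. */
        export proc
        Clamp(valType v, lo, hi)valType:
            if v #< lo then
                lo
            elif v #> hi then
                hi
            else
                v
            fi
        corp;

        /* The larger of "v" and its negation. */
        export proc
        Magnitude(valType v)valType:
            if #-v #> v then #-v else v fi
        corp;
    };

    instance FloatLimits = Limits(float);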

Compile-time "#" operator procs can be used. See "13 "#" Operators" for details on what the procs must look like.

As an example, consider a teaching variant of the simple "bubblesort":

    generic BSort(any type genType; uint COUNT) {
        interface genType {
            "binary" #>;
            proc Dump(string nonNil header; @ ro [COUNT] genType aDataArray)void;
        };

        export proc
        Sort(@ [COUNT] genType ro aA)void:
            for i from 0 upto COUNT - 2 do
                for j from 0 upto COUNT - 2 - i do
                    if aA@[j] #> aA@[j + 1] then
                        con temp := aA@[j];
                        aA@[j] := aA@[j + 1];
                        aA@[j + 1] := temp;
                    fi;
                od;
                genType##Dump(FmtS("0 - ", COUNT - 1 - i), aA);
            od;
        corp;
    };

Note that this generic also has a 'uint' parameter, which provides the size of an array of values to be sorted. Given that this sorting proc sorts tracked values, it is more likely that a real version of this proc would be passed a vector of values, and would determine its size at run time.

When sorting values, we normally want to order them not by their address, but by their contents. Since the generic code knows nothing of their contents, it cannot do that without help. In this example, the interface specification for generic type parameter "genType" specifies that the '#>' operator will be available to compare values. The interface also provides one proc, "Dump", which the generic uses to show how far it has progressed in the sorting. Since the array being sorted is also passed to "Dump", that proc can display the partially sorted values.

The interface proc is accessed using the '##' syntax, selecting the proc from the name of the generic type parameter. This syntax is used to emphasize that the procs must be attached to the actual type as "exports" on that type, the same as in other uses - see "18.11 "##" Accesses".

As a simple concrete example, we use a rename of 'string' for a sort test.

    type MyString_t = string;

    uint M = 10;

    [M] MyString_t Strings := [
        "the", "quick", "red", "fox", "jumps",
        "over", "the", "lazy", "brown", "dog"
    ];

    proc
    StringGreaterThan(MyString_t str1, str2)bool:
        str1 > str2
    corp;

    proc
    StringDump(string nonNil header; @ ro [M] MyString_t aA)void:
        FmtN(header, ": ");
        for i from 0 upto M - 1 do
            FmtN(" ", aA@[i]);
        od;
        Fmt();
    corp;

    eval Types/ExportProcAdd(MyString_t, Exec/HASH_GREATER_THAN, StringGreaterThan);
    eval Types/ExportProcAdd(MyString_t, "Dump", StringDump);

    instance StringSorter = BSort(MyString_t, M);

    proc
    showStrings(string nonNil header)void:
        Fmt("Strings ", header, ":");
        for i from 0 upto M - 1 do
            if i ~= 0 then
                FmtN(" ");
            fi;
            FmtN("\"", Strings[i], "\"");
            if i ~= M - 1 then
                FmtN(",");
            fi;
        od;
        Fmt("\n");
    corp;

    export proc
    testStrings()void:
        showStrings("Unsorted");
        StringSorter.Sort(@Strings);
        Fmt();
        showStrings("Sorted");
    corp;

We rename 'string' as "MyString_t" so that we can export procs on it - we cannot export procs on 'string' because it is a language builtin type. There can be multiple renames of 'string' with different sets of exports, thus allowing, for example, case sensitive versus case-insensitive sorting.

Comparison proc "StringGreaterThan" is defined and attached to our type "MyString_t", using name "Exec/HASH_GREATER_THAN", the standard name for the '#>' operator. Dumping proc "StringDump" is defined and attached using name "Dump", as required by the generic type interface for "BSort"/"genType". The example then instantiates "BSort", and tests the sorter.

See the example below for a more complex situation.

The actual procs which implement generic type parameter interface methods must match those methods in terms of parameters and result. They cannot match exactly because the method signatures reference the uninstantiated generic types while the actual procs reference types involving the instantiating types. The exact rules are:

if the result type of a method proc is the uninstantiated generic type, then the result type of the actual proc must be the corresponding instantiating type; otherwise the result types must match exactly
the 'nonNil' status of the actual proc must match that of the method
the number of parameters in the actual proc must match that in the method

For each parameter in a generic type method signature:

if the method parameter type is the uninstantiated generic type, then the actual proc parameter type must be the corresponding instantiating type
if the method parameter type is '@' of the uninstantiated generic type, then the actual proc parameter type must be '@' of the corresponding instantiating type and the storage flags of the two '@' types must match
if the method parameter type is a matrix of the uninstantiated generic type, then the actual proc parameter type must be a matrix of the corresponding instantiating type, and the two matrix types must have the same number of bounds, the same storage flags and, if the 'private' storage flag is specified, the same package of definition
if the method parameter type is '@' of an array of the uninstantiated generic type, then the actual parameter type must be '@' of a same-dimensionality array of the corresponding instantiating type and the bounds of the arrays must match. If the bound in the method parameter is a generic 'uint' parameter, then the bound in the actual proc parameter must be the corresponding instantiating 'uint' simple compile-time expression. If the array element type in the method is not the uninstantiated generic type, then the element type in the actual proc parameter must be of the same type.
otherwise the actual proc parameter type must be the same type as the method parameter type
the 'nonNil' attribute of the actual proc parameter type must match the 'nonNil' attribute of the method parameter type

A property of Zed matrix type compatibility allows sorting of vectors (1 dimensional matrixes) to work nicely. Vectors are compatible (ignoring any storage flags) if the element types are the same, or one is a rename of the other. So, for example, a vector of 'string' can be passed to instantiations of a sorting generic instantiated with different renames of 'string', and thus having different comparison procs. This allows different sorts to be done on the same vector, simply by choosing which instantiation to select the sorting proc from.
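The following sketch combines the earlier remark after "BSort" (that a real sorter would take a vector and determine its size at run time) with the vector compatibility just described. The generic "VSort" and proc "testVector" are hypothetical. "MyString_t" and its exported '#>' proc are the ones from the "BSort" string example above. The vector passed in has element type 'string', relying on the rename compatibility of vectors.

    generic VSort(any type genType) {
        interface genType {
            "binary" #>;
        };

        /* Bubble sort of a vector whose length is found at run time. */
        export proc
        Sort([] genType nonNil v)void:
            con n := getBound(v);
            /* Guard so that "n - 2" cannot wrap for 0 or 1 elements. */
            if n > 1 then
                for i from 0 upto n - 2 do
                    for j from 0 upto n - 2 - i do
                        if v[j] #> v[j + 1] then
                            con temp := v[j];
                            v[j] := v[j + 1];
                            v[j + 1] := temp;
                        fi;
                    od;
                od;
            fi;
        corp;
    };

    instance StringVSorter = VSort(MyString_t);

    export proc
    testVector()void:
        con words := matrix([4] string);
        words[0] := "pear";
        words[1] := "apple";
        words[2] := "fig";
        words[3] := "date";
        StringVSorter.Sort(words);
        for i from 0 upto getBound(words) - 1 do
            FmtN(" ", words[i]);
        od;
        Fmt();
    corp;

Since the length comes from 'getBound', one instance serves vectors of any size. This differs from "BSort", which needs a separate instance for each COUNT.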

10.7 Multiple Generic Parameters

All of the generics we have seen so far had only one type parameter, and few generics need more than one. The example here is a "mapping" generic, which maps one tracked value to another tracked value, using a hash table. Some code has been replaced with "..." to reduce the example size.

Things have gotten more complex with this example - it takes two type parameters. It requires that the "key" type provide both a '#~=' operator and a "UintHash" proc. The former is used to compare key values when a candidate is found in the hash table, and the latter provides a hash code for use with the hash table.

Note that type "EntryStatus_t" must be defined outside of the generic - see the rules above.

    enum EntryStatus_t {
        ens_free,
        ens_empty,
        ens_used
    };

    export generic Mapper(any type keyType, valueType) {
        interface keyType {
            "binary" #~=;
            proc UintHash(keyType nonNil k)uint;
        };

        struct Entry_t {
            EntryStatus_t en_status;
            uint en_hash;
            keyType en_key;
            valueType en_value;
        };

        /* This is exported as the mapping type. */
        export record Mapping_t {
            [] Entry_t nonNil m_contents;
            uint m_entryCount;
        };

        proc
        clear([] Entry_t nonNil contents)void:
            ...
        corp;

        export proc
        Create(uint size)nonNil Mapping_t:
            /* This matrix creation triggers instantiation of this proc. */
            con contents := matrix([size] Entry_t);
            clear(contents);
            Mapping_t(contents, 0)
        corp;

        proc
        search(Mapping_t nonNil m; keyType nonNil key; uint hash)uint:
            con entries := m->m_contents, size := getBound(entries);
            var pos := hash % size;
            con start := pos;
            uint found;
            var first := true, foundEmpty := false;
            while
                if pos = start and not first then
                    /* Scanned entire table - key not found - return the first of
                       the free slots we found. We must have found one, since we
                       only let the tables get 4/5th full. */
                    return found;
                fi;
                con aEn := @entries[pos], ens := aEn@.en_status;
                if ens = ens_free then
                    /* Found an unused slot - key is not in table. */
                    return if foundEmpty then found else pos fi;
                fi;
                if ens = ens_empty then
                    /* Found a slot that was used in the past, so keep looking,
                       but remember this slot for entry if it is the first such. */
                    if not foundEmpty then
                        foundEmpty := true;
                        found := pos;
                    fi;
                    true
                elif aEn@.en_hash ~= hash then
                    /* Match not possible - keep looking. */
                    true
                else
                    /* Same hash - compare keys. */
                    nonNil(aEn@.en_key) #~= key
                fi
            do
                pos := pos + 1;
                if pos = size then
                    pos := 0;
                fi;
                first := false;
            od;
            pos
        corp;

        export proc
        Enter(Mapping_t nonNil m; keyType nonNil key;
              valueType nonNil value)void:
            con count := getBound(m->m_contents);
            if m->m_entryCount >= count * 4 / 5 then
                /* Expand the table. */
                ...
            fi;
            con hash := keyType##UintHash(key),
                aEn := @m->m_contents[search(m, key, hash)];
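            if aEn@.en_status ~= ens_used then
                /* New key - re-entering an existing key only replaces its value. */
                m->m_entryCount := m->m_entryCount + 1;
            fi;
            aEn@.en_status := ens_used;
            aEn@.en_hash := hash;
            aEn@.en_key := key;
            aEn@.en_value := value;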
        corp;

        export proc
        Lookup(Mapping_t nonNil m; keyType nonNil key)valueType:
            con hash := keyType##UintHash(key),
                aEn := @m->m_contents[search(m, key, hash)];
            if aEn@.en_status = ens_used then
                aEn@.en_value
            else
                nil
            fi
        corp;

        export proc
        Delete(Mapping_t nonNil m; keyType nonNil key)void:
            con hash := keyType##UintHash(key),
                aEn := @m->m_contents[search(m, key, hash)];
            if aEn@.en_status = ens_used then
                aEn@.en_status := ens_empty;
                aEn@.en_hash := 0;
                aEn@.en_key := nil;
                aEn@.en_value := nil;
                m->m_entryCount := m->m_entryCount - 1;
            fi;
        corp;
    };

The next section partially instantiates this doubly-parameterized generic.

When a generic has more than one parameter, the parameter expressions in an instantiation can, in principle, have side effects. Their order of evaluation can therefore matter, even though the instantiation of the generic happens at compile time. Zed does not specify whether the parameter expressions are evaluated in order or may be re-ordered. If your code depends on that order, restructure it so that it does not.

10.8 Using One Generic Inside Another

As shown in the earlier section on the syntax of generics, a generic can contain an instantiation of another generic. This is done when the new generic uses the old generic as part of its implementation. Building on the above "mapping" example, we can write a generic which provides "symbol tables" keyed by strings, but whose entry values must be further instantiated:

    package /SymTab;
    use /Types;
    use /Exec;

    type SymTabString_t = string;

    /* These two procs must be exported so that our SymTab stuff can be used from
       other packages. */

    export proc
    symTabNotEqual(SymTabString_t nonNil s1, s2)bool:
        s1 ~= s2
    corp;

    export proc
    symTabUintHash(SymTabString_t nonNil s)uint:
        /Basic/StringHash(s)
    corp;

    eval Types/ExportProcAdd(SymTabString_t, Exec/HASH_NOT_EQUAL, symTabNotEqual);
    eval Types/ExportProcAdd(SymTabString_t, "UintHash", symTabUintHash);

    export generic GenSymTab(any type entry) {
        instance SymTab = /Mapping/Mapper(SymTabString_t, entry);

        export type SymTab_t = SymTab.Mapping_t;

        export proc
        Create(uint size)nonNil SymTab_t:
            SymTab.Create(size)
        corp;

        export proc
        Enter(SymTab_t nonNil st; string nonNil key; entry nonNil en)void:
            SymTab.Enter(st, key, en);
        corp;

        export proc
        Lookup(SymTab_t nonNil st; string nonNil key)entry:
            SymTab.Lookup(st, key)
        corp;

        export proc
        Delete(SymTab_t nonNil st; string nonNil key)void:
            SymTab.Delete(st, key);
        corp;
    };

As in the sorting example, we must rename 'string' in order to be able to export procs on it. Code in this generic simply calls procs in the original "mapping" generic.

Suppose a generic Gen1 is instantiated inside another generic Gen2, and some of the type parameters of Gen1 have associated interfaces. Then the type parameters of Gen2 which are used to instantiate those Gen1 parameters must also have associated interfaces. Each such Gen2 interface must provide all of the operators and methods needed by Gen1, and each provided method must match the corresponding Gen1 method in its parameters and result type. "Match" here has the meaning detailed above for matching the actual procs exported by an actual type against the methods in a generic type parameter interface that the actual type is being used for. The Gen2 interfaces may have additional methods that Gen1 does not need. [Help! I need better wording here. Also a meaningful example. I first added this facility when doing QuickSort, where I wanted to use an insertion sort generic inside the main QuickSort code, and I was using a "Compare" interface method rather than '#<>' ('<>' didn't exist yet, and neither did the ability to use "#" operators in generic code).]
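In effect, the rule is a subset check with signature matching. The Python sketch below is a simplified model, not the compiler's data structures. It represents each interface as a hypothetical dictionary that maps a method or operator name to a (parameter types, result type) pair, and it treats "match" as plain equality of those pairs.

```python
from typing import Dict, List, Tuple

# Hypothetical model: name -> (parameter type names, result type name).
Signature = Tuple[Tuple[str, ...], str]
Interface = Dict[str, Signature]


def interface_problems(required: Interface, provided: Interface) -> List[str]:
    """Reasons why `provided` (Gen2's interface) cannot satisfy
    `required` (Gen1's interface). An empty list means it can."""
    problems = []
    for name, sig in sorted(required.items()):
        if name not in provided:
            problems.append(f"missing {name}")
        elif provided[name] != sig:
            problems.append(f"{name}: signature {provided[name]} does not match {sig}")
    # Extra entries in `provided` are allowed and simply ignored.
    return problems
```

For example, an insertion sort generic that needs only "Compare" can be instantiated from a QuickSort generic whose interface provides "Compare" plus other methods.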

10.9 When Generic Procs Must be Instantiated

As mentioned above, the Zed system does not always need to instantiate generic procs when the overall generic is instantiated. When a proc is not instantiated, all instantiations can use the same machine code for the proc. When the proc must be instantiated, new machine code is created for that instantiation. Thus, less overall "code bloat" occurs when generic procs are not instantiated.

Generic proc instantiation is reported when the compiler "info" level is set to 2 or greater. See "97.2 Current System Status". Two concepts are related to the need to instantiate generic procs.

The first concept applies to a type whose size cannot be known in its uninstantiated form because it directly contains uninstantiated generic types. The size of 'any' 'type' generic parameter types is known (they are all tracked values), but "plain" and '@' generic parameter types can vary in size from one instantiation to another. If such a type is used as the element type of an array, or as the type of a field in a struct, then the size of that new type is also unknown. Similarly, if a 'uint' generic parameter is used as the bound of an array type, the size of that array type cannot be known. In the list below, all of these types are called "varying" types.

The second concept applies to a type which contains, anywhere within it, a generic parameter. The parameter can be nested within record types, struct types, matrix types, etc. This also includes array types whose size is a 'uint' generic parameter. In the list below, these types are called "dependent" types.
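Both concepts can be computed by recursion over a type's structure. The Python sketch below uses a small hypothetical type model whose class names are not part of Zed, and it leaves out matrix types for brevity. As the text says, an 'any' parameter has a known size, "plain" and '@' parameters do not, and an array or struct that directly contains a varying type, or an array bounded by a 'uint' parameter, is itself varying. One assumption goes beyond the text: a record is a tracked value and so has a known size even when one of its fields is varying, although it is still dependent.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Plain:            # a non-generic type such as uint or string
    name: str


@dataclass
class TypeParam:        # a generic type parameter
    name: str
    kind: str           # "any", "plain" or "@"


@dataclass
class Array:
    elem: "Type"
    bound_is_uint_param: bool = False


@dataclass
class Struct:
    fields: List["Type"] = field(default_factory=list)


@dataclass
class Record:           # assumed to be a tracked (reference) value
    fields: List["Type"] = field(default_factory=list)


Type = Union[Plain, TypeParam, Array, Struct, Record]


def is_varying(t: Type) -> bool:
    """True if the size of t cannot be known before instantiation."""
    if isinstance(t, TypeParam):
        return t.kind != "any"
    if isinstance(t, Array):
        return t.bound_is_uint_param or is_varying(t.elem)
    if isinstance(t, Struct):
        return any(is_varying(f) for f in t.fields)
    return False        # Plain and Record values have a known size


def is_dependent(t: Type) -> bool:
    """True if t mentions a generic parameter anywhere inside it."""
    if isinstance(t, TypeParam):
        return True
    if isinstance(t, Array):
        return t.bound_is_uint_param or is_dependent(t.elem)
    if isinstance(t, (Struct, Record)):
        return any(is_dependent(f) for f in t.fields)
    return False
```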

Things which trigger the need for generic proc instantiation are:

the containing generic has a "plain" generic type parameter
assignment of a value of a "varying" type
calling of a proc which itself must be instantiated
use of a 'uint' generic parameter
use of 'getBound' on an array bound which is a 'uint' generic parameter
use of an operator or proc from a generic type parameter interface
declaration of a local variable of a non-'any' generic type parameter
declaration of a local variable of a "varying" type
indexing of an array or matrix of a "varying" type
use of 'sizeof' on a "varying" type
use as value of a proc which itself needs instantiating
use of a record constructor where the record type is a "dependent" type
construction of a matrix of a "dependent" type
use of a struct initializer or constructor for a "dependent" type
use of a vector constructor for a "dependent" type
use of an array initializer or constructor where the element type is a "varying" type
use of a struct/record field that is at an offset within a struct or record that is "varying". I.e. the field comes after a field of "varying" type.
use of 'assign' to do a run-time type check where the type being checked for is a "dependent" type
use of 'assign' to do a run-time proc type check where the proc type being checked for is a "dependent" type
use of 'select' from a variant record where the variant part comes after a field of a "varying" type
use of a "dependent" type as a value

This might seem to be a lot of situations, but it is quite possible to create generics in which procs do not need to be instantiated. An example is the system "Lists" package, in which the generic linked list procs do not need instantiation. Note that the use of compile-time execution within the code of generic procs can trigger one of the above situations.

Compile-time execution (calls to 'ctProc', 'ioProc', etc. procs) within generic procs works as usual. This is as desired: if the generic proc does not need any instantiation, such compile-time execution must happen during the initial processing of the proc. As mentioned above, the compile-time execution can affect whether or not the proc will need instantiation.

If a generic proc is not instantiated, then it is part of the package containing the generic. If a generic proc is instantiated, then an instantiation of it is part of the package containing the instance. This can occasionally have subtle effects, e.g. when examining "pk_bcActiveCount" to see if any bytecode from a package is running.

10.10 _instantiate_

When a generic is instantiated, Zed checks for a proc in the generic with name "_instantiate_", and forced proc type "Package/InstantiationCompleter_t", which is defined as:

    proc(PContext_t nonNil pctx; GenericInstance_t nonNil inst)void

After the instantiation work is done, the "_instantiate_" proc will be called. This call happens at compile time - see section "18 Compile Time Execution" for details of compile time execution. This call can be used, among other things, to add type exports to the instantiated types. For example:

    generic Gen2(any type genType) {
        export record Gen2_t {
            Gen2_t gen2_next;
            genType gen2_this;
            string gen2_tag;
            uint gen2_n;
            float gen2_size;
        };

        export type Gen2Vec_t = [] Gen2_t;

        /* This needs to be exported from the generic so that Fmt can access
           instances of it. */
        export proc
        fmtVec(CharBuffer/OBuf_t nonNil ob; Gen2Vec_t vec; string format;
               uint width, precision)void:
            if vec ~= nil then
                uint count := getBound(vec);
                if count = 1 then
                    FmtB(ob, "<Gen2Vec_t[", vec[0], "]>");
                elif count ~= 0 then
                    FmtB(ob, "<Gen2Vec_t has ", count, " elements>");
                else
                    CharBuffer/OString(ob, "<empty Gen2Vec_t>");
                fi;
            else
                CharBuffer/OString(ob, "<nil Gen2Vec_t>");
            fi;
        corp;

        Package/InstantiationCompleter_t: proc
        _instantiate_(Package/PContext_t nonNil pctx;
                      Package/Instance_t nonNil inst)void:
            /* We are already running at compile time, so we can't directly use
               "FmtAdd". We use alternative "FmtAddCT". Because we are adding the
               "Fmt" proc here, it will know the name of the type as "Gen2_t", and
               not as any rename of the type that might be done later. */
            FmtAddCT(pctx, Package/GetTypeByName(inst, "Gen2_t"));
            /* We are adding the generic proc "fmtVec" here. The "Fmt" code
               understands that, and will use the proper instance. */
            Types/DoExportProcAdd(pctx, Package/GetTypeByName(inst, "Gen2Vec_t"),
                                  FMT_THIS, fmtVec);
        corp;
    };

In this example, "_instantiate_" uses "Fmt/FmtAddCT" to create and add a custom "fmt" routine for the instantiated "Gen2_t" type, and adds "fmtVec" as a custom "fmt" routine for instantiated type "Gen2Vec_t". Because these type exports are done during the instantiation, they happen before any aliasing of the instantiated types can occur, and thus the type exports tables will be shared between the long form and aliased forms of the instantiated types.

With those exports made, "Fmt" and friends can be used on values of types "Gen2_t" and "Gen2Vec_t" from all instantiations of "Gen2", and the results will be "nicer" than just hexadecimal addresses.

A simpler use of "_instantiate_" appeared in the above array multiplication example - "10.5 A Generic Array Multiplication Example". In that example, the values of instantiating 'uint' generic parameters are retrieved from the internal data structures used to hold them, and printed.

Note that "_instantiate_" is compiled as part of the generic, and so is treated the same as other generic procs with respect to instantiation. As mentioned above, one aspect of this is that if it is instantiated, it is created within the package which contains the instance, and not within the package which contains the generic (and thus contains the original uninstantiated version).

12 Measures and Units

12.1 Introduction

The measures and units facility in Zed is intended to help check the correctness of programs dealing with physical (and other) units. This kind of facility can be found in CAD systems, and I am aware of one other general purpose programming language ("Physcal") that implemented it. The name "measure" refers to a basic kind of value which can be measured, often in the physical world. The name "unit" refers to one specific unit of a measure. Some of the checking that measures and units enable is often termed "Dimensional Analysis".

For example, "Length" and "Time" are common measures. "mile", "metre" and "light-year" are units of Length, while "second", "hour" and "millennium" are units of Time. When dealing with units, standard scale prefixes such as "kilo", "nano" and "mebi" are often used. Calculations often produce values whose unit expression does not have a name defined in the active context. Such a value has a more complex "measure expression" and "unit expression". Zed supports all of these concepts.

The Zed language itself does not pre-define any measures or units. Instead, Zed provides language constructs for defining measures and units. A standard Zed distribution will include library packages containing standard measure and unit definitions. The scale prefixes are, however, all built into the Zed language.

There are two main error situations which are addressed by the Zed measures and units facility. The first is that of measure mismatches. For example, if an expression is determined to be of a numeric type which has an attached measure expression of "Length", then that expression cannot be assigned to a variable whose type has an attached measure expression of "Time". The Zed compiler will flag this as a "measure mismatch".

The second error situation is that of unit mismatches. There are two kinds of unit mismatches. The first kind occurs when the units themselves differ. For example, an expression whose type has units "minute" cannot be directly assigned to a variable whose type has units "second". A multiplication by 60 is needed here, and such value changes must be stated explicitly in Zed, usually using the 'unit' construct. The second kind is when the units match, but the scale factors do not. For example, an expression whose type has units "kilometre" cannot be directly assigned to a variable whose type has units "metre". Here, a multiplication by 1000 is needed, and again Zed requires that the value change be done explicitly.
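The difference between the two kinds of error can be modelled by reducing each unit to its measure expression and a conversion factor relative to the base units. In the Python sketch below, each hypothetical unit is a pair made of a measure exponent map and a factor. A measure mismatch means the exponent maps differ. A unit mismatch means the exponent maps agree but the factors differ, and this model folds both kinds of unit mismatch (different units, different scale factors) into that one case. The minute and kilometre factors follow the examples above, but the data layout is an assumption made for illustration.

```python
from fractions import Fraction
from typing import Dict, Tuple

Unit = Tuple[Dict[str, int], Fraction]  # (measure exponents, factor to base units)

SECOND: Unit = ({"Time": 1}, Fraction(1))
MINUTE: Unit = ({"Time": 1}, Fraction(60))
METRE: Unit = ({"Length": 1}, Fraction(1))
KILOMETRE: Unit = ({"Length": 1}, Fraction(1000))


def check_assign(target: Unit, source: Unit) -> str:
    """Classify assigning a value in unit `source` to a variable in unit `target`."""
    if target[0] != source[0]:
        return "measure mismatch"
    if target[1] != source[1]:
        ratio = source[1] / target[1]
        return f"unit mismatch (explicit conversion needed: multiply by {ratio})"
    return "ok"
```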

The use of measures and units in Zed code is optional - the language does not require any types to have units. However, library routines might have parameters or results with units, and calls to such routines must use correct types.

Unit expressions can only be used with numeric types 'float', 'uint' and 'sint'. Unit expressions are applied using a parenthesized description after the type or literal. All of the rules and capabilities of the non-unit types are applicable to the types with units. Additional rules deal with the unit expressions.

[The wording "type with associated unit expression" is very awkward. This document will normally use shorter forms, such as "type with unit expression", "type with units", "unit type", etc. All of these forms mean the same thing. The choice of which to use will often depend on context - in a discussion clearly involving units, the shorter forms are used, but in a context where units have not been relevant, longer forms will be used for clarity.]

Numeric types without unit expressions are sometimes referred to as the "plain" versions of the types.

The "Fmt" text-output library routines are aware of measures and units. By default, a value to be output whose type includes a unit expression will be output with that unit expression, enclosed in parentheses, after the value. If a format string is given with the value, no unit expression will be output unless the format string includes the modifier character "u". Any field length specification is applied only to the actual value - the unit expression is then output after that formatted output.

The 'unit' construct can be used to change, add or remove unit expressions. So, it can be used to change the output unit expression of a value which is being output by the "Fmt" procs. For example, if a value has units "kilometres/second" and it really makes sense for it to be "kilometres/hour", then "unit(<expr>, km : hr)" can be used to do the conversion, directly in the parameter list to the "Fmt" proc.

'float', 'uint' and 'sint' output formatting with "Fmt" has additional options, which are given as modifiers to the basic format codes. Any of these modifiers implicitly enables "u" output. Modifier "U" asks the "Fmt" code to try to find a single unit which has the same measure expression as the value being output. If such units exist, integral output uses the base unit, while 'float' output tries to choose the unit whose conversion needs the smallest multiplier greater than 1.0.

Another modifier is "m", which asks the "Fmt" code to simplify the measure expression of the value as much as possible. For example, a value which is in "(volt * ampere)" will end up as "(watt)". There is some overlap in functionality between "U" and "m".

The final modifiers are "s" and "S". Unlike the other modifiers, which operate at compile time within the "Fmt" code, these modifiers base their actions on the exact value being output, and so work at run time. They attempt to scale the value, using the scale factors defined for the single unit of the value, so that the output is "people friendly". To be specific, the code attempts to use the defined scale factors to bring the value between 1 and 10000. For example, 23948762 watts will be shown as 24 Megawatts for integral values, or as 23.95 Megawatts for 'float' values.

The scaling code operates slightly differently when the binary scale factors are used - it attempts to reduce the value such that it can be displayed using 3 hexadecimal digits.

If modifier "S" is used, then the full set of scales defined for the unit will be tried. If the modifier is "s", then the "partial" scales ("centi", "deci", "deca" and "hecto") will not be used. Combining "s" or "S" with "U" or "m" can be useful.

If multiple "s" or "S" modifiers are given, then the set of scales defined for the unit is ignored, and all scale factors can be used. For example, with only a single "s", a value might be output as "1230000.(Mg)", but with two "s"s provided, the value is output as "1.23(Tg)" even though the "tera" scale factor is not defined for the "gram" unit.
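The text does not spell out the exact search order used by the "Fmt" code, so the Python sketch below adopts one plausible rule for the decimal case. For magnitudes of at least 1, it uses the smallest available prefix that brings the value below 10000. For magnitudes below 1, it uses the largest available prefix that brings the value up to at least 1. Prefixes are given as powers of ten. Setting 'full=False' drops the partial prefixes (centi, deci, deca, hecto), as "s" does. Binary prefixes and integral rounding are not modelled, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

PARTIAL = {-2, -1, 1, 2}  # centi, deci, deca, hecto


def scale_value(value: float, exponents: Iterable[int],
                full: bool = True) -> Tuple[float, int]:
    """Return (scaled value, power of ten used) for "people friendly" output."""
    exps = {e for e in exponents if full or e not in PARTIAL}
    exps.add(0)  # the unscaled unit is always available
    mag = abs(value)
    if mag == 0:
        return value, 0
    if mag >= 1:
        candidates = sorted(e for e in exps if e >= 0)
        for e in candidates:
            if mag / 10 ** e < 10000:
                return value / 10 ** e, e
    else:
        candidates = sorted((e for e in exps if e <= 0), reverse=True)
        for e in candidates:
            if mag / 10 ** e >= 1:
                return value / 10 ** e, e
    e = candidates[-1]  # nothing fits: use the most extreme prefix available
    return value / 10 ** e, e
```

With the prefixes kilo, mega and giga (3, 6, 9), 23948762 watts scales to 23.948762 with exponent 6, i.e. about 23.95 Megawatts.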

For reference, modifier "U" uses proc "Types/FindSingle", modifier "m" uses proc "Types/FindSimpler", and modifiers "s" and "S" use procs "Types/FindUintScale", "Types/FindSintScale" and "Types/FindFloatScale".

As mentioned above, there is no run-time cost to using measures and units, except when explicit scaling is requested via the 'unit' construct, or when the "Fmt" code has been given an "s" or "S" format modifier, asking it to try to scale the run-time value to a "human" size.

12.2 Defining Measures

Measures are defined at the package level using 'measure' declarations. As with other package level declarations, they can be private to the current subpackage, 'local' to the current package, or exported to all or a set of other packages. Because of how they are used, most will be globally exported. Syntactically, a measure declaration consists of:

optional visibility specification
'measure'
name of new measure
'('
priority of new measure
')'
optional '=' and measure expression

A measure expression consists of a measure expression numerator and possibly a ':' and a measure expression denominator. The numerator can be the literal '1' if no other terms are needed. Both numerator and denominator are one or more references to other measures, separated by '*'s.

[It would be more familiar to use '/' to separate the numerator from the denominator, but that is confusing because of the non-standard operator precedence that would result - all denominator terms are grouped together, without any grouping parentheses. ':' is sometimes used to represent a "ratio", so it is appropriate here. Other syntaxes were considered, such as using whitespace to separate the terms, but all had problems of one kind or another.]

If a measure definition does not have a measure expression, then the new measure is a basic measure (such as "Length") that is not defined in terms of any other measures. Given appropriate existing packages and measure definitions (the standard library of definitions is in "Units/MGS"), examples of measure expressions are:

Length : Time
Length : Time * Time
Units/MGS/Mass * Units/MGS/Mass * Units/MGS/Length : Units/MGS/Time
1 : /Prog/OtherUnitPackage/Time

The priority value given to a new measure is used to define a standard (canonical) order for measures. This order is applied to all internal representations of measures and units, and so will be used in measure and unit expressions in error messages, and in unit expressions produced by "Fmt" code. Large priorities sort first. The intent here is to encode the conventional ordering that is normally used. For example, most people would expect "kg*m : s" rather than "m*kg : s", even though both are valid. [Mentally translate to "kg*m/s" vs "m*kg/s".] If two measures have the same priority, then their order of appearance in error messages and Fmt, etc. output is undefined, and can vary from run to run of a piece of Zed code. [There is no current reason for the order to vary.]
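The canonical ordering can be sketched as a sort on descending priority, applied separately to the numerator and the denominator. The Python sketch below takes a hypothetical map from name to exponent (negative exponents belong in the denominator) plus a priority map. The priorities used in the example are invented for illustration. Ties keep insertion order here, which is one instance of the "undefined" order the text allows.

```python
from typing import Dict


def render(exponents: Dict[str, int], priority: Dict[str, int],
           sep: str = "*") -> str:
    """Render an expression in canonical order: larger priorities first."""
    def side(sign: int) -> str:
        names = sorted((n for n, e in exponents.items() if e * sign > 0),
                       key=lambda n: -priority[n])
        return sep.join(sep.join([n] * abs(exponents[n])) for n in names)

    num, den = side(+1), side(-1)
    num = num or "1"  # an empty numerator is written as the literal 1
    return f"{num} : {den}" if den else num
```

With assumed priorities kg > m > s, the exponents {m: 1, kg: 1, s: -1} render as "kg*m : s" regardless of the order in which they were combined.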

Example measure declarations:

    export measure Length(1001);
    local measure Time(1002);
    export(../OtherPk) measure Speed(100) = Length : Time;
    measure MyMeasure(10) = Speed * Speed * Speed : Mass * Mass;

[Measure names are only used in measure expressions and unit declarations. They are normal package-level symbols, but the Zed language has no other situations in which a measure symbol would be usable.]
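To see what a derived measure means in terms of basic measures, you can expand each reference recursively and count terms, adding one for each numerator term and subtracting one for each denominator term. The Python sketch below does this for the example declarations above. It treats "Mass" as a basic measure (an assumption, since the examples use it without declaring it), and it ignores packages, visibility and priorities. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def parse(expr: str) -> Counter:
    """Count references in 'A * B : C * D' form; denominator terms count -1."""
    num, _, den = expr.partition(":")
    counts = Counter()
    for part, sign in ((num, 1), (den, -1)):
        for term in part.split("*"):
            term = term.strip()
            if term and term != "1":  # the literal 1 contributes nothing
                counts[term] += sign
    return counts


def expand(name: str, defs: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Exponents of basic measures (defined with no expression) in `name`."""
    expr = defs[name]
    if expr is None:
        return {name: 1}
    result = Counter()
    for ref, power in parse(expr).items():
        for base, e in expand(ref, defs).items():
            result[base] += power * e
    return {b: e for b, e in result.items() if e != 0}
```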

12.3 Defining Units

Units are defined, also at the package level, using unit declarations. The defined units have the usual private/local/export capabilities. The declarations consist of:

optional visibility specification
'unit'
path to the measure that the new unit is a unit of
name of new unit
'('
abbreviation for the new unit
')'
optional '=' and defining unit expression
optional '{', scale factor list, and '}'

A defining unit expression is analogous to a defining measure expression, except that its elements are unit abbreviations with optional scale prefixes instead of measure paths, and the numerator can be preceded by a literal multiplication factor (either integral or floating point). Only one unit of a given measure can omit the defining unit expression - that unit is the base unit of the measure. Example base units are "metre", "gram" and "second".

Note that unit expressions use unit abbreviations, not unit names. This is consistent with the use of the abbreviations in other situations, such as when making a type be a unit type, and when specifying a unit expression for a numeric literal. Unit abbreviations are not considered to be defined symbols within the package - only the unit name is so defined. Instead, unit abbreviations are in a name space all their own.

Since there are no spaces between a scale prefix and a unit abbreviation, it is possible to construct ambiguous cases. The Zed compiler does its best to interpret all situations it sees, but there are some it cannot properly resolve. It will report some cases as errors when it can. These problems must be resolved either by restricting the set of scale factors which can be used with the units involved, or by choosing different abbreviations for the units, even if that results in non-standard abbreviations.

Programmers who are defining new measures and units to be used alongside the existing standard measures and units should test their new definitions along with all of the standard ones, to make sure they have not accidentally introduced unresolvable ambiguities. If a new set of measures and units can only be used without the standard ones, the defining package should clearly state that.
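A rough way to look for such ambiguities is to try every split of a token into an optional scale prefix followed by a unit abbreviation, keeping only the prefixes that the unit allows. The Python sketch below does this against a hypothetical table of units and their allowed prefixes. It is not the compiler's algorithm. It does show why "min" (minute) is unambiguous while inches allow no prefixes, but would become ambiguous if inches allowed the milli prefix.

```python
from typing import Dict, List, Set, Tuple

PREFIXES = {"q", "r", "y", "z", "a", "f", "p", "n", "u", "m", "c", "d", "da",
            "h", "k", "M", "G", "T", "P", "E", "Z", "Y", "R", "Q",
            "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"}


def parses(token: str, units: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """All (prefix, unit abbreviation) readings of token; "" means no prefix."""
    found = []
    for i in range(len(token)):
        prefix, abbrev = token[:i], token[i:]
        if abbrev not in units:
            continue
        if prefix == "" or (prefix in PREFIXES and prefix in units[abbrev]):
            found.append((prefix, abbrev))
    return found
```

A token with more than one reading is ambiguous, and one with no readings is undefined.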

Note that many of the common units which are used are not used with very many scale prefixes. For example, we commonly speak of "kilograms", but not of "Megagrams", since we normally use other units for masses much larger than kilograms.

A scale factor list consists of zero or more scale factor abbreviations, separated by commas. The scale factor abbreviations, their full names and their meanings are:

    q - quecto - 10^-30
    r - ronto - 10^-27
    y - yocto - 10^-24
    z - zepto - 10^-21
    a - atto - 10^-18
    f - femto - 10^-15
    p - pico - 10^-12
    n - nano - 10^-9
    u - micro - 10^-6
    m - milli - 10^-3
    c - centi - 10^-2
    d - deci - 10^-1
    da - deca - 10^1
    h - hecto - 10^2
    k - kilo - 10^3
    M - mega - 10^6
    G - giga - 10^9
    T - tera - 10^12
    P - peta - 10^15
    E - exa - 10^18
    Z - zetta - 10^21
    Y - yotta - 10^24
    R - ronna - 10^27
    Q - quetta - 10^30
    Ki - kibi - 2^10
    Mi - mebi - 2^20
    Gi - gibi - 2^30
    Ti - tebi - 2^40
    Pi - pebi - 2^50
    Ei - exbi - 2^60
    Zi - zebi - 2^70
    Yi - yobi - 2^80

Note that some of these scale factors need more than 64 bits to represent (for example, zetta at 10^21 and zebi at 2^70 both exceed 2^64 - 1). Since Zed is currently limited to 64 bit values, it cannot directly represent these scale factors.
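The affected prefixes can be found by comparing each factor with 2^64 - 1. The Python check below uses the exponents from the table above. For negative exponents it tests the reciprocal (10^|e|), which assumes that a small factor is handled as a division by a large integer. That is one reading of the limit, not a statement of how the compiler stores scale factors.

```python
DECIMAL = {"q": -30, "r": -27, "y": -24, "z": -21, "a": -18, "f": -15,
           "p": -12, "n": -9, "u": -6, "m": -3, "c": -2, "d": -1,
           "da": 1, "h": 2, "k": 3, "M": 6, "G": 9, "T": 12, "P": 15,
           "E": 18, "Z": 21, "Y": 24, "R": 27, "Q": 30}
BINARY = {"Ki": 10, "Mi": 20, "Gi": 30, "Ti": 40, "Pi": 50, "Ei": 60,
          "Zi": 70, "Yi": 80}
U64_MAX = 2 ** 64 - 1


def too_big() -> list:
    """Prefixes whose factor (or reciprocal, for negative exponents) exceeds 64 bits."""
    big = [p for p, e in DECIMAL.items() if 10 ** abs(e) > U64_MAX]
    big += [p for p, e in BINARY.items() if 2 ** e > U64_MAX]
    return big
```

Since 10^18 and 2^60 still fit, only the prefixes from zepto/zetta outward and zebi and yobi are reported.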

Unit names are not used anywhere in normal Zed code. Like measure names, they are normal package-level symbols, but there simply isn't any syntactic location where they are used. The one current use is in an error message, to name an existing base unit for a measure when a subsequent unit definition without a defining unit expression is encountered.

Assuming the previous set of example measure declarations, some example unit declarations could be:

    export unit Length Metre(m) {y, z, a, f, p, n, u, m, c, k};
    export unit Time Second(s) {y, z, a, f, p, n, u, m};
    local unit Time Minute(min) = 60*s;
    export(../pk1, ../pk2) unit Length Inch(in) = 0.0254*m;
    unit Speed MilesPerHour(mph) = 17.6*in : s;
    unit Units/MGS/Mass Pound(lb) = 453.592*g;
    unit Units/MGS/Mass Ton(t) = 2000*lb {k, M, G};

[Spaces are used around '*'s in measure expressions, but not in unit expressions. Measure expression terms can include paths with '/'s, so not using spaces there could be hard to read. I currently believe that not using spaces with unit expressions makes them a bit easier to read. Spaces are used around a ':'. Note that this is not a language rule, but a convention that is used in sample source files, and which is used by the system pretty-printer.]

When defining new units of the same measure as previous units, it is best if the multiplication factors are > 1.0, as this allows the "unit simplification" code to work best. In other words, make the base units the "smallest" units of their measure, and build "larger" units from there. The demands of existing standards often override this rule, however. For example, the standard mass unit is "gram", not "yoctogram".

[Standard package "Units/MGS" contains a collection of measures and units based on a "metre-gram-second" system, plus many additional measures and units from information on Wikipedia. I am not personally familiar with many of these units, so corrections and additions are welcome. Also, updates to the lists of used scaling prefixes would be appreciated.]

12.4 Using Units

A unit expression can be applied to types 'float', 'uint' and 'sint' by following the type reserved word with a parenthesized unit expression. Examples:

float(cm*cm)
uint(m : s)
sint(g*g*m : s*s)

Such unit types can be used wherever other types can be used. Variables, fields and parameters of unit types can be declared and used like those without unit expressions, named constants of unit types can be defined, etc.

Numeric literals can be given a unit type by following the literal with a parenthesized unit expression. Examples:

12(cm)
13.7e-15(g*um : s)

A constant 0 (or 0.0) with no unit expression is compatible with any unit type that it would otherwise be compatible with. I.e. "0.0" is compatible with 'float' unit types, and "0" is compatible with 'uint' and 'sint' unit types. This rule is similar to the one which allows 'nil' to be compatible with all tracked and pointer types.

Other than the above exception, unit types must match exactly. Unit expressions cannot be silently dropped or added, and multiplication and scale factors will not be silently used. [The reasoning here for the latter is that applying a multiplication or scale factor changes the value and can reduce the numeric accuracy or cause an underflow or overflow. I felt that this should not be done without an explicit request.]

12.5 The "unit" Construct

The 'unit' construct is used to add, remove or modify the unit type of an expression. It cannot be directly used to change from one unit type to a completely different unit type (one with a different underlying measure expression), but that can be done with a pair of 'unit' constructs, the first of which removes the unit type.

Syntactically, the 'unit' construct consists of:

'unit'
'('
value expression whose unit type is to be affected
','
'*' or a unit expression
')'

Note that the unit expression is not enclosed in separate parentheses. Since only 'float', 'uint' and 'sint' can have unit types, the value expression must be of one of those types, or one of those types with a unit expression. The syntax here follows the pattern of other similar constructs in Zed. If '*' is used instead of a unit expression, then the provided value expression must have a unit type, and that unit type is removed. Otherwise, the given unit expression is applied to the value expression. Some examples:

unit(valueInMetres, cm)
unit(valueInFeet, cm)
unit(valueWithUnits, *)
unit(valueWithNoUnits, cm*cm : s*s)

The first example shows an explicit request for scaling - in this case a multiplication by 100 will be done. The second example shows a conversion between units of the same measure - a multiplication will be done. The third example shows the stripping of a unit expression from the value expression type, yielding a "plain" value. The fourth example shows applying a unit expression to a plain value.

There are some additional special cases with using 'unit' to remove a unit expression. If the value expression is '@' of an array of 'float', 'uint' or 'sint' with unit expression, then the result is '@' of a corresponding array without units. The same is true for a pointer to such an array. These special cases allow library routines dealing with arrays of "plain" numbers to be used with arrays of numbers with unit types. Similarly, a unit expression can be removed from the elements of a matrix of numbers.

When 'unit' converts a value from one unit type to another unit type with the same measure expression, it must calculate a combined conversion factor. The code in the Zed compiler that is responsible for this calculation works hard to maintain accuracy. Calculations are done using integral operations where possible, including cancellation of GCDs (Greatest Common Divisors) to avoid overflows. This applies to both integral multiplication factors and scale factors.

If the final conversion is purely integral, then it will require either a single multiplication or a single division. If division is required (e.g. converting from inches to feet), then the division is a direct truncating one - i.e. no attempt is made to do a rounding division. Note that when doing integral multiplication it is possible to overflow, and that would be signalled at run time unless the value expression to be converted is known at compile time. If the final conversion factor is floating point, or the original value is floating point, then a floating point multiplication will be done.

If the computation of the final conversion factor overflows the 64 bit unsigned integer calculations used, the compiler will switch to floating point calculations for the remainder of the computation of that single conversion factor, and that can result in a small loss of accuracy. [One alternative would be to switch to arbitrary-precision arithmetic in the compiler, but this has not been done - in real situations loss of accuracy is very unlikely.]
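The conversion steps above can be sketched in Python as follows. Each unit is given as an exact ratio to its base unit, with any scale prefix already folded in (an assumption of the sketch, which does not keep scale factors separate). Common factors are cancelled with a GCD before multiplying, and the code falls back to floating point if a term would exceed 64 bits. Applying the factor to an unsigned value then needs a single multiplication, a single truncating division, or a floating point multiply. Python integers do not overflow, so the run-time overflow check that Zed performs on the multiplication is not modelled. All names are hypothetical.

```python
from fractions import Fraction
from math import gcd
from typing import Tuple, Union

U64_MAX = 2 ** 64 - 1
Ratio = Tuple[int, int]  # (numerator, denominator) in lowest terms


def ratio(x) -> Ratio:
    """Exact ratio of a unit to its base unit, e.g. ratio("0.0254") for inches."""
    f = Fraction(x)
    return f.numerator, f.denominator


def conversion_factor(src: Ratio, dst: Ratio) -> Union[Ratio, float]:
    """Factor converting a value in unit src into unit dst.

    The factor is (sn/sd) / (dn/dd) = (sn*dd) / (sd*dn); common factors are
    cancelled with gcd before multiplying, to keep the products small.
    """
    sn, sd = src
    dn, dd = dst
    g1, g2 = gcd(sn, dn), gcd(dd, sd)
    num = (sn // g1) * (dd // g2)
    den = (sd // g2) * (dn // g1)
    if num > U64_MAX or den > U64_MAX:
        # Fall back to floating point, possibly losing a little accuracy.
        return (sn / sd) / (dn / dd)
    return num, den


def convert_uint(value: int, factor: Union[Ratio, float]):
    """Apply a conversion factor to an unsigned value."""
    if isinstance(factor, float):
        return value * factor
    num, den = factor
    if den == 1:
        return value * num       # a single multiplication
    if num == 1:
        return value // den      # a single truncating division
    return value * (num / den)   # non-integral factor: floating point
```

With an inch defined as 0.0254 metre and a foot as 0.3048 metre, the factor from inches to feet cancels to exactly 1/12, so 37 inches converts to 3 feet by truncating division.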

12.6 Units in Expressions

Unit types can be the results of 'if' and 'case' expressions. Constants "0" or "0.0" with no associated unit expression can be used as values in alternatives.

Unit types can be used with the following constructs: 'toUint', 'fromUint', 'flt', 'round' and 'trunc'. The associated unit type will be passed through, on to the result type.

The following operators accept unit type arguments:

unary '-', '+' - the associated unit type is passed through
binary '*' - either or both arguments can have a unit type. The associated unit expressions are multiplied for the result type.
binary '/' - either or both arguments can have a unit type. The associated unit expression for the numerator is divided by the associated unit expression for the denominator to yield the associated unit expression for the result type.
binary '%' - the two arguments must match, and the result will be a plain type. [Think of '%' as being a series of subtracts, yielding what is left, rather than as a divide.]
binary '+', '-' - argument types must match
comparison operators - arguments must match or one must be constant "0" or "0.0" with no associated unit type
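As a rough analogue of the rules above, here is a Python sketch of a number carrying a unit expression. The unit expression is represented as a map from unit name to power, which is an assumption made for illustration, and the class "Q" is hypothetical. Zed performs these checks at compile time, whereas this sketch can only check them at run time. Python's '/' also always produces a float, unlike Zed's integral division.

```python
def _combine(a, b, sign):
    # Add (sign=+1) or subtract (sign=-1) unit exponents, dropping zero powers.
    out = dict(a)
    for unit, power in b.items():
        out[unit] = out.get(unit, 0) + sign * power
        if out[unit] == 0:
            del out[unit]
    return out

class Q:
    """A number with a unit expression, e.g. Q(5, {"m": 1, "s": -1}) is 5 m/s."""

    def __init__(self, value, units=None):
        self.value = value
        self.units = dict(units or {})

    @staticmethod
    def _parts(x):
        # A plain number behaves as if it had no unit expression.
        return (x.value, x.units) if isinstance(x, Q) else (x, {})

    def _matching(self, other, op):
        v, u = Q._parts(other)
        if u != self.units:
            raise TypeError(f"{op!r} needs matching units: {self.units} vs {u}")
        return v

    def __mul__(self, other):                 # unit expressions multiply
        v, u = Q._parts(other)
        return Q(self.value * v, _combine(self.units, u, +1))

    __rmul__ = __mul__

    def __truediv__(self, other):             # numerator units / denominator units
        v, u = Q._parts(other)
        return Q(self.value / v, _combine(self.units, u, -1))

    def __add__(self, other):                 # units must match
        return Q(self.value + self._matching(other, "+"), self.units)

    def __sub__(self, other):
        return Q(self.value - self._matching(other, "-"), self.units)

    def __mod__(self, other):                 # units must match; plain result
        return self.value % self._matching(other, "%")

    def __neg__(self):                        # unit passed through
        return Q(-self.value, self.units)

    def __pos__(self):
        return Q(+self.value, self.units)

    def __lt__(self, other):
        # A unitless constant 0 may be compared with any unit type.
        if not isinstance(other, Q) and other == 0:
            return self.value < 0
        return self.value < self._matching(other, "<")
```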

The presence of a unit type does not affect any constant folding that is done. Some examples using unit types:

    uint(s) MIN_TIME = 1(s);
    ...
    uint(m) distance := 13456(m);
    uint(s) time := 12(s);
    ...
    uint(m : s) speed := distance / time;
    speed := distance / MIN_TIME;
    speed := distance / 2(s);
    uint(m : s*s) accel := speed / time;
    ...
    float(g) weight := 10.(g);
    Fmt("Rate of change is ", weight / flt(time));
    float(g : s) negChange := - weight / 1.347(s);

Long form declarations, with explicit types, are used here to show how it is done. Most programmers would use 'con', 'var' or 'def' declarations instead.

As described previously, the "Fmt" package understands units and will by default include parenthesized unit descriptions after numeric output of values whose type includes a unit expression.

12.7 Units in Proc Calls

Nothing special is done with unit types involved in proc calls. For example:

    proc
    useSeconds(float(s) x)float(s):
        x * 3.75
    corp;

Proc "useSeconds" accepts a floating point value with attached "seconds" unit type and returns a value of the same type. This is fine, but what about procs which perform a mathematical operation for which there is a logical change to the unit expressions involved? For example, a proc which returns its single parameter multiplied by itself has a logical behaviour with respect to unit types, but there is no way in Zed to declare that behaviour. Similarly, library proc "Sqrt" has a logical behaviour with respect to the unit type of its argument. It, however, also has a restriction - its logical behaviour requires that all factors in the argument's unit type have even powers. That would allow, for example:

    float(m*m) area := ...;
    float(m) length := Math/Sqrt(area);

What should not be allowed with such a proc includes:

    float(g) weight := ...;
    con something := Math/Sqrt(weight);

The unit expression attached to "weight" has an odd power, so it cannot be "halved" as a square root proc would need to do. Zed allows the programmer to write specific procs for specific situations. For example:

    proc
    secSqrt(float(s*s) x)float(s):
        unit(Math/Sqrt(unit *, x), s)
    corp;

but there is no way to write such procs in a general purpose fashion.
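For comparison, in a language where units are ordinary run-time data the general rule is easy to state. The following Python sketch halves every power in the unit expression and rejects any odd power. The function "unit_sqrt" and the map-of-powers representation are hypothetical. Such a check happens only at run time, which is exactly the compile-time guarantee that Zed's proc types cannot express.

```python
import math

def unit_sqrt(value: float, units: dict[str, int]) -> tuple[float, dict[str, int]]:
    """Square root that also halves every power in the unit expression.
    Units with an odd power (e.g. plain grams) are rejected."""
    odd = [u for u, p in units.items() if p % 2 != 0]
    if odd:
        raise ValueError(f"cannot halve unit(s) {odd}: odd power")
    return math.sqrt(value), {u: p // 2 for u, p in units.items()}
```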

A syntax could be invented, and made part of 'proc' types, which would allow proc calls to modify unit types in a more general way. This has not been done in Zed, however.

There are several reasons why the above capability has not been implemented:

it is of fairly limited use
in most cases it is not necessary
it probably wouldn't be sufficient for all possible uses

[I know almost nothing about code involved in actual physical calculations. Thus, I do not know how (or even if!) unit types can be used in such code. My goal was simply to provide the facility for situations in which it can be of use in helping to write correct code. There are other software tools available, in a variety of programming languages, that deal with physical and other units. The most comprehensive I have seen is the "units" facility available on Linux (and likely other) systems. For those with access, file "/usr/share/units/definitions.units" can be an interesting read. Someone might wish to take on the job of turning that into Zed sources to do similar things, though not all of it will be possible, because of things like Zed's requirement that unit multipliers be > 1.]

13 "#" Operators

13.1 Introduction

Unless stated otherwise, rules which affect non-"#" operators, calls, etc., also apply to their "#" counterparts. This includes operand/parameter evaluation order and operator precedence.

The "#" [I pronounce it "hash"] operators in Zed allow the programmer to define the meaning of various operators for their types. This can only be done for named types, e.g. for 'capsule' types, 'struct' types and types explicitly named via "type Name_t = ...". [Other languages support similar concepts - e.g. C++ allows the definition of operator functions for classes.]

In Zed, the operators whose behavior can be defined in this way all start with the "#" character. [This makes them ugly. It also makes them obvious, so that readers know that they need to look elsewhere to understand what is going on.] The meaning of the "#" operators is determined by programmer-provided procs which are associated with the type involved. For binary operators, such as '#/', it is usually the type of the left hand operand which chooses the operation to perform. [This differs from C++, where overloading is allowed and the compiler must examine all operands against all variants of the operator in order to choose which one to use. Zed can get something similar to operator overloading by allowing the programmer to specify "#" operator procs that run at compile time, and can thus choose what to allow for some operands, and how to deal with them.]

[Defining operators for your own types can make your code less bulky, and, if used with care, more readable. However, in many cases doing this will actually make your code less readable, since the reader must always be examining the definition of the operators to be aware of whether something special can be going on in any specific situation. Also, it is more difficult to define your own "#" operators than it is to use those defined in a well supported library. As a rough rule of thumb, I suggest that you not introduce a "#" operator unless it will be used at least half a dozen times.]

The "#" operators have official names which are given by strings exported from package Exec. Those names, their values and the operator involved can be seen from the definitions of the names:

    export string
    /* unary operators */
    HASH_NEG = "hashNeg",                               // #-
    HASH_PLUS = "hashPlus",                             // #+
    HASH_TILDA = "hashTilda",                           // #~
    HASH_AT = "hashAt",                                 // #@
    HASH_AMPERSAND = "hashAmpersand",                   // #&
    HASH_POST_AT = "hashPostAt",                        // postfix #@

    /* binary operators */
    HASH_B_AND = "hashBAnd",                            // #&
    HASH_B_XOR = "hashBXor",                            // #><
    HASH_B_SHL = "hashBShl",                            // #<<
    HASH_B_SHR = "hashBShr",                            // #>>
    HASH_B_IOR = "hashBIor",                            // #|
    HASH_RELATE = "hashRelate",                         // #<>
    HASH_POW = "hashPow",                               // #^
    HASH_MUL = "hashMul",                               // #*
    HASH_DIV = "hashDiv",                               // #/
    HASH_REM = "hashRem",                               // #%
    HASH_ADD = "hashAdd",                               // #+
    HASH_SUB = "hashSub",                               // #-
    HASH_LESS_THAN = "hashLessThan",                    // #<
    HASH_LESS_OR_EQUAL = "hashLessOrEqual",             // #<=
    HASH_EQUAL = "hashEqual",                           // #=
    HASH_NOT_EQUAL = "hashNotEqual",                    // #~=
    HASH_GREATER_OR_EQUAL = "hashGreaterOrEqual",       // #>=
    HASH_GREATER_THAN = "hashGreaterThan",              // #>
    HASH_STR_EQ_EQ = "hashStrEqEq",                     // #==
    HASH_STR_NOT_EQ_EQ = "hashStrNotEqEq",              // #~==

    /* special "operators" */
    HASH_ASSIGN = "hashAssign",                         // #:=
    HASH_ARROW = "hashArrow",                           // #->
    HASH_ARROW_ASSIGN = "hashArrowAssign",              // A #-> B #:= C
    HASH_DOT = "hashDot",                               // #.
    HASH_DOT_ASSIGN = "hashDotAssign",                  // A #. B #:= C
    HASH_INDEXING = "hashIndexing",                     // A #[ B ... C #]
    HASH_CALL = "hashCall",                             // A #( B ... C #)
    HASH_PARENS = "hashParens",                         // #( A #)
    HASH_BRACES = "HashBraces";                         // #{ ... #}

The precedence of these operators is the same as that of the corresponding non-"#" ones. The "arrowAssign" and "dotAssign" "operators" are compound forms, which correspond to assignment statements with "dot" or "arrow" selection in the assignment destination. The "indexing" "operator" allows programmers to define something that is syntactically like array/matrix indexing. "Indexing" can appear as the left-hand side of '#:=', in which case it is the "destination" for that "assignment". The "call" operation allows the definition of something that is syntactically similar to a proc call. The "braces" operation allows programmers to have an explicit list of any number of expressions which together create some final value. Examples of these operators are given later.

Zed supports both tracked and direct multivalued types with "#" operators. ["Tracked" types include 'record', 'capsule', 'interface' and matrix types. "Multivalued" types are 'struct' and 'array' types.] In the examples below, assume the following type declarations:

    record Tracked_t {
        <fields of Tracked_t>
    };

    struct Multivalued_t {
        <fields of Multivalued_t>
    };

When dealing with tracked types and "#" operator procs for them, the 'nonNil' attribute is treated as usual by the Zed compiler. For example, if a '#+' operator proc is defined to take 'nonNil' tracked parameters and to return a 'nonNil' result, then the compiler will complain if an operand to the '#+' operator is not 'nonNil' or if the proc does not return a 'nonNil' result.

Some examples of "#" operators in use:

    Tracked_t tr1 := ..., tr2 := ..., tr3 := ...;
    Tracked_t tr4 := (tr1 #+ tr2) #* tr3;
    tr4 #:= tr3;

    Multivalued_t mv1, mv2, mv3;
    initMv(@mv1);
    initMv(@mv2);
    if mv1 #< mv2 then
        mv3 #:= mv1 #* #-mv2;
    else
        mv3 #:= #-mv1 #* mv2;
    fi;
    mv3#.noSuchUintField := mv2#.noSuchUintField * 7;
    float f := mv3#@;

    Database_t db := dbOpen(...);    // Note: *not* a Zed 'DbType'
    db#["key1"#]#.value := 17.6;
    db#["key2", getKey3() + 1#].found := true;

    float fl := ...;
    MyFloatList_t mfl := #{fl / 2.0, 17.5, Math/Sin(fl) + 13.0#};

More examples are given later.

The "#" operator facility in Zed can make use of procs that run at compile time - that is discussed in the relevant places in the following sections. Readers who are not yet familiar with this rather complex aspect of Zed can safely ignore those parts of the descriptions. Note, however, that compile-time execution is needed in order to implement '#[' ("indexing") and for '#{' ("braces"), as well as for binary "#" operators that deal with mixed types.

In the following descriptions, T will refer to the type from which the definitions of the "#" operators are retrieved. The "#" operators are defined for a named type T by calling "Types/ExportProcAdd", providing T, the intended operator, and the proc to use for that operator. "Types/ExportProcAdd" is a 'ctProc' proc, meaning that the call to it will happen during compilation, rather than during any actual code execution. After such a call in a package, the "#" operator that it added will be available for use. See the later examples.

Unary operators have only one operand, so T is taken from that operand. If the operand is of a named type, then that operand type is used as T. If the single operand type is '@' of a named type, then that named type is used as T. Otherwise, T cannot be determined and the Zed compiler complains. "#" "indexing" must find its type from the expression before the '#['. "#" "parens" works like unary operators.

With binary operators, T can come from either operand, but there are several restrictions. If the type can be obtained from the left-hand operand, then it is obtained from there, regardless of the right-hand operand. If the left-hand operand is a 'uint', 'sint', 'float', 'string', 'char' or 'bool', then the right-hand operand must yield T, and the proc to handle the binary operation must be a compile-time one, so that it can do something appropriate with the various possible left-hand operand types. With "#" assignment and initialization, T is always the type of the destination.
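The lookup rules can be modelled in Python as a registry keyed by type and operator name. This is only a sketch. "export_proc_add" is a hypothetical stand-in for "Types/ExportProcAdd", and "Money" is a hypothetical named type. The "compile_time" flag marks procs that Zed would run at compile time; Python has no such phase, so here the flag only enforces the rule. T comes from the left operand when it is not a primitive. Otherwise the right operand must supply T, and the registered proc must be a compile-time one.

```python
HASH_ADD = "hashAdd"    # operator name strings as exported by package Exec
HASH_MUL = "hashMul"

# Stand-ins for Zed's uint, sint, float, string, char and bool.
PRIMITIVES = (int, float, str, bool)

_registry = {}          # (type, operator name) -> (proc, is_compile_time)

def export_proc_add(t, op, proc, compile_time=False):
    # Hypothetical analogue of "Types/ExportProcAdd".
    _registry[(t, op)] = (proc, compile_time)

def hash_binary(op, left, right):
    if not isinstance(left, PRIMITIVES):
        t, needs_ct = type(left), False   # T from the left operand, whatever the right is
    else:
        t, needs_ct = type(right), True   # primitive left: the right operand must yield T
    entry = _registry.get((t, op))
    if entry is None:
        raise TypeError(f"no {op} for {t.__name__}")
    proc, is_ct = entry
    if needs_ct and not is_ct:
        raise TypeError(f"{op} with a primitive left operand needs a compile-time proc")
    return proc(left, right)

class Money:                              # hypothetical named type T
    def __init__(self, cents):
        self.cents = cents

def money_add(a, b):
    return Money(a.cents + b.cents)

def money_scale(a, b):
    # Flagged "compile-time", so it must cope with a number on either side.
    n, m = (a, b) if isinstance(b, Money) else (b, a)
    return Money(m.cents * n)

export_proc_add(Money, HASH_ADD, money_add)
export_proc_add(Money, HASH_MUL, money_scale, compile_time=True)
```

With these registrations, "3 #* price" is accepted because the '#*' proc is flagged compile-time, but "1 #+ price" is rejected because the '#+' proc is not.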

Zed does not allow procs to take direct multivalued parameters or to return multivalued results. So, procs for "#" operators using a multivalued type T must use parameters which are '@' of T. Similarly, the result for a non-comparison binary operator proc is given by an initial such '@' parameter, which must be writeable. The Zed compiler will, as needed, create hidden variables of type T to hold intermediate values.
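The calling convention can be imitated in Python, where passing a mutable object plays the role of a writeable '@' parameter. In this sketch "Pair", "pair_add", "pair_mul" and "lowered" are hypothetical. The last function shows roughly what an expression like "(p1 #+ p2) #* p3" turns into once the compiler supplies hidden temporaries for the intermediate and final results.

```python
from dataclasses import dataclass

@dataclass
class Pair:                      # hypothetical multivalued (struct-like) type T
    a: int = 0
    b: int = 0

def pair_add(dst: Pair, left: Pair, right: Pair) -> None:
    # Non-comparison binary proc: the result is written through the first parameter.
    dst.a = left.a + right.a
    dst.b = left.b + right.b

def pair_mul(dst: Pair, left: Pair, right: Pair) -> None:
    dst.a = left.a * right.a
    dst.b = left.b * right.b

def lowered(p1: Pair, p2: Pair, p3: Pair) -> Pair:
    # Roughly "(p1 #+ p2) #* p3", with compiler-style hidden temporaries.
    tmp1 = Pair()                # hidden variable for p1 #+ p2
    pair_add(tmp1, p1, p2)
    tmp2 = Pair()                # hidden variable for the final result
    pair_mul(tmp2, tmp1, p3)
    return tmp2
```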

The procs added to a type can be ones with fairly standard proc signatures, and the Zed compiler will insert calls to them into code which uses them, as the direct replacement for the "#" operator. However, the procs can be specified such that the Zed compiler will run them at compile time. In that case, they will run when the Zed compiler is compiling the use of the "#" operator involved. The procs must then return an "Exec/Exec_t" which provides whatever code is needed to do the "#" operation. That code can contain proc calls, loops, expressions, etc. See section "18 Compile Time Execution" for much more on compile time execution in Zed.

Operators governed by compile-time proc execution do not need to yield any specific result type, and can yield different result types for different parameters, or even on different calls with the same parameters. [This ability should not be abused.] All parameters passed to compile-time operator procs are 'nonNil'. Even though such procs are called at compile time by the Zed compiler, they must not be marked 'ctProc' or 'cTime'.

When compile-time procs are called, a sequence will have been created, so that the procs can generate multiple statements if needed. "Exec/SequenceAppend" can be used to add declarations and other statements to that created sequence. The Zed compiler will append the "Exec/Exec_t" returned from the compile-time proc to the sequence, and then end the new sequence, using it as the final replacement for the entire operation. If it turns out that no sequence was needed, the compiler's internal mechanisms may eliminate it.
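Here is a loose Python model of this expansion step, using tuples in place of "Exec/Exec_t" nodes. "Sequence.append" stands in for "Exec/SequenceAppend". The procs "square_ct" and "negate_ct" are hypothetical. The compiler creates the sequence, lets the proc append to it, appends the returned node, and drops the sequence when it holds only that node.

```python
class Sequence:
    """Statements collected while expanding one '#' operation."""
    def __init__(self):
        self.items = []

    def append(self, node):               # plays the role of "Exec/SequenceAppend"
        self.items.append(node)

def expand(ct_proc, *operands):
    seq = Sequence()                      # created before the proc is called
    final = ct_proc(seq, *operands)       # the proc may append extra statements
    seq.append(final)                     # the returned node goes on the end
    # A sequence holding only the final node is not needed, so drop it.
    return seq.items[0] if len(seq.items) == 1 else seq.items

def square_ct(seq, operand):
    # Hypothetical unary operator: declare a temporary so the operand
    # is evaluated only once, then multiply it by itself.
    seq.append(("declare", "tmp", operand))
    return ("mul", "tmp", "tmp")

def negate_ct(seq, operand):
    return ("neg", operand)               # needs no extra statements
```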

[If you have compilation errors in your compile-time procs, the Zed compiler cannot run them, and so it is quite likely that you will see many follow-on errors. Get your compile-time procs compiling and running before you worry about fixing errors relating to their use.]

Non compile-time operator procs usually need to be 'export'-ed, so that they can be properly referenced by any code that uses them. Compile-time operator procs do not need to be exported - they are made available directly on the type they apply to, and the Zed compiler finds them there.

[The Zed compiler itself cannot normally do any optimizations, such as constant folding, on sequences of "#" operations. However, compile-time procs for these operations might be able to, by inspecting the nature of their operands and modifying the resulting "Exec/Exec_t"s based on special cases. For example, a compile-time complex number addition proc can observe that one operand is of type 'float', and so generate faster code for the operation than if both operands are complex numbers.]

It can be useful with multivalued types to add the "Exec/AUTO_AT" flag to the type, to ask the compiler to automatically insert a needed '@' on proc calls. That is done with a package-level 'eval', like this:

    eval Types/ExportBoolAdd(Multivalued_t, Exec/AUTO_AT);

With that setting in effect, a "Multivalued_t" can be passed to a proc which takes a parameter of type "@ Multivalued_t" without an explicit '@'. This is most useful when passing the result of "#" binary operators, where the result is '@' of a variable defined by the Zed compiler. It is recommended that programmers not overuse this facility, however, as it can lead to confusion on the part of a reader.

[The meaning of "#" operators does not need to relate to the meaning of the corresponding non-"#" operators, but readability of code using them is improved if their behaviour has some such relation.]

13.2 "#" Unary Operators

The "#" unary operators are prefix '#-', '#+', '#~', '#@' and '#&', and postfix '#@'. When a compile-time unary operator proc is passed an "Exec/Exec_t" for a multivalued T, that Exec_t will be '@' of such a value. See discussion under "13.4 "#" Assignment" for additional information.

Non compile-time procs for unary operators for a tracked type T

have a single parameter of type T
return a result of type T

Non compile-time procs for unary operators for a multivalued type T

have a first parameter which is a writeable '@' of type T
have a second parameter which is an '@' of type T
have no result

Compile-time procs for unary operators

must be forced to have type "Exec/CompileTimeUnary_t"
have a first parameter of type "Package/PContext_t"
have a second parameter of type "Exec/Exec_t"
return a 'nonNil' result of type "Exec/Exec_t"

Examples of unary "#" operator procs:

    proc
    trackedNeg(Tracked_t tr)Tracked_t:
        <construct and return the '#-' of "tr">
    corp;

    proc
    multivaluedPlus(@ Multivalued_t aDst; @ ro Multivalued_t aSrc)void:
        <construct the '#+' of "aSrc@", and store it via "aDst">
    corp;

    Exec/CompileTimeUnary_t: proc
    trackedPostAt(Package/PContext_t nonNil pctx;
                  Exec/Exec_t nonNil ex)nonNil Exec/Exec_t:
        <construct Exec/Exec_t structures to implement postfix
         '#@' on whatever "ex" is. Return the top level node>
    corp;

    eval Types/ExportProcAdd(Tracked_t, Exec/HASH_NEG, trackedNeg);
    eval Types/ExportProcAdd(Multivalued_t, Exec/HASH_PLUS, multivaluedPlus);
    eval Types/ExportProcAdd(Tracked_t, Exec/HASH_POST_AT, trackedPostAt);

13.3 "#" Binary Operators

The "#" binary operators are '#&', '#><', '#<<', '#>>', '#|', '#<>', '#^', '#*', '#/', '#%', '#+' and '#-', along with the comparison operators '#<', '#<=', '#=', '#~=', '#>=', '#>', '#==' and '#~=='.

There are three categories of "#" binary operators - the "relate" operator, those which are used as comparison operators, and those which yield a value of type T. Procs for "relate" must return a 'sint', and comparison operators must return a 'bool' which gives the result of the comparison. Other non-compile-time operator procs return values of type T. Non-relate, non-comparison operators defined by compile-time procs are allowed to yield any type. When "Exec/Exec_t" parameters are passed to compile-time procs, values representing multivalued T values will be '@' of such values. See discussion under "13.4 "#" Assignment" for additional information.

Non compile-time procs used for "relate" of a tracked type T

have a first parameter of type T
have a second parameter of type T
return a 'sint' value

Non compile-time procs used for comparison of a tracked type T

have a first parameter of type T
have a second parameter of type T
return a 'bool' value

Non compile-time procs used for other binary operations of a tracked type T

have a first parameter of type T
have a second parameter of type T
return a value of type T

Non compile-time procs used for "relate" of a multivalued type T

have a first parameter which is '@' of type T
have a second parameter which is '@' of type T
return a 'sint' value

Non compile-time procs used for comparison of a multivalued type T

have a first parameter which is '@' of type T
have a second parameter which is '@' of type T
return a 'bool' value

Non compile-time procs used for other binary operations of a multivalued type T

have a first parameter which is a writeable '@' of type T
have a second parameter which is '@' of type T
have a third parameter which is '@' of type T
have no result

Compile-time procs used for "relate" with either tracked or multivalued types

must be forced to have type "Exec/CompileTimeBinary_t"
have a first parameter of type "Package/PContext_t"
have a second parameter of type "Exec/Exec_t"
have a third parameter of type "Exec/Exec_t"
return a 'nonNil' result of type "Exec/Exec_t" which yields a result of type 'sint'

Compile-time procs used for comparison with either tracked or multivalued types

must be forced to have type "Exec/CompileTimeBinary_t"
have a first parameter of type "Package/PContext_t"
have a second parameter of type "Exec/Exec_t"
have a third parameter of type "Exec/Exec_t"
return a 'nonNil' result of type "Exec/Exec_t" which yields a result of type 'bool'

Compile-time procs used for other binary operations with either tracked or multivalued types

must be forced to have type "Exec/CompileTimeBinary_t"
have a first parameter of type "Package/PContext_t"
have a second parameter of type "Exec/Exec_t"
have a third parameter of type "Exec/Exec_t"
return a 'nonNil' result of type "Exec/Exec_t"

Examples of binary "#" operator procs:

    proc
    trackedMul(Tracked_t nonNil left, right)nonNil Tracked_t:
        <Compute the '#*' of "left" and "right", and create and return a
         "Tracked_t" containing that result>
    corp;

    proc
    trackedLessThan(Tracked_t left, right)bool:
        <return 'true' if "left" '#<' "right">
    corp;

    proc
    multivaluedDiv(@ Multivalued_t aRes; @ ro Multivalued_t aLeft, aRight)void:
        <compute result of '#/' and store it via "aRes">
    corp;

    proc
    multivaluedEqual(@ ro Multivalued_t aLeft, aRight)bool:
        <return 'true' if "aLeft@" '#=' "aRight@">
    corp;

    Exec/CompileTimeBinary_t: proc
    multivaluedGreaterThan(Package/PContext_t nonNil pctx;
                           Exec/Exec_t nonNil left, right)nonNil Exec/Exec_t:
        <create Exec/Exec_t structures which compute the '#>' of whatever
         "left" and "right" are. We know that "left" is of type
         "Multivalued_t", but "right" can be anything. Return the top
         Exec/Exec node of those structures>
    corp;

    eval Types/ExportProcAdd(Tracked_t, Exec/HASH_MUL, trackedMul);
    eval Types/ExportProcAdd(Tracked_t, Exec/HASH_LESS_THAN, trackedLessThan);
    eval Types/ExportProcAdd(Multivalued_t, Exec/HASH_DIV, multivaluedDiv);
    eval Types/ExportProcAdd(Multivalued_t, Exec/HASH_EQUAL, multivaluedEqual);
    eval Types/ExportProcAdd(Multivalued_t, Exec/HASH_GREATER_THAN,
                             multivaluedGreaterThan);

As with non-"#" binary operators, Zed does not specify the order of evaluation of the operands of "#" binary operators. If a particular order matters, the standard solution applies: evaluate the operands into local variables in the required order, and then use those local variables as the operands of the "#" binary operator.
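Python itself always evaluates operands left to right, so this sketch simulates a language that does not. The hypothetical "eval_binary" may evaluate either operand first. Operands with side effects then run in an unpredictable order. Evaluating them into locals first, as "with_locals" does, fixes the order no matter what the operator does.

```python
order = []

def read(name, value):
    # An operand with a visible side effect: it records when it runs.
    order.append(name)
    return value

def eval_binary(op, left_thunk, right_thunk, right_first):
    # Models a language that leaves operand evaluation order unspecified.
    if right_first:
        r = right_thunk()
        l = left_thunk()
    else:
        l = left_thunk()
        r = right_thunk()
    return op(l, r)

def with_locals(op, right_first):
    # The fix: evaluate operands into locals first, in the order you need.
    a = read("a", 7)
    b = read("b", 2)
    return eval_binary(op, lambda: a, lambda: b, right_first)
```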

13.4 "#" Assignment

Assignment operators ('#:=') can be used when assigning to already existing variables or when initializing new variables. Unlike variables initialized with the regular ':=' assignment, variables initialized with '#:=' cannot be marked as 'ro' or 'con'. [This is unfortunate, but is required because the variables must be assigned to by the normal code within the '#:=' proc, and that code can be such that it is not possible to know, in all cases, that it is actually initializing the variable before it can be used.] Similarly, such variables cannot be or contain any 'nonNil' items. [This is very unfortunate. It greatly reduces the usefulness of '#:=' initialization, restricting it mostly to struct types which do not contain any tracked elements. Tracked elements can be used, but the need for not being 'nonNil' is "infectious", in that it spreads from the initial requirement.]

When used in code, the '#:=' operation allows an '@' value as the destination - this works because the '@' is needed internally. However, this is not allowed during a '#:=' variable initialization, since such an '@' variable must be directly initialized before any '#:=' proc can be called.

Non compile-time assignment procs for a tracked type T

have a first parameter which is a writeable '@' of type T
have a second parameter which is of type T
have no result

Having the first parameter be '@' of the destination allows things like using common storage for equivalent values, rather than having multiple copies of those (perhaps large) values.
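As an illustration of the storage-sharing idea, this Python sketch passes the destination itself to the assignment proc, modelled by a hypothetical "Slot" object. The proc can then store one canonical shared instance per distinct value instead of a separate copy per variable. "BigValue", "Slot" and "big_assign" are all hypothetical, and the pool is a plain dictionary keyed by the value's contents.

```python
_pool = {}                      # contents -> the one shared instance

class BigValue:                 # hypothetical large tracked value
    def __init__(self, data: tuple):
        self.data = data

class Slot:                     # stands in for the variable whose '@' is passed
    def __init__(self):
        self.ref = None

def big_assign(dst: Slot, src: BigValue) -> None:
    # Because the proc receives the destination itself, it can store a
    # shared canonical copy instead of keeping one copy per variable.
    dst.ref = _pool.setdefault(src.data, src)
```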

Non compile-time assignment procs for a multivalued type T

have a first parameter which is a writeable '@' of type T
have a second parameter which is an '@' of type T
have no result

Compile-time assignment procs, which have the same signature as compile-time binary operator procs

must be forced to have type "Exec/CompileTimeBinary_t"
have a first parameter of type "Package/PContext_t"
have a second parameter of type "Exec/Exec_t"
have a third parameter of type "Exec/Exec_t"
return a 'nonNil' result of type "Exec/Exec_t"

Examples of "#" assignment procs:

    proc
    trackedAssign(@ Tracked_t aDst; Tracked_t src)void:
        log("Assigning ", src, " to ", aDst@);
        aDst@ := src;
    corp;

    proc
    multivaluedAssign(@ Multivalued_t aDst;
                      @ ro Multivalued_t aSrc)void:
        <code to assign fields from "aSrc@" into "aDst@">
    corp;

    Exec/CompileTimeBinary_t: proc
    variantAssign(Package/PContext_t nonNil pctx;
                  Exec/Exec_t nonNil dst, src)nonNil Exec/Exec_t:
        <create code to assign something to "dst", based on "src".
         Do any desired checking on the type of "src">
        <yield code sequence, or, if templates are used internally, yield
         Exec/NothingNew()>
    corp;

    eval Types/ExportProcAdd(Tracked_t, Exec/HASH_ASSIGN, trackedAssign);
    eval Types/ExportProcAdd(Multivalued_t, Exec/HASH_ASSIGN, multivaluedAssign);
    eval Types/ExportProcAdd(Variant_t, Exec/HASH_ASSIGN, variantAssign);

As usual, Zed does not specify an order of evaluation between an expression which specifies the destination and the expression which specifies the value to be assigned.

For a compile-time proc, if the source value is of type 'T', the Zed compiler makes sure that the passed source Exec_t is '@' of the actual T value. This is done here, as well as with compile-time unary and binary operators, for consistency. It also allows the compile-time code to examine the storage flags on the '@' types given to it. This consistency of presentation of parameters for compile-time procs for multivalued types is not done for "#" selection, indexing, parentheses, calls or braces - they are passed whatever the operation is given.

If template code is used as the main part of a proc used for compile-time handling with multivalued types, it can be a nuisance to produce a template value for such operands, since template type checking must match exactly, which includes storage flags. It is suggested that programmers use "Exec/AtDerefNew" on the passed Exec_t's, in order to produce an Exec_t which will be of their type T. That value can then be 'assign'ed as a template value, and used in template code to implement the operation.

Typically, the body of these procs will be a scope, and you cannot yield '@' of a local from its containing scope, so the yielded value will be directly of the compound type. Chaining of unary (and binary) "#" operators works because the Zed compiler will internally add any needed '@'.

For example:

    struct Str_t {
        uint str_a, str_b;
    };

    ...

    Exec/CompileTimeBinary_t: proc
    strAdd(Package/PContext_t nonNil pctx;
           Exec/Exec_t nonNil left, right)nonNil Exec/Exec_t:
        assert assign template ro volatile Str_t tLeft :=
            Exec/AtDerefNew(pctx, left);
        assert assign template ro volatile Str_t tRight :=
            Exec/AtDerefNew(pctx, right);
        con varEx := Exec/DeclareHiddenVariable(pctx, Str_t);
        assert assign template Str_t str := varEx;
        template begin
            str.str_a := tLeft.str_a + tRight.str_a;
            str.str_b := tLeft.str_b + tRight.str_b;
        end;
        template(str)
    corp;

    eval Types/ExportProcAdd(Str_t, Exec/HASH_ADD, strAdd);

The use of "AtDerefNew" does not cause the rules regarding the storage flags on '@' types to be ignored - the compiler checks the overall correctness of the code that results from template expansion, etc.

There are a number of non-obvious items with the above code.

why do the initial template types have 'ro' 'volatile'? This is because that is a "universal receiver". Without the 'ro', the asserts would fail if the Exec_t represented an 'ro' or 'con' value. Without the 'volatile' they would fail if the Exec_t included 'volatile'.
why does the code use "DeclareHiddenVariable" and a template 'assert' instead of just declaring "str" inside the template block? Recall that the Zed compiler creates a sequence for the expansion of compile-time operator procs, not a scope. Such a declaration inside the template block would be moved out to the sequence associated with the containing scope, and thus would be visible in a "pretty-print" of code which uses the operator.
why does the proc end in template(str) instead of just having "str" as the last element of the template block, with no semicolon after the 'end'? A template block always yields 'void' and the proc is supposed to yield an Exec_t of type "Str_t". It also works to use a template expression (which must contain an explicit block because of the multiple statements required) yielding "@str". That's more obscure.
this proc is tricky for such a simple operation. Is it fully correct? No, it is not. Note that the template block uses both "tLeft" and "tRight" twice. They are "Exec_t"'s. They could represent something that should not be evaluated more than once (e.g. a proc call with side effects). To be fully correct, this proc should make hidden variables which are '@' of the left and right operands, then reference the fields through those. The effect here is much like what happens if a C macro parameter is used more than once in the macro, and the macro is passed something with a side effect. It is possible to examine the Exec_t's and avoid the extra variables when they are not needed, but care must be taken.

The above considerations are some of the reasons why compile-time operator procs should be considered only if there is truly a need for them. They should then be heavily tested.

13.5 "#" Selection and Selection-Assignment

The selection operators, '#.' and '#->', perform operations that appear similar to that of selecting a field from a 'struct', 'record' or 'capsule' using the '.' and '->' operators. As with those operators, the field to be selected is given by a simple field name. The name is known at compile time, and is given as a normal Zed name. As a slight extension, if a name to be used does not meet the rules for Zed names, it can be given as a string literal. [Note that the system does not remember in which form a name was given, so on output, valid names appear as names, and only invalid ones appear as string literals.]

Selection operators need to be used both in expressions and in the destinations of assignments. The first use is straightforward. The second use is slightly more complex, in that there are three parameters involved in it: the base expression of the destination, the field name of the destination and the source value.

Both selection and selection-assignment require a field name that is known at compile time. This field name must be handled at compile time, and so only compile-time "#" operator procs exist for selection and selection-assignment.

Procs for simple selection on type T

must be forced to have type "Exec/CompileTimeSelect_t"
have a first parameter of type "Package/PContext_t"
have a second parameter of type "Exec/Exec_t"
have a third parameter of type 'string'
return a 'nonNil' result of type "Exec/Exec_t"

Procs for selection-assignment on type T

must be forced to have type "Exec/CompileTimeSelAssign_t"
have a first parameter of type "Package/PContext_t"
have a second parameter of type "Exec/Exec_t"
have a third parameter of type "Exec/Exec_t"
have a fourth parameter of type 'string'
return a 'nonNil' result of type "Exec/Exec_t"

In selection-assignment, the second parameter is that of the destination base, and the third is that of the assignment source.

Examples of "#" selection and selection-assignment operator procs:

    Exec/CompileTimeSelect_t: proc
    trackedDot(Package/PContext_t nonNil pctx; Exec/Exec_t nonNil base;
               string nonNil name)nonNil Exec/Exec_t:
        <code to create code to do a "#." selection on "base"
         of something field-like called <name> >
    corp;

    Exec/CompileTimeSelAssign_t: proc
    multivaluedArrowAssign(Package/PContext_t nonNil pctx;
                           Exec/Exec_t nonNil base, src;
                           string nonNil name)nonNil Exec/Exec_t:
        <code to create code to assign to something field-like
         called <name> on expression "base", the value "src">
    corp;

    eval Types/ExportProcAdd(Tracked_t, Exec/HASH_DOT, trackedDot);
    eval Types/ExportProcAdd(Multivalued_t, Exec/HASH_ARROW_ASSIGN,
                             multivaluedArrowAssign);
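The lookup that connects these procs to the compiler can be sketched as a table keyed by type and operator. This is a Python model under stated assumptions, not the Zed implementation: "export_proc_add" stands in for "Types/ExportProcAdd", expressions are modelled as dicts, and the run-time procs "trackedLookup" and "multiStore" are hypothetical.

```python
# Python model (not Zed) of attaching compile-time "#" selection procs to a
# type and dispatching on the type of the base expression. Expressions are
# modelled as dicts with a "type" and a "node"; all names are hypothetical.
HASH_DOT = "HASH_DOT"
HASH_ARROW_ASSIGN = "HASH_ARROW_ASSIGN"

exports = {}  # (type name, operator symbol) -> handler proc

def export_proc_add(type_name, op, handler):
    """Stand-in for Types/ExportProcAdd."""
    exports[(type_name, op)] = handler

def find_handler(base, op):
    handler = exports.get((base["type"], op))
    if handler is None:
        # The real compiler would issue an error message here.
        raise TypeError(f"type {base['type']} has no {op} proc")
    return handler

def compile_select(pctx, base, name):
    """Compiler action for 'base #. name' (name known at compile time)."""
    result = find_handler(base, HASH_DOT)(pctx, base, name)
    assert result is not None, "selection procs must return a nonNil Exec_t"
    return result

def compile_sel_assign(pctx, base, src, name):
    """Compiler action when 'base #-> name' is the destination of src."""
    result = find_handler(base, HASH_ARROW_ASSIGN)(pctx, base, src, name)
    assert result is not None, "selection-assignment procs must return nonNil"
    return result

def tracked_dot(pctx, base, name):
    # Builds code that calls a hypothetical run-time lookup proc.
    return {"type": "int",
            "node": ("call", "trackedLookup", base["node"], name)}

def multivalued_arrow_assign(pctx, base, src, name):
    # Builds code that calls a hypothetical run-time store proc.
    return {"type": "void",
            "node": ("call", "multiStore", base["node"], name, src["node"])}

export_proc_add("Tracked_t", HASH_DOT, tracked_dot)
export_proc_add("Multivalued_t", HASH_ARROW_ASSIGN, multivalued_arrow_assign)
```

Because the field name is an ordinary compile-time string, the handler can build whatever code it likes for it; nothing requires a real field of that name to exist.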

13.6 "#" Indexing

"#" "indexing" resembles array/matrix indexing in that it involves a base value (playing the syntactic role of the array or matrix), followed by some number of "index" expressions enclosed in square brackets. For "#" indexing the brackets are '#[' and '#]'.

As with the "#" selection operations, "#" indexing operations are implemented only by compile-time operator procs. Indexing is somewhat more complex, however, since there can be an arbitrary number of "index" expressions. This is handled using a Zed 'interface'. The interface is:

    export interface Exec/HashIndexingInterface_t partial {
        proc indexAppend(poly hii; @ ro Exec/TempHashIndexing_t thi;
                         Exec/Exec_t nonNil index)bool;
        proc indexingFor(poly hii; @ ro Exec/TempHashIndexing_t thi;
                         Exec/Exec_t exFor)bool;
        proc indexingUpto(poly hii; @ ro Exec/TempHashIndexing_t thi;
                          Exec/Exec_t exUpto)bool;
        proc indexingNew(poly hii; @ ro Exec/TempHashIndexing_t thi)Exec/Exec_t;
        proc indexingAssignNew(poly hii; @ ro Exec/TempHashIndexing_t thi;
                               Exec/Exec_t nonNil src)Exec/Exec_t;
    };

Proc "indexAppend" is used to add "index" expressions to the indexing operation. Procs "indexingFor" and "indexingUpto" are optionally used to add 'for' and/or 'upto' clauses to the indexing operation. Procs "indexingNew" and "indexingAssignNew" are used to specify that there are no more "index" expressions in this indexing operation, and to return the final "Exec/Exec_t" for the code to implement this particular indexing operation. In the case of "indexingAssignNew", the Zed compiler has encountered an "assignment" to an indexed value ('#:=' operator), and parameter "src" is the "Exec/Exec_t" of the value to be "assigned". "indexAppend", "indexingFor" and "indexingUpto" can return 'false' to indicate that they have noticed a problem and want the resulting operation marked as erroneous. "indexingNew" and "indexingAssignNew" can mark the operation erroneous by returning 'nil'.

Note that the "Exec/Exec_t" passed to "indexingFor" and "indexingUpto" is not marked as 'nonNil'. The value passed will be 'nil' if the corresponding value is given as "*" in the indexing expression. This matches standard substringing, where the same "*" syntax means "to the end of the string".

"HashIndexingInterface" is marked as 'partial', meaning that capsules which implement it do not need to implement all of the interface procs. The Zed compiler treats only the "indexAppend" proc as mandatory, but also requires one of "indexingNew" or "indexingAssignNew". Missing procs indicate that T does not support those options. If a user programmer attempts to use those options when they are not supported, the Zed compiler will issue appropriate error messages. As usual, all of the interface procs can also issue their own error messages.

Type "Exec/TempHashIndexing_t" is a 'struct' type which the Zed Exec code uses to control its handling of the overall indexing operation. It is defined as:

    export struct Exec/TempHashIndexing_t ro {
        Package/PContext_t thi_pctx;
        Types/Type_t thi_type;
        Exec/Exec_t thi_base;
        Exec/TempExecs_t thi_indexes;
        Exec/HashIndexingInterface_t thi_interface;
        Exec/Exec_t thi_forUpto;
        bool thi_hasError, thi_isFor;
    };

Programmer code which is dealing with "#" indexing can examine the fields of this struct, but cannot directly modify them. Field "thi_forUpto" can be used by "indexingNew" or "indexingAssignNew" to access any 'for' or 'upto' clause that was given, and field "thi_isFor" indicates which was given, if any.

Programmer code must define a Zed 'capsule' type which 'implements' the above interface. The Zed Exec code will interact with the programmer code through that interface. This interfacing is set up when the Zed Exec code calls an indexing proc attached to the type of a base expression under the name "Exec/HASH_INDEXING". That proc must return a value of the capsule type which implements the interface. The Exec code will then control the interaction. The last step will be the Exec code calling either the "indexingNew" proc or the "indexingAssignNew" proc, which returns the top "Exec/Exec_t" node for the code to implement the entire indexing operation.

The Zed compiler places no restrictions on the number of "index" expressions. It is the programmer of the "#" indexing operation who decides what the rules are, and implements them. However, to be consistent with the nature of string indexing, if there is a 'for' or 'upto' clause present, the index so modified must be the only index.

The Zed Exec code appends new "index" expressions to the "Exec/TempExecs_t" in the "Exec/TempHashIndexing_t" after calling "indexAppend" and before any calls to "indexingFor" or "indexingUpto".

Even though the "indexAppend", "indexingFor" and "indexingUpto" procs are called by the Zed compiler at compile time, those procs should not attempt to create any of the code that will be implementing the indexing operation. All such code should be created in "indexingNew" or "indexingAssignNew". See the general discussion of compile-time procs earlier in this section for more information.

Procs used for "#" indexing

must be forced to have type "Exec/CompileTimeIndexing_t"
have a first parameter of type "@ ro Exec/TempHashIndexing_t"
return a 'nonNil' value of type "Exec/HashIndexingInterface_t"

The value returned by the indexing proc will be a value of the capsule type which implements "Exec/HashIndexingInterface_t". That value is compatible with "Exec/HashIndexingInterface_t" by the type compatibility rules of Zed.

Example of "#" indexing:

    capsule TrackedIndexing_t implements Exec/HashIndexingInterface_t {
        record {
            <any private data fields needed>
        };

        procs Exec/HashIndexingInterface_t {
            proc
            indexAppend(TrackedIndexing_t ti; @ ro Exec/TempHashIndexing_t thi;
                        Exec/Exec_t nonNil index)bool:
                <We are presented with a new "index" expression, "index". We
                 examine it for correctness against what we have already seen,
                 and against the base in "thi@.thi_base", etc. If we are happy
                 with it, we return 'true', else we return 'false'.>
            corp;

            proc
            indexingNew(TrackedIndexing_t ti;
                        @ ro Exec/TempHashIndexing_t thi)Exec/Exec_t:
                <Examine all of the collected information, and see if it is
                 valid for this indexing operation. If so, create and return an
                 Exec/Exec_t tree that implements the operation, otherwise
                 return 'nil'.>
            corp;
        };
    };

    Exec/CompileTimeIndexing_t: proc
    trackedIndexing(@ ro Exec/TempHashIndexing_t thi)
                    nonNil Exec/HashIndexingInterface_t:
        <Create and return a TrackedIndexing_t. This can be as simple as
         just a capsule allocation with no values.>
    corp;

    eval Types/ExportProcAdd(Tracked_t, Exec/HASH_INDEXING, trackedIndexing);

Note that this example does not provide "indexingFor", "indexingUpto" or "indexingAssignNew". That means that type "Tracked_t" would not support 'for' or 'upto' clauses, and would not allow assigning to an indexed value.

To clarify the sequences of calls that a programmer of "#" indexing could expect, here are some examples of Zed source, and the resulting sequences of calls to the interface procs:

    dest #:= array#[#]

        indexingNew

    dest := array#[p + 1, q - 1, "fred"#]

        indexAppend 'p + 1'
        indexAppend 'q - 1'
        indexAppend '"fred"'
        indexingNew

    dest#[j for k - 1#] #:= source

        indexAppend 'j'
        indexingFor 'k - 1'
        indexingAssignNew 'source'
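The sequences above can be reproduced with a small driver. This Python model is a sketch under stated assumptions, not the Zed Exec code: the driver, the logging capsule and the use of exceptions in place of compiler error messages are all hypothetical, and recording "thi_forUpto" after the "indexingFor"/"indexingUpto" call is assumed by analogy with how index expressions are recorded.

```python
# Python model (not Zed) of the call sequence the compiler follows for one
# "#" indexing operation. The driver, TempHashIndexing and LoggingIndexing
# are hypothetical; exceptions stand in for compiler error messages.
class TempHashIndexing:
    def __init__(self, base):
        self.thi_base = base
        self.thi_indexes = []
        self.thi_forUpto = None
        self.thi_isFor = False
        self.thi_hasError = False

def drive_indexing(impl, base, indexes, clause=None, assign_src=None):
    """clause is None, ("for", expr) or ("upto", expr); expr is None when
    the source gives '*'. assign_src is set for a '#:=' destination."""
    thi = TempHashIndexing(base)
    for ix in indexes:
        if not impl.indexAppend(thi, ix):
            thi.thi_hasError = True
        thi.thi_indexes.append(ix)  # stored only after indexAppend returns
    if clause is not None:
        kind, expr = clause
        name = "indexingFor" if kind == "for" else "indexingUpto"
        proc = getattr(impl, name, None)
        if proc is None:
            raise SyntaxError(f"this type does not support '{kind}' clauses")
        if not proc(thi, expr):
            thi.thi_hasError = True
        # Assumed: the clause is recorded after the call, as with indexes.
        thi.thi_forUpto, thi.thi_isFor = expr, kind == "for"
    if assign_src is not None:
        finish = getattr(impl, "indexingAssignNew", None)
        args = (thi, assign_src)
    else:
        finish = getattr(impl, "indexingNew", None)
        args = (thi,)
    if finish is None:
        raise SyntaxError("this type does not support that use of '#['")
    result = finish(*args)
    # A nil result, or any earlier 'false', marks the operation erroneous.
    return None if result is None or thi.thi_hasError else result

class LoggingIndexing:
    """Records each interface call so the sequence can be inspected."""
    def __init__(self):
        self.log = []

    def indexAppend(self, thi, index):
        self.log.append(("indexAppend", index))
        return True

    def indexingFor(self, thi, ex_for):
        self.log.append(("indexingFor", ex_for))
        # Mirrors string indexing: a 'for' clause allows only one index.
        return len(thi.thi_indexes) == 1

    def indexingNew(self, thi):
        self.log.append(("indexingNew",))
        return ("read", thi.thi_base, tuple(thi.thi_indexes))

    def indexingAssignNew(self, thi, src):
        self.log.append(("indexingAssignNew", src))
        return ("write", thi.thi_base, tuple(thi.thi_indexes), src)

# The third example above: dest#[j for k - 1#] #:= source
impl = LoggingIndexing()
drive_indexing(impl, "dest", ["j"], clause=("for", "k - 1"),
               assign_src="source")
print(impl.log)
```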

In a much more complex example like:

    dest#[i#]#["fred", j#] #:= source#["first", "second"#]#[2#]

there would be 4 separate "#" indexing sequences, and they need not all use the same set of procs if more than one "#" indexing type is involved.

13.7 "#" Parentheses

An expression can be enclosed in "#" parentheses. The type of the value within the '#(' and '#)' pair must have a "parens" handler attached. The "parens" handler is always a compile-time proc, so the type of the result it computes does not have to match the type of its parameter.

Procs used for "#" "parens"

must be forced to have type "Exec/CompileTimeUnary_t"
have a first parameter of type "Package/PContext_t"
have a second parameter of type "Exec/Exec_t"
return a 'nonNil' result of type "Exec/Exec_t"

Example of "parens" proc:

    Exec/CompileTimeUnary_t: proc
    trackedParens(Package/PContext_t nonNil pctx;
                  Exec/Exec_t nonNil ex)nonNil Exec/Exec_t:
        /* This proc selects and returns field "tr_count". */
        Exec/FieldRefNew(pctx, ex, "tr_count")
    corp;

    eval Types/ExportProcAdd(Tracked_t, Exec/HASH_PARENS, trackedParens);

13.8 "#" Calls

The syntax of a standard proc call is the name of the proc followed by a parenthesized list of parameters. "#" calls use the same syntax, except with '#(' and '#)' instead of regular parentheses. The "#" call proc is found via the type of the expression in the position of the proc being called.

Symbol "Exec/HASH_CALL" is searched for as an export of that type. If found, it must be a proc, with forced proc type "Exec/CompileTimeCall_t". It will be called, at compile time, with the required "Exec/TempHashCall_t" parameter. That type is defined as follows:

    export struct Exec/TempHashCall_t ro {
        Package/PContext_t thcal_pctx;
        Exec/Exec_t thcal_theProc;
        Exec/TempExecs_t thcal_execs;
        Exec/HashCallInterface_t thcal_interface;
        bool thcal_hasError;
    };

At the time of that call, only fields "thcal_pctx" and "thcal_theProc" will have been set.

Based on that, the "HASH_CALL" proc must either return 'nil', signalling that it is denying the entire call, or it must return a value of interface type "Exec/HashCallInterface_t" defined as:

    export interface Exec/HashCallInterface_t {
        proc callAppend(poly hci; @ ro Exec/TempHashCall_t thcal;
                        Exec/Exec_t nonNil ex)bool;
        proc callNew(poly hci; @ ro Exec/TempHashCall_t thcal)Exec/Exec_t;
    };

That interface is used to coordinate among the Zed parser, the Zed semantics code and the user code handling the "#" "call". User code will use a capsule type which implements "Exec/HashCallInterface_t", and the "HASH_CALL" proc yields the interface value by returning an object of that type.

After that, the Zed compiler will call the "callAppend" method for each parameter to the "#" "call" that it finds. Note that there might not be any parameters. "callAppend" returns 'true' to indicate that it is happy with the parameter, and 'false' to mark the "#" "call" as being in error.

When there are no more parameters, the Zed compiler will call the "callNew" method. That method can return 'nil' to indicate that the "call" should be marked as in error, or return an "Exec/Exec_t" that represents the replacement code for the entire "call". This code does not need to be a proc call - it can be anything, and its nature will be checked within the context of the "call", as usual.
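The "#" call protocol can be sketched the same way. This Python model (not Zed) uses hypothetical names throughout, and assumes that parameters are added to "thcal_execs" after "callAppend" returns, as the text states for "#" braces. The example handler replaces the call with an addition tree, illustrating that the replacement code need not be a proc call.

```python
# Python model (not Zed) of the "#" call protocol. TempHashCall mirrors the
# struct fields above; drive_hash_call and SumCall are hypothetical.
class TempHashCall:
    def __init__(self, pctx, the_proc):
        # Only these two fields are set when the HASH_CALL proc runs.
        self.thcal_pctx = pctx
        self.thcal_theProc = the_proc
        self.thcal_execs = []
        self.thcal_interface = None
        self.thcal_hasError = False

def drive_hash_call(hash_call_proc, pctx, the_proc, params):
    thcal = TempHashCall(pctx, the_proc)
    iface = hash_call_proc(thcal)
    if iface is None:
        return None  # the HASH_CALL proc denied the entire call
    thcal.thcal_interface = iface
    for ex in params:  # may be empty: a "#" call can have no parameters
        if not iface.callAppend(thcal, ex):
            thcal.thcal_hasError = True
        thcal.thcal_execs.append(ex)  # assumed: added after callAppend
    result = iface.callNew(thcal)
    return None if result is None or thcal.thcal_hasError else result

class SumCall:
    """Hypothetical handler turning 'f#(a, b, c#)' into '(a + b) + c'."""
    def callAppend(self, thcal, ex):
        return True

    def callNew(self, thcal):
        if not thcal.thcal_execs:
            return ("const", 0)
        tree = thcal.thcal_execs[0]
        for ex in thcal.thcal_execs[1:]:
            tree = ("+", tree, ex)
        return tree
```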

See the preceding and following sections for more details and examples of similar "#" constructs.

13.9 "#" Braces

A "braces" "#" proc in Zed differs from other "#" procs in the way in which it is found. There is no left-hand operand or obvious parameter from which the appropriate type can be determined. Instead, name "HashBraces" is looked up in the current package and in all 'import'ed packages. The name found in that manner must be an appropriate proc, and that proc is called to start a "#" "braces" sequence. [Name "Exec/HASH_BRACES" provides that name, but there is no nice way to use that symbol to name a proc.]

There can be any number of expressions of any type within the "#" braces, separated by commas. The Zed compiler interacts with the provided code to handle those expressions in a way similar to the interaction used for "#" indexing. The provided proc will run at compile-time, and so the result type of the "#" "braces" sequence can be anything. [Typically, a "braces" sequence is used to represent something like a list or vector constructor, with an explicit set of elements.] The interaction is handled using the Zed 'interface':

    export interface Exec/HashBracesInterface_t {
        proc bracesAppend(poly hbi; @ ro Exec/TempHashBraces_t thb;
                          Exec/Exec_t nonNil ex)bool;
        proc bracesNew(poly hbi; @ ro Exec/TempHashBraces_t thb)Exec/Exec_t;
    };

Proc "bracesAppend" is used to add new expressions to the "braces" operation. Proc "bracesNew" is used to indicate that there will be no more expressions, and returns the final "Exec/Exec_t" which is the value of the "braces". "bracesAppend" can return 'false' to indicate that it wishes the entire "braces" sequence to be marked as erroneous. "bracesNew" can indicate an error by returning 'nil'.

Type "Exec/TempHashBraces_t" is a 'struct' type which the Zed Exec code uses to control its handling of the overall "braces" operation. It is defined as:

    export struct Exec/TempHashBraces_t ro {
        Package/PContext_t thb_pctx;
        Exec/TempExecs_t thb_execs;
        Exec/HashBracesInterface_t thb_interface;
        bool thb_hasError;
    };

The fields of this struct are readable by the programmer code which implements the "braces" operation, but not writeable. When "bracesAppend" is called, the new element will not yet have been added to "thb_execs".
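The ordering rule in the last sentence matters when "bracesAppend" wants to compare a new element with earlier ones. The following Python model (not Zed) sketches it; "drive_braces" and the "HomogeneousList" handler are hypothetical, and elements are represented as (type name, expression) pairs.

```python
# Python model (not Zed) of a "#" braces sequence. TempHashBraces mirrors
# the struct above; drive_braces and HomogeneousList are hypothetical.
class TempHashBraces:
    def __init__(self, pctx):
        self.thb_pctx = pctx
        self.thb_execs = []
        self.thb_interface = None
        self.thb_hasError = False

def drive_braces(hash_braces_proc, pctx, elements):
    thb = TempHashBraces(pctx)
    thb.thb_interface = hash_braces_proc(thb)  # the "HashBraces" proc
    for ex in elements:
        if not thb.thb_interface.bracesAppend(thb, ex):
            thb.thb_hasError = True
        thb.thb_execs.append(ex)  # added only after bracesAppend returns
    result = thb.thb_interface.bracesNew(thb)
    return None if result is None or thb.thb_hasError else result

class HomogeneousList:
    """Hypothetical handler: every element must have the same type."""
    def bracesAppend(self, thb, ex):
        # ex is not yet in thb_execs, so compare it with the first stored one.
        return not thb.thb_execs or thb.thb_execs[0][0] == ex[0]

    def bracesNew(self, thb):
        # All code for the operation is built here, not in bracesAppend.
        return ("listNew", tuple(expr for _, expr in thb.thb_execs))
```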

Programmer code must define a Zed 'capsule' type which 'implements' the above interface. The Zed Exec code will interact with the programmer code through that interface. This interfacing is set up when the Zed Exec code calls the "HashBraces" proc that it has found. That proc must return a value of the capsule type which implements the interface. The Exec code will then control the interaction.

As with "#" indexing, the "bracesAppend" proc should not attempt to create any of the code for the actual "braces" operation. Instead, all of the code should be created by "bracesNew". See the general discussion of compile-time procs for more information.

All "HashBraces" procs

must be forced to have type "Exec/CompileTimeBraces_t"
have a first parameter of type "@ ro Exec/TempHashBraces_t"
return a 'nonNil' value of type "Exec/HashBracesInterface_t"

The value returned by the "braces" proc will be a value of the capsule type which implements "Exec/HashBracesInterface_t". That value is compatible with "Exec/HashBracesInterface_t" by the type compatibility rules of Zed.

Example of "#" "braces":

    capsule MultiList_t implements Exec/HashBracesInterface_t {
        record {
            <any private data fields needed>
        };

        procs Exec/HashBracesInterface_t {
            proc
            bracesAppend(MultiList_t nonNil ml; @ ro Exec/TempHashBraces_t thb;
                         Exec/Exec_t nonNil ex)bool:
                <...>
            corp;

            proc
            bracesNew(MultiList_t nonNil ml;
                      @ ro Exec/TempHashBraces_t thb)Exec/Exec_t:
                <...>
            corp;
        };
    };

    export Exec/CompileTimeBraces_t: proc
    HashBraces(@ ro Exec/TempHashBraces_t thb)nonNil Exec/HashBracesInterface_t:
        MultiList_t(...)
    corp;

Note that the initiating proc must be named "HashBraces", and is not added to any type. [It is usually 'export'-ed so that other packages can use it.]

If multiple packages available in a given situation offer a "HashBraces" proc, it is undefined which one will be used by the compiler. The choice can also vary from time to time. The term "braces" describes only the syntax of this facility - it says nothing about any semantics that might be implemented. For these reasons it is usually better to use a 'varProc' or 'ioProc' proc to implement the desired collection ability. That way, the name of the proc can say something about the semantics of the collection - e.g. "set", "list", etc. [Should I remove the "HashBraces" facility altogether?]

13.10 "#" Example

The following example uses a 'struct' type to represent complex numbers. The various basic operators are added to the type. No compile-time "#" operator procs are needed here. The bodies of several procs have been removed in order to keep this example of a reasonable size. Also removed are all details about packages.

    export struct Complex_t {
        float cplx_real, cplx_imag;
    };

    export proc
    complexNeg(@ Complex_t dst; @ ro Complex_t cplx)void:
        dst@.cplx_real := - cplx@.cplx_real;
        dst@.cplx_imag := - cplx@.cplx_imag;
    corp;

    export proc
    complexMul(@ Complex_t dst; @ ro Complex_t left, right)void:
        float lReal := left@.cplx_real, lImag := left@.cplx_imag,
            rReal := right@.cplx_real, rImag := right@.cplx_imag;
        dst@.cplx_real := lReal * rReal - lImag * rImag;
        dst@.cplx_imag := lReal * rImag + lImag * rReal;
    corp;

    export proc
    complexDiv(@ Complex_t dst; @ ro Complex_t left, right)void:
        float lReal := left@.cplx_real, lImag := left@.cplx_imag,
            rReal := right@.cplx_real, rImag := right@.cplx_imag,
            denom := rReal * rReal + rImag * rImag;
        dst@.cplx_real := (lReal * rReal + lImag * rImag) / denom;
        dst@.cplx_imag := (rReal * lImag - lReal * rImag) / denom;
    corp;

    export proc
    complexAdd(@ Complex_t dst; @ ro Complex_t left, right)void: ...

    export proc
    complexSub(@ Complex_t dst; @ ro Complex_t left, right)void: ...

    export proc
    complexLessThan(@ ro Complex_t l, r)bool:
        float lr := l@.cplx_real, li := l@.cplx_imag, rr := r@.cplx_real,
            ri := r@.cplx_imag;
        lr * lr + li * li < rr * rr + ri * ri
    corp;

    export proc
    complexLessOrEqual(@ ro Complex_t l, r)bool: ...

    export proc
    complexEqual(@ ro Complex_t l, r)bool:
        l@.cplx_real = r@.cplx_real and l@.cplx_imag = r@.cplx_imag
    corp;

    export proc
    complexNotEqual(@ ro Complex_t l, r)bool: ...

    export proc
    complexGreaterOrEqual(@ ro Complex_t l, r)bool: ...

    export proc
    complexGreaterThan(@ ro Complex_t l, r)bool: ...

    export proc
    complexAssign(@ Complex_t dst; @ ro Complex_t src)void:
        dst@.cplx_real := src@.cplx_real;
        dst@.cplx_imag := src@.cplx_imag;
    corp;

    export proc
    complexFmt(CharBuffer/OBuf_t nonNil ob; @ ro Complex_t ro cplx;
               string format; uint width, precision)void:
        CharBuffer/OChar(ob, "(");
        FmtFloat(ob, cplx@.cplx_real, format, width, precision);
        CharBuffer/OString(ob, ", ");
        FmtFloat(ob, cplx@.cplx_imag, format, width, precision);
        CharBuffer/OChar(ob, ")");
    corp;

    eval Types/ExportBoolAdd(Complex_t, Exec/AUTO_AT);

    eval Types/ExportProcAdd(Complex_t, Exec/HASH_NEG, complexNeg);

    eval Types/ExportProcAdd(Complex_t, Exec/HASH_MUL, complexMul);
    eval Types/ExportProcAdd(Complex_t, Exec/HASH_DIV, complexDiv);
    eval Types/ExportProcAdd(Complex_t, Exec/HASH_ADD, complexAdd);
    eval Types/ExportProcAdd(Complex_t, Exec/HASH_SUB, complexSub);

    eval Types/ExportProcAdd(Complex_t, Exec/HASH_LESS_THAN, complexLessThan);
    eval Types/ExportProcAdd(Complex_t, Exec/HASH_LESS_OR_EQUAL,
                             complexLessOrEqual);
    eval Types/ExportProcAdd(Complex_t, Exec/HASH_EQUAL, complexEqual);
    eval Types/ExportProcAdd(Complex_t, Exec/HASH_NOT_EQUAL, complexNotEqual);
    eval Types/ExportProcAdd(Complex_t, Exec/HASH_GREATER_OR_EQUAL,
                             complexGreaterOrEqual);
    eval Types/ExportProcAdd(Complex_t, Exec/HASH_GREATER_THAN,
                             complexGreaterThan);

    eval Types/ExportProcAdd(Complex_t, Exec/HASH_ASSIGN, complexAssign);

    eval Types/ExportProcAdd(Complex_t, FMT_THIS, complexFmt);

A full complex number package would include several additional procs, such as trigonometry procs, conversion routines, etc. Given the above code, we can have a test proc:

    proc
    test1(@ Complex/Complex_t c1, c2)void:
        Complex/Complex_t c3 #:= c1 #+ c2, c4 #:= c1 #* c2 #- c3,
            c5 #:= #- c2;
        Fmt("c3: ", c3, ", c4: ", c4, ", c5: ", c5);
    corp;
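The arithmetic in "complexMul" and "complexDiv" can be checked against a reference implementation. The following Python transcription (not Zed) compares both formulas with Python's built-in 'complex' type; the tuple representation is an assumption for illustration.

```python
# Python transcription (not Zed) of complexMul, complexDiv and
# complexLessThan above; (real, imag) tuples stand in for Complex_t.
def complex_mul(left, right):
    l_real, l_imag = left
    r_real, r_imag = right
    return (l_real * r_real - l_imag * r_imag,
            l_real * r_imag + l_imag * r_real)

def complex_div(left, right):
    l_real, l_imag = left
    r_real, r_imag = right
    denom = r_real * r_real + r_imag * r_imag
    return ((l_real * r_real + l_imag * r_imag) / denom,
            (r_real * l_imag - l_real * r_imag) / denom)

def complex_less_than(l, r):
    # Compares squared magnitudes, as complexLessThan does.
    return l[0] * l[0] + l[1] * l[1] < r[0] * r[0] + r[1] * r[1]

a, b = (3.0, 4.0), (1.0, -2.0)
# Compare against Python's built-in complex arithmetic.
print(complex_mul(a, b), complex(*a) * complex(*b))
print(complex_div(a, b), complex(*a) / complex(*b))
```

Note that "complexLessThan" orders values by squared magnitude, so, for example, 1 and i are neither less than one another nor equal under "complexEqual". That is a consequence of the example's design choice, not of the "#" operator mechanism.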

[As mentioned previously, when dealing with expressions involving "#" operators on multivalued types the Zed compiler will create and use temporary variables as needed. For example, in the expression

    (a #+ b) #/ (c #- d)

a temporary variable is used to hold "c #- d" before the '#/' is done.]
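One plausible lowering that matches this description is sketched below in Python (not Zed). A subexpression in the left operand position is computed directly into the destination. That is safe here only because procs such as "complexMul" and "complexDiv" read their operands before writing "dst". A subexpression in the right operand position gets a temporary. The strategy and the temporary names are assumptions, and the sketch assumes the destination is not read by the expression.

```python
# Python sketch (not Zed) of lowering "#" expressions on Complex_t into
# calls of the destination-first procs. Strategy and names are assumed.
PROCS = {"#+": "complexAdd", "#-": "complexSub",
         "#*": "complexMul", "#/": "complexDiv"}

def lower(expr, dest, code, temps):
    """Append calls computing expr into dest; return dest. expr is a
    variable name or a tuple (operator, left, right). Assumes dest is
    not read anywhere inside expr."""
    if isinstance(expr, str):
        if expr != dest:
            code.append(("complexAssign", dest, expr))
        return dest
    op, left, right = expr
    # Left subexpressions are computed straight into dest.
    left_name = left if isinstance(left, str) else lower(left, dest, code, temps)
    if isinstance(right, str):
        right_name = right
    else:
        # Right subexpressions need their own temporary.
        right_name = f"_t{len(temps)}"
        temps.append(right_name)
        lower(right, right_name, code, temps)
    code.append((PROCS[op], dest, left_name, right_name))
    return dest

code, temps = [], []
lower(("#/", ("#+", "a", "b"), ("#-", "c", "d")), "r", code, temps)
for step in code:
    print(step)
```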

14 Privileged Versus Non-privileged Code

The Zed language distinguishes between privileged and non-privileged code. Several items have already been described as being available only to privileged code. These include the use of "unsafe" 'union's, pointer arithmetic and type casts using 'pretend'. One of the design goals for the Zed language is that non-privileged code, which is intended to be virtually all code, should not be able to violate the type rules of the language, or to have any way to perform operations that the language intends to prevent. Thus, non-privileged code should always be safe to use, within the constraints of what it has access to via the normal, non-privileged Zed operations. Such code can still deny service to other code, for example by over-consuming memory or CPU time, but control of such things is outside the scope of the Zed programming language, and belongs in the supporting operating system and libraries.

How a programmer becomes privileged is not a part of the Zed language - it is something that is decided externally. For example, on Unix-based operating systems like Linux, privilege can be determined by the effective user ID during execution - user "root" can be privileged, but most other users likely are non-privileged.

It is also a goal of the Zed language that execution should be as efficient as possible. This is a secondary goal, however, less important than fully safe execution of non-privileged code. Reaching this second goal is helped by having facilities in the language that are usable by non-privileged programmers, but which allow execution very similar in cost to what can be achieved by privileged programmers. One example of this is the use of 'package' '@' values in some situations where other programming languages would need to use unconstrained pointers.

Many facilities in Zed have already been described which allow programmers to control what kind of access other code has to their data structures, etc. These facilities usually introduce restrictions on access. Given that the Zed system always checks array and matrix accesses, 'enum' values, arithmetic errors, tracked and '@' "pointer" uses, etc. there are only three major abilities that must be restricted.

The first of these is the use of "casts", via the 'pretend' construct. Non-privileged programmers cannot use 'pretend' - all of the things they should validly want to do can be done by other conversion constructs, such as 'fromUint', 'toUint', 'flt', etc. Allowing non-privileged programmers to use 'pretend' would allow them to violate Zed's safety rules by doing such things as casting one type of '@' to another and then using it to write invalid values to memory locations, or to read from or write to memory locations that should not be accessible.

The second of these is "unsafe" 'union' types. Since the use of these is unchecked, there would be nothing to prevent non-privileged programmers from effectively doing type casts using unions. Zed provides checked unions in the form of variant records. These are not as efficient as 'union's, but many non-privileged uses of unions need the checking in some form or other anyway. See the description of "4.12 Union Types" for the definition of "safe" unions.

The third of the capabilities that is restricted for non-privileged programmers is the free use of pointers. When '@' values can be used instead of pointers, there is no additional cost involved, either in terms of CPU usage or memory usage. However, tracked values (e.g. record and capsule references) can incur extra cost, both in CPU time and memory use. So, allowing non-privileged programmers some access to pointers is desirable.

Non-privileged programmers cannot be allowed to create pointers to arbitrary data values. Creating an unrestricted pointer to a proc formal or proc local can lead to problems if the pointer lives beyond the life of the formal or local. Controlled use of such things is allowed via non-'package' '@' values. Pointers to elements within record, capsule or matrix entities can lead to similar "dangling pointers" if the Zed storage management system (reference counting or garbage collection) frees those entities. Pointers to package-level ("static allocation") variables could be allowed, but that is already done safely via 'package' '@' operations.

Non-privileged programmers cannot be allowed to do pointer arithmetic. In general, the compiler and run-time system cannot know the run-time range of all objects which are referenced only by pointers. If pointer arithmetic were to be allowed, then non-privileged programmers could use it to turn a valid pointer into an invalid pointer, and then write invalid data to otherwise-constrained memory locations, or to write to or read from memory locations that they should not be able to access.

With the above restrictions, the only pointers that non-privileged code can use are pointers provided to it by privileged code. Since other mechanisms exist in Zed to control access to things like struct fields, there is no reason to prevent non-privileged code from reading and writing via pointers. Note that run-time checks for 'nil' will be used if the pointer value is not 'nonNil'. Such checks are not used when privileged code uses pointers, even if the pointer does not have the 'nonNil' attribute.

One issue that remains is that of the lifetime of memory regions which are referenced by pointers. If a pointer to or into a region of memory can be kept indefinitely by non-privileged code, then that region of memory can never be freed. This is because there is no reliable way for privileged code to determine when there are no longer any pointers into a given region of memory. The automatic storage management facilities (reference counting and garbage collection) only work with tracked values, which are constrained pointers to memory regions which contain additional fields to allow the required full tracking.

If privileged code exports to non-privileged code a proc which essentially allocates memory and returns a pointer to it, then such allocated memory can never be safely freed. If that sort of need arises in a program, the programmer is advised to work with record or capsule objects to allow automatic memory freeing. Alternatively, only a very small set of memory regions could ever be returned, in which case the inability to free them is not an issue.

Another possible scenario is one in which privileged code calls out to non-privileged code, passing one or more pointers to memory regions. If the non-privileged code can preserve such pointers, or other pointers obtained indirectly from the initial ones, such that future callouts from the privileged code can still access those memory regions, then again the privileged code has no way to know which memory regions it can safely free.

Zed permits this scenario by imposing the following restrictions:

non-privileged code cannot write a pointer into a package ("static" allocation) variable
non-privileged code cannot write a pointer into a record or capsule object
non-privileged code cannot write a pointer into a matrix
non-privileged code cannot write a pointer through an '@'

Note that explicit 'nil' values can always be written. Non-privileged code is allowed to write pointers into locations reached through other pointers, as permitted by the field access rights of the types exported from the privileged code. These restrictions prevent the non-privileged code from "hiding" a pointer somewhere that the privileged code cannot find.

Thus, for example, privileged code can build complex data structures linked together via pointers, and then pass a pointer to them to non-privileged code. Depending on the field access rights granted to the non-privileged code, the non-privileged code might be able to re-arrange the data structures, or construct temporary linkages through the structures. It can traverse links not denied to it by the package exporting the types involved. Careful design of the structs involved, and of the access rights to their pointer fields, allows this to be done in a way that lets the creating privileged code continue to keep track of the memory involved, while still allowing the non-privileged code freedom to re-arrange the structures.
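The four store restrictions can be summarized as a single predicate. This Python model (not Zed) is a hypothetical sketch of such a check; the destination-kind labels are invented for illustration, and field access rights are not modelled.

```python
# Python model (not Zed) of the store restrictions listed above, as a
# compiler might check a store whose value has a pointer type. The
# destination-kind names are hypothetical labels.
FORBIDDEN_FOR_NON_PRIVILEGED = {
    "package_variable",   # "static" allocation
    "record_or_capsule",  # inside a record or capsule object
    "matrix",             # element of a matrix
    "through_at",         # any location reached through an '@'
}

def pointer_store_allowed(dest_kind, value_is_explicit_nil, privileged):
    """True if a pointer value may be stored into dest_kind."""
    if privileged:
        return True   # the restrictions apply only to non-privileged code
    if value_is_explicit_nil:
        return True   # explicit 'nil' can always be written
    # Anything else, e.g. "proc_local" or "through_pointer", is assumed
    # allowed here; field access rights are not modelled.
    return dest_kind not in FORBIDDEN_FOR_NON_PRIVILEGED
```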

[Zed briefly allowed pointers to be written through non-'package' '@'s. Unfortunately, non-'package' '@'s need to be able to take on 'package' '@' values when manipulating structures involving them. Thus, the compiler could never be sure that a given non-'package' '@' does not actually reference a package variable. I briefly considered adding another storage flag, e.g. "proc" for use with '@'s which could only take on values known to be non-'package', and allowing pointers to be stored through those. It is not clear that would actually work out, however.]

[There has been virtually no use of pointers in the Zed system. So, it is unknown if the above rules, implemented by the compiler, are sufficient to prevent non-privileged programmers from accomplishing bad things, given carefully designed structures provided by privileged programmers. It is also unknown if the rules allow anything useful to be accomplished with pointers.]

18 Compile Time Execution

[The concept of compile time execution has been in the Zed language from the very beginning. Initially it was intended to be implicit, in the sense that if something could be executed at compile time, it would be. However, that turned out to be unworkable, so I added the 'ctProc' flag on procs. My initial thought was that any proc that was to run at compile time must be flagged that way, but that idea went away as soon as I started implementing it. Another thought was that things like generic abilities, complex types, etc. could be implemented as compile time procs from libraries that return newly created types. That sort of works, but is clumsy.

It is very difficult to convince myself that all compile time activity that Zed currently allows is safe, in the sense that it does not allow a programmer to "trick" the compiler into allowing something that violates any of the semantic rules. These violations could be accessing a field that should not be accessible, accessing a field or variable using the wrong type, accessing beyond the end of an array, struct or "tracked" value, etc. Hence, it is possible that some amount of compile-time activity may have to be made privileged. I have no ideas about this at the moment.]

18.1 Introduction to Compile Time Execution

Some programming languages allow certain types of execution to happen at compile time, rather than at run time. These actions are typically used to perform file inclusion, conditional compilation, constant naming, etc. Languages vary considerably in the power of their compile time features. Some example languages:

C, C++: the C preprocessor allows limited conditional compilation (using literals and symbols whose value is defined in the preprocessor), file inclusion and text macros (parameter substitution within a copy of the body). Recent C++ standards allow much more to happen at compile time, for example through 'constexpr' and 'consteval' functions.
PL/I: the full specification of the PL/I programming language allows variables, conditional statements, loops, etc. to be used at compile time. The syntax is the same as for normal code, but the keywords are prefixed by '%'. Not all compilers support this feature.
IBM assemblers: the standard assemblers for IBM 360/370 mainframes had a powerful macro language which allowed procedures with parameters, looping, conditionals, calling other macros, etc. Single macro calls could expand to hundreds of lines of assembler code.
LISP: some LISP variants provide a punctuation mark which causes the system to evaluate a given S-expression at parsing time, with the yielded result used for further processing instead of the initial expression. This can essentially allow the entire power of the language to be used at compile time.
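For comparison with the Zed examples that follow, here is a small C sketch of the three preprocessor features listed for C: a symbol known only to the preprocessor, a text macro, and conditional compilation. The names "LIMIT", "SUM" and "SIZE_NAME" are made up for illustration.

```c
/* LIMIT is a symbol whose value only the preprocessor knows about. */
#define LIMIT 10

/* A text macro: each use is replaced by a copy of the body with the
   arguments substituted in. */
#define SUM(a, b) ((a) + (b))

/* Conditional compilation: only one of the two definitions survives. */
#if LIMIT > 5
#define SIZE_NAME "large"
#else
#define SIZE_NAME "small"
#endif

unsigned
limit_plus_three(void)
{
    return SUM(LIMIT, 3);   /* the compiler proper sees ((10) + (3)) */
}

const char *
size_name(void)
{
    return SIZE_NAME;
}
```

None of this can loop, call functions or compute new values beyond integer arithmetic in '#if', which is the kind of limitation the Zed facilities described below remove.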

The Zed language allows the full power of the entire language to be used at compile time. The basic ability can be triggered by putting the reserved word 'ctProc' before the name of a proc being defined. With that flag present, any calls to the proc in other code will be executed at compile time, rather than at run time. There are other reserved words that also trigger compile time execution, but they also add additional meaning, and are discussed in their own sections. Additionally, the use of "#" operations can implicitly call, at compile time, procs attached to the types involved. It is also possible for procs running at compile time to locate and call other procs indirectly.

A simple example of a compile-time proc and a call to it:

    proc ctProc
    sum(uint a, b)uint:
        a + b
    corp;

    proc
    test()void:
        uint LIMIT = 10;
        uint j := sum(LIMIT, 3);
    corp;

Proc "sum" looks just like a regular proc, and behaves exactly like one, except that calls to it are executed at compile time. In this case, when the Zed compiler is compiling proc "test", it will call "sum" with 'uint' constants 10 and 3. Since the Zed compiler does full "constant folding" (evaluation of expressions involving only constants at compile time), there is no reason to write such a simple proc. However, programs might need to compute much more complex formulas, involving loops, conditionals, temporary values, calls to other procs, etc. Such things make more sense as compile-time procs. In this example, "j" could have been declared as a constant, since the value given to it is known at compile time.

If a compile-time proc wishes to call other procs, it can do so, but those procs should not have the 'ctProc' flag, unless it is desired that the second proc be run during compilation of the first one. A special case is that a 'ctProc' proc can be recursive - calls to itself will be done normally, as it is running (which will be during compilation of some other code). Note that you cannot pre-declare compile-time procs, so there is no way to set up mutually recursive compile-time procs.

Actual parameters passed to compile-time procs must be known at compile time. This generally means they are literals, named constants, the result of constant folding or compile time evaluation of conditionals, the result of other compile time calls, etc. Special handling, described later, allows arbitrary parameters, by declaring the formal parameter to be of type "Exec/Exec_t" or of a template type. The result of compile-time procs can be any of the following basic types: 'bool', 'char', 'bits8', 'bits16', 'bits32', 'bits64', 'uint', 'sint', 'float' or 'string'. A compile-time proc can also be 'void', in which case it yields no value and is called for its side effects. A compile-time proc can yield a value of type "Types/Type_t", i.e. a type of some kind. A compile-time proc that yields type "Exec/Exec_t" gets its actual result type from the "Exec_t" that it yields. This means that different calls to the proc can in effect yield different result types, and that any type at all can be yielded. It also means that the other available result types are not actually needed. However, they have been kept for clarity. 'template' types can also be returned, and they are handled much the same as "Exec_t". Compile-time procs can do anything that can be done in the full language, and this includes writing out text, creating windows, starting other programs, etc. The wise programmer will not mis-use this capability.

Many simple compile-time procs just return some value or "Exec/Exec_t" to be used in place of the call. Others return no result (result type 'void') but insert statements into the current context, using "Exec/SequenceAppend". It is possible for a compile-time proc to do both - insert code into its calling context and return a replacement for its call. This, however, should be avoided where possible, since it can be quite confusing. One aspect of that confusion is related to the use of scopes. See the "uintToBits16" example in section "18.9.5 Con and Var Template Declarations".

In general, code running at compile time should not modify global (package-level) data if the order of the modifications matters, and it is not perfectly clear what order the compile-time calls will be made in. This is because various things that the Zed compiler does, such as constant-folding, can change the order of the compile-time calls.
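C has a run-time analogue of this hazard: the order in which function arguments are evaluated is unspecified. The following sketch is only an analogy, not Zed code, and the helper names are hypothetical. Updates to shared state whose combined effect does not depend on order, such as summing, are safe. Updates that record the order are not.

```c
/* Shared state modified by a hypothetical helper. */
static unsigned counter = 0;

static unsigned
next_id(void)
{
    return ++counter;
}

struct pair { unsigned first, second; };

static struct pair
make_pair(unsigned a, unsigned b)
{
    struct pair p = { a, b };
    return p;
}

/* Order-dependent: C does not say which argument is evaluated first,
   so "first" may be 1 or 2. */
struct pair
order_dependent(void)
{
    counter = 0;
    return make_pair(next_id(), next_id());
}

/* Order-independent: whichever call runs first, the total is 1 + 2. */
unsigned
order_independent(void)
{
    counter = 0;
    return next_id() + next_id();
}
```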

The mixing of compile-time execution and generics can be confusing. When generic procs are initially compiled, actual types or values that they will be dealing with will not yet be known. In some senses, dummy types and values will be used, although this is not an explicit choice; rather, it falls out of what has to happen.

When generic procs are instantiated, some or all of the types and values will be known. Procs marked as 'ctProc', 'varProc' or 'ioProc' will be re-executed as part of the instantiation process. This is normally what is desired, and happens silently, but if such execution has side effects (such as printing out debugging messages), the behaviour can become visible. Note also that calls to 'ctProc' procs (but not the other kinds of compile-time procs) within template sections are delayed until template expansion time, which means that such procs cannot yield "Exec/Exec_t" results, since the actual result type of the call is then not known until template expansion time.

18.2 Package-Level "eval"

This package-level statement has been seen in previous examples. Syntactically, a package-level 'eval' is simply:

'eval'
call to 'ctProc' proc

The 'ctProc' proc is called at that point in processing the package or subpackage containing the 'eval'. The main use of this package item is to attach values to types which have just been declared. These values are often procs, like a custom "Fmt" proc, or handlers for "#" operators.

Examples:

    type Vector_t = [] uint;
    eval FmtAdd(Vector_t);


    record Tracked_t {
        ...
    };

    proc
    trackedNeg(Tracked_t nonNil tr)nonNil Tracked_t:
        ...
    corp;

    Exec/CompileTimeUnary_t: proc
    trackedPostAt(Tracked_t nonNil tr)nonNil Tracked_t:
        ...
    corp;

    eval Types/ExportProcAdd(Tracked_t, Exec/HASH_NEG, trackedNeg);
    eval Types/ExportProcAdd(Tracked_t, Exec/HASH_POST_AT, trackedPostAt);


    struct Multivalued_t {
        ...
    };

    proc
    multivaluedPlus(@ Multivalued_t aDst; @ ro Multivalued_t src)void:
        ...
    corp;

    eval Types/ExportProcAdd(Multivalued_t, Exec/HASH_PLUS, multivaluedPlus);

Because these calls are happening directly during compilation, all of the actual parameters to the calls must be known at compile time. So, no variables can be used, and no proc calls can be used, unless the procs called are also 'ctProc' procs. This can be a nuisance, but can often be worked around by using simple 'ctProc' wrapper procs. [The error here is usually "Cannot have executable code outside of a proc". The 'eval' itself works because, for a call to a 'ctProc' proc, the compiler creates an anonymous stub proc containing only that call. Calls to normal procs do not get that special treatment: to yield a parameter to pass to the 'ctProc' proc, such a call would need to execute before the compiler has created the stub proc.]
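Conceptually, "Types/ExportProcAdd" records a proc against a (type, operator) pair so that later uses of the operator on that type can find it. The C sketch below assumes a simple table indexed by type and operator. That storage scheme is an assumption, not a description of how the Zed compiler keeps these attachments, and all of the names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins for type identities and "#" operator codes. */
enum type_id { TY_TRACKED, TY_MULTIVALUED, TY_COUNT };
enum hash_op { HASH_NEG, HASH_PLUS, HASH_COUNT };

typedef int (*op_handler)(int);

/* One slot per (type, operator) pair; NULL means "not supported". */
static op_handler handlers[TY_COUNT][HASH_COUNT];

/* Roughly what attaching a proc to a type does: remember it. */
void
export_proc_add(enum type_id t, enum hash_op op, op_handler h)
{
    handlers[t][op] = h;
}

/* When the operator is used on a value of the type, look the proc up. */
op_handler
find_handler(enum type_id t, enum hash_op op)
{
    return handlers[t][op];
}

int
tracked_neg(int v)
{
    return -v;
}
```

After "export_proc_add(TY_TRACKED, HASH_NEG, tracked_neg)", a lookup for that pair yields "tracked_neg", while pairs with nothing attached yield NULL.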

18.3 Zed Compiler Internals

Warning: things are about to get "interesting".

Internally, the Zed compiler maintains an unusually clear and strong separation between its parser and its semantic code. This means that the parser, the code that understands the syntax of the Zed language, is often much simpler than similar code in other compilers. The code simply uses temporary state structures exported by the semantic code, and calls entry points into the semantic code, passing subsequent items that it has parsed.

For example, here is the actual parsing code for the 'while' construct from the original C parser for Zed:

    static Exec_Exec_t *
    parseWhile(parse_State_t *ps)
    {
        Exec_TempWhile_t twh;
        Exec_Exec_t *ex;

        Lex_GetToken(ps->ps_ls);        /* consume 'while' */
        Exec_WhileStart(&twh, ps->ps_context);
        Parse_ParseSequence(ps);
        Exec_WhileCondition(&twh);
        parse_expect(ps, lex_tkDo);
        Parse_ParseSequence(ps);
        ex = Exec_WhileNew(&twh);
        parse_expect(ps, lex_tkOd);
        return ex;
    }

[Compiler writers will note the lack of any syntax error recovery!]

Type "Exec_TempWhile_t" is a struct type used to track the progress of defining the 'while'. Procs "Exec_WhileStart", "Exec_WhileCondition" and "Exec_WhileNew" are entry points in the Zed compiler semantic code which are used to check and construct internal data structures for the 'while' construct.
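The semantic side of this split can be sketched in C as follows. This is a simplified, hypothetical model, not the Zed compiler's code. The real "Exec_WhileCondition" takes no node, because the parsed sequence is collected through the context; here the nodes are passed explicitly to keep the sketch self-contained. The point is that the temporary struct tracks progress, and all checking lives in the entry points rather than in the parser.

```c
#include <stdlib.h>

/* Hypothetical, much-simplified node built by the semantic code. */
typedef struct Node {
    const char *kind;         /* e.g. "while", "expr" */
    struct Node *cond;        /* for "while": the condition */
    struct Node *body;        /* for "while": the loop body */
} Node;

/* Temporary state, in the spirit of "Exec_TempWhile_t". */
typedef struct {
    Node *cond;
    int stage;                /* 0 = started, 1 = condition supplied */
} TempWhile;

static Node *
node_new(const char *kind)
{
    Node *n = calloc(1, sizeof *n);
    if (n != NULL)
        n->kind = kind;
    return n;
}

void
while_start(TempWhile *tw)
{
    tw->cond = NULL;
    tw->stage = 0;
}

void
while_condition(TempWhile *tw, Node *cond)
{
    tw->cond = cond;
    tw->stage = 1;
}

/* Semantic checks happen here, not in the parser. */
Node *
while_new(TempWhile *tw, Node *body)
{
    Node *n;

    if (tw->stage != 1 || tw->cond == NULL)
        return NULL;
    n = node_new("while");
    if (n != NULL) {
        n->cond = tw->cond;
        n->body = body;
    }
    return n;
}
```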

Zed versions of all of the compiler semantic code are available to be called by any programmer - they are part of the standard Zed libraries. This has many consequences.

These semantic routines can be called at run time to dynamically create code and new procs. Such newly created procs can then be called by subsequent code. The procs and code are subject to all of the normal rules and activities of the Zed compiler, including compile time evaluation, optimization, native code generation, etc.

The semantic routines can also be called at compile time, from compile-time procs. This is a very powerful ability, as it provides ways for programmers to add new capabilities to the Zed language.

For various technical reasons, all code that runs at compile time runs within the Zed bytecode system. That limits the efficiency of such compile time execution, but in practice is not a problem.

Note that user-written compile time code cannot change the semantics of existing Zed constructs and facilities - allowing that would violate the basic Zed rules of safety, correctness and readability.

There are four basic packages in the Zed compiler semantic code which are most useful to programmers:

Package Package provides the definition of the internal structure of packages. Packages contain a vector of their elements: comments, procs, interfaces, variable definitions, etc. Exported procs allow the addition of new elements to packages. Type "Package/Package_t" is a simple record type.
Package Types provides the internal representation of types within the Zed language, along with exported procs for computing type properties, creating new types, etc. Type "Types/Type_t" is a variant record type.
Package Proc provides the internal representation of procs. It also exports procs allowing the creation of new procs. Type "Proc/Proc_t" is a simple record type.
Package Exec provides the internal representation of Zed language statements, expressions, constructs, etc. Many subtypes, helper types, utility procs, etc. are exported. Type "Exec/Exec_t" is a variant record type. One of its fixed fields is "ex_type" which is a "Types/Type_t" giving the type of the value (or whatever) described by the "Exec_t".

In the Zed language types are "first class" items. That means they can be used as values. The values are of type "Types/Type_t". Type values can use any of the existing syntaxes for defining types. Examples:

    proc
    typeIsInt(Types/Type_t nonNil t)bool:
        t = uint or t = sint
    corp;

    proc
    typeIsVector(Types/Type_t nonNil t)bool:
        if select md := t->t_matrix then
            /* A type is a vector if it is a matrix with 1 dimension. */
            md->md_dimCount = 1
        else
            false
        fi
    corp;

    ...

    Types/Type_t nonNil MyType_t := [] string;
    ...
        if typeIsVector(MyType_t) then
            ...
        fi;

The Zed language's constant folding and conditional compilation interact fully with types as values:

    bool IS_SIGNED = true;
    ...
    proc
    doSomething(if IS_SIGNED then sint else uint fi param)string:
        ...
    corp;

This sort of thing can be difficult to read and understand, so should not be overused and should be well commented.
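Types are not values in C, so the closest common equivalent of the "doSomething" example uses the preprocessor to choose a typedef. The sketch below is illustrative only, and "param_t" and "do_something" are hypothetical names. Unlike Zed, the choice cannot be made with an ordinary conditional expression in the parameter list.

```c
/* Compile-time flag, playing the role of "IS_SIGNED". */
#define IS_SIGNED 1

/* The preprocessor keeps one typedef; the other never reaches the compiler. */
#if IS_SIGNED
typedef int param_t;
#else
typedef unsigned int param_t;
#endif

/* Reports which kind of type was chosen for the parameter. */
const char *
do_something(param_t param)
{
    (void)param;
    return (param_t)-1 < 0 ? "signed" : "unsigned";
}
```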

When the Zed compiler is compiling code, creating procs and declarations, etc., it needs a context within which to work. That context is provided by simple record type "Package/PContext_t". One field of that record is "pctx_ectx" which, if not 'nil', references a record of type "Exec/EContext_t". That record references the current proc being compiled, and contains other active state relating to the working of package Exec. The PContext_t also contains a reference to the current package and various other information. PContext_t's are passed to many calls within the four main semantic packages.

When a compile-time proc is called, the Zed compiler can pass the current PContext_t to it, as an implicit first parameter. All the compile-time proc has to do is declare an initial parameter of that type - the compiler will do the rest. Calls to the compile-time proc should not provide an explicit value for the PContext_t.
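The effect of this hidden parameter can be imitated in C with a macro that supplies the context argument, so callers never write it. This is only an analogy. "PContext", "current_ctx" and "add_one" are hypothetical, and the Zed compiler does not use macros to do this.

```c
#include <limits.h>

/* Hypothetical stand-in for "Package/PContext_t". */
typedef struct {
    const char *packageName;
    unsigned errorCount;
} PContext;

/* The context the "compiler" is currently working in. */
static PContext current_ctx = { "example", 0 };

/* The proc as written: its first parameter is the context. */
static int
add_one_impl(PContext *pctx, int value)
{
    if (value == INT_MAX) {
        pctx->errorCount++;   /* record the error against the context */
        return value;
    }
    return value + 1;
}

/* Callers omit the context; the macro supplies it implicitly. */
#define add_one(value) add_one_impl(&current_ctx, (value))
```

A call is written as "add_one(41)", and the error count in "current_ctx" rises only when the proc reports a problem.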

The Zed language does not specify exactly when compile time execution happens. It happens during "compile time", but that concept is not defined. Thus, compile time execution should normally do the same thing and yield the same result every time it happens. It is not required to do so, but there are few uses for code that does not.

[Consider a situation where code is being edited in an interactive development environment. As changes are made, code is recompiled as needed. If that code contains compile time execution, that execution will happen as needed by the environment. If nearby code depends on values from other packages, changes in those other packages can trigger recompilation, and thus re-execution of the compile time code. Similarly, when code is expanded from a byte-stream, such as when it is read from disk or from a network connection, compile time execution might be triggered.]

18.4 Examples

Putting a bunch of things together, and using a number of entry points that programmers will have to look up, we can make a complete example which uses types and semantic entry points from the Package, Exec and Proc packages:

  1    proc
  2    createProc()void:
  3       /* We need a PContext_t for "." in order to create a proc in it. */
  4       Package/PContext_t nonNil pctx :=
  5           Package/CreatePContext(nil, nil, ., 1, 1);
  6
  7       /* Start defining a proc using that PContext_t. */
  8       Proc/TempDefineProc_t tdp;
  9       pctx := Proc/DefineProcStart(@tdp, pctx, "myProc", Package/nl_private,
 10                                    Proc/pt_regular);
 11       /* There are no formal parameters, but the proc returns 'uint'. */
 12       Proc/DefineProcMiddle(@tdp, uint, false);
 13
 14       /* Create a simple Exec_t tree for the proc: "123 + 456" */
 15       Exec/Exec_t nonNil left := Exec/UintConstantNew(nil, 123);
 16       Exec/Exec_t nonNil right := Exec/UintConstantNew(nil, 456);
 17       Exec/TempBinary_t tbin;
 18       Exec/BinaryStart(@tbin, pctx, left, Exec/bo_add);
 19       Exec/Exec_t nonNil sumExpr := Exec/BinaryNew(@tbin, right);
 20
 21       /* Complete the proc definition. */
 22       Exec/SequenceAppend(pctx, sumExpr);
 23       if assign Proc/Proc_t pr := Proc/DefineProcEnd(@tdp) then
 24           /* Attempt to call the new proc. */
 25           if assign proc()uint nonNil f := pr then
 26               Fmt("proc assign succeeded - calling it");
 27               uint answer := {f}();
 28               Fmt("Done it - got answer ", answer);
 29           else
 30               Fmt("proc assign failed!");
 31           fi;
 32       else
 33           Fmt("Failed to create new proc!");
 34       fi;
 35    corp;

In this example, long-form declarations have been used throughout, in order to show the system types involved. In typical code, this would not be done.

Line 4 shows creating a new "PContext_t". Most code will not have to do this - it will use an active one provided by the compiler. Note the syntax '.' for the package to use. Normal shell-level paths can be used directly in Zed code - they are evaluated at compile time. So, '.' evaluates to the current package when "createProc" is being compiled, which is the package within which "createProc" is defined.

Line 8 shows the use of one of many temporary state 'struct's, this one used during the proc creation process.

Lines 9 and 10 start the proc creation process. Note that a new "PContext_t" is obtained and used thereafter. This new version has the proc being created ("myProc") as the active proc, within which executable code is being created. This is important for things like looking up formal parameter names. "nl_private" is a "Package/NameLevel_t" which corresponds to having no 'export' or 'local' in front of a symbol being defined. "pt_regular" is a "Proc/ProcType_t" which means the proc we are creating will not be 'ctProc' or some other special kind.

Our proc has no parameters, so there are no calls to "Proc/AddFormal". Line 12 specifies that the proc returns a 'uint', and the result is not 'nonNil' ('nonNil' is not possible for 'uint'). Note that the calls here for "Start", "Middle" and "End" are wrappers for more detailed calls for phases 1 through 8 - this example did not need the additional control allowed.

Lines 14 - 19 define the body of the proc. Here, the body is simply the addition of 'uint' constants "123" and "456". Disassembling the generated bytecode for "myProc" shows that the body simply pushes "579" onto the stack and returns it. This is because constant folding is done inside "BinaryNew".
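The idea of folding inside the node constructor can be shown with a minimal C sketch. This is not the Zed implementation. "Expr", "uint_constant_new" and "binary_add_new" are hypothetical and only handle 'uint' addition. When both operands are already constants, the constructor returns a single constant node instead of an addition node.

```c
#include <stdlib.h>

/* Hypothetical expression node: a constant, or an addition. */
typedef struct Expr {
    int isConst;
    unsigned value;              /* valid when isConst */
    struct Expr *left, *right;   /* valid when !isConst */
} Expr;

Expr *
uint_constant_new(unsigned v)
{
    Expr *e = calloc(1, sizeof *e);
    if (e != NULL) {
        e->isConst = 1;
        e->value = v;
    }
    return e;
}

/* Builds "left + right", folding to one constant when possible. */
Expr *
binary_add_new(Expr *left, Expr *right)
{
    Expr *e;

    if (left == NULL || right == NULL)
        return NULL;
    if (left->isConst && right->isConst) {
        e = uint_constant_new(left->value + right->value);
        free(left);              /* operands are consumed by the fold */
        free(right);
        return e;
    }
    e = calloc(1, sizeof *e);
    if (e != NULL) {
        e->left = left;
        e->right = right;
    }
    return e;
}
```

With "123" and "456" as operands, the result is a single constant node holding 579, which matches what the disassembly of "myProc" shows.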

The convention within the Zed compiler is that you don't pass an explicit "Exec/Exec_t" for the proc body. Instead, you append elements to a "scope" and "sequence" which are created inside "DefineProcMiddle". That scope is then ended and used as the body of the proc, inside "DefineProcEnd".

On line 23, we end the proc definition and test to see if it worked. Note that "DefineProcEnd" returns a "Proc/Proc_t" since it must handle any type of proc (any parameter set, result type, etc.). So, an 'assign' construct is needed to check (during "createProc"'s execution) that the proc signature is what we expect - line 25. After that checking, we call the new proc on line 27.
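The 'assign' check on line 25 is roughly like checking a recorded signature before converting a generic function pointer back to a specific type in C. The sketch below assumes a single signature tag per proc. "Proc", "Signature" and "as_void_to_uint" are hypothetical, and a real compiler records far more detail than this.

```c
#include <stddef.h>

/* Hypothetical signature tags. */
typedef enum { SIG_VOID_TO_UINT, SIG_UINT_TO_UINT } Signature;

/* A generic proc handle, loosely like "Proc/Proc_t". */
typedef struct {
    Signature sig;
    void (*code)(void);          /* stored generically; cast back to call */
} Proc;

typedef unsigned (*VoidToUint)(void);

static unsigned
my_proc(void)
{
    return 579;
}

Proc
make_my_proc(void)
{
    Proc pr = { SIG_VOID_TO_UINT, (void (*)(void))my_proc };
    return pr;
}

/* Hands back a typed pointer only if the signature is the expected one. */
VoidToUint
as_void_to_uint(const Proc *pr)
{
    if (pr == NULL || pr->sig != SIG_VOID_TO_UINT)
        return NULL;
    return (VoidToUint)pr->code;
}
```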

Note that new proc "myProc" will have been added to the current package. If "createProc" is called a second time, it will fail since there is already a symbol "myProc" in the package.

This example is not in fact a compile-time proc, but it shows some of the environment in which compile-time procs run.

The simplest form of compile-time proc is created by adding the reserved word 'ctProc' after the 'proc' in the proc definition. This tells the Zed compiler that the proc should be run at compile-time, with special handling for an initial "Package/PContext_t" parameter. Other kinds of compile-time procs are discussed later.

A simple 'ctProc' proc example:

  1    proc ctProc
  2    addOne(Package/PContext_t nonNil pctx; Exec/Exec_t nonNil ex)nonNil Exec/Exec_t:
  3        Exec/TempBinary_t tbin;
  4        if ex->ex_type = uint or ex->ex_type = sint then
  5            Exec/BinaryStart(@tbin, pctx, ex, Exec/bo_add);
  6            Exec/BinaryNew(@tbin, Exec/UintConstantNew(nil, 1))
  7        elif ex->ex_type = float then
  8            Exec/BinaryStart(@tbin, pctx, ex, Exec/bo_add);
  9            Exec/BinaryNew(@tbin, Exec/FloatConstantNew(nil, 1.0))
 10        elif ex->ex_type = string then
 11            Exec/BinaryStart(@tbin, pctx, ex, Exec/bo_add);
 12            Exec/BinaryNew(@tbin, Exec/StringConstantNew("one"))
 13        else
 14            Package/EmitErrorString(pctx, "addOne: parameter not int, float or string");
 15            Exec/ErrorNew(pctx)
 16        fi
 17    corp;
 18
 19    proc
 20    useAddOne()void:
 21        uint u := 1;
 22        sint s := 2;
 23        u := addOne(u);
 24        s := addOne(s);
 25        float f := addOne(3.0);
 26        string str := addOne("hello");
 27        bool flag := addOne(true);
 28    corp;

Parameter "ex" to proc "addOne" is of type "Exec/Exec_t nonNil". "Exec_t" is a variant record type, and a value will always be passed - the Zed compiler will not call "addOne" if no parameter is given on the call. If a call has 'nil' as the actual parameter, then an "Exec_t" of kind "exk_nil" will be passed.

For whatever chunk of code is present as the actual value to a compile-time proc parameter of type "Exec/Exec_t", the Zed compiler will pass an "Exec_t" which represents that value. On lines 23 and 24 that will be "exk_localVarRef". On line 25 it will be "exk_floatConstant", and on line 26 it will be "exk_stringConstant". Line 27 will pass "exk_boolConstant", but that is never used since "addOne" rejects all 'bool' values.

"addOne" returns a "nonNil Exec/Exec_t", representing the adding that it has chosen to do. That "Exec_t" replaces the call to "addOne" for further processing by the Zed compiler, all the way to code generation. As mentioned previously, a compile-time proc can also be 'void'. In that case, the Zed compiler assumes that the proc has used "Exec/SequenceAppend" to append code elements (typically statements) to the currently active sequence of statements. To avoid misleading operation, a compile-time proc that is not 'void' should not append code elements in that way.

The above example shows typical uses for the implicit "PContext_t" parameter that is available - for errors and as needed by the various Exec procs. "addOne"'s error message will be shown for line 27, since "addOne" doesn't support adding one to a 'bool' value. Both "EmitErrorString" and "ErrorNew" will mark the "PContext_t" as having an error, which will prevent code generation for "useAddOne".

This example only examines the type of the values represented by the parameter to "addOne", in order to correctly do the adding. More complex examples can look inside "Exec_t"'s and extract parts of them, often doing different code creation based on the nature of the "Exec_t"s.

It is possible to get quite devious and convoluted using compile time execution. The only real reason for much of that is in attempts to get past the many semantic checks in the Zed compiler. Sometimes, it is possible for a check to mark an error without actually emitting an error message (see "Package/AddError"). More common is that the error message comes out at an unexpected place. For example, the Zed compiler re-validates procs before generating any code for them - if an error is only generated during that process, the error message will come out in relation to the end of the proc.

18.5 Library "ctProc" Procs

The Zed system provides some pre-defined procs which are 'ctProc'. Because of their nature, they are intended to have an effect during compilation. Such procs exported from package Package are:

    export proc ctProc
    SetWarningLevel(PContext_t pctx; uint newLevel)uint;

    export proc ctProc
    SetOptLevel(PContext_t pctx; uint newLevel)uint;

    export proc ctProc
    SetInfoLevel(PContext_t pctx; uint newLevel)uint;

As with other 'ctProc' procs, the programmer does not supply the "Package/PContext_t" actual parameter - that is supplied automatically by the Zed compiler. Proc "SetWarningLevel" sets the current warning level to the supplied "newLevel", and returns the old warning level. Typically, the warning level is then restored after the section of code in which the value needed to be changed. The following example is from system package Types:

    uint OLD_WARN_LEVEL = Package/SetWarningLevel(1);
    rSize@ := Basic/BYTES_PER_CHAR * Basic/BITS_PER_BYTE;
    eval Package/SetWarningLevel(OLD_WARN_LEVEL);

If the "rSize" expression is evaluated at a high warning level, a warning is produced since "BYTES_PER_CHAR" is 1. I like to have the basic Zed sources all compilable without warnings at the highest warning levels, so this protection is needed to avoid a warning about multiplying by 1.
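The save, change and restore pattern used above relies only on the setter returning the previous value. Here is a C sketch with a hypothetical global setting standing in for the compiler's warning level.

```c
/* Hypothetical compiler setting. */
static unsigned warning_level = 3;

/* Sets a new level and returns the old one, like "SetWarningLevel". */
unsigned
set_warning_level(unsigned new_level)
{
    unsigned old = warning_level;
    warning_level = new_level;
    return old;
}

unsigned
current_warning_level(void)
{
    return warning_level;
}

/* Lower the level around one section, then put it back. */
unsigned
compile_quiet_section(void)
{
    unsigned old = set_warning_level(1);
    unsigned seen = current_warning_level();   /* this section sees 1 */
    set_warning_level(old);
    return seen;
}
```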

Proc "SetOptLevel" does a similar thing for the optimization level. The Zed sources do not currently use it - I might remove it, because I have a different mechanism in mind for controlling optimization at the expression and statement level.

Proc "SetInfoLevel" sets the information level. This could be used to make sure that certain information messages are produced, or to hide information messages.

18.6 "varProc" Procs

"varProc" procs are a kind of compile-time proc in Zed. This facility looks similar to the traditional C mechanisms ("varargs.h" and the standard "stdarg.h") for accessing function arguments, which allow a function to have an arbitrary number of arguments of arbitrary types. Zed 'varProc' procs do not operate at all like the C ones, but from the point of view of the using programmer, they appear similar.
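For reference, the C mechanism looks like this. The function runs at run time, receives no type information, and relies on the caller to pass a correct count and correctly typed arguments. A 'varProc' proc instead runs at compile time and sees each argument as an "Exec/Exec_t" whose type is known. "sum_uints" is an illustrative name.

```c
#include <stdarg.h>

/* Sums "count" unsigned arguments. Nothing checks that the count is
   right or that the arguments really are unsigned. */
unsigned
sum_uints(unsigned count, ...)
{
    va_list ap;
    unsigned total = 0;

    va_start(ap, count);
    for (unsigned i = 0; i < count; i++)
        total += va_arg(ap, unsigned);
    va_end(ap);
    return total;
}
```

A call such as "sum_uints(3, 1u, 2u, 3u)" yields 6.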

When a 'varProc' proc is called, the call looks like a normal proc call, except that there can be any number of any type of actual arguments. If the definition of the proc contains formal parameters, then those formal parameters must be matched by the first actual arguments given. 'varProc' procs can be 'void' or can return a result.

A 'varProc' proc is created by putting 'varProc' after the 'proc' in the proc definition. For example:

    proc varProc
    show(Package/PContext_t nonNil pctx; string nonNil header)uint:
        ...
    corp;

The "pctx" parameter must be used for a 'varProc' proc since that proc must pass it to the Zed system, as shown below. The value for "pctx" is provided implicitly by the Zed system.

The 'varProc' proc itself runs at compile time, and interacts with the Zed system to handle the variable arguments given to it. This interaction is done via exported interface "ActiveVarProcCall_t" defined as follows:

    export interface ActiveVarProcCall_t {
        proc AppendArg(poly avpc; Exec_t nonNil ex)bool;
        proc Complete(poly avpc)bool;
    };

To implement a 'varProc' proc, the programmer must provide a 'capsule' type which implements the above interface. Any 'record' data in the capsule is up to the programmer, but generally it is whatever is needed to accumulate or deal with the variable arguments.

After the Zed system has handled any fixed parameters to a call to a 'varProc' proc, it will call the proc itself, at compile time. The proc should create an object of its capsule type, initialized as needed, and pass it to "Exec/ProvideVarProcCallHandler", along with the active "Package/PContext_t".

After that, the Zed system will call the capsule's "AppendArg" method for each encountered variable argument. When there are no more such arguments, the Zed system will call the capsule's "Complete" method. The methods will typically use "Exec/SequenceAppend" to append any needed code to replace the 'varProc' call. The last such item will be the result of the overall 'varProc' call.

The two methods both yield a 'bool' result. This allows them to tell the Zed system that there has been an error in the call to the 'varProc' proc, by yielding 'true' on error cases. This will cause the Zed system to mark the call as being in error. Creating invalid code is another way to do this, but that can be quite confusing to the end user.

Remember, the calls to the methods happen at compile time, so the actual needed run time work cannot be done directly inside those procs. Instead, those procs need to create Zed code which will do the needed work at run time.
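The shape of this protocol can be sketched in C with a struct of function pointers standing in for the interface. Everything here is hypothetical. To keep the sketch short, the handler adds its arguments directly, whereas a real 'varProc' handler would instead append Zed code that does the adding at run time. As in "ActiveVarProcCall_t", each method returns 'true' to report an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C rendering of the two-method interface. */
typedef struct Handler {
    bool (*append_arg)(struct Handler *self, int arg);
    bool (*complete)(struct Handler *self);
} Handler;

typedef struct {
    Handler base;    /* first member, so a Handler * can be cast back */
    int total;
    size_t count;
} SumHandler;

static bool
sum_append(Handler *self, int arg)
{
    SumHandler *sh = (SumHandler *)self;
    sh->total += arg;
    sh->count++;
    return false;
}

static bool
sum_complete(Handler *self)
{
    SumHandler *sh = (SumHandler *)self;
    return sh->count == 0;   /* treat a call with no arguments as an error */
}

SumHandler
sum_handler_new(void)
{
    SumHandler sh = { { sum_append, sum_complete }, 0, 0 };
    return sh;
}

/* The caller's side: one append_arg per argument, then complete. */
bool
drive_call(Handler *h, const int *args, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (h->append_arg(h, args[i]))
            return true;
    return h->complete(h);
}
```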

The following is a complete example of a 'varProc' proc implementation:

    capsule SumActiveVarProcCall_t implements Exec/ActiveVarProcCall_t {
        record {
            Package/PContext_t nonNil sumvpc_pctx;
            Exec/Exec_t sumvpc_varRef;
        };

        procs Exec/ActiveVarProcCall_t {
            proc
            AppendArg(SumActiveVarProcCall_t nonNil sumvpc;
                      Exec/Exec_t nonNil valueEx)bool:
                Package/PContext_t nonNil pctx := sumvpc->sumvpc_pctx;
                if assign Exec/Exec_t varRef := sumvpc->sumvpc_varRef then
                    /* For each subsequent argument we get, add it to our temporary
                       variable. */
                    Exec/TempAssignment_t tass;
                    Exec/AssignmentStart(@tass, pctx, varRef);
                    Exec/TempBinary_t tbin;
                    Exec/BinaryStart(@tbin, pctx, varRef, Exec/bo_add);
                    Exec/Exec_t nonNil partSum := Exec/BinaryNew(@tbin, valueEx);
                    Exec/SequenceAppend(pctx, Exec/AssignmentNew(@tass, partSum));
                else
                    /* On our first call, declare a temporary variable to
                       accumulate into, and initialize it to the first value. Note
                       that its type comes from the first value. */
                    sumvpc->sumvpc_varRef :=
                        Exec/DeclareAndRefTemp(pctx, valueEx->ex_type,
                                               Types/StorageFlags_t{}, valueEx);
                fi;
                false
            corp;

            proc
            Complete(SumActiveVarProcCall_t nonNil sumvpc)bool:
                if assign Exec/Exec_t varRef := sumvpc->sumvpc_varRef then
                    Exec/SequenceAppend(sumvpc->sumvpc_pctx, varRef);
                fi;
                false
            corp;
        };
    };

    proc varProc
    Sum(Package/PContext_t nonNil pctx)nonNil Exec/Exec_t:
        Exec/ProvideVarProcCallHandler(pctx, SumActiveVarProcCall_t(pctx, nil));
        Exec/ErrorEx
    corp;

This code defines a "Sum" 'varProc' proc which accumulates the sum of the values passed to it, and returns that sum as its result. Note that the standard Exec code for binary operators is used to do the addition ("BinaryStart" and "BinaryNew"). That code can handle addition of signed and unsigned integers, floating point values and strings. The code above takes advantage of that flexibility to allow "Sum" to accumulate any of those types. The first part of how that is done is in the call to "Exec/DeclareAndRefTemp", where the "Types/Type_t" parameter value is taken from the first argument passed to "Sum" (as "valueEx"). That results in the temporary variable used for accumulation being declared with the same type as that first argument. The second part is in the result type of "Sum" itself, which is "Exec/Exec_t".

When a compile-time proc is declared to yield Exec_t, the Zed system takes the actual result type from the type of the yielded value. That way, different calls to the compile-time proc can yield different result types. If "Sum" had been declared to yield 'uint', for example, then it would be expected to yield a 'uint' via "Complete". Presumably, "Sum" would then only work with 'uint' arguments. The Zed system does not require that, however.

By way of explanation, consider the following call to "Sum":

    Sum(12.3, 45.6)

In some detail, the steps involved are:

Since "Sum" has no fixed formal parameters, other than the hidden "pctx" parameter, there are no initial parameters to be processed using the normal parameter handling mechanisms for Zed proc calls.
"Sum" itself is called. It creates an object of the capsule type, and passes that, along with "pctx", to "Exec/ProvideVarProcCallHandler". This sets up the interaction between the "Sum" system and the Zed system. Since "Sum" is declared to return a 'nonNil' "Exec/Exec_t", it must do so. The value will be totally ignored by the Zed system, so using "Exec/ErrorEx", which is a freely available error object, works fine. This oddity is needed because the declaration of a 'varProc' proc is where the result type of the proc is specified.
The Zed system parses the "12.3" argument. This could be any expression that "Sum" can work with, and is dealt with normally. Once the argument has been fully processed, the Zed system calls Sum's "AppendArg" method, providing it with the Exec_t representation of the floating point constant 12.3.
"AppendArg" checks the "sumvpc_varRef" data field of the capsule object and finds it to be 'nil'. So, it knows this is the first variable argument of this call. It uses "Exec/DeclareAndRefTemp" to create a variable in which to accumulate the values, and uses the passed argument Exec_t to initialize it. As mentioned above, the temporary variable is declared with the type of this first argument. The "Sum" code doesn't check for type consistency of its arguments - it lets the Zed binary operator code complain.
After the first call to "AppendArg" returns, the Zed system continues handling the call to "Sum". The parser will check for a separating ',' or a terminating ')'. In this case another argument, "45.6", is found. So, "AppendArg" is called again with that argument. On this call "AppendArg" finds that "sumvpc_varRef" is not 'nil'. So, it must create code to add the new argument to the sum already accumulated in the temporary variable. It does this by using Zed assignment and binary operator builders. See the Zed compiler code for more details on those. Here, the operator wanted is '+', so "Exec/bo_add" is passed to "BinaryStart" to specify that.
There are no more arguments to the call, so the Zed system will now call Sum's "Complete" method. That method uses "Exec/SequenceAppend" to yield the value of the temporary variable as the final result of the "Sum" call.
The Zed system will now do any additional work needed to complete the "Sum" call. In this case, since "Sum" was declared to yield an Exec_t, it will take the type of the sequence represented by that Exec_t and use that as the type of the entire "Sum" call. Note that if there were no arguments, there would be no calls to "AppendArg", and the sequence within the scope that Zed wraps around the entire operation would be empty. Zed internal code yields an empty 'void' value in that case, so the entire "Sum" call would yield 'void', i.e. no value.
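The step list above can be modeled in a few lines of Python. This is only an illustration. Generated Zed code is represented as plain strings. The names "SumCall", "append_arg", "complete" and "expand_sum" are hypothetical and are not Zed APIs. The temporary name "__L0" is assumed to match the one seen in the hidden code shown below. The model tracks the same state as "sumvpc_varRef" and performs the same two cases as "AppendArg".

```python
# Hypothetical Python model of the "Sum" varProc protocol. Generated Zed
# code is represented as strings; none of these names exist in Zed.

class SumCall:
    """Plays the role of a SumActiveVarProcCall_t object for one call."""

    def __init__(self, temp_name="__L0"):
        self.temp_name = temp_name  # name assumed for the generated temp
        self.var_ref = None         # like sumvpc_varRef: None until first arg
        self.sequence = []          # code appended to the call's sequence

    def append_arg(self, value, value_type):
        if self.var_ref is not None:
            # Later arguments: add the value to the accumulated sum.
            self.sequence.append(
                f"{self.var_ref} := {self.var_ref} + {value};")
        else:
            # First argument: declare the temporary, typed like the argument.
            self.var_ref = self.temp_name
            self.sequence.append(f"{value_type} {self.var_ref} := {value};")
        return False                # False means "no error detected"

    def complete(self):
        if self.var_ref is not None:
            self.sequence.append(self.var_ref)  # the value the call yields
        return False


def expand_sum(args):
    """Drive one call as the Zed system does: AppendArg once per argument,
    then Complete. args is a list of (source_text, type_name) pairs."""
    call = SumCall()
    for value, value_type in args:
        call.append_arg(value, value_type)
    call.complete()
    return call.sequence
```

Calling expand_sum([("12.3", "float"), ("45.6", "float")]) produces the same three lines that appear in the "test4" listing. An empty argument list produces an empty sequence, which corresponds to the 'void' result.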

As further explanation, consider a test proc which contains just:

    proc
    test4()void:
        Fmt("Sum is ", Sum(12.3, 45.6));
    corp;

"Fmt" is an 'ioProc', as described below, which also runs at compile time to replace itself with generated code. If we display the hidden code actually created for this test proc, we get:

    proc
    test4()void:
            fmtString0("Sum is ");
        fmtFloat00(
            float __L0 := 12.3;
            __L0 := __L0 + 45.6;
            __L0
);
        fmtFlushL();
    corp

This is poorly formatted, since the actual structure created is not one which a programmer following the Zed language as defined in this document would normally write - it is an internal form. In particular, the entire sequence of code for "Sum" is itself the argument to the internal "Fmt" proc "fmtFloat00". What concerns us here is the sequence:

    float __L0 := 12.3;
    __L0 := __L0 + 45.6;
    __L0

This is the sequence created by the "Sum" 'varProc' call.

"Sum" is a fairly simple example. Its only complexity is its allowance for multiple types of summation. Other 'varProc' procs could be used for output formatting, list or vector construction, etc.

This example proc does not do "constant folding". In the "createProc" example in "18.4 Examples", constant folding was done by "Exec/BinaryNew". Why doesn't that happen here? It doesn't happen here because the calls to "BinaryNew" all involve a run-time variable (created by "DeclareAndRefTemp"). "Sum" could be made more complex, to check for constant parameters, and so accumulate and yield a constant result when possible.
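The more complex "Sum" suggested above can be sketched as a Python model. This is a hypothetical variant, not code from the Zed distribution. Each argument carries its constant value when one is known at compile time. If every argument is constant, the whole call becomes a single constant. Otherwise the model falls back to the temporary-variable code. Integers are used so that the folded result is exact.

```python
def expand_sum_folded(args, temp="__L0", value_type="uint"):
    """Hypothetical constant-folding variant of "Sum".

    args is a list of (source_text, constant) pairs; constant is an int
    when the argument is a compile-time constant, otherwise None.
    """
    if args and all(const is not None for _, const in args):
        # Every value is known now, so the call can yield one constant.
        return [str(sum(const for _, const in args))]
    code = []
    for i, (text, _) in enumerate(args):
        if i == 0:
            code.append(f"{value_type} {temp} := {text};")  # declare temp
        else:
            code.append(f"{temp} := {temp} + {text};")      # accumulate
    if args:
        code.append(temp)                                   # yield the sum
    return code
```

A real implementation would test the Exec_t for a constant kind instead of receiving a ready-made value. It would probably also fold any constant prefix of the argument list.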

The implementation of "Sum" above involves several direct Exec calls. Some of those can be replaced by 'template' operations. Templates are described in "18.9 Templates". However, since templates require strict typing, the type flexibility is lost. Also, since "Sum" performs such a simple operation, the overheads involved in using templates are proportionately large. Recoding using templates with 'uint' arguments yields:

    capsule SumUTActiveVarProcCall_t implements Exec/ActiveVarProcCall_t {
        record {
            Package/PContext_t nonNil sumvpc_pctx;
            template uint sumvpc_varRef;
        };

        procs Exec/ActiveVarProcCall_t {
            proc
            AppendArg(SumUTActiveVarProcCall_t nonNil sumvpc;
                      Exec/Exec_t nonNil valueEx)bool:
                assert assign template ro uint con tValue := valueEx;
                if assign con tVar := sumvpc->sumvpc_varRef then
                    template begin
                        tVar := tVar + tValue;
                    end;
                else
                    con name := Proc/CreateNewName(sumvpc->sumvpc_pctx);
                    template begin
                        con template tVar uint name := tValue;
                    end;
                    sumvpc->sumvpc_varRef := tVar;
                fi;
                false
            corp;

            proc
            Complete(SumUTActiveVarProcCall_t nonNil sumvpc)bool:
                if assign con tVar := sumvpc->sumvpc_varRef then
                    template begin
                        tVar
                    end;
                fi;
                false
            corp;
        };
    };

    proc varProc
    SumUT(Package/PContext_t nonNil pctx)uint:
        Exec/ProvideVarProcCallHandler(pctx, SumUTActiveVarProcCall_t(pctx, nil));
        0
    corp;

In proc "AppendArg", we have declared parameter "valueEx" as an "Exec/Exec_t". That requires us to have the 'assert' to get a "template uint" out of it. This is needed because the 'varProc' interface specifies "Exec/Exec_t" for that parameter, and our implementation of the interface must match. We use "template ro uint" since we don't need write access and to allow passed values to be non-writeable, e.g. literals.

Note the somewhat strange construction in "Complete" - we have a template block, which itself is a statement (i.e. 'void'), whose contents are non-void. This is needed to allow the created block to yield the 'uint' result. There cannot be a ';' after the "tVar", since that would be trying to create a statement out of an expression. We cannot 'eval' "tVar", since that would result in "SumUT" yielding 'void'.

For further examples, see "18.9.7 Varargs Examples".

18.7 "ioProc" Procs

"ioProcs" are another kind of compile-time proc in Zed. They operate similarly to 'varProc' procs, but the interaction is more complex. "ioProcs" are designed for use with text output formatting routines, which play a role similar to "printf" in C. The primary example of these is standard package "Fmt" which implements a capable and general text output facility using "ioProcs". Also available is standard package "Debug" which re-uses code in "Fmt" to implement a debug print facility which allows arbitrary output, along with debug level control at both compile time and execution time.

When an 'ioProc' is used in code, a call to it must consist of:

path to the 'ioProc'
'('
zero or more regular parameters, separated by commas
zero or more ioProc value parameters, separated by commas, and separated from any regular parameters by a comma
')'

Each "ioProc value parameter" consists of a main expression - the value to be formatted, optionally followed by formatting codes and sizes. Using square brackets to indicate optional parts, these look like:

    <value> ['::' <format-codes>] [':' [<field-width>] [':' <precision>]]

The <format-codes> can be either a name or a string literal. Usually a name is provided, but the name is not looked up in any way - the string of it is simply passed in to the 'ioProc' handler to interpret as it wishes. The string literal form is provided to allow format codes to be given which are not valid names. Note that there must be a space between the colons when there is a precision but no field width, otherwise you have the '::' token.

Both <field-width> and <precision> must be expressions. The type of these expressions is usually 'uint', but the Zed compiler does not enforce that - it is up to the 'ioProc' code. The standard interpretation is that <field-width> gives the number of characters that the main expression value is to be formatted into, and that <precision> is the number of digits after the decimal point that should be shown for floating point values. One possible alternate interpretation of the <field-width> could be as a repetition count in a vector or list constructor 'ioProc'.
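The shape of an ioProc value parameter can be illustrated with a small Python parser. This is a simplified model. Real Zed allows full expressions for the value, width and precision, but here each part is assumed to be a single token. The function names are hypothetical. The parser shows why "123.456 : : 2" supplies a precision without a width, while "123.456 :: 2" is read as format codes "2".

```python
import re

# One token: '::', ':', a string literal, or a run of non-space, non-colon
# characters. Real Zed accepts full expressions here; this model does not.
TOKEN = re.compile(r'\s*(::|:|"[^"]*"|[^\s:]+)')


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_io_param(text):
    """Split one ioProc value parameter into its four parts."""
    tokens = tokenize(text)
    parts = {"value": tokens[0], "format": None,
             "width": None, "precision": None}
    i = 1
    if i < len(tokens) and tokens[i] == "::":      # '::' <format-codes>
        parts["format"] = tokens[i + 1]
        i += 2
    if i < len(tokens) and tokens[i] == ":":       # ':' [<field-width>]
        i += 1
        if i < len(tokens) and tokens[i] != ":":
            parts["width"] = tokens[i]
            i += 1
        if i < len(tokens) and tokens[i] == ":":   # ':' <precision>
            parts["precision"] = tokens[i + 1]
            i += 2
    if i != len(tokens):
        raise ValueError(f"unexpected tokens: {tokens[i:]}")
    return parts
```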

Using "FmtS" as an example, the following:

    FmtS(127 :: X0 : 3)
    FmtS(127 :: d : 10)
    FmtS(123.456 : 10 : 4, "|", 123.456 :: f0 : 12 : 2, "|", 123.456 :: e : 10)

would yield:

    "0x7f"
    "       127"
    "  123.4560|000000123.46|+0.12e+003"

The first two examples have only one actual parameter passed to "FmtS", while the third has 5 parameters, two of which consist only of the value to be formatted.

The first example has "format-codes" "X0", which tells "FmtS" to use hexadecimal output with a leading "0x" and leading 0's. That format requires 4 characters of output, which is more than the requested "field-width" of 3, so "FmtS" has silently used more characters of output than it was told to.

The second example supplies a "format-codes" of "d", which is simply the default decimal output for "FmtS", and a "field width" of 10, so 7 leading spaces are added.

The third example shows 5 different "ioProc value parameters" being passed to "FmtS". Two of them are the string literal "|", which "FmtS" copies directly to the output. The other 3 are 3 different formattings of the same floating point value. The first has no "format-codes", so "FmtS" will use default formatting for floating point values, which is "F format" for medium-sized values and "E format" for larger or smaller values. A "field-width" of 10 has been requested, and a "precision" of 4. So, "FmtS" has added a trailing 0 to make 4 digits after the decimal point, and 2 leading spaces to make a total width of 10 characters. The remaining two outputs of the same value simply show more options that work with "FmtS".
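The interpretations described above can be reproduced with a hypothetical Python model. It covers only the format codes used in the first two examples and in the first two float formattings of the third. This is not the code of "FmtS". Several choices are assumptions: the 6-digit default precision, and the use of F format only (the real default switches to E format for very large or very small values). The "e" code is not modeled.

```python
def fmt_s(value, codes=None, width=0, precision=None):
    """Hypothetical model of a few FmtS format codes; not FmtS's code."""
    if isinstance(value, str):
        return value.rjust(width)             # strings are copied as-is
    if isinstance(value, int):
        if codes == "X0":
            digits = format(value, "x")
            # "0x" prefix plus leading 0's; never truncated below its size
            return "0x" + digits.rjust(max(width - 2, len(digits)), "0")
        return str(value).rjust(width)        # "d": plain decimal
    # Floats: F format only; 6 digits is an assumed default precision.
    text = f"{value:.{6 if precision is None else precision}f}"
    if codes == "f0":
        return text.zfill(width)              # leading 0's instead of spaces
    return text.rjust(width)                  # default: leading spaces
```

For example, fmt_s(127, "X0", 3) returns "0x7f", and fmt_s(123.456, "f0", 12, 2) returns "000000123.46".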

Note that just as the overall evaluation order of parameters to proc calls is not specified by the Zed language, the order of evaluation of the various parts of an 'ioProc' parameter is not specified.

Proc "FmtB" is a variant which writes its output into a "CharBuffer/OBuf_t" - its proc header is:

    export proc ioProc
    FmtB(Package/PContext_t nonNil pctx; Exec/Exec_t nonNil bufExec)void;

The presence of the 'ioProc' proc kind tells the Zed compiler how to handle calls to this proc. Here we see the usual "PContext_t" formal parameter, for which an actual value is implicitly passed by the Zed compiler. We also see a second required parameter, "bufExec", which is an internal representation of an expression which yields a "CharBuffer/Buf_t" for "FmtB" to append its output to. This is not directly a "CharBuffer/Buf_t" because a value of that type is required at run time, not at compile time. No explicit formal parameter is used for the arguments to be formatted.

Code provided for use with an 'ioProc' proc must interact with the Zed parser in order to receive the variable number of parameters following any fixed parameters, and to deal with the optional "format-codes" and sizes. This is done via interface "Exec/ActiveIoCall_t", defined as follows:

    export interface ActiveIoCall_t {
        proc PhaseHandler(poly aiocl; Exec_t ex; IoPhase_t iop)bool;
        proc FormatHandler(poly aiocl; string nonNil format)bool;
    };

Type "Exec/IoPhase_t" is an enumeration type defined as:

    export enum IoPhase_t {
        iop_main,               // the main value is being presented
        iop_width,              // the field width is being presented
        iop_precision,          // the precision is being presented
        iop_done,               // end of the individual IO call
        iop_complete            // end of the entire IO statement
    };

When the Zed semantic code is handling a call to an 'ioProc' proc, it calls the proc after it has gathered all of the fixed parameters that the proc requires. This call happens before any of the variable actual parameters have been processed. The code in the 'ioProc' must call back into the Zed Exec code, passing an object of a capsule type which implements "ActiveIoCall_t" to proc "ProvideIoHandler", which is declared as:

    export proc
    ProvideIoHandler(Package/PContext_t nonNil pctx;
                     ActiveIoCall_t nonNil aiocl)void;

After the 'ioProc' has returned, the Zed parser will begin dealing with the variable arguments. For each one, it will call into the provided "ActiveIoCall_t" handlers as follows:

    PhaseHandler(..., <Exec_t-for-value>, iop_main);

    if there is a format name or string then
        FormatHandler(..., <name-or-string>);
    fi;

    if there is a field width then
        PhaseHandler(..., <Exec_t-for-width>, iop_width);
    fi;
    if there is a precision then
        PhaseHandler(..., <Exec_t-for-precision>, iop_precision);
    fi;

    PhaseHandler(..., nil, iop_done);

After all of the variable arguments have been dealt with, the call:

    PhaseHandler(..., nil, iop_complete);

is made. This last call, along with the initial call to the 'ioProc' itself, mark the start and end of the variable arguments, if any. The "PhaseHandler" and "FormatHandler" procs can emit error messages via "Package/EmitError" as desired. They return 'false' if they detect no errors on that particular call, and 'true' if they detect an error.
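The order of handler calls can be modeled in Python. The "RecordingHandler" class is a hypothetical stand-in for an "ActiveIoCall_t" object: it just records each call. "drive_io_call" issues calls in the order given by the pseudocode above. One assumption is labeled here: the model keeps calling handlers after one reports an error, but the text does not say what the Zed system does in that case.

```python
def drive_io_call(params, handler):
    """Issue handler calls for one ioProc call, in the documented order.

    params: one dict per ioProc value parameter, with keys "value",
    "format", "width" and "precision" (None when a part is absent).
    Returns True if any handler reported an error.
    """
    error = False
    for p in params:
        error |= handler.phase(p["value"], "iop_main")
        if p["format"] is not None:
            error |= handler.format(p["format"])
        if p["width"] is not None:
            error |= handler.phase(p["width"], "iop_width")
        if p["precision"] is not None:
            error |= handler.phase(p["precision"], "iop_precision")
        error |= handler.phase(None, "iop_done")
    # Assumption: calls continue after an error; the text does not say.
    error |= handler.phase(None, "iop_complete")
    return error


class RecordingHandler:
    """Stand-in for an ActiveIoCall_t object: records each call it gets."""

    def __init__(self):
        self.calls = []

    def phase(self, ex, iop):              # models PhaseHandler
        self.calls.append((iop, ex))
        return False                       # no error

    def format(self, fmt):                 # models FormatHandler
        self.calls.append(("format", fmt))
        return False
```

For "FmtS(127 :: X0 : 3)", the recorded sequence is iop_main, the format "X0", iop_width, iop_done and finally iop_complete.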

See all of the previous material on compile time execution for information and examples relating to what an 'ioProc' can do. Consider the prototype for "Fmt/FmtS":

    export proc ioProc
    FmtS(Package/PContext_t nonNil pctx)nonNil string;

This 'ioProc' is declared to return a string. This means that the actual body of "FmtS" must yield a string, but that string value is totally ignored. "FmtS" runs at compile time, so it must arrange for the final code it adds to the current context to be code that yields the actual string result. This is typically done in the above "PhaseHandler(..., nil, iop_complete)" call. That final call might also be adding code that flushes all of the data so far accumulated.

18.8 "cTime" Procs

Some procs which run at compile time are short and self contained. Others can be much larger and require helper code. If that helper code is not intended for any run-time use, then it is helpful if there are ways to avoid it ever appearing in runtime binaries.

One way to do that is to split packages containing compile-time code into subpackages containing only compile-time code and subpackages containing code used at run time (and possibly also at compile time). Libraries intended for run-time use then only need to contain the latter: run-time-only code and code used at both compile time and run time.

Another way to accomplish this in Zed is to put the reserved word 'cTime' in the proc header. With that flag set, code generators for native code can skip generating code for such procs. 'cTime' should not be given with 'ctProc', 'varProc' or 'ioProc' since those flags already force compile-time-only execution, and code generators should also skip them.
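The rule a native code generator could apply is sketched below in Python. A proc header is modeled as a set of flag words. The function names are hypothetical. "redundant_ctime" flags the combination that the text says should not be written.

```python
COMPILE_TIME_KINDS = {"ctProc", "varProc", "ioProc"}
COMPILE_TIME_ONLY = COMPILE_TIME_KINDS | {"cTime"}


def procs_to_generate(procs):
    """Names of procs a native code generator would emit code for.

    procs is a list of (name, header_flags) pairs.
    """
    return [name for name, flags in procs
            if not (flags & COMPILE_TIME_ONLY)]


def redundant_ctime(flags):
    """True if 'cTime' is combined with a kind that already implies it."""
    return "cTime" in flags and bool(flags & COMPILE_TIME_KINDS)
```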

18.9 Templates

[The current version of templates in Zed is the second attempt for them. It uses most of the implementation from the first attempt, with some new parts added. I gave up on the first attempt because they didn't work for the things I wanted them to work for. However, I didn't remove them from the compiler, hoping that I could come back to them with better success.

I was able to do that - templates are indeed useful, just not for the things I was initially trying to use them for. The first version had a concept called "dynamic variable", which syntactically looked like an expression inside curly braces. The theory was that the expression yielded a string which was to be used as the name of a variable at template expansion time. That just doesn't work, but the idea of "template variable" does work and so that is what has survived.

I have been unable to convince myself that template usage is 100% safe. There may be ways to do things with templates (that you couldn't also do with the exported Exec/Types/... procs) that could lead to semantic violation of Zed's integrity. My recent concerns had to do with scopes and sequences, and programmers finding ways to trick the compiler into allowing invalid references to local variables, etc. See, for example .../Test/TestOk/LocalCheat.zed, although the testing there doesn't use templates. Also, are there any things which you can do with templates that you can't do with Exec/etc. calls?]

18.9.1 Template Types

Template types are like '@', '*' and 'path' types in form:

'template'
optional storage flags ('con', 'ro', 'volatile', 'nonNil')
templated type

Template values are tracked values. See the discussion below for the meaning of template type storage flags.

18.9.2 Template Introduction

The first example in section 18.3 has only a very simple piece of code being created: "123 + 456". Doing that took 6 lines of code. More complex pieces of code require correspondingly more lines of code to manually create, and there are often more requirements placed on the creation process. It very quickly becomes a large task to create complex code. Because so many Exec and other procs and types are involved, there is a maintenance burden as well.

The main solution to this problem in Zed is 'template's. Templates appear within, and are "executed" within, procs running at compile time. This lets them be conditionally and repetitively applied. Syntactically, the body of a template is normal Zed code. The key thing about it is that it can refer to "template variables" which provide pieces of executable code to be inserted into the template during the process of "template expansion".

Here is a simple example:

    proc ctProc
    Inc1(template uint tU)void:
        template begin
            tU := tU + 1;
        end;
    corp;

    proc
    useInc1()void:
        uint i := 7;
        Inc1(i);
    corp;

This example is overly simple. "Inc1" is a compile-time proc that expects a single parameter which is the description of a 'uint' value (in particular, one which can be modified). It uses that description twice in order to add 1 to its value. "useInc1" shows a use of "Inc1". The procs together are equivalent to:

    proc
    useInc1()void:
        uint i := 7;
        i := i + 1;
    corp;
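Template expansion can be illustrated with a Python model that works on text. Here the call "Inc1(i)" binds the template formal "tU" to the code for "i". Expanding the template section replaces each use of "tU" with that code. This is a hypothetical simplification. Zed substitutes Exec_t trees rather than text, so it never needs the parenthesization that a text model would need when binding a larger expression.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand_template(section, bindings):
    """Replace each template variable in section by the code bound to it.

    Only whole names are replaced; unbound names are left unchanged.
    """
    return NAME.sub(lambda m: bindings.get(m.group(0), m.group(0)), section)
```

For example, expand_template("tU := tU + 1;", {"tU": "i"}) gives "i := i + 1;", which is the statement in the equivalent "useInc1" above.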

Any type T can have 'template' put in front of it, including 'void'. The resulting type means 'an Exec_t whose "ex_type" field is T'. There can be storage flags 'con', 'ro', 'volatile' and 'nonNil' between the 'template' and the subtype, similar to how storage flags can be used with '@', etc. types. The compatibility rules regarding the storage flags are the same for 'template' types as for pointer, etc. types.

When a compile-time proc with one or more template parameters is called, the effect is the same as if those parameters were of type "Exec/Exec_t" except that additional checking is done to make sure the values passed match the template types and storage flags. As such the values will always be 'nonNil' (some "Exec_t" will be passed, even if it is one representing 'nil' - "exk_nil"). A template value created with a template expression is also 'nonNil'. Template formals and locals can be explicitly assigned value 'nil', and if they are nil when they are needed during template expansion, an error occurs.

Because a template value is an "Exec/Exec_t" value, any template value is compatible with "Exec_t". The reverse is not true - template types are more restrictive because they include a specific subtype, and because they have storage flags which must be satisfied. Getting a template value from an "Exec_t" requires an 'assign' construct, either the 'assert' form or the 'if' form. Both involve a run-time type check, but since such code is almost always running during compilation, that cost is not significant.

If a template value does not have the 'con' or 'ro' template attribute, the value it represents can be assigned to in template sections (see below). If a compile-time proc is passed a constant for a template formal parameter without 'con', an error occurs. In particular, values, like literal constants, cannot be assigned to. If a template parameter is never used as a destination in a template section, declare it as 'template' 'ro' to avoid this problem. For many situations, 'template' 'con' actually works better, since the iterator variables in 'for' loops are 'con' within the loop body. Note that the 'template' 'ro' is independent of any 'ro' attribute on the template parameter itself, which simply prevents assignments to the parameter within the proc.

Proc formal parameters of template types can be confusing, especially if the type involved is a type which can itself have storage flags. Consider the following formal parameter declaration:

    template ro @ volatile uint var tAU

The overall syntax for declaring a single formal parameter is <type> <optional storage flags> <name>. Since the formal name is clearly "tAU" ("template" "at" "uint"), we know that the final 'var' must be the storage flags associated with the proc formal itself. That 'var' means that the formal is allowed to change within the proc. We then know that "ro @ volatile uint" includes the storage flags for the 'template' type and the templated type. An '@' type can have storage flags, between the '@' and its subtype. So, the 'volatile' must be the storage flags of the '@' type. That leaves the 'ro' as the storage flags for the 'template' type. 'ro' here means that the represented template value cannot be used in a writeable way within the proc. And, we see that the templated type is "@ volatile uint".
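The right-to-left reading described above can be expressed as a small Python parser. It assumes that the storage-flag words are those named in this section. It handles only the 'template' and '@' type constructors ('*' and 'path' are omitted). The function names are hypothetical.

```python
# Assumed set of storage-flag words, taken from those named in this section.
STORAGE_FLAGS = {"con", "var", "ro", "volatile", "nonNil", "nilOk"}
WRAPPERS = {"template", "@"}   # type constructors that take storage flags


def parse_type(words):
    """Parse e.g. ['template', 'ro', '@', 'volatile', 'uint']."""
    if words[0] in WRAPPERS:
        j = 1
        while words[j] in STORAGE_FLAGS:   # flags between wrapper and subtype
            j += 1
        return {"kind": words[0], "flags": words[1:j],
                "sub": parse_type(words[j:])}
    return {"kind": " ".join(words)}


def split_formal(decl):
    """<type> <optional storage flags> <name>, read from the right."""
    words = decl.split()
    name, i = words[-1], len(words) - 2
    while i >= 0 and words[i] in STORAGE_FLAGS:   # the formal's own flags
        i -= 1
    return {"name": name, "formal_flags": words[i + 1:-1],
            "type": parse_type(words[:i + 1])}
```

Applied to "template ro @ volatile uint var tAU", it returns 'var' as the formal's flags, 'ro' on the 'template' type, and 'volatile' on the '@' type of 'uint'.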

Template expressions and template blocks must appear only within code which is running at compile time. Attempts to use them elsewhere cannot be detected by the Zed compiler, and so will result in run-time errors.

Code within template sections can reference package-level names, in the package containing the template code and in other packages such code utilizes. Such references can be valid because of active 'use' or 'import' clauses. The context into which the template body is being expanded might not have the needed clauses, and thus the references would be invalid there. This can be avoided by using absolute paths for such references. However, the Zed compiler simplifies things by automatically converting all such references within 'private' 'package' sections within template sections into absolute paths. This includes names within the active package itself. This conversion is invisible to the programmer, unless the programmer uses a disassembler on the resulting code, or displays the "alternate" code of the proc into which the template is expanded.

18.9.3 Template Blocks

Syntactically, a template block consists of:

'template'
'begin'
template section, which is like a proc body or 'if' body, etc.
'end'

When a template block is encountered in a running proc, the template section is copied, with all uses of template proc formals or local variables replaced by their current values. The resulting "Exec_t" is then appended to the open scope/sequence in the code containing the call to the running (at compile-time) proc. The effect of this can be seen in the example above. Note that expressions of template types other than direct references to proc formals or locals are not allowed within template sections. This minor restriction simply requires that such values be assigned to a local variable before the template section, and then the local variable can be used in the template section.

Most proc formals or locals of non-template types which are not declared within template sections cannot be referenced within template sections. Exceptions to this rule are the "simple" types 'bool', 'char', 'uint', 'sint', 'float' and 'string' along with "Exec/Exec_t". Those are discussed later. The other exception is within the "inserted variable name" expression in template variables, described below. An illegal example:

    ...
    Record_t nonNil r := Record_t(...);
    template begin
        uint j := r->uintField;
    end;

What would the assignment statement mean? "r" is a local variable which exists only during execution of the above code. Variable "j" will exist when the proc, within the compilation of which the above code is running, is itself finally running. The record object that "r" refers to might not even exist when that final code is running.

Values of the simple types listed above are handled at template expansion time. The current value of the formal or local is fetched and turned into a constant of that type, and that constant is used in the expanded code in place of the reference. A value of type "Exec/Exec_t" is handled similarly, except that no constant is created: the "Exec_t" itself is validated and inserted in place of the reference to the proc formal or local. [If you don't follow this, don't worry about it. If you can word it better for me, please do!]

For example:

    proc ctProc
    doCTStuff(template uint tU; uint u)void:
        for i from 1 upto 3 do
            template begin
                tU := tU + strict(i * 10 / u);
            end;
        od;
    corp;

    proc
    test2()void:
        uint myUint := 10;
        doCTStuff(myUint, 5);
    corp;

Here, "tU" is a template formal/variable, but "u" and "i" are just proc locals (not template). As "doCTStuff" executes its loop, they will have values at template expansion time. Those values will be treated as 'uint' literals in the code that is inserted into "test2", which here will be 3 assignment statements involving "myUint". The result will be equivalent to:

    proc
    test2()void:
        uint myUint := 10;
        myUint := myUint + 1 * 10 / 5;
        myUint := myUint + 2 * 10 / 5;
        myUint := myUint + 3 * 10 / 5;
    corp;

The 'strict' expression is used so that we can actually see the expressions involved - without it, constant folding simplifies them.
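The expansion of "doCTStuff" can be modeled in Python. The loop variable "i" and the parameter "u" are ordinary values while the compile-time proc runs, so they are spliced in as literals. The hypothetical "fold" switch shows what constant folding would produce without 'strict'. It assumes that uint division truncates, as Python's "//" does for non-negative integers.

```python
def do_ct_stuff(target, u, fold=False):
    """Model of doCTStuff: i and u are ordinary values at expansion time."""
    lines = []
    for i in range(1, 4):                        # for i from 1 upto 3
        if fold:
            term = str(i * 10 // u)              # folded uint arithmetic
        else:
            term = f"{i} * 10 / {u}"             # kept as written by strict
        lines.append(f"{target} := {target} + {term};")
    return lines
```

do_ct_stuff("myUint", 5) yields the three assignments in the equivalent "test2" above. With fold=True, the added terms become 2, 4 and 6.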

Within a template section, the 'template' is conceptually removed from the template type of the template formals and variables, and the Zed compiler does all of its checking based on those types. Any storage flags from the template type are applied. Template blocks are 'void' - they do not yield a value to the proc they are contained in. However, the code within them can yield a value, in which case there must be no ';' after that code.

Template sections cannot be nested.

Template sections can contain simple variable declarations. The declarations will become part of the code that the template is expanded into. The normal local variable name rules of Zed disallow a new local variable from overriding an existing local variable of the same name. That could make it more difficult than necessary to use template sections. So, that rule is relaxed for declarations coming from template sections: a variable declared within a template section can override a variable with the same name in an outer scope of the code the template section is being expanded into.

For example, a template block can contain a 'for' loop with an iterator variable named "i" without fear that the new iterator variable will clash with any existing "i" in the code that the template will be expanded into. This works because 'for' loops create their own scope, and the iterator variable is declared within that scope. In situations where scopes are not created (for example, if multiple template sections are used to create a given chunk of code), other solutions must be used to work around this problem. Note that the situation is the same with local constant definitions. See below for template variables, and 'begin'/'end'.

The Zed compiler does not insert an implicit scope around the body of a template block. This allows the pieces of a scope to be built up by the expansion of multiple template blocks during compile-time execution. This makes template blocks inappropriate for many uses that need to yield an expression. See the discussion of the examples below.

If a template block declares local names, but does not create an explicit scope to contain them, then the proc containing that template block cannot be used in a situation which does not allow declarations. The most notable such situation is that of being within an expression. For example, if a 'ctProc' proc returns a result and has such a template block, validation of a calling proc will typically get error "Declarations not allowed within expressions".

The nature of template sections leads to a slightly confusing situation, which in practice is unlikely to be encountered. If a template formal or local is of a writeable template type (i.e. the template type storage flags do not include 'con' or 'ro'), then that formal or local can be "assigned to" inside a template section, even if the formal or local itself is not writeable. If the formal or local itself is not writeable then it cannot be assigned to outside of a template section, regardless of the storage flags of the template type. Outside of template sections, the assignability is governed by the writeability associated with the proc formal or local variable itself. Inside of template sections the "assignability" is governed by the storage flags of the template type. Inside template sections the "assignment" is not trying to assign to the proc formal or local variable, but is creating an assignment to the entity that is referenced by the template formal or local.
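The two assignability rules above can be written as small Python predicates. The check on passing a non-writeable argument, such as a literal, to a template formal that lacks 'con' or 'ro' is included too. Both are a hypothetical restatement of the text, not compiler code.

```python
NOT_WRITEABLE = {"con", "ro"}


def may_assign(inside_template, formal_writeable, template_flags):
    """May a template formal/local be the destination of an assignment?"""
    if inside_template:
        # This creates an assignment to the entity the template value
        # refers to, so only the template type's storage flags matter.
        return not (NOT_WRITEABLE & set(template_flags))
    # Outside template sections this assigns to the formal/local itself.
    return formal_writeable


def may_pass(template_flags, argument_writeable):
    """A non-writeable argument (e.g. a literal) needs 'template con/ro'."""
    return argument_writeable or bool(NOT_WRITEABLE & set(template_flags))
```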

18.9.4 Template Expressions

Syntactically, a template expression consists of:

'template'
'('
a single expression or statement
')'

The expression or statement is a template section, just like the body of a template block.

A template expression yields a value of a template type which is 'template' of the type of the expression or statement, and thus can be "template void". The storage flags of the resulting template type are taken from the corresponding status of the expression, where that makes sense. New template variables declared and initialized within procs can be used just like template formal parameters to such procs.

If a template variable of type "template void" is used inside a template section, the value it contains will be an "Exec_t" representing a 'void' entity. As such, no further syntactic elements can occur with it - it is a statement all by itself. This is the only situation in the Zed language in which a proc formal or proc local name by itself is a complete statement. An example:

    proc
    voidTemplate(template uint tU)void:
        template void voidExec := template(tU := 0);
        ...
        template begin
            voidExec;
        end;
    corp;

As with template blocks, the Zed compiler does not insert a scope around the expression of a template expression.

18.9.5 Con and Var Template Declarations

The final template element is the "con template"/"var template" declaration. Syntactically, such a declaration consists of:

'con' or 'var'
'template'
optional storage flags for new template variable
name for new template variable
type, without 'template', for new template variable
optional storage flags for new inserted variable
expression yielding a string - name for new inserted variable
optional ':=' and initialization expression for inserted variable

If there is an initialization expression for the inserted variable, it is a template section, like the ones in template blocks and template expressions.

The first set of storage flags applies to the template variable that is declared within the proc containing the template section. If it does not include an explicit 'nilOk', then it implicitly includes 'nonNil', since the template variable is always given a nonNil value. If 'con' is used instead of 'var', then 'con' is automatically added to the storage flags for the template variable. This works the same way as using 'con' to declare a simple variable, which makes that variable 'con', i.e. not changeable.

The second set of storage flags applies to the variable that will be inserted into the proc being compiled (the proc for which the current proc is running at compile time). It should be whatever is appropriate for that variable. If 'con' is included, then there must be an initialization expression.

The expression for the name of the inserted variable can be 'nil', in which case a unique name is generated at template expansion time. Otherwise, it must be a string literal, a proc formal or a local variable. This name expression is not considered to be within a template section - it is evaluated normally. Note also that the full type must always be given - there is no syntax for taking an implicit type from the initialization expression.

For this explanation, assume there is a proc, "tProc", which runs at compile time, during the compilation of some other proc, "uProc". The following var template declaration within "tProc":

    var template tpltUintVar uint varName := func();

is "executed" inside "tProc" during the compilation of "uProc". "varName" is an existing parameter or local variable of type 'string' inside "tProc". Assume that "varName" has value "X378" during "tProc"'s execution. A local variable declaration declaring a variable named "X378" of type 'uint' will be added to "uProc". That variable will be initialized with the result of calling "func()". A local variable named "tpltUintVar" of type "template uint" is declared inside "tProc". It will be initialized with an "Exec_t" describing variable "X378" in "uProc". Thus, later code within template sections in "tProc" can refer to "tpltUintVar" to insert code into "uProc" which uses "X378".

For example:

    proc
    example(Package/PContext_t nonNil pctx; template uint nonNil tU)void:
        string nonNil varName := Proc/CreateNewName(pctx);
        template begin
            con template tAU @ uint varName := @tU;
            tAU@ := tAU@ + 1;
        end;
    corp;

Here, proc "example" first creates a new unique name for a temporary variable, using "Proc/CreateNewName". Then, inside a template block, it declares a variable of type "@ uint" inside the proc being compiled. That variable is initialized to the '@' of whatever value "tU" describes. Finally, "example" emits code into the proc being compiled which uses the new variable twice, to increment whatever the variable is the '@' of. This technique avoids evaluating whatever "tU" refers to more than once, which is usually desirable if that evaluation could have side effects. In this example, local variable "varName" can be removed, and its use in the 'con' 'template' statement can be replaced with 'nil'.
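Following that last suggestion, a version of "example" that lets the system generate the inserted variable's name might look like the listing below. It is a sketch derived from the text above, not a tested listing. The proc name "exampleNil" is made up for illustration. The "pctx" formal has been dropped, on the assumption that it was only needed for "Proc/CreateNewName". The comment is kept outside the template block so that it is not template expanded.

    /* 'nil' as the name: a unique name is generated for the inserted
       variable each time the template block below is expanded. */
    proc
    exampleNil(template uint nonNil tU)void:
        template begin
            con template tAU @ uint nil := @tU;
            tAU@ := tAU@ + 1;
        end;
    corp;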

As mentioned in section "18.1 Introduction to Compile Time Execution", compile-time procs should avoid both inserting statements into the current context and returning something to replace the compile-time proc call. If a situation requires multiple statements and a returned result, this can be done using an explicit scope:

    proc ctProc
    uintToBits16(template ro uint tU)nonNil template ro bits16:
        template(begin
            con template tVar bits16 con nil := tU;
            tVar
        end)
    corp;

This example creates a proc which can be used in expressions, and which converts a 'uint' expression to the 'bits16' type. This could be useful, for example, if a 16 bit byte-swap, or some other size-dependent operation, is needed and the original argument is a 'uint'. Note the use of an explicit 'begin'/'end' scope block within the parentheses of the template expression. It allows both a declaration and a result expression where otherwise only a single expression or statement is allowed.

Local variables defined inside template sections are defined as normal in the active scope (usually the one containing the template section). However, they are also defined inside whatever context into which the template material is inserted. For example:

    proc ctProc
    myTemplateProc(...)void:
        ...
        template begin
            float val := ...
            ... use val ...
        end;
        ...
    corp;

    proc
    myUsingProc(...)void:
        ...
        myTemplateProc(...);
        ...
    corp;

declares "val" inside "myTemplateProc", so that it can be used inside the template section. However, "val" will also be declared inside whatever proc "myTemplateProc" is executed for. So, there will be a local variable "val" inside of "myUsingProc". This can lead to errors complaining of a name already being in use if care is not taken. The usual solution to this problem is to put an explicit scope inside the template section, so that each such local name and its uses are inside that scope. Typical use of 'con'/'var' 'template' declarations allows for this by using 'nil' for the name of the variable to be inserted, so that the system will generate a new name for each template insertion.
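Applying that usual solution to "myTemplateProc" gives something like the sketch below. The "..." elisions are carried over from the example above, and the listing has not been checked against the compiler. Because "val" and its uses now sit inside their own 'begin'/'end' scope within the template section, "myUsingProc" should be able to call "myTemplateProc" more than once, or to have a "val" of its own, without a name clash.

    proc ctProc
    myTemplateProc(...)void:
        ...
        template begin
            begin
                float val := ...
                ... use val ...
            end;
        end;
        ...
    corp;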

It is possible to use fixed names (such as "val" above) and to make that name part of the specification of what "myTemplateProc" does. However, that technique is "brittle", in that it can be accidentally broken by a programmer who is unaware of it. This is similar to a C macro declaring or using a local variable with a fixed name.

18.9.6 Template Implementation

A language description doesn't normally include implementation details. However, some details might help in understanding Zed templates, and in predicting what a given piece of template code will do.

At the end of compiling any proc (here, "compiling" does not include code generation), the Zed compiler calls its validation routine. That routine is needed because compile-time procs might have tricked the semantic code into constructing an invalid "Exec_t" tree for the proc body. For example, a piece of code that is valid in one situation within a proc might not be valid in another situation. If compile time code has kept a reference to such a piece of code and subsequently uses it in some other place, the checks in the semantic code might not detect the problem. Another example is trying to use code from inside one proc in another proc.

The validation process recurses through the entire "Exec_t" tree, calling all of the semantic routines to rebuild the tree, and thus fully checks all of the pieces of the tree in the correct context. Conditional compilation, compile time code, etc. are not re-executed - only their results are validated and recreated.

When bytecode is running at compile time and encounters a template section, the validation proc is called on the "Exec_t" for that template section, in a special "expanding-template" mode. Within that validation, references to proc formals or locals of template types or of type "Exec/Exec_t" result in the retrieval of the values of those formals or locals from the bytecode engine. Those values are in turn processed in "expanding-template" mode, and the results are used instead of the references to those formals or locals. When a "con template" (or "var template") statement is encountered in this mode, internal structures for the declaration of the new variable in the proc being compiled are created and appended in place of the "con template". The template local within the running proc is assigned a newly created record referencing the newly defined variable.

For types 'bool', 'char', 'uint', 'sint', 'bits64', 'float', 'string' and template types or "Exec/Exec_t", template sections can reference proc formals or locals which are part of the proc containing the template being expanded. The values of the variables are retrieved from the bytecode engine. For types other than template types or "Exec/Exec_t", those values are wrapped in Zed's constant descriptor "Exec_t"'s and substituted into the template section being expanded. Values of template types or type "Exec/Exec_t" are validated and inserted in place of the formal or local reference. See "18.9.7 Varargs Examples" for examples using the latter. [It would be possible to handle 'enum' types and variant record tag types as well, but I've chosen not to, since that would require another pair of interfaces to access 16 and 32 bit values from the bytecode engine.]
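The following is a small sketch of the simple-type case. The proc name "emitSquares" is hypothetical, and the listing has not been run through the compiler. It is modelled on the way "Complete" in the first Varargs example uses its 'uint' locals. The 'for' loop runs at compile time. At each expansion of the template block, the current values of the 'uint' locals "i" and "sq" should be substituted as constants. The calling proc should therefore receive three "Fmt" calls with constant arguments, one per iteration.

    proc ctProc
    emitSquares()void:
        for i from 1 upto 3 do
            con sq := i * i;
            template begin
                Fmt(i, " squared is ", sq);
            end;
        od;
    corp;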

Compile-time procs are normally executed as the Zed compiler is "parsing" calls to them. However, for 'ctProc' procs called within template sections, the execution is delayed until template expansion time. This allows those procs to be called with parameters reflecting the code environment into which the template is being expanded, instead of seeing the template variables within the 'ctProc' proc. This doesn't matter if the proc is only looking at the types of things, but it does matter for procs which are examining the nature of the "Exec_t"'s that they are dealing with.

This delaying of the call does not happen with other kinds of compile-time procs, since those procs must interact with the "parsing" of their parameters. For example, 'ioProc's are given formatting codes, field widths, etc. as they are encountered in the calling code. That allows any errors they detect to be flagged at the proper location. So, if a call to a non-'ctProc' compile-time proc appears in a template section, it will be called at that point - while the template section is being compiled. Recall, however, that within a template section, the 'template' part of the type of template formals and locals is stripped off for the processing, thus allowing the usual set of operations available for the type. As far as the compiler and any such compile-time procs are concerned, it is as if the formal or local were simply of the non-'template' type. This works less well when the compile time code looks more deeply into the "Exec_t" records representing a proc formal or local that it sees - it will see the type of that formal or local as the full type with the 'template' present. This will rarely be an issue, but might be for complex compile time code. Such code can also see the storage flags for the formal or local, which might differ from those on the template type.

Since template sections cannot be nested (e.g. you can't have "template begin" inside a template block), a programmer might try to use manual calls to the proc call creation semantic procs to create a call to a 'ctProc' proc within a template block. Since those procs are not 'ctProc', nothing special will happen, and the sequence of calls will end up being inserted into the proc being compiled when the proc containing the template block is running. That is unlikely to be the desired result, unless the code is trying to build compile time code using compile time code.

Note that 'ioProc's like "Fmt" execute at compile time, so they are subject to the same issues as above. [So far, I have seen no problems - you can put "Fmt" calls in all of the interesting spots I've tried, and everything works properly. The constant-handling optimizations in the "Fmt" procs are not activated, however, because the types they see are not simple numeric types.] The origin of template expressions was as a means to create small pieces of code for use in template sections. However, they are not restricted to that - if a programmer has other uses for pre-built "Exec/Exec_t" structures, template expressions are a way to create them, with the limitations they are subject to.

The "Fmt" package does not use templates to implement its main functionality. Compile-time creation of custom formatting procs does use templates, however. The persistence code (the part which expands references to persistent items in Zed source code into appropriate calls to the persistence code) uses templates extensively. The template sections are not large, but there are many of them, and there are also many "assert assigns" involving template variables.

18.9.7 Varargs Examples

As has been mentioned above, proc formals or locals of a proc containing template sections can be used within those template sections if they are of appropriate types. Simple types have been described earlier. Use of "Exec/Exec_t" formals and locals is more subtle. This use will be illustrated by two versions of a "varargs" example. The first version is as follows:

    /* It would be nice to have this proc inside the capsule below, but that is
       not possible, since it is called from within a template section, and at
       template expansion time, there is no available capsule object to use for
       the reference. */

    proc
    handleArgs([] any args)void:
        con count := getBound(args);
        Fmt("handleArgs, count = ", count);
        if count ~= 0 then
            for i from 1 upto count do
                con val := args[i - 1];
                FmtN(i :: d : 3, ": ");
                if assign Basic/Bool_t b := val then
                    Fmt(b->theBool);
                elif assign Basic/Char_t ch := val then
                    Fmt("'", ch->theChar, "'");
                elif assign Basic/Uint_t u := val then
                    Fmt(u->theUint);
                elif assign Basic/Sint_t s := val then
                    Fmt(s->theSint);
                elif assign Basic/Float_t f := val then
                    Fmt(f->theFloat);
                elif assign string str := val then
                    Fmt("\"", str, "\"");
                else
                    Fmt("other tracked value");
                fi;
            od;
        fi;
    corp;

    capsule Varargs_t implements Exec/ActiveVarProcCall_t {
        record {
            Package/PContext_t nonNil va_pctx;
            Exec/TempExecs_t va_args;
        };

        procs Exec/ActiveVarProcCall_t {
            proc
            AppendArg(Varargs_t nonNil va; Exec/Exec_t nonNil ex)bool:
                if not Types/IsTrackedType(ex->ex_type) and
                    not Exec/AutoAnyWrap(va->va_pctx, @ex)
                then
                    /* Our varargs scheme cannot handle a value of this type.
                       Return true to tell the Exec code to reject this value. */
                    Package/EmitErrorString(va->va_pctx,
                                            "Varargs: bad argument type");
                    return true;
                fi;
                Exec/ExecsAppend(@va->va_args, ex);
                false
            corp;

            proc
            Complete(Varargs_t nonNil va)bool:
                con argCount := Exec/ExecsCount(@va->va_args);
                /* "argCount" is a 'uint' local variable. Therefore, it is usable
                   inside template code - it will appear as a 'uint' constant
                   during template expansion. */
                template begin
                    var template tVec [] any nonNil nil := matrix([argCount] any);
                end;
                for i from 1 upto argCount do
                    con arg := Exec/ExecsGetAt(@va->va_args, i - 1);
                    /* Note that we don't want this comment inside the template
                       section, since that results in it being template expanded!
                       That does nothing, but is wasted effort.

                       The fact that this works is subtle. "tVec" is of type
                       "template [] any". In the below template context that is
                       reduced to "[] any". "i" is accessible just as "argCount"
                       was above. "arg" is of type "Exec/Exec_t" and that also is
                       allowed inside template code. Since it is tracked, it is
                       compatible with 'any'. However, when this template code is
                       expanded, "arg" will likely *not* be of that type - it will
                       be of whatever type the "Exec/Exec_t" passed to "AppendArg"
                       has. "AppendArg" ensures that it is a tracked type, and so
                       this assignment is also valid during template expansion. */
                    template begin
                        tVec[i - 1] := arg;
                    end;
                od;
                template begin
                    handleArgs(tVec);
                end;
                false
            corp;
        };
    };

    proc varProc
    Varargs(Package/PContext_t nonNil pctx)void:
        con va := Varargs_t(pctx);
        Exec/ExecsInit(@va->va_args);
        Exec/ProvideVarProcCallHandler(pctx, va);
    corp;

If we use this implementation with the following test proc:

    proc
    testSmall()void:
        Varargs(13, 14.7, "Fred");
    corp;

and look at the "alternate" code created:

    proc
    testSmall()void:
            [] any nonNil __L0 := matrix([3] any);
        __L0[0] := Basic/Uint_t(13);
        __L0[1] := Basic/Float_t(14.7);
        __L0[2] := "Fred";
        handleArgs(__L0);
    ;
    corp;

we see the series of assignment statements generated by the 'for' loop in proc "Complete". The template section in that proc has used local variable "arg" (which is not a template variable, but is just a local variable of type "Exec/Exec_t" within "Complete") as the value being assigned into the vector element. Proc "AppendArg" had saved the various "Exec_t"'s of the argument values into "va_args", and the 'for' loop is retrieving them. In this version, all of the created code (i.e. both of the template sections) is created inside "Complete".

As the large comment in "Complete" explains, this technique only works for 'any' collections. Because of the use of 'any', proc "handleArgs" must use "if assign" statements (explicit run-time type checks) to access the values.

The second version of "varargs" is as follows:

    proc
    doHeader(uint i)void:
        FmtN(i :: d : 3, ": ");
    corp;

    capsule Varargs_t implements Exec/ActiveVarProcCall_t {
        record {
            Package/PContext_t nonNil va_pctx;
            uint va_count;
        };

        procs Exec/ActiveVarProcCall_t {
            proc
            AppendArg(Varargs_t nonNil va; Exec/Exec_t nonNil ex)bool:
                con i := va->va_count + 1;
                con t := ex->ex_type;
                if t = bool then
                    assert assign template ro bool b := ex;
                    template begin
                        doHeader(i);
                        Fmt(b);
                    end;
                elif t = char then
                    assert assign template ro char ch := ex;
                    template begin
                        doHeader(i);
                        Fmt("'", ch, "'");
                    end;
                elif t = uint then
                    assert assign template ro uint u := ex;
                    template begin
                        doHeader(i);
                        Fmt(u);
                    end;
                elif t = sint then
                    assert assign template ro sint s := ex;
                    template begin
                        doHeader(i);
                        Fmt(s);
                    end;
                elif t = float then
                    assert assign template ro float f := ex;
                    template begin
                        doHeader(i);
                        Fmt(f);
                    end;
                elif t = string then
                    assert assign template ro string str := ex;
                    template begin
                        doHeader(i);
                        Fmt("\"", str, "\"");
                    end;
                elif Types/IsTrackedType(t) then
                    template begin
                        doHeader(i);
                        Fmt("other tracked value");
                    end;
                else
                    Package/EmitErrorString(va->va_pctx,
                                            "Varargs: bad argument type");
                    return true;
                fi;
                va->va_count := i;
                false
            corp;

            proc
            Complete(Varargs_t nonNil va)bool:
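                con count := va->va_count;
                /* Copy the count into a 'uint' local, so that it can be
                   referenced inside the template sections below. */
                if count = 0 then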
                    template begin
                        Fmt("No arguments given");
                    end;
                else
                    template begin
                        Fmt(count, " arguments given");
                    end;
                fi;
                false
            corp;
        };
    };

    proc varProc
    Varargs(Package/PContext_t nonNil pctx)void:
        template begin
            Fmt("Varargs start");
        end;
        /* No arguments seen yet - "AppendArg" numbers them from 1. */
        Exec/ProvideVarProcCallHandler(pctx, Varargs_t(pctx, 0));
    corp;

The "alternate" code created for the same "testSmall" example is:

    proc
    testSmall()void:
                fmtString0("Varargs start\n");
        fmtFlush();
    ;
        /Varargs/doHeader(1);
            fmtUint00(13);
        fmtFlushL();
    ;
        /Varargs/doHeader(2);
            fmtFloat00(14.7);
        fmtFlushL();
    ;
        /Varargs/doHeader(3);
            fmtChar0("\"");
        fmtString0("Fred");
        fmtString0("\"\n");
        fmtFlush();
    ;
            fmtUint00(3);
        fmtString0(" arguments given\n");
        fmtFlush();
    ;
    corp;

This is totally different - this version does not save the arguments to "Varargs" - it processes them immediately, inside "AppendArg". Because of that, it cannot show the number of arguments before processing them, since it does not yet know how many there will be. This code does not need to use the special properties of an "Exec/Exec_t" proc formal or local. It also does not need to use explicit run-time type checks since it does all of its type checks at compile time (execution time of "AppendArg"). It does, however, create more code than the first version of "Varargs", and does not share any of it across multiple "Varargs" calls.

In particular, the constant-handling code in "Fmt" is not triggered, even though all of the arguments are constants. This is because "Fmt"'s code runs during the compilation of "AppendArg", and the various parameters it sees at that time (during execution of the "Fmt" code) are not constants - they do not become constants until template expansion time, and by then it is too late - "Fmt" has already run.

18.9.8 Templates for Inlining

Recall example proc "Inc1". Ignoring the template aspects, it is a proc which adds 1 to its single parameter. However, it does that with code inserted into the code which "calls" it, rather than with a proc call. This is exactly what proc inlining does. So, a compile-time proc that meets all of the following conditions is effectively an inline proc, in the sense used in other programming languages:

    - it has only template parameters;
    - its result is a template result or 'void';
    - its body is a single template block or template expression.

Note, however, that semantically this kind of inlining is more like the use of macros in C, since the actual parameter expressions are substituted directly. If an actual parameter has side effects and its template formal is used more than once, those side effects occur more than once, which is likely incorrect. This can be avoided by declaring local variables inside the 'template' code which are initialized to the formal parameters. [These problems would be avoided by using an explicit 'inline', if that gets implemented. If it does, this whole section can disappear.]
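The following sketch shows that technique for an expression-valued inline proc, modelled on "uintToBits16" in "18.9.5 Con and Var Template Declarations". The name "square" is hypothetical and the listing is untested. The argument expression is evaluated once, into an inserted 'con' variable with a generated name. Only that variable is then used twice, so any side effects of the argument should happen only once.

    proc ctProc
    square(template ro uint tU)nonNil template ro uint:
        template(begin
            con template tV uint con nil := tU;
            tV * tV
        end)
    corp;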

For an example of this use of templates, see the character classification procs in package Char.

18.9.9 Scope Blocks

The 'begin'/'end' scope block is a general construct in Zed which can be used independently of templates. However, its most common uses are associated with templates, so it is described here. A scope block is simply an explicit scope. Syntactically, it is:

'begin'
block of code
'end'

The block of code can contain declarations, and those declarations will be in a new scope within the existing scope. This is useful inside template blocks in compile-time procs, where new local variables are needed for the template material, and the new scope is wanted to avoid name clashes with any names in outer scopes. For example, if a template section wants to use 'assert' 'assign', it could be written as:

    proc ctProc
    useAssertAssign()void:
        template begin
            begin
                assert assign MyRecord_t myr := GlobalMyr;
                ... use "myr" ...
            end;
        end;
    corp;

Here, name "myr" cannot clash with any name existing in the scopes within which "useAssertAssign" is called. Without the scope block, it might. Normally, Zed does not allow a variable declared in an inner scope to have a name that already exists in an outer scope. That rule is overridden here because the new scope is inside a template section. [Implementation-wise, this is done via field "pctx_expandingTemplate".] Note also that the scope block allows "useAssertAssign" to be called more than once within a single scope of a using proc.

Scope blocks can also be used in the rare situations where a block of statements is needed but only a single expression is allowed, such as inside the parentheses of a template expression.

18.9.10 "private" "package" Example

As mentioned in "8.10 "private"", 'private' constructs have an optional 'package' flag. Quoting the header comment for "Exec/PrivateBlockStart":

    /*proc PrivateBlockStart
     *
     * Start of creating a 'private' block. Start a new sequence to contain the
     * block contents.
     *
     * A basic 'private' block simply hides the sequence inside it. It does this by
     * having the Exec_t for the sequence be referenced by a record type (which is
     * used as a variant of Exec_t) which is not exported from package Exec and
     * whose fields are all marked as 'private'. Thus, only code in package Exec
     * can read or write the fields in that record type (PrivateExec_t).
     *
     * If the 'private' construct has the 'package' flag present, then this proc
     * is called with "hasPackage" 'true' by the parser. In that form the construct
     * is used to allow code in an exporting package to grant, in a limited and not
     * visible way, access to package-private items to a piece of code in another
     * (importing) package. This is normally done by putting the 'private'
     * 'package' block inside a 'template' section within a 'ctProc' proc
     * which is exported.
     *
     * The templated code is inserted into any proc which calls the 'ctProc'
     * proc, and that code has access to things private to the exporting package.
     * Typically this will be fields in a struct, record, etc. which are not
     * normally accessible.
     *
     * If the templated code writes to such fields, then code in the 'ctProc'
     * proc (both at compile time and via execution of the templated code) can
     * validate the data being stored, in ways which are not possible via a simple
     * exported "settor" proc.
     *
     * Note that it is the active proc which is doing the granting. Since procs do
     * not have a reference to their containing subpackage, the package attached as
     * the granting package is never a subpackage. As a consequence, private names
     * within a subpackage cannot be granted. If the proc is in a subpackage, the
     * names granted by it must be 'local'. If the proc is not in a subpackage, it
     * can grant access to 'private' names within the same package.
     *
     * There is no point in having a 'private' 'package' block outside of a proc
     * which runs at compile time - it cannot grant access to any external code.
     *
     * Special handling here and in the bytecode engine allows these semantics to
     * also work when "PrivateBlockStart" and "PrivateBlockNew" are called
     * explicitly from compile-time code (rather than implicitly by having the
     * 'private' 'package' inside a 'template' section).
     */

A 'private' 'package' example:

    export struct Priv1_t private {
        uint priv1_n;
        bool priv1_flag;
    };

    export proc
    SetPriv1(@ Priv1_t priv1; uint n; bool flag)void:
        priv1@.priv1_n := n;
        priv1@.priv1_flag := flag;
    corp;

    export proc ctProc
    WritePriv1(template Priv1_t tPriv1; template ro uint tN;
               template ro bool tFlag)void:
        template begin
            private package begin
                tPriv1.priv1_n := tN;
                tPriv1.priv1_flag := tFlag;
            end;
        end;
    corp;

    export proc ctProc
    GetPriv1N(template ro Priv1_t nonNil tPriv1)template ro uint:
        template(private package(tPriv1.priv1_n))
    corp;

    proc
    privateProc(uint n)uint:
        n * 3
    corp;

    export proc ctProc
    UsePrivate(template ro uint tN)template ro uint:
        template(private package(privateProc(tN)))
    corp;

Struct "Priv1_t" is exported, but it is marked as 'private', so code outside of the exporting package cannot read or write its fields. Proc "SetPriv1" is a traditional "settor" proc which allows external code to set the values into a "Priv1_t".

Proc "WritePriv1" is a compile-time proc which uses a 'private' 'package' block within a 'template' block to directly write the fields of the passed "Priv1_t". This allows the fields to be set directly by assignment statements, but more importantly, provides the opportunity to examine and perhaps modify the values which are given, at compile time, and to conditionally control what code, if any, is actually produced. Here, the parameters are all 'template' parameters, but they could be "Exec/Exec_t" parameters which are 'assign'-ed to the 'template' form after checking/modification.

Proc "GetPriv1N" shows producing direct accessor code for the private field. Proc "privateProc" is not exported from the package, but it can be called by using proc "UsePrivate". Again, "UsePrivate" can, at compile time, examine the environment and its passed parameter, and determine whether or not it wishes to allow the call. For example, it could allow the call on a direct local variable, but not on any other kinds of 'uint' values.

The examples above showed the granting of rights to fields of a struct. This also applies to fields of records, capsules and bits types and to the members of unions. Also, within a 'private' 'package' section, elements of matrix values which are 'private' can be accessed. Similarly, package-level names (e.g. procs and type-names) are usable inside a 'private' 'package' section even if they are 'local' or not normally exported to a using package.

Note: for technical reasons, names which are to be used this way must not be private to a subpackage - they must be 'local' or private to the granting package itself - see the "PrivateBlockStart" comment above.

'private' 'package' blocks cannot directly be nested. They can be indirectly nested, however. For example, if "Inner_t" is a struct defined in one package, and "Outer_t" is a struct, containing an Inner_t, defined in a second package, the writing proc for Outer_t can call the writing proc for Inner_t to write the fields of the Inner_t, which the package containing Outer_t does not have access to. In order for this to work, however, the structs should not be private; instead mark their bottom-level fields private. Do not mark Outer_t fields of type (or directly containing type) Inner_t as private. This allows the Outer_t writing proc to have a writeable Inner_t which it can pass to the Inner_t writing proc. The pattern for both procs is that shown for "WritePriv1", above. [The reasons for these restrictions are subtle.]

18.9.11 Template Pitfalls and Notes

Many uses of templates will work as expected. However, some uses will not work as expected or, more likely, will fail to work at all.

In dealing with templates, it is very important to keep track of whether things are to be happening at compile time or at run time. The fact that a proc is not marked 'ctProc' (or one of the other proc kinds that requires compile-time execution) does not mean that the proc will not run at compile-time. Such a proc can be called from a compile-time proc. Often, such procs are meant to be used only at compile-time. If the proc has any 'template' parameters or result, then it is very likely that it is intended to run at compile-time.

If there is a 'template' section in a proc's code, then that is almost certainly intended to run only at compile-time. Attempting to execute a 'template' section at other than compile-time of some proc is an error and will result in an abort. Within a template section, references to template proc formal or local variables will result in fetching of the current value of the formal/local, and either the insertion (after validation) of that code into the proc being compiled (for a template block), or the yielding of that code in the current context of the proc containing the template expression.

The simple form "template(localName)" can result in very different things, depending on what "localName" is. If "localName" is a simple 'uint' variable inside the proc that is running (at compile-time of some other proc), then an "Exec_t" node is created which represents the current value of "localName" as a uint literal. If, however, "localName" is a template local variable, then the value yielded is whatever is represented by the current value of "localName", which could be the equivalent of many lines of complex code.

One situation to watch out for is the use of "Fmt" and similar procs inside templates. Inside template sections, the 'template' part of a value's type is stripped off. So if the value to be formatted is a template value, what "Fmt" (or a similar compile-time proc which examines its parameters) will see is the templated type (e.g. 'uint' if the template type is 'template' 'uint'). However, if a value of type "Exec/Exec_t" is used instead of a template value, then "Fmt" will see the Exec_t, which has a custom formatter, and will create a call to that custom formatter. When the template section is inserted into some other proc, both the template value and the Exec_t value will be retrieved from the running bytecode engine, and the fetched values substituted. A value fetched for a template value will be of an appropriate type, but a value fetched for an Exec_t will not be appropriate for the previously compiled "Fmt" call. For example:

    proc ctProc
    hasTemplate()void:
        template ro uint tUint := template(1);
        Exec/Exec_t exUint := template(2);
        template begin
            Fmt(tUint);
            Fmt(exUint);
        end;
    corp;
    ...
    proc
    useTemplate()void:
        hasTemplate();
    corp;

This example gets the error 'Bad value for parameter "ex": want <Exec_t> got <uint>' on the "hasTemplate" call in "useTemplate". Without the "Fmt" call with parameter "exUint" in "hasTemplate", all is well, as described above. When that call to "Fmt" is made, the compile-time "Fmt" code sees a value of type Exec/Exec_t, on which it finds a custom formatter. So, it produces a call to that custom formatter to replace the "Fmt" call. When "useTemplate" is being compiled, "exUint", a local in "hasTemplate", is used inside a template block and so is interpreted as the Exec_t that it contains - a 'uint' constant. That is not compatible with the Exec_t parameter required by the custom formatter for Exec_t, and so the error given above is produced.

As another example, consider the "uintToBits16" example from earlier:

    proc ctProc
    uintToBits16(template ro uint tU)nonNil template ro bits16:
        template(begin
            con template tVar bits16 con nil := tU;
            tVar
        end)
    corp;

This proc works fine, in all situations. However, other attempts at accomplishing the same thing fail, for various reasons. The version below leaves out the explicit block ('begin'/'end'). Error messages it gets are shown before the relevant line, as '//' comments.

    proc ctProc
    uintToBits16b(template ro uint tU)nonNil template ro bits16:
        template(
            // syntax error - ')' expected
            // Cannot discard value - use 'eval'
            con template tVar bits16 con nil := tU;
            // Name "tVar" is not defined
            tVar
        )
        // missing ";"
        // unrecognized syntax
    corp;

The problem here is syntactic. A template expression requires a single expression within the parentheses. This example attempts to have both the 'template' 'con' declaration and the "tVar" result expression within the parentheses. That just doesn't work, and the Zed parser stumbles over the unexpected code until it recovers.

A third version, below, attempts to use a template block to declare the needed variable, and a template expression to yield the result. There are situations in which this version works, e.g. as the result expression in a proc.

    proc ctProc
    uintToBits16c(template ro uint tU)nonNil template ro bits16:
        template begin
            con template tVar bits16 con nil := tU;
        end;
        template(tVar)
    corp;

Most attempts to call this compile-time proc will get "Declarations not allowed within expressions" for each use, and a final "Exec_t validation failed for proc "XXX"". These errors are produced during the final "validation" step of compiling the calling proc, and so will appear at its end. Consider the attempted calls:

    con b16 := uintToBits16c(u1) + uintToBits16c(u2);

Here, the calls to "uintToBits16c" are done in places where declarations cannot occur, namely the operands to the '+' operator. Such declarations are fine if they are within an explicit block which is used as part of an expression, but not without that block.

This kind of error is obscure, and stems from the compile-time proc's use of templates. Because of that, it can be quite hard to diagnose just what is wrong, especially if the parameters to the compile-time proc and/or the templates are complex. Other hard-to-diagnose situations include those with multiple uses of compile-time procs, with some calls within parameters to other calls. The final line of the last example in section "1.1 General" is one such case:

    Fmt(Map(Lambda(v) v #+ v end, vec));

Here, "Fmt" is a compile-time proc, and "Lambda" is a programmer defined construct. In the latest version of these facilities, none actually use any template code, but it is easy to imagine difficulties debugging the combined use.

Because of these problems, it is a good idea to test template code thoroughly, even if it is intended only for internal use. You are more likely to be able to diagnose problems when you have just written the template code.

As described previously, 'char' vectors can be used as 'string' values in some situations. In normal code this causes no problem. However, in code which is examining the types of Exec_t records, it can matter: a 'char' vector used directly as a 'string' may still have its type as 'char' vector, depending on whether the type is being examined before or after the insertion of the "exk_charVecToString" operation.

Sometimes templates provide unexpected benefits. Compile-time code might need to insert calls to procs. How does the context into which the compile-time code is inserting code access such a proc? If that context has an active 'use' or 'import' for the package containing the proc, then the standard accessing calls exported from Package allow appropriate accessing 'Exec_t's to be constructed. But, what if that context does not have access to the proc, but the context of the compile-time code does? Templates to the rescue! [This was not part of the design. I "discovered" it, and believe it is safe, so am noting it here.] The form:

    template(... path to proc ...)

yields an Exec_t which is a reference to the proc. The type of the Exec_t will be 'template' of the type of the proc. This Exec_t can be used with the Exec-exported procs for constructing proc calls, to construct a call which can be inserted into the compile-time context even if that context has no access to the proc. Note, however, that if the context of the proc being compiled does not have access to the package containing the proc in question, and to the proc itself, validation will fail. As mentioned above, paths inside template sections (which includes empty paths to items directly within the package containing the template code) are turned into absolute paths during template expansion. These absolute paths will work without any 'use' or 'import'. Traversing that path must work in the context of the proc being compiled.

The use of 'private' 'package', as described in "18.9.10 "private" "package" Example", can help in this usage.

It is possible to imagine situations where the result is unclear. For example, if you have a proc which returns a template value, and you call that proc inside a template section, what is supposed to happen? As the description above has stated, only access to template variables and proc formals within template sections has the special semantics of fetching a value from the bytecode engine. Other situations do nothing special, so in this case the proc call is part of what is inserted into the context of the proc currently being compiled. Other checks relating to templates, either during parsing or during validation, will usually fire.

What would it mean to allow nested template sections? The obvious answer is that the inner template section would be compiled into the proc that is currently being compiled. Again, however, various checks are likely to prevent this from working.

18.10 "ctSafe"

Section "3.4 Procs" showed 'ctSafe' as one of the modifiers which can be given to a proc, but it has not yet been explained. The token is short for "compile-time-safe". Many programming languages which allow compile-time execution do not give their compile-time code any access to variables whose values persist beyond the call to the compile-time code. Zed packages can have package-level variables, and compile-time procs within such packages can reference and modify those variables. This introduces problems relating to the allocation of space for, and the initialization of, those variables. These same problems happen with compound constants, since they are also allocated within space associated with the package.

A package can be in an early "uninitialized" state, or it can be in a later "initialized" state. The initialization process includes the allocation of space for the variables and compound constants in the package, the implicit and explicit initialization of those variables and constants, and the call to any provided "_PackageInit_" proc (see "29 Special Names") for the package or subpackage.

If a proc references package variables or compound constants for a package, then it is necessary that the package be initialized before the proc can run. The Zed bytecode engine checks the status of the package containing a proc to be called on every call to a proc. If the package is not initialized, it will do the initialization at that point. This can result in initialization happening at unexpected times, especially if the package exports some of its procs. Note that the top level of calling a compile-time proc is considered to be a call to the proc for this purpose, even though it is not a normal call.

Running code can append additional package variable or compound constant declarations to packages. When and how are those new variables or constants initialized? The solution currently used in Zed is that when a package variable or compound constant is added to a package, the package is marked as not initialized. This means that initialization for a package can occur multiple times. Writers of such code should be aware that this can happen.

An enforced restriction relating to adding package variables or compound constants to packages is that if any proc within the package is currently active in the bytecode machine, such new declarations are prohibited and will fail.

Proc flag 'ctSafe' is designed to reduce the occurrences of the multiple initialization situation. A proc that is marked 'ctSafe' must not itself reference any package variables or compound constants, must not call (at its run time) any proc which is not also marked as 'ctSafe', and must not call any proc indirectly. When the bytecode engine is calling a proc, if the proc is marked 'ctSafe' it will not do early initialization of the package containing the proc.

Since this check is somewhat expensive and intrusive, we do not want native code to be required to do it. It is for this reason that all compile-time execution in Zed uses the bytecode engine, even if native code exists for the procs in question. For the same reason, no native code execution can add package variables or compound constants to a package which has already been initialized.
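The initialization rules above can be summarized in a small model. The sketch below is written in Python rather than Zed, is not the bytecode engine's code, and all of its names ("Package", "Proc", "call", "add_variable") are hypothetical. It models only three rules from this section: a call initializes the proc's package unless the proc is 'ctSafe', adding a package variable re-marks the package as uninitialized, and additions are refused while any proc of the package is active.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    variables: list = field(default_factory=list)
    initialized: bool = False
    init_count: int = 0      # how many times initialization has run
    active_calls: int = 0    # calls to this package's procs now in progress

    def initialize(self) -> None:
        # Stands in for allocating and initializing the package variables and
        # compound constants, then calling any "_PackageInit_" proc.
        self.init_count += 1
        self.initialized = True

    def add_variable(self, name: str) -> None:
        # New declarations are refused while any proc of the package is active.
        if self.active_calls > 0:
            raise RuntimeError(f"cannot add {name!r}: a proc of {self.name!r} is active")
        self.variables.append(name)
        self.initialized = False     # the new variable still needs initializing


@dataclass
class Proc:
    package: Package
    ct_safe: bool = False


def call(proc: Proc, body=lambda: None):
    pkg = proc.package
    # Every call checks the package status, except calls to 'ctSafe' procs.
    if not proc.ct_safe and not pkg.initialized:
        pkg.initialize()
    pkg.active_calls += 1
    try:
        return body()
    finally:
        pkg.active_calls -= 1
```

In this model a 'ctSafe' call leaves an uninitialized package alone, while a call to an ordinary proc made after "add_variable" runs initialization a second time, which is the multiple-initialization case described above.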

18.11 "##" Accesses

Some programming languages allow names to be looked up on types and values, in order to do things like finding "print" procs to apply. Zed can do this as well, although it is not much used. See "10.6 Generic Type Parameter Interfaces" for the use of this syntax in generic code. Other languages which do this tend to use run-time type checking, so that when a name is looked up in that fashion, the checking for the appropriateness of what is found is done at run-time. In Zed, the lookup and checking are done at compile time. This is needed because Zed has strong static typing.

The syntax of this is:

named type or expression of named type
'##'
name to look up

For example, if type "MyRecord_t" has been given a formatting proc under the name "fmt", using "MyRecord_t##fmt" will reference that proc. Any kind of entity which can be added to a named type using "Types/DoExportAdd" can be referenced in this way.

If there are minor needs for this capability, they can be handled using compile-time execution and directly calling "Types/ExportFind". So, it is possible that this facility will be removed from the language, other than when associated with generics.

19 "cliProc" Procs

19.1 Introduction

Many people will view with dismay the inclusion of command line processing in the Zed programming language. The requirements for interactive commands and command scripting are thought to be quite different from the requirements for a general programming language, and so combining them will yield a horrible result. I believe the experiment is well worth doing. Note that, for the most part, what is specified here is the syntax of command lines, and not their semantics. Because of that, the descriptions use example commands rather than real ones.

When working interactively in a Cli (shell, command interpreter, whatever) users have come to expect lots of convenience features. These include easy access to the history of previous commands, easy access to the output of commands, command completion, filename completion, input line editing, input/output redirection and pipelines. Many of those features are part of the command input mechanism, and an interactive shell in Zed would support them. The Cli mode features of Zed are only concerned with the interpretation of the command after history substitution, completion and editing are finished.

There are several reasons why I wanted to have a Cli mode that uses the Zed programming language:

the exact lexical and syntactic rules of existing shells are usually not specified to the same level of detail as the rules of programming languages. This can leave users having to experiment to see how they work. Slight differences in those rules can also make it harder to correctly port shell code from one shell to another.
usually, shell scripts do not fully check the syntax of commands against what is needed. For example, a change in the set of flags that a command accepts can invalidate existing scripts that use the command, but there is no mechanism to automatically detect the breakage. Often, a script will appear to be operating correctly, when it is in fact subtly broken in some important way. In scripts with a lot of variable substitution, the exact commands which are being issued are not obvious to the reader. In many computer systems today, such shell scripts are important parts of the system infrastructure. Accepting a new version of a command program should not require manual inspection of dozens of shell scripts, many of which the maintainer is not familiar with. Even finding all of the affected scripts can be nearly impossible.
the programming capabilities of shell languages use lexical and syntactic rules which are completely different from those of standard programming languages. This imposes a significant burden on the maintainer, to be able to properly switch around among the various incompatible sets of rules and conventions. For the most part, those differences are not necessary.
the nature of shell programming encourages the implicit use of global shell variables. This is a very dangerous programming style, in that breakage due to maintenance is often completely silent.

In general, an interactive shell command consists of the command (usually just a name found by lookup in the set of active command directories) followed by any number of command arguments, some of which are flags, some of which are file names (or paths), and some of which are just arbitrary strings. Within such a command can be input/output redirection specifiers. Multiple commands can be given, separated either by pipe indicators ("|") or semicolons.

The command lookup is analogous to the name lookup that Zed does among the set of in-use packages. Built-in commands can be handled however it turns out to be convenient. Separation by semicolons is the same as what Zed does for statements. Providing the remainder of the command line lexical and syntactic form is what Zed 'cliProc' procs, and the rules for using them, are for.

19.2 "cliProc" Proc Specification

Cli procs are created by placing the token 'cliProc' between the 'proc' and the proc name when defining the proc. Cli procs must return a result of type 'uint', which is the status result of a command using that proc. Any formal parameters of a Cli proc must be of the following types:

'bool' - the parameter is a flag, which is 'false' by default, and 'true' if specified
'char' - the parameter accepts a single character as its value
'uint' - the parameter accepts an unsigned integral value
'sint' - the parameter accepts a signed integral value
'float' - the parameter accepts a floating point value
'string' - the parameter accepts any string literal (which includes Cli-mode string literals without quotation marks)
'[] string' - this parameter, of which there can be only one per Cli proc, gets a possibly-empty vector of all otherwise unused parameters, in the order in which they appeared in the command line

Since a string vector parameter gets all unclaimed parameters, it is not given explicitly on command lines, so the name of such a formal parameter is not significant on command lines using the Cli proc. For the other formal parameters of a Cli proc, the name of the parameter is the name of the command line flag. If a parameter name is a single character, that parameter must be given after a single dash ('-') on the command line. If the parameter name contains multiple characters, it must be given on the command line after a double dash ('--'). Multiple single-character parameters may be given together, following a single dash. Any needed value arguments must then be given in the corresponding order after the set of parameter names (flags). Needed values (i.e. for all types except 'bool') for parameters with multi-character names must be given as arguments immediately after the flag. If a Cli proc parameter name contains underscores ('_'), then they may be given on command lines with either underscores or minus signs or a mixture.

If a 'bool' flag is not specified on the command line, the corresponding Cli proc parameter will have value 'false'. Similarly, if flags of the other non-vector types are not given on the command line, the corresponding parameters will have 0 or empty-string values. If no otherwise-unconsumed command arguments exist, then any string vector parameter will get an empty vector as its value.

[Note that this processing is not as flexible as that of "getopt".]

As an example, consider the following Cli proc:

    proc cliProc
    list(bool s, r; uint max; [] string tail)uint:
        ...
    corp;

This command accepts flags "-s", "-r" and "--max". "--max" must have a uint value after it. Any command arguments not consumed as flags or "max"'s value are passed to "list" via vector parameter "tail".

The following are valid commands using "list":

    list
    list package1/package2
    list -rs
    list --max 100
    list first "second arg" -r third --max 10 fourth\ argument\ here

The following are invalid "list" commands:

    list -z
    list --max
    list --max arg
    list -r stuff and such -r
    list --s
    list --rsa
    list --max 10 go and do it --max 20

If any of these commands were inside a Zed proc (Cli or otherwise), the errors would be detected and flagged at compile time of the proc. Because of substitutions, described below, such compile-time detection of errors is not always possible, but the Zed code can detect errors based on partial flags and partial values. A Zed shell must re-check the arguments at run time, after all substitutions have been done, since it must create and execute the final call to the Cli proc.

If a single or double dash appears by itself, it is interpreted as a command argument.
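The flag rules in this section can be checked against the "list" examples with a small model. This is a Python sketch, not the Zed compiler's Cli parser: "parse_cli", "CliError" and the type-name strings are hypothetical, arguments are assumed to be already split and unquoted, 'char' parameters and the implicit "--help" are not modeled, and rejecting leftover arguments when there is no '[] string' formal is an assumption.

```python
import re

# Values for flags that are not given; 'char' is not modeled here.
DEFAULTS = {"bool": False, "uint": 0, "sint": 0, "float": 0.0, "string": ""}


class CliError(Exception):
    pass


def convert(kind: str, text: str, flag: str):
    if kind == "uint" and re.fullmatch(r"[0-9]+", text):
        return int(text)
    if kind == "sint" and re.fullmatch(r"[+-]?[0-9]+", text):
        return int(text)
    if kind == "float":
        try:
            return float(text)
        except ValueError:
            pass
    if kind == "string":
        return text
    raise CliError(f"bad {kind} value {text!r} for {flag!r}")


def parse_cli(params: dict, args: list, tail_name=None) -> dict:
    """params maps formal names to type names; tail_name is the '[] string' formal."""
    values = {name: DEFAULTS[kind] for name, kind in params.items()}
    seen, tail, i = set(), [], 0
    while i < len(args):
        arg = args[i]
        i += 1
        if arg in ("-", "--") or not arg.startswith("-"):
            tail.append(arg)                         # an ordinary command argument
            continue
        if arg.startswith("--"):
            name = arg[2:].replace("-", "_")         # '-' may be written for '_'
            names = [name] if len(name) > 1 else []  # "--s" is not allowed
        else:
            names = list(arg[1:])                    # "-rs" is "-r" plus "-s"
        if not names or any(n not in params for n in names):
            raise CliError(f"unknown flag {arg!r}")
        for n in names:
            if n in seen:
                raise CliError(f"flag {n!r} given more than once")
            seen.add(n)
        for n in names:                              # values follow in flag order
            if params[n] == "bool":
                values[n] = True
            elif i < len(args):
                values[n] = convert(params[n], args[i], n)
                i += 1
            else:
                raise CliError(f"missing value for {n!r}")
    if tail_name is not None:
        values[tail_name] = tail
    elif tail:
        raise CliError(f"unexpected arguments {tail!r}")
    return values
```

Each invalid "list" command above raises "CliError" in this model: "-z" and "--rsa" name unknown flags, "--s" uses a double dash with a single-character name, "--max" and "--max arg" lack a valid 'uint' value, and the last two give a flag more than once.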

If a proc does not have a parameter named "help" then the Zed facilities will implicitly provide one which prints out usage information based on the proc header. Thus, from the user point of view, all commands support "--help", although it is possible that it does not do anything "help-like".

Because the form of Cli commands is very different from the form of normal Zed statements and expressions, if the Cli proc being used is not found, then a lot of error messages will result. The Cli proc could be not found because it was spelled wrong, because appropriate 'use' or 'import' clauses are not in effect, or simply because it doesn't exist. The error messages result because the parser does not know that it should go into Cli mode for the line of code, and so processes it according to the normal Zed syntax and semantics. [It should be possible to add a flag to the parsing context, set when parsing is on behalf of an explicit interactive shell, that says unknown names at the beginning of a "line" should be assumed to be Cli procs.]

19.3 Command Arguments

Command line arguments are usually separated by whitespace (spaces or tabs). Since only unescaped whitespace separates arguments, a single argument can be built up from multiple quoted (using either ''' or '"') and unquoted fragments. Arguments for a command terminate at the end of the source line, unless the last character on the line is a non-quoted escaped carriage return or newline (see below).

If quoted, an argument fragment runs until the closing quote, and the standard Zed string escapes can be used within it. Note that "string breaks", where multiple quoted string literals separated by whitespace are considered to be one long string literal, are not supported in Cli commands. Such multiple quoted strings are separate arguments if they have whitespace between them. If they have no whitespace between them, then they are fragments of the same overall argument.

When command arguments are not quoted, they terminate at the next space, tab, carriage return, linefeed, quote, parenthesis, semicolon, vertical bar, ampersand, or angle bracket (greater than or less than sign). Escaped (using backslash) characters can also be used inside unquoted strings. The escaping removes the special meaning of all of the terminating characters listed above. The following special escape characters are also supported: "b" (backspace), "n" (newline), "r" (carriage-return) and "t" (tab). When actual carriage returns or newlines are escaped, they cause the next line of text to be part of the current command line.
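A simplified splitter for these rules might look like the Python sketch below. It is an illustration, not the Zed lexer: "split_args" is a hypothetical name, it assumes quoted fragments use the same four letter escapes as unquoted ones (the full set of standard Zed string escapes is not reproduced), it takes a single line with no escaped line ends, and it rejects rather than handles parenthesized substitutions, semicolons, pipes, ampersands and redirection.

```python
# Letter escapes recognized after a backslash; any other escaped character
# stands for itself, which removes any special meaning it had.
ESCAPES = {"b": "\b", "n": "\n", "r": "\r", "t": "\t"}
# Characters that end an unquoted fragment and start something this sketch
# does not handle: substitutions, separators, pipes, backgrounding, redirection.
SPECIAL = set("();|&<>")


def split_args(line: str) -> list:
    """Split one command line (no line-end characters) into arguments."""
    args, current, in_arg, i = [], [], False, 0
    while i < len(line):
        c = line[i]
        if c in " \t":                        # unescaped whitespace ends an argument
            if in_arg:
                args.append("".join(current))
                current, in_arg = [], False
            i += 1
        elif c in "'\"":                      # quoted fragment, up to the same quote
            j = i + 1
            while j < len(line) and line[j] != c:
                if line[j] == "\\" and j + 1 < len(line):
                    current.append(ESCAPES.get(line[j + 1], line[j + 1]))
                    j += 2
                else:
                    current.append(line[j])
                    j += 1
            if j >= len(line):
                raise ValueError("unterminated quoted fragment")
            in_arg, i = True, j + 1
        elif c == "\\" and i + 1 < len(line):   # escape inside an unquoted fragment
            current.append(ESCAPES.get(line[i + 1], line[i + 1]))
            in_arg, i = True, i + 2
        elif c in SPECIAL:
            raise ValueError(f"{c!r} is not handled by this sketch")
        else:
            current.append(c)
            in_arg, i = True, i + 1
    if in_arg:
        args.append("".join(current))
    return args
```

For example, "split_args(r'one "two three" a\ b x"y"z')" yields four arguments: "one", "two three", "a b" and "xyz", the last one built from three adjacent fragments.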

If an argument fragment starts with an open parenthesis, then a normal-mode Zed expression is expected to follow, and be terminated by a close parenthesis. These expressions are evaluated at run time, although the compiler can and will do constant folding within them. The value of an expression of type 'string' will be substituted for the parenthesized expression. Expression types 'bool', 'char', 'uint', 'sint' and 'float' are also accepted - the values will be converted to strings for use. Such values will be 'true' or 'false' for 'bool', and default conversions without spaces or unneeded leading zeros for numeric values. Values of types 'char' and 'string' will not have quotation marks inserted around them.

A useful expression when dealing with numbered objects is one which is a formatted version of a 'uint' counter value. For example, given local variables:

    string path := "dataDir/data", itemName := "item";
    uint i := 33;

command line argument:

    (path)/(itemName)(FmtS(i :: d0 : 3)).raw

produces, at run time:

    dataDir/data/item033.raw

If just "i" were used instead of "FmtS(i :: d0 : 3)", the leading "0" in the final item name would not be present. If used in a loop, the version using "FmtS" produces names of the same length for values of "i" from 0 up to 999.
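The zero filling can be mimicked in Python, where the format specification "03d" plays the role that ":: d0 : 3" plays in the example (zero fill to width 3, as the output above implies). The function name "item_path" is hypothetical, and the sketch only shows the resulting string, not how Zed evaluates the substitution.

```python
def item_path(path: str, item_name: str, i: int) -> str:
    # Mirrors "(path)/(itemName)(FmtS(i :: d0 : 3)).raw": three substituted
    # fragments plus the literal "/" and ".raw", with i zero-filled to width 3.
    return f"{path}/{item_name}{i:03d}.raw"


# Names for 0..999 all have the same length; 1000 would need a fourth digit.
names = [item_path("dataDir/data", "item", i) for i in range(1000)]
```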

As a further example of substitutions, assume variables:

    var f := true, c := "a", u := 123, s := -76, x := 137.2184374,
        str := "Fred";

command line argument:

    (f):(c):(u):(s):(x):(str)

produces:

    true:a:123:-76:137.2184374:Fred
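The conversion rules for substituted values can be sketched as a Python function. "to_cli_string" is hypothetical, and Python's default number formatting only approximates Zed's default conversions: they agree for the values in this example, but may differ for very large or very small floating point values.

```python
def to_cli_string(value) -> str:
    """Text substituted for a parenthesized Cli expression (sketch)."""
    if isinstance(value, bool):           # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):   # no spaces or unneeded leading zeros
        return str(value)
    if isinstance(value, str):            # 'char'/'string': no quotation marks added
        return value
    raise TypeError(f"cannot substitute a {type(value).__name__}")


values = (True, "a", 123, -76, 137.2184374, "Fred")
argument = ":".join(to_cli_string(v) for v in values)
```

Joining the converted values of "f", "c", "u", "s", "x" and "str" with ":" gives "true:a:123:-76:137.2184374:Fred", matching the output above.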

Parameters to commands are evaluated left-to-right - they will not be reordered.

Note that because both single and double quotation marks are accepted, and the strings delimited by them are of the same type, there are multiple ways in which the same string can be represented. Using a single quotation mark works well if the string contains double quotation marks - the double quotation marks do not need to be escaped. When Cli code is pretty-printed by the system, it will not always reproduce the exact same input form for strings as was given by the user/programmer. It will always produce an equivalent form, however. [As of this writing, the decision is based on the first, if any, quotation mark within the string - the string is quoted using the other quotation mark.]
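The quote-selection rule in the bracketed note can be sketched in Python. "cli_quote" is a hypothetical name; choosing '"' for a string with no quotation marks, and escaping backslashes and the chosen delimiter, are assumptions, since the note does not say what happens in those cases.

```python
def cli_quote(text: str) -> str:
    """Quote a string for pretty-printed Cli code (sketch of the noted rule)."""
    # Find the first quotation mark of either kind, if any.
    first = next((ch for ch in text if ch in "'\""), None)
    # Quote with the other kind; '"' is assumed when the string has neither.
    quote = "'" if first == '"' else '"'
    # Escape backslashes first, then any occurrences of the chosen delimiter.
    body = text.replace("\\", "\\\\").replace(quote, "\\" + quote)
    return quote + body + quote
```

Under this sketch, a string containing 'say "hi"' is printed in single quotes with no escapes, and one containing "it's" is printed in double quotes.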

19.4 Calling Cli Procs From Normal Zed Code

Many Cli commands are issued interactively. However, "shell scripts" contain Cli commands along with programming. In Zed, the simplest way to issue a command inside code is to use the command as a statement. The Zed compiler recognizes the nature of the 'cliProc' proc, and switches to Cli-mode lexical scanning and parsing for that command. The command will usually end on that same line, but can span multiple lines because of escaped newlines, or because of long expressions inside parentheses. In fact, a Zed shell will use the Zed parser, and will wrap the input command inside a temporary proc which it creates, compiles and calls. The full Zed syntax will then be available interactively.

As an example, running proc "test1" in the following code:

    proc cliProc
    cmd1(bool f, l; uint count)uint:
        Fmt("f = ", f, ", l = ", l, ", count = ", count);
        0
    corp;

    proc cliProc
    cmd2(string tag; [] string tail)uint:
        FmtN("tag = \"", tag, "\", tail =");
        for i from 1 upto getBound(tail) do
            FmtN(" \"", tail[i - 1], "\"");
        od;
        Fmt();
        0
    corp;

    proc
    test1()void:
        cmd1
        cmd2
        cmd1 -fl --count 1000
        con part1 := "Fred", part2 := "Wilma";
        cmd2 --tag "This is the tag" one two "three is here" (part1):(part2)
        for i from 1 upto 5 do
            cmd1 -l --count (i * 10)
        od;
        cmd2 --help
    corp;

produces the following output:

    f = false, l = false, count = 0
    tag = "", tail =
    f = true, l = true, count = 1000
    tag = "This is the tag", tail = "one" "two" "three is here" "Fred:Wilma"
    f = false, l = true, count = 10
    f = false, l = true, count = 20
    f = false, l = true, count = 30
    f = false, l = true, count = 40
    f = false, l = true, count = 50
    Use is: cmd2 --tag <string-value> --help ...

It is legal to put a semicolon after the commands, but not necessary. The system parser currently does not preserve unneeded semicolons.

Even though all 'cliProc' procs are required to return a 'uint' value, all uses of them in the above ignore that value. The Zed language allows for this - when used as statements, Cli command results do not need to be consumed. However, the results are checked at run time, and execution will terminate if an unconsumed result is not 0. If a command is the last statement in a sequence, and is not followed by a semicolon, then the sequence can yield 'uint' in contexts which require it.

In the above, the Cli proc to call is the first element of a line of code. The other way to call a Cli proc in Zed code is to enclose the call in braces. Such an expression has type 'uint', and is a normal Zed expression in terms of how its value is used. This allows the status from a command to be explicitly discarded by using 'eval' in front of the brace-surrounded command. More importantly, this syntax allows the status of a command to be assigned to a variable or otherwise used.

As an example, running proc "test2" in the following code:

    uint LIMIT = 10;

    proc cliProc
    count(bool noFail; [] string args)uint:
        var n := getBound(args);
        if noFail or n <= LIMIT then
            0
        else
            Fmt("Warning: too many arguments - ", n);
            n
        fi
    corp;

    proc
    test2()void:
        var count1 := {count one two three},
            count2 := {count --noFail 1 2 3 4 5 6 7 8 9 10 11 12 13},
            count3 := {count 1 2 3 4 5 6 7 8 9 10 11};
        Fmt("count1 = ", count1, ", count2 = ", count2, ", count3 = ", count3);
    corp;

produces the following output:

    Warning: too many arguments - 11
    count1 = 0, count2 = 0, count3 = 11
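The difference between the two calling forms can be modeled outside Zed. The following Python sketch is an illustration only, not the Zed runtime. The hypothetical "run_statement" helper discards the status but aborts on a non-zero value, as an unconsumed statement-form command does. The hypothetical "run_braced" helper returns the status, as the brace form does. The "count" function is a Python stand-in for the "count" Cli proc above, minus its warning message.

```python
class CliAbort(Exception):
    """Stands in for run-time termination on an unconsumed non-zero status."""


def run_statement(cli_proc, *args, **flags):
    # Statement form: nobody consumes the status, so the runtime checks it.
    status = cli_proc(*args, **flags)
    if status != 0:
        raise CliAbort(f"unconsumed command status {status}")


def run_braced(cli_proc, *args, **flags):
    # Brace form: the status is an ordinary 'uint' value for the caller to use.
    return cli_proc(*args, **flags)


def count(*args, no_fail=False, limit=10):
    # Python stand-in for the "count" Cli proc: 0 unless there are too many args.
    n = len(args)
    return 0 if no_fail or n <= limit else n


count1 = run_braced(count, "one", "two", "three")
count2 = run_braced(count, *[str(i) for i in range(1, 14)], no_fail=True)
count3 = run_braced(count, *[str(i) for i in range(1, 12)])
print(count1, count2, count3)  # 0 0 11
```

Calling "run_statement(count, ...)" with eleven arguments raises "CliAbort". This mirrors the rule that a non-zero unconsumed status terminates execution.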

Parentheses are used in Cli mode to switch to normal mode, but in normal mode braces are used to switch to Cli mode. The difference is deliberate - braces are typically used in Zed to indicate some kind of indirect or deferred value. Doing both switches in one expression or command can be confusing, so it is not recommended.

If a Cli proc is used in any other context, it is simply a proc value, the same as non-Cli procs. In particular, trying to run a Cli proc in the middle of regular Zed code will result in lots of error messages, as the parser tries to make sense of Cli mode parameters as regular Zed code. Note that Cli procs can be called using the regular or indirect proc call syntaxes, if they appear anywhere other than the two special contexts described above.

19.5 Redirection, Pipes, and Backgrounding

[The concepts of "standard input", "standard output", pipelining, and backgrounding have proved themselves to be quite useful. So, my intent is to have some form of them in a Zed system. However, I don't think the ability to redirect arbitrary "fd"'s has proved to be very useful, other than for "standard error". At this point I have done nothing to allow redirection/merging of standard error, and the standard Unix shell syntax for it will likely not be what is used, because of problems parsing it.]

Command lines can contain the following input/output redirection indicators:

< - "standard input" is connected to the item which is specified by the command argument after the indicator. That item must be of type '[] char'.
> - "standard output" is connected to the item which is specified by the command argument after the indicator. The item is emptied before the command runs, and will be created as type '[] char' if it does not exist, and must be of that type if it does exist.
>> - "standard output" is connected to the item which is specified by the command argument after the indicator. The item is not emptied, and new data is appended after any existing data. If the item does not exist, it will be created before the command runs. The item must be of type '[] char'.

There can be only one input redirection and one output redirection per command. Multiple specifications are considered an error and the command will not run.

The "standard output" of one command can be connected to the "standard input" of another command using the '|' pipe indicator. Multiple commands can be connected in this way. For example:

    dataCreate --size 10000 | dataSort -u -io - - | dataCount > summaryFile

Command "dataCreate" runs, with its input being interactive, and its output being connected to the input of command "dataSort". Similarly, "dataSort"'s output is connected to the input of command "dataCount" (we could imagine that the argument sequence "-io - -" tells it to read from its standard input and write to its standard output). "dataCount"'s output is directed to item "summaryFile", which will be created if it does not exist, and will be initially emptied if it does exist.

Note that it is an error to try to redirect the standard input of any except the first command in a pipeline, or to redirect the standard output of any except the last command in a pipeline.
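The redirection rules above can be checked mechanically. The following Python sketch illustrates this; it is not the Zed compiler's parser. The function name "parse_pipeline" and the dict layout are hypothetical, and the input is assumed to be already split into whitespace-separated tokens, with no handling of quoting. The sketch enforces three rules: at most one '<' and one '>' or '>>' per command, input redirection only on the first command, and output redirection only on the last.

```python
def parse_pipeline(tokens):
    """Split tokens on '|' and apply the redirection rules to each command."""
    segments, current = [], []
    for tok in tokens:
        if tok == "|":
            segments.append(current)
            current = []
        else:
            current.append(tok)
    segments.append(current)

    commands = []
    for seg in segments:
        words, stdin, stdout, append = [], None, None, False
        i = 0
        while i < len(seg):
            tok = seg[i]
            if tok in ("<", ">", ">>"):
                if i + 1 >= len(seg):
                    raise ValueError(f"missing target after {tok!r}")
                if tok == "<":
                    if stdin is not None:
                        raise ValueError("more than one input redirection")
                    stdin = seg[i + 1]
                else:
                    if stdout is not None:
                        raise ValueError("more than one output redirection")
                    stdout, append = seg[i + 1], tok == ">>"
                i += 2  # the target is consumed along with the indicator
            else:
                words.append(tok)
                i += 1
        commands.append({"words": words, "stdin": stdin,
                         "stdout": stdout, "append": append})

    last = len(commands) - 1
    for pos, cmd in enumerate(commands):
        if cmd["stdin"] is not None and pos != 0:
            raise ValueError("only the first command may redirect standard input")
        if cmd["stdout"] is not None and pos != last:
            raise ValueError("only the last command may redirect standard output")
    return commands


line = "dataCreate --size 10000 | dataSort -u -io - - | dataCount > summaryFile"
cmds = parse_pipeline(line.split())
```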

The status of a pipeline is the status of the last command in the pipeline. So, there is no way to get the status of other commands in the pipeline, and if one of them fails, execution stops with an error.
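The status rule can be sketched in Python as follows. This is a model only, with hypothetical names. Each stage is assumed to be a callable that takes its input text and returns a (status, output) pair. For simplicity, the stages run one after another rather than concurrently. A non-zero status from any stage except the last stops execution with an error. The last stage's status becomes the pipeline's status.

```python
class PipelineError(Exception):
    """Stands in for execution stopping when a non-final stage fails."""


def run_pipeline(stages, stdin=""):
    if not stages:
        raise ValueError("empty pipeline")
    data, status = stdin, 0
    for index, stage in enumerate(stages):
        status, data = stage(data)  # this stage's output feeds the next stage
        if status != 0 and index < len(stages) - 1:
            raise PipelineError(f"stage {index} failed with status {status}")
    return status, data  # status of the last stage only


def create(_):
    return 0, "3\n1\n2\n1\n"


def sort_unique(text):
    return 0, "".join(line + "\n" for line in sorted(set(text.split())))


def count_lines(text):
    return 0, f"{len(text.splitlines())}\n"


print(run_pipeline([create, sort_unique, count_lines]))  # (0, '3\n')
```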

A command pipeline can be terminated by the backgrounding indicator '&'. This causes the command to run in a "process" that is detached from that in which the active program is running. Such a process can outlast the one which started it. Because of this, a backgrounded command has type 'void' instead of 'uint'. Similarly, a non-zero result from a backgrounded command will not cause termination of the process which started it.

Note that as of this writing, redirection, pipelines and backgrounding are implemented in the Zed compiler, but not properly in the runtime. Redirections are ignored, but do consume their targets from the command line. Pipelines run all elements sequentially. The '&' backgrounding operation is ignored.

19.6 Miscellaneous

Zed Cli procs are run time procs just like regular procs. So, they can be compiled to optimized native code. This means that CPU-intensive processing needed as part of a command can often be done in-place, rather than having to call out to some other program to do that processing.

Cli procs can be used as values, just like regular procs. However, when such a value is called indirectly, the "Cli-ness" of the proc is no longer visible at compile time, so the call will have normal Zed calling syntax. If a Cli style call needs to be made to a proc determined at run time, then a stub Cli proc must be written, doing a regular-mode call to the current proc value, passing the arguments that were given to the Cli-mode stub proc. [This issue needs to be examined more, e.g. re adding 'cliProc' to proc types, and new rules for 'cliProc' type compatibility.]

Multiple Cli procs can be exported from the same package or subpackage. This means there is less cost with having multiple variants of a command. It is recommended that the basic functionality of a command be provided by a regular (non-Cli) proc which is exported for others to use. Multiple Cli procs can then use that regular proc to implement variants of the command, and there will be no process overhead with the resulting call. Also, users can then create their own variants of the command by having their own library of Cli procs, some of which call the exported regular proc directly.

Because of the above, it is expected that there would be little need for an explicit command aliasing mechanism in a Zed shell.

Similarly, since Cli procs can directly call utility procs from libraries, many uses of backtick ('`') substitution in command lines will not be needed. For example, it is common to use "pwd" within backticks. With a Zed shell, this could simply be a call, within parentheses, to a utility proc which yields the path to the current working package as a string. Update: see "22.3.4 Paths to Packages" for the '{.}' syntax.

When paths to items are given as command arguments, those paths are not resolved until run time. This is the case even when the entire path is known at compile time. This behavior is consistent with the fact that command arguments are treated as strings. When explicit paths otherwise appear in Zed code, they are resolved at compile time. A consequence of this is that the command to be run is resolved at compile time, and nothing that happens at run time can change that. For example:

    ...
    Package/Package_t testPk := /Scratch/User0/Test;
    Utilities/Summarize -l --max 100 /Scratch/User0/Test
    ...

In this code fragment, variable "testPk" is assigned a reference to a package "Test", whose location is determined at compile time. Thus, whenever this code runs, "testPk" will always reference that package. Command "Summarize" is also found at compile time and will never vary, regardless of where the proc containing this command is run, and what changes are made to the package hierarchy. When "Summarize" runs, however, it will have to look up the path to "Test" and might get a different package than is referenced by "testPk". If the code wishes, the value for "testPk" can be determined at run time, by doing an explicit lookup instead of putting the path directly in the code. Note, however, that doing so does not guarantee that "testPk" will get the same package as "Summarize" finds, since two lookups are happening, and things can change in between them. This is exactly the same as with traditional filesystems and commands.

With Zed code, however, it is possible to do the lookup just once, by calling a lookup proc (at run time) and assigning the resulting reference to a variable. That variable is then used to pass the found entity to multiple non-Cli procs (which presumably implement the functionality of the commands that we wish to guarantee are working on the same entity). Again, see "22.3.4 Paths to Packages".
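The difference between looking up a path twice and looking it up once can be shown with a Python analogy. In this sketch a nested dict stands in for the package hierarchy. "lookup", "summarize" and "mark_checked" are hypothetical stand-ins for a lookup proc and two non-Cli procs. None of them is a real Zed API.

```python
def lookup(tree, path):
    """Resolve a '/'-separated path in a nested dict standing in for packages."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


def summarize(pkg):       # hypothetical non-Cli proc
    return len(pkg["items"])


def mark_checked(pkg):    # another hypothetical non-Cli proc
    pkg["checked"] = True


root = {"Scratch": {"User0": {"Test": {"items": [1, 2, 3]}}}}

# Two lookups: the hierarchy can change in between, so they may disagree.
first = lookup(root, "/Scratch/User0/Test")
root["Scratch"]["User0"]["Test"] = {"items": []}   # "Test" is replaced
second = lookup(root, "/Scratch/User0/Test")
print(first is second)  # False

# One lookup: every proc given 'pkg' works on the same entity.
pkg = lookup(root, "/Scratch/User0/Test")
mark_checked(pkg)
```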

Cli procs are a full part of the Zed language. If programmers prefer to use the syntax style of "commands", they are free to do so in Zed. Note, however, that looping constructs and assignments must still be done in the usual Zed manner.

20 Dynamic Typing

The Zed programming language is statically typed, meaning that the types of variables, etc. are known at compile time. Many programming and scripting languages are dynamically typed, in that types are not given on any declarations, and a given variable, field, etc. can take on values with different types at different times during execution. Many such languages do not even use variable declarations - variables are automatically created the first time they are given a value. Using Zed syntax, this style of programming could look like:

    value := 123456;
    ...
    calledProc1(value % 10 + 3);
    ...
    value := "Name of person";
    ...
    calledProc2(value[5 for 4]);

Zed does not directly allow this style of programming, but its 'con', 'var' and 'def' declarations come close to the same brevity, and declaring multiple variables or constants at the same time can be even briefer.

However, some programmers want to write certain applications without having to think about types at all. Zed can be used to build a framework which allows that style. Such a framework typically uses type 'any' or 'autoAny', so that it can work with user-defined types as well as with a fixed set of types that the framework provides. The "VarTest" example in "1.1 General" shows this style of use, along with several other "advanced" features.

This section explains more about how that is achieved.

Type "Var_t" in that example is defined as:

    export type Var_t = autoAny;

The advantages of using type 'autoAny' are described in "4.1 Basic Types" and elsewhere. Because "Var_t" is a named type, "#" operator procs can be attached to it, so that "#" operators can be used. Normally, very few operators can be used with 'autoAny' values. As the example shows, assignment to "Var_t" variables, etc. does not require any special operator, since 'autoAny' accepts any tracked value as well as automatically converted values of the basic scalar types.

What about operators such as addition, however? Keep in mind that with this scheme, the types of values are not known until run-time. So, explicit run-time type checking is needed, using the 'if' 'assign' construct. For example, an "add" proc could start out as:

    export proc
    add(Var_t left, right)nonNil Var_t:
        if assign con l := left then
            if assign con r := right then
                if assign Basic/Uint_t nonNil bul := l then
                    if assign Basic/Uint_t nonNil bur := r then
                        return Basic/Uint_t(bul->theUint + bur->theUint);
                    elif assign Basic/Sint_t nonNil bsr := r then
                        return uintSintAdd(bul->theUint, bsr->theSint);
                    elif assign Basic/Float_t nonNil bfr := r then
                        return Basic/Float_t(flt(bul->theUint) + bfr->theFloat);
                    else
                        abort "Cannot add this value to a uint";
                    fi;
                elif assign Basic/Sint_t nonNil bsl := l then
                ...

The first two 'if' 'assign' constructs simply test for non-'nil' input parameters. The third checks whether the left-hand operand is of type "Basic/Uint_t". If it is, the right-hand operand is checked in the same way, and if both are of that type, a new "Basic/Uint_t" is yielded which contains the sum of the two contained 'uint' values. A local proc "uintSintAdd" deals with addition of a 'uint' and a 'sint' - it can choose to return either type, abort with an error, etc. When adding a 'float' to a 'uint', this "add" chooses to convert the 'uint' to 'float' and yield the result of 'float' addition. This particular proc does not allow adding a 'string' to a 'uint', but that could certainly be done - either by converting the 'uint' to a string and doing string concatenation, or by requiring that the string be convertible to a 'uint', 'sint' or 'float' and doing the appropriate arithmetic. Another possibility for operators such as this is that some conditions trigger conversion to a higher precision representation, e.g. using the Multi-Precision Integer Arithmetic package.
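The same run-time type dispatch can be sketched in Python. The classes "Uint", "Sint" and "Float" stand in for "Basic/Uint_t", "Basic/Sint_t" and "Basic/Float_t". Several details are assumptions: the policy in "uint_sint_add" (yield a 'sint', fail if out of range), the 64-bit range checks, and raising an error for a 'nil' (None) operand. As in the Zed fragment, only the 'uint' left-operand branch is filled in.

```python
from dataclasses import dataclass

UINT_MAX = 2**64 - 1
SINT_MIN, SINT_MAX = -(2**63), 2**63 - 1


@dataclass(frozen=True)
class Uint:          # stands in for Basic/Uint_t
    value: int


@dataclass(frozen=True)
class Sint:          # stands in for Basic/Sint_t
    value: int


@dataclass(frozen=True)
class Float:         # stands in for Basic/Float_t
    value: float


def uint_sint_add(u, s):
    # One possible policy (an assumption): yield a Sint, failing if out of range.
    total = u + s
    if not SINT_MIN <= total <= SINT_MAX:
        raise OverflowError("uint + sint result does not fit in a sint")
    return Sint(total)


def add(left, right):
    if left is None or right is None:
        raise ValueError("nil operand")
    if isinstance(left, Uint):
        if isinstance(right, Uint):
            total = left.value + right.value
            if total > UINT_MAX:
                raise OverflowError("uint addition overflow")
            return Uint(total)
        if isinstance(right, Sint):
            return uint_sint_add(left.value, right.value)
        if isinstance(right, Float):
            return Float(float(left.value) + right.value)
        raise TypeError("Cannot add this value to a uint")
    # Sint and Float left operands would follow the same pattern.
    raise TypeError("left operand type not handled in this sketch")


print(add(Uint(2), Uint(3)), add(Uint(2), Sint(-5)), add(Uint(1), Float(0.5)))
# Uint(value=5) Sint(value=-3) Float(value=1.5)
```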

This would all be fine if only the simple pre-defined types were needed, but how does this handle user-defined types, such as the complex number type used in the "1.1 General" example? Each type must provide the operator proc for itself, and that proc must be such that general-purpose code can call it. Appropriate types are exported from Exec:

    export type AnyUnary_t = proc(any nonNil a)nonNil any;
    export type AnyBinary_t = proc(any nonNil left, right)nonNil any;
    export type AnyShift_t = proc(any nonNil a; uint n)nonNil any;

By using these types as the proc signature for operator procs on types, general-purpose code such as the "Var" code here can correctly call them. The relevant section of the above "add" proc is:

                    if assign Proc/Proc_t nonNil pr1 :=
                        Types/FindAnyProc(l, Exec/ANY_ADD)
                    then
                        if assign Exec/AnyBinary_t nonNil pr2 := pr1 then
                            return {pr2}(l, r);
                        else
                            abort "Wrong signature on \"ANY_ADD\" proc";
                        fi;
                    else
                        abort "Cannot add to this value";
                    fi;

In this context, "l" and "r" are of type "Var_t" which is a rename of 'autoAny', and so are compatible with 'any'.

Proc "Types/FindAnyProc" is a key piece of this. It is privileged code, which takes an 'any' (with which 'autoAny' is compatible) value, extracts the actual type value, and looks up the given name (here "Exec/ANY_ADD"), attempting to return a "Proc/Proc_t" value.

[For the curious, the current Zed implementation of "FindAnyProc" is:

    export proc
    FindAnyProc(any nonNil a; string nonNil name)Proc/Proc_t:
        con t1 := pretend(a, * Tracked_t)*.tr_tptr;
        if assign con inf := ExportFind(nonNil(t1), name) then
            if select pr := inf->inf_proc then
                return pr;
            fi;
        fi;
        nil
    corp;

]

Inside a compatible implementation of complex numbers, here as a record type, we have:

    export Exec/AnyBinary_t: proc
    complexAnyAdd(any nonNil aLeft, aRight)nonNil any:
        assert assign Complex_t nonNil left := aLeft;
        assert assign Complex_t nonNil right := aRight;
        Complex_t(left->cplx_real + right->cplx_real,
                  left->cplx_imag + right->cplx_imag)
    corp;

In order that this proc be found by "FindAnyProc", the above "complexAnyAdd" must be attached to the "Complex_t" type under name "Exec/ANY_ADD".
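The lookup-by-name scheme can be modeled in Python as a registry keyed by (type, name). This sketch is an analogy, not the Zed implementation. "attach", "find_any_proc" and "generic_add" are hypothetical names. Only the "anyAdd" string value is taken from Exec. The Zed code also checks the proc's signature against "Exec/AnyBinary_t"; Python has no equivalent check, so the sketch omits it.

```python
from dataclasses import dataclass

ANY_ADD = "anyAdd"   # same string value as Exec/ANY_ADD

_attached = {}       # (type, name) -> proc; stands in for procs attached to types


def attach(typ, name, proc):
    _attached[(typ, name)] = proc


def find_any_proc(value, name):
    """Like Types/FindAnyProc: look up 'name' on the value's actual type, else None."""
    return _attached.get((type(value), name))


@dataclass(frozen=True)
class Complex:
    real: float
    imag: float


def complex_any_add(a_left, a_right):
    if not (isinstance(a_left, Complex) and isinstance(a_right, Complex)):
        raise TypeError("complex_any_add needs two Complex values")
    return Complex(a_left.real + a_right.real, a_left.imag + a_right.imag)


attach(Complex, ANY_ADD, complex_any_add)


def generic_add(left, right):
    proc = find_any_proc(left, ANY_ADD)
    if proc is None:
        raise TypeError("Cannot add to this value")
    return proc(left, right)


print(generic_add(Complex(1.0, 2.0), Complex(0.5, -1.0)))
# Complex(real=1.5, imag=1.0)
```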

The complete set of these names exported from Exec is:

    export string
        /* unary operators */
        ANY_NEG = "anyNeg",                             // -
        ANY_PLUS = "anyPlus",                           // +
        ANY_TILDA = "anyTilda",                         // ~
        ANY_AT = "anyAt",                               // @
        ANY_AMPERSAND = "anyAmpersand",                 // &
        ANY_POST_AT = "anyPostAt",                      // postfix @

        /* binary operators */
        ANY_B_AND = "anyBAnd",                          // &
        ANY_B_OR = "anyBXor",                           // ><
        ANY_B_SHL = "anyBShl",                          // <<
        ANY_B_SHR = "anyBShr",                          // >>
        ANY_B_IOR = "anyBIor",                          // |
        ANY_RELATE = "anyRelate",                       // <>
        ANY_POW = "anyPow",                             // ^
        ANY_MUL = "anyMul",                             // *
        ANY_DIV = "anyDiv",                             // /
        ANY_REM = "anyRem",                             // %
        ANY_ADD = "anyAdd",                             // +
        ANY_SUB = "anySub",                             // -
        ANY_LESS_THAN = "anyLessThan",                  // <
        ANY_LESS_OR_EQUAL = "anyLessOrEqual",           // <=
        ANY_EQUAL = "anyEqual",                         // =
        ANY_NOT_EQUAL = "anyNotEqual",                  // ~=
        ANY_GREATER_OR_EQUAL = "anyGreaterOrEqual",     // >=
        ANY_GREATER_THAN = "anyGreaterThan",            // >
        ANY_STR_EQ_EQ = "anyStrEqEq",                   // ==
        ANY_STR_NOT_EQ_EQ = "anyStrNotEqEq";            // ~==

Similarly, package Fmt exports:

    export string FMT_ANY = "fmtAny";
    export type FmtAny_t = proc(CharBuffer/OBuf_t nonNil ob; any nonNil val;
                                string format; uint width, precision)void;

which can be used to export a formatting proc which takes its argument as an 'any' value, converts it to the known actual type, and then appends a formatted version to the supplied buffer as normal. The proc used by the sample record-based complex number package is:

    export FmtAny_t: proc
    complexFmtAny(CharBuffer/OBuf_t nonNil ob; any nonNil a; string format;
                  uint width, precision)void:
        assert assign Complex_t nonNil cplx := a;
        FmtB(ob, cplx);
    corp;

which in turn assumes that a regular "FMT_THIS" proc is attached to "Complex_t".

Note that neither Exec nor Fmt directly uses any of these symbols - they are simply provided as a standardized set which can be used as needed.

The remaining facilities used in the "VarTest" example are:

Range - constructs a vector of Var_t, filled in with Basic/Uint_t values from 0 up to (but not including) the specified limit. The Var_t code accepts and works with vectors of Var_t as valid Var_t values.
Map - takes a proc (signature 'Exec/AnyUnary_t') and a vector of Var_t. Allocates a new vector of the same length and fills it in with the result of calling the proc on the corresponding input vector elements. The result is returned.
# braces - a "# braces" proc is defined in the Var_t package. See "13.9 "#" Braces".
Lambda - this is a programmer defined construct (see "23 Programmer Defined Constructs") which creates and returns a proc which here has a single Var_t parameter and whose body is the code between the parenthesized list of formal parameters and the 'end' token.
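The Range and Map facilities described above can be sketched in Python. "var_range" and "var_map" are hypothetical names, and "Uint" is a minimal stand-in for Basic/Uint_t. A Python lambda plays the role of the Zed Lambda construct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uint:          # minimal stand-in for Basic/Uint_t
    value: int


def var_range(limit):
    """Like Range: a new vector of Uint values from 0 up to, not including, limit."""
    return [Uint(i) for i in range(limit)]


def var_map(proc, vec):
    """Like Map: a new vector of the same length holding proc(element)."""
    return [proc(v) for v in vec]


squares = var_map(lambda v: Uint(v.value * v.value), var_range(5))
print([v.value for v in squares])  # [0, 1, 4, 9, 16]
```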

21 Data Copy Facility

Structs and arrays can be assigned in Zed, and if the contained data supports it, such copies can be done using fast byte (or even longword) copies. Slower copying will happen if fields or elements require individual handling. This is the case for trackables, enum members and record selector values.

However, there are situations in which the values to be copied allow it (e.g. 'uint', 'float' and other similar types whose values are unconstrained), but the details of the required copy do not allow for a simple bulk assignment. For example, if code is working with raw buffers of data ('bits8' values), and wishes to extract multiple 'uint', 'float', etc. values from it, the Zed language does not directly allow the usual techniques of casting pointers to a common type (e.g. C's "void *"), and using utility routines (e.g. "memcpy") to do the bulk copying.

What Zed does allow is assignment of any '@' value to an '@' 'void' destination. User code cannot directly make use of this fact to write a data copying facility because user code can perform no operation on '@' 'void' values. The Zed "data copy facility" has been created to meet this need.

The data copy facility is exported from package Exec and so can make use of resources within that package. In particular, it can generate the bulk data copy "Exec_t" nodes in situations which would otherwise not allow it.

The data copy facility works by exporting compile-time procs which will safely initialize 'ro' structs describing bulk data sources and destinations, and by exporting proc "CopyData" which takes one source and one destination and does the bulk data copy. "CopyData" updates the descriptors, and so can be used to do scatter/gather copying operations.
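The descriptor-update idea behind scatter/gather copying can be sketched in Python. This is an analogy, not the Exec implementation. "DataSource", "DataDest" and "copy_data" are hypothetical stand-ins for Exec/DataSource_t, Exec/DataDest_t and Exec/CopyData. Two rules are assumptions: a single copy moves the smaller of the two remaining counts, and each descriptor must fit inside its buffer.

```python
class _Descriptor:
    """A buffer plus a current bits8 offset and a count of bits8 values left."""

    def __init__(self, buf, offset, count):
        if offset < 0 or count < 0 or offset + count > len(buf):
            raise ValueError("range does not fit in the buffer")
        self.buf, self.offset, self.bits8_left = buf, offset, count


class DataSource(_Descriptor):
    pass


class DataDest(_Descriptor):
    pass


def copy_data(dst, src):
    """Copy as much as both descriptors allow, then advance both of them."""
    n = min(dst.bits8_left, src.bits8_left)
    dst.buf[dst.offset:dst.offset + n] = src.buf[src.offset:src.offset + n]
    for d in (dst, src):
        d.offset += n
        d.bits8_left -= n
    return n


# Scatter example: one source drained into two destinations in turn.
src = DataSource(bytes(range(10)), 2, 8)
dst1 = DataDest(bytearray(6), 1, 5)
dst2 = DataDest(bytearray(3), 0, 3)
copy_data(dst1, src)
copy_data(dst2, src)
print(list(dst1.buf), list(dst2.buf), src.bits8_left)
# [0, 2, 3, 4, 5, 6] [7, 8, 9] 0
```

Because "copy_data" advances the source descriptor, the second call continues where the first stopped. No extra offset bookkeeping is needed in the caller.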

The exported items are:

Exec/DataSource_t - 'ro' struct describing a copy source
Exec/DataDest_t - 'ro' struct describing a copy destination
Exec/InitSource - compile-time proc to initialize a DataSource_t. Its call signature is expected to be "InitSource(DataSource_t, data-source, uint-offset, uint-bits8-count)"
Exec/InitDest - compile-time proc to initialize a DataDest_t. Its call signature is expected to be "InitDest(DataDest_t, data-destination, uint-offset, uint-bits8-count)"
Exec/CopyData - proc with signature "CopyData(@ DataDest_t; @ DataSource_t)void"

The source/destination parameters on the init calls should be either references to one-dimensional matrixes (vectors), or '@' of an item. If the desired data lies within an array or struct (as opposed to right at the beginning of that larger entity), then use the "uint-offset" parameter to specify the desired 'bits8' offset, rather than trying to take '@' of an indexed location within the array, or of a struct field. Doing the latter makes only the selected element or field the data to be copied (which might be what is desired, but usually is not).

Note that "InitSource" and "InitDest" directly take the structs as their first parameters, and not '@' of them. This is possible because they are 'ctProc' procs. This is needed to work with Zed's rules dealing with '@' scopes. "CopyData" is a normal run-time proc and so takes '@' of those structs.

Here is a simple example of using the data copy facility:

    def COUNT1 = 100;

    proc
    test2()void:
        var arrOff := 16;
        [COUNT1 + 16] float aF;
        [(COUNT1 + 16) * (sizeof(float) / sizeof(bits16))] bits16 aB16;
        Exec/DataSource_t dsrc;
        Exec/DataDest_t ddst;
        Exec/InitSource(dsrc, @aF, arrOff * 8, COUNT1 * sizeof(float));
        Exec/InitDest(ddst, @aB16, arrOff * 2 + 4, COUNT1 * sizeof(float));
        Exec/CopyData(@ddst, @dsrc);
        Fmt("test2 after run:\n  dsrc ", dsrc, "\n  ddst ", ddst);
    corp;

"aF" is an array of floats. "aB16" is an array of 'bits16' values. COUNT1 floats are copied from byte offset 16 * 8 in "aF" into byte offset 16 * 2 + 4 of "aB16". Note that "DataSource_t" and "DataDest_t" have custom formatting procs attached to them, and so can be used with "Fmt".

Here is a more complex example, which does a "scatter" operation, taking FIXED_COUNT 'bits8' values from "fixed" and scattering them to multiple vectors (buffers) indexed from "bufVec", starting at offset OFFSET in the first such buffer.

    def BUF_COUNT = 5, BUF_SIZE = 1000, FIXED_COUNT = 4000, OFFSET = 257;

    proc
    test4()void:
        /* Set up scenario. This is a "scatter" operation, where "FIXED_COUNT"
           bytes are being copied from a single buffer into a number of other
           buffers each of size "BUF_SIZE", starting at offset "OFFSET" in the
           first buffer. "OFFSET" affects the final alignment of the individual
           copy operations, and so controls whether the loads/stores are 8, 16, 32
           or 64 bits each. */
        con bufVec := matrix([BUF_COUNT] [] bits8);
        for i from 0 upto BUF_COUNT - 1 do
            bufVec[i] := matrix([BUF_SIZE] bits8);
        od;
        con fixed := matrix([FIXED_COUNT] bits8);
        for i from 0 upto FIXED_COUNT - 1 do
            fixed[i] := i;
        od;

        /* Initialize for copy. */
        Exec/DataSource_t dsrc;
        Exec/InitSource(dsrc, fixed, 0, FIXED_COUNT);
        var bufInd := 0, off := OFFSET, len := FIXED_COUNT;
        /* Copy loop. */
        while
            /* Local buffer reference needed to avoid problems with '@' scopes. */
            con buf := bufVec[bufInd];
            Exec/DataDest_t ddst;
            Exec/InitDest(ddst, buf, off, len);
            Exec/CopyData(@ddst, @dsrc);
            Fmt("test4 after partial copy:\n  ddst ", ddst, "\n  dsrc ", dsrc);
            len := ddst.ddst_bits8Left;
            len ~= 0
        do
            bufInd := bufInd + 1;       // move to next buffer
            off := 0;                   // start at beginning of it
        od;

        Fmt("test4 after all, dsrc: ", dsrc);
    corp;
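The way "test4" splits its copy across buffers can be checked with a short Python sketch. "scatter" is a hypothetical name, not part of Exec. The sketch assumes each partial copy fills the current buffer from "off" to its end, or stops when the source runs out. The fill pattern of "fixed" is arbitrary (here i mod 256).

```python
BUF_COUNT, BUF_SIZE, FIXED_COUNT, OFFSET = 5, 1000, 4000, 257


def scatter(fixed, buffers, offset):
    """Copy all of 'fixed' into consecutive buffers, starting at 'offset' in the first.

    Returns the number of bytes placed in each buffer used.
    """
    if buffers and not 0 <= offset <= len(buffers[0]):
        raise ValueError("offset outside the first buffer")
    chunks = []
    src_pos, buf_index, off = 0, 0, offset
    while src_pos < len(fixed):
        if buf_index >= len(buffers):
            raise ValueError("not enough buffer space")
        buf = buffers[buf_index]
        n = min(len(fixed) - src_pos, len(buf) - off)  # partial copy size
        buf[off:off + n] = fixed[src_pos:src_pos + n]
        chunks.append(n)
        src_pos += n
        buf_index += 1   # move to next buffer
        off = 0          # start at beginning of it
    return chunks


fixed = bytes(i % 256 for i in range(FIXED_COUNT))
buffers = [bytearray(BUF_SIZE) for _ in range(BUF_COUNT)]
chunks = scatter(fixed, buffers, OFFSET)
print(chunks)  # [743, 1000, 1000, 1000, 257]
```

The first buffer receives only 1000 - 257 = 743 bytes. Three full buffers follow, and the last 257 bytes go to the fifth buffer.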

22 Persistence, Run Time Paths and Databases

The word "persistence", when it refers to computer data, essentially means data that will still exist after a computer program which creates or uses it has exited. Persistent data will still be around when the program is next run. This differs from typical data within a program, which disappears when the program exits.

In most computer systems, "files" are the usual persistence mechanism for data storage. Some programming languages, including Zed, offer "persistent variables", which can be used much like traditional variables, but maintain their values between runs of programs.

22.1 Persistent Variables

A persistent variable in Zed must be declared at the package level. Its name must be prefixed with "$" - that is the signal to the Zed compiler that the variable is to be persistent. A persistent variable is used in the same way as a normal variable, but additional rules and considerations apply.

A simple example:

    uint $IdCounter := 0;

    export proc
    GetNextId()uint:
        $IdCounter := $IdCounter + 1;
        $IdCounter
    corp;

Here, a program is maintaining a persistent count of the number of times that "GetNextId" has been called. Each call returns a new, unique value. Since Zed uints are 64 bit values, the proc can be called 2^64 - 1 times before the addition fails with an overflow.
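For comparison, the following Python sketch shows a rough analogue of "$IdCounter" using the standard "shelve" module. The mechanism differs from Zed: the program opens a file-backed store explicitly, where Zed uses a '$'-prefixed variable. The function name "get_next_id" and the key "IdCounter" are illustrative choices. The explicit overflow check mimics the 64-bit 'uint' limit.

```python
import os
import shelve
import tempfile

UINT_MAX = 2**64 - 1  # largest value of a 64-bit uint


def get_next_id(store_path):
    """Increment and return a counter kept in a shelve file between runs."""
    with shelve.open(store_path) as db:
        current = db.get("IdCounter", 0)
        if current == UINT_MAX:
            raise OverflowError("IdCounter would overflow a 64-bit uint")
        db["IdCounter"] = current + 1
        return current + 1


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ids")
    first = get_next_id(path)
    second = get_next_id(path)   # reopens the store: the count survived
print(first, second)  # 1 2
```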

Persistent variables can be single-valued or multi-valued (structs and arrays). They are not allowed to be or contain any "address" values other than strings. Some valid examples:

    string $WinningPlayerName;
    [10] uint $TenBestScores;

    struct Employee_t {
        uint emp_id;
        string emp_name;
        float emp_salary;
        bool emp_active;
        uint emp_supervisor;
    };
    Employee_t $TheBigBoss := {1, "Chris", 1_000_000., true, 0};

"Tracked" values, such as records, capsules, and in-memory vectors cannot be or be within any persistent variables. Saving of a linked network of trackables can sometimes be done by turning the network into a byte stream ("serialization"), and saving that bytestream.

You cannot take the address of persistent variables - they do not have addresses because they do not "live" in memory. This applies to '@' addresses and '&' addresses. The Zed run-time code knows how to find persistent variables, but that will typically not involve normal addresses. Note that this limitation affects how persistent arrays or structs can be given directly to "Fmt". This is because that proc operates by taking the address of such things and working with that. The "Fmt" code will, for "small" values, declare a temporary of the required type and read the persistent value into that variable, and then use the '@' of that variable in its internal operations.

Persistent strings, whether stand-alone variables or within larger variables, could be stored in such a way that duplicate string items are created. Reading equivalent strings from persistent storage can create multiple equivalent in-memory strings. The Zed language does not specify anything here, so use of the '==' and '~==' operators on strings retrieved from persistent storage is not useful.

There is no limit on the length of a persistent string other than that from the total size of the persistent store. However, there is no way to work with a persistent string without first having the string be a normal string in memory, so the amount of memory available will be the limiting factor.

Persistent variables can have a visibility specification and storage flags just like regular package variables. An 'ro' persistent variable can be modified by code within its package of definition, but code in other packages can only read it, as usual. 'export' lists, etc. operate normally.

The concept of a 'con' persistent variable is likely not very useful, but there might be some situations which can use it. Such a variable will always contain the value it was given when it was first created, i.e. the value computed for it on the first run of the program containing its declaration. That will usually be the same as for a non-persistent 'con' package variable, unless the expression which initializes the variable changes from run to run. A non-persistent 'con' variable can thus have a different value from run to run, but a persistent 'con' variable cannot.

[The exact semantics of the 'volatile' storage flag for persistent variables has not yet been worked out. However, my current thought is that all persistent variables are essentially 'volatile', in that every reference or store will go to the persistence code. I've been told that in Perl, a persistent variable referenced with a '|' on an assignment has the additional meaning that code execution will wait until the new value is actually flushed all the way to the physical persistent store (e.g. the spinning rust of a disk drive). Pending further investigation, that would be the semantics of volatile persistent variables in Zed.]

Programmers should not attempt to use persistent variables as a means of interprogram or interprocess control and synchronization. The exact semantics of persistent reads and writes are not specified. Programmers should use explicit synchronization mechanisms provided by appropriate libraries. As mentioned above, the Zed language treats all persistent variables as 'volatile', but that is only with respect to the interfaces to the persistence code, and not with respect to an underlying physical mechanism or operating system. [The persistence code currently does all writes as write-through, but has a general purpose cache of the most recently used values.]

The current persistent storage code guarantees that if a persistent item is smaller than the block size, the data for the item will not span multiple blocks. This attribute might be useful when dealing with inter-process communication. However, the size of the blocks used for persistent storage is not visible at the programming language level. And, with the current implementation, the block size is not known until run time. A run time proc could be added to return the in-use block size.

22.2 Persistent Vectors

Zed allows persistent vectors, as in:

    [] Employee_t $Employees;

Such a persistent vector initially has no elements, and so cannot be read from. The "next" element in a persistent vector can be written. Thus, it is allowed to write to element "0" of an initially empty persistent vector, to element "1" of a persistent vector with one existing element, etc. The number of elements in a persistent vector can be determined using the "8.1 "getBound"" construct (which in this situation yields a 'bits64' value instead of the usual 'uint'). Thus:

    $Employees[getBound($Employees)] :=
        Employee_t(GetNextId(), "Fred", 1., true, 0)@;

is valid. Each such assignment to all or part of the element just beyond the last one grows the persistent vector by one element. The number of elements in a persistent vector is limited only by the available space in the underlying persistent storage. Note the '@' after the struct constructor above, needed because a struct constructor yields '@' of the temporary storage used for the struct.

Read references beyond the highest element in a persistent vector yield an error at run-time, as do write references beyond one greater than the highest existing element.

Note: if you are creating a new vector element using multiple assignments to its fields, call 'getBound' only once to determine the index of the new element. The first field assignment grows the vector, so a second call to 'getBound' would yield the index of yet another new element, and you may end up with multiple partially-defined vector elements.
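
As a minimal sketch of the safe pattern (the proc "addEmployee" is hypothetical, and the field names "emp_name", "emp_salary" and "emp_supervisor" are the "Employee_t" fields used in "22.5 Considerations for Persistence"), the new index is captured once in a 'con' and then reused for every field:

    proc
    addEmployee(string name; float salary; uint supervisor)void:
        con idx := getBound($Employees);
        $Employees[idx].emp_name := name;
        $Employees[idx].emp_salary := salary;
        $Employees[idx].emp_supervisor := supervisor;
    corp;

The first assignment grows "$Employees" by one element; the next two write into that same, now existing, element. Writing "$Employees[getBound($Employees)]" on each line would instead create three new elements, each only partly set.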

22.3 Run Time Paths

The persistent variables discussed so far have been referenced directly by their names as package entities. That is convenient, but not very flexible - many applications will need to choose at run-time where the data they are working with is located. "Paths" in Zed are the solution to that issue.

Referencing something whose location in persistent storage is not known until run-time is somewhat like using a proc or method which is determined at run-time. The latter uses the syntax of enclosing a referencing expression in braces. Referencing through paths uses that same syntax.

22.3.1 Current Working Package

It is not uncommon for code to want to know "where am I now?" in terms of the package hierarchy of the entire system. The '.' syntax answers that question for the compile-time current location, yielding a "Package_t". The '{.}' syntax answers it for the run-time location (the current working package). This is like the "current working directory" concept in traditional operating systems.

Note that '{.}' is a single token in the language - you cannot have spaces between the characters. Writing it with spaces results in syntax or semantic errors since, as mentioned above, brace brackets are also used for indirectly calling a proc via an expression, and for other indirection-like things.

22.3.2 Path Types

Path types are like '@', '*' and 'template' types in form:

'path'
optional storage flags ('con', 'ro', 'volatile', 'nonNil', 'private')
path target type or '*'

Path values are tracked values.

'path nonNil *' is used to indicate a path to a package - it must always include the 'nonNil' storage flag. No trackable types other than 'string' are allowed in path types, but paths can reference persistent vectors. Type 'path void' represents a path to a persistent variable which has not yet been checked for any required type or storage flags.

Storage flags 'con', 'ro' and 'volatile' correspond to the same ones on actual persistent variables. Storage flag 'nonNil' means that the path actually refers to something, i.e. that it has been resolved at run-time. The 'assign' construct can be used with path values to explicitly check paths, thus yielding path values with the 'nonNil' attribute. See below for examples.

Normally, the contents of a path value are visible. For example, "Fmt" will show the path if one is displayed. Storage flag 'private' on a path type prevents this by making the path contents not visible. The path can still be followed, however. Assignment, parameter passing, etc. can freely add 'private' to a path value, but it can never be removed.

Two path types are assignment compatible if their storage flags match using the usual rules and their target types match, i.e. the target types are the same, one is a direct rename of the other, they are the same instance type, or they are matching array types.
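
A short sketch of adding flags through parameter passing, assuming a universally exported persistent 'uint' named "$MyCounter" in the current package. The proc names are hypothetical, and the order in which the flags are written after 'path' is assumed not to matter:

    proc
    showCount(path nonNil ro private uint pth)void:
        Fmt("Count is ", ${pth});
    corp;

    proc
    callShowCount()void:
        if assign path nonNil uint pthU := path(., "MyCounter") then
            showCount(pthU);
        fi;
    corp;

Passing "pthU" adds 'ro' and 'private', both of which may be freely added, so "showCount" can read the counter but cannot write it, and cannot display the path itself.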

22.3.3 Path Constructors

Paths in Zed are run-time entities which consist of a starting package and a path relative to that package. They are created by the 'path' construct, which has syntax:

'path'
'('
expression yielding the starting package
','
string expression yielding the relative path from that package
')'

Some examples:

    path(., "MyCounter")
    path({.}, "DataPackage")
    path(/Users/Chris/Game, "Images/Backgrounds/" + regionName)
    path(findDataPackage(worldName), "ids")
    path(../Data, "Resources")
    path nonNil * pPth := ...
    path({pPth}, "Rules")

Note that the relative path portion does not need to start with a '/'. Paths are conceptually evaluated by changing the current context to the package which is the path starting point, then evaluating the relative path in that context. If a relative path starts with '/' or contains any '..' components, then the final target of the path might not be "within" any expected part of the overall package tree. For some situations, such as command parameters, that might be what is desired. Other situations, however, will want to constrain path targets to a specific area, and so should not allow relative paths starting with '/' or containing '..'. Utility proc "Package/VerifyPath" checks paths for validity and also reports those two conditions.

The path construct executes at run-time, using the parameters it has been given. It yields an item of type 'path void'. The path is not checked to see whether it actually references something until an 'assign' is done with it. Path types are trackable types. Note that it makes no sense to use an empty string as the relative path in a path constructor: that would be equivalent to just using the starting package, which can be referenced directly. Such a path will not resolve; it is considered to have an invalid path string. See below for examples.
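
The following hypothetical proc sketches that behaviour: the constructor call itself succeeds and yields a 'path void', and it is only the 'assign' check that fails:

    proc
    checkEmptyPath()void:
        con pth := path(., "");
        if assign path nonNil * pthPk := pth then
            Fmt("unexpected: empty path resolved");
        else
            Fmt("empty relative path did not resolve: ", pth);
        fi;
    corp;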

22.3.4 Paths to Packages

The '{.}' syntax references the current operating package at run-time. Other packages can be accessed using the package path follow syntax:

'{'
expression of type 'path nonNil *' giving the path to follow
'}'

As mentioned above, a path type whose target is '*' must always include the 'nonNil' storage flag, so such a path is known to reference a package. The result of a package path follow is a 'nonNil' reference to a value of type "Package/Package_t", so fields of that record type can be referenced directly.

The package path follow cannot be written to - it is a value only. The fields of the referenced "Package_t" cannot be written unless the code being compiled has the access rights of package Package.

Example:

    con pth := path({.}, "BulkItems");
    if assign path nonNil * pthPk := pth then
        con itemCount := {pthPk}->pk_contentsCount;
        ...
    else
        Fmt("*** Cannot find BulkItems package!");
    fi;

Typically, a package referenced by a package path follow will be used more than once, so it is assigned to a local variable rather than doing the path follow for each reference. Doing this also guarantees that all references will be to the same entity, regardless of any changes to the overall package hierarchy that happen in the meantime.
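
A sketch of that pattern, assuming the "BulkItems" package contains a subpackage named "Archive" (a hypothetical name). The package is followed once into "bulk", which is then used both for a field access and as the starting package of a further path:

    proc
    showBulkItems()void:
        if assign path nonNil * pthPk := path({.}, "BulkItems") then
            con bulk := {pthPk};
            Fmt("BulkItems pk_contentsCount: ", bulk->pk_contentsCount);
            if assign path nonNil * pthArch := path(bulk, "Archive") then
                Fmt("Archive pk_contentsCount: ", {pthArch}->pk_contentsCount);
            fi;
        else
            Fmt("*** Cannot find BulkItems package!");
        fi;
    corp;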

22.3.5 Paths to Persistent Variables

Paths to persistent variables are also created with the 'path' constructor. Following them is very similar to following a path to a package:

'$'
'{'
expression giving path to follow
'}'

The '$' identifies these as persistent variable accesses. Since path constructors yield values of type 'path void', which cannot be followed, it is necessary to use 'assign' constructs to produce typed paths which can be followed. For example:

    con pth := path(if useMine then .. else {pthEx} fi, "Helpers/Counter");
    if assign path nonNil void pthNN := pth then
        if assign path nonNil uint pthU := pthNN then
            ${pthU} := ${pthU} + 1;
        else
            Fmt("*** .../Helpers/Counter is not a uint");
        fi;
    else
        Fmt("*** Can't find .../Helpers/Counter in ", pth);
    fi;

Here, a path is constructed to refer to an appropriate package. The desired persistent variable, "Counter" (which would be "$Counter" if referenced directly by name), is checked for in the chosen package. If it is found, it is then checked to see if it is of type 'uint'. If it is, then it is incremented. The two checks can be combined into one 'assign' clause, going directly from the path constructor (which yields a value of type 'path void' with no 'nonNil' storage flag) to the final 'path nonNil uint' value. Note that a persistent variable reached through a path must be universally exported in order for the running code to find it, as described below.
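
Combining the two checks, the example above might be sketched as follows. The proc name is hypothetical, and its parameters stand in for the "useMine" and "pthEx" values used above:

    proc
    incrementCounter(bool useMine; path nonNil * pthEx)void:
        con pth := path(if useMine then .. else {pthEx} fi, "Helpers/Counter");
        if assign path nonNil uint pthU := pth then
            ${pthU} := ${pthU} + 1;
        else
            Fmt("*** .../Helpers/Counter is missing or not a uint in ", pth);
        fi;
    corp;

The trade-off is a less specific error message: a single failed 'assign' cannot say whether the variable was missing or had the wrong type.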

Storage flags can be given in path types. As discussed previously, 'nonNil' in a path type means that path objects of such a type have been checked to see that they properly reference a target, whether it is a package or a persistent variable.

When paths are being checked, if the targeted persistent variable is not universally exported (i.e. if it has no 'export', or it has a qualified export clause), then it will not be found via a path. Executing code does not typically have any compile-time access rights. Such rights can arise during run-time construction of code, but they are temporary and do not affect indirect access to persistent variables.

If a persistent variable addressed by a path is marked as 'con', or the storage for the variable is read-only, then a valid path to the variable must include the 'con' storage flag. If a persistent variable is not marked as 'con', then a valid path to it must not include 'con'. The 'con' storage flags must match.

If a persistent variable addressed by a path has storage flag 'ro' then a valid path referencing it must have storage flag 'ro'. A path can freely add 'ro' since that just takes away its write access. If a persistent variable has storage flag 'volatile', then a valid path referencing it must have storage flag 'volatile'. A path can freely add 'volatile' since that just results in more accesses to the variable, and hence less efficiency.
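
As a sketch of these rules, assume (hypothetically) that the current package contains a universally exported persistent 'uint' "Limit" declared 'con', and a universally exported persistent 'uint' "Ticks" with storage flag 'volatile'. The proc name is also hypothetical:

    proc
    showSettings()void:
        if assign path nonNil con uint pthLimit := path(., "Limit") then
            Fmt("Limit: ", ${pthLimit});
        fi;
        if assign path nonNil ro volatile uint pthTicks := path(., "Ticks") then
            Fmt("Ticks: ", ${pthTicks});
        fi;
    corp;

The first 'assign' must include 'con', since "Limit" is 'con'; without it the check would fail. The second must include 'volatile', and it also adds 'ro', which is always permitted.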

Putting several things together can yield some "interesting" syntax:

    struct Str_t {
        uint str_count;
        float str_size;
    };

    [] Str_t $StrVec1, $StrVec2;

    proc
    doTest(bool useOne)void:
        if assign path nonNil [] Str_t con pth :=
            path(., "StrVec" + "21"[toUint(useOne)])
        then
            for i from 1 upto 10 do
                ${pth}[getBound(${pth})].str_count := i;
            od;
            Fmt("Size now ", getBound(${pth}));
        fi;
    corp;

Each time this code is run with a given value for "useOne", the selected persistent vector will be increased in length by 10 elements. The "str_size" fields are not explicitly set in this example.

22.4 Databases

As described above, persistent vectors expand as needed - writing to an element one past the current end will add a new element to the vector. This is very similar to adding a record (row) to a database: the system finds space for it, and tells you the record number of the new record. The difference is that it is possible to delete records from a database, whereas deleting an element from a persistent vector would require rewriting all of the elements beyond the deleted one, a potentially very expensive operation. An alternative is for the program to maintain a separate persistent vector which contains 'bool' values saying whether or not particular elements in the main vector are to be considered "present".
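
A minimal sketch of that alternative, using the "Employee_t" fields from "22.5 Considerations for Persistence". The vector and proc names are hypothetical, and the two vectors are assumed always to be grown together so that their indices stay in step:

    [] Employee_t $Staff;
    [] bool $StaffPresent;

    proc
    addStaff(string name; float salary; uint supervisor)bits64:
        con idx := getBound($Staff);
        $Staff[idx].emp_name := name;
        $Staff[idx].emp_salary := salary;
        $Staff[idx].emp_supervisor := supervisor;
        $StaffPresent[idx] := true;
        idx
    corp;

    proc
    removeStaff(bits64 idx)void:
        $StaffPresent[idx] := false;
    corp;

"Removing" an element only clears its flag; the element's storage is never reclaimed, and any code scanning "$Staff" must check "$StaffPresent" first.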

22.4.1 Introduction to Zed Databases

In the current Zed implementation, a bitmap within the persistent store is added to the internal representation of a persistent vector, thus producing a (persistent) database. This bitmap grows as needed and flags whether or not the corresponding element in the main database vector is currently "present". The form of the records in a database is defined by a 'struct' type. A 'DbType' declaration based on that struct type yields a new database type, and any number of databases of that type can then be declared. The fields of the 'struct' become the "columns" of the database records.

As an example, consider the following struct:

    struct Customer_t {
        uint cust_id;
        string cust_name;
        [3] string cust_address;
        float cust_value;
        bits8 cust_age;
        char cust_sex;
        bits8 cust_flags;
    };

Since there can be many records in a database, we have used a bits8 value for the customer age (maximum value 255), and a single character to contain the various modern options for "sex". We have used a single string for the customer name and an array of 3 strings for the address. This representation differs from what would have been used in many traditional databases for the name and address values. Those values would typically have been fixed-size arrays of characters, since those database systems required a fixed size for the customer records.

Using string values in Zed may give us less wasted space, but will cost us in terms of access time. This is because a persistent string in Zed is a separate item within the persistent store, so reading, e.g., "cust_name" requires the code to fetch an item id from the "cust_name" space in the database record, and then fetch (or store, on a write) the string contents from that separate item. The traditional fixed-size character array can be used in Zed if desired. Note that all database records are still the same size; it is just that some of the information for them may be stored elsewhere, "out of band".

Given the above Customer_t struct type, we can define, at the package level, a database type:

    DbType CustTable_t Customer_t;

[I initially used "table" for the type description, not "DbType". However, it quickly became clear that, out of context, it was not obvious that a *database* table type was being created rather than some in-memory type.]

With "CustTable_t" defined as a type, we can now declare an actual database, at the package level:

    CustTable_t $Customers;

We can now add, delete, reference and modify records (rows) in $Customers.

Package /Db contains procs for dealing with databases, so it is normal for code working with databases to "use /Db;". Proc "Insert" takes a database value and adds a record to it, returning the row number (a 'bits64' value) as result. Proc "Delete" takes a database value and a row number, and deletes that row from the database. Proc "Present" takes a database value and a row number, and returns 'true' if that row is currently present in the database, 'false' if not. Proc "Highest" takes a database value and returns a 'bits64' value which is at least as large as the highest currently present row number in the database. See later for how these procs are declared to work with any database.

Putting all of this together:

    proc
    addRecords()void:
        $Customers[Db/Insert($Customers)] :=
            Customer_t(101, "Flintstone, Fred",
                       ["55 Cobblestone Road", "Bedrock", "USA"],
                       1000000., 28, "M", 0xff)@;
        $Customers[Db/Insert($Customers)] :=
            Customer_t(109, "Rubble, Barney",
                       ["57 Cobblestone Road", "Bedrock", "USA"],
                       500000., 26, "M", 0xf0)@;
    corp;

    proc
    showRecords()void:
        for recNum from 0 upto Db/Highest($Customers) do
            if Db/Present($Customers, recNum) then
                Fmt($Customers[recNum]);
            fi;
        od;
        Fmt();
    corp;

    proc
    deduct(float amount)void:
        for recNum from 0 upto Db/Highest($Customers) do
            if Db/Present($Customers, recNum) then
                Customer_t cust := $Customers[recNum];
                con value := cust.cust_value;
                if value < amount then
                    Fmt("Customer \"", cust.cust_name, "\"/",
                        cust.cust_id, " has expired - deleting\n");
                    Db/Delete($Customers, recNum);
                else
                    $Customers[recNum].cust_value := value - amount;
                fi;
            fi;
        od;
    corp;

Proc "addRecords" shows using a struct constructor for Customer_t to initialize newly inserted rows. Note the '@' after the constructor - struct constructors yield '@' of the temporary struct variable that the construction happens into.

Proc "showRecords" shows using Db/Highest and Db/Present to scan through the database rows, printing all of them. [Note that there should be a user-defined construct exported from package Db which can combine the above 'for' and 'if'.] Direct printout like this works because the test code has "eval FmtAdd(Customer_t);" after the declaration of Customer_t. The produced output isn't the nicest possible - e.g. "cust_age" and "cust_flags" are shown in hexadecimal.

Proc "deduct" shows access to an individual field (column) of a record, and using Db/Delete. It also shows using a local variable ("cust") as an in-memory copy of the database record being worked on. This is a bit easier to read, and can be more efficient since it might avoid some implicit calls to the persistence code. Note that fetching the record into the local variable will fetch all of the strings that are part of the record, so there is a cost tradeoff.

See "7.2 One Dimensional char Arrays" for information on how Zed helps deal with one dimensional arrays of 'char' in more string-like ways.

22.4.2 Details of Zed Databases

A database reference ("$Customers" in the above) is a trackable value. If a program is accessing multiple databases of the same type, the relevant database value can be passed around to procs working with the selected one, with the initial values coming from package-level databases like "$Customers":

    con currentDb := $Customers;
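
For example, a proc can be written against the database type rather than against a particular database. This is a sketch: the proc names are hypothetical, and the 'nonNil' on the parameter follows the style of the /Db proc declarations shown later:

    proc
    totalValue(CustTable_t nonNil db)float:
        float total := 0.;
        for recNum from 0 upto Db/Highest(db) do
            if Db/Present(db, recNum) then
                total := total + db[recNum].cust_value;
            fi;
        od;
        total
    corp;

    proc
    reportTotal()void:
        con currentDb := $Customers;
        Fmt("Total customer value: ", totalValue(currentDb));
    corp;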

The 'DbType' declaration creates a new database type. Syntactically, a DbType declaration consists of:

'DbType'
optional storage flags ('ro', 'con')
name for new type
'void' or struct type reference
optional virtual fields list

A "virtual fields list" consists of:

'{'
one or more virtual field pairs, separated by commas
'}'

A database type based on 'void' is similar to '@' or pointer types based on 'void' - it is a universal destination for database values, subject to any storage flags involved. An 'ro' storage flag on a database type means that code using a database of that type can only read record fields and examine other information about the database. Thus, fields of records in 'ro' databases cannot be written, and "Db/Insert" and "Db/Delete" cannot be used on them. Those two procs are excluded because their 'void' database parameter types do not include 'ro'. "Db/Highest" and "Db/Present" do have 'ro' in their 'void' parameter types, and so can be applied to databases that the caller only has 'ro' access to. Thus, code which has write access to a database can pass a reference to that database to other code, giving that code only read access.
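
A sketch of read-only access through the general-purpose database type "Db/TableVoidRo_t" used by "Db/Highest" and "Db/Present" (the proc name is hypothetical). Since the parameter type is based on 'void', the proc accepts any database, but it cannot index records, and a call to "Db/Insert(db)" or "Db/Delete(db, recNum)" inside it would be rejected:

    proc
    countPresent(Db/TableVoidRo_t nonNil db)uint:
        uint count := 0;
        for recNum from 0 upto Db/Highest(db) do
            if Db/Present(db, recNum) then
                count := count + 1;
            fi;
        od;
        count
    corp;

Code holding write access to "$Customers" can then call "countPresent($Customers)", handing over read access only.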

The struct type that a database type is based on must be already defined, and must contain only fields of types which can be persisted. Nested structs and arrays are allowed. Nested matrixes are not allowed, just as they are not allowed within structs which are persisted directly as persistent variables. As with other persistent values, the only trackable type allowed within database records (rows) is 'string'. "Customer_t", above, is an example of using an array of strings as a field.

If 'void' is specified instead of a reference to a struct type, then the database type being defined is a "general" database type. Subject to the usual rules concerning storage flags, a 'void'-based database type can accept a value of any database type. Such a database can be passed to the /Db utilities, but cannot be indexed, since there is no information about an associated struct type. Similarly, virtual fields cannot be added to a 'void'-based database type.

A virtual field pair consists of:

name of proc
':'
name for new virtual field

The name for the new virtual field must not already exist as a name accessible within the type's associated struct. Virtual fields can only be read, never written. Virtual fields do not occupy any space within the database record - the proc is called to compute or find the required value on each reference to the field.

The proc used for a virtual field must have only one parameter, which is an 'ro' '@' of the struct type associated with the database type. The return type of the proc is the type of the virtual field, and so cannot be 'void'. The type of a virtual field can be a type which is not allowed as a field within the struct type. For example, it is possible to return a trackable value, such as a reference to a cached copy of the database record.

A short example of using virtual fields:

    struct Str1_t {
        uint str1_n;
        float str1_x;
        sint str1_dim;
        bool str1_flag;
    };

    proc
    vField1(@ ro Str1_t aStr1)uint:
        aStr1@.str1_n + toUint(aStr1@.str1_dim)
    corp;

    proc
    vField2(@ ro Str1_t aStr1)bool:
        aStr1@.str1_n >= 1000 or aStr1@.str1_flag
    corp;

    DbType Str1Table2_t Str1_t {
        vField1 : str1_sum,
        vField2 : str1_both
    };

    Str1Table2_t $Str1Tab2;

    proc
    test2()void:
        Fmt("test2 starting");
        con recNum := Db/Insert($Str1Tab2);
        $Str1Tab2[recNum].str1_n := 1000;
        con n := $Str1Tab2[recNum].str1_sum;
        Fmt("Got n = ", n);
        Fmt("str1_both: ", $Str1Tab2[recNum].str1_both);
        Db/Delete($Str1Tab2, recNum);
        Fmt("test2 done\n");
    corp;

The above example works, but does nothing useful.

The procs exported from package /Db include:

    proc Highest(TableVoidRo_t nonNil tab)bits64;
    proc Insert(TableVoid_t nonNil tab)bits64;
    proc Delete(TableVoid_t nonNil tab; bits64 recNum)void;
    proc Present(TableVoidRo_t nonNil tab; bits64 recNum)bool;

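All have been shown in use earlier. "TableVoidRo_t" and "TableVoid_t" are general-purpose ('void'-based) database types used with package /Db; "TableVoidRo_t" includes the 'ro' storage flag, so "Highest" and "Present" accept databases to which the caller has only read access.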

Zed databases exist within "ItemStores". In the example above, the ItemStore in use is the default one, which contains the persistent items declared with a leading "$". That ItemStore is currently created or opened automatically when a program that uses it is run. It is held in a file-based "BlockStorage_t" in file "ZedWorld". The Zed BlockStorage/ItemStore code exports procs which allow programmers to create and re-open ItemStores, either memory-based or file-based.

This is done with procs /DataStore/Create and /DataStore/Open. Those procs, if they succeed, return a non-nil /DataStore/ItemStore_t. A Zed database can be created using a DbType constructor. If the constructor has no parameter, the database will be created in the default ItemStore (in "ZedWorld"), else it will be created in the ItemStore_t passed to the constructor. The constructor yields a properly-typed reference to the created database. The created database can be used as normal. Note, however, that an in-memory ItemStore is lost when the program exits. Note also that there is no way to re-open a database within a non-default file ItemStore, since there is no way to find it. Thus, these facilities are not currently useful, other than for some basic testing of the mechanisms.
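
A minimal sketch of creating a database with a constructor in the default ItemStore, reusing "CustTable_t" from above. The proc name is hypothetical; passing an ItemStore_t to the constructor, as described above, would instead place the database in that ItemStore:

    proc
    scratchTest()void:
        con scratch := CustTable_t();
        con recNum := Db/Insert(scratch);
        scratch[recNum].cust_id := 1;
        Fmt("Highest row: ", Db/Highest(scratch));
        Db/Delete(scratch, recNum);
    corp;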

22.5 Considerations for Persistence

Persistent items and Zed databases contained in on-disk persistence files will normally operate as expected. However, their contents cannot be relied on to be valid, since they can be modified outside of the running Zed program, or even by the running Zed program itself via low-level system calls (e.g. /Sys/FileWrite{Vec}). The Zed persistence code has checks to validate its operations, but it does not have access to the full compilation information for whatever code created the file.

In particular, if you change the size, shape or number of the items persisted in the default persistence file ("ZedWorld"), do not expect your code to work properly on a subsequent run. The key "item ids" in use could now be wrong, and the element size of things like persistent vectors and databases could now be wrong, resulting in attempts to access beyond the end of their storage. There is no stored "schema" describing what is stored. It is recommended that all persistent items for a given program be declared together, in one package. If possible, the struct types and array sizes on which any persistent items depend should be defined there as well. That way, you are more likely to realize the consequences of a change.

The above, though scary, should not be unexpected. Any persistent storage system has the same problems, and the Zed system is no different. Guaranteeing correct operation would require that the operating system not allow any writing, outside of the Zed system, to the resources used for persistent storage. Such restrictions are typically put in place by privileged administrators of the operating system.

If persistent variables are implemented directly via non-volatile storage such as flash memory, or some other directly addressable technology, then using them could be much like using non-persistent variables. There would still be differences, e.g. the need to write entire blocks of flash memory at a time, and persistent variable accesses would likely take much longer than non-persistent variable accesses.

As of this writing, Zed implements persistent variables on top of an underlying block storage, e.g. disks or a hosting file system. Because of this, access to persistent variables takes much more time than access to non-persistent variables. Caching within the implementation masks some of the slowness, but persistent accesses remain slower. In particular, any access to a persistent variable currently requires one or more internal proc calls to the persistence code.

Because of this, programmers might wish to do their own "buffering" of persistent variables. In other words, if you have a big persistent struct, it is better to copy it into a normal variable, work with the separate fields there, and then write it back. The entire struct is read or written at once, reducing the total number of references to a persistent variable.

For example, following on from the above "Employee_t" example:

    proc
    updateBoss(string newName; float newSalary; uint newSupervisor)void:
        Employee_t theBoss := $TheBigBoss;
        theBoss.emp_name := newName;
        theBoss.emp_salary := newSalary;
        theBoss.emp_supervisor := newSupervisor;
        $TheBigBoss := theBoss;
    corp;

The benefit will typically be proportional to the number of field accesses moved to the local variable, less the cost of the outer read and write of the persistent variable.

[There is more to it than the above, however. Reading and writing string fields requires proc calls in all cases. Reading 'enum' and record selector values requires individual calls since the values must be range checked. However, on multi-value assignments, the Zed compiler is able to wrap the various calls within "fetch/release" calls which essentially buffer and lock the needed data. That way, all data other than strings is referenced from that internal buffer instead of going all the way to the block storage. For multi-value items that do not contain strings (or range checks on reads), the compiler will use a single bulk-transfer call.]

The current implementation of persistent storage in Zed uses "holey" items. This means that space, in terms of underlying blocks, is not allocated for parts of an item until that part of the item is written to. So, for example,

    def COUNT = 1_000_000_000_000;

    [COUNT] Employee_t $EmployeeSet;

will work even when the underlying persistent store does not have enough space for the needed trillion Employee_t's. Only when the elements of the array are assigned to will needed space be allocated and used. This property can be used to implement persistent sparse arrays, and can be an alternative to persistent vectors. The actual space needed is roughly proportional to the number of populated elements, but the exact formula is complicated, and involves the size of the underlying blocks. The persistent store system uses a tree structure of blocks to represent the array sparsely, and so insertion of the first record can involve the allocation and use of several blocks.

[In fact, using the smallest block size that the Zed code supports (256 bytes), an item of size 0xffff_ffff_ffff_ffff requires 10 levels of indirection.]

The metadata for the persistent store is stored with a consistent byte ordering, so the data (e.g. a "ZedWorld" file) can be moved between systems with different orderings. However, all user data is stored in the byte order in which it is given, so any byte-swapping it needs is the responsibility of the programmer. For example, if a single persistent 'uint' is stored, it is handled as if its bytes are simply copied directly to the persistent store. This is most obvious when a multi-value entity can be read or written directly - it is simply copied byte-by-byte.

See "97.2 Current System Status" for options relating to the current persistent store implementation.

23 Programmer Defined Constructs

23.1 Introduction

Zed allows programmers to create procs which run at compile time, with behavior different from that of regular procs executing at run-time. See "18 Compile Time Execution". These are somewhat like additions to the Zed language itself. However, their syntax is that of standard proc calls, with small differences in the parameters they can take. [Zed previously used flag 'construct' on a proc definition to specify that the proc was a "construct" proc. Its call was followed by 'begin', the body of the "construct" and 'end'. That style of user-defined construct was much more limited than the style described here, and has been removed from the language.]

Zed also allows programmers to create, within limits, new "constructs" within the Zed programming language. The programmer must define the syntax of the new construct, and provide code, called by the compiler as needed at compile time, which defines the semantics of the construct. Such programmer-defined constructs can be placed into library packages, for easy use by other programmers.

Since the syntax of constructs and construct-related code must be available at compile time, they must be provided to the Zed compiler before the construct can be used. The syntax is defined using a sequence of calls, at compile time, into Exec code. The construct programmer must also provide a capsule which implements a specified interface dealing with data from the use of the construct and which provides the final Exec_t code which is the implementation of a specific use of the construct. In the initial call, Exec is given a proc which creates an object of that capsule type for each use of the construct. This interface use is similar to the use of such an interface when defining 'varProc' ("18.6 "varProc" Procs") and 'ioProc' ("18.7 "ioProc" Procs") procs.

The syntax of the construct deals with the program material which follows the construct name in code which is using the construct. The construct name is a normal name in the Zed language, and is accessed just like a proc would be accessed. Typically, that would be via a path from a 'use' of the defining package. Note that this means that it is not possible to redefine any existing Zed constructs - they are built into the standard parser.

A state structure (type Exec/TempConstruct_t) must be declared by the construct programmer's code and passed in on all of the definition calls. This is similar to the state structures used when manually calling into the compiler's semantic code. This is most easily done by placing all of the needed calls in a proc which is then triggered at compile time (see "18.2 Package-Level "eval"").

The syntactic pieces of a construct are called "phrases" of the construct. They are represented internal to the Zed compiler in a tree of Exec/ConstructPhrase_t nodes. Uses of a construct can all have different material for the non-fixed phrases of the construct. These pieces are stored in a similar tree of Exec/SpecificElement_t nodes. Portions of the construct syntax which are fixed, i.e. must be present as-is in all uses of the construct, are not represented in the SpecificElement_t tree.

Phrases are categorized as either "simple" or "compound". The terms refer only to how they are dealt with in the compiler's construct code and not to what is in them. The simple phrase kinds are: fixed, "int", float, name, word, expression, block and type/storage-flags. The compound phrase kinds are: sequence, optional/alternatives and list. The phrase kinds are described later.

23.2 Simple Example

To make all of this clearer, here is the full definition of a programmer defined construct which has no body (so it is used as just its name) and which, when executed, prints "Hello there world".

    capsule HelloData_t implements Exec/ActiveConstruct_t {
        procs Exec/ActiveConstruct_t {
            proc
            End(HelloData_t nonNil hdat)nonNil Exec/Exec_t:
                template begin
                    Fmt("Hello there world.");
                end;
                Exec/NothingNew()
            corp;
        };
    };

    Exec/ConstructRunner_t: proc
    helloStart(Package/PContext_t nonNil pctx;
               Exec/SpecificConstruct_t specon)nonNil Exec/ActiveConstruct_t:
        HelloData_t()
    corp;

    proc ctProc
    defineHello(Package/PContext_t nonNil pctx)void:
        Exec/TempConstruct_t tcon;
        Exec/ConstructStart(pctx, @tcon, "Hello", Package/nl_export,
                            helloStart, nil);
        Exec/ConstructEnd(@tcon);
    corp;
    eval defineHello();

Here is a test proc which uses this construct:

    proc
    testHello()void:
        Fmt("testHello starts");

        Hello;

        Fmt("testHello completes");
    corp;

Capsule "HelloData_t" implements the required Exec interface ActiveConstruct_t. Since it needs no other information from the compiler, it only implements the "End" method, which is required to yield the Exec_t representing the construct's semantic replacement for the use of the construct. Here, a template block does all of the work, so the "End" method just returns the value of "Exec/NothingNew()".

Proc "helloStart" exists only to create a "HelloData_t" object for a use of the construct. It has forced proc type Exec/ConstructRunner_t, which makes it a valid value for the subsequent call to "ConstructStart". The Zed compiler will call "helloStart" when it sees a use of the construct. Note that because capsule "HelloData_t" implements interface "Exec/ActiveConstruct_t", the returned capsule value is compatible with the declared result type.

Proc "defineHello" defines the syntax of the construct, and provides the needed additional items to the compiler. In this case the construct is named "Hello", and it is fully exported from the defining package. Here there are no calls between the "ConstructStart" call and the "ConstructEnd" call, thus construct "Hello" has no body.

The construct is defined by the compile time call to "defineHello". After that call, the construct is available for use.

As mentioned above, programmer defined constructs do not allow programmers to redefine the syntax or semantics of any existing Zed language constructs. A further limitation is that since such constructs are defined within and run within the Zed programming language, the rules of Zed will limit the exact details of the syntax of constructs. For example, a programmer cannot change the way the Zed compiler interprets literals - that process is done long before anything that a new construct does. Similarly, Zed's rules for names cannot be altered. These limitations likely mean that a programmer cannot use this facility to define constructs which exactly handle some pre-existing "language". However, some string preprocessing may be able to convert material from that pre-existing language into something that can work with Zed constructs.

[Construct programmers are strongly advised to fully test their constructs, starting with very simple tests, before trying to use them in real situations. If the definition process fails, the construct will not be defined, and uses of it will get undefined name errors and quite possibly dozens of parsing errors. In some of my tests, especially with construct lists, the entire remainder of the test program has been skipped over as the parser tries to figure out what is going on in the absence of the construct.

One issue to note is that any use of a "Block" phrase requires that there be a "statement ending token" after the block. This is required so that the Zed parsing code, parsing a block, will stop at that ending token. Otherwise, the parser may consume some or all of the remainder of the construct instance, and perhaps more items, before it stops looking for "statements". This need may require the construct programmer to change the syntax of the construct to allow for this. The ending token is normally required immediately after the block in a sequence phrase, but it can also be the start/end token of an alternative phrase which directly encloses a sequence containing the block. See the "StringCase" example in "23.16 Example Construct Syntaxes".

This situation has caused me to modify at least one construct to use an existing ending token instead of the preferred keyword. An alternative is to use an "Expr" phrase instead of a "Block" phrase. That way, only one Zed statement/expression will be parsed. Note that this may require some explicit scopes ('begin'/'end' blocks) in some construct uses.

Also helpful in debugging the syntax of constructs is "Exec/ConstructSetDebug". This call needs the ConstructRunState_t reference, so it is useful in the method callouts when an instance of a construct is being parsed. Output enabled by this facility is fairly detailed, so it should be closely controlled to avoid too much output. In a large construct, it may be useful to turn the flag on and off during the parsing of a construct use, so that output is produced only for those portions which are causing problems.]

23.3 ActiveConstruct_t

The full definition of interface ActiveConstruct_t is shown below. The various methods in it will be described in the relevant descriptions of the syntax-defining calls. The interface is 'partial', which means that the methods in it are optional - an implementing capsule need not provide a method if it has no need for it. However, method "End" is required since it provides the Zed compiler with the Exec_t tree which is the replacement for the entire construct use.

The exact point at which the various methods are called is not specified - it depends on the parsing and other processing of the specific construct use. In general, they occur after required data from the parser is received, i.e. after the parser has parsed a chunk of the construct use.

    export interface ActiveConstruct_t partial {
        proc
        Start(poly acon; ConstructRunState_t nonNil crst)bool;

        proc
        Fixed(poly acon; string nonNil str)bool;

        proc
        Int(poly acon; sint n)void;

        proc
        Float(poly acon; float n)void;

        proc
        Name(poly acon; string nonNil name)bool;

        proc
        Word(poly acon; ConstructWordKind_t cwk; sint n; float f;
             string wrd)bool;

        proc
        Expr(poly acon; Exec_t nonNil ex)bool;

        proc
        Block(poly acon; Exec_t nonNil ex)bool;

        proc
        TypeSf(poly acon; Types/Type_t nonNil t; Types/StorageFlags_t sf)bool;

        proc
        SequenceStart(poly acon)bool;

        proc
        SequenceEnd(poly acon)bool;

        proc
        AltTrigger(poly acon; string nonNil str)bool;

        proc
        ListStart(poly acon)bool;

        proc
        ListEnd(poly acon; uint count)bool;

        proc
        End(poly acon)nonNil Exec_t;
    };

If the capsule for the programmer defined construct implements the "Start" method, then that method is called as soon as the Zed compiler has noticed the construct use and started handling it. Exec/ConstructRunState_t is a record type which is used by the compiler to track the progress of dealing with the construct use. In particular, field "crst_pctx" is the Package/PContext_t that needs to be used by all of the interface methods to deal with code in the context of the construct use. Typically, that value will be saved in the capsule's record (the above "HelloData_t" capsule has no record and does not need a PContext_t).

Note that all methods in the interface except "End", "Int" and "Float" return a 'bool' value. This allows them to signal an error to the Zed compiler - they must return 'true' if all is well, else 'false' if they detect an error according to their semantic rules for the construct. The capsule methods can also use the various error message generating procs exported from package Package. Such error messages should be issued in the earliest method callout, so that they appear as close as possible to the item being complained about.
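The callout conventions above (optional methods in a 'partial' interface, 'bool' results for most callouts, and a mandatory "End") can be modeled outside the compiler. The following is a Python sketch, not Zed code: the classes "HelloData" and "NameChecker", the helpers "run_callout" and "drive", and the event tuples are all hypothetical. Stopping at the first 'false' result is an assumption of the model only; what the real compiler does after such a result is not described here.

```python
VOID_CALLOUTS = {"Int", "Float"}   # these return 'void' in ActiveConstruct_t


class HelloData:
    """Hypothetical model of HelloData_t: it implements only End."""

    def End(self):
        return 'Fmt("Hello there world.");'


class NameChecker:
    """Hypothetical construct object that rejects one particular name."""

    def Name(self, name):
        return name != "Fred"      # False signals a semantic error

    def End(self):
        return "checked"


def run_callout(acon, method, *args):
    """Call an optional callout; a method that is not provided counts as success."""
    fn = getattr(acon, method, None)
    if fn is None:
        return True
    result = fn(*args)
    if method in VOID_CALLOUTS:
        return True                # void callouts cannot report errors
    return bool(result)


def drive(acon, events):
    """Feed (method, *args) events to acon, then ask it for the replacement.

    Assumption of this model: processing stops at the first failed callout.
    """
    for method, *args in events:
        if not run_callout(acon, method, *args):
            return None
    return acon.End()              # End is mandatory: AttributeError if missing
```

For example, "drive(HelloData(), [("Start", None)])" succeeds because the missing "Start" method is simply skipped, while a "Name" event carrying "Fred" makes "drive(NameChecker(), ...)" return None.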

23.4 ConstructStart

    export proc
    ConstructStart(Package/PContext_t nonNil pctx; @ TempConstruct_t aTcon;
                   string var nonNil name; Package/NameLevel_t nl;
                   ConstructRunner_t nonNil runner;
                   ConstructDisplayer_t condisp)void;

"ConstructStart" is used in the above example. It must be the first call when defining the syntax of a programmer defined construct in Zed. The PContext_t passed in should be obtained from the compile-time environment in which the ConstructStart call is being made. The easiest way to do this is as shown in the above example. The name passed in must be a valid Zed name, and must not be defined in the current package compilation context. It will be defined as the last step of defining the construct.

The NameLevel_t passed in will control the visibility of the construct. It can be any of Package/nl_private, Package/nl_local or Package/nl_export. No provision has been made to support an explicit export list.

The ConstructRunner_t proc is called by the compiler to obtain a capsule object to use to access the interface methods, etc. Its use is seen in the above example.

Parameter "condisp" of type Exec/ConstructDisplayer_t is intended to be used to pass in a custom pretty-printer which will be used whenever code containing a use of the construct is pretty-printed (displayed). This usage is currently not supported - the default formatting code has so far done a good job of displaying construct uses.

23.5 ConstructFixed

    export proc
    ConstructFixed(@ TempConstruct_t aTcon; string nonNil str)void;

This proc is used during definition of the syntax of a construct to specify that a fixed item (phrase) is to appear literally in all uses of the construct. The item is given as a string, and it can be a defined token in the Zed language (either a reserved word or a "punctuation" token), or can be a valid name to be used as a "keyword".

[A reserved word in Zed is one which can normally never be used by programmers for any purpose other than as defined by the language itself. "keywords" in other programming languages are those whose special meaning is only triggered in appropriate contexts. When used this way in Zed programmer defined constructs, they are all used as keywords, in that any special meaning they normally have is ignored.]

If the line

    Exec/ConstructFixed(@tcon, "Fred");

were added to proc "defineHello" in the above example, between the calls to "ConstructStart" and "ConstructEnd", then the "Hello" construct would require the identifier "Fred" (here used as a keyword) to appear after the construct name in any use of the construct. Thus, in proc "testHello" the use

    Hello;

must then be changed to

    Hello Fred;

If a "Fixed" method were added to the capsule's ActiveConstruct_t methods, it would be called by the Zed compiler as it sees and checks the "Fred" in the example. The string "Fred" would be passed in on that call. Here, the callout serves little purpose, but in more complex constructs, such callouts can allow the construct code to track the progress of the parser as it deals with a given construct use.

23.6 ConstructInt

    export proc
    ConstructInt(@ TempConstruct_t aTcon)void;

This call adds an "int" phrase to the construct being defined. When the construct is used, an integral literal is expected at this point. This form is simpler to use than the more powerful "Expr" phrase kind, but is more limited - it only accepts literals, with an optional leading sign. The full range of Zed 'uint' input forms is accepted, but the actual form is not retained, unlike with "Expr" phrases. Callout method "Int" is called with the actual 'sint' value.

23.7 ConstructFloat

    export proc
    ConstructFloat(@ TempConstruct_t aTcon)void;

This call adds a "float" phrase to the construct. When the construct is used, a floating point literal must appear. A leading sign is allowed, and all Zed 'float' input forms are accepted. As with "ConstructInt", the specific input form is not retained. Callout method "Float" is called with the actual 'float' value.

23.8 ConstructName

    export proc
    ConstructName(@ TempConstruct_t aTcon)void;

This call adds a "name" phrase to the construct being defined. In a use of the construct, a name (a reserved word is not accepted) must be provided to match this phrase. If a "Name" method for ActiveConstruct_t is provided by the construct's capsule, it will be called with the provided name as parameter.

The name provided may or may not be defined within the current context. It is up to the construct code (typically in the "Name" method) to do any needed checking. Name phrases typically are used to interact with names defined within the larger Zed context - using them, defining them or both.

23.9 ConstructWord

    export proc
    ConstructWord(@ TempConstruct_t aTcon)void;

This call adds a "word" phrase to the construct being defined. In a use of the construct, an integral literal, a floating point literal, a name or a string literal must be provided to match this phrase. If a "Word" method for ActiveConstruct_t is provided by the construct's capsule, it will be called with the provided "word" as one of its parameters. The callout also receives a "ConstructWordKind_t" parameter saying which of the possible forms was provided.

The construct code is responsible for checking that the provided form is valid in the current context for the construct. Since the construct user determines which of the forms to provide, the construct code cannot be sure that the provided value is something that can be directly used in other Zed code. For example, the string literal form can contain things which are not valid in the Zed programming language. The name form will always provide a valid name, but the name may or may not be defined in the current context.

23.10 ConstructExpr

    export proc
    ConstructExpr(@ TempConstruct_t aTcon)void;

Adding an expression phrase to a construct will require that an actual use of the construct have an expression of some kind in the corresponding position. The expression is not constrained in any way - it can be as simple as a literal "0", or as complex as an explicit scope and 100 lines of code. As a small extension, the code allows an assignment statement as well as a simple expression.

If a construct has an expression phrase then it needs an "Expr" callout so that it can save the Exec_t for the expression. That callout should also do any needed checking on the nature and type of the expression, and issue error messages as needed.

23.11 ConstructBlock

    export proc
    ConstructBlock(@ TempConstruct_t aTcon)void;

"Block" phrases must be placed in constructs such that they are properly terminated within the construct. This is needed because the parser has rules about when a block, which consists of a number of statements with a possible final expression, is complete. Block phrases must be within sequence phrases ("23.13 ConstructSequenceStart and ConstructSequenceEnd") or list phrases ("23.15 ConstructList"). The exact rules for where block phrases can appear are complex, and changes may be needed to allow for more forms. Currently, a block phrase must be followed by one of 'fi', 'elif', 'else', 'od', 'do', 'incase', 'default', 'esac', 'corp', 'end', ';', ')' or '}'.

A "Block" method callout works the same as an "Expr" method callout.
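The follow-token rule for block phrases can be checked mechanically for the simple case where a block sits inside a sequence. The Python sketch below is a hypothetical helper, not part of Exec: it represents one sequence as (kind, text) pairs and reports any "Block" not directly followed by a "Fixed" token from the list given above. A block at the end of a sequence is also reported, because whether an enclosing phrase supplies the ending token (as in the "StringCase" example) has to be checked by hand.

```python
# Tokens listed above as valid immediately after a Block phrase.
BLOCK_FOLLOWERS = frozenset({
    "fi", "elif", "else", "od", "do", "incase", "default",
    "esac", "corp", "end", ";", ")", "}",
})


def unterminated_blocks(sequence):
    """Return indexes of Block phrases not directly followed by a valid Fixed token.

    `sequence` lists the phrases of one sequence as (kind, text) pairs,
    where text is the Fixed string and None for every other kind.
    """
    problems = []
    for i, (kind, _) in enumerate(sequence):
        if kind != "Block":
            continue
        nxt = sequence[i + 1] if i + 1 < len(sequence) else None
        if nxt is None or nxt[0] != "Fixed" or nxt[1] not in BLOCK_FOLLOWERS:
            problems.append(i)
    return problems


# The body sequence of the ForEach construct ("Alt" stands for its alternative set).
for_each = [("Name", None), ("Alt", None), ("Fixed", "in"), ("Expr", None),
            ("Fixed", "do"), ("Block", None), ("Fixed", "od")]
```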

23.12 ConstructTypeSf

    export proc
    ConstructTypeSf(@ TempConstruct_t aTcon)void;

This call adds a "type/storage-flags" phrase to the construct being defined. The corresponding method callout is method "TypeSf". The parser is instructed to parse a type, optionally followed by a set of storage flags. If no storage flags are present, the method will receive an empty set for "sf".

23.13 ConstructSequenceStart and ConstructSequenceEnd

    export proc
    ConstructSequenceStart(@ TempConstruct_t aTcon)void;

    export proc
    ConstructSequenceEnd(@ TempConstruct_t aTcon)void;

These calls are used when adding sequences of phrases to a construct. Sequence phrases are the simplest compound phrase kind. Corresponding method callouts are "SequenceStart" and "SequenceEnd".

Definition calls directly between the "ConstructSequenceStart" and the "ConstructSequenceEnd" add phrases to the sequence. Most constructs have at least one sequence, and some useful constructs have only a single sequence as their definition.

The entire group of calls which define a sequence phrase can be used as a single phrase, anywhere a phrase can occur. However, sequences cannot be directly nested, and sequences must contain at least one phrase within them.
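The nesting rules for definition calls can be expressed as a small recursive check. The following Python sketch is a hypothetical model, not an Exec facility: each definition call is a tuple whose first element is the call name without the "Exec/Construct" prefix, "ConstructStart" and "ConstructEnd" are omitted, and only the structural rules stated so far are enforced (a list's element is the next phrase, an alternative trigger may be followed by one body phrase, sequences must be non-empty and may not be directly nested).

```python
SIMPLE_KINDS = {"Fixed", "Int", "Float", "Name", "Word", "Expr", "Block", "TypeSf"}


def kind_at(calls, i):
    if i >= len(calls):
        raise ValueError("definition ends inside an unfinished phrase")
    return calls[i][0]


def parse_phrase(calls, i, parent):
    """Consume one phrase starting at calls[i]; return the index after it."""
    kind = kind_at(calls, i)
    if kind in SIMPLE_KINDS:
        return i + 1
    if kind == "SequenceStart":
        if parent == "sequence":
            raise ValueError(f"call {i}: sequences cannot be directly nested")
        i += 1
        count = 0
        while kind_at(calls, i) != "SequenceEnd":
            i = parse_phrase(calls, i, "sequence")
            count += 1
        if count == 0:
            raise ValueError(f"call {i}: a sequence needs at least one phrase")
        return i + 1
    if kind == "List":
        # The phrase right after the List call is the element description.
        return parse_phrase(calls, i + 1, "list")
    if kind == "AltStart":
        i += 1
        while kind_at(calls, i) != "AltEnd":
            if kind_at(calls, i) != "AltTrigger":
                raise ValueError(f"call {i}: expected AltTrigger")
            i += 1
            # A trigger with no body is a flag; otherwise one phrase follows.
            if kind_at(calls, i) not in ("AltTrigger", "AltEnd"):
                i = parse_phrase(calls, i, "alternative")
        return i + 1
    raise ValueError(f"call {i}: unexpected {kind}")


def check_definition(calls):
    """Check the calls between ConstructStart and ConstructEnd."""
    i = 0
    while i < len(calls):
        i = parse_phrase(calls, i, "top")


# The "Nested1" definition from the examples section, with list arguments dropped.
NESTED1 = [
    ("SequenceStart",), ("Fixed", "("), ("List",),
    ("SequenceStart",), ("TypeSf",), ("List",), ("Name",),
    ("AltStart",), ("AltTrigger", None), ("AltTrigger", ";"), ("AltEnd",),
    ("SequenceEnd",), ("Fixed", ")"), ("SequenceEnd",),
]
```

A sequence directly inside another sequence is rejected, while a sequence that is a list element or an alternative body (as in "Nested1") is accepted.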

23.14 ConstructAltStart, ConstructAltEnd and ConstructAltTrigger

    export proc
    ConstructAltStart(@ TempConstruct_t aTcon)void;

    export proc
    ConstructAltEnd(@ TempConstruct_t aTcon)void;

    export proc
    ConstructAltTrigger(@ TempConstruct_t aTcon; string str)void;

This set of programmer defined construct syntax definition calls allows the construct programmer to have multiple forms of the construct be accepted. A set of alternatives is defined, each one introduced by a different token or keyword. The code using the construct can use any one of the defined alternatives. It is also possible to specify that there can be no alternative present, i.e. the entire set of alternatives is optional when the construct is used.

"ConstructAltStart" and "ConstructAltEnd" delimit the set of alternatives. "ConstructAltTrigger" specifies the trigger token or keyword for one alternative, the body definition of which must directly follow the "ConstructAltTrigger" call. An alternative body can be of any simple or compound phrase kind.

If there is no alternative body specified after the "ConstructAltTrigger" call, then the trigger token or keyword becomes a flag - its presence is what is important to the construct. If the trigger value is nil, then it specifies that the entire alternative set is optional. For example, the "ForEach" list traversal construct uses optional alternative trigger "reverse", with no alternative body, as a flag to specify in which direction the list should be traversed. Note that an optional alternative set must contain both the nil trigger, which marks it as optional, and one or more non-nil triggers, which are the choices that may appear.

Method callout "AltTrigger" provides the construct programmer with information about construct uses involving alternative sets. "AltTrigger" is passed the trigger value used. There will be no callout if an optional alternative set is not present.
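How an alternative set matches a construct use can be modeled in a few lines. This Python sketch is hypothetical: "match_alternatives" is not an Exec proc, body phrases are represented by parser callbacks, and it assumes that the trigger token itself is consumed when it matches. A None key plays the role of the nil trigger that makes the set optional; a None value marks a flag trigger with no body.

```python
def match_alternatives(tokens, pos, triggers):
    """Model of an alternative set at tokens[pos].

    triggers maps each trigger string to a body parser, or to None when the
    trigger is a flag with no body. A None *key* marks the whole set as
    optional. Returns (trigger_used, body_value, new_pos); trigger_used is
    None when an optional set is absent, in which case no AltTrigger
    callout would be made.
    """
    tok = tokens[pos] if pos < len(tokens) else None
    if tok is not None and tok in triggers:
        pos += 1                     # assumption: the trigger token is consumed
        body = triggers[tok]
        value = None
        if body is not None:
            value, pos = body(tokens, pos)
        return tok, value, pos
    if None in triggers:
        return None, None, pos
    raise SyntaxError(f"expected one of {sorted(t for t in triggers if t)}")


# ForEach's optional "reverse" flag.
FOR_EACH_ALTS = {None: None, "reverse": None}
```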

23.15 ConstructList

    export proc
    ConstructList(@ TempConstruct_t aTcon; string var startString;
                  string sepString; string var endString1, endString2)void;

The list phrase is the most complex kind of construct phrase. One occurrence of a list phrase can be used to allow any number of list elements in an actual construct use.

The syntax creation proc, seen above, provides for one token/keyword that appears before each list element, one token that separates list elements and two tokens/keywords that can each mark the end of the list elements. All of these elements are individually optional (can be nil), but the separator token and either end token, when given, cannot be the same. If there is no separator token, the first end token/keyword must be given.

The separator token, if given, must be one of ',', ';', ':', '.', '/' or '-'. These are the typical tokens which separate elements in lists. If there is no start token/keyword, and either there is no separator token or there is no end token/keyword, error recovery from syntax errors involving the construct can be very poor.
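The restrictions on the "ConstructList" arguments can be collected into one check. The Python sketch below is a hypothetical helper (not an Exec proc) that mirrors the four parameters, with None standing for nil. It raises for the hard rules and returns a warning for the combinations which give poor error recovery.

```python
ALLOWED_SEPARATORS = {",", ";", ":", ".", "/", "-"}


def check_list_args(start, sep, end1, end2):
    """Return a list of warnings; raise ValueError when a rule is broken."""
    if sep is not None:
        if sep not in ALLOWED_SEPARATORS:
            raise ValueError(f"separator {sep!r} is not allowed")
        if sep in (end1, end2):
            raise ValueError("separator and end token must differ")
    elif end1 is None:
        raise ValueError("a list without a separator needs its first end token")
    warnings = []
    if start is None and (sep is None or (end1 is None and end2 is None)):
        warnings.append("error recovery for this list is likely to be poor")
    return warnings
```

Applied to the lists in the examples below, the inner list of "Nested2" (no start, separator ",", end ".") and the "StringCase" list (start "incase", ends "esac" and "default") pass cleanly, while the outer "Nested2" list (separator ":" only) draws the warning.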

Method callouts for lists are "ListStart", "ListElement" and "ListEnd". "ListElement" is called before an element is started. The "ListEnd" callout provides the number of elements that have been seen in the list.

The list element description is the phrase described by calls that occur immediately after the "ConstructList" call. The element can be any kind of phrase, but making it directly another list phrase is unlikely to work, because the parser does not handle that case. If nested lists are needed, making either list's element phrase a sequence gives the parser something better to key on.

If there is an element start token/keyword, then each list element must start with that token/keyword. The start token will be consumed. If there is an element separator token, then the various list elements must be separated by that token; if there are no elements or only one element, then the separator token will not appear. Separator tokens will be consumed. If there are one or two end tokens/keywords, then the group of list elements will end when one is seen after a list element. Any list start token/keyword is checked for first. Then any list terminator token/keyword is checked for. If no terminator is found, then any separator token must be present. Terminator tokens will not be consumed - they should be handled externally.
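The checking order just described can be sketched as a small token-level parser. This Python model is one reading of the rules, not the compiler's code: "parse_list" and the "element" callback are hypothetical, and it assumes that when a list has a separator but no end token, a missing separator simply ends the list. It does not model a zero-element list which has neither a start token nor an end token.

```python
def parse_list(tokens, pos, element, start=None, sep=None, ends=()):
    """Model of a list phrase: returns (items, pos).

    element(tokens, pos) parses one element and returns (value, new_pos).
    Terminator tokens are left in place for the enclosing phrase.
    """
    items = []
    while pos < len(tokens):
        if start is not None:
            if tokens[pos] != start:
                break                     # no start token: the list is over
            pos += 1                      # start tokens are consumed
        elif not items and tokens[pos] in ends:
            break                         # empty list
        value, pos = element(tokens, pos)
        items.append(value)
        nxt = tokens[pos] if pos < len(tokens) else None
        if start is not None and nxt == start:
            continue                      # 1. start token is checked first
        if nxt in ends:
            break                         # 2. terminator: not consumed
        if sep is not None:
            if nxt == sep:
                pos += 1                  # 3. separator: consumed
                continue
        elif start is None:
            continue                      # no separator: elements just follow
        if ends:
            raise SyntaxError(f"expected {sep or 'terminator'} at token {pos}")
        break                             # no end token defined: list just ends
    return items, pos


def name(tokens, pos):
    """Hypothetical element parser which takes a single token."""
    return tokens[pos], pos + 1
```

With the inner "Nested2" list settings, "parse_list(["Fred", ",", "Barney", "."], 0, name, sep=",", ends=(".",))" returns both names and leaves the position on the unconsumed ".", which the following "ConstructFixed" phrase then matches.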

23.16 Example Construct Syntaxes

The preceding sections described how things like list phrases are to be used, but they didn't *show* how they can be used. This section includes examples known to work. Only the syntax definition sequences are shown - the semantics for these constructs are often lengthy and do not specifically help in understanding the syntactic possibilities.

One example which uses nested lists:

    Exec/ConstructStart(pctx, @tcon, "Nested2", Package/nl_private, nest2Start,
                        nil);
    Exec/ConstructList(@tcon, nil, ":", nil, nil);
    Exec/ConstructSequenceStart(@tcon);
    Exec/ConstructFixed(@tcon, "names");
    Exec/ConstructList(@tcon, nil, ",", ".", nil);
    Exec/ConstructName(@tcon);
    Exec/ConstructFixed(@tcon, ".");
    Exec/ConstructSequenceEnd(@tcon);
    Exec/ConstructEnd(@tcon);

"Nested2" represents the construct name "Nested2" followed by 0 or more interior lists, separated by ":". Each interior list starts with "names", then has 0 or more names, separated by ",", and is ended by ".". Examples:

    Nested2;
    Nested2 names.;
    Nested2 names Fred.;
    Nested2 names Pebbles, BamBam.;
    Nested2 names Fred, Barney.: names Betty, Wilma, Pebbles.;

Another example using nested lists:

    Exec/ConstructStart(pctx, @tcon, "Nested1", Package/nl_private, nest1Start,
                        nil);
    Exec/ConstructSequenceStart(@tcon);
    Exec/ConstructFixed(@tcon, "(");
    Exec/ConstructList(@tcon, nil, nil, ")", nil);
    Exec/ConstructSequenceStart(@tcon);
    Exec/ConstructTypeSf(@tcon);
    Exec/ConstructList(@tcon, nil, ",", nil, nil);
    Exec/ConstructName(@tcon);
    Exec/ConstructAltStart(@tcon);
    Exec/ConstructAltTrigger(@tcon, nil);
    Exec/ConstructAltTrigger(@tcon, ";");
    Exec/ConstructAltEnd(@tcon);
    Exec/ConstructSequenceEnd(@tcon);
    Exec/ConstructFixed(@tcon, ")");
    Exec/ConstructSequenceEnd(@tcon);
    Exec/ConstructEnd(@tcon);

"Nested1" roughly represents the parenthesized part of a Zed proc header. The use of "ConstructTypeSf" handles the type and storage flags of the "parameters". Examples:

    Nested1();
    Nested1(uint Fred);
    Nested1(bool con Fred, Barney, Betty);
    Nested1(float ro Fred, Barney, BamBam; sint volatile Betty, Wilma, Pebbles);

The phrase syntax for the "ForEach" list traversal construct:

    Exec/ConstructStart(pctx, @tcon, "ForEach", Package/nl_export, runStart,
                        nil);
    Exec/ConstructSequenceStart(@tcon);
    Exec/ConstructName(@tcon);
    Exec/ConstructAltStart(@tcon);
    Exec/ConstructAltTrigger(@tcon, nil);
    Exec/ConstructAltTrigger(@tcon, "reverse");
    Exec/ConstructAltEnd(@tcon);
    Exec/ConstructFixed(@tcon, "in");
    Exec/ConstructExpr(@tcon);
    Exec/ConstructFixed(@tcon, "do");
    Exec/ConstructBlock(@tcon);
    Exec/ConstructFixed(@tcon, "od");
    Exec/ConstructSequenceEnd(@tcon);
    Exec/ConstructEnd(@tcon);

The phrase syntax for the "StringCase" case-on-string-value construct:

    Exec/ConstructStart(pctx, @tcon, "StringCase", Package/nl_export,
                        startStringCase, nil);
    Exec/ConstructSequenceStart(@tcon);

    Exec/ConstructExpr(@tcon);

    Exec/ConstructList(@tcon, "incase", nil, "esac", "default");
    Exec/ConstructSequenceStart(@tcon);
    Exec/ConstructExpr(@tcon);
    Exec/ConstructFixed(@tcon, ":");
    Exec/ConstructBlock(@tcon);
    Exec/ConstructSequenceEnd(@tcon);

    Exec/ConstructAltStart(@tcon);
    Exec/ConstructAltTrigger(@tcon, nil);
    Exec/ConstructAltTrigger(@tcon, "default");
    Exec/ConstructSequenceStart(@tcon);
    Exec/ConstructFixed(@tcon, ":");
    Exec/ConstructBlock(@tcon);
    Exec/ConstructSequenceEnd(@tcon);
    Exec/ConstructAltEnd(@tcon);

    Exec/ConstructFixed(@tcon, "esac");
    Exec/ConstructSequenceEnd(@tcon);
    Exec/ConstructEnd(@tcon);

There is a debugging routine provided which can help when defining the syntax of complex constructs. It is Exec/DumpConstruct, which can be called within the context of the definition proc used, as follows:

    assert assign con inf := Package/FindName(pctx, <name>);
    assert select pcon := inf->inf_progConstruct;
    Exec/DumpConstruct(pcon);

where "<name>" is the name of the construct just defined. The code looks up the name of the construct in the package it has just been defined in, asserts that the found entry is of kind inf_progConstruct, and passes the found Exec/ProgConstruct_t to "DumpConstruct".

23.17 Construct Semantics

The semantics of a construct can be pretty much anything that a programmer can dream up. Getting something complex working properly can be difficult, however. As mentioned earlier, start simple and test a lot. Some constructs will require mixing 'template' material with explicit calls to Exec code. Creating a properly working complex programmer defined construct is definitely well into "Zed Guru" territory!

29 Special Names

All names starting with "__" (two underscores) are reserved by the compiler and internal libraries - except as discussed in this section, programmers should not declare entities with such names. As an example, names created by "Package/CreateNewName" and "Proc/CreateNewName" start with "__" as a means of avoiding a conflict with any legitimate programmer defined names. The "Fmt" code creates procs with such names, and the "Display" package will not, by default, display them.

Some special names are described in this section:

"_PackageInit_" - programmers can define a private proc with this name in each package or subpackage. The proc should have no parameters and a 'void' result. The Zed system will arrange to call the proc just after any explicit and implicit package variable and constant initialization, but before main program execution. The order of the calls across packages is not defined. This facility can be used for initial package setup which cannot be done (or is just very awkward) with simple variable/constant initializations.

Other special names are described elsewhere:

"6.6.3 "_IfBytecode_""
"10.10 _instantiate_"

Local variables whose names end with "_" (single underscore) will not trigger warnings if they are not used in their scope. This is a simple hack to avoid cluttering up test program diagnostics.

30 Warnings and Information

The Zed compiler can produce warnings about things it classifies as "questionable". It can also produce "information" messages that might be of interest to programmers. Both can be used to help the programmer produce code that is less likely to have unexpected bugs or performance issues. As described in "97.2 Current System Status", the warnings and information output are organized into levels, and the levels displayed can be controlled by flags on the Zed compiler's command line. The higher the level requested, the less likely the messages are to be desired by a given programmer.

The descriptions given should be sufficient for the warnings. The information messages might be less clear. The level 2 information message shows the name of generic procs which are instantiated. Some programming languages and compilers will always clone procs when the containing generic (or other such containing entity, such as a C++ template) is "instantiated". Initially I wanted to never have to clone generic procs, but as Zed's generics became more general (e.g. the addition of 'uint' parameters), it became necessary to clone some procs during compilation. With small procs, this is not an issue. However, if a proc is large, cloning it can add a noticeable amount of code to the final program. The information message identifies when proc cloning is needed.

The level 4 information message about the insertion of implicit checks for 'nil' is one that many programmers will not want to have enabled on all compilations - there can be far too much output. The problem is that there are many situations where the compiler simply cannot guarantee to itself that a value cannot possibly be 'nil'. For example, in Zed, you cannot initialize dynamically allocated matrixes. So, an address value ('@', pointer or tracked value) retrieved from a matrix cannot be guaranteed to not be 'nil'. The programmer can wrap such a value with a 'nonNil' construct to insert an explicit check, and thus avoid the message about an implicit check. Doing this can be ugly, but it does make all of the needed checks stand out.

More usefully, occasionally running with this information output enabled can point out situations where a single 'assert' 'assign' or other such test can avoid multiple implicit 'nil' checks. Alternatively, this output can point out situations where it would be valuable to declare some proc parameters as 'nonNil'. Doing that is much less intrusive and yields more efficient code. It is then necessary to ensure that all callers supply 'nonNil' actual parameter values, which can then create a new set of places where 'nonNil' values are needed. How far one pushes this is up to the individual programmer.

31 Conclusion

Go for it!

96 Philosophy

One aspect of the design of programming languages is that of the relative priority of various goals that the designer has. The primary goal of Zed is that of encouraging correct, efficient programming. The correctness goal has higher priority than efficiency. The goal of correctness is pursued in many modern programming languages by having a lot of checks made at run-time, and not supporting some of the lower-level activities that languages like "C" allow. The Zed programming language attempts to support low-level activities while maintaining strict prevention of "bad" things.

One way that this is done is to only allow dangerous low-level activities like "unsafe" unions, type-casts and pointer manipulation when the programmer is a privileged programmer. Another, more generally useful way is to include language concepts which can reduce the inefficiency of run-time activities. More checking at compile time replaces checking at run time. Examples of this include the 'nonNil' concept, the capabilities of '@' safe pointers (especially '@' 'package'), the expanded 'for' loop forms, and the use of 'if' 'assign' to produce known 'nonNil' values from unknown values. Similarly, Zed's inclusion of fixed-size arrays as well as dynamic matrixes reduces the need for run time checks.

Another aspect of the Zed philosophy extends to provided libraries. This is the concept that the ultimate beneficiary of programming, which includes the programming language used, its libraries, etc., is not the programmer, but is the end-user. Some programming languages provide concepts and features which can allow the programmer to quickly produce working code, or at least code that appears to be working. In fact, much such code is neither reliable nor maintainable - those things have been sacrificed in order to allow the programmer to reach the point of demonstration sooner.

It is not clear how the design of a programming language or library can influence such things. As an example, consider the "CharBuffer" package that Zed provides. There is no way in that library to undo an addition to a buffer. Some programming situations could take advantage of that capability. For example, there might be two different forms of display for a given type, and the programmer wants to use the form which produces the shortest display, on a case-by-case basis. An efficient implementation of this could be something like:

    <record CharBuffer position>
    <output form 1 to CharBuffer>
    <record length of form 1>
    <undo output of form 1>
    <output form 2 to CharBuffer>
    <record length of form 2>
    if form 1 length < form 2 length then
        <undo output of form 2>
        <output form 1 to CharBuffer>
    fi;

What should the "undo" interface look like? Perhaps it is a call which sets the current CharBuffer position to a passed value, or adds a passed signed value to that position. The CharBuffer code would check to make sure that the new position is within the bounds of the available buffer.

Both of these interfaces, however, allow the programmer to move the current buffer position past the end of the data that has actually been added to the buffer. If the buffer is then displayed to the end-user, it can contain garbage characters because of an obscure bug in how the programmer modified the position. So, those interfaces are too general for the Zed philosophy. Perhaps we provide an interface which accepts an unsigned value which is subtracted from the current buffer position. The library code would check to make sure it doesn't try to set the buffer position to less than 0.

This interface is safer - it doesn't allow garbage/uninitialized data to be displayed to the end-user. However, it still has another problem - a bug in the programmer's code can allow that code to move the current position too far back, and thus erase data that was put into the buffer by other code. Such a bug could be hard to diagnose, since the bad code is not directly associated with the output that is disappearing.

The Zed philosophy would say that no such interface should exist. The interface can make some things a bit simpler and more efficient, but the cost is that it allows obscure bugs to produce bad output. The programmer can do their work without that interface, with a bit more work and perhaps a bit more run-time cost, and the Zed philosophy says that is a good tradeoff.

This is perhaps not the best example in the world, but I hope that it conveys a bit of how I think about such things.
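To make the alternative concrete: without an undo interface, the shorter display can still be chosen by producing both forms in scratch storage and appending only the winner. That costs an extra copy, but it can never erase output that other code already placed in the buffer. The sketch below is in C, not Zed; `AppendBuf`, `buf_append` and `append_shorter_form` are hypothetical names, not the CharBuffer API, and the two "forms" are assumed to be fixed and exponent notation for a `double`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical append-only buffer: text can be added, never removed.
   Not Zed's CharBuffer API. */
typedef struct {
    char data[256];
    size_t len;
} AppendBuf;

static void buf_init(AppendBuf *b) {
    b->len = 0;
    b->data[0] = '\0';
}

/* Append s; returns 0 on success, -1 if it would not fit. */
static int buf_append(AppendBuf *b, const char *s) {
    assert(b != NULL && s != NULL);
    size_t n = strlen(s);
    if (n >= sizeof b->data - b->len) return -1;  /* keep room for '\0' */
    memcpy(b->data + b->len, s, n + 1);            /* copies the terminator too */
    b->len += n;
    return 0;
}

/* Append whichever display form of value is shorter. Both forms are built
   in scratch arrays first, so nothing already in out can be disturbed. */
static int append_shorter_form(AppendBuf *out, double value) {
    char fixed[400], sci[64];
    snprintf(fixed, sizeof fixed, "%f", value);
    snprintf(sci, sizeof sci, "%e", value);
    return buf_append(out, strlen(sci) < strlen(fixed) ? sci : fixed);
}
```

For example, 0.5 is appended as "0.500000" and 1e20 as "1.000000e+20".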

97 History and Status

97.1 Overall Philosophy

I call the project my "replace all of the world's software" project, and I actually sort-of mean that. I believe that lots of the world's software does need replacing. There is far too much duplication of code; there is far too much unmaintainable code; there is far too much generally buggy code. To fix things requires, I believe, a new programming language, a new programming philosophy, and fairly rigid adherence to standards of generality, correctness, re-use, style, etc. The Zed project is intended to encompass all of these things, but since I'm a person who works best in the concrete, rather than in the abstract, I've started by actually implementing things. The intent is to see how the language works out, and to explore various other issues as real work encounters them. At some point the language itself will become relatively stable, and I will progress to the system itself. My intent is that I will start with a system that runs on top of all of the major OS's (Windows, Linux, standard Unix, MacOS), and, far down the road, move to a system running independently on bare machines. The current implementation is based on my own bytecode, but my intent is to be able to generate native code as well. Since my system is AMD-64, that will almost certainly be the first native code used.

The above is far too ambitious - the programming language itself has grown to use up all of my time outside of other activities.

As of late 2022, code has been written to read/write Elf files (used on Linux systems for linked and unlinked binaries of native code). A code generator for X86-64 is now mostly complete, and I am proceeding with testing some of my more complex libraries with it (Fmt, DispAcc, etc.). The generated code is not very efficient, and various improvements are on my "to-do" list. Latest update: mid-2023 - generated X86-64 code is quite a bit better now. It is, obviously, not possible for me to produce code as efficient as that produced by industry standard compilers like gcc and llvm.

Why, then, haven't I made my Zed compiler a front-end for one of those? The answer is that those compilers are not written in fully safe programming languages, and so they may contain very subtle bugs of the kinds that a safe language rules out. The entire Zed compiler, including my new X86-64 code generation, is written in Zed, which is a safe programming language. Thus, those kinds of subtle bugs cannot be present. There are likely bugs of other kinds, of course.

Ignoring code dealing with Elf files, and one overall architecture file which is shared with my disassembler, the Zed X86-64 code generator is currently about 13,000 lines of Zed code. That quantity of code can be heavily reviewed, once released. At that point, I encourage as many people as possible to review the code generator, and indeed all of the Zed compiler, libraries, etc.

97.2 Current System Status

Note: this material contains outdated portions.

Currently, the system consists of a program, written in ANSI C, that provides a bytecode interpreter, some OS interface routines, parsing code, code generation for bytecode, and various pieces of glue. This was put together with pieces from previous projects. A lot of it is a mess. It will improve over time.

The C source is divided into two directories. The "FromZ" directory grows over time, and is the C equivalent to code I have written in Zed. Right now, the main pieces there are the guts of the language semantics. The other directory, "Hosted", currently includes the parser, the bytecode interpreter, the bytecode code generator, interfaces to the host OS, startup code, etc.

Several aspects of the Zed language require compile-time execution of Zed procs. These include compile-time '#' operators, Cli call handling, persistence and programmer defined constructs. Because of that, most of the code for those aspects is written only in Zed - there is no equivalent in C, only stubs that call out to the bytecode engine to run the Zed versions. As you can imagine, when I first did this I kept the old C stuff around for quite a while before deleting it - not even trusting my repository.

In order to compile more than just simple Zed code using "-n", it is necessary to have environment variable "ZED_SRC" contain a path to the ".../Src/Zed" directory so that the compiler can find library files, etc. You can start a file path with "+" and the contents of "ZED_SRC" will be prefixed onto it. This is most useful inside shell scripts or makefiles.
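A minimal sketch of the "+" expansion, assuming the "+" is simply replaced by the value of ZED_SRC with no separator inserted (so the rest of the path should start with "/"). It is written in C for illustration only; `expand_zed_path` is a hypothetical helper, not the compiler's code. The caller is expected to pass `getenv("ZED_SRC")`, which keeps the function easy to test.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if arg starts with '+', replace the '+' with zedSrc
   (normally getenv("ZED_SRC")). Returns 0 on success, -1 if '+' is used
   while zedSrc is NULL, or if the result does not fit in out. */
static int expand_zed_path(const char *arg, const char *zedSrc,
                           char *out, size_t outSize) {
    assert(arg != NULL && out != NULL && outSize > 0);
    const char *prefix = "";
    const char *rest = arg;
    if (arg[0] == '+') {
        if (zedSrc == NULL) return -1;  /* "+" used but ZED_SRC is unset */
        prefix = zedSrc;
        rest = arg + 1;
    }
    int n = snprintf(out, outSize, "%s%s", prefix, rest);
    return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}
```

With ZED_SRC set to "/home/me/Src/Zed", the argument "+/Example.zed" expands to "/home/me/Src/Zed/Example.zed"; arguments without a leading "+" are left unchanged.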

The Zed system is designed to both have a garbage collector and to use reference counting. Reference counting is present, but there is no garbage collector. Since initial program data is built up by C code, but must be usable by Zed code, I have had to ensure that shared data structures are identical in both languages. Thus, the C data structures contain reference counts and type tags. If they are still needed when a garbage collector is written, they will likely grow some more.

I call dynamic data structures that participate in reference counting and garbage collection "tracked" values. Values that low-level code allocates and manipulates directly with simple pointers are not tracked. Tracked values contain pointers to the type structures defining them. Those type structures are themselves tracked values. Clearly, some kludging is needed to bootstrap the system. It's ugly. Also, because the C code and Zed code share everything, at least one Zed Package's package-level variables are pre-allocated and pre-populated by C code. That is clearly ugly and fragile. It will get better.
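The following C sketch shows the general shape described above: each tracked value starts with a reference count and a pointer to its type structure, and type structures are tracked values themselves, so one descriptor must describe itself to get things started. All names here (`Tracked`, `TypeInfo`, `typeOfTypes`, `Point`) are hypothetical; the actual Zed and C data structures are not shown in this document.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct TypeInfo TypeInfo;

/* Header at the start of every tracked value (hypothetical layout). */
typedef struct Tracked {
    unsigned long refCount;
    const TypeInfo *type;
} Tracked;

/* Type descriptors are themselves tracked values. */
struct TypeInfo {
    Tracked header;
    const char *name;
    size_t size;  /* full object size, header included */
};

/* Bootstrap kludge: the descriptor for descriptors describes itself. */
static TypeInfo typeOfTypes = { {1, &typeOfTypes}, "Type", sizeof(TypeInfo) };

/* An example tracked type with two fields after the header. */
typedef struct { Tracked header; double x, y; } Point;
static TypeInfo pointType = { {1, &typeOfTypes}, "Point", sizeof(Point) };

static Tracked *tracked_new(const TypeInfo *type) {
    Tracked *t = calloc(1, type->size);
    if (t == NULL) abort();
    t->refCount = 1;
    t->type = type;
    return t;
}

static void tracked_incref(Tracked *t) {
    if (t != NULL) t->refCount++;
}

/* Returns 1 if the object was freed. A real implementation would also
   release any tracked fields, and its reference to the type, first. */
static int tracked_decref(Tracked *t) {
    if (t == NULL) return 0;
    assert(t->refCount > 0);
    if (--t->refCount == 0) {
        free(t);
        return 1;
    }
    return 0;
}
```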

Data structures and pointers can readily be shared between C and Zed. However, function pointers cannot (native versus bytecode). To gain generality, function pointers are needed for some important uses. This is an outstanding issue - currently Zed code that mirrors C code simply dummies out the calls that would land in C code instead of Zed code. [Update: The current main use here is in printing out error messages. To allow that to work better, I've added a builtin function that addresses the Resources (language-specific error messages for now).]

The current system is a single binary, called "zed". It accepts a series of Zed source files as its command-line arguments. Source files are either part of an explicit Zed package or not. If a source file is not part of a package, it is compiled into the root ("/") package. After everything is compiled, the system looks for an exported proc called "main" or "ZedMain" in the root package. If found, it is called. "main" can have either no parameters or a single parameter that is a vector of strings. "ZedMain" is typically a 'cliProc' and will be called as such, which requires additional complexity. Source files which are part of packages are simply compiled. All compilation yields in-memory data structures only - writing compiled information to files is not yet ready. [There is an annoying issue that I don't plan to fix because it won't be relevant in the future - you can't use the "Fmt" procs in package "/". This relates to package variable space creation.]

Errors at run time of bytecode currently generate a traceback and end execution. There are also a few direct "abort" calls in the C code - those must be caught and examined using a native debugger.

The command line looks like this:

    zed -aAzfwncsqQe -Bs<size> -Bc<count> -Cc<count> -W<level> -O<level> -I<level> {file.zed} [-- program-args]

The following flags exist:

    -a: abort the run if internal C function "zabort" is called. I use this
        when debugging, running inside a debugger.

    -A: abort on error - if the inner error-counting code is called, abort.

    -z: when the system does need to exit, use an "exit" rather than an
        "abort", so that the issue is not externally seen as a failure. This
        is useful when scripting test runs with expected failures.

    -f: compile with the full set of system source files, rather than the
        default smaller set. Any program using "Fmt" or most compile-time stuff
        needs this flag.

    -w: compile with the full set of system source files, plus the source files
        for support of persistent variables. The package initialization code
        will use file-based persistence (filename "ZedWorld"). See -Bs, -Bc
        and -Cc. Note that the persistent store is not setup using just "-f".

    -n: compile with *no* initial sources. Some simple programs work with this.

    -c: compile only - don't try to find and run "main"

    -s: show the names of system sources as they are compiled. This also
        includes parenthesized notes about the allocations of package variables
        and constants, and the calling of package initialization procs.

    -q: run quiet - show no non-error/warning/info processing messages. Will
        continue to show messages about the allocation of space for and the
        initialization of user source files.

    -Q: run very quiet - show no system informational messages - the only
        output will be from the user program.

    -e: activate the error trapping facility of the "Errors" package, normally
        used during automated testing.

    -Bs<size>: set the block size for persistent store. Default 4096.

    -Bc<count>: set the block count for persistent store. Default 100.

    -Cc<count>: set the block count for persistence cache. Default 10.

    -W<level>: set the warning level. Default is level 1.
        level 1:
            - warn if comparing a string against 'nil' using '=' or '~='
            - warn if comparing a 'char' value against a 'string' constant
            - warn about special binary operands that destroy the value, such
                as multiplying by 0, '|'-ing with ~0, etc.
            - warn if a comparison result is known because of operand size
            - warn if aliasing detected across '@' array proc parameters
            - warn about aliasing across matrix proc parameters
            - warn if a value known at compile time is too large when being
                assigned, etc. to a 'bits8', 'bits16' or 'bits32' variable.
            - warn if an enum member, variant record selector or 'oneof' value
                is not included in a 'case' statement with no 'default'.
            - warn if code is unreachable
            - warn if a local name is defined, but never used
            - warn that 'ro' is redundant on a 'private' record/capsule field
            - warn if a local variable is defined but never used
            - warn if a non-'abort' proc never returns
            - warn if a proc formal is declared with unneeded 'con'
            - warn when an instantiating type for an 'inline' field is not a
                struct
            - warn when an implicit 'oneof' name duplicates a previous name
            - warn when a new 'oneof' name is being defined and the given value
                is from a different 'oneof' type
        level 2:
            - warn about special no-op operands in binary operations, such as
                multiplying by 1, '|'-ing with 0, etc.
            - warn about shift/rotate amounts greater than the operand bit size
            - warn if operation on integral constants exceeds the bit size
            - warn if values might compare equal because both are 'nil'
            - warn if assigning a 'template' value to 'any' or 'autoAny'
            - warn if array index type is too small for range
            - warn if a substring operation which can be done at compile time
                yields an empty string.
            - warn about a counting 'for' loop with no iterations
            - warn about capsules declared 'final' which have a 'partial'
                internal interface
            - warn about methods from 'partial' interfaces with no
                implementation available for a 'final' capsule which
                implements the interface
            - warn when a capsule is just a record (no interfaces)
        level 3:
            [Many people will not want level 3 warnings on by default. They
            can warn about things which are perfectly correct. However, it
            might be useful to enable level 3 warnings every now and then, and
            check that all of them are expected.]
            - warn about copying of values > 100 bytes long
            - warn if a shift of a constant loses non-zero bits
            - warn if shift/rotate amount is 0
            - warn if shift result is 0
            - warn if 'case' with 'uint' or 'oneof' has no 'default'
            - warn if explicit 'nonNil' on 'assign' variable
            - warn about proc formals declared with 'var' or locals declared
                without 'con' that can never be modified.
            - warn about local variables which consume a lot of memory

    -O<level>: set the optimization level. Default is 2. The semantic
            part of the compiler, and the byte-code code generator, do not
            distinguish between levels 0 and 1. That difference is available for
            native code generators. The optimizations listed here are in the
            semantic part of the compiler. Level 0 is not recommended unless
            you are dealing directly with a code generator.
        level 2:
            - binary operator optimizations for special values (e.g.
                multiply by 1, add 0, 'and' with 'true'). Note that basic
                constant folding is always done, unless the expression is
                inside a 'strict' section, as specified by the language.
            - "wrapper procs" - if a proc does nothing but pass its parameters
                to another proc with the same formals, skip the call. This also
                applies to using procs as values.
            - use direct capsule method calls instead of indirect where the
                method is known at compile time
            - 'assert's where the 'bool' condition is known at compile
                time to be 'true' are removed
            - the artifact "(@XXX)@" is replaced with "XXX". This can happen
                with array and struct constructors used directly.
            - (non-)equality comparisons against the empty string are changed
                into comparisons of the string length against 0.

    -I<level>: set the information level. Default is 1.
        level 2:
            - show the names of generic procs which are instantiated
        level 4:
            - show where code generators must insert implicit run-time checks
                for 'nil' values, if optimization cannot prove such are not
                needed.

See "18.5 Library "ctProc" Procs" for information on locally overriding the warning, optimization and information levels set by the above flags.

Recent work has been producing X86-64 native code for Zed. This is all done with Zed code, as is the code dealing with Elf object files. As of July 2024, I have been able to produce a binary, linked as "zedc", which is a standalone native compiler for Zed. It is produced from the Zed versions of the lexical scanner, the parser and all of the semantic code. It does not yet include the bytecode engine, and so cannot compile programs which use compile-time execution (e.g. the "Fmt" formatting stuff). My main testing of "zedc" so far has been compiling over 2000 lines of Zed code into a .o file which, when linked with my "zed.a" library and the X11 library, produces a binary which plays my "Wrung" game (from CP/M times). I've run that binary under Microsoft WSL (Windows Subsystem for Linux) on my new Windows 11 laptop - good work by MS on the compatibility.

98 Issues and Future Work

98.1 Constructors Versus Initializers

Zed has both initializers and constructors for arrays and structs, but only constructors for records, capsules and vectors. What is the difference?

The difference is in what the compiler knows about what to expect. The syntax for an array or struct initializer is:

    <type>
    <name for new constant/variable>
    ':=' or '='
    <initial values for constant/variable>

In this syntax, when it encounters the initial values, the compiler knows what type of values to expect. This is true even when a value is compound, e.g. an initializer for an array of structs. So, having any inner compound values within the initializer be surrounded just by '['/']' (for arrays) or '{'/'}' (for structs) works.

However, if a compound value is needed by itself, separate from a declaration of something which can directly hold it, the compiler doesn't know what type(s) of values to expect. For example, what would be the type of these:

    [1, 0xff, 194]
    {2.7, "A", [1, 2, 3], false}

In the first example, all 3 values are positive and less than 256, so are compatible with 'uint', 'sint' and all "bitsXX" types. The second element in the second example could be either a character or a string. Zed rules make it a character by default, but in an initializer situation, where the type of the struct is known, a 'string' field will accept it.

To allow anonymous values of array and struct types to be created in code, a way is needed to tell the compiler what type they are to be of. This is done by naming the array type or using the name of the struct type. Values are then provided, inside parentheses after the type name, just as in other kinds of constructors with that same syntax. Within those parentheses, the expected type is known, and so the syntax goes back to the initializer syntax which does not need type names.

When array and struct constructors are used, the values must be put into memory somewhere. In record and capsule constructors, new memory is allocated for that purpose. In bits constructors, no memory is needed since the entire value is a single non-compound value. To stress that the values for array and struct constructors are "somewhere else", those constructors yield '@' of the actual compound value. This is usually as desired for things like proc parameters, conditional expressions, etc.

It is allowed to initialize an array or struct variable with a constructor followed by an '@'. However, there is no reason to do so: an initializer does the same job more efficiently, because the constructor form first builds the compound value somewhere else and then copies it to its final destination.

If ':=' is used in an initializer, a variable is created; if '=' is used, a constant is created. Both exist in memory, and so '@' of them can be taken. Both can be done at the package level or inside procs. When initializing a one-dimensional array, the bound can be '*', indicating that the number of elements in the initializer determines the bound.
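C99 draws much the same line between initializers and compound literals, which may help readers coming from C. In the sketch below (C, not Zed, with a made-up `struct Item`), the initializer needs no type names on its inner values, and `[]` takes its bound from the number of elements, much like Zed's '*'. The compound literal is a stand-alone value, so it must name its type in parentheses, and its address is passed along, loosely analogous to a Zed constructor yielding '@'.

```c
#include <assert.h>
#include <stddef.h>

struct Item {
    double weight;
    char code;
    int counts[3];
};

/* Initializer: the declared type tells the compiler what to expect, so the
   inner braces need no type names, and the bound comes from the element count. */
static const struct Item items[] = {
    {2.5, 'A', {1, 2, 3}},
    {1.5, 'B', {4, 5, 6}},
};

static double total_weight(const struct Item *list, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += list[i].weight;
    return sum;
}

/* Compound literal: a value on its own must name its type. Its storage
   lives elsewhere (here, automatic storage), so its address is passed. */
static double weight_of_temp(void) {
    return total_weight(&(struct Item){9.0, 'C', {0, 0, 0}}, 1);
}
```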

98.2 Operations on Compound Values

Zed is not consistent in how it handles compound values. C allows struct values to be used as proc parameters, but Zed does not. Zed allows array and struct values to be assigned. Zed does not allow array and struct fields to be initialized in record or capsule constructors. Zed allows array and struct fields to be initialized in array and struct constructors or initializers.

This inconsistency came out of practical needs in the compiler. Initializers are much less useful if compound values within them cannot be initialized, so that capability really couldn't be left out. Since the syntax and compiler infrastructure for inner initializer elements is reused for inner array and struct constructor elements, it would have been slightly more work to not allow them.

One of my philosophies for Zed is that there shouldn't be hidden costs unless they are unavoidable. For example, the cost of memory allocation is not avoidable with record and capsule constructors. The small cost of conversion is not avoidable when converting from a capsule value to a value of an interface type that the capsule implements. The cost of extra operations when using reference counting with tracked values is not avoidable.

Allowing assignment of compound values, for example, violates the idea that there shouldn't be hidden costs. If I am willing to pay that cost for assignment, why aren't I willing to pay it for proc parameters? One answer to that question is that allowing array and struct values as direct proc parameters can have the hidden extra cost of using up large amounts of stack space for the call. I have personally seen a situation in C code where a struct was passed directly, and it had clearly grown a lot in size over time, such that in our situation, the thread stack overflowed.

I welcome reasoned suggestions for how to deal with these inconsistencies. I like to protect unwary programmers, but I also like to let experienced programmers accomplish things with less explicit source code.

98.3 Threading and Parallelism

As of this writing (December, 2020), Zed has no features directly related to multithreading or parallelism. If it is to be fully general and yet safe, it needs such things. What they should be, even in general, has not yet been decided. I am a fan of "fibers" - very lightweight threads, rather than the heavier "pthreads" model of multithreading. Neither Zed nor its libraries currently do anything with the concepts of signals or exceptions. I believe I prefer it that way, so the heavier model might not be needed.

Since I used to work for Myrias Research Corporation, a company which built early parallel computers based on commodity microprocessors, my thinking tends to lean in that direction. However, I don't like the overhead that the full memory model, with merging, requires.

I'm also not a big fan of explicit message passing - it seems too much like programming in assembler. It can also be difficult for programmers to get right in non-trivial situations.

One thought in my mind is that since Zed has several properties which can be associated with memory references, perhaps it is possible to invent more such properties, which can be used to govern which code has access to which storage, in relation to main-thread ("parent") or worker thread ("child"). However, for anything like that to work, the language must provide ways for the programmer to indicate under what conditions a given piece of code is allowed to run.

Another possibility is to do explicit storage control on parallelism, like the OpenMP facility in gcc.

99 Some Decisions Made

With enough convincing, some of these decisions could be changed. Part of my thinking is that these are issues which can be worked around by the programmer. There are more important things for me to be doing, both with the Zed language itself, and with the rest of the Zed ecosystem.

99.1 "nilOk"

[At one point I had decided that there would be no 'nilOk' in Zed. The concept is now in the language, so most of the material here has been deleted.]

I thought of having a 'nilOk' attribute on constructors which yield a tracked value, so that programmers could have control of situations where available memory is constrained, or where an out-of-control program allocates more memory than can be given to it. The idea with 'nilOk' is to add such a token to the constructors, which then tells the compiler to use constructor code which is allowed to return 'nil' if there is no available memory. For example:

    if assign [] float nonNil bigVec := matrix nilOk([huge] float) then
        <work with bigVec>
    else
        <could not allocate bigVec - do something else>
    fi;

Similar syntax and semantics could apply to record and capsule constructors.

While this might be a convenient way of doing things, it isn't necessary. If there is a way to tell the run time to allow allocation failure, then there can just as easily be a way to ask it if an allocation would fail. Then the above code would become something like:

    if <can allocate "huge" vector of float> then
        [] float nonNil bigVec := matrix([huge] float);
        <work with bigVec>
    else
        <could not allocate bigVec - do something else>
    fi;

[Minor quibble: in a situation where other code is running in the same address space at the same time (multi-threading), it is possible for the above question about available space to return "yes" but the allocation itself fails, due to the non-atomicity of the pair.]
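In C terms, plain `malloc` behaves roughly like a 'nilOk' constructor, and an aborting wrapper around it behaves like the default. The sketch below contrasts the two; the function names are hypothetical and `huge` is just a parameter. Because `alloc_nil_ok` attempts the allocation and checks the result in one step, it is not subject to the check-then-allocate race noted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* 'nilOk' style: failure (including size overflow) is reported as NULL,
   and the caller must check. */
static double *alloc_nil_ok(size_t n) {
    if (n > SIZE_MAX / sizeof(double)) return NULL;
    return malloc(n > 0 ? n * sizeof(double) : 1);
}

/* Default style: failure is fatal, so callers always get a usable result. */
static double *alloc_or_abort(size_t n) {
    double *p = alloc_nil_ok(n);
    if (p == NULL) {
        fputs("out of memory\n", stderr);
        abort();
    }
    return p;
}

/* Mirrors the 'if assign ... nilOk' example: use the big vector if it can
   be had, otherwise fall back. Returns 1 if the big path was taken. */
static int sum_or_fallback(size_t huge, double *result) {
    double *big = alloc_nil_ok(huge);
    if (big != NULL) {
        double s = 0.0;
        for (size_t i = 0; i < huge; i++) {
            big[i] = (double)i;
            s += big[i];
        }
        free(big);
        *result = s;
        return 1;
    }
    *result = 0.0;  /* could not allocate: do something else */
    return 0;
}
```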

99.2 Struct Parameters and Results

C allows function arguments and results to be of struct types. Zed does not. I've looked at doing this, and for now have decided against it. This would be relevant for arrays as well. There are a couple of reasons why you might want to pass structs by value to procs.

One reason is efficiency - you could write efficient routines to do complex number operations (e.g. add/subtract/multiply/divide) if you could pass the parameters directly to them, and get a result directly back. This avoids having to do '@' of the parameters, and having to pass an extra '@' parameter to receive the result. Presumably generated native code could note that the address is never taken of any part of the parameters, and thus the entire proc could do its work in native floating-point registers.

A second reason is a semantic one. When a reference to a struct (or array) is passed via 'ro' '@', although the proc cannot change the value internally, it is possible that the referenced struct can be modified via side effect while the proc is active. By passing the struct to the proc directly, a copy is made (even if the compiler is able to optimize out the actual copying), so that it cannot be changed by side effects of other procs called. This makes it easier to be sure of the result of the proc, and also allows the compiler to optimize better. In Zed, the programmer must explicitly copy the struct into a local variable in order to get the same benefits.
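One concrete way the side-effect problem shows up is aliasing between an input reference and an output reference. The C sketch below (hypothetical `Complex` type and function names) squares a complex number in place. The pointer version reads `a->re` after it has already been overwritten through `out`; copying the inputs into locals first, or passing them by value as C allows, gives the correct result.

```c
#include <assert.h>

typedef struct { double re, im; } Complex;

/* Like passing 'ro' '@' in Zed: a and b are read-only here, but out may
   alias either of them, so they can change while this function runs. */
static void mul_ptr(const Complex *a, const Complex *b, Complex *out) {
    out->re = a->re * b->re - a->im * b->im;
    out->im = a->re * b->im + a->im * b->re;  /* may read an updated a->re */
}

/* Copy the inputs into locals first: writes through out can no longer
   affect the values being read. */
static void mul_copy(const Complex *a, const Complex *b, Complex *out) {
    const Complex x = *a, y = *b;
    out->re = x.re * y.re - x.im * y.im;
    out->im = x.re * y.im + x.im * y.re;
}

/* By-value parameters and result: the callee works on its own copies. */
static Complex mul_val(Complex a, Complex b) {
    Complex r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}
```

Squaring 1+2i in place with `mul_ptr` yields -3-12i instead of -3+4i; `mul_copy` and `mul_val` both produce -3+4i.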

A disadvantage of allowing direct struct/array parameters is that the value copy becomes implicit, rather than explicit. With simple structs, such as a two-'float' complex type, this is not a problem. However, the implicit copying can result in non-obvious inefficiencies if the struct grows over time, acquiring tracked fields, etc. It is also possible that the struct type can become fairly large, if it is dependent on separate constants for array sizes, etc. Then, a deep calling tree using parameters of such a type can end up using a lot of stack space, again without there being a readily visible clue.

The X86-64 native calling interface for struct parameters can be tricky to get correct. For example, a struct containing a pair of 32 bit values and a 64 bit value should be passed in a pair of 64 bit registers. The alignment of the fields within the struct, as well as the total struct size affect the required passing convention. If a proc with a struct parameter takes the address of a field of the struct (or the entire struct), via '@' or '&', then the struct must be saved to memory from the registers it is passed in.

99.3 "in", "out" and "inout" Parameters

All proc parameters in Zed are "in" parameters, in that values are being passed into the proc being called. It is possible to have "out" parameters, and "inout" parameters.

An "out" parameter is a value being returned from the proc call to whatever is given for that parameter on the call. This is like using the result value from a proc call, except that a proc can have multiple "out" parameters, and thus return multiple results.

An "inout" parameter passes values both ways - into the proc at the beginning of the call, and out at the end of the call.

These mechanisms provide another way to avoid the issues with side effects and aliasing that '@' parameters can have.

I have decided not to put these features into Zed. The benefit is not worth the risk, to me. Programmers used to most common current programming languages would not be expecting "inout" or "out" parameters, and could make serious mistakes based on not understanding what is going on, if they only look at the call sites.

99.4 Multiple Proc Results

Some programming languages allow procs to directly return multiple results. In C, this can be sort-of done by having the proc return a struct containing the various results, and then selecting fields from the returned struct. As seen above, Zed does not allow returning struct results, so returning multiple independent results would have some value. However, at this point in time I have decided not to do this.
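For comparison, C code commonly gets several results out of one call in the two ways discussed in the last two sections: returning a struct (the C standard library's `div` does this with `div_t`), or passing pointers that act as "out" parameters. The sketch below, with hypothetical names, shows both for an unsigned quotient and remainder. In the pointer form, the `&` at each call site is the visible clue that true "out" parameters would not provide.

```c
#include <assert.h>

typedef struct { unsigned quot, rem; } DivResult;

/* Multiple results bundled into a returned struct. */
static DivResult divmod_struct(unsigned a, unsigned b) {
    assert(b != 0);
    DivResult r = { a / b, a % b };
    return r;
}

/* "out" parameters emulated with pointers; callers must write &q, &r. */
static void divmod_out(unsigned a, unsigned b, unsigned *quot, unsigned *rem) {
    assert(b != 0 && quot != 0 && rem != 0);
    *quot = a / b;
    *rem = a % b;
}
```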

99.5 Counting "for" Variations

The basic counting 'for' loop allows unsigned values to range from a given starting value, increasing or decreasing by 1 for each iteration, upto or downto a given ending value. Other possibilities exist. There could be a "step" or "by" clause to provide increments/decrements other than 1. I have chosen not to implement this extension. It would take some time to implement and thoroughly test, and much of the functionality can be done using "general" 'for' loops, as shown in the description "6.11 For Statement".
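As an illustration of what a general loop has to handle in place of a "by" clause, the C sketch below counts an unsigned value down from `hi` to `lo` in steps of `step`. The function name and its "record into an array" interface are assumptions made for testing. The subtle part is the exit test: it is done before subtracting, so the counter never wraps around below zero.

```c
#include <assert.h>
#include <stddef.h>

/* Visit hi, hi - step, ... while the value stays >= lo. Visited values are
   stored in out (up to cap of them); returns how many were visited. */
static size_t count_down_by(unsigned hi, unsigned lo, unsigned step,
                            unsigned *out, size_t cap) {
    assert(step > 0);
    size_t n = 0;
    if (hi < lo) return 0;  /* no iterations */
    for (unsigned i = hi; ; i -= step) {
        if (n < cap) out[n] = i;
        n++;
        if (i - lo < step) break;  /* next value would pass lo (or wrap) */
    }
    return n;
}
```

Counting down from 10 to 0 by 3 visits 10, 7, 4 and 1.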

99.6 Template Types

As described earlier, when a proc formal or local of a template type is referenced within a template section, it is handled as a value of the type with the 'template' removed. This allows the value to participate in operations that the non-template type can be used for. For example, if expression "n + 1" is used in a template section and "n" is a "template uint", the result will be of type 'uint', and will be inserted into code as the addition of 1 to whatever is represented by "n".

Another possibility could be that the result of such an "addition" is of type 'template' 'uint'. But, values of template types must actually be of type Exec/Exec_t. So, the "n + 1" expression would have to be such that it builds an Exec_t tree representing the addition, and yields that. In the Zed language as described in the main body, that can be achieved by putting the "n + 1" inside a template expression.

This alternate interpretation could likely be done, although it would take a fair amount of work to find and update all of the places in the Zed compiler which require or otherwise work with specific types or kinds of types. It could also be a significant maintenance burden, since language changes might then require new code in multiple places. Furthermore, library code and user code which deals with specific types or classes of types at compile time would also have to deal with template types (or risk annoying users of that code).

In many cases, what would need to be done is to treat expression types with one level of "template" removed, just as is done now with proc formal and local references within template sections. This would have to be done in dozens of places in the compiler.

Some readers might also be uncomfortable with "n + 1" not yielding a value of type 'uint' (or perhaps 'sint'). The only benefit to this scheme is the need for fewer (none?) template expressions. At this point it would take considerable persuasion for me to go to the effort of trying this. Also keep in mind that there could be pitfalls not yet thought of.

Sometimes, while working with template values, the programmer will want to examine details of the actual Exec_t value. As has been mentioned in the main body, there are risks with this. Currently, the way to do this is to declare a temporary variable of type Exec_t and assign the template value to it. Then, the various Exec_t fields can be referenced via that temporary.

One possible alternative is to make brace brackets around a template value treat the template value as an Exec_t. E.g.

    if {templateValue}->ex_nonNil then ...

There is no run-time cost associated with this - it is merely forgetting the knowledge of the template type at the compilation level. If this were done, the compatibility rule that makes template values compatible with Exec_t could be done away with.

Other non-declaration uses of brace brackets in Zed have the notion of some sort of run-time indirection going on. Using braces as a simple compile-time type change would be a bit inconsistent. Other possible syntaxes would either be totally new, or almost as lengthy as using a temporary variable.

Another alternative would be to allow field selection of Exec_t fields on any template value outside of template sections. I dislike this option for the confusion it could cause - the exact same syntax, e.g. "tExec->ex_nonNil" would mean totally different things between normal code and code within template sections. (The assumption here is that "tExec" is of type "template Exec/Exec_t".)

My current decision is to not implement either of these options - they simply do not provide enough benefit.

99.7 Strings and Char Vectors

Zed allows one-dimensional vectors of 'char' to be used a lot like strings - you can use them where 'string' values are needed, and you can take substrings of them (yielding 'string' values). When used where a 'string' is needed, the vector value is copied into new memory, typed as 'string'. At one point I wanted to be able to do a string compare of a 'string' value with a "[] char" value, and Zed did not let me.

A problem here is that there is already comparison semantics on "[] char", just like for all matrix values of compatible types - you can do comparisons on the pointer values themselves. It would likely be possible to allow comparisons versus strings so long as one of the two operands is 'string' and the other is "[] char". Since 'string' and "[] char" are not the same type, they would not normally be comparable, so the "compare the contents" comparison could be used, rather than the "compare pointers" comparison which the language disallows.
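C draws the same distinction between comparing pointers and comparing contents, so a small C analogy may help. It assumes, purely for illustration, that a "[] char" value can be modeled as a pointer plus a length; the function names are hypothetical and say nothing about how Zed actually represents these values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* "Compare pointers": true only when both refer to the same storage. */
bool samePointer(const char *a, const char *b) {
    return a == b;
}

/* "Compare the contents": true when both sequences hold the same
   characters, regardless of where they are stored. */
bool sameContents(const char *a, size_t aLen, const char *b, size_t bLen) {
    assert(a != NULL && b != NULL);
    return aLen == bLen && memcmp(a, b, aLen) == 0;
}
```

Two separately declared arrays holding the characters "abc" compare unequal under samePointer but equal under sameContents, which is why the two kinds of comparison cannot share one operator without some rule, such as the operand-type rule above, to pick between them.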

This language change has not been made, however - it isn't all that important, and it could prove to be overly confusing.

99.8 Specification Issues

As stated in "1 Introduction", there is no formal specification of the Zed language. This includes the programming language, its libraries, and any overall environment that may or may not be available. There are a number of reasons for this:

syntactic details of the language have changed over time. For example, short-form declarations were not initially present.
basic capabilities of the language have increased over time. For example, persistence was not initially available.
I am the wrong person to create a proper specification. I could create a separate grammar for the syntax of the language, but that already exists scattered throughout this document. Creating a specification for the semantics of the language, ignoring compile-time operation and templates, should be do-able, but I feel it would use a large amount of my time, which is best spent on other things.

I believe I am a fair programmer, so working on programming the Zed system and tools makes sense for me. I do not work well (if at all!) in the abstract - if I tried to make a formal specification for even the non-compile-time nature of the Zed language, I would likely produce an invalid specification which is also incorrect. The attempt would take a long time, and that would likely be wasted time.

When the compile-time aspects of the Zed programming language are added in, a full formal specification may well require new specification formalisms. That sort of work is beyond my capabilities. It is also completely outside of my interests.

I am not convinced that such a complete formal specification is necessary, or even desirable. The number of people who could work with it is likely quite low, and the portion of those who would be using Zed as a programming tool is also likely to be low. If people whose interests lie in that direction wish to attempt to create such a formal specification, I can't stop them, but I also cannot guarantee that it would ever be considered "official".

See Wikipedia entries for "W-grammar", "Algol 68", "FLACC" and "Barry Mailloux". Dr. Mailloux was my university professor for a programming languages and compiler construction course. My first non-campus job had co-workers Chris Thomson and Colin Broughton, implementors of FLACC.

99.9 Historical Notes

My introductory university course in compilers and programming languages was very influential on my work. The instructor, Barry Mailloux, became a friend, and, by coincidence, later in life I ended up spending over a decade in an apartment with his widow as my neighbour.

As a graduate student, my thesis was the design of a programming language called ALAI (A Language for Artificial Intelligence). My supervisor wanted me to implement it. At the time, I only had access to an IBM mainframe running the MTS operating system. I wasn't happy with the compilers available, so I used the AlgolW compiler to implement a compiler for what I called the "QD" (Quick and Dirty) programming language, producing IBM 370 machine code. I started work on ALAI, but found that QD, being sort-of typeless, wasn't a good tool. So, I used QD to write a new language and compiler, "QC" (Quick and Clean). I wanted to go further, but my supervisor balked at that, so real work on ALAI started, but eventually stopped. Both QD and QC were used by myself and others for a few projects.

Early on, I got an 8 bit (Intel 8085 - a version of the 8080) home computer running the CP/M operating system. Over time I created the Draco (someone made an entry on Wikipedia!) programming language, with compiler, assembler, linker, disassembler, librarian and cross-referencer.

That compiler and linker may well be the pinnacle of my life's work. The compiler was written in itself, and running in less than 64K of memory, it could, in one pass: read source, do constant folding and conditional compilation, do some simple optimizations, and write out an object file. My object file format was based on procs, not source files, so my linker could pick and choose individual procs from libraries. The linker would run in one pass, re-scanning libraries as needed, and could switch to a two-pass mode if it ran out of memory for symbols and the output binary.

I used CP/M Draco to produce a graphical game system called "Explore", with a scenario called "Amelon". It used a combination of tile graphics "grids" and vector graphics "mazes". I was aiming to be similar to the early Ultima games, but did not go as far. I had a graphical tool I called "ExpCre" which was used to design grids and mazes.

By means of cross compilation, I then produced a version of Draco for the Commodore Amiga computers, running on a Motorola 68000 CPU. That compiler added built-in floating point and a peep-hole optimizer. This would have been in the late 1980's. That version of Draco was used around the world, but most of the users seemed to be in Europe - my guess is that because of the multi-lingual environment there, people were more inclined to try different programming languages than those in the mostly unilingual western world, where C was dominant. Two releases of Amiga Draco appeared on the "Fish Disks".

Using Amiga Draco I ported my "Explore" system to the Amiga, but it did not have good graphics, and no sound, so it was never actually released (I think!).

I used Draco to implement my version of the Peter Langston Empire multiplayer game (no graphics) which is an economic and military game set roughly in World War II.

I also designed and implemented my AmigaMUD multiplayer system. This system included an implicitly persistent datastore and a full programming language with which to implement scenarios (worlds). Simple graphics and sound were implemented in the client program. Checking just now, the source for the entire system is over 70,000 lines of Draco headers and code. The provided world, plus a few small extras, is 35,000 lines of code in the AmigaMUD language. Copyright in the files is 1997. I later converted this project to C on Linux (CGMUD), with a friend writing the graphical client in Java. Unfortunately everything, including the bytecode machine, assumes a 32 bit system, so it does not compile and run on today's 64 bit machines.

Sometime in there I started on a library of code with which compilers for various programming languages could be created. I got a fair ways (5000 lines) with this, but eventually lost interest.

Actual Zed work began in April 2002. It was based on a version of the language and bytecode system from CGMUD. "Towers of Hanoi" ran on about April 21. I now (late February 2023) have over 56,000 lines of notes, to-do's, resolved issues, etc.

The language has been slowly evolving and growing over 2 decades. My '@' concept, being an explicit variant of "refs" in C++, was first described around November 2003, with "ref" mentioned in April 2002. Generics were first discussed in April 2002. I started down a long wrong path with "bundles", including "generic bundles", in January 2005 - they eventually went away and were replaced with generics, capsules and interfaces. My first thoughts towards Zed "templates" were in August 2009, but the first hints of that sort of thing were in my ALAI language decades earlier. My 'nonNil' concept was first thought about in November 2010.

The latest major addition to the Zed language and compiler has been programmer defined constructs. I've already defined a utility one - a string 'case'. My first thoughts on this date back to 2010, with a suggestion from Andy Glew in USENET newsgroup "comp.arch". Serious thinking and experimentation started in July 2020.

I have used programmer defined constructs to implement something close to the "Six" adventure game writing language that I and a friend developed on the mainframe (in QD). That language had some unusual syntaxes to simplify creation of rooms, objects and verbs. I wrote the compiler for that, and my friend wrote the virtual machine which executes the "bytecode" I produced. Several people on campus used it to produce adventure games of their own, and I used it to write a version of the classic "Adventure" game, which now runs under Zed.

There was an early start at X86-64 native code around March 2014 - work on it was sporadic. The current form of such code generation started around January 2022, and as of February 2023 is mostly complete. This includes writing Elf utilities (reading, writing and a dumper) and a disassembler (only for instructions I generate, e.g. no X87). My code generator is simplistic by some standards, but I believe the code it produces is "acceptable".

Unfortunately, "Six" doesn't work under the current native code because it has a strange mixture of compile-time and run-time. Compile-time activity is used to build various hash tables (each "room", "verb" and "thing" is a hash table), but with native code those tables are not available at runtime. Unfortunate - I had hoped to produce a standalone binary for my version of the classic "Adventure" game, plus a game of my own.