Contents and ideas Copyright 2002-2010 by Chris Gray. Date of initial publication: April 18, 2010. Any new ideas and concepts in this document are owned by Chris Gray and may not be used in any patent or other restricted use intellectual property instrument without the physical written permission of Chris Gray. April 13, 2002 and later - initial notes I've initially picked the name "Aleph" for this project. That's the first letter in the Hebrew alphabet or something. Q. What is this project? A. Replace all software. All of it. The bulk should reduced by several orders of magnitude. Q. Do you seriously expect to be able to do this? A. No. But I'm hoping the attempt itself will be fun. I'll be happy if I can end up with something that I can use to do useful applications. A. Aren't there serious problems with some of the ideas here, such that the whole scheme is impractical or unworkable? A. Almost certainly. How far can I go, however? I'm not afraid to redesign huge amounts of stuff if that is needed. Q. Are you concerned about compatibility with existing programming languages, operating systems, protocols, formats, etc? A. Not really. I'll need existing stuff to bootstrap from. If this thing ever goes anywhere, converters and interfaces will likely appear. Q. Why are you doing this? A. Because as time goes by, I realize that I am simply growing more and more unhappy with the current state of computer software in general. Instead of grumping about it, I'm going to try to make an attempt at doing something about it. Most of the world happily uses Microsoft software. I cannot. It continues to grow in size and requirements, while not providing the few things I require from computer systems: reliability, trustworthiness, and guaranteed privacy. Given its status as closed-source proprietary software, there isn't anything that will ever be done about it. Linux systems and other open-source Unix-based systems are very old technology at their core. All of the commercial Unix systems were long since rewritten in more modern styles. The bit I know of the Linux kernel tells me that I don't want to know more. Applications are becoming larger, and more dependent on huge GUI libaries, unsafe programming systems, etc. The simplicity, predictability and reliability of the Linux world is vanishing. The fact that the source is open doesn't really help if there are hundred's of thousands of lines of poorly written software to look through. The very Unix model is outdated. I/O should be asynchronous at its heart, not synchronous, for example. Having everything be a raw file of bytes is powerful, but I'm not convinced it is the best way to go. Changing from these basic assumptions is just not possible in a Unix environment - they are too deeply embedded. Q. So what do you propose? A. See the rest of the documents, starting right here: Some buzz-phrases: A family of programming languages, all with shared base elements, that is used to program everything. Things like pointers are an extension that is used to make a low-level programming language, for example. They are not used in applications. Applications are written in completely safe variants of the base language. Persistence. Variables in some of the higher level programming languages can be persistent, i.e. stored on non-volatile storage like disks. All variables can be persisted, regardless of type. This includes variables of reference types (*not* raw pointers). The system guarantees the atomicity of the persistence, regardless of the type or size of the variable being persisted. (Is that possible even theoretically?) Linearization. All types of variables in the higher level languages can be turned into byte sequences that are independently and universally interpretable. Such byte sequences are the basis for the persistence, but are also used for transmission over networks. Such forms are the *only* Aleph-internal way that information is stored or transmitted. All other representations are considered to be external. Code is a first class entity. Functions in the various languages can be linearized just like anything else. They can thus be persisted and transmitted. Code can readily be built up at run time - the concepts of "compilation", "linking", etc. are not directly used. (Except perhaps when producing raw binary code for the lowest-levels of the system, whether for the current platform or for some other.) Types are first class entities. They can be linearized and transmitted just like anything else. They are universally decodable, and universally identifiable. No, I'm not really sure what I mean by that. The system provides "playpens" in which pieces of code can run. Such playpens can limit the amount of persistent storage available to the code, can limit which persistent variables external to the playpen can be read or written, and can limit both the proportional and total amounts of system resources the playpen can consume. All configuration options, program input and output, settings, etc. are done as persistent variables. There should be no need to have text or binary configuration files dealt with directly by any programs. It is of course possible to do that, but I fully intend to reject any code that does so. Such persistent variables are viewable and editable by the standard system facilities for doing so (intended to be GUI based). The entire system is visible to non-playpenned code. All system code is represented the standard format as well as any compiled binary or bytecode forms that may be present. Thus, all system code can be browsed. A GUI is assumed. That doesn't mean there can't be text shells as well. However, the standard way of operating is with a GUI. Some intended results: All programming languages on the system are consistent. They are all based on the core language, with whatever extensions are appropriate for their particular niche. Some variants that could be useful: - low-level programming language useful for writing the heart of the system itself, and perhaps for device drivers, etc. - application level language, which adds things like a proper "string" type, language-defined collection types, etc. - shell language - spreadsheet language - editor language - "web page" language (runs on the client machine) - "web server" language (runs on the server machine) - "database" language (if one is even needed) - text formatter language (some folks still like to use text formatting languages, rather than word processors) A "web page" is really just a value of some type ("DisplayPage"?) that can contain images, runnable code, formattable text, references to other "pages" (including on other machines), etc. A "browser" simply(!) has to render the page within its given GUI frame, using the basic system facilities for rendering those kinds of things. Any code in such a page will of course run inside a playpen created for it by the browser. Thus, things like "cookies" are just persistent values that the playpen allows such code to have. Also, control over what a given page can do is controlled by the playpen settings for it. Thus, there should be no security issues with running code retrieved from remote sites, so long as the browser doesn't allow stupid settings on the playpens. I'm sure this will get very difficult. Pages are sent between the server and the browser simply as the linearized form of the page values. Email and news are just linearized values of some type. They are transmitted and stored as such. As such, they can contain anything. The issue of decoding "attachments" becomes rather moot - all systems can decode them, but might lack code to display them in the intended form (they can of course display them in a default form, since all systems can display any value using default forms). What is a spread-sheet? It is a value of some type, such as "SpreadSheet". Such values would likely include a two-dimensional array of other values, along with settings, row and column functions, etc. The usefulness of the value comes from the existence of a custom renderer/editor for that type. Similar for the "web pages" mentioned above, although the "browser" need not be able to edit the pages. Using databases as filesystems (kind of) is a big thing coming up - Microsoft is heading that way apparantly. I'd like to think that Aleph doesn't need such a thing because it already can efficiently store large quantities of structured data. Indices of some kind would still be needed, however, so perhaps a "database" is a value that contains one or more "index" values and a list of the data values. I doubt that will be sufficient, but perhaps such thoughts are a start. Various thoughts and mental images: A text editor is basically the custom renderer/editor for arrays of strings. As such, it should be universally available whenever a type which contains an array of strings needs to be rendered or edited. Similarly, a "paint program" is a custom editor for images. And thus I magically wave away entire industries and millions of lines of code! Needless to say, we need multiple possible editors for types, and the system has to allow the dynamic addition, replacement and modification of such capabilities. Since all values are system-known types, sub-values not deliberately encrypted or hidden should be extractable without too much difficulty. This is because types are first-class entities within the system, and thus code can examine the type of values at run-time, and find usable sub-types within those types. With that information, and knowing the type-path to get to those subtypes, the code can extract the sub-values. (Details hazy!) I picture there being at least 3 editors/renderers for code. One would use straight text, and have traditional lexer and parser. This would be used by those who haven't yet been able to retrain themselves to use anything else. A second would be an "augmented" text editor type thing that does on-the-fly checking, automatic insertions, etc. The sort of thing that would drive most programmers instantly insane. So, it had better be configurable, so that it can go all the way down to the level of the straight text editor, with minimal checking. The third would be a graphical program editor, with icons for various programming constructs, etc. This third is targetted at non-programmers who want to write spread-sheet rules, web-page code, etc. Since code is a first-class entity in the system, the core system language abilities are available to those editors. I'm not at all sure how it would all fit together, however. One thought is a construct that I put into the Alai language (the one I designed for my MSc.) That is a kind of type which is a reference to code. E.g. "code int x" (or some such) would be the declaration of a variable which holds a reference to some code which, when executed, produces an 'int' result. I don't know how far such a concept would go. Users would typically have hierarchies of folders (directories) containing their saved values. I envision smallish sets of customization values being associated with such folders. Thus a user could have their default GUI style settings stored with their root folder. When a value is opened by a GUI renderer or editor, the path from the value to the user's root if traced, looking for GUI style settings to apply to the display. The values could be something like a list of name-value pairs, where the values can be whatever is needed (typically numbers, pointers to chosen style definitions (which include the code to render the style), strings, etc.) My intent was that the various GUI code that is used by the renderer (and the renderer itself of course) would just use the name of the variable directly, and the system would find the appropriate value to use (caching it nicely of course). It's currently not clear to me how this would actually work. Perhaps a class attached to the variables, such as "config int WindowBorderColour", would trigger the application-level programming language to do the "right thing" at runtime. It is also my intent that there be some base definition of all such values, which is required to have explanatory text associated with it. So, you can't just add random arbitrary names to the lists of configuration settings. There are huge issues with this, but this sort of describes my desires. Porting Aleph to a new system should be a minimum of work. That's a goal. Towards that, I already have the beginnings of the core language in my MUD system. There, I have the lexer, parser, byte-code generator, byte-code interpreter, linearizer for code, pretty-printer (renderer), etc. Modifications needed, but it sounds like a first step, to me. Also needed, at a minimum, is a way to open a GUI window on the hosting system, paint pixels in that window, and get keystrokes and mouse stuff from the user. I've already done work towards an extensible programming language. That was done 7 or 8 years ago before I started AmigaMUD (which was first called AmigaDUM, since the real thing was going to use the extensible programming language libraries). This stuff is the original source for some of my thinking about Aleph. Work on AmigaMUD triggered my thoughts on linearization, persistence, byte-code execution, etc. I've recently looked at that old code (written in Draco), and its not as elegant as my rosy vision of it suggested. It likely should be mostly ignored, except perhaps as the source of the idea. One of the first things that needs to be done, I think, is to design the core programming language, how language variants are produced, etc. There are huge issues with how entities are named in this system, since I want there to be one single hierarchical namespace that covers the entire distributed universe. The "proper" way to access a persistent value is to do so asynchronously, with an event triggered when the value is retrieved. This is because values can be very large, and so the access could fail. No-one wants to do this to reference a simple configuration value, however, so the default is to just mention the name of the variable in code, and the system will synchronous fetch it, triggering an exception of some kind if it fails. Even doing asynchronous fetching isn't the full answer, however, since values can be larger than memory. So, a reference to "HumongousArray[index]" shouldn't need to fetch the entire array into memory. Thought needed here. My intent is that virtually all of the system code be fully explorable by the user (like the old Xerox Smalltalk systems). What reasonable restrictions can we put on the changing of the system code, however? Clearly any code related to guaranteeing security can't be changed. Or, if it is, the system then becomes untrusted, and anything it sends out cannot be run outside of a playpen. Need ways of updating code in the field, of course. Heck, I'd like the norm to be that all systems periodically check with update sites and update themselves with stuff that their user(s) have configured to be updated. Would I trust something like that myself? Not likely. Sigh. One main plus is that we ought to be able to greatly reduce the effort involved in storing and retrieving information from disk, be eliminating any issues of parsing, ugly formats, etc. We also ought to be able to get rid of a lot of the effort involved in network communication, by removing most issues of protocols and transmission formats from application-level programs. Random mini thoughts Perhaps type attribute "big" is a hint to the system that the value stored in such a variable ought to be stored in separate space, much like files are on traditional filesystems. Should that also affect its handling in memory? You should be able to point at darn near anything on the screen and push the HELP key (hmm, this keyboard doesn't have one!) and get some reasonable help. The system itself should be able to build a description of most things, but renderers, GUI style elements, etc. ought to be able to insert a more specific one of their own. At a very minimum, configuration variables should all be documented. Is intermachine communication message oriented or stream oriented. Or both? Its easy to build a stream on top of messages, slightly harder the other way. Extremely large messages are a problem, just like extremely large persistent values - they don't fit in memory. Should the system be able to start interpreting (including running code!) a linearized form before it has received the whole thing? In effect, the system becomes one large conglomeration of interlinked values. Hard to manage. Need to put boxes around parts. Where? Does the system use a garbage collector (perhaps more than one, at different levels of granularity?) or reference counts? Or both? If boxes can be placed around things, are reference counts among boxes practical? Are reference counts within boxes always practical - boxes could get pretty big. Perhaps reference count versus garbage collection is a choice of the box. The system knows how to render and edit all basic types. It also has ways of rendering and editing all of the language-defined compound types. Thus, the system can render and edit any value. Rendering can be one of two styles - render to text, or render to pixels. At the bottom they are equivalent, since for text to show up on a GUI, it needs to be rendered as pixels. However, being able to render to text is useful, in that it works in shells, editors, etc. Perhaps both modes are always available? What does rendering a number to pixels do? My thinking is that the base language has only the basic programming language constructs and the integer type. No floating point, no pointers, only fixed-size arrays, no collection types, etc. It isn't really useful for much of anything, except perhaps it could call system primitives for rendering and so be useful that way. Higher-level language variants would have language-defined collections, perhaps including list, set, bag and possibly others. The "string" type doesn't exist in the lowest level language, since it requires memory management. I'm not even sure that 'char' exists in the base language! Possibly including it, and thus allowing 'char' arrays would make the base language marginally useful. If that matters! Byte-order. There certainly has to be a defined byte-order for values sent between machines. I suggest big-endian for the usual reasons. Should persistent values use that, too? Then you could literally take a data disk from one machine to another, and it would still work (including the code on it), even on another architecture. I think the default attribute for variables is that they are not persistent, do not need to be updated atomically, are not 'big', etc. In AmigaMUD, all global entities (there are no global variables) are persistent. I don't think that is necessarily the right choice in this case. Oh yes, it *is* my intent that any code you have write access to can be edited online at will, and the new version will take effect immediately. This is like in AmigaMUD, so I know its doable. We need to think about the consequences of this in the more global scope. Conditional compilation. Its hard to imagine doing full operating systems, etc. without it. But, it doesn't really fit into the style of programming I've been talking about here. One thought is to have parameterizable chunks of code (at what level - package?). Associated with such a thing is a set of instantiations, where values have been selected for all of the parameters, thus producing an executable variant. I'm assuming here that the execution system is able to take advantage of the parameterization when it generates code (whether internal trees, byte-code or machine code), and do proper optimization, including inlining of routines whose bodies have become miniscule. The program editors should also be able to handle parameterization. E.g. you should be able to set values for parameters, and the rendering of the program will show the code resulting from that parameterization. Gnu emacs can apparantly do something like this, and I expect it to be quite useful. Perhaps the renderers should also be able to do it based on non-parameter conditionals - that could be very useful. A thought on program representation. Internally, one form is that of a parse tree (which can be linearized via prefix traversal - that's what AmigaMUD does). Nodes in the tree could all (or some?) have a field which points to their optimized variant. It is the optimized variant that is executed (or has code generated from), but it is the non-optimized variant which is rendered and edited. This allows things like constant folding to easily happen, but without losing the original form of expressions, etc. Parameterization could use the same mechanism, but we do want to have several different parameterizations, so perhaps that is best done at a higher level. Perhaps at the function or package level. Some random GUI thoughts Keystrokes in the GUI system go to the most local selected element which has expressed a wish to receive them. GUI elements that don't need keystrokes don't express such a wish. Thus, we hopefully avoid the focus issues that the Java MUD client has. In the GUI system, the contained element requests the size of the inner part of the container in which it is to render itself. *Not* the outer size. The contained element shouldn't have to bother itself figuring out how big the borders and stuff of its containing element are. What if the requested size can't be given? The simple choices are to fail the creation of the element, and to create a scrollable interior for the element to render in. The latter is likely needed anyway, for displaying large images directly. There is still a problem if there is no space for the containing element to render its scroll bars. Perhaps the creation then fails. Lets pick a minimum pixel size for such events - a one pixel by one pixel scrollable space is not terribly useful. GUI elements should not try to force appearances - they should use the hierarchical settings values the user has specified, as described above. I really, really hate programs that deliberately ignore the user's settings because the author wished his program to use a certain style. Doing that is art, not tool building. Perhaps a tool can include "recommended style settings", and the user can choose whether or not to allow such overrides. Yet more complexity. This might just come down to setting where the search path for config settings starts - with the tool itself, or one level up. A GUI element cannot tell anything about stuff outside of the box it is rendered in. That is not its business. Is it even remotely possible to produce a usable GUI system with that rule? I'd really, really like to keep the GUI system from bloating into a multi- megabyte monstrosity like nearly all others have done. What coordinate system to use? Floating point scales and is more abstract, but my experience with using it in AmigaMUD tends to make me shy away from it. Too many things need pixel accuracy to look right. Rounding problems mess things up. I also recall not liking Java's model, although it is interesting. I think the problem was that there was no way to find the bounding rectangle for a polygon. Or something like that. The defaults/settings values should be "live". I.e. changing them should affect the GUI's appearance right away, as appropriate. I have no idea how to do this efficiently! It should be possible to attach things to the "source" of programs, such as images, sounds, animations, etc. They wouldn't affect the compilation/execution of the program - they are only a communication from the program writer/editor to the human program reader. Perhaps allow setting of a backdrop image, etc. The reader must be able to control all of the appearance aspects, however, so that fonts/colours/sizes etc. can't change if he choses not to let them. Some random programming language thoughts What do we do in the language about global variables? I'd like to not have any, but that is often impractical. Perhaps we need to think first about the idea of something like packages or modules, which are classically built from multiple source files and define scope of their own which is larger than file scope, but still private. One vague possibility is something like the nested table thing that happens in AmigaMUD, but perhaps with different rules. E.g. the search path for symbols in a piece of code starts with the innermost scope and goes outward, as usual, but the end of the search is defined by the system as a whole, and not by anything relating to the current piece of code. Perhaps that is how system-defined symbols get used. Also needed is a way to "use" more tables of symbols that are not on the direct path, but are in branches reachable from that direct path. If we end up allowing the above kinds of references, how do we ever change anything in the system? There will be such a complex web of dependencies, that no function can ever be deleted! One possible answer is that functions can be edited at will, and thus they can start referencing fewer things, thus eventually untying the knots. Numeric types. What do we want? Given the current and near future state of processors, I think the default integer type should be 64 bit, and the default floating point also 64 bit. Those two, and string and boolean types, likely suffice for the primitive types for most application and GUI programming. For low-level programming, one needs to be able to do loads and stores of specific sizes to and from hardware registers, etc. That suggests adding some types (no need for signed, I expect) that are defined to be 8, 16, 32 and 64 bits. For efficient programming at low levels, the C style of not saying how big a type is works well, but we need to avoid the pointer trap by defining a integral type large enough for a pointer. So, as a first attempt, how about the following types: Base language: int (signed integer, size unspecified but likely 64 bit) char (character, typically an ASCII character) bool (true/false value) Applications level languages add: float (floating point, size unspecified but likely 64 bit) string (flexible, system checked string, unicode?) Low level language adds: float32 - 32 bit floating point float64 - 64 bit floating point uint8 - 8 bit unsigned integer uint16 - 16 bit unsigned integer uint32 - 32 bit unsigned integer uint64 - 64 bit unsigned integer addrint - unsigned integer big enough to hold any pointer uint - natural-sized unsigned integer sint - natural-sized signed integer A minimum subset of the load-level language could not have floating point, and only have unsigned integer types upto the reasonable size of the processor (e.g. an 8 bitter might stop at 8 bits, but would have to have a 16 bit addrint). An interesting question: do we need to low-level types in the bytecode machine? Or, is all low-level stuff compiled directly to native code? Requiring that compilation ability makes porting harder. Without it, one need only produce a bytecode interpreter, and the bit of lowest- level native code that the system needs to get up and running. Object-oriented programming. I'm still not a big fan of this, in the sense that it is used in, say, C++. I think, for example, that C++'s multiple inheritance is very bad. I much prefer Java's concept of interfaces, but I believe that some consider them inadequate. In a system as dynamic as what I am proposing here, I think we have to go with something like interfaces, since changing the inheritance hierarchy in a running system is likely intractible. With interfaces, one can add a new interface to the set exported by, say, a package, and new clients can start using that new interface. When all users of the old interface have switched over, then the old interface itself can go away, and any code in the package used just to implement the old interface can also go away. I still haven't really said anything about OOP. The idea of inheritance of value, like AmigaMUD does, is something I kind of like. I've used it above, essentially, for configuration settings. It isn't efficient enough to use for low-level or main application programming, however. So, here are some words that might make a usable compromise: - structs and classes are completely different beasts. Structs are like traditional C structs, in that they associate a bunch of values into a group in memory. They can nest at will, and no pointers or references are needed. They exist in the base language, since I see no particular problem created by their existence. - classes are more like Java classes - every value which is an instance of a class is accessible only by reference. Thus, classes cannot directly nest like structs do. There is no such thing as a class structure field or variable. Classes can inherit, however, and that inheritance does not require a reference - the "parent" class elements are included directly in the storage for the descendent class. The main aspect of inheritance, however, is the inheritance of the interfaces and methods of ancestor classes. - I personally find the use of class member names to be incrediby confusing. I would rather require explicit use of a "path" to the member names. I know that could get verbose, but hopefully it would encourage shallower inheritance trees! I'm also not sure I like the implicit 'this' used inside member functions. Typing "this.blah" all the time would be tedious, but if it results in greater clarity and uniformity, I'd say go with it. For similar reasons, I wouldn't be a big fan of something like Pascal's "with" clauses. I even find "this.blah" confusing when "blah" is a member of some ancestor class. Perhaps even more than when "blah" is a member of the child class, since I don't know where to look to find "blah". Now, with an IDE these problems are minimized, but I still think that code should read clearly and unambiguously all by itself, without external assistance. I'm not sure I have an acceptible answer here, other than "don't use OOP". Taglets. I'm a big fan of using taglets on identifiers as memory aids. Intellectually, I understand that they become questionable when a project becomes large enough that the taglets are hard to keep unique, and hard to figure out since there are several similar ones. I don't know what to say here - I've never worked on a project that got big enough that this was a problem. I'd like to say that large projects should be modular enough that it isn't a problem, but I'm not sure even I believe that. Taglets would only work properly in a very large system, I think, if no symbols with taglets (struct elements, class members, etc.) escape from the larger units that they are defined and used inside. E.g. the only symbols that a package exports are those defined in its interfaces. I think that means that a package cannot export a class that can be inherited from. Is that reasonable, or does that remove the whole point of OOP? I guess it depends on how large the package is - if it is large enough, then there is still lots of possibility for traditional OOP within it. We do need some way to stop the dependencies on memory layout, however. I.e. we need to create a system where the addition of a member to a class does not invalidate the entire system or tons of persistently stored values. Hmm. *Can* you add a field to a structure that has already been used? (Or, equivalently, a member to a class that has already been used?) I know there are some systems which can do it - they essentially have to go rewrite all values in memory. We would have to rewrite persistent ones too. Which can of course fail due to lack of space or I/O error. Well, this *is* the same as compatibility problems with differing versions of proprietary file formats. I'm thinking that there are only two situations where these format changes have consequences outside the code of the package: - persistant values written by code in the package. - structures exported in an interface the package implements The answer to the second is as mentioned earlier - you have to define and implement a new interface, and support the old interface as well as the new one until all users of the old one have been updated. What about symbol clashes between the old and new interfaces? They would want to be the same (nearly all of them if the change is small!). So, how does code bind to the right one? The answer here might be to rely on help from the system's knowledge of the programming languages, etc. A tool (or whatever) could be provided that takes a package A, that uses interface I1 from package B, and replaces all of its uses of I1 with corresponding uses of replacement interface I2. This will result in name changes seen in a rendering of the code in package A. However, a later tool could then do the step of "B no longer needs to implement interface I1, as it has been replaced by interface I2. Please delete I1, but do corresponding renames of the symbols in I2 (and I2 itself) to those from I1". This would work since the stored representation of A would not actually contain the symbols from Ix, since it references the single copy of those symbols directly. A much simpler example of this is that I could add a way in AmigaMUD to change the name of a local variable or parameter in a function, and any subsequent pretty- print of that function would use the new name throughout, since all uses are just a kind of reference to the single declaration. I have no immediate answer to the issue of updating persistent values of a structure or class that is changing. It doesn't seem practical to somehow have an index that shows where all of them are. Hmm. If we add a rule that a structure exported by a package cannot be persisted except by code inside that package, then perhaps we can have an answer. If only internal code can persist such values, then perhaps some kind of automatic conversion is feasible. If not, then we may be stuck with such packages having to maintain code to handle old structures indefinitely. Ick. How does the system know that a type has been persisted? Is the existence of a persistent variable of that type (within the package) enough? How *does* the system know the type of a persistent value? I don't think we want each persisted value to have to fully describe itself if there are thousands of them. Hmm. This might not be a problem - only top-level persisted variables need to describe themselves - if they contain aggregates of other values, their one description of that aggregate suffices. So, perhaps each such top level value includes a unique id of the type, and a version number of that type. Then, if the system reads a persistent value with the same type id but different version, it actually could do the conversion. The stored value would only be updated when it is next stored. If the new version of the type has new fields, default values would be supplied by the system. The new code in the package could check for the default values and update accordingly. How many individual persistent values are we talking about? Is the above extra stuff too much? My thinking is that there would be about the same number of top-level persistent variables as there would otherwise be files in a file-oriented system, so its not a problem. It should perhaps be a goal that this be so. A 'table' type might be interesting. The type definition requires two types as its arguments. One is the key type and the other is the value type. The key type needs to have an associated numeric hash function. This leads to the idea from some language several years ago that types are defined by the set of operations that can be performed on values of that type. Perhaps all types, including the builtin ones present an interface listing the operations that are available on them. The regular syntaxes we use for some of those operations are simply syntactic sugar. Maybe not the assignment operator, else we go down the icky path of C++ references. Persistent variables. How exactly does one use them? If a variable declared inside of a package is persistent, then I think there should only be one system-wide value for that variable. It could be persisted "inside the directory" where the package "resides". If a persistent variable is exported by an interface implemented by a package, is it system-wide unique? If so, how *does* one make/use a local (reached via the same path-search as mentioned earlier for configuration variable sets) persistent variable? Perhaps a tag on the declaration in the exporting interface says whether the variable is local or shared. Are such variables deleted if the package that exports them is deleted? Do the variable instances reference the package and interface for their definition? If the variables are not deleted, do we then leave a crumb behind which contains the type of the variable and a reference to the package, so that if the package is installed again, it can again find the variables? Does that need to be more than using the right name for the variable, perhaps prefixed by the package name? How unique do package names have to be? Can there be multiple packages with the same name on the same system, distinguished by their "path"? That would be useful for trying out new versions. How else do you work on a new version of a package that the system needs for operation? If the "names" of the persistent local variables contain the path to the package that they are for, does that mean that a package has absolute control over where it is installed? How then do you replace an old version with a new one? I think that somehow you have to swap them around when neither one is in-use. That implies a table of pending swaps that is executed on a system boot, for packages that are always in use. Having the package force where it has to be put is something that modern sysadmins are not pleased with. Windows has run into problems with everything wanting to be installed in "C:\Windows\xxx" or somesuch. The more I think about it, the more it seems to me that the traditional ways of using OOP just don't have a place here. It looks to me like they can only be used within a given package, since you end up being locked in to their current forms if they go out of the package. Is that enough usefullness to merit the effort of implementing OOP? Am I missing something here? 020424/Wednesday: Since types are first-class objects in the system, there need to be ways to construct them and explore them from within the system. These can be used to write linearizers and de-linearizers for types. The explorers can also be used by general printing routines and a debugger. Have a type-case statement like Algol68 (and Pascal?) that safely splits out a variant record. Then, there could be a main type-explorer that is passed a type and returns a variant record that describes the type. The caller uses a type-case to deal with the several forms of types (likely things like 'struct', 'variant', 'proc', 'array', "result of calling XXX with args ZZZ", etc.) Since types are first-class objects, can have functions which take types as arguments and return types. These *are* the routines that are used to define container types. E.g. proc set(type elementType)type Need a way to specify that a type used as a parameter to such a function must have certain "operators" available. For example, a type passed to 'set' must have an equality operator so that duplicates set elements can be removed. Most type functions would require a way to do assignments. This is where we might borrow a restricted version of C++'s "ref" concept. "ref" types can *only* be used as parameters to functions. When such a function is called, the "ref" parameter must be an assignable expression (lvalue). Inside the function, no special operator is used when referencing the "ref" parameter - the un-refing (likely a pointer dereference in reality) is implicit. This is likely quite close to the version in C++ - should read up on them again. I don't want "ref" *variables* however, since then it is possible to have "ref" values which are not guaranteed to be valid - by having them only as function arguments, there cannot be an invalid one. Note that the above 'set' function may well need to *generate* functions that are part of the set of "operators" that the returned type has available. I'm guessing that it won't always be possible, even with some added tricks, of using generic routines for such things. Which leads us to the whole set of routines used for constructing and exploring functions. Its possible that the types used to represent the bodies of functions (and their basic Proc_t?) can be transparent and public. However, the values themselves need to be read-only. A set of construction routines is used to build and assemble the pieces of functions. Those routines will check that everything makes sense. For example the constructer routine for a "for" statement must check that the index variable for a counting "for" is a integral local variable. (Type "int" in the base language - other sized integral types likely allowed in the system-level language.) This set of routines can be used when linearizing and de-linearizing functions, when pretty-printing functions, in the regular compiler, in the GUI-form program entry code, and in the bytecode compiler. Thus, "int" is simply a predefined constant of type "type". Etc. I think we could handle some container types like I did "lists" in Six/Fant. I.e. the types built by the "list" function put a "getNext" operator into the set of operators exported by the returned type. The convention for such iterators is that they take the list and the list element as parameters, and return the next element. Hmm. How do we describe such an operator? Maybe proc getNext( container; element) where the syntax represents a generic type substitution. This doesn't feel quite right to me. Don might suggest using inheritance from a base "iterable" type. How would that go: type iterated = newType(); /* get an otherwise undefined type */ type iterable = proc getNext(iterable container; iterated element)element and then "list" generates a type which inherits this routine? This doesn't feel right either. Don, how does C++ use their generics to do this? How about Java? Do they have to make the container take the most generic type, and use casts to get to the proper element type? Anyway, once you have a 'getNext' operator (and 'getPrev' for chains, 'insAfter', 'prepend', and 'append' for lists/chains, etc.), you can then use that type in an iterating "for" loop: list(int) li; int i; li := doSomethingToMakeAListOfInts(); for i in li do doSomethingWith(i); od; This works by having the construction function for the iterating "for" statement require a 'getNext' on the type of the expression after the "in", and also that that 'getNext' have the type of the iterator variable as its second parameter type and result type. Since you can't declare *variables* using non-constant types, it must be possible for the compiler to check this sort of thing at compile time, using the type examination routines. Hmm, what if a generic routine needs a temporary - like in a 'swap' routine? I had figured out how to do that in a previous Draco extension, but it was pretty low-level. However, my intent there was to share the actual object code among all uses of the generic type. We might not have to do that here (or be able to!), since the type functions can generate routines at runtime. 020425 email from Don > > Need a way to specify that a type used as a parameter to such a > > function must have certain "operators" available. For example, a type > > passed to 'set' must have an equality operator... > > Perhaps something like Java's interface would be suitable here. > The Generic-parameter-interface could syntactically be like an > Interface. The language designer could specify that the Type > supplied to Set explicitly implements the HasEquality interface; or > he could merely insist that the Type has all of the operators. > > Java doesn't have Generics. > > C++ has stupid Generics. One can supply any type-or-class where a > type-or-class is expected. The compiler or linker will gripe if you > get it wrong, but there's no way to express, within the language, > how to get it right. > (In my Ordered-array-of class, I have comments stating that > the supplied type must have >= and == operators. That's the best > one can do.) > > Modula-3 does it like C++. If you get it wrong, the compiler gripes. > > Eiffel has Constrained Genericity, which is rather like the above > Generic-parameter-interface. One can say, "this generic takes a > class which must be XXX class or a descendant." > (Eiffel has interfaces but calls them "deferred classes"; so > one doesn't have to say "class or interface" everywhere.) > To support that process, Eiffel does one thing which you don't like. > Even simple classes have a long line of ancestors. For example: > Eiffel has a type which has == and >= (and >, <, and <=): > PART_COMPARABLE; > then an heir, for which < and >= are inverses (also <= and >): > COMPARABLE; > then an heir with arithmetic operations (+, -, *, /): > NUMERIC; > and then INTEGER, REAL, and DOUBLE inherit from NUMERIC. > I might have used PART_COMPARABLE or COMPARABLE in my > Ordered-array-of. > > Even more amazing is the lineage for Eiffel's SORTED_LIST: > Bertrand created CONTAINER; > and CONTAINER begat TRAVERSABLE; > and TRAVERSABLE begat LINEAR; > and LINEAR begat BILINEAR; > and BILINEAR begat SEQUENCE; > and SEQUENCE begat CHAIN; > and CHAIN begat LIST; > and LIST begat PART_SORTED_LIST; > and PART_SORTED_LIST begat SORTED_LIST; > and Bertrand saw that it was good. > (Bertrand is Bertrand Meyer, not Bertrand Russel.) > Did I mention that SORTED_LIST begat SORTED_TWO_WAY_LIST? > > > Note that the above 'set' function may well need to *generate* > > functions that are part of the set of "operators" that the returned > > type has available. > > Why? > > > "ref" types can *only* be used as parameters to functions. > > Why? (I sometimes use one to name an oft-used element of an array, > so that address calculation is done only once. A pointer would work > just as well.) > > > When such a function is called, the "ref" parameter must be > > an assignable expression (lvalue). > > You might like the Eiffel terminology: Entity (lvalue) and > Expression (rvalue). > C++ allows the actual parameter to be an rvalue, if the formal > parameter is a ref-to-constant. (Just store the constant somewhere, > and ref that.) > > > I don't want "ref" *variables* however, since then it is possible to > > have "ref" values which are not guaranteed to be valid - by having > > them only as function arguments, there cannot be an invalid one. > > C++ doesn't have such variables, either. A C++ ref must be > initialized, and can't be altered. But they don't have to be > function arguments. > > C++ tries to guarantee that refs are always valid. In particular, it > guarantees they're never NULL. But that's the language > specification, not the implementation. One easily produces a NULL > ref in every implementation I've seen. (A program which does that > is invalid, of course, but the compiler doesn't notice.) > And one can produce an invalid ref with type-cheating pointer casts. > > Algol 68 has ref variables, but in my view they're just safe > pointers (no pointer arithmetic). > > Eiffel's ref variables cannot point to the middle of a structure. > (Except that it has facilities for linking to "C" programs, and > therein one can cheat.) But they can be null. > > If Modula-3, if one uses only "safe" modules, ref's are guaranteed. > (They are null, or point to something of the right type.) But in an > "unsafe" module, one can type-cheat and do pointer arithmetic. > > > The convention for such iterators is that they take the list and > > the list element as parameters, and return the next element. > > No. Conventionally, an iterator has its own state, a sequence and a > position within the sequence. The constructor takes a sequence, and > positions to the start. ToNext takes no parameters. > > > Don, how does C++ use their generics to do this? > > template class sequence { > friend iterator; // access to privates > ... > }; > > template class iterator { > private: > const sequence& seq; > const ITEM* pos; > public: > iterator(const sequence& sequ) > : seq(sequ) > { pos = %%% } > bool atEnd() const { return pos != NULL; } %%% maybe > const ITEM& getItem() const { > if (atEnd()) throw "You idiot!"; > return *pos; > } > void toNext() { pos = %%% } > }; > > The details depend on the implementation of sequence. > I might have gotten the syntax right. > > > Anyway, once you have a 'getNext' operator (and 'getPrev' for chains, > > 'insAfter', 'prepend', and 'append' for lists/chains, etc.), you can > > then use that type in an iterating "for" loop... > > Yeah, but you're stuck with that particular operator name. C++ > allows any methods to be used: > for (iterator it(sequence); > ! it.atEnd(); > it.toNext()) > { > const thingT& th = it.getItem(); > ... > } > > Eiffel does it too, with better syntax. (And any language has some > kind of While.) > > > Hmm, what if a generic routine needs a temporary - like in a 'swap' > > routine? > > Eiffel has a "type of" operator, called "like". It is especially > useful in generic functions. The base class ANY declares the Equal > routine this way: > equal(parm1: ANY; parm2: like parm1) BOOLEAN > And so I can pass any two things of the same Type to Equal. > Eiffel restricts use of "like", to places where the compiler can > deduce the correct type. > > As for "swap", Python does it this way: > a, b = b, a; > (The RHS produces a two-element list, which can be > unpacked-and-assigned to any two variables. You'd expect that kind > of construct, when variables are untyped.) > I don't know whether a strongly-typed language should have a builtin > swapper. > > ----- > > Do you intend that the base language can declare the base types and > their operators? Here's how Eiffel does (part of) INTEGER: > expanded class INTEGER > feature > infix "-" (other: INTEGER): INTEGER > -- result of subtracting Other > prefix "-": INTEGER > -- negation of > end > (Alas, one cannot declare the precedence of operators.) > C++ has something similar, and postfix expressions to boot. 020427 My reply: (Sorry for the delay in responding - I was lazy on both Thursday and Friday nights and accomplished little besides finishing that Lego disassembling.) > > Need a way to specify that a type used as a parameter to such a > > function must have certain "operators" available. For example, a type > > passed to 'set' must have an equality operator... > > Perhaps something like Java's interface would be suitable here. > The Generic-parameter-interface could syntactically be like an > Interface. The language designer could specify that the Type > supplied to Set explicitly implements the HasEquality interface; or > he could merely insist that the Type has all of the operators. I had something like that in mind, but didn't immediately come up with a syntax when I was writing, so didn't try to get specific. I would *like* the system to be general enough to be able to essentially define itself, but that's not a requirement. If doing that forces something to be too abstract for a typical programmer (or me!), or results in inefficiency, then it won't do. My main goal is that if there are programmer-defined collections (rather than just built-in ones), then they must be usable in the applications level languages. That means they must be 100% type safe, since those languages will not have type-cheat casts. I would also like them to be efficient enough to be used in the systems-level language, but that's not a requirement. (I use the word "collections" above to mean kinds of compound types in general, not just compound types that would be thought of as collections.) > Java doesn't have Generics. OK, I didn't remember any. They mostly do things by going all the way back to the Object class and then casting as necessary, don't they? > C++ has stupid Generics. One can supply any type-or-class where a > type-or-class is expected. The compiler or linker will gripe if you > get it wrong, but there's no way to express, within the language, > how to get it right. > (In my Ordered-array-of class, I have comments stating that > the supplied type must have >= and == operators. That's the best > one can do.) Hmm. Icky. As mentioned above, it *must* be safe at the applications level. > Modula-3 does it like C++. If you get it wrong, the compiler gripes. > > Eiffel has Constrained Genericity, which is rather like the above > Generic-parameter-interface. One can say, "this generic takes a > class which must be XXX class or a descendant." > (Eiffel has interfaces but calls them "deferred classes"; so > one doesn't have to say "class or interface" everywhere.) > To support that process, Eiffel does one thing which you don't like. > Even simple classes have a long line of ancestors. For example: > Eiffel has a type which has == and >= (and >, <, and <=): > PART_COMPARABLE; > then an heir, for which < and >= are inverses (also <= and >): > COMPARABLE; > then an heir with arithmetic operations (+, -, *, /): > NUMERIC; > and then INTEGER, REAL, and DOUBLE inherit from NUMERIC. > I might have used PART_COMPARABLE or COMPARABLE in my > Ordered-array-of. But, if you have methods which operate on those ancestor classes, how do you safely get from the result of one of those operators (which pretty much have to return values of the base class) to the descendant class which you are trying to work with? Hmm. I guess you could make that a fact of the way the language works - it looks like that is implied in the above. Looking around my bookshelves, I see have language descriptions for: Occam Modula-2 CLU Smalltalk C++ Eiffel Ada 95 Modula-3 Java Perhaps I just need to spend a bunch of time reading. Ick! > > Note that the above 'set' function may well need to *generate* > > functions that are part of the set of "operators" that the returned > > type has available. > Why? Because you could? Well, there may be limitations on what you can do by relying on single generic functions applied to all cases. So, if you need to actually generate code to accomplish the goal, you can. > > "ref" types can *only* be used as parameters to functions. > > Why? (I sometimes use one to name an oft-used element of an array, > so that address calculation is done only once. A pointer would work > just as well.) I picked this to make 100% certain that no "ref" value could ever be invalid. As you say below, I may be able to stay safe and still allow more situations. I would *like* to not have dynamic initializers in declarations, and without those, its hard to have "ref" variables. > You might like the Eiffel terminology: Entity (lvalue) and > Expression (rvalue). Hmm. > C++ allows the actual parameter to be an rvalue, if the formal > parameter is a ref-to-constant. (Just store the constant somewhere, > and ref that.) Yeah - I found that quite disturbing. Hidden inefficiencies. > > I don't want "ref" *variables* however, since then it is possible to > > have "ref" values which are not guaranteed to be valid - by having > > them only as function arguments, there cannot be an invalid one. > > C++ doesn't have such variables, either. A C++ ref must be > initialized, and can't be altered. But they don't have to be > function arguments. > > C++ tries to guarantee that refs are always valid. In particular, it > guarantees they're never NULL. But that's the language > specification, not the implementation. One easily produces a NULL > ref in every implementation I've seen. (A program which does that > is invalid, of course, but the compiler doesn't notice.) > And one can produce an invalid ref with type-cheating pointer casts. Can it be done without casts? My application language won't have casts, and the assumption will be that any code written in the system language is 100% reliable (may not be true, but the language will assume it). > Algol 68 has ref variables, but in my view they're just safe > pointers (no pointer arithmetic). Yeah - they are just pointers. I've stolen keywords from Algol 68, but I never liked a lot of their stuff (even though one of the designers was on my thesis committee, and his widow is my next door neighbour). I especially didn't like their automatic casts and use of similar syntaxes for all sorts of things. > > The convention for such iterators is that they take the list and > > the list element as parameters, and return the next element. > No. Conventionally, an iterator has its own state, a sequence and a > position within the sequence. The constructor takes a sequence, and > positions to the start. ToNext takes no parameters. I don't want such iterations to have the cost of allocating and deallocating an iterator. Thus, whatever method is used during iteration needs to have at least one argument, even if its only an implicit 'this'. > > Don, how does C++ use their generics to do this? > > template class sequence { > friend iterator; // access to privates > ... > }; > > template class iterator { > private: > const sequence& seq; > const ITEM* pos; > public: > iterator(const sequence& sequ) > : seq(sequ) > { pos = %%% } > bool atEnd() const { return pos != NULL; } %%% maybe > const ITEM& getItem() const { > if (atEnd()) throw "You idiot!"; > return *pos; > } > void toNext() { pos = %%% } > }; > > The details depend on the implementation of sequence. > I might have gotten the syntax right. And I might have understood it. Expecially the ": seq(sequ)" bit. > > Anyway, once you have a 'getNext' operator (and 'getPrev' for chains, > > 'insAfter', 'prepend', and 'append' for lists/chains, etc.), you can > > then use that type in an iterating "for" loop... > > Yeah, but you're stuck with that particular operator name. C++ > allows any methods to be used: > for (iterator it(sequence); > ! it.atEnd(); > it.toNext()) > { > const thingT& th = it.getItem(); > ... > } But I *want* the operator names to be the same, for readability reasons. > Eiffel does it too, with better syntax. (And any language has some > kind of While.) Yes, this is a form of "while" statement. However, it would be nice to be able to hide the details of the iterators and how they are used, so that it is easy to iterate through lists, etc., even for beginners. Hence the standardization on the names of the iterator methods, so that the 'for' that iterates can use them. > > Hmm, what if a generic routine needs a temporary - like in a 'swap' > > routine? > > Eiffel has a "type of" operator, called "like". It is especially > useful in generic functions. The base class ANY declares the Equal > routine this way: > equal(parm1: ANY; parm2: like parm1) BOOLEAN I like 'like' - I'll see if that helps. Keep in mind that I want this stuff to be efficient. I want all of the operations involving types to be compile-time only, unless the programmer deliberately does something involving types as first-class entities. I'm probably already contradicting myself on this, with the idea of generating code at runtime. Hopefully that would be a special case. > And so I can pass any two things of the same Type to Equal. > Eiffel restricts use of "like", to places where the compiler can > deduce the correct type. I like. > As for "swap", Python does it this way: > a, b = b, a; > (The RHS produces a two-element list, which can be > unpacked-and-assigned to any two variables. You'd expect that kind > of construct, when variables are untyped.) Understood. I've seen those before. They become a significant change in the language, however, and the consequences on stuff could be large. > I don't know whether a strongly-typed language should have a builtin > swapper. Oh, why? > Do you intend that the base language can declare the base types and > their operators? Here's how Eiffel does (part of) INTEGER: > expanded class INTEGER > feature > infix "-" (other: INTEGER): INTEGER > -- result of subtracting Other > prefix "-": INTEGER > -- negation of > end > (Alas, one cannot declare the precedence of operators.) > C++ has something similar, and postfix expressions to boot. Certainly nothing as fancy as Algol68 has. I *would* like to be able to introduce new operators, in some sense, but another part of me doesn't want to, but does want new types/classes to be able to define their meaning for the existing operators. Hmm. Perhaps operators themselves, along with their priorities, are introduced as features of a new language superset. Then, within that language, the meanings of the operators can be defined by any new types defined in that context. At least that way there isn't an unmangeable mish-mash of operators with priorities depending on argument types, etc. like Algol68 had to cope with. I'm not into writing a parser that is NP complete in its parsing. (I doubt Algol68 was, but it certainly wasn't simple fixed precedence like most are.) More thinking today, I guess. 020427/Saturday I need to be careful to not confuse things relating to types. I've been thinking of a procedure that takes one or more type arguments and returns a new type value. I want to be able to do that, but that isn't necessarily what is wanted for container types like lists. A more standard generic type is something that tells the compiler things about the type you want to use, and how to manipulate entities of that type. If I were to have a procedure like: "proc list(type elementType)type", would I be able to use it in a declaration in explicit source code, as in: "list(int) li"? Thinking of "list" as a procedure, that would require that "list" be executed at compile time, yielding a type that the compiler would then examine to determine how to handle "li". That may well be workable, but it is quite different from what other languages do with generic types. I need to be clear what I'm thinking about. It may well be possible to not need a distinction, but until that is clear, I need to be clear. Another consideration: when decompiling a program containing such a type, how is it represented? If "list" is viewed as a generic type, then "list(int)" is the representation of that type in a decompiled view, and so it needs to be the internal representation as well. However, since I want to allow the actual type procedures, it seems clear that the system will end up having types that do not have any associated printable form that means anything to a human reader. So be it. Note that you cannot make use of that type for additional declarations unless the editting facility treats whatever representation of the type is used as a separate object in and of itself, without trying to interpret the textual form. I'm beginning to think that all of the pre-defined types and type constructors in the system should have names beginning with a capital letter, since they are global symbols. It may make sense to expand the use of "ref" types. Thus, a list iterator would actually deal in "ref" types. Similar for a table type. Thus, if you somehow have "Table(String, TableEntry) myTable", then you should be able to do "Lookup(myTable, "fred").fieldName := value", and this would work because the generic "Lookup" returns type "ref TableEntry". Something that scares me about this sort of thing, is that it is so close to pointers that it may not be possible for reference counting to work efficiently with the things you "ref". E.g. to be safe, it looks like "Lookup" can't return a "ref" to the field of the table struct that is the value part, but rather has to return a copy of the table entry's "ref" to that independent entity. I.e., we can't have some variant of struct TableSlot { string tab_key; TableEntry tab_key; } as the resulting type, but rather must have struct TableSlot { string tab_key; ref TableEntry tab_entry; } instead. Similarly, how do "ref" values work safely? Do they in fact pad the thing they reference with another field that is the reference count? How does this really work? How would it mix in with reference counts on class objects? Are the two the same thing, i.e. anything that a "ref" variable references must be a structure that has a reference count as the first field, and is dynamically allocated? What happens with class inheritance? With single inheritance, perhaps it is simply that the parent class, complete with its reference count field, is just the first field of the struct that represents the child class, and the compiler knows that in this case it doesn't need to add an extra reference count field. So, declaring things using the system-level language, what "Table(string, TableEntry)" yields is something like: type TableEntryRef = struct { UInt ter_refCount; TableEntry ter_te; }; type TableSlot = struct { UInt ts_hash; /* Probably not needed. */ string ts_key; *TableEntryRef ts_entry; }; type TableBody = struct { UInt tb_spaceCount; UInt tb_usedCount; *TableSlot tb_slots; }; What am I missing? :-) It certainly looks here like "Lookup", "Enter", etc. can be generic, in that the same bytecode sequence works for any types. Note that the "key" types/values must all be the same size. So, that would somehow have to be a constraint on the first "parameter" to "Table". Hmm. Perhaps "Table" does have a body. What that body does is check the arguments, thus producing compilation errors. Wierd, but it might work. Grrr. If I do this for "ref" types, then "ref" types don't work as parameters to functions that operate on arbitrary types. Sigh. It looks like two different concepts are needed - one that is used for those parameter situations in order to allow things like assignment operators, and one for these situations that require a reference count. We don't want reference counts in the former, otherwise we have to treat specially any variable that is "ref"-ed in the body of the routine. Is that so bad? Dunno. Certainly any such reference count has to be initialized to '1', so that the second type of "ref" handling doesn't try to free it! Assuming all of that, lets try to write Table. Note that I've made this one have a fixed number of entries, so that I could skip stuff relating to dynamic expansion. /* Externally defined things that Table references: */ /* These types are predefined in the system somewhere. */ type TypeError; type TypeGeneric; type Type; type UInt; type Bool; type String; type GenericProc; type TypePackage; /* This is a generic procedure description, that is a description of what to match in the set of methods that a type implements. "Implements" matches these things. Where it sees Type "TypeGeneric" in such a signature, it must see its "t" parameter as the corresponding parameter of the "UIntHash" routine that must be implemented by the type being checked. */ type UIntHash = proc(TypeGeneric v)UInt; type Equals = proc(TypeGeneric v1, v2)Bool: /* These routines are part of the Type package or something. They are all pretty trivial, I expect - they simply return or modify internal data structures of the Type internal variant structure. */ proc Implements(Type t; GenericProc requirement)Bool; proc SizeOf(Type t)UInt; proc TypeGenericNew(UInt byteSize; TypePackage tp)Type; proc TypeGenericAddTypeParam(Type newType, paramType)void; proc FindMethod(Type t; GenericProc requirement)Proc; /* Table itself: */ type package Table(Type keyType, entryType; UInt entryCount) { type EntryStatus = enum { ts_empty, ts_deleted; ts_used; }; type TableEntry = struct { keyType like String te_key; Bool te_status; ref entryType te_value; }; type TableBody = struct { UInt tb_entryCount; [entryCount] TableEntry tb_entries; }; proc check(Type keyType, entryType)Type: Type t; if not Implements(keyType, Equals) then error("Table key type must implement Equals"); return TypeError; fi; if not Implements(keyType, UIntHash) then error("Table key type must implement UIntHash"); return TypeError; fi; if SizeOf(keyType) > SizeOf(String) then error("Table key type must be no larger than String"); return TypeError; fi; t := TypeNewCompound(SizeOf(TableBody), Table); TypeCompoundAddParam(keyType); TypeCompoundAddParam(entryType); return t; corp; proc new(ref Table tb)void: tb.tb_entryCount := entryCount; for UInt i from 0 upto entryCount - 1 do tb.tb_entries[i].te_status = ts_empty; od; corp; public Exception TableOverflow; public proc Enter(Table tb; keyType key; ref entryType value)void: UInt startHash, hash, deletedIndex; Bool foundDeleted; startHash = FindMethod(keyType, UIntHash)(key) % tb.tb_entryCount; foundDeleted := false; hash := startHash; while tb.tb_entries[hash].te_status ~= ts_empty do if not foundEmpty and tb.tb_entries[hash].te_status = ts_deleted then foundDeleted := true; deletedIndex := hash; fi; if tb.tb_entries[hash].te_status = ts_used and FindMethod(keyType, Equals)(tb.tb_entries[hash].te_value, key) then /* Replace existing entry. */ tb.tb_entries[hash].te_value := value; return; fi; hash := (hash + 1) % tb.tb_entryCount; if hash = startHash and not foundDeleted then raise TableOverFlow; return; /* Probably not needed. */ fi; od; /* Insert new entry. */ if foundDeleted then hash := deletedIndex; fi; tb.tb_entries[hash].te_value := value; tb.tb_entries[hash].te_status := ts_used; corp; public proc Lookup(Table tb; keyType key)ref entryType: /* Put some of the above into a subroutine and share it here. */ corp; }; This hurts my head. So much for being a simple language. I think we want this type of thing to be in a language level that is sort of beside the other ones, so that most programmers don't have to deal with it. There is a pretty twisted combination of language features and run-time system support routines here. I'm not at all sure of some of the semantics, either. For example, the declaration of TableBody, containing an array whose size is one of the parameters to the type, is strange. Does the system have to actually do such a declaration, during the processing of a declaration involving the use of Table, in order to produce a Type structure that can be used in the SizeOf call in 'check'? I think that is doable, in that TableBody is stored internally as a parameterized type (recognized by this level of the language, based on the use of a type parameter as the array size). Its pretty wierd, however. What exactly do the assignment statements of 'entryType' values in "Enter" do? If I have a pair of "ref X" variables and I assign from one to the other, is that copying the ref or copying the value? If it is copying the value, then this isn't a simple assignment statement, since it is a copy of a value of unknown size. Does that imply we need a "Copy" method on the type so that we can do this? If so, then the simple assignment as used here could copy the "ref" values, doing the right thing with reference counts. The efficiency of the resulting table isn't very good. The Enter and Lookup routines have to do run-time FindMethod calls. Ick. Hmm. It might be possible to do those lookups in the 'check' routine (but where would we store the results?) or the 'new' routine, and cache the function pointers in the TableBody. What would be the type of such variables? Generic proc pointers parameterized by keyType and entryType as needed? If we are sharing the code for Enter and Lookup among all uses of the Table type, then we don't have a lot of choice. One of the points here was to not require new generated code for each table type. Another problem I have is that it is the code in the type package that is doing the type checking, and not the compiler, essentially. That bothers me a bit, but is probably the way it has to be. Hmm. It almost seems like Enter and Lookup shouldn't use the Table parameter types at all, since they are run-time functions, as opposed to 'check', which is a compile-time function. Hmm. 'new' is a run-time function. What are the rules for what runs when? So far, only 'check' needs to run at compile time. Lets try a variant with some of these things done. Aha! The function pointers to the needed methods are part of the type, not part of each value of the type. Hence what we want is to be able to attach some type-implementation-specific storage to the structure created to represent the type. type package Table(Type keyType, entryType; UInt entryCount) { type TableMethods = struct { proc(keyType value)UInt tm_UIntHash; proc(keyType v1, v2)Bool tm_Equals; }; type EntryStatus = enum { ts_empty, ts_deleted; ts_used; }; type TableEntry = struct { keyType like String te_key; Bool te_status; ref entryType te_value; }; type TableBody = struct { Type tb_tableType; UInt tb_entryCount; [entryCount] TableEntry tb_entries; }; public proc check(Type keyType, entryType)Type: *TableMethods tm; Type t; if SizeOf(keyType) > SizeOf(String) then error("Table key type must be no larger than String"); return TypeError; fi; tm := new(TableMethods); tm*.tm_UIntHash := FindMethod(keyType, UIntHash); if tm*.tm_UIntHash = nil then error("Table key type must implement UIntHash"); return TypeError; fi; tm*.tm_Equals := FindMethod(keyType, Equals); if tm*.tm_Equals = nil then error("Table key type must implement Equals"); return TypeError; fi; t := TypeNewGeneric(SizeOf(TableBody), Table); TypeGenericAttachPrivate(t, tm); TypeGenericAddTypeParam(t, keyType); TypeGenericAddParam(t, entryType); t corp; public proc new(Type tableType; ref Table tb; UInt entryCount)void: tb.tb_tableType := tableType; tb.tb_entryCount := entryCount; for UInt i from 0 upto entryCount - 1 do tb.tb_entries[i].te_status = ts_empty; od; corp; proc find(*TableMethods tm; *Table tb; keyType key)*TableEntry: *TableEntry te; UInt startHash, hash, deletedIndex; Bool foundDeleted; startHash = tm*.tm_UIntHash(key) % tb.tb_entryCount; foundDeleted := false; hash := startHash; te := &tb*.tb_entries[hash]; while te*.te_status ~= ts_empty do if not foundEmpty and te*.te_status = ts_deleted then foundDeleted := true; deletedIndex := hash; fi; if te*.te_status = ts_used and tm*.tm_Equals(te*.te_value, key) then return te; fi; if hash == tb*.tb_entryCount then hash := 0; te = &tb*.tb_entries[0]; else hash := hash + 1; ++te; fi; if hash = startHash and not foundDeleted then /* Table full. */ return nil; fi; od; if foundDeleted then hash := deletedIndex; fi; &tb*.tb_entries[hash] corp; public proc TableEnter(ref Table tb; keyType key; ref entryType value)bool: *TableMethods tm; *TableEntry te; tm := TypeCompoundGetPrivate(tb.tb_tableType); te := find(tm, tb, key); if te = nil then return false; fi; te*.te_value := value; te*.te_status := ts_used; true corp; public proc TableLookup(Table tb; keyType key)ref entryType: *TableMethods tm; *TableEntry te; tm := TypeCompoundGetPrivate(tb.tb_tableType); te := find(tm, tb, key); if te = nil then nil else te*.te_value fi corp; }; /* An iterable collection type T1 of elements T2 must export: type iterator; proc iteratorInit(*iterator it; *T theList)void; proc iteratorTerm(*iterator it)void: proc iteratorNext(*iterator it)bool; proc iteratorValue(*iterator it)ref T2; */ type package List(type valueType) { type listElement = struct { *listElement le_next; valueType le_value; }; type listHeader = struct { *listElement lh_head; }; public type iterator = struct { *listHeader it_lh; **listElement it_current; }; public proc check(type valueType)type: Type t; t := TypeNewGeneric(sizeof(listElement), List); TypeGenericAddTypeParam(t, valueType); t corp; public proc new(Type listHeader; *List lh)void: lh*.first := nil; corp; public proc iteratorInit(*iterator it; *listHeader lh)void: it*.it_lh := lh; it*.it_current := &lh*.lh_head; corp; public proc iteratorTerm(*iterator it)void: it*.it_lh := nil; it*.it_current := nil; corp; public proc iteratorNext(*iterator it)Bool: it*.it_current := &it*.it_current**.le_next; it*.it_current* ~= nil corp; public proc iteratorValue(*iterator it)ref listValue: it*.it_current**.le_value corp; public proc ListDeleteAndNext(*iterator it)**listElement: *listElement next; next := it*.it_current*; it*.it_current* := next*.le_next; it*.it_current corp; }; type intList = list(int); intList theList; theList := ...; for int i in theList do doSomething(i); if i = 0 then ListDeleteAndNext(); fi; od; translates to: type intList = list(int); intList thelist; theList := ...; intList.iterator it; intList.iteratorInit(&it, &theList); while intList.iteratorNext(&it) do int i; i := intList.iteratorValue(&it); doSomething(i); if i = 0 then ListDeleteAndNext(&it); fi; od; intList.iteratorTerm(&it); 020428/Sunday Want to use 'ref' in the above example, not pointers. Don't really want explicit 'ref', though - that's not my desire for what 'ref' is. So, we want the list elements to be garbage collected entities. We don't want to have to make a class for them, however. So, that leads to the suggestion that a 'struct' is like a traditional C/Draco struct. We use the reserved word 'record' with the same concept, but when you declare a variable, field, parameter, etc. of a record type, you are actually getting a pointer to such a record. Record types have the 'new' operator to allocate a new one, with all of its fields initialized. Struct types don't have that. Hmm. If struct types are to be usable at all in the application language, then they need to be initialized. However, so do all variables used in that language, so perhaps its no big deal. However, simple variables can be initialized explicitly in their declarations - that's much ickyer with struct variables. What about arrays? How are they initialized in the application language? I guess in that language, all new entities are initialized. Might similarly need an array distinction. Fixed dimension arrays are akin to structs, but dynamically sized arrays (as in Java) are akin to records. Language is growing, and the variants are diverging! Note that the fact that the above 'list' type has ListDeleteAndNext does not directly relate to it being an iterable type - that is a separate feature of this particular list package. Try redoing the list package with records and a few ref parameters, but no explicit pointers. If can do it, then that generic type could also be done in the application language, and the deletion operation relying on garbage collection is OK. I think we might want the garbage collection to not work on pointers, since they are untrustable. 020429/Monday Random thoughts on GUI: - need a "draggable line", which can be used for those title bars, etc. where you can resize the columns (same for rows). - components may need several drawing implementations, size requests: - preferred size: would like this to draw properly - minimum size: below this, don't bother call drawing routines - maybe can use same drawing method for all, but it would be passed a size that it is drawing in. That might be the real size of the displayed space. It might be the preferred size in a scrollable display. It might be a very much smaller space where the component should draw its top-left corner only. For example, if a spread-sheet cell isn't big enough, the cell contents should do the best it can within that space. Random thoughts on a spreadsheet: - could make multi-dimensional, but no true need, I expect. Allowing only 1 dimension seems pointless - extra code for little benefit, since user could just a one-row or one-column 2-D spreadsheet. Perhaps menu for dimensionality contains "2", "3" and a numeric input box? Only if menus stay down. - each cell would have contents that is a variant record. One variant is a direct value. Another is an independent function to compute the contents. Another could be a function and a specification of which other cells (others in row/column, selected range of row/column) to accumulate the function over. Could be combos. So, some of these ought to be functions provided by the spread-sheet application, which apply a supplied function of a specified range. Random language thoughts: - the old language library showed that you cannot be type safe with things like token values if the set of tokens can change. Have to do something else. Token kinds would work with a variant struct. - likely want to define all reserved words from all languages, so that if a piece of code has to move languages, it can't have conflicts of identifiers with reserved words. - one thought (a bit wierd) - encode 1 - 4 characters of the operator tokens directly into a 32 bit value. Then languages can define whatever operator tokens they want. Any reason for this? Look through the language library stuff again before diving into this. Latest version of the List package, allowing ListDeleteAndNext: type package List(Type valueType) { type listElement = record { listElement le_next; valueType le_value; }; external = listElement; public type iterator = struct { *listElement it_current; }; public proc check(Type valueType)Type: Type t; t := TypeNewGeneric(sizeof(listElement), List); TypeGenericAddTypeParam(t, valueType); t corp; public proc new(ref external le)void: le := nil; corp; public proc iteratorInit(ref iterator it; *external pTheList)Bool: it.it_current := &pTheList; pTheList* ~= nil corp; public proc iteratorTerm(ref iterator it)void: it.it_current := nil; corp; public proc iteratorNext(ref iterator it)Bool: it.it_current := &it.it_current*.le_next; it.it_current* ~= nil corp; public proc iteratorValue(ref iterator it)valueType: it.it_current*.le_value corp; public proc ListDeleteAndNext(ref iterator it)void: listElement next; next := it.it_current*; it.it_current* := next.le_next; corp; }; type intList = list(int); intList theList; theList := ...; for int i in theList do doSomething(i); if i = 0 then ListDeleteAndNext(); fi; od; translates to: type intList = list(int); intList thelist; theList := ...; intList.iterator it; if intList.iteratorInit(it, &theList) then while int i; i := intList.iteratorValue(it); doSomething(i); if i = 0 then ListDeleteAndNext(it); fi; intList.iteratorNext(it) do od; fi; intList.iteratorTerm(it); And a version with no pointers (and so can be written in a language other than the systems one), but that does not allow ListDeleteAndNext: type package List(Type valueType) { type listElement = record { listElement le_next; valueType le_value; }; external = listElement; public type iterator = struct { listElement it_current; }; public proc check(Type valueType)Type: Type t; t := TypeNewGeneric(sizeof(listElement), List); TypeGenericAddParam(t, valueType); t corp; public proc new(ref external le)void: le := nil; corp; public proc iterate(ref iterator it; external theList)Bool: it.it_current := theList; theList ~= nil corp; public proc iteratorTerm(ref iterator it)void: it.it_current := nil; corp; public proc iteratorNext(ref iterator it)Bool: it.it_current := it.it_current.le_next; it.it_current ~= nil corp; public proc iteratorValue(ref iterator it)valueType: it.it_current.le_value corp; }; type sintList = List(sint); sintList theList; theList := ...; for sint i in iterate(theList) do doSomething(i); od; translates to: type sintList = List(sint); sintList thelist; theList := ...; sintList.iterator it; if sintList.iterate(it, theList) then while sint i; i := sintList.iteratorValue(it); doSomething(i); sintList.iteratorNext(it) do od; fi; sintList.iteratorTerm(it); By specifying the name 'iterate' explicitly in the above, we are allowing a type to have multiple iterators. However, they must all use the same "iterator" type for storing the state. Shouldn't be a big problem. For example, a 'Chain' type could export "iterateForward" and "iterateBackward". Aha! It was the presence of the list header that allowed the delete operation to work. The list could have been provided as a value, and so even using an address of what was passed wouldn't necessarily work. Also, having the list passed as a ref argument wouldn't then work. So, put back the list header structure. It has fields lh_first and lh_current. If lh_current is nil, then we are referring to the first element of the list, else we are referring to the one after where lh_current points. That allows the deletion to work. All without pointers or explicit ref's on anything other than the iterator structure. I think. Try it: 020431/Tuesday: type package List(Type valueType) { type listElement = record { listElement le_next; valueType le_value; }; type listHeader = record { listElement lh_head; }; external = listHeader; public type iterator = struct { listHeader it_header; listElement it_current; }; public proc check(Type valueType)Type: Type t; t := TypeNewGeneric(sizeof(listElement), List); TypeGenericAddParam(t, valueType); t corp; public proc new(ref external lh)void: lh.lh_head := nil; corp; public proc iterate(ref iterator it; external theList)Bool: /* it_current essentially always points one back in the list from what is the true "current" element. This allows us to delete the "current" element, and to insert before it. So, when the "current" is the first element in the list, it_current is nil, and we use it_header. */ it.it_header := theList; it.it_current := nil; theList.lh_head ~= nil corp; public proc iteratorTerm(ref iterator it)void: it.it_header := nil; it.it_current := nil; corp; public proc iteratorNext(ref iterator it)Bool: if it.it_current = nil then it.it_current := it.it_header.lh_head; else it.it_current := it.it_current.le_next; fi; it.it_current.le_next ~= nil corp; public proc iteratorValue(ref iterator it)valueType: if it.it_current = nil then it.it_header.lh_head else it.it_current.le_next fi.le_value corp; proc deleteAndNext(ref iterator it)void: if it.it_current = nil then it.it_header.lh_head := it.it_header.lh_head.le_next; else it.it_current.le_next := it.it_current.le_next.le_next; fi; corp; proc insertBefore(ref iterator it; valueType newValue)void: listElement le; le := listElement.new(nil, newValue); if it.it_current = nil then le.le_next := it.it_header.lh_head; it.it_header.lh_head := le; else le.le_next := it.it_current.le_next; it.it_current.le_next := le; fi; corp; proc insertAfter(ref iterator it; valueType newValue)void: listElement le; le := listElement.new(nil, newValue); if it.it_current = nil then le.le_next := it.it_header.lh_head.le_next; it.it_header.lh_head.le_next := le; else le.le_next := it.it_current.le_next.le_next; it.it_current.le_next.le_next := le; fi; corp; }; type sintList = List(sint); sintList theList; theList := ...; for sint i in iterate(theList) do doSomething(i); if i = 0 then deleteAndNext(); elif i = 1 then insertBefore(0); elif i = 1000 then insertAfter(1001); fi; od; translates to: type sintList = List(sint); sintList thelist; theList := ...; sintList.iterator it; if sintList.iterate(it, theList) then while sint i; i := sintList.iteratorValue(it); doSomething(i); if i = 0 then sintList.deleteAndNext(it); elif i = 1 then sintList.insertBefore(it, 0); elif i = 1000 then sintList.insertAfter(it, 1001); fi; sintList.iteratorNext(it) do od; fi; sintList.iteratorTerm(it); Commentary on this: The 'ref' type modifier can only be used on parameters. It is used to allow a proc to modify the so-passed parameter. There is no way, other than parameter passing, to produce a 'ref' value. 'ref's in and of themselves are not reference counted or garbage collected. Procs which take 'ref' parameters may only be passed individual variables, they may not be passed fields in struct or records, or array elements. This inconvenience is done to prevent the need for keeping track of refs that point into the middle of things. Allowing such refs without keeping track of them in regards to the full entity they are pointing within would allow the refs to be made invalid when applied to a value inside a dynamic entity that could be garbage collected away. 'record' types are akin to the records in AlgolW. All record values are internally pointed to. Hence assigning record values is assigning those internal pointers. Record types are reference-counted and garbage collected. The fields of a record can be modified by anyone who has a non read-only handle to the record. Record types can thus contain references to the same type. Manipulation of lists, etc. constructed from record types is done the same as it was in AlgolW - since you cannot have a pointer to a pointer, you have to use 'if's all over the place to special case empty lists. 'struct' types are the same as traditional struct types. A struct variable or field appears directly in its context - there is no indirection involved. Thus, struct types cannot contain direct references to the same type - the reference must be indirect via a pointer declaration, just like in C, Draco, etc. In this List example, type "iterator" can be a simple struct since it cannot be referenced outside of the scope it is declared in, and the iterator class procs have no need to keep a pointer to it - they are all passed it, and could pass it to other such procs if needed. Type "listElement" needs to be a record, however, so that it can be garbage collected when deleteAndNext is used. Similarly, type "listHeader" must be a record so that it can be remembered in the "iterator" and modified as needed. If we allowed "ref" struct fields, then it could be a struct, but I choose not to do that. Variant records. Example: /* System declaration we make use of: */ type UInt = record { uint theUint; }; /* Stuff here could be part of the system type facility: */ type Type = forward; /* forward declaration */ type ArrayDesc = record { Type ad_elementType; /* type of the array elements */ uint ad_dimCount; /* how many dimensions the array has */ [] uint ad_dimensions; /* array of the dimensions (0 origin) */ }; type GenericParam = oneof { case gp_kind in incase gpk_type: Type gp_type; incase gpk_numeric: UInt gp_numeric; esac; }; type GenericParamList = record { GenericParamList gpl_next; GenericParam gpl_param; }; type GenericDesc = record { TypePackage gd_package; /* package that implements the type */ GenericParamList gd_params; /* list of parameters to the package */ }; type TypeData = oneof { case td_kind in incase tdk_simple: UInt td_code; incase tdk_array: ArrayDesc td_array; incase ... ... incase tdk_generic: GenericDesc td_generic; esac; }; type Type = record { uint t_byteLength; uint t_alignment; TypeData t_data; }; ... Type t; t := Type.new(4, 4, TypeData.tk_simple(UInt.new(TYPE_UINT))); ... /* Construct an initial generic type representation with no parameters. */ proc TypeGenericNew(UInt byteSize; TypePackage tp)Type: Type.new(byteSize, 8, TypeData.tk_generic(GenericDesc.new(tp, nil))); corp; /* Add a type parameter to a pre-existing generic type representation. */ proc TypeGenericAddTypeParam(Type_t t, paramType)void; TypeData_t td := t.t_data; case td.td_kind in incase tdk_generic: GenericDesc_t gd := td.td_generic; GenericParamList_t gplNew := GenericParamList_t.new(nil, GenericParam_t.gpk_type(paramType)); if gd.gd_params = nil then gd.gd_params := gplNew; else GenericParamList_t gplScan := gd.gd_params; while gplScan.gpl_next ~= nil do gplScan := gplScan.gpl_next; od; gplScan.gpl_next := gplNew; fi; default: System.RunError("Non generic type passed to TypeGenericAddTypeParam"); esac; corp; 020501/Wednesday Grr. When do I use "type" and when do I use "Type"? It may not matter in many cases, but I'd like to have some idea, so that I can at least be consistent. Given that there will have to be a fair amount of pre- definition in the system, that isn't done in the language itself, perhaps its possible to use "Type" darn near everywhere. The only thing so far that *has* to be "type" is the "type package" syntax, and the declaration of "Type" itself. Hmm. Just started typing in "Type.a". If the proper way to talk about "Type" is as "Type.Type" (is it allowable to use the same symbol for a package name as for a public symbol exported from that package? Likely not!), then using "type" is much more convenient. Ok, lets make the package be "Types". For some reason, I haven't been appending "_t" to the type names I've been using here. I just tried adding them to "TypeGenericAddTypeParam" (which I think should be "GenericAddTypeParam" in the "Type" type package), and I'm not convinced it helps a lot. Makes the lines longer! I/O? Most of the system is GUI based, but there will undoubtedly be shell windows, and so there should be easy ways to do simple non-GUI I/O. *Perhaps* that will prove to be unnecessary - time will tell. If any is needed, my current thinking is to go with a system like Draco's. 020507/Tuesday It would be cleaner in some sense to use the "List" generic type for the lists in the Types.a package. There is a chicken and egg problem there, but there are probably ways around that. However, is a List type a valid type for a oneof? Should it be? Is there a way to look through the d_generic value to look behind it? 020510/Friday Almost gave up on the generic concept as it has been shaping up. One big problem is how to safely get pointers to the functions needed by the generic code stored away somewhere. There is just *no* way to do that in a type-safe manner, since there isn't a way to safely declare a type that is a proc type with an unknown number of parameters. I think the answer is to just have a type that represents that. In the latest example, I had called it Action, then changed it to Proc on Don's suggestion. So, let's have a system-provided type, named Proc, which can only be assigned a proc pointer, *along with its type*. Assigning from a Proc to a proc type similarly requires the type of the proc type we are assigning to. These routines are part of the builtin-type Proc, but must be written in the implementation language because they are not type-safe. To generalize on how the above might be declared, and to arrange that functions in a generic proc be passed the type of their arguements, add the new proc parameter syntax that specifies that the type of some previous argument must be passed. *Only the compiler* can provide (or generate code to provide) the actual argument to such a parameter. 020511/Saturday The language family, operating system, GUI, etc is called Zed. The implementation language is called Zil. The application language is called Zap. The common language is Zcl, pronounced "zikel". The operating system could be Zos. The user interface could be Zin. 020515/Wednesday Basically everything should be equally operable with the keyboard or the mouse. For example, if we have the folder-style stack of tabs, then you should be able to use arrow keys or your default forward/backward (likely ^F and ^B for emacs-compatible) keys to move among the tabs. Within such a tab frame (get there with down arrow?), the usual Windows TAB and Back-TAB should work, but so should the forward/backward, and directional keys in general. Also, the tabs should all have labels on the tabs, and selected letters in the labels should get you directly to that tab from the tab-set context. Similarly, within a frame/page (whatever they are called - the things with the individual tabs on top) you should be able to type a key corresponding to a highlighted letter in the the title of an entity on the page, and focus goes there, allowing typed input, or going to a set of radio buttons, which you can cycle activation through, or going to a set of non-exclusive buttons, where space/return and/or backspace toggle their settings. If can get proper raw keycodes, should be able to make ^H, BACKSPACE and DEL all be different. So, Gnu lovers can use ^H for help, but still have working BACKSPACE and DEL keys. Suggestion: finish off the closest I can come in Z to the Mapping package. Then translate the generic one to C, as closely as possible. Should be able to use it for the language symbol tables, just like in the real thing. Come up with a file listing the names of the system packages, and descriptions of what the package is about/for. For a C starter version, can perhaps have the package name be a large C struct, fully declared in a header file. Then the notation . will work as is, thus reducing later translation effort. 020815/Friday In the stuff I'm doing so far, I've used Package_Export to do this. The other way wouldn't really work as well, since I would need, for example, pointers to functions, which causes some inefficiency. From cg Thu Aug 15 20:51:20 2002 Return-Path: Received: (from cg@localhost) by ami-cg.GraySage.COM (8.9.3/8.9.3) id UAA28198; Thu, 15 Aug 2002 20:51:20 -0600 Date: Thu, 15 Aug 2002 20:51:20 -0600 From: Chris Gray Message-Id: <200208160251.UAA28198@ami-cg.GraySage.COM> To: djr@nk.ca Subject: conundrum Status: R [By the way, the BBQ, etc. is the Saturday evening of the train show weekend, not the Sunday evening.] How to do circular type references? In C you can do "struct blah", and the compiler is happy without having seen it. In Draco you can do something like "type blah;" and the compiler is happy to deal with pointers to it. In Zed, with a package, you can predeclare a type with something like "type blah = record", so that the system knows what kind of type it will be. At least, you can do that for struct, union, record and oneof. I don't know if anything else matters, since you could always wrap those in a one-element struct. If someone is developing an interdependent pair of packages online, then you just start with the pre-declared type in one package, then reference it from the other package. When the 2nd package is done, go back to the first, and you can now flesh out the predeclared type based on the other fleshed-out package. However, what about off-line development, where both package sources are coming from (likely separate) source files (or whatever)? I've bumped into this doing some compiler stuff - proc headers need symbol tables to represent scopes, but symbol table entries can be function parameters or local variables, which should logically be defined in the "proc" package. Any thoughts? -cg 020815/Friday In answer to the above email, what I've ended up doing is "predeclaring" types, as in: external type Package.Type_t; I'm taking this to mean that the type is defined for the current package compilation, but nothing is known about it, other than that it is a referenceable type. I.e. record or oneof. I've had thoughts in the past that I need to have inter-package references done in some kind of abstract way, and so this has now become a firm need. I'm not sure I'll get the final answer (requiring version numbers, originator numbers, proof-of-originator, etc.) for quite some time. However, for now, I'll do whatever works out here. Perhaps just a list of symbols representing the path to the final whatever from the root of the system tree. All such references must be resolved before the package can be used. Oops. I've had to add the ability to reference an external enum type. That means that the external references have to include what kind of thing the reference is to, its not just to a type. OK, so you can now do: external type Package.Type_t = record; external type Package.Type_t = oneof; external type Package.type_t = enum; This hence requires that all enum types are the same size. Sigh. 020822/Thursday I've done some work over the last few days, but its been mostly programming. I've changed a bit the way Array types are built, and the way Oneof types are built. The previous way for oneof's didn't even work, since it required the external guy to use the OneofDesc_t constructor! Just updated RefNotes with thoughts on oneof's and ref's. One of the next things to do is to make the system as a whole use the ref-counted "STRING" type everywhere. 020902 Monday Woke up at 6:00, with Zed going through my mind. Eventually got back to sleep. One thing that sticks out, however, is the thought that I might in fact end up with something a lot like Java's constant table. I don't remember any details of that (other than that it seemed complicated and I didn't like it!), but the reason for it could well relate to strong typing. My current kluge for execution structures and bytecode requires typecasts all over, and uses pointers to stuff. I can clean it up a lot by adding more kinds to ExecKind_t, which I will be doing. That way, the entire Exec_t will end up as a oneof, which is as desired, and assumed. However, what about references to things from bytecode? I can't embed a pointer in bytecode in any type-safe way. Hence, embed using indices into a constant table, just like Java did. Hence I might end up with something very similar. Oh well. I wonder if I switched to actual Java bytecode I could take advantage of Java implementations somehow? I doubt it, since my semantics are so different. A thought about persistence of things - its much more efficent to persist an array or matrix than a linked list. Remember that the MUD system used an array representation for "list" types, and that worked fairly well. This leads to the thought that eventually, I might end up with parallel representations of some things. Maybe. Perhaps just do some of the other types like I did the array type - build a temporary linked list of stuff, and then do a final step of conversion to the more efficient array. Might want a separate representation in some cases anyway, since there are going to be temporary values at "run time" (whatever that is), that don't make sense to persist. An example would be having a field in the Proc.ProcBody_t that is the current local variable offset - it goes up and down as scopes with variables are entered and left. Been idly thinking about Package references and paths a bit. I think I will need the concepts of absolute and relative path. Absolute paths work for system installed things. Relative paths work for user stuff, and likely applications as well. I think I'm going to have to do something about paths real soon. What do I start an absolute path with? Initial candidates are '.' and ':'. If '.' works that doesn't need another token, and is parallel to UNIX FS paths. So, e.g. "IntToString" becomes ".BI.IntToString". How will that look? Things get uglier - there will be nested packages for GUI things, so we could end up with a lot of very ugly code containing e.g. ".GUI.Window.Create(....)". On the subject of nested packages... I will need them. There will be projects that consist of many packages of code, and I need to be able to contain them within something. Paths are pointless without them! So, how do they look? I've thought of using a "path" construct that says where the package fits relative to other things. Alternatively, just use a path in the package name in its declaration. Cleaner, if it works. It is likely very wise to keep the system such that it can be compiled from text. I.e. not end up with just the database/filesystem, and no way to start from scratch. So, that means lots of source "files" and package bodies. Also lots of inter-package references. Rather than duplicate such things, it might make sense to either have some sort of pseudo-header files for packages, or to have a #include facility, so that one common set of declarations can be included by all packages that need to reference some other package. If I do the semantics of such declarations right, which I think is where the last day or so has been leading me, then that should be sufficient. Perhaps rather than calling it something like #include, how about "use path-to-used-package". For online development, you don't have to do that explicitly in what you type, since the system can figure it out from the paths you use to things. However, when the system emits a package "source", it should put them in. For the hosted environment like on Linux, there are actual files referenced by those 'use' lines. Are there for online development? The system can certainly produce one for a given package. How do you develop a set of mutually referencing packages? I *don't* want "nested includes". I think I will just put this off until I bump into it. Perhaps one of the things I should do today is go look at that site for the "BRix/Crush" system/language. ... Well, it started out looking good. There are definite similarities between it and Z. However, as I got further in reading, it became clear that he hasn't really used his language for anything yet, and it isn't close to being able to be used for anything. Also, all he really has is a bunch of X86 assembler code. Gack - there are lots of those out there. I'm doing it the other way round (naturally!). Looking back at the paths notes above. Do I need to be able to do the equivalent of ".." in paths? If I have a top-level package for a project, called "Top", and it contains packages "Sub1" and "Sub2", how does the definition of "Sub1" reference things in "Sub2"? Are types defined in "Top" fully known in "Sub1" and "Sub2", or vice versa? I don't think you get both - there has to be *some* order in which the detail processing is done. I'd say that the top level package is "compiled" first, and thus any types it has are fully available to packages it contains. Is it all types or just "public" ones? As a first guess, I'd say that all types are known by contained packages, but only "public" ones are exported to containing and sibling packages. 020903/Tuesday A problem I had thought about with the MUD system was that of how to "draw a box around" a chunk of the database in order to output it as source. Some ideas like ownership, a grouping key on each item, etc. With Zed, I think the clue is in both the type of the entity, and special handlers for some types. For example, in order to linearize a package, you start with something, like the package structure, and start turning it into a byte-stream. You first send a linearized form of the types you need to deal with. That can be done by having a linearizing function attached to the Types code (and the reverse for the other end), so that you can at least start. When a package is inserted, it needs to find things it wants to reference. It will do that by the Ref paths that are stored in it. Thus, the trick will be to have linearizers for ref things, so that they linearize the Ref, which is just a bunch of strings, rather than trying to follow the pointer and linearize what the Ref references. This could be keyed on the "ref" ExecKind_t that I've been thinking about. Printing is similar (ToText routines) - they don't follow Ref paths - they just display them. Might not be needed really, since it is pretty clear what is part of a package and what isn't. Probably can't hurt to allow for custom ToText and ToBytes handlers. May well be necessary to avoid special code in the system core. Lots of details to be worked out, but I think its time I dove in started building Ref paths. Add the concept of a Path as something that can be specified. They are shorthands for actual package paths. E.g. path PWindow = .GUI.Window; ... PWindow.Create(...) They might eventually get real handy. The semantics are very simple - they are like macros. They just make things look better and be easier to type. A Package should have a pointer to its parent package. This allows one to easily reconstruct the path to it. Perhaps there should be a pair of such pointers, often the same. One points to the parent package in the set of packages that is a larger project. One points to where this package fits in the overall package tree. Thus there could in fact be two instantiations of the same set of packages (??), that only differ in where the main package points for its place-in-tree. I don't think this makes a lot of sense. But, packages should have a parent pointer. 020904/Wednesday If a path starts with '.', it is an absolute path, and you start searching at the "root". If it doesn't, then you start searching at the top-of-this-set-of-packages point. All references within the set of packages are relative to that point. Thus, we don't need the package path equivalent of "..". Do I put a leading '.' in the paths for the defining names of the system packages (like "BI")? It may not matter, so I won't bother for now. Yes it matters. If I don't use the leading '.', then I don't know to enter the new symbols into the root - I would put them into a local root instead. 020908/Sunday Some idle-time thinking about "object oriented programming". This doesn't necessarily relate at all to Zed. If class Child inherits from class Parent, then instances of Child have all of the operations of Parent as well as Child. If Parent declares a method as abstract, then it is essentially not supplying an implementation itself, but requiring any inheritor to provide one. This seems an awful lot like a Java interface, but more obscure. If the Parent does provide an implementation of the method, isn't it overridden by a Child version for instances of Child? Doesn't that break any protection, in that any semantic rules that Parent wishes to impose, based on its methods, is broken, since there is no guarantee that the Parent method is called at all. Even with constructors/destructors, it appears to be the choice of the Child method as to whether or not to call the Parent method. Not sure about that - something tickling my mind about there having to be such a call, but it is the choice of the Child routine as to where. I think it might be clearer for all such methods to make it so that the Parent method is called first (last for a destructor of course), and can *prevent* the call of the Child method. This is a lot like the way I used 'status' values in MUD. In Java, and perhaps C++ as well, one often makes generic routines that handle all sorts of things that provide an interface. When a method is provided by Child for one, then that method is called, with a pointer to a Child object, even if the calling situation only had a Parent pointer. Is that true? I'm unclear, but I think something of that nature goes on. E.g. with GUI operations, the "Draw" call is a Child draw call, and that needs a Child pointer. I believe this is a safe thing to do, and perhaps even a very useful thing to do. What I don't much like are the "casts" which take a Parent pointer and force it to being a Child pointer, with a run-time type check hidden in there. I think if you need to do that, why not pre-declare the Child type, and have a slot in the Parent record for a pointer to the Child record? The whole concept of the kind of inheritance that C++ does, which is structural inheritance, doesn't fit well with what I'm doing in Z, with record and oneof and matrix types. Have to look more at what Java does. Perhaps it actually does nest the structures - I'm guessing so. I'm also not terribly pleased with the whole concept of the descriptor tables that go on with C++ and likely Java. Complex, hidden, poorly understood. Not sure there is an alternative if you want to do "Draw" call above. Ah yes, one problem with the virtual function tables is that they tend to be pretty static. Its virtually impossible to add elements to them, i.e. to add methods to a class that has descendants. In my fully interactive environment, its not clear that that is something I can have. There *have* been things done in C++ that allow more flexibility, but they are built up on top of C++ stuff, and are not simply using the language directly. Probably similar with Java. The thing here is that my vision is that Zed is an interactive programming language - the very nature of programming is fully interactive, working within a live system. I don't want "recompilation" of whole big messes to be necessary to do things like add a method. I *am* aware that sometimes recompilation is somewhat necessary, but I want it minimized. 021121 Thursday Haven't had time to do anything on this stuff for ages. Lego was the big holdup for a while, as was TV. Now I'm working late on our crush until Christmas. Sigh. At work, some folks use some BSD list manipulation macros. In at least 3 cases, they messed up and did other list manipulation things while iterating down those lists. Messup! So, what can be learned from that for the lists in Z? Well, in the old example, I've certainly had the abilities to iterate down the list, and to remove elements from the list (well, at least the current iteration position element). Also, I believe I had the iteration control structure in local storage. Now the question is: can there be more than one iteration of a list at a given time? If the answer is yes, then when an element is deleted, you need to be able to patch up all iterations. So, you would want the iterator constructor/initializer to link them all together so you can find them. Getting messy, but: What if you have a "return" statement in your language? In the case of the iterators, if the inner ones only point to the outer ones, it is likely OK to simply discard the inner one. Getting picky here - this is supposed to be a safe programming language, and there is nothing preventing a ref to the inner from being put into the outer. I think - I find I'm hazy enough on the language that I can't remember! Sigh. Even worse than a return statement, what about exceptions? They go foofing out of many contexts. If you can catch exceptions at random points, its tough to know what is safe to use. If it works, I'd want the language semantics to be that you can't do the sort of thing I'm talking about here. Hmm. Without storing a *pointer* to it in the head of the actual list, there is no way to link together the iterators, since the interator constructor/initializer has no way of knowing that others exist, or of finding them. That implies storing the head of the chain of current iterators in a head node for the list. Ick. But, if you do that, then the "return" or exception cases suddenly make the inner iterator, pointed at by the list head node, invalid. The key is perhaps that you just plain can't permanently take the address of something in the safe language. Hmm. I think the last sentence above is true, and is sort-of the answer here. The functions that use the iterator structure cannot create a permanent pointer to it, since it is only passed as a ref value to them, and you can't save ref values. So, you can't build the chain of iterators. Then, how do you allow element deletion from the list? The answer has to be that there is a list header, and it has an indication of the presense of an iteration, and you can't delete an element if there is one. Perhaps the trick is that the flag is set only when a second iteration is started - if there is only the one iteration, then passing it to the delete-this-element function allows it to be updated properly. I believe that is done in the list example way up above. Ok, I think I can live with this. 021203 Tuesday Some thoughts from a late walk home and in the shower. :-) The system will need some pretty capable searching facilities. For example, the ability to search all text fields or subfields/elements... in the entire (or a portion of) system will be needed. A search menu could include submenus for types of searches. It would be nice if that menu is extended automatically as new search modules are added to the system. Perhaps a search capability is one of the optional things that any compound type can include. Most won't, since its not relevant. So, for example, an image type could include an entire menu of image searching and classification capabilities; a music type could include capabilities there. The basic system could have a numeric type that does simple comparisons, ranges, function evaluations, and combinations thereof. Text searching needs to have patterns. There could be a bunch of pre-cooked patterns for, e.g. whole numbers, floating point numbers, IP addresses, capital letters, vowels, etc. There should be a way to find whole numbers in text, convert them to actual numbers and apply the number classification stuff to them. Searches should be able to run an arbitrary operation on each found item. Pre-canned complex searches should be combinable. Writing a server for the system could be fairly easy. A template could provide the pieces for a simple one-request-one-response server, and be expandible for multiple request types. A server registers a major operation code with the system - messages arriving (messages are of course just a basic system structure which contains the major operation code as one numeric field) with that code are passed to the registered handlers, and thus the server is created with little fuss. 021231 Tuesday On searching for symbols. I think the answer is that there are two concepts that interrelate. One is what I've been calling a "package", which is a set of things that implement a given functionality. Another I'll call a "unit", which is a part of a package. The "unit" corresponds more to a traditional source file, in that it is the set of things that can be displayed in a "text" editor window. Thus, a leaf package will consist of one or more units. In the simplest case there is only one unit, and so the two are pretty much equivalent. In a more complex package, a big package can contain multiple sub-packages, each of which can contain other packages or units. Any package, whether leaf or not, can contain units. A package must contain at least one other package or one unit, else it is an empty package. So, searching for a initial symbol in a non-absolute path starts at the innermost scope of any current proc, working outward, ending with the proc's parameters. If the symbol is not found that way, then the current unit symbol table is searched. From there, the search goes out to the containing package, and continues out to the top-most package. That topmost package will have a position in the main system tree, but searching does not go beyond it. An absolute path searches only in the top system table. The package header in a unit gives the path to that unit within the entire package that it is part of. Thus, it can be a path itself, starting with the name of that outer package. Should a package list the packages and units that comprise it? This might be desireable from a security point of view. Does it matter? 030106 - Monday A note from work. Likely irrelevant. When generating native code, it would be good for blocks of code that are usually skipped over (e.g. the DEBUG macro stuff at work) to be moved down to the end of the routine. That way, the CPU can work better on the assumption that forward branches are usually not taken, which is what I recall happening in the absence of branch hints or a branch taken table. 030108 - Wednesday Had a mental image of the process of programming natively. The screen would show a chunk of code. Areas where the compiler is complaining would be highlighted in some other colour. You could execute your program up until it actually hits an erroneous routine (or perhaps even erroneous code in some function). If you use " := ;" then you are initializing the variable to the value, and can assign to it later. If you use " = ;" then you are doing the same, but no further assignments to the variable are allowed. If "" can be computed at compile time, then it is, and "" is then a constant. 030111 - Saturday Perhaps replace 'ref' parameters with a choice of 'res' or 'valres' parameters. 'res' allows the compiler to know that a variable, passed to such a formal, need not be initialized, and might also warn about any assignment before the call, if the value is not subsequently used. For a 'valres' parameter, it is explicitly not stated whether the value is copied in on entry and/or exit, or whether a pointer is used. So, the compiler could copy in/out small variables and use a pointer for big ones. I don't think this kind of things violates any security needs. Its perhaps better also, since it doesn't give any impression that there need be a pointer inside the subroutine, so no temptation to try to use it as a true pointer value. 030112 - Sunday Put the beginnings of the 'matrix' construct in. It just parses so far. The problem is that I created a new type for the return value, and that is not compatible with the just-declared variable that I am assigning the new dynamic matrix to. So, I either need to compare types (just array, matrix and pointer types, I think), or I need to have a table of known types, so that I can always use the same type pointer for the same types. Which is best? I don't think I need to have other kinds of types in that table, since they aren't supposed to match even if they do happen to look alike. If indeed those are the only kinds of types I need to do that with, then perhaps it is just a list of pointer types, a list of array types and a list of matrix types? The entries need have nothing more in them than a list-next pointer and a pointer to the type descriptor, which I will already have. Hmm. What do I do if the element type ends up being Types_Error - then I really ought to allow them to match. Maybe not worth it - that just gets rid of an extra error message. Also, if there are any type modifiers on the element (or pointed-to type), then the compound type is not compatible, so I don't have to mess with those. Eventually, I'll need something faster than simple linked-list traversal, I think, since the full system might have thousands of types in total. Not clear how many distinct pointer, array or matrix types it will have, however. There is certainly an unbounded number *possible*. Can I use a Mapping somehow? That would be good, if I could figure out what to use for the hash value - it has to be something type-safe. Later: I think I can do a general function that provides an integer from a type. Often it will build up from a hash derived from the strings that are part of the type. This may have other uses than the type lists mentioned above. 030114 - Tuesday Package variables - what do they mean? I think something initialized with '=' at the package level should be essentially a read-only variable, set only once when the package is first "created", and shared by all users of the package. The expression needs to be evaluable at that time, and doesn't need to be what is traditionally a "compile time constant". Variables initialized with ":=" (or not initialized) can perhaps be of two different kinds - shared among all users, or created for each "context" (what's a "context"?!?). The system should preserve for display the original expression that was used to initialize the variable. Same for local vars, of course. Perhaps a "context" is similar to a process in other systems. I'll need something of the sort. For initial work, running hosted on other systems, I think there will be only the one context. This of course is totally ignoring the issue of persistent variables, and variables that override based on where they are in the hierarchy. I think when the system has multiple active contexts, then it would have separate instruction and data spaces. *All* code and *all* data not dynamically allocated would need to be in the spaces of all contexts. This is because all code can find any other code (plus create it or load it from elsewhere of course) and execute it. So, part of making any package runnable would be to "allocate" space for its code and data. Well, allocate space for the instantiations needed on this system - the process of parameterizing more versions needs to allocate more of course. Just what can be referenced from other packages? I'm getting an undefined symbol error for "theUint", the only field in record Base.Uint_t. That's because I'm not actually searching in the Base package's symbol table. The reason I can find the types and functions from that package is because they are being referenced fully with a path. If I don't search in other package's exports tables, then I'll never find such symbols. So, now I have to decide what the rules are for doing this. Have to decide just what "ro" means. Similar to C's "const", I think. Currently, the type normalizing code doesn't handle any 'ro' types, so I can't assign an "ro [] uint" to an "ro [] uint", in general, because the type pointers are different. 030124 - Friday Another way of looking at types, is that strong type checking is a way of enforcing constraints, when coupled with the basic type being 'ro', like Type_t is. Values of the type can *only* be produced by the entry points in the package which exports the type, and so it can guarantee conformance to some constraints. The fact that a given package, and the types it exports, are globally, uniquely identifiable, means that one can rely on the values meeting the constraints. Similarly, a type package (like Mapping), can enforce constraints. The constraints are programmed in the system, but are not expressed in the system - they are documented in interface specs. For example, Mapping can guarantee that the same key is never duplicated. They system can provide UintHash functions for all types. We create them for the simple types, and then. For a 'oneof', take the oneof index and xor it with the value from the active oneof case. (Could fix up my TypeHash routine to work that way.) Etc. Don't need a UintHash on pointer, struct and union types, since those are not likely usable with something like a Mapping anyway. They are low-level only stuff - second class! Define semantics of 'res' and 'valres'. Can optimize some cases. Say that 'res' copies into the destination formal only at the end of the routine. Similarly, 'valres' copies out at the start, and back in at the end. Goal for early implementation - generate code at runtime, using calls to the Type and Exec constructors. Perhaps use a bunch of special Type values as the error returns from some of the Type constructor functions. 030201 - Saturday I'm about to start in on Zed-izing my internal program representations (its currently a mangled version of what I used in AmigaMUD). Thinking about it, it looks like I'll have to include the type of the described subtree in each tree-node, and have special construction functions, perhaps for each kind of tree-node, that verify the type semantics of the node being built. To keep the system safe, I either have to verify the semantics when I build the internal representation, or when I compile that representation into byte-code. Either way, I'll have to know the types. In the old MUD system, I didn't need the types in the tree, since only the compiler could build the tree nodes. When I used the direct tree interpreter (instead of byte-code), it could determine the types from the bottom up, starting with the identifiers and constants that were the leaf nodes. Hmm. That suggests that when I compile a tree into byte-code, I might be able to do the type checks at that time. That would save having to have types stuck into each tree node. I wonder if that will work - it ought to, in just the same way as the tree interpreter did it. Ok - deep breathe - I'll try doing it without types in the nodes. Hmm. That suggests that I may not need consistency checking functions when I build the nodes, since it can all be done when I generate byte-code. Of course, that means that anything that deals with the internal program-representation trees needs to be able to handle wildly invalid ones that someone has built. That will include the pretty-printer and the linearizer (which turns them into a byte-sequence for transmission or storage, just like I did in AmigaMUD). One evil thought: if I make the types not restricted (in the sense that a restricted type is one for which only code in the type's defining package can construct or modify values of the type), then an evil person could modify links in the tree nodes and thus build circular (non-tree) data structures. I don't want the byte-code generator, pretty-printer and linearizer to have to mess around detecting cycles. I wonder if I want to add another variant to the public/ro spectrum of types: public: anyone can construct and modify values of the type ro: only routines in the package defining the type can construct or modify values of the type (if the type is exported, anyone can reference the values and their sub-components) partly-ro: only routines in the package defining the type can modify already-constructed values of the type, but anyone can build new values using the normal language-defined constructors. Given that the last form can be accomplished by simply exporting routines to construct the values, there is no real need for it. It could be convenient, but I don't think its necessary. Perhaps the way to go is to have the exported tree-node constructor routines do as much validity checking as they can, perhaps even returning error strings or somesuch. That keeps most of the checking for invalid stuff in one place, thus saving total code. Maybe. I think I might actually want to do the Proc_t stuff before the Exec_t stuff. That's because the Exec_t stuff will all be validated in the context of a Proc_t, and perhaps a chain of Package_t scopes. And, any identifiers that are referenced have to be coming from somewhere, and you'll pretty well always need local variables or parameters in any piece of code that is useful, so you'll always need a Proc_t to contain those. The various routines for Proc_t should be doing the grunt-work of determining parameter and local stack offsets, checking for duplicate identifiers, etc. Can we add a parameter to a Proc_t at any time? If we can, then we need to keep the total sizes of parameters and locals separate, and not determine the final offsets of things until code-generation time, since adding a new parameter affects things. Adding a new parameter can cause problems with locals, if the symbol happens to be the same (not allowed to mask a parameter with a local). Or, I could allow locals to mask parameters, in which case a masked parameter is simply inaccessible from the scope in which it is masked by a local. Workable. Which raises an issue. Have I ever had warnings in my compilers? I don't think so, except in MPCC, since the languages I define simply disallow dangerous things. I guess I could have had a "used before set" warning in Draco, and it would have been useful. So, how do I return these various things from some of the Proc_t and Exec_t routines? I guess the answer is that they go into the Parse.State_t record, much like I've used it in my early stuff.