Contents and ideas Copyright 2002-2010 by Chris Gray. Date of initial
publication: April 18, 2010. Any new ideas and concepts in this
document are owned by Chris Gray and may not be used in any patent or
other restricted use intellectual property instrument without the
physical written permission of Chris Gray.


April 13, 2002 and later - initial notes


I've initially picked the name "Aleph" for this project. That's the first
letter in the Hebrew alphabet or something.


Q. What is this project?
A. Replace all software. All of it. The bulk should reduced by several
    orders of magnitude.

Q. Do you seriously expect to be able to do this?
A. No. But I'm hoping the attempt itself will be fun. I'll be happy if I can
    end up with something that I can use to do useful applications.

A. Aren't there serious problems with some of the ideas here, such that
    the whole scheme is impractical or unworkable?
A. Almost certainly. How far can I go, however? I'm not afraid to redesign
    huge amounts of stuff if that is needed.

Q. Are you concerned about compatibility with existing programming
    languages, operating systems, protocols, formats, etc?
A. Not really. I'll need existing stuff to bootstrap from. If this thing
    ever goes anywhere, converters and interfaces will likely appear.

Q. Why are you doing this?
A. Because as time goes by, I realize that I am simply growing more and
    more unhappy with the current state of computer software in general.
    Instead of grumping about it, I'm going to try to make an attempt at
    doing something about it.

    Most of the world happily uses Microsoft software. I cannot. It
    continues to grow in size and requirements, while not providing the
    few things I require from computer systems: reliability,
    trustworthiness, and guaranteed privacy. Given its status as
    closed-source proprietary software, there isn't anything that will
    ever be done about it.

    Linux systems and other open-source Unix-based systems are very old
    technology at their core. All of the commercial Unix systems were
    long since rewritten in more modern styles. The bit I know of the
    Linux kernel tells me that I don't want to know more. Applications
    are becoming larger, and more dependent on huge GUI libaries, unsafe
    programming systems, etc. The simplicity, predictability and
    reliability of the Linux world is vanishing. The fact that the
    source is open doesn't really help if there are hundred's of
    thousands of lines of poorly written software to look through.

    The very Unix model is outdated. I/O should be asynchronous at its
    heart, not synchronous, for example. Having everything be a raw
    file of bytes is powerful, but I'm not convinced it is the best
    way to go. Changing from these basic assumptions is just not
    possible in a Unix environment - they are too deeply embedded.

Q. So what do you propose?
A. See the rest of the documents, starting right here:


Some buzz-phrases:

A family of programming languages, all with shared base elements, that is
used to program everything. Things like pointers are an extension that is
used to make a low-level programming language, for example. They are not
used in applications. Applications are written in completely safe variants
of the base language.

Persistence. Variables in some of the higher level programming languages
can be persistent, i.e. stored on non-volatile storage like disks. All
variables can be persisted, regardless of type. This includes variables
of reference types (*not* raw pointers). The system guarantees the
atomicity of the persistence, regardless of the type or size of the
variable being persisted. (Is that possible even theoretically?)

Linearization. All types of variables in the higher level languages can
be turned into byte sequences that are independently and universally
interpretable. Such byte sequences are the basis for the persistence, but
are also used for transmission over networks. Such forms are the *only*
Aleph-internal way that information is stored or transmitted. All other
representations are considered to be external.

Code is a first class entity. Functions in the various languages can
be linearized just like anything else. They can thus be persisted and
transmitted. Code can readily be built up at run time - the concepts of
"compilation", "linking", etc. are not directly used. (Except perhaps
when producing raw binary code for the lowest-levels of the system,
whether for the current platform or for some other.)

Types are first class entities. They can be linearized and transmitted
just like anything else. They are universally decodable, and universally
identifiable. No, I'm not really sure what I mean by that.

The system provides "playpens" in which pieces of code can run. Such
playpens can limit the amount of persistent storage available to the
code, can limit which persistent variables external to the playpen
can be read or written, and can limit both the proportional and total
amounts of system resources the playpen can consume.

All configuration options, program input and output, settings, etc.
are done as persistent variables. There should be no need to have
text or binary configuration files dealt with directly by any programs.
It is of course possible to do that, but I fully intend to reject
any code that does so. Such persistent variables are viewable and
editable by the standard system facilities for doing so (intended
to be GUI based).

The entire system is visible to non-playpenned code. All system code
is represented the standard format as well as any compiled binary or
bytecode forms that may be present. Thus, all system code can be
browsed.

A GUI is assumed. That doesn't mean there can't be text shells as well.
However, the standard way of operating is with a GUI.


Some intended results:

All programming languages on the system are consistent. They are all
based on the core language, with whatever extensions are appropriate for
their particular niche. Some variants that could be useful:

    - low-level programming language useful for writing the heart of the
	system itself, and perhaps for device drivers, etc.

    - application level language, which adds things like a proper
	"string" type, language-defined collection types, etc.

    - shell language

    - spreadsheet language

    - editor language

    - "web page" language (runs on the client machine)

    - "web server" language (runs on the server machine)

    - "database" language (if one is even needed)

    - text formatter language (some folks still like to use text formatting
	languages, rather than word processors)

A "web page" is really just a value of some type ("DisplayPage"?) that
can contain images, runnable code, formattable text, references to other
"pages" (including on other machines), etc. A "browser" simply(!) has to
render the page within its given GUI frame, using the basic system
facilities for rendering those kinds of things. Any code in such a page
will of course run inside a playpen created for it by the browser. Thus,
things like "cookies" are just persistent values that the playpen
allows such code to have. Also, control over what a given page can do
is controlled by the playpen settings for it. Thus, there should be no
security issues with running code retrieved from remote sites, so long
as the browser doesn't allow stupid settings on the playpens. I'm sure
this will get very difficult. Pages are sent between the server and the
browser simply as the linearized form of the page values.

Email and news are just linearized values of some type. They are
transmitted and stored as such. As such, they can contain anything. The
issue of decoding "attachments" becomes rather moot - all systems can
decode them, but might lack code to display them in the intended form
(they can of course display them in a default form, since all systems
can display any value using default forms).

What is a spread-sheet? It is a value of some type, such as "SpreadSheet".
Such values would likely include a two-dimensional array of other values,
along with settings, row and column functions, etc. The usefulness of
the value comes from the existence of a custom renderer/editor for that
type. Similar for the "web pages" mentioned above, although the "browser"
need not be able to edit the pages.

Using databases as filesystems (kind of) is a big thing coming up -
Microsoft is heading that way apparantly. I'd like to think that Aleph
doesn't need such a thing because it already can efficiently store
large quantities of structured data. Indices of some kind would still
be needed, however, so perhaps a "database" is a value that contains
one or more "index" values and a list of the data values. I doubt that
will be sufficient, but perhaps such thoughts are a start.


Various thoughts and mental images:

A text editor is basically the custom renderer/editor for arrays of
strings. As such, it should be universally available whenever a type
which contains an array of strings needs to be rendered or edited.

Similarly, a "paint program" is a custom editor for images. And thus
I magically wave away entire industries and millions of lines of code!
Needless to say, we need multiple possible editors for types, and the
system has to allow the dynamic addition, replacement and modification
of such capabilities.

Since all values are system-known types, sub-values not deliberately
encrypted or hidden should be extractable without too much difficulty.
This is because types are first-class entities within the system, and
thus code can examine the type of values at run-time, and find
usable sub-types within those types. With that information, and knowing
the type-path to get to those subtypes, the code can extract the
sub-values. (Details hazy!)

I picture there being at least 3 editors/renderers for code. One would
use straight text, and have traditional lexer and parser. This would be
used by those who haven't yet been able to retrain themselves to use
anything else. A second would be an "augmented" text editor type thing
that does on-the-fly checking, automatic insertions, etc. The sort of
thing that would drive most programmers instantly insane. So, it had
better be configurable, so that it can go all the way down to the level
of the straight text editor, with minimal checking. The third would be
a graphical program editor, with icons for various programming
constructs, etc. This third is targetted at non-programmers who want
to write spread-sheet rules, web-page code, etc.

Since code is a first-class entity in the system, the core system
language abilities are available to those editors. I'm not at all sure
how it would all fit together, however. One thought is a construct that
I put into the Alai language (the one I designed for my MSc.) That is
a kind of type which is a reference to code. E.g. "code int x" (or some
such) would be the declaration of a variable which holds a reference
to some code which, when executed, produces an 'int' result. I don't
know how far such a concept would go.

Users would typically have hierarchies of folders (directories) containing
their saved values. I envision smallish sets of customization values
being associated with such folders. Thus a user could have their
default GUI style settings stored with their root folder. When a value
is opened by a GUI renderer or editor, the path from the value to the
user's root if traced, looking for GUI style settings to apply to the
display. The values could be something like a list of name-value pairs,
where the values can be whatever is needed (typically numbers, pointers
to chosen style definitions (which include the code to render the style),
strings, etc.) My intent was that the various GUI code that is used by
the renderer (and the renderer itself of course) would just use the
name of the variable directly, and the system would find the appropriate
value to use (caching it nicely of course). It's currently not clear
to me how this would actually work. Perhaps a class attached to the
variables, such as "config int WindowBorderColour", would trigger the
application-level programming language to do the "right thing" at
runtime. It is also my intent that there be some base definition of
all such values, which is required to have explanatory text associated
with it. So, you can't just add random arbitrary names to the lists
of configuration settings. There are huge issues with this, but this
sort of describes my desires.

Porting Aleph to a new system should be a minimum of work. That's a
goal. Towards that, I already have the beginnings of the core language
in my MUD system. There, I have the lexer, parser, byte-code generator,
byte-code interpreter, linearizer for code, pretty-printer (renderer),
etc. Modifications needed, but it sounds like a first step, to me.
Also needed, at a minimum, is a way to open a GUI window on the hosting
system, paint pixels in that window, and get keystrokes and mouse
stuff from the user.

I've already done work towards an extensible programming language. That
was done 7 or 8 years ago before I started AmigaMUD (which was first
called AmigaDUM, since the real thing was going to use the extensible
programming language libraries). This stuff is the original source for
some of my thinking about Aleph. Work on AmigaMUD triggered my thoughts
on linearization, persistence, byte-code execution, etc. I've recently
looked at that old code (written in Draco), and its not as elegant as
my rosy vision of it suggested. It likely should be mostly ignored,
except perhaps as the source of the idea. One of the first things that
needs to be done, I think, is to design the core programming language,
how language variants are produced, etc. There are huge issues with how
entities are named in this system, since I want there to be one single
hierarchical namespace that covers the entire distributed universe.

The "proper" way to access a persistent value is to do so asynchronously,
with an event triggered when the value is retrieved. This is because
values can be very large, and so the access could fail. No-one wants
to do this to reference a simple configuration value, however, so
the default is to just mention the name of the variable in code, and
the system will synchronous fetch it, triggering an exception of some
kind if it fails. Even doing asynchronous fetching isn't the full
answer, however, since values can be larger than memory. So, a
reference to "HumongousArray[index]" shouldn't need to fetch the
entire array into memory. Thought needed here.

My intent is that virtually all of the system code be fully explorable
by the user (like the old Xerox Smalltalk systems). What reasonable
restrictions can we put on the changing of the system code, however?
Clearly any code related to guaranteeing security can't be changed.
Or, if it is, the system then becomes untrusted, and anything it sends
out cannot be run outside of a playpen. Need ways of updating code in
the field, of course. Heck, I'd like the norm to be that all systems
periodically check with update sites and update themselves with stuff
that their user(s) have configured to be updated. Would I trust
something like that myself? Not likely. Sigh.

One main plus is that we ought to be able to greatly reduce the effort
involved in storing and retrieving information from disk, be eliminating
any issues of parsing, ugly formats, etc. We also ought to be able to
get rid of a lot of the effort involved in network communication, by
removing most issues of protocols and transmission formats from
application-level programs.


Random mini thoughts

Perhaps type attribute "big" is a hint to the system that the value
stored in such a variable ought to be stored in separate space, much
like files are on traditional filesystems. Should that also affect
its handling in memory?

You should be able to point at darn near anything on the screen and
push the HELP key (hmm, this keyboard doesn't have one!) and get some
reasonable help. The system itself should be able to build a description
of most things, but renderers, GUI style elements, etc. ought to be
able to insert a more specific one of their own. At a very minimum,
configuration variables should all be documented.

Is intermachine communication message oriented or stream oriented. Or
both? Its easy to build a stream on top of messages, slightly harder
the other way. Extremely large messages are a problem, just like
extremely large persistent values - they don't fit in memory. Should
the system be able to start interpreting (including running code!)
a linearized form before it has received the whole thing?

In effect, the system becomes one large conglomeration of interlinked
values. Hard to manage. Need to put boxes around parts. Where?

Does the system use a garbage collector (perhaps more than one, at
different levels of granularity?) or reference counts? Or both? If
boxes can be placed around things, are reference counts among boxes
practical? Are reference counts within boxes always practical - boxes
could get pretty big. Perhaps reference count versus garbage collection
is a choice of the box.

The system knows how to render and edit all basic types. It also has
ways of rendering and editing all of the language-defined compound
types. Thus, the system can render and edit any value. Rendering
can be one of two styles - render to text, or render to pixels. At
the bottom they are equivalent, since for text to show up on a GUI,
it needs to be rendered as pixels. However, being able to render to
text is useful, in that it works in shells, editors, etc. Perhaps
both modes are always available? What does rendering a number to
pixels do?

My thinking is that the base language has only the basic programming
language constructs and the integer type. No floating point, no
pointers, only fixed-size arrays, no collection types, etc. It isn't
really useful for much of anything, except perhaps it could call
system primitives for rendering and so be useful that way.

Higher-level language variants would have language-defined collections,
perhaps including list, set, bag and possibly others. The "string"
type doesn't exist in the lowest level language, since it requires
memory management. I'm not even sure that 'char' exists in the base
language! Possibly including it, and thus allowing 'char' arrays would
make the base language marginally useful. If that matters!

Byte-order. There certainly has to be a defined byte-order for values
sent between machines. I suggest big-endian for the usual reasons.
Should persistent values use that, too? Then you could literally
take a data disk from one machine to another, and it would still
work (including the code on it), even on another architecture.

I think the default attribute for variables is that they are not
persistent, do not need to be updated atomically, are not 'big',
etc. In AmigaMUD, all global entities (there are no global
variables) are persistent. I don't think that is necessarily the
right choice in this case.

Oh yes, it *is* my intent that any code you have write access to can
be edited online at will, and the new version will take effect
immediately. This is like in AmigaMUD, so I know its doable. We need
to think about the consequences of this in the more global scope.

Conditional compilation. Its hard to imagine doing full operating
systems, etc. without it. But, it doesn't really fit into the style
of programming I've been talking about here. One thought is to have
parameterizable chunks of code (at what level - package?). Associated
with such a thing is a set of instantiations, where values have been
selected for all of the parameters, thus producing an executable
variant. I'm assuming here that the execution system is able to take
advantage of the parameterization when it generates code (whether
internal trees, byte-code or machine code), and do proper optimization,
including inlining of routines whose bodies have become miniscule.

The program editors should also be able to handle parameterization.
E.g. you should be able to set values for parameters, and the rendering
of the program will show the code resulting from that parameterization.
Gnu emacs can apparantly do something like this, and I expect it to
be quite useful. Perhaps the renderers should also be able to do it
based on non-parameter conditionals - that could be very useful.

A thought on program representation. Internally, one form is that of
a parse tree (which can be linearized via prefix traversal - that's
what AmigaMUD does). Nodes in the tree could all (or some?) have
a field which points to their optimized variant. It is the optimized
variant that is executed (or has code generated from), but it is
the non-optimized variant which is rendered and edited. This allows
things like constant folding to easily happen, but without losing
the original form of expressions, etc. Parameterization could use the
same mechanism, but we do want to have several different
parameterizations, so perhaps that is best done at a higher level.
Perhaps at the function or package level.


Some random GUI thoughts

Keystrokes in the GUI system go to the most local selected element
which has expressed a wish to receive them. GUI elements that don't
need keystrokes don't express such a wish. Thus, we hopefully avoid
the focus issues that the Java MUD client has.

In the GUI system, the contained element requests the size of the
inner part of the container in which it is to render itself. *Not*
the outer size. The contained element shouldn't have to bother
itself figuring out how big the borders and stuff of its containing
element are. What if the requested size can't be given? The simple
choices are to fail the creation of the element, and to create a
scrollable interior for the element to render in. The latter is
likely needed anyway, for displaying large images directly. There
is still a problem if there is no space for the containing element
to render its scroll bars. Perhaps the creation then fails. Lets
pick a minimum pixel size for such events - a one pixel by one
pixel scrollable space is not terribly useful.

GUI elements should not try to force appearances - they should use
the hierarchical settings values the user has specified, as described
above. I really, really hate programs that deliberately ignore the
user's settings because the author wished his program to use a
certain style. Doing that is art, not tool building. Perhaps a
tool can include "recommended style settings", and the user can
choose whether or not to allow such overrides. Yet more complexity.
This might just come down to setting where the search path for
config settings starts - with the tool itself, or one level up.

A GUI element cannot tell anything about stuff outside of the box
it is rendered in. That is not its business. Is it even remotely
possible to produce a usable GUI system with that rule? I'd really,
really like to keep the GUI system from bloating into a multi-
megabyte monstrosity like nearly all others have done.

What coordinate system to use? Floating point scales and is more
abstract, but my experience with using it in AmigaMUD tends to make
me shy away from it. Too many things need pixel accuracy to look
right. Rounding problems mess things up. I also recall not liking
Java's model, although it is interesting. I think the problem was
that there was no way to find the bounding rectangle for a
polygon. Or something like that.

The defaults/settings values should be "live". I.e. changing them
should affect the GUI's appearance right away, as appropriate. I
have no idea how to do this efficiently!

It should be possible to attach things to the "source" of programs,
such as images, sounds, animations, etc. They wouldn't affect
the compilation/execution of the program - they are only a
communication from the program writer/editor to the human program
reader. Perhaps allow setting of a backdrop image, etc. The reader
must be able to control all of the appearance aspects, however,
so that fonts/colours/sizes etc. can't change if he choses not
to let them.


Some random programming language thoughts

What do we do in the language about global variables? I'd like to
not have any, but that is often impractical. Perhaps we need to think
first about the idea of something like packages or modules, which are
classically built from multiple source files and define scope of
their own which is larger than file scope, but still private. One
vague possibility is something like the nested table thing that
happens in AmigaMUD, but perhaps with different rules. E.g. the
search path for symbols in a piece of code starts with the innermost
scope and goes outward, as usual, but the end of the search is
defined by the system as a whole, and not by anything relating to
the current piece of code. Perhaps that is how system-defined
symbols get used. Also needed is a way to "use" more tables of
symbols that are not on the direct path, but are in branches
reachable from that direct path.

If we end up allowing the above kinds of references, how do we ever
change anything in the system? There will be such a complex web of
dependencies, that no function can ever be deleted! One possible
answer is that functions can be edited at will, and thus they can
start referencing fewer things, thus eventually untying the knots.

Numeric types. What do we want? Given the current and near future state
of processors, I think the default integer type should be 64 bit, and
the default floating point also 64 bit. Those two, and string and
boolean types, likely suffice for the primitive types for most
application and GUI programming. For low-level programming, one needs
to be able to do loads and stores of specific sizes to and from
hardware registers, etc. That suggests adding some types (no need
for signed, I expect) that are defined to be 8, 16, 32 and 64 bits.
For efficient programming at low levels, the C style of not saying
how big a type is works well, but we need to avoid the pointer trap
by defining a integral type large enough for a pointer. So, as a
first attempt, how about the following types:

    Base language:

	int (signed integer, size unspecified but likely 64 bit)
	char (character, typically an ASCII character)
	bool (true/false value)

    Applications level languages add:

	float (floating point, size unspecified but likely 64 bit)
	string (flexible, system checked string, unicode?)

    Low level language adds:

	float32 - 32 bit floating point
	float64 - 64 bit floating point
	uint8 - 8 bit unsigned integer
	uint16 - 16 bit unsigned integer
	uint32 - 32 bit unsigned integer
	uint64 - 64 bit unsigned integer
	addrint - unsigned integer big enough to hold any pointer
	uint - natural-sized unsigned integer
	sint - natural-sized signed integer

A minimum subset of the load-level language could not have floating
point, and only have unsigned integer types upto the reasonable size
of the processor (e.g. an 8 bitter might stop at 8 bits, but would
have to have a 16 bit addrint).

An interesting question: do we need to low-level types in the bytecode
machine? Or, is all low-level stuff compiled directly to native code?
Requiring that compilation ability makes porting harder. Without it,
one need only produce a bytecode interpreter, and the bit of lowest-
level native code that the system needs to get up and running.

Object-oriented programming. I'm still not a big fan of this, in the
sense that it is used in, say, C++. I think, for example, that C++'s
multiple inheritance is very bad. I much prefer Java's concept of
interfaces, but I believe that some consider them inadequate. In a
system as dynamic as what I am proposing here, I think we have to
go with something like interfaces, since changing the inheritance
hierarchy in a running system is likely intractible. With interfaces,
one can add a new interface to the set exported by, say, a package,
and new clients can start using that new interface. When all users
of the old interface have switched over, then the old interface
itself can go away, and any code in the package used just to
implement the old interface can also go away.

I still haven't really said anything about OOP. The idea of inheritance
of value, like AmigaMUD does, is something I kind of like. I've used
it above, essentially, for configuration settings. It isn't efficient
enough to use for low-level or main application programming, however.
So, here are some words that might make a usable compromise:

- structs and classes are completely different beasts. Structs are
    like traditional C structs, in that they associate a bunch of
    values into a group in memory. They can nest at will, and no
    pointers or references are needed. They exist in the base
    language, since I see no particular problem created by their
    existence.

- classes are more like Java classes - every value which is an instance
    of a class is accessible only by reference. Thus, classes cannot
    directly nest like structs do. There is no such thing as a
    class structure field or variable. Classes can inherit, however,
    and that inheritance does not require a reference - the "parent"
    class elements are included directly in the storage for the
    descendent class. The main aspect of inheritance, however, is the
    inheritance of the interfaces and methods of ancestor classes.

- I personally find the use of class member names to be incrediby
    confusing. I would rather require explicit use of a "path" to
    the member names. I know that could get verbose, but hopefully
    it would encourage shallower inheritance trees! I'm also not
    sure I like the implicit 'this' used inside member functions.
    Typing "this.blah" all the time would be tedious, but if it
    results in greater clarity and uniformity, I'd say go with it.
    For similar reasons, I wouldn't be a big fan of something like
    Pascal's "with" clauses.

    I even find "this.blah" confusing when "blah" is a member of some
    ancestor class. Perhaps even more than when "blah" is a member
    of the child class, since I don't know where to look to find "blah".
    Now, with an IDE these problems are minimized, but I still think
    that code should read clearly and unambiguously all by itself,
    without external assistance. I'm not sure I have an acceptible
    answer here, other than "don't use OOP".

Taglets. I'm a big fan of using taglets on identifiers as memory aids.
Intellectually, I understand that they become questionable when a
project becomes large enough that the taglets are hard to keep unique,
and hard to figure out since there are several similar ones. I don't
know what to say here - I've never worked on a project that got big
enough that this was a problem. I'd like to say that large projects
should be modular enough that it isn't a problem, but I'm not sure
even I believe that. Taglets would only work properly in a very large
system, I think, if no symbols with taglets (struct elements, class
members, etc.) escape from the larger units that they are defined and
used inside. E.g. the only symbols that a package exports are those
defined in its interfaces. I think that means that a package cannot
export a class that can be inherited from. Is that reasonable, or
does that remove the whole point of OOP? I guess it depends on how
large the package is - if it is large enough, then there is still lots
of possibility for traditional OOP within it. We do need some way to
stop the dependencies on memory layout, however. I.e. we need to
create a system where the addition of a member to a class does not
invalidate the entire system or tons of persistently stored values.

Hmm. *Can* you add a field to a structure that has already been used?
(Or, equivalently, a member to a class that has already been used?)
I know there are some systems which can do it - they essentially have
to go rewrite all values in memory. We would have to rewrite persistent
ones too. Which can of course fail due to lack of space or I/O error.

Well, this *is* the same as compatibility problems with differing
versions of proprietary file formats. I'm thinking that there are
only two situations where these format changes have consequences
outside the code of the package:

- persistant values written by code in the package.

- structures exported in an interface the package implements

The answer to the second is as mentioned earlier - you have to define
and implement a new interface, and support the old interface as well
as the new one until all users of the old one have been updated.
What about symbol clashes between the old and new interfaces? They would
want to be the same (nearly all of them if the change is small!). So,
how does code bind to the right one? The answer here might be to rely
on help from the system's knowledge of the programming languages, etc.
A tool (or whatever) could be provided that takes a package A, that
uses interface I1 from package B, and replaces all of its uses of I1
with corresponding uses of replacement interface I2. This will result
in name changes seen in a rendering of the code in package A. However,
a later tool could then do the step of "B no longer needs to implement
interface I1, as it has been replaced by interface I2. Please delete
I1, but do corresponding renames of the symbols in I2 (and I2 itself)
to those from I1". This would work since the stored representation of
A would not actually contain the symbols from Ix, since it references
the single copy of those symbols directly. A much simpler example of
this is that I could add a way in AmigaMUD to change the name of a
local variable or parameter in a function, and any subsequent pretty-
print of that function would use the new name throughout, since all
uses are just a kind of reference to the single declaration.

I have no immediate answer to the issue of updating persistent values
of a structure or class that is changing. It doesn't seem practical
to somehow have an index that shows where all of them are. Hmm.
If we add a rule that a structure exported by a package cannot be
persisted except by code inside that package, then perhaps we can
have an answer. If only internal code can persist such values, then
perhaps some kind of automatic conversion is feasible. If not, then
we may be stuck with such packages having to maintain code to handle
old structures indefinitely. Ick. How does the system know that a
type has been persisted? Is the existence of a persistent variable
of that type (within the package) enough?

How *does* the system know the type of a persistent value? I don't
think we want each persisted value to have to fully describe itself
if there are thousands of them. Hmm. This might not be a problem -
only top-level persisted variables need to describe themselves - if
they contain aggregates of other values, their one description of
that aggregate suffices. So, perhaps each such top level value
includes a unique id of the type, and a version number of that type.
Then, if the system reads a persistent value with the same type id
but different version, it actually could do the conversion. The
stored value would only be updated when it is next stored. If the
new version of the type has new fields, default values would be
supplied by the system. The new code in the package could check for
the default values and update accordingly.

How many individual persistent values are we talking about? Is the
above extra stuff too much? My thinking is that there would be about
the same number of top-level persistent variables as there would
otherwise be files in a file-oriented system, so its not a problem.
It should perhaps be a goal that this be so.

A 'table' type might be interesting. The type definition requires
two types as its arguments. One is the key type and the other is
the value type. The key type needs to have an associated numeric
hash function. This leads to the idea from some language several
years ago that types are defined by the set of operations that can
be performed on values of that type. Perhaps all types, including
the builtin ones present an interface listing the operations that
are available on them. The regular syntaxes we use for some of those
operations are simply syntactic sugar. Maybe not the assignment
operator, else we go down the icky path of C++ references.

Persistent variables. How exactly does one use them? If a variable
declared inside of a package is persistent, then I think there should
only be one system-wide value for that variable. It could be
persisted "inside the directory" where the package "resides". If a
persistent variable is exported by an interface implemented by a
package, is it system-wide unique? If so, how *does* one make/use a
local (reached via the same path-search as mentioned earlier for
configuration variable sets) persistent variable? Perhaps a tag on
the declaration in the exporting interface says whether the variable
is local or shared. Are such variables deleted if the package that
exports them is deleted? Do the variable instances reference the
package and interface for their definition? If the variables are
not deleted, do we then leave a crumb behind which contains the
type of the variable and a reference to the package, so that if the
package is installed again, it can again find the variables? Does
that need to be more than using the right name for the variable,
perhaps prefixed by the package name? How unique do package names
have to be? Can there be multiple packages with the same name on
the same system, distinguished by their "path"? That would be useful
for trying out new versions. How else do you work on a new version
of a package that the system needs for operation? If the "names" of
the persistent local variables contain the path to the package that
they are for, does that mean that a package has absolute control over
where it is installed? How then do you replace an old version with
a new one? I think that somehow you have to swap them around when
neither one is in-use. That implies a table of pending swaps that
is executed on a system boot, for packages that are always in use.
Having the package force where it has to be put is something that
modern sysadmins are not pleased with. Windows has run into problems
with everything wanting to be installed in "C:\Windows\xxx" or somesuch.

The more I think about it, the more it seems to me that the traditional
ways of using OOP just don't have a place here. It looks to me like
they can only be used within a given package, since you end up being
locked in to their current forms if they go out of the package. Is
that enough usefullness to merit the effort of implementing OOP? Am
I missing something here?


020424/Wednesday:

Since types are first-class objects in the system, there need to be
ways to construct them and explore them from within the system. These
can be used to write linearizers and de-linearizers for types. The
explorers can also be used by general printing routines and a
debugger. Have a type-case statement like Algol68 (and Pascal?) that
safely splits out a variant record. Then, there could be a main
type-explorer that is passed a type and returns a variant record that
describes the type. The caller uses a type-case to deal with the
several forms of types (likely things like 'struct', 'variant',
'proc', 'array', "result of calling XXX with args ZZZ", etc.)

Since types are first-class objects, can have functions which take
types as arguments and return types. These *are* the routines that are
used to define container types. E.g.

    proc set(type elementType)type

Need a way to specify that a type used as a parameter to such a
function must have certain "operators" available. For example, a type
passed to 'set' must have an equality operator so that duplicates set
elements can be removed. Most type functions would require a way to do
assignments. This is where we might borrow a restricted version of
C++'s "ref" concept. "ref" types can *only* be used as parameters to
functions. When such a function is called, the "ref" parameter must be
an assignable expression (lvalue). Inside the function, no special
operator is used when referencing the "ref" parameter - the un-refing
(likely a pointer dereference in reality) is implicit. This is likely
quite close to the version in C++ - should read up on them again. I
don't want "ref" *variables* however, since then it is possible to
have "ref" values which are not guaranteed to be valid - by having
them only as function arguments, there cannot be an invalid one.

Note that the above 'set' function may well need to *generate*
functions that are part of the set of "operators" that the returned
type has available. I'm guessing that it won't always be possible,
even with some added tricks, of using generic routines for such things.

Which leads us to the whole set of routines used for constructing and
exploring functions. Its possible that the types used to represent the
bodies of functions (and their basic Proc_t?) can be transparent and
public. However, the values themselves need to be read-only. A set of
construction routines is used to build and assemble the pieces of
functions. Those routines will check that everything makes sense. For
example the constructer routine for a "for" statement must check that
the index variable for a counting "for" is a integral local variable.
(Type "int" in the base language - other sized integral types likely
allowed in the system-level language.) This set of routines can be
used when linearizing and de-linearizing functions, when pretty-printing
functions, in the regular compiler, in the GUI-form program entry code,
and in the bytecode compiler.

Thus, "int" is simply a predefined constant of type "type". Etc.

I think we could handle some container types like I did "lists" in
Six/Fant. I.e. the types built by the "list" function put a "getNext"
operator into the set of operators exported by the returned type.
The convention for such iterators is that they take the list and
the list element as parameters, and return the next element. Hmm.
How do we describe such an operator? Maybe

proc getNext(<containerType> container; <elementType> element)<elementType>

where the syntax <identifier> represents a generic type substitution.
This doesn't feel quite right to me. Don might suggest using inheritance
from a base "iterable" type. How would that go:

    type iterated = newType();	/* get an otherwise undefined type */
    type iterable = <what goes here?>

    proc getNext(iterable container; iterated element)element

and then "list" generates a type which inherits this routine? This
doesn't feel right either. Don, how does C++ use their generics to do
this? How about Java? Do they have to make the container take the
most generic type, and use casts to get to the proper element type?

Anyway, once you have a 'getNext' operator (and 'getPrev' for chains,
'insAfter', 'prepend', and 'append' for lists/chains, etc.), you can
then use that type in an iterating "for" loop:

    list(int) li;
    int i;

    li := doSomethingToMakeAListOfInts();
    for i in li do
	doSomethingWith(i);
    od;

This works by having the construction function for the iterating "for"
statement require a 'getNext' on the type of the expression after the
"in", and also that that 'getNext' have the type of the iterator
variable as its second parameter type and result type. Since you can't
declare *variables* using non-constant types, it must be possible for
the compiler to check this sort of thing at compile time, using the
type examination routines. Hmm, what if a generic routine needs a
temporary - like in a 'swap' routine? I had figured out how to do that
in a previous Draco extension, but it was pretty low-level. However,
my intent there was to share the actual object code among all uses of
the generic type. We might not have to do that here (or be able to!),
since the type functions can generate routines at runtime.

020425 email from Don

> > Need a way to specify that a type used as a parameter to such a
> > function must have certain "operators" available. For example, a type
> > passed to 'set' must have an equality operator...
>
>     Perhaps something like Java's interface would be suitable here.
>     The Generic-parameter-interface could syntactically be like an
>     Interface. The language designer could specify that the Type
>     supplied to Set explicitly implements the HasEquality interface; or
>     he could merely insist that the Type has all of the operators.
>
>     Java doesn't have Generics.
>
>     C++ has stupid Generics. One can supply any type-or-class where a
>     type-or-class is expected. The compiler or linker will gripe if you
>     get it wrong, but there's no way to express, within the language,
>     how to get it right.
> 	(In my Ordered-array-of class, I have comments stating that
> 	the supplied type must have >= and == operators. That's the best
> 	one can do.)
>
>     Modula-3 does it like C++. If you get it wrong, the compiler gripes.
>
>     Eiffel has Constrained Genericity, which is rather like the above
>     Generic-parameter-interface. One can say, "this generic takes a
>     class which must be XXX class or a descendant."
> 	(Eiffel has interfaces but calls them "deferred classes"; so
> 	 one doesn't have to say "class or interface" everywhere.)
>     To support that process, Eiffel does one thing which you don't like.
>     Even simple classes have a long line of ancestors. For example:
> 	Eiffel has a type which has == and >= (and >, <, and <=):
> 	    PART_COMPARABLE;
> 	then an heir, for which < and >= are inverses (also <= and >):
> 	    COMPARABLE;
> 	then an heir with arithmetic operations (+, -, *, /):
> 	    NUMERIC;
> 	and then INTEGER, REAL, and DOUBLE inherit from NUMERIC.
>     I might have used PART_COMPARABLE or COMPARABLE in my
>     Ordered-array-of.
>
>     Even more amazing is the lineage for Eiffel's SORTED_LIST:
> 	Bertrand created CONTAINER;
> 	and CONTAINER begat TRAVERSABLE;
> 	and TRAVERSABLE begat LINEAR;
> 	and LINEAR begat BILINEAR;
> 	and BILINEAR begat SEQUENCE;
> 	and SEQUENCE begat CHAIN;
> 	and CHAIN begat LIST;
> 	and LIST begat PART_SORTED_LIST;
> 	and PART_SORTED_LIST begat SORTED_LIST;
> 	and Bertrand saw that it was good.
>     (Bertrand is Bertrand Meyer, not Bertrand Russel.)
>     Did I mention that SORTED_LIST begat SORTED_TWO_WAY_LIST?
>
> > Note that the above 'set' function may well need to *generate*
> > functions that are part of the set of "operators" that the returned
> > type has available.
>
>     Why?
>
> > "ref" types can *only* be used as parameters to functions.
>
>     Why? (I sometimes use one to name an oft-used element of an array,
>     so that address calculation is done only once. A pointer would work
>     just as well.)
>
> > When such a function is called, the "ref" parameter must be
> > an assignable expression (lvalue).
>
>     You might like the Eiffel terminology: Entity (lvalue) and
>     Expression (rvalue).
>     C++ allows the actual parameter to be an rvalue, if the formal
>     parameter is a ref-to-constant. (Just store the constant somewhere,
>     and ref that.)
>
> > I don't want "ref" *variables* however, since then it is possible to
> > have "ref" values which are not guaranteed to be valid - by having
> > them only as function arguments, there cannot be an invalid one.
>
>     C++ doesn't have such variables, either. A C++ ref must be
>     initialized, and can't be altered. But they don't have to be
>     function arguments.
>
>     C++ tries to guarantee that refs are always valid. In particular, it
>     guarantees they're never NULL. But that's the language
>     specification, not the implementation. One easily produces a NULL
>     ref in every implementation I've seen. (A program which does that
>     is invalid, of course, but the compiler doesn't notice.)
>     And one can produce an invalid ref with type-cheating pointer casts.
>
>     Algol 68 has ref variables, but in my view they're just safe
>     pointers (no pointer arithmetic).
>
>     Eiffel's ref variables cannot point to the middle of a structure.
>     (Except that it has facilities for linking to "C" programs, and
>     therein one can cheat.) But they can be null.
>
>     If Modula-3, if one uses only "safe" modules, ref's are guaranteed.
>     (They are null, or point to something of the right type.) But in an
>     "unsafe" module, one can type-cheat and do pointer arithmetic.
>
> > The convention for such iterators is that they take the list and
> > the list element as parameters, and return the next element.
>
>     No. Conventionally, an iterator has its own state, a sequence and a
>     position within the sequence. The constructor takes a sequence, and
>     positions to the start. ToNext takes no parameters.
>
> > Don, how does C++ use their generics to do this?
>
>     template<class ITEM> class sequence<ITEM> {
> 	friend iterator<ITEM>; // access to privates
> 	...
>     };
>
>     template<class ITEM> class iterator<ITEM> {
>       private:
> 	const sequence<ITEM>& seq;
> 	const ITEM* pos;
>       public:
> 	iterator(const sequence<ITEM>& sequ)
> 	    : seq(sequ)
> 	    { pos = %%% }
> 	bool atEnd() const { return pos != NULL; }	%%% maybe
> 	const ITEM& getItem() const {
> 	    if (atEnd()) throw "You idiot!";
> 	    return *pos;
> 	}
> 	void toNext() { pos = %%% }
>     };
>
>     The details depend on the implementation of sequence<ITEM>.
>     I might have gotten the syntax right.
>
> > Anyway, once you have a 'getNext' operator (and 'getPrev' for chains,
> > 'insAfter', 'prepend', and 'append' for lists/chains, etc.), you can
> > then use that type in an iterating "for" loop...
>
>     Yeah, but you're stuck with that particular operator name. C++
>     allows any methods to be used:
> 	for (iterator<thingT> it(sequence);
> 	     ! it.atEnd();
> 	     it.toNext())
> 	{
> 	    const thingT& th = it.getItem();
> 	    ...
> 	}
>
>     Eiffel does it too, with better syntax. (And any language has some
>     kind of While.)
>
> > Hmm, what if a generic routine needs a temporary - like in a 'swap'
> > routine?
>
>     Eiffel has a "type of" operator, called "like". It is especially
>     useful in generic functions. The base class ANY declares the Equal
>     routine this way:
> 	equal(parm1: ANY; parm2: like parm1) BOOLEAN
>     And so I can pass any two things of the same Type to Equal.
>     Eiffel restricts use of "like", to places where the compiler can
>     deduce the correct type.
>
>     As for "swap", Python does it this way:
> 	a, b = b, a;
>     (The RHS produces a two-element list, which can be
>     unpacked-and-assigned to any two variables. You'd expect that kind
>     of construct, when variables are untyped.)
>     I don't know whether a strongly-typed language should have a builtin
>     swapper.
>
> -----
>
>     Do you intend that the base language can declare the base types and
>     their operators? Here's how Eiffel does (part of) INTEGER:
> 	expanded class INTEGER
> 	feature
> 	    infix "-" (other: INTEGER): INTEGER
> 		-- result of subtracting Other
> 	    prefix "-": INTEGER
> 		-- negation of
> 	end
>     (Alas, one cannot declare the precedence of operators.)
>     C++ has something similar, and postfix expressions to boot.

020427 My reply:

(Sorry for the delay in responding - I was lazy on both Thursday and Friday
nights and accomplished little besides finishing that Lego disassembling.)

> > Need a way to specify that a type used as a parameter to such a
> > function must have certain "operators" available. For example, a type
> > passed to 'set' must have an equality operator...
>
>     Perhaps something like Java's interface would be suitable here.
>     The Generic-parameter-interface could syntactically be like an
>     Interface. The language designer could specify that the Type
>     supplied to Set explicitly implements the HasEquality interface; or
>     he could merely insist that the Type has all of the operators.

I had something like that in mind, but didn't immediately come up with a
syntax when I was writing, so didn't try to get specific.

I would *like* the system to be general enough to be able to essentially
define itself, but that's not a requirement. If doing that forces something
to be too abstract for a typical programmer (or me!), or results in
inefficiency, then it won't do.

My main goal is that if there are programmer-defined collections (rather
than just built-in ones), then they must be usable in the applications
level languages. That means they must be 100% type safe, since those
languages will not have type-cheat casts. I would also like them to be
efficient enough to be used in the systems-level language, but that's not
a requirement.

(I use the word "collections" above to mean kinds of compound types in
general, not just compound types that would be thought of as collections.)

>     Java doesn't have Generics.

OK, I didn't remember any. They mostly do things by going all the way
back to the Object class and then casting as necessary, don't they?

>     C++ has stupid Generics. One can supply any type-or-class where a
>     type-or-class is expected. The compiler or linker will gripe if you
>     get it wrong, but there's no way to express, within the language,
>     how to get it right.
> 	(In my Ordered-array-of class, I have comments stating that
> 	the supplied type must have >= and == operators. That's the best
> 	one can do.)

Hmm. Icky. As mentioned above, it *must* be safe at the applications level.

>     Modula-3 does it like C++. If you get it wrong, the compiler gripes.
>
>     Eiffel has Constrained Genericity, which is rather like the above
>     Generic-parameter-interface. One can say, "this generic takes a
>     class which must be XXX class or a descendant."
> 	(Eiffel has interfaces but calls them "deferred classes"; so
> 	 one doesn't have to say "class or interface" everywhere.)
>     To support that process, Eiffel does one thing which you don't like.
>     Even simple classes have a long line of ancestors. For example:
> 	Eiffel has a type which has == and >= (and >, <, and <=):
> 	    PART_COMPARABLE;
> 	then an heir, for which < and >= are inverses (also <= and >):
> 	    COMPARABLE;
> 	then an heir with arithmetic operations (+, -, *, /):
> 	    NUMERIC;
> 	and then INTEGER, REAL, and DOUBLE inherit from NUMERIC.
>     I might have used PART_COMPARABLE or COMPARABLE in my
>     Ordered-array-of.

But, if you have methods which operate on those ancestor classes, how do
you safely get from the result of one of those operators (which pretty
much have to return values of the base class) to the descendant class
which you are trying to work with? Hmm. I guess you could make that a
fact of the way the language works - it looks like that is implied in
the above.

Looking around my bookshelves, I see have language descriptions for:

    Occam
    Modula-2
    CLU
    Smalltalk
    C++
    Eiffel
    Ada 95
    Modula-3
    Java

Perhaps I just need to spend a bunch of time reading. Ick!

> > Note that the above 'set' function may well need to *generate*
> > functions that are part of the set of "operators" that the returned
> > type has available.

>     Why?

Because you could? Well, there may be limitations on what you can do
by relying on single generic functions applied to all cases. So, if you
need to actually generate code to accomplish the goal, you can.

> > "ref" types can *only* be used as parameters to functions.
>
>     Why? (I sometimes use one to name an oft-used element of an array,
>     so that address calculation is done only once. A pointer would work
>     just as well.)

I picked this to make 100% certain that no "ref" value could ever be
invalid. As you say below, I may be able to stay safe and still allow
more situations. I would *like* to not have dynamic initializers in
declarations, and without those, its hard to have "ref" variables.

>     You might like the Eiffel terminology: Entity (lvalue) and
>     Expression (rvalue).

Hmm.

>     C++ allows the actual parameter to be an rvalue, if the formal
>     parameter is a ref-to-constant. (Just store the constant somewhere,
>     and ref that.)

Yeah - I found that quite disturbing. Hidden inefficiencies.

> > I don't want "ref" *variables* however, since then it is possible to
> > have "ref" values which are not guaranteed to be valid - by having
> > them only as function arguments, there cannot be an invalid one.
>
>     C++ doesn't have such variables, either. A C++ ref must be
>     initialized, and can't be altered. But they don't have to be
>     function arguments.
>
>     C++ tries to guarantee that refs are always valid. In particular, it
>     guarantees they're never NULL. But that's the language
>     specification, not the implementation. One easily produces a NULL
>     ref in every implementation I've seen. (A program which does that
>     is invalid, of course, but the compiler doesn't notice.)
>     And one can produce an invalid ref with type-cheating pointer casts.

Can it be done without casts? My application language won't have casts,
and the assumption will be that any code written in the system language
is 100% reliable (may not be true, but the language will assume it).

>     Algol 68 has ref variables, but in my view they're just safe
>     pointers (no pointer arithmetic).

Yeah - they are just pointers. I've stolen keywords from Algol 68, but I
never liked a lot of their stuff (even though one of the designers was
on my thesis committee, and his widow is my next door neighbour). I
especially didn't like their automatic casts and use of similar syntaxes
for all sorts of things.

> > The convention for such iterators is that they take the list and
> > the list element as parameters, and return the next element.

>     No. Conventionally, an iterator has its own state, a sequence and a
>     position within the sequence. The constructor takes a sequence, and
>     positions to the start. ToNext takes no parameters.

I don't want such iterations to have the cost of allocating and deallocating
an iterator. Thus, whatever method is used during iteration needs to have
at least one argument, even if its only an implicit 'this'.

> > Don, how does C++ use their generics to do this?
>
>     template<class ITEM> class sequence<ITEM> {
> 	friend iterator<ITEM>; // access to privates
> 	...
>     };
>
>     template<class ITEM> class iterator<ITEM> {
>       private:
> 	const sequence<ITEM>& seq;
> 	const ITEM* pos;
>       public:
> 	iterator(const sequence<ITEM>& sequ)
> 	    : seq(sequ)
> 	    { pos = %%% }
> 	bool atEnd() const { return pos != NULL; }	%%% maybe
> 	const ITEM& getItem() const {
> 	    if (atEnd()) throw "You idiot!";
> 	    return *pos;
> 	}
> 	void toNext() { pos = %%% }
>     };
>
>     The details depend on the implementation of sequence<ITEM>.
>     I might have gotten the syntax right.

And I might have understood it. Expecially the ": seq(sequ)" bit.

> > Anyway, once you have a 'getNext' operator (and 'getPrev' for chains,
> > 'insAfter', 'prepend', and 'append' for lists/chains, etc.), you can
> > then use that type in an iterating "for" loop...
>
>     Yeah, but you're stuck with that particular operator name. C++
>     allows any methods to be used:
> 	for (iterator<thingT> it(sequence);
> 	     ! it.atEnd();
> 	     it.toNext())
> 	{
> 	    const thingT& th = it.getItem();
> 	    ...
> 	}

But I *want* the operator names to be the same, for readability reasons.

>     Eiffel does it too, with better syntax. (And any language has some
>     kind of While.)

Yes, this is a form of "while" statement. However, it would be nice to be
able to hide the details of the iterators and how they are used, so that
it is easy to iterate through lists, etc., even for beginners. Hence the
standardization on the names of the iterator methods, so that the 'for'
that iterates can use them.

> > Hmm, what if a generic routine needs a temporary - like in a 'swap'
> > routine?
>
>     Eiffel has a "type of" operator, called "like". It is especially
>     useful in generic functions. The base class ANY declares the Equal
>     routine this way:
> 	equal(parm1: ANY; parm2: like parm1) BOOLEAN

I like 'like' - I'll see if that helps.

Keep in mind that I want this stuff to be efficient. I want all of the
operations involving types to be compile-time only, unless the programmer
deliberately does something involving types as first-class entities. I'm
probably already contradicting myself on this, with the idea of generating
code at runtime. Hopefully that would be a special case.

>     And so I can pass any two things of the same Type to Equal.
>     Eiffel restricts use of "like", to places where the compiler can
>     deduce the correct type.

<nod> I like.

>     As for "swap", Python does it this way:
> 	a, b = b, a;
>     (The RHS produces a two-element list, which can be
>     unpacked-and-assigned to any two variables. You'd expect that kind
>     of construct, when variables are untyped.)

Understood. I've seen those before. They become a significant change in
the language, however, and the consequences on stuff could be large.

>     I don't know whether a strongly-typed language should have a builtin
>     swapper.

Oh, why?

>     Do you intend that the base language can declare the base types and
>     their operators? Here's how Eiffel does (part of) INTEGER:
> 	expanded class INTEGER
> 	feature
> 	    infix "-" (other: INTEGER): INTEGER
> 		-- result of subtracting Other
> 	    prefix "-": INTEGER
> 		-- negation of
> 	end
>     (Alas, one cannot declare the precedence of operators.)
>     C++ has something similar, and postfix expressions to boot.

Certainly nothing as fancy as Algol68 has. I *would* like to be able to
introduce new operators, in some sense, but another part of me doesn't want
to, but does want new types/classes to be able to define their meaning
for the existing operators.

Hmm. Perhaps operators themselves, along with their priorities, are
introduced as features of a new language superset. Then, within that
language, the meanings of the operators can be defined by any new types
defined in that context. At least that way there isn't an unmangeable
mish-mash of operators with priorities depending on argument types, etc.
like Algol68 had to cope with. I'm not into writing a parser that is
NP complete in its parsing. (I doubt Algol68 was, but it certainly wasn't
simple fixed precedence like most are.)

More thinking today, I guess.


020427/Saturday

I need to be careful to not confuse things relating to types. I've been
thinking of a procedure that takes one or more type arguments and
returns a new type value. I want to be able to do that, but that isn't
necessarily what is wanted for container types like lists. A more
standard generic type is something that tells the compiler things about
the type you want to use, and how to manipulate entities of that type.

If I were to have a procedure like: "proc list(type elementType)type",
would I be able to use it in a declaration in explicit source code, as
in: "list(int) li"? Thinking of "list" as a procedure, that would require
that "list" be executed at compile time, yielding a type that the compiler
would then examine to determine how to handle "li". That may well be
workable, but it is quite different from what other languages do with
generic types. I need to be clear what I'm thinking about. It may well
be possible to not need a distinction, but until that is clear, I need
to be clear.

Another consideration: when decompiling a program containing such a
type, how is it represented? If "list" is viewed as a generic type, then
"list(int)" is the representation of that type in a decompiled view,
and so it needs to be the internal representation as well. However, since
I want to allow the actual type procedures, it seems clear that the
system will end up having types that do not have any associated printable
form that means anything to a human reader. So be it. Note that you cannot
make use of that type for additional declarations unless the editting
facility treats whatever representation of the type is used as a separate
object in and of itself, without trying to interpret the textual form.

I'm beginning to think that all of the pre-defined types and type
constructors in the system should have names beginning with a capital
letter, since they are global symbols.

It may make sense to expand the use of "ref" types. Thus, a list iterator
would actually deal in "ref" types. Similar for a table type. Thus, if
you somehow have "Table(String, TableEntry) myTable", then you should
be able to do "Lookup(myTable, "fred").fieldName := value", and this
would work because the generic "Lookup" returns type "ref TableEntry".

Something that scares me about this sort of thing, is that it is so
close to pointers that it may not be possible for reference counting to
work efficiently with the things you "ref". E.g. to be safe, it looks
like "Lookup" can't return a "ref" to the field of the table struct
that is the value part, but rather has to return a copy of the table
entry's "ref" to that independent entity. I.e., we can't have some
variant of

    struct TableSlot {
	string tab_key;
	TableEntry tab_key;
    }

as the resulting type, but rather must have

    struct TableSlot {
	string tab_key;
	ref TableEntry tab_entry;
    }

instead.

Similarly, how do "ref" values work safely? Do they in fact pad the thing
they reference with another field that is the reference count? How does
this really work? How would it mix in with reference counts on class
objects? Are the two the same thing, i.e. anything that a "ref" variable
references must be a structure that has a reference count as the first
field, and is dynamically allocated? What happens with class inheritance?
With single inheritance, perhaps it is simply that the parent class,
complete with its reference count field, is just the first field of the
struct that represents the child class, and the compiler knows that in
this case it doesn't need to add an extra reference count field.

So, declaring things using the system-level language, what
"Table(string, TableEntry)" yields is something like:

    type TableEntryRef = struct {
	UInt ter_refCount;
	TableEntry ter_te;
    };

    type TableSlot = struct {
	UInt ts_hash;			/* Probably not needed. */
	string ts_key;
	*TableEntryRef ts_entry;
    };

    type TableBody = struct {
	UInt tb_spaceCount;
	UInt tb_usedCount;
	*TableSlot tb_slots;
    };

What am I missing? :-) It certainly looks here like "Lookup", "Enter",
etc. can be generic, in that the same bytecode sequence works for any
types. Note that the "key" types/values must all be the same size. So,
that would somehow have to be a constraint on the first "parameter"
to "Table". Hmm. Perhaps "Table" does have a body. What that body does
is check the arguments, thus producing compilation errors. Wierd, but
it might work.

Grrr. If I do this for "ref" types, then "ref" types don't work as
parameters to functions that operate on arbitrary types. Sigh. It looks
like two different concepts are needed - one that is used for those
parameter situations in order to allow things like assignment operators,
and one for these situations that require a reference count. We don't
want reference counts in the former, otherwise we have to treat specially
any variable that is "ref"-ed in the body of the routine. Is that so bad?
Dunno. Certainly any such reference count has to be initialized to '1',
so that the second type of "ref" handling doesn't try to free it!

Assuming all of that, lets try to write Table. Note that I've made this
one have a fixed number of entries, so that I could skip stuff relating
to dynamic expansion.


/* Externally defined things that Table references: */

/* These types are predefined in the system somewhere. */
type TypeError;
type TypeGeneric;
type Type;
type UInt;
type Bool;
type String;
type GenericProc;
type TypePackage;

/* This is a generic procedure description, that is a description of what
    to match in the set of methods that a type implements. "Implements"
    matches these things. Where it sees Type "TypeGeneric" in such a
    signature, it must see its "t" parameter as the corresponding
    parameter of the "UIntHash" routine that must be implemented by
    the type being checked. */
type UIntHash = proc(TypeGeneric v)UInt;
type Equals = proc(TypeGeneric v1, v2)Bool:

/* These routines are part of the Type package or something. They are all
    pretty trivial, I expect - they simply return or modify internal data
    structures of the Type internal variant structure. */
proc Implements(Type t; GenericProc requirement)Bool;
proc SizeOf(Type t)UInt;
proc TypeGenericNew(UInt byteSize; TypePackage tp)Type;
proc TypeGenericAddTypeParam(Type newType, paramType)void;
proc FindMethod(Type t; GenericProc requirement)Proc;

/* Table itself: */

type package Table(Type keyType, entryType; UInt entryCount) {
    type EntryStatus = enum {
	ts_empty,
	ts_deleted;
	ts_used;
    };

    type TableEntry = struct {
	keyType like String te_key;
	Bool te_status;
	ref entryType te_value;
    };

    type TableBody = struct {
	UInt tb_entryCount;
	[entryCount] TableEntry tb_entries;
    };

    proc check(Type keyType, entryType)Type:
	Type t;

	if not Implements(keyType, Equals) then
	    error("Table key type must implement Equals");
	    return TypeError;
	fi;
	if not Implements(keyType, UIntHash) then
	    error("Table key type must implement UIntHash");
	    return TypeError;
	fi;
	if SizeOf(keyType) > SizeOf(String) then
	    error("Table key type must be no larger than String");
	    return TypeError;
	fi;
	t := TypeNewCompound(SizeOf(TableBody), Table);
	TypeCompoundAddParam(keyType);
	TypeCompoundAddParam(entryType);
	return t;
    corp;

    proc new(ref Table tb)void:
	tb.tb_entryCount := entryCount;
	for UInt i from 0 upto entryCount - 1 do
	    tb.tb_entries[i].te_status = ts_empty;
	od;
    corp;

    public Exception TableOverflow;

    public proc Enter(Table tb; keyType key; ref entryType value)void:
	UInt startHash, hash, deletedIndex;
	Bool foundDeleted;

	startHash = FindMethod(keyType, UIntHash)(key) % tb.tb_entryCount;
	foundDeleted := false;
	hash := startHash;
	while tb.tb_entries[hash].te_status ~= ts_empty do
	    if not foundEmpty and tb.tb_entries[hash].te_status = ts_deleted
	    then
		foundDeleted := true;
		deletedIndex := hash;
	    fi;
	    if tb.tb_entries[hash].te_status = ts_used and
		FindMethod(keyType, Equals)(tb.tb_entries[hash].te_value, key)
	    then
		/* Replace existing entry. */
		tb.tb_entries[hash].te_value := value;
		return;
	    fi;
	    hash := (hash + 1) % tb.tb_entryCount;
	    if hash = startHash and not foundDeleted then
		raise TableOverFlow;
		return;		/* Probably not needed. */
	    fi;
	od;

	/* Insert new entry. */
	if foundDeleted then
	    hash := deletedIndex;
	fi;
	tb.tb_entries[hash].te_value := value;
	tb.tb_entries[hash].te_status := ts_used;
    corp;

    public proc Lookup(Table tb; keyType key)ref entryType:
	/* Put some of the above into a subroutine and share it here. */
    corp;
};


This hurts my head. So much for being a simple language. I think we want
this type of thing to be in a language level that is sort of beside the
other ones, so that most programmers don't have to deal with it.

There is a pretty twisted combination of language features and run-time
system support routines here. I'm not at all sure of some of the semantics,
either. For example, the declaration of TableBody, containing an array
whose size is one of the parameters to the type, is strange. Does the
system have to actually do such a declaration, during the processing of
a declaration involving the use of Table, in order to produce a Type
structure that can be used in the SizeOf call in 'check'? I think that
is doable, in that TableBody is stored internally as a parameterized type
(recognized by this level of the language, based on the use of a type
parameter as the array size). Its pretty wierd, however.

What exactly do the assignment statements of 'entryType' values in
"Enter" do? If I have a pair of "ref X" variables and I assign from
one to the other, is that copying the ref or copying the value? If it
is copying the value, then this isn't a simple assignment statement,
since it is a copy of a value of unknown size. Does that imply we need
a "Copy" method on the type so that we can do this? If so, then the
simple assignment as used here could copy the "ref" values, doing the
right thing with reference counts.

The efficiency of the resulting table isn't very good. The Enter and
Lookup routines have to do run-time FindMethod calls. Ick. Hmm. It might
be possible to do those lookups in the 'check' routine (but where would
we store the results?) or the 'new' routine, and cache the function
pointers in the TableBody. What would be the type of such variables?
Generic proc pointers parameterized by keyType and entryType as needed?
If we are sharing the code for Enter and Lookup among all uses of the
Table type, then we don't have a lot of choice. One of the points here
was to not require new generated code for each table type.

Another problem I have is that it is the code in the type package that
is doing the type checking, and not the compiler, essentially. That
bothers me a bit, but is probably the way it has to be. Hmm. It almost
seems like Enter and Lookup shouldn't use the Table parameter types
at all, since they are run-time functions, as opposed to 'check', which
is a compile-time function. Hmm. 'new' is a run-time function. What are
the rules for what runs when? So far, only 'check' needs to run at
compile time.

Lets try a variant with some of these things done.


Aha! The function pointers to the needed methods are part of the type,
not part of each value of the type. Hence what we want is to be able
to attach some type-implementation-specific storage to the structure
created to represent the type.


type package Table(Type keyType, entryType; UInt entryCount) {

    type TableMethods = struct {
	proc(keyType value)UInt tm_UIntHash;
	proc(keyType v1, v2)Bool tm_Equals;
    };

    type EntryStatus = enum {
	ts_empty,
	ts_deleted;
	ts_used;
    };

    type TableEntry = struct {
	keyType like String te_key;
	Bool te_status;
	ref entryType te_value;
    };

    type TableBody = struct {
	Type tb_tableType;
	UInt tb_entryCount;
	[entryCount] TableEntry tb_entries;
    };

    public proc check(Type keyType, entryType)Type:
	*TableMethods tm;
	Type t;

	if SizeOf(keyType) > SizeOf(String) then
	    error("Table key type must be no larger than String");
	    return TypeError;
	fi;
	tm := new(TableMethods);
	tm*.tm_UIntHash := FindMethod(keyType, UIntHash);
	if tm*.tm_UIntHash = nil then	
	    error("Table key type must implement UIntHash");
	    return TypeError;
	fi;
	tm*.tm_Equals := FindMethod(keyType, Equals);
	if tm*.tm_Equals = nil then
	    error("Table key type must implement Equals");
	    return TypeError;
	fi;
	t := TypeNewGeneric(SizeOf(TableBody), Table);
	TypeGenericAttachPrivate(t, tm);
	TypeGenericAddTypeParam(t, keyType);
	TypeGenericAddParam(t, entryType);
	t
    corp;

    public proc new(Type tableType; ref Table tb; UInt entryCount)void:
	tb.tb_tableType := tableType;
	tb.tb_entryCount := entryCount;
	for UInt i from 0 upto entryCount - 1 do
	    tb.tb_entries[i].te_status = ts_empty;
	od;
    corp;

    proc find(*TableMethods tm; *Table tb; keyType key)*TableEntry:
	*TableEntry te;
	UInt startHash, hash, deletedIndex;
	Bool foundDeleted;

	startHash = tm*.tm_UIntHash(key) % tb.tb_entryCount;
	foundDeleted := false;
	hash := startHash;
	te := &tb*.tb_entries[hash];
	while te*.te_status ~= ts_empty do
	    if not foundEmpty and te*.te_status = ts_deleted then
		foundDeleted := true;
		deletedIndex := hash;
	    fi;
	    if te*.te_status = ts_used and tm*.tm_Equals(te*.te_value, key)
	    then
		return te;
	    fi;
	    if hash == tb*.tb_entryCount then
		hash := 0;
		te = &tb*.tb_entries[0];
	    else
		hash := hash + 1;
		++te;
	    fi;
	    if hash = startHash and not foundDeleted then
		/* Table full. */
		return nil;
	    fi;
	od;
	if foundDeleted then
	    hash := deletedIndex;
	fi;
	&tb*.tb_entries[hash]
    corp;

    public proc TableEnter(ref Table tb; keyType key; ref entryType value)bool:
	*TableMethods tm;
	*TableEntry te;

	tm := TypeCompoundGetPrivate(tb.tb_tableType);
	te := find(tm, tb, key);
	if te = nil then
	    return false;
	fi;
	te*.te_value := value;
	te*.te_status := ts_used;
	true
    corp;

    public proc TableLookup(Table tb; keyType key)ref entryType:
	*TableMethods tm;
	*TableEntry te;

	tm := TypeCompoundGetPrivate(tb.tb_tableType);
	te := find(tm, tb, key);
	if te = nil then
	    nil
	else
	    te*.te_value
	fi
    corp;
};


/* An iterable collection type T1 of elements T2 must export:
    type iterator;
    proc iteratorInit(*iterator it; *T theList)void;
    proc iteratorTerm(*iterator it)void:
    proc iteratorNext(*iterator it)bool;
    proc iteratorValue(*iterator it)ref T2;
*/

type package List(type valueType) {

    type listElement = struct {
	*listElement le_next;
	valueType le_value;
    };

    type listHeader = struct {
	*listElement lh_head;
    };

    public type iterator = struct {
	*listHeader it_lh;
	**listElement it_current;
    };

    public proc check(type valueType)type:
	Type t;

	t := TypeNewGeneric(sizeof(listElement), List);
	TypeGenericAddTypeParam(t, valueType);
	t
    corp;

    public proc new(Type listHeader; *List lh)void:
	lh*.first := nil;
    corp;

    public proc iteratorInit(*iterator it; *listHeader lh)void:
	it*.it_lh := lh;
	it*.it_current := &lh*.lh_head;
    corp;

    public proc iteratorTerm(*iterator it)void:
	it*.it_lh := nil;
	it*.it_current := nil;
    corp;

    public proc iteratorNext(*iterator it)Bool:
	it*.it_current := &it*.it_current**.le_next;
	it*.it_current* ~= nil
    corp;

    public proc iteratorValue(*iterator it)ref listValue:
	it*.it_current**.le_value
    corp;

    public proc ListDeleteAndNext(*iterator it)**listElement:
	*listElement next;

	next := it*.it_current*;
	it*.it_current* := next*.le_next;
	it*.it_current
    corp;
};


type intList = list(int);
intList theList;

theList := ...;

for int i in theList do
    doSomething(i);
    if i = 0 then
	ListDeleteAndNext();
    fi;
od;


translates to:


type intList = list(int);
intList thelist;

theList := ...;

intList.iterator it;
intList.iteratorInit(&it, &theList);
while intList.iteratorNext(&it) do
    int i;

    i := intList.iteratorValue(&it);
    doSomething(i);
    if i = 0 then
	ListDeleteAndNext(&it);
    fi;
od;
intList.iteratorTerm(&it);


020428/Sunday

Want to use 'ref' in the above example, not pointers. Don't really want
explicit 'ref', though - that's not my desire for what 'ref' is. So,
we want the list elements to be garbage collected entities. We don't
want to have to make a class for them, however. So, that leads to the
suggestion that a 'struct' is like a traditional C/Draco struct. We
use the reserved word 'record' with the same concept, but when you
declare a variable, field, parameter, etc. of a record type, you are
actually getting a pointer to such a record. Record types have the
'new' operator to allocate a new one, with all of its fields
initialized. Struct types don't have that. Hmm. If struct types are
to be usable at all in the application language, then they need to
be initialized. However, so do all variables used in that language,
so perhaps its no big deal. However, simple variables can be
initialized explicitly in their declarations - that's much ickyer
with struct variables. What about arrays? How are they initialized
in the application language? I guess in that language, all new
entities are initialized. Might similarly need an array distinction.
Fixed dimension arrays are akin to structs, but dynamically sized
arrays (as in Java) are akin to records. Language is growing, and
the variants are diverging!

Note that the fact that the above 'list' type has ListDeleteAndNext
does not directly relate to it being an iterable type - that is a
separate feature of this particular list package.

Try redoing the list package with records and a few ref parameters,
but no explicit pointers. If can do it, then that generic type could
also be done in the application language, and the deletion operation
relying on garbage collection is OK. I think we might want the
garbage collection to not work on pointers, since they are untrustable.


020429/Monday

Random thoughts on GUI:

- need a "draggable line", which can be used for those title bars, etc.
    where you can resize the columns (same for rows).

- components may need several drawing implementations, size requests:

    - preferred size: would like this to draw properly
    - minimum size: below this, don't bother call drawing routines

    - maybe can use same drawing method for all, but it would be passed
	a size that it is drawing in. That might be the real size of the
	displayed space. It might be the preferred size in a scrollable
	display. It might be a very much smaller space where the
	component should draw its top-left corner only. For example,
	if a spread-sheet cell isn't big enough, the cell contents should
	do the best it can within that space.

Random thoughts on a spreadsheet:

- could make multi-dimensional, but no true need, I expect. Allowing only
    1 dimension seems pointless - extra code for little benefit, since
    user could just a one-row or one-column 2-D spreadsheet. Perhaps menu
    for dimensionality contains "2", "3" and a numeric input box? Only if
    menus stay down.

- each cell would have contents that is a variant record. One variant is
    a direct value. Another is an independent function to compute the
    contents. Another could be a function and a specification of which
    other cells (others in row/column, selected range of row/column)
    to accumulate the function over. Could be combos. So, some of these
    ought to be functions provided by the spread-sheet application, which
    apply a supplied function of a specified range.

Random language thoughts:

- the old language library showed that you cannot be type safe with things
    like token values if the set of tokens can change. Have to do something
    else. Token kinds would work with a variant struct.

- likely want to define all reserved words from all languages, so that if
    a piece of code has to move languages, it can't have conflicts of
    identifiers with reserved words.

- one thought (a bit wierd) - encode 1 - 4 characters of the operator
    tokens directly into a 32 bit value. Then languages can define
    whatever operator tokens they want. Any reason for this? Look
    through the language library stuff again before diving into this.


Latest version of the List package, allowing ListDeleteAndNext:


type package List(Type valueType) {

    type listElement = record {
	listElement le_next;
	valueType le_value;
    };

    external = listElement;

    public type iterator = struct {
	*listElement it_current;
    };

    public proc check(Type valueType)Type:
	Type t;

	t := TypeNewGeneric(sizeof(listElement), List);
	TypeGenericAddTypeParam(t, valueType);
	t
    corp;

    public proc new(ref external le)void:
	le := nil;
    corp;

    public proc iteratorInit(ref iterator it; *external pTheList)Bool:
	it.it_current := &pTheList;
	pTheList* ~= nil
    corp;

    public proc iteratorTerm(ref iterator it)void:
	it.it_current := nil;
    corp;

    public proc iteratorNext(ref iterator it)Bool:
	it.it_current := &it.it_current*.le_next;
	it.it_current* ~= nil
    corp;

    public proc iteratorValue(ref iterator it)valueType:
	it.it_current*.le_value
    corp;

    public proc ListDeleteAndNext(ref iterator it)void:
	listElement next;

	next := it.it_current*;
	it.it_current* := next.le_next;
    corp;
};


type intList = list(int);
intList theList;

theList := ...;

for int i in theList do
    doSomething(i);
    if i = 0 then
	ListDeleteAndNext();
    fi;
od;


translates to:


type intList = list(int);
intList thelist;

theList := ...;

intList.iterator it;
if intList.iteratorInit(it, &theList) then
    while 
	int i;

	i := intList.iteratorValue(it);
	doSomething(i);
	if i = 0 then
	    ListDeleteAndNext(it);
	fi;
	intList.iteratorNext(it)
    do
    od;
fi;
intList.iteratorTerm(it);


And a version with no pointers (and so can be written in a language other
than the systems one), but that does not allow ListDeleteAndNext:


type package List(Type valueType) {

    type listElement = record {
	listElement le_next;
	valueType le_value;
    };

    external = listElement;

    public type iterator = struct {
	listElement it_current;
    };

    public proc check(Type valueType)Type:
	Type t;

	t := TypeNewGeneric(sizeof(listElement), List);
	TypeGenericAddParam(t, valueType);
	t
    corp;

    public proc new(ref external le)void:
	le := nil;
    corp;

    public proc iterate(ref iterator it; external theList)Bool:
	it.it_current := theList;
	theList ~= nil
    corp;

    public proc iteratorTerm(ref iterator it)void:
	it.it_current := nil;
    corp;

    public proc iteratorNext(ref iterator it)Bool:
	it.it_current := it.it_current.le_next;
	it.it_current ~= nil
    corp;

    public proc iteratorValue(ref iterator it)valueType:
	it.it_current.le_value
    corp;
};


type sintList = List(sint);
sintList theList;

theList := ...;

for sint i in iterate(theList) do
    doSomething(i);
od;


translates to:


type sintList = List(sint);
sintList thelist;

theList := ...;

sintList.iterator it;
if sintList.iterate(it, theList) then
    while 
	sint i;

	i := sintList.iteratorValue(it);
	doSomething(i);
	sintList.iteratorNext(it)
    do
    od;
fi;
sintList.iteratorTerm(it);


By specifying the name 'iterate' explicitly in the above, we are allowing
a type to have multiple iterators. However, they must all use the same
"iterator" type for storing the state. Shouldn't be a big problem. For
example, a 'Chain' type could export "iterateForward" and "iterateBackward".


Aha! It was the presence of the list header that allowed the delete
operation to work. The list could have been provided as a value, and
so even using an address of what was passed wouldn't necessarily work.
Also, having the list passed as a ref argument wouldn't then work. So,
put back the list header structure. It has fields lh_first and lh_current.
If lh_current is nil, then we are referring to the first element of
the list, else we are referring to the one after where lh_current points.
That allows the deletion to work. All without pointers or explicit ref's
on anything other than the iterator structure. I think. Try it:


020431/Tuesday:

type package List(Type valueType) {

    type listElement = record {
	listElement le_next;
	valueType le_value;
    };

    type listHeader = record {
	listElement lh_head;
    };

    external = listHeader;

    public type iterator = struct {
	listHeader it_header;
	listElement it_current;
    };

    public proc check(Type valueType)Type:
	Type t;

	t := TypeNewGeneric(sizeof(listElement), List);
	TypeGenericAddParam(t, valueType);
	t
    corp;

    public proc new(ref external lh)void:
	lh.lh_head := nil;
    corp;

    public proc iterate(ref iterator it; external theList)Bool:
	/* it_current essentially always points one back in the list
	   from what is the true "current" element. This allows us
	   to delete the "current" element, and to insert before it.
	   So, when the "current" is the first element in the list,
	   it_current is nil, and we use it_header. */
	it.it_header := theList;
	it.it_current := nil;
	theList.lh_head ~= nil
    corp;

    public proc iteratorTerm(ref iterator it)void:
	it.it_header := nil;
	it.it_current := nil;
    corp;

    public proc iteratorNext(ref iterator it)Bool:
	if it.it_current = nil then
	    it.it_current := it.it_header.lh_head;
	else
	    it.it_current := it.it_current.le_next;
	fi;
	it.it_current.le_next ~= nil
    corp;

    public proc iteratorValue(ref iterator it)valueType:
	if it.it_current = nil then
	    it.it_header.lh_head
	else
	    it.it_current.le_next
	fi.le_value
    corp;

    proc deleteAndNext(ref iterator it)void:
	if it.it_current = nil then
	    it.it_header.lh_head := it.it_header.lh_head.le_next;
	else
	    it.it_current.le_next := it.it_current.le_next.le_next;
	fi;
    corp;

    proc insertBefore(ref iterator it; valueType newValue)void:
	listElement le;

	le := listElement.new(nil, newValue);
	if it.it_current = nil then
	    le.le_next := it.it_header.lh_head;
	    it.it_header.lh_head := le;
	else
	    le.le_next := it.it_current.le_next;
	    it.it_current.le_next := le;
	fi;
    corp;

    proc insertAfter(ref iterator it; valueType newValue)void:
	listElement le;

	le := listElement.new(nil, newValue);
	if it.it_current = nil then
	    le.le_next := it.it_header.lh_head.le_next;
	    it.it_header.lh_head.le_next := le;
	else
	    le.le_next := it.it_current.le_next.le_next;
	    it.it_current.le_next.le_next := le;
	fi;
    corp;
};


type sintList = List(sint);
sintList theList;

theList := ...;

for sint i in iterate(theList) do
    doSomething(i);
    if i = 0 then
	deleteAndNext();
    elif i = 1 then
	insertBefore(0);
    elif i = 1000 then
	insertAfter(1001);
    fi;
od;


translates to:


type sintList = List(sint);
sintList thelist;

theList := ...;

sintList.iterator it;
if sintList.iterate(it, theList) then
    while 
	sint i;

	i := sintList.iteratorValue(it);
	doSomething(i);
	if i = 0 then
	    sintList.deleteAndNext(it);
	elif i = 1 then
	    sintList.insertBefore(it, 0);
	elif i = 1000 then
	    sintList.insertAfter(it, 1001);
	fi;
	sintList.iteratorNext(it)
    do
    od;
fi;
sintList.iteratorTerm(it);


Commentary on this:

The 'ref' type modifier can only be used on parameters. It is used to
allow a proc to modify the so-passed parameter. There is no way, other
than parameter passing, to produce a 'ref' value. 'ref's in and of
themselves are not reference counted or garbage collected. Procs which
take 'ref' parameters may only be passed individual variables, they
may not be passed fields in struct or records, or array elements. This
inconvenience is done to prevent the need for keeping track of refs
that point into the middle of things. Allowing such refs without
keeping track of them in regards to the full entity they are pointing
within would allow the refs to be made invalid when applied to a value
inside a dynamic entity that could be garbage collected away.

'record' types are akin to the records in AlgolW. All record values are
internally pointed to. Hence assigning record values is assigning those
internal pointers. Record types are reference-counted and garbage
collected. The fields of a record can be modified by anyone who has
a non read-only handle to the record. Record types can thus contain
references to the same type. Manipulation of lists, etc. constructed
from record types is done the same as it was in AlgolW - since you
cannot have a pointer to a pointer, you have to use 'if's all over
the place to special case empty lists.

'struct' types are the same as traditional struct types. A struct
variable or field appears directly in its context - there is no
indirection involved. Thus, struct types cannot contain direct
references to the same type - the reference must be indirect via a
pointer declaration, just like in C, Draco, etc.

In this List example, type "iterator" can be a simple struct since
it cannot be referenced outside of the scope it is declared in, and
the iterator class procs have no need to keep a pointer to it - they
are all passed it, and could pass it to other such procs if needed.
Type "listElement" needs to be a record, however, so that it can be
garbage collected when deleteAndNext is used. Similarly, type
"listHeader" must be a record so that it can be remembered in the
"iterator" and modified as needed. If we allowed "ref" struct fields,
then it could be a struct, but I choose not to do that.


Variant records. Example:

/* System declaration we make use of: */

type UInt = record {
    uint theUint;
};

/* Stuff here could be part of the system type facility: */

type Type = forward;		/* forward declaration */

type ArrayDesc = record {
    Type ad_elementType;	/* type of the array elements */
    uint ad_dimCount;		/* how many dimensions the array has */
    [] uint ad_dimensions;	/* array of the dimensions (0 origin) */
};

type GenericParam = oneof {
    case gp_kind in
    incase gpk_type:
	Type gp_type;
    incase gpk_numeric:
	UInt gp_numeric;
    esac;
};

type GenericParamList = record {
    GenericParamList gpl_next;
    GenericParam gpl_param;
};

type GenericDesc = record {
    TypePackage gd_package;	/* package that implements the type */
    GenericParamList gd_params;	/* list of parameters to the package */
};

type TypeData = oneof {
    case td_kind in
    incase tdk_simple:
	UInt td_code;
    incase tdk_array:
	ArrayDesc td_array;
    incase ...
    ...
    incase tdk_generic:
	GenericDesc td_generic;
    esac;
};

type Type = record {
    uint t_byteLength;
    uint t_alignment;
    TypeData t_data;
};

...

Type t;

t := Type.new(4, 4, TypeData.tk_simple(UInt.new(TYPE_UINT)));

...

/* Construct an initial generic type representation with no parameters. */
proc TypeGenericNew(UInt byteSize; TypePackage tp)Type:
    Type.new(byteSize, 8, TypeData.tk_generic(GenericDesc.new(tp, nil)));
corp;

/* Add a type parameter to a pre-existing generic type representation. */
proc TypeGenericAddTypeParam(Type_t t, paramType)void;
    TypeData_t td := t.t_data;
    case td.td_kind in
    incase tdk_generic:
	GenericDesc_t gd := td.td_generic;
	GenericParamList_t gplNew :=
	    GenericParamList_t.new(nil, GenericParam_t.gpk_type(paramType));
	if gd.gd_params = nil then
	    gd.gd_params := gplNew;
	else
	    GenericParamList_t gplScan := gd.gd_params;
	    while gplScan.gpl_next ~= nil do
		gplScan := gplScan.gpl_next;
	    od;
	    gplScan.gpl_next := gplNew;
	fi;
    default:
	System.RunError("Non generic type passed to TypeGenericAddTypeParam");
    esac;
corp;


020501/Wednesday

Grr. When do I use "type" and when do I use "Type"? It may not matter in
many cases, but I'd like to have some idea, so that I can at least be
consistent. Given that there will have to be a fair amount of pre-
definition in the system, that isn't done in the language itself, perhaps
its possible to use "Type" darn near everywhere. The only thing so far
that *has* to be "type" is the "type package" syntax, and the declaration
of "Type" itself. Hmm. Just started typing in "Type.a". If the proper way
to talk about "Type" is as "Type.Type" (is it allowable to use the same
symbol for a package name as for a public symbol exported from that
package? Likely not!), then using "type" is much more convenient. Ok,
lets make the package be "Types".

For some reason, I haven't been appending "_t" to the type names I've
been using here. I just tried adding them to "TypeGenericAddTypeParam"
(which I think should be "GenericAddTypeParam" in the "Type" type
package), and I'm not convinced it helps a lot. Makes the lines longer!

I/O? Most of the system is GUI based, but there will undoubtedly be
shell windows, and so there should be easy ways to do simple non-GUI
I/O.  *Perhaps* that will prove to be unnecessary - time will tell. If
any is needed, my current thinking is to go with a system like Draco's.


020507/Tuesday

It would be cleaner in some sense to use the "List" generic type for
the lists in the Types.a package. There is a chicken and egg problem
there, but there are probably ways around that. However, is a List
type a valid type for a oneof? Should it be? Is there a way to look
through the d_generic value to look behind it?


020510/Friday

Almost gave up on the generic concept as it has been shaping up. One
big problem is how to safely get pointers to the functions needed by
the generic code stored away somewhere. There is just *no* way to do
that in a type-safe manner, since there isn't a way to safely declare
a type that is a proc type with an unknown number of parameters. I
think the answer is to just have a type that represents that. In the
latest example, I had called it Action, then changed it to Proc on
Don's suggestion. So, let's have a system-provided type, named Proc,
which can only be assigned a proc pointer, *along with its type*.
Assigning from a Proc to a proc type similarly requires the type of
the proc type we are assigning to. These routines are part of the
builtin-type Proc, but must be written in the implementation language
because they are not type-safe.

To generalize on how the above might be declared, and to arrange that
functions in a generic proc be passed the type of their arguements,
add the new proc parameter syntax that specifies that the type of
some previous argument must be passed. *Only the compiler* can
provide (or generate code to provide) the actual argument to such a
parameter.


020511/Saturday

The language family, operating system, GUI, etc is called Zed.
The implementation language is called Zil.
The application language is called Zap.
The common language is Zcl, pronounced "zikel".
The operating system could be Zos. The user interface could be Zin.


020515/Wednesday

Basically everything should be equally operable with the keyboard or
the mouse. For example, if we have the folder-style stack of tabs, then
you should be able to use arrow keys or your default forward/backward
(likely ^F and ^B for emacs-compatible) keys to move among the tabs.
Within such a tab frame (get there with down arrow?), the usual Windows
TAB and Back-TAB should work, but so should the forward/backward, and
directional keys in general. Also, the tabs should all have labels on
the tabs, and selected letters in the labels should get you directly to
that tab from the tab-set context. Similarly, within a frame/page
(whatever they are called - the things with the individual tabs on top)
you should be able to type a key corresponding to a highlighted letter
in the the title of an entity on the page, and focus goes there, allowing
typed input, or going to a set of radio buttons, which you can cycle
activation through, or going to a set of non-exclusive buttons, where
space/return and/or backspace toggle their settings.

If can get proper raw keycodes, should be able to make ^H, BACKSPACE
and DEL all be different. So, Gnu lovers can use ^H for help, but still
have working BACKSPACE and DEL keys.

Suggestion: finish off the closest I can come in Z to the Mapping package.
Then translate the generic one to C, as closely as possible. Should be
able to use it for the language symbol tables, just like in the real thing.

Come up with a file listing the names of the system packages, and
descriptions of what the package is about/for.

For a C starter version, can perhaps have the package name be a large
C struct, fully declared in a header file. Then the notation
<package>.<export> will work as is, thus reducing later translation effort.

020815/Friday

In the stuff I'm doing so far, I've used Package_Export to do this. The
other way wouldn't really work as well, since I would need, for example,
pointers to functions, which causes some inefficiency.

From cg  Thu Aug 15 20:51:20 2002
Return-Path: <cg>
Received: (from cg@localhost)
	by ami-cg.GraySage.COM (8.9.3/8.9.3) id UAA28198;
	Thu, 15 Aug 2002 20:51:20 -0600
Date: Thu, 15 Aug 2002 20:51:20 -0600
From: Chris Gray <cg@ami-cg.GraySage.COM>
Message-Id: <200208160251.UAA28198@ami-cg.GraySage.COM>
To: djr@nk.ca
Subject: conundrum
Status: R

[By the way, the BBQ, etc. is the Saturday evening of the train show
weekend, not the Sunday evening.]

How to do circular type references? In C you can do "struct blah", and the
compiler is happy without having seen it. In Draco you can do something
like "type blah;" and the compiler is happy to deal with pointers to it.
In Zed, with a package, you can predeclare a type with something like
"type blah = record", so that the system knows what kind of type it will
be. At least, you can do that for struct, union, record and oneof. I don't
know if anything else matters, since you could always wrap those in a
one-element struct.

If someone is developing an interdependent pair of packages online, then
you just start with the pre-declared type in one package, then reference
it from the other package. When the 2nd package is done, go back to the
first, and you can now flesh out the predeclared type based on the other
fleshed-out package.

However, what about off-line development, where both package sources are
coming from (likely separate) source files (or whatever)? I've bumped
into this doing some compiler stuff - proc headers need symbol tables
to represent scopes, but symbol table entries can be function parameters
or local variables, which should logically be defined in the "proc" package.

Any thoughts?

-cg

020815/Friday

In answer to the above email, what I've ended up doing is "predeclaring"
types, as in:

    external type Package.Type_t;

I'm taking this to mean that the type is defined for the current package
compilation, but nothing is known about it, other than that it is a
referenceable type. I.e. record or oneof. I've had thoughts in the past
that I need to have inter-package references done in some kind of
abstract way, and so this has now become a firm need. I'm not sure I'll
get the final answer (requiring version numbers, originator numbers,
proof-of-originator, etc.) for quite some time. However, for now, I'll
do whatever works out here. Perhaps just a list of symbols representing
the path to the final whatever from the root of the system tree. All
such references must be resolved before the package can be used.

Oops. I've had to add the ability to reference an external enum type. That
means that the external references have to include what kind of thing the
reference is to, its not just to a type. OK, so you can now do:

    external type Package.Type_t = record;
    external type Package.Type_t = oneof;
    external type Package.type_t = enum;

This hence requires that all enum types are the same size. Sigh.

020822/Thursday

I've done some work over the last few days, but its been mostly programming.
I've changed a bit the way Array types are built, and the way Oneof types
are built. The previous way for oneof's didn't even work, since it
required the external guy to use the OneofDesc_t constructor!

Just updated RefNotes with thoughts on oneof's and ref's.

One of the next things to do is to make the system as a whole use the
ref-counted "STRING" type everywhere.

020902 Monday

Woke up at 6:00, with Zed going through my mind. Eventually got back to sleep.
One thing that sticks out, however, is the thought that I might in fact
end up with something a lot like Java's constant table. I don't remember
any details of that (other than that it seemed complicated and I didn't
like it!), but the reason for it could well relate to strong typing. My
current kluge for execution structures and bytecode requires typecasts
all over, and uses pointers to stuff. I can clean it up a lot by adding
more kinds to ExecKind_t, which I will be doing. That way, the entire
Exec_t will end up as a oneof, which is as desired, and assumed. However,
what about references to things from bytecode? I can't embed a pointer
in bytecode in any type-safe way. Hence, embed using indices into a
constant table, just like Java did. Hence I might end up with something
very similar. Oh well. I wonder if I switched to actual Java bytecode
I could take advantage of Java implementations somehow? I doubt it, since
my semantics are so different.

A thought about persistence of things - its much more efficent to persist
an array or matrix than a linked list. Remember that the MUD system used
an array representation for "list" types, and that worked fairly well.
This leads to the thought that eventually, I might end up with parallel
representations of some things. Maybe. Perhaps just do some of the other
types like I did the array type - build a temporary linked list of stuff,
and then do a final step of conversion to the more efficient array.

Might want a separate representation in some cases anyway, since there
are going to be temporary values at "run time" (whatever that is), that
don't make sense to persist. An example would be having a field in the
Proc.ProcBody_t that is the current local variable offset - it goes up
and down as scopes with variables are entered and left.

Been idly thinking about Package references and paths a bit. I think I
will need the concepts of absolute and relative path. Absolute paths work
for system installed things. Relative paths work for user stuff, and
likely applications as well. I think I'm going to have to do something
about paths real soon. What do I start an absolute path with? Initial
candidates are '.' and ':'. If '.' works that doesn't need another token,
and is parallel to UNIX FS paths. So, e.g. "IntToString" becomes
".BI.IntToString". How will that look? Things get uglier - there will
be nested packages for GUI things, so we could end up with a lot of very
ugly code containing e.g. ".GUI.Window.Create(....)".

On the subject of nested packages... I will need them. There will be
projects that consist of many packages of code, and I need to be able
to contain them within something. Paths are pointless without them! So,
how do they look? I've thought of using a "path" construct that says
where the package fits relative to other things. Alternatively, just
use a path in the package name in its declaration. Cleaner, if it works.

It is likely very wise to keep the system such that it can be compiled
from text. I.e. not end up with just the database/filesystem, and no
way to start from scratch. So, that means lots of source "files" and
package bodies. Also lots of inter-package references. Rather than
duplicate such things, it might make sense to either have some sort of
pseudo-header files for packages, or to have a #include facility, so
that one common set of declarations can be included by all packages that
need to reference some other package. If I do the semantics of such
declarations right, which I think is where the last day or so has been
leading me, then that should be sufficient. Perhaps rather than calling
it something like #include, how about "use path-to-used-package". For
online development, you don't have to do that explicitly in what you type,
since the system can figure it out from the paths you use to things. However,
when the system emits a package "source", it should put them in. For the
hosted environment like on Linux, there are actual files referenced by
those 'use' lines. Are there for online development? The system can
certainly produce one for a given package. How do you develop a set of
mutually referencing packages? I *don't* want "nested includes". I think
I will just put this off until I bump into it.

Perhaps one of the things I should do today is go look at that site for
the "BRix/Crush" system/language. ... Well, it started out looking good.
There are definite similarities between it and Z. However, as I got
further in reading, it became clear that he hasn't really used his
language for anything yet, and it isn't close to being able to be used
for anything. Also, all he really has is a bunch of X86 assembler code.
Gack - there are lots of those out there. I'm doing it the other way
round (naturally!).

Looking back at the paths notes above. Do I need to be able to do the
equivalent of ".." in paths? If I have a top-level package for a project,
called "Top", and it contains packages "Sub1" and "Sub2", how does the
definition of "Sub1" reference things in "Sub2"? Are types defined in
"Top" fully known in "Sub1" and "Sub2", or vice versa? I don't think you
get both - there has to be *some* order in which the detail processing
is done. I'd say that the top level package is "compiled" first, and thus
any types it has are fully available to packages it contains. Is it all
types or just "public" ones? As a first guess, I'd say that all types
are known by contained packages, but only "public" ones are exported
to containing and sibling packages.

020903/Tuesday

A problem I had thought about with the MUD system was that of how to
"draw a box around" a chunk of the database in order to output it as
source. Some ideas like ownership, a grouping key on each item, etc.
With Zed, I think the clue is in both the type of the entity, and
special handlers for some types. For example, in order to linearize
a package, you start with something, like the package structure, and
start turning it into a byte-stream. You first send a linearized form
of the types you need to deal with. That can be done by having a
linearizing function attached to the Types code (and the reverse for
the other end), so that you can at least start. When a package is
inserted, it needs to find things it wants to reference. It will do
that by the Ref paths that are stored in it. Thus, the trick will be
to have linearizers for ref things, so that they linearize the Ref,
which is just a bunch of strings, rather than trying to follow the
pointer and linearize what the Ref references. This could be keyed
on the "ref" ExecKind_t that I've been thinking about.

Printing is similar (ToText routines) - they don't follow Ref paths -
they just display them. Might not be needed really, since it is
pretty clear what is part of a package and what isn't. Probably
can't hurt to allow for custom ToText and ToBytes handlers. May
well be necessary to avoid special code in the system core.

Lots of details to be worked out, but I think its time I dove in
started building Ref paths.

Add the concept of a Path as something that can be specified. They
are shorthands for actual package paths. E.g.

    path PWindow = .GUI.Window;
    ...
    PWindow.Create(...)

They might eventually get real handy. The semantics are very simple -
they are like macros. They just make things look better and be easier
to type.

A Package should have a pointer to its parent package. This allows one
to easily reconstruct the path to it. Perhaps there should be a pair
of such pointers, often the same. One points to the parent package in
the set of packages that is a larger project. One points to where this
package fits in the overall package tree. Thus there could in fact be
two instantiations of the same set of packages (??), that only differ
in where the main package points for its place-in-tree. I don't think
this makes a lot of sense. But, packages should have a parent pointer.

020904/Wednesday

If a path starts with '.', it is an absolute path, and you start
searching at the "root". If it doesn't, then you start searching at
the top-of-this-set-of-packages point. All references within the set
of packages are relative to that point. Thus, we don't need the
package path equivalent of "..". Do I put a leading '.' in the paths
for the defining names of the system packages (like "BI")? It may
not matter, so I won't bother for now.

Yes it matters. If I don't use the leading '.', then I don't know to
enter the new symbols into the root - I would put them into a local
root instead.

020908/Sunday

Some idle-time thinking about "object oriented programming". This doesn't
necessarily relate at all to Zed.

If class Child inherits from class Parent, then instances of Child have
all of the operations of Parent as well as Child. If Parent declares a
method as abstract, then it is essentially not supplying an implementation
itself, but requiring any inheritor to provide one. This seems an awful
lot like a Java interface, but more obscure. If the Parent does provide
an implementation of the method, isn't it overridden by a Child version
for instances of Child? Doesn't that break any protection, in that any
semantic rules that Parent wishes to impose, based on its methods, is
broken, since there is no guarantee that the Parent method is called
at all. Even with constructors/destructors, it appears to be the choice
of the Child method as to whether or not to call the Parent method. Not
sure about that - something tickling my mind about there having to be
such a call, but it is the choice of the Child routine as to where. I
think it might be clearer for all such methods to make it so that the
Parent method is called first (last for a destructor of course), and
can *prevent* the call of the Child method. This is a lot like the
way I used 'status' values in MUD.

In Java, and perhaps C++ as well, one often makes generic routines that
handle all sorts of things that provide an interface. When a method is
provided by Child for one, then that method is called, with a pointer
to a Child object, even if the calling situation only had a Parent
pointer. Is that true? I'm unclear, but I think something of that
nature goes on. E.g. with GUI operations, the "Draw" call is a Child
draw call, and that needs a Child pointer. I believe this is a safe
thing to do, and perhaps even a very useful thing to do. What I don't
much like are the "casts" which take a Parent pointer and force it to
being a Child pointer, with a run-time type check hidden in there. I
think if you need to do that, why not pre-declare the Child type, and
have a slot in the Parent record for a pointer to the Child record?

The whole concept of the kind of inheritance that C++ does, which is
structural inheritance, doesn't fit well with what I'm doing in Z, with
record and oneof and matrix types. Have to look more at what Java does.
Perhaps it actually does nest the structures - I'm guessing so. I'm also
not terribly pleased with the whole concept of the descriptor tables that
go on with C++ and likely Java. Complex, hidden, poorly understood. Not
sure there is an alternative if you want to do "Draw" call above.

Ah yes, one problem with the virtual function tables is that they tend
to be pretty static. Its virtually impossible to add elements to them,
i.e. to add methods to a class that has descendants. In my fully
interactive environment, its not clear that that is something I can
have. There *have* been things done in C++ that allow more flexibility,
but they are built up on top of C++ stuff, and are not simply using
the language directly. Probably similar with Java. The thing here is
that my vision is that Zed is an interactive programming language -
the very nature of programming is fully interactive, working within
a live system. I don't want "recompilation" of whole big messes to be
necessary to do things like add a method. I *am* aware that sometimes
recompilation is somewhat necessary, but I want it minimized.


021121 Thursday

Haven't had time to do anything on this stuff for ages. Lego was the big
holdup for a while, as was TV. Now I'm working late on our crush until
Christmas. Sigh.

At work, some folks use some BSD list manipulation macros. In at least
3 cases, they messed up and did other list manipulation things while
iterating down those lists. Messup! So, what can be learned from that
for the lists in Z? Well, in the old example, I've certainly had the
abilities to iterate down the list, and to remove elements from the
list (well, at least the current iteration position element). Also, I
believe I had the iteration control structure in local storage. Now
the question is: can there be more than one iteration of a list at a
given time? If the answer is yes, then when an element is deleted, you
need to be able to patch up all iterations. So, you would want the
iterator constructor/initializer to link them all together so you can
find them. Getting messy, but:

What if you have a "return" statement in your language? In the case of
the iterators, if the inner ones only point to the outer ones, it is
likely OK to simply discard the inner one. Getting picky here - this is
supposed to be a safe programming language, and there is nothing
preventing a ref to the inner from being put into the outer. I think -
I find I'm hazy enough on the language that I can't remember! Sigh.

Even worse than a return statement, what about exceptions? They go
foofing out of many contexts. If you can catch exceptions at random
points, its tough to know what is safe to use. If it works, I'd want
the language semantics to be that you can't do the sort of thing I'm
talking about here. Hmm. Without storing a *pointer* to it in the head
of the actual list, there is no way to link together the iterators,
since the interator constructor/initializer has no way of knowing that
others exist, or of finding them. That implies storing the head of
the chain of current iterators in a head node for the list. Ick. But,
if you do that, then the "return" or exception cases suddenly make the
inner iterator, pointed at by the list head node, invalid. The key is
perhaps that you just plain can't permanently take the address of
something in the safe language.

Hmm. I think the last sentence above is true, and is sort-of the
answer here. The functions that use the iterator structure cannot
create a permanent pointer to it, since it is only passed as a ref
value to them, and you can't save ref values. So, you can't build
the chain of iterators. Then, how do you allow element deletion from
the list? The answer has to be that there is a list header, and it
has an indication of the presense of an iteration, and you can't
delete an element if there is one. Perhaps the trick is that the
flag is set only when a second iteration is started - if there is
only the one iteration, then passing it to the delete-this-element
function allows it to be updated properly. I believe that is done
in the list example way up above. Ok, I think I can live with this.


021203 Tuesday

Some thoughts from a late walk home and in the shower. :-)

The system will need some pretty capable searching facilities. For example,
the ability to search all text fields or subfields/elements... in the
entire (or a portion of) system will be needed. A search menu could
include submenus for types of searches. It would be nice if that menu
is extended automatically as new search modules are added to the system.
Perhaps a search capability is one of the optional things that any
compound type can include. Most won't, since its not relevant. So, for
example, an image type could include an entire menu of image searching
and classification capabilities; a music type could include capabilities
there. The basic system could have a numeric type that does simple
comparisons, ranges, function evaluations, and combinations thereof. Text
searching needs to have patterns. There could be a bunch of pre-cooked
patterns for, e.g. whole numbers, floating point numbers, IP addresses,
capital letters, vowels, etc. There should be a way to find whole numbers
in text, convert them to actual numbers and apply the number classification
stuff to them. Searches should be able to run an arbitrary operation
on each found item. Pre-canned complex searches should be combinable.

Writing a server for the system could be fairly easy. A template could
provide the pieces for a simple one-request-one-response server, and be
expandible for multiple request types. A server registers a major operation
code with the system - messages arriving (messages are of course just a
basic system structure which contains the major operation code as one
numeric field) with that code are passed to the registered handlers, and
thus the server is created with little fuss.


021231 Tuesday

On searching for symbols. I think the answer is that there are two
concepts that interrelate. One is what I've been calling a "package",
which is a set of things that implement a given functionality. Another
I'll call a "unit", which is a part of a package. The "unit" corresponds
more to a traditional source file, in that it is the set of things that
can be displayed in a "text" editor window. Thus, a leaf package will
consist of one or more units. In the simplest case there is only one
unit, and so the two are pretty much equivalent. In a more complex
package, a big package can contain multiple sub-packages, each of which
can contain other packages or units. Any package, whether leaf or not,
can contain units. A package must contain at least one other package or
one unit, else it is an empty package.

So, searching for a initial symbol in a non-absolute path starts at the
innermost scope of any current proc, working outward, ending with the
proc's parameters. If the symbol is not found that way, then the current
unit symbol table is searched. From there, the search goes out to the
containing package, and continues out to the top-most package. That topmost
package will have a position in the main system tree, but searching does
not go beyond it.

An absolute path searches only in the top system table.

The package header in a unit gives the path to that unit within the entire
package that it is part of. Thus, it can be a path itself, starting with
the name of that outer package.

Should a package list the packages and units that comprise it? This might
be desireable from a security point of view. Does it matter?


030106 - Monday

A note from work. Likely irrelevant. When generating native code, it would
be good for blocks of code that are usually skipped over (e.g. the DEBUG
macro stuff at work) to be moved down to the end of the routine. That
way, the CPU can work better on the assumption that forward branches are
usually not taken, which is what I recall happening in the absence of
branch hints or a branch taken table.


030108 - Wednesday

Had a mental image of the process of programming natively. The screen would
show a chunk of code. Areas where the compiler is complaining would be
highlighted in some other colour. You could execute your program up until
it actually hits an erroneous routine (or perhaps even erroneous code in
some function).

If you use "<type> <var> := <expr>;" then you are initializing the variable
to the value, and can assign to it later. If you use "<type> <var> = <expr>;"
then you are doing the same, but no further assignments to the variable
are allowed. If "<expr>" can be computed at compile time, then it is, and
"<var>" is then a constant.


030111 - Saturday

Perhaps replace 'ref' parameters with a choice of 'res' or 'valres'
parameters. 'res' allows the compiler to know that a variable, passed to
such a formal, need not be initialized, and might also warn about any
assignment before the call, if the value is not subsequently used. For
a 'valres' parameter, it is explicitly not stated whether the value is
copied in on entry and/or exit, or whether a pointer is used. So, the
compiler could copy in/out small variables and use a pointer for big ones.
I don't think this kind of things violates any security needs. Its perhaps
better also, since it doesn't give any impression that there need be a
pointer inside the subroutine, so no temptation to try to use it as a
true pointer value.


030112 - Sunday

Put the beginnings of the 'matrix' construct in. It just parses so far.
The problem is that I created a new type for the return value, and that
is not compatible with the just-declared variable that I am assigning
the new dynamic matrix to. So, I either need to compare types (just
array, matrix and pointer types, I think), or I need to have a table
of known types, so that I can always use the same type pointer for the
same types. Which is best? I don't think I need to have other kinds of
types in that table, since they aren't supposed to match even if they
do happen to look alike. If indeed those are the only kinds of types I
need to do that with, then perhaps it is just a list of pointer types,
a list of array types and a list of matrix types? The entries need have
nothing more in them than a list-next pointer and a pointer to the
type descriptor, which I will already have. Hmm. What do I do if the
element type ends up being Types_Error - then I really ought to allow
them to match. Maybe not worth it - that just gets rid of an extra
error message. Also, if there are any type modifiers on the element (or
pointed-to type), then the compound type is not compatible, so I don't
have to mess with those.

Eventually, I'll need something faster than simple linked-list traversal,
I think, since the full system might have thousands of types in total.
Not clear how many distinct pointer, array or matrix types it will have,
however. There is certainly an unbounded number *possible*. Can I use
a Mapping somehow? That would be good, if I could figure out what to
use for the hash value - it has to be something type-safe.

Later: I think I can do a general function that provides an integer from
a type. Often it will build up from a hash derived from the strings that
are part of the type. This may have other uses than the type lists mentioned
above.


030114 - Tuesday

Package variables - what do they mean? I think something initialized with
'=' at the package level should be essentially a read-only variable,
set only once when the package is first "created", and shared by all
users of the package. The expression needs to be evaluable at that time,
and doesn't need to be what is traditionally a "compile time constant".
Variables initialized with ":=" (or not initialized) can perhaps be of two
different kinds - shared among all users, or created for each "context"
(what's a "context"?!?). The system should preserve for display the original
expression that was used to initialize the variable. Same for local
vars, of course.

Perhaps a "context" is similar to a process in other systems. I'll need
something of the sort. For initial work, running hosted on other systems,
I think there will be only the one context.

This of course is totally ignoring the issue of persistent variables, and
variables that override based on where they are in the hierarchy.

I think when the system has multiple active contexts, then it would have
separate instruction and data spaces. *All* code and *all* data not
dynamically allocated would need to be in the spaces of all contexts.
This is because all code can find any other code (plus create it or
load it from elsewhere of course) and execute it. So, part of making any
package runnable would be to "allocate" space for its code and data.
Well, allocate space for the instantiations needed on this system - the
process of parameterizing more versions needs to allocate more of course.


Just what can be referenced from other packages? I'm getting an undefined
symbol error for "theUint", the only field in record Base.Uint_t. That's
because I'm not actually searching in the Base package's symbol table. The
reason I can find the types and functions from that package is because they
are being referenced fully with a path. If I don't search in other
package's exports tables, then I'll never find such symbols. So, now I
have to decide what the rules are for doing this.


Have to decide just what "ro" means. Similar to C's "const", I think.
Currently, the type normalizing code doesn't handle any 'ro' types, so
I can't assign an "ro [] uint" to an "ro [] uint", in general, because
the type pointers are different.


030124 - Friday

Another way of looking at types, is that strong type checking is a way of
enforcing constraints, when coupled with the basic type being 'ro', like
Type_t is. Values of the type can *only* be produced by the entry points
in the package which exports the type, and so it can guarantee conformance
to some constraints. The fact that a given package, and the types it
exports, are globally, uniquely identifiable, means that one can rely on
the values meeting the constraints. Similarly, a type package (like Mapping),
can enforce constraints. The constraints are programmed in the system, but
are not expressed in the system - they are documented in interface specs.
For example, Mapping can guarantee that the same key is never duplicated.

They system can provide UintHash functions for all types. We create them
for the simple types, and then. For a 'oneof', take the oneof index and
xor it with the value from the active oneof case. (Could fix up my
TypeHash routine to work that way.) Etc. Don't need a UintHash on pointer,
struct and union types, since those are not likely usable with something
like a Mapping anyway. They are low-level only stuff - second class!

Define semantics of 'res' and 'valres'. Can optimize some cases. Say that
'res' copies into the destination formal only at the end of the routine.
Similarly, 'valres' copies out at the start, and back in at the end.

Goal for early implementation - generate code at runtime, using calls to
the Type and Exec constructors.

Perhaps use a bunch of special Type values as the error returns from some
of the Type constructor functions.


030201 - Saturday

I'm about to start in on Zed-izing my internal program representations (its
currently a mangled version of what I used in AmigaMUD). Thinking about it,
it looks like I'll have to include the type of the described subtree in
each tree-node, and have special construction functions, perhaps for each
kind of tree-node, that verify the type semantics of the node being built.
To keep the system safe, I either have to verify the semantics when I build
the internal representation, or when I compile that representation into
byte-code. Either way, I'll have to know the types. In the old MUD system,
I didn't need the types in the tree, since only the compiler could build
the tree nodes. When I used the direct tree interpreter (instead of byte-code),
it could determine the types from the bottom up, starting with the
identifiers and constants that were the leaf nodes.

Hmm. That suggests that when I compile a tree into byte-code, I might be
able to do the type checks at that time. That would save having to have
types stuck into each tree node. I wonder if that will work - it ought to,
in just the same way as the tree interpreter did it. Ok - deep breathe -
I'll try doing it without types in the nodes. Hmm. That suggests that I
may not need consistency checking functions when I build the nodes, since
it can all be done when I generate byte-code. Of course, that means that
anything that deals with the internal program-representation trees needs
to be able to handle wildly invalid ones that someone has built. That will
include the pretty-printer and the linearizer (which turns them into a
byte-sequence for transmission or storage, just like I did in AmigaMUD).

One evil thought: if I make the types not restricted (in the sense that
a restricted type is one for which only code in the type's defining
package can construct or modify values of the type), then an evil
person could modify links in the tree nodes and thus build circular
(non-tree) data structures. I don't want the byte-code generator,
pretty-printer and linearizer to have to mess around detecting cycles.
I wonder if I want to add another variant to the public/ro spectrum
of types:

    public: anyone can construct and modify values of the type
    ro: only routines in the package defining the type can construct
	or modify values of the type (if the type is exported, anyone can
	reference the values and their sub-components)
    partly-ro: only routines in the package defining the type can modify
	already-constructed values of the type, but anyone can build new
	values using the normal language-defined constructors.

Given that the last form can be accomplished by simply exporting routines
to construct the values, there is no real need for it. It could be
convenient, but I don't think its necessary.

Perhaps the way to go is to have the exported tree-node constructor routines
do as much validity checking as they can, perhaps even returning error
strings or somesuch. That keeps most of the checking for invalid stuff
in one place, thus saving total code. Maybe.


I think I might actually want to do the Proc_t stuff before the Exec_t stuff.
That's because the Exec_t stuff will all be validated in the context of a
Proc_t, and perhaps a chain of Package_t scopes. And, any identifiers that
are referenced have to be coming from somewhere, and you'll pretty well always
need local variables or parameters in any piece of code that is useful, so
you'll always need a Proc_t to contain those.

The various routines for Proc_t should be doing the grunt-work of determining
parameter and local stack offsets, checking for duplicate identifiers, etc.

Can we add a parameter to a Proc_t at any time? If we can, then we need to
keep the total sizes of parameters and locals separate, and not determine
the final offsets of things until code-generation time, since adding a
new parameter affects things. Adding a new parameter can cause problems
with locals, if the symbol happens to be the same (not allowed to mask a
parameter with a local). Or, I could allow locals to mask parameters, in
which case a masked parameter is simply inaccessible from the scope in which
it is masked by a local. Workable.

Which raises an issue. Have I ever had warnings in my compilers? I don't
think so, except in MPCC, since the languages I define simply disallow
dangerous things. I guess I could have had a "used before set" warning
in Draco, and it would have been useful. So, how do I return these various
things from some of the Proc_t and Exec_t routines? I guess the answer is
that they go into the Parse.State_t record, much like I've used it in my
early stuff.