Contents and ideas Copyright 2002-2010 by Chris Gray. Date of initial publication: April 18, 2010. Any new ideas and concepts in this document are owned by Chris Gray and may not be used in any patent or other restricted use intellectual property instrument without the physical written permission of Chris Gray.
Diary of Aleph Work
020413/Saturday - finally started this project, after thinking about it for a few weeks, and having some of the ideas and concepts brewing around in my head for several years. Dove in and started on file "020413-initial" of rambling notes.
020414/Sunday - more notes. Started the initial play implementation by scavenging code from my AmigaMUD/CGMud sources. The first parts I did today I tried to do reasonably cleanly. I believe I had the lexer and symbol table stuff going today.
020415/Monday, 020416/Tuesday, 020417/Wednesday, 020418/Thursday - on these days I put in an hour or so each day. There were a few more notes, and more chunks of the code. I think on Thursday I did the disassembler and tried it out. That required doing something quasi-real for values in the symbol table. So, likely Wednesday I finished off the function body parsing. So, perhaps Tuesday I finished off the proc header parsing. Not sure about the Monday then, since my stupid habit of TV watching would have given me more time on the Monday. Perhaps move stuff back a day, and Thursday was some cleanup, adding builtins, etc.
020419/Friday - I think I got the first actual execution working on this day. Simple counting program.
020421/Sunday - not sure of the date here. Either today or yesterday I changed the function declaration code to define the symbol right after parsing the proc header, and to undefine it if parsing the body failed. With that, I could update the sample program to have my traditional recursive "Towers of Hanoi".
020422/Monday - Don phoned, and came over. We talked about various things for about 2 hours. Also some looking at the planets - cold wind!
Talked about global/local naming issues, genericity, GUI, how Java does some things, etc. Don suggested that perhaps this project should be called "beth" instead of "aleph". That's the second letter of the Hebrew alphabet, corresponding to this being the second attempt at writing the world's software. Also easier to type and spell! I might just do it.
020423/Tuesday - started this diary. Entries before this date (and heck, perhaps some afterwards, too, if I don't get in the habit of updating this) are perhaps a bit suspect. Made the bytecode interpreter re-entrant. Didn't do the traceback stuff yet, however. Need the 'fmtXXX' routines first.
020424/Wednesday - realized I didn't do the re-entrancy of the bytecode interpreter quite right. Easy enough to fix though - just need to put back the init/term pair, since "bcRun" is also called from inside native code called from inside bytecode... More thoughts to get in notes file. Hope I remember them all!
020425/Thursday - got some email from Don Reble in reply to my email sent earlier. He didn't reply to my emailing a whole dump of my thoughts (020413-initial), but to a later email that I sent.
020427/Saturday - sent my reply to Don. Spent more time thinking about various things. Initial thoughts/concerns about what 'ref' should be. First hack at trying to define a generic Table package. Second hack done later that evening. Don was over after we were out for supper and we talked about various things and worked on the Table stuff, and on defining an iterable List generic type.
020428/Sunday - a few more rethoughts on the List stuff. Did some reading in the CLU book.
020429/Monday - some random thoughts written down. Another couple of variants of the List generic type ("type package"?). After going to bed, had a couple of final thoughts.
020430/Tuesday - another List variant, using no pointers, and using an "external = listHeader;" statement, as triggered by reading CLU stuff.
Also wrote commentary on it, and followed that by an exploration of how to do variant records (Don and I had discussed it when he was over, and it had been in the back of my mind that I wanted to do it better. Help from the CLU book, too.) I'm not too unhappy with how they turned out. The variant record example is working with the Type type. Parts of it are very Java-ish in the way they worked out. That kind of annoys me, but it does seem to be the only way to do it right. Doing these things efficiently but fully type-safe, and allowing ref-counts and garbage collection, does change how one does things! Sent the new stuff in an email to Don, but got no reply.
020501/Wednesday - updated this diary - I keep forgetting! I think what I should do today is start some more firm language definition. First whack at the type definitions in the Types package. Only unresolved ref is currently "TypePackage".
020502/Thursday - adding more stuff to the Types package.
020503/Friday - nothing. Home late; watched season finale of Dark Angel.
020504/Saturday - more on Types package. Don came over before supper and I got sidetracked into signature matching. Found the place where I first have to break type-safety in the type system. I cannot possibly have a "FindMethod" function that is type safe and yet written in the language. It can't even be declared to return a proc type with some fixed number of parameters. Maybe have the top-level FindMethod routine be somehow builtin or at the implementation level, and then the guts of it can be type safe in Types.a.
020505/Sunday - a bit more work on signatures. Not much!
020507/Tuesday - not much done.
020510/Friday - here's what I emailed to Don:
In the shower tonight, I first essentially decided that I would give up on the idea of doing generic types like we have been discussing, since there just doesn't seem to be a way to do them properly.
I don't really *require* them, since I am fairly happy just building a few of them into the system. However, the fact that "Type" is a first-class, publicly visible object leads straight to attempting generic types the way we have been discussing. Anyway, by the time the shower was done, I had gone back, through stages, to wanting to look at it again. This time, I'm trying to be as precise in my thinking as possible. The result is ugly as sin and requires lots of compiler support, but *might* work. I'll sleep on it, and try some more stuff tomorrow. Anyway, here is the resulting initial part of "Mapping.a". The comments are as important as the code.
020515 - nothing the last 3 days. "Dinotopia" on TV. Not that good. Not sure what happened to the weekend. John K. over on Saturday until 1:00 A.M. So, lots of TV catching up on Sunday. Star Trek tomorrow aft!
020812 - nothing happened for *quite* a long time. Lots of work on my Lego stuff for the upcoming (end of September) train show. Also time with PS-2 stuff and the Spyro the Dragon (PS-1) game. Anyway, I'm taking a week off work, and the Lego stuff is now done. I'm thinking that once I'm up to speed again, I should try doing the parser/checker for the generic stuff in the early C code, and see if it actually works.
030523/Friday - again nothing for a very long time. However, I took this week off work, and have done some good stuff. Not as much as I could, but decent. I've got a lot of an Exec.z now. I've redone a new SymTab.z and the C variants thereof that don't use the old symtab stuff. Lego show this weekend, but perhaps I can just skip it and keep working.
030529/Thursday - good work last weekend - almost done Exec.z. Doing more now, including redoing the binary operations into a common record type, instead of having one for each possible operation/type combination. Question: should I allow - , - , + ? I did in Draco, and used it.
But, doing the latter two can yield out-of-range values unless I put in a run-time type check (not otherwise needed for such operations). For now, I think I'll disallow them, and see how it goes. What about for 'char'? Draco allowed that, and I used it, although mostly with '\e'. I could just provide builtin routines to go between char and uint, and the check could be there. I'll do that.
031026/Sunday: damn - the Types.z code is using ' - ' to do a UintHash of enum values. I guess I need those. Likely should allow the add and subtract of uints too, to be consistent/flexible.
030531/Saturday - more work done the last few evenings. Almost none today, though. At supper I asked Douglas and Don about operator precedence. Don suggested an interesting idea, that parentheses be required if a subexpression is a different "kind" than the whole expression. For example, you can't write "a << 3 + 1". You would be required to write it as either "(a << 3) + 1" or "a << (3 + 1)". I think Don intended it to apply to mixing with bool operations ("and", "or", "not") as well, but I don't think I would do that.
030613/Friday - been a while again, sigh. Been doing some checking of stuff, and added the Exec.z/modifiable stuff and its uses. Also allowed 'for' to use an enum variable. Note: when parsing, I think only 'oneof' and 'record' types make sense to be 'ro'. Think about. [Ancient history]
030614/Saturday - still missing constructors and the array allocator.
030616/Monday - check the math definition of "mapping". If it is supposed to be 1 to 1 (maybe even "onto"?) then find a different name for the current one. I'm guessing it is more like the math def'n of "function". Still, a type package for a full 1-to-1 reversible mapping could be a good system test. Could even supply a hint as to the range of the values involved and roughly how many of them.
Then, based on those ranges, the implementation could be a pair of arrays, the elements of which are the other value of the mapping pair, and the indices of which are the hash value of the first mapping pair value.
<> 030619/Thursday - some random thoughts on a desktop layout system. Modes:
1) random by rows
2) random by columns
In both of these the system spaces icons out as needed, into rows and columns. The user can change the global order of the icons within the global set, and that persists. User can pick top-to-bottom and left-to-right preference.
3) by rows
4) by columns
In these, the user creates rows or columns, and can order the rows and columns. Individual icons are put into rows or columns, and the user controls the order within the row or column. Again, the user has an overall top-to-bottom and left-to-right preference.
Do this for icons within directories, too, like the Amiga did. Use a trash dumpster instead of a garbage can or black hole. Make it green to indicate that stuff is recycled. Can have kewl sounds. Don't know what to do if stuff doesn't fit. Perhaps all of the non-fitting stuff, plus one, are put into a folder, and the folder icon is displayed.
030711/Friday - a bit of thinking about signals. I don't like the cost and pervasiveness of signals in pthreads, maybe not in UNIX in general. Can the idea of a trap point (thought about earlier in the context of being a point at which exceptions are caught) be used for signals as well as exception type things? Thus, anything unexpected gets you out of where you are and is described to the next enclosing trap point. Much coarser grained than typical exceptions. Many "system calls" could then be essentially independent of signals, in that the effect of a signal on them is either nothing or complete abandonment, i.e. nothing needs to be documented, or handled by the programmer. Some calls would need to pay attention to some signals. E.g. the equivalent (if any!)
of a blocking read from a console has to pay attention to a user-generated signal (as in control-C) and return an indication of that. Hmm. Does it? Can't that be handled by a trap point? If the user has hit control-C, then it is typically because the user wants something large to happen, not just the read to return.
030712/Saturday - reading comp.arch, this is from Jonah Thomas:
    Yes, and the problem is that when you have giant libraries of simple routines it's often easier to rewrite and test than it is to find what you want. And when you have libraries of big complex things it's hard to get all the parameters and flags and switches right, and there's a lot of crud in the big library routine that you don't need, that will do who-knows-what if you accidentally call it with the wrong inputs. So the best library routines are things which are hard to write but which provide services that have simple interfaces.
030719/Saturday - finally getting around to replying to Don's earlier emails. See ./030701-parTypes.djr for his email and my response. Perhaps what I really want is call-by-value (default) and call-by-value-result. That way I can have values that are updated by a routine, but I can still use them efficiently inside the routine. It shouldn't be hard to note that there is no use of such a parameter before it is first assigned to, thus getting rid of a fetch of it at the start.
030819/Tuesday - second day of week off. Not accomplishing much. Sigh. Just going through Exec.z once again. Got as far as RecordConstructorAppend. What stops someone calling this routine on a random RecordConstructor_t that they have pulled out of some Exec_t, rather than on one freshly minted by RecordConstructorStart? It's beginning to look like I need a TempRecordConstructor_t that is used during the construction, and leave the real RecordConstructor_t for creation only in RecordConstructor. Similar for other constructs. Sigh.
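The TempRecordConstructor_t idea can be sketched in Python as a mutable builder that is the only thing the append routine accepts, plus a finishing routine that validates and then freezes the result. The class and function names are hypothetical Python renderings of the diary's routines, Exec nodes are stand-in strings, and the type check happens at run time here, whereas a statically typed implementation would reject a finished RecordConstructor_t at compile time.

```python
from dataclasses import dataclass

# Hypothetical sketch: names echo RecordConstructorStart/Append and
# RecordConstructor from the diary; Exec nodes are stand-in strings.

@dataclass(frozen=True)
class RecordConstructor:
    """The finished constructor: immutable, with no append operation."""
    fields: tuple

class TempRecordConstructor:
    """The mutable builder, the only thing the append routine accepts."""
    def __init__(self, field_count):
        self.expected = field_count
        self.fields = []
        self.finished = False

def record_constructor_start(field_count):
    return TempRecordConstructor(field_count)

def record_constructor_append(temp, exec_node):
    # A finished RecordConstructor pulled out of some other tree is rejected.
    if not isinstance(temp, TempRecordConstructor) or temp.finished:
        raise TypeError("can only append to an unfinished TempRecordConstructor")
    if len(temp.fields) >= temp.expected:
        raise ValueError("too many fields for this record type")
    temp.fields.append(exec_node)

def record_constructor(temp):
    """Validate first, then freeze; the builder cannot be reused."""
    if temp.finished:
        raise ValueError("builder already finished")
    if len(temp.fields) != temp.expected:
        raise ValueError("wrong number of fields")
    temp.finished = True
    return RecordConstructor(tuple(temp.fields))

t = record_constructor_start(2)
record_constructor_append(t, "x + 1")
record_constructor_append(t, "y")
rc = record_constructor(t)
print(rc.fields)   # ('x + 1', 'y')
```

Because the finished value is a different type with no mutating operations, a caller holding only a RecordConstructor has nothing to append to, which is the hole the entry describes.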
I was thinking that I might want to go ahead and just have a generic ExecList_t that is a list of Exec_t's, and use it in several places. The problem with that, of course, is that the caller could then use other XXXAppend routines and defeat the checking in, e.g., RecordConstructorAppend. Hmm. Wait. If I keep the ExecList_t's entirely inside of the temporary structures, then they are always safe. [Much later language additions (the ability to have 'private' structs, fields, matrixes, etc.) have made this safe and efficient.]
030820/Wednesday - trying to get a bit more done. What about the Types.z package - does it suffer the same weakness of allowing a caller to extend types *after* building them? Since Exec.z depends on Types.z, does that produce a hole there too? Grr. I think this sort of thing only matters for something like array indexing, where adding a new index invalidates things. But you can't do that, since there is an array of index expressions, and you can't rebuild that array without calling the final checker routine. So, now I go undo a bunch of stuff. Sigh. [Again, much later - everything now has TempXXX routines and structs, and so should be safe.]
030821/Thursday - got most of Exec.z straightened out with respect to the cheating. Need to add declaration (with or without initialization) as a statement kind, since we need to put them anywhere within a sequence. Each must have a type and a list of ids, each with a possible init expression.
To satisfy folks like Dale, we want some kind of named-bit type, which allows enclosed enumerations, and some operations on them. E.g.:
    type StatusRegDir_t = bits {
        srd_dir: srd_front, srd_left, srd_right, srd_back;
    };
    type StatusReg_t = bits {
        sr_dst: StatusRegDir_t;
        sr_src: StatusRegDir_t;
        sr_mustBeZero4;
        sr_mustBeZero3;
        sr_kind: srk_ill00, srk_major, srk_minor, srk_ill11;
        sr_intreq;
    };
The above totals 9 bits.
The fields start at the *upper* bit number and move to the lower, but the resulting bits are pushed to the *lower* end of the value. Thus the last named field will always represent the lowest bits of the unit, and it is upper bits that might be ignored. Enum-like fields have values starting at all bits 0 and going up. There must be a power-of-two number of alternatives, and the count determines how many bits they occupy. The display routines all understand this, so if we have a value of type StatusReg_t that in binary is 0b011100101, it would be printed as:
    sr_dst:srd_left | sr_src:srd_back | srk_minor | sr_intreq
We could do the following:
    case statusRegValue.sr_src.srd_dir
    incase srd_front:
        blah-blah
    incase srd_left:
        blah-blah
    incase srd_right:
        blah-blah
    incase srd_back:
        blah-blah
    esac;
or:
    if statusRegValue.sr_intreq then
        blah-blah
    fi;
    if statusRegValue.sr_kind ~= srk_major then
        blah-blah
    fi;
There are no explicit bit numbers/values, which might be a bit confusing. The editor should show those, as should the debugger if you ask it to print things in hex, binary, octal, etc. The above named display form is simply the default output format for bits values. Bits values are assignment compatible with unsigned ints that are large enough to hold all of the bits in the bits value. [I see no harm in this, but is it useful? It would reduce the "type safety" of the bits values considerably.] Bits values can be directly modified by assigning to the individual fields:
    statusRegValue.sr_intreq := true;
    statusRegValue.sr_dst.srd_dir := srd_front;
It is also possible to use bits values specified directly:
    statusRegValue := sr_dst:srd_back | sr_src:srd_front | srk_major;
Fields not specified are filled in with zero bits. [Did not happen] Note that a single-bit field is actually treated and manipulated as if it were a bool value, with 0 = false and 1 = true. Is the provision for nested bits types overkill? How common is that sort of thing? It complicates the language considerably.
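A small Python sketch of the original layout rule (fields listed from the upper bit down, packed at the low end of the value) reproduces the 0b011100101 example. The layout table, the decode/show/set_field helpers, and the choice to print a nested field as name:tag but an inline enumeration by its tag alone are assumptions inferred from the example output, not the actual display routines. It also follows the low-end packing described here, not the later higher-end packing mentioned in the bracketed note.

```python
# Hypothetical layout table: (field name, bit width, tag names or None, nested?).
# Entries run from the upper bit down; the last entry holds the lowest bit.
DIR_TAGS = ["srd_front", "srd_left", "srd_right", "srd_back"]
STATUS_REG = [
    ("sr_dst", 2, DIR_TAGS, True),        # nested StatusRegDir_t
    ("sr_src", 2, DIR_TAGS, True),        # nested StatusRegDir_t
    ("sr_mustBeZero4", 1, None, False),   # single bit, bool-like
    ("sr_mustBeZero3", 1, None, False),
    ("sr_kind", 2, ["srk_ill00", "srk_major", "srk_minor", "srk_ill11"], False),
    ("sr_intreq", 1, None, False),
]

def decode(layout, value):
    """Return {field: tag name or bool}, with the bits packed at the low end."""
    shift = sum(width for _, width, _, _ in layout)
    result = {}
    for name, width, tags, _ in layout:
        shift -= width
        raw = (value >> shift) & ((1 << width) - 1)
        result[name] = tags[raw] if tags else bool(raw)
    return result

def show(layout, value):
    """Default display: enum fields by tag, single bits only when set."""
    fields = decode(layout, value)
    parts = []
    for name, _, tags, nested in layout:
        v = fields[name]
        if tags is None:
            if v:
                parts.append(name)
        else:
            parts.append(f"{name}:{v}" if nested else v)
    return " | ".join(parts)

def set_field(layout, value, field, raw):
    """Field assignment: clear the field's bits, then insert the new value."""
    shift = sum(width for _, width, _, _ in layout)
    for name, width, _, _ in layout:
        shift -= width
        if name == field:
            mask = ((1 << width) - 1) << shift
            return (value & ~mask) | ((raw << shift) & mask)
    raise KeyError(field)

print(show(STATUS_REG, 0b011100101))
# sr_dst:srd_left | sr_src:srd_back | srk_minor | sr_intreq
```

Assigning sr_dst.srd_dir := srd_front would then correspond to set_field(STATUS_REG, v, "sr_dst", 0).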
I put it in on remembering the flags stuff the Amiga GUI code did, where a set of defined bit values was inherited and extended for another, related, use. [Much later: bits are packed to the *higher* end of the containing power-of-two sized value. Nesting is supported, but all bits types are named, so you must declare the nested types separately. Field references and assignments work. Constructors exist, but are ugly and must specify values for all non-constant fields.]
Crap! Just doing an ErrorSub on the top-level returned Exec_t from one of my constructor routines doesn't help. The malicious caller can just skip over the eik_errorSub node, and get at the invalid Exec_t beneath it. E.g. for a proc call, if not enough parameters are passed, the built-up Call_t may not have any internal ErrorSub, but still not have enough actual parameters. So, we have to ensure that an ErrorSub is always *internal* to the returned structure. In the case of a proc call, we need to ErrorSub the proc expression, or append Error parameters. Hmm. It's OK to ErrorSub the top-level expression, so long as the interior expression is valid in its own right. In the case of the Call_t, that wasn't true. But in other cases it can be. Then it is OK to let someone get at it, since the standard checking won't let them use it in a place that it doesn't fit. E.g. in FieldRef, the default at the end: it returns the 'base' expression ErrorSub-ed. If the caller strips away the ErrorSub, they get the 'base' expression alone. If that was valid, then all is well. The key is to not *create* an invalid structure - the ErrorSub's and Error's are used both to mark the detected invalid stuff for the pretty-printer, and to protect us against generating bad code in malicious cases. [Later: added ex_error flag]
I will of course have to make sure that the debugger does not allow you to change the value of fields inside an 'ro' record, etc., unless you are authorized to do that (owner of the type?
low-level programming enabled?)
030822/Friday - Whew! I think I'm now done with Exec.h. Well, other than going through and assigning real error codes. There are going to be a lot of them! There is still a fair amount to be added. E.g. if I do the above 'bits' type, the constructs for it will be needed. Also all of the constructs for pointer operations (take pointer, pointer arithmetic, pointer dereferencing, pointer comparisons). Also stuff for the added sized types that the low-level language will need. Probably more, too.
031001/Wednesday - it's been a while. Lego train show, new DVDs, etc. Sigh. Anyway, some thoughts from the shower about exceptions. The concept I wrote about earlier, that of a general trapping point, could be the point at which all exceptions are caught, just like it is the point at which all signals are caught (or something like that).
<> Aside: on entry, you could give it a "Resources_t" structure, which sets the limits on execution within the code within the trap point. You could fetch the one for your own context, and call a routine to adjust some arbitrary one (say read from persistent variables) to fit within your own limits. End aside.
Probably want an explicit exception catching construct in the language, since that can be useful for things like memory probing. The usual "try" ... "except" sort of thing. Keep them *very* simple. It would be nice to not allow nesting for the same exception, but lack of compile-time visibility inside called routines prevents that. Oh well, this could be a low-priority thing for the language. Have lots of capabilities when an exception is caught. E.g. for an arithmetic-type exception, allow continuation after specifying a result to return from the operation. E.g. for a fetch through a bad pointer, allow retry with a different pointer, or allow continuation after specifying the result of the fetch. Might be hard to implement, but those are some thoughts about what would be useful.
For higher level stuff, like referencing a field via a nil record pointer, allow specification of the value to return, or perhaps allow specification of a replacement record to use. Can we modify the actual variable or whatever that contains the pointer in these cases? Basically, have all of the facilities used by a debugger available, in terms of examining code, single stepping, advancing to certain points, perhaps modifying values, etc. [2010-03-11: there are no exceptions in the language.]
031019/Sunday - been doing more work over the last few days, starting to use the new Exec.c with the old pExec.c. Some went quite well, but I ran into problems with references to symbols. Also want to redo the field references, so that the Exec.c routine looks up the strings. Big ick just thought of, however. Some of the kinds of Exec structures are only valid within a given proc context. That is checked when the Exec is created. But there is nothing stopping someone taking an Exec found inside one proc and using it as part of an Exec that is being built for another proc. E.g. in a sequence, include an assignment statement to a local from another proc! So, unfortunately, there needs to be a routine, called when the final Exec is attached to a proc, that goes through the whole thing and verifies that anything that is relative to a proc is relative to the right proc. This will include local variable and parameter references, and return statements. What about stuff that is relative to a scope? If it stays attached to the scope, then only the scope needs to be validated as being in the proc? Actually, if the check is done top down from the point at which it is to be attached to the proc, won't it simply go through the tree of scopes, so the checks will work as they do now, if we update the current scope pointer as we go? Currently, I seem to be duplicating stuff with respect to scopes.
At the Proc level, I build the scope tree, and the scopes point at the Exec that is their body. But, at the Exec level, I have eik_scope, where an Exec points at a nested scope. Is this a conflict? In the above note about verifying scopes, would I actually have to be walking both the Exec tree and the scope tree at the same time, to validate the Scopes in the Exec tree as being the same ones as in the Scope tree? [Later: this became Exec/ProcCheck. Even later: this turned into Exec/Copy/Verify.]
<> GUI thoughts from earlier today: needs to look pretty, on first release. Sigh. Some thoughts about overall appearance stuff:
- allow pictures to be used over the entire display, i.e. allow one image, scaled to the entire display size, to be lightly overlaid on all of the GUI elements. E.g. my dandelion picture would result in light dandelions overlaying all GUI elements - window borders, scroll-bars, buttons, menus, etc. Allow selection of the picture. Allow strength of picture relative to GUI element drawing. Allow choice of whether this works as window backgrounds as well (allow separate strength selection). If a window background is specifically selected per window, that will override the general GUI picture.
- could have GUI styles that control the general appearance of the various elements. E.g. very mechanical looking (steam punk?), fuzzy (as in fuzzy dice), violent (lots of points and jags), animated (e.g. flowing water, waving grass, etc., or e.g. a computed one like my previous idea for a Mandelbrot set).
Another security check: when adding a caller-supplied Exec_t (same for Type_t, others?), do all checks before doing the actual add of the caller-supplied element. This is so that we don't end up having done the assignment and then the caller contrives to have us get an addressing error, etc. before we can do some of the checks. [This was resolved when the checking turned into using the Exec_t tree copying code.
One key is that the assignment to the Proc_t field pr_exec is done *after* the validation of the Exec_t tree. So, if that validation dies, the assignment is not done.]
031026/Sunday - been doing some work during the week, getting things to run with the new Exec stuff. The scope stuff is my current issue, but there were bugs in both the Exec_t stuff and the new pExec stuff.
Have an Exec_t node that is just an error node containing an undefined identifier. That way a pretty-print can just show that id in place. [Done, much later]
Represent a Package as a linked list of the things in the package, which can be proc definitions, proc predeclarations, variable decls with inits, constants, types, etc. Use a one-of. Could then display a package with proc bodies not shown. How does one edit a package? Want to be able to insert things into the list, but need to make sure that everything that the new thing references actually is before it in the list. Similarly need to verify if something in the list is deleted. Need an explicit representation in Exec_t for declarations. Want to preserve the ordering of things that the author has used. Some declarations, i.e. variable decls with inits, are executable. What does doing this imply for what is represented in scopes? Need to verify procs carefully, as well as types. Also packages, when I get to that point.
031027/Monday <><> When showing a package, could lightly grey-out variables/types/procs that are not used at all and are private.
031102/Sunday - think about named types (already present), in regards to just what they mean. Should they be treated just like in Draco (whatever that was!), or is there a reason for a difference? Check what Draco does for if/case results, assignments, parameter passing, etc. [Long since resolved - names create new types that are compatible with what they name, but not compatible with any other named type. Much later - there were issues with the result of conditionals, now resolved.]
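The 031026 idea of a package as an ordered list implies two edit-time checks: an insert must land after everything it references, and a delete must not strand a reference. A hypothetical Python sketch of just those two checks follows; Item, Package, and the hand-listed refs sets are illustrative only, and a real implementation would presumably derive references from the Exec_t trees. Letting a proc name itself (for direct recursion) is also an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    name: str
    kind: str                                # "type", "var", "proc", ...
    refs: set = field(default_factory=set)   # names this item uses

class Package:
    def __init__(self):
        self.items = []                      # kept in the author's order

    def insert(self, pos, item):
        """Insert at pos only if everything item uses is defined earlier.
        A proc may name itself, so direct recursion is allowed."""
        defined = {it.name for it in self.items[:pos]}
        missing = item.refs - defined - {item.name}
        if missing:
            raise ValueError(
                f"{item.name} uses names not defined before it: {sorted(missing)}")
        self.items.insert(pos, item)

    def delete(self, name):
        """Delete only if no other item still refers to it."""
        users = [it.name for it in self.items if name in it.refs and it.name != name]
        if users:
            raise ValueError(f"{name} is still used by {users}")
        self.items = [it for it in self.items if it.name != name]

pkg = Package()
pkg.insert(0, Item("Count_t", "type"))
pkg.insert(1, Item("total", "var", {"Count_t"}))
pkg.insert(2, Item("Bump", "proc", {"total", "Bump"}))
try:
    pkg.insert(0, Item("early", "var", {"total"}))
except ValueError as err:
    print(err)   # early uses names not defined before it: ['total']
print([it.name for it in pkg.items])   # ['Count_t', 'total', 'Bump']
```

Checking only against the items before the insertion point is what keeps the author's ordering meaningful: any prefix of the list is always a self-consistent package.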
<> Thinking about run-time checking. Knowledge about constraints on values has to go back through call chains, since additional knowledge in calling routines may be enough to avoid checking in a called routine. It may in some cases be desirable to clone a tree of called routines so that knowledge in a subset of the callers can result in the removal of checks in a variant of the called tree. An interesting example is that of handling incoming messages that meet some kind of specification. Early checking of the raw packet may make later compiler-generated checks unnecessary. An even more interesting thought is that if the incoming message is a true Z message, then it is possible that *no* checking is required, since the Z code which generates those messages (the byte-streaming code) will not generate invalid messages. The care with which the byte-streaming code for a given type (if it's not default system code) is written will affect how much checking is needed at the receiving end. This all assumes, however, that we can be *sure* that the apparent Z connection and message really is a valid Z connection. We need to be *very* *very* *very* sure, if the whole concept is to be trustable. Ultimately, if this stuff ever gets used, then perhaps the statement needs to be that we can trust the survival of the human race to the checks. (Sure Chris, sure!)
031111/Tuesday - need to keep track of the input base of numeric constants, so the pretty-printer can display them the same way. [Done, later]
There was almost certainly a bytecode generation bug in CGMud, and even possibly in earlier compilers. If an argument to a function call or other independent context like that involves 'and' and/or 'or', then the true and false chains likely either were not closed off, or the global ones were closed off too.
031113/Thursday - thoughts on walk to work, and discussion with Darius. Don't support add/sub with enums. Would have to add versions of uAdd that do overflow-within-range checking.
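To make the cost of enum arithmetic concrete, here is a minimal Python sketch of the two checks such operations need: an unsigned subtract that traps on underflow, and a range check before a number is accepted as an enum value. RangeError, the function names, and the four-tag enum are hypothetical; only the idea of underflow-checked subtracts plus a range-check step comes from the resolution recorded for this entry.

```python
class RangeError(Exception):
    """Raised where the run-time check would trap."""

def usub(a, b):
    """Unsigned subtract that traps instead of wrapping below zero."""
    if b > a:
        raise RangeError(f"uint underflow: {a} - {b}")
    return a - b

def range_check(value, count):
    """Accept value as an enum with `count` tags only if 0 <= value < count."""
    if not 0 <= value < count:
        raise RangeError(f"{value} not in 0..{count - 1}")
    return value

# Hypothetical four-tag enum, stored as 0..3.
FRONT, LEFT, RIGHT, BACK = range(4)
DIR_COUNT = 4

def enum_ord(tag):
    """The "ord" idea: enum -> uint is always safe."""
    return tag

def enum_diff(a, b):
    """enum - enum: the result must be signed, since it can be negative."""
    return enum_ord(a) - enum_ord(b)

def enum_add(tag, n, count=DIR_COUNT):
    """enum + uint: the sum may fall off the end, so it is range-checked."""
    return range_check(enum_ord(tag) + n, count)

print(usub(enum_ord(RIGHT), enum_ord(FRONT)))   # 2 (offset from the first tag)
print(enum_diff(LEFT, BACK))                    # -2
print(enum_add(LEFT, 2) == BACK)                # True
try:
    enum_add(RIGHT, 2)
except RangeError as err:
    print(err)                                  # 4 not in 0..3
```

The sketch shows why plain uAdd is not enough: the addition itself cannot overflow, yet the result can still be an invalid tag, so the check is against the enum's tag count rather than the machine word.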
The things I've done with enums in Draco can typically be done with other methods, especially if enum values are allowed as array indexes. Do need a way to find the uint equivalent of an enum value, since that is needed by the Exec.c code. Could introduce the "make" construct from Draco, and use that for int<->float conversions as well. Also note that if "enum - enum" is allowed, the result value must be "sint". Then folks will try to use "make" to create enums. Sigh. Maybe special-case subtracting the first enum element? Maybe add "ord", also useful for char => uint? Likely cleaner. [Resolved by having subtracts check for underflows, and having a range check bytecode.] Might want to add a "size/kind" field to enum descriptions. That way, the lower-level language could at least get compile-time checking and the convenience of enums, but not pay the run-time cost, and be able to have various sizes of them. [Not sure what this meant. As of 2010, Zed never allows an out-of-range enum value. That way it doesn't have to check on things like 'case' statements, array indexing, etc. However, there is the later 'oneof' type where you can give explicit values to tags in the type - no checking is done on these values.]
031124/Monday - arrays now work, a few details left. Matrixes next.
<> Value watching, if implementable effectively, could be quite powerful. E.g. a "magnifying glass" for the display could simply watch the display contents, and so wouldn't need to poll. E.g. if a meteorology setup recorded the current temperature to a given variable, anyone could write a watching routine to display the temperature. E.g. one could at some time attach a watcher to a variable that is keeping track of the progress of some computation, and then display that via a simple window. E.g. the title of a window could be produced by watching some other value and displaying it as it changes. E.g. a debugging system could watch lots of variables in the program being debugged.
Would want to watch out for infinite watch triggering. E.g. in the magnifying glass example, if it was watching the top-left corner of its own window, it could get an infinite sequence of updates. Perhaps updates are delivered via a thread in the context of the watcher, and not in the context that changes the watchee. Could still have infinite updates, but at least there would be thread operations in between, and so it ought to be stoppable.
031128/Friday - refs
Perhaps make refs explicit, using say '@' for make-ref and '$' for follow-ref. Alternatively, just use proper pointers and simply restrict how you can use them in the safe language. You can only make a ref of a local variable or proc parameter. The local can be a struct or array, or a piece of one. The only things you can do to refs are assign them around, compare them for equality, deref them, make new refs for them, and pass them to subroutines. One key is that you cannot store them to non-local variables, or to anything on the heap, like a record element. [Done, later]
031212/Friday - text interface thoughts
<><> As I've always wanted, the text interface should have the concepts of input and output history built-in, like the Amiga Shell had input history built-in. When you run a text shell (need some other less-history-laden name), it operates only sort-of like a traditional command shell. The built-in commands would include (note: for now, let's say that a directory is a package, and the entries in the package can be other packages, or other things that I'll just call items):
list/l - show the names in the current item
move/m - move to the named package (..
as usual) view/v - view (read-only) the named (or current) item edit/e - edit (read-write) the named (or current) item copy/c - copy the named item to another place delete/d - delete the named item (only if there are no references other than from the current package) rename/r - rename the named item to another name (note that doing so will change all references to it) new/n - brings up a menu of item natures. Upon selecting one, you end up in an editor for an item of that nature. Perhaps many natures are pre-known (some registry?) and the name can be given, along with a optional name for the new one, before landing in the appropriate editor. - I'm thinking that is short for "v " Editing anything always brings up a graphical editor. Viewing can bring up a graphical viewer if that is all that makes sense, or can just dump a text display of the item, including pagination. Optional on the "view" command are parameters +n and -n, which increase or decrease the level of detail. E.g. if applied to a package, the default might be to list the symbols exported by the package, along with a short tag indicating their nature. View -1 would just list the symbols, on as many lines as needed. View -2 would just display the count of exported symbols. View +1 would show the first part or header of the declaration of the symbol. View +2 could show the bodies of procs. View +3 could show all of the non-exported stuff too. 031214/Sunday - constant expressions and parameters When parsing a constant expression, just add a flag to the ExecContext_t that indicates that, so that a complaint can be issued at the right place if something isn't a constant or instantiated parameter. Then just get a correct parse tree as normal. Compile it into a proc and run it. The final value is the result of the constant expression. We want to keep both the initial parse tree for the expression and the final value (perhaps use the "optimized" Exec_t node kind for that). 
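The constant-expression flag can be sketched in C as a context field that makes the first non-constant node the place to complain. Everything here (Node_t, ConstCtx_t, const_eval, the node kinds) is hypothetical, an instantiated parameter would simply behave like N_CONST, and, following the later margin note, the tree is folded directly rather than compiled into a proc and run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { N_CONST, N_VAR, N_ADD, N_MUL } NodeKind_t;

typedef struct Node {
    NodeKind_t kind;
    int64_t value;                  /* N_CONST */
    const char *name;               /* N_VAR */
    const struct Node *lhs, *rhs;   /* N_ADD, N_MUL */
} Node_t;

/* Hypothetical stand-in for the flag in ExecContext_t. */
typedef struct {
    bool wantConstant;              /* currently parsing a constant expression */
    const Node_t *complaint;        /* first non-constant node encountered */
} ConstCtx_t;

/* Fold the parse tree to a value; on a non-constant node, remember
   where to issue the complaint and report failure. */
static bool const_eval(ConstCtx_t *ctx, const Node_t *n, int64_t *out) {
    int64_t a, b;
    switch (n->kind) {
    case N_CONST:
        *out = n->value;
        return true;
    case N_VAR:
        if (ctx->wantConstant && ctx->complaint == NULL) {
            ctx->complaint = n;
        }
        return false;
    case N_ADD:
    case N_MUL:
        if (!const_eval(ctx, n->lhs, &a) || !const_eval(ctx, n->rhs, &b)) {
            return false;
        }
        *out = (n->kind == N_ADD) ? a + b : a * b;
        return true;
    }
    return false;
}
```

The original tree is left untouched, so both it and the folded value can be kept, as the note above wants.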
Note also that until we have evaluated the expression, we cannot do a Types.Normalize on the type. [Much later: ended up doing compile-time evaluation directly, rather than turning into a proc.] %%% When preparing a Package for execution, we have to instantiate any package parameters. When generating code for initializers, and then for the procs in the package, yet another flag in the ExecContext_t should indicate that we are doing so. Then we can get complaints if an encountered parameter has not been instantiated, regardless of which package it is in. We should likely build up a list of the parameters we have complained about, so that we only complain once about each one. I think there needs to be a very global execution context in which things run. That would contain the set of instantiated parameters under which things run in that context. It could also contain things like pointers to space for per-instantiation package-level variables. Think a bit more about Types.Normalize - just when do we want two types to be equal/equivalent? In particular with respect to any parameterization they use. [Resolved] 031215/Monday %%% It should be possible to send many upgrades to libraries essentially as a kind of 'diff', going from the old version to the new version. That could save a lot of transmit time. Each system essentially has the "source", so can apply the diff and recompile to bytecode. 031219/Friday %%% Want to have a system random seed generator. It should be *very* unpredictable. So, it would be good to make it dependent on the CPU speed, the amount of memory, system load, etc. Perhaps even grab bits from random VM pages. When using encrypted sessions, reseed on a regular basis. 040101/Thursday - caller of Exec (and other) routines can randomly switch context pointers on me! E.g. the Exec.Context_t and Proc.Context_t values. 
Thus, there is simply no way to know that a given built-up Exec_t is valid as a whole, at least in relation to what Proc it is referring to (and that will go back to the Package, I expect). So, I will indeed have to have a final verification routine that is called when an Exec_t is attached to a Proc. I had sort-of thought I might, but this is the real reason - I can't rely on the caller to be consistent about the parameters he passes me, and there is nothing I can do to stop the caller from having multiple contexts hanging around. [DONE. Plus, I save the context pointer in the TempXXX struct, and only the first proc of a series accepts one as a parameter.] 040102/Friday %%% Need to not declare a new variable (declared inside the 'init' of a 'for') until starting the body. This makes it impossible to use the variable in the 'init' or 'limit' expressions. That would be bad, since it would be using an undefined value. Urgh. Need to prevent use of a oneof selector value anywhere other than in a oneof case. The reason is that if it is used in an expression, and that expression is a case selector, then we won't know to record that oneof as active in a oneof case. Hmm. There is nothing wrong with that - you just won't then be able to select from the oneof. Check. [It's OK - no oneof fields are valid if the selector is an expression derived from one of the oneof selector tags.] 040103/Saturday - added some %%%%% stuff to Exec.z - stuff to do. Remember to check that code outside of the exporting package is not allowed to assign to fields of an 'ro' record. This is in addition to disallowing their use of 'ro' record and oneof constructors. [DONE] Would it be good to also have a 'private' attribute for record and oneof types? That would make the type essentially anonymous, in that all you can do is compare pointers, and not examine individual fields.
The syntax should change so that the 'ro' in this case, and also for any such 'private', is part of the record/oneof syntax, and not something external to the declaration. E.g. public type Exec_t = record ro { ... ['private' is now there to prevent access to fields. You can also just rename the type, and then users of the new name can only compare/assign pointers.] 040107/Wednesday Remember to eventually do something about struct/union/array assigns. [2010-03 - none are currently supported.] 040110/Saturday Should likely change routine "bcComp" so that it takes a parameter "wantAddr" indicating that we need the address of the Exec_t thing on the stack, and not the thing itself. That will allow for the simplification and generalization of handling of 'ref' parameters, and will be needed in the future for the '&' address-of operator. [DONE] 040117/Saturday - get rid of 'nil' as a valid string value, and the == and ~== operators. The whole lot doesn't really add anything to the language. Initialize string locals, etc. to an empty string. This *does* require initializing record, struct, array and matrix values. [Much later: no - we don't want to require that allocation/free.] 040120/Tuesday - note on encryption. See company "Certicom", and elliptic curve cryptography. 040207/Saturday - too much "split brain" work at YY - didn't have the brainpower to work on what was next in Z. Sigh. <> Thoughts today on reference counting and garbage collection. One question I had is how to keep the GC from freeing something that has only just been allocated. I believe the answer, in a system that combines GC and RC, is that newly allocated chunks of memory have a reference count of 0, and the GC will not free something that has a reference count of 0. Reference counting frees something that transitions from a count of 1 to a count of 0 (possibly offset for any needed special values - currently 0 means "never free").
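These counting rules can be sketched in C with a hypothetical Chunk_t header. A 'freed' flag stands in for actually releasing memory so that the transitions are easy to observe, and the special-value offset mentioned above is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical chunk header. A count of 0 means "just allocated, not
   yet stored anywhere", which the collector must leave alone. */
typedef struct {
    unsigned refCount;
    bool freed;                 /* stands in for releasing the memory */
} Chunk_t;

static Chunk_t *chunk_new(void) {
    Chunk_t *c = calloc(1, sizeof *c);   /* refCount starts at 0 */
    if (c == NULL) {
        abort();
    }
    return c;
}

/* Storing into a local, parameter or heap field takes a reference. */
static void incref(Chunk_t *c) {
    c->refCount++;
}

/* Reference counting frees only on the 1 -> 0 transition. */
static void decref(Chunk_t *c) {
    assert(c->refCount > 0);
    if (--c->refCount == 0) {
        c->freed = true;
    }
}

/* The collector may reclaim an unreachable chunk, but never one whose
   count is still 0: that chunk is in transit from its allocation. */
static bool gc_may_free(const Chunk_t *c, bool reachable) {
    return !reachable && c->refCount != 0;
}
```

Unreachable chunks with a non-zero count (cycles, for example) are what the collector is for; a count of 0 only marks a chunk nothing has stored yet.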
Local variables and function parameters do need to count for reference counting, since they might be the only thing that points at something. It is when a pointer to a newly allocated piece of memory is stored into a local/parameter or some field of another chunk of memory that the new chunk's reference count becomes non-zero. So, I've been hoping that things in transit, e.g. on the stack for use in expressions or as arguments to an upcoming function call, don't count as a reference or for GC. The theory is that wherever those temporary values came from will count for both purposes. It is only stores and scope-ends that actually free things. That suggests that on entry to a function, an incref must be done for each pointer parameter. And, to match, a decref is done on each such parameter at the end of the function. This is somewhat like the "initL" and "freeL" that I was using in the MUD byte-code machine for string variables. With the local scopes in Z, the initR and freeR (use R for general reference) need to be done on scope entry/exit as well. Some optimization is possible, but I don't know that it would typically be of much use. [This is done, but no optimization yet.] The problem I see with this is that side effects during function calls could result in something being freed, when there is a reference to it on the stack as a temporary. E.g. suppose we are in the middle of doing a function call that takes two arguments. The first is a pointer from some heap location, and that has been pushed. The second actual parameter is the result of a function call, and during that function call an assignment is made to the heap location that the first actual parameter came from. That pointer could now be invalid. Ick. This almost seems to suggest that any push of a pointer to the stack must include an incref.
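The initR/freeR rule for pointer parameters and reference locals might look like this in C. Frame_t, frame_enter, frame_store and frame_exit are hypothetical names, and decref here only decrements where a real system would free on reaching zero.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned refCount; } Obj_t;

/* nil references are allowed and ignored. A real decref would free
   the object on reaching zero; this sketch only decrements. */
static void incref(Obj_t *o) {
    if (o != NULL) o->refCount++;
}
static void decref(Obj_t *o) {
    if (o != NULL) {
        assert(o->refCount > 0);
        o->refCount--;
    }
}

enum { MAX_REFS = 4 };

/* Hypothetical frame: slots [0, nParams) are pointer parameters,
   slots [nParams, nRefs) are reference locals. */
typedef struct {
    Obj_t *refs[MAX_REFS];
    size_t nParams;
    size_t nRefs;
} Frame_t;

/* Proc entry: each pointer parameter now counts as a reference, and
   reference locals start out nil (initR). */
static void frame_enter(Frame_t *f) {
    for (size_t i = 0; i < f->nParams; i++) incref(f->refs[i]);
    for (size_t i = f->nParams; i < f->nRefs; i++) f->refs[i] = NULL;
}

/* A store takes the new reference before releasing the old one, so
   assigning a variable to itself is safe. */
static void frame_store(Frame_t *f, size_t slot, Obj_t *o) {
    assert(slot < f->nRefs);
    incref(o);
    decref(f->refs[slot]);
    f->refs[slot] = o;
}

/* Proc (or scope) exit: release everything the frame holds (freeR). */
static void frame_exit(Frame_t *f) {
    for (size_t i = 0; i < f->nRefs; i++) {
        decref(f->refs[i]);
        f->refs[i] = NULL;
    }
}
```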
Consider: PackageVar.p.q := func(); The execution of func could assign to PackageVar.p, and thus invalidate the pointer that is temporarily on the stack to handle this. Even changing the semantics of the language so that the RHS of an assignment is evaluated before the LHS doesn't really help, I expect, since a complex LHS could invalidate itself (e.g. a function called to index an array). Double ick. I think every push to the stack needs to do an incref. What about GC? Does GC need to be able to find all temporary values on the stack like that? That could be quite painful to implement. But, I currently do arithmetic on pointers to do things like field selection, etc. When and how do I do a decref to correspond to the above incref on stack temporaries? The only answer I see so far is that reference counting simply will not work, and that garbage collection must be restricted to times when there can be no transient pointer on the stack of any thread in the address space of a process in which garbage collection is running. Argghh! Well, GC can work if there are instructions to do all of the high-level things that can happen with a pointer, and they implicitly do any decrefs needed. Will that work for very complicated temporary usage? They would have to do increfs for any reference they leave on the stack.
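The conclusion that every pushed temporary needs an incref, with a matching decref when it is dropped, can be sketched as a small auxiliary stack of references that the collector also scans as roots. TStack_t, tpush, tdrop and tstack_holds are hypothetical names; the main value stack is free to hold derived pointers (record pointer plus field offset) because the original reference is kept here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted heap object. */
typedef struct { unsigned refCount; } Obj_t;

/* Hypothetical auxiliary "tstack" of temporary references. */
enum { TSTACK_MAX = 64 };
typedef struct {
    Obj_t *slot[TSTACK_MAX];
    size_t depth;
} TStack_t;

/* Pushing a temporary takes a reference, so a side effect elsewhere
   (such as a called proc overwriting the heap location the pointer
   came from) cannot free it out from under the expression. */
static void tpush(TStack_t *ts, Obj_t *o) {
    assert(ts->depth < TSTACK_MAX);
    o->refCount++;
    ts->slot[ts->depth++] = o;
}

/* Dropping the temporary gives the reference back; a real system
   would free the object when the count reaches zero. */
static void tdrop(TStack_t *ts) {
    assert(ts->depth > 0);
    Obj_t *o = ts->slot[--ts->depth];
    assert(o->refCount > 0);
    o->refCount--;
}

/* The collector treats every tstack entry as a root. */
static bool tstack_holds(const TStack_t *ts, const Obj_t *o) {
    for (size_t i = 0; i < ts->depth; i++) {
        if (ts->slot[i] == o) return true;
    }
    return false;
}
```

Usage: push the record reference before calling the function whose side effects might clear the heap slot, and drop it after the store completes.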
Let's look at a more detailed example:

    type T_t = record {
        uint t_this;
        T_t t_next;
    };
    [] T_t PackageVar;

    proc freeingFunc()T_t:
        PackageVar[17] := nil;
        nil
    corp;

    proc doit()void:
        PackageVar[17].t_next := freeingFunc();
    corp;

Object code:

    Proc freeingFunc successfully defined - 22 bytes of bytecode
    0000: pshPV PackageVar
    0009: pshCL 17 (0x11)
    000b: mtxIdx 1
    000e: pshZ
    000f: storeR
    0010: pshZ
    0011: ret1 L0,P0

    Proc doit successfully defined - 29 bytes of bytecode
    0000: pshPV PackageVar
    0009: pshCL 17 (0x11)
    000b: mtxIdx 1
    000e: load
    000f: pshCL 8 (0x8)
    0011: uAdd
    0012: call freeingFunc
    0017: storeR
    0018: ret L0,P0

So we see that when "freeingFunc" is called, what is on the stack is not the record pointer itself but pointer + 8 (the address of field t_next). We can handle this by having load-with-offset and store-with-offset instructions, so the actual pointer is on the stack, but will even that be enough? Also, it would be quite painful to identify all of the temporary pointers on the stack during GC. If we try to restrict GC to only running between statements, a nasty programmer could guarantee that it never runs, by simply having freeingFunc never return. Can I keep a simple auxiliary stack of stacked pointers? It is one of the "root"s for garbage collection. Also, entry and exit of that stack could be incref/decref. [2010: Zed has had the "tstack" for some time now. It holds the original references (and right now, their type, but I want to remove those). Only the tstack will be examined for temporary roots during GC. Well, plus all local and proc formal references.] 040209/Monday <> The cost of that last can be lessened somewhat by having an optimizer that finds most of the places where it is sure that it doesn't need the complex stuff, and just using non-pointer push/pop/etc. But, that would need more versions of the complex instructions, like mtxIdx, since that instruction can both consume and generate a pointer (does it ever need to decref???) How do I handle something like a record containing an array of pointers to records?
I would need something like an aryIdx instruction that finds the array as some offset from a pointer on the stack. That might be more efficient, but we could be getting into a lot of variants of instructions here. [Not an issue - the tstack works for this.] 040219/Thursday Going to replace the bytecodes. It's time to start working towards the stuff mentioned above. I think I'll use something like the "constant pool" in Java, but I'll call it a "ref table" instead, since it's mostly an array of pointers to random kinds of things. 040221/Saturday Defining the new bytecodes (already started Thursday). I was looking at a whole slew of array indexing opcodes, to handle the various sizes of array elements, the various places arrays can be, etc. There is a better way, though. Do it like I did before - have an array indexing routine that assumes the address of the array on the stack, then leaves the address of the desired element on the stack. The possibility of the array being inside a record can be handled by pushing the record ref onto the stack, then DUP-ing it, to make a temporary copy that we can do the needed arithmetic with. Then IGNR it when the stack gets popped back to that point. The same sort of thing could be done for record field references, but I'm thinking it is cleaner to have the explicit opcodes. In thinking about optimization of array/matrix references, I've always thought there would be alternate forms of the instructions that do not do bounds checking. However, if bounds checking is not needed, we can just push the address of the array/matrix, do the indexing using the simple unsigned arithmetic instructions, and use load/store to access the array elements. That is efficient in native code, but is it efficient in the byte-code machine? It might in fact end up being *slower* than using the checking instruction! Hmm. Since I only seem to need the one array index computation instruction, perhaps there can just be another variant that does no checking.
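A sketch of that single array-indexing instruction, assuming a hypothetical array layout (element count and element size in a header before the data) and a plain uintptr_t value stack. The checked flag models the possible no-checking variant; none of these names come from the actual byte-code machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical array layout: a small header followed by the elements. */
typedef struct {
    size_t count;
    size_t elemSize;
    unsigned char data[];           /* flexible array member */
} Array_t;

static Array_t *array_new(size_t count, size_t elemSize) {
    Array_t *a = malloc(sizeof *a + count * elemSize);
    if (a != NULL) {
        a->count = count;
        a->elemSize = elemSize;
    }
    return a;
}

/* Hypothetical value stack of the byte-code machine. */
enum { VSTACK_MAX = 32 };
typedef struct {
    uintptr_t v[VSTACK_MAX];
    size_t sp;
} VStack_t;

static void push(VStack_t *s, uintptr_t x) {
    assert(s->sp < VSTACK_MAX);
    s->v[s->sp++] = x;
}
static uintptr_t pop(VStack_t *s) {
    assert(s->sp > 0);
    return s->v[--s->sp];
}

/* One index instruction for every element size: consumes the array
   address and the index, leaves the element's address. Returns false
   (a run-time error) for an out-of-bounds index when checking. */
static bool op_aryIdx(VStack_t *s, bool checked) {
    size_t index = (size_t)pop(s);
    Array_t *a = (Array_t *)pop(s);
    if (checked && index >= a->count) {
        return false;
    }
    push(s, (uintptr_t)(a->data + index * a->elemSize));
    return true;
}
```

The element size comes from the array itself, which is what lets one opcode replace the slew of per-size variants.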
That handles arrays, but not matrices. Is it worth it? 040229/Sunday Been implementing the new bytecode stuff. Pretty much done the bcComp compiler stuff. Next is the disassembler. <> When I get there, think about doing a fairly complex web-page example. E.g. copy Slashdot's, or The Register's. Have to allow the concept of logging in to the site, which then (I presume, never having tried it) gets you your own specific preferences or something. Anyway, that logging in should be secure from the Z system. So, it can safely be made to be automatic (if the user so wishes, on a per-site basis). There can be cookie-like things saved for the site, which specify this user's id for the site. With proper authentication, no password is needed. However, we might want one anyway (perhaps just not used from the system with the authenticated "cookie"), so that the user can "login" from some other machine, with the password and without the "cookie". 040301/Monday <> Have to document that in a 'for' loop, the 'limit' expression is actually evaluated before the 'init' expression. Actually, state that it is undefined what order they happen in. 040411/Sunday <> Want to be able to allow loading binary (byte-code) for things like device drivers for proprietary hardware. However, such a driver should only get access to what it really needs. So, it has direct physical memory access to stuff, and can read/write control registers, but it cannot, for example, open a network connection. E.g. we don't want binary-only drivers to be able to "call home". They might have the ability to support a custom call which would allow a custom program to talk to them and do network things. But, the user would always have control over whether that program runs or not.
Now of course, having kernel-level memory and I/O access means that in reality device drivers can do whatever they want, but my desire is to make it hard for them to do it, and to give a reasonable chance that if they do it, they will be detected. 040428/Wednesday <> If there are multiple parameterizations of a given package, then there needs to be provision for multiple copies of the per-package variables for it, since they may depend on the parameterization. But, perhaps some are best if they are shared. Maybe have to declare which is which. 040516/Sunday <> Should preserve type symbols in the ref table (i.e. don't skip them), so that the disassembly will be able to use them when it prints out type references. The same will be needed for the pretty-printer. This will slow down execution, but optimization can produce a second ref table which has skipped them. [Done, but not the optimized form.] 040517/Monday vitanuova.com - distributed OS 040522/Saturday Trying to push past last week's small stall in making the disassembler do named types better. The issue is that of how to do reverse lookups from type (and proc) pointers to the symbol that should be shown. One of the philosophical questions is that of which symbol/path to show if there are multiple that name the same thing. The desired answer, I believe, is that the symbol/path should come out the same as it was in the initial source. That is pretty much required in order that a recompile of something pretty-printed get the same end result. Some thoughts that came out of this: There should be a 'volatile' type kind, for use like in C. There are many situations where it is irrelevant, including when the code was not produced by a low-level authorized programmer. [Attribute done] The 'ro' type kind is currently misused. It should mean nothing more than that the type is an 'ro' type, in whatever context it is used.
[Fixed] There should likely be a parallel record kind, used to represent a record whose fields are read-only outside of the package in which it is defined, and from which it is exported. Syntactically, this could be "rorecord". It is inside the representation of that that a reference to the containing package is needed, so that the read-only property (and the disallowing of the constructor use) can be correctly applied. [Done] Need to check if/where the containing package reference currently in Proc_Body_t is used other than for path display. Answer: it's used to know which package's ref table is used with the proc's code. Currently, containingPackage references are in Proc_Body_t, Types.NamedDesc_t, Types.RecordDesc_t, Types.OneofDesc_t. Ah, the one in OneofDesc_t is needed for the same reason as the one in RecordDesc_t - you can export a 'oneof' as 'ro' currently. So, the 'rooneof' concept is needed as well. Currently nd_containingPackage is not actually used anywhere! The new kind of export, in which only the type name is visible, could be a type kind just like the current 'named', except that it too would have a containing package reference, indicating where its definition is visible. [Done] It should be possible to give a type a new name anywhere at all. That new name is usable only within the scope it is declared in. The icky case is if someone renames a type from some package and exports it from a new package. It should be possible to detect and prohibit that - I can't see why it would be valuable. Hmm. Maybe exporting it as one of the above named but not visible types could be useful - it just so happens that a type from some other package is directly useful in a package as an anonymous type. [Seems just fine - see test/names.z] This still leaves the question as to how the pretty printer and disassembler find the correct name/path for a referenced type.
Do I now need to introduce ref-paths fully, for when something in a package references something from outside of it? Ultimately those are needed to allow the resolving of inter-package references when a package is "loaded". So, perhaps that is the answer for pretty printing too. But, simply having a pointer to the package in which the symbol is defined works too, since the in-memory system has parent pointers in all packages. [Things are clearer now that I have Package/PathToPackage_t.] 040523/Sunday Oh yeah, the other issue that came up last week is that of initializing and terminating local variables. In order for garbage collection to work, reference variables should likely start out as nil. When their scope is exited, a decRef is needed on what they point at. Combining these, the approach is just to assign nil to all reference locals and zero to all other locals (since they might become a reference variable in the next scope) on scope exit, and to clear the stack to all zeros before starting. To help the pretty-printer and disassembler, perhaps the right trick is to have instructions for scope entry and exit. They contain a ref table reference to the scope structure they are entering/exiting. That should make it possible to correctly identify local variables by just their offset (useful in disassembly), and to clear up scopes on exit of the scope. If a stack of scopes is maintained by the disassembler and pretty-printer, then only the scope entry instruction needs the ref. True, but for execution, it might be faster to just have a ref in the scope exit instruction. Note that we need to have scopes represented properly inside Exec_t structures in order to do this. [Done] A thought here is that if it is only type names that need this disambiguation, then the NamedDesc_t is the right place to have a pointer to the containing package, if there is one.
If there is no containing package, then the type name must be proc local, and so just its name is all that is needed. Of course, that means that there must be a way to get from the NamedDesc_t to that local name. Some of the issues come down to the type equivalence rules. In Draco, I used "name equivalence". That means that if I did something like:

    type T1 = [10] int;
    type T2 = [10] int;

then I could do array operations on values declared as T1 or T2, and I could assign/compare values of either T1 or T2 with values of type "[10] int", but I could *not* assign/compare values of type T1 with values of type T2. Naming a type made it into a new type, compatible with what it names, but not compatible with types that are other names for the same underlying type. Do I want the same rule with Z? [That's what I've done.] I suspect that in many (most? all?) cases the answer is that it doesn't matter. If the behaviour of the language is such that you can't tell what the type equivalence rule is, then it clearly doesn't matter - the use of named types is an implementation/convenience thing. A related question is that of what the type of a value is that is produced using a named reference to an underlying type. E.g., in Z:

    type T1_t = record { ... };
    type T2_t = T1_t;
    T2_t variable := T2_t(...);

Is the record created of real type T2_t or is it of real type T1_t? I.e. does the use of the symbol T2_t as a constructor change the type of what is constructed? Does T2_t have a constructor of its own, or does it just reference the constructor for T1_t? [The construction is invalid, since you have deliberately hidden the 'record' nature of the type you are using.] If a record type is used for a linked list of some form, it will have itself as a subtype. On output we would like that to show up as the name of the record type as given in the context in which it is declared, and never as something like T2_t above.
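The Draco-style name equivalence rule can be sketched in C over a hypothetical type-descriptor structure (Type_t, same_type and assign_compatible are illustrative names, not the Types.z API). Unnamed types compare structurally, a named type is equal only to itself, and a named type is compatible with the type it names; only one level of naming is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { K_INT, K_ARRAY, K_NAMED } Kind_t;

typedef struct Type {
    Kind_t kind;
    size_t dim;                     /* K_ARRAY: element count */
    const struct Type *sub;         /* K_ARRAY: element type; K_NAMED: named type */
    const char *name;               /* K_NAMED */
} Type_t;

/* Unnamed types compare structurally; a named type is only ever
   equal to itself (identity). */
static bool same_type(const Type_t *a, const Type_t *b) {
    if (a == b) return true;
    if (a->kind != b->kind || a->kind == K_NAMED) return false;
    if (a->kind == K_ARRAY) {
        return a->dim == b->dim && same_type(a->sub, b->sub);
    }
    return true;                    /* K_INT */
}

/* A named type is compatible with itself and with what it names, but
   two different names for the same underlying type are not
   compatible with each other. */
static bool assign_compatible(const Type_t *want, const Type_t *have) {
    if (same_type(want, have)) return true;
    if (want->kind == K_NAMED && have->kind != K_NAMED) {
        return same_type(want->sub, have);
    }
    if (have->kind == K_NAMED && want->kind != K_NAMED) {
        return same_type(want, have->sub);
    }
    return false;
}
```

With T1 and T2 both naming "[10] int", each is compatible with the unnamed array type, but not with the other.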
In Draco, types did not exist at run-time, so these questions likely didn't matter. In Z, however, types do exist at run-time, so the questions are relevant. I believe I'm planning on an "any" type, which can hold values of any reference type. It will be like a 'oneof' type in that it is a two-value entity, but the values are the reference and a reference to its type, not a reference and a numeric selector tag. In such values, the types are represented at run-time, and their semantics there matter. [No - the type is stored in the allocated object, so 'any' values are just a reference to some allocated, tracked, typed entity.] What about in byte-code, via the ref table? In instructions like array indexing, the type referenced can be the low-level array type, and that will be fastest for execution. That might make it impossible for the disassembler to show the proper name of the array type, if it has one. What's the right answer there? [As of 2010 I use the named type, and the array indexing bytecode must skip over the name node.] With instructions like 'reccon' and 'varcon', should the type referenced by the instruction be the top-level type used in the program source, or should it be the bottom-level record or oneof type? If the record or oneof type is declared directly in a package and is exported, then it may need to contain a reference to the package so that the compiler can check uses of it, if the type was exported as "read only". Perhaps these operations are sufficiently uncommon (and expensive anyway) that the instructions can reference the top-level name. [They do.] One thought for type names was that the NamedDesc_t could contain either a package reference, if it is defined at package level, or a scope reference, if it is defined in a scope. To figure some of this out, I should lay out what kinds of symbols can actually be defined at the package, proc and scope levels. If I am not going to put some local-only things (e.g.
local variables) in symbol tables, then remove those variants from SymInfo_t. Syntactically, perhaps what I should move to is:

    package Blah {
        /* These are fully public to everyone. */
        public type T1_t = record { };
        public type T2_t = oneof { };

        /* These are readable externally, but cannot be constructed or
           changed outside of package Blah. These are what I was
           declaring as "public ro" before. */
        public type T3_t = record ro { };
        public type T4_t = oneof ro { };

        /* These are read-only types. No code can modify them after they
           have been constructed. */
        public type T5_t = ro record { };
        public type T6_t = ro oneof { };

        /* These are read-only variables. They cannot be changed after
           they have been initialized. Note that that says nothing about
           the fields of the record/oneof. */
        public ro T1_t T1Var := T1_t(...);
        public ro T2_t T2Var := T2_t.xxx(...);
    };

Is the above too easy to get confused over? It seems to me to be the logical way to do it. [2010 - all resolved, in variants of the above.] 040524/Monday isRo and modifiable in Exec.z need more work. They should return moderr_roType if the type being assigned to (need to find fields to get the actual type of the field, but in case of struct/array need to check the top level too) is readOnly. The same check needed on records should also be done on oneofs, since they can be exported readOnly as well. [Done] Really do need to redo the Types.z stuff to allow for the safe construction of types (likely need temporary types like I used in the Exec.z stuff). Some comments added there. [Done] Possibly it should be illegal to silently convert between signed and unsigned integers. An explicit type conversion would do run-time checks. If the type conversion is implicit, then it should do the run-time checks. [Conversion is explicit, with checks.] 040525/Tuesday Started actually doing the Types.z cleanup yesterday. Continuing. I was thinking about all the kinds of exporting of types that I want.
I'm tentatively adding the "private" concept:

    package P {
        type Internal_t = record { ... };
        public type External_t = private Internal_t;
    };

Internal_t must be a referenceable type (string, record, oneof, matrix, proc) in order to be used in this way. Type name External_t is the visible aspect. The only possible operations on values of such a type are assignment and reference comparison. Within the package, assignment is allowed between types Internal_t and External_t. But, to keep things simpler, no extra operations are allowed on External_t, i.e. the values usually must be copied to an Internal_t before using them. [2010: naming alone does this - the 'private' is not needed.] I think it makes sense to do the private as another indirection in the type structures, even though the earlier 'ro' is a specific property of record and oneof types. Types cleanup done (skipped generics) in .z file. Do the .c next. An issue raised by Don:

    package P {
        type t = record;
        proc p()void:
            type t = record {
                uint field;
            };
        corp;
    };

What happens? How about for a struct? C (well, gcc) considers the two type 't's (well, struct tags) to be different. [2010: Zed no longer allows types to be defined inside procs. I tried this experiment with the same names inside a package and inside a generic in the package. All seems fine. See test/genNames.z.] 040528/Friday Nearly done new Types.z stuff and full disassembly. Have added the scope entry and exit instructions - they help the disassembly. An issue I have is that the members of an enumeration have that enumeration as their type. Thus an expression involving them will end up with that enum type. But, if the type was named (as is normal), then they end up not being the same type. This seems OK when assigning to a variable, but not as a proc result type. Need the same code as Types.assignIncompat? Duplicating that seems to have worked. Perhaps there should be a routine to do that in Types.z? [Resolved - all uses use the named type.]
More interesting is that a disassembly does not show names for enum constants. Annoying, but likely not an issue - the pretty-printer should be able to, because the bottom level Exec_t node is the enum constant, which contains the member name. 040529/Saturday Working on the issue Don raised. Probably don't need the general Types.Normalize. Can just normalize each kind of type as it is produced, since the Types.z construction routines are the only way to produce types. [Done, eventually!] Grr. Thinking more about 'ro'. 'ro' should not be a main type class. Rather, it is a property of storage locations. So, for example, the fields of structs and records can be 'ro', as can things being pointed at by pointers (so a pointer type description must be a record with the ro flag and the pointed-to type), and variables. I think 'volatile' is the same. The C example is wrong and misleading. The thing that is wrong is that values are neither 'ro' nor 'volatile' - they are just values. But, they have a type. Thus, 'ro' and 'volatile' are not part of their type, and so shouldn't be part of the type scheme. On the other hand, perhaps one looks at it as "type 'volatile X' represents type X such that all locations which store values of the type are considered to be volatile". [Resolved later] Also need a way to make a package-level variable 'ro' outside of the package but not 'ro' inside the package. Need both syntax and internal representation. If that is figured out, it might also be possible to have fields of structs/records that are 'ro' outside of their package, but not inside. I can see that being useful. Another possibility deals with the 'private' concept. Perhaps that is a flag inside a NamedDesc_t, simply saying whether or not it is allowed to follow down into the named subtype. But, you can name types that you cannot make private. Not clear which is best.
DONE 'ro', 'volatile' and 'private' are likely not in the current parser as tokens which can start a type/declaration. [Are now] I've put in checks for constructor/selection for an incomplete record type. Need to do the same for declarations of an incomplete struct type, and constructor/selection for incomplete oneof types. [Done] A thought from last night: what to do about proc types. Normalizing them is good for compatibility of function pointers. But, what do we do with the parameter names in them? If two completely unrelated proc types happen to have the same parameter sequences and so end up the same, the parameter names from the first one will win out. That will not look terribly nice in pretty-print of that proc type for any but that first definition of it. One answer is to *force* there to be no parameter names in proc types, so the user will never see them or use them. That would change the parsing a bit, of course. [The names stay and are significant. The names must match for the types to be compatible. This is handled by the new ':' 'proc' syntax, which forces a proc to have a specified proc type as its pr_procType, while allowing it to have different names for its formal parameters. This is used with procs that are to be used with proc variables.] Flip around the arguments to assignIncompat, so that the 'want' is on the left - the same as in an assignment statement. [Done] It seems I can't stop someone using a partially constructed type, at least in certain ways. But, they have to be manually calling the type construction routines to do it. Say they have called RecordInit and RecordStart, but have not yet called RecordNew. They can, in that interval, use Exec routines to create code that references the incomplete type. That isn't fatal, but it might be good if the Exec code can prohibit it. That requires keeping the record type marked as incomplete. Similarly, as usual, for oneof and struct.
[Done - it is based on the rd being nil for records, and the od.selectorSymbol being nil for oneofs. Struct types?] "opaque"

040530/Sunday Done checking of uses of incomplete types.

040531/Monday Back to parsing stuff pretty well again. One issue that I thought of last night, and that proves out, is that if I declare a variable (package or local) as 'ro', then I can't even initialize it. Urgh. How do I allow that to happen without violating the ro-nature in general? (Assume malicious folks are calling the Exec_t stuff manually.) One thought is to create all variables as RW, then have a call to make them RO, but no call to make them RW again. Since they are variables, I don't really care if they get assigned to or not, in terms of the semantic correctness of programs - the RO flag is really just an aid to the programmer. [Done that.] Before this round of changes, I wasn't actually checking that the fields of RO records were not being changed outside of their defining package. I am actually violating that in Proc.z/AddFormal, when assigning the newly created FormalList_t into the SymInfo_t. Ok, so how *do* I have types from two different packages point at each other, and yet have both be exported RO? Can I at all? Exporting functions to do the assignment for me works, but effectively ends up making the field RW. [Not an issue in that particular case - the SymInfo_t isn't actually used! The 2010/April situation is that you can export things to a specific list of packages. So, you write a proc that does the change, and only export it to where it is allowed.]

040601/Tuesday Started in on typed execution stuff. Need to verify that the size of a struct type is rounded up as needed to allow for arrays of them, i.e. byteSize must be a multiple of alignment. Also need to round the used offset for a local variable (and package-level?) up to Z_WORD_SIZE, so they all start on that boundary. [Done]

040605/Saturday Been continuing this week on the new byte-code execution.
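A small sketch of the 040601 layout rules, assuming power-of-two alignments and an 8-byte Z_WORD_SIZE (both assumptions, as are the helper names): fields are placed at aligned offsets, and byteSize is rounded up to the struct's alignment so that arrays of the struct keep every element aligned.

```python
Z_WORD_SIZE = 8  # assumption: a 64-bit word

def round_up(n, align):
    """Round n up to the next multiple of align (align is a power of two)."""
    return (n + align - 1) & ~(align - 1)

def layout_struct(fields):
    """Lay out (name, size, alignment) fields in declaration order.

    Returns (offsets, byteSize, alignment). byteSize is rounded up to a
    multiple of the struct's alignment, so element i of an array of these
    structs starts at i * byteSize and stays correctly aligned.
    """
    offsets = {}
    offset = 0
    struct_align = 1
    for name, size, align in fields:
        offset = round_up(offset, align)
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    return offsets, round_up(offset, struct_align), struct_align

def local_offset(used):
    """Start the next local variable on a Z_WORD_SIZE boundary."""
    return round_up(used, Z_WORD_SIZE)
```

For example, a struct holding an 8-byte field followed by a 1-byte field occupies 16 bytes rather than 9, so the second element of an array of them starts on an 8-byte boundary.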
A bit of a question about just what the nature of the indexing operations should be. http://www.giftfile.org

040606/Sunday %%% Need an instruction to test a ref for NIL, and replace it with a simple numeric 0/1, and need to generate it as needed.

040607/Monday Arggh! Of course! Just clearing the whole stack on startup and clearing the frame on scope exit isn't enough. We use the stack for temporaries, silly, so it doesn't end up all 0's. So, zero the required amount of space on proc entry. No need for more on scpin, since the proc's maxLocalSize covers all scopes, and the clearing in scpout leaves that whole range consistent.

040608/Tuesday No longer need to clear the stack on startup. Also, when exiting a scope, don't need to clear any variables - just do a util_freeRef on ref values. %%% Can have a code pre-checker that runs through and verifies that all calls to Package_GetRef will succeed. Then, during actual execution, can use something like an in-line macro instead of the actual call. Another alternative is to actually store real pointers in the instructions, filled in by the pre-execution validation. That changes how "RLD" information will have to work - on code instead of on the ref table. Argghhh! Return statements need to do scope exits for all scopes they are inside. Not too hard to do, but icky. DONE

040612/Saturday Mostly working again. Currently redoing the recordConstructor stuff. I think I want to store, in the RecordDesc_t, a count of the "simple" fields, where "simple" is not struct, union or array - need a routine in Types.z to tell that. The reason is that without it, I need to work backwards in pulling initial values from the stack to put into the new record. An ick in the disassembler. Now that I've put in the needed bc_scpout instructions when doing a 'return' branch, the disassembler gets confused because, although the scpin and scpout match up when running, they no longer match up textually in the code.
Thus, the disassembler has trouble with local variable names. Not sure what to do there - I don't want to add a byte-code run-time cost. [Resolved]

040613/Sunday Actually, the fastest for execution may be to put an actual 'isSimple' flag in the FieldList_t records. That way there is no need to call the Types.IsSimple routine at run-time. [Record types now have an 'initVec' that just lists the fields that need initializing. This was needed because fields can now be marked as 'noinit'.] For the disassembler, the answer of course is that the disassembler should not process the bc_scpout if there is a later bc_scpout for the same scope in the function. Is it worthwhile to build auxiliary data structures to compute that, or is it OK to just go with the N**2 forward search on every bc_scpout? Disassemblies are not common, but that can be a lot for a big routine with a lot of scopes and returns. Gee, just like Exec.ProcCheck! (No - it doesn't have a lot of returns.) [Done the N**2 stuff - it doesn't seem to be a problem.]

040628/Monday Lots of Legoing. However, I have been working to get the new execution model working for all my test programs. The current stopping point is that the code that unreferences the local variables on exit of a scope can result in something being freed. E.g. "m" in xtest.z, function "createMatrix". Note that the created matrix (like all created things) has no references to it. It only gets one when it is assigned to something. So, try a variant of bc_scpout, say bc_scpoutr, which says that the value left on top of the stack when exiting the scope should have its reference count incremented before freeing the locals in the scope. After freeing those locals, decrement the value's reference count, but do not do any resulting freeing - just reduce the reference count. I think that ought to do it.
[Done, works]

040705/Monday Make the 'size' construct work for arrays, so that it can be used to find the size of an array created via a multi-valued constant. Consider using 'dim' rather than 'size'. [ended up with 'getBound'] DONE

040707/Wednesday How do we do full optimization of things like TRACEn macros in C? On the surface, we parameterize the level we want in the code (just like you have to recompile in C), and then rely on dead code removal and inlining to get rid of everything. But, can we inline a routine from another package? The 'jsr' and 'jsri' instructions switch the current package, since ref table indexes are relative to the package that the active function is defined in. Does Java do anything here? Perhaps we have to get only the trace level from another package, as parameterized, and must declare the actual trace functions in each importing package. Then, some way to ease the duplication of the trace routines into multiple packages would be useful. [Much later: doing them as compileTime resolves the issues.]

040710/Saturday Thinking about constant expressions (constant folding). Perhaps the simplest way to do it is to introduce the "optimized version" Exec_t, and then do constant folding all the time. Then, checking for a constant is as simple as looking for an optimized node whose optimized form is a constant. The code generator would then automatically get the benefit. It might still be good, however, for the parse state structure (or perhaps the exec context) to have a flag saying that a constant is required, so that error messages can be at the appropriate place. Note that all places that currently just grab a value from a constant Exec_t node will have to instead look for an optimized node and look for a constant node under it. [All done, later, except the parser flag.] I'm now thinking that instead of a generic optimized node, I should just start with a UintConstExpr node, and see how that goes.
For the most part, compile-time expressions are uint. [Not done - better to use the generic 'exk_alternate'.]

040718/Sunday A lot of the constant folding is done, and I think it's working out OK. Have to get back to chasing the bug relating to compiling Types.z at some point, however. Just noting here that I also will need to add syntax, parsing, opcodes, etc. for some run-time type conversions. Currently I need uint to sint and sint to uint. Last night Don had suggested unary "+" as a convert from uint to sint, but I'm not sure I want something that short. [All resolved]

040719/Monday The work with constant folding has illustrated the need for type conversion constructs. Since the conversion constructs are the only ones needed in the safe language (i.e. no "pretend"), they don't really need to have a syntax like "make(, )". They can just use the built-in type name. The built-in type names that we can consider using this way: uint, sint, bool, string, float, char and perhaps even byte. Many of the conversions can yield run-time errors. Note that my current thought is that 'char' would actually be a 32 bit value, so that it can hold any Unicode code point (Java's 'char' is only 16 bits). Some examples:

    uint ui;
    sint si;
    string st;
    float fl;
    char ch;

    ui := uint(si);   /* error if si is negative */
    si := sint(ui);   /* error if ui is too big */
    ui := uint(fl);   /* error if fl is out of range (including negative) */
                      /* this would be like Fortran NINT() - to nearest */
    ch := char(si);   /* Unicode character represented by si if positive */
    st := string(ch); /* string of one character */
    ch := char(st);   /* err, pick something. Maybe the first character? */
    st := string(ui); /* string representation of the number */
    ui := uint(st);   /* convert to number - can get runtime error */

Comment from Don: I'd save type-conversion names for conversions where there's only one likely interpretation. I'd call those "round" and "firstchar". And I'd provide "ceil", "floor", and "nthchar".
Or maybe, uint(fl) should fail if the value of FL isn't an integer, and char(st) should fail if ST hasn't exactly one character. [Resolved, with 'toUint', 'fromUint', 'flt', 'round', 'trunc', string indexing.]

040720/Tuesday %%% Run-time check for shifts - shift amount must not be greater than the word size. Want to have two kinds of shifts - one that does this check and one that does not. When shifting bitsXX types, there is no check. There is also no check when the shift amount is an in-range constant. So, add 'shlq' and 'shrq' opcodes.

040721/Wednesday Discussion on comp.arch about ref and out parameters. It's a valid point that knowing something is a ref parameter, *at the call site*, is useful in understanding code. Perhaps require an explicit 'ref' before such a parameter at the call site? [Later: perhaps go back to the older idea of using an '@' prefix. Then there isn't the icky dual use of the 'ref' reserved word. DONE]

040804/Wednesday <> Mention on comp.arch of a Boehm GC available on his website.

040817/Tuesday <><> For things that don't cause security problems, or other basic system consistency issues, don't get upset about errors. Log relevant info about them and continue on the way. As an example, GUI stuff isn't crucial - the most that can happen is a messed up display or buttons that don't work, etc.

040904/Saturday Link from SlashDot: http://apr.apache.org/ It's a portability library.

040910/Friday Use "pretend" and "convert". Allow both in the high level language, but restrict "pretend" to converting to "uint". [toUint, fromUint] When I have a pretty-printer, the disassembler can use it to display the symbolic expression form of constants. Maybe. [Worth it?? One issue is that the bytecode being disassembled does not refer back to the Exec_t tree, and so the expression isn't easily findable.]
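A minimal Python sketch of the checked uint/sint conversions from the 040719 examples and of the two shift flavours from 040720. Assumptions: 64-bit words, and a failed check is modelled as an exception; the names uint_of_sint, sint_of_uint, shl, shlq (borrowing the opcode name) and ZRuntimeError are hypothetical stand-ins, not the actual Z builtins or bytecode.

```python
WORD_BITS = 64                      # assumption: 64-bit uint/sint
UINT_MAX = (1 << WORD_BITS) - 1
SINT_MAX = (1 << (WORD_BITS - 1)) - 1

class ZRuntimeError(Exception):
    """Stands in for the bytecode engine aborting execution."""

def uint_of_sint(si):
    # ui := uint(si); error if si is negative
    if si < 0:
        raise ZRuntimeError("uint(): negative value")
    return si

def sint_of_uint(ui):
    # si := sint(ui); error if ui is too big
    if ui > SINT_MAX:
        raise ZRuntimeError("sint(): value too big")
    return ui

def shl(x, n):
    """Checked shift: the shift amount must not be greater than the word size."""
    if n > WORD_BITS:
        raise ZRuntimeError("shift amount too large")
    return (x << n) & UINT_MAX

def shlq(x, n):
    """Unchecked shift, for bitsXX operands or in-range constant amounts."""
    return (x << n) & UINT_MAX
```

A compiler that can prove the shift amount is an in-range constant, or that is shifting a bitsXX value, would emit the unchecked form and skip the test entirely.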
The fix to the disassembler for the extra scope exits may be to scan to the end once on startup to find the end (or perhaps we just know it based on the length of the bytecode). Then, on any scope exit, if it is part of a series of just scope exits, followed by an unconditional branch to the return instruction, then don't do those scope exits. [That's what is done.] [Later: another alternative is to have the code generator put returns right at the various returns, instead of using a branch to the end. That makes them easier to detect for the disassembler. However, then we likely would have to make the byte-code interpreter check on each return instruction for a "break on return" to do something like "finish" in a debugger. Probably not a big deal. A native-code generator wouldn't have that option, but a disassembler for it might not have a scope issue anyway. Later: likely not much help, would have to emit all the needed scope exits for each return.] [Later: did a simple scan past scpoutX (if same as prev), then if there is a bful, assume it is a return and don't pop the scope. Seems to work.]

040913/Monday From SlashDot, Linux Standard Base 2.0 standard: http://refspecs.freestandards.org/lsb.shtml

040916/Thursday Note from work: Whenever a function would take a bool parameter, it might be useful to instead use a 2-valued enum. This provides a shorter form for documenting just what the true/false means... No, that doesn't work - it doesn't allow one bool parameter type to be passed on to another call. Better would be to define more global 2-valued enums that can then be used as the parameters to more calls/functions. Later: The awkward part is testing the values - you can't just use the parameter in an 'if' anymore. Dunno what the right answer is - perhaps add a language feature that lets other 2-valued enums be derived from "bool" and thus have the same semantics? [2010 - no resolution.]
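The scope-exit heuristic described in the 040910 notes can be sketched in Python under some assumptions: instructions are (opcode, argument) pairs, 'bful' is taken to be an unconditional branch whose argument is the target index, the single return is the last instruction, and 'bz', 'load' and 'ret' are hypothetical opcodes used only to build the example. Scope exits followed only by more scope exits and a branch to the return are treated as an early return, so the disassembler's scope depth (and with it the local variable names) is not popped for them.

```python
def is_return_exit(code, i, ret_index):
    """True if code[i] starts a run of scope exits ending in an
    unconditional branch to the function's final return."""
    j = i
    while j < len(code) and code[j][0] == "scpout":
        j += 1
    return j < len(code) and code[j] == ("bful", ret_index)

def scope_depths(code):
    """Scope depth the disassembler should assume before each instruction."""
    ret_index = len(code) - 1   # assumption: the single return is last
    depth = 0
    depths = []
    for i, (op, _arg) in enumerate(code):
        depths.append(depth)
        if op == "scpin":
            depth += 1
        elif op == "scpout" and not is_return_exit(code, i, ret_index):
            depth -= 1
    return depths

code = [
    ("scpin", 0),
    ("scpin", 1),
    ("bz", 6),        # hypothetical conditional branch around an early return
    ("scpout", 1),    # early 'return': exit both scopes ...
    ("scpout", 0),
    ("bful", 9),      # ... then branch to the shared return
    ("load", 0),      # locals of both scopes must still be nameable here
    ("scpout", 1),
    ("scpout", 0),
    ("ret", 0),
]
```

Without the check, the depth at instruction 6 would already have dropped to 0 and the disassembler would lose the names of the locals still in use there.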
040918/Saturday From Slashdot: http://irrlicht.sourceforge.net/ An open, portable, 3D graphics engine in C++

040924/Friday Some thoughts from some idle moments. <> I've several times been annoyed by the menus that Netscape uses for the BrickLink site. One of them has so many entries in it, which it divides into chunks linked by "MORE" entries, that I can't actually select the later entries in it, since it is drawing so many menu columns that it runs out of space, and I can't select the needed one with the mouse since they overlap almost entirely. To me, it seems the answer is that such large menus (i.e. those that won't fit in one vertical column) should simply grow a vertical scroll bar. Similarly, if an individual entry is very wide, perhaps either that entry, or the whole column, should grow a horizontal scroll bar. To do it right, and without noticeable delays, we likely want to pre-compute the total size of the menu, so that the decision about how to render it (with or without scroll bar) can be made without having to traverse it and compute the total size. Storing the needed size with each entry would be reasonable. Want to allow text entries, entries with simple bitmap images, entries pre-rendered with whatever is needed, and ... animated entries! Why not! As long as we quickly know the pixel size of each item, we can display it properly without delay. I'd prefer to keep as much work as possible in the thread of the program, rather than in the GUI thread driven by the mouse. But, if an entry needs to be rendered as of the instant of the menu activation, then we pretty much need hooks to allow the drawing of the entry to its canvas at that point in time. Definitely have to have a resource timeout then, to prevent a GUI hang. How should animated items work? We don't really want to specify all of the types of animations that are possible. Rather, we should just let the provided handler update the image as needed.
But, I think perhaps it should be driven by the menu code itself, since that code knows when the various entries are actually visible. We shouldn't be continually updating menu items that are not displayed. Even more, we shouldn't be doing anything when the menu isn't active. So, if animated items have differing frame/update rates, then the menu code will have to schedule its wakeups so as to accommodate all of the active/displayed animated items. We might also want to have code that checks to see if changes to a menu are acceptable. For example, a font size increase might make it no longer possible to display some item in the available space, even with the above-mentioned menu scroll bars. Should we then sprout scroll bars (including a vertical one) on the menu item itself? Ick. Instead, if possible, deny menu changes that would result in such problems. That will require a test virtual render of the menu of some kind, at least to recompute the item pixel size of all affected items, and compare that against the pixel size of the display in which the menu (and any needed scroll bar(s)) must be rendered. I may not need actual system traps to do graphics drawing to the actual display. The structure which represents the final canvas (the one going to the display) can be made read-only (either via HW protection, or more simply as an 'ro' type). Then all of the graphics calls will simply clip relative to the area it specifies, and it should be impossible to write anywhere not allowed by that canvas. This may be a situation where the fully strongly typed nature of the system lets it safely allow that direct access, thus resulting in better performance than is otherwise possible. The assumption here is that it is then safe to have the actual display memory writeable by all processes. [I don't think modern graphics cards would be very happy with this. Many basic operations are accelerated by the GPUs.]
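A rough Python sketch of the two menu ideas above, with hypothetical names throughout: because each item carries a precomputed pixel height, deciding whether a scroll bar is needed is just a sum, and the menu code can compute its next wakeup from the frame deadlines of only those animated items that are currently visible, and schedule nothing at all while the menu is inactive.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MenuItem:
    height: int                       # precomputed pixel height
    period_ms: Optional[int] = None   # frame period if animated, else None
    next_due_ms: int = 0              # when the next frame is due

def needs_scroll_bar(items: List[MenuItem], view_height: int) -> bool:
    # Precomputed sizes make this a simple sum; no rendering is needed.
    return sum(it.height for it in items) > view_height

def visible(items, scroll_top, view_height):
    """Items that overlap the window [scroll_top, scroll_top + view_height)."""
    shown, y = [], 0
    for it in items:
        if y + it.height > scroll_top and y < scroll_top + view_height:
            shown.append(it)
        y += it.height
    return shown

def next_wakeup(items, scroll_top, view_height, menu_active):
    """Earliest frame deadline among displayed animated items, or None."""
    if not menu_active:
        return None
    due = [it.next_due_ms for it in visible(items, scroll_top, view_height)
           if it.period_ms is not None]
    return min(due) if due else None

items = [MenuItem(20), MenuItem(20, 100, 100), MenuItem(20),
         MenuItem(20), MenuItem(20, 40, 50)]
```

Scrolling changes which animated items count, so the wakeup would be recomputed whenever the scroll position or the set of displayed items changes.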
040928/Tuesday Need some way to implement some variants of the ASSERTn, DEBUGn and TRACEn macros that YY uses. Current vague thoughts about inline routines (forced to be expanded for code generation) which use an I/O-like facility (as in Draco's I/O constructs) to do the call sequence. For something like tracing, perhaps one builds a structure locally, which is filled in by the sequence of calls, and then delivered as a whole to the actual tracing facility. Note: will want to allow pretty much all 1-word types (i.e. not arrays/structs/unions) to be converted to "uint" bits values, for use in standardized tracing structures. [Compile time execution of 'ioProcs' lets this work fine. I've done the DEBUG stuff, and don't see any issues with TRACE, at least once I allow initialized structs.]

041009: had thought that the need for a cast-type thing in my range-checking routines like Base.UintToSint could be done away with by the value-range tracking stuff - in those routines, the checks are explicit, so when the assignment across types is done, the compiler can know there is no possible error, and so not complain. But, I think the above need for conversions may override this.

041002/Saturday Link from Slashdot: http://syllable.org/ BeOS-like free OS.

041004/Monday <><> Try to have TAB-expansion work pretty much anywhere. For example, in the "shell", type the first part of the name of an element (a "file") and TAB to complete. Then on a RETURN, start up the code to edit that kind of item (in a separate process of course). Thinking a bit about "web pages" (just documents with links). Use META-keys for things like bold, italics, underline, colour-change. On the latter, then want a colour name - use TAB-completion. And, allow a colour number (24 bit RGB).
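The TAB-completion behaviour from the 041004 entry can be sketched in Python (the function name is hypothetical, and the colour names are just sample data): complete as far as all matching names agree, which finishes the name outright when there is only one match. It relies on the standard library's os.path.commonprefix, which compares strings character by character.

```python
import os.path

def tab_complete(prefix, names):
    """Extend prefix as far as all matching names agree.

    Returns (completed_text, matches). A single match completes fully;
    several matches complete to their longest common prefix, and the
    sorted list of candidates can be shown to the user.
    """
    matches = sorted(n for n in names if n.startswith(prefix))
    if not matches:
        return prefix, []
    return os.path.commonprefix(matches), matches

colours = ["red", "rebeccapurple", "royalblue", "green"]
```

For example, tab_complete("ro", colours) completes to "royalblue", while "re" stays "re" and offers both "rebeccapurple" and "red".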
041008/Friday <><> Optimizations based on value range knowledge: since we know that the bytecode engine will trap if it finds an overflow/underflow/subscript-error/divide-by-zero/etc, after the code flow has passed a case where that could happen, we can add a range limitation on the value that represents the run-time test it just passed. So, for example, if a value is used as an array index for arrays of a common size, it's possible that only the first indexing operation needs to be one that checks the indices.

041009/Saturday <> How versions are handled will be very important. A scenario I just thought about is one where an author produces a new version of a library/type that has really very little new functionality, but large changes in presentation/style. Users may not want to use the new version (perhaps it takes more screen real estate, or more memory), but want it around in case an object arrives that needs it. So, allow the setting of the "chosen" version of a library/type. This is the version that the user prefers to use, even if it's not the newest version on the system. It will be used for anything that does not require a newer version. Introduce the concept of a variant. Here, a basic library/type can split into multiple variants, likely by different authors. They do much the same thing, but perhaps in noticeably different ways. The user can select which variant they prefer to use. An object will still just request a basic version, and if the system has the library/type in the chosen variant with a high enough version, that one is used. Otherwise, the system will go with some other variant of high enough version. One big problem will be that of ensuring that objects/references do not ask for a version larger than they actually need. If new versions can simply introduce new functionality, then it ought to be possible to scan objects and automatically determine the feature-set they use, and hence which library/type version they require.
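One possible reading of the 041009 version rules, sketched in Python with hypothetical names: use the user's chosen version of the preferred variant when it is new enough, otherwise a new enough version of the preferred variant, otherwise some other variant. Picking the newest candidate in the fallback cases is an assumption; the entry does not say which of several sufficient versions should win.

```python
def select_version(installed, required, variant, chosen=None):
    """Pick an installed (variant, version) pair for an object.

    installed: (variant, version) pairs present on the system
    required:  minimum version the object asks for
    variant:   the user's preferred variant
    chosen:    the user's pinned ("chosen") version of that variant, if any
    """
    usable = [(v, n) for v, n in installed if n >= required]
    if not usable:
        return None                   # nothing on the system is new enough
    if chosen is not None and (variant, chosen) in usable:
        return (variant, chosen)      # the user's choice is good enough
    preferred = [p for p in usable if p[0] == variant]
    # Assumption: otherwise take the newest usable version, preferring
    # the user's variant over any other.
    return max(preferred or usable, key=lambda p: p[1])

installed = [("classic", 2), ("classic", 3), ("slim", 4)]
```

With "classic" version 2 chosen, an object needing version 1 or 2 gets classic 2, one needing 3 gets classic 3, and one needing 4 falls back to the "slim" variant.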
It would be nice to be able to have the overall motto of the system be something like "you never have to upgrade". There are several reasons why people want a new computer, but there shouldn't be any reason why they have to upgrade the entire software system. Getting newer versions of libraries/types will be needed, as will getting entirely new ones. But, there should be no need to re-install the software or anything like that. Well, that's a goal, anyway.

041013/Wednesday Thinking about how to do I/O-like functions that could work for things like the common DEBUG and TRACE macros used in C. I'll just ramble and see how it goes. Need a kind of construct that is a weird combination of compile-time and run-time, it seems. The special 'case' statement is handled at compile-time, but the stuff around it needs to be at run-time. E.g. (I've liberally stolen the Draco syntax of "channel output text". This isn't necessary - perhaps something simpler would do, like "output". I think it does have to be somewhat special, to allow for the proper interpretation of the 'write' calls.)

    iofunc debug1(ioarg arg)void:
        if DebugLevel >= 1 then
            channel output text chout := Debug.New();
            case ioarg
            incase string:
                write(chout; arg);
            incase uint:
                write(chout; arg);
            incase bool:
                write(chout; if arg then 'T' else 'F' fi);
            default:
                error("Unsupported type passed to 'debug1'");
            esac;
            Debug.End(chout);
        fi;
    corp;

A call like:

    uint count;
    bool success;
    ...
    debug1("All done, count = ", count, " success = ", success);

would have to come out as equivalent to:

    uint count;
    bool success;
    ...
    if DebugLevel >= 1 then
        channel output text chout := Debug.New();
        write(chout; "All done, count = ", count, " success = ",
              if success then 'T' else 'F' fi);
        Debug.End(chout);
    fi;

We don't want macro-like processing by the parser, however - we want the original 'debug1' call to be in the parse tree, so that it is there on all pretty-prints of the code.
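For comparison, a Python sketch of what the debug1 iofunc does: each argument is formatted by the arm that matches its type, and the whole line is delivered to the debug channel in one piece. Unlike the design above, the type dispatch here happens at run time and the arguments are evaluated even when the level is 0; the list used as a sink stands in for Debug.New()/Debug.End(), and all names are hypothetical.

```python
class UnsupportedIoArg(TypeError):
    pass

def format_arg(arg):
    # One branch per 'incase' arm. bool is tested before int because
    # Python's bool is a subclass of int.
    if isinstance(arg, str):
        return arg
    if isinstance(arg, bool):
        return "T" if arg else "F"
    if isinstance(arg, int) and arg >= 0:   # stand-in for 'uint'
        return str(arg)
    raise UnsupportedIoArg("Unsupported type passed to 'debug1'")

def debug1(sink, debug_level, *args):
    # 'sink' is a list standing in for the Debug.New()/Debug.End() channel;
    # the line is only built and delivered when the level allows it.
    if debug_level >= 1:
        sink.append("".join(format_arg(a) for a in args))
```

Calling debug1(out, 1, "All done, count = ", 3, " success = ", True) appends the single line "All done, count = 3 success = T" to out.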
I can't see anything other than some kind of local define/equate thing that lets us have generic debug functions with a package-local level variable. [Years later, I have what I wanted!]

041014/Thursday See above. For many purposes, we could just define an "iofunc" as always using the 'write' functionality, without needing the strange case statement, since the language will likely be providing ways to translate all values to ASCII text. But, other uses, like tracing, will want to do other things with the values.

From: Don Reble

It looks like you're trying to define convenient text I/O for the programmer. To that end, if you already have Zstdin, Zstdout, and Zstderr, you might consider Zstddebug.

And then there's formatting convenience. You're trying to mimic C's varargs: but there are other ways to do such things. Survey says...

Modula-2
--------
This language just has text-print-routines for the various fundamental types: write (for character), writeString, writeLn, writeInt, ... This is perhaps the easiest way to do it, for the language-implementor. No doubt you've already rejected it.

Pascal
------
For Pascal's writeln, the compiler converts

    writeln 42, 3.14159, "fred";

to

    writeInt(42); writeChar(' '); writeReal(3.14159); writeChar(' '); writeString("fred"); writeChar('\n');

(I haven't looked up the actual names.) This complicates the language slightly. It might be worthwhile, though, especially if one finds another use for such a construct.

Algol-68
--------
This language allows one to use unnamed types ("modes"), by making struct-constants. You'll recall things like

    write((fmtvar, 42, 3.14159, "fred"))

which makes a value of an implicit { format, int, double, char* } struct. Algol-68's write can break down such structs, printing each field in turn. Now, any language already has implicit types for literal constants, but they're well-defined types. Implicit struct-types might be harder.

Modula-3
--------
Modula-3 is a hybrid of the above ideas.
There's a FMT.F method, which takes a format-string, and up to _five_ values. (The language has C++'s default parameter value mechanism, and uses that if there are fewer than five.) But those values must be TEXT, or be built-in types, which FMT knows how to format. And the FMT.FN method takes a format string and an array of TEXT. One writes something like this:

    stderr.PutText( FMT.FN("%s %s %s", ARRAY OF TEXT{ FMT.Int(42), FMT.Real(3.14159), str } ) );

(But I could have used FMT.F, with only three values.)

PL/1
----
This anachronism has a "put data" statement. If one writes

    put data inta, pi, str;

that might output

    inta=42,pi=3.14159,str='fred'

Believe it or not, there's also a "get data" statement. It is very fussy about the input data. That's surely a compiler construct, although it might be built atop something like varargs.

Java
----
Here, class Object has a toString method, which sees the class name, and produces "myClass@0xbffe2980" or the like. The fundamental types override it, making it somewhat like Modula-2. The programmer can, of course, override the somewhat useless toString definition in his own classes.

Eiffel
------
Eiffel is a lot like Java, except that the default "out" method produces something useful. I believe it calls "out" recursively for each field, and concatenates the results (inserting whitespace).

Z
-----
(If I may propose the Z way...) Just combine Eiffel and Algol-68:

    stddebug.setVerbosity(2);
    stddebug.out2({42, 3.14159, "fred", &someStruct}.toStr());

No, wait: out2 calls toStr() for you.

    stddebug.out2({42, 3.14159, "fred", &someStruct});

There! There may yet be problems: can you distinguish pointer-to-array and pointer-to-element? How big is the array beyond that pointer?

--
Don Reble
djr@nk.ca

From: Chris Gray
Message-Id: <200410150125.TAA16458@GraySage.COM>
To: djr@nk.ca
Subject: Re: Some early-morning language design

> It looks like you're trying to define convenient text I/O for the
> programmer.
> To that end, if you already have Zstdin, Zstdout, and
> Zstderr, you might consider Zstddebug.

Now that's a complete response. Well, except for C++ (which you are likely presuming I know) perhaps. :-)

Actually, I'm looking for something more general than just debug output. I would like to use it for tracing as well, where limited kinds of values are stuffed into an in-memory binary buffer with a datestamp. I also want the source-level calls to be as short as possible - one-line wherever possible, still fitting in the usual 80-column window.

> And then there's formatting convenience. You're trying to mimic
> C's varargs: but there are other ways to do such things.
> Survey says...

Well, I don't really want to mimic varargs, since it's unsafe by definition. What I want is the briefness of the calls that use it, not the semantics. You've likely forgotten how I/O works in Draco. Syntactically, it looks a lot like Pascal I/O, and the compiler translates it into the appropriate sequence of calls. It's a bit of work, but not really that much. In Draco, the same style works for input and output, and both text and binary I/O can be accommodated. Formatting codes can be inserted, as in:

    writeln("i = ", i, "(0x", i:x:8, ")");

(Second output of 'i' will be in hex, with 8 digits displayed.) That translates into a sequence of calls that accomplish the formatting. There are hooks for specifying sources and sinks of text, which are used by the language syntax if a channel is specified, as in:

    channel output text chout;
    chout := /* one of several kinds of channels, e.g. to a buffer */
    write(chout; "This is put via the channel", 37.6:e:12);

and there are pre-defined channels for the console.

> Z
> -----
> (If I may propose the Z way...)
> Just combine Eiffel and Algol-68:
> stddebug.setVerbosity(2);
> stddebug.out2({42, 3.14159, "fred", &someStruct}.toStr());
> No, wait: out2 calls toStr() for you.
> stddebug.out2({42, 3.14159, "fred", &someStruct});
> There!
An important point is that the debug/trace/whatever level can be different in different packages, but I would still like to be able to use a common debug/tracing/whatever infrastructure. As a more solid example, at YY we have something like:

================================ debug.h ============================
#define COMPILED_DEBUG_LEVEL 9
...
#if COMPILED_DEBUG_LEVEL >= 1
#define DEBUG1(x) \
    if (DEBUG_LEVEL_VARIABLE >= 1) { \
        dbg_print x; \
    }
#else
#define DEBUG1(x)
#endif
================================ whatever.c ==========================
#define DEBUG_LEVEL_VARIABLE ComponentDebugLevel
unsigned int ComponentDebugLevel = 3;   /* changeable via a command */

void blah()
{
    ...
    DEBUG1(("%s: i = %u\n", __func__, i));
    ...
}

> There may yet be problems: can you distinguish pointer-to-array
> and pointer-to-element? How big is the array beyond that pointer?

In the generally-accessible Z language, one doesn't deal with pointers. And, in a strongly typed language there is no possibility of confusing the type of something pointed at. If you have:

    [10] uint blah;

then &blah yields a value of type "*[10] uint" (pointer to array of 10 uints). And &blah[0] yields a value of type "*uint" (pointer to uint). There is no confusion.

And yes, I had forgotten that I plan on having the equivalent of toText for all types. I'll provide it for the language-defined types. Programmers can define it for their own, and if they don't, the system will do it as you mentioned for Eiffel.

-cg

041026/Tuesday A bit of thinking on the walk to work this morning. It doesn't look like a good idea to try to restrict the set of routines that code can call, e.g. restricting "unsafe" code to not be able to use routines that examine persistent data, etc. One reason is that we actually want to allow people to run pretty much any code. We just want to make sure that it can't do things we don't want it to do. Another relates to the difficulty of doing the checking.
If we get a bunch of code from elsewhere, we end up having to do verification on it before we can allow it to run. It would be nice to simply not care. I.e., it would be nice to not even need anything like Java's verification. <> A better option seems to be to push hard on the "playpen" idea. Go so far as to allocate a small chunk of raw persistent storage to use as the entire persistent store for the playpen. The code running in the playpen can do whatever it wants in that space, but can't, in general, access anything outside of that space. This is like "cookies" in the traditional browser sense. This in turn suggests that the code for the persistent store should act very differently depending on the size of the store it is given. With small enough amounts, it needs little if any overall structure - just a set of the items being persisted. That way, the overhead is reasonable for small stores.

041031/Sunday Over the last couple of days, I've put in the alloc half of a storage allocator that checks for duplicate frees, and can check for stompage (at considerable expense). Found a couple of issues with it. Not sure why I didn't just grab the one from cgmud! I've also fixed several issues with the type stack. Now I can run the "runit.exectest" test and get no messages about tsp not matching.

041101/Monday <><> The text input fields of a requester are just small text windows, and so all normal editing stuff should work in them. In particular, *history* should work in them as well. This simply requires that the data structures for such requesters not be discarded just because the requester isn't currently being displayed. It is then a matter of how the application handles them as to whether or not the history mechanism works properly for them.

041102/Tuesday Realized this morning that what I think might be the best for conversion operations is to have them as exported by the various types that they work on.
Conversions "ToBits" and "FromBits" (the latter only when it is legal/safe) are automatically provided and checked for by the language system. In a sense, they are language constructs. So, to convert a uint bit pattern into a float, use "float.FromBits(theUint)". To go the other way, use "float.ToBits(theFloat)". Similarly for something like "sint.FromBits(theUint)". Are "ToBits" and "FromBits" reserved words? Can the provider of a type provide their own versions? Hmm.

Thinking about a related issue, I think it might be useful to make "float" and "Types.Float" completely equivalent throughout. Same for all other types with reserved-word representations. Because of this, the flag indicating whether or not FromBits exists can be stored in the BaseDesc_t.

041111/Thursday

A bit of a puzzle. I'm trying to put in syntax for referencing simple methods on types. Right now, I just want to do the above conversions. But how does one reference a type as a value? I already have ways to represent them, and in some situations generate such a reference. But there is an ambiguity. If a type name (or path) is the first thing in a situation where a statement could be allowed (e.g. in the body of an 'if'), then how do I know whether that occurrence of the type name is a reference to the type, or the beginning of a declaration sequence? If I skip ahead one token and find an identifier following, then maybe I assume declarations. Can I disallow an empty declaration sequence? Sure, why not. Can I then, in such a context, go into handling declarations, but if there are no identifiers to be found, simply come back and yield the type reference? [It ended up being based on a combination of the type of the expression being Types/Type_t and the next token being a name or a storage flag.]
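In C terms, the ToBits/FromBits conversions described above amount to copying a bit pattern between types. This sketch assumes a 32-bit IEEE `float`. The function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* float.ToBits: every float has a bit pattern, so this is always legal. */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* memcpy sidesteps strict-aliasing problems */
    return u;
}

/* float.FromBits: every 32-bit pattern is some float (possibly a NaN), so this
   is also legal. A type with invalid bit patterns would not get a FromBits. */
static float float_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* sint.FromBits: reinterpret an unsigned pattern as two's complement. */
static int32_t sint_from_bits(uint32_t u)
{
    int32_t s;
    memcpy(&s, &u, sizeof s);
    return s;
}
```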
041116/Tuesday

<> One thought on optimizing array stuff is to find the indexing operations that don't need any bounds checking, and convert them into direct uses of the simple arithmetic instructions. This essentially expands the indexing as would be done for a native CPU. Then let things like common sub-expression elimination run over it, along with the usual kind of strength reduction that is done for indexing code.

<> However, note that one of the big costs of Z code might not be array bounds checking so much as the costly instructions dealing with tracked (reference) values. That tracking, and the handling of the type stack, will be expensive. It should be possible to find situations where we know that the code in a given range will not permanently change any reference counts, and will not unlink elements from structures. Such code does not need to use the reference instructions.

<> Another point is that, with most bytecode systems, it may well be more efficient to simply have a non-checking array indexing instruction, rather than converting to address arithmetic in the bytecode.

041123/Tuesday

It seems I am allowing enum arithmetic, i.e. enum - enum, enum + uint, enum - uint. So, I should likely allow the equivalent with char. In any case, I need to do range checks on the results of these operations. How about a simple "range check" instruction that aborts execution if the numeric top-of-stack value is (unsigned) greater than a constant specified in the instruction? DONE

At some point, force all 'for' iterators to be declared inside the 'for' itself. DONE

041127/Saturday

Asked Don what he thought about the above. Got a long reply, most of which isn't really relevant.

Chris:
> I should remove the code that allows a 'for' loop to use an externally
> declared variable ... force the variable to be declared right in the
> 'for' construct (e.g. "for uint i from 0 upto 10 do"). Can you see any
> downside to this?
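The range-check instruction from 041123 could be sketched in C as below. The opcode set and `run` form a hypothetical toy interpreter, not the Z bytecode. A single unsigned compare catches two errors: an overflow past the top of the enum, and a subtraction that went below zero, which wraps to a huge unsigned value.

```c
#include <stddef.h>
#include <stdint.h>

enum Op { OP_PUSH, OP_SUB, OP_RANGECHECK, OP_HALT };

typedef struct {
    enum Op op;
    uint64_t arg;   /* PUSH: value to push; RANGECHECK: maximum allowed value */
} Insn;

/* Returns 0 and stores the top of stack on HALT; returns -1 if a range check
   fails. There is no stack-overflow checking: this is only a sketch. */
static int run(const Insn *code, uint64_t *result)
{
    uint64_t stack[16];
    size_t sp = 0;
    for (size_t pc = 0;; pc++) {
        switch (code[pc].op) {
        case OP_PUSH:
            stack[sp++] = code[pc].arg;
            break;
        case OP_SUB:                       /* a - b, where b is on top */
            sp--;
            stack[sp - 1] -= stack[sp];    /* wraps if the result is "negative" */
            break;
        case OP_RANGECHECK:
            if (stack[sp - 1] > code[pc].arg) return -1;
            break;
        case OP_HALT:
            *result = stack[sp - 1];
            return 0;
        }
    }
}
```

Usage: for an enum with values 0..6, 5 - 3 passes the check with the constant 6, while 2 - 3 wraps around and fails it.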
> It avoids the whole question of what the value of a
> 'for' variable is after the loop terminates.

---
Sometimes, one wants C(++)'s for loop to say where it found something.

    int ix;
    for (ix = 0; ix < ArraySize; ix += 1) {
        if (Array[ix] == sought) {
            break;
        }
    }
    if (ix < ArraySize) {
        /* found it at [ix] */
        ...

But one can put this instead, and one doesn't even need a "break":

    int found = ArraySize;
    for (int ix = 0; ix < found; ix += 1) {
        if (Array[ix] == sought) {
            found = ix;
        }
    }
    if (found < ArraySize) {
        ...

I find the second version slightly better; but who would agree with me?
---
If a loop might throw, and one wants to stop and report at the first failure:

    int ix;
    try {
        for (ix = 0; ix < ArraySize; ix += 1) {
            doItOrThrow(Array[ix]);
        }
    } catch (...) {
        cerr << "doIt fails at " << ix << endl;
    }

The local-forvar version looks like this:

    for (int ix = 0; ix < ArraySize; ix += 1) {
        try {
            doItOrThrow(Array[ix]);
        } catch (...) {
            cerr << "doIt fails at " << ix << endl;
            break;   // remove this line to keep going
        }
    }

And the breakless version:

    int failpoint = ArraySize;
    for (int ix = 0; ix < failpoint; ix += 1) {
        try {
            doItOrThrow(Array[ix]);
        } catch (...) {
            failpoint = ix;
            cerr << "doIt fails at " << failpoint << endl;
        }
    }
    if (failpoint < ArraySize) {
        cerr << "yep, doIt failed at " << failpoint << endl;
    }
---
Here's an example from one of my C++ library classes: it compares two arrays for equality. The arrays might have different lengths, but zero-padding doesn't count.
    int ix;
    const int* aptr = &ArrayA[0];
    const int* bptr = &ArrayB[0];
    for (ix = 0; (ix < Asize) && (ix < Bsize); ix += 1) {
        if (*aptr++ != *bptr++) return false;
    }
    for ( ; ix < Asize; ix += 1) {
        if (*aptr++ != 0) return false;
    }
    for ( ; ix < Bsize; ix += 1) {
        if (*bptr++ != 0) return false;
    }
    return true;

The local-forvar version:

    const int* aptr = &ArrayA[0];
    const int* bptr = &ArrayB[0];
    for (int ix = 0; (ix < Asize) || (ix < Bsize); ix += 1) {
        if ((ix < Asize) && (ix < Bsize)) {
            if (*aptr++ != *bptr++) return false;
        } else if (ix < Asize) {
            if (*aptr++ != 0) return false;
        } else { /* (ix < Bsize) */
            if (*bptr++ != 0) return false;
        }
    }
    return true;

Of course, one can remove comparisons from each. Here's a local-forvar optimization:

    const int* aptr = &ArrayA[0];
    const int* bptr = &ArrayB[0];
    if (Asize <= Bsize) {
        for (int ix = 0; ix < Asize; ix += 1) {
            if (*aptr++ != *bptr++) return false;
        }
        for (int ix = Asize; ix < Bsize; ix += 1) {
            if (*bptr++ != 0) return false;
        }
    } else {
        for (int ix = 0; ix < Bsize; ix += 1) {
            if (*aptr++ != *bptr++) return false;
        }
        for (int ix = Bsize; ix < Asize; ix += 1) {
            if (*aptr++ != 0) return false;
        }
    }
    return true;

There's little practical reason to prefer one to the other, so one can go with principles. (Hey, the second version is shortest!)
---
Sometimes one does double-tracking in a C++ for loop.

    for (ix = 0, ptr = &Array[0]; ix < ArraySize; ix++, ptr++) {
        ...

Those vars have to be globals; one can't write this

    for (int ix = 0, ptr = &Array[0]; ix < ArraySize; ix++, ptr++) {

because ptr is thereby declared int! If the Zedmeister allows a declaration in a for-head, he should consider allowing multiple declarations (and multiple increments).
-- djr

041128/Sunday

Thinking about 'char', strings, indexing, etc. 'char' will be a 32-bit Unicode value (affirming my earlier decision). Strings will be atomic objects.
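For comparison, here is the zero-padded equality test as a complete C function. This is my own sketch (`equal_zero_padded` is a hypothetical name) and uses only loop-local variables. It compares the common prefix, then checks that whichever array is longer has nothing but zeros in its tail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Equal if the common prefix matches and the longer array's tail is all zero. */
static bool equal_zero_padded(const int *a, size_t asize, const int *b, size_t bsize)
{
    size_t common = asize < bsize ? asize : bsize;
    for (size_t ix = 0; ix < common; ix++) {
        if (a[ix] != b[ix]) return false;
    }
    /* Only the longer array has a tail; if the sizes match, this loop is empty. */
    const int *tail = asize > bsize ? a : b;
    size_t tail_end = asize > bsize ? asize : bsize;
    for (size_t ix = common; ix < tail_end; ix++) {
        if (tail[ix] != 0) return false;
    }
    return true;
}
```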
It might be nice to treat a string as a matrix of char, which readily allows indexing of chars. But that also allows assignment to chars within the string, and I think that might not be a good idea. In particular, modifying string constants is bad. To prevent that, I would have to introduce the whole 'const' thing like in C, and that could be a pain (not so much if done early in the language design, though).

However, I think I do want to go to the enum model for char, even if they are 32-bit values. That allows them to be used directly as array indices, and in case statements. Start work on that right away.

However, I can't go to 32-bit characters before changing/providing some infrastructure. Does my gcc have 'wchar'? Yes: wchar_t. Note that chars are 32 bits, not 64 bits, even on a 64-bit version of Z! Use 'wcsXXX' instead of 'strXXX'. When I switch to 32-bit 'char', I have to fix up "bcGetSize".

Have to think a bit more about forcing the 'for' iterator variable to be defined inside the 'for'. To do it right, it needs to be enforced by the Exec code, and not just by the parser. That might require the Exec code itself to generate the variable symbol, using entries in Proc. Is that going to be a problem? How dependent on Exec is Proc? Suppose I get caught in dependency circles, so that I can't build a structure of mutually referring pointers without exporting procs I don't want to export. Then perhaps I would need to add the concept of a limited export: an export from one package specifically to another package. It would be nice to avoid that, but it might end up being necessary. [It got done]

It might be cleaner to essentially pre-declare the packages that are the system core. Nothing extra would be needed other than adding another Z source file that does nothing but those pre-declarations.

Need to allow more things for the 'byte' type - pretty much all that 'uint' is allowed to do. Hmm.
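On the C side, the switch from `strXXX` to `wcsXXX` looks like the sketch below (`count_wchar` is a hypothetical helper). One caveat is worth recording: `wchar_t` is 32 bits with gcc on Linux, but only 16 bits on some platforms (e.g. Windows). So it is not guaranteed to hold every Unicode code point.

```c
#include <stddef.h>
#include <wchar.h>

/* Count occurrences of c in s, using the wcsXXX routines instead of strXXX. */
static size_t count_wchar(const wchar_t *s, wchar_t c)
{
    size_t n = 0;
    if (c == L'\0') return 0;   /* wcschr would match the terminator itself */
    for (const wchar_t *p = wcschr(s, c); p != NULL; p = wcschr(p + 1, c)) {
        n++;
    }
    return n;
}
```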
Or, is 'byte' the same as 'uint', except that it must be loaded/stored as a 1-byte value? That would be good, but how is that done? Perhaps get rid of 'byte'. Introduce "bits8", "bits16", "bits32" and "bits64" as size-defined types. The sizes just control the loads and stores of them - for all other purposes they are treated as 'uint'. [Done]

041130/Monday

<><> Could have a standard editing possibility (if the type being edited supports it) of "fix" (bound to ^F maybe). In a program window it tries to fix a single syntax/language error. ALT-F could try to fix all of them. There is no guarantee of the usefulness of the result, of course. This might be useful to beginner programmers. It could be used for all manner of things: identifier completion, path completion, construct completion, identifier spelling correction, declaration generation, etc.

041201/Wednesday

From a written note: "ProcCheck packages". Go through and see if Exec.ProcCheck needs to do any check of the Packages referenced by the elements of the Exec_t. My quick thought is that if the caller can get a reference to something out of a package, then they are allowed to make use of it.

A couple of days ago, I changed some code in Types.z (and .c) to get rid of extra error messages. I essentially allowed Types.Error fields/members in structs/records and unions. That is done in the checks in "addField" and "UnionAddMember". The first two cases in each used to just return a nil. Now they replace the bad type with Types.Error and proceed with the addition. This cuts down on compilation error messages, since the field name is actually added. But are there any bad consequences to allowing what is essentially an invalid type? Can someone use the existence of such a thing to violate the basic system security? [I think this is fine.]

Need to put routines into the Proc code for defining local variables in the current scope. Use them in pProc.c/defineVariable, and in Exec.z/ForVariable.
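Here is a sketch of the bitsN idea, assuming a 64-bit 'uint' register. The width only matters at the load and the store. Everything in between is plain 'uint' arithmetic. The entry doesn't settle whether an oversized value should be truncated or trapped on store. This sketch assumes truncation. `load_bits` and `store_bits` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical widths for bits8/bits16/bits32/bits64, in bytes. */
typedef enum { BITS8 = 1, BITS16 = 2, BITS32 = 4, BITS64 = 8 } BitsWidth;

/* Load: read exactly w bytes, then zero-extend to a full 64-bit 'uint'. */
static uint64_t load_bits(const void *addr, BitsWidth w)
{
    switch (w) {
    case BITS8:  { uint8_t v;  memcpy(&v, addr, sizeof v); return v; }
    case BITS16: { uint16_t v; memcpy(&v, addr, sizeof v); return v; }
    case BITS32: { uint32_t v; memcpy(&v, addr, sizeof v); return v; }
    default:     { uint64_t v; memcpy(&v, addr, sizeof v); return v; }
    }
}

/* Store: keep only the low-order w bytes (truncation is an assumption; a
   checked store that aborts on overflow would be the other choice). */
static void store_bits(void *addr, BitsWidth w, uint64_t value)
{
    switch (w) {
    case BITS8:  { uint8_t v = (uint8_t)value;   memcpy(addr, &v, sizeof v); break; }
    case BITS16: { uint16_t v = (uint16_t)value; memcpy(addr, &v, sizeof v); break; }
    case BITS32: { uint32_t v = (uint32_t)value; memcpy(addr, &v, sizeof v); break; }
    default:     { memcpy(addr, &value, sizeof value); break; }
    }
}
```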
That way, we just pass the symbol and type for the 'for' variable into that routine, so there is no way the variable won't be restricted to the scope of the 'for'. To finish that off, I need to move the ScopeStart/ScopeEnd calls inside the 'for' routines. Actually, move them into 'For' itself, and don't define the 'for' variable until there. That way we know they will be balanced. [Mostly done]

041202/Thursday

Rename Proc_Body_t to Proc_Proc_t in the C code.

b_maxLocalSize is too big: we don't shrink b_localSize when we exit scopes. FIXED

041208/Wednesday

Make all of the Types constructor routines do the normalization. There is no reason to leave it up to the caller, I think. [Heh! Still not done, nearly 5 years later!] DONE

Botheration! I'm trying to test my first use of "procassign". The instruction (bc_pchk) is yielding a 'nil'. This is because it is checking a proc type created by C code (the main parser) against a proc type produced by Z code (using the Types.z code), and finding them unequal. Does this mean I can't test 'procassign' until I've done the whole lexer/parser in Z, and can compile the test program with it?

041209/Thursday

No. Write a builtin that maps the package variable space of package Types onto the variables used by the C code for the Normalized variable lists. [DONE, but not that way]

041211/Saturday

Perhaps I want explicit 'compile' and 'run' keywords. 'compile' can be put in front of a proc to make it a compile-time proc. Either can be put inside code to toggle between compile-time and run-time evaluation. Run-time code inside compile-time code inserts that code into the context from which the compile-time code is running. This could be used for many of the C macro uses. It also directly produces the same result as 'inline' procs. Hmm. I guess that means that if a parameter to a compile-time routine is referenced in code marked 'run', then the value passed at compile time must be used literally, sort-of.
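The b_localSize/b_maxLocalSize fix amounts to three steps. Save the current size at scope start, restore it at scope end, and keep a separate high-water mark for the frame size. Here is a C sketch. `ProcCtx`, `scope_start`, `scope_end` and `define_local` are hypothetical names, and the scope stack has no overflow check.

```c
#include <stddef.h>

typedef struct {
    size_t localSize;        /* bytes of locals currently in scope */
    size_t maxLocalSize;     /* high-water mark: the frame size to allocate */
    size_t scopeStack[32];   /* localSize saved at each open scope */
    size_t depth;
} ProcCtx;

static void scope_start(ProcCtx *c)
{
    c->scopeStack[c->depth++] = c->localSize;
}

/* Shrinking back here lets sibling scopes reuse the same frame space. */
static void scope_end(ProcCtx *c)
{
    c->localSize = c->scopeStack[--c->depth];
}

/* Allocate a local in the current scope; returns its frame offset. */
static size_t define_local(ProcCtx *c, size_t size)
{
    size_t offset = c->localSize;
    c->localSize += size;
    if (c->localSize > c->maxLocalSize) c->maxLocalSize = c->localSize;
    return offset;
}
```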
That in turn implies allowing the passing of run-time values to compile-time procs. In what form are they passed? Are they Exec_t references? That might work, but is awkward for the average programmer. If that is done, how do we pass compile-time values? That is, how can the programmer specify whether to pass the literal constant 42 versus an Exec_t representing the literal constant 42? Ah - that would be based on the declared type of the parameter in the compile-time proc.

One can imagine routines that can be used at both run time and compile time. They could be tagged as 'pure'. Is this worth doing? [Not needed]

Perhaps the 'write' language construct could conceivably be a library compile-time routine with run-time chunks in its body? How far could we manage to push this? Can we arrange for a compile-time routine to assign to the elements of a struct or array, so that when it is done we have a static initializer for a struct or array run-time variable? [Much later: ioProcs have done this.]

The 'typeof' construct (not there yet) could be a run-time thing when applied to an "any" run-time value, but a compile-time thing when applied to an "any" compile-time value (which includes some variables and parameters in a compile-time proc). [No typeof - too powerful. Ended up marking procs as 'compileTime', which causes them to be run at compile time, at the point of their call.]

Variables take on the nature of the code they are declared in, as to whether they are run-time or compile-time. The values of run-time variables can of course not be used at compile time. At compile time, 'typeof' can be applied to a run-time value. If that run-time value is of type 'any', then typeof will return 'any', since it is returning the compile-time type of the run-time value. However, when applied to a compile-time variable, it must be applied to an 'any' compile-time variable, and it yields the type of the current value of that variable. [See above.]
<><> When displaying code in an output window, one of the options is to display the optimized code instead of the regular full code. When doing so, highlight non-optimized parts in, say, orange (or with some kind of dotted underline for the colour-blind). The programmer can then see where parts of his code have not been optimized. This can be done for simple operations (e.g. a non-check uint add, array indexing, record field selection, etc.) and for constant folding, etc.

When the mouse cursor points at an identifier, some of the options that might be useful include:
- highlight all uses
- unhighlight all uses
- spell check
- show information on range (as in value tracking for optimization)
- replace all

Tracking down why things can't be optimized could actually amount to tracking down bugs in the code (or in the code from which the value is obtained).

041213/Monday

<> How will a debugger work? It would be very nice to be able to use the standard system parser, code generator, etc. with expressions to print in the debugger, and even to allow creation of complex debugger scripts. But how can that be done without huge violations of system security? We essentially want to insert code to execute in the appropriate context (either at the point of debugger attachment, or in some other active frame selected by the user). We certainly can't allow code to be inserted into the procs being debugged, however - the user might not even own them. It *might* be necessary to directly interpret parse trees, doing the fetches from the stack of the context being debugged. We have to be careful if we allow even new temporary variables to be written on that stack. Suppose the user wants to store to a variable in the program being debugged. Then the usual semantic checks must be made, *and* all of the code that could see the stored value must be recompiled to non-optimized bytecode, and executed that way from then on.
This is because the store could have invalidated optimizations that removed run-time checks. A great big ick. [2010 - head still firmly in sand.] How do we even reference variables in the context being debugged, in terms of parsing code? We need to do the parsing in the right context, so that the referents come out right. That implies that we can't directly store the debugger "macros" as parse trees, since their referents can't be decided until they are called up in some context. Lots and lots of debugger issues.

How does one actually use the "list(typeName)" concept that I've been trying to define? What limitations on 'typeName' are there? It looks easy if 'typeName' is 'uint', but how do we make more complex lists? E.g. suppose I have a linked list explicitly built from record values, where the data per node is more than one element. How do I do that? How do I create such a value? When using a record, the record constructor syntax works. But for "list(struct-type)", we now need a constructor for struct types, something we don't otherwise really need. Hmm. I guess what we really end up with is a constructor for "list(struct-type)". That is a record type, but likely with no explicit value for the implicit link field(s). Does that mean that the "list" type package has to export a constructor proc? I think so. This may mean that you have to name the produced type. Otherwise we end up trying to do

    list(myStruct) ptr := list(myStruct)(value, ..., ...);

Could that be made to work? Perhaps. It's consistent, but ugly. It's hard to see how this would be done in a generic type package, however. The arguments to that constructor are a copy of the fields of the struct, so they can't be declared in a fixed fashion. Ick.

It's interesting to note that this *could* all be done the ugly way: pass a Types.Type_t to a routine, and have that routine examine the passed type and generate all of the code, etc. needed.
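In C, a macro can play the part of a compile-time 'list' type function. Given a struct type, it emits a node type with a hidden link field, plus a constructor. This is only an analogy. It assumes the hypothetical names `DEFINE_LIST` and `Point`, and borrows `l_next`/`l_this` from these notes. Passing the whole struct value to the constructor side-steps the problem of needing one constructor parameter per struct field.

```c
#include <stdlib.h>

/* Emits T_Node (data plus an implicit link field) and T_Node_new for a
   struct type T, much as a compile-time list(T) might. */
#define DEFINE_LIST(T)                                            \
    typedef struct T##_Node {                                     \
        struct T##_Node *l_next;   /* implicit link field */      \
        T l_this;                  /* the programmer's data */    \
    } T##_Node;                                                   \
    static T##_Node *T##_Node_new(T value, T##_Node *next)        \
    {                                                             \
        T##_Node *n = malloc(sizeof *n);                          \
        if (n != NULL) {                                          \
            n->l_this = value;                                    \
            n->l_next = next;                                     \
        }                                                         \
        return n;                                                 \
    }

/* A multi-field element type, the awkward case discussed above. */
typedef struct { int x; double y; } Point;
DEFINE_LIST(Point)
```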
Such a routine would execute at compile time, either emitting the new stuff into the current package, or finding somewhere else to put it (perhaps modifying its own containing package???). Grrf. Perhaps this sort of thing is the right way to go. These things essentially become part of the language, but they are written entirely in the language. It is possible, though quite hard, for other programmers to write more. Perhaps I should try to write 'list' this way? Hmmm. Lots of work. First I have to implement the concept of compile-time execution. Might not be a bad idea, since that's a major language concept. [Interesting how the final resolution has echoes back to this early thinking.]

Test: just tried a bit of List.z. Didn't include Types.z in the compilation, so got lots of errors (plus I used routines that don't exist). But I got errors in an 'init' proc that resulted in errors from the code generator. Shouldn't get that far! Huh, I even got some when I included Types.z! Duh! "public ro proc ..." and it tries to make the whole thing into variable declarations and inits! More trouble.

I did just tik_basic's. If I want to do tik_struct, then what do I use for the tags of the record fields I add? We sort of want them to be the same. But I'm not allowed to duplicate names within a scope, and they are already defined as the struct tags. Do I have to do away with requiring unique field names across all struct/record types? Do I already not require that? It turns out that I do currently prevent duplicates in a scope, but by just removing the check/definition, it all becomes moot. That would leave no use at all for a few SymTab entry kinds, and some corresponding error codes in Exec_t. However, I think what was originally intended was to use one common set of symbols for, e.g., the 'l_next' and 'l_this' in lists, and only define those once, in the generic type. The programmer would never actually use them. Oh? And how would the programmer reference the list element data then?
My early example shows a 'for' iterator variable of the list element type. If I need an iterator type exported by the created type, then does that mean that the "method list" (or whatever) for a type can contain other types? Why not!

Thinking about just going ahead and trying to do the compile-time stuff, I wondered how to represent the resulting type. I think I had already planned for that, by having a Types equivalent of Exec's optimized nodes. For Types, it would hold an Exec_t expression, and the type resulting from executing that expression.

Step one would appear to be making the parser able to select names from types. That leads me straight back to the problem I had with ToBits and FromBits: this conflicts somewhat with the syntax I use for constructing oneof values. I'll need to work around that.

(Next day) I don't actually need symbols for the fields of the record that is constructed, if the interface to the Exec_t to build references to them can use pointers to the FieldList_t instead.

041220/Monday

I've got this week off work, but will leave for B.C. on Wednesday. I want to work on run-time execution. So, I've been looking at the concepts of signatures and type methods. If a signature needs to include the result type of a method, then having special generic parameters on procs isn't enough. This suggests that the SigParam_t idea is the way to go. So, I don't need the ppk_generic concept. But then, do I need the ppk_typeof concept either? I've looked through this file, and the older notes, and can't really see that I do. I'll get rid of both, which should be a nice cleanup. Hopefully I won't have to put them back!

041221/Tuesday

An email I'll send out:

I'm mostly laying out the issues here for myself, but please feel free to reply - you might point me in a good direction. I'm wondering about the punctuation in Z. Not the stuff like parentheses, arithmetic operators, keywords, etc. Rather, I'm wondering about my many uses of '.'.
That operator is used in Z, as in many languages, to mean a "selection" process. The selection things in Z:

1) selecting an exported symbol from a package. This would yield a proc, type, variable, constant, or a contained package.
2) selecting a field of a struct
3) selecting a field of a record
4) selecting a member of a union
5) selecting a variant of a oneof in a oneof constructor
6) selecting the discriminator variable in a oneof case
7) selecting the discriminated variant in a oneof case
8) selecting a method implemented by a type

Currently, (1) through (7) are implemented, and all use '.' as the selection punctuation. I ran into this issue when thinking about adding (8). An early version was going to be used for "FromBits" and "ToBits" operations. The problem is that a oneof type can appear in both (5) and (8), making for some syntactic ambiguity. I can resolve that without too much trouble, by simply looking up the symbol after the '.' and seeing which kind it is. If it is both, a language rule can say which one wins (I would pick (5)). But a language design rule that I came up with years ago says something like this: if it's hard for the compiler to figure out, it's much harder for a human to figure out - so don't do it.

I think I should use different punctuation for some of these. Available characters include '!', '@', '#', '$', '%', '^', ':' and '.'. Note that all except '.' are fairly tall ('^' isn't bad), and so would be hard to see in between identifiers.

An example of (1) is simply "BI.Print". Here we are selecting proc "Print" from package "BI". An extended example of my oneof usage:

    type Form1_t = record {
        Form1_t f1_next;
        uint f1_this;
    };
    type Form2_t = record {
        Form2_t f2_next;
        float f2_this;
    };
    /* This declaration defines 8 new identifiers.
    */
    type Example_t = oneof {
        case ex_kind
        incase exk_f1:
            Form1_t exi_f1;
        incase exk_f2:
            Form2_t exi_f2;
        incase exk_string:
            string exi_string;
        esac;
    };

    Form1_t f1Var := Form1_t(nil, 7);
    Example_t ex1 := Example_t.exk_f1(f1Var);
    Example_t ex2 := Example_t.exk_f2(Form2_t(nil, 12.34));
    Example_t ex3 := Example_t.exk_string("hello");
    Example_t ex := procThatPicksOne(ex1, ex2, ex3);
    case ex.ex_kind
    incase exk_f1:
        BI.Print("f1: " + BI.UintToString(ex.exi_f1.f1_this));
    incase exk_f2:
        BI.Print("f2: " + BI.FloatToString(ex.exi_f2.f2_this));
    incase exk_string:
        BI.Print("string: '" + ex.exi_string + "'");
    esac;

The above shows examples of (1), (3), (5), (6) and (7). Other programming languages use '.' for (2) and (4), so it would be good to keep that convention. It extends straightforwardly and unambiguously to (3), (6) and (7). I've been tempted to use '/' for (1), since paths of nested packages will replace the filesystem in a Z system. But that leaves the division operator having to be something else. It could be '//', but I'm not sure I want to take this step. I've thought about using ':' for (8), and also thought of using it for (5), but it's those two that are the most ambiguous! I could also consider using "::", but I think having both the single and double forms is asking for confusion.

So, my thought is to leave '.' for (2), (3), (4), (5), (6) and (7). I would switch to ':' for (1), and use '@' for (8). Thoughts?

Argh, wait. I think using ':' for (1) could be ambiguous with the use of ':' in case statements, if a case selector was a named integer constant from a package. Perhaps using "::" for (1) would be OK. That would be like C++'s syntax for selecting something from a class, wouldn't it?

050101/Saturday

The basics of compile-time functions are in. Now I can do more work on the List generic type function. I really do need to get rid of some of the redundancy in the parsing, but perhaps that can wait until I redo the parser in Z.
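For comparison, here is roughly the same oneof expressed as a C tagged union. The kind and variant names follow the Z example. `make_f1`, `make_f2`, `make_string` and `describe` are hypothetical. Z records are reference types, so they appear here as pointers.

```c
#include <stdio.h>
#include <string.h>

typedef struct Form1 { struct Form1 *f1_next; unsigned f1_this; } Form1_t;
typedef struct Form2 { struct Form2 *f2_next; double f2_this; } Form2_t;

typedef enum { exk_f1, exk_f2, exk_string } ExampleKind;

typedef struct {
    ExampleKind ex_kind;          /* the discriminator, selection (6) */
    union {                       /* the variants, selection (7) */
        Form1_t *exi_f1;
        Form2_t *exi_f2;
        const char *exi_string;
    } u;
} Example_t;

/* Stand-ins for the oneof constructors, selection (5): tag and value are set together. */
static Example_t make_f1(Form1_t *v)        { Example_t e; e.ex_kind = exk_f1; e.u.exi_f1 = v; return e; }
static Example_t make_f2(Form2_t *v)        { Example_t e; e.ex_kind = exk_f2; e.u.exi_f2 = v; return e; }
static Example_t make_string(const char *v) { Example_t e; e.ex_kind = exk_string; e.u.exi_string = v; return e; }

/* The 'case ... incase ... esac' dispatch. Unlike Z, C will not stop code from
   reading a union member that does not match the tag. */
static void describe(Example_t ex, char *buf, size_t len)
{
    switch (ex.ex_kind) {
    case exk_f1:     snprintf(buf, len, "f1: %u", ex.u.exi_f1->f1_this); break;
    case exk_f2:     snprintf(buf, len, "f2: %g", ex.u.exi_f2->f2_this); break;
    case exk_string: snprintf(buf, len, "string: '%s'", ex.u.exi_string); break;
    }
}
```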
After more work on the List function, I'll need to add the ability to look up methods on types, so that I can then update the 'for' construct to use generic iteration. It occurred to me that I might need to be able to select another type from a type (e.g. the iterator type from a type that is iterable). I think it might be better to use a different construct for the iterator stuff, and leave the semantics of 'for' clean. The new one could be 'foreach', but to me, that implies that it will touch each element in the collection. That may not be true if the collection is modifiable during the iteration. Perhaps just use

    iterate over do ... od

What if the 'iterate' construct needs to declare and use an iterator variable, to keep track of its progress? How does it do that? Do we need to be able to export other types from types, so that, given a "List(uint)", we can extract "List(uint):ListIterator_t"? Syntactically that shouldn't be a problem. But things get a bit messier in trying to declare and use the MethodList that is currently envisioned as being attached to types. Perhaps the package that exports List() would also export ListIterator(), and they work together in pairs. That's likely the cleanest, I think.

To do a 'write'-type construct, the compile-time code needs to be able to do stuff conditional on the type of the argument. This speaks for the argument being an Exec/Exec_t. Perhaps whenever a compile-time routine is called, it is passed a pointer to the active Exec/TempSeq_t, so it can generate code there. This again suggests that no special syntax is needed, but perhaps some helper routines in a library would be useful. So, 'write'-type routines might be required to have a prototype like:

    proc compiletime write(Exec/TempSeq_t ts; Exec/Exec_t tgt, val; char format; uint width, precision)void:

If 'format' is ' ', then no format code was given. If width and/or precision were not given, the value '0' is passed.
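Here is a C sketch of the "List plus ListIterator in pairs" arrangement (all names hypothetical). The iterator advances past the current node before handing back its value. That way the loop body could unlink that node without derailing the iteration. An 'iterate' loop might expand into a loop of this shape.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct UintNode { struct UintNode *next; unsigned value; } UintNode;
typedef struct { UintNode *head; } UintList;

/* The separate iterator type, exported alongside the list type. */
typedef struct { UintNode *cur; } UintListIterator;

static UintListIterator list_iterate(const UintList *l)
{
    UintListIterator it = { l->head };
    return it;
}

/* Yields the next value; returns false when the list is exhausted. */
static bool iter_next(UintListIterator *it, unsigned *out)
{
    if (it->cur == NULL) return false;
    *out = it->cur->value;
    it->cur = it->cur->next;   /* move on before the caller sees the value */
    return true;
}
```

Usage: `for (UintListIterator it = list_iterate(&l); iter_next(&it, &v); ) sum += v;`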
'ts' can be nil in places where there is no suitable active proc. 'tgt' is the place to write to, and will be 'nil' for a simple write to a console. How many different kinds of 'tgt' values are handled will depend on the implementation of the 'write' proc. E.g. if it is a variable of type "string", then the 'write' can append to it. If it is a reference to a proc taking a single 'char' parameter, then the resulting characters can be passed to it. What if *it* needs a context? Then perhaps 'tgt' should be an expression of type 'oneof', which the 'write' routine can expand into small structures describing where to output characters. There are lots of possibilities.

So, with this, we don't need the 'runtime' reserved word, and the 'compiletime' reserved word is only a property of procs. Do I need a flag on 'write'-type routines, so that I can check for the proper prototype, and handle their calls properly?

Do I need to be able to pass Exec/Exec_t's and Exec/TempSeq_t's to other compile-time procs? The answer is "yes" if I want them able to work macro-ish. Perhaps the convention is simply this: if the first parameter of a compile-time routine is of type "Exec/TempSeq_t", then that routine is passed the active TempSeq_t when it is called. Any other parameter to a compile-time proc gets one of two things. Either it is passed the compile-time evaluable value (types are a bit special here, but not much) and must be declared appropriately, or it is passed the Exec/Exec_t for the parameter if that is its type.

How do I do macro-ish things involving strings? Pass in string arguments, but then what? Perhaps the special parameter should be a record containing an interactive session context (for issuing error messages in the current language) plus the current proc context and the current TempSeq_t. The proc context allows local variables to be looked up, declared, etc. One can also look up package variables in the packages containing the routine being defined.
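The 'tgt' idea could be represented in C as a small tagged union: a target that is either a string to append to, or a character consumer with its own context. This is a sketch under that assumption. `WriteTarget`, `target_put`, `buffer_target` and `write_uint` are hypothetical, and only the width parameter of the proposed prototype is handled.

```c
#include <stddef.h>
#include <string.h>

typedef enum { TGT_BUFFER, TGT_CALLBACK } TargetKind;

typedef struct {
    TargetKind kind;
    union {
        struct { char *buf; size_t cap; size_t len; } buffer;        /* append to a string */
        struct { void (*put)(void *ctx, char c); void *ctx; } callback; /* char consumer + context */
    } u;
} WriteTarget;

static WriteTarget buffer_target(char *buf, size_t cap)
{
    WriteTarget t;
    t.kind = TGT_BUFFER;
    t.u.buffer.buf = buf;
    t.u.buffer.cap = cap;
    t.u.buffer.len = 0;
    if (cap > 0) buf[0] = '\0';
    return t;
}

/* Send one character to whichever kind of target this is. */
static void target_put(WriteTarget *t, char c)
{
    switch (t->kind) {
    case TGT_BUFFER:
        if (t->u.buffer.len + 1 < t->u.buffer.cap) {   /* silently drop on overflow */
            t->u.buffer.buf[t->u.buffer.len++] = c;
            t->u.buffer.buf[t->u.buffer.len] = '\0';
        }
        break;
    case TGT_CALLBACK:
        t->u.callback.put(t->u.callback.ctx, c);
        break;
    }
}

/* Write an unsigned value in decimal, right-justified in 'width' columns
   (width 0 means "no width given", as in the prototype above). */
static void write_uint(WriteTarget *t, unsigned v, unsigned width)
{
    char digits[16];
    size_t n = 0;
    do {
        digits[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (size_t i = n; i < width; i++) target_put(t, ' ');
    while (n > 0) target_put(t, digits[--n]);
}
```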
Right now I have:

    InteractiveSession/Context_t
    ErrorConsumer/Handler_t
    Exec/Context_t
    Proc/Context_t
    Types/Context_t

Perhaps these need to be combined more? [And that was the seed for the current (April 2010) 'compileTime' and 'ioProc' stuff, which has worked well.]

050105/Wednesday

A problem with 'ref' parameters. E.g.

    type r_t = record {uint r_p;};
    r_t rVar := r_t(3);
    proc ick(ref uint p)void:
        rVar := r_t(4);
        p := 5;
    corp;

It could be that the only reference to the record is rVar. Assigning to rVar inside ick could free the record, and now parameter 'p' is referencing freed memory. This can happen with any compound reference type, e.g. oneof and matrix. Direct compound types (struct, union, array) inside compound reference types present more opportunities. Two possible solutions come immediately to mind:

1) Prohibit this. Could be a bit restrictive, requiring a fairly artificial use of a temporary variable.
2) Wrap such calls between the assignment to, and the clearing of, a compiler-generated temporary variable pointing to the compound reference object. [Done]

050108/Saturday

Cleaned up the Context_t's a bit. Will merge Proc/Exec/Types into one. Still want parse/lex as another one.

Been thinking about paths, as in package paths, import/use/whatever, and what symbols you can reference in what way. Java seems to allow references to the root directly, and to anything in the current package directly. You use "import" to add more names that can be referenced directly. "import x.y.*" is simply a shorthand for adding all the public types in package "x.y" to the directly-referenceable list. It automatically adds "java.lang" to the search list.

Given that I can have multiple source files that contribute to a single package (I've split up Types.z to do that), is there any point in the '{' and '}' around the package contents, and the level of indentation? Should the system remember the various pieces (source files right now) that contribute to a package?
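Here is a C sketch of the 050105 problem and of solution 2, using hand-written reference counting (`R`, `r_retain`, `r_release` and `call_ick_safely` are hypothetical). The temporary reference taken around the call keeps the target of 'p' alive until 'ick' returns.

```c
#include <stdlib.h>

/* A reference-counted record, standing in for Z's r_t. */
typedef struct { unsigned refs; unsigned r_p; } R;

static unsigned r_frees = 0;   /* counts frees, so the behaviour is observable */

static R *r_new(unsigned p)
{
    R *r = malloc(sizeof *r);
    if (r != NULL) { r->refs = 1; r->r_p = p; }
    return r;
}

static void r_retain(R *r) { r->refs++; }

static void r_release(R *r)
{
    if (r != NULL && --r->refs == 0) { free(r); r_frees++; }
}

static R *rVar;   /* may be the only reference to the record */

/* Like 'proc ick(ref uint p)': p may point into the record rVar refers to. */
static void ick(unsigned *p)
{
    R *old = rVar;
    rVar = r_new(4);
    r_release(old);   /* with no extra reference held, this frees p's record */
    *p = 5;           /* ...which would make this a write to freed memory */
}

/* Solution 2: a compiler-generated temporary keeps the record alive. */
static void call_ick_safely(void)
{
    R *tmp = rVar;
    r_retain(tmp);
    ick(&tmp->r_p);
    r_release(tmp);   /* the old record is freed here, after the call */
}
```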
There is something to be said for being able to turn the whole system back into a mess of source files. But, would I then have to remember the forward declarations, or could I put those in "header" files that are just pre-declarations for their package? Then I would need to remember the compilation order of the various files. There wouldn't be any such order for stuff built online iteratively. There might not even be an easy-to-compute order that handles circular dependencies. Hmm. Could get rid of parseDependencies, perhaps.

Also, I should change the way the package init proc is generated - it should be updated if we parse another source file for the package. Ick. Then when do we run it? When do we allocate the space for the package variables?

One possibility is to allow procs to be declared as :. (Or use "::" instead of ":".) This would create the new proc as normal, but would not add it to the symbol tables of the current package. Instead, it would add it directly to the method list of the type. This is closer to the way OOP likes to do things. Note that you still can't reference those procs from objects of the type.

But, thinking about GUI stuff, if one wants to add an item to a display, such that the main GUI code can call the proper redraw/paint code as needed, then there needs to be a way to find the redraw/paint routine from the object reference. One somewhat weird way is to have the routine that adds the item to the GUI display find, at compile time, the routines it needs to do its work. That way it could issue error messages at compile time for missing procs. It could also accept various combinations of alternatives. It would need to either save a reference to the type itself, so that the procs can be found at run-time, or save the refs to the procs in some expanded structure that it uses for referencing the item. This ought to be workable, but is it really a good idea?
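A minimal C sketch of the second registration option above, in which the registration routine saves references to the procs it needs in an expanded per-item structure and accepts either of two alternatives. C has no compile-time hook for this, so the "error message for missing procs" is approximated by a failed registration; TypeProcs, DisplayEntry, displayAdd and the Box type are all hypothetical.

```c
/* Hypothetical sketch: names are illustrative, not from the Zed sources. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Procs a type may (or may not) provide; NULL means "not provided". */
typedef struct {
    void (*paint)(void *item);                  /* full repaint */
    void (*redraw)(void *item, int dirtyOnly);  /* alternative, finer-grained */
} TypeProcs;

/* The expanded structure the display keeps for each registered item. */
typedef struct {
    void *item;
    void (*paint)(void *item);
    void (*redraw)(void *item, int dirtyOnly);
} DisplayEntry;

#define MAX_ENTRIES 8
static DisplayEntry entries[MAX_ENTRIES];
static size_t entryCount = 0;

/* Returns false (standing in for an error message) if the type provides
   neither acceptable alternative, or if the display is full. */
static bool displayAdd(void *item, const TypeProcs *tp) {
    if (tp->paint == NULL && tp->redraw == NULL) return false;
    if (entryCount == MAX_ENTRIES) return false;
    entries[entryCount++] = (DisplayEntry){ item, tp->paint, tp->redraw };
    return true;
}

/* The main GUI code refreshes everything without knowing the item types. */
static void displayRefresh(void) {
    for (size_t i = 0; i < entryCount; i++) {
        if (entries[i].redraw != NULL) entries[i].redraw(entries[i].item, 0);
        else entries[i].paint(entries[i].item);
    }
}

/* An example item type that only knows how to paint itself. */
typedef struct { int paints; } Box;
static void boxPaint(void *item) { ((Box *)item)->paints++; }
```

Copying the chosen procs into each entry trades a little memory per item for never having to look anything up at refresh time.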
Having the registration routine check for what it needs on the type is more powerful than static checking techniques. And, it has the advantage that the whole thing is flexible, in that we don't need complex schemes to find the routines at fixed offsets from virtual function descriptors. More Hmmm.

Looking at the current Package.z code, I think I actually could generate something like an object file for each package. The definitions in Package.z include symbolic references to things that the package needs. Exec/PackageRef_t directly references a package right now. That can point to a stub, which contains the package name and its RefPath_t. If, after "linking", there are stub packages that have not yet been resolved, then we can't run. I believe my original intent for these structures was for handling new packages loaded from somewhere, that need to fit into the existing universe and run. But, they handle the more traditional "linking" use as well. Hmm. I can't *just* write out packages this way - that breaks the compile-time execution stuff, which is proving quite useful.

Hmm. So, if, say, the List package has created a type on the machine that it was called on, how is that put together with the List package's data structures on other machines, if the package that uses List is shipped to another machine? The general type-resolution stuff can get rid of duplicates, but how does the List package get to know that the type has been created? It almost seems that the List constructor has to be called on the receiving machine (sort-of part of the "linking" process), so that it can properly handle the type locally. Hmm. If we re-create the type on the receiving machine, how do we make it have the same universally unique id as it had on the exporting machine? That suggests that a universally unique id for a type is not just a simple number, but is constructed from the ids of parameter types, and the ids of any generic types that were used to construct it.
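A minimal C sketch of a computed type id: the id of a type produced by a generic is derived from the generic's id, the number of parameters and the parameter types' ids, so two machines that build List(MyThing_t) independently arrive at the same value. Using 64-bit FNV-1a here is purely an assumption for illustration; a real scheme would need an answer for hash collisions, or would keep the structured (generic, parameters) tuple itself as the id.

```c
/* Hypothetical sketch: the hashing scheme is an assumption, not Zed's. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t TypeId;

#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* Mix one 64-bit id into the hash, byte by byte, least significant first,
   so the result does not depend on the host's byte order. */
static uint64_t fnvMixId(uint64_t h, TypeId id) {
    for (int i = 0; i < 8; i++) {
        h ^= (uint64_t)((id >> (8 * i)) & 0xff);
        h *= FNV64_PRIME;
    }
    return h;
}

/* Derived id for generic(params...). Parameter order matters, so
   Table(K, V) and Table(V, K) get different ids. */
static TypeId derivedTypeId(TypeId genericId, const TypeId *params, size_t n) {
    uint64_t h = fnvMixId(FNV64_OFFSET, genericId);
    h = fnvMixId(h, (TypeId)n);           /* include the arity */
    for (size_t i = 0; i < n; i++) {
        h = fnvMixId(h, params[i]);
    }
    return h;
}
```

Nested generics compose naturally: the id of List(List(MyThing_t)) is derivedTypeId applied to the derived id of List(MyThing_t).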
So, types produced using generic routines do not get their own numeric UUIDs. Right now, I represent the use of a generic type routine like List using an Exec/OptimizedExec_t. That's not even a type! Can I just use, say, a "created" type, which includes the Exec_t expression that created it, along with a reference to the resulting type, which is actually not shipped/stored persistently?

What do we do with the procs that the generic type creates and attaches to the type it is creating? Perhaps we re-run the generic type proc on the receiving system, so it recreates the needed type. But, then how do we guarantee that it creates everything that the user of it needs, in terms of the procs it attaches to the created type? Perhaps a sending user of a generic type sends along a list of the signatures of the procs it needs the generated type to have. If the newly generated type does not have all of those procs, then the package does not resolve and will not be active (or whatever). Getting even more complicated! How do I verify that the code for a generic type does in fact fulfill the interface contract by generating all of the needed procs? This almost is coming back to using the compiler itself, with some special syntax or whatever, to generate at least the procs for the type. What is the name of the new type in that context? Maybe none is needed - it's implied. Am I coming back to needing a type package? Ick.

When one of the above-mentioned registration routines is called, does the caller have to explicitly pass it "typeof(X)", where X is the actual parameter? No, because that type is available to a compile-time routine. That could have been why I had the typeof parameters, where an implicit parameter to a routine was the type of an explicit parameter. Do I like the compile-time execution way of doing it better?
It's sort of icky to require all such registration routines be compile time, with a check of a type against a somehow-created interface specification, followed by a call to the actual run-time routine. Making the "interface" concept explicit, like Java, might clarify and simplify this.

Note that one reason for having the bytecode for field references in records or structs hold a reference to the field definition is so that when code is received from elsewhere, it will be able to use the offset of the field within the record as defined on the receiving system. E.g. if the receiving system has a newer version of the record type from some newer version of another package, then the imported code can still properly look inside the records, even though the offsets are now different from what they were on the shipping machine.

Been reading up on Java again - learning what I've forgotten! One big reason for the static inheritance model is situations where the basic functionality requires certain information. For example, anything that you can draw onto a graphic display must indicate where it is to be drawn. Having that as an X, Y pair of values that can be read by the generic drawing code is handy. However, this may not be universal. For example, the location at which something is drawn may be determined by code outside of the item itself, and thus having to stuff the values into the item is artificial.

050109/Sunday [April 2010: starting about here, I went down a bad path. I ended up with "bundles", both "polymorphic" and "generic". In the end, they did not work safely. I ended up converting "generic bundles" into just 'generic's. The polymorphic kind was discarded, and Zed gained 'capsule's (like Java classes) and 'interface's. This huge misstep cost me over 2 years.] Spent time today pondering, and reading up again on Java interfaces and a couple of other things.
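Going back to the 050108 note on field references in bytecode: a minimal C sketch, assuming shipped code names a field symbolically (record type plus field name) and the receiving system resolves that to its own offset when the code is loaded. FieldDef, RecordDef, FieldRef and resolveField are hypothetical, not the actual bcRun.c structures.

```c
/* Hypothetical sketch: names are illustrative, not from the Zed sources. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    size_t offset;     /* byte offset within the record on THIS system */
} FieldDef;

typedef struct {
    const char *typeName;
    const FieldDef *fields;
    size_t fieldCount;
} RecordDef;

/* A field reference as shipped: symbolic, with no offset in it. */
typedef struct {
    const char *typeName;
    const char *fieldName;
} FieldRef;

/* Resolve a shipped reference against the local definition.
   Returns 0 on success, -1 if the local type lacks that field. */
static int resolveField(const RecordDef *local, const FieldRef *ref, size_t *offsetOut) {
    if (strcmp(local->typeName, ref->typeName) != 0) return -1;
    for (size_t i = 0; i < local->fieldCount; i++) {
        if (strcmp(local->fields[i].name, ref->fieldName) == 0) {
            *offsetOut = local->fields[i].offset;
            return 0;
        }
    }
    return -1;
}

/* Two versions of the same record: the newer one added a field up front. */
typedef struct { unsigned mt_count; const char *mt_description; } MyThingV1;
typedef struct { unsigned mt_flags; unsigned mt_count; const char *mt_description; } MyThingV2;

static const FieldDef v1Fields[] = {
    { "mt_count", offsetof(MyThingV1, mt_count) },
    { "mt_description", offsetof(MyThingV1, mt_description) },
};
static const FieldDef v2Fields[] = {
    { "mt_flags", offsetof(MyThingV2, mt_flags) },
    { "mt_count", offsetof(MyThingV2, mt_count) },
    { "mt_description", offsetof(MyThingV2, mt_description) },
};
static const RecordDef myThingV1 = { "MyThing_t", v1Fields, 2 };
static const RecordDef myThingV2 = { "MyThing_t", v2Fields, 3 };
```

The same shipped FieldRef resolves to different offsets on the two systems, and a field that no longer exists is reported at load time rather than misread at run time.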
I can't see it explicitly said, but I believe that a method that satisfies an interface that a class implements must not be a static method. I.e. it must have the hidden instance parameter for 'this'. That is because the caller of such a function via only the interface cannot otherwise know whether to pass in the instance value (a parameter to the interface method itself) or not. At least, not easily.

Also, it seems to me that when a class instance is passed to a routine as a value of an interface type, the run-time must also pass in the "instantiation" of the interface by the class of the passed value. This is so that when the method calls a method in the interface, it can use that "instantiation" (much like a virtual function table) to find the interface-satisfying method in the class of the passed value. So:

    interface testInterface {
        void doSomething();
        void doOtherThing();
    }
    class someOtherClass {
        static void someOtherMethod(testInterface val) {
            /* Same syntax here as when calling a method on a class instance. */
            val.doSomething();
        }
    }
    class testClass implements testInterface {
        int instanceVar;
        static int classVar;
        /* This is OK - it is not static: */
        public void doSomething() {instanceVar = 0;}
        /* This is not OK - it is static: */
        static void doOtherThing() {classVar = 0;}
        void aMethod() {
            /* Note that 'this' is passed explicitly as the value of type
               'testInterface'. I'm thinking that this call must also pass a
               pointer to an internal structure containing pointers to
               testClass's implementations of testInterface members. */
            someOtherClass.someOtherMethod(this);
        }
    }

050110/Monday I think I'm OK with something like interfaces. How explicit can I make them? I'm still not convinced about the concept of a class with inheritance of structure from an ancestor class. One typical example is a GUI object, needing its x,y screen position.
My answer is that no, the thing itself doesn't need its position - it just needs whatever values it needs to describe its appearance, plus access to code to draw itself. To specify where something is to appear, use a descriptor like a Java "Graphics", or an Amiga RastPort. The item is drawn in the top-left corner of that described area, and it is that area which contains its position relative to an enclosing area. Thus, a described item can be drawn at several locations, by drawing it into several such descriptors, *without* having to duplicate the description values of the item.

Looking for significant-size class definitions in the Game Programming book, I encountered a game in 3-space. Surely every item modelled in that 3-space needs to record and maintain its own position? Not necessarily. Some items may simply be drawn at multiple locations using a common description. This was common practice before OOP, and is very similar to the GUI situation above. I need another example. Perhaps I should look through Don's MUD client code - how much of that genuinely needs OOP, and how much is simply forced by the Java language and by its libraries?

One observation, however: damn, I've lost it, and that's what I came out here to type, but I did the above first! Maybe: in order to pass an entity to a routine which accepts any entity which implements an interface, there must exist an entity. That entity must contain data, in order for the interface implementation to be useful. So, there *must* be some type of encapsulated entity description which contains both data and its own methods with which to implement the interface. Hence a class definition, and instantiation. But, this concept doesn't, I believe, require any concept of structural inheritance of data.
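A minimal C sketch combining two ideas from the last few entries: an interface value is the item plus the table of procs its type uses to implement the interface (the "instantiation" from 050109), and the item holds no position, because the descriptor it is drawn into, standing in for a Java Graphics or Amiga RastPort, carries the origin. Drawable, DrawableOps, RastDesc and drawInto are hypothetical names, and drawing is simulated by recording coordinates.

```c
/* Hypothetical sketch: names are illustrative, not from the Zed sources. */
#include <assert.h>

/* Stand-in for a RastPort/Graphics: an origin plus a log of what was drawn. */
typedef struct {
    int originX, originY;
    int lastX, lastY;     /* where the most recent "pixel" landed */
    int drawCalls;
} RastDesc;

static void plot(RastDesc *rd, int x, int y) {
    rd->lastX = rd->originX + x;   /* item coordinates are relative to the origin */
    rd->lastY = rd->originY + y;
    rd->drawCalls++;
}

/* The interface: one method, taking the instance explicitly. */
typedef struct {
    void (*draw)(const void *self, RastDesc *rd);
} DrawableOps;

/* An interface value: the instance plus its type's implementation table. */
typedef struct {
    const void *self;
    const DrawableOps *ops;
} Drawable;

/* A concrete item: only appearance data, no position. */
typedef struct { int width, height; } Rect;

static void rectDraw(const void *self, RastDesc *rd) {
    const Rect *r = self;
    plot(rd, r->width - 1, r->height - 1);   /* pretend: plot the bottom-right corner */
}

static const DrawableOps rectOps = { rectDraw };

/* Generic code: knows only the interface. */
static void drawInto(Drawable d, RastDesc *rd) {
    d.ops->draw(d.self, rd);
}
```

The same Rect is drawn at two places simply by passing two descriptors; nothing about its location is stored in the Rect itself.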
If some situations appear to need that kind of access in common to several different data collections, then those classes can all simply implement an interface which provides any needed getter and setter routines for the common set of "values". Some might implement the values virtually, not actually storing them.

I sure would like to create the needed interface method pointer structure explicitly, rather than implicitly. The "interface" description would be a function of signatures, thus allowing more generality than Java (and C++?) do, in that there can be more than one parameter to a member method that is of the "class" type. E.g. an interface can have an Equals method.

One issue here is that of what the new "type" means. Is it the type of the struct of function pointers, or is it a type whose values can be any class instance whose class implements the interface? Hmm. Or is it simply a record containing both of those? The problem with that is that we would then need to allocate and build that record all the time. I'd prefer to pass the pair of values separately to any routine which accepts the pair. Can I define a syntax where some kind of "type" represents multiple values at once? Can I just do something like having a parameter to a function be a pass-by-value struct, so that I only need the one "type"? Maybe not! Let's try something.

    /* All occurrences of "*" here are replaced by SigParam, the special
       parameter type which means "generic substitution". */
    type testFuncs_t = record {
        proc(* left, right)bool tf_equals;
        proc(* theItem; uint count)uint tf_counter;
        proc(* theItem; uint count; bool how)uint tf_newCounter;
    };
    ...
    /* In this context, the "*" says that 'item1' and 'item2' are reference
       values, of the same type as the "*" values in the testFuncs_t record.
       In a given execution context (e.g. the body of "user"), the "*" must
       always refer to exactly one type. There aren't really any special type
       checks needed here.
       We only pass SigParam values as SigParam parameters to the procs in
       the testFuncs_t. */
    proc user(testFuncs_t tf; * item1, item2)void:
        /* Note that we can have an "interface member method" that takes
           multiple parameters of the type being generalized. */
        if tf.tf_equals(item1, item2) then
            BI/Print("Items are equal\n");
        else
            uint count;
            /* Note that we can thus allow optional entries in our "interface". */
            if tf.tf_newCounter ~= nil then
                count := tf.tf_newCounter(item1, 10, true);
            else
                count := tf.tf_counter(item1, 10);
            fi;
            BI/Print("count = " + BI/UintToString(count) + "\n");
        fi;
    corp;
    /* If the "user" proc needs to access/change values that are part of the
       generic references it is passed, then it can do so via accessor and
       updater routines in the "testFuncs_t" "interface". */
    ...
    type MyThing_t = record {
        uint mt_count;
        string mt_description;
        ...
    };
    proc MTEquals(MyThing_t left, right)bool:
        if left = right then
            true
        elif left.mt_count = right.mt_count then
            /* Description doesn't matter. */
            true
        else
            false
        fi
    corp;
    proc MTCounter(MyThing_t mt; uint countParam)uint:
        mt.mt_count * countParam
    corp;
    /* Special type checking needed here. All occurrences of SigParam in the
       testFuncs_t type description must be mapped by the same type in the
       value being assigned. Here, that type is MyThing_t. */
    testFuncs_t MTFuncs := testFuncs_t(MTEquals, MTCounter, nil);
    ...
    MyThing_t mt1 := MyThing_t(10, "first"), mt2 := MyThing_t(20, "second");
    user(MTFuncs, mt1, mt2);

050111/Tuesday Email from work: With the scheme from last night, one can pass the wrong type of reference along with the signature record to an interface user. Need to bind the two together somehow, and perhaps even more. The calls to the interface user are OK, I think, since the parameters that need to be the same as in the signature record must be of the type that matches SigParam. Hmm. Even that might not work. Have to look more.
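A rough C rendering of the 050110 testFuncs_t scheme, assuming 'void *' plays the role of "*" (SigParam). It shows the optional tf_newCounter entry, and it also shows the hole described in the 050111 email: nothing ties the ops record to the values passed beside it. The names mirror the Zed example; the C code is only an illustration, and returning 0 for "equal" is a simplification of the Zed version's print.

```c
/* Hypothetical sketch: a C analogue of the Zed example, not Zed itself. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* "*" becomes void *: the ops record and the items are not bound together. */
typedef struct {
    bool (*tf_equals)(const void *left, const void *right);
    unsigned (*tf_counter)(const void *theItem, unsigned count);
    unsigned (*tf_newCounter)(const void *theItem, unsigned count, bool how); /* optional */
} testFuncs_t;

/* Returns the count, or 0 when the items are equal (the Zed version prints). */
static unsigned user(const testFuncs_t *tf, const void *item1, const void *item2) {
    if (tf->tf_equals(item1, item2)) {
        return 0;
    }
    if (tf->tf_newCounter != NULL) {
        return tf->tf_newCounter(item1, 10, true);
    }
    return tf->tf_counter(item1, 10);
}

typedef struct {
    unsigned mt_count;
    const char *mt_description;
} MyThing_t;

static bool MTEquals(const void *l, const void *r) {
    const MyThing_t *left = l, *right = r;  /* unchecked: trusts the caller */
    return left == right || left->mt_count == right->mt_count;
}

static unsigned MTCounter(const void *item, unsigned countParam) {
    const MyThing_t *mt = item;
    return mt->mt_count * countParam;
}

static const testFuncs_t MTFuncs = { MTEquals, MTCounter, NULL };
```

A call such as user(&MTFuncs, &someUnrelatedRecord, &mt2) compiles just as happily, which is exactly the weakness the email points out.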
Anyway, perhaps something that might help could be:

    Signature(thing_t, interface_t) sig := interface_t(func, func, func);

This saves thing_t at compile time, and so it can check, at compile time, that all interface_t values are using thing_t as SigParam. Try again:

    /* An "interface" type is structurally the same as a "record" type.
       However, all fields must be proc pointers, and those proc pointers can
       use the special SigParamX types in their prototypes. Interface types
       are parameterized types, those parameters being the actual types that
       correspond to the SigParamX types used in the procs in the interface
       type. The type parameters, if given, are given in a parenthesized list
       after the interface type name. Such type parameters are part of the
       interface type, and their actual values (which can be left
       unspecified) must match if any assignments of interface
       variables/parameters are done. An interface type without the list of
       parameterizing types is called an "unspecified interface". An
       unspecified interface is not compatible with any other value,
       including an otherwise identical unspecified interface. Interface
       types are usually specified as read-only, preventing any changes to
       them. */
    type TestFuncs_t = ro interface {
        proc(SigParam1 left, right)bool tf_equals;
        proc(SigParam1 theItem; uint count)uint tf_counter;
        proc(SigParam1 theItem; uint count; bool how)uint tf_newCounter;
    };
    /* SigParamX values can only be "created" in restricted ways. See later.
       No operations are defined for SigParamX values other than comparison
       and the various forms of assignment (such as parameter passing).
    */
    proc userProc(TestFuncs_t tf; SigParam1 item1, item2)void:
        if tf.tf_equals(item1, item2) then
            BI/Print("Items are equal\n");
        else
            uint count;
            if tf.tf_newCounter ~= nil then
                count := tf.tf_newCounter(item1, 10, true);
            else
                count := tf.tf_counter(item1, 10);
            fi;
            BI/Print("count = " + BI/UintToString(count) + "\n");
        fi;
    corp;
    type MyThing_t = record {
        uint mt_count;
        string mt_description;
    };
    proc MTEquals(MyThing_t left, right)bool:
        left = right or left.mt_count = right.mt_count
    corp;
    proc MTCounter(MyThing_t mt; uint countParam)uint:
        mt.mt_count * countParam
    corp;
    /* When an interface constructor produces an interface value, it
       determines the actual types for the SigParamX types within it, based on
       the types in the prototypes of the procs used in the constructor. Nil
       proc pointers are allowed, and say nothing about a real type
       corresponding to a SigParamX type, and thus a SigParamX may remain
       unset, and thus matches any actual type in a parameterized interface
       type. This same rule can allow assignment and other use of interface
       value fields, when the interface is not "unspecified". */
    TestFuncs_t(MyThing_t) MTFuncs := TestFuncs_t(MTEquals, MTCounter, nil);
    MyThing_t mt1 := MyThing_t(10, "first"), mt2 := MyThing_t(20, "second");
    /* Values of the SigParamX special types can only be created in a proc
       call, when values are passed to parameters declared with those types.
       The values passed must be of actual types as determined by any
       interface values passed to the same call. Only SigParamX types which
       are so determined by the interface types will match any real type.
       Within the call parameters, all occurrences of the SigParamX types,
       including within any specified interface parameters, must match. */
    userProc(MTFuncs, mt1, mt2);
    /* I believe the above rules, taken together, mean that an interface
       value passed to a using proc can be constructed directly without being
       assigned to a variable which is then passed to the proc. */

050115/Saturday Trying again with interfaces.
The above doesn't work at all - you can't safely do anything with the interface values. It looks like the values we use need to be a record containing a pointer to the procs that implement the interface for the value, along with the value itself. I was hoping to avoid the extra allocation and indirection, but it doesn't seem possible. The only other possibility I've thought of is to treat the pair as a unit. That complicates things, however. For example, such a pair could not be a variant in a oneof.

The syntax will be icky, I expect. It needs to declare two types at once, that are bound together. It could also define a bunch of type names local to itself, that are used instead of the SigParamX types.

    /* Define the generic type bundle, used here like a Java interface. All
       values of generic (non-specific) types are read-only. */
    bundle Fred(type value_t) {
        type FredOps_t = record {
            proc(value_t left, right)bool fo_equals;
            proc(value_t val; uint count)uint fo_counter;
            proc(value_t val; uint count; bool how)uint fo_newCounter;
        };
        type Fred_t = record {
            FredOps_t f_ops;
            value_t f_value;
        };
    };
    /* These use the generic Fred_t. You can use a specific Fred_t as a
       generic Fred_t, but not vice versa. Same for FredOps_t, if useful. */
    [] Fred.Fred_t TheFreds := matrix([100] Fred.Fred_t);
    uint FredNext := 0;
    proc fredUser(Fred.Fred_t f)void:
        uint count;
        if f.f_ops.fo_newCounter ~= nil then
            count := f.f_ops.fo_newCounter(f.f_value, 10, true);
        else
            count := f.f_ops.fo_counter(f.f_value, 10);
        fi;
        if FredNext ~= size(TheFreds, 0) then
            TheFreds[FredNext] := f;
            FredNext := FredNext + 1;
        fi;
    corp;
    /* Standard stuff here with a simple record type. */
    type MyThing_t = record {
        uint mt_count;
        string mt_description;
    };
    proc MTEquals(MyThing_t left, right)bool:
        left = right or left.mt_count = right.mt_count
    corp;
    proc MTCounter(MyThing_t mt; uint countParam)uint:
        mt.mt_count * countParam
    corp;
    MyThing_t mt := MyThing_t(20, "first");
    /* Now we create and use a specific Fred.
    */
    bundle MyFred = Fred(MyThing_t);
    MyFred.FredOps_t MTFuncs := MyFred.FredOps_t(MTEquals, MTCounter, nil);
    MyFred.Fred_t mf := MyFred.Fred_t(MTFuncs, mt);
    fredUser(mf);

Q: What is a bundle?
A: I don't know. Looks sort of like a nested package that can take type parameters, and whose symbols (other than its parameter names) are at its containing scope.

Q: Are there other uses for bundles?
A: I don't know. Could be.

Q: Can I put things other than record declarations into a bundle?
A: I don't know. These I need so far. I haven't thought past that.

Q: How do you prevent me from messing around with your Fred_t's, and changing the FredOps_t, either in whole or in part?
A: Anything that is of a generic type (one in which the type parameters have not been given actual types) is read-only. You cannot assign to the parts of it. So, you cannot assign to f_ops, f_value, fo_equals, etc. You can assign to variables of type Fred_t and FredOps_t. The former is useful, as in this example. The latter may not be.

Q: Can I use the generic types to build other types?
A: Yes. I don't see how using FredOps_t in something else can be useful, however.

%%%%%%%%%%%

What about storing a FredOps_t from one Fred_t in fredUser into a global variable, then trying to use the procs in that FredOps_t with some other Fred_t? Ick. [2010: Bundles gone]

050117/Monday It's actually worse. Talking with Roel on Saturday, we decided that if I change the syntax to get rid of the ".f_ops" clause in "fredUser", then you can't apply the wrong op to a value. But, later that night I realized that's still not enough. Given the example above, even with the ".f_ops" removed, I can still do

    TheFreds[0].fo_counter(f);

The problem is that the explicit parameter to the fo_counter routine allows the programmer to pass something not appropriate. The syntax that Java uses,

    TheFreds[0].fo_counter();

or

    f.fo_counter();

does not allow the misuse, because there is no opportunity for it.
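A minimal C sketch of the 050117 conclusion, assuming the value and its ops are bundled in one Fred_t and the only sanctioned way to invoke an op is a wrapper that takes the Fred_t alone and supplies its own f_value, as Java's f.fo_counter() does. The ops are trimmed to the two counters, and fredCount is a hypothetical helper. C cannot stop anyone from reaching into f_ops directly; that is the part a language would have to forbid.

```c
/* Hypothetical sketch: a C analogue of the Fred example, not Zed itself. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned (*fo_counter)(const void *val, unsigned count);
    unsigned (*fo_newCounter)(const void *val, unsigned count, bool how); /* optional */
} FredOps_t;

typedef struct {
    const FredOps_t *f_ops;
    const void *f_value;
} Fred_t;

/* The only sanctioned call path: the value always comes from the same Fred_t. */
static unsigned fredCount(const Fred_t *f, unsigned count) {
    if (f->f_ops->fo_newCounter != NULL) {
        return f->f_ops->fo_newCounter(f->f_value, count, true);
    }
    return f->f_ops->fo_counter(f->f_value, count);
}

/* fredUser from the example: keep up to 100 Freds and count each one. */
#define FRED_MAX 100
static Fred_t TheFreds[FRED_MAX];
static size_t FredNext = 0;

static unsigned fredUser(Fred_t f) {
    unsigned count = fredCount(&f, 10);
    if (FredNext != FRED_MAX) {
        TheFreds[FredNext++] = f;
    }
    return count;
}

typedef struct { unsigned mt_count; const char *mt_description; } MyThing_t;

static unsigned MTCounter(const void *val, unsigned countParam) {
    const MyThing_t *mt = val;
    return mt->mt_count * countParam;
}

static const FredOps_t MTFuncs = { MTCounter, NULL };
```

Because fredCount takes no separate value argument, TheFreds[0] can only ever be counted against its own f_value.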
Note that, in Java, if the fo_counter routine takes two Fred_t parameters (in Java the first is not declared, and is accessed via "this"), then the matching method in any implementation must declare that second parameter as of type Fred_t, and not as the type which it is implementing. Thus, the second Fred_t in the routine is still a generic Fred_t, and can only be manipulated as such. Thus with Java interfaces (and perhaps true of classes as well), you cannot write a generic Equals routine to compare two values according to their most descendant class. I think the closest you can come is to declare it (using Java syntax) somewhat like this:

    interface I {
        boolean Equals(I rhs);
    }
    class C implements I {
        private boolean localEquals(C lhs, C rhs) { ... }
        public boolean Equals(I rhs) {
            if (rhs instanceof C) {
                /* Use a "narrowing reference conversion" runtime typecheck. */
                return localEquals(this, (C) rhs);
            }
            return false;
        }
    }

Can classes, in either Java or C++, do this without the cast? I doubt they can directly, because they would be vulnerable to the 'rhs' parameter not being of the proper descendant class. Perhaps I need to think more about my more explicit stuff. It's a bit icky to require the explicit parameters, etc., and also to require them to be of a very explicit form, but if that does allow more flexibility and power, perhaps it's worth it. If nothing more can safely be accomplished, then perhaps just go with the implicit parameter syntax that C++ and Java (and many other OO languages) use.

050118/Tuesday From last night: if I want to allow operations in a generic routine (like I've been doing here), then I need all reference values to include their type inside themselves. If I don't do that, then in the generic code, I can't properly free things if their reference count goes to 0. I don't know the internal structure at compile time, since it varies. I could just arrange to not free (pass nil as the type), and rely on GC to do the rest.
Or, I could change my reference structures to include their types, and then replace my typestack with a stack of bools: ref or not ref type. (Over a year later: all allocated values include their type.)

Do I allow non-word members in records? If so, how does that work with a record constructor? E.g. a uint8_t member? DONE Yes, all types are allowed in records. But, creating records is not currently done properly in bcRun.c/recordConstructor. I'll need to do some casting and byte pointers to do that right. The non-multiple values are all just pushed onto the stack, so it shouldn't be too bad. DONE

Try an interface again. Use the full long syntax, but with the assumption of heavy restrictions on the use of "generic" interface types.

    bundle Sortable(type elementType) {
        type SortBundle_t = record {
            proc(elementType lhs, rhs)bool sb_greater;
            [] elementType sb_elements;
        };
    };
    public proc sort(Sortable.SortBundle_t sb)void:
        if size(sb.sb_elements, 0) < 2 then
            return;
        fi;
        for uint i from 1 upto size(sb.sb_elements, 0) - 1 do
            for uint j from 1 upto size(sb.sb_elements, 0) - i do
                if sb.sb_greater(sb.sb_elements[j - 1], sb.sb_elements[j]) then
                    elementType temp := sb.sb_elements[j - 1];
                    sb.sb_elements[j - 1] := sb.sb_elements[j];
                    sb.sb_elements[j] := temp;
                fi;
            od;
        od;
    corp;
    ...
    proc stringGreater(string lhs, rhs)bool:
        lhs > rhs
    corp;
    [] string Strings := procToCreateMatrixOfStrings();
    bundle SrtStr = Sortable(string);
    SrtStr.SortBundle_t strSb := SrtStr.SortBundle_t(stringGreater, Strings);
    sort(strSb);
    type Pair_t = record {
        uint p_left, p_right;
    };
    proc pairGreater(Pair_t lhs, rhs)bool:
        lhs.p_left > rhs.p_left or
            lhs.p_left = rhs.p_left and lhs.p_right > rhs.p_right
    corp;
    [] Pair_t Pairs := procToCreateMatrixOfPairs();
    bundle SrtPr = Sortable(Pair_t);
    SrtPr.SortBundle_t prSb := SrtPr.SortBundle_t(pairGreater, Pairs);
    sort(prSb);

Important aspects that I want here:

    1) There are no casts.
    2) There are no runtime type checks or method lookups.
    3) I want enough restrictions on bundles that this does not break the 100% type safety of the higher Z language.
    4) There is only one copy of "sort" - the system does not duplicate it for the various types it ends up working with. Duplicating this little routine is no big deal, but this is an example only.

Some notes:

    0) I *think* I got the bubblesort right. :-)
    1) I don't know that it will be reasonable to allow the declaration of "temp" in "sort". I may require a temp in the SortBundle_t. I hope not, however, as that could be awkward.
    2) You can't sort direct non-ref values using this - the genericity of "sort" only works with reference values.
    3) The code in "sort" looks normal, and compiles to what a Z user would guess it compiles to. However, the usage of the things from a generic Sortable.SortBundle_t is extremely restricted. In particular, you cannot use the methods from such a bundle on anything other than the values obtained from the same bundle as the method itself. You cannot declare or use generic SortBundle_t's outside of a routine which takes a SortBundle_t parameter. Or something like that - details to be explored.

050120/Thursday Don sent me an example where the return value of a proc which used a bundle was of the generic type parameter to the bundle.
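A minimal C sketch of the Sortable example and of Don's case, assuming 'void *' elements stand in for the reference values a bundle can hold (note 2). There is exactly one copy of the sort, the comparison proc travels with the elements as in SortBundle_t, and no run-time type checks are made; unlike the intended Zed rules, though, C cannot stop a caller from pairing the wrong greater-than proc with the elements. sortBundleMax is a hypothetical helper whose result has the element type, which is the shape of Don's example.

```c
/* Hypothetical sketch: a C analogue of the Sortable bundle, not Zed itself. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* C analogue of Sortable.SortBundle_t: elements are references (void *). */
typedef struct {
    bool (*sb_greater)(const void *lhs, const void *rhs);
    void **sb_elements;
    size_t sb_count;
} SortBundle_t;

/* Bubble sort with the same loop bounds as the Zed version. */
static void sortBundle(SortBundle_t sb) {
    if (sb.sb_count < 2) return;
    for (size_t i = 1; i <= sb.sb_count - 1; i++) {
        for (size_t j = 1; j <= sb.sb_count - i; j++) {
            if (sb.sb_greater(sb.sb_elements[j - 1], sb.sb_elements[j])) {
                void *temp = sb.sb_elements[j - 1];   /* the "temp" from note 1 */
                sb.sb_elements[j - 1] = sb.sb_elements[j];
                sb.sb_elements[j] = temp;
            }
        }
    }
}

/* Don's case: the result has the element type. NULL for an empty bundle. */
static void *sortBundleMax(SortBundle_t sb) {
    if (sb.sb_count == 0) return NULL;
    void *best = sb.sb_elements[0];
    for (size_t i = 1; i < sb.sb_count; i++) {
        if (sb.sb_greater(sb.sb_elements[i], best)) best = sb.sb_elements[i];
    }
    return best;
}

static bool stringGreater(const void *lhs, const void *rhs) {
    return strcmp(lhs, rhs) > 0;
}

typedef struct { unsigned p_left, p_right; } Pair_t;

static bool pairGreater(const void *l, const void *r) {
    const Pair_t *lhs = l, *rhs = r;
    return lhs->p_left > rhs->p_left ||
           (lhs->p_left == rhs->p_left && lhs->p_right > rhs->p_right);
}
```

Both the string and the Pair_t bundles go through the same compiled sortBundle; only the greater-than proc and the element array differ.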
I hadn't thought of that case, but I think it ought to work.

050210/Thursday No work for too long. Lego, movies.

<> Consider splitting up error codes one more level. So, have 16 bits for which function in a package, and 16 bits for the error number within the function. Then, the IDE-type stuff can automatically build the man page for a function based on the resource for those error codes, and knowing the error code number for the function. Also, each resource entry should have the text of the error message along with a help description on the meaning of the error message, the circumstances in which it is emitted, what to do about it, etc. Note that what this can do is make all functions return error codes, but have a builtin way to turn error codes into error messages. Even if no error message is directly needed (and why not always have them ready?), the help material is always useful. The sets of error codes should likely be Z one-dimensional matrixes of structs. Then, the error-message printing stuff can validate the passed error code against the matrix size.

Fix up util_vector code so that it can properly fill in the type of the resulting vector (one-dimensional matrix). Then, use them a lot more for Exec, etc. structures. They take up less overall memory than linked lists do, and since most of the Exec construction code has a final assembly call, there is a place to convert from the temporary linked list to the final vector. [Done, much later]

050211/Friday <> Random thoughts about a "search" menu item in the IDE, searching for the uses of a symbol. Menu has entries for "In This Function", "In This Package" and "Select From Tree". The last brings up a graphic of the Package tree from the root to the current Package. The user can click on multiple nodes, toggling them on and off, to select a search from them down to their leaves. Those not on the path up start out as pseudo-nodes, with their descendants not shown.
Clicking on a pseudo-node expands it, but does not select it. My first thought was just to show arrows to the pseudo-nodes, but having a good box for a name would be useful. My first thought had also been to require shift-click to select subsequent nodes.

050229/Saturday I've done the ref-count typed freeing code and it seems to work. One ugly thing is that I had to declare all of the types in the Z files before defining any of the functions. This is so that the types created by the ugly type-pre-init stuff I've got in C/FromZ/Types.c end up getting filled out as real types *before* their definitions are needed in the freeing code. The normal order of compilation has Base.z first, and it has procs in it. The Exec structures used, especially the temporary ones, end up getting freed as part of the compilation of those procs. If they are not fully defined by then, the refcount freeing code can't work - it ends up with NIL pointers to the actual definitions. So, I've split the type definitions out of the .z files and put them into a bunch of Xxxx0.z files, which are all compiled first now. This, unfortunately, means that even the simplest Z source file must now compile all that stuff first!

050223 (er, sometime last week maybe?) A ref: RaskinCenter.org

I'm thinking that resources, such as my Exec, Types, Proc, etc. should be named by strings, not indexed by numbers. That way the naming scheme is more random, thus making it easier for multiple people to create new resource names. The individual messages within the resources can be numbered, because those are under much tighter control. (DONE)

<> For videos, would it be useful to have the hierarchy go all the way through: version->variant->angle->frame? Variant and Angle come from the way DVDs can work. There can be a couple of variants (e.g. alternate endings) on the disk, and in some places there are multiple angles.

050324/Thursday I've been working a bit getting more interesting "compiletime" stuff going.
I can now have a compiletime proc that emits code into the stream of the proc being compiled. However, I'll want to restrict things:

    - don't allow a compiletime proc to be called at runtime - there are issues with the special Proc/Context_t argument [Resolved later by not allowing them to be proc values]
    - the Context_t needs to be unmodifiable by the compiletime code, since it belongs to the parser and Exec. Keeping it an "ro" type exported by the Proc package works for that, so far.
    - the Context_t will eventually point at a Lex/Context_t (or whatever). That must also be non-modifiable.

But, a bigger restriction is that I don't want the compiletime code to be able to call the Lex or Parse code with the stuff from the Proc/Context_t passed in. I don't want them able to consume stuff after the compiletime proc call in the "source". How do I do that? One way is to bring in the true "readonly" attribute for variables, and have the special code that calls compiletime procs clear out the references to usable contexts in the Proc/Context_t, and instead fill in their values into fields that are declared to point to readonly versions. That should prevent them from being passed to the Lex and Parse routines that actually use those contexts to consume input. Weird maybe, but it should work. [Much later: another possibility is to mark the field containing a reference to the Lex state as 'private', which means it cannot even be read outside of package Package. Then, Package could export a proc which yields the value, but only export that proc to system parsers. I don't really like this, since I want to allow others to write parsers, and so Package will not know about them.]

050325/Friday Unfortunately, the above isn't good enough. There isn't anything to stop the programmer from making a copy of the Lex context and making another Proc context to point at that, and it then wouldn't be a read-only one.
The problem lies in wanting to give the user the ability to append statements to the current sequence, while at the same time not being able to use the Lex context to consume things. There are also issues with how to pretty-print the compiletime calls. Currently, the intent is to somehow use the Exec/OptimizedExec stuff. [Resolved later] Hmm. Just in Lex stuff to add "::". The Lex state isn't what is needed for displaying error messages. So, perhaps just nil-out the Proc/Context_t pointer to any Lex state when passing it to a compiletime proc. Testing stuff. It would be nice to have an error message if a local symbol hides a parameter. [These are now prevented completely.] 050326/Saturday Grrr. There is a limit to how much I can mix the C and Z structures, etc. I can't do it with functions. A value must either be a Z proc or a C function pointer. There is no way the two can be the same. So, I think the right thing is for the various error routines in the C versions of the Z stuff to simply call a fixed routine, rather than trying to call via the "errorHandler" function pointer. That way, the universe can stay all Z compatible. Of course, that pretty much means that until I have a "system" wrapped around this stuff, the error handler and input context pointers will always be NIL, except in cases like "exectest.z", which has a legitimate error handler (but no input context, and the error handler just uses BI/Print directly). Wow. Looking through, it looks like the "ro" property is already there, both in declarations and in assignment checking, for record/struct fields, package and local variables, and proc parameters. And it's there as much as is currently implemented, for pointer types. Argghh! The nature of a SymTab/SymInfo_t is such that it wants a Types/NamedDesc_t for a type entry. But, having to have a different one of those for the element, head and iterator types in Lists.z means that the various types are not assignment compatible!
I could maybe fake it by using 1-field structs, but that is painful. Even more, if I try to make my own NamedDesc_t for the element type, and use it for all 3 cases, then that won't be compatible with any name that the user gives to the element type. 050406/Wednesday Been working on some Util stuff, including things like StringLength, which is a constant for constant strings. Put in float negate handling of constants. In the divide and remainder constant cases for Binary, check the divisor for zero at compile time. DONE Add checks in Util stuff for nil proc (not called in actual proc). Is that even possible - don't I create a proc even for initializers? Yes, I do, but currently only after parsing the initializers. Just move it, and give it a name like -init. 050407/Thursday With the Init routines generated by the Lists package, if we have a queue (with head and tail), how do I init the list head without evaluating the expression twice? It's fine if it's a simple variable. 050409/Saturday Exec/ProcCheck should be given (or assume and check) the result type of the proc, and it should check the type of the Exec against that. [ProcCheck is gone, and this is now checked the new way.] 050423/Saturday Generic types could be done in a different way than the Lists stuff currently is. That is, the functions could be exported directly by name, and could accept raw Exec_t's. Then, they would, at compile time, generate the required operations. If the operations don't make sense, the Exec code will complain. The routines could check that they are supposed to exist for the given kind of list. It may be possible to have a "Reverse" proc in Lists, that takes the head of any doubly linked list, and returns the same value, but with some kind of compile-time tag (apply a named type???) that indicates it is the reversed variant. Don discovered that the string concatenate and comparison bytecodes don't decref the argument strings. Fix.
(A year later: I don't understand - there is no increment involved. Tested - see no problem.) 050507/Saturday First Z work in quite a while. Needed to make compile-time eval of uint comparisons work. Need to fix things up so that all possible compile-time eval is happening. Could be bulky. [It was!] 050509/Monday See: http://www.fundable.org <> Z considers the consumer to be more important than the producer. This implies that the user of a program, interface, API, etc. is more important than the implementor of it. Etc. Design many user data structures (e.g. images, document formats - especially tables) such that it is fairly straightforward to render them on the fly. This is important for applications like web browsing, where it takes some time to retrieve the entire item. In the particular case of a document, it is good to start rendering it right away, but it is also good to make the initial rendering choices correctly. Given that small pieces of code are cheap, and can be shipped with items, and that all of these formats will be generated by software, it should be possible to have ways to specify the total resulting appearance up-front. E.g. for a text document, the total width in text columns and fixed graphics items should be computed when the item is turned into a bytestream, so that that property, and any others that similarly matter, are known, or can be computed, right away. Hopefully this can get rid of a lot of the annoying reformats and redraws that happen with some complex web pages. [Much later: I don't see how this could work. You need to know the size of the display area in order to decide on the final formatting.] 050510/Tuesday Need to think through the use of variables that are smaller than Z_WORD_SIZE bytes wide. When they are local variables, or on the stack, they are the full Z_WORD_SIZE bytes, but in structs, records, arrays, they are not.
When passing one to a ref parameter, we take the address of the variable, and the proc will use load1/store1 to access/update it. Thus, on a little-endian architecture, the address to use for such a variable is the address of the first byte of its word slot, but on a big-endian architecture, it is the address of the last byte. Perhaps only "pshla" is affected? I think my current problem with the Lex.z/getNumber code relates to this - I need to think/look through how I do char/bits8 local variables and local arrays (not matrixes). Even local structs. How does alignment work for locals? Likely the same issues for package variables. [I think this is all now resolved. I think that now, even sub-word things just occupy the needed space as locals. April 2010: no, that is not the case - see test/local.z .] 050512/Thursday URL: http://www.openusability.org/ 050515/Sunday Reading comp.arch. Crypto routines need to be constant time. I.e. the running time of the code that does key operations should not depend on the key. If time variations are present, then there are means to obtain info about the key from other processes on the machine. There was concern about hyper-threaded CPU's in particular. May not be any real use for the "::" syntax. Compile-time procs can just call Types/ExportFind instead. 050516/Monday From /. "Greasemonkey" is a Mozilla Firefox extension that allows the user to insert scripts into the browser, which can modify what happens with downloaded pages. 050528/Saturday <><> Moving stripcharts. New data can come in and be entered with the time of its arrival. Or, data can come in with its own timestamp, and be entered based on that. Chart motion can be continuous, or perhaps driven by data arrival. Updates can be done based on async data arrival, or the chart itself could call out to get data at regular intervals. A stripchart is just a moving version of a line graph. Would animated versions of other kinds of charts/graphs be useful?
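A moving stripchart like this is commonly built on a fixed-size ring buffer of timestamped samples. The C sketch below is only an illustration under assumptions that are not in the notes: the names StripChart, strip_add, strip_get and strip_visible are hypothetical, the capacity is fixed at compile time, and the oldest sample is overwritten once the buffer is full. Each sample carries a time that can be either its arrival time or its own timestamp, so both entry modes fit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity: how many points the chart keeps. */
#define STRIP_CAPACITY 8

/* One data point; `time` is either the arrival time or the sample's own timestamp. */
typedef struct {
    double time;
    double value;
} Sample;

/* Fixed-size ring buffer of the most recent samples. */
typedef struct {
    Sample samples[STRIP_CAPACITY];
    size_t start;  /* index of the oldest sample */
    size_t count;  /* number of valid samples */
} StripChart;

static void strip_init(StripChart *sc) {
    sc->start = 0;
    sc->count = 0;
}

/* Append a sample; once the ring is full, the oldest sample is overwritten. */
static void strip_add(StripChart *sc, double time, double value) {
    size_t slot = (sc->start + sc->count) % STRIP_CAPACITY;
    sc->samples[slot].time = time;
    sc->samples[slot].value = value;
    if (sc->count < STRIP_CAPACITY) {
        sc->count++;
    } else {
        sc->start = (sc->start + 1) % STRIP_CAPACITY;  /* oldest slot was just reused */
    }
}

/* The i-th sample in oldest-to-newest order (0 = oldest). */
static Sample strip_get(const StripChart *sc, size_t i) {
    assert(i < sc->count);
    return sc->samples[(sc->start + i) % STRIP_CAPACITY];
}

/* Number of samples inside the visible window [now - width, now]; a
 * continuously scrolling chart would redraw just these on each frame. */
static size_t strip_visible(const StripChart *sc, double now, double width) {
    size_t n = 0;
    for (size_t i = 0; i < sc->count; i++) {
        double t = strip_get(sc, i).time;
        if (t >= now - width && t <= now) {
            n++;
        }
    }
    return n;
}
```

Feeding it ten samples at times 0 through 9 leaves the eight newest (times 2 through 9), and a 3-unit window ending at time 9 contains four of them.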
050530/Monday <> Note that another name for type-tagged data is "self-describing data". 050608/Wednesday <> Link on /. for "Symphony Desktop OS": http://www.symphonyos.com/ Start with Wikipedia: http://en.wikipedia.org/wiki/Symphony_OS 050728/Monday All spare time spent on Lego building of the Legislature. <> Had a thought just now, however, about library/type versions. A chunk of code references other chunks, and when it was written, it was written using some version of each. So, it is only known for sure that it works with those versions. If a new version of some library/type is produced, how do we know that the chunk of code will work with it? We don't, and we can't, without testing/code examination, etc. Presumably, a minor version change in the library will leave it still compatible. But, a major change might not, because some behaviour is now different, to fix a bug or improve the way it works. Such a change can break our code chunk. How do we handle this? Suggestion: Define things such that minor version changes are incredibly unlikely to break existing code. Thus, our code chunk should run with minor new versions of things. Give the user, and developers, ways to specify *AFTER THE FACT ONLY* that the code chunk must have an exact specific version. To run it, the system must have a copy of that version. Most code chunks will run with new improved versions of libraries. But, we don't know that, and we don't want to break things. So, a central repository of compatibility knowledge is needed. Once someone has determined that the code chunk runs with a new version X of a library, that is published in the central repository. Client computers can periodically update their meta information for the code chunk to indicate that. The key is that it cannot happen by default or automatically - someone must verify it first.
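The rule that a new library version is accepted only after someone has verified it could be expressed roughly as follows. This is a hypothetical C sketch: Version, Dependency and version_acceptable are invented names, and the verified list stands in for whatever the central repository would publish for the chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical version number of a library/type. */
typedef struct {
    unsigned major;
    unsigned minor;
} Version;

/* Hypothetical per-chunk record for one dependency. */
typedef struct {
    Version built_with;       /* the version the chunk was written against */
    bool pinned;              /* set after the fact: only built_with will do */
    const Version *verified;  /* versions someone has verified, as published centrally */
    size_t n_verified;
} Dependency;

static bool version_eq(Version a, Version b) {
    return a.major == b.major && a.minor == b.minor;
}

/* May this chunk run against the installed version?  Nothing is accepted just
 * because it looks like a compatible minor release; a new version counts only
 * once it appears in the verified list, and never when the chunk is pinned. */
static bool version_acceptable(const Dependency *dep, Version installed) {
    assert(dep != NULL);
    if (version_eq(installed, dep->built_with)) {
        return true;
    }
    if (dep->pinned) {
        return false;
    }
    for (size_t i = 0; i < dep->n_verified; i++) {
        if (version_eq(installed, dep->verified[i])) {
            return true;
        }
    }
    return false;
}
```

A chunk built against 1.2 with 1.3 in its verified list accepts 1.2 and 1.3 but not 1.4; pinning it leaves only 1.2.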
We can provide the user with a way to try new library versions, and the user can report the success, but only authorized people (for the chunk) can update the official central repository. The central repository can be polled periodically by all clients that are network connected. There can be multiple such repositories, with hierarchical authority/distribution, just like, say, DNS servers, etc. [That's a big ick!] 050724/Sunday Referenced by /.: http://www.identityblog.com/stories/2004/12/09/thelaws.html 050812/Friday <> I wonder if it would be useful to have specific bit orders allowed on numeric types and bitfield types somehow? Then the programmer of low-level stuff doesn't have to do conversions - they can just ensure they have proper declarations. It would likely be important to optimize out unneeded conversions, if there can be any. [2010 - nothing even thought of about this.] 050817/Wednesday <> Things like a document format with font/style/etc. changes should be self-optimizing, in that if someone changes the tagging of a chunk of text so that it is now the same as an adjacent chunk, the two should be merged. Hmm. Not sure that's possible. If a user doesn't have the editing/display code for a type, then they wouldn't have the hooks either. But, the creating system that has the code would do the optimization, thus making the stuff as good as possible for any systems without the code. 050831/Wednesday A link from The Register: http://www.koders.com A site with a search engine for shareware libraries, routines, programs. 050919 - from work <><> 1) GUI builder. Drive menus from a data structure which the GUI builder creates (under a specific name in some top package). Use the same code for nearly all stuff, but have to create the data structure manually, initially. 
2) Is it worthwhile, or philosophically OK even, to have a construct like "repeat statement"? It violates the language philosophy both syntactically and in terms of being nothing but a reworded 'for' loop. My thought here was that it could be a one-line repeat of some simple action, instead of the 3 lines used for a 'for' loop. If I could figure out a usable way of allowing compile-time stuff to do parsing things, it could be done that way. [2010: likely possible if I can do the new construct stuff.] Do I *really* have to prevent compile-time code from calling in to the parser and lexer? Do I care if the pretty-print of the code is different from what the user initially typed? The designer of the compile-time stuff won't get the result they want of course, but nothing breaks my system. I could just document that it is pointless to do that. Or, is this somehow a security breach? Ah, it might have been the issue that once they can do those things, they can change the meaning of code following their invocation, by consuming tokens that would otherwise be part of normal constructs. Ick. 050920 - from work http://www.classicgaming.com 050921 - from work Note that we want a way whereby comments in the "source" where a record or structure type is defined are attached to the internal rep'n of that type, so that users of the type can see them. I.e. those field descriptions are shipped around whenever the record type itself is. That likely means formalizing the comments. [Happening] 051010/Monday <><> (Been floating around in my head for a few days.) It could be that simple versions of many system tools can be done without really needing any programming - just a way to specify a use of existing tools with existing data structures. The example I had in mind was that of a disk usage tool.
If there is a data structure in the OS that tracks the sizes and use levels of partitions, and it is represented as, say, an array of structs, then just looking at it with the normal output methods it has is one version of a tool. A bit more can be done (like summing) if that data structure is the source for data for a simple spreadsheet. Spreadsheet utilities like graphing could also be specified. Quite handy for many things (e.g. a real-time CPU monitor) would be the ability to attach watchers to variables, so that the display is live-updated. If enough of the nice graphing and charting facilities are in the spreadsheet datatypes, and those can be pointed at an appropriate system variable, and changes in that variable can be echoed in the spreadsheet display (or maybe it's good enough for the spreadsheet to poll periodically), then we can have a handy tool virtually for free. 051016/Sunday <> Streams. A compound type "stream" is a continuous stream of values of some other simpler type. (I don't think a stream can contain a stream type.) A stream writer writes values into a stream. This could be a generator program, or something as simple as periodic sampling of some values. Note that the type being streamed can be a 'oneof' type. A stream reader reads individual values from the stream. Both the writing and the reading can be blocking. Can have a 'tee' function for streams - there are some details in just how it can work. Can you tap into an existing connected stream? What about the security implications of doing that? Will need some "select"-like routines that can be used to multiplex and de-multiplex streams. The merged stream will be a 'oneof' stream of the types of the streams being merged. If something like a multiplexor is written as a simple round-robin loop, then the data in the stream is a simple alternation. By using the "select"-like routines, the merged stream can flow more freely and more efficiently.
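A "select"-style merge that stays fair by starting each scan just after the stream it served last could look like this minimal C sketch. It assumes each input stream can be reduced to a count of values waiting (a real implementation would poll or block on the streams), and FairSelect, fair_next and fair_merge are hypothetical names. The source index recorded for each merged value plays the role of the 'oneof' tag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each input stream is reduced to "how many values are
 * waiting on it". */
typedef struct {
    size_t n_streams;
    size_t *pending;  /* pending[i] = values ready on stream i */
    size_t last;      /* index returned by the previous call */
} FairSelect;

static void fair_init(FairSelect *fs, size_t *pending, size_t n) {
    assert(n > 0);
    fs->n_streams = n;
    fs->pending = pending;
    fs->last = n - 1;  /* so the first scan starts at stream 0 */
}

/* Index of the next stream to read from, or -1 if none has data.  Scanning
 * starts just after the stream chosen last time, so a busy stream cannot
 * starve the others. */
static int fair_next(FairSelect *fs) {
    for (size_t k = 1; k <= fs->n_streams; k++) {
        size_t i = (fs->last + k) % fs->n_streams;
        if (fs->pending[i] > 0) {
            fs->last = i;
            return (int)i;
        }
    }
    return -1;
}

/* Drain the inputs into a merged stream, recording which source each value
 * came from (the 'oneof' tag).  Returns the number of values merged. */
static size_t fair_merge(FairSelect *fs, int *order, size_t max) {
    size_t n = 0;
    int s;
    while (n < max && (s = fair_next(fs)) >= 0) {
        fs->pending[s]--;  /* "read" one value from stream s */
        order[n++] = s;
    }
    return n;
}
```

With pending counts {3, 0, 1}, the merge yields sources 0, 2, 0, 0: stream 2 gets its turn as soon as it has data, rather than waiting for stream 0 to drain.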
On top of a more general "select" reader, there could be an interface that simply returns the index of the next stream to read from. Internally, it could maintain state that allows it to be "fair" in its choice. E.g. it could start checking at the stream after the one it returned last time. There could be options to allow it to do this both on data element counts, and on the raw byte-counts that have been transferred. The latter is likely needed for the best efficiency in use of some communication medium that the stream is traversing. If a stream writer would block when writing to a stream, it should have the choice of simply dropping the item. That blockage could be caused by the limited capacity of something the stream is going over, or by the slowness of the reader at the other end. With the possibility of tee's and stream merging, it might be useful to allow the loss of data elements at the read end of the stream. I.e. if the writer would be blocked because of a slow reader, the mechanism drops a data element at the read end (that has already been transferred), rather than pushing back to the writer. A mechanism like that would keep one slow reader from causing data to be lost elsewhere in tee-ed and merged streams. Stream read and write ends could keep track of the number of elements they have discarded. Note that a stream type that references a type that is of restricted scope must similarly be of restricted scope. Although, it may not matter. If the stream data type is not available outside of some scope, then code outside that scope cannot declare or understand the stream type, and so cannot do much with it. It's like having a pointer to an undefined type - you can pass the pointer around, but you can't look at what it points to. Watchers. Although it would be nice to be able to attach a watcher to anything, that likely isn't practical. Some kind of polling could be packaged up to make that kind of thing a bit easier.
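A watchable variable can be sketched in C as follows, assuming a single int value and a small fixed watcher table; WatchedInt, watched_add, watched_set and count_changes are hypothetical names. Reads use the field directly, every write goes through the setter, and watchers run after the update, only when the value actually changed. The early return when nobody is watching skips the old/new comparison, which matters more for large values than for an int. Calling watchers directly like this assumes they share the updater's owner and security context.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WATCHERS 4  /* hypothetical fixed limit */

/* A watcher is told the old and new value, plus its own context pointer. */
typedef void (*Watcher)(int old_value, int new_value, void *ctx);

/* Readers use `value` directly and treat it as read-only;
 * writers must call watched_set(). */
typedef struct {
    int value;
    Watcher watchers[MAX_WATCHERS];
    void *ctx[MAX_WATCHERS];
    size_t n_watchers;
} WatchedInt;

/* Register a watcher; returns 0 if the table is full. */
static int watched_add(WatchedInt *w, Watcher fn, void *ctx) {
    assert(fn != NULL);
    if (w->n_watchers == MAX_WATCHERS) {
        return 0;
    }
    w->watchers[w->n_watchers] = fn;
    w->ctx[w->n_watchers] = ctx;
    w->n_watchers++;
    return 1;
}

/* The setter: takes a whole new value, stores it, and notifies the watchers
 * afterwards, but only if the value actually changed. */
static void watched_set(WatchedInt *w, int new_value) {
    if (w->n_watchers == 0) {
        w->value = new_value;  /* nobody watching: skip the comparison */
        return;
    }
    int old_value = w->value;
    if (old_value == new_value) {
        return;
    }
    w->value = new_value;
    for (size_t i = 0; i < w->n_watchers; i++) {
        w->watchers[i](old_value, new_value, w->ctx[i]);
    }
}

/* Example watcher: counts the changes it has been told about. */
static void count_changes(int old_value, int new_value, void *ctx) {
    (void)old_value;
    (void)new_value;
    ++*(int *)ctx;
}
```

Setting the value to 5, 5 and then 7 notifies the counting watcher twice, since the second write changes nothing.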
For watchers, what might work is a variable that is set up to be watched. Reads of it would simply happen - the variable is exported read-only. (If that's not possible in the language, then I would hope that the compiler would inline and essentially eliminate an accessor routine.) Changes to the variable go through a setter routine, which is given an entire new value for the variable. If that value differs, then any registered watchers are called after the value update. Note that an optimization is to not compare old and new values of "large" variables if there are no watchers. How, and in what context, are watchers triggered? For something like the strip-chart of kernel values, we certainly don't want to just call through a function pointer, from the context in which the watched variable is being updated. Alternatively, if watching must be from the same owner and security context as the thing being watched, then we could just do a function call, and that function could write the new value to a stream. If there is copying inherent in streams, then that serves to detach the two contexts. It might actually all fall out from the type rules of the language. 051024/Monday See new references for Minix - it has been updated, with only 4K lines of kernel-level code. 051030/Sunday Today started compiling for X86-64 on my new Athlon-64 X2. Also a much newer version of gcc. - wrote a quick size-testing program, and set up some conditional compilation to choose between 4 byte and 8 byte Z words. Note that I can't run a 4-byte Z machine as a 64-bit binary, since I use native pointers as Z pointers, and they won't fit in 4 bytes. I'm sure I'll have more issues here. - I had used "typedef unsigned char z_char_t". The problem is that char, unsigned char and signed char are 3 distinct types, and the new gcc enforces the distinction. The characters in strings are "char", so I got a complaint on every string constant passed to a routine with a "z_char_t *" parameter. So, I got rid of z_char_t and just use "char".
Only after that did I wonder if there is a gcc flag to change the type of string constants. Oh well. I also added "-funsigned-char" to the Makefiles, and so had to add a 'u' to some hex constants in the lexer that are compared against characters. And, I had to remove the test of "char value" <= 0xffu, since gcc complains that it is always true. That's correct, but I liked it for the clarity it gave. - added a few more #include's of system files the new compiler needed (complained about implicit prototype versus builtin one). - added a few semicolons - the new compiler doesn't like completely empty switch alternatives (e.g. "default: }"). - new compiler doesn't like "*((type *) sp)++" - it doesn't like to have a cast value as an lvalue. That is illegal in standard C; older gcc versions accepted it as an extension. So, move the cast outside. Note, however, that we can't do that for z_float_t values, since we do *not* want to convert from z_word_t to z_float_t. We must cast "sp", so that it is a pointer to z_float_t, and then fetch through it. Then we need a separate "++sp;" following the fetch. - an interesting point about alignment. If I want the alignment of my 8 byte values to be 8 bytes, then I'll have to do something to force alignment of C structures to match that, since I map the C structures directly to my Z record types. The gcc attribute __attribute__((aligned(8))) should do it. Alternatively, I could drop the requirement that 8 byte values be 8 byte aligned. What is right will depend on what other CPU's require - the X86 family is quite lenient. It'll be a bit of a pain to fix the C structures, so I think for now I'll expand my test program to verify what alignments gcc is using, and just match those in Z. Huh! It turns out that on this platform, 64 bit values are 8 byte aligned, even pointers. So we stick with that.
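The two C details in this entry can be sketched as follows, assuming the 8-byte word configuration (z_word_t as uint64_t, z_float_t as double) and a stack that grows downward, so that a pop advances sp. pop_float, push_float, Probe64 and ProbePtr are hypothetical names. Instead of fetching through a cast pointer, the sketch copies the bits with memcpy, a swapped-in technique that gives the same no-conversion fetch while staying within C's strict-aliasing rules; compilers typically turn it into a single load. The Probe structs show the offset trick a size-testing program can use to see how much padding the compiler puts before a 64-bit field or a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed 8-byte Z word configuration. */
typedef uint64_t z_word_t;
typedef double z_float_t;

_Static_assert(sizeof(z_float_t) == sizeof(z_word_t), "a Z float must fit in one Z word");

/* Push: step sp down one word, then store the float's raw bits there. */
static void push_float(z_word_t **spp, z_float_t f) {
    assert(spp != NULL && *spp != NULL);
    --*spp;
    memcpy(*spp, &f, sizeof f);
}

/* Pop: fetch the word's bits as a float (no integer-to-float conversion),
 * then advance sp as a separate step. */
static z_float_t pop_float(z_word_t **spp) {
    assert(spp != NULL && *spp != NULL);
    z_float_t f;
    memcpy(&f, *spp, sizeof f);
    ++*spp;
    return f;
}

/* Layout probes: the offset of the field after the char is the alignment the
 * compiler uses for that field inside a struct (on x86-64 with gcc, 8 for both). */
typedef struct {
    char tag;
    uint64_t word;
} Probe64;

typedef struct {
    char tag;
    void *ptr;
} ProbePtr;
```

Printing offsetof(Probe64, word) and offsetof(ProbePtr, ptr) from the test program gives the in-record alignments that Z's record layout would need to match.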
051110/Friday After a few evenings of not much work, I found a problem that showed up in disassembly - I had broken Exec/PackageSymbolRef when I added handling of symbols selected from types. A simple fix. %%% Todo: at some point, go through, in Exec code, and optimize by handling do-nothing operands. E.g. add/sub 0, multiply/divide by 1, etc. The case I spotted is a subtract of the enum first element to turn an enum value into a uint. Now, is it legitimate to optimize that out right in the Exec structure? That results in an invalid Exec_t, because of the types. Perhaps do that kind of thing in the code generation? [Use the exk_alternate stuff.] 051123/Wednesday Cannot allow Proc_t to be constructed, since cannot allow attachment of the body Exec_t without running through Exec/ProcCheck. [Resolved] Need to check all local symbol refs for correct scoping. (Already done?) 051126/Saturday <> Aliasing. Perhaps the code generation can assume there is no aliasing of variables. Without casts there can't be many. But, there can be two variables/fields of the same type that point to the same record, etc. and loads/stores can get bad data if we try to optimize too much. Can the compiler check for the possibility at compile time, and simply disallow it, so that there is no valid possibility at runtime? 051213/Tuesday <> Could allow the user pretty much complete control over the appearance of the system. By this I mean essentially extending the idea of a toolbar to being a 2-D thing that the user has full control over. They can configure where various available windows are located, can fill fixed parts of the panel with fixed graphics or animations, can have fixed applications running in parts of the panel, can have sets of applications/animations/ whatever selectable by radio buttons, etc. Essentially, they can build their own fully operating "console" or "control panel", in the more traditional sense. 
Provide gadgets (radio buttons, sliders, knobs, dials, gauges, filler panels, grills, etc.) in motifs like chrome, gold chrome, gold, brushed aluminum, folded steel (like in fancy swords), various wood-grains, plastic, parquetry, etc. How to extend this to multiple desktops? Leave it up to the user. Let them mark things as appearing on all desktops, or appearing on one only. Even the overall structure and background of the various desktops can be specified. Essentially, they can build a huge "control panel" that takes several screens to fully display. 051216/Friday <> A mention from Dan Wilson a week or so ago: look into the "S" language. (And yes, searching for it is hard. I guess it would be for "Z" too!) All my tests are currently working. The last issue was one where I do: if false then fi; The problem here is that we end up with nothing at all. First, the code in Exec/constantFoldIf was producing an Exec/OptimizedExec that had nil for its optimized part, and that results in problems when trying to find out if the entire thing is a constant of any particular kind. I fixed that by plugging in an Exec/Nothing instead. Is that the right answer, or should I check for nil all over the Exec code? I think I'd rather not do that, since that would be allowing a lot of deliberately supplied nil's to get by. Likely doesn't matter, but the Exec/Nothing yields no code, so I don't really see a need for the added complexity. 051222/Thursday Worthwhile to go through and check (with Don?) the correctness of all of my overflow/underflow checks. Do those in the byte-code machine as well as those in Base.[cz]. <><> Aliasing possible: multiple 'ref' parameters, and 'ref' with globals. But, the types must be the same. [I do some of the needed checking when checking to see if I need a temporary copy of something when passing an '@' value. See Exec/CreateLocalCopy] 060101/Sunday I think I do want to include formal parameter names when comparing proc types.
That way there is at least a little more checking against accidentally passing the wrong proc. I can't demand that the proc type be the *same* proc type, as is done for records, etc., since an explicit proc won't have the same proc type as a declared proc type. Unless I do something like Algol68 did, where the normal proc declaration is a shorthand for the actual one, which is something like: id = proc corp; So, a proc that must be assignment compatible with a declared proc type must be defined that way. There is something to be said for that. But, what if the need for a proc type isn't known when the procs are declared? E.g. someone wants to point at either "sin" or "cos", and those procs are not declared with a proc type? Well, what if "sin" and "cos" (or whatever) don't happen to have been declared with the same names for all parameters? Then you can't use them via a proc pointer. What you *can* do is define your own procs that simply call the real ones. Is that an acceptable solution for both circumstances? It does make things more explicit. 060102/Monday Need to do something about the issue raised in RefNotes. [Resolved]
From: cg@GraySage.COM (Chris Gray)
To: djr@nk.ca
Date: Mon, 2 Jan 2006 10:35:10 -0700 (MST)
Subject: Z: proc types
In most programming languages, two struct/record types are not compatible just because they happen to have the same set of fields (whether or not the field tags are the same). Programmers generally expect the protection against accidents that is provided by that. However, the same is not true for proc (function pointer) types.
Proc types in C (and C++) pay attention only to the order and type of the parameters - functions that accidentally match the pattern can be passed or assigned to places that they shouldn't be. Is there anything I can do in Z about this? Should I? Currently, the parameter names are significant in Z, and I think I at least want to keep that. It provides a little bit of extra protection, and isn't really a problem. It simply means that if a proc type is used, all procs that need to be compatible with that type must use the same names on all of their formal parameters. This can only aid readability. Going further. In Algol68, the common syntax for defining procs: proc ( ): corp (At least, that's what I think it is - my memory could be failing on this, since I never really used Algol68.) That syntax is actually a shorthand. The proper way to define a proc is as an in-place (they have some word for it) constant of the proper type: := proc(): corp; Again, I'm not sure of the exact details - my Algol68 manual is long lost. This same notion is true even for simple variable declarations, which are actually constants which reference newly created cells to hold values. int i; <==> ref int i = loc int; The key is that the type of the proc *is* the proc type, rather than just being equivalent to it. Thus, proc type-checking can be much stronger, and accidental matches can be signalled as errors by the compiler. Anyway, I could do something similar in Z, although I wouldn't want to duplicate the formal parameter list (which does allow renaming, I think): = proc corp; To invent an example: type KeyHandler_t = proc(uint keyClass, keyCode)bool; ... KeyHandler_t myKeyHandler = proc BI/Print("Got key: class " + BI/UintToString(keyClass) + " code " + BI/UintToString(keyCode) + "\n"); true corp; ... In my MUD language, I actually had immediate procs like this (they were assumed to return a "status" result), so I believe the syntax can work in the Z language. Is this worth doing?
The positive aspect is that it makes it very clear what the procs are for, and what they must conform to. Also, if parameters are added, the various instances don't actually have to change. If parameters are removed, they generate compilation errors where they are used. (Well, in the persistent world of Z, it won't be anywhere near that simple.) The downside? Well, it becomes harder to use some pre-existing function for both its normal purpose and as a value for one of these proc types. E.g.: proc(float x)float func := if flag then Math/sin else Math/cos fi; Instead, the user would have to write wrapper functions: type MyMathFunc_t = proc(float x)float; MyMathFunc_t mySin = proc sin(x) corp; MyMathFunc_t myCos = proc cos(x) corp; MyMathFunc_t func := if flag then mySin else myCos fi; I believe cases where this is needed would be fairly rare. This whole concept is beginning to grow on me. Can you talk me out of it, Don? -cg
From: Chris Gray
To: cg@ami-cg.graysage.com
Date: Wed, 04 Jan 2006 08:38:24 -0700
Subject: GUI
<> Do windows need titlebars? They contain some buttons, the menus, and a place to drag the window by. Perhaps all can be done away with. Menus should work in all 4 directions: down, up, right, left. There could be a small button on a status/control/title-bar that expands to the top-level menu set. If the set won't fit (and this applies in general), then use arrow buttons on the two ends to scroll within it. Minimal mode for a whole display - example with bar on top: left | [bar-hide arrow button] | menus (or button to expand to menus) | quick-access gadgets | tabs for all windows | quick-access gadgets/buttons | [bar-hide arrow button] | right. The tabs would be for all sorts of things - "browser tabs", text editing sessions, program editing sessions, graphics editing sessions, etc. This includes any needed concept of separate programs. 060106/Friday Spivey, J.M. The Z Notation: A Reference Manual. 2nd ed. Prentice-Hall, 1992. (Referenced in an online article linked to by Slashdot: "Correctness by Construction: A Manifesto for High-Integrity Software", Martin Croxford and Dr. Roderick Chapman, Praxis High Integrity Systems, http://www.stsc.hill.af.mil/crosstalk/2005/12/0512CroxfordChapman.html)
From: Chris Gray
To: cg@ami-cg.graysage.com
Date: Mon, 16 Jan 2006 12:54:37 -0700
Subject: levels
<> - no-one - can do most things that require no access, but has tight resource limits - admin - has privileges over the hardware/machine, but nothing special with regard to anything else - - no resource limits (unless at the general system level), but is not verified, so cannot do much outside of the system they are logged in to. E.g. no online transactions, no access to remote email. - verified - same as , but is verified with a good password or some biometric. Can do transactions online. - verified code-trusted - can submit programs using low-level features.
  Verification *will* involve off-system verification.

2006-01-18/Wednesday

Ick. Decided to move Base/StringHash to String/Hash. It is used several times in Types/UintHash. Unfortunately, package Types contains a variable called String, so referencing String/Hash there doesn't work. What's the proper answer here? I tried using "/String/Hash" to disambiguate, but the current parser doesn't accept that syntax.

Ick. Check the optimization stuff (OptimizedExec_t) against the rule I just added saying that we cannot ever change an Exec_t node after we have accepted it into a tree.

Possible Ick. Check how proc types work. When do we fill in or change the offset fields? What if the user puts a given ProcParamList_t into multiple symbol tables (or multiple symbols in one table)? What are the consequences of that? Is that checked by Exec/ProcCheck? [Long since not an issue - the caller does not create ProcParamList_t values - they are created and added by Proc/AddFormal.]

2006-01-20/Friday

<> When a new package is received, its references to other packages must be resolved. Call this action "binding". If a programmer (e.g. a developer) wishes to change the package's code, they need to make a copy of the package. That copy may or may not already be "bound". But the programmer needs a way to "unbind" a package. Then they can edit the "import"/"include"/whatever statements at the top of the package, to cause references to go elsewhere (e.g. instead of a system package, go to a local programmer copy). Actually, you would normally unbind/rebind code that *uses* the package you are editing. Beware of security considerations: you can't "unbind" a package that you don't own. What about a user copying things like Exec and Proc? We can't really stop them (they can type them in again if nothing else!).
But the key could be that "procassign", which is needed to take the resulting Proc_t and turn it into a valid Z proc, type-checks its argument against the real Proc_t, and the user's Proc_t will not match. "procassign" is not a routine - it is a language construct. Hmm, that doesn't really work - procassign is simply checked by Exec/ProcAssign, which checks against Proc/Proc_t, which someone could have rebound to their own version. Similarly, a programmer can duplicate the compilation code, and put bytecode (and eventually native code) into their Proc_t. We need to prevent execution of that. It may come down to how doBcRun is called. It is passed a Proc/Proc_t (OK, Proc_Body_t in C). The current C code for parsing/compiling/executing can't be compromised because the user can't change it. That should be enough - they can build their own fully set-up Proc_t, but they can't get it to execute, because when they try to use the real "procassign" to execute it, the type won't match.

2006-01-21/Saturday

<> Mixing code and other things in packages is interesting. A standard file browser/editor will work, I think. If a package just contains various kinds of documents and other packages, then it can be shown entirely graphically, with icons sorted by name. If it contains some code things, then those code things should be shown in their textual form, in the order they are in the package's contents. Where non-code things are in the package, they are interspersed with the text form of the code things. Where there are several non-code things in a row, they are displayed with the same packing of icons as if there were no code things in the package. [Much later: early dgol package browser puts all contained packages and subpackages first.]

If the package display is set to non-icon form, then it is a sorted listing of the names and sizes of the things in the package. That is the "long form". Shown will be name, size, date-stamp, and perhaps permissions.
The usual clickable column headers would allow choice of how to sort the entries. It would be nice to find a way to allow four choices for sorting on the entry name - two of which ignore case. Each package can have a "Help" entry, which is the help/docs for that package. In either format, it can be displayed at the top of the display for the package. Perhaps want a close button to hide it - it could be fairly long and "in your face". Have to think about what formats are supported for the help, and what its "reserved" entry name is.

%%% In my ByteBuffer stuff, in a few places I'm passing what is essentially a uint to a bits8, bits16, etc. parameter. Should check the range, I think. [Perhaps not - one of the points of 'bitsXXX' is that they do *not* involve range checks, overflow checks, etc.]

2006-01-22/Sunday

Why does "Exec/If_t" have if_type in it? That's needed during type checking when building it, but is it needed later? Same for Case_t. [2010 - they are gone.]

Exec_AddToBuffer - on readback, how are types handled properly? Hmm. Rename to something else when adding GetFromBuffer. (Er, why?)

2006-01-29/Sunday

If the various linked lists in Exec_t and Types_t are turned into vectors, then the length of each vector can be written out before its elements when turning these into byte streams. It can be written as a bits8, with the convention that a value of 0 means that a uint with the true value follows. That allows for an arbitrary element count, but uses just a bits8 value for virtually all of them. [Done.]

<> At some point I will need to document the behaviour of recursive compiletime procs. They work - see Types/Range - but are a bit peculiar. Note that you cannot have mutually recursive compiletime procs - it doesn't make sense - you can't run a proc before you have specified it. What you currently get is:

    *** bc_jsr: calling erroneous proc ***
    0x000e+rec1()
    0x0003+()
    Execution terminated.

which will do for now.
Perhaps simply issue an error message when a predeclaration contains 'compiletime'. Done.

Don R. suggested changing the "ignore" language reserved word to "eval", which is what Modula-3 uses. The argument is that "ignore" could mean "totally ignore what comes here". I agree, and have made the change.

2006-01-31/Tuesday

Warnings. Do I want them at all? There are 4 in Exec:

    (These are via routine "stringNilCheck")
    /* 147 */ "Left operand of string comparison is 'nil'",
    /* 148 */ "Right operand of string comparison is 'nil'",
    (These are in case constructs)
    /* 099 */ "Enum member '", /* 100 */ "' not handled in 'case'",
    /* 101 */ "Oneof tag '", /* 102 */ "' not handled in 'case'",

If a 'case' doesn't handle all cases and has no "default", does the bytecode machine do the right thing? Note that currently the system counts warnings as errors anyway.

2006-02-04/Saturday

<> Likely *can* allow syntax extensions safely. Just do it at the parsing level, not at the lexical level - sort of. Let the programmer define a new syntax in terms of new "reserved words" (perhaps just keywords in that context?). So, a new construct could consist of: keyword1 keyword2 keyword3 for example. The programmer would provide routines that are called at the occurrence of each keyword. Those routines would check the 's, issuing error messages as appropriate, and system routines would append the 's to a generic list being built up. At the end, the user routine (these are all called at compile time) would call a system routine to "end" the construct, which would put it all together and return an Exec_t node that represents the whole, including a pointer to the user's supplied routines and description. Hmm. To make it work, the user programmer would need to be emitting code in those routines.
That may be OK - we can use an OptimizedExec_t, in which the optimized part is what the user's routines have emitted, and the non-optimized part is the system structure referencing the user's description of the construct, and the list of Exec_t's that were present. That lets the pretty-printer print out the original. [Much later: ended up with "construct" procs. Even later - planning on doing this - in a more general way than existing 'construct' procs.]

2006-02-06/Monday

Really do need to issue an error if a local variable has the same name as a proc formal! Couldn't figure out a bug because of that. DONE.

2006-02-07/Tuesday

Resolve type equivalences. Get rid of Normalize and all its uses. Do the normalization at the end of creating each of the kinds of types that we can define equivalence for. [Partially done] DONE

Type equivalence again. In Draco, if you named a type, it was not compatible with any other named type, but could be compatible with the type it names. Useful for things like array/matrix/pointer types. Check how this is done in Z. [Pretty much the same]

2006-02-23/Thursday

Compiletime procs could essentially be varargs-like. An iofunc is fine with no ':' stuff, and accepts a list of triples. It can interpret them as it chooses.

<> Could write a "Search" routine for the Lists package. If the list is one with a simple representable type, then "Search(, )" searches for a list element with the given value. If the list is of structs, then "Search(, , )" is needed.

2006-02-24/Friday

<><> In an edit window where there are errors, holding the mouse over a spot with an error could produce a popup line with the error message.

2006-03-03/Friday

Need to review the type compatibility code, etc. Want to make sure that it doesn't allow too much. E.g.

    type PrivateType_t = record { ... };
    type NamingType1_t = PrivateType_t;
    type NamingType2_t = PrivateType_t;

The two naming types should be compatible with PrivateType_t, but not with each other.
    public type OpaqueType_t = PrivateType_t;

It should not be possible to construct OpaqueType_t outside of the defining package, or to know anything about OpaqueType_t. [Resolved]

<> Code will have to be "recompiled" if it depends on a package that changes something visible externally. That includes the values of enum/oneof tags, the values of exported constants, and the prototypes of procs. Recompiling can be done by walking its trees and rebuilding them, thus letting the constant expression evaluation code work again.

2006-03-04/Saturday

<> Just doing compile-time evaluation of case expressions, using optimizing of case constructs. It occurs to me that one thinkable thing is to evaluate the value for a constant declaration by creating a proc out of the Exec_t after the '=' and running it at compile time. Thus, it could, for example, contain a call to BI/Print, whose output would come out at compile time. That could be conditional, if the Exec_t contains an 'if' or a 'case'. However, what if the constant declaration is inside a proc, and references another constant defined in that proc? That is OK, in that the constant must already be defined. What, however, if we find that the proc contains a reference to a variable of some kind? I don't think we want to get into allowing that, even if we could. One way out would be to pass a flag to the bytecode generator that says "don't allow references to variables", and only accept the expression as compile-time if the code generator doesn't detect any such references. Is this really worth it? I'm getting almost the same result by just doing the optimization of "if" and "case" constructs, which ends up with an OptimizedExec_t containing the chosen value.

2006-03-11/Saturday

Added five "q" opcodes yesterday, and used "bc_qadd" in the 'for' loop code. Also added some special cases in the 'for' loop code. The first step down a long path towards better code.
<><> The terrain.z code is likely to be *very* hard to optimize, in terms of knowing that the array indices cannot go out of bounds. Changing

    if l2 = SIZE then l2 := 0; fi;

to

    l2 := l2 % SIZE;

will greatly simplify things, since we then know that l2 is within the SIZE bound of the Cell array. Similarly for the c2 computation. What about the l1/c1 computation, however? One approach would be to try to determine that "step" always evenly divides SIZE. Then observe how l and c are incremented, and that the while loops ensure that we stop before l/c = SIZE; so l1/c1 never hit SIZE, since nextStep is half of step. Similar analysis for l2/c2, along with looking at the 'if's on them, might make it possible even without switching to the modulo.

2006-03-13/Monday

<> tudos.org, demo.tudos.org

2006-03-16/Thursday

Must prevent nasty folks from grabbing a pointer to a package-private proc and calling it. Likely similar for other package-privates. This *might* be checked by Exec/ProcCheck. But it probably makes more sense to not store actual pointers into other packages in the Exec_t trees. Instead, a PackageRef_t should reference indirectly, through a node that contains a path to the package and the symbol being referenced. These can then be resolved when bytecode is generated. I suspect there will be issues with types, however. The type-checking needs type structures in order to work. Similarly, Exec_t nodes contain pointers to their result type. How can that be done without pointing to the type as exported from its defining package? Types can go through indirection nodes too, but must be resolved during compilation, not just during code generation. Dunno. [I believe the latest Exec/Type verification stuff handles this.]

2006-03-19/Sunday

<> OK, so having moved the allocation and initialization of package-level variables to just before main is called, I have a problem: compiletime procs.
They run *before* main is called, but need the packages they reference to be ready for use. Ultimately, when there is an IDE-like thing, I'll need to re-allocate the space for package variables whenever one is added or removed. I could do that now - I don't have many such variables anyway. But there is one big problem. The package-level variables for the Types package are special-cased. They are not allocated at the Z level - they are in fact allocated and initialized at the C level, since the C code and Z code must share the type definitions, which are in the Types package variables. Hmm. I guess I can just move where the special-casing is done. For Types, when adding a new package variable, don't re-allocate the variable space. Does that mean I should also run the individual variable initializers as I encounter them? I'd rather not, since there are yet-to-be-addressed issues related to importing a package from outside the system - presumably the initializers run then. But then, what are the semantics of package variables with respect to compiletime procs? Can one add package variables to a package whose facilities have been used by compiletime procs? Semantically, I don't see any reason why not, but implementing it efficiently may be another matter. I'd rather not have to check for package initialization on every reference to a package variable. I may end up needing more than one way for things to be done, depending on the circumstances. Perhaps something like a concept of whether or not a package is yet initialized. If it is, then package variable initializers must run right away. If not, they don't run until initialization time. Then I just need to figure out the right time to do the initialization. Maybe for now it can be manual, i.e. a builtin call to do it or something. That will bite me eventually, of course. Will there eventually be package variables that are global to the entire system (e.g.
the Lists list?), as well as ones local to an invocation (which is what, a "process"?)? Allocation and initialization of them will be different. Must such global package variables necessarily be persistent? Would those two properties (global existence and persistence) be tied together? One alternative is to disallow the initialization of package variables. Instead, the programmer must write explicit initialization routines. When are those routines called? Well, the one for global stuff, if such variables are indeed persistent, would be called when the system first receives the package. A package that is part of the base system would then come pre-initialized in the initial database. Per-"process" initialization could be done at process startup time. This too suggests that packages contain definite lists of their references to other packages (and perhaps these are their "import" lists). Then, starting from the entry point for the process, the system can recursively identify all packages that it uses, and they can all be initialized within the process. Again, though, what of compiletime procs? It's likely that packages will be mutually referencing, so there is no determinable order of initialization that will work. This at least suggests that there should be a specific point of initialization. So, for now, I will have a builtin that must be called at compile time, before any user code is compiled, that will init all packages. Note that this assumes that the system packages do not have compiletime uses that this would not work for.

(Later) I think I'm liking the option of not allowing initialization of package-level variables. What about interdependencies if something (or someone) were to move the variables around in the package? It seems fairly fragile. Requiring an explicit init function seems clearer. I can't remove the facility until I add the init function, however. DONE [And then much later, undone. Sigh.]

2006-03-20/Monday

Argh. I'm testing out the new "use" statement.
I'd like to just have some Z code that runs through the "use"es of a package and prints them. But I can't reference my own package by name anymore. And compiletime procs aren't yet set up to return Package_t values. And compiletime procs die if they try to store it into a package-level variable, since the package init hasn't run yet, and so the variables are not allocated. [RESOLVED]

2006-03-22/Wednesday

<> An OS URL: http://www.centos.org

2006-03-29/Wednesday

<> When storing packages to a file, references to items within the package itself need to use the same kind of symbolic reference as references to other packages do. This is because packages can have things like types that reference each other. The symbolic reference will allow the system to fill in in-memory references as each referenced item is loaded. It would be good if the same representation can be used; then it could share the same helper structures in the package loader. One could be a simple pair table. One element of the pair is the address of where a pointer to the referenced item needs to go (could chain them for memory efficiency). The other element is a C struct (no need for a full Z typed struct) that contains the path/symbol. As further items are loaded from this and other packages, the paths of the newly loaded items are checked for in the table, and thus earlier references can now be satisfied. In the header for a package, could keep a count of how many references in the package are yet to be resolved. Only when that hits zero can the package's init routine be called, and procs, etc. in the package be used. The number of unresolved forward references might be minimized by pre-selecting some packages for loading, e.g. Types, Package, Proc. Some special handling will be needed for builtin functions. But it may not be necessary to call the existing code that constructs the entries for the builtin functions. Instead, those could be written to the Z world file as part of writing the BI package.
(Which means that the builtin procs need to be appended to BI's contents list.) When the BI package has been loaded, run through the contents list, finding all of the functions. Compare their names with the C-level names in a table of entries containing names and code pointers, and when one is found, fill in the nativeCode pointer. After language support for low-level programming is in place, could have Z versions of the storage allocator, the reference counting code, etc. Little more than the bytecode engine would remain in C. There could thus be a pointer to some raw bytecode in the Z world file. That is loaded first, and jumped to in order to boot. That won't directly work, since there need to be builtin functions for raw memory allocation and file reading/seeking, if nothing else. Perhaps there could be a mini package as the first package loaded that loads the definitions for those few items, as builtins. Perhaps even most of the package loader could be in Z code. As with the current kludgy Types.c startup stuff, there could be C code that directly creates the few needed builtins, then loads the boot bytecode and jumps to it. When testing that kind of booting, can use a builtin for printing to spit out diagnostics.

2006-03-31/Friday

For compiletime packages like List, perhaps the answer is to use *both* styles - the List/ style as well as the :: style. Use whichever is appropriate. The inner functionality of the "::" style would also be used for other things (other than the iteration ability), such as having an output formatting routine, an input parsing routine, an input checking routine, etc. [April 2010 - the Fmt code uses a "fmt" proc found on types.]

<> Speaking of formatting, when the lower-level language abilities are present (and this may prompt me to do them earlier), could add a generic record/oneof/etc. formatter that can be used from the "Write" compiletime proc that I've written about before.
So, if the type isn't a basic type, generate a call to that generic routine, passing in a valid ref (OK, so that doesn't directly work for structs). That ref pointer will contain the type of the value, which the generic code can use to pretty-print the value. There is room for a character format code and two uints on the call - they can be used for things like specifying the maximum depth to go when printing ref structures. All 0's could mean "do a nice job please". Part of the job (nice or otherwise) could include a pre-scan of the structures, looking for multiple refs to the same record/whatever. If that happens, generate a label to put in front of the first printed occurrence, and use that label for the remaining occurrences. [Much later: Fmt stuff does a lot of this.]

<> Could have input parsers/checkers(?) for things like Exec, Type, Proc, that wrap up calls to the parsing routines, as needed.

2006-04-02/Sunday

<> An array type has the usual byteSize field. That has to be recomputed if the expressions in the type change, which can happen if a named constant that such an expression depends on changes. That then invalidates all values of that array type. How does that work? Such a value can be outside of the package in which the named constant is defined. Definitely an issue, but then it is the same issue that comes up if one adds a field to a record. Essentially, this is when we have to introduce version numbers to types, so that old values are not compatible with new values. A new type structure is created when needed, and it is then referenced from places that can find it. Objects of the old type still reference the old type, but as those objects are removed, the old type loses references, and can perhaps be garbage collected eventually.

2006-04-06/Thursday

<> When a package is removed from memory, it needs to become "un-inited".
If nothing else, that requires going through its package variables and assigning NIL to any reference values (freeRef is enough), and of course freeing the package variable space.

2006-04-10/Monday

I've used Types/BitsPromote right when building a tree node for a reference to a variable, formal, field, etc. That allows those values to be used as uints in expressions. It does not affect storing to them, however (as in the size of stores), because storing does not pay attention to the type of the Exec_t, but rather to the type of the field, variable, etc. However, I note that adding a pair of bits8 values gets an error message for the left-hand operand. If the BitsPromote is happening, why? FIXED

Haven't yet added Exec_t node types for local declarations. So, when a proc is printed out, there is no declaration for locals - where the declaration was, simply an assignment statement appears (this was added to the TempSeq_t during parsing). Likely the right thing for a variable declaration is to have a node type that is the same as an assignment, but simply flags a declaration. In that case, the "as_src" equivalent is allowed to be nil. Other types of local declarations? Do we want to allow any? Types? Constants? Have to address this at some point. RESOLVED.

I'm trying to put in proper reference paths for symbols from packages. What do I do with types? Exec_t nodes all point to the type of the Exec_t they represent. Often that will be a type from a package. Is the Types/NamedDesc_t the right place for the package path stuff? It might well be the only possible place, if I want to catch all occurrences. So, when I have an Exec_t that is, say, a record constructor, I don't really need to have a separate reference to the type being constructed - it is already in the Exec_t itself. Perhaps it is conceptually better to have it there explicitly, like it is now? The code in Exec uses the same type pointer for both places. Hmm.
The Types/NamedDesc_t is something that exists inside the exporting package. The importing package needs to link up to that field, for all of the Exec_t's in it that are of that type. Aha - Exec/AddToBuffer does not add the Exec_t's type to the buffer. It doesn't need to, since when the Exec_t's are rebuilt from the persisted version, their types will be filled in based on all of the leaf values. So, the RecordConstructor and VariantConstructor nodes need to reference the type properly, and so do any declarations, but there don't need to be other references. So, those should have path references to the type in them. We get a similar situation for types defined in a package that reference types from other (or the same) packages. They need to link up properly, using a path. The Types/NamedDesc_t node is the logical place to do that. That means that when a package is being reconstructed from a persistent store, it needs to resolve those, and somehow make the using types refer directly to the Types/NamedDesc_t in the exporting package. That could be a problem, as I'm thinking right now. Perhaps an extra indirection, so that the referencing package can have its own NamedDesc_t that in turn eventually references the one from the exporting (or self) package? [Resolved much later]

2006-04-13/Thursday

<> I'm thinking that the language should *NOT* allow identifiers to contain non-ASCII characters. Allowing them makes programming easier for non-English speakers, true. But, and this is more important, it makes it much harder for non-speakers of that particular language to work with the code. It also makes it quite hard for reviewers to check the code for acceptability against Z standards. I believe that English has mostly become the universal language of programming, so Z should require it for identifiers and keywords. The full range of character sets can be used in strings.
I'm thinking comments should be restricted as well, except for quoted strings within the comments. [2010: names and reserved words are restricted to the "western" set of A-Z, underscore and digits.]

2006-04-15/Saturday

Don and I have been emailing about adding a unary '+' operator to the language. It is basically a shorthand for Base/UintToSint - it takes a uint value, checks its range, and makes it a sint value. This came up in:

    sint signValue := if signFlag then -1 else 1 fi;

which currently produces a compile error - the two halves of the 'if' expression do not yield the same type. Adding this would require a new bytecode to do the runtime check. If the operator is also allowed to operate on sint values (note that unary '-' will not operate on uint values), then the code generator would have to check the operand type and do nothing if it is already sint. In the above situation, it should be possible to note that the current result type for the 'if' is sint, and to silently turn the 1 (which by default has type uint) into a sint constant. [RESOLVED - see Exec/SintPlusNew. Also the new conditional stuff.]

2006-04-20/Thursday

<> MakeBoot(proc()void bootFunc): grab everything needed. Use absolute branches, but chain them all together per target, so they can be relocated at load time.

I've made 'for' variables "ro", so I don't need all the extra checks. DONE.

2006-04-21/Friday

Remember to make the scope for a 'while' statement span the entire statement, so that a variable declared in the condition can be used in the body. DONE.

<> Conceivably there could be a "casual programmer" mode, which might be the default, where non-English characters can be used for identifiers and comments. This would make it easier for non-programmers to at least dabble in programming, e.g. for spreadsheet creation.

Every type used at the top level of the system must actually be represented by an Exec_t that yields a type constant.
That's because any such use might be a type constant expression. This covers uses like named types, variable/constant declarations, uses for fields, etc. Basically, check for calls to ParseType and ParseTypeSpecial, and replace all except the one from inside parseUnit. Later: no, but I want a type kind that means "evaluate this Exec_t at compile time". DONE.

Make all of the Package/AppendXXX routines define any symbols themselves, as well as checking for duplicates. DONE.

2006-04-22/Saturday

URL for free 3D library, etc.: http://www.crystalspace3d.org
Also something called "Ogre 3d"
Another URL: http://www.delta3d.org
Another URL: http://www.panda3d.org
Plus the Quake engines (1-3) on id Software's ftp server

I've been putting in the tik_exec type kind. I got lots of errors when compiling Z code, because I was ending up with a tik_exec used for all named type references. I wanted to keep it that way so that lots of places that might use an eik_exec would get tested. But I don't think that's right, so I made it special-case an Exec_t that is just an eik_typeRef. Everything compiles again, but I wanted to test it some more. So, I tried:

    proc giveValue()if true then uint else sint fi: 17 corp;

The problem here is that Types/ExecNew complains that the type passed to it is not a constant type. It isn't, because the constant expression routines do not skip past an eik_scope (or an eik_sequence, for that matter). Why is there a scope in the above? It's because Proc/ScopeEnd won't delete the uppermost scope in a new proc. The scopes for the 'if' construct above are not in a proc, but the proc is still sitting around in the Context_t when we parse the result type, so those scopes get attached as the uppermost in the proc. That appears not to hurt, and I expect I could make it work by skipping scopes as needed. However, what are the consequences of messing around with that proc, whose Exec_t should now be closed?
The same might happen if there is a tik_exec for one of the proc's formal parameters. Does it make any sense at all to have a scope outside a proc? What if the user tried to declare some variables in it? First, perhaps the answer here is to have the current proc in the Context_t only when parsing the Exec_t that is the proc's body. Second, have the scope start/end code do nothing if we are not inside a proc. And third, disallow any declaration if we are not inside a scope. Would there still be any consequences or other things to catch? Hmm. The real thing I want to allow is the calling of compile-time procs. So perhaps the above example using 'if' is not valid? If that is the case, maybe *all* valid tik_exec types start with an identifier? That would simplify things. It's also not restrictive, since you can always create a compile-time proc and do whatever you want inside it. Note that one of the formats I likely want to support is "id :: symbol", so it's not just simple identifiers and proc calls. RESOLVED. A separate issue: in my ctime.z test program, routines test4 and Inc are inserting code into the sequence of their call spot. That's not quite what I now want, since then the code shows up in the main sequence. What I want is for them to generate a new sequence, which is the body of a new Exec_t which they return as the oe_unOptimized of an eik_optimized to replace their call. How do I enforce that? RESOLVED. 2006-04-23/Sunday At some point, go through Don's floating point routines in IeeeFloat.z and do a bit of "cg-izing". Need more code in Exec/Binary when handling sint subtract. There are two more cases where we have to convert a uint constant to a sint, with the needed checking. RESOLVED. 2006-04-25/Tuesday Reminder to do right away: 1) check that the sint fix to bo_sub works right. I think it was for "2" in the float test program. DONE 2) look in Binary for other similar situations that need fixing.
DONE 2-a) note that IsUintConstantExpr and IsSintConstantExpr are identical! Can surely simplify more in Binary because of that. Should likely also simply have both call a common routine. DONE 2-b) can we do similar things for uint versus sint constant exprs? Probably yes - do we need to? DONE 2-c) More compile-time comparisons can be done for comparison ops. DONE 2-d) The logic on the comparison ops is not the same as the new stuff used for +/-, etc. Is it right, or too restrictive? DONE 3) when parsing a proc, don't have a Proc_t in the Proc/Context_t except when parsing the body, and don't create scopes if there is no Proc_t. Issue error messages on any attempt to declare something when there is no scope. Might do (5) first. RESOLVED 4) get back to testing that tik_exec types work in lots of places DONE 5) then start on the few new ExecKind_t's needed for variable declarations, type declarations, constant declarations DONE 2006-04-27/Thursday What should the rules be when mixing uint and sint as operands to binary operators? The rules up until now could be roughly described as "require the same types, except when constants are present, in which case convert the constant as needed, which will issue an error message if it is out of range". Using the notation of "+N" to mean positive sint values, and "N" to mean uint values, here are some situations and possibilities: N1 + +N2 : seems clear that it should work (with a compile-time check for overflow). Is the result uint or sint? If the result is sint, presumably we first try to coerce N1 into +N1, which could result in a compile-time range error. +N2 + N1 : should this always be handled exactly the same as the first example? I.e. we never treat the LHS differently from the RHS? Perhaps the LHS determines the desired result type? N1 + -N2 : this is well defined if N1 >= abs(N2). Otherwise, it will yield a negative number. Should that force the evaluation to be done as sints? What if N1 is out of range for sints?
Should the values of the two operands control how we interpret this? We could, for example, compare the magnitudes, and yield a uint if the result is non-negative, and a sint if the result is negative. That would allow this to work regardless of the size of the values. Is that too complex to be a rule of the language? We certainly can't follow that rule if the two operands are not constants. +N1 + +N2 : if the sum is too large for sint, should we convert to uint to avoid issuing a compile-time error message? There are corresponding questions for subtraction. All of this came about because of the desire to allow positive named sint constants to be used with uint variables, and other similar situations. In looking at the code, I noted that my routines for testing whether something was a uint or sint constant expression were identical. They didn't start that way, but expanded to allow the above. For now, I've made a 3rd routine which is what those two were, and added new ones that include range checks when the constant is of the other type, and only return 'true' if the constant is in range for the type being tested for. Now I'm trying to go through the binary operation code and figure out what should be allowed. Aside from +, -, * and /, there are also the comparison operators. There are also things like the bit operators, & and | (and unary ~). Should they be happy if their operands are positive sint constants? Should they complain about the value range if the operand is a negative sint constant, or should they drop through and simply say that the operands must be uint (which is what they will do for non-constant sint values)? Etc. RESOLVED. Later: another consideration is that if, when we see the right-hand operand, we decide that the left-hand operand is invalid, we don't have a way to indicate that the error is at the location of the left-hand operand - it could be arbitrarily far back in the token stream. We *can* do this for &, |, ><, << and >>.
Yes, it would be possible to introduce a mechanism that records arbitrary token stream positions, and thus put error indicators earlier in the stream, when we have a buffer in a window to indicate errors in. But, if the tokens are coming from some kind of real stream, that's likely not possible. Perhaps not rely on the Base/ routines to emit such errors - do it directly, so that we can explicitly mention the "left operand". Error messages for the left operand will come out at the position of the operator already, since that's as early as we know what we are doing. 2006-04-28/Friday From: Don Reble Chris: > What should the rules be when mixing uint and sint as operands to > binary operators? There may be a more fundamental consideration than those rules. I ask, which calculations are possible? (Let's pretend that integers have two bytes, so the constants are easier to type.) --- Let uint a := 40000; sint b := -10000; Here, (a+b) is 30000, which fits in both types; how does one compute it? The obvious (a + tobits(b)) and (frombits(sint,a) + b) cause overflows. Must one write this, for an unsigned result? if b = MIN_SINT then a + 0x8000 elif b < 0 then a - tobits(- b) else a + tobits(b) fi Or does the translator produce all that when one writes "a+b"? --- Which signedness should one produce? It's easy when the expression value is bound immediately; but often, it's just part of a bigger expression. uint a := 2; sint b := -3; sint c := 2; Here, ((a+b)+c) is a small positive value; but (a+b) must be signed. Whereas if uint a := 30000; sint b := 10000; sint c := -20000; Then in ((a+b)+c), (a+b) must be unsigned. --- I see two extremal alternatives, each of which should pique your wrath. No doubt you'll pick something in-between. 1) One may not mixed signed and unsigned values in an expression. 2) If an expression has both signed and unsigned values, the translator can pick either signedness for the result of a subexpression. 
If there are sets of choices which produce the mathematical value of the expression, the translator picks one such set; if there are no such sets, the expression overflows. For (1), the programmers will rant, but they'll know what to do. In this case, you might want a shorthand for "frombits(sint,x)", or a whole bunch of mixed-mode functions. (2) is tricky to implement. In the example above, one needs a run-time check, to discern whether (a+b) is signed or unsigned. And that's an easy example. (2) also violates a static-typing rule which some programmers appreciate: "The type of an expression is independent of its context." From: cg@GraySage.COM (Chris Gray) > > What should the rules be when mixing uint and sint as operands to > > binary operators? Good thoughts, thanks. And timely - I was just sitting down to approach this stuff. > There may be a more fundamental consideration than those rules. > I ask, which calculations are possible? > > (Let's pretend that integers have two bytes, so the constants are > easier to type.) > > --- > > Let > uint a := 40000; > sint b := -10000; > Here, (a+b) is 30000, which fits in both types; how does one compute > it? The obvious (a + tobits(b)) and (frombits(sint,a) + b) cause > overflows. Must one write this, for an unsigned result? > if b = MIN_SINT then a + 0x8000 > elif b < 0 then a - tobits(- b) > else a + tobits(b) > fi > Or does the translator produce all that when one writes "a+b"? I can't imagine anyone would expect or want a compiler to produce that kind of run-time code. However, it may well be the best thing to do when compile-time constant folding. I'll have to think about the MIN_SINT case. > Which signedness should one produce? It's easy when the expression > value is bound immediately; but often, it's just part of a bigger > expression. > > uint a := 2; sint b := -3; sint c := 2; > > Here, ((a+b)+c) is a small positive value; but (a+b) must be signed. 
> Whereas if > > uint a := 30000; sint b := 10000; sint c := -20000; > > Then in ((a+b)+c), (a+b) must be unsigned. > > --- > > I see two extremal alternatives, each of which should pique your > wrath. No doubt you'll pick something in-between. > > 1) One may not mixed signed and unsigned values in an expression. Well, I started out that way. Then I very quickly bumped into things like: sint i = ; sint j = i + 1; Error - you are adding a sint and a uint. This, I believe, would be just too annoying. However, allowing this was definitely the first step down the path towards the other extreme. > 2) If an expression has both signed and unsigned values, the > translator can pick either signedness for the result of a > subexpression. If there are sets of choices which produce the > mathematical value of the expression, the translator picks one > such set; if there are no such sets, the expression overflows. My current thinking is to do this, for compile-time constant folding. The big downside is that something that works at compile time can generate error messages once it is generalized to run at runtime. This extreme can at least be explained fairly simply, using something like: <> When evaluating constant expressions involving uint and sint values, if a reasonable answer can be produced, it will be, without any errors being emitted. When compiling run-time expressions, however, the type of a single non-constant operand is the type that governs the expression, and a constant operand will be converted to that type, with an error message emitted if it is out of range. If both operands are non-constant, then they must be of the same type. > For (1), the programmers will rant, but they'll know what to do. > In this case, you might want a shorthand for "frombits(sint,x)", > or a whole bunch of mixed-mode functions.
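A minimal Python sketch of the two rules just stated, plus the unary '+' conversion from the 2006-04-15 entry (the "shorthand" Don asks for). It assumes 16-bit integers, as in Don's examples. The names fits, unary_plus, fold_add and runtime_add_type are hypothetical illustrations, not Z's Exec/Binary routines, and giving every non-negative folded result the type uint is just one possible answer to the "uint or sint?" question, not a decided rule.

```python
# Sketch of mixed uint/sint typing rules, assuming 16-bit integers.
# All names here are hypothetical; they are not Z's Exec/Binary routines.

UINT_MAX = 0xFFFF   # largest 16-bit unsigned value
SINT_MIN = -0x8000  # smallest 16-bit signed value
SINT_MAX = 0x7FFF   # largest 16-bit signed value


class RangeError(Exception):
    """A constant does not fit the type it must be converted to."""


def fits(value: int, kind: str) -> bool:
    if kind == "uint":
        return 0 <= value <= UINT_MAX
    return SINT_MIN <= value <= SINT_MAX


def unary_plus(value: int, kind: str) -> tuple:
    """Unary '+': uint -> sint with a range check; a sint passes through."""
    if kind == "uint" and not fits(value, "sint"):
        raise RangeError(f"{value} is out of range for sint")
    return value, "sint"


def fold_add(a: int, b: int) -> tuple:
    """Constant folding: compute the exact sum, then pick a type that holds it."""
    total = a + b
    if fits(total, "uint"):      # non-negative and small enough
        return total, "uint"
    if fits(total, "sint"):      # negative but representable
        return total, "sint"
    raise RangeError(f"{total} fits neither uint nor sint")


def runtime_add_type(lhs: tuple, rhs: tuple) -> str:
    """Run-time typing. Each operand is (kind, value); value is None if not constant."""
    (lkind, lval), (rkind, rval) = lhs, rhs
    if lval is None and rval is None:
        if lkind != rkind:
            raise TypeError("non-constant operands must have the same type")
        return lkind
    if lval is not None and rval is not None:
        raise ValueError("both operands constant: fold them with fold_add")
    # Exactly one constant: the non-constant operand's type governs.
    governing = lkind if lval is None else rkind
    constant = rval if lval is None else lval
    if not fits(constant, governing):
        raise RangeError(f"constant {constant} is out of range for {governing}")
    return governing
```

With Don's numbers, fold_add(40000, -10000) gives (30000, "uint") with no intermediate overflow, and fold_add(2, -3) gives (-1, "sint"). Because each fold only looks at its own operands, the chosen type flows outwards: folding ((2 + -3) + 2) yields (1, "uint") without the outer "+2" influencing the inner sum.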
Functions aren't much more convenient than the explicit 'frombits', and are more obscure, in the sense that they are a bunch of new symbols that a reader must track down. We had talked about the unary '+' operator as being a converter from uint to sint. I haven't done anything about that yet, however. > (2) is tricky to implement. In the example above, one needs a > run-time check, to discern whether (a+b) is signed or unsigned. And > that's an easy example. (2) also violates a static-typing rule which > some programmers appreciate: "The type of an expression is > independent of its context." As mentioned above, I would never generate the run-time code to try to do this sort of thing. One of the reasons is that the type of the result depends on the values of the arguments, and that is not known until runtime, so the code that is generated could be arbitrarily long, if it had to track the resulting type through all the rest of the expression. (And then how would it emit an error message about incompatible types in assignment?) Also, in something like ((a+b)+c), I would never have the presence of the "+c" affect how the (a+b) operates. The choices of type would always flow outwards. I think anything else would end up nearly incomputable. Thanks! 2006-04-29/Saturday It occurred to me the other day that the code in Exec/Binary could be nicely tested using a test framework. The usual problem with testing compiler stuff is that you have to feed it source file after source file and scan the error messages produced, as well as the machine code generated. Well, in the Exec/Binary case it could be done entirely inside a single test program that reports on its results. That program would construct a Proc/Context_t to run each test in, and then would directly call Exec/Binary after setting up Exec_t's for the two operands. 
The test program would catch the error codes coming out of Exec/Binary, and in cases expected to be correct, could examine the produced tree to see if it is the expected result. Some sort of pseudo-language could drive this in a more automatic way, perhaps a routine that takes some strings that describe the test and the expected result. DONE 2006-05-02/Tuesday Spent quite a bit of time in the last couple of days working on the Exec/Binary code. It handles all cases that seem reasonable now, and is forgiving with regard to integral constants. I've also built a framework (bintest.z) to test it, and have started populating it. COMPLETED 2006-05-04/Thursday <> Some GUI thoughts from yesterday. In allowing users to choose the style of scrollbars, one choice could be an invisible scrollbar. This moves the contents of the window (if possible) as the user holds down some keys (e.g. CTRL-ALT) and moves the mouse over the window. Could also universally allow keyboard-based scrolling. E.g. CTRL-ALT and an arrow key. Perhaps adding SHIFT to the combo scrolls more. A bit hard to do with the fingers, so maybe it's just CTRL-SHIFT and arrow keys. Hmm. Another possibility is to use the NUM-LOCK or CAPS-LOCK key to toggle (complete with indicator light) between normal use of the arrow keys and scrolling use. It would be controversial, but maybe go with it: don't allow an application to override the user's preferences for things like scroll bar styles, etc. The application can only do calls like "bring up editor for style". The code for scrollbars, etc. reads those values directly and uses them. It is also invoked in verifying entered values. As mentioned long, long ago, search up the package tree from the package containing the application data itself, looking for a preferences structure that specifies the preferences. Only needed once when the application starts. 2006-05-08/Monday <> Some vague thoughts... We may want to allow unreadable source.
By that I mean normally represented stuff (so it can be recompiled for new platforms, properly parameterized, etc.) that cannot be viewed or edited. Sort of like the "private" procs I had in MUD. It might make more sense to do it on an entire package basis, however. That could simply be as part of the normal package hierarchy access scheme (corresponding to Unix file access checking). One imaginary example that would be good if we can allow it: A vendor has shipped a proprietary driver for their fancy hardware. Some specific situation or application is having trouble with it, and it cannot readily be reproduced back in the shop. So, if that driver was arranged as a package containing a bunch of private subpackages containing all of the real stuff, plus a few parameters or constants that control the rest of the package, then the user could modify those constants, causing the recompilation of the private stuff. A constant could turn on (more) tracing, for example. Or, it could enable more conchecks. 2006-05-12/Friday Finished all of the Binary testing yesterday. Don emails me with a MathTest.z - cool! Check through ProcCheck again. Sigh. 2006-05-13/Saturday <> Search Engine: - paid for by micropayments from the searcher. 1 unit goes to the search engine organization to pay for their infrastructure. 1 unit goes to the hoster of the target web page, to pay for their infrastructure. 1 unit goes to the data source, to pay for the data. The data source will sometimes be the hoster (like for my stuff). This latter may be difficult to sort out - perhaps it should be left up to the hoster - they can have agreements with data providers. - no web crawlers. Instead, people that put up web pages decide whether or not they want the pages indexed. They can do this on a page-by-page basis, and it is an explicit action. Their system remembers the settings, and if a page is updated, it asks if it should be re-indexed. 
Any initial indexing or re-indexing is done by sending the page and its location to the search system. - online stores can be indexed under a master index of such. It is searchable by location, general type of store, etc. The online stores do not have to pay for this service, but the usual 1 unit fee is charged to searchers. 2006-05-16/Tuesday <> Online petitions. These could be set up like a web site. They would allow people to record their name and email address on a petition. The petitions could be for anything, and aimed at anyone. For example, one could be directed at a font company, asking it to release some fonts for use on the Z system. A mass emailing to the names on the petition could be done, but must be manually allowed by appropriate authorities (which does NOT include authorities associated with the target of the petition). Straighten out the wasPrivate versus isPublic issue. DONE 2006-05-30/Tuesday Been doing lots of cleanups lately, like moving semantics out of the parsing code, and reducing the public interfaces accordingly. I'm considering a decision: dynamically created procs, types, etc. are not persisted. Only entities created within a package are persisted, as part of persisting that package. So, for example, the procs and types created by the Lists package are not persisted. One implication, I think, is that when packages are read from persistent store, any compile-time type expressions need to be evaluated, so that the dynamic stuff is created. E.g. when a tik_exec type is read in, the expression needs to be evaluated to produce the type. Actually, I think all eik_optimized Exec_t's need to have that happen. Right now, the code just aborts. 2006-06-10/Saturday Have been changing the representation of symbol references, to something that should work for persisting them properly. Still more to do. At some point, have to decide on whether or not a path can use symbols in a package's private table. If so, then a bunch of code in Package.z will have to change somewhat.
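The 2006-05-30 decision (persist only the expression behind a computed type, never the dynamically created type, and re-evaluate the expression when the package is read back) can be sketched with Python's pickle module. This is an analogy under stated assumptions, not Z's persistence code: evaluate and ComputedType are hypothetical stand-ins for running a compile-time type expression and for a tik_exec type.

```python
import pickle

# Hypothetical stand-ins: evaluate() plays the role of running a compile-time
# type expression, and ComputedType plays the role of a tik_exec type.


def evaluate(expr: str) -> type:
    """Build a fresh, dynamically created type object for expr."""
    name = expr.replace("(", "_").replace(")", "")
    return type(name, (), {"source": expr})


class ComputedType:
    def __init__(self, expr: str):
        self.expr = expr              # the persisted part: the type expression
        self.type = evaluate(expr)    # the dynamic part: never persisted

    def __getstate__(self):
        # Persist only the expression, not the dynamically created type.
        return {"expr": self.expr}

    def __setstate__(self, state):
        # On load, re-evaluate the expression to re-create the dynamic type.
        self.expr = state["expr"]
        self.type = evaluate(self.expr)


original = ComputedType("List(uint)")
restored = pickle.loads(pickle.dumps(original))
print(restored.type.__name__, restored.type is original.type)  # List_uint False
```

Dropping the type in __getstate__ is more than tidiness here: a class created on the fly with type() generally cannot be pickled by reference, because pickle cannot look it up by name in any module. That mirrors the "right now, the code just aborts" situation.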
2006-06-22/Thursday <> There is another OS - ReactOS - http://reactos.org 2006-06-26/Monday Got the proc creation stuff done right, with regard to the handling of executable code outside of a proc, while still doing Proc/AddFormal right, and allowing for the creation of builtin functions with native code. 2006-06-27/Tuesday Plan to do a writeup/discussion on inheritance/interface issues, using a GUI layout manager as an example. [Mostly done.] 2006-08-04/Friday <><> Could represent non-checking sections of code under a new type of Exec node. When generating code under such a node, the instructions generated do not need to be checking instructions. There would also need to be a do-checking node. For example, a 'for' loop over an array does not need to check the arithmetic on the 'for' variable, nor does it need to check the array indexing. But, other work in the body of the loop (e.g. adding elements from the array) would still need to check. 2006-08-10/Thursday There is an interesting choice for a programmer, and that is whether to use something like a oneof type and a bunch of case statements, or to use a bundle. With the bundle approach, all code dealing with one alternative can be together, and independent of code for other alternatives. With the oneof and case statements, the code for a given alternative is spread around, but all code for a given kind of work is together. The ExecInfo_t type in Z (and the TypeInfo_t type) are examples where the oneof is the right answer. One reason is that the security of the system relies on there being no alternatives outside of what is defined and handled by the core Exec code. Another is that there are numerous cases where a lot of handling is shared, and where multiple alternatives are essentially checked for together. <> An example at the other extreme could be the usual GUI example. There, programmers are encouraged to create new, independent alternatives.
As long as those alternatives satisfy the interface, the system is happy to use them. My humourous example: someone might decide they simply must have a menu system that uses coloured bottles, selected by a hand-icon. Perhaps the colour and fluid level of the bottles serve to identify the choice. If the programmer is implementing a real menu system for a bar, then perhaps this kind of GUI menu system is just right! Note: think more about the desire for 'ro' on the record type used to bundle the API with the value. I definitely want to have it, but I also need to pass the value by ref to API routines. Likely need to special-case this for bundle API's. Do more when bundles stabilize. RESOLVED 2006-08-12/Saturday Working on making bundles work. A big decision I'm looking at is that of whether to copy the types in a bundle when it is instantiated. At the moment I haven't done that. A fair amount of stuff compiles, but you can't, e.g., select a field from a struct used as the type parameter to a bundle. Since I haven't copied the types, the type from the bundle itself has a tik_bundleParam where the generic type name appears. There are various consequences to the current (no copy) setup: - can't properly construct records exported from the bundle, both because non-struct generic replacements in such a type do not have any possible valid initializer, and because record types marked 'ro' cannot be constructed outside of the package containing the bundle. Also, the types of the generic fields of, e.g., an API record contain the generic type parameter, and thus there is no real value, e.g. proc reference, that matches them. If the types in the bundle are copied, they will end up as part of the package in which the instantiation happens, and so can be constructed only within that package. That's still a wee bit odd, since the common part of them is defined in the bundle.
- can't properly access the instantiated generic type, as it appears in things like records exported from the bundle. - when creating a new instantiated record type, how do we know the size to allocate? This is mostly an issue when the generic type will be instantiated by a struct type, as in my first example. - when garbage collecting or ref-counting, the generic type from the bundle will not have enough information for complete analysis. There are likely more issues. It's ugly to copy all of the types, but it can be done. The copying would also have to be done when an instantiation is read from storage, but likely that just involves the same calls. Hmm. If I try to copy the types, do I end up having to enter all of the relevant field, etc. names into the symbol table of the instantiating package? What if that produces a clash? Have to check for that, I guess. Hmm. Perhaps we don't actually copy the entire types. Rather, we create tik_bundleInstance types. Arghh! That's what I intended to do, but never did - the "SkipOneInstance" calls are completely irrelevant. What's happening is due to my questionable kludge in the Package path code, where I allow a '/' path to follow into a bundle's containing package. That finds the symbols that are in the package (since they are put into the containing package's table), and so the resulting types from the instantiation simply have a pair of named-type nodes in a row, but no tik_bundleInstance at all. Hmm. Copying the types would mean that you can't work with two instantiations of a given bundle within the same package, because of name clashes. 2006-08-13/Sunday The first test program for bundles compiles and runs. Lots more to do, but this is a good step towards affirming that the concept works. 2006-08-20/Sunday The 'return' statement can exit lots of scopes. Need to clean up the stacks of pushed oneof vars, etc., in the main code and in ProcCheck. [Later: no, silly! 'return' exits the scopes only at runtime, not at compile time.
Nothing to do.] I thought there was an issue with dastardly souls calling the pop routines and then being able to assign to oneof vars, etc. However, I think ProcCheck will prevent code generation in such cases. 2006-08-25/Friday I've been working on the bundle checking stuff, and going round in circles somewhat. It's time to write some thoughts down more. Consider the declaration from my current version of the sortable bundle: bundle Sortable(type elementType) { public type SortElement_t = elementType; public type SortData_t = [] SortElement_t; public type Sortable_t = record { proc(elementType lhs, rhs)bool srt_greater; SortData_t srt_elements; }; }; Here we have a vector of the bundle parameter type, i.e. a vector of the generic type. What can be in the vector? In the intended use of the vector with the Sortable bundle, the elements in that vector will all be of the same instance of Sortable, and that instance is the one that is consistent with the srt_greater routine. If we allow elements of that vector to be changed, then there is no way to prevent problems, because there is no way in the language to know that the vector does not contain all consistent values. So, clearly, we cannot allow the elements of the vector to be changed, other than by the code that has instantiated Sortable, and thus knows the actual instantiated type involved. What about in the more general case? Do we simply disallow modification of any bundle parameter type values? Do we lose any functionality if we do that? Does it even make any sense to allow it? Current thinking has two kinds of values for bundle parameter type values: "ref" values are those which must always be passed to API routines as "ref" parameters - they are usually multi-valued types, i.e. structs or arrays. We do not allow the direct assignment of either of those, so assignments *should* be disallowed. 
Also, the fact that generic code has no idea how big the values are, or whether any parts of them are refs that need to be tracked, prevents any attempt to allow the assigning. The other kind of bundle parameter is the non-ref kind, all values of which must be ref values (I know, that definition sucks, but it is either that or have the "ref" on the bundle parameter, and not on any use of the type in an API - which is better?). Those we normally allow to be assigned back and forth, and we don't need to know the actual types, since all values will be properly typed anyway. However, we still can't allow assignments for something like Sortable's vector of values. One of the whole points to this bundle stuff is its use to create a record which binds together a set of values whose nature is not known to generic code, with a set of proc pointers satisfying the API allowing proper use of the values. The pair is supposed to be indivisible. If we allow assignment to either part of the pair, then we break that rule, and we will end up crashing at runtime if the pair is violated. So, it seems clear to me that we do not ever allow assignment to any value of the bundle parameter type. We also cannot allow assignment to the API pointer(s) of the bundled record, at least within the land of generic code. How do we do that? One way is to disallow an assignment if the destination type has *any* occurrence of a bundle parameter type. But, that could be hard to do, since assignments of the entire bundled record pointer are almost certainly needed? What we want is to require the bundled record to be "ro", but in sort of a reverse sense, in that it is neither constructable nor modifiable within the bundle that defines it, but it *is* both constructable and modifiable within a package which has defined an instantiation of the bundle. Ok, that may be do-able, but it still doesn't help with my current problem with checking for uses of the API. 
Part of that checking is ensuring that the ultimate source of the proc that is being called from an API is the same as the ultimate source of the bundle parameter value that is being passed to that proc. E.g. from the Widget example: public proc AddWidget(LayoutManager_t lm; Widget/Widget_t w)void: lm.lm_api.lma_AddWidget(lm.lm_theManager, w); corp; Here, the call to lma_AddWidget is legitimate because that proc pointer and the bundle parameter value (lm_theManager) have the same final source value (lm). However, what if the structures defined are such that a linked list is possible? Then, any number of different values for lm_theManager can be found, depending on how many record/struct field selection operations are done from the root of the linked list. And, there is no guarantee that all such values are in fact the same instantiation, and therefore valid for the lma_AddWidget being called. If the struct/record in question contains more than one value of the bundle parameter type, they must be consistent, since we are going to prevent assignment to such values within generic code, and constructing them can only happen in an instantiated context. Hmm. Perhaps we get *very* restrictive. Perhaps the exact set of selection operations used above is the only set that is allowed. I.e. it must be: record-type: API record pointer proc pointer[s] generic type pointer[s] and thus accesses must be as above. Well, in the proposed code for the simplest sorting example: if srt.srt_greater(sd[j - 1], sd[j]) then SortElement_t temp := sd[j - 1]; sd[j - 1] := sd[j]; sd[j] := temp; fi; there is no API record pointer, just a single API function. It's no big deal to require one. In the sorting example, we may end up requiring some kind of "swap" API element anyway, since we have already concluded that we can't allow assigning to bundle parameter type values. 060826/Saturday Thinking more...
A lot of the bad possibilities go away if the record types declared inside the bundle are forced to be "ro". I also need more checking when declaring a type in a bundle - a "ref" bundle parameter can only be the last element of a record/struct. In terms of implementing the checking, it again comes back (something like my first implemented version) to requiring that the common root be the first record type found (other than one containing the API functions). This is enough, and in most cases won't be an issue for users, since the user can always have a local variable containing that record reference. And, if the record is "ro", even if it has multiple fields of a non-ref bundle param type, they will all be consistent, because they could only have been put there by code in an instantiated situation. We don't actually have to worry about struct types declared inside a bundle, I think. They aren't initialized in constructors - their fields must be assigned individually. That means that bundle param type fields cannot be initialized in a generic context - only in the context of an instantiator. If there are any errors in things like instantiating a bundle, I need to make sure that no code can be generated for procs that use it. It might even be safest to not generate any code for any proc in a package after there have been any errors in the package at all. If the package is marked that way, then the entire package "source" has to be re-parsed to attempt to clear up the error condition. Procs before any errors will currently get code generated for them, but I think that is just fine - they are clearly not using any of the erroneous stuff. The "sortbad.z" example in C/bundle causes code generation errors with no parse/semantic errors. Why? Fix. DONE. The vector of bundle parameter type values that is used in Sortable is a problem. What happens if someone stores a pointer to the array away somewhere, and then uses it later, independent of the srt it came from?
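As a rough analogy for the Sortable discussion, here is a Python sketch of a record that bundles the element vector with its API procs, built by the instantiating code and then used by generic code that compares only through srt.greater and moves values only through a "swap" API element, as suggested at the end of 060825. The names Sortable, generic_sort and swap_strings are hypothetical, and frozen=True merely stands in for "ro": it stops generic code from re-pointing the API or the vector, but, exactly like the problem raised just above, it does nothing to stop someone from keeping the list itself and mutating it later.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Sortable(Generic[T]):
    """Hypothetical analogue of Sortable_t: the elements bundled with their API.

    frozen=True stands in for "ro": once built by the instantiating code, the
    API procs and the element vector cannot be re-pointed by generic code.
    """
    greater: Callable[[T, T], bool]
    swap: Callable[[List[T], int, int], None]
    elements: List[T]


def generic_sort(srt: Sortable) -> None:
    """Generic insertion sort that never assigns elements itself.

    It compares with srt.greater and moves values only via srt.swap, so it
    needs no knowledge of the element type.
    """
    sd = srt.elements
    for i in range(1, len(sd)):
        j = i
        while j > 0 and srt.greater(sd[j - 1], sd[j]):
            srt.swap(sd, j - 1, j)
            j -= 1


# Instantiating code: it knows the element type, so it supplies the API.
def swap_strings(vec: List[str], i: int, j: int) -> None:
    vec[i], vec[j] = vec[j], vec[i]


words = Sortable(greater=lambda a, b: a > b, swap=swap_strings,
                 elements=["pear", "apple", "fig"])
generic_sort(words)
print(words.elements)  # ['apple', 'fig', 'pear']

try:
    words.greater = lambda a, b: a < b  # generic code cannot re-point the API
except FrozenInstanceError:
    print("API is read-only")
```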
060827/Sunday
If I don't generate code for any proc in a package that has had an error, then do I need Exec/ErrorSub at all? I can also likely do away with a lot of Exec/Error cases. Think real hard about this one! Ah, that only took a few seconds: the other use for ErrorSub is to mark a subtree as being in error, so that the pretty-printer can print it differently.

Need to go through Exec (and maybe Types) code and make sure that the rule about skipping one name node is actually followed. It *seems* to be, but it should be checked carefully.

Arghh! What I had done to get initial stuff working included some code to skip over a tik_bundleInstance in a couple of cases, and substitute in the actual type that instantiates the bundle. That allows things like field selection from that type. The Widget/LayoutManager example uses that all over the place, since it instantiates the generic types with struct types, and those need to allow field selection. However, in trying my nonro.z example, where a non-ref bundle generic is instantiated with "string", that replacement yields "string" as the type of the selected field (code in Exec/FieldRef that uses getInstantiation and checkForInstance). "string" is not compatible with any API routine, since they expect the generic type.

060829/Tuesday
From comp.risks:

Date: Sat, 26 Aug 2006 09:36:46 +0100
From: Ross Anderson
Subject: Security Engineering

After several years of argument, I've persuaded my publisher to let me put my book "Security Engineering" online for free download: http://www.cl.cam.ac.uk/~rja14/book.html My book draws on a lot of the experience shared in this list, and has become a standard textbook in the field. The publishers thought for years that it was too risky to let authors put books online but they are gradually learning that this isn't so. Putting a book online often increases its sales; more people read it and those who find it useful often go buy a copy. Enjoy!
Ross Anderson, Cambridge University

060901/Friday
<> Quick thought on a console/log tool. It would be good to allow the user pretty full control over what is shown and how. E.g. allowing them to specify that all XXXX events be simply counted, and shown in a count box. The usual filtering would be needed as well. If the user chooses to have some count boxes, then an iconized form of the tool, in a status bar somewhere, could just show those count boxes. Expanding the tool would then bring up the actual history console, if that is enabled.

060902/Saturday
When do we want to allow type expressions? Everywhere? What implications does that have for type checking? (More calls to Types/SkipExec.) Can we be sure of persisting them properly everywhere?

060903/Sunday
<> Thoughts on GUI styles: Could have a decorated style that uses scrollwork (allow selection, etc.) and jewels/pearls/etc. for decorations. Could have different colours of gems mean different things, or be on different items of the GUI. For a "volume control" slider thing, could allow a variety of options, like different knob styles (including custom bitmap or custom scalable shapes), different tick styles (one-side, both-sides, all uniform, longer every 5, etc.), container styles and background, etc. When drawing it, try to make tick marks be on uniform pixel boundaries, so that all tick spacings are the same. May have to extend/contract the range of the slider a bit to do that.

060904/Monday
"fl" is the taglet for both FieldList_t and FormalList_t. FieldList_t could use "fldl" if I care.

I've removed a lot of SkipExec calls, cleaning up my bundle handling.

Write a test source file that uses lots of computed types, and make sure they work properly in lots of contexts, both as target and value. Value can be done by defining variables with a computed type.

Go through Package, and make private the entry points that are now guarded by the Phase-ed stuff.
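The 060903 note about keeping slider tick marks on uniform pixel boundaries comes down to two steps: choose a whole-pixel spacing, then adjust the track length to fit it. Here is a minimal sketch that assumes rounding to the nearest spacing. LayoutTicks, TickOffset and TickLayout are hypothetical names.

```c
#include <assert.h>

/* Result of fitting ticks to whole pixels. */
typedef struct TickLayout {
    int spacing;   /* pixels between adjacent ticks (same for every gap) */
    int length;    /* adjusted track length = spacing * intervals */
} TickLayout;

/* Choose a whole-pixel spacing close to available/intervals. The track
   may be extended or contracted slightly relative to 'available'. */
TickLayout LayoutTicks(int available, int intervals) {
    assert(available > 0 && intervals > 0);
    TickLayout tl;
    tl.spacing = (available + intervals / 2) / intervals;  /* round to nearest */
    if (tl.spacing < 1) {
        tl.spacing = 1;   /* never let ticks collapse onto each other */
    }
    tl.length = tl.spacing * intervals;
    return tl;
}

/* Pixel offset of tick i (0 <= i <= intervals) from the start of the track. */
int TickOffset(TickLayout tl, int i) {
    return i * tl.spacing;
}
```

With 10 intervals, a 203-pixel track contracts to 200 and a 197-pixel track extends to 200. In both cases every gap is exactly 20 pixels.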
Note that the upper Z language does not allow the common C-style iteration (etc.) routines, where an arg is passed as "void *" and is then cast to something appropriate inside the routine being called for the iterations. However, this might work with a bundle, where the "void *" is the bundle type parameter. Try it out.

Without Types/InstanceCompatible, there are four difficulties in the sortable.z test code:

1) The actual API record is not compatible with the srt_api record field in the XXX.Sortable_t record. (Record constructor)
2) The values array is not compatible with the srt_elements record field in the XXX.Sortable_t record. (Record constructor)
3) The "srt" value (the instantiated record itself) is not compatible with the Sortable_t formal parameter of the Sort routine. (Proc call)
4) The individual specific procs are not compatible with the generic versions in the API record. (Record constructor)

Similar cases exist for the test1.z widget code. InstanceCompatible handles this by recursing, in parallel, through matrix and proc types, and matching when it finds an instance that matches up with the generic. The main problem is that it needs to skip named-type nodes to do that. It needs to skip them on both the "want" type and the "got" type. E.g. it hits tWant being SortableApi_t.

060907/Thursday
Now have a version of Types/InstanceCompatible that allows all of my bundle tests, but does not allow things I don't want it to. It's fairly minimal, but perhaps more thought is still needed.

I had thought at one point that I could allow bundle parameter types to be the last field of a struct as well as of a record. But that then yields a struct whose size is not known. It could only be used via reference parameters. Can I ensure that? E.g. you can't put one inside some other struct or record (or in an array or matrix).
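For reference, this is the C-style iteration idiom mentioned above: a "void *" argument passed through an iteration routine and cast back inside the callback. ForEachInt, VisitFn, sumVisitor and SumInts are hypothetical names. In a bundle the "void *" would instead be the bundle type parameter. The compiler could then check that the callback and its context belong to the same instantiation, which C cannot do.

```c
#include <assert.h>
#include <stddef.h>

/* Callback type: the context is untyped, so nothing checks it. */
typedef void (*VisitFn)(int value, void *context);

/* Iteration routine: hands each value plus the caller's context to visit. */
void ForEachInt(const int *values, size_t n, VisitFn visit, void *context) {
    assert(visit != NULL);
    for (size_t i = 0; i < n; i++) {
        visit(values[i], context);
    }
}

/* A callback whose context is really a long accumulator. */
static void sumVisitor(int value, void *context) {
    long *total = context;   /* unchecked cast back from void * */
    *total += value;
}

long SumInts(const int *values, size_t n) {
    long total = 0;
    ForEachInt(values, n, sumVisitor, &total);
    return total;
}
```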
Do a binary tree bundle, with insert, delete, and tree-walk-print.

Do a printable bundle: an expandable vector of values, with just a print proc to go with them. Allow for a recursive print, when one of the printable values is another printable collection.

For doing compile-time code generation, one possibility is to have a routine which takes three parameters. One is a reference to a proc. The other two are types. The routine clones the proc, but substitutes every use of the first type with a corresponding use of the second type. If the types are records, then the fields must correspond appropriately. Then again, maybe just wait until I have the parser, etc. in Z, and then just build up strings to parse into procs.

Have to make sure that bundle param names are unique, and that a type defined in the bundle doesn't clash with one of them.

060909/Saturday
Problems with bundles. Can't do binary tree, for example. Thinking about modifying bundle capabilities.

Thinking about something a bit like the Pascal "with" construct. In my thinking above, I've got situations where I need to reference various fields in a struct in an array. Without pointers it's painful and slow. Something that allowed me to temporarily assign a pointer to that array/matrix element would be useful. Thinking about that: do I currently allow such a thing to be passed via a "ref" parameter? I believe I do. That creates a pointer into the middle of some larger allocated object. How does that work? I think perhaps it does, but I'll have to investigate. RESOLVED

060910/Sunday
Investigating the above. I don't see how it works. Very big sigh. E.g.:

    type Str1_t = struct {
        uint st_n;
        string st_str;
    };
    type Rec1_t = record {
        uint rc1_n1;
        Str1_t rc1_st;
        uint rc1_n2;
    };
    Rec1_t Rc1;

    proc t1(Str1_t ref st)void:
        Rc1 := nil;
        st.st_n := 3;
        st.st_str := "hello";
    corp;

    proc t3()void:
        Rc1 := Rec1_t(1, 2);
        t1(Rc1.rc1_st);
    corp;

The above runs fine. Printing Rc1 after running shows Rc1 to be nil.
I can see no reason why the stores in t1 are not stores to freed memory. I do recall thinking about this before. Have to look through this diary file looking for references. I see nothing in bcRun.c. See starting at line 728 for early thinking. See also 863, 2702. [LATER: see 061016 (Oops - line numbers will now be slightly wrong.)]

Note: cannot allow run-time "typeof" to be used by non-privileged programmers. This is because it can return the type of something that the programmer cannot normally look inside of, and would thus break information security. It wouldn't be a system problem, but it would prevent data hiding.

A possible solution. Have two new opcodes. The first is "ref". It takes the top-of-stack value and pushes an extra copy of it onto the t-stack. It also does an incref on the value. The second is "unref". It pops a value from the t-stack and does a decref on it. When a proc call is being code-generated, pass a new flag param to bcComp that asks it to "doRefs". If that flag is set, then whenever a ref value is pushed, emit a "ref", and count how many have been done. The "doRefs" flag is not set on recursive calls to bcComp for other, internal proc calls. When a proc call's code generation is complete, emit the counted number of "unref"s. Note that the count must be saved around internal proc calls being code generated. The flag is initially set when compiling the actual parameter for a proc call for which the formal is a "ref" parameter. Something like this may work. Note that pushing the address of a simple variable, or an element of a direct array/struct simple variable, does not require the "ref", since the storage cannot be freed while the proc call is active.

060929/Friday
(Lots of Lego train show stuff. Now busy disassembling.) Slashdot notes that as of today, the .GIF format is finally patent free.
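The 060910 "ref"/"unref" proposal can be modelled in C. This is a hypothetical sketch, not the bcRun.c code: opRef, opUnref, tStack and the reference count in Rec1 are illustrative. The 061016 notes also record that the incref/decref approach was later replaced by another method. The sketch replays the t1/t3 example. The reference pushed before the call keeps the record alive while t1 stores through st, even though t1 drops the global reference.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Str1 { unsigned st_n; const char *st_str; } Str1;

/* A refcounted record embedding a struct, as in Rec1_t. */
typedef struct Rec1 {
    int refCount;
    unsigned rc1_n1;
    Str1 rc1_st;
    unsigned rc1_n2;
} Rec1;

static int freedCount = 0;   /* lets a test observe when frees happen */

Rec1 *NewRec1(unsigned n1, unsigned n2) {
    Rec1 *r = calloc(1, sizeof *r);
    assert(r != NULL);
    r->refCount = 1;
    r->rc1_n1 = n1;
    r->rc1_n2 = n2;
    return r;
}

void Decref(Rec1 *r) {
    if (r != NULL && --r->refCount == 0) {
        free(r);
        freedCount++;
    }
}

/* The t-stack: references held only for the duration of a call. */
#define TSTACK_MAX 16
static Rec1 *tStack[TSTACK_MAX];
static int tTop = 0;

/* "ref": incref the value and remember it on the t-stack. */
void opRef(Rec1 *r) {
    assert(tTop < TSTACK_MAX);
    r->refCount++;
    tStack[tTop++] = r;
}

/* "unref": pop the most recent temporary and decref it. */
void opUnref(void) {
    assert(tTop > 0);
    Decref(tStack[--tTop]);
}

Rec1 *Rc1 = NULL;   /* the global from the example */

/* t1: drops the global, then stores through the ref parameter. */
static void t1(Str1 *st) {
    Decref(Rc1);
    Rc1 = NULL;
    st->st_n = 3;           /* safe only because t3 did opRef first */
    st->st_str = "hello";
}

/* t3, with the ref/unref pair the compiler would emit around the call.
   It returns the stored st_n, read before the unref, so the effect is visible. */
unsigned t3(void) {
    Rc1 = NewRec1(1, 2);
    Rec1 *base = Rc1;
    opRef(base);            /* emitted because t1's formal is "ref" */
    t1(&base->rc1_st);
    unsigned seen = base->rc1_st.st_n;   /* memory is still valid here */
    opUnref();              /* the last reference goes away here */
    return seen;
}
```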
061001/Sunday
<> With the idea of having complex types with the code to implement them: A view-only mode should have an "edit" menu-item/button that puts one into editing mode. And likely vice-versa.

061016/Monday
The fix for dangling "ref" pointers, using new "incref" and "decref" instructions, seems to work. Currently it could use some optimization for the cases that are occurring with the Widget bundle example. If that style of programming becomes common, the unneeded incref/decref are unwanted overhead. [Later: broken, replaced by another method]

In the testing of the above (program test/bugtest.z), the last test doesn't free the last TheR value properly. This may be an interaction between the new incref/decref stuff and the handling of returning a ref value from a proc. [LATER: resolved by adding "decref2"]

061021/Saturday
At some point, check through what happens when referencing bundle-related symbols (the bundle itself, and its contained elements) in ways that are not primarily expected in bundle usage. Also need to address the public/private issues, i.e. disallowing construction of a private type outside of the package/bundle.

Can code assign to the fields of a bundle API struct? We should only allow code in an instantiating package to modify fields of the instantiated API type. In a sense, we should get type-incompatibility when comparing the generic form of types, i.e. a bundle parameter type is not even compatible with itself. But that prevents any generic code at all from running, doesn't it? Even preventing assignments is a problem: we can't have generic containers without being able to modify things whose type contains uninstantiated generic types. This problem is verified in the "printable.z" test program. In "AddToVec", I can replace the pra_print API field of one Printable_t with that from another, and it SEGVs at runtime. I can even do that inside the using package: replace the StringApi print with the IntVecApi one.
This doesn't SEGV, but prints incorrectly. Adding a recursive check in "Types/InstanceCompatible", where it is testing "tWant == tGot", does not catch anything in printable.z, but it prevents creation of the SLMApi record in the main test1.z file. So that change doesn't fix the problem, and introduces one, in a sense. [It also infinite loops with mutually-referencing types!]

Is there an issue with incompatible instantiations being present in a source instantiated type being assigned to a destination generic type? This may be handled by InstanceCompatible.

061030/Monday
The "writeablePackage" of a record type is in the RecordDesc_t. Should the "definingBundle" (or some such) be there too, rather than in the NamedDesc_t that names it?

Be wary of making assumptions about types having names. Names are used in the normal syntax to refer to types. However, since types are first-class entities, they can come from other places. E.g. compile-time procs. E.g. practically anywhere if the user is creating code on the fly.

I'm having trouble getting the system to do all of the needed checking on types exported from bundles. The latest one I found, and have not been able to fix yet, is the issue of assigning to the member fields of the API record. Heck, I even allow assigning to the fields of the bundle data type, in code in the package containing the bundle. So perhaps a bigger step is needed in order to be reasonably sure that all of the type holes are closed for bundle use. Perhaps greatly restrict the syntax within bundles. Perhaps only allow exactly two type definitions: one for the API record, and one for the data record. How limiting would that be? This seems to have trouble with things like bintree.z, mapping.z and symboltable.z, where we are trying to have a container specific to the bundle parameter type, rather than a container of generic values.
In those cases, we are trying to have the API record in a different place than right with the individual occurrences of the bundle parameter type. Perhaps there can be multiple data records in a bundle? Only bundle data records can contain references to either the bundle API record, the bundle type parameter, or to other bundle data records? Can't have that last restriction, else we can't ever use bundles!

The thing about a bundle is that we cannot allow the "breaking" of a bundle data record: the API record instance and anything containing bundle type parameter values must always be "bound".

Pushing the restriction/idea further. Perhaps there are two kinds of bundles. The simple generic container bundle has two types within it: the API record and the data record. The data record must contain one instance of the API record, and one instance of the bundle type parameter. It can contain other fields as well. Hmm. "sortable.z" has the data record containing a vector of the bundle type parameter. Perhaps that example is really not of the first type, since it does not mix sortable types within one container.

The second kind would allow for 3 records in the bundle: the API, the element data record, and the container data record. The element data record contains one instance of the bundle type parameter (along with possibly other fields). The container data record contains one instance of the API record, and some way to have multiple instances of the element data record. That could be an array of them, or a reference to such a value, if the value itself can reference further instances of itself (requiring a self-referencing record). As a special case, the "sortable" example just has the bundle type parameter itself as the element data record.

Note that there probably isn't a reason why special syntax is needed for this; it ought to be possible to figure it out directly. It would need documenting, of course.
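Here is a C sketch of the "second kind" of bundle described above. It assumes the bundle type parameter can be represented as an opaque pointer. PrintApi, Element, Container and FormatAll are hypothetical names that are only loosely modelled on the printable.z idea. The element record holds one parameter value plus an ordinary field. The container record holds the single API instance and the element records, so generic code always applies the API to values from the same container.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* API record: how to render one value of the parameter type. */
typedef struct PrintApi {
    int (*pra_format)(const void *value, char *buf, size_t size);
} PrintApi;

/* Element data record: one parameter value plus other fields. */
typedef struct Element {
    unsigned el_seq;        /* an ordinary, non-generic field */
    const void *el_value;   /* stands in for the bundle type parameter */
} Element;

/* Container data record: the API instance and the elements it applies to. */
typedef struct Container {
    const PrintApi *ct_api;
    Element *ct_elems;
    size_t ct_count;
} Container;

/* Generic code: writes "seq:value" for each element, space separated,
   truncating safely if 'out' is too small. Returns the length written. */
size_t FormatAll(const Container *ct, char *out, size_t size) {
    assert(size > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < ct->ct_count && used + 1 < size; i++) {
        char item[64];
        ct->ct_api->pra_format(ct->ct_elems[i].el_value, item, sizeof item);
        int n = snprintf(out + used, size - used, "%s%u:%s",
                         i == 0 ? "" : " ", ct->ct_elems[i].el_seq, item);
        if (n < 0) {
            break;
        }
        used += (size_t)n < size - used ? (size_t)n : size - used - 1;
    }
    return used;
}

/* One instantiation: the values are C strings. */
static int formatString(const void *value, char *buf, size_t size) {
    return snprintf(buf, size, "%s", (const char *)value);
}

const PrintApi StringPrintApi = { formatString };
```

A recursive print, as wished for in the printable bundle note, would correspond to a pra_format whose values are themselves Container pointers.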
Since the creation of a bundle is a package thing, if we need the open bundle in the Context_t, we really need the Context_t to be something that is defined in the Package package. Otherwise, we need an entry point in package Proc that allows us to set/clear it in the current Proc/Context_t. It really should be set/cleared in the routines that are used to add a new bundle to a package. Doing this will be a large, but presumably straightforward, change. It's worth noting that I started with multiple context records, and went away from that. It might work reasonably to just have one in Package. The stuff that code in Proc has to do could be done by having a Proc-owned mini-context inside the Package-owned main context. Still amounts to 2 context structs!

061031/Tuesday
While gdb-ing some stuff, I noticed that the references to the name of a package var in the sr_ and vd_ structs are to different copies of the same string.

In trying to prevent the /User/main routine in printable.z from being able to assign one pra_print function pointer to the vector of another instantiation, I see that the tWant and tGot types are the same type. The types of the two package variables (StringApi and IntVecApi) are not the same; they are bundle instantiations. However, in field selecting within them, we end up with the same type for the field. There is a bui parameter to assignIncompat. Perhaps the shortcut of identical type pointer should not work if bui is non-nil? Perhaps, but in this case both bui pointers (I extracted one from the RHS ex) are nil. That's because we only pass bui pointers down in things like record constructors; we don't pass them up in the recursive Exec_t construction code. Sigh.

This came about from my decision to not clone the types in a bundle when the bundle is instantiated. By not doing that, types do not fully describe themselves with just the type pointer. They are a pair of the type pointer and the active bundle instantiation.
I've also wondered in the past whether there can be multiple active instantiations. Can that happen if a type which is an instantiation of a bundle type is used as the parameter to another bundle?

If I go the step of cloning the type structures, how would I handle procs inside a bundle? I don't have that yet, but I've been moving in that direction. Each proc structure, and each formal parameter in its type, can only have one type. So the proc structures would have to be cloned as well. But we specifically don't want to end up duplicating their object code, so the byte-code and Exec_t pointers within them should all point to the same copies. Alternatively, perhaps there should be a bundleInstanceProc entity, which would point at the one generic proc and the instantiated version. I was going to need such a thing anyway, to allow the "selection" of a proc from an instantiation.

061110/Friday
I've ripped out the old bundle implementation and started on a new one. In this one I'm allowing multiple type parameters to a bundle, and I plan on cloning the types in the bundle, rather than trying to keep track of which types are within an instantiation's influence or not. That didn't work in preventing assignment of API procs within the API struct, within a package that has full write access to the struct.

It occurred to me today that I either have to avoid matrix types within restricted structures, or I have to introduce the concept of an 'ro' matrix. With a linked list of 'ro' records, outside code can make no modifications. But if a record contains a matrix, outside code can change elements of the matrix, even though it cannot replace the matrix. [2010: this is an issue even without bundles. It has been resolved by having a StorageFlags_t in matrix types which allows 'private'.]

061214/Wednesday
Waaay too much NALUG stuff. Ick.

For scripting work, could have a utility routine called, e.g., "Run", which is an ioproc, thus accepting variable arguments and formatting codes, but which emits code which sends the resulting string through the default (or specified) command line interpreter. [Later thoughts on command procs make this quite different, and likely irrelevant.] Could also have an ioproc routine, e.g. "Str", that simply yields a string from the formatted result. More efficient than a bunch of string concatenations, conversion routine calls, etc. [Done: Fmt/FmtS]

Might want to allow "ref" local variables. They could be used instead of something like a "with" statement, to allow easy reference to a larger entity. If we go with '@' as an explicit en-ref operator, then we might have:

    [SIZE1, SIZE] BigStruct_t bs;
    ref LittleStruct_t ls := @bs[index1, index2].bs_littleStruct;
    ...
    ls.ls_field1 := ...

Note that the referenced objects are not individually dynamically allocated, so when garbage collecting through them, the type must be carved off from the outer type, rather than being retrieved from the entity itself. Is this a lot like C++ references? Are there issues? Where can ref values be declared? We initially have proc parameters. Now local variables. Is it possible to have ref values as fields of structs/records? The key is that no ref value can be allowed to survive the exit of the proc in which it is created. So there must not be any globally accessible ref values. ['@' locals added]

061214/Thursday
<> Perhaps there could be something, like a "crate", which contains a bunch of inter-related packages. The packages in the crate would reference each other, and things outside the crate, but there would be no direct references from outside the crate to inside it. So it would be relatively easy to unload the entire crate. There must be *some* kind of references to the crate, else it is not usable.
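The "Str" ioproc idea (later done as Fmt/FmtS) roughly matches a common C pattern: measure the formatted length with vsnprintf, then format once into a buffer of that size. StrF is a hypothetical name. The sketch only shows the single-formatting-pass idea, not the Z implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format the variable arguments into a newly allocated string.
   The caller owns the result and must free it. Returns NULL on failure. */
char *StrF(const char *fmt, ...) {
    assert(fmt != NULL);
    va_list args;
    va_start(args, fmt);
    int needed = vsnprintf(NULL, 0, fmt, args);   /* measure first */
    va_end(args);
    if (needed < 0) {
        return NULL;
    }
    char *result = malloc((size_t)needed + 1);
    if (result == NULL) {
        return NULL;
    }
    va_start(args, fmt);
    vsnprintf(result, (size_t)needed + 1, fmt, args);   /* then fill */
    va_end(args);
    return result;
}
```

The va_list is restarted between the two passes because one va_list cannot be reused after vsnprintf has consumed it.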
<><> When the "view" command is used to view something (read-only edit), or to edit it for that matter, there can be 3 modes, depending on the user preferences:

1) each viewed item opens in a new window. Maybe have a suboption for whether a proc/etc. in a package that already has a window open gets a new window, or just repositions (and raises?) the existing one.
2) viewed items of the same kind (video, audio, code, rich text document, etc.) use tabs in one window.
3) items of the same kind use sub-tabs in one master window, and the tabs in that master window are for the kinds. The shell window itself is a kind, so all shells would be under one tab.

061218/Monday
Some good work on the weekend. Need to remember to put checks in the new version of bundle stuff to make sure that parameters come from the same instance of the bundle as the API proc.

Perhaps passing multiple values to an API proc can be made safe so long as the source of the values, and of course the API, is all within a single parameter (is that needed?) to the proc. So a generic binary tree package would always pass the tree and the API into procs in the generic package, and so things like element comparisons could be done in callouts through the API.

I'm thinking that with the actual type instantiation that is done now, the normal type checking will keep things right. Reducing the number of special cases has got to be good!

Given that the API routines I'm using do in fact modify the values in the record they are passed, perhaps the record should not be ro? But if the calls that are modifying them are in the bundle which defines the record, that should be OK, shouldn't it?

Interesting. In my "test1.z" bundle testing program, the reason I couldn't initialize the API records was the fact that the names of the parameters in the instantiation routine were not the same as the names in the generic API proc type. The proc type used, e.g., "w" for a widget, but the instantiation routines used the taglet from the instantiating type. Perhaps the symbols shouldn't matter, especially if I implement the thing mentioned way back somewhere, where procs used via pointers have to be the exact type, and, to handle this case, the syntax for that allows parameter renaming? A bit icky. I'm thinking something like:

    type ProcType1_t = proc(uint a, b)uint;
    ...
    ProcType1_t realProc(uint row, col)uint:
        ... use row, col ...
    corp;

When defining the real proc, the type comparison would specifically ignore parameter names. One downside is that all proc types would have to be named. [Resolved with the ':' syntax.]

At this point, after renaming instantiating proc parameters, the only errors left in test1.z are because an instance value is not compatible (in this case in a proc parameter position) with the proc parameter type in the generic proc being called. Grr. That's not so easy to handle. When we give a name to the instantiated type, that is through normal type definition, and we get the unnamed type from the bundle instantiation. So we have lost the nd_containingBundle and the nd_containingBundleInstantiation. Perhaps Types/SkipOneName should skip an arbitrary tik_named, and then skip another one if the inner is an instantiation? Then, in pExec.c/parseBundleSelection, we actually do want nd_namingType there. And then Types/Instantiates must look for and skip one level of name in tGot.

070104/Thursday
The above appears to be the final stumbling block. I modified Package/InstantiationType to return the nd_namingType instead of the nd_subType, then changed test1.z to always directly use the type from the instantiation (not using a new name for that type). Everything then compiles and runs. I don't at all like that as a permanent solution, however; it's just too ugly. Hmm.
What if I change the code that defines a named type to inherit the nd_containingBundleInstantiation of a named type from an instantiation, and to then skip that NamedDesc_t node? In other words, there is a special case where, when naming an already named type, the double naming doesn't make the types differ. It's an exception, but it might be excusable.

Hmm. What will happen with instantiated procs? Do they need to be able to have a generic type accept an instantiation of it? I'm hoping not.

070113/Saturday
Finally getting back to it. Nearly all Lego away. Having to re-learn a lot of stuff, sigh.

One issue that I saw earlier, and that has popped up again with printable.z, is that now I need the names of API routine parameters to match. That's because Types/sameParamLists checks the parameter symbols.

Note that if an 'ro' type is used to instantiate a bundle, the 'ro' attribute is lost, in the sense that calls to API members in generic code can pass a reference to the object to those API procs. Now, if the type is, e.g., an 'ro' record, then only code in the package that defines the record can modify the fields of the record, and so only that package can define procs that could be used as those API procs. However, the package writer must be aware that public functions can be used in this way. It likely doesn't matter: if there are public functions that can modify fields of the 'ro' struct, then it is hard for the package to put restrictions on when those functions are called, so having them called from some bundle's generic code won't matter.

070114/Sunday
Refer back to 061218. If I want to use that syntax, and to force exact matches on proc types, then I need to change the rules for assignment some more: if the type after skipping a name is a proc type, then the code must go back to the named type, i.e. un-skip the name.
Also, if I want to force that kind of consistency, then, in the cases where I want to do that, do I really need the ability to rename the parameters? Clearly, when using bundles, I want to allow parameter renaming, but do I need or want to use a named proc type for bundle API routines? Does this differ from the desire to use exactly typed procs for other reasons? The use of the type name makes it explicit, when defining the proc, that the proc can be used for that purpose. Hopefully it would work out for proc predefinitions, but I'm thinking that wouldn't strictly be necessary: if there is a recursion cycle, then just predeclare a proc in that cycle that isn't defined using a proc type name.

If I do this proc-type-name stuff, then presumably the parameter names are no longer significant in proc type comparisons. Recalling now, I made them significant so that procs with accidentally equivalent signatures couldn't be misused in an incorrect way.

Note the grammar issue, which has to be disambiguated by looking at the token after the symbol:

    type ProcType1_t = proc(uint a, b)uint;
    ...
    ProcType1_t ProcRef;
    ...
    ProcType1_t realProc(uint m, n)uint:
        ...
    corp;

If I do this, I want to allow bundle use without proc type names anyway, so don't worry about trying this right now. For now, just remove the formal parameter names from proc type equivalence.

<> Allowing changes in things that define the types exported from packages will be difficult. Perhaps, if there is a way to track that such a type or value has actually been used, then it cannot subsequently be changed (e.g. a field added/removed, an array size changed), unless that is done by creating an entire new version of the package.

Back to bundles... What happens if the formal parameter to a proc in a bundle API is of a type declared in the bundle, which has a bundle parameter type within it?
We can then pass in any generic instance of that type from within the bundle, but is it still viewed as generic inside the actual API proc? What happens is that the actual proc is not the correct type for the API field. This is because the actual proc has the parameter type being the straight generic type from the bundle. The API field, being in a type instantiated from the bundle, has the parameter type being the instantiation of that generic type. Thus the two pointers are not the same, and so the proc types are not the same. This is the rule of two named types never being the same, even if they are naming the same thing. Actually, in this case, one of the named types refers to the instantiation, and the other does not, so they are not really equivalent.

Cannot declare variables of the type of a bundle param, since the representation of those has a size/alignment of 0, and routine Types/CheckDeclarationType disallows it. Types.z/addField special-cases this, so it allows adding bundle param typed fields to structs/records.

070115/Monday
If records defined in a bundle are treated as the proper "ro", i.e. are read-only everywhere, but their instantiations are regular "ro" for the package that does the instantiation, then I don't see a good way to do something like the binary tree bundle. There is also the problem of forward-declaring the tree node type within the bundle. I'm going to go with the restriction for now, and perhaps try to find a safe way to relax it in the future.

There may not be any reason to restrict "typeof" to system-level code. The thought behind the restriction was that types not exported from a package should not be usable, creatable, etc. outside of that package. However, the Package_t type itself is not private in that way. So there is nothing stopping someone from looking up a private type in the private symbol table, thus getting a reference to the type structure. From there, dynamically created code can deal with items of that type just fine.
This does suggest that all private record/oneof types should be "ro". Carrying this further, what is there stopping someone from adding a proc to a package (using dynamic code creation), and then using that proc to create objects that are 'ro' to that package? Basically, nothing, except the "ownership", at the system level, of the package (in particular the Types, Proc and Exec packages).

Fiddling with "nonro.z". When a name is given to a symbol from an instantiation, it does *not* appear to be equivalent to the long form (selection from the instantiation). For an API type, it appears you must use it directly from the instantiation if you want it to be usable when creating a bundle value. But that same API value works when creating a bundle value either directly with the name from the bundle, or with a symbol that names the type from the bundle. Seems inconsistent, and undesirable. Patched in assignIncompat.

Note that all bundles are made public, just like nested packages. Should they be? I don't see any particular reason.

070118/Thursday
symboltable.z working for a minimal test.

Check to prevent trying to index an array whose element type's length cannot be determined because there is a "ref" bundle parameter type in it. DONE - can't declare such an array/matrix.

Check out that bool flag in record descriptors. Is it used at all? LATER: it is only used for a run-time check on record constructors. I've changed Types.z/Types.c to actually set it when appropriate.

070120/Saturday
Z => Uni ??? UNIversal, UNIque, UNItary

070126/Friday
<><> Have a standard representation for a text input history. This would be used for shell input history, browser form input history, etc. It would be persistable, thus acting as .bash_history as well as browser-preserved form history. To make shell history work better, introduce the concept of naming shells. E.g. "name gd1". The name would be used when preserving and restoring the history, could be used in the icon/tab name, etc.
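Here is one possible in-memory shape for the shared text input history from 070126, as a hypothetical sketch. It keeps a bounded ring of lines tagged with the shell's name (e.g. "gd1"). The fixed capacity HISTORY_MAX and all names are assumptions, and persistence is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HISTORY_MAX 4   /* assumed capacity; a real one would be configurable */

typedef struct History {
    char name[32];               /* e.g. the shell name used when saving */
    char *entries[HISTORY_MAX];  /* ring buffer of owned copies */
    size_t start;                /* index of the oldest entry */
    size_t count;                /* number of valid entries */
} History;

void HistoryInit(History *h, const char *name) {
    memset(h, 0, sizeof *h);
    strncpy(h->name, name, sizeof h->name - 1);   /* stays NUL-terminated */
}

/* Append a copy of line, discarding the oldest entry when full. */
void HistoryAdd(History *h, const char *line) {
    size_t len = strlen(line);
    char *copy = malloc(len + 1);
    assert(copy != NULL);
    memcpy(copy, line, len + 1);
    if (h->count == HISTORY_MAX) {
        free(h->entries[h->start]);     /* the oldest slot gets the new line */
        h->entries[h->start] = copy;
        h->start = (h->start + 1) % HISTORY_MAX;
    } else {
        h->entries[(h->start + h->count) % HISTORY_MAX] = copy;
        h->count++;
    }
}

/* back == 0 is the most recent line; returns NULL if out of range. */
const char *HistoryGet(const History *h, size_t back) {
    if (back >= h->count) {
        return NULL;
    }
    return h->entries[(h->start + h->count - 1 - back) % HISTORY_MAX];
}

void HistoryFree(History *h) {
    for (size_t i = 0; i < h->count; i++) {
        free(h->entries[(h->start + i) % HISTORY_MAX]);
    }
    h->count = 0;
}
```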
<> Maybe have an icon on a bar somewhere that represents in some way whether anything has the sound system active. E.g. a slow pulse, or perhaps a more noticeable pulse when the sound system is activated.

<> When producing a minimized "object file" for some use, such as for a standalone device, can go through all packages that are referenced, and write out versions that include just the items that are needed. Of course, for such an "object file", we really only want the native object code and the referenced variables and addressable constants.

070127/Saturday

<> Some thoughts on a retro-computing "Empire", triggered by Chris T's running it on his old NeXT. Everyone will be running Z, so there is no reason not to have a full GUI for the interface. Perhaps also have a command-line version for those who want the full retro effect. Make it *very* scalable, i.e. hundreds, perhaps thousands of players, with 1/10 connected at any given time. The BSD Empire is likely too complex for the average potential player to just jump into. It's too complex for me, jumping in mostly cold, and I've spent many hours playing our old Peter Langston Empire. So, options in the server, set on world generation, could control features:
- no nukes, uranium mines, or anything to do with nukes/satellites
- no oil wells, or need for oil for production
- no food, or need for it to keep things running
- no "ucw"s, whatever they are
- no weather, forecasts, weather stations
- no plague, research stations
- everyone has the same tech, so no technology centers
- no flying, planes, airports, aircraft carriers
- no subs (maybe this one not worth it)
- no ships/ports at all - a land game only
- no treaties (don't seem much use anyway)
- no trading (isn't used much anyway)

070201/Thursday

Note that in symboltable.z, I cannot currently create the table, or expand the table, inside the generic code. This is because the table entry types contain a value of generic type.
However, in this case that value is a pointer, so perhaps I can allow the creation, using the "matrix" construct. Oops, no. The matrix creation is currently allowed, but the construction of the SymbolTable_t is not. That seems to be required, since we don't have an actual instantiated type to attach to the newly allocated record. But I should be able to grow the table. The code currently allows this, but it has the same issue - what type pointer do I attach to the newly created matrix? Clearly it must be attaching the generic matrix type, but that does not let me properly do refcounting or garbage collection.

To be able to do the creates properly, what is really needed is to pass the BundleInstantiation_t into the generic routines, so that they can extract the appropriate types. That could be done as a hidden initial parameter to all routines exported from a bundle. The compiler would then need to generate run-time code to extract the needed type from that instantiation, and pass it to the matrix and record constructors. A job for compile-time code?

070202/Friday

The existing allocation *could* work just as is. The decRef code and a garbage collector would simply have to get the type pointer from the actual dynamic object whenever they encounter a non-ref bundle param type. Am I willing to go in that direction? Well, util_freeRef, which is the top-level entry point for freeing something, does not take a type parameter - it obtains it from the pointed-to region.
- cannot allow the construction of a record which contains a field of a "ref" bundle parameter type - we do not know the size to use.
- all of the values used in the constructor must have the same base value (similar to the proc and its parameters in an allowed call). I think that should provide the same protection - there is still no way for a programmer to put together values from different instantiations of a bundle.
- cannot skip past array/matrix indexing when determining the base of an expression.
The array elements can, in the generic case, be created from different instantiations of that type in the bundle.
- if I add "ref" variables to the language, then they cannot be used as a common base for the proc call or constructor cases (or any other that might come up). The reason is that, by their nature, the value referenced by a "ref" variable can change, and it can change from being a value from one instantiation to one from another instantiation. Thus, a common "ref" variable base is not enough to prevent the combining of values from different instantiations.
- this is likely already there, but when checking that values have the same base, there can only be one non-proc field selection that is used from a given record, in the entire set of values being tested.

070203/Saturday

The "ro"-ness of records declared in a bundle needs to mean something else. In the "detail" and "printable", etc. examples ("containers"), the "ro" means that generic code (dealing with the generic types) cannot construct or modify the "ro" records, while code dealing with the instantiations can construct and modify them. However, for examples like the "symboltable" one, it is code within the procs in the bundle that can create and modify the records, and no code outside of it should be allowed to.

One possibility is that records defined in a bundle are *always* "ro" from the point of view of code outside of both the procs in the bundle and the procs in an instantiating package. The "ro" then just controls which way the attribute applies between procs in the bundle and procs in an instantiating package. That could be confusing. Perhaps invent a new reserved word, like "internal"? Need a better one, that expresses the condition better.

I wonder if I want to split up the two methodologies? The term "bundle" was invented for the "bundle" constructed from an API and a generic value. I briefly thought here of going back to something like the old concept of "type package".
Perhaps "generic" instead of "bundle"? Hmm. Perhaps the current "bundle" becomes "generic", and the reserved word "bundle" is used on record types within a "generic" that are not buildable/changeable by code within the "generic".

070205/Monday

Something to watch out for (and it is already a problem for array/matrix bounds arrays, I think): currently the elements of a matrix are not read-only, even if a record field pointing to that matrix is read-only. So, I need to do something about that if I want to switch over to using matrix values for things like the arguments for proc calls. The reason for wanting to do that is that matrixes take less memory than an equivalent linked list.

One possible solution is to make the "ro"-ness of a record field extend to the values referenced by that field. Currently, if an "ro" field is an array or struct, then the entire array or struct is "ro" (hmm, do I do it right when going into the fields of such a struct - can a struct have "ro" fields of its own?). I could extend the record field's "ro" one level into the values referenced by it. Then elements of a matrix would be covered, as would fields in a referenced record, even if that record itself is not "ro" and its fields are not.

What about going even further, and having the "ro" extend throughout the whole possible set of values reachable from the field? If I did that, it likely means that that whole set of values becomes immutable: you can't pass an "ro" value to a proc that doesn't have the parameter so marked, so even procs in the package that exports some record type could no longer modify the fields of the record. [Resolved - 'private' matrix attribute]

Some simple syntax changes that should make the language better:
- the use of "public" to indicate that a symbol is exported from the defining package should be replaced by "export". It's used in the verb sense, rather than the adjective sense. DONE
- the above frees up "public" to be used for a lot of the current "ro" uses.
In particular, its use on record and oneof types should be replaced by a scheme which uses "public" in that position. Note that this reverses the default - now the default is non-public. There is no explicit "private", however - that's a pain for a pretty-printer (the structures would need to remember whether there was an explicit "private" or whether it was implicit by the absence of "public"). DONE
- the current "ro" becomes like C's "const" - it indicates something that cannot be assigned to. Note that ref parameters, and the new ref variables if I add them, are always "ro" in the sense that you cannot change the ref (the pointer itself). So, the "ro" on a ref parameter or variable applies to that which is being referenced. This use of "ro" instead of "const" is because I'd like to keep the term "constant" unambiguous.
- I can keep "volatile" the same as in C.
- I want to make en-reffing and de-reffing fully explicit. So, when you pass a value to a proc in a ref parameter position, you must precede the value with the '@' en-ref operator. Similarly, when you declare and initialize (with '=', I think, would be best) a ref variable, you must put an '@' in front of the initial value. When using either kind of ref value, you must use a trailing '@' to get at the value being referenced, either to fetch from or to assign to the location - note the paragraph above on "ro". DONE
- I think the declarations of ref parameters and variables should switch to using a leading '@' instead of the word "ref". This is a bit icky, I think, but is consistent with the use of '*' for pointer types and the dereferencing operation. That also suggests that the address-of operator should be a prefix '*' instead of a prefix '&'. I think that should all be fairly unambiguous during parsing. Hmm, it will require a smidgen of look-ahead, since a statement starting with '@' or '*' could be an assignment statement or a declaration. (Wrong - it can only be a declaration.)
An advantage of this is that I can now freely use the term "ref" as an abbreviation for "reference", and speak of "ref types" - those types whose run-time representation is a pointer, but that isn't an explicit user-manipulable pointer (records, oneofs, matrixes). ['@' done. Did not change '&' to '*' for pointers.]
- as I mentioned to Roel the other day, I've realized that there are (at least) two distinct ways that I'm using what I currently call bundles. One use (for generic "containers" that can contain generic values mixed in the container) is the use for which I originally chose the term "bundle". The other use (so far!) relates to sharing code by having procs which are run-time generic, in that they can work with values of multiple types, but only with consistent sets of values in any given call. In this model, I've been putting the procs themselves inside the bundle, since the proc declarations contain uses of the generic type parameters named in the bundle header. I would like to change things so that the term "bundle" (whether as a language reserved word or not) is restricted to the first usage, where it makes sense. So, I need a general replacement for the current use of "bundle". The problem I have with "generic", or anything close to it, is that it is an adjective. "bundle" is a noun (OK, it can be a verb too, but the language has been using it in the noun sense). So, I've been talking about bundle instantiations. The meaning I want doesn't come out of "generic instantiation". Instead you have to use "instantiation of a/the generic", which is longer, and doesn't compress well into programming language identifiers. So, I'm seeking an alternative to "bundle" that is a noun with a meaning strongly related to "generic". [Gave up - I use 'generic' as a noun.]
<> URL for the thesis behind Erlang, on reducing errors in large projects: http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf

070206/Tuesday

Reply from Don:

    The nominal form of "generic" is "genus". (See [1], sections 9.4, 6.12.)
    Hey, an instantiation could be called "species"! :-)

    [1] Reble, "The R programming language" (Aug 5 2006), RSC press,
    IBSN 0-13-114977-1.

    >> My dictionary hasn't that meaning. It has, however, this (third)
    >> meaning: "having no particularly distinctive quality or application".
    >> That rather suits the average programmer's work, n'est-ce pas?
    > That sounds right. And "genus" is still a relevant nominative for that
    > definition?
    No.

Thinking about "tool". Checked online thesaurus: gadget? Neither really expresses the genericity. Perhaps make up a word, like "gener". "extensible"? Still an adjective. "omni"? "gener" is good except that it is also short for "generator". "omni" is pretty much "universal".

070207/Wednesday

Could allow the syntax '[' [ '::' ] ']' to be a string character-select and string substring operation. [Later]

<> Could use the syntax '$' to mean a semantics much like property selection in MUD. A *run-time* search is done, up the package tree from the current position, for a variable with the given name. In a reference, the value of the first one found is used. A default value or something is yielded if none is found. An assignment to such a variable persists the value in the most local package. Need to declare the type of these somehow. Perhaps a value in the package declared as an initialized value (which I don't have now, but need, for things like files in the package) is a findable value. Such a declaration without a given value is a declaration of the variable, mostly giving the type, and a name to look for. E.g.

    type PrefConfig_t = struct {
        uint count;
        string name;
    }
    ... some other package ...
    /PrefConfig_t $PrefConfig = {10, "fred"};
    ...
    /PrefConfig_t $PrefConfig;  /* Yes, same name! */
    ....
    if $PrefConfig.count = 10 then ... fi;

Handy in "shell scripts" would be a syntax like $(), which interprets what is inside the parentheses using the currently set interpreter, typically the CLI.

070209/Friday

Might want to allow string concatenation to include character values. That would go nicely with the above. Alternatively, the substring syntax above could simply produce single-character strings instead of "char" values. [Both done, but different substring syntax]

070211/Sunday (and earlier)

Add a ref type kind. Let errors come out based on type issues with it. Much easier and cleaner to implement than trying to check directly for misuses of ref values.

An iteration ability would be useful in CLI scripts.

<><><> Could have a "short doc" (or specific "ls doc") in packages. Show it at the top of an "ls" listing.

<><> When a user starts typing a piece of code directly in a CLI window, what happens? Could be done similarly to the AmigaMUD client. As with that, it would be very nice to not require an explicit "proc" around the code. As with AmigaMUD, have the prompt change when successive lines of a proc or immediate statement input are needed. Could actually allow direct editing within the region of the CLI window that contains the code being entered (growing downward/scrolling as needed). Not quite sure what the magic event is that triggers the execution of the code. There should also be a key that opens a new tab in the window, just for editing that piece of code. What happens in the CLI window then?

070216/Friday

Issue at work was overflow in unsigned integer arithmetic. The fix was to add parens to force the evaluation order. In A * B / C, may want to do the B / C first, else the A * B could overflow. How to do this in Z? Later: I think it just works - I don't re-order the tree, so the lack of an explicit representation of parentheses won't matter, because I will generate them based on the operator precedence. I hope!
070218/Sunday

<> %%% It is going to be nearly impossible to get rid of refcount fiddling on records, etc. Sigh. I had been thinking about having an attribute on types, related to "ro", and related to something being not writeable in this package, that lets me know that the memory that an entity occupies cannot be changed by this code, and so no ref-count work is needed. However, what if the application is multi-threaded? Then the other thread(s) could be doing something that affects, say, a linked list that is being traversed. If we don't do an incref when our local variable references a hunk of memory, that memory might get freed out from under us, and possibly be re-used, thus perhaps leaving us with an invalid pointer to something that we are allowed to modify. Can there be some kind of attribute that says that this cannot happen? The lack of "volatile" says it in some sense, but how could that allow the system to do things such that the guarantee can be made? Note that re-doing things as matrixes rather than things like linked lists doesn't really help, since an array of references is just as vulnerable to the elements being freed. A matrix of structs is OK, however, since there is no pointer involved.

Perhaps some sort of per-process lock on the memory stuff. If that is set, then any refcount operation that would free something must instead simply add the pointer to an array/list of such, which are then freed when the lock is released.

Argh. Prefix '@' is ambiguous - it can start a type or it can be an en-ref operation. Handled that.

070219/Monday

When taking a reference ('@') of something like a local variable, it's true that the value can never be nil. However, when taking a reference of something like a field of a record, the record pointer itself can be nil. So, I think I need a nilchk instruction that can be applied to the record pointer, before any offset is added to it.

Need to prevent the use of ref types as struct/record/etc.
fields, as elements of arrays/matrices, as proc result types, as the argument for 'eval', ... DONE

070225/Sunday

A matrix type can be viewed as similar to a pointer type: one value, the pointer, references one or more other values. So, it makes sense to allow 'ro' and 'volatile' attributes on the elements of a matrix. It does not make sense for an array, since the field or variable that is of an array type specifies those properties. So, I could have:

    [] ro volatile uint

as a valid type. The elements of the matrix cannot be modified, and must be loaded/stored using the volatile attribute. This idea can resolve the problem from 070205 where I want to have matrixes of values in Exec_t's instead of using linked lists, but I don't want to allow external code to modify the elements of the matrix.

Since the elements of the matrix are fully 'ro', it would seem that even the code that creates the matrix cannot change it. However, the attributes are part of a type, not part of the matrix itself. So, one creates the matrix without the 'ro', using perhaps a local variable to refer to it, and fills in the matrix that way. Then one assigns that matrix to an 'ro' matrix destination. That assignment needs to be allowed, in exactly the same way a corresponding assignment of a pointer value needs to be allowed. For example:

    * uint p0;
    * ro uint p1 := p0;     /* Legal - 'ro' attribute is being added. */

Similarly:

    [] uint a0 := matrix([count] uint);
    ... fill in a0 ...
    [] ro uint a1 := a0;    /* Legal - 'ro' attribute is being added. */

These need to be handled in "assignIncompat". [Much later: 'private', 'ro' and 'volatile' are there.]

<> On a completely different subject, I woke up with a conceptual image of parallel distributed computing in my head. Nothing earthshaking. The basic idea is a container representing a chunk of work to be done. The chunk is mostly self-contained, and only needs to interact outside of itself in limited ways.
My mind thought of the container as a balloon for some reason. What gets stuffed into the balloon are virtual machines on which the work can be done. There are communication and interaction channels among the balloons. The key is that they are among the balloons, not directly to any of the computation resources inside the balloons. As the computation proceeds on the resources in the balloons, it can require input from other balloons or send messages to them. Those inter-balloon connections are the only ways that the various mostly-independent computations communicate. Nothing is said or indicated about the physical location or connectivity of the resources inside the balloons. That means that the inter-balloon communications can be over long-distance links, or directly via shared memory. 'Nuff said - nothing earthshaking.

070228/Wednesday

Testing more of the bundle test programs, after changing "ref" to '@' in proc headers and inserting '@' as needed. All ref parameters in APIs currently need to have "ro" after the '@', since the call of those API routines is happening in generic code, which has no write access to the generic values. Can I do anything about that, and if not, how big a limitation is it?

Getting tsp mismatches on bundle/test1.z and test/listtest.z. Chasing that, I've found that in test/ctime.z, I'm getting error messages concerning the run-time types returned by proc test3(). The problem is that it is returning a tik_exec, and things like Exec/BinaryStart are not skipping a tik_exec node. Should they? Should I strip it earlier, when declaring variables (similarly for the result type of "weird1()")? Should Types/SkipOneName skip a tik_exec? [Later: tsp errors were caused by a structure mismatch for Proc/FormalList_t between the Z and C versions.]

070301/Thursday

I think the answer is that any number of tik_exec's should be stripped when a value is obtained from a declared symbol (variable, parameter, field, etc.) or is produced via a cast (frombits).
Basically, remove them all right at the source. Note, however, that they need to be preserved in things like declarations, code, etc. so that pretty-printing gets the correct expression for them.

070304/Sunday

<> This thought is from a couple of days ago. Do I actually need to push ref values on both stacks? Don't I know what kind of value is in use for each instruction, and so can push/pop from the right place? One issue is that of addressing proc formals - if they can be on different stacks, it gets a bit messy. Still handleable, though?

070305/Monday

<> Odd-ball thoughts on "eye candy", prompted by a Slashdot article today. Could have a "Lovecraft" theme. The main image on the desktop is a head, with a bunch of moving worms for hair. Various worm heads, or perhaps tears in the skin, are clicked on to perform actions. The fun comes when you insert hardware. E.g. when you insert a CD/DVD, a big worm comes out of the mouth, grabs the disk, and pulls it in. Perhaps with blood. A digital camera on USB could be the left eye, and a web-cam the right eye. Etc.

test1.z (current GUI bundle test) doesn't work with the new '@' ref stuff. There is a "*** wantInc still set in bcComp" for setSize, after a pair of complaints about assigning through a ro '@'. Well, those latter are correct - without the 'ro', I get type compatibility errors, presumably because the value I am en-reffing to pass to an API proc is ending up as 'ro' in the context I am using it. Also get a pair of "Exec error #134" for "createSLMD". First, chase the 'ro' need. Am I just hosed?

070311/Sunday

<> %%% It may not be possible to have generic code which reads bytestreams and turns them back into arbitrary structures. This is because doing so can allow the violation of consistency constraints within those structures. Consider Exec_t structures. They *must* be built via the published Exec/ interfaces. Consider what could happen if someone was able to modify the bytestream (e.g.
opening it and modifying it on disk), between when it was created and when it is turned back into internal structures. Possibly there could be a flag/attribute on the type itself that prevents this. Possibly a function with a "well known name", which, if it exists, is used to vet every re-assembled element. The Exec/ one could simply always say "no".

070312/Monday

After various fixups in the test1.z program, I have no "public"s in the bundles, and only one error message:

    test1.z(358, 41): *** Value is not compatible with parameter 'lm'

The problem here is that the SetSize interface requires write access to the layout manager, in order to set the size it will use. In this context (Container/Create), we don't want to directly create the LayoutManager, since we want to allow our caller to pass whatever kind of layout manager they want. But, since this code is in the Container Package/Bundle, it does not have write access to the LayoutManager. The actual line in question:

    lm.lm_api.lma_SetSize(@lm.lm_manager, width - 2, height - 2);

is fairly safe, since it is passing the LayoutManager to a function obtained from the LayoutManager's API. Hmm. That, pretty much by definition, should be allowed write access. How can I detect and allow that? The "samebase" stuff can tell me that I am in this situation, but then I need to get that info into the Exec/CallAppend code (actually, into the callParamCheck code). Hmmm.

With my new '@' parameter, the error is coming back as "type mismatch" - the code already special-cases "moderr_privateRecord". So, I need a moderr_roRef - wait, there is one - is it the right thing? Looks like it! Bah - after passing that check, which is only for "modifiable" for an old-style ref parameter, we still call assignIncompat, which is where the error is coming from.

And now, after getting rid of the 'ro' on the lma_addWidget API, I now get three errors. Ok, adding a parameter to "assignIncompat" that allows for non-'ro' makes it all work.
Too tired right now to check if that all really makes sense.

070313/Tuesday

These are raw notes from paper, written over the last couple of weeks. Some are no longer relevant, others are still quite relevant.

1) Add ref type, and use. Let errors come out based on type issues with it. Much easier and cleaner to implement. [Later: hah!]
2) iteration ability could be useful in CLI scripts
3) Could have "short doc" in a package. An "ls" of the package shows that at the head of the listing.
4) What happens when start typing code in CLI window? Could be similar to AmigaMUD client. Nice to be able to not require user to put proc around it. Should be able to allow edit of code, even multi-line, right in the CLI window?

IsRefType => IsAllocedType [DONE] Any use of "ref" is too confusing (e.g. IsRefCountedType). "indirect" doesn't work, since it could also apply to the new ref types and pointer types. Don't need new IsRefType, since disallow naming of new ref types. Also, disallow as field, array element, etc. DONE

<> Maybe something like threaded code for native code? Perhaps use a reg to point to a struct of RT routines? How far can a PC-relative JSR go? Can I put RT routines near zero in virtual address space? Could then call them via a zero-relative address.

Bundle generic, instantiated with a struct, actually has fields "owned" by two different packages - the bundle one and the instantiating one. Who can write them? This affects the need for 'ro' in the API funcs. Generic code cannot change the instantiation type, so it can always have write access. So, how does the instantiator init their fields? Hmm, by creating, since the bundle exports the generic record type publicly? If they don't want it public, then perhaps require extra indirection to an instantiating record?

A lot of checking is easier with '@' not being a type. E.g. special handling of 'ro' for bundle instantiation. If keep en-ref Exec explicit, then could simply require that the arg to an '@' param must be an en-ref.
But, then what is the type of an eik_enref? Sigh.

IsAllocedType => IsTrack{able|ed}Type DONE

<> Buffer/Tab/Window
- Menus at top relate to active buffer, as do key bindings, including menu shortcuts, mouse stuff, etc.
- New frames get new colour for tab and border (when active)
- Menu/key to convert active among buffer/tab/window
- Can move tabs back & forth - colours stay with the tab contents
- Manual ability to change frame colour - menu
- Key to tab among tabs, others for buffers, windows. Like emacs, can use tab-completion on name

Can any code, anywhere, call a bundle API proc on its containing record? Why not, if not? What happens if the bundle types (API & record) are not even exported from the package? Write up the consequences of these, along with the rules for bundles. At least try, then can compare that vs the implementation. Remember - bundle types cannot be "public", since that would allow assignments to the fields, and thus the "breaking of the bundle".

Don't really want the "allowNonRo" param on assignIncompat. That allows true 'ro' ref targets to be modified. Really want to avoid making the type 'ro' ref in the first place, based on the nature of the API call.

070405/Thursday

[Capilano mall display took a week of time.]

<> Can save a lot of space if one email is allowed to directly reference another. So, if someone replies, including a full copy of what they are replying to, can just refer to the original copy of the replied-to email, assuming that it has been saved somewhere. Track with reference counts, and maybe convert to direct inclusion if the original is deleted.

070406/Friday

<> The sending computer should hold the personal information of an email (etc.) sender on that system. A service (a "whois" service) could run on that computer that responds to requests for that info (e.g. when receiving an email, the reader could click on the sender to ask for more info). That service would be subject to abuse.
So, for a private computer, have the service simply refuse to do anything for 5 minutes after a request for info on a non-existent user. This should effectively prevent scanning for personal information using random email ids.

An email server has a similar problem - people trying random ids to email to. A large enough server can't just stop responding like that. However, it could keep track of IP addresses, and ranges of them, that have been trying to email to invalid ids, and subsequently delay responses to them. Note that it has to delay successful responses too, else the lack of a quick response would tell the prober that the probed id is invalid.

070408/Sunday

Perhaps the concept of "private" should be more explicit in the system. Have a type node for it. There is no explicit syntax to yield a value of such a type, but incorrect uses of generic or instantiated values could perhaps yield better error messages based on such a node. Alternatively, the "ro" attribute as used now is the correct thing to complain about, because it is explicit. Hmm. I already have the error message:

    Cannot assign to private (here) record field 't_data'

070413/Friday

<> From comp.arch:

    From: "David Kanter"
    Subject: Re: 32-bit vs. 64-bit x86 Speed
    Newsgroups: comp.arch
    Date: 13 Apr 2007 01:10:34 -0700
    Organization: http://groups.google.com

    I'd also point out that in the case where increased memory pressure is a problem, some environments and languages have added support for compressed pointers - i.e. they force the system to use 32b pointers, but can access all 16 architectural registers. Java in particular uses this optimization, so that the extra performance from the registers is 'free', so to speak.

    DK

070415/Sunday

Need a treatise on '@' types: what the options are, and what they imply for the implementation.

070416/Monday

<> Could use run-time evaluation (or maybe not even needed) to have a way of inserting native assembler code into routines.
Perhaps it is as simple as "/.../Exec/Asm(string instruction)", which simply inserts an Exec_t into the proc. That string is not interpreted until native-code generation time, if that ever happens. It pretty much has to be delayed until then, else anything it references is not known yet.

070419/Thursday

<><> Linked to from Slashdot: Law of font kerning: The Law of Optical Volumes states that the area between any two letters in a word must be of equal measure throughout the word, and remain consistent throughout the body of text.

070420/Friday

<> Thinking about the ZFS stuff Dale presented at work. The "uberblock" is going to be rewritten an extremely large number of times. Even with disks automatically remapping blocks, I think that is going to be risky. Perhaps there should be N (e.g. 100) blocks, each at the beginning of a separate physical track (since I think that disks remap to spare blocks at the end of the track). On mounting the file system, read all N blocks and run with the one with the largest sequence number. When writing the uberblock, cycle among the N locations. If the latest one that was actually written cannot be read on a mount, then the system will go with the readable one with the largest sequence number. A consequence of that is that no block that is "freed" can be re-used until N writes of the uberblock have happened, else an old uberblock activated this way could be referencing overwritten data.

Dale didn't think that ZFS has special support for small files by putting the file data directly in the header, instead of the normal header stuff for a file. I think for Z's object store, something like that would be required, since Z will have lots of quite small persisted objects.

070425/Wednesday

<><> For top-level "programs" that can be run, the argument could be a oneof with two alternatives. The first alternative is used when the program is called from the command line, and can be the command line tail, or some processed result of it.
The second alternative is used when the program is called via the GUI. It could be like what the Amiga used - an array of paths to files given as arguments. Those files can have their own override values for properties, and each should be used as appropriate. This usage would only be when the user selects a bunch of data icons, and then starts the "program" with them selected. If the user simply double-clicks on a data icon, then the code associated with the top level "type" of that icon will be started. As a concrete example, double-clicking on a drawing should bring up the drawing editor. But, the user may wish to simply do something else to a set of drawings, so selecting the drawing files, then double-clicking on an explicit "program" file, allows for this. <> With really fast graphics, one could have windows that are coming to the foreground do it by becoming less transparent, over a one- or two-second interval. Sort of a "fade in". 070429/Sunday An interesting smallish project would be to use conditional compilation in bcRun.c, to turn off all of the checks. Compile all of the C code with -O3s or some such. Compare the run times of digits.z both ways. Then do a straight C version of it, both with no optimization and -O3s. Compare them all. [The # stuff is there now.] 070506/Sunday This is a very brief thought at this point, and may lead nowhere. A function in an API record could be one that returns a pointer to a contained or containing type of a generic value. Different instantiations of the generic type will be laid out differently in memory. A function such as this would thus move forward or backward different offsets for different instantiations, but always return a pointer/reference that has some related significance to the generic value. This is really vague! Something like the run-time casts in C++ perhaps, but without run-time checking needed.
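One way the conditional compilation of the checks could look, as a hedged sketch: every run-time test in the interpreter goes through one macro, and building with a hypothetical `-DBC_NO_CHECKS` removes them all for the timing comparison. `BC_CHECK` and `bcIndex` are illustrative names, not the actual bcRun.c code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: wrap each interpreter run-time check in a macro so the
   whole set can be compiled out, e.g. with -DBC_NO_CHECKS, for a timing run. */
#ifdef BC_NO_CHECKS
#define BC_CHECK(cond, msg) ((void)0)
#else
#define BC_CHECK(cond, msg) \
    do { if (!(cond)) { fprintf(stderr, "bcRun: %s\n", (msg)); abort(); } } while (0)
#endif

/* Example checked operation: index into a vector of words. */
uint64_t bcIndex(const uint64_t *vec, uint64_t len, uint64_t i) {
    BC_CHECK(vec != NULL, "nil matrix");
    BC_CHECK(i < len, "index out of range");
    return vec[i];
}
```

Keeping every check behind one macro means the checked and unchecked builds differ only in a compiler flag, which keeps the run-time comparison fair.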
The validity of the "cast" would be determined instead by whether the particular API contained such a function pointer. 070507/Monday In getting rid of the concept of "ref" parameters (replacing it with the more explicit '@' parameters), I started getting some errors from listtest.z about "Differing '@' types". That's because the code in assignCompat isn't skipping over a tik_named node when comparing the types under tik_ref nodes. However, while chasing that I noticed that the "tptr" values for the two PointerDesc_t records were not the same. One pointed to the actual record type, and one pointed to the tik_named for it. This is because in this situation I have one PointerDesc_t produced by C code, and one by Z code. The Z code one had the tik_named skipped, because the Z code generator does that when generating record constructors (in turn the bytecode routine for record constructors asserts that it is given a tik_record type, and does not skip a tik_named). The C code, however, is just assigning the result of "predefRecord" into the TV_PointerDesc slot, and thus has a tik_named. This doesn't really matter, but they should be consistent. To make them consistent, the routines like "predefRecord" should return the type being named, and not the naming type. However, some of the early very-specific setup code in Types.c must be modified to no longer skip over the now non-existent tik_named node. 070509/Wednesday I may want to go back on the above. Sigh. Having the named form of the type be pointed to by tptr's allows debuggers, etc. to know the name of the type. Doing this will require adding a Types/SkipOneName call into the bytecode's "recordConstructor" code path, in order to find the record type itself. Since record construction is already expensive, this ought to be acceptable. Lots more editing to do! 070522/Tuesday Should now be done with the '@' stuff. You can have a simple variable, but it must be 'ro', so you can't change what it references.
Fixed up all needs for the 'ro' in Z code (and builtins!) See 070519-PolyGen. 070527/Sunday Actually finished up the '@' stuff. Started in on "generic" packages. Don't allow declaration of any variables, fields, etc. of types exported from generic bundles. Also be paranoid and make them not type compatible with anything outside of the context of the use of instantiated ones as actual parameters to procs exported from generic bundles. Watch out that people can grab direct references to procs and types from inside bundles. Just catching bad uses of uninstantiated forms at the point of reference isn't good enough. Package and Exec declarations, plus field, etc. uses in Types can catch all type uses. Proc uses can be caught mostly in Call/CallStart. Is there a way other than at runtime with procassign to catch references to the generic procs? In bundle/symboltable.z, which is currently a polymorphic package, I'm not allowed to set the table entry count to 0, since the table record is private to the instantiator. Interesting. symboltable.z can't be "generic", because it wants to have utility procs inside the bundle, and you can't call uninstantiated generic procs, even inside the bundle that defines them. Should this be allowed? I'm thinking that symboltable.z really ought to be generic, not polymorphic. 070601/Friday Hit a problem in "symboltable.z": there is a scope for the 'while' loop in SymbolTable/search. The scpin and scpout are OK. But, the "incref" is done once per trip around the loop, while the "decref" is only done outside the loop, where the scpout is generated. Putting a 'for' loop in the middle of reftest.z/deep, we see the same problem with 'for' loops. Also, the disassembler is again getting local variable names wrong in there. There was also a problem with decref versus decref2.
The test in eik_return of whether a type is trackable has to be based on the type of the exec of the current scope - it used to be based on the type of proc's result type - that is only correct for the outermost scope. Disassembler fixed - needed more generality when scanning over things to check for a return sequence. "while" loops fixed. "symboltable.z" fine now. Arghhh! I thought I did "for" loops. Now mand.z dies on its rtsv, saying "returning to nil proc?". Hmm. Doesn't seem to have been the "for" changes. Aha - inc/dec of "cc_forDepth" moved inside bcGenFor. Ok, can I now get back to generic bundles? 070602/Saturday Starting towards more "generic" stuff. Yesterday I wondered about all of the effort to support '@' types in a general way - could I restrict them to make it easier? The problem is that a couple of the nice uses of them I've already made are just those that require all of the extra support. E.g. in bundle/test1.z/SimpleLM/ Create, the use of "slmd" local variable. Another is the use of "en" in bundle/symboltable.z/search. Thinking today about some way to have compiletime procs in a bundle, that work on the bundle parameter types. My aim is to find a way to allow code defined in the bundle to work at the level of an instantiation of the bundle, so that it can do things it otherwise could not. Hmm, the proc doesn't need to be in the bundle - it can be in the package. 070603/Sunday I've re-enabled the test about referencing proc's in the uninstantiated form of generic bundles. But, I've added the check that the active bundle is not the same as the bundle containing the proc. This allows calls from within procs in the bundle, but not otherwise. However, in my variant of Create in symboltable.z, where it is generating a call to "Init" after creating code to create the SymbolTable_t, this is a problem. 
The code, even though it is defined in the bundle, is executing outside of it, in the context of its caller, and so the code it generates to call Init hits this check. What the code needs to do is to call the instantiation of Init. But, how can it do that - it doesn't have access to any instantiation information, even though it itself is an instantiated proc? Resolved the above by putting pointers to the active compiletime proc and instantiated proc into the Proc/Context_t. Works. [Ahhh, that's where that came from - after bundles were removed, it was removed too.] Need to prevent use of uninstantiated generic types, just like we prevent use of the procs. I think I need to prevent matrix construction of a matrix whose element type contains an uninstantiated generic type. I think I went through this before and somehow decided it was OK if the element type was an '@' generic. But, don't I want the allocated matrix to have the correct type, which is the instantiated type? Make sure you can't return '@' values from a proc. DONE <> For lower-level programming of a sort, could have a run-time system flag, like the one in Draco's RTS, that allows constructors to return nil. For efficient lower-level programming, it would be nice to allow something approaching pointers. Perhaps an intermediate level (or the top level) which allows dereferencing pointers, but not the creation of them? What I would like is a way to do a lot of stuff with pointers (or '@'s), in a way that requires no incref/decref work, but that is still safe. If it is '@'s, then the code is better because it requires no run-time check for a nil pointer. Can I somehow relax the rule of not having proc's able to return '@' values, so that trusted low-level procs can actually do so? For the symboltable.z setup, I want a way to allow the generic code to grow the table. But, it cannot modify the generic SymbolTable_t to assign a new matrix to it. Also, it should not be able to allocate the new matrix (see above). 
Perhaps the trick is to have some kind of __BundleInit__, which is run at instantiation time, and is passed the generic type (what if there are more than one?). It could then create a 'grow' and attach it to the table or something. It could also create a "Create", which it can attach to the instantiated SymbolTable_t, so that it can be used: PairTable_t MyPairTable := PairTable_t :: Create(30); Checked for the reference to generic types, but in both that and the proc case, can't just ErrorSub a full valid reference, else the naughty caller can just skip that and get at the "valid" reference which should not be allowed. Ok, cleaned up - I just use Types/Error or Error(). But now bundle/test1.z doesn't compile. [Fixed] 070604/Monday Expanded symboltable.z to have an instantiation with "any". Added the tests in Exec/assignIncompat to allow tracked type to be used as "any". <> Thinking about a key type of "any". Privileged code could have a routine which extracts the actual type from an "any" value, then uses Types/ExportFind on that type to find a hash function. Perhaps there could be a non-privileged construct that allows the combined action. For example, a variant of ExportFind could take an "any" value instead of a type. Is even that a security violation, in that it might again allow the caller to get at a type definition that they should not be able to get at (it is private to its package)? I've gotten sidetracked. I noted that in several places the syntax "type blah = " is commented out and replaced with "Types/Type_t blah = ". Looking into this, I see problems when the type being named is the result of compiletime execution, and so the result is a tik_named over top of a tik_exec. For example, in Exec/TypeSymbolRef, I've needed to add a Types/SkipExec in the lookups. However, there are still more issues, such as assignment compatibility ones. It's a bit ugly to have that eik_exec underneath the tik_named all the time. 
I need to preserve the structure of the actual declarations, however. listtest.z runs as expected now, using "type" instead of "Types/Type_t". There is still one unfortunate issue, in that you can't save a pointer to the created "New" proc in a proc variable - the types don't work. Perhaps the safest thing to do, in Exec, is to just recurse through the proc's exec tree re-doing all of the construction calls. If any errors are generated (the error count goes up), then don't generate code. This is more failsafe than trying to keep Exec/ProcCheck up to date. DONE 070605/Tuesday Yesterday I ended up putting back all of the SkipExec stuff, in the form of a Types/SkipNameAndExec routine. I dislike doing that. The reason it is needed is that the Types/NamedDesc_t which names types often ends up pointing to a tik_exec type. I think the presence of the Exec_t for the type is only needed in order to save away and pretty-print the code. But, right now, with a declaration in a package, the package element only contains a pointer to that NamedDesc_t, so the tik_exec must be kept. Similarly an eik_typeDeclaration contains just the name for the type and a Types/Type_t for the type itself. Both of these representations would have to grow (along with the bundle one too), in order to contain both the with-tik_exec form and the without-tik_exec form (or even just the without form and an optional raw Exec/Exec_t). That would be akin to the way an Exec/UintConstant_t preserves the input form of the number, etc.
LATER: they are used in the bundle/symboltable.z test program, in which the symbol table Create routine is a compiletime routine, and needs access to the instantiation because it needs to create stuff in the context of that instantiation. [Much later: NOW GONE.] Beware recursion when walking type structures. I think in current code this can't happen (haven't checked), but beware it. In types generated by something like the Lists package, there can be circular references that do *not* go through a tik_named node. Is that even valid? Don't I need a NamedDesc_t to indicate the package/bundle within which the type is defined? If a Lists list is used as part of a type within a bundle, how does its instantiation reflect where it is instantiated? [Much later: all types that could possibly reference themselves are now always named.] Hmm. Perhaps some of the parameters to Types/NamedNew should actually be taken from the ctx, rather than allowing them to be passed in directly. But, watch out for instantiation - it likely needs the current interface, but it can be private to package Types. RESOLVED Must not let a proc instantiated from a generic bundle be assigned to a proc variable/field/etc., since then there is no way to check the parameters passed to it, relative to its instantiation. The type of its parameters is not sufficient here - there is the added constraint needed on the actual parameters. [Later: I don't follow.] RESOLVED Argh! A Proc/ProcInstantiation_t has a reference to the proc that is being instantiated. It is of type Proc/Proc_t. So, there is nothing preventing a user from assigning that to a Proc/Proc_t variable, and then calling the pointer at runtime using "procassign". Perhaps the run-time checking that procassign does (which up to now has been very minimal - just compare the pointers) needs to be expanded, so that it will not allow a proc from a generic bundle to be used. 
Heck, why not even push it all the way to disallow any proc from a bundle - easier to test. And, there is no real need to allow that to happen - in legitimate situations the user can always write a wrapper routine and use a reference to that. DONE. Interesting: you can't put a direct reference to a proc as the second value for "procassign". That's fine, since it is pointless other than for a test of the above, but was unexpected. Exec.z/ProcAssign simply does a direct comparison against Proc/Proc_t (TV->TV_Proc in C code). Annoying issue with ctx_activeProcInstantiation - there needs to be a stack of them. I can do that with another linked list in Proc.z, or I can slacken up the tests in Proc/SetActiveProcInstantiation so that the Exec.z code from proc parameters can save/restore the current around any parameter handling. This leads me to think of the other stacking that is done - that of pushing/popping active variants. Compiletime code has access to the Proc/Context_t and so can call those push/pop routines, I think. That lets it effectively "cast" within a oneof. I think the only answer here relates to what I mentioned last Monday - I really should just rebuild the entire tree instead of using the current ProcCheck. When doing that, do *not* re-execute any compiletime procs. So, for now, just relax the checks in SetActiveProcInstantiation, since they don't hold water anyway. Later: one thing that I've observed that is caught by the current ProcCheck only, is if you pass a local variable to a compileTime routine, in a parameter position that is not Exec_t. In my case it was a uint local variable and a uint compileTime proc parameter. [080412: I can find no use of ctx_activeProcInstantiation anywhere. There is a test relating to it in Exec.z/InstantiatedProcRef, but that test is not present in the C version, and the error code number (245) is used for something else entirely, in both C and Z versions.] 
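A sketch of the tightened procassign test, assuming a proc record that carries both its type and a pointer to its containing bundle (NULL for ordinary package procs). The struct layouts and `procAssign` are hypothetical; the type comparison is a plain pointer comparison, like the minimal check described above, with the blanket "no procs from bundles" rule added on top.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the tightened procassign run-time test: besides
   comparing the proc's type, refuse any proc that lives inside a bundle.
   Struct layouts here are invented for illustration. */

typedef struct Bundle Bundle;
typedef struct ProcType ProcType;

struct ProcType { const char *sig; };   /* stand-in for a proc type */
struct Bundle   { const char *name; };  /* stand-in for a bundle */

typedef struct {
    const ProcType *type;    /* the proc's signature */
    const Bundle   *bundle;  /* containing bundle, or NULL for a package proc */
} Proc;

/* Returns 'p' if it may be stored in a proc variable of type 'want',
   otherwise NULL, which the caller treats as the "no match" branch. */
const Proc *procAssign(const Proc *p, const ProcType *want) {
    if (p == NULL || p->type != want) return NULL;  /* just compare pointers */
    if (p->bundle != NULL) return NULL;             /* any bundle proc: refuse */
    return p;
}
```

A legitimate caller who needs a bundle proc in a variable writes a small wrapper proc in a package and assigns that instead.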
From my walk to supper: for generic bundles, I should just be typechecking the proc calls against the instantiated proc type, rather than the generic form of the proc. Then I don't need anything else special when checking calls to procs from generic bundles. Underway next day. 070617/Sunday The above-mentioned way of doing type-checking seems to work fine. An issue that came up, however, is that in Exec/callParamCheck, we can't always have a Proc/FormalList_t available. That means that we have to rely on the ppl_symbol in the Type/ProcParamList_t. That's often fine, but Types/sameParamLists ignores the symbols, so we can end up with Types/Normalize yielding a proc type with different symbols for the formal parameters. That is a good reason to make it pay attention to the formal parameter symbols. I'll switch to that, at least until I can recall why I didn't before. :-( Well, one reason is in polymorphic bundles - the procs provided for an API all need to have the same name for their formal parameters as in the declaration of the API record. This is icky. <> When I do a final Mapping package, do the following: - track the instantiations at compile time, like the Lists package does, so that duplicates are avoided - if requested to do so (optional to save space) provide iterator functions for iterating over the keys and the values - if requested to do so provide iterator functions attached to the produced mapping type, which can be used by the "iterate" construct - perhaps export SymbolTable_t as a half-instantiated form of the mapping, where the key is 'string'. 070618/Monday <> Right now, bundle/mapping.z is getting a code-generation error - bcGetSize can't determine the size of a type. The type is Types/Error, so that is expected. But, we shouldn't even be trying to generate code. The problem is that the errors have been in bundle instantiation, etc., and there were no errors in the routine being compiled ("inquire"). 
So, I think that the Proc and Exec code should examine each type used in a declaration, to see if it contains "Error", and increment the Proc/Context_t error count if it does. No need for an error message, but prevent code generation. Add a routine Types/ContainsError to do the check. I don't know whether it needs to be able to go past named types or not - just looking at the named type itself may be enough. Maybe not. Ick. Working on mapping.z some more. The reason I can't construct the SymbolTable_t api record is that with a "generic" bundle, the relaxation of type compatibility is disabled, and so the type of the actual routine for the instantiation is not allowed in the generic API. Wait - the API type should be instantiated. Hmm. Ah - types SymbolEntry_t (the instantiated type from the instantiation) and Entry_t end up with the same definition, but since they have different names for that definition, they are not compatible. 070619/Tuesday Note from the other day: get very poor error message (the one about having code outside of a proc) because of symbol confusion when the bundle name is the same as the name of the package it is in. Prevent that. 070629/Friday When a proc is instantiated, only its header is actually copied and re-created. Thus, all references to types from its containing bundle will be to the generic forms of them. This extends to run-time, where things like matrix allocation, etc. will use the uninstantiated types. A record constructor is not allowed, but a matrix constructor is, currently. If the code is to be shared, then there must be no references within it to the generic form of types defined in the bundle. That means no references to such types at all, within the code itself. The proc header is fine, unless references end up referring to the generic types at runtime. Have to go check all of the references that the code generator needs to associate with the proc. Essentially, cannot allow allocation of a generic value. 
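A possible shape for Types/ContainsError, written in C over an invented, much-simplified type representation (`TypeKind`, `Type` and `typesContainsError` are hypothetical). It looks through the top-level named type once and then stops at any nested named node. Since all types that can reference themselves are named, that also guarantees the walk terminates. Whether stopping at nested names is enough is exactly the open question above, so treat this as one option, not a settled design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of a ContainsError walk over a simplified type tree. */
typedef enum { TK_ERROR, TK_UINT, TK_NAMED, TK_POINTER, TK_RECORD } TypeKind;

typedef struct Type Type;
struct Type {
    TypeKind     kind;
    const Type  *sub;     /* TK_NAMED / TK_POINTER: the type underneath */
    const Type **fields;  /* TK_RECORD: NULL-terminated field types */
};

/* 'throughNames' is true only for the outermost named type; nested named
   types are assumed to have been checked when they were declared. */
static bool containsError(const Type *t, bool throughNames) {
    switch (t->kind) {
    case TK_ERROR:   return true;
    case TK_UINT:    return false;
    case TK_NAMED:   return throughNames && containsError(t->sub, false);
    case TK_POINTER: return containsError(t->sub, throughNames);
    case TK_RECORD:
        for (const Type **f = t->fields; *f != NULL; f++) {
            if (containsError(*f, throughNames)) return true;
        }
        return false;
    }
    return false;
}

bool typesContainsError(const Type *t) {
    return containsError(t, true);
}
```

A declaration whose type makes this return true would bump the Proc/Context_t error count without printing a message, which is enough to suppress code generation.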
Ever, including inside procs inside the bundle. The '@' support may come in quite useful for an iteration construct. Suggest "iterate over do od". Is there some role for bundles here? Is the iteration facility somehow a generic thing, rather than requiring a bunch of support in the compiler? There could be one (or several, for different purposes) iteration API that is looked for on types, rather than a bunch of individual procs. %%% Be careful with adding proc's to the export list of types. Keep in mind that doing that is not a privileged operation, and so entries put there by some library code can be replaced by another programmer. Are there implications of this wrt bundles and instantiations? This thought came up when thinking about mapping.z, and keeping an "alloc" function on the export list of the mapping type created by the instantiation. If multiple occurrences of the same instantiation end up as the same type (via Types/Normalize), one could look for "alloc" on the type after normalization, and assume it was from a previous instantiation with the same instantiating type. Is there any issue here? 070701/Sunday The current thing I don't like about how the mapping.z package is shaping up is the need to generate code to do the allocation. By the nature of bundles, it has generated the matrix type for the hash table contents. There just needs to be a way to get access to that type during the execution of the generic procs, in such a way that we can be sure it is the right type. Of course that means that the allocation is not being done from a constant type at run-time, but so long as we are sure that it is the proper type, that really doesn't affect the speed of compiled code (access via a variable versus a constant). Hmm. Can I make it a parameter to the "Insert" routine? Can I also make the exported Insert be a compiletime proc, to hide that detail from callers? 
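To make the "one iteration API looked up on the type" idea concrete, here is a C sketch in which a record of two function pointers is all that the compiler would need to find. `IterApi`, the toy `IntVec` container and `sumAll` are hypothetical; the single static iterator state is a simplification that makes this version non-reentrant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: one iteration API record attached to a type, instead
   of separate first/next/done procs the compiler must look up individually. */
typedef struct {
    void *(*start)(void *container);          /* returns an iterator state */
    bool  (*next)(void *state, void **item);  /* false when exhausted */
} IterApi;

/* A toy container: a fixed array of ints. */
typedef struct { int *items; size_t count; } IntVec;
typedef struct { IntVec *vec; size_t pos; } IntVecIter;

static IntVecIter intVecState;   /* single static state keeps the sketch short */

static void *intVecStart(void *container) {
    intVecState.vec = container;
    intVecState.pos = 0;
    return &intVecState;
}

static bool intVecNext(void *state, void **item) {
    IntVecIter *it = state;
    if (it->pos >= it->vec->count) return false;
    *item = &it->vec->items[it->pos++];
    return true;
}

const IterApi intVecIterApi = { intVecStart, intVecNext };

/* Roughly what an "iterate" loop over a container might lower to. */
int sumAll(const IterApi *api, void *container) {
    void *state = api->start(container);
    void *item;
    int sum = 0;
    while (api->next(state, &item)) {
        sum += *(int *)item;
    }
    return sum;
}
```

The compiler would only need to look for one `IterApi` on the type's export list, which also narrows the exposure noted above about replaceable export-list entries to a single slot.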
The compiletime code would find the instantiated type within the context of the executing procInstantiation. Then, at the compile level, it all comes down to having the matrix allocation construct accept a proc parameter. Hmm. But how do I safely let the compilation of matrix allocation know that that particular Types/Type_t value is the appropriate matrix type? 070702/Monday Some links via Burton Smith via The Register, for parallel programming and parallel programming semantics:
http://www2.cmp.uea.ac.uk/~jrwg/Sisal/
http://www.cs.cmu.edu/~scandal/nesl.html
http://portal.acm.org/citation.cfm?id=360224&dl=ACM&coll=portal
http://64.233.183.104/search?q=cache:Y5GxmmwZ1b4J:research.microsoft.com/users/lamport/pubs/ghl.pdf+The+Hoare+Logic&hl=en&ct=clnk&cd=4&gl=uk
Thinking, on a walk, about the issue of allocation inside procs in a generic bundle.
- want a way to have the actual type, for a running instantiation, to be available to the generic code. Then, it can safely allocate using that type. This could be done via hidden parameters, a magic API-like record that the language/system maintains, etc. But, I don't like implicit things - I greatly prefer them to be explicit. An idea is to have slightly strange syntax of some kind, that allows a constrained type value to be put into an API record. It is constrained to be a consistent (with the rest of the API) instantiation of the type that needs to be allocated (e.g. a matrix type for the hash-table form of mapping.z). It could be declared, as a field of the API record, using a syntax that looks sort of like a type declaration. E.g. "type matrixType = [] Entry_t". When the API record is constructed by an instantiating package, that field must be filled in with the instantiation of the required type, e.g. InstantiationName.MatrixType_t or some such.
- the above may well be doable, but it then occurred to me that it might not be necessary at all.
The issue I've been concerned with is that of ending up with chunks of storage that are typed (as in the type pointer within them) by an uninstantiated generic type. But, is that really a problem? This can clearly only be done when all involved generic types do not contain an '@' bundle parameter type. So, each generic type can be treated as an "any" value - the item it points at will have its own proper type. So, there really isn't an issue. In fact, it's probably OK to allow record construction of such generic types. This would be good, since it makes the interface to generic code like, say, a binary tree bundle, nicer, in that the allocation of a tree node can be done within the generic code. The value "nil", or an actual instantiated value of the non-'@' bundle type parameter, is a fine value for such a field in the generic record. Part of the key is that there can't be matrixes or arrays containing '@' bundle type parameters, nor can generic code construct a record value containing such a parameter. In both cases the required length is not known. In the matrix/array case, indexing is not possible within the generic code because the element length is not known. 070703/Tuesday <> A downside to not having the correct type on things is that if pointers to those things are assigned to "any" variables, and then tested for type with an "assign" construct, they will not have the proper instantiated type, but will have the uninstantiated form. That would prevent code from using them as the proper instantiated form, since then there is no way to go from the bundle parameter type to the actual instantiated type for non-'@' types (and '@'-bundle parameter types are irrelevant here). So, it seems the language extensions, as ugly as they may be, are the way to go. Another (aside from the use of the instantiated type within the API record in a matrix constructor) could be to use such a constrained type in a record constructor within the generic code.
Further thought: in a bundle like "mapping", the type "[] Entry_t" isn't a named type in the bundle. So, currently it won't get explicitly instantiated. I think it needs to be. Perhaps when compiling a bundle, I need to track all types that are derived from any bundle type parameters. Keep a list of them. Then, instantiate all of them when the bundle is instantiated. This could be instead of the current list (of just the user-named types) that is so instantiated. [Resolved in generics] The bundle instantiation itself currently exists at runtime (like everything else does). So, one way to handle all of this is to have code generated for bundle procs get the instantiated type from the matrix of instantiated types in the bundle instantiation. Depending on just which types are ever referenced at runtime (still need to look through the byte-code generator to see when types are referenced directly, but note that one of them will be any direct reference to a type in the code, where it is explicitly being used as a value by the code), it may or may not be possible to make this a more explicit thing, rather than this fairly implicit operation. Admittedly, the implicit form will certainly be easier for programmers to use - they don't have to be aware of what is going on with the instantiated types. If this vector of instantiated types is used, there needs to be a way to access it at runtime. This could be via a reference to an in-memory copy of it, created like an initialized array of types by the compiler. Or, it could be something that is, at least for the byte-code engine, set into the active execution Context_t via a new kind of subroutine call opcode that extracts it from the ProcInstantiation being called. That requires that the reference in such a new subroutine call (only direct needed - you can't have an indirect reference to an instantiated proc) be to that ProcInstantiation_t rather than to the normal Proc_t. Shouldn't be an issue.
When is the previous such value restored? It ought to be on a call return (rts/rtsv), but it might work out better to do it after a normal return. That way, the calling code can stick to a normal call when appropriate. I think the special call is only needed when calling an instantiated proc in a different bundle instantiation from the caller. Hmmm. This suggests explicit instructions to both set and restore that type-vector pointer, rather than a new call instruction. Where do they save the old value? On the stack it would seem, but that would be an issue, since it needs to be done after all parameter preparation, but before the actual call, which would mess up the FP offset to the parameters. Ick. Right now, types are used in byte-code a lot more than I thought they were: matrix/array indexing, record constructors, matrix constructors, and all 'r' variants of push/pop instructions. I am still using pairs on the type stack (and it's still called a type stack). If all allocated chunks of memory carry their own type pointer, I shouldn't need the type part of the pairs, I should only need the pointers, since the actual type can be seen by looking at the pointed-to entity. However, there might be an issue. During bytecode execution, an address into entities can be on the stack. One example is when indexing into a matrix/array. Hmm. Those operations do not push anything onto the type stack - they only pop 2 words. But what if I have an array of arrays? It all seems to work out. So long as bcComp.c is consistent with bcRun.c, the instructions are interpreted properly. So, that leaves me wondering about all of the stuff pushed onto the type stack - is it all popped properly? So far it seems to be. There will be a type vector created for each instantiation of a bundle. So, a fixed static element will not do. In fact, it is only at the transition from outside of that bundle's procs to inside them that the particular instantiation is known.
To avoid the problem of being in the wrong instantiation during evaluation of the actual parameters, we can't switch to a proc's proper type vector until just before the call to the proc. And, we need somewhere to save a pointer to the old type vector for that bundle (we may have been in the context of some other instantiation - I think that is possible), in a way that supports recursion. So, that pretty much has to be on the stack. As mentioned above, we need to be aware of the effect on parameter offsets of any extra value pushed. So, perhaps any proc exported from a generic bundle assumes such an extra value on its call. Or, even more, it simply takes that value as an extra parameter to its call, and thus we do not need to save previous values - they all have their place in the stack frames. This should be do-able, and doesn't need any new instructions. 070704/Wednesday %%% Noticed that I don't seem to allocate space for package variables in package "/". Could be a special case where I can't. Got a working mapping.z, by just loosening some tests. So, that verifies that the above will be useful, if it works out properly. Noticed that test program .../Src/C/parse/parsetest.z gets an error - a reference to freed memory! 070714/Sunday Took a while (plus busy on other stuff) to figure out the problem. It is a fairly significant oversight on my part. It relates to '@' parameters, and how I use incref and decref to handle cases where I don't want the larger entity containing something being '@'-ed to go away during the function call that is referencing that something via the '@'. To clarify, consider a situation where there is a package variable that is of a record type. A proc is called with the '@' of a field within the current value of that package variable. The risk is that an assignment to the package variable, while the proc is still active, could free the old value, thus turning the '@' parameter to the proc into a dangling reference. 
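The extra-parameter approach might look like this in C: each proc exported from a generic bundle takes a pointer to its bundle instantiation's type vector, and allocates through it, so every allocation records its real instantiated type. All names (`TypeDesc`, `MappingTypes`, `Matrix`, `mappingAllocTable`) and the entry sizes used below are assumptions for illustration. Because the vector arrives as an ordinary parameter, recursion and calls between different instantiations need no save/restore.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of passing a bundle instantiation's type vector as an
   extra, explicit parameter to every proc exported from a generic bundle. */

typedef struct {
    const char *name;
    size_t      size;       /* size in bytes of one value of the type */
} TypeDesc;

typedef struct {
    const TypeDesc *entry;  /* the instantiated Entry_t */
} MappingTypes;             /* one vector per bundle instantiation */

typedef struct {
    const TypeDesc *type;   /* every allocation records its real type */
    size_t          count;
    unsigned char   data[]; /* count * type->size bytes */
} Matrix;

/* Generic code: it never names a concrete type; it allocates using the
   instantiated type from the vector its caller passed in. */
Matrix *mappingAllocTable(const MappingTypes *tv, size_t count) {
    Matrix *m = malloc(sizeof *m + count * tv->entry->size);
    if (m != NULL) {
        m->type = tv->entry;
        m->count = count;
    }
    return m;
}
```

Two instantiations of the same bundle simply pass different vectors, and the resulting chunks carry their proper instantiated types, which avoids the "any"/"assign" downside noted on 070703.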
My fix for this was to arrange to do an "incref" instruction on the old value of the package variable before the proc call, and a "decref" instruction after the proc had returned. A concrete example (not using a package variable, so in truth the code generator could be smart enough to not do the incref/decref, provided no '@' of "r10" is passed to the procs):

    type R10_t = record {
        string r10_s;
        uint r10_n;
    };

    proc test10a(string s; @ uint ro pN)void:
    corp;

    proc test10b(@ uint ro pN; string s)void:
    corp;

    proc test10()void:
        R10_t r10 := R10_t("hello", 20);

        test10a("fred", @r10.r10_n);
        test10b(@r10.r10_n, "barney");
    corp;

Here is the disassembly of test10:

    Disassemble: proc 'test10':
    0000: scpin ref<76>
    0003: pshtr "hello"
    0008: psh1 20 (0x14)
    000a: reccon R10_t
    000d: poplr r10, offset 0
    0012: pshtr "fred"
    0017: pshlr r10, offset 0
    001c: incref
    001d: pshflda r10_n, offset 24 (0x18)
    0020: jsr test10a
    0023: decref
    0024: pshlr r10, offset 0
    0029: incref
    002a: pshflda r10_n, offset 24 (0x18)
    002d: pshtr "barney"
    0032: jsr test10b
    0035: decref
    0036: scpout ref<76>
    0039: rts

Note the differing placement of the "incref". The code is invalid in the first call, because the pointer copy (to the r10 record) that incref leaves on the tstack is thought by the byte-code machine to belong to the string parameter. So, as part of proc exiting, a DECREF is done on it. The proc entry matches that, thus leaving the r10 value with no ref count. Worse, however, is that the "decref" instruction ends up working on the string constant. That's OK if the string constant is marked with a useCount of 0, because it should never be freed, but not if it's a dynamically allocated string. What is the solution to this? The problem comes down to situations where an '@' proc parameter is not the first parameter, and other parameters before it are trackable values. One possibility is to have a weird instruction that shuffles the tstack in the right way.
Or, "incref" can have a constant operand telling it how to shuffle the stack. This would let us evaluate the parameters in the normal order, and only once, but then end up with the extra tstack values all effectively pushed first, so the actual proc parameters are in their proper places. This has some implications for eventual native code generation. Another possibility is to effectively generate hidden temporary variables in the calling proc. Save a copy of vulnerable pointers to them, and then assign nil to them after the call. [This was done] A third alternative is to simply disallow the calls. Require the programmer to explicitly use temporary variables. E.g. test10 would have to be:

    proc test10()void:
        R10_t r10 := R10_t("hello", 20);
        uint temp := r10.r10_n;

        test10a("fred", @temp);
        test10b(@temp, "barney");
        r10.r10_n := temp;
    corp;

I have a vague recollection that it actually used to be that way, when '@' parameters were 'ref' parameters. But then why did I have test program "bugtest.z"?

070727/Friday bintest.z seems to be not freeing much memory - the stats at the end show mostly 0's for "free". The "total" is similar to what I've seen on other runs that show mem stats. Something recently broke this. I've replaced "OptimizedExec" with "AlternativeExec", and added an enum to the values, indicating what general kind of alternative Exec they are. The plan now is to use the second alternative above, as a kind of AlternativeExec_t I've named "aek_callWithRef".

070730/Monday I've done the fix for the '@' parameters having trackable parameters after them. "bugtest.z" initially seems to work with it. Need to optimize a bit - if the source of a reference to a larger value which an '@' is used within is a local variable or parameter, then the protection is only needed if that same local variable or parameter is also passed, via '@' to the same proc call. I now get code-generation errors about bcGetSize when compiling Exec_Binary.z . Have to chase.
Also got incorrect results when running parsetest.z - could be commented-out stuff in .z sources, left over from when I was chasing the problem.

070731/Tuesday Fixed the fix for '@' parameters. If I change bcComp.c to not pass "z_true" to bcComp() for eik_enRef's, then I get the code I want. So, that leaves handling '@' local variables, which are the other use for incref/decref. The above optimization is there as well. The bcGetSize problem was due to trying to turn a struct into a simple value - fixed. However, I just realized that my fix to the fix isn't right for more complex cases. Hmm. Maybe it is. My concern is that the variable created might not point at the thing that is directly being '@'-ed for passing as the proc parameter - it might point at an enclosing compound, and so replacing apl.apl_this with a reference to the local variable is not correct. Proceeding with adding a test case to bugtest.z . Yep, it's not doing the right thing. The part of the tree for the actual parameter that is applied to the reference copied into the new variable needs to be applied to that variable, when it is used as the actual parameter. Actually, I think that the "ex" parameter on "saveOuterRef()" should be an '@' parameter. Then, when "saveThisRef" is called to create the new variable (again using '@' for the expr), the "ex" value is replaced with a reference to the new variable. Then we don't need to return anything from these routines. Done. Seems to work. What's left in this issue is to switch to the temporary variable technique for '@' local variables, then get rid of incref and decref, etc. altogether.

070802/Thursday A bit of quick work. parsetest.z seems to get the right values now - didn't change anything in it. Should do some cleanup of the old debugging stuff.

070810/Friday One problem with not freeing memory was a dumb error in the new FREE_REF macro - it didn't actually do the freeing! Fixed that and lots more stuff is now freed. But still some left in bintest.
Also, the row for size 72 shows 1 alloc and 2 frees. Trimming it all down, I got to a single 32-byte memory chunk being alloced but not freed. It turned out to be a ProcDesc_t. The types normalization code keeps hold of type pointers, so they won't be freed. But, you'd think there was already a parameterless proc with void result. Made the C version of Types/Normalize do a DEC_REF on the old type. Didn't help in this instance. 070818/Saturday In fun/snow.z I wanted to have a local flake_t record type that could reference itself. The compiler didn't let me. 070820/Monday Think about adding unit types. If they are properly used in system routines, they might prevent bugs without introducing too many problems. Maybe. 070827/Monday (Too much Lego work - Hoth for the Telus World of Science Star Wars) Perhaps the "destination" argument for all of the "print" formatting routines could be a bundle type. It simply has a cookie, and a routine to add a character to that cookie. A library-provided one could use a growing one-dimensional matrix of characters as the destination. Perhaps call the routines AscXXX, short for ASCII. It's not very clear, but it doesn't steal Fmt, Txt, etc. and it does stress that the facility is only used for formatting of simple ASCII text, with no "decorations". It also doesn't use "Print" which the casual user likely would assume relates to getting stuff out to a physical printer. 070916/Sunday (Way too much Lego stuff, mostly for Telus World of Science, but also some for the setup at Capilano mall for this year's GETS). An article on The Register points to security vulnerabilities in system call wrappers, that comes up on multiprocessor and multicore systems. But, once the idea is seen, they can also happen on single processor systems, if a context switch can be forced at the right time (e.g. by making the kernel page fault when fetching a system call argument value). 
I haven't actually read the papers, but what popped into my head is that the system call arguments are read by wrappers, and sometimes by non-careful system calls themselves, in some order. If the wrapper or system call has already vetted an argument, and you can change the argument in memory after that (from another context while the calling context is switched out), you can bypass the checking. For this, the solution seems to be to copy all arguments into non-user-modifiable storage *before* starting any of the argument checking. If the system only allows programming in Z, this problem is less likely, but can still happen if access to indirect arguments is available in more than one execution context. The URLs:

http://www.watson.org/~robert/2007woot/2007usenixwoot-exploitingconcurrency.pdf
http://www.watson.org/~robert/2007woot/20070806-woot-concurrency.pdf

070929/Saturday (Telus and GETS work took all my time. Slowly recovering.)

Don R.:

Chris: Recently, David Monniaux wrote a paper, "The pitfalls of verifying floating-point computations". It's available on the internet. Here's an excerpt.

--- begin excerpt ---
Consider the following source code (see Section 2.1 for the meaning of hexadecimal floating-point constants):

    /* zero_nonzero.c */
    void do_nothing(double *x) { }
    int main(void) {
      double x = 0x1p-1022, y = 0x1p100, z;
      do_nothing(&y);
      z = x / y;
      if (z != 0) {
        do_nothing(&z);
        assert(z != 0);
      }
    }

This program exhibits different behavious depending on various factors, even when one uses the same compiler (gcc version 4.0.2 on IA32):

* If it is compiled without optimisation, x / y is computed as a long double then converted into a IEEE-754 double precision number (0) in order to be saved into memory variable z, which is then reloaded from memory for the test. The if statement is thus not taken.
* If it is compiled as a single source code with optimisation, gcc performs some kind of global analysis which understands that do_nothing does nothing.
Then, it does constant propagation, sees that z is 0, thus that the if statement is not taken, and finally that main() performs no side effect. It then effectively compiles main() as a "no operation".
* If it is compiled as two source codes (one for each function) with optimisation, gcc is not able to use information about what do_nothing() does when compiling main(). It will thus generate two function calls to do_nothing(), and will not assume that the value of y (respectively, z) is conserved across do_nothing(&y) (respectively, do_nothing(&z)). The z != 0 test is performed on a nonzero long double quantity and thus the test [sic] is taken. However, after the do_nothing(&z) function call, z is reloaded from main memory as the value 0 (because conversion to double-precision flushed it to 0). As a consequence, the final assertion fails, somehow contrary to what many programmers would expect.
* With the same compilation setup as the last case, removing do_nothing(&z) results in the assertion being true: z is then not flushed to memory and thus kept as an extended precision nonzero floating-point value.
--- end excerpt ---

Gee, I sure hope Z doesn't have those pitfalls.

> Gee, I sure hope Z doesn't have those pitfalls.

Looking at it again now, I don't think Z will. The reason is that it only has one size of floating point number so far. Also, even if I introduce another size, I doubt I would evaluate (float) / (float) using the longer size, so I would always get 0.0 for that. If the programmer wanted the operation done in the longer size (if such were to exist), they would have to do it explicitly. That being said, my rules for evaluating expressions involving natural number variables shorter than "uint" don't necessarily reflect the above - I believe that I've been considering them to be evaluated "as if" everything is converted to uint, but leaving the option open for optimization on processors that have such operations as native opcodes.
In short, whenever you use a numeric value other than "uint", then you need to be very aware of what can happen, and that the language philosophy is that you are using such values only to match external constraints or to get extra speed where possible. That's pretty wishy-washy, I know, but my point is that I specifically don't want to specify the behaviour much beyond that. Question: so when would any sane programmer use such things, with such poorly defined semantics? Answer: when the size is determined by external constraints (e.g. hardware registers, a specified protocol, etc.) and a struct is needed to overlay such; or when the programmer knows that the Z system will compile his operations using smaller and faster native operations, and knows either that all values will fit, or that the resulting overflows are the desired result. 071006/Saturday Fleshed out the example mapping.z a bit - filling in the SymbolMapping instantiation. Noted that its create got an error on the proc result type. Using Exec's assignIncompat instead of the tests there fixed the problem. So, how much of what assignIncompat does should be done for a proc result expression. I believe *everything*, since one can always just declare a local variable of the exact proc result type, assign the result expression to that, then return the variable. So, I need to finally go and clean up assignIncompat so that it can be used in other places. If I recall correctly (it has been far too long!), I was thinking about it returning an enum of some kind. Hah! Went and looked - an explicit "return" statement already uses assignIncompat. 071008/Monday These are from paper notes. Some are fairly old, and thus are already dealt with. "capsule" Note that HTML is a markup language. It is akin to *fmt, *textform, Doc, troff, etc. CSS is like a stylesheet, just poorly designed. Javascript is like custom macros for the layout. Sort of. Arithmetic involving sized values (bits8 => bits64) does not check for overflow. 
Constant exprs convert to the appropriate type to make that happen properly. So that allows lower level programming and things like multi-precision arithmetic.

Test that something can implement multiple APIs. My thought is that there can be future GUI items, not yet thought of, that support their own API, along with a generic GUI item one, which supports creating, drawing, click handling, keys, etc.

The distinction between generic & polymorphic likely needs to be at the bundle level, so that all procs in the bundle are of the same kind. E.g. we cannot allow a proc that puts values of different instantiations into a collection, since then a "generic" routine would treat them incorrectly. Are "generic" and "polymorphic" good replacements for "bundle"? Perhaps not internally distinguished.

%%% <> It may be possible to get rid of the tracked stack (tsp). Knowing where the PC (or a return point) is in a proc's code should allow knowing what set of values that proc has on the stack. Trace back through the entire call chain to identify all trackables. Could be quite expensive.

Examine bc_incref and the decrefs. Are they useful for use with '@' local vars? Already to the increfs. [Ended up deleting them, using a scheme where I invented local vars to point at collectible things.]

It should be illegal to pass a value typed by one instantiation of a polymorphic bundle type to a proc parameter (etc) of a type from a different instantiation. It's OK if the proc is from the bundle, but it might not be. So, for both kinds of bundles, type check instantiated types (whether in proc params or not) strongly. The special checking for polymorphic types is only for the uninstantiated form. A proc called with such an uninstantiated type need not be from the bundle, or even the same package. How much can one use the uninstantiated bundle elements? E.g. can one *mix* instantiations of a polymorphic bundle? Not good. E.g. could build an API record with the wrong type procs in it.
Must use instantiated forms, in which case exact type matching should work.

I just can't see anything wrong in generalizing oneof's to allow multiple fields in both the variant and non-variant parts. Construct just like a record - the whole is still always pointed to. Could even have multiple variant parts within the whole, but the syntax gets ugly. [Later: see 080501]

C++ allows specification of storage to use when constructing an object. Could do the same in record/oneof constructors. The expression must be of the record/oneof type. There must be no refs (@'s) to within the item to be used in the constructor. [LATER: bad idea in general] If we generalize oneof, do they end up being a form of record? If an extended record with oneof is to be re-used through an alloc syntax, it must first be cleared as its old variant. This negates much of the savings. Not so bad for a straight record.

%%% Is there a way for a compiletime proc to insert part of itself into its caller? Making for a much easier way to write some compiletime code. Later.

%%% When running ProcCheck or something similar, want to do the check on the replacement part of any Exec/ModifiedExec. The original is only run once, at original parse time. Whatever it leaves behind must be checked, since that is what code is generated from. DONE

With that in mind,

    Proc/ScopeList_t InLine1;

    proc compiletime Save1(Proc/Context_t ctx)void:
        InLine1 := ctx.ctx_currentScope;
    corp;

    proc compiletime Use1(Proc/Context_t ctx)void:
        if true then
            Save1();
            BI/Print("hello\n");
        fi;
        Exec/SequenceAppend(ctx, InLine1);
    corp;
    ...
    Use1();

or something like that. [Strange.]

Generic Bundles

Have the needed instantiated types be in the API record. E.g.

    type api_t = record {
        ...
        Types/Type_t arrayType = [] Entry_t;
        Types/Type_t mappingType = record { ... };
    };

[Possibly "type" instead of "Types/Type_t"] Very strange syntax! Perhaps the nested stuff could be defined outside the API record and only names allowed after the magic '='?
To use them:

    elements := matrix([size] Entry_t = api.arrayType);

Another very strange syntax! (Is there something other than '=' to use?) (or maybe something like matrix([size] api.arrayType) )

    mapping := api.mappingType(api, elements, 0);

Looks like calling a proc field! When instantiating, the API record is built by the user. The magic Types/Type_t fields are not specified, just like compound types are not. Nothing is gained by requiring the caller to specify the types, since only one is possible anyway. Is there any way to not require the user to specify the fields in the API record, and not use them in the constructor calls? Would be nice, but I don't know how practical it is, given that the user has pretty much free control over what the API record looks like and how it is used. Also, that's an awful lot of implicit stuff. The implicitness also argues that the user should specify the type in the API record constructor.

Actual thoughts from today: Is there really a problem with having generic types in some allocated instantiation objects? You already can't have compounds of '@' bundle parameters, since the size isn't known, so you can't do things like allocate or index arrays of them. So, if you want things like arrays, you have to stick to fixed size things like trackables. In that case, the items will each have their own correct type pointer (which could of course be a generic type as well, but that's resolved in the same way). So, freeing these things and doing reference counting on them is not a problem. One issue is that of run-time type testing. For example, if I end up with an "assign" construct like the current "procassign" construct, then the testing at runtime for type equality becomes harder. Basically type pointers being the same is not good enough. If the types are both generic types, then they are not equivalent.
It seems there is no way to tell whether they are equivalent instantiations, since, by definition, the generic types cannot reference any type-instantiation record. Hmm. Even worse, types that are equal pointers and are not pointers to generic types cannot be assumed to be equal - they could be types that contain generic types deep within them. Ick. This would force any runtime type check ("assign" or "procassign") to be a full recursive search through the two types, failing if any generic type (from a generic bundle - types from a polymorphic bundle are OK) is found. Wait - what stops the user from having a global variable that is a compound which contains a generic type within it? Even nastier, is there a way for a user to make something containing a generic type be inside a capsule for a polymorphic bundle, such that they can be cross-used because of that? But, when defining the types needed within a generic bundle, I need to put the generic types within compound types. Is the saving grace simply the fact that you can't define file-level variables inside a bundle? That would be the only place you could have such a variable that in any way involves a type that is generic in the bundle. You can save a reference to a generic value to a file-level variable of type "any". You can also try to use it. However, as with any use of an "any" value, you first have to do a run-time type check. And that re-opens the issue above about doing such run-time type checks that might involve generic types. Is it a valid answer to say that you cannot reference any file-level variable from within a bundle? But then you must also not be able to reference any procs defined outside of the bundle, else you could pass a generic value to them, and it could save it in a file-level variable. That would force any generic bundle to be completely self-contained. It could not, for example, make use of any other bundle, or something like the Lists package.
Well, to be more correct, it wouldn't be able to pass any value containing or referencing a generic value to an outside proc. Hmm. I already have Types/ContainsBundleParamSubtype, which is likely good enough for that test. In any case, not being able to use procs from other packages is not very good. For example, in the current mapping.z, I use stuff from package Mapping in test package SymbolMapping. That isn't a generic bundle on its own, however; it is only an instantiation of the more general Mapping. Hmm. Given these rules on generic bundles, it probably is OK to pass generic values to procs exported from other generic bundles - they are bound by the same rules, and so cannot save away copies of anything. Note that the type of the destination in "assign", "procassign" is the static type of the destination variable. So, it will be the uninstantiated type if we are trying to do this inside a generic context. A simple answer is perhaps to deny using "assign" on any variable whose static type contains a generic. The same for "procassign". [Later, I ended up disallowing 'assign' and 'procassign' to variables whose type contains an uninstantiated generic type.] Actually, "procassign" doesn't appear to be an issue. You can't declare a proc variable involving a generic type outside of the bundle, and since you can't have file-level variables in a bundle, there is nothing that a user could do. What about using it in a polymorphic bundle to take a proc from one instantiation and use it in the context of another, since I think the "procassign" would succeed in that situation? The items in a bundle, even if they are exported, cannot be directly used:

    *** Cannot use uninstantiated generic type outside of its bundle
    *** Cannot use uninstantiated generic proc outside of its bundle

071014/Sunday Yesterday, turned on optimization for C/Hosted.
I now get the following error in C/bundle/test1.z:

    test1.z(195, 52): *** Have 'volatile '@' type, but need non-'volatile' '@' type
    test1.z(195, 52): *** Value is not compatible with parameter 'head'

(Later: it was a missing initialization in the C parsing code.)

071016/Tuesday Thought from earlier - is there really a use for an "assign" construct, and for the "any" type? I'm having trouble thinking of one. Perhaps if the interface for, e.g., extracting values from a byte-stream, is done as a proc, that proc would return an "any", and the check for having read an appropriate value could then be done via "assign". [Later: see around 0804** for thoughts on an Any package and Python.]

071021/Sunday A thought that came to me a while ago: if there are indeed any alien civilizations out there, and they know of us, perhaps one reason they have not contacted us is that they believe we are not ready. And, one reason for that is that we have no common "Terran" language. OK, so the above is a fairly lame reason to do something in a programming language. But, it is a reason. So, perhaps Z should have nothing but simple ASCII support within the programming language itself. I doubt I can justify going further than that. Of course, I can always put off supporting anything else nearly indefinitely. Too bad I already allowed for other languages in the Resource concept. :/)

071026/Friday Rename "tobits" and "frombits" to "touint" and "fromuint". I want the word "bits" for the types with named bits and bitfields. DONE For those types, define it so that the first specified bit is the high order bit in the unit needed. The unit needed is governed by the total number of bits in the bitfield setup. Doing it this way means that the first specified bit is the sign bit, and so can be tested fairly cheaply. DONE If programmers have persisted sets of bits, and now need to add more bits such that the previous size (e.g.
a byte) is no longer big enough to hold all the bits, then regardless of how I positioned the bits, the persisted stuff is no longer sufficient. With my choice, you have to convert by shifting the existing bits left 8 bits (to go from byte to 16-bit word). I don't count that a big issue. The benefit of starting with the high-order bit (the sign bit), aside from the speed of access to that one bit, is that the description of the semantics can be done fairly nicely, in terms of the uint value equivalent of various bit patterns - that stays the same regardless of how the machine numbers or interprets the bits. Programmers using the bits values to overlay hardware will need to know how their hardware deals with bits, but that is needed anyway.

071027/Saturday Changed "tobits" => "toUint", "frombits" => "fromUint", "procassign" => "procAssign" and "getbound" => "getBound". A bunch of experiments with Don, trying to break bundle stuff. Had to fix a couple of things.

071028/Sunday Try doing a couple of "sort" routines. One uses '@' parameters, and the API needs a swap routine. The other uses non-'@' parameters, and should be able to do the swap internally. Want to allow local variables of the non-'@' bundle param type, and we want to allow the assignments, so long as they are non-'@'. Also, within such a bundle, do we force the '@' params to be 'ro' for the values? How does a swap API element work? I don't think they need to be 'ro' - the lack of knowing the size of the values prevents us from allowing an assignment within the bundle code.

071030/Tuesday Could allow any 2-element enumeration to be used wherever a bool can be used. The first element is 'false', the second is 'true'. That would solve the problem we see at YY where function calls have multiple bool parameters, and a list of true/false values is not very descriptive. What happens if you don't want to name the enum type, since it will only be used in the one proc? E.g.
    proc doSomething(string what; enum {ds_weak, ds_strong} doStrong)void:

[The above now works, but it is a full enum type. Later: not any more!]

071030/Wednesday <><> When doing something like VNC, allow multiple input streams (mouse/kbd/etc.) to operate on the same session at the same time. Something like that could end up being a very useful (and kewl!) collaboration tool (collab?). How far could it be pushed? Reasonably straightforward is each input stream working on its own windows. Then allow the streams to switch windows, but only one working on a given window. Then push further and allow different streams to control different panes in one window. So, for example, if a window editing the code in some package is split into 3 panes, then 3 different people could be editing procs in the package at the same time. All would see all changes. [Went even further with later thoughts.]

071102/Friday Created Types/CheckBundleGeneric to try to prevent use of generic types outside of their bundle. I tried to make it search completely, and to keep a list of named types that were already in the search stack. It turns out to be very expensive to do that, since testing is happening over and over and over. Also, it appears to have gotten into an infinite loop of some kind, which I didn't pursue. So, any attempt to use full depth recursion through a type tree is not going to work. So, I killed the recursion by stopping at any named type. The intent would then be to call this new routine anywhere a type is being used, which could be a lot of places, unfortunately. If I can't recurse through NamedDesc_t's, I will need to check whenever I create one. But, before even adding any new calls other than the one in Exec/Type, I get into an infinite recursion in the Z code when running listtest.z . The reason, I believe, is that the Lists code creates self-referencing types that do not have a named node in the cycle. This has occurred to me before, but I haven't actually hit it.
Now I have. So, I need to put back the recursion limiting, but this time with Type_t nodes, rather than NamedDesc_t nodes, I believe. I don't see any particular reason why my other Type_t recursors won't need the same thing. Done the recursion stop in CheckBundleGeneric - listtest works again.

071103/Saturday Haven't been able to make the call of CheckBundleGeneric in Types/NamedNew catch anything. The one in Exec/Type always seems to be enough. It should be possible to bypass the latter by directly traversing the structures to pull out the generic type. Given that, the test in Types/NamedNew might fire, but may not be enough if I can use the extracted type without applying any name to it. ... I got it to fire. procAssign has a run-time check against getting a generic proc from inside a bundle. But, what if I directly construct a proc containing a call to such a proc, using the Exec routines? Presumably the check for calling a proc will issue error messages. Verified. Hmm. All of my new checking stuff is based on the ctx_containingPackage. The Lists code creates new contexts with a nil ctx_containingPackage. Why? Proc/CreateContext does not let you specify a ctx_containingBundle. So, when code like the Lists code is running, it is always running with no active bundle. So, it cannot be used to build types using bundle parameter types.

071104/Sunday When a type in a bundle has a Lists/SList in it (which gets errors when compiling the bundle, as above), you also get an error when compiling an instantiation of the bundle. This is pretty much for the same reason - we are trying to name a type that contains a reference to the bundle parameter type (that is what the Lists/SList is passed). Why wasn't that instantiated? Answer: because Types/Instantiate was just returning the original type for a tik_exec type. I've changed it to build a new tik_exec type, with the same Exec_t, if the created type, when instantiated, differs from the original in the original tik_exec.
Now I get the expected infinite recursion in Types/Instantiate, because of the record loop that Lists/SList has built. Recursion fixed. Can I do lists with a bundle? End up with type-safe generic code that is a lot like the old stuff on the Amiga. [Done. Seems great.]

071106/Tuesday <><> %%% Units. (As in km, mi, in, s, us, lb, kg...) The big question is whether to put them in, and how strongly. I like the idea of the additional checks. If we ever get to the point that computer software is basically stable in terms of the OS, languages, tools, etc., then that leaves the actual end-user applications. Some of those can be safety-critical. It would be good to have unit correctness checked by the compiler. But does it end up being an unworkable hassle? Possible syntax:

    unit(Distance) mile/mi;
    unit(Work) Joule/J = Watt * Second;

"Distance" is the type of unit - there are not too many of those. "mile" is the official full name of the unit. "mi" is the official suffix for the unit. Is there a well-defined international set of units and their names/suffixes? I hope so. The SI folks are where I would start looking. In source code, unit suffixes can be appended to uint and float literal constants. The actual type is then uint, sint or float, tagged with the unit. For input, there could be library routines which take a unit and a string, and yield a value converted to the unit. The input can contain any suffix of the same type (with optional K/M/G/Ki/Mi/Gi/etc.), and will be converted to the requested unit. If no suffix is present, then the value is assumed to be in the requested unit. If a suffix is present, then the value is scaled as appropriate (plus checking the suffix - it must be one for a same-type unit). For output, there needs to be some sort of routine that can accept values of any unit, and do the right thing. Compile-time output formatters might be the way to do this. How strict is this? Are unit suffixes required in contexts where a unit is needed?
If they are not there, how does the system or a reader know the scale of the unit used (e.g. miles versus kilometers)? It may depend on whether the expectation is for specific units or for unit types. "iterate". Can it be made a library routine? My thinking here is that there could be an entry point in Exec that is a "push user construct" thing. It pushes a description onto a stack of such. The new keyword pair "begin" and "end" denotes a block of code, which will be made available to the user construct compileTime routine. This happens when the "end" is parsed. It marks the end of the block, and the compiler automatically calls the "end-user-construct" routine. Possible example usage: bundle UintSList = Lists/SList(uint); UintSList.Head_t Head; ... UintSList/InsertHead(@ Head, ...); ... Utl/Iterate(it, Head) begin BI/Print("Value is " + BI/UintToString(it.it_this) + "\n"); end; Utl/Iterate gets a Context_t, along with Exec_t's for "it" and "Head". "it" must be an undefined local name, and Iterate declares it as a variable in a new scope, of type extracted from the type of "Head". It then extracts various procs from the type of "Head" to implement the actual iteration. E.g. "First", "Next", etc. Can the body return a value, thus allowing some of the strange accumulations in loops that I thought of for Alai? How does the pretty-printer know to format it the above way? How in fact is this represented in Exec_t records? Clearly it's an eik_alternative at the top, but what is under that for the "original"? The main compiler parser must expect and swallow the "begin", since I don't want user code to be able to swallow tokens. Perhaps the answer there is to add a new aek_ kind, e.g. aek_construct or aek_hasBody. [This is very close to the constructs that exist as of April 2010.] <> To allow procs to be exported on types in bundles, need a __bundleInit__ which runs at compiletime at the end of defining the bundle.
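The Iterate idea above, where the construct pulls "First" and "Next" procs out of the container's type, can be approximated in C with a table of function pointers and a for-loop macro. This is only a sketch under that assumption: `IterOps`, `ITERATE`, `UintNode` and `UintHead` are hypothetical names, with the field name `sl_this` borrowed from the examples in these notes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical procs a container type exports so that a generic
   iterate construct can walk it without knowing its layout. */
typedef struct {
    void *(*first)(void *head);
    void *(*next)(void *element);
} IterOps;

/* A C stand-in for the construct: the for statement opens a new scope,
   declares the iterator there, and runs the following body per element. */
#define ITERATE(it, ops, head) \
    for (void *it = (ops)->first(head); it != NULL; it = (ops)->next(it))

/* A singly linked list of uint values, standing in for UintSList. */
typedef struct UintNode {
    struct UintNode *sl_next;
    unsigned sl_this;
} UintNode;

typedef struct {
    UintNode *sh_first;
} UintHead;

static void *uintFirst(void *head) {
    return ((UintHead *) head)->sh_first;
}

static void *uintNext(void *element) {
    return ((UintNode *) element)->sl_next;
}

static const IterOps uintListOps = { uintFirst, uintNext };

static unsigned sumList(UintHead *head) {
    assert(head != NULL);
    unsigned total = 0;
    ITERATE(itr, &uintListOps, head) {
        total += ((UintNode *) itr)->sl_this;
    }
    return total;
}
```

The cast inside the loop body is the weak point: C cannot declare `itr` with the element type taken from the container's type, which is what the Z construct would do when it declares the iterator variable in the new scope.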
Also need to look through bundle types when instantiating, and any procs on them that are defined in the bundle should have their instantiated version added to the instantiated type. Types/AddExport should not allow exports to be added to a type defined in a bundle unless the execution context is still in the bundle. It must also not allow procs/types from within other bundles to be added. [Done] 071107/Wednesday Probably should track the instantiations of bundles, so that we don't end up duplicating them, e.g. in cases like the Lists bundle. Do I want a simpler way to instantiate bundles, so that the syntax could be more like: type UintSList_t = Lists/SList(uint); ? Would need something like a "returned" type from a bundle. 071112/Monday Trying to figure out how "construct" procs would work. I think I do need to know beforehand that a proc is a construct proc. If nothing else, I need to start a scope before calling the proc at compile time, so that it can declare variables. E.g.: export proc construct Iterate(Proc/Context_t ctx; string iterator; Exec/Exec_t container)Exec/Exec_t: corp; Processing would consist of: 1) note, in the parser, that the proc is a construct one. 2) before calling Exec/Call, parse the construct body. 3) pass the Exec_t for the body to Exec/Call as a new parameter. 4) Exec/Call stores the new parameter into the Proc/Context_t 5) Exec/Call calls, as normal for compileTime procs, the proc. 6) the proc can access the saved body Exec_t via ctx. 7) the proc does what it needs, including declaring variables in the new scope that Exec/Call has created. 8) the proc returns an Exec_t that is the entire construct Exec_t, and Exec/Call puts that in as the construct code in the Exec/Construct_t that it then builds. So, an example usage: type UintSList_t = Lists/SList(uint); type UintSListHead_t = UintSList_t.SHead_t; UintSListHead_t myList; myList := ... 
Utl/Iterate("itr", myList) begin Utl/Fmt("Value in list is ", itr.sl_this, '\n'); end; There are a few downsides to this scheme: 1) any messages emitted by the construct proc will come at the position of the "end" token. 2) you have to pass the iterator variable name as a string, since it is not declared yet - the iterator proc must do that. 071114/Wednesday It would be useful to have all dates in any kind of document be a full semantic entity of their own. That way, they are shown in a format that is chosen by the individual viewer on his/her machine (OK, except when you are using a projector), so there should be a lot less confusion. The hard bit might be in figuring out how to enforce this - do I have to pattern-match for dates when parsing anything the user types? 071116/Friday Could allow an extension to the record constructor syntax that allows values for a struct which is within the record. Hard to make it general and still be useful. [Later: see 080501] DONE Alternatively, have the Lists package export a compileTime New routine, which does not need the l_next value, but which accepts the fields of the struct. It could then decide what to accept and what not to, issuing error messages about things it doesn't like. Again, not very elegant or predictable. 071117/Saturday %%% Perhaps there should only be one "stack" of various things under the Proc/Context_t. It would be a list of oneofs. Each "pop" routine would then check that the top element is of the right kind. This could catch attempts to do invalid nesting. Possibly this is more important with constructs - crossing the nesting of those, or those with case contexts, could lead to issues - it's best to detect them right away. 071124/Saturday In Package/DefineTypePhase1 we don't allow for the simple duplication of a type name - we assume it was a predeclaration. Might be just an error! Currently, we get a SEGV out of this. FIXED. 
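The 071117 idea of a single stack of oneofs under the Proc/Context_t, where each pop checks the kind of the top element, might look like this in C. `NestKind`, `NestStack`, `nestPush` and `nestPop` are hypothetical names, and a real entry would carry per-kind data in a union rather than a single id.

```c
#include <assert.h>
#include <stdbool.h>

#define NEST_MAX 32

/* Hypothetical kinds of nesting a compile context tracks. */
typedef enum { NK_SCOPE, NK_CASE, NK_CONSTRUCT } NestKind;

/* One stack entry: the kind tag plus per-kind data (here just an id). */
typedef struct {
    NestKind kind;
    int id;
} NestEntry;

typedef struct {
    NestEntry entries[NEST_MAX];
    int top;
} NestStack;

static bool nestPush(NestStack *s, NestKind kind, int id) {
    assert(s->top >= 0);
    if (s->top == NEST_MAX) {
        return false;
    }
    s->entries[s->top].kind = kind;
    s->entries[s->top].id = id;
    s->top++;
    return true;
}

/* Pop only when the top entry has the expected kind. A mismatch means the
   nesting was crossed (e.g. a construct ended while a case was still open);
   it is reported at once and the stack is left unchanged. */
static bool nestPop(NestStack *s, NestKind expected, int *idOut) {
    if (s->top == 0 || s->entries[s->top - 1].kind != expected) {
        return false;
    }
    s->top--;
    *idOut = s->entries[s->top].id;
    return true;
}
```

Pushing a construct, then a case, and then trying to pop the construct fails immediately, which is the crossed nesting the note wants to catch.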
With Don, did the start of allowing for an "inline" tag on struct fields within records or structs. The idea is to allow direct use of the struct field names in the context of the record/struct (recursing down through all such "inline" struct types). Did the stuff to add the isInline field to the Types/FieldDesc_t, and checks in Types/addField for duplicate names (at all levels). If I end up keeping the tf_usedNames (renamed) on the actual struct/record types (need to add a struct descriptor to hold it), then when I populate it, I need to be creating new FieldList_t values that have the correct nested offset, as well as the correct accumulated values for isRo and isVolatile. Can then use them properly at code generation time. 071127/Tuesday Added Types/StructDesc_t in preparation for saving the renamed tf_usedNames in both struct and record types. 071128/Wednesday Finished the "inline" struct concept. First "construct" ran. 071129/Thursday "constructs" looking pretty good now. 071130/Friday %%% Internal state of the random number generator must be non-accessible, to avoid people predicting its future values. (I don't grok this, but today there is a note on The Register indicating a bug of that nature in the FreeBSD random/urandom, with similar effects on crypto-security as the same issue in the Windows random number generator.) Later: ahh, the bad people need access to the system and the ability to actually look at the data kept by the random number generator. 071201/Saturday Starting on what could hopefully be the real Lists package, done using generic bundles. Got to the iterator. Do I want "Iterate" to be very generic, working on any types that export the right things? Or am I OK with having lots of them? In other words, do I write an Iterate inside the Lists package, that can only iterate down lists, or do I write a fully generic Iterate by itself? I was thinking about a fully generic one, which would use procs, etc. exported from types.
However, for the Lists types, the types are instantiations of the generic ones. So, who adds procs to the exports table of the instantiated types? I have an earlier comment about doing that automatically when a bundle is instantiated (__bundleInit__). Another option is to have a routine that is called when a bundle is instantiated - that could add exports to instantiated types, using the instantiated functions directly. That prompted me to ask if I can do that. The answer right now is "No". Exec error 244, triggered in Exec/assignIncompat, specifically checks for the use of an instantiated generic proc as a value, and prevents it. Why? You can write a stub routine that does nothing but call that instantiated routine, and that is accepted as a value just fine. My current thinking is that I have a preference for the fully generic "Iterate" construct. However, without it, the entire concept of having things exported from types could likely go away - everything would be done with more specific "constructs". If I go with it, then I will need a way to get things exported on instantiated types. For now, I'll just go with a Lists-specific iterator and see how that goes. Note: within generic code, you are allowed to use uninstantiated generic procs as values. Is that good? No - it's not - the use I tested was that of adding it to a type as an export. That was to a generic type, but it could have been to any type at all. That essentially exports the generic proc to anyone who wants it, thus, I think, violating my rules. The check for the use of an uninstantiated generic proc is Exec error 242, which is only checked for in Exec/symbolRef. Do I need to use uninstantiated generic procs in this way? I certainly need to be able to call them, from inside the bundle defining them, but do I need to use them as values? Maybe I should just change the error 244 check to check for UNinstantiated procs. Hmm. Ran into the error "Cannot reference a bundle name by itself".
Is there any particular reason for that? The workaround was simply to do a string comparison on the name of the bundle. Perhaps I just didn't see a need for directly mentioning a bundle. What type is a bundle? Hmm. I thought I was building a scope around any construct being run, but don't see it. I think I should, else any declaration will end up in the current scope. RESOLVED If a record type contains an inline struct type, should the constructor for that record type include the fields of that inline struct type? (Go fully recursive, of course.) That would mean that a programmer would need to know all of the details of all of those structs. Without this, the programmer would call some routine with a ref to the nested struct, to initialize that nested struct. RESOLVED 071202/Sunday I'm happy with my "construct" code. All cleaned up. 071203/Monday Can a bundle export types as "public"? I can't recall. If so, then might want the ability to override that when instantiating, using e.g. the reserved word "private". May need to export types instantiated from a bundle as "public", even though they are private in the bundle. Suggest, e.g. bundle MyType_t = public PackageName/BundleName; I've thought about sub-packages before. One issue is that of the scope of symbols defined in them. I believe there should be 3 choices - private to the subpackage (default), exported throughout the package, and exported outside of the package. The latter should be "export" as now. The in-between choice should be ....? Roel suggests "local". Should I require an explicit "private" on package contents, types, etc. that are not "public"? I.e. no default. 071205/Wednesday Making progress on ioProc work. Things we possibly need control over for uint output: base, is base prefix present, is base prefix uppercase, are leading zeros present (else spaces). For sint, want an option of showing a leading '+' or not.
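The uint/sint output controls listed for 071205 could be gathered into one option record. This C sketch assumes hypothetical names (`IntFmt`, `fmtUint`, `fmtSint`) and the 0b/0o/0x prefixes used for Z literals; putting zero padding after the prefix and space padding before it mirrors C's printf, and is an assumption rather than a decision recorded here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option set covering the controls listed above. */
typedef struct {
    unsigned base;      /* 2, 8, 10 or 16 */
    bool prefix;        /* emit 0b / 0o / 0x (never for base 10) */
    bool upperPrefix;   /* 0X rather than 0x */
    bool zeroPad;       /* pad with '0' after the prefix, else ' ' before it */
    bool showPlus;      /* sint only: '+' in front of non-negative values */
    size_t width;       /* minimum field width, 0 for none */
} IntFmt;

/* Format magnitude v with an optional sign character ('\0' for none).
   Returns the length written, or 0 if out (of size outLen) is too small. */
static size_t fmtCore(char *out, size_t outLen, unsigned long long v,
                      const IntFmt *f, char sign) {
    static const char digitChars[] = "0123456789abcdef";
    assert(f->base == 2 || f->base == 8 || f->base == 10 || f->base == 16);
    char digits[64];
    size_t nd = 0;
    do {
        digits[nd++] = digitChars[v % f->base];
        v /= f->base;
    } while (v != 0);

    char head[3];       /* sign plus two-character prefix */
    size_t nh = 0;
    if (sign != '\0') {
        head[nh++] = sign;
    }
    if (f->prefix && f->base != 10) {
        char p = f->base == 16 ? 'x' : f->base == 8 ? 'o' : 'b';
        head[nh++] = '0';
        head[nh++] = f->upperPrefix ? (char) (p - 'a' + 'A') : p;
    }
    size_t pad = f->width > nh + nd ? f->width - nh - nd : 0;
    if (pad + nh + nd + 1 > outLen) {
        return 0;
    }
    size_t pos = 0;
    if (!f->zeroPad) {
        memset(out, ' ', pad);
        pos = pad;
    }
    memcpy(out + pos, head, nh);
    pos += nh;
    if (f->zeroPad) {
        memset(out + pos, '0', pad);
        pos += pad;
    }
    while (nd > 0) {
        out[pos++] = digits[--nd];
    }
    out[pos] = '\0';
    return pos;
}

static size_t fmtUint(char *out, size_t outLen, unsigned long long v, const IntFmt *f) {
    return fmtCore(out, outLen, v, f, '\0');
}

static size_t fmtSint(char *out, size_t outLen, long long v, const IntFmt *f) {
    /* Negate in unsigned arithmetic so the most negative value works too. */
    unsigned long long mag = v < 0 ? 0ULL - (unsigned long long) v : (unsigned long long) v;
    char sign = v < 0 ? '-' : (f->showPlus ? '+' : '\0');
    return fmtCore(out, outLen, mag, f, sign);
}
```

For example, base 16 with prefix, zero padding and width 10 turns 42 into "0x0000002a", while base 10 with showPlus and width 5 turns 7 into "   +7".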
071206/Thursday Probably need some more checking on the arguments passed to compileTime (etc.) procs. E.g. if the formal is uint, can't pass a uint variable from the proc being compiled. Currently ProcCheck catches that, with a very obscure error coming out. Such arguments must essentially be constant expressions. However, it will actually work to reference a package variable from an initialized package. It should not, however, allow a reference to a package variable in the package containing the proc being compiled, since it likely isn't initialized. Will the concept of initializing a package like that exist with native code? Allowing that sort of thing might allow an invalid value to be used? From Roel, on Python formatting: From Guido van Rossum (Python's creator): Printing and Formatting Two more I/O-related features: the venerable print statement now becomes a print() function, and the quirky % string formatting operator will be replaced with a new format() method on string objects. Turning print into a function usually makes some eyes roll. However, there are several advantages: it's a lot easier to refactor code using print() functions to use e.g. the logging package instead; and the print syntax was always a bit controversial, with its >>file and unique semantics for a trailing comma. Keyword arguments take over these roles, and all is well. Similarly, the new format() method avoids some of the pitfalls of the old % operator, especially the surprising behavior of "%s" % x when x is a tuple, and the oft-lamented common mistake of accidentally leaving off the final 's' in %(name)s. The new format strings use {0}, {1}, {2}, ... to reference positional arguments to the format() method, and {a}, {b}, ... to reference keyword arguments. Other features include {a.b.c} for attribute references and even {a[b]} for mapping or sequence access. Field lengths can be specified like this: {a:8}; this notation also supports passing on other formatting options.
The format() method is extensible in a variety of dimensions: by defining a __format__() special method, data types can override how they are formatted, and how the formatting parameters are interpreted; you can also create custom formatting classes, which can be used e.g. to automatically provide local variables as parameters to the formatting operations. 071215/Saturday Done the infrastructure for ioProc's. Lots more work needed on the real Fmt code, however. First need to do a Char buffering package along the lines of the current ByteBuffer. Should the current ByteBuffer be renamed as Bits8Buffer, to be consistent with the language? Currently any ByteBuffer_t has a "filler" routine. That is only relevant when doing input from a buffer. Should the input and output functionality be split into two different packages? Should the output functionality have a flusher routine, which is used when the current buffer is full, if it is present, else buffer expansion happens? Also, those filler/flusher routines should use a bundle (each), so that the routines can have a cookie. For now, started into adding subPackages and the "local" declaration tag. Well, it's working, but I'm not 100% sure it is all correct. There seems to be an extra pre-declaration in the output of scopetest. Also, I was surprised to see the lexer calls in Package - I had forgotten that I had put the path parser in there. Check carefully in CreateReference for what I've done with subPackages - I'm not convinced I've got it right. Done a bit more carefully. Should be OK. Hah! The test of course is pktest. It SEGV's. Minor - no init set of subPackage. 071216/Sunday Thinking about the "bits" stuff. By making "bits" things be types, and by making the internals of "bits" things include types, it is easy to make nested ones. This could include the old Amiga RKM flags words in the GUI, where one function accepted a superset of the bits another did. The use of "inline" here can make it easier to access.
So: type Mode_t = enum { m_dDir, m_aDir, m_indir, m_inc, m_dec, m_disp, m_index, m_spec }; type EffectiveAddress_t = bits { 3 : Mode_t ea_mode; 3 : uint ea_reg; }; type GeneralInst_t = bits { 10: gi_opCode; 6 : EffectiveAddress_t gi_ea; }; Notes: 1) The width specifiers can be optional. E.g. in EffectiveAddress_t, the width of the ea_mode field is optional. If given, the compiler can check that the specified type fits within that size, or can pad the field if the specified type doesn't need that much space. 2) The width specifiers can be after the type instead of before it. Having typed the above, I think I like the before option. 3) Valid types: bool, uint, sint, enums, arrays of such, other bits types, probably allow bits8, bits16, bits32, bits64, although the latter is somewhat pointless. 4) The first specified field occupies the highest-order bits of the machine-addressable unit(s) that the value fits in. The accesses done at the machine level will be sized as the smallest normal access size that is big enough to handle the bits value. E.g. an EffectiveAddress_t above is 6 bits, so it would be handled using 8-bit load/store. A GeneralInst_t is 16 bits and would be manipulated that way. 5) The reason for highest-order first is so that an initial bool field can be tested by testing the sign bit of a value. It's a minor optimization, but it doesn't seem to cost anything to allow it. 6) enum, uint and sint values are treated using the endianness of the target machine. 7) In the actual MC68000 effective address format, if ea_mode = m_spec, then there are 5 special codes for ea_reg that specify more addressing modes (absolute, PC relative, etc.). It would be nice to have a way to specify that.
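Notes 4 and 5 fix the bit layout, so a GeneralInst_t can be handled with plain shifts and masks on a 16-bit word. A C sketch, assuming the layout bits 15..6 for gi_opCode, 5..3 for ea_mode and 2..0 for ea_reg (the standard MC68000 placement); `getField` and `setField` are hypothetical helpers standing in for the code a compiler would generate.

```c
#include <assert.h>
#include <stdint.h>

/* Mode_t from the example above, in declaration order (m_dDir = 0 ... m_spec = 7). */
typedef enum { m_dDir, m_aDir, m_indir, m_inc, m_dec, m_disp, m_index, m_spec } Mode_t;

/* GeneralInst_t with the first field in the highest-order bits:
   bits 15..6 gi_opCode, bits 5..3 ea_mode, bits 2..0 ea_reg. */
enum {
    GI_OPCODE_SHIFT = 6, GI_OPCODE_MASK = 0x3FF,
    EA_MODE_SHIFT = 3,   EA_MODE_MASK = 0x7,
    EA_REG_SHIFT = 0,    EA_REG_MASK = 0x7
};

static unsigned getField(uint16_t word, unsigned shift, unsigned mask) {
    return (word >> shift) & mask;
}

/* Replace one field. The assert plays the role of the compiler's check
   that a value fits within the declared width. */
static uint16_t setField(uint16_t word, unsigned shift, unsigned mask, unsigned value) {
    assert(value <= mask);
    return (uint16_t) ((word & ~(mask << shift)) | (value << shift));
}
```

Setting ea_mode to m_aDir and ea_reg to 2 gives 0x000A; an opcode field value of 0x40 then sets bit 12 of the whole word, giving 0x100A.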
Perhaps I could allow unions in here, and allow non-privileged programmers to use such unions: type Mode_t = enum { m_dDir, m_aDir, m_indir, m_inc, m_dec, m_disp, m_index }; type NormalEa_t = bits { 3 : Mode_t ea_mode; 3 : uint ea_reg; }; type SpecialEa_t = enum { sp_absShort = 0o70, sp_absLong, sp_pcDisp, sp_pcIndex, sp_imm }; type EffectiveAddress_t = bits { union { NormalEa_t ea_norm; SpecialEa_t ea_spec; } ea_u; }; type GeneralInst_t = bits { 10: gi_opCode; 6 : EffectiveAddress_t gi_ea; }; That's a lot of language changes! One is the use of "union" in this way, and allowing non-privileged programmers to use it here. The other is that of allowing an explicit value for an enum tag. If the union thing isn't done, then I need to be able to use enum values in the ea_reg field. That can certainly be done using either toUint, or subtracting sp_absShort from a 0-origin SpecialEa_t enum member. Is that good enough? Doing that, the run-time system likely cannot know how to print these values nicely. Can it with the union? Perhaps yes, by noting that the 0o7 value for ea_mode is not valid, and so exploring other branches of the union, and similarly noting that the other m_ values are not valid for the SpecialEa_t branch of the union. Hmm. I'm trying to get the Z Fmt code to be an MC68000 disassembler by having smart enough types! Perhaps that is a wee bit too ambitious? Looking through the op-codes, sorted by value, yes that is too ambitious! Still... So, what operations can be done? Well, you can assign and extract the individual fields: GeneralInst_t gi; gi.gi_opCode := 0; gi.gi_ea.ea_mode := m_aDir; gi.gi_ea.ea_reg := 2; if gi.gi_ea.ea_reg ~= 7 then gi.gi_opCode := gi.gi_opCode | 0x1000; fi; What about combined constants? Can parts be non-constant? gi.gi_ea := m_disp : regNum + 1; if gi.gi_ea = m_index : 3 then ... fi; Note that there are no operations other than assignment and comparison for combined bits constants. 
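The printing idea above, noticing that ea_mode 0o7 is not a valid NormalEa_t and so trying the SpecialEa_t branch of the union, is easy to sketch in C. The mode-name tables and `describeEa` are illustrative only; a Z run-time formatter would get the same information from the bits type rather than from hand-written tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char *const normalModes[7] = {
    "dDir", "aDir", "indir", "inc", "dec", "disp", "index"
};

static const char *const specialModes[5] = {
    "absShort", "absLong", "pcDisp", "pcIndex", "imm"
};

/* Describe a 6-bit effective address by picking the union branch whose
   values are valid: ea_mode 0o0..0o6 selects NormalEa_t (mode plus register),
   while ea_mode 0o7 selects SpecialEa_t, whose valid values are 0o70..0o74.
   Returns what snprintf returns. */
static int describeEa(unsigned ea, char *out, size_t outLen) {
    assert(ea <= 077);
    unsigned mode = (ea >> 3) & 07;
    unsigned reg = ea & 07;
    if (mode != 07) {
        return snprintf(out, outLen, "%s/%u", normalModes[mode], reg);
    }
    if (reg < 5) {
        return snprintf(out, outLen, "%s", specialModes[reg]);
    }
    return snprintf(out, outLen, "invalid(0o%02o)", ea);
}
```

So 0o12 prints as "aDir/2", 0o72 as "pcDisp", and 0o77, which is valid in neither branch, as "invalid(0o77)".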
Is this valid under the union model: gi.gi_ea := sp_pcDisp; or must the union branch be specified: gi.gi_ea.ea_u.ea_spec := sp_pcDisp; Is this another situation where "inline" could be used to allow the former? With enough "inline"'s, do we end up allowing: gi.ea_spec := sp_absLong; I'm not sure I'd want that - it's not as readable. But, that was allowed in the previous struct "inline"-ing that we did. Without "inline", this is not allowed: gi := op_moveFromSr : m_dec : 3; To allow this, the "inline" property affects the combined values that are valid, since the 2nd and 3rd fields above are not at the same level of the GeneralInst_t bits type as the 1st field is. Perhaps the answer here is that the EffectiveAddress_t type should not exist separately, since it never appears anywhere else. However, the fact that this single use is the case for the MC68000 instruction set doesn't mean that it is true in general. I recall a case at YottaYotta with some chip that did have a two-part field which occurred two times in a larger value. Would this end up being a whole lot of work (implementing, documenting, testing) that wouldn't get used because people couldn't figure it out fully, and so didn't trust it? Or didn't use it because they are so used to doing manual stuff in C? How does one ask "is this specific set of bits set"? I guess you compare the value for equality against the set of bits. How do bool fields work? Clearly you can set and test individual bool fields. Can you test several at a time: type MainFlags_t = bits { bool mf_firstFlag; bool mf_secondFlag; bool mf_thirdFlag; }; MainFlags_t mf; if mf = mf_firstFlag : mf_thirdFlag then fi; if mf.mf_firstFlag then fi; mf.mf_secondFlag := true; mf := mf_firstFlag : mf_secondFlag; Yet more syntax/semantics extensions, but these are the kinds of things that you would do with bits and defined flag values. 071217/Monday Hmm. Doing 'use' in a subpackage doesn't make the symbols available. Do we need to check both sets of use's?
What I've done is to simply force all use's into the package, even if they are done in the scope of the subPackage. Interesting - there is no error message about a duplicate 'use'. Fixed - it was a 'for' loop error in Z to C translation. Both "package ..;" and "package .;" complain about no ';'. Ahhh - the parsing of a package path there does not use the more general code, since it needs to create the packages. So, it doesn't handle ".." or "." at all, and does nothing when they are seen. I have a problem right now. When a proc is forward-declared and then defined, the body gets attached to the pre-declaration's Proc/Proc_t. But, it is the definition's Proc/Proc_t that is appended to the package. The result is that the proc appended to the package has no body. Bad. Does anything go wrong if I append the pre-declaration's one? That means it is in the package contents list more than once. We need to attach the body to the pre-declaration so that callers of it can use it. Should I just attach it to both? Done that - seems OK. The key is that when a pre-declaration is dumped to a bytebuffer, only the proc name is dumped. 071218/Tuesday %%% Via The Register, with links to Bruce Schneier, use the CTR_DRBG random number generator instead of Dual_EC_DRBG. The latter may have a deliberate back door for the NSA. 080105/Saturday New system (Ubuntu 7.10) all set. Did some apartment cleanup ready for Telus stuff to come back starting tomorrow. Next thing in Z, I think, is the stuff about making proc formal names significant to the types, and having a syntax to "rename" the actual parameters. See 070114, which also refers back to 061218. I had actually been thinking recently about a different syntax from that proposed earlier. E.g. type ProcType1_t = proc(uint a, b)uint; ... ProcType1_t ProcRef; ... ProcType1_t realProc = proc(uint m, n)uint: ... corp; or something like that.
But, I'm now getting extremely ambiguous in how to process "proc" followed by "(", so I expect I can't parse this right. In this latter case I need to be parsing an actual proc header, not just a proc type. So maybe go back to the earlier form, without the "= proc". Well, unless I omit the "proc", which I almost did when typing the above, but it was going to look weird, and it was going to start with '(', which would also be strange. It would likely work, but only because there aren't any other forms of proc constants. Hmm. Maybe I can use the above as is, in the same way, by skipping the "proc" after the "=". Wait! C/FromZ/Types.c/sameParamLists *IS* currently checking formal parameter names. So is Z/Types.z/sameParamLists. The comment there: /* When calling a proc, we may only have the proc type, and not an actual Proc/FormalList_t. So, error messages can only use the symbol from the ProcParamList_t. If we allow proc types with differing formal parameter names to be equivalent here, then such error messages can use the wrong names, which is just not acceptible. */ The other issue I'm recalling is that the pretty-printer uses the names from the proc type when printing that type (natch!). If two with different names have been made equivalent, the later ones won't exist, and only the names from the first defined are used, for all of them. That's icky too. To parse it, I think it comes down to having a proc type when parsing the proc header, and checking the parameter types but not names. This is very similar to checking a definition versus a pre-declaration, I expect, and so might make use of common code. 080106/Sunday Thinking about it again, I think if I go with the above, then all tests of proc types must be for exact equality of proc type pointers. Since you can essentially force the proc type of a proc constant, then there is no need to allow assignments of proc-type-X values to name-of-proc-type-X variables. 
Similarly, if there are no named proc types involved, the two types must be the same type, including matching the formal parameter names. In the above syntax, it would also be possible to allow the formal parameters to be omitted if the same names as in the proc type are wanted. However, I don't like that, since it means the formal names are not visible nearby, which would hinder readability. 080108/Tuesday Mostly done, and seems to work fine, both with regular procs and with procs in bundles (tested a polymorphic bundle). Have not yet changed assignInCompat to require an exact match on proc types. 080109/Wednesday All done with the above, as near as I can tell. Oh wait, there is the issue of pretty-printing. How do I know that a given proc was produced this way, i.e. its b_procType isn't the one that would be produced from its specified formal parameters? Just add a bool, I think. Got that, but have noticed that instantiated type names are not printed properly when they are used directly from the instantiation. Fixed, but it's icky. Thinking about how I format proc definitions. In proctest.z (testing the above stuff), it's ugly having only "proc(" on a line, since then the 4-character indented body of the proc is only one character different in indentation. I'm wondering if the 'proc' should be on the previous line, and thus *ALL* proc definitions have the '(' at the beginning of a line. That certainly gives the most space for the formal parameters. Ick. There seems to be some weird kind of stomper. p_typeForced ends up set when I print a proc from a package, but nothing seems to be setting it, and it starts out not set. Argghhh. Presumably this is the only case of this. I can't put an enum type into a C struct that has to map to a Z struct unless the field after the enum field requires 8-byte alignment. Gcc (at least 4.1.3) makes enum fields only 4 bytes long. So, the C offsets of fields after one can differ from the Z offsets, since Z makes them 8 bytes.
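The C/Z enum mismatch above is easy to demonstrate, along with one way to guard against it. In this sketch the struct and enum names are hypothetical: the naive C mirror puts `count` at offset 4 with gcc's 4-byte enums, while a Z layout that gives the enum 8 bytes puts it at offset 8. Storing the enum in a fixed-width field and adding a compile-time `offsetof` check turns any layout drift into a build error rather than silent corruption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { c_red, c_green, c_blue } Colour;

/* Naive C view of a Z struct { Colour colour; uint32 count; }: with
   4-byte enums, count lands at offset 4, but Z put it at offset 8. */
typedef struct {
    Colour colour;
    uint32_t count;
} NaiveMirror;

/* Safer mirror: give the enum slot Z's exact size with a fixed-width
   integer, and let a static assertion catch layout drift at compile time. */
typedef struct {
    uint64_t colour;    /* holds a Colour value */
    uint32_t count;
} ZMirror;

_Static_assert(offsetof(ZMirror, count) == 8, "count must follow an 8-byte enum slot");

/* Read the enum back out of its fixed-width slot. */
static Colour mirrorColour(const ZMirror *m) {
    assert(m->colour <= c_blue);
    return (Colour) m->colour;
}
```

The cost of the safer mirror is a cast at each access, which is why matching the enum sizes on both sides is the more attractive long-term fix.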
080110/Thursday Changed to a syntax where we have ':' 'proc' <...> '(' ... This looks much better and seems to work fine. The convention will be that the proc name is at the beginning of the second line of the proc header. This makes more sense in an abstract way too - the syntax is essentially [ ':'] . First, use the syntax in some of my test programs. Then go edit all my Z code to use that format. DONE Regarding the C/Z enum issue above, gcc has the flag "-fshort-enums" that uses the minimum needed space for an enum value. That would be a good thing to do in Z anyway, so I should go with that. DONE 080111/Friday Finished reformatting all main and test Z sources according to the new style where the proc name is at the beginning of the second line of the proc header. Did the switch to enums to occupy minimum space. I guess I should make ones with 255 and 256 elements to test. 65535 and 65536? Maybe. Definitely not 4 billion! Since I have an O(N**2) search algorithm to check for duplicate enum tags, even the 65536 test program takes way too long to compile. So, whenever I do the change of using arrays for final representations instead of linked lists, I can make the temporary representation for lots of things include a symbol table, and then that test will work. [Later: Don tried it with gcc - they have it working fast.] 080121/Monday Found this link on Slashdot: http://www.parrotcode.org Looks like they are trying for a universal bytecode machine, and there are several languages that have been made to emit the bytecodes. It's written in C, and one of the things they aimed at is the ability to support dynamically typed languages, like Perl 6. Most bytecode machines support only statically typed languages. (Z is statically typed, but I believe it can do anything useful that dynamically typed languages can do.) 080123/Wednesday %%% May want to allow a formatting precision without a width. Could allow it syntactically as e.g.
"Fmt/Fmt(floatVar :: f : : 3)" That forces 3 digits after the decimal point, and as many as needed before. Similar for 'e'. 080124/Thursday <> Security thought: don't allow root/admin to do GUI tools other than those for administering the system. No games, no normal applications, etc. <> Thought from earlier about upgrading core (or other) libraries: there could be a standard signal to applications that tells them "please save your persistent and temporary state, then exit". The GUI would then restart the application, pointing them at the temporary state. It could do this without "undraw-ing" the application. Assuming the application does this nicely (which could be part of the requirements for them), the user wouldn't see anything on the screen, but may simply note a delay in the application processing things. 080127/Sunday <> %%% The Fmt package would likely want to cache a CharBuffer/OBuf_t for use with FmtS, so that it doesn't have to create and destroy one all the time. When multithreading comes in, that, and FmtDefaultOBuf need to be per-thread values. So, that raises the issue of thread-local-storage. I think I can do that cleanly with a bundle. The bundle would export a routine to add a tls value, returning 'true' if all is well, and 'false' if there is already an entry with that key, or space is exhausted. It would also export a lookup routine, which I think could directly return the bundle parameter type value that was stored under the key, else nil if there isn't one. If that doesn't work, then perhaps it needs to be a callout to a handler routine registered when the key was added? 080203/Sunday The two kinds of "oneofs" can be oneof-case and oneof-set. 080210/Sunday Finished "set-oneofs", which included a lot of renaming within the system. 080211/Monday Thinking a bit more about 'bits' types. 
Constants, using the syntax above, are a problem - how do I know which bits type I am supposed to be working with, just based on the occurrence of some ':'s, numbers, and some tags from enums/set-oneofs? Perhaps I need a "constructor" for these constants, which simply provides the bits type. I'm also thinking of the syntax of the types themselves. I don't really like using 'union' in this context - it is only vaguely the right thing. Actually, set-oneof is closer. Can I invent a 3rd use for 'oneof'? It has to be syntactically distinct, since the programmer might put unnamed explicit set-oneofs within a bits type. I *could* re-use the 'bits' reserved word, somewhat like I've re-used the 'case' reserved word: type GeneralInstruction_t = bits { 10: gi_opCode; oneof bits { bits { oneof { m_dDir = 0o0, m_aDir = 0o1, m_indir = 0o2, m_inc = 0o3, m_dec = 0o4, m_disp = 0o5, m_index = 0o6 } ea_mode; 3 : uint ea_reg; } ea_norm, oneof { sp_absShort = 0o70, sp_absLong = 0o71, sp_pcDisp = 0o72, sp_pcIndex = 0o73, sp_imm = 0o74 } ea_spec, } gi_ea; }; This variant gets rid of all of the type names, just to see what it looks like. In actual use, the type names might be preferred. This version does get rid of one level of types - the EffectiveAddress_t has disappeared. But, one of the points of defining that was so that it could be re-used if needed. The above isn't complete, so it might be needed again. I've also replaced the enum for Mode_t with a set-oneof, and removed the unneeded field width specifiers. It's not the easiest thing in the world to read. I think it is easier with the inner types defined separately.
Either way, the comma versus semicolon is a pain when typing them: type AddressMode_t = oneof { am_dDir = 0b000, am_aDir = 0b001, am_indir = 0b010, am_inc = 0b011, am_dec = 0b100, am_disp = 0b101, am_index = 0b110, }; type SpecialMode_t = oneof { sea_absShort = 0b000, sea_absLong = 0b001, sea_pcDisp = 0b010, sea_pcIndex = 0b011, sea_imm = 0b100, }; type NormalEffectiveAddress_t = bits { 3 : AddressMode_t nea_mode; 3 : uint nea_reg; }; type SpecialEffectiveAddress_t = bits { 3 : 0b111; 3 : SpecialMode_t sea_mode; }; type EffectiveAddress_t = oneof bits { NormalEffectiveAddress_t ea_norm; SpecialEffectiveAddress_t ea_spec; }; /* type SingleEffectiveAddressInstruction_t = bits { 10 : seai_opCode; 6 : EffectiveAddress_t seai_ea; }; */ type BitOpType_t = oneof { bot_tst = 0b00, bot_chg = 0b01, bot_clr = 0b10, bot_set = 0b11, }; type DynamicBitInstruction_t = bits { 4 : 0b0000; 3 : uint dbi_reg; 1 : 0b1; 2 : BitOpType_t dbi_type; 6 : EffectiveAddress_t dbi_ea; }; type StaticBitInstruction_t = bits { 8 : 0b00001000; 2 : BitOpType_t sbi_type; 6 : EffectiveAddress_t sbi_ea; }; type OpMode_t = oneof { om_wMToR = 0b100, om_lMToR = 0b101, om_wRToM = 0b110, om_lRToM = 0b111 }; type MoveP_t = bits { 4 : 0b0000; 3 : uint mp_reg1; 3 : OpMode_t mp_opMode; 3 : 0b001; 3 : uint mp_reg2; }; type Size_t = oneof { s_byte = 0b00, s_word = 0b01, s_long = 0b10, }; type SizeEa_t = bits { oneof { szea_ori = 0b00000000, szea_andi = 0b00000010, szea_subi = 0b00000100, szea_addi = 0b00000110, szea_eori = 0b00001010, szea_cmpi = 0b00001100, szea_negx = 0b01000000, szea_clr = 0b01000010, szea_neg = 0b01000100, szea_not = 0b01000110, szea_tst = 0b01001010, } szea_op; 2 : Size_t szea_size; 6 : EffectiveAddress_t szea_ea; }; type Specific_t = oneof { sp_oriCCR = 0b0000000000111100, sp_oriSR = 0b0000000001111100, sp_andiCCR = 0b0000001000111100, sp_andiSR = 0b0000001001111100, sp_eoriCCR = 0b0000101000111100, sp_eoriSR = 0b0000101001111100, sp_reset = 0b0100111001110000, sp_nop = 0b0100111001110001, sp_stop = 0b0100111001110010, sp_rte = 0b0100111001110011, sp_rts = 0b0100111001110101, sp_trapv = 0b0100111001110110, sp_rtr = 0b0100111001110111, }; type ReversedNormalEffectiveAddress_t = bits { 3 : uint rnea_reg; 3 : AddressMode_t rnea_mode; }; type ReversedSpecialEffectiveAddress_t = bits { 3 : SpecialMode_t rsea_mode; 3 : 0b111; }; type ReversedEffectiveAddress_t = oneof bits { ReversedNormalEffectiveAddress_t rea_norm; ReversedSpecialEffectiveAddress_t rea_spec; }; type Move_t = bits { 2 : 0b00; 2 : oneof { mv_movb = 0b01, mv_movl = 0b10, mv_movw = 0b11, } mv_op; /* %%% Handle limitations on destinations %%% */ ReversedEffectiveAddress_t mv_dst; EffectiveAddress_t mv_src; }; type Instruction_t = oneof bits { DynamicBitInstruction_t i_dbi; StaticBitInstruction_t i_sbi; MoveP_t i_mp; SizeEa_t i_szea; Specific_t i_sp; }; Clearly I need to be able to have specific bit patterns inside bits types. It's fairly common to have "must-be-zero" bits in registers. And, for the partial MC68000 instruction definition above, they are key to allowing disambiguation inside bits-oneofs. An interesting puzzle is that of how to represent the various restrictions during compilation, to enable the Types code to detect overlapping values in bits types. Similar structures will be needed when printing bits values. A first thought is something along the lines of lists/arrays of pairs - each pair consisting of a mask specifying the fixed bits along with the value under that mask. A list of these is needed when there are restrictions on possible values, such as the above AddressMode_t not allowing 0o7. Thinking about how to implement checking for duplicate bit patterns in entire bits values, and how to decode actual values into the set of symbols that describes them... Let V be a bit that has a variable value, and F be a bit that has a fixed value. For simplicity, assume a bits value with 4 bits.
If the description is a top-level bits-oneof of simple bits values, then there is a conflict between a pair of described value-sets A and B if: 1) there is no overlap of the fixed parts of A and B, or 2) the overlapping fixed parts of A and B have the same value E.g. FVVV and VVFV conflict by rule (1). VFFF and FFFV conflict by rule (2) if the center two F bits are the same in both values. If the "branches" of a bits-oneof contain one or more set-oneofs, then 3) if all set-oneofs specify all bit patterns in the spaces they occupy, then they can simply be considered to be V (fully variable) values. 4) if a given set-oneof does not specify all bit patterns in the space it occupies, i.e. if there are invalid bit patterns within it, then one way to handle it is to conceptually expand that bits description to be an entire set of descriptions, one for each value in the set-oneof. Within those descriptions, the set-oneof values are considered to be F (fixed). For example, if the bits description contains a 2-bit set-oneof field, for which only the values 0, 1 and 2 are valid, followed by a 1 bit variable field and a 1-bit fixed field, then that description is equivalent, in terms of correctness checking, to one which consists of 3 bits descriptions, consisting of 00VF, 01VF and 10VF. This tentative approach is N**2 in complexity. Careful coding can cut the multiplier down by not comparing values artificially produced by rule (4) above. However, values not produced by a specific rule (4) expansion must be compared against all of the expansions produced. If the top-level description is not simply a bits-oneof of simple bits types (ones that do not contain nested bits-oneofs), then things get more complex. If only one of the inner bits types has its own bits-oneof, then recursing into that for each description being tested for duplication should be good enough. However, that doesn't work if the description being added itself has an internal bits-oneof. 
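The mask/value-pair representation and rules (1) to (4) can be sketched in C as follows. This is a minimal illustration assuming descriptions of at most 64 bits; `BitPattern_t` and all function names are hypothetical, not the Types code. Rules (1) and (2) collapse into one test: two descriptions conflict exactly when their fixed bits agree wherever both are fixed, which includes the case where no fixed bits overlap at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One description: bits set in 'mask' are fixed (F) and must equal the
   same bits of 'value'; all other bits are variable (V). 'value' is
   assumed to have no bits set outside 'mask'. */
typedef struct {
    uint64_t mask;
    uint64_t value;
} BitPattern_t;

/* Decoding: does a concrete value fall within this description? */
bool PatternMatches(BitPattern_t p, uint64_t bits) {
    return (bits & p.mask) == p.value;
}

/* Rules (1) and (2): A and B describe a common value exactly when their
   fixed bits agree wherever both are fixed. Non-overlapping fixed parts
   make the AND zero, so that case is reported as a conflict too. */
bool PatternsConflict(BitPattern_t a, BitPattern_t b) {
    return ((a.value ^ b.value) & a.mask & b.mask) == 0;
}

/* Rule (4): expand a description containing a partially-valid set-oneof
   field ('width' bits at 'shift') into one description per valid value,
   with that field now treated as fixed. Writes nValid entries to 'out'. */
size_t ExpandSetOneof(BitPattern_t base, unsigned shift, unsigned width,
                      const uint64_t *validValues, size_t nValid,
                      BitPattern_t *out) {
    assert(width > 0 && width < 64 && shift + width <= 64);
    uint64_t fieldMask = ((UINT64_C(1) << width) - 1) << shift;
    for (size_t i = 0; i < nValid; i++) {
        assert(((validValues[i] << shift) & ~fieldMask) == 0);
        out[i].mask = base.mask | fieldMask;
        out[i].value = (base.value & ~fieldMask) | (validValues[i] << shift);
    }
    return nValid;
}

/* The N**2 check over a top-level bits-oneof of simple descriptions. */
bool AnyConflict(const BitPattern_t *list, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (PatternsConflict(list[i], list[j])) {
                return true;
            }
        }
    }
    return false;
}

/* AddressMode_t allows 0o0..0o6 but not 0o7, so its 3-bit field needs a
   list of pairs: 0b0VV (0-3), 0b10V (4-5) and 0b110 (6). */
const BitPattern_t AddressModePatterns[] = {
    { 0x4, 0x0 }, { 0x6, 0x4 }, { 0x7, 0x6 },
};

bool AddressModeValid(uint64_t mode) {
    size_t n = sizeof AddressModePatterns / sizeof AddressModePatterns[0];
    for (size_t i = 0; i < n; i++) {
        if (PatternMatches(AddressModePatterns[i], mode & 0x7)) {
            return true;
        }
    }
    return false;
}
```

With these definitions, the FVVV/VVFV and VFFF/FFFV cases above are reported as conflicts (the latter only when the two centre bits match), while the expanded 00VF, 01VF and 10VF descriptions do not conflict with each other.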
Can I restrict bits-oneofs to be top-level entities? The MC68000 description looks to be going in that direction. Can any desired bits description be cleanly reworked (by the human programmer) to be in that form? CPU instruction encodings get complex. However, the real intent behind bits types is to describe hardware registers and pre-defined protocols. Do they ever get as complex as, or more complex than, CPU instruction encodings? I think protocols can. E.g. if you tried to build a single bits type describing TCP/IP, I think you would get a great deal of complexity. However, few people would try to do that - the result would be unwieldy. Someone might want to do that as a way of formally describing a protocol, however, in which case the Z compiler will see such a description. 080213/Wednesday (Doing 'bits' type stuff). Probably should have an ErrorSub for types as well. Skip over it to find the nature of the type, but do not generate code for a proc that uses any element containing such a type. I'm not consistent in what is returned from routines that are adding symbols to types. Struct/record/union return a SymInfo_t. Enum, set-oneof and bits could as well but currently do not. case-oneof can't because there are two symbols for each variant. [Resolved] Getting to code generation for 'bits'. I've been allowing a 'bits' type to have more bits than BITS_PER_WORD. If I do that, then what size of load/store do I use with the individual words of it? Must it be the full BITS_PER_WORD for each one, or does it depend on the field being fetched/stored? It's certainly easier if anything longer than 32 bits is referenced via 64 bit fetch/store. I'll go with that for now. Done code generation for bits types. Except for array fields. There is nothing I can use to do that - it will take manual creation of the indexing code and range checks. This leads me to ask if it is really worth the bother. 
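For bits values wider than a word, the choice above (64-bit fetch/store for anything longer than 32 bits) can be illustrated with a small C sketch that treats the value as an array of 64-bit words. It assumes, as my own simplification, that no field straddles a 64-bit word boundary; `BitsField_t`, `BitsGet` and `BitsSet` are hypothetical names, not the code generator's.

```c
#include <assert.h>
#include <stdint.h>

/* A field within a bits value that may span several 64-bit words. */
typedef struct {
    unsigned offset;   /* bit offset from the start of the value */
    unsigned width;    /* field width in bits, 1..63 */
} BitsField_t;

static uint64_t FieldMask(BitsField_t f) {
    assert(f.width > 0 && f.width < 64);
    assert(f.offset % 64 + f.width <= 64);   /* field must not straddle */
    return (UINT64_C(1) << f.width) - 1;
}

/* Fetch the whole 64-bit word holding the field, then shift and mask. */
uint64_t BitsGet(const uint64_t *words, BitsField_t f) {
    uint64_t mask = FieldMask(f);
    return (words[f.offset / 64] >> (f.offset % 64)) & mask;
}

/* Clear the field, then IOR in the new value. When 'v' is a constant,
   '(v << shift) & m' is itself a constant, so generated code could push
   that pre-shifted value instead of pushing, shifting and masking. */
void BitsSet(uint64_t *words, BitsField_t f, uint64_t v) {
    unsigned shift = f.offset % 64;
    uint64_t m = FieldMask(f) << shift;
    uint64_t *w = &words[f.offset / 64];
    *w = (*w & ~m) | ((v << shift) & m);
}
```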
I can't imagine a piece of hardware having a register that has an array of something in it. There can be multiples of things, but they are going to have fixed uses, and so are better described by named fields, rather than array indexing. My current thought is to simply get rid of the capability of having array bits fields. [Done] I don't actually recall the reason I allowed them in the first place - my vague memory is that Don suggested doing them for completeness or something. If you want to have an array of bools, you can do so directly - no need to use bits unless you have a very large number of them, in which case a bit of manual bit-fiddling code won't hurt. [Removed] Make sure that we can't take the address ('@') of bits fields. [Fails with: *** Cannot '@' this value] Cannot allow enums in bits types. Since the hardware usefulness of bits types requires that they have a fromBits, there is no useful way to ensure that enum types stay within range. This is also true of set-oneof types, but I think that is handleable, since no arithmetic is possible with them. [Done] 080217/Sunday Spent the last two days doing the code generation for bits constructors. Some tricky bits, but I think it is now all there. It also should be a reasonable model for similar code for a real CPU. The code for setting fields in bits values is not as good - it will do things like pushing a constant, then shifting it, then IOR-ing it back into the bits value. It wouldn't be hard to notice that the value is a constant (e.g. take the code that does that in the bits constructor code, and turn it into a common utility to use) and just push the shifted value. [DONE] I've pretty much concluded that bits-oneofs are a non-starter. There is no easy way to ensure correctness - I would need to do a 'case' on them like I do for case-oneofs.
But, there is no obvious thing to base the case selection process on - the compiler would need to examine each bits-oneof and produce an algorithm to do the determination efficiently. Ick. So, since the only use for them that I had thought of was the artificial use of a hacky disassembler, they just aren't worth doing. The disassembler utility can be done by library routines - they don't need programming language support. <><> Next up is units - a big one. Latest vague thoughts: unitKind time; unit uint second = time : sec; unit uint minute = second * 60 : min; unit uint hour = minute * 60 : hr; unit uint day = hour * 24 : dy; unit uint week = day * 7 : wk; unitKind potential, resistance; unitKind current = potential / resistance, power = potential * current; unit float Volt = potential : V; unit float Ohm = resistance : Ohm; unit float Ampere = current : Amp = Volt / Ohm; unit float milliAmpere = Amp / 1000.0 : mA; unit float Watt = power : W = Volt * Ampere; Or something. The two '=' are icky. The idea is that you can write a numeric literal and append the abbreviation for a unit after it, to make the constant be of that unit. The Fmt code would be aware of units, and would print them properly. Both the literal syntax and the Fmt code would need to handle unit expressions, for when there is no name for a given expression. Oh yeah - do I have exponentiation in the system yet? I need it, and it would come up in units, where some things need e.g. "sec ^ 2". 080219/Tuesday <><> The only use for '/' in a declaration is in the type. So, perhaps I could do something like: unit uint time:second/sec; unit uint time:minute/min = second * 60; ... Or, perhaps it makes more sense to have the unitKind closer to the 'unit': unit time uint second/sec; unit time uint minute/min = second * 60; ... Maybe even put the unitKind in parens: unit(time) uint second/sec; unit(time) uint minute/min = second * 60; ...
Do I even need the 'unit' keyword, which is hard for me to type?: time uint second/sec; time uint minute/min = second * 60; ... Makes some sense, since we are declaring second and minute to be units of kind time. Hmm. In which case 'unitKind' can become 'unit'. unit time; time uint second/sec; time uint minute/min = second * 60; time uint hour/hr = minute * 60; time uint day/dy = hour * 24; time uint week/wk = day * 7; unit potential, resistance; unit current = potential / resistance, power = potential * current; potential float Volt/V; resistance float Ohm; current float Ampere/Amp = Volt / Ohm; current float milliAmpere/mA = Ampere / 1000.0; power float Watt/W = Volt * Ampere; I'm not sure about the syntax for using units. I need a way to add a unit to a simple numeric value. An easy answer is something like: float x := ..., y := ...; Ampere a1 := Volt(x) / Ohm(y); float z := unUnit(a1); /* This uses 'unUnit' as a generic "remove units". */ day timePassed := 6 dy; Volt v1 := 13.75 V; Amp a1 := 13.5 mA; /* Implicit scaling from mA to Amp */ Watt w1 := 120.V * 15.0A; /* Is space needed? What about exponents? */ second s := day; /* Do I allow this, with implicit scaling? */ minute m := v1; /* This is definitely an error */ I.e. the abbreviation (if any) of a specific unit can be appended after a literal constant to make that constant have that unit. The lack of any other punctuation makes me a bit concerned, but I think that syntax is perhaps the most readable to end-user programmers. This of course leads directly to users wanting to use the capital Omega for Ohms. No way. Also, is the unit suffix needed when declaring a constant? The desired unit is clearly known. Perhaps the explicit suffix is only needed when literals are used in expressions. But, constant expressions can be used when initializing variables or named unit constants. I think the space should be put in by the pretty-printer, so it will effectively always be there. Hmm.
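One possible internal representation for definitions like the ones above (my assumption; the notes have not settled on one) treats the units declared without a defining expression (second, Volt, Ohm) as base units, and every other unit as a product of powers of them times a scale factor. Multiplying or dividing units adds or subtracts exponents; two units are compatible only when all exponents match, and the ratio of their scale factors is the scaling needed to go from one to the other. A C sketch with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

/* Base units: those declared without a defining expression. */
enum { DIM_SECOND, DIM_VOLT, DIM_OHM, DIM_COUNT };

typedef struct {
    int exp[DIM_COUNT];   /* power of each base unit */
    double scale;         /* size relative to that product of base units */
} Unit_t;

Unit_t UnitBase(int dim) {
    assert(dim >= 0 && dim < DIM_COUNT);
    Unit_t u = { { 0 }, 1.0 };
    u.exp[dim] = 1;
    return u;
}

/* e.g. "Watt = Volt * Ampere" */
Unit_t UnitMul(Unit_t a, Unit_t b) {
    Unit_t r;
    for (int i = 0; i < DIM_COUNT; i++) {
        r.exp[i] = a.exp[i] + b.exp[i];
    }
    r.scale = a.scale * b.scale;
    return r;
}

/* e.g. "Ampere = Volt / Ohm" */
Unit_t UnitDiv(Unit_t a, Unit_t b) {
    Unit_t r;
    for (int i = 0; i < DIM_COUNT; i++) {
        r.exp[i] = a.exp[i] - b.exp[i];
    }
    r.scale = a.scale / b.scale;
    return r;
}

/* e.g. "minute = second * 60": same dimension, different size. */
Unit_t UnitTimes(Unit_t a, double factor) {
    a.scale *= factor;
    return a;
}

bool SameDimension(Unit_t a, Unit_t b) {
    for (int i = 0; i < DIM_COUNT; i++) {
        if (a.exp[i] != b.exp[i]) {
            return false;
        }
    }
    return true;
}

/* Factor to multiply a 'from' value by to express it in 'to'. Returns
   false for incompatible units (like "minute m := v1" above). */
bool ScaleFactor(Unit_t from, Unit_t to, double *factor) {
    if (!SameDimension(from, to)) {
        return false;
    }
    *factor = from.scale / to.scale;
    return true;
}
```

With this, Watt / Volt has the same dimension as Ampere, an hour converts to seconds with a factor of 3600, and mixing minutes with Volts is rejected because the exponents differ.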
I think I should *not* allow implicit scaling, not even for constants. The reason is that a missed scaling is likely a common cause of bugs. I can detect the problem and point it out, so the programmer can put in the needed scaling. What does that look like? Perhaps just use the unit-name constructor to request conversion. Or, perhaps it is clearer for the users to allow them to do explicit scaling: Ampere a1 := ...; milliAmpere ma := a1 * 1000.0; Do I allow more complex conversions? I likely need to, since that's what real-world formulas will need. E.g. unit uint secondsSq/secSq = Second * Second; unit uint milliSecond/ms = Second / 1000; milliSecond ms := ...; secondsSq n := ms * ms / 1000000; Do I want to allow all of uint, sint and float units? Should I just stick to float units? Scientists typically use only float, I expect. But, what about other uses of units, like in programming, where uint may often be the appropriate type? Should I simply allow conversion among representations without affecting the units? That means that conversions among uint/sint/float must be language constructs, rather than utility routines, so that the units can pass through properly. (Don't talk to me about generic functions - that usage does not match Z's generics.) Do I need to explicitly state the relationships among potential, resistance, current and power? Or, should I infer them from the stated relationships among Volt, Ohm, Ampere and Watt? Do I even need them? I can't make use of them unless I know an explicit example that lets me know which specific units to use. Why do I even need what I was calling a "unitKind" at all? Aren't the specific units good enough?
unit uint second/sec; unit uint minute/min = second * 60; unit uint hour/hr = minute * 60; unit uint day/dy = hour * 24; unit uint week/wk = day * 7; unit float Volt/V; unit float Ohm; unit float Ampere/Amp = Volt / Ohm; unit float milliAmpere/mA = Ampere / 1000; unit float Watt/W = Volt * Ampere; 080220/Wednesday <><> Found a table of units in the "Handbook of Chemistry and Physics". Many of the proper names have non-ASCII characters in them. This allows them to be fairly short while conflicting less, by expanding the effective alphabet quite a bit. Allowing those symbols in Z programs runs counter to my belief that actual programs (as opposed to string literals, etc.) should use a very restricted character set, so as to be as universally readable as possible. That means that units will have to get ASCII forms. I'm not sure that I am the right person to do that. However, specialists from any given field will naturally tend to pick the "cleanest" names for the symbols they use. Since I'm not a specialist in any field, perhaps I'm as good a person as any to do that. One solution to the issue of uint/sint/float is to not include the representation in the unit. Thus, the unit is representation-free. When variables/named-constants are defined, the unit must be combined with a representation at that time. E.g. unit seconds/sec; ... seconds:uint s := getATimeValue(); seconds:float micros := toFloat(s) / 1000000.0; The subject of scaling is bothering me. When explicit non-scaled constants are used in formulas, Z has no way of knowing whether the writer intends some or all of those numbers to be scaling constants or not. They may simply be other aspects of the computation. I think there are only two workable alternatives: 1) Z silently adds all needed scaling. This keeps the code cleaner, when viewed, but I'm not sure it's clearer. Consider a long-used program that is translated to Z.
If the original was missing some scaling, the answers that the Z version produces can be different from those produced by the original. Which are right? If the original had some scaling being done explicitly, and those scaling factors are not removed when translating to Z, then the Z version will be wrong. If the original program was missing some needed scaling, then the Z version can be right. Either way, if the original program is trusted, then the Z version will not be. 2) All scaling must be explicit. This can be ugly - what is the syntax? One possibility is to not allow scaling at all, other than perhaps on constants and in text input and output. For example, all time computations could be carried out using seconds. With 64 bit numbers this ought to be sufficient. However, I expect some explicit computations deliberately do scaled work in order to get more precision, based on knowledge of the range of values involved. This is an aspect of floating point computation that is quite important. 080221/Thursday <><> %%% During lunar-eclipse-watching last night, Don and I chatted. He also skimmed the PHYSCAL article in SIGPLAN. PHYSCAL converts all values to one common unit for a given dimension. E.g. all lengths are converted to meters. Conversions are done on constants specified with other units, and on input and output. They seem to allow integer representations, but all of the examples use REAL. I'd like to allow integer representations, so that unit arithmetic is still relevant to environments where floating point is not a good idea, such as on a small embedded processor. However, as Don asks, if you have 2.4 feet, and convert to an integer internal representation, what do you get? Well, you get 0 meters if you use truncation. Ick. We did some back-of-the-envelope calculations and concluded that 64 bit floating point has enough range to handle everything that scientists would do. I remain a bit sceptical, even if it can represent the universe's size when expressed in Angstroms.
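To make the 2.4 feet example concrete, here is a small C sketch (function names are hypothetical) of a truncating and a rounding float-to-integer conversion, with the rounding one built only from truncation, the way a programmer would have to write it without a separate construct. Since 1 foot is exactly 0.3048 m, 2.4 feet is about 0.73 m: truncation yields 0 meters, rounding yields 1.

```c
#include <assert.h>
#include <stdint.h>

/* 'trunc': drop the fraction, rounding toward zero. A C conversion from
   double to an integer type already does exactly this. */
int64_t TruncToSint(double x) {
    assert(x > -9.0e18 && x < 9.0e18);   /* stay within int64_t range */
    return (int64_t)x;
}

/* 'round' built only from truncation: the sign has to be checked first,
   since adding 0.5 to a negative value moves it toward zero instead of
   away from it (-2.7 + 0.5 = -2.2, which truncates to -2, not -3).
   Halfway cases round away from zero. */
int64_t RoundToSint(double x) {
    return x >= 0.0 ? TruncToSint(x + 0.5) : TruncToSint(x - 0.5);
}

/* Holding a length as whole meters, as in the 2.4 feet example. */
int64_t FeetToMetersTrunc(double feet) {
    return TruncToSint(feet * 0.3048);
}

int64_t FeetToMetersRound(double feet) {
    return RoundToSint(feet * 0.3048);
}
```

The add-0.5-then-truncate trick is the textbook version but misrounds the largest double below 0.5 (0.49999999999999994 becomes 1); C99's `llround` avoids that edge case.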
My current thinking is to continue roughly as I had been. I'd like to work in PHYSCAL's automatic use of the u/m/k/M/G/etc. scaling factors on abbreviations, if possible. To do that, I have to know when I am looking at a unit abbreviation, since they will come through the tokenizer as identifiers. I could add knowledge of those scaling factors to the tokenizer, but that doesn't help, since it doesn't know when it should be splitting them off and when it shouldn't. We can't have a whole mess of single-character identifiers be suddenly reserved. The same holds for the unit abbreviations in general - they are too short to be reserved in the language as a whole, so they must only be special in restricted contexts. The ideal is that they come from packages, just like other identifiers. So, I believe I end up with: - no "unitKind" things - not needed - units are as in: unit second/sec; unit minute/min = second * 60; unit degreeCelsius/degC; unit degreeFahrenheit/degF = degreeCelsius * 9.0 / 5.0 + 32.0; - variable declarations are as in: seconds:uint s := getATimeValue(); seconds:float micros := getMicrosValue(); - conversions are done using a new "toUnit" construct. This is very close to the "toUint" construct, but has two "parameters", and so there shouldn't be syntactic confusion. To add a unit to a non-unit value: float f := ...; Volt:float v1 := toUnit(Volt, f); To convert from one unit of a dimension to another: seconds:uint s := ...; milliSeconds:uint ms := toUnit(ms, s); To remove the units from a value: Watt:float w := ...; float f := toUnit(nil, w); Note that the units specifier can use either the full name of the unit or an abbreviation for the unit. It can also be a unit expression, as in: unit Joule/J = kilogram * meter * meter / (second * second); Joule:float f := toUnit(kg*m*m/(s*s), ...); Note also, however, that declarations must use the full name of a unit - the abbreviations are not recognized in the normal language context.
I may be able to allow them in unit definition right-hand-sides, but I'm not sure. - the Fmt routines will see unit types as named types (I haven't yet figured out how all of this will be represented), and will thus allow custom output formatting routines. I need to change Fmt to allow a precision when no width is specified. For example, if there is a time format that produces hh:mm:ss, it would be nice to be able to not specify a width, but to specify a precision of 2 to get two digits after a whole seconds value. - run-time implementation of this should be "fun" for input of these values. Ideally it would happen in a common numeric input decoder routine, like my current String/ParseFloat, but how is it supposed to figure out what units are appropriate for the context? It would need fingers deep in run-time knowledge to do that. Perhaps there should be a generic scheme in the input code (not ParseFloat), which has access to the type of the variable it is reading a value for, and can thus have access to any units involved. I don't currently have an exponentiation operator. I need to add one. I'm currently thinking of using '**' for it, instead of the '^' that I could use, so that I save '^' for some other use. [DONE - use '^', since '**' can occur with pointer types.] Something will also be needed to convert from uint/sint-encoded units to float-encoded units and vice versa. So, I need language constructs for that after all - a simple library routine won't work, because it won't accept arguments with unit types, nor will it yield one. Forcing programmers to strip out the units, then re-add them, is just ugly and error-prone. I think I want both a rounding and a truncating version of float => sint, since having just one requires the programmer to check the sign of the value before being able to add/subtract 0.5 and use the other operation. Given no other obvious choices, I'll just pick "round" and "trunc" for now, along with "flt" to go the other way.
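The PHYSCAL-style prefix idea from the list above, applied only in restricted contexts, might look roughly like this C sketch. It assumes single-character prefixes (u, m, k, M, G as listed) and that an exact abbreviation match always wins, so 'm' stays meter and 'min' stays minute; the table contents and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry for a unit abbreviation visible in the current
   context (in Z these would come from packages, like other names). */
typedef struct {
    const char *abbrev;     /* e.g. "sec" */
    const char *fullName;   /* e.g. "second" */
} UnitAbbrev_t;

typedef struct {
    char prefix;
    double factor;
} SiPrefix_t;

static const SiPrefix_t SiPrefixes[] = {
    { 'u', 1e-6 }, { 'm', 1e-3 }, { 'k', 1e3 }, { 'M', 1e6 }, { 'G', 1e9 },
};

static const UnitAbbrev_t *FindAbbrev(const UnitAbbrev_t *tab, size_t n,
                                      const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].abbrev, name) == 0) {
            return &tab[i];
        }
    }
    return NULL;
}

/* Only called where a unit is expected (e.g. after a numeric literal),
   so ordinary identifiers like 'm' or 'k' are never affected. An exact
   abbreviation wins; otherwise try one SI prefix character followed by a
   known abbreviation. Returns false if neither matches. */
bool ResolveUnitAbbrev(const UnitAbbrev_t *tab, size_t n, const char *name,
                       const UnitAbbrev_t **unit, double *factor) {
    assert(name != NULL);
    const UnitAbbrev_t *u = FindAbbrev(tab, n, name);
    if (u != NULL) {
        *unit = u;
        *factor = 1.0;
        return true;
    }
    if (name[0] == '\0') {
        return false;
    }
    for (size_t i = 0; i < sizeof SiPrefixes / sizeof SiPrefixes[0]; i++) {
        if (name[0] == SiPrefixes[i].prefix) {
            u = FindAbbrev(tab, n, name + 1);
            if (u != NULL) {
                *unit = u;
                *factor = SiPrefixes[i].factor;
                return true;
            }
        }
    }
    return false;
}
```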
Do Fmt stuff before proceeding with units. Clean up the interface to "assignIncompat" first as well. 080224/Sunday Haven't done much. Did '**', 'flt', 'round' and 'trunc'. 080228/Thursday Been avoiding sitting at the computer. Lots of Lego put away. Today, worked on low-level stuff. Added ctx_isPriv, and checked it for union field references. Added stuff for creating pointers and dereferencing them. Bumped into an issue: 'nil', when pushed, is a tracked value. When used with a pointer (e.g. assignment, parameter passing, comparison), it must not be a tracked value. Having it tracked messes up the tstk, since pshnil pushes to tstk. Added psh0 instruction. Initially did some manual stuff to detect when to use psh0 instead of pshnil, but later realized that that is not the right way to do it. That detection should be done in Exec code, not in a code generator. So, I need to add an eik_null (like eik_nil, but for pointers), and modify stuff in a few (hopefully!) places to cause eik_nil to become eik_null where needed. DONE 080305/Wednesday (Last few days - ran into a gcc code generation bug relating to my use of -fshort-enums. Also parted-out and put away my Lego ISD.) "Singularity" OS, applications, languages: Reference: http://research.microsoft.com/os/singularity/ There was an interesting bug that showed up in various run-time errors, like "ref table index too big", invalid opcode, etc. Turned out to be my code in C/Hosted/bcComp.c/bcStackBool - there was a magic '5' for the distance of a forward branch. That became incorrect when I added the bc_psh0 opcode and used it in bcPushConstant. I've changed the calls to bcPushConstant in bcStackBool to just explicitly generate the 1/0 constants. I think I've finished the Exec work to change eik_nil to eik_null when needed. Basically, Exec/MakeNull searches through any Exec_t structures that could be yielding 'nil'. Everywhere there is a call to assignIncompat, I also do the check for nil=>null conversion.
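The magic '5' bug above is the classic argument for computing branch distances from the code actually emitted. The fix recorded above was to emit the 1/0 constants explicitly; backpatching is an alternative that removes the magic number entirely. A minimal C sketch, with hypothetical one-byte opcodes and a one-byte forward distance (this is not bcComp.c's real encoding):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes: all one byte; branches take a 1-byte forward
   distance measured from the end of the branch instruction. */
enum { OP_PSH0 = 1, OP_PSH1 = 2, OP_BRF = 3, OP_BR = 4 };

typedef struct {
    uint8_t code[256];
    size_t len;
} CodeBuf_t;

static void Emit(CodeBuf_t *cb, uint8_t byte) {
    assert(cb->len < sizeof cb->code);
    cb->code[cb->len++] = byte;
}

/* Emit a forward branch with a placeholder distance, returning the
   position of that placeholder for later patching. */
static size_t EmitForwardBranch(CodeBuf_t *cb, uint8_t op) {
    Emit(cb, op);
    Emit(cb, 0);
    return cb->len - 1;
}

/* Make the branch whose placeholder is at 'at' land at the current end. */
static void PatchBranchHere(CodeBuf_t *cb, size_t at) {
    size_t distance = cb->len - (at + 1);
    assert(distance <= UINT8_MAX);
    cb->code[at] = (uint8_t)distance;
}

/* Convert a condition on the stack into 1/0. The distances come from
   what was actually emitted, so changing how constants are pushed (say,
   adding a psh0 opcode) cannot silently break them. */
void StackBool(CodeBuf_t *cb) {
    size_t toFalse = EmitForwardBranch(cb, OP_BRF);  /* branch if false */
    Emit(cb, OP_PSH1);
    size_t toEnd = EmitForwardBranch(cb, OP_BR);
    PatchBranchHere(cb, toFalse);
    Emit(cb, OP_PSH0);
    PatchBranchHere(cb, toEnd);
}
```

For this encoding the sequence is brf 3; psh1; br 1; psh0, and both distances would adjust automatically if either push grew longer.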
My current test/lowlevel.z now runs without leaving an unbalanced tstack. 080308/Saturday Low-level stuff (pointer use, pointer arithmetic, '&', 'sizeof' and 'pretend') done and somewhat tested. Realized this morning that using '**' for exponentiation (I want it in the language so it can work with units) wasn't such a good idea, given that I use '*' for pointer types and operations, and so having 2 or more in a row is going to happen. So, I need to switch to '^' for exponentiation. [DONE] Need to update my writeup on the language. [DONE] %%% Need to add initializers. I really do need them, I think. As part of that, I expect I'll need to use something like '*' as a bound for an array whose size is determined by its initializer count. Then I'll need to make 'getBound' work with arrays as well as with matrixes. And, I'll need to make such use be handled in IsUintConstantExpr and GetUintConstantExpr. Still haven't started on units - been putting it off because I'm still not sure it's the right thing to do, and it will be a pretty hefty chunk of work. Hmm. Still haven't handled bits types in Fmt/Fmt. [DONE] 080310/Monday Note: if a package has internal types that it does not wish to expose to others, it is not safe to pass values of those types, or values containing those types, to compileTime procs from other packages. Doing so passes a reference to those types to such a proc, which is then free to examine and use that type in any way. Having done the bits types in Fmt/Fmt, do I want to go further and do similar things for arrays? I can't efficiently do the same thing for structs/records/oneofs without one of: a) generating a lot of code in-place to reference the individual fields. If there are multiple uses of a given type with Fmt/Fmt this can generate a lot of duplicate code. b) having general code that, at run-time, creates procs that reference the fields, and thus gets their values for output. This doesn't duplicate code, but is expensive at run-time.
[Actually, this might not work - a general routine can only accept the reference as an 'any' or 'uint', and can't turn that back into the proper typed reference. Maybe some uber-weirdness with doing an instantiation of a bundle?] c) using some privileged code, and walking along the values using pointers, and doing pointer casts and dereferences. d) below, DONE A better answer may be to export a routine, say Fmt/AddFmt, which is passed a type, and at that time (typically someone's compileTime) creates a 'fmt' routine and adds it as an export on the type. [DONE] Change set-oneof's, so that all values of the given bit size are valid. Accept them on assignment, show them just as a uint on output. Also allow toUint on set-oneof values. This makes them more general and possibly more useful, and doesn't really remove much safety. [DONE - toUint already accepted them, and allowing uint to be assigned to them means that fromUint isn't needed.] 080311/Tuesday Fmt/Fmt can handle struct/array/union if it passes a ref ('@') to a provided fmt routine. [DONE - the '@' must be to a named type.] %%% Lists package could export an "AddFmt" routine which looks for a fmt routine on the list element type and constructs a fmt routine to add to the instantiated list type. Lists package could have the small routines be compileTime, and thus effectively inline themselves. Pick a standard for Types and Exec constructor names (with or without the "New" on the end) and switch to it. Given the name duplication of Proc in Exec, it looks like the New should be used. [DONE - the "short" final constructor names in Exec have had "New" appended.] 080312/Wednesday Don't convert a nil name to "" in the proc creation code. Handle it as nil throughout. This avoids issues like CreateDirectReference thinking that is actually a name to look up in tables. [DONE] Hmm. Looks like there are unresolved issues with the old listtest.z file. Lots commented out as part of investigating. 
This also affects files bundle/test1.z and construct/listtest.z. Need to pursue. [Fixed - it was some fiddles in assignIncompat about where the TypeInfo_t's were being assigned that broke it earlier today.] Arghh. I *don't* want the routines created by Fmt/AddFmt to have a name, since I don't want them put into their containing package's contents. If that happens, then they are written out, and when the package is read back in, they will already be present, which invalidates the creation of them that will be done by the existing compile-time call to AddFmt. Soooo, I think I need a way to directly represent an anonymous proc in Exec_t's, so that a call to it can be compiled into code, which is what the Fmt code does with "fmt" routines it finds. Having that direct reference ought to be OK, since it only happens when the proc, and the code that calls it, are dynamically created. [DONE, and used in Fmt/.] [Later: they have gone back to having names, but since they are not in the package's contents list, they won't get written out or shown in pretty-print.] 080313/Thursday Did some of the above smaller things. <> %%% There is a distinct difference between a package variable and a persistent entity in the package. For example, types, bundles, procs, and other [sub]packages are individually persistent entities. They have their own special syntaxes when displayed in a text view of the package. These forms are what the current parser handles (and presumably what future parsers will handle). To put something like a picture in a package, it needs to be of a different nature. Such a persistent item does not occupy space in the allocated package variables space. It has its own location in persistent store, and its representation in the in-memory package elements vector should include that location, and also an in-memory reference when the item is actually in memory.
That in-memory record describing the persistent item should likely also include a "dirty" flag, indicating that the item needs to be repersisted. New Exec kinds will be needed to access and assign to these persisted items - they may need to bring the item into memory. For large persistent items, the description record could contain a cookie slot for use by code that deals with avoiding having all of the large item actually in memory. (Vague arm-waving here!) So, I think initializers for compound package-level items should only be for constants. The initializer elements must each be a compile-time constant appropriate for the element type. This allows them to be things like calls to compileTime procs. Any needed evaluation is done when the compound constant is created by the parser, or when the package is read in from persistent storage. Since a package itself is one of the above persistent items, it can use the cookie to avoid bringing in all of a package at once (although that may not matter, or even be a good idea). Example: proc compileTime powerOfTen(uint i)uint: uint n := 1; while i ~= 0 do i := i - 1; n := n * 10; od; n corp; [10] uint PowersOfTen = { powerOfTen(0), powerOfTen(1), powerOfTen(2), powerOfTen(3), powerOfTen(4), powerOfTen(5), powerOfTen(6), powerOfTen(7), powerOfTen(8), powerOfTen(9) }; type Str1_t = struct { uint str1_id; string str1_name; [2][2] uint str1_values; }; Str1_t TheStr1 = { 1037, "key", { { 10, 20 }, { 30, 40 } } }; There will need to be routines to manipulate persistent items in packages. E.g.: export proc AddPersistantItem(Types/Type_t; any val)void; Hmm. An interface like this makes it look like all persistent items should be trackables. I think I'm OK with that, since a persistent uint, for example, could actually be a Base/Uint_t. That makes accessing them from code a bit more awkward, but it might be worth the simplicity. Ok, so if that is the case, then the Type_t may not be needed.
Having it there explicitly would allow more code to work without being privileged, however. E.g. using Types/ExportFind to find a routine to display the item. Similarly for routines for fetching into memory, repersisting, etc. (Quite a bit more wild arm-waving going on here!) 080314/Friday Types/ExportAdd should be compileTime. It needs to check that the code calling it is from the same package as the type that it is adding an export to. [DONE, but need to change stuff so that the exports table is in the NamedDesc_t, and not in the Type_t.] %%% A program documentation viewer should have links directly to the source package, so that the "text" of proc headers, comments, etc. does not have to be duplicated. Should those links allow writing, i.e. can I change the version in the documentation document and have the version in the code package be updated at the same time? 080315/Saturday May have hit a big issue with bundles. The back of my mind has always been nervous about them. I just realized that my CharBuffer/Consumer bundle is polymorphic, not generic. Its creator routine:

    export proc OCreateSC(uint size; Consumer_t consumer; cookieType cookie)OBuf_t:

has two parameters that involve the bundle parameter type "cookieType". When the consumer routine is used, it is passed the cookieType that was given in this call. But nothing prevents the consumer and the cookie passed in from having mismatched types. Making the bundle "generic" fixes that issue. That requires that all of the CharBuffer procs be put inside the bundle, since you can't use an uninstantiated generic type (OBuf_t) outside of its bundle. Fine, but doing that in turn means that all of those procs are bundle procs, and must be called relative to an instantiation of the bundle, so that they are not uninstantiated generic procs. That in turn means that you can't use the simpler forms of CharBuffer that don't involve an instantiation. Ick. Next step is to actually test this, to see if I am correct. Ok, I *seem* to be safe.
The three ways I tried it (bundle/newtest.z) are:

    /* Cannot determine that bundle param "cookie" actual value has same base as proc being called */
    t := Create(consumer1, t2);
    /* Type "T2_t" is not equivalent to type "T1_t" */
    t := Inst1.Create(consumer1, t2);
    /* Type "Consumer_t" is from a different instantiation of bundle "Problem" */
    /* Value is not compatible with parameter "consumer" */
    t := Inst2.Create(consumer1, t2);

I need to examine the details of the first error - I expect it is only allowing proc types that are instantiations to be passed in to the non-instantiated proc. Well, the good thing is that I can't break it. I take that back, I can break it. The key, right now, is to pass only proc values to a routine exported from a polymorphic bundle, which then puts one proc and the result of calling the other together in an invalid way. Either the constructor code has to disallow this, or the constructor code has to check that the values are all local or something, and the proc call must fail. I'm not sure how to do the first test. Thinking about it a bit later, I can put the same tests on constructors as I do on calls to procs from generic bundles. The CharBuffer situation is one in which I have a combined record - it has the bundle-like stuff (the Consumer_t and the cookie) as well as private CharBuffer stuff. To keep it private, I'm constructing it inside the CharBuffer routines. The more standard way to do it (well, the way I've been using bundles) is to have the instantiating call build a bundle containing its Consumer_t and cookie, then pass that to the CharBuffer constructors, which can then build the OBuf_t's, keeping them private to CharBuffer. It's one more level of indirection, but that's not fatal - the indirection is only used in the "flush" routine. Code generator problem with io/test1.z, assigning to a bitfield. Ick, there is an unconditional call to bcComp with "wantAddress" true for the left-hand-side when bitfield assignment is happening.
Things need to get a whole bunch more complex there. 080316/Sunday Test program "test/strInRec.z" is testing the use of a non-exported struct type inside an exported, public record type. The effect I was trying to achieve, that of having something inside the record that code outside of the defining package cannot even see, works, but requires an extra level of type renaming. Without that extra level, external code can modify fields of the struct. That's because so much of the code in the Z semantics calls Types/SkipNameAndExec to skip past a type-name to get at the actual type. Also, when a field of a struct is referenced, the field name is simply searched for in the struct definition, so the fact that the struct field names are not in the package export table does not matter - they can still be used outside of the package. I believe I had sort-of noted this kind of effect before, but it is still somewhat icky! One possibility is to change Types/SkipNameAndExec, and Types/SkipOneName, so that they examine the NamedDesc_t of the named type to see if they should skip it or not. They would compare the package that the symbol is defined in against the package in the Proc/Context_t, and if they are different then look the type name up in the defining package's exports table to see if it is public (or that could be marked in the NamedDesc_t). This of course requires passing ctx to those routines all the time. It's possible that could be icky. Also, a lot of code would now end up possibly seeing tik_named types when before it couldn't. [Later - the issue is fixed by allowing structs to have 'private' overall and field attributes.] Gack. Looking for uses of Types/SkipOneName. Need to figure out again why I'm using it in some places - the TV stuff points at the named types, and presumably that's what I want to check for, so why are a number of places using SkipOneName when comparing type pointers? [I got rid of most or all of those.]
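The 080315 bitfield problem comes from the fact that a bitfield has no address of its own, so the left-hand side cannot simply be compiled with "wantAddress" true. One common way to handle it is a read-modify-write of the containing word. A minimal C sketch, assuming a 32-bit containing word and a field described by a shift and a width; BitField, bitFieldGet and bitFieldSet are hypothetical names, not parts of the Z code generator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for a bitfield held in a 32-bit containing word. */
typedef struct {
    unsigned shift; /* bit position of the field's least significant bit */
    unsigned width; /* number of bits in the field, 1..32 */
} BitField;

/* Mask of f.width low-order bits, after checking that the field fits in
   the word. The width == 32 case avoids the undefined shift 1 << 32. */
static uint32_t fieldMask(BitField f)
{
    assert(f.width >= 1 && f.shift + f.width <= 32);
    return f.width == 32 ? (uint32_t)0xFFFFFFFFu
                         : (((uint32_t)1 << f.width) - 1u);
}

/* Fetch: shift the field down and mask off the neighbouring bits. */
uint32_t bitFieldGet(uint32_t word, BitField f)
{
    uint32_t mask = fieldMask(f);
    return (word >> f.shift) & mask;
}

/* Assign: the field has no address of its own, so load the whole word,
   clear the field's bits, merge in the new value, and store the word
   back. Bits of 'value' that do not fit in the field are discarded. */
void bitFieldSet(uint32_t *word, BitField f, uint32_t value)
{
    uint32_t mask = fieldMask(f) << f.shift;
    uint32_t old = *word;                                          /* read */
    uint32_t merged = (old & ~mask) | ((value << f.shift) & mask); /* modify */
    *word = merged;                                                /* write */
}
```

The neighbouring bits survive because only the bits under the mask are replaced; that extra load and merge is the added complexity the left-hand-side compilation has to take on.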
080317/Monday Looking at Types/IsMultiple, etc. They all recurse all the way down through an arbitrary number of tik_named nodes. So, a user can in fact find out what general kind of type an otherwise unknown type is. In fact, I'm not sure you can hide a trackable type at all, in the sense of hiding the fact that it is trackable. I could add a Types/IsUnknownType() to detect the double-naming situation, but I'd have to use it in a lot of places, and add a lot of error messages. Also, note that currently, any compile-time code can completely examine any types that it can get its hands on. To prevent that, I would need to do something like make NamedDesc_t be a private type, so that you can only get at the public fields via some exported procs, which would only do so when appropriate. Came up again 2009-06-xx %%% Add an iterator to Mapping.z that iterates over the keys. 080318/Tuesday New test C/bundle/newtest2.z - shows gaping type holes in bundles. Sigh. See [080314/Friday]. The change made to ExportAdd, to only accept named types, has broken the old Lists code (and any test program that uses it) since it was creating unnamed types - the types were being named in the calling code. I knew this was going to happen, so I need to get rid of the dependencies on that old Lists package. That's already done for the pseudo-GUI bundle/test1.z example. Yet another problem. bundle/gen.z gets a SEGV - bu->bu_contents is NIL in Package_AppendBundleInstantiation, called from parseBundle. This is after commenting out "test2" inside Test2 (Test2 is now empty), along with the second use of "test2" in main. Ah, it's the act of trying to instantiate an empty bundle that does it. [Fixed] 080319/Wednesday %%% Currently, Exec code checks ctx_isPriv to see whether it should allow the use of privileged code. It could at the same time call a Proc routine to set a flag saying that privilege has been used. Then, at the end of building a proc, the entire proc can be marked as using privilege or not.
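The 080319 idea of recording privilege use can be pictured as a sticky flag in the compile context that is copied into the proc when it is finished. This is only an illustration: Context, ProcInfo, startProc, checkPrivilege and finishProc are hypothetical stand-ins for Proc/Context_t and the Exec code's ctx_isPriv test, written in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of Proc/Context_t used here. */
typedef struct {
    bool isPriv;   /* like ctx_isPriv: is privileged code allowed here? */
    bool usedPriv; /* set once any privileged construct is accepted */
} Context;

/* Hypothetical summary stored with a finished proc. */
typedef struct {
    bool usesPrivilege;
} ProcInfo;

/* Called when starting to build a proc. */
void startProc(Context *ctx)
{
    assert(ctx != NULL);
    ctx->usedPriv = false;
}

/* Called by the Exec code wherever it would test ctx_isPriv. Returns
   false if privilege is not available (the caller reports the error);
   otherwise records that privilege has been used. */
bool checkPrivilege(Context *ctx)
{
    assert(ctx != NULL);
    if (!ctx->isPriv) {
        return false;
    }
    ctx->usedPriv = true;
    return true;
}

/* At the end of building the proc, copy the summary into it. */
void finishProc(const Context *ctx, ProcInfo *proc)
{
    assert(ctx != NULL && proc != NULL);
    proc->usesPrivilege = ctx->usedPriv;
}
```

A proc that never hits a privileged construct ends up unmarked even when it was compiled in a privileged context, which is what would make a per-proc summary flag worth sending along with the code.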
The privilege check will be active when code is created from a bytestream, such as when it is read from disk, or received from another machine. In the latter case, that remote machine should also be sending a system identifier for itself. The code stream should include a summary flag saying whether any of it is privileged. If it is, then the receiving machine should verify the triple of [sending-identifier, sending-IP-address, sending-port-number] with a central server. If they don't verify, then the received code is not granted privilege, so any privileged constructs in it will be rejected with errors. So, how reliable is the idea that only one machine on the internet can have a given IP-address/port pair? [The issue is not that of more than one machine on the internet - it is a simple matter to use whatever IP address you want on an internal network.] This scheme might actually work pretty well. Any machine receiving new code, or updates, etc. will essentially be recompiling that code. So, it will be validating it with the full validation of the Exec/Types/Proc packages. Even if the sending machine is compromised, it seems that it can't spread its badness to other machines through bad code. 080320/Thursday In C/Hosted/pExec.c/parseBundleSelection - the old issue about not using the 'sr' value. We need to create and use an Exec/InstantiatedTypeRef here. That will require stuff to deal with eik_instantiatedTypeRef like it deals with eik_type. In Exec.z/modifiable, we unconditionally allow modification of an element of a matrix. That's OK for its explicit 'ro'-ness. But what if the type of the elements is such that they should not be modified in the context in which the modification is being tried? See bundle/temptest.z. Tested in the '6' set of temptest.z, using a struct rather than a matrix. It allows me to swap around pieces of the uninstantiated types in the struct. It shouldn't.
So, I need "modifiable" and "getAttributes" to return values based on the type of the items as well as on any explicit "ro" encountered. 080321/Friday Should perhaps separate out the functionality of getAttributes that yields the "volatile" property, since that is the only one needed at code generation time. [DONE] "modifiable" checks for modifying a private oneof. Do I otherwise allow assignment to a oneof alternative? I thought I didn't. I don't see a reason not to, mind you. Tested, works fine. Given that I have a byte-code "popvrnt" instruction, I guess I intended it. 080323/Sunday Finished cleanup of Z/CharBuffer.z and Doc/CharBuffer. Working on cleanup of Z/Fmt.z - did some cleanups, improvements. I think I need to allow a 'fromUint' for bits types - otherwise how do you nicely read them from hardware? Well, right now I'd like to be able to get an invalid constant part into a bits value so that I can try out the Fmt code to show that. Ok, I'll hack that with a pretend for now, then comment it out. Worry about whether I want 'fromUint' later. [DONE] 080324/Monday %%% It would likely be good to add the ability to call compile-time procs outside of any proc, directly at the package level. That way, things like calling Fmt/AddFmt can be done right after defining the type in question, rather than having to stick the call in some proc. The proc doesn't directly relate to the type after all, and any __PackageInit__ could be a long ways away. Hmm. Seems to be a problem with an ioProc call with no extra args? E.g. "FmtB(ob)". Got this with Fmt/CreateFmt. [Resolved] Why doesn't control-C kill a Z program in an infinite loop? It used to work. It works fine with "runit.digits", so why doesn't it work with whatever is happening in my testing of Fmt/CreateFmt? The stuff for producing new sequences and new scopes is simply far too hard to use. I never get it right. 
Clean up somehow - perhaps always use a scope, relying on the Proc code to discard empty ones, and the (now internal) sequence code to omit single-element sequences. And, likely always put the new sequence as ctx_currentSequence. [Resolved] I think I'm finished with Fmt now. Fmt/CreateFmt is quite general, and handles arrays now. That may have been overkill. :-) Looks like I need a few more checks in type constructors, for things like trying to use Void, or incomplete struct types. See UnionAddMember for a place that does have those tests. [Done] 080326/Wednesday Clarifying bundle rules. Brain seems to be working OK, helped by cleaning up the code and observing what effect things have. The thing with polymorphic bundles is to not allow "breaking the bundle". The "bundle" is the packaging of multiple values that are all of the same instantiation of the bundle. E.g. an API record and some user-defined structure, and possibly some fixed fields common to all instantiations. Such a "bundle" is a safe entity, in that everything in it belongs to the same instantiation of the types in the bundle. The rules of the language need to ensure that it stays that way, while still allowing the kinds of operations needed to work with them. It isn't strictly necessary to disallow a constructor for the uninstantiated form of such a type - the compiler could check that all of the values are consistent with the uninstantiated forms, and that all types that match up to bundle parameter types are the same. I was trying to go in this direction early on, before I came up with bundle instantiation. However, there is no need for the complexity - just require that only the instantiated forms of the polymorphic types be constructable. The uninstantiated forms must not allow their individual fields to be modified. The entire "bundle" values can be passed around, and variables referencing them can be modified as per the usual Z rules.
However, the language must not allow the modification of a field from within an uninstantiated polymorphic bundle type. This includes making an '@' of such a field be 'ro'. The exception to this rule is when passing an '@' of such a field to a proc extracted from an API record within the same value that the field is in. Because the proc types are instantiated the same as the other bundle types, this can be safely allowed - it does not "break the bundle". It is also crucial to the usefulness of polymorphic bundles. Note, however, that fields in such values that do not involve the bundle parameter types are not restricted by the bundle mechanism - they can be modified at the uninstantiated level. At the moment (2pm) test bundle/newtest2 is still showing a hole. Ick. Also, test1.z fails to compile again. This is after changing the C version of "getAttributes" to just use "modifiable". But that does not allow for the "noPrivate" parameter to "getAttributes". The latter has been fixed by using "noPrivate" on the new "getAttributes", to ignore a moderr of moderr_privateRecord or moderr_privateCaseOneof. But what is the implication of doing that? The issue is that the "LayoutManager_t" value really is 'ro' in the context of the caller, since the caller is in a different package than the LayoutManager code. But, you need to be able to modify the LayoutManager_t in its package. The current test is being "hacked" in a situation that is specific to bundle involvement. Should it be more general? I.e. if a package exports a record type, and someone passes an '@' to a field within it to a proc in that package, should the proc have write access even though the caller doesn't? I'm thinking it should. [Later: I did not try anything like that. Instead, a caller should pass a plain '@' to the struct (or a record reference) and the called proc can modify the fields but the caller cannot if they are not enabled for universal writing.]
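The 080326 rules for fields of polymorphic bundle values boil down to a single decision. A C sketch of that decision, assuming the checker has already worked out three facts about the field access; FieldAccess, FieldUse and fieldWritable are hypothetical names, and the enum values only model the cases described above, not the real moderr_ codes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one field access, as a checker might see it. */
typedef struct {
    bool viaUninstantiated;   /* reached through an uninstantiated polymorphic type */
    bool involvesBundleParam; /* the field's type involves a bundle parameter type */
    bool otherwiseWritable;   /* verdict of the usual Z rules ('ro', privacy) */
} FieldAccess;

/* The kinds of use being checked. */
typedef enum {
    USE_ASSIGN,              /* direct assignment to the field */
    USE_AT_TO_OTHER_PROC,    /* '@' of the field passed to some other proc */
    USE_AT_TO_SAME_VALUE_API /* '@' passed to a proc taken from the API record
                                within the same value as the field */
} FieldUse;

/* Returns true if the use may have write access to the field. When it
   returns false, an '@' of the field can still be passed, but only as 'ro'. */
bool fieldWritable(FieldAccess a, FieldUse use)
{
    assert(use == USE_ASSIGN || use == USE_AT_TO_OTHER_PROC ||
           use == USE_AT_TO_SAME_VALUE_API);
    if (!a.otherwiseWritable) {
        return false; /* an explicit 'ro' or privacy restriction always wins */
    }
    if (!a.viaUninstantiated || !a.involvesBundleParam) {
        return true;  /* the bundle rules do not restrict this field */
    }
    /* Only the same-value API call cannot "break the bundle", because that
       proc was instantiated together with the field's types. */
    return use == USE_AT_TO_SAME_VALUE_API;
}
```

The order of the tests matters: the usual Z restrictions are applied first, so the bundle exception can only relax the bundle rule, never an explicit 'ro'.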
080327/Thursday Noticed that even if I comment out all of the special case handling in callParamCheck, I still don't get error messages. I then found that the problem was that the type I was passing to typeModifiable in the record-field case was one with the name already stripped from the type of the field. Bad. typeModifiable needs the tik_named in order to find that the type is exported from a bundle. Fixed that. But, some of the test routines still don't get any error messages with the special stuff in callParamCheck removed. E.g. detail.z and printable.z. The reason there is that nothing is being changed inside the API procs, and so there is no need for non-'ro' access. The API procs, in their uninstantiated form, require a non-'ro' ref to the bundle parameter type. That is exactly what they are getting. Verified with "testquick.z". You can pass, as 'ro', a bundle parameter type value from one polymorphic value to a proc obtained from another polymorphic value. Bad. Why doesn't this get caught by the "moderr_uninstPoly" stuff? Because that only reports the value as being 'ro'. That test needs to be in assignIncompat. Is it the case that all of this bundle-related checking needs to be in assignIncompat, and has nothing to do with "modifiable"? Working on the above, in assignIncompat, and Types/Equal. In assignIncompat, when both types are '@' or pointer, why am I replacing the types with the pointed-at types, and continuing? I should just check for type equality and return in those cases. We have established that both types are of '@' or pointer, so there are no named type issues to worry about, nor are there any issues of bundle instantiations, etc. As long as Types/Normalize has done its job, comparing the pointers of the pointed-to types should be the right thing.
($$$ - see below) Some of the special stuff in "modifiable" is allowing a polymorphic routine to modify values passed to it that are of types exported from the bundle, which the normal rules make 'ro' in the context of the caller. Should this have anything to do with bundle issues? If I pass a record defined in package P to a proc in package P, then that proc can modify the fields of the record. Should this not be the case if I pass the '@' of a struct defined in package P to a proc in package P? What about passing the '@' of fields in the record or struct? 080328/Friday I think the next step is to get rid of "allowNonRo" in assignIncompat, and replace it with "allowUninstPoly". The issue I'm struggling with right now is how much testing I need to do in assignIncompat relating to uninstantiated polymorphic types. At the moment, I detect if the actual type is such, via tik_bundleParam. However, what about types which contain such things? Hmm. Maybe my out is that, other than '@' and pointers, you can't have an unnamed compound type containing a bundle parameter type. Hmm. That may not be true for non-'@' bundle parameter types. It's not - you can have arrays of them, and you can pass the '@' of such arrays to procs. So, I think I do need to delve into at least one of the types involved, to determine that it is using an uninstantiated polymorphic type in a context in which I can't allow it. The other way to do it is to have Types/Equal report that two references to an uninstantiated polymorphic bundle parameter type are not the same, even though it is the same type pointer. I've done a bit of that and it seems to work. My issue with it is that I can accumulate a lot of type structures. Hmm. Doing it that way may also make it harder to specifically allow such types to match, when "basesMatch" indicates an acceptable API usage.
I was about to start in on that when I realized that it would prevent such things as:

    Poly_t p1, p2;
    p1 := p2;

To remind myself yet again, the key with polymorphic bundle types is to not allow "breaking the bundle". Hmm. I think there actually is a second aspect, but it's more restricted. That is to not allow an uninstantiated polymorphic value to be passed from one "expression base" to another "expression base". The testing via "basesMatch" is specific to passing a value (or an '@' of one) to a proc. Do I need more? Do I want more? I already disallow constructing uninstantiated polymorphic record types, so that isn't an avenue of problems. So, I think checking for assignment (and passing to procs) is enough. Note that "modifiable" is not enough to prevent passing a value as 'ro', but even that is bad, since you can pass a value to the wrong proc. So, if you cannot construct an uninstantiated polymorphic type, you cannot assign to the fields of the record/struct/case-oneof forms of such a type, and when passed as an '@' parameter such fields are 'ro', then you cannot "break the bundle". So I am safe on that aspect, I believe. That leaves treating a value as the wrong type. Since the uninstantiated context has no operations possible, I think it only matters if you can get the value into a wrong instantiated context. Don and I tried a while ago using 'any', and couldn't, because any actual value must have the actual type of some specific instantiation (since you can't construct an uninstantiated form). You can store a reference to such a value into a globally accessible variable, but that doesn't help you work with it in an instantiated context (you can use an instantiated type as the uninstantiated form, but not vice-versa). I believe that just leaves passing values to procs. That should be handleable via "basesMatch". Interesting. The type passed to my new ContainsUninstPoly must be the type of the parameter value, not the type of the formal parameter.
The reason is that in many uses we pass an instantiation type to a polymorphic proc that takes the uninstantiated form. Arghh. The reason for going down past '@' and pointer types in assignIncompat (see $$$ above) is so that we don't try to compare the actual ref or ptr record pointers, since they won't be equal in cases where we want compatibility - e.g. when a non-volatile pointer is passed to a volatile parameter, or a non-ro is passed to ro. Put back, but with some changes. Today I added the restriction that procs in bundles must all be pt_regular. That broke bundle/symboltable.z, which has a Create inside the bundle that needs to be there so that it can find the instantiation it is working within. It compiles when moved out of the bundle and referenced as a proc in the package, but it doesn't run properly. Thinking about it again, I think I made the restriction just out of paranoia - I didn't know of anything that would go wrong. Undone for now. But now symboltable.z gets errors during Proc.SetExec. There must be something I'm forgetting to undo, but I can't find it. Arghh, this is going from bad to worse. [This is being caused by a small addition I made to Exec/RecordConstructorStart, which denies the constructor if the type is not a named type. When it is not a named type, the code cannot determine if the record type is an uninstantiated one, and so denies the construction with no error message. Adding an error message is one fix. Also, a minor change to the test program fixes it too.] And now I can't seem to fix the errors in CharBuffer.z - they can be removed by removing the first 'ro' on "b" in OBufL and OBufR, but I don't see why right now. That's what I get for trying to hack the order of the stuff in assignIncompat. I should go to a backup and put it back to the way it was. Sigh. [Fixed - big oops - C version was missing the stuff to move down to the '@'-ed or pointered type.]
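The point made at $$$, and again in the 080328 entry, is that '@' and pointer types must be looked through, because ref/pointer records that differ only in 'ro' or volatile will never compare equal by pointer. A small C model of that check, assuming types are normalized so that identical pointed-at types share one record; Type, Kind and assignCompatible are hypothetical. This sketch relaxes qualifiers only at the outermost level and then compares the pointed-at types by identity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified type records. K_REF stands for '@'. */
typedef enum { K_BASIC, K_REF, K_PTR } Kind;

typedef struct Type {
    Kind kind;
    bool isRo;             /* K_REF/K_PTR: the target is read-only */
    bool isVolatile;       /* K_REF/K_PTR: the target is volatile */
    const struct Type *to; /* K_REF/K_PTR: the pointed-at type */
} Type;

/* Can a value of type 'src' be assigned (or passed) to 'dst'?
   Assumes normalization has made identical types share one record. */
bool assignCompatible(const Type *dst, const Type *src)
{
    assert(dst != NULL && src != NULL);
    if (dst == src) {
        return true;
    }
    if (dst->kind != src->kind || dst->kind == K_BASIC) {
        return false;
    }
    /* Both are '@' or both are pointers. Their records may differ only in
       qualifiers, so don't compare them directly: a qualifier may be added
       (non-ro to ro, non-volatile to volatile) but never dropped. */
    if (src->isRo && !dst->isRo) {
        return false;
    }
    if (src->isVolatile && !dst->isVolatile) {
        return false;
    }
    /* Then compare the pointed-at types by identity. */
    return dst->to == src->to;
}
```

Comparing the pointed-at types by identity, rather than recursing with the same relaxed rules, avoids quietly allowing qualifier changes at deeper levels, where in general (as with C's char ** to const char ** conversion) they are not safe.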
080329/Saturday Need to add recursion protection to Types/ContainsUninstPoly, since it may have a looping unnamed type. [DONE] I've now pretty much come full circle, back to where things were. By having ContainsUninstPoly, my checking is tighter, and that is catching things that were getting by before. However, I find that I need to put back the same stuff in assignIncompat that allowed an ro value to be assigned to, when parameter allowNonRo is set. And that in turn needs to be set when the basesMatch call in procParamCheck says it is OK to ignore both the moderr_uninstPoly and moderr_privateRecord restrictions. The remaining problem in test1.z (simple GUI thing) is that I have a call to an API func that has a parameter that does not involve the bundle that defines the API, and I am getting "Cannot determine that bundle param "w" actual value has same base as proc being called". Is that what some of the "bundleParamCheck" calls were doing? I don't think so. [NO] The reason, I believe, is that ContainsUninstPoly is finding more things than the direct tik_bundleParam check that the old code was doing. So, the same checks that help with cases I want to disallow are not doing things quite right in this other situation. In callParamCheck, I tried changing the basesMatch stuff so that it only allows the special setting of allowUninstPoly, and does not actually emit the error about bases not matching. The problem with that is that it only works if write access is needed. Some of the problems that started me down this path don't require write access - they just have a value passed to the wrong instantiation of an API function. That read-only access must be stopped too. One direction I've thought about is having ContainsUninstPoly return the bundle that the uninst poly is found in. Then, we can do some kind of testing between the proc being called and the parameters, in terms of the bundle.
In test1.z, the parameter currently being complained about is an uninstantiated one from the Widget bundle, which doesn't matter for the operations involving the LayoutManager bundle being tested. But, how do I distinguish allowable use from unallowable? What is the right differentiator here? In particular, I worry about values whose types might involve multiple bundles. 080330/Sunday Don was over last night and we thought about the above issue. An answer appears to be to identify the bundle that the indirectly called proc is from. The ContainsUninstPoly call is modified to only recognize uninst poly types from that bundle. That will fix the issue with test1.z. Is it safe in general? I believe it is safe. There are two issues with polymorphic bundles: 1) "breaking a bundle". If extra parameters to such a proc call do not involve the bundle in question, then there is no way they can be modifying parts of a bundled unit. 2) passing a polymorphic value to a proc from the wrong instantiation. Again, if the type of a parameter does not involve the bundle that the proc is from, then that parameter cannot have anything in it that the proc couldn't obtain by other means. Any actual parameter whose type contains any taint of being from the bundle that the proc is obtained via will trigger the further testing. If the proc pointer (or a fixed proc expression) does not provide a bundle pointer to test against, then the actual proc cannot be a proc that is allowed to know the insides of the bundle, because its parameters cannot involve the bundle parameter types of the bundle in question. Note: that last sentence is true both for procs obtained from an API-like record/struct and for procs that are themselves defined inside the bundle. By definition, a proc or proc type not defined inside the bundle cannot have parameters involving the bundle parameter types other than wrapped inside types exported from the bundle. Hmm.
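The 080329 recursion protection and the 080330 change to only look for one bundle's parameter types fit together in a single walk over the type graph. A minimal C sketch, assuming a simplified type representation with a per-type "visiting" mark; Bundle, Type, TypeKind and containsUninstPoly are hypothetical stand-ins, not the real Types package. The mark is set on entry and cleared on exit, so a looping unnamed type ends the recursion instead of running forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for bundles and type records. */
typedef struct Bundle { const char *name; } Bundle;

typedef enum { TK_BASIC, TK_BUNDLE_PARAM, TK_COMPOUND } TypeKind;

#define MAX_PARTS 4

typedef struct Type {
    TypeKind kind;
    const Bundle *bundle;          /* TK_BUNDLE_PARAM: bundle that owns the param */
    struct Type *parts[MAX_PARTS]; /* TK_COMPOUND: component types */
    size_t nParts;
    bool visiting;                 /* recursion guard for looping unnamed types */
} Type;

/* Does 't' contain an uninstantiated bundle parameter type belonging to
   'wanted'? Parameter types of other bundles are ignored. */
bool containsUninstPoly(Type *t, const Bundle *wanted)
{
    assert(t != NULL);
    if (t->kind == TK_BUNDLE_PARAM) {
        return t->bundle == wanted;
    }
    if (t->kind != TK_COMPOUND || t->visiting) {
        /* A leaf, or a loop back to a type already being examined further
           up the recursion: the outer call covers its remaining parts. */
        return false;
    }
    assert(t->nParts <= MAX_PARTS);
    t->visiting = true;
    bool found = false;
    for (size_t i = 0; i < t->nParts && !found; i++) {
        found = containsUninstPoly(t->parts[i], wanted);
    }
    t->visiting = false;
    return found;
}
```

With 'wanted' set to the LayoutManager bundle, a Widget parameter type inside an argument is simply not reported, which is the test1.z situation described above.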
Having procs inside polymorphic bundles appears to be pointless. Such a proc will never have the same "value base" as any of its actual arguments, so no useful call to it will ever be allowed. Summary: if the type of the proc being called does not My new routine extractDefiningBundle needs to find a tik_named in order to know the defining bundle of the type of the proc being called. If the proc is inside the bundle, it can manage without any named types. So, this is a reason for not allowing procs inside a polymorphic bundle. 080331/Monday I had thought that something like this (twopoly.z):

    bundle polymorphic Widget(@ type g) {
        type Widget_t = record {
            uint n1, n2;
        };
    };

    bundle polymorphic Layout(@ type g) {
        type Layout_t = record {
            record {
                proc(@ g theG; Widget_t w)void api_handler1;
                proc(Widget_t w; @ g theG)void api_handler2;
            } l_api;
            g l_g;
        };
    };

    proc doit(Widget_t theW; Layout_t theL)void:
        theL.l_api.api_handler1(@theL.l_g, theW);
        theL.l_api.api_handler2(theW, @theL.l_g);
    corp;

would have trouble with the second call because it would find "w" first in the type of api_handler2, and so disallow the call. But "bundleFromType" does not look at anything other than a tik_named or a tik_bundleParam - it will not explore any further in the type. So, it never even sees the type of "w" within the type of the proc. It finds nothing at that level, and so "extractDefiningBundle" has to go all the way out to "theL" to find a named type from which it can deduce the bundle that is involved. When I thought something like that was a problem, I was thinking of changing the language so that only a named proc type could be used. That would also resolve the small syntactic issue where I have to look ahead to the next character to distinguish between a proc definition and the declaration of a package-level variable of an unnamed proc type. It looks like doing that is not necessary, but I'm wondering if I should do it anyway.
That would certainly get rid of most of "extractDefiningBundle". 080401/Tuesday Given that proc types are represented using the same type system as all other types, there would need to be tests throughout the Types code to prevent the use of an unnamed proc type. Just changing the current parser would not be enough. It might be enough to just change the point-of-call code to only accept named proc types. Note that this is intended only for proc values that are called, not for direct calls of named procs. Good progress on cleaning up bundle type checking stuff. Need to check on the use/name of moderr_refBundleParam, however. Can I get that error out? What about 240/241? I'm wondering if I need to be checking for both conditions, and with different moderr_ results. 080402/Wednesday Because of the checking for error #243 in Exec/symbolRef, you can't use a named type from a generic bundle outside of its bundle. So, I don't see how it is possible to get the current moderr_refBundleParam out of Exec/typeModifiable. It seems to be testing for just such a use. The "refBundleParam" name suggests it is trying to get a better error message for trying to modify a value that is of an '@' bundle param type. Currently you get "Cannot assign directly to struct/union/array types", which is correct, but not very specific. Error codes 237 - 241 seem to be designed for that situation, so let's change it. Currently, the only uses of Types/ContainsRefBundleType are some checks for using such a type in arrays, matrixes and unions. There is special handling for using them in structs and records, so perhaps only case-oneof's need more? 080403/Thursday I don't think I understand what Types/CheckBundleGeneric is all about. Working on it. Likely change it to return a bool. 080405/Saturday If compilation of a proc fails, the proc is not created. That means that there is no later error if a proc with the same name is defined again.
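Going back to the 080331 twopoly.z case: because bundleFromType only looks at the outermost type node, extractDefiningBundle has to walk out from theL.l_api.api_handler2 through the enclosing selections until it reaches a named type. A minimal C sketch of that walk, assuming each selection expression records its type and the expression it was selected from; Expr, Type, Bundle and the enum names are hypothetical, while bundleFromType and extractDefiningBundle only mirror the routine names used above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins. */
typedef struct Bundle { const char *name; } Bundle;

typedef enum { TIK_NAMED, TIK_BUNDLE_PARAM, TIK_OTHER } TypeKind;

typedef struct Type {
    TypeKind kind;
    const Bundle *bundle; /* defining bundle; NULL if not from a bundle */
} Type;

/* One step of a selection chain such as theL.l_api.api_handler2:
   'base' is the expression this one was selected from (NULL at the root). */
typedef struct Expr {
    const Type *type;
    const struct Expr *base;
} Expr;

/* Like bundleFromType: looks only at the outermost type node. */
const Bundle *bundleFromType(const Type *t)
{
    assert(t != NULL);
    if (t->kind == TIK_NAMED || t->kind == TIK_BUNDLE_PARAM) {
        return t->bundle;
    }
    return NULL;
}

/* Walk outward from the called proc value until an enclosing expression
   has a type that identifies a bundle. NULL means no bundle was found, so
   the proc cannot be one that knows the insides of a bundle. */
const Bundle *extractDefiningBundle(const Expr *procExpr)
{
    for (const Expr *e = procExpr; e != NULL; e = e->base) {
        const Bundle *b = bundleFromType(e->type);
        if (b != NULL) {
            return b;
        }
    }
    return NULL;
}
```

In the twopoly.z shape, the proc type and the l_api record are both unnamed, so the walk only stops at theL's named Layout_t; requiring named proc types would let it stop at the first step.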
%%% Go through the Z sources, and put the English form of each error message right before the call to errorXXX that generates it, as a comment. The enforcing in (1) has been added - it used to be only there for record values! However, it is currently based on the type being named. Is that sufficient? It seems unlikely for matrix types at least. Also, do I want to disallow construction of an unnamed record or case-oneof type? That would further break the old Lists code, I expect. Actually, I've already done that. Should I do it for case-oneof's too? Hmm. printable.z creates a matrix of an uninstantiated polymorphic type - why is that bad? Another way of looking at the issues is to think about the values involved, how and where they can be created, and what can be done with them.
1) There is no such thing as a value whose type is an uninstantiated polymorphic type. All polymorphic values are created/allocated by code using specific instantiations. This is enforced by the record constructor code.
2) Variables whose type is that of an uninstantiated polymorphic type are fine anywhere. They are typically used as "containers". A value of an instantiated polymorphic type is compatible with a variable of the uninstantiated type, but the reverse is not true. Once a value has "become polymorphic" by being used as the uninstantiated form, the only ways it can become its instantiated type again are:
   a) call an API proc from within the capsule of the value. Within that proc, which because of (1) must be one which uses the instantiated form, the value will appear as its instantiated form. This is safe because of the tight rules on where the called proc can be obtained.
   b) use the "assign" construct to do a run-time type check on it.
3) Values involving types from generic bundles can be created both at the uninstantiated level and at the instantiated level. They can be freely mixed.
4) Uninstantiated generic types and procs cannot be used at all outside of the bundle they are defined in. Thus, even though values whose type is that of an uninstantiated generic type can be created, there is nothing extra that can be done with them outside of procs within the bundle. Code outside of the bundle must always be using a specific instantiation, and so can only treat values according to that instantiation, even if the actual types of some values are the uninstantiated forms.

5) Outside of the bundle, variables, etc. using instantiated generic types can be freely used. (It makes no sense to speak of using an instantiated type within the bundle, since the bundle cannot be instantiated until it is complete.) Users of "assign" must be aware that the values they are dealing with may have actual types that are either the instantiated form or the uninstantiated form, depending on the details of the bundle and the operation of the code within it.

6) Code within a generic bundle is not allowed to 'assign' to a destination whose type involves any parameter to that bundle. Since there cannot be such a destination outside of the bundle, this restriction only applies to local variables/fields. What is being prevented is a cross-instantiation 'assign' that might work because some values are created with such uninstantiated generic types: a value that is supposed to be used relative to one instantiation could be passed into the code as, e.g., an 'any' value, and then converted to the uninstantiated type in a context which assumes that all values it will see are from only one instantiation. Confirmed - see the Gen8 test in bundle/refAssign.z

080406/Sunday

I've removed the stuff I added yesterday that prevents the allocation of a case-oneof or matrix whose type contains an uninstantiated polymorphic type. The reason is that the evil is that of putting values from different instantiations into a single compound. Neither case-oneof's nor matrices do that.
In a case-oneof, there is only one value, which itself is prevented from having cross-instantiation contents by the record constructor restriction. In a matrix, we don't care if the elements of the matrix individually come from differing instantiations - in fact that is desirable if the matrix is used as a polymorphic container. The code that checks what calls are allowed prevents two different matrix elements from being mixed.

From discussions last night with Don and Roel: I may want to allow bundle instantiations within bundles. For example, one might want to use a Lists bundle in the definition of a type exported by the new bundle. Split up the parseBundle code - that's needed in order to allow an instantiation within a bundle.

There was a bit of discussion of syntax. The word "bundle" could be done away with, using just "generic" or "polymorphic" instead. I have no big problem with that, and I could then use "instance" for the instantiation. Internally, nothing would change, since the two types of bundles are so similar in what they consist of.

080407/Monday,... <>

Does a partial instantiation of a bundle make sense? It would result in a new bundle which still has the remaining bundle parameters left to specify. This could be useful for Mapping, in that a partial instantiation of the key type to 'string' yields general symbol tables, in which the entry types are not yet defined. See Notes/080406-PythonToRoel

%%% In thinking about the approach of using all "any" values to simulate the way Python operates, I noted that it is painful, and not too efficient, to use a bunch of sequential "assign" constructs to determine the actual type of an 'any' value. This comes back to not wanting a generally-accessible "typeof" construct. However, I could extend the 'case' construct to include another variant, which does the 'case' based on an 'any' value, and where the case labels are types that the programmer wishes to test for.
This way, only types that the programmer has access to are available. Note that there isn't much besides a linear search that can be done to implement this, since the Type_t values are not known until run-time. So, it wouldn't be all that much more efficient than the sequence of 'assigns'. Also, some smarts in the code generator could perhaps deduce what is happening, and effectively rewrite the code to be as if such a 'case' had been used.

Back to bundles. I'm going to try to go through all of the situations where types are used, and try to state the rules in those contexts.

'assign' operator: As far as I know, the risks here have already been addressed. Variables (the first "argument" to 'assign') will always have types that can exist in the context where the variable is declared. The values (the second "argument") can come from pretty much anywhere. 'assign' *only* does a direct comparison of the Type_t pointers - nothing else. This is relied on, I believe, in that the run-time code does not have to try to re-implement the compile-time rules. In particular, the compile-time rules include information about the context of the usage. I'm not sure it would be equivalent to use the context of the 'assign' for that purpose. It might be, and it is likely close, but it would take some serious thinking to be sure. The one issue that did come up involved the fact that code inside procs inside a generic bundle can allocate values. Within that context, the values will have actual types involving the uninstantiated forms of the types used. This will be true regardless of what instantiation of the bundle those procs are running on behalf of. Thus, any value created in that context would be acceptable to 'assign', thus leading to cross-instantiation pollution. Such procs cannot declare a variable of an instantiated form of the types in the bundle, so all variables would be of uninstantiated types.
However, the normal language checking makes such variables compatible with all such values within those procs. The fix I added here was to prohibit the use of such variables (which can only exist within procs within generic bundles) as the first argument to 'assign'.

Types/ContainsBundleParamSubtype is used to do the test. It is passed the bundle in question (the open bundle from the Proc/Context_t for 'assign' and 'procAssign'). It has recursion protection. It uses the recursion protection for pointer, array, matrix and exec types - why? If it should, then it needs tik_ref as well. The routine stops *only* with things like tik_enum, or when it hits tik_bundleParam, where it compares the bundles. [I have removed tik_pointer, tik_array, tik_matrix, tik_exec, tik_proc, and tik_union from the recursion prevention stuff - there is no way for those kinds of types to refer to themselves. Note that ContainsUninstPoly did not do those, nor did CheckBundleGeneric.] Note: the only other use of Types/ContainsBundleParamSubtype (and its field-list helper) is in Types/Instantiate, when it is determining whether or not a given type it has encountered needs instantiation.

assignment (including passing non-'@' proc parameters, and values used in record constructors - basically stuff that uses "assignIncompat"): The usual rules for assignment compatibility apply. One extension is that if type X is a named type defined in bundle B, then a value of a type which is an instantiation of X can be assigned to a destination of type X. The reverse is not true. I have verified that this extension does not operate for unnamed types, but in order to trigger it without other errors, I had to have a proc inside a polymorphic bundle. Note that since uninstantiated generic types cannot be used outside of their bundle, this extension can only be applied to types defined in polymorphic bundles. A second extension deals with the names of instantiated types.
Normally, if two different names name the same type, then they are not compatible types. However, if the names name the same type from a bundle instantiation, they are aliases for that type and are equivalent, and are also equivalent to the long syntax for that instantiated type.

Assume there is no code inside polymorphic bundles. In code inside generic bundles, there are no additional limits on assignments to variables/fields whose type involves the bundle parameters. If a bundle parameter is an '@' bundle parameter, then it is considered to be multivalued, and so cannot be assigned to. Outside of a polymorphic bundle, the requirement is to not allow "breaking of bundles" (capsules). The fields of uninstantiated types from a polymorphic bundle, and this includes any unnamed types that can be gotten at, cannot be modified. The notion of "field" here includes record fields, struct fields and union members (although the latter doesn't really matter).

passing as '@' proc parameter: The limitations here are very similar to assignment. The main thing is to not allow bundle-breaking, so fields of uninstantiated polymorphic types must be passed as read-only.

Hmm. The model I've been using is this:

    bundle polymorphic LayoutManager(@ type managerType) {
        export type SetSize_t = proc(@ managerType ro lm; uint width, height)void;
        export type AddWidget_t = proc(@ managerType ro lm; Widget/Widget_t w)void;
        export type Draw_t = proc(Graphics/GContext_t gc; @ ro managerType ro lm; uint x, y)void;
        export type LayoutManagerAPI_t = record {
            SetSize_t lma_SetSize;
            AddWidget_t lma_AddWidget;
            Draw_t lma_Draw;
        };
        export type LayoutManager_t = record {
            LayoutManagerAPI_t lm_api;
            managerType lm_manager;
        };
    };

Why do I need to pass a writeable "managerType" to the API procs? Why can't I just pass a LayoutManager_t? The actual API procs will see the parameter in its instantiated form, and so can legitimately do whatever they want with it.
They can even modify the LayoutManager_t itself, since code nearby created it, and has that access because it has instantiated the type itself. Test with a new version of test1.z.

Result of testing: currently, I get errors on each instantiation proc. The complaint is that the type of the formal parameter does not match that of the type in the proc-type being given to the actual proc using the ':' syntax. I can make it work by slightly relaxing the code in Proc/AddFormal that compares the type in the proc type versus the actual type. When an instantiation has occurred, the instantiated type has a different name from the instantiating type - it has the same name as the one that it is an instantiation of. So, the two top-level type pointers do not match. If they are allowed to match, so long as they point at the same type behind their names and agree on the bundle instantiation they are from, then it should be OK; with that change, the new version, test3.z, works. Re-look at the instantiation code, and make sure this makes sense. [DONE] If so, can maybe get rid of some icky code that is allowing the non-'ro' '@' usage. I believe this situation came about because I was forcing things to work before I allowed type pre-declarations in bundles. This new way pretty much requires one.

passing as other parameter:

construction/allocation:

declaring variables:

declaring struct/record/union/case-oneof fields:

directly using the uninstantiated types:

080408/Tuesday

%%% Package-level variables exist in the package's variable space. One of these is allocated whenever the package needs to be initialized because some proc in the package is called. Similarly, any __packageInit__ proc will be called before any other proc in the package is called. Each process that is running and using proc's or variables from the package has its own copy of the package variables. Items added directly to the package do not exist in the package's variable space.
If they must be brought into memory, each will have its own chunk of memory. All processes that access these items share the one copy. Proc's in the package are one of this kind of entity, as are subPackages. The proc's code is shared by all users. The subPackage contents are seen as the same by all users. Each item of this kind also has its own storage in persistent store, and must be retrieved from that storage when it is needed. These items can be as small as a bool value and as large as a 6-hour HD video, or a database of all humans.

I think I've suggested this before - if a struct is 'inline' within a record, then perhaps the record constructor should include the fields in the struct. This would be useful with records defined in bundles too, where there is a last field of the record which is an '@' bundle type. To implement this might require some small fiddles to the constructor code, and to the code that decides to allow 'inline' or not. The 'inline' flag is in the Types/FieldList_t record. DONE

Ideally, the next/prev pointers in the lists defined in package Lists should not be specified in record constructors used outside the package (which is the only place they can be used if there is a final '@' parameter field). This could be done by adding a "noInit" flag to fields, along with that keyword in field definitions. DONE

These two things would help make the language "higher level".

080409/Wednesday

Take errors about too many or too few instantiating types from pProc.c and put them into Package/AddInstantiationType and Package/CompleteInstantiation. DONE

080410/Thursday

Today I was back at the strange situation where in bundle/test1.z (the one still using the above old-style call), I get an error for the attempt to call lma_AddWidget in LayoutManager/AddWidget, but I do not get one in Container/AddWidget. The error is :Uninstantiated polymorphic field "lm_manager" cannot be passed for non-'ro' '@' parameter "lm":.
The problem is that code in package LayoutManager should have *more* access to lm_manager than code in package Container, not less access. This occurred after I #if-ed out code in callParamCheck that overrides moderr_uninstPoly when basesMatch has returned true. What is going on is that the later code that overrides moderr_privateRecord is still active, based on the fact that we are passing the '@' of lm_manager to a proc from a type in package LayoutManager. If I remove that overriding as well, then the second call gets :Private (here) record field "lm_manager" cannot be passed for non-'ro' '@' parameter "lm":.

With the test3.z form of the program (passing the entire LayoutManager_t to the API procs of LayoutManager, rather than just a ref to the "managerType" bundle param), even with all of those overrides #if-ed out, the program compiles and runs fine. I think this is what I was wanting. When passing a full record pointer to an API proc, the proc has whatever access to the record contents it has based on where the record is defined and where the proc is defined. So, no special casing is needed.

If I want to allow an '@' of a struct to enjoy the same privilege, then I need to have the moderr_privateRecord override. However, that should be done by passing the override into "modifiable" itself, so that if it will end up not returning moderr_privateRecord it will call "typeModifiable" and so might get moderr_uninstPoly. With that change, I think both uses would have gotten the first error message, even with the moderr_privateRecord override active.

So, first, I need to #if out all of the overrides and see if I can make variants of all bundle test stuff work properly. Note that the bundle uses for ioProcs and construct procs do not rely on the overrides. (Should 'ioProc' change to 'io'? The 'Proc' part seems redundant. Would 'io' as a reserved word be too short?)
If all works out well, I need to decide if I want to do the special case of allowing write access to a struct field to a package which exports that struct field, even though the caller himself doesn't have write access to it. Maybe I just don't want to do that - given that I can get rid of the overrides, it would be a wart. If I do want to do the above, note that the lack of write access because the current context is in the wrong package must be the *only* reason for lack of write access. E.g. if the struct is within a record defined in another package, and I don't have write access to that record, then I should not grant write access to the routine from the package that defines the struct - such access should only be granted by the package that owns the record type, since it is the one that is maintaining the semantic consistency of everything in its record type.

In callParamCheck, I can't just start with a call to something like Types/ContainsUninstPoly, and require that basesMatch if I find one. That would prevent me from passing polymorphic capsules around. What the current check does is to first call "extractDefiningBundle". If that is non-nil, then the proc we are calling is coming from a type associated with the returned bundle. Thus, it is possible for that proc to be one which takes uninstantiated values. Procs not coming from such a source may take uninstantiated values as parameters, but cannot do anything harmful with them since they cannot convert them to instantiated values. Only when a proc comes from a polymorphic value can it possibly be one which is declared to take uninstantiated values, and so will treat them as uninstantiated. This is safe because there is no way to construct a value of a type defined in the bundle containing pieces from multiple instantiations - all occurrences of the bundle parameter types within a type defined in the bundle are instantiated the same, and other types will not match them.
So, a risk is that "extractDefiningBundle" will find no bundle, and so Types/ContainsUninstPoly will not be called, nor will basesMatch. This is why I'd like to require that any proc expression being called must have a named type - it greatly cuts down on the possibilities for this risk. Note that I do need to call Types/ContainsUninstPoly, since otherwise the test would require that all parameters to a function from a capsule have "basesMatch", and that would be far too restrictive - e.g. the base of a uint constant will not match. The real restriction for polymorphic bundles is that the proc being called and all involved parameters be from the same, if any, instantiation of a given bundle. However, there is no way to determine that at compile time. A run-time test could be used, but it might require hidden extra data within the "capsule" (assuming the compiler can identify "capsule" types) to do that. The restriction that the proc and the actual parameters have the same "base" is a way to guarantee the above, but by allowing only a more restricted set of calls. Does the restriction really matter? My current belief is that it does not.

080411/Friday

Started the Debug package last night, and nearly finished it today. Done.

%%% Ick. 'compileTime' stuff cannot use Fmt since package-level init has not yet been done. Need a fix for this eventually.

080412/Saturday

If I take away the ability to have unnamed proc types, how do I prevent someone from simply calling the proc-type constructor routines to build one? Can I simply add a symbol parameter to Types/ProcStart? How would that work in the parser? That would essentially mean that the proc-type code would call NamedNew internally, etc. Should be do-able. [Later: I resolved the issue here by adding a "containingBundle" field to proc types.]

%%% Is there any reason to require that construct procs must return void? I could have constructs that yield things otherwise.
This came up with the idea of someone doing a compiler for Python in Z - I think Python has constructs whereby one can execute a loop and build up a vector of values to return as the result of that loop. This goes back to the similar thing I had in my ALAI language. The concept for Python that is in my head here is that all variables are of type 'any'. There would be a package "Any" that exports a bunch of stuff needed to work this way. The Python parser would have to generate declarations for variables, fields, etc., but would always just use type 'any'. Otherwise, the hope is that the Python stuff could be supported in Z using Any, and would just call Exec, Proc, etc. to generate normal Z internal structures, which could then be compiled to somewhat optimized machine code. For example, something in Any could be along the lines of:

    export proc Add(any a, b)any:
        Base/Uint_t ua, ub;
        assign(ua, a);
        assign(ub, b);
        if ua ~= nil and ub ~= nil then
            return Base/Uint_t(ua.theUint + ub.theUint);
        fi;
        Base/Float_t fa, fb;
        assign(fa, a);
        assign(fb, b);
        if fa ~= nil and fb ~= nil then
            return Base/Float_t(fa.theFloat + fb.theFloat);
        fi;
        ...
        BI/Abort("mismatched arguments to '+'");
    corp;

It's not at all clear how to implement Python's exceptions.

If I make the restriction on proc types being named, then do I even need "extractDefiningBundle" as it is now? All procs have a type. I simply take the type of the proc being called, and see if it was defined in a bundle, yielding that bundle if so. There is no need to track up through the expression that yielded the proc - it is just that type that matters. If the proc is a fixed proc, and not a proc expression, it will still have a proc type. If it is an uninstantiated proc from a generic bundle, then use of it is prevented by the "Cannot use uninstantiated generic proc outside of its bundle" error from Exec/symbolRef.
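The dispatch that the Add proc above performs with a chain of 'assign' tests can be sketched in C. This is an illustration only. The Zed version relies on 'assign' comparing Type_t pointers, while this sketch substitutes an explicit enum tag. Any_t, anyUint, anyFloat and anyAdd are hypothetical names. Like the Zed code, it checks each supported pairing in turn and aborts on a mismatch, so every new pairing adds another test to the chain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tags standing in for the Base/Uint_t and Base/Float_t types. */
typedef enum { ANY_UINT, ANY_FLOAT } AnyKind_t;

/* A boxed 'any' value: a tag plus storage for each supported payload. */
typedef struct {
    AnyKind_t kind;
    union {
        uint64_t theUint;
        double theFloat;
    } u;
} Any_t;

static Any_t anyUint(uint64_t v) {
    Any_t a;
    a.kind = ANY_UINT;
    a.u.theUint = v;
    return a;
}

static Any_t anyFloat(double v) {
    Any_t a;
    a.kind = ANY_FLOAT;
    a.u.theFloat = v;
    return a;
}

/* Try the (uint, uint) and (float, float) pairings in the same order as
 * the Zed Add; any other combination is a run-time error. */
static Any_t anyAdd(const Any_t *a, const Any_t *b) {
    assert(a != NULL && b != NULL);
    if (a->kind == ANY_UINT && b->kind == ANY_UINT) {
        return anyUint(a->u.theUint + b->u.theUint);
    }
    if (a->kind == ANY_FLOAT && b->kind == ANY_FLOAT) {
        return anyFloat(a->u.theFloat + b->u.theFloat);
    }
    fprintf(stderr, "mismatched arguments to '+'\n");
    abort();
}
```

For example, adding anyUint(2) and anyUint(3) yields a uint-tagged 5. Mixing a uint with a float reaches the abort, much as BI/Abort is reached in the Zed version.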
For an instantiated proc from a generic bundle, the type of the proc comes from the ProcInstantiation_t record, and is an instantiated form of the type of the proc from the bundle, as created in Proc/Instantiate. %%% Add tik_setOneof and tik_bits to bcRun/showFrame. Error "Cannot use instantiated generic proc as a value" is produced by Exec/assignIncompat. The comment says "/* Only instantiated procs from generic bundles can be used as values. */", but the code complains only if the bundle is generic. Which was it supposed to be? Given my desire to not have procs inside polymorphic bundles, the test of the bundle is not needed - they will always be generic. I can sort of see issues with allowing an eik_instantiatedProcRef to be used as a value - proc instantiations point at the original uninstantiated proc, and share the code, so what would they be at run time? The current byteCode generator uses the value of the generic proc's Proc/Proc_t as the value. Thus, 'assign' would allow a proc from any instantiation to be put into a variable of uninstantiated proc type. Testing with the check in assignIncompat reversed... this doesn't work, since I can't do the 'assign' to an uninstantiated type within any generic proc, and no uninstantiated type from a generic bundle can be used outside of that bundle. So, I don't think this test matters anymore. But, I may leave it in, without the test of whether the bundle is generic or not, just as a nicer message. [Done] If I try the 'assign' outside of the bundle, it should fail, since I have to 'assign' to a variable of an instantiated type. Verified. Fascinating, I can't assign a proc which has been forced to have a named type to a Proc/Proc_t! They are different types, and since both are named, the assignment doesn't work. That's coming out of the handling of proc types in Exec/assignIncompat. [Changed assignIncompat to allow the assignment.] 
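A small C model of why a run-time check cannot tell instantiations apart when the bytecode value of an instantiated proc is the generic proc's Proc/Proc_t. Every ProcInstantiation_t points back at the same shared Proc_t, so the values compare equal. The struct layouts and the procValue and sameRunTimeValue helpers are hypothetical simplifications, not the real Proc or Exec definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque type descriptor; in this sketch types are only compared by pointer. */
typedef struct Type Type_t;

/* Hypothetical, reduced Proc/Proc_t: the generic proc whose code is shared. */
typedef struct {
    const char *name;
    const Type_t *procType; /* type of the uninstantiated proc */
} Proc_t;

/* Hypothetical ProcInstantiation_t: points back at the original proc. */
typedef struct {
    const Proc_t *generic;      /* shared with every other instantiation */
    const Type_t *instProcType; /* instantiated form of the proc's type */
} ProcInstantiation_t;

/* Assumed code-generator behaviour, per the notes: the run-time value
 * of a proc reference is the generic proc's Proc_t. */
static const Proc_t *procValue(const ProcInstantiation_t *pi) {
    assert(pi != NULL);
    return pi->generic;
}

/* Pointer identity is all a run-time check has to go on. */
static bool sameRunTimeValue(const ProcInstantiation_t *a,
                             const ProcInstantiation_t *b) {
    return procValue(a) == procValue(b);
}
```

Two instantiations of the same generic proc, for different bundle arguments, therefore produce identical run-time values. This is the situation in which 'assign' would accept a proc from any instantiation.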
Done the removal of ctx_activeProcInstantiation and Proc/SetActiveProcInstantiation, and the test for it in Exec/CallStart. Since the error message number was a duplicate, nothing to remove there.

080413/Sunday

Ripped the overrides out from callParamCheck and assignIncompat.

%%% Note that test/colcol.z was using Types/ExportAdd (which is compileTime) from inside a compileTime proc, and that was failing with Exec/ProcCheck. Changing it to use Types/DoExportAdd works. Should investigate.

080414/Monday

Minor change to allow an '@' bundle param to be instantiated with a union type. This keeps the triple struct/union/array consistent with the complaint about assigning to them. Tested and updated error comments in all the newer test programs in the bundle test directory. All seems well.

How does an entity implement more than one interface? Can it end up just being a single allocated item, or are two needed? Can some added meaning for concepts like 'inline' and 'noInit' help out? Could there be a field property 'hidden' which means more than just 'noInit'? Are there situations in which 'inline' can apply to record types? I guess I could define it to mean whatever I wanted it to mean. Likely I could only 'inline' a record type into another record type. But, what of the semantics of records? I think that idea is a non-starter. [Later: I think the idea of 'hidden' has become 'private' fields.] Can an entity be in more than one list from the new Lists package? Again, can I avoid two allocations to create something like that?

Added better error messages for using a named proc type versus an unnamed one. Tested in test/proctest.z .

%%% Proc types appear to allow a non-ro '@' formal - fix. Actually, right now there is no provision in a proc-formal in a proc-type to record either 'ro' or 'volatile' - those are attributes of the actual proc formal in the proc itself. That also means those attributes cannot participate in any type checking involving proc types. Should they?
I don't think either can make any difference to a caller of the proc. Within the proc, 'ro' can be a documentation aid. 'volatile' can be significant for low-level programmers who take the address of the formal parameter. Nothing changed.

Adding the rule that procs used as values must have a named proc type is a nuisance - you have to have that name when using BI/Disassemble! You also have to create a name for any "fmt" proc you produce. Perhaps undo that rule, and only insist on a named proc type when a proc value is called. Bah! The latter is a nuisance when trying to call a proc you have just generated - the 'procAssign' fails. In fact, I think it *prevents* any use of a generated proc. So, maybe only require the name when I need to have one - if "getDefiningBundle" sees that the proc to be called is coming from an expression and not an eik_procRef or eik_instantiatedProcRef. Even then, it seems hard to avoid making proc values harder to use. I think perhaps the answer is to store the containing bundle right in the proc type. Whenever a proc type is created, copy the containingBundle from the ctx to the proc type. Then it is always there. If the proc type actually has no formals of types involving bundle types, then the check using ContainsUninstPoly will report nothing. DONE. Note that bc_pchk, used for 'procAssign', will unconditionally return nil if the proc's containingBundle is non-nil.

080415/Tuesday

I've added pd_containingBundle to proc types. It is only relevant when the bundle is a polymorphic bundle. Everything that should run still runs. Now to test the error cases to make sure they still get an error. DONE.

080416/Wednesday

A common pair is to call Package/CreateDirectReference and then call Exec/PackageSymbolRef. Perhaps combine those, which has the added benefit of not exposing the intermediates to the caller. Can I then get rid of Package/CreateDirectReference totally? How about the other package routines that create such references?
MOSTLY RESOLVED

When appending bundle and package contents items, check that those items actually were created under a Proc/Context_t in which the bundle/package was the active one. Handled some of this right away, by not passing in the Bundle_t at all, but getting it from the Proc/Context_t. RESOLVED

There is no complaint when a proc formal symbol is re-used as a local variable within the proc. FIXED

Prevented polymorphic bundles from containing procs. Disallowed 'assign' to proc variables - must use 'procAssign'. This is done so that the additional check (proc is not from bundle) done by 'procAssign' cannot be subverted. There might be more such checks in the future. Hmm. This doesn't really accomplish much, since you can easily just wrap a proc in a record or matrix, and assign that. However, I don't know what my representation of proc values will be when moving to native code, so it might be best to keep both constructs for now.

080418/Friday

%%% Accidentally ended up with both a package-level variable and a package instantiation with the same name. The instantiation came first. There was no complaint about duplicate names.

Interesting, but it makes sense. If you add a custom formatting routine to a type that is from an instantiation, then the name you use matters. If you make an alias, and you add the routine to that alias, then you will get its "fmt" proc only if a variable is of that alias of the type.

080419/Saturday

%%% I could add some new abilities to Fmt. I could allow a format cycle to have no mainExpr (do an iop_main phase with a nil Exec_t). Then, the format string can specify what to generate internally. Some ideas:

    pr - the name of the proc containing the Fmt call
    pk - the name of the package containing the proc
    pt - the path to the proc
    ct - time of compilation as HH:MM:SS
         (could allow a precision to add .xxx after SS)
    cd - date of compilation
    cdt - date and time of compilation
    rt - time of execution
    rd - date of execution
    rdt - date and time of execution

These could look something like this:

    Fmt/Fmt(:: rdt, ": ", eventString(), '\n');

Alternatively, I could reserve a bunch of identifiable string constants to specify what is wanted, thus allowing the format string, etc. to keep their more normal meanings. That sounds a bit limiting on the ability to handle strings, though.

080420/Sunday

Last week sometime, I pretty much reached the decision that the language name should be "Zed". Not "Z", but explicitly "Zed". I could use that as a file-name suffix without problems, I expect. Also, at this point I don't think there is an existing conflicting language anywhere. Plus, it forces the pronunciation away from the American "Zee".

Last night, discussed with Roel and Don a bit. There doesn't appear to be a way in C++ to build multi-way lists, where the elements are on multiple linked lists like the queue.h macros can do. Roel says when using the STL stuff in C++, you typically have one record that is the data, and then have multiple records for the multiple linked lists you want to use. I suspect I *can* do what the C macros do, using compileTime procs, and requiring continual specification of the various field names, etc. just like the C macros do. So, it looks like I have nothing more to do there.

The only example I've seen so far where the concept of partial instantiation of a bundle could be useful is in the Mapping example, where instantiating the key type with 'string' (or 'uint') without instantiating the value type could be useful. Again, I think for now I won't pursue that any more. Which means that other than trying yet again to clearly specify them, I'm again done with bundles.

080501/Thursday

Been working on Don's dgol in Z.

Currently record fields have flags: isRo, isVolatile, isInline. I'm thinking of adding one more - isPrivate.
That makes 4, so it might be worthwhile to switch to a bits (in Z) representation of the flags.

If the isInline flag is set, then the fields of that inlined struct should be present in a record constructor for the record. DONE

Could add a new flag, isPrivate. What it says is that even though the record/struct is public, that field is private. So, it should not be given in record constructors and cannot be modified outside of the defining package. In the case of a record type defined within a generic bundle, the field should be modifiable only within code within that bundle. isPrivate trumps isInline, in that a field of an inlined struct that is private is not included in record constructors and is not writeable. It *is* directly selectable, however (the current use of isInline). DONE.

Does it make sense to default structs, like records, to be private to the defining package? What would it mean? Well, even if a containing record or struct is public, the fields of the private struct could not be modified, and are not given in record constructors. Code outside of the defining package could declare variables/fields of the private struct type, but could not assign to the fields of such variables - they would have to call a proc, within the defining package, passing the '@' of the struct variable/field, to get the struct modified.

Some consequences of these changes:

1) the Lists package could mark the link fields of the list record types as 'private', and thus those fields would not be present in record constructors, and could not be modified by code outside of the Lists package. The package could also mark the data field as 'inline', which means that record constructors would include the fields of such a struct. Thus, a record constructor would contain exactly the fields of the struct, which is just what is really wanted.

Note that currently only struct/array types can be used as an '@' bundle parameter type. Arrays cannot be inlined.
So, would that be an error, or would I just prohibit the use of arrays in that situation? It doesn't seem too big a restriction. In modelling the X events, it would be useful to have the X id field be common to all variants. E.g. in dgol, Don used that to look up the affected window before any further processing. That allows ignoring events that the window doesn't want or doesn't support, etc. If I wanted to go that path, what would it look like? A current constructor, e.g.: ExecInfo_t.eik_binary(bin) could end up as: ExecInfo_t(eik_binary(bin)) which is only one character longer, and extends readily to: ExecInfo_t(common1, common2, eik_binary(bin)) This could be a merge of the record and oneof concepts. What would the declaration look like? Current: type Variable_t = oneof case v_kind incase vk_string: string v_string; incase vk_uint: Base/Uint_t v_uint; esac; could perhaps end up as: type Variable_t = record { case v_kind incase vk_string: string v_string; incase vk_uint: Base/Uint_t v_uint; esac; }; which then could be extended as: type Variable_t = record { uint v_key; string v_name; case v_kind incase vk_string: string v_string; incase vk_uint: Base/Uint_t v_uint; esac; }; This requires no change in indentation or style, so it is not hard to change to. Now I need to go re-read my objections to this. Note that the variant part would be restricted to the last field of the record. Hmm. One of my objections involved memory usage patterns. Thinking about it now, it doesn't make much sense - I think I was thinking about the variant parts being of different sizes, which then means several different sizes for the overall variant record. Note that doing this clears up the use of 'oneof' when parsing types. It is no longer a top-level thing for variant records, and is only a set-oneof at the top level. I could entirely remove the current "case-oneof" concept, other than as a part of record types, and the "oneof" concept would then only refer to set-oneof types.
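The merged record/variant idea maps fairly directly onto the common C idiom of shared fields followed by a tag and a union. The sketch below is only an analogue under assumptions: the names Variable, VariableKind, VK_STRING, VK_UINT, variable_key and variable_format are hypothetical stand-ins for Variable_t, v_kind and friends, and uint64_t stands in for Base/Uint_t. It shows the property wanted for the X events: the common fields can be read before the variant part is looked at, and the variant part sits last. C sidesteps the memory-usage question by making the union as large as its largest member, so every value has one size; Zed need not make the same choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical C analogue of the proposed Zed Variable_t: ordinary common
   fields first, then a tag plus a union as the final "variant" part. */
typedef enum { VK_STRING, VK_UINT } VariableKind;

typedef struct {
    unsigned v_key;           /* common to every variant */
    const char *v_name;       /* common to every variant */
    VariableKind v_kind;      /* selects which union member is valid */
    union {
        const char *v_string; /* valid when v_kind == VK_STRING */
        uint64_t v_uint;      /* valid when v_kind == VK_UINT */
    } u;
} Variable;

/* Common fields can be read without knowing the variant, the way dgol
   looks up the target window before dispatching on the event kind. */
static unsigned variable_key(const Variable *v) {
    return v->v_key;
}

/* Formats "name=value" into buf; only this step needs the variant part.
   Returns the snprintf result (characters that would have been written). */
static int variable_format(const Variable *v, char *buf, size_t n) {
    assert(v != NULL && buf != NULL);
    switch (v->v_kind) {
    case VK_STRING:
        return snprintf(buf, n, "%s=%s", v->v_name, v->u.v_string);
    case VK_UINT:
        return snprintf(buf, n, "%s=%llu", v->v_name,
                        (unsigned long long)v->u.v_uint);
    }
    return -1; /* unreachable for valid tags */
}
```

Usage: `Variable b = { 8, "count", VK_UINT, { .v_uint = 42 } };` followed by `variable_format(&b, buf, sizeof buf)` yields "count=42", while `variable_key(&b)` never touches the union.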
That is a fairly large editing session, and change of thinking, unfortunately. ALL DONE. 080502/Friday Get SEGV during instantiation if you predeclare a record type outside of a bundle, then define it inside that bundle. Need an error message instead. Either that or figure out how to allow predeclaration of types defined in bundles. Likely the former. [Done] It's looking like it would be useful to be able to fully predeclare types defined in bundles. That may not be that bad - just recognize when we are re-entering a bundle that has already been defined. Also, there can then be no checks, at the end of the bundle, that all of the types in it that were predeclared are now defined. There will likely be issues if someone then tries to instantiate a bundle that still contains types that are not yet defined. Have to watch for that and issue an error message and fill in the type with any needed dummy info. In C++, if class B inherits from class A, then a class B pointer can be used as a class A pointer (sort of). Could I allow that in Zed? E.g. type AData_t = struct { uint ad_count; string ad_name; }; type A_t = record { AData_t a_ad; }; type BData_t = struct { AData_t bd_ad; float bd_size; }; type B_t = record { BData_t b_bd; }; Any reference to a B_t would appear to be a valid reference to an A_t as well. Could I then allow: A_t aVar := A_t(...); B_t bVar := B_t(...); aVar := bVar; I could of course not allow bVar := aVar; If the B_t record itself just directly has the BData_t stuff, that ought to work as well, but doesn't allow the process to continue. If all of the structs are always 'inline', then bVar.ad_count is a valid field selection, etc. [That aspect of 'inline' verified, in test/inline.z] I tend to prefer explicit things, so it might be better to not allow the assignment directly, but to have an explicit conversion available. The question is what it would look like.
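The C++ behaviour being imitated rests on a plain C guarantee: a pointer to a struct, suitably converted, points to its initial member. A minimal C sketch of the AData_t/BData_t example, assuming C11 (for _Static_assert); AData, BData, b_as_a and a_bump are hypothetical names. b_as_a plays the role of an explicit conversion rather than allowing aVar := bVar directly, and there is deliberately no function going the other way, matching the ban on bVar := aVar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C rendering of AData_t / BData_t. */
typedef struct {
    unsigned ad_count;
    const char *ad_name;
} AData;

typedef struct {
    AData bd_ad;    /* must stay the first member for the conversion below */
    float bd_size;
} BData;

/* Compile-time check that the "inherited" part really is at offset 0. */
_Static_assert(offsetof(BData, bd_ad) == 0, "bd_ad must be the first member");

/* Explicit upward conversion, standing in for allowing aVar := bVar.
   Returns the same address as (AData *)b, without needing a cast. */
static AData *b_as_a(BData *b) {
    return &b->bd_ad;
}

/* Code written only against AData works on the prefix of a BData. */
static unsigned a_bump(AData *a) {
    assert(a != NULL);
    return ++a->ad_count;
}
```

The offset-0 requirement is the same one that forces the 'inline' (and later 'inherit') field to be first: any other position would need a pointer into the middle of an allocated chunk.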
On that thought, I might also prefer that this "inheritance" be explicit rather than this strange implicit thing. One thought: if a record type is 'inline' in another record type, and is the first field of that second record type, then this situation exists, and perhaps just a reference to the field name is what is needed. To be more explicit: type A1_t = record { uint a1_count; string a1_name; }; type B1_t = record { A1_t inline b1_a1; float b1_size; }; A1_t a1Var := A1_t(10, "Fred"); B1_t b1Var := B1_t(20, "Wilma", 25.6); a1Var := b1Var.b1_a1; I'm not sure whether or not the A1_t fields should be given in the constructor for the B1_t. Syntactically, this looks pretty much the same as if the 'inline' were not there. However, there is only one chunk of memory, not two. Doing it this way, I do have to restrict the 'inline' to the first field of the larger record type, since the pointers only work out for the first field. In C++, pointers aren't to typed chunks like in Z, so they can build a pointer to the middle of an allocated chunk and have it work. I can't do that. I don't think 'inline'ing the overhead fields would work out properly. How does this interact with structs? I think it doesn't, i.e. you can't 'inline' a record type into a struct type. The question about whether the A1_t fields are given in a constructor for a B1_t suggests that it shouldn't be the 'inline' flag that says whether or not such inlined fields are included in constructors. So, that goes back to my old thoughts of the 'noInit' flag. If you 'inline' one record type into another, but do not want the fields to be given in constructors, then specify *both* 'inline' and 'noInit'. Either flag can be used alone as well.
type A2_t = record { uint a2_count; string a2_name; }; type B2_t = record { A2_t inline noInit private b2_a2; float b2_size; }; proc initA(A2_t a2; uint count; string name)void: a2.a2_count := count; a2.a2_name := name; corp; A2_t a2Var := A2_t(10, "Fred"); B2_t b2Var := B2_t(25.6); initA(b2Var.b2_a2, 20, "Wilma"); a2Var := b2Var.b2_a2; a2Var.a2_count := 11; /* legal */ b2Var.a2_count := 22; /* not legal because of 'private' */ a2Var.a2_name := b2Var.a2_name; /* legal */ 080505/Monday The other flag I was thinking about for fields was "noInstantiate" (or some such). That causes the field's type to not be instantiated when the struct/record type is instantiated. This would be useful if, say, a Window_t wanted to have a reference to its parent Window_t - that parent is a different Window_t altogether, and need not be of the same instantiation of this Window_t. 080506/Tuesday What I did yesterday in bundle/twopoly.z is an example, using the current Zed stuff, of handling multiple inheritance. That's actually not what is needed to merge the event API with the container API from Don's dgol. What is really needed is just adding more API members to the event API to produce a combined API - the container calls are never used with something that is not also a window. So, a way to handle that is some kind of merged API record, which contains the event API as its first part, and then the added container API as its second part. With the syntax of the above 'inline' for records, it could be something like: type MergedAPI_t = record { EventAPI_t inline ma_eva; AddWidget_t ma_addWidget; SubWidget_t ma_subWidget; ChildWantsResize_t ma_childWantsResize; }; Doing it this way may allow doing it without the extra indirection needed, as used in twopoly.z. However, I don't know how the types would work out, given that "EventAPI_t" is actually an instantiation of a type defined inside a bundle. It *might* work out, somehow.
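In C terms, MergedAPI_t is the familiar arrangement of a function-pointer table that starts with its parent table. This is a hedged sketch, not dgol's actual code: Window, EventAPI, MergedAPI, eva_handle, container_handle, container_add, container_sub, dispatch and as_merged are all hypothetical names, and only two container entries are shown. Generic event code sees only the EventAPI prefix, while container code, which knows its windows were built with the merged record, converts back; that conversion is only valid because ma_eva is the first member.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Window Window;

/* The event API every window has. */
typedef struct {
    int (*eva_handle)(Window *w, int event);
} EventAPI;

/* Merged API: the whole event API first, then the container additions. */
typedef struct {
    EventAPI ma_eva;                               /* must be first */
    void (*ma_addWidget)(Window *w, Window *child);
    void (*ma_subWidget)(Window *w, Window *child);
} MergedAPI;

_Static_assert(offsetof(MergedAPI, ma_eva) == 0, "ma_eva must be first");

struct Window {
    const EventAPI *api;   /* may really point at a MergedAPI's ma_eva */
    int children;
};

static int container_handle(Window *w, int event) { return event + w->children; }
static void container_add(Window *w, Window *child) { (void)child; w->children++; }
static void container_sub(Window *w, Window *child) { (void)child; w->children--; }

/* One constant merged record shared by all container windows. */
static const MergedAPI containerApi = {
    { container_handle }, container_add, container_sub
};

/* Generic event code: knows only about the EventAPI part. */
static int dispatch(Window *w, int event) {
    return w->api->eva_handle(w, event);
}

/* Container code: converts back to the merged record. Valid only for
   windows whose api really is &someMergedAPI.ma_eva. */
static const MergedAPI *as_merged(const Window *w) {
    return (const MergedAPI *)w->api;
}
```

No extra indirection is needed here: each window holds a single pointer, and the container entries are reached by widening that same pointer.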
However, to make the new container sub-API work, there needs to be another bundle, which has the merged API, so that containers can be done properly. More thought and experimenting needed. It would also be very nice to have an easier way to construct the actual API records involved. Could I allow a record constructor to accept a record value for a first field, just like the 'inline' allows that first field to be a record type? Perhaps the 'inline' *requires* the first field to be initialized from a record value? Here's something to try: do things work out somehow if the merged API just has a non-inline reference to the event API? 080520/Tuesday Done lots of Lego stuff for ETS, and been looking at dgol more. May want an explicit 'inherit' flag that could be used in both records and structs, but only in the first field. Then there isn't any implicit intent for this inheritance. 'inherit' would imply 'inline'. It may or may not imply 'private' or 'noInit'. I'm thinking that the only difference between 'inherit' and 'inline' is the extra type relaxation implied by 'inherit'. Hmm. It might be more obvious if the 'inherit' actually appears *before* the type of the first field. All of the flags could be done that way, and the result might be clearer. I could also syntactically require the 'inherit' to be the first field, to make it stand out more. I'm not sure how to proceed at this point. My current run at a Zed version of Don's dgol/botox is quite painful - I am fighting both all of the X stuff that Don has used, as well as all of the C++ features. I'm thinking that maybe I should make a backup of Zed at this point, and then dive in and try adding all of these new field flags. If they don't work out, then that is all a wrong direction anyway. A YEAR LATER - DGol runs. Just started changing the code to use a flags field. It occurs to me that allowing a bits type to be 'inline' could be useful. Wouldn't be as simple to implement, perhaps, but would be handy for, e.g.
Types/FieldList_t. Something that got missed in the bits stuff is the readability of a bits constant. One of the things I aimed for was to be able to specify a list of flag values by the names of those flag fields. E.g. for my new FieldListFlags_t, it would be nice to say: FieldListFlags_t flags := FieldListFlags_t(fl_ro, fl_volatile); instead of: FieldListFlags_t flags := FieldListFlags_t(false, false, false, false, nowRo, nowVolatile); Hmm. These aren't equivalent anyway - it's not actually a constant I want in this situation. Maybe nothing reasonable to be done here. 080522/Thursday Actually did some of the changes. Switched to a bits type in Zed and a z_uint_t of flags in C. Did the parsing, To/From Buffer, etc. Changed the Exec/RecordConstructor stuff. But, it doesn't do the right thing at runtime because the code in bcRun/recordConstructor doesn't compute proper offsets for fields in inlined records/structs. Also, having it call Exec_NextInitField, which allocates/frees memory, is icky. I think perhaps I need to have another field in record and struct descriptors, which is a list of all fields, including inlined ones, but excluding those that do not need initializing. Hmm. Just looking - I don't think I'm getting the offsets right when addressing individual fields in inlined structs if the inlined struct or record is not the first field in the containing one. Modify inline.z to test that... 080523/Friday Lots of time not on Zed. Following on to the above, however. Zed refs ('@'s) are similar to C++'s. In C++, any reference value is guaranteed to be non-nil (barring messing around with pointers and casts, etc.). This is also true in Zed. However, in Zed, there is an additional goal that no ref points into a heap object that can be freed out from under it. This is done by the compiler generating temporary extra references when needed, rather than having the garbage collector and reference counter able to handle pointers into the middle of things. Hmm.
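Returning to the 080522 offset problem: one way to get offsets right for inlined compounds that are not the first field is to flatten them once into a per-type list, adding each inlined field's own offset to a running base. A minimal C sketch, assuming a hypothetical descriptor layout; FieldDesc, TypeDesc, InitEntry, FL_INLINE, FL_NOINIT, FL_PRIVATE and build_init_list are illustrative names, not the real Zed descriptors. It builds the constructor's initializer list and skips 'noInit' and 'private' fields, the role the text gives those flags; an all-fields list for field selection would be the same walk without the skip.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field flags; the real Zed bits type may differ. */
enum { FL_INLINE = 1u << 0, FL_NOINIT = 1u << 1, FL_PRIVATE = 1u << 2 };

typedef struct TypeDesc TypeDesc;

typedef struct {
    const char *name;
    size_t offset;            /* offset within the immediately containing type */
    unsigned flags;
    const TypeDesc *inlined;  /* nested layout when FL_INLINE is set, else NULL */
} FieldDesc;

struct TypeDesc {
    const FieldDesc *fields;
    size_t count;
};

typedef struct {
    const char *name;
    size_t offset;            /* absolute offset from the start of the record */
} InitEntry;

/* Appends, in declaration order, every field a constructor must supply.
   Inlined compounds are expanded recursively with their own offset added
   to the base, so a compound inlined after other fields gets correct
   absolute offsets. A noInit or private field, and everything inlined
   under it, is skipped. Entries beyond cap are dropped. Returns the new
   entry count. */
static size_t build_init_list(const TypeDesc *t, size_t base,
                              InitEntry *out, size_t n, size_t cap) {
    assert(t != NULL && out != NULL);
    for (size_t i = 0; i < t->count; i++) {
        const FieldDesc *f = &t->fields[i];
        if (f->flags & (FL_NOINIT | FL_PRIVATE))
            continue;
        if (f->flags & FL_INLINE) {
            n = build_init_list(f->inlined, base + f->offset, out, n, cap);
        } else if (n < cap) {
            out[n].name = f->name;
            out[n].offset = base + f->offset;
            n++;
        }
    }
    return n;
}

/* Example layout: Inner is inlined as the SECOND field of Outer. */
typedef struct { unsigned x, y; } Inner;
typedef struct { unsigned tag; Inner in; unsigned hidden; } Outer;

static const FieldDesc innerFields[] = {
    { "x", offsetof(Inner, x), 0, NULL },
    { "y", offsetof(Inner, y), 0, NULL },
};
static const TypeDesc innerDesc = { innerFields, 2 };

static const FieldDesc outerFields[] = {
    { "tag", offsetof(Outer, tag), 0, NULL },
    { "in", offsetof(Outer, in), FL_INLINE, &innerDesc },
    { "hidden", offsetof(Outer, hidden), FL_NOINIT, NULL },
};
static const TypeDesc outerDesc = { outerFields, 3 };
```

For Outer this produces tag, x, y (hidden is noInit), with x and y at offsetof(Outer, in) plus their offsets inside Inner. Building the list once per type, and again on instantiation, would also avoid allocating and freeing memory on every constructor call.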
Putting 'private' 'noInit' in the pointer fields in the Lists package's generic types has worked out fine - the user of the package must no longer supply values for those fields in record constructors. However, putting 'inline' into the generic fields is more of an issue. What that means is that I need to special-case allowing a bundle parameter type to be inlined. I also have to rebuild the allFields table and the initList list when a record type is instantiated. Also, I only do it if the instantiating type is a struct - do I ignore other situations, or do I mark the bundle parameter as having been 'inline'd, and then check and complain at the point of the instantiation? I think I do want to do the inlining, but not the complaining. 080524/Saturday Set up wireless router and played with DS/PSP. Ok, doing the 'inline' as above was not an issue. I had to do the error message carefully about when you can allow an '@' bundle param to be 'inline', but after that it all worked with no change. That's because the instantiation code rebuilds the instantiated type using the normal type building code, so all of the needed stuff just happens. Excellent! Well, had to add a test in 'addField' to just skip the error message if the type being inlined is inappropriate, when called with ctx = nil, which is from inside instantiation. Easier to do the test here than trying to redo the checking in the instantiation code itself. 080526/Monday An issue I've run up against. Struct types are not instantiated properly. Well, record types aren't either, really. If the type does not contain any field that uses a bundle parameter, then the code in Types/instantiate0 will just use the original type. But, the code in Package/CompleteInstantiation that is doing this will have created a new struct/record/caseOneof type for the instantiation. The result is that the instantiated type ends up having no fields.
One possibility is to have Package call ContainsBundleParamSubtype to see whether it should use Instantiate directly, or should create a new compound, use that in the NamedNew, then complete the compound. Doesn't work yet, and still more testing needed. But I'm wondering why I'm testing with ContainsBundleParamSubtype at all. Can't I just instantiate all types, and rely on Types/Normalize to get rid of duplicate ones? The result of instantiation, if a type in a bundle doesn't include a bundle parameter, is another name for the same type. By the normal rules, that will not be compatible with other instantiations or with the original type in the bundle. Later: that doesn't work because we don't actually normalize struct/record/union types - they are all considered unique. The simple answer here is perhaps to require that all named types within a bundle must include a bundle parameter type somewhere within them. The current example I'm testing works fine if they do. I really do need an ErrorSub for types. 080528/Wednesday Re-examining the above - trying to push through allowing any kind of type in a bundle. Doing this by checking in Package/CompleteInstantiation, and not instantiating when not necessary. What I run into is that the record type in my test situation references itself (as in a linked list). So what happens is that when I try to instantiate that type, the call to Types/CheckBundleGeneric from within Types/NamedNew is saying that I am trying to use an uninstantiated generic type outside of its bundle. And that is quite correct. So, my attempt right now is to add a new field to Proc/Context_t, ctx_containingBundleInstantiation, which is only set while I'm doing a bundle instantiation, and which can be tested inside of Types/CheckBundleGeneric to allow the use of an uninstantiated generic type within the instantiation of its containing bundle. The above seems to fly. Ok, all spiffed up. Seems all well.
[Later: interesting that this issue came up again after bundles were gone, for 'generic's. The pctx_containingInstantiation method works.] 080529/Thursday Hmm. My test program gets complaints now because the record types from the bundle are not instantiated anymore, and so now they are defined in the package containing the bundle, not the package containing the instantiation, so the record type is no longer public in the using package. Need to look a bit deeper. It's because the concept of a "writeablePackage" is in the RecordDesc_t. It is there to handle record types with no names, such as can be produced by code like the old Lists package. If the type is not instantiated then no new RecordDesc_t is created, and so the name in the instantiation refers to the RecordDesc_t for the uninstantiated type, which has an rd_writeablePackage of the package containing the bundle. Gack. When you predeclare a record type in a bundle, it doesn't yet reference any bundle parameter. So, when you define a type that uses that predeclared record type, that isn't enough to make the second type use a bundle parameter. So, you get a warning about that subsequent type. I saw this in bundle/test3.z, for the API types. There is no warning for the individual proc types because I had made the warning only come out for struct/record/caseOneof types. So, the final answer seems to be to *always* instantiate record and caseOneof types, regardless of whether or not they involve a bundle parameter type. Ok, so what about predeclared struct types? I think I'm OK with them because they have no 'private' attribute on the entire struct, and there is no constructor to deny the use of. That still leaves individual 'private' fields in the struct. Testing. It doesn't seem to matter, but my brain has trouble figuring it all out. See the updated comment in Types/checkBundleGeneric0. Interesting. 
If a field in a type in a polymorphic bundle is made private, then there is no context in which it can be modified, and it is not provided on initializers. Is that what I intended? No, fixed. 080530/Friday Long day of Zed work so far. (Industrial fans drying wall!) Done the first crack at supporting 'inherit'. Works fine so far, with direct inherit of record types, also compatibility with struct types via ref and pointer. Similar for params of proc types. 080603/Tuesday Fans still here. When type-checking involving inherited initial fields, need to do a loop through multiple levels of inherit, not just one. 080607/Saturday "dirSing.z" compiles and runs. Looks like I've matched C++ single inheritance, at least when using records. The Z code does the same for structs, but I haven't tried those out in this way yet. Note: with the way I've added field names from an instantiating struct in an inheriting instantiation to the bundle type, I have multiple meanings for those field names. They have different offsets, and the initial ones are in a struct, and the generated ones are in a record. So, my attempt to make there be only one meaning for an identifier in a given scope does not work. I may as well admit this and remove the checking I do. To-do: when instantiating with an 'inherit' clause, there are checks that later code assumes. They are not yet done. See instantiate0. When instantiating, should it be "instance" as opposed to another use of the "bundle" token? 080609/Monday %%% <> When I get the parser translated to Z (just 4 source files, I think), it should be possible to do an interesting scheme for automatic testing. Write a compile-time routine, say "ErrorExpect", and another, say "ErrorDone". The first registers an expected error somewhere, and the second checks that the error happened. Write a different "main" for the overall compiler in Z that has a lex/parse/whatever context to hold that information. 
Then, can just put calls to those routines right in the source for test files, and then just compile them with the different variant of the compiler in Z, and it should check that expected errors, and no others, come out. 080629/Sunday For the last little while I've been working towards more complete ability to do class-like and interface-like things in Zed, but doing them explicitly. I think I'm close. I've been doing this via some Zed source files to see what it looks like, with commentary on issues. I'm dumping those here now, even though some were done a few days ago. class1.z: package /Parent; /* Note the 'class'. Note the lack of bundle parameters here. */ export bundle class ParentB() { export class type Parent_t = record; export type Display_t = proc(Parent_t par)void; export api type ParentApi_t = record { Display_t papi_display; }; export class type Parent_t = record { ParentApi_t par_api; string par_tag; }; }; package /Child; use /Parent; /* Note that we 'inherit' the entire bundle, not just a type from it. Here we expand both the api and the data record. */ export bundle class ChildB(inherit Parent/ParentB) { export class type Child_t = record; export type Use1_t = proc(Child_t ch)void; export api type ChildApi_t = record { inherit Parent/ParentApi_t capi_papi; Use1_t capi_use1; }; export class type Child_t = record { inherit Parent/Parent_t ch_par; uint ch_count; }; }; package /GrandChild; use /Child; /* Here we expand only the API. */ export bundle class GrandChildB(inherit Child/ChildB) { export class type GrandChild_t = record; export type Use2_t = proc(GrandChild_t gch)void; export api type GrandChildApi_t = record { inherit Child/ChildApi_t gcapi_capi; Use2_t gcapi_use2; }; export type GrandChild_t = record { inherit Child/Child_t gch_ch; }; }; package /GreatGrandChild; use /GrandChild; /* Here we expand only the data record. 
*/ export bundle class GreatGrandChildB(inherit GrandChild/GrandChildB) { export api type GreatGrandChildApi_t = record { inherit GrandChild/GrandChildApi_t ggcapi_gcapi; }; export class type GreatGrandChild_t = record { inherit GrandChild/GrandChild_t ggch_gch; float ggch_size; }; }; /* What has to be going on here: GreatGrandChildApi_t has to have fields: papi_display, capi_use1, gcapi_use2. GreatGrandChild_t has to have fields: GreatGrandChildApi_t par_api, string par_tag, uint ch_count, float ggch_size. Types GrandChild_t and GreatGrandChild_t can be passed to a Use2_t. Types Child_t, GrandChild_t and GreatGrandChild_t can be passed to a Use1_t. Types Parent_t, Child_t, GrandChild_t and GreatGrandChild_t can be passed to a Display_t. Types Child_t, GrandChild_t and GreatGrandChild_t can be assigned to variables of type Parent_t. Types GrandChild_t and GreatGrandChild_t can be assigned to variables of type Child_t. Type GreatGrandChild_t can be assigned to variables of type GrandChild_t. The Display_t proc in GreatGrandChildApi_t must be of type Parent/Display_t, so a display proc defined to accept a GreatGrandChild_t must be able to be declared with proc type Parent/Display_t. Similar up the chain. Note that no symbols are generated by the compiler - only programmer-provided symbols exist. Nor does the compiler add any "missing" symbols to any of the packages - the programmer must provide both the 'api' type and the 'class' type in all of the packages, even if they only 'inherit' from their parent. In fact, the compiler requires that these types so inherit. How does a bundle/class/value implement multiple interfaces? In Java, I believe it works because the static type of any item is known at compile time, and that tells the compiler the complete set of interfaces that the item's class "implements".
Thus, the compiler can put pointers to API records into each object of that class that is created, pointing to the actual functions that implement the various interface methods. Extending an interface simply adds more methods to the end of it, just like "extend"ing a class adds more data fields to the objects of the class. Note that the compiler cannot choose to not have such multiple pointers in each object, trying to have a single pointer to a constant record which contains all of the needed pointers. If it did that, then when the object pointer is passed to a member function of a far ancestor class, the code there would not be able to get at the other member function pointers it knows about. This also means that the order of the pointers to API records is determined strictly by the order of the various "extends" clauses in the class hierarchy, followed by the various API pointers needed by the "implements" clauses. Hmm. How does that even work? If an object is created with class Child_t, which implements interfaces Api1 and Api2, that would require the objects of class Child_t to start with pointers to the full set of class member functions, followed by pointers to records of functions for Api1 and Api2. That means that the pointers to the records for the APIs are not the first field of the object in memory. Ah, that's OK, since you can't inherit a function implementation from an interface. You can only inherit function implementations from a parent class. Thus, if a child class adds an interface that it implements, the pointer to the function record for that new interface must come after all of the data fields that are inherited from the parent class. I find it hard to think about, but I think it all works out fairly simply for Java. As new classes add new data fields, and implement new interfaces, the new fields or API pointers are just added to the main data record.
Any actual function that deals with the record is declared to operate on whatever initial part of the record is defined for the class it is declared in. It will never need, or be able to, access anything beyond that point. No, it doesn't work that simply. Java lets you use the interface type as a real type. So, you can have a function whose parameter is of that interface type, which means that any value passed to that function is of a class which implements the interface. But, if you call an interface member within that function, how is that interface member found? At that point, there is no fixed way to find that interface member, since the pointer to the interface record can be anywhere within the record for the actual class of the object passed in. One way that Java could do it is to construct a pair that consists of a pointer to the interface record and a pointer to the base record, and pass that to a function which requires a type of that interface type. Then, within that function, there is a fixed way to find the needed interface function. In general, any situation that requires an object of an interface type can use a method like that. An issue for Zed is that such an "interface type" does not exist independently - there is in fact no way to declare it as a type, unless I add stuff to allow a bundle to sometimes be a type. Ick. That means the programmer must explicitly define, create and use the above-described pair type. I'm fine with that. In order for that to work, Zed must allow such a type to be defined (how?) and must allow the needed pair values to be created, in contexts which can do that. Hmm. Can this be done by using a polymorphic bundle which inherits the interface (API) type? Or just renames it? Maybe if I switch polymorphic bundles to use 'class' as the parameter type for such interface bundles, as below. */ package /AddNewFuncs1; use /GreatGrandChild; /* Add a new "interface" that we implement. 
*/ export bundle class NewFuncsB(inherit GreatGrandChild/GreatGrandChildB) { export class type NewFuncs_t = record; export type Doit1_t = proc(NewFuncs_t nf; uint n)void; export type Doit2_t = proc(NewFuncs_t nf; bool flag)void; export api type NewFuncsApi_t = record { Doit1_t nfapi_doit1; Doit2_t nfapi_doit2; }; export class type NewFuncs_t = record { inherit GreatGrandChild/GreatGrandChild_t nf_ggch; NewFuncsApi_t nf_api; }; }; /* Hmm. We are defining the "interface" within this "class". In Java you don't have that limitation - you can define it externally. To do this, I might want to have multiple 'inherit' clauses in the bundle header. */ package /Api2; export bundle class Api2B() { export class type Api2_t = record; export type Doit1_t = proc(Api2_t a2; uint n)void; export type Doit2_t = proc(Api2_t a2; bool flag)void; export api type Api2Api_t = record { Doit1_t a2api_doit1; Doit2_t a2api_doit2; }; export class type Api2_t = record { /* No members! */ }; }; package /AddNewFuncs2; use /GreatGrandChild; use /Api2; export bundle AddNewFuncsB(inherit GreatGrandChild/GreatGrandChildB, Api2/Api2B) { export api type NewFuncsApi_t = record { inherit GreatGrandChild/GreatGrandChildApi_t nfapi_gcapi; }; export class type NewFuncs_t = record { inherit GreatGrandChild/GreatGrandChild_t nf_ggc; Api2/Api2Api_t nf_api2; }; }; /* How is the above going to work? There is no 'inherit' with the nf_api2 field in NewFuncs_t. That's because the 'inherit' must be on the first field only. So, how do we pass a NewFuncs_t to a proc that wants to have an Api2/Api2_t parameter (which is empty anyway)? What I'm running into here is that with Java and C++, the 'this' parameter is not explicitly given. I want to explicitly give it, but doing that requires a type for it. That in turn forces the creation of type Api2/Api2_t, which has no other reason to exist. In Java the type exists as the interface type itself. 
However, doing that in Zed would require that the bundle be a type, at least in that case. Interesting point - in C++/Java a class is both a packaging mechanism, like Zed's bundle, and also a type. I *could* partially follow their precedent, and require that such a parameter be the first in all of the procs, and that its type is not given, other than as, say, 'class' or something. That would be a property of an 'api' proc type. So perhaps the 'api' should be on the proc types, and not on the type of the record containing the proc types. Or perhaps on both, and it checks that types within an 'api' record type are 'api' proc types. If I use the "class" syntax for the type, then the parameter doesn't actually have to be the first, since it is explicit. */ package /Api3; export bundle Api3B() { export api type Doit1_t = proc(class cl; uint n)void; export api type Doit2_t = proc(class cl; bool flag)void; export api type Api3Api_t = record { Doit1_t a3api_doit1; Doit2_t a3api_doit2; }; }; /* See some discussion above. The "class" pseudo-type, when used in a bundle that actually defines a class type, refers to that class type. If the bundle doesn't have one, then the reference is not resolved until we are in a bundle which inherits from this "interface bundle". */ /* Ok, can I "extend" an "interface"? */ package /Api4; use /Api3; export bundle Api4B() { export type DoMore1_t = proc(class cl; float f)void; export type DoMore2_t = proc(class cl; string tag2)void; export type Api4Api_t = record { inherit Api3/Api3Api_t a4api_a3api; DoMore1_t a4api_doMore1; DoMore2_t a4api_doMore2; }; }; /* What I would be wanting here is that any value of a 'class' whose API includes Api4Api_t would be assignable to a variable whose type is a 'class' whose API is Api3Api_t. Is that right? No, it's not. A 'class' whose API is Api3Api_t may have data elements in it. A class that implements Api4Api_t may not have those data elements.
So, the assignment can only be allowed if the destination has no data element type, or the source data element type inherits from the destination data element type. */ /* Does that better state the compatibility rule? Data value D can be assigned to variable V if the 'class' type of D inherits from the 'class' type of V and the 'api' type of D inherits from the 'api' type of V. Do I even need the 'bundle' concept here? Switching to file class2.z . */ /* Since there is an explicit 'inherit' within the types within the 'class' bundle, is there any need to actually have the bundle 'inherit' from the bundles its types are inheriting from? */ class2.z: package /Parent; export api type Display_t = proc(class this)void; export api type ParentApi_t = record { Display_t papi_display; }; export class type Parent_t = record { ParentApi_t par_api; string par_tag; }; package /Child; use /Parent; export api type Use1_t = proc(class this)void; export api type ChildApi_t = record { inherit Parent/ParentApi_t capi_papi; Use1_t capi_use1; }; export class type Child_t = record { inherit Parent/Parent_t ch_par; uint ch_count; }; /* The "magic" thing here is that field ch_par is needed in order to 'inherit' the data members of Parent_t, and the 'api' record within it must be mangled to now include the new capi_use1 field, i.e. to be of type ChildApi_t instead of type ParentApi_t. I'm tempted to say that there should be no names for the API types, so that this mangling looks less bad, but then I can't write a constructor for the API record types. */ /* Ick. I just asked myself whether the 'api' field should be implicit, rather than explicit, so that the mangling is invisible. Sigh. The circle closes, and I'm thinking about implicit stuff. */ /* Ok, more thinking. Perhaps the kind of thing to be doing would be to have a syntax that says "this is a field of this compound that represents an API to use the compound with". Then I can put them in both records and structs.
Their special relationship with the containing type is made explicit. */

/* Also, instead of using the 'class' name for special parameters to API functions, it would be more consistent to just use a bundle, and use a bundle parameter name. That allows the API proc types to be either '@' types or not, and to have any desired "ro" flags. Maybe it would work something like the stuff in dirSing.z where there is an 'inherit' clause in a bundle instantiation, but I would be using it to "extend" an "interface" rather than to "extend" the data part of a "class". Using a bundle and its bundle parameters this way also gives more generality, in that there can be multiple bundle parameters that can be used in the "interfaces". Whether that is useful or not I don't know. Also, it would perhaps be good if the "main" API for a "class" is done just the same as any other API that the "class" "implements". */

bundle Api1(type gen) {
    export type Display_t = proc(gen g)void;
    export type Print_t = proc(gen g; string header)void;
    export type Api1_t = record {
        Display_t api1_display;
        Print_t api1_print;
    };
};

export type Parent_t = record;
bundle ParentI1 = Api1(Parent_t);
export type Parent_t = record {
    ParentI1.Api1_t api par_api1;
    string par_tag;
};

export type Child_t = record;
bundle ChildI1 = Api1(Child_t);
export type Child_t = record {
    ChildI1.Api1_t api ch_api1;
    inherit Parent_t ch_par;
    uint ch_count;
};

/* Because field par_api1 of type Parent_t is tagged as 'api', the 'inherit' of that field within Child_t is special. It is not actually inherited, so field "par_api1" does not exist within Child_t. However, the overall 'inherit' of Parent_t into Child_t is still set. Note that Child_t *must* have an 'api' field before its ch_par field in order for that 'inherit' to be allowed. And, that 'api' field must be a corresponding instantiation of the same bundle that is used in the type being inherited from. (Is that the correct restriction?)
Here I don't actually do any kind of inherit from ParentI1.Api1_t into ChildI1.Api1_t. That's fine, since you can't take a value of type ChildI1.Api1_t and assign it to a variable of type ParentI1.Api1_t: the procs in the value would be assuming larger values than would be provided to functions handling only Parent_t's. Individual functions which take Parent_t values are assignable to variables of proc types of corresponding Child_t types, however. But that all should fall out of the existing 'inherit' rules. */

bundle Api2(type gen) {
    /* Note: Api2Data_t is not used until later in the discussion - it is only needed when we want to use Api2 in isolation. Api2 corresponds to a Java interface, whereas Api1 corresponds to a Java class. */
    export type Api2Data_t = record;
    export type Use1_t = proc(Api2Data_t a2d)void;
    export type Api2_t = record {
        Use1_t api2_use1;
    };
    export type Api2Data_t = record {
        Api2_t api a2d_api2;
        gen a2d_gen;
    };
};

export type GrandChild_t = record;
bundle GrandChildI1 = Api1(GrandChild_t);
bundle GrandChildI2 = Api2(GrandChild_t);
export type GrandChild_t = record {
    GrandChildI1.Api1_t api gc_api1;
    inherit Child_t gc_ch;
    GrandChildI2.Api2_t api gc_api2;
};

/* Note that gc_api2 must be *after* gc_ch, else a GrandChild_t cannot be correctly used as a Child_t or as a Parent_t. So, the rule comes down to the idea that fields before an 'inherit' field must "correspond" to the 'api' fields in the type being inherited from. Ok, so how do I 'inherit' from GrandChild_t??? Also, I still haven't done anything here about the Java concept of one "interface" being able to "extend" another, and thus types that implement the extended one being compatible with variables of types that only implement the smaller "interface". Ok, maybe the rule is something like this: when a struct or record is inheriting from another, the 'inherit' field can only be preceded by fields which are marked 'api'.
There must be at least as many of those as there are API fields in the type being inherited. The corresponding ones must match, in the sense that the provided API proc types in the child type must match those in the proc types in the API in the parent type, except that any parameter in the parent that is the parent type itself must be the child type in the child API proc type. I don't know about result types. Note that the current 'inherit' rules for proc types should allow a parent API proc to be assigned to a child API proc variable/field. There can be 'api' fields after the 'inherit' field - that allows them to be put where they will actually appear. Perhaps those before the 'inherit' field must be only those inside of the inherited type. The resulting child struct/record type will thus have its 'api' fields at the same positions as the corresponding 'api' fields in the parent. Any remaining 'api' fields will go after all inherited fields, and before any added data fields. */

export type GreatGrandChild_t = record;
bundle GreatGrandChildI1 = Api1(GreatGrandChild_t);
bundle GreatGrandChildI2 = Api2(GreatGrandChild_t);
export type GreatGrandChild_t = record {
    GreatGrandChildI1.Api1_t api ggc_api1;
    GreatGrandChildI2.Api2_t api ggc_api2;
    inherit GrandChild_t ggc_gc;
    float ggc_size;
};

/* Ok, let's say I want to have code that cares only about Api2_t stuff. I believe that Java would need to build a small structure containing a pointer to an implementing class's virtual function table for that interface along with a pointer to the data value, wrapped together. It could have this inline with the overall single structure, I believe. I can't do that in Zed because it would be a pointer into the middle of an allocated unit, which I don't support. So, I have to do a second allocation to handle that. In many cases I may be able to make Api2Data_t a struct, and use '@' parameters in all Api2 API procs. What about extending an API?
*/

bundle Api3(type gen) {
    export type Api3Data_t = record;
    export type Doit1_t = proc(Api3Data_t a3d; bool flag)void;
    export type Api3_t = record {
        inherit Api2_t api3_api2;
        Doit1_t api3_doit1;
    };
    export type Api3Data_t = record {
        Api3_t api a3d_api3;
        inherit Api2Data_t a3d_a2d;
    };
};

/* Ok, that's somewhat special. The 'inherit' in Api3Data_t is intended to make Api3Data_t values assignable to Api2Data_t variables. But note that it is including a 'gen' field. In Api2Data_t that field is of a type that is a parameter to the Api2 bundle. In Api3Data_t we wish to have it become of the type that is the parameter to the Api3 bundle. If this works out, it could likely be done either this way, or by having data fields whose type is a bundle parameter be not directly inherited, just like 'api' fields. Or maybe even require the 'api' tag on it. Weird - maybe a better name instead of 'api'. Maybe 'match' or something. */

class3.z:

package /Test;
use /Fmt;

/* Given the assignment compatibility rules, how far can I go without anything special? Well, it's close, but it doesn't work. The error messages on the child procs are valid. A proc that deals with a Child_t cannot be compatible with a destination proc variable dealing with a Parent_t, since a proc variable dealing with Parent_t can be passed a value which is only a Parent_t. A proc expecting a Child_t can access the child field "ch_count", which does not exist in a Parent_t. The reverse situation does work, and is currently handled - a proc expecting a Parent_t can be assigned to a proc variable expecting a Child_t. A key thing to note here is the API proc call in "display" is not invoking any of the special compiler code which checks polymorphic value usage. So, even though this particular example would run OK, it is not valid.
*/

type Parent_t = record;
type Display_t = proc(Parent_t par)void;
type Print_t = proc(Parent_t par)void;
type ParentApi_t = record {
    Display_t papi_display;
    Print_t papi_print;
};
type Parent_t = record {
    ParentApi_t par_papi;
    string par_tag;
};

Display_t: proc parentDisplay(Parent_t par)void:
    Fmt/Fmt("parentDisplay, tag '", par.par_tag, "'\n");
corp;

Print_t: proc parentPrint(Parent_t par)void:
    Fmt/Fmt("parentPrint, tag '", par.par_tag, "'\n");
corp;

ParentApi_t ParentApi;
uint Count;
uint MAX = 10;
[MAX] Parent_t Set;

proc parentInit()void:
    ParentApi := ParentApi_t(parentDisplay, parentPrint);
    Count := 0;
corp;

proc add(Parent_t par)void:
    Set[Count] := par;
    Count := Count + 1;
corp;

proc display(string desc)void:
    Fmt/Fmt(desc);
    if Count = 0 then
        Fmt/Fmt(" - empty\n");
    else
        Fmt/Fmt(":\n");
        for uint i from 0 upto Count - 1 do
            Fmt/Fmt(" ");
            Parent_t par := Set[i];
            par.par_papi.papi_display(par);
        od;
    fi;
corp;

type Child_t = record {
    inherit Parent_t ch_par;
    uint ch_count;
};

Display_t: proc
/* Proc formal "ch" does not match type from proc type "Display_t" */
childDisplay(Child_t ch)void:
    Fmt/Fmt("childDisplay, tag '", ch.par_tag, "', count ", ch.ch_count, '\n');
corp;

Print_t: proc
/* Proc formal "ch" does not match type from proc type "Print_t" */
childPrint(Child_t ch)void:
    Fmt/Fmt("childPrint, tag '", ch.par_tag, "', count ", ch.ch_count, '\n');
corp;

ParentApi_t ChildApi;

proc childInit()void:
    ChildApi := ParentApi_t(childDisplay, childPrint);
corp;

export proc main()void:
    parentInit();
    /* childInit();*/
    Parent_t par := Parent_t(ParentApi, "Fred");
    add(par);
    Child_t ch := Child_t(ChildApi, "Wilma", 10);
    /* add(ch);*/
    display("Only");
corp;

package /;
use /Test;

export proc main()void:
    Test/main();
corp;

class4.z (which currently compiles, but has an issue with field offset):

package Parent;
use /Fmt;

export bundle polymorphic Api1(type gen) {
    export type Display_t = proc(gen g)void;
    export type Print_t = proc(gen g; string header)void;
    export type Api1_t =
        record {
            Display_t api1_display;
            Print_t api1_print;
        };
};

export type Parent_t = record;
bundle ParentI1 = Api1(Parent_t);
export type Parent_t = record {
    ParentI1.Api1_t match par_api1;
    string par_tag;
};

uint MAX = 10;
uint Count;
[MAX] Parent_t Set;

export proc Add(Parent_t par)void:
    Set[Count] := par;
    Count := Count + 1;
corp;

export proc Print(string header)void:
    Fmt/Fmt("Print, count = ", Count, '\n');
    if Count = 0 then
        Fmt/Fmt(header, " - empty\n");
    else
        Fmt/Fmt(header, ":\n");
        for uint i from 0 upto Count - 1 do
            Fmt/Fmt(" ");
            Parent_t par := Set[i];
            par.par_api1.api1_print(par, header);
        od;
    fi;
corp;

ParentI1.Display_t: proc parentDisplay(Parent_t par)void:
    Fmt/Fmt("parentDisplay: tag '", par.par_tag, "'\n");
corp;

ParentI1.Print_t: proc parentPrint(Parent_t par; string header)void:
    Fmt/Fmt("parentPrint/", header, ": tag '", par.par_tag, "'\n");
corp;

ParentI1.Api1_t ParentApi;

export proc Parent(string tag)Parent_t:
    Parent_t(ParentApi, tag)
corp;

proc __PackageInit__()void:
    ParentApi := ParentI1.Api1_t(parentDisplay, parentPrint);
corp;

package /Child;
use /Parent;
use /Fmt;
use /BI;

export type Child_t = record;
bundle ChildI1 = Parent/Api1(Child_t);
export type Child_t = record {
    ChildI1.Api1_t match ch_api1;
    inherit Parent/Parent_t ch_par;
    uint ch_count;
};

ChildI1.Display_t: proc childDisplay(Child_t ch)void:
    Fmt/Fmt("childDisplay: tag '", ch.par_tag, "' count ", ch.ch_count, '\n');
corp;

ChildI1.Print_t: proc childPrint(Child_t ch; string header)void:
    Fmt/Fmt("childPrint/", header, ": tag '", ch.par_tag, "' count ", ch.ch_count, '\n');
corp;

ChildI1.Api1_t ChildApi;

export proc Child(string tag; uint count)Child_t:
    Child_t(ChildApi, tag, count)
corp;

proc __PackageInit__()void:
    ChildApi := ChildI1.Api1_t(childDisplay, childPrint);
corp;

package /;
use /Parent;
use /Child;
use /Fmt;

export proc main()void:
    Parent/Parent_t par := Parent/Parent("Fred");
    Parent/Add(par);
    Parent/Print("First");
    Child/Child_t ch := Child/Child("Wilma", 10);
    Parent/Add(ch);
    Parent/Print("Second");
corp;

The issue with class4.z right now relates to the offset of the Child_t field par_tag. It is added to Parent_t at offset 24, but it ends up with offset 48 in Child_t. This relates to dealing with field offsets when using inline/inherit/match. If you 'inline' a struct containing just a pair of bool's, it might not need to extend the size of the containing struct if there is unused space at that point. We need to keep the offset (including the alignment computations) correct in addField, addAllNames, and appendFields. In particular, 'match' causes fields to not be added, so we can't just use the size/alignment of the containing struct/record, since those are based on having the 'match' field present, but it is not present in the 'inherit'ing or 'inline'-ing struct/record. Can 'match' work with 'inline', or should it be limited only to 'inherit' situations?

Also, "addAllNames" is just adding the type's byteSize. Should it not be reproducing the normal offset/alignment calculation?

"addAllNames" was always using "getStructFields" for a recursive call. Fixed.

"appendFields" was not taking into account the fact that fields in records are 2*8 bytes further into the record than such fields in a struct. When 'inline'-ing, that amount needs to be subtracted from their recorded offset. Fixed.

Need to prohibit some combinations of 'match' and 'inline'/'inherit'.

class4.z now running properly. Need lots more testing.

"addAllNames" is doing strange/redundant things with fl_flags.

080630/Monday

Make sure to disallow 'inherit' or 'inline' of incomplete type.

New routine 'inlinedSize' needs a proper struct/record type.

BUG!! Even without using 'match', there is a problem in class4.z . There the data record is defined outside of the bundle, but is used in a polymorphic way. The checks on the proc parameter being the same item as the API proc came from are not active, and allow cross-calling!
080705/Saturday

Trying to have a bundle inherit from a parent bundle. Note on current status: perhaps when instantiating a bundle, first instantiate its parent, then continue with the child's new contents. [Later: no need - since I put all parent elements into the child, just instantiating the child works fine.]

080706/Sunday

Lots of success today!

080707/Monday

A couple of issues. I'm now trying to "extend an interface". That results in two 'match' fields in the record which defines the "object" that "implements the interface". However, the "object" that it "inherits" from has the interface pointer after earlier data fields (as it must, to keep it compatible with its ancestors). That means that I now need the code that actually deals with 'match' fields properly, in that it does the actual matching-up, so that the fields end up in the right places. Currently they don't, and so I can't construct the record.

To clarify: the second 'match' field needs to be inserted in the middle of the 'inherit' of the parent record. To programmers, the rule would appear to be: "When you first define a 'match' field, put it wherever you need it to be. When you later must 'match' it with an inheritor, put all such 'match' fields at the beginning of your 'struct' or 'record' - the compiler will take care of inserting them at the position corresponding to the position of the originals. You may not be able to do that yourself, such as when the correct position is in the middle of a 'struct' or 'record' you are inheriting."

%%% Should probably switch to using '->' for record field selection. It does make sense in that situation. The fact that it is familiar to C programmers is a *disadvantage*, since the semantics are subtly different.

The 'noInit' flag actually does make sense on a struct field - if that struct is 'inline'd into a record, then the flag comes into play.
When 'inherit'ing into a struct or record, issues about field offsets are not that important, given that a pointer to an API record is of the maximal size/alignment. Issues could come up if we add a longer float that needs greater alignment. When 'inline'ing into a struct/record, I think the full alignment of the inlined type should be used. Also, the field tag by which the inner item is inlined should in fact be usable as a field name. This allows such a field to be used with '@' and its reference passed to routines accepting such. If we don't preserve the full alignment, then we cannot do the above, but we would occasionally be able to shrink the full size of the containing unit. Note that we do not have to preserve the full size of the contained type - we can add unaligned fields of the outer type right after shorter fields in the inner type. That could be tricky to arrange in the compiler.

080709/Wednesday

When inlining a field, we need to align to the alignment needed for the record/struct being inlined. However, we can just increase the outer length by the actual length of the fields in the inlined type, since we can allow further fields to follow closely, and not violate semantics. One possible issue is that we don't store the alignment of record types, since they are always pointed at. So, the "addField" code will have to accumulate it as it inlines the fields. When building the init list for records, we should re-use the FieldList_t's that we have put into the allList table. Those will already have the correct field offset.

080718/Friday

Too much messing with ETS stuff. I've just added some code to "addField". It assumes that if fl_inherit or fl_inline are set, then fl_match is not set. Verify that that is valid.

080719/Saturday

After moving the 'match' fields to after the 'inherit' field that they match within, is the whole "flAfterMatches" concept no longer needed?

080721/Monday

My "apigrow.z" example now properly compiles and runs.
Note that 'match' fields must be record fields. It doesn't work for struct fields, since they might not be the same size. Similarly, it cannot work for 'match' fields to be 'inherit' or 'inline'.

080722/Tuesday

Look at Exec.c/packageInheritCheck, which is used by recordInherits, which is now used by Proc.c/matchesWantedType. The new code isn't checking for the types being from the same instantiation. That will never be the case when we are handling bundle inheritance. Is that OK, or does there need to be a stricter check? Changing packageInheritCheck to require that the wanted type be an uninstantiated type causes a problem: I can't use grandchildUse1 as the Use1_t field of Ggc's Api11. There is no problem with the main Api, however. Changing grandchildUse1 to have proctype Interface1/Use1_t (the uninstantiated form) gets a complaint about proc formal "if1" not matching what is in the proctype. However, "if1" is of a type that is an instantiation of the uninstantiated one, so that ought to be OK. Again, need more thought.

080723/Wednesday

Later thoughts suggest that using the proctype syntax to significantly change the type of a proc is a bad idea as it can remove potential uses for the proc. However, just now I've realized that the programmer is doing it explicitly, so if they need/want to, then why not, so long as everything is valid.

080728/Monday

<><> Locking in GUI/graphics component for threaded writing. Likely makes the most sense to lock at the level of regions of the physical display. It also is possible that there is forced serialization down in the driver or whatever for the graphics chip, in which case no real locking is needed. However, care would be needed to ensure that the basic primitives are re-orderable if the chip can re-order from a queue.
1) master lock for entire display
2) on-demand locking. Each small (rectangular, presumably) region is the bounds of what is covered by any single bottom-level graphics call.
   That region is locked until that call completes.
3) reservation. Threads can quickly reserve regions of the display. Then no further checking is needed by them until they unlock it.

Aha! I can make bundle/apigrow.z compile properly, without any of the changes to Proc.c/matchesWantedType using Exec_RecordInherits (which I just don't think are valid), by simple changes to apigrow.z itself. The change is, in gggcUse1 and gggcUse2, to make the first parameter be of type GggcI11.If1_t instead of trying to make it of type GggcI11.If11_t. That "smaller" field still has the needed if1_gen field which is of type Gggc_t. It's a bit less visibly consistent, but it works. I think I'll just leave it that way and push on, at least for now.

Add another level of testing? Perhaps a "Top_t" "class" that implements a new interface, and extended versions of If11 and Gggc?

Hmm. What was the name of the Rubbles' cat? Turns out it was the Flintstones' cat (remember Fred trying to put it out, and getting put out himself?) and it is called "Baby Puss". It may have been Wilma's before they married.

Added Top_t. All is well. Even had an api function which takes a pair of the 'gen' arguments.

Summary of apigrow.z types:

Parent_t: has field "par_tag"; api has "Display" and "Print".
    Own: Display, Print
Child_t: inherits from Parent_t; adds field "ch_count"; adds api members "Doit1" and "Doit2".
    Own: Print, Doit1
Interface1: has api members "Use1" and "Use2"
Grandchild_t: inherits from Child_t; implements "Interface1"; adds field "gch_size"; adds api member "Doit3".
    Own: Display, Doit2, Doit3, Use1, Use2
Interface11: "extends" Interface1; adds api member "Use3"
Ggc_t: inherits from Grandchild_t
    Own: Print, Use3
Gggc_t: inherits from Ggc_t; adds fields "gggc_left", "gggc_op" and "gggc_right"; Test shows it can use any func from its ancestors.
    Own: Display, Print, Doit1, Doit2, Doit3, Use1, Use2, Use3
Interface2: has api members "Less" and "Crunch"; note that Less takes a pair of the polymorphic arguments.
Top_t: inherits from Gggc_t; implements Interface2; adds fields "top_left" and "top_right"; adds api member "Eat". Note that Eat takes a pair of Top_t's as arguments.
    Own: Display, Use3, Eat, Less, Crunch

Parent, Child, Interface1, Interface11 and Interface2 all implement simple collections of values compatible with them, and allow iterating over them. All actual API routines have access to the full data value with which they work. With the API through the direct descent, it is simply the parameter to the proc. With interface APIs it is one field selection away.

Big sigh. I had identified a situation in apigrow.z where I was expecting the error message about "Cannot determine that parameter "param" actual value has same base as proc being called" but was not getting it. I've looked into that now. The thing is that in apigrow.z, the "polymorphic" type is not actually defined within the bundle - it simply contains an instance of the API record that *is* defined inside the bundle. That breaks the test in "callParamCheck". Changing that test to check the type of the proc being called, rather than the parameter, results in errors that I don't want, from CharBuffer.z and Exec_Call.z, as well as from my basic bundle test programs. Also, things like Types/ContainsUninstPoly probably need to walk up the inheritance tree for bundles.

080730/Wednesday

Trying to think carefully through polymorphic bundles. Are they type safe? Well, by modifying Types/ContainsUninstPoly to check a named type for an instantiation as well as being directly in a bundle, I can get an error in my small "problem.z" test program. Now to expand testing of that. Mostly OK. Get an error in test3.z (the latest version of the simple GUI testing thing). It appears incorrect.
The same error appears in program djrShapes.z, but that is using old instantiation inheritance. There are no problems in system code, or construct or io examples. Hmm. This only works out if Exec/callParamCheck uses Types/SkipNameAndExec on ppl.ppl_type before passing it to ContainsUninstPoly. Without that skip, the bad code is allowed. That will be because the top-level record type in this situation is not defined in a bundle or instantiation, but a field within it (the API field) *is* defined in an instantiation. So, an additional level of record/struct would break this again. It's reassuring to know, however, that if I let ContainsUninstPoly recurse all the way down (in a program that doesn't include most of the big Z sources), the error is reported without the above skip, so long as I check both nd_containingBundle and nd_containingBundleInstantiation. If that kind of full search can be made to work in general, then it should handle inserted record/struct types, as mentioned above. I still want to know whether the code in ContainsUninstPoly needs to go up the inherit chain of bundles, and which one(s). I also still want to know if the whole thing is type-safe in general.

080731/Thursday

I *think* my head cleared a bit on this stuff overnight. The basic vulnerability with polymorphic bundles, that must be carefully managed, is the very existence of uninstantiated variables. Note that there are no actual uninstantiated values - the compiler prevents the creation of those. There can be things like vectors allocated in such a way that the element type is an uninstantiated type, but the actual values stored into such an array will always be of an instantiated type. The vulnerability is that different values that can be referenced by variables of uninstantiated types can be from different instantiations, and therefore can be of different sizes/forms.
This isn't directly a problem, since there are no operations that can be performed on the unknown portions of values referenced via uninstantiated variables. Since values whose apparent types are uninstantiated cannot be directly passed to any procs whose formals accept any instantiation, there isn't a problem there. If the proc has formals of the uninstantiated form, then the same rules will apply within that proc. The only situation I can think of that is a potential vulnerability is that of calling procs which are obtained from the values themselves. Those procs can relate to the actual instantiation of the value (the compiler checks that everything matches when instantiated-type values are being created). It is perfectly fine to pass that instantiated value itself (or a reference to any portion of it) to a proc obtained from it. It is, however, not correct to pass a value to a proc obtained from another value. And the above, of course, is the basic check applied to values of uninstantiated polymorphic types.

The situation I ran into is one where the wrapping record type itself is not of an uninstantiated type, but it contains references to procs which have formal parameters which are of uninstantiated types. In general, there can be an arbitrary amount of wrapping around the occurrences of the uninstantiated types. Regardless of how much wrapping there is, however, there is no way to produce a value of an instantiated type which contains procs which relate to a different instantiation. See bundle/twoInst.z for some testing.

Whew! Got a version of Types.c/containsUninstPoly0 that neither allocates forever nor infinitely loops. However, it takes a good 3 minutes to do the single check necessary for Exec_Call.z/RunCompleter! So, doing that full search is not a viable answer. It looks like storing a vector of referenced bundles in each named type should work, but will be a pain to do. Bah! Made the same change in the other two procs that use VisitedTypeList_t.
Unfortunately "checkBundleGeneric0" ends up being called very early in startup, and ends up with a NIL TV->TV_VisitedTypeList when it tries to free the one it has created. This is via Types/NamedNew from predefRecord.

080801/Friday

Some kludges required to fix the above issue. All in CheckBundleGeneric stuff so far. Basically, the VisitedTypeList_t stuff will never need to mix between Zed and C, and the Zed version isn't set up for quite a while (and never for runit's that don't include the Zed source), so just use direct mem_alloc/mem_free for them. Ick.

The InstantiatedTypeList_t stuff used in Types/instantiate0 is not done properly. They are *never* being freed in the C version. There were appropriate "DEC_REF" calls, but that doesn't actually free anything. To be consistent, I've added a "res" variable, and do the freeing using a "mem_free" at the bottom of the routine. Needed to add "break"s though.

What about assignments? With the wrapper record being defined outside of a bundle, is it now vulnerable to assignments to its fields breaking the "don't cross the instantiations" rule? Gah... "bundle/problem1.z" is quite vulnerable to this. Considering that there can be arbitrary levels of wrapping involved, is this fixable? Hmm. "problem.z" only has one instantiation of its bundle. So, everything is actually OK from a types sense. Try in "twoInst.z". Ok, there I'm protected by the error:

    Cannot assign to uninstantiated polymorphic field "t2_display"

But, the code that does that (Exec/uninstPolyCheck) doesn't recurse through the type - it just checks a tik_named. Is that good enough? I don't think so - the code that uses it, in "modifiable", recurses through the Exec_t to get at the base expression (e.g. for struct field selection), but it does not recurse through the type - it relies on that one check. Ahhh - but uninstPolyCheck unconditionally says "NO" if the type is not named.
Ok, one more thing to check - the proc types defined in a bundle, but the record type containing them defined outside the bundle.

Note that I still haven't done the needed checking for 'match' fields.

"nbPoly.z" - exploring polymorphism without a bundle. Needed to try using a 'match' field in my record. Ran into the same problem as VisitedTypeList_t with MatchList_t. In my "runit" for that file, I don't include the main Zed sources, so get a SEGV when trying to free a MatchList_t element. I don't think the same tricks will work here, since user compileTime code can run in the middle of defining a record/struct, so there can be mixing of C/Zed MatchList_t's.

Fascinating. I can do polymorphism without using a bundle. I just need the magic 'inherit' and 'match' properties on fields in a record. Of course, it doesn't provide the desired type safety either! But, since I haven't done the 'match' checking yet, that might not yet be a real issue. However, it has the same basic problem - at the "parent" level, there is nothing that prevents me from assigning an API routine from one instance to another. That's fine until we have "child" records, in which case it is another example of "crossing the wires". A simple test program, "nbPoly.z", illustrates the problem. It's true that there is no checking done on 'match' fields yet, but that wouldn't help. Full checking *might* make the entire example not work, but that isn't helpful.

I'll write it here: I don't think polymorphic bundles can work. Period. I think I can keep generic bundles, likely just renamed as "generic"s. It looks like I may have to go to Java-style classes and interfaces. I want to make sure they do what I want them to do first, however. DONE

080802/Saturday

Much discussion with Don about syntax for polymorphic method calling. Current decision is to surround the proc expression (this generalizes to all uses of proc expression) with braces. E.g. {ggc->display}(ggc) to call virtual method "display". E.g.
{adder}(a, b) to call via proc pointer variable "adder". Note that the polymorphic method call looks just like calling a proc pointer obtained from a record. That's unfortunate, but in "cg-style" code, all record fields have taglets, and simple polymorphic methods won't (because you implement them in various classes), so you can tell. Similarly, the syntax {ggc->display} *yields* as type Proc/Proc_t the proc value that is object ggc's display proc. The difference is the choice between the '}' token and the '}(' token. Note, I will need an explicit syntax to say whether or not a proc is polymorphic, and I will need to mark that in the Proc_t. I could use 'poly' or 'virtual' or 'abstract'.

080803/Sunday

(Heritage Days today with Don.) Before the above, I finished off the Java version of "apigrow.z". What I couldn't do is the "less" method, which takes two arguments of the interface type. The reason I couldn't do it, as discussed with Don, is that it is a type violation to have such a thing. Polymorphic methods are called with arguments of the polymorphic type. That's all that is typically known about them at the calling site. So, there is no way to know that they are the same concrete type. Something like the "less" interface requires that they be of the same concrete type, in order that field accesses in an implementation of the proc be valid. As Don says, you could put in an implicit runtime type check, but I hate that idea. So, all polymorphic functions (member functions) can only have a single parameter that is handled special, in that the proc is declared to accept a value of the interface type, but the concrete implementations end up being defined with that parameter being of the concrete type. The Zed compiler must check this if I am having explicit declaration of that parameter, which I still believe I want. Note that this all applies just the same to class methods as it does to interface methods.
The "less" concept can be done by having a proc in the interface that takes a single value of the interface type and returns a fixed-type value (e.g. uint, float, string) that gives the "size" of the value. Those size values can then be compared as needed. One situation where a proc that takes a pair of same-type values is needed is in a sorting routine. A "size" value as above cannot always be guaranteed to sort properly - e.g. if the concrete type contains a pair of floating point numbers. In Zed, sorting is accomplished by having a generic sorting package. In old-speak it was "bundle generic". In new-speak it might just be "generic". Hopefully the class/interface stuff and the "generic" stuff are enough.

080804/Monday

From djr@nk.ca Mon Aug 4 16:18:38 2008

> ... So, all polymorphic functions (member functions) can only have a
> single parameter that is handled special, in that the proc is declared
> to accept a value of the interface type, but the concrete
> implementations end up being defined with that parameter being of the
> concrete type. The Zed compiler must check this if I am having
> explicit declaration of that parameter, which I still believe I want.

Aw. How about this?

A function within a class or interface (CorI) may have a parameter of type "morph". That means that the function is polymorphic; each such parameter's declared type is the CorI type, but its concrete type can be any derived type.

When the function is called, it refers to the function instance defined for the concrete type of the "morph" parameter (whether directly defined or inherited); this is determined at run-time.

If there is more than one "morph" parameter, then at run-time, a function instance is determined for each such parameter. The call uses the most ancestral of all such instances.

The user will not be astonished, that the run-time increases with the number of morph parameters, nor that "two" is rather slower than "one". Of course there are alternatives:

    ... The call uses the most ancestral of all such instances, provided that the first "morph" parameter also refers to that instance; otherwise, the call is a run-time error.

    ... The call proceeds only if all such instances are the same instance; otherwise, the call is a run-time error.

From: Chris Gray

On Mon, 4 Aug 2008, Don Reble wrote:

> Aw. How about this?
> A function within a class or interface (CorI) may have a
> parameter of type "morph". That means that the function is
> polymorphic; each such parameter's declared type is the CorI
> type, but its concrete type can be any derived type.

Any derived type? Assuming there is no way to forward-declare derived classes (or post-define the concrete member procs), that should be safe. But, I see no practical advantage to not simply using the class type that the proc is defined in. And, I see potential confusion in allowing such an obscure thing as using some other type.

> When the function is called, it refers to the function instance
> defined for the concrete type of the "morph" parameter (whether
> directly defined or inherited); this is determined at run-time.

Right, the usual virtual function table access - 2 or 3 instructions.

> If there is more than one "morph" parameter, then at run-time,
> a function instance is determined for each such parameter. The
> call uses the most ancestral of all such instances.

Most ancestral? I would have thought that the greatest common ancestor would be the logical choice.

> The user will not be astonished, that the run-time increases with
> the number of morph parameters, nor that "two" is rather slower than
> "one". Of course there are alternatives:

"rather slower"? It would be 1 or 2 orders of magnitude slower. Perhaps more, since it has to examine the formal parameter lists of the procs.

> ... The call uses the most ancestral of all such instances,
> provided that the first "morph" parameter also refers to that
> instance; otherwise, the call is a run-time error.
> ... The call proceeds only if all such instances are the same
> instance; otherwise, the call is a run-time error.

Sorry, I'm not at all convinced. It breaks one of my rules for Zed - no implicit run-time stuff other than easily described things like array index checks, overflow checks, etc. Also, doing this would *require* that extensive type information be present at runtime. My hope is that I can determine which programs need it and which don't, so that if it is not needed, it is not brought into memory. I imagine most programs wouldn't otherwise need it. I would need to carefully examine all virtual function calls in all code of a "program" to see if type information is needed, and which parts of it. All of this seems far too great a cost (description, implementation, memory, CPU time) for very minimal benefit.

-cg

080808/Friday

Been working on Lego and web pages, as well as just plain putting off starting on classes and interfaces.

<>

The style of scroll bar that seems reasonable is the one where the size of the knob is proportional to the fraction of the whole that is visible in the window. However, if that fraction is small enough, you want to stop shrinking the knob. Perhaps indicate visually that we are in that mode, by changing some property of the knob, like its main colour. Also, it might be good to put 3D-ish bars on the knob, as in thumbgrip things. Might want to pre-provide a 2-D scroller like the one GraphicsMagick's "display" program uses when the image is bigger than the display window. Allow the user to set properties of that (keyed to the particular application and working directory as usual), such as whether the scroller window opens inside or outside the main window it is controlling.

080813/Wednesday

(Most time spent on "Stremnaya/Hollywood" Lego project.)

<>

The Zed language should not know about any identifiers. It knows about reserved words, but I don't want things like C's knowledge of "memcpy", etc.
However, the compiler *will* know about the Base routines that do checked arithmetic, since it uses them itself. An optimizer should know about them too, so that if the user is using them, the optimizer can use its knowledge of what they do to further constrain values it is tracking.

080817/Sunday

Make 'bits128' - 'bits1024' reserved words. Just in case. [Done]

Similarly, when changing to a compressed format for uint in byte-streams, leave the format open-ended. [Done.]

080818/Monday

I had a thought last night that has some possibilities, but after some quick looking this morning, I don't think it resolves everything. The thought was triggered by the idea that in a class implementation, I would want a tag on polymorphic formal parameters, so I know which ones are to be treated specially. What went through my mind last night was a bit about describing what the polymorphic bundle concept was about, and the way the special API calls are checked. I then got to the idea that with the bundle type defined outside of a bundle, I had no way to know whether or not to do those checks (the "basesMatch" checks). If I check too many calls, then I disallow simple things using the uninstantiated type that need to work, such as my various "Add" routines. If I don't check enough, I end up with bad calls not detected.

So, maybe I can explicitly flag which ones are supposed to be the special polymorphic ones. This is best done with the 'polymorphic' flag on the formal parameter type of the proc type of the API member. The testing would come in that if the formal parameter is so marked, then only a proc that is an appropriate instantiation will match that proc type. In particular, a proc which simply has the uninstantiated type will not match. Then, when calling via a proc expression, if the proc expression type has a flagged parameter, then the "basesMatch" code is triggered.

This morning, however, I did "ls -lrt *.z" in the bundle test directory, and found "nbPoly.z" to look at.
It does polymorphic stuff without having any bundle at all. It also has a "Bad1" proc that is able to assign an API function pointer from one bundled value to another. Its "Display" routine then ends up passing a bundled value to the wrong proc. In earlier tests the assignment that "Bad1" does would not be allowed, because the bundling type would have been defined in a bundle, and so would not have been writeable in its uninstantiated form. So, one possible way out is to only allow the 'polymorphic' tag to be used on a proc formal parameter whose type is a bundle parameter type. Or, maybe it can be used anywhere inside a bundle.

Looking again, there is another issue. The magic stuff happening in "nbPoly.z" and elsewhere is based on the 'match' and 'inherit' trickery. Is the answer simply that 'match' fields are never writeable? They can only be initialized via a constructor? Use of the wrong API function in the constructor is caught in "nbPoly.z" - see "bad2". Still, where this is pointing is that with 'match' and 'inherit' there may be no need for polymorphic bundles. Needs lots more examining and testing. Maybe the 'polymorphic' tag must be present on exactly one proc type formal parameter in order for 'match' to be allowed? I was wondering if the 'polymorphic' formal is enough, and implies the 'match' functionality. However, in "apigrow.z", the 'match' is also used on a data field (e.g. "match gen if1_gen;") in my "interfaces".

Later: perhaps the trick is to make 'match' fields read-only. The "nbPoly.z" code uses match fields and the constructor checks work there. Can it be that simple?

080819/Tuesday

I added the check to make 'match' fields read-only. "nbPoly.z" is fine. However, "apigrow.z" no longer compiles. The problem is that the implementation of interfaces requires that there be a pair of records that reference each other, and those references are both 'match' fields, so I cannot set them up. Hmm.
That is actually fixable by simply removing the 'match' from the 'gen' field in the interface record definitions. Then, all references to the field named in the inheriting interfaces must switch back to using the name of the field in the base interface (e.g. "if11_gen" => "if1_gen"). Also, the constructor procs must be rearranged to then do the assignment to those renamed 'gen' fields. Then "apigrow.z" compiles and runs again. Note, however, that it *is* using polymorphic bundles for the interfaces. The same changes make "class4.z" work again too.

Further note: source file "problem.z" still compiles without error. However, it doesn't actually have a type violation - it just looks somewhat like it does, and the error indicated in "Bad1"'s comment does not come out. Similarly, the assignment in "Bad2" does not produce a type problem. What if there actually were two instantiations, however? Then both routines would actually be bad, I think. Check.

080821/Thursday

Unhappy with the Hollywood hill LEGO project. Didn't work on it today. Read instead. Quite cool and cloudy and windy out. Went for a walk to think about non-bundle interfaces.

Can I do something with empty struct/records? Or ones that are empty except for match fields? Can I inherit from them at other than the first position of the inheriting struct/record? Does that help at all? It does give a way of doing multiple inheritance without the various nasty issues that C++ has. But does it provide me with interfaces?

Verify that none of the problems with polymorphic bundles exist for the non-bundle approach to inheritance. It certainly protects against assigning API records, since they are 'match' and so now read-only. What about assigning to fields of an API record? They were safe with bundles because they were defined inside the bundle. Sigh. You can assign to the individual fields of the API record. How do I fix that?
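A rough sketch of this hole, again in Python with made-up names (ParentApi, FrozenParentApi, ChildA, ChildB, bad1): making the reference to the API record read-only does not help if the proc fields inside the record can still be reassigned by parent-level code. Freezing the whole API record, here with a frozen dataclass standing in for "every field is 'match'/read-only", rejects the swap. Python does not stop reassignment of obj.api itself, so this only models the inner-field part of the problem.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass
class ParentApi:
    # Ordinary API record: its proc fields can be reassigned at any time.
    display: object


@dataclass(frozen=True)
class FrozenParentApi:
    # Stand-in for "all fields are 'match'": read-only once constructed.
    display: object


class ChildA:
    def __init__(self, api):
        self.api = api
        self.name = "a"


class ChildB:
    def __init__(self, api):
        self.api = api
        self.count = 3


def display_a(obj):
    return "A:" + obj.name


def display_b(obj):
    return "B:" + str(obj.count)


def parent_display(obj):
    # Parent-level polymorphic call through the object's API record.
    return obj.api.display(obj)


def bad1(dst, src):
    # Parent-level code copying one object's API proc into another's record.
    dst.api.display = src.api.display


if __name__ == "__main__":
    a = ChildA(ParentApi(display_a))
    b = ChildB(ParentApi(display_b))
    bad1(a, b)                      # accepted: the field is writeable
    try:
        parent_display(a)           # display_b now runs on a ChildA
    except AttributeError as err:
        print("crossed wires:", err)
    try:
        bad1(ChildA(FrozenParentApi(display_a)), b)
    except FrozenInstanceError:
        print("frozen API record rejected the swap")
```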
One brief thought was to require that all fields of a record type that is used as a match field must also be match fields. But that prevents the use of normal record types as match fields. Is that acceptable? It might be, if I'm sure that only API records need to be 'match'. But again, the question comes up of how deep I need to search for all 'match' fields. Ah - only one level, since the rule would have applied when the lower struct/record was declared.

If a field is declared 'match', there is supposed to be something for it to match against. I think that implies that the API records must 'inherit' from the parent one (or the base one for an interface). Will that work? That *seems* to work, so far.

080822/Friday

I seem to have managed to do an interface without a bundle, without any language changes. See bundle/nbPoly.z. I think the key is that each implementation of the interface type simply declares a field of the main record type for that "class". Routines in the interface never need to even know that it exists, and the routines in the implementing "class" can simply use it directly as they wish. So, in implementing the interface, the implementing class defines its own variants of all API routines, the API record itself (which inherits and matches), and the interface record (which inherits and matches). It then includes a reference to its variant of the interface record in its main record, so that it can, if it needs to, call interface routines from its main record API routines.

080823/Saturday

Unfortunately, I believe all fields of any struct/record that is used as a 'match' field must be 'match'. This is because once it is used in a 'match' position, we must prevent values being transferred from one such struct/record to another. We can't tell that we need to prevent that just by looking at the type - we must see the 'match' to prevent those assignments.

When defining 'match' fields in a child type, the field names provided are typically never used.
Only the parent field names are used, in polymorphic contexts. So, it would be nice to have a syntax in which no name needs to be provided, e.g. use '*' or 'match' or something.

Add an error when trying to define a field that is both 'match' and 'noInit', since such a field can never be anything other than 'nil'.

Perhaps when defining a child 'match' field, you should be forced to use the field name of the parent field that is being matched. That avoids having to use a new identifier, and removes the need to do the fields in the matching order. The field name would not be added again, of course.

080824/Sunday

I started doing "nbapigrow" this evening. Not going well. The inheritance thing isn't working out for API procs. Need to look more at some point.

080902/Thursday

(Way too much work on "Stremnaya"/Hollywood LEGO project.)

<>

There could be a library routine (which would have to be written using privileged operations) which calls a user-provided routine on each field of a struct/record. It would have to pass the routine a case-oneof detailing the value, as well as the field name. Special oneof alternatives can be used for array elements, and perhaps that is how structs would need to be handled. The routine could be used by a debugger to textually show record contents, or even to graphically show them in a graphical debugging environment. It could also be used by the Fmt code if there is no custom formatter present, perhaps based on some format code associated with a record value to be displayed. Note that an individual programmer can always use the method currently used by the Fmt/FmtCreate routine, which generates code based on examining the struct/record definition. However, the routine is likely easier to understand and use in many instances. Note that the routine would have to be quite careful to not reveal information that should not be revealed to the executing user.
For example, types that are not exported should not be shown to users other than the owner of the package containing the type.

080908/Monday

(Still more LEGO work - getting close though.)

<>

Triggered by an article on Google's Chrome browser on The Register, I was thinking a bit about all this Web 2.0 nonsense. Assuming a good protocol between user client machines and remote servers, such that remote access to data is about the same difficulty as local access to data, what is the real difference between an application that keeps data locally and one that keeps data remotely? In fact, there doesn't have to be much difference for the user in most cases. I *think* this is some of what "Google Gears" does, although I haven't looked into it. However, there are some big implications:

- the data is not on your local machine, which means you don't need permanent storage space for it (with browser-based applications, some or all of it can end up in your browser cache)
- backups are no longer a user issue
- the user no longer has control over the data (the data may be used for other purposes without the user's permission or knowledge, the data may be modified without the user's permission or knowledge, etc.)
- the user may not even own the data anymore
- the user no longer has a 100% guarantee of the data existing
- communication problems can deny the user access to the data

Go one step further in the Web 2.0 model, and have the application be a Javascript thing running in a browser. Implications:

- the user no longer has to be concerned with keeping the application up to date
- the provider of the application no longer has to worry about porting the application to multiple OS's. My understanding is that there still exist cross-browser issues, however.
- the provider can know that all users are using the most up-to-date version of the application
- the provider can add new services, modify existing services, and remove services, by simply updating the application on their servers.
  The users have no direct control over any of this.

It seems to me that in both aspects, most of the benefits are to the provider, and not to the user.

<>

The Zed implication here seems fairly small. The user has control over when things are updated. However, perhaps what is needed is a tag on a package that says that the user is not asked when the package can be updated (it is updated whenever a new version is available), *BUT* that package can only run inside "sandboxes". This leaves the user in control of packages that have normal access to local data.

080915/Monday

Back at Zed finally! The last thing I was doing was working on nbapigrow.z, which is exploring a growing API (inheritance and interfaces) without using bundles. The problem I had run into was using a parent "display" proc in a child API struct. It turns out the reason for the error message was that the names of the formal parameters differ ("par" versus "ch"). Now, those are the formal names one would want. With an explicit child "display" proc, the use of a typename and the ':' syntax would allow the child display type to use formal name "par" while the proc uses "ch". So, it looks like the answer here is to do the rename in the test program: all of the child API proc types have to have formal names matching the formal names in the parent API proc types. It is sufficiently annoying to have to type in all of the API proc types, etc., that I think it would be very good if the IDE aspect of Zed did all of that for you.

Ok, hit the next problem. In "nbapigrow.z", I'm trying to do Ggc_t. It needs to inherit from Grandchild_t. The issue is with its version of Interface1 - it needs to inherit from Interface11 in order that values of type GgcIf11_t be acceptable to polymorphic routines which accept values of type Interface11_t. However, when constructing values of type GgcIf11_t, the compiler is not letting me use actual procs from the Grandchild level.
The reason is that its procs want parameters compatible with type GrandchildIf1_t, and GgcIf11_t is not such a type, since it does not inherit from GrandchildIf1_t. (If it does, then I get the issue with the Interface11 polymorphic routines.) If I try to make the specific instance interface API routines accept just the interface type, then the specific routines of those types do not have access to the fields within the specific data record. Now, I'm not *too* concerned if I can't grow interface APIs. However, since I *can* grow "class" APIs, it would be nice to be consistent. Note that if I write stub Ggc Use1/Use2 routines, they can't even call out to the Grandchild versions, since their argument, a GgcIf11_t, doesn't even contain anything compatible with grandChildUse1/grandchildUse2. Does it come down to the possibility that I can't "inherit" from something that implements an interface and at the same time be able to use the parent's specific implementations of that interface? So, the "inherit" ("extend" in Java-talk) concept only applies to the main "class" hierarchy, and not to an auxiliary "interface" hierarchy? Not quite, since you can use an Interface11_t as an Interface_t, to use names from "nbapigrow.z".

080916/Tuesday

All of the types and symbols needed to do the inherit/match stuff are quite ugly. Avoiding the field names of 'match' fields, as suggested earlier (080823), helps somewhat. However, in "nbapigrow.z" there are still far too many type declarations - essentially duplications of the various API function types. Really, only one should be necessary.

An API function type can have the "class" type used for more than one formal parameter. With the way 'inherit'/'match' are working, does that provide a hole in the type scheme, which would let different child types be present, and so be passed potentially to the wrong actual API proc? ... Well, "twoArg.z" stomps all over the smaller type.
However, I have to keep in mind that I've never implemented the checking that 'match' needs, so that might catch this. Even if it does, it would end up being something that is hard to explain - "only one argument to a proc in a 'match' proc type can be of the type within which there is a record containing the proc type". Or some such. Ick.

Sigh. The traditional "class" syntax avoids all of these issues by simply not explicitly specifying those things that are special, thus removing the possibilities for problems of this type, and avoiding any need for checking for them and reporting them.

How about I just type in some stuff off-the-cuff, for what "classes" could look like in Zed, with as much being explicit as I can make, then we'll see if that makes any sense.

    type Parent_t = class {
        type Print_t = proc(*; string header)void;
        type GetValue_t = proc(*; uint count)uint;
        type ParentApi_t = record api {
            Print_t papi_print;
            GetValue_t papi_getValue;
        };
        ParentApi_t par_api;
        string par_tag;
    };

Notes:

1) 'api' is a reserved word (token). It can be used only in this context.
2) the '*' in the proc types is only valid inside a 'class'. It means that there is a parameter of the class type in that position. There can only be one such in the proc header. This could be 'class' instead of '*'. No parameter name is needed - those are given in explicit implementations of the proc type. The other formal parameters have one, since that is standard syntax. Do I want to be more standard in these special ones, so that you can add tags like 'ro'? There is something to be said for not allowing symbols, to avoid any issues about multiple symbols.
3) note that the programmer explicitly provides names for the API elements and the API within the class.
4) the things that can be within a 'class' type declaration are types and data fields.
5) class types can be forward-declared, just like record types

Ok, so let's inherit from that:

    type Child1_t = class extend Parent_t {
        uint ch1_count;
    };

That's easy. But how do I expand the API?

    type Child2_t = class extend Parent_t {
        type Use1_t = proc(*)void;
        type Child2Api_t = record api {
            Use1_t ch2api_use1;
        };
        float ch2_size;
    };

Notes:

1) the magic token 'api' lets us know we are defining the API type for this class. So, the compiler knows to fill it in with the types from the parent class. The "inheriting" is implicit. Or, I could use an explicit "inherit ParentApi_t;". However, that would have to be there, so I'd have to check for it. Also, when we come down to interfaces, it would have to be the first field of the 'api' record, so that needs to be checked too. So why bother making it explicit?
2) in API proc types, the '*'/'class' parameters are not given any specific type - they simply use a special tik_class. This should allow the single declaration to work for parent and child.

Is an interface just a class type with no data elements? Maybe, but more might be needed.

Type checking on procs used to initialize API records needs to be done correctly. The proc types must be defined in the same class type......

Later thoughts... what really is the difference between this stuff and bundles? They both have the concept of magic types. The problems with bundles came about when types defined within the bundle, using the bundle parameter types, were allowed to be exported outside of the bundle. That was pretty much required to do polymorphism. What stops the same problems happening here? Need to go look again at what the problem with bundles was. Can bundles be "fixed" somehow?

080917/Wednesday

My problem with programmer-defined operators is basically that you cannot look at code that uses them and understand what it does. However, that is true for most code if you are not familiar with what the proc calls do.
The difference is that you *know* that you don't know what the proc calls do. When you see a '+', you think you know what it means. However, with operator overloading of it, you may be mistaken. It's the hidden source of errors that I really don't like.

So, it just occurred to me that I can allow the operators in a way that makes their use stand out: don't allow overloading of the normal operators. Instead, have a set of alternates that *can* be overloaded, and which are resolved based on the argument types, as in C++. In Zed, I think the way to do some operators is to double them, since doubled operators are not currently used in Zed. So, allow '++', '--', '//', '**', '^^' (I use '^' for exponentiation), '||', '&&', '{{', '}}'. I can't use ']]' since it can occur normally in Zed. Perhaps go with ':==', too. I already use '==' and '~==', but perhaps I can change those somehow. Can also have '<==' and '>==', but I already have '<<' and '>>' as shifts. Or, maybe just use something like '$' or '@' in front of all of the available operators. That way they are all available consistently. LATER: decided on '#' in front of them.

080923/Tuesday

First public mention of Zed (but not really what it is), in a new "linkedin" page that I made in order to join the YottaYotta alumni group.

080924/Wednesday

Given that using bundles for the apigrow stuff ends up having to use both inherit and match record fields as well as the 'inherit' in the bundle itself, there just seems to be no advantage to that scheme. It has the language addition of bundles, it has the extra symbols, it isn't as concise as the above 'class' setup, etc.

To do interfaces, I pretty much need to explicitly define the record type for the interface. Either that or it is implicitly defined somehow, and I don't like that. So, an interface is not just a 'class' with no data fields, unfortunately. Let's just blunder ahead and try one.
    type If1_t = class {
        type Doit1_t = proc(class; char ch)void;
        type Doit2_t = proc(class; sint s)void;
        type If1Api_t = record api {
            Doit1_t if1api_doit1;
            Doit2_t if1api_doit2;
        };
        type If1_t = record {
            If1Api_t if1_api;
            instance if1_instance;
        };
    };

    type Child3_t = class extend Child2_t {
        If1_t ch3_if1;
        bool ch3_flag;
    };

Ok, I'm not fully happy here. The "instance if1_instance" line is a new thing. The 'instance' ends up referring to a class that uses this one. Is that too obscure and indirect to be able to check properly? Do I need to restrict where an 'instance' field can appear? I.e. do I need an actual 'interface' type like I have a 'class' type? Is there any reason to prevent there being actual data fields in an interface record? How do I do the type checking involved with an 'instance' field?

I'm back to needing a bunch more restrictions as well. For example, the fields of an 'api' record must be read-only. Similarly, the field of the class/interface record that references the 'api' record must be read-only. That needs some indirection to check. Ick. I could also end up not being able to construct the pair of records involved in a class/interface pair, since both end up having to be 'ro' just like in the old 'match' version.

What it comes down to is that the cross linking among the 'class' record and the various interface records must be done properly, and never broken apart. That almost implies a language construct to do that binding. I think that is doable, but it's not very pretty. No 'bind' statement is needed when the class record doesn't implement any interfaces - all of the fields can be given directly in the constructor. How does the 'bind' statement know what to do? Presumably it is given a set of record pointers and it examines their types and figures out what all it needs to do, and it also complains if there are not enough of them to fill in the class record. Perhaps the class record has to be the first one.
Perhaps also the field names within the class record should be given. E.g.:

    Child2Api_t Ch2Api;
    If1Api_t Ch3If1Api;
    Child3_t ch3 := Child3_t(Ch2Api, "tag", 1.23, nil, true);
    If1_t if1 := If1_t(Ch3If1Api, nil);
    bind ch3 ch3_if1: if1;

Note the types of variables Ch2Api and Ch3If1Api. They don't look right to me. With the bundle variant, they were at least typed by this instantiation of the bundle type. There is nothing here that says really what type they are. Is what they are typed as good enough, given the proc assignment rules? How then do I determine that the API procs given to a constructor of them are from the proper class (proper instantiation in bundle-speak)? I don't see a way. [Just verified that you can't test for the presence of a member function in C++. Not that you could ever explicitly create an object without one.]

Gah! None of this stuff works out. What I *need* is for a record type to be able to participate in multiple polymorphic operations. I don't see a way to do polymorphism without using API records of some kind. They either have to be explicit or implicit. Explicit leads me down all of these problematic paths. But, I dislike all of that implicit stuff. I guess it's not so much that I dislike the implicit construction of the API records (C++ virtual function tables) as it is the use of them when calling the API procs. Hmm. See 080802 - the explicit use of '{' and '}' around proc expressions. A polymorphic call would need those. Thus, you can tell that something special is happening.

Explicit constructors? Both C++ and Java have them. Destructors? Would have to be Java style, where they are called when the ref count goes to 0.

080925/Thursday

Done a bit of history stuff - hardware so far. I'd like to get some thoughts straight on where I am going. My intent is to ask Don to proof-read, etc. what I write here, then post it to comp.lang.java for feedback. We'll see how that all goes.
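Returning to the 'bind' idea from 080924: the class record and its interface record must point at each other, yet both links should be unchangeable afterwards, which is exactly what a normal step-by-step construction cannot do when those fields are 'ro'. One way to picture a trusted 'bind' step is a single routine that checks it was given every needed link, sets the links, and then locks them. This is a hypothetical Python sketch; Bindable, LINK_FIELDS, Child3, If1, and bind are illustrative names only, and the locking is simulated in __setattr__.

```python
class Bindable:
    """Record whose link fields become read-only once bind() has run."""
    LINK_FIELDS = ()
    _bound = False

    def __setattr__(self, name, value):
        if self._bound and name in self.LINK_FIELDS:
            raise AttributeError(f"{type(self).__name__}.{name} is read-only after bind")
        object.__setattr__(self, name, value)


class If1(Bindable):
    """Interface record: the interface's API plus a link back to the instance."""
    LINK_FIELDS = ("api", "instance")

    def __init__(self, api):
        self.api = api
        self.instance = None


class Child3(Bindable):
    """Class record: its own API, ordinary data, and a link to its If1 record."""
    LINK_FIELDS = ("api", "if1")

    def __init__(self, api, tag, flag):
        self.api = api
        self.tag = tag
        self.flag = flag
        self.if1 = None


def bind(obj, **links):
    # The one trusted place where cross links are created. It complains if
    # the links given do not exactly fill the class record's link fields.
    wanted = [f for f in obj.LINK_FIELDS if f != "api"]
    if sorted(links) != sorted(wanted):
        raise ValueError(f"bind needs exactly these links: {wanted}")
    records = [obj, *links.values()]
    if any(rec._bound for rec in records):
        raise ValueError("records are already bound")
    for field_name, iface in links.items():
        setattr(obj, field_name, iface)
        iface.instance = obj
    # Lock the links on every record; ordinary data fields stay writeable.
    for rec in records:
        object.__setattr__(rec, "_bound", True)
    return obj


if __name__ == "__main__":
    ch3 = bind(Child3({}, "tag", True), if1=If1({}))
    ch3.flag = False                    # data field: still writeable
    try:
        ch3.if1 = If1({})               # link field: locked
    except AttributeError as err:
        print(err)
```

The validation happens before any link is set, so a failed bind leaves both records untouched.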
The reasoning steps towards my decision go something like this:

S1. given my requirements, the only workable solution is one involving some kind of virtual function tables. If they are explicit, they are references to structures containing references to procs (functions). Is there anything else that would work, given my requirements?

S2. in procs used for some type's polymorphic function, it may be necessary to call other procs associated with the same actual type. To do that, those other procs must be directly findable from the reference passed to the calling actual proc.

S3. various types that participate in the polymorphism can have quite different actual layouts. Only one virtual function table reference can be at the beginning of all such layouts, unless the compiler can know the entire type structure at compile time. That isn't possible in my system.

S4. given the above, the only way I can see to allow for multi-way polymorphism (e.g. implementing Java interfaces) is to use small sub-structures that pair a reference to the whole structure with a reference to the particular virtual function table. This small structure can either be inline, or be allocated separately, depending on the storage management system (e.g. garbage collection, reference counting). A reference to this small structure is what is actually passed to procs which implement a given interface.

S5. if the language requires that all of this be done explicitly by the programmer, then the programmer must construct the main structures and the above sub-structures. In general, however, the sub-structures (and any references to them if they are separately allocated) and the references to any virtual function tables must not be modifiable in any context in which their actual types are not fully known.
(For example, if classes B1 and B2 both inherit from class A1 which implements interface I1, code within A1 procs cannot be allowed to copy a virtual function table pointer from one A1 or A1 sub-structure to another, since it may not be valid for the destination value.) [In my system, I am using both reference counting and garbage collection. I don't want to allow references into the middle of allocated structures, since I don't want to require expensive run-time searches to find the beginning of such structures from such a reference. So, the sub-structures must be separately allocated.]

S6. in order to construct the above values explicitly, they must be allocated and initialized in a normal series of steps. Doing that requires assigning to explicit fields that cannot normally be assigned to. Can the situations in which this is allowed be safely identified?

S7. if the answer to the above question is "No", then there is no way to construct the required setup if things are done explicitly.

S9. the type of procs in a virtual function table structure will be the type needed for the most ancestral variant of the class or interface. Procs not at that level are not directly compatible with those types, so some modification is needed. In one of my earlier approaches to this, I would "instantiate" a "polymorphic bundle" in order to yield the desired types, and type compatibility rules allowed an instantiated value to be assigned to an uninstantiated (and hence polymorphic) destination. That approach eventually failed because it allowed too much to be done with uninstantiated values, where the dependence on the bundle was hidden deep within the types involved. I could not find a way to determine whether to relax the rules or not. In another approach, I allowed structures to "inherit" from other structures, and came up with rules of proc assignment using that. However, I still had to introduce new concepts to morph the needed types.
This approach also ended up being vulnerable to operations at the polymorphic level.

I am creating a new programming language. Much of it is running. I currently have good support for things like information hiding, access control and generics, but I still need a good mechanism for polymorphism. I wanted something more explicit than what is done with C++ and Java classes, but I eventually concluded that making too much explicit opens more type-safety holes than I can close. My requirements include:

R1. absolute type safety. This is crucial to my system. There are other ways within the language to do low-level things.
R2. multi-way polymorphism. C++ does this with multiple inheritance. Java does it with interfaces.
R3. as explicit as possible in syntax and semantics. In my early attempts, virtual function tables were explicitly constructed by the programmer, for example, but I no longer think that is workable.
R4. no implicit run-time type checks. I have explicit checks.
R5. no significant run-time expense. This means no looking up of function pointers in hash tables - things need to be O(1) execution time.
R6. syntax and semantics as clear and straightforward as possible.

My current thinking is to essentially implement Java's class and interface system, with a few syntactic changes. So, I ask the Java experts among you some questions:

Q1. Is the Java class/interface system absolutely type safe? I believe the general consensus is that it is. However, if there are little-known issues, I would like to know about them. It is possible that I can resolve them in my system. Feel free to privately email me at cg@graysage.com.

Q2. Are there known problems with the Java class/interface system? What are they? Are there proposed fixes to those problems? Where do I find those?
Please be specific - since this is only one aspect of the whole that I am building, I can't spend months reading thousands and thousands of lines of discussions in order to become expert on issues that aren't directly relevant.

Q3. Are there tricky things that are non-obvious in terms of implementing this stuff? Short code examples that expose the issues would be good - I can then use them as test cases.

Note that I am not a Java (or C++ for that matter) expert. I have a lot of experience with programming languages and compilers, however. You may need to explain specific terminology, but I should catch on. I am basically a programmer, not a language lawyer. I have designed and implemented over a half-dozen programming languages, but I am not active in following any specific language discussions or processes. I intend the results of my work, whenever finally ready, to be freely available.

080927/Saturday

Don suggests replacing all of my requirements and the first paragraph with:

    I've been creating a new programming language, and I want to incorporate polymorphism without compromising type-safety, clarity, and efficiency.

I'm not convinced - that doesn't actually state requirements like the O(1) access, and complete lack of runtime type-checks.

For Q1, Don says: Some people say, the class loader breaks it. Others say it's never been proven.

For Q2, Don says: Then ask for specific kinds of problems. I have a book of such problems: conceptual pitfalls, political pitfalls, management pitfalls, analysis and design pitfalls. Yeah, it also has class and object pitfalls and such; but even there, I figure you're interested in very few of them.

080928/Sunday

I think all leftovers of 'match', 'inherit', 'inline' and polymorphic bundles are now gone. What is left is a lot of renames from bundle stuff to generic stuff.

It adds quite a bit of space, but I may want to have a flags field in Exec_t. That way I can have an 'error' flag.
That would allow me to return the desired Exec_t containing all the stuff from the input, without throwing away a lot of stuff, while at the same time guaranteeing that a nasty programmer cannot get at a valid Exec_t that should not be valid.

080930/Tuesday

Whew! The renaming took a while. I did comments as well. However, there is an English ambiguity in using "generic" to refer to the Zed construct, in that it is then harder to use "generic" in its normal English sense. Something to do is to examine all the comments (ick!) and look at replacing phrases like "generic type" with "uninstantiated type" as appropriate. Sort of done, grepping on "generic type".

081001/Wednesday

Reading a ComputerWorld article by the designer of C#. One of the things he mentions is wishing there was a way in the type system to indicate that a given reference type/value could never be nil. It is good to have the compiler check that statically, so that there is no possibility of a nil-pointer problem at runtime. See later on "nonNil". Latest C# has "boxing" and "unboxing". Everything is an object. They can teach the language that way, then later bring out the separate cases for int, bool, float, etc. C# 3.0 has more declarative stuff to bring it closer to database work. Something called LINQ. Later: Language INtegrated Query. He says programmers should check out functional programming and metaprogramming.

Chatting with Don during a walk. We settled on:

'final' - applied to a class/interface, it means you can no longer inherit from it - this token prohibits that. On an actual proc, it means that an inheritor cannot override this proc. (Hence you can't have 'abstract final' on a proc.)

'poly' - for a polymorphic proc inside a class. If 'poly' is not given, then the proc is not polymorphic. (So 'poly' is like the inverse of 'static' in Java.)

'abstract' - like Java. If on a class, then the class cannot be instantiated. If a class contains any 'abstract' proc, then it must be 'abstract'.
On a proc header, it means that an inheritor of this class must provide an actual implementation before the resulting class can be constructed. If 'abstract' is on a proc, then 'poly' must be too.

Interfaces contain only 'poly' 'abstract' procs. Note that 'abstract' implies only a proc header, and not a proc body. Lack of 'abstract' implies a proc body. 'poly' can be on either. 'final' can only be on an actual proc.

Posted my R & Q stuff to comp.lang.java at 17:23 today.

081002/Thursday

Don notes that you may want to use 'final' on a proc header (which would not be marked 'poly' or 'abstract') to indicate that an inherited proc is to be finalized. Strange case. My response:

Hmm. That would require a strange special case - it would have to be the proc header only, but with 'final'. In at least one sense, that is the reverse of what is normally done - the initial ancestor with the method declares it with just a header, and then a descendant provides an actual proc. In the situation you are discussing, some ancestor has provided an actual proc, and now the descendant needs to have a proc header, just to specify the 'final'. My thinking is that this is an obscure and somewhat hard-to-follow case, so I could disallow it. If the programmer really needs to do that, then they can provide an actual proc, marked as 'final', which calls the ancestor proc.

Later: Don suggests that the same can be said of 'final' on a class in general. I think I agree - maybe just drop that.

081003/Friday

<> Remember to put 'assert' clauses in at some point. They are appropriate in procs (working with local and package variables), directly in packages (working with package variables), and in structs, records and classes (working with the fields of such).

<> UUIDs. Perhaps the local part (e.g. the lower 64 bits of a 128-bit value consisting of machine-id and counter) is split into two 32-bit halves. The upper half increments whenever there is a system restart.
The lower half overflowing is also allowed to increment the upper half, but only once in a given real-time interval (e.g. an hour).

%%% Maybe use '#+', '#<=', etc. for operators, rather than '$' forms. This leaves '$' for use with persistent variables. Also, Don says Eiffel uses '#' '#' for a user-defined operator.

Spending a bit of time looking at just how classes fit into what I've already done in Z sources. Starting with CharBuffer and Fmt. When I write an explicit proc to satisfy a class proc, do I want 'poly' on it or not? Also, do I declare the first formal parameter as 'class' or of the actual class type? Using 'class' makes the type of the actual proc better match the type of the parent class proc (if it is 'abstract'). The latter makes using the formal symbol within the proc easier for the compiler. That suggests I should do the latter, since I should be able to handle the former in Proc/matchesWantedType.

Do I really need to require 'poly' when there is 'abstract'? I think I don't, since 'abstract' on a proc means it is a proc type. That would leave 'poly' only used on actual procs that can be overridden. But, the 'poly' is then only really needed on the most ancestral class that first defines that proc. This is all pretty implicit. Sigh. I really don't like this. I certainly don't want to try to handle having procs in a class that reference data fields that haven't even been defined yet. Java even shows doing that in examples. Also, minor syntax differences can make the same code mean quite different things. E.g. in Java, the magic 'static' in front of a function completely changes what is meant.

I think I need to go back to some of the concepts I thought about with polymorphic bundles, and apply them to classes. Perhaps the required syntax within a class body is something like:

    ???
    data fields
    'api'
    api proc types (thus no actual proc needs 'poly')
    ???
    actual procs for this class

(The only reason they are inside the class scope is that this lets the compiler know whether any of the api proc symbols is 'abstract' or not.)

If I drop 'final', then I think the above also allows me to drop 'abstract' on procs and also 'poly'. I could then drop 'abstract' on classes (since it is implied by a missing proc implementation). How about:

    'class' [ 'extends' ] '{'
        [ 'fields' '{' '}' ';' ]
        [ 'interface' '{' '}' ';' ]
        [ ]
    '}'

Using 'interface' here instead of 'api' essentially means that I can just re-use the syntax and some semantics of an interface. Perhaps that says that instead of 'class' within API proc headers, I should use 'interface'. Or perhaps that's where I should use 'poly', since that is shorter, and doesn't imply either a class or an interface.

081004/Saturday

Need to bring back 'inline' for struct/record fields - I need it with the new Lists package I did. DONE

I don't like the terminology Java/C++ use: superclass/subclass and parent/child. It is the inheritor that is "larger" than that which it inherits from. Speaking of it as the "subclass" is just confusing. Similarly, a "parent" in real life is often bigger than the "child". I'm thinking of using "inner" and "outer". An outer class inherits from an inner class, and thus "encloses" it. Also, I think the term 'field' should only be used for the data fields in a class.

Need to work on the semantics of constructors/destructors. One thought is that if a class has an explicit constructor, then its argument list is what is given on a constructor call for the class. If it does not have one, then I do the same as for records - the default constructor has as arguments the list of fields (including inline ones) not marked 'noInit'. If an inner class has a non-default constructor, then so must all outer classes of it. What exactly do the declaration and the innards of an explicit constructor look like?
They pretty much have to be somewhat like Java ones. I don't particularly like the 'super' stuff though, in particular the fact that seemingly you can omit the call. No - it must be the first line, and if you don't put it in, then a call to the inner class's default constructor is silently inserted. One issue is that externally, I want construction to appear as the typename followed by the list of values, and I want the class reference to be returned. But, within the constructor, I want to be setting up the fields within storage that is implicitly available. I guess this is exactly what Java does, and the constructor is declared to return 'void', but all user uses of them actually return the reference. Hence, I guess, their use of 'super' rather than the named constructor. INDEED IT WAS UGLY TO DO Java does a special case syntax - there is no result type provided with the constructor proc - and it actually isn't considered to be a proc. The Java default constructor has *no* arguments. That's the reverse from where I am going, where the default constructor has arguments for each non-'noInit' field. The Java default constructor requires that the inner class have a zero-argument constructor (or the default constructor) and calls that. Putting in a "classes/apigrow.z" for future testing. One missing thing came up - I wanted to do an explicit run-time type check that an interface value was actually a class value. The current 'assign' will not do that. First, the value is not of type 'any'. Second, I actually need to indirect through the implicit interface record node to get at the class node. This could be done by the 'assign' construct, and an additional compile-time check on the second argument. This would then yield a different eik_XXX kind node, however. [Later: I added the new 'interfaceAssign' construct.] Note also that I assumed the use of proc references taken out of the class or interface API record. 
In one case I tested against nil, but in another I passed the value to "Fmt/Fmt", assuming that it is of type Proc/Proc_t, and so Fmt/Fmt can print the name of it. This is consistent with stuff I said earlier, where the token '}(' is in fact a single token, different from '}' '(', so that I can easily decide in the parser whether to be calling the API proc, or to be taking it as a value.

I *could* do the same as Java and have a special proc-header-like syntax for constructors. That should be do-able, since the name is the name of the class we are in. Do I need the 'proc'? The constructor would look even more different if I leave it out. But, it needs a 'corp'. So:

C1. if a class has an explicit constructor, then any class that extends it must also have an explicit constructor, etc.

C2. syntactically, a constructor looks like a proc, but it does not have a return type. Sometimes it returns nothing, and other times it returns a newly created object of the class type.

C3. if the class extends another class, then, if that other class has a constructor, the first statement in the constructor body must be a call to the inherited class's constructor. Passed parameters are expressions involving the parameters to this constructor. This call looks like the inherited class name, '(', parameters, ')'.

C4. if class C2 extends class C1, and C2 has an explicit constructor, but C1 does not have an explicit constructor, then the constructor for C2 can call the implicit constructor for C1 just like it would be able to call an explicit constructor for C1.

C5. if a class does not have an explicit constructor, then it is given an implicit constructor that is like a record constructor - it accepts arguments that are the non-'noInit' data fields in the class, and returns a reference to the newly created class value.

081006/Monday

I've just finished modifying the struct/record/union/case-oneof/bits code to not yield SymTab.SymInfo_t records for field symbols.
Nothing requires that these be available in local/package symbol tables, so why put them there? For case-oneof, the index symbols *are* put into the tables, as are enum and set-oneof tags. This is because all of those tags can be used as independent values, and so the context of the containing type may not be known when the symbol is looked up. RESOLVED MORE LATER

081007/Tuesday

NSA has SPARK Ada for writing safe, secure software.

081008/Wednesday

%%% When spiffing up Exec stuff, introduce a TempWhile_t. Put a TempScope_t in it, and have Exec/WhileStart start a scope, and have Exec/WhileNew end it. That way the parser doesn't need to do it explicitly. Do the same for 'for'. [There are now TempWhile_t and TempFor_t, but I still need to clean up the scope/sequence issue.]

Hmm. All of the "Print" stuff that pretty-prints everything should not be in places like Exec, Types, Package. That pretty-print stuff is the reverse of the standard parser, and so should be with it. If folks want to change the syntax, they will also supply a different pretty-printer, that will not be within Exec, Types, Package, Proc, etc. ALL MOVED

Cleanup. Added eik_instantiatedTypeRef, and reworked stuff to use it. Should help clean up parseInstantiationSelection, and moving of some of it into Package. (Moved it into Exec.)

Added the requirement for '{' and '}' around proc expressions in calls. Didn't work like I had initially thought, but it seems OK. It is entirely in the parser - I didn't end up with a new ExecInfo_t kind.

081009/Thursday

Did some renames in Package.z - the word "use" is just too confusing for direct use, since in English it can be used in both the verb sense (A uses B) and the noun sense (a use of B by A). So, PackagePath_t becomes PathToPackage_t, and PackageUse_t becomes UseTarget_t.

<> Found a reference on comp.arch to a programming language: http://seed7.sourceforge.net

081013/Monday

Starting to define the types for interfaces and classes.
Even though the syntax for a class includes an interface, it doesn't look useful to actually represent it that way internally. A question that comes up is whether the actual procs must be in the same order as the methods in the interface. If they must, then I can have the arrays always be the same size, and having a nil for a proc means it is abstract for this class. If the actual procs can be in other orders, then I need a separate indication of whether a class method is abstract or not. RESOLVED

081014/Tuesday

Starting actual code for interfaces and classes. Do I need a "dgs" state structure when defining generics, like I have "dts" and "dps"? Do I need one for interfaces and classes? Also for generic instantiations?

%%% I have both ps_pk in the parsing code, and the two package references inside the Proc/Context_t. Do I need the one in the parsing code? There are several uses. parseSetPackage has some pointless stuff. For now, I'll do the interface/class stuff the same.

081015/Wednesday

Reference to dynamic languages from Slashdot: http://www.cio.com/article/454520/_Scripting_Languages_Your_Developers_Wish_You_d_Let_Them_Use

Will have to prevent 'poly' formal param values from being passed anywhere else. Perhaps the key is that only proc types within 'interface' can even have 'poly' parameters. That means that no explicit proc can have them, and so things should be safe because of that. DONE THAT WAY

081016/Thursday

%%% May want to add a proc variant 'varargs' or something. It would be like 'ioproc' except it would not have any of the '::' or ':' stuff. It's not really necessary, since 'ioproc's can handle it, but it might be nice, since it is simpler to explain and use. Using the above, I could probably have an explicit matrix constructor that takes any number of explicit values and yields an allocated matrix containing those values.
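A rough C analogue of such a constructor, as a sketch only: DoubleVec and doublevec_of are hypothetical names, and a one-dimensional vector stands in for the matrix. C's <stdarg.h> varargs carry neither a count nor the argument types, so the caller must pass the count explicitly and every value must really be a double. A Zed 'varargs' proc would presumably know both at compile time, which is part of its appeal.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical heap-allocated vector of doubles that knows its length. */
typedef struct {
    size_t len;
    double *elems;
} DoubleVec;

/* Build a vector from an explicit count followed by that many doubles.
   Returns NULL if allocation fails. */
static DoubleVec *doublevec_of(size_t n, ...) {
    DoubleVec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->len = n;
    v->elems = n ? malloc(n * sizeof *v->elems) : NULL;
    if (n && v->elems == NULL) { free(v); return NULL; }

    va_list ap;
    va_start(ap, n);
    for (size_t i = 0; i < n; i++)
        v->elems[i] = va_arg(ap, double);   /* caller must pass doubles */
    va_end(ap);
    return v;
}

static void doublevec_free(DoubleVec *v) {
    if (v != NULL) {
        free(v->elems);
        free(v);
    }
}
```

Usage looks like doublevec_of(3, 1.0, 2.5, -4.0). Writing doublevec_of(2, 1, 2) instead would be undefined behaviour, because the ints are not converted to double; that is the kind of hole a checked 'varargs' proc would close.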
For Python-like work, could have one called "anys", that accepts ref values and basic type values (which it wraps using the Base records), and yields a "[] any" value. Could also have a "cat" that concatenates multiple vectors and elements. Pretty high-level stuff! [Tried the matrix constructor - works fine.] 081018/Saturday <> I got to thinking about future editing scenarios. One important one will be the addition or removal of fields to struct/record/class types. There are issues even ignoring the whole concept of modifying existing values. How does persisted code reference the fields? Right now, the in-memory representation is a reference to the Types/FieldList_t for the field. This is persisted in Exec/AddToBuffer by using the field name. But, what if the field is renamed? All references to that field must be updated somehow. Using an index into the fields has similar issues - a field can be inserted or deleted before the referenced one, and updates are then needed. Another possibility is using field ID's. They never change for a given field, and so will survive renames, insertions and deletions. Even deleting the field itself will work - referencing code that is not updated becomes invalid and will no longer compile. (If there is compiled native code stored with a proc, then if the package containing that proc "use"s the package containing the type that has changed, and the last-update time of the using package is earlier than the last-changed date of the used package, then code stored in the using package is invalid - that can be detected at package load time.) Note that you cannot just scan the type looking for the lowest unused id - you have to persist a "next-id" counter with the type. However, there are issues even with ids. Eventually it is possible to wrap even 64 bit ids (not during normal use, but during extreme testing it might be possible, or perhaps scenarios with automatic code updating). What happens then? A simple answer (in some senses!) 
is that the type can no longer be modified (other than deleting fields). (Note, it is *not* valid to reset the id counter when there are no fields.) Or perhaps a global system search is needed for all uses of the type, so that the ids can be reset to 1, 2, 3, .... Perhaps Zed needs the concept of a "program". The idea is to limit the scope of searches like the above. Types defined in a program cannot be exported outside of the program. So, the scope of searches needed when types are changed is greatly limited. In particular, there is no issue about using code being on other systems. If the program is on other systems, then they will get the updated type when they get the updated program as a whole. System libraries and types are another issue. Perhaps the version number stuff can help there - if you modify a type in a library, then the version number of the library is incremented, and all users of that library must now be "re-compiled". When the library is shipped off-system, receiving systems see the change and must similarly re-compile. Might work. <> I *can* avoid needing a path to case-oneof tag constants when doing a oneof case. So, likely should - it is a pain to keep putting them in. I'm having trouble deciding what the legal uses of methods are. If a class has its own set of instance methods, S1, and implements some interface I1 that has its own set of interface methods S2, where is it legal to use the syntax {clVar.meth}(...)? One thought is that it is only legal to do that when "meth" is in S1, and not when it is in S2. What if the call uses the direct call syntax "meth(clVar, ...)"? It would seem strange to be prevented from calling a proc that is explicitly given within the class. Perhaps the question comes down to that of what the scope is for such procs - are they added to any of the containing package's symbol tables? 
If they are not, then perhaps denying the call is justified, at least outside of the class body - but what about within the class body? What about calling procs that are inside a class this one is inheriting from? [This all got resolved during implementation - you can only use the capsule methods selected from a capsule expression, and interface methods can only be used via an interface expression.]

Another issue I ran into was whether or not you can put procs inside a class body that are not implementations of an instance method or some interface method. As of this writing, I disallow that. [I added the "utility proc" 'procs' section.]

I think there might be problems when a class implements lots of interfaces. Right now the matching of method to proc is done based on name - I do not have overloading. But, what if interfaces happen to have methods with the same name, but different types (different signatures)? There can also be a clash between an interface method and an instance method inherited from an inner class. [RESOLVED]

The whole way of doing things is a big mishmash! Perhaps what I need to do is to split up the procs section of a class so that there is a section for each interface that it implements (either directly or because its inner class implements the interface), as well as a single section for an implicit constructor and any instance procs. That at least makes it clearer what a specific proc is for, and allows for multiple procs with the same name but different signatures - the name is only relevant within the section containing it. This also makes it more acceptable to not allow non-method procs - there isn't any section for them to be in. [DONE]

With the above, perhaps the rule is that {clVar.meth} can only call methods that are relevant to the section containing the proc containing the call. Common utility routines are outside of the class, and must be pre-declared (which is unfortunate, but I don't see a way around it).
The standard proc-call syntax cannot be used for method procs. [The utility 'procs' section fixes this.]

Just sent the following as email to Don, Roel and Darius:

(This isn't an issue with my Zed stuff, but rather one that I've noticed with both C++ and Java. I don't know if Python works out this way or not.)

In C++, it is possible that multiple inheritance in a class results in the same method name being inherited from multiple ancestor classes. In Java, this can happen between multiple interfaces that the class implements, or between a name from an ancestor class and one from an interface.

Both languages use overloading of function names. However, in both cases, the programmer writing the new class often has no control over the method names used in classes he inherits from or interfaces that he is implementing. If it happens that the same function signature is used for two different instances of the same method name, the programmer has no way to distinguish the uses - the one actual function that he writes will be used for both purposes. I don't think there is even a way for the function itself to tell how it ended up being called.

The whole "match by method name and signature" thing is very ugly, to my way of thinking - there is no simple way to determine what function is used for what purpose. It's possible that C++ has invented a way around this, like perhaps prefixing an actual function name with a class name of the chosen ancestor class. However, the programmer must know to do that, and if the inherited classes get a new method, some existing actual function can silently get used for that new method.

This came up during my Zed implementation of classes, when I realized that I was going to have to search through the ancestor class and all implemented interfaces to find the set of methods that a given actual procedure will be used to implement.
So, my tentative decision is to make this explicit, by having separate sections within the class body for procs intended to implement methods from specific interfaces (or instance methods). Some example syntax, as it currently is in my mind:

    interface Int1_t {
        proc print(poly p; uint width)void;
        proc printWithTitle(poly p; uint width; string title)void;
    };

    class Inner_t implements Int1_t {
        /* These are the data fields (per-object). */
        fields {
            uint in_count;
            float in_size;
        };
        /* These are the declared instance methods. */
        interface {
            proc work1(poly p; float factor)float;
            proc work2(poly p; uint iterations; [] float factors)void;
            proc print(poly p; uint width)void;
        };
        /* This section contains any explicit procs to implement the class
           interface. This can include a non-default constructor. We don't
           have a "work2" here, so it should not be called on an Inner_t
           object. */
        procs {
            proc work1(Inner_t in; float factor)float:
                float f := ....;
                f
            corp;
            proc print(Inner_t in; uint width)void:
                /* Print "in" with the specified width. */
            corp;
        };
        /* Procs that are intended for use with Int1_t. */
        procs Int1_t {
            proc print(Inner_t in; uint width)void:
                /* Print "in", possibly in a different way. */
            corp;
        };
    };

    interface Int2_t {
        ...
    };

    class Outer_t extends Inner_t implements Int2_t {
        fields {
            string out_tag;
        };
        /* No additional instance methods declared in this example. */
        procs {
            proc print(Outer_t out; uint width)void:
                /* This print can print "out_tag" as well. */
            corp;
        };
        /* This is valid because Outer_t inherits from Inner_t, and so
           implicitly implements Int1_t. */
        procs Int1_t {
            /* No new "print" here, so Inner_t's Int1_t print will be used. */
            ...
        };
        procs Int2_t {
            ...
        };
    };

So, any comments on all of this?

-Chris

081020/Monday

<><> Had a "vision" this morning, that follows on to earlier ideas. It would be nice to have a graphical system display that shows the hardware communication paths within the computer.
For example, starting out from the CPU, one could see north/south/host bridges, PCI bus, SATA stuff, floppy stuff, Ethernet, USB, ISA-Bus, serial, parallel, etc. These would show up as "bus joiners" on the display. There could also be bus splitters, such as the chip YY used that took one PCI-X bus and yielded a pair of PCI busses, that then went to FC chips that then went to routers, and then to drives, etc. Hovering the mouse over an icon on the display would show all of the information relevant to that represented item, e.g. ISA interrupt numbers, DMA channel numbers, etc. The display (and system of course) should be prepared for multipathing.

Various connections shown on the display would flash (controllable update rate) when data is travelling. Annotations could show the actual data rate, along with a sorted (by decreasing amounts of data) list of the ids of the users on whose behalf the data is moving. This of course requires that the system itself maintain that information.

The system should remember (possibly among other things) a setting for internal/external - i.e. whether a device is "inside the box" or outside of it. For example, someone could have both internal and external USB drives.

The system, and this program, would support a database of manufacturer and device ids that can be consulted. This could provide custom code to deal with the device at the OS level, and could provide custom icons, or lettering, etc. for display. So, if I plug my PSP into my system via the USB cable, then an icon for it should appear, connected to the USB bus, shown outside of the "box". Custom software, etc. could start up and show me what is on the PSP (e.g. what UMD disk is present, if that can be determined via the USB connection).

From: Don Reble
To: Chris Gray
Subject: Re: Zed: instance functions

> [C++ and Java] use overloading of function names.
> However, in both
> cases, the programmer writing the new class often has no control over
> the method names used in classes he inherits from or interfaces that
> he is implementing. If it happens that the same function signature is
> used for two different instances of the same method name, the
> programmer has no way to distinguish the uses

In C++, one can rename methods by introducing artificial classes. The usual example:

    class grapher  { public: virtual void draw(); };
    class carddeck { public: virtual void draw(); };

To inherit from both, introduce new classes,

    class mygrapher : public grapher {
    public:
        virtual void graphDraw() = 0;
        void draw() { graphDraw(); }    // overrides grapher::draw
    };
    class mydeck : public carddeck {
    public:
        virtual void randomDraw() = 0;
        void draw() { randomDraw(); }   // overrides carddeck::draw
    };

and inherit from those instead. The problem occurs often enough that Eiffel has a rename statement. I wonder how much is caused by ambiguous natural language (whence come the names).

My Java books don't say what happens when functions from two interfaces match, but most likely,

> the one actual function that he writes will be used for both purposes.

> procs Int1_t {
>     proc print(Inner_t in; uint width)void:
>     ...
>     corp;
> };

Hmm? If the class inherits from two interfaces, and both have a "proc grok()void:", what does the class programmer write?

--
Don Reble  djr@nk.ca

081021/Tuesday This is from a couple of days ago. I had momentarily forgotten why I wanted constructor procs. Can't I achieve whatever consistency I need by not making the class public, and by exporting creation procs that maintain my consistency? Yes, that works. However, if the class is extended by another class, those creation procs are not usable. They allocate and return a value of the class (or record) they are defined for. If class B extends class A, then we need to be able to allocate a memory chunk for a B, and then have A's consistency done on A's portion of it.
The magic trick of constructors allows this: when used outside of the class, they look like something that takes arguments and returns a class object. When seen inside the class, or in an extending class, they look like something which is given a class value to initialize. It also occurs to me (it may have before as well) that a default constructor for the inner class can just as easily be used inside an explicit constructor. The restriction is that if an inner class has an explicit constructor, any class that extends it must have one too, since that outer constructor must choose how to call the inner constructor.

%%% The various SymTab/Table_t's used in Class_t and Interface_t would be better as custom tables. It ought to work to use the generic "StringTable" for them. In C code, just have functions that deal with "void *", and use casts as needed. Note that when doing this, we are switching from using a case-oneof (SymTab/SymInfo_t) to using a direct value, so the internal structure is changing (one less indirection). So, it needs to be done in Zed and C at the same time.

At some point, go through various type structures (e.g. TempFields_t users) and have the final type use an array of fields, rather than a list of them. This can be done in Types/FieldsDone, which would save the resulting vector(s) into the TempFields_t, where the callers would access them. The same can be done for enum and case-oneof, and there we can also put in a table indexed by the strings. Such a table is not needed in the final enum type (the symbols are in the containing proc or package symbol table), but it will be useful in case-oneofs if I do what was suggested earlier and allow case statements on case-oneofs to not need a path to the tags. DONE

I really ought to create a couple of C macros to make some of the work a bit easier, e.g. the case of either creating a single-element vector or appending the new element to an existing vector. Come to think of it, this could be a compileTime proc in Zed.
Maybe enough of the world would be set up when compiling Package.z that it would work there.

081022/Wednesday <> A possible OS concept is a kind of "shared segment". It's a chunk of memory that the OS (or possibly some process) owns. A separate storage manager is used for the segment, so that a set of related data structures can be built in the segment. Only the owner can allocate/free or modify the data. Other processes can "open" the segment, thus mapping it into their address space at the same virtual address. This allows them to use the data structures by normal means. The one difference is that the segment is read-only to them, and the access is such that they cannot modify the use counts on blocks within the segment, nor can they initiate garbage collection within it. However, they do hold a "lock" on the segment, which prevents the freeing of any chunk of memory within it. The system must track all chunks of memory within the segment that are in the "waiting to be freed" state. When no more client processes hold the lock, those chunks can be freed for re-use within the segment. If a client process holds the lock for a long time, and the segment ends up running out of free space, the owner of the segment can (possibly automatically) remove the segment from such a client. The client will then get an access violation on its next access, which will either kill the process or land it in an exception handler. This sort of thing could be used for the OS information about buses and devices, as described earlier. It could also be possible to have shared-writeable segments, which all clients can modify. This would require finer-grained locking, and the storage management would still need to know that it is a separate segment.

081023/Thursday Note that there needs to be a Package/CreateSymbol, and some of the uses of Proc/CreateVarName should change to use that. DONE

081024/Friday Making good progress on classes. Two issues have come up.
One is that a class may want to implement multiple interfaces with the same name. This could be either direct or indirect. The problem comes with the 'procs' sections - they currently only have the direct class name. It becomes a bit uglier if I change that to being the same path to the interface that was given in the 'implements'. I guess it's consistent, however. But, that means I have to change the way in which I tell the code whether the 'procs' is for an interface, the class, or utility. And change the parsing. DONE

The other issue is that of how to reference a utility proc. Utility procs *aren't* in the package, so the usual symbol referencing stuff doesn't work. Looks like I need an eik_classUtilRef or something. DONE

%%% Add a bits field to Proc_t. Put p_pt and p_typeForced into it. Add a p_hasError, which allows us to mark a proc as invalid, so that it cannot be run (we will not even try to generate bytecode). That way we can avoid having the DefineProc stuff return nil, and thus do not discard erroneous procs.

081025/Saturday Allowing more fixed fields in a record/case-oneof before the one variant part shouldn't be a problem. Allowing that keeps the size of the records consistent (i.e. the garbage collector and reference counter don't have to check the variant to determine how big the memory chunk is) without often wasting space. Also, there are up to 7 bytes of space available without increasing beyond the current space - typically the tag needs to be only as big as an enum needs to be, which is usually only 1 byte. So, I could add a flags field to Exec_t and Type_t without using more storage. DONE

081031/Friday Getting close to finished with classes. The biggest thing left is constructors. This is a reminder to check that a constructor body has a call to an inner constructor if needed. Also to add the "abstract" token and the checking, etc. for it with classes. [DONE, but I never put in an 'abstract'. I did put in a 'partial' for interfaces. Also no 'final' yet.] Hmm.
Can you put a class inside a generic? I think it would open up a whole can of worms, e.g. the concept of an instantiated class, which is sort-of a type, and then instantiating all of the procs in the class, and the methods. For now I'll prohibit it. There are all kinds of places where I have emitted an error message, but have not actually marked an error. Classes are just one example. In that situation I need to mark the class as not constructable, i.e. any proc that tries to construct it (or a descendant) gets an error. The same (this relates to a point made earlier) is needed for erroneous types. RESOLVED

Ok, constructor stuff programmed, completely untested. This is to remind me that when an inner constructor is called explicitly, I need to make sure to push the class parameter on *last*, since I've done the offset-changing thing for constructor parameters as well as method procs. DONE

081102/Sunday %%% Interesting. For local variable and proc formal offsets (and maybe other things like field offsets), the value emitted in byte-code is the z_word_t offset, rather than a byte offset. This looks nicer in the C code for the bytecode engine, but it does result in a run-time scaling of the offset. Execution would be a bit faster without that. And, I compute offsets as byte offsets within the Zed system, then convert when I emit bytecode. Now, doing the conversion does increase by a factor of 8 the range of offsets I can handle, so likely that is why I did it. Note that I don't do the scaling in 'gmeth'. That's inconsistent, if nothing else.

I want to change things so that the DefineProc stuff knows that we are defining a method/constructor proc. This is so that it can do the formal offset modification before the initial code generation. Having MakeMethodProc do code generation a second time is icky, and worse - it can't know if there were actually errors in the proc because the new context for compiling the proc no longer exists.
It *could* check for an existing byteCode vector, but that's a bit kludgy. DONE Other than that, and testing out explicit constructors, and doing the changes mentioned above for those, things seem OK. 081103/Monday %%% Don and I chatted last night about my 'assert' concept. A very valid point is that the updating of variables/fields involved in an assert can violate the assertion in the middle of a sequence of updates. There is also a big difference between what I sort of had in mind, which is a condition that is *always* true, and an explicit assertion that says a condition is supposed to be true at a specific point. A specific-point assertion might be do-able with a 'construct' proc, once I get the parser done in Zed - the argument to the "Assert" construct proc could simply be a string containing an expression. The expression is parsed and compiled right there, with an 'if' around it, to implement the 'assert'. This would allow the programmer to readily control when these asserts are present in the code, and they could also be disabled at runtime, like the "Debug" stuff allows. Could use another token, say 'require' for the other kind of thing. These could be general within a package, within a proc, or within a record or class. To handle the multi-update issue, there could be a simple 'atomic' construct that says that the requirement may not be true within the statements inside the 'atomic' section. Note that there cannot be any proc calls in an 'atomic' section - inside those procs might be nested atomic sections for the same requirements, and those would be incorrect since the outer one is in the middle of a sequence of changes. Need to change the interface used for adding procs to a package. It needs to be a call to set which section to be adding to (likely including the proc name, so that errors can be done right there, and we can know whether we are compiling the constructor or not). Then, adding a proc does not specify that. Hmm. 
Can we know the name before we are done? Yes, since the main parser uses the more detailed proc construction calls. The Package will then be able to call into Proc to set things inside the Context_t, which will allow proper compilation of an explicit constructor. DONE 081104/Tuesday Perhaps in all Types/Exec/Proc/Package construction temporaries, keep the Proc/Context_t from the first call. That way we are sure we are using the same context throughout the sequence. Also may want to save copies of the various "containingXXX" fields in the Context_t, and verify them at each step (compare the ones saved in the TempXXX versus the ones in the saved Proc/Context_t). Thus, we can tell if we have been "left dangling" for a long period of time, and so should just clear everything out and not return a constructed entity. Since we have a reference to the Context_t, it cannot be freed and the space re-used. Note that it is thus important that the Proc code create a new one for each proc compiled, and never re-use an existing one. RESOLVED %%% Re-examine Types/CheckGeneric and Types/ContainsGenericParamSubtype. If nothing else, their header comments are incorrect after the removal of polymorphic bundles. 081105/Wednesday There is a big hole in Exec/ProcCheck. It should disallow creation of a record or class outside of its writeablePackage. It should also check for the writes themselves. [Done with the new Validate copying.] A URL from Don, relating to type representation comparisons - much as I do. Also mentions a persistent language Napier88. URL: http://www.cs.adelaide.edu.au/~idea/idea2/fred.htm 081106/Thursday Starting on explicit constructor use. The problem I'm hitting is that I need to leave the new class pointer on top of the stack after calling the explicit constructor. But, that constructor proc returns no result. Maybe I need to have a special return instruction for an explicit constructor proc, that leaves the first parameter on the stack. 
That suggests I would want to make that parameter 'ro', so the code can't change it. Or, I could have the code generator notice that it is compiling an explicit constructor, and simply add the needed "pshlr". DONE On a related note, I thought about making the return value of an explicit constructor be explicit. This could allow, say, the code to avoid duplicate equivalent objects by searching based on the constructor parameters and returning a reference to an already existing object, instead of using the reference to the allocated space passed to it. However, this would not work when we have an inner class that has an explicit constructor - it cannot properly return a pointer to an alternate object, since it needs to fill in the one it was passed a pointer to. It could do both, but I don't think I want the language specification to be as loose as that. 081108/Saturday Is there a restriction about not being able to extend a non-public class from another package? DONE 081111/Tuesday *Almost* got classes/interfaces fully working. Change "class" => "capsule" (Don is fine with me using his term.) DONE Change "abstract" => "partial". DONE 081115/Saturday Capsules all done, I think. The one thing left (unless there is more listed above that I've forgotten about) relates to the run-time implementation of them. Can I avoid the parameter-offset-hacking (makeMethodProc) by changing how things work, including changing the "jsri" instruction so that it has an offset from SP at which the proc reference is stored? Also, would it help to split the "capcon" instruction into two parts, so that if the capsule has an explicit constructor, I can use a normal "jsr" to call that, and thus don't need to kludge constructors to return the capsule reference. One thing I just remembered - if a capsule is to be extended (made an inner capsule to some new one), it has to be 'public', which in turn makes all of its fields writeable, unless they are made 'private noInit'. This is awkward. 
Perhaps you can extend a non-public capsule, but you do not gain the right to write its fields, and it must have an explicit constructor that you must call in your explicit constructor. HANDLED

It's not clear how to do the extra checking - the Types/FieldList_t does not contain an indication of what capsule it is declared in. However, it would be possible to modify the "fl_flags" on inherited fields so that they are 'private' and 'noInit' in Types/CapsuleStart. Ack - that won't work - making them 'private' allows the extending class full access to them. Maybe 'ro' would work? That then raises the question of whether fields that are directly 'ro' should not be 'ro' in an explicit constructor. That in turn pretty much invalidates the usefulness of marking inherited fields as 'ro', since then they are writeable in that constructor.

Okay, I've gone back and read everything on classes, up to my c.l.j post. Discussion with Don. I concluded I need a new flag bit in Types/FieldList, which is set by the compiler on fields that are inherited from either a non-public capsule, or from private fields in public capsules. That flag prevents *any* writes to the fields, and it is copied further if that capsule is extended. DONE.

Add an instruction that takes an offset off the top of the stack, adds that to SP, and fetches and pushes the value from there. This can be used in capsule method calls and explicit constructor calls (after splitting up the "capcon" instruction) for the indirect calls without having to use "makeMethodProc". DONE, then UNDONE

081117/Monday <> SymTab/Table_t is wide open - anyone can add elements to a table. Sigh, this is a big hole, I'm pretty sure. For example, someone could add an element to an existing record type, and use that to probe memory all over. One fix is to make SymTab/Enter be internal, and have a pair of external entry points that require either a TypesKey_t or a PackageKey_t (or some other).
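The key-gated entry points suggested above might look roughly like this in C, the language of the current bytecode engine. This is only a sketch under assumptions: the TypesKey/PackageKey structs, the fixed-size Table, the duplicate-name check and all function names are hypothetical, not the existing SymTab code. C cannot really stop a module from forging a key it can see; the sketch relies on each real key being a static object private to its owning module (in a real build, a separate translation unit), which is the guarantee Zed's type system would provide properly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key types: each owning module holds the one genuine instance. */
typedef struct TypesKey { int unused; } TypesKey;
typedef struct PackageKey { int unused; } PackageKey;

/* In a real build these would live in the Types and Package source files. */
static const TypesKey typesKey = {0};
static const PackageKey packageKey = {0};

#define TABLE_MAX 16

typedef struct Table {
    const char *names[TABLE_MAX];
    size_t count;
} Table;

/* Internal insertion routine: would not be exported at all. */
static bool symtab_enter(Table *t, const char *name) {
    if (t->count == TABLE_MAX) return false;
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0) return false; /* duplicate name */
    t->names[t->count++] = name;
    return true;
}

/* Exported entry points: the caller must present the genuine key object. */
bool symtab_enter_types(Table *t, const char *name, const TypesKey *key) {
    if (key != &typesKey) return false; /* forged or foreign key */
    return symtab_enter(t, name);
}

bool symtab_enter_package(Table *t, const char *name, const PackageKey *key) {
    if (key != &packageKey) return false;
    return symtab_enter(t, name);
}

/* Stand-in for code inside the Types module, which owns typesKey. */
bool types_add_field(Table *t, const char *name) {
    return symtab_enter_types(t, name, &typesKey);
}
```

Code outside the Types module that fabricates its own TypesKey value gets a different address, so symtab_enter_types refuses it and the record's field table stays unchanged.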
081118/Tuesday <> Are there things that Zed can do to make OSes more secure, in the sense of isolating various components, drivers, etc.? One thing that might help is limiting the direct I/O abilities of drivers. For example, with port-based I/O (e.g. the 'in' and 'out' instructions on Intel chips), the OS could create specific routines for the driver (e.g. at driver insertion time, based on system configuration info) that allow it to write just to the ports of the device it is handling. If the driver ends up always using constant port numbers on its calls to those routines, then chances are that if the routines are inlined, the tests of the port numbers will be optimized out (by something as simple as constant folding), and the resulting code is as efficient as it would be with in-line 'in'/'out' instructions. How about drivers that need to use DMA hardware? Perhaps there can be a scheme that describes how to set up and initiate DMA for a given device. Based on that description, the system can generate DMA-initiation procs for the driver, which could hopefully work as above and essentially disappear. Perhaps these routines also check the buffers they are given, to ensure that they are either buffers passed to the driver from above, or are internal buffers that the driver has allocated from the system. Note that most, if not all, drivers never need to look at the actual user data they are dealing with. They may need to work within their own buffers, for things like filesystem metadata, however. When the time comes to do the first such driver, more thought will be needed here. But, the intent is to protect user data and the rest of the system from any accidental or malicious harm from the driver, while still allowing the driver to do its job efficiently. Note that a driver for a non-DMA device will need access to user data pages, however, since it must push the data through the device using CPU instructions.
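A generated per-driver port-write routine could be sketched in C as below, assuming a window of consecutive ports is assigned to the driver at insertion time. PortWindow, driver_outb and the port_space array are hypothetical; port_space only simulates the I/O port space so the sketch runs in user mode, where a real routine would execute an 'out' instruction. The window 0x3F8..0x3FF is the traditional COM1 range. With optimization enabled, a compiler will typically fold the range check away when both the window and the port number are compile-time constants, though that depends on the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated I/O port space; a real OS routine would execute 'out' instead. */
static uint8_t port_space[0x10000];

/* Ports granted to one driver at insertion time (hypothetical layout). */
typedef struct PortWindow {
    uint16_t base;
    uint16_t count;
} PortWindow;

/* Window for a serial driver: the traditional COM1 range, 0x3F8..0x3FF. */
static const PortWindow com1_window = { 0x3F8, 8 };

/* True only for ports inside the driver's window. */
static inline int port_allowed(const PortWindow *w, uint16_t port) {
    return port >= w->base && port - w->base < w->count;
}

/* Generated write routine: refuses any port outside the window.
   Returns 0 on success, -1 where a real OS would raise a fault. */
static inline int driver_outb(const PortWindow *w, uint16_t port, uint8_t value) {
    if (!port_allowed(w, port)) return -1;
    port_space[port] = value;
    return 0;
}
```

A driver call such as driver_outb(&com1_window, 0x3F8, 0x41) uses only constants, which is the case where the check is expected to disappear after inlining.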
Currently there is no way to call an implicit capsule constructor from inside an explicit capsule constructor for an extending capsule. This should be easy once "capcon" is split into "capcon1" and "capcon2". DONE 081119/Wednesday I've split up the old 'capcon' (CAPsule CONstructor) instruction into 'capcon1', which does the allocation, method vector setup, interface stub setup, etc. and 'capcon2', which fills in initialized fields, in order to do the default constructor. An explicit constructor is now done via a direct 'jsr' to the constructor proc. When calling an explicit constructor for an inner capsule from the constructor for an extending capsule, only the call to the inner constructor, or the use of 'capcon2', is done, since the 'capcon1' work is only needed at the top level of construction. I've also added a 'decref' instruction. This is emitted after calling an explicit constructor at the top level, and balances a special case in 'capcon1' which initializes the useCount of the new object to one, rather than zero. That is needed so that the new object is not freed on return from the constructor, which otherwise happens. This new technique allows me to get rid of the 'rtsc' alternate instruction which was used to avoid the extra reference count decrement. However, it was not previously possible to use something like 'decref', since there was no place to really put the instruction in the sequence. I had temporarily added a 'pshspr' instruction that pushes a copy of a value from higher up the stack. Don and I had discussed using this when doing method calls, to avoid having to mess around with the parameter positions in method procs. This would have worked for calling methods on capsule objects, but it does not work well when calling methods on interface objects. 
The reason is that both values that must be obtained from the interface object (the method to call, obtained from the method vector, and the capsule object reference) are indirectly fetched from the interface stub object. Getting the proc via 'pshspr' would require leaving a copy of the reference to the interface stub object on the stack, and then doing a 'swap'/'ignr' combination to get it off after the method call. I chose not to go that route, and deleted 'pshspr'. That form may work better with native code, however.

081120/Thursday <> Some thoughts on email SPAM prevention. Probably strongly disliked, and probably ultimately futile anyway...

- don't allow paste into the "To" box of the email tool. The address must either be typed in, or come from the address book. Addresses can be put into the address book from the To/From headers of received emails.
- don't allow import into the address book, from anywhere, including via email, import from a legacy system, etc.
- there were ideas earlier. One was to deliberately pause between sending emails to multiple recipients. Possibly between any email sending operations at all.

Starting to look at how records and case-oneofs are done, with a view to merging them. I noticed that field rd_hasGeneric is set in the Types/RecordDesc_t, but only ever checked in the bytecode machine, to prohibit construction of a record that has that flag set. There is no other use for the flag, even in the SrcBundle stuff (old sources with bundles still present). REMOVED it. Also fixed the use of flag tf_hasRefGenericParam in Types/getFieldAttributes.

Hmm. Can you have variant stuff in capsules, or only in records? If you can have it in capsules, can you have an explicit constructor? Without changing records to have variant parts, I really could just use the capsule stuff for records, since a record is just a capsule without any methods or 'extends' or 'implements' - there is just a simpler syntax to declare them.
However, adding a variant part to them makes them distinctly different from capsules. My gut tells me that allowing a variant in a capsule is a bad idea. For example, you cannot extend such a capsule, since the whole idea is that there is no data beyond the variant part of such a record. So, on to what I'm supposed to do first: write out some examples, to see what they look like. I think there are two choices for the syntax of a constructor for a record that has a variant and multiple fields:

    type HasBoth_t = record {
        sint hb_i;
        char hb_ch;
        case hb_kind
        incase hbk_uint:
            Base/Uint_t hb_uint;
        incase hbk_string:
            string hb_string;
        esac;
    };

    HasBoth_t hb := HasBoth_t(-3, 'a', hbk_string("fred"));

or

    HasBoth_t hb := HasBoth_t.hbk_string(-3, 'a', "fred");

The second form is one character shorter, and does not require any special new syntax (well, other than dropping the '.' when there is no variant part!) to drop down to the current form which has only the variant field. That's a strong argument, even though I sort-of prefer the first form as clearer. Both would end up exactly as a current record constructor, if there is no variant. Having to use:

    HasVariant_t hv := HasVariant_t(hvk_string("barney"));

would suck, so my intent was to drop to the old variant form for a record with no fields other than the variant. But, that's a special case. So, it seems I should go with the second form above. To my mind, though, the second form connotes building an entirely different record based on the variant, whereas the other syntax connotes building the fixed part of the record along with one form of the variant part.

I don't think there are any issues with using these new record values. The fixed fields can be referenced just like record fields - the variant part can only be referenced inside a case on the variant index.

%%% Perhaps associated with any numeric variable with a unit specification is an optional "preferred unit". So, a distance measure could normally use metric units, based on the meter.
But, if it is given a preferred unit of "feet", then output would be in feet. Perhaps there could be meta-units, saying things like "feet and inches".

081122/Saturday <> Incref/Decref removal optimization doesn't need to be done on an entire-proc basis. Doing it on something like a 'while' loop will be valuable. E.g.

    StuffList_t sl := blah.stuffs;
    while sl ~= nil and simpleTest(sl) do
        sl := sl.sl_next;
    od;

(or the equivalent using Lists/Iterate). Here, there is no need for any incref/decref operations until at least after the loop. Doing an incref there suffices. There might already be code that tests whether "sl" is nil after the loop - if so, only an unconditional incref is needed in that 'if'. Maybe.

Now that I'm quite a way into adding a variant part to records, I was having some thoughts about another way to do things. Right now the "case-oneof" construct is a reference value that is pointed to. It could be a struct-like value that is not pointed to. I don't think you could put one as a separate variable, or as a field of a struct that is not ultimately inlined into a record (so maybe not into a struct at all), but you could inline one or more into a record type. E.g.

    type OO1_t = oneof case oo1_kind
        incase oo1k_string:
            string oo1_string;
        incase oo1k_vector:
            [] bool oo1_vector;
        esac;

    type OO2_t = oneof case oo2_kind
        incase oo2k_uintVec:
            [] uint oo2_uintVec;
        incase oo2k_floatVec:
            [] float oo2_floatVec;
        esac;

    type R_t = record {
        uint r_n;
        string r_tag;
        OO1_t inline r_oo1;
        OO2_t inline r_oo2;
    };

A type-case is needed to access within either case-oneof, but not to access "r_n" and "r_tag". A constructor could look like:

    R_t r := R_t(1, "tag", oo1k_string("Fred"), oo2k_floatVec(fVec));

In this form, the enum-like value for the selector would have to be stored as the full 64-bit value, since one could do things like:

    @ OO1_t ro oo1 := @r.r_oo1;

Because the 'inline's are present in R_t, the r_oo1/r_oo2 field names are not used when referencing the case-oneofs.
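For comparison, the closest C analogue of R_t is a struct whose fixed fields are followed by inline tagged unions. This sketch assumes hypothetical names (OO1, OO2, R, make_r, oo1_string, oo2_floatVec, oo1_size) and uses plain pointer-plus-length pairs in place of Zed vectors. It also shows what the proposed Zed form adds: C does not enforce the type-case, so nothing stops code from reading the wrong union member; only the discipline of switching on the selector does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Analogue of OO1_t: a selector followed by the union it governs. */
typedef enum { OO1K_STRING, OO1K_VECTOR } OO1Kind;
typedef struct {
    OO1Kind kind;
    union {
        const char *string;                              /* OO1K_STRING */
        struct { const bool *data; size_t len; } vector; /* OO1K_VECTOR */
    } u;
} OO1;

/* Analogue of OO2_t. */
typedef enum { OO2K_UINTVEC, OO2K_FLOATVEC } OO2Kind;
typedef struct {
    OO2Kind kind;
    union {
        struct { const unsigned *data; size_t len; } uintVec;
        struct { const float *data; size_t len; } floatVec;
    } u;
} OO2;

/* Analogue of R_t: fixed fields, then both variants stored inline. */
typedef struct {
    unsigned r_n;
    const char *r_tag;
    OO1 r_oo1;
    OO2 r_oo2;
} R;

/* Like oo1k_string("Fred"): sets the selector and matching member together. */
static OO1 oo1_string(const char *s) {
    OO1 v;
    v.kind = OO1K_STRING;
    v.u.string = s;
    return v;
}

static OO2 oo2_floatVec(const float *f, size_t n) {
    OO2 v;
    v.kind = OO2K_FLOATVEC;
    v.u.floatVec.data = f;
    v.u.floatVec.len = n;
    return v;
}

/* Like R_t(1, "tag", ...): the fixed part plus one form of each variant. */
static R make_r(unsigned n, const char *tag, OO1 oo1, OO2 oo2) {
    R r;
    r.r_n = n;
    r.r_tag = tag;
    r.r_oo1 = oo1;
    r.r_oo2 = oo2;
    return r;
}

/* The "type-case": the union is only read under a switch on the selector. */
static size_t oo1_size(const OO1 *v) {
    switch (v->kind) {
    case OO1K_STRING: return strlen(v->u.string);
    case OO1K_VECTOR: return v->u.vector.len;
    }
    return 0;
}
```

Usage mirrors the Zed constructor above: with float fVec[] = {1.0f, 2.5f}, the call make_r(1, "tag", oo1_string("Fred"), oo2_floatVec(fVec, 2)) builds the record, after which r.r_n and r.r_tag are read directly and r.r_oo1 goes through oo1_size.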
Putting the variant right in a record has the advantage that I can sometimes save memory if the enum-like value fits within space available at the end of the existing record fields. That was my intent for Exec_t, for example, since I wanted to add a bits type to it, or to ExecInfo_t. One of the reasons I wanted something like variants in records was for the case of X stuff, where the data for an XEvent consists of some fixed fields (e.g. the windowID) and some variable fields. Returning just a case-oneof value leaves nowhere to put the fixed fields. However, one way that would work for that is to not have XWaitEvent return the XEvent, but instead to accept an '@' to a struct type that has the fixed fields and the one case-oneof value. This is closer to what the C-level code does anyway. That leads to asking whether I can put a case-oneof variant in a struct, like I can do in a record. I think the answer is "no", because I don't have any "assign-to-entire-case-oneof" construct - I cannot allow the two parts to be assigned separately.

081124/Monday Check to make sure that no '@' value can persist. I think that involves verifying that '@' types cannot be used for non-local variables, fields/members, matrix/array elements, ... DONE

<> Thinking about assert/require a bit. I think the name 'assert' works better for the one that can help the compiler optimize. The programmer is asserting that a given condition is true. The 'require' sense works for a construct where the programmer is requiring a condition to be true. It also occurs to me that the asserts can be done in the form of procs. This makes parsing them easier, and also simplifies the implementation. The issue is that of declaring the procs. They essentially want to be inside the record/capsule. That is OK for capsules - the proc could be declared inside the unnamed utility section, and just specified as an assert proc. But that doesn't work for records.
I *could* restrict them to be only in capsules, since you can make a capsule that is identical to a record in how you use it. That could look like:

    capsule Buffer_t {
        fields {
            [] char b_buf;
            uint b_pos, b_max;
        };
        procs {
            proc condition1(Buffer_t b)bool:
                b.b_max = getBound(b.b_buf, 0)
            corp;
            proc condition2(Buffer_t b)bool:
                b.b_pos < b.b_max
            corp;
        };
        assert condition1, condition2;
    };

Note that requiring a capsule means that you can't have a variant part, like you can with records. The reason for that is that I (currently) require that the variant part of a record be the last part, and so it would not be possible to extend, with new fields, a capsule with a variant part. I guess I really don't have to disallow fields after the variant part - I really only need to prevent multiple variant parts, and I could do that with capsule extension easily enough. I would need to represent such types differently, however. As has occurred to me before, it would be possible to get rid of records, internally, and just use the 'record' type syntax as a shorthand for a capsule that only contains data fields. Hmm. Would you be allowed to extend such a capsule? My current gut response is "yes".

081125/Tuesday <> Of course, the syntax above could more simply be:

    capsule Buffer_t {
        fields {
            [] char b_buf;
            uint b_pos, b_max;
        };
        procs assert {
            proc condition1(Buffer_t b)bool:
                b.b_max = getBound(b.b_buf, 0)
            corp;
            proc condition2(Buffer_t b)bool:
                b.b_pos < b.b_max
            corp;
        };
    };

Another thought is that the 'fields' part could be 'record', using the same term as outside of capsules, just like I use 'interface'. I'm not sure that is quite as good, however. As I think I said earlier, one big issue with getting rid of 'record' altogether and just using 'capsule' is that there is no obvious way of dealing with variant parts in an explicit constructor. What I have been thinking about is how to represent records that have multiple variant parts.
Basically I need to use a case-oneof for type Types/RecordFieldList_t. Something similar would likely also be needed for Exec/RecordConstructor_t.

Well, I have had a way in the back of my mind to handle explicit construction of case-oneof values. It arose during earlier thoughts about having them as separate struct-like entities. The syntax would be something like:

    capsule HasVar_t {
        record {
            uint hv_n;
            string hv_tag;
            case hv_nameKind
            incase hvk_nameChar:
                Base/Char_t hv_nameChar;
            incase hvk_nameString:
                string hv_nameString;
            incase hvk_nameProc:
                proc(HasVar_t)string hv_nameProc;
            esac;
            float hv_weight;
            case hv_otherKind
            incase hvk_otherString:
                string hv_otherString;
            incase hvk_otherVector:
                [] uint hv_otherVector;
            esac;
        };
        procs HasVar_t {
            proc HasVar_t(HasVar_t ro hv; uint n; string tag, name; float weight; string x)void:
                hv.hv_n := n;
                hv.hv_tag := tag;
                variant(hv.hvk_nameString, name);
                hv.hv_weight := weight;
                variant(hv.hvk_otherString, x);
            corp;
        };
    };

Well, I sure had to pause in typing the 'variant' things, since what I hadn't realized was that it works better if there is a single name for the entire two-valued variant. But, there isn't, so I had to invent one. The above *would* work, but it is an entirely new statement form in the language. Observation: having more than one variant in a record/capsule essentially means that more than one set of good field/variant names is needed. Note my difficulty in picking good taglets/names above. Hmm. I'll redo it. Done. Note also that doing an assert based on a variant field would be more complex than the simple asserts above, and harder for the compiler to get any information out of.

081126/Wednesday A big issue with the above 'variant' initialization statement is that it is able to change which variation a variant currently is. That ability is something I didn't want to have. The main reason is that it is currently possible to have '@' values to the data field of a variant.
If it is ever possible to have the current variation of the variant changed while one of those '@'s exists, then I have a huge hole. Limiting the use of 'variant' to an explicit constructor helps somewhat, but I may also want to add other limitations, like requiring that all uses of 'variant' come before anything else in the explicit constructor. Argh! The whole 'variant' thing is pointless. They only allow the single variant that is directly coded, and since they are in the explicit constructors, there is no way at all to create objects with any other variant. The implicit constructors used for records allow that. 081127/Thursday <> Seed7 Homepage: http://seed7.sourceforge.net Seed7 - The extensible programming language: User defined statements and operators, abstract data types, templates without special syntax, OO with interfaces and multiple dispatch, statically typed, interpreted or compiled, portable, runs under linux/unix/windows. 081128/Friday <><> I've previously thought that one problem with a lot of Linux stuff is that it is held together by shell scripts. A problem with them is that the call to a lower-level program is not strongly typed in terms of what command-line arguments it expects. So, when a utility program changes its command-line syntax, some shell scripts that use it can be silently broken. The authors/maintainers of the utility program may not even be aware of the uses to which the program is being put. Last night at a retirement supper I talked about this a bit, and I have been thinking about it since. First, I don't have a good grasp of what a "program" is in Zed. Zed programs may not end up being very similar to programs in traditional systems. Ignoring that for now, here are some thoughts. Perhaps programs do not directly take command line arguments at all.
Instead, it is an interactive shell's responsibility to parse the user-typed command line, find a matching "entry point" public proc in the program, and then call that entry point appropriately. There could be a "help " command in the shell that lists an interactive-style syntax for each of the public entry points in the program. Also, the program could export a standard entry point (or just a string constant) that provides a more concise description of the interactive commands that can be used with the program. It should always be possible to ask the shell to construct the syntax, because the provided summary can become out of date, or could be just plain misleading. There would need to be ways to allow shorter forms of flag and mode parameters to be used - just using the formal parameter names from the entry-point procs could easily end up being too verbose. I need a way of doing this that does not allow the information to become out of date or internally inconsistent. Perhaps there is another proc-type (similar to 'compileTime', 'construct', 'ioProc') which is used to declare a proc as an entry point. Or, perhaps better, there is a system utility routine that can be called when the program is compiled, and that is passed the entry point proc and a string which is the command-line syntax expected by that entry point proc. The compile-time routine verifies that the parameters of the proc and the syntax string are consistent, and if so, does something magic that makes that entry point proc available for use. Short forms of commands would simply be additional entry points with fewer parameters. Some standardized way of accepting multiple argument sets will be needed. Either that, or I go with the Amiga-style approach, where it is each program's responsibility to expand filename patterns. Note that since only humans ever actually enter command lines, extremely complex forms are not needed.
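A minimal sketch of the compile-time consistency check described above, in C. Everything here is hypothetical: the Param/EntryPoint descriptors, the checkEntrySyntax name and the "name:kind" syntax-string format are invented for illustration. A real check would work from the compiler's own proc type information rather than a hand-built table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter kinds an entry point might accept. */
typedef enum { PK_STRING, PK_UINT, PK_FLAG } ParamKind;

typedef struct {
    const char *name;
    ParamKind kind;
} Param;

/* Hypothetical descriptor for an entry-point proc. */
typedef struct {
    const char *procName;
    const Param *params;
    size_t nParams;
} EntryPoint;

static bool kindFromText(const char *text, ParamKind *out) {
    if (strcmp(text, "string") == 0) { *out = PK_STRING; return true; }
    if (strcmp(text, "uint") == 0)   { *out = PK_UINT;   return true; }
    if (strcmp(text, "flag") == 0)   { *out = PK_FLAG;   return true; }
    return false;
}

/* Returns true if 'syntax' (e.g. "copy from:string to:string verbose:flag")
   names the entry point and lists exactly its parameters, in order, with
   matching kinds. strtok is used for brevity; it is not reentrant. */
static bool checkEntrySyntax(const EntryPoint *ep, const char *syntax) {
    char buf[256];
    if (strlen(syntax) >= sizeof buf) return false;
    strcpy(buf, syntax);

    char *tok = strtok(buf, " ");
    if (tok == NULL || strcmp(tok, ep->procName) != 0) return false;

    size_t i = 0;
    while ((tok = strtok(NULL, " ")) != NULL) {
        char *colon = strchr(tok, ':');
        ParamKind kind;
        if (colon == NULL || i >= ep->nParams) return false;
        *colon = '\0';                       /* split "name:kind" */
        if (strcmp(tok, ep->params[i].name) != 0) return false;
        if (!kindFromText(colon + 1, &kind) || kind != ep->params[i].kind)
            return false;
        i++;
    }
    return i == ep->nParams;                 /* no missing parameters */
}
```

Rejecting the entry point when the check fails is what keeps the advertised syntax from drifting out of date relative to the proc.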
081129/Saturday Is it useful to have a capsule with no fields or methods, but which implements some interfaces? What if it just extends some other capsule - is that a useful indirection in some situation? 081130/Sunday Very little work for several days. However, there are now no longer any oneof's in the main Zed sources - they have all been trivially turned into records with a variant part. Next job is to fix all of the error messages to refer to records instead of oneofs, and then the big step of removing case-oneofs from the language. 081201/Monday If multiple variant parts of records are allowed, they could all be required to be at the end, in which case I could just use a list of them, rather than having to have a list of record fields which are themselves a variant record - sometimes plain fields, sometimes variant fields. An alternate constructor syntax could be something like: RecordType_t(1, 2, rk1_str : "Fred", 3, rk2_ui : Base/Uint_t(4)); It's actually longer, but perhaps easier on the eye. The other thing still possible is to change things so that records are declared like capsules are. That makes it no longer possible to have unnamed records. That would still leave unnamed structs, though, so they would have to be handled as well. I would still need tik_named stuff, though. But recursion through unnamed types would no longer be possible. [Unnamed struct/record/union/bits/oneof types are gone.] 081202/Tuesday %%% Really need to put index ranges into case constructs. Some extensions to Pascal had ranges and a default in the variant part of records, but I really can't see the point to that. Well, it wasn't hard to comment out all of the case-oneof stuff. Things all seem to compile and run. Now do I really take the step of actually deleting all that stuff? The answer seems to be yes, but I'm still nervous about doing it. Oh well, I did it with polymorphic bundles.
Since a record can now have just a variant field (obviously, since that is what all of the existing case-oneofs have been turned into), the language is not losing any feature. 081204/Thursday Spent a bunch of time debugging. Ended up adding a couple of comments to the C sources. The key one is in Types.c/findSame. Have now added flags -f, -s and -n to the command line of "z". They allow the silent pre-sourcing of everything currently in my ../Src/Z directory, which means I don't need all of the "runit" files. %%% I note that Exec/ConstantDeclarationCheck allows a Types/Type_t constant with a type value. Does this actually work as a way to name types? 081205/Friday Found a bug with test/lowlevel.z . proc pretendTest1 pretends from a uint to an array of float, which it assigns to a variable. This results in a TSP overpop. I have code to handle the other direction - it does a dup and an ignr. FIXED. The combination of Package/Print and Types/Print does not correctly print types that are defined as names for types from a generic instantiation. I think it needs to examine the Types/NamedDesc_t in more detail, and not just print the nd_subType. DONE. Hmm. 'use' statements currently go into the current package, rather than into the current subPackage. The 'ps' struct does not contain the current subPackage, only the Proc/Context_t does. Hmm. Package/AppendUse *forces* all 'use' statements to be in the containing package. RESOLVED 081206/Saturday It is unfortunate that bits types can't have constants. Many of them, such as Types/FieldListFlags_t, can readily fit in a word, and there are lots of situations where having constants would be useful. However, there are also lots that won't fit in a word - they are more like structs. So, it would be icky to have only some allowed to have constants. Also, for cases like FieldListFlags_t, it would be nice to be able to use "0", instead of the much longer "FieldListFlags_t(false, false, false, false, false, false)".
Fixed next day - you can now have bits constants. Wait a minute. How do those work anyway? If the bits type is longer than a word, how do I assign them? Is there some check in the constructor to see if they fit in a word? Ahhhhh. Bits types cannot be longer than a word size. So, they could *all* have constants. Currently they cannot: "Cannot have a constant of this type". Eeek. I just noticed that test/exectest.z is saying "procAssign failed!". Need to investigate that. Fixed. Sigh. I can't just go adding package variables to Types - it needs to match one-for-one with the C one. 081207/Sunday Ick. Currently, if a proc symbol is undefined, I get the message about needing '{' and '}(' around it. Need to fix that. Fixed. test/bits.c - values are wrong now. Fixed. This was my change to what used to be bcComp.c/isBitsConstant and is now Exec.c/IsBitsConstant - the handling of a nested bits value didn't compensate for the position of the bits value in its containing bits8/bits16/bits32/bits64. And, because bcComp uses that routine, directly generated code was affected as well as bits constants. 081209/Tuesday Grff! There was a spurious '{' in a routine, in the same place as there would be an open '{' for a C proc. There was no error message for it, but it was forcing ps_withinBracesCount to be 1 instead of 0, and so messing everything up. Check at the end of a proc?? How else to deal with this syntactic ambiguity? Resolved - did it an easier way.
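The nested-bits bug above comes down to offsets: a field of an inner bits value lives at the inner value's position plus the field's own offset. Here is a small C sketch, assuming a bits value is packed into a single uint64_t (consistent with bits types never being longer than a word). The helper names are hypothetical, not the actual Exec.c/IsBitsConstant code.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low 'width' bits set (1 <= width <= 64). */
static uint64_t fieldMask(unsigned width) {
    assert(width >= 1 && width <= 64);
    return width == 64 ? ~(uint64_t)0 : (((uint64_t)1 << width) - 1);
}

/* Store 'value' into the 'width'-bit field starting at bit 'offset'. */
static uint64_t bitsInsert(uint64_t word, unsigned offset, unsigned width,
                           uint64_t value) {
    assert(offset + width <= 64);
    uint64_t mask = fieldMask(width);
    return (word & ~(mask << offset)) | ((value & mask) << offset);
}

/* Read the 'width'-bit field starting at bit 'offset'. */
static uint64_t bitsExtract(uint64_t word, unsigned offset, unsigned width) {
    assert(offset + width <= 64);
    return (word >> offset) & fieldMask(width);
}

/* A field of a bits value that is itself nested at 'outerOffset' inside its
   containing word: the field's own offset must be shifted by the outer
   position. Leaving out 'outerOffset' produces the wrong-values symptom. */
static uint64_t bitsInsertNested(uint64_t word, unsigned outerOffset,
                                 unsigned innerOffset, unsigned width,
                                 uint64_t value) {
    return bitsInsert(word, outerOffset + innerOffset, width, value);
}
```

Because the whole value fits in one word, an all-false bits constant such as the FieldListFlags_t example is simply the word 0.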
<> From comp.risks: We normally don't think much about this, but lately I've been doing a lot of programming in Haskell, and the folks who built the libraries being much too smart for their own (or, anyway, my) good, make this clear, not to mention push it a bit in your face: http://haskell.org/ghc/docs/latest/html/libraries/time/Data-Time-Clock.html Actually, though, as well as being more than a little pedantic, this is I think a fairly brilliant example of good risk management: by forcing programmers to stop and make a choice between a DiffTime (which includes leap seconds) and a NominalDiffTime (which more or less pretends they don't exist), it also forces them to think for a moment about what exactly they're doing. So kudos to Ashley Yakeley, the rather smart author of this library. 081212/Friday If the variant for a record constructor is not one of the tags for that record variant, I get both "too few" and "too many" error messages! FIXED 081213/Saturday Got Exec_t and Type_t reduced - no longer any ExecInfo_t or TypeInfo_t. Puff! Ick. In Types.z, SkipExec just yields t.t_exec.ed_type. However, SkipNameAndExec uses GetTypeConstantExpr to get at the desired type. In Types.c, both routines use GetTypeConstantExpr. However, there are lots of other places in the code that just use exec.ed_type. What is GetTypeConstantExpr used for? There is no point in having ed_type if that is needed everywhere. For now, I'll make all versions just use ed_exec, and hope I find any problem soon. 081214/Sunday Interesting. I was starting to look at changing the SymTab stuff to combine the SymInfo_t and Entry_t records, now that I can have a variant part in records. But, Entry_t is a struct, not a record. And, the vector of them within a Table_t is thus a vector of structs. So, there is already no extra allocation for these - only for the SymInfo_t, which used to be a case-oneof type.
SymInfo_t could now have some non-variant fields in it, but this has not been needed, else I would already have done something else with it. So, it appears I don't need to change SymTab stuff at all. Similarly, Package/GenericElement_t, Package/GenericInstantiationElement_t and Package/PackageElement_t do not need any changes. 081215/Monday %%% <> In native code, it might be a good idea to build actual creation routines for records with variant parts, just to do the allocation and fill in the various pieces. This would be especially true if some of the fields are fixed in the constructors, e.g. currently the ex_hasError field in type Exec_t. Spent a while chasing a problem. When I was first putting variants into records, I noted that I had to make the "sel" instruction handle the size of the selector properly, but wanted to continue on some other changes first. Well, I forgot to do it. It caused me trouble as soon as I added ex_hasError to Exec_t. Fixed, and did some further cleanups that I had forgotten, both in bcRun.c and {Ll}ex.{hcz}. 081216/Tuesday %%% Language things left to do:
- (physical) units
- asserts
- compile-time proc calls in package
- user-defined operators
The compile-time calls are desired for things like adding Fmt routines to types declared in the package. However, they can also do, or attempt to do, undesirable things. For example, what if they directly or indirectly use a package variable in the package? I'm OK with making the behaviour undefined, but I always need to ensure that such calls cannot compromise Zed's internal safety. Hmm. You can already put compile-time calls into procs, and they can do the same. So, there is no new issue here. I'm still unhappy about package variables as well. Some form of them is certainly needed, but adding and removing them will be problematic in a running system. I would really like to re-allow initializations of them, too, if I can do it safely.
Perhaps I can, since that issue is not very different from the compile-time call issue. Maybe, just before calling any package init routine, I should just call a routine constructed from all of the initializers for package variables. [Later - initializers are back.] I'm wondering if the answer for adding/removing/changing package variables is that I throw away the entire current set of them, re-allocate a new set, and re-run the initializers and the package init. And, I document that that is what happens. Maybe even pop up a requester saying it is about to happen. Is there a way to delay initializing a package until it is actually needed? When I get to the point of loading packages from a Zed world, the init can happen then, since presumably I am loading the package because it is needed by already running stuff. [Maybe wait for its dependencies to be satisfied. But, what about circular dependencies among packages?] <><> When editing code, it is necessary to know everything that is affected by the edits, so that all necessary recompilation, etc. is done. One way to do that is to attach "edit info" to the package being edited. That info could be produced by scanning everything in the package (sub-package?) looking for what depends on what. The "edit info" would then be a summary of that information, which would allow the system to only reprocess a subset of the procs in the package, based on what is being edited. [Note that this is temporary info - it is only needed when you start editing stuff.] Cross-package dependencies are probably already determinable, in terms of the package-depends-on-package information, from other pre-existing info. But, that might need to be examined in more detail when something exported from a package is changed. Aha! I can combine record types BitsField_t and BitsFieldList_t. DONE 081219/Friday Been doing follow-up work after adding ex_hasError. %%% Exec constructs must exist entirely within a given context.
That context includes things like the package, the subpackage, the proc and the scope. So, for paranoia's sake, define a struct type in Proc that contains that information. Export a routine that copies the info from a Context_t into such a struct. Export another that compares the info saved in such a struct against the current info in a passed Context_t, and returns a bool indicating whether a change has been made. Then, constructs in Exec (and likely in Types too) can use the pair of routines, along with the struct in their TempXXX records, to verify that the caller isn't trying to make a construct spread across multiple contexts. Mark the construct as erroneous, or perhaps just delete it altogether, if a violation is noted. It may be way overkill to do this. But, I'm thinking "better safe than sorry". [Later - with the PContext_t reference within the TempXXX struct, this is less of a problem. But, if I do not recreate the PContext_t for each package/capsule/generic, or I do not recreate the EContext_t for each proc, then problems can still exist. See below - maybe proc verification catches all of these.] 090104/Sunday Almost nothing accomplished since the above. Away for 10 days, and came back with a nasty cold. I've just been defining the type for the above. Nearly all fields of record Proc/Context_t move into struct Proc/ContextElements_t. However, it has just occurred to me that Exec/ProcCheck is responsible for doing all of the scope-related tests for things like using local variables. Do I need additional tests at all? Perhaps just make sure that Exec/ProcCheck can catch all of the possible violations that the new stuff is supposed to. Undid all the changes to Context_t. Need to clean it up and add comments, however. 090105/Monday All of the TempXXX records used for Exec and Types construction can be made opaque. The cleanest way is to have 2 levels of names for them, and only export the outer name.
This requires an assignment within the Exec or Types routines which use the values, but that's no big deal. The other way is to make all of the fields 'private' 'ro', but that is likely too limiting to the implementations. [See below - why do I care if someone can read the fields?] Done conversion of 'if' to use a new privately exported TempIf_t. Seems OK - did have to use the assignments to go from internal to external types. Should use less memory now - have an array of structs for the alternatives. 090107/Wednesday Yesterday did For and While. Today started in on Case, which already has a TempCase_t. Got rid of the private TempXXX0_t stuff that I had put in yesterday - what's the point of that? I don't care if the caller can see inside my structures - the key is that they can't modify them. Finished Case, but want to do a bit more on it. 090108/Thursday Did TempSeq. Exposed a bug in Hosted/pExec.c/parseWhile, where it was never creating the sequence for the while body! This showed up when making an exk_sequence just be [] Exec_t. Fixed. Want to change Case_t, so that the common case of having exactly one index with each alternative uses less memory. Do this by having a new field in Case_t that is simply an array of SimpleCaseAlternative_t, which is in turn just a struct with two Exec_t's in it. Having a vector with one element for nearly all CaseAlternative_t's is quite wasteful. Doing the choice on a per-alternative basis is another possibility, where CaseAlternative_t has 3 fields - one is the body, and the other two are mutually exclusive - either a single Exec_t or a vector of them. Hmm. Because of variant records (what used to be case-oneof's), there are a *lot* of case constructs with exactly one alternative. That merits some special handling. There are lots of cases that individually handle all labels, but there are quite a few that merge labels as well - some with lots of labels, only a couple of which are merged.
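The per-alternative choice can be sketched in C as a count plus a union: the common single-index case is stored inline, and only merged labels need a separate vector. This is a hypothetical layout, with index constants simplified to long values and Exec_t replaced by a tiny stand-in struct; it is not the actual Case_t/CaseAlternative_t definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tiny stand-in for the real Exec_t node type. */
typedef struct Exec {
    const char *label;
} Exec_t;

typedef struct {
    const Exec_t *body;
    size_t nIndexes;          /* 1: use index.single; >1: use index.many */
    union {
        long single;          /* common case: exactly one label, inline */
        const long *many;     /* merged labels: nIndexes entries */
    } index;
} CaseAlternative_t;

/* True if 'value' selects this alternative. */
static bool altMatches(const CaseAlternative_t *alt, long value) {
    if (alt->nIndexes == 1)
        return alt->index.single == value;
    for (size_t i = 0; i < alt->nIndexes; i++)
        if (alt->index.many[i] == value)
            return true;
    return false;
}

/* Returns the body of the first matching alternative, or NULL. A code
   generator could turn the one-alternative, one-index case into a single
   comparison and conditional branch. */
static const Exec_t *selectAlternative(const CaseAlternative_t *alts,
                                       size_t count, long value) {
    for (size_t i = 0; i < count; i++)
        if (altMatches(&alts[i], value))
            return alts[i].body;
    return NULL;
}
```

The inline `single` field avoids a one-element vector allocation for nearly every alternative, at the cost of one count check when matching.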
So, I'm thinking I should simply allow "one of only" or "vector of" at all 3 levels. DONE: the "only one" option now exists at the level of case index constants and case alternatives. 090109/Friday Done the above case optimization. Also cleaned up and commented the code generation for case constructs. One thing left that can be added is to treat the one-index-one-alternative case specially, with just a simple comparison and conditional branch, instead of either 'case' instruction. DONE. Found a nasty mis-conversion from Z to C, dealing with incorrectly selecting a record variant. Fixed. The next big memory conserver/TempXXX fixup is for proc calls. Long argument lists can be a vector. Have to be careful in this one, though, because of all the special forms of proc calls. Interesting. During normal parsing, the packages being parsed are not added to the contents list of their parent package (usually '/'). That means that C/pktest/pktest.z can't just dump out '/'. FIXED LATER 090110/Saturday Been working on the last thing from yesterday. As expected, the total bytecode isn't very much. There seem to be *no* proc-level refs. And, only 4377/7050 package-level ones. So, only about 56K of memory for them, and 135K for bytecode. A "__PackageInit__" in a subpackage, along with package variables in a subpackage, are not set up. I temporarily made Proc_AddToBuffer.z not be a subpackage. Fixing this, I notice that subpackages are made sl_local within the package. Changing this to sl_export, but I'll want to think about it. SEE LATER 090111/Sunday %%% When doing the user-defined operators, I might want to include the ability to have toUint and fromUint. That would certainly be useful for, say, a library that implemented arbitrary precision integral arithmetic. 090114/Wednesday Nokia, current owners of Qt (GUI toolkit), are planning on making it available under LGPL. Finished the first part of converting "exk_call" to use a vector for the parameters other than the first.
Next is the io parameters. 090115/Thursday Working on the io parameters. Lots of changes in Fmt.z, since I'm putting the Proc/Context_t into the Exec/ActiveIoCall_t. %%% Shouldn't Fmt just use a capsule which extends ActiveIoCall_t as its cookie value, rather than having FmtCookie_t as a separate record? That way I won't end up with an extra copy of the Proc/Context_t. Also, if I do that, then I think Debug/DebugActiveIoCall_t can then just extend the Fmt one, rather than containing a pointer to one. How then would I do the construction of these? Would I need explicit constructors? Should I have an explicit constructor for Exec/ActiveIoCall_t that requires and fills in the Proc/Context_t instead of passing that to Exec/ProvideIoHandler? Done the IoCall stuff. Chased a bug for a while - passed the old unset "ioc" instead of the proper "tioc" to bcRun for CallIoPhase(..., iop_done). Does Context_t still need to be in Proc? Cannot it be in Package, or perhaps a package of its own? [Ended up in Package] 090116/Friday From comp.risks: Date: Tue, 13 Jan 2009 15:36:04 -0500 From: "Olivier Dagenais" Subject: Tony Hoare: "Null References: The Billion Dollar Mistake" RISKS readers may be interested in the following presentation by Tony Hoare [Sir Anthony C. A. R. Hoare] at the upcoming QCon London 2009: Abstract: I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. 
In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be non-null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965. http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake [I learned AlgolW in my second year of university. I don't recall it being "object oriented" at all! It had 'record' types that were very similar to Zed's records, but didn't have variants or the various field tags. And, the IBM AlgolW compiler only allowed 15 record types or something, which caused Hank Bohm lots of trouble for his Algol68 implementation in AlgolW.] 090117/Saturday It would be nice to put TempXXX types relevant to proc calls into file Exec_Call.z (subpackage Exec_Call). Similar for TempBinary. But, that would need adding stuff to the C Types.c in order to create subpackages and to put types into them. Not difficult, but is it worth it? Started to do that, but it didn't work out - too early to properly free some of that stuff. DONE LATER 090118/Sunday Interesting. Naming subpackages is a bit of a nuisance. I can't name the subpackage of Exec that does ByteBuffer stuff "AddToBuffer", since that is the name of an exported proc in Exec. I also can't name it "ByteBuffer" since the subpackage needs to "use" package ByteBuffer, and that leads to a conflict. I settled on "ByteBufferIO". Did the thing noted yesterday. Took more work than I had thought it would. I don't like the fact that the parser is responsible for picking the package that will be the writeablePackage for a record type. I think I should just pass the "isPublicRecord" flag into Types/RecordNew, and have it do the determination. Same for capsules. [Done] 090119/Monday Done that last thing above. 
Watch out for generic instantiations - the instantiation process now gets the "writeablePackage" value from the Proc/Context of the instantiation call, rather than having it passed explicitly. It seems OK so far. Problems with private/local/export. I was looking at Exec.z and realized that routines like "emitError" were private to the main Exec package, and so shouldn't be visible to subpackages like Binary. Tracked it down to how Package/ResolvePath is doing the lookup. However, this may have opened a can of worms. It looks like some code is assuming that you can't put packages (and subpackages) into subpackages. Is this the rule that I want to have? Why? Note that there ended up not being any kind of lookup rule that climbs the tree of packages towards the root. That happens with scopes within procs, but not at the package level. So, the only symbols you can directly reference that are outside of procs are those that are private to the current subpackage, local to the current package, or exported by the current package. Intermediate packages can have private symbols, but not local ones, assuming such intermediate packages are supposed to be possible. Currently, I've done #if changes in the C code, but no change in the Zed code. I've also changed the declarations of a bunch of stuff in Exec.z to be 'local', which they should be, regardless. Well, there seems to be no good use for packages inside subpackages. But, is it useful to have hierarchies of subpackages? Can the software that it makes sense to have in a single package get so large that it requires that? Semantically, subpackages all share their 'local' symbols (with each other and with the main package). So, what it comes down to is whether what is essentially a library can be large enough that its source needs more than one simple set of subpackages, while its private types and functions can reasonably all be within one package. Also, the package itself needs to be a public one, with exports.
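The package-level lookup rule arrived at here can be sketched in C: three tables are searched (the current subpackage's private symbols, the package's local symbols, the package's exports) and nothing above the package is consulted. The SymTable_t type, the order in which the three tables are tried, and the names foldConstants and localHelper are hypothetical; emitError and AddToBuffer are just names mentioned in these notes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table: just a list of names for this sketch. */
typedef struct {
    const char *const *names;
    size_t count;
} SymTable_t;

typedef enum { LOOKUP_NONE, LOOKUP_PRIVATE, LOOKUP_LOCAL, LOOKUP_EXPORT } Lookup_t;

static bool tableHas(const SymTable_t *t, const char *name) {
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return true;
    return false;
}

/* Package-level lookup: only the current subpackage's private symbols, the
   package's local symbols and the package's exports are visible. There is
   no climb toward the root, and the private symbols of any other
   subpackage (including the main package) are never searched. */
static Lookup_t lookupPackageLevel(const SymTable_t *subPrivate,
                                   const SymTable_t *pkgLocal,
                                   const SymTable_t *pkgExport,
                                   const char *name) {
    if (tableHas(subPrivate, name)) return LOOKUP_PRIVATE;
    if (tableHas(pkgLocal, name))   return LOOKUP_LOCAL;
    if (tableHas(pkgExport, name))  return LOOKUP_EXPORT;
    return LOOKUP_NONE;
}
```

With this rule, a private routine of the main package such as emitError is invisible from subpackage Binary unless it is declared 'local'.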
If the package is not public, i.e. its tables are not readable by anyone other than the owner (or however permissions end up working), then full packages can be used for similar purposes to subpackages, with no loss of privacy. So, I'm tending to think that you shouldn't be able to create hierarchies of subpackages - only one level of subpackages within a package. I'll go make the changes for that. All seems well. I've even been able to change it back so that subpackages are in the local symbol table instead of the export symbol table. 090120/Tuesday <> Some language called "Vala". Ada permits '_' in numeric constants, for readability. I could do that, since I preserve the input forms for numeric constants (uint & float). DONE Ada also does not use string breaks. Instead, it relies on the compiler to concatenate string constants at compile time. The idea is that error recovery is better if strings cannot span multiple input lines. Since the Zed compiler does the concatenation as part of its constant folding, I could do the same. I'm hesitant about this. Not sure why. Perhaps it relates to having to actually do that error recovery, although it doesn't seem to be hard. Ada also does not use '/*' '*/' comments - only '--' comments. Thus it does not allow comments between arbitrary tokens - only at the ends of lines. This also aids parser error recovery. My problem with this is that I want a way to comment out, or otherwise disable, chunks of code. Doing that by inserting '--' at the front of each line is more work, and can yield an ugly source file since it increases the length of the lines. This might be more acceptable in the expected full Zed environment, where the pretty-printer could perhaps handle reformatting. Hmm. That would require something to be parsing the contents of comments - sounds like a bad idea. Perhaps the problem will sort-of solve itself. My current vague thoughts on comments are that I will only preserve them in certain situations.
Those are basically only '--' style positions (even though I now enter them as '/*' '*/' since that is what I'm used to), and large commented chunks where the comments occupy full lines. Hmm again. When testing and experimenting I *do* make use of '/*' '*/' in the middle of things. Preserving those would definitely be complex. Those situations can perhaps be handled by using explicit "if true"-style conditional compilation. The pretty-printer can easily note those and display them in a different style/colour. 090121/Wednesday GetTypeConstantExpr skips past all sorts of stuff. That essentially removes those nodes from view. Why do I need to do that? For example, in my test program test/hashhash.z, this removes the esik_typeExport nodes from the types gotten as type '##' exports, and so they are missing in the pretty-print of the routine. Might something similar happen with instantiated types? I believe the code that is doing this is in pExec.c/parseAssignment. I #if-ed out the code in parseAssignment, and the full type "names" are back in the test program, but not in record constructors. This is *only* happening with types, because of GetTypeConstantExpr. I think in the case of a type export, that routine should be returning a new tk_exec node to wrap the type. The other cases are internally identifiable. That doesn't really work, since Types/ExecNew calls GetTypeConstantExpr to get the type that it is wrapping. Also, the case that is a problem is not visible without deeper examination of the Exec_t stuff that is representing the type. However, I have changed GetTypeConstantExpr to directly return the Type_t, since no call used the returned Exec_t. Well, I've resolved the problem. I'm not sure I'm happy with the result, but I don't see an alternative right now. I've made GetTypeConstantExpr check for an esik_typeExport, and used Types/ExecNew to wrap the Exec_t in a tk_exec in that instance.
That required cloning GetTypeConstantExpr as GetTypeConstantExprForTypes for use in Types/ExecNew, which does need to find the inner real type. If I end up deciding to get rid of the '##' compile-time syntax, then I'll have to undo this! 090122/Thursday Why do I pass an explicit package into Exec/ConstantDeclarationCheck? Shouldn't it just use the current subPackage in the ctx, just like it uses the scope from there? NO LONGER PASSED Ok, pushed the GetTypeConstantExpr stuff further with testing. I also need to wrap the Exec_t in a Types/tk_exec when the Exec_t is an exk_alternative, i.e. some kind of compile-time expression. Tested all this with .../C/test/typeConst.z . Can use GetTypeConstantExprInner in Exec/ConstantDeclarationCheck, since it preserves the full Exec_t in the ConstantDeclaration_t. 090123/Friday Looks like I've got all the multi-step constructors handled, in terms of having a TempXXX record to use for successive steps. I now need to get more consistent in naming:
- I think all types used only temporarily during construction should have Temp in front of their names. DONE for Exec_t/Type_t stuff.
- There are lots of constructors with only one step. It doesn't make sense to name them with either "End" or "Done". "New" is the only thing that makes sense. So, the first step of multi-step ones should be either "Start" or "Begin". Most if not all are "Start" now, so I'll stay with that. I should use the same convention for procs, interfaces, capsules, etc. as well as for types. DONE for Exec_t stuff, Type_t stuff, and interfaces/capsules.
090124/Saturday Can I get more pieces of NamedDesc_t information from the Proc/Context_t passed to NamedNew? That would overall be more satisfying and safer. DONE %%% Should Type_t contain a "hasError" field? I can put one in, just like I did for Exec_t. To use it, I would need to check it every time a type is used in Exec code, and silently mark an error, likely in the Exec_t, and in the Context_t itself.
How much bulk is that? Similarly, I would want to mark errors in things like package variables, so that any use of them also produces an error in the using proc. That kind of thing might actually be nice, since then all uses of bad types or variables, etc. would show up in the pretty-printer output. 090125/Sunday %%% Perhaps want a second error flag in Exec_t's. This is to indicate a secondary error. I.e. this code should not be executed, but there is no error message here to show to the user. This could be from the use of an invalid type or variable, for example. Is this something needed at that level, or can it just be a flag in each affected proc? I think it comes down to whether or not it would be useful to the user to be able to display this condition in the pretty-printer. It might not be useful if the pretty-printer wants to show that kind of thing. We only need the information within the Exec_t, etc. records if we produce the pretty-print strictly from those. Perhaps the pretty-print can actually work in conjunction with parsing, Exec_t building, etc. and modify its output based on the processing of tokens by the parsing, and the error calls it makes. There would need to be a trappable one for the "indirect error" cases. 090127/Tuesday Done a lot of the Types work. Just a bit of change needed for the struct and record case, to prevent callers from intermixing operation modes. The same is needed in Exec, for array/matrix indexing. It's possible that no real badness can occur in the latter situation, but it should be fixed anyway. 090202/Monday Bah humbug! I've done the Zed changes to store record case variants in a vector instead of a linked list, hoping to get rid of the overhead of the many allocations. However, as with an earlier situation, I then learned that the data needs to be accessible from SymTab/SymInfo_t entries, which means that the per-variant info needs to be a record - it cannot be a struct. 
And so the vector turns into one containing references to records instead of one containing structs. The net result is a slight *increase* in memory usage - the overhead for the vector. However, it's not much, and in some senses this form is cleaner, in that the permanent data structures don't have the link pointers to wonder about. <> New OS by Russian Dmitry Zavalishin called "Phantom". It's supposed to work best with virtual machine (as in bytecode) languages. With some changes, the idea is that programs operate persistently, in that instead of files, you have check-pointed applications that can be resumed immediately. Article on El Reg: http://www.theregister.co.uk/2009/02/03/phantom_russian_os/ A comment provided a link to Eros: http://www.eros-os.org/

090203/Tuesday %%% One day I should go make mem_free merge adjacent chunks. %%% Thinking about parallel programming, etc. One need is that of being able to force alignment. Doing that in struct/record/capsule data would be nicer if I could have an "AlignTo(uint alignment)" proc somewhere, that ran at compile time, accessed the current offset in a Types/TempFields_t, and yielded the number of bits8's needed to align to that alignment. How do you get at the current open TempFields_t? It's in the parser, so it would have to make it accessible somehow. Note that they can be nested. The Proc/Context doesn't currently reference a parsing context, so right now I don't see a way, aside from a package-level variable in the parser. [They can no longer be nested.] %%% A good way to use special machine instructions would be nice as well. Perhaps something based on 'ioProc' could yield some kind of generic Exec_t record that triggers an error in any code generator that doesn't recognize it, but is handled as desired by one that does. Might want to have something like "codeGen=..." so that no-one errors it. %%% I've been thinking off and on about block comments.
There are block comments that comment out a bunch of code in a proc, or a bunch of stuff in a package, capsule, generic, interface, record, etc. Ick. There are also block comments that are the documentation for a proc, a package, etc. Perhaps those latter ones need to be marked as such, so that the system doesn't have to guess that they are associated. There could then be fields in the records for types, packages, etc. that are the block comment for them. Haven't been able to come up with a nice syntax. The closest so far is something that starts with '#' (in column 1) and ends with '#end' (in column 1). The column-1 stuff is needed because these things don't parse the same as everything else - they are essentially at the lexical level, even though they must be recognized at the main parsing level. [Started doing, e.g. "/*proc", "/*type", etc. Could work.]

090204/Wednesday Probably need an equivalent of Exec error #124 for capsule construction. Should just cache it as a flag in the descriptor. [Huh? I think that error number got re-used.]

090205/Thursday <><> For native code generation, I want to use arithmetic, etc. smaller than 64 bits whenever it is guaranteed to be correct, especially if I end up doing an X86 (32 bit) code generator. This would likely include having variables smaller than 64 bits even if they are declared "uint", etc. The thing to do is to say that types "uint" and "sint" are 64 bit unless the compiler can determine that they don't need to be that big, and that things are more efficient if they are smaller. That emphasizes that if the programmer is mapping external specifications, hardware registers, etc., they must use bits8, bits16, bits32 and bits64, or bits types. <> Thinking about doing AmigaMUD/CGMud stuff using Zed. Basically, I think I want to recognize two related syntaxes: '#.' and '#.' #:= The former calls the compile-time proc associated with '#.', passing it the Exec_t for "" and the string of "".
It looks "" up in some fashion, finding out what type of MUD property it is. It replaces the entire expression unit with a call to a proc which takes the " and an internal representation of the "", and yields a value of the appropriate type. In the second syntax, the end result is a call to a proc which stores the "" value via an appropriately chosen proc onto "". The normal Zed type-checking should be active here. Voila! - properly persistent MUD-like variable/property usage. I hope! Note that the selection of which '#.' proc to use is based on the type of the "" to the left of the '#.'. It may be possible to generate appropriate procs for the resulting code at compile-time, based on a "declaration" of a property type. For example, if a property is maintained as Zed type "MUD/Property_t", then creating a new property could be: MUD/Property_t contents := MUD/NewProperty([] MUD/Thing_t); Within that call, the needed proc for use in generated code can be created. It might even be possible that most of them could be instantiations of a generic one. (I've never tried explicitly doing an instantiation in a compileTime proc.) (Likely a pair of procs - one for "getProp" and one for "addProp".) 090206/Friday <> Perhaps there actually isn't any "searching" for the applicable proc for a '#' operator. Instead, it could come entirely from the type of the left-hand operand, and it could be directly retrieved by name using Types/ExportFind. The '#.' could perhaps also be used for something like database access, e.g. in zedweb server code. What would likely be useful there is the ability to have '#[' and '#]', so that, e.g. strings could be used as "indexes". 090207/Saturday Just doing some simplification (using one combined variant record instead of 3!) for bits types. It just occurred to me (perhaps again!) that I need to prevent bad re-use of a temporary descriptor that I've already used to build a type or an Exec_t. 
Either I should leave it in a state from which it is valid to use it again without re-creating it, or I should mark it as having been "completed", and never build a valid Type_t or Exec_t from a TempXXX that is so marked. I prefer the latter - it's "cleaner". DONE Hmm. Should I allow you to 'inline' bits values, like you can structs? That would allow the same efficiency and usage as doing some kind of "packed" representation for structs. [Nothing done]

090208/Sunday %%% Based on reading the Ada/Green stuff, I've been thinking for a few days about their use of named parameter passing, and default values for parameters (as well as constructors). I can see no way that allowing the use of named parameters (as in "Pk/Proc1(par1 = 2, par2 = 13.7)") can result in *less* comprehension of the code. About the only thing possible is that you could make the call so long that it becomes much harder to manage, but it's still going to be clear - much clearer than a call with a zillion parameters passed positionally. I'm not so sure about default values for parameters (allowing them to be omitted) and bits/record/capsule fields (allowing them to be omitted in constructors). I already have the concept of not all fields appearing, since multi-valued fields don't appear and 'noInit' fields don't appear. So, it's hard to strongly argue that default values would introduce a whole new way to misunderstand Zed code. In Ada/Green, you cannot have any positional parameters after any named parameters. Similarly, you cannot supply any actual parameters without a name after omitting a parameter which has a default value. It goes without saying that all parameters that do not have default values must be explicitly provided. I think my tentative decision on this is to do it. I've done a bit of thinking on the implementation, and I don't think it would be too hard to do, but it would increase memory usage a bit.
I'm thinking that in an Exec/Call_t there would be two vectors - one of provided positional parameters/initializers, and one of those provided by name. There would be routines exported from Exec that take such a Call_t and yield a vector of parameter values that is the properly ordered set of all parameters. This would be used by code generators, so that they don't have to try to figure it out themselves (over and over for each code generator). Note that when calling via a proc pointer, it is the proc-type of the pointer expression that provides any default values. It is fine for those to be different from any actual proc that is being referred to. As usual, the total set of parameters must match between the actual proc and any proc pointer it is assigned to. [Later decided not to do this. It would be quite a bit of work. You can accomplish much the same as default values by simply writing wrapper routines that explicitly provide the default values.]

090209/Monday %%% In Fmt, if all of the actual arguments to FmtS are constants, then can we just yield the final string constant? Etc.

090210/Tuesday Finished off doing the xxx_completed stuff. Also added xxx_hasError to TempInterface_t and TempCapsule_t. I've passed the error on to the type resulting from the interface or capsule, but do I need to remember it in the Interface_t/Capsule_t and do something based on that? DONE %%% Look into making the DefineProc and DefineGeneric stuff more consistent with the rest of the TempXXX stuff. Do the Interface/Capsule hasError/completed stuff in C. DONE Turns out that "Scroll Lock" mode for keyboards goes back to the old IBM PC. With it activated, the arrow keys scroll the display rather than moving the cursor.

090211/Wednesday The open source re-implementation of BeOS is "Haiku".

090213/Friday In Zed stuff and C versions of that, change "symbol" to "name". Maybe even go all the way and use "name" instead of "identifier" in the tokenizer. DONE.
Change package SymTab to be package Names. DONE %%% The only reason I require that all variants of a variant record be trackables is so that the ref counter and garbage collector don't have to examine the variant tag to figure out what to do. However, they have to walk down structures for records, capsules, structs, etc. so that check is likely not a big deal, especially now that there is a vector of the variants. Removing that restriction would allow me to have "uint" variants in particular, which could save me some memory in some Types/Exec structures (I think). %%% What about allowing non-trackable types when instantiating generics? The reason for that restriction is that to lift it I would have to create a copy of all procs in the generic, so that the code for them would not do ref-counting instructions for assignments, etc. That's a bigger chunk of work - I would need an Exec_t copier, essentially. Is it worth it? [I think later I noted that for multi-parameter generics, you could need quite a few copies of all the procs.]

090216/Tuesday No particular reason not to use array indexing syntax for selecting a char from a string. Similarly, could use [N : ] for substring. Syntax for substring could use ':' or ',' or something else. However, it isn't obvious what either form means. I'm thinking that the forms: str[ upto ] str[ for ] are much clearer. Those two were selected because they are already reserved words. DONE %%% Note that, internally, strings are represented the same as a vector of char. I don't think I want to formalize that - I think things are clearer if there is an explicit 'string' type, with concatenation, etc. However, this does bring up the question of whether or not it would be useful to add the above 'upto' and 'for' syntaxes for vectors. It wouldn't be hard, it would perhaps be more consistent, but it wouldn't be of much value.
Of more value, at least to my existing Zed code, would be a way to create a new vector whose initial elements are the elements of some existing vector. However, that's another language syntax. It's not a hard one, and in some senses it "raises the level" of the language. It would be possible for the compiler to recognize when vector-append is being done, and to optimize it. It would also be possible to write a generic vector package, with compileTime procs, that could do the same - I believe I wrote a bit about that earlier. One key thing that it might be able to achieve is to avoid messing with reference counts if a vector of trackables is expanded. Note that it may well be valuable to allow the above syntax, with '#' on the square brackets, for user operators.

090217/Tuesday Oops. I still have code in a few places that allows you to pre-declare "oneof" types. It needs to be removed. DONE - it was only small remainders in pProc.c/ParseTypes and pType.c. It would be nice to avoid having to allocate/check the "XXXKey" values that I've been using. They are a neat answer, but there could be better ones. For example, there could be an extension to "export" that says to whom the export is being made (e.g. a list of packages). Only stuff in those packages can use these exported symbols. This could be represented in name tables via an indirection node that lists (or has one-only as a special case) the packages that are allowed to use the symbol. I *think* that would work out. DONE <> Put a reference to an 'ro' lex state record into the Proc/Context_t, as a separate field, initialized when the Context_t is created. Whenever a user proc is called (e.g. compileTime, ioProc, etc.), clear out the normal lex state field. Restore it when the call returns.

090218/Wednesday There was code in Package/DefineTypePhase5 that issued a warning if a type being defined in a generic does not contain any use of a parameter to the generic.
It's been commented out for quite a while, and the actual text of the error message is long gone. I've removed the code:

    if not tdt.tdt_isPreDecl and tdt.tdt_gen ~= nil then
        Types/Type_t t := tdt.tdt_finalSubType;
        case t.t_kind
        incase Types/tk_struct:
        incase Types/tk_record:
            if not Types/ContainsGenericParamSubtype(tdt.tdt_gen, tdt.tdt_finalSubType) then
                emitWarningSym(ctx, %%, tdt.tdt_name, %%);
            fi;
        default:
        esac;
    fi;

[This came back as "Type "XXX" is not a type which can be instantiated"] %%% I think I want a "nonNil" attribute on trackable types (record, string, matrix, proc, capsule, interface, any). Only 'nonNil' values can be assigned to 'nonNil' variables, fields, etc. Constructors yield 'nonNil'. 'nil' does not. This should allow me to have "q" versions of several instructions, that do not need to check for nil. Note that bits types cannot contain enum fields. They must use set-oneof values instead. This is because Zed wants to have full control of enum values - they are always in-range. That is not possible with bits values, since you can assign uint values to them. [2010-02-24 - not currently allowed! Need to fix that. DONE] Previously, I mentioned the possibility of inlining bits types into structs and records. The problem with that is that I would then need to be able to represent fields that are not at exact byte offsets but also are not a simple bit offset from the beginning of the entire unit. I could certainly do that, but it takes more space, and likely more care in code generation. The system is not allowed to change the layout of bits types. That should likely also be true with struct types. However, I don't see a big reason for it to be true with record or capsule types - allowing the compiler to re-arrange those fields could be useful. From comp.arch: Actually, with modern generational GC techniques, it will often be faster to create new objects than trying to reuse old ones.
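To make the point about bits types and enum fields concrete, here is a small sketch in C rather than Zed; every name in it is hypothetical. A bits value is just a bit pattern, so once an arbitrary uint can be stored into the whole word, a 2-bit field meant to hold a 3-valued enum can end up holding the fourth, out-of-range pattern. Explicit shifts and masks are used instead of C bitfields, whose layout is implementation-defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum with three values: it needs 2 bits, but 2 bits can
   hold four patterns (0..3). */
typedef enum { COLOUR_RED = 0, COLOUR_GREEN = 1, COLOUR_BLUE = 2 } Colour_t;

#define COLOUR_SHIFT 0u
#define COLOUR_MASK  0x3u   /* the 2-bit colour field within the word */

/* Extract the raw colour bits from a packed word. */
static unsigned int GetColourBits(unsigned int word)
{
    return (word >> COLOUR_SHIFT) & COLOUR_MASK;
}

/* Store an enum value into the field; this path can only produce valid
   patterns, since it takes a Colour_t. */
static unsigned int SetColourBits(unsigned int word, Colour_t c)
{
    assert((unsigned int) c <= COLOUR_BLUE);
    return (word & ~(COLOUR_MASK << COLOUR_SHIFT)) |
           ((unsigned int) c << COLOUR_SHIFT);
}

/* True only if the field currently holds one of the enum's values. */
static bool ColourFieldValid(unsigned int word)
{
    return GetColourBits(word) <= COLOUR_BLUE;
}
```

Only the setter that takes the enum type keeps the field in range; a raw store of the whole word (for example `word = 3u;`) bypasses it, which is the situation that leads Zed to require set-oneof values in bits types.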
090219/Thursday Proc/Context_t should become Package/Context_t, since it is within a package that we are adding/removing/changing things. Context_t should not contain any reference to any parsing or lexing records - it should only contain a reference to an output place to send error messages (or a proc to call to emit them). DONE There should be routines in Package which allow you to set the active capsule or generic for subsequent operations (like defining types and procs). There should be corresponding unset calls. This is what will be needed when the IDE is letting you do things. Note, however, that the capsule or generic must be ready for use - i.e. the capsule has all of its data fields and operations defined (more can be added later), and a generic must have all parameters defined. Then proceed to change the type definition stuff so that it takes the capsule and generic from the Context_t. Similar for proc definition. [Nothing to be done for capsule/generic. Done for procs.]

090220/Friday Instead of going right into the above, I've spent some time doing something related that I first mentioned quite a while ago. In Exec work, including when calling user-provided procs, the Context_t structure is needed for a variety of reasons. One reason is so that error messages can be properly emitted. That requires at least partial access to values that are actually associated with input - i.e. filename, line number and column for traditional parsing-from-file, and something analogous when parsing from a future Zed edit buffer. However, we do not want the user-provided procs to be able to *change* any aspect of input parsing. So, they must not be able to call a get-next-character routine, nor should they be able to get the parser or lexer to do that. 'private' fields and records cannot be modified outside of their defining package, but they can be passed to procs which are intended to modify them.
So, giving those user-provided procs even read access to a direct input-context record is no good. Zed doesn't have any concept of making a record read-only - the declaration "Record_t ro var" makes "var" an 'ro' variable, not a variable referencing an 'ro' record. It normally doesn't make any difference, since non-defining-package code can't change the record if it is not 'public'. However, if the record were 'ro', then it couldn't be passed to a proc that needed it to be non-'ro'. One solution is to use an anonymous type, i.e. one that is exported, but that just renames a type that is not exported. Thus, code outside of the exporting package cannot see what is inside the record, and so cannot do anything with the values in it. In that case, such an "OContext_t" can just rename the regular "Context_t" (these from, e.g. InteractiveSession). DONE

090221/Saturday Looking at Proc/Context_t. There actually are quite a few fields that are only relevant during proc compilation. Is there an easy way to split up the Context_t so that it isn't too awkward? It also looks like things would clean up if I moved Package.z to before Proc.z in the source sequence, since it doesn't have many dependencies on Proc. LATER: It's now Package/PContext_t. The dependencies seem to be: SetContainingInstantiation, SetContainingGeneric, SetContainingInterface, SetContainingCapsule (all of which would move to Package if Context_t does), plus Instantiate, ForceFormalRo, AddToBuffer, PrintHeader and Print. DONE the change in source order. Putting Package.z before Types.z doesn't work very well though. Moving Package record types into Package0.z would be ugly, because it would need PackageElement, CapsuleElement, etc. %%% For uint and float constants, don't even pass in the value - just pass in the string. The routine should parse it to get the value. That prevents a caller from passing inconsistent values in. After getting the value that way, then check to see if we need to save the string.
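A minimal C sketch of the "pass only the string" idea for uint and float constants, assuming decimal uint literals (Zed's real literal syntax may allow more forms). The names UintConstNew, FloatConstNew and the two record types are hypothetical; strtoull and strtod are the standard C library routines. Because the value is always derived from the text, a caller cannot hand in a string and a value that disagree.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical constant records: the value is always computed from text. */
typedef struct { const char *text; uint64_t value; } UintConst_t;
typedef struct { const char *text; double value; } FloatConst_t;

/* Parse a decimal uint literal. Rejects signs, leading blanks, trailing
   junk and values that do not fit. */
static bool UintConstNew(const char *text, UintConst_t *out)
{
    char *end;
    unsigned long long v;

    assert(text != NULL && out != NULL);
    if (text[0] < '0' || text[0] > '9') {
        return false;
    }
    errno = 0;
    v = strtoull(text, &end, 10);
    if (*end != '\0' || errno == ERANGE) {
        return false;
    }
    out->text = text;      /* whether to keep the text is a later decision */
    out->value = (uint64_t) v;
    return true;
}

/* Parse a float literal, rejecting trailing junk and out-of-range values. */
static bool FloatConstNew(const char *text, FloatConst_t *out)
{
    char *end;
    double v;

    assert(text != NULL && out != NULL);
    errno = 0;
    v = strtod(text, &end);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return false;
    }
    out->text = text;
    out->value = v;
    return true;
}
```

The sketch always stores the text pointer. The note's "check to see if we need to save the string" step would go after the parse succeeds.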
Fixed up bundle/mapping.z and bundle/symboltable.z and moved them into directory gen. Got rid of fields ctx_runningProcInstantiation and ctx_runningProc.

090223/Monday %%% Need to clean up the scope/sequence stuff. If nothing else, fix the comments in various places that want a scope. Grep for ScopeStart. In fact, is there any such thing as a sequence that isn't within its own scope? I think maybe I was, way back when, thinking that you could have such a thing if you wanted it. But, if the current language never does, then why add the complexity?

090225/Wednesday <> My thoughts on '$' identifiers have been that they are essentially persistent configuration variables that are dynamically obtained through a search up the directory chain from the current working directory. They can of course be used for many other things. In conjunction with all of the other '#' stuff, there could be '#$' syntax. This would have to specify an access technique (type, proc, whatever) along with the specific identifier to access or update. So, attempting to do that could look like:

    #$MyPersist.myVar := #$MyPersist.myVar + 1;

This really doesn't seem any different from the earlier-discussed:

    MyPersist#.myVar #:= MyPersist#.myVar + 1;

One difference could be if a package is allowed to specify globally what "MyPersist" is. I.e. it is fixed for all code in the package. (Is this a dynamic binding for running code, or is it a static macro-like thing that expands to the above?) That would allow the simpler:

    #$myVar := #$myVar + 1;

This is syntactically as simple as the base '$' idea, but allows different access methods for variables. Is this workable? If it is workable, perhaps I could put off deciding exactly what the basic '$' semantics are until after experience gained using '#$'. Or, I don't add the '$' stuff at all, and the config-var/etc.
stuff is done using, e.g.:

    Config#.myVar #:= Config#.myVar + 1;

(Actually, I'm not sure I need the '#' on the ':=' here - I can perhaps figure out what assignment operation to use based on the LHS '#.'.) LATER: the persistence stuff could be one use for the general '#' operator syntax, rather than being the only use of such syntax. <> Video: You have to start with a full frame of course. One parameter of encoding a video is whether, and how frequently, to include additional full frames. They are needed when starting playback anywhere other than the start of the video. Video (and simple images) should be documents. I.e. they should preserve some of the sequence of operations that was used to create them. A picture from a camera of course can't do that (other than locating things like a date/time annotation and representing that as an overlay of text). The example of text overlay is a good one for video as well. That would be done using a video production program. Simply remember the font, size, initial location, motion vector, colour, etc. of the text, and apply it to the frames of the video as needed. An important thing to note is that the pixels underneath the annotation (or underneath other artifacts like a surrounding rectangle for the annotation, etc.) are don't-care pixels. It is not necessary that they be correct. This is also true for simple images. A delta between frames (or an image) can be constructed using several methods. If the results are layered according to the order of the methods in the video data, then all pixels replaced by later methods are don't-cares for earlier methods. This can help reduce the total data needed for those earlier methods. E.g. a run-length encoding can simply scan right over part of some text if the colour is the same on both sides, even if there is complex detail obscured by the text. If the text is not opaque, then this can't be done, of course.
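The run-length example can be sketched in C. The assumptions here are not from the notes: pixels are 32-bit colour values, and a parallel flag array marks the pixels that a later, opaque layer (such as a text overlay) will cover. All names are hypothetical. Those don't-care pixels never break a run, and a run that so far contains only don't-care pixels takes on the colour of the first real pixel that follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One run: 'length' pixels of colour 'colour'. */
typedef struct { uint32_t colour; size_t length; } Run_t;

/* Run-length encode one scan line. 'runs' must have room for 'width'
   entries. Returns the number of runs written. */
static size_t EncodeLine(const uint32_t *pixels, const bool *dontCare,
                         size_t width, Run_t *runs)
{
    size_t nRuns = 0, x;
    bool runHasRealPixel = false;   /* current run holds a cared-about pixel */

    assert(width == 0 || (pixels != NULL && dontCare != NULL && runs != NULL));
    for (x = 0; x < width; x += 1) {
        if (nRuns != 0 && dontCare[x]) {
            /* Hidden pixel: any value is acceptable, so extend the run. */
            runs[nRuns - 1].length += 1;
        } else if (nRuns != 0 && !runHasRealPixel) {
            /* Run so far is all don't-cares: adopt this pixel's colour. */
            runs[nRuns - 1].colour = pixels[x];
            runs[nRuns - 1].length += 1;
            runHasRealPixel = true;
        } else if (nRuns != 0 && runs[nRuns - 1].colour == pixels[x]) {
            runs[nRuns - 1].length += 1;
        } else {
            /* Start a new run with this pixel. */
            runs[nRuns].colour = pixels[x];
            runs[nRuns].length = 1;
            runHasRealPixel = !dontCare[x];
            nRuns += 1;
        }
    }
    return nRuns;
}
```

For the line 1,1,9,9,1,2 with the two 9s hidden, this yields two runs instead of four. A real encoder could look ahead to pick the best colour for hidden stretches. This greedy version only shows why the don't-care mask shortens the encoding.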
One set of delta information could be to treat the whole set of deltas as an image, and represent it using techniques for representing images. This might work well enough that it can handle things like annotations, so that they need not be represented at the video level. Another set of delta information could be to grab a rectangle from a previous frame (the video header must say how far back this can be done, to limit the count of frame buffers the decoder must maintain). That same rectangle grab could be treated as an image, so that whatever image representation methods exist can be applied to it before it is applied to the new video frame. A video production program should probably have a way to remove things like annotation representations, so that a sequence of simple image frames is produced, which must then be encoded using whatever general techniques exist (which might well be powerful enough to almost reproduce the annotation). This sort of thing would be desired by those who wanted their annotations to not be removable - e.g. logos, copyright/author notices, etc.

090226/Thursday Done the grunt editing of replacing Proc/Context_t with Package/PContext_t and Exec/EContext_t, in both Zed and C. Now trying to get it to work again. I wonder if I really do need to add the xxx_completed fields to the C forms of the various TempXXX structures. I'm wondering if there are cases when compile-time execution switches from C to Zed, and the Zed code starts accessing the non-existent field in the C structs. Perhaps even only a few of the structs need them? Might be safer, and easier, to just add them to all of the TempXXX C structs. Done, but no change on the bug I'm chasing. Resolved. Much more testing needed, though. Add a "completed" to TempCall, etc.?

090227/Friday With the above changes, I get more errors from test/ctime.z, complaining that "u" is not defined in Weird_t.
These are valid, and result from the fact that now that there is a separate EContext_t, there is no scope in which to define "u". Hmm. I could perhaps "fix" this by changing my new ParseExpression so that it creates an EContext_t. Hmm. It already does. RESOLVED - there was a check for ectx_containingProc in Exec/FindInScopes. Coming back to an issue I think I raised before. If a field of a capsule is 'private', can a capsule that extends that capsule write the field? It would be good if it cannot. Later: they are not. Good testing of this in capsules/capsule1.z. Add a tail pointer to TempSeq_t. Rename as TempSequence_t. DONE Types/BaseDesc_t => Types/BasicDesc_t. DONE

090228/Saturday <> Some part of my mind has been thinking that getting Zed going compatibly on 32 bit as well as the current 64 bit will be a pain because I would need to force pointers to occupy 64 bits somehow. That isn't true. Pointers can stay at 32 bits, and alignments can reduce to 32 bits. It does not matter if structs/records/capsules look different across the architectures, since it is not correct to attempt to persist or transmit raw structures. The ByteBuffer stuff must always be used on an element-by-element basis. The system can optimize and just copy bytes as appropriate. That would require some privileged ByteBuffer code which can deal with larger values as wholes, and so do the optimization based on examining the field types. I was looking at what the C code for ioProcs does. Saw the duplication between C and Zed for IoCallDone. Replaced this bunch of C code with a call, via "callExecProc", to the Zed CallIoDone. All seems well. Saving the code here just in case.
    struct Package_PContext *pctx = tioc->tioc_pctx;
    Exec_EContext_t *ectx = pctx->pctx_ectx;
    Types_Type_t *t = Types_SkipNameAndExec(tioc->tioc_cl->cl_proc->ex_type), *resType;
    z_string_t name;
    Exec_Exec_t *expansion, *ex;
    IoCallStack_t *iocs;
    Exec_IoCall_t *ioc = MEM_ALLOC(sizeof(Exec_IoCall_t));
    VECTOR paramVec = NIL;

    if (tioc->tioc_completed) {
        return NIL;
    }
    INC_REF(tioc);
    callExecProc(pctx, "CallIoPhase", 3, tioc, NIL, (void *) iop_complete);
    switch (t->t_kind) {
    case tk_proc:
        resType = Types_ValueType(t->t_.t_proc->pd_resultType);
        break;
    default:
        resType = TV->TV_Error;
        break;
    }
    expansion = Exec_SequenceNew(ectx->ectx_currentSequence);
    if (assignIncompat(pctx, resType, &expansion)) {
        Proc_ProcInstantiation_t *pi = NIL;
        Proc_Proc_t *pr;

        pr = Exec_GetConstantProc(tioc->tioc_cl->cl_proc, &pi);
        name = pr != NIL ? pr->pr_name : util_string("");
        emitErrorName(pctx, 277, name, 278);
        zabort();
    }
    iocs = ectx->ectx_iocs;
    COPY(ectx->ectx_currentSequence, iocs->iocs_tempSeq);
    COPY(ectx->ectx_iocs, iocs->iocs_next);
    /* Must not do this until at least after the "callExecProc" call, else
       the tio_completed test in the Zed code will trigger. */
    tioc->tioc_completed = z_true;
    if (tioc->tioc_count != 0) {
        Exec_TempIoCallList_t *tiocl = tioc->tioc_ioParams;
        z_uint_t i;

        paramVec = util_vector(TV->TV_IoCallElementVec, sizeof(Exec_IoCallElement_t), tioc->tioc_count);
        for (i = 0; i != tioc->tioc_count; i += 1) {
            Exec_IoCallElement_t *ioce = &INDEX1(paramVec, i, Exec_IoCallElement_t);

            INIT((*ioce).ioce_mainExec, tiocl->tiocl_mainExec);
            INIT((*ioce).ioce_format, tiocl->tiocl_format);
            INIT((*ioce).ioce_widthExec, tiocl->tiocl_widthExec);
            INIT((*ioce).ioce_precisionExec, tiocl->tiocl_precisionExec);
            tiocl = tiocl->tiocl_next;
        }
    }
    ioc->useCount = 1;
    INIT(ioc->tptr, TV->TV_IoCall);
    INIT(ioc->ioc_cl, tioc->tioc_cl);
    INIT(ioc->ioc_ioParams, paramVec);
    INIT(ioc->ioc_expansion, expansion);
    ex = exec(resType, exk_ioCall);
    INIT(ex->ex_.ex_ioCall, ioc);
    ex->ex_hasError = tioc->tioc_hasError;
    COPY(tioc->tioc_pctx, NIL);
    COPY(tioc->tioc_cl, NIL);
    COPY(tioc->tioc_ioParams, NIL);
    return ex;

And, done the same for CallConstruct:

    Exec_EContext_t *ectx = pctx->pctx_ectx;
    ConstructStack_t *cs = ectx->ectx_cs;
    Exec_Call_t *cl = cs->cs_cl;
    Exec_Construct_t *con;
    Exec_Exec_t *ex, *callRes, *exRes;
    Exec_TempScope_t *tscpOuter = cs->cs_outerScope;

    INC_REF(tscpOuter);
    /* End the inner scope containing the construct body. */
    ex = Exec_SequenceNew(ectx->ectx_currentSequence);
    ex = Exec_ScopeNew(pctx, cs->cs_innerScope, ex);
    if (ex->ex_type != TV->TV_Void && ex->ex_type != TV->TV_Error) {
        emitError(pctx, 276);
    }
    /* These will be decremented when returning from RunCompleter, so do
       the needed increments here, the same as jsr/jsri would, with the
       fixParamRefs call. */
    INC_REF(pctx);
    INC_REF(ex);
    callRes = (Exec_Exec_t *) callExecProc(pctx, "RunCompleter", 2, pctx, ex, NIL);
    con = MEM_ALLOC(sizeof(Exec_Construct_t));
    con->useCount = 1;
    INIT(con->tptr, TV->TV_Construct);
    INIT(con->con_call, cl);
    INIT(con->con_body, ex);
    INIT(con->con_expansion, callRes);
    exRes = exec(TV->TV_Void, exk_construct);
    INIT(exRes->ex_.ex_construct, con);
    /* Append the construct to the outer scope, close that scope and return
       the resulting Exec_t. */
    Exec_SequenceAppend(ectx->ectx_currentSequence, exRes, z_false);
    exRes = Exec_SequenceNew(ectx->ectx_currentSequence);
    exRes = Exec_ScopeNew(pctx, tscpOuter, exRes);
    DEC_REF(tscpOuter);
    return exRes;

090301/Sunday In DGol.z, it won't treat a "toUint" of an enum tag as a constant - it says I have code outside of a proc. FIXED.

090302/Monday Another link from Slashdot - Google's NativeClient, designed to be fully secure - they show the source and invite bughunters: http://nativeclient.googlecode.com/svn/trunk/nacl/googleclient/native_client/documentation/nacl_paper.pdf http://code.google.com/contests/nativeclient-security/

090303/Tuesday <> Look at assert names from Eiffel. Don't seem to be able to call an interface method directly on a value whose capsule implements that interface (see DGol.z). [That was the intent - you have to assign the capsule value to an interface variable, and do the call from that context. This is needed because interface symbols can duplicate from interface to interface, and between interface and capsule.] Rename package Base to be Basic. DONE Add use of 'baseCall' to allow call of member function of something we are extending. Also use that in constructor instead of capsule name, when calling "parent/base/whatever". 'baseCall' is like Java's 'super'. DONE See DGol/SetHSV: complains about return type of case expression (Colour_t). FIXED Hmm. DGol/Frame_t calling its own SetColour - is this before I've filled in the virtual function table? NO.
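Going back to the 090228 note on 32-bit versus 64-bit builds: the point that raw structures must never be persisted or transmitted can be shown with a small C sketch. ByteBuffer_t, Item_t and the Put routines are hypothetical stand-ins, not the actual Zed ByteBuffer package, and the fixed big-endian format is an assumption. Each field is written on its own, so the output bytes are the same whether the host has 4-byte or 8-byte pointers and whatever padding the compiler puts in the struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record whose in-memory layout differs between 32 and 64 bit
   builds (pointer size, padding). */
typedef struct {
    uint8_t     kind;
    uint64_t    count;
    const char *label;   /* a pointer: meaningless outside this process */
} Item_t;

/* Hypothetical fixed-capacity output buffer. */
typedef struct { uint8_t *data; size_t len, cap; } ByteBuffer_t;

static bool PutU8(ByteBuffer_t *bb, uint8_t v)
{
    if (bb->len == bb->cap) {
        return false;
    }
    bb->data[bb->len] = v;
    bb->len += 1;
    return true;
}

/* Assumed wire format: 8 bytes, most significant first, on every host. */
static bool PutU64(ByteBuffer_t *bb, uint64_t v)
{
    int shift;

    for (shift = 56; shift >= 0; shift -= 8) {
        if (!PutU8(bb, (uint8_t) (v >> shift))) {
            return false;
        }
    }
    return true;
}

/* Serialize element by element; the struct is never copied as raw bytes,
   and the pointer field is not written at all. */
static bool PutItem(ByteBuffer_t *bb, const Item_t *it)
{
    assert(bb != NULL && it != NULL);
    return PutU8(bb, it->kind) && PutU64(bb, it->count);
}
```

A privileged fast path, as the note suggests, could copy several fields at once when it knows the host layout already matches the wire format. The field-by-field version above is the portable baseline.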
090304/Wednesday To: djr@nk.ca Subject: DGol/virtual calls Date: Wed, 4 Mar 2009 09:26:29 -0700 (MST) From: Chris Gray It has occurred to me that you could be making the assumption that I know more about C++ than I do, and that our translation from C++ to Zed is incorrect because of that. In Zed, if one uses the "{obj.member-proc}(params)" syntax, then the call goes through "obj"'s virtual proc table. Always. If, in C++, you call a member function from inside a member function or constructor, is that the case, or does the language specify that only that class's methods (or any ancestor ones, if this class lacks them) are used in that context? We saw the one place where you had specifically wanted to call an ancestor member function, but that doesn't actually answer the above question. In Zed at least, the use of the virtual proc table means that a member proc (or constructor) in capsule X can actually end up calling a member proc of an extender of X, if the object in question was created as such a type. This does not sound like a very good way to program. In particular, such a call from inside a constructor seems to me to be a very bad idea. In the constructor, it is possible that not all fields of the object have been initialized. That will certainly be the case for fields that are in the extending capsule. So, calling an extending capsule member proc in that situation is just wrong. Keep in mind that I have *NO* experience with object-oriented programming in C++. I've read descriptions of Java stuff, but I don't have a C++ language description (well, partial ones, but they are hard reading, and it has been many years). If you find yourself wondering what something in Zed does, please ask, rather than assuming it is the same as in C++ or Java. My lack of detailed understanding of those languages could mean that Zed does something different. My thinking on this stuff has been that the use of member procs is for stuff outside of the capsule itself.
So, when doing the first member proc call inside a capsule member proc, I paused. But, I went ahead anyway, since I lacked (and still somewhat lack) the global understanding of DGol to know if that was the right thing or not. I believe I want to restrict Zed so that you cannot use member procs inside constructors. Well, 2 seconds of thinking shows that that does not work. If I force you to call a local routine ("procs {" section), then you can just do a member proc call in that local routine, and nothing has changed. Hmm. One vague (for now) idea is to prevent *all* member calls to the current capsule hierarchy in any proc inside a capsule. Instead, force the use of something like "super.( 'baseCall'. Right now, I want to try to read up on the Java and C++ answers to my main question above. Having just been reading about Eiffel, I find that I do not know the answer for it. Unfortunately, what I am reading is not a language specification, but more a philosophical writing about programming, in the context of Eiffel. Do you have Eiffel docs that you can find the answer in? -Chris Subject: Java like Zed Date: Wed, 4 Mar 2009 10:49:55 -0700 (MST) From: Chris Gray Ok, in Java, a constructor for a class A is perfectly happy to call a class method in a class B that extends A, and that will show that the class variables that B adds are not yet initialized. In other words, class methods are given no guarantee that the objects they deal with have been fully constructed. I got this indirectly from the manual, and verified it with a test program. Now to see if I can get the syntax right in C++ ... -Chris Subject: C++ test Date: Wed, 4 Mar 2009 14:44:45 -0700 (MST) From: Chris Gray Ok, C++ is almost the same, as long as you remember to put "virtual" in front of the methods. :-) I couldn't get my outer constructor to call the inner constructor inside its body - I'm pretty sure that can happen if I use the magic '&' in the magic places, but I didn't pursue that.
I only had your DGol samples handy, and you always use the extra syntax that calls the inner constructor before the body of the outer constructor, but that is the same semantic order anyway. However, I did note that my C++ inner constructor always called the inner member function - it did not call the outer one, unlike in Java and Zed. Do you have a C++ spec that says that is what happens? -Chris Subject: C++ quirk Date: Wed, 4 Mar 2009 16:41:58 -0700 (MST) From: Chris Gray Here's my test program, methodTest.cc:

    #include <stdio.h>

    class Inner {
    public:
        int n;
        virtual void print(void) { printf("Inner.print, n = %d\n", n); };
        virtual void test(void) { printf("Inner.test\n"); print(); };
        Inner(int j) { n = j; printf("Inner constructor\n"); print(); };
    };

    class Outer : public Inner {
    public:
        int m;
        virtual void print(void) { printf("Outer.print, n = %d m = %d\n", n, m); };
    #if 1
        virtual void test(void) { printf("Outer.test\n"); print(); };
    #endif
        Outer(int j, int k) : Inner(j) {
            Inner::test();   // qualified call: explicitly Inner's test
            test();          // unqualified call
            m = k;
            printf("Outer constructor\n");
            print();
        };
    };

    int main() {
        printf("main starting\n");
        Inner* in = new Inner(10);
        in->test();
        printf("\n");
        Outer* out = new Outer(20, 30);
        out->test();
        printf("\n");
        in = out;
        in->test();
        printf("\n");
        ((Inner *) out)->test();
        printf("\n");
        return 0;
    }

I invite you to predict what the output is before compiling it and running it, and before reading the paragraph below. ... (lots of blank lines) It appears that within a constructor, method calls are not virtual, regardless of the declaration of the methods. This differs from Java and Zed. I could make Zed work this way, and wondering about doing that is why I did this experiment. You can see, however, that C++ does not avoid the use of method calls on partially constructed objects.
The evidence is the output at the beginning of the second block:

    Inner constructor
    Inner.print, n = 20
    Inner.test
    Outer.print, n = 20 m = 0

The Inner constructor is being invoked on an object being created as an Outer, yet it calls Inner's print. I then force Outer's constructor to call Inner's test, and simply "test()", and both call Outer's "print". Another possibility is that the C++ runtime starts the Outer object out with Inner's virtual function table, and then switches to Outer's virtual function table after the header initializations in the Outer constructor... Ok, I added a Middle class. Its constructor, when constructing the Outer, calls the Middle print, and the Inner constructor still calls the Inner print. So, the situation is the first possibility above - within a constructor, method calls on the object being constructed are not virtual. -Chris Reply from Don:
> In particular, such a call from inside a constructor seems to me to be
> a very bad idea.
Yep, OO allows new kinds of bugs. In C++, while the constructor of Cderived calls the constructor of Cbase, the Cderived object doesn't exist yet. But the Cbase object exists, and the polymorphic calls use that object's methods. The C++ reference manual has examples in section 10.9c and 12.7. I had expected Java to work the same way, but it (gcj) doesn't; calls within a constructor can polymorph too far. Eiffel has creation procedures, but initialization is quite different from C++ and Java. Modula-3 doesn't have any kind of constructor.
> My thinking on this stuff has been that the use of member procs is for
> stuff outside of the capsule itself.
...
> I believe I want to restrict Zed so that you cannot use member procs
> inside constructors.
...
Don't do this. (For experts only:) Don't do this yet. 090305/Thursday Subject: small C++ note Date: Thu, 5 Mar 2009 09:48:20 -0700 (MST) From: Chris Gray Yep, OO allows new kinds of bugs. True, but I'd like to minimize those created by the language itself.
In C++, while the constructor of Cderived calls the constructor of Cbase, the Cderived object doesn't exist yet. But the Cbase object exists, and the polymorphic calls use that object's methods. The C++ reference manual has examples in section 10.9c and 12.7. If they want to view it that way, then I would approach it by restricting what you can do in the constructor, not by changing the meaning of an already existing construct. What they have now is that the call syntax means one thing in the constructor, and another thing in a function immediately beside the constructor. Seems a bad choice to me. But then, lots of things in C++ seem bad choices to me. My choice would be to either disallow the calls, or do nothing. In Zed, the fact that you are using the '{' '}' call syntax spells out what is happening. I *could* do either of the two things I suggested C++ might be doing. However, then I have the same issue - neighbouring identical calls do different things. In fact, if I do it exactly as C++ has, I would have a '{' '}' call that is not indirect. Very bad idea. I think the right answer (to me) is to do as Java does - provide some kind of 'super' syntax, that works for all methods, not just the constructor. Then the reader can see what the programmer is doing, and the syntax has consistent meaning. -Chris Looking at the Colour_t problem in DGol, where Zed complains that the alternatives of the 'case' and 'if' are not compatible. The problem is that "caseCompatible" is setting tc_type to the skipped form of the body type, rather than the original. However, I noted that "caseCompatible" is using "SkipNameAndExec", and then comparing types. That would allow values whose types are two different namings of an underlying type to be compatible. It should just be SkipExec, sort of. That alone doesn't work when the 'case' alternatives are a combination of enum tags and expressions returning values of the enum type.
The tags are of the unnamed type, but the expressions are typically of the named type. In a 'case' (or 'if'), either form can appear first. So, I think I need a Types/Names routine to test whether one type names another, and then switch the final type of a 'case' or 'if' to the named version, when both appear. Also, can I merge the code, so that 'case' and 'if' expressions are handled the same? ALL DONE TempCase_t still uses N**2 append. Add tail pointers. NO - have to scan for duplicates anyway - already noted earlier. Whheee! Can't use "Types/Names" for the above, since "Names" is a package in use. Get a ton of errors, since I allow that declaration, but then can no longer use "Names/" in any Types code. Ick. So, use "Types/IsNameOf" instead. DONE I should emit a warning or error about that. DONE When ending a capsule, check that all methods used within the capsule are defined either in the capsule or in a capsule it inherits from. To do this, will need a callout from Exec when they are used. DONE %%% Urph! Cannot allow methods to be used as proc values, since they take their parameters in a non-standard order (object ref is last, even though it is declared as first). Need to check this somehow. [Done - their type is actually Proc/Proc_t.] Whew! That was a tiring session. Added new stuff (exk_methodRef) which allows capsule methods and the constructor to make static calls to capsule methods. It uses the value from the virtual proc table. 090306/Friday <> Thought from a couple of days ago: in the package browser, there should be buttons for "more detail" and "less detail". Some icons or other. The "more detail" is in particular reference to the item whose data is at the very top of the window, or the highest on the window, if none. There should also be keystrokes to do those operations. Zedweb pages. If a website is mostly unchanging, then it is most efficient if it is a single large download, which may mean that it is all in one package.
If it changes regularly, then there should be a fixed top framework page/package, which references the changing parts within it. That fixed framework should be shared among websites, where possible. It should be identified by my standard originator/version id pair, so that it does not need to be downloaded and cached for each website. Gaaaaa! Waiting until code generation to resolve the reference to a method in an exk_methodRef is useless. Code generation happens at the end of the proc definition. The only reasonable solution I see right now is to force the constructor to be the last proc in the "procs {" section. No, that doesn't work. RESOLVED - see below. There are currently checks in assignIncompat for Exec_t's that should not be used as values, e.g. non-regular procs. That's the wrong place for those checks. Consider a case or if expression that has one on a branch. FIXED 090307/Saturday Either way, final errors about non-existent methods cannot be earlier than the end of the capsule. So, the complexity of adding pre-declarations of the methods the capsule will implement (which is what C++ has) seems not worth it. C++ catches the final problem at link time. Instead, I should maintain a linked list of Exec/ExecMethodRef_t's, and scan and resolve those at the end of the capsule, complaining about those that cannot be resolved. Note that any attempt to resolve them before the end of the capsule can only resolve those that resolve into the capsule itself. [Done] <><> Will want to be able to do a full check and issue error messages right at the point of the call, when running under the IDE GUI. This is so that the call can be highlighted if it cannot be resolved. Or something. ExecNameRef_t => NameRef_t. ExecNameInfo_t => NameInfo_t. DONE 090309/Monday Done 'baseCall', at least for regular (capsule and interface) methods. However, need more testing - haven't tried interface ones yet. DONE It would be some work to use 'baseCall' in constructors. 
So, I'd rather not do it. Some justification for that is that if there is no explicit constructor in the extended capsule, then there is no actual call happening - just the default constructor processing. Semantically, I think I can argue that it should not be 'baseCall'. In terms of implementation, an explicit constructor uses a direct subroutine call ('jsr'). A default constructor uses the 'capcon2' instruction. 090314 Spent a while this week on income tax, then got lazy. Need to write a file summarizing Eiffel stuff. A couple of points here:
- Eiffel does not use '()' on calls to member functions with no args (other than the implicit 'this'/'self'). They specifically say that such a use allows for either a function or a data field. Also, an inheriting class can change from one form to another. Thus, a base class with a data field can find itself dealing with a descendant object that has a parameter-less function instead. I can see 3 alternatives:
  a) implement all field accesses as function calls internally
  b) have their own linker, which can figure out when a function call is needed, and write the needed ones
  c) make the check at run-time
- Eiffel allows multiple inheritance. There are a bunch of rules when you inherit that allow the programmer to control what happens:
  - members (fields or functions) can be renamed
  - members can be undefined
  - multiple virtual ("unrealized") members are not a problem if they have the same signature. One "realized" member can satisfy them all.
  - if all of the signatures are the same, the multiple members are merged into a single one. This allows the programmer to choose what happens, since a rename is always allowed.
  - the public/private status of members can be changed (either way)
  - there are rules governing the "require", "ensure" and "invariant" assertions, and how they all work together.
Anyway, I think their answer is far too complex for me to include in Zed.
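Alternative a) can be sketched in C++. This is an illustration only and makes no claim about how Eiffel compilers actually work. Clients reach the "field" only through a virtual accessor, so a descendant can replace the stored value with a computed, parameter-less function without any change to client code. The class names are hypothetical.

```cpp
// Hypothetical classes illustrating alternative a): clients never read
// the data member directly, only through a virtual accessor.
class Shape {
public:
    explicit Shape(double area) : area_(area) {}
    virtual ~Shape() = default;
    // Base class: the accessor just returns a stored field.
    virtual double area() const { return area_; }
private:
    double area_;
};

class Square : public Shape {
public:
    explicit Square(double side) : Shape(0.0), side_(side) {}
    // Descendant: the same "attribute" is now computed on demand.
    double area() const override { return side_ * side_; }
private:
    double side_;
};

// Client code is identical whichever form the object uses.
double reportArea(const Shape &s) { return s.area(); }
```

The cost is an indirect call on every access, including the common case where no descendant overrides the field.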
However, the thought occurred to me of allowing data fields inside of interfaces. Another way to think of the same thing is to allow a capsule to implement another capsule, along with any interfaces. If I go that route, I think "implements" should become "provides", and the concept of separate interfaces should go away, subsumed under capsules. Some random thoughts on the implementation of this, and its consequences: <> 1) Currently, the capsule construction process allocates and initializes a small 2-element record for each interface that the capsule implements. Those records contain a reference to the method vector for that capsule's implementation of the interface, along with a reference to the actual capsule object. A reference to that small record is what is passed to methods in capsules that are that capsule's implementation of the interface. The conversion from capsule object to a reference to that record is required at runtime (the "exk_capsuleToInterface" Exec_t kind), and is simply an add of a constant offset and a load. The thought is to add the interface data fields to that record. Aha! Now that I write this down, that does not work - there is an extra value in there (the reference to the capsule object) that is not there in the layout of a normal capsule object. OK, this would work if I added data fields to interfaces, but not if I allow a capsule to "provide" another capsule. Well, it could be made to work, by leaving space for that extra field in all capsule objects. Objects that are created as top-level objects would leave it as 'nil'. Assume I go with the "provides" and full capsules, and use that extra field in all objects. 2) Methods within a capsule that implement stuff for a provided capsule have access to the data fields within that one capsule (which can of course be inherited or inlined), along with the fields in the capsule containing the method. Those fields, however, require an extra level of indirection. That is already true now in Zed. 
Syntactically, I think it makes sense to require that such methods have two initial parameters - the first is a reference to the data of the provided capsule object, and the second is a reference to the providing capsule object. This resolves any possible name-clashes for fields, and for the use of '{' / '}' calls. [No, see below] It could be possible for the method to have only the initial special parameter, if that matches the method declaration %%%%% Again, this doesn't work. The only way I can see right now to make this work is to provide that second 'poly' parameter on all capsule methods. Ick. Ok, I can't change the signatures in that way. That means I would need a syntax, which can only be used inside actual methods that implement virtual methods for provided capsules, that gives access to the object of the providing capsule. A bit icky, but semantically more correct, I think. What is the syntax? "self"? "this"? Hmm. Perhaps a capsule which wishes to allow itself to be "provided" must explicitly declare that special data field? Then, the name of that field is what is used. I may like this better. That data field must of course be the first data field in such a capsule. Its type would presumably be 'poly'. [No. It is always the providing capsule reference that is passed in.] [090625: most code within any capsule will be dealing with fields of that capsule. So, only use the one standard poly parameter. In a providing capsule Cap2_t, methods and fields of a provided capsule Cap1_t are accessed as "cap2Expr.Cap1_t. '=' 'true'. The above would be handy in DGol.z 090323/Monday <> I think widgets should have minimum and maximum sizes that they want to be displayed with. Minimum would be no decorations - just the contents. That would be used when space is constrained, e.g. when the outer screen is not very big (as on a cell-phone display).
There should be a multi-panel, which divides its parent either horizontally or vertically, and remembers and indicates the dividing position. When the multi-panel is resized, it can use child minimum/maximum sizes to intelligently reposition the dividing lines. 090324/Tuesday %%% Could allow 'export' of procs in the utility proc section of a capsule. They would go into the appropriate table of the enclosing package. They already have standard calling conventions, etc. 090325/Wednesday The above could get rid of a bunch of extra stuff to support util procs. They would be listed in the capsule as capsule procs, but there is no need for special lookup, calling, etc. of them - just put them into the package tables as normal - thus they can be private, local or exported. [Later: I no longer grok this.] %%% Somewhere above I mentioned the idea of not limiting variant record fields to just trackable pointers. Allowing any values that fit within the single Z word works too, with no issues of different sized allocation. This could be uint, bool, float, bits types, pointers, etc. The same rule can hold for types which instantiate a non-@ generic, with the big difference that two actual instances of the code must be generated, one which uses reference instructions, and one which uses non-tracking instructions. Either mode could be generated as needed, rather than at the time the generic is created (i.e. suppress code generation at generic creation time). I would need structures to represent the pairs of code bodies within the generic. Note that generics with more than one parameter can have more than 2 possible code instances. %%% <> Can I create either a capsule or a generic that maintains a sorted array of values, and allows insert, delete, iterate, binary search? If it is a capsule, then users would extend the capsule in order to provide actual procs for the compare and swap operations. 
In that situation, it would be good to have a mode on the capsule that allows the initial capsule to be 'partial' (no implementations of the methods), but requires that any extending capsules not be 'partial'. Perhaps add an optional 'complete' attribute for that purpose, so the capsule in this case would be "partial complete", which looks weird. The big problem with doing this with capsules is that only trackable types (capsules) can be handled. Also, Zed's single inheritance rule greatly reduces the applicability of this. Perhaps it could be an interface instead of a capsule, and the various utility routines are just within the package. The problem with doing this with generics is that the values must be fixed size, since generic code cannot index arrays when the size of the elements is not known. The above idea of reducing the restriction on generic parameters would help a lot here. With C++ templates, these problems don't exist since C++ essentially just does a type macro substitution and then recompiles all of the code into new copies. Generics as they currently are also have a problem in that they have no way for the instantiating type to provide any operators to the generic code, such as comparison, swapper, uintHash, etc. Somewhat like Eiffel does, perhaps generics can take non-type parameters, e.g. proc parameters. Such things are values, not types, so generics would need some kind of compile-time handling procs in order to make use of such things. It's not at all clear that it would be workable. 090327/Friday %%% Noticed while doing a drag-select box. Zed complains about a comparison between capsule values that are directly related by extending. It shouldn't; they are comparable. It also shouldn't be complaining when mixing signed and unsigned in arithmetic expressions - don't I allow that, with conversion to sint? Apparently not. [No, I didn't want to have the implicit range checks on simple arithmetic, etc.]
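The sorted-array idea and the "generics with proc parameters" idea above can be made concrete with a C++ template. Here the ordering is supplied by the instantiating code as a comparator type. This is a sketch only. SortedArray and its members are hypothetical names. No separate swap proc is needed because std::vector moves the elements itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sorted-array container whose element order is supplied by
// the instantiating code, roughly what a generic taking a "proc parameter"
// for comparison could provide.
template <typename T, typename Less = std::less<T>>
class SortedArray {
public:
    explicit SortedArray(Less less = Less()) : less_(less) {}

    // Insert while keeping the array sorted; duplicates are allowed.
    void insert(const T &value) {
        auto pos = std::upper_bound(items_.begin(), items_.end(), value, less_);
        items_.insert(pos, value);
    }

    // Binary search: true if an equivalent element is present.
    bool contains(const T &value) const {
        auto pos = std::lower_bound(items_.begin(), items_.end(), value, less_);
        return pos != items_.end() && !less_(value, *pos);
    }

    // Remove one equivalent element, if present.
    bool remove(const T &value) {
        auto pos = std::lower_bound(items_.begin(), items_.end(), value, less_);
        if (pos == items_.end() || less_(value, *pos)) {
            return false;
        }
        items_.erase(pos);
        return true;
    }

    std::size_t size() const { return items_.size(); }

    // Iteration in sorted order.
    typename std::vector<T>::const_iterator begin() const { return items_.begin(); }
    typename std::vector<T>::const_iterator end() const { return items_.end(); }

private:
    std::vector<T> items_;
    Less less_;
};
```

For example, `SortedArray<int, std::greater<int>>` keeps its elements in descending order without any change to the container code.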
090328/Saturday When pretty-printing a 'construct', there are two extra semicolons after the 'end'. I think the Exec_t code that is substituting is ending up with a couple of empty statements in the sequence. FIXED Use exk_alternative to allow binary operator code to accept an unsigned value mixed with signed values - insert as the "alternative" a fromUint of the unsigned value. [No, I don't think I want to do that.] 090331/Tuesday <><> Note: far earlier discussion had 3 editing modes: straight text, electric text, and pictorial/GUI for non-programmers. A lot of the nice pretty-printing is now done, and can be displayed in the PackageBrowser.z that uses Don's dgol (via our DGol.z and my Matrix.z). There is more to do, like wrapping lines, [getting rid of extra parens in expressions], formatting comments, etc. One of the things that is still missing is any kind of highlighting - right now all of the output from Print.z just goes into a CharBuffer. I've been thinking about how to represent things. 1) I believe that the PackageBrowser code should maintain a cache of the displayable form of various package contents. This should probably be an array of arrays of lists of variant records. Or something. The outer level array would be indexed by the index of the item within the package. The inner level would be indexed by line number within that item. The linked list could include things like "one space", one character with properties X, string with properties X. It might end up just being another array, e.g. of property/string pairs. 2) The "properties" should include: - 2 bits for error level (0, 1, 2, 3) - bits for categorization: reserved word, comment, explicit constant, local symbol, symbol in this package, symbol from another package, ...? there might be some bits for "marking", depending on just where that should be done - see below.
[Done in 2009] 3) having the individual lines represented makes it much easier to do scrolling - single line, whole page, and scroll-bar proportional 4) the pretty-print code would need interfaces for: - return number of elements in package. The pretty-printer might want to maintain its own tables different from the Package_t itself, if it wants to do special things with comments. Unless of course the representation within Package_t changes so that is not needed. - given an index, produce a stream of callouts yielding that item, pretty-printed. The receiver will then cache the output as above. Note that it is the pretty-printer that exports the definitions of the "properties". However, as discussed below, it could be useful to allow the browser itself to use bits for marking. So, one big question that comes out of this is the level at which editing happens. Does it happen at the text level, or at the Exec_t tree level? I had done the start of a tree-level programming editor on CP/M many years ago. I believe I gave a copy to Ruth Parnall (from the original Myrias) to try out, and she said she didn't like that whole way of doing things - I believe it was interference in the normal stream of typing. So, my thoughts for Zed had been that it would be a text editor, and that the resulting text would be re-parsed, either all the time, or when the text cursor leaves the current line, or when explicitly asked. Likely the user could select. It could also vary, depending on whether the user is entering new code or editing existing code. Basing the editing on the tree nodes could end up being a lot like the "electric C" stuff in GNU Emacs. This decision affects things like marking of stuff. For example, if you highlight things using the mouse (or keystrokes), is the selection constrained to be some number of Exec_t units, or is it arbitrary text?
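Returning to the "properties" in 2) above, here is one possible packing, purely as an illustration. The 2-bit error level goes in the low bits, a category in the next three, and one marking bit above that. The bit positions, the category names and the Run/PackageCache types are assumptions. Vectors stand in for the "array of arrays of lists" from 1).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical property byte: bits 0-1 error level, bits 2-4 category,
// bit 5 a "marked" flag the browser could set for selections or search hits.
enum class Category : std::uint8_t {
    Plain, ReservedWord, Comment, Constant,
    LocalSymbol, PackageSymbol, ForeignSymbol
};

using Props = std::uint8_t;
constexpr Props kMarked = 1u << 5;

constexpr Props makeProps(unsigned errorLevel, Category cat) {
    return static_cast<Props>((errorLevel & 0x3u) |
                              (static_cast<unsigned>(cat) << 2));
}
constexpr unsigned errorLevelOf(Props p) { return p & 0x3u; }
constexpr Category categoryOf(Props p) {
    return static_cast<Category>((p >> 2) & 0x7u);
}

// One run of text on a displayed line, with its properties.
struct Run {
    Props props;
    std::string text;
};
using DisplayLine = std::vector<Run>;         // runs on one displayed line
using ItemLines = std::vector<DisplayLine>;   // lines of one package item
using PackageCache = std::vector<ItemLines>;  // indexed by item number
```

Keeping the marking bit separate from the pretty-printer's bits would let the browser set and clear it without consulting the pretty-printer.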
If a search is done, does it happen within the cached displayable forms, done by the browser itself, or does it happen within Exec_t and Package_t structures? In the latter case, "marking" in "properties" would flow from the pretty-printer to the browser. In the former (arbitrary text) case, the marking is done directly by the browser. Would it be reasonable to offer both modes? Perhaps the editor would normally operate in arbitrary text mode (in which case the same editing code might be usable for editing simple text stuff, if there is such a thing), but holding ALT while doing the edit action (whether via the mouse or via the keyboard) turns it into a structured action. For example, if the cursor (text or mouse, depending) is on the 'if' of an if-statement, and the mouse action or key combo for making a selection is done, then without the ALT, the single letter of the 'if' is selected, but if the ALT is held, then the entire if-statement is selected. The browser could thus use selection tags generated either by itself or by the pretty-printer. It would remember how the selection was made, so that it could either do simple text deletion itself, or call into Exec_t or whatever to delete the entire if-statement. In arbitrary text mode, the browser would then need to submit the text to the parser for checking. 090401/Wednesday <> On Wed, 1 Apr 2009, Don Reble wrote: > How's that? - the size of the knob, versus the size of the scroll box it moves in, is proportional to the size of the viewed portion of the data versus the total size of the data. There is a minimum knob size, of course. - the position of the knob within its scroll box is a direct mapping of the position of the viewed portion of the data within the total data - you can left-click-and-hold on the knob and drag it, with the data window being dragged accordingly.
If you release when the pointer is well outside of the scroll box, the knob returns to its initial position, as does the data view window - left-down-clicking within the scroll box, but not within the knob, centers the knob at that position - right-down-clicking within the scroll box moves the knob towards that end of the scroll box, by an amount proportional to how close to that end the click was made. I don't remember the details of how the old X scroll bars worked, but they were very non-intuitive. Left and right clicks moved the bar up or down, in proportion to where you clicked (or something). The size of the knob wasn't proportional to anything (maybe the size of the window). The position of the knob was hard to interpret as a guide to the scrolling. You must have Firefox on your system - play with the scroll bar on a large web page. I can set up a small-contents long web-page if needed. Gah! The way I put comments into enum types has broken them. Types/Range will return the wrong answer for an enum with comments. I think that Types/EnumToString can as well, and it has a weird implementation that should have changed when enums became represented by a vector. I don't know what the right answer is. I'll ponder for a while. FIXED Related to the earlier issue about there being two lines containing only ";" after a construct. The line before the construct, if it is in the same sequence, is indented 2 times as much as it should be. FIXED 090402/Thursday Constructs, or at least Lists/Iterate, are not treating their scopes properly. The initialization assignment for Iterate is being put into the outer new scope, rather than into the iteration body scope. Hmm. That may in fact be correct, sort-of. It doesn't come out in fmtExec properly, however - it shows up, explicitly, before the construct call. It should not show up at all in fmtExec - it should be in the scope that becomes the expansion of the construct call.
FIXED 090403/Friday %%% I mentioned above the idea of merging the various stacks of things kept within the contexts. When doing that, I can add a marker item that can only be removed by Exec code. That should protect us against user code in 'compileTime' and 'construct' procs trying to remove elements they didn't add. Also, have a routine that will pop off and complain about all remaining items up to the marker, then pop the marker to clean up. 090404/Saturday <> A thought from yesterday - one alternative for block comments could be that they are actually "documents". If they are a simple text document using the default fixed-width font, then that text document can contain formatting information, which specifies the layout of the block comment. I need a better name for the concept of "blank line, // comment or /* */ comment". They are not "whitespace" in the sense normally understood by most people. They are not all comments. They are not ignored, in the sense that Zed parses and preserves them. "Nonparsed" is not very easy on the tongue. "Separator" implies more kinds of things than are covered. I'd like something that is meaningful to the non-technical. Roel: "markup" Don: "filler" 090405/Sunday Me: "clarification", "aids", "clarifiers". Ick. 090406/Monday <> A secure browser could have several "personas" for the person using it. They would each have their own identity, history, cache, bookmarks, etc. 090407/Tuesday <> From a comp.arch discussion: "modern features like generics and delegates and aspects" 090408/Wednesday Make the bound selector on "getBound" optional - if not given then "0" is implied. This makes it a bit easier and tidier to use the language. DONE - required for multi-dim, disallowed for vectors. Strings can be "[] ro char" internally. That implies that 'getBound' can be used instead of "BI/Length". Also array indexing instead of "BI/StringChar".
If matrix slicing is introduced (I think there are thoughts on it earlier), then that can be used for "BI/SubString". DONE Would need a way to convert between actual "[] ro char" and "string". In general a "copy" operation could go from "[] " to "[] ro ", and that could be used to turn "[] char" into a string. The same operation could go the other way. Is that reasonable? Syntax? It might actually be possible to just use the matrix slicing syntax to do the copying. [Did not make 'string' be "[] ro char".]
Something I still haven't properly addressed is the concept of declarations that declare multiple things in the same declaration. Not really difficult to do, but a pain to have to do it. DONE
<> I couldn't just now find any previous notes on it, but I recall there being some - could be a while ago. The "Fmt" code needs to look for a "fmt" proc on any arbitrary type it is given. However, in general you don't have access to the type itself. "Fmt" does because it runs at compile time. But, consider a debugger, or perhaps code that reads/writes from a byte-stream format. Also consider the package browser trying to find an icon for a data item. It doesn't know, at its own compile time, the type of the item. That type may not even exist yet. So, the package browser will have to use 'any' to deal with such elements. But then, how can it try to retrieve something like an "IconName" property from the type of the value? I don't want to allow something like "typeof" to non-privileged code. However, what I can do is allow the fetching of either any property, or some fixed set of pre-known properties, from "any" values. So, there could be a routine "GetIconName", which accepts an 'any' parameter and returns the value of the "IconName" property from the actual type of the 'any' value.
That routine is written in privileged mode as:
    proc GetIconName(any a)Names/Info_t:
        if a ~= nil then
            Types/ExportFind(typeof(a), "IconName")
        else
            nil
        fi
    corp;
Can I allow the property name to be any arbitrary string? Have to think about the implications of that.
<> Language called "Groovy" is an extension of Java that allows "metaprogramming".
090411/Saturday Fixed a couple of bugs. One was a missing COPY relating to the new "tseq_nonWhiteTail" in Exec.c/Exec_SequenceNew. It was resulting in the aek_alternate being messed up for test/ctime.z. test/ctime.z itself was passing "true" for "doNotCompile" on CreateContext. Sigh. Added missing code in bcRun.c/zapLocal for bits types. Fixed a bug there where it was still assuming "bool" is the same size as uint. That was the one that was making PackageBrowser die on exit, since main had a struct (DGol/FontParams) on the stack, which contains bools. Fixed a bug in Exec.c/FieldRefNew, union case, where it wasn't using the right variable in the string comparison.
090412/Sunday Thought from a couple days ago: item-specific comments can just be block comments with the item kind as the first characters in the comment, right after the "/*" (no intervening spaces). E.g. /*proc , /*capsule , etc. No need for tokenizer to look for those - that can be done by the parser, when it is absorbing the comments. Rename "whitespace" types, etc. in tokenizer to be "comment". That's consistent with the rest of the compiler. I'll stick to the term "comment" for all of these things. A blank line is simply a comment that has nothing to say. :-) DONE
090413/Monday <><> Another comment tag is "field". That would be for block comments before the declaration of a record/struct/capsule field. The comment would contain the documentation for the field, which could be displayed in a pop-up when the mouse is hovered over that field name, and could also be used in any kind of automatically generated docs. Perhaps there should also be a "section" tag.
It is only there so that the comment will appear in any automatically generated "man page" or "book" based on the package. They are only meaningful at the package level, but the user might add them within procs, simply as a hint to the reader.
090414/Tuesday Installed some vim packages to try to figure out what it does for syntax highlighting. It is, of course, very complex, with lots of options and modes. I was not able to figure out the defaults from the 'vim' files, so I installed the full (no-GUI) vim package, and am just looking:
    comments: blue
    #include, #define, #if, #elif, #undef: purple
    strings, numbers, NULL: red
    #if-ed out stuff: blue
    return, sizeof, switch, case, default, if, else, for, while, break: brown
    const, static, extern, struct, union, void, typedef, unsigned, int, float, long: green
[Added a "vim default" choice for colouring in PackageBrowser.z]
090421/Tuesday Been doing Lego again. Have been working on the pretty-printing stuff. Part of that is adding the ability to store comments in several larger collections. In the case of generics, I put them in as generic elements. The problem with that is that they then need to be instantiated, like other generic elements. That seems to me to be a waste of memory. Sigh. Perhaps I should switch to the alternate route and just add a comment vector to Generic_t. Ick. It also means that GenericInstantiationElement_t needs them. Definitely go to the comment vector implementation. DONE.
090422/Wednesday The reason for subpackages is their namespace. A package without any subpackages only has its local and exported namespaces. For a large enough project, that isn't enough for all of the local procs that could be needed. Hence the need for subpackages. Hopefully that will be enough - things larger can presumably be split into multiple actual packages. Later: another reason is that they can represent "this is where you need to look for all of the stuff concerned with XXX". In other words, organization.
Sure, you could keep all such stuff together in the larger capsule itself, but I expect it is satisfying for people to be able to see the very distinct beginning and end of such a collection. A good syntax for an explicit "cast a capsule value into the type of an interface that it implements" is '.' . Thus, if there is a proc "Doit1" in interface "Int1_t", it can be called from capsule object "TheCap" using: "{TheCap.Int1_t.Doit1}()". [Later: had this in for a while, using "##path-to-interface". But, it is ugly, and it needs to allow for a full path when the interface is in a different package. So, ended up taking it out. If I put it back in, make it use a Package/NameReference_t to the interface, so that it can be pretty-printed properly. Also at 090314]
090423/Thursday <> Can I put more meaning into the little tags after the "/*" in block comments? E.g. can there be something like "proc,top=XXX", where "XXX" refers to a comment section, in the same package, with "section=XXX" in it? This is a lot like the "href" in HTML. Also, block comments that are within procs, but have "section" or something after the "/*" are perhaps significant to the doc viewer in that those comments are displayed by the doc viewer - the sequence of labelled comments in the proc is considered to be the documentation for that proc.
All of the compounds (generic, interface and capsule) have "hasError" fields. So, there is no reason not to allow as full a definition of them as possible, even in the face of errors. Especially given that the use of such an interface or capsule type results in an error-flagged Type_t. DONE.
Introduce a TempGeneric_t. Add head/tail lists and conversion to vectors for all elements of all 3 compounds. No reason not to, and it results in more consistency, and can improve execution time in extreme cases. ALL BUT GENERIC DONE. Later: adding TempGeneric_t would be icky. There are quite a few important uses of pctx_containingGeneric in Exec, Types and Proc.
The saving grace is that there likely aren't very many generics in the universe. And, those that do exist don't have all that many elements in them. E.g. the largest in the current Lists package has 15 elements.
090425/Saturday If I allow non-letter/digit in identifiers, as passed directly to the Types/Exec/Proc/Package routines, then when such code is pretty-printed, it could be valid source but with a different meaning. That's not good, since it would effectively allow someone to conceal what their code is actually doing. So, must check all declared identifiers for validity. DONE
090430/Thursday It occurred to me last night that many of the TempXXX records that I use for Package/Proc/Exec/Types construction can actually be struct types, that are passed in via '@'. The actual struct type is not exported from the defining package, but a rename of it is. This prevents code outside of the defining package from reading or changing any fields. Doing things this way would cut down quite a bit on the amount of memory allocation/freeing going on during the construction of stuff. Doing this would constrain callers a bit, in that they couldn't have a reference that they can pass around and test. But, they can pass around the '@' value. If they need a reference, they can define a record type that has the TempXXX struct as a field, and pass that around. Started doing the above. Forgot that I was dealing with '@' types, and would have to assign a value of type "@ XXX" to a variable of type "@ XXXPrivate". That currently doesn't work. Should it? Later: actually, it does work - the problem is that TempDecl_t was still pre-declared as a record type by Types.c, and the declaration as a struct in Types0.z cannot override that. RESOLVED Days later: as it is, this will not maintain security. The outside programmer can just take the exported named hidden type as a value, then look inside it to grab a reference to the actual record type.
From there, some use of Exec stuff should allow changes to struct members. That was the point behind the "rd_writeablePackage" stuff for records - they explicitly only permit writes to the fields from that package. I *could* do a similar thing with structs. Should I? DONE. SECURITY IS BACK, TOO.
090503/Sunday <><> It would likely be good if virtually all "applications" are actually widgets that users can use a GUI designer to place within windows. It would be good if the entire process were easy enough that non-programmers could do it. Essentially, let them totally customize their desktop experience, using all of the tools that the system has. Where applications have their own menu sets, within-frame menus will still work fine. Another alternative (allow both) is to lightly highlight the active frame within the larger window, just like windows are highlighted on the desktop, and the menus at the top of the window are then those of the highlighted frame. To make this work, it pretty much has to be click-to-select for those frames, else you can only get at the menus of those frames that are near the menu bar. It would be good to have a non-programmatic description of the layout of such a merged desktop view. People could design them, and send them to others. Sort of like "skins", but at a larger level. There could also be skins per window, which wrap the various panels within it. Later: I had a flash about communication between the "mini-applications" that the user can "plumb" to control how they work together. How useful or practical that would be I have no idea. I expect it is more useful when customizing a larger application that has multiple components that can naturally communicate, because they deal with roughly the same kind of data.
090505/Tuesday <> Having a "tainted" flag on systems, which is set if they go into privileged mode, isn't a reliable way of knowing that code exported from the system is tainted.
They can go privileged, then use that privilege to find ways of clearing the tainted flag. So, I think going privileged needs to interact with a central repository, so that that remote and secure repository can know that the system in question is now irrevocably tainted. Note that this only needs to happen when first going privileged. So, denial-of-service against the server requires them to go privileged, then clear their taint flag, then go privileged, etc. Likely not hard, so will have to protect against it on the server. One easy way is to have it wait one second to respond, and have the notification infrastructure wait for that response. <> The fairly common usage of "Package/Type_t t := Package/Type_t(...)" is bulky and redundant. Allow the second "Package/Type_t" to be absent in that specific case. [How parse? You are left seeing a '(', which could be the start of a parenthesized expression, rather than a constructor.] 090506/Wednesday <> Perhaps the way to arrange a large package with subpackages is to have all of the declarations and code in subpackages - only have the overall package documentation and description directly in the package itself. 090511/Monday MS releases concurrent programming language "Axum" (previously known as "Maestro"). Not committing to support - awaiting feedback. 090512/Tuesday Wrote a test program, test/privStr.z, which creates a proc which tries to "illegally" assign to a field of a non-exported struct type. It runs and does the assignment. However, the test is not fully valid right now, because type "Package/TempDeclPrivate_t" is actually exported. I need it to be writeable inside Exec, so it is marked "export". I can only fix that when I have implemented the limited export stuff. DONE I just tried marking the test field "tmpdcl_hasError" as 'private'. That doesn't work because the Exec code needs to assign to it. 
LATER IT DOES
On that subject, one thought had been to use an optional parenthesized list of packages after the 'export' reserved word. [Done] Perhaps that is what I should implement next? DONE
Eeek. I just realized that I don't have a way to make matrix (or array) elements be only writeable in specific packages. Well, array elements can be that way if the array is a field of a record. But if a matrix is a field of a record, you can't change which matrix the record references, but you can change the elements of the matrix. That means that all of those places where I've "compressed" Exec and Type structures using vectors of other things are vulnerable because those vectors are changeable by anyone who can get at them. Sigh. FIXED Problem verified with test "test/vecWrite.z". NOW GETS ERRORS Perhaps a matrix that is referenced by a non-writeable field of a record should also be non-writeable. That's likely not always the case, however. Perhaps matrix elements should include a 'private' possibility? 'private' to what, however? Hmm. Maybe they are private to the package within which the matrix type appears, and that addition makes them be a different matrix type from those so declared in another package. Other field flags, 'inline', 'noInit', and 'ro' are not applicable. 'volatile' is. DONE [So is 'ro' - only way to reference a matrix private to another package.]
090513/Wednesday Going about having 'private' for matrix types. Noticed that in Types/instantiate0, the case for tk_record used "Package/GetContainingPackage(geni) = nil" as the value for the "isPublic" parameter of RecordNew. I see no reason for that, and have changed it to "rd.rd_writeablePackage = nil".
Get rid of the general Types/Normalize - replace it with simple code for each kind of type. This was mentioned years ago, but not done yet. Hmm. Can I actually do that? The current routine is recursive, which handles more complex types.
When types reference other types, can I rely on those other types already having been normalized? I don't see why not, if I do the normalization right in the type construction routines. [Partially done. There is now 'normalize' done at the end of each relevant type creation, but it is still general. All resolved] I've chosen not to implement 'volatile' for matrix types. It would rarely be useful, I expect, since a dynamically allocated shared matrix could only be used in a threading context, and it is unlikely that the matrix elements would be simple - more likely they are structs, whose fields can have their own 'volatile' specification. Any situation that really does need just 'volatile' simple elements in a matrix can simply define a struct type with only the one field, and make that field 'volatile'. UNDONE Having trouble defining the "modifiable" property for matrix types with 'private' in them. My issue is that the test for package equality in "baseTypeIsModifiable" is done on the name of the type. But, record/capsule types have a "containingPackage" within them too. Which one should be tested? Do I really need both of them? SEE BELOW It is theoretically possible to create an unnamed record type in one package, and have that available to be named in a second package. I think. This comes down to the question of what the pctx_containingPackage is when dynamic code is running. That value is normally set from the outer context in which parsing is taking place - parsing something within a package. However, for other uses of the Types and Exec code, the package can be any package writeable by the running user. In some senses, the answer to this doesn't really matter, since the situation isn't encountered when just using the regular parser. I think, based on no real evidence, that it is more correct to use the "writeablePackage" from the record type. That would also apply to using that field from a matrix type. 
Note that "baseTypeIsModifiable" is also used with struct types, which do not have a "writeablePackage". [Later: that proc no longer exists] One answer is to pass the type's writeable package into "bTIM". For struct types, that will be 'nil', and so writes to structs are allowed anywhere, unless the struct is defined in a generic, in which case the write is only allowed by code in the generic. An alternative is to check both package values, and only allow writes if ((either one is nil, or values are equal) and they are the same as pctx_containingPackage). Again, since only in very obscure cases will two non-nil values differ, the answer doesn't really matter. "nd_containingPackage" is really only used to indicate where the type is defined - it doesn't have any semantic use. It is the "writeablePackage" values that have the semantic use. So, let's do the first alternative of the two in the previous paragraph - pass the "writeablePackage" into "baseTypeIsModifiable", and not compare against "nd_containingPackage". DONE. LATER: there is no "baseTypeIsModifiable" anymore. Now, "nd_containingPackage" is applied against explicitly 'private' fields.
Check on the use of generic parameters outside of the generic. The nasty programmer can get at them. "ContainsGenericParamSubtype" sort of does it, but why is it only checking for one generic? "GenericParamDesc_t" references the generic it is about, so perhaps all cases on t_kind that default to "allow use" can add a case for tk_genericParam which denies. The other thought from a while ago is that now that there are more fields in Type_t, there can be a bits field, which can contain a flag saying the type includes use of a generic param. Then the searching is only needed if that flag is set. I prefer the "check on cases" method, if it works. [ContainsGenericParamSubtype is in some cases required to only check for generic parameters from the one generic.
In other call situations the test it does is enough, since there are other tests to prevent the use of an uninstantiated type outside of its generic. No change needed. Adding a bit flag to types likely wouldn't help much - most non-simple types are now named, and the search done by ContainsGenericParamSubtype stops at any named type, so there is actually very little searching.]
What happens if I define a type within a proc within a generic? That type may or may not use a parameter of the generic. [Can no longer define types within procs.]
I think the various "containing" fields of a NamedDesc_t should always be set to the values from the PContext_t. It is the presence of a non-nil containingScope that is the test for a type defined within a proc. It looks like that is fine - mostly just change the comment and Types/NamedNew. Hmm. I unconditionally assigned the "containing" stuff in NamedNew, and I get errors in generic/symboltable.z. Everything else I've checked is OK. Aha! Undoing the change to Types.c and I still get the errors, so those errors are something else. Investigate those *first*. Fixed: it was a bad comparison in the "instantiate0" call of "MatrixNew". (New parameter)
090514/Thursday Look at where the various "containing" fields really need to be. They were put in the NamedDesc_t because I didn't want to have them in all Type_t's. Should they actually be in places like RecordDesc_t? I suspect that they are all OK as is, but need to think it through. DONE MORE LATER
090515/Friday Sigh. Doing lots of rethinking and cleanup. Finally got out an error message about assigning to a "'private' (here)" matrix element. But, also lots of errors about not being able to assign the vectors involved. I think I expected that because I haven't done the checks in "assignIncompat" yet. But, I'll need an 'ro' property on matrix types, so that there is some kind of matrix type that you *can* assign such a matrix to. DONE, also 'volatile'. Later - been adding the 'private' to lots of declarations.
It trickles down all over the place - I need to add a *lot* of 'ro's. [Later undone]
090516/Saturday Added a modification to AssignIncompat dealing with matrix types. If the "got" matrix is 'private' to the active package, then it is compatible with a destination without 'private'. The 'private' does not change access rights within that package (it is the "writeablePackage"), so it is cleaner to not require the 'private' to be used everywhere within that package. I ended up declaring types "Lex/CommentVec_t" and "Package/RefTable_t" that contain 'private' within them. That's the way to make a matrix type with 'private' in it, that is not within a struct/record declaration. However, what I find is that there is no way to allocate such a named matrix!! Ok, I've sort-of resolved that. The issue is that the checking done for matrix types in AssignIncompat must fall through if tGot is a matrix but tWant is not, so that checks for one being a rename of the other are done. But, since the actual matrix type contains 'private' in these cases, the variable used to hold the newly allocated matrix must also have 'private' in its declaration. I'm fine with that, but it'll need documenting.
<> Note that you cannot declare such a variable using the named matrix type. That's because the 'matrix' construct returns a matrix that does not have the 'private' flag in its type, and so is not exactly the same as the type that the named type names. You need the explicit 'private' matrix variable as a go-between. This is unfortunately quite obscure. Example:
    type Stuff_t = struct {...};
    type StuffVec_t = [] private Stuff_t;
    /* This does not work. */
    StuffVec_t nVar := matrix([10] Stuff_t);
    /* This does. */
    [] private Stuff_t sVec := matrix([10] Stuff_t);
    nVar := sVec;
<><> Interesting thought.
If you are running some "program" which saves some configuration/profile information, and you tell it to save that information just for the current directory, then, if you have a browser window open on that directory, a persistent object will appear (below the icons, of course) which is that persisted configuration. Cool.
Just got a horrible error message. It complained that a variant record tag is not exported from the package that exports it. The problem is that I was using the field name, and not the field tag. Ugh. This was in a type case statement. Can a better error message be produced in that situation? Fixed.
<> Possible absolute path to all this programming stuff: /System/Prog/Lang/
%%% Hmm. A basic problem with exports to a restricted set of packages is that there is absolutely nothing stopping someone from finding the symbol anyway - they can readily skip past the nk_targets themselves.
%%% Of course, there is nothing stopping people from using Names calls to look stuff up for themselves. Perhaps the key is to restrict who Names exports its calls and type to. Then, it is all opaque to other packages. Will that work? Certainly much of the stuff in Types/Exec/Proc/Package cannot be opaque in that way. But, can Names stuff? That would help a lot. In order to do that, the actual parser code cannot use Names calls. There were two that I've replaced with Package/FindName calls. The parser heavily uses Names/Info_t to deal with names all over the place. So, that must be uniformly exported. But perhaps not the actual Names/Table_t's.
090517/Sunday <><> One of the first "document types" could be "binary". It just formats its value (vector of bits8) to the screen, showing offsets to the left. The data and the offsets can be chosen to be binary, octal, decimal or hex. You can also choose to see ASCII characters in addition or instead. Allow any and all of those formats in the output. See what "od" allows.
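As a possible starting point for the "binary" document type, here is a rough od-style line formatter, sketched in C rather than Zed. The names dump_line, digits_for and put_num, the 16-bytes-per-line layout, zero-padding in every radix, and showing offsets as 32-bit values are all assumptions for illustration; radices 2, 8, 10 and 16 cover the binary, octal, decimal and hex choices above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define BYTES_PER_LINE 16   /* assumed layout, as in common od/hexdump output */

/* Number of digits needed to show maxval in the given radix (2..16). */
static int digits_for(unsigned long maxval, unsigned radix) {
    int n = 1;
    while (maxval >= radix) {
        maxval /= radix;
        n++;
    }
    return n;
}

/* Write v in the given radix, zero-padded to width digits, at out[*pos].
   Digits beyond width are silently dropped. */
static void put_num(char *out, size_t *pos, unsigned long v,
                    unsigned radix, int width) {
    static const char digits[] = "0123456789abcdef";
    for (int i = width - 1; i >= 0; i--) {
        out[*pos + (size_t) i] = digits[v % radix];
        v /= radix;
    }
    *pos += (size_t) width;
}

/* Format one display line of a byte vector into out (at least 256 chars):
   the offset, up to BYTES_PER_LINE values, then optionally the bytes as
   printable ASCII ('.' for anything else). Offsets are shown as 32-bit
   values. Returns the number of bytes consumed from data. */
static size_t dump_line(const unsigned char *data, size_t len, size_t offset,
                        unsigned off_radix, unsigned data_radix,
                        int show_ascii, char *out) {
    assert(offset < len);
    assert(off_radix >= 2 && off_radix <= 16);
    assert(data_radix >= 2 && data_radix <= 16);

    size_t n = len - offset < BYTES_PER_LINE ? len - offset : BYTES_PER_LINE;
    int byte_width = digits_for(255, data_radix);
    size_t pos = 0;

    put_num(out, &pos, (unsigned long) offset, off_radix,
            digits_for(0xFFFFFFFFUL, off_radix));
    for (size_t i = 0; i < n; i++) {
        out[pos++] = ' ';
        put_num(out, &pos, data[offset + i], data_radix, byte_width);
    }
    if (show_ascii) {
        /* Pad a short final line so the ASCII column stays aligned. */
        size_t pad = (BYTES_PER_LINE - n) * (size_t) (byte_width + 1);
        memset(out + pos, ' ', pad);
        pos += pad;
        out[pos++] = ' ';
        out[pos++] = ' ';
        for (size_t i = 0; i < n; i++)
            out[pos++] = isprint(data[offset + i]) ? (char) data[offset + i] : '.';
    }
    out[pos] = '\0';
    return n;
}
```

A viewer would call dump_line repeatedly, advancing the offset by its return value. Since each line depends only on its own offset, a view over a very large value only has to format the lines currently on screen.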
For that document type, having a non-insert editing mode might be useful - then you can reasonably edit very large documents at reasonable cost.
090518/Monday I can essentially get rid of parse_checkLocal and parse_defineLocal. I need to move them into Types, since it is only type declarations that use them any more. I also need to make checkLocal check all of the things that the current Exec/SymbolDefined does, and issue appropriate error messages. MOVED into Package.
Want a 3rd level of export targets representation. The first level is just a vector of packages. The parser must then call into the Package code with a pctx, to check that vector (for duplicates), and to turn it into a new private type, e.g. TempExportTargets_t. That in turn is passed into the various Package component definition routines. When I do the stuff to avoid duplicate vectors, TempExportTargets_t can have already looked up the vector in a table of such, and thus avoided duplicates. The same routine can issue an error if the user attempts to list the active package in the targets list. Actually, the name shouldn't have "Temp" in it - rather it is the representation of an export targets list in the table of such for each package. RESOLVED (PathToPackageVec_t)
Should allow parsing of paths beginning with '.' and '..'. Shouldn't be a problem, I think - the tokens are already recognized - it's just a matter of making the parser recognize them as starting units. DONE
With that, perhaps don't allow use of own package name as a name - require the use of '.'. That frees up the ability to have a contained package or other item by the same name, which most people would expect for package paths, and which I've run into in code before. Also, being able to do it would be very useful, if not required, for shell-script-type stuff. DONE
090520/Wednesday Perhaps could switch to using paths relative to "used" packages for the export targets.
Then, they can share the already existing infrastructure for paths to packages, etc. DONE
090522/Friday <> In the IDE, can have a global-query-replace of all declared variables or formals with name "XXX" to name "YYY".
090524/Sunday Hmm. Zed doesn't seem to accept a unary +, at least for integral types. It does provide some nice symmetry in some cases. ADDED (for float too)
Curious. Got an unexpected error message translating wrung.drc:
    @ Monster_t ro m := @Monsters[i];
    wrung.z(642, 36): *** Name "Monsters" is not defined
    wrung.z(642, 40): *** Have 'ro' '@' type, but need non-'ro' '@' type
    wrung.z(642, 40): *** Value not type compatible with variable
    wrung.z(642, 44): *** Cannot declare variables of this type
It should be "Monster", not "Monsters". Why the second message? FIXED - it was a special case needed in "modifiable" for matrix indexing. The last error message is nastier. It is happening because we are initializing an '@' variable to reference into the middle of something. That something is not known. The default for indexing is matrix. So, the code thinks we are taking the '@' of an element within a matrix. That needs protection from the overall freeing of the matrix (Exec/SaveRefToInnerVar), and that is done by creating a temporary variable to reference the matrix as a whole. However, the type involved here is "Error", since we don't have a type for the unknown name. We don't allow declaration of variables of type "Error", so the error message is emitted. Ick. I believe that in the past I concluded that I cannot suppress, in general, that error message, since there are situations in which the programmer can arrange to have "error" be the type for a declaration, without there having been any previous error messages directly in the declaration. I want the Exec sequence to be marked as erroneous, and the way to do that is to emit an error message. *Perhaps* I could just call a routine to increment the error counter without an error message in this case.
However, the check is inside Types/CheckDeclarationType, called from Package/DeclarationsStart, so it would require some special-casing. Current choice: do nothing. One possibility is to make the default for indexing something unknown to be array, instead of matrix. Could be other bad consequences, however. That appears to be easy enough to do, in Exec/IndexingNew. I'll give it a try, leaving a comment as to why, and see what happens over the next years. Well, that got rid of this particular error, after I added handling of error Exec_t's in Exec_Binary/saveOuterRef. Later: Seems to be OK now.
090525/Monday This is really strange. Probably the C lexical scanner. I've been trying to comment out part of the big string in wrung.z/instructions. I've used both '//' and '/*-*/' style. They end up *NOT* commenting stuff out - the stuff I'm trying to comment out is not displayed, and other stuff isn't! Moving the stuff (without changing the commenting) outside the proc fixes it. FIXED. Later: without actually exploring the problem, I realized that it is my use of the string buffer for comments. You normally can't intermix, but with string breaks you can. I could just arrange to totally discard comments within a string, but I think I'll switch to using a separate buffer for comments, so that I can at least complain about them not being consumed. FIXED.
090526/Tuesday I allow regular programmers to create, assign, compare and pass around pointers. It is probably wise to prevent them from doing any pointer arithmetic, so that a pointer given to regular code cannot be modified, and so can be trusted when passed back in to privileged code. DONE
090527/Wednesday Article on IDEs on /. From comments: Ctrl-Click on functions to jump there, project tree, file outline, etc.
Eclipse is a bit slow to start (seconds) and uses a *lot* of memory: http://www.eclipse.org/
CUAS (Common User Access Standard)
KDevelop 4
QtCreator: http://www.qtsoftware.com/products/appdev/developer-tools/developer-tools#qt-tools-at-a
One key thing is that for small projects, the way things like Eclipse force you to make a "PROJECT" out of it just gets in the way.
Added code to Exec/assignIncompat, *after* the skip down into subtypes of ref, pointer and matrix, which checks for either being "Error", and returns with no more errors if so. I believe this is safe, and it does help get rid of some unneeded error messages.
<> I can't try "Any" stuff until I've got the '#' operators, etc., but I had a couple thoughts just now and would like to put them down. Here is a little test proc that has a local variable that changes from "uint" to "float" to "string" over time.
    use /Any;
    proc doit()void:
        any var #:= 1;
        [] any myVec #:= getVec();
        Fmt/FmtL("A: var = ", var);
        for uint i from 0 upto getBound(myVec) - 1 do
            var #:= var #+ myVec[i] #* 0.27;
        od;
        Fmt/FmtL("B: var = ", var);
        var #:= "The final answer is: " #+ var;
        Fmt/FmtL("C: var = ", var);
    corp;
With a different parser, designed to not use types, the above could omit the type in the declarations of "var" and "myVec", and omit the "uint" in the 'for' loop. It would also use operators without the leading '#'. One tricky bit is that some registration is needed in order to allow the Fmt code to properly deal with 'any' values. Right now, it will just print an address. Perhaps it would work if /Any exported a type, e.g. /Any/Any_t, and that type has a "fmt" proc attached to it. Hmm. I don't think you need to use 'any' for this - just define your own variant record, and define all of the '#' operations for it. The use of 'any' is only needed if we want to allow straight ":=" for assignment. Hmm. Even that is not needed if all values are of the variant record type. Only the operators need to be '#' ones.
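The variant-record alternative in the last paragraph might look roughly like this, sketched in C rather than Zed. Value, val_add and val_mul are hypothetical stand-ins for the variant record type and its '#+' and '#*' procs, and the promotion rule (a string operand beats float, float beats uint) is an assumption chosen so that the doit example above moves from "uint" to "float" to "string".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant value standing in for the "variant record" idea. */
typedef enum { VK_UINT, VK_FLOAT, VK_STRING } ValueKind;

typedef struct {
    ValueKind kind;
    union {
        unsigned long u;
        double f;
        char *s;            /* heap copy, owned by the Value */
    } as;
} Value;

static Value val_uint(unsigned long u) {
    Value v;
    v.kind = VK_UINT;
    v.as.u = u;
    return v;
}

static Value val_float(double f) {
    Value v;
    v.kind = VK_FLOAT;
    v.as.f = f;
    return v;
}

static Value val_string(const char *s) {
    Value v;
    v.kind = VK_STRING;
    v.as.s = malloc(strlen(s) + 1);
    assert(v.as.s != NULL);
    strcpy(v.as.s, s);
    return v;
}

static void val_free(Value *v) {
    if (v->kind == VK_STRING) {
        free(v->as.s);
        v->as.s = NULL;
    }
}

static double val_as_double(Value v) {
    return v.kind == VK_FLOAT ? v.as.f : (double) v.as.u;
}

/* Render a value as text; long strings are truncated to fit buf. */
static void val_format(Value v, char *buf, size_t len) {
    switch (v.kind) {
    case VK_UINT:   snprintf(buf, len, "%lu", v.as.u); break;
    case VK_FLOAT:  snprintf(buf, len, "%g", v.as.f);  break;
    case VK_STRING: snprintf(buf, len, "%s", v.as.s);  break;
    }
}

/* "#+": a string operand gives concatenation, else a float operand gives
   a float sum, else uint addition. Always returns a new Value. */
static Value val_add(Value a, Value b) {
    if (a.kind == VK_STRING || b.kind == VK_STRING) {
        char left[256], right[256], joined[512];
        val_format(a, left, sizeof left);
        val_format(b, right, sizeof right);
        snprintf(joined, sizeof joined, "%s%s", left, right);
        return val_string(joined);
    }
    if (a.kind == VK_FLOAT || b.kind == VK_FLOAT)
        return val_float(val_as_double(a) + val_as_double(b));
    return val_uint(a.as.u + b.as.u);
}

/* "#*": numeric operands only in this sketch. */
static Value val_mul(Value a, Value b) {
    assert(a.kind != VK_STRING && b.kind != VK_STRING);
    if (a.kind == VK_FLOAT || b.kind == VK_FLOAT)
        return val_float(val_as_double(a) * val_as_double(b));
    return val_uint(a.as.u * b.as.u);
}
```

The loop body of doit would become var = val_add(var, val_mul(vec[i], val_float(0.27))). Every operation dispatches on the run-time kind of its operands; that per-operation cost is what the 'compileTime' operator idea below hopes to optimize away.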
<> Need a way to let code like an "Any" package be involved in declarations of variables (and constants?). That way, they can allocate and maintain private information relating to each variable. Then, if the operator procs are actually 'compileTime', they can accumulate information about the use of the variables throughout their scope. With that information, they may be able to immediately optimize by changing the variables to be of some native type (e.g. 'uint'), and letting all of the operations be the corresponding native ones. This is a nice goal, but it's going to take some work to get it all designed. 090528/Thursday What does "Fmt" do with character arrays? Ick. It prints "". Perhaps I need a way to easily convert from 1-D character array to string. Perhaps vice-versa, with allowed truncation. Or, do I just not worry about character arrays as a mainstream thing in Zed? I started a bit into making "Fmt" handle them a bit better, but it would have to generate a "for" loop in the code, with a special case that just unrolls the loop for small enough arrays. Too much complexity for very little benefit. <> Draco made good use of arrays whose bounds were indicated as "*" on procs which accepted them. At run-time, such a proc could obtain the bounds using the equivalent of 'getBound', but the mechanism was different - extra parameters were passed in, and then referenced. This could be similarly useful in Zed. It's a different style of programming, where there are fewer allocations, there are fixed run-time limits (sometimes that is a good thing!), and the run-time cost is somewhat less. If that is done, there could be a CharBuffer entry point which accepts such an array of characters (perhaps a second which looks for left-justified non-spaces). There could also be an X entry for using such arrays for text output. That would have been useful in Wrung. However, it might have been easier to just use strings in Wrung. There was some earlier writing about array slices.
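The idea of 'compileTime' operator procs accumulating information about each 'any' variable can be sketched as a use-tracking pass. This Python version is a sketch under stated assumptions: `UseTracker`, `record_assign` and `native_type` are hypothetical names, and it only narrows a variable when every value assigned to it has the same native kind. Under that rule the `doit` example (uint, then float, then string) stays 'any'.

```python
class UseTracker:
    """Hypothetical per-scope record of how each 'any' variable is used."""

    def __init__(self):
        self.kinds = {}   # variable name -> set of native kinds assigned

    def record_assign(self, var, kind):
        # Called by a compile-time '#:=' for each assignment it compiles.
        self.kinds.setdefault(var, set()).add(kind)

    def native_type(self, var):
        # Exactly one kind seen: the variable could be re-declared with
        # that native type and its '#' operations replaced by native ones.
        # Otherwise (or if never assigned) it stays 'any'.
        seen = self.kinds.get(var, set())
        return next(iter(seen)) if len(seen) == 1 else None
```

A real version would also have to see every use in the scope before committing, which is why the note ties it to involvement in the declaration itself.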
Or perhaps it was only matrix slices? Given the goal of making Zed convenient to use, it would be useful to support slice and array assignments. LATER: I believe it was only matrix slices - the concept came about because of the desire to do substring operations that way. From an old email entitled "Zed amusing": The fix I mentioned earlier about removing an unneeded line has just been added back. However, it is in a slightly different place. That makes sure that I get an error message if a constructor tries to call its base capsule's constructor in the void-like way. Further uses of the constructor will yield a new object of the base capsule type. %%% Even that isn't correct. I'll have to push/pop the flag, so that you can create an object of your base capsule type inside the parameter list to your required call to that capsule's constructor. Sheesh. From another email, entitled "whither DGol/Container_t?": If Container_t remains an interface, then each implementer must implement GetSelfAsContainer and GetWindow. If it moves into the Widget tree, then only Container_t does GetSelf..., and GetWindow becomes a field reference. %%% Thinking about the issue of programmers being able to look through the names in Names/Table_t's. Current thought had been to limit the exports of the various Names routines to only packages like Proc, Exec, Package. Another possibility is to change the meaning of 'private'. If the record or capsule itself is not 'public', then perhaps 'private' fields are not even readable outside of the package, since the lack of 'public' makes the fields not writeable. Perhaps a better solution would be to introduce an explicit 'hidden' property. <> Language statement: the language assumes that a capsule object will not be accessed in another thread or other execution context until after its explicit constructor has returned.
Making the object visible to another thread/context before the constructor is complete is a violation of the language, and yields undefined results. Can get rid of the restriction that 'private' requires 'noInit'. Sure, that might be a bad thing, but if you have an explicit capsule constructor, it doesn't matter. Also, the initial contents of a record or capsule object constructed by an implicit constructor might not be that important - later use of the object may be what needs protection. Simplify the language. DONE 090529/Friday An issue that has just come up when trying to make all references to things in any package be a NameReference_t is that of names in the private or local tables. They are accessible only within the main package. Normally, a path won't see them, since it only looks in the exports table of packages in the path. However, there cannot be a path from a name found in the private or local table, since contained packages are always in the exports table. So, a nil PathToPackage within a NameReference can be used for those names, just as it is used for names within the current (".") package. However, that means that there is no direct indication of which table such names are in. That shouldn't matter semantically, since you can't have the same name in more than one of them, because all are "in scope" within the package. Note that the above is yet another reason why we don't want to allow code to arbitrarily add things to any of the package name tables. <> Should make the IDE do TAB completion on identifiers of all kinds. This includes path completion, just like in a shell. It may in fact be possible to allow spaces in path components, and to still use those freely in Zed code. It could all be handled by some lookahead in the path resolver (Package/ParsePath at the moment). A path component containing spaces will appear at the lexical level as a sequence of names. Later: no! Could it also handle wildcards and/or other path expressions?
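A Python sketch of the lookup rule from the 090529 entry, assuming a package is just three name tables and a NameReference is an optional list of path components (walked from the root) plus a name. `Package`, `NameReference` and `resolve` are hypothetical stand-ins for the Zed structures. A path only ever walks and searches exports tables, while a nil (`None`) path means "the current package" and searches all three tables, which is safe because a name can be in at most one of them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Package:
    name: str
    exports: dict = field(default_factory=dict)
    private: dict = field(default_factory=dict)
    local: dict = field(default_factory=dict)


@dataclass
class NameReference:
    path: Optional[List[str]]   # None: name is in the current package
    name: str


def resolve(ref, current, root):
    if ref.path is None:
        # All three tables are in scope inside the package itself.
        for table in (current.exports, current.private, current.local):
            if ref.name in table:
                return table[ref.name]
        return None
    pkg = root
    for component in ref.path:
        # Contained packages are always in the exports table.
        pkg = pkg.exports.get(component)
        if not isinstance(pkg, Package):
            return None
    return pkg.exports.get(ref.name)    # a path never sees private/local
```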
Should the concept of "path" be introduced as a type and entity in the programming language? Is there any benefit from this? Later: now that you can put paths starting with '.', '..' and '/' right in your code, there is likely no real need to add the "path" concept to the language. 090530/Saturday Allowing spaces in path components pretty much means allowing spaces in identifiers. This comes from the possible 'use' of a package whose name contains a space, thus requiring that the context of seeing an identifier must allow for the spaces. So, it would likely make more sense to do the handling in the tokenizer. It might work. Should I? One area that could be a problem is that of CLI commands - the Unix shells manage to work with spaces in file names, but it's ugly. Aha! At supper Don mentioned declarations. That is one common situation where Zed has two identifiers side by side (type name and variable name). I think that pretty much shuts down the idea of allowing spaces in identifiers - I really don't want the horrible problem of trying to figure out where the actual gap between identifiers is. The magic rule is relevant here: if it's hard for the compiler, then it's hard for people. <><> The concept of tab-completion can be expanded to include the completion of constructs. For example, if the programmer has just entered 'if', and the cursor is on it or just after it, then hitting a pair of TABs could generate the remaining fixed portions of an 'if'. Similarly, and even more drastically, doing the tab-tab inside a capsule would generate dummy procs for all of the interfaces that are required (skipping ones not required because of 'partial's). Without that, programmers are going to see a lot of red. A capsule is marked as erroneous until all needed interface implementations have been provided, for example. Perhaps the programmer should have control over when the IDE calls the parser. I believe I've thought of that before - this just reinforces that desire.
Later: this is partially handled by now only showing the interface/capsule/generic header with the error indication. <> When tab-expanding at the beginning of a CLI command, separate out the possible completions based on where they come from. E.g. "builtins", "commands in current package", "commands in ", etc. This may work best if a pop-up window is used. Such a window should specifically not cover the current input line, and it should go away without any mouse use when the ambiguity is resolved. It should likely allow selection with the mouse, however. 090531/Sunday %%% Limiting the character set for program identifiers is one thing, but limiting it that much for general package and data item names is too much. One possibility is to place the restriction only on identifiers that are explicitly used in programs. But, consider what happens if someone renames a package or item, using say the package browser. Nothing looks odd from that point of view, but if a piece of code had been using that name directly in code, it is now invalid. So, it looks like some kind of quoting scheme will be needed. The usual quoting schemes are already used in Zed for string and character constants. There are "<<", ">>", but those are already in use as the usual shift operators. Similarly for single angle brackets, braces and square brackets. An idea that popped up for me is to not use apostrophes for character constants - just use double quotes like for strings. The tokenizer cannot tell the difference between a single character string constant and a character constant. However, the parser in general can simply accept a single character string constant as being a character constant, and modify its ex_kind to exk_charConstant from exk_stringConstant. I should try this out first, but if it is workable, then I can use apostrophes around identifiers which use an extended character set. A bit strange perhaps, but I believe it is workable.
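The grouped completion list for CLI commands can be sketched in Python. The group labels follow the note; `complete` and `unique_completion` are hypothetical names, and treating "exactly one candidate left" as the moment the pop-up should close is an assumption.

```python
def complete(prefix, sources):
    """sources: (label, names) pairs in display order, e.g. builtins first.

    Returns (label, sorted matches) pairs, dropping groups with no match,
    so the pop-up can show where each candidate comes from.
    """
    groups = []
    for label, names in sources:
        matches = sorted(n for n in names if n.startswith(prefix))
        if matches:
            groups.append((label, matches))
    return groups


def unique_completion(groups):
    """The single remaining candidate, or None while still ambiguous."""
    names = [name for _, matches in groups for name in matches]
    return names[0] if len(names) == 1 else None
```

Keeping the source labels in the result, rather than flattening early, is what lets the pop-up separate "builtins" from package commands.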
The pretty printer would need to examine the characters of an identifier in order to know how to print it. Alternatively, there can be a flag in many of the places which have identifiers. Note that the flag would need to be updated by *any* mechanism which changes the name. <> When the IDE is asked to produce a "GUI project", and creates all of the code, etc. for creating a main window, menus, dealing with menus, etc., one of the menu items it should create is "Preferences". It should be able to create the complete code, etc. needed to deal with preferences. It can create a struct in the project package called "ToolPreferences_t", and another called "ProjPreferences_t". It can even create a package-level variable (declared as 'local') of the first type. Each struct will contain just one field, called "tpref_replaceMe"/"ppref_replaceMe". Type can be anything, like 'uint'. The "preferences" menu will bring up the preferences editor, which is a Zed library routine that basically does all the work. It should have hooks which can be set on a per-field basis, that allow the program to constrain preferences fields. The library code can read the current preferences values from the proper places (I'm not sure if I want to go with scanning the path from "~", or put all of them in one directory such as "~/.Preferences" (which is introducing the concept of a "hidden" file)), allow full editing of them based on the field types, and save them to the appropriate places. Any type that has an "editor" is a valid preferences type, all the way to images, movies, sounds, etc. If the preferences are all in one common place, then there can be a "preferences" tool that uses the "+-tree" style interface to allow editing of all of the preferences. Note, however, that it won't itself be able to constrain the values like the actual programs can. Perhaps there could be a way to mark the fields as only editable by the real program - the general tool can only view such fields.
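A Python sketch of the per-field hook idea, assuming a preferences struct maps naturally onto a dataclass. `PreferencesEditor`, `set_hook` and `edit` are hypothetical; `ToolPreferences_t` and `tpref_replaceMe` are the generated names from the note. The editor rejects a value whose type differs from the field's current value, then asks the program's hook for that field, if one was set.

```python
from dataclasses import dataclass, fields


@dataclass
class ToolPreferences_t:
    tpref_replaceMe: int = 0   # short description shown in a rollover box


class PreferencesEditor:
    """Hypothetical library editor; the program only supplies hooks."""

    def __init__(self, prefs):
        self.prefs = prefs
        self.hooks = {}   # field name -> predicate on a proposed value

    def set_hook(self, field_name, predicate):
        self.hooks[field_name] = predicate

    def edit(self, field_name, new_value):
        """Apply the edit if it passes the type check and any hook."""
        if field_name not in {f.name for f in fields(self.prefs)}:
            raise KeyError(field_name)
        current = getattr(self.prefs, field_name)
        if type(new_value) is not type(current):
            return False
        hook = self.hooks.get(field_name)
        if hook is not None and not hook(new_value):
            return False
        setattr(self.prefs, field_name, new_value)
        return True
```

A general "preferences" tool would be this editor without the program's hooks, which is why the note suggests letting it only view program-constrained fields.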
If there is such a tool, then it will need to present the individual preferences hierarchically, which means that there would need to be a system-defined hierarchy, and all programs must say where their preferences fit in that hierarchy. Getting a bit messy. This also implies an install/uninstall process which extracts the preferences structure and documentation from the program being installed, so that they can be properly added to the preferences. It *would* be possible to do that on demand, using the actual structs in the actual programs, but there still needs to be the hierarchy. Hmm. Perhaps a program should simply contain, in its top-level package, an object of type string which lists its position in the preferences hierarchy. A '//'-style comment in the 'struct' type that defines the preferences is the short description of the field, and it can show up in a rollover box within the preferences editor (either kind). There can be a "field" comment before the field that has a larger description, and it will be displayed if the user asks for it (not sure how - perhaps the ubiquitous "HELP" key). Later note: it might be simpler to mark "sections" by having inline structs - each such struct is a new tab. Also, going straight from comments and field names means there is only one language for the text. Perhaps there needs to be a convention for putting resource numbers into the comments, so that multiple languages are supported. Possibly, if those numbers are not found, just use the comments, as described here. 090601/Monday <> The whole issue of spaces, punctuation, etc. in "identifiers" comes out of my decision to unify the user-visible "folders-and-documents" concept with the programmer-visible "packages-and-procs/variables/constants" concept. I don't think I want to reconsider that decision. I still think there is a lot to be gained by unifying the concepts. For example, having only one "view/edit" environment is good, I think.
Being able to persist values as simple items in a folder, and still access them trivially from code is a good thing. There is clearly a conflict. Within code, I want identifiers, etc. to be restricted to the Western subset of letters, digits and underscores. For general "folder" and "file" names, I want there to be no restrictions on the character set available. One possibility is to basically do nothing about it. Thus, within code, it is only possible to access those packages/etc. whose names are valid identifiers. Outside of code, you can do whatever you like. But, what happens if a package has an identifier name, gets used in code, and then is renamed to be non-identifier, using tools outside of code? The answer to this depends on "linking". If a chunk of code has never been loaded into memory, and a package it references gets renamed to another valid identifier, what happens? (Similar questions arise relating to other ways that "paths" can become invalid.) Is the reference in the code a "hard" link, or is it a "soft" (symbolic) link? I believe that such references should be "soft" links, that need to be resolved when the code is loaded into memory. An example situation is one where someone writes a "script" that does something with items in a package named "Current". The user cycles through stuff on a regular basis, and the script is part of that cycling. The user may be renaming packages as part of the cycling process, and so wants the name "Current" to refer to the latest package with that name, not the package that had that name when the script was originally written. In this case, the "soft" link is what is desired. There are likely examples wanting "hard" links too, but I think most of those will be using paths that aren't quite as relative, and which do not change very often. Such a user is likely happy to modify the script when the package is renamed (or the first time the script fails to load).
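The "Current" script example can be sketched in Python to show the behavioural difference between the two kinds of link. `Folder`, `HardRef` and `SoftRef` are hypothetical: a hard reference captures the object when it is created, while a soft reference keeps only the name and looks it up again each time the code is loaded.

```python
class Folder:
    """Hypothetical package: just a name -> item table."""

    def __init__(self):
        self.items = {}


class HardRef:
    """Binds to the item itself at creation; survives renames."""

    def __init__(self, folder, name):
        self.target = folder.items[name]

    def load(self):
        return self.target


class SoftRef:
    """Stores only the name; resolved again at every load."""

    def __init__(self, folder, name):
        self.folder, self.name = folder, name

    def load(self):
        # None models "the script fails to load" after a rename.
        return self.folder.items.get(self.name)
```

After the user renames the old "Current" and creates a new one, the soft reference follows the name while the hard reference keeps the original package.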
However, if some code is in-memory when a package it refers to is renamed, the in-memory reference to the target package is not invalidated. If the code is viewed, the package will have the new name, which may no longer be a valid identifier. What should happen? Perhaps one possibility is that the pretty-printer never actually looks at pk_name when it is printing things - it always simply shows paths. This concept has only just occurred to me, and it is one that I like. On second thought, maybe not. The result is that code would run, continuing to refer to the renamed item, but examination of the code would show something that either refers to no item, or refers to some other item. So, the pretty-printer should likely use the names directly from any packages involved in paths - the paths are only used when actually resolving these references. So, this leaves me requiring a way for code that is pretty-printed to show non-identifier names in paths. Note that there is nothing preventing code from making full use of package and data names that are not identifiers - it just cannot make references to them directly in the code itself - the references must be indirect, through strings, references to packages and Name_t entries, etc. I'm not a big shell-script user, so I don't really have the right instincts to know how this will affect those. I do know that the Unix shells can deal with such names, but the escaping mechanism is a bit ugly. One remaining issue is that of actual command lines. At that level, users will need to be able to access non-identifier names fairly directly. The standard solution is the various escaping mechanisms used in shells. As mentioned above, those can be a bit ugly. If the parsing of commands is done using the normal tokenizer, then that tokenizer will need to be able to deal with such names. This could be a different tokenizing mode, where it allows a much larger character set in "identifiers".
However, then we end up allowing that character set in non-path identifiers, such as field names, local variable names, etc. I could simply disallow that at the language level, by having two kinds of name tokens produced by the scanner. (Another brand-new-right-now idea.) I expect I could only do that restriction for certain kinds of names, however. Some package-level symbols would need to allow the larger namespace. Here's a thought: if I do the "one-character string is char" concept, and use apostrophes as an extended-identifier quoting mechanism, then a hard rule of acceptability of Zed code is that no names (identifiers) within the code use the extended identifiers. That will be imposed for all names - local variables, field names, directly referenced package names, proc names, etc. The check is easy to do automatically. So, it is "valid" Zed to use apostrophe-quoted extended names, but it is not "acceptable" Zed. The former is enforced by the scanner, compiler, etc., but the latter is enforced by people. Having made changes to the Exec code to use single-character string constants as character constants, I find the problem is with the code generator - when it has a string constant, it doesn't know whether to deal with it as a string reference value, or as a simple character value. Ick. This sounds somewhat like the same thing with nil versus nul - so I may have to deal with it in the same way. Hmm. Shouldn't there be an issue with comparing a pointer versus 'nil', just like there should be with comparing a char versus "a"? Ah, there are actually calls to "MakeNull" in Binary. Ick. Having done a bunch of the changes to use single-char-string-constants as character constants, I thought of situations like this:

    char ch2 := if flag then "a" else ch1 fi;

This is hard to handle - I don't know that "a" needs converting until I see the ch1. There could be several, and they could be nested further. Then it occurred to me that I could use ''stuff'' for non-simple names.
That can be handled by the lexical scanner without too much trouble. Good. However, I'm eventually going to need unicode (or whatever) char and string constants. It might be good to use ""stuff"" and ''X'' for that. What does C do? How does X handle internationalized strings, etc.? Maybe the absolute simplest thing to do is to have the pretty printer just replace all non-Western-ASCII characters in names with '_'. It's simple, and it works. [Later: no longer need ' for char constants - the stuff with a single-char string constant now works properly. This was aided by new stuff for dealing with the alternatives in 'if' and 'case' constructs. So, ' is now free for other use.] 090602/Tuesday There are several calls to "CreateReference" from packages like Debug, Fmt, Lists, etc. That functionality is needed. Could call "export(xxx, yyy, ...)" a "qualified export", and "export" an "unqualified export". An issue I can't seem to get past is that of access to symbols in the same package as code that is running (often at compile time). Clearly such references should be just fine. However, normally references to things like private symbols are restricted to code that is being compiled for the package containing the private symbols, and so is in pctx_containingXXX. When code is *executing*, however, those pctx references are not there. Hmm. If compileTime code in package XXX inserts a call to its private proc "yyy" into code in package ZZZ, it seems likely that code in package ZZZ can examine the code produced by the XXX code and find proc "yyy", whether "yyy" is private or not. Thus, perhaps the answer is again to "do nothing". Only a package contained in pctx_containingXXX can reference non-exported symbols in that package. MORE LATER I had thought about splitting out the concepts of a RemoteReference_t and a LocalReference_t. The former is a reference to a name in some other package, and the latter is a reference to a symbol in the current package.
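The "just replace with '_'" option for the pretty printer is small enough to show directly. This Python sketch assumes "Western-ASCII" means ASCII letters, digits and underscore; `printable_name` is a hypothetical name.

```python
def printable_name(name: str) -> str:
    """Map every character outside [A-Za-z0-9_] to '_' for display."""
    return "".join(
        c if c.isascii() and (c.isalnum() or c == "_") else "_"
        for c in name
    )
```

The mapping is lossy: "a b" and "a-b" both print as "a_b", so printed code could no longer be read back to the same reference, which is the price of the simplicity.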
The problem is that in some cases I need either one. For example, when specifying the interface that a given interface extends, the extended interface can be in the current package or some other package. Needing to pass in and preserve both kinds of reference, and having one be nil all the time, is not a good way to save space. Saving space is what the splitup was going to be about. 090603/Wednesday <> What kind of arguments do 'command' procs accept? Perhaps an optional initial one that is a record or struct containing things like the current package, current user, etc. Will likely want a kind of parameter that is the result of calling Package/ParsePath on the user-typed parameter. That works for finding if a symbol exists, and if it doesn't, allowing it to be created. Well, I guess I expected it from the changes I was doing to make NameReference_t work better. I have to make all of the little formatting routines in Fmt exported, since the Fmt code generates calls to them in the code that it is operating within. DONE 090604/Thursday Hmmmmm. I have no way to get a NameReference_t for the root package! FIXED. Symbol links are an issue. With them, a ".." does not necessarily get you back to where you were, since it goes to the parent of the current package, rather than to where you were before. So, I really need to allow for and preserve a full general path. Perhaps the right thing to do is to have ParsePath always build and return a path structure. A lot of the time we will discard it, but perhaps that doesn't matter. RESOLVED When working from a CLI, will likely need a "FollowPath", that simply returns the Names/Info_t that is at the end of the path. None of the other things that ParsePath returns are needed. Later: it's much better now. 090605/Friday Want ExportTargets_t to contain a vector of PathToPackage_t, and not a vector of Package_t. It ought to be able to share parsing code with things like ParseUsePath. 
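A Python sketch of why ParsePath should build and return a full path structure once the 090604 "Symbol links" exist. `Pkg` and `parse_path` are hypothetical: the returned trail records every package actually visited, so '..' goes back to the package the walk came from instead of to the link target's structural parent. Falling back to the structural parent when there is nothing left to pop is an assumption.

```python
class Pkg:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.children = {}   # name -> Pkg (a link just names another Pkg)

    def add(self, name, target):
        self.children[name] = target
        return target


def parse_path(start, components):
    """Return the full trail of packages visited while walking the path."""
    trail = [start]
    for comp in components:
        if comp == "..":
            if len(trail) > 1:
                trail.pop()                   # back to where we came from
            elif trail[0].parent is not None:
                trail[0] = trail[0].parent    # nothing to pop: real parent
        else:
            trail.append(trail[-1].children[comp])
    return trail
```

A CLI "FollowPath" would simply return `parse_path(...)[-1]`, discarding the rest of the trail.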
Want to actually move the checking of 'use' names right into ParsePath, I believe. Right now it is in both ParseUsePath and ResolveName. DONE. Hmm. The way I do generic handling of comments in the parser, using utility "handleInnerComments", will basically require a capsule that is extended by the various places that use that proc. That means that the temporary struct for constant/variable declarations, Package/TempDecl[Private]_t, will need to be a capsule or at least a record type. It won't work with it being an '@' to a struct type. So, that was a bad one to try the '@' struct experiment on. 090611/Thursday Just had a bug with a capsule "extends". A name was given that was not a capsule. The parser gave an error but hadn't cleared "nrExt", and this then got a SEGV inside capsule code, since it was testing nrExt. This suggests that I should perhaps only pass in the nr, and not extract the extended capsule in the parser. DONE. Should do for the other cases where I switched to passing both in as well. DONE. 090612/Friday Reminders to do:
- check qualification lists in redeclarations (should be the same) DONE
- (may not be needed, not sure) make C code in Types.c able to do qualified exports. DONE, but turned out not needed.
- if the experiment of using struct types for TempXXX fails, then undo the one used for constant/variable declarations. See test/privStr.z (2009-06-20 - with structs now like records, that test fails properly.)
- allow for targets lists in several other package element kinds. Hmm. Perhaps not really needed - just on types and procs. Ah, perhaps on interfaces, generics and capsules too. Hmm. Also variables. May as well on constants as well. DONE.
As of this morning, Display is not showing some type names/paths correctly in test/privStr.z and test/proctest.z . FIXED. 090613/Saturday %%% I think I should not need Package/FindPathToPackage at all, except perhaps in Package/CreateReference.
Instead, the two cases in Display that now use it shouldn't have to, which means that the type names they use it with should actually have NameReference_t's. I'm thinking that the tk_named alternative should perhaps be a NameReference_t whose nr_info references the NamedDesc_t. Hmm. When you use a capsule or interface as a type, I have a NamedDesc_t for them, but where do I put a NameReference_t? Also, should I have a separate tk_instantiation which indicates that a type is from an instantiation? That way I should be able to avoid looking at all of the instantiation elements each time. Or something. [Routine Package/TypeIsInInstantiation is used only in one specific instance (NamedNew), and that special case needs to do the search. The key with that special case is that the original type will be found in the instantiation, but a rename of that original type will not. Removed TypeIsInGeneric. The remaining scans of the generic appear to be necessary.] You can now use field selection right after a Package_t expression and it will select a field from the Package_t. This does not work for either Type_t or for procs. It doesn't work for Type_t because there is already a syntax which uses '.', and that is a tagged constructor for a variant record. It doesn't work for procs because the type of a proc is not Proc_t, but is the proc type dictated by its parameters and result type. %%% Perhaps Zed is completely broken. All Package, Types, Proc and Exec structures are completely examinable by anyone. Thus, any type can be found and examined, any proc can be found and perhaps called, etc. An earlier comment said that qualified exports might not really be much use, since you can always look through the ExportTargets_t to find the actual value. With this situation, Zed is perhaps a nice language to program in, but you cannot use it for situations in which information must be hidden. I would *like* to be able to hide some types/procs/etc., but perhaps that isn't crucial.
What is crucial is being able to hide data. Currently, I don't think there is any way to access local (stack) variables that you don't have an '@' to and that are not in your frame. Also, I don't think you can currently access package variables. Package *objects*, if/when I ever get such things, are a different matter. However, if you have a reference to a record, it seems likely that you can look through the type/code of whatever gave it to you, and so find the definition of the record. With that, you can likely dynamically create code to read the fields of it. I don't think you can modify the fields, if the record was not 'public', or the field is 'private'. [Later: just having the record type descriptor allows you to read the fields of a record object. Even later: not if the fields have been marked 'private'.] Another note: for a long time there will be something like a "ZedWorld" file. If there are Zed means to open files on the host file system, then a programmer can read ZedWorld, and extract anything they want from it. Given the things that operating systems allow you to do, protecting against this is going to be tricky. Perhaps read any file that Zed code asks to open, and check whether it is a ZedWorld or not. If it is, then do not allow the open. Also do not allow any host OS calls that would allow things like switching around of file descriptors. Perhaps a test could be worded this way: there is a package /Sys/Secret in your Zed system. It contains a string object named "Word". Write a non-privileged Zed program that shows the value of that string. One possibility is to add another field attribute, "readable". By default, fields of non-public records are not "readable". The non-public records in the system packages would then have "readable" specified for all/most of their fields. That should keep other new record types not readable, even if references to values are passed around.
The key here, I think, is that I *want* all/most of the Package, Types, Proc, Exec types and values to be fully readable, whereas most programmers do not need or want that. [There is now the 'private' field attribute.] Mentioned somewhere above is the idea of a system "FieldRead" proc, which is privileged, and carefully reads the value of a field of a record, given a record reference, a reference to the type of the record reference, and a reference to the field description. If I have such a thing, either it should accept something similar to Package/PContext and so verify the access the same way that the Exec code does, or it should simply refuse to access fields that not everyone can read. Similar things could likely be done with a "FieldWrite", but again, it has to be very careful. Is this good enough? Let's say someone has an entry point in their private code that returns a reference to an anonymous record type (one that is exported, but is not 'public'). Zed allows programmers to use that proc as a value. From that value, they can obtain the Types/Type_t of the type, and thus learn what the fields of the record are. I think they now won't be able to examine or modify the fields of the record. However, there seems to be no way of hiding all of the data structures reachable from that first record type from outside programmers. One thought I had had was to make "Types/SkipOneName" specifically not skip a name if the named type is itself named. That would let anonymous types work. However, there is nothing at all that prevents someone from writing their own version of the proc which does not have that restriction. Similarly, since they can look through any Exec_t structures, they can get at all code reachable from the one entry point they are allowed to call. Hmm. Perhaps the key here is to not allow a proc to be used as a value unless it gives permission?
Again, that goes back to the better way of having things - the default is no access, but my system stuff will allow access explicitly. Similarly for types. So, for procs, it could be:

    export(MyFriend) ProcType_t: proc readable compileTime myProc()void:
        ...
    corp;

For types it could be:

    export(MyFriend) type readable MyType = record public {
        ...
    };

Will that work? It helps, but if all of the Package structures are fully readable, then a programmer can still get at everything. That suggests that the package structures *not* be readable. Instead, I should perhaps provide a few accessors that carefully provide limited access. This is similar to the idea written about earlier of not making the Names stuff public, since I don't want people to be able to add things to my Names/Table_t's. This makes sense, since I don't in general want people to be able to look through the contents of all packages in the system - there will be access controls on packages, similar to how there are access controls on folders/directories on other systems. Perhaps "readable" isn't the right word here, given that you would need to specify it to be able to use your *own* procs as values. And that would mean that anyone could use them as values if you exported them. So, it would seem that you shouldn't export such procs. That should be OK. Ick! Exporting any "readable" proc means that others can look inside it, and then get at any procs it calls. Is this a big issue? Likely not, since if the user wants a proc pointer, they can just create their own proc that does nothing except call the first one. Also, the "readable" property can interact with packages, in that code in the same package as the proc can always use it as a value. At supper, no better suggestion for the reserved word was found than "value". Check into whether you can grab a method from a capsule or interface value and use it as a Proc_t value.
I think that is already prevented - something about the type of the actual Proc_t not being known at compile time. Both are actually allowed. But that shouldn't be a problem since you can get the Proc_t values from the interface/capsule contents anyway. You can't declare a proc type with 'poly' in it, so you can never 'procAssign' such a Proc_t to a variable in order to call it. Can you get around the above by dynamically generating a call to it? [Yes, you can, but it doesn't gain anything - you can't call the uninstantiated one anyway.] Hmm. Maybe there is an attribute of types, somehow, that makes values of the type not examinable. E.g. even though a variable is of type Rec_t, it is actually of type "hidden Rec_t", which prevents selection of any fields of the record type, either for reading or writing. That attribute works like 'const' and 'volatile', although it would seem to need to be an actual type attribute rather than a location attribute. When doing something like: hidden OtherPk/Type_t := OtherPk/NonPublicType_t; the 'hidden' attributes must work like 'ro' does. If the assignment goes out of the owning package (OtherPk), then the 'hidden' attribute is automatically added to the Type_t value type unless the type is declared as 'public'. Dunno. A problem: if other packages are allowed to declare their own variables of your private type, they can get at your type in a way that it is no longer private, by simply looking through the Exec_t stuff for their own proc, and finding the local variable declaration (or similarly finding some other use of your type). A later thought: what if there is a new kind of type, tk_private. By having it be a kind, it is actually there in the type structure. That variant is a small record, and that particular record type is private and not exported from package Types. Furthermore, there is no exported routine in package Types that will skip past a tk_private node in a type. 
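The tk_private idea above, where the wrapper node really is in the type structure but only query routines like IsRef look through it, can be sketched in Python (Zed itself is not available to run here). All names (RefType, NamedType, PrivateType, _skip_private, is_ref) are hypothetical stand-ins, not the actual Types package. Python cannot truly hide a field, so the leading underscore only marks what Zed would leave unexported.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class RefType:
    """Stand-in for a reference type node."""
    target: str


@dataclass(frozen=True)
class NamedType:
    """Stand-in for any other ordinary type node."""
    name: str


@dataclass(frozen=True)
class PrivateType:
    """Stand-in for a tk_private node wrapping a hidden subtype. In Zed the
    record behind this node would not be exported from package Types."""
    _hidden: "TypeNode"


TypeNode = Union[RefType, NamedType, PrivateType]


def _skip_private(t: TypeNode) -> TypeNode:
    """Internal to the (hypothetical) Types package; never exported."""
    while isinstance(t, PrivateType):
        t = t._hidden
    return t


def is_ref(t: TypeNode) -> bool:
    """Exported query routine: looks through tk_private so normal use works,
    but only answers yes/no and never hands back the hidden type."""
    return isinstance(_skip_private(t), RefType)
```

The exported query answers questions about the hidden type without ever returning it. As the 090614 entry notes, that only holds if nobody can read the definition of the wrapper node itself.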
The query routines, like IsRef, will skip over it, however, since that is needed in order to be able to make much normal use of the type. Is this workable? Does it "shut the door" for types? Dirty pool! An article on Slashdot mentions how someone can learn a lot about your recent browsing history even with JavaScript turned off. They have a big page, which contains links to lots and lots of pages that you might have visited. They use CSS to specify different images for visited versus non-visited links. Thus, by watching which images your browser requests, they can see which of those pages you have recently visited. 090614/Sunday Does the 'private' type kind actually help? Since Types/Type_t is fully exported, and I likely need to allow it to be used as a value (is that true?), you can simply look inside it and so find the definition of the tk_private variant. Presumably then you can dynamically create code to skip over a tk_private node. The basic problem with all of this is that because you can look through all of the Type/Exec/Proc/Package structures, you can violate the compile-time rules at run-time. Is there any reasonable way to prevent that? One idea is to make the various structures *not* readable by everyone. Instead, publish what they look like, and provide accessor routines for everything. Those accessor routines are constructed so that they *do* follow the compile-time rules. An approach that seems to work is to change field properties so that a field can be unreadable outside of its package of definition. (If it needs to be readable in some specific associated packages, then there can be qualified export accessor procs.) I don't think it matters whether this is unreadable because of a field property, or unreadable because the record type is not exported, or for some other reason. The key is that the access to the field is done by Exec code, and Exec code requires a PContext_t, which in turn contains a Package_t reference.
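A minimal sketch of the check described here: field reads go through an Exec-style routine that is handed the caller's context, and that routine compares the context's package with the package the record type was created in. Python stands in for Zed, and Package, PContext, RecordType, field_read and AccessDenied are hypothetical names, not Zed's Package_t/PContext_t API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical stand-in for Package_t."""
    name: str


@dataclass(frozen=True)
class PContext:
    """Hypothetical stand-in for PContext_t: carries the caller's package."""
    package: Package


@dataclass(frozen=True)
class RecordType:
    """Hypothetical record type that remembers its package of definition."""
    name: str
    owner: Package
    unreadable_outside: frozenset = frozenset()  # fields hidden from other packages


class AccessDenied(Exception):
    pass


def field_read(ctx: PContext, rtype: RecordType, record: dict, fname: str):
    """Read one field, denying access when the caller's package is not the
    owner and the field is marked unreadable outside its package."""
    if fname in rtype.unreadable_outside and ctx.package != rtype.owner:
        raise AccessDenied(
            f"{rtype.name}.{fname} is not readable from package {ctx.package.name}")
    return record[fname]
```

Because the package comparison happens inside the accessor rather than in the caller, hand-written or dynamically generated code gets the same answer as compiled code. That is the point of routing every field access through Exec.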
The Exec field-accessing code can check the PContext_t Package_t against the Package_t that the record type was created in (or is private to), and deny the access. With this, there is no way to skip over a tk_private node. For procs, I'm leaning towards the idea of requiring a flag on the proc which allows the proc to be used as a Proc_t value. Currently, the assignment of a proc to a Proc_t destination just happens, but I can change the semantics of it to require a run-time check. Since I'm hoping that most code in the Zed universe will be pretty public, perhaps the default is that a proc can be freely used, so a flag is needed to prevent it. So, 'private' can be used for that purpose. Hmm, back to types. There is room for a bool flag in Type_t. So, perhaps add a 'private' flag just like for procs, and do the same thing - prevent such a type from being used as a Type_t value. No need for tk_private then, and field accesses don't have to change. The key will be that you can't get at the Type/Exec stuff within someone's code if they have marked all exported types and procs as 'private'. Of course, I need to protect the Package_t fields to prevent others from finding things in packages that they aren't supposed to find, and that might be best done with non-readable fields. Or perhaps just don't publicly export Package stuff. Further, make the conversion from a proc to Proc_t, and from a type to a Type_t, require an explicit proc or type constant. Thus, the check can be done right at compile time, and no run-time check is needed. Later: Wrong! The run-time check is needed. Consider a proc, written by the external programmer, that calls a 'private' proc or uses a 'private' type. All the programmer has to do is look inside his own proc, and so obtain a Proc_t or Type_t of the 'private' entity, from the Exec_t structures of his own proc. Later: but, *how* is the run-time check done? Is it all or nothing, i.e. a 'private' proc or type simply cannot be used at all?
Even that is ugly, because the check would be needed on any assignment of Proc_t or Type_t, since at that point that's what the values already are. %%% Early morning thought that *might* work! Introduce new tk_private and new exk_private. The former points to a record containing the package of definition of the subtype and the private subtype itself. That record has unreadable fields. This is introduced by the 'private' keyword used as a type constructor. Outside programmers can get hold of the type, but cannot go down into the private type. For procs, don't do it at the proc level, do it at the Exec_t level, with some kind of 'private' construct. It does the same sort of thing (except the package is not needed since it is already known from the proc, but perhaps it is useful anyway, for Exec_t's that are not within an explicit proc) - it produces an exk_private node which points to the Exec_t that is private, and that exk_private record type itself is not readable outside of the Exec package. The pretty-printer itself may not be able to go through tk_private and exk_private nodes. The only way is if it can call into Types/Exec procs to ask permission. In order to safely do that, it must pass in something like a PContext_t, which contains the package relating to the active InteractiveSession context which is doing the pretty-printing. A bit more thinking suggests that the 'private' construct for the language can work just like the 'private' type kind - just put the keyword in front of some other Exec_t. On pretty-print, if the pretty-printer cannot go down past the exk_private, then that is all that is displayed. If it can, then display the 'private' in front of the hidden Exec_t. 090615/Monday When comparing qualified export target lists, ran into a problem when trying to switch from the new, different one to the old one when defining a proc after a pre-declaration of it.
The old one, from the Names/Info_t, is essentially "[] ro Package/PathToPackage_t", but the new one does not have the 'ro'. Changing that would propagate through into the Package code, and would not let it modify them. I think the right answer here is to export type "PTPVec_t" from package Package, along with a routine to add a new ptp to an existing PTPVec_t. That type can use 'private' in the vector type, and things should then work out, just as they did with Lex/CommentVec_t. DONE 090616/Tuesday The only remaining uses of "parse_defineLocal", which calls DefineName, are for several type declaration things: enum tags, set oneof tags, and the selector, index tags and field names in variant records. I don't think the selector and field names need to be defined. The thing is, I don't think I actually want to export Package/DefineName, since it allows a caller to define any kind of Names/Info_t in a package's tables. All of the other names that are defined are subject to close control in Proc/Package, and I think that is better. RESOLVED Note, however, that I think I went through the issue of defining names before. I may have decided to continue defining the field names at the scope level in order to force programmers to make them all unique. With no definitions at the scope level, Zed becomes like C, in that you can use the same field name in as many different structs/records/unions as you want. I personally don't like that - I prefer the use of taglets. Hmm. I think this means that currently I am checking against the scope, but not defining in the scope! Nope - the only check is for the variant record index tag. Why are the names of generic types and procs added to the package symbol tables? Aren't they always used qualified by the generic or by an instantiation? Hmm. It could be because those names can be used directly, from stuff *inside* the generic. That stuff also needs to be able to reference those symbols via a NameReference_t.
Different ways to handle such symbols could be created. That would allow a symbol inside a generic to be re-used outside the generic, or inside another generic in the same package. I don't know that I would be happy with that, though. Mind you, packages like Lists, which do several similar things, might end up wanting to do that. Dunno. Do nothing for now. Same day: ALL DONE. Well, not quite, there are some issues. Sigh. NOW ALL DONE. <> Pretty much any error box or requester that comes up in the GUI should have a "Help" button explaining what went wrong, with suggestions as to why and what to do. For example, if you are in a package browser and you are creating a new package within the current one, and you get a name clash, an error box will come up saying something like: Name "XXX" is already defined in package "YYY". The "Help" button could bring up a box saying something like: Users: The name you entered for the new package is already used inside the package you are trying to create the new one in. It could be the name of a package or subpackage, or it could be the name of some data item (picture, text document, etc.) in the package. You can switch to "sort-by-name" display to more easily find it. Programmers: The name might also be a package variable, capsule, generic, enum member name, etc. Basically, any programming symbol used at the capsule level (whether private, local or public) will prevent the creation of a new item with the same name. 090618/Thursday Heck, for the symbols coming out of types that I *do* want to define, those symbol definitions should be done entirely within Exec/Proc/Package/Types code - there should be no means for external code to define symbols. What they *can* do is create packages and subpackages, and add data items (whatever those turn out to be) to both. DONE When defining symbols in a package, we check for a 'use' by that name. Should do the reverse check when adding a 'use'.
DONE <> For CLI commands, it could be useful if the infrastructure is such that folks can use it to write their own command set. Think about the commands on the YY console, or the ones in yycli. One need might be for a way to say what "kind" a parameter is, so that code provided by a programmer can do tab-expansion on things of that "kind". Hmm. Pretty-print (or lexical scan) of nested /* */ comments loses some of the /* */ stuff. FIXED. 090619/Friday Skip past the tk_named of a non-public record type. Using that record type, attempt to write the fields of the record, or construct the record, when we shouldn't be able to. Hopefully the "default:" case checks in "modifiable" will prevent the writes. [Works right - see test/privSneak.z] 090620/Saturday I have to do more checking in Exec/ProcCheck. What it currently does is make sure that all local and proc formal references are to those from the current proc, and not weird references to a chunk of code in another proc. However, a piece of code from a proc in another package can be using only package-level variables, and can be doing something that is supposed to be done only in its package. A reference to a chunk of code like that, inserted into a new proc, is invalid. Part of the checking could be done by calling "modifiable" on assignment left-hand sides, but there are also invalid reads that must be prevented. RESOLVED with new Copy verification. As implemented, the concept of a 'private' field refers *only* to the use of that field within generic code. Such a field is not modifiable anywhere else. 'private' on a field does *not* refer to modifying the field in packages other than that in which it is declared - the 'public'/'private' attribute of the entire struct/record determines that. The effect is that if a record is not 'public', or a struct is 'private', then all fields are 'private', whether marked that way or not.
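The 090620 rule for when a field counts as 'private' is compact enough to state as a predicate. This is a minimal Python sketch (Python only because Zed is not available here); the function name and its string-valued parameters are hypothetical, and it models only the rule as written in this entry, not the later 'private'/'ro'/'const' field attributes.

```python
def field_is_private(kind: str, aggregate_attr: str, field_marked_private: bool) -> bool:
    """Effective 'private' status of one field under the 090620 rule.

    kind: "record" or "struct".
    aggregate_attr: attribute on the whole type: "", "public" or "private".
    """
    if kind == "record":
        whole_type_private = aggregate_attr != "public"   # records: private unless 'public'
    elif kind == "struct":
        whole_type_private = aggregate_attr == "private"  # structs: private only if 'private'
    else:
        raise ValueError(f"unknown aggregate kind: {kind!r}")
    return field_marked_private or whole_type_private
```

Writing it this way makes the asymmetry explicit: records and structs have opposite defaults, and a field marking can only add privacy, never remove it.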
The issue is that without the "writeablePackage" field, which comes from the overall record/struct 'public'/'private', I have no way to know in which package fields should be writable. Hmm. I *could* use nd_containingPackage. Gah! I've just done the changes to make structs work the same way as records in terms of public/private and private fields, and find that the problem now is that even though I've exported type TempDeclPrivate to package Exec, package Exec cannot write fields of the struct. I think qualified export of types is not of much (if any!) use. Well, I guess it does let the export targets at least read the fields. 090621/Sunday <> A next step is to make a way for some struct/record fields to be not readable outside of the defining package. The 'private' word seems best for that, but then what do I use for fields that are read-only outside of the defining package? You could use 'ro', but then I need something for what used to be 'ro'. I could use 'const' for that, but is it confusing to have both 'ro' and 'const' available, but with different meanings? Perhaps there are cases where I don't actually need the full control on a per-field basis. Then, I could put more of these tokens after the 'record' or 'struct' in the type. [Went with the 'private'/'ro'/'const' set.] <> Note for when I add string indexing: the string is 'ro', so you cannot use this to assign to individual characters in strings. E.g. string constants. [OK] <> Wide chars in C. E.g. wcscmp(). 090622/Monday I've been reading through this file. Got all the way to the beginning, then started in on 020413-initial as well. Stopped on it. So many good ideas, so long ago! Even using "Zed" as the name was done back then. Sigh. Looks like I wasted nearly 3 years on bundles. <> One thought from back then that sticks in my mind. Allowing what are now called capsules to be exported from a package has a downside.
That is, changing the layout of the capsule (adding/deleting fields) makes any code, persisted data, etc. invalid in importing packages. The thought from way back can be translated to: don't allow capsules in other packages to be inherited - only allow that for interfaces. You can certainly use refs to values of those capsule types; you just shouldn't create them or directly reference fields in them. That keeps the effect of field changes localized to the package that defines the capsule. Is this practical? What about Exec/Types/Proc/etc? The structure of those record types is known everywhere. I *could* provide a whole bunch of accessor routines, which thus hide the actual structure of the records. The basic nature of those types as variant records must be known outside, however. I couldn't write "Display" without that knowledge. At the moment I'm wondering if dgol could be done without actually exporting its capsules. I've thought for some time that I don't like the idea that users of it must extend the dgol capsules in order to accomplish some things. I would be happier if they simply had to implement interfaces to do that. Note that I use capsules to allow the polymorphism needed for the 'ioProc' and 'construct' implementations. Is the above compatible with the "provides" concept, where a capsule can "provide" other capsules, rather than implementing interfaces, i.e. allowing interfaces to contain data? Actually doing the last few years' worth of work has in some senses pulled me too far in the direction of traditional languages and implementations. Many of my original thoughts have been lost. Perhaps what I need to do for a "quick" check is to change the Zed version of dgol so that a user of it only implements interfaces and never extends any of dgol's capsules. Actually, do the Exec ones first. All Done.
The delegation of events in DGol is a bit of a pain - having to store an Eventer_t in Window_t, and having to remember to pass the events on all the time. However, since the parameter is explicit to, say, the Matrix_t constructor, there is a hint that the Matrix_t event handlers need to pass the event on. Can the enum-like variant selector type actually *be* an enum type? We don't want to have to repeat the tags, so using a pre-existing enum type likely isn't the way to go. However, when defining a variant record, we just define the enum type as well. The name of the enum type would appear somewhere in the variant record specification. E.g.

    type VarRec_t = record {
        uint vr_fixed1;
        string vr_fixed2;
        case VarRecKind_t vr_kind
        incase vrk_string:
            string vr_string;
        incase vrk_uintVec:
            [] uint vr_uintVec;
        esac;
    };

The variant tags would no longer be exported as independent symbols in the scope, but those same names, as enum members, would be so exported. Internally, the record description would reference the enum type. [Done, but kept them as variant record tags, since those are needed anyway.] With this, it should be possible to hide all details of a variant record type within a given package, providing accessor procs for all of the stuff that we want to export. One such proc returns the kind, as a value of the enum type (VarRecKind_t above). Then, for each variant, there is an accessor routine that returns all required fields (via '@' parameters) of that variant from a passed-in record reference. If the record is not actually of that variant, then naturally it cannot return the values. [Hard to export the variant kind type without exporting the record type!] Later. Well, switching to interfaces works for the Exec stuff for constructs and ioProcs. There was one data field in the ioProc capsule, but it wasn't actually used anywhere. However, a test in the io directory is buftest.z.
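The accessor-proc idea for variant records above can be sketched in Python (standing in for Zed): an enum plays the part of VarRecKind_t, one accessor returns the kind, and each per-variant accessor returns an ok flag plus the variant's fields, since Python has no '@' output parameters. The class and function names are hypothetical, and Python does not reproduce the export problem noted in brackets; it simply leaves VarRec visible.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Tuple, Union


class VarRecKind(Enum):
    """Stand-in for VarRecKind_t; members mirror the variant tags."""
    VRK_STRING = auto()
    VRK_UINT_VEC = auto()


@dataclass
class VarRec:
    """Stand-in for VarRec_t; in Zed its layout would stay inside its package."""
    vr_fixed1: int
    vr_fixed2: str
    vr_kind: VarRecKind
    vr_payload: Union[str, List[int]]  # vr_string or vr_uintVec, per vr_kind


def kind_of(r: VarRec) -> VarRecKind:
    """Exported accessor: which variant this record is."""
    return r.vr_kind


def get_string_variant(r: VarRec) -> Tuple[bool, str]:
    """Exported accessor for vrk_string: (True, vr_string) or (False, "")."""
    if r.vr_kind is not VarRecKind.VRK_STRING:
        return False, ""
    return True, r.vr_payload


def get_uint_vec_variant(r: VarRec) -> Tuple[bool, List[int]]:
    """Exported accessor for vrk_uintVec; returns a copy so callers cannot mutate it."""
    if r.vr_kind is not VarRecKind.VRK_UINT_VEC:
        return False, []
    return True, list(r.vr_payload)
```

A caller switches on kind_of() and then calls the matching accessor; the ok flag covers the case where the record is not of that variant.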
That points out that the whole point of the CharBuffer/Consumer_t was that it is extended by clients in order for them to work. Can it be an interface? Looks like it. Done. Also do the one in Display, used in PackageBrowser. So, this looks like it might be workable. Perhaps a rule that would work is this (strange terminology, since I'm thinking about the idea of a merged interface/capsule): you cannot extend a capsule from another package if that capsule contains any data elements. I've switched to using "provides" instead of "implements". Here is a question, though: how does the system fill in the virtual proc table entries? Currently, the procs come from all the way through the extends chain. We don't want to have to look inside the capsule code in the exporting package in order to know what procs it has. So, the language could simply enforce that - it will not look across a package boundary when finding procs to put into the virtual proc table. Now, that's not quite how the implementation works now, but it can be changed. You can't extend a capsule that contains data elements, but you can 'provide' such a capsule. Each provider of a capsule has its own set of fields. Polymorphic code that works at the capsule level does not know about any of them - it only works via the procs. If capsule C2 provides some other capsule, C1, any capsules that C1 provides are irrelevant to C2 - it does not have access to any of that stuff, and if it wishes to provide some of those other capsules itself, it may do so. The point of "provides" is that the providing capsule can be used as a value of the provided capsule type; there is no implication that the providing capsule type can use or even know about any capabilities of the capsule type that it provides. Hmm. If you can provide capsules, I don't think there is any point in having data fields in the stub record that is used for the provides. Where would you declare such fields?
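The two proposed rules in this entry (no cross-package extend of a capsule that has data elements, and no crossing of package boundaries when filling the virtual proc table) can be sketched as checks over a toy capsule description. This is Python standing in for Zed; Capsule, check_extend and build_vtable are hypothetical, and proc implementations are just labels.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Capsule:
    """Toy description of a capsule declaration."""
    name: str
    package: str
    data_fields: Tuple[str, ...] = ()
    procs: Dict[str, str] = field(default_factory=dict)  # proc name -> implementation label
    extends: Optional["Capsule"] = None


def check_extend(child: Capsule) -> None:
    """Reject extending a capsule from another package if it has data elements."""
    parent = child.extends
    if parent is not None and parent.package != child.package and parent.data_fields:
        raise TypeError(f"{child.name} cannot extend {parent.name}: "
                        f"it has data elements and belongs to package {parent.package}")


def build_vtable(capsule: Capsule) -> Dict[str, str]:
    """Collect procs along the extends chain, stopping at the package boundary;
    the nearest definition of each proc wins."""
    table: Dict[str, str] = {}
    current = capsule
    while current is not None and current.package == capsule.package:
        for proc_name, impl in current.procs.items():
            table.setdefault(proc_name, impl)
        current = current.extends
    return table
```

One consequence visible in the sketch: a capsule extending a data-free capsule from another package gets none of that capsule's procs in its table, so it has to supply them itself, much as an interface implementer would.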
The only code that can access them is the capsule that is being provided, and there is no explicit constructor call dealing with a provided capsule, so the only thing they could ever be used for is caching values that the provided capsule wants to have, with separate copies for each provides object relationship. Yes, there is a place to put them, and a syntax to access them, but if the concepts of capsules and interfaces are merged, there is nowhere to declare them. Hmm. They are of course declared in the provided capsule. But there is still no way to initialize such fields when an object of the providing capsule type is constructed. Hmm. Wait. An explicit constructor of the providing capsule type can explicitly call a constructor of the provided capsule type. It's only implicit constructors that have no such opportunity. A language rule could be that if capsule C2 is implementing capsule C1, and C1 contains data fields, then both C1 and C2 must have explicit constructors, and the one for C2 must call the one for C1. Everything that you can do with interfaces can be done with this scheme - just don't have any data fields, and don't have any provides clauses. Why is "provides" better than "implements" here? The only issue I can think of is that Java programmers will find it strange that a "class" can "implement" another "class". Is that important? 090624/Wednesday Perhaps "uses" instead of "implements" or "provides". Hard to say, since if I'm going the merged capsule/interface route, then there can be calls going both ways. The "Eventer_t" concept in DGOL is a fine example of something that goes both ways, and requires data. The data is just the event mask, which is changed by routines exported as part of Eventer_t. However, an implementer (or whatever) of Eventer_t provides some or all of the event trigger procs.
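The constructor rule proposed above (if C2 provides C1 and C1 has data fields, both need explicit constructors and C2's must call C1's) reduces to a small declaration check. Python stands in for Zed here; CapsuleDecl and check_provides are hypothetical, and the sketch only inspects declarations rather than generated code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class CapsuleDecl:
    """Toy summary of a capsule declaration."""
    name: str
    data_fields: Tuple[str, ...] = ()
    has_explicit_ctor: bool = False
    ctor_calls: FrozenSet[str] = frozenset()  # capsules whose constructors it calls


def check_provides(c2: CapsuleDecl, c1: CapsuleDecl) -> List[str]:
    """Problems with 'c2 provides c1' under the proposed rule (empty list = OK)."""
    problems: List[str] = []
    if not c1.data_fields:
        return problems  # no data to initialize: behaves like an interface
    if not c1.has_explicit_ctor:
        problems.append(f"{c1.name} has data fields but no explicit constructor")
    if not c2.has_explicit_ctor:
        problems.append(f"{c2.name} provides {c1.name} but has no explicit constructor")
    elif c1.name not in c2.ctor_calls:
        problems.append(f"constructor of {c2.name} does not call constructor of {c1.name}")
    return problems
```

For example, under this check a hypothetical Window_t that provides Eventer_t (whose data is the event mask) passes only if Window_t's explicit constructor calls Eventer_t's; a data-free provided capsule imposes no constructor requirements at all.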
090625/Thursday An issue with the merging idea (which I vaguely recall noting in my first thinking about this) is that word 3 of the interface stub record is a pointer to the capsule data. So, the layout isn't exactly the same as the layout of the capsule data itself. Other than that, it looks straightforward to make the 'capcon1' instruction create capsule stubs instead of just interface stubs. See 090314. Having great trouble deciding whether to go ahead and try to do the merged capsule/interface idea. There are compile errors in test/proctest that look like they shouldn't be there. They involve routines "blahPrint" and "blahPrint2". Hmm. I could have sworn that worked fine fairly recently, because I had been using that program to test the Display code. However, the error messages go away if I get rid of most of the tk_exec case in Types/instantiate0. The problem is that ppl_type is a tk_exec of an exk_genericType, and that doesn't match what is needed, which is the actual instantiated type. Hmm. Why is the comparison of "t3 != t2" failing? Resolved. There is no need for Types/instantiate0 to re-create the tk_exec when instantiating a tk_exec. Also made Proc/matchesWantedType skip past tk_execs. Either change fixes the problem. Now that types and procs in generics are not directly exported to the package containing the generic, some of the tests in generic/gen.z no longer work as desired. Have to do some dynamic code generation to achieve the same kind of test. [See following] Ick. I cannot get the error messages about invalid use of uninstantiated generic procs to come out, using dynamic code generation in generic/dyn.z . One of the messages: Cannot use uninstantiated generic proc "XXX" outside of its generic is only within Exec/NameRef, which happens when you reference such a proc by name. But, you can't do that anymore, so that message cannot come out. 
The other message: Cannot use uninstantiated generic proc "XXX" as a value comes out of AssignIncompat, but only when it encounters an exk_genericProc node. Pulling the proc out of the generic by other means bypasses that. That ex kind is only created within GenericNameRefNew, which the parser only calls when within the generic. The use of exk_proc is part of the problem. With that use, none of the places where containingInstantiation exists are available. Currently, when a proc is instantiated, the Proc_t is not duplicated, so putting such a field in there won't work. Duplicating the Proc_t on instantiation could have its own problems - when one is compiled, the others won't see the new code, etc. Maybe there needs to be a subrecord of Proc_t that contains the shared stuff, and both the original and the instantiations reference it. [Later: working with dyn.z has convinced me I don't need this.] More testing may not be necessary. Perhaps you can never actually use a proc that has an uninstantiated type somewhere in its signature. A bit later: problems, in generic/dyn.z. Can you create a matrix of matrixes of a type involving a generic parameter inside a generic proc? If so, then likely you can simply assign across instantiations within such a proc. Is there an influence from 'any'? [Later: just doesn't seem to fly. See generic/genMat.z] 090627/Saturday Add reserved words 'ichar', 'istring', 'achar' and 'astring' [Done] 090702/Thursday In source code I've been using "selector" and "record variant index tag" and "variant tag". Pick something. Hmm. I think in the past I used the term "selector" to mean the selection expression in a case. I think I'm mostly OK now - see the Glossary file. I may want to change the type name "RecordCase" to be "VariantPart" to match the glossary. DONE. 090703/Friday <> IBM releases Milepost GCC (http://www.milepost.eu/), which is a learning optimizing gcc. Average 18% improvement.
See also a website on tuning: http://ctuning.org/wiki/index.php/Main_Page Don't allow non-privileged code to create pointers ('&') - privileged code could never use such a pointer anyway, since it could be pointing at something that is freed or otherwise invalid almost immediately. Hmm. I take it back. There is no harm in allowing it, other than perhaps the false expectation of it being useful. The key is to not write any privileged code that thinks it is useful. I take back the taking back - do not allow non-privileged code to create pointers. If it is allowed, then privileged code cannot rely on any pointer that it has created that may have passed through non-privileged code. E.g. my use of pointers in the X stuff would become invalid. The only thing non-privileged code can do with pointers is to copy them around and compare them (including against 'nil'). Done. 090713/Monday <> The system should have the concept of a "site" or "office", which is some restricted subset of the universe to which documents can be constrained. The simplest way to do this is to require that all copies of the documents are stored only on a specified set of servers. Caching of the documents is not allowed, or at least should be done in such a way that it does not allow any extra access to the documents. A window snapshot of information from such a constrained document, or a screen snapshot containing such a window, must be similarly constrained. There needs to be a way to indicate that a network printer is within the "site" or "office", so that printouts of documents and snapshots are limited to such printers. A VNC-like connection into such a site is possible, but should properly constrain things - e.g. saves of anything, including snapshots, can only be done to storage within the site, and *not* to local machine storage. Such a setup is vulnerable to the designated server(s) going down. This risk is reduced by having multiple synchronized servers.
However, there is another way to handle this. Each machine must identify itself with an unforgeable identifier (generated, perhaps including the MAC, when the machine is added to the "site"). These should not be extractable - they must only be identifiable as valid. So, some fancy crypto stuff is needed. A machine within a "site" can then look around the site and note the machines it can see. Machines can be set up to be a peer-to-peer network that can save documents. Thus, the documents can be worked on when the central servers are down. However, enough of the other machines known to be within the "site" (or, more likely, securely identifying themselves as being within the site) must be reachable before the restricted documents can be worked with. This is needed because the local machine, being part of the peer-to-peer group, may have enough data to allow working with some documents. Changes are saved to the peer group, and the servers can be re-synced when they come back online. 090715/Wednesday <> I think the way the identification is supposed to work using the magic public keys or something is that the requesting machine sends a request to the machine that it thinks might be part of the site. The request contains a randomly generated value. The site machine sends back a reply in which the random value has been encrypted with the "site key". The requesting machine verifies that the response is valid for the site key and for the random value it sent. Doing it this way makes recording any of the interactions useless. I've been working on the LEGO Red Deer Water tower for the last several days, so no Zed work. It seems odd to have a capsule "extend" one other capsule, but "provide" a set of others. Why not just get rid of the "extend"? Well, the "extend" concept can still be used within a package (see above writings on not allowing "extend" of a capsule in another package). So, it is a locally usable tool, hence its separate nature from "provides".
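The 090715 challenge-response exchange can be sketched with Python's standard library. Note the swap: the entry has public keys in mind, but this sketch uses HMAC-SHA256 over a shared secret (hmac, hashlib and secrets are real stdlib modules). With a shared key the verifier could also forge responses, so a real site would use a signature scheme where the requester holds only the public half. SITE_KEY and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared site key. Stand-in only: the entry envisions public keys.
SITE_KEY = secrets.token_bytes(32)


def make_challenge() -> bytes:
    """Requesting machine: a fresh random value for every request."""
    return secrets.token_bytes(16)


def site_respond(site_key: bytes, challenge: bytes) -> bytes:
    """Site machine: bind the challenge to the site key (HMAC-SHA256)."""
    return hmac.new(site_key, challenge, hashlib.sha256).digest()


def verify(site_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Requesting machine: accept only a valid response to the challenge it sent."""
    expected = hmac.new(site_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

Because each request carries a fresh random challenge, a recorded response is useless for any later request, which is the property the entry relies on.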
090718/Saturday <> Thinking a bit yesterday about ABI versus API. When allocating a capsule object, where the capsule implements interfaces, there are the interface virtual proc tables for the implemented interfaces. Are they made bigger if new virtual procs are added to the interface? What if the interface is not partial - what does that imply for adding virtual procs? Currently, I believe this isn't an issue since the implementing capsule is always "parsed" after the definition of the interfaces. However, in the future, where stuff is just read from memory, will that work out? What about when a live system receives a new update that adds virtual procs? I've been converting DGol/Matrix/PackageBrowser to make Window_t contain an Eventer_t field, which it delegates events to. This is towards setting DGOL up so that I never have to extend one of its capsules, outside of the DGOL sources themselves. The problem I've hit is that in PackageBrowser/IconMatrix, proc TakeChild is given only the Widget_t by the main DGOL TakeChild call. In that context I know that the Widget_t will have a non-nil win_evtr, and that that value is actually a NamedIcon_t, but I have no way to extract the NamedIcon_t reference. It is inside the interface stub record which I can get. So, I believe I need a new language construct, say 'interfaceAssign', which takes an interface value, extracts the actual data reference from it, checks its type against its first argument type, and does the assign if they match. I haven't thought about what happens to this if I do away with interfaces and just use capsules. It ought to still work, but what do I call it? 'capsuleAssign' I guess. [Done] 090727/Monday Much of last week was spent working on motorizing my ETS low-floor bus. <> Just noticed that the pretty-printer is showing the expanded form of the 'for' body of Names/Dump. We see variable "__1" being declared and used. In fact, why is a variable needed at all?
The extra variable is needed since the code could assign 'nil' to "entries", which could result in the freeing of the entire vector, and thus invalidate the "en" '@' variable. Ick. Will want to optimize this out somehow. Display fixed. Also, again got the "Too many windows" abort from dgol. The package browser code (or code internal to dgol for menus) is not deleting them. FIXED The special case code for exk_recordField in Display/fmtExec needs to be generalized so that if the base expression is any path that ends in either '.' or '..', then it does the space-insertion. DONE 090728/Tuesday <> Thinking about distribution of updates. With my current Ubuntu system, you can (if you notice the option) select a site to get your updates from. It seems to me that John Q. Public wants something more automatic. So, perhaps when new local sites become available, the existing site that used to encompass their areas turns into a hub site, which only distributes to those new leaf sites. Whenever individual systems connect to the new hub site, they are redirected to the appropriate leaf site. Systems remember the site they update from, and try it first next time. If they can't contact it, they can try going upwards through the "tree" of hub sites (they remember their own path through it). That should either affirm their current selection, or give them an alternate path. 090804/Tuesday <> Provide system routines that return the host machine's behavior for the various sized numeric types, e.g. trap-on-overflow, carry-on-overflow, borrow-on-underflow, detect-zero-divide, etc. This could conceivably allow someone to write things like extended precision integer arithmetic packages which are correct on all platforms. Actually, we want the info for both the host platform and the target platform, but I haven't gone into that at all yet.
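The kind of code the 090804 host-behavior routines are meant to make portable is multi-word arithmetic, where the carry out of each word has to be known exactly. A minimal Python sketch follows; add_words is a hypothetical name, the `bits` parameter stands in for the word width such a routine would report, and Python's unbounded integers are masked to simulate fixed-width words with carry-on-overflow behavior.

```python
from typing import List, Tuple


def add_words(a: List[int], b: List[int], bits: int = 32) -> Tuple[List[int], int]:
    """Add two little-endian multi-word integers.

    Each list element is one `bits`-wide word, least significant first.
    Returns the sum words and the final carry out (0 or 1).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of words")
    mask = (1 << bits) - 1
    result = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & mask)  # the wrapped word a carry-on-overflow machine keeps
        carry = total >> bits        # the carry such a machine would report
    return result, carry
```

For example, add_words([0xFFFFFFFF, 0], [1, 0]) carries out of the low word into the high word. On a trap-on-overflow machine the same package would have to avoid ever producing the overflowing sum, which is why it needs to know the behavior up front.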
090806/Thursday <> Paper on making the new Eiffel "void-safe" (which means that it cannot encounter nil pointers): http://se.ethz.ch/~meyer/publications/hoare/void-safety.pdf 090811/Tuesday For a while, the Display has been emitting extra blank lines. Or so it seemed. Actually, it was doing comments wrong. The reason for that was a code generation error - wrong size load for a record selector value. 090813/Thursday Formally verified OS: http://ertos.nicta.com.au/research/l4.verified/ From comp.arch discussion (see 090803-progCor): >>> http://www.adaic.org/standards/05aarm/html/AA-C-6.html (From that document:) Implementation Requirements 20 {external effect (volatile/atomic objects) [partial]} The external effect of a program (see 1.1.3) is defined to include each read and update of a volatile or atomic object. The implementation shall not generate any memory reads or updates of atomic or volatile objects other than those specified by the program. 20.a Discussion: The presumption is that volatile or atomic objects might reside in an “active” part of the address space where each read has a potential side-effect, and at the very least might deliver a different value. 20.b The rule above and the definition of external effect are intended to prevent (at least) the following incorrect optimizations, where V is a volatile variable: 20.c * X:= V; Y:=V; cannot be allowed to be translated as Y:=V; X:=V; 20.d * Deleting redundant loads: X:= V; X:= V; shall read the value of V from memory twice. 20.e * Deleting redundant stores: V:= X; V:= X; shall write into V twice.
20.f * Extra stores: V:= X+Y; should not translate to something like V:= X; V:= V+Y; 20.g * Extra loads: X:= V; Y:= X+Z; X:=X+B; should not translate to something like Y:= V+Z; X:= V+B; 20.h * Reordering of loads from volatile variables: X:= V1; Y:= V2; (whether or not V1 = V2) should not translate to Y:= V2; X:= V1; 20.i * Reordering of stores to volatile variables: V1:= X; V2:= X; should not translate to V2:=X; V1:= X; Afaik those rules means that the compiler _has_ to insert some form of membar between each pair of loads or stores to volatile variables, right? >>> >>> How could those rules be improved (for _any_ high-level language)? I like them a lot, they read like the set of rules a small group of smart people would come up with when trying to specify a portable way to access any kind of memory/address with potential side effects or simultaneous external updates. 090820/Thursday Mum visiting, Lego stuff, no Zed. <><> I was thinking about the event procs in DGol. If a specific widget wants to handle an event, it will likely need to then forward the event to the Widget_t's wid_evntr. It would be nice if that could be something as simple as writing "forward(wid)". "forward" would be a compileTime proc in DGol.z. It would use pctx.pctx_proc.pr_name to know the name of the event to forward, and would replace its call with a test for that handler in wid_evntr, followed by the proper call (hmm, how does it know what parameters to forward?). How could this be done? One initial thought was to have a "template" or "runTime" proc. It can only have a pctx parameter and some number of Exec/Exec_t parameters. The end result is that the body of the template proc, with all occurrences of its named formal parameters replaced by the actual Exec/Exec_t's passed in, becomes the expansion of the original compileTime proc call.
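What a "forward(wid)" expansion is meant to do when it finally runs can be sketched in Python, with getattr playing the role of the name lookup the compile time proc would do via pctx.pctx_proc.pr_name, and *args standing in for passing the remaining arguments through unchanged. The Eventer, Widget and forward names here are hypothetical stand-ins for Eventer_t, Widget_t and the proposed proc; this is not DGol code.

```python
from typing import Any, Optional


class Eventer:
    """Stand-in for Eventer_t: each handler is an optional attribute."""

    def __init__(self, **handlers: Any) -> None:
        for name, fn in handlers.items():
            setattr(self, name, fn)


class Widget:
    """Stand-in for Widget_t and its wid_evntr delegate."""

    def __init__(self, evntr: Optional[Eventer] = None) -> None:
        self.wid_evntr = evntr


def forward(wid: Widget, event_name: str, *args: Any) -> Any:
    """Call wid.wid_evntr.<event_name>(*args) if both exist, else return None."""
    evtr = wid.wid_evntr
    if evtr is None:
        return None
    handler = getattr(evtr, event_name, None)
    if handler is None:
        return None
    return handler(*args)
```

In the Zed proposal both the handler name and the argument list are settled at compile time, so the expansion costs only the two nil tests; the Python version does the name lookup on every call.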
Another thought was that the "template" be a language construct that can occur inside compileTime procs, that has the effect of being "expanded" by the given substitutions. Perhaps something like: template(eventName = Util/ProcName()) begin Eventer_t evtr := wid.wid_evntr; if evtr ~= nil then if evtr.eventName ~= nil then {evtr.eventName}(... something ...); fi; fi; end; Hmm. I think it works better as a separate proc. Maybe. More thought. 090821/Friday <><> Further thought on the above. Stick to 'template' rather than 'runTime'. A compile time proc can have 'compileTime' and 'template' parts. The former run at compile time. The latter are templates that are copied into the proc containing the call, with template parameters replaced with their current values. Two new language constructs are "compileTime" and "template", which simply switch between the two modes. There can be 'template' procs which start out in template mode rather than in compileTime mode. Compile time procs can have formal parameters of "type" "template", which means they are Exec/Exec_t values which replace the corresponding formal parameter name in template code. They can also have a final parameter of "type" "otherArgs", which is all remaining arguments to the call to the compile time proc. The only use of that formal "parameter" is in another proc call in template context, as the last parameter to that call. All of the remaining parameters to the compile time proc are passed to that call, unmodified. Code in template sections can use string variables declared in compileTime sections for identifiers, which will be resolved in the context of the caller of the compile time proc. 
So, the "forward" proc above could perhaps be: proc template forward(Package/Context_t pctx; template Widget_t wid; otherArgs tail)void: Eventer_t evtr := wid.wid_evntr; if evtr ~= nil then compileTime; string eventName := Util/ProcName(); template; if evtr.eventName ~= nil then {evtr.eventName}(tail); fi; fi; corp; 090822/Saturday <><> The above idea can perhaps play the role of both C macros and of inline procs. Can it be used for some/all of the small routines in the Lists package? Another example, from talking with Don: proc compileTime BitSet(Exec/Exec_t ex; uint bitNum)void: case ex.ex_kind incase Exec/exk_procFormal: incase Exec/exk_localVar: incase Exec/exk_packageVar: template; ex := ex | 1 << bitNum; compileTime; default: template; @ bits64 ro b := @ex; b@ := b@ | 1 << bitNum; compileTime; esac; corp; proc test()void: [4] bits64 bA; uint i := 2; BitSet(bA[i], 13); corp; We decided that "template" is indeed the best word. On dictionary.com, one definition is in terms of document processing, where the description of how a template is used is nearly identical to the proposed usage here. 090824/Monday <> Going back to the issues about not allowing "extends" of a capsule from another package, not allowing direct access to fields of a capsule (and record??) from another package, allowing "implements <capsule>" instead of "implements <interface>", etc. First point: if I go with "implements <capsule>", and do away with interfaces altogether, then there isn't much point in preventing the cross-package "extends", since I effectively require direct field access to those capsules that are "implemented". Hmm. I guess that's not really true - if the "implemented" capsule is in another package, direct access to the data fields can be denied, and use of accessors required, just like for other cross-package uses of capsules. Think about a traditional Linux system. Windows and other *nixes will be similar. There are shared libraries (.so's) that are used by multiple client programs and other libraries.
They present an ABI (Application Binary Interface) for any data structures they share with their clients. This allows them to be upgraded without affecting the existing clients. They can present new functionality, which is used by new clients that know of that functionality, by presenting additional structures and functions. Functions are found by name within the shared library, so adding new ones does not change any sizes or offsets. If such a shared library wishes to change the size or layout of a structure shared with clients, it must be very careful. In C, you can add parameters to functions, and, so long as the library can distinguish new callers from old callers using just the old parameters, it will work. Even in C, however, client source code will need to be changed and re-compiled to pass the added parameters. Similarly, new fields can be added to the end of existing shared structures, so long as it is the library which allocates space for the structures, or, again, the library can tell from old fields whether or not new fields are present. This is all fairly hacky. Note that the varying number of parameters only works in C because C compilers typically have the calling routine pop parameters off the stack, and parameters are pushed in reverse order. Without both of those, the added parameters won't work properly. Thus, most other languages cannot do that. An early example of the above is the *nix "open" call, whose third (mode) argument is only needed when creating a file. What do other languages do? Presumably they only offer the form with all parameters. 090825/Tuesday <> Continuing on... "First point" above can be deleted - I don't think there is an issue. With the usual inheritance implementation done in C++, child classes contain their parent class(es) as their first parts. If a parent class has its layout changed, all code that references its fields must be recompiled. If a parent class has its data size changed, then all code which uses the child class must be recompiled.
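One common way for "the library to tell from old fields whether new fields are present" is to have the caller store the structure's size in its first field (Win32's cbSize fields are a well-known instance). A minimal Python sketch using the standard struct module; the two layouts, DEFAULT_TIMEOUT and read_options are hypothetical, with packed bytes standing in for a C structure passed to a shared library.

```python
import struct

# Hypothetical layouts: version 1 was (size, flags); version 2 appended timeout.
V1_FORMAT = "<II"   # uint32 size, uint32 flags
V2_FORMAT = "<III"  # uint32 size, uint32 flags, uint32 timeout
V1_SIZE = struct.calcsize(V1_FORMAT)  # 8 bytes
V2_SIZE = struct.calcsize(V2_FORMAT)  # 12 bytes
DEFAULT_TIMEOUT = 30


def read_options(buf: bytes) -> dict:
    """Library side: decode whichever version of the structure the caller passed.

    The leading size field says whether the appended field exists, so callers
    built against the old layout keep working without being rebuilt.
    """
    (size,) = struct.unpack_from("<I", buf, 0)
    if size >= V2_SIZE:
        _, flags, timeout = struct.unpack_from(V2_FORMAT, buf, 0)
    elif size >= V1_SIZE:
        _, flags = struct.unpack_from(V1_FORMAT, buf, 0)
        timeout = DEFAULT_TIMEOUT  # old caller: the library supplies the default
    else:
        raise ValueError("structure too small")
    return {"flags": flags, "timeout": timeout}
```

This only works for fields appended at the end; inserting or reordering fields still breaks every old caller, which is the C++ layout problem described above.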
Clearly then, this kind of direct inheritance is not compatible with the standard use of shared libraries. The first effect can be handled by having accessor functions for all parent class fields that are meant to be read or updated by code outside of the class and its friends. The second effect can be handled by not directly including the parent class data in the child class data, but having a separately allocated region and pointing to it. These lose much of the efficiency of the C++ style. Also, the net result is close to just having a reference to parent class objects in child class objects, and using some macros to access the parent class fields. A C++ compiler usually sees the full definitions of all parent classes when it is compiling child classes. When optimizing, it will typically inline accessors, thus removing their cost. However, this also breaks binary compatibility between implementations of the parent classes. C++ has, as far as I am aware, no mechanism to tell the compiler to *not* do such an optimization in certain instances, while still being allowed to do it in other instances. You can use 'inline' on functions, but that doesn't allow you to say something like "inline only in this set of code". Thus, I believe that C++-style classes are not a good mechanism for use with ABI's. There are OS's written mostly in C++, e.g. BeOS. What do they do here? Are the OS and library interfaces frozen for all time? Must applications be rebuilt and redistributed whenever a parent class changes somewhere? Awkward. In Zed, the basic unit of distribution is likely to be the package. I would like public interfaces (e.g. Exec, Types, etc.) to change very very little over time. Internals can change whenever needed, but the external interfaces should not. My desire is that new versions of packages can be loaded into a running Zed system, and programs/processes can switch over to using the new ones as they restart. This is similar to updating a .so in Linux.
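The difference between a client that has a field offset baked in (which is what an inlined C++ accessor amounts to) and one that calls the library's own accessor can be shown with a small Python sketch. The two layouts and the function names are hypothetical; struct offsets stand in for compiled field offsets, and the `layout` argument stands in for whatever layout the installed library release actually uses.

```python
import struct

# Hypothetical parent-class layouts: release 2 inserts a field before "count".
LAYOUT_R1 = "<Ii"   # uint32 id, int32 count
LAYOUT_R2 = "<IIi"  # uint32 id, uint32 flags, int32 count


def make_record(layout: str, *fields: int) -> bytes:
    """Build an object's bytes the way the installed library release lays them out."""
    return struct.pack(layout, *fields)


def client_inlined_count(buf: bytes) -> int:
    """Client compiled against release 1: byte offset 4 is baked into its code."""
    (count,) = struct.unpack_from("<i", buf, 4)
    return count


def library_get_count(buf: bytes, layout: str) -> int:
    """Accessor shipped inside the library: it always matches its own layout."""
    return struct.unpack_from(layout, buf, 0)[-1]
```

Against a release 2 object, client_inlined_count silently reads the new flags field instead of count, while library_get_count stays correct. That silent misread is the binary incompatibility that inlining introduces.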
In order for that to work, I need the same kind of constant ABI as discussed above, although in terms that work for Zed. Thus, interfaces presented to other packages should not in general involve extending capsules, or directly accessing capsule or record fields. As mentioned above, new procs can be exported by new versions of packages, since the linking between client packages and providing packages is done via name lookup. Such new interfaces can use new, expanded structures, if necessary. It is this desire that triggers my thinking that you should not be able to extend a capsule from another package. You can only extend capsules within your own package. Another way of looking at this is that the object-oriented paradigm is an implementation technique within packages, but is not part of the public-facing interface to a package. In many ways this concept is in direct violation of object-oriented programming. However, it is a choice made for reasons of practicality - it is simply not practical to not have shared libraries of some sort (and in Zed I want to rely on them as much as I possibly can, to avoid redundancy), and it is not, in my opinion, practical to require that application code must be completely recompiled, on all client systems, when an important bug fix to a providing package is distributed. What about interfaces? The current test stuff that I have is my PackageBrowser on top of Don's Dgol (modified by me to satisfy the above requirements about not extending capsules cross-package). I have used interfaces exported from Dgol, implemented by client capsules in the client code, to handle things like button presses, redraw requests, etc. that go from Dgol to the client code. This is thus still OOP, but using just interfaces, and having client capsules typically contain a reference to the parent Dgol capsule. Is the above the right way to go?
If the providing capsule wishes to extend some interface it exports, then how can existing code be compatible with the extended interface? If the interface is declared as "partial", then the providing code must check for interface procs before trying to use them, so a mechanism exists to allow some form of compatibility. However, interface proc vectors are currently allocated when the Zed compiler is compiling the capsule that implements the interface. That means that they are fixed for the life of the implementing capsule within the system. Perhaps the building of those virtual proc vectors should be delayed until some "process initialization" time, at which point they are laid out according to the interface definition that the referenced package exports at that time. That, combined with the 'partial' flag, would allow old code to start running with a new version of a referenced package with no need for recompilation. Suggested new restriction: all exported interfaces must be 'partial'. Another way of dealing with this is to have a pair of numbers in each package. One simply counts the exact release of the package, as fixes are made, new interfaces are added, etc. Code built against a given release can use that or any later release of the package. Each package also has a version number. If any *public* struct, union, record, proc header, enum type, defined constant, etc. etc. changes, then the package gains a new version, and is no longer compatible with existing users of the package. Thus, a Zed system must maintain multiple versions of such a package until all users of the old version are updated. This is the basic thought I had a long time ago anyway. <><> This suggests that I not implement restrictions here, but rather make sure that the version number of a package goes up whenever any of the "bad" changes are made to a library that has already been distributed, before it is distributed again.
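The version/release rule above boils down to a small compatibility check, plus a choice among the several versions a system may be holding. A minimal Python sketch; PackageId, is_compatible and best_match are hypothetical names, with "version" and "release" used in the sense of 17.3 being version 17, release 3.

```python
from typing import List, NamedTuple, Optional


class PackageId(NamedTuple):
    version: int  # bumped on any change to a public type, proc header, constant, ...
    release: int  # bumped on fixes and on purely additive changes


def is_compatible(built_against: PackageId, installed: PackageId) -> bool:
    """Same version, and at least the release the client was built against."""
    return (installed.version == built_against.version
            and installed.release >= built_against.release)


def best_match(built_against: PackageId,
               installed: List[PackageId]) -> Optional[PackageId]:
    """Pick the newest installed package the client can use, if any.

    Only candidates with the client's version qualify; NamedTuple ordering
    then selects the highest release among them.
    """
    candidates = [p for p in installed if is_compatible(built_against, p)]
    return max(candidates, default=None)
```

A client built against 17.3 would pick 17.9 over 17.4 and would never pick 18.1. Once nothing installed still needs version 17, the system can drop it.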
Just setting an "interface changed" flag should work, with that flag checked before any official new version can happen. <> Dictionary.com uses "version" as a "larger" word than "release", as in 17.3 is version 17, release 3. <><> 'template' can perhaps be a type attribute, as in "template bool". It means that the type represents Exec/Exec_t trees (or nodes) that have type bool. I don't see any particular reason why such things need to be restricted to just the formal parameters of a 'compileTime' or 'template' proc. They can even be fields of records that are saved statically. Of course full checking on used template values will happen via ProcCheck at the end of defining any actual proc. 090826/Wednesday Grrr. I made a backup this morning, since I thought I would be diving into one of the larger issues. However, I looked at "090620/Saturday" first, and have been working there. I took out some of the tests in the checkCall helper for ProcCheck, wondering why Exec_t parameters to a non-regular proc couldn't be checked. Inside the body of a real compileTime proc I think the checks are OK, because a pctx from the proc which has the compileTime call is being explicitly used. However, inside the calling stub which "runCompileTime" creates and calls, the pctx is a new one for that stub proc. Thus, the use of local variable "i" in test/ctime/main is a problem, since "i" is local to proc "main", not to that stub. The check that fires is the one on that Exec_t, which checkCall now does. I can put the tests back, but what is the right thing to do here? IRRELEVANTED The reason I was looking at that stuff is in order to have ProcCheck verify the modifiability of values wrt non-ro '@' parameters. 090827/Thursday <> Random thought: the columns of a spreadsheet can be thought of somewhat like the fields of a database record. That suggests that database-like operations (combining stuff from multiple spreadsheets based on common keys, etc.) could be useful.
For all I know, spreadsheets already do that. <> Back to the stuff from yesterday. I haven't been able to think of any other way to do the call to the compile time proc (other than perhaps adding a new construct to the language to do that, which would require "run-time" checks of the parameters/result). The way it is currently done, there is no way that the ProcCheck checks on local variables, etc. can work. Any attempt to make a special case would be vulnerable to other uses. Hmm. Unless I have the compiler only allow the special case if the current proc is /Exec/[Exec_]Call/runCompileTime. That's kind of icky/kludgy. Also the current code is wrong anyway. The stub routine is pt_regular, so any checks based on the context not being regular are actually backwards. Looking at it, I do need to check Exec_t parameters. Otherwise a programmer could grab an Exec_t local variable reference out of another proc and use it inside of a proc he is building as a parameter to a call, and the net result could be the use of an invalid local variable offset. One thought right now is to export, only to Exec, a different routine from Proc that allows the creation of a proc, but does *not* create and install a new pctx. Gah! Idiot! That doesn't work at all - ProcCheck is checking the proc in pctx, not the pctx itself. 090828/Friday <> Is it possible that, after I do the 'template' stuff, other direct means of building executable stuff will not be needed? Perhaps, but even if so, that means nothing, since all of the Exec entry points are needed by the parser, and by any alternate parsers that people write. Ok, fixed things so that the "runCompileTime" stub procs pass the ProcCheck checking. I think I've said this before, but perhaps the whole ProcCheck thing should be replaced by code which simply copies an entire Exec_t tree, using the Exec_t routines to rebuild it. Thus it can run through all of the proper checks without having to reproduce them in ProcCheck code.
One key thing is that it would only handle the ae_alternate path of AlternateExec_t nodes, i.e. it would not attempt to re-run compile time calls, etc. This results in checking everything produced by such compile time calls, which is what we base code generation on. The ae_original pointer can simply point at the ae_original tree from the AlternateExec_t being copied. Note that we save *only* the original tree when we store Exec_t persistently. DONE 090829/Saturday <><> Don't actually have any 'template' procs. Just allow the use of 'template' sections in any compile time proc. The 'template' and 'private' sections can have 3 forms: 'template' '(' stuff-to-expand ')' // used for expressions 'template' // used for statements 'template' 'begin' 'end' // used for larger blocks These could either be separate exk_ kinds, or there could be an enum inside of an exk_template saying what kind it is. It really only affects pretty-printing. 090830/Sunday %%% I think I need to detect the case where a variable declaration is being done, and the Phase2 call is omitted - that call can mark the declaration as in error. 090831/Monday Started to write a Types/Copy proc. Got as far as the second case, for tk_named, and stopped. Why do I need to copy types? Any Exec_t used in a type should be validated (copied) when it is accepted by the type construction routines. If I try to rebuild a named type every time I encounter it, I'll be doing it forever. If nothing else, the process would need recursion protection. Ok, so types should be valid as built. Hmm. No, that's not true. If a programmer grabs a reference to a chunk of code from another proc (possibly from another package), there could be a type declaration in that code which uses a symbol from that other proc or package, and so the resulting type is invalid (or different) in the new context.
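For the recursion-protection point in the 090831 entry, the usual trick is a memo table keyed on node identity, which is also how Python's copy.deepcopy avoids looping on cyclic structures. A minimal sketch; TypeNode and copy_type are hypothetical stand-ins for the Types structures and Types/Copy, and a self-referential list type serves as the example.

```python
from typing import Dict, List, Optional


class TypeNode:
    """Hypothetical type node: a kind, an optional name, and component types."""

    def __init__(self, kind: str, name: Optional[str] = None) -> None:
        self.kind = kind
        self.name = name
        self.parts: List["TypeNode"] = []


def copy_type(t: TypeNode, memo: Optional[Dict[int, TypeNode]] = None) -> TypeNode:
    """Copy a type graph, reusing the copy already made for any node seen before.

    Registering the new node in `memo` *before* copying its parts is what stops
    a self-referential named type from being rebuilt forever.
    """
    if memo is None:
        memo = {}
    if id(t) in memo:
        return memo[id(t)]
    new = TypeNode(t.kind, t.name)
    memo[id(t)] = new
    new.parts = [copy_type(p, memo) for p in t.parts]
    return new
```

For a record List_t holding a ref back to List_t, the copy's ref points at the copied record rather than at the original. This does nothing about the separate visibility problem raised just above; it only keeps the copy from looping.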
I'm wondering if the handling of named types will depend on whether the Copy routine is being used as part of validating an Exec_t tree, or whether it is being used during the actual copying of an Exec_t tree, as in the template replacement process. I'll add a 'mode' flag and see what happens. I defined it, and put in a full set of visibility tests for the "validate" case. Then I went to do the "copy template" case and just went with the same set of tests! Ran into more trouble with enum types, and other types that define symbols. I settled on re-using the type if the first name is visible in the current context, else cloning the type. Need to change all of the types stuff to preserve comments. This does not need to be done when expanding a template, but does need to be done when just verifying a type as part of verifying a proc. Later: this is likely not necessary. When Exec/Copy is used to verify a proc body, I can just discard the copy and stay with the original, which has the comments. When it is used as part of template expansion, I don't need any comments to be preserved, since the template expansion is not normally visible to anyone. Put all of the "Normalize" calls (suitably specialized, since we already know the kind of type) right into the type constructor XXXNew calls. [Partially done] DONE Gah! Ok, that doesn't fly. I think I likely need two separate routines, or to bring the mode flag back. When validating types, I want to validate any Exec_t's within types *in place*. I can't build the types anew, since they can define symbols. Consider a record type with a variant part - it defines an enum-like type with the symbols of the variant tags. I can't do that again - they already exist. So, I think I need the tests of type name visibility that I've put into the code so far. But for the rest, either ignore the type, or recurse into it in order to validate any Exec_t's within it. Note that a re-creation of such an Exec_t can cause the type to become invalid!
Ick. That would be handled by re-creating the type. Hmm. Perhaps some types I can safely re-create, such as matrix and array types. I *can* re-create bits types, since the symbols within them are not defined externally - they are only within the bits type itself. Is that true? Yes it is. This could preclude me ever making them visible independently, which would be required for any kind of bits constant that does not involve the bits type name. Basically, have to try again, thinking about this as I do it. LATER: what I've ended up doing is *not* doing anything with types when validating Exec_t's. The type itself was valid in some context, and so can be used as such. Hmmm. Thinking about it again, do I need to validate the use of a type name? 090901/Tuesday %%% Record types come back as a problem. They define symbols into their scope *and* their layout can change if field types are re-evaluated. So, I can't just re-create them, and I can't just validate their field types. Perhaps I need to re-evaluate the field types, and compare the resulting byteSize and alignment against the already existing ones. If they differ, then someone has tried something nasty, and I simply bail out, returning type Error for the whole thing. Should be good enough. Note, however, that the above is when validating types. When copying a type due to template copying, I need to actually re-create the record type. I think I've asked this before, but: are array types equivalent if they happen to have the same size in all dimensions, but the Exec_t's that produce those dimensions are different? Currently the Normalize code only examines the actual bounds values. Also need to check writeablePackage for matrix types. DONE [Also did checks on the Exec_t for the bounds, only allowing very simple forms to be equivalent. "Types/boundsEquiv"] 090903/Thursday Exec/DefineLocal is too powerful - can't let users specify a Names/Info_t for arbitrary things. E.g. a local variable with an arbitrary offset. 
[Exported it only to Package.] 090904/Friday Fix up the damn scope/sequence mess. Merge them. Beware of the current use of a temporary TempScope_t in Exec_Call/SaveRefToInnerVar. Later: it looks like I need to keep the two as separate concepts, because of the need for 'for' and 'while' statements to wrap a scope around the entire construct. If each sequence has its own scope, then, e.g. any variables declared in the condition part of a 'while' will be in a new scope just for the condition, and so cannot be seen in the body of the 'while'. DONE Arrghh. There are lots of comments around the '@' saving stuff in Exec_Call, but there are none about why SaveRefToInnerVar needs to have the "forceLocalSave" parameter to "saveRefToInner". Nor can I find any notes on it in this file. I am looking at that since I've just added the Exec/Copy stuff, and it is generating an extra local temporary, based on an already existing local temporary. I want to get rid of that extra, since it is not needed, and it is resulting in the maxLocalSize being larger on the Copy result than on the original Exec_t tree. I don't want that either. It seems to me that if the base variable of the larger value that a new '@' points into is a local or a formal and is 'ro', then there is never a need for a new local. Thus, in something like: @ SubType_t ro oneVal := @rec.subField; if "rec" is 'ro', there is no need for any temporary local variable. That test would also catch the extra '@'-saving variables being created. Ah, of course. If we are in the context of creating an '@' local variable, then what can happen to our local variables and formals is not bounded by the body of some proc we are calling. Following code can modify our locals, and leave us with a dangling pointer. E.g.
local1 := getBigRecord(...); @ SubType ro oneVal := @local1.subField; local1 := nil; /* Oooops - oneVal is now a dangling pointer */ 090905/Saturday When I get rid of the allocations of TempXXX stuff, it would also be good to make some of them internally have space for one item of any lists they have. That would allow them to handle the common case without *any* allocations/frees. DONE for Package/TempDecl_t, which is the only one converted so far. It did reduce the alloc/free count on the size 48 bucket. DONE. <><> Eventually, the optimizer should be able to identify situations similar to the one mentioned yesterday where a temporary is not needed in order to make a new '@' variable or actual parameter safe. One approach might be to have things depend on the 'ro' attribute, and have a pre-pass go through and internally mark as 'ro' all variables that can be so marked. Interesting. I tried to mark "entries" as 'ro' in Names/Enter, but it is actually assigned to, so I couldn't. What the optimizer would have to notice in that proc is that there is no assignment to "entries" within the scope of "enOld". Similar for "en" lower down. 090906/Sunday <> Would it be useful to have a "threadLocal" property that any reference value can carry? Such values could never be assigned to non-"threadLocal" places (and hence never be seen by another thread). Aside from issues of correctness and readability, this would allow dynamic allocation from a per-thread memory pool. Garbage collection of them could be purely local to the thread. If the thread exits, the entire pool is simply reclaimed. This would be similar in some ways to the "nonNil" attribute that I talked about earlier. 090911/Friday Way too much Lego-related stuff (e.g. LUGBULK). See the LLVM Compiler infrastructure: http://llvm.org/ 090913/Sunday <> Spreadsheets should be formattable in documents using the table formatting code. They should be pretty much the same things.
Thoughts on pie charts from a spreadsheet: the usual format has the size of the arc (angle) proportional to the values being plotted. Another possibility is to have the volume of each wedge be proportional to the value. Smaller wedges could be inside the main bounding circle (user controllable of course!). When doing 3D charts, the charting code should actually generate a 3D model of the chart, and 3D solid modelling software can then be used to edit it. There would be options that the charting code applies as it generates the 3D model. For example, it might make the height of the wedges vary in uniform steps, or perhaps proportional to the value as well. It could also be the wedges moving up, rather than getting taller. Etc. 090914/Monday Haiku (a desktop OS inspired by BeOS) releases its first alpha as a bootable ISO image. http://www.haiku-os.org/news/2009-09-13_haiku_project_announces_availability_haiku_r1alpha_1 <> Before the break to do LEGO/LUGBULK stuff, I had written Exec/Copy and used it instead of Exec/ProcCheck. One bug I had to fix was that Copy wasn't appending the body of a construct to the current sequence. I fixed that, but the testing code I had put in to chase that was showing further problems with compile time procs. A difference in length of the code for the stub proc used to call the compile time proc was showing up. First, I'm not supposed to use Exec/Copy on those stub procs at all. The problem there was that when a compile time proc is used outside of a proc (e.g. in the expression for a package-level constant declaration), there is no "callingProc" to pass in for "DefineProcPhase6Special"'s "forceProc" parameter, and so the test in defineProcPhase6 was going wrong. This is fixed by using "theProc"'s value in that case. However, that doesn't explain why I was seeing the complaint about the bytecode length differing after the recompile of the stub proc.
The issue there is hard to wrap my mind around, but what happens is that the initial compile of the stub proc (in this case in "ctime.z", calling "yieldType") yields this bytecode:

    Disassemble: proc '':
    0000: jsr yieldType
    0003: rtsv

However, the proc call code then generates an aek_alternate node for the call to "yieldType", with the alternate half being the "uint" type node itself (since that is what "yieldType" returns when run). So, on the second compile of the stub proc, it gives:

    Disassemble: proc '':
    0000: pshtr uint
    0005: rtsv

which is equivalent, and is actually the correct thing to happen. Notice that the second version is 2 bytes longer, since the 'pshtr' instruction takes 5 bytes where the 'jsr' took 3. Note, however, that all of this is simply providing a type value to use in a package-level declaration. So, the code doesn't matter, and neither version is ever executed at runtime. Now that I discard the result of Exec/Copy (I keep the original so that I don't have to make Exec/Copy bother about comments, etc.), it is the first version of the bytecode that runs at compile time. So, the above fix for the call to "DefineProcPhase6Special" is all that is needed.

090915/Tuesday When using Exec/Copy, any warnings are emitted a second time at the end of the proc being compiled. Avoid that. DONE.

090917/Thursday Google's new language, "Noop" (no-op), based on the Java Virtual Machine: http://code.google.com/p/noop/source/checkout

090927/Sunday Way too much work on the train show and on the NALUG LUGBULK order. Here is an email I'm just sending to Don:

I finally got back to working on Zed this morning. I had it mostly figured out by lunch time, and fixed it this afternoon. I'm glad I noticed and chased it now instead of a couple of years later! The bug has been there a *long* time. The bug is only in the Zed source for the Proc package.
It is in a place where there is a slight difference between the Zed source and the C source, necessitated by the need to avoid using shared (between Zed and C) structures that are not yet defined when the C version is running. The bug would cause no symptom in nearly all situations. The outer scope for a proc ended up not being properly used when the proc was created by the Zed version of the code. The problem is *only* for the outer scope, which is created by that code so that its callers do not have to create it themselves. In the C version, the parser uses a routine it already has to parse a scope complete with its contents. So, for a proc created by the Zed version, there were no "scope enter" or "scope leave" instructions generated within the proc. That caused the disassembler to be unable to find the name of a top-level local symbol in the proc. It would also result in a variable not being initialized on entry to the proc (which would only matter if you use it without initializing it explicitly). And, if the variable is a trackable pointer, on exit from the proc there would have been no decrement of the usecount of any value the variable had, thus resulting in a possible small memory leak. The error has to do with the scope that is started in DefineProcPhase5 and finished in DefineProcPhase7. The body passed into DefineProcPhase6 was being used directly as pr_body for the new proc, so the scope was not visible at that level. That is OK with the C version of the code, where the parser ensures that a scope exists. The simple fix is to call Exec/SequenceAppend to append pr_body to that sequence, and then use the result of Exec/ScopeNew as the new value for pr_body.

Hmm. I don't allow 'nil' as a value for a string constant. Should I? I ran into this with "SPECIFIC_PROC" in Proc.z. DONE.

Right now, Display of package Names shows only proc pre-declarations for a number of procs.
FIXED, but I don't understand why I have to copy the proc body into tdp_newProc as well as ectx_containingProc when the proc has been predeclared. RESOLVED.

090930/Wednesday While testing the above fix, I ran into a problem with generic/dyn.z. I think the issue somehow relates to declaring enough variables (11 words?) inside an "if false then" body. This causes routine "ct4" to get a NIL or invalid value for "pctx". FIXED - because Exec/Copy only copies the ae_alternate side of an aek_alternate, the second computation of the maxLocalSize value yields a smaller value, and thus parameter offsets are messed up. The fix is to move the assignment to pr_maxLocalSize to after the code which uses Exec/Copy.

I've changed Exec/Copy to be Exec/copy, with Exec/Validate. Having done that, I now get an error when declaring the actual Exec/Validate, complaining that the export targets are not the same. They are "(../Proc)" in both cases. In gdb, I see that the two target vectors have the same size, and their elements are both PathToPackage_t's of ".." "Proc", with the ptp_pk being the same. But they are not the same PathToPackage_t values in this case. Why? The same export targets exist in Exec0.z and Exec.z, and they work - why not for Exec0.z versus Exec_Copy.z? Is it because it is a subpackage? Hmm. That could be true - Exec/Validate should be defined in the subpackage, not the package. Interesting. FIXED, by using the pk_paths from the containing package, not the containing subpackage, in Package/addPathFromTarget.

091003/Saturday <> Discussions in comp.arch about parallel programming, and how it affects both high and low levels of programming. The need for arbitrary types of memory, visible at the programming language level, is suggested. So, this would need things like record allocators to be specific to a given type of memory, rather than (or as well as) specific to a thread. There would need to be facilities in the language to define memory types.
Specific code generators could perhaps understand tags and use the right code sequences for accesses. Perhaps some tags actually disallow the assigning of a tagged value to an untagged destination, rather than simply discarding the tags. Similarly, it might be possible to assign an untagged value to a tagged destination, depending on the tags, since that might only add restrictions to the accesses.

091004/Sunday The Normalize calls are missing in Types/Copy and Types_ByteBufferIO - if I don't get rid of the whole concept, they are needed. [Resolved]

091006/Tuesday Currently there is type TempExecList_t. There should be a struct type, say TempExecSet_t, that contains some number (4?) of fixed Exec_t references, along with a list head for TempExecList_t. Then, those situations which currently have TempExecList_t can have the new struct internal to them, and thus save allocations whenever the list has no more than 4 elements. It would also be possible to do that in the permanent Exec_t records, but that is a more interesting choice, since they already use arrays. However, there is Call_t, which has one parameter directly. Perhaps other cases could do the same, or have a different number of direct entries. Or, all could do something similar to this suggestion for TempExecSet_t. DONE.

091007/Wednesday Putting in Exec/TempExecs_t, and moving types around a bit. Hit a problem with the new Verify code: I get errors saying that type TempExecList_t is not defined. These are verification errors, since they come at the end of the proc using them, and they include the generic "verification failed" error. The type in question is now a 'local' type, so somehow Exec/copy is not finding local symbols. FIXED. The issue was with Package/FindNameInPackage. When called from Types/checkNamed, "targetPk" is the package in which the symbol was defined, not the subpackage. That is because Types/NamedDesc_t does not record the subpackage as different from the package.
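The 091006 TempExecSet_t idea is the familiar "small buffer" optimization: keep the first few elements inline in the containing struct and fall back to a linked list only for the rest, so short lists cost no allocations. A minimal C sketch follows. TempExecSet, ExecNode, set_add, set_get and set_free are hypothetical stand-ins, not the actual Zed types or procs; Exec is left opaque, and the allocation counter exists only to make the saving visible.

```c
#include <stddef.h>
#include <stdlib.h>

#define SET_INLINE 4              /* "some number (4?)" of fixed entries */

typedef struct Exec Exec;         /* opaque stand-in for Zed's Exec_t */

/* Overflow node, playing the role of a TempExecList_t element. */
typedef struct ExecNode {
    struct ExecNode *next;
    Exec *item;
} ExecNode;

typedef struct {
    Exec *fixed[SET_INLINE];      /* first entries: stored with no allocation */
    size_t count;                 /* total number of entries */
    ExecNode *head, *tail;        /* overflow list, used only past SET_INLINE */
} TempExecSet;

static size_t allocations;        /* counts mallocs, to show the saving */

void set_init(TempExecSet *s) {
    s->count = 0;
    s->head = s->tail = NULL;
}

/* Appends 'e'; returns 1 on success, 0 if an overflow node could not be allocated. */
int set_add(TempExecSet *s, Exec *e) {
    if (s->count < SET_INLINE) {
        s->fixed[s->count++] = e;
        return 1;
    }
    ExecNode *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    allocations++;
    n->next = NULL;
    n->item = e;
    if (s->tail != NULL)
        s->tail->next = n;
    else
        s->head = n;
    s->tail = n;
    s->count++;
    return 1;
}

/* Returns entry 'i' in insertion order, or NULL if out of range. */
Exec *set_get(const TempExecSet *s, size_t i) {
    if (i >= s->count)
        return NULL;
    if (i < SET_INLINE)
        return s->fixed[i];
    const ExecNode *n = s->head;
    for (i -= SET_INLINE; i > 0; i--)
        n = n->next;
    return n->item;
}

/* Frees only the overflow nodes; the inline part lives in the struct itself. */
void set_free(TempExecSet *s) {
    ExecNode *n = s->head;
    while (n != NULL) {
        ExecNode *next = n->next;
        free(n);
        n = next;
    }
    set_init(s);
}
```

With four or fewer entries, nothing is ever malloc'd; the same pattern with SET_INLINE set to 1 matches the one-item space described for Package/TempDecl_t on 090905.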
091012/Monday Finished removing the extra alloc/free in Type/Exec temporary types. Looking for other similar cleanups.

%%% Need to remember to test all kinds of "index" values in all contexts, and to change 'getBound' to only accept uint. [See Exec/badTypeForIndexing and Exec/getIndexingValue.]

091013/Tuesday Yesterday I had changed Display/Category_t from a set oneof to an enum, and thought all was fine. Well, it's not. You can't put an enum into a bits field. The reason is that you can assign explicit constants to bits values, and that could allow you to put an invalid value into an enum field, which breaks the Zed semantic rule that enum values are never out of range. So, I need to turn Category_t back into a set oneof. DONE.

091019/Monday <> Need a generic, say called a "row", which is essentially a vector of some type (typically a struct type). You can index them like vectors, but you can always append new elements to the end of the row. Perhaps you can also flag an element as deleted. This is a whole lot like a database. This came to mind when thinking about writing a replacement for my chequebook, if I decide that the cost of my savings account is too large and want to get rid of it.

091020/Tuesday Leave off a '*/' in an interior comment, and the parser basically goes bonkers - it never recovers to the proc level, and so the remaining procs in the package file are not properly defined. [Need more info. If you leave off the '*/', then typically the rest of the file is part of that comment. There must have been some other comment mismatch that got out of that situation.]

091026/Monday Linked to from Slashdot (/.) is an article showing how root running "ldd" can get your machine taken over. It turns out that ldd is just a script that sets the LD_TRACE_LOADED_OBJECTS environment variable and then runs the program; it is the dynamic loader that checks for that variable. The program is started by whatever loader its ELF file specifies, which need not be the standard one. The actual tracing is done by that loader, not by ldd.
So, by pointing at a non-standard loader, and then getting root to run ldd on a binary compiled to reference that non-standard loader, you can get your code run as root. To me, a simple fix could be within the standard loader:

    if have root privs and
       alternate loader specified and
       alternate loader not owned by root
    then
        issue warning message;
        exit
    fi;

The whole concept of being able to specify an alternate loader seems pretty hokey to me. It is asking for nasty problems. You *cannot* expect the average user, or even admin, to know of this - it is guru-level stuff.

091028/Wednesday Adding a few months' worth of notes from paper I keep in my bedroom:

<> Watch out for garbage collection happening in the middle of doing the construction of something: capsule, record, variant record...

<> On a proc, have an "active-code" array (which is what is running in the system currently) and a "new-code" array, which is created by compilation. With it are indications of what kind of code it is (bytecode, native code for arch_XXX). Also, whether or not it is linkable to arbitrary blob form (might affect subroutine calls). [I don't understand that last sentence.]

<> With the above, a "link" facility can produce the bytecode boot blob I will need, or a native-code binary for some need.

assign(var-my-inst-of-poly/gen, un-inst-value); Must fail to run. Depends on the actual types. Hmm, should be OK. It's the other direction that might be an issue if the value is allocated in generic code. Then won't EVER work in above. [Well, there cannot be any instantiation of a generic before the end of the generic. So there is no way to 'assign' an uninst value to an instantiated variable because there are no instantiated variables yet. With polymorphism, the objects will have differing actual types, even if one type extends the other. So, the 'assign' will always fail. A regular assignment allows an extending value to be assigned to a destination of the extended type.]
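The loader check proposed in the 091026 entry can be written out in C roughly as below. This is a sketch under assumptions: STANDARD_LOADER is an assumed x86-64 glibc path, "alternate" simply means "any other path", and loader_allowed and check_loader are made-up names, with the decision split into a pure function so it can be tested without being root. One caveat: when a binary names its own interpreter, the kernel starts that interpreter directly and the standard loader never runs, so in practice a check like this would have to sit in ldd itself or in whatever runs first.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed path of the standard loader (x86-64 glibc); varies by system. */
#define STANDARD_LOADER "/lib64/ld-linux-x86-64.so.2"

/* The decision from the pseudocode: refuse only when running as root, an
   alternate loader was requested, and that loader is not owned by root. */
int loader_allowed(uid_t euid, int alternate_specified, uid_t loader_owner) {
    return !(euid == 0 && alternate_specified && loader_owner != 0);
}

/* Returns 1 if it is OK to hand control to 'loader_path', 0 otherwise. */
int check_loader(const char *loader_path) {
    struct stat st;
    int alternate = strcmp(loader_path, STANDARD_LOADER) != 0;

    if (stat(loader_path, &st) != 0)
        return 0;                       /* cannot vet it, so refuse */
    if (!loader_allowed(geteuid(), alternate, st.st_uid)) {
        fprintf(stderr, "warning: alternate loader %s is not owned by root\n",
                loader_path);
        return 0;                       /* the "exit" branch of the pseudocode */
    }
    return 1;
}
```

Ownership alone is a weak test (a root-owned but world-writable loader, or one in a writable directory, would still be a hole), so this only mirrors the idea as stated.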
An unnamed type in a generic will get instantiated, but without the checks in "assignIncompat", will you be able to assign an instantiated value to an uninstantiated destination? [The question is now meaningless. The only uninstantiated destinations are within the generic. At that point there are no instantiations, so no instantiated values.]

<> Should Fmt/Fmt be line buffered, and not necessarily flush at the end of each call? (This relates to the old bundles, but check it for generics.)

When calling API routines, it is normally an uninstantiated value as a parameter. However, an instantiated value is acceptable to a proc wanting an uninstantiated value. So, do the "basesMatch" checks need to be used? In other words, is it always the prototype that matters, not the actual param type? [There is no "basesMatch" anymore. Types in generics cannot be predeclared before the generic, so there is no way to call anything outside of the generic with a type that is either a generic parameter type or a type derived from generic parameter types. Uninstantiated types and procs cannot be used outside of their generic. So I see no issue.]

<><> Preferences. (also discussed earlier)
    /*tag in enum /*section in fieldlist => named tab in preferences popup window
    comments (//) after fields/tags are put into rollovers
    /*field & /*tag descriptions come up when the mouse is over the text of the field-name/tag and the HELP key is hit (how to do that with just a mouse? Right click?)
Perhaps better is to have 'inline' structs for subsections. Field name is the tab name. Field comment is tab rollover. Perhaps need some standardized way where leading numbers (one or two needed, I think) give resource numbers for non-prog-lang variants of the text.

<> (May be discussed earlier.) Reserve 'ichar' and 'istring' for "international", or use 'achar' and 'astring' for ASCII. DONE.

Actuals to compile time procs must be constant exprs, or must go to an Exec_t formal.
VERIFIED

<><> ("templates" discussed earlier - just entering these early thoughts) Not Exec_t, but 'template'. Forms such as 'template', "template void" and "template nil" make sense. Perhaps "template type" for any Exec_t value?

What are the ramifications of exporting a proc which returns a value of a record type which is *not* exported? First result: if you try to use the type directly, the new copy-based proc verification complains that the type (by name) is not exported from the package. Neat. Second result: if you skip over the name, you can use the type to declare a variable, successfully receive a value from the proc, and use it as a tracked value. Third result: you can access the fields of the non-exported record type, because the field names are part of the type, and are thus visible given access to the record type. You cannot assign to the record fields, because of the usual non-public record rules.

The size of a union should be the size of its biggest member, rounded up to the alignment of the whole union (i.e. to the largest member alignment). Fixed, and found and fixed a bug in the C version of 'pretend'.

Am I consistent about whether a Types/NamedDesc_t refers to the main package a name is in, or the subpackage? Consequences of this? Resolved. There were some issues, but I think I've fixed them properly.

091102/Monday %%% What about geni_pk? Is it a package only, or can it be a subpackage? Consequences? Other things containing a package reference?

091103/Tuesday In checking out the issue just above, I've got file procRes.z. What I find is that there is going to be no way to have such a thing as an anonymous record reference. My thoughts had been to hide the type using an extra level of names, as in:

    type XXXPrivate_t = record { ... };
    export type XXX_t = XXXPrivate_t;

However, there is nothing stopping code in another package from using SkipOneName twice to skip both names, and thus get at the underlying record type. From there, you can access the fields of the record.
Is there anything I can do about this? I've currently done this with types "Package/TempDeclPrivate_t" and "Types/TempFieldsPrivate_t", so are those not protecting me as I had desired? Well, the use of the "Private" type for indirection seems pointless - it does not prevent read access to the fields of the struct. However, I recently changed the code so that the public/private attribute applies to struct fields as well as record fields, so you can't actually change any of the fields.

Possibly I can make SkipOneName (and SkipNameAndExec) not skip over names that should not be visible in the current context. That is a bit expensive, I expect, since I'll have to check the active package versus the package of definition, then, if they differ, look the name up in the exports table of the defining package to see if the name is supposed to be visible or not. Even that is not enough, since code can examine the fields of type Type_t and do the skip itself. Even if I prevent any access to fields of record types that are outside of your package (I believe I discussed that earlier), I was going to end up with exported accessor procs for the fields of types Exec_t and Type_t, else you can't do a lot of things that I wanted to allow. For example, "Fmt" and "Display" would not be possible. Having added the 'private' attribute to fields, I can now do this. I think I have to do something with SkipOneName and SkipNameAndExec, however. <> Why? I don't currently see a need.

I think I'm all done now, other than adding the ability to have 'private' Type_t and Exec_t alternatives. If those are done via a 'private' field (likely the only field, unless I decide to allow 'private' for variant fields), then they will be hidden, regardless of any name-skipping done.

091104/Wednesday Working on changing stuff to separate 'ro' and 'const'. 'const' will mean that the value cannot ever change. 'ro' will mean that the value can only change in certain circumstances.
A field that is 'const' can only be set in the constructor. A variable that is 'const' can only be set by its initialization. A field that is 'ro' can only be changed by code in the package defining the type. A package variable that is 'ro' can only be changed in that package. A local variable that is 'ro' can only be changed by code in its scope, and when an '@' of it is passed to procs, it must be passed to 'ro' '@' formals. DONE.

The question of the need for explicit 'const' (as it is now called) on all '@' formals and local variables came up again. Note that the iterator variable in a 'for' statement is implicitly 'const' - the 'const' is not there explicitly. This is inconsistent with requiring it to be explicit for '@' formals and locals. What I've seen is that the use of 'const' can make some proc headers quite long. I even saw a couple where the combination of the proc name, the parameter type and the result made it so long that it didn't fit on a line! OK, let's go back to not requiring them. Sigh. DONE.

OK, have removed them all. Have replaced 'ro' with 'const' for package and local variables, proc formals and fields. Have replaced 'private' with 'ro' in fields. Need to implement new 'private' in fields. Also need to implement new 'ro' for local variables. DONE.

091105/Thursday 'const' means that the value is initialized once and then can never be changed again, by any code (ignoring low-level code using cheats). 'ro' means that the value cannot be changed in some contexts, but probably can be changed in some other contexts.

Test file roField.z does not get expected errors, and gets an extra one from Exec verification. RESOLVED.

Can't have a capsule inside a generic, so there are some tests in "modifiable" and in FieldRefNew that are not needed. FIXED.

Still seems to be mixing up 'ro' and 'private' in new privField.z. FIXED.

Check up on the sf_notMine flag, now that there are 'ro' and 'private'. There were problems. Fixed and cleaned up.
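C has a single keyword where the 091104/091105 entries distinguish two, which makes it a handy point of comparison. A `const` on a C pointer parameter only forbids writes through that one access path, so it behaves much more like Zed's 'ro' than Zed's 'const'; an object that is itself defined `const` is the nearer match for Zed's 'const'. This is an analogy only, with made-up names, and says nothing about how Zed implements either attribute.

```c
/* Closest to Zed's 'const': the object itself is const, so writing it through
   any path (e.g. by casting the const away) is undefined behaviour in C. */
static const int fixedLimit = 100;

int get_limit(void) {
    return fixedLimit;
}

/* Closest to Zed's 'ro': 'view' only forbids writes through this one path.
   The same object may still change through another path, here 'other'.
   Returns how much the value seen through 'view' changed. */
int change_seen_through_view(const int *view, int *other) {
    int before = *view;
    *other = before + 1;      /* legal write through a non-const alias */
    return *view - before;    /* the compiler must re-read *view here */
}
```

Calling change_seen_through_view(&x, &x) returns 1: the "read-only" view saw the value change, which is exactly the "cannot be changed in some contexts, but probably can be changed in some other contexts" meaning given for 'ro'.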
091106/Friday The combination 'const' 'noInit' for a field seems pointless if there is no explicit constructor (as there isn't for a struct, record or union type). However, it *does* serve to reserve space in the compound type that cannot be touched. So, I shouldn't complain about it. [No error now.]

091107/Saturday <> Apparently there are sometimes big chunks of memory that are written but never read back by the same process. One example given (comp.arch) is that of graphics images, and other stuff that will just be DMA-ed somewhere. Then, the buffers are written again with the next data set. So, is it worthwhile to have a 'wo' (write-only) tag that can be applied to pointers, '@'s, matrices, variables, etc.?