2 Compiling and installing
3.6 Artificial sweeteners
4.1 Evaluation rules
4.2 Intrinsic operators
4.3 Special forms
6 Abstract syntax definitions
6.1 Compiler back-end interfaces
in the examples/jolt directory of the idst release. If you want to exercise it, typemake
or runmake run
for an interactive prompt../main
Floating-point numbers consist of an optional sign, one or more digits, a decimal point, zero or more digits, and an optional exponent consisting of a single letter 'e' and an integer. Only decimal floating-point numbers are supported.
Note: Jolt does not currently support floating-point numbers.
\a 'bel' (ASCII code 7) \b backspace (ASCII code 8) \e escape (ASCII code 27) \f form feed (ASCII code 12) \n newline (ASCII code 10) \r carriage return (ASCII code 13) \t horizontal tab (ASCII code 9) \v vertical tab (ASCII code 11) \nnn octal character nnn (ASCII code 0nnn)
Interaction with (sending messages to) the objects that describe program structures and the compiler environment is supported by the following syntactic sugaring:
- (single quote) is equivalent to (quote element)
- (backquote) is equivalent to (quasiquote element)
- (comma) is equivalent to (unquote element)
- (comma followed by commercial at) is equivalent to (unquote-splicing element)
[ receiver selector [arguments...] ]is equivalent to
( send (quote selector ) receiver [arguments...] )(where no attempt is made to impose the correct number of arguments for the selector provided). Note that the syntactic sugar quotes the selector for you (if you need to compute the selector, use the send form directly). Note also that the send form and the [...] syntax have the receiver and selector in a different order.
For keyword selectors one more feature is provided:
[ receiver keyword1: argument1 keyword2: argument2 ... ]is equivalent to
( send (quote keyword1:keyword2:... ) receiver argument1 argument2 ... )with the individual keyword parts collected into a single selector. This feature makes unary, binary and keyword message syntax entirely compatible with traditional Smalltalk syntax.
Strings evaluate to a primitive (non-object) pointer to an array of characters terminated with a nul character.
Identifiers always represent an r-value (some physical location associated with the identifier during the execution of the code in which the identifier appears). When they appear outside special syntactic forms they evaluate to the associated l-value -- the value stored in the physical location associated with the identifier.
Lists (that are not special syntactic or semantic forms, as described below) represent the application of a function value to zero or more argument values. The form
( function [arguments...] )evaluates the function and arguments (if present) in an unspecified order. The result of evaluating function must be a function (native code) address that is called with the evaluated arguments.
Certain built-in operators behave like functions but are not evaluated as identifiers. (Their arguments are evaluated exactly as descried above.)
The intrinsic arithmetic and relational operators are as follows:
( + args... ) addition of two or more arguments ( - arg ) negation of a single argument ( - args... ) subtraction of two or more arguments ( * args... ) multiplication of two or more arguments ( / args... ) division of two or more arguments ( % args... ) remainder after division of two or more arguments ( < args... ) non-zero iff first argument less than second argument <= >= > ... idem for other (in)equalities ( == args... ) non-zero iff arguments identical ( != args... ) non-zero iff arguments not identical
Pointers behave like (long) integers. Pointer dereferencing is provided for the traditional machine types:
( char@ address [index] )evaluates to the integer (char, short, int or long, sized according to local platform conventions) stored at the given address in memory. If the optional index is present, it is first scaled by the size of the addressed value (typically by 1, 2, 4 or 8) and then added to the base address to yield the effective address for the operation. The related forms
( short@ address [index] )
( int@ address [index] )
( long@ address [index] )
( set-char@ address [index] value )write value into a location whose effective address is calculated as just described. (These 'setter' forms need not be written explicitly; see the set form described below.) The form as a whole yields value as its result.
( set-short@ address [index] value )
( set-int@ address [index] value )
( set-long@ address [index] value )
- ( and args... )
- evaluates each of the args from left to right until one of them yields zero; args to the right of the first zero value are not evaluated. The result of the overall form is non-zero iff all args yield non-zero.
- ( or args... )
- evaluates each of the args from left to right until one of them yields non-zero; args to the right of the first non-zero value are not evaluated. The result of the overall form is zero iff all args yield zero.
- ( if condition consequent [alternate] )
- evaluates the condition and then either the consequent (if the condition was non-zero) or the alternate (if present). The result of the overall form is the result of whichever of consquent and alternate is evaluated. If alternate is not present and condition is zero then the overall result is zero.
- ( while condition [args...] )
- while the condition evaluates to non-zero, all the args are evaluated in order, left to right. This process continues until condition evaluates to zero, causing the overall form to yield zero.
- ( let ( [bindings...] ) [args...] )
- evaluates each of the args in order, left to right, in an environment augmented by the result of evaluating the bindings. Each binding has the form( identifier value )where value is evaluated according to the usual rules before being bound to identifier during the evaluation of the args. The identifiers are added to the environment before evaluating and binding their values (which is a bug and will be fixed sometime in the near future). The overall result is the value of the last of the args; if no args are present then the overall value is zero.
- ( set identifier value )
- evaluates value and then assigns it to the location associated with the r-value identifier.
- ( set ( identifier args... ) value )
- does not cause any evaluation. It simply rewrites itself in the form( set-identifier args... value )and re-evaluates the resulting form. This permits any field or slot 'getter' to be converted into a corresponding 'setter' by enclosing the getter expression in a set form. For example, the expression( set ( long@ address ) 42 )is rewritten as( set-long@ address 42 )by the compiler and then immediately re-evaluated to effect the assignment.
- ( lambda ( [formal...] ) [expr...] )
- evaluates to the address of a function accepting zero or more formal arguments, where each formal must be an identifier.
When applied, the function value of a lambda expression evaluates each of its exprs in order, from left to right, in an environment in which each formal names a variable bound to a corresponding actual argument supplied in the function application. The value of the applied function is the value of the final expr; if no expr is present then the function returns zero.
Note: The evaluation of a lambda yields a function that can be applied; the exprs are not evaluated until the function result of lambda is applied to zero or more actual arguments as explained for the evaluation of lists above. A given function value can be applied any number of times and obeys the local platform's C ABI conventions; it can therefore be passed as a callback 'function pointer' to statically-compiled library and OS routines.
- ( return value )
- appearing in the body lambda function evaluates value and then immediately returns control to the function's caller yielding value as the result of the function application.
- ( define identifier value )
- evaluates value, creates a variable binding for identifier in the current scope, and assigns the value to the variable. If the form appears as a top-level expression then the binding is created in the current global environment. If the form appears within a let or lambda expression then the binding is created as a local (automatic) variable within that expression.
- ( syntax identifier function )
- is similar to define except that the function value must evaluate to an applicable function address. The identifier names a new abstract syntax tree node type with the function providing the compilation rules for any node of the newly-created type. (The scoping rules for the binding are identical to those of define). See the section below on abstract syntax definitions for further explanation of the function argument.
- ( quote element )
- evaluates to an object corresponding to the compiler's internal representation for element. (This object can be incorporated into abstract syntax trees passed to the compiler for evaluation and compilation.)
- ( send message receiver [args...] )
- evaluates message, receiver and args (if present). The message must evaluate to an identifier and receiver to a compiler object. The message identifier is then sent as a message to the receiver object along with any args supplied. The overall value of the form is the (object) answered by the message send.
( _dlsym handle symbol )evaluates symbol to yield a string and handle to yield an integer representing a library or process namespace. The primitive returns the address of the code or data location associated with the symbol in the namespace.
Which namespaces are searched depends on the handle parameter. If handle is zero then every namespace associated with the process and its dynamic libraries is searched in the order they were loaded. If handle is associated with a library opened with the library function dlopen then only that library (and any libraries it depends on) are searched for the symbol.
The _dlsym primitive returns zero if symbol cannot be found, and sets an error condition which may be queried with the dlerror library function.
Note: the library functions dlopen and dlerror must be imported explicitly. For example:(define _dlopen (_dlsym 0 "dlopen")) (define _dlerror (_dlsym 0 "dlerror")) (define printf (_dlsym 0 "printf")) (define exit (_dlsym 0 "exit")) (syntax begin (lambda (form) `(let () ,@[form copyFrom: '1]))) (define dlopen (lambda (name) (let ((handle (_dlopen name 0))) (if handle handle (begin (printf "Error: dlopen: %s\n" (dlerror)) (exit 1))))))
where type is the name of the defined node type and action... are statements/expressions controlling the compilation of the node.(syntax type (lambda (node compiler) action...))
The actions are responsible for compiling the node. Three mechanisms are available to the actions:
defines a begin node type providing 'compound expressions' that evaluate several expressions in order and yield the value of their final expression. The node is rewritten as a let and returned to the compiler for re-evaluation.(syntax begin (lambda (node compiler) `(let () ,@[node copyFrom: '1])))
The node is sent copyFrom: to extract all but its first element (removing the begin from the start of the node) and the ,@ splices the result into a quasiquoted let expression. The argument to copyFrom: must be an object (not a primitive value) hence the quoted integer.
(where compiler is the second argument passed to the syntax function).[anObject compile: compiler]
The vpu implements an abstract machine supporting a superset of C semantics. Three stacks are used.(syntax foo (lambda (node compiler) (let ((vpu [compiler vpu])) ...)))
Function prologue and epilogue
- [ vpu new ]
- creates a new instance of a vpu. (This is normally not of interest, since the vpu passed to the syntax function indirectly through the compiler argument will be the correct instance to use for code generation.)
- [ vpu iTmp: numTemps ]
- [ vpu iTmp ]
- pushes numTemps integer-valued temporary variables onto the variable stack. In the second form, the missing numTemps is assumed to be one.
- [ vpu dropTmp: numTemps ]
- removes the topmost numTemps variables from the variable stack.
- [ vpu begin: numLabels ]
- pushes numLabels local labels onto the label stack.
- [ vpu end: numLabels ]
- removes the topmost numLabels local labels from the label stack.
- [ vpu define: n ]
- defines the nth local label (relative to the topmost) in the label stack to refer to the current 'program counter' location. A local label must not be defined more than once. A local label can be left undefined provided it is never used as the target of a branch instruction.
- [ vpu iEnter ]
- declares the prologue for an integer-valued function. This must be the first (non label defining) instruction in a function.
- [ vpu iArg: numArgs ]
- [ vpu iArg ]
- declares numArgs integer-valued arguments. Arguments must be declared immediately after the iEnter prologue and before any other instructions in the function body. In the second form, the missing numArgs is assumed to be one.
- [ vpu ret ]
- returns the topmost item on the stack as the result of the function being compiled. A ret always pops one item from the stack (the value returned), can occur any number of times in the body of a function, but must also be the last instruction in the function.
Arithmetic and relational operators
- [ vpu ldInt: integer ]
- pushes integer (which must be an integer object value) onto the arithmetic stack.
- [ vpu ldPtr: object ]
- pushes the address of the given object onto the arithmetic stack.
- [ vpu ldArg: n ]
- pushes the nth argument of the function being compiled onto the arithmetic stack. Arguments are numbered from zero.
- [ vpu stArg: n ]
- stores the topmost item in the arithmetic stack into the nth argument of the function being compiled. The item is not removed from the stack.
- [ vpu ldTmp: n ]
- pushes the contents of the nth local variable (relative to the top of the variable stack) onto the arithmetic stack.
- [ vpu stTmp: n ]
- stores the topmost item in the arithmetic stack into the nth local variable. The item is not removed from the stack.
- [ vpu dup: n ]
- [ vpu dup ]
- pushes a copy of the nth item in the arithmetic stack (relative to the top of the stack before the value is pushed) onto the top of the arithmetic stack. In the second form, n is implicitly zero (i.e., the topmost item in the stack is duplicated).
- [ vpu drop: n ]
- [ vpu drop ]
- pops the topmost n items off the arithmetic stack. In the second form, n is implicitly one.
- [ vpu add ]
- [ vpu sub ]
- [ vpu mul ]
- [ vpu div ]
- [ vpu mod ]
- replaces the topmost two items on the arithmetic stack with their sum, difference, product, quotient or remainder (respectively).
- [ vpu lt ]
- [ vpu le ]
- [ vpu eq ]
- [ vpu ne ]
- [ vpu ge ]
- [ vpu gt ]
- replaces the topmost two items on the arithmetic stack with zero or one according to whether the topmost item is less than, less or equal to, equal to, not equal to, greater or equal to, or greater than (respectively) the second item.
- [ vpu rdb ]
- [ vpu rdh ]
- [ vpu rdw ]
- replaces the topmost item on the arithmetic stack with the byte, half word or word (respectively) at that address in memory.
- [ vpu wrb ]
- [ vpu wrh ]
- [ vpu wrw ]
- writes the second item on the arithmetic stack into the memory byte, half word or word (respectively) addressed by the topmost item in the stack. The topmost item is then popped from the arithmetic stack.
Compilation and miscellaneous information
- [ vpu br: n ]
- transfers program control to the instruction immediately following the point of definition of the nth local label (relative to the top of the local label stack).
- [ vpu bf: n ]
- [ vpu bt: n ]
- pops the topmost item off the arithmetic stack and conditionally transfers program control to the instruction immediately following the point of definition of the nth local label (relative to the top of the local label stack). The bf instruction transfers control to the label's address only if the topmost item popped from the stack was zero. The bt instruction transfers control to the label's address only if the topmost item popped from the stack was non-zero.
- [ vpu iCall: n label: label]
- [ vpu iCalli: n ]
- pops n actual arguments from the arithmetic stack and calls an integer-valued function, leaving the result of the function call on the top of the arithmetic stack. In the first form the address to call is retrieved from a global label (see the section on global labels below). In the second form the address to call is popped off the arithmetic stack before the arguments (i.e., the arguments are pushed followed by the address to call in preparation for an iCalli instruction). Arguments are pushed in 'reverse' order, starting with the last and working towards the first.
- [ vpu compile ]
- compiles an entire function, as described by the sequence of message sends previously sent to the receiver, into native code. This message must never be sent more than once to a given instance of the vpu.
- [ vpu codeSize ]
- answers an integer object representing the total number of native bytes compiled by any vpu in the current process since the process began executing.
- [ vpu versionString ]
- answers a string object containing a human-readable version string for the vpu library linked into the running process.
(See also the iCall:label: instruction described earlier.) Once defined, the function address associated with a global label can be retrieved or invoked as follows:
- [ VPULabel new ]
- answers a new, uninitialised global label.
- [ vpuLabel lookup: symbol]
- defines the receiver as the address associated with the code or data object named by symbol (a string object) looked up in the process global namespace via _dlsym (see the section on primitives).
- [ vpu defineLabel: label]
- defines the given global label to refer to the address of the next instruction generated in the receiver.
- [ vpuLabel _labelAddress ]
- answers a primitive pointer equal to the defined address of the receiver.
- [ vpuLabel value ]
- answers the result of calling the defined address of the receiver as a function with no arguments.
- [ vpuLabel value: argument ]
- answers the result of calling the defined address of the receiver as a function with a single argument.