Wednesday, 2013-10-30
The Proto Interpreter for J
After spending another evening or two I incorporated a bunch of corrections into a better write-up; you probably want to read that instead.
This comes from Appendix A
of An Implementation of J by Roger Hui via Eugene
Wallingford's post
at Knowing and
Doing. He commented that I think it will take me a week of hard
work to grok this code
and...well...that felt like a challenge to
see how fast I could work through the code (about two and a half hours,
it turns out). I wrote it up as I went along, so here's my attempt at
commentary.
Of course, I haven't tried actually using the language, so I can't really say I grok it. It does (amazingly enough) compile on a recent gcc C compiler, so I'll have to play around with it.
Here we go:
Everything is a single letter—digits 0-9, variable names (‘a’-‘z’), and operators (‘+’, ‘{’, ‘~’, ‘<’, ‘#’, and ‘,’). Operators can have both a binary (infix) and a unary version, though currently there are only five operators of each type (and six operator characters). '=' is used for assignment and is built into the interpreter.
The single data type is an array (supposed to be 0 to 3 dimensional AFAICT).
I suspect it's fairly useless as it has no arithmetic or logical operations other than addition (not even negation). And I don't see any way to define functions or procedures or whatever. (10/31: also no parenthesis or other mechanisms for grouping—it's strict left-to-right evaluation with no operator precedence, so you have to assign any partial results to a variable and then use them.) But it's a neat example of an interpreter.
Some abbreviations to support the terse style:
typedef char C;typedef long I;
#define P printf
#define R return
The Array data type:
- t - “Type”: Are elements pointers (1) or integers (0)?
- r - How many dimensions? (presumably 0-3). (Edit 10/31: Aha! ‘r’ is for rank. I knew there was a word that meant “number of dimensions” but I couldn't think of it.)
- d - Dimensions of the array.
- p - First two array elements (just for convenience in the C code).
typedef struct a{
I t,r,d[3],p[2];
}*A;
Function prototypes (in K&R style?) for unary and binary operators.
That's f(A w)
and f(A a, A w)
.
#define V1(f) A f(w)A w;
#define V2(f) A f(a,w)A a,w;
This is totally straightforward: do x
n
times.
#define DO(n,x) {I i=0,_n=(n);for(;i<_n;++i){x;}}
Allocate memory (assuming 4-byte integers), and copy n integers from destination to source (odd argument order...I'm guessing it mirrors the assembly-language for whatever his preferred CPU was...).
I *ma(n) { R(I*)malloc(n*4); }
mv(d,s,n) I *d,*s;{
DO(n,d[i]=s[i]);
}
Get the size of an array (number of integers), multiplying the
r
dimensions from d
. I have no idea why it's
called tr
.
tr(r,d)I *d;{
I z=1;
DO(r,z=z*d[i]);
R z;
}
“Get Array”? Allocate
a new A
from the r
dimensions at
d
, setting its t
, r
, and
d
parameters to the arguments given. Note that the
allocation size ignores the p[2]
field of the
A
structure.
Note also that tr(0,0) = 1
, so this always gives an
array with at least one value. It uses 0-dimensional values for scalars, so
this is important.
A ga(t,r,d)I *d;{
A z=(A)ma(5+tr(r,d));
z->t=t, z->r=r, mv(z->d,d,r);
R z;
}
V1(iota)
declares a function iota(A w)
.
This treats w
as a single integer and returns an array
whose values are the sequence 0..n-1
. Note that the
i
there is the loop index provided by the DO
macro.
V1(iota) {
I n=*w->p;
A z=ga(0,1,&n);
DO(n,z->p[i]=i);
R z;
}
Add two arrays element by element, allocating a new array for the result.
V2(plus) {
I r=w->r, *d=w->d, n=tr(r,d);
A z=ga(0,r,d);
DO(n, z->p[i]=a->p[i]+w->p[i]);
R z;
}
Remember the V2
prototype: this is from(a,
w)
. It treats a
as a single integer and returns
w[a]
. Basically it drops the first dimension of w and uses
the remaining dimensions to compute the size of the new array.
V2(from) {
I r=w->r-1, *d=w->d+1, n=tr(r,d);
A z=ga(w->t,r,d);
mv(z->p, w->p + (n * *a->p), n);
R z;
}
Return a single-element array containing a pointer to
w
V1(box) {
A z=ga(1,0,0);
*z->p=(I)w;
R z;
}
Concatenate a
and w
, as you would
expect. Note that it uses the t
from the second argument,
but you should presumably never use it on arrays with different types,
so it shouldn't be a problem.
V2(cat) {
I an=tr(a->r,a->d), wn=tr(w->r,w->d), n=an+wn;
A z=ga(w->t,1,&n);
mv(z->p,a->p,an);
mv(z->p+an,w->p,wn);
R z;
}
V2(find){}
The first parameter, a
, gives dimensions (a scalar or a
one-dimensional array) for a new array. The new array is initialized
from the contents of w
, which are repeated as necessary to
fill the space. Again, I don't understand the name...
Edit 10/31: rsh(a, w)
reshapes w
to the dimensions given in a
.
V2(rsh) {
I r=a->r?*a->d:1, n=tr(r,a->p), wn=tr(w->r,w->d);
A z=ga(w->t, r, a->p);
mv(z->p, w->p, wn=n>wn?wn:n);
if(n-=wn) mv(z->p+wn,z->p,n);
R z;
}
Get the dimensions (edit 10/31: that is, the shape) of an array.
V1(sha) {
A z=ga(0,1,&w->r);
mv(z->p,w->d,w->r);
R z;
}
The identity function, obviously.
V1(id) { R w; }
The size of an array (number of elements, i.e. the first dimension). This
gives 1 for a scalar (instead of 0), but should otherwise be the same as the
0th element of sha(w)
.
V1(size) {
A z=ga(0,0,0);
*z->p=w->r?*w->d:1;
R z;
}
Print integers, newlines, and arrays. Note that this is what tells
us the function of the t
field in the array header: if it
is set, we treat the values as arrays and print them recursively,
otherwise we print them as integers.
pi(i) { P("%d ",i); }
nl() { P("\n"); }
pr(w) A w; {
I r=w->r,*d=w->d, n=tr(r,d);
DO(r,pi(d[i]));
nl();
if(w->t) DO(n, P("< "); pr(w->p[i]))
else DO(n, pi(w->p[i]));
nl();
}
The character table (operator names) and the vtables. Each operator
has corresponding two- (vd
) and one-argument
(vm
) functions.
10/31: That's “verb table”, I think. I also believe that the function-pointer tables are for “ditransitive” and “monotransitive” verbs...?
C vt[]="+{~<#,";
A(*vd[])()={0,plus,from,find,0,rsh,cat},
(*vm[])()={0,id,size,iota,box,sha,0};
The state table holds the values of the variables (‘a’ through ‘z’)
I st[26];
Is the token a variable name? (edit 10/31: I think that he thinks of variables as pronouns, because they refer to a value (noun). So this is an abbreviation for “question-pronoun”?)
qp(a) { R a>='a' && a<='z'; }
Is the token an operator (vtable index)? (edit 10/31:
“question-verb”) This is an extremely sloppy check,
as we will then use the value to index into the vm
table (which
has 7 elements), but most of this code does no checking at all, so that's
ok.
qv(a) { R a<'a'; }
Take a tokenized string which represents an expression and execute it recursively. Returns an array result.
We split the string, giving the value
of the first token
and the tail
of the expression.
If value
is a variable name, then (if the next token is
‘=’) we set the variable to ex(tail)
, otherwise
we resolve the name to its value and continue.
Now, if the value is an operator, then do
vm[value](ex(tail))
. Otherwise, we look at the next token
(call it op
) and the new tail. If op
is null,
return value
, otherwise do vd[op](value,
ex(tail))
.
A ex(e) I *e; {
I a=*e;
if(qp(a)) {
if(e[1]=='=') R st[a-'a']=ex(e+2);
a=st[ a-'a'];
}
R qv(a) ?
(*vm[a])(ex(e+1)) :
e[1] ? (*vd[e[1]])(a,ex(e+2)) : (A)a;
}
When tokenizing a string, a noun is a scalar from 1 to 9 (0 means failure).
(10/31: oops, 0 is included, too. An integer 0 means failure, a
pointer to a scalar A containing 0 is fine.) A verb is a valid operator
character (as a 1-based index into the function tables). Other characters are
passed through unchanged.
noun(c) {
A z;
if(c<'0'||c>'9') R 0;
z=ga(0,0,0);
*z->p=c-'0';
R z;
}
verb(c) {
I i=0;
for(;vt[i];) if(vt[i++]==c) R i;
R 0;
}
I *wd(s)C *s; {
I a, n=strlen(s), *e=ma(n+1);
C c;
DO(n, e[i] = (a=noun(c=s[i])) ? a : (a=verb(c))?a:c );
e[n]=0;
R e;
}
Then main
is the obvious read/tokenize/execute/print
loop.
main() {
C s[99];
while(gets(s)) pr(ex(wd(s)));
}
--Josh Grams <josh@qualdan.com>