Wednesday, December 5, 2018

Lisp flavoured java

The problem of how to make a thin layer over the Java virtual machine instruction set is addressed here. Lisp programs are built up two components: atomic expressions and compound forms. As the Java virtual machine is a stack machine, the thin layer over it will be constructed by making it so that atomic expressions correspond to push instructions and compound forms correspond to everything else. In this way, the Java virtual machine can be made isomorphic to a Lisp dialect. The structure of the Java virtual machine described in the JVM specification further describes the instruction set of the JVM and how it handles types with prefixes with type prefixes like Top which describes how instructions are provided as combiners.

Atomic expressions:
The Atomic expressions in the Java virtual machine come in two forms: constants and variables. The atomic expressions in Lisp correspond directly to the Atomic instructions in the Java virtual machine. In this way, the Lisp flavored java is orthogonal to the Java virtual machine's high level instruction set. Here are some examples of constant expressions:
10
2.2
"foo"
Integer like expressions are read as integers rather then as longs by default, just like the Java programming language, though unlike Clojure which assumes you are using a long by default. The use of integers by default is necessary for any kind of low level programming on the Java virtual machine, as ints are used for array access, array length, etc and they have the most specialized instructions associated with them, so especially for performance it is necessary to use integers by default. Float like expressions are read as doubles just like Java again, as there is no need to change that standard. So literals are like what you would expect from Java. The atomic expressions are directly converted one to one to push instructions and the appropriate push instructions that save the most space are determined for you.
bipush 10
ldc2_w 2.2
ldc "foo"
Variables are different from constants in that they can have a type associated with them. Consequently, type declarations can be associated with variables. One of the first thing one notices when writing JVM bytecode by hand, is that it is hard to keep track of local variables without the use of names, so certainly is nice to be able to use symbols instead that refer to local variables.
(type I a)
(type J b)
a
b
The appropriate push instruction is determined for you based upon the type of the local variable referred to be the symbolic expression. The atomic expression then corresponds one to one with a push instruction in the machine code.
iload_0
lload_1
The other type of variables besides local variables is class variables. The class variables can be designated using a slash in a symbol.
java.lang.Math/PI
java.lang.System/out
The class variables are then converted to get static expressions. Since variables can have types, class variables can have types as well. The types of class variables is determined by reflection rather then by type declarations.
getstatic java/lang/Math.PI D
getstatic java/lang/System.out Ljava/io/PrintStream;
This demonstrates how different atomic expressions like constants, local variables, and class variables in the language correspond directly to push instructions in the Java virtual machine, making the language a thin layer over the Java instruction set.

Compound forms:
Compound forms are constructed by combining atomic expressions with a combiner. The definition of combiners is determined by their types as described by the Java virtual machine specification. The appropriate typed version of an instruction is then determined at compile time. Consider the operation Tneg which can come in the forms ineg and lneg. The Tneg combiner is compiled to the appropriate instruction based upon the type of the argument passed to it. The standard means of defining these typed combiners is to simply remove the T prefix in front of it. So Tneg is simply neg.
(neg 1)
The neg operation is then compiled to its corresponding combiner instruction well the atomic expression one is compiled to its appropriate push instruction and these are then added to the machine code.
iconst_1
ineg
The same is true for the Trem operation which is simply reduced to rem when it appears in a compound form.
(rem 10 2)
When an operation like rem is applied to multiple arguments the order the arguments appear in is preserved by the stack machine. In many ways the nature of Lisp as a means of constructing programs from ordered lists is especially suited for programming with a stack machine.
bipush 10
iconst_2
irem
In order to make the programming easier, for the other arithmetic operations + corresponds to Tadd, * corresponds to Tmul, - corresponds to Tsub, and / corresponds to Tdiv. These are merely aliases for these fundamentally typed operations.
(type I x)
(/ (* x (+ x 1)) 2)
These expressions can be nested, and then their arguments will be pushed onto the stack in order and then evaluated accordingly. In this way, we can see how this Lisp dialect directly corresponds to the JVM instruction set.
iload_0
iload_0
iconst_1
iadd
imul
iconst_2
idiv
Logical operations like shl,shr,ushr,and,or,xor are similarly provided directly to the programmer accordingly and they are compiled to their typed versions. Casts can be expressed as T2i, T2l, T2f, T2d, T2s, T2c, and T2b. These convert an operand of some type to some other type. The ordinary standard of expressing these operands by removing the type in the front would leave a combiner with a name starting with a letter, so I decide that these can aliased by their cast names int is T2i, long is T2l, float is T2f, double is T2d, short is T2s, char is T2c, and byte is T2b. This basically describes how the data transformations in the JVM can be converted directly to a Lisp like syntax in an effective manner. The next detail is to determine how to deal with arrays. Elements of arrays can be loaded with aload as you would expect.
(type I i)
(type (arr I) coll)
(+ (aload coll i) (aload coll (+ i 1)))
This expression and all of its components are then converted directly to the following bytecode:
aload_0
iload_1
iaload
aload_0
iload_1
iconst_1
iadd
iaload
iadd
Arraylength is an interesting case unlike the other ones described so far as it doesn't have any special type information associated with it, because it simply applies to arrays every time.
(type (arr I) coll)
(arraylength coll)
This is one is so simple that you hardly even need a compiler to deal with it, it can be converted directly to its associated bytecode.
aload_0
arraylength
You can do a getfield operation much like you would in Clojure by specifying the field name in the combiner.
(type java.awt.Point point)
(.-x point)
The owner of the field is determined by the argument passed to the field accessor, and then the type of the field is determined by reflection which then is used to output the appropriate getfield argument.
aload_0
getfield java/awt/Point.x I
So all of the atomic expressions and combiners so far have directly correspond to instructions in the Java virtual machine instruction set with their operands pushed onto the stack. You can make instructions that have instruction parameters by using extended prefix notation. Extended prefix notation means that the instruction parameters are passed in the front next to the combiner before the operands to be pushed onto the stack are specified.
(type java.lang.Object obj)
(newarray I 9)
(multianewarray I 9 9)
(instanceof java.lang.String obj)
These operations like new, newarray, multianewarray, instanceof, and checkcast correspond directly to their corresponding instructions except they must have some instruction parameter passed to it in the front.
bipush 9
newarray int
bipush 9
bipush 9
multianewarray [[I 2
aload_0
instanceof java/lang/String
This describes how all the different combiners and atomic expressions can be made to correspond to machine instructions, except in the case in which their is an extended prefix notation which means that the programmer will have to pass an argument in the front to define the instruction before the following stack operands. This deals with most of the data operations of the Java virtual machine. Miscellaneous other no operand instructions like nop, pop, athrow, monitorenter, and monitorexit simply correspond directly to their machine instructions. Pop is simply a function which takes its argument pushes it onto the stack and then returns nothing. Variable modification and control flow will be dealt with later.

No comments:

Post a Comment