Thursday, February 7, 2019

Ontology of JVM instructions

The "JVM opcodes by function" classification helps users better understand the opcodes of the Java virtual machine (JVM) by category. But this classification has its limitations: it is a tree, and therefore it doesn't include all the instruction types that may be useful to the compiler developer. Additionally, there is a category of miscellaneous operations, and arraylength is categorized as an object function rather than an array function. At the same time, instanceof and checkcast are classified as object operations even though they can be applied to arrays. It is clear that there is a general type of references, and certain instructions are applicable to both kinds of references (objects and arrays). The only functions that are truly reserved for object references are the instance field functions. The static field functions are not really related to references, and should therefore be grouped under atomic variables with the local variables.



In order to enable stack compatibility, I decided to classify multi-valued instructions and uniquely valued instructions separately. The only multi-valued instructions are the duplication and swap instructions. The uniquely valued instructions have the same calling convention as methods, so they can be grouped together with them. In this sense, this classification of instructions is ultimately stack based, reflecting the fact that the JVM is a stack machine. Here is an alternate view of a hierarchical part of this ontology.
Uniquely valued stack instructions
    Constant push instructions
    Trivial procedures
Atomic variable instructions 
Reference allocation instructions (these return references)
    new, newarray, anewarray, multianewarray
Reference operations (these take references as arguments)
    Reference procedures (zero valued)
        Reference variable modification
        Unary reference procedures 
            throw, monitorenter, monitorexit
    Reference transformations (single valued)
        Reference variable access
        Unary reference operations 
            getfield, arraylength, reference type checkers
Primitive transformations (these take and return primitives)
    Unary primitive transformations
        Cast instructions
        Neg
    Binary primitive transformations
        Binary arithmetic instructions
        Logical instructions
        Comparison instructions
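The split between multi-valued and uniquely valued instructions can be sketched by recording how many values an instruction pops and pushes. This is only an illustration: the StackEffects class and its Insn enum are hypothetical names, the sample opcodes are a small selection, and "uniquely valued" is read here as "pushes at most one value", which is my interpretation of the outline above.

```java
final class StackEffects {
    // Stack effect of a few sample opcodes, counted in stack values.
    // Hypothetical helper type, not part of any JVM library.
    enum Insn {
        ICONST_0(0, 1),     // constant push: nullary, single valued
        IADD(2, 1),         // binary primitive transformation
        ARRAYLENGTH(1, 1),  // unary reference operation
        MONITORENTER(1, 0), // unary reference procedure (zero valued)
        DUP(1, 2),          // duplication: leaves two values
        SWAP(2, 2);         // swap: leaves two values

        final int pops;
        final int pushes;

        Insn(int pops, int pushes) {
            this.pops = pops;
            this.pushes = pushes;
        }

        // Assumption: multi-valued means more than one value is left on the stack.
        boolean isMultiValued() {
            return pushes > 1;
        }

        // Everything else has the calling convention of a method: at most one result.
        boolean isUniquelyValued() {
            return !isMultiValued();
        }
    }

    public static void main(String[] args) {
        for (Insn insn : Insn.values()) {
            System.out.println(insn + " multi-valued: " + insn.isMultiValued());
        }
    }
}
```

Under this reading, only DUP and SWAP in the sample set are multi-valued, which matches the claim that duplication and swap are the only such instructions.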
The atomic variables include both the local variables and the static variables; they are characterized by the fact that no reference is required as an argument to access them. The class of value push instructions, which are nullary single-valued instructions, includes all the constant push operations, the local variable access operations, and the class variable access operations. The value push instructions can be used to compile the atomic expressions of a Lisp dialect, like Clojure or Lisp flavoured Java. When an atomic expression like 2, 2.2, x, or class/name appears in the code, it is automatically converted to a value push instruction. This is part of the correspondence between Lisp and a stack machine: the atomic expressions in Lisp correspond to their own instruction class (the value push instructions), and the combiners correspond to separate instruction classes of their own.
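As a rough illustration of this correspondence, here is a minimal sketch in Java of how a compiler might pick a value push instruction for an atomic expression. The ValuePush class and compileAtom method are hypothetical, the output is informal mnemonic text rather than real bytecode, all local variables are assumed to be ints, and field descriptors are left out.

```java
import java.util.Map;

final class ValuePush {
    // Hypothetical helper: maps one atomic expression to one value push instruction.
    // 'locals' maps local variable names to slot indices (all assumed to hold ints).
    static String compileAtom(String atom, Map<String, Integer> locals) {
        if (atom.matches("-?\\d+")) {
            // Integer literal; assumed to fit in an int.
            int n = Integer.parseInt(atom);
            if (n == -1) return "iconst_m1";
            if (n >= 0 && n <= 5) return "iconst_" + n;
            if (n >= Byte.MIN_VALUE && n <= Byte.MAX_VALUE) return "bipush " + n;
            if (n >= Short.MIN_VALUE && n <= Short.MAX_VALUE) return "sipush " + n;
            return "ldc " + n;
        }
        if (atom.matches("-?\\d+\\.\\d+")) {
            // Floating point literal, treated as a double constant.
            return "ldc2_w " + atom;
        }
        if (atom.contains("/")) {
            // class/name: a class (static) variable access.
            String[] parts = atom.split("/", 2);
            return "getstatic " + parts[0] + "." + parts[1];
        }
        Integer slot = locals.get(atom);
        if (slot == null) {
            throw new IllegalArgumentException("unknown symbol: " + atom);
        }
        // Plain symbol: a local variable access.
        return "iload " + slot;
    }

    public static void main(String[] args) {
        Map<String, Integer> locals = Map.of("x", 1);
        for (String atom : new String[] {"2", "2.2", "x", "Math/PI"}) {
            System.out.println(atom + " => " + compileAtom(atom, locals));
        }
    }
}
```

Every branch returns exactly one nullary, single-valued instruction, which is the property that lets atoms be compiled without looking at their surroundings.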

To make Lisp correspond to the stack machine, you only need atomic expressions to correspond to certain value push instructions and combiner forms to correspond to certain uniquely valued instructions. All Lisp programs consist of these two types of components, just as a stack machine consists of these types of instruction classes, which is why it is so effective to construct a correspondence between them. The only other nullary single-valued instruction is a static method call that takes no arguments and returns something, which is basically a constant function or an object reference allocation. By convention, a constant function should appear as a call like (Class/function) rather than as an atomic term, which keeps such calls out of the class of atomic expressions.

Generic instructions:
Combiners, on the other hand, come in different versions in the JVM depending on the data types of the arguments presented to the instruction on the stack. To make generic instructions available, it is useful to have instruction classes that group the different versions of an instruction by the types they take. For example, the add class could include the iadd, ladd, fadd, and dadd instructions. Then, when a generic combiner is presented to the Lisp flavoured Java developer, it is immediately known that it will produce some member of the generic instruction class, depending on the types of the arguments given to it. Generic instruction classes are especially important because of the typed nature of the JVM.
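A generic instruction class such as add can be sketched as a lookup from operand type to concrete opcode. The GenericInstructions class and its select method are hypothetical names, and the rule that both operands must already have the same type is an assumption of this sketch; a real compiler would presumably insert cast instructions first.

```java
import java.util.Set;

final class GenericInstructions {
    // The four primitive computational types that have their own arithmetic opcodes.
    enum Type {
        INT('i'), LONG('l'), FLOAT('f'), DOUBLE('d');

        final char prefix;

        Type(char prefix) {
            this.prefix = prefix;
        }
    }

    // Generic binary arithmetic classes; each has an i, l, f, and d member.
    static final Set<String> ARITHMETIC = Set.of("add", "sub", "mul", "div", "rem");

    // Hypothetical selector: picks the member of a generic class for the operand types.
    static String select(String generic, Type left, Type right) {
        if (!ARITHMETIC.contains(generic)) {
            throw new IllegalArgumentException("not a generic arithmetic class: " + generic);
        }
        if (left != right) {
            // Assumption: mismatched operands are rejected rather than converted.
            throw new IllegalArgumentException("operand types differ; cast one operand first");
        }
        return left.prefix + generic;
    }

    public static void main(String[] args) {
        for (Type type : Type.values()) {
            System.out.println("add on " + type + " => " + select("add", type, type));
        }
    }
}
```

The logical instructions would need a separate, smaller table, since the JVM only provides them for int and long.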

Generalized variable instructions:
The JVM actually has two types of variables available to it: atomic variables and reference variables. The atomic variables do not take any arguments on the stack to get their value, or any extra ones to set their value. They can therefore be accessed as atomic expressions like in Lisp, hence their name. The atomic variables are the local variables and the class variables. The reference variables are the array variables, as described by the array store and array load operations, and the instance variables, which are the fields of some object reference. Each of these variable types has both getters and setters, so they can be both accessed and modified. Generalized variables correspond to the l-values in the Java programming language. By using the generalized variable instruction class, we can better understand how the setf operation can be implemented by the compiler.
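To make this concrete, here is a minimal sketch of the four kinds of generalized variables and a setf-style code emitter. The GeneralizedVariables class, the Place enum, and the setf method are hypothetical; the int-typed opcodes (iload, istore, iaload, iastore) stand in for their whole families, and instruction operands such as slot numbers and field references are omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class GeneralizedVariables {
    // Each kind of variable pairs a getter opcode with a setter opcode.
    // addressOperands counts the stack values needed to locate the variable.
    enum Place {
        LOCAL("iload", "istore", 0),
        STATIC_FIELD("getstatic", "putstatic", 0),
        INSTANCE_FIELD("getfield", "putfield", 1), // needs an object reference
        ARRAY_ELEMENT("iaload", "iastore", 2);     // needs an array reference and an index

        final String getter;
        final String setter;
        final int addressOperands;

        Place(String getter, String setter, int addressOperands) {
            this.getter = getter;
            this.setter = setter;
            this.addressOperands = addressOperands;
        }

        // Atomic variables need nothing on the stack to be located.
        boolean isAtomic() {
            return addressOperands == 0;
        }
    }

    // Hypothetical setf: push the address operands, push the new value, run the setter.
    // Assumption: each entry of addressCode is one instruction that pushes one operand.
    static List<String> setf(Place place, List<String> addressCode, String valueCode) {
        if (addressCode.size() != place.addressOperands) {
            throw new IllegalArgumentException(
                place + " needs " + place.addressOperands + " address operand(s)");
        }
        List<String> code = new ArrayList<>(addressCode);
        code.add(valueCode);
        code.add(place.setter);
        return code;
    }

    public static void main(String[] args) {
        System.out.println(setf(Place.LOCAL, List.of(), "iconst_2"));
        System.out.println(setf(Place.ARRAY_ELEMENT, List.of("aload 0", "iconst_1"), "bipush 42"));
    }
}
```

The operand order follows the JVM's own convention for putfield and the array store instructions: the reference (and index, for arrays) goes on the stack before the value being stored.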
