Tuesday, February 12, 2019

CLR instruction classes

The CIL instruction set specification classifies instructions into base instructions and object model instructions, but a much better classification of the different types of CIL instructions was provided by Serge Lidin in his publications on the IL assembler. I modified his classification a little to bring out some regularities in the opcodes. The upper ontology of the CLR is relatively similar to that of the JVM: both are high-level stack machines, so they have similar instruction classes at the highest level, and the differences come in the individual instructions that belong to each class. Some of those differences are examined below for each of the six instruction classes, which can be used to build a full ontology of the opcodes of the instruction set.

Stack operations:
The stack operations are relatively similar to their JVM counterparts. There are two types: operations that manipulate the stack and operations that push constants. The stack manipulation operations are nop, pop, and dup. There are fewer of them in the CLR than in the JVM: there doesn't appear to be any swap operation and there are fewer forms of dup, but that is okay. The constant pushing operations are also very similar to the JVM. They mainly deal with pushing i4, i8, r4 and r8 values onto the stack (ldc.i4, ldc.i8, ldc.r4, ldc.r8), as well as pushing strings (ldstr) and null (ldnull).
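As a rough illustration of how little these operations do, here is a minimal Python sketch of an evaluation stack that understands only the opcodes named above. The `run_stack_ops` function and the `(opcode, operand)` tuple encoding are made up for this post; they are not part of any CLR API, and the sketch ignores typing and verification entirely.

```python
def run_stack_ops(program):
    """Execute a list of (opcode, operand) pairs and return the final stack.

    Hypothetical toy model: only the stack manipulation and constant
    pushing opcodes are handled, and values are plain Python objects.
    """
    stack = []
    for op, arg in program:
        if op == "nop":
            pass                          # does nothing
        elif op == "pop":
            stack.pop()                   # discard the top value
        elif op == "dup":
            stack.append(stack[-1])       # duplicate the top value
        elif op in ("ldc.i4", "ldc.i8", "ldc.r4", "ldc.r8", "ldstr"):
            stack.append(arg)             # push a numeric or string constant
        elif op == "ldnull":
            stack.append(None)            # push a null reference
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack


program = [
    ("ldc.i4", 7),
    ("dup", None),
    ("ldstr", "hello"),
    ("pop", None),
    ("ldnull", None),
    ("nop", None),
]
print(run_stack_ops(program))  # [7, 7, None]
```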

Generalized local variables:
I describe these operations as dealing with generalized local variables because, unlike the JVM, the CLR has separate kinds of variables for arguments and locals, whereas the JVM stores both together in one local variable array. To convert JVM code to the CLR, you need to turn the local variable slots that hold the method's parameters into arguments and keep the remaining ones as local variables. The only other difference worth mentioning is that the JVM's local variable slots are 32 bits wide, so a 64-bit value takes up two of them, while the CLR seems to be more generic: an argument or local holds one value of its declared type regardless of its size. You can also get the addresses of variables with instructions such as ldarga and ldloca. Along with static fields, arguments and locals make up the atomic variables, which, like constants, can be pushed onto the stack directly with a single instruction.
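The conversion from JVM local slots to CLR arguments and locals can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `build_slot_map` and `translate_load` are hypothetical helpers, the types are given as JVM-style names, and each slot is assumed to hold a single variable for the whole method (real compilers may reuse slots across scopes).

```python
WIDE_TYPES = {"long", "double"}  # these take two JVM local slots each


def build_slot_map(param_types, other_local_types):
    """Map JVM local-slot indices to CLR ("arg" | "loc", index) pairs.

    Parameters come first in the JVM local variable array, followed by the
    remaining locals. A wide value uses two JVM slots but only one CLR
    argument or local.
    """
    slot_map = {}
    slot = 0
    for kind, types in (("arg", param_types), ("loc", other_local_types)):
        for index, type_name in enumerate(types):
            slot_map[slot] = (kind, index)
            slot += 2 if type_name in WIDE_TYPES else 1
    return slot_map


def translate_load(jvm_slot, slot_map):
    """Turn a JVM load from a slot into the matching ldarg or ldloc."""
    kind, index = slot_map[jvm_slot]
    return f"ldarg {index}" if kind == "arg" else f"ldloc {index}"


# A static method (int a, long b, double c) with locals int x, long y.
slot_map = build_slot_map(["int", "long", "double"], ["int", "long"])
print(slot_map)
# {0: ('arg', 0), 1: ('arg', 1), 3: ('arg', 2), 5: ('loc', 0), 6: ('loc', 1)}
print(translate_load(3, slot_map))  # ldarg 2
print(translate_load(5, slot_map))  # ldloc 0
```

For an instance method, slot 0 on the JVM and argument 0 on the CLR both hold `this`, so the same mapping applies if the receiver is listed as the first parameter.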

Transformations
The CLR has all the operations you need to deal with arithmetic, logic, conversion and related operations. One thing to note is that the operations in the CLR are generic, so you don't need to prefix each instruction with the type of its operands. In place of comparison operations, the CLR has logical condition check operations (ceq, cgt, clt), which allow you to check whether two values are equal, or whether one is greater or less than the other, without having to use control flow operations to do it. The only other thing worth noting is that the CLR has operations dealing with pointers, such as indirect load and store operations, as well as operations dealing with blocks of memory, none of which appear in the JVM.
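To make the two points about generic opcodes and condition checks concrete, here is a small Python sketch. The `JVM_TO_CIL` table and the `condition_check` function are illustrative names of my own; the sketch leaves out the unsigned and unordered variants (cgt.un, clt.un) and floating point corner cases.

```python
# JVM arithmetic opcodes carry a type prefix (i, l, f, d); the CIL
# counterpart is one generic opcode per operation.
ARITHMETIC_OPS = ("add", "sub", "mul", "div", "rem", "neg")
JVM_TO_CIL = {prefix + op: op for prefix in "ilfd" for op in ARITHMETIC_OPS}


def condition_check(op, value1, value2):
    """Model ceq/cgt/clt, where value1 was pushed first and value2 second.

    The result is an integer 1 or 0 left on the stack, not a branch.
    """
    if op == "ceq":
        return int(value1 == value2)
    if op == "cgt":
        return int(value1 > value2)
    if op == "clt":
        return int(value1 < value2)
    raise ValueError(f"unsupported opcode: {op}")


print(JVM_TO_CIL["ladd"], JVM_TO_CIL["dadd"])  # add add
print(condition_check("cgt", 5, 3))            # 1
print(condition_check("clt", 5, 3))            # 0
```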

Vector operations
The CLR has all the operations you need to work with vectors: creating them, checking their length, and loading and storing their elements. As the value at any index of a vector can be modified, a system of generalized variable operations should treat vector elements as one more kind of variable with its own operations, alongside static fields, instance fields, local variables, and arguments.
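The four kinds of vector operations can be modeled with a toy evaluation stack, as in the Python sketch below. The `run_vector_ops` function is hypothetical and untyped: it uses Python lists for vectors, zero-fills new vectors, and uses the bare names newarr, ldlen, ldelem, and stelem without the element-type operands or suffixes that real CIL requires.

```python
def run_vector_ops(program):
    """Execute (opcode, operand) pairs on a toy stack and return the stack."""
    stack = []
    for op, arg in program:
        if op == "ldc.i4":
            stack.append(arg)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "newarr":
            length = stack.pop()
            stack.append([0] * length)    # new zero-filled vector
        elif op == "ldlen":
            stack.append(len(stack.pop()))
        elif op == "stelem":
            value = stack.pop()           # operands are popped in reverse
            index = stack.pop()           # order: value, index, array
            array = stack.pop()
            array[index] = value
        elif op == "ldelem":
            index = stack.pop()
            array = stack.pop()
            stack.append(array[index])
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack


store_and_load = [
    ("ldc.i4", 3), ("newarr", None),      # [A]
    ("dup", None),                        # [A, A]
    ("ldc.i4", 1), ("ldc.i4", 42),        # [A, A, 1, 42]
    ("stelem", None),                     # [A], with A[1] = 42
    ("dup", None), ("ldc.i4", 1),         # [A, A, 1]
    ("ldelem", None),                     # [A, 42]
]
print(run_vector_ops(store_and_load))     # [[0, 42, 0], 42]

length_only = [("ldc.i4", 3), ("newarr", None), ("ldlen", None)]
print(run_vector_ops(length_only))        # [3]
```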

Classes and value types
All of these operations are object model instructions in the specification of the CLR instruction set. There are instructions for allocation, type checking, casting, throwing exceptions, and so on. One significant difference is that the CLR has a system of value types as well as reference types, rather than having only reference types and primitives. As a result, there are box and unbox operations built in at the instruction level, as well as other operations like sizeof, which gets the size of a value type, and cpobj, which copies a value type. This is a significant difference from the JVM, because the CLR has its own unique object model.
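The copy semantics behind box and unbox can be imitated in Python, with the caveat that Python has no value types, so this is only an analogy. The `Point`, `Box`, `box`, and `unbox_any` names are stand-ins invented for this sketch, and `TypeError` stands in for the invalid cast exception that the CLR would raise.

```python
import copy
from dataclasses import dataclass


@dataclass
class Point:
    """Stand-in for a CLR value type (a struct)."""
    x: int
    y: int


class Box:
    """Stand-in for the heap object that box allocates."""
    def __init__(self, value):
        self.value = copy.copy(value)     # boxing copies the value


def box(value):
    return Box(value)


def unbox_any(boxed, expected_type):
    """Check the boxed type, then copy the value back out of the box."""
    if not isinstance(boxed.value, expected_type):
        raise TypeError("boxed value is not of the expected type")
    return copy.copy(boxed.value)


p = Point(1, 2)
b = box(p)
p.x = 99                                  # changes p, not the boxed copy
q = unbox_any(b, Point)
print(q)                                  # Point(x=1, y=2)
```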

Control flow operations
The control flow operations are actually surprisingly similar to the JVM. You have unconditional branching instructions, unary conditional branching instructions, comparative branching instructions, and switch instructions as jump operations, all of which appear in the JVM instruction set ontology. There is also the ret instruction, which doesn't need to be prefixed by any type because the operations in the CLR tend to be generic. There are also special operations for dealing with exception handling. The greatest difference appears to be in the operations that deal with method calls, like the tail. prefix, which allows for tail call optimization at the instruction level.
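Here is a small Python sketch of how the unconditional, unary conditional, and comparative branches drive a loop. The `run_control_flow` interpreter is hypothetical, and it simplifies branch targets to absolute instruction indices, whereas real CIL encodes them as offsets and IL assembler source uses labels.

```python
def run_control_flow(program, num_locals=2):
    """Run (opcode, operand) pairs with a program counter until ret."""
    stack = []
    local_vars = [0] * num_locals
    pc = 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "ldc.i4":
            stack.append(arg)
        elif op == "ldloc":
            stack.append(local_vars[arg])
        elif op == "stloc":
            local_vars[arg] = stack.pop()
        elif op == "add":
            value2 = stack.pop()
            value1 = stack.pop()
            stack.append(value1 + value2)
        elif op == "br":                  # unconditional branch
            pc = arg
        elif op == "brtrue":              # unary conditional branch
            if stack.pop():
                pc = arg
        elif op == "blt":                 # comparative branch
            value2 = stack.pop()
            value1 = stack.pop()
            if value1 < value2:
                pc = arg
        elif op == "ret":                 # untyped return of the top value
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {op}")


# Sums i for i = 0..4; local 0 is the running sum, local 1 is i.
sum_loop = [
    ("ldc.i4", 0), ("stloc", 0),                             # 0-1: sum = 0
    ("ldc.i4", 0), ("stloc", 1),                             # 2-3: i = 0
    ("br", 13),                                              # 4: go to the test
    ("ldloc", 0), ("ldloc", 1), ("add", None), ("stloc", 0), # 5-8: sum += i
    ("ldloc", 1), ("ldc.i4", 1), ("add", None), ("stloc", 1),# 9-12: i += 1
    ("ldloc", 1), ("ldc.i4", 5), ("blt", 5),                 # 13-15: loop if i < 5
    ("ldloc", 0), ("ret", None),                             # 16-17: return sum
]
print(run_control_flow(sum_loop))  # 10
```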
