Sunday, April 24, 2022

Compilation of Java operators

When implementing a language on the JVM, you will inevitably want to implement a set of basic operators similar to those provided by the Java language. Most of these Java operators correspond one-to-one with their JVM counterparts, so in this post I will focus only on the interesting tidbits.

Addition

The addition operator is one interesting case where Java doesn't correspond directly to a single bytecode instruction. Adding two integers compiles to iadd, but adding two Strings compiles to an invokedynamic instruction instead.
class ArithmeticOperators {
	
	// special behaviour of addition
	public static int add(int n, int m) {
		return n + m;
	}
	
	public static String add(String n, String m) {
		return n + m;
	}

}
The compiled bytecode looks like this:
  public static int add(int, int);
    Code:
       0: iload_0
       1: iload_1
       2: iadd
       3: ireturn

  public static java.lang.String add(java.lang.String, java.lang.String);
    Code:
       0: aload_0
       1: aload_1
       2: invokedynamic #7,  0              // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
       7: areturn
This demonstrates that whilst addition compiles to a plain iadd on integers, on Strings it produces an invokedynamic instruction that is bootstrapped by java.lang.invoke.StringConcatFactory. This changed in Java 9; before that, javac emitted a chain of StringBuilder calls, as you would expect.
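For comparison, the older translation can be written out by hand. This is a minimal sketch of roughly the shape javac used to generate for n + m before Java 9, not the exact bytecode; the class and method names are hypothetical.

```java
class ConcatSketch {
    // Roughly what `n + m` expanded to before Java 9:
    // allocate a StringBuilder, append each operand, then call toString().
    static String addWithBuilder(String n, String m) {
        return new StringBuilder().append(n).append(m).toString();
    }
}
```

Calling ConcatSketch.addWithBuilder("foo", "bar") gives "foobar", and a null operand is appended as the text "null", which matches the behaviour of the + operator on Strings.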

Bitwise not

There is no bitwise not instruction on the JVM, so this is another case where there is a slight difference between language and bytecode.
class BitwiseOperators { 
	public static int bitwiseCompliment(int n) {
		return ~n;
	}   
}
Of course, we can get around this by xoring the value with negative one, which has every bit set in the two's complement representation of integers, so the xor flips each bit of the operand.
  public static int bitwiseCompliment(int);
    Code:
       0: iload_0
       1: iconst_m1
       2: ixor
       3: ireturn
So that's pretty easy to get around using the bitwise xor instruction. The other bitwise operators &, |, and ^ compile directly to their corresponding JVM opcodes. The only difference is that the static type information determines the exact opcode: iand, ior, and ixor for int operands, and land, lor, and lxor for long operands.
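The xor trick can be checked directly in Java. The following is a small sketch with hypothetical names that spells out the same rewrite for both int and long operands.

```java
class BitwiseNotSketch {
    // Hand-written equivalent of ~n for int: xor with -1 (all 32 bits set).
    static int notViaXor(int n) {
        return n ^ -1;
    }

    // Same idea for long: the mask is -1L (all 64 bits set), and the
    // compiler picks the long variant of xor from the static types.
    static long notViaXor(long n) {
        return n ^ -1L;
    }
}
```

For every input, notViaXor(n) equals ~n, which in two's complement is also -n - 1.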

Logical operators

There are no logical operators in the JVM instruction set. This is very easy to get around using conditional jump instructions. Consider these three methods:
class LogicalOperators {
	public static boolean logicalAnd(boolean x, boolean y) {
		return x && y;
	}
	
	public static boolean logicalOr(boolean x, boolean y) {
		return x || y;
	}
	
	public static boolean logicalNot(boolean x) {
		return !x;
	}
}
The ifeq instruction jumps only if the top value on the stack is zero. So to implement logicalAnd we check each operand in turn: if either one is zero we jump ahead and push 0 (false), and otherwise we push 1 (true). logicalOr does much the same thing, except that the first check uses ifne to jump straight to the true result as soon as the first operand is non-zero. These conditional jumps are what implement short-circuit evaluation.
  public static boolean logicalAnd(boolean, boolean);
    Code:
       0: iload_0
       1: ifeq          12
       4: iload_1
       5: ifeq          12
       8: iconst_1
       9: goto          13
      12: iconst_0
      13: ireturn

  public static boolean logicalOr(boolean, boolean);
    Code:
       0: iload_0
       1: ifne          8
       4: iload_1
       5: ifeq          12
       8: iconst_1
       9: goto          13
      12: iconst_0
      13: ireturn

  public static boolean logicalNot(boolean);
    Code:
       0: iload_0
       1: ifne          8
       4: iconst_1
       5: goto          9
       8: iconst_0
       9: ireturn
So the lack of dedicated JVM opcodes for logical operators is covered by conditional jumps. The key point is that && and || are short-circuiting: the right operand must be skipped when the left one already decides the result, which is exactly what a conditional jump does.
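Short-circuiting is easier to see when the right operand has a side effect. The sketch below uses hypothetical names: a counter records how often the right operand is evaluated, and andWithJumps mirrors the jump structure of the logicalAnd bytecode in plain Java.

```java
class ShortCircuitSketch {
    // Counts how many times the right-hand operand was evaluated.
    static int rightEvaluations = 0;

    static boolean right(boolean value) {
        rightEvaluations++;
        return value;
    }

    // The right operand is only evaluated when x is true.
    static boolean andShortCircuit(boolean x) {
        return x && right(true);
    }

    // The right operand is only evaluated when x is false.
    static boolean orShortCircuit(boolean x) {
        return x || right(true);
    }

    // Same control flow as the logicalAnd bytecode, written with early exits.
    static boolean andWithJumps(boolean x, boolean y) {
        if (!x) return false; // first ifeq: x is zero, result is false
        if (!y) return false; // second ifeq: y is zero, result is false
        return true;          // fall through: both were non-zero
    }
}
```

Calling andShortCircuit(false) or orShortCircuit(true) leaves rightEvaluations unchanged, because the jump skips the code that would evaluate the right operand.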

Assignment operators

The assignment operator = in the Java programming language is not as simple as you would think, because its translation depends on the kind of lvalue on the left-hand side. An assignment can compile to putfield, putstatic, a local variable store such as istore or astore, or an array store such as iastore or aastore, depending upon the context.
import java.awt.Point;

class AssignmentOperators { 
	public static void assignment(Point[][] coll) {
		coll[0][0].x = 10;
	}
}
This modifies the value of the x field of a Point object, first fetching that object from its place in the nested array using two aaload instructions.
  public static void assignment(java.awt.Point[][]);
    Code:
       0: aload_0
       1: iconst_0
       2: aaload
       3: iconst_0
       4: aaload
       5: bipush        10
       7: putfield      #7                  // Field java/awt/Point.x:I
      10: return
So this demonstrates the lvalue support in the javac compiler, which goes a long way to making Java as nice as it is to use. You have a unified interface which saves you from having to worry about the differences between static fields, local variables, array locations, and instance fields.
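To make the different lvalue kinds concrete, here is a small sketch in which each assignment is annotated with the store instruction family it compiles to. The class and its members are hypothetical, and the comments name the instruction family rather than the exact short form (such as istore_3) that javac may choose.

```java
class LvalueSketch {
    static int counter; // static field
    int x;              // instance field

    static int assignAll(LvalueSketch p, int[] ints, LvalueSketch[] refs) {
        int local = 1;          // istore: local int variable
        LvalueSketch alias = p; // astore: local reference variable
        counter = 2;            // putstatic: static field
        alias.x = 3;            // putfield: instance field
        ints[0] = 4;            // iastore: element of an int array
        refs[0] = alias;        // aastore: element of a reference array
        return local + counter + p.x + ints[0];
    }
}
```

The same = syntax is used six times, and the compiler selects the instruction from the static type and kind of each left-hand side.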

Relational operators

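Instead of relational operators that produce a boolean, the JVM has conditional jump opcodes. ifeq, ifne, ifgt, ifge, ifle, and iflt compare a single int against zero, the if_icmp family compares two ints directly, and if_acmpeq and if_acmpne compare two references. For long, float, and double operands, a comparison instruction such as lcmp or dcmpg first pushes an int result, which one of the zero-comparing jumps then tests. The translation from the relational operators to these conditional jump instructions is pretty straightforward.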
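As a rough illustration, the sketch below shows the < operator on three operand types, with comments describing the instructions javac typically emits for each. The names are hypothetical and the exact instruction selection may vary between compiler versions.

```java
class RelationalSketch {
    // int operands: typically a single if_icmpge that jumps to the
    // "false" result, i.e. the compiler tests the negated condition.
    static boolean lessThan(int n, int m) {
        return n < m;
    }

    // long operands: lcmp pushes -1, 0 or 1, then a zero-comparing
    // jump such as ifge tests that result.
    static boolean lessThan(long n, long m) {
        return n < m;
    }

    // double operands: a dcmpg/dcmpl comparison followed by a
    // zero-comparing jump; any comparison involving NaN is false.
    static boolean lessThan(double n, double m) {
        return n < m;
    }

    // The int version again, written with the jump made explicit.
    static boolean lessThanWithJumps(int n, int m) {
        if (n >= m) return false; // plays the role of if_icmpge
        return true;
    }
}
```

As with the logical operators, the boolean result is materialised by pushing 1 or 0 on the two sides of the jump.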

References:
String Concatenation with Invoke Dynamic
