Friday, January 7, 2022

Clojure Java interop

The Java virtual machine is a general building material for programs, not tied to any particular language or programming paradigm. All that matters is that you can produce JVM bytecode. Since every JVM language ends up as the same bytecode, it is worth asking what advantages the Java language actually offers as a tool for generating it.
  • Support for named variables and named argument lists, instead of numbered local variable slots.
  • Type deduction in opcode generation. In particular, you don't have to choose between iadd, ladd, fadd, and dadd when adding two numbers. The same goes for the other arithmetic operations, loading and storing array elements, returning values, casting, and so on. By the same token, you don't have to spell out full method descriptors.
  • The import statement saves you from always having to fully qualify Java class names. There is no corresponding import opcode in the Java virtual machine, so this is purely a Java language feature.
  • The Java language doesn't make you choose between the different opcodes for loading constants (iconst_<n>, bipush, sipush, ldc). The Java virtual machine provides these variants to make bytecode more compact. Any higher-level tool (including the ASM bytecode library) should handle this automatically.
  • Java provides control flow constructs in place of goto and conditional jumps. Part of this is the uniform condition system, which spares you from working out which conditional jump opcode to use.
  • A uniform call syntax so you don't have to distinguish between invokestatic, invokevirtual, invokeinterface, and invokespecial.
  • Built-in support for l-values and generalized place forms, so you can use a single assignment form on local variables, array elements, and static and instance fields. You can even set values in multidimensional arrays, and the Java compiler will produce the correct combination of opcodes for you.
  • Last but not least, the Java language manages the operand stack for you. This is especially useful for nested mathematical expressions, where the compiler works out the order of loads and operations on your behalf.
The Java language does a lot when you put it like that, but this by no means implies that Java virtual machine bytecode is hard to use. The Java virtual machine is still a high-level architecture that is easier to program directly than any C-machine. Hopefully this explains why you might use the Java language to generate JVM bytecode, but it works just as well to use something else, or even to write your own compiler as I have.
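To make the points about typed arithmetic opcodes and constant loading concrete, here is a small sketch in a hypothetical class, ExpressionDemo (not part of the original project). The source of each arithmetic method has the same shape; the comments list the bytecode javac typically emits for it, which you can check yourself with `javap -c ExpressionDemo`. Note how the long and double versions use two local variable slots per argument.

```java
// Hypothetical demo class: the comments show the bytecode javac typically
// emits for each static method body (verify with `javap -c ExpressionDemo`).
public class ExpressionDemo {

    // iload_0, iload_1, imul, iload_2, iadd, ireturn
    public static int intExpr(int a, int b, int c) {
        return a * b + c;
    }

    // lload_0, lload_2, lmul, lload 4, ladd, lreturn
    // (each long argument occupies two local variable slots)
    public static long longExpr(long a, long b, long c) {
        return a * b + c;
    }

    // dload_0, dload_2, dmul, dload 4, dadd, dreturn
    public static double doubleExpr(double a, double b, double c) {
        return a * b + c;
    }

    // x is not a constant, so nothing is folded; the constants are loaded with
    // iconst_5, bipush 100, sipush 1000 and ldc 100000 (from the constant pool)
    public static int constants(int x) {
        return x + 5 + 100 + 1000 + 100000;
    }

    public static void main(String[] args) {
        System.out.println(intExpr(2, 3, 4));           // 10
        System.out.println(longExpr(2L, 3L, 4L));       // 10
        System.out.println(doubleExpr(1.5, 2.0, 0.25)); // 3.25
        System.out.println(constants(0));               // 101105
    }
}
```

In Java you write `a * b + c` three times; in Jasmin you would have to pick the i, l, or d family of opcodes and the right slot numbers each time.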

Setting up a mixed project:
To set up a mixed Java and Clojure project, I suggest using IntelliJ IDEA. IntelliJ is the only Java-focused IDE that also has good support for Clojure. Then you just need to create separate Java and Clojure source folders and configure Leiningen, listing your Clojure folder in :source-paths and your Java folder in :java-source-paths. With its default prep tasks, Leiningen compiles the Java sources with javac before it compiles the Clojure code.

Creating a Java class:
For a first example of Java and Clojure interop, I have chosen a prime number sieve. You could easily write one in Clojure, but some mathematical functionality is worth writing in Java so that it performs better.
import java.util.BitSet;

public class NumberUtilities {

    public static int[] sieve(int n) {

        BitSet primes = new BitSet(n+1);
        primes.flip(2, n+1);

        for(int p = 2; p*p <= n; p++) {
            if(primes.get(p)) {
                for(int i = p*p; i <= n; i += p) {
                    primes.set(i, false);
                }
            }
        }

        return primes.stream().toArray();
    }

}
I mentioned that the Java language is just a tool for generating Java virtual machine bytecode. It's kind of like an M-expression syntax for the JVM, while Lisp Flavoured Java is an S-expression syntax. Clojure is a different beast entirely from either of them. For the purposes of this demonstration, let's examine the equivalent bytecode written in Jasmin assembly syntax.
.class public NumberUtilities
.super java/lang/Object

.method public static sieve(I)[I
    .limit stack 4
    .limit locals 4

; initialize the bit set with room for the indices 0..n
    new java/util/BitSet
    dup
    iload_0
    iconst_1
    iadd
    invokespecial java/util/BitSet/<init>(I)V
    astore_1

; flip the possible primes (2..n) to true
    aload_1
    iconst_2
    iload_0
    iconst_1
    iadd
    invokevirtual java/util/BitSet/flip(II)V

; initialize the first prime to two
    iconst_2
    istore_2

; start a loop in order to do the sieve on the main bit set
loop:
    iload_2
    iload_2
    imul
    iload_0
    if_icmpgt loop_breakpoint

; ensure that this number is a prime before starting the inner loop
    aload_1
    iload_2
    invokevirtual java/util/BitSet/get(I)Z
    ifeq inner_loop_breakpoint

; initialize the current multiple to the first index not yet cleared (p*p)
    iload_2
    iload_2
    imul
    istore_3

; flip all multiples of the current prime to false
inner_loop:
    iload_3
    iload_0
    if_icmpgt inner_loop_breakpoint

; set the current index to false (local 1 holds the bit set)
    aload_1
    iload_3
    iconst_0
    invokevirtual java/util/BitSet/set(IZ)V

; advance to the next multiple
    iload_3
    iload_2
    iadd
    istore_3
    goto inner_loop

inner_loop_breakpoint:
; move on to the next candidate prime
    iinc 2 1
    goto loop

loop_breakpoint:
; convert the bit set into an int array containing all true indices and return it
    aload_1
    invokevirtual java/util/BitSet/stream()Ljava/util/stream/IntStream;
    invokeinterface java/util/stream/IntStream/toArray()[I 1
    areturn
.end method
The advantages of the Java language are clear when you compare the Java source to the JVM bytecode. Whenever someone says that Java is verbose, I just remember how much typing it saves compared to writing JVM bytecode by hand, which I have done a lot. Too much.

A notable aspect of this is how the compiler structures the output of for loops. A for loop packs a lot into one line, and its pieces end up scattered across the compiled output: the initialization comes before the loop label, the condition is tested at the start of the loop, and the update and the goto back to the start come at the end. Once you unpack all of that, it is fairly easy to see how Java code corresponds to bytecode. In that sense, Java is one of the easier languages to understand in terms of its compiler output.
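One way to see this unpacking is to rewrite the sieve's for loops as while loops by hand, so that each piece sits where the bytecode puts it. The class below, SieveDesugared, is a hypothetical sketch for illustration only; it computes the same result as NumberUtilities.sieve, and its comments point at the matching labels in the Jasmin listing.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: the sieve with each for loop split into its
// init / test / body / update pieces, in the order the bytecode uses them.
public class SieveDesugared {

    public static int[] sieve(int n) {
        BitSet primes = new BitSet(n + 1);
        primes.flip(2, n + 1);

        int p = 2;                      // init, before the "loop:" label
        while (p * p <= n) {            // test at the top, jumps to loop_breakpoint
            if (primes.get(p)) {        // ifeq skips to inner_loop_breakpoint
                int i = p * p;          // inner init, before "inner_loop:"
                while (i <= n) {        // inner test at the top
                    primes.set(i, false);
                    i += p;             // inner update, then goto inner_loop
                }
            }
            p++;                        // update (iinc), then goto loop
        }
        return primes.stream().toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sieve(30)));
        // prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    }
}
```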

Calling Java functions from Clojure
All the countless hours spent reading the documentation of the Java virtual machine, the Java language, and the thousands of classes in the Java standard library are finally rewarded by using Clojure, which has seamless Java interop.
(prn (seq (NumberUtilities/sieve 1000)))
Calling the sieve function written in Java produces all the prime numbers up to a thousand, which confirms our memory of the smallest primes.
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 
71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 
149 151 157 163 167 173 179 181 191 193 197 199 211 223 
227 229 233 239 241 251 257 263 269 271 277 281 283 293 
307 311 313 317 331 337 347 349 353 359 367 373 379 383 
389 397 401 409 419 421 431 433 439 443 449 457 461 463 
467 479 487 491 499 503 509 521 523 541 547 557 563 569 
571 577 587 593 599 601 607 613 617 619 631 641 643 647 
653 659 661 673 677 683 691 701 709 719 727 733 739 743 
751 757 761 769 773 787 797 809 811 821 823 827 829 839 
853 857 859 863 877 881 883 887 907 911 919 929 937 941 
947 953 967 971 977 983 991 997)
Clojure is able to call the Java sieve function because both languages speak the same basic language: Java virtual machine bytecode. We saw the JVM output of the Java class. The bytecode Clojure compiles for this call uses the same kind of opcodes, starting with an invokestatic instruction that calls sieve. The only difference is that Clojure may have to fall back on reflection if it is not given enough type information.

In this case that isn't necessary, because NumberUtilities doesn't overload sieve, so Clojure can resolve the method at compile time. In general, though, the most important performance improvement you can make to your Clojure code is to add type hints, so that method signatures don't have to be determined by reflection at runtime. Setting *warn-on-reflection* to true makes the compiler report each call site that still needs reflection. Finally, after calling sieve we use seq in order to print the result, because a raw Java int[] prints only its type and identity hash, whereas a sequence prints its elements. A more Java-like solution would be to use java.util.Arrays/toString.
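From the Java side, the difference between a resolved call and a reflective one can be sketched in plain Java. The class CallSites below is hypothetical. It uses Math.max because that method is overloaded for int, long, float, and double, so a reflective lookup has to supply parameter types to pick one. This is only an analogy for the work a reflective call site does at runtime, not Clojure's actual implementation. The last lines also show the printing problem that seq and Arrays.toString solve.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical sketch contrasting a compile-time resolved call with a
// reflective one; an analogy only, not how Clojure implements its calls.
public class CallSites {

    // Direct call: javac chooses Math.max(int, int) at compile time and emits
    // a single invokestatic with the descriptor (II)I.
    public static int directMax(int a, int b) {
        return Math.max(a, b);
    }

    // Reflective call: the overload is looked up by name and parameter types
    // at runtime, the arguments are boxed into an Object[], and the result
    // comes back boxed as an Integer.
    public static int reflectiveMax(int a, int b) throws ReflectiveOperationException {
        Method max = Math.class.getMethod("max", int.class, int.class);
        return (Integer) max.invoke(null, a, b);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(directMax(3, 7));     // 7
        System.out.println(reflectiveMax(3, 7)); // 7

        int[] primes = {2, 3, 5, 7};
        // Object.toString on an array: type code plus identity hash, e.g. [I@...
        System.out.println(primes);
        // Arrays.toString prints the elements: [2, 3, 5, 7]
        System.out.println(Arrays.toString(primes));
    }
}
```

Both calls return the same value; the reflective one simply does its method lookup and argument boxing on every call instead of once in the compiler.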

The Java language and Clojure complement each other perfectly, because Clojure isn't just another copy of Java. Java is static, imperative, and heteroiconic, while Clojure is dynamic, functional, and homoiconic. The fact that two languages so different from one another can come together is the ultimate testament to the power of the Java virtual machine.
