jzbrooks

Bytecode for Android Developers, A Gentle Introduction

Computers are complicated systems built on decades of abstraction. As software craftsmen, we find it worthwhile to understand these abstractions. Understanding the details of the systems we program deepens our appreciation of the craft and propels it forward.

Low(ish)-level code gets a bad rap. It can be hard to follow, but it’s worth a little effort to understand the flow of information in our programs more deeply. Why? First and foremost, understanding the machine being programmed makes us better programmers, even when using high-level languages. Bytecode is a closer mapping to what the underlying machine is being told to do, and understanding it is another tool in the toolbelt that helps us write faster, more reliable programs. Debugging a memory leak? Understanding the bytecode can help. Confused about why your program is spending so much time in a method call? Understanding the bytecode can help. Writing a new program for a new platform? Understanding the bytecode can help.

What is it like?

It’s less scary than you might think.

We’ll start by compiling a simple hello world program written in Kotlin.

fun main(args: Array<String>) {
    println("Hello, world!")
}

It defines a single top-level main function that prints “Hello, world!” to the standard output stream.

kotlinc hello_world.kt

This outputs a class file, Hello_worldKt.class, which contains the bytecode that the Java Virtual Machine (JVM, the abstract machine being programmed) will later execute. The Java Development Kit (JDK) supplies a tool called javap that is helpful for digging through class files.

javap -c Hello_worldKt produces:

Compiled from "hello_world.kt"
public final class Hello_worldKt {
  public static final void main(java.lang.String[]);
    Code:
       0: aload_0
       1: ldc           #9                  // String args
       3: invokestatic  #15                 // Method kotlin/jvm/internal/Intrinsics.checkParameterIsNotNull:(Ljava/lang/Object;Ljava/lang/String;)V
       6: ldc           #17                 // String Hello, world!
       8: astore_1
       9: iconst_0
      10: istore_2
      11: getstatic     #23                 // Field java/lang/System.out:Ljava/io/PrintStream;
      14: aload_1
      15: invokevirtual #29                 // Method java/io/PrintStream.println:(Ljava/lang/Object;)V
      18: return
}

That’s not too bad. javap even adds helpful comments describing what each numbered reference (#9, #15, and so on) points to. These numbers are indices into the class’s constant pool, which I’ve omitted for brevity. The -v option includes the pool in the output, among other things.
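A class file can also be poked at from code. The following is a minimal sketch, not part of the original program: ClassFileHeader and readHeader are hypothetical names introduced for illustration, and the snippet assumes it is compiled as an ordinary .kt file so that its own class file can be found on the classpath. It reads the first eight bytes of a class file, which hold a fixed magic number followed by the format's minor and major version.

```kotlin
import java.io.DataInputStream

// Hypothetical holder for the first three fields of a class file.
data class ClassFileHeader(val magic: Int, val minor: Int, val major: Int)

// Reads the header of the class file that `type` was loaded from.
// Assumes the class file is reachable as a classpath resource.
fun readHeader(type: Class<*>): ClassFileHeader {
    val resource = "/" + type.name.replace('.', '/') + ".class"
    val stream = type.getResourceAsStream(resource)
        ?: error("class file not found: $resource")
    return DataInputStream(stream).use { input ->
        val magic = input.readInt()            // 4 bytes, always 0xCAFEBABE
        val minor = input.readUnsignedShort()  // 2 bytes
        val major = input.readUnsignedShort()  // 2 bytes
        ClassFileHeader(magic, minor, major)
    }
}

fun main() {
    val header = readHeader(ClassFileHeader::class.java)
    println("magic = 0x%08X".format(header.magic)) // magic = 0xCAFEBABE
    println("class file version = ${header.major}.${header.minor}")
}
```

The version printed depends on the JVM target the compiler was configured for. javap -v reports the same minor and major version near the top of its output.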

What of it?

The Kotlin compiler is doing interesting things for a tiny program.

  1. The main function we defined (without the @JvmStatic annotation) is compiled to a static method anyway, because the JVM requires its entry point to be static.
  2. The top-level function is added to a class that’s created by the compiler (Hello_worldKt, named after the source file) because functions must belong to a class on the JVM.
  3. Because the Kotlin compiler can’t always verify nullability at compile time, it also enforces the nullability modeled in its type system at runtime (the invokestatic at offset 3). For example, libraries written in the Java programming language are often ambiguous about nullability unless they’re heavily annotated with @NonNull.
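These three observations can be checked from Kotlin itself using Java reflection. The following is a minimal sketch, not part of the original program: greet is a hypothetical top-level function, and the snippet assumes it lives in an ordinary .kt file compiled for the JVM. Reflection is used only because it lets us pass null where the Kotlin type system would normally refuse to compile the call.

```kotlin
import java.lang.reflect.InvocationTargetException
import java.lang.reflect.Modifier

// Hypothetical top-level function with a non-null parameter.
fun greet(name: String): String = "Hello, $name!"

fun main() {
    // An anonymous object declared here is nested inside the class the
    // compiler generated for this file, so enclosingClass returns that class.
    val fileClass = object {}.javaClass.enclosingClass
    println(fileClass.simpleName) // the generated class, named after the source file

    // Points 1 and 2: the top-level function is a static method of that class.
    val method = fileClass.getMethod("greet", String::class.java)
    println(Modifier.isStatic(method.modifiers)) // true

    // Point 3: reflection sidesteps the compile-time check, so the
    // runtime check at the top of greet is what rejects the null.
    val noName: String? = null
    try {
        method.invoke(null, noName)
    } catch (e: InvocationTargetException) {
        // The exception type depends on the Kotlin version in use.
        println(e.cause?.javaClass?.name)
    }
}
```

No output is shown for the last line because the exception thrown by the runtime check has changed across Kotlin releases. What matters is that the call fails inside greet before the string template is ever evaluated.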

What now?

Interesting things can be learned by digging into bytecode. So far we’ve only looked at JVM bytecode, which Android doesn’t execute directly. The OS ships with its own bytecode format that the Android Runtime (ART) executes. It’s called Dalvik bytecode, or DEX (Dalvik Executable), and is the subject of the next post.