The JVM bytecodes are an instruction set that can be statically type checked. For example, bytecodes that invoke methods explicitly declare the expected argument and result types. The state in the execution model is stored in a set of local variables and on a stack. The types of each storage location can be determined by straightforward dataflow analysis that infers types for each stack entry and each local variable slot, at each point in the program.
It is important that extensions for parameterized types not
remove the ability to statically type check bytecodes. The standard
Java bytecode verifier works by verifying one class at a
time [Yel95]. A call to a method of another class is checked
using a declaration of the signature of that method, which is inserted
in the .class
file by the compiler. When the other class is loaded
dynamically, these declarations are checked to ensure that they match
the actual class signature.
Our extensions to the JVM preserve this efficient model of verification.
The code of a parameterized or non-parameterized class is
verified only once, in isolation from other classes,
thus verifying it for all legal instantiations of the code.
An instantiation of a class or interface can be checked
for legality by examining only signature information in the
.class
file for the parameterized class or interface; examining the bytecodes
in the .class
file is unnecessary.
Both the compiler and the verifier perform similar type checking, treating formal parameter types as ordinary types with a limited set of allowed operations. The important difference between the compiler and the verifier is that compiler variables have declared types, but the verifier must infer their types. The verifier must assign types to stack locations and local variable slots which are specific enough that an instruction can be type checked (e.g., if it invokes a method, the object must have that method). The assigned types must also be general enough that they include all possible results of the preceding instructions. For each instruction, the verifier records a type with this property, if possible. It uses standard iterative dataflow analysis techniques either to assign types to all stack locations for all instructions, or to detect that the program does not type check.
Because the bytecodes include branch instructions, different instructions may precede the execution of a particular instruction X. For type safety, the possible types of values placed on the stack by the preceding instructions must all be subtypes of the types expected by X. The core of the verifier performs this operation; it is a procedure to merge a set of types, producing the most specific supertype, or least upper bound in the type hierarchy. The dataflow analysis propagates this common supertype through X and sends its results on to the succeeding instruction(s). The analysis terminates when the types of all stack locations and local variable slots are stably assigned.
The primary change to the verifier for parameterized types
is a modification to this merge procedure.
To find the lowest common class in the hierarchy,
we walk up the hierarchy from all the classes to be merged. Each
time that a link is traversed to a superclass, we
apply the parameter substitution that is described by the extends
clause of the class definition. When a common class is reached in the
hierarchy, the actual parameters of the common class must be equal.
A[T] extends Object
B[U] extends A[U]
C[K,V] extends A[K]
D extends A[B[int}]]
Consider the class and interface hierarchy shown in Figure 7,
with the corresponding extends
clauses. The union of B[X] and
C[X,Y] can be conservatively approximated by successively moving up
the tree to find a common node while substituting parameters. The
result is as follows:
(B[X] union C[X,Y]) (A[X] union A[X]) = A[X]
So these two types are merged to produce A[X]. Similarly, for B[X] and C[Y,X] we have
(B[X] union C[Y,X]) (A[X] union A[Y]) Object
In this case, the merge result is Object
, since the parameters did
not match for A. Note that
unlike in the non-parameterized verifier, the lowest common superclass node
is not always sufficient for the merge result, since it may be
instantiated differently by the merged types.
Finally, consider merging B[B[int]] and D, which demonstrates
that parameterized and non-parameterized classes can be merged:
(B[B[int]] union D) A[B[int]]